Changes since v9 [1]:
* Added acks from Aneesh and Steven Rostedt.
Changes since v8 [2]:
* Updated proposed changes to remove usage of term 'SCM' due to
ambiguity with 'PMEM' and 'NVDIMM'. [ Dan Williams ]
* Replaced the usage of term 'SCM' with 'PMEM' in most contexts.
[ Aneesh ]
* Renamed NVDIMM health defines from PAPR_SCM_DIMM_* to PAPR_PMEM_* .
* Updates to various newly introduced identifiers in 'papr_scm.c'
removing the 'SCM' prefix from their names.
* Renamed NVDIMM_FAMILY_PAPR_SCM to NVDIMM_FAMILY_PAPR
* Renamed PAPR_SCM_PDSM_HEALTH to PAPR_PDSM_HEALTH
* Renamed uapi header 'papr_scm_pdsm.h' to 'papr_pdsm.h'.
* Renamed sysfs ABI document from sysfs-bus-papr-scm to
sysfs-bus-papr-pmem.
* No behavioural changes from v8 [2].
[1] https://lore.kernel.org/linux-nvdimm/[email protected]
[2] https://lore.kernel.org/linux-nvdimm/[email protected]/
---
The PAPR standard[3][5] provides mechanisms to query the health and
performance stats of an NVDIMM via various hcalls as described in
Ref[4]. Until now these stats were never exposed to user-space tools
like 'ndctl'. This is partly due to the PAPR platform not having
support for ACPI and NFIT. Hence 'ndctl' is unable to query and
report the dimm health status, and a user has no way to determine the
current health status of an NVDIMM.
To overcome this limitation, this patch-set updates the papr_scm kernel
module to query and fetch NVDIMM health stats using hcalls described
in Ref[4]. These health and performance stats are then exposed to
userspace via sysfs and PAPR-NVDIMM-Specific-Methods (PDSM) issued by
libndctl.
These changes, coupled with the proposed ndctl changes located at Ref[6],
should provide a way for the user to retrieve NVDIMM health status
using ndctl.
Below is a sample output using the proposed kernel + ndctl for a PAPR
NVDIMM in an emulation environment:
# ndctl list -DH
[
{
"dev":"nmem0",
"health":{
"health_state":"fatal",
"shutdown_state":"dirty"
}
}
]
Dimm health report output on a pseries guest lpar with vPMEM or HMS
based NVDIMMs that are in a perfectly healthy condition:
# ndctl list -d nmem0 -H
[
{
"dev":"nmem0",
"health":{
"health_state":"ok",
"shutdown_state":"clean"
}
}
]
PAPR NVDIMM-Specific-Methods(PDSM)
==================================
PDSM requests are issued by vendor-specific code in libndctl to
execute certain operations or fetch information from NVDIMMs. PDSM
requests can be sent to the papr_scm module via libndctl (userspace)
and libnvdimm (kernel) using the ND_CMD_CALL ioctl command, which is
handled in the dimm control function papr_scm_ndctl(). The current
patch-set proposes a single PDSM to retrieve NVDIMM health, defined in
the newly introduced uapi header named 'papr_pdsm.h'. Support for
more PDSMs will be added in the future.
Structure of the patch-set
==========================
The patch-set starts with a doc patch documenting details of hcall
H_SCM_HEALTH. The second patch exports the kernel symbol
seq_buf_printf(), which is used in subsequent patches to generate
sysfs attribute content.
The third patch implements support for fetching NVDIMM health
information from PHYP and partially exposing it to user-space via an
NVDIMM sysfs flag.
The fourth patch implements support for servicing PDSM commands in the
papr_scm module.
Finally, the last patch implements support for servicing the PDSM
'PAPR_PDSM_HEALTH', which returns the NVDIMM health information to
libndctl.
References:
[3] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[4] commit 58b278f568f0
("powerpc: Provide initial documentation for PAPR hcalls")
[5] "Linux on Power Architecture Platform Reference"
https://members.openpowerfoundation.org/document/dl/469
[6] https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v9
---
Vaibhav Jain (5):
powerpc: Document details on H_SCM_HEALTH hcall
seq_buf: Export seq_buf_printf
powerpc/papr_scm: Fetch nvdimm health information from PHYP
ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 ++
Documentation/powerpc/papr_hcalls.rst | 46 ++-
arch/powerpc/include/uapi/asm/papr_pdsm.h | 175 +++++++++
arch/powerpc/platforms/pseries/papr_scm.c | 363 +++++++++++++++++-
include/uapi/linux/ndctl.h | 1 +
lib/seq_buf.c | 1 +
6 files changed, 600 insertions(+), 13 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
--
2.26.2
Implement support for fetching nvdimm health information via the
H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
of 64-bit bitmaps, the bitwise-and of which is stored in
'struct papr_scm_priv' and subsequently partially exposed to
user-space via the newly introduced dimm-specific attribute
'papr/flags'. Since the hcall is costly, the health information is
cached and only re-queried once 60s have passed since the previous
successful hcall.
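
The caching rule described above can be illustrated with a small
user-space sketch. It is not the kernel code: 'struct health_cache',
fake_query() and the plain seconds-based clock are hypothetical
stand-ins, and the kernel version works on jiffies with time_after()
under a mutex.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimum age in seconds before cached health data is refreshed */
#define MIN_HEALTH_QUERY_INTERVAL 60

/* Hypothetical stand-in for the cached fields of papr_scm_priv */
struct health_cache {
	uint64_t last_query;	/* time of last successful query (seconds) */
	uint64_t bitmap;	/* cached health bitmap */
	unsigned int queries;	/* number of expensive queries issued */
};

/* Hypothetical stand-in for the costly hcall; always succeeds */
static int fake_query(struct health_cache *c, uint64_t now)
{
	c->bitmap = 0;
	c->last_query = now;
	c->queries++;
	return 0;
}

/* Re-query only when the cached data is older than the interval */
static int cached_query(struct health_cache *c, uint64_t now)
{
	assert(c != NULL);

	if (now > c->last_query + MIN_HEALTH_QUERY_INTERVAL)
		return fake_query(c, now);

	/* Assume the cached data is still valid */
	return 0;
}
```

With a zero-initialised cache, a call at t=100 issues a query, a call
at t=130 is served from the cache, and a call at t=161 queries again.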
The patch also adds documentation text describing the flags reported by
the new sysfs attribute 'papr/flags' at
Documentation/ABI/testing/sysfs-bus-papr-pmem.
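
As a rough illustration of how the bitmap pair maps to the flag names,
here is a user-space sketch. The bit positions and masks are copied
from the defines in this patch; the names PPC_BIT64(), append_flag()
and decode_health() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* PAPR numbers bits from the most significant end: bit n is 1 << (63 - n) */
#define PPC_BIT64(n)		(1ULL << (63 - (n)))

#define PMEM_UNARMED		PPC_BIT64(0)
#define PMEM_SHUTDOWN_DIRTY	PPC_BIT64(1)
#define PMEM_EMPTY		PPC_BIT64(3)
#define PMEM_HEALTH_CRITICAL	PPC_BIT64(4)
#define PMEM_HEALTH_FATAL	PPC_BIT64(5)
#define PMEM_HEALTH_UNHEALTHY	PPC_BIT64(6)
#define PMEM_ENCRYPTED		PPC_BIT64(8)
#define PMEM_SCRUBBED_LOCKED	PPC_BIT64(9)

/* Hypothetical helper: append "name " without writing past the buffer */
static void append_flag(char *buf, size_t len, const char *name)
{
	size_t used = strlen(buf);

	assert(used < len);
	snprintf(buf + used, len - used, "%s ", name);
}

/* Decode (bitmap & mask) into a space-separated flag list; len must be > 0 */
static void decode_health(uint64_t bitmap, uint64_t mask,
			  char *buf, size_t len)
{
	uint64_t health = bitmap & mask;

	buf[0] = '\0';
	if (health & (PMEM_UNARMED | PMEM_HEALTH_UNHEALTHY))
		append_flag(buf, len, "not_armed");
	if (health & PMEM_SHUTDOWN_DIRTY)
		append_flag(buf, len, "flush_fail");
	if (health & PMEM_EMPTY)
		append_flag(buf, len, "restore_fail");
	if (health & PMEM_ENCRYPTED)
		append_flag(buf, len, "encrypted");
	if (health & (PMEM_HEALTH_CRITICAL | PMEM_HEALTH_FATAL |
		      PMEM_HEALTH_UNHEALTHY))
		append_flag(buf, len, "smart_notify");
	if (health & PMEM_SCRUBBED_LOCKED)
		append_flag(buf, len, "scrubbed locked");
}
```

A bit that is set in the bitmap but cleared in the mask is ignored,
which is why the two values are and-ed before any flag is tested.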
[1] commit 58b278f568f0 ("powerpc: Provide initial documentation for
PAPR hcalls")
Cc: "Aneesh Kumar K . V" <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Ira Weiny <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Vaibhav Jain <[email protected]>
---
Changelog:
Resend:
* Added ack from Aneesh.
v8..v9:
* Rename some variables and defines to reduce usage of term SCM
replacing it with PMEM [Dan Williams, Aneesh]
* s/PAPR_SCM_DIMM/PAPR_PMEM/g
* s/papr_scm_nd_attributes/papr_nd_attributes/g
* s/papr_scm_nd_attribute_group/papr_nd_attribute_group/g
* s/papr_scm_dimm_attr_groups/papr_nd_attribute_groups/g
* Renamed file sysfs-bus-papr-scm to sysfs-bus-papr-pmem
v7..v8:
* Update type of variable 'rc' in __drc_pmem_query_health() and
drc_pmem_query_health() to long and int respectively. [ Ira ]
* Updated the patch description to s/64 bit Big Endian Number/64-bit
bitmap/ [ Ira, Aneesh ].
Resend:
* None
v6..v7 :
* Used the exported seq_buf_printf() function to generate content for
'papr/flags'
* Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c
and removed the papr_scm.h file [Mpe]
* Some minor consistency issues in sysfs-bus-papr-scm
documentation. [Mpe]
* s/dimm_mutex/health_mutex/g [Mpe]
* Split drc_pmem_query_health() into two functions, one of which takes
care of caching and locking. [Mpe]
* Fixed a local copy creation of dimm health information using
READ_ONCE(). [Mpe]
v5..v6 :
* Change the flags sysfs attribute from 'papr_flags' to 'papr/flags'
[Dan Williams]
* Include documentation for 'papr/flags' attr [Dan Williams]
* Change flag 'save_fail' to 'flush_fail' [Dan Williams]
* Caching of health bitmap to reduce expensive hcalls [Dan Williams]
* Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe]
* Replaced two __be64 integers from papr_scm_priv to a single u64
integer [Mpe]
* Updated patch description to reflect the changes made in this
version.
* Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from
flags_show() [Dan Williams]
v4..v5 :
* None
v3..v4 :
* None
v2..v3 :
* Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for
NVDIMM unarmed [Aneesh]
v1..v2 :
* New patch in the series.
---
Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 +++
arch/powerpc/platforms/pseries/papr_scm.c | 169 +++++++++++++++++-
2 files changed, 194 insertions(+), 2 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem
new file mode 100644
index 000000000000..5b10d036a8d4
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
@@ -0,0 +1,27 @@
+What: /sys/bus/nd/devices/nmemX/papr/flags
+Date: Apr, 2020
+KernelVersion: v5.8
+Contact: linuxppc-dev <[email protected]>, [email protected],
+Description:
+ (RO) Report flags indicating various states of a
+ papr-pmem NVDIMM device. Each flag maps to a one or
+ more bits set in the dimm-health-bitmap retrieved in
+ response to H_SCM_HEALTH hcall. The details of the bit
+ flags returned in response to this hcall is available
+ at 'Documentation/powerpc/papr_hcalls.rst' . Below are
+ the flags reported in this sysfs file:
+
+ * "not_armed" : Indicates that NVDIMM contents will not
+ survive a power cycle.
+ * "flush_fail" : Indicates that NVDIMM contents
+ couldn't be flushed during last
+ shut-down event.
+ * "restore_fail": Indicates that NVDIMM contents
+ couldn't be restored during NVDIMM
+ initialization.
+ * "encrypted" : NVDIMM contents are encrypted.
+ * "smart_notify": There is health event for the NVDIMM.
+ * "scrubbed" : Indicating that contents of the
+ NVDIMM have been scrubbed.
+ * "locked" : Indicating that NVDIMM contents cant
+ be modified until next power cycle.
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f35592423380..149431594839 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -12,6 +12,7 @@
#include <linux/libnvdimm.h>
#include <linux/platform_device.h>
#include <linux/delay.h>
+#include <linux/seq_buf.h>
#include <asm/plpar_wrappers.h>
@@ -22,6 +23,44 @@
(1ul << ND_CMD_GET_CONFIG_DATA) | \
(1ul << ND_CMD_SET_CONFIG_DATA))
+/* DIMM health bitmap indicators */
+/* SCM device is unable to persist memory contents */
+#define PAPR_PMEM_UNARMED (1ULL << (63 - 0))
+/* SCM device failed to persist memory contents */
+#define PAPR_PMEM_SHUTDOWN_DIRTY (1ULL << (63 - 1))
+/* SCM device contents are persisted from previous IPL */
+#define PAPR_PMEM_SHUTDOWN_CLEAN (1ULL << (63 - 2))
+/* SCM device contents are not persisted from previous IPL */
+#define PAPR_PMEM_EMPTY (1ULL << (63 - 3))
+/* SCM device memory life remaining is critically low */
+#define PAPR_PMEM_HEALTH_CRITICAL (1ULL << (63 - 4))
+/* SCM device will be garded off next IPL due to failure */
+#define PAPR_PMEM_HEALTH_FATAL (1ULL << (63 - 5))
+/* SCM contents cannot persist due to current platform health status */
+#define PAPR_PMEM_HEALTH_UNHEALTHY (1ULL << (63 - 6))
+/* SCM device is unable to persist memory contents in certain conditions */
+#define PAPR_PMEM_HEALTH_NON_CRITICAL (1ULL << (63 - 7))
+/* SCM device is encrypted */
+#define PAPR_PMEM_ENCRYPTED (1ULL << (63 - 8))
+/* SCM device has been scrubbed and locked */
+#define PAPR_PMEM_SCRUBBED_AND_LOCKED (1ULL << (63 - 9))
+
+/* Bits status indicators for health bitmap indicating unarmed dimm */
+#define PAPR_PMEM_UNARMED_MASK (PAPR_PMEM_UNARMED | \
+ PAPR_PMEM_HEALTH_UNHEALTHY)
+
+/* Bits status indicators for health bitmap indicating unflushed dimm */
+#define PAPR_PMEM_BAD_SHUTDOWN_MASK (PAPR_PMEM_SHUTDOWN_DIRTY)
+
+/* Bits status indicators for health bitmap indicating unrestored dimm */
+#define PAPR_PMEM_BAD_RESTORE_MASK (PAPR_PMEM_EMPTY)
+
+/* Bit status indicators for smart event notification */
+#define PAPR_PMEM_SMART_EVENT_MASK (PAPR_PMEM_HEALTH_CRITICAL | \
+ PAPR_PMEM_HEALTH_FATAL | \
+ PAPR_PMEM_HEALTH_UNHEALTHY)
+
+/* private struct associated with each region */
struct papr_scm_priv {
struct platform_device *pdev;
struct device_node *dn;
@@ -39,6 +78,15 @@ struct papr_scm_priv {
struct resource res;
struct nd_region *region;
struct nd_interleave_set nd_set;
+
+ /* Protect dimm health data from concurrent read/writes */
+ struct mutex health_mutex;
+
+ /* Last time the health information of the dimm was updated */
+ unsigned long lasthealth_jiffies;
+
+ /* Health information for the dimm */
+ u64 health_bitmap;
};
static int drc_pmem_bind(struct papr_scm_priv *p)
@@ -144,6 +192,62 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
return drc_pmem_bind(p);
}
+/*
+ * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
+ * health information.
+ */
+static int __drc_pmem_query_health(struct papr_scm_priv *p)
+{
+ unsigned long ret[PLPAR_HCALL_BUFSIZE];
+ long rc;
+
+ /* issue the hcall */
+ rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index);
+ if (rc != H_SUCCESS) {
+ dev_err(&p->pdev->dev,
+ "Failed to query health information, Err:%ld\n", rc);
+ rc = -ENXIO;
+ goto out;
+ }
+
+ p->lasthealth_jiffies = jiffies;
+ p->health_bitmap = ret[0] & ret[1];
+
+ dev_dbg(&p->pdev->dev,
+ "Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
+ ret[0], ret[1]);
+out:
+ return rc;
+}
+
+/* Min interval in seconds for assuming stable dimm health */
+#define MIN_HEALTH_QUERY_INTERVAL 60
+
+/* Query cached health info and if needed call drc_pmem_query_health */
+static int drc_pmem_query_health(struct papr_scm_priv *p)
+{
+ unsigned long cache_timeout;
+ int rc;
+
+ /* Protect concurrent modifications to papr_scm_priv */
+ rc = mutex_lock_interruptible(&p->health_mutex);
+ if (rc)
+ return rc;
+
+ /* Jiffies offset for which the health data is assumed to be same */
+ cache_timeout = p->lasthealth_jiffies +
+ msecs_to_jiffies(MIN_HEALTH_QUERY_INTERVAL * 1000);
+
+ /* Fetch new health info is its older than MIN_HEALTH_QUERY_INTERVAL */
+ if (time_after(jiffies, cache_timeout))
+ rc = __drc_pmem_query_health(p);
+ else
+ /* Assume cached health data is valid */
+ rc = 0;
+
+ mutex_unlock(&p->health_mutex);
+ return rc;
+}
static int papr_scm_meta_get(struct papr_scm_priv *p,
struct nd_cmd_get_config_data_hdr *hdr)
@@ -286,6 +390,64 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
return 0;
}
+static ssize_t flags_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvdimm *dimm = to_nvdimm(dev);
+ struct papr_scm_priv *p = nvdimm_provider_data(dimm);
+ struct seq_buf s;
+ u64 health;
+ int rc;
+
+ rc = drc_pmem_query_health(p);
+ if (rc)
+ return rc;
+
+ /* Copy health_bitmap locally, check masks & update out buffer */
+ health = READ_ONCE(p->health_bitmap);
+
+ seq_buf_init(&s, buf, PAGE_SIZE);
+ if (health & PAPR_PMEM_UNARMED_MASK)
+ seq_buf_printf(&s, "not_armed ");
+
+ if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
+ seq_buf_printf(&s, "flush_fail ");
+
+ if (health & PAPR_PMEM_BAD_RESTORE_MASK)
+ seq_buf_printf(&s, "restore_fail ");
+
+ if (health & PAPR_PMEM_ENCRYPTED)
+ seq_buf_printf(&s, "encrypted ");
+
+ if (health & PAPR_PMEM_SMART_EVENT_MASK)
+ seq_buf_printf(&s, "smart_notify ");
+
+ if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED)
+ seq_buf_printf(&s, "scrubbed locked ");
+
+ if (seq_buf_used(&s))
+ seq_buf_printf(&s, "\n");
+
+ return seq_buf_used(&s);
+}
+DEVICE_ATTR_RO(flags);
+
+/* papr_scm specific dimm attributes */
+static struct attribute *papr_nd_attributes[] = {
+ &dev_attr_flags.attr,
+ NULL,
+};
+
+static struct attribute_group papr_nd_attribute_group = {
+ .name = "papr",
+ .attrs = papr_nd_attributes,
+};
+
+static const struct attribute_group *papr_nd_attr_groups[] = {
+ &papr_nd_attribute_group,
+ NULL,
+};
+
static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
{
struct device *dev = &p->pdev->dev;
@@ -312,8 +474,8 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
dimm_flags = 0;
set_bit(NDD_LABELING, &dimm_flags);
- p->nvdimm = nvdimm_create(p->bus, p, NULL, dimm_flags,
- PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
+ p->nvdimm = nvdimm_create(p->bus, p, papr_nd_attr_groups,
+ dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
if (!p->nvdimm) {
dev_err(dev, "Error creating DIMM object for %pOF\n", p->dn);
goto err;
@@ -399,6 +561,9 @@ static int papr_scm_probe(struct platform_device *pdev)
if (!p)
return -ENOMEM;
+ /* Initialize the dimm mutex */
+ mutex_init(&p->health_mutex);
+
/* optional DT properties */
of_property_read_u32(dn, "ibm,metadata-size", &metadata_size);
--
2.26.2
'seq_buf' provides a very useful abstraction for writing to a string
buffer without needing to worry about it overflowing. However, even
though the API has been stable for a couple of years now, it's still
not exported to kernel loadable modules, limiting its usage.
Hence this patch proposes an update to 'seq_buf.c' to mark
seq_buf_printf(), which is part of the seq_buf API, to be exported to
kernel loadable GPL modules. This symbol will be used in later parts
of this patch-set to simplify content creation for a sysfs attribute.
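
For readers unfamiliar with the idea, the following user-space sketch
shows the kind of overflow-safe appending that seq_buf offers. It is
only an analogue built on vsnprintf(); 'struct bounded_buf', bb_init()
and bb_printf() are hypothetical names and do not mirror the kernel
implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical user-space analogue of a bounded string buffer */
struct bounded_buf {
	char *buf;	/* caller-provided storage */
	size_t size;	/* total size of the storage */
	size_t len;	/* bytes used so far, excluding the NUL */
	int overflow;	/* set once an append did not fit */
};

static void bb_init(struct bounded_buf *s, char *buf, size_t size)
{
	assert(size > 0);
	s->buf = buf;
	s->size = size;
	s->len = 0;
	s->overflow = 0;
	buf[0] = '\0';
}

/* Append formatted text; returns 0 on success, -1 if it did not fit */
static int bb_printf(struct bounded_buf *s, const char *fmt, ...)
{
	size_t room = s->size - s->len;
	va_list ap;
	int n;

	if (s->overflow)
		return -1;

	va_start(ap, fmt);
	n = vsnprintf(s->buf + s->len, room, fmt, ap);
	va_end(ap);

	if (n < 0 || (size_t)n >= room) {
		/* Drop the partial write and remember the overflow */
		s->buf[s->len] = '\0';
		s->overflow = 1;
		return -1;
	}

	s->len += (size_t)n;
	return 0;
}
```

The caller can keep appending without checking sizes at each step and
inspect the overflow state once at the end.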
Cc: Piotr Maziarz <[email protected]>
Cc: Cezary Rojewski <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Acked-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Vaibhav Jain <[email protected]>
---
Changelog:
Resend:
* Added ack from Steven Rostedt
v8..v9:
* None
v7..v8:
* Updated the patch title [ Christoph Hellwig ]
* Updated patch description to replace confusing term 'external kernel
modules' with 'kernel loadable modules'.
Resend:
* Added ack from Steven Rostedt
v6..v7:
* New patch in the series
---
lib/seq_buf.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/seq_buf.c b/lib/seq_buf.c
index 4e865d42ab03..707453f5d58e 100644
--- a/lib/seq_buf.c
+++ b/lib/seq_buf.c
@@ -91,6 +91,7 @@ int seq_buf_printf(struct seq_buf *s, const char *fmt, ...)
return ret;
}
+EXPORT_SYMBOL_GPL(seq_buf_printf);
#ifdef CONFIG_BINARY_PRINTF
/**
--
2.26.2
This patch implements support for PDSM request 'PAPR_PDSM_HEALTH'
that returns a newly introduced 'struct nd_papr_pdsm_health' instance
containing dimm health information back to user space in response to
ND_CMD_CALL. This functionality is implemented in newly introduced
papr_pdsm_health() that queries the nvdimm health information and
then copies this information to the package payload whose layout is
defined by 'struct nd_papr_pdsm_health'.
The patch also introduces a new member 'struct papr_scm_priv.health'
that's an instance of 'struct nd_papr_pdsm_health' to cache the health
information of an nvdimm. As a result, functions drc_pmem_query_health()
and flags_show() are updated to populate and use this new struct
instead of the u64 integer that was used earlier.
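
One detail worth spelling out is how the single 'dimm_health' value is
derived when several health bits are set at once. The sketch below
follows the order of checks used in this patch (unhealthy, then
critical, then fatal), so the most severe state wins. It is a
user-space illustration; PPC_BIT64() and dimm_severity() are
hypothetical names, while the bit positions and severity values are
taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* PAPR numbers bits from the most significant end: bit n is 1 << (63 - n) */
#define PPC_BIT64(n)		(1ULL << (63 - (n)))

#define PMEM_HEALTH_CRITICAL	PPC_BIT64(4)
#define PMEM_HEALTH_FATAL	PPC_BIT64(5)
#define PMEM_HEALTH_UNHEALTHY	PPC_BIT64(6)

/* Severity values as defined for PAPR_PDSM_DIMM_* in the uapi header */
enum {
	DIMM_HEALTHY = 0,
	DIMM_UNHEALTHY = 1,
	DIMM_CRITICAL = 2,
	DIMM_FATAL = 3,
};

/* Later checks override earlier ones, so severities must be ascending */
static_assert(DIMM_FATAL > DIMM_CRITICAL && DIMM_CRITICAL > DIMM_UNHEALTHY,
	      "severity values must increase with severity");

/* Hypothetical helper: collapse the health bits into one severity value */
static unsigned int dimm_severity(uint64_t health)
{
	unsigned int sev = DIMM_HEALTHY;

	if (health & PMEM_HEALTH_UNHEALTHY)
		sev = DIMM_UNHEALTHY;
	if (health & PMEM_HEALTH_CRITICAL)
		sev = DIMM_CRITICAL;
	if (health & PMEM_HEALTH_FATAL)
		sev = DIMM_FATAL;

	return sev;
}
```

Any non-zero result corresponds to the "smart_notify" flag reported by
flags_show() after this patch.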
Cc: "Aneesh Kumar K . V" <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Ira Weiny <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Vaibhav Jain <[email protected]>
---
Changelog:
Resend:
* Added ack from Aneesh.
v8..v9:
* s/PAPR_SCM_PDSM_HEALTH/PAPR_PDSM_HEALTH/g [ Dan , Aneesh ]
* s/PAPR_SCM_PSDM_DIMM_*/PAPR_PDSM_DIMM_*/g
* Renamed papr_scm_get_health() to papr_pdsm_health()
* Updated patch description to replace papr-scm dimm with nvdimm.
v7..v8:
* None
Resend:
* None
v6..v7:
* Updated flags_show() to use seq_buf_printf(). [Mpe]
* Updated papr_scm_get_health() to use newly introduced
__drc_pmem_query_health() bypassing the cache [Mpe].
v5..v6:
* Added attribute '__packed' to 'struct nd_papr_pdsm_health_v1' to
guard against the possibility of different compilers adding different
paddings to the struct [ Dan Williams ]
* Updated 'struct nd_papr_pdsm_health_v1' to use __u8 instead of
'bool' and also updated drc_pmem_query_health() to take this into
account. [ Dan Williams ]
v4..v5:
* None
v3..v4:
* Call the DSM_PAPR_SCM_HEALTH service function from
papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh]
v2..v3:
* Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx' types
as its exported to the userspace [Aneesh]
* Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm health
from enum to #defines [Aneesh]
v1..v2:
* New patch in the series
---
arch/powerpc/include/uapi/asm/papr_pdsm.h | 39 +++++++
arch/powerpc/platforms/pseries/papr_scm.c | 125 +++++++++++++++++++---
2 files changed, 147 insertions(+), 17 deletions(-)
diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
index 6407fefcc007..411725a91591 100644
--- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -115,6 +115,7 @@ struct nd_pdsm_cmd_pkg {
*/
enum papr_pdsm {
PAPR_PDSM_MIN = 0x0,
+ PAPR_PDSM_HEALTH,
PAPR_PDSM_MAX,
};
@@ -133,4 +134,42 @@ static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
return (void *)(pcmd->payload);
}
+/* Various nvdimm health indicators */
+#define PAPR_PDSM_DIMM_HEALTHY 0
+#define PAPR_PDSM_DIMM_UNHEALTHY 1
+#define PAPR_PDSM_DIMM_CRITICAL 2
+#define PAPR_PDSM_DIMM_FATAL 3
+
+/*
+ * Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH
+ * Various flags indicate the health status of the dimm.
+ *
+ * dimm_unarmed : Dimm not armed. So contents wont persist.
+ * dimm_bad_shutdown : Previous shutdown did not persist contents.
+ * dimm_bad_restore : Contents from previous shutdown werent restored.
+ * dimm_scrubbed : Contents of the dimm have been scrubbed.
+ * dimm_locked : Contents of the dimm cant be modified until CEC reboot
+ * dimm_encrypted : Contents of dimm are encrypted.
+ * dimm_health : Dimm health indicator. One of PAPR_PDSM_DIMM_XXXX
+ */
+struct nd_papr_pdsm_health_v1 {
+ __u8 dimm_unarmed;
+ __u8 dimm_bad_shutdown;
+ __u8 dimm_bad_restore;
+ __u8 dimm_scrubbed;
+ __u8 dimm_locked;
+ __u8 dimm_encrypted;
+ __u16 dimm_health;
+} __packed;
+
+/*
+ * Typedef the current struct for dimm_health so that any application
+ * or kernel recompiled after introducing a new version automatically
+ * supports the new version.
+ */
+#define nd_papr_pdsm_health nd_papr_pdsm_health_v1
+
+/* Current version number for the dimm health struct */
+#define ND_PAPR_PDSM_HEALTH_VERSION 1
+
#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 5e2237e7ec08..c0606c0c659c 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -88,7 +88,7 @@ struct papr_scm_priv {
unsigned long lasthealth_jiffies;
/* Health information for the dimm */
- u64 health_bitmap;
+ struct nd_papr_pdsm_health health;
};
static int drc_pmem_bind(struct papr_scm_priv *p)
@@ -201,6 +201,7 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
static int __drc_pmem_query_health(struct papr_scm_priv *p)
{
unsigned long ret[PLPAR_HCALL_BUFSIZE];
+ u64 health;
long rc;
/* issue the hcall */
@@ -208,18 +209,46 @@ static int __drc_pmem_query_health(struct papr_scm_priv *p)
if (rc != H_SUCCESS) {
dev_err(&p->pdev->dev,
"Failed to query health information, Err:%ld\n", rc);
- rc = -ENXIO;
- goto out;
+ return -ENXIO;
}
p->lasthealth_jiffies = jiffies;
- p->health_bitmap = ret[0] & ret[1];
+ health = ret[0] & ret[1];
dev_dbg(&p->pdev->dev,
"Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
ret[0], ret[1]);
-out:
- return rc;
+
+ memset(&p->health, 0, sizeof(p->health));
+
+ /* Check for various masks in bitmap and set the buffer */
+ if (health & PAPR_PMEM_UNARMED_MASK)
+ p->health.dimm_unarmed = 1;
+
+ if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
+ p->health.dimm_bad_shutdown = 1;
+
+ if (health & PAPR_PMEM_BAD_RESTORE_MASK)
+ p->health.dimm_bad_restore = 1;
+
+ if (health & PAPR_PMEM_ENCRYPTED)
+ p->health.dimm_encrypted = 1;
+
+ if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED) {
+ p->health.dimm_locked = 1;
+ p->health.dimm_scrubbed = 1;
+ }
+
+ if (health & PAPR_PMEM_HEALTH_UNHEALTHY)
+ p->health.dimm_health = PAPR_PDSM_DIMM_UNHEALTHY;
+
+ if (health & PAPR_PMEM_HEALTH_CRITICAL)
+ p->health.dimm_health = PAPR_PDSM_DIMM_CRITICAL;
+
+ if (health & PAPR_PMEM_HEALTH_FATAL)
+ p->health.dimm_health = PAPR_PDSM_DIMM_FATAL;
+
+ return 0;
}
/* Min interval in seconds for assuming stable dimm health */
@@ -403,6 +432,58 @@ static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
return 0;
}
+/* Fetch the DIMM health info and populate it in provided package. */
+static int papr_pdsm_health(struct papr_scm_priv *p,
+ struct nd_pdsm_cmd_pkg *pkg)
+{
+ int rc;
+ size_t copysize = sizeof(p->health);
+
+ /* Ensure dimm health mutex is taken preventing concurrent access */
+ rc = mutex_lock_interruptible(&p->health_mutex);
+ if (rc)
+ goto out;
+
+ /* Always fetch upto date dimm health data ignoring cached values */
+ rc = __drc_pmem_query_health(p);
+ if (rc)
+ goto out_unlock;
+ /*
+ * If the requested payload version is greater than one we know
+ * about, return the payload version we know about and let
+ * caller/userspace handle.
+ */
+ if (pkg->payload_version > ND_PAPR_PDSM_HEALTH_VERSION)
+ pkg->payload_version = ND_PAPR_PDSM_HEALTH_VERSION;
+
+ if (pkg->hdr.nd_size_out < copysize) {
+ dev_dbg(&p->pdev->dev, "Truncated payload (%u). Expected (%lu)",
+ pkg->hdr.nd_size_out, copysize);
+ rc = -ENOSPC;
+ goto out_unlock;
+ }
+
+ dev_dbg(&p->pdev->dev, "Copying payload size=%lu version=0x%x\n",
+ copysize, pkg->payload_version);
+
+ /* Copy the health struct to the payload */
+ memcpy(pdsm_cmd_to_payload(pkg), &p->health, copysize);
+ pkg->hdr.nd_fw_size = copysize;
+
+out_unlock:
+ mutex_unlock(&p->health_mutex);
+
+out:
+ /*
+ * Put the error in out package and return success from function
+ * so that errors if any are propogated back to userspace.
+ */
+ pkg->cmd_status = rc;
+ dev_dbg(&p->pdev->dev, "completion code = %d\n", rc);
+
+ return 0;
+}
+
static int papr_scm_service_pdsm(struct papr_scm_priv *p,
struct nd_pdsm_cmd_pkg *call_pkg)
{
@@ -417,6 +498,9 @@ static int papr_scm_service_pdsm(struct papr_scm_priv *p,
/* Depending on the DSM command call appropriate service routine */
switch (call_pkg->hdr.nd_command) {
+ case PAPR_PDSM_HEALTH:
+ return papr_pdsm_health(p, call_pkg);
+
default:
dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
call_pkg->hdr.nd_command);
@@ -485,34 +569,41 @@ static ssize_t flags_show(struct device *dev,
struct nvdimm *dimm = to_nvdimm(dev);
struct papr_scm_priv *p = nvdimm_provider_data(dimm);
struct seq_buf s;
- u64 health;
int rc;
rc = drc_pmem_query_health(p);
if (rc)
return rc;
- /* Copy health_bitmap locally, check masks & update out buffer */
- health = READ_ONCE(p->health_bitmap);
-
seq_buf_init(&s, buf, PAGE_SIZE);
- if (health & PAPR_PMEM_UNARMED_MASK)
+
+ /* Protect concurrent modifications to papr_scm_priv */
+ rc = mutex_lock_interruptible(&p->health_mutex);
+ if (rc)
+ return rc;
+
+ if (p->health.dimm_unarmed)
seq_buf_printf(&s, "not_armed ");
- if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
+ if (p->health.dimm_bad_shutdown)
seq_buf_printf(&s, "flush_fail ");
- if (health & PAPR_PMEM_BAD_RESTORE_MASK)
+ if (p->health.dimm_bad_restore)
seq_buf_printf(&s, "restore_fail ");
- if (health & PAPR_PMEM_ENCRYPTED)
+ if (p->health.dimm_encrypted)
seq_buf_printf(&s, "encrypted ");
- if (health & PAPR_PMEM_SMART_EVENT_MASK)
+ if (p->health.dimm_health)
seq_buf_printf(&s, "smart_notify ");
- if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED)
- seq_buf_printf(&s, "scrubbed locked ");
+ if (p->health.dimm_scrubbed)
+ seq_buf_printf(&s, "scrubbed ");
+
+ if (p->health.dimm_locked)
+ seq_buf_printf(&s, "locked ");
+
+ mutex_unlock(&p->health_mutex);
if (seq_buf_used(&s))
seq_buf_printf(&s, "\n");
--
2.26.2
Introduce support for PAPR NVDIMM Specific Methods (PDSM) in the
papr_scm module and add the command family NVDIMM_FAMILY_PAPR to the
white list of NVDIMM command sets. Also advertise support for
ND_CMD_CALL in the nvdimm command mask and implement the necessary
scaffolding in the module to handle the ND_CMD_CALL ioctl and the PDSM
requests that we receive.
The layout of the PDSM request as we expect from libnvdimm/libndctl is
described in the newly introduced uapi header 'papr_pdsm.h', which
defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used
to communicate the PDSM request via member
'nd_cmd_pkg.nd_command' and the size of the payload that needs to be
sent/received for servicing the PDSM.
A new function is_cmd_valid() is implemented that reads the args to
papr_scm_ndctl() and performs sanity tests on them. A new function
papr_scm_service_pdsm() is introduced and is called from
papr_scm_ndctl() when a PDSM request is received via the ND_CMD_CALL
command from libnvdimm.
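
The envelope arithmetic and the version negotiation described in
'papr_pdsm.h' can be checked with a small user-space sketch. The
structs below are hypothetical mirrors written with stdint types; the
field order inside the 16-byte PDSM part is an assumption based on the
header comment (4-byte status, 10 reserved bytes, 2-byte version), and
the packed attribute is the GCC/Clang spelling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the 64-byte generic 'struct nd_cmd_pkg' header */
struct cmd_pkg {
	uint64_t nd_family;
	uint64_t nd_command;
	uint32_t nd_size_in;
	uint32_t nd_size_out;
	uint32_t nd_reserved2[9];
	uint32_t nd_fw_size;
} __attribute__((packed));

/* Hypothetical mirror of the PDSM header; field order is an assumption */
struct pdsm_cmd_pkg {
	struct cmd_pkg hdr;
	int32_t cmd_status;		/* Out: status of the PDSM */
	uint16_t reserved[5];		/* 10 bytes kept for future use */
	uint16_t payload_version;	/* In/Out: payload struct version */
	uint8_t payload[];		/* In/Out: PDSM payload */
} __attribute__((packed));

/* Hard limit on the envelope size quoted in the header comment */
#define ENVELOPE_MAX 256

static_assert(sizeof(struct cmd_pkg) == 64, "generic header is 64 bytes");
static_assert(offsetof(struct pdsm_cmd_pkg, payload) == 80,
	      "PDSM header adds 16 bytes");
static_assert(offsetof(struct pdsm_cmd_pkg, payload) % 8 == 0,
	      "payload starts at an 8-byte boundary");

/* Service the PDSM with the lower of the requested and supported versions */
static uint16_t negotiate_version(uint16_t requested, uint16_t supported)
{
	return requested < supported ? requested : supported;
}
```

Under these assumptions the payload may use at most 256 - 80 = 176
bytes, matching the figure in the header comment.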
Cc: "Aneesh Kumar K . V" <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Ira Weiny <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Vaibhav Jain <[email protected]>
---
Changelog:
Resend:
* Added ack from Aneesh.
v8..v9:
* Reduced the usage of term SCM replacing it with appropriate
replacement [ Dan Williams, Aneesh ]
* Renamed 'papr_scm_pdsm.h' to 'papr_pdsm.h'
* s/PAPR_SCM_PDSM_*/PAPR_PDSM_*/g
* s/NVDIMM_FAMILY_PAPR_SCM/NVDIMM_FAMILY_PAPR/g
* Minor updates to 'papr_pdsm.h' to replace usage of term 'SCM'.
* Minor update to patch description.
v7..v8:
* Removed the 'payload_offset' field from 'struct
nd_pdsm_cmd_pkg'. Instead command payload is always assumed to start
at 'nd_pdsm_cmd_pkg.payload'. [ Aneesh ]
* To enable introducing new fields to 'struct nd_pdsm_cmd_pkg',
'reserved' field of 10-bytes is introduced. [ Aneesh ]
* Fixed a typo in "Backward Compatibility" section of papr_scm_pdsm.h
[ Ira ]
Resend:
* None
v6..v7 :
* Removed the re-definitions of __packed macro from papr_scm_pdsm.h
[Mpe].
* Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe].
* Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h
[Mpe].
* Made functions defined in papr_scm_pdsm.h as static inline. [Mpe]
v5..v6 :
* Changed the usage of the term DSM to PDSM to distinguish it from the
ACPI term [ Dan Williams ]
* Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct
to reflect the new terminology.
* Updated the patch description and title to reflect the new terminology.
* Squashed patch to introduce new command family in 'ndctl.h' with
this patch [ Dan Williams ]
* Updated the papr_scm_pdsm method starting index from 0x10000 to 0x0
[ Dan Williams ]
* Removed redundant license text from the papr_scm_pdsm.h file.
[ Dan Williams ]
* s/envelop/envelope/ at various places [ Dan Williams ]
* Added '__packed' attribute to command package header to guard
against different compilers adding paddings between the fields.
[ Dan Williams]
* Converted various pr_debug to dev_dbg [ Dan Williams ]
v4..v5 :
* None
v3..v4 :
* None
v2..v3 :
* Updated the patch prefix to 'ndctl/uapi' [Aneesh]
v1..v2 :
* None
---
arch/powerpc/include/uapi/asm/papr_pdsm.h | 136 ++++++++++++++++++++++
arch/powerpc/platforms/pseries/papr_scm.c | 101 +++++++++++++++-
include/uapi/linux/ndctl.h | 1 +
3 files changed, 232 insertions(+), 6 deletions(-)
create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
new file mode 100644
index 000000000000..6407fefcc007
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
@@ -0,0 +1,136 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/*
+ * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl
+ *
+ * (C) Copyright IBM 2020
+ *
+ * Author: Vaibhav Jain <vaibhav at linux.ibm.com>
+ */
+
+#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_
+#define _UAPI_ASM_POWERPC_PAPR_PDSM_H_
+
+#include <linux/types.h>
+
+/*
+ * PDSM Envelope:
+ *
+ * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
+ * envelope which consists of a header and user-defined payload sections.
+ * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
+ * payload following it and accessible via 'nd_pdsm_cmd_pkg.payload' field.
+ * There is reserved field that can used to introduce new fields to the
+ * structure in future. It also tries to ensure that 'nd_pdsm_cmd_pkg.payload'
+ * lies at a 8-byte boundary.
+ *
+ * +-------------+---------------------+---------------------------+
+ * | 64-Bytes | 16-Bytes | Max 176-Bytes |
+ * +-------------+---------------------+---------------------------+
+ * | nd_pdsm_cmd_pkg | |
+ * |-------------+ | |
+ * | nd_cmd_pkg | | |
+ * +-------------+---------------------+---------------------------+
+ * | nd_family | | |
+ * | nd_size_out | cmd_status | |
+ * | nd_size_in | payload_version | payload |
+ * | nd_command | reserved | |
+ * | nd_fw_size | | |
+ * +-------------+---------------------+---------------------------+
+ *
+ * PDSM Header:
+ *
+ * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a
+ * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to member
+ * 'nd_cmd_pkg.nd_command'. Apart from size information of the envelope which is
+ * contained in 'struct nd_cmd_pkg', the header also has members following
+ * members:
+ *
+ * 'cmd_status' : (Out) Errors if any encountered while servicing PDSM.
+ * 'payload_version' : (In/Out) Version number associated with the payload.
+ * 'reserved' : Not used and reserved for future.
+ *
+ * PDSM Payload:
+ *
+ * The layout of the PDSM Payload is defined by various structs shared between
+ * papr_scm and libndctl so that contents of payload can be interpreted. During
+ * servicing of a PDSM the papr_scm module will read input args from the payload
+ * field by casting its contents to an appropriate struct pointer based on the
+ * PDSM command. Similarly the output of servicing the PDSM command will be
+ * copied to the payload field using the same struct.
+ *
+ * 'libnvdimm' enforces a hard limit of 256 bytes on the envelope size, which
+ * leaves around 176 bytes for the envelope payload (ignoring any padding that
+ * the compiler may silently introduce).
+ *
+ * Payload Version:
+ *
+ * A 'payload_version' field is present in PDSM header that indicates a specific
+ * version of the structure present in PDSM Payload for a given PDSM command.
+ * This provides backward compatibility in case the PDSM Payload structure
+ * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
+ *
+ * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
+ * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
+ * module when servicing the PDSM envelope checks the 'payload_version' and then
+ * uses 'payload struct version' == MIN('payload_version field',
+ * 'max payload-struct-version supported by papr_scm') to service the PDSM.
+ * After servicing the PDSM, 'papr_scm' puts the negotiated version of payload
+ * struct in returned 'payload_version' field.
+ *
+ * Libndctl on receiving the envelope back from papr_scm again checks the
+ * 'payload_version' field and based on it uses the appropriate version of the
+ * PDSM struct to parse the results.
+ *
+ * Backward Compatibility:
+ *
+ * The above scheme of exchanging differently versioned PDSM structs between
+ * libndctl and papr_scm should provide backward compatibility as long as the
+ * following two assumptions/conditions hold when defining new PDSM structs:
+ *
+ * Let T(X) = { set of attributes in PDSM struct 'T' versioned X }
+ *
+ * 1. T(X) is a proper subset of T(Y) if Y > X.
+ * i.e Each new version of PDSM struct should retain existing struct
+ * attributes from previous version
+ *
+ * 2. If an entity (libndctl or papr_scm) supports a PDSM struct T(X) then
+ * it should also support T(1), T(2)...T(X - 1).
+ * i.e When adding support for new version of a PDSM struct, libndctl
+ * and papr_scm should retain support of the existing PDSM struct
+ * version they support.
+ */
+
+/* PDSM-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
+struct nd_pdsm_cmd_pkg {
+ struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
+ __s32 cmd_status; /* Out: Sub-cmd status returned back */
+ __u16 reserved[5]; /* Ignored and to be used in future */
+ __u16 payload_version; /* In/Out: version of the payload */
+ __u8 payload[]; /* In/Out: Sub-cmd data buffer */
+} __packed;
+
+/*
+ * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
+ * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
+ */
+enum papr_pdsm {
+ PAPR_PDSM_MIN = 0x0,
+ PAPR_PDSM_MAX,
+};
+
+/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
+static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
+{
+ return (struct nd_pdsm_cmd_pkg *) cmd;
+}
+
+/* Return the payload pointer for a given pcmd */
+static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
+{
+ if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
+ return NULL;
+ else
+ return (void *)(pcmd->payload);
+}
+
+#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 149431594839..5e2237e7ec08 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -15,13 +15,15 @@
#include <linux/seq_buf.h>
#include <asm/plpar_wrappers.h>
+#include <asm/papr_pdsm.h>
#define BIND_ANY_ADDR (~0ul)
#define PAPR_SCM_DIMM_CMD_MASK \
((1ul << ND_CMD_GET_CONFIG_SIZE) | \
(1ul << ND_CMD_GET_CONFIG_DATA) | \
- (1ul << ND_CMD_SET_CONFIG_DATA))
+ (1ul << ND_CMD_SET_CONFIG_DATA) | \
+ (1ul << ND_CMD_CALL))
/* DIMM health bitmap bitmap indicators */
/* SCM device is unable to persist memory contents */
@@ -350,16 +352,97 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
return 0;
}
+/*
+ * Validate the input args to dimm-control function and return '0' if valid.
+ * This also does initial sanity validation to ND_CMD_CALL sub-command packages.
+ */
+static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
+ unsigned int buf_len)
+{
+ unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
+ struct nd_pdsm_cmd_pkg *pkg = nd_to_pdsm_cmd_pkg(buf);
+ struct papr_scm_priv *p;
+
+ /* Only dimm-specific calls are supported atm */
+ if (!nvdimm)
+ return -EINVAL;
+
+ /* get the provider date from struct nvdimm */
+ p = nvdimm_provider_data(nvdimm);
+
+ if (!test_bit(cmd, &cmd_mask)) {
+ dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
+ return -EINVAL;
+ } else if (cmd == ND_CMD_CALL) {
+
+ /* Verify the envelope package */
+ if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
+ dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
+ buf_len);
+ return -EINVAL;
+ }
+
+ /* Verify that the PDSM family is valid */
+ if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR) {
+ dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
+ pkg->hdr.nd_family);
+ return -EINVAL;
+
+ }
+
+ /* We expect a payload with all PDSM commands */
+ if (pdsm_cmd_to_payload(pkg) == NULL) {
+ dev_dbg(&p->pdev->dev,
+ "Empty payload for sub-command=0x%llx\n",
+ pkg->hdr.nd_command);
+ return -EINVAL;
+ }
+ }
+
+ /* Command looks valid */
+ return 0;
+}
+
+static int papr_scm_service_pdsm(struct papr_scm_priv *p,
+ struct nd_pdsm_cmd_pkg *call_pkg)
+{
+ /* unknown subcommands return error in packages */
+ if (call_pkg->hdr.nd_command <= PAPR_PDSM_MIN ||
+ call_pkg->hdr.nd_command >= PAPR_PDSM_MAX) {
+ dev_dbg(&p->pdev->dev, "Invalid PDSM request 0x%llx\n",
+ call_pkg->hdr.nd_command);
+ call_pkg->cmd_status = -EINVAL;
+ return 0;
+ }
+
+ /* Depending on the DSM command call appropriate service routine */
+ switch (call_pkg->hdr.nd_command) {
+ default:
+ dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
+ call_pkg->hdr.nd_command);
+ call_pkg->cmd_status = -ENOENT;
+ return 0;
+ }
+}
+
static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
struct nvdimm *nvdimm, unsigned int cmd, void *buf,
unsigned int buf_len, int *cmd_rc)
{
struct nd_cmd_get_config_size *get_size_hdr;
struct papr_scm_priv *p;
+ struct nd_pdsm_cmd_pkg *call_pkg = NULL;
+ int rc;
- /* Only dimm-specific calls are supported atm */
- if (!nvdimm)
- return -EINVAL;
+ /* Use a local variable in case cmd_rc pointer is NULL */
+ if (cmd_rc == NULL)
+ cmd_rc = &rc;
+
+ *cmd_rc = is_cmd_valid(nvdimm, cmd, buf, buf_len);
+ if (*cmd_rc) {
+ pr_debug("Invalid cmd=0x%x. Err=%d\n", cmd, *cmd_rc);
+ return *cmd_rc;
+ }
p = nvdimm_provider_data(nvdimm);
@@ -381,13 +464,19 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
*cmd_rc = papr_scm_meta_set(p, buf);
break;
+ case ND_CMD_CALL:
+ call_pkg = nd_to_pdsm_cmd_pkg(buf);
+ *cmd_rc = papr_scm_service_pdsm(p, call_pkg);
+ break;
+
default:
- return -EINVAL;
+ dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
+ *cmd_rc = -EINVAL;
}
dev_dbg(&p->pdev->dev, "returned with cmd_rc = %d\n", *cmd_rc);
- return 0;
+ return *cmd_rc;
}
static ssize_t flags_show(struct device *dev,
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index de5d90212409..0e09dc5cec19 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -244,6 +244,7 @@ struct nd_cmd_pkg {
#define NVDIMM_FAMILY_HPE2 2
#define NVDIMM_FAMILY_MSFT 3
#define NVDIMM_FAMILY_HYPERV 4
+#define NVDIMM_FAMILY_PAPR 5
#define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
struct nd_cmd_pkg)
--
2.26.2
On Tue, Jun 02, 2020 at 03:44:36PM +0530, Vaibhav Jain wrote:
> Implement support for fetching nvdimm health information via
> H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
> of 64-bit bitmaps, the bitwise-and of which is then stored in
> 'struct papr_scm_priv' and subsequently partially exposed to
> user-space via newly introduced dimm specific attribute
> 'papr/flags'. Since the hcall is costly, the health information is
> cached and only re-queried, 60s after the previous successful hcall.
>
> The patch also adds documentation text describing the flags reported by
> the new sysfs attribute 'papr/flags' at
> Documentation/ABI/testing/sysfs-bus-papr-pmem.
>
> [1] commit 58b278f568f0 ("powerpc: Provide initial documentation for
> PAPR hcalls")
>
> Cc: "Aneesh Kumar K . V" <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Reviewed-by: Aneesh Kumar K.V <[email protected]>
> Signed-off-by: Vaibhav Jain <[email protected]>
> ---
> Changelog:
>
> Resend:
> * Added ack from Aneesh.
>
> v8..v9:
> * Rename some variables and defines to reduce usage of term SCM
> replacing it with PMEM [Dan Williams, Aneesh]
> * s/PAPR_SCM_DIMM/PAPR_PMEM/g
> * s/papr_scm_nd_attributes/papr_nd_attributes/g
> * s/papr_scm_nd_attribute_group/papr_nd_attribute_group/g
> * s/papr_scm_dimm_attr_groups/papr_nd_attribute_groups/g
> * Renamed file sysfs-bus-papr-scm to sysfs-bus-papr-pmem
>
> v7..v8:
> * Update type of variable 'rc' in __drc_pmem_query_health() and
> drc_pmem_query_health() to long and int respectively. [ Ira ]
> * Updated the patch description to s/64 bit Big Endian Number/64-bit
> bitmap/ [ Ira, Aneesh ].
>
> Resend:
> * None
>
> v6..v7 :
> * Used the exported buf_seq_printf() function to generate content for
> 'papr/flags'
> * Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c
> and removed the papr_scm.h file [Mpe]
> * Fixed some minor consistency issues in sysfs-bus-papr-scm
> documentation. [Mpe]
> * s/dimm_mutex/health_mutex/g [Mpe]
> * Split drc_pmem_query_health() into two functions, one of which takes
> care of caching and locking. [Mpe]
> * Fixed a local copy creation of dimm health information using
> READ_ONCE(). [Mpe]
>
> v5..v6 :
> * Change the flags sysfs attribute from 'papr_flags' to 'papr/flags'
> [Dan Williams]
> * Include documentation for 'papr/flags' attr [Dan Williams]
> * Change flag 'save_fail' to 'flush_fail' [Dan Williams]
> * Caching of health bitmap to reduce expensive hcalls [Dan Williams]
> * Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe]
> * Replaced two __be64 integers from papr_scm_priv to a single u64
> integer [Mpe]
> * Updated patch description to reflect the changes made in this
> version.
> * Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from
> flags_show() [Dan Williams]
>
> v4..v5 :
> * None
>
> v3..v4 :
> * None
>
> v2..v3 :
> * Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for
> NVDIMM unarmed [Aneesh]
>
> v1..v2 :
> * New patch in the series.
> ---
> Documentation/ABI/testing/sysfs-bus-papr-pmem | 27 +++
> arch/powerpc/platforms/pseries/papr_scm.c | 169 +++++++++++++++++-
> 2 files changed, 194 insertions(+), 2 deletions(-)
> create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-pmem
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem
> new file mode 100644
> index 000000000000..5b10d036a8d4
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
> @@ -0,0 +1,27 @@
> +What: /sys/bus/nd/devices/nmemX/papr/flags
> +Date: Apr, 2020
> +KernelVersion: v5.8
> +Contact: linuxppc-dev <[email protected]>, [email protected],
> +Description:
> + (RO) Report flags indicating various states of a
> + papr-pmem NVDIMM device. Each flag maps to one or
> + more bits set in the dimm-health-bitmap retrieved in
> + response to H_SCM_HEALTH hcall. The details of the bit
> + flags returned in response to this hcall are available
> + at 'Documentation/powerpc/papr_hcalls.rst'. Below are
> + the flags reported in this sysfs file:
> +
> + * "not_armed" : Indicates that NVDIMM contents will not
> + survive a power cycle.
> + * "flush_fail" : Indicates that NVDIMM contents
> + couldn't be flushed during last
> + shut-down event.
> + * "restore_fail": Indicates that NVDIMM contents
> + couldn't be restored during NVDIMM
> + initialization.
> + * "encrypted" : NVDIMM contents are encrypted.
> + * "smart_notify": There is a health event for the NVDIMM.
> + * "scrubbed" : Indicating that contents of the
> + NVDIMM have been scrubbed.
> + * "locked" : Indicating that NVDIMM contents can't
> + be modified until next power cycle.
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index f35592423380..149431594839 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -12,6 +12,7 @@
> #include <linux/libnvdimm.h>
> #include <linux/platform_device.h>
> #include <linux/delay.h>
> +#include <linux/seq_buf.h>
>
> #include <asm/plpar_wrappers.h>
>
> @@ -22,6 +23,44 @@
> (1ul << ND_CMD_GET_CONFIG_DATA) | \
> (1ul << ND_CMD_SET_CONFIG_DATA))
>
> +/* DIMM health bitmap bitmap indicators */
> +/* SCM device is unable to persist memory contents */
> +#define PAPR_PMEM_UNARMED (1ULL << (63 - 0))
> +/* SCM device failed to persist memory contents */
> +#define PAPR_PMEM_SHUTDOWN_DIRTY (1ULL << (63 - 1))
> +/* SCM device contents are persisted from previous IPL */
> +#define PAPR_PMEM_SHUTDOWN_CLEAN (1ULL << (63 - 2))
> +/* SCM device contents are not persisted from previous IPL */
> +#define PAPR_PMEM_EMPTY (1ULL << (63 - 3))
> +/* SCM device memory life remaining is critically low */
> +#define PAPR_PMEM_HEALTH_CRITICAL (1ULL << (63 - 4))
> +/* SCM device will be garded off next IPL due to failure */
> +#define PAPR_PMEM_HEALTH_FATAL (1ULL << (63 - 5))
> +/* SCM contents cannot persist due to current platform health status */
> +#define PAPR_PMEM_HEALTH_UNHEALTHY (1ULL << (63 - 6))
> +/* SCM device is unable to persist memory contents in certain conditions */
> +#define PAPR_PMEM_HEALTH_NON_CRITICAL (1ULL << (63 - 7))
> +/* SCM device is encrypted */
> +#define PAPR_PMEM_ENCRYPTED (1ULL << (63 - 8))
> +/* SCM device has been scrubbed and locked */
> +#define PAPR_PMEM_SCRUBBED_AND_LOCKED (1ULL << (63 - 9))
> +
> +/* Bits status indicators for health bitmap indicating unarmed dimm */
> +#define PAPR_PMEM_UNARMED_MASK (PAPR_PMEM_UNARMED | \
> + PAPR_PMEM_HEALTH_UNHEALTHY)
> +
> +/* Bits status indicators for health bitmap indicating unflushed dimm */
> +#define PAPR_PMEM_BAD_SHUTDOWN_MASK (PAPR_PMEM_SHUTDOWN_DIRTY)
> +
> +/* Bits status indicators for health bitmap indicating unrestored dimm */
> +#define PAPR_PMEM_BAD_RESTORE_MASK (PAPR_PMEM_EMPTY)
> +
> +/* Bit status indicators for smart event notification */
> +#define PAPR_PMEM_SMART_EVENT_MASK (PAPR_PMEM_HEALTH_CRITICAL | \
> + PAPR_PMEM_HEALTH_FATAL | \
> + PAPR_PMEM_HEALTH_UNHEALTHY)
> +
> +/* private struct associated with each region */
> struct papr_scm_priv {
> struct platform_device *pdev;
> struct device_node *dn;
> @@ -39,6 +78,15 @@ struct papr_scm_priv {
> struct resource res;
> struct nd_region *region;
> struct nd_interleave_set nd_set;
> +
> + /* Protect dimm health data from concurrent read/writes */
> + struct mutex health_mutex;
I question if this really needs protection. But I don't think it hurts.
Reviewed-by: Ira Weiny <[email protected]>
> +
> + /* Last time the health information of the dimm was updated */
> + unsigned long lasthealth_jiffies;
> +
> + /* Health information for the dimm */
> + u64 health_bitmap;
> };
>
> static int drc_pmem_bind(struct papr_scm_priv *p)
> @@ -144,6 +192,62 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
> return drc_pmem_bind(p);
> }
>
> +/*
> + * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
> + * health information.
> + */
> +static int __drc_pmem_query_health(struct papr_scm_priv *p)
> +{
> + unsigned long ret[PLPAR_HCALL_BUFSIZE];
> + long rc;
> +
> + /* issue the hcall */
> + rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index);
> + if (rc != H_SUCCESS) {
> + dev_err(&p->pdev->dev,
> + "Failed to query health information, Err:%ld\n", rc);
> + rc = -ENXIO;
> + goto out;
> + }
> +
> + p->lasthealth_jiffies = jiffies;
> + p->health_bitmap = ret[0] & ret[1];
> +
> + dev_dbg(&p->pdev->dev,
> + "Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
> + ret[0], ret[1]);
> +out:
> + return rc;
> +}
> +
> +/* Min interval in seconds for assuming stable dimm health */
> +#define MIN_HEALTH_QUERY_INTERVAL 60
> +
> +/* Query cached health info and if needed call drc_pmem_query_health */
> +static int drc_pmem_query_health(struct papr_scm_priv *p)
> +{
> + unsigned long cache_timeout;
> + int rc;
> +
> + /* Protect concurrent modifications to papr_scm_priv */
> + rc = mutex_lock_interruptible(&p->health_mutex);
> + if (rc)
> + return rc;
> +
> + /* Jiffies offset for which the health data is assumed to be same */
> + cache_timeout = p->lasthealth_jiffies +
> + msecs_to_jiffies(MIN_HEALTH_QUERY_INTERVAL * 1000);
> +
> + /* Fetch new health info if it's older than MIN_HEALTH_QUERY_INTERVAL */
> + if (time_after(jiffies, cache_timeout))
> + rc = __drc_pmem_query_health(p);
> + else
> + /* Assume cached health data is valid */
> + rc = 0;
> +
> + mutex_unlock(&p->health_mutex);
> + return rc;
> +}
>
> static int papr_scm_meta_get(struct papr_scm_priv *p,
> struct nd_cmd_get_config_data_hdr *hdr)
> @@ -286,6 +390,64 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
> return 0;
> }
>
> +static ssize_t flags_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct nvdimm *dimm = to_nvdimm(dev);
> + struct papr_scm_priv *p = nvdimm_provider_data(dimm);
> + struct seq_buf s;
> + u64 health;
> + int rc;
> +
> + rc = drc_pmem_query_health(p);
> + if (rc)
> + return rc;
> +
> + /* Copy health_bitmap locally, check masks & update out buffer */
> + health = READ_ONCE(p->health_bitmap);
> +
> + seq_buf_init(&s, buf, PAGE_SIZE);
> + if (health & PAPR_PMEM_UNARMED_MASK)
> + seq_buf_printf(&s, "not_armed ");
> +
> + if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
> + seq_buf_printf(&s, "flush_fail ");
> +
> + if (health & PAPR_PMEM_BAD_RESTORE_MASK)
> + seq_buf_printf(&s, "restore_fail ");
> +
> + if (health & PAPR_PMEM_ENCRYPTED)
> + seq_buf_printf(&s, "encrypted ");
> +
> + if (health & PAPR_PMEM_SMART_EVENT_MASK)
> + seq_buf_printf(&s, "smart_notify ");
> +
> + if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED)
> + seq_buf_printf(&s, "scrubbed locked ");
> +
> + if (seq_buf_used(&s))
> + seq_buf_printf(&s, "\n");
> +
> + return seq_buf_used(&s);
> +}
> +DEVICE_ATTR_RO(flags);
> +
> +/* papr_scm specific dimm attributes */
> +static struct attribute *papr_nd_attributes[] = {
> + &dev_attr_flags.attr,
> + NULL,
> +};
> +
> +static struct attribute_group papr_nd_attribute_group = {
> + .name = "papr",
> + .attrs = papr_nd_attributes,
> +};
> +
> +static const struct attribute_group *papr_nd_attr_groups[] = {
> + &papr_nd_attribute_group,
> + NULL,
> +};
> +
> static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
> {
> struct device *dev = &p->pdev->dev;
> @@ -312,8 +474,8 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
> dimm_flags = 0;
> set_bit(NDD_LABELING, &dimm_flags);
>
> - p->nvdimm = nvdimm_create(p->bus, p, NULL, dimm_flags,
> - PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
> + p->nvdimm = nvdimm_create(p->bus, p, papr_nd_attr_groups,
> + dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
> if (!p->nvdimm) {
> dev_err(dev, "Error creating DIMM object for %pOF\n", p->dn);
> goto err;
> @@ -399,6 +561,9 @@ static int papr_scm_probe(struct platform_device *pdev)
> if (!p)
> return -ENOMEM;
>
> + /* Initialize the dimm mutex */
> + mutex_init(&p->health_mutex);
> +
> /* optional DT properties */
> of_property_read_u32(dn, "ibm,metadata-size", &metadata_size);
>
> --
> 2.26.2
>
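For reference, the mapping from health bitmap to flag names done by
flags_show() above can be sketched in user space. This is only an
illustration: PMEM_BIT(), flag_append() and format_health_flags() are
hypothetical names, strncat() stands in for the kernel's seq_buf helpers, and
the bit positions are copied from the PAPR_PMEM_* defines in the quoted patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* PAPR numbers bits from the most significant end: bit n is 1ULL << (63 - n) */
#define PMEM_BIT(n)		(1ULL << (63 - (n)))

#define PMEM_UNARMED		PMEM_BIT(0)
#define PMEM_SHUTDOWN_DIRTY	PMEM_BIT(1)
#define PMEM_EMPTY		PMEM_BIT(3)
#define PMEM_HEALTH_CRITICAL	PMEM_BIT(4)
#define PMEM_HEALTH_FATAL	PMEM_BIT(5)
#define PMEM_HEALTH_UNHEALTHY	PMEM_BIT(6)
#define PMEM_ENCRYPTED		PMEM_BIT(8)
#define PMEM_SCRUBBED_LOCKED	PMEM_BIT(9)

/* Append 's' to the NUL-terminated string in 'buf' without overflowing it */
static void flag_append(char *buf, size_t len, const char *s)
{
	strncat(buf, s, len - strlen(buf) - 1);
}

/* Hypothetical user-space mirror of flags_show(); 'len' must be at least 1 */
static size_t format_health_flags(uint64_t health, char *buf, size_t len)
{
	buf[0] = '\0';

	if (health & (PMEM_UNARMED | PMEM_HEALTH_UNHEALTHY))
		flag_append(buf, len, "not_armed ");
	if (health & PMEM_SHUTDOWN_DIRTY)
		flag_append(buf, len, "flush_fail ");
	if (health & PMEM_EMPTY)
		flag_append(buf, len, "restore_fail ");
	if (health & PMEM_ENCRYPTED)
		flag_append(buf, len, "encrypted ");
	if (health & (PMEM_HEALTH_CRITICAL | PMEM_HEALTH_FATAL |
		      PMEM_HEALTH_UNHEALTHY))
		flag_append(buf, len, "smart_notify ");
	if (health & PMEM_SCRUBBED_LOCKED)
		flag_append(buf, len, "scrubbed locked ");

	/* Terminate with a newline only if at least one flag was emitted */
	if (buf[0] != '\0')
		flag_append(buf, len, "\n");

	return strlen(buf);
}
```

Note that PAPR_PMEM_HEALTH_UNHEALTHY is part of both the unarmed mask and the
smart event mask in the patch, so a dimm reporting only that bit would show
both "not_armed" and "smart_notify".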
On Tue, Jun 02, 2020 at 03:44:37PM +0530, Vaibhav Jain wrote:
> Introduce support for PAPR NVDIMM Specific Methods (PDSM) in papr_scm
> module and add the command family NVDIMM_FAMILY_PAPR to the white list
> of NVDIMM command sets. Also advertise support for ND_CMD_CALL for the
> nvdimm command mask and implement necessary scaffolding in the module
> to handle ND_CMD_CALL ioctl and PDSM requests that we receive.
>
> The layout of the PDSM request as we expect from libnvdimm/libndctl is
> described in newly introduced uapi header 'papr_pdsm.h' which
> defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used
> to communicate the PDSM request via member
> 'nd_cmd_pkg.nd_command' and the size of the payload that needs to be
> sent/received for servicing the PDSM.
>
> A new function is_cmd_valid() is implemented that reads the args to
> papr_scm_ndctl() and performs sanity tests on them. A new function
> papr_scm_service_pdsm() is introduced and is called from
> papr_scm_ndctl() in case a PDSM request is received via ND_CMD_CALL
> command from libnvdimm.
>
> Cc: "Aneesh Kumar K . V" <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Reviewed-by: Aneesh Kumar K.V <[email protected]>
> Signed-off-by: Vaibhav Jain <[email protected]>
> ---
> Changelog:
>
> Resend:
> * Added ack from Aneesh.
>
> v8..v9:
> * Reduced the usage of term SCM replacing it with appropriate
> replacement [ Dan Williams, Aneesh ]
> * Renamed 'papr_scm_pdsm.h' to 'papr_pdsm.h'
> * s/PAPR_SCM_PDSM_*/PAPR_PDSM_*/g
> * s/NVDIMM_FAMILY_PAPR_SCM/NVDIMM_FAMILY_PAPR/g
> * Minor updates to 'papr_pdsm.h' to replace usage of term 'SCM'.
> * Minor update to patch description.
>
> v7..v8:
> * Removed the 'payload_offset' field from 'struct
> nd_pdsm_cmd_pkg'. Instead command payload is always assumed to start
> at 'nd_pdsm_cmd_pkg.payload'. [ Aneesh ]
> * To enable introducing new fields to 'struct nd_pdsm_cmd_pkg',
> 'reserved' field of 10-bytes is introduced. [ Aneesh ]
> * Fixed a typo in "Backward Compatibility" section of papr_scm_pdsm.h
> [ Ira ]
>
> Resend:
> * None
>
> v6..v7 :
> * Removed the re-definitions of __packed macro from papr_scm_pdsm.h
> [Mpe].
> * Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe].
> * Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h
> [Mpe].
> * Made functions defined in papr_scm_pdsm.h as static inline. [Mpe]
>
> v5..v6 :
> * Changed the usage of the term DSM to PDSM to distinguish it from the
> ACPI term [ Dan Williams ]
> * Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct
> to reflect the new terminology.
> * Updated the patch description and title to reflect the new terminology.
> * Squashed patch to introduce new command family in 'ndctl.h' with
> this patch [ Dan Williams ]
> * Updated the papr_scm_pdsm method starting index from 0x10000 to 0x0
> [ Dan Williams ]
> * Removed redundant license text from the papr_scm_pdsm.h file.
> [ Dan Williams ]
> * s/envelop/envelope/ at various places [ Dan Williams ]
> * Added '__packed' attribute to command package header to guard
> against different compilers adding padding between the fields.
> [ Dan Williams]
> * Converted various pr_debug to dev_dbg [ Dan Williams ]
>
> v4..v5 :
> * None
>
> v3..v4 :
> * None
>
> v2..v3 :
> * Updated the patch prefix to 'ndctl/uapi' [Aneesh]
>
> v1..v2 :
> * None
> ---
> arch/powerpc/include/uapi/asm/papr_pdsm.h | 136 ++++++++++++++++++++++
> arch/powerpc/platforms/pseries/papr_scm.c | 101 +++++++++++++++-
> include/uapi/linux/ndctl.h | 1 +
> 3 files changed, 232 insertions(+), 6 deletions(-)
> create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
>
> diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
> new file mode 100644
> index 000000000000..6407fefcc007
> --- /dev/null
> +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
> @@ -0,0 +1,136 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +/*
> + * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl
> + *
> + * (C) Copyright IBM 2020
> + *
> + * Author: Vaibhav Jain <vaibhav at linux.ibm.com>
> + */
> +
> +#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_
> +#define _UAPI_ASM_POWERPC_PAPR_PDSM_H_
> +
> +#include <linux/types.h>
> +
> +/*
> + * PDSM Envelope:
> + *
> + * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
> + * envelope which consists of a header and user-defined payload sections.
> + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
> + * payload following it and accessible via 'nd_pdsm_cmd_pkg.payload' field.
> + * There is a reserved field that can be used to introduce new fields to the
> + * structure in the future. It also tries to ensure that
> + * 'nd_pdsm_cmd_pkg.payload' lies at an 8-byte boundary.
> + *
> + * +-------------+---------------------+---------------------------+
> + * | 64-Bytes | 16-Bytes | Max 176-Bytes |
> + * +-------------+---------------------+---------------------------+
> + * | nd_pdsm_cmd_pkg | |
> + * |-------------+ | |
> + * | nd_cmd_pkg | | |
> + * +-------------+---------------------+---------------------------+
> + * | nd_family | | |
> + * | nd_size_out | cmd_status | |
> + * | nd_size_in | payload_version | payload |
> + * | nd_command | reserved | |
> + * | nd_fw_size | | |
> + * +-------------+---------------------+---------------------------+
> + *
> + * PDSM Header:
> + *
> + * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a
> + * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to member
> + * 'nd_cmd_pkg.nd_command'. Apart from size information of the envelope which is
> + * contained in 'struct nd_cmd_pkg', the header also has members following
^^^^^
... the ...
> + * members:
> + *
> + * 'cmd_status' : (Out) Errors if any encountered while servicing PDSM.
> + * 'payload_version' : (In/Out) Version number associated with the payload.
> + * 'reserved' : Not used and reserved for future.
> + *
> + * PDSM Payload:
> + *
> + * The layout of the PDSM Payload is defined by various structs shared between
> + * papr_scm and libndctl so that contents of payload can be interpreted. During
> + * servicing of a PDSM the papr_scm module will read input args from the payload
> + * field by casting its contents to an appropriate struct pointer based on the
> + * PDSM command. Similarly the output of servicing the PDSM command will be
> + * copied to the payload field using the same struct.
> + *
> + * 'libnvdimm' enforces a hard limit of 256 bytes on the envelope size, which
> + * leaves around 176 bytes for the envelope payload (ignoring any padding that
> + * the compiler may silently introduce).
> + *
> + * Payload Version:
> + *
> + * A 'payload_version' field is present in PDSM header that indicates a specific
> + * version of the structure present in PDSM Payload for a given PDSM command.
> + * This provides backward compatibility in case the PDSM Payload structure
> + * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
> + *
> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
> + * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
> + * module when servicing the PDSM envelope checks the 'payload_version' and then
> + * uses 'payload struct version' == MIN('payload_version field',
> + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
> + * After servicing the PDSM, 'papr_scm' puts the negotiated version of payload
> + * struct in returned 'payload_version' field.
> + *
> + * Libndctl on receiving the envelope back from papr_scm again checks the
> + * 'payload_version' field and based on it uses the appropriate version of the
> + * PDSM struct to parse the results.
> + *
> + * Backward Compatibility:
> + *
> + * The above scheme of exchanging differently versioned PDSM structs between
> + * libndctl and papr_scm should provide backward compatibility as long as the
> + * following two assumptions/conditions hold when defining new PDSM structs:
> + *
> + * Let T(X) = { set of attributes in PDSM struct 'T' versioned X }
> + *
> + * 1. T(X) is a proper subset of T(Y) if Y > X.
> + * i.e Each new version of PDSM struct should retain existing struct
> + * attributes from previous version
> + *
> + * 2. If an entity (libndctl or papr_scm) supports a PDSM struct T(X) then
> + * it should also support T(1), T(2)...T(X - 1).
> + * i.e When adding support for new version of a PDSM struct, libndctl
> + * and papr_scm should retain support of the existing PDSM struct
> + * version they support.
Please see this thread for an example of why versions are a bad idea in UAPIs:
https://lkml.org/lkml/2020/3/26/213
While the use of versions is different in that thread, the fundamental issues
are the same. You end up with some weird matrix of supported features and
structure definitions. For example, you are opening up the possibility of
changing structures with a different version for no good reason.
Also, having the user query with version Z and get back version X (older) is
odd. Generally, if the kernel does not know about a feature (i.e. version Z of
the structure) it should return -EINVAL and let the user figure out what to do.
The user may just give up or they could try a different query.
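As a rough illustration of the "return -EINVAL" behaviour described above,
here is a minimal user-space sketch. It is not code from the patch:
PDSM_PAYLOAD_VERSION_MAX and pdsm_check_version() are hypothetical names, and
treating version 0 as invalid is an assumption made only for this example.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical: highest payload struct version this side understands */
#define PDSM_PAYLOAD_VERSION_MAX 1

/*
 * Reject versions we do not know instead of silently falling back to an
 * older struct. Returns 0 if the version is usable, -EINVAL otherwise.
 */
static int pdsm_check_version(uint16_t payload_version)
{
	if (payload_version == 0 || payload_version > PDSM_PAYLOAD_VERSION_MAX)
		return -EINVAL;

	return 0;
}
```

With a check of this kind the caller always knows which struct it got back,
because a request either succeeds with the version it asked for or fails.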
> + */
> +
> +/* PDSM-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
> +struct nd_pdsm_cmd_pkg {
> + struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
> + __s32 cmd_status; /* Out: Sub-cmd status returned back */
> + __u16 reserved[5]; /* Ignored and to be used in future */
How do you know when reserved is used for something else in the future? Is
reserved guaranteed (and checked by the code) to be 0?
> + __u16 payload_version; /* In/Out: version of the payload */
Why is payload_version after reserved?
> + __u8 payload[]; /* In/Out: Sub-cmd data buffer */
> +} __packed;
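To make the layout arithmetic and the question about 'reserved' concrete,
here is a small user-space sketch. The two structs are illustrative mirrors
written with fixed-width types, not the uapi headers themselves, and
pdsm_check_reserved() is a hypothetical helper that the posted patch does not
contain. The member layout of the nd_cmd_pkg mirror is my recollection of
include/uapi/linux/ndctl.h and should be checked against the real header.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of 'struct nd_cmd_pkg' without its trailing payload */
struct cmd_pkg_mirror {
	uint64_t nd_family;
	uint64_t nd_command;
	uint32_t nd_size_in;
	uint32_t nd_size_out;
	uint32_t nd_reserved2[9];
	uint32_t nd_fw_size;
};

/* Illustrative mirror of 'struct nd_pdsm_cmd_pkg' as posted in this patch */
struct pdsm_pkg_mirror {
	struct cmd_pkg_mirror hdr;
	int32_t  cmd_status;
	uint16_t reserved[5];
	uint16_t payload_version;
	uint8_t  payload[];
} __attribute__((packed));

/* 64-byte header + 16 bytes of PDSM fields leaves 176 of the 256 bytes */
static_assert(sizeof(struct cmd_pkg_mirror) == 64, "header is 64 bytes");
static_assert(offsetof(struct pdsm_pkg_mirror, payload) == 80,
	      "payload starts at byte 80, an 8-byte boundary");
static_assert(256 - sizeof(struct pdsm_pkg_mirror) == 176,
	      "176 bytes remain for the payload");

/* Hypothetical check: fail with -EINVAL if any reserved word is non-zero */
static int pdsm_check_reserved(const struct pdsm_pkg_mirror *pkg)
{
	size_t i;

	for (i = 0; i < sizeof(pkg->reserved) / sizeof(pkg->reserved[0]); i++)
		if (pkg->reserved[i] != 0)
			return -EINVAL;

	return 0;
}
```

If the kernel rejects non-zero reserved words from day one, a later patch can
give those bytes a meaning without breaking callers that passed garbage.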
> +
> +/*
> + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
> + */
> +enum papr_pdsm {
> + PAPR_PDSM_MIN = 0x0,
> + PAPR_PDSM_MAX,
> +};
> +
> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
> +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
> +{
> + return (struct nd_pdsm_cmd_pkg *) cmd;
> +}
> +
> +/* Return the payload pointer for a given pcmd */
> +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
> +{
> + if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
> + return NULL;
> + else
> + return (void *)(pcmd->payload);
> +}
> +
> +#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index 149431594839..5e2237e7ec08 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -15,13 +15,15 @@
> #include <linux/seq_buf.h>
>
> #include <asm/plpar_wrappers.h>
> +#include <asm/papr_pdsm.h>
>
> #define BIND_ANY_ADDR (~0ul)
>
> #define PAPR_SCM_DIMM_CMD_MASK \
> ((1ul << ND_CMD_GET_CONFIG_SIZE) | \
> (1ul << ND_CMD_GET_CONFIG_DATA) | \
> - (1ul << ND_CMD_SET_CONFIG_DATA))
> + (1ul << ND_CMD_SET_CONFIG_DATA) | \
> + (1ul << ND_CMD_CALL))
>
> /* DIMM health bitmap bitmap indicators */
> /* SCM device is unable to persist memory contents */
> @@ -350,16 +352,97 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
> return 0;
> }
>
> +/*
> + * Validate the inputs args to dimm-control function and return '0' if valid.
> + * This also does initial sanity validation to ND_CMD_CALL sub-command packages.
> + */
> +static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
> + unsigned int buf_len)
> +{
> + unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
> + struct nd_pdsm_cmd_pkg *pkg = nd_to_pdsm_cmd_pkg(buf);
> + struct papr_scm_priv *p;
> +
> + /* Only dimm-specific calls are supported atm */
> + if (!nvdimm)
> + return -EINVAL;
> +
> + /* get the provider date from struct nvdimm */
s/date/data
> + p = nvdimm_provider_data(nvdimm);
> +
> + if (!test_bit(cmd, &cmd_mask)) {
> + dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
> + return -EINVAL;
> + } else if (cmd == ND_CMD_CALL) {
> +
> + /* Verify the envelope package */
> + if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
> + dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
> + buf_len);
> + return -EINVAL;
> + }
> +
> + /* Verify that the PDSM family is valid */
> + if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR) {
> + dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
> + pkg->hdr.nd_family);
> + return -EINVAL;
> +
> + }
> +
> + /* We except a payload with all PDSM commands */
> + if (pdsm_cmd_to_payload(pkg) == NULL) {
> + dev_dbg(&p->pdev->dev,
> + "Empty payload for sub-command=0x%llx\n",
> + pkg->hdr.nd_command);
> + return -EINVAL;
> + }
> + }
> +
> + /* Command looks valid */
I assume the first command to be implemented also checks the { nd_command,
payload_version, payload length } for correctness?
> + return 0;
> +}
> +
> +static int papr_scm_service_pdsm(struct papr_scm_priv *p,
> + struct nd_pdsm_cmd_pkg *call_pkg)
> +{
> + /* unknown subcommands return error in packages */
> + if (call_pkg->hdr.nd_command <= PAPR_PDSM_MIN ||
> + call_pkg->hdr.nd_command >= PAPR_PDSM_MAX) {
> + dev_dbg(&p->pdev->dev, "Invalid PDSM request 0x%llx\n",
> + call_pkg->hdr.nd_command);
> + call_pkg->cmd_status = -EINVAL;
> + return 0;
> + }
> +
> + /* Depending on the DSM command call appropriate service routine */
> + switch (call_pkg->hdr.nd_command) {
> + default:
> + dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
> + call_pkg->hdr.nd_command);
> + call_pkg->cmd_status = -ENOENT;
> + return 0;
> + }
> +}
> +
> static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
> struct nvdimm *nvdimm, unsigned int cmd, void *buf,
> unsigned int buf_len, int *cmd_rc)
> {
> struct nd_cmd_get_config_size *get_size_hdr;
> struct papr_scm_priv *p;
> + struct nd_pdsm_cmd_pkg *call_pkg = NULL;
> + int rc;
>
> - /* Only dimm-specific calls are supported atm */
> - if (!nvdimm)
> - return -EINVAL;
> + /* Use a local variable in case cmd_rc pointer is NULL */
> + if (cmd_rc == NULL)
> + cmd_rc = &rc;
Why is this needed? AFAICT The caller of papr_scm_ndctl does not specify null
and you did not change it.
> +
> + *cmd_rc = is_cmd_valid(nvdimm, cmd, buf, buf_len);
> + if (*cmd_rc) {
> + pr_debug("Invalid cmd=0x%x. Err=%d\n", cmd, *cmd_rc);
> + return *cmd_rc;
> + }
>
> p = nvdimm_provider_data(nvdimm);
>
> @@ -381,13 +464,19 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
> *cmd_rc = papr_scm_meta_set(p, buf);
> break;
>
> + case ND_CMD_CALL:
> + call_pkg = nd_to_pdsm_cmd_pkg(buf);
> + *cmd_rc = papr_scm_service_pdsm(p, call_pkg);
> + break;
> +
> default:
> - return -EINVAL;
> + dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
> + *cmd_rc = -EINVAL;
Is this change related? If there is a bug where there is a caller of
papr_scm_ndctl() with cmd_rc == NULL this should be a separate patch to fix
that issue.
Ira
> }
>
> dev_dbg(&p->pdev->dev, "returned with cmd_rc = %d\n", *cmd_rc);
>
> - return 0;
> + return *cmd_rc;
> }
>
> static ssize_t flags_show(struct device *dev,
> diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
> index de5d90212409..0e09dc5cec19 100644
> --- a/include/uapi/linux/ndctl.h
> +++ b/include/uapi/linux/ndctl.h
> @@ -244,6 +244,7 @@ struct nd_cmd_pkg {
> #define NVDIMM_FAMILY_HPE2 2
> #define NVDIMM_FAMILY_MSFT 3
> #define NVDIMM_FAMILY_HYPERV 4
> +#define NVDIMM_FAMILY_PAPR 5
>
> #define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
> struct nd_cmd_pkg)
> --
> 2.26.2
>
On Tue, Jun 02, 2020 at 01:51:49PM -0700, 'Ira Weiny' wrote:
> On Tue, Jun 02, 2020 at 03:44:37PM +0530, Vaibhav Jain wrote:
...
> > +
> > +/*
> > + * PDSM Envelope:
> > + *
> > + * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
> > + * envelope which consists of a header and user-defined payload sections.
> > + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
> > + * payload following it and accessible via 'nd_pdsm_cmd_pkg.payload' field.
> > + * There is reserved field that can used to introduce new fields to the
> > + * structure in future. It also tries to ensure that 'nd_pdsm_cmd_pkg.payload'
> > + * lies at a 8-byte boundary.
> > + *
> > + * +-------------+---------------------+---------------------------+
> > + * | 64-Bytes | 16-Bytes | Max 176-Bytes |
> > + * +-------------+---------------------+---------------------------+
> > + * | nd_pdsm_cmd_pkg | |
> > + * |-------------+ | |
> > + * | nd_cmd_pkg | | |
> > + * +-------------+---------------------+---------------------------+
> > + * | nd_family | | |
> > + * | nd_size_out | cmd_status | |
> > + * | nd_size_in | payload_version | payload |
> > + * | nd_command | reserved | |
> > + * | nd_fw_size | | |
> > + * +-------------+---------------------+---------------------------+
One more comment WRT nd_size_[in|out]. I know that it is defined as the size
of the FW payload but normally when you nest headers 'size' in Header A
represents everything after Header A, including Header B. In this case that
would be including nd_pdsm_cmd_pkg...
It looks like that is not what you have done? Or perhaps I missed it?
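To make the two conventions concrete, here is a small sketch. The names (inner_hdr_sketch, size_including_inner, size_payload_only) are made up for illustration and the struct only mirrors the fields in the diagram above; this is not code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "Header B": the fields that follow the generic header */
struct inner_hdr_sketch {
	int32_t  cmd_status;
	uint16_t reserved[5];
	uint16_t payload_version;
};

/* Usual nesting convention: size in Header A covers Header B plus payload */
static size_t size_including_inner(size_t payload_len)
{
	return sizeof(struct inner_hdr_sketch) + payload_len;
}

/* Payload-only convention: size in Header A covers just the payload */
static size_t size_payload_only(size_t payload_len)
{
	return payload_len;
}
```

For an 8-byte payload the first convention yields the size of the inner header plus 8, while the second yields 8.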
Ira
> > + *
> > + * PDSM Header:
> > + *
> > + * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a
> > + * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to member
> > + * 'nd_cmd_pkg.nd_command'. Apart from size information of the envelope which is
> > + * contained in 'struct nd_cmd_pkg', the header also has members following
> ^^^^^
> ... the ...
>
> > + * members:
> > + *
> > + * 'cmd_status' : (Out) Errors if any encountered while servicing PDSM.
> > + * 'payload_version' : (In/Out) Version number associated with the payload.
> > + * 'reserved' : Not used and reserved for future.
> > + *
> > + * PDSM Payload:
> > + *
> > + * The layout of the PDSM Payload is defined by various structs shared between
> > + * papr_scm and libndctl so that contents of payload can be interpreted. During
> > + * servicing of a PDSM the papr_scm module will read input args from the payload
> > + * field by casting its contents to an appropriate struct pointer based on the
> > + * PDSM command. Similarly the output of servicing the PDSM command will be
> > + * copied to the payload field using the same struct.
> > + *
> > + * 'libnvdimm' enforces a hard limit of 256 bytes on the envelope size, which
> > + * leaves around 176 bytes for the envelope payload (ignoring any padding that
> > + * the compiler may silently introduce).
> > + *
> > + * Payload Version:
> > + *
> > + * A 'payload_version' field is present in PDSM header that indicates a specific
> > + * version of the structure present in PDSM Payload for a given PDSM command.
> > + * This provides backward compatibility in case the PDSM Payload structure
> > + * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
> > + *
> > + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
> > + * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
> > + * module when servicing the PDSM envelope checks the 'payload_version' and then
> > + * uses 'payload struct version' == MIN('payload_version field',
> > + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
> > + * After servicing the PDSM, 'papr_scm' put the negotiated version of payload
> > + * struct in returned 'payload_version' field.
> > + *
> > + * Libndctl on receiving the envelope back from papr_scm again checks the
> > + * 'payload_version' field and based on it use the appropriate version dsm
> > + * struct to parse the results.
> > + *
> > + * Backward Compatibility:
> > + *
> > + * Above scheme of exchanging different versioned PDSM struct between libndctl
> > + * and papr_scm should provide backward compatibility until following two
> > + * assumptions/conditions when defining new PDSM structs hold:
> > + *
> > + * Let T(X) = { set of attributes in PDSM struct 'T' versioned X }
> > + *
> > + * 1. T(X) is a proper subset of T(Y) if Y > X.
> > + * i.e Each new version of PDSM struct should retain existing struct
> > + * attributes from previous version
> > + *
> > + * 2. If an entity (libndctl or papr_scm) supports a PDSM struct T(X) then
> > + * it should also support T(1), T(2)...T(X - 1).
> > + * i.e When adding support for new version of a PDSM struct, libndctl
> > + * and papr_scm should retain support of the existing PDSM struct
> > + * version they support.
>
> Please see this thread for an example why versions are a bad idea in UAPIs:
>
> https://lkml.org/lkml/2020/3/26/213
>
> While the use of version is different in that thread the fundamental issues are
> the same. You end up with some weird matrix of supported features and
> structure definitions. For example, you are opening up the possibility of
> changing structures with a different version for no good reason.
>
> Also having the user query with version Z and get back version X (older) is
> odd. Generally if the kernel does not know about a feature (ie version Z of
> the structure) it should return -EINVAL and let the user figure out what to do.
> The user may just give up or they could try a different query.
>
> > + */
> > +
> > +/* PDSM-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
> > +struct nd_pdsm_cmd_pkg {
> > + struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
> > + __s32 cmd_status; /* Out: Sub-cmd status returned back */
> > + __u16 reserved[5]; /* Ignored and to be used in future */
>
> How do you know when reserved is used for something else in the future? Is
> reserved guaranteed (and checked by the code) to be 0?
>
> > + __u16 payload_version; /* In/Out: version of the payload */
>
> Why is payload_version after reserved?
>
> > + __u8 payload[]; /* In/Out: Sub-cmd data buffer */
> > +} __packed;
> > +
> > +/*
> > + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
> > + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
> > + */
> > +enum papr_pdsm {
> > + PAPR_PDSM_MIN = 0x0,
> > + PAPR_PDSM_MAX,
> > +};
> > +
> > +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
> > +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
> > +{
> > + return (struct nd_pdsm_cmd_pkg *) cmd;
> > +}
> > +
> > +/* Return the payload pointer for a given pcmd */
> > +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
> > +{
> > + if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
> > + return NULL;
> > + else
> > + return (void *)(pcmd->payload);
> > +}
> > +
> > +#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
> > diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> > index 149431594839..5e2237e7ec08 100644
> > --- a/arch/powerpc/platforms/pseries/papr_scm.c
> > +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> > @@ -15,13 +15,15 @@
> > #include <linux/seq_buf.h>
> >
> > #include <asm/plpar_wrappers.h>
> > +#include <asm/papr_pdsm.h>
> >
> > #define BIND_ANY_ADDR (~0ul)
> >
> > #define PAPR_SCM_DIMM_CMD_MASK \
> > ((1ul << ND_CMD_GET_CONFIG_SIZE) | \
> > (1ul << ND_CMD_GET_CONFIG_DATA) | \
> > - (1ul << ND_CMD_SET_CONFIG_DATA))
> > + (1ul << ND_CMD_SET_CONFIG_DATA) | \
> > + (1ul << ND_CMD_CALL))
> >
> > /* DIMM health bitmap bitmap indicators */
> > /* SCM device is unable to persist memory contents */
> > @@ -350,16 +352,97 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
> > return 0;
> > }
> >
> > +/*
> > + * Validate the inputs args to dimm-control function and return '0' if valid.
> > + * This also does initial sanity validation to ND_CMD_CALL sub-command packages.
> > + */
> > +static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
> > + unsigned int buf_len)
> > +{
> > + unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
> > + struct nd_pdsm_cmd_pkg *pkg = nd_to_pdsm_cmd_pkg(buf);
> > + struct papr_scm_priv *p;
> > +
> > + /* Only dimm-specific calls are supported atm */
> > + if (!nvdimm)
> > + return -EINVAL;
> > +
> > + /* get the provider date from struct nvdimm */
>
> s/date/data
>
> > + p = nvdimm_provider_data(nvdimm);
> > +
> > + if (!test_bit(cmd, &cmd_mask)) {
> > + dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
> > + return -EINVAL;
> > + } else if (cmd == ND_CMD_CALL) {
> > +
> > + /* Verify the envelope package */
> > + if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
> > + dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
> > + buf_len);
> > + return -EINVAL;
> > + }
> > +
> > + /* Verify that the PDSM family is valid */
> > + if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR) {
> > + dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
> > + pkg->hdr.nd_family);
> > + return -EINVAL;
> > +
> > + }
> > +
> > + /* We except a payload with all PDSM commands */
> > + if (pdsm_cmd_to_payload(pkg) == NULL) {
> > + dev_dbg(&p->pdev->dev,
> > + "Empty payload for sub-command=0x%llx\n",
> > + pkg->hdr.nd_command);
> > + return -EINVAL;
> > + }
> > + }
> > +
> > + /* Command looks valid */
>
> I assume the first command to be implemented also checks the { nd_command,
> payload_version, payload length } for correctness?
>
> > + return 0;
> > +}
> > +
> > +static int papr_scm_service_pdsm(struct papr_scm_priv *p,
> > + struct nd_pdsm_cmd_pkg *call_pkg)
> > +{
> > + /* unknown subcommands return error in packages */
> > + if (call_pkg->hdr.nd_command <= PAPR_PDSM_MIN ||
> > + call_pkg->hdr.nd_command >= PAPR_PDSM_MAX) {
> > + dev_dbg(&p->pdev->dev, "Invalid PDSM request 0x%llx\n",
> > + call_pkg->hdr.nd_command);
> > + call_pkg->cmd_status = -EINVAL;
> > + return 0;
> > + }
> > +
> > + /* Depending on the DSM command call appropriate service routine */
> > + switch (call_pkg->hdr.nd_command) {
> > + default:
> > + dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
> > + call_pkg->hdr.nd_command);
> > + call_pkg->cmd_status = -ENOENT;
> > + return 0;
> > + }
> > +}
> > +
> > static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
> > struct nvdimm *nvdimm, unsigned int cmd, void *buf,
> > unsigned int buf_len, int *cmd_rc)
> > {
> > struct nd_cmd_get_config_size *get_size_hdr;
> > struct papr_scm_priv *p;
> > + struct nd_pdsm_cmd_pkg *call_pkg = NULL;
> > + int rc;
> >
> > - /* Only dimm-specific calls are supported atm */
> > - if (!nvdimm)
> > - return -EINVAL;
> > + /* Use a local variable in case cmd_rc pointer is NULL */
> > + if (cmd_rc == NULL)
> > + cmd_rc = &rc;
>
> Why is this needed? AFAICT The caller of papr_scm_ndctl does not specify null
> and you did not change it.
>
> > +
> > + *cmd_rc = is_cmd_valid(nvdimm, cmd, buf, buf_len);
> > + if (*cmd_rc) {
> > + pr_debug("Invalid cmd=0x%x. Err=%d\n", cmd, *cmd_rc);
> > + return *cmd_rc;
> > + }
> >
> > p = nvdimm_provider_data(nvdimm);
> >
> > @@ -381,13 +464,19 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
> > *cmd_rc = papr_scm_meta_set(p, buf);
> > break;
> >
> > + case ND_CMD_CALL:
> > + call_pkg = nd_to_pdsm_cmd_pkg(buf);
> > + *cmd_rc = papr_scm_service_pdsm(p, call_pkg);
> > + break;
> > +
> > default:
> > - return -EINVAL;
> > + dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
> > + *cmd_rc = -EINVAL;
>
> Is this change related? If there is a bug where there is a caller of
> papr_scm_ndctl() with cmd_rc == NULL this should be a separate patch to fix
> that issue.
>
> Ira
>
> > }
> >
> > dev_dbg(&p->pdev->dev, "returned with cmd_rc = %d\n", *cmd_rc);
> >
> > - return 0;
> > + return *cmd_rc;
> > }
> >
> > static ssize_t flags_show(struct device *dev,
> > diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
> > index de5d90212409..0e09dc5cec19 100644
> > --- a/include/uapi/linux/ndctl.h
> > +++ b/include/uapi/linux/ndctl.h
> > @@ -244,6 +244,7 @@ struct nd_cmd_pkg {
> > #define NVDIMM_FAMILY_HPE2 2
> > #define NVDIMM_FAMILY_MSFT 3
> > #define NVDIMM_FAMILY_HYPERV 4
> > +#define NVDIMM_FAMILY_PAPR 5
> >
> > #define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
> > struct nd_cmd_pkg)
> > --
> > 2.26.2
> >
On Tue, Jun 02, 2020 at 03:44:38PM +0530, Vaibhav Jain wrote:
> This patch implements support for PDSM request 'PAPR_PDSM_HEALTH'
> that returns a newly introduced 'struct nd_papr_pdsm_health' instance
> containing dimm health information back to user space in response to
> ND_CMD_CALL. This functionality is implemented in newly introduced
> papr_pdsm_health() that queries the nvdimm health information and
> then copies this information to the package payload whose layout is
> defined by 'struct nd_papr_pdsm_health'.
>
> The patch also introduces a new member 'struct papr_scm_priv.health'
> thats an instance of 'struct nd_papr_pdsm_health' to cache the health
> information of a nvdimm. As a result functions drc_pmem_query_health()
> and flags_show() are updated to populate and use this new struct
> instead of a u64 integer that was earlier used.
>
> Cc: "Aneesh Kumar K . V" <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Reviewed-by: Aneesh Kumar K.V <[email protected]>
> Signed-off-by: Vaibhav Jain <[email protected]>
> ---
> Changelog:
>
> Resend:
> * Added ack from Aneesh.
>
> v8..v9:
> * s/PAPR_SCM_PDSM_HEALTH/PAPR_PDSM_HEALTH/g [ Dan , Aneesh ]
> * s/PAPR_SCM_PSDM_DIMM_*/PAPR_PDSM_DIMM_*/g
> * Renamed papr_scm_get_health() to papr_psdm_health()
> * Updated patch description to replace papr-scm dimm with nvdimm.
>
> v7..v8:
> * None
>
> Resend:
> * None
>
> v6..v7:
> * Updated flags_show() to use seq_buf_printf(). [Mpe]
> * Updated papr_scm_get_health() to use newly introduced
> __drc_pmem_query_health() bypassing the cache [Mpe].
>
> v5..v6:
> * Added attribute '__packed' to 'struct nd_papr_pdsm_health_v1' to
> gaurd against possibility of different compilers adding different
> paddings to the struct [ Dan Williams ]
>
> * Updated 'struct nd_papr_pdsm_health_v1' to use __u8 instead of
> 'bool' and also updated drc_pmem_query_health() to take this into
> account. [ Dan Williams ]
>
> v4..v5:
> * None
>
> v3..v4:
> * Call the DSM_PAPR_SCM_HEALTH service function from
> papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh]
>
> v2..v3:
> * Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx' types
> as its exported to the userspace [Aneesh]
> * Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm health
> from enum to #defines [Aneesh]
>
> v1..v2:
> * New patch in the series
> ---
> arch/powerpc/include/uapi/asm/papr_pdsm.h | 39 +++++++
> arch/powerpc/platforms/pseries/papr_scm.c | 125 +++++++++++++++++++---
> 2 files changed, 147 insertions(+), 17 deletions(-)
>
> diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
> index 6407fefcc007..411725a91591 100644
> --- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
> +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
> @@ -115,6 +115,7 @@ struct nd_pdsm_cmd_pkg {
> */
> enum papr_pdsm {
> PAPR_PDSM_MIN = 0x0,
> + PAPR_PDSM_HEALTH,
> PAPR_PDSM_MAX,
> };
>
> @@ -133,4 +134,42 @@ static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
> return (void *)(pcmd->payload);
> }
>
> +/* Various nvdimm health indicators */
> +#define PAPR_PDSM_DIMM_HEALTHY 0
> +#define PAPR_PDSM_DIMM_UNHEALTHY 1
> +#define PAPR_PDSM_DIMM_CRITICAL 2
> +#define PAPR_PDSM_DIMM_FATAL 3
> +
> +/*
> + * Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH
> + * Various flags indicate the health status of the dimm.
> + *
> + * dimm_unarmed : Dimm not armed. So contents wont persist.
> + * dimm_bad_shutdown : Previous shutdown did not persist contents.
> + * dimm_bad_restore : Contents from previous shutdown werent restored.
> + * dimm_scrubbed : Contents of the dimm have been scrubbed.
> + * dimm_locked : Contents of the dimm cant be modified until CEC reboot
> + * dimm_encrypted : Contents of dimm are encrypted.
> + * dimm_health : Dimm health indicator. One of PAPR_PDSM_DIMM_XXXX
> + */
> +struct nd_papr_pdsm_health_v1 {
> + __u8 dimm_unarmed;
> + __u8 dimm_bad_shutdown;
> + __u8 dimm_bad_restore;
> + __u8 dimm_scrubbed;
> + __u8 dimm_locked;
> + __u8 dimm_encrypted;
> + __u16 dimm_health;
> +} __packed;
> +
> +/*
> + * Typedef the current struct for dimm_health so that any application
> + * or kernel recompiled after introducing a new version automatically
> + * supports the new version.
> + */
> +#define nd_papr_pdsm_health nd_papr_pdsm_health_v1
> +
> +/* Current version number for the dimm health struct */
This can't be the 'current' version. You will need a list of versions you
support. Because if the user passes in an old version you need to be able to
respond with that old version. Also if you plan to support 'return X for a Y
query' then the user will need both X and Y defined to interpret X.
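As a rough sketch of what a list of supported versions could look like: a table with one entry per known payload version, and a lookup that returns -EINVAL for anything else. All names here (health_v1_sketch, health_versions, health_payload_size) are hypothetical and only approximate the struct in this patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of the v1 layout: six u8 flags followed by a u16 */
struct health_v1_sketch {
	uint8_t  flags[6];
	uint16_t dimm_health;
};

/* One entry per payload version this side knows how to produce */
static const struct {
	uint16_t version;
	size_t   size;
} health_versions[] = {
	{ 1, sizeof(struct health_v1_sketch) },
};

/* Look up the payload size for a requested version; -EINVAL if unknown */
static int health_payload_size(uint16_t version, size_t *size)
{
	size_t i;

	for (i = 0; i < sizeof(health_versions) / sizeof(health_versions[0]); i++) {
		if (health_versions[i].version == version) {
			*size = health_versions[i].size;
			return 0;
		}
	}

	return -EINVAL;
}
```

Supporting a later version would then mean adding a struct and a table entry, while requests for versions not in the table fail explicitly.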
> +#define ND_PAPR_PDSM_HEALTH_VERSION 1
> +
> #endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index 5e2237e7ec08..c0606c0c659c 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -88,7 +88,7 @@ struct papr_scm_priv {
> unsigned long lasthealth_jiffies;
>
> /* Health information for the dimm */
> - u64 health_bitmap;
> + struct nd_papr_pdsm_health health;
ok so we are throwing away all the #defs from patch 1? Are they still valid?
I'm confused that patch 3 added this and we are throwing it away here...
> };
>
> static int drc_pmem_bind(struct papr_scm_priv *p)
> @@ -201,6 +201,7 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
> static int __drc_pmem_query_health(struct papr_scm_priv *p)
> {
> unsigned long ret[PLPAR_HCALL_BUFSIZE];
> + u64 health;
> long rc;
>
> /* issue the hcall */
> @@ -208,18 +209,46 @@ static int __drc_pmem_query_health(struct papr_scm_priv *p)
> if (rc != H_SUCCESS) {
> dev_err(&p->pdev->dev,
> "Failed to query health information, Err:%ld\n", rc);
> - rc = -ENXIO;
> - goto out;
> + return -ENXIO;
I missed this... probably did not need the goto in the first patch?
> }
>
> p->lasthealth_jiffies = jiffies;
> - p->health_bitmap = ret[0] & ret[1];
> + health = ret[0] & ret[1];
>
> dev_dbg(&p->pdev->dev,
> "Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
> ret[0], ret[1]);
> -out:
> - return rc;
> +
> + memset(&p->health, 0, sizeof(p->health));
> +
> + /* Check for various masks in bitmap and set the buffer */
> + if (health & PAPR_PMEM_UNARMED_MASK)
Oh ok... odd. (Don't add code and then just take it away within the same series.)
You could have led with the user structure and put this code in patch 3.
Why does the user need a u8 to represent a single bit? Does this help protect
against endian issues?
> + p->health.dimm_unarmed = 1;
> +
> + if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
> + p->health.dimm_bad_shutdown = 1;
> +
> + if (health & PAPR_PMEM_BAD_RESTORE_MASK)
> + p->health.dimm_bad_restore = 1;
> +
> + if (health & PAPR_PMEM_ENCRYPTED)
> + p->health.dimm_encrypted = 1;
> +
> + if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED) {
> + p->health.dimm_locked = 1;
> + p->health.dimm_scrubbed = 1;
> + }
> +
> + if (health & PAPR_PMEM_HEALTH_UNHEALTHY)
> + p->health.dimm_health = PAPR_PDSM_DIMM_UNHEALTHY;
> +
> + if (health & PAPR_PMEM_HEALTH_CRITICAL)
> + p->health.dimm_health = PAPR_PDSM_DIMM_CRITICAL;
> +
> + if (health & PAPR_PMEM_HEALTH_FATAL)
> + p->health.dimm_health = PAPR_PDSM_DIMM_FATAL;
> +
> + return 0;
> }
>
> /* Min interval in seconds for assuming stable dimm health */
> @@ -403,6 +432,58 @@ static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
> return 0;
> }
>
> +/* Fetch the DIMM health info and populate it in provided package. */
> +static int papr_pdsm_health(struct papr_scm_priv *p,
> + struct nd_pdsm_cmd_pkg *pkg)
> +{
> + int rc;
> + size_t copysize = sizeof(p->health);
> +
> + /* Ensure dimm health mutex is taken preventing concurrent access */
> + rc = mutex_lock_interruptible(&p->health_mutex);
> + if (rc)
> + goto out;
> +
> + /* Always fetch upto date dimm health data ignoring cached values */
> + rc = __drc_pmem_query_health(p);
> + if (rc)
> + goto out_unlock;
> + /*
> + * If the requested payload version is greater than one we know
> + * about, return the payload version we know about and let
> + * caller/userspace handle.
> + */
> + if (pkg->payload_version > ND_PAPR_PDSM_HEALTH_VERSION)
> + pkg->payload_version = ND_PAPR_PDSM_HEALTH_VERSION;
I know this seems easy now but I do think you will run into trouble later.
Ira
> +
> + if (pkg->hdr.nd_size_out < copysize) {
> + dev_dbg(&p->pdev->dev, "Truncated payload (%u). Expected (%lu)",
> + pkg->hdr.nd_size_out, copysize);
> + rc = -ENOSPC;
> + goto out_unlock;
> + }
> +
> + dev_dbg(&p->pdev->dev, "Copying payload size=%lu version=0x%x\n",
> + copysize, pkg->payload_version);
> +
> + /* Copy the health struct to the payload */
> + memcpy(pdsm_cmd_to_payload(pkg), &p->health, copysize);
> + pkg->hdr.nd_fw_size = copysize;
> +
> +out_unlock:
> + mutex_unlock(&p->health_mutex);
> +
> +out:
> + /*
> + * Put the error in out package and return success from function
> + * so that errors if any are propogated back to userspace.
> + */
> + pkg->cmd_status = rc;
> + dev_dbg(&p->pdev->dev, "completion code = %d\n", rc);
> +
> + return 0;
> +}
> +
> static int papr_scm_service_pdsm(struct papr_scm_priv *p,
> struct nd_pdsm_cmd_pkg *call_pkg)
> {
> @@ -417,6 +498,9 @@ static int papr_scm_service_pdsm(struct papr_scm_priv *p,
>
> /* Depending on the DSM command call appropriate service routine */
> switch (call_pkg->hdr.nd_command) {
> + case PAPR_PDSM_HEALTH:
> + return papr_pdsm_health(p, call_pkg);
> +
> default:
> dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
> call_pkg->hdr.nd_command);
> @@ -485,34 +569,41 @@ static ssize_t flags_show(struct device *dev,
> struct nvdimm *dimm = to_nvdimm(dev);
> struct papr_scm_priv *p = nvdimm_provider_data(dimm);
> struct seq_buf s;
> - u64 health;
> int rc;
>
> rc = drc_pmem_query_health(p);
> if (rc)
> return rc;
>
> - /* Copy health_bitmap locally, check masks & update out buffer */
> - health = READ_ONCE(p->health_bitmap);
> -
> seq_buf_init(&s, buf, PAGE_SIZE);
> - if (health & PAPR_PMEM_UNARMED_MASK)
> +
> + /* Protect concurrent modifications to papr_scm_priv */
> + rc = mutex_lock_interruptible(&p->health_mutex);
> + if (rc)
> + return rc;
> +
> + if (p->health.dimm_unarmed)
> seq_buf_printf(&s, "not_armed ");
>
> - if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
> + if (p->health.dimm_bad_shutdown)
> seq_buf_printf(&s, "flush_fail ");
>
> - if (health & PAPR_PMEM_BAD_RESTORE_MASK)
> + if (p->health.dimm_bad_restore)
> seq_buf_printf(&s, "restore_fail ");
>
> - if (health & PAPR_PMEM_ENCRYPTED)
> + if (p->health.dimm_encrypted)
> seq_buf_printf(&s, "encrypted ");
>
> - if (health & PAPR_PMEM_SMART_EVENT_MASK)
> + if (p->health.dimm_health)
> seq_buf_printf(&s, "smart_notify ");
>
> - if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED)
> - seq_buf_printf(&s, "scrubbed locked ");
> + if (p->health.dimm_scrubbed)
> + seq_buf_printf(&s, "scrubbed ");
> +
> + if (p->health.dimm_locked)
> + seq_buf_printf(&s, "locked ");
> +
> + mutex_unlock(&p->health_mutex);
>
> if (seq_buf_used(&s))
> seq_buf_printf(&s, "\n");
> --
> 2.26.2
>
Hi Ira,
Thanks for reviewing this patch. My responses below:
Ira Weiny <[email protected]> writes:
> On Tue, Jun 02, 2020 at 03:44:37PM +0530, Vaibhav Jain wrote:
>> Introduce support for PAPR NVDIMM Specific Methods (PDSM) in papr_scm
>> module and add the command family NVDIMM_FAMILY_PAPR to the white list
>> of NVDIMM command sets. Also advertise support for ND_CMD_CALL for the
>> nvdimm command mask and implement necessary scaffolding in the module
>> to handle ND_CMD_CALL ioctl and PDSM requests that we receive.
>>
>> The layout of the PDSM request as we expect from libnvdimm/libndctl is
>> described in newly introduced uapi header 'papr_pdsm.h' which
>> defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used
>> to communicate the PDSM request via member
>> 'nd_cmd_pkg.nd_command' and size of payload that need to be
>> sent/received for servicing the PDSM.
>>
>> A new function is_cmd_valid() is implemented that reads the args to
>> papr_scm_ndctl() and performs sanity tests on them. A new function
>> papr_scm_service_pdsm() is introduced and is called from
>> papr_scm_ndctl() in case of a PDSM request is received via ND_CMD_CALL
>> command from libnvdimm.
>>
>> Cc: "Aneesh Kumar K . V" <[email protected]>
>> Cc: Dan Williams <[email protected]>
>> Cc: Michael Ellerman <[email protected]>
>> Cc: Ira Weiny <[email protected]>
>> Reviewed-by: Aneesh Kumar K.V <[email protected]>
>> Signed-off-by: Vaibhav Jain <[email protected]>
>> ---
>> Changelog:
>>
>> Resend:
>> * Added ack from Aneesh.
>>
>> v8..v9:
>> * Reduced the usage of term SCM replacing it with appropriate
>> replacement [ Dan Williams, Aneesh ]
>> * Renamed 'papr_scm_pdsm.h' to 'papr_pdsm.h'
>> * s/PAPR_SCM_PDSM_*/PAPR_PDSM_*/g
>> * s/NVDIMM_FAMILY_PAPR_SCM/NVDIMM_FAMILY_PAPR/g
>> * Minor updates to 'papr_psdm.h' to replace usage of term 'SCM'.
>> * Minor update to patch description.
>>
>> v7..v8:
>> * Removed the 'payload_offset' field from 'struct
>> nd_pdsm_cmd_pkg'. Instead command payload is always assumed to start
>> at 'nd_pdsm_cmd_pkg.payload'. [ Aneesh ]
>> * To enable introducing new fields to 'struct nd_pdsm_cmd_pkg',
>> 'reserved' field of 10-bytes is introduced. [ Aneesh ]
>> * Fixed a typo in "Backward Compatibility" section of papr_scm_pdsm.h
>> [ Ira ]
>>
>> Resend:
>> * None
>>
>> v6..v7 :
>> * Removed the re-definitions of __packed macro from papr_scm_pdsm.h
>> [Mpe].
>> * Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe].
>> * Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h
>> [Mpe].
>> * Made functions defined in papr_scm_pdsm.h as static inline. [Mpe]
>>
>> v5..v6 :
>> * Changed the usage of the term DSM to PDSM to distinguish it from the
>> ACPI term [ Dan Williams ]
>> * Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct
>> to reflect the new terminology.
>> * Updated the patch description and title to reflect the new terminology.
>> * Squashed patch to introduce new command family in 'ndctl.h' with
>> this patch [ Dan Williams ]
>> * Updated the papr_scm_pdsm method starting index from 0x10000 to 0x0
>> [ Dan Williams ]
>> * Removed redundant license text from the papr_scm_psdm.h file.
>> [ Dan Williams ]
>> * s/envelop/envelope/ at various places [ Dan Williams ]
>> * Added '__packed' attribute to command package header to gaurd
>> against different compiler adding paddings between the fields.
>> [ Dan Williams]
>> * Converted various pr_debug to dev_debug [ Dan Williams ]
>>
>> v4..v5 :
>> * None
>>
>> v3..v4 :
>> * None
>>
>> v2..v3 :
>> * Updated the patch prefix to 'ndctl/uapi' [Aneesh]
>>
>> v1..v2 :
>> * None
>> ---
>> arch/powerpc/include/uapi/asm/papr_pdsm.h | 136 ++++++++++++++++++++++
>> arch/powerpc/platforms/pseries/papr_scm.c | 101 +++++++++++++++-
>> include/uapi/linux/ndctl.h | 1 +
>> 3 files changed, 232 insertions(+), 6 deletions(-)
>> create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
>>
>> diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> new file mode 100644
>> index 000000000000..6407fefcc007
>> --- /dev/null
>> +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> @@ -0,0 +1,136 @@
>> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
>> +/*
>> + * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl
>> + *
>> + * (C) Copyright IBM 2020
>> + *
>> + * Author: Vaibhav Jain <vaibhav at linux.ibm.com>
>> + */
>> +
>> +#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_
>> +#define _UAPI_ASM_POWERPC_PAPR_PDSM_H_
>> +
>> +#include <linux/types.h>
>> +
>> +/*
>> + * PDSM Envelope:
>> + *
>> + * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
>> + * envelope which consists of a header and user-defined payload sections.
>> + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
>> + * payload following it and accessible via 'nd_pdsm_cmd_pkg.payload' field.
>> + * There is reserved field that can used to introduce new fields to the
>> + * structure in future. It also tries to ensure that 'nd_pdsm_cmd_pkg.payload'
>> + * lies at a 8-byte boundary.
>> + *
>> + * +-------------+---------------------+---------------------------+
>> + * | 64-Bytes | 16-Bytes | Max 176-Bytes |
>> + * +-------------+---------------------+---------------------------+
>> + * | nd_pdsm_cmd_pkg | |
>> + * |-------------+ | |
>> + * | nd_cmd_pkg | | |
>> + * +-------------+---------------------+---------------------------+
>> + * | nd_family | | |
>> + * | nd_size_out | cmd_status | |
>> + * | nd_size_in | payload_version | payload |
>> + * | nd_command | reserved | |
>> + * | nd_fw_size | | |
>> + * +-------------+---------------------+---------------------------+
>> + *
>> + * PDSM Header:
>> + *
>> + * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a
>> + * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to member
>> + * 'nd_cmd_pkg.nd_command'. Apart from size information of the envelope which is
>> + * contained in 'struct nd_cmd_pkg', the header also has members following
> ^^^^^
> ... the ...
Thanks for catching this. Will get it fixed.
>
>> + * members:
>> + *
>> + * 'cmd_status' : (Out) Errors if any encountered while servicing PDSM.
>> + * 'payload_version' : (In/Out) Version number associated with the payload.
>> + * 'reserved' : Not used and reserved for future.
>> + *
>> + * PDSM Payload:
>> + *
>> + * The layout of the PDSM Payload is defined by various structs shared between
>> + * papr_scm and libndctl so that contents of payload can be interpreted. During
>> + * servicing of a PDSM the papr_scm module will read input args from the payload
>> + * field by casting its contents to an appropriate struct pointer based on the
>> + * PDSM command. Similarly the output of servicing the PDSM command will be
>> + * copied to the payload field using the same struct.
>> + *
>> + * 'libnvdimm' enforces a hard limit of 256 bytes on the envelope size, which
>> + * leaves around 176 bytes for the envelope payload (ignoring any padding that
>> + * the compiler may silently introduce).
>> + *
>> + * Payload Version:
>> + *
>> + * A 'payload_version' field is present in PDSM header that indicates a specific
>> + * version of the structure present in PDSM Payload for a given PDSM command.
>> + * This provides backward compatibility in case the PDSM Payload structure
>> + * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
>> + *
>> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
>> + * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
>> + * module when servicing the PDSM envelope checks the 'payload_version' and then
>> + * uses 'payload struct version' == MIN('payload_version field',
>> + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
>> + * After servicing the PDSM, 'papr_scm' put the negotiated version of payload
>> + * struct in returned 'payload_version' field.
>> + *
>> + * Libndctl on receiving the envelope back from papr_scm again checks the
>> + * 'payload_version' field and based on it use the appropriate version dsm
>> + * struct to parse the results.
>> + *
>> + * Backward Compatibility:
>> + *
>> + * Above scheme of exchanging different versioned PDSM struct between libndctl
>> + * and papr_scm should provide backward compatibility until following two
>> + * assumptions/conditions when defining new PDSM structs hold:
>> + *
>> + * Let T(X) = { set of attributes in PDSM struct 'T' versioned X }
>> + *
>> + * 1. T(X) is a proper subset of T(Y) if Y > X.
>> + * i.e Each new version of PDSM struct should retain existing struct
>> + * attributes from previous version
>> + *
>> + * 2. If an entity (libndctl or papr_scm) supports a PDSM struct T(X) then
>> + * it should also support T(1), T(2)...T(X - 1).
>> + * i.e When adding support for new version of a PDSM struct, libndctl
>> + * and papr_scm should retain support of the existing PDSM struct
>> + * version they support.
>
> Please see this thread for an example why versions are a bad idea in UAPIs:
>
> https://lkml.org/lkml/2020/3/26/213
>
> While the use of version is different in that thread the fundamental issues are
> the same. You end up with some weird matrix of supported features and
> structure definitions. For example, you are opening up the possibility of
> changing structures with a different version for no good reason.
I am not sure I understand the statement "you are opening up the
possibility of changing structures with a different version for no
good reason."
We want to return more data in the struct in future iterations, so
'changing structure with different version' is something we
expect.
Backward compatibility constraints 1 & 2 above ensure that the
support matrix looks like a lower triangular matrix, with each
successive version supporting the attributes of all previous versions.
That keeps supporting future versions relatively simple.
>
> Also having the user query with version Z and get back version X (older) is
> odd. Generally if the kernel does not know about a feature (ie version Z of
> the structure) it should return -EINVAL and let the user figure out what to do.
> The user may just give up or they could try a different query.
>
Considering the flow of ndctl/libndctl, this is needed. libndctl
usually issues only one CMD_CALL ioctl to the kernel, and if that fails
an error is reported and ndctl exits, losing state.
Adding a mechanism in libndctl to reissue the CMD_CALL ioctl to fetch an
appropriate version of the pdsm struct is going to be considerably more
work.
This version fall-back mechanism ensures that libndctl receives
usable data without having to reissue more CMD_CALL ioctls.
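To make the fall-back concrete, here is a minimal user-space sketch of the
negotiation described in the comment above. It is not the papr_scm code;
'DEMO_KERNEL_MAX_VERSION' and 'demo_negotiate_version()' are names made up
for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: highest payload-struct version the kernel side knows. */
#define DEMO_KERNEL_MAX_VERSION 1

/*
 * Pick the version both sides can parse:
 * MIN(version requested by userspace, max version known to the kernel).
 */
static inline uint16_t demo_negotiate_version(uint16_t requested)
{
	if (requested > DEMO_KERNEL_MAX_VERSION)
		return DEMO_KERNEL_MAX_VERSION;
	return requested;
}
```

A newer libndctl asking for version 3 would get version 1 back in
'payload_version' and parse the result with the v1 struct.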
>> + */
>> +
>> +/* PDSM-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
>> +struct nd_pdsm_cmd_pkg {
>> + struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
>> + __s32 cmd_status; /* Out: Sub-cmd status returned back */
>> + __u16 reserved[5]; /* Ignored and to be used in future */
>
> How do you know when reserved is used for something else in the future? Is
> reserved guaranteed (and checked by the code) to be 0?
The current set of pdsm requests ignores these reserved fields. However, a
future pdsm request can leverage them, so papr_scm
just binds the usage of these fields to the value of
'nd_cmd_pkg.nd_command' that indicates the pdsm request.
That being said, checking that the reserved fields are set to 0 would be a
good measure. Will add this check in the next iteration.
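Roughly, the check could look like the sketch below. This is only an
illustration written against plain C types; 'DEMO_RESERVED_WORDS' and
'demo_check_reserved()' are hypothetical names, and the final kernel code
may well look different.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the '__u16 reserved[5]' member of struct nd_pdsm_cmd_pkg. */
#define DEMO_RESERVED_WORDS 5

/* Return 0 if every reserved word is zero, -EINVAL otherwise. */
static inline int demo_check_reserved(const uint16_t *reserved)
{
	size_t i;

	for (i = 0; i < DEMO_RESERVED_WORDS; i++) {
		if (reserved[i] != 0)
			return -EINVAL;
	}
	return 0;
}
```

Rejecting non-zero values now keeps the option open to give these words a
meaning later without breaking existing callers.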
>
>> + __u16 payload_version; /* In/Out: version of the payload */
>
> Why is payload_version after reserved?
Want to place the payload version field just before the payload data so
that it can be accessed with simple pointer arithmetic.
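As a rough illustration of the layout, the sketch below mirrors the two
structs with fixed-width types in user-space. The 'demo_*' names are
hypothetical, the mirror of 'struct nd_cmd_pkg' drops its trailing flexible
array, and the packed attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ENVELOPE_MAX 256	/* libnvdimm envelope limit quoted above */

/* Mirror of 'struct nd_cmd_pkg' without its trailing flexible array. */
struct demo_cmd_pkg {
	uint64_t nd_family;
	uint64_t nd_command;
	uint32_t nd_size_in;
	uint32_t nd_size_out;
	uint32_t nd_reserved2[9];
	uint32_t nd_fw_size;
};

/* Mirror of 'struct nd_pdsm_cmd_pkg' as proposed in this patch. */
struct demo_pdsm_pkg {
	struct demo_cmd_pkg hdr;
	int32_t cmd_status;
	uint16_t reserved[5];
	uint16_t payload_version;
	uint8_t payload[];
} __attribute__((packed));

/* Read the version by stepping back one u16 from the payload pointer. */
static inline uint16_t demo_version_from_payload(const uint8_t *payload)
{
	uint16_t version;

	memcpy(&version, payload - sizeof(version), sizeof(version));
	return version;
}
```

With this layout the header takes 64 + 16 bytes, which leaves 176 bytes of
the 256-byte envelope for the payload.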
>
>> + __u8 payload[]; /* In/Out: Sub-cmd data buffer */
>> +} __packed;
>> +
>> +/*
>> + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
>> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
>> + */
>> +enum papr_pdsm {
>> + PAPR_PDSM_MIN = 0x0,
>> + PAPR_PDSM_MAX,
>> +};
>> +
>> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
>> +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
>> +{
>> + return (struct nd_pdsm_cmd_pkg *) cmd;
>> +}
>> +
>> +/* Return the payload pointer for a given pcmd */
>> +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
>> +{
>> + if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
>> + return NULL;
>> + else
>> + return (void *)(pcmd->payload);
>> +}
>> +
>> +#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>> index 149431594839..5e2237e7ec08 100644
>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> @@ -15,13 +15,15 @@
>> #include <linux/seq_buf.h>
>>
>> #include <asm/plpar_wrappers.h>
>> +#include <asm/papr_pdsm.h>
>>
>> #define BIND_ANY_ADDR (~0ul)
>>
>> #define PAPR_SCM_DIMM_CMD_MASK \
>> ((1ul << ND_CMD_GET_CONFIG_SIZE) | \
>> (1ul << ND_CMD_GET_CONFIG_DATA) | \
>> - (1ul << ND_CMD_SET_CONFIG_DATA))
>> + (1ul << ND_CMD_SET_CONFIG_DATA) | \
>> + (1ul << ND_CMD_CALL))
>>
>> /* DIMM health bitmap bitmap indicators */
>> /* SCM device is unable to persist memory contents */
>> @@ -350,16 +352,97 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
>> return 0;
>> }
>>
>> +/*
>> + * Validate the inputs args to dimm-control function and return '0' if valid.
>> + * This also does initial sanity validation to ND_CMD_CALL sub-command packages.
>> + */
>> +static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>> + unsigned int buf_len)
>> +{
>> + unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
>> + struct nd_pdsm_cmd_pkg *pkg = nd_to_pdsm_cmd_pkg(buf);
>> + struct papr_scm_priv *p;
>> +
>> + /* Only dimm-specific calls are supported atm */
>> + if (!nvdimm)
>> + return -EINVAL;
>> +
>> + /* get the provider date from struct nvdimm */
>
> s/date/data
Thanks for pointing this out. Will fix this in the next iteration.
>
>> + p = nvdimm_provider_data(nvdimm);
>> +
>> + if (!test_bit(cmd, &cmd_mask)) {
>> + dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
>> + return -EINVAL;
>> + } else if (cmd == ND_CMD_CALL) {
>> +
>> + /* Verify the envelope package */
>> + if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
>> + dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
>> + buf_len);
>> + return -EINVAL;
>> + }
>> +
>> + /* Verify that the PDSM family is valid */
>> + if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR) {
>> + dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
>> + pkg->hdr.nd_family);
>> + return -EINVAL;
>> +
>> + }
>> +
>> + /* We except a payload with all PDSM commands */
>> + if (pdsm_cmd_to_payload(pkg) == NULL) {
>> + dev_dbg(&p->pdev->dev,
>> + "Empty payload for sub-command=0x%llx\n",
>> + pkg->hdr.nd_command);
>> + return -EINVAL;
>> + }
>> + }
>> +
>> + /* Command looks valid */
>
> I assume the first command to be implemented also checks the { nd_command,
> payload_version, payload length } for correctness?
Yes the pdsm service functions do check the payload_version and
payload_length. Please see the papr_pdsm_health() that services the
PAPR_PDSM_HEALTH pdsm in Patch-5
>
>> + return 0;
>> +}
>> +
>> +static int papr_scm_service_pdsm(struct papr_scm_priv *p,
>> + struct nd_pdsm_cmd_pkg *call_pkg)
>> +{
>> + /* unknown subcommands return error in packages */
>> + if (call_pkg->hdr.nd_command <= PAPR_PDSM_MIN ||
>> + call_pkg->hdr.nd_command >= PAPR_PDSM_MAX) {
>> + dev_dbg(&p->pdev->dev, "Invalid PDSM request 0x%llx\n",
>> + call_pkg->hdr.nd_command);
>> + call_pkg->cmd_status = -EINVAL;
>> + return 0;
>> + }
>> +
>> + /* Depending on the DSM command call appropriate service routine */
>> + switch (call_pkg->hdr.nd_command) {
>> + default:
>> + dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
>> + call_pkg->hdr.nd_command);
>> + call_pkg->cmd_status = -ENOENT;
>> + return 0;
>> + }
>> +}
>> +
>> static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>> struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>> unsigned int buf_len, int *cmd_rc)
>> {
>> struct nd_cmd_get_config_size *get_size_hdr;
>> struct papr_scm_priv *p;
>> + struct nd_pdsm_cmd_pkg *call_pkg = NULL;
>> + int rc;
>>
>> - /* Only dimm-specific calls are supported atm */
>> - if (!nvdimm)
>> - return -EINVAL;
>> + /* Use a local variable in case cmd_rc pointer is NULL */
>> + if (cmd_rc == NULL)
>> + cmd_rc = &rc;
>
> Why is this needed? AFAICT The caller of papr_scm_ndctl does not specify null
> and you did not change it.
This pointer comes from outside the papr_scm code, hence the need to be
defensive here. Also, as per [1], cmd_rc is a "translation of firmware status"
and not every caller needs it, which makes this pointer optional.
This is evident in acpi_nfit_blk_get_flags(), where 'nd_desc->ndctl'
is called with 'cmd_rc == NULL'.
[1] https://lore.kernel.org/linux-nvdimm/CAPcyv4hE_FG0YZXJVA1G=CBq8b9e0K54jxk5Sq5UKU-dnWT2Kg@mail.gmail.com/
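The pattern boils down to the sketch below. 'demo_ndctl()' is a hypothetical
stand-in for a provider '->ndctl()' callback, not the real papr_scm_ndctl(),
and treating cmd 0 as invalid is purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical provider callback; 'cmd_rc' may legitimately be NULL. */
static inline int demo_ndctl(unsigned int cmd, int *cmd_rc)
{
	int rc;

	/* Point cmd_rc at a local so later '*cmd_rc = ...' is always safe. */
	if (cmd_rc == NULL)
		cmd_rc = &rc;

	*cmd_rc = (cmd == 0) ? -EINVAL : 0;
	return *cmd_rc;
}
```

The rest of the function can then assign through 'cmd_rc' unconditionally,
whether or not the caller cares about the translated status.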
>
>> +
>> + *cmd_rc = is_cmd_valid(nvdimm, cmd, buf, buf_len);
>> + if (*cmd_rc) {
>> + pr_debug("Invalid cmd=0x%x. Err=%d\n", cmd, *cmd_rc);
>> + return *cmd_rc;
>> + }
>>
>> p = nvdimm_provider_data(nvdimm);
>>
>> @@ -381,13 +464,19 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>> *cmd_rc = papr_scm_meta_set(p, buf);
>> break;
>>
>> + case ND_CMD_CALL:
>> + call_pkg = nd_to_pdsm_cmd_pkg(buf);
>> + *cmd_rc = papr_scm_service_pdsm(p, call_pkg);
>> + break;
>> +
>> default:
>> - return -EINVAL;
>> + dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
>> + *cmd_rc = -EINVAL;
>
> Is this change related? If there is a bug where there is a caller of
> papr_scm_ndctl() with cmd_rc == NULL this should be a separate patch to fix
> that issue.
This simplifies debugging of errors reported in
papr_scm_ndctl() a bit, as it ensures that the subsequent dev_dbg "returned with
cmd_rc" is always logged.
I think this is too small a change to be carved out as an independent
patch. Also, it doesn't change the behaviour of the code except for logging
some more error info.
However, if you feel strongly about it I will spin a separate patch
in this patch series for this.
>
> Ira
>
>> }
>>
>> dev_dbg(&p->pdev->dev, "returned with cmd_rc = %d\n", *cmd_rc);
>>
>> - return 0;
>> + return *cmd_rc;
>> }
>>
>> static ssize_t flags_show(struct device *dev,
>> diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
>> index de5d90212409..0e09dc5cec19 100644
>> --- a/include/uapi/linux/ndctl.h
>> +++ b/include/uapi/linux/ndctl.h
>> @@ -244,6 +244,7 @@ struct nd_cmd_pkg {
>> #define NVDIMM_FAMILY_HPE2 2
>> #define NVDIMM_FAMILY_MSFT 3
>> #define NVDIMM_FAMILY_HYPERV 4
>> +#define NVDIMM_FAMILY_PAPR 5
>>
>> #define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
>> struct nd_cmd_pkg)
>> --
>> 2.26.2
>>
--
Cheers
~ Vaibhav
Hi Ira,
Thanks for reviewing this patch. My responses below:
Ira Weiny <[email protected]> writes:
> On Tue, Jun 02, 2020 at 03:44:38PM +0530, Vaibhav Jain wrote:
>> This patch implements support for PDSM request 'PAPR_PDSM_HEALTH'
>> that returns a newly introduced 'struct nd_papr_pdsm_health' instance
>> containing dimm health information back to user space in response to
>> ND_CMD_CALL. This functionality is implemented in newly introduced
>> papr_pdsm_health() that queries the nvdimm health information and
>> then copies this information to the package payload whose layout is
>> defined by 'struct nd_papr_pdsm_health'.
>>
>> The patch also introduces a new member 'struct papr_scm_priv.health'
>> thats an instance of 'struct nd_papr_pdsm_health' to cache the health
>> information of a nvdimm. As a result functions drc_pmem_query_health()
>> and flags_show() are updated to populate and use this new struct
>> instead of a u64 integer that was earlier used.
>>
>> Cc: "Aneesh Kumar K . V" <[email protected]>
>> Cc: Dan Williams <[email protected]>
>> Cc: Michael Ellerman <[email protected]>
>> Cc: Ira Weiny <[email protected]>
>> Reviewed-by: Aneesh Kumar K.V <[email protected]>
>> Signed-off-by: Vaibhav Jain <[email protected]>
>> ---
>> Changelog:
>>
>> Resend:
>> * Added ack from Aneesh.
>>
>> v8..v9:
>> * s/PAPR_SCM_PDSM_HEALTH/PAPR_PDSM_HEALTH/g [ Dan , Aneesh ]
>> * s/PAPR_SCM_PSDM_DIMM_*/PAPR_PDSM_DIMM_*/g
>> * Renamed papr_scm_get_health() to papr_psdm_health()
>> * Updated patch description to replace papr-scm dimm with nvdimm.
>>
>> v7..v8:
>> * None
>>
>> Resend:
>> * None
>>
>> v6..v7:
>> * Updated flags_show() to use seq_buf_printf(). [Mpe]
>> * Updated papr_scm_get_health() to use newly introduced
>> __drc_pmem_query_health() bypassing the cache [Mpe].
>>
>> v5..v6:
>> * Added attribute '__packed' to 'struct nd_papr_pdsm_health_v1' to
>> gaurd against possibility of different compilers adding different
>> paddings to the struct [ Dan Williams ]
>>
>> * Updated 'struct nd_papr_pdsm_health_v1' to use __u8 instead of
>> 'bool' and also updated drc_pmem_query_health() to take this into
>> account. [ Dan Williams ]
>>
>> v4..v5:
>> * None
>>
>> v3..v4:
>> * Call the DSM_PAPR_SCM_HEALTH service function from
>> papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh]
>>
>> v2..v3:
>> * Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx' types
>> as its exported to the userspace [Aneesh]
>> * Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm health
>> from enum to #defines [Aneesh]
>>
>> v1..v2:
>> * New patch in the series
>> ---
>> arch/powerpc/include/uapi/asm/papr_pdsm.h | 39 +++++++
>> arch/powerpc/platforms/pseries/papr_scm.c | 125 +++++++++++++++++++---
>> 2 files changed, 147 insertions(+), 17 deletions(-)
>>
>> diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> index 6407fefcc007..411725a91591 100644
>> --- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> @@ -115,6 +115,7 @@ struct nd_pdsm_cmd_pkg {
>> */
>> enum papr_pdsm {
>> PAPR_PDSM_MIN = 0x0,
>> + PAPR_PDSM_HEALTH,
>> PAPR_PDSM_MAX,
>> };
>>
>> @@ -133,4 +134,42 @@ static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
>> return (void *)(pcmd->payload);
>> }
>>
>> +/* Various nvdimm health indicators */
>> +#define PAPR_PDSM_DIMM_HEALTHY 0
>> +#define PAPR_PDSM_DIMM_UNHEALTHY 1
>> +#define PAPR_PDSM_DIMM_CRITICAL 2
>> +#define PAPR_PDSM_DIMM_FATAL 3
>> +
>> +/*
>> + * Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH
>> + * Various flags indicate the health status of the dimm.
>> + *
>> + * dimm_unarmed : Dimm not armed. So contents wont persist.
>> + * dimm_bad_shutdown : Previous shutdown did not persist contents.
>> + * dimm_bad_restore : Contents from previous shutdown werent restored.
>> + * dimm_scrubbed : Contents of the dimm have been scrubbed.
>> + * dimm_locked : Contents of the dimm cant be modified until CEC reboot
>> + * dimm_encrypted : Contents of dimm are encrypted.
>> + * dimm_health : Dimm health indicator. One of PAPR_PDSM_DIMM_XXXX
>> + */
>> +struct nd_papr_pdsm_health_v1 {
>> + __u8 dimm_unarmed;
>> + __u8 dimm_bad_shutdown;
>> + __u8 dimm_bad_restore;
>> + __u8 dimm_scrubbed;
>> + __u8 dimm_locked;
>> + __u8 dimm_encrypted;
>> + __u16 dimm_health;
>> +} __packed;
>> +
>> +/*
>> + * Typedef the current struct for dimm_health so that any application
>> + * or kernel recompiled after introducing a new version automatically
>> + * supports the new version.
>> + */
>> +#define nd_papr_pdsm_health nd_papr_pdsm_health_v1
>> +
>> +/* Current version number for the dimm health struct */
>
> This can't be the 'current' version. You will need a list of versions you
> support. Because if the user passes in an old version you need to be able to
> respond with that old version. Also if you plan to support 'return X for a Y
> query' then the user will need both X and Y defined to interpret X.
Yes, and that change will be introduced with the addition of version-2 of
nd_papr_pdsm_health. An earlier version of the patchset [1] had such a table
implemented. But to simplify the patchset, as we are only dealing with
version-1 of the structs right now, it was dropped.
[1] :
https://lore.kernel.org/linuxppc-dev/[email protected]/
>
>> +#define ND_PAPR_PDSM_HEALTH_VERSION 1
>> +
>> #endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>> index 5e2237e7ec08..c0606c0c659c 100644
>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> @@ -88,7 +88,7 @@ struct papr_scm_priv {
>> unsigned long lasthealth_jiffies;
>>
>> /* Health information for the dimm */
>> - u64 health_bitmap;
>> + struct nd_papr_pdsm_health health;
>
> ok so we are throwing away all the #defs from patch 1? Are they still valid?
>
> I'm confused that patch 3 added this and we are throwing it away
> here...
The #defines are still valid; only their usage moved to __drc_pmem_query_health().
>
>> };
>>
>> static int drc_pmem_bind(struct papr_scm_priv *p)
>> @@ -201,6 +201,7 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
>> static int __drc_pmem_query_health(struct papr_scm_priv *p)
>> {
>> unsigned long ret[PLPAR_HCALL_BUFSIZE];
>> + u64 health;
>> long rc;
>>
>> /* issue the hcall */
>> @@ -208,18 +209,46 @@ static int __drc_pmem_query_health(struct papr_scm_priv *p)
>> if (rc != H_SUCCESS) {
>> dev_err(&p->pdev->dev,
>> "Failed to query health information, Err:%ld\n", rc);
>> - rc = -ENXIO;
>> - goto out;
>> + return -ENXIO;
>
> I missed this... probably did not need the goto in the first patch?
Yes, will get rid of the goto from patch-1.
>
>> }
>>
>> p->lasthealth_jiffies = jiffies;
>> - p->health_bitmap = ret[0] & ret[1];
>> + health = ret[0] & ret[1];
>>
>> dev_dbg(&p->pdev->dev,
>> "Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
>> ret[0], ret[1]);
>> -out:
>> - return rc;
>> +
>> + memset(&p->health, 0, sizeof(p->health));
>> +
>> + /* Check for various masks in bitmap and set the buffer */
>> + if (health & PAPR_PMEM_UNARMED_MASK)
>
> Oh ok... odd. (don't add code then just take it away in a series)
> You could have lead with the user structure and put this code in patch
> 3.
The struct nd_papr_pdsm_health is only introduced in this patch, in the header
'papr_pdsm.h', as a means of exchanging nvdimm health information with
userspace. Introducing this struct without introducing the necessary
scaffolding in 'papr_pdsm.h' would have been very counter-intuitive.
>
> Why does the user need u8 to represent a single bit? Does this help protect
> against endian issues?
This was 'bool' earlier, but the type 'bool' isn't suitable for an ioctl ABI.
I also wanted to avoid bit fields here, as I am not sure whether their packing
differs across compilers, hence they were replaced with u8.
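A cut-down sketch of the idea follows: one byte per flag, decoded from the
bitmap. The 'DEMO_*' bit positions are made up for illustration and are not
the real PAPR_PMEM_* masks from papr_scm.c; the struct is a trimmed-down
stand-in for 'struct nd_papr_pdsm_health_v1'.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit positions, NOT the real PAPR_PMEM_* mask values. */
#define DEMO_UNARMED_MASK	(1ULL << 0)
#define DEMO_ENCRYPTED_MASK	(1ULL << 1)

/* One byte per flag: layout is the same for every compiler. */
struct demo_health {
	uint8_t dimm_unarmed;
	uint8_t dimm_encrypted;
};

/* Translate the health bitmap into byte-wide flags. */
static inline void demo_decode_health(uint64_t bitmap, struct demo_health *h)
{
	memset(h, 0, sizeof(*h));

	if (bitmap & DEMO_UNARMED_MASK)
		h->dimm_unarmed = 1;

	if (bitmap & DEMO_ENCRYPTED_MASK)
		h->dimm_encrypted = 1;
}
```

Since every member is a single byte there is no padding and no byte-order
concern for the flags themselves.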
>
>> + p->health.dimm_unarmed = 1;
>> +
>> + if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
>> + p->health.dimm_bad_shutdown = 1;
>> +
>> + if (health & PAPR_PMEM_BAD_RESTORE_MASK)
>> + p->health.dimm_bad_restore = 1;
>> +
>> + if (health & PAPR_PMEM_ENCRYPTED)
>> + p->health.dimm_encrypted = 1;
>> +
>> + if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED) {
>> + p->health.dimm_locked = 1;
>> + p->health.dimm_scrubbed = 1;
>> + }
>> +
>> + if (health & PAPR_PMEM_HEALTH_UNHEALTHY)
>> + p->health.dimm_health = PAPR_PDSM_DIMM_UNHEALTHY;
>> +
>> + if (health & PAPR_PMEM_HEALTH_CRITICAL)
>> + p->health.dimm_health = PAPR_PDSM_DIMM_CRITICAL;
>> +
>> + if (health & PAPR_PMEM_HEALTH_FATAL)
>> + p->health.dimm_health = PAPR_PDSM_DIMM_FATAL;
>> +
>> + return 0;
>> }
>>
>> /* Min interval in seconds for assuming stable dimm health */
>> @@ -403,6 +432,58 @@ static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>> return 0;
>> }
>>
>> +/* Fetch the DIMM health info and populate it in provided package. */
>> +static int papr_pdsm_health(struct papr_scm_priv *p,
>> + struct nd_pdsm_cmd_pkg *pkg)
>> +{
>> + int rc;
>> + size_t copysize = sizeof(p->health);
>> +
>> + /* Ensure dimm health mutex is taken preventing concurrent access */
>> + rc = mutex_lock_interruptible(&p->health_mutex);
>> + if (rc)
>> + goto out;
>> +
>> + /* Always fetch upto date dimm health data ignoring cached values */
>> + rc = __drc_pmem_query_health(p);
>> + if (rc)
>> + goto out_unlock;
>> + /*
>> + * If the requested payload version is greater than one we know
>> + * about, return the payload version we know about and let
>> + * caller/userspace handle.
>> + */
>> + if (pkg->payload_version > ND_PAPR_PDSM_HEALTH_VERSION)
>> + pkg->payload_version = ND_PAPR_PDSM_HEALTH_VERSION;
>
> I know this seems easy now but I do think you will run into trouble later.
I did address this in an earlier iteration of this patchset [1] and
dropped it in favour of simplicity.
[1] :
https://lore.kernel.org/linuxppc-dev/[email protected]/
> Ira
>
>> +
>> + if (pkg->hdr.nd_size_out < copysize) {
>> + dev_dbg(&p->pdev->dev, "Truncated payload (%u). Expected (%lu)",
>> + pkg->hdr.nd_size_out, copysize);
>> + rc = -ENOSPC;
>> + goto out_unlock;
>> + }
>> +
>> + dev_dbg(&p->pdev->dev, "Copying payload size=%lu version=0x%x\n",
>> + copysize, pkg->payload_version);
>> +
>> + /* Copy the health struct to the payload */
>> + memcpy(pdsm_cmd_to_payload(pkg), &p->health, copysize);
>> + pkg->hdr.nd_fw_size = copysize;
>> +
>> +out_unlock:
>> + mutex_unlock(&p->health_mutex);
>> +
>> +out:
>> + /*
>> + * Put the error in out package and return success from function
>> + * so that errors if any are propogated back to userspace.
>> + */
>> + pkg->cmd_status = rc;
>> + dev_dbg(&p->pdev->dev, "completion code = %d\n", rc);
>> +
>> + return 0;
>> +}
>> +
>> static int papr_scm_service_pdsm(struct papr_scm_priv *p,
>> struct nd_pdsm_cmd_pkg *call_pkg)
>> {
>> @@ -417,6 +498,9 @@ static int papr_scm_service_pdsm(struct papr_scm_priv *p,
>>
>> /* Depending on the DSM command call appropriate service routine */
>> switch (call_pkg->hdr.nd_command) {
>> + case PAPR_PDSM_HEALTH:
>> + return papr_pdsm_health(p, call_pkg);
>> +
>> default:
>> dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
>> call_pkg->hdr.nd_command);
>> @@ -485,34 +569,41 @@ static ssize_t flags_show(struct device *dev,
>> struct nvdimm *dimm = to_nvdimm(dev);
>> struct papr_scm_priv *p = nvdimm_provider_data(dimm);
>> struct seq_buf s;
>> - u64 health;
>> int rc;
>>
>> rc = drc_pmem_query_health(p);
>> if (rc)
>> return rc;
>>
>> - /* Copy health_bitmap locally, check masks & update out buffer */
>> - health = READ_ONCE(p->health_bitmap);
>> -
>> seq_buf_init(&s, buf, PAGE_SIZE);
>> - if (health & PAPR_PMEM_UNARMED_MASK)
>> +
>> + /* Protect concurrent modifications to papr_scm_priv */
>> + rc = mutex_lock_interruptible(&p->health_mutex);
>> + if (rc)
>> + return rc;
>> +
>> + if (p->health.dimm_unarmed)
>> seq_buf_printf(&s, "not_armed ");
>>
>> - if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
>> + if (p->health.dimm_bad_shutdown)
>> seq_buf_printf(&s, "flush_fail ");
>>
>> - if (health & PAPR_PMEM_BAD_RESTORE_MASK)
>> + if (p->health.dimm_bad_restore)
>> seq_buf_printf(&s, "restore_fail ");
>>
>> - if (health & PAPR_PMEM_ENCRYPTED)
>> + if (p->health.dimm_encrypted)
>> seq_buf_printf(&s, "encrypted ");
>>
>> - if (health & PAPR_PMEM_SMART_EVENT_MASK)
>> + if (p->health.dimm_health)
>> seq_buf_printf(&s, "smart_notify ");
>>
>> - if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED)
>> - seq_buf_printf(&s, "scrubbed locked ");
>> + if (p->health.dimm_scrubbed)
>> + seq_buf_printf(&s, "scrubbed ");
>> +
>> + if (p->health.dimm_locked)
>> + seq_buf_printf(&s, "locked ");
>> +
>> + mutex_unlock(&p->health_mutex);
>>
>> if (seq_buf_used(&s))
>> seq_buf_printf(&s, "\n");
>> --
>> 2.26.2
>>
--
Cheers
~ Vaibhav
Hi Ira,
Thanks again for reviewing this patch. My responses below:
Ira Weiny <[email protected]> writes:
> On Tue, Jun 02, 2020 at 01:51:49PM -0700, 'Ira Weiny' wrote:
>> On Tue, Jun 02, 2020 at 03:44:37PM +0530, Vaibhav Jain wrote:
>
> ...
>
>> > +
>> > +/*
>> > + * PDSM Envelope:
>> > + *
>> > + * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
>> > + * envelope which consists of a header and user-defined payload sections.
>> > + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
>> > + * payload following it and accessible via 'nd_pdsm_cmd_pkg.payload' field.
>> > + * There is reserved field that can used to introduce new fields to the
>> > + * structure in future. It also tries to ensure that 'nd_pdsm_cmd_pkg.payload'
>> > + * lies at a 8-byte boundary.
>> > + *
>> > + * +-------------+---------------------+---------------------------+
>> > + * | 64-Bytes | 16-Bytes | Max 176-Bytes |
>> > + * +-------------+---------------------+---------------------------+
>> > + * | nd_pdsm_cmd_pkg | |
>> > + * |-------------+ | |
>> > + * | nd_cmd_pkg | | |
>> > + * +-------------+---------------------+---------------------------+
>> > + * | nd_family | | |
>> > + * | nd_size_out | cmd_status | |
>> > + * | nd_size_in | payload_version | payload |
>> > + * | nd_command | reserved | |
>> > + * | nd_fw_size | | |
>> > + * +-------------+---------------------+---------------------------+
>
> One more comment WRT nd_size_[in|out]. I know that it is defined as the size
> of the FW payload but normally when you nest headers 'size' in Header A
> represents everything after Header A, including Header B. In this case that
> would be including nd_pdsm_cmd_pkg...
>
> It looks like that is not what you have done? Or perhaps I missed it?
>
Not sure if I understand the question correctly.
'struct nd_pdsm_cmd_pkg' contains 'struct nd_cmd_pkg' at its head, and
its size_[in|out] are populated by libndctl in userspace, setting
them to the size of the data following the 'struct nd_cmd_pkg'.
Copying of 'struct nd_cmd_pkg' to the input/output envelope is implicitly
done in __nd_ioctl via the command descriptor array
__nd_cmd_bus_descs. So I don't need to add the size of 'struct
nd_cmd_pkg' to nd_size_[in|out].
> Ira
>
>> > + *
>> > + * PDSM Header:
>> > + *
>> > + * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a
>> > + * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to member
>> > + * 'nd_cmd_pkg.nd_command'. Apart from size information of the envelope which is
>> > + * contained in 'struct nd_cmd_pkg', the header also has members following
>> ^^^^^
>> ... the ...
>>
>> > + * members:
>> > + *
>> > + * 'cmd_status' : (Out) Errors if any encountered while servicing PDSM.
>> > + * 'payload_version' : (In/Out) Version number associated with the payload.
>> > + * 'reserved' : Not used and reserved for future.
>> > + *
>> > + * PDSM Payload:
>> > + *
>> > + * The layout of the PDSM Payload is defined by various structs shared between
>> > + * papr_scm and libndctl so that contents of payload can be interpreted. During
>> > + * servicing of a PDSM the papr_scm module will read input args from the payload
>> > + * field by casting its contents to an appropriate struct pointer based on the
>> > + * PDSM command. Similarly the output of servicing the PDSM command will be
>> > + * copied to the payload field using the same struct.
>> > + *
>> > + * 'libnvdimm' enforces a hard limit of 256 bytes on the envelope size, which
>> > + * leaves around 176 bytes for the envelope payload (ignoring any padding that
>> > + * the compiler may silently introduce).
>> > + *
>> > + * Payload Version:
>> > + *
>> > + * A 'payload_version' field is present in PDSM header that indicates a specific
>> > + * version of the structure present in PDSM Payload for a given PDSM command.
>> > + * This provides backward compatibility in case the PDSM Payload structure
>> > + * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
>> > + *
>> > + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
>> > + * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
>> > + * module when servicing the PDSM envelope checks the 'payload_version' and then
>> > + * uses 'payload struct version' == MIN('payload_version field',
>> > + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
>> > + * After servicing the PDSM, 'papr_scm' put the negotiated version of payload
>> > + * struct in returned 'payload_version' field.
>> > + *
>> > + * Libndctl on receiving the envelope back from papr_scm again checks the
>> > + * 'payload_version' field and based on it use the appropriate version dsm
>> > + * struct to parse the results.
>> > + *
>> > + * Backward Compatibility:
>> > + *
>> > + * Above scheme of exchanging different versioned PDSM struct between libndctl
>> > + * and papr_scm should provide backward compatibility until following two
>> > + * assumptions/conditions when defining new PDSM structs hold:
>> > + *
>> > + * Let T(X) = { set of attributes in PDSM struct 'T' versioned X }
>> > + *
>> > + * 1. T(X) is a proper subset of T(Y) if Y > X.
>> > + * i.e Each new version of PDSM struct should retain existing struct
>> > + * attributes from previous version
>> > + *
>> > + * 2. If an entity (libndctl or papr_scm) supports a PDSM struct T(X) then
>> > + * it should also support T(1), T(2)...T(X - 1).
>> > + * i.e When adding support for new version of a PDSM struct, libndctl
>> > + * and papr_scm should retain support of the existing PDSM struct
>> > + * version they support.
>>
>> Please see this thread for an example why versions are a bad idea in UAPIs:
>>
>> https://lkml.org/lkml/2020/3/26/213
>>
>> While the use of version is different in that thread the fundamental issues are
>> the same. You end up with some weird matrix of supported features and
>> structure definitions. For example, you are opening up the possibility of
>> changing structures with a different version for no good reason.
>>
>> Also having the user query with version Z and get back version X (older) is
>> odd. Generally if the kernel does not know about a feature (ie version Z of
>> the structure) it should return -EINVAL and let the user figure out what to do.
>> The user may just give up or they could try a different query.
>>
>> > + */
>> > +
>> > +/* PDSM-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
>> > +struct nd_pdsm_cmd_pkg {
>> > + struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
>> > + __s32 cmd_status; /* Out: Sub-cmd status returned back */
>> > + __u16 reserved[5]; /* Ignored and to be used in future */
>>
>> How do you know when reserved is used for something else in the future? Is
>> reserved guaranteed (and checked by the code) to be 0?
>>
>> > + __u16 payload_version; /* In/Out: version of the payload */
>>
>> Why is payload_version after reserved?
>>
>> > + __u8 payload[]; /* In/Out: Sub-cmd data buffer */
>> > +} __packed;
>> > +
>> > +/*
>> > + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
>> > + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
>> > + */
>> > +enum papr_pdsm {
>> > + PAPR_PDSM_MIN = 0x0,
>> > + PAPR_PDSM_MAX,
>> > +};
>> > +
>> > +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
>> > +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
>> > +{
>> > + return (struct nd_pdsm_cmd_pkg *) cmd;
>> > +}
>> > +
>> > +/* Return the payload pointer for a given pcmd */
>> > +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
>> > +{
>> > + if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
>> > + return NULL;
>> > + else
>> > + return (void *)(pcmd->payload);
>> > +}
>> > +
>> > +#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
>> > diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>> > index 149431594839..5e2237e7ec08 100644
>> > --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> > +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> > @@ -15,13 +15,15 @@
>> > #include <linux/seq_buf.h>
>> >
>> > #include <asm/plpar_wrappers.h>
>> > +#include <asm/papr_pdsm.h>
>> >
>> > #define BIND_ANY_ADDR (~0ul)
>> >
>> > #define PAPR_SCM_DIMM_CMD_MASK \
>> > ((1ul << ND_CMD_GET_CONFIG_SIZE) | \
>> > (1ul << ND_CMD_GET_CONFIG_DATA) | \
>> > - (1ul << ND_CMD_SET_CONFIG_DATA))
>> > + (1ul << ND_CMD_SET_CONFIG_DATA) | \
>> > + (1ul << ND_CMD_CALL))
>> >
>> > /* DIMM health bitmap bitmap indicators */
>> > /* SCM device is unable to persist memory contents */
>> > @@ -350,16 +352,97 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
>> > return 0;
>> > }
>> >
>> > +/*
>> > + * Validate the inputs args to dimm-control function and return '0' if valid.
>> > + * This also does initial sanity validation to ND_CMD_CALL sub-command packages.
>> > + */
>> > +static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>> > + unsigned int buf_len)
>> > +{
>> > + unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
>> > + struct nd_pdsm_cmd_pkg *pkg = nd_to_pdsm_cmd_pkg(buf);
>> > + struct papr_scm_priv *p;
>> > +
>> > + /* Only dimm-specific calls are supported atm */
>> > + if (!nvdimm)
>> > + return -EINVAL;
>> > +
>> > + /* get the provider date from struct nvdimm */
>>
>> s/date/data
>>
>> > + p = nvdimm_provider_data(nvdimm);
>> > +
>> > + if (!test_bit(cmd, &cmd_mask)) {
>> > + dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
>> > + return -EINVAL;
>> > + } else if (cmd == ND_CMD_CALL) {
>> > +
>> > + /* Verify the envelope package */
>> > + if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
>> > + dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
>> > + buf_len);
>> > + return -EINVAL;
>> > + }
>> > +
>> > + /* Verify that the PDSM family is valid */
>> > + if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR) {
>> > + dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
>> > + pkg->hdr.nd_family);
>> > + return -EINVAL;
>> > +
>> > + }
>> > +
>> > + /* We except a payload with all PDSM commands */
>> > + if (pdsm_cmd_to_payload(pkg) == NULL) {
>> > + dev_dbg(&p->pdev->dev,
>> > + "Empty payload for sub-command=0x%llx\n",
>> > + pkg->hdr.nd_command);
>> > + return -EINVAL;
>> > + }
>> > + }
>> > +
>> > + /* Command looks valid */
>>
>> I assume the first command to be implemented also checks the { nd_command,
>> payload_version, payload length } for correctness?
>>
>> > + return 0;
>> > +}
>> > +
>> > +static int papr_scm_service_pdsm(struct papr_scm_priv *p,
>> > + struct nd_pdsm_cmd_pkg *call_pkg)
>> > +{
>> > + /* unknown subcommands return error in packages */
>> > + if (call_pkg->hdr.nd_command <= PAPR_PDSM_MIN ||
>> > + call_pkg->hdr.nd_command >= PAPR_PDSM_MAX) {
>> > + dev_dbg(&p->pdev->dev, "Invalid PDSM request 0x%llx\n",
>> > + call_pkg->hdr.nd_command);
>> > + call_pkg->cmd_status = -EINVAL;
>> > + return 0;
>> > + }
>> > +
>> > + /* Depending on the DSM command call appropriate service routine */
>> > + switch (call_pkg->hdr.nd_command) {
>> > + default:
>> > + dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
>> > + call_pkg->hdr.nd_command);
>> > + call_pkg->cmd_status = -ENOENT;
>> > + return 0;
>> > + }
>> > +}
>> > +
>> > static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>> > struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>> > unsigned int buf_len, int *cmd_rc)
>> > {
>> > struct nd_cmd_get_config_size *get_size_hdr;
>> > struct papr_scm_priv *p;
>> > + struct nd_pdsm_cmd_pkg *call_pkg = NULL;
>> > + int rc;
>> >
>> > - /* Only dimm-specific calls are supported atm */
>> > - if (!nvdimm)
>> > - return -EINVAL;
>> > + /* Use a local variable in case cmd_rc pointer is NULL */
>> > + if (cmd_rc == NULL)
>> > + cmd_rc = &rc;
>>
>> Why is this needed? AFAICT The caller of papr_scm_ndctl does not specify null
>> and you did not change it.
>>
>> > +
>> > + *cmd_rc = is_cmd_valid(nvdimm, cmd, buf, buf_len);
>> > + if (*cmd_rc) {
>> > + pr_debug("Invalid cmd=0x%x. Err=%d\n", cmd, *cmd_rc);
>> > + return *cmd_rc;
>> > + }
>> >
>> > p = nvdimm_provider_data(nvdimm);
>> >
>> > @@ -381,13 +464,19 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>> > *cmd_rc = papr_scm_meta_set(p, buf);
>> > break;
>> >
>> > + case ND_CMD_CALL:
>> > + call_pkg = nd_to_pdsm_cmd_pkg(buf);
>> > + *cmd_rc = papr_scm_service_pdsm(p, call_pkg);
>> > + break;
>> > +
>> > default:
>> > - return -EINVAL;
>> > + dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
>> > + *cmd_rc = -EINVAL;
>>
>> Is this change related? If there is a bug where there is a caller of
>> papr_scm_ndctl() with cmd_rc == NULL this should be a separate patch to fix
>> that issue.
>>
>> Ira
>>
>> > }
>> >
>> > dev_dbg(&p->pdev->dev, "returned with cmd_rc = %d\n", *cmd_rc);
>> >
>> > - return 0;
>> > + return *cmd_rc;
>> > }
>> >
>> > static ssize_t flags_show(struct device *dev,
>> > diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
>> > index de5d90212409..0e09dc5cec19 100644
>> > --- a/include/uapi/linux/ndctl.h
>> > +++ b/include/uapi/linux/ndctl.h
>> > @@ -244,6 +244,7 @@ struct nd_cmd_pkg {
>> > #define NVDIMM_FAMILY_HPE2 2
>> > #define NVDIMM_FAMILY_MSFT 3
>> > #define NVDIMM_FAMILY_HYPERV 4
>> > +#define NVDIMM_FAMILY_PAPR 5
>> >
>> > #define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
>> > struct nd_cmd_pkg)
>> > --
>> > 2.26.2
>> >
--
Cheers
~ Vaibhav
On Wed, Jun 03, 2020 at 11:41:42PM +0530, Vaibhav Jain wrote:
> Hi Ira,
>
> Thanks for reviewing this patch. My responses below:
>
> Ira Weiny <[email protected]> writes:
>
...
> >> + *
> >> + * Payload Version:
> >> + *
> >> + * A 'payload_version' field is present in PDSM header that indicates a specific
> >> + * version of the structure present in PDSM Payload for a given PDSM command.
> >> + * This provides backward compatibility in case the PDSM Payload structure
> >> + * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
> >> + *
> >> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
> >> + * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
> >> + * module when servicing the PDSM envelope checks the 'payload_version' and then
> >> + * uses 'payload struct version' == MIN('payload_version field',
> >> + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
> >> + * After servicing the PDSM, 'papr_scm' put the negotiated version of payload
> >> + * struct in returned 'payload_version' field.
> >> + *
> >> + * Libndctl on receiving the envelope back from papr_scm again checks the
> >> + * 'payload_version' field and based on it use the appropriate version dsm
> >> + * struct to parse the results.
> >> + *
> >> + * Backward Compatibility:
> >> + *
> >> + * Above scheme of exchanging different versioned PDSM struct between libndctl
> >> + * and papr_scm should provide backward compatibility until following two
> >> + * assumptions/conditions when defining new PDSM structs hold:
> >> + *
> >> + * Let T(X) = { set of attributes in PDSM struct 'T' versioned X }
> >> + *
> >> + * 1. T(X) is a proper subset of T(Y) if Y > X.
> >> + * i.e Each new version of PDSM struct should retain existing struct
> >> + * attributes from previous version
> >> + *
> >> + * 2. If an entity (libndctl or papr_scm) supports a PDSM struct T(X) then
> >> + * it should also support T(1), T(2)...T(X - 1).
> >> + * i.e When adding support for new version of a PDSM struct, libndctl
> >> + * and papr_scm should retain support of the existing PDSM struct
> >> + * version they support.
> >
> > Please see this thread for an example why versions are a bad idea in UAPIs:
> >
> > https://lkml.org/lkml/2020/3/26/213
> >
>
> > While the use of version is different in that thread the fundamental issues are
> > the same. You end up with some weird matrix of supported features and
> > structure definitions. For example, you are opening up the possibility of
> > changing structures with a different version for no good reason.
>
> Not really sure I understand the statement correctly "you are opening up
> the possibility of changing structures with a different version for no
> good reason."
What I mean is:
struct v1 {
u32 x;
u32 y;
};
struct v2 {
u32 y;
u32 x;
};
x and y are the same data but you have now redefined the order of the struct.
You don't need that flexibility/complexity.
Generally I think you are defining:
struct v1 {
u32 x;
u32 y;
};
struct v2 {
u32 x;
u32 y;
u32 z;
u32 a;
};
Which becomes 2 structures... There is no need.
The easiest thing to do is:
struct user_data {
u32 x;
u32 y;
};
And later you modify user_data to:
struct user_data {
u32 x;
u32 y;
u32 z;
u32 a;
};
libndctl always passes sizeof(struct user_data) to the call. [Do ensure
structures are 64-bit aligned for this to work.]
The kernel sees the size and returns the amount of data up to that size.
Therefore, older kernels automatically fill in x and y, newer kernels fill in
z/a if the buffer was big enough. libndctl only uses the fields it knows about.
It is _much_ easier this way. Almost nothing needs to get changed as versions
roll forward. The only big issue is if libndctl _needs_ z then it has to check
if z is returned.
In that case add a cap_mask with bit fields which the kernel can fill in for
which fields are valid.
struct user_data {
u64 cap_mask; /* where bits define extra future capabilities */
u32 x;
u32 y;
};
IFF you need to add data within fields which are reserved you can use
capability flags to indicate which fields are requested and which are returned
by the kernel.
But I _think_ for what you want, libndctl must survive if z/a are not available,
right? So just adding to the structure should be fine.
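To make the size-based scheme above concrete, here is a minimal userspace
sketch. The helper name fill_user_data() is hypothetical and not part of the
patch, and a real kernel path would use copy_to_user() rather than memcpy().
It only shows that copying min(requested size, known size) is all the logic
needed on the kernel side.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical payload; z and a were appended in a later revision. */
struct user_data {
	uint32_t x;
	uint32_t y;
	uint32_t z;
	uint32_t a;
};

/*
 * Hypothetical kernel-side helper: copy at most 'user_size' bytes of what
 * the kernel knows about and return the number of bytes actually written.
 * An older caller passing a smaller buffer simply gets a truncated copy.
 */
static size_t fill_user_data(void *user_buf, size_t user_size,
			     const struct user_data *kdata)
{
	size_t n = user_size < sizeof(*kdata) ? user_size : sizeof(*kdata);

	memcpy(user_buf, kdata, n);
	return n;
}
```

An older libndctl that only knows about x and y passes an 8 byte buffer and
gets back exactly those two fields; a newer one passes the full structure and
gets z and a as well.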
> We want to return more data in the struct in future iterations. So
> 'changing structure with different version' is something we are
> expecting.
>
> With the backward compatibility constraints 1 & 2 above, it will ensure
> that support matrix looks like a lower triangular matrix with each
> successive version supporting previous version attributes. So supporting
> future versions is relatively simplified.
But you end up with weird switch/if's etc to deal with the multiple structures.
With the size method the kernel simply returns the same size data as the user
requested and everything is done. No logic required at all. Literally it can
just copy the data it has (truncating if necessary).
>
> >
> > Also having the user query with version Z and get back version X (older) is
> > odd. Generally if the kernel does not know about a feature (ie version Z of
> > the structure) it should return -EINVAL and let the user figure out what to do.
> > The user may just give up or they could try a different query.
> >
> Considering the flow of ndctl/libndctl this is needed. libndctl will
> usually issue only one CMD_CALL ioctl to kernel and if that fails then
> an error is reported and ndctl will exit losing state.
>
> Adding mechanism in libndctl to reissue CMD_CALL ioctl to fetch an
> appropriate version of pdsm struct is going to be considerably more
> work.
>
> This version fall-back mechanism ensures that libndctl will receive
> usable data without having to reissue more CMD_CALL ioctls.
Define usable?
What happens if libndctl does not get 'z' in my example above? What does it
do? If I understand correctly it does not _need_ z. So why have a check on
the version from the kernel?
What if we change to:
struct v3 {
u32 x;
u32 y;
u32 z;
u32 a;
u32 b;
u32 c;
};
Now it has to do:
    if (version 2)
        z/a valid, do something()
    if (version 3)
        b/c valid, do something else()
If z, a, b, c are all 0 does it matter?
If not, the logic above disappears.
If so, then you need a cap mask. Then the kernel can say c and a are valid
(but c is 0) or other flexible stuff like that.
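A rough sketch of the cap mask idea, in case it helps. The bit names CAP_Z_A
and CAP_B_C and the helper field_valid() are made up for illustration; only
the general shape (a u64 mask at the head of the structure, filled in by the
kernel) is what I am suggesting.

```c
#include <stdint.h>

/* Hypothetical capability bits; one per group of later-added fields. */
#define CAP_Z_A (1ULL << 0)	/* z and a carry valid data */
#define CAP_B_C (1ULL << 1)	/* b and c carry valid data */

/* Hypothetical payload: 8 + 6 * 4 = 32 bytes, so it stays 64-bit aligned. */
struct user_data {
	uint64_t cap_mask;	/* set by the kernel: which fields are valid */
	uint32_t x;
	uint32_t y;
	uint32_t z;
	uint32_t a;
	uint32_t b;
	uint32_t c;
};

/* Returns non-zero only if every bit in 'cap' was reported by the kernel. */
static int field_valid(const struct user_data *d, uint64_t cap)
{
	return (d->cap_mask & cap) == cap;
}
```

With this, a field that is valid but happens to be 0 can be told apart from a
field the kernel never filled in, without any version comparison.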
>
> >> + */
> >> +
> >> +/* PDSM-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
> >> +struct nd_pdsm_cmd_pkg {
> >> + struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
> >> + __s32 cmd_status; /* Out: Sub-cmd status returned back */
> >> + __u16 reserved[5]; /* Ignored and to be used in future */
> >
> > How do you know when reserved is used for something else in the future? Is
> > reserved guaranteed (and checked by the code) to be 0?
>
> For current set of pdsm requests ignore these reserved fields. However a
> future pdsm request can leverage these reserved fields. So papr_scm
> just bind the usage of these fields with the value of
> 'nd_cmd_pkg.nd_command' that indicates the pdsm request.
>
> That being said checking if the reserved fields are set to 0 will be a
> good measure. Will add this check in next iteration.
Exactly, if you don't check them now you will end up with an older libndctl
which passes in garbage and breaks future users... Basically rendering the
reserved fields useless.
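Something along these lines would do for the check. This is only a sketch:
'struct pdsm_hdr' is a cut-down, hypothetical stand-in for the header in the
patch and check_reserved() is not an existing function.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down header mirroring the reserved[] array in the patch. */
struct pdsm_hdr {
	int32_t  cmd_status;
	uint16_t reserved[5];
	uint16_t payload_version;
};

/* Reject the request unless every reserved word is zero. */
static int check_reserved(const struct pdsm_hdr *hdr)
{
	size_t i;

	for (i = 0; i < sizeof(hdr->reserved) / sizeof(hdr->reserved[0]); i++)
		if (hdr->reserved[i] != 0)
			return -EINVAL;
	return 0;
}
```

Rejecting non-zero values from day one is what lets a later kernel assign
meaning to those words without breaking an older libndctl.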
>
> >
> >> + __u16 payload_version; /* In/Out: version of the payload */
> >
> > Why is payload_version after reserved?
> Want to place the payload version field just before the payload data so
> that it can be accessed with simple pointer arithmetic.
I did not see that in the patch. I thought you were using
nd_pdsm_cmd_pkg->payload_version?
>
> >
> >> + __u8 payload[]; /* In/Out: Sub-cmd data buffer */
> >> +} __packed;
> >> +
> >> +/*
> >> + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
> >> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
> >> + */
> >> +enum papr_pdsm {
> >> + PAPR_PDSM_MIN = 0x0,
> >> + PAPR_PDSM_MAX,
> >> +};
> >> +
> >> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
> >> +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
> >> +{
> >> + return (struct nd_pdsm_cmd_pkg *) cmd;
> >> +}
> >> +
> >> +/* Return the payload pointer for a given pcmd */
> >> +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
> >> +{
> >> + if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
> >> + return NULL;
> >> + else
> >> + return (void *)(pcmd->payload);
> >> +}
> >> +
> >> +#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> >> index 149431594839..5e2237e7ec08 100644
> >> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> >> @@ -15,13 +15,15 @@
> >> #include <linux/seq_buf.h>
> >>
> >> #include <asm/plpar_wrappers.h>
> >> +#include <asm/papr_pdsm.h>
> >>
> >> #define BIND_ANY_ADDR (~0ul)
> >>
> >> #define PAPR_SCM_DIMM_CMD_MASK \
> >> ((1ul << ND_CMD_GET_CONFIG_SIZE) | \
> >> (1ul << ND_CMD_GET_CONFIG_DATA) | \
> >> - (1ul << ND_CMD_SET_CONFIG_DATA))
> >> + (1ul << ND_CMD_SET_CONFIG_DATA) | \
> >> + (1ul << ND_CMD_CALL))
> >>
> >> /* DIMM health bitmap bitmap indicators */
> >> /* SCM device is unable to persist memory contents */
> >> @@ -350,16 +352,97 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
> >> return 0;
> >> }
> >>
> >> +/*
> >> + * Validate the inputs args to dimm-control function and return '0' if valid.
> >> + * This also does initial sanity validation to ND_CMD_CALL sub-command packages.
> >> + */
> >> +static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
> >> + unsigned int buf_len)
> >> +{
> >> + unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
> >> + struct nd_pdsm_cmd_pkg *pkg = nd_to_pdsm_cmd_pkg(buf);
> >> + struct papr_scm_priv *p;
> >> +
> >> + /* Only dimm-specific calls are supported atm */
> >> + if (!nvdimm)
> >> + return -EINVAL;
> >> +
> >> + /* get the provider date from struct nvdimm */
> >
> > s/date/data
> Thanks for pointing this out. Will fix this in next iteration.
>
> >
> >> + p = nvdimm_provider_data(nvdimm);
> >> +
> >> + if (!test_bit(cmd, &cmd_mask)) {
> >> + dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
> >> + return -EINVAL;
> >> + } else if (cmd == ND_CMD_CALL) {
> >> +
> >> + /* Verify the envelope package */
> >> + if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
> >> + dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
> >> + buf_len);
> >> + return -EINVAL;
> >> + }
> >> +
> >> + /* Verify that the PDSM family is valid */
> >> + if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR) {
> >> + dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
> >> + pkg->hdr.nd_family);
> >> + return -EINVAL;
> >> +
> >> + }
> >> +
> >> + /* We except a payload with all PDSM commands */
> >> + if (pdsm_cmd_to_payload(pkg) == NULL) {
> >> + dev_dbg(&p->pdev->dev,
> >> + "Empty payload for sub-command=0x%llx\n",
> >> + pkg->hdr.nd_command);
> >> + return -EINVAL;
> >> + }
> >> + }
> >> +
> >> + /* Command looks valid */
> >
> > I assume the first command to be implemented also checks the { nd_command,
> > payload_version, payload length } for correctness?
> Yes the pdsm service functions do check the payload_version and
> payload_length. Please see the papr_pdsm_health() that services the
> PAPR_PDSM_HEALTH pdsm in Patch-5
>
cool.
> >
> >> + return 0;
> >> +}
> >> +
> >> +static int papr_scm_service_pdsm(struct papr_scm_priv *p,
> >> + struct nd_pdsm_cmd_pkg *call_pkg)
> >> +{
> >> + /* unknown subcommands return error in packages */
> >> + if (call_pkg->hdr.nd_command <= PAPR_PDSM_MIN ||
> >> + call_pkg->hdr.nd_command >= PAPR_PDSM_MAX) {
> >> + dev_dbg(&p->pdev->dev, "Invalid PDSM request 0x%llx\n",
> >> + call_pkg->hdr.nd_command);
> >> + call_pkg->cmd_status = -EINVAL;
> >> + return 0;
> >> + }
> >> +
> >> + /* Depending on the DSM command call appropriate service routine */
> >> + switch (call_pkg->hdr.nd_command) {
> >> + default:
> >> + dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
> >> + call_pkg->hdr.nd_command);
> >> + call_pkg->cmd_status = -ENOENT;
> >> + return 0;
> >> + }
> >> +}
> >> +
> >> static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
> >> struct nvdimm *nvdimm, unsigned int cmd, void *buf,
> >> unsigned int buf_len, int *cmd_rc)
> >> {
> >> struct nd_cmd_get_config_size *get_size_hdr;
> >> struct papr_scm_priv *p;
> >> + struct nd_pdsm_cmd_pkg *call_pkg = NULL;
> >> + int rc;
> >>
> >> - /* Only dimm-specific calls are supported atm */
> >> - if (!nvdimm)
> >> - return -EINVAL;
> >> + /* Use a local variable in case cmd_rc pointer is NULL */
> >> + if (cmd_rc == NULL)
> >> + cmd_rc = &rc;
> >
> > Why is this needed? AFAICT The caller of papr_scm_ndctl does not specify null
> > and you did not change it.
> This pointer is coming from outside the papr_scm code hence need to be
> defensive here. Also as per[1] cmd_rc is "translation of firmware status"
> and not every caller would need it hence making this pointer optional.
>
> This is evident in acpi_nfit_blk_get_flags() where the 'nd_desc->ndctl'
> is called with 'cmd_rc == NULL'.
>
> [1] https://lore.kernel.org/linux-nvdimm/CAPcyv4hE_FG0YZXJVA1G=CBq8b9e0K54jxk5Sq5UKU-dnWT2Kg@mail.gmail.com/
Ah... Ok. So this is a bug fix which needs to happen regardless of the status
of this patch...
>
> >
> >> +
> >> + *cmd_rc = is_cmd_valid(nvdimm, cmd, buf, buf_len);
> >> + if (*cmd_rc) {
> >> + pr_debug("Invalid cmd=0x%x. Err=%d\n", cmd, *cmd_rc);
> >> + return *cmd_rc;
> >> + }
> >>
> >> p = nvdimm_provider_data(nvdimm);
> >>
> >> @@ -381,13 +464,19 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
> >> *cmd_rc = papr_scm_meta_set(p, buf);
... Because this will break here, even without this new code... right?
Let's get this fix in as a prelim-patch.
> >> break;
> >>
> >> + case ND_CMD_CALL:
> >> + call_pkg = nd_to_pdsm_cmd_pkg(buf);
> >> + *cmd_rc = papr_scm_service_pdsm(p, call_pkg);
> >> + break;
> >> +
> >> default:
> >> - return -EINVAL;
> >> + dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
> >> + *cmd_rc = -EINVAL;
> >
> > Is this change related? If there is a bug where there is a caller of
> > papr_scm_ndctl() with cmd_rc == NULL this should be a separate patch to fix
> > that issue.
> This simplifies a bit debugging of errors reported in
> papr_scm_ndctl() as it ensures that subsequent dev_dbg "Returned with
> cmd_rc" is always logged.
>
> I think this is too small a change to be carved out as an independent
> patch. Also this doesn't change the behaviour of the code except logging
> some more error info.
>
> However, If you feel too strongly about it I will spin a separate patch
> in this patch series for this.
This can go in as part of a 'protect against cmd_rc == NULL' preliminary patch.
I flagged this because at first I could not figure out what this had to do with
the ND_CMD_CALL...
For reviewers you want to make your patches concise to what you are
fixing/adding...
Also, based on acpi_nfit_blk_get_flags() using cmd_rc == NULL it looks like we
have a bug which needs to get fixed regardless of this patch. And if that
bug exists in earlier kernels you will need a separate patch to backport as a
fix.
So let's get that in first and separate... :-D
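For what it's worth, the preliminary patch could be as small as the guard
below. This is a standalone sketch, not the real callback: demo_ndctl() and
its single-command logic are hypothetical, and only the NULL fallback mirrors
what the hunk above does.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a ->ndctl() style callback. */
static int demo_ndctl(unsigned int cmd, int *cmd_rc)
{
	int rc;

	/* Callers may pass NULL for cmd_rc; fall back to a local variable. */
	if (cmd_rc == NULL)
		cmd_rc = &rc;

	/* Pretend command 1 is the only supported one. */
	*cmd_rc = (cmd == 1) ? 0 : -EINVAL;

	return *cmd_rc;
}
```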
Ira
>
> >
> > Ira
> >
> >> }
> >>
> >> dev_dbg(&p->pdev->dev, "returned with cmd_rc = %d\n", *cmd_rc);
> >>
> >> - return 0;
> >> + return *cmd_rc;
> >> }
> >>
> >> static ssize_t flags_show(struct device *dev,
> >> diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
> >> index de5d90212409..0e09dc5cec19 100644
> >> --- a/include/uapi/linux/ndctl.h
> >> +++ b/include/uapi/linux/ndctl.h
> >> @@ -244,6 +244,7 @@ struct nd_cmd_pkg {
> >> #define NVDIMM_FAMILY_HPE2 2
> >> #define NVDIMM_FAMILY_MSFT 3
> >> #define NVDIMM_FAMILY_HYPERV 4
> >> +#define NVDIMM_FAMILY_PAPR 5
> >>
> >> #define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
> >> struct nd_cmd_pkg)
> >> --
> >> 2.26.2
> >>
>
> --
> Cheers
> ~ Vaibhav
On Thu, Jun 04, 2020 at 01:58:00AM +0530, Vaibhav Jain wrote:
> Hi Ira,
>
> Thanks again for reviewing this patch. My Response below:
>
> Ira Weiny <[email protected]> writes:
>
> > On Tue, Jun 02, 2020 at 01:51:49PM -0700, 'Ira Weiny' wrote:
> >> On Tue, Jun 02, 2020 at 03:44:37PM +0530, Vaibhav Jain wrote:
> >
> > ...
> >
> >> > +
> >> > +/*
> >> > + * PDSM Envelope:
> >> > + *
> >> > + * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
> >> > + * envelope which consists of a header and user-defined payload sections.
> >> > + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
> >> > + * payload following it and accessible via 'nd_pdsm_cmd_pkg.payload' field.
> >> > + * There is reserved field that can used to introduce new fields to the
> >> > + * structure in future. It also tries to ensure that 'nd_pdsm_cmd_pkg.payload'
> >> > + * lies at a 8-byte boundary.
> >> > + *
> >> > + * +-------------+---------------------+---------------------------+
> >> > + * | 64-Bytes | 16-Bytes | Max 176-Bytes |
> >> > + * +-------------+---------------------+---------------------------+
> >> > + * | nd_pdsm_cmd_pkg | |
> >> > + * |-------------+ | |
> >> > + * | nd_cmd_pkg | | |
> >> > + * +-------------+---------------------+---------------------------+
> >> > + * | nd_family | | |
> >> > + * | nd_size_out | cmd_status | |
> >> > + * | nd_size_in | payload_version | payload |
> >> > + * | nd_command | reserved | |
> >> > + * | nd_fw_size | | |
> >> > + * +-------------+---------------------+---------------------------+
> >
> > One more comment WRT nd_size_[in|out]. I know that it is defined as the size
> > of the FW payload but normally when you nest headers 'size' in Header A
> > represents everything after Header A, including Header B. In this case that
> > would be including nd_pdsm_cmd_pkg...
> >
> > It looks like that is not what you have done? Or perhaps I missed it?
> >
> Not sure if I understand the question correctly.
> 'struct nd_pdsm_cmd_pkg' contains 'struct nd_cmd_pkg' at its head, and
> its nd_size_[in|out] fields are populated by libndctl in userspace, which
> sets them to the size of the data following the 'struct nd_cmd_pkg'.
>
> Copying of 'struct nd_cmd_pkg' to the input/output envelope is implicitly
> done in __nd_ioctl via the command descriptor array
> __nd_cmd_bus_descs. So I don't need to add the size of 'struct
> nd_cmd_pkg' to nd_size_[in|out].
Yea I see that now... Coming from a networking background I find that odd...
:-/ Usually 'size' in a header includes all data after that header. Because
header A knows nothing of the rest of the 'payload'...
FWIW you could define nd_size_in any way you want because you are not really
sending any payload back from firmware directly. But I suppose I can live with
it.
Ira
>
> > Ira
> >
> >> > + *
> >> > + * PDSM Header:
> >> > + *
> >> > + * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a
> >> > + * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to member
> >> > + * 'nd_cmd_pkg.nd_command'. Apart from size information of the envelope which is
> >> > + * contained in 'struct nd_cmd_pkg', the header also has members following
> >> ^^^^^
> >> ... the ...
> >>
> >> > + * members:
> >> > + *
> >> > + * 'cmd_status' : (Out) Errors if any encountered while servicing PDSM.
> >> > + * 'payload_version' : (In/Out) Version number associated with the payload.
> >> > + * 'reserved' : Not used and reserved for future.
> >> > + *
> >> > + * PDSM Payload:
> >> > + *
> >> > + * The layout of the PDSM Payload is defined by various structs shared between
> >> > + * papr_scm and libndctl so that contents of payload can be interpreted. During
> >> > + * servicing of a PDSM the papr_scm module will read input args from the payload
> >> > + * field by casting its contents to an appropriate struct pointer based on the
> >> > + * PDSM command. Similarly the output of servicing the PDSM command will be
> >> > + * copied to the payload field using the same struct.
> >> > + *
> >> > + * 'libnvdimm' enforces a hard limit of 256 bytes on the envelope size, which
> >> > + * leaves around 176 bytes for the envelope payload (ignoring any padding that
> >> > + * the compiler may silently introduce).
> >> > + *
> >> > + * Payload Version:
> >> > + *
> >> > + * A 'payload_version' field is present in PDSM header that indicates a specific
> >> > + * version of the structure present in PDSM Payload for a given PDSM command.
> >> > + * This provides backward compatibility in case the PDSM Payload structure
> >> > + * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
> >> > + *
> >> > + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
> >> > + * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
> >> > + * module when servicing the PDSM envelope checks the 'payload_version' and then
> >> > + * uses 'payload struct version' == MIN('payload_version field',
> >> > + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
> >> > + * After servicing the PDSM, 'papr_scm' put the negotiated version of payload
> >> > + * struct in returned 'payload_version' field.
> >> > + *
> >> > + * Libndctl on receiving the envelope back from papr_scm again checks the
> >> > + * 'payload_version' field and based on it use the appropriate version dsm
> >> > + * struct to parse the results.
> >> > + *
> >> > + * Backward Compatibility:
> >> > + *
> >> > + * Above scheme of exchanging different versioned PDSM struct between libndctl
> >> > + * and papr_scm should provide backward compatibility until following two
> >> > + * assumptions/conditions when defining new PDSM structs hold:
> >> > + *
> >> > + * Let T(X) = { set of attributes in PDSM struct 'T' versioned X }
> >> > + *
> >> > + * 1. T(X) is a proper subset of T(Y) if Y > X.
> >> > + * i.e Each new version of PDSM struct should retain existing struct
> >> > + * attributes from previous version
> >> > + *
> >> > + * 2. If an entity (libndctl or papr_scm) supports a PDSM struct T(X) then
> >> > + * it should also support T(1), T(2)...T(X - 1).
> >> > + * i.e When adding support for new version of a PDSM struct, libndctl
> >> > + * and papr_scm should retain support of the existing PDSM struct
> >> > + * version they support.
> >>
> >> Please see this thread for an example why versions are a bad idea in UAPIs:
> >>
> >> https://lkml.org/lkml/2020/3/26/213
> >>
> >> While the use of version is different in that thread the fundamental issues are
> >> the same. You end up with some weird matrix of supported features and
> >> structure definitions. For example, you are opening up the possibility of
> >> changing structures with a different version for no good reason.
> >>
> >> Also having the user query with version Z and get back version X (older) is
> >> odd. Generally if the kernel does not know about a feature (ie version Z of
> >> the structure) it should return -EINVAL and let the user figure out what to do.
> >> The user may just give up or they could try a different query.
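To make the two behaviours under discussion concrete, here is a minimal userspace sketch. It is not code from the patch: PDSM_MAX_VERSION, negotiate_version() and strict_version() are hypothetical names chosen for illustration. The first helper follows the MIN() scheme from the header comment, the second follows the "return -EINVAL for unknown versions" suggestion.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: highest payload version this pretend kernel understands. */
#define PDSM_MAX_VERSION 1

/*
 * Scheme from the patch comment: service the request with
 * MIN(requested, supported) and report the version actually used.
 */
static int negotiate_version(uint16_t requested, uint16_t *used)
{
	assert(used != NULL);
	*used = requested < PDSM_MAX_VERSION ? requested : PDSM_MAX_VERSION;
	return 0;
}

/*
 * Alternative raised in review: refuse versions the kernel does not
 * know about and let userspace decide whether to retry with an older one.
 */
static int strict_version(uint16_t requested, uint16_t *used)
{
	assert(used != NULL);
	if (requested > PDSM_MAX_VERSION)
		return -EINVAL;
	*used = requested;
	return 0;
}
```

With the negotiated variant a caller asking for version 3 silently gets version 1 back and has to know both layouts. With the strict variant the same caller sees -EINVAL and picks its own fallback.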
> >>
> >> > + */
> >> > +
> >> > +/* PDSM-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
> >> > +struct nd_pdsm_cmd_pkg {
> >> > + struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
> >> > + __s32 cmd_status; /* Out: Sub-cmd status returned back */
> >> > + __u16 reserved[5]; /* Ignored and to be used in future */
> >>
> >> How do you know when reserved is used for something else in the future? Is
> >> reserved guaranteed (and checked by the code) to be 0?
> >>
> >> > + __u16 payload_version; /* In/Out: version of the payload */
> >>
> >> Why is payload_version after reserved?
> >>
> >> > + __u8 payload[]; /* In/Out: Sub-cmd data buffer */
> >> > +} __packed;
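On the question of whether 'reserved' is checked: one common way to keep reserved fields usable later is to reject any request where they are non-zero. The sketch below assumes that policy. It is not part of the patch, and struct pdsm_hdr_tail and check_reserved() are hypothetical stand-ins for the PDSM-specific part of the header.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the fields that follow 'struct nd_cmd_pkg'
 * in the proposed header. The packed attribute is GCC/Clang specific.
 */
struct pdsm_hdr_tail {
	int32_t  cmd_status;
	uint16_t reserved[5];
	uint16_t payload_version;
} __attribute__((packed));

/* Return 0 when every reserved word is zero, -EINVAL otherwise. */
static int check_reserved(const struct pdsm_hdr_tail *h)
{
	size_t i;

	assert(h != NULL);
	for (i = 0; i < sizeof(h->reserved) / sizeof(h->reserved[0]); i++)
		if (h->reserved[i] != 0)
			return -EINVAL;
	return 0;
}
```

If the kernel enforces this from the first release, a later kernel can assign meaning to a reserved word without misreading garbage sent by older userspace.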
> >> > +
> >> > +/*
> >> > + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
> >> > + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
> >> > + */
> >> > +enum papr_pdsm {
> >> > + PAPR_PDSM_MIN = 0x0,
> >> > + PAPR_PDSM_MAX,
> >> > +};
> >> > +
> >> > +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
> >> > +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
> >> > +{
> >> > + return (struct nd_pdsm_cmd_pkg *) cmd;
> >> > +}
> >> > +
> >> > +/* Return the payload pointer for a given pcmd */
> >> > +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
> >> > +{
> >> > + if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
> >> > + return NULL;
> >> > + else
> >> > + return (void *)(pcmd->payload);
> >> > +}
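The "64 + 16 + 176" arithmetic in the header comment can be checked with a small userspace mirror of the structs. This is only a sketch: the field list of cmd_pkg_mirror is my recollection of 'struct nd_cmd_pkg' from ndctl.h (with its trailing flexible array left out, since it adds no size), and the *_mirror names and max_payload_bytes() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENVELOPE_LIMIT 256	/* limit quoted in the header comment */

/* Assumed mirror of struct nd_cmd_pkg, minus its flexible payload array. */
struct cmd_pkg_mirror {
	uint64_t nd_family;
	uint64_t nd_command;
	uint32_t nd_size_in;
	uint32_t nd_size_out;
	uint32_t nd_reserved2[9];
	uint32_t nd_fw_size;
};

/* Mirror of the proposed PDSM header (packed is GCC/Clang specific). */
struct pdsm_pkg_mirror {
	struct cmd_pkg_mirror hdr;
	int32_t  cmd_status;
	uint16_t reserved[5];
	uint16_t payload_version;
	uint8_t  payload[];
} __attribute__((packed));

/* Bytes left for the payload once both headers are accounted for. */
static size_t max_payload_bytes(void)
{
	size_t hdr_bytes = offsetof(struct pdsm_pkg_mirror, payload);

	assert(hdr_bytes <= ENVELOPE_LIMIT);
	return ENVELOPE_LIMIT - hdr_bytes;
}
```

Under these assumptions the payload starts at offset 80, which is a multiple of 8 and matches the alignment goal stated in the comment.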
> >> > +
> >> > +#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
> >> > diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> >> > index 149431594839..5e2237e7ec08 100644
> >> > --- a/arch/powerpc/platforms/pseries/papr_scm.c
> >> > +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> >> > @@ -15,13 +15,15 @@
> >> > #include <linux/seq_buf.h>
> >> >
> >> > #include <asm/plpar_wrappers.h>
> >> > +#include <asm/papr_pdsm.h>
> >> >
> >> > #define BIND_ANY_ADDR (~0ul)
> >> >
> >> > #define PAPR_SCM_DIMM_CMD_MASK \
> >> > ((1ul << ND_CMD_GET_CONFIG_SIZE) | \
> >> > (1ul << ND_CMD_GET_CONFIG_DATA) | \
> >> > - (1ul << ND_CMD_SET_CONFIG_DATA))
> >> > + (1ul << ND_CMD_SET_CONFIG_DATA) | \
> >> > + (1ul << ND_CMD_CALL))
> >> >
> >> > /* DIMM health bitmap bitmap indicators */
> >> > /* SCM device is unable to persist memory contents */
> >> > @@ -350,16 +352,97 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
> >> > return 0;
> >> > }
> >> >
> >> > +/*
> >> > + * Validate the inputs args to dimm-control function and return '0' if valid.
> >> > + * This also does initial sanity validation to ND_CMD_CALL sub-command packages.
> >> > + */
> >> > +static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
> >> > + unsigned int buf_len)
> >> > +{
> >> > + unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
> >> > + struct nd_pdsm_cmd_pkg *pkg = nd_to_pdsm_cmd_pkg(buf);
> >> > + struct papr_scm_priv *p;
> >> > +
> >> > + /* Only dimm-specific calls are supported atm */
> >> > + if (!nvdimm)
> >> > + return -EINVAL;
> >> > +
> >> > + /* get the provider date from struct nvdimm */
> >>
> >> s/date/data
> >>
> >> > + p = nvdimm_provider_data(nvdimm);
> >> > +
> >> > + if (!test_bit(cmd, &cmd_mask)) {
> >> > + dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
> >> > + return -EINVAL;
> >> > + } else if (cmd == ND_CMD_CALL) {
> >> > +
> >> > + /* Verify the envelope package */
> >> > + if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
> >> > + dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
> >> > + buf_len);
> >> > + return -EINVAL;
> >> > + }
> >> > +
> >> > + /* Verify that the PDSM family is valid */
> >> > + if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR) {
> >> > + dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
> >> > + pkg->hdr.nd_family);
> >> > + return -EINVAL;
> >> > +
> >> > + }
> >> > +
> >> > + /* We expect a payload with all PDSM commands */
> >> > + if (pdsm_cmd_to_payload(pkg) == NULL) {
> >> > + dev_dbg(&p->pdev->dev,
> >> > + "Empty payload for sub-command=0x%llx\n",
> >> > + pkg->hdr.nd_command);
> >> > + return -EINVAL;
> >> > + }
> >> > + }
> >> > +
> >> > + /* Command looks valid */
> >>
> >> I assume the first command to be implemented also checks the { nd_command,
> >> payload_version, payload length } for correctness?
> >>
> >> > + return 0;
> >> > +}
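The first check in is_cmd_valid() is a plain bitmask test. A userspace sketch of the same idea follows. The numeric command values are what I believe ndctl.h assigns and should be treated as assumptions. CMD_*, DIMM_CMD_MASK and cmd_supported() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>

/* Assumed values of the ND_CMD_* constants used by the mask. */
enum {
	CMD_GET_CONFIG_SIZE = 4,
	CMD_GET_CONFIG_DATA = 5,
	CMD_SET_CONFIG_DATA = 6,
	CMD_CALL = 10,
};

#define DIMM_CMD_MASK \
	((1ul << CMD_GET_CONFIG_SIZE) | \
	 (1ul << CMD_GET_CONFIG_DATA) | \
	 (1ul << CMD_SET_CONFIG_DATA) | \
	 (1ul << CMD_CALL))

/* Return 0 if 'cmd' is advertised in the mask, -EINVAL otherwise. */
static int cmd_supported(unsigned int cmd)
{
	/* Guard the shift: commands beyond the mask width are never set. */
	if (cmd >= 8 * sizeof(unsigned long))
		return -EINVAL;
	return ((DIMM_CMD_MASK >> cmd) & 1ul) ? 0 : -EINVAL;
}
```

The explicit range guard is an addition of this sketch and keeps an out-of-range command number from indexing past the single-word mask.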
> >> > +
> >> > +static int papr_scm_service_pdsm(struct papr_scm_priv *p,
> >> > + struct nd_pdsm_cmd_pkg *call_pkg)
> >> > +{
> >> > + /* unknown subcommands return error in packages */
> >> > + if (call_pkg->hdr.nd_command <= PAPR_PDSM_MIN ||
> >> > + call_pkg->hdr.nd_command >= PAPR_PDSM_MAX) {
> >> > + dev_dbg(&p->pdev->dev, "Invalid PDSM request 0x%llx\n",
> >> > + call_pkg->hdr.nd_command);
> >> > + call_pkg->cmd_status = -EINVAL;
> >> > + return 0;
> >> > + }
> >> > +
> >> > + /* Depending on the DSM command call appropriate service routine */
> >> > + switch (call_pkg->hdr.nd_command) {
> >> > + default:
> >> > + dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
> >> > + call_pkg->hdr.nd_command);
> >> > + call_pkg->cmd_status = -ENOENT;
> >> > + return 0;
> >> > + }
> >> > +}
> >> > +
> >> > static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
> >> > struct nvdimm *nvdimm, unsigned int cmd, void *buf,
> >> > unsigned int buf_len, int *cmd_rc)
> >> > {
> >> > struct nd_cmd_get_config_size *get_size_hdr;
> >> > struct papr_scm_priv *p;
> >> > + struct nd_pdsm_cmd_pkg *call_pkg = NULL;
> >> > + int rc;
> >> >
> >> > - /* Only dimm-specific calls are supported atm */
> >> > - if (!nvdimm)
> >> > - return -EINVAL;
> >> > + /* Use a local variable in case cmd_rc pointer is NULL */
> >> > + if (cmd_rc == NULL)
> >> > + cmd_rc = &rc;
> >>
> >> Why is this needed? AFAICT The caller of papr_scm_ndctl does not specify null
> >> and you did not change it.
> >>
> >> > +
> >> > + *cmd_rc = is_cmd_valid(nvdimm, cmd, buf, buf_len);
> >> > + if (*cmd_rc) {
> >> > + pr_debug("Invalid cmd=0x%x. Err=%d\n", cmd, *cmd_rc);
> >> > + return *cmd_rc;
> >> > + }
> >> >
> >> > p = nvdimm_provider_data(nvdimm);
> >> >
> >> > @@ -381,13 +464,19 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
> >> > *cmd_rc = papr_scm_meta_set(p, buf);
> >> > break;
> >> >
> >> > + case ND_CMD_CALL:
> >> > + call_pkg = nd_to_pdsm_cmd_pkg(buf);
> >> > + *cmd_rc = papr_scm_service_pdsm(p, call_pkg);
> >> > + break;
> >> > +
> >> > default:
> >> > - return -EINVAL;
> >> > + dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
> >> > + *cmd_rc = -EINVAL;
> >>
> >> Is this change related? If there is a bug where there is a caller of
> >> papr_scm_ndctl() with cmd_rc == NULL this should be a separate patch to fix
> >> that issue.
> >>
> >> Ira
> >>
> >> > }
> >> >
> >> > dev_dbg(&p->pdev->dev, "returned with cmd_rc = %d\n", *cmd_rc);
> >> >
> >> > - return 0;
> >> > + return *cmd_rc;
> >> > }
> >> >
> >> > static ssize_t flags_show(struct device *dev,
> >> > diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
> >> > index de5d90212409..0e09dc5cec19 100644
> >> > --- a/include/uapi/linux/ndctl.h
> >> > +++ b/include/uapi/linux/ndctl.h
> >> > @@ -244,6 +244,7 @@ struct nd_cmd_pkg {
> >> > #define NVDIMM_FAMILY_HPE2 2
> >> > #define NVDIMM_FAMILY_MSFT 3
> >> > #define NVDIMM_FAMILY_HYPERV 4
> >> > +#define NVDIMM_FAMILY_PAPR 5
> >> >
> >> > #define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
> >> > struct nd_cmd_pkg)
> >> > --
> >> > 2.26.2
> >> >
>
> --
> Cheers
> ~ Vaibhav
On Thu, Jun 04, 2020 at 12:34:04AM +0530, Vaibhav Jain wrote:
> Hi Ira,
>
> Thanks for reviewing this patch. My responses below:
>
> Ira Weiny <[email protected]> writes:
>
> > On Tue, Jun 02, 2020 at 03:44:38PM +0530, Vaibhav Jain wrote:
> >> This patch implements support for PDSM request 'PAPR_PDSM_HEALTH'
> >> that returns a newly introduced 'struct nd_papr_pdsm_health' instance
> >> containing dimm health information back to user space in response to
> >> ND_CMD_CALL. This functionality is implemented in newly introduced
> >> papr_pdsm_health() that queries the nvdimm health information and
> >> then copies this information to the package payload whose layout is
> >> defined by 'struct nd_papr_pdsm_health'.
> >>
> >> The patch also introduces a new member 'struct papr_scm_priv.health'
> >> that is an instance of 'struct nd_papr_pdsm_health' to cache the health
> >> information of an nvdimm. As a result, functions drc_pmem_query_health()
> >> and flags_show() are updated to populate and use this new struct
> >> instead of the u64 integer that was used earlier.
> >>
> >> Cc: "Aneesh Kumar K . V" <[email protected]>
> >> Cc: Dan Williams <[email protected]>
> >> Cc: Michael Ellerman <[email protected]>
> >> Cc: Ira Weiny <[email protected]>
> >> Reviewed-by: Aneesh Kumar K.V <[email protected]>
> >> Signed-off-by: Vaibhav Jain <[email protected]>
> >> ---
> >> Changelog:
> >>
> >> Resend:
> >> * Added ack from Aneesh.
> >>
> >> v8..v9:
> >> * s/PAPR_SCM_PDSM_HEALTH/PAPR_PDSM_HEALTH/g [ Dan , Aneesh ]
> >> * s/PAPR_SCM_PSDM_DIMM_*/PAPR_PDSM_DIMM_*/g
> >> * Renamed papr_scm_get_health() to papr_pdsm_health()
> >> * Updated patch description to replace papr-scm dimm with nvdimm.
> >>
> >> v7..v8:
> >> * None
> >>
> >> Resend:
> >> * None
> >>
> >> v6..v7:
> >> * Updated flags_show() to use seq_buf_printf(). [Mpe]
> >> * Updated papr_scm_get_health() to use newly introduced
> >> __drc_pmem_query_health() bypassing the cache [Mpe].
> >>
> >> v5..v6:
> >> * Added attribute '__packed' to 'struct nd_papr_pdsm_health_v1' to
> >> guard against the possibility of different compilers adding different
> >> padding to the struct [ Dan Williams ]
> >>
> >> * Updated 'struct nd_papr_pdsm_health_v1' to use __u8 instead of
> >> 'bool' and also updated drc_pmem_query_health() to take this into
> >> account. [ Dan Williams ]
> >>
> >> v4..v5:
> >> * None
> >>
> >> v3..v4:
> >> * Call the DSM_PAPR_SCM_HEALTH service function from
> >> papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh]
> >>
> >> v2..v3:
> >> * Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx' types
> >> as its exported to the userspace [Aneesh]
> >> * Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm health
> >> from enum to #defines [Aneesh]
> >>
> >> v1..v2:
> >> * New patch in the series
> >> ---
> >> arch/powerpc/include/uapi/asm/papr_pdsm.h | 39 +++++++
> >> arch/powerpc/platforms/pseries/papr_scm.c | 125 +++++++++++++++++++---
> >> 2 files changed, 147 insertions(+), 17 deletions(-)
> >>
> >> diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
> >> index 6407fefcc007..411725a91591 100644
> >> --- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
> >> +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
> >> @@ -115,6 +115,7 @@ struct nd_pdsm_cmd_pkg {
> >> */
> >> enum papr_pdsm {
> >> PAPR_PDSM_MIN = 0x0,
> >> + PAPR_PDSM_HEALTH,
> >> PAPR_PDSM_MAX,
> >> };
> >>
> >> @@ -133,4 +134,42 @@ static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
> >> return (void *)(pcmd->payload);
> >> }
> >>
> >> +/* Various nvdimm health indicators */
> >> +#define PAPR_PDSM_DIMM_HEALTHY 0
> >> +#define PAPR_PDSM_DIMM_UNHEALTHY 1
> >> +#define PAPR_PDSM_DIMM_CRITICAL 2
> >> +#define PAPR_PDSM_DIMM_FATAL 3
> >> +
> >> +/*
> >> + * Struct exchanged between kernel & ndctl for PAPR_PDSM_HEALTH
> >> + * Various flags indicate the health status of the dimm.
> >> + *
> >> + * dimm_unarmed : Dimm not armed. So contents won't persist.
> >> + * dimm_bad_shutdown : Previous shutdown did not persist contents.
> >> + * dimm_bad_restore : Contents from previous shutdown weren't restored.
> >> + * dimm_scrubbed : Contents of the dimm have been scrubbed.
> >> + * dimm_locked : Contents of the dimm can't be modified until CEC reboot
> >> + * dimm_encrypted : Contents of dimm are encrypted.
> >> + * dimm_health : Dimm health indicator. One of PAPR_PDSM_DIMM_XXXX
> >> + */
> >> +struct nd_papr_pdsm_health_v1 {
> >> + __u8 dimm_unarmed;
> >> + __u8 dimm_bad_shutdown;
> >> + __u8 dimm_bad_restore;
> >> + __u8 dimm_scrubbed;
> >> + __u8 dimm_locked;
> >> + __u8 dimm_encrypted;
> >> + __u16 dimm_health;
> >> +} __packed;
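The layout this struct is meant to pin down can be verified at compile time. The sketch below mirrors the struct with fixed-width userspace types. health_v1_mirror and health_v1_size() are hypothetical names, and the packed attribute is GCC/Clang specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the proposed nd_papr_pdsm_health_v1 layout. */
struct health_v1_mirror {
	uint8_t  dimm_unarmed;
	uint8_t  dimm_bad_shutdown;
	uint8_t  dimm_bad_restore;
	uint8_t  dimm_scrubbed;
	uint8_t  dimm_locked;
	uint8_t  dimm_encrypted;
	uint16_t dimm_health;
} __attribute__((packed));

/* Six one-byte flags followed by a 16-bit indicator: 8 bytes, no padding. */
_Static_assert(sizeof(struct health_v1_mirror) == 8,
	       "health struct must be exactly 8 bytes");
_Static_assert(offsetof(struct health_v1_mirror, dimm_health) == 6,
	       "dimm_health must follow the six flag bytes");

/* Size userspace would pass as nd_size_out for this payload. */
static size_t health_v1_size(void)
{
	return sizeof(struct health_v1_mirror);
}
```

Because every flag is a whole byte, the layout does not depend on how a compiler orders or packs bit fields, which is the concern behind using __u8 here.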
> >> +
> >> +/*
> >> + * Typedef the current struct for dimm_health so that any application
> >> + * or kernel recompiled after introducing a new version automatically
> >> + * supports the new version.
> >> + */
> >> +#define nd_papr_pdsm_health nd_papr_pdsm_health_v1
> >> +
> >> +/* Current version number for the dimm health struct */
> >
> > This can't be the 'current' version. You will need a list of versions you
> > support. Because if the user passes in an old version you need to be able to
> > respond with that old version. Also if you plan to support 'return X for a Y
> > query' then the user will need both X and Y defined to interpret X.
> Yes, and that change will be introduced with addition of version-2 of
> nd_papr_pdsm_health. Earlier version of the patchset[1] had such a table
> implemented. But to simplify the patchset, as we are only dealing with
> version-1 of the structs right now, it was dropped.
>
> [1] :
> https://lore.kernel.org/linuxppc-dev/[email protected]/
I'm not sure I follow that comment.
I feel like there is some confusion about what firmware can return vs the UAPI
structure. You have already marshaled the data between the two. We can define
whatever we want for the UAPI structures, throwing away data from the firmware
that the kernel does not understand.
>
> >
> >> +#define ND_PAPR_PDSM_HEALTH_VERSION 1
> >> +
> >> #endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> >> index 5e2237e7ec08..c0606c0c659c 100644
> >> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> >> @@ -88,7 +88,7 @@ struct papr_scm_priv {
> >> unsigned long lasthealth_jiffies;
> >>
> >> /* Health information for the dimm */
> >> - u64 health_bitmap;
> >> + struct nd_papr_pdsm_health health;
> >
> > ok so we are throwing away all the #defs from patch 1? Are they still valid?
> >
> > I'm confused that patch 3 added this and we are throwing it away
> > here...
> The #defines are still valid, only their usage moved to __drc_pmem_query_health().
>
> >
> >> };
> >>
> >> static int drc_pmem_bind(struct papr_scm_priv *p)
> >> @@ -201,6 +201,7 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
> >> static int __drc_pmem_query_health(struct papr_scm_priv *p)
> >> {
> >> unsigned long ret[PLPAR_HCALL_BUFSIZE];
> >> + u64 health;
> >> long rc;
> >>
> >> /* issue the hcall */
> >> @@ -208,18 +209,46 @@ static int __drc_pmem_query_health(struct papr_scm_priv *p)
> >> if (rc != H_SUCCESS) {
> >> dev_err(&p->pdev->dev,
> >> "Failed to query health information, Err:%ld\n", rc);
> >> - rc = -ENXIO;
> >> - goto out;
> >> + return -ENXIO;
> >
> > I missed this... probably did not need the goto in the first patch?
> Yes, will get rid of the goto from patch-1.
Cool.
>
> >
> >> }
> >>
> >> p->lasthealth_jiffies = jiffies;
> >> - p->health_bitmap = ret[0] & ret[1];
> >> + health = ret[0] & ret[1];
> >>
> >> dev_dbg(&p->pdev->dev,
> >> "Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
> >> ret[0], ret[1]);
> >> -out:
> >> - return rc;
> >> +
> >> + memset(&p->health, 0, sizeof(p->health));
> >> +
> >> + /* Check for various masks in bitmap and set the buffer */
> >> + if (health & PAPR_PMEM_UNARMED_MASK)
> >
> > Oh ok... odd. (don't add code then just take it away in a series)
> > You could have led with the user structure and put this code in patch
> > 3.
> The struct nd_papr_pdsm_health is only introduced in this patch, in header
> 'papr_pdsm.h', as a means of exchanging nvdimm health information with
> userspace. Introducing this struct without introducing the necessary
> scaffolding in 'papr_pdsm.h' would have been very counter-intuitive.
I respectfully disagree. You intended to use a copy of this structure in the
kernel to store the data. Just do that.
>
> >
> > Why does the user need u8 to represent a single bit? Does this help protect
> > against endian issues?
> This was 'bool' earlier, but type 'bool' isn't suitable for an ioctl ABI.
> I also wanted to avoid bit fields here, as I am not sure whether their
> packing may differ across compilers. Hence it was replaced with u8.
>
ok works for me...
> >
> >> + p->health.dimm_unarmed = 1;
> >> +
> >> + if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
> >> + p->health.dimm_bad_shutdown = 1;
> >> +
> >> + if (health & PAPR_PMEM_BAD_RESTORE_MASK)
> >> + p->health.dimm_bad_restore = 1;
> >> +
> >> + if (health & PAPR_PMEM_ENCRYPTED)
> >> + p->health.dimm_encrypted = 1;
> >> +
> >> + if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED) {
> >> + p->health.dimm_locked = 1;
> >> + p->health.dimm_scrubbed = 1;
> >> + }
> >> +
> >> + if (health & PAPR_PMEM_HEALTH_UNHEALTHY)
> >> + p->health.dimm_health = PAPR_PDSM_DIMM_UNHEALTHY;
> >> +
> >> + if (health & PAPR_PMEM_HEALTH_CRITICAL)
> >> + p->health.dimm_health = PAPR_PDSM_DIMM_CRITICAL;
> >> +
> >> + if (health & PAPR_PMEM_HEALTH_FATAL)
> >> + p->health.dimm_health = PAPR_PDSM_DIMM_FATAL;
> >> +
> >> + return 0;
> >> }
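The decoding above can be tried out in userspace with a cut-down sketch. The bit positions below are hypothetical and chosen only for illustration. The real PAPR_PMEM_* masks come from an earlier patch in the series. The sketch also assumes that the second hcall return value is a validity mask, as the debug message suggests.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit assignments, NOT the real PAPR_PMEM_* values. */
#define HYP_UNARMED	(1ull << 0)
#define HYP_UNHEALTHY	(1ull << 1)
#define HYP_CRITICAL	(1ull << 2)
#define HYP_FATAL	(1ull << 3)

enum { DIMM_HEALTHY = 0, DIMM_UNHEALTHY = 1, DIMM_CRITICAL = 2, DIMM_FATAL = 3 };

/* Cut-down version of the health struct, two fields only. */
struct health_sketch {
	uint8_t  dimm_unarmed;
	uint16_t dimm_health;
};

static void decode_health(uint64_t bitmap, uint64_t mask,
			  struct health_sketch *h)
{
	uint64_t health = bitmap & mask;	/* keep only bits marked valid */

	assert(h != NULL);
	memset(h, 0, sizeof(*h));

	if (health & HYP_UNARMED)
		h->dimm_unarmed = 1;

	/* Checked in increasing severity so that the worst state wins. */
	if (health & HYP_UNHEALTHY)
		h->dimm_health = DIMM_UNHEALTHY;
	if (health & HYP_CRITICAL)
		h->dimm_health = DIMM_CRITICAL;
	if (health & HYP_FATAL)
		h->dimm_health = DIMM_FATAL;
}
```

The ordering of the three severity checks matters: when several bits are set at once, the last assignment (the most severe state) is what userspace sees.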
> >>
> >> /* Min interval in seconds for assuming stable dimm health */
> >> @@ -403,6 +432,58 @@ static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
> >> return 0;
> >> }
> >>
> >> +/* Fetch the DIMM health info and populate it in provided package. */
> >> +static int papr_pdsm_health(struct papr_scm_priv *p,
> >> + struct nd_pdsm_cmd_pkg *pkg)
> >> +{
> >> + int rc;
> >> + size_t copysize = sizeof(p->health);
> >> +
> >> + /* Ensure dimm health mutex is taken preventing concurrent access */
> >> + rc = mutex_lock_interruptible(&p->health_mutex);
> >> + if (rc)
> >> + goto out;
> >> +
> >> + /* Always fetch up-to-date dimm health data, ignoring cached values */
> >> + rc = __drc_pmem_query_health(p);
> >> + if (rc)
> >> + goto out_unlock;
> >> + /*
> >> + * If the requested payload version is greater than one we know
> >> + * about, return the payload version we know about and let
> >> + * caller/userspace handle.
> >> + */
> >> + if (pkg->payload_version > ND_PAPR_PDSM_HEALTH_VERSION)
> >> + pkg->payload_version = ND_PAPR_PDSM_HEALTH_VERSION;
> >
> > I know this seems easy now but I do think you will run into trouble later.
>
> I did address this in an earlier iteration of this patchset[1] and
> dropped it in favour of simplicity.
>
> [1] :
> https://lore.kernel.org/linuxppc-dev/[email protected]/
I don't see how that addresses this? See my other email.
Ira
>
> > Ira
> >
> >> +
> >> + if (pkg->hdr.nd_size_out < copysize) {
> >> + dev_dbg(&p->pdev->dev, "Truncated payload (%u). Expected (%lu)",
> >> + pkg->hdr.nd_size_out, copysize);
> >> + rc = -ENOSPC;
> >> + goto out_unlock;
> >> + }
> >> +
> >> + dev_dbg(&p->pdev->dev, "Copying payload size=%lu version=0x%x\n",
> >> + copysize, pkg->payload_version);
> >> +
> >> + /* Copy the health struct to the payload */
> >> + memcpy(pdsm_cmd_to_payload(pkg), &p->health, copysize);
> >> + pkg->hdr.nd_fw_size = copysize;
> >> +
> >> +out_unlock:
> >> + mutex_unlock(&p->health_mutex);
> >> +
> >> +out:
> >> + /*
> >> + * Put the error in the out package and return success from the function
> >> + * so that errors, if any, are propagated back to userspace.
> >> + */
> >> + pkg->cmd_status = rc;
> >> + dev_dbg(&p->pdev->dev, "completion code = %d\n", rc);
> >> +
> >> + return 0;
> >> +}
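The error-reporting convention used by papr_pdsm_health() (the function itself returns 0 and the per-command result travels in 'cmd_status') can be sketched in isolation. The struct below is a simplified, hypothetical envelope and not the real nd_pdsm_cmd_pkg layout. fill_reply() is likewise a made-up helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified envelope used only for this illustration. */
struct reply_sketch {
	uint32_t size_out;	/* bytes the caller made available */
	uint32_t fw_size;	/* bytes actually produced */
	int32_t  cmd_status;	/* per-command result seen by userspace */
	uint8_t  payload[16];
};

/*
 * Copy 'len' bytes into the reply if they fit. The return value only
 * says that the envelope was processed. The outcome of the command is
 * reported through cmd_status.
 */
static int fill_reply(struct reply_sketch *r, const void *data, uint32_t len)
{
	int rc = 0;

	assert(r != NULL && data != NULL);
	if (len > sizeof(r->payload) || r->size_out < len) {
		rc = -ENOSPC;	/* caller's buffer is too small */
	} else {
		memcpy(r->payload, data, len);
		r->fw_size = len;
	}

	r->cmd_status = rc;
	return 0;
}
```

A caller therefore has to check two things: the ioctl return value for transport errors, and cmd_status for the result of the sub-command.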
> >> +
> >> static int papr_scm_service_pdsm(struct papr_scm_priv *p,
> >> struct nd_pdsm_cmd_pkg *call_pkg)
> >> {
> >> @@ -417,6 +498,9 @@ static int papr_scm_service_pdsm(struct papr_scm_priv *p,
> >>
> >> /* Depending on the DSM command call appropriate service routine */
> >> switch (call_pkg->hdr.nd_command) {
> >> + case PAPR_PDSM_HEALTH:
> >> + return papr_pdsm_health(p, call_pkg);
> >> +
> >> default:
> >> dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
> >> call_pkg->hdr.nd_command);
> >> @@ -485,34 +569,41 @@ static ssize_t flags_show(struct device *dev,
> >> struct nvdimm *dimm = to_nvdimm(dev);
> >> struct papr_scm_priv *p = nvdimm_provider_data(dimm);
> >> struct seq_buf s;
> >> - u64 health;
> >> int rc;
> >>
> >> rc = drc_pmem_query_health(p);
> >> if (rc)
> >> return rc;
> >>
> >> - /* Copy health_bitmap locally, check masks & update out buffer */
> >> - health = READ_ONCE(p->health_bitmap);
> >> -
> >> seq_buf_init(&s, buf, PAGE_SIZE);
> >> - if (health & PAPR_PMEM_UNARMED_MASK)
> >> +
> >> + /* Protect concurrent modifications to papr_scm_priv */
> >> + rc = mutex_lock_interruptible(&p->health_mutex);
> >> + if (rc)
> >> + return rc;
> >> +
> >> + if (p->health.dimm_unarmed)
> >> seq_buf_printf(&s, "not_armed ");
> >>
> >> - if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
> >> + if (p->health.dimm_bad_shutdown)
> >> seq_buf_printf(&s, "flush_fail ");
> >>
> >> - if (health & PAPR_PMEM_BAD_RESTORE_MASK)
> >> + if (p->health.dimm_bad_restore)
> >> seq_buf_printf(&s, "restore_fail ");
> >>
> >> - if (health & PAPR_PMEM_ENCRYPTED)
> >> + if (p->health.dimm_encrypted)
> >> seq_buf_printf(&s, "encrypted ");
> >>
> >> - if (health & PAPR_PMEM_SMART_EVENT_MASK)
> >> + if (p->health.dimm_health)
> >> seq_buf_printf(&s, "smart_notify ");
> >>
> >> - if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED)
> >> - seq_buf_printf(&s, "scrubbed locked ");
> >> + if (p->health.dimm_scrubbed)
> >> + seq_buf_printf(&s, "scrubbed ");
> >> +
> >> + if (p->health.dimm_locked)
> >> + seq_buf_printf(&s, "locked ");
> >> +
> >> + mutex_unlock(&p->health_mutex);
> >>
> >> if (seq_buf_used(&s))
> >> seq_buf_printf(&s, "\n");
> >> --
> >> 2.26.2
> >>
>
> --
> Cheers
> ~ Vaibhav
Hi Dan,
Thanks for the review and insights on this. My responses below:
"Williams, Dan J" <[email protected]> writes:
> [ forgive formatting I'm temporarily stuck using Outlook this week... ]
>
>> From: Vaibhav Jain <[email protected]>
> [..]
>>
>> Introduce support for PAPR NVDIMM Specific Methods (PDSM) in papr_scm
>> module and add the command family NVDIMM_FAMILY_PAPR to the white
>> list of NVDIMM command sets. Also advertise support for ND_CMD_CALL for
>> the nvdimm command mask and implement necessary scaffolding in the
>> module to handle ND_CMD_CALL ioctl and PDSM requests that we receive.
>>
>> The layout of the PDSM request as we expect from libnvdimm/libndctl is
>> described in newly introduced uapi header 'papr_pdsm.h' which defines a
>> new 'struct nd_pdsm_cmd_pkg' header. This header is used to communicate
>> the PDSM request via member 'nd_cmd_pkg.nd_command' and size of
>> payload that need to be sent/received for servicing the PDSM.
>>
>> A new function is_cmd_valid() is implemented that reads the args to
>> papr_scm_ndctl() and performs sanity tests on them. A new function
>> papr_scm_service_pdsm() is introduced and is called from
>> papr_scm_ndctl() in case a PDSM request is received via ND_CMD_CALL
>> command from libnvdimm.
>>
>> Cc: "Aneesh Kumar K . V" <[email protected]>
>> Cc: Dan Williams <[email protected]>
>> Cc: Michael Ellerman <[email protected]>
>> Cc: Ira Weiny <[email protected]>
>> Reviewed-by: Aneesh Kumar K.V <[email protected]>
>> Signed-off-by: Vaibhav Jain <[email protected]>
>> ---
>> Changelog:
>>
>> Resend:
>> * Added ack from Aneesh.
>>
>> v8..v9:
>> * Reduced the usage of term SCM replacing it with appropriate
>> replacement [ Dan Williams, Aneesh ]
>> * Renamed 'papr_scm_pdsm.h' to 'papr_pdsm.h'
>> * s/PAPR_SCM_PDSM_*/PAPR_PDSM_*/g
>> * s/NVDIMM_FAMILY_PAPR_SCM/NVDIMM_FAMILY_PAPR/g
>> * Minor updates to 'papr_pdsm.h' to replace usage of term 'SCM'.
>> * Minor update to patch description.
>>
>> v7..v8:
>> * Removed the 'payload_offset' field from 'struct
>> nd_pdsm_cmd_pkg'. Instead command payload is always assumed to start
>> at 'nd_pdsm_cmd_pkg.payload'. [ Aneesh ]
>> * To enable introducing new fields to 'struct nd_pdsm_cmd_pkg',
>> 'reserved' field of 10-bytes is introduced. [ Aneesh ]
>> * Fixed a typo in "Backward Compatibility" section of papr_scm_pdsm.h
>> [ Ira ]
>>
>> Resend:
>> * None
>>
>> v6..v7 :
>> * Removed the re-definitions of __packed macro from papr_scm_pdsm.h
>> [Mpe].
>> * Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe].
>> * Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h
>> [Mpe].
>> * Made functions defined in papr_scm_pdsm.h as static inline. [Mpe]
>>
>> v5..v6 :
>> * Changed the usage of the term DSM to PDSM to distinguish it from the
>> ACPI term [ Dan Williams ]
>> * Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various structs
>> to reflect the new terminology.
>> * Updated the patch description and title to reflect the new terminology.
>> * Squashed patch to introduce new command family in 'ndctl.h' with
>> this patch [ Dan Williams ]
>> * Updated the papr_scm_pdsm method starting index from 0x10000 to 0x0
>> [ Dan Williams ]
>> * Removed redundant license text from the papr_scm_pdsm.h file.
>> [ Dan Williams ]
>> * s/envelop/envelope/ at various places [ Dan Williams ]
>> * Added '__packed' attribute to command package header to guard
>> against different compilers adding padding between the fields.
>> [ Dan Williams ]
>> * Converted various pr_debug to dev_dbg [ Dan Williams ]
>>
>> v4..v5 :
>> * None
>>
>> v3..v4 :
>> * None
>>
>> v2..v3 :
>> * Updated the patch prefix to 'ndctl/uapi' [Aneesh]
>>
>> v1..v2 :
>> * None
>> ---
>> arch/powerpc/include/uapi/asm/papr_pdsm.h | 136 ++++++++++++++++++++++
>> arch/powerpc/platforms/pseries/papr_scm.c | 101 +++++++++++++++-
>> include/uapi/linux/ndctl.h | 1 +
>> 3 files changed, 232 insertions(+), 6 deletions(-)
>> create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
>>
>> diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> new file mode 100644
>> index 000000000000..6407fefcc007
>> --- /dev/null
>> +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> @@ -0,0 +1,136 @@
>> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
>> +/*
>> + * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl
>> + *
>> + * (C) Copyright IBM 2020
>> + *
>> + * Author: Vaibhav Jain <vaibhav at linux.ibm.com>
>> + */
>> +
>> +#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_
>> +#define _UAPI_ASM_POWERPC_PAPR_PDSM_H_
>> +
>> +#include <linux/types.h>
>> +
>> +/*
>> + * PDSM Envelope:
>> + *
>> + * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
>> + * envelope which consists of a header and user-defined payload sections.
>> + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
>> + * payload following it and accessible via 'nd_pdsm_cmd_pkg.payload' field.
>> + * There is a reserved field that can be used to introduce new fields to the
>> + * structure in future. It also tries to ensure that 'nd_pdsm_cmd_pkg.payload'
>> + * lies at an 8-byte boundary.
>> + *
>> + * +-------------+---------------------+---------------------------+
>> + * | 64-Bytes | 16-Bytes | Max 176-Bytes |
>> + * +-------------+---------------------+---------------------------+
>> + * | nd_pdsm_cmd_pkg | |
>> + * |-------------+ | |
>> + * | nd_cmd_pkg | | |
>> + * +-------------+---------------------+---------------------------+
>> + * | nd_family | | |
>> + * | nd_size_out | cmd_status | |
>> + * | nd_size_in | payload_version | payload |
>> + * | nd_command | reserved | |
>> + * | nd_fw_size | | |
>> + * +-------------+---------------------+---------------------------+
>> + *
>> + * PDSM Header:
>> + *
>> + * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a
>> + * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to member
>> + * 'nd_cmd_pkg.nd_command'. Apart from size information of the envelope which is
>> + * contained in 'struct nd_cmd_pkg', the header also has the following
>> + * members:
>> + *
>> + * 'cmd_status' : (Out) Errors if any encountered while servicing PDSM.
>> + * 'payload_version' : (In/Out) Version number associated with the payload.
>> + * 'reserved' : Not used and reserved for future.
>> + *
>> + * PDSM Payload:
>> + *
>> + * The layout of the PDSM Payload is defined by various structs shared between
>> + * papr_scm and libndctl so that contents of payload can be interpreted. During
>> + * servicing of a PDSM the papr_scm module will read input args from the payload
>> + * field by casting its contents to an appropriate struct pointer based on the
>> + * PDSM command. Similarly the output of servicing the PDSM command will be
>> + * copied to the payload field using the same struct.
>> + *
>> + * 'libnvdimm' enforces a hard limit of 256 bytes on the envelope size, which
>> + * leaves around 176 bytes for the envelope payload (ignoring any padding that
>> + * the compiler may silently introduce).
>> + *
>> + * Payload Version:
>> + *
>> + * A 'payload_version' field is present in PDSM header that indicates a specific
>> + * version of the structure present in PDSM Payload for a given PDSM command.
>> + * This provides backward compatibility in case the PDSM Payload structure
>> + * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
>> + *
>> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
>> + * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
>> + * module when servicing the PDSM envelope checks the 'payload_version' and then
>> + * uses 'payload struct version' == MIN('payload_version field',
>> + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
>> + * After servicing the PDSM, 'papr_scm' puts the negotiated version of the
>> + * payload struct in the returned 'payload_version' field.
>> + *
>> + * Libndctl on receiving the envelope back from papr_scm again checks the
>> + * 'payload_version' field and based on it uses the appropriate version of the
>> + * PDSM struct to parse the results.
>> + *
>> + * Backward Compatibility:
>> + *
>> + * The above scheme of exchanging differently versioned PDSM structs between
>> + * libndctl and papr_scm should provide backward compatibility as long as the
>> + * following two assumptions/conditions hold when defining new PDSM structs:
>> + *
>> + * Let T(X) = { set of attributes in PDSM struct 'T' versioned X }
>> + *
>> + * 1. T(X) is a proper subset of T(Y) if Y > X.
>> + * i.e Each new version of PDSM struct should retain existing struct
>> + * attributes from previous version
>> + *
>> + * 2. If an entity (libndctl or papr_scm) supports a PDSM struct T(X) then
>> + * it should also support T(1), T(2)...T(X - 1).
>> + * i.e When adding support for new version of a PDSM struct, libndctl
>> + * and papr_scm should retain support of the existing PDSM struct
>> + * version they support.
>> + */
>> +
>> +/* PDSM-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
>> +struct nd_pdsm_cmd_pkg {
>> + struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
>> + __s32 cmd_status; /* Out: Sub-cmd status returned back */
>> + __u16 reserved[5]; /* Ignored and to be used in future */
>> + __u16 payload_version; /* In/Out: version of the payload */
>> + __u8 payload[]; /* In/Out: Sub-cmd data buffer */
>> +} __packed;
>> +
>> +/*
>> + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
>> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
>> + */
>> +enum papr_pdsm {
>> + PAPR_PDSM_MIN = 0x0,
>> + PAPR_PDSM_MAX,
>> +};
>> +
>> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
>> +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
>> +{
>> + return (struct nd_pdsm_cmd_pkg *) cmd;
>> +}
>> +
>> +/* Return the payload pointer for a given pcmd */
>> +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
>> +{
>> + if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
>> + return NULL;
>> + else
>> + return (void *)(pcmd->payload);
>> +}
>> +
>> +#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>> index 149431594839..5e2237e7ec08 100644
>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> @@ -15,13 +15,15 @@
>> #include <linux/seq_buf.h>
>>
>> #include <asm/plpar_wrappers.h>
>> +#include <asm/papr_pdsm.h>
>>
>> #define BIND_ANY_ADDR (~0ul)
>>
>> #define PAPR_SCM_DIMM_CMD_MASK \
>> ((1ul << ND_CMD_GET_CONFIG_SIZE) | \
>> (1ul << ND_CMD_GET_CONFIG_DATA) | \
>> - (1ul << ND_CMD_SET_CONFIG_DATA))
>> + (1ul << ND_CMD_SET_CONFIG_DATA) | \
>> + (1ul << ND_CMD_CALL))
>>
>> /* DIMM health bitmap bitmap indicators */
>> /* SCM device is unable to persist memory contents */
>> @@ -350,16 +352,97 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
>> return 0;
>> }
>>
>> +/*
>> + * Validate the inputs args to dimm-control function and return '0' if valid.
>> + * This also does initial sanity validation to ND_CMD_CALL sub-command packages.
>> + */
>> +static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>> + unsigned int buf_len)
>> +{
>> + unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
>> + struct nd_pdsm_cmd_pkg *pkg = nd_to_pdsm_cmd_pkg(buf);
>> + struct papr_scm_priv *p;
>> +
>> + /* Only dimm-specific calls are supported atm */
>> + if (!nvdimm)
>> + return -EINVAL;
>> +
>> + /* get the provider date from struct nvdimm */
>> + p = nvdimm_provider_data(nvdimm);
>> +
>> + if (!test_bit(cmd, &cmd_mask)) {
>> + dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
>> + return -EINVAL;
>> + } else if (cmd == ND_CMD_CALL) {
>> +
>> + /* Verify the envelope package */
>> + if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
>> + dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
>> + buf_len);
>> + return -EINVAL;
>> + }
>> +
>> + /* Verify that the PDSM family is valid */
>> + if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR) {
>> + dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
>> + pkg->hdr.nd_family);
>> + return -EINVAL;
>> +
>> + }
>> +
>> + /* We except a payload with all PDSM commands */
>> + if (pdsm_cmd_to_payload(pkg) == NULL) {
>> + dev_dbg(&p->pdev->dev,
>> + "Empty payload for sub-command=0x%llx\n",
>> + pkg->hdr.nd_command);
>> + return -EINVAL;
>> + }
>> + }
>> +
>> + /* Command looks valid */
>
<snip>
> So this is where I would expect the kernel to validate the command vs
> a known list of supported commands / payloads. One of the goals of
> requiring public documentation of any commands that libnvdimm might
> support for the ioctl path is to give the kernel the ability to gate
> future enabling on consideration of a common kernel front-end
> interface. I believe this would also address questions about the
> versioning scheme because userspace would be actively prevented from
> sending command payloads that were not first explicitly enabled in the
> kernel. This interface as it stands in this patch set seems to be a
> very thin / "anything goes" passthrough with no consideration for that
> policy.
>
> As an example of the utility of this policy, consider the recent
> support for nvdimm security commands that allow a passphrase to be set
> and issue commands like "unlock" and "secure erase". The kernel
> actively prevents those commands from being sent from userspace. See
> acpi_nfit_clear_to_send() and nd_cmd_clear_to_send(). The reasoning is
> that it enforces the kernel's nvdimm security model that uses
> encrypted/trusted keys to protect key material (clear text keys
> only-ever exist in kernel-space). Yes, that restriction is painful for
> people that don't want the kernel's security model and just want the
> simplicity of passing clear-text keys around, but it's necessary for
> the kernel to have any chance to provide a common abstraction across
> vendors. The pain of negotiating every single command with what the
> kernel will support is useful for the long term health of the
> kernel. It forces ongoing conversations across vendors to consolidate
> interfaces and reuse kernel best practices like encrypted/trusted
> keys. Code acceptance is the only real gate the kernel has to enforce
> cooperation across vendors.
>
> The expectation is that the kernel does not allow any command to pass
> that is not explicitly listed in a bitmap of known commands. I would
> expect that if you changed the payload of an existing command that
> would likely require a new entry in this bitmap. The goal is to give
> the kernel a chance to constrain the passthrough interface to afford a
> chance to have a discussion of what might be done in a common
> implementation. Another example is the label-area read-write
> commands. The kernel needs explicit control to ensure that it owns the
> label area and that userspace is not able to corrupt it (write it
> behind the kernel's back).
>
> Now that said, I have battle scars with some OEMs that just want a
> generic passthrough interface so they never need to work with the
> kernel community again and can just write their custom validation
> tooling and be done. I've mostly been successful in that fight outside
> of the gaping hole of ND_CMD_VENDOR. That's the path that ipmctl has
> used to issue commands that have not made it into the public
> specification on docs.pmem.io. My warning shot for that is the
> "disable_vendor_specific" module option that administrators can set to
> only allow commands that the kernel explicitly knows the effects of to
> be issued. The result is only tooling / enabling that submits to this
> auditing regime is guaranteed to work everywhere.
Agree with the points made above. With this patchset we aren't really trying
to push an ioctl passthrough to exchange arbitrary data with the
papr-scm module. Nor do we want to bypass the kernel community for any
future enhancements to this interface. We made some design choices based on
our understanding of certain restrictions we saw in
ndctl/libndctl. Specifically, we wanted to avoid issuing two CMD_CALL ioctl
roundtrips.
That being said, I had an extended discussion with Aneesh rethinking the
'version' field and we both agreed *to remove this field* from the
proposed 'struct nd_pdsm_cmd_pkg'. This should resolve the contention
around Patch-4 in this patchset. Since the 'version' field isn't
extensively used right now, the impact on the patchset would be small.
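To make sure I read the "bitmap of known commands" expectation correctly,
here is a minimal userspace sketch of such a gate. All names (demo_pdsm,
DEMO_PDSM_CMD_MASK, demo_pdsm_cmd_allowed) are hypothetical and are not
taken from this series; it only illustrates rejecting any sub-command that
was not explicitly enabled.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical sub-command numbers, for illustration only */
enum demo_pdsm {
	DEMO_PDSM_MIN = 0,
	DEMO_PDSM_HEALTH,
	DEMO_PDSM_MAX,
};

/* Bitmap of sub-commands that have been explicitly enabled */
#define DEMO_PDSM_CMD_MASK (1ul << DEMO_PDSM_HEALTH)

/* Return 0 if 'cmd' is explicitly listed in the bitmap, -EINVAL otherwise */
static int demo_pdsm_cmd_allowed(uint64_t cmd)
{
	/* Range check first so the shift below is always well defined */
	if (cmd <= DEMO_PDSM_MIN || cmd >= DEMO_PDSM_MAX)
		return -EINVAL;

	if (!(DEMO_PDSM_CMD_MASK & (1ul << cmd)))
		return -EINVAL;

	return 0;
}
```

With a gate like this, adding a new sub-command (or a new payload for an
existing one) means adding a new bit, which is what gives reviewers a
chance to look at it first.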
>
> So, that long explanation out of the way, what does that mean for this
> patch set? I'd like to understand if you still see a need for a
> versioning scheme if the implementation is required to explicitly list
> all the commands it supports? I.e. that the kernel need not worry
> about userspace sending future unknown payloads because unknown
> payloads are blocked. Also if your interface has anything similar to a
> "vendor specific" passthrough I would like to require that go through
> the ND_CMD_VENDOR ioctl, so that the kernel still has a common check
> point to prevent vendor specific "I don't want to talk to the kernel
> community" shenanigans, but even better if ND_CMD_VENDOR is something
> the kernel can eventually jettison because nobody is using it.
As I mentioned above, this isn't a 'vendor specific passthrough'
mechanism. The 'version' field was proposed to avoid two CMD_CALL ioctl
roundtrips to fetch and report extended nvdimm health data like
'life-remaining', which isn't always available for papr-scm.
However, we just realized that instead of relying on the 'version' field we can
advertise support for these extended attributes via nvdimm-flags from
sysfs. Looking at the nvdimm-flags, libndctl can use an appropriate
pdsm command and struct to fetch the dimm health information from
papr_scm via CMD_CALL.
But that's something we plan to do in the future and not with the current
patchset, which only reports a fixed set of nvdimm health attributes.
>
> I feel like this is a conversation that will take a few days to
> resolve, which does not leave time to push this for v5.8. That said, I
> do think the health flags patches at the beginning of this series are
> low risk and uncontentious. How about I merge those for v5.8 and
> circle back to get this ioctl path queued early in v5.8-rc? Apologies
> for the late feedback on this relative to v5.8.
>
Thanks for this consideration. I agree with the proposal. However, the changes to
the patchset for removing the 'version' field are fairly small, hence I can
quickly push an updated patch series that also addresses the rest of the review
comments from Ira.
Does that sound reasonable?
Thanks,
~ Vaibhav
Ira Weiny <[email protected]> writes:
> On Wed, Jun 03, 2020 at 11:41:42PM +0530, Vaibhav Jain wrote:
>> Hi Ira,
>>
>> Thanks for reviewing this patch. My responses below:
>>
>> Ira Weiny <[email protected]> writes:
>>
>
> ...
>
>> >> + *
>> >> + * Payload Version:
>> >> + *
>> >> + * A 'payload_version' field is present in PDSM header that indicates a specific
>> >> + * version of the structure present in PDSM Payload for a given PDSM command.
>> >> + * This provides backward compatibility in case the PDSM Payload structure
>> >> + * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
>> >> + *
>> >> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
>> >> + * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
>> >> + * module when servicing the PDSM envelope checks the 'payload_version' and then
>> >> + * uses 'payload struct version' == MIN('payload_version field',
>> >> + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
>> >> + * After servicing the PDSM, 'papr_scm' put the negotiated version of payload
>> >> + * struct in returned 'payload_version' field.
>> >> + *
>> >> + * Libndctl on receiving the envelope back from papr_scm again checks the
>> >> + * 'payload_version' field and based on it use the appropriate version dsm
>> >> + * struct to parse the results.
>> >> + *
>> >> + * Backward Compatibility:
>> >> + *
>> >> + * Above scheme of exchanging different versioned PDSM struct between libndctl
>> >> + * and papr_scm should provide backward compatibility until following two
>> >> + * assumptions/conditions when defining new PDSM structs hold:
>> >> + *
>> >> + * Let T(X) = { set of attributes in PDSM struct 'T' versioned X }
>> >> + *
>> >> + * 1. T(X) is a proper subset of T(Y) if Y > X.
>> >> + * i.e Each new version of PDSM struct should retain existing struct
>> >> + * attributes from previous version
>> >> + *
>> >> + * 2. If an entity (libndctl or papr_scm) supports a PDSM struct T(X) then
>> >> + * it should also support T(1), T(2)...T(X - 1).
>> >> + * i.e When adding support for new version of a PDSM struct, libndctl
>> >> + * and papr_scm should retain support of the existing PDSM struct
>> >> + * version they support.
>> >
>> > Please see this thread for an example why versions are a bad idea in UAPIs:
>> >
>> > https://lkml.org/lkml/2020/3/26/213
>> >
>>
>> > While the use of version is different in that thread the fundamental issues are
>> > the same. You end up with some weird matrix of supported features and
>> > structure definitions. For example, you are opening up the possibility of
>> > changing structures with a different version for no good reason.
>>
>> Not really sure I understand the statement correctly "you are opening up
>> the possibility of changing structures with a different version for no
>> good reason."
>
[..]
> What I mean is:
>
> struct v1 {
> u32 x;
> u32 y;
> };
>
> struct v2 {
> u32 y;
> u32 x;
> };
>
> x and y are the same data but you have now redefined the order of the struct.
> You don't need that flexibility/complexity.
>
> Generally I think you are defining:
>
> struct v1 {
> u32 x;
> u32 y;
> };
>
> struct v2 {
> u32 x;
> u32 y;
> u32 z;
> u32 a;
> };
>
> Which becomes 2 structures... There is no need.
>
> The easiest thing to do is:
>
> struct user_data {
> u32 x;
> u32 y;
> };
>
> And later you modify user_data to:
>
> struct user_data {
> u32 x;
> u32 y;
> u32 z;
> u32 a;
> };
>
> libndctl always passes sizeof(struct user_data) to the call. [Do ensure
> structures are 64bit aligned for this to work.]
>
> The kernel sees the size and returns the amount of data up to that size.
>
> Therefore, older kernels automatically fill in x and y, newer kernels fill in
> z/a if the buffer was big enough. libndctl only uses the fields it knows about.
>
> It is _much_ easier this way. Almost nothing needs to get changed as versions
> roll forward. The only big issue is if libndctl _needs_ z then it has to check
> if z is returned.
>
> In that case add a cap_mask with bit fields which the kernel can fill in for
> which fields are valid.
>
> struct user_data {
> u64 cap_mask; /* where bits define extra future capabilities */
> u32 x;
> u32 y;
> };
>
> IFF you need to add data within fields which are reserved you can use
> capability flags to indicate which fields are requested and which are returned
> by the kernel.
>
> But I _think_ for what you want libndctl must survive if z/a are not available
> right? So just adding to the structure should be fine.
Agreed. But as I mentioned in my response to Dan's review comments [1], we
will be removing the version field altogether and instead will introduce
new pdsm requests bound to new struct definitions in conjunction with
nvdimm-flags. I have a patchset ready which I will be sending out soon.
[1] https://lore.kernel.org/linux-nvdimm/[email protected]/
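For the record, this is how I understand the size-based scheme described
above, as a minimal userspace sketch. The struct and helper names
(demo_user_data, demo_fill_payload) are hypothetical and not part of this
series; the only assumption is that fields are appended and never reordered.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical payload; fields are only ever appended, never reordered */
struct demo_user_data {
	uint32_t x;
	uint32_t y;
	uint32_t z;	/* appended in a later revision */
	uint32_t a;	/* appended in a later revision */
};

/*
 * Copy at most 'user_len' bytes of 'src' into the caller's buffer and
 * return the number of bytes actually copied. A caller built against the
 * older, shorter struct simply gets a truncated copy.
 */
static size_t demo_fill_payload(void *user_buf, size_t user_len,
				const struct demo_user_data *src)
{
	size_t len = user_len < sizeof(*src) ? user_len : sizeof(*src);

	memcpy(user_buf, src, len);
	return len;
}
```

An older caller that only knows about x and y passes an 8 byte buffer and
gets exactly those two fields back, with no version check on either side.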
>
>> We want to return more data in the struct in future iterations. So
>> 'changing structure with different version' is something we are
>> expecting.
>>
>> With the backward compatibility constraints 1 & 2 above, it will ensure
>> that support matrix looks like a lower triangular matrix with each
>> successive version supporting previous version attributes. So supporting
>> future versions is relatively simplified.
>
> But you end up with weird switch/if's etc to deal with the multiple structures.
>
> With the size method the kernel simply returns the same size data as the user
> requested and everything is done. No logic required at all. Literally it can
> just copy the data it has (truncating if necessary).
>
Agreed. But with the version field gone, we will instead use new pdsm
requests bound to new struct definitions in conjunction with nvdimm-flags
to retrieve extended data from papr_scm.
>>
>> >
>> > Also having the user query with version Z and get back version X (older) is
>> > odd. Generally if the kernel does not know about a feature (ie version Z of
>> > the structure) it should return -EINVAL and let the user figure out what to do.
>> > The user may just give up or they could try a different query.
>> >
>> Considering the flow of ndctl/libndctl this is needed. libndctl will
>> usually issue only one CMD_CALL ioctl to the kernel and if that fails then
>> an error is reported and ndctl will exit, losing state.
>>
>> Adding a mechanism in libndctl to reissue the CMD_CALL ioctl to fetch an
>> appropriate version of the pdsm struct is going to be considerably more
>> work.
>>
>> This version fall-back mechanism ensures that libndctl will receive
>> usable data without having to reissue more CMD_CALL ioctls.
>
> Define usable?
>
> What happens if libndctl does not get 'z' in my example above? What does it
> do? If I understand correctly it does not _need_ z. So why have a check on
> the version from the kernel?
>
> What if we change to:
>
> struct v3 {
> u32 x;
> u32 y;
> u32 z;
> u32 a;
> u32 b;
> u32 c;
> };
>
> Now it has to
>
> if(version 2)
> z/a valid do something()
>
> if(version 3)
> b/c valid do something else()
>
> if z, a, b, c are all 0 does it matter?
>
> If not, the logic above disappears.
>
> If so, then you need a cap mask. Then the kernel can say c and a are valid
> (but c is 0) or other flexible stuff like that.
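To check my understanding of the cap-mask suggestion, a minimal userspace
sketch follows. The bit names and the struct (DEMO_CAP_Z, DEMO_CAP_A,
demo_user_data_caps) are hypothetical; the point is only that the consumer
can tell "z is valid and happens to be 0" apart from "z was not filled in".

```c
#include <stdint.h>

/* Hypothetical capability bits; one bit per optional field */
#define DEMO_CAP_Z (1ull << 0)
#define DEMO_CAP_A (1ull << 1)

struct demo_user_data_caps {
	uint64_t cap_mask;	/* set by the provider: which fields are valid */
	uint32_t x;
	uint32_t y;
	uint32_t z;
	uint32_t a;
};

/*
 * Fetch 'z' only if the provider marked it valid.
 * Returns 0 on success and -1 if 'z' was not provided.
 */
static int demo_get_z(const struct demo_user_data_caps *d, uint32_t *z)
{
	if (!(d->cap_mask & DEMO_CAP_Z))
		return -1;

	*z = d->z;
	return 0;
}
```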
>
>>
>> >> + */
>> >> +
>> >> +/* PDSM-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
>> >> +struct nd_pdsm_cmd_pkg {
>> >> + struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
>> >> + __s32 cmd_status; /* Out: Sub-cmd status returned back */
>> >> + __u16 reserved[5]; /* Ignored and to be used in future */
>> >
>> > How do you know when reserved is used for something else in the future? Is
>> > reserved guaranteed (and checked by the code) to be 0?
>>
>> For the current set of pdsm requests we ignore these reserved fields. However a
>> future pdsm request can leverage these reserved fields. So papr_scm
>> just binds the usage of these fields with the value of
>> 'nd_cmd_pkg.nd_command' that indicates the pdsm request.
>>
>> That being said checking if the reserved fields are set to 0 will be a
>> good measure. Will add this check in next iteration.
>
> Exactly, if you don't check them now you will end up with an older libndctl
> which passes in garbage and breaks future users... Basically rendering the
> reserved fields useless.
I have addressed this in my new patch series, which adds checks that the
reserved fields are '0'.
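Roughly, the check has the following shape. This is a simplified userspace
sketch with a stand-in struct (demo_pdsm_pkg) rather than the real uapi
layout, and the helper name is hypothetical; the actual code is in the new
series.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the envelope; not the real uapi layout */
struct demo_pdsm_pkg {
	int32_t cmd_status;
	uint16_t reserved[5];
};

/* Return 0 if every reserved word is zero, -EINVAL otherwise */
static int demo_check_reserved(const struct demo_pdsm_pkg *pkg)
{
	size_t i;

	for (i = 0; i < sizeof(pkg->reserved) / sizeof(pkg->reserved[0]); i++)
		if (pkg->reserved[i] != 0)
			return -EINVAL;

	return 0;
}
```

Rejecting non-zero values now is what keeps these fields usable later,
since no existing caller can have been passing garbage in them.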
>
>>
>> >
>> >> + __u16 payload_version; /* In/Out: version of the payload */
>> >
>> > Why is payload_version after reserved?
>> Want to place the payload version field just before the payload data so
>> that it can be accessed with simple pointer arithmetic.
>
> I did not see that in the patch. I thought you were using
> nd_pdsm_cmd_pkg->payload_version?
That's right, but it just provided another simple way to retrieve
payload_version without resorting to the 'container_of' macro. Anyway, the
version field is now gone, hence 'payload' follows the reserved fields.
>
>>
>> >
>> >> + __u8 payload[]; /* In/Out: Sub-cmd data buffer */
>> >> +} __packed;
>> >> +
>> >> +/*
>> >> + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
>> >> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
>> >> + */
>> >> +enum papr_pdsm {
>> >> + PAPR_PDSM_MIN = 0x0,
>> >> + PAPR_PDSM_MAX,
>> >> +};
>> >> +
>> >> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
>> >> +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
>> >> +{
>> >> + return (struct nd_pdsm_cmd_pkg *) cmd;
>> >> +}
>> >> +
>> >> +/* Return the payload pointer for a given pcmd */
>> >> +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
>> >> +{
>> >> + if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
>> >> + return NULL;
>> >> + else
>> >> + return (void *)(pcmd->payload);
>> >> +}
>> >> +
>> >> +#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
>> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>> >> index 149431594839..5e2237e7ec08 100644
>> >> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> >> @@ -15,13 +15,15 @@
>> >> #include <linux/seq_buf.h>
>> >>
>> >> #include <asm/plpar_wrappers.h>
>> >> +#include <asm/papr_pdsm.h>
>> >>
>> >> #define BIND_ANY_ADDR (~0ul)
>> >>
>> >> #define PAPR_SCM_DIMM_CMD_MASK \
>> >> ((1ul << ND_CMD_GET_CONFIG_SIZE) | \
>> >> (1ul << ND_CMD_GET_CONFIG_DATA) | \
>> >> - (1ul << ND_CMD_SET_CONFIG_DATA))
>> >> + (1ul << ND_CMD_SET_CONFIG_DATA) | \
>> >> + (1ul << ND_CMD_CALL))
>> >>
>> >> /* DIMM health bitmap bitmap indicators */
>> >> /* SCM device is unable to persist memory contents */
>> >> @@ -350,16 +352,97 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
>> >> return 0;
>> >> }
>> >>
>> >> +/*
>> >> + * Validate the inputs args to dimm-control function and return '0' if valid.
>> >> + * This also does initial sanity validation to ND_CMD_CALL sub-command packages.
>> >> + */
>> >> +static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>> >> + unsigned int buf_len)
>> >> +{
>> >> + unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
>> >> + struct nd_pdsm_cmd_pkg *pkg = nd_to_pdsm_cmd_pkg(buf);
>> >> + struct papr_scm_priv *p;
>> >> +
>> >> + /* Only dimm-specific calls are supported atm */
>> >> + if (!nvdimm)
>> >> + return -EINVAL;
>> >> +
>> >> + /* get the provider date from struct nvdimm */
>> >
>> > s/date/data
>> Thanks for pointing this out. Will fix this in next iteration.
>>
>> >
>> >> + p = nvdimm_provider_data(nvdimm);
>> >> +
>> >> + if (!test_bit(cmd, &cmd_mask)) {
>> >> + dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
>> >> + return -EINVAL;
>> >> + } else if (cmd == ND_CMD_CALL) {
>> >> +
>> >> + /* Verify the envelope package */
>> >> + if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
>> >> + dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
>> >> + buf_len);
>> >> + return -EINVAL;
>> >> + }
>> >> +
>> >> + /* Verify that the PDSM family is valid */
>> >> + if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR) {
>> >> + dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
>> >> + pkg->hdr.nd_family);
>> >> + return -EINVAL;
>> >> +
>> >> + }
>> >> +
>> >> + /* We except a payload with all PDSM commands */
>> >> + if (pdsm_cmd_to_payload(pkg) == NULL) {
>> >> + dev_dbg(&p->pdev->dev,
>> >> + "Empty payload for sub-command=0x%llx\n",
>> >> + pkg->hdr.nd_command);
>> >> + return -EINVAL;
>> >> + }
>> >> + }
>> >> +
>> >> + /* Command looks valid */
>> >
>> > I assume the first command to be implemented also checks the { nd_command,
>> > payload_version, payload length } for correctness?
>> Yes the pdsm service functions do check the payload_version and
>> payload_length. Please see the papr_pdsm_health() that services the
>> PAPR_PDSM_HEALTH pdsm in Patch-5
>>
>
> cool.
>
>> >
>> >> + return 0;
>> >> +}
>> >> +
>> >> +static int papr_scm_service_pdsm(struct papr_scm_priv *p,
>> >> + struct nd_pdsm_cmd_pkg *call_pkg)
>> >> +{
>> >> + /* unknown subcommands return error in packages */
>> >> + if (call_pkg->hdr.nd_command <= PAPR_PDSM_MIN ||
>> >> + call_pkg->hdr.nd_command >= PAPR_PDSM_MAX) {
>> >> + dev_dbg(&p->pdev->dev, "Invalid PDSM request 0x%llx\n",
>> >> + call_pkg->hdr.nd_command);
>> >> + call_pkg->cmd_status = -EINVAL;
>> >> + return 0;
>> >> + }
>> >> +
>> >> + /* Depending on the DSM command call appropriate service routine */
>> >> + switch (call_pkg->hdr.nd_command) {
>> >> + default:
>> >> + dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
>> >> + call_pkg->hdr.nd_command);
>> >> + call_pkg->cmd_status = -ENOENT;
>> >> + return 0;
>> >> + }
>> >> +}
>> >> +
>> >> static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>> >> struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>> >> unsigned int buf_len, int *cmd_rc)
>> >> {
>> >> struct nd_cmd_get_config_size *get_size_hdr;
>> >> struct papr_scm_priv *p;
>> >> + struct nd_pdsm_cmd_pkg *call_pkg = NULL;
>> >> + int rc;
>> >>
>> >> - /* Only dimm-specific calls are supported atm */
>> >> - if (!nvdimm)
>> >> - return -EINVAL;
>> >> + /* Use a local variable in case cmd_rc pointer is NULL */
>> >> + if (cmd_rc == NULL)
>> >> + cmd_rc = &rc;
>> >
>> > Why is this needed? AFAICT The caller of papr_scm_ndctl does not specify null
>> > and you did not change it.
>> This pointer is coming from outside the papr_scm code hence need to be
>> defensive here. Also as per[1] cmd_rc is "translation of firmware status"
>> and not every caller would need it hence making this pointer optional.
>>
>> This is evident in acpi_nfit_blk_get_flags() where the 'nd_desc->ndctl'
>> is called with 'cmd_rc == NULL'.
>>
>> [1] https://lore.kernel.org/linux-nvdimm/CAPcyv4hE_FG0YZXJVA1G=CBq8b9e0K54jxk5Sq5UKU-dnWT2Kg@mail.gmail.com/
>
> Ah... Ok. So this is a bug fix which needs to happen regardless of the status
> of this patch...
>
>>
>> >
>> >> +
>> >> + *cmd_rc = is_cmd_valid(nvdimm, cmd, buf, buf_len);
>> >> + if (*cmd_rc) {
>> >> + pr_debug("Invalid cmd=0x%x. Err=%d\n", cmd, *cmd_rc);
>> >> + return *cmd_rc;
>> >> + }
>> >>
>> >> p = nvdimm_provider_data(nvdimm);
>> >>
>> >> @@ -381,13 +464,19 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>> >> *cmd_rc = papr_scm_meta_set(p, buf);
>
> ... Because this will break here. even without this new code... right?
>
> Lets get this fix in as a prelim-patch.
Yes, I have moved the changes proposed in this hunk to a separate prelim
patch in the v10 of the patch series.
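The pattern in that prelim patch is essentially the one below, shown here
as a standalone userspace sketch. The function demo_ndctl() and its single
"known" command are hypothetical stand-ins and not the papr_scm code itself.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler: 'cmd_rc' is optional and may be NULL */
static int demo_ndctl(unsigned int cmd, int *cmd_rc)
{
	int rc;

	/* Use a local variable in case the caller did not pass cmd_rc */
	if (cmd_rc == NULL)
		cmd_rc = &rc;

	/* Only command 1 is "known" in this sketch */
	*cmd_rc = (cmd == 1) ? 0 : -EINVAL;

	return *cmd_rc;
}
```

After the redirect, every later '*cmd_rc = ...' assignment is safe whether
or not the caller cared about the translated status.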
>
>> >> break;
>> >>
>> >> + case ND_CMD_CALL:
>> >> + call_pkg = nd_to_pdsm_cmd_pkg(buf);
>> >> + *cmd_rc = papr_scm_service_pdsm(p, call_pkg);
>> >> + break;
>> >> +
>> >> default:
>> >> - return -EINVAL;
>> >> + dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
>> >> + *cmd_rc = -EINVAL;
>> >
>> > Is this change related? If there is a bug where there is a caller of
>> > papr_scm_ndctl() with cmd_rc == NULL this should be a separate patch to fix
>> > that issue.
>> This simplifies debugging of errors reported in
>> papr_scm_ndctl() a bit, as it ensures that the subsequent dev_dbg "Returned with
>> cmd_rc" is always logged.
>>
>> I think this is too small a change to be carved out as an independent
>> patch. Also this doesn't change the behaviour of the code except logging
>> some more error info.
>>
>> However, If you feel too strongly about it I will spin a separate patch
>> in this patch series for this.
[..]
>
> This can go in as part of a 'protect against cmd_rc == NULL'
> preliminary patch.
Yes, I have coalesced this change with the changes I referred to in
my previous comment into a single prelim patch.
[..]
>
> I flagged this because at first I could not figure out what this had to do with
> the ND_CMD_CALL...
>
> For reviewers you want to make your patches concise to what you are
> fixing/adding...
Sure. Will be more careful of this in future patches.
>
> Also, based on acpi_nfit_blk_get_flags() using cmd_rc == NULL it looks like we
> have a bug which needs to get fixed regardless of the this patch. And if that
> bug exists in earlier kernels you will need a separate patch to backport as a
> fix.
>
> So lets get that in first and separate... :-D
Sure, I will send out a separate independent patch fixing the cmd_rc ==
NULL issue in acpi_nfit_blk_get_flags(), addressed to the stable tree.
>
> Ira
>
>>
>> >
>> > Ira
>> >
>> >> }
>> >>
>> >> dev_dbg(&p->pdev->dev, "returned with cmd_rc = %d\n", *cmd_rc);
>> >>
>> >> - return 0;
>> >> + return *cmd_rc;
>> >> }
>> >>
>> >> static ssize_t flags_show(struct device *dev,
>> >> diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
>> >> index de5d90212409..0e09dc5cec19 100644
>> >> --- a/include/uapi/linux/ndctl.h
>> >> +++ b/include/uapi/linux/ndctl.h
>> >> @@ -244,6 +244,7 @@ struct nd_cmd_pkg {
>> >> #define NVDIMM_FAMILY_HPE2 2
>> >> #define NVDIMM_FAMILY_MSFT 3
>> >> #define NVDIMM_FAMILY_HYPERV 4
>> >> +#define NVDIMM_FAMILY_PAPR 5
>> >>
>> >> #define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
>> >> struct nd_cmd_pkg)
>> >> --
>> >> 2.26.2
>> >>
>>
>> --
>> Cheers
>> ~ Vaibhav
--
Cheers
~ Vaibhav
Hi Ira,
Thanks again for looking into this patch. My responses below:
Ira Weiny <[email protected]> writes:
> On Thu, Jun 04, 2020 at 12:34:04AM +0530, Vaibhav Jain wrote:
>> Hi Ira,
>>
>> Thanks for reviewing this patch. My responses below:
>>
>> Ira Weiny <[email protected]> writes:
>>
>> > On Tue, Jun 02, 2020 at 03:44:38PM +0530, Vaibhav Jain wrote:
>> >> This patch implements support for PDSM request 'PAPR_PDSM_HEALTH'
>> >> that returns a newly introduced 'struct nd_papr_pdsm_health' instance
>> >> containing dimm health information back to user space in response to
>> >> ND_CMD_CALL. This functionality is implemented in newly introduced
>> >> papr_pdsm_health() that queries the nvdimm health information and
>> >> then copies this information to the package payload whose layout is
>> >> defined by 'struct nd_papr_pdsm_health'.
>> >>
>> >> The patch also introduces a new member 'struct papr_scm_priv.health'
>> >> thats an instance of 'struct nd_papr_pdsm_health' to cache the health
>> >> information of a nvdimm. As a result functions drc_pmem_query_health()
>> >> and flags_show() are updated to populate and use this new struct
>> >> instead of a u64 integer that was earlier used.
>> >>
>> >> Cc: "Aneesh Kumar K . V" <[email protected]>
>> >> Cc: Dan Williams <[email protected]>
>> >> Cc: Michael Ellerman <[email protected]>
>> >> Cc: Ira Weiny <[email protected]>
>> >> Reviewed-by: Aneesh Kumar K.V <[email protected]>
>> >> Signed-off-by: Vaibhav Jain <[email protected]>
>> >> ---
>> >> Changelog:
>> >>
>> >> Resend:
>> >> * Added ack from Aneesh.
>> >>
>> >> v8..v9:
>> >> * s/PAPR_SCM_PDSM_HEALTH/PAPR_PDSM_HEALTH/g [ Dan , Aneesh ]
>> >> * s/PAPR_SCM_PSDM_DIMM_*/PAPR_PDSM_DIMM_*/g
>> >> * Renamed papr_scm_get_health() to papr_psdm_health()
>> >> * Updated patch description to replace papr-scm dimm with nvdimm.
>> >>
>> >> v7..v8:
>> >> * None
>> >>
>> >> Resend:
>> >> * None
>> >>
>> >> v6..v7:
>> >> * Updated flags_show() to use seq_buf_printf(). [Mpe]
>> >> * Updated papr_scm_get_health() to use newly introduced
>> >> __drc_pmem_query_health() bypassing the cache [Mpe].
>> >>
>> >> v5..v6:
>> >> * Added attribute '__packed' to 'struct nd_papr_pdsm_health_v1' to
>> >> gaurd against possibility of different compilers adding different
>> >> paddings to the struct [ Dan Williams ]
>> >>
>> >> * Updated 'struct nd_papr_pdsm_health_v1' to use __u8 instead of
>> >> 'bool' and also updated drc_pmem_query_health() to take this into
>> >> account. [ Dan Williams ]
>> >>
>> >> v4..v5:
>> >> * None
>> >>
>> >> v3..v4:
>> >> * Call the DSM_PAPR_SCM_HEALTH service function from
>> >> papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh]
>> >>
>> >> v2..v3:
>> >> * Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx' types
>> >> as its exported to the userspace [Aneesh]
>> >> * Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm health
>> >> from enum to #defines [Aneesh]
>> >>
>> >> v1..v2:
>> >> * New patch in the series
>> >> ---
>> >> arch/powerpc/include/uapi/asm/papr_pdsm.h | 39 +++++++
>> >> arch/powerpc/platforms/pseries/papr_scm.c | 125 +++++++++++++++++++---
>> >> 2 files changed, 147 insertions(+), 17 deletions(-)
>> >>
>> >> diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h b/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> >> index 6407fefcc007..411725a91591 100644
>> >> --- a/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> >> +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> >> @@ -115,6 +115,7 @@ struct nd_pdsm_cmd_pkg {
>> >> */
>> >> enum papr_pdsm {
>> >> PAPR_PDSM_MIN = 0x0,
>> >> + PAPR_PDSM_HEALTH,
>> >> PAPR_PDSM_MAX,
>> >> };
>> >>
>> >> @@ -133,4 +134,42 @@ static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
>> >> return (void *)(pcmd->payload);
>> >> }
>> >>
>> >> +/* Various nvdimm health indicators */
>> >> +#define PAPR_PDSM_DIMM_HEALTHY 0
>> >> +#define PAPR_PDSM_DIMM_UNHEALTHY 1
>> >> +#define PAPR_PDSM_DIMM_CRITICAL 2
>> >> +#define PAPR_PDSM_DIMM_FATAL 3
>> >> +
>> >> +/*
>> >> + * Struct exchanged between kernel & ndctl in for PAPR_PDSM_HEALTH
>> >> + * Various flags indicate the health status of the dimm.
>> >> + *
>> >> + * dimm_unarmed : Dimm not armed. So contents wont persist.
>> >> + * dimm_bad_shutdown : Previous shutdown did not persist contents.
>> >> + * dimm_bad_restore : Contents from previous shutdown werent restored.
>> >> + * dimm_scrubbed : Contents of the dimm have been scrubbed.
>> >> + * dimm_locked : Contents of the dimm cant be modified until CEC reboot
>> >> + * dimm_encrypted : Contents of dimm are encrypted.
>> >> + * dimm_health : Dimm health indicator. One of PAPR_PDSM_DIMM_XXXX
>> >> + */
>> >> +struct nd_papr_pdsm_health_v1 {
>> >> + __u8 dimm_unarmed;
>> >> + __u8 dimm_bad_shutdown;
>> >> + __u8 dimm_bad_restore;
>> >> + __u8 dimm_scrubbed;
>> >> + __u8 dimm_locked;
>> >> + __u8 dimm_encrypted;
>> >> + __u16 dimm_health;
>> >> +} __packed;
>> >> +
>> >> +/*
>> >> + * Typedef the current struct for dimm_health so that any application
>> >> + * or kernel recompiled after introducing a new version automatically
>> >> + * supports the new version.
>> >> + */
>> >> +#define nd_papr_pdsm_health nd_papr_pdsm_health_v1
>> >> +
>> >> +/* Current version number for the dimm health struct */
>> >
>> > This can't be the 'current' version. You will need a list of versions you
>> > support. Because if the user passes in an old version you need to be able to
>> > respond with that old version. Also if you plan to support 'return X for a Y
>> > query' then the user will need both X and Y defined to interpret X.
>> Yes, and that change will be introduced with the addition of version-2 of
>> nd_papr_pdsm_health. An earlier version of the patchset[1] had such a table
>> implemented, but to simplify the patchset it was dropped, as we are only
>> dealing with version-1 of the structs right now.
>>
>> [1] :
>> https://lore.kernel.org/linuxppc-dev/[email protected]/
>
> I'm not sure I follow that comment.
>
> I feel like there is some confusion about what firmware can return vs the UAPI
> structure. You have already marshaled the data between the 2. We can define
> whatever we want for the UAPI structures throwing away data the kernel does not
> understand from the firmware.
>
>>
>> >
>> >> +#define ND_PAPR_PDSM_HEALTH_VERSION 1
>> >> +
>> >> #endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
>> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>> >> index 5e2237e7ec08..c0606c0c659c 100644
>> >> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> >> @@ -88,7 +88,7 @@ struct papr_scm_priv {
>> >> unsigned long lasthealth_jiffies;
>> >>
>> >> /* Health information for the dimm */
>> >> - u64 health_bitmap;
>> >> + struct nd_papr_pdsm_health health;
>> >
>> > ok so we are throwing away all the #defs from patch 1? Are they still valid?
>> >
>> > I'm confused that patch 3 added this and we are throwing it away
>> > here...
>> The #defines are still valid, only their usage moved to __drc_pmem_query_health().
>>
>> >
>> >> };
>> >>
>> >> static int drc_pmem_bind(struct papr_scm_priv *p)
>> >> @@ -201,6 +201,7 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
>> >> static int __drc_pmem_query_health(struct papr_scm_priv *p)
>> >> {
>> >> unsigned long ret[PLPAR_HCALL_BUFSIZE];
>> >> + u64 health;
>> >> long rc;
>> >>
>> >> /* issue the hcall */
>> >> @@ -208,18 +209,46 @@ static int __drc_pmem_query_health(struct papr_scm_priv *p)
>> >> if (rc != H_SUCCESS) {
>> >> dev_err(&p->pdev->dev,
>> >> "Failed to query health information, Err:%ld\n", rc);
>> >> - rc = -ENXIO;
>> >> - goto out;
>> >> + return -ENXIO;
>> >
>> > I missed this... probably did not need the goto in the first patch?
>> Yes, will get rid of the goto from patch-1.
>
> Cool.
>
>>
>> >
>> >> }
>> >>
>> >> p->lasthealth_jiffies = jiffies;
>> >> - p->health_bitmap = ret[0] & ret[1];
>> >> + health = ret[0] & ret[1];
>> >>
>> >> dev_dbg(&p->pdev->dev,
>> >> "Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
>> >> ret[0], ret[1]);
>> >> -out:
>> >> - return rc;
>> >> +
>> >> + memset(&p->health, 0, sizeof(p->health));
>> >> +
>> >> + /* Check for various masks in bitmap and set the buffer */
>> >> + if (health & PAPR_PMEM_UNARMED_MASK)
>> >
>> > Oh ok... odd. (don't add code then just take it away in a series)
>> > You could have led with the user structure and put this code in patch
>> > 3.
>> The struct nd_papr_pdsm_health is only introduced in this patch, in header
>> 'papr_pdsm.h', as a means of exchanging nvdimm health information with
>> userspace. Introducing this struct without introducing the necessary
>> scaffolding in 'papr_pdsm.h' would have been very counter-intuitive.
>
> I respectfully disagree. You intended to use a copy of this structure in
> kernel to store the data. Just do that.
I have addressed this in v10, which no longer removes functionality that
was introduced in an earlier patch.
>
>>
>> >
>> > Why does the user need u8 to represent a single bit? Does this help protect
>> > against endian issues?
>> This was 'bool' earlier, but since type 'bool' isn't suitable for an ioctl ABI
>> and I wanted to avoid bit fields here, as I am not sure whether their packing
>> may differ across compilers, I replaced it with u8.
>>
>
> ok works for me...
>
>> >
>> >> + p->health.dimm_unarmed = 1;
>> >> +
>> >> + if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
>> >> + p->health.dimm_bad_shutdown = 1;
>> >> +
>> >> + if (health & PAPR_PMEM_BAD_RESTORE_MASK)
>> >> + p->health.dimm_bad_restore = 1;
>> >> +
>> >> + if (health & PAPR_PMEM_ENCRYPTED)
>> >> + p->health.dimm_encrypted = 1;
>> >> +
>> >> + if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED) {
>> >> + p->health.dimm_locked = 1;
>> >> + p->health.dimm_scrubbed = 1;
>> >> + }
>> >> +
>> >> + if (health & PAPR_PMEM_HEALTH_UNHEALTHY)
>> >> + p->health.dimm_health = PAPR_PDSM_DIMM_UNHEALTHY;
>> >> +
>> >> + if (health & PAPR_PMEM_HEALTH_CRITICAL)
>> >> + p->health.dimm_health = PAPR_PDSM_DIMM_CRITICAL;
>> >> +
>> >> + if (health & PAPR_PMEM_HEALTH_FATAL)
>> >> + p->health.dimm_health = PAPR_PDSM_DIMM_FATAL;
>> >> +
>> >> + return 0;
>> >> }
>> >>
>> >> /* Min interval in seconds for assuming stable dimm health */
>> >> @@ -403,6 +432,58 @@ static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>> >> return 0;
>> >> }
>> >>
>> >> +/* Fetch the DIMM health info and populate it in provided package. */
>> >> +static int papr_pdsm_health(struct papr_scm_priv *p,
>> >> + struct nd_pdsm_cmd_pkg *pkg)
>> >> +{
>> >> + int rc;
>> >> + size_t copysize = sizeof(p->health);
>> >> +
>> >> + /* Ensure dimm health mutex is taken preventing concurrent access */
>> >> + rc = mutex_lock_interruptible(&p->health_mutex);
>> >> + if (rc)
>> >> + goto out;
>> >> +
>> >> + /* Always fetch upto date dimm health data ignoring cached values */
>> >> + rc = __drc_pmem_query_health(p);
>> >> + if (rc)
>> >> + goto out_unlock;
>> >> + /*
>> >> + * If the requested payload version is greater than one we know
>> >> + * about, return the payload version we know about and let
>> >> + * caller/userspace handle.
>> >> + */
>> >> + if (pkg->payload_version > ND_PAPR_PDSM_HEALTH_VERSION)
>> >> + pkg->payload_version = ND_PAPR_PDSM_HEALTH_VERSION;
>> >
>> > I know this seems easy now but I do think you will run into trouble later.
>>
>> I did address this in an earlier iteration of this patchset[1] and
>> dropped it in favour of simplicity.
>>
>> [1] :
>> https://lore.kernel.org/linuxppc-dev/[email protected]/
>
> I don't see how that addresses this? See my other email.
>
> Ira
>
>>
>> > Ira
>> >
>> >> +
>> >> + if (pkg->hdr.nd_size_out < copysize) {
>> >> + dev_dbg(&p->pdev->dev, "Truncated payload (%u). Expected (%lu)",
>> >> + pkg->hdr.nd_size_out, copysize);
>> >> + rc = -ENOSPC;
>> >> + goto out_unlock;
>> >> + }
>> >> +
>> >> + dev_dbg(&p->pdev->dev, "Copying payload size=%lu version=0x%x\n",
>> >> + copysize, pkg->payload_version);
>> >> +
>> >> + /* Copy the health struct to the payload */
>> >> + memcpy(pdsm_cmd_to_payload(pkg), &p->health, copysize);
>> >> + pkg->hdr.nd_fw_size = copysize;
>> >> +
>> >> +out_unlock:
>> >> + mutex_unlock(&p->health_mutex);
>> >> +
>> >> +out:
>> >> + /*
>> >> + * Put the error in out package and return success from function
>> >> + * so that errors if any are propogated back to userspace.
>> >> + */
>> >> + pkg->cmd_status = rc;
>> >> + dev_dbg(&p->pdev->dev, "completion code = %d\n", rc);
>> >> +
>> >> + return 0;
>> >> +}
>> >> +
>> >> static int papr_scm_service_pdsm(struct papr_scm_priv *p,
>> >> struct nd_pdsm_cmd_pkg *call_pkg)
>> >> {
>> >> @@ -417,6 +498,9 @@ static int papr_scm_service_pdsm(struct papr_scm_priv *p,
>> >>
>> >> /* Depending on the DSM command call appropriate service routine */
>> >> switch (call_pkg->hdr.nd_command) {
>> >> + case PAPR_PDSM_HEALTH:
>> >> + return papr_pdsm_health(p, call_pkg);
>> >> +
>> >> default:
>> >> dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
>> >> call_pkg->hdr.nd_command);
>> >> @@ -485,34 +569,41 @@ static ssize_t flags_show(struct device *dev,
>> >> struct nvdimm *dimm = to_nvdimm(dev);
>> >> struct papr_scm_priv *p = nvdimm_provider_data(dimm);
>> >> struct seq_buf s;
>> >> - u64 health;
>> >> int rc;
>> >>
>> >> rc = drc_pmem_query_health(p);
>> >> if (rc)
>> >> return rc;
>> >>
>> >> - /* Copy health_bitmap locally, check masks & update out buffer */
>> >> - health = READ_ONCE(p->health_bitmap);
>> >> -
>> >> seq_buf_init(&s, buf, PAGE_SIZE);
>> >> - if (health & PAPR_PMEM_UNARMED_MASK)
>> >> +
>> >> + /* Protect concurrent modifications to papr_scm_priv */
>> >> + rc = mutex_lock_interruptible(&p->health_mutex);
>> >> + if (rc)
>> >> + return rc;
>> >> +
>> >> + if (p->health.dimm_unarmed)
>> >> seq_buf_printf(&s, "not_armed ");
>> >>
>> >> - if (health & PAPR_PMEM_BAD_SHUTDOWN_MASK)
>> >> + if (p->health.dimm_bad_shutdown)
>> >> seq_buf_printf(&s, "flush_fail ");
>> >>
>> >> - if (health & PAPR_PMEM_BAD_RESTORE_MASK)
>> >> + if (p->health.dimm_bad_restore)
>> >> seq_buf_printf(&s, "restore_fail ");
>> >>
>> >> - if (health & PAPR_PMEM_ENCRYPTED)
>> >> + if (p->health.dimm_encrypted)
>> >> seq_buf_printf(&s, "encrypted ");
>> >>
>> >> - if (health & PAPR_PMEM_SMART_EVENT_MASK)
>> >> + if (p->health.dimm_health)
>> >> seq_buf_printf(&s, "smart_notify ");
>> >>
>> >> - if (health & PAPR_PMEM_SCRUBBED_AND_LOCKED)
>> >> - seq_buf_printf(&s, "scrubbed locked ");
>> >> + if (p->health.dimm_scrubbed)
>> >> + seq_buf_printf(&s, "scrubbed ");
>> >> +
>> >> + if (p->health.dimm_locked)
>> >> + seq_buf_printf(&s, "locked ");
>> >> +
>> >> + mutex_unlock(&p->health_mutex);
>> >>
>> >> if (seq_buf_used(&s))
>> >> seq_buf_printf(&s, "\n");
>> >> --
>> >> 2.26.2
>> >>
>>
>> --
>> Cheers
>> ~ Vaibhav
--
Cheers
~ Vaibhav
> -----Original Message-----
> From: Vaibhav Jain <[email protected]>
> Sent: Thursday, June 4, 2020 2:06 AM
> To: Williams, Dan J <[email protected]>; linuxppc-
> [email protected]; [email protected]; linux-
> [email protected]
> Cc: Santosh Sivaraj <[email protected]>; Aneesh Kumar K . V
> <[email protected]>; Steven Rostedt <[email protected]>;
> Oliver O'Halloran <[email protected]>; Weiny, Ira <[email protected]>
> Subject: RE: [RESEND PATCH v9 4/5] ndctl/papr_scm,uapi: Add support for
> PAPR nvdimm specific methods
>
> Hi Dan,
>
> Thanks for review and insights on this. My responses below:
>
> "Williams, Dan J" <[email protected]> writes:
>
> > [ forgive formatting I'm temporarily stuck using Outlook this week...
> > ]
> >
> >> From: Vaibhav Jain <[email protected]>
> > [..]
> >>
> >> Introduce support for PAPR NVDIMM Specific Methods (PDSM) in
> papr_scm
> >> module and add the command family NVDIMM_FAMILY_PAPR to the
> white
> >> list of NVDIMM command sets. Also advertise support for ND_CMD_CALL
> >> for the nvdimm command mask and implement necessary scaffolding in
> >> the module to handle ND_CMD_CALL ioctl and PDSM requests that we
> receive.
> >>
> >> The layout of the PDSM request as we expect from libnvdimm/libndctl
> >> is described in newly introduced uapi header 'papr_pdsm.h' which
> >> defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used to
> >> communicate the PDSM request via member
> 'nd_cmd_pkg.nd_command' and
> >> size of payload that need to be sent/received for servicing the PDSM.
> >>
> >> A new function is_cmd_valid() is implemented that reads the args to
> >> papr_scm_ndctl() and performs sanity tests on them. A new function
> >> papr_scm_service_pdsm() is introduced and is called from
> >> papr_scm_ndctl() in case of a PDSM request is received via
> >> ND_CMD_CALL command from libnvdimm.
> >>
> >> Cc: "Aneesh Kumar K . V" <[email protected]>
> >> Cc: Dan Williams <[email protected]>
> >> Cc: Michael Ellerman <[email protected]>
> >> Cc: Ira Weiny <[email protected]>
> >> Reviewed-by: Aneesh Kumar K.V <[email protected]>
> >> Signed-off-by: Vaibhav Jain <[email protected]>
> >> ---
> >> Changelog:
> >>
> >> Resend:
> >> * Added ack from Aneesh.
> >>
> >> v8..v9:
> >> * Reduced the usage of term SCM replacing it with appropriate
> >> replacement [ Dan Williams, Aneesh ]
> >> * Renamed 'papr_scm_pdsm.h' to 'papr_pdsm.h'
> >> * s/PAPR_SCM_PDSM_*/PAPR_PDSM_*/g
> >> * s/NVDIMM_FAMILY_PAPR_SCM/NVDIMM_FAMILY_PAPR/g
> >> * Minor updates to 'papr_psdm.h' to replace usage of term 'SCM'.
> >> * Minor update to patch description.
> >>
> >> v7..v8:
> >> * Removed the 'payload_offset' field from 'struct
> >> nd_pdsm_cmd_pkg'. Instead command payload is always assumed to
> start
> >> at 'nd_pdsm_cmd_pkg.payload'. [ Aneesh ]
> >> * To enable introducing new fields to 'struct nd_pdsm_cmd_pkg',
> >> 'reserved' field of 10-bytes is introduced. [ Aneesh ]
> >> * Fixed a typo in "Backward Compatibility" section of papr_scm_pdsm.h
> >> [ Ira ]
> >>
> >> Resend:
> >> * None
> >>
> >> v6..v7 :
> >> * Removed the re-definitions of __packed macro from papr_scm_pdsm.h
> >> [Mpe].
> >> * Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h
> [Mpe].
> >> * Removed macros that were unused in papr_scm.c from
> papr_scm_pdsm.h
> >> [Mpe].
> >> * Made functions defined in papr_scm_pdsm.h as static inline. [Mpe]
> >>
> >> v5..v6 :
> >> * Changed the usage of the term DSM to PDSM to distinguish it from the
> >> ACPI term [ Dan Williams ]
> >> * Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various
> >> struct
> >> to reflect the new terminology.
> >> * Updated the patch description and title to reflect the new terminology.
> >> * Squashed patch to introduce new command family in 'ndctl.h' with
> >> this patch [ Dan Williams ]
> >> * Updated the papr_scm_pdsm method starting index from 0x10000 to
> 0x0
> >> [ Dan Williams ]
> >> * Removed redundant license text from the papr_scm_psdm.h file.
> >> [ Dan Williams ]
> >> * s/envelop/envelope/ at various places [ Dan Williams ]
> >> * Added '__packed' attribute to command package header to guard
> >> against different compiler adding paddings between the fields.
> >> [ Dan Williams]
> >> * Converted various pr_debug to dev_debug [ Dan Williams ]
> >>
> >> v4..v5 :
> >> * None
> >>
> >> v3..v4 :
> >> * None
> >>
> >> v2..v3 :
> >> * Updated the patch prefix to 'ndctl/uapi' [Aneesh]
> >>
> >> v1..v2 :
> >> * None
> >> ---
> >> arch/powerpc/include/uapi/asm/papr_pdsm.h | 136
> >> ++++++++++++++++++++++
> arch/powerpc/platforms/pseries/papr_scm.c |
> >> 101 +++++++++++++++-
> >> include/uapi/linux/ndctl.h | 1 +
> >> 3 files changed, 232 insertions(+), 6 deletions(-) create mode
> >> 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
> >>
> >> diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h
> >> b/arch/powerpc/include/uapi/asm/papr_pdsm.h
> >> new file mode 100644
> >> index 000000000000..6407fefcc007
> >> --- /dev/null
> >> +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
> >> @@ -0,0 +1,136 @@
> >> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> >> +/*
> >> + * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl
> >> + *
> >> + * (C) Copyright IBM 2020
> >> + *
> >> + * Author: Vaibhav Jain <vaibhav at linux.ibm.com> */
> >> +
> >> +#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_ #define
> >> +_UAPI_ASM_POWERPC_PAPR_PDSM_H_
> >> +
> >> +#include <linux/types.h>
> >> +
> >> +/*
> >> + * PDSM Envelope:
> >> + *
> >> + * The ioctl ND_CMD_CALL transfers data between user-space and
> >> +kernel via
> >> + * envelope which consists of a header and user-defined payload
> sections.
> >> + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects
> >> +a
> >> + * payload following it and accessible via 'nd_pdsm_cmd_pkg.payload'
> field.
> >> + * There is reserved field that can used to introduce new fields to
> >> +the
> >> + * structure in future. It also tries to ensure that
> >> 'nd_pdsm_cmd_pkg.payload'
> >> + * lies at a 8-byte boundary.
> >> + *
> >> + * +-------------+---------------------+---------------------------+
> >> + * | 64-Bytes | 16-Bytes | Max 176-Bytes |
> >> + * +-------------+---------------------+---------------------------+
> >> + * | nd_pdsm_cmd_pkg | |
> >> + * |-------------+ | |
> >> + * | nd_cmd_pkg | | |
> >> + * +-------------+---------------------+---------------------------+
> >> + * | nd_family | | |
> >> + * | nd_size_out | cmd_status | |
> >> + * | nd_size_in | payload_version | payload |
> >> + * | nd_command | reserved | |
> >> + * | nd_fw_size | | |
> >> + *
> >> + +-------------+---------------------+---------------------------+
> >> + *
> >> + * PDSM Header:
> >> + *
> >> + * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a
> >> + * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to
> >> member
> >> + * 'nd_cmd_pkg.nd_command'. Apart from size information of the
> >> envelope
> >> +which is
> >> + * contained in 'struct nd_cmd_pkg', the header also has members
> >> +following
> >> + * members:
> >> + *
> >> + * 'cmd_status' : (Out) Errors if any encountered while
> >> servicing PDSM.
> >> + * 'payload_version' : (In/Out) Version number associated with
> the
> >> payload.
> >> + * 'reserved' : Not used and reserved for future.
> >> + *
> >> + * PDSM Payload:
> >> + *
> >> + * The layout of the PDSM Payload is defined by various structs
> >> +shared between
> >> + * papr_scm and libndctl so that contents of payload can be
> >> +interpreted. During
> >> + * servicing of a PDSM the papr_scm module will read input args from
> >> +the payload
> >> + * field by casting its contents to an appropriate struct pointer
> >> +based on the
> >> + * PDSM command. Similarly the output of servicing the PDSM command
> >> +will be
> >> + * copied to the payload field using the same struct.
> >> + *
> >> + * 'libnvdimm' enforces a hard limit of 256 bytes on the envelope
> >> +size, which
> >> + * leaves around 176 bytes for the envelope payload (ignoring any
> >> +padding that
> >> + * the compiler may silently introduce).
> >> + *
> >> + * Payload Version:
> >> + *
> >> + * A 'payload_version' field is present in PDSM header that
> >> +indicates a specific
> >> + * version of the structure present in PDSM Payload for a given PDSM
> >> command.
> >> + * This provides backward compatibility in case the PDSM Payload
> >> +structure
> >> + * evolves and different structures are supported by 'papr_scm' and
> >> 'libndctl'.
> >> + *
> >> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send
> >> +the version
> >> + * of the payload struct it supports via 'payload_version' field.
> >> +The
> >> 'papr_scm'
> >> + * module when servicing the PDSM envelope checks the
> 'payload_version'
> >> +and then
> >> + * uses 'payload struct version' == MIN('payload_version field',
> >> + * 'max payload-struct-version supported by papr_scm') to service
> >> +the
> >> PDSM.
> >> + * After servicing the PDSM, 'papr_scm' put the negotiated version
> >> +of payload
> >> + * struct in returned 'payload_version' field.
> >> + *
> >> + * Libndctl on receiving the envelope back from papr_scm again
> >> +checks the
> >> + * 'payload_version' field and based on it use the appropriate
> >> +version dsm
> >> + * struct to parse the results.
> >> + *
> >> + * Backward Compatibility:
> >> + *
> >> + * Above scheme of exchanging different versioned PDSM struct
> >> +between libndctl
> >> + * and papr_scm should provide backward compatibility until
> >> +following two
> >> + * assumptions/conditions when defining new PDSM structs hold:
> >> + *
> >> + * Let T(X) = { set of attributes in PDSM struct 'T' versioned X }
> >> + *
> >> + * 1. T(X) is a proper subset of T(Y) if Y > X.
> >> + * i.e Each new version of PDSM struct should retain existing struct
> >> + * attributes from previous version
> >> + *
> >> + * 2. If an entity (libndctl or papr_scm) supports a PDSM struct T(X) then
> >> + * it should also support T(1), T(2)...T(X - 1).
> >> + * i.e When adding support for new version of a PDSM struct, libndctl
> >> + * and papr_scm should retain support of the existing PDSM struct
> >> + * version they support.
> >> + */
> >> +
> >> +/* PDSM-header + payload expected with ND_CMD_CALL ioctl from
> >> libnvdimm
> >> +*/ struct nd_pdsm_cmd_pkg {
> >> + struct nd_cmd_pkg hdr; /* Package header containing sub-
> >> cmd */
> >> + __s32 cmd_status; /* Out: Sub-cmd status returned back */
> >> + __u16 reserved[5]; /* Ignored and to be used in future */
> >> + __u16 payload_version; /* In/Out: version of the payload */
> >> + __u8 payload[]; /* In/Out: Sub-cmd data buffer */
> >> +} __packed;
> >> +
> >> +/*
> >> + * Methods to be embedded in ND_CMD_CALL request. These are sent
> to
> >> the
> >> +kernel
> >> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
> >> +*/ enum papr_pdsm {
> >> + PAPR_PDSM_MIN = 0x0,
> >> + PAPR_PDSM_MAX,
> >> +};
> >> +
> >> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */ static
> >> +inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct
> nd_cmd_pkg
> >> *cmd) {
> >> + return (struct nd_pdsm_cmd_pkg *) cmd; }
> >> +
> >> +/* Return the payload pointer for a given pcmd */ static inline void
> >> +*pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd) {
> >> + if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
> >> + return NULL;
> >> + else
> >> + return (void *)(pcmd->payload);
> >> +}
> >> +
> >> +#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c
> >> b/arch/powerpc/platforms/pseries/papr_scm.c
> >> index 149431594839..5e2237e7ec08 100644
> >> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> >> @@ -15,13 +15,15 @@
> >> #include <linux/seq_buf.h>
> >>
> >> #include <asm/plpar_wrappers.h>
> >> +#include <asm/papr_pdsm.h>
> >>
> >> #define BIND_ANY_ADDR (~0ul)
> >>
> >> #define PAPR_SCM_DIMM_CMD_MASK \
> >> ((1ul << ND_CMD_GET_CONFIG_SIZE) | \
> >> (1ul << ND_CMD_GET_CONFIG_DATA) | \
> >> - (1ul << ND_CMD_SET_CONFIG_DATA))
> >> + (1ul << ND_CMD_SET_CONFIG_DATA) | \
> >> + (1ul << ND_CMD_CALL))
> >>
> >> /* DIMM health bitmap bitmap indicators */
> >> /* SCM device is unable to persist memory contents */ @@ -350,16
> >> +352,97 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
> >> return 0;
> >> }
> >>
> >> +/*
> >> + * Validate the inputs args to dimm-control function and return '0' if valid.
> >> + * This also does initial sanity validation to ND_CMD_CALL
> >> +sub-command
> >> packages.
> >> + */
> >> +static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd,
> >> +void
> >> *buf,
> >> + unsigned int buf_len)
> >> +{
> >> + unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
> >> + struct nd_pdsm_cmd_pkg *pkg = nd_to_pdsm_cmd_pkg(buf);
> >> + struct papr_scm_priv *p;
> >> +
> >> + /* Only dimm-specific calls are supported atm */
> >> + if (!nvdimm)
> >> + return -EINVAL;
> >> +
> >> + /* get the provider date from struct nvdimm */
> >> + p = nvdimm_provider_data(nvdimm);
> >> +
> >> + if (!test_bit(cmd, &cmd_mask)) {
> >> + dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
> >> + return -EINVAL;
> >> + } else if (cmd == ND_CMD_CALL) {
> >> +
> >> + /* Verify the envelope package */
> >> + if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
> >> + dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
> >> + buf_len);
> >> + return -EINVAL;
> >> + }
> >> +
> >> + /* Verify that the PDSM family is valid */
> >> + if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR) {
> >> + dev_dbg(&p->pdev->dev, "Invalid pkg
> >> family=0x%llx\n",
> >> + pkg->hdr.nd_family);
> >> + return -EINVAL;
> >> +
> >> + }
> >> +
> >> + /* We except a payload with all PDSM commands */
> >> + if (pdsm_cmd_to_payload(pkg) == NULL) {
> >> + dev_dbg(&p->pdev->dev,
> >> + "Empty payload for sub-command=0x%llx\n",
> >> + pkg->hdr.nd_command);
> >> + return -EINVAL;
> >> + }
> >> + }
> >> +
> >> + /* Command looks valid */
> >
> <snip>
> > So this is where I would expect the kernel to validate the command vs
> > a known list of supported commands / payloads. One of the goals of
> > requiring public documentation of any commands that libnvdimm might
> > support for the ioctl path is to give the kernel the ability to gate
> > future enabling on consideration of a common kernel front-end
> > interface. I believe this would also address questions about the
> > versioning scheme because userspace would be actively prevented from
> > sending command payloads that were not first explicitly enabled in the
> > kernel. This interface as it stands in this patch set seems to be a
> > very thin / "anything goes" passthrough with no consideration for that
> > policy.
> >
> > As an example of the utility of this policy, consider the recent
> > support for nvdimm security commands that allow a passphrase to be set
> > and issue commands like "unlock" and "secure erase". The kernel
> > actively prevents those commands from being sent from userspace. See
> > acpi_nfit_clear_to_send() and nd_cmd_clear_to_send(). The reasoning is
> > that it enforces the kernel's nvdimm security model that uses
> > encrypted/trusted keys to protect key material (clear text keys
> > only-ever exist in kernel-space). Yes, that restriction is painful for
> > people that don't want the kernel's security model and just want the
> > simplicity of passing clear-text keys around, but it's necessary for
> > the kernel to have any chance to provide a common abstraction across
> > vendors. The pain of negotiating every single command with what the
> > kernel will support is useful for the long term health of the kernel.
> > It forces ongoing conversations across vendors to consolidate
> > interfaces and reuse kernel best practices like encrypted/trusted
> > keys. Code acceptance is the only real gate the kernel has to enforce
> > cooperation across vendors.
> >
> > The expectation is that the kernel does not allow any command to pass
> > that is not explicitly listed in a bitmap of known commands. I would
> > expect that if you changed the payload of an existing command that
> > would likely require a new entry in this bitmap. The goal is to give
> > the kernel a chance to constrain the passthrough interface to afford a
> > chance to have a discussion of what might done in a common
> > implementation. Another example is the label-area read-write commands.
> > The kernel needs explicit control to ensure that it owns the label
> > area and that userspace is not able to corrupt it (write it behind the
> > kernel's back).
> >
> > Now that said, I have battle scars with some OEMs that just want a
> > generic passthrough interface so they never need to work with the
> > kernel community again and can just write their custom validation
> > tooling and be done. I've mostly been successful in that fight outside
> > of the gaping hole of ND_CMD_VENDOR. That's the path that ipmctl has
> > used to issue commands that have not made it into the public
> > specification on docs.pmem.io. My warning shot for that is the
> > "disable_vendor_specific" module option that administrators can set to
> > only allow commands that the kernel explicitly knows the effects of to
> > be issued. The result is only tooling / enabling that submits to this
> > auditing regime is guaranteed to work everywhere.
>
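
A minimal sketch of the gating described above, assuming hypothetical
command numbers and a hypothetical helper ex_cmd_clear_to_send(). It is
not the libnvdimm implementation, only an illustration of checking a
sub-command against a bitmap of explicitly enabled commands:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical sub-command numbers, for illustration only. */
enum ex_cmd {
	EX_CMD_HEALTH = 1,
	EX_CMD_PERF_STATS = 2,
	EX_CMD_MAX = 64,	/* one bit per command in a 64-bit mask */
};

/* Only commands whose bit is set here are ever passed through. */
static const uint64_t ex_supported_cmds = (1ull << EX_CMD_HEALTH);

/* Return 0 if 'cmd' is explicitly enabled, -EINVAL otherwise. */
static int ex_cmd_clear_to_send(unsigned int cmd)
{
	if (cmd >= EX_CMD_MAX)
		return -EINVAL;
	if (!(ex_supported_cmds & (1ull << cmd)))
		return -EINVAL;
	return 0;
}
```

With this shape, enabling a new command (or a changed payload for an
existing one) means adding a bit to the mask, which is the point where
the review conversation would happen.
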
> Agree with points made above. With this patchset we aren't really trying to
> push an ioctl passthrough to exchange arbitrary data with the papr-scm module.
> Nor do we want to bypass the kernel community for any future
> enhancements on this interface. We made some design choices based on
> our understanding of certain restrictions we saw in ndctl/libndctl. Specifically,
> we wanted to avoid issuing two CMD_CALL ioctl roundtrips.
>
> That being said, I had an extended discussion with Aneesh rethinking the
> 'version' field and we both agreed *to remove this field* from the proposed
> 'struct nd_pdsm_cmd_pkg'. This should resolve the contention around
> Patch-4 in this patchset. Since the 'version' field isn't extensively used right
> now, the impact on the patchset would be small.
>
> >
> > So, that long explanation out of the way, what does that mean for this
> > patch set? I'd like to understand if you still see a need for a
> > versioning scheme if the implementation is required to explicitly list
> > all the commands it supports? I.e. that the kernel need not worry
> > about userspace sending future unknown payloads because unknown
> > payloads are blocked. Also if your interface has anything similar to a
> > "vendor specific" passthrough I would like to require that go through
> > the ND_CMD_VENDOR ioctl, so that the kernel still has a common check
> > point to prevent vendor specific "I don't want to talk to the kernel
> > community" shenanigans, but even better if ND_CMD_VENDOR is
> something
> > the kernel can eventually jettison because nobody is using it.
>
> As I mentioned above, this isn't a 'vendor specific passthrough'
> mechanism. The 'version' field was proposed to avoid two CMD_CALL ioctl
> roundtrips to fetch and report extended nvdimm health data like
> 'life-remaining', which isn't always available for papr-scm.
Oh, why not define a maximal health payload with all the attributes you
know about today, leave some room for future expansion, and then report
a validity flag for each attribute? This is how the "intel" smart-health
payload works. If they ever needed to extend the payload they would
increase the size and add more validity flags. Old userspace never groks
the new fields; new userspace knows to ask for and parse the larger
payload.

See the flags field in 'struct nd_intel_smart' (in ndctl) and the
translation of those flags to ndctl generic attribute flags in
intel_cmd_smart_get_flags().

In general I'd like ndctl to understand the superset of all health
attributes across all vendors. For the truly vendor-specific ones it
would mean that the health flags with a specific "papr_scm" back-end
just would never be set on an "intel" device. I.e. look at the "hpe" and
"msft" health backends. They only set a subset of the valid flags that
could be reported.
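
A rough sketch of the validity-flag idea, assuming a hypothetical
payload 'struct ex_health_payload' and hypothetical EX_HEALTH_VALID_*
bits. None of these names come from ndctl or papr_scm; the layout is
only meant to show how a producer marks which attributes it filled in
and how a consumer checks before reading them:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical validity bits, one per optional attribute. */
#define EX_HEALTH_VALID_STATE          (1u << 0)
#define EX_HEALTH_VALID_LIFE_REMAINING (1u << 1)

/*
 * Hypothetical fixed-size payload. The reserved bytes leave room for
 * attributes added later, each guarded by a new validity bit.
 */
struct ex_health_payload {
	uint32_t valid_flags;    /* which fields below carry data */
	uint8_t  health_state;   /* valid if EX_HEALTH_VALID_STATE */
	uint8_t  life_remaining; /* percent, valid if EX_HEALTH_VALID_LIFE_REMAINING */
	uint8_t  reserved[26];   /* zeroed for now */
};

/* Producer side: set only the fields the platform can actually report. */
static void ex_fill_health(struct ex_health_payload *p, uint8_t state,
			   int have_life, uint8_t life)
{
	assert(p != NULL);
	memset(p, 0, sizeof(*p));
	p->health_state = state;
	p->valid_flags |= EX_HEALTH_VALID_STATE;
	if (have_life) {
		p->life_remaining = life;
		p->valid_flags |= EX_HEALTH_VALID_LIFE_REMAINING;
	}
}

/* Consumer side: return life remaining, or -1 when it was not reported. */
static int ex_life_remaining(const struct ex_health_payload *p)
{
	assert(p != NULL);
	if (!(p->valid_flags & EX_HEALTH_VALID_LIFE_REMAINING))
		return -1;
	return p->life_remaining;
}
```

An older consumer that only knows EX_HEALTH_VALID_STATE simply ignores
the other bits, so the payload can grow without a version field.
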
> However, we just realized that instead of relying on the 'version' field we can
> advertise support for these extended attributes via nvdimm-flags from sysfs.
> Looking at the nvdimm-flags, libndctl can use an appropriate pdsm command
> and struct to fetch the dimm health information from papr_scm via
> CMD_CALL.
>
> But that's something we plan to do in future and not with the current
> patchset, which only reports a fixed set of nvdimm health attributes.
>
> >
> > I feel like this is a conversation that will take a few days to
> > resolve, which does not leave time to push this for v5.8. That said, I
> > do think the health flags patches at the beginning of this series are
> > low risk and uncontentious. How about I merge those for v5.8 and
> > circle back to get this ioctl path queued early in v5.8-rc? Apologies
> > for the late feedback on this relative to v5.8.
> >
> Thanks for this consideration. I agree to the proposal. However, the changes to
> the patchset for removal of the 'version' field are fairly small, hence I can
> quickly push an updated patch series incorporating the rest of the review
> comments from Ira.
>
> Does that sounds reasonable ?
"Williams, Dan J" <[email protected]> writes:
>> -----Original Message-----
>> From: Vaibhav Jain <[email protected]>
>> Sent: Thursday, June 4, 2020 2:06 AM
>> To: Williams, Dan J <[email protected]>; linuxppc-
>> [email protected]; [email protected]; linux-
>> [email protected]
>> Cc: Santosh Sivaraj <[email protected]>; Aneesh Kumar K . V
>> <[email protected]>; Steven Rostedt <[email protected]>;
>> Oliver O'Halloran <[email protected]>; Weiny, Ira <[email protected]>
>> Subject: RE: [RESEND PATCH v9 4/5] ndctl/papr_scm,uapi: Add support for
>> PAPR nvdimm specific methods
>>
>> Hi Dan,
>>
>> Thanks for review and insights on this. My responses below:
>>
>> "Williams, Dan J" <[email protected]> writes:
>>
>> > [ forgive formatting I'm temporarily stuck using Outlook this week...
>> > ]
>> >
>> >> From: Vaibhav Jain <[email protected]>
>> > [..]
>> >>
>> >> Introduce support for PAPR NVDIMM Specific Methods (PDSM) in
>> papr_scm
>> >> module and add the command family NVDIMM_FAMILY_PAPR to the
>> white
>> >> list of NVDIMM command sets. Also advertise support for ND_CMD_CALL
>> >> for the nvdimm command mask and implement necessary scaffolding in
>> >> the module to handle ND_CMD_CALL ioctl and PDSM requests that we
>> receive.
>> >>
>> >> The layout of the PDSM request as we expect from libnvdimm/libndctl
>> >> is described in newly introduced uapi header 'papr_pdsm.h' which
>> >> defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used to
>> >> communicate the PDSM request via member
>> 'nd_cmd_pkg.nd_command' and
>> >> size of payload that need to be sent/received for servicing the PDSM.
>> >>
>> >> A new function is_cmd_valid() is implemented that reads the args to
>> >> papr_scm_ndctl() and performs sanity tests on them. A new function
>> >> papr_scm_service_pdsm() is introduced and is called from
>> >> papr_scm_ndctl() in case of a PDSM request is received via
>> >> ND_CMD_CALL command from libnvdimm.
>> >>
>> >> Cc: "Aneesh Kumar K . V" <[email protected]>
>> >> Cc: Dan Williams <[email protected]>
>> >> Cc: Michael Ellerman <[email protected]>
>> >> Cc: Ira Weiny <[email protected]>
>> >> Reviewed-by: Aneesh Kumar K.V <[email protected]>
>> >> Signed-off-by: Vaibhav Jain <[email protected]>
>> >> ---
>> >> Changelog:
>> >>
>> >> Resend:
>> >> * Added ack from Aneesh.
>> >>
>> >> v8..v9:
>> >> * Reduced the usage of term SCM replacing it with appropriate
>> >> replacement [ Dan Williams, Aneesh ]
>> >> * Renamed 'papr_scm_pdsm.h' to 'papr_pdsm.h'
>> >> * s/PAPR_SCM_PDSM_*/PAPR_PDSM_*/g
>> >> * s/NVDIMM_FAMILY_PAPR_SCM/NVDIMM_FAMILY_PAPR/g
>> >> * Minor updates to 'papr_psdm.h' to replace usage of term 'SCM'.
>> >> * Minor update to patch description.
>> >>
>> >> v7..v8:
>> >> * Removed the 'payload_offset' field from 'struct
>> >> nd_pdsm_cmd_pkg'. Instead command payload is always assumed to
>> start
>> >> at 'nd_pdsm_cmd_pkg.payload'. [ Aneesh ]
>> >> * To enable introducing new fields to 'struct nd_pdsm_cmd_pkg',
>> >> 'reserved' field of 10-bytes is introduced. [ Aneesh ]
>> >> * Fixed a typo in "Backward Compatibility" section of papr_scm_pdsm.h
>> >> [ Ira ]
>> >>
>> >> Resend:
>> >> * None
>> >>
>> >> v6..v7 :
>> >> * Removed the re-definitions of __packed macro from papr_scm_pdsm.h
>> >> [Mpe].
>> >> * Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h
>> [Mpe].
>> >> * Removed macros that were unused in papr_scm.c from
>> papr_scm_pdsm.h
>> >> [Mpe].
>> >> * Made functions defined in papr_scm_pdsm.h as static inline. [Mpe]
>> >>
>> >> v5..v6 :
>> >> * Changed the usage of the term DSM to PDSM to distinguish it from the
>> >> ACPI term [ Dan Williams ]
>> >> * Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various
>> >> struct
>> >> to reflect the new terminology.
>> >> * Updated the patch description and title to reflect the new terminology.
>> >> * Squashed patch to introduce new command family in 'ndctl.h' with
>> >> this patch [ Dan Williams ]
>> >> * Updated the papr_scm_pdsm method starting index from 0x10000 to
>> 0x0
>> >> [ Dan Williams ]
>> >> * Removed redundant license text from the papr_scm_psdm.h file.
>> >> [ Dan Williams ]
>> >> * s/envelop/envelope/ at various places [ Dan Williams ]
>> >> * Added '__packed' attribute to command package header to gaurd
>> >> against different compiler adding paddings between the fields.
>> >> [ Dan Williams]
>> >> * Converted various pr_debug to dev_debug [ Dan Williams ]
>> >>
>> >> v4..v5 :
>> >> * None
>> >>
>> >> v3..v4 :
>> >> * None
>> >>
>> >> v2..v3 :
>> >> * Updated the patch prefix to 'ndctl/uapi' [Aneesh]
>> >>
>> >> v1..v2 :
>> >> * None
>> >> ---
>> >> arch/powerpc/include/uapi/asm/papr_pdsm.h | 136 ++++++++++++++++++++++
>> >> arch/powerpc/platforms/pseries/papr_scm.c | 101 +++++++++++++++-
>> >> include/uapi/linux/ndctl.h | 1 +
>> >> 3 files changed, 232 insertions(+), 6 deletions(-)
>> >> create mode 100644 arch/powerpc/include/uapi/asm/papr_pdsm.h
>> >>
>> >> diff --git a/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> >> b/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> >> new file mode 100644
>> >> index 000000000000..6407fefcc007
>> >> --- /dev/null
>> >> +++ b/arch/powerpc/include/uapi/asm/papr_pdsm.h
>> >> @@ -0,0 +1,136 @@
>> >> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
>> >> +/*
>> >> + * PAPR nvDimm Specific Methods (PDSM) and structs for libndctl
>> >> + *
>> >> + * (C) Copyright IBM 2020
>> >> + *
>> >> + * Author: Vaibhav Jain <vaibhav at linux.ibm.com>
>> >> + */
>> >> +
>> >> +#ifndef _UAPI_ASM_POWERPC_PAPR_PDSM_H_
>> >> +#define _UAPI_ASM_POWERPC_PAPR_PDSM_H_
>> >> +
>> >> +#include <linux/types.h>
>> >> +
>> >> +/*
>> >> + * PDSM Envelope:
>> >> + *
>> >> + * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
>> >> + * envelope which consists of a header and user-defined payload sections.
>> >> + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
>> >> + * payload following it and accessible via 'nd_pdsm_cmd_pkg.payload' field.
>> >> + * There is reserved field that can used to introduce new fields to the
>> >> + * structure in future. It also tries to ensure that 'nd_pdsm_cmd_pkg.payload'
>> >> + * lies at a 8-byte boundary.
>> >> + *
>> >> + *  +-------------+---------------------+---------------------------+
>> >> + *  |   64-Bytes  |       16-Bytes      |       Max 176-Bytes       |
>> >> + *  +-------------+---------------------+---------------------------+
>> >> + *  |               nd_pdsm_cmd_pkg     |                           |
>> >> + *  |-------------+                     |                           |
>> >> + *  |  nd_cmd_pkg |                     |                           |
>> >> + *  +-------------+---------------------+---------------------------+
>> >> + *  | nd_family   |                     |                           |
>> >> + *  | nd_size_out | cmd_status          |                           |
>> >> + *  | nd_size_in  | payload_version     |     payload               |
>> >> + *  | nd_command  | reserved            |                           |
>> >> + *  | nd_fw_size  |                     |                           |
>> >> + *  +-------------+---------------------+---------------------------+
>> >> + *
>> >> + * PDSM Header:
>> >> + *
>> >> + * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a
>> >> + * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to member
>> >> + * 'nd_cmd_pkg.nd_command'. Apart from size information of the envelope which is
>> >> + * contained in 'struct nd_cmd_pkg', the header also has members following
>> >> + * members:
>> >> + *
>> >> + * 'cmd_status'      : (Out) Errors if any encountered while servicing PDSM.
>> >> + * 'payload_version' : (In/Out) Version number associated with the payload.
>> >> + * 'reserved'        : Not used and reserved for future.
>> >> + *
>> >> + * PDSM Payload:
>> >> + *
>> >> + * The layout of the PDSM Payload is defined by various structs shared between
>> >> + * papr_scm and libndctl so that contents of payload can be interpreted. During
>> >> + * servicing of a PDSM the papr_scm module will read input args from the payload
>> >> + * field by casting its contents to an appropriate struct pointer based on the
>> >> + * PDSM command. Similarly the output of servicing the PDSM command will be
>> >> + * copied to the payload field using the same struct.
>> >> + *
>> >> + * 'libnvdimm' enforces a hard limit of 256 bytes on the envelope size, which
>> >> + * leaves around 176 bytes for the envelope payload (ignoring any padding that
>> >> + * the compiler may silently introduce).
>> >> + *
>> >> + * Payload Version:
>> >> + *
>> >> + * A 'payload_version' field is present in PDSM header that indicates a specific
>> >> + * version of the structure present in PDSM Payload for a given PDSM command.
>> >> + * This provides backward compatibility in case the PDSM Payload structure
>> >> + * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
>> >> + *
>> >> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
>> >> + * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
>> >> + * module when servicing the PDSM envelope checks the 'payload_version' and then
>> >> + * uses 'payload struct version' == MIN('payload_version field',
>> >> + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
>> >> + * After servicing the PDSM, 'papr_scm' put the negotiated version of payload
>> >> + * struct in returned 'payload_version' field.
>> >> + *
>> >> + * Libndctl on receiving the envelope back from papr_scm again checks the
>> >> + * 'payload_version' field and based on it use the appropriate version dsm
>> >> + * struct to parse the results.
>> >> + *
>> >> + * Backward Compatibility:
>> >> + *
>> >> + * Above scheme of exchanging different versioned PDSM struct between libndctl
>> >> + * and papr_scm should provide backward compatibility until following two
>> >> + * assumptions/conditions when defining new PDSM structs hold:
>> >> + *
>> >> + * Let T(X) = { set of attributes in PDSM struct 'T' versioned X }
>> >> + *
>> >> + * 1. T(X) is a proper subset of T(Y) if Y > X.
>> >> + * i.e Each new version of PDSM struct should retain existing struct
>> >> + * attributes from previous version
>> >> + *
>> >> + * 2. If an entity (libndctl or papr_scm) supports a PDSM struct T(X) then
>> >> + * it should also support T(1), T(2)...T(X - 1).
>> >> + * i.e When adding support for new version of a PDSM struct, libndctl
>> >> + * and papr_scm should retain support of the existing PDSM struct
>> >> + * version they support.
>> >> + */
>> >> +
>> >> +/* PDSM-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
>> >> +struct nd_pdsm_cmd_pkg {
>> >> +	struct nd_cmd_pkg hdr;	/* Package header containing sub-cmd */
>> >> +	__s32 cmd_status;	/* Out: Sub-cmd status returned back */
>> >> +	__u16 reserved[5];	/* Ignored and to be used in future */
>> >> +	__u16 payload_version;	/* In/Out: version of the payload */
>> >> +	__u8 payload[];		/* In/Out: Sub-cmd data buffer */
>> >> +} __packed;
>> >> +
>> >> +/*
>> >> + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
>> >> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
>> >> + */
>> >> +enum papr_pdsm {
>> >> +	PAPR_PDSM_MIN = 0x0,
>> >> +	PAPR_PDSM_MAX,
>> >> +};
>> >> +
>> >> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
>> >> +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
>> >> +{
>> >> +	return (struct nd_pdsm_cmd_pkg *) cmd;
>> >> +}
>> >> +
>> >> +/* Return the payload pointer for a given pcmd */
>> >> +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
>> >> +{
>> >> +	if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
>> >> +		return NULL;
>> >> +	else
>> >> +		return (void *)(pcmd->payload);
>> >> +}
>> >> +
>> >> +#endif /* _UAPI_ASM_POWERPC_PAPR_PDSM_H_ */
>> >> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c
>> >> b/arch/powerpc/platforms/pseries/papr_scm.c
>> >> index 149431594839..5e2237e7ec08 100644
>> >> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> >> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> >> @@ -15,13 +15,15 @@
>> >> #include <linux/seq_buf.h>
>> >>
>> >> #include <asm/plpar_wrappers.h>
>> >> +#include <asm/papr_pdsm.h>
>> >>
>> >> #define BIND_ANY_ADDR (~0ul)
>> >>
>> >> #define PAPR_SCM_DIMM_CMD_MASK \
>> >> ((1ul << ND_CMD_GET_CONFIG_SIZE) | \
>> >> (1ul << ND_CMD_GET_CONFIG_DATA) | \
>> >> - (1ul << ND_CMD_SET_CONFIG_DATA))
>> >> + (1ul << ND_CMD_SET_CONFIG_DATA) | \
>> >> + (1ul << ND_CMD_CALL))
>> >>
>> >> /* DIMM health bitmap bitmap indicators */
>> >> /* SCM device is unable to persist memory contents */
>> >> @@ -350,16 +352,97 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
>> >> return 0;
>> >> }
>> >>
>> >> +/*
>> >> + * Validate the inputs args to dimm-control function and return '0' if valid.
>> >> + * This also does initial sanity validation to ND_CMD_CALL sub-command packages.
>> >> + */
>> >> +static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>> >> + unsigned int buf_len)
>> >> +{
>> >> + unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
>> >> + struct nd_pdsm_cmd_pkg *pkg = nd_to_pdsm_cmd_pkg(buf);
>> >> + struct papr_scm_priv *p;
>> >> +
>> >> + /* Only dimm-specific calls are supported atm */
>> >> + if (!nvdimm)
>> >> + return -EINVAL;
>> >> +
>> >> + /* get the provider data from struct nvdimm */
>> >> + p = nvdimm_provider_data(nvdimm);
>> >> +
>> >> + if (!test_bit(cmd, &cmd_mask)) {
>> >> + dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
>> >> + return -EINVAL;
>> >> + } else if (cmd == ND_CMD_CALL) {
>> >> +
>> >> + /* Verify the envelope package */
>> >> + if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
>> >> + dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
>> >> + buf_len);
>> >> + return -EINVAL;
>> >> + }
>> >> +
>> >> + /* Verify that the PDSM family is valid */
>> >> + if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR) {
>> >> + dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
>> >> + pkg->hdr.nd_family);
>> >> + return -EINVAL;
>> >> +
>> >> + }
>> >> +
>> >> + /* We expect a payload with all PDSM commands */
>> >> + if (pdsm_cmd_to_payload(pkg) == NULL) {
>> >> + dev_dbg(&p->pdev->dev,
>> >> + "Empty payload for sub-command=0x%llx\n",
>> >> + pkg->hdr.nd_command);
>> >> + return -EINVAL;
>> >> + }
>> >> + }
>> >> +
>> >> + /* Command looks valid */
>> >
>> <snip>
>> > So this is where I would expect the kernel to validate the command vs
>> > a known list of supported commands / payloads. One of the goals of
>> > requiring public documentation of any commands that libnvdimm might
>> > support for the ioctl path is to give the kernel the ability to gate
>> > future enabling on consideration of a common kernel front-end
>> > interface. I believe this would also address questions about the
>> > versioning scheme because userspace would be actively prevented from
>> > sending command payloads that were not first explicitly enabled in the
>> > kernel. This interface as it stands in this patch set seems to be a
>> > very thin / "anything goes" passthrough with no consideration for that
>> > policy.
>> >
>> > As an example of the utility of this policy, consider the recent
>> > support for nvdimm security commands that allow a passphrase to be set
>> > and issue commands like "unlock" and "secure erase". The kernel
>> > actively prevents those commands from being sent from userspace. See
>> > acpi_nfit_clear_to_send() and nd_cmd_clear_to_send(). The reasoning is
>> > that it enforces the kernel's nvdimm security model that uses
>> > encrypted/trusted keys to protect key material (clear text keys
>> > only-ever exist in kernel-space). Yes, that restriction is painful for
>> > people that don't want the kernel's security model and just want the
>> > simplicity of passing clear-text keys around, but it's necessary for
>> > the kernel to have any chance to provide a common abstraction across
>> > vendors. The pain of negotiating every single command with what the
>> > kernel will support is useful for the long term health of the kernel.
>> > It forces ongoing conversations across vendors to consolidate
>> > interfaces and reuse kernel best practices like encrypted/trusted
>> > keys. Code acceptance is the only real gate the kernel has to enforce
>> > cooperation across vendors.
>> >
>> > The expectation is that the kernel does not allow any command to pass
>> > that is not explicitly listed in a bitmap of known commands. I would
>> > expect that if you changed the payload of an existing command that
>> > would likely require a new entry in this bitmap. The goal is to give
>> > the kernel a chance to constrain the passthrough interface to afford a
>> > chance to have a discussion of what might done in a common
>> > implementation. Another example is the label-area read-write commands.
>> > The kernel needs explicit control to ensure that it owns the label
>> > area and that userspace is not able to corrupt it (write it behind the
>> > kernel's back).
>> >
>> > Now that said, I have battle scars with some OEMs that just want a
>> > generic passthrough interface so they never need to work with the
>> > kernel community again and can just write their custom validation
>> > tooling and be done. I've mostly been successful in that fight outside
>> > of the gaping hole of ND_CMD_VENDOR. That's the path that ipmctl has
>> > used to issue commands that have not made it into the public
>> > specification on docs.pmem.io. My warning shot for that is the
>> > "disable_vendor_specific" module option that administrators can set to
>> > only allow commands that the kernel explicitly knows the effects of to
>> > be issued. The result is only tooling / enabling that submits to this
>> > auditing regime is guaranteed to work everywhere.
>>
>> Agree with points made above. With this patchset we arent really trying to
>> push an ioctl passthrough to exchange arbitary data with papr-scm module.
>> Nor do we want to bypass the kernel community for any future
>> enhancements on this interface. We made some design choices based on
>> our understanding of certain restriction we saw in ndctl/libndctl. Specifically
>> wanted to avoid issuing two CMD_CALL ioctl roundtrips.
>>
>> That being said I had an extended discussion with Aneesh rethinking the
>> 'version' field and we both agreed *to remove this field* from the proposed
>> 'struct nd_pdsm_cmd_pkg'. This should resolve the contentions around this
>> Patch-4 in this patchset. Since the 'version' field isnt extensively used right
>> now the impact on the patchset would be small.
>>
>> >
>> > So, that long explanation out of the way, what does that mean for this
>> > patch set? I'd like to understand if you still see a need for a
>> > versioning scheme if the implementation is required to explicitly list
>> > all the commands it supports? I.e. that the kernel need not worry
>> > about userspace sending future unknown payloads because unknown
>> > payloads are blocked. Also if your interface has anything similar to a
>> > "vendor specific" passthrough I would like to require that go through
>> > the ND_CMD_VENDOR ioctl, so that the kernel still has a common check
>> > point to prevent vendor specific "I don't want to talk to the kernel
>> > community" shenanigans, but even better if ND_CMD_VENDOR is
>> something
>> > the kernel can eventually jettison because nobody is using it.
>>
>> As I mentioned above this isn't a 'vendor specific passthrough'
>> mechanism. The 'version' field was proposed to avoid two CMD_CALL ioctl
>> roundtrips to fetch and report extended nvdimm health data like
>> 'life-remaining', which isn't always available for papr-scm.
>
> Oh, why not define a maximal health payload with all the attributes
> you know about today, leave some room for future expansion, and then
> report a validity flag for each attribute? This is how the "intel"
> smart-health payload works. If they ever needed to extend the payload
> they would increase the size and add more validity flags. Old
> userspace never groks the new fields, new userspace knows to ask for
> and parse the larger payload.
>
> See the flags field in 'struct nd_intel_smart' (in ndctl) and the
> translation of those flags to ndctl generic attribute flags
> intel_cmd_smart_get_flags().
>
> In general I'd like ndctl to understand the superset of all health
> attributes across all vendors. For the truly vendor specific ones it
> would mean that the health flags with a specific "papr_scm" back-end
> just would never be set on an "intel" device. I.e. look at the "hpe"
> and "msft" health backends. They only set a subset of the valid flags
> that could be reported.
Thanks, this sounds good. In fact, the papr_scm implementation in ndctl
currently advertises support for only a subset of the ND_SMART_* flags.

Using 'flags' instead of 'version' was indeed discussed during
v7..v9. However, after re-looking at the 'msft' and 'hpe' implementations,
the approach of a maximal health payload tagged with a flags field looks
more intuitive and I would prefer implementing this scheme in this
patch-set.

The current set of health data exchanged between libndctl and
papr_scm via 'struct nd_papr_pdsm_health' (e.g. various health status
bits, nvdimm arming status, etc.) is guaranteed to be always available,
hence associating its availability with a flag won't be very useful as
the flag would always be set.

However, as you suggested, extending 'struct nd_papr_pdsm_health' in
future to accommodate new attributes like 'life-remaining' can be done
by adding them to the end of the struct and setting a bit in a flags
field to indicate their presence.

So I have the following proposal:

* Add a new '__u32 extension_flags' field at the beginning of 'struct
  nd_papr_pdsm_health'.
* Set the size of the struct to 184 bytes, which is the maximum possible
  size for a pdsm payload.
* The 'papr_scm' kernel driver will currently set 'extension_flags' to 0,
  indicating no extension fields.
* A future patch that adds support for 'life-remaining' adds the new field
  at the end of the known fields in 'struct nd_papr_pdsm_health'.
* When the struct is provided to the papr_scm kernel module, if
  'life-remaining' data is available, it is populated and the
  corresponding flag is set in the 'extension_flags' field to indicate
  its presence.
* When the struct is received by the libndctl papr_scm implementation, it
  tests whether 'extension_flags' has the 'life-remaining' flag set and,
  if so, returns the ND_SMART_USED_VALID flag from
  ndctl_cmd_smart_get_flags().

Implementing the first 3 items above in the current patchset should be
fairly trivial.
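
To make the proposal concrete, here is a rough C sketch of how the payload and the two sides might fit together. It is only an illustration under stated assumptions: the struct is named 'sketch_pdsm_health' to make clear it is not the real 'struct nd_papr_pdsm_health', the always-valid field names are placeholders, and PDSM_EXT_LIFE_REMAINING and SKETCH_SMART_USED_VALID are made-up stand-ins (the latter for libndctl's ND_SMART_USED_VALID). Only 'extension_flags' and the 184-byte size come from the list above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAYLOAD_SIZE 184 /* payload size quoted in the proposal */

/* Hypothetical extension bit for a future 'life-remaining' attribute. */
#define PDSM_EXT_LIFE_REMAINING (1u << 0)

/* Stand-in for libndctl's ND_SMART_USED_VALID; value is illustrative. */
#define SKETCH_SMART_USED_VALID (1u << 2)

struct sketch_pdsm_health {
	union {
		struct {
			uint32_t extension_flags;  /* 0 today: no extensions */
			/* Placeholder always-valid fields, no flag needed. */
			uint8_t  dimm_unarmed;
			uint8_t  dimm_bad_shutdown;
			uint8_t  dimm_health;
			/* Future fields are appended after the known ones. */
			uint16_t life_remaining;   /* valid only if flagged */
		};
		uint8_t buf[SKETCH_PAYLOAD_SIZE];  /* pins the struct size */
	};
};

static_assert(sizeof(struct sketch_pdsm_health) == SKETCH_PAYLOAD_SIZE,
	      "payload must stay at the fixed maximum size");

/* Kernel side: populate the extension field only when data is available. */
static inline void sketch_fill_health(struct sketch_pdsm_health *h,
				      int have_life, uint16_t life)
{
	memset(h, 0, sizeof(*h)); /* extension_flags starts out as 0 */
	h->dimm_health = 1;
	if (have_life) {
		h->life_remaining = life;
		h->extension_flags |= PDSM_EXT_LIFE_REMAINING;
	}
}

/* Userspace side: translate extension bits to generic smart flags. */
static inline unsigned int
sketch_smart_get_flags(const struct sketch_pdsm_health *h)
{
	unsigned int flags = 0;

	if (h->extension_flags & PDSM_EXT_LIFE_REMAINING)
		flags |= SKETCH_SMART_USED_VALID;
	return flags;
}
```

With this shape, old userspace that ignores 'extension_flags' keeps working unchanged, since the always-valid fields never move and the struct size never changes.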

Does that sound reasonable?

Thanks,
~ Vaibhav
>
>> However we just realized instead of relying on 'version' field we can
>> advertise support for these extended attributes via nvdimm-flags from sysfs.
>> Looking at the nvdimm-flags libndctl can use an appropriate pdsm command
>> and struct to fetch the dimm health information from papr_scm via
>> CMD_CALL.
>>
>> But thats something we plan to do in future and not with the current
>> patchset which only reports fixed set of nvdimm health attributes.
>>
>> >
>> > I feel like this is a conversation that will take a few days to
>> > resolve, which does not leave time to push this for v5.8. That said, I
>> > do think the health flags patches at the beginning of this series are
>> > low risk and uncontentious. How about I merge those for v5.8 and
>> > circle back to get this ioctl path queued early in v5.8-rc? Apologies
>> > for the late feedback on this relative to v5.8.
>> >
>> Thanks for this consideration. Agree to the proposal. However changes to
>> patchset with removal of 'version' field is fairly small hence can quickly push
>> an updated patch series cumulating rest of the review comments from Ira.
>>
>> Does that sounds reasonable ?
>
On Fri, Jun 5, 2020 at 8:22 AM Vaibhav Jain <[email protected]> wrote:
[..]
> > Oh, why not define a maximal health payload with all the attributes
> > you know about today, leave some room for future expansion, and then
> > report a validity flag for each attribute? This is how the "intel"
> > smart-health payload works. If they ever needed to extend the payload
> > they would increase the size and add more validity flags. Old
> > userspace never groks the new fields, new userspace knows to ask for
> > and parse the larger payload.
> >
> > See the flags field in 'struct nd_intel_smart' (in ndctl) and the
> > translation of those flags to ndctl generic attribute flags
> > intel_cmd_smart_get_flags().
> >
> > In general I'd like ndctl to understand the superset of all health
> > attributes across all vendors. For the truly vendor specific ones it
> > would mean that the health flags with a specific "papr_scm" back-end
> > just would never be set on an "intel" device. I.e. look at the "hpe"
> > and "msft" health backends. They only set a subset of the valid flags
> > that could be reported.
>
> Thanks, this sounds good. Infact papr_scm implementation in ndctl does
> advertises support for only a subset of ND_SMART_* flags right now.
>
> Using 'flags' instead of 'version' was indeed discussed during
> v7..v9. However re-looking at the 'msft' and 'hpe' implementations the
> approach of maximal health payload tagged with a flags field looks more
> intuitive and I would prefer implementing this scheme in this patch-set.
>
> The current set health data exchanged with between libndctl and
> papr_scm via 'struct nd_papr_pdsm_health' (e.g various health status
> bits , nvdimm arming status etc) are guaranteed to be always available
> hence associating their availability with a flag wont be much useful as
> the flag will be always set.
>
> However as you suggested, extending the 'struct nd_papr_pdsm_health' in
> future to accommodate new attributes like 'life-remaining' can be done
> via adding them to the end of the struct and setting a flag field to
> indicate its presence.
>
> So I have the following proposal:
> * Add a new '__u32 extension_flags' field at beginning of 'struct
> nd_papr_pdsm_health'
> * Set the size of the struct to 184-bytes which is the maximum possible
> size for a pdsm payload.
> * 'papr_scm' kernel driver will currently set 'extension_flag' to 0
> indicating no extension fields.
>
> * Future patch that adds support for 'life-remaining' add the new-field
> at the end of known fields in 'struct nd_papr_pdsm_health'.
> * When provided to papr_scm kernel module, if 'life-remaining' data is
> available its populated and corresponding flag set in
> 'extension_flags' field indicating its presence.
> * When received by libndctl papr_scm implementation its tests if the
> extension_flags have associated 'life-remaining' flag set and if yes
> then return ND_SMART_USED_VALID flag back from
> ndctl_cmd_smart_get_flags().
>
> Implementing first 3 items above in the current patchset should be
> fairly trivial.
>
> Does that sounds reasonable ?
This sounds good to me.