2022-06-15 00:12:21

by Alison Schofield

[permalink] [raw]
Subject: [PATCH 0/3] CXL Poison List Retrieval & Tracing

From: Alison Schofield <[email protected]>

Introducing the first piece of support for CXL Media Errors,
offering the ability to retrieve a devices poison list and
store the returned error records as kernel trace events.

The handling of the poison list is guided by the CXL 2.0 Spec
Section 8.2.9.5.4.1. [1] The usage of Trace Events to store the
Media Error records is a first look at the proposed handling
of CXL ARS events.

Example command line usage:

$ trace-cmd record -e cxl_poison_list
$ echo 1 > /sys/bus/cxl/devices/mem1/get_poison
$ trace-cmd report trace.dat

cxl_poison_list: memdev: mem3 source EXTERNAL start 0x41 length 0x2
cxl_poison_list: memdev: mem3 source INTERNAL start 0xc2 length 0x3
cxl_poison_list: memdev: mem3 source INJECTED start 0x183 length 0x4
cxl_poison_list: memdev: mem3 source INVALID start 0x284 length 0x5
cxl_poison_list: memdev: mem3 source VENDOR start 0x707 length 0x8

[1]: https://www.computeexpresslink.org/download-the-specification

Alison Schofield (3):
trace, cxl: Introduce a TRACE_EVENT for CXL Poison Records
cxl/mbox: Add GET_POISON_LIST mailbox command support
cxl/core: Add sysfs attribute get_poison for list retrieval

Documentation/ABI/testing/sysfs-bus-cxl | 13 +++++
drivers/cxl/cxlmem.h | 43 ++++++++++++++
include/trace/events/cxl.h | 60 ++++++++++++++++++++
drivers/cxl/core/mbox.c | 75 +++++++++++++++++++++++++
drivers/cxl/core/memdev.c | 32 +++++++++++
5 files changed, 223 insertions(+)
create mode 100644 include/trace/events/cxl.h


base-commit: 2263e9ed65887cc7c6e977f372596199d2c9f4af
--
2.31.1


2022-06-15 00:54:21

by Alison Schofield

[permalink] [raw]
Subject: [PATCH 1/3] trace, cxl: Introduce a TRACE_EVENT for CXL Poison Records

From: Alison Schofield <[email protected]>

Add a trace event for CXL Poison List Media Error Records that
includes the starting DPA of the poison, the length, and the
the source of the poison.

This trace event will be used by the CXL_MEM driver to log the
Media Errors returned by the GET_POISON_LIST Mailbox command.

Signed-off-by: Alison Schofield <[email protected]>
---
include/trace/events/cxl.h | 60 ++++++++++++++++++++++++++++++++++++++
1 file changed, 60 insertions(+)
create mode 100644 include/trace/events/cxl.h

diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
new file mode 100644
index 000000000000..17e707c3817e
--- /dev/null
+++ b/include/trace/events/cxl.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM cxl
+
+#if !defined(_CXL_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _CXL_TRACE_H
+
+#include <linux/tracepoint.h>
+
+TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_UNKNOWN);
+TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_INTERNAL);
+TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_EXTERNAL);
+TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_INJECTED);
+TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_VENDOR);
+TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_INVALID);
+
+#define show_poison_source(source) \
+ __print_symbolic(source, \
+ {CXL_POISON_SOURCE_UNKNOWN, "UNKNOWN"}, \
+ {CXL_POISON_SOURCE_EXTERNAL, "EXTERNAL"}, \
+ {CXL_POISON_SOURCE_INTERNAL, "INTERNAL"}, \
+ {CXL_POISON_SOURCE_INJECTED, "INJECTED"}, \
+ {CXL_POISON_SOURCE_VENDOR, "VENDOR"}, \
+ {CXL_POISON_SOURCE_INVALID, "INVALID"})
+
+TRACE_EVENT(cxl_poison_list,
+
+ TP_PROTO(struct device *dev,
+ int source,
+ unsigned long start,
+ unsigned int length),
+
+ TP_ARGS(dev, source, start, length),
+
+ TP_STRUCT__entry(
+ __string(name, dev_name(dev))
+ __field(int, source)
+ __field(u64, start)
+ __field(u32, length)
+ ),
+
+ TP_fast_assign(
+ __assign_str(name, dev_name(dev));
+ __entry->source = source;
+ __entry->start = start;
+ __entry->length = length;
+ ),
+
+ TP_printk("dev %s source %s start %llu length %u",
+ __get_str(name),
+ show_poison_source(__entry->source),
+ __entry->start,
+ __entry->length)
+);
+#endif /* _CXL_TRACE_H */
+
+/* This part must be outside protection */
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE cxl
+#include <trace/define_trace.h>
--
2.31.1

2022-06-15 01:00:50

by Alison Schofield

[permalink] [raw]
Subject: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

From: Alison Schofield <[email protected]>

CXL devices that support persistent memory maintain a list of locations
that are poisoned or result in poison if the addresses are accessed by
the host.

Per the spec (CXL 2.0 8.2.8.5.4.1), the device returns this Poison
list as a set of Media Error Records that include the source of the
error, the starting device physical address and length. The length is
the number of adjacent DPAs in the record and is in units of 64 bytes.

Retrieve the list and log each Media Error Record as a trace event of
type cxl_poison_list.

Signed-off-by: Alison Schofield <[email protected]>
---
drivers/cxl/cxlmem.h | 43 +++++++++++++++++++++++
drivers/cxl/core/mbox.c | 75 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 118 insertions(+)

diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 60d10ee1e7fc..29cf0459b44a 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -174,6 +174,7 @@ struct cxl_endpoint_dvsec_info {
* (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
* @lsa_size: Size of Label Storage Area
* (CXL 2.0 8.2.9.5.1.1 Identify Memory Device)
+ * @poison_max_mer: maximum Media Error Records tracked in Poison List
* @mbox_mutex: Mutex to synchronize mailbox access.
* @firmware_version: Firmware version for the memory device.
* @enabled_cmds: Hardware commands found enabled in CEL.
@@ -204,6 +205,7 @@ struct cxl_dev_state {

size_t payload_size;
size_t lsa_size;
+ u32 poison_max;
struct mutex mbox_mutex; /* Protects device mailbox and firmware */
char firmware_version[0x10];
DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
@@ -317,6 +319,46 @@ struct cxl_mbox_set_partition_info {

#define CXL_SET_PARTITION_IMMEDIATE_FLAG BIT(0)

+struct cxl_mbox_poison_payload_in {
+ __le64 offset;
+ __le64 length;
+} __packed;
+
+struct cxl_mbox_poison_payload_out {
+ u8 flags;
+ u8 rsvd1;
+ __le64 overflow_timestamp;
+ __le16 count;
+ u8 rsvd2[0x14];
+ struct cxl_poison_record {
+ __le64 address;
+ __le32 length;
+ __le32 rsvd;
+ } __packed record[];
+} __packed;
+
+/* CXL 8.2.9.5.4.1 Get Poison List: payload out flags: */
+#define CXL_POISON_FLAG_MORE BIT(0)
+#define CXL_POISON_FLAG_OVERFLOW BIT(1)
+#define CXL_POISON_FLAG_SCANNING BIT(2)
+
+/* CXL 8.2.9.5.4.1 Get Poison List: Error is encoded in record.address[2:0] */
+#define CXL_POISON_SOURCE_MASK GENMASK(2, 0)
+#define CXL_POISON_SOURCE_UNKNOWN 0
+#define CXL_POISON_SOURCE_EXTERNAL 1
+#define CXL_POISON_SOURCE_INTERNAL 2
+#define CXL_POISON_SOURCE_INJECTED 3
+#define CXL_POISON_SOURCE_VENDOR 7
+
+/* Software define */
+#define CXL_POISON_SOURCE_INVALID 99
+#define CXL_POISON_SOURCE_VALID(x) \
+ (((x) == CXL_POISON_SOURCE_UNKNOWN) || \
+ ((x) == CXL_POISON_SOURCE_EXTERNAL) || \
+ ((x) == CXL_POISON_SOURCE_INTERNAL) || \
+ ((x) == CXL_POISON_SOURCE_INJECTED) || \
+ ((x) == CXL_POISON_SOURCE_VENDOR))
+
/**
* struct cxl_mem_command - Driver representation of a memory device command
* @info: Command information as it exists for the UAPI
@@ -351,6 +393,7 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
+int cxl_mem_get_poison_list(struct device *dev);
#ifdef CONFIG_CXL_SUSPEND
void cxl_mem_active_inc(void);
void cxl_mem_active_dec(void);
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 54f434733b56..c10c7020ebc2 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -9,6 +9,9 @@

#include "core.h"

+#define CREATE_TRACE_POINTS
+#include <trace/events/cxl.h>
+
static bool cxl_raw_allow_all;

/**
@@ -755,6 +758,7 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
{
/* See CXL 2.0 Table 175 Identify Memory Device Output Payload */
struct cxl_mbox_identify id;
+ __le32 val = 0;
int rc;

rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_IDENTIFY, NULL, 0, &id,
@@ -783,6 +787,9 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
cxlds->lsa_size = le32_to_cpu(id.lsa_size);
memcpy(cxlds->firmware_version, id.fw_revision, sizeof(id.fw_revision));

+ memcpy(&val, id.poison_list_max_mer, 3);
+ cxlds->poison_max = le32_to_cpu(val);
+
return 0;
}
EXPORT_SYMBOL_NS_GPL(cxl_dev_state_identify, CXL);
@@ -826,6 +833,74 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds)
}
EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, CXL);

+int cxl_mem_get_poison_list(struct device *dev)
+{
+ struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ struct cxl_mbox_poison_payload_out *po;
+ struct cxl_mbox_poison_payload_in pi;
+ int nr_records = 0;
+ int rc, i;
+
+ if (range_len(&cxlds->pmem_range)) {
+ pi.offset = cpu_to_le64(cxlds->pmem_range.start);
+ pi.length = cpu_to_le64(range_len(&cxlds->pmem_range));
+ } else {
+ return -ENXIO;
+ }
+
+ po = kvmalloc(cxlds->payload_size, GFP_KERNEL);
+ if (!po)
+ return -ENOMEM;
+
+ do {
+ rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_POISON, &pi,
+ sizeof(pi), po, cxlds->payload_size);
+ if (rc)
+ goto out;
+
+ if (po->flags & CXL_POISON_FLAG_OVERFLOW) {
+ time64_t o_time = le64_to_cpu(po->overflow_timestamp);
+
+ dev_err(dev, "Poison list overflow at %ptTs UTC\n",
+ &o_time);
+ rc = -ENXIO;
+ goto out;
+ }
+
+ if (po->flags & CXL_POISON_FLAG_SCANNING) {
+ dev_err(dev, "Scan Media in Progress\n");
+ rc = -EBUSY;
+ goto out;
+ }
+
+ for (i = 0; i < le16_to_cpu(po->count); i++) {
+ u64 addr = le64_to_cpu(po->record[i].address);
+ u32 len = le32_to_cpu(po->record[i].length);
+ int source = FIELD_GET(CXL_POISON_SOURCE_MASK, addr);
+
+ if (!CXL_POISON_SOURCE_VALID(source)) {
+ dev_dbg(dev, "Invalid poison source %d",
+ source);
+ source = CXL_POISON_SOURCE_INVALID;
+ }
+
+ trace_cxl_poison_list(dev, source, addr, len);
+ }
+
+ /* Protect against an uncleared _FLAG_MORE */
+ nr_records = nr_records + le16_to_cpu(po->count);
+ if (nr_records >= cxlds->poison_max)
+ goto out;
+
+ } while (po->flags & CXL_POISON_FLAG_MORE);
+
+out:
+ kvfree(po);
+ return rc;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_get_poison_list, CXL);
+
struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
{
struct cxl_dev_state *cxlds;
--
2.31.1

2022-06-15 01:01:03

by Alison Schofield

[permalink] [raw]
Subject: [PATCH 3/3] cxl/core: Add sysfs attribute get_poison for list retrieval

From: Alison Schofield <[email protected]>

The sysfs attribute, get_poison, allows user space to request the
retrieval of a CXL devices poison list for its persistent memory.

From Documentation/ABI/.../sysfs-bus-cxl
(WO) When a '1' is written to this attribute the memdev
driver retrieves the poison list from the device. The list
includes addresses that are poisoned or would result in
poison if accessed, and the source of the poison. This
attribute is only visible for devices supporting the
capability. The retrieved errors are logged as kernel
trace events with the label: cxl_poison_list.

Signed-off-by: Alison Schofield <[email protected]>
---
Documentation/ABI/testing/sysfs-bus-cxl | 13 ++++++++++
drivers/cxl/core/memdev.c | 32 +++++++++++++++++++++++++
2 files changed, 45 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 7c2b846521f3..9d0c3988fdd2 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -163,3 +163,16 @@ Description:
memory (type-3). The 'target_type' attribute indicates the
current setting which may dynamically change based on what
memory regions are activated in this decode hierarchy.
+
+What: /sys/bus/cxl/devices/memX/get_poison
+Date: June, 2022
+KernelVersion: v5.20
+Contact: [email protected]
+Description:
+ (WO) When a '1' is written to this attribute the memdev
+ driver retrieves the poison list from the device. The list
+ includes addresses that are poisoned or would result in
+ poison if accessed, and the source of the poison. This
+ attribute is only visible for devices supporting the
+ capability. The retrieved errors are logged as kernel
+ trace events with the label: cxl_poison_list.
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index f7cdcd33504a..5ef9ffaa934a 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -106,12 +106,34 @@ static ssize_t numa_node_show(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RO(numa_node);

+static ssize_t get_poison_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+
+{
+ int rc;
+
+ if (!sysfs_streq(buf, "1")) {
+ dev_err(dev, "%s: unknown value: %s\n", attr->attr.name, buf);
+ return -EINVAL;
+ }
+
+ rc = cxl_mem_get_poison_list(dev);
+ if (rc) {
+ dev_err(dev, "Failed to retrieve poison list %d\n", rc);
+ return rc;
+ }
+ return len;
+}
+static DEVICE_ATTR_WO(get_poison);
+
static struct attribute *cxl_memdev_attributes[] = {
&dev_attr_serial.attr,
&dev_attr_firmware_version.attr,
&dev_attr_payload_max.attr,
&dev_attr_label_storage_size.attr,
&dev_attr_numa_node.attr,
+ &dev_attr_get_poison.attr,
NULL,
};

@@ -130,6 +152,16 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
{
if (!IS_ENABLED(CONFIG_NUMA) && a == &dev_attr_numa_node.attr)
return 0;
+
+ if (a == &dev_attr_get_poison.attr) {
+ struct device *dev = container_of(kobj, struct device, kobj);
+ struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+ if (!test_bit(CXL_MEM_COMMAND_ID_GET_POISON,
+ cxlds->enabled_cmds))
+ return 0;
+ }
return a->mode;
}

--
2.31.1

2022-06-15 01:32:05

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 1/3] trace, cxl: Introduce a TRACE_EVENT for CXL Poison Records

On Tue, 14 Jun 2022 17:10:26 -0700
[email protected] wrote:

> From: Alison Schofield <[email protected]>
>
> Add a trace event for CXL Poison List Media Error Records that
> includes the starting DPA of the poison, the length, and the
> the source of the poison.
>
> This trace event will be used by the CXL_MEM driver to log the
> Media Errors returned by the GET_POISON_LIST Mailbox command.
>
> Signed-off-by: Alison Schofield <[email protected]>
> ---
> include/trace/events/cxl.h | 60 ++++++++++++++++++++++++++++++++++++++
> 1 file changed, 60 insertions(+)
> create mode 100644 include/trace/events/cxl.h
>
> diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> new file mode 100644
> index 000000000000..17e707c3817e
> --- /dev/null
> +++ b/include/trace/events/cxl.h
> @@ -0,0 +1,60 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM cxl
> +
> +#if !defined(_CXL_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _CXL_TRACE_H
> +
> +#include <linux/tracepoint.h>
> +
> +TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_UNKNOWN);
> +TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_INTERNAL);
> +TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_EXTERNAL);
> +TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_INJECTED);
> +TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_VENDOR);
> +TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_INVALID);
> +
> +#define show_poison_source(source) \
> + __print_symbolic(source, \
> + {CXL_POISON_SOURCE_UNKNOWN, "UNKNOWN"}, \
> + {CXL_POISON_SOURCE_EXTERNAL, "EXTERNAL"}, \
> + {CXL_POISON_SOURCE_INTERNAL, "INTERNAL"}, \
> + {CXL_POISON_SOURCE_INJECTED, "INJECTED"}, \
> + {CXL_POISON_SOURCE_VENDOR, "VENDOR"}, \
> + {CXL_POISON_SOURCE_INVALID, "INVALID"})
> +
> +TRACE_EVENT(cxl_poison_list,
> +
> + TP_PROTO(struct device *dev,
> + int source,
> + unsigned long start,
> + unsigned int length),
> +
> + TP_ARGS(dev, source, start, length),
> +
> + TP_STRUCT__entry(
> + __string(name, dev_name(dev))
> + __field(int, source)
> + __field(u64, start)
> + __field(u32, length)

OK, the above order should be fine, without adding any holes. The
__string() is 4 bytes as well as the "int". Which keeps it aligned with the
u64 (8 bytes), followed by a u32, which is 4 bytes.

From a tracing perspective:

Reviewed-by: Steven Rostedt (VMware) <[email protected]>

-- Steve


> + ),
> +
> + TP_fast_assign(
> + __assign_str(name, dev_name(dev));
> + __entry->source = source;
> + __entry->start = start;
> + __entry->length = length;
> + ),
> +
> + TP_printk("dev %s source %s start %llu length %u",
> + __get_str(name),
> + show_poison_source(__entry->source),
> + __entry->start,
> + __entry->length)
> +);
> +#endif /* _CXL_TRACE_H */
> +
> +/* This part must be outside protection */
> +#undef TRACE_INCLUDE_FILE
> +#define TRACE_INCLUDE_FILE cxl
> +#include <trace/define_trace.h>

2022-06-15 03:47:27

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH 3/3] cxl/core: Add sysfs attribute get_poison for list retrieval

On Tue, Jun 14, 2022 at 05:10:28PM -0700, Alison Schofield wrote:
> From: Alison Schofield <[email protected]>
>
> The sysfs attribute, get_poison, allows user space to request the
> retrieval of a CXL devices poison list for its persistent memory.
>
> From Documentation/ABI/.../sysfs-bus-cxl
> (WO) When a '1' is written to this attribute the memdev
> driver retrieves the poison list from the device. The list
> includes addresses that are poisoned or would result in
> poison if accessed, and the source of the poison. This
> attribute is only visible for devices supporting the
> capability. The retrieved errors are logged as kernel
> trace events with the label: cxl_poison_list.
>
> Signed-off-by: Alison Schofield <[email protected]>

Reviewed-by: Ira Weiny <[email protected]>

> ---
> Documentation/ABI/testing/sysfs-bus-cxl | 13 ++++++++++
> drivers/cxl/core/memdev.c | 32 +++++++++++++++++++++++++
> 2 files changed, 45 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 7c2b846521f3..9d0c3988fdd2 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -163,3 +163,16 @@ Description:
> memory (type-3). The 'target_type' attribute indicates the
> current setting which may dynamically change based on what
> memory regions are activated in this decode hierarchy.
> +
> +What: /sys/bus/cxl/devices/memX/get_poison
> +Date: June, 2022
> +KernelVersion: v5.20
> +Contact: [email protected]
> +Description:
> + (WO) When a '1' is written to this attribute the memdev
> + driver retrieves the poison list from the device. The list
> + includes addresses that are poisoned or would result in
> + poison if accessed, and the source of the poison. This
> + attribute is only visible for devices supporting the
> + capability. The retrieved errors are logged as kernel
> + trace events with the label: cxl_poison_list.
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index f7cdcd33504a..5ef9ffaa934a 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -106,12 +106,34 @@ static ssize_t numa_node_show(struct device *dev, struct device_attribute *attr,
> }
> static DEVICE_ATTR_RO(numa_node);
>
> +static ssize_t get_poison_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +
> +{
> + int rc;
> +
> + if (!sysfs_streq(buf, "1")) {
> + dev_err(dev, "%s: unknown value: %s\n", attr->attr.name, buf);
> + return -EINVAL;
> + }
> +
> + rc = cxl_mem_get_poison_list(dev);
> + if (rc) {
> + dev_err(dev, "Failed to retrieve poison list %d\n", rc);
> + return rc;
> + }
> + return len;
> +}
> +static DEVICE_ATTR_WO(get_poison);
> +
> static struct attribute *cxl_memdev_attributes[] = {
> &dev_attr_serial.attr,
> &dev_attr_firmware_version.attr,
> &dev_attr_payload_max.attr,
> &dev_attr_label_storage_size.attr,
> &dev_attr_numa_node.attr,
> + &dev_attr_get_poison.attr,
> NULL,
> };
>
> @@ -130,6 +152,16 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
> {
> if (!IS_ENABLED(CONFIG_NUMA) && a == &dev_attr_numa_node.attr)
> return 0;
> +
> + if (a == &dev_attr_get_poison.attr) {
> + struct device *dev = container_of(kobj, struct device, kobj);
> + struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +
> + if (!test_bit(CXL_MEM_COMMAND_ID_GET_POISON,
> + cxlds->enabled_cmds))
> + return 0;
> + }
> return a->mode;
> }
>
> --
> 2.31.1
>

2022-06-15 04:23:52

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

On Tue, Jun 14, 2022 at 05:10:27PM -0700, Alison Schofield wrote:
> From: Alison Schofield <[email protected]>
>
> CXL devices that support persistent memory maintain a list of locations
> that are poisoned or result in poison if the addresses are accessed by
> the host.
>
> Per the spec (CXL 2.0 8.2.8.5.4.1), the device returns this Poison
> list as a set of Media Error Records that include the source of the
> error, the starting device physical address and length. The length is
> the number of adjacent DPAs in the record and is in units of 64 bytes.
>
> Retrieve the list and log each Media Error Record as a trace event of
> type cxl_poison_list.
>
> Signed-off-by: Alison Schofield <[email protected]>
> ---
> drivers/cxl/cxlmem.h | 43 +++++++++++++++++++++++
> drivers/cxl/core/mbox.c | 75 +++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 118 insertions(+)
>
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 60d10ee1e7fc..29cf0459b44a 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -174,6 +174,7 @@ struct cxl_endpoint_dvsec_info {
> * (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
> * @lsa_size: Size of Label Storage Area
> * (CXL 2.0 8.2.9.5.1.1 Identify Memory Device)
> + * @poison_max_mer: maximum Media Error Records tracked in Poison List

Does not match the member name below.

Ira

> * @mbox_mutex: Mutex to synchronize mailbox access.
> * @firmware_version: Firmware version for the memory device.
> * @enabled_cmds: Hardware commands found enabled in CEL.
> @@ -204,6 +205,7 @@ struct cxl_dev_state {
>
> size_t payload_size;
> size_t lsa_size;
> + u32 poison_max;
> struct mutex mbox_mutex; /* Protects device mailbox and firmware */
> char firmware_version[0x10];
> DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
> @@ -317,6 +319,46 @@ struct cxl_mbox_set_partition_info {
>
> #define CXL_SET_PARTITION_IMMEDIATE_FLAG BIT(0)
>
> +struct cxl_mbox_poison_payload_in {
> + __le64 offset;
> + __le64 length;
> +} __packed;
> +
> +struct cxl_mbox_poison_payload_out {
> + u8 flags;
> + u8 rsvd1;
> + __le64 overflow_timestamp;
> + __le16 count;
> + u8 rsvd2[0x14];
> + struct cxl_poison_record {
> + __le64 address;
> + __le32 length;
> + __le32 rsvd;
> + } __packed record[];
> +} __packed;
> +
> +/* CXL 8.2.9.5.4.1 Get Poison List: payload out flags: */
> +#define CXL_POISON_FLAG_MORE BIT(0)
> +#define CXL_POISON_FLAG_OVERFLOW BIT(1)
> +#define CXL_POISON_FLAG_SCANNING BIT(2)
> +
> +/* CXL 8.2.9.5.4.1 Get Poison List: Error is encoded in record.address[2:0] */
> +#define CXL_POISON_SOURCE_MASK GENMASK(2, 0)
> +#define CXL_POISON_SOURCE_UNKNOWN 0
> +#define CXL_POISON_SOURCE_EXTERNAL 1
> +#define CXL_POISON_SOURCE_INTERNAL 2
> +#define CXL_POISON_SOURCE_INJECTED 3
> +#define CXL_POISON_SOURCE_VENDOR 7
> +
> +/* Software define */
> +#define CXL_POISON_SOURCE_INVALID 99
> +#define CXL_POISON_SOURCE_VALID(x) \
> + (((x) == CXL_POISON_SOURCE_UNKNOWN) || \
> + ((x) == CXL_POISON_SOURCE_EXTERNAL) || \
> + ((x) == CXL_POISON_SOURCE_INTERNAL) || \
> + ((x) == CXL_POISON_SOURCE_INJECTED) || \
> + ((x) == CXL_POISON_SOURCE_VENDOR))
> +
> /**
> * struct cxl_mem_command - Driver representation of a memory device command
> * @info: Command information as it exists for the UAPI
> @@ -351,6 +393,7 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
> struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
> void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
> void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
> +int cxl_mem_get_poison_list(struct device *dev);
> #ifdef CONFIG_CXL_SUSPEND
> void cxl_mem_active_inc(void);
> void cxl_mem_active_dec(void);
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 54f434733b56..c10c7020ebc2 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -9,6 +9,9 @@
>
> #include "core.h"
>
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/cxl.h>
> +
> static bool cxl_raw_allow_all;
>
> /**
> @@ -755,6 +758,7 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
> {
> /* See CXL 2.0 Table 175 Identify Memory Device Output Payload */
> struct cxl_mbox_identify id;
> + __le32 val = 0;
> int rc;
>
> rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_IDENTIFY, NULL, 0, &id,
> @@ -783,6 +787,9 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
> cxlds->lsa_size = le32_to_cpu(id.lsa_size);
> memcpy(cxlds->firmware_version, id.fw_revision, sizeof(id.fw_revision));
>
> + memcpy(&val, id.poison_list_max_mer, 3);
> + cxlds->poison_max = le32_to_cpu(val);
> +
> return 0;
> }
> EXPORT_SYMBOL_NS_GPL(cxl_dev_state_identify, CXL);
> @@ -826,6 +833,74 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, CXL);
>
> +int cxl_mem_get_poison_list(struct device *dev)
> +{
> + struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> + struct cxl_mbox_poison_payload_out *po;
> + struct cxl_mbox_poison_payload_in pi;
> + int nr_records = 0;
> + int rc, i;
> +
> + if (range_len(&cxlds->pmem_range)) {
> + pi.offset = cpu_to_le64(cxlds->pmem_range.start);
> + pi.length = cpu_to_le64(range_len(&cxlds->pmem_range));
> + } else {
> + return -ENXIO;
> + }
> +
> + po = kvmalloc(cxlds->payload_size, GFP_KERNEL);
> + if (!po)
> + return -ENOMEM;
> +
> + do {
> + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_POISON, &pi,
> + sizeof(pi), po, cxlds->payload_size);
> + if (rc)
> + goto out;
> +
> + if (po->flags & CXL_POISON_FLAG_OVERFLOW) {
> + time64_t o_time = le64_to_cpu(po->overflow_timestamp);
> +
> + dev_err(dev, "Poison list overflow at %ptTs UTC\n",
> + &o_time);
> + rc = -ENXIO;
> + goto out;

I guess the idea is that this return will trigger something else will clear the list,
rebuild the list, and perform a scan media request?

I'm just wondering if this loop should continue to clear the list and then let
something else do the scan media request?

Other than that question and the above typo. Looks good!

Ira

> + }
> +
> + if (po->flags & CXL_POISON_FLAG_SCANNING) {
> + dev_err(dev, "Scan Media in Progress\n");
> + rc = -EBUSY;
> + goto out;
> + }
> +
> + for (i = 0; i < le16_to_cpu(po->count); i++) {
> + u64 addr = le64_to_cpu(po->record[i].address);
> + u32 len = le32_to_cpu(po->record[i].length);
> + int source = FIELD_GET(CXL_POISON_SOURCE_MASK, addr);
> +
> + if (!CXL_POISON_SOURCE_VALID(source)) {
> + dev_dbg(dev, "Invalid poison source %d",
> + source);
> + source = CXL_POISON_SOURCE_INVALID;
> + }
> +
> + trace_cxl_poison_list(dev, source, addr, len);
> + }
> +
> + /* Protect against an uncleared _FLAG_MORE */
> + nr_records = nr_records + le16_to_cpu(po->count);
> + if (nr_records >= cxlds->poison_max)
> + goto out;
> +
> + } while (po->flags & CXL_POISON_FLAG_MORE);
> +
> +out:
> + kvfree(po);
> + return rc;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_poison_list, CXL);
> +
> struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
> {
> struct cxl_dev_state *cxlds;
> --
> 2.31.1
>

2022-06-15 05:14:17

by Alison Schofield

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

On Tue, Jun 14, 2022 at 08:22:41PM -0700, Ira Weiny wrote:

Thanks for the review Ira...

> On Tue, Jun 14, 2022 at 05:10:27PM -0700, Alison Schofield wrote:
> > From: Alison Schofield <[email protected]>
> >

snip

> >
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 60d10ee1e7fc..29cf0459b44a 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -174,6 +174,7 @@ struct cxl_endpoint_dvsec_info {
> > * (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
> > * @lsa_size: Size of Label Storage Area
> > * (CXL 2.0 8.2.9.5.1.1 Identify Memory Device)
> > + * @poison_max_mer: maximum Media Error Records tracked in Poison List
>
> Does not match the member name below.
Got it! I intended to drop the _mer.

>
> Ira
>
> > * @mbox_mutex: Mutex to synchronize mailbox access.
> > * @firmware_version: Firmware version for the memory device.
> > * @enabled_cmds: Hardware commands found enabled in CEL.
> > @@ -204,6 +205,7 @@ struct cxl_dev_state {
> >
> > size_t payload_size;
> > size_t lsa_size;
> > + u32 poison_max;
> > struct mutex mbox_mutex; /* Protects device mailbox and firmware */
> > char firmware_version[0x10];
> > DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);

snip

> > +int cxl_mem_get_poison_list(struct device *dev)
> > +{
> > + struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > + struct cxl_mbox_poison_payload_out *po;
> > + struct cxl_mbox_poison_payload_in pi;
> > + int nr_records = 0;
> > + int rc, i;
> > +
> > + if (range_len(&cxlds->pmem_range)) {
> > + pi.offset = cpu_to_le64(cxlds->pmem_range.start);
> > + pi.length = cpu_to_le64(range_len(&cxlds->pmem_range));
> > + } else {
> > + return -ENXIO;
> > + }
> > +
> > + po = kvmalloc(cxlds->payload_size, GFP_KERNEL);
> > + if (!po)
> > + return -ENOMEM;
> > +
> > + do {
> > + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_POISON, &pi,
> > + sizeof(pi), po, cxlds->payload_size);
> > + if (rc)
> > + goto out;
> > +
> > + if (po->flags & CXL_POISON_FLAG_OVERFLOW) {
> > + time64_t o_time = le64_to_cpu(po->overflow_timestamp);
> > +
> > + dev_err(dev, "Poison list overflow at %ptTs UTC\n",
> > + &o_time);
> > + rc = -ENXIO;
> > + goto out;
>
> I guess the idea is that this return will trigger something else will clear the list,
> rebuild the list, and perform a scan media request?
>
Per CXL Spec 8.2.9.5.4.1: The poison list may be incomplete when the list
has overflowed. User can perform a Scan Media to try to clear and rebuild
the list, with no guarantee that the overflow will not recur.

So yes to what you are saying. This return value should indicate to
user space that a Scan Media should be issued. Issuing the Scan Media
to the device does lead the device to rebuild it's list, as you say.
Also, when we get the Scan Media results, the device is able to report
partial results and tell the host to collect the error records, and
then restart the scan, get results again, and on and on until the scan
is complete.

Perhaps a clarification - there is not a logical pairing of Scan Media
followed by Get Poison List. Scan Media followed by Get Scan Media
Results is the logical pairing. Get Poison List is getting a snapshot
of the poison list at a point in time. The device adds DPAs to the list
when the device detects poison, some devices run their own backround
scans and add to the poison list, and then there are the user initiated
actions (Scan Media and Poison Inject) that can affect the list.

> I'm just wondering if this loop should continue to clear the list and then let
> something else do the scan media request?

It's not like the _MORE status where the device is telling the host to
come back and gather more. I think the action of failing, and letting
user initiated a Scan Media is correct course here.

So, this response got kind of long winded. As you can see, especially
if one looks in the spec as I know you are, there are additional
commands that need to be implemented to complete the ARS feature set.
And, of course, we'll offer user space tooling (NDCTL and libcxl).

>
> Other than that question and the above typo. Looks good!
>
> Ira
>
> > + }
> > +
> > + if (po->flags & CXL_POISON_FLAG_SCANNING) {
> > + dev_err(dev, "Scan Media in Progress\n");
> > + rc = -EBUSY;
> > + goto out;
> > + }
> > +
> > + for (i = 0; i < le16_to_cpu(po->count); i++) {
> > + u64 addr = le64_to_cpu(po->record[i].address);
> > + u32 len = le32_to_cpu(po->record[i].length);
> > + int source = FIELD_GET(CXL_POISON_SOURCE_MASK, addr);
> > +
> > + if (!CXL_POISON_SOURCE_VALID(source)) {
> > + dev_dbg(dev, "Invalid poison source %d",
> > + source);
> > + source = CXL_POISON_SOURCE_INVALID;
> > + }
> > +
> > + trace_cxl_poison_list(dev, source, addr, len);
> > + }
> > +
> > + /* Protect against an uncleared _FLAG_MORE */
> > + nr_records = nr_records + le16_to_cpu(po->count);
> > + if (nr_records >= cxlds->poison_max)
> > + goto out;
> > +
> > + } while (po->flags & CXL_POISON_FLAG_MORE);
> > +
> > +out:
> > + kvfree(po);
> > + return rc;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_poison_list, CXL);
> > +
> > struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
> > {
> > struct cxl_dev_state *cxlds;
> > --
> > 2.31.1
> >

2022-06-15 15:07:41

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

On Tue, Jun 14, 2022 at 10:07:52PM -0700, Alison Schofield wrote:
> On Tue, Jun 14, 2022 at 08:22:41PM -0700, Ira Weiny wrote:
>

[snip]

> > > +
> > > + do {
> > > + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_POISON, &pi,
> > > + sizeof(pi), po, cxlds->payload_size);
> > > + if (rc)
> > > + goto out;
> > > +
> > > + if (po->flags & CXL_POISON_FLAG_OVERFLOW) {
> > > + time64_t o_time = le64_to_cpu(po->overflow_timestamp);
> > > +
> > > + dev_err(dev, "Poison list overflow at %ptTs UTC\n",
> > > + &o_time);
> > > + rc = -ENXIO;
> > > + goto out;
> >
> > I guess the idea is that this return will trigger something else will clear the list,
> > rebuild the list, and perform a scan media request?
> >
> Per CXL Spec 8.2.9.5.4.1: The poison list may be incomplete when the list
> has overflowed. User can perform a Scan Media to try to clear and rebuild
> the list, with no guarantee that the overflow will not recur.
>
> So yes to what you are saying. This return value should indicate to
> user space that a Scan Media should be issued. Issuing the Scan Media
> to the device does lead the device to rebuild it's list, as you say.
> Also, when we get the Scan Media results, the device is able to report
> partial results and tell the host to collect the error records, and
> then restart the scan, get results again, and on and on until the scan
> is complete.
>
> Perhaps a clarification - there is not a logical pairing of Scan Media
> followed by Get Poison List. Scan Media followed by Get Scan Media
> Results is the logical pairing. Get Poison List is getting a snapshot
> of the poison list at a point in time. The device adds DPAs to the list
> when the device detects poison, some devices run their own backround
> scans and add to the poison list, and then there are the user initiated
> actions (Scan Media and Poison Inject) that can affect the list.
>
> > I'm just wondering if this loop should continue to clear the list and then let
> > something else do the scan media request?
>
> It's not like the _MORE status where the device is telling the host to
> come back and gather more. I think the action of failing, and letting
> user initiated a Scan Media is correct course here.

Fair enough. But I guess I'm still confused by the spec. The way I read it
yesterday (and I could be wrong) was that the OS was supposed to read the
entries to clear the list? Is that not true?

I the device will clear the list internally when Scan Media is run?

At this point I'm just trying to understand not necessarily objecting to the
patch.

Ira

>
> So, this response got kind of long winded. As you can see, especially
> if one looks in the spec as I know you are, there are additional
> commands that need to be implemented to complete the ARS feature set.
> And, of course, we'll offer user space tooling (NDCTL and libcxl).
>

2022-06-15 17:33:40

by Alison Schofield

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

On Wed, Jun 15, 2022 at 08:01:50AM -0700, Ira Weiny wrote:
> On Tue, Jun 14, 2022 at 10:07:52PM -0700, Alison Schofield wrote:
> > On Tue, Jun 14, 2022 at 08:22:41PM -0700, Ira Weiny wrote:
> >
>
> [snip]
>
> > > > +
> > > > + do {
> > > > + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_POISON, &pi,
> > > > + sizeof(pi), po, cxlds->payload_size);
> > > > + if (rc)
> > > > + goto out;
> > > > +
> > > > + if (po->flags & CXL_POISON_FLAG_OVERFLOW) {
> > > > + time64_t o_time = le64_to_cpu(po->overflow_timestamp);
> > > > +
> > > > + dev_err(dev, "Poison list overflow at %ptTs UTC\n",
> > > > + &o_time);
> > > > + rc = -ENXIO;
> > > > + goto out;
> > >
> > > I guess the idea is that this return will trigger something else will clear the list,
> > > rebuild the list, and perform a scan media request?
> > >
> > Per CXL Spec 8.2.9.5.4.1: The poison list may be incomplete when the list
> > has overflowed. User can perform a Scan Media to try to clear and rebuild
> > the list, with no guarantee that the overflow will not recur.
> >
> > So yes to what you are saying. This return value should indicate to
> > user space that a Scan Media should be issued. Issuing the Scan Media
> > to the device does lead the device to rebuild it's list, as you say.
> > Also, when we get the Scan Media results, the device is able to report
> > partial results and tell the host to collect the error records, and
> > then restart the scan, get results again, and on and on until the scan
> > is complete.
> >
> > Perhaps a clarification - there is not a logical pairing of Scan Media
> > followed by Get Poison List. Scan Media followed by Get Scan Media
> > Results is the logical pairing. Get Poison List is getting a snapshot
> > of the poison list at a point in time. The device adds DPAs to the list
> > when the device detects poison, some devices run their own backround
> > scans and add to the poison list, and then there are the user initiated
> > actions (Scan Media and Poison Inject) that can affect the list.
> >
> > > I'm just wondering if this loop should continue to clear the list and then let
> > > something else do the scan media request?
> >
> > It's not like the _MORE status where the device is telling the host to
> > come back and gather more. I think the action of failing, and letting
> > user initiated a Scan Media is correct course here.
>
> Fair enough. But I guess I'm still confused by the spec. The way I read it
> yesterday (and I could be wrong) was that the OS was supposed to read the
> entries to clear the list? Is that not true?

I think - not true.

Get_Poison_List has no effect on the contents of the list itself.
Even with its MORE flag, it is not clearing any poison, it's just
telling the host that it had more records than could fit in one
device payload so they will have to delivered to the host in multiple
requests. I'd expect issuing multiple Get Poison List requests would
get same results. (unless of course the media was going bad quickly

Maybe you are conflating w other cmds: Scan Media & Clear Poison

>
> I the device will clear the list internally when Scan Media is run?

Spec says device 'rebuids' the list. I might guess that's a clear and
start anew, but not the hosts business. As a host, we wait for Scan
Media to complete before issuing Get Scan Media Results or Get Poison
List.
>
> At this point I'm just trying to understand not necessarily objecting to the
> patch.

NP. The questions are helpful!

>
> Ira
>
> >
> > So, this response got kind of long winded. As you can see, especially
> > if one looks in the spec as I know you are, there are additional
> > commands that need to be implemented to complete the ARS feature set.
> > And, of course, we'll offer user space tooling (NDCTL and libcxl).
> >

2022-06-16 15:24:25

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH 3/3] cxl/core: Add sysfs attribute get_poison for list retrieval

On Tue, 14 Jun 2022 17:10:28 -0700
[email protected] wrote:

> From: Alison Schofield <[email protected]>
>
> The sysfs attribute, get_poison, allows user space to request the
> retrieval of a CXL devices poison list for its persistent memory.
>
> From Documentation/ABI/.../sysfs-bus-cxl
> (WO) When a '1' is written to this attribute the memdev
> driver retrieves the poison list from the device. The list
> includes addresses that are poisoned or would result in
> poison if accessed, and the source of the poison. This
> attribute is only visible for devices supporting the
> capability. The retrieved errors are logged as kernel
> trace events with the label: cxl_poison_list.
>
> Signed-off-by: Alison Schofield <[email protected]>

Hi Alison,

I'm planning to throw together QEMU support for this and test
it. In meantime a few quick comments / suggestions inline.

Thanks,

Jonathan

> ---
> Documentation/ABI/testing/sysfs-bus-cxl | 13 ++++++++++
> drivers/cxl/core/memdev.c | 32 +++++++++++++++++++++++++
> 2 files changed, 45 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 7c2b846521f3..9d0c3988fdd2 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -163,3 +163,16 @@ Description:
> memory (type-3). The 'target_type' attribute indicates the
> current setting which may dynamically change based on what
> memory regions are activated in this decode hierarchy.
> +
> +What: /sys/bus/cxl/devices/memX/get_poison
> +Date: June, 2022
> +KernelVersion: v5.20
> +Contact: [email protected]
> +Description:
> + (WO) When a '1' is written to this attribute the memdev
> + driver retrieves the poison list from the device. The list
> + includes addresses that are poisoned or would result in
> + poison if accessed, and the source of the poison. This
> + attribute is only visible for devices supporting the
> + capability. The retrieved errors are logged as kernel
> + trace events with the label: cxl_poison_list.
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index f7cdcd33504a..5ef9ffaa934a 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -106,12 +106,34 @@ static ssize_t numa_node_show(struct device *dev, struct device_attribute *attr,
> }
> static DEVICE_ATTR_RO(numa_node);
>
> +static ssize_t get_poison_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +
> +{
> + int rc;
> +
> + if (!sysfs_streq(buf, "1")) {

Maybe kstrtobool? If you do then fine to leave the documentation claiming
it's tighter as that'll tell people who actually read it to expect to
write a 1.

> + dev_err(dev, "%s: unknown value: %s\n", attr->attr.name, buf);

Feels noisy when I'd expect -EINVAL to be enough info to indicate an invalid
parameter.

> + return -EINVAL;
> + }
> +
> + rc = cxl_mem_get_poison_list(dev);
> + if (rc) {
> + dev_err(dev, "Failed to retrieve poison list %d\n", rc);

Here I'd expect the error code to returned on the write to probably be enough
info so not sure this error print is useful either.

> + return rc;
> + }
> + return len;
> +}
> +static DEVICE_ATTR_WO(get_poison);
> +
> static struct attribute *cxl_memdev_attributes[] = {
> &dev_attr_serial.attr,
> &dev_attr_firmware_version.attr,
> &dev_attr_payload_max.attr,
> &dev_attr_label_storage_size.attr,
> &dev_attr_numa_node.attr,
> + &dev_attr_get_poison.attr,
> NULL,
> };
>
> @@ -130,6 +152,16 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
> {
> if (!IS_ENABLED(CONFIG_NUMA) && a == &dev_attr_numa_node.attr)
> return 0;
> +
> + if (a == &dev_attr_get_poison.attr) {
> + struct device *dev = container_of(kobj, struct device, kobj);
> + struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +
> + if (!test_bit(CXL_MEM_COMMAND_ID_GET_POISON,
> + cxlds->enabled_cmds))
to_cxl_memdev(dev)->enabled_cmds))
and drop the local variable is shorter and I don't htink it loses
any readability.

> + return 0;
> + }
> return a->mode;
> }
>

2022-06-16 19:55:27

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

On Tue, 14 Jun 2022, [email protected] wrote:

>From: Alison Schofield <[email protected]>
>
>CXL devices that support persistent memory maintain a list of locations
>that are poisoned or result in poison if the addresses are accessed by
>the host.
>
>Per the spec (CXL 2.0 8.2.8.5.4.1), the device returns this Poison
>list as a set of Media Error Records that include the source of the
>error, the starting device physical address and length. The length is
>the number of adjacent DPAs in the record and is in units of 64 bytes.
>
>Retrieve the list and log each Media Error Record as a trace event of
>type cxl_poison_list.
>
>Signed-off-by: Alison Schofield <[email protected]>
>---
> drivers/cxl/cxlmem.h | 43 +++++++++++++++++++++++
> drivers/cxl/core/mbox.c | 75 +++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 118 insertions(+)
>
>diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>index 60d10ee1e7fc..29cf0459b44a 100644
>--- a/drivers/cxl/cxlmem.h
>+++ b/drivers/cxl/cxlmem.h
>@@ -174,6 +174,7 @@ struct cxl_endpoint_dvsec_info {
> * (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
> * @lsa_size: Size of Label Storage Area
> * (CXL 2.0 8.2.9.5.1.1 Identify Memory Device)
>+ * @poison_max_mer: maximum Media Error Records tracked in Poison List
> * @mbox_mutex: Mutex to synchronize mailbox access.
> * @firmware_version: Firmware version for the memory device.
> * @enabled_cmds: Hardware commands found enabled in CEL.
>@@ -204,6 +205,7 @@ struct cxl_dev_state {
>
> size_t payload_size;
> size_t lsa_size;
>+ u32 poison_max;
> struct mutex mbox_mutex; /* Protects device mailbox and firmware */
> char firmware_version[0x10];
> DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
>@@ -317,6 +319,46 @@ struct cxl_mbox_set_partition_info {
>
> #define CXL_SET_PARTITION_IMMEDIATE_FLAG BIT(0)
>
>+struct cxl_mbox_poison_payload_in {
>+ __le64 offset;
>+ __le64 length;
>+} __packed;
>+
>+struct cxl_mbox_poison_payload_out {
>+ u8 flags;
>+ u8 rsvd1;
>+ __le64 overflow_timestamp;
>+ __le16 count;
>+ u8 rsvd2[0x14];
>+ struct cxl_poison_record {
>+ __le64 address;
>+ __le32 length;
>+ __le32 rsvd;
>+ } __packed record[];
>+} __packed;
>+
>+/* CXL 8.2.9.5.4.1 Get Poison List: payload out flags: */
>+#define CXL_POISON_FLAG_MORE BIT(0)
>+#define CXL_POISON_FLAG_OVERFLOW BIT(1)
>+#define CXL_POISON_FLAG_SCANNING BIT(2)
>+
>+/* CXL 8.2.9.5.4.1 Get Poison List: Error is encoded in record.address[2:0] */
>+#define CXL_POISON_SOURCE_MASK GENMASK(2, 0)
>+#define CXL_POISON_SOURCE_UNKNOWN 0
>+#define CXL_POISON_SOURCE_EXTERNAL 1
>+#define CXL_POISON_SOURCE_INTERNAL 2
>+#define CXL_POISON_SOURCE_INJECTED 3
>+#define CXL_POISON_SOURCE_VENDOR 7
>+
>+/* Software define */
>+#define CXL_POISON_SOURCE_INVALID 99
>+#define CXL_POISON_SOURCE_VALID(x) \
>+ (((x) == CXL_POISON_SOURCE_UNKNOWN) || \
>+ ((x) == CXL_POISON_SOURCE_EXTERNAL) || \
>+ ((x) == CXL_POISON_SOURCE_INTERNAL) || \
>+ ((x) == CXL_POISON_SOURCE_INJECTED) || \
>+ ((x) == CXL_POISON_SOURCE_VENDOR))
>+
> /**
> * struct cxl_mem_command - Driver representation of a memory device command
> * @info: Command information as it exists for the UAPI
>@@ -351,6 +393,7 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
> struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
> void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
> void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
>+int cxl_mem_get_poison_list(struct device *dev);
> #ifdef CONFIG_CXL_SUSPEND
> void cxl_mem_active_inc(void);
> void cxl_mem_active_dec(void);
>diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
>index 54f434733b56..c10c7020ebc2 100644
>--- a/drivers/cxl/core/mbox.c
>+++ b/drivers/cxl/core/mbox.c
>@@ -9,6 +9,9 @@
>
> #include "core.h"
>
>+#define CREATE_TRACE_POINTS
>+#include <trace/events/cxl.h>
>+
> static bool cxl_raw_allow_all;
>
> /**
>@@ -755,6 +758,7 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
> {
> /* See CXL 2.0 Table 175 Identify Memory Device Output Payload */
> struct cxl_mbox_identify id;
>+ __le32 val = 0;
> int rc;
>
> rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_IDENTIFY, NULL, 0, &id,
>@@ -783,6 +787,9 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
> cxlds->lsa_size = le32_to_cpu(id.lsa_size);
> memcpy(cxlds->firmware_version, id.fw_revision, sizeof(id.fw_revision));
>
>+ memcpy(&val, id.poison_list_max_mer, 3);
>+ cxlds->poison_max = le32_to_cpu(val);
>+
> return 0;
> }
> EXPORT_SYMBOL_NS_GPL(cxl_dev_state_identify, CXL);
>@@ -826,6 +833,74 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, CXL);
>
>+int cxl_mem_get_poison_list(struct device *dev)
>+{
>+ struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
>+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
>+ struct cxl_mbox_poison_payload_out *po;
>+ struct cxl_mbox_poison_payload_in pi;
>+ int nr_records = 0;
>+ int rc, i;
>+
>+ if (range_len(&cxlds->pmem_range)) {
>+ pi.offset = cpu_to_le64(cxlds->pmem_range.start);
>+ pi.length = cpu_to_le64(range_len(&cxlds->pmem_range));

Do you ever see this changing to not always use the full pmem DPA range
but allow arbitrary ones? I also assume this is the reason why you don't
check the range vs cxlds->ram_range to prevent any overlaps, no?

Thanks,
Davidlohr

>+ } else {
>+ return -ENXIO;
>+ }
>+
>+ po = kvmalloc(cxlds->payload_size, GFP_KERNEL);
>+ if (!po)
>+ return -ENOMEM;
>+
>+ do {
>+ rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_POISON, &pi,
>+ sizeof(pi), po, cxlds->payload_size);
>+ if (rc)
>+ goto out;
>+
>+ if (po->flags & CXL_POISON_FLAG_OVERFLOW) {
>+ time64_t o_time = le64_to_cpu(po->overflow_timestamp);
>+
>+ dev_err(dev, "Poison list overflow at %ptTs UTC\n",
>+ &o_time);
>+ rc = -ENXIO;
>+ goto out;
>+ }
>+
>+ if (po->flags & CXL_POISON_FLAG_SCANNING) {
>+ dev_err(dev, "Scan Media in Progress\n");
>+ rc = -EBUSY;
>+ goto out;
>+ }
>+
>+ for (i = 0; i < le16_to_cpu(po->count); i++) {
>+ u64 addr = le64_to_cpu(po->record[i].address);
>+ u32 len = le32_to_cpu(po->record[i].length);
>+ int source = FIELD_GET(CXL_POISON_SOURCE_MASK, addr);
>+
>+ if (!CXL_POISON_SOURCE_VALID(source)) {
>+ dev_dbg(dev, "Invalid poison source %d",
>+ source);
>+ source = CXL_POISON_SOURCE_INVALID;
>+ }
>+
>+ trace_cxl_poison_list(dev, source, addr, len);
>+ }
>+
>+ /* Protect against an uncleared _FLAG_MORE */
>+ nr_records = nr_records + le16_to_cpu(po->count);
>+ if (nr_records >= cxlds->poison_max)
>+ goto out;
>+
>+ } while (po->flags & CXL_POISON_FLAG_MORE);
>+
>+out:
>+ kvfree(po);
>+ return rc;
>+}
>+EXPORT_SYMBOL_NS_GPL(cxl_mem_get_poison_list, CXL);
>+
> struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
> {
> struct cxl_dev_state *cxlds;
>
>--
>2.31.1
>

2022-06-16 20:12:00

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [PATCH 1/3] trace, cxl: Introduce a TRACE_EVENT for CXL Poison Records

On Tue, 14 Jun 2022, [email protected] wrote:

>From: Alison Schofield <[email protected]>
>
>Add a trace event for CXL Poison List Media Error Records that
>includes the starting DPA of the poison, the length, and the
>the source of the poison.
>
>This trace event will be used by the CXL_MEM driver to log the
>Media Errors returned by the GET_POISON_LIST Mailbox command.
>
>Signed-off-by: Alison Schofield <[email protected]>
>Reviewed-by: Steven Rostedt (VMware) <[email protected]>

Reviewed-by: Davidlohr Bueso <[email protected]>

>---
> include/trace/events/cxl.h | 60 ++++++++++++++++++++++++++++++++++++++
> 1 file changed, 60 insertions(+)
> create mode 100644 include/trace/events/cxl.h
>
>diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
>new file mode 100644
>index 000000000000..17e707c3817e
>--- /dev/null
>+++ b/include/trace/events/cxl.h
>@@ -0,0 +1,60 @@
>+/* SPDX-License-Identifier: GPL-2.0 */
>+#undef TRACE_SYSTEM
>+#define TRACE_SYSTEM cxl
>+
>+#if !defined(_CXL_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
>+#define _CXL_TRACE_H
>+
>+#include <linux/tracepoint.h>
>+
>+TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_UNKNOWN);
>+TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_INTERNAL);
>+TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_EXTERNAL);
>+TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_INJECTED);
>+TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_VENDOR);
>+TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_INVALID);
>+
>+#define show_poison_source(source) \
>+ __print_symbolic(source, \
>+ {CXL_POISON_SOURCE_UNKNOWN, "UNKNOWN"}, \
>+ {CXL_POISON_SOURCE_EXTERNAL, "EXTERNAL"}, \
>+ {CXL_POISON_SOURCE_INTERNAL, "INTERNAL"}, \
>+ {CXL_POISON_SOURCE_INJECTED, "INJECTED"}, \
>+ {CXL_POISON_SOURCE_VENDOR, "VENDOR"}, \
>+ {CXL_POISON_SOURCE_INVALID, "INVALID"})
>+
>+TRACE_EVENT(cxl_poison_list,
>+
>+ TP_PROTO(struct device *dev,
>+ int source,
>+ unsigned long start,
>+ unsigned int length),
>+
>+ TP_ARGS(dev, source, start, length),
>+
>+ TP_STRUCT__entry(
>+ __string(name, dev_name(dev))
>+ __field(int, source)
>+ __field(u64, start)
>+ __field(u32, length)
>+ ),
>+
>+ TP_fast_assign(
>+ __assign_str(name, dev_name(dev));
>+ __entry->source = source;
>+ __entry->start = start;
>+ __entry->length = length;
>+ ),
>+
>+ TP_printk("dev %s source %s start %llu length %u",
>+ __get_str(name),
>+ show_poison_source(__entry->source),
>+ __entry->start,
>+ __entry->length)
>+);
>+#endif /* _CXL_TRACE_H */
>+
>+/* This part must be outside protection */
>+#undef TRACE_INCLUDE_FILE
>+#define TRACE_INCLUDE_FILE cxl
>+#include <trace/define_trace.h>
>
>--
>2.31.1
>

2022-06-16 20:52:21

by Alison Schofield

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

On Thu, Jun 16, 2022 at 12:43:34PM -0700, Davidlohr Bueso wrote:
> On Tue, 14 Jun 2022, [email protected] wrote:
>
> >From: Alison Schofield <[email protected]>
> >
> >CXL devices that support persistent memory maintain a list of locations
> >that are poisoned or result in poison if the addresses are accessed by
> >the host.
> >
> >Per the spec (CXL 2.0 8.2.8.5.4.1), the device returns this Poison
> >list as a set of Media Error Records that include the source of the
> >error, the starting device physical address and length. The length is
> >the number of adjacent DPAs in the record and is in units of 64 bytes.
> >
> >Retrieve the list and log each Media Error Record as a trace event of
> >type cxl_poison_list.
> >
> >Signed-off-by: Alison Schofield <[email protected]>
> >---
> > drivers/cxl/cxlmem.h | 43 +++++++++++++++++++++++
> > drivers/cxl/core/mbox.c | 75 +++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 118 insertions(+)
> >
snip

> >+int cxl_mem_get_poison_list(struct device *dev)
> >+{
> >+ struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> >+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
> >+ struct cxl_mbox_poison_payload_out *po;
> >+ struct cxl_mbox_poison_payload_in pi;
> >+ int nr_records = 0;
> >+ int rc, i;
> >+
> >+ if (range_len(&cxlds->pmem_range)) {
> >+ pi.offset = cpu_to_le64(cxlds->pmem_range.start);
> >+ pi.length = cpu_to_le64(range_len(&cxlds->pmem_range));

First off - you stopped at a bug here - that pi.length needs to be
in units of 64 bytes.
>
> Do you ever see this changing to not always use the full pmem DPA range
> but allow arbitrary ones? I also assume this is the reason why you don't
> check the range vs cxlds->ram_range to prevent any overlaps, no?
>
> Thanks,
> Davidlohr

David - Great question!

I'm headed in this direction -

cxl list --media-errors -m mem1
lists media errors for requested memdev

cxl list --media-errors -r region#
lists region errors with HPA addresses
(So here cxl tool will collect the poison for all the regions
memdevs and do the DPA to HPA translation)

To answer your question, I wasn't thinking of limiting
the range within the memdev, but certainly could. And if we were
taking in ranges, those ranges would need to be checked.

$cxl list --media-errors -m mem1 --range-start= --range-end|len=

Now, if I left the sysfs inteface as is, the driver will read the
entire poison list for the memdev and then cxl tool will filter it
for the range requested.

Or, maybe we should implement in libcxl (not sysfs), with memdev and
range options and only collect from the device the range requested.

Either one looks the same to the cxl tool user, but limiting the
range we send to the device would certainly cut down on unwanted
records being logged, retrieved, and examined.

I'd like to hear more from you and other community members.

Alison

> > snip

2022-06-16 21:23:12

by Alison Schofield

[permalink] [raw]
Subject: Re: [PATCH 3/3] cxl/core: Add sysfs attribute get_poison for list retrieval

On Thu, Jun 16, 2022 at 08:04:12AM -0700, Jonathan Cameron wrote:
> On Tue, 14 Jun 2022 17:10:28 -0700
> [email protected] wrote:
>
> > From: Alison Schofield <[email protected]>
> >
> > The sysfs attribute, get_poison, allows user space to request the
> > retrieval of a CXL devices poison list for its persistent memory.
> >
> > From Documentation/ABI/.../sysfs-bus-cxl
> > (WO) When a '1' is written to this attribute the memdev
> > driver retrieves the poison list from the device. The list
> > includes addresses that are poisoned or would result in
> > poison if accessed, and the source of the poison. This
> > attribute is only visible for devices supporting the
> > capability. The retrieved errors are logged as kernel
> > trace events with the label: cxl_poison_list.
> >
> > Signed-off-by: Alison Schofield <[email protected]>
>
> Hi Alison,
>
> I'm planning to throw together QEMU support for this and test
> it. In meantime a few quick comments / suggestions inline.

Thanks Jonathan.
I've tested with a test patch that returns contrived output payloads,
and will look fwd to trying out w qemu,

>
> Thanks,
>
> Jonathan
>
> > ---
snip
> > +
> > + if (!sysfs_streq(buf, "1")) {
>
> Maybe kstrtobool? If you do then fine to leave the documentation claiming
> it's tighter as that'll tell people who actually read it to expect to
> write a 1.
>
Got it.

> > + dev_err(dev, "%s: unknown value: %s\n", attr->attr.name, buf);
>
> Feels noisy when I'd expect -EINVAL to be enough info to indicate an invalid
> parameter.
>
Got it.

> > + return -EINVAL;
> > + }
> > +
> > + rc = cxl_mem_get_poison_list(dev);
> > + if (rc) {
> > + dev_err(dev, "Failed to retrieve poison list %d\n", rc);
>
> Here I'd expect the error code to returned on the write to probably be enough
> info so not sure this error print is useful either.
>
Got it.

> > +
snip
> > + if (a == &dev_attr_get_poison.attr) {
> > + struct device *dev = container_of(kobj, struct device, kobj);
> > + struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > +
> > + if (!test_bit(CXL_MEM_COMMAND_ID_GET_POISON,
> > + cxlds->enabled_cmds))
> to_cxl_memdev(dev)->enabled_cmds))
> and drop the local variable is shorter and I don't htink it loses
> any readability.
>
Got it.

Thanks Jonathan!

2022-06-16 22:24:44

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

On Thu, 16 Jun 2022, Alison Schofield wrote:
>I'm headed in this direction -

I like these interfaces, btw.

>cxl list --media-errors -m mem1
> lists media errors for requested memdev

But in this patchset you're only listing for persistent configurations.
So if there is a volatile partion, or the whole device is volatile,
this would not consider that.

So unless I'm missing something, we need to consider ram_range as well.

>cxl list --media-errors -r region#
> lists region errors with HPA addresses
> (So here cxl tool will collect the poison for all the regions
> memdevs and do the DPA to HPA translation)

I was indeed thinking along these lines. But similar to the above,
the region driver also has plans to enumarate volatile regions
configured by BIOS.

>
>To answer your question, I wasn't thinking of limiting
>the range within the memdev, but certainly could. And if we were
>taking in ranges, those ranges would need to be checked.

My question was originally considering poisoning only within pmem DPA
ranges, but now I'm wondering if all this also applies equally to volatile
parts as well... Reading the spec I interpret both, but reading the
T3 Memory Device Software Guide '2.13.19' it only mentions persistent
capacity.

>
>$cxl list --media-errors -m mem1 --range-start= --range-end|len=

I figure this kind of like the above with regions being very arbitrary
and dynamic.

>Now, if I left the sysfs interface as is, the driver will read the
>entire poison list for the memdev and then cxl tool will filter it
>for the range requested.
>
>Or, maybe we should implement in libcxl (not sysfs), with memdev and
>range options and only collect from the device the range requested.

I wonder if the latter may be the better option considering that always
scanning the entire memdev would cause unnecessary media scan wait times,
specially for large capacities.

Thanks,
Davidlohr

2022-06-16 22:34:54

by Alison Schofield

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

David - you make lots of good points, one quick comments at end...

On Thu, Jun 16, 2022 at 02:47:40PM -0700, Davidlohr Bueso wrote:
> On Thu, 16 Jun 2022, Alison Schofield wrote:
> >I'm headed in this direction -
>
> I like these interfaces, btw.
>
> >cxl list --media-errors -m mem1
> > lists media errors for requested memdev
>
> But in this patchset you're only listing for persistent configurations.
> So if there is a volatile partion, or the whole device is volatile,
> this would not consider that.
>
> So unless I'm missing something, we need to consider ram_range as well.
>
> >cxl list --media-errors -r region#
> > lists region errors with HPA addresses
> > (So here cxl tool will collect the poison for all the regions
> > memdevs and do the DPA to HPA translation)
>
> I was indeed thinking along these lines. But similar to the above,
> the region driver also has plans to enumarate volatile regions
> configured by BIOS.
>
> >
> >To answer your question, I wasn't thinking of limiting
> >the range within the memdev, but certainly could. And if we were
> >taking in ranges, those ranges would need to be checked.
>
> My question was originally considering poisoning only within pmem DPA
> ranges, but now I'm wondering if all this also applies equally to volatile
> parts as well... Reading the spec I interpret both, but reading the
> T3 Memory Device Software Guide '2.13.19' it only mentions persistent
> capacity.
>
> >
> >$cxl list --media-errors -m mem1 --range-start= --range-end|len=
>
> I figure this kind of like the above with regions being very arbitrary
> and dynamic.
>
> >Now, if I left the sysfs interface as is, the driver will read the
> >entire poison list for the memdev and then cxl tool will filter it
> >for the range requested.
> >
> >Or, maybe we should implement in libcxl (not sysfs), with memdev and
> >range options and only collect from the device the range requested.
>
> I wonder if the latter may be the better option considering that always
> scanning the entire memdev would cause unnecessary media scan wait times,
> specially for large capacities.

This is not a Media Scan. This is only reading the existing Poison List.

>
> Thanks,
> Davidlohr

2022-06-16 22:59:02

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

On Thu, 16 Jun 2022, Alison Schofield wrote:
>> I wonder if the latter may be the better option considering that always
>> scanning the entire memdev would cause unnecessary media scan wait times,
>> specially for large capacities.
>
>This is not a Media Scan. This is only reading the existing Poison List.

Right, but I was thinking we'd want a input similar interface passing the
desidered range.

Thanks,
Davidlohr

2022-06-16 23:33:30

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

On Thu, 16 Jun 2022, Alison Schofield wrote:

>cxl list --media-errors -m mem1
> lists media errors for requested memdev
>
>cxl list --media-errors -r region#

A quick question on the tooling front: the above goes nicely with
cxl-list, but what about the rest of the poisoning cmds? Do you have
anything in mind? Do we want something specific for media and poison
management instead? Ie:

cxl media --list-errors <params>
cxl media --inject-errors <params>
cxl media --clear-errors <params>

Thanks,
Davidlohr

2022-06-16 23:35:16

by Alison Schofield

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

On Thu, Jun 16, 2022 at 03:45:25PM -0700, Davidlohr Bueso wrote:
> On Thu, 16 Jun 2022, Alison Schofield wrote:
>
> >cxl list --media-errors -m mem1
> > lists media errors for requested memdev
> >
> >cxl list --media-errors -r region#
>
> A quick question on the tooling front: the above goes nicely with
> cxl-list, but what about the rest of the poisoning cmds? Do you have
> anything in mind? Do we want something specific for media and poison
> management instead? Ie:
>
> cxl media --list-errors <params>
Not clear how this one differs. Seems like we can get any piece of
the list w cxl list.

> cxl media --inject-errors <params>
> cxl media --clear-errors <params>
For inject/clear I'd probably start w what ndctl does today.
ndctl inject−error <namespace> [<options>]
where option -d --uninject performs the clear.

>
> Thanks,
> Davidlohr

2022-06-16 23:55:10

by Verma, Vishal L

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

On Thu, 2022-06-16 at 16:15 -0700, Alison Schofield wrote:
> On Thu, Jun 16, 2022 at 03:45:25PM -0700, Davidlohr Bueso wrote:
> > On Thu, 16 Jun 2022, Alison Schofield wrote:
> >
> > > cxl list --media-errors -m mem1
> > >         lists media errors for requested memdev
> > >
> > > cxl list --media-errors -r region#
> >
> > A quick question on the tooling front: the above goes nicely with
> > cxl-list, but what about the rest of the poisoning cmds? Do you
> > have
> > anything in mind? Do we want something specific for media and
> > poison
> > management instead? Ie:
> >
> > cxl media --list-errors <params>
> Not clear how this one differs. Seems like we can get any piece of
> the list w cxl list.
>
> > cxl media --inject-errors <params>
> > cxl media --clear-errors <params>
> For inject/clear I'd probably start w what ndctl does today.
> ndctl inject−error  <namespace> [<options>]
> where option -d --uninject performs the clear.

Yeah agreed with Alison that cxl inject-error <options> sounds good.
Generally speaking, we've tried to avoid the 'sub-command of a
subcommand' situation - such as 'cxl <media> <sub-sub-command>
<options>', instead prefering 'cxl <action-object> <options>'.

>
> >
> > Thanks,
> > Davidlohr

2022-06-17 00:58:03

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

On Thu, 16 Jun 2022, Verma, Vishal L wrote:
>Yeah agreed with Alison that cxl inject-error <options> sounds good.

I agree as well, that's a much nicer interface.

Thanks,
Davidlohr

2022-06-17 13:11:33

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

On Tue, 14 Jun 2022 17:10:27 -0700
[email protected] wrote:

> From: Alison Schofield <[email protected]>
>
> CXL devices that support persistent memory maintain a list of locations
> that are poisoned or result in poison if the addresses are accessed by
> the host.
>
> Per the spec (CXL 2.0 8.2.8.5.4.1), the device returns this Poison
> list as a set of Media Error Records that include the source of the
> error, the starting device physical address and length. The length is
> the number of adjacent DPAs in the record and is in units of 64 bytes.
>
> Retrieve the list and log each Media Error Record as a trace event of
> type cxl_poison_list.
>
> Signed-off-by: Alison Schofield <[email protected]>
> ---

> +int cxl_mem_get_poison_list(struct device *dev)
> +{
> + struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> + struct cxl_mbox_poison_payload_out *po;
> + struct cxl_mbox_poison_payload_in pi;
> + int nr_records = 0;
> + int rc, i;
> +
> + if (range_len(&cxlds->pmem_range)) {
> + pi.offset = cpu_to_le64(cxlds->pmem_range.start);
> + pi.length = cpu_to_le64(range_len(&cxlds->pmem_range));
> + } else {
> + return -ENXIO;
> + }
> +
> + po = kvmalloc(cxlds->payload_size, GFP_KERNEL);
> + if (!po)
> + return -ENOMEM;
> +
> + do {
> + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_POISON, &pi,
> + sizeof(pi), po, cxlds->payload_size);
> + if (rc)
> + goto out;
> +
> + if (po->flags & CXL_POISON_FLAG_OVERFLOW) {
> + time64_t o_time = le64_to_cpu(po->overflow_timestamp);
> +
> + dev_err(dev, "Poison list overflow at %ptTs UTC\n",
> + &o_time);
> + rc = -ENXIO;
> + goto out;
> + }
> +
> + if (po->flags & CXL_POISON_FLAG_SCANNING) {
> + dev_err(dev, "Scan Media in Progress\n");
> + rc = -EBUSY;
> + goto out;
> + }
> +
> + for (i = 0; i < le16_to_cpu(po->count); i++) {
> + u64 addr = le64_to_cpu(po->record[i].address);
> + u32 len = le32_to_cpu(po->record[i].length);
> + int source = FIELD_GET(CXL_POISON_SOURCE_MASK, addr);
> +
> + if (!CXL_POISON_SOURCE_VALID(source)) {
> + dev_dbg(dev, "Invalid poison source %d",
> + source);
> + source = CXL_POISON_SOURCE_INVALID;
> + }
> +
> + trace_cxl_poison_list(dev, source, addr, len);
> + }
> +
> + /* Protect against an uncleared _FLAG_MORE */
> + nr_records = nr_records + le16_to_cpu(po->count);
> + if (nr_records >= cxlds->poison_max)

If this happens and _FLAG_MORE is set (it will occur anyway currently
if we happen to have poison_max records - I hit this in QEMU because
until now default of poison_max == 0)
then we should spit out an error message as I think that means the
hardware is broken.


> + goto out;
> +
> + } while (po->flags & CXL_POISON_FLAG_MORE);
> +
> +out:
> + kvfree(po);
> + return rc;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_poison_list, CXL);
> +
> struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
> {
> struct cxl_dev_state *cxlds;

2022-06-17 14:20:09

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

On Tue, 14 Jun 2022 17:10:27 -0700
[email protected] wrote:

> From: Alison Schofield <[email protected]>
>
> CXL devices that support persistent memory maintain a list of locations
> that are poisoned or result in poison if the addresses are accessed by
> the host.
>
> Per the spec (CXL 2.0 8.2.8.5.4.1), the device returns this Poison
> list as a set of Media Error Records that include the source of the
> error, the starting device physical address and length. The length is
> the number of adjacent DPAs in the record and is in units of 64 bytes.
>
> Retrieve the list and log each Media Error Record as a trace event of
> type cxl_poison_list.
>
> Signed-off-by: Alison Schofield <[email protected]>

A few more things inline.

Otherwise, can confirm it works with some hack QEMU code.
I'll tidy that up and post soon.

> +int cxl_mem_get_poison_list(struct device *dev)
> +{
> + struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> + struct cxl_mbox_poison_payload_out *po;
> + struct cxl_mbox_poison_payload_in pi;
> + int nr_records = 0;
> + int rc, i;
> +
> + if (range_len(&cxlds->pmem_range)) {
> + pi.offset = cpu_to_le64(cxlds->pmem_range.start);
> + pi.length = cpu_to_le64(range_len(&cxlds->pmem_range));
> + } else {
> + return -ENXIO;
> + }
> +
> + po = kvmalloc(cxlds->payload_size, GFP_KERNEL);
> + if (!po)
> + return -ENOMEM;
> +
> + do {
> + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_POISON, &pi,
> + sizeof(pi), po, cxlds->payload_size);
> + if (rc)
> + goto out;
> +
> + if (po->flags & CXL_POISON_FLAG_OVERFLOW) {
> + time64_t o_time = le64_to_cpu(po->overflow_timestamp);
> +
> + dev_err(dev, "Poison list overflow at %ptTs UTC\n",
> + &o_time);
> + rc = -ENXIO;
> + goto out;
> + }
> +
> + if (po->flags & CXL_POISON_FLAG_SCANNING) {
> + dev_err(dev, "Scan Media in Progress\n");
> + rc = -EBUSY;
> + goto out;
> + }
> +
> + for (i = 0; i < le16_to_cpu(po->count); i++) {
> + u64 addr = le64_to_cpu(po->record[i].address);

> + u32 len = le32_to_cpu(po->record[i].length);


> + int source = FIELD_GET(CXL_POISON_SOURCE_MASK, addr);
> +
> + if (!CXL_POISON_SOURCE_VALID(source)) {
> + dev_dbg(dev, "Invalid poison source %d",
> + source);
> + source = CXL_POISON_SOURCE_INVALID;
> + }
> +
> + trace_cxl_poison_list(dev, source, addr, len);

Need to mask off the lower 6 bits of addr as they contain the source
+ a few reserved bits.

I was confused how you were geting better than 64 byte precision in your
example.

> + }
> +
> + /* Protect against an uncleared _FLAG_MORE */
> + nr_records = nr_records + le16_to_cpu(po->count);
> + if (nr_records >= cxlds->poison_max)
> + goto out;
> +
> + } while (po->flags & CXL_POISON_FLAG_MORE);
So.. A conundrum here. What happens if:

1. We get an error mid way through a set of multiple reads
(something intermittent - maybe a software issue)
2. We will drop out of here fine and report the error.
3. We run this function again.

It will (I think) currently pick up where we left off, but we have
no way of knowing that as there isn't a 'total records' count or
any other form of index in the output payload.

So, software solutions I think should work (though may warrant a note
to be added to the spec).

1. Read whole thing twice. First time is just to ensure we get
to the end and flush out any prior half done reads.
2. Issue a read for a different region (perhaps length 0) first
and assume everything starts from scratch when we go back to
this region.

Jonathan



> +
> +out:
> + kvfree(po);
> + return rc;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_poison_list, CXL);
> +
> struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
> {
> struct cxl_dev_state *cxlds;

2022-06-17 16:27:01

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH 1/3] trace, cxl: Introduce a TRACE_EVENT for CXL Poison Records

On Tue, 14 Jun 2022 17:10:26 -0700
[email protected] wrote:

> From: Alison Schofield <[email protected]>
>
> Add a trace event for CXL Poison List Media Error Records that
> includes the starting DPA of the poison, the length, and the
> the source of the poison.
>
> This trace event will be used by the CXL_MEM driver to log the
> Media Errors returned by the GET_POISON_LIST Mailbox command.
>
> Signed-off-by: Alison Schofield <[email protected]>

Some comments on how this is used than the actual trace point for which

Reviewed-by: Jonathan Cameron <[email protected]>

The patch set currently just pushes through the length as provided
in the CXL record. That is in units of 64 byte cachelines so I'd suggest
the result will be more readable if we multiply it up to be in bytes.

Also the address should be masked to the same length, but I mentioned that
at the caller.

Jonathan


> ---
> include/trace/events/cxl.h | 60 ++++++++++++++++++++++++++++++++++++++
> 1 file changed, 60 insertions(+)
> create mode 100644 include/trace/events/cxl.h
>
> diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> new file mode 100644
> index 000000000000..17e707c3817e
> --- /dev/null
> +++ b/include/trace/events/cxl.h
> @@ -0,0 +1,60 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM cxl
> +
> +#if !defined(_CXL_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _CXL_TRACE_H
> +
> +#include <linux/tracepoint.h>
> +
> +TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_UNKNOWN);
> +TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_INTERNAL);
> +TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_EXTERNAL);
> +TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_INJECTED);
> +TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_VENDOR);
> +TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_INVALID);
> +
> +#define show_poison_source(source) \
> + __print_symbolic(source, \
> + {CXL_POISON_SOURCE_UNKNOWN, "UNKNOWN"}, \
> + {CXL_POISON_SOURCE_EXTERNAL, "EXTERNAL"}, \
> + {CXL_POISON_SOURCE_INTERNAL, "INTERNAL"}, \
> + {CXL_POISON_SOURCE_INJECTED, "INJECTED"}, \
> + {CXL_POISON_SOURCE_VENDOR, "VENDOR"}, \
> + {CXL_POISON_SOURCE_INVALID, "INVALID"})
> +
> +TRACE_EVENT(cxl_poison_list,
> +
> + TP_PROTO(struct device *dev,
> + int source,
> + unsigned long start,
> + unsigned int length),
> +
> + TP_ARGS(dev, source, start, length),
> +
> + TP_STRUCT__entry(
> + __string(name, dev_name(dev))
> + __field(int, source)
> + __field(u64, start)
> + __field(u32, length)
> + ),
> +
> + TP_fast_assign(
> + __assign_str(name, dev_name(dev));
> + __entry->source = source;
> + __entry->start = start;
> + __entry->length = length;
> + ),
> +
> + TP_printk("dev %s source %s start %llu length %u",
> + __get_str(name),
> + show_poison_source(__entry->source),
> + __entry->start,
> + __entry->length)
> +);
> +#endif /* _CXL_TRACE_H */
> +
> +/* This part must be outside protection */
> +#undef TRACE_INCLUDE_FILE
> +#define TRACE_INCLUDE_FILE cxl
> +#include <trace/define_trace.h>

2022-06-17 17:09:44

by Alison Schofield

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

On Fri, Jun 17, 2022 at 07:05:08AM -0700, Jonathan Cameron wrote:
> On Tue, 14 Jun 2022 17:10:27 -0700
> [email protected] wrote:
>
> > From: Alison Schofield <[email protected]>
> >
> > CXL devices that support persistent memory maintain a list of locations
> > that are poisoned or result in poison if the addresses are accessed by
> > the host.
> >
> > Per the spec (CXL 2.0 8.2.8.5.4.1), the device returns this Poison
> > list as a set of Media Error Records that include the source of the
> > error, the starting device physical address and length. The length is
> > the number of adjacent DPAs in the record and is in units of 64 bytes.
> >
> > Retrieve the list and log each Media Error Record as a trace event of
> > type cxl_poison_list.
> >
> > Signed-off-by: Alison Schofield <[email protected]>
>
> A few more things inline.
>
> Otherwise, can confirm it works with some hack QEMU code.
> I'll tidy that up and post soon.
>
> > +int cxl_mem_get_poison_list(struct device *dev)
> > +{
snip
> > +
> > + trace_cxl_poison_list(dev, source, addr, len);
>
> Need to mask off the lower 6 bits of addr as they contain the source
> + a few reserved bits.
>
> I was confused how you were geting better than 64 byte precision in your
> example.
>
Ah...got it. Thanks!

> > + }
> > +
> > + /* Protect against an uncleared _FLAG_MORE */
> > + nr_records = nr_records + le16_to_cpu(po->count);
> > + if (nr_records >= cxlds->poison_max)
> > + goto out;
> > +
> > + } while (po->flags & CXL_POISON_FLAG_MORE);
> So.. A conundrum here. What happens if:
>
> 1. We get an error mid way through a set of multiple reads
> (something intermittent - maybe a software issue)
> 2. We will drop out of here fine and report the error.
> 3. We run this function again.
>
> It will (I think) currently pick up where we left off, but we have
> no way of knowing that as there isn't a 'total records' count or
> any other form of index in the output payload.

Yes. That is sad. I'm assume it's by design and CXL devices never
intended to keep any totals.

>
> So, software solutions I think should work (though may warrant a note
> to be added to the spec).
>
> 1. Read whole thing twice. First time is just to ensure we get
> to the end and flush out any prior half done reads.
> 2. Issue a read for a different region (perhaps length 0) first
> and assume everything starts from scratch when we go back to
> this region.

Can you tell me more about 2 ?

Also, Since posting this I have added protection to this path to ensure
only one reader of the poison list for this device. Like this:

if (!completion_done(&cxlds->read_poison_complete);
return -EBUSY;
wait_for_completion_interruptible(&cxlds->read_poison_complete);
...GET ALL THE POISON...
complete(&cxlds->read_poison_complete);

And will add the error message on that unexpected _FLAG_MORE too.

Alison
>
> Jonathan
>



> > +
> > +out:
> > + kvfree(po);
> > + return rc;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_poison_list, CXL);
> > +
> > struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
> > {
> > struct cxl_dev_state *cxlds;
>

2022-06-17 17:31:08

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

On Fri, 17 Jun 2022, Alison Schofield wrote:

>On Fri, Jun 17, 2022 at 07:05:08AM -0700, Jonathan Cameron wrote:
>> On Tue, 14 Jun 2022 17:10:27 -0700
>> [email protected] wrote:
>>
>> > From: Alison Schofield <[email protected]>
>> >
>> > CXL devices that support persistent memory maintain a list of locations
>> > that are poisoned or result in poison if the addresses are accessed by
>> > the host.
>> >
>> > Per the spec (CXL 2.0 8.2.8.5.4.1), the device returns this Poison
>> > list as a set of Media Error Records that include the source of the
>> > error, the starting device physical address and length. The length is
>> > the number of adjacent DPAs in the record and is in units of 64 bytes.
>> >
>> > Retrieve the list and log each Media Error Record as a trace event of
>> > type cxl_poison_list.
>> >
>> > Signed-off-by: Alison Schofield <[email protected]>
>>
>> A few more things inline.
>>
>> Otherwise, can confirm it works with some hack QEMU code.
>> I'll tidy that up and post soon.
>>
>> > +int cxl_mem_get_poison_list(struct device *dev)
>> > +{
>snip
>> > +
>> > + trace_cxl_poison_list(dev, source, addr, len);
>>
>> Need to mask off the lower 6 bits of addr as they contain the source
>> + a few reserved bits.
>>
>> I was confused how you were geting better than 64 byte precision in your
>> example.
>>
>Ah...got it. Thanks!
>
>> > + }
>> > +
>> > + /* Protect against an uncleared _FLAG_MORE */
>> > + nr_records = nr_records + le16_to_cpu(po->count);
>> > + if (nr_records >= cxlds->poison_max)
>> > + goto out;
>> > +
>> > + } while (po->flags & CXL_POISON_FLAG_MORE);
>> So.. A conundrum here. What happens if:
>>
>> 1. We get an error mid way through a set of multiple reads
>> (something intermittent - maybe a software issue)
>> 2. We will drop out of here fine and report the error.
>> 3. We run this function again.
>>
>> It will (I think) currently pick up where we left off, but we have
>> no way of knowing that as there isn't a 'total records' count or
>> any other form of index in the output payload.
>
>Yes. That is sad. I'm assume it's by design and CXL devices never
>intended to keep any totals.
>
>>
>> So, software solutions I think should work (though may warrant a note
>> to be added to the spec).
>>
>> 1. Read whole thing twice. First time is just to ensure we get
>> to the end and flush out any prior half done reads.
>> 2. Issue a read for a different region (perhaps length 0) first
>> and assume everything starts from scratch when we go back to
>> this region.
>
>Can you tell me more about 2 ?
>
>Also, Since posting this I have added protection to this path to ensure
>only one reader of the poison list for this device. Like this:

I don't think we should prevent multiple list reads. I would expect the
scenario Jonathan describes to be the uncommon case.

Thanks,
Davidlohr

>
>if (!completion_done(&cxlds->read_poison_complete);
> return -EBUSY;
>wait_for_completion_interruptible(&cxlds->read_poison_complete);
> ...GET ALL THE POISON...
>complete(&cxlds->read_poison_complete);
>
>And will add the error message on that unexpected _FLAG_MORE too.
>
>Alison
>>
>> Jonathan
>>
>
>
>
>> > +
>> > +out:
>> > + kvfree(po);
>> > + return rc;
>> > +}
>> > +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_poison_list, CXL);
>> > +
>> > struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
>> > {
>> > struct cxl_dev_state *cxlds;
>>

2022-06-17 18:03:54

by Dan Williams

[permalink] [raw]
Subject: RE: [PATCH 0/3] CXL Poison List Retrieval & Tracing

alison.schofield@ wrote:
> From: Alison Schofield <[email protected]>
>
> Introducing the first piece of support for CXL Media Errors,
> offering the ability to retrieve a devices poison list and
> store the returned error records as kernel trace events.
>
> The handling of the poison list is guided by the CXL 2.0 Spec
> Section 8.2.9.5.4.1. [1] The usage of Trace Events to store the
> Media Error records is a first look at the proposed handling
> of CXL ARS events.

Nice! Thanks for getting this started.

>
> Example command line usage:
>
> $ trace-cmd record -e cxl_poison_list
> $ echo 1 > /sys/bus/cxl/devices/mem1/get_poison

Perhaps call this 'trace_poison_cached', and later call the one that
does scan_media 'trace_poison'? Otherwise it is odd that an attribute
named "get" is write-only and returns nothing but an error code.


> $ trace-cmd report trace.dat
>
> cxl_poison_list: memdev: mem3 source EXTERNAL start 0x41 length 0x2

Perhaps just make the source the raw number value so there is no need to
go change the kernel if that reserved field grows more definitions, or
if userspace grows knowledge of what values 0x4, 0x5, and 0x6 mean for a
given device.

> cxl_poison_list: memdev: mem3 source INTERNAL start 0xc2 length 0x3

I wonder if this wants to eventually be unified into a common format
with the General Media Event record? The idea being that there will be
multiple sources and triggers for CXL subsystem trace events. Lets take
the scenario of poison being written to a device by a DMA agent. In that
case the device may fire a General Media Event with the Uncorrectable
indicator and the DPA. Userspace will want to distinguish that from the
Media Error record, but probably benefits from using similar event
parsing. The trace event could be modified with something like an
"origin" key to say whether the event was asynchronously generated by
the device, or synchronously requested by a given process. Something
like: "origin:<device|pid>".

> cxl_poison_list: memdev: mem3 source INJECTED start 0x183 length 0x4
> cxl_poison_list: memdev: mem3 source INVALID start 0x284 length 0x5
> cxl_poison_list: memdev: mem3 source VENDOR start 0x707 length 0x8

From a parsing perspective should these be consistent about the usage of
":" to delineate keys vs values? I.e.:

cxl_poison_list: memdev:mem3 source:VENDOR start:0x707 length:0x8

...that way if this format changes tooling will be prepared to tolerate
new keys injected at any position. I have not looked at how
libtraceevent identifies fields and backwards compatibility.

>
> [1]: https://www.computeexpresslink.org/download-the-specification
>
> Alison Schofield (3):
> trace, cxl: Introduce a TRACE_EVENT for CXL Poison Records
> cxl/mbox: Add GET_POISON_LIST mailbox command support
> cxl/core: Add sysfs attribute get_poison for list retrieval
>
> Documentation/ABI/testing/sysfs-bus-cxl | 13 +++++
> drivers/cxl/cxlmem.h | 43 ++++++++++++++
> include/trace/events/cxl.h | 60 ++++++++++++++++++++
> drivers/cxl/core/mbox.c | 75 +++++++++++++++++++++++++
> drivers/cxl/core/memdev.c | 32 +++++++++++
> 5 files changed, 223 insertions(+)
> create mode 100644 include/trace/events/cxl.h
>
>
> base-commit: 2263e9ed65887cc7c6e977f372596199d2c9f4af
> --
> 2.31.1
>

2022-06-17 18:13:14

by Dan Williams

[permalink] [raw]
Subject: RE: [PATCH 1/3] trace, cxl: Introduce a TRACE_EVENT for CXL Poison Records

alison.schofield@ wrote:
> From: Alison Schofield <[email protected]>
>
> Add a trace event for CXL Poison List Media Error Records that
> includes the starting DPA of the poison, the length, and the
> the source of the poison.
>
> This trace event will be used by the CXL_MEM driver to log the
> Media Errors returned by the GET_POISON_LIST Mailbox command.
>
> Signed-off-by: Alison Schofield <[email protected]>
> ---
> include/trace/events/cxl.h | 60 ++++++++++++++++++++++++++++++++++++++
> 1 file changed, 60 insertions(+)
> create mode 100644 include/trace/events/cxl.h
>
> diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> new file mode 100644
> index 000000000000..17e707c3817e
> --- /dev/null
> +++ b/include/trace/events/cxl.h
> @@ -0,0 +1,60 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM cxl
> +
> +#if !defined(_CXL_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _CXL_TRACE_H
> +
> +#include <linux/tracepoint.h>
> +
> +TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_UNKNOWN);
> +TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_INTERNAL);
> +TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_EXTERNAL);
> +TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_INJECTED);
> +TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_VENDOR);
> +TRACE_DEFINE_ENUM(CXL_POISON_SOURCE_INVALID);
> +
> +#define show_poison_source(source) \
> + __print_symbolic(source, \
> + {CXL_POISON_SOURCE_UNKNOWN, "UNKNOWN"}, \
> + {CXL_POISON_SOURCE_EXTERNAL, "EXTERNAL"}, \
> + {CXL_POISON_SOURCE_INTERNAL, "INTERNAL"}, \
> + {CXL_POISON_SOURCE_INJECTED, "INJECTED"}, \
> + {CXL_POISON_SOURCE_VENDOR, "VENDOR"}, \
> + {CXL_POISON_SOURCE_INVALID, "INVALID"})
> +
> +TRACE_EVENT(cxl_poison_list,
> +
> + TP_PROTO(struct device *dev,

The CXL memory device names need some decoding before other userspace
RAS tools might know what they are. So in addition to
dev_name(&cxlmd->dev) I think it would be useful to include
dev_name(cxlmd->dev.parent), i.e. the host PCI device name.

Other than that, looks good to me.

> + int source,
> + unsigned long start,
> + unsigned int length),
> +
> + TP_ARGS(dev, source, start, length),
> +
> + TP_STRUCT__entry(
> + __string(name, dev_name(dev))
> + __field(int, source)
> + __field(u64, start)
> + __field(u32, length)
> + ),
> +
> + TP_fast_assign(
> + __assign_str(name, dev_name(dev));
> + __entry->source = source;
> + __entry->start = start;
> + __entry->length = length;
> + ),
> +
> + TP_printk("dev %s source %s start %llu length %u",
> + __get_str(name),
> + show_poison_source(__entry->source),
> + __entry->start,
> + __entry->length)
> +);
> +#endif /* _CXL_TRACE_H */
> +
> +/* This part must be outside protection */
> +#undef TRACE_INCLUDE_FILE
> +#define TRACE_INCLUDE_FILE cxl
> +#include <trace/define_trace.h>
> --
> 2.31.1
>


2022-06-17 18:28:21

by Dan Williams

[permalink] [raw]
Subject: RE: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

alison.schofield@ wrote:
> From: Alison Schofield <[email protected]>
>
> CXL devices that support persistent memory maintain a list of locations
> that are poisoned or result in poison if the addresses are accessed by
> the host.
>
> Per the spec (CXL 2.0 8.2.8.5.4.1), the device returns this Poison
> list as a set of Media Error Records that include the source of the
> error, the starting device physical address and length. The length is
> the number of adjacent DPAs in the record and is in units of 64 bytes.
>
> Retrieve the list and log each Media Error Record as a trace event of
> type cxl_poison_list.
>
> Signed-off-by: Alison Schofield <[email protected]>
> ---
> drivers/cxl/cxlmem.h | 43 +++++++++++++++++++++++
> drivers/cxl/core/mbox.c | 75 +++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 118 insertions(+)
>
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 60d10ee1e7fc..29cf0459b44a 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -174,6 +174,7 @@ struct cxl_endpoint_dvsec_info {
> * (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
> * @lsa_size: Size of Label Storage Area
> * (CXL 2.0 8.2.9.5.1.1 Identify Memory Device)
> + * @poison_max_mer: maximum Media Error Records tracked in Poison List
> * @mbox_mutex: Mutex to synchronize mailbox access.
> * @firmware_version: Firmware version for the memory device.
> * @enabled_cmds: Hardware commands found enabled in CEL.
> @@ -204,6 +205,7 @@ struct cxl_dev_state {
>
> size_t payload_size;
> size_t lsa_size;
> + u32 poison_max;
> struct mutex mbox_mutex; /* Protects device mailbox and firmware */
> char firmware_version[0x10];
> DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
> @@ -317,6 +319,46 @@ struct cxl_mbox_set_partition_info {
>
> #define CXL_SET_PARTITION_IMMEDIATE_FLAG BIT(0)
>
> +struct cxl_mbox_poison_payload_in {
> + __le64 offset;
> + __le64 length;
> +} __packed;
> +
> +struct cxl_mbox_poison_payload_out {
> + u8 flags;
> + u8 rsvd1;
> + __le64 overflow_timestamp;
> + __le16 count;
> + u8 rsvd2[0x14];
> + struct cxl_poison_record {
> + __le64 address;
> + __le32 length;
> + __le32 rsvd;
> + } __packed record[];
> +} __packed;
> +
> +/* CXL 8.2.9.5.4.1 Get Poison List: payload out flags: */
> +#define CXL_POISON_FLAG_MORE BIT(0)
> +#define CXL_POISON_FLAG_OVERFLOW BIT(1)
> +#define CXL_POISON_FLAG_SCANNING BIT(2)
> +
> +/* CXL 8.2.9.5.4.1 Get Poison List: Error is encoded in record.address[2:0] */
> +#define CXL_POISON_SOURCE_MASK GENMASK(2, 0)
> +#define CXL_POISON_SOURCE_UNKNOWN 0
> +#define CXL_POISON_SOURCE_EXTERNAL 1
> +#define CXL_POISON_SOURCE_INTERNAL 2
> +#define CXL_POISON_SOURCE_INJECTED 3
> +#define CXL_POISON_SOURCE_VENDOR 7
> +
> +/* Software define */
> +#define CXL_POISON_SOURCE_INVALID 99
> +#define CXL_POISON_SOURCE_VALID(x) \
> + (((x) == CXL_POISON_SOURCE_UNKNOWN) || \
> + ((x) == CXL_POISON_SOURCE_EXTERNAL) || \
> + ((x) == CXL_POISON_SOURCE_INTERNAL) || \
> + ((x) == CXL_POISON_SOURCE_INJECTED) || \
> + ((x) == CXL_POISON_SOURCE_VENDOR))
> +
> /**
> * struct cxl_mem_command - Driver representation of a memory device command
> * @info: Command information as it exists for the UAPI
> @@ -351,6 +393,7 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
> struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
> void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
> void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
> +int cxl_mem_get_poison_list(struct device *dev);
> #ifdef CONFIG_CXL_SUSPEND
> void cxl_mem_active_inc(void);
> void cxl_mem_active_dec(void);
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 54f434733b56..c10c7020ebc2 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -9,6 +9,9 @@
>
> #include "core.h"
>
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/cxl.h>
> +
> static bool cxl_raw_allow_all;
>
> /**
> @@ -755,6 +758,7 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
> {
> /* See CXL 2.0 Table 175 Identify Memory Device Output Payload */
> struct cxl_mbox_identify id;
> + __le32 val = 0;
> int rc;
>
> rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_IDENTIFY, NULL, 0, &id,
> @@ -783,6 +787,9 @@ int cxl_dev_state_identify(struct cxl_dev_state *cxlds)
> cxlds->lsa_size = le32_to_cpu(id.lsa_size);
> memcpy(cxlds->firmware_version, id.fw_revision, sizeof(id.fw_revision));
>
> + memcpy(&val, id.poison_list_max_mer, 3);
> + cxlds->poison_max = le32_to_cpu(val);
> +
> return 0;
> }
> EXPORT_SYMBOL_NS_GPL(cxl_dev_state_identify, CXL);
> @@ -826,6 +833,74 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, CXL);
>
> +int cxl_mem_get_poison_list(struct device *dev)
> +{
> + struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> + struct cxl_mbox_poison_payload_out *po;
> + struct cxl_mbox_poison_payload_in pi;
> + int nr_records = 0;
> + int rc, i;
> +
> + if (range_len(&cxlds->pmem_range)) {
> + pi.offset = cpu_to_le64(cxlds->pmem_range.start);
> + pi.length = cpu_to_le64(range_len(&cxlds->pmem_range));
> + } else {
> + return -ENXIO;
> + }
> +
> + po = kvmalloc(cxlds->payload_size, GFP_KERNEL);
> + if (!po)
> + return -ENOMEM;
> +
> + do {
> + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_POISON, &pi,
> + sizeof(pi), po, cxlds->payload_size);
> + if (rc)
> + goto out;

In this case, no need for a 'goto' when 'break' will do.

> +
> + if (po->flags & CXL_POISON_FLAG_OVERFLOW) {
> + time64_t o_time = le64_to_cpu(po->overflow_timestamp);
> +
> + dev_err(dev, "Poison list overflow at %ptTs UTC\n",
> + &o_time);

This should be just another event type, not an error message.

> + rc = -ENXIO;

No need to error out, the media error records are still valid.

> + goto out;
> + }
> +
> + if (po->flags & CXL_POISON_FLAG_SCANNING) {
> + dev_err(dev, "Scan Media in Progress\n");

This isn't an error condition and it should be something the kernel is
mediating in the first instance. For example if userspace did:

echo 1 > trace_poison &
echo 1 > trace_poison_cached

I would expect the second echo to just block awaiting the completion of
the scan media request. I.e. just prevent the possibility of these
commands colliding outside of CONFIG_CXL_MEM_RAW_COMMANDS=y shenanigans,
in which case userspace gets to keep the pieces.

> + rc = -EBUSY;
> + goto out;
> + }
> +
> + for (i = 0; i < le16_to_cpu(po->count); i++) {
> + u64 addr = le64_to_cpu(po->record[i].address);
> + u32 len = le32_to_cpu(po->record[i].length);
> + int source = FIELD_GET(CXL_POISON_SOURCE_MASK, addr);
> +
> + if (!CXL_POISON_SOURCE_VALID(source)) {
> + dev_dbg(dev, "Invalid poison source %d",
> + source);

Per above, just emit the raw field value and leave parsing values to
userspace.

> + source = CXL_POISON_SOURCE_INVALID;
> + }
> +
> + trace_cxl_poison_list(dev, source, addr, len);
> + }
> +
> + /* Protect against an uncleared _FLAG_MORE */
> + nr_records = nr_records + le16_to_cpu(po->count);
> + if (nr_records >= cxlds->poison_max)
> + goto out;

This also feels like something that should have a Linux max as well,
something like:

"cxlds->poison_max = min(identify->poison_max, CXL_LIST_POISON_MAX)"

...where CXL_LIST_POISON_MAX is a "reasonable" maximum for a cache of
hardware media poison errors like 1024.

> +
> + } while (po->flags & CXL_POISON_FLAG_MORE);
> +
> +out:
> + kvfree(po);
> + return rc;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_poison_list, CXL);
> +
> struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
> {
> struct cxl_dev_state *cxlds;
> --
> 2.31.1
>

2022-06-17 19:06:36

by Dan Williams

[permalink] [raw]
Subject: RE: [PATCH 3/3] cxl/core: Add sysfs attribute get_poison for list retrieval

alison.schofield@ wrote:
> From: Alison Schofield <[email protected]>
>
> The sysfs attribute, get_poison, allows user space to request the
> retrieval of a CXL devices poison list for its persistent memory.

If the device supports get poison list for volatile memory, just grab
that too. With the "to be released soon" region patches userspace can
trivially translate DPA addresses to media type.

>
> From Documentation/ABI/.../sysfs-bus-cxl
> (WO) When a '1' is written to this attribute the memdev
> driver retrieves the poison list from the device. The list
> includes addresses that are poisoned or would result in
> poison if accessed, and the source of the poison. This
> attribute is only visible for devices supporting the
> capability. The retrieved errors are logged as kernel
> trace events with the label: cxl_poison_list.
>
> Signed-off-by: Alison Schofield <[email protected]>
> ---
> Documentation/ABI/testing/sysfs-bus-cxl | 13 ++++++++++
> drivers/cxl/core/memdev.c | 32 +++++++++++++++++++++++++
> 2 files changed, 45 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 7c2b846521f3..9d0c3988fdd2 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -163,3 +163,16 @@ Description:
> memory (type-3). The 'target_type' attribute indicates the
> current setting which may dynamically change based on what
> memory regions are activated in this decode hierarchy.
> +
> +What: /sys/bus/cxl/devices/memX/get_poison
> +Date: June, 2022
> +KernelVersion: v5.20
> +Contact: [email protected]
> +Description:
> + (WO) When a '1' is written to this attribute the memdev
> + driver retrieves the poison list from the device. The list
> + includes addresses that are poisoned or would result in
> + poison if accessed, and the source of the poison. This
> + attribute is only visible for devices supporting the
> + capability. The retrieved errors are logged as kernel
> + trace events with the label: cxl_poison_list.
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index f7cdcd33504a..5ef9ffaa934a 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -106,12 +106,34 @@ static ssize_t numa_node_show(struct device *dev, struct device_attribute *attr,
> }
> static DEVICE_ATTR_RO(numa_node);
>
> +static ssize_t get_poison_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +
> +{
> + int rc;
> +
> + if (!sysfs_streq(buf, "1")) {

kstrtobool()?

> + dev_err(dev, "%s: unknown value: %s\n", attr->attr.name, buf);

dev_err() is overkill for sysfs errors. dev_dbg() can be nice for errors
that trigger deep within the kernel in response to a sysfs write. In
this case EINVAL return is sufficient.

> + return -EINVAL;
> + }
> +
> + rc = cxl_mem_get_poison_list(dev);
> + if (rc) {
> + dev_err(dev, "Failed to retrieve poison list %d\n", rc);

Too chatty, dev_dbg() or delete.

> + return rc;
> + }
> + return len;
> +}
> +static DEVICE_ATTR_WO(get_poison);
> +
> static struct attribute *cxl_memdev_attributes[] = {
> &dev_attr_serial.attr,
> &dev_attr_firmware_version.attr,
> &dev_attr_payload_max.attr,
> &dev_attr_label_storage_size.attr,
> &dev_attr_numa_node.attr,
> + &dev_attr_get_poison.attr,
> NULL,
> };
>
> @@ -130,6 +152,16 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
> {
> if (!IS_ENABLED(CONFIG_NUMA) && a == &dev_attr_numa_node.attr)
> return 0;
> +
> + if (a == &dev_attr_get_poison.attr) {
> + struct device *dev = container_of(kobj, struct device, kobj);

Use the kobj_to_dev() helper.

> + struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +
> + if (!test_bit(CXL_MEM_COMMAND_ID_GET_POISON,
> + cxlds->enabled_cmds))
> + return 0;
> + }
> return a->mode;
> }
>
> --
> 2.31.1
>

2022-06-17 19:07:14

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

Alison Schofield wrote:
> On Thu, Jun 16, 2022 at 12:43:34PM -0700, Davidlohr Bueso wrote:
> > On Tue, 14 Jun 2022, [email protected] wrote:
> >
> > >From: Alison Schofield <[email protected]>
> > >
> > >CXL devices that support persistent memory maintain a list of locations
> > >that are poisoned or result in poison if the addresses are accessed by
> > >the host.
> > >
> > >Per the spec (CXL 2.0 8.2.8.5.4.1), the device returns this Poison
> > >list as a set of Media Error Records that include the source of the
> > >error, the starting device physical address and length. The length is
> > >the number of adjacent DPAs in the record and is in units of 64 bytes.
> > >
> > >Retrieve the list and log each Media Error Record as a trace event of
> > >type cxl_poison_list.
> > >
> > >Signed-off-by: Alison Schofield <[email protected]>
> > >---
> > > drivers/cxl/cxlmem.h | 43 +++++++++++++++++++++++
> > > drivers/cxl/core/mbox.c | 75 +++++++++++++++++++++++++++++++++++++++++
> > > 2 files changed, 118 insertions(+)
> > >
> snip
>
> > >+int cxl_mem_get_poison_list(struct device *dev)
> > >+{
> > >+ struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > >+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > >+ struct cxl_mbox_poison_payload_out *po;
> > >+ struct cxl_mbox_poison_payload_in pi;
> > >+ int nr_records = 0;
> > >+ int rc, i;
> > >+
> > >+ if (range_len(&cxlds->pmem_range)) {
> > >+ pi.offset = cpu_to_le64(cxlds->pmem_range.start);
> > >+ pi.length = cpu_to_le64(range_len(&cxlds->pmem_range));
>
> First off - you stopped at a bug here - that pi.length needs to be
> in units of 64 bytes.
> >
> > Do you ever see this changing to not always use the full pmem DPA range
> > but allow arbitrary ones? I also assume this is the reason why you don't
> > check the range vs cxlds->ram_range to prevent any overlaps, no?
> >
> > Thanks,
> > Davidlohr
>
> David - Great question!
>
> I'm headed in this direction -
>
> cxl list --media-errors -m mem1
> lists media errors for requested memdev
>
> cxl list --media-errors -r region#
> lists region errors with HPA addresses
> (So here cxl tool will collect the poison for all the regions
> memdevs and do the DPA to HPA translation)
>
> To answer your question, I wasn't thinking of limiting
> the range within the memdev, but certainly could. And if we were
> taking in ranges, those ranges would need to be checked.
>
> $cxl list --media-errors -m mem1 --range-start= --range-end|len=
>
> Now, if I left the sysfs inteface as is, the driver will read the
> entire poison list for the memdev and then cxl tool will filter it
> for the range requested.
>
> Or, maybe we should implement in libcxl (not sysfs), with memdev and
> range options and only collect from the device the range requested.
>
> Either one looks the same to the cxl tool user, but limiting the
> range we send to the device would certainly cut down on unwanted
> records being logged, retrieved, and examined.
>
> I'd like to hear more from you and other community members.

There is some history here that is relevant to this design. CXL Get
Poison List builds on lessons learned from the ACPI "Address Range
Scrub" mechanism that was deployed for ACPI described persistent memory
platform (See ACPI 6.4 9.20.7.2 "Address Range Scrubbing (ARS)
Overview"). In that case there was no expectation that the device
maintained a cached and coherent (with incoming poison writes) copy of
the media error list. CXL Get Poison List in comparison is meant to
obviate the need to perform Scan Media in most scenarios, and it is
lightweight enough that userspace need not have a mechanism to request
errors by range, in my opinion.

One of the design warts of drivers/acpi/nfit/ that I want to get away
from in the case of drivers/cxl/ is snooping the equivalent of ARS
command results to populate a kernel list of poison addresses and
instead put the onus on userspace to collect DPA events and optionally
inform the kernel of the HPA impacts. For example, DAX filesystems will
soon have the ability to do something useful with poison notifications
[1], but that support is limited to synchronously consumed poison
flagged by memory_failure(). When the cxl tool translates the poison
list to HPA it needs an ABI to turn around and notify the filesystem
about which blocks got clobbered.

[1]: https://lore.kernel.org/all/[email protected]/

2022-06-17 19:34:31

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

Jonathan Cameron wrote:
> On Tue, 14 Jun 2022 17:10:27 -0700
> [email protected] wrote:
>
> > From: Alison Schofield <[email protected]>
> >
> > CXL devices that support persistent memory maintain a list of locations
> > that are poisoned or result in poison if the addresses are accessed by
> > the host.
> >
> > Per the spec (CXL 2.0 8.2.8.5.4.1), the device returns this Poison
> > list as a set of Media Error Records that include the source of the
> > error, the starting device physical address and length. The length is
> > the number of adjacent DPAs in the record and is in units of 64 bytes.
> >
> > Retrieve the list and log each Media Error Record as a trace event of
> > type cxl_poison_list.
> >
> > Signed-off-by: Alison Schofield <[email protected]>
>
> A few more things inline.
>
> Otherwise, can confirm it works with some hack QEMU code.
> I'll tidy that up and post soon.
>
> > +int cxl_mem_get_poison_list(struct device *dev)
> > +{
> > + struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > + struct cxl_mbox_poison_payload_out *po;
> > + struct cxl_mbox_poison_payload_in pi;
> > + int nr_records = 0;
> > + int rc, i;
> > +
> > + if (range_len(&cxlds->pmem_range)) {
> > + pi.offset = cpu_to_le64(cxlds->pmem_range.start);
> > + pi.length = cpu_to_le64(range_len(&cxlds->pmem_range));
> > + } else {
> > + return -ENXIO;
> > + }
> > +
> > + po = kvmalloc(cxlds->payload_size, GFP_KERNEL);
> > + if (!po)
> > + return -ENOMEM;
> > +
> > + do {
> > + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_POISON, &pi,
> > + sizeof(pi), po, cxlds->payload_size);
> > + if (rc)
> > + goto out;
> > +
> > + if (po->flags & CXL_POISON_FLAG_OVERFLOW) {
> > + time64_t o_time = le64_to_cpu(po->overflow_timestamp);
> > +
> > + dev_err(dev, "Poison list overflow at %ptTs UTC\n",
> > + &o_time);
> > + rc = -ENXIO;
> > + goto out;
> > + }
> > +
> > + if (po->flags & CXL_POISON_FLAG_SCANNING) {
> > + dev_err(dev, "Scan Media in Progress\n");
> > + rc = -EBUSY;
> > + goto out;
> > + }
> > +
> > + for (i = 0; i < le16_to_cpu(po->count); i++) {
> > + u64 addr = le64_to_cpu(po->record[i].address);
>
> > + u32 len = le32_to_cpu(po->record[i].length);
>
>
> > + int source = FIELD_GET(CXL_POISON_SOURCE_MASK, addr);
> > +
> > + if (!CXL_POISON_SOURCE_VALID(source)) {
> > + dev_dbg(dev, "Invalid poison source %d",
> > + source);
> > + source = CXL_POISON_SOURCE_INVALID;
> > + }
> > +
> > + trace_cxl_poison_list(dev, source, addr, len);
>
> Need to mask off the lower 6 bits of addr as they contain the source
> + a few reserved bits.
>
> I was confused how you were geting better than 64 byte precision in your
> example.
>
> > + }
> > +
> > + /* Protect against an uncleared _FLAG_MORE */
> > + nr_records = nr_records + le16_to_cpu(po->count);
> > + if (nr_records >= cxlds->poison_max)
> > + goto out;
> > +
> > + } while (po->flags & CXL_POISON_FLAG_MORE);
> So.. A conundrum here. What happens if:
>
> 1. We get an error mid way through a set of multiple reads
> (something intermittent - maybe a software issue)
> 2. We will drop out of here fine and report the error.
> 3. We run this function again.
>
> It will (I think) currently pick up where we left off, but we have
> no way of knowing that as there isn't a 'total records' count or
> any other form of index in the output payload.
>
> So, software solutions I think should work (though may warrant a note
> to be added to the spec).
>
> 1. Read whole thing twice. First time is just to ensure we get
> to the end and flush out any prior half done reads.
> 2. Issue a read for a different region (perhaps length 0) first
> and assume everything starts from scratch when we go back to
> this region.

It would be nice if this was codified as *the* way to reset the
retrieval, but I worry that neither length==0 or length==1 can be used
for this purpose since the "more" bit is attached to the last passed in
*range*, not request. I.e. spec seems to allow for overlapping
retrievals, although I doubt any implementation gets that fancy.

I think it is sufficient to just include the "more" flag in the trace
event and if userspace suspects that it is getting "more" results from a
previous run it can reissue the scan. This is another reason that the
trace event should include the pid of the process that triggered the
results so it can delineate re-requests. Otherwise, the poison list
cache is opportunistic so I am not sure that missing records in this
corner case is fatal.

2022-06-17 19:50:16

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

Alison Schofield wrote:
> On Fri, Jun 17, 2022 at 07:05:08AM -0700, Jonathan Cameron wrote:
> > On Tue, 14 Jun 2022 17:10:27 -0700
> > [email protected] wrote:
> >
> > > From: Alison Schofield <[email protected]>
> > >
> > > CXL devices that support persistent memory maintain a list of locations
> > > that are poisoned or result in poison if the addresses are accessed by
> > > the host.
> > >
> > > Per the spec (CXL 2.0 8.2.8.5.4.1), the device returns this Poison
> > > list as a set of Media Error Records that include the source of the
> > > error, the starting device physical address and length. The length is
> > > the number of adjacent DPAs in the record and is in units of 64 bytes.
> > >
> > > Retrieve the list and log each Media Error Record as a trace event of
> > > type cxl_poison_list.
> > >
> > > Signed-off-by: Alison Schofield <[email protected]>
> >
> > A few more things inline.
> >
> > Otherwise, can confirm it works with some hack QEMU code.
> > I'll tidy that up and post soon.
> >
> > > +int cxl_mem_get_poison_list(struct device *dev)
> > > +{
> snip
> > > +
> > > + trace_cxl_poison_list(dev, source, addr, len);
> >
> > Need to mask off the lower 6 bits of addr as they contain the source
> > + a few reserved bits.
> >
> > I was confused how you were geting better than 64 byte precision in your
> > example.
> >
> Ah...got it. Thanks!
>
> > > + }
> > > +
> > > + /* Protect against an uncleared _FLAG_MORE */
> > > + nr_records = nr_records + le16_to_cpu(po->count);
> > > + if (nr_records >= cxlds->poison_max)
> > > + goto out;
> > > +
> > > + } while (po->flags & CXL_POISON_FLAG_MORE);
> > So.. A conundrum here. What happens if:
> >
> > 1. We get an error mid way through a set of multiple reads
> > (something intermittent - maybe a software issue)
> > 2. We will drop out of here fine and report the error.
> > 3. We run this function again.
> >
> > It will (I think) currently pick up where we left off, but we have
> > no way of knowing that as there isn't a 'total records' count or
> > any other form of index in the output payload.
>
> Yes. That is sad. I'm assume it's by design and CXL devices never
> intended to keep any totals.
>
> >
> > So, software solutions I think should work (though may warrant a note
> > to be added to the spec).
> >
> > 1. Read whole thing twice. First time is just to ensure we get
> > to the end and flush out any prior half done reads.
> > 2. Issue a read for a different region (perhaps length 0) first
> > and assume everything starts from scratch when we go back to
> > this region.
>
> Can you tell me more about 2 ?
>
> Also, Since posting this I have added protection to this path to ensure
> only one reader of the poison list for this device. Like this:
>
> if (!completion_done(&cxlds->read_poison_complete);
> return -EBUSY;
> wait_for_completion_interruptible(&cxlds->read_poison_complete);
> ...GET ALL THE POISON...
> complete(&cxlds->read_poison_complete);

Since this runs in the context of the requester a completion feels out
of place. What this probably wants is a mutex() protecting the state
machine of the Media Error Record retrieval and the "more" flag.

>
> And will add the error message on that unexpected _FLAG_MORE too.
>
> Alison
> >
> > Jonathan
> >
>
>
>
> > > +
> > > +out:
> > > + kvfree(po);
> > > + return rc;
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_poison_list, CXL);
> > > +
> > > struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
> > > {
> > > struct cxl_dev_state *cxlds;
> >


2022-06-18 00:28:31

by Alison Schofield

[permalink] [raw]
Subject: Re: [PATCH 3/3] cxl/core: Add sysfs attribute get_poison for list retrieval

On Fri, Jun 17, 2022 at 11:42:11AM -0700, Dan Williams wrote:
> alison.schofield@ wrote:
> > From: Alison Schofield <[email protected]>
> >
> > The sysfs attribute, get_poison, allows user space to request the
> > retrieval of a CXL devices poison list for its persistent memory.
>
> If the device supports get poison list for volatile memory, just grab
> that too. With the "to be released soon" region patches userspace can
> trivially translate DPA addresses to media type.
>

Dan,

The only way I know to discover if the device supports poison list for
volatile is to do the get_poison_list on the volatile range and see
what happens. Am I missing a capability setting somewhere?

Here's a blanket "Got it, Thanks!" for all the other pieces.

Alison


> >
snip

2022-06-18 01:11:26

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 3/3] cxl/core: Add sysfs attribute get_poison for list retrieval

Alison Schofield wrote:
> On Fri, Jun 17, 2022 at 11:42:11AM -0700, Dan Williams wrote:
> > alison.schofield@ wrote:
> > > From: Alison Schofield <[email protected]>
> > >
> > > The sysfs attribute, get_poison, allows user space to request the
> > > retrieval of a CXL devices poison list for its persistent memory.
> >
> > If the device supports get poison list for volatile memory, just grab
> > that too. With the "to be released soon" region patches userspace can
> > trivially translate DPA addresses to media type.
> >
>
> Dan,
>
> The only way I know to discover if the device supports poison list for
> volatile is to do the get_poison_list on the volatile range and see
> what happens. Am I missing a capability setting somewhere?

If someone executes "echo 1 > trace_poison_list" I expect that the
driver does:

get_poison_list(volatile_range);
get_poison_list(pmem_range);

...and if scanning the volatile partition ends in error then that just
means no error records appear. When the error is "Invalid Physical
Address" the driver can just remember that's a permanent error and never
try again. So it's more like:

if (volatile_range_valid) {
if (get_poison_list(volatile_range) == INVALID_PHYS_ADDR)
volatile_range_valid = false;
}
get_poison_list(pmem_range);

...but that's probably overkill since get_poison_list() is cheap. Just
treat it like the zero error records case.

In the to be released region provisioning patches there is a DPA
resource tree partitioned by DPA mode type, so the poison list code
probably wants to do something like:

down_read(&cxl_dpa_rwsem);
for (p = cxlds->dpa_res.child; p; p = p->sibling)
get_poison_list(p->start, resource_size(p));
up_read(&cxl_dpa_rwsem);

2022-06-18 02:12:53

by Alison Schofield

[permalink] [raw]
Subject: Re: [PATCH 3/3] cxl/core: Add sysfs attribute get_poison for list retrieval

On Fri, Jun 17, 2022 at 06:08:52PM -0700, Dan Williams wrote:
> Alison Schofield wrote:
> > On Fri, Jun 17, 2022 at 11:42:11AM -0700, Dan Williams wrote:
> > > alison.schofield@ wrote:
> > > > From: Alison Schofield <[email protected]>
> > > >
> > > > The sysfs attribute, get_poison, allows user space to request the
> > > > retrieval of a CXL devices poison list for its persistent memory.
> > >
> > > If the device supports get poison list for volatile memory, just grab
> > > that too. With the "to be released soon" region patches userspace can
> > > trivially translate DPA addresses to media type.
> > >
> >
> > Dan,
> >
> > The only way I know to discover if the device supports poison list for
> > volatile is to do the get_poison_list on the volatile range and see
> > what happens. Am I missing a capability setting somewhere?
>
> If someone executes "echo 1 > trace_poison_list" I expect that the
> driver does:
>
> get_poison_list(volatile_range);
> get_poison_list(pmem_range);
>
> ...and if scanning the volatile partition ends in error then that just
> means no error records appear. When the error is "Invalid Physical
> Address" the driver can just remember that's a permanent error and never
> try again. So it's more like:
>
> if (volatile_range_valid) {
> if (get_poison_list(volatile_range) == INVALID_PHYS_ADDR)
> volatile_range_valid = false;
> }
> get_poison_list(pmem_range);
>
> ...but that's probably overkill since get_poison_list() is cheap. Just
> treat it like the zero error records case.

Got it!

>
> In the to be released region provisioning patches there is a DPA
> resource tree partitioned by DPA mode type, so the poison list code
> probably wants to do something like:
>
> down_read(&cxl_dpa_rwsem);
> for (p = cxlds->dpa_res.child; p; p = p->sibling)
> get_poison_list(p->start, resource_size(p));
> up_read(&cxl_dpa_rwsem);

Great ending to the week! This is going to make collecting the
poison per region much simpler than I was imagining :)



2022-06-20 11:20:55

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

On Fri, 17 Jun 2022 12:02:32 -0700
Dan Williams <[email protected]> wrote:

> Alison Schofield wrote:
> > On Thu, Jun 16, 2022 at 12:43:34PM -0700, Davidlohr Bueso wrote:
> > > On Tue, 14 Jun 2022, [email protected] wrote:
> > >
> > > >From: Alison Schofield <[email protected]>
> > > >
> > > >CXL devices that support persistent memory maintain a list of locations
> > > >that are poisoned or result in poison if the addresses are accessed by
> > > >the host.
> > > >
> > > >Per the spec (CXL 2.0 8.2.8.5.4.1), the device returns this Poison
> > > >list as a set of Media Error Records that include the source of the
> > > >error, the starting device physical address and length. The length is
> > > >the number of adjacent DPAs in the record and is in units of 64 bytes.
> > > >
> > > >Retrieve the list and log each Media Error Record as a trace event of
> > > >type cxl_poison_list.
> > > >
> > > >Signed-off-by: Alison Schofield <[email protected]>
> > > >---
> > > > drivers/cxl/cxlmem.h | 43 +++++++++++++++++++++++
> > > > drivers/cxl/core/mbox.c | 75 +++++++++++++++++++++++++++++++++++++++++
> > > > 2 files changed, 118 insertions(+)
> > > >
> > snip
> >
> > > >+int cxl_mem_get_poison_list(struct device *dev)
> > > >+{
> > > >+ struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > > >+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > > >+ struct cxl_mbox_poison_payload_out *po;
> > > >+ struct cxl_mbox_poison_payload_in pi;
> > > >+ int nr_records = 0;
> > > >+ int rc, i;
> > > >+
> > > >+ if (range_len(&cxlds->pmem_range)) {
> > > >+ pi.offset = cpu_to_le64(cxlds->pmem_range.start);
> > > >+ pi.length = cpu_to_le64(range_len(&cxlds->pmem_range));
> >
> > First off - you stopped at a bug here - that pi.length needs to be
> > in units of 64 bytes.
> > >
> > > Do you ever see this changing to not always use the full pmem DPA range
> > > but allow arbitrary ones? I also assume this is the reason why you don't
> > > check the range vs cxlds->ram_range to prevent any overlaps, no?
> > >
> > > Thanks,
> > > Davidlohr
> >
> > David - Great question!
> >
> > I'm headed in this direction -
> >
> > cxl list --media-errors -m mem1
> > lists media errors for requested memdev
> >
> > cxl list --media-errors -r region#
> > lists region errors with HPA addresses
> > (So here cxl tool will collect the poison for all the regions
> > memdevs and do the DPA to HPA translation)
> >
> > To answer your question, I wasn't thinking of limiting
> > the range within the memdev, but certainly could. And if we were
> > taking in ranges, those ranges would need to be checked.
> >
> > $cxl list --media-errors -m mem1 --range-start= --range-end|len=
> >
> > Now, if I left the sysfs inteface as is, the driver will read the
> > entire poison list for the memdev and then cxl tool will filter it
> > for the range requested.
> >
> > Or, maybe we should implement in libcxl (not sysfs), with memdev and
> > range options and only collect from the device the range requested.
> >
> > Either one looks the same to the cxl tool user, but limiting the
> > range we send to the device would certainly cut down on unwanted
> > records being logged, retrieved, and examined.
> >
> > I'd like to hear more from you and other community members.
>
> There is some history here that is relevant to this design. CXL Get
> Poison List builds on lessons learned from the ACPI "Address Range
> Scrub" mechanism that was deployed for ACPI described persistent memory
> platform (See ACPI 6.4 9.20.7.2 "Address Range Scrubbing (ARS)
> Overview"). In that case there was no expectation that the device
> maintained a cached and coherent (with incoming poison writes) copy of
> the media error list. CXL Get Poison List in comparison is meant to
> obviate the need to perform Scan Media in most scenarios, and it is
> lightweight enough that userspace need not have a mechanism to request
> errors by range, in my opinion.
>
> One of the design warts of drivers/acpi/nfit/ that I want to get away
> from in the case of drivers/cxl/ is snooping the equivalent of ARS
> command results to populate a kernel list of poison addresses and
> instead put the onus on userspace to collect DPA events and optionally
> inform the kernel of the HPA impacts.

Can you give more info on why such an in kernel flow doesn't work
well for this usecase (particularly for volatile memory)? I don't
much like the flow differing from how we do DRAM patrol scrub based
handling today. I'm not sure I like the requirement for userspace
to be in the loop if we are dealing with volatile failures even if
the detection is async.

> For example, DAX filesystems will
> soon have the ability to do something useful with poison notifications
> [1], but that support is limited to synchronously consumed poison
> flagged by memory_failure(). When the cxl tool translates the poison
> list to HPA it needs an ABI to turn around and notify the filesystem
> about which blocks got clobbered.

I'm not clear why it makes sense to do the DPA to HPA conversion in
userspace. It should be a fairly trivial bit of code to do the reverse look
up in kernel and then we just bolt into existing infrastructure.

Jonathan

>
> [1]: https://lore.kernel.org/all/[email protected]/



2022-06-20 11:21:03

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

On Fri, 17 Jun 2022 09:29:35 -0700
Alison Schofield <[email protected]> wrote:

> On Fri, Jun 17, 2022 at 07:05:08AM -0700, Jonathan Cameron wrote:
> > On Tue, 14 Jun 2022 17:10:27 -0700
> > [email protected] wrote:
> >
> > > From: Alison Schofield <[email protected]>
> > >
> > > CXL devices that support persistent memory maintain a list of locations
> > > that are poisoned or result in poison if the addresses are accessed by
> > > the host.
> > >
> > > Per the spec (CXL 2.0 8.2.8.5.4.1), the device returns this Poison
> > > list as a set of Media Error Records that include the source of the
> > > error, the starting device physical address and length. The length is
> > > the number of adjacent DPAs in the record and is in units of 64 bytes.
> > >
> > > Retrieve the list and log each Media Error Record as a trace event of
> > > type cxl_poison_list.
> > >
> > > Signed-off-by: Alison Schofield <[email protected]>
> >
> > A few more things inline.
> >
> > Otherwise, can confirm it works with some hack QEMU code.
> > I'll tidy that up and post soon.
> >
> > > +int cxl_mem_get_poison_list(struct device *dev)
> > > +{
> snip
> > > +
> > > + trace_cxl_poison_list(dev, source, addr, len);
> >
> > Need to mask off the lower 6 bits of addr as they contain the source
> > + a few reserved bits.
> >
> > I was confused how you were geting better than 64 byte precision in your
> > example.
> >
> Ah...got it. Thanks!
>
> > > + }
> > > +
> > > + /* Protect against an uncleared _FLAG_MORE */
> > > + nr_records = nr_records + le16_to_cpu(po->count);
> > > + if (nr_records >= cxlds->poison_max)
> > > + goto out;
> > > +
> > > + } while (po->flags & CXL_POISON_FLAG_MORE);
> > So.. A conundrum here. What happens if:
> >
> > 1. We get an error mid way through a set of multiple reads
> > (something intermittent - maybe a software issue)
> > 2. We will drop out of here fine and report the error.
> > 3. We run this function again.
> >
> > It will (I think) currently pick up where we left off, but we have
> > no way of knowing that as there isn't a 'total records' count or
> > any other form of index in the output payload.
>
> Yes. That is sad. I'm assume it's by design and CXL devices never
> intended to keep any totals.
>
> >
> > So, software solutions I think should work (though may warrant a note
> > to be added to the spec).
> >
> > 1. Read whole thing twice. First time is just to ensure we get
> > to the end and flush out any prior half done reads.
> > 2. Issue a read for a different region (perhaps length 0) first
> > and assume everything starts from scratch when we go back to
> > this region.
>
> Can you tell me more about 2 ?

2 relies on interpreting what the spec says for an unusual corner case.
The concept of 'more available' is something I would assume you'd only
get if you issued a repeat of the same request. I don't think the
spec actually covers this case - perhaps that's something we need to
raise via the appropriate channels.

Jonathan



>
> Also, Since posting this I have added protection to this path to ensure
> only one reader of the poison list for this device. Like this:
>
> if (!completion_done(&cxlds->read_poison_complete);
> return -EBUSY;
> wait_for_completion_interruptible(&cxlds->read_poison_complete);
> ...GET ALL THE POISON...
> complete(&cxlds->read_poison_complete);
>
> And will add the error message on that unexpected _FLAG_MORE too.
>
> Alison
> >
> > Jonathan
> >
>
>
>
> > > +
> > > +out:
> > > + kvfree(po);
> > > + return rc;
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_poison_list, CXL);
> > > +
> > > struct cxl_dev_state *cxl_dev_state_create(struct device *dev)
> > > {
> > > struct cxl_dev_state *cxlds;
> >

2022-06-20 11:49:01

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH 2/3] cxl/mbox: Add GET_POISON_LIST mailbox command support

On Fri, 17 Jun 2022 12:27:38 -0700
Dan Williams <[email protected]> wrote:

> Jonathan Cameron wrote:
> > On Tue, 14 Jun 2022 17:10:27 -0700
> > [email protected] wrote:
> >
> > > From: Alison Schofield <[email protected]>
> > >
> > > CXL devices that support persistent memory maintain a list of locations
> > > that are poisoned or result in poison if the addresses are accessed by
> > > the host.
> > >
> > > Per the spec (CXL 2.0 8.2.8.5.4.1), the device returns this Poison
> > > list as a set of Media Error Records that include the source of the
> > > error, the starting device physical address and length. The length is
> > > the number of adjacent DPAs in the record and is in units of 64 bytes.
> > >
> > > Retrieve the list and log each Media Error Record as a trace event of
> > > type cxl_poison_list.
> > >
> > > Signed-off-by: Alison Schofield <[email protected]>
> >
> > A few more things inline.
> >
> > Otherwise, can confirm it works with some hack QEMU code.
> > I'll tidy that up and post soon.
> >
> > > +int cxl_mem_get_poison_list(struct device *dev)
> > > +{
> > > + struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > > + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > > + struct cxl_mbox_poison_payload_out *po;
> > > + struct cxl_mbox_poison_payload_in pi;
> > > + int nr_records = 0;
> > > + int rc, i;
> > > +
> > > + if (range_len(&cxlds->pmem_range)) {
> > > + pi.offset = cpu_to_le64(cxlds->pmem_range.start);
> > > + pi.length = cpu_to_le64(range_len(&cxlds->pmem_range));
> > > + } else {
> > > + return -ENXIO;
> > > + }
> > > +
> > > + po = kvmalloc(cxlds->payload_size, GFP_KERNEL);
> > > + if (!po)
> > > + return -ENOMEM;
> > > +
> > > + do {
> > > + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_POISON, &pi,
> > > + sizeof(pi), po, cxlds->payload_size);
> > > + if (rc)
> > > + goto out;
> > > +
> > > + if (po->flags & CXL_POISON_FLAG_OVERFLOW) {
> > > + time64_t o_time = le64_to_cpu(po->overflow_timestamp);
> > > +
> > > + dev_err(dev, "Poison list overflow at %ptTs UTC\n",
> > > + &o_time);
> > > + rc = -ENXIO;
> > > + goto out;
> > > + }
> > > +
> > > + if (po->flags & CXL_POISON_FLAG_SCANNING) {
> > > + dev_err(dev, "Scan Media in Progress\n");
> > > + rc = -EBUSY;
> > > + goto out;
> > > + }
> > > +
> > > + for (i = 0; i < le16_to_cpu(po->count); i++) {
> > > + u64 addr = le64_to_cpu(po->record[i].address);
> >
> > > + u32 len = le32_to_cpu(po->record[i].length);
> >
> >
> > > + int source = FIELD_GET(CXL_POISON_SOURCE_MASK, addr);
> > > +
> > > + if (!CXL_POISON_SOURCE_VALID(source)) {
> > > + dev_dbg(dev, "Invalid poison source %d",
> > > + source);
> > > + source = CXL_POISON_SOURCE_INVALID;
> > > + }
> > > +
> > > + trace_cxl_poison_list(dev, source, addr, len);
> >
> > Need to mask off the lower 6 bits of addr as they contain the source
> > + a few reserved bits.
> >
> > I was confused how you were geting better than 64 byte precision in your
> > example.
> >
> > > + }
> > > +
> > > + /* Protect against an uncleared _FLAG_MORE */
> > > + nr_records = nr_records + le16_to_cpu(po->count);
> > > + if (nr_records >= cxlds->poison_max)
> > > + goto out;
> > > +
> > > + } while (po->flags & CXL_POISON_FLAG_MORE);
> > So.. A conundrum here. What happens if:
> >
> > 1. We get an error mid way through a set of multiple reads
> > (something intermittent - maybe a software issue)
> > 2. We will drop out of here fine and report the error.
> > 3. We run this function again.
> >
> > It will (I think) currently pick up where we left off, but we have
> > no way of knowing that as there isn't a 'total records' count or
> > any other form of index in the output payload.
> >
> > So, software solutions I think should work (though may warrant a note
> > to be added to the spec).
> >
> > 1. Read whole thing twice. First time is just to ensure we get
> > to the end and flush out any prior half done reads.
> > 2. Issue a read for a different region (perhaps length 0) first
> > and assume everything starts from scratch when we go back to
> > this region.
>
> It would be nice if this was codified as *the* way to reset the
> retrieval, but I worry that neither length==0 or length==1 can be used
> for this purpose since the "more" bit is attached to the last passed in
> *range*, not request. I.e. spec seems to allow for overlapping
> retrievals, although I doubt any implementation gets that fancy.
>
> I think it is sufficient to just include the "more" flag in the trace
> event and if userspace suspects that it is getting "more" results from a
> previous run it can reissue the scan.

Meaning is a bit ugly if attached to an individual trace event, though
I guess we could do something nicer like only have one that doesn't have
MORE set, thus indicating that one trace event is the last one from a
query. i.e. fill in MORE for all the other events in the last GET_POISON_LIST
reply.

> This is another reason that the
> trace event should include the pid of the process that triggered the
> results so it can delineate re-requests. Otherwise, the poison list
> cache is opportunistic so I am not sure that missing records in this
> corner case is fatal.

Ok, for now let's document the limitation with an appropriate comment.
In parallel I've started a thread in appropriate venue to discuss if
we can clarify the spec and potentially do better in future. So that
discussion should shift over there.

Thanks,

Jonathan