LinuxLists.cc - [RFC PATCH 0/9] CXL: Read and clear event logs

2022-08-13 05:34:25

Subject: [RFC PATCH 0/9] CXL: Read and clear event logs

From: Ira Weiny <[email protected]>

Event records inform the OS of various device events. The kernel does not
need them for its own operation, but user-level software will want to track
them.

Add event reporting through the trace event mechanism. On driver load read and
clear all device events.

Normally, interrupts will trigger new events to be reported as they occur.
Because the interrupt code is still being worked on, this series provides a
cxl-test mechanism to create a series of events and trigger the reporting of
those events.

This series is submitted as an RFC for a few reasons:

1) Interrupt support is still missing
2) I'd like to get comments on the format of the trace events
3) Some of the event formats are badly aligned and I would like to see
if there is any clarification on how the data will be formatted
(See individual patches for details)

Ira Weiny (9):
cxl/mem: Implement Get Event Records command
cxl/mem: Implement Clear Event Records command
cxl/mem: Clear events on driver load
cxl/mem: Trace General Media Event Record
cxl/mem: Trace DRAM Event Record
cxl/mem: Trace Memory Module Event Record
cxl/test: Add generic mock events
cxl/test: Add specific events
cxl/test: Simulate event log overflow

MAINTAINERS | 1 +
drivers/cxl/core/mbox.c | 143 ++++++++
drivers/cxl/cxlmem.h | 149 +++++++++
drivers/cxl/pci.c | 2 +
include/trace/events/cxl-events.h | 521 ++++++++++++++++++++++++++++++
include/uapi/linux/cxl_mem.h | 2 +
tools/testing/cxl/test/mem.c | 399 +++++++++++++++++++++++
7 files changed, 1217 insertions(+)
create mode 100644 include/trace/events/cxl-events.h

base-commit: 1cd8a2537eb07751d405ab7e2223f20338a90506
--
2.35.3

2022-08-13 05:34:41

Subject: [RFC PATCH 2/9] cxl/mem: Implement Clear Event Records command

From: Ira Weiny <[email protected]>

CXL v3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
command. After an event record is read it needs to be cleared from the
event log.

Implement cxl_clear_event_record() and call it for each record retrieved
from the device.

Each record is cleared individually. The specification defines a clear-all
bit, but events could arrive between a get and the final clear-all
operation and would then be cleared without ever being read. Therefore
each event is cleared by its own handle.
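
The ordering problem can be sketched in user space with a toy event log.
Everything below (toy_log, toy_add, toy_get, toy_clear_handle,
toy_clear_all, toy_race) is hypothetical and only models the race; it is
not the driver code and makes no claim about how a real device behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_LOG_MAX 8

/* Hypothetical in-memory stand-in for one device event log. */
struct toy_log {
	uint16_t handles[TOY_LOG_MAX];
	size_t count;
};

/* Device side: a new event arrives. Returns false if the log is full. */
static bool toy_add(struct toy_log *log, uint16_t handle)
{
	if (log->count == TOY_LOG_MAX)
		return false;
	log->handles[log->count++] = handle;
	return true;
}

/* Host side: read the oldest record without removing it. */
static bool toy_get(const struct toy_log *log, uint16_t *handle)
{
	if (log->count == 0)
		return false;
	*handle = log->handles[0];
	return true;
}

/* Host side: clear exactly one record, identified by its handle. */
static bool toy_clear_handle(struct toy_log *log, uint16_t handle)
{
	for (size_t i = 0; i < log->count; i++) {
		if (log->handles[i] != handle)
			continue;
		for (size_t j = i + 1; j < log->count; j++)
			log->handles[j - 1] = log->handles[j];
		log->count--;
		return true;
	}
	return false;
}

/* Host side: clear everything, including records that were never read. */
static void toy_clear_all(struct toy_log *log)
{
	log->count = 0;
}

/*
 * Replay the race: handle 2 arrives after the host read handle 1 but
 * before the host issues its clear. Returns how many unread records
 * were lost by the chosen clearing strategy.
 */
static size_t toy_race(bool by_handle)
{
	struct toy_log log = { .count = 0 };
	uint16_t seen = 0;
	bool ok;

	toy_add(&log, 1);
	ok = toy_get(&log, &seen);	/* host reads handle 1 */
	assert(ok);
	toy_add(&log, 2);		/* handle 2 arrives before the clear */

	if (by_handle)
		toy_clear_handle(&log, seen);
	else
		toy_clear_all(&log);

	/* Handle 2 was never read; it is lost if it is no longer logged. */
	return log.count == 0 ? 1 : 0;
}
```

In this model toy_race(true) loses nothing, while toy_race(false) drops
the record that arrived between the get and the clear.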

Signed-off-by: Ira Weiny <[email protected]>
---
drivers/cxl/core/mbox.c | 31 ++++++++++++++++++++++++++++---
drivers/cxl/cxlmem.h | 15 +++++++++++++++
include/uapi/linux/cxl_mem.h | 1 +
3 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 2cceed8608dc..493f5ceb5d1c 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -52,6 +52,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
#endif
CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
+ CXL_CMD(CLEAR_EVENT_RECORD, CXL_VARIABLE_PAYLOAD, 0, 0),
CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
@@ -708,6 +709,26 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
}
EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);

+static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
+ enum cxl_event_log_type log,
+ __le16 handle)
+{
+ struct cxl_mbox_clear_event_payload payload;
+ int rc;
+
+ memset(&payload, 0, sizeof(payload));
+ payload.event_log = log;
+ payload.nr_recs = 1;
+ payload.handle = handle;
+
+ rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_CLEAR_EVENT_RECORD,
+ &payload, sizeof(payload), NULL, 0);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
enum cxl_event_log_type type)
{
@@ -725,9 +746,12 @@ static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
return rc;

record_count = le16_to_cpu(payload.record_count);
- if (record_count > 0)
+ if (record_count > 0) {
trace_cxl_event(dev_name(cxlds->dev), type,
&payload.record);
+ cxl_clear_event_record(cxlds, type,
+ payload.record.hdr.handle);
+ }

if (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW)
trace_cxl_event_overflow(dev_name(cxlds->dev), type,
@@ -742,10 +766,11 @@ static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
* cxl_mem_get_event_records - Get Event Records from the device
* @cxlds: The device data for the operation
*
- * Retrieve all event records available on the device and report them as trace
- * events.
+ * Retrieve all event records available on the device, report them as trace
+ * events, and clear them.
*
* See CXL v3.0 @8.2.9.2.2 Get Event Records
+ * See CXL v3.0 @8.2.9.2.3 Clear Event Records
*/
void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
{
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index f83634f3bc8d..5506e7210cf6 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -255,6 +255,7 @@ enum cxl_opcode {
CXL_MBOX_OP_INVALID = 0x0000,
CXL_MBOX_OP_RAW = CXL_MBOX_OP_INVALID,
CXL_MBOX_OP_GET_EVENT_RECORD = 0x0100,
+ CXL_MBOX_OP_CLEAR_EVENT_RECORD = 0x0101,
CXL_MBOX_OP_GET_FW_INFO = 0x0200,
CXL_MBOX_OP_ACTIVATE_FW = 0x0202,
CXL_MBOX_OP_GET_SUPPORTED_LOGS = 0x0400,
@@ -387,6 +388,20 @@ static inline char *cxl_event_log_type_str(enum cxl_event_log_type type)
return "<unknown>";
}

+/*
+ * Clear Event Records input payload
+ * CXL v3.0 section 8.2.9.2.3; Table 8-51
+ *
+ * Space given for 1 record
+ */
+struct cxl_mbox_clear_event_payload {
+ u8 event_log; /* enum cxl_event_log_type */
+ u8 clear_flags;
+ u8 nr_recs; /* 1 for this struct */
+ u8 reserved[3];
+ __le16 handle;
+};
+
struct cxl_mbox_get_partition_info {
__le64 active_volatile_cap;
__le64 active_persistent_cap;
diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
index 70459be5bdd4..7c1ad8062792 100644
--- a/include/uapi/linux/cxl_mem.h
+++ b/include/uapi/linux/cxl_mem.h
@@ -25,6 +25,7 @@
___C(RAW, "Raw device command"), \
___C(GET_SUPPORTED_LOGS, "Get Supported Logs"), \
___C(GET_EVENT_RECORD, "Get Event Record"), \
+ ___C(CLEAR_EVENT_RECORD, "Clear Event Record"), \
___C(GET_FW_INFO, "Get FW Info"), \
___C(GET_PARTITION_INFO, "Get Partition Information"), \
___C(GET_LSA, "Get Label Storage Area"), \
--
2.35.3

2022-08-13 05:34:52

Subject: [RFC PATCH 3/9] cxl/mem: Clear events on driver load

From: Ira Weiny <[email protected]>

The information contained in the events prior to the driver loading can
be queried at any time through other mailbox commands.

Ensure a clean slate by reading and clearing the events at driver load.
The events are still sent to the trace buffer, although no one is expected
to be listening to it at that point.

Signed-off-by: Ira Weiny <[email protected]>
---
drivers/cxl/pci.c | 2 ++
tools/testing/cxl/test/mem.c | 2 ++
2 files changed, 4 insertions(+)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index faeb5d9d7a7a..5f1b492bd388 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -498,6 +498,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (IS_ERR(cxlmd))
return PTR_ERR(cxlmd);

+ cxl_mem_get_event_records(cxlds);
+
if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index aa2df3a15051..e2f5445d24ff 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -285,6 +285,8 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
if (IS_ERR(cxlmd))
return PTR_ERR(cxlmd);

+ cxl_mem_get_event_records(cxlds);
+
if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
rc = devm_cxl_add_nvdimm(dev, cxlmd);

--
2.35.3

2022-08-13 05:47:50

Subject: [RFC PATCH 4/9] cxl/mem: Trace General Media Event Record

From: Ira Weiny <[email protected]>

CXL v3.0 section 8.2.9.2.1.1 defines the General Media Event Record.

Determine if the event read is a general media record and if so trace
the record.
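
The check amounts to comparing the record's 16-byte identifier against a
known value and falling back to a header-only trace otherwise. A minimal
user-space sketch follows; toy_uuid, toy_record_kind and toy_classify are
hypothetical names, and the byte order of the identifier is an assumption
(the bytes are listed in the order the UUID_INIT() arguments are written,
which is how the kernel's uuid_t stores them).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_UUID_SIZE 16

/* Hypothetical stand-in for the kernel's uuid_t. */
struct toy_uuid {
	uint8_t b[TOY_UUID_SIZE];
};

enum toy_record_kind {
	TOY_REC_UNKNOWN,	/* trace only the common header */
	TOY_REC_GEN_MEDIA,	/* trace the full General Media record */
};

/* fbcd0a77-c260-417f-85a9-088b1621eba6, the UUID used in this patch */
static const struct toy_uuid toy_gen_media_uuid = {{
	0xfb, 0xcd, 0x0a, 0x77, 0xc2, 0x60, 0x41, 0x7f,
	0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6,
}};

/* Classify a record by its identifier; unknown IDs get the fallback. */
static enum toy_record_kind toy_classify(const struct toy_uuid *id)
{
	assert(id != NULL);

	if (memcmp(id, &toy_gen_media_uuid, sizeof(*id)) == 0)
		return TOY_REC_GEN_MEDIA;

	return TOY_REC_UNKNOWN;
}
```

Further record types would add one more comparison each ahead of the
fallback.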

Signed-off-by: Ira Weiny <[email protected]>

---
A couple of specification questions I've had.

1) The component id is not specified as a UUID or any particular
format. It is therefore reported as a byte array. Is this intentional?

2) This record has a very odd byte layout with a 16 bit field
(validity_flags) landing on a 3 byte boundary and a 3 byte bit field
(device) landing on a 7 byte boundary.

I've made my best guess as to how the endianness of these fields should
be resolved. But I'm happy to hear from other folks if what I have done
is wrong.

struct cxl_evt_gen_media {
struct cxl_event_record_hdr hdr;
__le64 phys_addr;
u8 descriptor;
u8 type;
u8 transaction_type;
u16 validity_flags; /* ??? */
u8 channel;
u8 rank;
u8 device[CXL_EVT_GEN_MED_DEV_SIZE]; /* ??? */
u8 component_id[CXL_EVT_GEN_MED_COMP_ID_SIZE];
} __packed;
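
One way to keep alignment out of the picture is to assemble such fields
byte by byte. The sketch below assumes both fields are little-endian like
the rest of the record, which is exactly the point the questions above
ask the specification to settle. toy_get_le16 and toy_get_le24 are
hypothetical helpers; in the kernel, get_unaligned_le16() and
get_unaligned_le24() serve the same purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit little-endian value from any byte address. */
static uint16_t toy_get_le16(const uint8_t *p)
{
	assert(p != NULL);
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 24-bit little-endian value (3 bytes) into a 32-bit integer. */
static uint32_t toy_get_le24(const uint8_t *p)
{
	assert(p != NULL);
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16);
}
```

Under that reading, device bytes 01 02 03 decode to 0x030201 regardless
of host byte order.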
---
drivers/cxl/core/mbox.c | 30 ++++++-
drivers/cxl/cxlmem.h | 19 +++++
include/trace/events/cxl-events.h | 125 ++++++++++++++++++++++++++++++
3 files changed, 172 insertions(+), 2 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 493f5ceb5d1c..0e433f072163 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -709,6 +709,32 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
}
EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);

+/*
+ * General Media Event Record
+ * CXL v3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+static const uuid_t gen_media_event_uuid =
+ UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
+ 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
+
+static void cxl_trace_event_record(const char *dev_name,
+ enum cxl_event_log_type type,
+ struct cxl_get_event_payload *payload)
+{
+ uuid_t *id = &payload->record.hdr.id;
+
+ if (uuid_equal(id, &gen_media_event_uuid)) {
+ struct cxl_evt_gen_media *rec =
+ (struct cxl_evt_gen_media *)&payload->record;
+
+ trace_cxl_gen_media_event(dev_name, type, rec);
+ return;
+ }
+
+ /* For unknown record types print just the header */
+ trace_cxl_event(dev_name, type, &payload->record);
+}
+
static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
enum cxl_event_log_type log,
__le16 handle)
@@ -747,8 +773,8 @@ static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,

record_count = le16_to_cpu(payload.record_count);
if (record_count > 0) {
- trace_cxl_event(dev_name(cxlds->dev), type,
- &payload.record);
+ cxl_trace_event_record(dev_name(cxlds->dev), type,
+ &payload);
cxl_clear_event_record(cxlds, type,
payload.record.hdr.handle);
}
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 5506e7210cf6..33669459ae4b 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -402,6 +402,25 @@ struct cxl_mbox_clear_event_payload {
__le16 handle;
};

+/*
+ * General Media Event Record
+ * CXL v3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CXL_EVT_GEN_MED_DEV_SIZE 3
+#define CXL_EVT_GEN_MED_COMP_ID_SIZE 0x10
+struct cxl_evt_gen_media {
+ struct cxl_event_record_hdr hdr;
+ __le64 phys_addr;
+ u8 descriptor;
+ u8 type;
+ u8 transaction_type;
+ u16 validity_flags;
+ u8 channel;
+ u8 rank;
+ u8 device[CXL_EVT_GEN_MED_DEV_SIZE];
+ u8 component_id[CXL_EVT_GEN_MED_COMP_ID_SIZE];
+} __packed;
+
struct cxl_mbox_get_partition_info {
__le64 active_volatile_cap;
__le64 active_persistent_cap;
diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
index f4baeae66cf3..b51c51fd4e62 100644
--- a/include/trace/events/cxl-events.h
+++ b/include/trace/events/cxl-events.h
@@ -119,6 +119,131 @@ TRACE_EVENT(cxl_event,
)
);

+/*
+ * General Media Event Record - GMER
+ * CXL v2.0 Section 8.2.9.1.1.1; Table 154
+ */
+#define CXL_GMER_PHYS_ADDR_VOLATILE BIT(0)
+#define CXL_GMER_PHYS_ADDR_MASK 0x3f
+
+#define CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT BIT(0)
+#define CXL_GMER_EVT_DESC_THRESHOLD_EVENT BIT(1)
+#define CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW BIT(2)
+#define show_event_desc_flags(flags) __print_flags(flags, "|", \
+ { CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT, "Uncorrectable Event" }, \
+ { CXL_GMER_EVT_DESC_THRESHOLD_EVENT, "Threshold event" }, \
+ { CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW, "Poison List Overflow" } \
+)
+
+#define CXL_GMER_MEM_EVT_TYPE_ECC_ERROR 0x00
+#define CXL_GMER_MEM_EVT_TYPE_INV_ADDR 0x01
+#define CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR 0x02
+#define show_mem_event_type(type) __print_symbolic(type, \
+ { CXL_GMER_MEM_EVT_TYPE_ECC_ERROR, "ECC Error" }, \
+ { CXL_GMER_MEM_EVT_TYPE_INV_ADDR, "Invalid Address" }, \
+ { CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR, "Data Path Error" } \
+)
+
+#define CXL_GMER_TRANS_UNKNOWN 0x00
+#define CXL_GMER_TRANS_HOST_READ 0x01
+#define CXL_GMER_TRANS_HOST_WRITE 0x02
+#define CXL_GMER_TRANS_HOST_SCAN_MEDIA 0x03
+#define CXL_GMER_TRANS_HOST_INJECT_POISON 0x04
+#define CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB 0x05
+#define CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT 0x06
+#define show_trans_type(type) __print_symbolic(type, \
+ { CXL_GMER_TRANS_UNKNOWN, "Unknown" }, \
+ { CXL_GMER_TRANS_HOST_READ, "Host Read" }, \
+ { CXL_GMER_TRANS_HOST_WRITE, "Host Write" }, \
+ { CXL_GMER_TRANS_HOST_SCAN_MEDIA, "Host Scan Media" }, \
+ { CXL_GMER_TRANS_HOST_INJECT_POISON, "Host Inject Poison" }, \
+ { CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB, "Internal Media Scrub" }, \
+ { CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT, "Internal Media Management" } \
+)
+
+#define CXL_GMER_VALID_CHANNEL BIT(0)
+#define CXL_GMER_VALID_RANK BIT(1)
+#define CXL_GMER_VALID_DEVICE BIT(2)
+#define CXL_GMER_VALID_COMPONENT BIT(3)
+#define show_valid_flags(flags) __print_flags(flags, "|", \
+ { CXL_GMER_VALID_CHANNEL, "CHANNEL" }, \
+ { CXL_GMER_VALID_RANK, "RANK" }, \
+ { CXL_GMER_VALID_DEVICE, "DEVICE" }, \
+ { CXL_GMER_VALID_COMPONENT, "COMPONENT" } \
+)
+
+TRACE_EVENT(cxl_gen_media_event,
+
+ TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+ struct cxl_evt_gen_media *rec),
+
+ TP_ARGS(dev_name, log, rec),
+
+ TP_STRUCT__entry(
+ /* Common */
+ __string(dev_name, dev_name)
+ __field(int, log)
+ __array(u8, id, UUID_SIZE)
+ __field(u32, flags)
+ __field(u16, handle)
+ __field(u16, related_handle)
+ __field(u64, timestamp)
+
+ /* General Media */
+ __field(u64, phys_addr)
+ __field(u8, descriptor)
+ __field(u8, type)
+ __field(u8, transaction_type)
+ __field(u8, channel)
+ __field(u32, device)
+ __array(u8, comp_id, CXL_EVT_GEN_MED_COMP_ID_SIZE)
+ __field(u16, validity_flags)
+ __field(u8, rank) /* Out of order to pack trace record */
+ ),
+
+ TP_fast_assign(
+ /* Common */
+ __assign_str(dev_name, dev_name);
+ memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
+ __entry->log = log;
+ __entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
+ __entry->handle = le16_to_cpu(rec->hdr.handle);
+ __entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
+ __entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
+
+ /* General Media */
+ __entry->phys_addr = le64_to_cpu(rec->phys_addr);
+ __entry->descriptor = rec->descriptor;
+ __entry->type = rec->type;
+ __entry->transaction_type = rec->transaction_type;
+ __entry->channel = rec->channel;
+ __entry->rank = rec->rank;
+ __entry->device = rec->device[0] << 24 |
+ rec->device[1] << 16 |
+ rec->device[2] << 8; /* 3 byte LE ? */
+ __entry->device = le32_to_cpu(__entry->device);
+ memcpy(__entry->comp_id, &rec->component_id,
+ CXL_EVT_GEN_MED_COMP_ID_SIZE);
+ __entry->validity_flags = le16_to_cpu(rec->validity_flags);
+ ),
+
+ TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
+ "phys_addr=%llx volatile=%s desc='%s' type='%s' trans_type='%s' channel=%u " \
+ "rank=%u device=%x comp_id=%s valid_flags='%s'",
+ __get_str(dev_name), show_log_type(__entry->log),
+ __entry->timestamp, __entry->id, __entry->handle,
+ __entry->related_handle, show_hdr_flags(__entry->flags),
+ __entry->phys_addr & ~CXL_GMER_PHYS_ADDR_MASK,
+ (__entry->phys_addr & CXL_GMER_PHYS_ADDR_VOLATILE) ? "TRUE" : "FALSE",
+ show_event_desc_flags(__entry->descriptor),
+ show_mem_event_type(__entry->type),
+ show_trans_type(__entry->transaction_type),
+ __entry->channel, __entry->rank, __entry->device,
+ __print_hex(__entry->comp_id, CXL_EVT_GEN_MED_COMP_ID_SIZE),
+ show_valid_flags(__entry->validity_flags)
+ )
+);
+
#endif /* _CXL_TRACE_EVENTS_H */

/* This part must be outside protection */
--
2.35.3

2022-08-13 06:02:58

Subject: [RFC PATCH 8/9] cxl/test: Add specific events

From: Ira Weiny <[email protected]>

Each type of event has different trace point outputs.

Add mock General Media Event, DRAM event, and Memory Module Event
records to the mock list of events returned.
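
The mock records build flags_length by placing the record flags above a
low byte that holds the record length, which mirrors how the trace events
in this series recover the flags with a shift by 8. A small sketch of
that packing follows. The helper names are hypothetical, values are kept
in host byte order, and the cpu_to_le32()/le32_to_cpu() conversion the
patches apply is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Combine 24 bits of flags with an 8-bit record length. */
static uint32_t toy_pack_flags_length(uint32_t flags, uint8_t length)
{
	assert(flags <= 0xffffff);	/* flags must fit in 24 bits */
	return (flags << 8) | length;
}

/* Recover the flags, as the trace events do with ">> 8". */
static uint32_t toy_unpack_flags(uint32_t flags_length)
{
	return flags_length >> 8;
}

/* Recover the record length from the low byte. */
static uint8_t toy_unpack_length(uint32_t flags_length)
{
	return (uint8_t)(flags_length & 0xff);
}
```

A record size that does not fit in one byte would spill into the flags,
so the length argument is limited to uint8_t here.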

Signed-off-by: Ira Weiny <[email protected]>
---
tools/testing/cxl/test/mem.c | 70 ++++++++++++++++++++++++++++++++++++
1 file changed, 70 insertions(+)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 87196d62acf5..c5d7857ae2e5 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -391,6 +391,70 @@ struct cxl_event_record_raw hardware_replace = {
.data = { 0xDE, 0xAD, 0xBE, 0xEF },
};

+struct cxl_evt_gen_media gen_media = {
+ .hdr = {
+ .id = UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
+ 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
+ .flags_length = cpu_to_le32((CXL_EVENT_RECORD_FLAG_PERMANENT << 8) |
+ sizeof(struct cxl_evt_gen_media)),
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0),
+ },
+ .phys_addr = cpu_to_le64(0x2000),
+ .descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
+ .type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
+ .transaction_type = CXL_GMER_TRANS_HOST_WRITE,
+ .validity_flags = cpu_to_le16(CXL_GMER_VALID_CHANNEL |
+ CXL_GMER_VALID_RANK),
+ .channel = 1,
+ .rank = 30
+};
+
+struct cxl_evt_dram_rec dram_rec = {
+ .hdr = {
+ .id = UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
+ 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
+ .flags_length = cpu_to_le32((CXL_EVENT_RECORD_FLAG_PERF_DEGRADED << 8) |
+ sizeof(struct cxl_evt_dram_rec)),
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0),
+ },
+ .phys_addr = cpu_to_le64(0x8000),
+ .descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
+ .type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
+ .transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
+ .validity_flags = cpu_to_le16(CXL_DER_VALID_CHANNEL |
+ CXL_DER_VALID_BANK_GROUP |
+ CXL_DER_VALID_BANK |
+ CXL_DER_VALID_COLUMN),
+ .channel = 1,
+ .bank_group = 5,
+ .bank = 2,
+ .column = cpu_to_le16(1024)
+};
+
+struct cxl_evt_mem_mod_rec mem_mod_rec = {
+ .hdr = {
+ .id = UUID_INIT(0xfe927475, 0xdd59, 0x4339,
+ 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),
+ .flags_length = cpu_to_le32(sizeof(struct cxl_evt_mem_mod_rec)),
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0),
+ },
+ .event_type = CXL_MMER_TEMP_CHANGE,
+ .info = {
+ .health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
+ .media_status = CXL_DHI_MS_ALL_DATA_LOST,
+ .add_status = (CXL_DHI_AS_CRITICAL << 2) |
+ (CXL_DHI_AS_WARNING << 4) |
+ (CXL_DHI_AS_WARNING << 5),
+ .device_temp = cpu_to_le16(1000),
+ .dirty_shutdown_cnt = cpu_to_le32(30000),
+ .cor_vol_err_cnt = cpu_to_le32(30100),
+ .cor_per_err_cnt = cpu_to_le32(40100),
+ }
+};
+
static void devm_cxl_mock_event_logs(struct cxl_memdev *cxlmd)
{
struct device *dev = &cxlmd->dev;
@@ -414,8 +478,14 @@ static void devm_cxl_mock_event_logs(struct cxl_memdev *cxlmd)
es->cxlds = cxlmd->cxlds;

event_store_add_event(es, CXL_EVENT_TYPE_INFO, &maint_needed);
+ event_store_add_event(es, CXL_EVENT_TYPE_INFO,
+ (struct cxl_event_record_raw *)&gen_media);
+ event_store_add_event(es, CXL_EVENT_TYPE_INFO,
+ (struct cxl_event_record_raw *)&mem_mod_rec);

event_store_add_event(es, CXL_EVENT_TYPE_FATAL, &hardware_replace);
+ event_store_add_event(es, CXL_EVENT_TYPE_FATAL,
+ (struct cxl_event_record_raw *)&dram_rec);

store_event_store(es);
}
--
2.35.3

2022-08-13 06:04:37

Subject: [RFC PATCH 6/9] cxl/mem: Trace Memory Module Event Record

From: Ira Weiny <[email protected]>

CXL v3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.

Determine if the event read is a memory module record and if so trace the
record.
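
The Additional Status byte of the health information packs four
sub-fields, which the trace event prints separately. The sketch below
copies the masks and shifts from the CXL_DHI_AS_* macros this patch adds;
toy_add_status and toy_decode_add_status are hypothetical names used only
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded view of the Additional Status byte. */
struct toy_add_status {
	uint8_t life_used;		/* bits 1:0 */
	uint8_t dev_temp;		/* bits 3:2 */
	uint8_t cor_vol_err_cnt;	/* bit 4 */
	uint8_t cor_per_err_cnt;	/* bit 5 */
};

/* Split the byte using the same masks as the CXL_DHI_AS_* macros. */
static struct toy_add_status toy_decode_add_status(uint8_t as)
{
	struct toy_add_status out = {
		.life_used = as & 0x3,
		.dev_temp = (as & 0xC) >> 2,
		.cor_vol_err_cnt = (as & 0x10) >> 4,
		.cor_per_err_cnt = (as & 0x20) >> 5,
	};

	/* The two counter fields are single bits. */
	assert(out.cor_vol_err_cnt <= 1 && out.cor_per_err_cnt <= 1);
	return out;
}
```

The mock Memory Module record in patch 8/9 sets (2 << 2) | (1 << 4) |
(1 << 5), which this decodes to dev_temp = 2 (Critical) and both error
counters = 1 (Warning).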

Signed-off-by: Ira Weiny <[email protected]>
---
drivers/cxl/core/mbox.c | 16 +++
drivers/cxl/cxlmem.h | 25 +++++
include/trace/events/cxl-events.h | 155 ++++++++++++++++++++++++++++++
3 files changed, 196 insertions(+)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 6414588a3c7b..99b09bfeaff5 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -725,6 +725,14 @@ static const uuid_t dram_event_uuid =
UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);

+/*
+ * Memory Module Event Record
+ * CXL v3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+static const uuid_t mem_mod_event_uuid =
+ UUID_INIT(0xfe927475, 0xdd59, 0x4339,
+ 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74);
+
static void cxl_trace_event_record(const char *dev_name,
enum cxl_event_log_type type,
struct cxl_get_event_payload *payload)
@@ -747,6 +755,14 @@ static void cxl_trace_event_record(const char *dev_name,
return;
}

+ if (uuid_equal(id, &mem_mod_event_uuid)) {
+ struct cxl_evt_mem_mod_rec *rec =
+ (struct cxl_evt_mem_mod_rec *)&payload->record;
+
+ trace_cxl_mem_mod_event(dev_name, type, rec);
+ return;
+ }
+
/* For unknown record types print just the header */
trace_cxl_event(dev_name, type, &payload->record);
}
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 50536c0a7850..a02a41dfd988 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -445,6 +445,31 @@ struct cxl_evt_dram_rec {
u8 correction_mask[CXL_EVT_DER_CORRECTION_MASK_SIZE];
} __packed;

+/*
+ * Get Health Info Record
+ * CXL v3.0 section 8.2.9.8.3.1; Table 8-100
+ */
+struct cxl_get_health_info {
+ u8 health_status;
+ u8 media_status;
+ u8 add_status;
+ u8 life_used;
+ u16 device_temp;
+ u32 dirty_shutdown_cnt;
+ u32 cor_vol_err_cnt;
+ u32 cor_per_err_cnt;
+} __packed;
+
+/*
+ * Memory Module Event Record
+ * CXL v3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+struct cxl_evt_mem_mod_rec {
+ struct cxl_event_record_hdr hdr;
+ u8 event_type;
+ struct cxl_get_health_info info;
+} __packed;
+
struct cxl_mbox_get_partition_info {
__le64 active_volatile_cap;
__le64 active_persistent_cap;
diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
index db9b34ddd240..dbbe25fee25c 100644
--- a/include/trace/events/cxl-events.h
+++ b/include/trace/events/cxl-events.h
@@ -358,6 +358,161 @@ TRACE_EVENT(cxl_dram_event,
)
);

+/*
+ * Memory Module Event Record - MMER
+ *
+ * CXL v2.0 section 8.2.9.1.1.3; Table 156, Table 181
+ *
+ * Device Health Information - DHI; Table 181
+ */
+#define CXL_MMER_HEALTH_STATUS_CHANGE 0x00
+#define CXL_MMER_MEDIA_STATUS_CHANGE 0x01
+#define CXL_MMER_LIFE_USED_CHANGE 0x02
+#define CXL_MMER_TEMP_CHANGE 0x03
+#define CXL_MMER_DATA_PATH_ERROR 0x04
+#define CXL_MMER_LAS_ERROR 0x05
+#define show_dev_evt_type(type) __print_symbolic(type, \
+ { CXL_MMER_HEALTH_STATUS_CHANGE, "Health Status Change" }, \
+ { CXL_MMER_MEDIA_STATUS_CHANGE, "Media Status Change" }, \
+ { CXL_MMER_LIFE_USED_CHANGE, "Life Used Change" }, \
+ { CXL_MMER_TEMP_CHANGE, "Temperature Change" }, \
+ { CXL_MMER_DATA_PATH_ERROR, "Data Path Error" }, \
+ { CXL_MMER_LAS_ERROR, "LSA Error" } \
+)
+
+#define CXL_DHI_HS_MAINTENANCE_NEEDED BIT(0)
+#define CXL_DHI_HS_PERFORMANCE_DEGRADED BIT(1)
+#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED BIT(2)
+#define show_health_status_flags(flags) __print_flags(flags, "|", \
+ { CXL_DHI_HS_MAINTENANCE_NEEDED, "Maintenance Needed" }, \
+ { CXL_DHI_HS_PERFORMANCE_DEGRADED, "Performance Degraded" }, \
+ { CXL_DHI_HS_HW_REPLACEMENT_NEEDED, "Replacement Needed" } \
+)
+
+#define CXL_DHI_MS_NORMAL 0x00
+#define CXL_DHI_MS_NOT_READY 0x01
+#define CXL_DHI_MS_WRITE_PERSISTENCY_LOST 0x02
+#define CXL_DHI_MS_ALL_DATA_LOST 0x03
+#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS 0x04
+#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN 0x05
+#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT 0x06
+#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS 0x07
+#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN 0x08
+#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT 0x09
+#define show_media_status(ms) __print_symbolic(ms, \
+ { CXL_DHI_MS_NORMAL, \
+ "Normal" }, \
+ { CXL_DHI_MS_NOT_READY, \
+ "Not Ready" }, \
+ { CXL_DHI_MS_WRITE_PERSISTENCY_LOST, \
+ "Write Persistency Lost" }, \
+ { CXL_DHI_MS_ALL_DATA_LOST, \
+ "All Data Lost" }, \
+ { CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS, \
+ "Write Persistency Loss in the Event of Power Loss" }, \
+ { CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN, \
+ "Write Persistency Loss in Event of Shutdown" }, \
+ { CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT, \
+ "Write Persistency Loss Imminent" }, \
+ { CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS, \
+ "All Data Loss in Event of Power Loss" }, \
+ { CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN, \
+ "All Data loss in the Event of Shutdown" }, \
+ { CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT, \
+ "All Data Loss Imminent" } \
+)
+
+#define CXL_DHI_AS_NORMAL 0x0
+#define CXL_DHI_AS_WARNING 0x1
+#define CXL_DHI_AS_CRITICAL 0x2
+#define show_add_status(as) __print_symbolic(as, \
+ { CXL_DHI_AS_NORMAL, "Normal" }, \
+ { CXL_DHI_AS_WARNING, "Warning" }, \
+ { CXL_DHI_AS_CRITICAL, "Critical" } \
+)
+
+#define CXL_DHI_AS_LIFE_USED(as) (as & 0x3)
+#define CXL_DHI_AS_DEV_TEMP(as) ((as & 0xC) >> 2)
+#define CXL_DHI_AS_COR_VOL_ERR_CNT(as) ((as & 0x10) >> 4)
+#define CXL_DHI_AS_COR_PER_ERR_CNT(as) ((as & 0x20) >> 5)
+
+TRACE_EVENT(cxl_mem_mod_event,
+
+ TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+ struct cxl_evt_mem_mod_rec *rec),
+
+ TP_ARGS(dev_name, log, rec),
+
+ TP_STRUCT__entry(
+ /* Common */
+ __string(dev_name, dev_name)
+ __field(int, log)
+ __array(u8, id, UUID_SIZE)
+ __field(u32, flags)
+ __field(u16, handle)
+ __field(u16, related_handle)
+ __field(u64, timestamp)
+
+ /* Memory Module Event */
+ __field(u8, event_type)
+
+ /* Device Health Info */
+ __field(u8, health_status)
+ __field(u8, media_status)
+ __field(u8, life_used)
+ __field(u32, dirty_shutdown_cnt)
+ __field(u32, cor_vol_err_cnt)
+ __field(u32, cor_per_err_cnt)
+ __field(s16, device_temp)
+ __field(u8, add_status)
+ ),
+
+ TP_fast_assign(
+ /* Common */
+ __assign_str(dev_name, dev_name);
+ memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
+ __entry->log = log;
+ __entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
+ __entry->handle = le16_to_cpu(rec->hdr.handle);
+ __entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
+ __entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
+
+ /* Memory Module Event */
+ __entry->event_type = rec->event_type;
+
+ /* Device Health Info */
+ __entry->health_status = rec->info.health_status;
+ __entry->media_status = rec->info.media_status;
+ __entry->life_used = rec->info.life_used;
+ __entry->dirty_shutdown_cnt = le32_to_cpu(rec->info.dirty_shutdown_cnt);
+ __entry->cor_vol_err_cnt = le32_to_cpu(rec->info.cor_vol_err_cnt);
+ __entry->cor_per_err_cnt = le32_to_cpu(rec->info.cor_per_err_cnt);
+ __entry->device_temp = le16_to_cpu(rec->info.device_temp);
+ __entry->add_status = rec->info.add_status;
+ ),
+
+ TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
+ "evt_type='%s' health_status='%s' media_status='%s' as_life_used=%s " \
+ "as_dev_temp=%s as_cor_vol_err_cnt=%s as_cor_per_err_cnt=%s " \
+ "life_used=%u dev_temp=%d dirty_shutdown_cnt=%u cor_vol_err_cnt=%u " \
+ "cor_per_err_cnt=%u",
+ __get_str(dev_name), show_log_type(__entry->log),
+ __entry->timestamp, __entry->id, __entry->handle,
+ __entry->related_handle, show_hdr_flags(__entry->flags),
+
+ show_dev_evt_type(__entry->event_type),
+ show_health_status_flags(__entry->health_status),
+ show_media_status(__entry->media_status),
+ show_add_status(CXL_DHI_AS_LIFE_USED(__entry->add_status)),
+ show_add_status(CXL_DHI_AS_DEV_TEMP(__entry->add_status)),
+ show_add_status(CXL_DHI_AS_COR_VOL_ERR_CNT(__entry->add_status)),
+ show_add_status(CXL_DHI_AS_COR_PER_ERR_CNT(__entry->add_status)),
+ __entry->life_used, __entry->device_temp,
+ __entry->dirty_shutdown_cnt, __entry->cor_vol_err_cnt,
+ __entry->cor_per_err_cnt)
+);
+
+
#endif /* _CXL_TRACE_EVENTS_H */

/* This part must be outside protection */
--
2.35.3

2022-08-13 06:04:45

Subject: [RFC PATCH 5/9] cxl/mem: Trace DRAM Event Record

From: Ira Weiny <[email protected]>

CXL v3.0 section 8.2.9.2.1.2 defines the DRAM Event Record.

Determine if the event read is a DRAM event record and if so trace the
record.

Signed-off-by: Ira Weiny <[email protected]>

---
This record has a very odd byte layout with two 16-bit fields
(validity_flags and column) aligned on odd byte boundaries. In
addition, nibble_mask and row are oddly aligned.

I've made my best guess as to how the endianness of these fields should
be resolved. But I'm happy to hear from other folks if what I have is
wrong.

struct cxl_evt_dram_rec {
struct cxl_event_record_hdr hdr;
__le64 phys_addr;
u8 descriptor;
u8 type;
u8 transaction_type;
u16 validity_flags;
u8 channel;
u8 rank;
u8 nibble_mask[CXL_EVT_DER_NIBBLE_MASK_SIZE];
u8 bank_group;
u8 bank;
u8 row[CXL_EVT_DER_ROW_SIZE];
u16 column;
u8 correction_mask[CXL_EVT_DER_CORRECTION_MASK_SIZE];
} __packed;
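
The odd offsets can be checked mechanically. The sketch below mirrors
only the body of the record above: the common header is omitted, so the
offsets are relative to phys_addr. It also assumes a GCC/Clang-style
packed attribute and plain fixed-width integer types in place of the
kernel's. toy_dram_body is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Body of the DRAM record only; array sizes follow the patch (3, 3, 0x20). */
struct toy_dram_body {
	uint64_t phys_addr;
	uint8_t descriptor;
	uint8_t type;
	uint8_t transaction_type;
	uint16_t validity_flags;
	uint8_t channel;
	uint8_t rank;
	uint8_t nibble_mask[3];
	uint8_t bank_group;
	uint8_t bank;
	uint8_t row[3];
	uint16_t column;
	uint8_t correction_mask[0x20];
} __attribute__((packed));

/* Both 16-bit fields start on odd byte offsets within the body. */
static_assert(offsetof(struct toy_dram_body, validity_flags) % 2 == 1,
	      "validity_flags is expected at an odd offset");
static_assert(offsetof(struct toy_dram_body, column) % 2 == 1,
	      "column is expected at an odd offset");
```

Without the packed attribute a compiler would insert padding before both
16-bit fields and the structure would no longer match the byte layout.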
---
drivers/cxl/core/mbox.c | 16 +++++
drivers/cxl/cxlmem.h | 24 +++++++
include/trace/events/cxl-events.h | 114 ++++++++++++++++++++++++++++++
3 files changed, 154 insertions(+)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 0e433f072163..6414588a3c7b 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -717,6 +717,14 @@ static const uuid_t gen_media_event_uuid =
UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);

+/*
+ * DRAM Event Record
+ * CXL v3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+static const uuid_t dram_event_uuid =
+ UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
+ 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
+
static void cxl_trace_event_record(const char *dev_name,
enum cxl_event_log_type type,
struct cxl_get_event_payload *payload)
@@ -731,6 +739,14 @@ static void cxl_trace_event_record(const char *dev_name,
return;
}

+ if (uuid_equal(id, &dram_event_uuid)) {
+ struct cxl_evt_dram_rec *rec =
+ (struct cxl_evt_dram_rec *)&payload->record;
+
+ trace_cxl_dram_event(dev_name, type, rec);
+ return;
+ }
+
/* For unknown record types print just the header */
trace_cxl_event(dev_name, type, &payload->record);
}
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 33669459ae4b..50536c0a7850 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -421,6 +421,30 @@ struct cxl_evt_gen_media {
u8 component_id[CXL_EVT_GEN_MED_COMP_ID_SIZE];
} __packed;

+/*
+ * DRAM Event Record - DER
+ * CXL v3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+#define CXL_EVT_DER_NIBBLE_MASK_SIZE 3
+#define CXL_EVT_DER_ROW_SIZE 3
+#define CXL_EVT_DER_CORRECTION_MASK_SIZE 0x20
+struct cxl_evt_dram_rec {
+ struct cxl_event_record_hdr hdr;
+ __le64 phys_addr;
+ u8 descriptor;
+ u8 type;
+ u8 transaction_type;
+ u16 validity_flags;
+ u8 channel;
+ u8 rank;
+ u8 nibble_mask[CXL_EVT_DER_NIBBLE_MASK_SIZE];
+ u8 bank_group;
+ u8 bank;
+ u8 row[CXL_EVT_DER_ROW_SIZE];
+ u16 column;
+ u8 correction_mask[CXL_EVT_DER_CORRECTION_MASK_SIZE];
+} __packed;
+
struct cxl_mbox_get_partition_info {
__le64 active_volatile_cap;
__le64 active_persistent_cap;
diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
index b51c51fd4e62..db9b34ddd240 100644
--- a/include/trace/events/cxl-events.h
+++ b/include/trace/events/cxl-events.h
@@ -244,6 +244,120 @@ TRACE_EVENT(cxl_gen_media_event,
)
);

+/*
+ * DRAM Event Record - DER
+ *
+ * CXL v2.0 section 8.2.9.1.1.2; Table 155
+ */
+/*
+ * DRAM Event Record defines many fields the same as the General Media Event
+ * Record. Reuse those definitions as appropriate.
+ */
+#define CXL_DER_VALID_CHANNEL BIT(0)
+#define CXL_DER_VALID_RANK BIT(1)
+#define CXL_DER_VALID_NIBBLE BIT(2)
+#define CXL_DER_VALID_BANK_GROUP BIT(3)
+#define CXL_DER_VALID_BANK BIT(4)
+#define CXL_DER_VALID_ROW BIT(5)
+#define CXL_DER_VALID_COLUMN BIT(6)
+#define CXL_DER_VALID_CORRECTION_MASK BIT(7)
+#define show_dram_valid_flags(flags) __print_flags(flags, "|", \
+ { CXL_DER_VALID_CHANNEL, "CHANNEL" }, \
+ { CXL_DER_VALID_RANK, "RANK" }, \
+ { CXL_DER_VALID_NIBBLE, "NIBBLE" }, \
+ { CXL_DER_VALID_BANK_GROUP, "BANK GROUP" }, \
+ { CXL_DER_VALID_BANK, "BANK" }, \
+ { CXL_DER_VALID_ROW, "ROW" }, \
+ { CXL_DER_VALID_COLUMN, "COLUMN" }, \
+ { CXL_DER_VALID_CORRECTION_MASK, "CORRECTION MASK" } \
+)
+
+TRACE_EVENT(cxl_dram_event,
+
+ TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+ struct cxl_evt_dram_rec *rec),
+
+ TP_ARGS(dev_name, log, rec),
+
+ TP_STRUCT__entry(
+ /* Common */
+ __string(dev_name, dev_name)
+ __field(int, log)
+ __array(u8, id, UUID_SIZE)
+ __field(u32, flags)
+ __field(u16, handle)
+ __field(u16, related_handle)
+ __field(u64, timestamp)
+
+ /* DRAM */
+ __field(u64, phys_addr)
+ __field(u8, descriptor)
+ __field(u8, type)
+ __field(u8, transaction_type)
+ __field(u8, channel)
+ __field(u16, validity_flags)
+ __field(u16, column) /* Out of order to pack trace record */
+ __field(u32, nibble_mask)
+ __field(u32, row)
+ __array(u8, cor_mask, CXL_EVT_DER_CORRECTION_MASK_SIZE)
+ __field(u8, rank) /* Out of order to pack trace record */
+ __field(u8, bank_group) /* Out of order to pack trace record */
+ __field(u8, bank) /* Out of order to pack trace record */
+ ),
+
+ TP_fast_assign(
+ /* Common */
+ __assign_str(dev_name, dev_name);
+ memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
+ __entry->log = log;
+ __entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
+ __entry->handle = le16_to_cpu(rec->hdr.handle);
+ __entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
+ __entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
+
+ /* DRAM */
+ __entry->phys_addr = le64_to_cpu(rec->phys_addr);
+ __entry->descriptor = rec->descriptor;
+ __entry->type = rec->type;
+ __entry->transaction_type = rec->transaction_type;
+ __entry->validity_flags = le16_to_cpu(rec->validity_flags);
+ __entry->channel = rec->channel;
+ __entry->rank = rec->rank;
+ __entry->nibble_mask = rec->nibble_mask[0] << 24 |
+ rec->nibble_mask[1] << 16 |
+ rec->nibble_mask[2] << 8; /* 3 byte LE ? */
+ __entry->nibble_mask = le32_to_cpu(__entry->nibble_mask);
+ __entry->bank_group = rec->bank_group;
+ __entry->bank = rec->bank;
+ __entry->row = rec->row[0] << 24 |
+ rec->row[1] << 16 |
+ rec->row[2] << 8; /* 3 byte LE ? */
+ __entry->row = le32_to_cpu(__entry->row);
+ __entry->column = le16_to_cpu(rec->column);
+ memcpy(__entry->cor_mask, &rec->correction_mask,
+ CXL_EVT_DER_CORRECTION_MASK_SIZE);
+ ),
+
+ TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
+ "phys_addr=%llx volatile=%s desc='%s' type='%s' trans_type='%s' channel=%u " \
+ "rank=%u nibble_mask=%x bank_group=%u bank=%u row=%u column=%u " \
+ "cor_mask=%s valid_flags='%s'",
+ __get_str(dev_name), show_log_type(__entry->log),
+ __entry->timestamp, __entry->id, __entry->handle,
+ __entry->related_handle, show_hdr_flags(__entry->flags),
+ __entry->phys_addr & ~CXL_GMER_PHYS_ADDR_MASK,
+ (__entry->phys_addr & CXL_GMER_PHYS_ADDR_VOLATILE) ? "TRUE" : "FALSE",
+ show_event_desc_flags(__entry->descriptor),
+ show_mem_event_type(__entry->type),
+ show_trans_type(__entry->transaction_type),
+ __entry->channel, __entry->rank, __entry->nibble_mask,
+ __entry->bank_group, __entry->bank,
+ __entry->row, __entry->column,
+ __print_hex(__entry->cor_mask, CXL_EVT_DER_CORRECTION_MASK_SIZE),
+ show_dram_valid_flags(__entry->validity_flags)
+ )
+);
+
#endif /* _CXL_TRACE_EVENTS_H */

/* This part must be outside protection */
--
2.35.3
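
The two "3 byte LE ?" comments in TP_fast_assign() above leave open how the 3-byte nibble_mask and row fields should be decoded. The following is a minimal userspace sketch, assuming those fields are 24-bit little-endian values like the other multi-byte fields in the record. le24_to_u32() is a hypothetical name and is not part of the patch. If the assumption holds, the assembled value is already in CPU byte order, so no further le32_to_cpu() call would be needed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): assemble a 24-bit little-endian
 * field, stored as three bytes, into a CPU-order 32-bit value.
 * Byte 0 is the least significant byte.
 */
uint32_t le24_to_u32(const uint8_t b[3])
{
	assert(b);
	return (uint32_t)b[0] |
	       ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16);
}
```

In kernel code the existing get_unaligned_le24() helper performs the same assembly, so the open-coded shifts could probably be replaced by it.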

2022-08-13 06:05:13

Subject: [RFC PATCH 7/9] cxl/test: Add generic mock events

From: Ira Weiny <[email protected]>

Facilitate testing basic Get/Clear Event functionality by creating
multiple logs and generic events with made-up UUIDs.

Data is completely made up with data patterns which should be easy to
spot in trace output.

Test traces are easy to obtain with a small script such as this:

#!/bin/bash -x

devices=`find /sys/devices/platform -name 'cxl_mem*'`

# Generate fake events if reset is passed in
if [ "$1" == "reset" ]; then
for device in $devices; do
echo 1 > $device/mem*/event_reset
done
fi

# Turn on tracing
echo "" > /sys/kernel/tracing/trace
echo 1 > /sys/kernel/tracing/events/cxl_events/enable
echo 1 > /sys/kernel/tracing/tracing_on

# Generate fake interrupt
for device in $devices; do
echo 1 > $device/mem*/event_trigger
# just trigger 1
break;
done

# Turn off tracing and report events
echo 0 > /sys/kernel/tracing/tracing_on
cat /sys/kernel/tracing/trace

Signed-off-by: Ira Weiny <[email protected]>
---
tools/testing/cxl/test/mem.c | 291 +++++++++++++++++++++++++++++++++++
1 file changed, 291 insertions(+)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index e2f5445d24ff..87196d62acf5 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -9,6 +9,8 @@
#include <linux/bits.h>
#include <cxlmem.h>

+#include <trace/events/cxl-events.h>
+
#define LSA_SIZE SZ_128K
#define DEV_SIZE SZ_2G
#define EFFECT(x) (1U << x)
@@ -137,6 +139,287 @@ static int mock_partition_info(struct cxl_dev_state *cxlds,
return 0;
}

+/*
+ * Mock Events
+ */
+struct mock_event_log {
+ int cur_event;
+ int nr_events;
+ struct xarray events;
+};
+
+struct mock_event_store {
+ struct cxl_dev_state *cxlds;
+ struct mock_event_log *mock_logs[CXL_EVENT_TYPE_MAX];
+};
+
+DEFINE_XARRAY(mock_cxlds_event_store);
+
+void delete_event_store(void *ds)
+{
+ xa_store(&mock_cxlds_event_store, (unsigned long)ds, NULL, GFP_KERNEL);
+}
+
+void store_event_store(struct mock_event_store *es)
+{
+ struct cxl_dev_state *cxlds = es->cxlds;
+
+ if (xa_insert(&mock_cxlds_event_store, (unsigned long)cxlds, es,
+ GFP_KERNEL)) {
+ dev_err(cxlds->dev, "Event store not available for %s\n",
+ dev_name(cxlds->dev));
+ return;
+ }
+
+ devm_add_action_or_reset(cxlds->dev, delete_event_store, cxlds);
+}
+
+struct mock_event_log *find_event_log(struct cxl_dev_state *cxlds, int log_type)
+{
+ struct mock_event_store *es = xa_load(&mock_cxlds_event_store,
+ (unsigned long)cxlds);
+
+ if (!es || log_type >= CXL_EVENT_TYPE_MAX)
+ return NULL;
+ return es->mock_logs[log_type];
+}
+
+struct cxl_event_record_raw *get_cur_event(struct mock_event_log *log)
+{
+ return xa_load(&log->events, log->cur_event);
+}
+
+__le16 get_cur_event_handle(struct mock_event_log *log)
+{
+ return cpu_to_le16(log->cur_event);
+}
+
+static bool log_empty(struct mock_event_log *log)
+{
+ return log->cur_event == log->nr_events;
+}
+
+static int log_rec_left(struct mock_event_log *log)
+{
+ return log->nr_events - log->cur_event;
+}
+
+static void xa_events_destroy(void *l)
+{
+ struct mock_event_log *log = l;
+
+ xa_destroy(&log->events);
+}
+
+static void event_store_add_event(struct mock_event_store *es,
+ enum cxl_event_log_type log_type,
+ struct cxl_event_record_raw *event)
+{
+ struct mock_event_log *log;
+ struct device *dev = es->cxlds->dev;
+ int rc;
+
+ if (log_type >= CXL_EVENT_TYPE_MAX)
+ return;
+
+ log = es->mock_logs[log_type];
+ if (!log) {
+ log = devm_kzalloc(dev, sizeof(*log), GFP_KERNEL);
+ if (!log) {
+ dev_err(dev, "Failed to create %s log\n",
+ cxl_event_log_type_str(log_type));
+ return;
+ }
+ xa_init(&log->events);
+ devm_add_action(dev, xa_events_destroy, log);
+ es->mock_logs[log_type] = log;
+ }
+
+ rc = xa_insert(&log->events, log->nr_events, event, GFP_KERNEL);
+ if (rc) {
+ dev_err(dev, "Failed to store event %s log\n",
+ cxl_event_log_type_str(log_type));
+ return;
+ }
+ log->nr_events++;
+}
+
+/*
+ * Get and clear event only handle 1 record at a time as this is what is
+ * currently implemented in the main code.
+ */
+static int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
+{
+ struct cxl_get_event_payload *pl;
+ struct mock_event_log *log;
+ u8 log_type;
+
+ /* Valid request? */
+ if (cmd->size_in != 1)
+ return -EINVAL;
+
+ log_type = *((u8 *)cmd->payload_in);
+ if (log_type >= CXL_EVENT_TYPE_MAX)
+ return -EINVAL;
+
+ log = find_event_log(cxlds, log_type);
+ if (!log || log_empty(log))
+ goto no_data;
+
+ /* Don't handle more than 1 record at a time */
+ if (cmd->size_out < sizeof(*pl))
+ return -EINVAL;
+
+ pl = cmd->payload_out;
+ memset(pl, 0, sizeof(*pl));
+
+ pl->record_count = cpu_to_le16(1);
+
+ if (log_rec_left(log) > 1)
+ pl->flags |= CXL_GET_EVENT_FLAG_MORE_RECORDS;
+
+ memcpy(&pl->record, get_cur_event(log), sizeof(pl->record));
+ pl->record.hdr.handle = get_cur_event_handle(log);
+ return 0;
+
+no_data:
+ /* Room for header? */
+ if (cmd->size_out < (sizeof(*pl) - sizeof(pl->record)))
+ return -EINVAL;
+
+ memset(cmd->payload_out, 0, cmd->size_out);
+ return 0;
+}
+
+/*
+ * Get and clear event only handle 1 record at a time as this is what is
+ * currently implemented in the main code.
+ */
+static int mock_clear_event(struct cxl_dev_state *cxlds,
+ struct cxl_mbox_cmd *cmd)
+{
+ struct cxl_mbox_clear_event_payload *pl = cmd->payload_in;
+ struct mock_event_log *log;
+ u8 log_type = pl->event_log;
+
+ /* Don't handle more than 1 record at a time */
+ if (pl->nr_recs != 1)
+ return -EINVAL;
+
+ if (log_type >= CXL_EVENT_TYPE_MAX)
+ return -EINVAL;
+
+ log = find_event_log(cxlds, log_type);
+ if (!log)
+ return 0; /* No mock data in this log */
+
+ /*
+ * The current code clears events as they are read
+ * Test that behavior; not clearing from the middle of the log
+ */
+ if (log->cur_event != le16_to_cpu(pl->handle)) {
+ dev_err(cxlds->dev, "Clearing events out of order\n");
+ return -EINVAL;
+ }
+
+ log->cur_event++;
+ return 0;
+}
+
+static ssize_t event_reset_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct cxl_memdev *cxlmd = container_of(dev, struct cxl_memdev, dev);
+ int i;
+
+ for (i = CXL_EVENT_TYPE_INFO; i < CXL_EVENT_TYPE_MAX; i++) {
+ struct mock_event_log *log;
+
+ log = find_event_log(cxlmd->cxlds, i);
+ if (log)
+ log->cur_event = 0;
+ }
+
+ return count;
+}
+static DEVICE_ATTR_WO(event_reset);
+
+static ssize_t event_trigger_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct cxl_memdev *cxlmd = container_of(dev, struct cxl_memdev, dev);
+
+ cxl_mem_get_event_records(cxlmd->cxlds);
+
+ return count;
+}
+static DEVICE_ATTR_WO(event_trigger);
+
+static struct attribute *cxl_mock_event_attrs[] = {
+ &dev_attr_event_reset.attr,
+ &dev_attr_event_trigger.attr,
+ NULL
+};
+ATTRIBUTE_GROUPS(cxl_mock_event);
+
+void remove_mock_event_groups(void *dev)
+{
+ device_remove_groups(dev, cxl_mock_event_groups);
+}
+
+struct cxl_event_record_raw maint_needed = {
+ .hdr = {
+ .id = UUID_INIT(0xDEADBEEF, 0xCAFE, 0xBABE, 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
+ .flags_length = cpu_to_le32((CXL_EVENT_RECORD_FLAG_MAINT_NEEDED << 8) |
+ sizeof(struct cxl_event_record_raw)),
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0xa5b6),
+ },
+ .data = { 0xDE, 0xAD, 0xBE, 0xEF },
+};
+
+struct cxl_event_record_raw hardware_replace = {
+ .hdr = {
+ .id = UUID_INIT(0xBABECAFE, 0xBEEF, 0xDEAD, 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
+ .flags_length = cpu_to_le32((CXL_EVENT_RECORD_FLAG_HW_REPLACE << 8) |
+ sizeof(struct cxl_event_record_raw)),
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0xb6a5),
+ },
+ .data = { 0xDE, 0xAD, 0xBE, 0xEF },
+};
+
+static void devm_cxl_mock_event_logs(struct cxl_memdev *cxlmd)
+{
+ struct device *dev = &cxlmd->dev;
+ struct mock_event_store *es;
+
+ /*
+ * The memory device gets the sysfs attributes such that the cxlmd
+ * pointer can be used to get to a cxlds pointer.
+ */
+ if (device_add_groups(dev, cxl_mock_event_groups))
+ return;
+ if (devm_add_action_or_reset(dev, remove_mock_event_groups, dev))
+ return;
+
+ /*
+ * All the mock event data hangs off the device itself.
+ */
+ es = devm_kzalloc(cxlmd->cxlds->dev, sizeof(*es), GFP_KERNEL);
+ if (!es)
+ return;
+ es->cxlds = cxlmd->cxlds;
+
+ event_store_add_event(es, CXL_EVENT_TYPE_INFO, &maint_needed);
+
+ event_store_add_event(es, CXL_EVENT_TYPE_FATAL, &hardware_replace);
+
+ store_event_store(es);
+}
+
static int mock_get_lsa(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
{
struct cxl_mbox_get_lsa *get_lsa = cmd->payload_in;
@@ -224,6 +507,12 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
case CXL_MBOX_OP_GET_PARTITION_INFO:
rc = mock_partition_info(cxlds, cmd);
break;
+ case CXL_MBOX_OP_GET_EVENT_RECORD:
+ rc = mock_get_event(cxlds, cmd);
+ break;
+ case CXL_MBOX_OP_CLEAR_EVENT_RECORD:
+ rc = mock_clear_event(cxlds, cmd);
+ break;
case CXL_MBOX_OP_SET_LSA:
rc = mock_set_lsa(cxlds, cmd);
break;
@@ -285,6 +574,8 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
if (IS_ERR(cxlmd))
return PTR_ERR(cxlmd);

+ devm_cxl_mock_event_logs(cxlmd);
+
cxl_mem_get_event_records(cxlds);

if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
--
2.35.3
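
The mock log in this patch boils down to a cursor (cur_event) moving through nr_events stored records: Get returns the record at the cursor, and Clear only accepts the handle at the cursor. The following is a rough userspace model of that behaviour, not the test code itself. All model_* names are hypothetical, and the xarray storage is left out because only the two counters matter here.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for struct mock_event_log, minus the xarray. */
struct model_log {
	int cur_event;	/* handle of the next record to hand out */
	int nr_events;	/* total number of records stored */
};

int model_rec_left(const struct model_log *log)
{
	return log->nr_events - log->cur_event;
}

/*
 * Get: returns 1 and fills in the handle if a record is available,
 * 0 if the log is empty. *more mirrors the "more records" flag and is
 * set when further records remain after this one.
 */
int model_get(const struct model_log *log, int *handle, bool *more)
{
	assert(log && handle && more);
	if (model_rec_left(log) == 0)
		return 0;
	*handle = log->cur_event;
	*more = model_rec_left(log) > 1;
	return 1;
}

/* Clear: only the record at the cursor may be cleared, as in the mock. */
int model_clear(struct model_log *log, int handle)
{
	if (handle != log->cur_event)
		return -1;	/* out-of-order clear is rejected */
	log->cur_event++;
	return 0;
}

/* Read and clear one record at a time until the log is empty. */
int model_drain(struct model_log *log)
{
	int handle, n = 0;
	bool more;

	while (model_get(log, &handle, &more)) {
		if (model_clear(log, handle))
			break;
		n++;
	}
	return n;
}
```

Writing to event_reset corresponds to setting cur_event back to 0 in this model, which makes the same records readable again.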

2022-08-13 07:10:32

Subject: [RFC PATCH 9/9] cxl/test: Simulate event log overflow

From: Ira Weiny <[email protected]>

Log overflow is marked by a separate trace message.

Simulate a log with lots of messages and flag overflow until it is
drained a bit.

Signed-off-by: Ira Weiny <[email protected]>
---
tools/testing/cxl/test/mem.c | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index c5d7857ae2e5..87e6b10896c9 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -244,6 +244,15 @@ static void event_store_add_event(struct mock_event_store *es,
log->nr_events++;
}

+static u16 log_overflow(struct mock_event_log *log)
+{
+ int cnt = log_rec_left(log) - 5;
+
+ if (cnt < 0)
+ return 0;
+ return cnt;
+}
+
/*
* Get and clear event only handle 1 record at a time as this is what is
* currently implemented in the main code.
@@ -253,6 +262,7 @@ static int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
struct cxl_get_event_payload *pl;
struct mock_event_log *log;
u8 log_type;
+ u16 nr_overflow;

/* Valid request? */
if (cmd->size_in != 1)
@@ -278,6 +288,20 @@ static int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
if (log_rec_left(log) > 1)
pl->flags |= CXL_GET_EVENT_FLAG_MORE_RECORDS;

+ nr_overflow = log_overflow(log);
+ if (nr_overflow) {
+ u64 ns;
+
+ pl->flags |= CXL_GET_EVENT_FLAG_OVERFLOW;
+ pl->overflow_err_count = cpu_to_le16(nr_overflow);
+ ns = ktime_get_real_ns();
+ ns -= 5000000000; /* 5s ago */
+ pl->first_overflow_timestamp = cpu_to_le64(ns);
+ ns = ktime_get_real_ns();
+ ns -= 1000000000; /* 1s ago */
+ pl->last_overflow_timestamp = cpu_to_le64(ns);
+ }
+
memcpy(&pl->record, get_cur_event(log), sizeof(pl->record));
pl->record.hdr.handle = get_cur_event_handle(log);
return 0;
@@ -483,6 +507,18 @@ static void devm_cxl_mock_event_logs(struct cxl_memdev *cxlmd)
event_store_add_event(es, CXL_EVENT_TYPE_INFO,
(struct cxl_event_record_raw *)&mem_mod_rec);

+ event_store_add_event(es, CXL_EVENT_TYPE_FAIL, &maint_needed);
+ event_store_add_event(es, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+ event_store_add_event(es, CXL_EVENT_TYPE_FAIL,
+ (struct cxl_event_record_raw *)&dram_rec);
+ event_store_add_event(es, CXL_EVENT_TYPE_FAIL,
+ (struct cxl_event_record_raw *)&gen_media);
+ event_store_add_event(es, CXL_EVENT_TYPE_FAIL,
+ (struct cxl_event_record_raw *)&mem_mod_rec);
+ event_store_add_event(es, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+ event_store_add_event(es, CXL_EVENT_TYPE_FAIL,
+ (struct cxl_event_record_raw *)&dram_rec);
+
event_store_add_event(es, CXL_EVENT_TYPE_FATAL, &hardware_replace);
event_store_add_event(es, CXL_EVENT_TYPE_FATAL,
(struct cxl_event_record_raw *)&dram_rec);
--
2.35.3
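
The overflow simulation above reports every record beyond the first five still pending as overflowed, so the count shrinks as records are cleared and the overflow flag stops being set once five or fewer remain. The following is a small userspace sketch of that arithmetic. model_overflow() and MODEL_LOG_THRESHOLD are hypothetical names, and the threshold simply mirrors the literal 5 in log_overflow().

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: mirrors the literal 5 used by log_overflow() in the patch. */
#define MODEL_LOG_THRESHOLD 5

/* Number of records reported as overflowed for a given pending count. */
uint16_t model_overflow(int records_left)
{
	int cnt = records_left - MODEL_LOG_THRESHOLD;

	assert(records_left >= 0);
	return cnt < 0 ? 0 : (uint16_t)cnt;
}
```

With seven records pending the model reports two overflowed records; after two of them are cleared it reports none.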

2022-08-16 16:58:19

by Steven Rostedt

Subject: Re: [RFC PATCH 9/9] cxl/test: Simulate event log overflow

I just skimmed through the rest of the patches, and it looks fine to me.

-- Steve

2022-08-22 17:08:30

by Davidlohr Bueso

Subject: Re: [RFC PATCH 0/9] CXL: Read and clear event logs

On Fri, 12 Aug 2022, [email protected] wrote:

>From: Ira Weiny <[email protected]>
>
>Event records inform the OS of various device events. Events are not needed
>for any kernel operation but various user level software will want to track
>events.
>
>Add event reporting through the trace event mechanism. On driver load read and
>clear all device events.
>
>Normally interrupts will trigger new events to be reported as they occur.
>Because the interrupt code is still being worked on this series provides a
>cxl-test mechanism to create a series of events and trigger the reporting of
>those events.

Where is this irq code being worked on? I've asked about this for async mbox
commands, and Jonathan has also posted some code for the PMU implementation.

Could we not just start with initial MSI/MSI-X support? Then gradually
interested users can be added? So each "feature" would need to implement
its "get message number" and to install the isr just do the standard:

irq = pci_irq_vector(pdev, num);
irq_name = devm_kasprintf(dev, GFP_KERNEL, "%s_%s\n", dev_name(dev),
cxl_irq_cap_table[feature].name);
rc = devm_request_irq(dev, irq, isr_fn, IRQF_SHARED, irq_name, info);

The only complexity I see for this is to know the number of vectors to request
a priori, for which we'd have to get the largest value of all CXL features that
can support interrupts. Something like the following? One thing I have not
considered in this is the DOE stuff.

Thanks,
Davidlohr

------
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 88e3a8e54b6a..b334d2f497c1 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -245,6 +245,8 @@ struct cxl_dev_state {
resource_size_t component_reg_phys;
u64 serial;

+ int irq_type; /* MSI-X, MSI */
+
struct xarray doe_mbs;

int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index eec597dbe763..95f4b91f43b1 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -53,15 +53,6 @@
#define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
#define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)

-/* Register Block Identifier (RBI) */
-enum cxl_regloc_type {
- CXL_REGLOC_RBI_EMPTY = 0,
- CXL_REGLOC_RBI_COMPONENT,
- CXL_REGLOC_RBI_VIRT,
- CXL_REGLOC_RBI_MEMDEV,
- CXL_REGLOC_RBI_TYPES
-};
-
static inline resource_size_t cxl_regmap_to_base(struct pci_dev *pdev,
struct cxl_register_map *map)
{
@@ -75,4 +66,44 @@ int devm_cxl_port_enumerate_dports(struct cxl_port *port);
struct cxl_dev_state;
int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm);
void read_cdat_data(struct cxl_port *port);
+
+#define CXL_IRQ_CAPABILITY_TABLE \
+ C(ISOLATION, "isolation", NULL), \
+ C(PMU, "pmu_overflow", NULL), /* per pmu instance */ \
+ C(MBOX, "mailbox", NULL), /* primary-only */ \
+ C(EVENT, "event", NULL),
+
+#undef C
+#define C(a, b, c) CXL_IRQ_CAPABILITY_##a
+enum { CXL_IRQ_CAPABILITY_TABLE };
+#undef C
+#define C(a, b, c) { b, c }
+/**
+ * struct cxl_irq_cap - CXL feature that is capable of receiving MSI/MSI-X irqs.
+ *
+ * @name: Name of the device generating this interrupt.
+ * @get_max_msgnum: Get the feature's largest interrupt message number. In cases
+ * where there is only one instance it also indicates which
+ * MSI/MSI-X vector is used for the interrupt message generated
+ * in association with the feature. If the feature does not
+ * have the Interrupt Supported bit set, then return -1.
+ */
+struct cxl_irq_cap {
+ const char *name;
+ int (*get_max_msgnum)(struct cxl_dev_state *cxlds);
+};
+
+static const
+struct cxl_irq_cap cxl_irq_cap_table[] = { CXL_IRQ_CAPABILITY_TABLE };
+#undef C
+
+/* Register Block Identifier (RBI) */
+enum cxl_regloc_type {
+ CXL_REGLOC_RBI_EMPTY = 0,
+ CXL_REGLOC_RBI_COMPONENT,
+ CXL_REGLOC_RBI_VIRT,
+ CXL_REGLOC_RBI_MEMDEV,
+ CXL_REGLOC_RBI_TYPES
+};
+
#endif /* __CXL_PCI_H__ */
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index faeb5d9d7a7a..c0fe78e0559b 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -387,6 +387,52 @@ static int cxl_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
return rc;
}

+static void cxl_pci_free_irq_vectors(void *data)
+{
+ pci_free_irq_vectors(data);
+}
+
+static int cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
+{
+ struct device *dev = cxlds->dev;
+ struct pci_dev *pdev = to_pci_dev(dev);
+ int rc, i, vectors = -1;
+
+ for (i = 0; i < ARRAY_SIZE(cxl_irq_cap_table); i++) {
+ int irq;
+
+ if (!cxl_irq_cap_table[i].get_max_msgnum)
+ continue;
+
+ irq = cxl_irq_cap_table[i].get_max_msgnum(cxlds);
+ vectors = max_t(int, irq, vectors);
+ }
+
+ if (vectors == -1)
+ return -EINVAL; /* no irq support whatsoever */
+
+ vectors++;
+ rc = pci_alloc_irq_vectors(pdev, vectors, vectors, PCI_IRQ_MSIX);
+ if (rc < 0) {
+ rc = pci_alloc_irq_vectors(pdev, vectors, vectors, PCI_IRQ_MSI);
+ if (rc < 0)
+ return rc;
+
+ cxlds->irq_type = PCI_IRQ_MSI;
+ } else {
+ cxlds->irq_type = PCI_IRQ_MSIX;
+ }
+
+ if (rc != vectors) {
+ pci_err(pdev, "Not enough interrupts; use polling where supported\n");
+ /* Some got allocated; clean them up */
+ cxl_pci_free_irq_vectors(pdev);
+ return -ENOSPC;
+ }
+
+ return devm_add_action_or_reset(dev, cxl_pci_free_irq_vectors, pdev);
+}
+
static void cxl_pci_destroy_doe(void *mbs)
{
xa_destroy(mbs);
@@ -476,6 +522,9 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)

cxlds->component_reg_phys = cxl_regmap_to_base(pdev, &map);

+ if (cxl_pci_alloc_irq_vectors(cxlds))
+ cxlds->irq_type = 0;
+
devm_cxl_pci_create_doe(cxlds);

rc = cxl_pci_setup_mailbox(cxlds);
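
The vector-count logic in the diff above can be tried out in userspace. This is only a sketch under stated assumptions: struct model_irq_cap stands in for struct cxl_irq_cap, the callbacks take no device state, and the message numbers returned by the example callbacks are made up. It shows that the number of vectors to request is the largest reported message number plus one, and that a table with no callbacks (as in the diff, where every entry is NULL) yields "no interrupt support".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cxl_irq_cap. */
struct model_irq_cap {
	const char *name;
	int (*get_max_msgnum)(void);	/* -1: feature has no interrupt */
};

/*
 * Returns the number of vectors to request, or -1 if no feature
 * reports a usable message number.
 */
int model_count_vectors(const struct model_irq_cap *caps, size_t n)
{
	int max_msgnum = -1;
	size_t i;

	assert(caps || n == 0);
	for (i = 0; i < n; i++) {
		int msgnum;

		if (!caps[i].get_max_msgnum)
			continue;	/* feature not wired up yet */
		msgnum = caps[i].get_max_msgnum();
		if (msgnum > max_msgnum)
			max_msgnum = msgnum;
	}
	return max_msgnum < 0 ? -1 : max_msgnum + 1;
}

/* Example callbacks; the message numbers are invented for illustration. */
int model_mbox_msgnum(void) { return 0; }
int model_event_msgnum(void) { return 3; }
int model_pmu_msgnum(void) { return -1; }
```

Because message numbers start at 0, a feature reporting message number 3 forces four vectors to be requested even if numbers 1 and 2 are unused.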

2022-08-22 23:08:19

Subject: Re: [RFC PATCH 0/9] CXL: Read and clear event logs

On Mon, Aug 22, 2022 at 09:18:02AM -0700, Davidlohr Bueso wrote:
> On Fri, 12 Aug 2022, [email protected] wrote:
>
> > From: Ira Weiny <[email protected]>
> >
> > Event records inform the OS of various device events. Events are not needed
> > for any kernel operation but various user level software will want to track
> > events.
> >
> > Add event reporting through the trace event mechanism. On driver load read and
> > clear all device events.
> >
> > Normally interrupts will trigger new events to be reported as they occur.
> > Because the interrupt code is still being worked on this series provides a
> > cxl-test mechanism to create a series of events and trigger the reporting of
> > those events.
>
> Where is this irq code being worked on? I've asked about this for async mbox
> commands, and Jonathan has also posted some code for the PMU implementation.

I'm still trying to work out how to share irqs between PCI and CXL, mainly
for DOE.

I thought that we could skip IRQ support for DOE completely and this would
support your proposal below. But I just found that:

"A device may interrupt the host when CDAT content changes using the MSI
associated with this DOE Capability instance."

So I guess it needs to be supported at some point.

>
> Could we not just start with initial MSI/MSI-X support? Then gradually
> interested users can be added? So each "feature" would need to implement
> its "get message number" and to install the isr just do the standard:
>
> irq = pci_irq_vector(pdev, num);
> irq_name = devm_kasprintf(dev, GFP_KERNEL, "%s_%s\n", dev_name(dev),
> cxl_irq_cap_table[feature].name);
> rc = devm_request_irq(dev, irq, isr_fn, IRQF_SHARED, irq_name, info);
>
> The only complexity I see for this is to know the number of vectors to request
> a priori, for which we'd have to get the largest value of all CXL features that
> can support interrupts. Something like the following?

Generally it seems ok but I have questions below.

> One thing I have not
> considered in this is the DOE stuff.

I think this is the harder thing to support because of needing to allow both
the PCI layer and the CXL layer to create irqs. Potentially at different
times.

>
> Thanks,
> Davidlohr
>
> ------
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 88e3a8e54b6a..b334d2f497c1 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -245,6 +245,8 @@ struct cxl_dev_state {
> resource_size_t component_reg_phys;
> u64 serial;
>
> + int irq_type; /* MSI-X, MSI */
> +
> struct xarray doe_mbs;
>
> int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index eec597dbe763..95f4b91f43b1 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -53,15 +53,6 @@
> #define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
> #define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)
>
> -/* Register Block Identifier (RBI) */
> -enum cxl_regloc_type {
> - CXL_REGLOC_RBI_EMPTY = 0,
> - CXL_REGLOC_RBI_COMPONENT,
> - CXL_REGLOC_RBI_VIRT,
> - CXL_REGLOC_RBI_MEMDEV,
> - CXL_REGLOC_RBI_TYPES
> -};

Why move this?

> -
> static inline resource_size_t cxl_regmap_to_base(struct pci_dev *pdev,
> struct cxl_register_map *map)
> {
> @@ -75,4 +66,44 @@ int devm_cxl_port_enumerate_dports(struct cxl_port *port);
> struct cxl_dev_state;
> int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm);
> void read_cdat_data(struct cxl_port *port);
> +
> +#define CXL_IRQ_CAPABILITY_TABLE \
> + C(ISOLATION, "isolation", NULL), \
> + C(PMU, "pmu_overflow", NULL), /* per pmu instance */ \
> + C(MBOX, "mailbox", NULL), /* primary-only */ \
> + C(EVENT, "event", NULL),

This is defining get_max_msgnum to NULL right?

> +
> +#undef C
> +#define C(a, b, c) CXL_IRQ_CAPABILITY_##a
> +enum { CXL_IRQ_CAPABILITY_TABLE };
> +#undef C
> +#define C(a, b, c) { b, c }
> +/**
> + * struct cxl_irq_cap - CXL feature that is capable of receiving MSI/MSI-X irqs.
> + *
> + * @name: Name of the device generating this interrupt.
> + * @get_max_msgnum: Get the feature's largest interrupt message number. In cases
> + * where there is only one instance it also indicates which
> + * MSI/MSI-X vector is used for the interrupt message generated
> + * in association with the feature. If the feature does not
> + * have the Interrupt Supported bit set, then return -1.
> + */
> +struct cxl_irq_cap {
> + const char *name;
> + int (*get_max_msgnum)(struct cxl_dev_state *cxlds);
> +};
> +
> +static const
> +struct cxl_irq_cap cxl_irq_cap_table[] = { CXL_IRQ_CAPABILITY_TABLE };
> +#undef C

Why all this macro magic?

> +
> +/* Register Block Identifier (RBI) */
> +enum cxl_regloc_type {
> + CXL_REGLOC_RBI_EMPTY = 0,
> + CXL_REGLOC_RBI_COMPONENT,
> + CXL_REGLOC_RBI_VIRT,
> + CXL_REGLOC_RBI_MEMDEV,
> + CXL_REGLOC_RBI_TYPES
> +};
> +
> #endif /* __CXL_PCI_H__ */
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index faeb5d9d7a7a..c0fe78e0559b 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -387,6 +387,52 @@ static int cxl_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
> return rc;
> }
>
> +static void cxl_pci_free_irq_vectors(void *data)
> +{
> + pci_free_irq_vectors(data);
> +}
> +
> +static int cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
> +{
> + struct device *dev = cxlds->dev;
> + struct pci_dev *pdev = to_pci_dev(dev);
> + int rc, i, vectors = -1;
> +
> + for (i = 0; i < ARRAY_SIZE(cxl_irq_cap_table); i++) {
> + int irq;
> +
> + if (!cxl_irq_cap_table[i].get_max_msgnum)
> + continue;
> +
> + irq = cxl_irq_cap_table[i].get_max_msgnum(cxlds);
> + vectors = max_t(int, irq, vectors);
> + }
> +
> + if (vectors == -1)
> + return -EINVAL; /* no irq support whatsoever */
> +
> + vectors++;

This is pretty much what earlier versions of the DOE code did, except that
only one get_max_msgnum() call was defined (for DOE). But there
was a lot of debate about how to share vectors with the PCI layer, and
eventually we got rid of it. I'm still trying to figure it out. Sorry for
being slow.

Perhaps we do this for this series. However, won't we have an issue if we want
to support switch events?

Ira

> + rc = pci_alloc_irq_vectors(pdev, vectors, vectors, PCI_IRQ_MSIX);
> + if (rc < 0) {
> + rc = pci_alloc_irq_vectors(pdev, vectors, vectors, PCI_IRQ_MSI);
> + if (rc < 0)
> + return rc;
> +
> + cxlds->irq_type = PCI_IRQ_MSI;
> + } else {
> + cxlds->irq_type = PCI_IRQ_MSIX;
> + }
> +
> + if (rc != vectors) {
> + pci_err(pdev, "Not enough interrupts; use polling where supported\n");
> + /* Some got allocated; clean them up */
> + cxl_pci_free_irq_vectors(pdev);
> + return -ENOSPC;
> + }
> +
> + return devm_add_action_or_reset(dev, cxl_pci_free_irq_vectors, pdev);
> +}
> +
> static void cxl_pci_destroy_doe(void *mbs)
> {
> xa_destroy(mbs);
> @@ -476,6 +522,9 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>
> cxlds->component_reg_phys = cxl_regmap_to_base(pdev, &map);
>
> + if (cxl_pci_alloc_irq_vectors(cxlds))
> + cxlds->irq_type = 0;
> +
> devm_cxl_pci_create_doe(cxlds);
>
> rc = cxl_pci_setup_mailbox(cxlds);

2022-08-23 18:29:01

by Davidlohr Bueso

Subject: Re: [RFC PATCH 0/9] CXL: Read and clear event logs

On Mon, 22 Aug 2022, Ira Weiny wrote:

>Generally it seems ok but I have questions below.
>
>> One thing I have not
>> considered in this is the DOE stuff.
>
>I think this is the harder thing to support because of needing to allow both
>the PCI layer and the CXL layer to create irqs. Potentially at different
>times.

I agree.

>> -/* Register Block Identifier (RBI) */
>> -enum cxl_regloc_type {
>> - CXL_REGLOC_RBI_EMPTY = 0,
>> - CXL_REGLOC_RBI_COMPONENT,
>> - CXL_REGLOC_RBI_VIRT,
>> - CXL_REGLOC_RBI_MEMDEV,
>> - CXL_REGLOC_RBI_TYPES
>> -};
>
>Why move this?

That was sloppy of me, sorry. I wanted to reuse struct cxlds forward declaration,
no idea why that diff formed.

>> -
>> static inline resource_size_t cxl_regmap_to_base(struct pci_dev *pdev,
>> struct cxl_register_map *map)
>> {
>> @@ -75,4 +66,44 @@ int devm_cxl_port_enumerate_dports(struct cxl_port *port);
>> struct cxl_dev_state;
>> int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm);
>> void read_cdat_data(struct cxl_port *port);
>> +
>> +#define CXL_IRQ_CAPABILITY_TABLE \
>> + C(ISOLATION, "isolation", NULL), \
>> + C(PMU, "pmu_overflow", NULL), /* per pmu instance */ \
>> + C(MBOX, "mailbox", NULL), /* primary-only */ \
>> + C(EVENT, "event", NULL),
>
>This is defining get_max_msgnum to NULL right?

Yes. So until there are any users everything's a nop.

>> +
>> +#undef C
>> +#define C(a, b, c) CXL_IRQ_CAPABILITY_##a
>> +enum { CXL_IRQ_CAPABILITY_TABLE };
>> +#undef C
>> +#define C(a, b, c) { b, c }
>> +/**
>> + * struct cxl_irq_cap - CXL feature that is capable of receiving MSI/MSI-X irqs.
>> + *
>> + * @name: Name of the device generating this interrupt.
>> + * @get_max_msgnum: Get the feature's largest interrupt message number. In cases
>> + * where there is only one instance it also indicates which
>> + * MSI/MSI-X vector is used for the interrupt message generated
>> + * in association with the feature. If the feature does not
>> + * have the Interrupt Supported bit set, then return -1.
>> + */
>> +struct cxl_irq_cap {
>> + const char *name;
>> + int (*get_max_msgnum)(struct cxl_dev_state *cxlds);
>> +};
>> +
>> +static const
>> +struct cxl_irq_cap cxl_irq_cap_table[] = { CXL_IRQ_CAPABILITY_TABLE };
>> +#undef C
>
>Why all this macro magic?

A nifty trick Dan likes, it avoids duplicating the fields (enums + the table).

>> +
>> +/* Register Block Identifier (RBI) */
>> +enum cxl_regloc_type {
>> + CXL_REGLOC_RBI_EMPTY = 0,
>> + CXL_REGLOC_RBI_COMPONENT,
>> + CXL_REGLOC_RBI_VIRT,
>> + CXL_REGLOC_RBI_MEMDEV,
>> + CXL_REGLOC_RBI_TYPES
>> +};
>> +
>> #endif /* __CXL_PCI_H__ */
>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>> index faeb5d9d7a7a..c0fe78e0559b 100644
>> --- a/drivers/cxl/pci.c
>> +++ b/drivers/cxl/pci.c
>> @@ -387,6 +387,52 @@ static int cxl_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
>> return rc;
>> }
>>
>> +static void cxl_pci_free_irq_vectors(void *data)
>> +{
>> + pci_free_irq_vectors(data);
>> +}
>> +
>> +static int cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
>> +{
>> + struct device *dev = cxlds->dev;
>> + struct pci_dev *pdev = to_pci_dev(dev);
>> + int rc, i, vectors = -1;
>> +
>> + for (i = 0; i < ARRAY_SIZE(cxl_irq_cap_table); i++) {
>> + int irq;
>> +
>> + if (!cxl_irq_cap_table[i].get_max_msgnum)
>> + continue;
>> +
>> + irq = cxl_irq_cap_table[i].get_max_msgnum(cxlds);
>> + vectors = max_t(int, irq, vectors);
>> + }
>> +
>> + if (vectors == -1)
>> + return -EINVAL; /* no irq support whatsoever */
>> +
>> + vectors++;
>
>This is pretty much what earlier versions of the DOE code did with the
>exception of only have 1 get_max_msgnum() calls defined (for DOE). But there
>was a lot of debate about how to share vectors with the PCI layer. And
>eventually we got rid of it. I'm still trying to figure it out. Sorry for
>being slow.

That makes sense, thanks for the explanation. And no, not slow; it is _I_
who needs to go re-read the DOE stuff with more attention. While I knew
this was the hardest part, all I really wanted was basic irq support to
add to the bg cmd handling series.

>Perhaps we do this for this series. However, won't we have an issue if we want
>to support switch events?

If possible, could you elaborate more on this?

Thanks,
Davidlohr

2022-08-24 10:24:41

by Jonathan Cameron

Subject: Re: [RFC PATCH 0/9] CXL: Read and clear event logs

On Mon, 22 Aug 2022 15:53:54 -0700
Ira Weiny <[email protected]> wrote:

> On Mon, Aug 22, 2022 at 09:18:02AM -0700, Davidlohr Bueso wrote:
> > On Fri, 12 Aug 2022, [email protected] wrote:
> >
> > > From: Ira Weiny <[email protected]>
> > >
> > > Event records inform the OS of various device events. Events are not needed
> > > for any kernel operation but various user level software will want to track
> > > events.
> > >
> > > Add event reporting through the trace event mechanism. On driver load read and
> > > clear all device events.
> > >
> > > Normally interrupts will trigger new events to be reported as they occur.
> > > Because the interrupt code is still being worked on this series provides a
> > > cxl-test mechanism to create a series of events and trigger the reporting of
> > > those events.
> >
> > Where is this irq code being worked on? I've asked about this for async mbox
> > commands, and Jonathan has also posted some code for the PMU implementation.
>
> I'm still trying to work out how to share irq's between PCI and CXL. Mainly
> for DOE.
>
> I thought that we could skip IRQ support for DOE completely and this would
> support your proposal below. But I just found that:
>
> "A device may interrupt the host when CDAT content changes using the MSI
> associated with this DOE Capability instance."

As of today that doesn't work because there is no status flag anywhere to let
you know that was the interrupt source.

It's been raised in appropriate places, but I can't say any more on that
until stuff is published.

Hence I'd not worry about that corner for now.

>
> So I guess it needs to be supported at some point.
>
> >
> > Could we not just start with an initial MSI/MSI-X support? Then gradually
> > interested users can be added? So each "feature" would need to do implement
> > it's "get message number" and to install the isr just do the standard:
> >
> > irq = pci_irq_vector(pdev, num);
> > irq_name = devm_kasprintf(dev, GFP_KERNEL, "%s_%s\n", dev_name(dev),
> > cxl_irq_cap_table[feature].name);
> > rc = devm_request_irq(dev, irq, isr_fn, IRQF_SHARED, irq_name, info);
> >
> > The only complexity I see for this is to know the number of vectors to request
> > apriori, for which we'd have to get the larges value of all CXL features that
> > can support interrupts. Something like the following?
>
> Generally it seems ok but I have questions below.
>
> > One thing I have not
> > considered in this is the DOE stuff.
>
> I think this is the harder thing to support because of needing to allow both
> the PCI layer and the CXL layer to create irqs. Potentially at different
> times.

My reasoning on this is that IRQ creation has to be done by
the PCI device driver. That may result in some juggling and late starting,
or indeed restarting, of DOE mailboxes once we know the list of vectors
(e.g. query them by polling, then a later driver registration can request
enabling the DOE with an irq).
Alternatively, it needs the ability to dynamically increase the number of
requested IRQ vectors.

>
> >
> > Thanks,
> > Davidlohr
> >
> > ------
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 88e3a8e54b6a..b334d2f497c1 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -245,6 +245,8 @@ struct cxl_dev_state {
> > resource_size_t component_reg_phys;
> > u64 serial;
> >
> > + int irq_type; /* MSI-X, MSI */
> > +
> > struct xarray doe_mbs;
> >
> > int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
> > diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> > index eec597dbe763..95f4b91f43b1 100644
> > --- a/drivers/cxl/cxlpci.h
> > +++ b/drivers/cxl/cxlpci.h
> > @@ -53,15 +53,6 @@
> > #define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
> > #define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)
> >
> > -/* Register Block Identifier (RBI) */
> > -enum cxl_regloc_type {
> > - CXL_REGLOC_RBI_EMPTY = 0,
> > - CXL_REGLOC_RBI_COMPONENT,
> > - CXL_REGLOC_RBI_VIRT,
> > - CXL_REGLOC_RBI_MEMDEV,
> > - CXL_REGLOC_RBI_TYPES
> > -};
>
> Why move this?
>
> > -
> > static inline resource_size_t cxl_regmap_to_base(struct pci_dev *pdev,
> > struct cxl_register_map *map)
> > {
> > @@ -75,4 +66,44 @@ int devm_cxl_port_enumerate_dports(struct cxl_port *port);
> > struct cxl_dev_state;
> > int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm);
> > void read_cdat_data(struct cxl_port *port);
> > +
> > +#define CXL_IRQ_CAPABILITY_TABLE \
> > + C(ISOLATION, "isolation", NULL), \
> > + C(PMU, "pmu_overflow", NULL), /* per pmu instance */ \
> > + C(MBOX, "mailbox", NULL), /* primary-only */ \
> > + C(EVENT, "event", NULL),
>
> This is defining get_max_msgnum to NULL right?
>
> > +
> > +#undef C
> > +#define C(a, b, c) CXL_IRQ_CAPABILITY_##a
> > +enum { CXL_IRQ_CAPABILITY_TABLE };
> > +#undef C
> > +#define C(a, b, c) { b, c }
> > +/**
> > + * struct cxl_irq_cap - CXL feature that is capable of receiving MSI/MSI-X irqs.
> > + *
> > + * @name: Name of the device generating this interrupt.
> > + * @get_max_msgnum: Get the feature's largest interrupt message number. In cases
> > + * where there is only one instance it also indicates which
> > + * MSI/MSI-X vector is used for the interrupt message generated
> > + * in association with the feature. If the feature does not
> > + * have the Interrupt Supported bit set, then return -1.
> > + */
> > +struct cxl_irq_cap {
> > + const char *name;
> > + int (*get_max_msgnum)(struct cxl_dev_state *cxlds);
> > +};
> > +
> > +static const
> > +struct cxl_irq_cap cxl_irq_cap_table[] = { CXL_IRQ_CAPABILITY_TABLE };
> > +#undef C
>
> Why all this macro magic?

Agreed. I'm rarely persuaded it's a good idea to do this sort of trickery
and it definitely isn't worth the readability problems unless there are a
large number of users.

>
> > +
> > +/* Register Block Identifier (RBI) */
> > +enum cxl_regloc_type {
> > + CXL_REGLOC_RBI_EMPTY = 0,
> > + CXL_REGLOC_RBI_COMPONENT,
> > + CXL_REGLOC_RBI_VIRT,
> > + CXL_REGLOC_RBI_MEMDEV,
> > + CXL_REGLOC_RBI_TYPES
> > +};
> > +
> > #endif /* __CXL_PCI_H__ */
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index faeb5d9d7a7a..c0fe78e0559b 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -387,6 +387,52 @@ static int cxl_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
> > return rc;
> > }
> >
> > +static void cxl_pci_free_irq_vectors(void *data)
> > +{
> > + pci_free_irq_vectors(data);
> > +}
> > +
> > +static int cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
> > +{
> > + struct device *dev = cxlds->dev;
> > + struct pci_dev *pdev = to_pci_dev(dev);
> > + int rc, i, vectors = -1;
> > +
> > + for (i = 0; i < ARRAY_SIZE(cxl_irq_cap_table); i++) {
> > + int irq;
> > +
> > + if (!cxl_irq_cap_table[i].get_max_msgnum)
> > + continue;
> > +
> > + irq = cxl_irq_cap_table[i].get_max_msgnum(cxlds);
> > + vectors = max_t(int, irq, vectors);
> > + }
> > +
> > + if (vectors == -1)
> > + return -EINVAL; /* no irq support whatsoever */
> > +
> > + vectors++;
>
> This is pretty much what earlier versions of the DOE code did with the
> exception of only have 1 get_max_msgnum() calls defined (for DOE). But there
> was a lot of debate about how to share vectors with the PCI layer. And
> eventually we got rid of it. I'm still trying to figure it out. Sorry for
> being slow.

I'm not yet seeing a huge advantage in wrapping this up. For now a set of
linear calls to establish the max irq vector is more readable. Sure,
down the line moving to this may make sense.
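
As a rough illustration of what "a set of linear calls" could look like, here
is a minimal userspace sketch. The struct demo_dev fields and function names
are hypothetical stand-ins for per-feature message-number queries; each uses
-1 for "no interrupt support", as in the quoted kernel-doc.

```c
#include <assert.h>

/* Hypothetical device state: message number per feature, -1 if unsupported. */
struct demo_dev {
	int mbox_msgnum;
	int event_msgnum;
	int pmu_msgnum;
};

static int demo_max(int a, int b)
{
	return a > b ? a : b;
}

/*
 * Return the number of vectors to request (largest message number + 1),
 * or -1 when no feature supports interrupts.
 */
int demo_irq_vectors_needed(const struct demo_dev *d)
{
	int max_msgnum = -1;

	/* One explicit line per feature instead of a table walk. */
	max_msgnum = demo_max(max_msgnum, d->mbox_msgnum);
	max_msgnum = demo_max(max_msgnum, d->event_msgnum);
	max_msgnum = demo_max(max_msgnum, d->pmu_msgnum);

	if (max_msgnum < 0)
		return -1;
	return max_msgnum + 1;
}
```

With only a handful of features, the explicit calls stay short; a table only
starts to pay off once the list grows.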

>
> Perhaps we do this for this series. However, won't we have an issue if we want
> to support switch events?

We 'could' extend existing stuff in the portdrv code (which is ultimately
where this general approach was copied from ;) but I suspect doing that
for non generic PCI stuff is going to be controversial.

That whole infrastructure in PCI may need a rewrite.

>
> Ira
>
> > + rc = pci_alloc_irq_vectors(pdev, vectors, vectors, PCI_IRQ_MSIX);
> > + if (rc < 0) {
> > + rc = pci_alloc_irq_vectors(pdev, vectors, vectors, PCI_IRQ_MSI);
> > + if (rc < 0)
> > + return rc;
> > +
> > + cxlds->irq_type = PCI_IRQ_MSI;
> > + } else {
> > + cxlds->irq_type = PCI_IRQ_MSIX;
> > + }
> > +
> > + if (rc != vectors) {
> > + pci_err(pdev, "Not enough interrupts; use polling where supported\n");
> > + /* Some got allocated; clean them up */
> > + cxl_pci_free_irq_vectors(pdev);
> > + return -ENOSPC;
> > + }
> > +
> > + return devm_add_action_or_reset(dev, cxl_pci_free_irq_vectors, pdev);
> > +}
> > +
> > static void cxl_pci_destroy_doe(void *mbs)
> > {
> > xa_destroy(mbs);
> > @@ -476,6 +522,9 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >
> > cxlds->component_reg_phys = cxl_regmap_to_base(pdev, &map);
> >
> > + if (cxl_pci_alloc_irq_vectors(cxlds))
> > + cxlds->irq_type = 0;
> > +
> > devm_cxl_pci_create_doe(cxlds);
> >
> > rc = cxl_pci_setup_mailbox(cxlds);

2022-08-24 15:57:11

by Jonathan Cameron

Subject: Re: [RFC PATCH 2/9] cxl/mem: Implement Clear Event Records command

On Fri, 12 Aug 2022 22:32:36 -0700
[email protected] wrote:

> From: Ira Weiny <[email protected]>
>
> CXL v3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
> command. After an event record is read it needs to be cleared from the
> event log.
>
> Implement cxl_clear_event_record() and call it for each record retrieved
> from the device.
>
> Each record is cleared individually. A clear all bit is specified but
> events could arrive between a get and the final clear all operation.
> Therefore each event is cleared specifically.
>
> Signed-off-by: Ira Weiny <[email protected]>
Trivial suggestions inline, but other than that LGTM

Reviewed-by: Jonathan Cameron <[email protected]>

> ---
> drivers/cxl/core/mbox.c | 31 ++++++++++++++++++++++++++++---
> drivers/cxl/cxlmem.h | 15 +++++++++++++++
> include/uapi/linux/cxl_mem.h | 1 +
> 3 files changed, 44 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 2cceed8608dc..493f5ceb5d1c 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -52,6 +52,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
> #endif
> CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
> CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
> + CXL_CMD(CLEAR_EVENT_RECORD, CXL_VARIABLE_PAYLOAD, 0, 0),
> CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
> CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
> CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> @@ -708,6 +709,26 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>
> +static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
> + enum cxl_event_log_type log,
> + __le16 handle)
> +{
> + struct cxl_mbox_clear_event_payload payload;
> + int rc;
> +
> + memset(&payload, 0, sizeof(payload));

Could just do payload = {}; at the declaration.

Though as you are setting fields anyway, why not just do

	struct cxl_mbox_clear_event_payload payload = {
		.event_log = log,
		.nr_recs = 1,
		.handle = handle,
	};
and let the compiler zero anything else (I think there are no holes to complicate
things).
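
A minimal userspace sketch of that suggestion follows. The struct mirrors the
quoted payload layout, with uint16_t standing in for __le16; the demo_* names
are hypothetical. Members not named in a designated initializer are
zero-initialized by the language; padding bytes are a separate matter, which
is presumably why the absence of holes is worth noting.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mirror of the quoted payload; uint16_t stands in for __le16. */
struct demo_clear_event_payload {
	uint8_t event_log;
	uint8_t clear_flags;
	uint8_t nr_recs;
	uint8_t reserved[3];
	uint16_t handle;
};

/* Build a single-record clear request; unnamed members become zero. */
struct demo_clear_event_payload demo_make_clear(uint8_t log, uint16_t handle)
{
	struct demo_clear_event_payload payload = {
		.event_log = log,
		.nr_recs = 1,
		.handle = handle,
	};

	return payload;
}
```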

> + payload.event_log = log;
> + payload.nr_recs = 1;
> + payload.handle = handle;
> +
> + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_CLEAR_EVENT_RECORD,
> + &payload, sizeof(payload), NULL, 0);

Just return cxl_mbox_send_cmd() directly and drop the rc definition.

> + if (rc)
> + return rc;
> +
> + return 0;
> +}
> +
> static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> enum cxl_event_log_type type)
> {
> @@ -725,9 +746,12 @@ static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> return rc;
>
> record_count = le16_to_cpu(payload.record_count);
> - if (record_count > 0)
> + if (record_count > 0) {
> trace_cxl_event(dev_name(cxlds->dev), type,
> &payload.record);
> + cxl_clear_event_record(cxlds, type,
> + payload.record.hdr.handle);
> + }
>
> if (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW)
> trace_cxl_event_overflow(dev_name(cxlds->dev), type,
> @@ -742,10 +766,11 @@ static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> * cxl_mem_get_event_records - Get Event Records from the device
> * @cxlds: The device data for the operation
> *
> - * Retrieve all event records available on the device and report them as trace
> - * events.
> + * Retrieve all event records available on the device, report them as trace
> + * events, and clear them.
> *
> * See CXL v3.0 @8.2.9.2.2 Get Event Records
> + * See CXL v3.0 @8.2.9.2.3 Clear Event Records
> */
> void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
> {
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index f83634f3bc8d..5506e7210cf6 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -255,6 +255,7 @@ enum cxl_opcode {
> CXL_MBOX_OP_INVALID = 0x0000,
> CXL_MBOX_OP_RAW = CXL_MBOX_OP_INVALID,
> CXL_MBOX_OP_GET_EVENT_RECORD = 0x0100,
> + CXL_MBOX_OP_CLEAR_EVENT_RECORD = 0x0101,
> CXL_MBOX_OP_GET_FW_INFO = 0x0200,
> CXL_MBOX_OP_ACTIVATE_FW = 0x0202,
> CXL_MBOX_OP_GET_SUPPORTED_LOGS = 0x0400,
> @@ -387,6 +388,20 @@ static inline char *cxl_event_log_type_str(enum cxl_event_log_type type)
> return "<unknown>";
> }
>
> +/*
> + * Clear Event Records input payload
> + * CXL v3.0 section 8.2.9.2.3; Table 8-51
> + *
> + * Space given for 1 record
> + */
> +struct cxl_mbox_clear_event_payload {
> + u8 event_log; /* enum cxl_event_log_type */
> + u8 clear_flags;
> + u8 nr_recs; /* 1 for this struct */
> + u8 reserved[3];
> + __le16 handle;
> +};
> +
> struct cxl_mbox_get_partition_info {
> __le64 active_volatile_cap;
> __le64 active_persistent_cap;
> diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> index 70459be5bdd4..7c1ad8062792 100644
> --- a/include/uapi/linux/cxl_mem.h
> +++ b/include/uapi/linux/cxl_mem.h
> @@ -25,6 +25,7 @@
> ___C(RAW, "Raw device command"), \
> ___C(GET_SUPPORTED_LOGS, "Get Supported Logs"), \
> ___C(GET_EVENT_RECORD, "Get Event Record"), \
> + ___C(CLEAR_EVENT_RECORD, "Clear Event Record"), \
> ___C(GET_FW_INFO, "Get FW Info"), \
> ___C(GET_PARTITION_INFO, "Get Partition Information"), \
> ___C(GET_LSA, "Get Label Storage Area"), \

2022-08-24 16:08:47

by Jonathan Cameron

Subject: Re: [RFC PATCH 3/9] cxl/mem: Clear events on driver load

On Fri, 12 Aug 2022 22:32:37 -0700
[email protected] wrote:

> From: Ira Weiny <[email protected]>
>
> The information contained in the events prior to the driver loading can
> be queried at any time through other mailbox commands.
>
> Ensure a clean slate of events by reading and clearing the events. The
> events are sent to the trace buffer but it is not anticipated to have
> anyone listening to it at driver load time.

I'm not totally sold on it being a good idea to drop records on binding
the device. Let's see what others think...

>
> Signed-off-by: Ira Weiny <[email protected]>
> ---
> drivers/cxl/pci.c | 2 ++
> tools/testing/cxl/test/mem.c | 2 ++
> 2 files changed, 4 insertions(+)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index faeb5d9d7a7a..5f1b492bd388 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -498,6 +498,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> if (IS_ERR(cxlmd))
> return PTR_ERR(cxlmd);
>
> + cxl_mem_get_event_records(cxlds);
> +
> if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
> rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);
>
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index aa2df3a15051..e2f5445d24ff 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -285,6 +285,8 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
> if (IS_ERR(cxlmd))
> return PTR_ERR(cxlmd);
>
> + cxl_mem_get_event_records(cxlds);
> +
> if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
> rc = devm_cxl_add_nvdimm(dev, cxlmd);
>

2022-08-24 16:14:57

by Jonathan Cameron

Subject: Re: [RFC PATCH 4/9] cxl/mem: Trace General Media Event Record

On Fri, 12 Aug 2022 22:32:38 -0700
[email protected] wrote:

> From: Ira Weiny <[email protected]>
>
> CXL v3.0 section 8.2.9.2.1.1 defines the General Media Event Record.
>
> Determine if the event read is a general media record and if so trace
> the record.
>
> Signed-off-by: Ira Weiny <[email protected]>
A few trivial things inline...

>
> ---
> A couple of specification questions I've had.
>
> 1) The component id is not specified as a UUID or any particular
> format. It is therefore reported as a byte array. Is this intentional?
Given the spec says "device specific", I'm guessing it's intentional.

>
> 2) This record has a very odd byte layout with a 16 bit field
> (validity_flags) landing on a 3 byte boundary and a 3 byte bit field
> (device) landing on a 7 byte boundary.

oops. Guess 'we' weren't paying attention. Stuck with it now.

>
> I've made my best guess as to how the endianess of these fields should
> be resolved. But I'm happy to hear from other folks if what I have done
> is wrong.

Mailing list probably isn't the place to look for clarification on this.

>
> struct cxl_evt_gen_media {
> struct cxl_event_record_hdr hdr;
> __le64 phys_addr;
> u8 descriptor;
> u8 type;
> u8 transaction_type;
> u16 validity_flags; /* ??? */
> u8 channel;
> u8 rank;
> u8 device[CXL_EVT_GEN_MED_DEV_SIZE]; /* ??? */
> u8 component_id[CXL_EVT_GEN_MED_COMP_ID_SIZE];
> } __packed;
> ---
> drivers/cxl/core/mbox.c | 30 ++++++-
> drivers/cxl/cxlmem.h | 19 +++++
> include/trace/events/cxl-events.h | 125 ++++++++++++++++++++++++++++++
> 3 files changed, 172 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 493f5ceb5d1c..0e433f072163 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -709,6 +709,32 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>
> +/*
> + * General Media Event Record
> + * CXL v3.0 Section 8.2.9.2.1.1; Table 8-43
> + */
> +static const uuid_t gen_media_event_uuid =
> + UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
> + 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
> +
> +static void cxl_trace_event_record(const char *dev_name,
> + enum cxl_event_log_type type,
> + struct cxl_get_event_payload *payload)
> +{
> + uuid_t *id = &payload->record.hdr.id;
> +
> + if (uuid_equal(id, &gen_media_event_uuid)) {
> + struct cxl_evt_gen_media *rec =
> + (struct cxl_evt_gen_media *)&payload->record;
> +
> + trace_cxl_gen_media_event(dev_name, type, rec);
> + return;
> + }
> +
> + /* For unknown record types print just the header */
> + trace_cxl_event(dev_name, type, &payload->record);
> +}
> +
> static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
> enum cxl_event_log_type log,
> __le16 handle)
> @@ -747,8 +773,8 @@ static int cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
>
> record_count = le16_to_cpu(payload.record_count);
> if (record_count > 0) {
> - trace_cxl_event(dev_name(cxlds->dev), type,
> - &payload.record);
> + cxl_trace_event_record(dev_name(cxlds->dev), type,
> + &payload);
> cxl_clear_event_record(cxlds, type,
> payload.record.hdr.handle);
> }
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 5506e7210cf6..33669459ae4b 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -402,6 +402,25 @@ struct cxl_mbox_clear_event_payload {
> __le16 handle;
> };
>
> +/*
> + * General Media Event Record
> + * CXL v3.0 Section 8.2.9.2.1.1; Table 8-43
> + */
> +#define CXL_EVT_GEN_MED_DEV_SIZE 3
> +#define CXL_EVT_GEN_MED_COMP_ID_SIZE 0x10
> +struct cxl_evt_gen_media {
> + struct cxl_event_record_hdr hdr;
> + __le64 phys_addr;
> + u8 descriptor;
> + u8 type;
> + u8 transaction_type;
> + u16 validity_flags;
> + u8 channel;
> + u8 rank;
> + u8 device[CXL_EVT_GEN_MED_DEV_SIZE];
> + u8 component_id[CXL_EVT_GEN_MED_COMP_ID_SIZE];
> +} __packed;
> +
> struct cxl_mbox_get_partition_info {
> __le64 active_volatile_cap;
> __le64 active_persistent_cap;
> diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
> index f4baeae66cf3..b51c51fd4e62 100644
> --- a/include/trace/events/cxl-events.h
> +++ b/include/trace/events/cxl-events.h
> @@ -119,6 +119,131 @@ TRACE_EVENT(cxl_event,
> )
> );
>
> +/*
> + * General Media Event Record - GMER
> + * CXL v2.0 Section 8.2.9.1.1.1; Table 154
> + */
> +#define CXL_GMER_PHYS_ADDR_VOLATILE BIT(0)
> +#define CXL_GMER_PHYS_ADDR_MASK 0x3f

Inverse of mask is confusing. Just specify the full mask.
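
One way to read this suggestion, sketched in userspace C: define the mask as
the address bits themselves, so callers write "raw & MASK" rather than
"raw & ~MASK". The DEMO_*/demo_* names are hypothetical; the [63:6] split is
inferred from the 0x3f value in the quoted patch. In the kernel,
GENMASK_ULL(63, 6) would express the same mask.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 0 of the field is the volatile flag in the quoted patch. */
#define DEMO_PHYS_ADDR_VOLATILE  (UINT64_C(1) << 0)
/* Full mask of the address bits [63:6], not the inverse of 0x3f. */
#define DEMO_PHYS_ADDR_MASK      (~UINT64_C(0) << 6)

/* Extract the address portion of the raw field. */
uint64_t demo_phys_addr(uint64_t raw)
{
	return raw & DEMO_PHYS_ADDR_MASK;
}

/* Report whether the volatile flag is set in the raw field. */
int demo_is_volatile(uint64_t raw)
{
	return (raw & DEMO_PHYS_ADDR_VOLATILE) != 0;
}
```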

> +
> +#define CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT BIT(0)
> +#define CXL_GMER_EVT_DESC_THRESHOLD_EVENT BIT(1)
> +#define CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW BIT(2)
> +#define show_event_desc_flags(flags) __print_flags(flags, "|", \
> + { CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT, "Uncorrectable Event" }, \
> + { CXL_GMER_EVT_DESC_THRESHOLD_EVENT, "Threshold event" }, \
> + { CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW, "Poison List Overflow" } \
> +)
> +
> +#define CXL_GMER_MEM_EVT_TYPE_ECC_ERROR 0x00
> +#define CXL_GMER_MEM_EVT_TYPE_INV_ADDR 0x01
> +#define CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR 0x02
> +#define show_mem_event_type(type) __print_symbolic(type, \
> + { CXL_GMER_MEM_EVT_TYPE_ECC_ERROR, "ECC Error" }, \
> + { CXL_GMER_MEM_EVT_TYPE_INV_ADDR, "Invalid Address" }, \
> + { CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR, "Data Path Error" } \
> +)
> +
> +#define CXL_GMER_TRANS_UNKNOWN 0x00
> +#define CXL_GMER_TRANS_HOST_READ 0x01
> +#define CXL_GMER_TRANS_HOST_WRITE 0x02
> +#define CXL_GMER_TRANS_HOST_SCAN_MEDIA 0x03
> +#define CXL_GMER_TRANS_HOST_INJECT_POISON 0x04
> +#define CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB 0x05
> +#define CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT 0x06
> +#define show_trans_type(type) __print_symbolic(type, \
> + { CXL_GMER_TRANS_UNKNOWN, "Unknown" }, \
> + { CXL_GMER_TRANS_HOST_READ, "Host Read" }, \
> + { CXL_GMER_TRANS_HOST_WRITE, "Host Write" }, \
> + { CXL_GMER_TRANS_HOST_SCAN_MEDIA, "Host Scan Media" }, \
> + { CXL_GMER_TRANS_HOST_INJECT_POISON, "Host Inject Poison" }, \
> + { CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB, "Internal Media Scrub" }, \
> + { CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT, "Internal Media Management" } \
> +)
> +
> +#define CXL_GMER_VALID_CHANNEL BIT(0)
> +#define CXL_GMER_VALID_RANK BIT(1)
> +#define CXL_GMER_VALID_DEVICE BIT(2)
> +#define CXL_GMER_VALID_COMPONENT BIT(3)
> +#define show_valid_flags(flags) __print_flags(flags, "|", \
> + { CXL_GMER_VALID_CHANNEL, "CHANNEL" }, \
> + { CXL_GMER_VALID_RANK, "RANK" }, \
> + { CXL_GMER_VALID_DEVICE, "DEVICE" }, \
> + { CXL_GMER_VALID_COMPONENT, "COMPONENT" } \
> +)
> +
> +TRACE_EVENT(cxl_gen_media_event,
> +
> + TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> + struct cxl_evt_gen_media *rec),
> +
> + TP_ARGS(dev_name, log, rec),
> +
> + TP_STRUCT__entry(
> + /* Common */
> + __string(dev_name, dev_name)
> + __field(int, log)
> + __array(u8, id, UUID_SIZE)
> + __field(u32, flags)
> + __field(u16, handle)
> + __field(u16, related_handle)
> + __field(u64, timestamp)
> +
> + /* General Media */
> + __field(u64, phys_addr)
> + __field(u8, descriptor)
> + __field(u8, type)
> + __field(u8, transaction_type)
> + __field(u8, channel)
> + __field(u32, device)
> + __array(u8, comp_id, CXL_EVT_GEN_MED_COMP_ID_SIZE)
> + __field(u16, validity_flags)
> + __field(u8, rank) /* Out of order to pack trace record */
> + ),
> +
> + TP_fast_assign(
> + /* Common */
> + __assign_str(dev_name, dev_name);
> + memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> + __entry->log = log;
> + __entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> + __entry->handle = le16_to_cpu(rec->hdr.handle);
> + __entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> + __entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
> +
> + /* General Media */
> + __entry->phys_addr = le64_to_cpu(rec->phys_addr);
> + __entry->descriptor = rec->descriptor;
> + __entry->type = rec->type;
> + __entry->transaction_type = rec->transaction_type;
> + __entry->channel = rec->channel;
> + __entry->rank = rec->rank;
> + __entry->device = rec->device[0] << 24 |
> + rec->device[1] << 16 |
> + rec->device[2] << 8; /* 3 byte LE ? */
> + __entry->device = le32_to_cpu(__entry->device);
> + memcpy(__entry->comp_id, &rec->component_id,
> + CXL_EVT_GEN_MED_COMP_ID_SIZE);
> + __entry->validity_flags = le16_to_cpu(rec->validity_flags);
> + ),
> +
> + TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
> + "phys_addr=%llx volatile=%s desc='%s' type='%s' trans_type='%s' channel=%u " \
> + "rank=%u device=%x comp_id=%s valid_flags='%s'",
> + __get_str(dev_name), show_log_type(__entry->log),
> + __entry->timestamp, __entry->id, __entry->handle,
> + __entry->related_handle, show_hdr_flags(__entry->flags),
> + __entry->phys_addr & ~CXL_GMER_PHYS_ADDR_MASK,
> + (__entry->phys_addr & CXL_GMER_PHYS_ADDR_VOLATILE) ? "TRUE" : "FALSE",
> + show_event_desc_flags(__entry->descriptor),
> + show_mem_event_type(__entry->type),
> + show_trans_type(__entry->transaction_type),
> + __entry->channel, __entry->rank, __entry->device,
> + __print_hex(__entry->comp_id, CXL_EVT_GEN_MED_COMP_ID_SIZE),
> + show_valid_flags(__entry->validity_flags)

Can we make the printing of fields with valid flags conditional?
It's been a while since I wrote a tracepoint, but I think I recall doing that.
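
The idea of printing only valid fields can be sketched in plain userspace C as
below. This only illustrates the intended output; it does not show how to do
it inside TP_printk(), and the DEMO_*/demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical validity bits, modelled on CXL_GMER_VALID_CHANNEL/RANK. */
#define DEMO_VALID_CHANNEL (1u << 0)
#define DEMO_VALID_RANK    (1u << 1)

/* Append "name=val" at offset off; a field that does not fit is dropped. */
size_t demo_append(char *buf, size_t len, size_t off,
		   const char *name, unsigned int val)
{
	int n = snprintf(buf + off, len - off, "%s%s=%u",
			 off ? " " : "", name, val);

	if (n < 0 || (size_t)n >= len - off) {
		buf[off] = '\0';
		return off;
	}
	return off + (size_t)n;
}

/* Format only the fields whose validity bit is set; returns the length. */
size_t demo_format(char *buf, size_t len, uint16_t valid,
		   uint8_t channel, uint8_t rank)
{
	size_t off = 0;

	assert(len > 0);
	buf[0] = '\0';
	if (valid & DEMO_VALID_CHANNEL)
		off = demo_append(buf, len, off, "channel", channel);
	if (valid & DEMO_VALID_RANK)
		off = demo_append(buf, len, off, "rank", rank);
	return off;
}
```

A consumer then never sees a "rank=0" that actually means "rank not reported".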

> + )
> +);
> +
> #endif /* _CXL_TRACE_EVENTS_H */
>
> /* This part must be outside protection */

2022-08-25 11:02:48

by Jonathan Cameron

Subject: Re: [RFC PATCH 5/9] cxl/mem: Trace DRAM Event Record

On Fri, 12 Aug 2022 22:32:39 -0700
[email protected] wrote:

> From: Ira Weiny <[email protected]>
>
> CXL v3.0 section 8.2.9.2.1.2 defines the DRAM Event Record.
>
> Determine if the event read is a DRAM event record and if so trace the
> record.
>
> Signed-off-by: Ira Weiny <[email protected]>
>
> ---
> This record has a very odd byte layout with 2 - 16 bit fields
> (validity_flags and column) aligned on an odd byte boundary. In
> addition nibble_mask and row are oddly aligned.
>
> I've made my best guess as to how the endianess of these fields should
> be resolved. But I'm happy to hear from other folks if what I have is
> wrong.
My assumption is the same as yours. We should of course sanity check by
poking the relevant people.

Similar comments here to the previous patch: use the get_unaligned_le24()
accessors and consider not printing invalid fields.
>
> struct cxl_evt_dram_rec {
> struct cxl_event_record_hdr hdr;
> __le64 phys_addr;
> u8 descriptor;
> u8 type;
> u8 transaction_type;
> u16 validity_flags;
> u8 channel;
> u8 rank;
> u8 nibble_mask[CXL_EVT_DER_NIBBLE_MASK_SIZE];
> u8 bank_group;
> u8 bank;
> u8 row[CXL_EVT_DER_ROW_SIZE];
> u16 column;
> u8 correction_mask[CXL_EVT_DER_CORRECTION_MASK_SIZE];
> } __packed;
> ---
> drivers/cxl/core/mbox.c | 16 +++++
> drivers/cxl/cxlmem.h | 24 +++++++
> include/trace/events/cxl-events.h | 114 ++++++++++++++++++++++++++++++
> 3 files changed, 154 insertions(+)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 0e433f072163..6414588a3c7b 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -717,6 +717,14 @@ static const uuid_t gen_media_event_uuid =
> UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
> 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
>
> +/*
> + * DRAM Event Record
> + * CXL v3.0 section 8.2.9.2.1.2; Table 8-44
Use rev3.0, r3.0 or just 3.0 rather than v3.0.

> + */
> +static const uuid_t dram_event_uuid =
> + UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
> + 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
> +
> static void cxl_trace_event_record(const char *dev_name,
> enum cxl_event_log_type type,
> struct cxl_get_event_payload *payload)
> @@ -731,6 +739,14 @@ static void cxl_trace_event_record(const char *dev_name,
> return;
> }
>
> + if (uuid_equal(id, &dram_event_uuid)) {
Why not else if? Should be obvious to compiler that multiple uuid_equal
conditions can't match, but even better to not make it try hard perhaps?
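
A minimal userspace sketch of an else-if chain over record identifiers is
shown below. The identifiers are placeholders, NOT the real CXL record UUIDs,
and memcmp() stands in for uuid_equal(); the demo_* names are hypothetical.
Since the quoted patch returns from each branch, the practical difference is
mostly one of making the mutual exclusion explicit to the reader.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ID_SIZE 16

enum demo_rec_type { DEMO_REC_UNKNOWN, DEMO_REC_GEN_MEDIA, DEMO_REC_DRAM };

/* Placeholder identifiers, NOT the real CXL record UUIDs. */
static const uint8_t demo_gen_media_id[DEMO_ID_SIZE] = { 0x01 };
static const uint8_t demo_dram_id[DEMO_ID_SIZE] = { 0x02 };

/* Classify a record by its 16-byte identifier; at most one branch matches. */
enum demo_rec_type demo_classify(const uint8_t *id)
{
	enum demo_rec_type type;

	if (memcmp(id, demo_gen_media_id, DEMO_ID_SIZE) == 0)
		type = DEMO_REC_GEN_MEDIA;
	else if (memcmp(id, demo_dram_id, DEMO_ID_SIZE) == 0)
		type = DEMO_REC_DRAM;
	else
		type = DEMO_REC_UNKNOWN;

	return type;
}
```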

> + struct cxl_evt_dram_rec *rec =
> + (struct cxl_evt_dram_rec *)&payload->record;
> +
> + trace_cxl_dram_event(dev_name, type, rec);
> + return;
> + }
> +
> /* For unknown record types print just the header */
> trace_cxl_event(dev_name, type, &payload->record);
> }
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 33669459ae4b..50536c0a7850 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -421,6 +421,30 @@ struct cxl_evt_gen_media {
> u8 component_id[CXL_EVT_GEN_MED_COMP_ID_SIZE];
> } __packed;
>
> +/*
> + * DRAM Event Record - DER
> + * CXL v3.0 section 8.2.9.2.1.2; Table 3-44
> + */
> +#define CXL_EVT_DER_NIBBLE_MASK_SIZE 3
> +#define CXL_EVT_DER_ROW_SIZE 3
> +#define CXL_EVT_DER_CORRECTION_MASK_SIZE 0x20
> +struct cxl_evt_dram_rec {
> + struct cxl_event_record_hdr hdr;
> + __le64 phys_addr;
> + u8 descriptor;
> + u8 type;
> + u8 transaction_type;
> + u16 validity_flags;
I've not tried it, but can we just mark these as __le16 and use
the unaligned accessors, get_unaligned_le16() etc.?
There is also get_unaligned_le24() for the 3 byte ones.
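
For reference, the behaviour of such accessors can be sketched in userspace C
as follows. The demo_get_le16()/demo_get_le24() names are hypothetical
stand-ins for the kernel's get_unaligned_le16()/get_unaligned_le24(); they
assemble the value byte by byte, so the result is the same on any host
endianness and at any alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 16-bit little-endian value from a possibly unaligned pointer. */
uint16_t demo_get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 24-bit little-endian value into the low bits of a u32. */
uint32_t demo_get_le24(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}
```

Read this way, byte 0 of a 3 byte field is the least significant byte and no
further le32_to_cpu() conversion is needed on the result.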

> + u8 channel;
> + u8 rank;
> + u8 nibble_mask[CXL_EVT_DER_NIBBLE_MASK_SIZE];
> + u8 bank_group;
> + u8 bank;
> + u8 row[CXL_EVT_DER_ROW_SIZE];
> + u16 column;
> + u8 correction_mask[CXL_EVT_DER_CORRECTION_MASK_SIZE];
> +} __packed;
> +
> struct cxl_mbox_get_partition_info {
> __le64 active_volatile_cap;
> __le64 active_persistent_cap;
> diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
> index b51c51fd4e62..db9b34ddd240 100644
> --- a/include/trace/events/cxl-events.h
> +++ b/include/trace/events/cxl-events.h
> @@ -244,6 +244,120 @@ TRACE_EVENT(cxl_gen_media_event,
> )
> );
>
> +/*
> + * DRAM Event Record - DER
> + *
> + * CXL v2.0 section 8.2.9.1.1.2; Table 155
> + */
> +/*
> + * DRAM Event Record defines many fields the same as the General Media Event
> + * Record. Reuse those definitions as appropriate.
> + */
> +#define CXL_DER_VALID_CHANNEL BIT(0)
> +#define CXL_DER_VALID_RANK BIT(1)
> +#define CXL_DER_VALID_NIBBLE BIT(2)
> +#define CXL_DER_VALID_BANK_GROUP BIT(3)
> +#define CXL_DER_VALID_BANK BIT(4)
> +#define CXL_DER_VALID_ROW BIT(5)
> +#define CXL_DER_VALID_COLUMN BIT(6)
> +#define CXL_DER_VALID_CORRECTION_MASK BIT(7)
> +#define show_dram_valid_flags(flags) __print_flags(flags, "|", \
> + { CXL_DER_VALID_CHANNEL, "CHANNEL" }, \
> + { CXL_DER_VALID_RANK, "RANK" }, \
> + { CXL_DER_VALID_NIBBLE, "NIBBLE" }, \
> + { CXL_DER_VALID_BANK_GROUP, "BANK GROUP" }, \
> + { CXL_DER_VALID_BANK, "BANK" }, \
> + { CXL_DER_VALID_ROW, "ROW" }, \
> + { CXL_DER_VALID_COLUMN, "COLUMN" }, \
> + { CXL_DER_VALID_CORRECTION_MASK, "CORRECTION MASK" } \
> +)
> +
> +TRACE_EVENT(cxl_dram_event,
> +
> + TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> + struct cxl_evt_dram_rec *rec),
> +
> + TP_ARGS(dev_name, log, rec),
> +
> + TP_STRUCT__entry(
> + /* Common */
> + __string(dev_name, dev_name)
> + __field(int, log)
> + __array(u8, id, UUID_SIZE)
> + __field(u32, flags)
> + __field(u16, handle)
> + __field(u16, related_handle)
> + __field(u64, timestamp)
> +
> + /* DRAM */
> + __field(u64, phys_addr)
> + __field(u8, descriptor)
> + __field(u8, type)
> + __field(u8, transaction_type)
> + __field(u8, channel)
> + __field(u16, validity_flags)
> + __field(u16, column) /* Out of order to pack trace record */
> + __field(u32, nibble_mask)
> + __field(u32, row)
> + __array(u8, cor_mask, CXL_EVT_DER_CORRECTION_MASK_SIZE)
> + __field(u8, rank) /* Out of order to pack trace record */
> + __field(u8, bank_group) /* Out of order to pack trace record */
> + __field(u8, bank) /* Out of order to pack trace record */
> + ),
> +
> + TP_fast_assign(
> + /* Common */
> + __assign_str(dev_name, dev_name);
> + memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> + __entry->log = log;
> + __entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> + __entry->handle = le16_to_cpu(rec->hdr.handle);
> + __entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> + __entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
> +
> + /* DRAM */
> + __entry->phys_addr = le64_to_cpu(rec->phys_addr);
> + __entry->descriptor = rec->descriptor;
> + __entry->type = rec->type;
> + __entry->transaction_type = rec->transaction_type;
> + __entry->validity_flags = le16_to_cpu(rec->validity_flags);
> + __entry->channel = rec->channel;
> + __entry->rank = rec->rank;
> + __entry->nibble_mask = rec->nibble_mask[0] << 24 |
> + rec->nibble_mask[1] << 16 |
> + rec->nibble_mask[2] << 8; /* 3 byte LE ? */

Use get_unaligned_le24()? I'd definitely expect these to be le24.

> + __entry->nibble_mask = le32_to_cpu(__entry->nibble_mask);

That doesn't look right. You will have unwound the endianness using
the shifts above. Don't convert it again (noop on le systems, so you
probably won't see a problem when testing).
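
A small userspace sketch of why the second conversion is harmful, assuming the field really is 3 bytes little endian. The sketch_* names are invented here; sketch_swab32() only models what le32_to_cpu() would do on a big-endian host.

```c
#include <stdint.h>

/*
 * Assemble a 3-byte little-endian field with shifts. The result is
 * already a host-order value, whatever the host endianness is.
 */
static inline uint32_t sketch_le24_to_host(const uint8_t b[3])
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) | ((uint32_t)b[2] << 16);
}

/*
 * Model of le32_to_cpu() on a big-endian host: a plain byte swap.
 * On a little-endian host the conversion does nothing instead.
 */
static inline uint32_t sketch_swab32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}
```

Bytes 0x56 0x34 0x12 decode to 0x123456 via the shifts; swapping that value again yields 0x56341200, which is what a big-endian system would end up tracing.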

> + __entry->bank_group = rec->bank_group;
> + __entry->bank = rec->bank;
> + __entry->row = rec->row[0] << 24 |
> + rec->row[1] << 16 |
> + rec->row[2] << 8; /* 3 byte LE ? */

get_unaligned_le24()

> + __entry->row = le32_to_cpu(__entry->row);

> + __entry->column = le16_to_cpu(rec->column);
> + memcpy(__entry->cor_mask, &rec->correction_mask,
> + CXL_EVT_DER_CORRECTION_MASK_SIZE);
> + ),
> +
> + TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
> + "phys_addr=%llx volatile=%s desc='%s' type='%s' trans_type='%s' channel=%u " \
> + "rank=%u nibble_mask=%x bank_group=%u bank=%u row=%u column=%u " \
> + "cor_mask=%s valid_flags='%s'",
> + __get_str(dev_name), show_log_type(__entry->log),
> + __entry->timestamp, __entry->id, __entry->handle,
> + __entry->related_handle, show_hdr_flags(__entry->flags),
> + __entry->phys_addr & ~CXL_GMER_PHYS_ADDR_MASK,
> + (__entry->phys_addr & CXL_GMER_PHYS_ADDR_VOLATILE) ? "TRUE" : "FALSE",
> + show_event_desc_flags(__entry->descriptor),
As before, can we skip printing the fields that the validity flags mark as invalid?

It was a few years ago now, but I did something along those lines for the CCIX equivalent of
this stuff (honestly can't remember much about it now though!).
It was a bit fiddly but led to nicer prints in my opinion.

https://lore.kernel.org/all/[email protected]/
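
Roughly what I have in mind, as a userspace sketch rather than trace macro code. The bit positions follow the CXL_DER_VALID_* defines in this patch; the struct, the sketch_* names and the choice of only three fields are assumptions made to keep the example short.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit positions follow the CXL_DER_VALID_* defines in the patch. */
#define SKETCH_VALID_CHANNEL	(1u << 0)
#define SKETCH_VALID_RANK	(1u << 1)
#define SKETCH_VALID_BANK	(1u << 4)

/* Hypothetical cut-down record: only three of the DRAM fields. */
struct sketch_dram_fields {
	uint16_t validity_flags;
	uint8_t channel;
	uint8_t rank;
	uint8_t bank;
};

/* Append "name=value" (space separated) at off; returns the new offset. */
static size_t sketch_append(char *buf, size_t len, size_t off,
			    const char *name, unsigned int val)
{
	int n;

	if (off >= len)
		return off;	/* buffer already full, drop the field */
	n = snprintf(buf + off, len - off, "%s%s=%u",
		     off ? " " : "", name, val);
	return n < 0 ? off : off + (size_t)n;
}

/* Format only the fields whose validity bit is set. */
static void sketch_format_valid(char *buf, size_t len,
				const struct sketch_dram_fields *f)
{
	size_t off = 0;

	if (len)
		buf[0] = '\0';
	if (f->validity_flags & SKETCH_VALID_CHANNEL)
		off = sketch_append(buf, len, off, "channel", f->channel);
	if (f->validity_flags & SKETCH_VALID_RANK)
		off = sketch_append(buf, len, off, "rank", f->rank);
	if (f->validity_flags & SKETCH_VALID_BANK)
		off = sketch_append(buf, len, off, "bank", f->bank);
}
```

A record with only the channel and bank bits set then prints just those two fields and leaves the stale rank value out of the output.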

> + show_mem_event_type(__entry->type),
> + show_trans_type(__entry->transaction_type),
> + __entry->channel, __entry->rank, __entry->nibble_mask,
> + __entry->bank_group, __entry->bank,
> + __entry->row, __entry->column,
> + __print_hex(__entry->cor_mask, CXL_EVT_DER_CORRECTION_MASK_SIZE),
> + show_dram_valid_flags(__entry->validity_flags)
> + )
> +);
> +
> #endif /* _CXL_TRACE_EVENTS_H */
>
> /* This part must be outside protection */

2022-08-25 11:35:00

by Jonathan Cameron

Subject: Re: [RFC PATCH 7/9] cxl/test: Add generic mock events

On Fri, 12 Aug 2022 22:32:41 -0700
[email protected] wrote:

> From: Ira Weiny <[email protected]>
>
> Facilitate testing basic Get/Clear Event functionality by creating
> multiple logs and generic events with made up UUID's.
>
> Data is completely made up with data patterns which should be easy to
> spot in trace output.
Hi Ira,

I'm tempted to hack this into the QEMU emulation, with an appropriately
complex interface to inject all the record types...
Lots to do there though, so not sure where this fits in my priority list!

>
> Test traces are easy to obtain with a small script such as this:
>
> #!/bin/bash -x
>
> devices=`find /sys/devices/platform -name cxl_mem*`
>
> # Generate fake events if reset is passed in

reset is rather unintuitive naming.

fill_event_queue maybe or something more in that direction?

> if [ "$1" == "reset" ]; then
> for device in $devices; do
> echo 1 > $device/mem*/event_reset
> done
> fi
>
> # Turn on tracing
> echo "" > /sys/kernel/tracing/trace
> echo 1 > /sys/kernel/tracing/events/cxl_events/enable
> echo 1 > /sys/kernel/tracing/tracing_on
>
> # Generate fake interrupt
> for device in $devices; do
> echo 1 > $device/mem*/event_trigger
> # just trigger 1
> break;
> done
>
> # Turn off tracing and report events
> echo 0 > /sys/kernel/tracing/tracing_on
> cat /sys/kernel/tracing/trace
>
> Signed-off-by: Ira Weiny <[email protected]>
> ---
> tools/testing/cxl/test/mem.c | 291 +++++++++++++++++++++++++++++++++++
> 1 file changed, 291 insertions(+)
>
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index e2f5445d24ff..87196d62acf5 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -9,6 +9,8 @@
> #include <linux/bits.h>
> #include <cxlmem.h>
>
> +#include <trace/events/cxl-events.h>
> +
> #define LSA_SIZE SZ_128K
> #define DEV_SIZE SZ_2G
> #define EFFECT(x) (1U << x)
> @@ -137,6 +139,287 @@ static int mock_partition_info(struct cxl_dev_state *cxlds,
> return 0;
> }
>
> +/*
> + * Mock Events
> + */
> +struct mock_event_log {
> + int cur_event;
> + int nr_events;
> + struct xarray events;

I'm not convinced an xarray is appropriate here (I'd have used
a fixed size array) but meh, I don't care that much and mocking
code doesn't have to be quick or elegant :)

> +};
> +
> +struct mock_event_store {
> + struct cxl_dev_state *cxlds;
> + struct mock_event_log *mock_logs[CXL_EVENT_TYPE_MAX];

Each entry isn't terribly big and there aren't that many of them.
Make the code simpler by just embedding the instances here?

> +};
> +
> +DEFINE_XARRAY(mock_cxlds_event_store);
> +
> +void delete_event_store(void *ds)
> +{
> + xa_store(&mock_cxlds_event_store, (unsigned long)ds, NULL, GFP_KERNEL);
> +}
> +
> +void store_event_store(struct mock_event_store *es)
> +{
> + struct cxl_dev_state *cxlds = es->cxlds;
> +
> + if (xa_insert(&mock_cxlds_event_store, (unsigned long)cxlds, es,
> + GFP_KERNEL)) {
> + dev_err(cxlds->dev, "Event store not available for %s\n",
> + dev_name(cxlds->dev));
> + return;
> + }
> +
> + devm_add_action_or_reset(cxlds->dev, delete_event_store, cxlds);
> +}
> +
> +struct mock_event_log *find_event_log(struct cxl_dev_state *cxlds, int log_type)
> +{
> + struct mock_event_store *es = xa_load(&mock_cxlds_event_store,
> + (unsigned long)cxlds);
> +
> + if (!es || log_type >= CXL_EVENT_TYPE_MAX)
> + return NULL;
> + return es->mock_logs[log_type];
> +}
> +
> +struct cxl_event_record_raw *get_cur_event(struct mock_event_log *log)
> +{
> + return xa_load(&log->events, log->cur_event);
> +}
> +
> +__le16 get_cur_event_handle(struct mock_event_log *log)
> +{
> + return cpu_to_le16(log->cur_event);
> +}
> +
> +static bool log_empty(struct mock_event_log *log)
> +{
> + return log->cur_event == log->nr_events;
> +}
> +
> +static int log_rec_left(struct mock_event_log *log)
> +{
> + return log->nr_events - log->cur_event;
> +}
> +
> +static void xa_events_destroy(void *l)
> +{
> + struct mock_event_log *log = l;
> +
> + xa_destroy(&log->events);
> +}
> +
> +static void event_store_add_event(struct mock_event_store *es,
> + enum cxl_event_log_type log_type,
> + struct cxl_event_record_raw *event)
> +{
> + struct mock_event_log *log;
> + struct device *dev = es->cxlds->dev;
> + int rc;
> +
> + if (log_type >= CXL_EVENT_TYPE_MAX)
> + return;
> +
> + log = es->mock_logs[log_type];
> + if (!log) {
> + log = devm_kzalloc(dev, sizeof(*log), GFP_KERNEL);

As above, I'd just embed the logs directly in the containing structure
rather than allocating on demand. init them all up front.

> + if (!log) {
> + dev_err(dev, "Failed to create %s log\n",
> + cxl_event_log_type_str(log_type));
> + return;
> + }
> + xa_init(&log->events);
> + devm_add_action(dev, xa_events_destroy, log);
> + es->mock_logs[log_type] = log;
> + }
> +
> + rc = xa_insert(&log->events, log->nr_events, event, GFP_KERNEL);
Not sure using an xa for a list really makes that much sense, but
doesn't matter hugely.
> + if (rc) {
> + dev_err(dev, "Failed to store event %s log\n",
> + cxl_event_log_type_str(log_type));
> + return;
> + }
> + log->nr_events++;

Having an index into a static set of events is more complex.
I'd either switch to a simple array of pointers, or actually add and
remove events (or pointers to them anyway).
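
Something like the following is the shape I was thinking of, written as a standalone sketch. All names, the capacity of 8 and the number of log types are assumptions for illustration; the mock would use struct cxl_event_record_raw and CXL_EVENT_TYPE_MAX instead.

```c
#include <stddef.h>

#define SKETCH_LOG_MAX_EVENTS	8	/* assumed capacity */
#define SKETCH_LOG_TYPE_MAX	4	/* stands in for CXL_EVENT_TYPE_MAX */

/* Placeholder for struct cxl_event_record_raw. */
struct sketch_event {
	unsigned int id;
};

struct sketch_event_log {
	int cur_event;
	int nr_events;
	const struct sketch_event *events[SKETCH_LOG_MAX_EVENTS];
};

/* Logs embedded directly, so there is nothing to allocate on demand. */
struct sketch_event_store {
	struct sketch_event_log logs[SKETCH_LOG_TYPE_MAX];
};

/* Returns 0 on success, -1 if the log is full. */
static int sketch_log_add(struct sketch_event_log *log,
			  const struct sketch_event *ev)
{
	if (log->nr_events >= SKETCH_LOG_MAX_EVENTS)
		return -1;
	log->events[log->nr_events++] = ev;
	return 0;
}

/* Returns the next unread event, or NULL when the log is empty. */
static const struct sketch_event *
sketch_log_cur(const struct sketch_event_log *log)
{
	if (log->cur_event >= log->nr_events)
		return NULL;
	return log->events[log->cur_event];
}

/* Clear only the event at the head of the log; -1 if out of order. */
static int sketch_log_clear(struct sketch_event_log *log, int handle)
{
	if (log->cur_event >= log->nr_events || handle != log->cur_event)
		return -1;
	log->cur_event++;
	return 0;
}
```

That keeps the cur_event / nr_events bookkeeping from the patch but drops the xarray, its init and destroy callbacks, and the NULL checks on the per-type log pointers.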

> +}
> +
> +/*
> + * Get and clear event only handle 1 record at a time as this is what is
> + * currently implemented in the main code.
> + */
> +static int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
> +{
> + struct cxl_get_event_payload *pl;
> + struct mock_event_log *log;
> + u8 log_type;
> +
> + /* Valid request? */
> + if (cmd->size_in != 1)
> + return -EINVAL;
> +
> + log_type = *((u8 *)cmd->payload_in);
> + if (log_type >= CXL_EVENT_TYPE_MAX)
> + return -EINVAL;
> +
> + log = find_event_log(cxlds, log_type);
> + if (!log || log_empty(log))
> + goto no_data;
> +
> + /* Don't handle more than 1 record at a time */
> + if (cmd->size_out < sizeof(*pl))
> + return -EINVAL;
> +
> + pl = cmd->payload_out;
> + memset(pl, 0, sizeof(*pl));
> +
> + pl->record_count = cpu_to_le16(1);
> +
> + if (log_rec_left(log) > 1)
> + pl->flags |= CXL_GET_EVENT_FLAG_MORE_RECORDS;
> +
> + memcpy(&pl->record, get_cur_event(log), sizeof(pl->record));
> + pl->record.hdr.handle = get_cur_event_handle(log);
> + return 0;
> +
> +no_data:
> + /* Room for header? */
> + if (cmd->size_out < (sizeof(*pl) - sizeof(pl->record)))
> + return -EINVAL;
> +
> + memset(cmd->payload_out, 0, cmd->size_out);
> + return 0;
> +}
> +
> +/*
> + * Get and clear event only handle 1 record at a time as this is what is
> + * currently implemented in the main code.

Duplicating this comment seems unnecessary.

> + */
> +static int mock_clear_event(struct cxl_dev_state *cxlds,
> + struct cxl_mbox_cmd *cmd)
> +{
> + struct cxl_mbox_clear_event_payload *pl = cmd->payload_in;
> + struct mock_event_log *log;
> + u8 log_type = pl->event_log;
> +
> + /* Don't handle more than 1 record at a time */
> + if (pl->nr_recs != 1)
> + return -EINVAL;
> +
> + if (log_type >= CXL_EVENT_TYPE_MAX)
> + return -EINVAL;
> +
> + log = find_event_log(cxlds, log_type);
> + if (!log)
> + return 0; /* No mock data in this log */
> +
> + /*
> + * The current code clears events as they are read
> + * Test that behavior; not clearning from the middle of the log
> + */
> + if (log->cur_event != le16_to_cpu(pl->handle)) {
> + dev_err(cxlds->dev, "Clearing events out of order\n");
> + return -EINVAL;
> + }
> +
> + log->cur_event++;
> + return 0;
> +}
> +
> +static ssize_t event_reset_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct cxl_memdev *cxlmd = container_of(dev, struct cxl_memdev, dev);
> + int i;
> +
> + for (i = CXL_EVENT_TYPE_INFO; i < CXL_EVENT_TYPE_MAX; i++) {
> + struct mock_event_log *log;
> +
> + log = find_event_log(cxlmd->cxlds, i);
> + if (log)
> + log->cur_event = 0;
> + }
> +
> + return count;
> +}
> +static DEVICE_ATTR_WO(event_reset);
> +
> +static ssize_t event_trigger_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct cxl_memdev *cxlmd = container_of(dev, struct cxl_memdev, dev);
> +
> + cxl_mem_get_event_records(cxlmd->cxlds);
> +
> + return count;
> +}
> +static DEVICE_ATTR_WO(event_trigger);
> +
> +static struct attribute *cxl_mock_event_attrs[] = {
> + &dev_attr_event_reset.attr,
> + &dev_attr_event_trigger.attr,
> + NULL
> +};
> +ATTRIBUTE_GROUPS(cxl_mock_event);
> +
> +void remove_mock_event_groups(void *dev)
static
> +{
> + device_remove_groups(dev, cxl_mock_event_groups);
> +}
> +
> +struct cxl_event_record_raw maint_needed = {
> + .hdr = {
> + .id = UUID_INIT(0xDEADBEEF, 0xCAFE, 0xBABE, 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> + .flags_length = cpu_to_le32((CXL_EVENT_RECORD_FLAG_MAINT_NEEDED << 8) |
> + sizeof(struct cxl_event_record_raw)),
> + /* .handle = Set dynamically */
> + .related_handle = cpu_to_le16(0xa5b6),
> + },
> + .data = { 0xDE, 0xAD, 0xBE, 0xEF },
> +};
> +
> +struct cxl_event_record_raw hardware_replace = {
static const?
> + .hdr = {
> + .id = UUID_INIT(0xBABECAFE, 0xBEEF, 0xDEAD, 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
> + .flags_length = cpu_to_le32((CXL_EVENT_RECORD_FLAG_HW_REPLACE << 8) |
> + sizeof(struct cxl_event_record_raw)),
> + /* .handle = Set dynamically */
> + .related_handle = cpu_to_le16(0xb6a5),
> + },
> + .data = { 0xDE, 0xAD, 0xBE, 0xEF },
> +};
> +
> +static void devm_cxl_mock_event_logs(struct cxl_memdev *cxlmd)
> +{
> + struct device *dev = &cxlmd->dev;
> + struct mock_event_store *es;
> +
> + /*
> + * The memory device gets the sysfs attributes such that the cxlmd
> + * pointer can be used to get to a cxlds pointer.
> + */
> + if (device_add_groups(dev, cxl_mock_event_groups))

Whilst it might not matter in a mocking driver, it's normal to jump through
hoops to avoid doing this because it races with userspace notifications in
all sorts of hideous ways. It makes the sysfs maintainers very grumpy ;)
To do it here, you would need to pass the group to devm_cxl_add_memdev()
and have that slip it in before the cdev_device_add() call I think.
That wouldn't be particularly invasive though.

> + return;
> + if (devm_add_action_or_reset(dev, remove_mock_event_groups, dev))
> + return;
> +
> + /*
> + * All the mock event data hangs off the device itself.

Nitpick of the day: Single line comment syntax ;)

> + */
> + es = devm_kzalloc(cxlmd->cxlds->dev, sizeof(*es), GFP_KERNEL);
> + if (!es)
> + return;
> + es->cxlds = cxlmd->cxlds;
> +
> + event_store_add_event(es, CXL_EVENT_TYPE_INFO, &maint_needed);
> +
> + event_store_add_event(es, CXL_EVENT_TYPE_FATAL, &hardware_replace);
> +
> + store_event_store(es);
> +}
> +
> static int mock_get_lsa(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
> {
> struct cxl_mbox_get_lsa *get_lsa = cmd->payload_in;
> @@ -224,6 +507,12 @@ static int cxl_mock_mbox_send(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *
> case CXL_MBOX_OP_GET_PARTITION_INFO:
> rc = mock_partition_info(cxlds, cmd);
> break;
> + case CXL_MBOX_OP_GET_EVENT_RECORD:
> + rc = mock_get_event(cxlds, cmd);
> + break;
> + case CXL_MBOX_OP_CLEAR_EVENT_RECORD:
> + rc = mock_clear_event(cxlds, cmd);
> + break;
> case CXL_MBOX_OP_SET_LSA:
> rc = mock_set_lsa(cxlds, cmd);
> break;
> @@ -285,6 +574,8 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
> if (IS_ERR(cxlmd))
> return PTR_ERR(cxlmd);
>
> + devm_cxl_mock_event_logs(cxlmd);
> +
> cxl_mem_get_event_records(cxlds);
>
> if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))

2022-08-25 12:00:12

by Jonathan Cameron

Subject: Re: [RFC PATCH 6/9] cxl/mem: Trace Memory Module Event Record

On Fri, 12 Aug 2022 22:32:40 -0700
[email protected] wrote:

> From: Ira Weiny <[email protected]>
>
> CXL v3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.
>
> Determine if the event read is memory module record and if so trace the
> record.
>
> Signed-off-by: Ira Weiny <[email protected]>
Similar comments to those on the previous patches about using
get_unaligned_le*()

> ---
> drivers/cxl/core/mbox.c | 16 +++
> drivers/cxl/cxlmem.h | 25 +++++
> include/trace/events/cxl-events.h | 155 ++++++++++++++++++++++++++++++
> 3 files changed, 196 insertions(+)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 6414588a3c7b..99b09bfeaff5 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -725,6 +725,14 @@ static const uuid_t dram_event_uuid =
> UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
> 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
>
> +/*
> + * Memory Module Event Record
> + * CXL v3.0 section 8.2.9.2.1.3; Table 8-45
> + */
> +static const uuid_t mem_mod_event_uuid =
> + UUID_INIT(0xfe927475, 0xdd59, 0x4339,
> + 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74);
> +
> static void cxl_trace_event_record(const char *dev_name,
> enum cxl_event_log_type type,
> struct cxl_get_event_payload *payload)
> @@ -747,6 +755,14 @@ static void cxl_trace_event_record(const char *dev_name,
> return;
> }
>
> + if (uuid_equal(id, &mem_mod_event_uuid)) {
> + struct cxl_evt_mem_mod_rec *rec =
> + (struct cxl_evt_mem_mod_rec *)&payload->record;
> +
> + trace_cxl_mem_mod_event(dev_name, type, rec);
> + return;
> + }
> +
> /* For unknown record types print just the header */
> trace_cxl_event(dev_name, type, &payload->record);
> }
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 50536c0a7850..a02a41dfd988 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -445,6 +445,31 @@ struct cxl_evt_dram_rec {
> u8 correction_mask[CXL_EVT_DER_CORRECTION_MASK_SIZE];
> } __packed;
>
> +/*
> + * Get Health Info Record
> + * CXL v3.0 section 8.2.9.8.3.1; Table 8-100
> + */
> +struct cxl_get_health_info {
> + u8 health_status;
> + u8 media_status;
> + u8 add_status;
> + u8 life_used;
> + u16 device_temp;

As previous - even though they aren't aligned, I'd have thought
__le16 etc will still work. The unaligned accessors are fine
taking __le16 * for example.

> + u32 dirty_shutdown_cnt;
> + u32 cor_vol_err_cnt;
> + u32 cor_per_err_cnt;
> +} __packed;
> +
> +/*
> + * Memory Module Event Record
> + * CXL v3.0 section 8.2.9.2.1.3; Table 8-45
> + */
> +struct cxl_evt_mem_mod_rec {
> + struct cxl_event_record_hdr hdr;
> + u8 event_type;
> + struct cxl_get_health_info info;
> +} __packed;
> +
> struct cxl_mbox_get_partition_info {
> __le64 active_volatile_cap;
> __le64 active_persistent_cap;
> diff --git a/include/trace/events/cxl-events.h b/include/trace/events/cxl-events.h
> index db9b34ddd240..dbbe25fee25c 100644
> --- a/include/trace/events/cxl-events.h
> +++ b/include/trace/events/cxl-events.h
> @@ -358,6 +358,161 @@ TRACE_EVENT(cxl_dram_event,
> )
> );
>
> +/*
> + * Memory Module Event Record - MMER
> + *
> + * CXL v2.0 section 8.2.9.1.1.3; Table 156, Table 181
> + *
> + * Device Health Information - DHI; Table 181
> + */
> +#define CXL_MMER_HEALTH_STATUS_CHANGE 0x00
> +#define CXL_MMER_MEDIA_STATUS_CHANGE 0x01
> +#define CXL_MMER_LIFE_USED_CHANGE 0x02
> +#define CXL_MMER_TEMP_CHANGE 0x03
> +#define CXL_MMER_DATA_PATH_ERROR 0x04
> +#define CXL_MMER_LAS_ERROR 0x05
> +#define show_dev_evt_type(type) __print_symbolic(type, \
> + { CXL_MMER_HEALTH_STATUS_CHANGE, "Health Status Change" }, \
> + { CXL_MMER_MEDIA_STATUS_CHANGE, "Media Status Change" }, \
> + { CXL_MMER_LIFE_USED_CHANGE, "Life Used Change" }, \
> + { CXL_MMER_TEMP_CHANGE, "Temperature Change" }, \
> + { CXL_MMER_DATA_PATH_ERROR, "Data Path Error" }, \
> + { CXL_MMER_LAS_ERROR, "LSA Error" } \
> +)
> +
> +#define CXL_DHI_HS_MAINTENANCE_NEEDED BIT(0)
> +#define CXL_DHI_HS_PERFORMANCE_DEGRADED BIT(1)
> +#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED BIT(2)
> +#define show_health_status_flags(flags) __print_flags(flags, "|", \
> + { CXL_DHI_HS_MAINTENANCE_NEEDED, "Maintenance Needed" }, \
> + { CXL_DHI_HS_PERFORMANCE_DEGRADED, "Performance Degraded" }, \
> + { CXL_DHI_HS_HW_REPLACEMENT_NEEDED, "Replacement Needed" } \
> +)
> +
> +#define CXL_DHI_MS_NORMAL 0x00
> +#define CXL_DHI_MS_NOT_READY 0x01
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOST 0x02
> +#define CXL_DHI_MS_ALL_DATA_LOST 0x03
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS 0x04
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN 0x05
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT 0x06
> +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS 0x07
> +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN 0x08
> +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT 0x09
> +#define show_media_status(ms) __print_symbolic(ms, \
> + { CXL_DHI_MS_NORMAL, \
> + "Normal" }, \
> + { CXL_DHI_MS_NOT_READY, \
> + "Not Ready" }, \
> + { CXL_DHI_MS_WRITE_PERSISTENCY_LOST, \
> + "Write Persistency Lost" }, \
> + { CXL_DHI_MS_ALL_DATA_LOST, \
> + "All Data Lost" }, \
> + { CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS, \
> + "Write Persistency Loss in the Event of Power Loss" }, \
> + { CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN, \
> + "Write Persistency Loss in Event of Shutdown" }, \
> + { CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT, \
> + "Write Persistency Loss Imminent" }, \
> + { CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS, \
> + "All Data Loss in Event of Power Loss" }, \
> + { CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN, \
> + "All Data loss in the Event of Shutdown" }, \
> + { CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT, \
> + "All Data Loss Imminent" } \
> +)
> +
> +#define CXL_DHI_AS_NORMAL 0x0
> +#define CXL_DHI_AS_WARNING 0x1
> +#define CXL_DHI_AS_CRITICAL 0x2
> +#define show_add_status(as) __print_symbolic(as, \
> + { CXL_DHI_AS_NORMAL, "Normal" }, \
> + { CXL_DHI_AS_WARNING, "Warning" }, \
> + { CXL_DHI_AS_CRITICAL, "Critical" } \
> +)
> +
> +#define CXL_DHI_AS_LIFE_USED(as) (as & 0x3)
> +#define CXL_DHI_AS_DEV_TEMP(as) ((as & 0xC) >> 2)
> +#define CXL_DHI_AS_COR_VOL_ERR_CNT(as) ((as & 0x10) >> 4)
> +#define CXL_DHI_AS_COR_PER_ERR_CNT(as) ((as & 0x20) >> 5)
> +
> +TRACE_EVENT(cxl_mem_mod_event,
> +
> + TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> + struct cxl_evt_mem_mod_rec *rec),
> +
> + TP_ARGS(dev_name, log, rec),
> +
> + TP_STRUCT__entry(
> + /* Common */
> + __string(dev_name, dev_name)
> + __field(int, log)
> + __array(u8, id, UUID_SIZE)
> + __field(u32, flags)
> + __field(u16, handle)
> + __field(u16, related_handle)
> + __field(u64, timestamp)
> +
> + /* Memory Module Event */
> + __field(u8, event_type)
> +
> + /* Device Health Info */
> + __field(u8, health_status)
> + __field(u8, media_status)
> + __field(u8, life_used)
> + __field(u32, dirty_shutdown_cnt)
> + __field(u32, cor_vol_err_cnt)
> + __field(u32, cor_per_err_cnt)
> + __field(s16, device_temp)
> + __field(u8, add_status)
> + ),
> +
> + TP_fast_assign(
> + /* Common */
> + __assign_str(dev_name, dev_name);
> + memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> + __entry->log = log;
> + __entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> + __entry->handle = le16_to_cpu(rec->hdr.handle);
> + __entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> + __entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
> +
> + /* Memory Module Event */
> + __entry->event_type = rec->event_type;
> +
> + /* Device Health Info */
> + __entry->health_status = rec->info.health_status;
> + __entry->media_status = rec->info.media_status;
> + __entry->life_used = rec->info.life_used;
> + __entry->dirty_shutdown_cnt = le32_to_cpu(rec->info.dirty_shutdown_cnt);
> + __entry->cor_vol_err_cnt = le32_to_cpu(rec->info.cor_vol_err_cnt);

I've lost track, but my guess is some / all of these need get_unaligned_le32()
etc rather than the aligned form. Maybe just be lazy and use the unaligned versions
even when things happen to be aligned - then we don't have to think about it
when reviewing :)

> + __entry->cor_per_err_cnt = le32_to_cpu(rec->info.cor_per_err_cnt);
> + __entry->device_temp = le16_to_cpu(rec->info.device_temp);
> + __entry->add_status = rec->info.add_status;
> + ),
> +
> + TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
> + "evt_type='%s' health_status='%s' media_status='%s' as_life_used=%s " \
> + "as_dev_temp=%s as_cor_vol_err_cnt=%s as_cor_per_err_cnt=%s " \
> + "life_used=%u dev_temp=%d dirty_shutdown_cnt=%u cor_vol_err_cnt=%u " \
> + "cor_per_err_cnt=%u",
> + __get_str(dev_name), show_log_type(__entry->log),
> + __entry->timestamp, __entry->id, __entry->handle,
> + __entry->related_handle, show_hdr_flags(__entry->flags),
> +
> + show_dev_evt_type(__entry->event_type),
> + show_health_status_flags(__entry->health_status),
> + show_media_status(__entry->media_status),
> + show_add_status(CXL_DHI_AS_LIFE_USED(__entry->add_status)),
> + show_add_status(CXL_DHI_AS_DEV_TEMP(__entry->add_status)),
> + show_add_status(CXL_DHI_AS_COR_VOL_ERR_CNT(__entry->add_status)),
> + show_add_status(CXL_DHI_AS_COR_PER_ERR_CNT(__entry->add_status)),
> + __entry->life_used, __entry->device_temp,
> + __entry->dirty_shutdown_cnt, __entry->cor_vol_err_cnt,
> + __entry->cor_per_err_cnt)
> +);
> +
> +
> #endif /* _CXL_TRACE_EVENTS_H */
>
> /* This part must be outside protection */

2022-08-25 12:16:25

by Jonathan Cameron

Subject: Re: [RFC PATCH 8/9] cxl/test: Add specific events

On Fri, 12 Aug 2022 22:32:42 -0700
[email protected] wrote:

> From: Ira Weiny <[email protected]>
>
> Each type of event has different trace point outputs.
>
> Add mock General Media Event, DRAM event, and Memory Module Event
> records to the mock list of events returned.
>
> Signed-off-by: Ira Weiny <[email protected]>
> ---
> tools/testing/cxl/test/mem.c | 70 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 70 insertions(+)
>
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index 87196d62acf5..c5d7857ae2e5 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -391,6 +391,70 @@ struct cxl_event_record_raw hardware_replace = {
> .data = { 0xDE, 0xAD, 0xBE, 0xEF },
> };
>
> +struct cxl_evt_gen_media gen_media = {
> + .hdr = {
> + .id = UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
> + 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
> + .flags_length = cpu_to_le32((CXL_EVENT_RECORD_FLAG_PERMANENT << 8) |
> + sizeof(struct cxl_evt_gen_media)),
> + /* .handle = Set dynamically */
> + .related_handle = cpu_to_le16(0),
> + },
> + .phys_addr = cpu_to_le64(0x2000),
> + .descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
> + .type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
> + .transaction_type = CXL_GMER_TRANS_HOST_WRITE,
> + .validity_flags = cpu_to_le16(CXL_GMER_VALID_CHANNEL |
> + CXL_GMER_VALID_RANK),

No actual effect (I think: __put_unaligned_t is basically
forcing a packed structure element), but put_unaligned_le16() would
make it clear this is unaligned?

> + .channel = 1,
> + .rank = 30
> +};
> +
> +struct cxl_evt_dram_rec dram_rec = {
> + .hdr = {
> + .id = UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
> + 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
> + .flags_length = cpu_to_le32((CXL_EVENT_RECORD_FLAG_PERF_DEGRADED << 8) |
> + sizeof(struct cxl_evt_dram_rec)),
> + /* .handle = Set dynamically */
> + .related_handle = cpu_to_le16(0),
> + },
> + .phys_addr = cpu_to_le64(0x8000),
> + .descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
> + .type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
> + .transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
> + .validity_flags = cpu_to_le16(CXL_DER_VALID_CHANNEL |
> + CXL_DER_VALID_BANK_GROUP |
> + CXL_DER_VALID_BANK |
> + CXL_DER_VALID_COLUMN),
> + .channel = 1,
> + .bank_group = 5,
> + .bank = 2,
> + .column = cpu_to_le16(1024)
> +};
> +
> +struct cxl_evt_mem_mod_rec mem_mod_rec = {
> + .hdr = {
> + .id = UUID_INIT(0xfe927475, 0xdd59, 0x4339,
> + 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),
> + .flags_length = cpu_to_le32(sizeof(struct cxl_evt_mem_mod_rec)),
> + /* .handle = Set dynamically */
> + .related_handle = cpu_to_le16(0),
> + },
> + .event_type = CXL_MMER_TEMP_CHANGE,
> + .info = {
> + .health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
> + .media_status = CXL_DHI_MS_ALL_DATA_LOST,
> + .add_status = (CXL_DHI_AS_CRITICAL << 2) |

Can we use masks + FIELD_PREP() for these rather than
magic shifts here?
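
A userspace sketch of the idea, under the assumption that the field layout is the one implied by the CXL_DHI_AS_*() macros in patch 6. The sketch_* helpers and mask names are made up; in the kernel this would be FIELD_PREP() and FIELD_GET() from <linux/bitfield.h> with GENMASK()/BIT() masks.

```c
#include <stdint.h>

/* Layout of add_status as implied by the CXL_DHI_AS_*() macros. */
#define SKETCH_AS_LIFE_USED_MASK	0x03u	/* bits 1:0 */
#define SKETCH_AS_DEV_TEMP_MASK		0x0cu	/* bits 3:2 */
#define SKETCH_AS_COR_VOL_ERR_MASK	0x10u	/* bit 4 */
#define SKETCH_AS_COR_PER_ERR_MASK	0x20u	/* bit 5 */

/* Lowest set bit of a non-zero mask, i.e. 1 << shift. */
static inline uint32_t sketch_mask_lsb(uint32_t mask)
{
	return mask & (~mask + 1u);
}

/* Stand-in for FIELD_PREP(mask, val): place val into the field. */
static inline uint32_t sketch_field_prep(uint32_t mask, uint32_t val)
{
	return (val * sketch_mask_lsb(mask)) & mask;
}

/* Stand-in for FIELD_GET(mask, reg): extract the field from reg. */
static inline uint32_t sketch_field_get(uint32_t mask, uint32_t reg)
{
	return (reg & mask) / sketch_mask_lsb(mask);
}
```

The initializer then names each field it sets (critical device temperature, warning on both corrected error counts) instead of relying on the reader to decode shifts of 2, 4 and 5.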

> + (CXL_DHI_AS_WARNING << 4) |
> + (CXL_DHI_AS_WARNING << 5),
> + .device_temp = cpu_to_le16(1000),
> + .dirty_shutdown_cnt = cpu_to_le32(30000),
> + .cor_vol_err_cnt = cpu_to_le32(30100),
> + .cor_per_err_cnt = cpu_to_le32(40100),
> + }
> +};
> +
> static void devm_cxl_mock_event_logs(struct cxl_memdev *cxlmd)
> {
> struct device *dev = &cxlmd->dev;
> @@ -414,8 +478,14 @@ static void devm_cxl_mock_event_logs(struct cxl_memdev *cxlmd)
> es->cxlds = cxlmd->cxlds;
>
> event_store_add_event(es, CXL_EVENT_TYPE_INFO, &maint_needed);
> + event_store_add_event(es, CXL_EVENT_TYPE_INFO,
> + (struct cxl_event_record_raw *)&gen_media);
> + event_store_add_event(es, CXL_EVENT_TYPE_INFO,
> + (struct cxl_event_record_raw *)&mem_mod_rec);
>
> event_store_add_event(es, CXL_EVENT_TYPE_FATAL, &hardware_replace);
> + event_store_add_event(es, CXL_EVENT_TYPE_FATAL,
> + (struct cxl_event_record_raw *)&dram_rec);
>
> store_event_store(es);
> }

2022-09-01 18:30:43

Subject: Re: [RFC PATCH 0/9] CXL: Read and clear event logs

On 8/24/2022 3:07 AM, Jonathan Cameron wrote:
> On Mon, 22 Aug 2022 15:53:54 -0700
> Ira Weiny <[email protected]> wrote:
>
>> On Mon, Aug 22, 2022 at 09:18:02AM -0700, Davidlohr Bueso wrote:
>>> On Fri, 12 Aug 2022, [email protected] wrote:
>>>
>>>> From: Ira Weiny <[email protected]>
>>>>
>>>> Event records inform the OS of various device events. Events are not needed
>>>> for any kernel operation but various user level software will want to track
>>>> events.
>>>>
>>>> Add event reporting through the trace event mechanism. On driver load read and
>>>> clear all device events.
>>>>
>>>> Normally interrupts will trigger new events to be reported as they occur.
>>>> Because the interrupt code is still being worked on this series provides a
>>>> cxl-test mechanism to create a series of events and trigger the reporting of
>>>> those events.
>>> Where is this irq code being worked on? I've asked about this for async mbox
>>> commands, and Jonathan has also posted some code for the PMU implementation.
>> I'm still trying to work out how to share irq's between PCI and CXL. Mainly
>> for DOE.
>>
>> I thought that we could skip IRQ support for DOE completely and this would
>> support your proposal below. But I just found that:
>>
>> "A device may interrupt the host when CDAT content changes using the MSI
>> associated with this DOE Capability instance."
> As of today that doesn't work because there is no status flag anywhere to let
> you know that was the interrupt source.
>
> It's been raised in appropriate places, but I can't say anymore on that
> until stuff is published.
>
> Hence I'd not worry about that corner for now.
>
>> So I guess it needs to be supported at some point.
>>
>>> Could we not just start with an initial MSI/MSI-X support? Then gradually
>>> interested users can be added? So each "feature" would need to implement
>>> its "get message number" and to install the isr just do the standard:
>>>
>>> irq = pci_irq_vector(pdev, num);
>>> irq_name = devm_kasprintf(dev, GFP_KERNEL, "%s_%s\n", dev_name(dev),
>>> cxl_irq_cap_table[feature].name);
>>> rc = devm_request_irq(dev, irq, isr_fn, IRQF_SHARED, irq_name, info);
>>>
>>> The only complexity I see for this is knowing the number of vectors to request
>>> a priori, for which we'd have to get the largest value of all CXL features that
>>> can support interrupts. Something like the following?
>> Generally it seems ok but I have questions below.
>>
>>> One thing I have not
>>> considered in this is the DOE stuff.
>> I think this is the harder thing to support because of needing to allow both
>> the PCI layer and the CXL layer to create irqs. Potentially at different
>> times.
> My reasoning on this is that IRQ creation has to be done by
> the PCI device driver. That may result in some juggling and late starting
> or indeed restarting of DOE mailboxes once we can know the list of vectors.
> (e.g. query them by polling, then a later driver register can request enabling
> the DOE with an irq).
> Or it needs the ability to do dynamic increasing of the requested IRQ vectors.

tglx was working on dynamic MSI-X a while back. Not sure what the state of
that is now.

https://lore.kernel.org/lkml/87a6hof5sr.ffs@tglx/T/

DJ

>
>>> Thanks,
>>> Davidlohr
>>>
>>> ------
>>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>>> index 88e3a8e54b6a..b334d2f497c1 100644
>>> --- a/drivers/cxl/cxlmem.h
>>> +++ b/drivers/cxl/cxlmem.h
>>> @@ -245,6 +245,8 @@ struct cxl_dev_state {
>>> resource_size_t component_reg_phys;
>>> u64 serial;
>>>
>>> + int irq_type; /* MSI-X, MSI */
>>> +
>>> struct xarray doe_mbs;
>>>
>>> int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
>>> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
>>> index eec597dbe763..95f4b91f43b1 100644
>>> --- a/drivers/cxl/cxlpci.h
>>> +++ b/drivers/cxl/cxlpci.h
>>> @@ -53,15 +53,6 @@
>>> #define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
>>> #define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)
>>>
>>> -/* Register Block Identifier (RBI) */
>>> -enum cxl_regloc_type {
>>> - CXL_REGLOC_RBI_EMPTY = 0,
>>> - CXL_REGLOC_RBI_COMPONENT,
>>> - CXL_REGLOC_RBI_VIRT,
>>> - CXL_REGLOC_RBI_MEMDEV,
>>> - CXL_REGLOC_RBI_TYPES
>>> -};
>> Why move this?
>>
>>> -
>>> static inline resource_size_t cxl_regmap_to_base(struct pci_dev *pdev,
>>> struct cxl_register_map *map)
>>> {
>>> @@ -75,4 +66,44 @@ int devm_cxl_port_enumerate_dports(struct cxl_port *port);
>>> struct cxl_dev_state;
>>> int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm);
>>> void read_cdat_data(struct cxl_port *port);
>>> +
>>> +#define CXL_IRQ_CAPABILITY_TABLE \
>>> + C(ISOLATION, "isolation", NULL), \
>>> + C(PMU, "pmu_overflow", NULL), /* per pmu instance */ \
>>> + C(MBOX, "mailbox", NULL), /* primary-only */ \
>>> + C(EVENT, "event", NULL),
>> This is defining get_max_msgnum to NULL right?
>>
>>> +
>>> +#undef C
>>> +#define C(a, b, c) CXL_IRQ_CAPABILITY_##a
>>> +enum { CXL_IRQ_CAPABILITY_TABLE };
>>> +#undef C
>>> +#define C(a, b, c) { b, c }
>>> +/**
>>> + * struct cxl_irq_cap - CXL feature that is capable of receiving MSI/MSI-X irqs.
>>> + *
>>> + * @name: Name of the device generating this interrupt.
>>> + * @get_max_msgnum: Get the feature's largest interrupt message number. In cases
>>> + * where there is only one instance it also indicates which
>>> + * MSI/MSI-X vector is used for the interrupt message generated
>>> + * in association with the feature. If the feature does not
>>> + * have the Interrupt Supported bit set, then return -1.
>>> + */
>>> +struct cxl_irq_cap {
>>> + const char *name;
>>> + int (*get_max_msgnum)(struct cxl_dev_state *cxlds);
>>> +};
>>> +
>>> +static const
>>> +struct cxl_irq_cap cxl_irq_cap_table[] = { CXL_IRQ_CAPABILITY_TABLE };
>>> +#undef C
>> Why all this macro magic?
> Agreed. I'm rarely persuaded it's a good idea to do this sort of trickery
> and it definitely isn't worth the readability problems unless there are a
> large number of users.
>
>>> +
>>> +/* Register Block Identifier (RBI) */
>>> +enum cxl_regloc_type {
>>> + CXL_REGLOC_RBI_EMPTY = 0,
>>> + CXL_REGLOC_RBI_COMPONENT,
>>> + CXL_REGLOC_RBI_VIRT,
>>> + CXL_REGLOC_RBI_MEMDEV,
>>> + CXL_REGLOC_RBI_TYPES
>>> +};
>>> +
>>> #endif /* __CXL_PCI_H__ */
>>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>>> index faeb5d9d7a7a..c0fe78e0559b 100644
>>> --- a/drivers/cxl/pci.c
>>> +++ b/drivers/cxl/pci.c
>>> @@ -387,6 +387,52 @@ static int cxl_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
>>> return rc;
>>> }
>>>
>>> +static void cxl_pci_free_irq_vectors(void *data)
>>> +{
>>> + pci_free_irq_vectors(data);
>>> +}
>>> +
>>> +static int cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
>>> +{
>>> + struct device *dev = cxlds->dev;
>>> + struct pci_dev *pdev = to_pci_dev(dev);
>>> + int rc, i, vectors = -1;
>>> +
>>> + for (i = 0; i < ARRAY_SIZE(cxl_irq_cap_table); i++) {
>>> + int irq;
>>> +
>>> + if (!cxl_irq_cap_table[i].get_max_msgnum)
>>> + continue;
>>> +
>>> + irq = cxl_irq_cap_table[i].get_max_msgnum(cxlds);
>>> + vectors = max_t(int, irq, vectors);
>>> + }
>>> +
>>> + if (vectors == -1)
>>> + return -EINVAL; /* no irq support whatsoever */
>>> +
>>> + vectors++;
>> This is pretty much what earlier versions of the DOE code did, with the
>> exception of only having 1 get_max_msgnum() call defined (for DOE). But there
>> was a lot of debate about how to share vectors with the PCI layer. And
>> eventually we got rid of it. I'm still trying to figure it out. Sorry for
>> being slow.
> I'm not yet seeing a huge advantage in wrapping this up. For now a set of
> linear calls to establish the max irq vector is more readable. Sure,
> down the line moving to this may make sense.
>
>> Perhaps we do this for this series. However, won't we have an issue if we want
>> to support switch events?
> We 'could' extend existing stuff in the portdrv code (which is ultimately
> where this general approach was copied from ;) but I suspect doing that
> for non generic PCI stuff is going to be controversial.
>
> That whole infrastructure in PCI may need a rewrite.
>
>> Ira
>>
>>> + rc = pci_alloc_irq_vectors(pdev, vectors, vectors, PCI_IRQ_MSIX);
>>> + if (rc < 0) {
>>> + rc = pci_alloc_irq_vectors(pdev, vectors, vectors, PCI_IRQ_MSI);
>>> + if (rc < 0)
>>> + return rc;
>>> +
>>> + cxlds->irq_type = PCI_IRQ_MSI;
>>> + } else {
>>> + cxlds->irq_type = PCI_IRQ_MSIX;
>>> + }
>>> +
>>> + if (rc != vectors) {
>>> + pci_err(pdev, "Not enough interrupts; use polling where supported\n");
>>> + /* Some got allocated; clean them up */
>>> + cxl_pci_free_irq_vectors(pdev);
>>> + return -ENOSPC;
>>> + }
>>> +
>>> + return devm_add_action_or_reset(dev, cxl_pci_free_irq_vectors, pdev);
>>> +}
>>> +
>>> static void cxl_pci_destroy_doe(void *mbs)
>>> {
>>> xa_destroy(mbs);
>>> @@ -476,6 +522,9 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>>
>>> cxlds->component_reg_phys = cxl_regmap_to_base(pdev, &map);
>>>
>>> + if (cxl_pci_alloc_irq_vectors(cxlds))
>>> + cxlds->irq_type = 0;
>>> +
>>> devm_cxl_pci_create_doe(cxlds);
>>>
>>> rc = cxl_pci_setup_mailbox(cxlds);
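
For anyone skimming the thread, the vector-counting rule in the quoted patch
can be shown in isolation. The following is a minimal user-space sketch, not
the driver code: struct fake_dev, the two callbacks and count_vectors() are
hypothetical names, and the table is written out by hand instead of going
through the C() macro, along the lines of the "linear calls" suggestion above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cxl_dev_state; values are made up. */
struct fake_dev {
	int mbox_msgnum;   /* -1 means the feature reports no interrupt support */
	int event_msgnum;
};

/* One entry per feature that may raise MSI/MSI-X interrupts. */
struct irq_cap {
	const char *name;
	int (*get_max_msgnum)(const struct fake_dev *dev); /* may be NULL */
};

static int mbox_max_msgnum(const struct fake_dev *dev)
{
	return dev->mbox_msgnum;
}

static int event_max_msgnum(const struct fake_dev *dev)
{
	return dev->event_msgnum;
}

static const struct irq_cap irq_caps[] = {
	{ "isolation", NULL },             /* no callback wired up yet */
	{ "mailbox",   mbox_max_msgnum },
	{ "event",     event_max_msgnum },
};

/*
 * Return the number of vectors to request, or -1 if no feature reports
 * interrupt support. Message numbers are zero based, so the count is the
 * largest message number plus one.
 */
static int count_vectors(const struct fake_dev *dev)
{
	int max = -1;
	size_t i;

	assert(dev != NULL);

	for (i = 0; i < sizeof(irq_caps) / sizeof(irq_caps[0]); i++) {
		int msgnum;

		if (!irq_caps[i].get_max_msgnum)
			continue;

		msgnum = irq_caps[i].get_max_msgnum(dev);
		if (msgnum > max)
			max = msgnum;
	}

	return max < 0 ? -1 : max + 1;
}
```

With message numbers 2 and 5 the sketch asks for 6 vectors; with every
feature returning -1 it reports that interrupts are not available at all,
which is the case the quoted probe code maps to irq_type = 0.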

2022-09-09 22:02:21

Subject: Re: [RFC PATCH 2/9] cxl/mem: Implement Clear Event Records command

On Wed, Aug 24, 2022 at 04:55:13PM +0100, Jonathan Cameron wrote:
> On Fri, 12 Aug 2022 22:32:36 -0700
> [email protected] wrote:
>
> > From: Ira Weiny <[email protected]>
> >
> > CXL v3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
> > command. After an event record is read it needs to be cleared from the
> > event log.
> >
> > Implement cxl_clear_event_record() and call it for each record retrieved
> > from the device.
> >
> > Each record is cleared individually. A clear all bit is specified but
> > events could arrive between a get and the final clear all operation.
> > Therefore each event is cleared specifically.
> >
> > Signed-off-by: Ira Weiny <[email protected]>
> Trivial suggestions inline, but other than that LGTM
>
> Reviewed-by: Jonathan Cameron <[email protected]>

Thanks!

>
> > ---
> > drivers/cxl/core/mbox.c | 31 ++++++++++++++++++++++++++++---
> > drivers/cxl/cxlmem.h | 15 +++++++++++++++
> > include/uapi/linux/cxl_mem.h | 1 +
> > 3 files changed, 44 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > index 2cceed8608dc..493f5ceb5d1c 100644
> > --- a/drivers/cxl/core/mbox.c
> > +++ b/drivers/cxl/core/mbox.c
> > @@ -52,6 +52,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
> > #endif
> > CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
> > CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
> > + CXL_CMD(CLEAR_EVENT_RECORD, CXL_VARIABLE_PAYLOAD, 0, 0),
> > CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
> > CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
> > CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> > @@ -708,6 +709,26 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
> > }
> > EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
> >
> > +static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
> > + enum cxl_event_log_type log,
> > + __le16 handle)
> > +{
> > + struct cxl_mbox_clear_event_payload payload;
> > + int rc;
> > +
> > + memset(&payload, 0, sizeof(payload));
>
> Could just do payload = {};
>
> Though as you are setting stuff, why not just do
>
> payload = {
> .event_log = log,
> .nr_recs = 1,
> .handle = handle,
> };
> and let the compiler zero anything else (I think there are no holes to complicate
> things).

Yea! Done.

>
> > + payload.event_log = log;
> > + payload.nr_recs = 1;
> > + payload.handle = handle;
> > +
> > + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_CLEAR_EVENT_RECORD,
> > + &payload, sizeof(payload), NULL, 0);
>
> return cxl_mbox_send_cmd() and drop rc definition.

And Done. I've also used the return value now! ;-)

Thanks again!
Ira

2022-09-12 23:01:54

Subject: Re: [RFC PATCH 4/9] cxl/mem: Trace General Media Event Record

On Wed, Aug 24, 2022 at 05:11:13PM +0100, Jonathan Cameron wrote:
> On Fri, 12 Aug 2022 22:32:38 -0700
> [email protected] wrote:
>
> > From: Ira Weiny <[email protected]>
> >
> > CXL v3.0 section 8.2.9.2.1.1 defines the General Media Event Record.
> >
> > Determine if the event read is a general media record and if so trace
> > the record.
> >
> > Signed-off-by: Ira Weiny <[email protected]>
> A few trivial things inline...
>

[snip]

> > +/*
> > + * General Media Event Record - GMER
> > + * CXL v2.0 Section 8.2.9.1.1.1; Table 154
> > + */
> > +#define CXL_GMER_PHYS_ADDR_VOLATILE BIT(0)
> > +#define CXL_GMER_PHYS_ADDR_MASK 0x3f
>
> Inverse of mask is confusing. Just specify the full mask.

Fixed

[snip]

> > + TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
> > + "phys_addr=%llx volatile=%s desc='%s' type='%s' trans_type='%s' channel=%u " \
> > + "rank=%u device=%x comp_id=%s valid_flags='%s'",
> > + __get_str(dev_name), show_log_type(__entry->log),
> > + __entry->timestamp, __entry->id, __entry->handle,
> > + __entry->related_handle, show_hdr_flags(__entry->flags),
> > + __entry->phys_addr & ~CXL_GMER_PHYS_ADDR_MASK,
> > + (__entry->phys_addr & CXL_GMER_PHYS_ADDR_VOLATILE) ? "TRUE" : "FALSE",
> > + show_event_desc_flags(__entry->descriptor),
> > + show_mem_event_type(__entry->type),
> > + show_trans_type(__entry->transaction_type),
> > + __entry->channel, __entry->rank, __entry->device,
> > + __print_hex(__entry->comp_id, CXL_EVT_GEN_MED_COMP_ID_SIZE),
> > + show_valid_flags(__entry->validity_flags)
>
> Can we make the printing of fields with valid flags conditional?
> Been a while since I wrote a Trace point, but I think I recall doing that..

I'm not seeing a way right off. But I can't say it is impossible...

I'll keep an eye out as I clean the series up,
Ira

>
> > + )
> > +);
> > +
> > #endif /* _CXL_TRACE_EVENTS_H */
> >
> > /* This part must be outside protection */
>
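
A small sketch of what "just specify the full mask" could look like, assuming
(as the quoted defines imply) that the low 6 bits of the address field carry
flags and bit 0 is the volatile flag. GENMASK64() is a local helper written
for this example and the gmer_* names are hypothetical; the kernel has its
own GENMASK_ULL() for the same purpose.

```c
#include <assert.h>
#include <stdint.h>

/* Local helper for the example: set bits h..l (inclusive) of a 64-bit value. */
#define GENMASK64(h, l) \
	((~UINT64_C(0) << (l)) & (~UINT64_C(0) >> (63 - (h))))

/* Bit 0 of the raw field flags a volatile address (per the quoted define). */
#define GMER_PHYS_ADDR_VOLATILE  (UINT64_C(1) << 0)
/* Full address mask, bits 63:6, so callers never need to invert it. */
#define GMER_PHYS_ADDR_MASK      GENMASK64(63, 6)

static_assert(GMER_PHYS_ADDR_MASK == UINT64_C(0xffffffffffffffc0),
	      "address mask must cover bits 63:6");

/* Strip the flag bits and return only the address part. */
static uint64_t gmer_addr(uint64_t raw)
{
	return raw & GMER_PHYS_ADDR_MASK;
}

/* Report whether the volatile flag is set in the raw field. */
static int gmer_is_volatile(uint64_t raw)
{
	return (raw & GMER_PHYS_ADDR_VOLATILE) != 0;
}
```

The print statement then reads "phys_addr & MASK" rather than
"phys_addr & ~MASK", which is the readability point being made.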

2022-09-12 23:48:31

Subject: Re: [RFC PATCH 5/9] cxl/mem: Trace DRAM Event Record

On Thu, Aug 25, 2022 at 11:46:32AM +0100, Jonathan Cameron wrote:
> On Fri, 12 Aug 2022 22:32:39 -0700
> [email protected] wrote:
>
> > From: Ira Weiny <[email protected]>
> >
> > CXL v3.0 section 8.2.9.2.1.2 defines the DRAM Event Record.
> >
> > Determine if the event read is a DRAM event record and if so trace the
> > record.
> >
> > Signed-off-by: Ira Weiny <[email protected]>
> >
> > ---
> > This record has a very odd byte layout with 2 - 16 bit fields
> > (validity_flags and column) aligned on an odd byte boundary. In
> > addition nibble_mask and row are oddly aligned.
> >
> > I've made my best guess as to how the endianness of these fields should
> > be resolved. But I'm happy to hear from other folks if what I have is
> > wrong.
> My assumption is the same as yours. We should sanity check of course by
> poking relevant people.
>
> Similar comments in here to previous. Use the get_unaligned_le24()
> accessors + consider not printing invalid fields.

Yea I've already converted the 3 byte fields to get_unaligned_le24()

> >
> > struct cxl_evt_dram_rec {
> > struct cxl_event_record_hdr hdr;
> > __le64 phys_addr;
> > u8 descriptor;
> > u8 type;
> > u8 transaction_type;
> > u16 validity_flags;
> > u8 channel;
> > u8 rank;
> > u8 nibble_mask[CXL_EVT_DER_NIBBLE_MASK_SIZE];
> > u8 bank_group;
> > u8 bank;
> > u8 row[CXL_EVT_DER_ROW_SIZE];
> > u16 column;
> > u8 correction_mask[CXL_EVT_DER_CORRECTION_MASK_SIZE];
> > } __packed;
> > ---
> > drivers/cxl/core/mbox.c | 16 +++++
> > drivers/cxl/cxlmem.h | 24 +++++++
> > include/trace/events/cxl-events.h | 114 ++++++++++++++++++++++++++++++
> > 3 files changed, 154 insertions(+)
> >
> > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > index 0e433f072163..6414588a3c7b 100644
> > --- a/drivers/cxl/core/mbox.c
> > +++ b/drivers/cxl/core/mbox.c
> > @@ -717,6 +717,14 @@ static const uuid_t gen_media_event_uuid =
> > UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
> > 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
> >
> > +/*
> > + * DRAM Event Record
> > + * CXL v3.0 section 8.2.9.2.1.2; Table 8-44
> rev3.0, r3.0 or just 3.0

Already done.

>
> > + */
> > +static const uuid_t dram_event_uuid =
> > + UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
> > + 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
> > +
> > static void cxl_trace_event_record(const char *dev_name,
> > enum cxl_event_log_type type,
> > struct cxl_get_event_payload *payload)
> > @@ -731,6 +739,14 @@ static void cxl_trace_event_record(const char *dev_name,
> > return;
> > }
> >
> > + if (uuid_equal(id, &dram_event_uuid)) {
> Why not else if? Should be obvious to compiler that multiple uuid_equal
> conditions can't match, but even better to not make it try hard perhaps?

Sure else if can work.

>
> > + struct cxl_evt_dram_rec *rec =
> > + (struct cxl_evt_dram_rec *)&payload->record;
> > +
> > + trace_cxl_dram_event(dev_name, type, rec);
> > + return;
> > + }
> > +
> > /* For unknown record types print just the header */
> > trace_cxl_event(dev_name, type, &payload->record);
> > }
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 33669459ae4b..50536c0a7850 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -421,6 +421,30 @@ struct cxl_evt_gen_media {
> > u8 component_id[CXL_EVT_GEN_MED_COMP_ID_SIZE];
> > } __packed;
> >
> > +/*
> > + * DRAM Event Record - DER
> > + * CXL v3.0 section 8.2.9.2.1.2; Table 3-44
> > + */
> > +#define CXL_EVT_DER_NIBBLE_MASK_SIZE 3
> > +#define CXL_EVT_DER_ROW_SIZE 3
> > +#define CXL_EVT_DER_CORRECTION_MASK_SIZE 0x20
> > +struct cxl_evt_dram_rec {
> > + struct cxl_event_record_hdr hdr;
> > + __le64 phys_addr;
> > + u8 descriptor;
> > + u8 type;
> > + u8 transaction_type;
> > + u16 validity_flags;
> I've not tried it, but can we just mark these as __le16 and use
> the unaligned accessors? get_unaligned_le16 etc

get_unaligned_le16() requires a byte array...

So I think this needs to be:

u8 validity_flags[2];

Now that I know about those calls I think this does make a lot more sense. The
test code works but I knew that it would be sketchy with real devices.

I'll adjust this.

> Also there is get_unaligned_le24() for the 3 byte ones.

Yea done.

[snip]

> > +
> > + TP_fast_assign(
> > + /* Common */
> > + __assign_str(dev_name, dev_name);
> > + memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> > + __entry->log = log;
> > + __entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> > + __entry->handle = le16_to_cpu(rec->hdr.handle);
> > + __entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> > + __entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
> > +
> > + /* DRAM */
> > + __entry->phys_addr = le64_to_cpu(rec->phys_addr);
> > + __entry->descriptor = rec->descriptor;
> > + __entry->type = rec->type;
> > + __entry->transaction_type = rec->transaction_type;
> > + __entry->validity_flags = le16_to_cpu(rec->validity_flags);
> > + __entry->channel = rec->channel;
> > + __entry->rank = rec->rank;
> > + __entry->nibble_mask = rec->nibble_mask[0] << 24 |
> > + rec->nibble_mask[1] << 16 |
> > + rec->nibble_mask[2] << 8; /* 3 byte LE ? */
>
> Use get_unaligned_le24()? I'd definitely expect these to be le24.
>
>
> > + __entry->nibble_mask = le32_to_cpu(__entry->nibble_mask);
>
> That doesn't look right. You will have unwound the endianness using
> the shifts above. Don't convert it again (noop on le systems, so you
> probably won't see a problem when testing).

I thought I did it right with 2 shifts. But regardless, using
get_unaligned_le24() is better and I've already changed it.

>
> > + __entry->bank_group = rec->bank_group;
> > + __entry->bank = rec->bank;
> > + __entry->row = rec->row[0] << 24 |
> > + rec->row[1] << 16 |
> > + rec->row[2] << 8; /* 3 byte LE ? */
>
> get_unaligned_le24()

... and this one.

>
> > + __entry->row = le32_to_cpu(__entry->row);
>
> > + __entry->column = le16_to_cpu(rec->column);
> > + memcpy(__entry->cor_mask, &rec->correction_mask,
> > + CXL_EVT_DER_CORRECTION_MASK_SIZE);
> > + ),
> > +
> > + TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
> > + "phys_addr=%llx volatile=%s desc='%s' type='%s' trans_type='%s' channel=%u " \
> > + "rank=%u nibble_mask=%x bank_group=%u bank=%u row=%u column=%u " \
> > + "cor_mask=%s valid_flags='%s'",
> > + __get_str(dev_name), show_log_type(__entry->log),
> > + __entry->timestamp, __entry->id, __entry->handle,
> > + __entry->related_handle, show_hdr_flags(__entry->flags),
> > + __entry->phys_addr & ~CXL_GMER_PHYS_ADDR_MASK,
> > + (__entry->phys_addr & CXL_GMER_PHYS_ADDR_VOLATILE) ? "TRUE" : "FALSE",
> > + show_event_desc_flags(__entry->descriptor),
> As before can we not print the invalid ones based on the validity flags?
>
> A few years ago now, but I did something along those lines for the CCIX equivalent of
> this stuff. (Honestly can't remember much about it now though!)
> Was a bit fiddly but led to nicer prints in my opinion.
>
> https://lore.kernel.org/all/[email protected]/

I'm still not seeing anything which alters the actual print in this patch or
ras_event.h

Perhaps I'm missing what you mean by selecting the valid fields.

Something will have to change the TP_printk() format itself from what I can see
and I don't see a way to do that within the trace infrastructure.

We _could_ do that within the C code where trace_dram() is called. But I'd
like to keep all the info together and let user space decode more than what the
kernel may know.

Ira

2022-09-14 21:32:16

Subject: Re: [RFC PATCH 6/9] cxl/mem: Trace Memory Module Event Record

On Thu, Aug 25, 2022 at 11:58:42AM +0100, Jonathan Cameron wrote:
> On Fri, 12 Aug 2022 22:32:40 -0700
> [email protected] wrote:
>
> > From: Ira Weiny <[email protected]>
> >
> > CXL v3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.
> >
> > Determine if the event read is memory module record and if so trace the
> > record.
> >
> > Signed-off-by: Ira Weiny <[email protected]>
> Similar comments to on previous patches around using
> get_unaligned_le*()

Yep...

[snip]

> >
> > +/*
> > + * Get Health Info Record
> > + * CXL v3.0 section 8.2.9.8.3.1; Table 8-100
> > + */
> > +struct cxl_get_health_info {
> > + u8 health_status;
> > + u8 media_status;
> > + u8 add_status;
> > + u8 life_used;
> > + u16 device_temp;
>
> As previous - even though they aren't aligned, I'd have thought
> __le16 etc will still work. The unaligned accessors are fine
> taking __le16 * for example.

Ok my bad on using u16 here and I will change it. I 100% agree that these
should be __le16/__le32. That said there is no need to use the unaligned
accessors for the 16/32 bit fields.

The unaligned accessors cast the pointer to a __le16/__le32 type and no
architecture redefines those. So using le{16,32}_to_cpu() should work just
fine on all archs.

[snip]

> > +
> > + TP_fast_assign(
> > + /* Common */
> > + __assign_str(dev_name, dev_name);
> > + memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> > + __entry->log = log;
> > + __entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> > + __entry->handle = le16_to_cpu(rec->hdr.handle);
> > + __entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> > + __entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
> > +
> > + /* Memory Module Event */
> > + __entry->event_type = rec->event_type;
> > +
> > + /* Device Health Info */
> > + __entry->health_status = rec->info.health_status;
> > + __entry->media_status = rec->info.media_status;
> > + __entry->life_used = rec->info.life_used;
> > + __entry->dirty_shutdown_cnt = le32_to_cpu(rec->info.dirty_shutdown_cnt);
> > + __entry->cor_vol_err_cnt = le32_to_cpu(rec->info.cor_vol_err_cnt);
>
> I've lost track, but my guess is some / all of these need the get_unaligned_le32()
> etc rather than aligned form. Maybe just be lazy and use the unaligned versions
> even when things happen to be aligned - then we don't have to think about it
> when reviewing :)

See above. I think the 16/32 bit fields work as intended except for my lack of
using the correct type.

Ira
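
A user-space sketch of the situation being discussed: 16 and 32 bit
little-endian fields that land on odd offsets inside a packed record. The
structs are trimmed, hypothetical versions of the ones in the patch (the
event header is left out and only a few health fields are kept), and
le16_to_host()/le32_to_host() are example helpers standing in for the
kernel's le16_to_cpu()/le32_to_cpu(). It relies on the GCC/Clang packed
attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Trimmed, hypothetical health info block; multi-byte fields are LE. */
struct health_info {
	uint8_t  health_status;
	uint8_t  media_status;
	uint8_t  add_status;
	uint8_t  life_used;
	uint16_t device_temp;
	uint32_t dirty_shutdown_cnt;
} __attribute__((packed));

/* The one-byte event_type pushes every later field to an odd offset. */
struct mem_module_rec {
	uint8_t event_type;
	struct health_info info;
} __attribute__((packed));

/* Interpret the stored bytes of v as little-endian, on any host. */
static uint16_t le16_to_host(uint16_t v)
{
	uint8_t b[2];

	memcpy(b, &v, sizeof(b));
	return (uint16_t)(b[0] | (b[1] << 8));
}

static uint32_t le32_to_host(uint32_t v)
{
	uint8_t b[4];

	memcpy(b, &v, sizeof(b));
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

static uint16_t rec_device_temp(const struct mem_module_rec *rec)
{
	assert(rec != NULL);
	/* Member access through a packed struct; no pointer to the field. */
	return le16_to_host(rec->info.device_temp);
}

static uint32_t rec_dirty_shutdown_cnt(const struct mem_module_rec *rec)
{
	assert(rec != NULL);
	return le32_to_host(rec->info.dirty_shutdown_cnt);
}
```

The fields are read by value through the packed struct, so the compiler is
responsible for emitting a load that is safe at the odd offset; taking the
address of such a member and passing it around as a plain uint16_t pointer
would be a different matter.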

2022-09-15 19:34:24

Subject: Re: [RFC PATCH 7/9] cxl/test: Add generic mock events

On Thu, Aug 25, 2022 at 12:31:19PM +0100, Jonathan Cameron wrote:
> On Fri, 12 Aug 2022 22:32:41 -0700
> [email protected] wrote:
>
> > From: Ira Weiny <[email protected]>
> >
> > Facilitate testing basic Get/Clear Event functionality by creating
> > multiple logs and generic events with made up UUID's.
> >
> > Data is completely made up with data patterns which should be easy to
> > spot in trace output.
> Hi Ira,
>
> I'm tempted to hack the QEMU emulation for this in with appropriately
> complex interface to inject all the record types...

Every time I look at the QEMU code it makes my head spin. :-(

I really thought about adding some support there. And I think for irq's it may
work better? But after your talk today I did a quick search to see what it
would take to do irqs in QEMU and got even more confused. :-(

> Lots to do there though, so not sure where this fits in my priority list!

I bet it is higher on mine! ;-)

>
> >
> > Test traces are easy to obtain with a small script such as this:
> >
> > #!/bin/bash -x
> >
> > devices=`find /sys/devices/platform -name cxl_mem*`
> >
> > # Generate fake events if reset is passed in
>
> reset is rather unintuitive naming.
>
> fill_event_queue maybe or something more in that direction?

Fair enough... Naming is hard and I'm one of the worst.

I've changed to

<sysfs>/.../event_fill_queue
<sysfs>/.../event_trigger

Thoughts?

[snip]

> >
> > +/*
> > + * Mock Events
> > + */
> > +struct mock_event_log {
> > + int cur_event;
> > + int nr_events;
> > + struct xarray events;
>
> I'm not convinced an xarray is appropriate here (I'd have used
> a fixed size array) but meh, I don't care that much and mocking
> code doesn't have to be quick or elegant :)

I rather thought the xarray was more elegant than the fixed array.

>
> > +};
> > +
> > +struct mock_event_store {
> > + struct cxl_dev_state *cxlds;
> > + struct mock_event_log *mock_logs[CXL_EVENT_TYPE_MAX];
>
> Each entry isn't terribly big and there aren't that many of them.
> Make the code simpler by just embedding the instances here?

That is a good idea. Not sure any more why I did it this way.

[snip]

> > +
> > +static void event_store_add_event(struct mock_event_store *es,
> > + enum cxl_event_log_type log_type,
> > + struct cxl_event_record_raw *event)
> > +{
> > + struct mock_event_log *log;
> > + struct device *dev = es->cxlds->dev;
> > + int rc;
> > +
> > + if (log_type >= CXL_EVENT_TYPE_MAX)
> > + return;
> > +
> > + log = es->mock_logs[log_type];
> > + if (!log) {
> > + log = devm_kzalloc(dev, sizeof(*log), GFP_KERNEL);
>
> As above, I'd just embed the logs directly in the containing structure
> rather than allocating on demand. init them all up front.

yep. Done.

>
> > + if (!log) {
> > + dev_err(dev, "Failed to create %s log\n",
> > + cxl_event_log_type_str(log_type));
> > + return;
> > + }
> > + xa_init(&log->events);
> > + devm_add_action(dev, xa_events_destroy, log);
> > + es->mock_logs[log_type] = log;
> > + }
> > +
> > + rc = xa_insert(&log->events, log->nr_events, event, GFP_KERNEL);
> Not sure using an xa for a list really makes that much sense, but
> doesn't matter hugely.

It is much easier than trying to manage pointers and allows the events to be
inserted more than once.

> > + if (rc) {
> > + dev_err(dev, "Failed to store event %s log\n",
> > + cxl_event_log_type_str(log_type));
> > + return;
> > + }
> > + log->nr_events++;
>
> Having an index into a static set of events is more complex.
> I'd either switch to a simple array of pointers, or actually add and
> remove events (or pointers to them anyway).

xarray was much easier to deal with than an array of pointers. Using a list
was hard because I wanted to reuse the static definitions of events rather than
have a bunch of them defined.

[snip]

> > +
> > +/*
> > + * Get and clear event only handle 1 record at a time as this is what is
> > + * currently implemented in the main code.
>
> Duplicating this comment seems unnecessary.

I wanted to make it clear this test code could only test what was currently
implemented...

>
> > + */
> > +static int mock_clear_event(struct cxl_dev_state *cxlds,
> > + struct cxl_mbox_cmd *cmd)
> > +{
> > + struct cxl_mbox_clear_event_payload *pl = cmd->payload_in;
> > + struct mock_event_log *log;
> > + u8 log_type = pl->event_log;
> > +
> > + /* Don't handle more than 1 record at a time */
> > + if (pl->nr_recs != 1)
> > + return -EINVAL;

... and this check ...

> > +
> > + if (log_type >= CXL_EVENT_TYPE_MAX)
> > + return -EINVAL;
> > +
> > + log = find_event_log(cxlds, log_type);
> > + if (!log)
> > + return 0; /* No mock data in this log */
> > +
> > + /*
> > + * The current code clears events as they are read
> > + * Test that behavior; not clearing from the middle of the log
> > + */

... and this one; prevents it from blowing up.

[snip]

> > +
> > +static void devm_cxl_mock_event_logs(struct cxl_memdev *cxlmd)
> > +{
> > + struct device *dev = &cxlmd->dev;
> > + struct mock_event_store *es;
> > +
> > + /*
> > + * The memory device gets the sysfs attributes such that the cxlmd
> > + * pointer can be used to get to a cxlds pointer.
> > + */
> > + if (device_add_groups(dev, cxl_mock_event_groups))
>
> Whilst it might not matter in a mocking driver, it's normal to jump through
> hoops to avoid doing this because it races with userspace notifications in
> all sorts of hideous ways. It makes the sysfs maintainers very grumpy ;)

<sigh> I know this is a hack... I really wanted to hang this off of cxlds but
it did not make sense.

> To do it here, you would need to pass the group to devm_cxl_add_memdev()
> and have that slip it in before the cdev_device_add() call I think.
> That wouldn't be particularly invasive though.

I guess that would work and yea I guess it is not too invasive.

I'll throw it together for the next version and see how it looks/works.

>
>
> > + return;
> > + if (devm_add_action_or_reset(dev, remove_mock_event_groups, dev))
> > + return;
> > +
> > + /*
> > + * All the mock event data hangs off the device itself.
>
> Nitpick of the day: Single line comment syntax ;)

:-D

Done.

Thanks again for the review!
Ira
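
For comparison with the xarray version discussed above, this is a user-space
sketch of the simpler shape suggested in the review: logs embedded in the
store, a fixed array of pointers per log, and a cursor that only ever clears
the record that was just read. It is not the code from the patch. The sizes,
the one-field struct mock_event and the store_* function names are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_LOG_TYPES  4  /* assumed number of log types */
#define MOCK_LOG_MAX    8  /* assumed capacity of each log */

struct mock_event {
	uint16_t handle;
};

struct mock_event_log {
	int cur_event;  /* index of the next record to hand out */
	int nr_events;  /* number of records stored */
	const struct mock_event *events[MOCK_LOG_MAX];
};

/* Logs are embedded, so nothing is allocated on demand. */
struct mock_event_store {
	struct mock_event_log logs[MOCK_LOG_TYPES];
};

static int store_add_event(struct mock_event_store *es, unsigned int type,
			   const struct mock_event *ev)
{
	struct mock_event_log *log;

	assert(es != NULL && ev != NULL);
	if (type >= MOCK_LOG_TYPES)
		return -1;

	log = &es->logs[type];
	if (log->nr_events >= MOCK_LOG_MAX)
		return -1;

	/* Storing a pointer lets one static event be queued several times. */
	log->events[log->nr_events++] = ev;
	return 0;
}

/* Return the current record without consuming it, or NULL if none. */
static const struct mock_event *store_get_event(struct mock_event_store *es,
						unsigned int type)
{
	struct mock_event_log *log;

	assert(es != NULL);
	if (type >= MOCK_LOG_TYPES)
		return NULL;

	log = &es->logs[type];
	if (log->cur_event >= log->nr_events)
		return NULL;

	return log->events[log->cur_event];
}

/* Clear exactly one record, and only the one at the cursor. */
static int store_clear_event(struct mock_event_store *es, unsigned int type,
			     unsigned int nr_recs, uint16_t handle)
{
	const struct mock_event *ev;

	if (nr_recs != 1)
		return -1;

	ev = store_get_event(es, type);
	if (!ev || ev->handle != handle)
		return -1;

	es->logs[type].cur_event++;
	return 0;
}
```

Because the array holds pointers, the same static event definition can be
added more than once, which covers the reuse argument made for the xarray
while keeping the bookkeeping to two integers.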

2022-09-20 16:06:05

by Jonathan Cameron

Subject: Re: [RFC PATCH 4/9] cxl/mem: Trace General Media Event Record

On Mon, 12 Sep 2022 15:38:21 -0700
Ira Weiny <[email protected]> wrote:

> On Wed, Aug 24, 2022 at 05:11:13PM +0100, Jonathan Cameron wrote:
> > On Fri, 12 Aug 2022 22:32:38 -0700
> > [email protected] wrote:
> >
> > > From: Ira Weiny <[email protected]>
> > >
> > > CXL v3.0 section 8.2.9.2.1.1 defines the General Media Event Record.
> > >
> > > Determine if the event read is a general media record and if so trace
> > > the record.
> > >
> > > Signed-off-by: Ira Weiny <[email protected]>
> > A few trivial things inline...
> >
>
> [snip]
>
> > > +/*
> > > + * General Media Event Record - GMER
> > > + * CXL v2.0 Section 8.2.9.1.1.1; Table 154
> > > + */
> > > +#define CXL_GMER_PHYS_ADDR_VOLATILE BIT(0)
> > > +#define CXL_GMER_PHYS_ADDR_MASK 0x3f
> >
> > Inverse of mask is confusing. Just specify the full mask.
>
> Fixed
>
> [snip]
>
> > > + TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
> > > + "phys_addr=%llx volatile=%s desc='%s' type='%s' trans_type='%s' channel=%u " \
> > > + "rank=%u device=%x comp_id=%s valid_flags='%s'",
> > > + __get_str(dev_name), show_log_type(__entry->log),
> > > + __entry->timestamp, __entry->id, __entry->handle,
> > > + __entry->related_handle, show_hdr_flags(__entry->flags),
> > > + __entry->phys_addr & ~CXL_GMER_PHYS_ADDR_MASK,
> > > + (__entry->phys_addr & CXL_GMER_PHYS_ADDR_VOLATILE) ? "TRUE" : "FALSE",
> > > + show_event_desc_flags(__entry->descriptor),
> > > + show_mem_event_type(__entry->type),
> > > + show_trans_type(__entry->transaction_type),
> > > + __entry->channel, __entry->rank, __entry->device,
> > > + __print_hex(__entry->comp_id, CXL_EVT_GEN_MED_COMP_ID_SIZE),
> > > + show_valid_flags(__entry->validity_flags)
> >
> > Can we make the printing of fields with valid flags conditional?
> > Been a while since I wrote a Trace point, but I think I recall doing that..
>
> I'm not seeing a way right off. But I can't say it is impossible...

Needs some helper code... Here's one I made earlier (and had almost entirely
banished from my memory!)

https://lore.kernel.org/all/[email protected]/

>
> I'll keep an eye out as I clean the series up,
> Ira
>
> >
> > > + )
> > > +);
> > > +
> > > #endif /* _CXL_TRACE_EVENTS_H */
> > >
> > > /* This part must be outside protection */
> >
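
As a guess at the general shape of such helper code, here is a user-space
sketch that formats only the fields whose validity bit is set. It does not
reproduce the helper linked above and does not model the trace
infrastructure: the VALID_* bit positions, struct gmer_fields and both
functions are hypothetical, and every field is printed with %u for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical validity bits; the real assignments come from the spec. */
#define VALID_CHANNEL  (1u << 0)
#define VALID_RANK     (1u << 1)
#define VALID_DEVICE   (1u << 2)

struct gmer_fields {
	uint16_t validity_flags;
	uint8_t  channel;
	uint8_t  rank;
	uint32_t device;
};

/* Append "name=val" at *off, space separated, truncating if buf is full. */
static void append_field(char *buf, size_t len, size_t *off,
			 const char *name, unsigned int val)
{
	size_t room = len - *off;
	int n;

	if (room <= 1)
		return;

	n = snprintf(buf + *off, room, "%s%s=%u", *off ? " " : "", name, val);
	if (n < 0)
		return;

	/* On truncation snprintf() wrote room - 1 characters. */
	*off += (size_t)n < room ? (size_t)n : room - 1;
}

/* Format only the valid fields; returns the resulting string length. */
static size_t format_valid_fields(char *buf, size_t len,
				  const struct gmer_fields *f)
{
	size_t off = 0;

	assert(buf != NULL && len > 0 && f != NULL);
	buf[0] = '\0';

	if (f->validity_flags & VALID_CHANNEL)
		append_field(buf, len, &off, "channel", f->channel);
	if (f->validity_flags & VALID_RANK)
		append_field(buf, len, &off, "rank", f->rank);
	if (f->validity_flags & VALID_DEVICE)
		append_field(buf, len, &off, "device", f->device);

	return strlen(buf);
}
```

The trade-off raised earlier in the thread still applies: dropping fields at
print time makes the text output cleaner, but user space that wants the raw
values has to get them from the binary trace record instead.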

2022-09-20 16:10:48

by Jonathan Cameron

Subject: Re: [RFC PATCH 5/9] cxl/mem: Trace DRAM Event Record

On Mon, 12 Sep 2022 16:04:07 -0700
Ira Weiny <[email protected]> wrote:

> On Thu, Aug 25, 2022 at 11:46:32AM +0100, Jonathan Cameron wrote:
> > On Fri, 12 Aug 2022 22:32:39 -0700
> > [email protected] wrote:
> >
> > > From: Ira Weiny <[email protected]>
> > >
> > > CXL v3.0 section 8.2.9.2.1.2 defines the DRAM Event Record.
> > >
> > > Determine if the event read is a DRAM event record and if so trace the
> > > record.
> > >
> > > Signed-off-by: Ira Weiny <[email protected]>
> > >
> > > ---
> > > This record has a very odd byte layout with 2 - 16 bit fields
> > > (validity_flags and column) aligned on an odd byte boundary. In
> > > addition nibble_mask and row are oddly aligned.
> > >
> > > I've made my best guess as to how the endianness of these fields should
> > > be resolved. But I'm happy to hear from other folks if what I have is
> > > wrong.
> > My assumption is the same as yours. We should sanity check of course by
> > poking relevant people.
> >
> > Similar comments in here to previous. Use the get_unaligned_le24()
> > accessors + consider not printing invalid fields.
>
> Yea I've already converted the 3 byte fields to get_unaligned_le24()
>
> > >
> > > struct cxl_evt_dram_rec {
> > > struct cxl_event_record_hdr hdr;
> > > __le64 phys_addr;
> > > u8 descriptor;
> > > u8 type;
> > > u8 transaction_type;
> > > u16 validity_flags;
> > > u8 channel;
> > > u8 rank;
> > > u8 nibble_mask[CXL_EVT_DER_NIBBLE_MASK_SIZE];
> > > u8 bank_group;
> > > u8 bank;
> > > u8 row[CXL_EVT_DER_ROW_SIZE];
> > > u16 column;
> > > u8 correction_mask[CXL_EVT_DER_CORRECTION_MASK_SIZE];
> > > } __packed;
> > > ---
> > > drivers/cxl/core/mbox.c | 16 +++++
> > > drivers/cxl/cxlmem.h | 24 +++++++
> > > include/trace/events/cxl-events.h | 114 ++++++++++++++++++++++++++++++
> > > 3 files changed, 154 insertions(+)
> > >
> > > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > > index 0e433f072163..6414588a3c7b 100644
> > > --- a/drivers/cxl/core/mbox.c
> > > +++ b/drivers/cxl/core/mbox.c
> > > @@ -717,6 +717,14 @@ static const uuid_t gen_media_event_uuid =
> > > UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
> > > 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
> > >
> > > +/*
> > > + * DRAM Event Record
> > > + * CXL v3.0 section 8.2.9.2.1.2; Table 8-44
> > rev3.0, r3.0 or just 3.0
>
> Already done.
>
> >
> > > + */
> > > +static const uuid_t dram_event_uuid =
> > > + UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
> > > + 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
> > > +
> > > static void cxl_trace_event_record(const char *dev_name,
> > > enum cxl_event_log_type type,
> > > struct cxl_get_event_payload *payload)
> > > @@ -731,6 +739,14 @@ static void cxl_trace_event_record(const char *dev_name,
> > > return;
> > > }
> > >
> > > + if (uuid_equal(id, &dram_event_uuid)) {
> > Why not else if? Should be obvious to compiler that multiple uuid_equal
> > conditions can't match, but even better to not make it try hard perhaps?
>
> Sure else if can work.
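
The else-if suggestion can be sketched outside the kernel as follows. This is
only an illustration: classify_event(), enum evt_kind and the raw byte arrays
are hypothetical stand-ins for uuid_t, uuid_equal() and the trace calls in the
patch. The byte values are the two UUIDs quoted above, written out in the
order UUID_INIT() stores them.

```c
#include <stdint.h>
#include <string.h>

#define UUID_BYTES 16

/* Hypothetical result type; the patch calls a trace function instead. */
enum evt_kind { EVT_UNKNOWN, EVT_GEN_MEDIA, EVT_DRAM };

/* fbcd0a77-c260-417f-85a9-088b1621eba6 */
static const uint8_t gen_media_uuid[UUID_BYTES] = {
	0xfb, 0xcd, 0x0a, 0x77, 0xc2, 0x60, 0x41, 0x7f,
	0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6,
};

/* 601dcbb3-9c06-4eab-b8af-4e9bfb5c9624 */
static const uint8_t dram_uuid[UUID_BYTES] = {
	0x60, 0x1d, 0xcb, 0xb3, 0x9c, 0x06, 0x4e, 0xab,
	0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24,
};

/* At most one branch can match, so the chain stops at the first hit. */
static enum evt_kind classify_event(const uint8_t id[UUID_BYTES])
{
	if (memcmp(id, gen_media_uuid, UUID_BYTES) == 0)
		return EVT_GEN_MEDIA;
	else if (memcmp(id, dram_uuid, UUID_BYTES) == 0)
		return EVT_DRAM;
	else
		return EVT_UNKNOWN;	/* caller prints just the header */
}
```

With early returns in every branch the generated code is likely the same
either way; the else-if form mainly documents that the cases are exclusive.
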
>
> >
> > > + struct cxl_evt_dram_rec *rec =
> > > + (struct cxl_evt_dram_rec *)&payload->record;
> > > +
> > > + trace_cxl_dram_event(dev_name, type, rec);
> > > + return;
> > > + }
> > > +
> > > /* For unknown record types print just the header */
> > > trace_cxl_event(dev_name, type, &payload->record);
> > > }
> > > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > > index 33669459ae4b..50536c0a7850 100644
> > > --- a/drivers/cxl/cxlmem.h
> > > +++ b/drivers/cxl/cxlmem.h
> > > @@ -421,6 +421,30 @@ struct cxl_evt_gen_media {
> > > u8 component_id[CXL_EVT_GEN_MED_COMP_ID_SIZE];
> > > } __packed;
> > >
> > > +/*
> > > + * DRAM Event Record - DER
> > > + * CXL v3.0 section 8.2.9.2.1.2; Table 8-44
> > > + */
> > > +#define CXL_EVT_DER_NIBBLE_MASK_SIZE 3
> > > +#define CXL_EVT_DER_ROW_SIZE 3
> > > +#define CXL_EVT_DER_CORRECTION_MASK_SIZE 0x20
> > > +struct cxl_evt_dram_rec {
> > > + struct cxl_event_record_hdr hdr;
> > > + __le64 phys_addr;
> > > + u8 descriptor;
> > > + u8 type;
> > > + u8 transaction_type;
> > > + u16 validity_flags;
> > I've not tried it, but can we just mark these as __le16 and use
> > the unaligned accessors? get_unaligned_le16 etc
>
> get_unaligned_le16() requires a byte array...
>
> So I think this needs to be:
>
> u8 validity_flags[2];
>
> Now that I know about those calls I think this does make a lot more sense. The
> test code works but I knew that it would be sketchy with real devices.
>
> I'll adjust this.
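
To make the alignment problem concrete, here is a stand-alone sketch of the
record body with the 16-bit fields declared as byte arrays, as proposed above.
It assumes GCC or Clang (for the packed attribute), leaves out the common
header, and uses hypothetical names (struct dram_body, read_le16()). Offsets
are therefore relative to phys_addr, and declaring column as a byte array is
an extension of the proposal, not something stated in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Body of the DRAM event record, common header omitted (hypothetical name). */
struct dram_body {
	uint64_t phys_addr;		/* little endian on the wire */
	uint8_t descriptor;
	uint8_t type;
	uint8_t transaction_type;
	uint8_t validity_flags[2];	/* LE16 at an odd offset */
	uint8_t channel;
	uint8_t rank;
	uint8_t nibble_mask[3];		/* LE24 */
	uint8_t bank_group;
	uint8_t bank;
	uint8_t row[3];			/* LE24 */
	uint8_t column[2];		/* LE16 at an odd offset */
	uint8_t correction_mask[0x20];
} __attribute__((packed));

/* Assemble a little-endian 16-bit value one byte at a time. */
static uint16_t read_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}
```

offsetof(struct dram_body, validity_flags) is 11 and
offsetof(struct dram_body, column) is 23, which is the odd alignment described
in the patch notes.
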
>
> > Also there is get_unaligned_le24() for the 3 byte ones.
>
> Yea done.
>
> [snip]
>
> > > +
> > > + TP_fast_assign(
> > > + /* Common */
> > > + __assign_str(dev_name, dev_name);
> > > + memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> > > + __entry->log = log;
> > > + __entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> > > + __entry->handle = le16_to_cpu(rec->hdr.handle);
> > > + __entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> > > + __entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
> > > +
> > > + /* DRAM */
> > > + __entry->phys_addr = le64_to_cpu(rec->phys_addr);
> > > + __entry->descriptor = rec->descriptor;
> > > + __entry->type = rec->type;
> > > + __entry->transaction_type = rec->transaction_type;
> > > + __entry->validity_flags = le16_to_cpu(rec->validity_flags);
> > > + __entry->channel = rec->channel;
> > > + __entry->rank = rec->rank;
> > > + __entry->nibble_mask = rec->nibble_mask[0] << 24 |
> > > + rec->nibble_mask[1] << 16 |
> > > + rec->nibble_mask[2] << 8; /* 3 byte LE ? */
> >
> > Use get_unaligned_le24() ? I'd definitely expect these to be le24.
> >
> >
> > > + __entry->nibble_mask = le32_to_cpu(__entry->nibble_mask);
> >
> > That doesn't look right. You will have unwound the endianness using
> > the shifts above. Don't convert it again (noop on le systems, so you
> > probably won't see a problem when testing).
>
> I thought I did it right with 2 shifts. But regardless using
> get_unaligned_le24() is better and I've already changed it.
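
For reference, a 3-byte little-endian field decodes as shown in this
user-space sketch. le24_decode() is a hypothetical stand-in that follows the
same byte order as the kernel's get_unaligned_le24(); le24_as_posted() mirrors
the shifts from the RFC so the two results can be compared.

```c
#include <stdint.h>

/* Byte 0 is least significant: the result is already in CPU order. */
static uint32_t le24_decode(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}

/* The shifts as posted in the RFC, before the extra le32_to_cpu(). */
static uint32_t le24_as_posted(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8);
}
```

For the bytes 0x56 0x34 0x12, le24_decode() yields 0x123456 on any host, while
le24_as_posted() yields 0x56341200. On a little-endian host le32_to_cpu()
leaves that value unchanged, so the extra conversion does not repair it.
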
>
> >
> > > + __entry->bank_group = rec->bank_group;
> > > + __entry->bank = rec->bank;
> > > + __entry->row = rec->row[0] << 24 |
> > > + rec->row[1] << 16 |
> > > + rec->row[2] << 8; /* 3 byte LE ? */
> >
> > get_unaligned_le24()
>
> ... and this one.
>
> >
> > > + __entry->row = le32_to_cpu(__entry->row);
> >
> > > + __entry->column = le16_to_cpu(rec->column);
> > > + memcpy(__entry->cor_mask, &rec->correction_mask,
> > > + CXL_EVT_DER_CORRECTION_MASK_SIZE);
> > > + ),
> > > +
> > > + TP_printk("%s: %s time=%llu id=%pUl handle=%x related_handle=%x hdr_flags='%s': " \
> > > + "phys_addr=%llx volatile=%s desc='%s' type='%s' trans_type='%s' channel=%u " \
> > > + "rank=%u nibble_mask=%x bank_group=%u bank=%u row=%u column=%u " \
> > > + "cor_mask=%s valid_flags='%s'",
> > > + __get_str(dev_name), show_log_type(__entry->log),
> > > + __entry->timestamp, __entry->id, __entry->handle,
> > > + __entry->related_handle, show_hdr_flags(__entry->flags),
> > > + __entry->phys_addr & ~CXL_GMER_PHYS_ADDR_MASK,
> > > + (__entry->phys_addr & CXL_GMER_PHYS_ADDR_VOLATILE) ? "TRUE" : "FALSE",
> > > + show_event_desc_flags(__entry->descriptor),
> > As before can we not print the invalid ones based on the validity flags?
> >
> > Few years ago now, but I did something along those lines for the CCIX equivalent of
> > this stuff. (honestly can't remember much about it now though!)
> > Was a bit fiddly but led to nicer prints in my opinion.
> >
> > https://lore.kernel.org/all/[email protected]/
>

Ah. And I'd forgotten I shared it in this reply ;)

> I'm still not seeing anything which alters the actual print in this patch or
> ras_event.h
>
> Perhaps I'm missing what you mean by selecting the valid fields.
>
> Something will have to change the TP_printk() format itself from what I can see
> and I don't see a way to do that within the trace infrastructure.
>
> We _could_ do that within the C code where trace_dram() is called. But I'd
> like to keep all the info together and let user space decode more than what the
> kernel may know.

Take a look at cper_ccix_err_location() e.g.

+ if (cmem_err->validation_bits & CCIX_MEM_ERR_GENERIC_MEM_VALID)
+ n = snprintf(msg, len, "Pool Generic Type: %s ",
+ cper_ccix_mem_err_generic_type_str(cmem_err->pool_generic_type));

which is called from the TP_printk() via cper_ccix_mem_err_unpack()

You can call normal code in TP_printk() though indeed that code needs to then
be in a c file, not the tracepoint header.

Given the meaning of those valid fields won't change, I'd be keen not to print
the associated 'invalid' entries as those are kind of misleading.
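
A stand-alone sketch of that pattern, for illustration only: the flag names,
struct demo_rec and dram_fields_str() are hypothetical, and the bit
assignments are made up rather than taken from the CXL spec or the CCIX patch.
The helper appends a field to the buffer only when its validity bit is set,
which is the kind of function a TP_printk() could call from a C file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical validity bits, for illustration only. */
#define DEMO_VALID_CHANNEL	(1u << 0)
#define DEMO_VALID_RANK		(1u << 1)
#define DEMO_VALID_BANK		(1u << 2)

struct demo_rec {
	uint16_t validity_flags;
	uint8_t channel;
	uint8_t rank;
	uint8_t bank;
};

/* Append "name=value " at buf + off if valid; return the new offset. */
static size_t append_if(char *buf, size_t len, size_t off, unsigned int valid,
			const char *name, unsigned int val)
{
	int n;

	if (!valid || off >= len)
		return off;
	n = snprintf(buf + off, len - off, "%s=%u ", name, val);
	if (n < 0)
		return off;
	/* On truncation snprintf() reports the untruncated length. */
	return (size_t)n >= len - off ? len - 1 : off + (size_t)n;
}

/* Build a string that contains only the fields flagged as valid. */
static const char *dram_fields_str(const struct demo_rec *rec,
				   char *buf, size_t len)
{
	size_t off = 0;

	assert(buf && len > 0);
	buf[0] = '\0';
	off = append_if(buf, len, off,
			rec->validity_flags & DEMO_VALID_CHANNEL,
			"channel", rec->channel);
	off = append_if(buf, len, off,
			rec->validity_flags & DEMO_VALID_RANK,
			"rank", rec->rank);
	off = append_if(buf, len, off,
			rec->validity_flags & DEMO_VALID_BANK,
			"bank", rec->bank);
	(void)off;
	return buf;
}
```

With only the channel and bank bits set, the rank never appears in the output,
so a reader cannot mistake a stale rank value for real data.
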

Note that userspace code doesn't generally consume anything to do with TP_printk()
but rather does its own processing...
E.g. something like:
https://github.com/mchehab/rasdaemon/blob/master/non-standard-hisilicon.c#L210
which happens to be one of our more complex trace point handlers in
rasdaemon. I think that particular handler decodes for print, but drops the
data in the DB in a fairly raw format. Some others break it down further for
logging. Here are the CCIX ones that never went upstream...
https://lore.kernel.org/all/[email protected]/

Jonathan

>
> Ira

2022-09-20 16:40:38

by Jonathan Cameron

Subject: Re: [RFC PATCH 6/9] cxl/mem: Trace Memory Module Event Record

On Wed, 14 Sep 2022 14:17:14 -0700
Ira Weiny <[email protected]> wrote:

> On Thu, Aug 25, 2022 at 11:58:42AM +0100, Jonathan Cameron wrote:
> > On Fri, 12 Aug 2022 22:32:40 -0700
> > [email protected] wrote:
> >
> > > From: Ira Weiny <[email protected]>
> > >
> > > CXL v3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.
> > >
> > > Determine if the event read is a memory module record and if so trace the
> > > record.
> > >
> > > Signed-off-by: Ira Weiny <[email protected]>
> > Similar comments to on previous patches around using
> > get_unaligned_le*()
>
> Yep...
>
> [snip]
>
> > >
> > > +/*
> > > + * Get Health Info Record
> > > + * CXL v3.0 section 8.2.9.8.3.1; Table 8-100
> > > + */
> > > +struct cxl_get_health_info {
> > > + u8 health_status;
> > > + u8 media_status;
> > > + u8 add_status;
> > > + u8 life_used;
> > > + u16 device_temp;
> >
> > As previous - even though they aren't aligned, I'd have thought
> > __le16 etc will still work. The unaligned accessors are fine
> > taking __le16 * for example.
>
> Ok my bad on using u16 here and I will change it. I 100% agree that these
> should be __le16/__le32. That said there is no need to use the unaligned
> accessors for the 16/32 bit fields.
>
> The unaligned accessors cast the pointer to a __le16/__le32 type and no
> architecture redefines those. So using le{16,32}_to_cpu() should work just
> fine on all archs.

If they are unaligned, make sure to use the unaligned accessors.

The key is that it's not a simple cast, but rather a cast to a packed
structure. The compiler guarantees that those will be handled correctly
even on platforms that don't do unaligned accesses - it will have to
use multiple instructions to construct the unaligned access from
a set of small aligned ones.
The C spec doesn't guarantee the same for a simple cast to an __le16.
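
As a rough illustration of the difference (a sketch, not the kernel's code):
with GCC or Clang, wrapping the value in a packed struct tells the compiler
that the pointer may be misaligned, so it emits whatever access sequence is
safe for the target. Note that the packed attribute is a compiler extension
rather than part of ISO C. The names below are hypothetical, and may_alias is
added only so the sketch is also valid with strict aliasing enabled in user
space.

```c
#include <stdint.h>

/* Packed wrapper: the member has an alignment requirement of 1. */
struct una_u16 {
	uint16_t x;
} __attribute__((packed, may_alias));

/* Safe for any address: the compiler knows p may be misaligned. */
static uint16_t load_u16_unaligned(const void *p)
{
	const struct una_u16 *ptr = p;

	return ptr->x;
}

/* Interpret the two loaded bytes as little endian on any host. */
static uint16_t load_le16_unaligned(const void *p)
{
	uint16_t raw = load_u16_unaligned(p);
	const uint8_t *b = (const uint8_t *)&raw;

	return (uint16_t)(b[0] | (b[1] << 8));
}

/*
 * By contrast, *(const uint16_t *)p on an odd address is undefined
 * behaviour in ISO C and can trap on strict-alignment machines.
 */
```
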

There are some hints on this in:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/include/asm-generic/unaligned.h?id=778aaefb8e864fc61f850539ea479554dd4caea1

I recall a full explanation of why this worked, but no idea where
to find that now - might be the thread referred to in that patch from
Arnd.

Jonathan

>
> [snip]
>
> > > +
> > > + TP_fast_assign(
> > > + /* Common */
> > > + __assign_str(dev_name, dev_name);
> > > + memcpy(__entry->id, &rec->hdr.id, UUID_SIZE);
> > > + __entry->log = log;
> > > + __entry->flags = le32_to_cpu(rec->hdr.flags_length) >> 8;
> > > + __entry->handle = le16_to_cpu(rec->hdr.handle);
> > > + __entry->related_handle = le16_to_cpu(rec->hdr.related_handle);
> > > + __entry->timestamp = le64_to_cpu(rec->hdr.timestamp);
> > > +
> > > + /* Memory Module Event */
> > > + __entry->event_type = rec->event_type;
> > > +
> > > + /* Device Health Info */
> > > + __entry->health_status = rec->info.health_status;
> > > + __entry->media_status = rec->info.media_status;
> > > + __entry->life_used = rec->info.life_used;
> > > + __entry->dirty_shutdown_cnt = le32_to_cpu(rec->info.dirty_shutdown_cnt);
> > > + __entry->cor_vol_err_cnt = le32_to_cpu(rec->info.cor_vol_err_cnt);
> >
> > I've lost track, but my guess is some / all of these need the get_unaligned_le32()
> > etc rather than aligned form. Maybe just be lazy and use the unaligned versions
> > even when things happen to be aligned - then we don't have to think about it
> > when reviewing :)
>
> See above. I think the 16/32 bit fields work as intended except for my lack of
> using the correct type.
>
> Ira

2022-09-20 17:05:32

by Jonathan Cameron

Subject: Re: [RFC PATCH 7/9] cxl/test: Add generic mock events

On Thu, 15 Sep 2022 11:53:29 -0700
Ira Weiny <[email protected]> wrote:

> On Thu, Aug 25, 2022 at 12:31:19PM +0100, Jonathan Cameron wrote:
> > On Fri, 12 Aug 2022 22:32:41 -0700
> > [email protected] wrote:
> >
> > > From: Ira Weiny <[email protected]>
> > >
> > > Facilitate testing basic Get/Clear Event functionality by creating
> > > multiple logs and generic events with made up UUID's.
> > >
> > > Data is completely made up with data patterns which should be easy to
> > > spot in trace output.
> > Hi Ira,
> >
> > I'm tempted to hack the QEMU emulation for this in with appropriately
> > complex interface to inject all the record types...
>
> Every time I look at the QEMU code it makes my head spin. :-(

You get used to it ;)

>
> I really thought about adding some support there. And I think for irq's it may
> work better? But after your talk today I did a quick search to see what it
> would take to do irqs in QEMU and got even more confused. :-(

Copy an example - though we haven't upstreamed any yet...

Either...

https://gitlab.com/jic23/qemu/-/commit/958fec58582b5cc910d2da4e2b855e134bb2c0c3#3dfd54f69a5f2382ddf5a6c00a52546d8b57316e_0_169

Or the CPMU one.

https://lore.kernel.org/all/[email protected]/
to set up, then look for msix_notify in

https://lore.kernel.org/all/[email protected]/

>
> > Lots to do there though, so not sure where this fits in my priority list!
>
> I bet it is higher on mine! ;-)

:)

>
> >
> > >
> > > Test traces are easy to obtain with a small script such as this:
> > >
> > > #!/bin/bash -x
> > >
> > > devices=`find /sys/devices/platform -name cxl_mem*`
> > >
> > > # Generate fake events if reset is passed in
> >
> > reset is rather unintuitive naming.
> >
> > fill_event_queue maybe or something more in that direction?
>
> Fair enough... Naming is hard and I'm one of the worst.
>
> I've changed to
>
> <sysfs>/.../event_fill_queue
> <sysfs>/.../event_trigger
>
> Thoughts?

Works for me.

..

J

2022-09-26 22:29:49

Subject: Re: [RFC PATCH 7/9] cxl/test: Add generic mock events

On Tue, Sep 20, 2022 at 09:17:48AM -0700, Jonathan Cameron wrote:
> On Thu, 15 Sep 2022 11:53:29 -0700
> Ira Weiny <[email protected]> wrote:
>
> > On Thu, Aug 25, 2022 at 12:31:19PM +0100, Jonathan Cameron wrote:
> > > On Fri, 12 Aug 2022 22:32:41 -0700
> > > [email protected] wrote:
> > >
> > > > From: Ira Weiny <[email protected]>
> > > >
> > > > Facilitate testing basic Get/Clear Event functionality by creating
> > > > multiple logs and generic events with made up UUID's.
> > > >
> > > > Data is completely made up with data patterns which should be easy to
> > > > spot in trace output.
> > > Hi Ira,
> > >
> > > I'm tempted to hack the QEMU emulation for this in with appropriately
> > > complex interface to inject all the record types...
> >
> > Every time I look at the QEMU code it makes my head spin. :-(
>
> You get used to it ;)

I'm trying... :-/

Question though:

Is there a call in qemu which is equivalent to cpu_to_leXX()? The
exec/cpu-all.h is having compilation issues for me because the
TARGET_BIG_ENDIAN is not defined (it is defined in a meson generated header).

So I'm afraid that the tswapXX() calls are not what I'm supposed to use. Is
this true? Are those some sort of internal call?

Ira

2022-09-27 14:44:44

by Jonathan Cameron

Subject: Re: [RFC PATCH 7/9] cxl/test: Add generic mock events

On Mon, 26 Sep 2022 14:39:52 -0700
Ira Weiny <[email protected]> wrote:

> On Tue, Sep 20, 2022 at 09:17:48AM -0700, Jonathan Cameron wrote:
> > On Thu, 15 Sep 2022 11:53:29 -0700
> > Ira Weiny <[email protected]> wrote:
> >
> > > On Thu, Aug 25, 2022 at 12:31:19PM +0100, Jonathan Cameron wrote:
> > > > On Fri, 12 Aug 2022 22:32:41 -0700
> > > > [email protected] wrote:
> > > >
> > > > > From: Ira Weiny <[email protected]>
> > > > >
> > > > > Facilitate testing basic Get/Clear Event functionality by creating
> > > > > multiple logs and generic events with made up UUID's.
> > > > >
> > > > > Data is completely made up with data patterns which should be easy to
> > > > > spot in trace output.
> > > > Hi Ira,
> > > >
> > > > I'm tempted to hack the QEMU emulation for this in with appropriately
> > > > complex interface to inject all the record types...
> > >
> > > Every time I look at the QEMU code it makes my head spin. :-(
> >
> > You get used to it ;)
>
> I'm trying... :-/
>
> Question though:
>
> Is there a call in qemu which is equivalent to cpu_to_leXX()? The
> exec/cpu-all.h is having compilation issues for me because the
> TARGET_BIG_ENDIAN is not defined (it is defined in a meson generated header).
>
> So I'm afraid that the tswapXX() calls are not what I'm supposed to use. Is
> this true? Are those some sort of internal call?
I'm confused. There is cpu_to_le16 in "qemu/bswap.h"

I suspect we've played a bit fast and loose with endianness in a few places in
current qemu code and should probably check all that sometime.
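
For anyone following along, the effect of a cpu_to_le16()-style conversion can
be shown without QEMU at all. The helpers below are a hypothetical,
stand-alone sketch (not QEMU's implementation): they store a value into a
buffer in little-endian byte order regardless of host endianness, which is
what an emulated device needs when filling in a record.

```c
#include <stdint.h>

/* Store v at p in little-endian order; works on LE and BE hosts. */
static void store_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

/* Same idea for 32 bits. */
static void store_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)(v >> 24);
}
```

Code that assigns host-order integers straight into record fields produces the
same bytes only on little-endian hosts, which is why the missing conversions
have gone unnoticed so far.
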

Jonathan

>
> Ira

2022-09-27 16:30:27

Subject: Re: [RFC PATCH 7/9] cxl/test: Add generic mock events

On Tue, Sep 27, 2022 at 02:56:23PM +0100, Jonathan Cameron wrote:
> On Mon, 26 Sep 2022 14:39:52 -0700
> Ira Weiny <[email protected]> wrote:
>
> > On Tue, Sep 20, 2022 at 09:17:48AM -0700, Jonathan Cameron wrote:
> > > On Thu, 15 Sep 2022 11:53:29 -0700
> > > Ira Weiny <[email protected]> wrote:
> > >
> > > > On Thu, Aug 25, 2022 at 12:31:19PM +0100, Jonathan Cameron wrote:
> > > > > On Fri, 12 Aug 2022 22:32:41 -0700
> > > > > [email protected] wrote:
> > > > >
> > > > > > From: Ira Weiny <[email protected]>
> > > > > >
> > > > > > Facilitate testing basic Get/Clear Event functionality by creating
> > > > > > multiple logs and generic events with made up UUID's.
> > > > > >
> > > > > > Data is completely made up with data patterns which should be easy to
> > > > > > spot in trace output.
> > > > > Hi Ira,
> > > > >
> > > > > I'm tempted to hack the QEMU emulation for this in with appropriately
> > > > > complex interface to inject all the record types...
> > > >
> > > > Every time I look at the QEMU code it makes my head spin. :-(
> > >
> > > You get used to it ;)
> >
> > I'm trying... :-/
> >
> > Question though:
> >
> > Is there a call in qemu which is equivalent to cpu_to_leXX()? The
> > exec/cpu-all.h is having compilation issues for me because the
> > TARGET_BIG_ENDIAN is not defined (it is defined in a meson generated header).
> >
> > So I'm afraid that the tswapXX() calls are not what I'm supposed to use. Is
> > this true? Are those some sort of internal call?
> I'm confused. There is cpu_to_le16 in "qemu/bswap.h"

<sigh> I don't know how I missed it. Sorry.

>
> I suspect we've played a bit fast and loose with endianness in a few places in
> current qemu code and should probably check all that sometime.

Yea nothing in hw/cxl seems to use any swapping. I suppose only little endian
hosts have been used thus far?

I grepped for 'ENDIAN' and found the tswap* calls. I guess I should have
grepped for 'cpu_to'! That found it right away! :-/ :-D

Sorry for the distraction,
Ira

> Jonathan
>
>
>
> >
> > Ira
>

2022-09-28 10:05:19

by Jonathan Cameron

Subject: Re: [RFC PATCH 7/9] cxl/test: Add generic mock events

On Tue, 27 Sep 2022 09:13:58 -0700
Ira Weiny <[email protected]> wrote:

> On Tue, Sep 27, 2022 at 02:56:23PM +0100, Jonathan Cameron wrote:
> > On Mon, 26 Sep 2022 14:39:52 -0700
> > Ira Weiny <[email protected]> wrote:
> >
> > > On Tue, Sep 20, 2022 at 09:17:48AM -0700, Jonathan Cameron wrote:
> > > > On Thu, 15 Sep 2022 11:53:29 -0700
> > > > Ira Weiny <[email protected]> wrote:
> > > >
> > > > > On Thu, Aug 25, 2022 at 12:31:19PM +0100, Jonathan Cameron wrote:
> > > > > > On Fri, 12 Aug 2022 22:32:41 -0700
> > > > > > [email protected] wrote:
> > > > > >
> > > > > > > From: Ira Weiny <[email protected]>
> > > > > > >
> > > > > > > Facilitate testing basic Get/Clear Event functionality by creating
> > > > > > > multiple logs and generic events with made up UUID's.
> > > > > > >
> > > > > > > Data is completely made up with data patterns which should be easy to
> > > > > > > spot in trace output.
> > > > > > Hi Ira,
> > > > > >
> > > > > > I'm tempted to hack the QEMU emulation for this in with appropriately
> > > > > > complex interface to inject all the record types...
> > > > >
> > > > > Every time I look at the QEMU code it makes my head spin. :-(
> > > >
> > > > You get used to it ;)
> > >
> > > I'm trying... :-/
> > >
> > > Question though:
> > >
> > > Is there a call in qemu which is equivalent to cpu_to_leXX()? The
> > > exec/cpu-all.h is having compilation issues for me because the
> > > TARGET_BIG_ENDIAN is not defined (it is defined in a meson generated header).
> > >
> > > So I'm afraid that the tswapXX() calls are not what I'm supposed to use. Is
> > > this true? Are those some sort of internal call?
> > I'm confused. There is cpu_to_le16 in "qemu/bswap.h"
>
> <sigh> I don't know how I missed it. Sorry.
>
> >
> > I suspect we've played a bit fast and loose with endianness in a few places in
> > current qemu code and should probably check all that sometime.
>
> Yea nothing in hw/cxl seems to use any swapping. I suppose only little endian
> hosts have been used thus far?

Exactly and I'm not sure when we'll see any big endian emulated hosts. We should fix that,
but lots of other things on todo list, so it's not particularly high on the list
+ getting a test environment up is going to be non-trivial.

J
>
> I grepped for 'ENDIAN' and found the tswap* calls. I guess I should have
> grepped for 'cpu_to'! That found it right away! :-/ :-D
>
> Sorry for the distraction,
> Ira
>
> > Jonathan
> >
> >
> >
> > >
> > > Ira
> >