2023-11-01 21:12:27

by Ira Weiny

Subject: [PATCH RFC v3 0/6] efi/cxl-cper: Report CPER CXL component events through trace events

Series status/background
========================

This is another RFC version of processing the CXL CPER records through
the CXL trace mechanisms as Dan mentioned in [1].

This version raises the CXL event structures to a core header and
rearranges them so that they can be shared efficiently, which eliminates
a memcpy Smita noticed. Also, BDF is used instead of the serial number.

NOTE: I'm still fuzzy on which fields in the CPER record are the correct
ones to use to find the BDF in the Linux code. It would be nice if
someone could double check those.

The CPER code remains compile tested only. The original event code
continues to pass cxl-test.

[1] https://lore.kernel.org/all/[email protected]/

Cover letter
============

CXL Component Events, as defined by UEFI 2.10 Section N.2.14, wrap a
mostly CXL event payload in an EFI Common Platform Error Record (CPER).
If a device is configured for firmware-first, CXL event records are not
sent directly to the host.

The CXL sub-system uniquely has DPA to HPA translation information. It
also already properly decodes the event format. Send the CXL CPER
records to the CXL sub-system for processing.

With CXL event logs, the device interrupts the host with events. In the
EFI case, events are wrapped with device information, which needs to be
matched with the memdev devices the CXL driver is tracking.

A number of alternatives were considered to match the memdev with the
CPER record. The most robust was to find the PCI device via Bus,
Device, Function and match it to the memdev driver data.

CPER records are identified with GUIDs while CXL event logs contain
UUIDs. The UUID was previously printed for all events, but it is
redundant information which adds unnecessary complexity when
processing CPER data. Remove the UUIDs from known events. Restructure
the code to make sharing the data between CPER and event logs as
efficient as possible.

Signed-off-by: Ira Weiny <[email protected]>
---
Changes in RFC v3:
- djbw: Share structures between CPER/event logs
- Smita: use BDF to resolve the memdev
- djbw/Smita: various cleanups
- Link to v2: https://lore.kernel.org/r/[email protected]

---
Ira Weiny (6):
cxl/trace: Remove uuid from event trace known events
cxl/events: Promote CXL event structures to a core header
cxl/events: Remove UUID from non-generic event structures
cxl/events: Create a CXL event union
firmware/efi: Process CXL Component Events
cxl/memdev: Register for and process CPER events

drivers/cxl/core/mbox.c | 57 +++++++++-----
drivers/cxl/core/trace.h | 18 ++---
drivers/cxl/cxlmem.h | 96 ++---------------------
drivers/cxl/pci.c | 59 +++++++++++++-
drivers/firmware/efi/cper.c | 15 ++++
drivers/firmware/efi/cper_cxl.c | 40 ++++++++++
drivers/firmware/efi/cper_cxl.h | 29 +++++++
include/linux/cxl-event.h | 160 ++++++++++++++++++++++++++++++++++++++
tools/testing/cxl/test/mem.c | 166 +++++++++++++++++++++++-----------------
9 files changed, 451 insertions(+), 189 deletions(-)
---
base-commit: 1c8b86a3799f7e5be903c3f49fcdaee29fd385b5
change-id: 20230601-cxl-cper-26ffc839c6c6

Best regards,
--
Ira Weiny <[email protected]>


2023-11-01 21:12:53

by Ira Weiny

Subject: [PATCH RFC v3 2/6] cxl/events: Promote CXL event structures to a core header

EFI code can process CXL events through CPER records. Those records use
almost the same format as the CXL events.

Lift the CXL event structures to a core header to be shared.

Signed-off-by: Ira Weiny <[email protected]>

---
Changes from RFC v2:
[djbw: new patch]
---
drivers/cxl/cxlmem.h | 90 +----------------------------------------
include/linux/cxl-event.h | 100 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 101 insertions(+), 89 deletions(-)

diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 706f8a6d1ef4..d694820ce8f5 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -6,6 +6,7 @@
#include <linux/cdev.h>
#include <linux/uuid.h>
#include <linux/rcuwait.h>
+#include <linux/cxl-event.h>
#include "cxl.h"

/* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
@@ -576,27 +577,6 @@ struct cxl_mbox_identify {
u8 qos_telemetry_caps;
} __packed;

-/*
- * Common Event Record Format
- * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
- */
-struct cxl_event_record_hdr {
- uuid_t id;
- u8 length;
- u8 flags[3];
- __le16 handle;
- __le16 related_handle;
- __le64 timestamp;
- u8 maint_op_class;
- u8 reserved[15];
-} __packed;
-
-#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
-struct cxl_event_record_raw {
- struct cxl_event_record_hdr hdr;
- u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
-} __packed;
-
/*
* Get Event Records output payload
* CXL rev 3.0 section 8.2.9.2.2; Table 8-50
@@ -638,74 +618,6 @@ struct cxl_mbox_clear_event_payload {
} __packed;
#define CXL_CLEAR_EVENT_MAX_HANDLES U8_MAX

-/*
- * General Media Event Record
- * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
- */
-#define CXL_EVENT_GEN_MED_COMP_ID_SIZE 0x10
-struct cxl_event_gen_media {
- struct cxl_event_record_hdr hdr;
- __le64 phys_addr;
- u8 descriptor;
- u8 type;
- u8 transaction_type;
- u8 validity_flags[2];
- u8 channel;
- u8 rank;
- u8 device[3];
- u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
- u8 reserved[46];
-} __packed;
-
-/*
- * DRAM Event Record - DER
- * CXL rev 3.0 section 8.2.9.2.1.2; Table 3-44
- */
-#define CXL_EVENT_DER_CORRECTION_MASK_SIZE 0x20
-struct cxl_event_dram {
- struct cxl_event_record_hdr hdr;
- __le64 phys_addr;
- u8 descriptor;
- u8 type;
- u8 transaction_type;
- u8 validity_flags[2];
- u8 channel;
- u8 rank;
- u8 nibble_mask[3];
- u8 bank_group;
- u8 bank;
- u8 row[3];
- u8 column[2];
- u8 correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
- u8 reserved[0x17];
-} __packed;
-
-/*
- * Get Health Info Record
- * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
- */
-struct cxl_get_health_info {
- u8 health_status;
- u8 media_status;
- u8 add_status;
- u8 life_used;
- u8 device_temp[2];
- u8 dirty_shutdown_cnt[4];
- u8 cor_vol_err_cnt[4];
- u8 cor_per_err_cnt[4];
-} __packed;
-
-/*
- * Memory Module Event Record
- * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
- */
-struct cxl_event_mem_module {
- struct cxl_event_record_hdr hdr;
- u8 event_type;
- struct cxl_get_health_info info;
- u8 reserved[0x3d];
-} __packed;
-
struct cxl_mbox_get_partition_info {
__le64 active_volatile_cap;
__le64 active_persistent_cap;
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
new file mode 100644
index 000000000000..1c94e8fdd227
--- /dev/null
+++ b/include/linux/cxl-event.h
@@ -0,0 +1,100 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CXL_EVENT_H
+#define _LINUX_CXL_EVENT_H
+
+/*
+ * CXL event records; CXL rev 3.0
+ *
+ * Copyright(c) 2023 Intel Corporation.
+ */
+
+/*
+ * Common Event Record Format
+ * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
+ */
+struct cxl_event_record_hdr {
+ uuid_t id;
+ u8 length;
+ u8 flags[3];
+ __le16 handle;
+ __le16 related_handle;
+ __le64 timestamp;
+ u8 maint_op_class;
+ u8 reserved[15];
+} __packed;
+
+#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
+struct cxl_event_record_raw {
+ struct cxl_event_record_hdr hdr;
+ u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
+} __packed;
+
+/*
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CXL_EVENT_GEN_MED_COMP_ID_SIZE 0x10
+struct cxl_event_gen_media {
+ struct cxl_event_record_hdr hdr;
+ __le64 phys_addr;
+ u8 descriptor;
+ u8 type;
+ u8 transaction_type;
+ u8 validity_flags[2];
+ u8 channel;
+ u8 rank;
+ u8 device[3];
+ u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
+ u8 reserved[46];
+} __packed;
+
+/*
+ * DRAM Event Record - DER
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+#define CXL_EVENT_DER_CORRECTION_MASK_SIZE 0x20
+struct cxl_event_dram {
+ struct cxl_event_record_hdr hdr;
+ __le64 phys_addr;
+ u8 descriptor;
+ u8 type;
+ u8 transaction_type;
+ u8 validity_flags[2];
+ u8 channel;
+ u8 rank;
+ u8 nibble_mask[3];
+ u8 bank_group;
+ u8 bank;
+ u8 row[3];
+ u8 column[2];
+ u8 correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
+ u8 reserved[0x17];
+} __packed;
+
+/*
+ * Get Health Info Record
+ * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
+ */
+struct cxl_get_health_info {
+ u8 health_status;
+ u8 media_status;
+ u8 add_status;
+ u8 life_used;
+ u8 device_temp[2];
+ u8 dirty_shutdown_cnt[4];
+ u8 cor_vol_err_cnt[4];
+ u8 cor_per_err_cnt[4];
+} __packed;
+
+/*
+ * Memory Module Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+struct cxl_event_mem_module {
+ struct cxl_event_record_hdr hdr;
+ u8 event_type;
+ struct cxl_get_health_info info;
+ u8 reserved[0x3d];
+} __packed;
+
+#endif /* _LINUX_CXL_EVENT_H */

--
2.41.0