Series status/background
========================
Smita has been a great help with this series. This includes testing the
last RFC version.[1]
Unfortunately, Dan had a better idea for how to register the call
between the efi and cxl subsystems so this is reworked for V2.
[1] https://lore.kernel.org/all/[email protected]/
Cover letter
============
CXL Component Events, as defined by EFI 2.10 Section N.2.14, wrap a
mostly CXL event payload in an EFI Common Platform Error Record (CPER)
record. If a device is configured for firmware first CXL event records
are not sent directly to the host.
The CXL sub-system uniquely has DPA to HPA translation information. It
also already has event format tracing. Restructure the code to make
sharing the data between CPER/event logs most efficient. Then send the
CXL CPER records to the CXL sub-system for processing.
With event logs the events interrupt the driver directly. In the EFI
case events are wrapped with device information which allows the CXL
subsystem to identify the PCI device.
Previous version considered matching the memdev differently. However,
the most robust was to find the PCI device via Bus, Device, Function and
use the PCI device to find the driver data.
CPER records are identified with GUID's while CXL event logs contain
UUID's. The UUID is reported for all events. While the UUID is
redundant for the known events the UUID's are already used by rasdaemon.
To keep compatibility UUIDs are injected for CPER records based on the
record type.
In addition this series cleans up the UUID defines used between the
event processing and cxl_test code.
Signed-off-by: Ira Weiny <[email protected]>
---
Changes in v2:
- djbw: Clarify GUID defines
- djbw: Clarify UUID defines
- djbw: Use a single event callback to the CXL subsystem
- iweiny: Minor function name clean ups
- Link to v1: https://lore.kernel.org/r/[email protected]
---
Ira Weiny (7):
cxl/trace: Pass uuid explicitly to event traces
cxl/events: Promote CXL event structures to a core header
cxl/events: Create common event UUID defines
cxl/events: Separate UUID from event structures
cxl/events: Create a CXL event union
firmware/efi: Process CXL Component Events
cxl/memdev: Register for and process CPER events
drivers/cxl/core/mbox.c | 77 ++++++++++---------
drivers/cxl/core/trace.h | 32 ++++----
drivers/cxl/cxlmem.h | 112 +++++++--------------------
drivers/cxl/pci.c | 52 ++++++++++++-
drivers/firmware/efi/cper.c | 15 ++++
drivers/firmware/efi/cper_cxl.c | 45 +++++++++++
drivers/firmware/efi/cper_cxl.h | 29 +++++++
include/linux/cxl-event.h | 162 +++++++++++++++++++++++++++++++++++++++
tools/testing/cxl/test/mem.c | 163 +++++++++++++++++++++++-----------------
9 files changed, 481 insertions(+), 206 deletions(-)
---
base-commit: 7475e51b87969e01a6812eac713a1c8310372e8a
change-id: 20230601-cxl-cper-26ffc839c6c6
Best regards,
--
Ira Weiny <[email protected]>
CXL CPER events are identified by the CPER Section Type GUID. The GUID
correlates with the CXL UUID for the event record. It turns out that a
CXL CPER record is a strict subset of the CXL event record, only the
UUID header field is chopped.
In order to unify handling between native and CPER flavors of CXL
events, prepare the code for the UUID to be passed in rather than
inferred from the record itself.
Later patches update the passed in record to only refer to the common
data between the formats.
Pass the UUID explicitly to each trace event to be able to remove the
UUID from the event structures.
Originally it was desirable to remove the UUID from the well known event
because the UUID value was redundant. However, the trace API was
already in place.[1]
[1] https://lore.kernel.org/all/[email protected]/
Signed-off-by: Ira Weiny <[email protected]>
---
Changes for v2:
[djbw: reword commit message]
[djbw: keep uuid field name hdr_uuid]
---
drivers/cxl/core/mbox.c | 8 ++++----
drivers/cxl/core/trace.h | 30 +++++++++++++++---------------
2 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 36270dcfb42e..00f429c440df 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -870,19 +870,19 @@ static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
struct cxl_event_gen_media *rec =
(struct cxl_event_gen_media *)record;
- trace_cxl_general_media(cxlmd, type, rec);
+ trace_cxl_general_media(cxlmd, type, id, rec);
} else if (uuid_equal(id, &dram_event_uuid)) {
struct cxl_event_dram *rec = (struct cxl_event_dram *)record;
- trace_cxl_dram(cxlmd, type, rec);
+ trace_cxl_dram(cxlmd, type, id, rec);
} else if (uuid_equal(id, &mem_mod_event_uuid)) {
struct cxl_event_mem_module *rec =
(struct cxl_event_mem_module *)record;
- trace_cxl_memory_module(cxlmd, type, rec);
+ trace_cxl_memory_module(cxlmd, type, id, rec);
} else {
/* For unknown record types print just the header */
- trace_cxl_generic_event(cxlmd, type, record);
+ trace_cxl_generic_event(cxlmd, type, id, record);
}
}
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index a0b5819bc70b..68973a101a75 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -189,7 +189,7 @@ TRACE_EVENT(cxl_overflow,
__string(memdev, dev_name(&cxlmd->dev)) \
__string(host, dev_name(cxlmd->dev.parent)) \
__field(int, log) \
- __field_struct(uuid_t, hdr_uuid) \
+ __field_struct(uuid_t, hdr_uuid) \
__field(u64, serial) \
__field(u32, hdr_flags) \
__field(u16, hdr_handle) \
@@ -198,12 +198,12 @@ TRACE_EVENT(cxl_overflow,
__field(u8, hdr_length) \
__field(u8, hdr_maint_op_class)
-#define CXL_EVT_TP_fast_assign(cxlmd, l, hdr) \
+#define CXL_EVT_TP_fast_assign(cxlmd, l, uuid, hdr) \
__assign_str(memdev, dev_name(&(cxlmd)->dev)); \
__assign_str(host, dev_name((cxlmd)->dev.parent)); \
__entry->log = (l); \
__entry->serial = (cxlmd)->cxlds->serial; \
- memcpy(&__entry->hdr_uuid, &(hdr).id, sizeof(uuid_t)); \
+ memcpy(&__entry->hdr_uuid, (uuid), sizeof(uuid_t)); \
__entry->hdr_length = (hdr).length; \
__entry->hdr_flags = get_unaligned_le24((hdr).flags); \
__entry->hdr_handle = le16_to_cpu((hdr).handle); \
@@ -225,9 +225,9 @@ TRACE_EVENT(cxl_overflow,
TRACE_EVENT(cxl_generic_event,
TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- struct cxl_event_record_raw *rec),
+ const uuid_t *uuid, struct cxl_event_record_raw *rec),
- TP_ARGS(cxlmd, log, rec),
+ TP_ARGS(cxlmd, log, uuid, rec),
TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -235,7 +235,7 @@ TRACE_EVENT(cxl_generic_event,
),
TP_fast_assign(
- CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+ CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
memcpy(__entry->data, &rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
),
@@ -315,9 +315,9 @@ TRACE_EVENT(cxl_generic_event,
TRACE_EVENT(cxl_general_media,
TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- struct cxl_event_gen_media *rec),
+ const uuid_t *uuid, struct cxl_event_gen_media *rec),
- TP_ARGS(cxlmd, log, rec),
+ TP_ARGS(cxlmd, log, uuid, rec),
TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -336,7 +336,7 @@ TRACE_EVENT(cxl_general_media,
),
TP_fast_assign(
- CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+ CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
/* General Media */
__entry->dpa = le64_to_cpu(rec->phys_addr);
@@ -398,9 +398,9 @@ TRACE_EVENT(cxl_general_media,
TRACE_EVENT(cxl_dram,
TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- struct cxl_event_dram *rec),
+ const uuid_t *uuid, struct cxl_event_dram *rec),
- TP_ARGS(cxlmd, log, rec),
+ TP_ARGS(cxlmd, log, uuid, rec),
TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -422,7 +422,7 @@ TRACE_EVENT(cxl_dram,
),
TP_fast_assign(
- CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+ CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
/* DRAM */
__entry->dpa = le64_to_cpu(rec->phys_addr);
@@ -547,9 +547,9 @@ TRACE_EVENT(cxl_dram,
TRACE_EVENT(cxl_memory_module,
TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- struct cxl_event_mem_module *rec),
+ const uuid_t *uuid, struct cxl_event_mem_module *rec),
- TP_ARGS(cxlmd, log, rec),
+ TP_ARGS(cxlmd, log, uuid, rec),
TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -569,7 +569,7 @@ TRACE_EVENT(cxl_memory_module,
),
TP_fast_assign(
- CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+ CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
/* Memory Module Event */
__entry->event_type = rec->event_type;
--
2.43.0
UEFI code can process CXL events through CPER records. Those records
use almost the same format as the CXL events.
Lift the CXL event structures to a core header to be shared.
Signed-off-by: Ira Weiny <[email protected]>
---
drivers/cxl/cxlmem.h | 90 +----------------------------------------
include/linux/cxl-event.h | 100 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 101 insertions(+), 89 deletions(-)
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index a2fcbca253f3..f0e7ebb84f02 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -6,6 +6,7 @@
#include <linux/cdev.h>
#include <linux/uuid.h>
#include <linux/rcuwait.h>
+#include <linux/cxl-event.h>
#include "cxl.h"
/* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
@@ -579,27 +580,6 @@ struct cxl_mbox_identify {
u8 qos_telemetry_caps;
} __packed;
-/*
- * Common Event Record Format
- * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
- */
-struct cxl_event_record_hdr {
- uuid_t id;
- u8 length;
- u8 flags[3];
- __le16 handle;
- __le16 related_handle;
- __le64 timestamp;
- u8 maint_op_class;
- u8 reserved[15];
-} __packed;
-
-#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
-struct cxl_event_record_raw {
- struct cxl_event_record_hdr hdr;
- u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
-} __packed;
-
/*
* Get Event Records output payload
* CXL rev 3.0 section 8.2.9.2.2; Table 8-50
@@ -641,74 +621,6 @@ struct cxl_mbox_clear_event_payload {
} __packed;
#define CXL_CLEAR_EVENT_MAX_HANDLES U8_MAX
-/*
- * General Media Event Record
- * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
- */
-#define CXL_EVENT_GEN_MED_COMP_ID_SIZE 0x10
-struct cxl_event_gen_media {
- struct cxl_event_record_hdr hdr;
- __le64 phys_addr;
- u8 descriptor;
- u8 type;
- u8 transaction_type;
- u8 validity_flags[2];
- u8 channel;
- u8 rank;
- u8 device[3];
- u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
- u8 reserved[46];
-} __packed;
-
-/*
- * DRAM Event Record - DER
- * CXL rev 3.0 section 8.2.9.2.1.2; Table 3-44
- */
-#define CXL_EVENT_DER_CORRECTION_MASK_SIZE 0x20
-struct cxl_event_dram {
- struct cxl_event_record_hdr hdr;
- __le64 phys_addr;
- u8 descriptor;
- u8 type;
- u8 transaction_type;
- u8 validity_flags[2];
- u8 channel;
- u8 rank;
- u8 nibble_mask[3];
- u8 bank_group;
- u8 bank;
- u8 row[3];
- u8 column[2];
- u8 correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
- u8 reserved[0x17];
-} __packed;
-
-/*
- * Get Health Info Record
- * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
- */
-struct cxl_get_health_info {
- u8 health_status;
- u8 media_status;
- u8 add_status;
- u8 life_used;
- u8 device_temp[2];
- u8 dirty_shutdown_cnt[4];
- u8 cor_vol_err_cnt[4];
- u8 cor_per_err_cnt[4];
-} __packed;
-
-/*
- * Memory Module Event Record
- * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
- */
-struct cxl_event_mem_module {
- struct cxl_event_record_hdr hdr;
- u8 event_type;
- struct cxl_get_health_info info;
- u8 reserved[0x3d];
-} __packed;
-
struct cxl_mbox_get_partition_info {
__le64 active_volatile_cap;
__le64 active_persistent_cap;
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
new file mode 100644
index 000000000000..1c94e8fdd227
--- /dev/null
+++ b/include/linux/cxl-event.h
@@ -0,0 +1,100 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CXL_EVENT_H
+#define _LINUX_CXL_EVENT_H
+
+/*
+ * CXL event records; CXL rev 3.0
+ *
+ * Copyright(c) 2023 Intel Corporation.
+ */
+
+/*
+ * Common Event Record Format
+ * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
+ */
+struct cxl_event_record_hdr {
+ uuid_t id;
+ u8 length;
+ u8 flags[3];
+ __le16 handle;
+ __le16 related_handle;
+ __le64 timestamp;
+ u8 maint_op_class;
+ u8 reserved[15];
+} __packed;
+
+#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
+struct cxl_event_record_raw {
+ struct cxl_event_record_hdr hdr;
+ u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
+} __packed;
+
+/*
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CXL_EVENT_GEN_MED_COMP_ID_SIZE 0x10
+struct cxl_event_gen_media {
+ struct cxl_event_record_hdr hdr;
+ __le64 phys_addr;
+ u8 descriptor;
+ u8 type;
+ u8 transaction_type;
+ u8 validity_flags[2];
+ u8 channel;
+ u8 rank;
+ u8 device[3];
+ u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
+ u8 reserved[46];
+} __packed;
+
+/*
+ * DRAM Event Record - DER
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 3-44
+ */
+#define CXL_EVENT_DER_CORRECTION_MASK_SIZE 0x20
+struct cxl_event_dram {
+ struct cxl_event_record_hdr hdr;
+ __le64 phys_addr;
+ u8 descriptor;
+ u8 type;
+ u8 transaction_type;
+ u8 validity_flags[2];
+ u8 channel;
+ u8 rank;
+ u8 nibble_mask[3];
+ u8 bank_group;
+ u8 bank;
+ u8 row[3];
+ u8 column[2];
+ u8 correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
+ u8 reserved[0x17];
+} __packed;
+
+/*
+ * Get Health Info Record
+ * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
+ */
+struct cxl_get_health_info {
+ u8 health_status;
+ u8 media_status;
+ u8 add_status;
+ u8 life_used;
+ u8 device_temp[2];
+ u8 dirty_shutdown_cnt[4];
+ u8 cor_vol_err_cnt[4];
+ u8 cor_per_err_cnt[4];
+} __packed;
+
+/*
+ * Memory Module Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+struct cxl_event_mem_module {
+ struct cxl_event_record_hdr hdr;
+ u8 event_type;
+ struct cxl_get_health_info info;
+ u8 reserved[0x3d];
+} __packed;
+
+#endif /* _LINUX_CXL_EVENT_H */
--
2.43.0
Dan points out in review that the cxl_test code could be made better
through the use of UUID's defines rather than being open coded.[1]
Static const variable still need to exist to be passed to the trace
code.
Create UUID defines and use them rather than open coding them.
[1] https://lore.kernel.org/all/[email protected]/
Suggested-by: Dan Williams <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
new patch for v2:
---
drivers/cxl/core/mbox.c | 12 +++---------
drivers/cxl/cxlmem.h | 24 ++++++++++++++++++++++++
tools/testing/cxl/test/mem.c | 9 +++------
3 files changed, 30 insertions(+), 15 deletions(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 00f429c440df..6866fb403fa1 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -840,25 +840,19 @@ EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
* General Media Event Record
* CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
*/
-static const uuid_t gen_media_event_uuid =
- UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
- 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
+static const uuid_t gen_media_event_uuid = CXL_EVENT_GEN_MEDIA_UUID;
/*
* DRAM Event Record
* CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
*/
-static const uuid_t dram_event_uuid =
- UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
- 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
+static const uuid_t dram_event_uuid = CXL_EVENT_DRAM_UUID;
/*
* Memory Module Event Record
* CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
*/
-static const uuid_t mem_mod_event_uuid =
- UUID_INIT(0xfe927475, 0xdd59, 0x4339,
- 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74);
+static const uuid_t mem_mod_event_uuid = CXL_EVENT_MEM_MODULE_UUID;
static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
enum cxl_event_log_type type,
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index f0e7ebb84f02..e5d770e26e02 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -580,6 +580,30 @@ struct cxl_mbox_identify {
u8 qos_telemetry_caps;
} __packed;
+/*
+ * General Media Event Record UUID
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CXL_EVENT_GEN_MEDIA_UUID \
+ UUID_INIT(0xfbcd0a77, 0xc260, 0x417f, \
+ 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6)
+
+/*
+ * DRAM Event Record UUID
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+#define CXL_EVENT_DRAM_UUID \
+ UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab, \
+ 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24)
+
+/*
+ * Memory Module Event Record UUID
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+#define CXL_EVENT_MEM_MODULE_UUID \
+ UUID_INIT(0xfe927475, 0xdd59, 0x4339, \
+ 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)
+
/*
* Get Event Records output payload
* CXL rev 3.0 section 8.2.9.2.2; Table 8-50
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index ee61fa3a2411..5a95b04b329a 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -362,8 +362,7 @@ struct cxl_event_record_raw hardware_replace = {
struct cxl_event_gen_media gen_media = {
.hdr = {
- .id = UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
- 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
+ .id = CXL_EVENT_GEN_MEDIA_UUID,
.length = sizeof(struct cxl_event_gen_media),
.flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
/* .handle = Set dynamically */
@@ -380,8 +379,7 @@ struct cxl_event_gen_media gen_media = {
struct cxl_event_dram dram = {
.hdr = {
- .id = UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
- 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
+ .id = CXL_EVENT_DRAM_UUID,
.length = sizeof(struct cxl_event_dram),
.flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
/* .handle = Set dynamically */
@@ -400,8 +398,7 @@ struct cxl_event_dram dram = {
struct cxl_event_mem_module mem_module = {
.hdr = {
- .id = UUID_INIT(0xfe927475, 0xdd59, 0x4339,
- 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),
+ .id = CXL_EVENT_MEM_MODULE_UUID,
.length = sizeof(struct cxl_event_mem_module),
/* .handle = Set dynamically */
.related_handle = cpu_to_le16(0),
--
2.43.0
The UEFI CXL CPER structure does not include the UUID. Now that the
UUID is passed separately to the trace event there is no need to have
the UUID in those structures.
Move UUID from the event record header to the raw structures. Adjust
cxl-test to Create dummy structures for creating test records.
Signed-off-by: Ira Weiny <[email protected]>
---
Changes from v1:
[iweiny: adjust for new UUID defines]
---
drivers/cxl/core/mbox.c | 2 +-
include/linux/cxl-event.h | 10 ++--
tools/testing/cxl/test/mem.c | 129 +++++++++++++++++++++++++------------------
3 files changed, 81 insertions(+), 60 deletions(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 6866fb403fa1..f4d82e29435d 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -858,7 +858,7 @@ static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
enum cxl_event_log_type type,
struct cxl_event_record_raw *record)
{
- uuid_t *id = &record->hdr.id;
+ uuid_t *id = &record->id;
if (uuid_equal(id, &gen_media_event_uuid)) {
struct cxl_event_gen_media *rec =
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index 1c94e8fdd227..ebb00ead1496 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -8,12 +8,7 @@
* Copyright(c) 2023 Intel Corporation.
*/
-/*
- * Common Event Record Format
- * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
- */
struct cxl_event_record_hdr {
- uuid_t id;
u8 length;
u8 flags[3];
__le16 handle;
@@ -23,8 +18,13 @@ struct cxl_event_record_hdr {
u8 reserved[15];
} __packed;
+/*
+ * Common Event Record Format
+ * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
+ */
#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
struct cxl_event_record_raw {
+ uuid_t id;
struct cxl_event_record_hdr hdr;
u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
} __packed;
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 5a95b04b329a..9cc2b8ce1efd 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -337,9 +337,9 @@ static void cxl_mock_event_trigger(struct device *dev)
}
struct cxl_event_record_raw maint_needed = {
+ .id = UUID_INIT(0xBA5EBA11, 0xABCD, 0xEFEB,
+ 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
.hdr = {
- .id = UUID_INIT(0xBA5EBA11, 0xABCD, 0xEFEB,
- 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
.length = sizeof(struct cxl_event_record_raw),
.flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
/* .handle = Set dynamically */
@@ -349,9 +349,9 @@ struct cxl_event_record_raw maint_needed = {
};
struct cxl_event_record_raw hardware_replace = {
+ .id = UUID_INIT(0xABCDEFEB, 0xBA11, 0xBA5E,
+ 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
.hdr = {
- .id = UUID_INIT(0xABCDEFEB, 0xBA11, 0xBA5E,
- 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
.length = sizeof(struct cxl_event_record_raw),
.flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
/* .handle = Set dynamically */
@@ -360,61 +360,82 @@ struct cxl_event_record_raw hardware_replace = {
.data = { 0xDE, 0xAD, 0xBE, 0xEF },
};
-struct cxl_event_gen_media gen_media = {
- .hdr = {
- .id = CXL_EVENT_GEN_MEDIA_UUID,
- .length = sizeof(struct cxl_event_gen_media),
- .flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
- /* .handle = Set dynamically */
- .related_handle = cpu_to_le16(0),
+struct cxl_test_gen_media {
+ uuid_t id;
+ struct cxl_event_gen_media rec;
+} __packed;
+
+struct cxl_test_gen_media gen_media = {
+ .id = CXL_EVENT_GEN_MEDIA_UUID,
+ .rec = {
+ .hdr = {
+ .length = sizeof(struct cxl_test_gen_media),
+ .flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0),
+ },
+ .phys_addr = cpu_to_le64(0x2000),
+ .descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
+ .type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
+ .transaction_type = CXL_GMER_TRANS_HOST_WRITE,
+ /* .validity_flags = <set below> */
+ .channel = 1,
+ .rank = 30
},
- .phys_addr = cpu_to_le64(0x2000),
- .descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
- .type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
- .transaction_type = CXL_GMER_TRANS_HOST_WRITE,
- /* .validity_flags = <set below> */
- .channel = 1,
- .rank = 30
};
-struct cxl_event_dram dram = {
- .hdr = {
- .id = CXL_EVENT_DRAM_UUID,
- .length = sizeof(struct cxl_event_dram),
- .flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
- /* .handle = Set dynamically */
- .related_handle = cpu_to_le16(0),
+struct cxl_test_dram {
+ uuid_t id;
+ struct cxl_event_dram rec;
+} __packed;
+
+struct cxl_test_dram dram = {
+ .id = CXL_EVENT_DRAM_UUID,
+ .rec = {
+ .hdr = {
+ .length = sizeof(struct cxl_test_dram),
+ .flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0),
+ },
+ .phys_addr = cpu_to_le64(0x8000),
+ .descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
+ .type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
+ .transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
+ /* .validity_flags = <set below> */
+ .channel = 1,
+ .bank_group = 5,
+ .bank = 2,
+ .column = {0xDE, 0xAD},
},
- .phys_addr = cpu_to_le64(0x8000),
- .descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
- .type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
- .transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
- /* .validity_flags = <set below> */
- .channel = 1,
- .bank_group = 5,
- .bank = 2,
- .column = {0xDE, 0xAD},
};
-struct cxl_event_mem_module mem_module = {
- .hdr = {
- .id = CXL_EVENT_MEM_MODULE_UUID,
- .length = sizeof(struct cxl_event_mem_module),
- /* .handle = Set dynamically */
- .related_handle = cpu_to_le16(0),
+struct cxl_test_mem_module {
+ uuid_t id;
+ struct cxl_event_mem_module rec;
+} __packed;
+
+struct cxl_test_mem_module mem_module = {
+ .id = CXL_EVENT_MEM_MODULE_UUID,
+ .rec = {
+ .hdr = {
+ .length = sizeof(struct cxl_test_mem_module),
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0),
+ },
+ .event_type = CXL_MMER_TEMP_CHANGE,
+ .info = {
+ .health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
+ .media_status = CXL_DHI_MS_ALL_DATA_LOST,
+ .add_status = (CXL_DHI_AS_CRITICAL << 2) |
+ (CXL_DHI_AS_WARNING << 4) |
+ (CXL_DHI_AS_WARNING << 5),
+ .device_temp = { 0xDE, 0xAD},
+ .dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
+ .cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
+ .cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
+ }
},
- .event_type = CXL_MMER_TEMP_CHANGE,
- .info = {
- .health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
- .media_status = CXL_DHI_MS_ALL_DATA_LOST,
- .add_status = (CXL_DHI_AS_CRITICAL << 2) |
- (CXL_DHI_AS_WARNING << 4) |
- (CXL_DHI_AS_WARNING << 5),
- .device_temp = { 0xDE, 0xAD},
- .dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
- .cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
- .cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
- }
};
static int mock_set_timestamp(struct cxl_dev_state *cxlds,
@@ -436,11 +457,11 @@ static int mock_set_timestamp(struct cxl_dev_state *cxlds,
static void cxl_mock_add_event_logs(struct mock_event_store *mes)
{
put_unaligned_le16(CXL_GMER_VALID_CHANNEL | CXL_GMER_VALID_RANK,
- &gen_media.validity_flags);
+ &gen_media.rec.validity_flags);
put_unaligned_le16(CXL_DER_VALID_CHANNEL | CXL_DER_VALID_BANK_GROUP |
CXL_DER_VALID_BANK | CXL_DER_VALID_COLUMN,
- &dram.validity_flags);
+ &dram.rec.validity_flags);
mes_add_event(mes, CXL_EVENT_TYPE_INFO, &maint_needed);
mes_add_event(mes, CXL_EVENT_TYPE_INFO,
--
2.43.0
The CXL CPER and event log records share everything but a UUID/GUID in
their structures.
Define a cxl_event union without the UUID/GUID to be shared between the
CPER and event log record formats. Adjust the code to use this union.
Signed-off-by: Ira Weiny <[email protected]>
---
Changes from RFC:
[iweiny: ensure event union is packed]
---
drivers/cxl/core/mbox.c | 32 +++++++++++++-------------------
drivers/cxl/core/trace.h | 8 ++++----
include/linux/cxl-event.h | 23 +++++++++++++++++------
tools/testing/cxl/test/mem.c | 31 ++++++++++++++++++-------------
4 files changed, 52 insertions(+), 42 deletions(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index f4d82e29435d..a67161f8764a 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -858,26 +858,17 @@ static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
enum cxl_event_log_type type,
struct cxl_event_record_raw *record)
{
+ union cxl_event *evt = &record->event;
uuid_t *id = &record->id;
- if (uuid_equal(id, &gen_media_event_uuid)) {
- struct cxl_event_gen_media *rec =
- (struct cxl_event_gen_media *)record;
-
- trace_cxl_general_media(cxlmd, type, id, rec);
- } else if (uuid_equal(id, &dram_event_uuid)) {
- struct cxl_event_dram *rec = (struct cxl_event_dram *)record;
-
- trace_cxl_dram(cxlmd, type, id, rec);
- } else if (uuid_equal(id, &mem_mod_event_uuid)) {
- struct cxl_event_mem_module *rec =
- (struct cxl_event_mem_module *)record;
-
- trace_cxl_memory_module(cxlmd, type, id, rec);
- } else {
- /* For unknown record types print just the header */
- trace_cxl_generic_event(cxlmd, type, id, record);
- }
+ if (uuid_equal(id, &gen_media_event_uuid))
+ trace_cxl_general_media(cxlmd, type, id, &evt->gen_media);
+ else if (uuid_equal(id, &dram_event_uuid))
+ trace_cxl_dram(cxlmd, type, id, &evt->dram);
+ else if (uuid_equal(id, &mem_mod_event_uuid))
+ trace_cxl_memory_module(cxlmd, type, id, &evt->mem_module);
+ else
+ trace_cxl_generic_event(cxlmd, type, id, &evt->generic);
}
static int cxl_clear_event_record(struct cxl_memdev_state *mds,
@@ -920,7 +911,10 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
*/
i = 0;
for (cnt = 0; cnt < total; cnt++) {
- payload->handles[i++] = get_pl->records[cnt].hdr.handle;
+ struct cxl_event_record_raw *raw = &get_pl->records[cnt];
+ struct cxl_event_generic *gen = &raw->event.generic;
+
+ payload->handles[i++] = gen->hdr.handle;
dev_dbg(mds->cxlds.dev, "Event log '%d': Clearing %u\n", log,
le16_to_cpu(payload->handles[i]));
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 68973a101a75..3e09f2f2d7df 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -225,9 +225,9 @@ TRACE_EVENT(cxl_overflow,
TRACE_EVENT(cxl_generic_event,
TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- const uuid_t *uuid, struct cxl_event_record_raw *rec),
+ const uuid_t *uuid, struct cxl_event_generic *gen_rec),
- TP_ARGS(cxlmd, log, uuid, rec),
+ TP_ARGS(cxlmd, log, uuid, gen_rec),
TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -235,8 +235,8 @@ TRACE_EVENT(cxl_generic_event,
),
TP_fast_assign(
- CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
- memcpy(__entry->data, &rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
+ CXL_EVT_TP_fast_assign(cxlmd, log, uuid, gen_rec->hdr);
+ memcpy(__entry->data, gen_rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
),
CXL_EVT_TP_printk("%s",
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index ebb00ead1496..18dab4d90dc8 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -18,13 +18,8 @@ struct cxl_event_record_hdr {
u8 reserved[15];
} __packed;
-/*
- * Common Event Record Format
- * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
- */
#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
-struct cxl_event_record_raw {
- uuid_t id;
+struct cxl_event_generic {
struct cxl_event_record_hdr hdr;
u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
} __packed;
@@ -97,4 +92,20 @@ struct cxl_event_mem_module {
u8 reserved[0x3d];
} __packed;
+union cxl_event {
+ struct cxl_event_generic generic;
+ struct cxl_event_gen_media gen_media;
+ struct cxl_event_dram dram;
+ struct cxl_event_mem_module mem_module;
+} __packed;
+
+/*
+ * Common Event Record Format; in event logs
+ * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
+ */
+struct cxl_event_record_raw {
+ uuid_t id;
+ union cxl_event event;
+} __packed;
+
#endif /* _LINUX_CXL_EVENT_H */
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 9cc2b8ce1efd..35ee41e435ab 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -251,7 +251,8 @@ static int mock_get_event(struct device *dev, struct cxl_mbox_cmd *cmd)
for (i = 0; i < CXL_TEST_EVENT_CNT && !event_log_empty(log); i++) {
memcpy(&pl->records[i], event_get_current(log),
sizeof(pl->records[i]));
- pl->records[i].hdr.handle = event_get_cur_event_handle(log);
+ pl->records[i].event.generic.hdr.handle =
+ event_get_cur_event_handle(log);
log->cur_idx++;
}
@@ -339,25 +340,29 @@ static void cxl_mock_event_trigger(struct device *dev)
struct cxl_event_record_raw maint_needed = {
.id = UUID_INIT(0xBA5EBA11, 0xABCD, 0xEFEB,
0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
- .hdr = {
- .length = sizeof(struct cxl_event_record_raw),
- .flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
- /* .handle = Set dynamically */
- .related_handle = cpu_to_le16(0xa5b6),
+ .event.generic = {
+ .hdr = {
+ .length = sizeof(struct cxl_event_record_raw),
+ .flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0xa5b6),
+ },
+ .data = { 0xDE, 0xAD, 0xBE, 0xEF },
},
- .data = { 0xDE, 0xAD, 0xBE, 0xEF },
};
struct cxl_event_record_raw hardware_replace = {
.id = UUID_INIT(0xABCDEFEB, 0xBA11, 0xBA5E,
0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
- .hdr = {
- .length = sizeof(struct cxl_event_record_raw),
- .flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
- /* .handle = Set dynamically */
- .related_handle = cpu_to_le16(0xb6a5),
+ .event.generic = {
+ .hdr = {
+ .length = sizeof(struct cxl_event_record_raw),
+ .flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0xb6a5),
+ },
+ .data = { 0xDE, 0xAD, 0xBE, 0xEF },
},
- .data = { 0xDE, 0xAD, 0xBE, 0xEF },
};
struct cxl_test_gen_media {
--
2.43.0
If the firmware has configured CXL event support to be firmware first
the OS can process those events through CPER records. The CXL layer has
unique DPA to HPA knowledge and standard event trace parsing in place.
CPER records contain Bus, Device, Function information which can be used
to identify the PCI device which is sending the event.
Change pci driver registration to include registration for a CXL CPER
callback to process the events through the trace subsystem.
Signed-off-by: Ira Weiny <[email protected]>
---
Changes from v1:
[djbw: use single registration function]
---
drivers/cxl/core/mbox.c | 31 ++++++++++++++++++++++++-----
drivers/cxl/cxlmem.h | 6 ++++++
drivers/cxl/pci.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 83 insertions(+), 6 deletions(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index a67161f8764a..da262bbc3519 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -854,9 +854,30 @@ static const uuid_t dram_event_uuid = CXL_EVENT_DRAM_UUID;
*/
static const uuid_t mem_mod_event_uuid = CXL_EVENT_MEM_MODULE_UUID;
-static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
- enum cxl_event_log_type type,
- struct cxl_event_record_raw *record)
+void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
+ enum cxl_event_log_type type,
+ enum cxl_event_type event_type,
+ union cxl_event *event)
+{
+ switch (event_type) {
+ case CXL_CPER_EVENT_GEN_MEDIA:
+ trace_cxl_general_media(cxlmd, type, &gen_media_event_uuid,
+ &event->gen_media);
+ break;
+ case CXL_CPER_EVENT_DRAM:
+ trace_cxl_dram(cxlmd, type, &dram_event_uuid, &event->dram);
+ break;
+ case CXL_CPER_EVENT_MEM_MODULE:
+ trace_cxl_memory_module(cxlmd, type, &mem_mod_event_uuid,
+ &event->mem_module);
+ break;
+ }
+}
+EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, CXL);
+
+static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
+ enum cxl_event_log_type type,
+ struct cxl_event_record_raw *record)
{
union cxl_event *evt = &record->event;
uuid_t *id = &record->id;
@@ -979,8 +1000,8 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
break;
for (i = 0; i < nr_rec; i++)
- cxl_event_trace_record(cxlmd, type,
- &payload->records[i]);
+ __cxl_event_trace_record(cxlmd, type,
+ &payload->records[i]);
if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
trace_cxl_overflow(cxlmd, type, payload);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index e5d770e26e02..7a891b4641cc 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -481,6 +481,8 @@ struct cxl_memdev_state {
struct cxl_security_state security;
struct cxl_fw_state fw;
+ struct notifier_block cxl_cper_nb;
+
struct rcuwait mbox_wait;
int (*mbox_send)(struct cxl_memdev_state *mds,
struct cxl_mbox_cmd *cmd);
@@ -802,6 +804,10 @@ void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
unsigned long *cmds);
void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status);
+void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
+ enum cxl_event_log_type type,
+ enum cxl_event_type event_type,
+ union cxl_event *event);
int cxl_set_timestamp(struct cxl_memdev_state *mds);
int cxl_poison_state_init(struct cxl_memdev_state *mds);
int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 0155fb66b580..30a98399d013 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
+#include <asm-generic/unaligned.h>
#include <linux/io-64-nonatomic-lo-hi.h>
#include <linux/moduleparam.h>
#include <linux/module.h>
@@ -969,6 +970,55 @@ static struct pci_driver cxl_pci_driver = {
},
};
+#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
+static void cxl_cper_event_call(struct cxl_cper_event_data *ev_data)
+{
+ struct cper_cxl_event_devid *device_id = &ev_data->rec->hdr.device_id;
+ struct cxl_dev_state *cxlds = NULL;
+ enum cxl_event_log_type log_type;
+ struct pci_dev *pdev;
+ unsigned int devfn;
+ u32 hdr_flags;
+
+ devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
+ pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
+ device_id->bus_num, devfn);
+
+ guard(mutex)(&pdev->dev.mutex);
+ if (pdev->driver == &cxl_pci_driver)
+ cxlds = pci_get_drvdata(pdev);
+ if (!cxlds)
+ goto out;
+
+ /* Fabricate a log type */
+ hdr_flags = get_unaligned_le24(ev_data->rec->event.generic.hdr.flags);
+ log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
+
+ cxl_event_trace_record(cxlds->cxlmd, log_type, ev_data->event_type,
+ &ev_data->rec->event);
+out:
+ pci_dev_put(pdev);
+}
+
+static int __init cxl_pci_driver_init(void)
+{
+ int rc;
+
+ rc = pci_register_driver(&cxl_pci_driver);
+ if (rc)
+ return rc;
+
+ cxl_cper_register_notifier(cxl_cper_event_call);
+ return 0;
+}
+
+static void __exit cxl_pci_driver_exit(void)
+{
+ cxl_cper_unregister_notifier();
+ pci_unregister_driver(&cxl_pci_driver);
+}
+
+module_init(cxl_pci_driver_init);
+module_exit(cxl_pci_driver_exit);
MODULE_LICENSE("GPL v2");
-module_pci_driver(cxl_pci_driver);
MODULE_IMPORT_NS(CXL);
--
2.43.0
BIOS can configure memory devices as firmware first. This will send CXL
events to the firmware instead of the OS. The firmware can then send
these events to the OS via UEFI.
UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER)
format for CXL Component Events. The format is mostly the same as the
CXL Common Event Record Format. The difference is a GUID is used in
the Section Type to identify the event type.
Add EFI support to detect CXL CPER records and call a notifier chain
with the record data blobs to be processed by the CXL code.
Signed-off-by: Ira Weiny <[email protected]>
---
Changes from v1:
[djbw: convert to single notifier callback]
[djbw: append _GUID to guid defines]
[iweiny: clean up function names]
---
drivers/firmware/efi/cper.c | 15 ++++++++++++
drivers/firmware/efi/cper_cxl.c | 45 ++++++++++++++++++++++++++++++++++++
drivers/firmware/efi/cper_cxl.h | 29 +++++++++++++++++++++++
include/linux/cxl-event.h | 51 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 140 insertions(+)
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 35c37f667781..39c65733ae9b 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -22,6 +22,7 @@
#include <linux/aer.h>
#include <linux/printk.h>
#include <linux/bcd.h>
+#include <linux/cxl-event.h>
#include <acpi/ghes.h>
#include <ras/ras_event.h>
#include "cper_cxl.h"
@@ -607,6 +608,20 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
cper_print_prot_err(newpfx, prot_err);
else
goto err_section_too_small;
+ } else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID) ||
+ guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID) ||
+ guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE_GUID)) {
+ struct cper_cxl_event_rec *rec = acpi_hest_get_payload(gdata);
+
+ if (rec->hdr.length <= sizeof(rec->hdr))
+ goto err_section_too_small;
+
+ if (rec->hdr.length > sizeof(*rec)) {
+ pr_err(FW_WARN "error section length is too big\n");
+ return;
+ }
+
+ cxl_cper_post_event(newpfx, sec_type, rec);
} else {
const void *err = acpi_hest_get_payload(gdata);
diff --git a/drivers/firmware/efi/cper_cxl.c b/drivers/firmware/efi/cper_cxl.c
index a55771b99a97..669983f7956f 100644
--- a/drivers/firmware/efi/cper_cxl.c
+++ b/drivers/firmware/efi/cper_cxl.c
@@ -8,6 +8,7 @@
*/
#include <linux/cper.h>
+#include <linux/cxl-event.h>
#include "cper_cxl.h"
#define PROT_ERR_VALID_AGENT_TYPE BIT_ULL(0)
@@ -187,3 +188,47 @@ void cper_print_prot_err(const char *pfx, const struct cper_sec_prot_err *prot_e
sizeof(cxl_ras->header_log), 0);
}
}
+
+DECLARE_RWSEM(cxl_cper_rw_sem);
+static cxl_cper_notifier cper_notifier;
+
+void cxl_cper_post_event(const char *pfx, guid_t *sec_type,
+ struct cper_cxl_event_rec *rec)
+{
+ struct cxl_cper_event_data data = {
+ .rec = rec,
+ };
+
+ if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) {
+ pr_err(FW_WARN "cxl event no Component Event Log present\n");
+ return;
+ }
+
+ if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID))
+ data.event_type = CXL_CPER_EVENT_GEN_MEDIA;
+ else if (guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID))
+ data.event_type = CXL_CPER_EVENT_DRAM;
+ else if (guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE_GUID))
+ data.event_type = CXL_CPER_EVENT_MEM_MODULE;
+
+ down_read(&cxl_cper_rw_sem);
+ if (cper_notifier)
+ cper_notifier(&data);
+ up_read(&cxl_cper_rw_sem);
+}
+
+void cxl_cper_register_notifier(cxl_cper_notifier notifier)
+{
+ down_write(&cxl_cper_rw_sem);
+ cper_notifier = notifier;
+ up_write(&cxl_cper_rw_sem);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_cper_register_notifier, CXL);
+
+void cxl_cper_unregister_notifier(void)
+{
+ down_write(&cxl_cper_rw_sem);
+ cper_notifier = NULL;
+ up_write(&cxl_cper_rw_sem);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_notifier, CXL);
diff --git a/drivers/firmware/efi/cper_cxl.h b/drivers/firmware/efi/cper_cxl.h
index 86bfcf7909ec..b1b1b0514f6b 100644
--- a/drivers/firmware/efi/cper_cxl.h
+++ b/drivers/firmware/efi/cper_cxl.h
@@ -10,11 +10,38 @@
#ifndef LINUX_CPER_CXL_H
#define LINUX_CPER_CXL_H
+#include <linux/cxl-event.h>
+
/* CXL Protocol Error Section */
#define CPER_SEC_CXL_PROT_ERR \
GUID_INIT(0x80B9EFB4, 0x52B5, 0x4DE3, 0xA7, 0x77, 0x68, 0x78, \
0x4B, 0x77, 0x10, 0x48)
+/* CXL Event record UUIDs are formated at GUIDs and reported in section type */
+/*
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CPER_SEC_CXL_GEN_MEDIA_GUID \
+ GUID_INIT(0xfbcd0a77, 0xc260, 0x417f, \
+ 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6)
+
+/*
+ * DRAM Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+#define CPER_SEC_CXL_DRAM_GUID \
+ GUID_INIT(0x601dcbb3, 0x9c06, 0x4eab, \
+ 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24)
+
+/*
+ * Memory Module Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+#define CPER_SEC_CXL_MEM_MODULE_GUID \
+ GUID_INIT(0xfe927475, 0xdd59, 0x4339, \
+ 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)
+
#pragma pack(1)
/* Compute Express Link Protocol Error Section, UEFI v2.10 sec N.2.13 */
@@ -62,5 +89,7 @@ struct cper_sec_prot_err {
#pragma pack()
void cper_print_prot_err(const char *pfx, const struct cper_sec_prot_err *prot_err);
+void cxl_cper_post_event(const char *pfx, guid_t *sec_type,
+ struct cper_cxl_event_rec *rec);
#endif //__CPER_CXL_
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index 18dab4d90dc8..c764ff877a6d 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -108,4 +108,55 @@ struct cxl_event_record_raw {
union cxl_event event;
} __packed;
+enum cxl_event_type {
+ CXL_CPER_EVENT_GEN_MEDIA,
+ CXL_CPER_EVENT_DRAM,
+ CXL_CPER_EVENT_MEM_MODULE,
+};
+
+#pragma pack(1)
+
+#define CPER_CXL_DEVICE_ID_VALID BIT(0)
+#define CPER_CXL_DEVICE_SN_VALID BIT(1)
+#define CPER_CXL_COMP_EVENT_LOG_VALID BIT(2)
+struct cper_cxl_event_rec {
+ struct {
+ u32 length;
+ u64 validation_bits;
+ struct cper_cxl_event_devid {
+ u16 vendor_id;
+ u16 device_id;
+ u8 func_num;
+ u8 device_num;
+ u8 bus_num;
+ u16 segment_num;
+ u16 slot_num; /* bits 2:0 reserved */
+ u8 reserved;
+ } device_id;
+ struct cper_cxl_event_sn {
+ u32 lower_dw;
+ u32 upper_dw;
+ } dev_serial_num;
+ } hdr;
+
+ union cxl_event event;
+};
+
+struct cxl_cper_event_data {
+ enum cxl_event_type event_type;
+ struct cper_cxl_event_rec *rec;
+};
+
+#pragma pack()
+
+typedef void (*cxl_cper_notifier)(struct cxl_cper_event_data *ev_data);
+
+#ifdef CONFIG_UEFI_CPER
+void cxl_cper_register_notifier(cxl_cper_notifier notifier);
+void cxl_cper_unregister_notifier(void);
+#else
+static inline void cxl_cper_register_notifier(cxl_cper_notifier notifier) { }
+static inline void cxl_cper_unregister_notifier(void) { }
+#endif
+
#endif /* _LINUX_CXL_EVENT_H */
--
2.43.0
On Mon, 11 Dec 2023 at 23:57, Ira Weiny <[email protected]> wrote:
>
> BIOS can configure memory devices as firmware first. This will send CXL
> events to the firmware instead of the OS. The firmware can then send
> these events to the OS via UEFI.
>
> UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER)
> format for CXL Component Events. The format is mostly the same as the
> CXL Common Event Record Format. The difference is a GUID is used in
> the Section Type to identify the event type.
>
> Add EFI support to detect CXL CPER records and call a notifier chain
> with the record data blobs to be processed by the CXL code.
>
> Signed-off-by: Ira Weiny <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
>
> ---
> Changes from v1:
> [djbw: convert to single notifier callback]
> [djbw: append _GUID to guid defines]
> [iweiny: clean up function names]
> ---
> drivers/firmware/efi/cper.c | 15 ++++++++++++
> drivers/firmware/efi/cper_cxl.c | 45 ++++++++++++++++++++++++++++++++++++
> drivers/firmware/efi/cper_cxl.h | 29 +++++++++++++++++++++++
> include/linux/cxl-event.h | 51 +++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 140 insertions(+)
>
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index 35c37f667781..39c65733ae9b 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -22,6 +22,7 @@
> #include <linux/aer.h>
> #include <linux/printk.h>
> #include <linux/bcd.h>
> +#include <linux/cxl-event.h>
> #include <acpi/ghes.h>
> #include <ras/ras_event.h>
> #include "cper_cxl.h"
> @@ -607,6 +608,20 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
> cper_print_prot_err(newpfx, prot_err);
> else
> goto err_section_too_small;
> + } else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID) ||
> + guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID) ||
> + guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE_GUID)) {
> + struct cper_cxl_event_rec *rec = acpi_hest_get_payload(gdata);
> +
> + if (rec->hdr.length <= sizeof(rec->hdr))
> + goto err_section_too_small;
> +
> + if (rec->hdr.length > sizeof(*rec)) {
> + pr_err(FW_WARN "error section length is too big\n");
> + return;
> + }
> +
> + cxl_cper_post_event(newpfx, sec_type, rec);
> } else {
> const void *err = acpi_hest_get_payload(gdata);
>
> diff --git a/drivers/firmware/efi/cper_cxl.c b/drivers/firmware/efi/cper_cxl.c
> index a55771b99a97..669983f7956f 100644
> --- a/drivers/firmware/efi/cper_cxl.c
> +++ b/drivers/firmware/efi/cper_cxl.c
> @@ -8,6 +8,7 @@
> */
>
> #include <linux/cper.h>
> +#include <linux/cxl-event.h>
> #include "cper_cxl.h"
>
> #define PROT_ERR_VALID_AGENT_TYPE BIT_ULL(0)
> @@ -187,3 +188,47 @@ void cper_print_prot_err(const char *pfx, const struct cper_sec_prot_err *prot_e
> sizeof(cxl_ras->header_log), 0);
> }
> }
> +
> +DECLARE_RWSEM(cxl_cper_rw_sem);
> +static cxl_cper_notifier cper_notifier;
> +
> +void cxl_cper_post_event(const char *pfx, guid_t *sec_type,
> + struct cper_cxl_event_rec *rec)
> +{
> + struct cxl_cper_event_data data = {
> + .rec = rec,
> + };
> +
> + if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) {
> + pr_err(FW_WARN "cxl event no Component Event Log present\n");
> + return;
> + }
> +
> + if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID))
> + data.event_type = CXL_CPER_EVENT_GEN_MEDIA;
> + else if (guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID))
> + data.event_type = CXL_CPER_EVENT_DRAM;
> + else if (guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE_GUID))
> + data.event_type = CXL_CPER_EVENT_MEM_MODULE;
> +
> + down_read(&cxl_cper_rw_sem);
> + if (cper_notifier)
> + cper_notifier(&data);
> + up_read(&cxl_cper_rw_sem);
> +}
> +
> +void cxl_cper_register_notifier(cxl_cper_notifier notifier)
> +{
> + down_write(&cxl_cper_rw_sem);
> + cper_notifier = notifier;
> + up_write(&cxl_cper_rw_sem);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_register_notifier, CXL);
> +
> +void cxl_cper_unregister_notifier(void)
> +{
> + down_write(&cxl_cper_rw_sem);
> + cper_notifier = NULL;
> + up_write(&cxl_cper_rw_sem);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_notifier, CXL);
> diff --git a/drivers/firmware/efi/cper_cxl.h b/drivers/firmware/efi/cper_cxl.h
> index 86bfcf7909ec..b1b1b0514f6b 100644
> --- a/drivers/firmware/efi/cper_cxl.h
> +++ b/drivers/firmware/efi/cper_cxl.h
> @@ -10,11 +10,38 @@
> #ifndef LINUX_CPER_CXL_H
> #define LINUX_CPER_CXL_H
>
> +#include <linux/cxl-event.h>
> +
> /* CXL Protocol Error Section */
> #define CPER_SEC_CXL_PROT_ERR \
> GUID_INIT(0x80B9EFB4, 0x52B5, 0x4DE3, 0xA7, 0x77, 0x68, 0x78, \
> 0x4B, 0x77, 0x10, 0x48)
>
> +/* CXL Event record UUIDs are formated at GUIDs and reported in section type */
> +/*
> + * General Media Event Record
> + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
> + */
> +#define CPER_SEC_CXL_GEN_MEDIA_GUID \
> + GUID_INIT(0xfbcd0a77, 0xc260, 0x417f, \
> + 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6)
> +
> +/*
> + * DRAM Event Record
> + * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
> + */
> +#define CPER_SEC_CXL_DRAM_GUID \
> + GUID_INIT(0x601dcbb3, 0x9c06, 0x4eab, \
> + 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24)
> +
> +/*
> + * Memory Module Event Record
> + * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
> + */
> +#define CPER_SEC_CXL_MEM_MODULE_GUID \
> + GUID_INIT(0xfe927475, 0xdd59, 0x4339, \
> + 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)
> +
> #pragma pack(1)
>
> /* Compute Express Link Protocol Error Section, UEFI v2.10 sec N.2.13 */
> @@ -62,5 +89,7 @@ struct cper_sec_prot_err {
> #pragma pack()
>
> void cper_print_prot_err(const char *pfx, const struct cper_sec_prot_err *prot_err);
> +void cxl_cper_post_event(const char *pfx, guid_t *sec_type,
> + struct cper_cxl_event_rec *rec);
>
> #endif //__CPER_CXL_
> diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
> index 18dab4d90dc8..c764ff877a6d 100644
> --- a/include/linux/cxl-event.h
> +++ b/include/linux/cxl-event.h
> @@ -108,4 +108,55 @@ struct cxl_event_record_raw {
> union cxl_event event;
> } __packed;
>
> +enum cxl_event_type {
> + CXL_CPER_EVENT_GEN_MEDIA,
> + CXL_CPER_EVENT_DRAM,
> + CXL_CPER_EVENT_MEM_MODULE,
> +};
> +
> +#pragma pack(1)
> +
> +#define CPER_CXL_DEVICE_ID_VALID BIT(0)
> +#define CPER_CXL_DEVICE_SN_VALID BIT(1)
> +#define CPER_CXL_COMP_EVENT_LOG_VALID BIT(2)
> +struct cper_cxl_event_rec {
> + struct {
> + u32 length;
> + u64 validation_bits;
> + struct cper_cxl_event_devid {
> + u16 vendor_id;
> + u16 device_id;
> + u8 func_num;
> + u8 device_num;
> + u8 bus_num;
> + u16 segment_num;
> + u16 slot_num; /* bits 2:0 reserved */
> + u8 reserved;
> + } device_id;
> + struct cper_cxl_event_sn {
> + u32 lower_dw;
> + u32 upper_dw;
> + } dev_serial_num;
> + } hdr;
> +
> + union cxl_event event;
> +};
> +
> +struct cxl_cper_event_data {
> + enum cxl_event_type event_type;
> + struct cper_cxl_event_rec *rec;
> +};
> +
> +#pragma pack()
> +
> +typedef void (*cxl_cper_notifier)(struct cxl_cper_event_data *ev_data);
> +
> +#ifdef CONFIG_UEFI_CPER
> +void cxl_cper_register_notifier(cxl_cper_notifier notifier);
> +void cxl_cper_unregister_notifier(void);
> +#else
> +static inline void cxl_cper_register_notifier(cxl_cper_notifier notifier) { }
> +static inline void cxl_cper_unregister_notifier(void) { }
> +#endif
> +
> #endif /* _LINUX_CXL_EVENT_H */
>
> --
> 2.43.0
>
Ira Weiny wrote:
> BIOS can configure memory devices as firmware first. This will send CXL
> events to the firmware instead of the OS. The firmware can then send
> these events to the OS via UEFI.
>
> UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER)
> format for CXL Component Events. The format is mostly the same as the
> CXL Common Event Record Format. The difference is a GUID is used in
> the Section Type to identify the event type.
>
> Add EFI support to detect CXL CPER records and call a notifier chain
> with the record data blobs to be processed by the CXL code.
It is no longer a notifier chain in this version. I wouldn't even call
it a notifier, it's just a registered callback.
>
> Signed-off-by: Ira Weiny <[email protected]>
>
> ---
> Changes from v1:
> [djbw: convert to single notifier callback]
> [djbw: append _GUID to guid defines]
> [iweiny: clean up function names]
> ---
> drivers/firmware/efi/cper.c | 15 ++++++++++++
> drivers/firmware/efi/cper_cxl.c | 45 ++++++++++++++++++++++++++++++++++++
> drivers/firmware/efi/cper_cxl.h | 29 +++++++++++++++++++++++
> include/linux/cxl-event.h | 51 +++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 140 insertions(+)
>
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index 35c37f667781..39c65733ae9b 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -22,6 +22,7 @@
> #include <linux/aer.h>
> #include <linux/printk.h>
> #include <linux/bcd.h>
> +#include <linux/cxl-event.h>
> #include <acpi/ghes.h>
> #include <ras/ras_event.h>
> #include "cper_cxl.h"
> @@ -607,6 +608,20 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
> cper_print_prot_err(newpfx, prot_err);
> else
> goto err_section_too_small;
> + } else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID) ||
> + guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID) ||
> + guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE_GUID)) {
> + struct cper_cxl_event_rec *rec = acpi_hest_get_payload(gdata);
> +
> + if (rec->hdr.length <= sizeof(rec->hdr))
> + goto err_section_too_small;
> +
> + if (rec->hdr.length > sizeof(*rec)) {
> + pr_err(FW_WARN "error section length is too big\n");
> + return;
> + }
> +
> + cxl_cper_post_event(newpfx, sec_type, rec);
> } else {
> const void *err = acpi_hest_get_payload(gdata);
>
> diff --git a/drivers/firmware/efi/cper_cxl.c b/drivers/firmware/efi/cper_cxl.c
> index a55771b99a97..669983f7956f 100644
> --- a/drivers/firmware/efi/cper_cxl.c
> +++ b/drivers/firmware/efi/cper_cxl.c
> @@ -8,6 +8,7 @@
> */
>
> #include <linux/cper.h>
> +#include <linux/cxl-event.h>
> #include "cper_cxl.h"
>
> #define PROT_ERR_VALID_AGENT_TYPE BIT_ULL(0)
> @@ -187,3 +188,47 @@ void cper_print_prot_err(const char *pfx, const struct cper_sec_prot_err *prot_e
> sizeof(cxl_ras->header_log), 0);
> }
> }
> +
> +DECLARE_RWSEM(cxl_cper_rw_sem);
> +static cxl_cper_notifier cper_notifier;
> +
> +void cxl_cper_post_event(const char *pfx, guid_t *sec_type,
> + struct cper_cxl_event_rec *rec)
> +{
> + struct cxl_cper_event_data data = {
> + .rec = rec,
> + };
> +
> + if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) {
> + pr_err(FW_WARN "cxl event no Component Event Log present\n");
> + return;
> + }
> +
> + if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID))
> + data.event_type = CXL_CPER_EVENT_GEN_MEDIA;
> + else if (guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID))
> + data.event_type = CXL_CPER_EVENT_DRAM;
> + else if (guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE_GUID))
> + data.event_type = CXL_CPER_EVENT_MEM_MODULE;
> +
> + down_read(&cxl_cper_rw_sem);
guard(rwsem_read)(&cxl_cper_rw_sem);
> + if (cper_notifier)
> + cper_notifier(&data);
> + up_read(&cxl_cper_rw_sem);
> +}
> +
> +void cxl_cper_register_notifier(cxl_cper_notifier notifier)
> +{
> + down_write(&cxl_cper_rw_sem);
guard(rwsem_write)(&cxl_cper_rw_sem);
> + cper_notifier = notifier;
I would enforce that there is only one registrant and explicitly fail
attempts to assign more than one.
> + up_write(&cxl_cper_rw_sem);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_register_notifier, CXL);
> +
> +void cxl_cper_unregister_notifier(void)
> +{
> + down_write(&cxl_cper_rw_sem);
guard(rwsem_write)(&cxl_cper_rw_sem);
> + cper_notifier = NULL;
This could enforce that the same callback specified at registration time
must also be passed at unregistration to disallow anonymous
unregistration.
Makes the code self documenting that the registrant is a singleton, and
that unregistration must precede the next registration.
> + up_write(&cxl_cper_rw_sem);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_notifier, CXL);
> diff --git a/drivers/firmware/efi/cper_cxl.h b/drivers/firmware/efi/cper_cxl.h
> index 86bfcf7909ec..b1b1b0514f6b 100644
> --- a/drivers/firmware/efi/cper_cxl.h
> +++ b/drivers/firmware/efi/cper_cxl.h
> @@ -10,11 +10,38 @@
> #ifndef LINUX_CPER_CXL_H
> #define LINUX_CPER_CXL_H
>
> +#include <linux/cxl-event.h>
> +
> /* CXL Protocol Error Section */
> #define CPER_SEC_CXL_PROT_ERR \
> GUID_INIT(0x80B9EFB4, 0x52B5, 0x4DE3, 0xA7, 0x77, 0x68, 0x78, \
> 0x4B, 0x77, 0x10, 0x48)
>
> +/* CXL Event record UUIDs are formated at GUIDs and reported in section type */
> +/*
> + * General Media Event Record
> + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
> + */
> +#define CPER_SEC_CXL_GEN_MEDIA_GUID \
> + GUID_INIT(0xfbcd0a77, 0xc260, 0x417f, \
> + 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6)
> +
> +/*
> + * DRAM Event Record
> + * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
> + */
> +#define CPER_SEC_CXL_DRAM_GUID \
> + GUID_INIT(0x601dcbb3, 0x9c06, 0x4eab, \
> + 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24)
> +
> +/*
> + * Memory Module Event Record
> + * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
> + */
> +#define CPER_SEC_CXL_MEM_MODULE_GUID \
> + GUID_INIT(0xfe927475, 0xdd59, 0x4339, \
> + 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)
> +
> #pragma pack(1)
>
> /* Compute Express Link Protocol Error Section, UEFI v2.10 sec N.2.13 */
> @@ -62,5 +89,7 @@ struct cper_sec_prot_err {
> #pragma pack()
>
> void cper_print_prot_err(const char *pfx, const struct cper_sec_prot_err *prot_err);
> +void cxl_cper_post_event(const char *pfx, guid_t *sec_type,
> + struct cper_cxl_event_rec *rec);
>
> #endif //__CPER_CXL_
> diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
> index 18dab4d90dc8..c764ff877a6d 100644
> --- a/include/linux/cxl-event.h
> +++ b/include/linux/cxl-event.h
> @@ -108,4 +108,55 @@ struct cxl_event_record_raw {
> union cxl_event event;
> } __packed;
>
> +enum cxl_event_type {
> + CXL_CPER_EVENT_GEN_MEDIA,
> + CXL_CPER_EVENT_DRAM,
> + CXL_CPER_EVENT_MEM_MODULE,
> +};
> +
> +#pragma pack(1)
Looks like there is usage of __packed a few lines up, just pick
one-style. Prefer __packed vs #pragma when only a small handful of
structures need annotation as that is easier to check for correctness in
patch form.
> +
> +#define CPER_CXL_DEVICE_ID_VALID BIT(0)
> +#define CPER_CXL_DEVICE_SN_VALID BIT(1)
> +#define CPER_CXL_COMP_EVENT_LOG_VALID BIT(2)
> +struct cper_cxl_event_rec {
> + struct {
> + u32 length;
> + u64 validation_bits;
> + struct cper_cxl_event_devid {
> + u16 vendor_id;
> + u16 device_id;
> + u8 func_num;
> + u8 device_num;
> + u8 bus_num;
> + u16 segment_num;
> + u16 slot_num; /* bits 2:0 reserved */
> + u8 reserved;
> + } device_id;
> + struct cper_cxl_event_sn {
> + u32 lower_dw;
> + u32 upper_dw;
> + } dev_serial_num;
> + } hdr;
> +
> + union cxl_event event;
> +};
> +
> +struct cxl_cper_event_data {
> + enum cxl_event_type event_type;
> + struct cper_cxl_event_rec *rec;
> +};
> +
> +#pragma pack()
> +
> +typedef void (*cxl_cper_notifier)(struct cxl_cper_event_data *ev_data);
> +
> +#ifdef CONFIG_UEFI_CPER
> +void cxl_cper_register_notifier(cxl_cper_notifier notifier);
> +void cxl_cper_unregister_notifier(void);
> +#else
> +static inline void cxl_cper_register_notifier(cxl_cper_notifier notifier) { }
> +static inline void cxl_cper_unregister_notifier(void) { }
> +#endif
> +
> #endif /* _LINUX_CXL_EVENT_H */
>
> --
> 2.43.0
>
Ira Weiny wrote:
> If the firmware has configured CXL event support to be firmware first
> the OS can process those events through CPER records. The CXL layer has
> unique DPA to HPA knowledge and standard event trace parsing in place.
>
> CPER records contain Bus, Device, Function information which can be used
> to identify the PCI device which is sending the event.
>
> Change pci driver registration to include registration for a CXL CPER
> callback to process the events through the trace subsystem.
>
> Signed-off-by: Ira Weiny <[email protected]>
>
> ---
> Changes from v1:
> [djbw: use single registration function]
> ---
> drivers/cxl/core/mbox.c | 31 ++++++++++++++++++++++++-----
> drivers/cxl/cxlmem.h | 6 ++++++
> drivers/cxl/pci.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++-
> 3 files changed, 83 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index a67161f8764a..da262bbc3519 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -854,9 +854,30 @@ static const uuid_t dram_event_uuid = CXL_EVENT_DRAM_UUID;
> */
> static const uuid_t mem_mod_event_uuid = CXL_EVENT_MEM_MODULE_UUID;
>
> -static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> - enum cxl_event_log_type type,
> - struct cxl_event_record_raw *record)
> +void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> + enum cxl_event_log_type type,
> + enum cxl_event_type event_type,
> + union cxl_event *event)
> +{
> + switch (event_type) {
> + case CXL_CPER_EVENT_GEN_MEDIA:
> + trace_cxl_general_media(cxlmd, type, &gen_media_event_uuid,
> + &event->gen_media);
> + break;
> + case CXL_CPER_EVENT_DRAM:
> + trace_cxl_dram(cxlmd, type, &dram_event_uuid, &event->dram);
> + break;
> + case CXL_CPER_EVENT_MEM_MODULE:
> + trace_cxl_memory_module(cxlmd, type, &mem_mod_event_uuid,
> + &event->mem_module);
> + break;
> + }
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, CXL);
> +
> +static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> + enum cxl_event_log_type type,
> + struct cxl_event_record_raw *record)
> {
> union cxl_event *evt = &record->event;
> uuid_t *id = &record->id;
> @@ -979,8 +1000,8 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> break;
>
> for (i = 0; i < nr_rec; i++)
> - cxl_event_trace_record(cxlmd, type,
> - &payload->records[i]);
> + __cxl_event_trace_record(cxlmd, type,
> + &payload->records[i]);
>
> if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
> trace_cxl_overflow(cxlmd, type, payload);
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index e5d770e26e02..7a891b4641cc 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -481,6 +481,8 @@ struct cxl_memdev_state {
> struct cxl_security_state security;
> struct cxl_fw_state fw;
>
> + struct notifier_block cxl_cper_nb;
> +
> struct rcuwait mbox_wait;
> int (*mbox_send)(struct cxl_memdev_state *mds,
> struct cxl_mbox_cmd *cmd);
> @@ -802,6 +804,10 @@ void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
> void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
> unsigned long *cmds);
> void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status);
> +void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> + enum cxl_event_log_type type,
> + enum cxl_event_type event_type,
> + union cxl_event *event);
> int cxl_set_timestamp(struct cxl_memdev_state *mds);
> int cxl_poison_state_init(struct cxl_memdev_state *mds);
> int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 0155fb66b580..30a98399d013 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1,5 +1,6 @@
> // SPDX-License-Identifier: GPL-2.0-only
> /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> +#include <asm-generic/unaligned.h>
> #include <linux/io-64-nonatomic-lo-hi.h>
> #include <linux/moduleparam.h>
> #include <linux/module.h>
> @@ -969,6 +970,55 @@ static struct pci_driver cxl_pci_driver = {
> },
> };
>
> +#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
> +static void cxl_cper_event_call(struct cxl_cper_event_data *ev_data)
> +{
> + struct cper_cxl_event_devid *device_id = &ev_data->rec->hdr.device_id;
> + struct cxl_dev_state *cxlds = NULL;
> + enum cxl_event_log_type log_type;
> + struct pci_dev *pdev;
> + unsigned int devfn;
> + u32 hdr_flags;
> +
> + devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
> + pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
> + device_id->bus_num, devfn);
What if pci_get_domain_bus_and_slot() returned NULL?
> +
> + guard(mutex)(&pdev->dev.mutex);
Lets not open code this since device_lock() is so prevalent it deserves
its own guard() type:
DEFINE_GUARD(device, struct device *, device_lock(_T), device_unlock(_T))
> + if (pdev->driver == &cxl_pci_driver)
> + cxlds = pci_get_drvdata(pdev);
> + if (!cxlds)
> + goto out;
Lets not mix usage of cleanup.h helpers with usage of goto. The helpers
are there to eliminate goto errors. Just add a new helper:
DEFINE_FREE(pci_dev_put, struct pci_dev *, if (_T) pci_dev_put(_T))
...and declare @pdev as:
struct pci_dev *pdev __free(pci_dev_put) = NULL;
> +
> + /* Fabricate a log type */
> + hdr_flags = get_unaligned_le24(ev_data->rec->event.generic.hdr.flags);
> + log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
> +
> + cxl_event_trace_record(cxlds->cxlmd, log_type, ev_data->event_type,
> + &ev_data->rec->event);
> +out:
> + pci_dev_put(pdev);
> +}
> +
> +static int __init cxl_pci_driver_init(void)
> +{
> + int rc;
> +
> + rc = pci_register_driver(&cxl_pci_driver);
> + if (rc)
> + return rc;
> +
> + cxl_cper_register_notifier(cxl_cper_event_call);
> + return 0;
> +}
> +
> +static void __exit cxl_pci_driver_exit(void)
> +{
> + cxl_cper_unregister_notifier();
> + pci_unregister_driver(&cxl_pci_driver);
> +}
> +
> +module_init(cxl_pci_driver_init);
> +module_exit(cxl_pci_driver_exit);
> MODULE_LICENSE("GPL v2");
> -module_pci_driver(cxl_pci_driver);
> MODULE_IMPORT_NS(CXL);
>
> --
> 2.43.0
>
Dan Williams wrote:
> Ira Weiny wrote:
> > BIOS can configure memory devices as firmware first. This will send CXL
> > events to the firmware instead of the OS. The firmware can then send
> > these events to the OS via UEFI.
> >
> > UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER)
> > format for CXL Component Events. The format is mostly the same as the
> > CXL Common Event Record Format. The difference is a GUID is used in
> > the Section Type to identify the event type.
> >
> > Add EFI support to detect CXL CPER records and call a notifier chain
> > with the record data blobs to be processed by the CXL code.
>
> It is no longer a notifier chain in this version. I wouldn't even call
> it a notifier, it's just a registered callback.
Ah yea I missed that in my rework sorry.
[snip]
> > +DECLARE_RWSEM(cxl_cper_rw_sem);
> > +static cxl_cper_notifier cper_notifier;
> > +
> > +void cxl_cper_post_event(const char *pfx, guid_t *sec_type,
> > + struct cper_cxl_event_rec *rec)
> > +{
> > + struct cxl_cper_event_data data = {
> > + .rec = rec,
> > + };
> > +
> > + if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) {
> > + pr_err(FW_WARN "cxl event no Component Event Log present\n");
> > + return;
> > + }
> > +
> > + if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID))
> > + data.event_type = CXL_CPER_EVENT_GEN_MEDIA;
> > + else if (guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID))
> > + data.event_type = CXL_CPER_EVENT_DRAM;
> > + else if (guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE_GUID))
> > + data.event_type = CXL_CPER_EVENT_MEM_MODULE;
> > +
> > + down_read(&cxl_cper_rw_sem);
>
> guard(rwsem_read)(&cxl_cper_rw_sem);
Much better. Done throughout.
>
> > + if (cper_notifier)
> > + cper_notifier(&data);
> > + up_read(&cxl_cper_rw_sem);
> > +}
> > +
> > +void cxl_cper_register_notifier(cxl_cper_notifier notifier)
> > +{
> > + down_write(&cxl_cper_rw_sem);
>
> guard(rwsem_write)(&cxl_cper_rw_sem);
>
> > + cper_notifier = notifier;
>
> I would enforce that there is only one registrant and explicitly fail
> attempts to assign more than one.
Done.
>
> > + up_write(&cxl_cper_rw_sem);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cxl_cper_register_notifier, CXL);
> > +
> > +void cxl_cper_unregister_notifier(void)
> > +{
> > + down_write(&cxl_cper_rw_sem);
>
> guard(rwsem_write)(&cxl_cper_rw_sem);
>
> > + cper_notifier = NULL;
>
> This could enforce that the same callback specified at registration time
> must also be passed at unregistration to disallow anonymous
> unregistration.
>
> Makes the code self documenting that the registrant is a singleton, and
> that unregistration must precede the next registration.
But what do we do if it does not match? Returning an error will be
ignored by the cxl_pci_driver_exit() and if we enforce the singleton in
the registration I don't see a lot of room for error here.
[snip]
> > diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
> > index 18dab4d90dc8..c764ff877a6d 100644
> > --- a/include/linux/cxl-event.h
> > +++ b/include/linux/cxl-event.h
> > @@ -108,4 +108,55 @@ struct cxl_event_record_raw {
> > union cxl_event event;
> > } __packed;
> >
> > +enum cxl_event_type {
> > + CXL_CPER_EVENT_GEN_MEDIA,
> > + CXL_CPER_EVENT_DRAM,
> > + CXL_CPER_EVENT_MEM_MODULE,
> > +};
> > +
> > +#pragma pack(1)
>
> Looks like there is usage of __packed a few lines up, just pick
> one-style. Prefer __packed vs #pragma when only a small handful of
> structures need annotation as that is easier to check for correctness in
> patch form.
Ok I'll change it. Smita requested the use of pragma but keeping the
__packed is redundant right now. And I'll go with your preference of
__packed.
Ira
Dan Williams wrote:
> Ira Weiny wrote:
[snip]
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index e5d770e26e02..7a891b4641cc 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -481,6 +481,8 @@ struct cxl_memdev_state {
> > struct cxl_security_state security;
> > struct cxl_fw_state fw;
> >
> > + struct notifier_block cxl_cper_nb;
delete this...
[snip]
> >
> > +#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
> > +static void cxl_cper_event_call(struct cxl_cper_event_data *ev_data)
> > +{
> > + struct cper_cxl_event_devid *device_id = &ev_data->rec->hdr.device_id;
> > + struct cxl_dev_state *cxlds = NULL;
> > + enum cxl_event_log_type log_type;
> > + struct pci_dev *pdev;
> > + unsigned int devfn;
> > + u32 hdr_flags;
> > +
> > + devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
> > + pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
> > + device_id->bus_num, devfn);
>
> What if pci_get_domain_bus_and_slot() returned NULL?
:-/ yep.
>
> > +
> > + guard(mutex)(&pdev->dev.mutex);
>
> Lets not open code this since device_lock() is so prevalent it deserves
> its own guard() type:
>
> DEFINE_GUARD(device, struct device *, device_lock(_T), device_unlock(_T))
Good idea.
>
> > + if (pdev->driver == &cxl_pci_driver)
> > + cxlds = pci_get_drvdata(pdev);
> > + if (!cxlds)
> > + goto out;
>
> Lets not mix usage of cleanup.h helpers with usage of goto. The helpers
> are there to eliminate goto errors. Just add a new helper:
>
> DEFINE_FREE(pci_dev_put, struct pci_dev *, if (_T) pci_dev_put(_T))
>
> ...and declare @pdev as:
>
> struct pci_dev *pdev __free(pci_dev_put) = NULL;
>
Even better idea. Done.
Thanks,
Ira