2023-12-15 23:26:45

by Ira Weiny

[permalink] [raw]
Subject: [PATCH v4 0/7] efi/cxl-cper: Report CPER CXL component events through trace events

Series status/background
========================

Smita has been a great help with this series.

Dan had a better idea for how to register the call between the efi and
cxl subsystems so this is reworked for V2-4.

Dependencies
============

NOTE this patch depends on Dan's addition of a device guard[1].
Therefore, the base commit is not a stable commit. I've pushed a branch
with this commit included for testing if folks are interested.[2]

[1] https://lore.kernel.org/all/170250854466.1522182.17555361077409628655.stgit@dwillia2-xfh.jf.intel.com/
[2] https://github.com/weiny2/linux-kernel/tree/cxl-cper-2023-12-15

Cover letter
============

CXL Component Events, as defined by EFI 2.10 Section N.2.14, wrap a
mostly CXL event payload in an EFI Common Platform Error Record (CPER)
record. If a device is configured for firmware first CXL event records
are not sent directly to the host.

The CXL sub-system uniquely has DPA to HPA translation information. It
also already has event format tracing. Restructure the code to make
sharing the data between CPER/event logs most efficient. Then send the
CXL CPER records to the CXL sub-system for processing.

With event logs the events interrupt the driver directly. In the EFI
case events are wrapped with device information which allows the CXL
subsystem to identify the PCI device.

Previous version considered matching the memdev differently. However,
the most robust was to find the PCI device via Bus, Device, Function and
use the PCI device to find the driver data.

CPER records are identified with GUID's while CXL event logs contain
UUID's. The UUID is reported for all events. While the UUID is
redundant for the known events the UUID's are already used by rasdaemon.
To keep compatibility UUIDs are injected for CPER records based on the
record type.

In addition this series cleans up the UUID defines used between the
event processing and cxl_test code.

Signed-off-by: Ira Weiny <[email protected]>
---
Changes in v4:
- lkp: Fix 0day build issues
[Functionally the same as v3]
- Link to v3: https://lore.kernel.org/r/[email protected]

Changes in v3:
- djbw: use scope based cleanup functions
- iweiny: further clean ups
- Link to v2: https://lore.kernel.org/r/[email protected]

---
Ira Weiny (7):
cxl/trace: Pass uuid explicitly to event traces
cxl/events: Promote CXL event structures to a core header
cxl/events: Create common event UUID defines
cxl/events: Separate UUID from event structures
cxl/events: Create a CXL event union
firmware/efi: Process CXL Component Events
cxl/memdev: Register for and process CPER events

drivers/cxl/core/mbox.c | 93 +++++++++++------------
drivers/cxl/core/trace.h | 30 ++++----
drivers/cxl/cxlmem.h | 110 ++++++---------------------
drivers/cxl/pci.c | 55 +++++++++++++-
drivers/firmware/efi/cper.c | 15 ++++
drivers/firmware/efi/cper_cxl.c | 46 ++++++++++++
drivers/firmware/efi/cper_cxl.h | 29 +++++++
include/linux/cxl-event.h | 161 +++++++++++++++++++++++++++++++++++++++
include/linux/pci.h | 2 +
tools/testing/cxl/test/mem.c | 163 +++++++++++++++++++++++-----------------
10 files changed, 484 insertions(+), 220 deletions(-)
---
base-commit: 6436863dfabce0d7ac416c8dc661fd513b967d39
change-id: 20230601-cxl-cper-26ffc839c6c6

Best regards,
--
Ira Weiny <[email protected]>



2023-12-15 23:26:48

by Ira Weiny

[permalink] [raw]
Subject: [PATCH v4 1/7] cxl/trace: Pass uuid explicitly to event traces

CXL CPER events are identified by the CPER Section Type GUID. The GUID
correlates with the CXL UUID for the event record. It turns out that a
CXL CPER record is a strict subset of the CXL event record, only the
UUID header field is chopped.

In order to unify handling between native and CPER flavors of CXL
events, prepare the code for the UUID to be passed in rather than
inferred from the record itself.

Later patches update the passed in record to only refer to the common
data between the formats.

Pass the UUID explicitly to each trace event to be able to remove the
UUID from the event structures.

Originally it was desirable to remove the UUID from the well known event
because the UUID value was redundant. However, the trace API was
already in place.[1]

[1] https://lore.kernel.org/all/[email protected]/

Signed-off-by: Ira Weiny <[email protected]>
---
drivers/cxl/core/mbox.c | 8 ++++----
drivers/cxl/core/trace.h | 28 ++++++++++++++--------------
2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 36270dcfb42e..00f429c440df 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -870,19 +870,19 @@ static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
struct cxl_event_gen_media *rec =
(struct cxl_event_gen_media *)record;

- trace_cxl_general_media(cxlmd, type, rec);
+ trace_cxl_general_media(cxlmd, type, id, rec);
} else if (uuid_equal(id, &dram_event_uuid)) {
struct cxl_event_dram *rec = (struct cxl_event_dram *)record;

- trace_cxl_dram(cxlmd, type, rec);
+ trace_cxl_dram(cxlmd, type, id, rec);
} else if (uuid_equal(id, &mem_mod_event_uuid)) {
struct cxl_event_mem_module *rec =
(struct cxl_event_mem_module *)record;

- trace_cxl_memory_module(cxlmd, type, rec);
+ trace_cxl_memory_module(cxlmd, type, id, rec);
} else {
/* For unknown record types print just the header */
- trace_cxl_generic_event(cxlmd, type, record);
+ trace_cxl_generic_event(cxlmd, type, id, record);
}
}

diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index a0b5819bc70b..3da16026b8db 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -198,12 +198,12 @@ TRACE_EVENT(cxl_overflow,
__field(u8, hdr_length) \
__field(u8, hdr_maint_op_class)

-#define CXL_EVT_TP_fast_assign(cxlmd, l, hdr) \
+#define CXL_EVT_TP_fast_assign(cxlmd, l, uuid, hdr) \
__assign_str(memdev, dev_name(&(cxlmd)->dev)); \
__assign_str(host, dev_name((cxlmd)->dev.parent)); \
__entry->log = (l); \
__entry->serial = (cxlmd)->cxlds->serial; \
- memcpy(&__entry->hdr_uuid, &(hdr).id, sizeof(uuid_t)); \
+ memcpy(&__entry->hdr_uuid, (uuid), sizeof(uuid_t)); \
__entry->hdr_length = (hdr).length; \
__entry->hdr_flags = get_unaligned_le24((hdr).flags); \
__entry->hdr_handle = le16_to_cpu((hdr).handle); \
@@ -225,9 +225,9 @@ TRACE_EVENT(cxl_overflow,
TRACE_EVENT(cxl_generic_event,

TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- struct cxl_event_record_raw *rec),
+ const uuid_t *uuid, struct cxl_event_record_raw *rec),

- TP_ARGS(cxlmd, log, rec),
+ TP_ARGS(cxlmd, log, uuid, rec),

TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -235,7 +235,7 @@ TRACE_EVENT(cxl_generic_event,
),

TP_fast_assign(
- CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+ CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
memcpy(__entry->data, &rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
),

@@ -315,9 +315,9 @@ TRACE_EVENT(cxl_generic_event,
TRACE_EVENT(cxl_general_media,

TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- struct cxl_event_gen_media *rec),
+ const uuid_t *uuid, struct cxl_event_gen_media *rec),

- TP_ARGS(cxlmd, log, rec),
+ TP_ARGS(cxlmd, log, uuid, rec),

TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -336,7 +336,7 @@ TRACE_EVENT(cxl_general_media,
),

TP_fast_assign(
- CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+ CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);

/* General Media */
__entry->dpa = le64_to_cpu(rec->phys_addr);
@@ -398,9 +398,9 @@ TRACE_EVENT(cxl_general_media,
TRACE_EVENT(cxl_dram,

TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- struct cxl_event_dram *rec),
+ const uuid_t *uuid, struct cxl_event_dram *rec),

- TP_ARGS(cxlmd, log, rec),
+ TP_ARGS(cxlmd, log, uuid, rec),

TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -422,7 +422,7 @@ TRACE_EVENT(cxl_dram,
),

TP_fast_assign(
- CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+ CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);

/* DRAM */
__entry->dpa = le64_to_cpu(rec->phys_addr);
@@ -547,9 +547,9 @@ TRACE_EVENT(cxl_dram,
TRACE_EVENT(cxl_memory_module,

TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- struct cxl_event_mem_module *rec),
+ const uuid_t *uuid, struct cxl_event_mem_module *rec),

- TP_ARGS(cxlmd, log, rec),
+ TP_ARGS(cxlmd, log, uuid, rec),

TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -569,7 +569,7 @@ TRACE_EVENT(cxl_memory_module,
),

TP_fast_assign(
- CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+ CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);

/* Memory Module Event */
__entry->event_type = rec->event_type;

--
2.43.0


2023-12-15 23:27:07

by Ira Weiny

[permalink] [raw]
Subject: [PATCH v4 2/7] cxl/events: Promote CXL event structures to a core header

UEFI code can process CXL events through CPER records. Those records
use almost the same format as the CXL events.

Lift the CXL event structures to a core header to be shared in later
patches.

Signed-off-by: Ira Weiny <[email protected]>
---
drivers/cxl/cxlmem.h | 90 +----------------------------------------
include/linux/cxl-event.h | 100 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 101 insertions(+), 89 deletions(-)

diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index a2fcbca253f3..f0e7ebb84f02 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -6,6 +6,7 @@
#include <linux/cdev.h>
#include <linux/uuid.h>
#include <linux/rcuwait.h>
+#include <linux/cxl-event.h>
#include "cxl.h"

/* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
@@ -579,27 +580,6 @@ struct cxl_mbox_identify {
u8 qos_telemetry_caps;
} __packed;

-/*
- * Common Event Record Format
- * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
- */
-struct cxl_event_record_hdr {
- uuid_t id;
- u8 length;
- u8 flags[3];
- __le16 handle;
- __le16 related_handle;
- __le64 timestamp;
- u8 maint_op_class;
- u8 reserved[15];
-} __packed;
-
-#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
-struct cxl_event_record_raw {
- struct cxl_event_record_hdr hdr;
- u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
-} __packed;
-
/*
* Get Event Records output payload
* CXL rev 3.0 section 8.2.9.2.2; Table 8-50
@@ -641,74 +621,6 @@ struct cxl_mbox_clear_event_payload {
} __packed;
#define CXL_CLEAR_EVENT_MAX_HANDLES U8_MAX

-/*
- * General Media Event Record
- * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
- */
-#define CXL_EVENT_GEN_MED_COMP_ID_SIZE 0x10
-struct cxl_event_gen_media {
- struct cxl_event_record_hdr hdr;
- __le64 phys_addr;
- u8 descriptor;
- u8 type;
- u8 transaction_type;
- u8 validity_flags[2];
- u8 channel;
- u8 rank;
- u8 device[3];
- u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
- u8 reserved[46];
-} __packed;
-
-/*
- * DRAM Event Record - DER
- * CXL rev 3.0 section 8.2.9.2.1.2; Table 3-44
- */
-#define CXL_EVENT_DER_CORRECTION_MASK_SIZE 0x20
-struct cxl_event_dram {
- struct cxl_event_record_hdr hdr;
- __le64 phys_addr;
- u8 descriptor;
- u8 type;
- u8 transaction_type;
- u8 validity_flags[2];
- u8 channel;
- u8 rank;
- u8 nibble_mask[3];
- u8 bank_group;
- u8 bank;
- u8 row[3];
- u8 column[2];
- u8 correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
- u8 reserved[0x17];
-} __packed;
-
-/*
- * Get Health Info Record
- * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
- */
-struct cxl_get_health_info {
- u8 health_status;
- u8 media_status;
- u8 add_status;
- u8 life_used;
- u8 device_temp[2];
- u8 dirty_shutdown_cnt[4];
- u8 cor_vol_err_cnt[4];
- u8 cor_per_err_cnt[4];
-} __packed;
-
-/*
- * Memory Module Event Record
- * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
- */
-struct cxl_event_mem_module {
- struct cxl_event_record_hdr hdr;
- u8 event_type;
- struct cxl_get_health_info info;
- u8 reserved[0x3d];
-} __packed;
-
struct cxl_mbox_get_partition_info {
__le64 active_volatile_cap;
__le64 active_persistent_cap;
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
new file mode 100644
index 000000000000..1c94e8fdd227
--- /dev/null
+++ b/include/linux/cxl-event.h
@@ -0,0 +1,100 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CXL_EVENT_H
+#define _LINUX_CXL_EVENT_H
+
+/*
+ * CXL event records; CXL rev 3.0
+ *
+ * Copyright(c) 2023 Intel Corporation.
+ */
+
+/*
+ * Common Event Record Format
+ * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
+ */
+struct cxl_event_record_hdr {
+ uuid_t id;
+ u8 length;
+ u8 flags[3];
+ __le16 handle;
+ __le16 related_handle;
+ __le64 timestamp;
+ u8 maint_op_class;
+ u8 reserved[15];
+} __packed;
+
+#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
+struct cxl_event_record_raw {
+ struct cxl_event_record_hdr hdr;
+ u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
+} __packed;
+
+/*
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CXL_EVENT_GEN_MED_COMP_ID_SIZE 0x10
+struct cxl_event_gen_media {
+ struct cxl_event_record_hdr hdr;
+ __le64 phys_addr;
+ u8 descriptor;
+ u8 type;
+ u8 transaction_type;
+ u8 validity_flags[2];
+ u8 channel;
+ u8 rank;
+ u8 device[3];
+ u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
+ u8 reserved[46];
+} __packed;
+
+/*
+ * DRAM Event Record - DER
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 3-44
+ */
+#define CXL_EVENT_DER_CORRECTION_MASK_SIZE 0x20
+struct cxl_event_dram {
+ struct cxl_event_record_hdr hdr;
+ __le64 phys_addr;
+ u8 descriptor;
+ u8 type;
+ u8 transaction_type;
+ u8 validity_flags[2];
+ u8 channel;
+ u8 rank;
+ u8 nibble_mask[3];
+ u8 bank_group;
+ u8 bank;
+ u8 row[3];
+ u8 column[2];
+ u8 correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
+ u8 reserved[0x17];
+} __packed;
+
+/*
+ * Get Health Info Record
+ * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
+ */
+struct cxl_get_health_info {
+ u8 health_status;
+ u8 media_status;
+ u8 add_status;
+ u8 life_used;
+ u8 device_temp[2];
+ u8 dirty_shutdown_cnt[4];
+ u8 cor_vol_err_cnt[4];
+ u8 cor_per_err_cnt[4];
+} __packed;
+
+/*
+ * Memory Module Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+struct cxl_event_mem_module {
+ struct cxl_event_record_hdr hdr;
+ u8 event_type;
+ struct cxl_get_health_info info;
+ u8 reserved[0x3d];
+} __packed;
+
+#endif /* _LINUX_CXL_EVENT_H */

--
2.43.0


2023-12-15 23:27:29

by Ira Weiny

[permalink] [raw]
Subject: [PATCH v4 3/7] cxl/events: Create common event UUID defines

Dan points out in review that the cxl_test code could be made better
through the use of UUID's defines rather than being open coded.[1]
Static const variables still need to exist to be passed to the trace
code.

Create UUID defines and use them rather than open coding them.

[1] https://lore.kernel.org/all/[email protected]/

Suggested-by: Dan Williams <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes for v3/v4:
[iweiny: move comments along with defines]
---
drivers/cxl/core/mbox.c | 26 +++-----------------------
drivers/cxl/cxlmem.h | 24 ++++++++++++++++++++++++
tools/testing/cxl/test/mem.c | 9 +++------
3 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 00f429c440df..be2ad3b6f431 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -836,29 +836,9 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
}
EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);

-/*
- * General Media Event Record
- * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
- */
-static const uuid_t gen_media_event_uuid =
- UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
- 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
-
-/*
- * DRAM Event Record
- * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
- */
-static const uuid_t dram_event_uuid =
- UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
- 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
-
-/*
- * Memory Module Event Record
- * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
- */
-static const uuid_t mem_mod_event_uuid =
- UUID_INIT(0xfe927475, 0xdd59, 0x4339,
- 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74);
+static const uuid_t gen_media_event_uuid = CXL_EVENT_GEN_MEDIA_UUID;
+static const uuid_t dram_event_uuid = CXL_EVENT_DRAM_UUID;
+static const uuid_t mem_mod_event_uuid = CXL_EVENT_MEM_MODULE_UUID;

static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
enum cxl_event_log_type type,
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index f0e7ebb84f02..e5d770e26e02 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -580,6 +580,30 @@ struct cxl_mbox_identify {
u8 qos_telemetry_caps;
} __packed;

+/*
+ * General Media Event Record UUID
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CXL_EVENT_GEN_MEDIA_UUID \
+ UUID_INIT(0xfbcd0a77, 0xc260, 0x417f, \
+ 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6)
+
+/*
+ * DRAM Event Record UUID
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+#define CXL_EVENT_DRAM_UUID \
+ UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab, \
+ 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24)
+
+/*
+ * Memory Module Event Record UUID
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+#define CXL_EVENT_MEM_MODULE_UUID \
+ UUID_INIT(0xfe927475, 0xdd59, 0x4339, \
+ 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)
+
/*
* Get Event Records output payload
* CXL rev 3.0 section 8.2.9.2.2; Table 8-50
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index ee61fa3a2411..5a95b04b329a 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -362,8 +362,7 @@ struct cxl_event_record_raw hardware_replace = {

struct cxl_event_gen_media gen_media = {
.hdr = {
- .id = UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
- 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
+ .id = CXL_EVENT_GEN_MEDIA_UUID,
.length = sizeof(struct cxl_event_gen_media),
.flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
/* .handle = Set dynamically */
@@ -380,8 +379,7 @@ struct cxl_event_gen_media gen_media = {

struct cxl_event_dram dram = {
.hdr = {
- .id = UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
- 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
+ .id = CXL_EVENT_DRAM_UUID,
.length = sizeof(struct cxl_event_dram),
.flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
/* .handle = Set dynamically */
@@ -400,8 +398,7 @@ struct cxl_event_dram dram = {

struct cxl_event_mem_module mem_module = {
.hdr = {
- .id = UUID_INIT(0xfe927475, 0xdd59, 0x4339,
- 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),
+ .id = CXL_EVENT_MEM_MODULE_UUID,
.length = sizeof(struct cxl_event_mem_module),
/* .handle = Set dynamically */
.related_handle = cpu_to_le16(0),

--
2.43.0


2023-12-15 23:27:47

by Ira Weiny

[permalink] [raw]
Subject: [PATCH v4 4/7] cxl/events: Separate UUID from event structures

The UEFI CXL CPER structure does not include the UUID. Now that the
UUID is passed separately to the trace event there is no need to have
the UUID in those structures.

Move UUID from the event record header to the raw structures. Adjust
cxl-test to Create dummy structures for creating test records.

Signed-off-by: Ira Weiny <[email protected]>
---
drivers/cxl/core/mbox.c | 2 +-
include/linux/cxl-event.h | 10 ++--
tools/testing/cxl/test/mem.c | 129 +++++++++++++++++++++++++------------------
3 files changed, 81 insertions(+), 60 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index be2ad3b6f431..5c4e4a692dea 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -844,7 +844,7 @@ static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
enum cxl_event_log_type type,
struct cxl_event_record_raw *record)
{
- uuid_t *id = &record->hdr.id;
+ uuid_t *id = &record->id;

if (uuid_equal(id, &gen_media_event_uuid)) {
struct cxl_event_gen_media *rec =
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index 1c94e8fdd227..ebb00ead1496 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -8,12 +8,7 @@
* Copyright(c) 2023 Intel Corporation.
*/

-/*
- * Common Event Record Format
- * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
- */
struct cxl_event_record_hdr {
- uuid_t id;
u8 length;
u8 flags[3];
__le16 handle;
@@ -23,8 +18,13 @@ struct cxl_event_record_hdr {
u8 reserved[15];
} __packed;

+/*
+ * Common Event Record Format
+ * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
+ */
#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
struct cxl_event_record_raw {
+ uuid_t id;
struct cxl_event_record_hdr hdr;
u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
} __packed;
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 5a95b04b329a..9cc2b8ce1efd 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -337,9 +337,9 @@ static void cxl_mock_event_trigger(struct device *dev)
}

struct cxl_event_record_raw maint_needed = {
+ .id = UUID_INIT(0xBA5EBA11, 0xABCD, 0xEFEB,
+ 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
.hdr = {
- .id = UUID_INIT(0xBA5EBA11, 0xABCD, 0xEFEB,
- 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
.length = sizeof(struct cxl_event_record_raw),
.flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
/* .handle = Set dynamically */
@@ -349,9 +349,9 @@ struct cxl_event_record_raw maint_needed = {
};

struct cxl_event_record_raw hardware_replace = {
+ .id = UUID_INIT(0xABCDEFEB, 0xBA11, 0xBA5E,
+ 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
.hdr = {
- .id = UUID_INIT(0xABCDEFEB, 0xBA11, 0xBA5E,
- 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
.length = sizeof(struct cxl_event_record_raw),
.flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
/* .handle = Set dynamically */
@@ -360,61 +360,82 @@ struct cxl_event_record_raw hardware_replace = {
.data = { 0xDE, 0xAD, 0xBE, 0xEF },
};

-struct cxl_event_gen_media gen_media = {
- .hdr = {
- .id = CXL_EVENT_GEN_MEDIA_UUID,
- .length = sizeof(struct cxl_event_gen_media),
- .flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
- /* .handle = Set dynamically */
- .related_handle = cpu_to_le16(0),
+struct cxl_test_gen_media {
+ uuid_t id;
+ struct cxl_event_gen_media rec;
+} __packed;
+
+struct cxl_test_gen_media gen_media = {
+ .id = CXL_EVENT_GEN_MEDIA_UUID,
+ .rec = {
+ .hdr = {
+ .length = sizeof(struct cxl_test_gen_media),
+ .flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0),
+ },
+ .phys_addr = cpu_to_le64(0x2000),
+ .descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
+ .type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
+ .transaction_type = CXL_GMER_TRANS_HOST_WRITE,
+ /* .validity_flags = <set below> */
+ .channel = 1,
+ .rank = 30
},
- .phys_addr = cpu_to_le64(0x2000),
- .descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
- .type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
- .transaction_type = CXL_GMER_TRANS_HOST_WRITE,
- /* .validity_flags = <set below> */
- .channel = 1,
- .rank = 30
};

-struct cxl_event_dram dram = {
- .hdr = {
- .id = CXL_EVENT_DRAM_UUID,
- .length = sizeof(struct cxl_event_dram),
- .flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
- /* .handle = Set dynamically */
- .related_handle = cpu_to_le16(0),
+struct cxl_test_dram {
+ uuid_t id;
+ struct cxl_event_dram rec;
+} __packed;
+
+struct cxl_test_dram dram = {
+ .id = CXL_EVENT_DRAM_UUID,
+ .rec = {
+ .hdr = {
+ .length = sizeof(struct cxl_test_dram),
+ .flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0),
+ },
+ .phys_addr = cpu_to_le64(0x8000),
+ .descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
+ .type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
+ .transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
+ /* .validity_flags = <set below> */
+ .channel = 1,
+ .bank_group = 5,
+ .bank = 2,
+ .column = {0xDE, 0xAD},
},
- .phys_addr = cpu_to_le64(0x8000),
- .descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
- .type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
- .transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
- /* .validity_flags = <set below> */
- .channel = 1,
- .bank_group = 5,
- .bank = 2,
- .column = {0xDE, 0xAD},
};

-struct cxl_event_mem_module mem_module = {
- .hdr = {
- .id = CXL_EVENT_MEM_MODULE_UUID,
- .length = sizeof(struct cxl_event_mem_module),
- /* .handle = Set dynamically */
- .related_handle = cpu_to_le16(0),
+struct cxl_test_mem_module {
+ uuid_t id;
+ struct cxl_event_mem_module rec;
+} __packed;
+
+struct cxl_test_mem_module mem_module = {
+ .id = CXL_EVENT_MEM_MODULE_UUID,
+ .rec = {
+ .hdr = {
+ .length = sizeof(struct cxl_test_mem_module),
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0),
+ },
+ .event_type = CXL_MMER_TEMP_CHANGE,
+ .info = {
+ .health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
+ .media_status = CXL_DHI_MS_ALL_DATA_LOST,
+ .add_status = (CXL_DHI_AS_CRITICAL << 2) |
+ (CXL_DHI_AS_WARNING << 4) |
+ (CXL_DHI_AS_WARNING << 5),
+ .device_temp = { 0xDE, 0xAD},
+ .dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
+ .cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
+ .cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
+ }
},
- .event_type = CXL_MMER_TEMP_CHANGE,
- .info = {
- .health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
- .media_status = CXL_DHI_MS_ALL_DATA_LOST,
- .add_status = (CXL_DHI_AS_CRITICAL << 2) |
- (CXL_DHI_AS_WARNING << 4) |
- (CXL_DHI_AS_WARNING << 5),
- .device_temp = { 0xDE, 0xAD},
- .dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
- .cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
- .cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
- }
};

static int mock_set_timestamp(struct cxl_dev_state *cxlds,
@@ -436,11 +457,11 @@ static int mock_set_timestamp(struct cxl_dev_state *cxlds,
static void cxl_mock_add_event_logs(struct mock_event_store *mes)
{
put_unaligned_le16(CXL_GMER_VALID_CHANNEL | CXL_GMER_VALID_RANK,
- &gen_media.validity_flags);
+ &gen_media.rec.validity_flags);

put_unaligned_le16(CXL_DER_VALID_CHANNEL | CXL_DER_VALID_BANK_GROUP |
CXL_DER_VALID_BANK | CXL_DER_VALID_COLUMN,
- &dram.validity_flags);
+ &dram.rec.validity_flags);

mes_add_event(mes, CXL_EVENT_TYPE_INFO, &maint_needed);
mes_add_event(mes, CXL_EVENT_TYPE_INFO,

--
2.43.0


2023-12-15 23:28:00

by Ira Weiny

[permalink] [raw]
Subject: [PATCH v4 5/7] cxl/events: Create a CXL event union

The CXL CPER and event log records share everything but a UUID/GUID in
their structures.

Define a cxl_event union without the UUID/GUID to be shared between the
CPER and event log record formats. Adjust the code to use this union.

Signed-off-by: Ira Weiny <[email protected]>
---
drivers/cxl/core/mbox.c | 32 +++++++++++++-------------------
drivers/cxl/core/trace.h | 8 ++++----
include/linux/cxl-event.h | 23 +++++++++++++++++------
tools/testing/cxl/test/mem.c | 31 ++++++++++++++++++-------------
4 files changed, 52 insertions(+), 42 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 5c4e4a692dea..b7efa058a100 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -844,26 +844,17 @@ static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
enum cxl_event_log_type type,
struct cxl_event_record_raw *record)
{
+ union cxl_event *evt = &record->event;
uuid_t *id = &record->id;

- if (uuid_equal(id, &gen_media_event_uuid)) {
- struct cxl_event_gen_media *rec =
- (struct cxl_event_gen_media *)record;
-
- trace_cxl_general_media(cxlmd, type, id, rec);
- } else if (uuid_equal(id, &dram_event_uuid)) {
- struct cxl_event_dram *rec = (struct cxl_event_dram *)record;
-
- trace_cxl_dram(cxlmd, type, id, rec);
- } else if (uuid_equal(id, &mem_mod_event_uuid)) {
- struct cxl_event_mem_module *rec =
- (struct cxl_event_mem_module *)record;
-
- trace_cxl_memory_module(cxlmd, type, id, rec);
- } else {
- /* For unknown record types print just the header */
- trace_cxl_generic_event(cxlmd, type, id, record);
- }
+ if (uuid_equal(id, &gen_media_event_uuid))
+ trace_cxl_general_media(cxlmd, type, id, &evt->gen_media);
+ else if (uuid_equal(id, &dram_event_uuid))
+ trace_cxl_dram(cxlmd, type, id, &evt->dram);
+ else if (uuid_equal(id, &mem_mod_event_uuid))
+ trace_cxl_memory_module(cxlmd, type, id, &evt->mem_module);
+ else
+ trace_cxl_generic_event(cxlmd, type, id, &evt->generic);
}

static int cxl_clear_event_record(struct cxl_memdev_state *mds,
@@ -906,7 +897,10 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
*/
i = 0;
for (cnt = 0; cnt < total; cnt++) {
- payload->handles[i++] = get_pl->records[cnt].hdr.handle;
+ struct cxl_event_record_raw *raw = &get_pl->records[cnt];
+ struct cxl_event_generic *gen = &raw->event.generic;
+
+ payload->handles[i++] = gen->hdr.handle;
dev_dbg(mds->cxlds.dev, "Event log '%d': Clearing %u\n", log,
le16_to_cpu(payload->handles[i]));

diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 3da16026b8db..862b50cb1829 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -225,9 +225,9 @@ TRACE_EVENT(cxl_overflow,
TRACE_EVENT(cxl_generic_event,

TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- const uuid_t *uuid, struct cxl_event_record_raw *rec),
+ const uuid_t *uuid, struct cxl_event_generic *gen_rec),

- TP_ARGS(cxlmd, log, uuid, rec),
+ TP_ARGS(cxlmd, log, uuid, gen_rec),

TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -235,8 +235,8 @@ TRACE_EVENT(cxl_generic_event,
),

TP_fast_assign(
- CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
- memcpy(__entry->data, &rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
+ CXL_EVT_TP_fast_assign(cxlmd, log, uuid, gen_rec->hdr);
+ memcpy(__entry->data, gen_rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
),

CXL_EVT_TP_printk("%s",
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index ebb00ead1496..18dab4d90dc8 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -18,13 +18,8 @@ struct cxl_event_record_hdr {
u8 reserved[15];
} __packed;

-/*
- * Common Event Record Format
- * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
- */
#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
-struct cxl_event_record_raw {
- uuid_t id;
+struct cxl_event_generic {
struct cxl_event_record_hdr hdr;
u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
} __packed;
@@ -97,4 +92,20 @@ struct cxl_event_mem_module {
u8 reserved[0x3d];
} __packed;

+union cxl_event {
+ struct cxl_event_generic generic;
+ struct cxl_event_gen_media gen_media;
+ struct cxl_event_dram dram;
+ struct cxl_event_mem_module mem_module;
+} __packed;
+
+/*
+ * Common Event Record Format; in event logs
+ * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
+ */
+struct cxl_event_record_raw {
+ uuid_t id;
+ union cxl_event event;
+} __packed;
+
#endif /* _LINUX_CXL_EVENT_H */
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 9cc2b8ce1efd..35ee41e435ab 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -251,7 +251,8 @@ static int mock_get_event(struct device *dev, struct cxl_mbox_cmd *cmd)
for (i = 0; i < CXL_TEST_EVENT_CNT && !event_log_empty(log); i++) {
memcpy(&pl->records[i], event_get_current(log),
sizeof(pl->records[i]));
- pl->records[i].hdr.handle = event_get_cur_event_handle(log);
+ pl->records[i].event.generic.hdr.handle =
+ event_get_cur_event_handle(log);
log->cur_idx++;
}

@@ -339,25 +340,29 @@ static void cxl_mock_event_trigger(struct device *dev)
struct cxl_event_record_raw maint_needed = {
.id = UUID_INIT(0xBA5EBA11, 0xABCD, 0xEFEB,
0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
- .hdr = {
- .length = sizeof(struct cxl_event_record_raw),
- .flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
- /* .handle = Set dynamically */
- .related_handle = cpu_to_le16(0xa5b6),
+ .event.generic = {
+ .hdr = {
+ .length = sizeof(struct cxl_event_record_raw),
+ .flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0xa5b6),
+ },
+ .data = { 0xDE, 0xAD, 0xBE, 0xEF },
},
- .data = { 0xDE, 0xAD, 0xBE, 0xEF },
};

struct cxl_event_record_raw hardware_replace = {
.id = UUID_INIT(0xABCDEFEB, 0xBA11, 0xBA5E,
0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
- .hdr = {
- .length = sizeof(struct cxl_event_record_raw),
- .flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
- /* .handle = Set dynamically */
- .related_handle = cpu_to_le16(0xb6a5),
+ .event.generic = {
+ .hdr = {
+ .length = sizeof(struct cxl_event_record_raw),
+ .flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0xb6a5),
+ },
+ .data = { 0xDE, 0xAD, 0xBE, 0xEF },
},
- .data = { 0xDE, 0xAD, 0xBE, 0xEF },
};

struct cxl_test_gen_media {

--
2.43.0


2023-12-15 23:28:09

by Ira Weiny

[permalink] [raw]
Subject: [PATCH v4 6/7] firmware/efi: Process CXL Component Events

BIOS can configure memory devices as firmware first. This will send CXL
events to the firmware instead of the OS. The firmware can then send
these events to the OS via UEFI.

UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER)
format for CXL Component Events. The format is mostly the same as the
CXL Common Event Record Format. The difference is the use of a GUID in
the Section Type rather than a UUID as part of the event itself.

Add EFI support to detect CXL CPER records and call a registered
notifier with the event. Enforce that only one notifier call can be
registered at any time.

Cc: Ard Biesheuvel <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes for v4
[kernel test robot: fix CONFIG_UEFI_CPER=n build]
[kernel test robot: use static for cxl_cper_rw_sem]

Changes for v3:
[djbw: check notifier vs registered and error if not the same]
[djbw: check for previous notifier and fail if already set]
[djbw: use guard for rw sem]
[djbw: pass the event type and record data as 2 parameters]
[djbw: prefer __packed over pragma packed]
[djbw: clean up commit message]
[iweiny: clean structure names]
---
drivers/firmware/efi/cper.c | 15 +++++++++++++
drivers/firmware/efi/cper_cxl.c | 46 +++++++++++++++++++++++++++++++++++++
drivers/firmware/efi/cper_cxl.h | 29 ++++++++++++++++++++++++
include/linux/cxl-event.h | 50 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 140 insertions(+)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 35c37f667781..2ee5011c4a8e 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -22,6 +22,7 @@
#include <linux/aer.h>
#include <linux/printk.h>
#include <linux/bcd.h>
+#include <linux/cxl-event.h>
#include <acpi/ghes.h>
#include <ras/ras_event.h>
#include "cper_cxl.h"
@@ -607,6 +608,20 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
cper_print_prot_err(newpfx, prot_err);
else
goto err_section_too_small;
+ } else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID) ||
+ guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID) ||
+ guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE_GUID)) {
+ struct cxl_cper_event_rec *rec = acpi_hest_get_payload(gdata);
+
+ if (rec->hdr.length <= sizeof(rec->hdr))
+ goto err_section_too_small;
+
+ if (rec->hdr.length > sizeof(*rec)) {
+ pr_err(FW_WARN "error section length is too big\n");
+ return;
+ }
+
+ cxl_cper_post_event(newpfx, sec_type, rec);
} else {
const void *err = acpi_hest_get_payload(gdata);

diff --git a/drivers/firmware/efi/cper_cxl.c b/drivers/firmware/efi/cper_cxl.c
index a55771b99a97..29388a0825ef 100644
--- a/drivers/firmware/efi/cper_cxl.c
+++ b/drivers/firmware/efi/cper_cxl.c
@@ -8,6 +8,7 @@
*/

#include <linux/cper.h>
+#include <linux/cxl-event.h>
#include "cper_cxl.h"

#define PROT_ERR_VALID_AGENT_TYPE BIT_ULL(0)
@@ -187,3 +188,48 @@ void cper_print_prot_err(const char *pfx, const struct cper_sec_prot_err *prot_e
sizeof(cxl_ras->header_log), 0);
}
}
+
+static DECLARE_RWSEM(cxl_cper_rw_sem);
+static cxl_cper_notifier cper_notifier;
+
+void cxl_cper_post_event(const char *pfx, guid_t *sec_type,
+ struct cxl_cper_event_rec *rec)
+{
+ enum cxl_event_type event_type;
+
+ if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) {
+ pr_err(FW_WARN "cxl event no Component Event Log present\n");
+ return;
+ }
+
+ if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID))
+ event_type = CXL_CPER_EVENT_GEN_MEDIA;
+ else if (guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID))
+ event_type = CXL_CPER_EVENT_DRAM;
+ else if (guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE_GUID))
+ event_type = CXL_CPER_EVENT_MEM_MODULE;
+
+ guard(rwsem_read)(&cxl_cper_rw_sem);
+ if (cper_notifier)
+ cper_notifier(event_type, rec);
+}
+
+int cxl_cper_register_notifier(cxl_cper_notifier notifier)
+{
+ guard(rwsem_write)(&cxl_cper_rw_sem);
+ if (cper_notifier)
+ return -EINVAL;
+ cper_notifier = notifier;
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_cper_register_notifier, CXL);
+
+int cxl_cper_unregister_notifier(cxl_cper_notifier notifier)
+{
+ guard(rwsem_write)(&cxl_cper_rw_sem);
+ if (notifier != cper_notifier)
+ return -EINVAL;
+ cper_notifier = NULL;
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_notifier, CXL);
diff --git a/drivers/firmware/efi/cper_cxl.h b/drivers/firmware/efi/cper_cxl.h
index 86bfcf7909ec..71f27b3e2810 100644
--- a/drivers/firmware/efi/cper_cxl.h
+++ b/drivers/firmware/efi/cper_cxl.h
@@ -10,11 +10,38 @@
#ifndef LINUX_CPER_CXL_H
#define LINUX_CPER_CXL_H

+#include <linux/cxl-event.h>
+
/* CXL Protocol Error Section */
#define CPER_SEC_CXL_PROT_ERR \
GUID_INIT(0x80B9EFB4, 0x52B5, 0x4DE3, 0xA7, 0x77, 0x68, 0x78, \
0x4B, 0x77, 0x10, 0x48)

+/* CXL Event record UUIDs are formated at GUIDs and reported in section type */
+/*
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CPER_SEC_CXL_GEN_MEDIA_GUID \
+ GUID_INIT(0xfbcd0a77, 0xc260, 0x417f, \
+ 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6)
+
+/*
+ * DRAM Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+#define CPER_SEC_CXL_DRAM_GUID \
+ GUID_INIT(0x601dcbb3, 0x9c06, 0x4eab, \
+ 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24)
+
+/*
+ * Memory Module Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+#define CPER_SEC_CXL_MEM_MODULE_GUID \
+ GUID_INIT(0xfe927475, 0xdd59, 0x4339, \
+ 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)
+
#pragma pack(1)

/* Compute Express Link Protocol Error Section, UEFI v2.10 sec N.2.13 */
@@ -62,5 +89,7 @@ struct cper_sec_prot_err {
#pragma pack()

void cper_print_prot_err(const char *pfx, const struct cper_sec_prot_err *prot_err);
+void cxl_cper_post_event(const char *pfx, guid_t *sec_type,
+ struct cxl_cper_event_rec *rec);

#endif //__CPER_CXL_
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index 18dab4d90dc8..2b137aead750 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -108,4 +108,54 @@ struct cxl_event_record_raw {
union cxl_event event;
} __packed;

+enum cxl_event_type {
+ CXL_CPER_EVENT_GEN_MEDIA,
+ CXL_CPER_EVENT_DRAM,
+ CXL_CPER_EVENT_MEM_MODULE,
+};
+
+#define CPER_CXL_DEVICE_ID_VALID BIT(0)
+#define CPER_CXL_DEVICE_SN_VALID BIT(1)
+#define CPER_CXL_COMP_EVENT_LOG_VALID BIT(2)
+struct cxl_cper_event_rec {
+ struct {
+ u32 length;
+ u64 validation_bits;
+ struct cper_cxl_event_devid {
+ u16 vendor_id;
+ u16 device_id;
+ u8 func_num;
+ u8 device_num;
+ u8 bus_num;
+ u16 segment_num;
+ u16 slot_num; /* bits 2:0 reserved */
+ u8 reserved;
+ } device_id __packed;
+ struct cper_cxl_event_sn {
+ u32 lower_dw;
+ u32 upper_dw;
+ } dev_serial_num __packed;
+ } hdr __packed;
+
+ union cxl_event event;
+} __packed;
+
+typedef void (*cxl_cper_notifier)(enum cxl_event_type type,
+ struct cxl_cper_event_rec *rec);
+
+#ifdef CONFIG_UEFI_CPER
+int cxl_cper_register_notifier(cxl_cper_notifier notifier);
+int cxl_cper_unregister_notifier(cxl_cper_notifier notifier);
+#else
+static inline int cxl_cper_register_notifier(cxl_cper_notifier notifier)
+{
+ return 0;
+}
+
+static inline int cxl_cper_unregister_notifier(cxl_cper_notifier notifier)
+{
+ return 0;
+}
+#endif
+
#endif /* _LINUX_CXL_EVENT_H */

--
2.43.0


2023-12-15 23:28:29

by Ira Weiny

[permalink] [raw]
Subject: [PATCH v4 7/7] cxl/memdev: Register for and process CPER events

If the firmware has configured CXL event support to be firmware first
the OS can process those events through CPER records. The CXL layer has
unique DPA to HPA knowledge and standard event trace parsing in place.

CPER records contain Bus, Device, Function information which can be used
to identify the PCI device which is sending the event.

Change pci driver registration to include registration for a CXL CPER
notifier to process the events through the trace subsystem.

Define and use scoped based management to simplify the handling of the
pci device object.

Cc: Bjorn Helgaas <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
NOTE this patch depends on Dan's addition of a device guard[1].

[1] https://lore.kernel.org/all/170250854466.1522182.17555361077409628655.stgit@dwillia2-xfh.jf.intel.com/

Changes for v3/v4:
[djbw: define a __free(pci_dev_put) to release the device automatically]
[djbw: use device guard from Vishal]
[iweiny: delete old notifier block structure]
[iweiny: adjust for new notifier interface]
---
drivers/cxl/core/mbox.c | 31 +++++++++++++++++++++++-----
drivers/cxl/cxlmem.h | 4 ++++
drivers/cxl/pci.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++-
include/linux/pci.h | 2 ++
4 files changed, 86 insertions(+), 6 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index b7efa058a100..c9aa723e3391 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -840,9 +840,30 @@ static const uuid_t gen_media_event_uuid = CXL_EVENT_GEN_MEDIA_UUID;
static const uuid_t dram_event_uuid = CXL_EVENT_DRAM_UUID;
static const uuid_t mem_mod_event_uuid = CXL_EVENT_MEM_MODULE_UUID;

-static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
- enum cxl_event_log_type type,
- struct cxl_event_record_raw *record)
+void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
+ enum cxl_event_log_type type,
+ enum cxl_event_type event_type,
+ union cxl_event *event)
+{
+ switch (event_type) {
+ case CXL_CPER_EVENT_GEN_MEDIA:
+ trace_cxl_general_media(cxlmd, type, &gen_media_event_uuid,
+ &event->gen_media);
+ break;
+ case CXL_CPER_EVENT_DRAM:
+ trace_cxl_dram(cxlmd, type, &dram_event_uuid, &event->dram);
+ break;
+ case CXL_CPER_EVENT_MEM_MODULE:
+ trace_cxl_memory_module(cxlmd, type, &mem_mod_event_uuid,
+ &event->mem_module);
+ break;
+ }
+}
+EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, CXL);
+
+static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
+ enum cxl_event_log_type type,
+ struct cxl_event_record_raw *record)
{
union cxl_event *evt = &record->event;
uuid_t *id = &record->id;
@@ -965,8 +986,8 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
break;

for (i = 0; i < nr_rec; i++)
- cxl_event_trace_record(cxlmd, type,
- &payload->records[i]);
+ __cxl_event_trace_record(cxlmd, type,
+ &payload->records[i]);

if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
trace_cxl_overflow(cxlmd, type, payload);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index e5d770e26e02..e7e9508fecac 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -802,6 +802,10 @@ void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
unsigned long *cmds);
void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status);
+void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
+ enum cxl_event_log_type type,
+ enum cxl_event_type event_type,
+ union cxl_event *event);
int cxl_set_timestamp(struct cxl_memdev_state *mds);
int cxl_poison_state_init(struct cxl_memdev_state *mds);
int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 0155fb66b580..638275569d63 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
+#include <asm-generic/unaligned.h>
#include <linux/io-64-nonatomic-lo-hi.h>
#include <linux/moduleparam.h>
#include <linux/module.h>
@@ -969,6 +970,58 @@ static struct pci_driver cxl_pci_driver = {
},
};

+#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
+static void cxl_cper_event_call(enum cxl_event_type ev_type,
+ struct cxl_cper_event_rec *rec)
+{
+ struct cper_cxl_event_devid *device_id = &rec->hdr.device_id;
+ struct pci_dev *pdev __free(pci_dev_put) = NULL;
+ struct cxl_dev_state *cxlds = NULL;
+ enum cxl_event_log_type log_type;
+ unsigned int devfn;
+ u32 hdr_flags;
+
+ devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
+ pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
+ device_id->bus_num, devfn);
+ if (!pdev)
+ return;
+
+ guard(device)(&pdev->dev);
+ if (pdev->driver == &cxl_pci_driver)
+ cxlds = pci_get_drvdata(pdev);
+ if (!cxlds)
+ return;
+
+ /* Fabricate a log type */
+ hdr_flags = get_unaligned_le24(rec->event.generic.hdr.flags);
+ log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
+
+ cxl_event_trace_record(cxlds->cxlmd, log_type, ev_type, &rec->event);
+}
+
+static int __init cxl_pci_driver_init(void)
+{
+ int rc;
+
+ rc = pci_register_driver(&cxl_pci_driver);
+ if (rc)
+ return rc;
+
+ rc = cxl_cper_register_notifier(cxl_cper_event_call);
+ if (rc)
+ pci_unregister_driver(&cxl_pci_driver);
+
+ return rc;
+}
+
+static void __exit cxl_pci_driver_exit(void)
+{
+ cxl_cper_unregister_notifier(cxl_cper_event_call);
+ pci_unregister_driver(&cxl_pci_driver);
+}
+
+module_init(cxl_pci_driver_init);
+module_exit(cxl_pci_driver_exit);
MODULE_LICENSE("GPL v2");
-module_pci_driver(cxl_pci_driver);
MODULE_IMPORT_NS(CXL);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 60ca768bc867..290d0a2651b2 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1170,6 +1170,7 @@ int pci_get_interrupt_pin(struct pci_dev *dev, struct pci_dev **bridge);
u8 pci_common_swizzle(struct pci_dev *dev, u8 *pinp);
struct pci_dev *pci_dev_get(struct pci_dev *dev);
void pci_dev_put(struct pci_dev *dev);
+DEFINE_FREE(pci_dev_put, struct pci_dev *, if (_T) pci_dev_put(_T))
void pci_remove_bus(struct pci_bus *b);
void pci_stop_and_remove_bus_device(struct pci_dev *dev);
void pci_stop_and_remove_bus_device_locked(struct pci_dev *dev);
@@ -1871,6 +1872,7 @@ void pci_cfg_access_unlock(struct pci_dev *dev);
void pci_dev_lock(struct pci_dev *dev);
int pci_dev_trylock(struct pci_dev *dev);
void pci_dev_unlock(struct pci_dev *dev);
+DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T))

/*
* PCI domain support. Sometimes called PCI segment (eg by ACPI),

--
2.43.0


2023-12-18 18:13:34

by Smita Koralahalli

[permalink] [raw]
Subject: Re: [PATCH v4 6/7] firmware/efi: Process CXL Component Events

On 12/15/2023 3:26 PM, Ira Weiny wrote:
> BIOS can configure memory devices as firmware first. This will send CXL
> events to the firmware instead of the OS. The firmware can then send
> these events to the OS via UEFI.
>
> UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER)
> format for CXL Component Events. The format is mostly the same as the
> CXL Common Event Record Format. The difference is the use of a GUID in
> the Section Type rather than a UUID as part of the event itself.
>
> Add EFI support to detect CXL CPER records and call a registered
> notifier with the event. Enforce that only one notifier call can be
> registered at any time.
>
> Cc: Ard Biesheuvel <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>
>

[snip]

> + struct {
> + u32 length;
> + u64 validation_bits;
> + struct cper_cxl_event_devid {
> + u16 vendor_id;
> + u16 device_id;
> + u8 func_num;
> + u8 device_num;
> + u8 bus_num;
> + u16 segment_num;
> + u16 slot_num; /* bits 2:0 reserved */
> + u8 reserved;
> + } device_id __packed;
> + struct cper_cxl_event_sn {
> + u32 lower_dw;
> + u32 upper_dw;
> + } dev_serial_num __packed;
> + } hdr __packed;
> +
> + union cxl_event event;
> +} __packed;
> +

For some reason, prefixing the struct name with __packed attribute seems
to do the job. ("__packed device_id" and "__packed dev_serial_num").

Thanks,
Smita

2023-12-18 18:18:13

by Smita Koralahalli

[permalink] [raw]
Subject: Re: [PATCH v4 7/7] cxl/memdev: Register for and process CPER events

On 12/15/2023 3:26 PM, Ira Weiny wrote:
> If the firmware has configured CXL event support to be firmware first
> the OS can process those events through CPER records. The CXL layer has
> unique DPA to HPA knowledge and standard event trace parsing in place.
>
> CPER records contain Bus, Device, Function information which can be used
> to identify the PCI device which is sending the event.
>
> Change pci driver registration to include registration for a CXL CPER
> notifier to process the events through the trace subsystem.
>
> Define and use scoped based management to simplify the handling of the
> pci device object.
>
> Cc: Bjorn Helgaas <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>
>
> ---

[snip]


> + switch (event_type) {
> + case CXL_CPER_EVENT_GEN_MEDIA:
> + trace_cxl_general_media(cxlmd, type, &gen_media_event_uuid,
> + &event->gen_media);
> + break;
> + case CXL_CPER_EVENT_DRAM:
> + trace_cxl_dram(cxlmd, type, &dram_event_uuid, &event->dram);
> + break;
> + case CXL_CPER_EVENT_MEM_MODULE:
> + trace_cxl_memory_module(cxlmd, type, &mem_mod_event_uuid,
> + &event->mem_module);
> + break;
> + }
> +}

Is default case needed here?

> +EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, CXL);
> +
> +static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> + enum cxl_event_log_type type,
> + struct cxl_event_record_raw *record)
> {
> union cxl_event *evt = &record->event;
> uuid_t *id = &record->id;
> @@ -965,8 +986,8 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> break;
>
> for (i = 0; i < nr_rec; i++)
> - cxl_event_trace_record(cxlmd, type,
> - &payload->records[i]);
> + __cxl_event_trace_record(cxlmd, type,
> + &payload->records[i]);
>
> if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
> trace_cxl_overflow(cxlmd, type, payload);
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index e5d770e26e02..e7e9508fecac 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -802,6 +802,10 @@ void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
> void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
> unsigned long *cmds);
> void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status);
> +void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> + enum cxl_event_log_type type,
> + enum cxl_event_type event_type,
> + union cxl_event *event);
> int cxl_set_timestamp(struct cxl_memdev_state *mds);
> int cxl_poison_state_init(struct cxl_memdev_state *mds);
> int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 0155fb66b580..638275569d63 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1,5 +1,6 @@
> // SPDX-License-Identifier: GPL-2.0-only
> /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> +#include <asm-generic/unaligned.h>
> #include <linux/io-64-nonatomic-lo-hi.h>
> #include <linux/moduleparam.h>
> #include <linux/module.h>
> @@ -969,6 +970,58 @@ static struct pci_driver cxl_pci_driver = {
> },
> };
>
> +#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
> +static void cxl_cper_event_call(enum cxl_event_type ev_type,
> + struct cxl_cper_event_rec *rec)
> +{
> + struct cper_cxl_event_devid *device_id = &rec->hdr.device_id;
> + struct pci_dev *pdev __free(pci_dev_put) = NULL;
> + struct cxl_dev_state *cxlds = NULL;
> + enum cxl_event_log_type log_type;
> + unsigned int devfn;
> + u32 hdr_flags;
> +
> + devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
> + pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
> + device_id->bus_num, devfn);
> + if (!pdev)
> + return;
> +
> + guard(device)(&pdev->dev);
> + if (pdev->driver == &cxl_pci_driver)
> + cxlds = pci_get_drvdata(pdev);
> + if (!cxlds)
> + return;
> +
> + /* Fabricate a log type */
> + hdr_flags = get_unaligned_le24(rec->event.generic.hdr.flags);
> + log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
> +
> + cxl_event_trace_record(cxlds->cxlmd, log_type, ev_type, &rec->event);

Currently, when I run this, I see two trace events printed. One from
here, and another as a non_standard_event from ghes. I think both should
be unified?

I remember Dan pointing out to me this when I sent decoding for protocol
errors and its still pending on me for protocol errors.

Thanks,
Smita

> +}
> +
> +static int __init cxl_pci_driver_init(void)
> +{
> + int rc;
> +
> + rc = pci_register_driver(&cxl_pci_driver);
> + if (rc)
> + return rc;
> +
> + rc = cxl_cper_register_notifier(cxl_cper_event_call);
> + if (rc)
> + pci_unregister_driver(&cxl_pci_driver);
> +
> + return rc;
> +}
> +
> +static void __exit cxl_pci_driver_exit(void)
> +{
> + cxl_cper_unregister_notifier(cxl_cper_event_call);
> + pci_unregister_driver(&cxl_pci_driver);
> +}
> +
> +module_init(cxl_pci_driver_init);
> +module_exit(cxl_pci_driver_exit);
> MODULE_LICENSE("GPL v2");
> -module_pci_driver(cxl_pci_driver);
> MODULE_IMPORT_NS(CXL);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 60ca768bc867..290d0a2651b2 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1170,6 +1170,7 @@ int pci_get_interrupt_pin(struct pci_dev *dev, struct pci_dev **bridge);
> u8 pci_common_swizzle(struct pci_dev *dev, u8 *pinp);
> struct pci_dev *pci_dev_get(struct pci_dev *dev);
> void pci_dev_put(struct pci_dev *dev);
> +DEFINE_FREE(pci_dev_put, struct pci_dev *, if (_T) pci_dev_put(_T))
> void pci_remove_bus(struct pci_bus *b);
> void pci_stop_and_remove_bus_device(struct pci_dev *dev);
> void pci_stop_and_remove_bus_device_locked(struct pci_dev *dev);
> @@ -1871,6 +1872,7 @@ void pci_cfg_access_unlock(struct pci_dev *dev);
> void pci_dev_lock(struct pci_dev *dev);
> int pci_dev_trylock(struct pci_dev *dev);
> void pci_dev_unlock(struct pci_dev *dev);
> +DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T))
>
> /*
> * PCI domain support. Sometimes called PCI segment (eg by ACPI),
>

2023-12-18 20:26:12

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v4 6/7] firmware/efi: Process CXL Component Events

Smita Koralahalli wrote:
> On 12/15/2023 3:26 PM, Ira Weiny wrote:
> > BIOS can configure memory devices as firmware first. This will send CXL
> > events to the firmware instead of the OS. The firmware can then send
> > these events to the OS via UEFI.
> >
> > UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER)
> > format for CXL Component Events. The format is mostly the same as the
> > CXL Common Event Record Format. The difference is the use of a GUID in
> > the Section Type rather than a UUID as part of the event itself.
> >
> > Add EFI support to detect CXL CPER records and call a registered
> > notifier with the event. Enforce that only one notifier call can be
> > registered at any time.
> >
> > Cc: Ard Biesheuvel <[email protected]>
> > Signed-off-by: Ira Weiny <[email protected]>
> >
>
> [snip]
>
> > + struct {
> > + u32 length;
> > + u64 validation_bits;
> > + struct cper_cxl_event_devid {
> > + u16 vendor_id;
> > + u16 device_id;
> > + u8 func_num;
> > + u8 device_num;
> > + u8 bus_num;
> > + u16 segment_num;
> > + u16 slot_num; /* bits 2:0 reserved */
> > + u8 reserved;
> > + } device_id __packed;
> > + struct cper_cxl_event_sn {
> > + u32 lower_dw;
> > + u32 upper_dw;
> > + } dev_serial_num __packed;
> > + } hdr __packed;
> > +
> > + union cxl_event event;
> > +} __packed;
> > +
>
> For some reason, prefixing the struct name with __packed attribute seems
> to do the job. ("__packed device_id" and "__packed dev_serial_num").

Good catch, yeah, the expectation is that follows the closing brace not
only to match the predominant style in the kernel, but gcc appears to
not honor it otherwise. Looks better with this on top:

diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index 2b137aead750..975925029f6d 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -130,12 +130,12 @@ struct cxl_cper_event_rec {
u16 segment_num;
u16 slot_num; /* bits 2:0 reserved */
u8 reserved;
- } device_id __packed;
+ } __packed device_id;
struct cper_cxl_event_sn {
u32 lower_dw;
u32 upper_dw;
- } dev_serial_num __packed;
- } hdr __packed;
+ } __packed dev_serial_num;
+ } __packed hdr;

union cxl_event event;
} __packed;


2023-12-18 20:57:21

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v4 7/7] cxl/memdev: Register for and process CPER events

Smita Koralahalli wrote:
> On 12/15/2023 3:26 PM, Ira Weiny wrote:
> > If the firmware has configured CXL event support to be firmware first
> > the OS can process those events through CPER records. The CXL layer has
> > unique DPA to HPA knowledge and standard event trace parsing in place.
> >
> > CPER records contain Bus, Device, Function information which can be used
> > to identify the PCI device which is sending the event.
> >
> > Change pci driver registration to include registration for a CXL CPER
> > notifier to process the events through the trace subsystem.
> >
> > Define and use scoped based management to simplify the handling of the
> > pci device object.
> >
> > Cc: Bjorn Helgaas <[email protected]>
> > Signed-off-by: Ira Weiny <[email protected]>
> >
> > ---
>
> [snip]
>
>
> > + switch (event_type) {
> > + case CXL_CPER_EVENT_GEN_MEDIA:
> > + trace_cxl_general_media(cxlmd, type, &gen_media_event_uuid,
> > + &event->gen_media);
> > + break;
> > + case CXL_CPER_EVENT_DRAM:
> > + trace_cxl_dram(cxlmd, type, &dram_event_uuid, &event->dram);
> > + break;
> > + case CXL_CPER_EVENT_MEM_MODULE:
> > + trace_cxl_memory_module(cxlmd, type, &mem_mod_event_uuid,
> > + &event->mem_module);
> > + break;
> > + }
> > +}
>
> Is default case needed here?

Yeah, it looks like an uninitialized @type value can be passed through
the stack here.

[..]
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index 0155fb66b580..638275569d63 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -1,5 +1,6 @@
> > // SPDX-License-Identifier: GPL-2.0-only
> > /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> > +#include <asm-generic/unaligned.h>
> > #include <linux/io-64-nonatomic-lo-hi.h>
> > #include <linux/moduleparam.h>
> > #include <linux/module.h>
> > @@ -969,6 +970,58 @@ static struct pci_driver cxl_pci_driver = {
> > },
> > };
> >
> > +#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
> > +static void cxl_cper_event_call(enum cxl_event_type ev_type,
> > + struct cxl_cper_event_rec *rec)
> > +{
> > + struct cper_cxl_event_devid *device_id = &rec->hdr.device_id;
> > + struct pci_dev *pdev __free(pci_dev_put) = NULL;
> > + struct cxl_dev_state *cxlds = NULL;
> > + enum cxl_event_log_type log_type;
> > + unsigned int devfn;
> > + u32 hdr_flags;
> > +
> > + devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
> > + pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
> > + device_id->bus_num, devfn);
> > + if (!pdev)
> > + return;
> > +
> > + guard(device)(&pdev->dev);
> > + if (pdev->driver == &cxl_pci_driver)
> > + cxlds = pci_get_drvdata(pdev);
> > + if (!cxlds)
> > + return;
> > +
> > + /* Fabricate a log type */
> > + hdr_flags = get_unaligned_le24(rec->event.generic.hdr.flags);
> > + log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
> > +
> > + cxl_event_trace_record(cxlds->cxlmd, log_type, ev_type, &rec->event);
>
> Currently, when I run this, I see two trace events printed. One from
> here, and another as a non_standard_event from ghes. I think both should
> be unified?
>
> I remember Dan pointing out to me this when I sent decoding for protocol
> errors and its still pending on me for protocol errors.

Good point, so I think the responsibility to trace CXL events should
belong to ghes_do_proc() and ghes_print_estatus() can just ignore CXL
events.

Notice how ghes_proc() sometimes skips ghes_print_estatus(), but
uncoditionally emits a trace event in ghes_do_proc()? To me that means
that the cper_estatus_print() inside ghes_print_estatus() can just defer
to the ghes code to do the hookup to the trace code.

For example, ras_userspace_consumers() was introduced to skip emitting
events to the kernel log when the trace event might be handled. My
assumption is that was for historical reasons, but since CXL events are
new, just never emit them to the kernel log and always require the trace
path.

I am open to other thoughts here, but it seems like ghes_do_proc() is
where the callback needs to be triggered.


> Thanks,
> Smita
>
> > +}
> > +
> > +static int __init cxl_pci_driver_init(void)
> > +{
> > + int rc;
> > +
> > + rc = pci_register_driver(&cxl_pci_driver);
> > + if (rc)
> > + return rc;
> > +
> > + rc = cxl_cper_register_notifier(cxl_cper_event_call);

Quick aside as I am reading through this, the "notifier" name is
misleading since this callback has nothing to do with the
include/linux/notifier.h API.

2023-12-19 14:24:56

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v4 6/7] firmware/efi: Process CXL Component Events

On Fri, 15 Dec 2023 15:26:32 -0800
Ira Weiny <[email protected]> wrote:

> BIOS can configure memory devices as firmware first. This will send CXL
> events to the firmware instead of the OS. The firmware can then send
> these events to the OS via UEFI.
>
> UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER)
> format for CXL Component Events. The format is mostly the same as the
> CXL Common Event Record Format. The difference is the use of a GUID in
> the Section Type rather than a UUID as part of the event itself.
>
> Add EFI support to detect CXL CPER records and call a registered
> notifier with the event. Enforce that only one notifier call can be
> registered at any time.
>
> Cc: Ard Biesheuvel <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>
One trivial thing inline.

+ Agree that notifier naming is unwise given what that means elsewhere in the
kernel.

> diff --git a/drivers/firmware/efi/cper_cxl.h b/drivers/firmware/efi/cper_cxl.h
> index 86bfcf7909ec..71f27b3e2810 100644
> --- a/drivers/firmware/efi/cper_cxl.h
> +++ b/drivers/firmware/efi/cper_cxl.h
> @@ -10,11 +10,38 @@
> #ifndef LINUX_CPER_CXL_H
> #define LINUX_CPER_CXL_H
>
> +#include <linux/cxl-event.h>
> +
> /* CXL Protocol Error Section */
> #define CPER_SEC_CXL_PROT_ERR \
> GUID_INIT(0x80B9EFB4, 0x52B5, 0x4DE3, 0xA7, 0x77, 0x68, 0x78, \
> 0x4B, 0x77, 0x10, 0x48)
>
> +/* CXL Event record UUIDs are formated at GUIDs and reported in section type */

as GUIDs

> +/*
> + * General Media Event Record
> + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
> + */
> +#define CPER_SEC_CXL_GEN_MEDIA_GUID \

>


2023-12-19 14:52:20

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v4 7/7] cxl/memdev: Register for and process CPER events

On Fri, 15 Dec 2023 15:26:33 -0800
Ira Weiny <[email protected]> wrote:

> If the firmware has configured CXL event support to be firmware first
> the OS can process those events through CPER records. The CXL layer has
> unique DPA to HPA knowledge and standard event trace parsing in place.
>
> CPER records contain Bus, Device, Function information which can be used
> to identify the PCI device which is sending the event.
>
> Change pci driver registration to include registration for a CXL CPER
> notifier to process the events through the trace subsystem.
>
> Define and use scoped based management to simplify the handling of the
> pci device object.
>
> Cc: Bjorn Helgaas <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>

I'd break out the pci guard stuff as a precursor patch. That's likely
to be used elsewhere so it would help for backporting for other users
if it wasn't buried in a patch doing other stuff.

Not to mention that has a different set of likely reviewers to the rest
of this patch.

More generally maybe we should just hardcode the UUID in the tracepoint
definitions? I think for everything other than the generic one we
only ever call trace_cxl_memory_module(... &mem_mod_event_uuid..)
etc.

It's a little ugly to match on the UUID to call a function where it
hard coded, but less so than inserting the UUID like this does.
Better I think to make it obvious that this isn't actually a variable
(for the ones we understand).

Jonathan

>
> ---
> NOTE this patch depends on Dan's addition of a device guard[1].
>
> [1] https://lore.kernel.org/all/170250854466.1522182.17555361077409628655.stgit@dwillia2-xfh.jf.intel.com/
>
> Changes for v3/v4:
> [djbw: define a __free(pci_dev_put) to release the device automatically]
> [djbw: use device guard from Vishal]
> [iweiny: delete old notifier block structure]
> [iweiny: adjust for new notifier interface]
> ---
> drivers/cxl/core/mbox.c | 31 +++++++++++++++++++++++-----
> drivers/cxl/cxlmem.h | 4 ++++
> drivers/cxl/pci.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++-
> include/linux/pci.h | 2 ++
> 4 files changed, 86 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index b7efa058a100..c9aa723e3391 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -840,9 +840,30 @@ static const uuid_t gen_media_event_uuid = CXL_EVENT_GEN_MEDIA_UUID;
> static const uuid_t dram_event_uuid = CXL_EVENT_DRAM_UUID;
> static const uuid_t mem_mod_event_uuid = CXL_EVENT_MEM_MODULE_UUID;
>
> -static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> - enum cxl_event_log_type type,
> - struct cxl_event_record_raw *record)
> +void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> + enum cxl_event_log_type type,
> + enum cxl_event_type event_type,
> + union cxl_event *event)
> +{
> + switch (event_type) {
> + case CXL_CPER_EVENT_GEN_MEDIA:
> + trace_cxl_general_media(cxlmd, type, &gen_media_event_uuid,
> + &event->gen_media);
> + break;
> + case CXL_CPER_EVENT_DRAM:
> + trace_cxl_dram(cxlmd, type, &dram_event_uuid, &event->dram);
> + break;
> + case CXL_CPER_EVENT_MEM_MODULE:
> + trace_cxl_memory_module(cxlmd, type, &mem_mod_event_uuid,
> + &event->mem_module);
> + break;
> + }
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, CXL);
> +
> +static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> + enum cxl_event_log_type type,
> + struct cxl_event_record_raw *record)
> {
> union cxl_event *evt = &record->event;
> uuid_t *id = &record->id;
> @@ -965,8 +986,8 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> break;
>
> for (i = 0; i < nr_rec; i++)
> - cxl_event_trace_record(cxlmd, type,
> - &payload->records[i]);
> + __cxl_event_trace_record(cxlmd, type,
> + &payload->records[i]);
>
> if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
> trace_cxl_overflow(cxlmd, type, payload);

> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 0155fb66b580..638275569d63 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1,5 +1,6 @@
> // SPDX-License-Identifier: GPL-2.0-only
> /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> +#include <asm-generic/unaligned.h>
> #include <linux/io-64-nonatomic-lo-hi.h>
> #include <linux/moduleparam.h>
> #include <linux/module.h>
> @@ -969,6 +970,58 @@ static struct pci_driver cxl_pci_driver = {
> },
> };
>
> +#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
> +static void cxl_cper_event_call(enum cxl_event_type ev_type,
> + struct cxl_cper_event_rec *rec)
> +{
> + struct cper_cxl_event_devid *device_id = &rec->hdr.device_id;
> + struct pci_dev *pdev __free(pci_dev_put) = NULL;
> + struct cxl_dev_state *cxlds = NULL;
> + enum cxl_event_log_type log_type;
> + unsigned int devfn;
> + u32 hdr_flags;
> +
> + devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
> + pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
> + device_id->bus_num, devfn);
> + if (!pdev)
> + return;
> +
> + guard(device)(&pdev->dev);
> + if (pdev->driver == &cxl_pci_driver)
> + cxlds = pci_get_drvdata(pdev);
> + if (!cxlds)
> + return;

This is handling two conditions. I'd find it more readable split like:

if (pdev->driver != &cxl_pci_driver)
return;

cxlds = pci_get_drvdata(pdev);
if (!cxlds)
return;

and drop the = NULL above.

> +
> + /* Fabricate a log type */
> + hdr_flags = get_unaligned_le24(rec->event.generic.hdr.flags);
> + log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
> +
> + cxl_event_trace_record(cxlds->cxlmd, log_type, ev_type, &rec->event);
> +}
> +


2023-12-19 15:44:15

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH v4 6/7] firmware/efi: Process CXL Component Events

Dan Williams wrote:
> Smita Koralahalli wrote:
> > On 12/15/2023 3:26 PM, Ira Weiny wrote:

[snip]

> > > + struct {
> > > + u32 length;
> > > + u64 validation_bits;
> > > + struct cper_cxl_event_devid {
> > > + u16 vendor_id;
> > > + u16 device_id;
> > > + u8 func_num;
> > > + u8 device_num;
> > > + u8 bus_num;
> > > + u16 segment_num;
> > > + u16 slot_num; /* bits 2:0 reserved */
> > > + u8 reserved;
> > > + } device_id __packed;
> > > + struct cper_cxl_event_sn {
> > > + u32 lower_dw;
> > > + u32 upper_dw;
> > > + } dev_serial_num __packed;
> > > + } hdr __packed;
> > > +
> > > + union cxl_event event;
> > > +} __packed;
> > > +
> >
> > For some reason, prefixing the struct name with __packed attribute seems
> > to do the job. ("__packed device_id" and "__packed dev_serial_num").
>
> Good catch, yeah, the expectation is that follows the closing brace not
> only to match the predominant style in the kernel, but gcc appears to
> not honor it otherwise. Looks better with this on top:

Very good catch. I did not mean to do this at all... :-(

>
> diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
> index 2b137aead750..975925029f6d 100644
> --- a/include/linux/cxl-event.h
> +++ b/include/linux/cxl-event.h
> @@ -130,12 +130,12 @@ struct cxl_cper_event_rec {
> u16 segment_num;
> u16 slot_num; /* bits 2:0 reserved */
> u8 reserved;
> - } device_id __packed;
> + } __packed device_id;
> struct cper_cxl_event_sn {
> u32 lower_dw;
> u32 upper_dw;
> - } dev_serial_num __packed;
> - } hdr __packed;
> + } __packed dev_serial_num;
> + } __packed hdr;
>
> union cxl_event event;
> } __packed;
>

Yes thanks,
Ira

2023-12-19 17:00:54

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH v4 7/7] cxl/memdev: Register for and process CPER events

Dan Williams wrote:
> Smita Koralahalli wrote:
> > On 12/15/2023 3:26 PM, Ira Weiny wrote:
> > > If the firmware has configured CXL event support to be firmware first
> > > the OS can process those events through CPER records. The CXL layer has
> > > unique DPA to HPA knowledge and standard event trace parsing in place.
> > >
> > > CPER records contain Bus, Device, Function information which can be used
> > > to identify the PCI device which is sending the event.
> > >
> > > Change pci driver registration to include registration for a CXL CPER
> > > notifier to process the events through the trace subsystem.
> > >
> > > Define and use scoped based management to simplify the handling of the
> > > pci device object.
> > >
> > > Cc: Bjorn Helgaas <[email protected]>
> > > Signed-off-by: Ira Weiny <[email protected]>
> > >
> > > ---
> >
> > [snip]
> >
> >
> > > + switch (event_type) {
> > > + case CXL_CPER_EVENT_GEN_MEDIA:
> > > + trace_cxl_general_media(cxlmd, type, &gen_media_event_uuid,
> > > + &event->gen_media);
> > > + break;
> > > + case CXL_CPER_EVENT_DRAM:
> > > + trace_cxl_dram(cxlmd, type, &dram_event_uuid, &event->dram);
> > > + break;
> > > + case CXL_CPER_EVENT_MEM_MODULE:
> > > + trace_cxl_memory_module(cxlmd, type, &mem_mod_event_uuid,
> > > + &event->mem_module);
> > > + break;
> > > + }
> > > +}
> >
> > Is default case needed here?
>
> Yeah, it looks like an uninitialized @type value can be passed through
> the stack here.

That was not my intention but yea.

Added a generic trace with a null UUID.

[snip]

> > > +#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
> > > +static void cxl_cper_event_call(enum cxl_event_type ev_type,
> > > + struct cxl_cper_event_rec *rec)
> > > +{
> > > + struct cper_cxl_event_devid *device_id = &rec->hdr.device_id;
> > > + struct pci_dev *pdev __free(pci_dev_put) = NULL;
> > > + struct cxl_dev_state *cxlds = NULL;
> > > + enum cxl_event_log_type log_type;
> > > + unsigned int devfn;
> > > + u32 hdr_flags;
> > > +
> > > + devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
> > > + pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
> > > + device_id->bus_num, devfn);
> > > + if (!pdev)
> > > + return;
> > > +
> > > + guard(device)(&pdev->dev);
> > > + if (pdev->driver == &cxl_pci_driver)
> > > + cxlds = pci_get_drvdata(pdev);
> > > + if (!cxlds)
> > > + return;
> > > +
> > > + /* Fabricate a log type */
> > > + hdr_flags = get_unaligned_le24(rec->event.generic.hdr.flags);
> > > + log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
> > > +
> > > + cxl_event_trace_record(cxlds->cxlmd, log_type, ev_type, &rec->event);
> >
> > Currently, when I run this, I see two trace events printed. One from
> > here, and another as a non_standard_event from ghes. I think both should
> > be unified?

By the way, Smita,

Thanks for testing! I really do appreciate it!

> >
> > I remember Dan pointing out to me this when I sent decoding for protocol
> > errors and its still pending on me for protocol errors.
>
> Good point, so I think the responsibility to trace CXL events should
> belong to ghes_do_proc() and ghes_print_estatus() can just ignore CXL
> events.
>
> Notice how ghes_proc() sometimes skips ghes_print_estatus(), but
> uncoditionally emits a trace event in ghes_do_proc()? To me that means
> that the cper_estatus_print() inside ghes_print_estatus() can just defer
> to the ghes code to do the hookup to the trace code.
>
> For example, ras_userspace_consumers() was introduced to skip emitting
> events to the kernel log when the trace event might be handled. My
> assumption is that was for historical reasons, but since CXL events are
> new, just never emit them to the kernel log and always require the trace
> path.
>
> I am open to other thoughts here, but it seems like ghes_do_proc() is
> where the callback needs to be triggered.

I see.

Ok. I'll create a pre-patch which moves the protocol error first then
I'll put the events in the ghes_do_proc() well.


[snip]

> > > + rc = cxl_cper_register_notifier(cxl_cper_event_call);
>
> Quick aside as I am reading through this, the "notifier" name is
> misleading since this callback has nothing to do with the
> include/linux/notifier.h API.

Fair point. I debated 'callback' vs 'notifier'. I'll change it to
callback as I think that is equally correct and as you say does clarify
this is not a 'notifier'.

Ira

2023-12-19 17:17:29

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH v4 7/7] cxl/memdev: Register for and process CPER events

Ira Weiny wrote:
> Dan Williams wrote:
> > Smita Koralahalli wrote:
> > > On 12/15/2023 3:26 PM, Ira Weiny wrote:

[snip]

> > > I remember Dan pointing out to me this when I sent decoding for protocol
> > > errors and its still pending on me for protocol errors.
> >
> > Good point, so I think the responsibility to trace CXL events should
> > belong to ghes_do_proc() and ghes_print_estatus() can just ignore CXL
> > events.
> >
> > Notice how ghes_proc() sometimes skips ghes_print_estatus(), but
> > uncoditionally emits a trace event in ghes_do_proc()? To me that means
> > that the cper_estatus_print() inside ghes_print_estatus() can just defer
> > to the ghes code to do the hookup to the trace code.
> >
> > For example, ras_userspace_consumers() was introduced to skip emitting
> > events to the kernel log when the trace event might be handled. My
> > assumption is that was for historical reasons, but since CXL events are
> > new, just never emit them to the kernel log and always require the trace
> > path.
> >
> > I am open to other thoughts here, but it seems like ghes_do_proc() is
> > where the callback needs to be triggered.
>
> I see.
>
> Ok. I'll create a pre-patch which moves the protocol error first then
> I'll put the events in the ghes_do_proc() well.
>

Apologies. I really wanted to make this work a pre-cursor patch but I
see that there is not a trace point for the protocol errors yet. So as
not to slow the progress of this work I'm going to skip moving the
protocol stuff right now.

Also, as part of this work I think moving the CXL specific defines into
the common linux/cper.h is appropriate at this time.

Unless I hear otherwise I'm going to land the event stuff in that common
header and we can move the protocol error defines later.

Thanks again for all the testing,
Ira

2023-12-19 18:05:49

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH v4 6/7] firmware/efi: Process CXL Component Events

Jonathan Cameron wrote:
> On Fri, 15 Dec 2023 15:26:32 -0800
> Ira Weiny <[email protected]> wrote:

[snip]

> >
> > Cc: Ard Biesheuvel <[email protected]>
> > Signed-off-by: Ira Weiny <[email protected]>
> One trivial thing inline.
>
> + Agree that notifier naming is unwise given what that means elsewhere in the
> kernel.

Changing that.

>
> > diff --git a/drivers/firmware/efi/cper_cxl.h b/drivers/firmware/efi/cper_cxl.h
> > index 86bfcf7909ec..71f27b3e2810 100644
> > --- a/drivers/firmware/efi/cper_cxl.h
> > +++ b/drivers/firmware/efi/cper_cxl.h
> > @@ -10,11 +10,38 @@
> > #ifndef LINUX_CPER_CXL_H
> > #define LINUX_CPER_CXL_H
> >
> > +#include <linux/cxl-event.h>
> > +
> > /* CXL Protocol Error Section */
> > #define CPER_SEC_CXL_PROT_ERR \
> > GUID_INIT(0x80B9EFB4, 0x52B5, 0x4DE3, 0xA7, 0x77, 0x68, 0x78, \
> > 0x4B, 0x77, 0x10, 0x48)
> >
> > +/* CXL Event record UUIDs are formated at GUIDs and reported in section type */
>
> as GUIDs
>

Fixed, thanks,
Ira

2023-12-19 23:27:34

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH v4 7/7] cxl/memdev: Register for and process CPER events

Jonathan Cameron wrote:
> On Fri, 15 Dec 2023 15:26:33 -0800
> Ira Weiny <[email protected]> wrote:
>
> > If the firmware has configured CXL event support to be firmware first
> > the OS can process those events through CPER records. The CXL layer has
> > unique DPA to HPA knowledge and standard event trace parsing in place.
> >
> > CPER records contain Bus, Device, Function information which can be used
> > to identify the PCI device which is sending the event.
> >
> > Change pci driver registration to include registration for a CXL CPER
> > notifier to process the events through the trace subsystem.
> >
> > Define and use scoped based management to simplify the handling of the
> > pci device object.
> >
> > Cc: Bjorn Helgaas <[email protected]>
> > Signed-off-by: Ira Weiny <[email protected]>
>
> I'd break out the pci guard stuff as a precursor patch. That's likely
> to be used elsewhere so it would help for backporting for other users
> if it wasn't buried in a patch doing other stuff.

That is good. I've done that.

>
> Not to mention that has a different set of likely reviewers to the rest
> of this patch.

Yep.

>
> More generally maybe we should just hardcode the UUID in the tracepoint
> definitions? I think for everything other than the generic one we
> only ever call trace_cxl_memory_module(... &mem_mod_event_uuid..)
> etc.
>
> It's a little ugly to match on the UUID to call a function where it
> hard coded, but less so than inserting the UUID like this does.
> Better I think to make it obvious that this isn't actually a variable
> (for the ones we understand).

I thought about that a bit. But I built the tracing code with generic
macros which contained the UUID. That complicated my efforts.

I've reworked it again and it took a bit of time but I got it to work. It
was not that hard but there is a caveat in the generic macros, which I
made a note of.

[snip]

> >
> > +#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
> > +static void cxl_cper_event_call(enum cxl_event_type ev_type,
> > + struct cxl_cper_event_rec *rec)
> > +{
> > + struct cper_cxl_event_devid *device_id = &rec->hdr.device_id;
> > + struct pci_dev *pdev __free(pci_dev_put) = NULL;
> > + struct cxl_dev_state *cxlds = NULL;
> > + enum cxl_event_log_type log_type;
> > + unsigned int devfn;
> > + u32 hdr_flags;
> > +
> > + devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
> > + pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
> > + device_id->bus_num, devfn);
> > + if (!pdev)
> > + return;
> > +
> > + guard(device)(&pdev->dev);
> > + if (pdev->driver == &cxl_pci_driver)
> > + cxlds = pci_get_drvdata(pdev);
> > + if (!cxlds)
> > + return;
>
> This is handling two conditions. I'd find it more readable split like:
>
> if (pdev->driver != &cxl_pci_driver)
> return;
>
> cxlds = pci_get_drvdata(pdev);
> if (!cxlds)
> return;
>
> and drop the = NULL above.

Done.

Ira

2023-12-19 23:37:15

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v4 7/7] cxl/memdev: Register for and process CPER events

Ira Weiny wrote:
[..]
> > and drop the = NULL above.
>
> Done.

The NULL assignment was more about making it clear that
__free(pci_dev_put) will take no action until the pdev is acquired.
Otherwise, any future refactoring that introduces a 'return' before
@pdev is acquired needs to be careful to assign @pdev to NULL. So, just
include it in the declaration more as a __free() declaration style issue
than a correctness issue.

2023-12-20 00:29:55

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH v4 7/7] cxl/memdev: Register for and process CPER events

Dan Williams wrote:
> Ira Weiny wrote:
> [..]
> > > and drop the = NULL above.
> >
> > Done.
>
> The NULL assignment was more about making it clear that
> __free(pci_dev_put) will take no action until the pdev is acquired.
> Otherwise, any future refactoring that introduces a 'return' before
> @pdev is acquired needs to be careful to assign @pdev to NULL. So, just
> include it in the declaration more as a __free() declaration style issue
> than a correctness issue.

I think he meant the assignment to cxlds. At least that is the NULL
assignment I took out.

Ira

2024-01-03 10:41:30

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v4 7/7] cxl/memdev: Register for and process CPER events

On Tue, 19 Dec 2023 16:29:08 -0800
Ira Weiny <[email protected]> wrote:

> Dan Williams wrote:
> > Ira Weiny wrote:
> > [..]
> > > > and drop the = NULL above.
> > >
> > > Done.
> >
> > The NULL assignment was more about making it clear that
> > __free(pci_dev_put) will take no action until the pdev is acquired.
> > Otherwise, any future refactoring that introduces a 'return' before
> > @pdev is acquired needs to be careful to assign @pdev to NULL. So, just
> > include it in the declaration more as a __free() declaration style issue
> > than a correctness issue.
>
> I think he meant the assignment to cxlds. At least that is the NULL
> assignment I took out.
>
> Ira

yup. The other one is correct as Dan pointed out.