Series status/background
========================
Smita has been a great help with this series. Thank you again!
Smita's testing found that the GHES code ended up printing the events
twice. This version avoids the duplicate print by calling the callback
from the GHES code instead of the EFI code as suggested by Dan.
Dependencies
============
NOTE this series still depends on Dan's addition of a device guard[1].
Therefore, the base commit is not a stable commit. I've pushed a branch
with this commit included for testing if folks are interested.[2]
[1] https://lore.kernel.org/all/170250854466.1522182.17555361077409628655.stgit@dwillia2-xfh.jf.intel.com/
[2] https://github.com/weiny2/linux-kernel/tree/cxl-cper-2023-12-20
Cover letter
============
CXL Component Events, as defined by EFI 2.10 Section N.2.14, wrap a
mostly CXL event payload in an EFI Common Platform Error Record (CPER)
record. If a device is configured for firmware first CXL event records
are not sent directly to the host.
The CXL sub-system uniquely has DPA to HPA translation information. It
also already has event format tracing. Restructure the code to make
sharing the data between CPER/event logs most efficient. Then send the
CXL CPER records to the CXL sub-system for processing.
With event logs the events interrupt the driver directly. In the EFI
case events are wrapped with device information which allows the CXL
subsystem to identify the PCI device.
Previous version considered matching the memdev differently. However,
the most robust was to find the PCI device via Bus, Device, Function and
use the PCI device to find the driver data.
CPER records are identified with GUID's while CXL event logs contain
UUID's. The UUID is reported for all events no matter the source.
While the UUID is redundant for the known events the UUID's are already
used by rasdaemon. To keep compatibility UUIDs are still reported.
In addition this series cleans up the UUID defines.
Signed-off-by: Ira Weiny <[email protected]>
---
Changes in v5:
- Smita/djbw: trigger trace from ghes_do_proc()
- Jonathan: split out pci scoped based functions to it's own patch
- Jonathan: remove unneeded static uuid variables
- Smita/djbw: trace an unknown event type as a generic with null UUID
- Jonathan: code clean ups
- Link to v4: https://lore.kernel.org/r/[email protected]
---
Ira Weiny (9):
cxl/trace: Pass uuid explicitly to event traces
cxl/events: Promote CXL event structures to a core header
cxl/events: Create common event UUID defines
cxl/events: Remove passing a UUID to known event traces
cxl/events: Separate UUID from event structures
cxl/events: Create a CXL event union
acpi/ghes: Process CXL Component Events
PCI: Define scoped based management functions
cxl/pci: Register for and process CPER events
drivers/acpi/apei/ghes.c | 88 +++++++++++++++++++++++
drivers/cxl/core/mbox.c | 87 +++++++++++------------
drivers/cxl/core/trace.h | 14 ++--
drivers/cxl/cxlmem.h | 110 +++++++----------------------
drivers/cxl/pci.c | 58 ++++++++++++++-
include/linux/cxl-event.h | 162 ++++++++++++++++++++++++++++++++++++++++++
include/linux/pci.h | 2 +
tools/testing/cxl/test/mem.c | 163 ++++++++++++++++++++++++-------------------
8 files changed, 476 insertions(+), 208 deletions(-)
---
base-commit: 6436863dfabce0d7ac416c8dc661fd513b967d39
change-id: 20230601-cxl-cper-26ffc839c6c6
Best regards,
--
Ira Weiny <[email protected]>
The UUID data is redundant in the known event trace types. The addition
of static defines allows the trace macros to create the UUID data inside
the trace thus removing unnecessary code.
Have well known trace events use static data to set the uuid field based
on the event type.
Suggested-by: Jonathan Cameron <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
Changes for v5:
New patch
---
drivers/cxl/core/mbox.c | 6 +++---
drivers/cxl/core/trace.h | 28 ++++++++++++++++------------
2 files changed, 19 insertions(+), 15 deletions(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 1ccc3a56e0af..5f3681de10de 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -846,16 +846,16 @@ static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
struct cxl_event_gen_media *rec =
(struct cxl_event_gen_media *)record;
- trace_cxl_general_media(cxlmd, type, id, rec);
+ trace_cxl_general_media(cxlmd, type, rec);
} else if (uuid_equal(id, &CXL_EVENT_DRAM_UUID)) {
struct cxl_event_dram *rec = (struct cxl_event_dram *)record;
- trace_cxl_dram(cxlmd, type, id, rec);
+ trace_cxl_dram(cxlmd, type, rec);
} else if (uuid_equal(id, &CXL_EVENT_MEM_MODULE_UUID)) {
struct cxl_event_mem_module *rec =
(struct cxl_event_mem_module *)record;
- trace_cxl_memory_module(cxlmd, type, id, rec);
+ trace_cxl_memory_module(cxlmd, type, rec);
} else {
/* For unknown record types print just the header */
trace_cxl_generic_event(cxlmd, type, id, record);
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 3da16026b8db..312cfa9e0004 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -181,6 +181,7 @@ TRACE_EVENT(cxl_overflow,
* 1) Add CXL_EVT_TP_entry to TP_STRUCT__entry
* 2) Use CXL_EVT_TP_fast_assign within TP_fast_assign;
* pass the dev, log, and CXL event header
+ * NOTE: The uuid must be assigned by the specific trace event
* 3) Use CXL_EVT_TP_printk() instead of TP_printk()
*
* See the generic_event tracepoint as an example.
@@ -198,12 +199,11 @@ TRACE_EVENT(cxl_overflow,
__field(u8, hdr_length) \
__field(u8, hdr_maint_op_class)
-#define CXL_EVT_TP_fast_assign(cxlmd, l, uuid, hdr) \
+#define CXL_EVT_TP_fast_assign(cxlmd, l, hdr) \
__assign_str(memdev, dev_name(&(cxlmd)->dev)); \
__assign_str(host, dev_name((cxlmd)->dev.parent)); \
__entry->log = (l); \
__entry->serial = (cxlmd)->cxlds->serial; \
- memcpy(&__entry->hdr_uuid, (uuid), sizeof(uuid_t)); \
__entry->hdr_length = (hdr).length; \
__entry->hdr_flags = get_unaligned_le24((hdr).flags); \
__entry->hdr_handle = le16_to_cpu((hdr).handle); \
@@ -235,7 +235,8 @@ TRACE_EVENT(cxl_generic_event,
),
TP_fast_assign(
- CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
+ CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+ memcpy(&__entry->hdr_uuid, uuid, sizeof(uuid_t));
memcpy(__entry->data, &rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
),
@@ -315,9 +316,9 @@ TRACE_EVENT(cxl_generic_event,
TRACE_EVENT(cxl_general_media,
TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- const uuid_t *uuid, struct cxl_event_gen_media *rec),
+ struct cxl_event_gen_media *rec),
- TP_ARGS(cxlmd, log, uuid, rec),
+ TP_ARGS(cxlmd, log, rec),
TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -336,7 +337,8 @@ TRACE_EVENT(cxl_general_media,
),
TP_fast_assign(
- CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
+ CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+ memcpy(&__entry->hdr_uuid, &CXL_EVENT_GEN_MEDIA_UUID, sizeof(uuid_t));
/* General Media */
__entry->dpa = le64_to_cpu(rec->phys_addr);
@@ -398,9 +400,9 @@ TRACE_EVENT(cxl_general_media,
TRACE_EVENT(cxl_dram,
TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- const uuid_t *uuid, struct cxl_event_dram *rec),
+ struct cxl_event_dram *rec),
- TP_ARGS(cxlmd, log, uuid, rec),
+ TP_ARGS(cxlmd, log, rec),
TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -422,7 +424,8 @@ TRACE_EVENT(cxl_dram,
),
TP_fast_assign(
- CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
+ CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+ memcpy(&__entry->hdr_uuid, &CXL_EVENT_DRAM_UUID, sizeof(uuid_t));
/* DRAM */
__entry->dpa = le64_to_cpu(rec->phys_addr);
@@ -547,9 +550,9 @@ TRACE_EVENT(cxl_dram,
TRACE_EVENT(cxl_memory_module,
TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- const uuid_t *uuid, struct cxl_event_mem_module *rec),
+ struct cxl_event_mem_module *rec),
- TP_ARGS(cxlmd, log, uuid, rec),
+ TP_ARGS(cxlmd, log, rec),
TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -569,7 +572,8 @@ TRACE_EVENT(cxl_memory_module,
),
TP_fast_assign(
- CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
+ CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+ memcpy(&__entry->hdr_uuid, &CXL_EVENT_MEM_MODULE_UUID, sizeof(uuid_t));
/* Memory Module Event */
__entry->event_type = rec->event_type;
--
2.43.0
UEFI code can process CXL events through CPER records. Those records
use almost the same format as the CXL events.
Lift the CXL event structures to a core header to be shared in later
patches.
Signed-off-by: Ira Weiny <[email protected]>
---
drivers/cxl/cxlmem.h | 90 +----------------------------------------
include/linux/cxl-event.h | 100 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 101 insertions(+), 89 deletions(-)
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index a2fcbca253f3..f0e7ebb84f02 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -6,6 +6,7 @@
#include <linux/cdev.h>
#include <linux/uuid.h>
#include <linux/rcuwait.h>
+#include <linux/cxl-event.h>
#include "cxl.h"
/* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
@@ -579,27 +580,6 @@ struct cxl_mbox_identify {
u8 qos_telemetry_caps;
} __packed;
-/*
- * Common Event Record Format
- * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
- */
-struct cxl_event_record_hdr {
- uuid_t id;
- u8 length;
- u8 flags[3];
- __le16 handle;
- __le16 related_handle;
- __le64 timestamp;
- u8 maint_op_class;
- u8 reserved[15];
-} __packed;
-
-#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
-struct cxl_event_record_raw {
- struct cxl_event_record_hdr hdr;
- u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
-} __packed;
-
/*
* Get Event Records output payload
* CXL rev 3.0 section 8.2.9.2.2; Table 8-50
@@ -641,74 +621,6 @@ struct cxl_mbox_clear_event_payload {
} __packed;
#define CXL_CLEAR_EVENT_MAX_HANDLES U8_MAX
-/*
- * General Media Event Record
- * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
- */
-#define CXL_EVENT_GEN_MED_COMP_ID_SIZE 0x10
-struct cxl_event_gen_media {
- struct cxl_event_record_hdr hdr;
- __le64 phys_addr;
- u8 descriptor;
- u8 type;
- u8 transaction_type;
- u8 validity_flags[2];
- u8 channel;
- u8 rank;
- u8 device[3];
- u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
- u8 reserved[46];
-} __packed;
-
-/*
- * DRAM Event Record - DER
- * CXL rev 3.0 section 8.2.9.2.1.2; Table 3-44
- */
-#define CXL_EVENT_DER_CORRECTION_MASK_SIZE 0x20
-struct cxl_event_dram {
- struct cxl_event_record_hdr hdr;
- __le64 phys_addr;
- u8 descriptor;
- u8 type;
- u8 transaction_type;
- u8 validity_flags[2];
- u8 channel;
- u8 rank;
- u8 nibble_mask[3];
- u8 bank_group;
- u8 bank;
- u8 row[3];
- u8 column[2];
- u8 correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
- u8 reserved[0x17];
-} __packed;
-
-/*
- * Get Health Info Record
- * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
- */
-struct cxl_get_health_info {
- u8 health_status;
- u8 media_status;
- u8 add_status;
- u8 life_used;
- u8 device_temp[2];
- u8 dirty_shutdown_cnt[4];
- u8 cor_vol_err_cnt[4];
- u8 cor_per_err_cnt[4];
-} __packed;
-
-/*
- * Memory Module Event Record
- * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
- */
-struct cxl_event_mem_module {
- struct cxl_event_record_hdr hdr;
- u8 event_type;
- struct cxl_get_health_info info;
- u8 reserved[0x3d];
-} __packed;
-
struct cxl_mbox_get_partition_info {
__le64 active_volatile_cap;
__le64 active_persistent_cap;
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
new file mode 100644
index 000000000000..1c94e8fdd227
--- /dev/null
+++ b/include/linux/cxl-event.h
@@ -0,0 +1,100 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_CXL_EVENT_H
+#define _LINUX_CXL_EVENT_H
+
+/*
+ * CXL event records; CXL rev 3.0
+ *
+ * Copyright(c) 2023 Intel Corporation.
+ */
+
+/*
+ * Common Event Record Format
+ * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
+ */
+struct cxl_event_record_hdr {
+ uuid_t id;
+ u8 length;
+ u8 flags[3];
+ __le16 handle;
+ __le16 related_handle;
+ __le64 timestamp;
+ u8 maint_op_class;
+ u8 reserved[15];
+} __packed;
+
+#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
+struct cxl_event_record_raw {
+ struct cxl_event_record_hdr hdr;
+ u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
+} __packed;
+
+/*
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CXL_EVENT_GEN_MED_COMP_ID_SIZE 0x10
+struct cxl_event_gen_media {
+ struct cxl_event_record_hdr hdr;
+ __le64 phys_addr;
+ u8 descriptor;
+ u8 type;
+ u8 transaction_type;
+ u8 validity_flags[2];
+ u8 channel;
+ u8 rank;
+ u8 device[3];
+ u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
+ u8 reserved[46];
+} __packed;
+
+/*
+ * DRAM Event Record - DER
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 3-44
+ */
+#define CXL_EVENT_DER_CORRECTION_MASK_SIZE 0x20
+struct cxl_event_dram {
+ struct cxl_event_record_hdr hdr;
+ __le64 phys_addr;
+ u8 descriptor;
+ u8 type;
+ u8 transaction_type;
+ u8 validity_flags[2];
+ u8 channel;
+ u8 rank;
+ u8 nibble_mask[3];
+ u8 bank_group;
+ u8 bank;
+ u8 row[3];
+ u8 column[2];
+ u8 correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
+ u8 reserved[0x17];
+} __packed;
+
+/*
+ * Get Health Info Record
+ * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
+ */
+struct cxl_get_health_info {
+ u8 health_status;
+ u8 media_status;
+ u8 add_status;
+ u8 life_used;
+ u8 device_temp[2];
+ u8 dirty_shutdown_cnt[4];
+ u8 cor_vol_err_cnt[4];
+ u8 cor_per_err_cnt[4];
+} __packed;
+
+/*
+ * Memory Module Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+struct cxl_event_mem_module {
+ struct cxl_event_record_hdr hdr;
+ u8 event_type;
+ struct cxl_get_health_info info;
+ u8 reserved[0x3d];
+} __packed;
+
+#endif /* _LINUX_CXL_EVENT_H */
--
2.43.0
The UEFI CXL CPER structure does not include the UUID. Now that the
UUID is passed separately to the trace event there is no need to have
the UUID in those structures.
Move UUID from the event record header to the raw structures. Adjust
cxl-test to Create dummy structures for creating test records.
Signed-off-by: Ira Weiny <[email protected]>
---
drivers/cxl/core/mbox.c | 2 +-
include/linux/cxl-event.h | 10 ++--
tools/testing/cxl/test/mem.c | 129 +++++++++++++++++++++++++------------------
3 files changed, 81 insertions(+), 60 deletions(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 5f3681de10de..4c5161896826 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -840,7 +840,7 @@ static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
enum cxl_event_log_type type,
struct cxl_event_record_raw *record)
{
- uuid_t *id = &record->hdr.id;
+ uuid_t *id = &record->id;
if (uuid_equal(id, &CXL_EVENT_GEN_MEDIA_UUID)) {
struct cxl_event_gen_media *rec =
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index 1c94e8fdd227..ebb00ead1496 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -8,12 +8,7 @@
* Copyright(c) 2023 Intel Corporation.
*/
-/*
- * Common Event Record Format
- * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
- */
struct cxl_event_record_hdr {
- uuid_t id;
u8 length;
u8 flags[3];
__le16 handle;
@@ -23,8 +18,13 @@ struct cxl_event_record_hdr {
u8 reserved[15];
} __packed;
+/*
+ * Common Event Record Format
+ * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
+ */
#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
struct cxl_event_record_raw {
+ uuid_t id;
struct cxl_event_record_hdr hdr;
u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
} __packed;
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 5a95b04b329a..9cc2b8ce1efd 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -337,9 +337,9 @@ static void cxl_mock_event_trigger(struct device *dev)
}
struct cxl_event_record_raw maint_needed = {
+ .id = UUID_INIT(0xBA5EBA11, 0xABCD, 0xEFEB,
+ 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
.hdr = {
- .id = UUID_INIT(0xBA5EBA11, 0xABCD, 0xEFEB,
- 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
.length = sizeof(struct cxl_event_record_raw),
.flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
/* .handle = Set dynamically */
@@ -349,9 +349,9 @@ struct cxl_event_record_raw maint_needed = {
};
struct cxl_event_record_raw hardware_replace = {
+ .id = UUID_INIT(0xABCDEFEB, 0xBA11, 0xBA5E,
+ 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
.hdr = {
- .id = UUID_INIT(0xABCDEFEB, 0xBA11, 0xBA5E,
- 0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
.length = sizeof(struct cxl_event_record_raw),
.flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
/* .handle = Set dynamically */
@@ -360,61 +360,82 @@ struct cxl_event_record_raw hardware_replace = {
.data = { 0xDE, 0xAD, 0xBE, 0xEF },
};
-struct cxl_event_gen_media gen_media = {
- .hdr = {
- .id = CXL_EVENT_GEN_MEDIA_UUID,
- .length = sizeof(struct cxl_event_gen_media),
- .flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
- /* .handle = Set dynamically */
- .related_handle = cpu_to_le16(0),
+struct cxl_test_gen_media {
+ uuid_t id;
+ struct cxl_event_gen_media rec;
+} __packed;
+
+struct cxl_test_gen_media gen_media = {
+ .id = CXL_EVENT_GEN_MEDIA_UUID,
+ .rec = {
+ .hdr = {
+ .length = sizeof(struct cxl_test_gen_media),
+ .flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0),
+ },
+ .phys_addr = cpu_to_le64(0x2000),
+ .descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
+ .type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
+ .transaction_type = CXL_GMER_TRANS_HOST_WRITE,
+ /* .validity_flags = <set below> */
+ .channel = 1,
+ .rank = 30
},
- .phys_addr = cpu_to_le64(0x2000),
- .descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
- .type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
- .transaction_type = CXL_GMER_TRANS_HOST_WRITE,
- /* .validity_flags = <set below> */
- .channel = 1,
- .rank = 30
};
-struct cxl_event_dram dram = {
- .hdr = {
- .id = CXL_EVENT_DRAM_UUID,
- .length = sizeof(struct cxl_event_dram),
- .flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
- /* .handle = Set dynamically */
- .related_handle = cpu_to_le16(0),
+struct cxl_test_dram {
+ uuid_t id;
+ struct cxl_event_dram rec;
+} __packed;
+
+struct cxl_test_dram dram = {
+ .id = CXL_EVENT_DRAM_UUID,
+ .rec = {
+ .hdr = {
+ .length = sizeof(struct cxl_test_dram),
+ .flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0),
+ },
+ .phys_addr = cpu_to_le64(0x8000),
+ .descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
+ .type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
+ .transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
+ /* .validity_flags = <set below> */
+ .channel = 1,
+ .bank_group = 5,
+ .bank = 2,
+ .column = {0xDE, 0xAD},
},
- .phys_addr = cpu_to_le64(0x8000),
- .descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
- .type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
- .transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
- /* .validity_flags = <set below> */
- .channel = 1,
- .bank_group = 5,
- .bank = 2,
- .column = {0xDE, 0xAD},
};
-struct cxl_event_mem_module mem_module = {
- .hdr = {
- .id = CXL_EVENT_MEM_MODULE_UUID,
- .length = sizeof(struct cxl_event_mem_module),
- /* .handle = Set dynamically */
- .related_handle = cpu_to_le16(0),
+struct cxl_test_mem_module {
+ uuid_t id;
+ struct cxl_event_mem_module rec;
+} __packed;
+
+struct cxl_test_mem_module mem_module = {
+ .id = CXL_EVENT_MEM_MODULE_UUID,
+ .rec = {
+ .hdr = {
+ .length = sizeof(struct cxl_test_mem_module),
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0),
+ },
+ .event_type = CXL_MMER_TEMP_CHANGE,
+ .info = {
+ .health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
+ .media_status = CXL_DHI_MS_ALL_DATA_LOST,
+ .add_status = (CXL_DHI_AS_CRITICAL << 2) |
+ (CXL_DHI_AS_WARNING << 4) |
+ (CXL_DHI_AS_WARNING << 5),
+ .device_temp = { 0xDE, 0xAD},
+ .dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
+ .cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
+ .cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
+ }
},
- .event_type = CXL_MMER_TEMP_CHANGE,
- .info = {
- .health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
- .media_status = CXL_DHI_MS_ALL_DATA_LOST,
- .add_status = (CXL_DHI_AS_CRITICAL << 2) |
- (CXL_DHI_AS_WARNING << 4) |
- (CXL_DHI_AS_WARNING << 5),
- .device_temp = { 0xDE, 0xAD},
- .dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
- .cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
- .cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
- }
};
static int mock_set_timestamp(struct cxl_dev_state *cxlds,
@@ -436,11 +457,11 @@ static int mock_set_timestamp(struct cxl_dev_state *cxlds,
static void cxl_mock_add_event_logs(struct mock_event_store *mes)
{
put_unaligned_le16(CXL_GMER_VALID_CHANNEL | CXL_GMER_VALID_RANK,
- &gen_media.validity_flags);
+ &gen_media.rec.validity_flags);
put_unaligned_le16(CXL_DER_VALID_CHANNEL | CXL_DER_VALID_BANK_GROUP |
CXL_DER_VALID_BANK | CXL_DER_VALID_COLUMN,
- &dram.validity_flags);
+ &dram.rec.validity_flags);
mes_add_event(mes, CXL_EVENT_TYPE_INFO, &maint_needed);
mes_add_event(mes, CXL_EVENT_TYPE_INFO,
--
2.43.0
Dan points out in review that the cxl_test code could be made better
through the use of UUID's defines rather than being open coded.[1]
Create UUID defines and use them rather than open coding them.
[1] https://lore.kernel.org/all/[email protected]/
Suggested-by: Dan Williams <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
Changes for v5:
[Jonathan: eliminate the static uuid variables]
---
drivers/cxl/core/mbox.c | 30 +++---------------------------
drivers/cxl/cxlmem.h | 24 ++++++++++++++++++++++++
tools/testing/cxl/test/mem.c | 9 +++------
3 files changed, 30 insertions(+), 33 deletions(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 00f429c440df..1ccc3a56e0af 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -836,46 +836,22 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
}
EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
-/*
- * General Media Event Record
- * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
- */
-static const uuid_t gen_media_event_uuid =
- UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
- 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
-
-/*
- * DRAM Event Record
- * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
- */
-static const uuid_t dram_event_uuid =
- UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
- 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
-
-/*
- * Memory Module Event Record
- * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
- */
-static const uuid_t mem_mod_event_uuid =
- UUID_INIT(0xfe927475, 0xdd59, 0x4339,
- 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74);
-
static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
enum cxl_event_log_type type,
struct cxl_event_record_raw *record)
{
uuid_t *id = &record->hdr.id;
- if (uuid_equal(id, &gen_media_event_uuid)) {
+ if (uuid_equal(id, &CXL_EVENT_GEN_MEDIA_UUID)) {
struct cxl_event_gen_media *rec =
(struct cxl_event_gen_media *)record;
trace_cxl_general_media(cxlmd, type, id, rec);
- } else if (uuid_equal(id, &dram_event_uuid)) {
+ } else if (uuid_equal(id, &CXL_EVENT_DRAM_UUID)) {
struct cxl_event_dram *rec = (struct cxl_event_dram *)record;
trace_cxl_dram(cxlmd, type, id, rec);
- } else if (uuid_equal(id, &mem_mod_event_uuid)) {
+ } else if (uuid_equal(id, &CXL_EVENT_MEM_MODULE_UUID)) {
struct cxl_event_mem_module *rec =
(struct cxl_event_mem_module *)record;
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index f0e7ebb84f02..e5d770e26e02 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -580,6 +580,30 @@ struct cxl_mbox_identify {
u8 qos_telemetry_caps;
} __packed;
+/*
+ * General Media Event Record UUID
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CXL_EVENT_GEN_MEDIA_UUID \
+ UUID_INIT(0xfbcd0a77, 0xc260, 0x417f, \
+ 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6)
+
+/*
+ * DRAM Event Record UUID
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+#define CXL_EVENT_DRAM_UUID \
+ UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab, \
+ 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24)
+
+/*
+ * Memory Module Event Record UUID
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+#define CXL_EVENT_MEM_MODULE_UUID \
+ UUID_INIT(0xfe927475, 0xdd59, 0x4339, \
+ 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)
+
/*
* Get Event Records output payload
* CXL rev 3.0 section 8.2.9.2.2; Table 8-50
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index ee61fa3a2411..5a95b04b329a 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -362,8 +362,7 @@ struct cxl_event_record_raw hardware_replace = {
struct cxl_event_gen_media gen_media = {
.hdr = {
- .id = UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
- 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
+ .id = CXL_EVENT_GEN_MEDIA_UUID,
.length = sizeof(struct cxl_event_gen_media),
.flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
/* .handle = Set dynamically */
@@ -380,8 +379,7 @@ struct cxl_event_gen_media gen_media = {
struct cxl_event_dram dram = {
.hdr = {
- .id = UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
- 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
+ .id = CXL_EVENT_DRAM_UUID,
.length = sizeof(struct cxl_event_dram),
.flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
/* .handle = Set dynamically */
@@ -400,8 +398,7 @@ struct cxl_event_dram dram = {
struct cxl_event_mem_module mem_module = {
.hdr = {
- .id = UUID_INIT(0xfe927475, 0xdd59, 0x4339,
- 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),
+ .id = CXL_EVENT_MEM_MODULE_UUID,
.length = sizeof(struct cxl_event_mem_module),
/* .handle = Set dynamically */
.related_handle = cpu_to_le16(0),
--
2.43.0
CXL CPER events are identified by the CPER Section Type GUID. The GUID
correlates with the CXL UUID for the event record. It turns out that a
CXL CPER record is a strict subset of the CXL event record, only the
UUID header field is chopped.
In order to unify handling between native and CPER flavors of CXL
events, prepare the code for the UUID to be passed in rather than
inferred from the record itself.
Later patches update the passed in record to only refer to the common
data between the formats.
Pass the UUID explicitly to each trace event to be able to remove the
UUID from the event structures.
Originally it was desirable to remove the UUID from the well known event
because the UUID value was redundant. However, the trace API was
already in place.[1]
[1] https://lore.kernel.org/all/[email protected]/
Signed-off-by: Ira Weiny <[email protected]>
---
drivers/cxl/core/mbox.c | 8 ++++----
drivers/cxl/core/trace.h | 28 ++++++++++++++--------------
2 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 36270dcfb42e..00f429c440df 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -870,19 +870,19 @@ static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
struct cxl_event_gen_media *rec =
(struct cxl_event_gen_media *)record;
- trace_cxl_general_media(cxlmd, type, rec);
+ trace_cxl_general_media(cxlmd, type, id, rec);
} else if (uuid_equal(id, &dram_event_uuid)) {
struct cxl_event_dram *rec = (struct cxl_event_dram *)record;
- trace_cxl_dram(cxlmd, type, rec);
+ trace_cxl_dram(cxlmd, type, id, rec);
} else if (uuid_equal(id, &mem_mod_event_uuid)) {
struct cxl_event_mem_module *rec =
(struct cxl_event_mem_module *)record;
- trace_cxl_memory_module(cxlmd, type, rec);
+ trace_cxl_memory_module(cxlmd, type, id, rec);
} else {
/* For unknown record types print just the header */
- trace_cxl_generic_event(cxlmd, type, record);
+ trace_cxl_generic_event(cxlmd, type, id, record);
}
}
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index a0b5819bc70b..3da16026b8db 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -198,12 +198,12 @@ TRACE_EVENT(cxl_overflow,
__field(u8, hdr_length) \
__field(u8, hdr_maint_op_class)
-#define CXL_EVT_TP_fast_assign(cxlmd, l, hdr) \
+#define CXL_EVT_TP_fast_assign(cxlmd, l, uuid, hdr) \
__assign_str(memdev, dev_name(&(cxlmd)->dev)); \
__assign_str(host, dev_name((cxlmd)->dev.parent)); \
__entry->log = (l); \
__entry->serial = (cxlmd)->cxlds->serial; \
- memcpy(&__entry->hdr_uuid, &(hdr).id, sizeof(uuid_t)); \
+ memcpy(&__entry->hdr_uuid, (uuid), sizeof(uuid_t)); \
__entry->hdr_length = (hdr).length; \
__entry->hdr_flags = get_unaligned_le24((hdr).flags); \
__entry->hdr_handle = le16_to_cpu((hdr).handle); \
@@ -225,9 +225,9 @@ TRACE_EVENT(cxl_overflow,
TRACE_EVENT(cxl_generic_event,
TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- struct cxl_event_record_raw *rec),
+ const uuid_t *uuid, struct cxl_event_record_raw *rec),
- TP_ARGS(cxlmd, log, rec),
+ TP_ARGS(cxlmd, log, uuid, rec),
TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -235,7 +235,7 @@ TRACE_EVENT(cxl_generic_event,
),
TP_fast_assign(
- CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+ CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
memcpy(__entry->data, &rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
),
@@ -315,9 +315,9 @@ TRACE_EVENT(cxl_generic_event,
TRACE_EVENT(cxl_general_media,
TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- struct cxl_event_gen_media *rec),
+ const uuid_t *uuid, struct cxl_event_gen_media *rec),
- TP_ARGS(cxlmd, log, rec),
+ TP_ARGS(cxlmd, log, uuid, rec),
TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -336,7 +336,7 @@ TRACE_EVENT(cxl_general_media,
),
TP_fast_assign(
- CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+ CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
/* General Media */
__entry->dpa = le64_to_cpu(rec->phys_addr);
@@ -398,9 +398,9 @@ TRACE_EVENT(cxl_general_media,
TRACE_EVENT(cxl_dram,
TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- struct cxl_event_dram *rec),
+ const uuid_t *uuid, struct cxl_event_dram *rec),
- TP_ARGS(cxlmd, log, rec),
+ TP_ARGS(cxlmd, log, uuid, rec),
TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -422,7 +422,7 @@ TRACE_EVENT(cxl_dram,
),
TP_fast_assign(
- CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+ CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
/* DRAM */
__entry->dpa = le64_to_cpu(rec->phys_addr);
@@ -547,9 +547,9 @@ TRACE_EVENT(cxl_dram,
TRACE_EVENT(cxl_memory_module,
TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- struct cxl_event_mem_module *rec),
+ const uuid_t *uuid, struct cxl_event_mem_module *rec),
- TP_ARGS(cxlmd, log, rec),
+ TP_ARGS(cxlmd, log, uuid, rec),
TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -569,7 +569,7 @@ TRACE_EVENT(cxl_memory_module,
),
TP_fast_assign(
- CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+ CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
/* Memory Module Event */
__entry->event_type = rec->event_type;
--
2.43.0
The CXL CPER and event log records share everything but a UUID/GUID in
their structures.
Define a cxl_event union without the UUID/GUID to be shared between the
CPER and event log record formats. Adjust the code to use this union.
Signed-off-by: Ira Weiny <[email protected]>
---
drivers/cxl/core/mbox.c | 32 +++++++++++++-------------------
drivers/cxl/core/trace.h | 8 ++++----
include/linux/cxl-event.h | 23 +++++++++++++++++------
tools/testing/cxl/test/mem.c | 31 ++++++++++++++++++-------------
4 files changed, 52 insertions(+), 42 deletions(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 4c5161896826..06957696247b 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -840,26 +840,17 @@ static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
enum cxl_event_log_type type,
struct cxl_event_record_raw *record)
{
+ union cxl_event *evt = &record->event;
uuid_t *id = &record->id;
- if (uuid_equal(id, &CXL_EVENT_GEN_MEDIA_UUID)) {
- struct cxl_event_gen_media *rec =
- (struct cxl_event_gen_media *)record;
-
- trace_cxl_general_media(cxlmd, type, rec);
- } else if (uuid_equal(id, &CXL_EVENT_DRAM_UUID)) {
- struct cxl_event_dram *rec = (struct cxl_event_dram *)record;
-
- trace_cxl_dram(cxlmd, type, rec);
- } else if (uuid_equal(id, &CXL_EVENT_MEM_MODULE_UUID)) {
- struct cxl_event_mem_module *rec =
- (struct cxl_event_mem_module *)record;
-
- trace_cxl_memory_module(cxlmd, type, rec);
- } else {
- /* For unknown record types print just the header */
- trace_cxl_generic_event(cxlmd, type, id, record);
- }
+ if (uuid_equal(id, &CXL_EVENT_GEN_MEDIA_UUID))
+ trace_cxl_general_media(cxlmd, type, &evt->gen_media);
+ else if (uuid_equal(id, &CXL_EVENT_DRAM_UUID))
+ trace_cxl_dram(cxlmd, type, &evt->dram);
+ else if (uuid_equal(id, &CXL_EVENT_MEM_MODULE_UUID))
+ trace_cxl_memory_module(cxlmd, type, &evt->mem_module);
+ else
+ trace_cxl_generic_event(cxlmd, type, id, &evt->generic);
}
static int cxl_clear_event_record(struct cxl_memdev_state *mds,
@@ -902,7 +893,10 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
*/
i = 0;
for (cnt = 0; cnt < total; cnt++) {
- payload->handles[i++] = get_pl->records[cnt].hdr.handle;
+ struct cxl_event_record_raw *raw = &get_pl->records[cnt];
+ struct cxl_event_generic *gen = &raw->event.generic;
+
+ payload->handles[i++] = gen->hdr.handle;
dev_dbg(mds->cxlds.dev, "Event log '%d': Clearing %u\n", log,
le16_to_cpu(payload->handles[i]));
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 312cfa9e0004..89445435303a 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -225,9 +225,9 @@ TRACE_EVENT(cxl_overflow,
TRACE_EVENT(cxl_generic_event,
TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
- const uuid_t *uuid, struct cxl_event_record_raw *rec),
+ const uuid_t *uuid, struct cxl_event_generic *gen_rec),
- TP_ARGS(cxlmd, log, uuid, rec),
+ TP_ARGS(cxlmd, log, uuid, gen_rec),
TP_STRUCT__entry(
CXL_EVT_TP_entry
@@ -235,9 +235,9 @@ TRACE_EVENT(cxl_generic_event,
),
TP_fast_assign(
- CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+ CXL_EVT_TP_fast_assign(cxlmd, log, gen_rec->hdr);
memcpy(&__entry->hdr_uuid, uuid, sizeof(uuid_t));
- memcpy(__entry->data, &rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
+ memcpy(__entry->data, gen_rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
),
CXL_EVT_TP_printk("%s",
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index ebb00ead1496..18dab4d90dc8 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -18,13 +18,8 @@ struct cxl_event_record_hdr {
u8 reserved[15];
} __packed;
-/*
- * Common Event Record Format
- * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
- */
#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
-struct cxl_event_record_raw {
- uuid_t id;
+struct cxl_event_generic {
struct cxl_event_record_hdr hdr;
u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
} __packed;
@@ -97,4 +92,20 @@ struct cxl_event_mem_module {
u8 reserved[0x3d];
} __packed;
+union cxl_event {
+ struct cxl_event_generic generic;
+ struct cxl_event_gen_media gen_media;
+ struct cxl_event_dram dram;
+ struct cxl_event_mem_module mem_module;
+} __packed;
+
+/*
+ * Common Event Record Format; in event logs
+ * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
+ */
+struct cxl_event_record_raw {
+ uuid_t id;
+ union cxl_event event;
+} __packed;
+
#endif /* _LINUX_CXL_EVENT_H */
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 9cc2b8ce1efd..35ee41e435ab 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -251,7 +251,8 @@ static int mock_get_event(struct device *dev, struct cxl_mbox_cmd *cmd)
for (i = 0; i < CXL_TEST_EVENT_CNT && !event_log_empty(log); i++) {
memcpy(&pl->records[i], event_get_current(log),
sizeof(pl->records[i]));
- pl->records[i].hdr.handle = event_get_cur_event_handle(log);
+ pl->records[i].event.generic.hdr.handle =
+ event_get_cur_event_handle(log);
log->cur_idx++;
}
@@ -339,25 +340,29 @@ static void cxl_mock_event_trigger(struct device *dev)
struct cxl_event_record_raw maint_needed = {
.id = UUID_INIT(0xBA5EBA11, 0xABCD, 0xEFEB,
0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
- .hdr = {
- .length = sizeof(struct cxl_event_record_raw),
- .flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
- /* .handle = Set dynamically */
- .related_handle = cpu_to_le16(0xa5b6),
+ .event.generic = {
+ .hdr = {
+ .length = sizeof(struct cxl_event_record_raw),
+ .flags[0] = CXL_EVENT_RECORD_FLAG_MAINT_NEEDED,
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0xa5b6),
+ },
+ .data = { 0xDE, 0xAD, 0xBE, 0xEF },
},
- .data = { 0xDE, 0xAD, 0xBE, 0xEF },
};
struct cxl_event_record_raw hardware_replace = {
.id = UUID_INIT(0xABCDEFEB, 0xBA11, 0xBA5E,
0xa5, 0x5a, 0xa5, 0x5a, 0xa5, 0xa5, 0x5a, 0xa5),
- .hdr = {
- .length = sizeof(struct cxl_event_record_raw),
- .flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
- /* .handle = Set dynamically */
- .related_handle = cpu_to_le16(0xb6a5),
+ .event.generic = {
+ .hdr = {
+ .length = sizeof(struct cxl_event_record_raw),
+ .flags[0] = CXL_EVENT_RECORD_FLAG_HW_REPLACE,
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0xb6a5),
+ },
+ .data = { 0xDE, 0xAD, 0xBE, 0xEF },
},
- .data = { 0xDE, 0xAD, 0xBE, 0xEF },
};
struct cxl_test_gen_media {
--
2.43.0
Users of pci_dev_get() can benefit from a scoped based put. Also,
locking a PCI device is often done within a single scope.
Define a pci_dev_put() free function and a PCI device lock guard. These
will initially be used in new CXL event processing code but is defined
in a separate patch for others to pickup and use/backport easier.
Cc: Bjorn Helgaas <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
Changes for v5:
[Jonathan: New patch]
---
include/linux/pci.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 60ca768bc867..290d0a2651b2 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1170,6 +1170,7 @@ int pci_get_interrupt_pin(struct pci_dev *dev, struct pci_dev **bridge);
u8 pci_common_swizzle(struct pci_dev *dev, u8 *pinp);
struct pci_dev *pci_dev_get(struct pci_dev *dev);
void pci_dev_put(struct pci_dev *dev);
+DEFINE_FREE(pci_dev_put, struct pci_dev *, if (_T) pci_dev_put(_T))
void pci_remove_bus(struct pci_bus *b);
void pci_stop_and_remove_bus_device(struct pci_dev *dev);
void pci_stop_and_remove_bus_device_locked(struct pci_dev *dev);
@@ -1871,6 +1872,7 @@ void pci_cfg_access_unlock(struct pci_dev *dev);
void pci_dev_lock(struct pci_dev *dev);
int pci_dev_trylock(struct pci_dev *dev);
void pci_dev_unlock(struct pci_dev *dev);
+DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T))
/*
* PCI domain support. Sometimes called PCI segment (eg by ACPI),
--
2.43.0
BIOS can configure memory devices as firmware first. This will send CXL
events to the firmware instead of the OS. The firmware can then send
these events to the OS via UEFI.
UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER)
format for CXL Component Events. The format is mostly the same as the
CXL Common Event Record Format. The difference is the use of a GUID in
the Section Type rather than a UUID as part of the event itself.
Add GHES support to detect CXL CPER records and call a registered
callback with the event.
A notifier chain was considered for the callback but the complexity did
not justify the use case as only the CXL subsystem requires this event.
Enforce that only one callback can be registered at any time.
Cc: Ard Biesheuvel <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
Ard, Due to Dan's feedback[1] the code is moved to ghes which is
entirely in Rafael's subsystem. That said I have left you on the CC so
you know what is going on.
Changes for v5
[Smita/djbw: move section type parsing and callback to ghes_do_proc()]
[Smita/djbw: Fix __packed usage]
[djbw/Jonathan: s/notifier/callback/]
[Jonathan: fix typo]
[iweiny: Eliminate parsing the section type a 2nd time]
[iweiny: update error messages]
[1] https://lore.kernel.org/all/6580b21723b2c_269bd294f8@dwillia2-mobl3.amr.corp.intel.com.notmuch/
---
drivers/acpi/apei/ghes.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++
include/linux/cxl-event.h | 50 +++++++++++++++++++++++++++
2 files changed, 138 insertions(+)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 63ad0541db38..aed465d2fd68 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -26,6 +26,7 @@
#include <linux/interrupt.h>
#include <linux/timer.h>
#include <linux/cper.h>
+#include <linux/cxl-event.h>
#include <linux/platform_device.h>
#include <linux/mutex.h>
#include <linux/ratelimit.h>
@@ -657,6 +658,78 @@ static void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
schedule_work(&entry->work);
}
+/*
+ * Only a single callback can be registered for CXL CPER events.
+ */
+static DECLARE_RWSEM(cxl_cper_rw_sem);
+static cxl_cper_callback cper_callback;
+
+/* CXL Event record UUIDs are formated as GUIDs and reported in section type */
+
+/*
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CPER_SEC_CXL_GEN_MEDIA_GUID \
+ GUID_INIT(0xfbcd0a77, 0xc260, 0x417f, \
+ 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6)
+
+/*
+ * DRAM Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+#define CPER_SEC_CXL_DRAM_GUID \
+ GUID_INIT(0x601dcbb3, 0x9c06, 0x4eab, \
+ 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24)
+
+/*
+ * Memory Module Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+#define CPER_SEC_CXL_MEM_MODULE_GUID \
+ GUID_INIT(0xfe927475, 0xdd59, 0x4339, \
+ 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)
+
+static void cxl_cper_post_event(enum cxl_event_type event_type,
+ struct cxl_cper_event_rec *rec)
+{
+ if (rec->hdr.length <= sizeof(rec->hdr) ||
+ rec->hdr.length > sizeof(*rec)) {
+ pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n",
+ rec->hdr.length);
+ return;
+ }
+
+ if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) {
+ pr_err(FW_WARN "CXL CPER invalid event\n");
+ return;
+ }
+
+ guard(rwsem_read)(&cxl_cper_rw_sem);
+ if (cper_callback)
+ cper_callback(event_type, rec);
+}
+
+int cxl_cper_register_callback(cxl_cper_callback callback)
+{
+ guard(rwsem_write)(&cxl_cper_rw_sem);
+ if (cper_callback)
+ return -EINVAL;
+ cper_callback = callback;
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_cper_register_callback, CXL);
+
+int cxl_cper_unregister_callback(cxl_cper_callback callback)
+{
+ guard(rwsem_write)(&cxl_cper_rw_sem);
+ if (callback != cper_callback)
+ return -EINVAL;
+ cper_callback = NULL;
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_callback, CXL);
+
static bool ghes_do_proc(struct ghes *ghes,
const struct acpi_hest_generic_status *estatus)
{
@@ -690,6 +763,21 @@ static bool ghes_do_proc(struct ghes *ghes,
}
else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
queued = ghes_handle_arm_hw_error(gdata, sev);
+ }
+ else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) {
+ struct cxl_cper_event_rec *rec = acpi_hest_get_payload(gdata);
+
+ cxl_cper_post_event(CXL_CPER_EVENT_GEN_MEDIA, rec);
+ }
+ else if (guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID)) {
+ struct cxl_cper_event_rec *rec = acpi_hest_get_payload(gdata);
+
+ cxl_cper_post_event(CXL_CPER_EVENT_DRAM, rec);
+ }
+ else if (guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE_GUID)) {
+ struct cxl_cper_event_rec *rec = acpi_hest_get_payload(gdata);
+
+ cxl_cper_post_event(CXL_CPER_EVENT_MEM_MODULE, rec);
} else {
void *err = acpi_hest_get_payload(gdata);
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index 18dab4d90dc8..71e3646f7569 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -108,4 +108,54 @@ struct cxl_event_record_raw {
union cxl_event event;
} __packed;
+enum cxl_event_type {
+ CXL_CPER_EVENT_GEN_MEDIA,
+ CXL_CPER_EVENT_DRAM,
+ CXL_CPER_EVENT_MEM_MODULE,
+};
+
+#define CPER_CXL_DEVICE_ID_VALID BIT(0)
+#define CPER_CXL_DEVICE_SN_VALID BIT(1)
+#define CPER_CXL_COMP_EVENT_LOG_VALID BIT(2)
+struct cxl_cper_event_rec {
+ struct {
+ u32 length;
+ u64 validation_bits;
+ struct cper_cxl_event_devid {
+ u16 vendor_id;
+ u16 device_id;
+ u8 func_num;
+ u8 device_num;
+ u8 bus_num;
+ u16 segment_num;
+ u16 slot_num; /* bits 2:0 reserved */
+ u8 reserved;
+ } __packed device_id;
+ struct cper_cxl_event_sn {
+ u32 lower_dw;
+ u32 upper_dw;
+ } __packed dev_serial_num;
+ } __packed hdr;
+
+ union cxl_event event;
+} __packed;
+
+typedef void (*cxl_cper_callback)(enum cxl_event_type type,
+ struct cxl_cper_event_rec *rec);
+
+#ifdef CONFIG_ACPI_APEI_GHES
+int cxl_cper_register_callback(cxl_cper_callback callback);
+int cxl_cper_unregister_callback(cxl_cper_callback callback);
+#else
+static inline int cxl_cper_register_callback(cxl_cper_callback callback)
+{
+ return 0;
+}
+
+static inline int cxl_cper_unregister_callback(cxl_cper_callback callback)
+{
+ return 0;
+}
+#endif
+
#endif /* _LINUX_CXL_EVENT_H */
--
2.43.0
If the firmware has configured CXL event support to be firmware first
the OS can process those events through CPER records. The CXL layer has
unique DPA to HPA knowledge and standard event trace parsing in place.
CPER records contain Bus, Device, Function information which can be used
to identify the PCI device which is sending the event.
Change the PCI driver registration to include registration of a CXL
CPER callback to process events through the trace subsystem.
Use new scoped based management to simplify the handling of the PCI
device object.
NOTE this patch depends on Dan's addition of a device guard[1].
[1] https://lore.kernel.org/all/170250854466.1522182.17555361077409628655.stgit@dwillia2-xfh.jf.intel.com/
---
Changes for v5:
[Smita/djbw: trace a generic UUID if the type is unknown]
[Jonathan: clean up pci and device state error handling]
[iweiny: consolidate the trace function]
---
drivers/cxl/core/mbox.c | 49 ++++++++++++++++++++++++++++-----------
drivers/cxl/cxlmem.h | 4 ++++
drivers/cxl/pci.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++-
include/linux/cxl-event.h | 1 +
4 files changed, 98 insertions(+), 14 deletions(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 06957696247b..b801faaccd45 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -836,21 +836,44 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
}
EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
-static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
- enum cxl_event_log_type type,
- struct cxl_event_record_raw *record)
+void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
+ enum cxl_event_log_type type,
+ enum cxl_event_type event_type,
+ const uuid_t *uuid, union cxl_event *evt)
{
- union cxl_event *evt = &record->event;
- uuid_t *id = &record->id;
-
- if (uuid_equal(id, &CXL_EVENT_GEN_MEDIA_UUID))
+ switch (event_type) {
+ case CXL_CPER_EVENT_GEN_MEDIA:
trace_cxl_general_media(cxlmd, type, &evt->gen_media);
- else if (uuid_equal(id, &CXL_EVENT_DRAM_UUID))
+ break;
+ case CXL_CPER_EVENT_DRAM:
trace_cxl_dram(cxlmd, type, &evt->dram);
- else if (uuid_equal(id, &CXL_EVENT_MEM_MODULE_UUID))
+ break;
+ case CXL_CPER_EVENT_MEM_MODULE:
trace_cxl_memory_module(cxlmd, type, &evt->mem_module);
- else
- trace_cxl_generic_event(cxlmd, type, id, &evt->generic);
+ break;
+ case CXL_CPER_EVENT_GENERIC:
+ default:
+ trace_cxl_generic_event(cxlmd, type, uuid, &evt->generic);
+ break;
+ }
+}
+EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, CXL);
+
+static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
+ enum cxl_event_log_type type,
+ struct cxl_event_record_raw *record)
+{
+ enum cxl_event_type ev_type = CXL_CPER_EVENT_GENERIC;
+ const uuid_t *uuid = &record->id;
+
+ if (uuid_equal(uuid, &CXL_EVENT_GEN_MEDIA_UUID))
+ ev_type = CXL_CPER_EVENT_GEN_MEDIA;
+ else if (uuid_equal(uuid, &CXL_EVENT_DRAM_UUID))
+ ev_type = CXL_CPER_EVENT_DRAM;
+ else if (uuid_equal(uuid, &CXL_EVENT_MEM_MODULE_UUID))
+ ev_type = CXL_CPER_EVENT_MEM_MODULE;
+
+ cxl_event_trace_record(cxlmd, type, ev_type, uuid, &record->event);
}
static int cxl_clear_event_record(struct cxl_memdev_state *mds,
@@ -961,8 +984,8 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
break;
for (i = 0; i < nr_rec; i++)
- cxl_event_trace_record(cxlmd, type,
- &payload->records[i]);
+ __cxl_event_trace_record(cxlmd, type,
+ &payload->records[i]);
if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
trace_cxl_overflow(cxlmd, type, payload);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index e5d770e26e02..80076c235073 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -802,6 +802,10 @@ void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
unsigned long *cmds);
void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status);
+void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
+ enum cxl_event_log_type type,
+ enum cxl_event_type event_type,
+ const uuid_t *uuid, union cxl_event *evt);
int cxl_set_timestamp(struct cxl_memdev_state *mds);
int cxl_poison_state_init(struct cxl_memdev_state *mds);
int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 0155fb66b580..b14237f824cf 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
+#include <asm-generic/unaligned.h>
#include <linux/io-64-nonatomic-lo-hi.h>
#include <linux/moduleparam.h>
#include <linux/module.h>
@@ -969,6 +970,61 @@ static struct pci_driver cxl_pci_driver = {
},
};
+#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
+static void cxl_cper_event_call(enum cxl_event_type ev_type,
+ struct cxl_cper_event_rec *rec)
+{
+ struct cper_cxl_event_devid *device_id = &rec->hdr.device_id;
+ struct pci_dev *pdev __free(pci_dev_put) = NULL;
+ enum cxl_event_log_type log_type;
+ struct cxl_dev_state *cxlds;
+ unsigned int devfn;
+ u32 hdr_flags;
+
+ devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
+ pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
+ device_id->bus_num, devfn);
+ if (!pdev)
+ return;
+
+ guard(device)(&pdev->dev);
+ if (pdev->driver != &cxl_pci_driver)
+ return;
+
+ cxlds = pci_get_drvdata(pdev);
+ if (!cxlds)
+ return;
+
+ /* Fabricate a log type */
+ hdr_flags = get_unaligned_le24(rec->event.generic.hdr.flags);
+ log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
+
+ cxl_event_trace_record(cxlds->cxlmd, log_type, ev_type,
+ &uuid_null, &rec->event);
+}
+
+static int __init cxl_pci_driver_init(void)
+{
+ int rc;
+
+ rc = pci_register_driver(&cxl_pci_driver);
+ if (rc)
+ return rc;
+
+ rc = cxl_cper_register_callback(cxl_cper_event_call);
+ if (rc)
+ pci_unregister_driver(&cxl_pci_driver);
+
+ return rc;
+}
+
+static void __exit cxl_pci_driver_exit(void)
+{
+ cxl_cper_unregister_callback(cxl_cper_event_call);
+ pci_unregister_driver(&cxl_pci_driver);
+}
+
+module_init(cxl_pci_driver_init);
+module_exit(cxl_pci_driver_exit);
MODULE_LICENSE("GPL v2");
-module_pci_driver(cxl_pci_driver);
MODULE_IMPORT_NS(CXL);
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index 71e3646f7569..17eadee819b6 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -109,6 +109,7 @@ struct cxl_event_record_raw {
} __packed;
enum cxl_event_type {
+ CXL_CPER_EVENT_GENERIC,
CXL_CPER_EVENT_GEN_MEDIA,
CXL_CPER_EVENT_DRAM,
CXL_CPER_EVENT_MEM_MODULE,
--
2.43.0
Hi Ira,
I tested these patches. It works as expected.
Tested-by: Smita-Koralahalli <[email protected]>
Reviewed-by: Smita-Koralahalli <[email protected]>
Since, the trace support for FW-First Protocol errors are missing I
wrote a patch for it. I reused the existing registered callback
cxl_cper_callback making some changes to it. Please take a look and let
me know what you think. I'm not sure if its appropriate to reuse the
existing callback or define a new one..
https://lore.kernel.org/linux-cxl/[email protected]/T/#t
Thanks,
Smita
On 12/20/2023 4:17 PM, Ira Weiny wrote:
> If the firmware has configured CXL event support to be firmware first
> the OS can process those events through CPER records. The CXL layer has
> unique DPA to HPA knowledge and standard event trace parsing in place.
>
> CPER records contain Bus, Device, Function information which can be used
> to identify the PCI device which is sending the event.
>
> Change the PCI driver registration to include registration of a CXL
> CPER callback to process events through the trace subsystem.
>
> Use new scoped based management to simplify the handling of the PCI
> device object.
>
> NOTE this patch depends on Dan's addition of a device guard[1].
>
> [1] https://lore.kernel.org/all/170250854466.1522182.17555361077409628655.stgit@dwillia2-xfh.jf.intel.com/
>
> ---
> Changes for v5:
> [Smita/djbw: trace a generic UUID if the type is unknown]
> [Jonathan: clean up pci and device state error handling]
> [iweiny: consolidate the trace function]
> ---
> drivers/cxl/core/mbox.c | 49 ++++++++++++++++++++++++++++-----------
> drivers/cxl/cxlmem.h | 4 ++++
> drivers/cxl/pci.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++-
> include/linux/cxl-event.h | 1 +
> 4 files changed, 98 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 06957696247b..b801faaccd45 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -836,21 +836,44 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>
> -static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> - enum cxl_event_log_type type,
> - struct cxl_event_record_raw *record)
> +void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> + enum cxl_event_log_type type,
> + enum cxl_event_type event_type,
> + const uuid_t *uuid, union cxl_event *evt)
> {
> - union cxl_event *evt = &record->event;
> - uuid_t *id = &record->id;
> -
> - if (uuid_equal(id, &CXL_EVENT_GEN_MEDIA_UUID))
> + switch (event_type) {
> + case CXL_CPER_EVENT_GEN_MEDIA:
> trace_cxl_general_media(cxlmd, type, &evt->gen_media);
> - else if (uuid_equal(id, &CXL_EVENT_DRAM_UUID))
> + break;
> + case CXL_CPER_EVENT_DRAM:
> trace_cxl_dram(cxlmd, type, &evt->dram);
> - else if (uuid_equal(id, &CXL_EVENT_MEM_MODULE_UUID))
> + break;
> + case CXL_CPER_EVENT_MEM_MODULE:
> trace_cxl_memory_module(cxlmd, type, &evt->mem_module);
> - else
> - trace_cxl_generic_event(cxlmd, type, id, &evt->generic);
> + break;
> + case CXL_CPER_EVENT_GENERIC:
> + default:
> + trace_cxl_generic_event(cxlmd, type, uuid, &evt->generic);
> + break;
> + }
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, CXL);
> +
> +static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> + enum cxl_event_log_type type,
> + struct cxl_event_record_raw *record)
> +{
> + enum cxl_event_type ev_type = CXL_CPER_EVENT_GENERIC;
> + const uuid_t *uuid = &record->id;
> +
> + if (uuid_equal(uuid, &CXL_EVENT_GEN_MEDIA_UUID))
> + ev_type = CXL_CPER_EVENT_GEN_MEDIA;
> + else if (uuid_equal(uuid, &CXL_EVENT_DRAM_UUID))
> + ev_type = CXL_CPER_EVENT_DRAM;
> + else if (uuid_equal(uuid, &CXL_EVENT_MEM_MODULE_UUID))
> + ev_type = CXL_CPER_EVENT_MEM_MODULE;
> +
> + cxl_event_trace_record(cxlmd, type, ev_type, uuid, &record->event);
> }
>
> static int cxl_clear_event_record(struct cxl_memdev_state *mds,
> @@ -961,8 +984,8 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> break;
>
> for (i = 0; i < nr_rec; i++)
> - cxl_event_trace_record(cxlmd, type,
> - &payload->records[i]);
> + __cxl_event_trace_record(cxlmd, type,
> + &payload->records[i]);
>
> if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
> trace_cxl_overflow(cxlmd, type, payload);
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index e5d770e26e02..80076c235073 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -802,6 +802,10 @@ void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
> void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
> unsigned long *cmds);
> void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status);
> +void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> + enum cxl_event_log_type type,
> + enum cxl_event_type event_type,
> + const uuid_t *uuid, union cxl_event *evt);
> int cxl_set_timestamp(struct cxl_memdev_state *mds);
> int cxl_poison_state_init(struct cxl_memdev_state *mds);
> int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 0155fb66b580..b14237f824cf 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1,5 +1,6 @@
> // SPDX-License-Identifier: GPL-2.0-only
> /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> +#include <asm-generic/unaligned.h>
> #include <linux/io-64-nonatomic-lo-hi.h>
> #include <linux/moduleparam.h>
> #include <linux/module.h>
> @@ -969,6 +970,61 @@ static struct pci_driver cxl_pci_driver = {
> },
> };
>
> +#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
> +static void cxl_cper_event_call(enum cxl_event_type ev_type,
> + struct cxl_cper_event_rec *rec)
> +{
> + struct cper_cxl_event_devid *device_id = &rec->hdr.device_id;
> + struct pci_dev *pdev __free(pci_dev_put) = NULL;
> + enum cxl_event_log_type log_type;
> + struct cxl_dev_state *cxlds;
> + unsigned int devfn;
> + u32 hdr_flags;
> +
> + devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
> + pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
> + device_id->bus_num, devfn);
> + if (!pdev)
> + return;
> +
> + guard(device)(&pdev->dev);
> + if (pdev->driver != &cxl_pci_driver)
> + return;
> +
> + cxlds = pci_get_drvdata(pdev);
> + if (!cxlds)
> + return;
> +
> + /* Fabricate a log type */
> + hdr_flags = get_unaligned_le24(rec->event.generic.hdr.flags);
> + log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
> +
> + cxl_event_trace_record(cxlds->cxlmd, log_type, ev_type,
> + &uuid_null, &rec->event);
> +}
> +
> +static int __init cxl_pci_driver_init(void)
> +{
> + int rc;
> +
> + rc = pci_register_driver(&cxl_pci_driver);
> + if (rc)
> + return rc;
> +
> + rc = cxl_cper_register_callback(cxl_cper_event_call);
> + if (rc)
> + pci_unregister_driver(&cxl_pci_driver);
> +
> + return rc;
> +}
> +
> +static void __exit cxl_pci_driver_exit(void)
> +{
> + cxl_cper_unregister_callback(cxl_cper_event_call);
> + pci_unregister_driver(&cxl_pci_driver);
> +}
> +
> +module_init(cxl_pci_driver_init);
> +module_exit(cxl_pci_driver_exit);
> MODULE_LICENSE("GPL v2");
> -module_pci_driver(cxl_pci_driver);
> MODULE_IMPORT_NS(CXL);
> diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
> index 71e3646f7569..17eadee819b6 100644
> --- a/include/linux/cxl-event.h
> +++ b/include/linux/cxl-event.h
> @@ -109,6 +109,7 @@ struct cxl_event_record_raw {
> } __packed;
>
> enum cxl_event_type {
> + CXL_CPER_EVENT_GENERIC,
> CXL_CPER_EVENT_GEN_MEDIA,
> CXL_CPER_EVENT_DRAM,
> CXL_CPER_EVENT_MEM_MODULE,
>
Smita Koralahalli wrote:
> Hi Ira,
>
> I tested these patches. It works as expected.
>
> Tested-by: Smita-Koralahalli <[email protected]>
> Reviewed-by: Smita-Koralahalli <[email protected]>
Thank you!
>
> Since, the trace support for FW-First Protocol errors are missing I
> wrote a patch for it. I reused the existing registered callback
> cxl_cper_callback making some changes to it. Please take a look and let
> me know what you think. I'm not sure if its appropriate to reuse the
> existing callback or define a new one..
>
> https://lore.kernel.org/linux-cxl/[email protected]/T/#t
Awesome! Yea I just went through it.
Thank you again for all the testing!
Ira
[snip]
Ira Weiny wrote:
> If the firmware has configured CXL event support to be firmware first
> the OS can process those events through CPER records. The CXL layer has
> unique DPA to HPA knowledge and standard event trace parsing in place.
>
> CPER records contain Bus, Device, Function information which can be used
> to identify the PCI device which is sending the event.
>
> Change the PCI driver registration to include registration of a CXL
> CPER callback to process events through the trace subsystem.
>
> Use new scoped based management to simplify the handling of the PCI
> device object.
>
> NOTE this patch depends on Dan's addition of a device guard[1].
Now that you added guard(pci_dev) earlier in the series you can just use
that here rather than guard(device).
[..]
> +#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
> +static void cxl_cper_event_call(enum cxl_event_type ev_type,
> + struct cxl_cper_event_rec *rec)
> +{
> + struct cper_cxl_event_devid *device_id = &rec->hdr.device_id;
> + struct pci_dev *pdev __free(pci_dev_put) = NULL;
> + enum cxl_event_log_type log_type;
> + struct cxl_dev_state *cxlds;
> + unsigned int devfn;
> + u32 hdr_flags;
> +
> + devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
> + pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
> + device_id->bus_num, devfn);
> + if (!pdev)
> + return;
> +
> + guard(device)(&pdev->dev);
Per above:
guard(pci_dev)(pdev);
> + if (pdev->driver != &cxl_pci_driver)
> + return;
> +
> + cxlds = pci_get_drvdata(pdev);
> + if (!cxlds)
> + return;
> +
> + /* Fabricate a log type */
> + hdr_flags = get_unaligned_le24(rec->event.generic.hdr.flags);
> + log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
> +
> + cxl_event_trace_record(cxlds->cxlmd, log_type, ev_type,
> + &uuid_null, &rec->event);
> +}
> +
> +static int __init cxl_pci_driver_init(void)
> +{
> + int rc;
> +
> + rc = pci_register_driver(&cxl_pci_driver);
> + if (rc)
> + return rc;
> +
> + rc = cxl_cper_register_callback(cxl_cper_event_call);
> + if (rc)
> + pci_unregister_driver(&cxl_pci_driver);
I think this order should be flipped. That way any errors that might
arrive due to activity caused by probing have a chance to be serviced.
Any that fire while initial probing is happening will pause in
cxl_cper_event_call() and wait for probing to complete. Of course if
probing fails, all is lost, but I think there is some incremental
benefit to trying to catch those early records for things that are not
probing fatal.
[ add linux-pci ]
Ira Weiny wrote:
> Users of pci_dev_get() can benefit from a scoped based put. Also,
> locking a PCI device is often done within a single scope.
>
> Define a pci_dev_put() free function and a PCI device lock guard. These
> will initially be used in new CXL event processing code but is defined
> in a separate patch for others to pickup and use/backport easier.
Hi Bjorn,
Any heartburn if I take this through cxl.git with the rest in this
series? Patch 9 has a dependency on this one.
>
> Cc: Bjorn Helgaas <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>
>
> ---
> Changes for v5:
> [Jonathan: New patch]
> ---
> include/linux/pci.h | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 60ca768bc867..290d0a2651b2 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1170,6 +1170,7 @@ int pci_get_interrupt_pin(struct pci_dev *dev, struct pci_dev **bridge);
> u8 pci_common_swizzle(struct pci_dev *dev, u8 *pinp);
> struct pci_dev *pci_dev_get(struct pci_dev *dev);
> void pci_dev_put(struct pci_dev *dev);
> +DEFINE_FREE(pci_dev_put, struct pci_dev *, if (_T) pci_dev_put(_T))
> void pci_remove_bus(struct pci_bus *b);
> void pci_stop_and_remove_bus_device(struct pci_dev *dev);
> void pci_stop_and_remove_bus_device_locked(struct pci_dev *dev);
> @@ -1871,6 +1872,7 @@ void pci_cfg_access_unlock(struct pci_dev *dev);
> void pci_dev_lock(struct pci_dev *dev);
> int pci_dev_trylock(struct pci_dev *dev);
> void pci_dev_unlock(struct pci_dev *dev);
> +DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T))
>
> /*
> * PCI domain support. Sometimes called PCI segment (eg by ACPI),
>
> --
> 2.43.0
>
On Wed, Jan 03, 2024 at 02:38:57PM -0800, Dan Williams wrote:
> Ira Weiny wrote:
> > Users of pci_dev_get() can benefit from a scoped based put. Also,
> > locking a PCI device is often done within a single scope.
> >
> > Define a pci_dev_put() free function and a PCI device lock guard. These
> > will initially be used in new CXL event processing code but is defined
> > in a separate patch for others to pickup and use/backport easier.
>
> Any heartburn if I take this through cxl.git with the rest in this
> series? Patch 9 has a dependency on this one.
No real heartburn. I was trying to figure out what this does
since I'm not familiar with the "scoped based put" idea and
'git grep -i "scope.*base"' wasn't much help.
I would kind of like the commit log to say a little more about what
the "scoped based put" (does that have too many past tenses in it?)
means and how users of pci_dev_get() will benefit.
I don't know what "locking a PCI device is often done within a single
scope" is trying to tell me either. What if it's *not* done within a
single scope?
Does this change anything for callers of pci_dev_get() and
pci_dev_put()?
Does this avoid a common programming error? I just don't know what
the benefit of this is yet.
I'm sure this is really cool stuff, but there's little documentation,
few existing users, and I don't know what I need to look for when
reviewing things.
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -1170,6 +1170,7 @@ int pci_get_interrupt_pin(struct pci_dev *dev, struct pci_dev **bridge);
> > u8 pci_common_swizzle(struct pci_dev *dev, u8 *pinp);
> > struct pci_dev *pci_dev_get(struct pci_dev *dev);
> > void pci_dev_put(struct pci_dev *dev);
> > +DEFINE_FREE(pci_dev_put, struct pci_dev *, if (_T) pci_dev_put(_T))
> > void pci_remove_bus(struct pci_bus *b);
> > void pci_stop_and_remove_bus_device(struct pci_dev *dev);
> > void pci_stop_and_remove_bus_device_locked(struct pci_dev *dev);
> > @@ -1871,6 +1872,7 @@ void pci_cfg_access_unlock(struct pci_dev *dev);
> > void pci_dev_lock(struct pci_dev *dev);
> > int pci_dev_trylock(struct pci_dev *dev);
> > void pci_dev_unlock(struct pci_dev *dev);
> > +DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T))
> >
> > /*
> > * PCI domain support. Sometimes called PCI segment (eg by ACPI),
> >
> > --
> > 2.43.0
> >
>
>
Bjorn Helgaas wrote:
> On Wed, Jan 03, 2024 at 02:38:57PM -0800, Dan Williams wrote:
> > Ira Weiny wrote:
> > > Users of pci_dev_get() can benefit from a scoped based put. Also,
> > > locking a PCI device is often done within a single scope.
> > >
> > > Define a pci_dev_put() free function and a PCI device lock guard. These
> > > will initially be used in new CXL event processing code but is defined
> > > in a separate patch for others to pickup and use/backport easier.
> >
> > Any heartburn if I take this through cxl.git with the rest in this
> > series? Patch 9 has a dependency on this one.
>
> No real heartburn. I was trying to figure out what this does
> since I'm not familiar with the "scoped based put" idea and
> 'git grep -i "scope.*base"' wasn't much help.
>
> I would kind of like the commit log to say a little more about what
> the "scoped based put" (does that have too many past tenses in it?)
> means and how users of pci_dev_get() will benefit.
That is completely fair, and I should have checked to make sure that the
changelog clarified the impact of the change.
> I don't know what "locking a PCI device is often done within a single
> scope" is trying to tell me either. What if it's *not* done within a
> single scope?
>
> Does this change anything for callers of pci_dev_get() and
> pci_dev_put()?
>
> Does this avoid a common programming error? I just don't know what
> the benefit of this is yet.
>
> I'm sure this is really cool stuff, but there's little documentation,
> few existing users, and I don't know what I need to look for when
> reviewing things.
Here a is a re-write of the changelog:
---
PCI: Introduce cleanup helpers for device reference counts and locks
The "goto error" pattern is notorious for introducing subtle resource
leaks. Use the new cleanup.h helpers for PCI device reference counts and
locks.
Similar to the new put_device() and device_lock() cleanup helpers,
__free(put_device) and guard(device), define the same for PCI devices,
__free(pci_dev_put) and guard(pci_dev). These helpers eliminate the
need for "goto free;" and "goto unlock;" patterns. For example, A
'struct pci_dev *' instance declared as:
struct pci_dev *pdev __free(pci_dev_put) = NULL;
...will automatically call pci_dev_put() if @pdev is non-NULL when @pdev
goes out of scope (automatic variable scope). If a function wants to
invoke pci_dev_put() on error, but return @pdev on success, it can do:
return no_free_ptr(pdev);
...or:
return_ptr(pdev);
For potential cleanup opportunity there are 587 open-coded calls to
pci_dev_put() in the kernel with 65 instances within 10 lines of a goto
statement with the CXL driver threatening to add another one.
The guard() helper holds the associated lock for the remainder of the
current scope in which it was invoked. So, for example:
func(...)
{
if (...) {
...
guard(pci_dev); /* pci_dev_lock() invoked here */
...
} /* <- implied pci_dev_unlock() triggered here */
}
There are 15 invocations of pci_dev_unlock() in the kernel with 5
instances within 10 lines of a goto statement. Again, the CXL driver is
threatening to add another.
Introduce these helpers to preclude the addition of new more error prone
goto put; / goto unlock; sequences. For now, these helpers are used in
drivers/cxl/pci.c to allow ACPI error reports to be fed back into the
CXL driver associated with the PCI device identified in the report.
---
As for reviewing conversions that use these new helpers, one of the
gotchas I have found is that it is easy to make a mistake if a goto
still exists in the function after the conversion. So unless and until
all of the resources a function acquires, that currently need a goto to
unwind them on error, can be converted to cleanup.h based helpers it is
best not to mix styles.
I think the function documentation in include/linux/cleanup.h does a
decent job of explaining how to use the helpers. However, I am happy to
suggest some updates if you think it would help.
On Wed, Dec 20, 2023 at 04:17:35PM -0800, Ira Weiny wrote:
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1170,6 +1170,7 @@ int pci_get_interrupt_pin(struct pci_dev *dev, struct pci_dev **bridge);
> u8 pci_common_swizzle(struct pci_dev *dev, u8 *pinp);
> struct pci_dev *pci_dev_get(struct pci_dev *dev);
> void pci_dev_put(struct pci_dev *dev);
> +DEFINE_FREE(pci_dev_put, struct pci_dev *, if (_T) pci_dev_put(_T))
pci_dev_put() already performs a NULL pointer check internally.
Why duplicate it here?
Thanks,
Lukas
Lukas Wunner wrote:
> On Wed, Dec 20, 2023 at 04:17:35PM -0800, Ira Weiny wrote:
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -1170,6 +1170,7 @@ int pci_get_interrupt_pin(struct pci_dev *dev, struct pci_dev **bridge);
> > u8 pci_common_swizzle(struct pci_dev *dev, u8 *pinp);
> > struct pci_dev *pci_dev_get(struct pci_dev *dev);
> > void pci_dev_put(struct pci_dev *dev);
> > +DEFINE_FREE(pci_dev_put, struct pci_dev *, if (_T) pci_dev_put(_T))
>
> pci_dev_put() already performs a NULL pointer check internally.
> Why duplicate it here?
Greg asked the same for the introduction of __free(kvfree), and Peter
clarified:
http://lore.kernel.org/r/[email protected]
Essentially, that check is more for build-time than runtime because when
the macro is expanded the compiler can notice scenarios where @pdev is
set to NULL (likely by no_free_ptr()) and skip the call to pci_dev_put()
altogether. pci_dev_put() also happens to be out-of-line, so saving a
call when @pdev is NULL a small win in that respect as well.
On Wed, Jan 03, 2024 at 10:43:40PM -0800, Dan Williams wrote:
> Lukas Wunner wrote:
> > On Wed, Dec 20, 2023 at 04:17:35PM -0800, Ira Weiny wrote:
> > > --- a/include/linux/pci.h
> > > +++ b/include/linux/pci.h
> > > @@ -1170,6 +1170,7 @@ int pci_get_interrupt_pin(struct pci_dev *dev, struct pci_dev **bridge);
> > > u8 pci_common_swizzle(struct pci_dev *dev, u8 *pinp);
> > > struct pci_dev *pci_dev_get(struct pci_dev *dev);
> > > void pci_dev_put(struct pci_dev *dev);
> > > +DEFINE_FREE(pci_dev_put, struct pci_dev *, if (_T) pci_dev_put(_T))
> >
> > pci_dev_put() already performs a NULL pointer check internally.
> > Why duplicate it here?
>
> Greg asked the same for the introduction of __free(kvfree), and Peter
> clarified:
>
> http://lore.kernel.org/r/[email protected]
>
> Essentially, that check is more for build-time than runtime because when
> the macro is expanded the compiler can notice scenarios where @pdev is
> set to NULL (likely by no_free_ptr()) and skip the call to pci_dev_put()
> altogether. pci_dev_put() also happens to be out-of-line, so saving a
> call when @pdev is NULL a small win in that respect as well.
Doubtful whether that's correct. The kernel is compiled with
-fno-delete-null-pointer-checks since commit a3ca86aea507
("Add '-fno-delete-null-pointer-checks' to gcc CFLAGS").
So these NULL pointer checks are generally not optimized away.
I've just responded to the discussion you've linked above:
https://lore.kernel.org/all/[email protected]/
Thanks,
Lukas
On Thu, 4 Jan 2024 at 08:02, Lukas Wunner <[email protected]> wrote:
>
> On Wed, Jan 03, 2024 at 10:43:40PM -0800, Dan Williams wrote:
> > Lukas Wunner wrote:
> > > On Wed, Dec 20, 2023 at 04:17:35PM -0800, Ira Weiny wrote:
> > > > --- a/include/linux/pci.h
> > > > +++ b/include/linux/pci.h
> > > > @@ -1170,6 +1170,7 @@ int pci_get_interrupt_pin(struct pci_dev *dev, struct pci_dev **bridge);
> > > > u8 pci_common_swizzle(struct pci_dev *dev, u8 *pinp);
> > > > struct pci_dev *pci_dev_get(struct pci_dev *dev);
> > > > void pci_dev_put(struct pci_dev *dev);
> > > > +DEFINE_FREE(pci_dev_put, struct pci_dev *, if (_T) pci_dev_put(_T))
> > >
> > > pci_dev_put() already performs a NULL pointer check internally.
> > > Why duplicate it here?
> >
> > Greg asked the same for the introduction of __free(kvfree), and Peter
> > clarified:
> >
> > http://lore.kernel.org/r/[email protected]
> >
> > Essentially, that check is more for build-time than runtime because when
> > the macro is expanded the compiler can notice scenarios where @pdev is
> > set to NULL (likely by no_free_ptr()) and skip the call to pci_dev_put()
> > altogether. pci_dev_put() also happens to be out-of-line, so saving a
> > call when @pdev is NULL a small win in that respect as well.
>
> Doubtful whether that's correct. The kernel is compiled with
> -fno-delete-null-pointer-checks since commit a3ca86aea507
> ("Add '-fno-delete-null-pointer-checks' to gcc CFLAGS").
>
> So these NULL pointer checks are generally not optimized away.
>
> I've just responded to the discussion you've linked above:
> https://lore.kernel.org/all/[email protected]/
>
AIUI, Peter is referring to constant propagation of compile time
constant pointers here, not pointer variables where the NULL check is
elided if the variable has already been dereferenced.
Dan Williams wrote:
> Bjorn Helgaas wrote:
> > On Wed, Jan 03, 2024 at 02:38:57PM -0800, Dan Williams wrote:
> > > Ira Weiny wrote:
> > > > Users of pci_dev_get() can benefit from a scoped based put. Also,
> > > > locking a PCI device is often done within a single scope.
> > > >
> > > > Define a pci_dev_put() free function and a PCI device lock guard. These
> > > > will initially be used in new CXL event processing code but is defined
> > > > in a separate patch for others to pickup and use/backport easier.
> > >
> > > Any heartburn if I take this through cxl.git with the rest in this
> > > series? Patch 9 has a dependency on this one.
> >
> > No real heartburn. I was trying to figure out what this does
> > since I'm not familiar with the "scoped based put" idea and
> > 'git grep -i "scope.*base"' wasn't much help.
> >
> > I would kind of like the commit log to say a little more about what
> > the "scoped based put" (does that have too many past tenses in it?)
> > means and how users of pci_dev_get() will benefit.
>
> That is completely fair, and I should have checked to make sure that the
> changelog clarified the impact of the change.
Agreed. Apologies for the brevity.
>
> > I don't know what "locking a PCI device is often done within a single
> > scope" is trying to tell me either. What if it's *not* done within a
> > single scope?
I was not trying to fix that but Dan covers it below indicating that the
pointer can be returned outside the scope if needed with no_free_ptr().
> >
> > Does this change anything for callers of pci_dev_get() and
> > pci_dev_put()?
Current callers don't need to use this.
> >
> > Does this avoid a common programming error? I just don't know what
> > the benefit of this is yet.
Dan covered it well below.
> >
> > I'm sure this is really cool stuff, but there's little documentation,
> > few existing users, and I don't know what I need to look for when
> > reviewing things.
>
> Here a is a re-write of the changelog:
FWIW
Signed-off-by: Ira Weiny <[email protected]>
>
> ---
> PCI: Introduce cleanup helpers for device reference counts and locks
>
> The "goto error" pattern is notorious for introducing subtle resource
> leaks. Use the new cleanup.h helpers for PCI device reference counts and
> locks.
>
> Similar to the new put_device() and device_lock() cleanup helpers,
> __free(put_device) and guard(device), define the same for PCI devices,
> __free(pci_dev_put) and guard(pci_dev). These helpers eliminate the
> need for "goto free;" and "goto unlock;" patterns. For example, A
> 'struct pci_dev *' instance declared as:
>
> struct pci_dev *pdev __free(pci_dev_put) = NULL;
>
> ...will automatically call pci_dev_put() if @pdev is non-NULL when @pdev
> goes out of scope (automatic variable scope). If a function wants to
> invoke pci_dev_put() on error, but return @pdev on success, it can do:
>
> return no_free_ptr(pdev);
>
> ...or:
>
> return_ptr(pdev);
>
> For potential cleanup opportunity there are 587 open-coded calls to
> pci_dev_put() in the kernel with 65 instances within 10 lines of a goto
> statement with the CXL driver threatening to add another one.
>
> The guard() helper holds the associated lock for the remainder of the
> current scope in which it was invoked. So, for example:
>
> func(...)
> {
> if (...) {
> ...
> guard(pci_dev); /* pci_dev_lock() invoked here */
> ...
> } /* <- implied pci_dev_unlock() triggered here */
> }
>
> There are 15 invocations of pci_dev_unlock() in the kernel with 5
> instances within 10 lines of a goto statement. Again, the CXL driver is
> threatening to add another.
>
> Introduce these helpers to preclude the addition of new more error prone
> goto put; / goto unlock; sequences. For now, these helpers are used in
> drivers/cxl/pci.c to allow ACPI error reports to be fed back into the
> CXL driver associated with the PCI device identified in the report.
>
> ---
>
> As for reviewing conversions that use these new helpers, one of the
> gotchas I have found is that it is easy to make a mistake if a goto
> still exists in the function after the conversion. So unless and until
> all of the resources a function acquires, that currently need a goto to
> unwind them on error, can be converted to cleanup.h based helpers it is
> best not to mix styles.
>
> I think the function documentation in include/linux/cleanup.h does a
> decent job of explaining how to use the helpers. However, I am happy to
> suggest some updates if you think it would help.
Ard Biesheuvel wrote:
> On Thu, 4 Jan 2024 at 08:02, Lukas Wunner <[email protected]> wrote:
> >
> > On Wed, Jan 03, 2024 at 10:43:40PM -0800, Dan Williams wrote:
> > > Lukas Wunner wrote:
> > > > On Wed, Dec 20, 2023 at 04:17:35PM -0800, Ira Weiny wrote:
> > > > > --- a/include/linux/pci.h
> > > > > +++ b/include/linux/pci.h
> > > > > @@ -1170,6 +1170,7 @@ int pci_get_interrupt_pin(struct pci_dev *dev, struct pci_dev **bridge);
> > > > > u8 pci_common_swizzle(struct pci_dev *dev, u8 *pinp);
> > > > > struct pci_dev *pci_dev_get(struct pci_dev *dev);
> > > > > void pci_dev_put(struct pci_dev *dev);
> > > > > +DEFINE_FREE(pci_dev_put, struct pci_dev *, if (_T) pci_dev_put(_T))
> > > >
> > > > pci_dev_put() already performs a NULL pointer check internally.
> > > > Why duplicate it here?
> > >
> > > Greg asked the same for the introduction of __free(kvfree), and Peter
> > > clarified:
> > >
> > > http://lore.kernel.org/r/[email protected]
> > >
> > > Essentially, that check is more for build-time than runtime because when
> > > the macro is expanded the compiler can notice scenarios where @pdev is
> > > set to NULL (likely by no_free_ptr()) and skip the call to pci_dev_put()
> > > altogether. pci_dev_put() also happens to be out-of-line, so saving a
> > > call when @pdev is NULL a small win in that respect as well.
> >
> > Doubtful whether that's correct. The kernel is compiled with
> > -fno-delete-null-pointer-checks since commit a3ca86aea507
> > ("Add '-fno-delete-null-pointer-checks' to gcc CFLAGS").
> >
> > So these NULL pointer checks are generally not optimized away.
> >
> > I've just responded to the discussion you've linked above:
> > https://lore.kernel.org/all/[email protected]/
> >
>
> AIUI, Peter is referring to constant propagation of compile time
> constant pointers here, not pointer variables where the NULL check is
> elided if the variable has already been dereferenced.
>
No, it is for auto (on stack) pointer variables. Consider this sequence:
struct pci_dev *pdev __free(pci_dev_put) = pci_get_domain_bus_and_slot(...);
if (!pdev)
return NULL;
if (!check_pdev(pdev))
return NULL;
return no_free_ptr(pdev);
...that expands at compile time to a first pass of:
struct pci_dev *pdev = pci_get_domain_bus_and_slot(...);
if (!pdev) {
if (pdev)
pci_dev_put(pdev);
return NULL;
}
if (!check_pdev(pdev)) {
if (pdev)
pci_dev_put(pdev);
return NULL;
}
struct pci_dev *tmp = pdev;
pdev = NULL;
if (pdev)
pci_dev_put(pdev);
return tmp;
...the compiler can then optimize this on a second pass to:
if (!pdev)
return NULL;
if (!check_pdev(pdev)) {
pci_dev_put(pdev);
return NULL;
}
return pdev;
...if the NULL check is dropped from DEFINE_FREE(pci_dev_put...) then
this becomes unoptimizable by the compiler without
link-time-optimization (LTO) to see that pci_dev_put() has an internal
NULL check:
struct pci_dev *pdev = pci_get_domain_bus_and_slot(...);
if (!pdev) {
pci_dev_put(pdev);
return NULL;
}
if (!check_pdev(pdev)) {
pci_dev_put(pdev);
return NULL;
}
struct pci_dev *tmp = pdev;
pdev = NULL;
pci_dev_put(pdev);
return tmp;
Now, if pci_dev_put() would become a static inline the compiler could
again do the optimization, but it is otherwise free (post compiler
optimization) to keep a conditional in these DEFINE_FREE() instances and
not worry about whether the actual free routine is inline, out-of-line,
or has its own NULL check.
Ira Weiny wrote:
> If the firmware has configured CXL event support to be firmware first
> the OS can process those events through CPER records. The CXL layer has
> unique DPA to HPA knowledge and standard event trace parsing in place.
>
> CPER records contain Bus, Device, Function information which can be used
> to identify the PCI device which is sending the event.
>
> Change the PCI driver registration to include registration of a CXL
> CPER callback to process events through the trace subsystem.
>
> Use new scoped based management to simplify the handling of the PCI
> device object.
>
> NOTE this patch depends on Dan's addition of a device guard[1].
>
> [1] https://lore.kernel.org/all/170250854466.1522182.17555361077409628655.stgit@dwillia2-xfh.jf.intel.com/
Somehow this patch lost my signed off by line from V4.
Signed-off-by: Ira Weiny <[email protected]>
>
> ---
> Changes for v5:
> [Smita/djbw: trace a generic UUID if the type is unknown]
> [Jonathan: clean up pci and device state error handling]
> [iweiny: consolidate the trace function]
> ---
> drivers/cxl/core/mbox.c | 49 ++++++++++++++++++++++++++++-----------
> drivers/cxl/cxlmem.h | 4 ++++
> drivers/cxl/pci.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++-
> include/linux/cxl-event.h | 1 +
> 4 files changed, 98 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 06957696247b..b801faaccd45 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -836,21 +836,44 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>
> -static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> - enum cxl_event_log_type type,
> - struct cxl_event_record_raw *record)
> +void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> + enum cxl_event_log_type type,
> + enum cxl_event_type event_type,
> + const uuid_t *uuid, union cxl_event *evt)
> {
> - union cxl_event *evt = &record->event;
> - uuid_t *id = &record->id;
> -
> - if (uuid_equal(id, &CXL_EVENT_GEN_MEDIA_UUID))
> + switch (event_type) {
> + case CXL_CPER_EVENT_GEN_MEDIA:
> trace_cxl_general_media(cxlmd, type, &evt->gen_media);
> - else if (uuid_equal(id, &CXL_EVENT_DRAM_UUID))
> + break;
> + case CXL_CPER_EVENT_DRAM:
> trace_cxl_dram(cxlmd, type, &evt->dram);
> - else if (uuid_equal(id, &CXL_EVENT_MEM_MODULE_UUID))
> + break;
> + case CXL_CPER_EVENT_MEM_MODULE:
> trace_cxl_memory_module(cxlmd, type, &evt->mem_module);
> - else
> - trace_cxl_generic_event(cxlmd, type, id, &evt->generic);
> + break;
> + case CXL_CPER_EVENT_GENERIC:
> + default:
> + trace_cxl_generic_event(cxlmd, type, uuid, &evt->generic);
> + break;
> + }
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, CXL);
> +
> +static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> + enum cxl_event_log_type type,
> + struct cxl_event_record_raw *record)
> +{
> + enum cxl_event_type ev_type = CXL_CPER_EVENT_GENERIC;
> + const uuid_t *uuid = &record->id;
> +
> + if (uuid_equal(uuid, &CXL_EVENT_GEN_MEDIA_UUID))
> + ev_type = CXL_CPER_EVENT_GEN_MEDIA;
> + else if (uuid_equal(uuid, &CXL_EVENT_DRAM_UUID))
> + ev_type = CXL_CPER_EVENT_DRAM;
> + else if (uuid_equal(uuid, &CXL_EVENT_MEM_MODULE_UUID))
> + ev_type = CXL_CPER_EVENT_MEM_MODULE;
> +
> + cxl_event_trace_record(cxlmd, type, ev_type, uuid, &record->event);
> }
>
> static int cxl_clear_event_record(struct cxl_memdev_state *mds,
> @@ -961,8 +984,8 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> break;
>
> for (i = 0; i < nr_rec; i++)
> - cxl_event_trace_record(cxlmd, type,
> - &payload->records[i]);
> + __cxl_event_trace_record(cxlmd, type,
> + &payload->records[i]);
>
> if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
> trace_cxl_overflow(cxlmd, type, payload);
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index e5d770e26e02..80076c235073 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -802,6 +802,10 @@ void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
> void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
> unsigned long *cmds);
> void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status);
> +void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> + enum cxl_event_log_type type,
> + enum cxl_event_type event_type,
> + const uuid_t *uuid, union cxl_event *evt);
> int cxl_set_timestamp(struct cxl_memdev_state *mds);
> int cxl_poison_state_init(struct cxl_memdev_state *mds);
> int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 0155fb66b580..b14237f824cf 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1,5 +1,6 @@
> // SPDX-License-Identifier: GPL-2.0-only
> /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> +#include <asm-generic/unaligned.h>
> #include <linux/io-64-nonatomic-lo-hi.h>
> #include <linux/moduleparam.h>
> #include <linux/module.h>
> @@ -969,6 +970,61 @@ static struct pci_driver cxl_pci_driver = {
> },
> };
>
> +#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
> +static void cxl_cper_event_call(enum cxl_event_type ev_type,
> + struct cxl_cper_event_rec *rec)
> +{
> + struct cper_cxl_event_devid *device_id = &rec->hdr.device_id;
> + struct pci_dev *pdev __free(pci_dev_put) = NULL;
> + enum cxl_event_log_type log_type;
> + struct cxl_dev_state *cxlds;
> + unsigned int devfn;
> + u32 hdr_flags;
> +
> + devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
> + pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
> + device_id->bus_num, devfn);
> + if (!pdev)
> + return;
> +
> + guard(device)(&pdev->dev);
> + if (pdev->driver != &cxl_pci_driver)
> + return;
> +
> + cxlds = pci_get_drvdata(pdev);
> + if (!cxlds)
> + return;
> +
> + /* Fabricate a log type */
> + hdr_flags = get_unaligned_le24(rec->event.generic.hdr.flags);
> + log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
> +
> + cxl_event_trace_record(cxlds->cxlmd, log_type, ev_type,
> + &uuid_null, &rec->event);
> +}
> +
> +static int __init cxl_pci_driver_init(void)
> +{
> + int rc;
> +
> + rc = pci_register_driver(&cxl_pci_driver);
> + if (rc)
> + return rc;
> +
> + rc = cxl_cper_register_callback(cxl_cper_event_call);
> + if (rc)
> + pci_unregister_driver(&cxl_pci_driver);
> +
> + return rc;
> +}
> +
> +static void __exit cxl_pci_driver_exit(void)
> +{
> + cxl_cper_unregister_callback(cxl_cper_event_call);
> + pci_unregister_driver(&cxl_pci_driver);
> +}
> +
> +module_init(cxl_pci_driver_init);
> +module_exit(cxl_pci_driver_exit);
> MODULE_LICENSE("GPL v2");
> -module_pci_driver(cxl_pci_driver);
> MODULE_IMPORT_NS(CXL);
> diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
> index 71e3646f7569..17eadee819b6 100644
> --- a/include/linux/cxl-event.h
> +++ b/include/linux/cxl-event.h
> @@ -109,6 +109,7 @@ struct cxl_event_record_raw {
> } __packed;
>
> enum cxl_event_type {
> + CXL_CPER_EVENT_GENERIC,
> CXL_CPER_EVENT_GEN_MEDIA,
> CXL_CPER_EVENT_DRAM,
> CXL_CPER_EVENT_MEM_MODULE,
>
> --
> 2.43.0
>
On Wed, Jan 03, 2024 at 04:21:02PM -0800, Dan Williams wrote:
> Bjorn Helgaas wrote:
> > On Wed, Jan 03, 2024 at 02:38:57PM -0800, Dan Williams wrote:
> > > Ira Weiny wrote:
> > > > Users of pci_dev_get() can benefit from a scoped based put. Also,
> > > > locking a PCI device is often done within a single scope.
> > > >
> > > > Define a pci_dev_put() free function and a PCI device lock guard. These
> > > > will initially be used in new CXL event processing code but is defined
> > > > in a separate patch for others to pickup and use/backport easier.
> > >
> > > Any heartburn if I take this through cxl.git with the rest in this
> > > series? Patch 9 has a dependency on this one.
> >
> > No real heartburn. I was trying to figure out what this does
> > since I'm not familiar with the "scoped based put" idea and
> > 'git grep -i "scope.*base"' wasn't much help.
> >
> > I would kind of like the commit log to say a little more about what
> > the "scoped based put" (does that have too many past tenses in it?)
> > means and how users of pci_dev_get() will benefit.
>
> That is completely fair, and I should have checked to make sure that the
> changelog clarified the impact of the change.
I see "scoped based put" follows a similar use in cleanup.h, but I
think "scope-based X" reads better.
> > I don't know what "locking a PCI device is often done within a single
> > scope" is trying to tell me either. What if it's *not* done within a
> > single scope?
> >
> > Does this change anything for callers of pci_dev_get() and
> > pci_dev_put()?
> >
> > Does this avoid a common programming error? I just don't know what
> > the benefit of this is yet.
> >
> > I'm sure this is really cool stuff, but there's little documentation,
> > few existing users, and I don't know what I need to look for when
> > reviewing things.
>
> Here a is a re-write of the changelog:
>
> ---
> PCI: Introduce cleanup helpers for device reference counts and locks
>
> The "goto error" pattern is notorious for introducing subtle resource
> leaks. Use the new cleanup.h helpers for PCI device reference counts and
> locks.
>
> Similar to the new put_device() and device_lock() cleanup helpers,
> __free(put_device) and guard(device), define the same for PCI devices,
> __free(pci_dev_put) and guard(pci_dev). These helpers eliminate the
> need for "goto free;" and "goto unlock;" patterns. For example, A
> 'struct pci_dev *' instance declared as:
>
> struct pci_dev *pdev __free(pci_dev_put) = NULL;
I see several similar __free() uses with NULL initializations in gpio,
but I think this idiom would be slightly improved if the __free()
function were more closely associated with the actual pci_dev_get():
struct pci_dev *pdev __free(pci_dev_put) = pci_get_device(...);
Not always possible, I know, but easier to analyze when it is.
> ...will automatically call pci_dev_put() if @pdev is non-NULL when @pdev
> goes out of scope (automatic variable scope). If a function wants to
> invoke pci_dev_put() on error, but return @pdev on success, it can do:
>
> return no_free_ptr(pdev);
>
> ...or:
>
> return_ptr(pdev);
>
> For potential cleanup opportunity there are 587 open-coded calls to
> pci_dev_put() in the kernel with 65 instances within 10 lines of a goto
> statement with the CXL driver threatening to add another one.
>
> The guard() helper holds the associated lock for the remainder of the
> current scope in which it was invoked. So, for example:
>
> func(...)
> {
> if (...) {
> ...
> guard(pci_dev); /* pci_dev_lock() invoked here */
> ...
> } /* <- implied pci_dev_unlock() triggered here */
> }
Thanks for this! I had skimmed cleanup.h previously, but it makes a
lot more sense after your description here.
I think a little introduction along these lines would be even more
useful in cleanup.h since the concept is general and not PCI-specific.
E.g., the motivation (avoid resource leaks with "goto error" pattern),
a definition of "__free() based cleanup function" (IIUC, a function to
be run when a variable goes out of scope), maybe something about
ordering (it's important in the "goto error" pattern that the cleanups
are done in a specific order; how does that translate to __free()?)
But the commit log above is fine with me. (It does contain tabs,
which get slightly mangled when "git log" indents it.)
> There are 15 invocations of pci_dev_unlock() in the kernel with 5
> instances within 10 lines of a goto statement. Again, the CXL driver is
> threatening to add another.
>
> Introduce these helpers to preclude the addition of new more error prone
> goto put; / goto unlock; sequences. For now, these helpers are used in
> drivers/cxl/pci.c to allow ACPI error reports to be fed back into the
> CXL driver associated with the PCI device identified in the report.
This part is also fine but doesn't seem strictly necessary to me. I
think the part about error reports being fed back needs a lot more
background to understand the connection, and probably only makes sense
in the context of that patch.
> As for reviewing conversions that use these new helpers, one of the
> gotchas I have found is that it is easy to make a mistake if a goto
> still exists in the function after the conversion. So unless and until
> all of the resources a function acquires, that currently need a goto to
> unwind them on error, can be converted to cleanup.h based helpers it is
> best not to mix styles.
>
> I think the function documentation in include/linux/cleanup.h does a
> decent job of explaining how to use the helpers. However, I am happy to
> suggest some updates if you think it would help.
Thanks, Dan!
Acked-by: Bjorn Helgaas <[email protected]>
Bjorn Helgaas wrote:
[..]
> > ---
> > PCI: Introduce cleanup helpers for device reference counts and locks
> >
> > The "goto error" pattern is notorious for introducing subtle resource
> > leaks. Use the new cleanup.h helpers for PCI device reference counts and
> > locks.
> >
> > Similar to the new put_device() and device_lock() cleanup helpers,
> > __free(put_device) and guard(device), define the same for PCI devices,
> > __free(pci_dev_put) and guard(pci_dev). These helpers eliminate the
> > need for "goto free;" and "goto unlock;" patterns. For example, A
> > 'struct pci_dev *' instance declared as:
> >
> > struct pci_dev *pdev __free(pci_dev_put) = NULL;
>
> I see several similar __free() uses with NULL initializations in gpio,
> but I think this idiom would be slightly improved if the __free()
> function were more closely associated with the actual pci_dev_get():
>
> struct pci_dev *pdev __free(pci_dev_put) = pci_get_device(...);
>
> Not always possible, I know, but easier to analyze when it is.
I tend to agree, but it does lead to some long lines, for example:
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 4fd1f207c84e..549ba4b8294e 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -975,15 +975,14 @@ static void cxl_cper_event_call(enum cxl_event_type ev_type,
struct cxl_cper_event_rec *rec)
{
struct cper_cxl_event_devid *device_id = &rec->hdr.device_id;
- struct pci_dev *pdev __free(pci_dev_put) = NULL;
enum cxl_event_log_type log_type;
struct cxl_dev_state *cxlds;
unsigned int devfn;
u32 hdr_flags;
devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
- pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
- device_id->bus_num, devfn);
+ struct pci_dev *pdev __free(pci_dev_put) = pci_get_domain_bus_and_slot(
+ device_id->segment_num, device_id->bus_num, devfn);
if (!pdev)
return;
...so I think people are choosing the "... __free(x) = NULL;" style for
code density readability.
>
> > ...will automatically call pci_dev_put() if @pdev is non-NULL when @pdev
> > goes out of scope (automatic variable scope). If a function wants to
> > invoke pci_dev_put() on error, but return @pdev on success, it can do:
> >
> > return no_free_ptr(pdev);
> >
> > ...or:
> >
> > return_ptr(pdev);
> >
> > For potential cleanup opportunity there are 587 open-coded calls to
> > pci_dev_put() in the kernel with 65 instances within 10 lines of a goto
> > statement with the CXL driver threatening to add another one.
> >
> > The guard() helper holds the associated lock for the remainder of the
> > current scope in which it was invoked. So, for example:
> >
> > func(...)
> > {
> > if (...) {
> > ...
> > guard(pci_dev); /* pci_dev_lock() invoked here */
> > ...
> > } /* <- implied pci_dev_unlock() triggered here */
> > }
>
> Thanks for this! I had skimmed cleanup.h previously, but it makes a
> lot more sense after your description here.
>
> I think a little introduction along these lines would be even more
> useful in cleanup.h since the concept is general and not PCI-specific.
Ok, let me ponder an update here.
> E.g., the motivation (avoid resource leaks with "goto error" pattern),
> a definition of "__free() based cleanup function" (IIUC, a function to
> be run when a variable goes out of scope), maybe something about
> ordering (it's important in the "goto error" pattern that the cleanups
> are done in a specific order; how does that translate to __free()?)
The __free() callbacks are invoked in reverse declaration (FILO) order.
However, as I say that another reviewer recommendation falls out. Be
careful about the variable declaration order diverging from the init
order.
I.e. save the reader from needing to wonder if there are intra variable
init order dependencies by making it clear that init order ==
declaration order.
>
> But the commit log above is fine with me. (It does contain tabs,
> which get slightly mangled when "git log" indents it.)
>
> > There are 15 invocations of pci_dev_unlock() in the kernel with 5
> > instances within 10 lines of a goto statement. Again, the CXL driver is
> > threatening to add another.
> >
> > Introduce these helpers to preclude the addition of new more error prone
> > goto put; / goto unlock; sequences. For now, these helpers are used in
> > drivers/cxl/pci.c to allow ACPI error reports to be fed back into the
> > CXL driver associated with the PCI device identified in the report.
>
> This part is also fine but doesn't seem strictly necessary to me. I
> think the part about error reports being fed back needs a lot more
> background to understand the connection, and probably only makes sense
> in the context of that patch.
Sure I can trim that out and just say that the CXL driver is one such
occasion where a new goto for pci_dev_put() and pci_dev_unlock() was
about to be introduced.
> > As for reviewing conversions that use these new helpers, one of the
> > gotchas I have found is that it is easy to make a mistake if a goto
> > still exists in the function after the conversion. So unless and until
> > all of the resources a function acquires, that currently need a goto to
> > unwind them on error, can be converted to cleanup.h based helpers it is
> > best not to mix styles.
> >
> > I think the function documentation in include/linux/cleanup.h does a
> > decent job of explaining how to use the helpers. However, I am happy to
> > suggest some updates if you think it would help.
>
> Thanks, Dan!
>
> Acked-by: Bjorn Helgaas <[email protected]>
Thanks, Bjorn!
Dan Williams wrote:
> Bjorn Helgaas wrote:
> [..]
> > > ---
> > > PCI: Introduce cleanup helpers for device reference counts and locks
> > >
> > > The "goto error" pattern is notorious for introducing subtle resource
> > > leaks. Use the new cleanup.h helpers for PCI device reference counts and
> > > locks.
> > >
> > > Similar to the new put_device() and device_lock() cleanup helpers,
> > > __free(put_device) and guard(device), define the same for PCI devices,
> > > __free(pci_dev_put) and guard(pci_dev). These helpers eliminate the
> > > need for "goto free;" and "goto unlock;" patterns. For example, A
> > > 'struct pci_dev *' instance declared as:
> > >
> > > struct pci_dev *pdev __free(pci_dev_put) = NULL;
> >
> > I see several similar __free() uses with NULL initializations in gpio,
> > but I think this idiom would be slightly improved if the __free()
> > function were more closely associated with the actual pci_dev_get():
> >
> > struct pci_dev *pdev __free(pci_dev_put) = pci_get_device(...);
> >
> > Not always possible, I know, but easier to analyze when it is.
>
> I tend to agree, but it does lead to some long lines, for example:
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 4fd1f207c84e..549ba4b8294e 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -975,15 +975,14 @@ static void cxl_cper_event_call(enum cxl_event_type ev_type,
> struct cxl_cper_event_rec *rec)
> {
> struct cper_cxl_event_devid *device_id = &rec->hdr.device_id;
> - struct pci_dev *pdev __free(pci_dev_put) = NULL;
> enum cxl_event_log_type log_type;
> struct cxl_dev_state *cxlds;
> unsigned int devfn;
> u32 hdr_flags;
>
> devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
> - pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
> - device_id->bus_num, devfn);
> + struct pci_dev *pdev __free(pci_dev_put) = pci_get_domain_bus_and_slot(
> + device_id->segment_num, device_id->bus_num, devfn);
> if (!pdev)
> return;
>
> ...so I think people are choosing the "... __free(x) = NULL;" style for
> code density readability.
>
Also in this case we need devfn assigned first.
Is the above patch compliant with current style guidelines?
Or would it be better to do?
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index b14237f824cf..8a180c6abb67 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -975,15 +975,14 @@ static void cxl_cper_event_call(enum cxl_event_type ev_type,
struct cxl_cper_event_rec *rec)
{
struct cper_cxl_event_devid *device_id = &rec->hdr.device_id;
- struct pci_dev *pdev __free(pci_dev_put) = NULL;
enum cxl_event_log_type log_type;
struct cxl_dev_state *cxlds;
- unsigned int devfn;
+ unsigned int devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
+ struct pci_dev *pdev __free(pci_dev_put) = pci_get_domain_bus_and_slot(
+ device_id->segment_num,
+ device_id->bus_num, devfn);
u32 hdr_flags;
- devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
- pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
- device_id->bus_num, devfn);
if (!pdev)
return;
Ira
On Thu, Jan 04, 2024 at 01:46:56PM -0800, Ira Weiny wrote:
> Dan Williams wrote:
> > Bjorn Helgaas wrote:
> > [..]
> > > > ---
> > > > PCI: Introduce cleanup helpers for device reference counts and locks
> > > >
> > > > The "goto error" pattern is notorious for introducing subtle resource
> > > > leaks. Use the new cleanup.h helpers for PCI device reference counts and
> > > > locks.
> > > >
> > > > Similar to the new put_device() and device_lock() cleanup helpers,
> > > > __free(put_device) and guard(device), define the same for PCI devices,
> > > > __free(pci_dev_put) and guard(pci_dev). These helpers eliminate the
> > > > need for "goto free;" and "goto unlock;" patterns. For example, A
> > > > 'struct pci_dev *' instance declared as:
> > > >
> > > > struct pci_dev *pdev __free(pci_dev_put) = NULL;
> > >
> > > I see several similar __free() uses with NULL initializations in gpio,
> > > but I think this idiom would be slightly improved if the __free()
> > > function were more closely associated with the actual pci_dev_get():
> > >
> > > struct pci_dev *pdev __free(pci_dev_put) = pci_get_device(...);
> > >
> > > Not always possible, I know, but easier to analyze when it is.
> >
> > I tend to agree, but it does lead to some long lines, for example:
> >
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index 4fd1f207c84e..549ba4b8294e 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -975,15 +975,14 @@ static void cxl_cper_event_call(enum cxl_event_type ev_type,
> > struct cxl_cper_event_rec *rec)
> > {
> > struct cper_cxl_event_devid *device_id = &rec->hdr.device_id;
> > - struct pci_dev *pdev __free(pci_dev_put) = NULL;
> > enum cxl_event_log_type log_type;
> > struct cxl_dev_state *cxlds;
> > unsigned int devfn;
> > u32 hdr_flags;
> >
> > devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
> > - pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
> > - device_id->bus_num, devfn);
> > + struct pci_dev *pdev __free(pci_dev_put) = pci_get_domain_bus_and_slot(
> > + device_id->segment_num, device_id->bus_num, devfn);
> > if (!pdev)
> > return;
> >
> > ...so I think people are choosing the "... __free(x) = NULL;" style for
> > code density readability.
> >
>
> Also in this case we need devfn assigned first.
>
> Is the above patch compliant with current style guidelines?
>
> Or would it be better to do?
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index b14237f824cf..8a180c6abb67 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -975,15 +975,14 @@ static void cxl_cper_event_call(enum cxl_event_type ev_type,
> struct cxl_cper_event_rec *rec)
> {
> struct cper_cxl_event_devid *device_id = &rec->hdr.device_id;
> - struct pci_dev *pdev __free(pci_dev_put) = NULL;
> enum cxl_event_log_type log_type;
> struct cxl_dev_state *cxlds;
> - unsigned int devfn;
> + unsigned int devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
> + struct pci_dev *pdev __free(pci_dev_put) = pci_get_domain_bus_and_slot(
> + device_id->segment_num,
> + device_id->bus_num, devfn);
I don't really care about this specific instance; my comment was more
about the commit log for the "Define scope based management functions"
patch, thinking maybe the example could encourage get/put togetherness
when it's practical.
Bjorn
On Wed, Dec 20, 2023 at 04:17:27PM -0800, Ira Weiny wrote:
> cxl/trace: Pass uuid explicitly to event traces
Nit: s/uuid/UUID/ would match the patches below
> cxl/events: Promote CXL event structures to a core header
> cxl/events: Create common event UUID defines
> cxl/events: Remove passing a UUID to known event traces
> cxl/events: Separate UUID from event structures
> cxl/events: Create a CXL event union
> acpi/ghes: Process CXL Component Events
> PCI: Define scoped based management functions
"scope based" unless I'm misunderstanding something.
Maybe "cleanup and guard functions"? "management" is pretty generic.
Bjorn Helgaas wrote:
> On Thu, Jan 04, 2024 at 01:46:56PM -0800, Ira Weiny wrote:
> > Dan Williams wrote:
> > > Bjorn Helgaas wrote:
[snip]
> >
> > Also in this case we need devfn assigned first.
> >
> > Is the above patch compliant with current style guidelines?
> >
> > Or would it be better to do?
> >
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index b14237f824cf..8a180c6abb67 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -975,15 +975,14 @@ static void cxl_cper_event_call(enum cxl_event_type ev_type,
> > struct cxl_cper_event_rec *rec)
> > {
> > struct cper_cxl_event_devid *device_id = &rec->hdr.device_id;
> > - struct pci_dev *pdev __free(pci_dev_put) = NULL;
> > enum cxl_event_log_type log_type;
> > struct cxl_dev_state *cxlds;
> > - unsigned int devfn;
> > + unsigned int devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
> > + struct pci_dev *pdev __free(pci_dev_put) = pci_get_domain_bus_and_slot(
> > + device_id->segment_num,
> > + device_id->bus_num, devfn);
>
> I don't really care about this specific instance; my comment was more
> about the commit log for the "Define scope based management functions"
> patch, thinking maybe the example could encourage get/put togetherness
> when it's practical.
Ok thanks!
Ira
On Wed, 20 Dec 2023 16:17:28 -0800
Ira Weiny <[email protected]> wrote:
> CXL CPER events are identified by the CPER Section Type GUID. The GUID
> correlates with the CXL UUID for the event record. It turns out that a
> CXL CPER record is a strict subset of the CXL event record, only the
> UUID header field is chopped.
>
> In order to unify handling between native and CPER flavors of CXL
> events, prepare the code for the UUID to be passed in rather than
> inferred from the record itself.
>
> Later patches update the passed in record to only refer to the common
> data between the formats.
>
> Pass the UUID explicitly to each trace event to be able to remove the
> UUID from the event structures.
>
> Originally it was desirable to remove the UUID from the well known event
> because the UUID value was redundant. However, the trace API was
> already in place.[1]
>
> [1] https://lore.kernel.org/all/[email protected]/
>
> Signed-off-by: Ira Weiny <[email protected]>
LGTM
Reviewed-by: Jonathan Cameron <[email protected]>
On Wed, 20 Dec 2023 16:17:29 -0800
Ira Weiny <[email protected]> wrote:
> UEFI code can process CXL events through CPER records. Those records
> use almost the same format as the CXL events.
>
> Lift the CXL event structures to a core header to be shared in later
> patches.
>
> Signed-off-by: Ira Weiny <[email protected]>
As a side note, seems we need some updates in here for 3.1 additions. Job
for another day.
One trivial comment on 'extra' docs that will probably bit rot.
Otherwise
Reviewed-by: Jonathan Cameron <[email protected]>
> diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
> new file mode 100644
> index 000000000000..1c94e8fdd227
> --- /dev/null
> +++ b/include/linux/cxl-event.h
> @@ -0,0 +1,100 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_CXL_EVENT_H
> +#define _LINUX_CXL_EVENT_H
> +
> +/*
> + * CXL event records; CXL rev 3.0
That version number is nearly guaranteed to bit rot. I'd just
not mention it here and keep the versioning local to
the particular entries (where it is currently repeated).
> + *
> + * Copyright(c) 2023 Intel Corporation.
> + */
On Wed, 20 Dec 2023 16:17:30 -0800
Ira Weiny <[email protected]> wrote:
> Dan points out in review that the cxl_test code could be made better
> through the use of UUID's defines rather than being open coded.[1]
>
> Create UUID defines and use them rather than open coding them.
>
> [1] https://lore.kernel.org/all/[email protected]/
>
> Suggested-by: Dan Williams <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
>
> ---
> Changes for v5:
> [Jonathan: eliminate the static uuid variables]
Now I'm not sure why I minded them. Ah well, good either way - sorry
for the noise!
Jonathan
On Wed, 20 Dec 2023 16:17:31 -0800
Ira Weiny <[email protected]> wrote:
> The UUID data is redundant in the known event trace types. The addition
> of static defines allows the trace macros to create the UUID data inside
> the trace thus removing unnecessary code.
>
> Have well known trace events use static data to set the uuid field based
> on the event type.
>
> Suggested-by: Jonathan Cameron <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>
>
> TP_STRUCT__entry(
> CXL_EVT_TP_entry
> @@ -422,7 +424,8 @@ TRACE_EVENT(cxl_dram,
> ),
>
> TP_fast_assign(
> - CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
> + CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
> + memcpy(&__entry->hdr_uuid, &CXL_EVENT_DRAM_UUID, sizeof(uuid_t));
Hmm. Why not
__entry->hdr_uuid = CXL_EVENT_DRAM_UUID;
?
Compiler should be able to squish the stuff in the define down to data as as the
UUID generation logic is pretty simple.
I've not emulated the cper records for these yet, so not tested that works beyond
compiling.
J
On Wed, 20 Dec 2023 16:17:32 -0800
Ira Weiny <[email protected]> wrote:
> The UEFI CXL CPER structure does not include the UUID. Now that the
> UUID is passed separately to the trace event there is no need to have
> the UUID in those structures.
>
> Move UUID from the event record header to the raw structures. Adjust
> cxl-test to Create dummy structures for creating test records.
>
> Signed-off-by: Ira Weiny <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
On Wed, 20 Dec 2023 16:17:33 -0800
Ira Weiny <[email protected]> wrote:
> The CXL CPER and event log records share everything but a UUID/GUID in
> their structures.
>
> Define a cxl_event union without the UUID/GUID to be shared between the
> CPER and event log record formats. Adjust the code to use this union.
>
> Signed-off-by: Ira Weiny <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
On Wed, 20 Dec 2023 16:17:34 -0800
Ira Weiny <[email protected]> wrote:
> BIOS can configure memory devices as firmware first. This will send CXL
> events to the firmware instead of the OS. The firmware can then send
> these events to the OS via UEFI.
>
> UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER)
> format for CXL Component Events. The format is mostly the same as the
> CXL Common Event Record Format. The difference is the use of a GUID in
> the Section Type rather than a UUID as part of the event itself.
>
> Add GHES support to detect CXL CPER records and call a registered
> callback with the event.
>
> A notifier chain was considered for the callback but the complexity did
> not justify the use case as only the CXL subsystem requires this event.
> Enforce that only one callback can be registered at any time.
>
> Cc: Ard Biesheuvel <[email protected]>
> Cc: "Rafael J. Wysocki" <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>
>
Very annoying that CXL using UUIDS and UEFI GUIDs (so endian swapped).
Is it worth a general conversion function or just keep the data duplication?
Otherwise LGTM
Reviewed-by: Jonathan Cameron <[email protected]>
On Wed, 20 Dec 2023 16:17:35 -0800
Ira Weiny <[email protected]> wrote:
> Users of pci_dev_get() can benefit from a scoped based put. Also,
> locking a PCI device is often done within a single scope.
>
> Define a pci_dev_put() free function and a PCI device lock guard. These
> will initially be used in new CXL event processing code but is defined
> in a separate patch for others to pickup and use/backport easier.
>
> Cc: Bjorn Helgaas <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>
With the extra text buried deep in the discussion LGTM.
I've not been that thorough on documenting my own similar cleanup.h
series, so might well steal some of that for the ones I have outstanding
for device_handle_put() and fwnode_handle_put() ;)
Reviewed-by: Jonathan Cameron <[email protected]>
>
> ---
> Changes for v5:
> [Jonathan: New patch]
> ---
> include/linux/pci.h | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 60ca768bc867..290d0a2651b2 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1170,6 +1170,7 @@ int pci_get_interrupt_pin(struct pci_dev *dev, struct pci_dev **bridge);
> u8 pci_common_swizzle(struct pci_dev *dev, u8 *pinp);
> struct pci_dev *pci_dev_get(struct pci_dev *dev);
> void pci_dev_put(struct pci_dev *dev);
> +DEFINE_FREE(pci_dev_put, struct pci_dev *, if (_T) pci_dev_put(_T))
> void pci_remove_bus(struct pci_bus *b);
> void pci_stop_and_remove_bus_device(struct pci_dev *dev);
> void pci_stop_and_remove_bus_device_locked(struct pci_dev *dev);
> @@ -1871,6 +1872,7 @@ void pci_cfg_access_unlock(struct pci_dev *dev);
> void pci_dev_lock(struct pci_dev *dev);
> int pci_dev_trylock(struct pci_dev *dev);
> void pci_dev_unlock(struct pci_dev *dev);
> +DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T))
>
> /*
> * PCI domain support. Sometimes called PCI segment (eg by ACPI),
>
On Wed, 20 Dec 2023 16:17:36 -0800
Ira Weiny <[email protected]> wrote:
> If the firmware has configured CXL event support to be firmware first
> the OS can process those events through CPER records. The CXL layer has
> unique DPA to HPA knowledge and standard event trace parsing in place.
>
> CPER records contain Bus, Device, Function information which can be used
> to identify the PCI device which is sending the event.
>
> Change the PCI driver registration to include registration of a CXL
> CPER callback to process events through the trace subsystem.
>
> Use new scoped based management to simplify the handling of the PCI
> device object.
>
> NOTE this patch depends on Dan's addition of a device guard[1].
>
> [1] https://lore.kernel.org/all/170250854466.1522182.17555361077409628655.stgit@dwillia2-xfh.jf.intel.com/
>
One trivial comment inline.
The guard change Dan suggests makes sense. Otherwise I'm fine with this.
Reviewed-by: Jonathan Cameron <[email protected]>
I'll bolt in the other stuff I need to test it from QEMU this week.
Did the protocol error first, but these are easy to add now I have
that working,
Jonathan
> ---
> Changes for v5:
> [Smita/djbw: trace a generic UUID if the type is unknown]
> [Jonathan: clean up pci and device state error handling]
> [iweiny: consolidate the trace function]
> ---
> drivers/cxl/core/mbox.c | 49 ++++++++++++++++++++++++++++-----------
> drivers/cxl/cxlmem.h | 4 ++++
> drivers/cxl/pci.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++-
> include/linux/cxl-event.h | 1 +
> 4 files changed, 98 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 06957696247b..b801faaccd45 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -836,21 +836,44 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>
> -static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> - enum cxl_event_log_type type,
> - struct cxl_event_record_raw *record)
> +void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> + enum cxl_event_log_type type,
> + enum cxl_event_type event_type,
> + const uuid_t *uuid, union cxl_event *evt)
> {
> - union cxl_event *evt = &record->event;
> - uuid_t *id = &record->id;
> -
> - if (uuid_equal(id, &CXL_EVENT_GEN_MEDIA_UUID))
> + switch (event_type) {
> + case CXL_CPER_EVENT_GEN_MEDIA:
> trace_cxl_general_media(cxlmd, type, &evt->gen_media);
> - else if (uuid_equal(id, &CXL_EVENT_DRAM_UUID))
> + break;
Might as well return directly and save a reviewer having to check if anything else happens
after the switch
> + case CXL_CPER_EVENT_DRAM:
> trace_cxl_dram(cxlmd, type, &evt->dram);
> - else if (uuid_equal(id, &CXL_EVENT_MEM_MODULE_UUID))
> + break;
> + case CXL_CPER_EVENT_MEM_MODULE:
> trace_cxl_memory_module(cxlmd, type, &evt->mem_module);
> - else
> - trace_cxl_generic_event(cxlmd, type, id, &evt->generic);
> + break;
> + case CXL_CPER_EVENT_GENERIC:
> + default:
> + trace_cxl_generic_event(cxlmd, type, uuid, &evt->generic);
> + break;
> + }
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, CXL);
On Wed, 20 Dec 2023 16:17:27 -0800
Ira Weiny <[email protected]> wrote:
> Series status/background
> ========================
>
> Smita has been a great help with this series. Thank you again!
>
> Smita's testing found that the GHES code ended up printing the events
> twice. This version avoids the duplicate print by calling the callback
> from the GHES code instead of the EFI code as suggested by Dan.
I'm not sure this is working as intended.
There is nothing gating the call in ghes_proc() of ghes_print_estatus()
and now the EFI code handling that pretty printed things is missing we get
the horrible kernel logging for an unknown block instead.
So I think we need some minimal code in cper.c to match the guids then not
log them (on basis we are arguing there is no need for new cper records).
Otherwise we are in for some messy kernel logs
Something like:
{1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
{1}[Hardware Error]: event severity: recoverable
{1}[Hardware Error]: Error 0, type: recoverable
{1}[Hardware Error]: section type: unknown, fbcd0a77-c260-417f-85a9-088b1621eba6
{1}[Hardware Error]: section length: 0x90
{1}[Hardware Error]: 00000000: 00000090 00000007 00000000 0d938086 ................
{1}[Hardware Error]: 00000010: 00100000 00000000 00040000 00000000 ................
{1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
{1}[Hardware Error]: 00000030: 00000000 00000000 00000000 00000000 ................
{1}[Hardware Error]: 00000040: 00000000 00000000 00000000 00000000 ................
{1}[Hardware Error]: 00000050: 00000000 00000000 00000000 00000000 ................
{1}[Hardware Error]: 00000060: 00000000 00000000 00000000 00000000 ................
{1}[Hardware Error]: 00000070: 00000000 00000000 00000000 00000000 ................
{1}[Hardware Error]: 00000080: 00000000 00000000 00000000 00000000 ................
cxl_general_media: memdev=mem1 host=0000:10:00.0 serial=4 log=Informational : time=0 uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=0 flags='' handle=0 related_handle=0 maint_op_class=0 : dpa=0 dpa_flags='' descriptor='' type='ECC Error' transaction_type='Unknown' channel=0 rank=0 device=0 comp_id=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 validity_flags=''
(I'm filling the record with 0s currently)
On 1/8/2024 8:58 AM, Jonathan Cameron wrote:
> On Wed, 20 Dec 2023 16:17:27 -0800
> Ira Weiny <[email protected]> wrote:
>
>> Series status/background
>> ========================
>>
>> Smita has been a great help with this series. Thank you again!
>>
>> Smita's testing found that the GHES code ended up printing the events
>> twice. This version avoids the duplicate print by calling the callback
>> from the GHES code instead of the EFI code as suggested by Dan.
>
> I'm not sure this is working as intended.
>
> There is nothing gating the call in ghes_proc() of ghes_print_estatus()
> and now the EFI code handling that pretty printed things is missing we get
> the horrible kernel logging for an unknown block instead.
>
> So I think we need some minimal code in cper.c to match the guids then not
> log them (on basis we are arguing there is no need for new cper records).
> Otherwise we are in for some messy kernel logs
>
> Something like:
>
> {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> {1}[Hardware Error]: event severity: recoverable
> {1}[Hardware Error]: Error 0, type: recoverable
> {1}[Hardware Error]: section type: unknown, fbcd0a77-c260-417f-85a9-088b1621eba6
> {1}[Hardware Error]: section length: 0x90
> {1}[Hardware Error]: 00000000: 00000090 00000007 00000000 0d938086 ................
> {1}[Hardware Error]: 00000010: 00100000 00000000 00040000 00000000 ................
> {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
> {1}[Hardware Error]: 00000030: 00000000 00000000 00000000 00000000 ................
> {1}[Hardware Error]: 00000040: 00000000 00000000 00000000 00000000 ................
> {1}[Hardware Error]: 00000050: 00000000 00000000 00000000 00000000 ................
> {1}[Hardware Error]: 00000060: 00000000 00000000 00000000 00000000 ................
> {1}[Hardware Error]: 00000070: 00000000 00000000 00000000 00000000 ................
> {1}[Hardware Error]: 00000080: 00000000 00000000 00000000 00000000 ................
> cxl_general_media: memdev=mem1 host=0000:10:00.0 serial=4 log=Informational : time=0 uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=0 flags='' handle=0 related_handle=0 maint_op_class=0 : dpa=0 dpa_flags='' descriptor='' type='ECC Error' transaction_type='Unknown' channel=0 rank=0 device=0 comp_id=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 validity_flags=''
>
> (I'm filling the record with 0s currently)
Yeah, when I tested this, I thought its okay for the hexdump to be there
in dmesg from EFI as the handling is done in trace events from GHES.
If, we need to handle from EFI, then it would be a good reason to move
the GUIDs out from GHES and place it in a common location for EFI/cper
to share similar to protocol errors.
Thanks,
Smita
>
Smita Koralahalli wrote:
> On 1/8/2024 8:58 AM, Jonathan Cameron wrote:
> > On Wed, 20 Dec 2023 16:17:27 -0800
> > Ira Weiny <[email protected]> wrote:
> >
> >> Series status/background
> >> ========================
> >>
> >> Smita has been a great help with this series. Thank you again!
> >>
> >> Smita's testing found that the GHES code ended up printing the events
> >> twice. This version avoids the duplicate print by calling the callback
> >> from the GHES code instead of the EFI code as suggested by Dan.
> >
> > I'm not sure this is working as intended.
> >
> > There is nothing gating the call in ghes_proc() of ghes_print_estatus()
> > and now the EFI code handling that pretty printed things is missing we get
> > the horrible kernel logging for an unknown block instead.
> >
> > So I think we need some minimal code in cper.c to match the guids then not
> > log them (on basis we are arguing there is no need for new cper records).
> > Otherwise we are in for some messy kernel logs
> >
> > Something like:
> >
> > {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> > {1}[Hardware Error]: event severity: recoverable
> > {1}[Hardware Error]: Error 0, type: recoverable
> > {1}[Hardware Error]: section type: unknown, fbcd0a77-c260-417f-85a9-088b1621eba6
> > {1}[Hardware Error]: section length: 0x90
> > {1}[Hardware Error]: 00000000: 00000090 00000007 00000000 0d938086 ................
> > {1}[Hardware Error]: 00000010: 00100000 00000000 00040000 00000000 ................
> > {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
> > {1}[Hardware Error]: 00000030: 00000000 00000000 00000000 00000000 ................
> > {1}[Hardware Error]: 00000040: 00000000 00000000 00000000 00000000 ................
> > {1}[Hardware Error]: 00000050: 00000000 00000000 00000000 00000000 ................
> > {1}[Hardware Error]: 00000060: 00000000 00000000 00000000 00000000 ................
> > {1}[Hardware Error]: 00000070: 00000000 00000000 00000000 00000000 ................
> > {1}[Hardware Error]: 00000080: 00000000 00000000 00000000 00000000 ................
> > cxl_general_media: memdev=mem1 host=0000:10:00.0 serial=4 log=Informational : time=0 uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=0 flags='' handle=0 related_handle=0 maint_op_class=0 : dpa=0 dpa_flags='' descriptor='' type='ECC Error' transaction_type='Unknown' channel=0 rank=0 device=0 comp_id=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 validity_flags=''
> >
> > (I'm filling the record with 0s currently)
>
> Yeah, when I tested this, I thought its okay for the hexdump to be there
> in dmesg from EFI as the handling is done in trace events from GHES.
>
> If, we need to handle from EFI, then it would be a good reason to move
> the GUIDs out from GHES and place it in a common location for EFI/cper
> to share similar to protocol errors.
Ah, yes, my expectation was more aligned with Jonathan's observation to
do the processing in GHES code *and* skip the processing in the CPER
code, something like:
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 56a5d2ef9e0a..e13e5fa4df4b 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -666,30 +666,6 @@ static cxl_cper_callback cper_callback;
/* CXL Event record UUIDs are formatted as GUIDs and reported in section type */
-/*
- * General Media Event Record
- * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
- */
-#define CPER_SEC_CXL_GEN_MEDIA_GUID \
- GUID_INIT(0xfbcd0a77, 0xc260, 0x417f, \
- 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6)
-
-/*
- * DRAM Event Record
- * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
- */
-#define CPER_SEC_CXL_DRAM_GUID \
- GUID_INIT(0x601dcbb3, 0x9c06, 0x4eab, \
- 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24)
-
-/*
- * Memory Module Event Record
- * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
- */
-#define CPER_SEC_CXL_MEM_MODULE_GUID \
- GUID_INIT(0xfe927475, 0xdd59, 0x4339, \
- 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)
-
static void cxl_cper_post_event(enum cxl_event_type event_type,
struct cxl_cper_event_rec *rec)
{
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 35c37f667781..0a4eed470750 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -24,6 +24,7 @@
#include <linux/bcd.h>
#include <acpi/ghes.h>
#include <ras/ras_event.h>
+#include <linux/cxl-event.h>
#include "cper_cxl.h"
/*
@@ -607,6 +608,15 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
cper_print_prot_err(newpfx, prot_err);
else
goto err_section_too_small;
+ } else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) {
+ printk("%ssection_type: CXL General Media Error\n", newpfx);
+ /* see: cxl_cper_event_call() */
+ } else if (guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID)) {
+ printk("%ssection_type: CXL DRAM Error\n", newpfx);
+ /* see: cxl_cper_event_call() */
+ } else if (guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE_GUID)) {
+ printk("%ssection_type: CXL Memory Module Error\n", newpfx);
+ /* see: cxl_cper_event_call() */
} else {
const void *err = acpi_hest_get_payload(gdata);
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index 17eadee819b6..6d9a7df88d4a 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -1,12 +1,31 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_CXL_EVENT_H
#define _LINUX_CXL_EVENT_H
+#include <linux/uuid.h>
/*
- * CXL event records; CXL rev 3.0
- *
- * Copyright(c) 2023 Intel Corporation.
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CPER_SEC_CXL_GEN_MEDIA_GUID \
+ GUID_INIT(0xfbcd0a77, 0xc260, 0x417f, \
+ 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6)
+
+/*
+ * DRAM Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+#define CPER_SEC_CXL_DRAM_GUID \
+ GUID_INIT(0x601dcbb3, 0x9c06, 0x4eab, \
+ 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24)
+
+/*
+ * Memory Module Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
*/
+#define CPER_SEC_CXL_MEM_MODULE_GUID \
+ GUID_INIT(0xfe927475, 0xdd59, 0x4339, \
+ 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)
struct cxl_event_record_hdr {
u8 length;
Dan Williams wrote:
> Smita Koralahalli wrote:
> > On 1/8/2024 8:58 AM, Jonathan Cameron wrote:
> > > On Wed, 20 Dec 2023 16:17:27 -0800
> > > Ira Weiny <[email protected]> wrote:
> > >
> > >> Series status/background
> > >> ========================
> > >>
> > >> Smita has been a great help with this series. Thank you again!
> > >>
> > >> Smita's testing found that the GHES code ended up printing the events
> > >> twice. This version avoids the duplicate print by calling the callback
> > >> from the GHES code instead of the EFI code as suggested by Dan.
> > >
> > > I'm not sure this is working as intended.
> > >
> > > There is nothing gating the call in ghes_proc() of ghes_print_estatus()
> > > and now the EFI code handling that pretty printed things is missing we get
> > > the horrible kernel logging for an unknown block instead.
> > >
> > > So I think we need some minimal code in cper.c to match the guids then not
> > > log them (on basis we are arguing there is no need for new cper records).
> > > Otherwise we are in for some messy kernel logs
> > >
> > > Something like:
> > >
> > > {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> > > {1}[Hardware Error]: event severity: recoverable
> > > {1}[Hardware Error]: Error 0, type: recoverable
> > > {1}[Hardware Error]: section type: unknown, fbcd0a77-c260-417f-85a9-088b1621eba6
> > > {1}[Hardware Error]: section length: 0x90
> > > {1}[Hardware Error]: 00000000: 00000090 00000007 00000000 0d938086 ................
> > > {1}[Hardware Error]: 00000010: 00100000 00000000 00040000 00000000 ................
> > > {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
> > > {1}[Hardware Error]: 00000030: 00000000 00000000 00000000 00000000 ................
> > > {1}[Hardware Error]: 00000040: 00000000 00000000 00000000 00000000 ................
> > > {1}[Hardware Error]: 00000050: 00000000 00000000 00000000 00000000 ................
> > > {1}[Hardware Error]: 00000060: 00000000 00000000 00000000 00000000 ................
> > > {1}[Hardware Error]: 00000070: 00000000 00000000 00000000 00000000 ................
> > > {1}[Hardware Error]: 00000080: 00000000 00000000 00000000 00000000 ................
> > > cxl_general_media: memdev=mem1 host=0000:10:00.0 serial=4 log=Informational : time=0 uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=0 flags='' handle=0 related_handle=0 maint_op_class=0 : dpa=0 dpa_flags='' descriptor='' type='ECC Error' transaction_type='Unknown' channel=0 rank=0 device=0 comp_id=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 validity_flags=''
> > >
> > > (I'm filling the record with 0s currently)
> >
> > Yeah, when I tested this, I thought its okay for the hexdump to be there
> > in dmesg from EFI as the handling is done in trace events from GHES.
> >
> > If, we need to handle from EFI, then it would be a good reason to move
> > the GUIDs out from GHES and place it in a common location for EFI/cper
> > to share similar to protocol errors.
>
> Ah, yes, my expectation was more aligned with Jonathan's observation to
> do the processing in GHES code *and* skip the processing in the CPER
> code, something like:
>
Agreed this was intended I did not realize the above.
>
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index 35c37f667781..0a4eed470750 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -24,6 +24,7 @@
> #include <linux/bcd.h>
> #include <acpi/ghes.h>
> #include <ras/ras_event.h>
> +#include <linux/cxl-event.h>
> #include "cper_cxl.h"
>
> /*
> @@ -607,6 +608,15 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
> cper_print_prot_err(newpfx, prot_err);
> else
> goto err_section_too_small;
> + } else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) {
> + printk("%ssection_type: CXL General Media Error\n", newpfx);
Do we want the printk's here? I did not realize that a generic event
would be printed. So intention was nothing would be done on this path.
Ira
Ira Weiny wrote:
> Dan Williams wrote:
> > Smita Koralahalli wrote:
> > > On 1/8/2024 8:58 AM, Jonathan Cameron wrote:
> > > > On Wed, 20 Dec 2023 16:17:27 -0800
> > > > Ira Weiny <[email protected]> wrote:
> > > >
> > > >> Series status/background
> > > >> ========================
> > > >>
> > > >> Smita has been a great help with this series. Thank you again!
> > > >>
> > > >> Smita's testing found that the GHES code ended up printing the events
> > > >> twice. This version avoids the duplicate print by calling the callback
> > > >> from the GHES code instead of the EFI code as suggested by Dan.
> > > >
> > > > I'm not sure this is working as intended.
> > > >
> > > > There is nothing gating the call in ghes_proc() of ghes_print_estatus()
> > > > and now the EFI code handling that pretty printed things is missing we get
> > > > the horrible kernel logging for an unknown block instead.
> > > >
> > > > So I think we need some minimal code in cper.c to match the guids then not
> > > > log them (on basis we are arguing there is no need for new cper records).
> > > > Otherwise we are in for some messy kernel logs
> > > >
> > > > Something like:
> > > >
> > > > {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> > > > {1}[Hardware Error]: event severity: recoverable
> > > > {1}[Hardware Error]: Error 0, type: recoverable
> > > > {1}[Hardware Error]: section type: unknown, fbcd0a77-c260-417f-85a9-088b1621eba6
> > > > {1}[Hardware Error]: section length: 0x90
> > > > {1}[Hardware Error]: 00000000: 00000090 00000007 00000000 0d938086 ................
> > > > {1}[Hardware Error]: 00000010: 00100000 00000000 00040000 00000000 ................
> > > > {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
> > > > {1}[Hardware Error]: 00000030: 00000000 00000000 00000000 00000000 ................
> > > > {1}[Hardware Error]: 00000040: 00000000 00000000 00000000 00000000 ................
> > > > {1}[Hardware Error]: 00000050: 00000000 00000000 00000000 00000000 ................
> > > > {1}[Hardware Error]: 00000060: 00000000 00000000 00000000 00000000 ................
> > > > {1}[Hardware Error]: 00000070: 00000000 00000000 00000000 00000000 ................
> > > > {1}[Hardware Error]: 00000080: 00000000 00000000 00000000 00000000 ................
> > > > cxl_general_media: memdev=mem1 host=0000:10:00.0 serial=4 log=Informational : time=0 uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=0 flags='' handle=0 related_handle=0 maint_op_class=0 : dpa=0 dpa_flags='' descriptor='' type='ECC Error' transaction_type='Unknown' channel=0 rank=0 device=0 comp_id=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 validity_flags=''
> > > >
> > > > (I'm filling the record with 0s currently)
> > >
> > > Yeah, when I tested this, I thought its okay for the hexdump to be there
> > > in dmesg from EFI as the handling is done in trace events from GHES.
> > >
> > > If, we need to handle from EFI, then it would be a good reason to move
> > > the GUIDs out from GHES and place it in a common location for EFI/cper
> > > to share similar to protocol errors.
> >
> > Ah, yes, my expectation was more aligned with Jonathan's observation to
> > do the processing in GHES code *and* skip the processing in the CPER
> > code, something like:
> >
>
> Agreed this was intended I did not realize the above.
>
> >
> > diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> > index 35c37f667781..0a4eed470750 100644
> > --- a/drivers/firmware/efi/cper.c
> > +++ b/drivers/firmware/efi/cper.c
> > @@ -24,6 +24,7 @@
> > #include <linux/bcd.h>
> > #include <acpi/ghes.h>
> > #include <ras/ras_event.h>
> > +#include <linux/cxl-event.h>
> > #include "cper_cxl.h"
> >
> > /*
> > @@ -607,6 +608,15 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
> > cper_print_prot_err(newpfx, prot_err);
> > else
> > goto err_section_too_small;
> > + } else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) {
> > + printk("%ssection_type: CXL General Media Error\n", newpfx);
>
> Do we want the printk's here? I did not realize that a generic event
> would be printed. So intention was nothing would be done on this path.
I think we do otherwise the kernel will say
{1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
{1}[Hardware Error]: event severity: recoverable
{1}[Hardware Error]: Error 0, type: recoverable
...
..leaving the user hanging vs:
{1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
{1}[Hardware Error]: event severity: recoverable
{1}[Hardware Error]: Error 0, type: recoverable
{1}[Hardware Error]: section type: General Media Error
..as an indicator to go follow up with rasdaemon or whatever else is
doing the detailed monitoring of CXL events.
On Mon, 8 Jan 2024 18:59:16 -0800
Dan Williams <[email protected]> wrote:
> Ira Weiny wrote:
> > Dan Williams wrote:
> > > Smita Koralahalli wrote:
> > > > On 1/8/2024 8:58 AM, Jonathan Cameron wrote:
> > > > > On Wed, 20 Dec 2023 16:17:27 -0800
> > > > > Ira Weiny <[email protected]> wrote:
> > > > >
> > > > >> Series status/background
> > > > >> ========================
> > > > >>
> > > > >> Smita has been a great help with this series. Thank you again!
> > > > >>
> > > > >> Smita's testing found that the GHES code ended up printing the events
> > > > >> twice. This version avoids the duplicate print by calling the callback
> > > > >> from the GHES code instead of the EFI code as suggested by Dan.
> > > > >
> > > > > I'm not sure this is working as intended.
> > > > >
> > > > > There is nothing gating the call in ghes_proc() of ghes_print_estatus()
> > > > > and now the EFI code handling that pretty printed things is missing we get
> > > > > the horrible kernel logging for an unknown block instead.
> > > > >
> > > > > So I think we need some minimal code in cper.c to match the guids then not
> > > > > log them (on basis we are arguing there is no need for new cper records).
> > > > > Otherwise we are in for some messy kernel logs
> > > > >
> > > > > Something like:
> > > > >
> > > > > {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> > > > > {1}[Hardware Error]: event severity: recoverable
> > > > > {1}[Hardware Error]: Error 0, type: recoverable
> > > > > {1}[Hardware Error]: section type: unknown, fbcd0a77-c260-417f-85a9-088b1621eba6
> > > > > {1}[Hardware Error]: section length: 0x90
> > > > > {1}[Hardware Error]: 00000000: 00000090 00000007 00000000 0d938086 ................
> > > > > {1}[Hardware Error]: 00000010: 00100000 00000000 00040000 00000000 ................
> > > > > {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
> > > > > {1}[Hardware Error]: 00000030: 00000000 00000000 00000000 00000000 ................
> > > > > {1}[Hardware Error]: 00000040: 00000000 00000000 00000000 00000000 ................
> > > > > {1}[Hardware Error]: 00000050: 00000000 00000000 00000000 00000000 ................
> > > > > {1}[Hardware Error]: 00000060: 00000000 00000000 00000000 00000000 ................
> > > > > {1}[Hardware Error]: 00000070: 00000000 00000000 00000000 00000000 ................
> > > > > {1}[Hardware Error]: 00000080: 00000000 00000000 00000000 00000000 ................
> > > > > cxl_general_media: memdev=mem1 host=0000:10:00.0 serial=4 log=Informational : time=0 uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=0 flags='' handle=0 related_handle=0 maint_op_class=0 : dpa=0 dpa_flags='' descriptor='' type='ECC Error' transaction_type='Unknown' channel=0 rank=0 device=0 comp_id=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 validity_flags=''
> > > > >
> > > > > (I'm filling the record with 0s currently)
> > > >
> > > > Yeah, when I tested this, I thought its okay for the hexdump to be there
> > > > in dmesg from EFI as the handling is done in trace events from GHES.
> > > >
> > > > If, we need to handle from EFI, then it would be a good reason to move
> > > > the GUIDs out from GHES and place it in a common location for EFI/cper
> > > > to share similar to protocol errors.
> > >
> > > Ah, yes, my expectation was more aligned with Jonathan's observation to
> > > do the processing in GHES code *and* skip the processing in the CPER
> > > code, something like:
> > >
> >
> > Agreed this was intended I did not realize the above.
> >
> > >
> > > diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> > > index 35c37f667781..0a4eed470750 100644
> > > --- a/drivers/firmware/efi/cper.c
> > > +++ b/drivers/firmware/efi/cper.c
> > > @@ -24,6 +24,7 @@
> > > #include <linux/bcd.h>
> > > #include <acpi/ghes.h>
> > > #include <ras/ras_event.h>
> > > +#include <linux/cxl-event.h>
> > > #include "cper_cxl.h"
> > >
> > > /*
> > > @@ -607,6 +608,15 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
> > > cper_print_prot_err(newpfx, prot_err);
> > > else
> > > goto err_section_too_small;
> > > + } else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) {
> > > + printk("%ssection_type: CXL General Media Error\n", newpfx);
> >
> > Do we want the printk's here? I did not realize that a generic event
> > would be printed. So intention was nothing would be done on this path.
>
> I think we do otherwise the kernel will say
>
> {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> {1}[Hardware Error]: event severity: recoverable
> {1}[Hardware Error]: Error 0, type: recoverable
> ...
>
> ...leaving the user hanging vs:
>
> {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> {1}[Hardware Error]: event severity: recoverable
> {1}[Hardware Error]: Error 0, type: recoverable
> {1}[Hardware Error]: section type: General Media Error
>
> ...as an indicator to go follow up with rasdaemon or whatever else is
> doing the detailed monitoring of CXL events.
Agreed. Maybe push it out to a static const table though.
As the argument was that we shouldn't be spitting out big logs in this
modern world, let's make it easy for people to add more entries.
struct skip_me {
guid_t guid;
const char *name;
};
static const struct skip_me skip_me = {
{ &CPER_SEC_CXL_GEN_MEDIA, "CXL General Media Error" },
etc.
};
for (i = 0; i < ARRAY_SIZE(skip_me); i++) {
if (guid_equal(sec_type, skip_me[i].guid)) {
printk("%asection_type: %s\n", newpfx, skip_me[i].name);
break;
}
or something like that in the final else.
Jonathan Cameron wrote:
> On Mon, 8 Jan 2024 18:59:16 -0800
> Dan Williams <[email protected]> wrote:
>
> > Ira Weiny wrote:
> > > Dan Williams wrote:
> > > > Smita Koralahalli wrote:
> > > > > On 1/8/2024 8:58 AM, Jonathan Cameron wrote:
> > > > > > On Wed, 20 Dec 2023 16:17:27 -0800
> > > > > > Ira Weiny <[email protected]> wrote:
> > > > > >
> > > > > >> Series status/background
> > > > > >> ========================
> > > > > >>
> > > > > >> Smita has been a great help with this series. Thank you again!
> > > > > >>
> > > > > >> Smita's testing found that the GHES code ended up printing the events
> > > > > >> twice. This version avoids the duplicate print by calling the callback
> > > > > >> from the GHES code instead of the EFI code as suggested by Dan.
> > > > > >
> > > > > > I'm not sure this is working as intended.
> > > > > >
> > > > > > There is nothing gating the call in ghes_proc() of ghes_print_estatus()
> > > > > > and now the EFI code handling that pretty printed things is missing we get
> > > > > > the horrible kernel logging for an unknown block instead.
> > > > > >
> > > > > > So I think we need some minimal code in cper.c to match the guids then not
> > > > > > log them (on basis we are arguing there is no need for new cper records).
> > > > > > Otherwise we are in for some messy kernel logs
> > > > > >
> > > > > > Something like:
> > > > > >
> > > > > > {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> > > > > > {1}[Hardware Error]: event severity: recoverable
> > > > > > {1}[Hardware Error]: Error 0, type: recoverable
> > > > > > {1}[Hardware Error]: section type: unknown, fbcd0a77-c260-417f-85a9-088b1621eba6
> > > > > > {1}[Hardware Error]: section length: 0x90
> > > > > > {1}[Hardware Error]: 00000000: 00000090 00000007 00000000 0d938086 ................
> > > > > > {1}[Hardware Error]: 00000010: 00100000 00000000 00040000 00000000 ................
> > > > > > {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
> > > > > > {1}[Hardware Error]: 00000030: 00000000 00000000 00000000 00000000 ................
> > > > > > {1}[Hardware Error]: 00000040: 00000000 00000000 00000000 00000000 ................
> > > > > > {1}[Hardware Error]: 00000050: 00000000 00000000 00000000 00000000 ................
> > > > > > {1}[Hardware Error]: 00000060: 00000000 00000000 00000000 00000000 ................
> > > > > > {1}[Hardware Error]: 00000070: 00000000 00000000 00000000 00000000 ................
> > > > > > {1}[Hardware Error]: 00000080: 00000000 00000000 00000000 00000000 ................
> > > > > > cxl_general_media: memdev=mem1 host=0000:10:00.0 serial=4 log=Informational : time=0 uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=0 flags='' handle=0 related_handle=0 maint_op_class=0 : dpa=0 dpa_flags='' descriptor='' type='ECC Error' transaction_type='Unknown' channel=0 rank=0 device=0 comp_id=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 validity_flags=''
> > > > > >
> > > > > > (I'm filling the record with 0s currently)
> > > > >
> > > > > Yeah, when I tested this, I thought its okay for the hexdump to be there
> > > > > in dmesg from EFI as the handling is done in trace events from GHES.
> > > > >
> > > > > If, we need to handle from EFI, then it would be a good reason to move
> > > > > the GUIDs out from GHES and place it in a common location for EFI/cper
> > > > > to share similar to protocol errors.
> > > >
> > > > Ah, yes, my expectation was more aligned with Jonathan's observation to
> > > > do the processing in GHES code *and* skip the processing in the CPER
> > > > code, something like:
> > > >
> > >
> > > Agreed this was intended I did not realize the above.
> > >
> > > >
> > > > diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> > > > index 35c37f667781..0a4eed470750 100644
> > > > --- a/drivers/firmware/efi/cper.c
> > > > +++ b/drivers/firmware/efi/cper.c
> > > > @@ -24,6 +24,7 @@
> > > > #include <linux/bcd.h>
> > > > #include <acpi/ghes.h>
> > > > #include <ras/ras_event.h>
> > > > +#include <linux/cxl-event.h>
> > > > #include "cper_cxl.h"
> > > >
> > > > /*
> > > > @@ -607,6 +608,15 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
> > > > cper_print_prot_err(newpfx, prot_err);
> > > > else
> > > > goto err_section_too_small;
> > > > + } else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) {
> > > > + printk("%ssection_type: CXL General Media Error\n", newpfx);
> > >
> > > Do we want the printk's here? I did not realize that a generic event
> > > would be printed. So intention was nothing would be done on this path.
> >
> > I think we do otherwise the kernel will say
> >
> > {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> > {1}[Hardware Error]: event severity: recoverable
> > {1}[Hardware Error]: Error 0, type: recoverable
> > ...
> >
> > ...leaving the user hanging vs:
> >
> > {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> > {1}[Hardware Error]: event severity: recoverable
> > {1}[Hardware Error]: Error 0, type: recoverable
> > {1}[Hardware Error]: section type: General Media Error
> >
> > ...as an indicator to go follow up with rasdaemon or whatever else is
> > doing the detailed monitoring of CXL events.
>
> Agreed. Maybe push it out to a static const table though.
> As the argument was that we shouldn't be spitting out big logs in this
> modern world, let's make it easy for people to add more entries.
>
> struct skip_me {
> guid_t guid;
> const char *name;
> };
> static const struct skip_me skip_me = {
> { &CPER_SEC_CXL_GEN_MEDIA, "CXL General Media Error" },
> etc.
> };
>
> for (i = 0; i < ARRAY_SIZE(skip_me); i++) {
> if (guid_equal(sec_type, skip_me[i].guid)) {
> printk("%asection_type: %s\n", newpfx, skip_me[i].name);
> break;
> }
>
> or something like that in the final else.
I like it.
Any concerns with that being an -rc fixup, and move ahead with the base
enabling for v6.8? I don't see that follow-on as a reason to push the
whole thing to v6.9.
Dan Williams wrote:
> Jonathan Cameron wrote:
> > On Mon, 8 Jan 2024 18:59:16 -0800
> > Dan Williams <[email protected]> wrote:
> >
> > > Ira Weiny wrote:
> > > > Dan Williams wrote:
> > > > > Smita Koralahalli wrote:
> > > > > > On 1/8/2024 8:58 AM, Jonathan Cameron wrote:
> > > > > > > On Wed, 20 Dec 2023 16:17:27 -0800
> > > > > > > Ira Weiny <[email protected]> wrote:
> > > > > > >
> > > > > > >> Series status/background
> > > > > > >> ========================
> > > > > > >>
> > > > > > >> Smita has been a great help with this series. Thank you again!
> > > > > > >>
> > > > > > >> Smita's testing found that the GHES code ended up printing the events
> > > > > > >> twice. This version avoids the duplicate print by calling the callback
> > > > > > >> from the GHES code instead of the EFI code as suggested by Dan.
> > > > > > >
> > > > > > > I'm not sure this is working as intended.
> > > > > > >
> > > > > > > There is nothing gating the call in ghes_proc() of ghes_print_estatus()
> > > > > > > and now the EFI code handling that pretty printed things is missing we get
> > > > > > > the horrible kernel logging for an unknown block instead.
> > > > > > >
> > > > > > > So I think we need some minimal code in cper.c to match the guids then not
> > > > > > > log them (on basis we are arguing there is no need for new cper records).
> > > > > > > Otherwise we are in for some messy kernel logs
> > > > > > >
> > > > > > > Something like:
> > > > > > >
> > > > > > > {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> > > > > > > {1}[Hardware Error]: event severity: recoverable
> > > > > > > {1}[Hardware Error]: Error 0, type: recoverable
> > > > > > > {1}[Hardware Error]: section type: unknown, fbcd0a77-c260-417f-85a9-088b1621eba6
> > > > > > > {1}[Hardware Error]: section length: 0x90
> > > > > > > {1}[Hardware Error]: 00000000: 00000090 00000007 00000000 0d938086 ................
> > > > > > > {1}[Hardware Error]: 00000010: 00100000 00000000 00040000 00000000 ................
> > > > > > > {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
> > > > > > > {1}[Hardware Error]: 00000030: 00000000 00000000 00000000 00000000 ................
> > > > > > > {1}[Hardware Error]: 00000040: 00000000 00000000 00000000 00000000 ................
> > > > > > > {1}[Hardware Error]: 00000050: 00000000 00000000 00000000 00000000 ................
> > > > > > > {1}[Hardware Error]: 00000060: 00000000 00000000 00000000 00000000 ................
> > > > > > > {1}[Hardware Error]: 00000070: 00000000 00000000 00000000 00000000 ................
> > > > > > > {1}[Hardware Error]: 00000080: 00000000 00000000 00000000 00000000 ................
> > > > > > > cxl_general_media: memdev=mem1 host=0000:10:00.0 serial=4 log=Informational : time=0 uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=0 flags='' handle=0 related_handle=0 maint_op_class=0 : dpa=0 dpa_flags='' descriptor='' type='ECC Error' transaction_type='Unknown' channel=0 rank=0 device=0 comp_id=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 validity_flags=''
> > > > > > >
> > > > > > > (I'm filling the record with 0s currently)
> > > > > >
> > > > > > Yeah, when I tested this, I thought its okay for the hexdump to be there
> > > > > > in dmesg from EFI as the handling is done in trace events from GHES.
> > > > > >
> > > > > > If, we need to handle from EFI, then it would be a good reason to move
> > > > > > the GUIDs out from GHES and place it in a common location for EFI/cper
> > > > > > to share similar to protocol errors.
> > > > >
> > > > > Ah, yes, my expectation was more aligned with Jonathan's observation to
> > > > > do the processing in GHES code *and* skip the processing in the CPER
> > > > > code, something like:
> > > > >
> > > >
> > > > Agreed this was intended I did not realize the above.
> > > >
> > > > >
> > > > > diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> > > > > index 35c37f667781..0a4eed470750 100644
> > > > > --- a/drivers/firmware/efi/cper.c
> > > > > +++ b/drivers/firmware/efi/cper.c
> > > > > @@ -24,6 +24,7 @@
> > > > > #include <linux/bcd.h>
> > > > > #include <acpi/ghes.h>
> > > > > #include <ras/ras_event.h>
> > > > > +#include <linux/cxl-event.h>
> > > > > #include "cper_cxl.h"
> > > > >
> > > > > /*
> > > > > @@ -607,6 +608,15 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
> > > > > cper_print_prot_err(newpfx, prot_err);
> > > > > else
> > > > > goto err_section_too_small;
> > > > > + } else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) {
> > > > > + printk("%ssection_type: CXL General Media Error\n", newpfx);
> > > >
> > > > Do we want the printk's here? I did not realize that a generic event
> > > > would be printed. So intention was nothing would be done on this path.
> > >
> > > I think we do otherwise the kernel will say
> > >
> > > {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> > > {1}[Hardware Error]: event severity: recoverable
> > > {1}[Hardware Error]: Error 0, type: recoverable
> > > ...
> > >
> > > ...leaving the user hanging vs:
> > >
> > > {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> > > {1}[Hardware Error]: event severity: recoverable
> > > {1}[Hardware Error]: Error 0, type: recoverable
> > > {1}[Hardware Error]: section type: General Media Error
> > >
> > > ...as an indicator to go follow up with rasdaemon or whatever else is
> > > doing the detailed monitoring of CXL events.
> >
> > Agreed. Maybe push it out to a static const table though.
> > As the argument was that we shouldn't be spitting out big logs in this
> > modern world, let's make it easy for people to add more entries.
> >
> > struct skip_me {
> > guid_t guid;
> > const char *name;
> > };
> > static const struct skip_me skip_me = {
> > { &CPER_SEC_CXL_GEN_MEDIA, "CXL General Media Error" },
> > etc.
> > };
> >
> > for (i = 0; i < ARRAY_SIZE(skip_me); i++) {
> > if (guid_equal(sec_type, skip_me[i].guid)) {
> > printk("%asection_type: %s\n", newpfx, skip_me[i].name);
> > break;
> > }
> >
> > or something like that in the final else.
>
> I like it.
>
> Any concerns with that being an -rc fixup, and move ahead with the base
> enabling for v6.8? I don't see that follow-on as a reason to push the
> whole thing to v6.9.
I will put it in -next for soak time and make an inclusion decision in a
few days after I hear back.
On Wed, 10 Jan 2024 at 00:30, Dan Williams <[email protected]> wrote:
>
> Dan Williams wrote:
> > Jonathan Cameron wrote:
> > > On Mon, 8 Jan 2024 18:59:16 -0800
> > > Dan Williams <[email protected]> wrote:
> > >
> > > > Ira Weiny wrote:
> > > > > Dan Williams wrote:
> > > > > > Smita Koralahalli wrote:
> > > > > > > On 1/8/2024 8:58 AM, Jonathan Cameron wrote:
> > > > > > > > On Wed, 20 Dec 2023 16:17:27 -0800
> > > > > > > > Ira Weiny <[email protected]> wrote:
> > > > > > > >
> > > > > > > >> Series status/background
> > > > > > > >> ========================
> > > > > > > >>
> > > > > > > >> Smita has been a great help with this series. Thank you again!
> > > > > > > >>
> > > > > > > >> Smita's testing found that the GHES code ended up printing the events
> > > > > > > >> twice. This version avoids the duplicate print by calling the callback
> > > > > > > >> from the GHES code instead of the EFI code as suggested by Dan.
> > > > > > > >
> > > > > > > > I'm not sure this is working as intended.
> > > > > > > >
> > > > > > > > There is nothing gating the call in ghes_proc() of ghes_print_estatus()
> > > > > > > > and now the EFI code handling that pretty printed things is missing we get
> > > > > > > > the horrible kernel logging for an unknown block instead.
> > > > > > > >
> > > > > > > > So I think we need some minimal code in cper.c to match the guids then not
> > > > > > > > log them (on basis we are arguing there is no need for new cper records).
> > > > > > > > Otherwise we are in for some messy kernel logs
> > > > > > > >
> > > > > > > > Something like:
> > > > > > > >
> > > > > > > > {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> > > > > > > > {1}[Hardware Error]: event severity: recoverable
> > > > > > > > {1}[Hardware Error]: Error 0, type: recoverable
> > > > > > > > {1}[Hardware Error]: section type: unknown, fbcd0a77-c260-417f-85a9-088b1621eba6
> > > > > > > > {1}[Hardware Error]: section length: 0x90
> > > > > > > > {1}[Hardware Error]: 00000000: 00000090 00000007 00000000 0d938086 ................
> > > > > > > > {1}[Hardware Error]: 00000010: 00100000 00000000 00040000 00000000 ................
> > > > > > > > {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
> > > > > > > > {1}[Hardware Error]: 00000030: 00000000 00000000 00000000 00000000 ................
> > > > > > > > {1}[Hardware Error]: 00000040: 00000000 00000000 00000000 00000000 ................
> > > > > > > > {1}[Hardware Error]: 00000050: 00000000 00000000 00000000 00000000 ................
> > > > > > > > {1}[Hardware Error]: 00000060: 00000000 00000000 00000000 00000000 ................
> > > > > > > > {1}[Hardware Error]: 00000070: 00000000 00000000 00000000 00000000 ................
> > > > > > > > {1}[Hardware Error]: 00000080: 00000000 00000000 00000000 00000000 ................
> > > > > > > > cxl_general_media: memdev=mem1 host=0000:10:00.0 serial=4 log=Informational : time=0 uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=0 flags='' handle=0 related_handle=0 maint_op_class=0 : dpa=0 dpa_flags='' descriptor='' type='ECC Error' transaction_type='Unknown' channel=0 rank=0 device=0 comp_id=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 validity_flags=''
> > > > > > > >
> > > > > > > > (I'm filling the record with 0s currently)
> > > > > > >
> > > > > > > Yeah, when I tested this, I thought its okay for the hexdump to be there
> > > > > > > in dmesg from EFI as the handling is done in trace events from GHES.
> > > > > > >
> > > > > > > If, we need to handle from EFI, then it would be a good reason to move
> > > > > > > the GUIDs out from GHES and place it in a common location for EFI/cper
> > > > > > > to share similar to protocol errors.
> > > > > >
> > > > > > Ah, yes, my expectation was more aligned with Jonathan's observation to
> > > > > > do the processing in GHES code *and* skip the processing in the CPER
> > > > > > code, something like:
> > > > > >
> > > > >
> > > > > Agreed this was intended I did not realize the above.
> > > > >
> > > > > >
> > > > > > diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> > > > > > index 35c37f667781..0a4eed470750 100644
> > > > > > --- a/drivers/firmware/efi/cper.c
> > > > > > +++ b/drivers/firmware/efi/cper.c
> > > > > > @@ -24,6 +24,7 @@
> > > > > > #include <linux/bcd.h>
> > > > > > #include <acpi/ghes.h>
> > > > > > #include <ras/ras_event.h>
> > > > > > +#include <linux/cxl-event.h>
> > > > > > #include "cper_cxl.h"
> > > > > >
> > > > > > /*
> > > > > > @@ -607,6 +608,15 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
> > > > > > cper_print_prot_err(newpfx, prot_err);
> > > > > > else
> > > > > > goto err_section_too_small;
> > > > > > + } else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) {
> > > > > > + printk("%ssection_type: CXL General Media Error\n", newpfx);
> > > > >
> > > > > Do we want the printk's here? I did not realize that a generic event
> > > > > would be printed. So intention was nothing would be done on this path.
> > > >
> > > > I think we do otherwise the kernel will say
> > > >
> > > > {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> > > > {1}[Hardware Error]: event severity: recoverable
> > > > {1}[Hardware Error]: Error 0, type: recoverable
> > > > ...
> > > >
> > > > ...leaving the user hanging vs:
> > > >
> > > > {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> > > > {1}[Hardware Error]: event severity: recoverable
> > > > {1}[Hardware Error]: Error 0, type: recoverable
> > > > {1}[Hardware Error]: section type: General Media Error
> > > >
> > > > ...as an indicator to go follow up with rasdaemon or whatever else is
> > > > doing the detailed monitoring of CXL events.
> > >
> > > Agreed. Maybe push it out to a static const table though.
> > > As the argument was that we shouldn't be spitting out big logs in this
> > > modern world, let's make it easy for people to add more entries.
> > >
> > > struct skip_me {
> > > guid_t guid;
> > > const char *name;
> > > };
> > > static const struct skip_me skip_me = {
> > > { &CPER_SEC_CXL_GEN_MEDIA, "CXL General Media Error" },
> > > etc.
> > > };
> > >
> > > for (i = 0; i < ARRAY_SIZE(skip_me); i++) {
> > > if (guid_equal(sec_type, skip_me[i].guid)) {
> > > printk("%asection_type: %s\n", newpfx, skip_me[i].name);
> > > break;
> > > }
> > >
> > > or something like that in the final else.
> >
> > I like it.
> >
> > Any concerns with that being an -rc fixup, and move ahead with the base
> > enabling for v6.8? I don't see that follow-on as a reason to push the
> > whole thing to v6.9.
>
> I will put it in -next for soak time and make an inclusion decision in a
> few days after I hear back.
>
For the series and however you want to handle the merge:
Acked-by: Ard Biesheuvel <[email protected]>
Jonathan Cameron wrote:
> On Wed, 20 Dec 2023 16:17:31 -0800
> Ira Weiny <[email protected]> wrote:
>
> > The UUID data is redundant in the known event trace types. The addition
> > of static defines allows the trace macros to create the UUID data inside
> > the trace thus removing unnecessary code.
> >
> > Have well known trace events use static data to set the uuid field based
> > on the event type.
> >
> > Suggested-by: Jonathan Cameron <[email protected]>
> > Signed-off-by: Ira Weiny <[email protected]>
> >
>
> > TP_STRUCT__entry(
> > CXL_EVT_TP_entry
> > @@ -422,7 +424,8 @@ TRACE_EVENT(cxl_dram,
> > ),
> >
> > TP_fast_assign(
> > - CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
> > + CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
> > + memcpy(&__entry->hdr_uuid, &CXL_EVENT_DRAM_UUID, sizeof(uuid_t));
>
> Hmm. Why not
>
> __entry->hdr_uuid = CXL_EVENT_DRAM_UUID;
> ?
>
> Compiler should be able to squish the stuff in the define down to data as as the
> UUID generation logic is pretty simple.
>
> I've not emulated the cper records for these yet, so not tested that works beyond
> compiling.
We can follow on with this conversion later as I see other usage of uuid
copying in trace events (bcache for instance). Although I probably would
not replace it with straight assignment and instead use the uuid_copy()
helper. Otherwise, why do {uuid,guid}_copy() helpers exist?
Jonathan Cameron wrote:
> On Wed, 20 Dec 2023 16:17:36 -0800
> Ira Weiny <[email protected]> wrote:
>
> > If the firmware has configured CXL event support to be firmware first
> > the OS can process those events through CPER records. The CXL layer has
> > unique DPA to HPA knowledge and standard event trace parsing in place.
> >
> > CPER records contain Bus, Device, Function information which can be used
> > to identify the PCI device which is sending the event.
> >
> > Change the PCI driver registration to include registration of a CXL
> > CPER callback to process events through the trace subsystem.
> >
> > Use new scoped based management to simplify the handling of the PCI
> > device object.
> >
> > NOTE this patch depends on Dan's addition of a device guard[1].
> >
> > [1] https://lore.kernel.org/all/170250854466.1522182.17555361077409628655.stgit@dwillia2-xfh.jf.intel.com/
> >
> One trivial comment inline.
> The guard change Dan suggests makes sense. Otherwise I'm fine with this.
> Reviewed-by: Jonathan Cameron <[email protected]>
>
> I'll bolt in the other stuff I need to test it from QEMU this week.
> Did the protocol error first, but these are easy to add now I have
> that working,
>
> Jonathan
> > ---
> > Changes for v5:
> > [Smita/djbw: trace a generic UUID if the type is unknown]
> > [Jonathan: clean up pci and device state error handling]
> > [iweiny: consolidate the trace function]
> > ---
> > drivers/cxl/core/mbox.c | 49 ++++++++++++++++++++++++++++-----------
> > drivers/cxl/cxlmem.h | 4 ++++
> > drivers/cxl/pci.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++-
> > include/linux/cxl-event.h | 1 +
> > 4 files changed, 98 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > index 06957696247b..b801faaccd45 100644
> > --- a/drivers/cxl/core/mbox.c
> > +++ b/drivers/cxl/core/mbox.c
> > @@ -836,21 +836,44 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
> > }
> > EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
> >
> > -static void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> > - enum cxl_event_log_type type,
> > - struct cxl_event_record_raw *record)
> > +void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> > + enum cxl_event_log_type type,
> > + enum cxl_event_type event_type,
> > + const uuid_t *uuid, union cxl_event *evt)
> > {
> > - union cxl_event *evt = &record->event;
> > - uuid_t *id = &record->id;
> > -
> > - if (uuid_equal(id, &CXL_EVENT_GEN_MEDIA_UUID))
> > + switch (event_type) {
> > + case CXL_CPER_EVENT_GEN_MEDIA:
> > trace_cxl_general_media(cxlmd, type, &evt->gen_media);
> > - else if (uuid_equal(id, &CXL_EVENT_DRAM_UUID))
> > + break;
>
> Might as well return directly and save a reviewer having to check if anything else happens
> after the switch
Might as well keep it as an "if () else" tree as that's equally clear
and more compact.
That immeidiately then opens the concern of why the upper level
__cxl_event_trace_record() is calling a lower level function without the
prefix. That can be swapped later to meet common expectations, but it
feels like gymnastics to parse all the uuids *and* still pass the uuid
to the cxl_event_trace_record() helper. Yes, I see how it happens, just
not totally comfortable with the result, but not enough to hold up the
series.
On Tue, 9 Jan 2024 15:38:28 -0800
Dan Williams <[email protected]> wrote:
> Jonathan Cameron wrote:
> > On Wed, 20 Dec 2023 16:17:31 -0800
> > Ira Weiny <[email protected]> wrote:
> >
> > > The UUID data is redundant in the known event trace types. The addition
> > > of static defines allows the trace macros to create the UUID data inside
> > > the trace thus removing unnecessary code.
> > >
> > > Have well known trace events use static data to set the uuid field based
> > > on the event type.
> > >
> > > Suggested-by: Jonathan Cameron <[email protected]>
> > > Signed-off-by: Ira Weiny <[email protected]>
> > >
> >
> > > TP_STRUCT__entry(
> > > CXL_EVT_TP_entry
> > > @@ -422,7 +424,8 @@ TRACE_EVENT(cxl_dram,
> > > ),
> > >
> > > TP_fast_assign(
> > > - CXL_EVT_TP_fast_assign(cxlmd, log, uuid, rec->hdr);
> > > + CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
> > > + memcpy(&__entry->hdr_uuid, &CXL_EVENT_DRAM_UUID, sizeof(uuid_t));
> >
> > Hmm. Why not
> >
> > __entry->hdr_uuid = CXL_EVENT_DRAM_UUID;
> > ?
> >
> > Compiler should be able to squish the stuff in the define down to data as as the
> > UUID generation logic is pretty simple.
> >
> > I've not emulated the cper records for these yet, so not tested that works beyond
> > compiling.
>
> We can follow on with this conversion later as I see other usage of uuid
> copying in trace events (bcache for instance). Although I probably would
> not replace it with straight assignment and instead use the uuid_copy()
> helper. Otherwise, why do {uuid,guid}_copy() helpers exist?
To copy unknown uuids and guids where the compiler can't optimize things
nearly as well because it can't see the values.
Jonathan
On Wed, 10 Jan 2024 00:31:17 +0100
Ard Biesheuvel <[email protected]> wrote:
> On Wed, 10 Jan 2024 at 00:30, Dan Williams <[email protected]> wrote:
> >
> > Dan Williams wrote:
> > > Jonathan Cameron wrote:
> > > > On Mon, 8 Jan 2024 18:59:16 -0800
> > > > Dan Williams <[email protected]> wrote:
> > > >
> > > > > Ira Weiny wrote:
> > > > > > Dan Williams wrote:
> > > > > > > Smita Koralahalli wrote:
> > > > > > > > On 1/8/2024 8:58 AM, Jonathan Cameron wrote:
> > > > > > > > > On Wed, 20 Dec 2023 16:17:27 -0800
> > > > > > > > > Ira Weiny <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > >> Series status/background
> > > > > > > > >> ========================
> > > > > > > > >>
> > > > > > > > >> Smita has been a great help with this series. Thank you again!
> > > > > > > > >>
> > > > > > > > >> Smita's testing found that the GHES code ended up printing the events
> > > > > > > > >> twice. This version avoids the duplicate print by calling the callback
> > > > > > > > >> from the GHES code instead of the EFI code as suggested by Dan.
> > > > > > > > >
> > > > > > > > > I'm not sure this is working as intended.
> > > > > > > > >
> > > > > > > > > There is nothing gating the call in ghes_proc() of ghes_print_estatus()
> > > > > > > > > and now the EFI code handling that pretty printed things is missing we get
> > > > > > > > > the horrible kernel logging for an unknown block instead.
> > > > > > > > >
> > > > > > > > > So I think we need some minimal code in cper.c to match the guids then not
> > > > > > > > > log them (on basis we are arguing there is no need for new cper records).
> > > > > > > > > Otherwise we are in for some messy kernel logs
> > > > > > > > >
> > > > > > > > > Something like:
> > > > > > > > >
> > > > > > > > > {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> > > > > > > > > {1}[Hardware Error]: event severity: recoverable
> > > > > > > > > {1}[Hardware Error]: Error 0, type: recoverable
> > > > > > > > > {1}[Hardware Error]: section type: unknown, fbcd0a77-c260-417f-85a9-088b1621eba6
> > > > > > > > > {1}[Hardware Error]: section length: 0x90
> > > > > > > > > {1}[Hardware Error]: 00000000: 00000090 00000007 00000000 0d938086 ................
> > > > > > > > > {1}[Hardware Error]: 00000010: 00100000 00000000 00040000 00000000 ................
> > > > > > > > > {1}[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
> > > > > > > > > {1}[Hardware Error]: 00000030: 00000000 00000000 00000000 00000000 ................
> > > > > > > > > {1}[Hardware Error]: 00000040: 00000000 00000000 00000000 00000000 ................
> > > > > > > > > {1}[Hardware Error]: 00000050: 00000000 00000000 00000000 00000000 ................
> > > > > > > > > {1}[Hardware Error]: 00000060: 00000000 00000000 00000000 00000000 ................
> > > > > > > > > {1}[Hardware Error]: 00000070: 00000000 00000000 00000000 00000000 ................
> > > > > > > > > {1}[Hardware Error]: 00000080: 00000000 00000000 00000000 00000000 ................
> > > > > > > > > cxl_general_media: memdev=mem1 host=0000:10:00.0 serial=4 log=Informational : time=0 uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=0 flags='' handle=0 related_handle=0 maint_op_class=0 : dpa=0 dpa_flags='' descriptor='' type='ECC Error' transaction_type='Unknown' channel=0 rank=0 device=0 comp_id=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 validity_flags=''
> > > > > > > > >
> > > > > > > > > (I'm filling the record with 0s currently)
> > > > > > > >
> > > > > > > > Yeah, when I tested this, I thought its okay for the hexdump to be there
> > > > > > > > in dmesg from EFI as the handling is done in trace events from GHES.
> > > > > > > >
> > > > > > > > If, we need to handle from EFI, then it would be a good reason to move
> > > > > > > > the GUIDs out from GHES and place it in a common location for EFI/cper
> > > > > > > > to share similar to protocol errors.
> > > > > > >
> > > > > > > Ah, yes, my expectation was more aligned with Jonathan's observation to
> > > > > > > do the processing in GHES code *and* skip the processing in the CPER
> > > > > > > code, something like:
> > > > > > >
> > > > > >
> > > > > > Agreed this was intended I did not realize the above.
> > > > > >
> > > > > > >
> > > > > > > diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> > > > > > > index 35c37f667781..0a4eed470750 100644
> > > > > > > --- a/drivers/firmware/efi/cper.c
> > > > > > > +++ b/drivers/firmware/efi/cper.c
> > > > > > > @@ -24,6 +24,7 @@
> > > > > > > #include <linux/bcd.h>
> > > > > > > #include <acpi/ghes.h>
> > > > > > > #include <ras/ras_event.h>
> > > > > > > +#include <linux/cxl-event.h>
> > > > > > > #include "cper_cxl.h"
> > > > > > >
> > > > > > > /*
> > > > > > > @@ -607,6 +608,15 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
> > > > > > > cper_print_prot_err(newpfx, prot_err);
> > > > > > > else
> > > > > > > goto err_section_too_small;
> > > > > > > + } else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) {
> > > > > > > + printk("%ssection_type: CXL General Media Error\n", newpfx);
> > > > > >
> > > > > > Do we want the printk's here? I did not realize that a generic event
> > > > > > would be printed. So intention was nothing would be done on this path.
> > > > >
> > > > > I think we do otherwise the kernel will say
> > > > >
> > > > > {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> > > > > {1}[Hardware Error]: event severity: recoverable
> > > > > {1}[Hardware Error]: Error 0, type: recoverable
> > > > > ...
> > > > >
> > > > > ...leaving the user hanging vs:
> > > > >
> > > > > {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> > > > > {1}[Hardware Error]: event severity: recoverable
> > > > > {1}[Hardware Error]: Error 0, type: recoverable
> > > > > {1}[Hardware Error]: section type: General Media Error
> > > > >
> > > > > ...as an indicator to go follow up with rasdaemon or whatever else is
> > > > > doing the detailed monitoring of CXL events.
> > > >
> > > > Agreed. Maybe push it out to a static const table though.
> > > > As the argument was that we shouldn't be spitting out big logs in this
> > > > modern world, let's make it easy for people to add more entries.
> > > >
> > > > struct skip_me {
> > > > guid_t guid;
> > > > const char *name;
> > > > };
> > > > static const struct skip_me skip_me = {
> > > > { &CPER_SEC_CXL_GEN_MEDIA, "CXL General Media Error" },
> > > > etc.
> > > > };
> > > >
> > > > for (i = 0; i < ARRAY_SIZE(skip_me); i++) {
> > > > if (guid_equal(sec_type, skip_me[i].guid)) {
> > > > printk("%asection_type: %s\n", newpfx, skip_me[i].name);
> > > > break;
> > > > }
> > > >
> > > > or something like that in the final else.
> > >
> > > I like it.
> > >
> > > Any concerns with that being an -rc fixup, and move ahead with the base
> > > enabling for v6.8? I don't see that follow-on as a reason to push the
> > > whole thing to v6.9.
> >
> > I will put it in -next for soak time and make an inclusion decision in a
> > few days after I hear back.
> >
>
> For the series and however you want to handle the merge:
>
> Acked-by: Ard Biesheuvel <[email protected]>
Any path in works for me as well.
J