From: Ira Weiny <[email protected]>
This code is well tested with the changes to QEMU I've made (see below).
The series is in 5 parts:
0) Davidlohr's irq patch, modified for 16 vectors
1) Base functionality
2) Parsing specific events (the Dynamic Capacity Event Record is deferred)
3) Event interrupt support
4) cxl-test infrastructure for basic tests
While I believe this entire series is ready to be merged, I realize that the
interrupt support may still have some discussion around it. Therefore parts 1,
2, and 4 could be merged without irq support, as cxl-test provides testing for
them. Interrupt testing requires QEMU, but it too is fully tested and ready to
go.
Changes from RFC v2
	Integrated Davidlohr's irq patch, allocated up to 16 vectors, and based
	my irq support on modifications to that patch.
Smita
Check event status before reading each log.
Jonathan
Process more than 1 record at a time
Remove reserved fields
Steven
Prefix trace points with 'cxl_'
Davidlohr
	Pull in his patch
Changes from RFC v1
Add event irqs
General simplification of the code.
Resolve field alignment questions
Update to rev 3.0 for comments and structures
Add reserved fields and output them
Event records inform the OS of various device events. Events are not needed
for any kernel operation, but various user-level software will want to track
them.
Add event reporting through the trace event mechanism. On driver load, read
and clear all device events.
Enable all event logs for interrupts and process each log on interrupt.
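As a rough userspace sketch of the status-gated processing described above:
the driver checks the Event Status register and only drains logs whose bit is
set. The helper and bit names below are illustrative stand-ins (values per CXL
3.0 section 8.2.8.3.1), not the kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Event Status register bits, CXL 3.0 section 8.2.8.3.1 (illustrative) */
#define EVT_STATUS_INFO        (1u << 0)
#define EVT_STATUS_WARN        (1u << 1)
#define EVT_STATUS_FAIL        (1u << 2)
#define EVT_STATUS_FATAL       (1u << 3)
#define EVT_STATUS_DYNAMIC_CAP (1u << 4)

/*
 * Count how many event logs a given status value would cause us to drain;
 * the driver performs one Get Event Records loop per set bit.
 */
static int logs_to_process(uint32_t status)
{
	uint32_t bit;
	int n = 0;

	for (bit = EVT_STATUS_INFO; bit <= EVT_STATUS_DYNAMIC_CAP; bit <<= 1)
		if (status & bit)
			n++;
	return n;
}
```

With no bits set, no mailbox commands are issued at all, which is why the
status check suggested by Smita avoids needless queries.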
TESTING:
Testing of this was performed with additions to QEMU in the following repo:
https://github.com/weiny2/qemu/tree/ira-cxl-events-latest
Changes to this repo are not finalized yet, so I'm not posting those patches
right away, but enough functionality has been added to further test this series.
1) event status register
2) additional event injection capabilities
3) Process more than 1 record at a time in Get/Clear mailbox commands
Davidlohr Bueso (1):
cxl/pci: Add generic MSI-X/MSI irq support
Ira Weiny (10):
cxl/mem: Implement Get Event Records command
cxl/mem: Implement Clear Event Records command
cxl/mem: Clear events on driver load
cxl/mem: Trace General Media Event Record
cxl/mem: Trace DRAM Event Record
cxl/mem: Trace Memory Module Event Record
cxl/mem: Wire up event interrupts
cxl/test: Add generic mock events
cxl/test: Add specific events
cxl/test: Simulate event log overflow
MAINTAINERS | 1 +
drivers/cxl/core/mbox.c | 219 ++++++++++++++
drivers/cxl/cxl.h | 8 +
drivers/cxl/cxlmem.h | 191 +++++++++++++
drivers/cxl/cxlpci.h | 6 +
drivers/cxl/pci.c | 167 +++++++++++
include/trace/events/cxl.h | 487 ++++++++++++++++++++++++++++++++
include/uapi/linux/cxl_mem.h | 4 +
tools/testing/cxl/test/Kbuild | 2 +-
tools/testing/cxl/test/events.c | 329 +++++++++++++++++++++
tools/testing/cxl/test/events.h | 9 +
tools/testing/cxl/test/mem.c | 35 +++
12 files changed, 1457 insertions(+), 1 deletion(-)
create mode 100644 include/trace/events/cxl.h
create mode 100644 tools/testing/cxl/test/events.c
create mode 100644 tools/testing/cxl/test/events.h
base-commit: aae703b02f92bde9264366c545e87cec451de471
--
2.37.2
From: Ira Weiny <[email protected]>
CXL rev 3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.
Determine if the event read is a Memory Module Event Record and, if so, trace
the record.
Signed-off-by: Ira Weiny <[email protected]>
---
Changes from RFC v2:
Ensure field names match TP_print output
Steven
prefix TRACE_EVENT with 'cxl_'
Jonathan
Remove reserved field
Define a 1bit and 2 bit status decoder
Fix paren alignment
Changes from RFC:
Clean up spec reference
Add reserved data
Use new CXL header macros
Jonathan
Use else if
Use get_unaligned_le*() for unaligned fields
Dave Jiang
s/cxl_mem_mod_event/memory_module
s/cxl_evt_mem_mod_rec/cxl_event_mem_module
---
drivers/cxl/core/mbox.c | 17 ++++-
drivers/cxl/cxlmem.h | 26 +++++++
include/trace/events/cxl.h | 144 +++++++++++++++++++++++++++++++++++++
3 files changed, 186 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index b03d7b856f3d..879b228a98a0 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -725,11 +725,20 @@ static const uuid_t dram_event_uuid =
UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
+/*
+ * Memory Module Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+static const uuid_t mem_mod_event_uuid =
+ UUID_INIT(0xfe927475, 0xdd59, 0x4339,
+ 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74);
+
static bool cxl_event_tracing_enabled(void)
{
return trace_cxl_generic_event_enabled() ||
trace_cxl_general_media_enabled() ||
- trace_cxl_dram_enabled();
+ trace_cxl_dram_enabled() ||
+ trace_cxl_memory_module_enabled();
}
static void cxl_trace_event_record(const char *dev_name,
@@ -749,6 +758,12 @@ static void cxl_trace_event_record(const char *dev_name,
trace_cxl_dram(dev_name, type, rec);
return;
+ } else if (uuid_equal(id, &mem_mod_event_uuid)) {
+ struct cxl_event_mem_module *rec =
+ (struct cxl_event_mem_module *)record;
+
+ trace_cxl_memory_module(dev_name, type, rec);
+ return;
}
/* For unknown record types print just the header */
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 87c877f0940d..03da4f8f74d3 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -454,6 +454,32 @@ struct cxl_event_dram {
u8 reserved[0x17];
} __packed;
+/*
+ * Get Health Info Record
+ * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
+ */
+struct cxl_get_health_info {
+ u8 health_status;
+ u8 media_status;
+ u8 add_status;
+ u8 life_used;
+ u8 device_temp[2];
+ u8 dirty_shutdown_cnt[4];
+ u8 cor_vol_err_cnt[4];
+ u8 cor_per_err_cnt[4];
+} __packed;
+
+/*
+ * Memory Module Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+struct cxl_event_mem_module {
+ struct cxl_event_record_hdr hdr;
+ u8 event_type;
+ struct cxl_get_health_info info;
+ u8 reserved[0x3d];
+} __packed;
+
struct cxl_mbox_get_partition_info {
__le64 active_volatile_cap;
__le64 active_persistent_cap;
diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
index 37bbe59905af..05437e13a882 100644
--- a/include/trace/events/cxl.h
+++ b/include/trace/events/cxl.h
@@ -335,6 +335,150 @@ TRACE_EVENT(cxl_dram,
)
);
+/*
+ * Memory Module Event Record - MMER
+ *
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+#define CXL_MMER_HEALTH_STATUS_CHANGE 0x00
+#define CXL_MMER_MEDIA_STATUS_CHANGE 0x01
+#define CXL_MMER_LIFE_USED_CHANGE 0x02
+#define CXL_MMER_TEMP_CHANGE 0x03
+#define CXL_MMER_DATA_PATH_ERROR 0x04
+#define CXL_MMER_LSA_ERROR 0x05
+#define show_dev_evt_type(type) __print_symbolic(type, \
+ { CXL_MMER_HEALTH_STATUS_CHANGE, "Health Status Change" }, \
+ { CXL_MMER_MEDIA_STATUS_CHANGE, "Media Status Change" }, \
+ { CXL_MMER_LIFE_USED_CHANGE, "Life Used Change" }, \
+ { CXL_MMER_TEMP_CHANGE, "Temperature Change" }, \
+ { CXL_MMER_DATA_PATH_ERROR, "Data Path Error" }, \
+ { CXL_MMER_LSA_ERROR, "LSA Error" } \
+)
+
+/*
+ * Device Health Information - DHI
+ *
+ * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
+ */
+#define CXL_DHI_HS_MAINTENANCE_NEEDED BIT(0)
+#define CXL_DHI_HS_PERFORMANCE_DEGRADED BIT(1)
+#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED BIT(2)
+#define show_health_status_flags(flags) __print_flags(flags, "|", \
+ { CXL_DHI_HS_MAINTENANCE_NEEDED, "Maintenance Needed" }, \
+ { CXL_DHI_HS_PERFORMANCE_DEGRADED, "Performance Degraded" }, \
+ { CXL_DHI_HS_HW_REPLACEMENT_NEEDED, "Replacement Needed" } \
+)
+
+#define CXL_DHI_MS_NORMAL 0x00
+#define CXL_DHI_MS_NOT_READY 0x01
+#define CXL_DHI_MS_WRITE_PERSISTENCY_LOST 0x02
+#define CXL_DHI_MS_ALL_DATA_LOST 0x03
+#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS 0x04
+#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN 0x05
+#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT 0x06
+#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS 0x07
+#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN 0x08
+#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT 0x09
+#define show_media_status(ms) __print_symbolic(ms, \
+ { CXL_DHI_MS_NORMAL, \
+ "Normal" }, \
+ { CXL_DHI_MS_NOT_READY, \
+ "Not Ready" }, \
+ { CXL_DHI_MS_WRITE_PERSISTENCY_LOST, \
+ "Write Persistency Lost" }, \
+ { CXL_DHI_MS_ALL_DATA_LOST, \
+ "All Data Lost" }, \
+ { CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS, \
+ "Write Persistency Loss in the Event of Power Loss" }, \
+ { CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN, \
+ "Write Persistency Loss in Event of Shutdown" }, \
+ { CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT, \
+ "Write Persistency Loss Imminent" }, \
+ { CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS, \
+ "All Data Loss in Event of Power Loss" }, \
+ { CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN, \
+ "All Data Loss in the Event of Shutdown" }, \
+ { CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT, \
+ "All Data Loss Imminent" } \
+)
+
+#define CXL_DHI_AS_NORMAL 0x0
+#define CXL_DHI_AS_WARNING 0x1
+#define CXL_DHI_AS_CRITICAL 0x2
+#define show_two_bit_status(as) __print_symbolic(as, \
+ { CXL_DHI_AS_NORMAL, "Normal" }, \
+ { CXL_DHI_AS_WARNING, "Warning" }, \
+ { CXL_DHI_AS_CRITICAL, "Critical" } \
+)
+#define show_one_bit_status(as) __print_symbolic(as, \
+ { CXL_DHI_AS_NORMAL, "Normal" }, \
+ { CXL_DHI_AS_WARNING, "Warning" } \
+)
+
+#define CXL_DHI_AS_LIFE_USED(as) (as & 0x3)
+#define CXL_DHI_AS_DEV_TEMP(as) ((as & 0xC) >> 2)
+#define CXL_DHI_AS_COR_VOL_ERR_CNT(as) ((as & 0x10) >> 4)
+#define CXL_DHI_AS_COR_PER_ERR_CNT(as) ((as & 0x20) >> 5)
+
+TRACE_EVENT(cxl_memory_module,
+
+ TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+ struct cxl_event_mem_module *rec),
+
+ TP_ARGS(dev_name, log, rec),
+
+ TP_STRUCT__entry(
+ CXL_EVT_TP_entry
+
+ /* Memory Module Event */
+ __field(u8, event_type)
+
+ /* Device Health Info */
+ __field(u8, health_status)
+ __field(u8, media_status)
+ __field(u8, life_used)
+ __field(u32, dirty_shutdown_cnt)
+ __field(u32, cor_vol_err_cnt)
+ __field(u32, cor_per_err_cnt)
+ __field(s16, device_temp)
+ __field(u8, add_status)
+ ),
+
+ TP_fast_assign(
+ CXL_EVT_TP_fast_assign(dev_name, log, rec->hdr);
+
+ /* Memory Module Event */
+ __entry->event_type = rec->event_type;
+
+ /* Device Health Info */
+ __entry->health_status = rec->info.health_status;
+ __entry->media_status = rec->info.media_status;
+ __entry->life_used = rec->info.life_used;
+ __entry->dirty_shutdown_cnt = get_unaligned_le32(rec->info.dirty_shutdown_cnt);
+ __entry->cor_vol_err_cnt = get_unaligned_le32(rec->info.cor_vol_err_cnt);
+ __entry->cor_per_err_cnt = get_unaligned_le32(rec->info.cor_per_err_cnt);
+ __entry->device_temp = get_unaligned_le16(rec->info.device_temp);
+ __entry->add_status = rec->info.add_status;
+ ),
+
+ CXL_EVT_TP_printk("event_type='%s' health_status='%s' media_status='%s' " \
+ "as_life_used=%s as_dev_temp=%s as_cor_vol_err_cnt=%s " \
+ "as_cor_per_err_cnt=%s life_used=%u device_temp=%d " \
+ "dirty_shutdown_cnt=%u cor_vol_err_cnt=%u cor_per_err_cnt=%u",
+ show_dev_evt_type(__entry->event_type),
+ show_health_status_flags(__entry->health_status),
+ show_media_status(__entry->media_status),
+ show_two_bit_status(CXL_DHI_AS_LIFE_USED(__entry->add_status)),
+ show_two_bit_status(CXL_DHI_AS_DEV_TEMP(__entry->add_status)),
+ show_one_bit_status(CXL_DHI_AS_COR_VOL_ERR_CNT(__entry->add_status)),
+ show_one_bit_status(CXL_DHI_AS_COR_PER_ERR_CNT(__entry->add_status)),
+ __entry->life_used, __entry->device_temp,
+ __entry->dirty_shutdown_cnt, __entry->cor_vol_err_cnt,
+ __entry->cor_per_err_cnt
+ )
+);
+
+
#endif /* _CXL_TRACE_EVENTS_H */
/* This part must be outside protection */
--
2.37.2
From: Ira Weiny <[email protected]>
CXL devices have multiple event logs which can be queried for CXL event
records. Devices are required to support the storage of at least one
event record in each event log type.
Devices track event log overflow by incrementing a counter and tracking
the time of the first and last overflow event seen.
Software queries events via the Get Event Record mailbox command; CXL
rev 3.0 section 8.2.9.2.2.
Issue the Get Event Record mailbox command on driver load. Trace each
record found with a generic record trace. Trace any overflow
conditions.
The device can return up to 1MB worth of event records per query. This
presents complications with allocating a huge buffer to potentially
capture all the records. It is not anticipated that these event logs
will be very deep, and reading them does not need to be performant.
Process only 3 records at a time. Three records were chosen because they fit
comfortably on the stack, avoiding dynamic allocation while still
cutting down on extra mailbox messages.
This patch traces a raw event record only and leaves the specific event
record types to subsequent patches.
Macros are created for tracing the common CXL Event header fields.
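The 3-records-at-a-time retrieval can be sketched in userspace as follows;
the mock "device" and payload struct below are hypothetical stand-ins for the
mailbox command, not the kernel API, but the loop shape mirrors
cxl_mem_get_records_log() in the patch: keep querying while the device reports
more records outstanding.

```c
#include <assert.h>
#include <stdint.h>

#define NR_RECORDS_PER_QUERY 3		/* fixed buffer, fits on the stack */
#define FLAG_MORE_RECORDS    (1u << 1)	/* More Event Records payload flag */

struct mock_payload {
	uint8_t flags;
	uint16_t record_count;	/* records returned by this query */
};

/* Pretend device: hands back up to 3 records per query from a backlog. */
static int mock_get_event_records(int *backlog, struct mock_payload *pl)
{
	uint16_t n = *backlog > NR_RECORDS_PER_QUERY ?
		     NR_RECORDS_PER_QUERY : (uint16_t)*backlog;

	pl->record_count = n;
	*backlog -= n;
	pl->flags = *backlog ? FLAG_MORE_RECORDS : 0;
	return 0;
}

/* Drain one event log; returns how many records would have been traced. */
static int drain_log(int backlog)
{
	struct mock_payload pl;
	int traced = 0;

	do {
		if (mock_get_event_records(&backlog, &pl))
			break;
		traced += pl.record_count;	/* each record gets traced */
	} while (pl.flags & FLAG_MORE_RECORDS);

	return traced;
}
```

The real loop also re-queries when the reported record count exceeds the
3-record buffer, since the device may claim more than the truncated payload
actually carried.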
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
Changes from RFC v2:
Support reading 3 events at once.
	Reverse Jonathan's suggestion and check for a positive number of
		records, because the record count may be returned as
		something > 3 based on what the device thinks it can
		send back, even though the core Linux mbox processing
		truncates the data.
Alison and Dave Jiang
Change header uuid type to uuid_t for better user space
processing
Smita
Check status reg before reading log.
Steven
Prefix all trace points with 'cxl_'
Use static branch <trace>_enabled() calls
Jonathan
s/CXL_EVENT_TYPE_INFO/0
s/{first,last}/{first,last}_ts
Remove Reserved field from header
Fix header issue for cxl_event_log_type_str()
Changes from RFC:
Remove redundant error message in get event records loop
s/EVENT_RECORD_DATA_LENGTH/CXL_EVENT_RECORD_DATA_LENGTH
Use hdr_uuid for the header UUID field
Use cxl_event_log_type_str() for the trace events
Create macros for the header fields and common entries of each event
Add reserved buffer output dump
Report error if event query fails
Remove unused record_cnt variable
Steven - reorder overflow record
Remove NOTE about checkpatch
Jonathan
check for exactly 1 record
s/v3.0/rev 3.0
Use 3 byte fields for 24bit fields
Add 3.0 Maintenance Operation Class
Add Dynamic Capacity log type
Fix spelling
Dave Jiang/Dan/Alison
s/cxl-event/cxl
trace/events/cxl-events => trace/events/cxl.h
s/cxl_event_overflow/overflow
s/cxl_event/generic_event
---
MAINTAINERS | 1 +
drivers/cxl/core/mbox.c | 70 +++++++++++++++++++
drivers/cxl/cxl.h | 8 +++
drivers/cxl/cxlmem.h | 73 ++++++++++++++++++++
include/trace/events/cxl.h | 127 +++++++++++++++++++++++++++++++++++
include/uapi/linux/cxl_mem.h | 1 +
6 files changed, 280 insertions(+)
create mode 100644 include/trace/events/cxl.h
diff --git a/MAINTAINERS b/MAINTAINERS
index ca063a504026..4b7c6e3055c6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5223,6 +5223,7 @@ M: Dan Williams <[email protected]>
L: [email protected]
S: Maintained
F: drivers/cxl/
+F: include/trace/events/cxl.h
F: include/uapi/linux/cxl_mem.h
CONEXANT ACCESSRUNNER USB DRIVER
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 16176b9278b4..a908b95a7de4 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -7,6 +7,9 @@
#include <cxlmem.h>
#include <cxl.h>
+#define CREATE_TRACE_POINTS
+#include <trace/events/cxl.h>
+
#include "core.h"
static bool cxl_raw_allow_all;
@@ -48,6 +51,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
CXL_CMD(RAW, CXL_VARIABLE_PAYLOAD, CXL_VARIABLE_PAYLOAD, 0),
#endif
CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
+ CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
@@ -704,6 +708,72 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
}
EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
+static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
+ enum cxl_event_log_type type)
+{
+ struct cxl_get_event_payload payload;
+ u16 pl_nr;
+
+ do {
+ u8 log_type = type;
+ int rc;
+
+ rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
+ &log_type, sizeof(log_type),
+ &payload, sizeof(payload));
+ if (rc) {
+ dev_err(cxlds->dev, "Event log '%s': Failed to query event records : %d",
+ cxl_event_log_type_str(type), rc);
+ return;
+ }
+
+ pl_nr = le16_to_cpu(payload.record_count);
+ if (trace_cxl_generic_event_enabled()) {
+ u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
+ int i;
+
+ for (i = 0; i < nr_rec; i++)
+ trace_cxl_generic_event(dev_name(cxlds->dev),
+ type,
+ &payload.record[i]);
+ }
+
+ if (trace_cxl_overflow_enabled() &&
+ (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW))
+ trace_cxl_overflow(dev_name(cxlds->dev), type, &payload);
+
+ } while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||
+ payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
+}
+
+/**
+ * cxl_mem_get_event_records - Get Event Records from the device
+ * @cxlds: The device data for the operation
+ *
+ * Retrieve all event records available on the device and report them as trace
+ * events.
+ *
+ * See CXL rev 3.0 @8.2.9.2.2 Get Event Records
+ */
+void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
+{
+ u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
+
+ dev_dbg(cxlds->dev, "Reading event logs: %x\n", status);
+
+ if (status & CXLDEV_EVENT_STATUS_INFO)
+ cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
+ if (status & CXLDEV_EVENT_STATUS_WARN)
+ cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
+ if (status & CXLDEV_EVENT_STATUS_FAIL)
+ cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
+ if (status & CXLDEV_EVENT_STATUS_FATAL)
+ cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
+ if (status & CXLDEV_EVENT_STATUS_DYNAMIC_CAP)
+ cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_DYNAMIC_CAP);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
+
/**
* cxl_mem_get_partition_info - Get partition info
* @cxlds: The device data for the operation
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index f680450f0b16..492cff1bea6d 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -132,6 +132,14 @@ static inline int ways_to_cxl(unsigned int ways, u8 *iw)
#define CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX 0x3
#define CXLDEV_CAP_CAP_ID_MEMDEV 0x4000
+/* CXL 3.0 8.2.8.3.1 Event Status Register */
+#define CXLDEV_DEV_EVENT_STATUS_OFFSET 0x00
+#define CXLDEV_EVENT_STATUS_INFO BIT(0)
+#define CXLDEV_EVENT_STATUS_WARN BIT(1)
+#define CXLDEV_EVENT_STATUS_FAIL BIT(2)
+#define CXLDEV_EVENT_STATUS_FATAL BIT(3)
+#define CXLDEV_EVENT_STATUS_DYNAMIC_CAP BIT(4)
+
/* CXL 2.0 8.2.8.4 Mailbox Registers */
#define CXLDEV_MBOX_CAPS_OFFSET 0x00
#define CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index b7b955ded3ac..da64ba0f156b 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -4,6 +4,7 @@
#define __CXL_MEM_H__
#include <uapi/linux/cxl_mem.h>
#include <linux/cdev.h>
+#include <linux/uuid.h>
#include "cxl.h"
/* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
@@ -256,6 +257,7 @@ struct cxl_dev_state {
enum cxl_opcode {
CXL_MBOX_OP_INVALID = 0x0000,
CXL_MBOX_OP_RAW = CXL_MBOX_OP_INVALID,
+ CXL_MBOX_OP_GET_EVENT_RECORD = 0x0100,
CXL_MBOX_OP_GET_FW_INFO = 0x0200,
CXL_MBOX_OP_ACTIVATE_FW = 0x0202,
CXL_MBOX_OP_GET_SUPPORTED_LOGS = 0x0400,
@@ -325,6 +327,76 @@ struct cxl_mbox_identify {
u8 qos_telemetry_caps;
} __packed;
+/*
+ * Common Event Record Format
+ * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
+ */
+struct cxl_event_record_hdr {
+ uuid_t id;
+ u8 length;
+ u8 flags[3];
+ __le16 handle;
+ __le16 related_handle;
+ __le64 timestamp;
+ u8 maint_op_class;
+ u8 reserved[0xf];
+} __packed;
+
+#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
+struct cxl_event_record_raw {
+ struct cxl_event_record_hdr hdr;
+ u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
+} __packed;
+
+/*
+ * Get Event Records output payload
+ * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
+ */
+#define CXL_GET_EVENT_FLAG_OVERFLOW BIT(0)
+#define CXL_GET_EVENT_FLAG_MORE_RECORDS BIT(1)
+#define CXL_GET_EVENT_NR_RECORDS 3
+struct cxl_get_event_payload {
+ u8 flags;
+ u8 reserved1;
+ __le16 overflow_err_count;
+ __le64 first_overflow_timestamp;
+ __le64 last_overflow_timestamp;
+ __le16 record_count;
+ u8 reserved2[0xa];
+ struct cxl_event_record_raw record[CXL_GET_EVENT_NR_RECORDS];
+} __packed;
+
+/*
+ * CXL rev 3.0 section 8.2.9.2.2; Table 8-49
+ */
+enum cxl_event_log_type {
+ CXL_EVENT_TYPE_INFO = 0x00,
+ CXL_EVENT_TYPE_WARN,
+ CXL_EVENT_TYPE_FAIL,
+ CXL_EVENT_TYPE_FATAL,
+ CXL_EVENT_TYPE_DYNAMIC_CAP,
+ CXL_EVENT_TYPE_MAX
+};
+
+static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
+{
+ switch (type) {
+ case CXL_EVENT_TYPE_INFO:
+ return "Informational";
+ case CXL_EVENT_TYPE_WARN:
+ return "Warning";
+ case CXL_EVENT_TYPE_FAIL:
+ return "Failure";
+ case CXL_EVENT_TYPE_FATAL:
+ return "Fatal";
+ case CXL_EVENT_TYPE_DYNAMIC_CAP:
+ return "Dynamic Capacity";
+ default:
+ break;
+ }
+ return "<unknown>";
+}
+
struct cxl_mbox_get_partition_info {
__le64 active_volatile_cap;
__le64 active_persistent_cap;
@@ -384,6 +456,7 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
+void cxl_mem_get_event_records(struct cxl_dev_state *cxlds);
#ifdef CONFIG_CXL_SUSPEND
void cxl_mem_active_inc(void);
void cxl_mem_active_dec(void);
diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
new file mode 100644
index 000000000000..60dec9a84918
--- /dev/null
+++ b/include/trace/events/cxl.h
@@ -0,0 +1,127 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM cxl
+
+#if !defined(_CXL_TRACE_EVENTS_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _CXL_TRACE_EVENTS_H
+
+#include <asm-generic/unaligned.h>
+#include <linux/tracepoint.h>
+#include <cxlmem.h>
+
+TRACE_EVENT(cxl_overflow,
+
+ TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+ struct cxl_get_event_payload *payload),
+
+ TP_ARGS(dev_name, log, payload),
+
+ TP_STRUCT__entry(
+ __string(dev_name, dev_name)
+ __field(int, log)
+ __field(u64, first_ts)
+ __field(u64, last_ts)
+ __field(u16, count)
+ ),
+
+ TP_fast_assign(
+ __assign_str(dev_name, dev_name);
+ __entry->log = log;
+ __entry->count = le16_to_cpu(payload->overflow_err_count);
+ __entry->first_ts = le64_to_cpu(payload->first_overflow_timestamp);
+ __entry->last_ts = le64_to_cpu(payload->last_overflow_timestamp);
+ ),
+
+ TP_printk("%s: EVENT LOG OVERFLOW log=%s : %u records from %llu to %llu",
+ __get_str(dev_name), cxl_event_log_type_str(__entry->log),
+ __entry->count, __entry->first_ts, __entry->last_ts)
+
+);
+
+/*
+ * Common Event Record Format
+ * CXL 3.0 section 8.2.9.2.1; Table 8-42
+ */
+#define CXL_EVENT_RECORD_FLAG_PERMANENT BIT(2)
+#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED BIT(3)
+#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED BIT(4)
+#define CXL_EVENT_RECORD_FLAG_HW_REPLACE BIT(5)
+#define show_hdr_flags(flags) __print_flags(flags, " | ", \
+ { CXL_EVENT_RECORD_FLAG_PERMANENT, "Permanent Condition" }, \
+ { CXL_EVENT_RECORD_FLAG_MAINT_NEEDED, "Maintenance Needed" }, \
+ { CXL_EVENT_RECORD_FLAG_PERF_DEGRADED, "Performance Degraded" }, \
+ { CXL_EVENT_RECORD_FLAG_HW_REPLACE, "Hardware Replacement Needed" } \
+)
+
+/*
+ * Define macros for the common header of each CXL event.
+ *
+ * Tracepoints using these macros must do 3 things:
+ *
+ * 1) Add CXL_EVT_TP_entry to TP_STRUCT__entry
+ * 2) Use CXL_EVT_TP_fast_assign within TP_fast_assign;
+ * pass the dev_name, log, and CXL event header
+ * 3) Use CXL_EVT_TP_printk() instead of TP_printk()
+ *
+ * See the generic_event tracepoint as an example.
+ */
+#define CXL_EVT_TP_entry \
+ __string(dev_name, dev_name) \
+ __field(int, log) \
+ __field_struct(uuid_t, hdr_uuid) \
+ __field(u32, hdr_flags) \
+ __field(u16, hdr_handle) \
+ __field(u16, hdr_related_handle) \
+ __field(u64, hdr_timestamp) \
+ __field(u8, hdr_length) \
+ __field(u8, hdr_maint_op_class)
+
+#define CXL_EVT_TP_fast_assign(dname, l, hdr) \
+ __assign_str(dev_name, (dname)); \
+ __entry->log = (l); \
+ memcpy(&__entry->hdr_uuid, &(hdr).id, sizeof(uuid_t)); \
+ __entry->hdr_length = (hdr).length; \
+ __entry->hdr_flags = get_unaligned_le24((hdr).flags); \
+ __entry->hdr_handle = le16_to_cpu((hdr).handle); \
+ __entry->hdr_related_handle = le16_to_cpu((hdr).related_handle); \
+ __entry->hdr_timestamp = le64_to_cpu((hdr).timestamp); \
+ __entry->hdr_maint_op_class = (hdr).maint_op_class
+
+
+#define CXL_EVT_TP_printk(fmt, ...) \
+ TP_printk("%s log=%s : time=%llu uuid=%pUb len=%d flags='%s' " \
+ "handle=%x related_handle=%x maint_op_class=%u" \
+ " : " fmt, \
+ __get_str(dev_name), cxl_event_log_type_str(__entry->log), \
+ __entry->hdr_timestamp, &__entry->hdr_uuid, __entry->hdr_length,\
+ show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle, \
+ __entry->hdr_related_handle, __entry->hdr_maint_op_class, \
+ ##__VA_ARGS__)
+
+TRACE_EVENT(cxl_generic_event,
+
+ TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+ struct cxl_event_record_raw *rec),
+
+ TP_ARGS(dev_name, log, rec),
+
+ TP_STRUCT__entry(
+ CXL_EVT_TP_entry
+ __array(u8, data, CXL_EVENT_RECORD_DATA_LENGTH)
+ ),
+
+ TP_fast_assign(
+ CXL_EVT_TP_fast_assign(dev_name, log, rec->hdr);
+ memcpy(__entry->data, &rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
+ ),
+
+ CXL_EVT_TP_printk("%s",
+ __print_hex(__entry->data, CXL_EVENT_RECORD_DATA_LENGTH))
+);
+
+#endif /* _CXL_TRACE_EVENTS_H */
+
+/* This part must be outside protection */
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE cxl
+#include <trace/define_trace.h>
diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
index c71021a2a9ed..70459be5bdd4 100644
--- a/include/uapi/linux/cxl_mem.h
+++ b/include/uapi/linux/cxl_mem.h
@@ -24,6 +24,7 @@
___C(IDENTIFY, "Identify Command"), \
___C(RAW, "Raw device command"), \
___C(GET_SUPPORTED_LOGS, "Get Supported Logs"), \
+ ___C(GET_EVENT_RECORD, "Get Event Record"), \
___C(GET_FW_INFO, "Get FW Info"), \
___C(GET_PARTITION_INFO, "Get Partition Information"), \
___C(GET_LSA, "Get Label Storage Area"), \
--
2.37.2
From: Ira Weiny <[email protected]>
The information contained in the events prior to the driver loading can
be queried at any time through other mailbox commands.
Ensure a clean slate of events by reading and clearing the events. The
events are sent to the trace buffer, but it is not anticipated that anyone
will be listening to it at driver load time.
Reviewed-by: Jonathan Cameron <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
drivers/cxl/pci.c | 2 ++
tools/testing/cxl/test/mem.c | 2 ++
2 files changed, 4 insertions(+)
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 62e560063e50..e0d511575b45 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -530,6 +530,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (IS_ERR(cxlmd))
return PTR_ERR(cxlmd);
+ cxl_mem_get_event_records(cxlds);
+
if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index aa2df3a15051..e2f5445d24ff 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -285,6 +285,8 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
if (IS_ERR(cxlmd))
return PTR_ERR(cxlmd);
+ cxl_mem_get_event_records(cxlds);
+
if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
rc = devm_cxl_add_nvdimm(dev, cxlmd);
--
2.37.2
From: Ira Weiny <[email protected]>
CXL rev 3.0 section 8.2.9.2.1.2 defines the DRAM Event Record.
Determine if the event read is a DRAM Event Record and, if so, trace the
record.
Reviewed-by: Jonathan Cameron <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
Changes from RFC v2:
Output DPA flags as a separate field.
Ensure field names match TP_print output
Steven
prefix TRACE_EVENT with 'cxl_'
Jonathan
Formatting fix
Remove reserved field
Changes from RFC:
Add reserved byte data
Use new CXL header macros
Jonathan
Use get_unaligned_le{24,16}() for unaligned fields
Use 'else if'
Dave Jiang
s/cxl_dram_event/dram
s/cxl_evt_dram_rec/cxl_event_dram
Adjust for new phys addr mask
---
drivers/cxl/core/mbox.c | 16 ++++++-
drivers/cxl/cxlmem.h | 23 ++++++++++
include/trace/events/cxl.h | 92 ++++++++++++++++++++++++++++++++++++++
3 files changed, 130 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 6d48fdb07700..b03d7b856f3d 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -717,10 +717,19 @@ static const uuid_t gen_media_event_uuid =
UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
+/*
+ * DRAM Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+static const uuid_t dram_event_uuid =
+ UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
+ 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
+
static bool cxl_event_tracing_enabled(void)
{
return trace_cxl_generic_event_enabled() ||
- trace_cxl_general_media_enabled();
+ trace_cxl_general_media_enabled() ||
+ trace_cxl_dram_enabled();
}
static void cxl_trace_event_record(const char *dev_name,
@@ -735,6 +744,11 @@ static void cxl_trace_event_record(const char *dev_name,
trace_cxl_general_media(dev_name, type, rec);
return;
+ } else if (uuid_equal(id, &dram_event_uuid)) {
+ struct cxl_event_dram *rec = (struct cxl_event_dram *)record;
+
+ trace_cxl_dram(dev_name, type, rec);
+ return;
}
/* For unknown record types print just the header */
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 86197f3168c7..87c877f0940d 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -431,6 +431,29 @@ struct cxl_event_gen_media {
u8 reserved[0x2e];
} __packed;
+/*
+ * DRAM Event Record - DER
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+#define CXL_EVENT_DER_CORRECTION_MASK_SIZE 0x20
+struct cxl_event_dram {
+ struct cxl_event_record_hdr hdr;
+ __le64 phys_addr;
+ u8 descriptor;
+ u8 type;
+ u8 transaction_type;
+ u8 validity_flags[2];
+ u8 channel;
+ u8 rank;
+ u8 nibble_mask[3];
+ u8 bank_group;
+ u8 bank;
+ u8 row[3];
+ u8 column[2];
+ u8 correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
+ u8 reserved[0x17];
+} __packed;
+
struct cxl_mbox_get_partition_info {
__le64 active_volatile_cap;
__le64 active_persistent_cap;
diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
index a0c20e110708..37bbe59905af 100644
--- a/include/trace/events/cxl.h
+++ b/include/trace/events/cxl.h
@@ -243,6 +243,98 @@ TRACE_EVENT(cxl_general_media,
)
);
+/*
+ * DRAM Event Record - DER
+ *
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+/*
+ * DRAM Event Record defines many fields the same as the General Media Event
+ * Record. Reuse those definitions as appropriate.
+ */
+#define CXL_DER_VALID_CHANNEL BIT(0)
+#define CXL_DER_VALID_RANK BIT(1)
+#define CXL_DER_VALID_NIBBLE BIT(2)
+#define CXL_DER_VALID_BANK_GROUP BIT(3)
+#define CXL_DER_VALID_BANK BIT(4)
+#define CXL_DER_VALID_ROW BIT(5)
+#define CXL_DER_VALID_COLUMN BIT(6)
+#define CXL_DER_VALID_CORRECTION_MASK BIT(7)
+#define show_dram_valid_flags(flags) __print_flags(flags, "|", \
+ { CXL_DER_VALID_CHANNEL, "CHANNEL" }, \
+ { CXL_DER_VALID_RANK, "RANK" }, \
+ { CXL_DER_VALID_NIBBLE, "NIBBLE" }, \
+ { CXL_DER_VALID_BANK_GROUP, "BANK GROUP" }, \
+ { CXL_DER_VALID_BANK, "BANK" }, \
+ { CXL_DER_VALID_ROW, "ROW" }, \
+ { CXL_DER_VALID_COLUMN, "COLUMN" }, \
+ { CXL_DER_VALID_CORRECTION_MASK, "CORRECTION MASK" } \
+)
+
+TRACE_EVENT(cxl_dram,
+
+ TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+ struct cxl_event_dram *rec),
+
+ TP_ARGS(dev_name, log, rec),
+
+ TP_STRUCT__entry(
+ CXL_EVT_TP_entry
+ /* DRAM */
+ __field(u64, dpa)
+ __field(u8, descriptor)
+ __field(u8, type)
+ __field(u8, transaction_type)
+ __field(u8, channel)
+ __field(u16, validity_flags)
+ __field(u16, column) /* Out of order to pack trace record */
+ __field(u32, nibble_mask)
+ __field(u32, row)
+ __array(u8, cor_mask, CXL_EVENT_DER_CORRECTION_MASK_SIZE)
+ __field(u8, rank) /* Out of order to pack trace record */
+ __field(u8, bank_group) /* Out of order to pack trace record */
+ __field(u8, bank) /* Out of order to pack trace record */
+ __field(u8, dpa_flags) /* Out of order to pack trace record */
+ ),
+
+ TP_fast_assign(
+ CXL_EVT_TP_fast_assign(dev_name, log, rec->hdr);
+
+ /* DRAM */
+ __entry->dpa = le64_to_cpu(rec->phys_addr);
+ __entry->dpa_flags = __entry->dpa & CXL_DPA_FLAGS_MASK;
+ __entry->dpa &= CXL_DPA_MASK;
+ __entry->descriptor = rec->descriptor;
+ __entry->type = rec->type;
+ __entry->transaction_type = rec->transaction_type;
+ __entry->validity_flags = get_unaligned_le16(rec->validity_flags);
+ __entry->channel = rec->channel;
+ __entry->rank = rec->rank;
+ __entry->nibble_mask = get_unaligned_le24(rec->nibble_mask);
+ __entry->bank_group = rec->bank_group;
+ __entry->bank = rec->bank;
+ __entry->row = get_unaligned_le24(rec->row);
+ __entry->column = get_unaligned_le16(rec->column);
+ memcpy(__entry->cor_mask, &rec->correction_mask,
+ CXL_EVENT_DER_CORRECTION_MASK_SIZE);
+ ),
+
+ CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' descriptor='%s' type='%s' " \
+ "transaction_type='%s' channel=%u rank=%u nibble_mask=%x " \
+ "bank_group=%u bank=%u row=%u column=%u cor_mask=%s " \
+ "validity_flags='%s'",
+ __entry->dpa, show_dpa_flags(__entry->dpa_flags),
+ show_event_desc_flags(__entry->descriptor),
+ show_mem_event_type(__entry->type),
+ show_trans_type(__entry->transaction_type),
+ __entry->channel, __entry->rank, __entry->nibble_mask,
+ __entry->bank_group, __entry->bank,
+ __entry->row, __entry->column,
+ __print_hex(__entry->cor_mask, CXL_EVENT_DER_CORRECTION_MASK_SIZE),
+ show_dram_valid_flags(__entry->validity_flags)
+ )
+);
+
#endif /* _CXL_TRACE_EVENTS_H */
/* This part must be outside protection */
--
2.37.2
From: Ira Weiny <[email protected]>
Each event type has its own trace point output.
Add mock General Media, DRAM, and Memory Module event records to the mock
list of events returned.
Signed-off-by: Ira Weiny <[email protected]>
---
Changes from RFC:
Adjust for struct changes
adjust for unaligned fields
---
tools/testing/cxl/test/events.c | 70 +++++++++++++++++++++++++++++++++
1 file changed, 70 insertions(+)
diff --git a/tools/testing/cxl/test/events.c b/tools/testing/cxl/test/events.c
index a4816f230bb5..8693f3fb9cbb 100644
--- a/tools/testing/cxl/test/events.c
+++ b/tools/testing/cxl/test/events.c
@@ -186,6 +186,70 @@ struct cxl_event_record_raw hardware_replace = {
.data = { 0xDE, 0xAD, 0xBE, 0xEF },
};
+struct cxl_event_gen_media gen_media = {
+ .hdr = {
+ .id = UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
+ 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
+ .length = sizeof(struct cxl_event_gen_media),
+ .flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0),
+ },
+ .phys_addr = cpu_to_le64(0x2000),
+ .descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
+ .type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
+ .transaction_type = CXL_GMER_TRANS_HOST_WRITE,
+ .validity_flags = { CXL_GMER_VALID_CHANNEL |
+ CXL_GMER_VALID_RANK, 0 },
+ .channel = 1,
+ .rank = 30
+};
+
+struct cxl_event_dram dram = {
+ .hdr = {
+ .id = UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
+ 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
+ .length = sizeof(struct cxl_event_dram),
+ .flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0),
+ },
+ .phys_addr = cpu_to_le64(0x8000),
+ .descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
+ .type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
+ .transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
+ .validity_flags = { CXL_DER_VALID_CHANNEL |
+ CXL_DER_VALID_BANK_GROUP |
+ CXL_DER_VALID_BANK |
+ CXL_DER_VALID_COLUMN, 0 },
+ .channel = 1,
+ .bank_group = 5,
+ .bank = 2,
+ .column = { 0xDE, 0xAD},
+};
+
+struct cxl_event_mem_module mem_module = {
+ .hdr = {
+ .id = UUID_INIT(0xfe927475, 0xdd59, 0x4339,
+ 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),
+ .length = sizeof(struct cxl_event_mem_module),
+ /* .handle = Set dynamically */
+ .related_handle = cpu_to_le16(0),
+ },
+ .event_type = CXL_MMER_TEMP_CHANGE,
+ .info = {
+ .health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
+ .media_status = CXL_DHI_MS_ALL_DATA_LOST,
+ .add_status = (CXL_DHI_AS_CRITICAL << 2) |
+ (CXL_DHI_AS_WARNING << 4) |
+ (CXL_DHI_AS_WARNING << 5),
+ .device_temp = { 0xDE, 0xAD},
+ .dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
+ .cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
+ .cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
+ }
+};
+
u32 cxl_mock_add_event_logs(struct cxl_dev_state *cxlds)
{
struct device *dev = cxlds->dev;
@@ -204,9 +268,15 @@ u32 cxl_mock_add_event_logs(struct cxl_dev_state *cxlds)
}
event_store_add_event(mes, CXL_EVENT_TYPE_INFO, &maint_needed);
+ event_store_add_event(mes, CXL_EVENT_TYPE_INFO,
+ (struct cxl_event_record_raw *)&gen_media);
+ event_store_add_event(mes, CXL_EVENT_TYPE_INFO,
+ (struct cxl_event_record_raw *)&mem_module);
mes->ev_status |= CXLDEV_EVENT_STATUS_INFO;
event_store_add_event(mes, CXL_EVENT_TYPE_FATAL, &hardware_replace);
+ event_store_add_event(mes, CXL_EVENT_TYPE_FATAL,
+ (struct cxl_event_record_raw *)&dram);
mes->ev_status |= CXLDEV_EVENT_STATUS_FATAL;
return mes->ev_status;
--
2.37.2
From: Ira Weiny <[email protected]>
CXL rev 3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
command. After an event record is read, it needs to be cleared from the
event log.
Implement cxl_clear_event_record() and call it for each record retrieved
from the device.
Each record is cleared individually. A clear-all bit is specified, but
events could arrive between a Get and the final clear-all operation and
would be lost. Therefore each event is cleared by its specific handle.
Reviewed-by: Jonathan Cameron <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
---
Changes from RFC:
Jonathan
Clean up init of payload and use return code.
Also report any error to clear the event.
s/v3.0/rev 3.0
---
drivers/cxl/core/mbox.c | 46 ++++++++++++++++++++++++++++++------
drivers/cxl/cxlmem.h | 15 ++++++++++++
include/uapi/linux/cxl_mem.h | 1 +
3 files changed, 55 insertions(+), 7 deletions(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index a908b95a7de4..f46558e09f08 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -52,6 +52,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
#endif
CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
+ CXL_CMD(CLEAR_EVENT_RECORD, CXL_VARIABLE_PAYLOAD, 0, 0),
CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
@@ -708,6 +709,27 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
}
EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
+static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
+ enum cxl_event_log_type log,
+ struct cxl_get_event_payload *get_pl, u16 nr)
+{
+ struct cxl_mbox_clear_event_payload payload = {
+ .event_log = log,
+ .nr_recs = nr,
+ };
+ int i;
+
+ for (i = 0; i < nr; i++) {
+ payload.handle[i] = get_pl->record[i].hdr.handle;
+		dev_dbg(cxlds->dev, "Event log '%s': Clearing %u\n",
+ cxl_event_log_type_str(log),
+ le16_to_cpu(payload.handle[i]));
+ }
+
+ return cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_CLEAR_EVENT_RECORD,
+ &payload, sizeof(payload), NULL, 0);
+}
+
static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
enum cxl_event_log_type type)
{
@@ -728,14 +750,23 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
}
pl_nr = le16_to_cpu(payload.record_count);
- if (trace_cxl_generic_event_enabled()) {
+ if (pl_nr > 0) {
u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
int i;
- for (i = 0; i < nr_rec; i++)
- trace_cxl_generic_event(dev_name(cxlds->dev),
- type,
- &payload.record[i]);
+ if (trace_cxl_generic_event_enabled()) {
+ for (i = 0; i < nr_rec; i++)
+ trace_cxl_generic_event(dev_name(cxlds->dev),
+ type,
+ &payload.record[i]);
+ }
+
+ rc = cxl_clear_event_record(cxlds, type, &payload, nr_rec);
+ if (rc) {
+ dev_err(cxlds->dev, "Event log '%s': Failed to clear events : %d",
+ cxl_event_log_type_str(type), rc);
+ return;
+ }
}
if (trace_cxl_overflow_enabled() &&
@@ -750,10 +781,11 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
* cxl_mem_get_event_records - Get Event Records from the device
* @cxlds: The device data for the operation
*
- * Retrieve all event records available on the device and report them as trace
- * events.
+ * Retrieve all event records available on the device, report them as trace
+ * events, and clear them.
*
* See CXL rev 3.0 @8.2.9.2.2 Get Event Records
+ * See CXL rev 3.0 @8.2.9.2.3 Clear Event Records
*/
void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
{
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index da64ba0f156b..28a114c7cf69 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -258,6 +258,7 @@ enum cxl_opcode {
CXL_MBOX_OP_INVALID = 0x0000,
CXL_MBOX_OP_RAW = CXL_MBOX_OP_INVALID,
CXL_MBOX_OP_GET_EVENT_RECORD = 0x0100,
+ CXL_MBOX_OP_CLEAR_EVENT_RECORD = 0x0101,
CXL_MBOX_OP_GET_FW_INFO = 0x0200,
CXL_MBOX_OP_ACTIVATE_FW = 0x0202,
CXL_MBOX_OP_GET_SUPPORTED_LOGS = 0x0400,
@@ -397,6 +398,20 @@ static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
return "<unknown>";
}
+/*
+ * Clear Event Records input payload
+ * CXL rev 3.0 section 8.2.9.2.3; Table 8-51
+ *
+ * Space given for 1 record
+ */
+struct cxl_mbox_clear_event_payload {
+ u8 event_log; /* enum cxl_event_log_type */
+ u8 clear_flags;
+ u8 nr_recs; /* 1 for this struct */
+ u8 reserved[3];
+ __le16 handle[CXL_GET_EVENT_NR_RECORDS];
+};
+
struct cxl_mbox_get_partition_info {
__le64 active_volatile_cap;
__le64 active_persistent_cap;
diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
index 70459be5bdd4..7c1ad8062792 100644
--- a/include/uapi/linux/cxl_mem.h
+++ b/include/uapi/linux/cxl_mem.h
@@ -25,6 +25,7 @@
___C(RAW, "Raw device command"), \
___C(GET_SUPPORTED_LOGS, "Get Supported Logs"), \
___C(GET_EVENT_RECORD, "Get Event Record"), \
+ ___C(CLEAR_EVENT_RECORD, "Clear Event Record"), \
___C(GET_FW_INFO, "Get FW Info"), \
___C(GET_PARTITION_INFO, "Get Partition Information"), \
___C(GET_LSA, "Get Label Storage Area"), \
--
2.37.2
From: Ira Weiny <[email protected]>
CXL device events are signaled via interrupts. Each event log may have
a different interrupt message number. These message numbers are
reported in the Get Event Interrupt Policy mailbox command.
Add interrupt support for event logs. Interrupts are allocated as
shared interrupts. Therefore, all or some event logs can share the same
message number.
The driver must deal with the possibility that a device does not yet
support dynamic capacity. Fall back and retry without dynamic capacity if
the first attempt fails.
Dynamic capacity event records interrupt as part of the informational
event log. Check the event status to see which log has data.
Signed-off-by: Ira Weiny <[email protected]>
---
Changes from RFC v2
Adjust to new irq 16 vector allocation
Jonathan
Remove CXL_INT_RES
Use irq threads to ensure mailbox commands are executed outside irq context
Adjust for optional Dynamic Capacity log
---
drivers/cxl/core/mbox.c | 53 +++++++++++++-
drivers/cxl/cxlmem.h | 31 ++++++++
drivers/cxl/pci.c | 133 +++++++++++++++++++++++++++++++++++
include/uapi/linux/cxl_mem.h | 2 +
4 files changed, 217 insertions(+), 2 deletions(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 879b228a98a0..1e6762af2a00 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -53,6 +53,8 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
CXL_CMD(CLEAR_EVENT_RECORD, CXL_VARIABLE_PAYLOAD, 0, 0),
+ CXL_CMD(GET_EVT_INT_POLICY, 0, 0x5, 0),
+ CXL_CMD(SET_EVT_INT_POLICY, 0x5, 0, 0),
CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
@@ -791,8 +793,8 @@ static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
&payload, sizeof(payload), NULL, 0);
}
-static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
- enum cxl_event_log_type type)
+void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
+ enum cxl_event_log_type type)
{
struct cxl_get_event_payload payload;
u16 pl_nr;
@@ -837,6 +839,7 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
} while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||
payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_get_records_log, CXL);
/**
* cxl_mem_get_event_records - Get Event Records from the device
@@ -867,6 +870,52 @@ void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
}
EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
+int cxl_event_config_msgnums(struct cxl_dev_state *cxlds)
+{
+ struct cxl_event_interrupt_policy *policy = &cxlds->evt_int_policy;
+ size_t policy_size = sizeof(*policy);
+ bool retry = true;
+ int rc;
+
+ policy->info_settings = CXL_INT_MSI_MSIX;
+ policy->warn_settings = CXL_INT_MSI_MSIX;
+ policy->failure_settings = CXL_INT_MSI_MSIX;
+ policy->fatal_settings = CXL_INT_MSI_MSIX;
+ policy->dyn_cap_settings = CXL_INT_MSI_MSIX;
+
+again:
+ rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_SET_EVT_INT_POLICY,
+ policy, policy_size, NULL, 0);
+ if (rc < 0) {
+ /*
+ * If the device does not support dynamic capacity it may fail
+ * the command due to an invalid payload. Retry without
+ * dynamic capacity.
+ */
+ if (retry) {
+ retry = false;
+ policy->dyn_cap_settings = 0;
+ policy_size = sizeof(*policy) - sizeof(policy->dyn_cap_settings);
+ goto again;
+ }
+ dev_err(cxlds->dev, "Failed to set event interrupt policy : %d",
+ rc);
+ memset(policy, CXL_INT_NONE, sizeof(*policy));
+ return rc;
+ }
+
+ rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVT_INT_POLICY, NULL, 0,
+ policy, policy_size);
+ if (rc < 0) {
+ dev_err(cxlds->dev, "Failed to get event interrupt policy : %d",
+ rc);
+ return rc;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_event_config_msgnums, CXL);
+
/**
* cxl_mem_get_partition_info - Get partition info
* @cxlds: The device data for the operation
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 03da4f8f74d3..4d9c3ea30c24 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -179,6 +179,31 @@ struct cxl_endpoint_dvsec_info {
struct range dvsec_range[2];
};
+/**
+ * Event Interrupt Policy
+ *
+ * CXL rev 3.0 section 8.2.9.2.4; Table 8-52
+ */
+enum cxl_event_int_mode {
+ CXL_INT_NONE = 0x00,
+ CXL_INT_MSI_MSIX = 0x01,
+ CXL_INT_FW = 0x02
+};
+#define CXL_EVENT_INT_MODE_MASK 0x3
+#define CXL_EVENT_INT_MSGNUM(setting) (((setting) & 0xf0) >> 4)
+struct cxl_event_interrupt_policy {
+ u8 info_settings;
+ u8 warn_settings;
+ u8 failure_settings;
+ u8 fatal_settings;
+ u8 dyn_cap_settings;
+} __packed;
+
+static inline bool cxl_evt_int_is_msi(u8 setting)
+{
+ return CXL_INT_MSI_MSIX == (setting & CXL_EVENT_INT_MODE_MASK);
+}
+
/**
* struct cxl_dev_state - The driver device state
*
@@ -246,6 +271,7 @@ struct cxl_dev_state {
resource_size_t component_reg_phys;
u64 serial;
+ struct cxl_event_interrupt_policy evt_int_policy;
struct xarray doe_mbs;
@@ -259,6 +285,8 @@ enum cxl_opcode {
CXL_MBOX_OP_RAW = CXL_MBOX_OP_INVALID,
CXL_MBOX_OP_GET_EVENT_RECORD = 0x0100,
CXL_MBOX_OP_CLEAR_EVENT_RECORD = 0x0101,
+ CXL_MBOX_OP_GET_EVT_INT_POLICY = 0x0102,
+ CXL_MBOX_OP_SET_EVT_INT_POLICY = 0x0103,
CXL_MBOX_OP_GET_FW_INFO = 0x0200,
CXL_MBOX_OP_ACTIVATE_FW = 0x0202,
CXL_MBOX_OP_GET_SUPPORTED_LOGS = 0x0400,
@@ -539,7 +567,10 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
+void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
+ enum cxl_event_log_type type);
void cxl_mem_get_event_records(struct cxl_dev_state *cxlds);
+int cxl_event_config_msgnums(struct cxl_dev_state *cxlds);
#ifdef CONFIG_CXL_SUSPEND
void cxl_mem_active_inc(void);
void cxl_mem_active_dec(void);
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index e0d511575b45..64b2e2671043 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -458,6 +458,138 @@ static void cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
cxlds->nr_irq_vecs = nvecs;
}
+struct cxl_event_irq_id {
+ struct cxl_dev_state *cxlds;
+ u32 status;
+ unsigned int msgnum;
+};
+
+static irqreturn_t cxl_event_int_thread(int irq, void *id)
+{
+ struct cxl_event_irq_id *cxlid = id;
+ struct cxl_dev_state *cxlds = cxlid->cxlds;
+
+ if (cxlid->status & CXLDEV_EVENT_STATUS_INFO)
+ cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
+ if (cxlid->status & CXLDEV_EVENT_STATUS_WARN)
+ cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
+ if (cxlid->status & CXLDEV_EVENT_STATUS_FAIL)
+ cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
+ if (cxlid->status & CXLDEV_EVENT_STATUS_FATAL)
+ cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
+ if (cxlid->status & CXLDEV_EVENT_STATUS_DYNAMIC_CAP)
+ cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_DYNAMIC_CAP);
+
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t cxl_event_int_handler(int irq, void *id)
+{
+ struct cxl_event_irq_id *cxlid = id;
+ struct cxl_dev_state *cxlds = cxlid->cxlds;
+ u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
+
+ if (cxlid->status & status)
+ return IRQ_WAKE_THREAD;
+ return IRQ_HANDLED;
+}
+
+static void cxl_free_event_irq(void *id)
+{
+ struct cxl_event_irq_id *cxlid = id;
+ struct pci_dev *pdev = to_pci_dev(cxlid->cxlds->dev);
+
+ pci_free_irq(pdev, cxlid->msgnum, id);
+}
+
+static u32 log_type_to_status(enum cxl_event_log_type log_type)
+{
+ switch (log_type) {
+ case CXL_EVENT_TYPE_INFO:
+ return CXLDEV_EVENT_STATUS_INFO | CXLDEV_EVENT_STATUS_DYNAMIC_CAP;
+ case CXL_EVENT_TYPE_WARN:
+ return CXLDEV_EVENT_STATUS_WARN;
+ case CXL_EVENT_TYPE_FAIL:
+ return CXLDEV_EVENT_STATUS_FAIL;
+ case CXL_EVENT_TYPE_FATAL:
+ return CXLDEV_EVENT_STATUS_FATAL;
+ default:
+ break;
+ }
+ return 0;
+}
+
+static int cxl_request_event_irq(struct cxl_dev_state *cxlds,
+ enum cxl_event_log_type log_type,
+ u8 setting)
+{
+ struct device *dev = cxlds->dev;
+ struct pci_dev *pdev = to_pci_dev(dev);
+ struct cxl_event_irq_id *id;
+ unsigned int msgnum = CXL_EVENT_INT_MSGNUM(setting);
+ int irq;
+
+ /* Disabled irq is not an error */
+ if (!cxl_evt_int_is_msi(setting) || msgnum > cxlds->nr_irq_vecs) {
+ dev_dbg(dev, "Event interrupt not enabled; %s %u %d\n",
+ cxl_event_log_type_str(CXL_EVENT_TYPE_INFO),
+ msgnum, cxlds->nr_irq_vecs);
+ return 0;
+ }
+
+ id = devm_kzalloc(dev, sizeof(*id), GFP_KERNEL);
+ if (!id)
+ return -ENOMEM;
+
+ id->cxlds = cxlds;
+ id->msgnum = msgnum;
+ id->status = log_type_to_status(log_type);
+
+ irq = pci_request_irq(pdev, id->msgnum, cxl_event_int_handler,
+ cxl_event_int_thread, id,
+ "%s:event-log-%s", dev_name(dev),
+ cxl_event_log_type_str(log_type));
+ if (irq)
+ return irq;
+
+ devm_add_action_or_reset(dev, cxl_free_event_irq, id);
+ return 0;
+}
+
+static void cxl_event_irqsetup(struct cxl_dev_state *cxlds)
+{
+ struct device *dev = cxlds->dev;
+ u8 setting;
+
+ if (cxl_event_config_msgnums(cxlds))
+ return;
+
+ /*
+ * Dynamic Capacity shares the info message number
+ * Nothing to be done except check the status bit in the
+ * irq thread.
+ */
+ setting = cxlds->evt_int_policy.info_settings;
+ if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_INFO, setting))
+ dev_err(dev, "Failed to get interrupt for %s event log\n",
+ cxl_event_log_type_str(CXL_EVENT_TYPE_INFO));
+
+ setting = cxlds->evt_int_policy.warn_settings;
+ if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_WARN, setting))
+ dev_err(dev, "Failed to get interrupt for %s event log\n",
+ cxl_event_log_type_str(CXL_EVENT_TYPE_WARN));
+
+ setting = cxlds->evt_int_policy.failure_settings;
+ if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_FAIL, setting))
+ dev_err(dev, "Failed to get interrupt for %s event log\n",
+ cxl_event_log_type_str(CXL_EVENT_TYPE_FAIL));
+
+ setting = cxlds->evt_int_policy.fatal_settings;
+ if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_FATAL, setting))
+ dev_err(dev, "Failed to get interrupt for %s event log\n",
+ cxl_event_log_type_str(CXL_EVENT_TYPE_FATAL));
+}
+
static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
struct cxl_register_map map;
@@ -525,6 +657,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
return rc;
cxl_pci_alloc_irq_vectors(cxlds);
+ cxl_event_irqsetup(cxlds);
cxlmd = devm_cxl_add_memdev(cxlds);
if (IS_ERR(cxlmd))
diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
index 7c1ad8062792..a8204802fcca 100644
--- a/include/uapi/linux/cxl_mem.h
+++ b/include/uapi/linux/cxl_mem.h
@@ -26,6 +26,8 @@
___C(GET_SUPPORTED_LOGS, "Get Supported Logs"), \
___C(GET_EVENT_RECORD, "Get Event Record"), \
___C(CLEAR_EVENT_RECORD, "Clear Event Record"), \
+ ___C(GET_EVT_INT_POLICY, "Get Event Interrupt Policy"), \
+ ___C(SET_EVT_INT_POLICY, "Set Event Interrupt Policy"), \
___C(GET_FW_INFO, "Get FW Info"), \
___C(GET_PARTITION_INFO, "Get Partition Information"), \
___C(GET_LSA, "Get Label Storage Area"), \
--
2.37.2
From: Ira Weiny <[email protected]>
CXL rev 3.0 section 8.2.9.2.1.1 defines the General Media Event Record.
Determine if the event read is a General Media Event Record and, if so,
trace it as such.
Signed-off-by: Ira Weiny <[email protected]>
---
Changes from RFC v2:
Output DPA flags as a single field
Ensure names of fields match what TP_print outputs
Steven
prefix TRACE_EVENT with 'cxl_'
Jonathan
Remove Reserved field
Changes from RFC:
Add reserved byte array
Use common CXL event header record macros
Jonathan
Use unaligned_le{24,16} for unaligned fields
Don't use the inverse of phy addr mask
Dave Jiang
s/cxl_gen_media_event/general_media
s/cxl_evt_gen_media/cxl_event_gen_media
---
drivers/cxl/core/mbox.c | 40 ++++++++++--
drivers/cxl/cxlmem.h | 19 ++++++
include/trace/events/cxl.h | 124 +++++++++++++++++++++++++++++++++++++
3 files changed, 179 insertions(+), 4 deletions(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index f46558e09f08..6d48fdb07700 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -709,6 +709,38 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
}
EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
+/*
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+static const uuid_t gen_media_event_uuid =
+ UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
+ 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
+
+static bool cxl_event_tracing_enabled(void)
+{
+ return trace_cxl_generic_event_enabled() ||
+ trace_cxl_general_media_enabled();
+}
+
+static void cxl_trace_event_record(const char *dev_name,
+ enum cxl_event_log_type type,
+ struct cxl_event_record_raw *record)
+{
+ uuid_t *id = &record->hdr.id;
+
+ if (uuid_equal(id, &gen_media_event_uuid)) {
+ struct cxl_event_gen_media *rec =
+ (struct cxl_event_gen_media *)record;
+
+ trace_cxl_general_media(dev_name, type, rec);
+ return;
+ }
+
+ /* For unknown record types print just the header */
+ trace_cxl_generic_event(dev_name, type, record);
+}
+
static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
enum cxl_event_log_type log,
struct cxl_get_event_payload *get_pl, u16 nr)
@@ -754,11 +786,11 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
int i;
- if (trace_cxl_generic_event_enabled()) {
+ if (cxl_event_tracing_enabled()) {
for (i = 0; i < nr_rec; i++)
- trace_cxl_generic_event(dev_name(cxlds->dev),
- type,
- &payload.record[i]);
+ cxl_trace_event_record(dev_name(cxlds->dev),
+ type,
+ &payload.record[i]);
}
rc = cxl_clear_event_record(cxlds, type, &payload, nr_rec);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 28a114c7cf69..86197f3168c7 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -412,6 +412,25 @@ struct cxl_mbox_clear_event_payload {
__le16 handle[CXL_GET_EVENT_NR_RECORDS];
};
+/*
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CXL_EVENT_GEN_MED_COMP_ID_SIZE 0x10
+struct cxl_event_gen_media {
+ struct cxl_event_record_hdr hdr;
+ __le64 phys_addr;
+ u8 descriptor;
+ u8 type;
+ u8 transaction_type;
+ u8 validity_flags[2];
+ u8 channel;
+ u8 rank;
+ u8 device[3];
+ u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
+ u8 reserved[0x2e];
+} __packed;
+
struct cxl_mbox_get_partition_info {
__le64 active_volatile_cap;
__le64 active_persistent_cap;
diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
index 60dec9a84918..a0c20e110708 100644
--- a/include/trace/events/cxl.h
+++ b/include/trace/events/cxl.h
@@ -119,6 +119,130 @@ TRACE_EVENT(cxl_generic_event,
__print_hex(__entry->data, CXL_EVENT_RECORD_DATA_LENGTH))
);
+/*
+ * Physical Address field masks
+ *
+ * General Media Event Record
+ * CXL v2.0 Section 8.2.9.1.1.1; Table 154
+ *
+ * DRAM Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+#define CXL_DPA_FLAGS_MASK 0x3F
+#define CXL_DPA_MASK (~CXL_DPA_FLAGS_MASK)
+
+#define CXL_DPA_VOLATILE BIT(0)
+#define CXL_DPA_NOT_REPAIRABLE BIT(1)
+#define show_dpa_flags(flags) __print_flags(flags, "|", \
+ { CXL_DPA_VOLATILE, "VOLATILE" }, \
+ { CXL_DPA_NOT_REPAIRABLE, "NOT_REPAIRABLE" } \
+)
+
+/*
+ * General Media Event Record - GMER
+ * CXL v2.0 Section 8.2.9.1.1.1; Table 154
+ */
+#define CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT BIT(0)
+#define CXL_GMER_EVT_DESC_THRESHOLD_EVENT BIT(1)
+#define CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW BIT(2)
+#define show_event_desc_flags(flags) __print_flags(flags, "|", \
+ { CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT, "Uncorrectable Event" }, \
+ { CXL_GMER_EVT_DESC_THRESHOLD_EVENT, "Threshold event" }, \
+ { CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW, "Poison List Overflow" } \
+)
+
+#define CXL_GMER_MEM_EVT_TYPE_ECC_ERROR 0x00
+#define CXL_GMER_MEM_EVT_TYPE_INV_ADDR 0x01
+#define CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR 0x02
+#define show_mem_event_type(type) __print_symbolic(type, \
+ { CXL_GMER_MEM_EVT_TYPE_ECC_ERROR, "ECC Error" }, \
+ { CXL_GMER_MEM_EVT_TYPE_INV_ADDR, "Invalid Address" }, \
+ { CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR, "Data Path Error" } \
+)
+
+#define CXL_GMER_TRANS_UNKNOWN 0x00
+#define CXL_GMER_TRANS_HOST_READ 0x01
+#define CXL_GMER_TRANS_HOST_WRITE 0x02
+#define CXL_GMER_TRANS_HOST_SCAN_MEDIA 0x03
+#define CXL_GMER_TRANS_HOST_INJECT_POISON 0x04
+#define CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB 0x05
+#define CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT 0x06
+#define show_trans_type(type) __print_symbolic(type, \
+ { CXL_GMER_TRANS_UNKNOWN, "Unknown" }, \
+ { CXL_GMER_TRANS_HOST_READ, "Host Read" }, \
+ { CXL_GMER_TRANS_HOST_WRITE, "Host Write" }, \
+ { CXL_GMER_TRANS_HOST_SCAN_MEDIA, "Host Scan Media" }, \
+ { CXL_GMER_TRANS_HOST_INJECT_POISON, "Host Inject Poison" }, \
+ { CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB, "Internal Media Scrub" }, \
+ { CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT, "Internal Media Management" } \
+)
+
+#define CXL_GMER_VALID_CHANNEL BIT(0)
+#define CXL_GMER_VALID_RANK BIT(1)
+#define CXL_GMER_VALID_DEVICE BIT(2)
+#define CXL_GMER_VALID_COMPONENT BIT(3)
+#define show_valid_flags(flags) __print_flags(flags, "|", \
+ { CXL_GMER_VALID_CHANNEL, "CHANNEL" }, \
+ { CXL_GMER_VALID_RANK, "RANK" }, \
+ { CXL_GMER_VALID_DEVICE, "DEVICE" }, \
+ { CXL_GMER_VALID_COMPONENT, "COMPONENT" } \
+)
+
+TRACE_EVENT(cxl_general_media,
+
+ TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
+ struct cxl_event_gen_media *rec),
+
+ TP_ARGS(dev_name, log, rec),
+
+ TP_STRUCT__entry(
+ CXL_EVT_TP_entry
+ /* General Media */
+ __field(u64, dpa)
+ __field(u8, descriptor)
+ __field(u8, type)
+ __field(u8, transaction_type)
+ __field(u8, channel)
+ __field(u32, device)
+ __array(u8, comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE)
+ __field(u16, validity_flags)
+ /* Following are out of order to pack trace record */
+ __field(u8, rank)
+ __field(u8, dpa_flags)
+ ),
+
+ TP_fast_assign(
+ CXL_EVT_TP_fast_assign(dev_name, log, rec->hdr);
+
+ /* General Media */
+ __entry->dpa = le64_to_cpu(rec->phys_addr);
+ __entry->dpa_flags = __entry->dpa & CXL_DPA_FLAGS_MASK;
+ /* Mask after flags have been parsed */
+ __entry->dpa &= CXL_DPA_MASK;
+ __entry->descriptor = rec->descriptor;
+ __entry->type = rec->type;
+ __entry->transaction_type = rec->transaction_type;
+ __entry->channel = rec->channel;
+ __entry->rank = rec->rank;
+ __entry->device = get_unaligned_le24(rec->device);
+ memcpy(__entry->comp_id, &rec->component_id,
+ CXL_EVENT_GEN_MED_COMP_ID_SIZE);
+ __entry->validity_flags = get_unaligned_le16(&rec->validity_flags);
+ ),
+
+ CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' " \
+ "descriptor='%s' type='%s' transaction_type='%s' channel=%u rank=%u " \
+ "device=%x comp_id=%s validity_flags='%s'",
+ __entry->dpa, show_dpa_flags(__entry->dpa_flags),
+ show_event_desc_flags(__entry->descriptor),
+ show_mem_event_type(__entry->type),
+ show_trans_type(__entry->transaction_type),
+ __entry->channel, __entry->rank, __entry->device,
+ __print_hex(__entry->comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE),
+ show_valid_flags(__entry->validity_flags)
+ )
+);
+
#endif /* _CXL_TRACE_EVENTS_H */
/* This part must be outside protection */
--
2.37.2
From: Davidlohr Bueso <[email protected]>
Currently the only CXL features targeted for irq support require their
message numbers to be within the first 16 entries. The device may,
however, support fewer than 16 entries depending on the support it
provides.
Attempt to allocate these 16 irq vectors. If the device supports fewer,
the PCI infrastructure will allocate that number. Store the number
of vectors actually allocated in the device state for later use
by individual functions.
Upon successful allocation, users can plug in their respective isr at
any point thereafter, for example, if the irq setup is not done in the
PCI driver, such as the case of the CXL-PMU.
Cc: Bjorn Helgaas <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Co-developed-by: Ira Weiny <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Davidlohr Bueso <[email protected]>
---
Changes from Ira
Remove reviews
Allocate up to a static 16 vectors.
Change cover letter
---
drivers/cxl/cxlmem.h | 3 +++
drivers/cxl/cxlpci.h | 6 ++++++
drivers/cxl/pci.c | 32 ++++++++++++++++++++++++++++++++
3 files changed, 41 insertions(+)
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 88e3a8e54b6a..b7b955ded3ac 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -211,6 +211,7 @@ struct cxl_endpoint_dvsec_info {
* @info: Cached DVSEC information about the device.
* @serial: PCIe Device Serial Number
* @doe_mbs: PCI DOE mailbox array
+ * @nr_irq_vecs: Number of MSI-X/MSI vectors available
* @mbox_send: @dev specific transport for transmitting mailbox commands
*
* See section 8.2.9.5.2 Capacity Configuration and Label Storage for
@@ -247,6 +248,8 @@ struct cxl_dev_state {
struct xarray doe_mbs;
+ int nr_irq_vecs;
+
int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
};
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index eec597dbe763..b7f4e2f417d3 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -53,6 +53,12 @@
#define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
#define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)
+/*
+ * NOTE: Currently all the functions which are enabled for CXL require their
+ * vectors to be in the first 16. Use this as the max.
+ */
+#define CXL_PCI_REQUIRED_VECTORS 16
+
/* Register Block Identifier (RBI) */
enum cxl_regloc_type {
CXL_REGLOC_RBI_EMPTY = 0,
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index faeb5d9d7a7a..62e560063e50 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -428,6 +428,36 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
}
}
+static void cxl_pci_free_irq_vectors(void *data)
+{
+ pci_free_irq_vectors(data);
+}
+
+static void cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
+{
+ struct device *dev = cxlds->dev;
+ struct pci_dev *pdev = to_pci_dev(dev);
+ int nvecs;
+ int rc;
+
+ nvecs = pci_alloc_irq_vectors(pdev, 1, CXL_PCI_REQUIRED_VECTORS,
+ PCI_IRQ_MSIX | PCI_IRQ_MSI);
+ if (nvecs < 0) {
+ dev_dbg(dev, "Not enough interrupts; use polling instead.\n");
+ return;
+ }
+
+ rc = devm_add_action_or_reset(dev, cxl_pci_free_irq_vectors, pdev);
+ if (rc) {
+ dev_dbg(dev, "Device managed call failed; interrupts disabled.\n");
+ /* some got allocated, clean them up */
+ cxl_pci_free_irq_vectors(pdev);
+ return;
+ }
+
+ cxlds->nr_irq_vecs = nvecs;
+}
+
static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
struct cxl_register_map map;
@@ -494,6 +524,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (rc)
return rc;
+ cxl_pci_alloc_irq_vectors(cxlds);
+
cxlmd = devm_cxl_add_memdev(cxlds);
if (IS_ERR(cxlmd))
return PTR_ERR(cxlmd);
--
2.37.2
From: Ira Weiny <[email protected]>
Log overflow is reported via a separate trace message.
Simulate a log with a large number of messages and flag an overflow
condition until the log is partially drained.
Signed-off-by: Ira Weiny <[email protected]>
---
Changes from RFC
Adjust for new struct changes
---
tools/testing/cxl/test/events.c | 37 +++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)
diff --git a/tools/testing/cxl/test/events.c b/tools/testing/cxl/test/events.c
index 8693f3fb9cbb..5ce257114f4e 100644
--- a/tools/testing/cxl/test/events.c
+++ b/tools/testing/cxl/test/events.c
@@ -69,11 +69,21 @@ static void event_store_add_event(struct mock_event_store *mes,
log->nr_events++;
}
+static u16 log_overflow(struct mock_event_log *log)
+{
+ int cnt = log_rec_left(log) - 5;
+
+ if (cnt < 0)
+ return 0;
+ return cnt;
+}
+
int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
{
struct cxl_get_event_payload *pl;
struct mock_event_log *log;
u8 log_type;
+ u16 nr_overflow;
/* Valid request? */
if (cmd->size_in != sizeof(log_type))
@@ -95,6 +105,20 @@ int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
if (log_rec_left(log) > 1)
pl->flags |= CXL_GET_EVENT_FLAG_MORE_RECORDS;
+ nr_overflow = log_overflow(log);
+ if (nr_overflow) {
+ u64 ns;
+
+ pl->flags |= CXL_GET_EVENT_FLAG_OVERFLOW;
+ pl->overflow_err_count = cpu_to_le16(nr_overflow);
+ ns = ktime_get_real_ns();
+ ns -= 5000000000; /* 5s ago */
+ pl->first_overflow_timestamp = cpu_to_le64(ns);
+ ns = ktime_get_real_ns();
+ ns -= 1000000000; /* 1s ago */
+ pl->last_overflow_timestamp = cpu_to_le64(ns);
+ }
+
memcpy(&pl->record[0], get_cur_event(log), sizeof(pl->record[0]));
pl->record[0].hdr.handle = get_cur_event_handle(log);
return 0;
@@ -274,6 +298,19 @@ u32 cxl_mock_add_event_logs(struct cxl_dev_state *cxlds)
(struct cxl_event_record_raw *)&mem_module);
mes->ev_status |= CXLDEV_EVENT_STATUS_INFO;
+ event_store_add_event(mes, CXL_EVENT_TYPE_FAIL, &maint_needed);
+ event_store_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+ event_store_add_event(mes, CXL_EVENT_TYPE_FAIL,
+ (struct cxl_event_record_raw *)&dram);
+ event_store_add_event(mes, CXL_EVENT_TYPE_FAIL,
+ (struct cxl_event_record_raw *)&gen_media);
+ event_store_add_event(mes, CXL_EVENT_TYPE_FAIL,
+ (struct cxl_event_record_raw *)&mem_module);
+ event_store_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+ event_store_add_event(mes, CXL_EVENT_TYPE_FAIL,
+ (struct cxl_event_record_raw *)&dram);
+ mes->ev_status |= CXLDEV_EVENT_STATUS_FAIL;
+
event_store_add_event(mes, CXL_EVENT_TYPE_FATAL, &hardware_replace);
event_store_add_event(mes, CXL_EVENT_TYPE_FATAL,
(struct cxl_event_record_raw *)&dram);
--
2.37.2
On 11/10/2022 10:57 AM, [email protected] wrote:
> From: Davidlohr Bueso <[email protected]>
>
> Currently the only CXL features targeted for irq support require their
> message numbers to be within the first 16 entries. The device may,
> however, support fewer than 16 entries depending on the capabilities it
> provides.
>
> Attempt to allocate these 16 irq vectors. If the device supports fewer,
> the PCI infrastructure will allocate that number. Store the number of
> vectors actually allocated in the device state for later use by
> individual functions.
>
> Upon successful allocation, users can plug in their respective ISRs at
> any point thereafter; for example, when the irq setup is not done in the
> PCI driver itself, as is the case for the CXL-PMU.
>
> Cc: Bjorn Helgaas <[email protected]>
> Cc: Jonathan Cameron <[email protected]>
> Co-developed-by: Ira Weiny <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>
> Signed-off-by: Davidlohr Bueso <[email protected]>
Reviewed-by: Dave Jiang <[email protected]>
>
> ---
> Changes from Ira
> Remove reviews
> Allocate up to a static 16 vectors.
> Change cover letter
> ---
> drivers/cxl/cxlmem.h | 3 +++
> drivers/cxl/cxlpci.h | 6 ++++++
> drivers/cxl/pci.c | 32 ++++++++++++++++++++++++++++++++
> 3 files changed, 41 insertions(+)
>
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 88e3a8e54b6a..b7b955ded3ac 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -211,6 +211,7 @@ struct cxl_endpoint_dvsec_info {
> * @info: Cached DVSEC information about the device.
> * @serial: PCIe Device Serial Number
> * @doe_mbs: PCI DOE mailbox array
> + * @nr_irq_vecs: Number of MSI-X/MSI vectors available
> * @mbox_send: @dev specific transport for transmitting mailbox commands
> *
> * See section 8.2.9.5.2 Capacity Configuration and Label Storage for
> @@ -247,6 +248,8 @@ struct cxl_dev_state {
>
> struct xarray doe_mbs;
>
> + int nr_irq_vecs;
> +
> int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
> };
>
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index eec597dbe763..b7f4e2f417d3 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -53,6 +53,12 @@
> #define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
> #define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)
>
> +/*
> + * NOTE: Currently all the functions which are enabled for CXL require their
> + * vectors to be in the first 16. Use this as the max.
> + */
> +#define CXL_PCI_REQUIRED_VECTORS 16
> +
> /* Register Block Identifier (RBI) */
> enum cxl_regloc_type {
> CXL_REGLOC_RBI_EMPTY = 0,
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index faeb5d9d7a7a..62e560063e50 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -428,6 +428,36 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> }
> }
>
> +static void cxl_pci_free_irq_vectors(void *data)
> +{
> + pci_free_irq_vectors(data);
> +}
> +
> +static void cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
> +{
> + struct device *dev = cxlds->dev;
> + struct pci_dev *pdev = to_pci_dev(dev);
> + int nvecs;
> + int rc;
> +
> + nvecs = pci_alloc_irq_vectors(pdev, 1, CXL_PCI_REQUIRED_VECTORS,
> + PCI_IRQ_MSIX | PCI_IRQ_MSI);
> + if (nvecs < 0) {
> + dev_dbg(dev, "Not enough interrupts; use polling instead.\n");
> + return;
> + }
> +
> + rc = devm_add_action_or_reset(dev, cxl_pci_free_irq_vectors, pdev);
> + if (rc) {
> + dev_dbg(dev, "Device managed call failed; interrupts disabled.\n");
> + /* some got allocated, clean them up */
> + cxl_pci_free_irq_vectors(pdev);
> + return;
> + }
> +
> + cxlds->nr_irq_vecs = nvecs;
> +}
> +
> static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> {
> struct cxl_register_map map;
> @@ -494,6 +524,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> if (rc)
> return rc;
>
> + cxl_pci_alloc_irq_vectors(cxlds);
> +
> cxlmd = devm_cxl_add_memdev(cxlds);
> if (IS_ERR(cxlmd))
> return PTR_ERR(cxlmd);
On 11/10/2022 10:57 AM, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> The information contained in the events prior to the driver loading can
> be queried at any time through other mailbox commands.
>
> Ensure a clean slate of events by reading and clearing the events. The
> events are sent to the trace buffer, but it is not anticipated that
> anyone will be listening to it at driver load time.
>
> Reviewed-by: Jonathan Cameron <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>
Reviewed-by: Dave Jiang <[email protected]>
> ---
> drivers/cxl/pci.c | 2 ++
> tools/testing/cxl/test/mem.c | 2 ++
> 2 files changed, 4 insertions(+)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 62e560063e50..e0d511575b45 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -530,6 +530,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> if (IS_ERR(cxlmd))
> return PTR_ERR(cxlmd);
>
> + cxl_mem_get_event_records(cxlds);
> +
> if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
> rc = devm_cxl_add_nvdimm(&pdev->dev, cxlmd);
>
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index aa2df3a15051..e2f5445d24ff 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -285,6 +285,8 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
> if (IS_ERR(cxlmd))
> return PTR_ERR(cxlmd);
>
> + cxl_mem_get_event_records(cxlds);
> +
> if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM))
> rc = devm_cxl_add_nvdimm(dev, cxlmd);
>
On 11/10/2022 10:57 AM, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> CXL devices have multiple event logs which can be queried for CXL event
> records. Devices are required to support the storage of at least one
> event record in each event log type.
>
> Devices track event log overflow by incrementing a counter and tracking
> the time of the first and last overflow event seen.
>
> Software queries events via the Get Event Record mailbox command; CXL
> rev 3.0 section 8.2.9.2.2.
>
> Issue the Get Event Record mailbox command on driver load. Trace each
> record found with a generic record trace. Trace any overflow
> conditions.
>
> The device can return up to 1MB worth of event records per query. This
> presents complications with allocating a huge buffer to potentially
> capture all the records. It is not anticipated that these event logs
> will be very deep, and reading them does not need to be performant.
> Process only 3 records at a time. 3 records were chosen because they
> fit comfortably on the stack, avoiding dynamic allocation while still
> cutting down on extra mailbox messages.
>
> This patch traces a raw event record only and leaves the specific event
> record types to subsequent patches.
>
> Macros are created to use for tracing the common CXL Event header
> fields.
>
> Cc: Steven Rostedt <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>
Would it be cleaner to split out the include/trace/events/cxl.h changes
to its own patch?
Reviewed-by: Dave Jiang <[email protected]>
>
> ---
> Change from RFC v2:
> Support reading 3 events at once.
> Reverse Jonathan's suggestion and check for positive number of
> records. Because the record count may have been
> returned as something > 3 based on what the device
> thinks it can send back even though the core Linux mbox
> processing truncates the data.
> Alison and Dave Jiang
> Change header uuid type to uuid_t for better user space
> processing
> Smita
> Check status reg before reading log.
> Steven
> Prefix all trace points with 'cxl_'
> Use static branch <trace>_enabled() calls
> Jonathan
> s/CXL_EVENT_TYPE_INFO/0
> s/{first,last}/{first,last}_ts
> Remove Reserved field from header
> Fix header issue for cxl_event_log_type_str()
>
> Change from RFC:
> Remove redundant error message in get event records loop
> s/EVENT_RECORD_DATA_LENGTH/CXL_EVENT_RECORD_DATA_LENGTH
> Use hdr_uuid for the header UUID field
> Use cxl_event_log_type_str() for the trace events
> Create macros for the header fields and common entries of each event
> Add reserved buffer output dump
> Report error if event query fails
> Remove unused record_cnt variable
> Steven - reorder overflow record
> Remove NOTE about checkpatch
> Jonathan
> check for exactly 1 record
> s/v3.0/rev 3.0
> Use 3 byte fields for 24bit fields
> Add 3.0 Maintenance Operation Class
> Add Dynamic Capacity log type
> Fix spelling
> Dave Jiang/Dan/Alison
> s/cxl-event/cxl
> trace/events/cxl-events => trace/events/cxl.h
> s/cxl_event_overflow/overflow
> s/cxl_event/generic_event
> ---
> MAINTAINERS | 1 +
> drivers/cxl/core/mbox.c | 70 +++++++++++++++++++
> drivers/cxl/cxl.h | 8 +++
> drivers/cxl/cxlmem.h | 73 ++++++++++++++++++++
> include/trace/events/cxl.h | 127 +++++++++++++++++++++++++++++++++++
> include/uapi/linux/cxl_mem.h | 1 +
> 6 files changed, 280 insertions(+)
> create mode 100644 include/trace/events/cxl.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ca063a504026..4b7c6e3055c6 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5223,6 +5223,7 @@ M: Dan Williams <[email protected]>
> L: [email protected]
> S: Maintained
> F: drivers/cxl/
> +F: include/trace/events/cxl.h
> F: include/uapi/linux/cxl_mem.h
>
> CONEXANT ACCESSRUNNER USB DRIVER
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 16176b9278b4..a908b95a7de4 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -7,6 +7,9 @@
> #include <cxlmem.h>
> #include <cxl.h>
>
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/cxl.h>
> +
> #include "core.h"
>
> static bool cxl_raw_allow_all;
> @@ -48,6 +51,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
> CXL_CMD(RAW, CXL_VARIABLE_PAYLOAD, CXL_VARIABLE_PAYLOAD, 0),
> #endif
> CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
> + CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
> CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
> CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
> CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> @@ -704,6 +708,72 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>
> +static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> + enum cxl_event_log_type type)
> +{
> + struct cxl_get_event_payload payload;
> + u16 pl_nr;
> +
> + do {
> + u8 log_type = type;
> + int rc;
> +
> + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
> + &log_type, sizeof(log_type),
> + &payload, sizeof(payload));
> + if (rc) {
> + dev_err(cxlds->dev, "Event log '%s': Failed to query event records : %d",
> + cxl_event_log_type_str(type), rc);
> + return;
> + }
> +
> + pl_nr = le16_to_cpu(payload.record_count);
> + if (trace_cxl_generic_event_enabled()) {
> + u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
> + int i;
> +
> + for (i = 0; i < nr_rec; i++)
> + trace_cxl_generic_event(dev_name(cxlds->dev),
> + type,
> + &payload.record[i]);
> + }
> +
> + if (trace_cxl_overflow_enabled() &&
> + (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW))
> + trace_cxl_overflow(dev_name(cxlds->dev), type, &payload);
> +
> + } while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||
> + payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> +}
> +
> +/**
> + * cxl_mem_get_event_records - Get Event Records from the device
> + * @cxlds: The device data for the operation
> + *
> + * Retrieve all event records available on the device and report them as trace
> + * events.
> + *
> + * See CXL rev 3.0 @8.2.9.2.2 Get Event Records
> + */
> +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
> +{
> + u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
> +
> + dev_dbg(cxlds->dev, "Reading event logs: %x\n", status);
> +
> + if (status & CXLDEV_EVENT_STATUS_INFO)
> + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
> + if (status & CXLDEV_EVENT_STATUS_WARN)
> + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
> + if (status & CXLDEV_EVENT_STATUS_FAIL)
> + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
> + if (status & CXLDEV_EVENT_STATUS_FATAL)
> + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
> + if (status & CXLDEV_EVENT_STATUS_DYNAMIC_CAP)
> + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_DYNAMIC_CAP);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
> +
> /**
> * cxl_mem_get_partition_info - Get partition info
> * @cxlds: The device data for the operation
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index f680450f0b16..492cff1bea6d 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -132,6 +132,14 @@ static inline int ways_to_cxl(unsigned int ways, u8 *iw)
> #define CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX 0x3
> #define CXLDEV_CAP_CAP_ID_MEMDEV 0x4000
>
> +/* CXL 3.0 8.2.8.3.1 Event Status Register */
> +#define CXLDEV_DEV_EVENT_STATUS_OFFSET 0x00
> +#define CXLDEV_EVENT_STATUS_INFO BIT(0)
> +#define CXLDEV_EVENT_STATUS_WARN BIT(1)
> +#define CXLDEV_EVENT_STATUS_FAIL BIT(2)
> +#define CXLDEV_EVENT_STATUS_FATAL BIT(3)
> +#define CXLDEV_EVENT_STATUS_DYNAMIC_CAP BIT(4)
> +
> /* CXL 2.0 8.2.8.4 Mailbox Registers */
> #define CXLDEV_MBOX_CAPS_OFFSET 0x00
> #define CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index b7b955ded3ac..da64ba0f156b 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -4,6 +4,7 @@
> #define __CXL_MEM_H__
> #include <uapi/linux/cxl_mem.h>
> #include <linux/cdev.h>
> +#include <linux/uuid.h>
> #include "cxl.h"
>
> /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> @@ -256,6 +257,7 @@ struct cxl_dev_state {
> enum cxl_opcode {
> CXL_MBOX_OP_INVALID = 0x0000,
> CXL_MBOX_OP_RAW = CXL_MBOX_OP_INVALID,
> + CXL_MBOX_OP_GET_EVENT_RECORD = 0x0100,
> CXL_MBOX_OP_GET_FW_INFO = 0x0200,
> CXL_MBOX_OP_ACTIVATE_FW = 0x0202,
> CXL_MBOX_OP_GET_SUPPORTED_LOGS = 0x0400,
> @@ -325,6 +327,76 @@ struct cxl_mbox_identify {
> u8 qos_telemetry_caps;
> } __packed;
>
> +/*
> + * Common Event Record Format
> + * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
> + */
> +struct cxl_event_record_hdr {
> + uuid_t id;
> + u8 length;
> + u8 flags[3];
> + __le16 handle;
> + __le16 related_handle;
> + __le64 timestamp;
> + u8 maint_op_class;
> + u8 reserved[0xf];
> +} __packed;
> +
> +#define CXL_EVENT_RECORD_DATA_LENGTH 0x50
> +struct cxl_event_record_raw {
> + struct cxl_event_record_hdr hdr;
> + u8 data[CXL_EVENT_RECORD_DATA_LENGTH];
> +} __packed;
> +
> +/*
> + * Get Event Records output payload
> + * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
> + */
> +#define CXL_GET_EVENT_FLAG_OVERFLOW BIT(0)
> +#define CXL_GET_EVENT_FLAG_MORE_RECORDS BIT(1)
> +#define CXL_GET_EVENT_NR_RECORDS 3
> +struct cxl_get_event_payload {
> + u8 flags;
> + u8 reserved1;
> + __le16 overflow_err_count;
> + __le64 first_overflow_timestamp;
> + __le64 last_overflow_timestamp;
> + __le16 record_count;
> + u8 reserved2[0xa];
> + struct cxl_event_record_raw record[CXL_GET_EVENT_NR_RECORDS];
> +} __packed;
> +
> +/*
> + * CXL rev 3.0 section 8.2.9.2.2; Table 8-49
> + */
> +enum cxl_event_log_type {
> + CXL_EVENT_TYPE_INFO = 0x00,
> + CXL_EVENT_TYPE_WARN,
> + CXL_EVENT_TYPE_FAIL,
> + CXL_EVENT_TYPE_FATAL,
> + CXL_EVENT_TYPE_DYNAMIC_CAP,
> + CXL_EVENT_TYPE_MAX
> +};
> +
> +static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
> +{
> + switch (type) {
> + case CXL_EVENT_TYPE_INFO:
> + return "Informational";
> + case CXL_EVENT_TYPE_WARN:
> + return "Warning";
> + case CXL_EVENT_TYPE_FAIL:
> + return "Failure";
> + case CXL_EVENT_TYPE_FATAL:
> + return "Fatal";
> + case CXL_EVENT_TYPE_DYNAMIC_CAP:
> + return "Dynamic Capacity";
> + default:
> + break;
> + }
> + return "<unknown>";
> +}
> +
> struct cxl_mbox_get_partition_info {
> __le64 active_volatile_cap;
> __le64 active_persistent_cap;
> @@ -384,6 +456,7 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
> struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
> void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
> void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
> +void cxl_mem_get_event_records(struct cxl_dev_state *cxlds);
> #ifdef CONFIG_CXL_SUSPEND
> void cxl_mem_active_inc(void);
> void cxl_mem_active_dec(void);
> diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> new file mode 100644
> index 000000000000..60dec9a84918
> --- /dev/null
> +++ b/include/trace/events/cxl.h
> @@ -0,0 +1,127 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM cxl
> +
> +#if !defined(_CXL_TRACE_EVENTS_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _CXL_TRACE_EVENTS_H
> +
> +#include <asm-generic/unaligned.h>
> +#include <linux/tracepoint.h>
> +#include <cxlmem.h>
> +
> +TRACE_EVENT(cxl_overflow,
> +
> + TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> + struct cxl_get_event_payload *payload),
> +
> + TP_ARGS(dev_name, log, payload),
> +
> + TP_STRUCT__entry(
> + __string(dev_name, dev_name)
> + __field(int, log)
> + __field(u64, first_ts)
> + __field(u64, last_ts)
> + __field(u16, count)
> + ),
> +
> + TP_fast_assign(
> + __assign_str(dev_name, dev_name);
> + __entry->log = log;
> + __entry->count = le16_to_cpu(payload->overflow_err_count);
> + __entry->first_ts = le64_to_cpu(payload->first_overflow_timestamp);
> + __entry->last_ts = le64_to_cpu(payload->last_overflow_timestamp);
> + ),
> +
> + TP_printk("%s: EVENT LOG OVERFLOW log=%s : %u records from %llu to %llu",
> + __get_str(dev_name), cxl_event_log_type_str(__entry->log),
> + __entry->count, __entry->first_ts, __entry->last_ts)
> +
> +);
> +
> +/*
> + * Common Event Record Format
> + * CXL 3.0 section 8.2.9.2.1; Table 8-42
> + */
> +#define CXL_EVENT_RECORD_FLAG_PERMANENT BIT(2)
> +#define CXL_EVENT_RECORD_FLAG_MAINT_NEEDED BIT(3)
> +#define CXL_EVENT_RECORD_FLAG_PERF_DEGRADED BIT(4)
> +#define CXL_EVENT_RECORD_FLAG_HW_REPLACE BIT(5)
> +#define show_hdr_flags(flags) __print_flags(flags, " | ", \
> + { CXL_EVENT_RECORD_FLAG_PERMANENT, "Permanent Condition" }, \
> + { CXL_EVENT_RECORD_FLAG_MAINT_NEEDED, "Maintenance Needed" }, \
> + { CXL_EVENT_RECORD_FLAG_PERF_DEGRADED, "Performance Degraded" }, \
> + { CXL_EVENT_RECORD_FLAG_HW_REPLACE, "Hardware Replacement Needed" } \
> +)
> +
> +/*
> + * Define macros for the common header of each CXL event.
> + *
> + * Tracepoints using these macros must do 3 things:
> + *
> + * 1) Add CXL_EVT_TP_entry to TP_STRUCT__entry
> + * 2) Use CXL_EVT_TP_fast_assign within TP_fast_assign;
> + * pass the dev_name, log, and CXL event header
> + * 3) Use CXL_EVT_TP_printk() instead of TP_printk()
> + *
> + * See the generic_event tracepoint as an example.
> + */
> +#define CXL_EVT_TP_entry \
> + __string(dev_name, dev_name) \
> + __field(int, log) \
> + __field_struct(uuid_t, hdr_uuid) \
> + __field(u32, hdr_flags) \
> + __field(u16, hdr_handle) \
> + __field(u16, hdr_related_handle) \
> + __field(u64, hdr_timestamp) \
> + __field(u8, hdr_length) \
> + __field(u8, hdr_maint_op_class)
> +
> +#define CXL_EVT_TP_fast_assign(dname, l, hdr) \
> + __assign_str(dev_name, (dname)); \
> + __entry->log = (l); \
> + memcpy(&__entry->hdr_uuid, &(hdr).id, sizeof(uuid_t)); \
> + __entry->hdr_length = (hdr).length; \
> + __entry->hdr_flags = get_unaligned_le24((hdr).flags); \
> + __entry->hdr_handle = le16_to_cpu((hdr).handle); \
> + __entry->hdr_related_handle = le16_to_cpu((hdr).related_handle); \
> + __entry->hdr_timestamp = le64_to_cpu((hdr).timestamp); \
> + __entry->hdr_maint_op_class = (hdr).maint_op_class
> +
> +
> +#define CXL_EVT_TP_printk(fmt, ...) \
> + TP_printk("%s log=%s : time=%llu uuid=%pUb len=%d flags='%s' " \
> + "handle=%x related_handle=%x maint_op_class=%u" \
> + " : " fmt, \
> + __get_str(dev_name), cxl_event_log_type_str(__entry->log), \
> + __entry->hdr_timestamp, &__entry->hdr_uuid, __entry->hdr_length,\
> + show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle, \
> + __entry->hdr_related_handle, __entry->hdr_maint_op_class, \
> + ##__VA_ARGS__)
> +
> +TRACE_EVENT(cxl_generic_event,
> +
> + TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> + struct cxl_event_record_raw *rec),
> +
> + TP_ARGS(dev_name, log, rec),
> +
> + TP_STRUCT__entry(
> + CXL_EVT_TP_entry
> + __array(u8, data, CXL_EVENT_RECORD_DATA_LENGTH)
> + ),
> +
> + TP_fast_assign(
> + CXL_EVT_TP_fast_assign(dev_name, log, rec->hdr);
> + memcpy(__entry->data, &rec->data, CXL_EVENT_RECORD_DATA_LENGTH);
> + ),
> +
> + CXL_EVT_TP_printk("%s",
> + __print_hex(__entry->data, CXL_EVENT_RECORD_DATA_LENGTH))
> +);
> +
> +#endif /* _CXL_TRACE_EVENTS_H */
> +
> +/* This part must be outside protection */
> +#undef TRACE_INCLUDE_FILE
> +#define TRACE_INCLUDE_FILE cxl
> +#include <trace/define_trace.h>
> diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> index c71021a2a9ed..70459be5bdd4 100644
> --- a/include/uapi/linux/cxl_mem.h
> +++ b/include/uapi/linux/cxl_mem.h
> @@ -24,6 +24,7 @@
> ___C(IDENTIFY, "Identify Command"), \
> ___C(RAW, "Raw device command"), \
> ___C(GET_SUPPORTED_LOGS, "Get Supported Logs"), \
> + ___C(GET_EVENT_RECORD, "Get Event Record"), \
> ___C(GET_FW_INFO, "Get FW Info"), \
> ___C(GET_PARTITION_INFO, "Get Partition Information"), \
> ___C(GET_LSA, "Get Label Storage Area"), \
On 11/10/2022 10:57 AM, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> CXL rev 3.0 section 8.2.9.2.1.2 defines the DRAM Event Record.
>
> Determine if the event read is a DRAM event record and if so trace the
> record.
>
> Reviewed-by: Jonathan Cameron <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>
Reviewed-by: Dave Jiang <[email protected]>
>
> ---
> Changes from RFC v2:
> Output DPA flags as a separate field.
> Ensure field names match TP_print output
> Steven
> prefix TRACE_EVENT with 'cxl_'
> Jonathan
> Formatting fix
> Remove reserved field
>
> Changes from RFC:
> Add reserved byte data
> Use new CXL header macros
> Jonathan
> Use get_unaligned_le{24,16}() for unaligned fields
> Use 'else if'
> Dave Jiang
> s/cxl_dram_event/dram
> s/cxl_evt_dram_rec/cxl_event_dram
> Adjust for new phys addr mask
> ---
> drivers/cxl/core/mbox.c | 16 ++++++-
> drivers/cxl/cxlmem.h | 23 ++++++++++
> include/trace/events/cxl.h | 92 ++++++++++++++++++++++++++++++++++++++
> 3 files changed, 130 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 6d48fdb07700..b03d7b856f3d 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -717,10 +717,19 @@ static const uuid_t gen_media_event_uuid =
> UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
> 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
>
> +/*
> + * DRAM Event Record
> + * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
> + */
> +static const uuid_t dram_event_uuid =
> + UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
> + 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
> +
> static bool cxl_event_tracing_enabled(void)
> {
> return trace_cxl_generic_event_enabled() ||
> - trace_cxl_general_media_enabled();
> + trace_cxl_general_media_enabled() ||
> + trace_cxl_dram_enabled();
> }
>
> static void cxl_trace_event_record(const char *dev_name,
> @@ -735,6 +744,11 @@ static void cxl_trace_event_record(const char *dev_name,
>
> trace_cxl_general_media(dev_name, type, rec);
> return;
> + } else if (uuid_equal(id, &dram_event_uuid)) {
> + struct cxl_event_dram *rec = (struct cxl_event_dram *)record;
> +
> + trace_cxl_dram(dev_name, type, rec);
> + return;
> }
>
> /* For unknown record types print just the header */
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 86197f3168c7..87c877f0940d 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -431,6 +431,29 @@ struct cxl_event_gen_media {
> u8 reserved[0x2e];
> } __packed;
>
> +/*
> + * DRAM Event Record - DER
> + * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
> + */
> +#define CXL_EVENT_DER_CORRECTION_MASK_SIZE 0x20
> +struct cxl_event_dram {
> + struct cxl_event_record_hdr hdr;
> + __le64 phys_addr;
> + u8 descriptor;
> + u8 type;
> + u8 transaction_type;
> + u8 validity_flags[2];
> + u8 channel;
> + u8 rank;
> + u8 nibble_mask[3];
> + u8 bank_group;
> + u8 bank;
> + u8 row[3];
> + u8 column[2];
> + u8 correction_mask[CXL_EVENT_DER_CORRECTION_MASK_SIZE];
> + u8 reserved[0x17];
> +} __packed;
> +
> struct cxl_mbox_get_partition_info {
> __le64 active_volatile_cap;
> __le64 active_persistent_cap;
> diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> index a0c20e110708..37bbe59905af 100644
> --- a/include/trace/events/cxl.h
> +++ b/include/trace/events/cxl.h
> @@ -243,6 +243,98 @@ TRACE_EVENT(cxl_general_media,
> )
> );
>
> +/*
> + * DRAM Event Record - DER
> + *
> + * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
> + */
> +/*
> + * DRAM Event Record defines many fields the same as the General Media Event
> + * Record. Reuse those definitions as appropriate.
> + */
> +#define CXL_DER_VALID_CHANNEL BIT(0)
> +#define CXL_DER_VALID_RANK BIT(1)
> +#define CXL_DER_VALID_NIBBLE BIT(2)
> +#define CXL_DER_VALID_BANK_GROUP BIT(3)
> +#define CXL_DER_VALID_BANK BIT(4)
> +#define CXL_DER_VALID_ROW BIT(5)
> +#define CXL_DER_VALID_COLUMN BIT(6)
> +#define CXL_DER_VALID_CORRECTION_MASK BIT(7)
> +#define show_dram_valid_flags(flags) __print_flags(flags, "|", \
> + { CXL_DER_VALID_CHANNEL, "CHANNEL" }, \
> + { CXL_DER_VALID_RANK, "RANK" }, \
> + { CXL_DER_VALID_NIBBLE, "NIBBLE" }, \
> + { CXL_DER_VALID_BANK_GROUP, "BANK GROUP" }, \
> + { CXL_DER_VALID_BANK, "BANK" }, \
> + { CXL_DER_VALID_ROW, "ROW" }, \
> + { CXL_DER_VALID_COLUMN, "COLUMN" }, \
> + { CXL_DER_VALID_CORRECTION_MASK, "CORRECTION MASK" } \
> +)
> +
> +TRACE_EVENT(cxl_dram,
> +
> + TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> + struct cxl_event_dram *rec),
> +
> + TP_ARGS(dev_name, log, rec),
> +
> + TP_STRUCT__entry(
> + CXL_EVT_TP_entry
> + /* DRAM */
> + __field(u64, dpa)
> + __field(u8, descriptor)
> + __field(u8, type)
> + __field(u8, transaction_type)
> + __field(u8, channel)
> + __field(u16, validity_flags)
> + __field(u16, column) /* Out of order to pack trace record */
> + __field(u32, nibble_mask)
> + __field(u32, row)
> + __array(u8, cor_mask, CXL_EVENT_DER_CORRECTION_MASK_SIZE)
> + __field(u8, rank) /* Out of order to pack trace record */
> + __field(u8, bank_group) /* Out of order to pack trace record */
> + __field(u8, bank) /* Out of order to pack trace record */
> + __field(u8, dpa_flags) /* Out of order to pack trace record */
> + ),
> +
> + TP_fast_assign(
> + CXL_EVT_TP_fast_assign(dev_name, log, rec->hdr);
> +
> + /* DRAM */
> + __entry->dpa = le64_to_cpu(rec->phys_addr);
> + __entry->dpa_flags = __entry->dpa & CXL_DPA_FLAGS_MASK;
> + __entry->dpa &= CXL_DPA_MASK;
> + __entry->descriptor = rec->descriptor;
> + __entry->type = rec->type;
> + __entry->transaction_type = rec->transaction_type;
> + __entry->validity_flags = get_unaligned_le16(rec->validity_flags);
> + __entry->channel = rec->channel;
> + __entry->rank = rec->rank;
> + __entry->nibble_mask = get_unaligned_le24(rec->nibble_mask);
> + __entry->bank_group = rec->bank_group;
> + __entry->bank = rec->bank;
> + __entry->row = get_unaligned_le24(rec->row);
> + __entry->column = get_unaligned_le16(rec->column);
> + memcpy(__entry->cor_mask, &rec->correction_mask,
> + CXL_EVENT_DER_CORRECTION_MASK_SIZE);
> + ),
> +
> + CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' descriptor='%s' type='%s' " \
> + "transaction_type='%s' channel=%u rank=%u nibble_mask=%x " \
> + "bank_group=%u bank=%u row=%u column=%u cor_mask=%s " \
> + "validity_flags='%s'",
> + __entry->dpa, show_dpa_flags(__entry->dpa_flags),
> + show_event_desc_flags(__entry->descriptor),
> + show_mem_event_type(__entry->type),
> + show_trans_type(__entry->transaction_type),
> + __entry->channel, __entry->rank, __entry->nibble_mask,
> + __entry->bank_group, __entry->bank,
> + __entry->row, __entry->column,
> + __print_hex(__entry->cor_mask, CXL_EVENT_DER_CORRECTION_MASK_SIZE),
> + show_dram_valid_flags(__entry->validity_flags)
> + )
> +);
> +
> #endif /* _CXL_TRACE_EVENTS_H */
>
> /* This part must be outside protection */
On 11/10/2022 10:57 AM, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> CXL rev 3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.
>
> Determine if the event read is memory module record and if so trace the
> record.
>
> Signed-off-by: Ira Weiny <[email protected]>
Reviewed-by: Dave Jiang <[email protected]>
>
> ---
> Changes from RFC v2:
> Ensure field names match TP_print output
> Steven
> prefix TRACE_EVENT with 'cxl_'
> Jonathan
> Remove reserved field
> Define a 1bit and 2 bit status decoder
> Fix paren alignment
>
> Changes from RFC:
> Clean up spec reference
> Add reserved data
> Use new CXL header macros
> Jonathan
> Use else if
> Use get_unaligned_le*() for unaligned fields
> Dave Jiang
> s/cxl_mem_mod_event/memory_module
> s/cxl_evt_mem_mod_rec/cxl_event_mem_module
> ---
> drivers/cxl/core/mbox.c | 17 ++++-
> drivers/cxl/cxlmem.h | 26 +++++++
> include/trace/events/cxl.h | 144 +++++++++++++++++++++++++++++++++++++
> 3 files changed, 186 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index b03d7b856f3d..879b228a98a0 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -725,11 +725,20 @@ static const uuid_t dram_event_uuid =
> UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
> 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24);
>
> +/*
> + * Memory Module Event Record
> + * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
> + */
> +static const uuid_t mem_mod_event_uuid =
> + UUID_INIT(0xfe927475, 0xdd59, 0x4339,
> + 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74);
> +
> static bool cxl_event_tracing_enabled(void)
> {
> return trace_cxl_generic_event_enabled() ||
> trace_cxl_general_media_enabled() ||
> - trace_cxl_dram_enabled();
> + trace_cxl_dram_enabled() ||
> + trace_cxl_memory_module_enabled();
> }
>
> static void cxl_trace_event_record(const char *dev_name,
> @@ -749,6 +758,12 @@ static void cxl_trace_event_record(const char *dev_name,
>
> trace_cxl_dram(dev_name, type, rec);
> return;
> + } else if (uuid_equal(id, &mem_mod_event_uuid)) {
> + struct cxl_event_mem_module *rec =
> + (struct cxl_event_mem_module *)record;
> +
> + trace_cxl_memory_module(dev_name, type, rec);
> + return;
> }
>
> /* For unknown record types print just the header */
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 87c877f0940d..03da4f8f74d3 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -454,6 +454,32 @@ struct cxl_event_dram {
> u8 reserved[0x17];
> } __packed;
>
> +/*
> + * Get Health Info Record
> + * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
> + */
> +struct cxl_get_health_info {
> + u8 health_status;
> + u8 media_status;
> + u8 add_status;
> + u8 life_used;
> + u8 device_temp[2];
> + u8 dirty_shutdown_cnt[4];
> + u8 cor_vol_err_cnt[4];
> + u8 cor_per_err_cnt[4];
> +} __packed;
> +
> +/*
> + * Memory Module Event Record
> + * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
> + */
> +struct cxl_event_mem_module {
> + struct cxl_event_record_hdr hdr;
> + u8 event_type;
> + struct cxl_get_health_info info;
> + u8 reserved[0x3d];
> +} __packed;
> +
> struct cxl_mbox_get_partition_info {
> __le64 active_volatile_cap;
> __le64 active_persistent_cap;
> diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> index 37bbe59905af..05437e13a882 100644
> --- a/include/trace/events/cxl.h
> +++ b/include/trace/events/cxl.h
> @@ -335,6 +335,150 @@ TRACE_EVENT(cxl_dram,
> )
> );
>
> +/*
> + * Memory Module Event Record - MMER
> + *
> + * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
> + */
> +#define CXL_MMER_HEALTH_STATUS_CHANGE 0x00
> +#define CXL_MMER_MEDIA_STATUS_CHANGE 0x01
> +#define CXL_MMER_LIFE_USED_CHANGE 0x02
> +#define CXL_MMER_TEMP_CHANGE 0x03
> +#define CXL_MMER_DATA_PATH_ERROR 0x04
> +#define CXL_MMER_LSA_ERROR 0x05
> +#define show_dev_evt_type(type) __print_symbolic(type, \
> + { CXL_MMER_HEALTH_STATUS_CHANGE, "Health Status Change" }, \
> + { CXL_MMER_MEDIA_STATUS_CHANGE, "Media Status Change" }, \
> + { CXL_MMER_LIFE_USED_CHANGE, "Life Used Change" }, \
> + { CXL_MMER_TEMP_CHANGE, "Temperature Change" }, \
> + { CXL_MMER_DATA_PATH_ERROR, "Data Path Error" }, \
> + { CXL_MMER_LSA_ERROR, "LSA Error" } \
> +)
> +
> +/*
> + * Device Health Information - DHI
> + *
> + * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
> + */
> +#define CXL_DHI_HS_MAINTENANCE_NEEDED BIT(0)
> +#define CXL_DHI_HS_PERFORMANCE_DEGRADED BIT(1)
> +#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED BIT(2)
> +#define show_health_status_flags(flags) __print_flags(flags, "|", \
> + { CXL_DHI_HS_MAINTENANCE_NEEDED, "Maintenance Needed" }, \
> + { CXL_DHI_HS_PERFORMANCE_DEGRADED, "Performance Degraded" }, \
> + { CXL_DHI_HS_HW_REPLACEMENT_NEEDED, "Replacement Needed" } \
> +)
> +
> +#define CXL_DHI_MS_NORMAL 0x00
> +#define CXL_DHI_MS_NOT_READY 0x01
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOST 0x02
> +#define CXL_DHI_MS_ALL_DATA_LOST 0x03
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS 0x04
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN 0x05
> +#define CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT 0x06
> +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS 0x07
> +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN 0x08
> +#define CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT 0x09
> +#define show_media_status(ms) __print_symbolic(ms, \
> + { CXL_DHI_MS_NORMAL, \
> + "Normal" }, \
> + { CXL_DHI_MS_NOT_READY, \
> + "Not Ready" }, \
> + { CXL_DHI_MS_WRITE_PERSISTENCY_LOST, \
> + "Write Persistency Lost" }, \
> + { CXL_DHI_MS_ALL_DATA_LOST, \
> + "All Data Lost" }, \
> + { CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_POWER_LOSS, \
> + "Write Persistency Loss in the Event of Power Loss" }, \
> + { CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_EVENT_SHUTDOWN, \
> + "Write Persistency Loss in the Event of Shutdown" }, \
> + { CXL_DHI_MS_WRITE_PERSISTENCY_LOSS_IMMINENT, \
> + "Write Persistency Loss Imminent" }, \
> + { CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_POWER_LOSS, \
> + "All Data Loss in the Event of Power Loss" }, \
> + { CXL_DHI_MS_WRITE_ALL_DATA_LOSS_EVENT_SHUTDOWN, \
> + "All Data Loss in the Event of Shutdown" }, \
> + { CXL_DHI_MS_WRITE_ALL_DATA_LOSS_IMMINENT, \
> + "All Data Loss Imminent" } \
> +)
> +
> +#define CXL_DHI_AS_NORMAL 0x0
> +#define CXL_DHI_AS_WARNING 0x1
> +#define CXL_DHI_AS_CRITICAL 0x2
> +#define show_two_bit_status(as) __print_symbolic(as, \
> + { CXL_DHI_AS_NORMAL, "Normal" }, \
> + { CXL_DHI_AS_WARNING, "Warning" }, \
> + { CXL_DHI_AS_CRITICAL, "Critical" } \
> +)
> +#define show_one_bit_status(as) __print_symbolic(as, \
> + { CXL_DHI_AS_NORMAL, "Normal" }, \
> + { CXL_DHI_AS_WARNING, "Warning" } \
> +)
> +
> +#define CXL_DHI_AS_LIFE_USED(as) (as & 0x3)
> +#define CXL_DHI_AS_DEV_TEMP(as) ((as & 0xC) >> 2)
> +#define CXL_DHI_AS_COR_VOL_ERR_CNT(as) ((as & 0x10) >> 4)
> +#define CXL_DHI_AS_COR_PER_ERR_CNT(as) ((as & 0x20) >> 5)
> +
> +TRACE_EVENT(cxl_memory_module,
> +
> + TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> + struct cxl_event_mem_module *rec),
> +
> + TP_ARGS(dev_name, log, rec),
> +
> + TP_STRUCT__entry(
> + CXL_EVT_TP_entry
> +
> + /* Memory Module Event */
> + __field(u8, event_type)
> +
> + /* Device Health Info */
> + __field(u8, health_status)
> + __field(u8, media_status)
> + __field(u8, life_used)
> + __field(u32, dirty_shutdown_cnt)
> + __field(u32, cor_vol_err_cnt)
> + __field(u32, cor_per_err_cnt)
> + __field(s16, device_temp)
> + __field(u8, add_status)
> + ),
> +
> + TP_fast_assign(
> + CXL_EVT_TP_fast_assign(dev_name, log, rec->hdr);
> +
> + /* Memory Module Event */
> + __entry->event_type = rec->event_type;
> +
> + /* Device Health Info */
> + __entry->health_status = rec->info.health_status;
> + __entry->media_status = rec->info.media_status;
> + __entry->life_used = rec->info.life_used;
> + __entry->dirty_shutdown_cnt = get_unaligned_le32(rec->info.dirty_shutdown_cnt);
> + __entry->cor_vol_err_cnt = get_unaligned_le32(rec->info.cor_vol_err_cnt);
> + __entry->cor_per_err_cnt = get_unaligned_le32(rec->info.cor_per_err_cnt);
> + __entry->device_temp = get_unaligned_le16(rec->info.device_temp);
> + __entry->add_status = rec->info.add_status;
> + ),
> +
> + CXL_EVT_TP_printk("event_type='%s' health_status='%s' media_status='%s' " \
> + "as_life_used=%s as_dev_temp=%s as_cor_vol_err_cnt=%s " \
> + "as_cor_per_err_cnt=%s life_used=%u device_temp=%d " \
> + "dirty_shutdown_cnt=%u cor_vol_err_cnt=%u cor_per_err_cnt=%u",
> + show_dev_evt_type(__entry->event_type),
> + show_health_status_flags(__entry->health_status),
> + show_media_status(__entry->media_status),
> + show_two_bit_status(CXL_DHI_AS_LIFE_USED(__entry->add_status)),
> + show_two_bit_status(CXL_DHI_AS_DEV_TEMP(__entry->add_status)),
> + show_one_bit_status(CXL_DHI_AS_COR_VOL_ERR_CNT(__entry->add_status)),
> + show_one_bit_status(CXL_DHI_AS_COR_PER_ERR_CNT(__entry->add_status)),
> + __entry->life_used, __entry->device_temp,
> + __entry->dirty_shutdown_cnt, __entry->cor_vol_err_cnt,
> + __entry->cor_per_err_cnt
> + )
> +);
> +
> +
> #endif /* _CXL_TRACE_EVENTS_H */
>
> /* This part must be outside protection */
On 11/10/2022 10:57 AM, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> CXL device events are signaled via interrupts. Each event log may have
> a different interrupt message number. These message numbers are
> reported in the Get Event Interrupt Policy mailbox command.
>
> Add interrupt support for event logs. Interrupts are allocated as
> shared interrupts. Therefore, all or some event logs can share the same
> message number.
>
> The driver must deal with the possibility that dynamic capacity is not
> yet supported by a device it sees. Fall back and retry without dynamic
> capacity if the first attempt fails.
>
> Dynamic Capacity event logs interrupt as part of the informational event
> log. Check the event status to see which log has data.
>
> Signed-off-by: Ira Weiny <[email protected]>
>
> ---
> Changes from RFC v2
> Adjust to new irq 16 vector allocation
> Jonathan
> Remove CXL_INT_RES
> Use irq threads to ensure mailbox commands are executed outside irq context
> Adjust for optional Dynamic Capacity log
> ---
> drivers/cxl/core/mbox.c | 53 +++++++++++++-
> drivers/cxl/cxlmem.h | 31 ++++++++
> drivers/cxl/pci.c | 133 +++++++++++++++++++++++++++++++++++
> include/uapi/linux/cxl_mem.h | 2 +
> 4 files changed, 217 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 879b228a98a0..1e6762af2a00 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -53,6 +53,8 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
> CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
> CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
> CXL_CMD(CLEAR_EVENT_RECORD, CXL_VARIABLE_PAYLOAD, 0, 0),
> + CXL_CMD(GET_EVT_INT_POLICY, 0, 0x5, 0),
> + CXL_CMD(SET_EVT_INT_POLICY, 0x5, 0, 0),
> CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
> CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
> CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> @@ -791,8 +793,8 @@ static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
> &payload, sizeof(payload), NULL, 0);
> }
>
> -static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> - enum cxl_event_log_type type)
> +void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> + enum cxl_event_log_type type)
> {
> struct cxl_get_event_payload payload;
> u16 pl_nr;
> @@ -837,6 +839,7 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> } while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||
> payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> }
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_get_records_log, CXL);
>
> /**
> * cxl_mem_get_event_records - Get Event Records from the device
> @@ -867,6 +870,52 @@ void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
>
> +int cxl_event_config_msgnums(struct cxl_dev_state *cxlds)
> +{
> + struct cxl_event_interrupt_policy *policy = &cxlds->evt_int_policy;
> + size_t policy_size = sizeof(*policy);
> + bool retry = true;
> + int rc;
> +
> + policy->info_settings = CXL_INT_MSI_MSIX;
> + policy->warn_settings = CXL_INT_MSI_MSIX;
> + policy->failure_settings = CXL_INT_MSI_MSIX;
> + policy->fatal_settings = CXL_INT_MSI_MSIX;
> + policy->dyn_cap_settings = CXL_INT_MSI_MSIX;
> +
> +again:
> + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_SET_EVT_INT_POLICY,
> + policy, policy_size, NULL, 0);
> + if (rc < 0) {
> + /*
> + * If the device does not support dynamic capacity it may fail
> + * the command due to an invalid payload. Retry without
> + * dynamic capacity.
> + */
> + if (retry) {
> + retry = false;
> + policy->dyn_cap_settings = 0;
> + policy_size = sizeof(*policy) - sizeof(policy->dyn_cap_settings);
> + goto again;
> + }
> + dev_err(cxlds->dev, "Failed to set event interrupt policy : %d",
> + rc);
> + memset(policy, CXL_INT_NONE, sizeof(*policy));
> + return rc;
> + }
Up to you, but I think you can avoid the goto:
int retry = 2;
do {
rc = cxl_mbox_send_cmd(...);
if (rc == 0 || retry == 1)
break;
policy->dyn_cap_settings = 0;
policy_size = sizeof(*policy) - sizeof(policy->dyn_cap_settings);
retry--;
} while (retry);
if (rc < 0) {
dev_err(...);
memset(policy, ...);
return rc;
}
> +
> + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVT_INT_POLICY, NULL, 0,
> + policy, policy_size);
> + if (rc < 0) {
> + dev_err(cxlds->dev, "Failed to get event interrupt policy : %d",
> + rc);
> + return rc;
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_event_config_msgnums, CXL);
> +
> /**
> * cxl_mem_get_partition_info - Get partition info
> * @cxlds: The device data for the operation
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 03da4f8f74d3..4d9c3ea30c24 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -179,6 +179,31 @@ struct cxl_endpoint_dvsec_info {
> struct range dvsec_range[2];
> };
>
> +/**
> + * Event Interrupt Policy
> + *
> + * CXL rev 3.0 section 8.2.9.2.4; Table 8-52
> + */
> +enum cxl_event_int_mode {
> + CXL_INT_NONE = 0x00,
> + CXL_INT_MSI_MSIX = 0x01,
> + CXL_INT_FW = 0x02
> +};
> +#define CXL_EVENT_INT_MODE_MASK 0x3
> +#define CXL_EVENT_INT_MSGNUM(setting) (((setting) & 0xf0) >> 4)
> +struct cxl_event_interrupt_policy {
> + u8 info_settings;
> + u8 warn_settings;
> + u8 failure_settings;
> + u8 fatal_settings;
> + u8 dyn_cap_settings;
> +} __packed;
> +
> +static inline bool cxl_evt_int_is_msi(u8 setting)
> +{
> + return CXL_INT_MSI_MSIX == (setting & CXL_EVENT_INT_MODE_MASK);
> +}
> +
> /**
> * struct cxl_dev_state - The driver device state
> *
> @@ -246,6 +271,7 @@ struct cxl_dev_state {
>
> resource_size_t component_reg_phys;
> u64 serial;
> + struct cxl_event_interrupt_policy evt_int_policy;
>
> struct xarray doe_mbs;
>
> @@ -259,6 +285,8 @@ enum cxl_opcode {
> CXL_MBOX_OP_RAW = CXL_MBOX_OP_INVALID,
> CXL_MBOX_OP_GET_EVENT_RECORD = 0x0100,
> CXL_MBOX_OP_CLEAR_EVENT_RECORD = 0x0101,
> + CXL_MBOX_OP_GET_EVT_INT_POLICY = 0x0102,
> + CXL_MBOX_OP_SET_EVT_INT_POLICY = 0x0103,
> CXL_MBOX_OP_GET_FW_INFO = 0x0200,
> CXL_MBOX_OP_ACTIVATE_FW = 0x0202,
> CXL_MBOX_OP_GET_SUPPORTED_LOGS = 0x0400,
> @@ -539,7 +567,10 @@ int cxl_mem_create_range_info(struct cxl_dev_state *cxlds);
> struct cxl_dev_state *cxl_dev_state_create(struct device *dev);
> void set_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
> void clear_exclusive_cxl_commands(struct cxl_dev_state *cxlds, unsigned long *cmds);
> +void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> + enum cxl_event_log_type type);
> void cxl_mem_get_event_records(struct cxl_dev_state *cxlds);
> +int cxl_event_config_msgnums(struct cxl_dev_state *cxlds);
> #ifdef CONFIG_CXL_SUSPEND
> void cxl_mem_active_inc(void);
> void cxl_mem_active_dec(void);
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index e0d511575b45..64b2e2671043 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -458,6 +458,138 @@ static void cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
> cxlds->nr_irq_vecs = nvecs;
> }
>
> +struct cxl_event_irq_id {
> + struct cxl_dev_state *cxlds;
> + u32 status;
> + unsigned int msgnum;
> +};
> +
> +static irqreturn_t cxl_event_int_thread(int irq, void *id)
> +{
> + struct cxl_event_irq_id *cxlid = id;
> + struct cxl_dev_state *cxlds = cxlid->cxlds;
> +
> + if (cxlid->status & CXLDEV_EVENT_STATUS_INFO)
> + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
> + if (cxlid->status & CXLDEV_EVENT_STATUS_WARN)
> + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
> + if (cxlid->status & CXLDEV_EVENT_STATUS_FAIL)
> + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
> + if (cxlid->status & CXLDEV_EVENT_STATUS_FATAL)
> + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
> + if (cxlid->status & CXLDEV_EVENT_STATUS_DYNAMIC_CAP)
> + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_DYNAMIC_CAP);
> +
> + return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t cxl_event_int_handler(int irq, void *id)
> +{
> + struct cxl_event_irq_id *cxlid = id;
> + struct cxl_dev_state *cxlds = cxlid->cxlds;
> + u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
> +
> + if (cxlid->status & status)
> + return IRQ_WAKE_THREAD;
> + return IRQ_HANDLED;
Should this return IRQ_NONE? The handler did not handle anything here, and
this is a shared interrupt.
> +}
> +
> +static void cxl_free_event_irq(void *id)
> +{
> + struct cxl_event_irq_id *cxlid = id;
> + struct pci_dev *pdev = to_pci_dev(cxlid->cxlds->dev);
> +
> + pci_free_irq(pdev, cxlid->msgnum, id);
> +}
> +
> +static u32 log_type_to_status(enum cxl_event_log_type log_type)
> +{
> + switch (log_type) {
> + case CXL_EVENT_TYPE_INFO:
> + return CXLDEV_EVENT_STATUS_INFO | CXLDEV_EVENT_STATUS_DYNAMIC_CAP;
> + case CXL_EVENT_TYPE_WARN:
> + return CXLDEV_EVENT_STATUS_WARN;
> + case CXL_EVENT_TYPE_FAIL:
> + return CXLDEV_EVENT_STATUS_FAIL;
> + case CXL_EVENT_TYPE_FATAL:
> + return CXLDEV_EVENT_STATUS_FATAL;
> + default:
> + break;
> + }
> + return 0;
> +}
> +
> +static int cxl_request_event_irq(struct cxl_dev_state *cxlds,
> + enum cxl_event_log_type log_type,
> + u8 setting)
> +{
> + struct device *dev = cxlds->dev;
> + struct pci_dev *pdev = to_pci_dev(dev);
> + struct cxl_event_irq_id *id;
> + unsigned int msgnum = CXL_EVENT_INT_MSGNUM(setting);
> + int irq;
Should this be 'int rc'? pci_request_irq() returns 0 or a negative errno,
not an irq number, so naming the variable 'irq' is confusing.
DJ
> +
> + /* Disabled irq is not an error */
> + if (!cxl_evt_int_is_msi(setting) || msgnum > cxlds->nr_irq_vecs) {
> + dev_dbg(dev, "Event interrupt not enabled; %s %u %d\n",
> + cxl_event_log_type_str(log_type),
> + msgnum, cxlds->nr_irq_vecs);
> + return 0;
> + }
> +
> + id = devm_kzalloc(dev, sizeof(*id), GFP_KERNEL);
> + if (!id)
> + return -ENOMEM;
> +
> + id->cxlds = cxlds;
> + id->msgnum = msgnum;
> + id->status = log_type_to_status(log_type);
> +
> + irq = pci_request_irq(pdev, id->msgnum, cxl_event_int_handler,
> + cxl_event_int_thread, id,
> + "%s:event-log-%s", dev_name(dev),
> + cxl_event_log_type_str(log_type));
> + if (irq)
> + return irq;
> +
> + devm_add_action_or_reset(dev, cxl_free_event_irq, id);
> + return 0;
> +}
> +
> +static void cxl_event_irqsetup(struct cxl_dev_state *cxlds)
> +{
> + struct device *dev = cxlds->dev;
> + u8 setting;
> +
> + if (cxl_event_config_msgnums(cxlds))
> + return;
> +
> + /*
> + * Dynamic Capacity shares the info message number
> + * Nothing to be done except check the status bit in the
> + * irq thread.
> + */
> + setting = cxlds->evt_int_policy.info_settings;
> + if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_INFO, setting))
> + dev_err(dev, "Failed to get interrupt for %s event log\n",
> + cxl_event_log_type_str(CXL_EVENT_TYPE_INFO));
> +
> + setting = cxlds->evt_int_policy.warn_settings;
> + if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_WARN, setting))
> + dev_err(dev, "Failed to get interrupt for %s event log\n",
> + cxl_event_log_type_str(CXL_EVENT_TYPE_WARN));
> +
> + setting = cxlds->evt_int_policy.failure_settings;
> + if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_FAIL, setting))
> + dev_err(dev, "Failed to get interrupt for %s event log\n",
> + cxl_event_log_type_str(CXL_EVENT_TYPE_FAIL));
> +
> + setting = cxlds->evt_int_policy.fatal_settings;
> + if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_FATAL, setting))
> + dev_err(dev, "Failed to get interrupt for %s event log\n",
> + cxl_event_log_type_str(CXL_EVENT_TYPE_FATAL));
> +}
> +
> static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> {
> struct cxl_register_map map;
> @@ -525,6 +657,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> return rc;
>
> cxl_pci_alloc_irq_vectors(cxlds);
> + cxl_event_irqsetup(cxlds);
>
> cxlmd = devm_cxl_add_memdev(cxlds);
> if (IS_ERR(cxlmd))
> diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> index 7c1ad8062792..a8204802fcca 100644
> --- a/include/uapi/linux/cxl_mem.h
> +++ b/include/uapi/linux/cxl_mem.h
> @@ -26,6 +26,8 @@
> ___C(GET_SUPPORTED_LOGS, "Get Supported Logs"), \
> ___C(GET_EVENT_RECORD, "Get Event Record"), \
> ___C(CLEAR_EVENT_RECORD, "Clear Event Record"), \
> + ___C(GET_EVT_INT_POLICY, "Get Event Interrupt Policy"), \
> + ___C(SET_EVT_INT_POLICY, "Set Event Interrupt Policy"), \
> ___C(GET_FW_INFO, "Get FW Info"), \
> ___C(GET_PARTITION_INFO, "Get Partition Information"), \
> ___C(GET_LSA, "Get Label Storage Area"), \
On 11/10/2022 10:57 AM, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> CXL rev 3.0 section 8.2.9.2.1.1 defines the General Media Event Record.
>
> Determine if the event read is a general media record and if so trace
> the record as a General Media Event Record.
>
> Signed-off-by: Ira Weiny <[email protected]>
Reviewed-by: Dave Jiang <[email protected]>
>
> ---
> Changes from RFC v2:
> Output DPA flags as a single field
> Ensure names of fields match what TP_print outputs
> Steven
> prefix TRACE_EVENT with 'cxl_'
> Jonathan
> Remove Reserved field
>
> Changes from RFC:
> Add reserved byte array
> Use common CXL event header record macros
> Jonathan
> Use unaligned_le{24,16} for unaligned fields
> Don't use the inverse of phy addr mask
> Dave Jiang
> s/cxl_gen_media_event/general_media
> s/cxl_evt_gen_media/cxl_event_gen_media
> ---
> drivers/cxl/core/mbox.c | 40 ++++++++++--
> drivers/cxl/cxlmem.h | 19 ++++++
> include/trace/events/cxl.h | 124 +++++++++++++++++++++++++++++++++++++
> 3 files changed, 179 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index f46558e09f08..6d48fdb07700 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -709,6 +709,38 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>
> +/*
> + * General Media Event Record
> + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
> + */
> +static const uuid_t gen_media_event_uuid =
> + UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
> + 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6);
> +
> +static bool cxl_event_tracing_enabled(void)
> +{
> + return trace_cxl_generic_event_enabled() ||
> + trace_cxl_general_media_enabled();
> +}
> +
> +static void cxl_trace_event_record(const char *dev_name,
> + enum cxl_event_log_type type,
> + struct cxl_event_record_raw *record)
> +{
> + uuid_t *id = &record->hdr.id;
> +
> + if (uuid_equal(id, &gen_media_event_uuid)) {
> + struct cxl_event_gen_media *rec =
> + (struct cxl_event_gen_media *)record;
> +
> + trace_cxl_general_media(dev_name, type, rec);
> + return;
> + }
> +
> + /* For unknown record types print just the header */
> + trace_cxl_generic_event(dev_name, type, record);
> +}
> +
> static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
> enum cxl_event_log_type log,
> struct cxl_get_event_payload *get_pl, u16 nr)
> @@ -754,11 +786,11 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
> int i;
>
> - if (trace_cxl_generic_event_enabled()) {
> + if (cxl_event_tracing_enabled()) {
> for (i = 0; i < nr_rec; i++)
> - trace_cxl_generic_event(dev_name(cxlds->dev),
> - type,
> - &payload.record[i]);
> + cxl_trace_event_record(dev_name(cxlds->dev),
> + type,
> + &payload.record[i]);
> }
>
> rc = cxl_clear_event_record(cxlds, type, &payload, nr_rec);
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 28a114c7cf69..86197f3168c7 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -412,6 +412,25 @@ struct cxl_mbox_clear_event_payload {
> __le16 handle[CXL_GET_EVENT_NR_RECORDS];
> };
>
> +/*
> + * General Media Event Record
> + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
> + */
> +#define CXL_EVENT_GEN_MED_COMP_ID_SIZE 0x10
> +struct cxl_event_gen_media {
> + struct cxl_event_record_hdr hdr;
> + __le64 phys_addr;
> + u8 descriptor;
> + u8 type;
> + u8 transaction_type;
> + u8 validity_flags[2];
> + u8 channel;
> + u8 rank;
> + u8 device[3];
> + u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
> + u8 reserved[0x2e];
> +} __packed;
> +
> struct cxl_mbox_get_partition_info {
> __le64 active_volatile_cap;
> __le64 active_persistent_cap;
> diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> index 60dec9a84918..a0c20e110708 100644
> --- a/include/trace/events/cxl.h
> +++ b/include/trace/events/cxl.h
> @@ -119,6 +119,130 @@ TRACE_EVENT(cxl_generic_event,
> __print_hex(__entry->data, CXL_EVENT_RECORD_DATA_LENGTH))
> );
>
> +/*
> + * Physical Address field masks
> + *
> + * General Media Event Record
> + * CXL rev 3.0 section 8.2.9.2.1.1; Table 8-43
> + *
> + * DRAM Event Record
> + * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
> + */
> +#define CXL_DPA_FLAGS_MASK 0x3F
> +#define CXL_DPA_MASK (~CXL_DPA_FLAGS_MASK)
> +
> +#define CXL_DPA_VOLATILE BIT(0)
> +#define CXL_DPA_NOT_REPAIRABLE BIT(1)
> +#define show_dpa_flags(flags) __print_flags(flags, "|", \
> + { CXL_DPA_VOLATILE, "VOLATILE" }, \
> + { CXL_DPA_NOT_REPAIRABLE, "NOT_REPAIRABLE" } \
> +)
> +
> +/*
> + * General Media Event Record - GMER
> + * CXL rev 3.0 section 8.2.9.2.1.1; Table 8-43
> + */
> +#define CXL_GMER_EVT_DESC_UNCORRECTABLE_EVENT BIT(0)
> +#define CXL_GMER_EVT_DESC_THRESHOLD_EVENT BIT(1)
> +#define CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW BIT(2)
> +#define show_event_desc_flags(flags) __print_flags(flags, "|", \
> + { CXL_GMER_EVT_DESC_UNCORRECTABLE_EVENT, "Uncorrectable Event" }, \
> + { CXL_GMER_EVT_DESC_THRESHOLD_EVENT, "Threshold event" }, \
> + { CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW, "Poison List Overflow" } \
> +)
> +
> +#define CXL_GMER_MEM_EVT_TYPE_ECC_ERROR 0x00
> +#define CXL_GMER_MEM_EVT_TYPE_INV_ADDR 0x01
> +#define CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR 0x02
> +#define show_mem_event_type(type) __print_symbolic(type, \
> + { CXL_GMER_MEM_EVT_TYPE_ECC_ERROR, "ECC Error" }, \
> + { CXL_GMER_MEM_EVT_TYPE_INV_ADDR, "Invalid Address" }, \
> + { CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR, "Data Path Error" } \
> +)
> +
> +#define CXL_GMER_TRANS_UNKNOWN 0x00
> +#define CXL_GMER_TRANS_HOST_READ 0x01
> +#define CXL_GMER_TRANS_HOST_WRITE 0x02
> +#define CXL_GMER_TRANS_HOST_SCAN_MEDIA 0x03
> +#define CXL_GMER_TRANS_HOST_INJECT_POISON 0x04
> +#define CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB 0x05
> +#define CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT 0x06
> +#define show_trans_type(type) __print_symbolic(type, \
> + { CXL_GMER_TRANS_UNKNOWN, "Unknown" }, \
> + { CXL_GMER_TRANS_HOST_READ, "Host Read" }, \
> + { CXL_GMER_TRANS_HOST_WRITE, "Host Write" }, \
> + { CXL_GMER_TRANS_HOST_SCAN_MEDIA, "Host Scan Media" }, \
> + { CXL_GMER_TRANS_HOST_INJECT_POISON, "Host Inject Poison" }, \
> + { CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB, "Internal Media Scrub" }, \
> + { CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT, "Internal Media Management" } \
> +)
> +
> +#define CXL_GMER_VALID_CHANNEL BIT(0)
> +#define CXL_GMER_VALID_RANK BIT(1)
> +#define CXL_GMER_VALID_DEVICE BIT(2)
> +#define CXL_GMER_VALID_COMPONENT BIT(3)
> +#define show_valid_flags(flags) __print_flags(flags, "|", \
> + { CXL_GMER_VALID_CHANNEL, "CHANNEL" }, \
> + { CXL_GMER_VALID_RANK, "RANK" }, \
> + { CXL_GMER_VALID_DEVICE, "DEVICE" }, \
> + { CXL_GMER_VALID_COMPONENT, "COMPONENT" } \
> +)
> +
> +TRACE_EVENT(cxl_general_media,
> +
> + TP_PROTO(const char *dev_name, enum cxl_event_log_type log,
> + struct cxl_event_gen_media *rec),
> +
> + TP_ARGS(dev_name, log, rec),
> +
> + TP_STRUCT__entry(
> + CXL_EVT_TP_entry
> + /* General Media */
> + __field(u64, dpa)
> + __field(u8, descriptor)
> + __field(u8, type)
> + __field(u8, transaction_type)
> + __field(u8, channel)
> + __field(u32, device)
> + __array(u8, comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE)
> + __field(u16, validity_flags)
> + /* Following are out of order to pack trace record */
> + __field(u8, rank)
> + __field(u8, dpa_flags)
> + ),
> +
> + TP_fast_assign(
> + CXL_EVT_TP_fast_assign(dev_name, log, rec->hdr);
> +
> + /* General Media */
> + __entry->dpa = le64_to_cpu(rec->phys_addr);
> + __entry->dpa_flags = __entry->dpa & CXL_DPA_FLAGS_MASK;
> + /* Mask after flags have been parsed */
> + __entry->dpa &= CXL_DPA_MASK;
> + __entry->descriptor = rec->descriptor;
> + __entry->type = rec->type;
> + __entry->transaction_type = rec->transaction_type;
> + __entry->channel = rec->channel;
> + __entry->rank = rec->rank;
> + __entry->device = get_unaligned_le24(rec->device);
> + memcpy(__entry->comp_id, &rec->component_id,
> + CXL_EVENT_GEN_MED_COMP_ID_SIZE);
> + __entry->validity_flags = get_unaligned_le16(&rec->validity_flags);
> + ),
> +
> + CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' " \
> + "descriptor='%s' type='%s' transaction_type='%s' channel=%u rank=%u " \
> + "device=%x comp_id=%s validity_flags='%s'",
> + __entry->dpa, show_dpa_flags(__entry->dpa_flags),
> + show_event_desc_flags(__entry->descriptor),
> + show_mem_event_type(__entry->type),
> + show_trans_type(__entry->transaction_type),
> + __entry->channel, __entry->rank, __entry->device,
> + __print_hex(__entry->comp_id, CXL_EVENT_GEN_MED_COMP_ID_SIZE),
> + show_valid_flags(__entry->validity_flags)
> + )
> +);
> +
> #endif /* _CXL_TRACE_EVENTS_H */
>
> /* This part must be outside protection */
On 11/10/2022 10:57 AM, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> CXL rev 3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
> command. After an event record is read it needs to be cleared from the
> event log.
>
> Implement cxl_clear_event_record() and call it for each record retrieved
> from the device.
>
> Each record is cleared individually. A clear-all bit is specified, but
> events could arrive between a get and the final clear-all operation, so
> clearing each record by handle avoids discarding unread events.
>
> Reviewed-by: Jonathan Cameron <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>
Reviewed-by: Dave Jiang <[email protected]>
>
> ---
> Changes from RFC:
> Jonathan
> Clean up init of payload and use return code.
> Also report any error to clear the event.
> s/v3.0/rev 3.0
> ---
> drivers/cxl/core/mbox.c | 46 ++++++++++++++++++++++++++++++------
> drivers/cxl/cxlmem.h | 15 ++++++++++++
> include/uapi/linux/cxl_mem.h | 1 +
> 3 files changed, 55 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index a908b95a7de4..f46558e09f08 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -52,6 +52,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
> #endif
> CXL_CMD(GET_SUPPORTED_LOGS, 0, CXL_VARIABLE_PAYLOAD, CXL_CMD_FLAG_FORCE_ENABLE),
> CXL_CMD(GET_EVENT_RECORD, 1, CXL_VARIABLE_PAYLOAD, 0),
> + CXL_CMD(CLEAR_EVENT_RECORD, CXL_VARIABLE_PAYLOAD, 0, 0),
> CXL_CMD(GET_FW_INFO, 0, 0x50, 0),
> CXL_CMD(GET_PARTITION_INFO, 0, 0x20, 0),
> CXL_CMD(GET_LSA, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> @@ -708,6 +709,27 @@ int cxl_enumerate_cmds(struct cxl_dev_state *cxlds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>
> +static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
> + enum cxl_event_log_type log,
> + struct cxl_get_event_payload *get_pl, u16 nr)
> +{
> + struct cxl_mbox_clear_event_payload payload = {
> + .event_log = log,
> + .nr_recs = nr,
> + };
> + int i;
> +
> + for (i = 0; i < nr; i++) {
> + payload.handle[i] = get_pl->record[i].hdr.handle;
> + dev_dbg(cxlds->dev, "Event log '%s': Clearing %u\n",
> + cxl_event_log_type_str(log),
> + le16_to_cpu(payload.handle[i]));
> + }
> +
> + return cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_CLEAR_EVENT_RECORD,
> + &payload, sizeof(payload), NULL, 0);
> +}
> +
> static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> enum cxl_event_log_type type)
> {
> @@ -728,14 +750,23 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> }
>
> pl_nr = le16_to_cpu(payload.record_count);
> - if (trace_cxl_generic_event_enabled()) {
> + if (pl_nr > 0) {
> u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
> int i;
>
> - for (i = 0; i < nr_rec; i++)
> - trace_cxl_generic_event(dev_name(cxlds->dev),
> - type,
> - &payload.record[i]);
> + if (trace_cxl_generic_event_enabled()) {
> + for (i = 0; i < nr_rec; i++)
> + trace_cxl_generic_event(dev_name(cxlds->dev),
> + type,
> + &payload.record[i]);
> + }
> +
> + rc = cxl_clear_event_record(cxlds, type, &payload, nr_rec);
> + if (rc) {
> + dev_err(cxlds->dev, "Event log '%s': Failed to clear events : %d",
> + cxl_event_log_type_str(type), rc);
> + return;
> + }
> }
>
> if (trace_cxl_overflow_enabled() &&
> @@ -750,10 +781,11 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> * cxl_mem_get_event_records - Get Event Records from the device
> * @cxlds: The device data for the operation
> *
> - * Retrieve all event records available on the device and report them as trace
> - * events.
> + * Retrieve all event records available on the device, report them as trace
> + * events, and clear them.
> *
> * See CXL rev 3.0 @8.2.9.2.2 Get Event Records
> + * See CXL rev 3.0 @8.2.9.2.3 Clear Event Records
> */
> void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
> {
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index da64ba0f156b..28a114c7cf69 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -258,6 +258,7 @@ enum cxl_opcode {
> CXL_MBOX_OP_INVALID = 0x0000,
> CXL_MBOX_OP_RAW = CXL_MBOX_OP_INVALID,
> CXL_MBOX_OP_GET_EVENT_RECORD = 0x0100,
> + CXL_MBOX_OP_CLEAR_EVENT_RECORD = 0x0101,
> CXL_MBOX_OP_GET_FW_INFO = 0x0200,
> CXL_MBOX_OP_ACTIVATE_FW = 0x0202,
> CXL_MBOX_OP_GET_SUPPORTED_LOGS = 0x0400,
> @@ -397,6 +398,20 @@ static inline const char *cxl_event_log_type_str(enum cxl_event_log_type type)
> return "<unknown>";
> }
>
> +/*
> + * Clear Event Records input payload
> + * CXL rev 3.0 section 8.2.9.2.3; Table 8-51
> + *
> + * Space given for 1 record
> + */
> +struct cxl_mbox_clear_event_payload {
> + u8 event_log; /* enum cxl_event_log_type */
> + u8 clear_flags;
> + u8 nr_recs; /* 1 for this struct */
> + u8 reserved[3];
> + __le16 handle[CXL_GET_EVENT_NR_RECORDS];
> +};
> +
> struct cxl_mbox_get_partition_info {
> __le64 active_volatile_cap;
> __le64 active_persistent_cap;
> diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> index 70459be5bdd4..7c1ad8062792 100644
> --- a/include/uapi/linux/cxl_mem.h
> +++ b/include/uapi/linux/cxl_mem.h
> @@ -25,6 +25,7 @@
> ___C(RAW, "Raw device command"), \
> ___C(GET_SUPPORTED_LOGS, "Get Supported Logs"), \
> ___C(GET_EVENT_RECORD, "Get Event Record"), \
> + ___C(CLEAR_EVENT_RECORD, "Clear Event Record"), \
> ___C(GET_FW_INFO, "Get FW Info"), \
> ___C(GET_PARTITION_INFO, "Get Partition Information"), \
> ___C(GET_LSA, "Get Label Storage Area"), \
On Thu, 10 Nov 2022 10:57:50 -0800
[email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> CXL rev 3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
> command. After an event record is read it needs to be cleared from the
> event log.
>
> Implement cxl_clear_event_record() and call it for each record retrieved
> from the device.
>
> Each record is cleared individually. A clear all bit is specified but
> events could arrive between a get and the final clear all operation.
> Therefore each event is cleared specifically.
>
> Reviewed-by: Jonathan Cameron <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>
>
Some follow through comment updates needed from changes in earlier patches +
one comment you can ignore if you prefer to keep it as is.
> static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> enum cxl_event_log_type type)
> {
> @@ -728,14 +750,23 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> }
>
> pl_nr = le16_to_cpu(payload.record_count);
> - if (trace_cxl_generic_event_enabled()) {
To simplify this patch, maybe push this check down in the previous patch so this
one doesn't move code around? It'll look a tiny bit odd there of course..
> + if (pl_nr > 0) {
> u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
> int i;
>
> - for (i = 0; i < nr_rec; i++)
> - trace_cxl_generic_event(dev_name(cxlds->dev),
> - type,
> - &payload.record[i]);
> + if (trace_cxl_generic_event_enabled()) {
> + for (i = 0; i < nr_rec; i++)
> + trace_cxl_generic_event(dev_name(cxlds->dev),
> + type,
> + &payload.record[i]);
> + }
> +
> + rc = cxl_clear_event_record(cxlds, type, &payload, nr_rec);
> + if (rc) {
> + dev_err(cxlds->dev, "Event log '%s': Failed to clear events : %d",
> + cxl_event_log_type_str(type), rc);
> + return;
> + }
> }
>
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index da64ba0f156b..28a114c7cf69 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
>
> +/*
> + * Clear Event Records input payload
> + * CXL rev 3.0 section 8.2.9.2.3; Table 8-51
> + *
> + * Space given for 1 record
Nope...
> + */
> +struct cxl_mbox_clear_event_payload {
> + u8 event_log; /* enum cxl_event_log_type */
> + u8 clear_flags;
> + u8 nr_recs; /* 1 for this struct */
Nope :) Delete the comments so they can't be wrong if this changes in future!
> + u8 reserved[3];
> + __le16 handle[CXL_GET_EVENT_NR_RECORDS];
> +};
> +
On Thu, 10 Nov 2022 10:57:49 -0800
[email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> CXL devices have multiple event logs which can be queried for CXL event
> records. Devices are required to support the storage of at least one
> event record in each event log type.
>
> Devices track event log overflow by incrementing a counter and tracking
> the time of the first and last overflow event seen.
>
> Software queries events via the Get Event Record mailbox command; CXL
> rev 3.0 section 8.2.9.2.2.
>
> Issue the Get Event Record mailbox command on driver load. Trace each
> record found with a generic record trace. Trace any overflow
> conditions.
>
> The device can return up to 1MB worth of event records per query. This
> presents complications with allocating a huge buffers to potentially
> capture all the records. It is not anticipated that these event logs
> will be very deep and reading them does not need to be performant.
> Process only 3 records at a time. 3 records was chosen as it fits
> comfortably on the stack to prevent dynamic allocation while still
> cutting down on extra mailbox messages.
>
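A quick back-of-envelope check of the stack-budget argument above. The 128-byte figure is the fixed CXL event record size; the 32-byte payload header is an assumption for illustration, not a value taken from the patch:

```c
/* Assumed sizes for illustration only. */
#define CXL_EVENT_RECORD_SIZE    128  /* fixed CXL event record size */
#define GET_EVENT_HDR_SIZE        32  /* assumed Get Event Records header */
#define CXL_GET_EVENT_NR_RECORDS   3

/* Payload footprint for a given number of records. */
static unsigned int get_event_payload_size(unsigned int nr_records)
{
	return GET_EVENT_HDR_SIZE + nr_records * CXL_EVENT_RECORD_SIZE;
}
```

Three records come to roughly 416 bytes, comfortable on a kernel stack, whereas buffering the up-to-1MB a device may report clearly is not.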
> This patch traces a raw event record only and leaves the specific event
> record types to subsequent patches.
>
> Macros are created to use for tracing the common CXL Event header
> fields.
>
> Cc: Steven Rostedt <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>
Hi Ira,
A question inline about whether some of the conditions you are checking
for can actually happen. Otherwise looks good to me.
Jonathan
>
> ---
> Change from RFC v2:
> Support reading 3 events at once.
> Reverse Jonathan's suggestion and check for positive number of
> records. Because the record count may have been
> returned as something > 3 based on what the device
> thinks it can send back even though the core Linux mbox
> processing truncates the data.
> Alison and Dave Jiang
> Change header uuid type to uuid_t for better user space
> processing
> Smita
> Check status reg before reading log.
> Steven
> Prefix all trace points with 'cxl_'
> Use static branch <trace>_enabled() calls
> Jonathan
> s/CXL_EVENT_TYPE_INFO/0
> s/{first,last}/{first,last}_ts
> Remove Reserved field from header
> Fix header issue for cxl_event_log_type_str()
>
> Change from RFC:
> Remove redundant error message in get event records loop
> s/EVENT_RECORD_DATA_LENGTH/CXL_EVENT_RECORD_DATA_LENGTH
> Use hdr_uuid for the header UUID field
> Use cxl_event_log_type_str() for the trace events
> Create macros for the header fields and common entries of each event
> Add reserved buffer output dump
> Report error if event query fails
> Remove unused record_cnt variable
> Steven - reorder overflow record
> Remove NOTE about checkpatch
> Jonathan
> check for exactly 1 record
> s/v3.0/rev 3.0
> Use 3 byte fields for 24bit fields
> Add 3.0 Maintenance Operation Class
> Add Dynamic Capacity log type
> Fix spelling
> Dave Jiang/Dan/Alison
> s/cxl-event/cxl
> trace/events/cxl-events => trace/events/cxl.h
> s/cxl_event_overflow/overflow
> s/cxl_event/generic_event
> ---
> MAINTAINERS | 1 +
> drivers/cxl/core/mbox.c | 70 +++++++++++++++++++
> drivers/cxl/cxl.h | 8 +++
> drivers/cxl/cxlmem.h | 73 ++++++++++++++++++++
> include/trace/events/cxl.h | 127 +++++++++++++++++++++++++++++++++++
> include/uapi/linux/cxl_mem.h | 1 +
> 6 files changed, 280 insertions(+)
> create mode 100644 include/trace/events/cxl.h
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 16176b9278b4..a908b95a7de4 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> +static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> + enum cxl_event_log_type type)
> +{
> + struct cxl_get_event_payload payload;
> + u16 pl_nr;
> +
> + do {
> + u8 log_type = type;
> + int rc;
> +
> + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
> + &log_type, sizeof(log_type),
> + &payload, sizeof(payload));
> + if (rc) {
> + dev_err(cxlds->dev, "Event log '%s': Failed to query event records : %d",
> + cxl_event_log_type_str(type), rc);
> + return;
> + }
> +
> + pl_nr = le16_to_cpu(payload.record_count);
> + if (trace_cxl_generic_event_enabled()) {
> + u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
Either I'm misreading the spec, or it can't be greater than NR_RECORDS.
"The number of event records in the Event Records list...."
Event Records is the field inside this payload, and it is not big enough to
hold more than CXL_GET_EVENT_NR_RECORDS; the intro to Get Event Records also
says the number is restricted by the mailbox output payload provided.
I'm in favor of defense against broken hardware, but don't paper over any
such error - scream about it.
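A minimal sketch of what "scream about it" could look like, as a standalone helper (the name and the fprintf stand in for a dev_warn in the real driver; both are hypothetical):

```c
#include <stdio.h>
#include <stdint.h>

#define CXL_GET_EVENT_NR_RECORDS 3

/* Clamp a device-reported record count to what the payload can hold,
 * but warn loudly: a count larger than the payload capacity indicates
 * broken hardware, not a normal condition to silently paper over. */
static uint16_t clamp_record_count(uint16_t pl_nr)
{
	if (pl_nr > CXL_GET_EVENT_NR_RECORDS) {
		fprintf(stderr,
			"device reported %u records; payload holds only %u\n",
			pl_nr, CXL_GET_EVENT_NR_RECORDS);
		return CXL_GET_EVENT_NR_RECORDS;
	}
	return pl_nr;
}
```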
> + int i;
> +
> + for (i = 0; i < nr_rec; i++)
> + trace_cxl_generic_event(dev_name(cxlds->dev),
> + type,
> + &payload.record[i]);
> + }
> +
> + if (trace_cxl_overflow_enabled() &&
> + (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW))
> + trace_cxl_overflow(dev_name(cxlds->dev), type, &payload);
> +
> + } while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||
Isn't pl_nr > CXL_GET_EVENT_NR_RECORDS a hardware bug? It's the number in the
returned payload, not the total number.
> + payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> +}
> diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> new file mode 100644
> index 000000000000..60dec9a84918
> --- /dev/null
> +++ b/include/trace/events/cxl.h
> @@ -0,0 +1,127 @@
> +#define CXL_EVT_TP_fast_assign(dname, l, hdr) \
> + __assign_str(dev_name, (dname)); \
> + __entry->log = (l); \
> + memcpy(&__entry->hdr_uuid, &(hdr).id, sizeof(uuid_t)); \
> + __entry->hdr_length = (hdr).length; \
> + __entry->hdr_flags = get_unaligned_le24((hdr).flags); \
> + __entry->hdr_handle = le16_to_cpu((hdr).handle); \
> + __entry->hdr_related_handle = le16_to_cpu((hdr).related_handle); \
> + __entry->hdr_timestamp = le64_to_cpu((hdr).timestamp); \
> + __entry->hdr_maint_op_class = (hdr).maint_op_class
> +
Trivial: Maybe one blank line is enough?
> +
> +#define CXL_EVT_TP_printk(fmt, ...) \
> + TP_printk("%s log=%s : time=%llu uuid=%pUb len=%d flags='%s' " \
> + "handle=%x related_handle=%x maint_op_class=%u" \
> + " : " fmt, \
> + __get_str(dev_name), cxl_event_log_type_str(__entry->log), \
> + __entry->hdr_timestamp, &__entry->hdr_uuid, __entry->hdr_length,\
> + show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle, \
> + __entry->hdr_related_handle, __entry->hdr_maint_op_class, \
> + ##__VA_ARGS__)
On Thu, 10 Nov 2022 10:57:55 -0800
[email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> CXL device events are signaled via interrupts. Each event log may have
> a different interrupt message number. These message numbers are
> reported in the Get Event Interrupt Policy mailbox command.
>
> Add interrupt support for event logs. Interrupts are allocated as
> shared interrupts. Therefore, all or some event logs can share the same
> message number.
>
> The driver must deal with the possibility that dynamic capacity is not
> yet supported by a device it sees. Fallback and retry without dynamic
> capacity if the first attempt fails.
>
> Device capacity event logs interrupt as part of the informational event
> log. Check the event status to see which log has data.
>
> Signed-off-by: Ira Weiny <[email protected]>
>
Hi Ira,
A few comments inline.
Thanks,
Jonathan
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 879b228a98a0..1e6762af2a00 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> /**
> * cxl_mem_get_event_records - Get Event Records from the device
> @@ -867,6 +870,52 @@ void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
>
> +int cxl_event_config_msgnums(struct cxl_dev_state *cxlds)
> +{
> + struct cxl_event_interrupt_policy *policy = &cxlds->evt_int_policy;
> + size_t policy_size = sizeof(*policy);
> + bool retry = true;
> + int rc;
> +
> + policy->info_settings = CXL_INT_MSI_MSIX;
> + policy->warn_settings = CXL_INT_MSI_MSIX;
> + policy->failure_settings = CXL_INT_MSI_MSIX;
> + policy->fatal_settings = CXL_INT_MSI_MSIX;
> + policy->dyn_cap_settings = CXL_INT_MSI_MSIX;
> +
> +again:
> + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_SET_EVT_INT_POLICY,
> + policy, policy_size, NULL, 0);
> + if (rc < 0) {
> + /*
> + * If the device does not support dynamic capacity it may fail
> + * the command due to an invalid payload. Retry without
> + * dynamic capacity.
> + */
There are a number of ways to discover if DCD is supported that aren't based
on try and retry like this. 9.13.3 has "basic sequence to utilize Dynamic Capacity"
That calls out:
Verify the necessary Dynamic Capacity commands are returned in the CEL.
First, I'm not sure we should set the interrupt on for DCD until we have a lot
more of the flow handled; second, even then we should figure out whether it is
supported at a higher level than this command and pass that info down here.
> + if (retry) {
> + retry = false;
> + policy->dyn_cap_settings = 0;
> + policy_size = sizeof(*policy) - sizeof(policy->dyn_cap_settings);
> + goto again;
> + }
> + dev_err(cxlds->dev, "Failed to set event interrupt policy : %d",
> + rc);
> + memset(policy, CXL_INT_NONE, sizeof(*policy));
Relying on all the fields being 1 byte is a bit error-prone. I'd just set them
all individually in the interests of more readable code.
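What setting each field individually would look like, reduced to a standalone sketch (the struct here mirrors the one in the quoted patch with plain uint8_t fields; function name is illustrative):

```c
#include <stdint.h>

#define CXL_INT_NONE 0

struct cxl_event_interrupt_policy {
	uint8_t info_settings;
	uint8_t warn_settings;
	uint8_t failure_settings;
	uint8_t fatal_settings;
	uint8_t dyn_cap_settings;
};

/* Assign each member explicitly instead of memset(): stays correct even
 * if a field ever grows beyond one byte, and reads unambiguously. */
static void cxl_event_int_policy_clear(struct cxl_event_interrupt_policy *p)
{
	p->info_settings = CXL_INT_NONE;
	p->warn_settings = CXL_INT_NONE;
	p->failure_settings = CXL_INT_NONE;
	p->fatal_settings = CXL_INT_NONE;
	p->dyn_cap_settings = CXL_INT_NONE;
}
```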
> + return rc;
> + }
> +
> + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVT_INT_POLICY, NULL, 0,
> + policy, policy_size);
Add a comment on why you are reading this back (to get the msgnums in the upper
bits) as it's not obvious to a casual reader.
> + if (rc < 0) {
> + dev_err(cxlds->dev, "Failed to get event interrupt policy : %d",
> + rc);
> + return rc;
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_event_config_msgnums, CXL);
> +
...
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index e0d511575b45..64b2e2671043 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -458,6 +458,138 @@ static void cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
> cxlds->nr_irq_vecs = nvecs;
> }
>
> +struct cxl_event_irq_id {
> + struct cxl_dev_state *cxlds;
> + u32 status;
> + unsigned int msgnum;
msgnum is only here for freeing the interrupt - I'd rather we fixed
that by using standard infrastructure (or adding some - see below).
status is an indirect way of allowing us to share an interrupt handler.
You could do that by registering a trivial wrapper for each instead.
Then all you have left is the cxl_dev_state which could be passed
in directly as the callback parameter removing need to have this
structure at all. I think that might be neater.
> +};
> +
> +static irqreturn_t cxl_event_int_thread(int irq, void *id)
> +{
> + struct cxl_event_irq_id *cxlid = id;
> + struct cxl_dev_state *cxlds = cxlid->cxlds;
> +
> + if (cxlid->status & CXLDEV_EVENT_STATUS_INFO)
> + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
> + if (cxlid->status & CXLDEV_EVENT_STATUS_WARN)
> + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
> + if (cxlid->status & CXLDEV_EVENT_STATUS_FAIL)
> + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
> + if (cxlid->status & CXLDEV_EVENT_STATUS_FATAL)
> + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
> + if (cxlid->status & CXLDEV_EVENT_STATUS_DYNAMIC_CAP)
> + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_DYNAMIC_CAP);
> +
> + return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t cxl_event_int_handler(int irq, void *id)
> +{
> + struct cxl_event_irq_id *cxlid = id;
> + struct cxl_dev_state *cxlds = cxlid->cxlds;
> + u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
> +
> + if (cxlid->status & status)
> + return IRQ_WAKE_THREAD;
> + return IRQ_HANDLED;
If status is not set, return IRQ_NONE.
Ah, I see Dave raised this as well.
> +}
...
> +static int cxl_request_event_irq(struct cxl_dev_state *cxlds,
> + enum cxl_event_log_type log_type,
> + u8 setting)
> +{
> + struct device *dev = cxlds->dev;
> + struct pci_dev *pdev = to_pci_dev(dev);
> + struct cxl_event_irq_id *id;
> + unsigned int msgnum = CXL_EVENT_INT_MSGNUM(setting);
> + int irq;
> +
> + /* Disabled irq is not an error */
> + if (!cxl_evt_int_is_msi(setting) || msgnum > cxlds->nr_irq_vecs) {
I don't think that second condition can occur. The language under table 8-52
(I think) means that it will move around if there aren't enough vectors
(for MSI - MSI-X is more complex, but result the same).
> + dev_dbg(dev, "Event interrupt not enabled; %s %u %d\n",
> + cxl_event_log_type_str(CXL_EVENT_TYPE_INFO),
> + msgnum, cxlds->nr_irq_vecs);
> + return 0;
> + }
> +
> + id = devm_kzalloc(dev, sizeof(*id), GFP_KERNEL);
> + if (!id)
> + return -ENOMEM;
> +
> + id->cxlds = cxlds;
> + id->msgnum = msgnum;
> + id->status = log_type_to_status(log_type);
> +
> + irq = pci_request_irq(pdev, id->msgnum, cxl_event_int_handler,
> + cxl_event_int_thread, id,
> + "%s:event-log-%s", dev_name(dev),
> + cxl_event_log_type_str(log_type));
> + if (irq)
> + return irq;
> +
> + devm_add_action_or_reset(dev, cxl_free_event_irq, id);
Hmm, there is no pcim_request_irq(); maybe this is the time to propose one
(separate from this patch so we don't get delayed by that!)
We discussed this way back in DOE series (I'd forgotten but lore found
it for me). There I suggested just calling
devm_request_threaded_irq() directly as a work around.
> + return 0;
> +}
> +
> +static void cxl_event_irqsetup(struct cxl_dev_state *cxlds)
> +{
> + struct device *dev = cxlds->dev;
> + u8 setting;
> +
> + if (cxl_event_config_msgnums(cxlds))
> + return;
> +
> + /*
> + * Dynamic Capacity shares the info message number
> + * Nothing to be done except check the status bit in the
> + * irq thread.
> + */
> + setting = cxlds->evt_int_policy.info_settings;
> + if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_INFO, setting))
> + dev_err(dev, "Failed to get interrupt for %s event log\n",
> + cxl_event_log_type_str(CXL_EVENT_TYPE_INFO));
> +
> + setting = cxlds->evt_int_policy.warn_settings;
> + if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_WARN, setting))
> + dev_err(dev, "Failed to get interrupt for %s event log\n",
> + cxl_event_log_type_str(CXL_EVENT_TYPE_WARN));
> +
> + setting = cxlds->evt_int_policy.failure_settings;
> + if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_FAIL, setting))
> + dev_err(dev, "Failed to get interrupt for %s event log\n",
> + cxl_event_log_type_str(CXL_EVENT_TYPE_FAIL));
> +
> + setting = cxlds->evt_int_policy.fatal_settings;
> + if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_FATAL, setting))
> + dev_err(dev, "Failed to get interrupt for %s event log\n",
> + cxl_event_log_type_str(CXL_EVENT_TYPE_FATAL));
> +}
On Thu, 10 Nov 2022 10:57:54 -0800
[email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> CXL rev 3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.
>
> Determine if the event read is memory module record and if so trace the
> record.
>
> Signed-off-by: Ira Weiny <[email protected]>
>
Noticed that we have a mixture of fully capitalized and not for flags.
With that either explained or tidied up:
Reviewed-by: Jonathan Cameron <[email protected]>
> +/*
> + * Device Health Information - DHI
> + *
> + * CXL res 3.0 section 8.2.9.8.3.1; Table 8-100
> + */
> +#define CXL_DHI_HS_MAINTENANCE_NEEDED BIT(0)
> +#define CXL_DHI_HS_PERFORMANCE_DEGRADED BIT(1)
> +#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED BIT(2)
> +#define show_health_status_flags(flags) __print_flags(flags, "|", \
> + { CXL_DHI_HS_MAINTENANCE_NEEDED, "Maintenance Needed" }, \
> + { CXL_DHI_HS_PERFORMANCE_DEGRADED, "Performance Degraded" }, \
> + { CXL_DHI_HS_HW_REPLACEMENT_NEEDED, "Replacement Needed" } \
Why are we sometimes using capitals for flags (e.g. patch 5) and not other times?
> +)
On Thu, 10 Nov 2022 10:57:48 -0800
[email protected] wrote:
> From: Davidlohr Bueso <[email protected]>
>
> Currently the only CXL features targeted for irq support require their
> message numbers to be within the first 16 entries. The device may,
> however, support fewer than 16 entries depending on the support it
> provides.
>
> Attempt to allocate these 16 irq vectors. If the device supports fewer,
> the PCI infrastructure will allocate that number. Store the number
> of vectors actually allocated in the device state for later use
> by individual functions.
See later patch review, but I don't think we need to store the number
allocated because any vector is guaranteed to be below that point
(QEMU code is wrong on this at the moment, but there are very few vectors
so it hasn't mattered yet).
Otherwise, the pcim functions deal with some of the cleanup you are doing again
here for us, so this can be simplified somewhat. See inline.
Jonathan
>
> Upon successful allocation, users can plug in their respective isr at
> any point thereafter, for example, if the irq setup is not done in the
> PCI driver, such as the case of the CXL-PMU.
>
> Cc: Bjorn Helgaas <[email protected]>
> Cc: Jonathan Cameron <[email protected]>
> Co-developed-by: Ira Weiny <[email protected]>
> Signed-off-by: Ira Weiny <[email protected]>
> Signed-off-by: Davidlohr Bueso <[email protected]>
>
> ---
> Changes from Ira
> Remove reviews
> Allocate up to a static 16 vectors.
> Change cover letter
> ---
> drivers/cxl/cxlmem.h | 3 +++
> drivers/cxl/cxlpci.h | 6 ++++++
> drivers/cxl/pci.c | 32 ++++++++++++++++++++++++++++++++
> 3 files changed, 41 insertions(+)
>
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 88e3a8e54b6a..b7b955ded3ac 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -211,6 +211,7 @@ struct cxl_endpoint_dvsec_info {
> * @info: Cached DVSEC information about the device.
> * @serial: PCIe Device Serial Number
> * @doe_mbs: PCI DOE mailbox array
> + * @nr_irq_vecs: Number of MSI-X/MSI vectors available
> * @mbox_send: @dev specific transport for transmitting mailbox commands
> *
> * See section 8.2.9.5.2 Capacity Configuration and Label Storage for
> @@ -247,6 +248,8 @@ struct cxl_dev_state {
>
> struct xarray doe_mbs;
>
> + int nr_irq_vecs;
> +
> int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
> };
>
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index eec597dbe763..b7f4e2f417d3 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -53,6 +53,12 @@
> #define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
> #define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)
>
> +/*
> + * NOTE: Currently all the functions which are enabled for CXL require their
> + * vectors to be in the first 16. Use this as the max.
> + */
> +#define CXL_PCI_REQUIRED_VECTORS 16
> +
> /* Register Block Identifier (RBI) */
> enum cxl_regloc_type {
> CXL_REGLOC_RBI_EMPTY = 0,
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index faeb5d9d7a7a..62e560063e50 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -428,6 +428,36 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> }
> }
>
> +static void cxl_pci_free_irq_vectors(void *data)
> +{
> + pci_free_irq_vectors(data);
> +}
> +
> +static void cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
> +{
> + struct device *dev = cxlds->dev;
> + struct pci_dev *pdev = to_pci_dev(dev);
> + int nvecs;
> + int rc;
> +
> + nvecs = pci_alloc_irq_vectors(pdev, 1, CXL_PCI_REQUIRED_VECTORS,
> + PCI_IRQ_MSIX | PCI_IRQ_MSI);
> + if (nvecs < 0) {
> + dev_dbg(dev, "Not enough interrupts; use polling instead.\n");
> + return;
> + }
> +
> + rc = devm_add_action_or_reset(dev, cxl_pci_free_irq_vectors, pdev);
The pci managed code always gives me a headache because there is a lot of magic
under the hood if you ever called pcim_enable_device() which we did.
Chasing through
pci_alloc_irq_vectors_affinity()->
either
__pci_enable_msix_range()
or
__pci_enable_msi_range()
they are similar
pci_setup_msi_context()
pci_setup_msi_release()
adds a pcim_msi_release devm action,
and that frees the vectors for us.
So we don't need to do it here.
> + if (rc) {
> + dev_dbg(dev, "Device managed call failed; interrupts disabled.\n");
> + /* some got allocated, clean them up */
> + cxl_pci_free_irq_vectors(pdev);
We could just leave them lying around for devm cleanup to sweep up eventually
or free them as you have done here.
> + return;
> + }
> +
> + cxlds->nr_irq_vecs = nvecs;
> +}
> +
> static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> {
> struct cxl_register_map map;
> @@ -494,6 +524,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> if (rc)
> return rc;
>
> + cxl_pci_alloc_irq_vectors(cxlds);
> +
> cxlmd = devm_cxl_add_memdev(cxlds);
> if (IS_ERR(cxlmd))
> return PTR_ERR(cxlmd);
On Wed, 16 Nov 2022 15:24:26 +0000
Jonathan Cameron <[email protected]> wrote:
> On Thu, 10 Nov 2022 10:57:50 -0800
> [email protected] wrote:
>
> > From: Ira Weiny <[email protected]>
> >
> > CXL rev 3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
> > command. After an event record is read it needs to be cleared from the
> > event log.
> >
> > Implement cxl_clear_event_record() and call it for each record retrieved
> > from the device.
> >
> > Each record is cleared individually. A clear all bit is specified but
> > events could arrive between a get and the final clear all operation.
> > Therefore each event is cleared specifically.
> >
> > Reviewed-by: Jonathan Cameron <[email protected]>
> > Signed-off-by: Ira Weiny <[email protected]>
> >
> Some follow through comment updates needed from changes in earlier patches +
> one comment you can ignore if you prefer to keep it as is.
>
> > static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> > enum cxl_event_log_type type)
> > {
> > @@ -728,14 +750,23 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> > }
> >
> > pl_nr = le16_to_cpu(payload.record_count);
> > - if (trace_cxl_generic_event_enabled()) {
>
> To simplify this patch, maybe push this check down in the previous patch so this
> one doesn't move code around? It'll look a tiny bit odd there of course..
>
> > + if (pl_nr > 0) {
> > u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
> > int i;
> >
> > - for (i = 0; i < nr_rec; i++)
> > - trace_cxl_generic_event(dev_name(cxlds->dev),
> > - type,
> > - &payload.record[i]);
> > + if (trace_cxl_generic_event_enabled()) {
> > + for (i = 0; i < nr_rec; i++)
> > + trace_cxl_generic_event(dev_name(cxlds->dev),
> > + type,
> > + &payload.record[i]);
> > + }
> > +
> > + rc = cxl_clear_event_record(cxlds, type, &payload, nr_rec);
> > + if (rc) {
> > + dev_err(cxlds->dev, "Event log '%s': Failed to clear events : %d",
> > + cxl_event_log_type_str(type), rc);
> > + return;
> > + }
> > }
> >
>
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index da64ba0f156b..28a114c7cf69 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
>
> >
> > +/*
> > + * Clear Event Records input payload
> > + * CXL rev 3.0 section 8.2.9.2.3; Table 8-51
> > + *
> > + * Space given for 1 record
>
> Nope...
>
>
> > + */
> > +struct cxl_mbox_clear_event_payload {
> > + u8 event_log; /* enum cxl_event_log_type */
> > + u8 clear_flags;
> > + u8 nr_recs; /* 1 for this struct */
> Nope :) Delete the comments so they can't be wrong if this changes in future!
Ah. You only use one. So should hard code that in the array size below.
>
> > + u8 reserved[3];
> > + __le16 handle[CXL_GET_EVENT_NR_RECORDS];
> > +};
> > +
>
On Thu, 10 Nov 2022 10:57:52 -0800
[email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> CXL rev 3.0 section 8.2.9.2.1.1 defines the General Media Event Record.
>
> Determine if the event read is a general media record and if so trace
> the record as a General Media Event Record.
>
> Signed-off-by: Ira Weiny <[email protected]>
>
A few v2.0 references left in here that should be updated given it's new code.
With those tidied up
Reviewed-by: Jonathan Cameron <[email protected]>
> diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> index 60dec9a84918..a0c20e110708 100644
> --- a/include/trace/events/cxl.h
> +++ b/include/trace/events/cxl.h
> @@ -119,6 +119,130 @@ TRACE_EVENT(cxl_generic_event,
> __print_hex(__entry->data, CXL_EVENT_RECORD_DATA_LENGTH))
> );
>
> +/*
> + * Physical Address field masks
> + *
> + * General Media Event Record
> + * CXL v2.0 Section 8.2.9.1.1.1; Table 154
Update to CXL rev 3.0 as I think we are preferring latest
spec references on any new code.
> + *
> + * DRAM Event Record
> + * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
> + */
> +
> +/*
> + * General Media Event Record - GMER
> + * CXL v2.0 Section 8.2.9.1.1.1; Table 154
Update ref to r3.0
Never v or the spec folk will get irritable :)
> + */
On Thu, 10 Nov 2022 10:57:58 -0800
[email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> Log overflow is marked by a separate trace message.
>
> Simulate a log with lots of messages and flag overflow until it is
> drained a bit.
>
> Signed-off-by: Ira Weiny <[email protected]>
Looks fine to me
Reviewed-by: Jonathan Cameron <[email protected]>
>
> ---
> Changes from RFC
> Adjust for new struct changes
> ---
> tools/testing/cxl/test/events.c | 37 +++++++++++++++++++++++++++++++++
> 1 file changed, 37 insertions(+)
>
> diff --git a/tools/testing/cxl/test/events.c b/tools/testing/cxl/test/events.c
> index 8693f3fb9cbb..5ce257114f4e 100644
> --- a/tools/testing/cxl/test/events.c
> +++ b/tools/testing/cxl/test/events.c
> @@ -69,11 +69,21 @@ static void event_store_add_event(struct mock_event_store *mes,
> log->nr_events++;
> }
>
> +static u16 log_overflow(struct mock_event_log *log)
> +{
> + int cnt = log_rec_left(log) - 5;
> +
> + if (cnt < 0)
> + return 0;
> + return cnt;
> +}
> +
> int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
> {
> struct cxl_get_event_payload *pl;
> struct mock_event_log *log;
> u8 log_type;
> + u16 nr_overflow;
>
> /* Valid request? */
> if (cmd->size_in != sizeof(log_type))
> @@ -95,6 +105,20 @@ int mock_get_event(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd)
> if (log_rec_left(log) > 1)
> pl->flags |= CXL_GET_EVENT_FLAG_MORE_RECORDS;
>
> + nr_overflow = log_overflow(log);
> + if (nr_overflow) {
> + u64 ns;
> +
> + pl->flags |= CXL_GET_EVENT_FLAG_OVERFLOW;
> + pl->overflow_err_count = cpu_to_le16(nr_overflow);
> + ns = ktime_get_real_ns();
> + ns -= 5000000000; /* 5s ago */
> + pl->first_overflow_timestamp = cpu_to_le64(ns);
> + ns = ktime_get_real_ns();
> + ns -= 1000000000; /* 1s ago */
> + pl->last_overflow_timestamp = cpu_to_le64(ns);
> + }
> +
> memcpy(&pl->record[0], get_cur_event(log), sizeof(pl->record[0]));
> pl->record[0].hdr.handle = get_cur_event_handle(log);
> return 0;
> @@ -274,6 +298,19 @@ u32 cxl_mock_add_event_logs(struct cxl_dev_state *cxlds)
> (struct cxl_event_record_raw *)&mem_module);
> mes->ev_status |= CXLDEV_EVENT_STATUS_INFO;
>
> + event_store_add_event(mes, CXL_EVENT_TYPE_FAIL, &maint_needed);
> + event_store_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace);
> + event_store_add_event(mes, CXL_EVENT_TYPE_FAIL,
> + (struct cxl_event_record_raw *)&dram);
> + event_store_add_event(mes, CXL_EVENT_TYPE_FAIL,
> + (struct cxl_event_record_raw *)&gen_media);
> + event_store_add_event(mes, CXL_EVENT_TYPE_FAIL,
> + (struct cxl_event_record_raw *)&mem_module);
> + event_store_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace);
> + event_store_add_event(mes, CXL_EVENT_TYPE_FAIL,
> + (struct cxl_event_record_raw *)&dram);
> + mes->ev_status |= CXLDEV_EVENT_STATUS_FAIL;
> +
> event_store_add_event(mes, CXL_EVENT_TYPE_FATAL, &hardware_replace);
> event_store_add_event(mes, CXL_EVENT_TYPE_FATAL,
> (struct cxl_event_record_raw *)&dram);
On Thu, 10 Nov 2022 10:57:57 -0800
[email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> Each type of event has different trace point outputs.
>
> Add mock General Media Event, DRAM event, and Memory Module Event
> records to the mock list of events returned.
>
> Signed-off-by: Ira Weiny <[email protected]>
A few trivial things inline. Otherwise
Reviewed-by: Jonathan Cameron <[email protected]>
>
> ---
> Changes from RFC:
> Adjust for struct changes
> adjust for unaligned fields
> ---
> tools/testing/cxl/test/events.c | 70 +++++++++++++++++++++++++++++++++
> 1 file changed, 70 insertions(+)
>
> diff --git a/tools/testing/cxl/test/events.c b/tools/testing/cxl/test/events.c
> index a4816f230bb5..8693f3fb9cbb 100644
> --- a/tools/testing/cxl/test/events.c
> +++ b/tools/testing/cxl/test/events.c
> @@ -186,6 +186,70 @@ struct cxl_event_record_raw hardware_replace = {
> .data = { 0xDE, 0xAD, 0xBE, 0xEF },
> };
>
> +struct cxl_event_gen_media gen_media = {
> + .hdr = {
> + .id = UUID_INIT(0xfbcd0a77, 0xc260, 0x417f,
> + 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6),
> + .length = sizeof(struct cxl_event_gen_media),
> + .flags[0] = CXL_EVENT_RECORD_FLAG_PERMANENT,
> + /* .handle = Set dynamically */
> + .related_handle = cpu_to_le16(0),
> + },
> + .phys_addr = cpu_to_le64(0x2000),
> + .descriptor = CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT,
> + .type = CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,
> + .transaction_type = CXL_GMER_TRANS_HOST_WRITE,
> + .validity_flags = { CXL_GMER_VALID_CHANNEL |
> + CXL_GMER_VALID_RANK, 0 },
put_unaligned_le16()
> + .channel = 1,
> + .rank = 30
> +};
> +
> +struct cxl_event_dram dram = {
> + .hdr = {
> + .id = UUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,
> + 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24),
> + .length = sizeof(struct cxl_event_dram),
> + .flags[0] = CXL_EVENT_RECORD_FLAG_PERF_DEGRADED,
> + /* .handle = Set dynamically */
> + .related_handle = cpu_to_le16(0),
> + },
> + .phys_addr = cpu_to_le64(0x8000),
> + .descriptor = CXL_GMER_EVT_DESC_THRESHOLD_EVENT,
> + .type = CXL_GMER_MEM_EVT_TYPE_INV_ADDR,
> + .transaction_type = CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,
> + .validity_flags = { CXL_DER_VALID_CHANNEL |
> + CXL_DER_VALID_BANK_GROUP |
> + CXL_DER_VALID_BANK |
> + CXL_DER_VALID_COLUMN, 0 },
put_unaligned_le16() etc
> + .channel = 1,
> + .bank_group = 5,
> + .bank = 2,
> + .column = { 0xDE, 0xAD},
spacing
> +};
> +
> +struct cxl_event_mem_module mem_module = {
> + .hdr = {
> + .id = UUID_INIT(0xfe927475, 0xdd59, 0x4339,
> + 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74),
> + .length = sizeof(struct cxl_event_mem_module),
> + /* .handle = Set dynamically */
> + .related_handle = cpu_to_le16(0),
> + },
> + .event_type = CXL_MMER_TEMP_CHANGE,
> + .info = {
> + .health_status = CXL_DHI_HS_PERFORMANCE_DEGRADED,
> + .media_status = CXL_DHI_MS_ALL_DATA_LOST,
> + .add_status = (CXL_DHI_AS_CRITICAL << 2) |
> + (CXL_DHI_AS_WARNING << 4) |
> + (CXL_DHI_AS_WARNING << 5),
> + .device_temp = { 0xDE, 0xAD},
> + .dirty_shutdown_cnt = { 0xde, 0xad, 0xbe, 0xef },
> + .cor_vol_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
> + .cor_per_err_cnt = { 0xde, 0xad, 0xbe, 0xef },
> + }
> +};
> +
> u32 cxl_mock_add_event_logs(struct cxl_dev_state *cxlds)
> {
> struct device *dev = cxlds->dev;
> @@ -204,9 +268,15 @@ u32 cxl_mock_add_event_logs(struct cxl_dev_state *cxlds)
> }
>
> event_store_add_event(mes, CXL_EVENT_TYPE_INFO, &maint_needed);
> + event_store_add_event(mes, CXL_EVENT_TYPE_INFO,
> + (struct cxl_event_record_raw *)&gen_media);
> + event_store_add_event(mes, CXL_EVENT_TYPE_INFO,
> + (struct cxl_event_record_raw *)&mem_module);
> mes->ev_status |= CXLDEV_EVENT_STATUS_INFO;
>
> event_store_add_event(mes, CXL_EVENT_TYPE_FATAL, &hardware_replace);
> + event_store_add_event(mes, CXL_EVENT_TYPE_FATAL,
> + (struct cxl_event_record_raw *)&dram);
> mes->ev_status |= CXLDEV_EVENT_STATUS_FATAL;
>
> return mes->ev_status;
On Wed, Nov 16, 2022 at 02:53:41PM +0000, Jonathan Cameron wrote:
> On Thu, 10 Nov 2022 10:57:48 -0800
> [email protected] wrote:
>
> > From: Davidlohr Bueso <[email protected]>
> >
> > Currently the only CXL features targeted for irq support require their
> > message numbers to be within the first 16 entries. The device may
> > however support less than 16 entries depending on the support it
> > provides.
> >
> > Attempt to allocate these 16 irq vectors. If the device supports fewer,
> > the PCI infrastructure will allocate that number. Store the number
> > of vectors actually allocated in the device state for later use
> > by individual functions.
> See later patch review, but I don't think we need to store the number
> allocated because any vector is guaranteed to be below that point
Only as long as we stick to those functions whose vectors are guaranteed to be
under 16. If a device supports more than 16 and code is added to try to enable
that irq, this base support will not cover it.
> (QEMU code is wrong on this at the moment, but there are very few vectors
> so it hasn't mattered yet).
How so? Does the spec state that a device must report at least 16 vectors?
>
> Otherwise, pcim fun deals with some of the cleanup you are doing again
> here for us so can simplify this somewhat. See inline.
Yea it is broken.
>
> Jonathan
>
>
>
> >
> > Upon successful allocation, users can plug in their respective isr at
> > any point thereafter, for example, if the irq setup is not done in the
> > PCI driver, such as the case of the CXL-PMU.
> >
> > Cc: Bjorn Helgaas <[email protected]>
> > Cc: Jonathan Cameron <[email protected]>
> > Co-developed-by: Ira Weiny <[email protected]>
> > Signed-off-by: Ira Weiny <[email protected]>
> > Signed-off-by: Davidlohr Bueso <[email protected]>
> >
> > ---
> > Changes from Ira
> > Remove reviews
> > Allocate up to a static 16 vectors.
> > Change cover letter
> > ---
> > drivers/cxl/cxlmem.h | 3 +++
> > drivers/cxl/cxlpci.h | 6 ++++++
> > drivers/cxl/pci.c | 32 ++++++++++++++++++++++++++++++++
> > 3 files changed, 41 insertions(+)
> >
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 88e3a8e54b6a..b7b955ded3ac 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -211,6 +211,7 @@ struct cxl_endpoint_dvsec_info {
> > * @info: Cached DVSEC information about the device.
> > * @serial: PCIe Device Serial Number
> > * @doe_mbs: PCI DOE mailbox array
> > + * @nr_irq_vecs: Number of MSI-X/MSI vectors available
> > * @mbox_send: @dev specific transport for transmitting mailbox commands
> > *
> > * See section 8.2.9.5.2 Capacity Configuration and Label Storage for
> > @@ -247,6 +248,8 @@ struct cxl_dev_state {
> >
> > struct xarray doe_mbs;
> >
> > + int nr_irq_vecs;
> > +
> > int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
> > };
> >
> > diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> > index eec597dbe763..b7f4e2f417d3 100644
> > --- a/drivers/cxl/cxlpci.h
> > +++ b/drivers/cxl/cxlpci.h
> > @@ -53,6 +53,12 @@
> > #define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
> > #define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)
> >
> > +/*
> > + * NOTE: Currently all the functions which are enabled for CXL require their
> > + * vectors to be in the first 16. Use this as the max.
> > + */
> > +#define CXL_PCI_REQUIRED_VECTORS 16
> > +
> > /* Register Block Identifier (RBI) */
> > enum cxl_regloc_type {
> > CXL_REGLOC_RBI_EMPTY = 0,
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index faeb5d9d7a7a..62e560063e50 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -428,6 +428,36 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> > }
> > }
> >
> > +static void cxl_pci_free_irq_vectors(void *data)
> > +{
> > + pci_free_irq_vectors(data);
> > +}
> > +
> > +static void cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
> > +{
> > + struct device *dev = cxlds->dev;
> > + struct pci_dev *pdev = to_pci_dev(dev);
> > + int nvecs;
> > + int rc;
> > +
> > + nvecs = pci_alloc_irq_vectors(pdev, 1, CXL_PCI_REQUIRED_VECTORS,
> > + PCI_IRQ_MSIX | PCI_IRQ_MSI);
> > + if (nvecs < 0) {
> > + dev_dbg(dev, "Not enough interrupts; use polling instead.\n");
> > + return;
> > + }
> > +
> > + rc = devm_add_action_or_reset(dev, cxl_pci_free_irq_vectors, pdev);
> The pci managed code always gives me a headache because there is a lot of magic
> under the hood if you ever called pcim_enable_device() which we did.
>
> Chasing through
>
> pci_alloc_irq_vectors_affinity()->
> either
> __pci_enable_msix_range()
> or
> __pci_enable_msi_range()
>
> they are similar
> pci_setup_msi_context()
> pci_setup_msi_release()
> adds pcmi_msi_release devm action.
> and that frees the vectors for us.
> So we don't need to do it here.
:-/
So what is the point of pci_free_irq_vectors()? This is very confusing to have
a function not called pcim_* [pci_alloc_irq_vectors()] do 'pcim stuff'.
Ok I'll drop this extra because I see it now.
>
>
> > + if (rc) {
> > + dev_dbg(dev, "Device managed call failed; interrupts disabled.\n");
> > + /* some got allocated, clean them up */
> > + cxl_pci_free_irq_vectors(pdev);
> We could just leave them lying around for devm cleanup to sweep up eventually
> or free them as you have done here.
And besides this extra call is flat out broken. cxl_pci_free_irq_vectors() is
already called at this point if devm_add_action_or_reset() failed... But I see
this is not required.
I do plan to add a big ol' comment as to why we don't need to mirror the call
with the corresponding 'free'.
I'll respin,
Ira
>
> > + return;
> > + }
> > +
> > + cxlds->nr_irq_vecs = nvecs;
> > +}
> > +
> > static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > {
> > struct cxl_register_map map;
> > @@ -494,6 +524,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > if (rc)
> > return rc;
> >
> > + cxl_pci_alloc_irq_vectors(cxlds);
> > +
> > cxlmd = devm_cxl_add_memdev(cxlds);
> > if (IS_ERR(cxlmd))
> > return PTR_ERR(cxlmd);
>
On Wed, Nov 16, 2022 at 03:19:36PM +0000, Jonathan Cameron wrote:
> On Thu, 10 Nov 2022 10:57:49 -0800
> [email protected] wrote:
>
> > From: Ira Weiny <[email protected]>
> >
> > CXL devices have multiple event logs which can be queried for CXL event
> > records. Devices are required to support the storage of at least one
> > event record in each event log type.
> >
> > Devices track event log overflow by incrementing a counter and tracking
> > the time of the first and last overflow event seen.
> >
> > Software queries events via the Get Event Record mailbox command; CXL
> > rev 3.0 section 8.2.9.2.2.
> >
> > Issue the Get Event Record mailbox command on driver load. Trace each
> > record found with a generic record trace. Trace any overflow
> > conditions.
> >
> > The device can return up to 1MB worth of event records per query. This
> > presents complications with allocating huge buffers to potentially
> > capture all the records. It is not anticipated that these event logs
> > will be very deep and reading them does not need to be performant.
> > Process only 3 records at a time. 3 records was chosen as it fits
> > comfortably on the stack to prevent dynamic allocation while still
> > cutting down on extra mailbox messages.
> >
> > This patch traces a raw event record only and leaves the specific event
> > record types to subsequent patches.
> >
> > Macros are created to use for tracing the common CXL Event header
> > fields.
> >
> > Cc: Steven Rostedt <[email protected]>
> > Signed-off-by: Ira Weiny <[email protected]>
>
> Hi Ira,
>
> A question inline about whether some of the conditions you are checking
> for can actually happen. Otherwise looks good to me.
>
> Jonathan
>
[snip]
> > +static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> > + enum cxl_event_log_type type)
> > +{
> > + struct cxl_get_event_payload payload;
> > + u16 pl_nr;
> > +
> > + do {
> > + u8 log_type = type;
> > + int rc;
> > +
> > + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
> > + &log_type, sizeof(log_type),
> > + &payload, sizeof(payload));
> > + if (rc) {
> > + dev_err(cxlds->dev, "Event log '%s': Failed to query event records : %d",
> > + cxl_event_log_type_str(type), rc);
> > + return;
> > + }
> > +
> > + pl_nr = le16_to_cpu(payload.record_count);
> > + if (trace_cxl_generic_event_enabled()) {
> > + u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
>
> Either I'm misreading the spec, or it can't be greater than NR_RECORDS.
Well... I could have read the spec wrong as well. But after reading very
carefully I think this is actually correct.
> "The number of event records in the Event Records list...."
Where is this quote from? I don't see that in the spec.
> Event Records being the field inside this payload which is not big enough to
> take more than CXL_GET_EVENT_NR_RECORDS and the intro to Get Event Records
> refers to the number being restricted by the mailbox output payload provided.
My understanding is that the output payload is only limited by the Payload Size
field of the Mailbox Capabilities Register (Section 8.2.8.4.3). This can be up
to 1MB, so the device could fill up to 1MB's worth of Event Records while still
being in compliance. The generic mailbox code in the driver caps the data based
on the size passed into cxl_mbox_send_cmd(); however, the number of records
reported is not changed.
>
> I'm in favor of defense against broken hardware, but don't paper over any
> such error - scream about it.
I don't think this is out of spec unless the device is trying to write more
than 1MB and I think the core mailbox code will scream about that.
>
> > + int i;
> > +
> > + for (i = 0; i < nr_rec; i++)
> > + trace_cxl_generic_event(dev_name(cxlds->dev),
> > + type,
> > + &payload.record[i]);
> > + }
> > +
> > + if (trace_cxl_overflow_enabled() &&
> > + (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW))
> > + trace_cxl_overflow(dev_name(cxlds->dev), type, &payload);
> > +
> > + } while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||
>
> Isn't pl_nr > CXL_GET_EVENT_NR_RECORDS a hardware bug? It's the number in returned
> payload not the total number.
I don't think so. The only value passed to the device is the _input_ payload
size. The output payload size is not passed to the device and is not included
in the Get Event Records Input Payload. (Table 8-49)
So my previous code was wrong. Here is an example I think which is within the
spec but would result in the more records flag not being set.
Device log depth == 10
nr log entries == 7
nr log entries in 1MB ~= (1M - hdr size) / 128 ~= 8000
Device sets Output Payload.Event Record Count == 7 (which is < 8000). Common
mailbox code truncates that to 3. More Event Records == 0 because it sent all
7 that it had.
This code will clear 3 and read again 2 more times.
Am I reading that wrong?
>
> > + payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> > +}
>
>
> > diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> > new file mode 100644
> > index 000000000000..60dec9a84918
> > --- /dev/null
> > +++ b/include/trace/events/cxl.h
> > @@ -0,0 +1,127 @@
>
>
> > +#define CXL_EVT_TP_fast_assign(dname, l, hdr) \
> > + __assign_str(dev_name, (dname)); \
> > + __entry->log = (l); \
> > + memcpy(&__entry->hdr_uuid, &(hdr).id, sizeof(uuid_t)); \
> > + __entry->hdr_length = (hdr).length; \
> > + __entry->hdr_flags = get_unaligned_le24((hdr).flags); \
> > + __entry->hdr_handle = le16_to_cpu((hdr).handle); \
> > + __entry->hdr_related_handle = le16_to_cpu((hdr).related_handle); \
> > + __entry->hdr_timestamp = le64_to_cpu((hdr).timestamp); \
> > + __entry->hdr_maint_op_class = (hdr).maint_op_class
> > +
> Trivial: Maybe one blank line is enough?
Yea I'll adjust,
Ira
> > +
> > +#define CXL_EVT_TP_printk(fmt, ...) \
> > + TP_printk("%s log=%s : time=%llu uuid=%pUb len=%d flags='%s' " \
> > + "handle=%x related_handle=%x maint_op_class=%u" \
> > + " : " fmt, \
> > + __get_str(dev_name), cxl_event_log_type_str(__entry->log), \
> > + __entry->hdr_timestamp, &__entry->hdr_uuid, __entry->hdr_length,\
> > + show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle, \
> > + __entry->hdr_related_handle, __entry->hdr_maint_op_class, \
> > + ##__VA_ARGS__)
>
On Wed, Nov 16, 2022 at 03:24:26PM +0000, Jonathan Cameron wrote:
> On Thu, 10 Nov 2022 10:57:50 -0800
> [email protected] wrote:
>
> > From: Ira Weiny <[email protected]>
> >
> > CXL rev 3.0 section 8.2.9.2.3 defines the Clear Event Records mailbox
> > command. After an event record is read it needs to be cleared from the
> > event log.
> >
> > Implement cxl_clear_event_record() and call it for each record retrieved
> > from the device.
> >
> > Each record is cleared individually. A clear all bit is specified but
> > events could arrive between a get and the final clear all operation.
> > Therefore each event is cleared specifically.
> >
> > Reviewed-by: Jonathan Cameron <[email protected]>
> > Signed-off-by: Ira Weiny <[email protected]>
> >
> Some follow through comment updates needed from changes in earlier patches +
> one comment you can ignore if you prefer to keep it as is.
>
> > static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> > enum cxl_event_log_type type)
> > {
> > @@ -728,14 +750,23 @@ static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> > }
> >
> > pl_nr = le16_to_cpu(payload.record_count);
> > - if (trace_cxl_generic_event_enabled()) {
>
> To simplify this patch, maybe push this check down in the previous patch so this
> one doesn't move code around? It'll look a tiny bit odd there of course..
That is the issue. I think the oddness is easier to defend here vs having it in
the previous patch.
>
> > + if (pl_nr > 0) {
> > u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
> > int i;
> >
> > - for (i = 0; i < nr_rec; i++)
> > - trace_cxl_generic_event(dev_name(cxlds->dev),
> > - type,
> > - &payload.record[i]);
> > + if (trace_cxl_generic_event_enabled()) {
> > + for (i = 0; i < nr_rec; i++)
> > + trace_cxl_generic_event(dev_name(cxlds->dev),
> > + type,
> > + &payload.record[i]);
> > + }
> > +
> > + rc = cxl_clear_event_record(cxlds, type, &payload, nr_rec);
> > + if (rc) {
> > + dev_err(cxlds->dev, "Event log '%s': Failed to clear events : %d",
> > + cxl_event_log_type_str(type), rc);
> > + return;
> > + }
> > }
> >
>
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index da64ba0f156b..28a114c7cf69 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
>
> >
> > +/*
> > + * Clear Event Records input payload
> > + * CXL rev 3.0 section 8.2.9.2.3; Table 8-51
> > + *
> > + * Space given for 1 record
>
> Nope...
<sigh> yep... ;-)
>
>
> > + */
> > +struct cxl_mbox_clear_event_payload {
> > + u8 event_log; /* enum cxl_event_log_type */
> > + u8 clear_flags;
> > + u8 nr_recs; /* 1 for this struct */
> Nope :) Delete the comments so they can't be wrong if this changes in future!
Yep. :-/
Ira
On Wed, Nov 16, 2022 at 03:45:43PM +0000, Jonathan Cameron wrote:
> On Wed, 16 Nov 2022 15:24:26 +0000
> Jonathan Cameron <[email protected]> wrote:
>
[snip]
> >
> >
> > > + */
> > > +struct cxl_mbox_clear_event_payload {
> > > + u8 event_log; /* enum cxl_event_log_type */
> > > + u8 clear_flags;
> > > + u8 nr_recs; /* 1 for this struct */
> > Nope :) Delete the comments so they can't be wrong if this changes in future!
> Ah. You only use one. So should hard code that in the array size below.
No, it can send up to CXL_GET_EVENT_NR_RECORDS at a time: 'nr_rec'.
rc = cxl_clear_event_record(cxlds, type, &payload, nr_rec);
static int cxl_clear_event_record(struct cxl_dev_state *cxlds,
enum cxl_event_log_type log,
struct cxl_get_event_payload *get_pl, u16 nr)
{
struct cxl_mbox_clear_event_payload payload = {
.event_log = log,
.nr_recs = nr,
^^^^^^^^^^^^^^
Here...
};
int i;
for (i = 0; i < nr; i++) {
payload.handle[i] = get_pl->record[i].hdr.handle;
dev_dbg(cxlds->dev, "Event log '%s': Clearing %u\n",
cxl_event_log_type_str(log),
le16_to_cpu(payload.handle[i]));
}
...
Ira
On Wed, Nov 16, 2022 at 03:31:06PM +0000, Jonathan Cameron wrote:
> On Thu, 10 Nov 2022 10:57:52 -0800
> [email protected] wrote:
>
> > From: Ira Weiny <[email protected]>
> >
> > CXL rev 3.0 section 8.2.9.2.1.1 defines the General Media Event Record.
> >
> > Determine if the event read is a general media record and if so trace
> > the record as a General Media Event Record.
> >
> > Signed-off-by: Ira Weiny <[email protected]>
> >
> A few v2.0 references left in here that should be updated given it's new code.
>
> With those tidied up
Fixed.
> Reviewed-by: Jonathan Cameron <[email protected]>
And thanks!
Ira
>
>
> > diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> > index 60dec9a84918..a0c20e110708 100644
> > --- a/include/trace/events/cxl.h
> > +++ b/include/trace/events/cxl.h
> > @@ -119,6 +119,130 @@ TRACE_EVENT(cxl_generic_event,
> > __print_hex(__entry->data, CXL_EVENT_RECORD_DATA_LENGTH))
> > );
> >
> > +/*
> > + * Physical Address field masks
> > + *
> > + * General Media Event Record
> > + * CXL v2.0 Section 8.2.9.1.1.1; Table 154
>
> Update to CXL rev 3.0 as I think we are preferring latest
> spec references on any new code.
>
> > + *
> > + * DRAM Event Record
> > + * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
> > + */
>
> > +
> > +/*
> > + * General Media Event Record - GMER
> > + * CXL v2.0 Section 8.2.9.1.1.1; Table 154
> Update ref to r3.0
> Never v or the spec folk will get irritable :)
>
> > + */
>
>
>
On Wed, Nov 16, 2022 at 03:35:28PM +0000, Jonathan Cameron wrote:
> On Thu, 10 Nov 2022 10:57:54 -0800
> [email protected] wrote:
>
> > From: Ira Weiny <[email protected]>
> >
> > CXL rev 3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.
> >
> > Determine if the event read is memory module record and if so trace the
> > record.
> >
> > Signed-off-by: Ira Weiny <[email protected]>
> >
> Noticed that we have a mixture of fully capitalized and not for flags.
> With that either explained or tidied up:
>
> Reviewed-by: Jonathan Cameron <[email protected]>
>
> > +/*
> > + * Device Health Information - DHI
> > + *
> > + * CXL res 3.0 section 8.2.9.8.3.1; Table 8-100
> > + */
> > +#define CXL_DHI_HS_MAINTENANCE_NEEDED BIT(0)
> > +#define CXL_DHI_HS_PERFORMANCE_DEGRADED BIT(1)
> > +#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED BIT(2)
> > +#define show_health_status_flags(flags) __print_flags(flags, "|", \
> > + { CXL_DHI_HS_MAINTENANCE_NEEDED, "Maintenance Needed" }, \
> > + { CXL_DHI_HS_PERFORMANCE_DEGRADED, "Performance Degraded" }, \
> > + { CXL_DHI_HS_HW_REPLACEMENT_NEEDED, "Replacement Needed" } \
>
> Why are we sometime using capitals for flags (e.g patch 5) and not other times?
Not sure what you mean. Do you mean this from patch 5?
...
{ CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT, "Uncorrectable Event" }, \
{ CXL_GMER_EVT_DESC_THRESHOLD_EVENT, "Threshold event" }, \
{ CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW, "Poison List Overflow" } \
...
Threshold event was a mistake. This is the capitalization the spec uses.
Bit[0]: Uncorrectable Event: When set, indicates the reported event is
^^^^^^^^^^^^^^^^^^^
uncorrectable by the device. When cleared, indicates the reported
event was corrected by the device.
Bit[1]: Threshold Event: When set, the event is the result of a
^^^^^^^^^^^^^^^
threshold on the device having been reached. When cleared, the event
is not the result of a threshold limit.
Bit[2]: Poison List Overflow Event: When set, the Poison List has
^^^^^^^^^^^^^^^^^^^^^^^^^^
overflowed, and this event is not in the Poison List. When cleared, the
Poison List has not overflowed.
I'll update this 'Event' in patch 5. Probably need to add 'Event' to the
Poison List...
Ira
On Tue, Nov 15, 2022 at 04:13:24PM -0700, Jiang, Dave wrote:
>
>
> On 11/10/2022 10:57 AM, [email protected] wrote:
[snip]
> > +int cxl_event_config_msgnums(struct cxl_dev_state *cxlds)
> > +{
> > + struct cxl_event_interrupt_policy *policy = &cxlds->evt_int_policy;
> > + size_t policy_size = sizeof(*policy);
> > + bool retry = true;
> > + int rc;
> > +
> > + policy->info_settings = CXL_INT_MSI_MSIX;
> > + policy->warn_settings = CXL_INT_MSI_MSIX;
> > + policy->failure_settings = CXL_INT_MSI_MSIX;
> > + policy->fatal_settings = CXL_INT_MSI_MSIX;
> > + policy->dyn_cap_settings = CXL_INT_MSI_MSIX;
> > +
> > +again:
> > + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_SET_EVT_INT_POLICY,
> > + policy, policy_size, NULL, 0);
> > + if (rc < 0) {
> > + /*
> > + * If the device does not support dynamic capacity it may fail
> > + * the command due to an invalid payload. Retry without
> > + * dynamic capacity.
> > + */
> > + if (retry) {
> > + retry = false;
> > + policy->dyn_cap_settings = 0;
> > + policy_size = sizeof(*policy) - sizeof(policy->dyn_cap_settings);
> > + goto again;
> > + }
> > + dev_err(cxlds->dev, "Failed to set event interrupt policy : %d",
> > + rc);
> > + memset(policy, CXL_INT_NONE, sizeof(*policy));
> > + return rc;
> > + }
>
> Up to you, but I think you can avoid the goto:
I think this is a bit more confusing because we are not really retrying 2
times.
>
> int retry = 2;
> do {
> rc = cxl_mbox_send_cmd(...);
> if (rc == 0 || retry == 1)
Specifically this looks confusing to me. Why break on retry == 1?
> break;
> policy->dyn_cap_settings = 0;
> policy_size = sizeof(*policy) - sizeof(policy->dyn_cap_settings);
> retry--;
> } while (retry);
>
> if (rc < 0) {
> dev_err(...);
> memset(policy, ...);
> return rc;
> }
That said perhaps the retry should be based on policy_size... :-/ I'm not
sure that adds much. I'm going to leave it as is.
[snip]
> > +
> > +static irqreturn_t cxl_event_int_handler(int irq, void *id)
> > +{
> > + struct cxl_event_irq_id *cxlid = id;
> > + struct cxl_dev_state *cxlds = cxlid->cxlds;
> > + u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
> > +
> > + if (cxlid->status & status)
> > + return IRQ_WAKE_THREAD;
> > + return IRQ_HANDLED;
>
> IRQ_NONE since your handler did not handle anything and this is a shared
> interrupt?
Yes. Good catch thanks!
>
> > +}
> > +
> > +static void cxl_free_event_irq(void *id)
> > +{
> > + struct cxl_event_irq_id *cxlid = id;
> > + struct pci_dev *pdev = to_pci_dev(cxlid->cxlds->dev);
> > +
> > + pci_free_irq(pdev, cxlid->msgnum, id);
> > +}
> > +
> > +static u32 log_type_to_status(enum cxl_event_log_type log_type)
> > +{
> > + switch (log_type) {
> > + case CXL_EVENT_TYPE_INFO:
> > + return CXLDEV_EVENT_STATUS_INFO | CXLDEV_EVENT_STATUS_DYNAMIC_CAP;
> > + case CXL_EVENT_TYPE_WARN:
> > + return CXLDEV_EVENT_STATUS_WARN;
> > + case CXL_EVENT_TYPE_FAIL:
> > + return CXLDEV_EVENT_STATUS_FAIL;
> > + case CXL_EVENT_TYPE_FATAL:
> > + return CXLDEV_EVENT_STATUS_FATAL;
> > + default:
> > + break;
> > + }
> > + return 0;
> > +}
> > +
> > +static int cxl_request_event_irq(struct cxl_dev_state *cxlds,
> > + enum cxl_event_log_type log_type,
> > + u8 setting)
> > +{
> > + struct device *dev = cxlds->dev;
> > + struct pci_dev *pdev = to_pci_dev(dev);
> > + struct cxl_event_irq_id *id;
> > + unsigned int msgnum = CXL_EVENT_INT_MSGNUM(setting);
> > + int irq;
>
> int rc? pci_request_irq() returns an errno or 0, not the number of irq. The
> variable naming is a bit confusing.
Indeed. Changed, and thanks,
Ira
>
> DJ
>
On Wed, 16 Nov 2022 16:47:20 -0800
Ira Weiny <[email protected]> wrote:
> On Wed, Nov 16, 2022 at 03:19:36PM +0000, Jonathan Cameron wrote:
> > On Thu, 10 Nov 2022 10:57:49 -0800
> > [email protected] wrote:
> >
> > > From: Ira Weiny <[email protected]>
> > >
> > > CXL devices have multiple event logs which can be queried for CXL event
> > > records. Devices are required to support the storage of at least one
> > > event record in each event log type.
> > >
> > > Devices track event log overflow by incrementing a counter and tracking
> > > the time of the first and last overflow event seen.
> > >
> > > Software queries events via the Get Event Record mailbox command; CXL
> > > rev 3.0 section 8.2.9.2.2.
> > >
> > > Issue the Get Event Record mailbox command on driver load. Trace each
> > > record found with a generic record trace. Trace any overflow
> > > conditions.
> > >
> > > The device can return up to 1MB worth of event records per query. This
> > > presents complications with allocating huge buffers to potentially
> > > capture all the records. It is not anticipated that these event logs
> > > will be very deep and reading them does not need to be performant.
> > > Process only 3 records at a time. 3 records was chosen as it fits
> > > comfortably on the stack to prevent dynamic allocation while still
> > > cutting down on extra mailbox messages.
> > >
> > > This patch traces a raw event record only and leaves the specific event
> > > record types to subsequent patches.
> > >
> > > Macros are created to use for tracing the common CXL Event header
> > > fields.
> > >
> > > Cc: Steven Rostedt <[email protected]>
> > > Signed-off-by: Ira Weiny <[email protected]>
> >
> > Hi Ira,
> >
> > A question inline about whether some of the conditions you are checking
> > for can actually happen. Otherwise looks good to me.
> >
> > Jonathan
> >
>
> [snip]
>
> > > +static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> > > + enum cxl_event_log_type type)
> > > +{
> > > + struct cxl_get_event_payload payload;
> > > + u16 pl_nr;
> > > +
> > > + do {
> > > + u8 log_type = type;
> > > + int rc;
> > > +
> > > + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
> > > + &log_type, sizeof(log_type),
> > > + &payload, sizeof(payload));
> > > + if (rc) {
> > > + dev_err(cxlds->dev, "Event log '%s': Failed to query event records : %d",
> > > + cxl_event_log_type_str(type), rc);
> > > + return;
> > > + }
> > > +
> > > + pl_nr = le16_to_cpu(payload.record_count);
> > > + if (trace_cxl_generic_event_enabled()) {
> > > + u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
> >
> > Either I'm misreading the spec, or it can't be greater than NR_RECORDS.
>
> Well... I could have read the spec wrong as well. But after reading very
> carefully I think this is actually correct.
>
> > "The number of event records in the Event Records list...."
>
> Where is this quote from? I don't see that in the spec.
Table 8-50 Event Record Count (the field we are reading here).
>
> > Event Records being the field inside this payload which is not big enough to
> > take more than CXL_GET_EVENT_NR_RECORDS and the intro to Get Event Records
> > refers to the number being restricted by the mailbox output payload provided.
>
> My understanding is that the output payload is only limited by the Payload Size
> reported in the Mailbox Capability Register.Payload Size. (Section 8.2.8.4.3)
>
> This can be up to 1MB. So the device could fill up to 1MB's worth of Event
> Records while still being in compliance. The generic mailbox code in the
> driver caps the data based on the size passed into cxl_mbox_send_cmd() however,
> the number of records reported is not changed.
Indeed I had that wrong. I thought we passed in an output payload length whereas
we only provide "payload length" which is defined as being the input length in 8.2.8.4.5
>
> >
> > I'm in favor of defense against broken hardware, but don't paper over any
> > such error - scream about it.
>
> I don't think this is out of spec unless the device is trying to write more
> than 1MB and I think the core mailbox code will scream about that.
>
> >
> > > + int i;
> > > +
> > > + for (i = 0; i < nr_rec; i++)
> > > + trace_cxl_generic_event(dev_name(cxlds->dev),
> > > + type,
> > > + &payload.record[i]);
> > > + }
> > > +
> > > + if (trace_cxl_overflow_enabled() &&
> > > + (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW))
> > > + trace_cxl_overflow(dev_name(cxlds->dev), type, &payload);
> > > +
> > > + } while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||
> >
> > Isn't pl_nr > CXL_GET_EVENT_NR_RECORDS a hardware bug? It's the number in returned
> > payload not the total number.
>
> I don't think so. The only value passed to the device is the _input_ payload
> size. The output payload size is not passed to the device and is not included
> in the Get Event Records Input Payload. (Table 8-49)
>
> So my previous code was wrong. Here is an example I think which is within the
> spec but would result in the more records flag not being set.
>
> Device log depth == 10
> nr log entries == 7
> nr log entries in 1MB ~= (1M - hdr size) / 128 ~= 8000
>
> Device sets Output Payload.Event Record Count == 7 (which is < 8000). Common
> mailbox code truncates that to 3. More Event Records == 0 because it sent all
> 7 that it had.
>
> This code will clear 3 and read again 2 more times.
>
> Am I reading that wrong?
I think this is still wrong, but for a different reason. :)
If we don't clear the records and more records is set, that means they didn't
fit in the mailbox payload (potentially 1MB), and the next read
will return the next set of records from there.
Taking this patch only, let's say the mailbox takes 4 records.
Read 1: Records 0, 1, 2, 3 More set.
We handle 0, 1, 2
Read 2: Records 4, 5, 6 More not set.
We handle 4, 5, 6
Record 3 is never handled.
If we add in clearing as happens later in the series, the current
assumption is that if we clear some records a subsequent read will
start again. I'm not sure that is true. If it is spec reference needed.
So assumption is
Read 1: Records 0, 1, 2, 3 More set
Clear 0, 1, 2
Read 2: Records 3, 4, 5, 6
Clear 3, 4, 5 More not set, but catch it with the condition above.
Read 3: 6 only
Clear 6
However, I think a valid implementation could do the following
(imagine a ring buffer with a pointer to the 'next' record to read out and
each record has a 'valid' flag to deal with corner cases around
sequences such as read log once, start reading again and some
clears occur using handles obtained from first read - note that
case isn't ruled out by the spec as far as I can see).
Read 1: Records 0, 1, 2, 3 More set. 'next' pointer points to record 4.
Clear 0, 1, 2
Read 2: Records 4, 5, 6 More not set. 'next' pointer points to record 7.
Clear 4, 5, 6
Skipping record 3.
So I think we have to absorb the full mailbox payload each time to guarantee
we don't skip events or process them out of order (which is what would happen
if we relied on a retry loop - we aren't allowed to clear them out of
order anyway 8.2.9.2.3 "Events shall be cleared in temporal order. The device
shall verify the event record handles specified in the input payload are in
temporal order. ... ").
Obviously that temporal order thing is only relevant if we get my second
example occurring on real hardware. I think the spec is vague enough
to allow that implementation. Would have been easy to specify this originally
but it probably won't go in as errata so we need to cope with all the
flexibility that is present.
What fun and oh for a parameter to control how many records are returned!
Jonathan
>
> >
> > > + payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> > > +}
> >
>
On Wed, 16 Nov 2022 17:23:58 -0800
Ira Weiny <[email protected]> wrote:
> On Wed, Nov 16, 2022 at 03:35:28PM +0000, Jonathan Cameron wrote:
> > On Thu, 10 Nov 2022 10:57:54 -0800
> > [email protected] wrote:
> >
> > > From: Ira Weiny <[email protected]>
> > >
> > > CXL rev 3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.
> > >
> > > Determine if the event read is memory module record and if so trace the
> > > record.
> > >
> > > Signed-off-by: Ira Weiny <[email protected]>
> > >
> > Noticed that we have a mixture of fully capitalized and not for flags.
> > With that either explained or tidied up:
> >
> > Reviewed-by: Jonathan Cameron <[email protected]>
> >
> > > +/*
> > > + * Device Health Information - DHI
> > > + *
> > > + * CXL rev 3.0 section 8.2.9.8.3.1; Table 8-100
> > > + */
> > > +#define CXL_DHI_HS_MAINTENANCE_NEEDED BIT(0)
> > > +#define CXL_DHI_HS_PERFORMANCE_DEGRADED BIT(1)
> > > +#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED BIT(2)
> > > +#define show_health_status_flags(flags) __print_flags(flags, "|", \
> > > + { CXL_DHI_HS_MAINTENANCE_NEEDED, "Maintenance Needed" }, \
> > > + { CXL_DHI_HS_PERFORMANCE_DEGRADED, "Performance Degraded" }, \
> > > + { CXL_DHI_HS_HW_REPLACEMENT_NEEDED, "Replacement Needed" } \
> >
> > Why are we sometime using capitals for flags (e.g patch 5) and not other times?
>
> Not sure what you mean. Do you mean this from patch 5?
Nope
+#define CXL_DPA_VOLATILE BIT(0)
+#define CXL_DPA_NOT_REPAIRABLE BIT(1)
+#define show_dpa_flags(flags) __print_flags(flags, "|", \
+ { CXL_DPA_VOLATILE, "VOLATILE" }, \
+ { CXL_DPA_NOT_REPAIRABLE, "NOT_REPAIRABLE" } \
+)
+
Where they are all capitals. I thought that was maybe a flags vs other fields
thing but it doesn't seem to be.
>
> ...
> { CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT, "Uncorrectable Event" }, \
> { CXL_GMER_EVT_DESC_THRESHOLD_EVENT, "Threshold event" }, \
> { CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW, "Poison List Overflow" } \
> ...
>
> Threshold event was a mistake. This is the capitalization the spec uses.
>
> Bit[0]: Uncorrectable Event: When set, indicates the reported event is
> ^^^^^^^^^^^^^^^^^^^
> uncorrectable by the device. When cleared, indicates the reported
> event was corrected by the device.
>
> Bit[1]: Threshold Event: When set, the event is the result of a
> ^^^^^^^^^^^^^^^
> threshold on the device having been reached. When cleared, the event
> is not the result of a threshold limit.
>
> Bit[2]: Poison List Overflow Event: When set, the Poison List has
> ^^^^^^^^^^^^^^^^^^^^^^^^^^
> overflowed, and this event is not in the Poison List. When cleared, the
> Poison List has not overflowed.
>
>
> I'll update this 'Event' in patch 5. Probably need to add 'Event' to the
> Poison List...
>
> Ira
On Wed, 16 Nov 2022 15:48:48 -0800
Ira Weiny <[email protected]> wrote:
> On Wed, Nov 16, 2022 at 02:53:41PM +0000, Jonathan Cameron wrote:
> > On Thu, 10 Nov 2022 10:57:48 -0800
> > [email protected] wrote:
> >
> > > From: Davidlohr Bueso <[email protected]>
> > >
> > > Currently the only CXL features targeted for irq support require their
> > > message numbers to be within the first 16 entries. The device may
> > > however support fewer than 16 entries depending on the support it
> > > provides.
> > >
> > > Attempt to allocate these 16 irq vectors. If the device supports fewer,
> > > the PCI infrastructure will allocate that number. Store the number
> > > of vectors actually allocated in the device state for later use
> > > by individual functions.
> > See later patch review, but I don't think we need to store the number
> > allocated because any vector is guaranteed to be below that point
>
> Only as long as we stick to those functions which are guaranteed to be under
> 16. If a device supports more than 16 and code is added to try and enable that
> irq this base support will not cover that.
It matters only if this enable is changed. If this remains at 16, the vectors
are guaranteed to be under 16.
>
> > (QEMU code is wrong on this at the moment, but there are very few vectors
> > so it hasn't mattered yet).
>
> How so? Does the spec state that a device must report at least 16 vectors?
Technically QEMU upstream today uses 1 vector I think, so failure to get that
is the same as no irqs.
As we expand to more possible vectors, QEMU should adjust the reported msgnums
to fit in whatever vectors are enabled (if using msi, logic is handled
elsewhere for msix as there is an indirection in the way and I think it
is down to the OS to program that indirection correctly). See spec language
referred to in review of the patch using the irqs. We may never implement
that magic, but it is done correctly for other devices.
https://elixir.bootlin.com/qemu/latest/source/hw/pci-bridge/ioh3420.c#L47
is an example where aer is on vector 1 unless there is only one vector in which
case it falls back to vector 0.
Jonathan
>
> >
> > Otherwise, the pcim fun deals with some of the cleanup you are doing again
> > here for us, so we can simplify this somewhat. See inline.
>
> Yea it is broken.
>
> >
> > Jonathan
> >
> >
> >
> > >
> > > Upon successful allocation, users can plug in their respective isr at
> > > any point thereafter, for example, if the irq setup is not done in the
> > > PCI driver, such as the case of the CXL-PMU.
> > >
> > > Cc: Bjorn Helgaas <[email protected]>
> > > Cc: Jonathan Cameron <[email protected]>
> > > Co-developed-by: Ira Weiny <[email protected]>
> > > Signed-off-by: Ira Weiny <[email protected]>
> > > Signed-off-by: Davidlohr Bueso <[email protected]>
> > >
> > > ---
> > > Changes from Ira
> > > Remove reviews
> > > Allocate up to a static 16 vectors.
> > > Change cover letter
> > > ---
> > > drivers/cxl/cxlmem.h | 3 +++
> > > drivers/cxl/cxlpci.h | 6 ++++++
> > > drivers/cxl/pci.c | 32 ++++++++++++++++++++++++++++++++
> > > 3 files changed, 41 insertions(+)
> > >
> > > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > > index 88e3a8e54b6a..b7b955ded3ac 100644
> > > --- a/drivers/cxl/cxlmem.h
> > > +++ b/drivers/cxl/cxlmem.h
> > > @@ -211,6 +211,7 @@ struct cxl_endpoint_dvsec_info {
> > > * @info: Cached DVSEC information about the device.
> > > * @serial: PCIe Device Serial Number
> > > * @doe_mbs: PCI DOE mailbox array
> > > + * @nr_irq_vecs: Number of MSI-X/MSI vectors available
> > > * @mbox_send: @dev specific transport for transmitting mailbox commands
> > > *
> > > * See section 8.2.9.5.2 Capacity Configuration and Label Storage for
> > > @@ -247,6 +248,8 @@ struct cxl_dev_state {
> > >
> > > struct xarray doe_mbs;
> > >
> > > + int nr_irq_vecs;
> > > +
> > > int (*mbox_send)(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *cmd);
> > > };
> > >
> > > diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> > > index eec597dbe763..b7f4e2f417d3 100644
> > > --- a/drivers/cxl/cxlpci.h
> > > +++ b/drivers/cxl/cxlpci.h
> > > @@ -53,6 +53,12 @@
> > > #define CXL_DVSEC_REG_LOCATOR_BLOCK_ID_MASK GENMASK(15, 8)
> > > #define CXL_DVSEC_REG_LOCATOR_BLOCK_OFF_LOW_MASK GENMASK(31, 16)
> > >
> > > +/*
> > > + * NOTE: Currently all the functions which are enabled for CXL require their
> > > + * vectors to be in the first 16. Use this as the max.
> > > + */
> > > +#define CXL_PCI_REQUIRED_VECTORS 16
> > > +
> > > /* Register Block Identifier (RBI) */
> > > enum cxl_regloc_type {
> > > CXL_REGLOC_RBI_EMPTY = 0,
> > > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > > index faeb5d9d7a7a..62e560063e50 100644
> > > --- a/drivers/cxl/pci.c
> > > +++ b/drivers/cxl/pci.c
> > > @@ -428,6 +428,36 @@ static void devm_cxl_pci_create_doe(struct cxl_dev_state *cxlds)
> > > }
> > > }
> > >
> > > +static void cxl_pci_free_irq_vectors(void *data)
> > > +{
> > > + pci_free_irq_vectors(data);
> > > +}
> > > +
> > > +static void cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
> > > +{
> > > + struct device *dev = cxlds->dev;
> > > + struct pci_dev *pdev = to_pci_dev(dev);
> > > + int nvecs;
> > > + int rc;
> > > +
> > > + nvecs = pci_alloc_irq_vectors(pdev, 1, CXL_PCI_REQUIRED_VECTORS,
> > > + PCI_IRQ_MSIX | PCI_IRQ_MSI);
> > > + if (nvecs < 0) {
> > > + dev_dbg(dev, "Not enough interrupts; use polling instead.\n");
> > > + return;
> > > + }
> > > +
> > > + rc = devm_add_action_or_reset(dev, cxl_pci_free_irq_vectors, pdev);
> > The pci managed code always gives me a headache because there is a lot of magic
> > under the hood if you ever called pcim_enable_device() which we did.
> >
> > Chasing through
> >
> > pci_alloc_irq_vectors_affinity()->
> > either
> > __pci_enable_msix_range()
> > or
> > __pci_enable_msi_range()
> >
> > they are similar
> > pci_setup_msi_context()
> > pci_setup_msi_release()
> > adds pcim_msi_release devm action.
> > and that frees the vectors for us.
> > So we don't need to do it here.
>
> :-/
>
> So what is the point of pci_free_irq_vectors()? This is very confusing to have
> a function not called pcim_* [pci_alloc_irq_vectors()] do 'pcim stuff'.
>
> Ok I'll drop this extra because I see it now.
>
> >
> >
> > > + if (rc) {
> > > + dev_dbg(dev, "Device managed call failed; interrupts disabled.\n");
> > > + /* some got allocated, clean them up */
> > > + cxl_pci_free_irq_vectors(pdev);
> > We could just leave them lying around for devm cleanup to sweep up eventually
> > or free them as you have done here.
>
> And besides this extra call is flat out broken. cxl_pci_free_irq_vectors() is
> already called at this point if devm_add_action_or_reset() failed... But I see
> this is not required.
>
> I do plan to add a big ol' comment as to why we don't need to mirror the call
> with the corresponding 'free'.
>
> I'll respin,
> Ira
>
> >
> > > + return;
> > > + }
> > > +
> > > + cxlds->nr_irq_vecs = nvecs;
> > > +}
> > > +
> > > static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > > {
> > > struct cxl_register_map map;
> > > @@ -494,6 +524,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > > if (rc)
> > > return rc;
> > >
> > > + cxl_pci_alloc_irq_vectors(cxlds);
> > > +
> > > cxlmd = devm_cxl_add_memdev(cxlds);
> > > if (IS_ERR(cxlmd))
> > > return PTR_ERR(cxlmd);
> >
On Thu, Nov 17, 2022 at 10:43:37AM +0000, Jonathan Cameron wrote:
> On Wed, 16 Nov 2022 16:47:20 -0800
> Ira Weiny <[email protected]> wrote:
>
>
[snip]
> >
> > >
> > > > + int i;
> > > > +
> > > > + for (i = 0; i < nr_rec; i++)
> > > > + trace_cxl_generic_event(dev_name(cxlds->dev),
> > > > + type,
> > > > + &payload.record[i]);
> > > > + }
> > > > +
> > > > + if (trace_cxl_overflow_enabled() &&
> > > > + (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW))
> > > > + trace_cxl_overflow(dev_name(cxlds->dev), type, &payload);
> > > > +
> > > > + } while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||
> > >
> > > Isn't pl_nr > CXL_GET_EVENT_NR_RECORDS a hardware bug? It's the number in returned
> > > payload not the total number.
> >
> > I don't think so. The only value passed to the device is the _input_ payload
> > size. The output payload size is not passed to the device and is not included
> > in the Get Event Records Input Payload. (Table 8-49)
> >
> > So my previous code was wrong. Here is an example I think which is within the
> > spec but would result in the more records flag not being set.
> >
> > Device log depth == 10
> > nr log entries == 7
> > nr log entries in 1MB ~= (1M - hdr size) / 128 ~= 8000
> >
> > Device sets Output Payload.Event Record Count == 7 (which is < 8000). Common
> > mailbox code truncates that to 3. More Event Records == 0 because it sent all
> > 7 that it had.
> >
> > This code will clear 3 and read again 2 more times.
> >
> > Am I reading that wrong?
>
> I think this is still wrong, but for a different reason. :)
I hope not... :-/
> If we don't clear the records and more records is set, that means it didn't
> fit in the mailbox payload (potentially 1MB) then the next read
> will return the next set of records from there.
That is not how I read the Get Event Records command:
From 8.2.9.2.2 Get Event Records
... "Devices shall return event records to the host in the temporal order the
device detected the events in. The event occurring the earliest in time, in the
specific event log, shall be returned first."
If item 3 below is earlier than 4 then it must be returned if we have not
cleared it. At least that is how I read the above. :-/
>
> Taking this patch only, let's say the mailbox takes 4 records.
> Read 1: Records 0, 1, 2, 3 More set.
> We handle 0, 1, 2
> Read 2: Records 4, 5, 6 More not set.
> We handle 4, 5, 6
>
> Record 3 is never handled.
>
> If we add in clearing as happens later in the series,
I suppose I should squash the patches as this may not work without the
clearing. :-/
> the current
> assumption is that if we clear some records a subsequent read will
> start again. I'm not sure that is true. If it is spec reference needed.
>
> So assumption is
> Read 1: Records 0, 1, 2, 3 More set
> Clear 0, 1, 2
> Read 2: Records 3, 4, 5, 6
> Clear 3, 4, 5 More not set, but catch it with the condition above.
> Read 3: 6 only
> Clear 6
>
> However, I think a valid implementation could do the following
> (imagine a ring buffer with a pointer to the 'next' record to read out and
> each record has a 'valid' flag to deal with corner cases around
> sequences such as read log once, start reading again and some
> clears occur using handles obtained from first read - not that
> case isn't ruled out by the spec as far as I can see).
I believe this is a violation because the next pointer can't be advanced until
the record is cleared. Otherwise the device is not returning items in temporal
order based on what is in the log.
>
> Read 1: Records 0, 1, 2, 3 More set. 'next' pointer points to record 4.
> Clear 0, 1, 2
> Read 2: Records 4, 5, 6 More not set. 'next' pointer points to record 7.
> Clear 4, 5, 6
>
> Skipping record 3.
>
> So I think we have to absorb the full mailbox payload each time to guarantee
> we don't skip events or process them out of order (which is what would happen
> if we relied on a retry loop - we aren't allowed to clear them out of
> order anyway 8.2.9.2.3 "Events shall be cleared in temporal order. The device
> shall verify the event record handles specified in the input payload are in
> temporal order. ... ").
> Obviously that temporal order thing is only relevant if we get my second
> example occurring on real hardware. I think the spec is vague enough
> to allow that implementation. Would have been easy to specify this originally
> but it probably won't go in as errata so we need to cope with all the
> flexibility that is present.
:-( Yea coulda, woulda, shoulda... ;-)
>
> What fun and oh for a parameter to control how many records are returned!
Yea. But I really don't think there is a problem unless someone really takes
liberties with the spec. I think it boils down to how one interprets _when_ a
record is removed from the log.
If the record is removed when it is returned (as in your 'next' pointer
example) then why have a clear at all? If my interpretation is correct then
the next available entry is the one which has not been cleared. Therefore in
your example 'next' is not incremented until clear has been called. I think
that implementation is also supported by the idea that records must be cleared
in temporal order. Otherwise I think devices would get confused.
FWIW the qemu implementation is based on my interpretation ATM.
Ira
>
> Jonathan
>
>
> >
> > >
> > > > + payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> > > > +}
> > >
>
> >
>
On Fri, 18 Nov 2022 15:26:17 -0800
Ira Weiny <[email protected]> wrote:
> On Thu, Nov 17, 2022 at 10:43:37AM +0000, Jonathan Cameron wrote:
> > On Wed, 16 Nov 2022 16:47:20 -0800
> > Ira Weiny <[email protected]> wrote:
> >
> >
>
> [snip]
>
> > >
> > > >
> > > > > + int i;
> > > > > +
> > > > > + for (i = 0; i < nr_rec; i++)
> > > > > + trace_cxl_generic_event(dev_name(cxlds->dev),
> > > > > + type,
> > > > > + &payload.record[i]);
> > > > > + }
> > > > > +
> > > > > + if (trace_cxl_overflow_enabled() &&
> > > > > + (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW))
> > > > > + trace_cxl_overflow(dev_name(cxlds->dev), type, &payload);
> > > > > +
> > > > > + } while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||
> > > >
> > > > Isn't pl_nr > CXL_GET_EVENT_NR_RECORDS a hardware bug? It's the number in returned
> > > > payload not the total number.
> > >
> > > I don't think so. The only value passed to the device is the _input_ payload
> > > size. The output payload size is not passed to the device and is not included
> > > in the Get Event Records Input Payload. (Table 8-49)
> > >
> > > So my previous code was wrong. Here is an example I think which is within the
> > > spec but would result in the more records flag not being set.
> > >
> > > Device log depth == 10
> > > nr log entries == 7
> > > nr log entries in 1MB ~= (1M - hdr size) / 128 ~= 8000
> > >
> > > Device sets Output Payload.Event Record Count == 7 (which is < 8000). Common
> > > mailbox code truncates that to 3. More Event Records == 0 because it sent all
> > > 7 that it had.
> > >
> > > This code will clear 3 and read again 2 more times.
> > >
> > > Am I reading that wrong?
> >
> > I think this is still wrong, but for a different reason. :)
>
> I hope not... :-/
>
> > If we don't clear the records and more records is set, that means it didn't
> > fit in the mailbox payload (potentially 1MB) then the next read
> > will return the next set of records from there.
>
> That is not how I read the Get Event Records command:
>
> From 8.2.9.2.2 Get Event Records
>
> ... "Devices shall return event records to the host in the temporal order the
> device detected the events in. The event occurring the earliest in time, in the
> specific event log, shall be returned first."
>
> If item 3 below is earlier than 4 then it must be returned if we have not
> cleared it. At least that is how I read the above. :-/
In general that doesn't work. Imagine we cleared no records.
In that case we'd return 4 despite there being earlier records.
There is no language to cover this particular case of clearing
part of what was returned. The device did return the records
in temporal order, we just didn't notice some of them.
The wonders of slightly loose spec wording. Far as I can tell
we are stuck with having to cope with all the things that could be
read as being valid implementations.
>
> >
> > Taking this patch only, let's say the mailbox takes 4 records.
> > Read 1: Records 0, 1, 2, 3 More set.
> > We handle 0, 1, 2
> > Read 2: Records 4, 5, 6 More not set.
> > We handle 4, 5, 6
> >
> > Record 3 is never handled.
> >
> > If we add in clearing as happens later in the series,
>
> I suppose I should squash the patches as this may not work without the
> clearing. :-/
>
> > the current
> > assumption is that if we clear some records a subsequent read will
> > start again. I'm not sure that is true. If it is spec reference needed.
> >
> > So assumption is
> > Read 1: Records 0, 1, 2, 3 More set
> > Clear 0, 1, 2
> > Read 2: Records 3, 4, 5, 6
> > Clear 3, 4, 5 More not set, but catch it with the condition above.
> > Read 3: 6 only
> > Clear 6
> >
> > However, I think a valid implementation could do the following
> > (imagine a ring buffer with a pointer to the 'next' record to read out and
> > each record has a 'valid' flag to deal with corner cases around
> > sequences such as read log once, start reading again and some
> > clears occur using handles obtained from first read - not that
> > case isn't ruled out by the spec as far as I can see).
>
> I believe this is a violation because the next pointer can't be advanced until
> the record is cleared. Otherwise the device is not returning items in temporal
> order based on what is in the log.
Ah. This is where we disagree. The temporal order is (potentially?) disconnected
from the clearing. The device did return them in temporal order, we just didn't
take any notice of record 3 being returned.
A valid reading of that temporal order comment is actually the other way around:
that the device must not reset its idea of temporal order until all records
have been read (reading 3 twice is not in temporal order - imagine we had
read 5 each time and it becomes more obvious, as the read order becomes
0,1,2,3,4,3,4,5,6,7 etc, which is clearly not in temporal order by any normal
reading of the term). The more I read this, the more I think the current
implementation is not compliant with the specification at all.
I'm not seeing a spec mention of 'resetting' the ordering on clearing records
(which might have been a good thing in the first place but too late now).
>
> >
> > Read 1: Records 0, 1, 2, 3 More set. 'next' pointer points to record 4.
> > Clear 0, 1, 2
> > Read 2: Records 4, 5, 6 More not set. 'next' pointer points to record 7.
> > Clear 4, 5, 6
> >
> > Skipping record 3.
> >
> > So I think we have to absorb the full mailbox payload each time to guarantee
> > we don't skip events or process them out of order (which is what would happen
> > if we relied on a retry loop - we aren't allowed to clear them out of
> > order anyway 8.2.9.2.3 "Events shall be cleared in temporal order. The device
> > shall verify the event record handles specified in the input payload are in
> > temporal order. ... ").
> > Obviously that temporal order thing is only relevant if we get my second
> > example occurring on real hardware. I think the spec is vague enough
> > to allow that implementation. Would have been easy to specify this originally
> > but it probably won't go in as errata so we need to cope with all the
> > flexibility that is present.
>
> :-( Yea coulda, woulda, shoulda... ;-)
>
> >
> > What fun and oh for a parameter to control how many records are returned!
>
> Yea. But I really don't think there is a problem unless someone really takes
> liberties with the spec. I think it boils down to how one interprets _when_ a
> record is removed from the log.
This is nothing to do with removal. The wording we have is just about reading
and I think a strict reading of the spec would say your assumption of a reset of the
read pointer on clear is NOT a valid implementation. There is separate wording
about clears being in temporal order, but that doesn't affect the Get Event
Records handling.
>
> If the record is removed when it is returned (as in your 'next' pointer
> example) then why have a clear at all?
Because if your software crashes, you don't have a handshake to reestablish
state. If that happens you read the whole log until MORE is not set and
then read it again to get a clean list. It's a messy situation that has
been discussed before for GET POISON LIST, which has the same nasty handling
of MORE. (look in appropriate forum for resolution to that one that we can't
yet discuss here!)
Also, it allows for non-destructive readback (debugging tools might take a look,
having paused the normal handling).
> If my interpretation is correct then
> the next available entry is the one which has not been cleared.
If that is the case the language in "More Event Records" doesn't work
"The host should continue to retrieve records using this command, until
this indicator is no longer set by the device"
With your reading of the spec, if we clear nothing, we'd keep getting the
first set of records and only be able to read more by clearing them...
> Therefore in
> your example 'next' is not incremented until clear has been called. I think
> that implementation is also supported by the idea that records must be cleared
> in temporal order. Otherwise I think devices would get confused.
It's not hard for a device to do this (as I now read the spec) properly.
Two pointers:
1) Next to clear: CLEAR
2) Next to read: READ
Advance the READ pointer on Get Event Records.
For CLEAR, check that the requested clears are handled in order and that
they are before the READ pointer.
Maybe we should just take it to appropriate spec forum to seek a clarification?
Jonathan
>
> FWIW the qemu implementation is based on my interpretation ATM.
>
> Ira
>
> >
> > Jonathan
> >
> >
> > >
> > > >
> > > > > + payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> > > > > +}
> > > >
> >
> > >
> >
On Thu, 10 Nov 2022 10:57:54 -0800
[email protected] wrote:
> static bool cxl_event_tracing_enabled(void)
> {
> return trace_cxl_generic_event_enabled() ||
> trace_cxl_general_media_enabled() ||
> - trace_cxl_dram_enabled();
> + trace_cxl_dram_enabled() ||
> + trace_cxl_memory_module_enabled();
> }
>
My only concern with this patch set is that gcc may decide to not inline
this function and you will lose the performance of the static branches
provided by the trace_cxl_*enabled() functions.
Other than that, for patches 5-7 from a tracing perspective:
Reviewed-by: Steven Rostedt (Google) <[email protected]>
-- Steve
On Mon, Nov 21, 2022 at 10:47:14AM +0000, Jonathan Cameron wrote:
> On Fri, 18 Nov 2022 15:26:17 -0800
> Ira Weiny <[email protected]> wrote:
>
> > On Thu, Nov 17, 2022 at 10:43:37AM +0000, Jonathan Cameron wrote:
> > > On Wed, 16 Nov 2022 16:47:20 -0800
> > > Ira Weiny <[email protected]> wrote:
> > >
> > >
> >
> > [snip]
> >
> > > >
> > > > >
> > > > > > + int i;
> > > > > > +
> > > > > > + for (i = 0; i < nr_rec; i++)
> > > > > > + trace_cxl_generic_event(dev_name(cxlds->dev),
> > > > > > + type,
> > > > > > + &payload.record[i]);
> > > > > > + }
> > > > > > +
> > > > > > + if (trace_cxl_overflow_enabled() &&
> > > > > > + (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW))
> > > > > > + trace_cxl_overflow(dev_name(cxlds->dev), type, &payload);
> > > > > > +
> > > > > > + } while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||
> > > > >
> > > > > Isn't pl_nr > CXL_GET_EVENT_NR_RECORDS a hardware bug? It's the number in returned
> > > > > payload not the total number.
> > > >
> > > > I don't think so. The only value passed to the device is the _input_ payload
> > > > size. The output payload size is not passed to the device and is not included
> > > > in the Get Event Records Input Payload. (Table 8-49)
> > > >
> > > > So my previous code was wrong. Here is an example I think which is within the
> > > > spec but would result in the more records flag not being set.
> > > >
> > > > Device log depth == 10
> > > > nr log entries == 7
> > > > nr log entries in 1MB ~= (1M - hdr size) / 128 ~= 8000
> > > >
> > > > Device sets Output Payload.Event Record Count == 7 (which is < 8000). Common
> > > > mailbox code truncates that to 3. More Event Records == 0 because it sent all
> > > > 7 that it had.
> > > >
> > > > This code will clear 3 and read again 2 more times.
> > > >
> > > > Am I reading that wrong?
> > >
> > > I think this is still wrong, but for a different reason. :)
> >
> > I hope not... :-/
> >
> > > If we don't clear the records and more records is set, that means they didn't
> > > fit in the mailbox payload (potentially 1MB), and the next read
> > > will return the next set of records from there.
> >
> > That is not how I read the Get Event Records command:
> >
> > From 8.2.9.2.2 Get Event Records
> >
> > ... "Devices shall return event records to the host in the temporal order the
> > device detected the events in. The event occurring the earliest in time, in the
> > specific event log, shall be returned first."
> >
> > If item 3 below is earlier than 4 then it must be returned if we have not
> > cleared it. At least that is how I read the above. :-/
>
> In general that doesn't work. Imagine we cleared no records.
> In that case we'd return 4 despite there being earlier records.
> There is no language to cover this particular case of clearing
> part of what was returned. The device did return the records
> in temporal order, we just didn't notice some of them.
>
> The wonders of slightly loose spec wording. Far as I can tell
> we are stuck with having to come up with all things that could be
> read as being valid implementations.
So I've been thinking about this for a while.
Let's take this example:
> > >
> > > Taking this patch only, let's say the mailbox takes 4 records.
> > > Read 1: Records 0, 1, 2, 3 More set.
> > > We handle 0, 1, 2
> > > Read 2: Records 4, 5, 6 More not set.
> > > We handle 4, 5, 6
> > >
In this case what happens if you do a 3rd read? Does the device return
nothing? Or does it return 0, 1, 2, 3 again?
It must start from the beginning, right? But that is no longer in temporal
order by your definition either.
And if it returns nothing then there is no way to recover them except on device
reset?
FWIW I'm altering the patch set to do what you say and allocate a buffer large
enough to get all the records, because I am thinking you are correct.
However, considering the buffer may be large, I fear we may run afoul of memory
allocation failures. And that will require some more tricky error recovery to
continue reading the log because the irq settings state:
"... Settings: Specifies the settings for the interrupt when the <event> event
log transitions from having no entries to having one or more entries."
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This means that no more interrupts will happen until the log is empty and
additional events occur. So if an allocation failure happens I'll have to put
a task on a work queue to wake up and continue to try. Otherwise the log will
stall. Or we could just put a WARN_ON_ONCE() in and hope this never happens...
I still believe that, with a clear operation defined, my method makes more sense.
But I agree with you that the language is not strong.
:-(
> > > Record 3 is never handled.
> > >
> > > If we add in clearing as happens later in the series,
> >
> > I suppose I should squash the patches as this may not work without the
> > clearing. :-/
> >
> > > the current
> > > assumption is that if we clear some records a subsequent read will
> > > start again. I'm not sure that is true. If it is spec reference needed.
> > >
> > > So assumption is
> > > Read 1: Records 0, 1, 2, 3 More set
> > > Clear 0, 1, 2
> > > Read 2: Records 3, 4, 5, 6
> > > Clear 3, 4, 5 More not set, but catch it with the condition above.
> > > Read 3: 6 only
> > > Clear 6
> > >
> > > However, I think a valid implementation could do the following
> > > (imagine a ring buffer with a pointer to the 'next' record to read out and
> > > each record has a 'valid' flag to deal with corner cases around
> > > sequences such as read log once, start reading again and some
> > > clears occur using handles obtained from first read - note that
> > > that case isn't ruled out by the spec as far as I can see).
> >
> > I believe this is a violation because the next pointer can't be advanced until
> > the record is cleared. Otherwise the device is not returning items in temporal
> > order based on what is in the log.
>
> Ah. This is where we disagree. The temporal order is (potentially?) unconnected
> from the clearing. The device did return them in temporal order, we just didn't
> take any notice of record 3 being returned.
:-/
> A valid reading of that temporal order comment is actually the other way around
> that the device must not reset its idea of temporal order until all records
> have been read (reading 3 twice is not in temporal order - imagine we had
> read 5 each time and it becomes more obvious as the read order becomes
> 0,1,2,3,4,3,4,5,6,7 etc which is clearly not in temporal order by any normal
> reading of the term).
Well I guess. My reading was that it must return the first element temporally
within the list at the time of the Get operation.
So in this example since 3 is still in the list it must return it first. Each
read is considered atomic from the others. Yes as long as 0 is in the queue it
will be returned.
But I can see it your way too...
>
> The more I read this, the more I think the current implementation
> is not compliant with the specification at all.
>
> I'm not seeing a spec mention of 'reseting' the ordering on clearing records
> (which might have been a good thing in the first place but too late now).
There is no resetting of order. Only that the device does not consider the
previous reads when determining which events to return on any individual Get
call.
>
> >
> > >
> > > Read 1: Records 0, 1, 2, 3 More set. 'next' pointer points to record 4.
> > > Clear 0, 1, 2
> > > Read 2: Records 4, 5, 6 More not set. 'next' pointer points to record 7.
> > > Clear 4, 5, 6
> > >
> > > Skipping record 3.
> > >
> > > So I think we have to absorb the full mailbox payload each time to guarantee
> > > we don't skip events or process them out of order (which is what would happen
> > > if we relied on a retry loop - we aren't allowed to clear them out of
> > > order anyway 8.2.9.2.3 "Events shall be cleared in temporal order. The device
> > > shall verify the event record handles specified in the input payload are in
> > > temporal order. ... ").
> > > Obviously that temporal order thing is only relevant if we get my second
> > > example occurring on real hardware. I think the spec is vague enough
> > > to allow that implementation. Would have been easy to specify this originally
> > > but it probably won't go in as errata so we need to cope with all the
> > > flexibility that is present.
> >
> > :-( Yea coulda, woulda, shoulda... ;-)
> >
> > >
> > > What fun and oh for a parameter to control how many records are returned!
> >
> > Yea. But I really don't think there is a problem unless someone really takes
> > liberties with the spec. I think it boils down to how one interprets _when_ a
> > record is removed from the log.
>
> This is nothing to do with removal. The wording we have is just about reading
> and I think a strict reading of the spec would say your assumption of a reset of the
> read pointer on clear is NOT a valid implementation. There is separate wording
> about clears being in temporal order, but that doesn't affect the Get Event
> Records handling.
>
> >
> > If the record is removed when it is returned (as in your 'next' pointer
> > example) then why have a clear at all?
>
> Because if your software crashes, you don't have a handshake to reestablish
> state. If that happens you read the whole log until MORE is not set and
> then read it again to get a clean list. It's a messy situation that has
> been discussed before for GET POISON LIST which has the same nasty handling
> of MORE. (look in appropriate forum for resolution to that one that we can't
> yet discuss here!)
I can see the similarities, but I think events are a more ephemeral item which
it makes sense to clear once they are consumed. The idea that they should be
left for others to consume does not make sense to me. Whereas poison is
something which could be a permanent marker and should be left in a list.
>
> Also, allows for non destructive readback (debugging tools might take a look
> having paused the normal handling).
That is true.
>
> > If my interpretation is correct then
> > the next available entry is the one which has not been cleared.
>
> If that is the case the language in "More Event Records" doesn't work
> "The host should continue to retrieve records using this command, until
> this indicator is no longer set by the device"
>
> With your reading of the spec, if we clear nothing, we'd keep getting the
> first set of records and only be able to read more by clearing them...
>
Yea.
>
> > Therefore in
> > your example 'next' is not incremented until clear has been called. I think
> > that implementation is also supported by the idea that records must be cleared
> > in temporal order. Otherwise I think devices would get confused.
>
> Not hard for device to do this (how I now read the spec) properly.
>
> Two pointers:
> 1) Next to clear: CLEAR
> 2) Next to read: READ
>
> Advance the READ pointer on Get Event Records
And loop back to the start on a further read... I'm looking at changing the
code for this, but I think making it fully robust under a memory allocation
failure is going to be tedious unless we punt.
> For CLEAR, check that the requested clears are handled in order and that
> they are before the READ pointer.
>
> Maybe we should just take it to appropriate spec forum to seek a clarification?
Probably. I've not paid attention lately.
I've sent a separate email with you cc'ed. Perhaps we can get some
clarification before I completely rework this.
Ira
>
> Jonathan
>
> >
> > FWIW the qemu implementation is based on my interpretation ATM.
> >
> > Ira
> >
> > >
> > > Jonathan
> > >
> > >
> > > >
> > > > >
> > > > > > + payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> > > > > > +}
> > > > >
> > >
> > > >
> > >
>
On Mon, 28 Nov 2022 15:30:12 -0800
Ira Weiny <[email protected]> wrote:
> On Mon, Nov 21, 2022 at 10:47:14AM +0000, Jonathan Cameron wrote:
> > On Fri, 18 Nov 2022 15:26:17 -0800
> > Ira Weiny <[email protected]> wrote:
> >
> > > On Thu, Nov 17, 2022 at 10:43:37AM +0000, Jonathan Cameron wrote:
> > > > On Wed, 16 Nov 2022 16:47:20 -0800
> > > > Ira Weiny <[email protected]> wrote:
> > > >
> > > >
> > >
> > > [snip]
> > >
> > > > >
> > > > > >
> > > > > > > + int i;
> > > > > > > +
> > > > > > > + for (i = 0; i < nr_rec; i++)
> > > > > > > + trace_cxl_generic_event(dev_name(cxlds->dev),
> > > > > > > + type,
> > > > > > > + &payload.record[i]);
> > > > > > > + }
> > > > > > > +
> > > > > > > + if (trace_cxl_overflow_enabled() &&
> > > > > > > + (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW))
> > > > > > > + trace_cxl_overflow(dev_name(cxlds->dev), type, &payload);
> > > > > > > +
> > > > > > > + } while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||
> > > > > >
> > > > > > Isn't pl_nr > CXL_GET_EVENT_NR_RECORDS a hardware bug? It's the number in returned
> > > > > > payload not the total number.
> > > > >
> > > > > I don't think so. The only value passed to the device is the _input_ payload
> > > > > size. The output payload size is not passed to the device and is not included
> > > > > in the Get Event Records Input Payload. (Table 8-49)
> > > > >
> > > > > So my previous code was wrong. Here is an example I think which is within the
> > > > > spec but would result in the more records flag not being set.
> > > > >
> > > > > Device log depth == 10
> > > > > nr log entries == 7
> > > > > nr log entries in 1MB ~= (1M - hdr size) / 128 ~= 8000
> > > > >
> > > > > Device sets Output Payload.Event Record Count == 7 (which is < 8000). Common
> > > > > mailbox code truncates that to 3. More Event Records == 0 because it sent all
> > > > > 7 that it had.
> > > > >
> > > > > This code will clear 3 and read again 2 more times.
> > > > >
> > > > > Am I reading that wrong?
> > > >
> > > > I think this is still wrong, but for a different reason. :)
> > >
> > > I hope not... :-/
> > >
> > > > If we don't clear the records and more records is set, that means they didn't
> > > > fit in the mailbox payload (potentially 1MB), and the next read
> > > > will return the next set of records from there.
> > >
> > > That is not how I read the Get Event Records command:
> > >
> > > From 8.2.9.2.2 Get Event Records
> > >
> > > ... "Devices shall return event records to the host in the temporal order the
> > > device detected the events in. The event occurring the earliest in time, in the
> > > specific event log, shall be returned first."
> > >
> > > If item 3 below is earlier than 4 then it must be returned if we have not
> > > cleared it. At least that is how I read the above. :-/
> >
> > In general that doesn't work. Imagine we cleared no records.
> > In that case we'd return 4 despite there being earlier records.
> > There is no language to cover this particular case of clearing
> > part of what was returned. The device did return the records
> > in temporal order, we just didn't notice some of them.
> >
> > The wonders of slightly loose spec wording. Far as I can tell
> > we are stuck with having to come up with all things that could be
> > read as being valid implementations.
>
> So I've been thinking about this for a while.
>
> Let's take this example:
>
> > > >
> > > > Taking this patch only, let's say the mailbox takes 4 records.
> > > > Read 1: Records 0, 1, 2, 3 More set.
> > > > We handle 0, 1, 2
> > > > Read 2: Records 4, 5, 6 More not set.
> > > > We handle 4, 5, 6
> > > >
>
> In this case what happens if you do a 3rd read? Does the device return
> nothing? Or does it return 0, 1, 2, 3 again?
>
> It must start from the beginning, right? But that is no longer in temporal
> order by your definition either.
Agreed that is not clearly specified either. I assume it works the same
way as poison, where we raised the question and the conclusion was that it
starts again at the beginning. In fact we have to loop twice to guarantee that
we have all the records (as other software may have crashed halfway through
reading the poison list, so we don't know if we have the first record or not).
>
> And if it returns nothing then there is no way to recover them except on device
> reset?
>
> FWIW I'm altering the patch set to do what you say and allocate a buffer large
> enough to get all the records, because I am thinking you are correct.
Horrible, but maybe the best we can do (subject to suggested hack below ;)
>
> However, considering the buffer may be large, I fear we may run afoul of memory
> allocation failures. And that will require some more tricky error recovery to
> continue reading the log because the irq settings state:
>
We could implement cleverer mailbox handling to avoid the large allocation requirement.
Would be messy though as we'd effectively have to lock the mailbox whilst we did
multiple reads of the content into a smaller buffer.
> "... Settings: Specifies the settings for the interrupt when the <event> event
> log transitions from having no entries to having one or more entries."
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> This means that no more interrupts will happen until the log is empty and
> additional events occur. So if an allocation failure happens I'll have to put
> a task on a work queue to wake up and continue to try. Otherwise the log will
> stall. Or we could just put a WARN_ON_ONCE() in and hope this never happens...
I think the WARN_ON_ONCE() is probably fine. If we are paranoid, vmalloc one
when we initially connect the device, as failure is less likely then...
As a side note, seems like we should maybe take a request to SSWG for devices
to optionally be told to use a smaller mailbox than they support - in order to allow
for corners like this. There is such a command but it's prohibited on primary and
secondary mailboxes (Set Response Message Limit). That is allowed on switch CCIs,
I guess because it is assumed they may be connected to a BMC without much memory.
>
> I still believe that, with a clear operation defined, my method makes more sense.
> But I agree with you that the language is not strong.
Absolutely agree! Your method would be the one I'd push for if we were starting
from scratch (or another similar method looking like what I can't talk about
for a similar case...)
>
> :-(
>
> > > > Record 3 is never handled.
> > > >
> > > > If we add in clearing as happens later in the series,
> > >
> > > I suppose I should squash the patches as this may not work without the
> > > clearing. :-/
> > >
> > > > the current
> > > > assumption is that if we clear some records a subsequent read will
> > > > start again. I'm not sure that is true. If it is spec reference needed.
> > > >
> > > > So assumption is
> > > > Read 1: Records 0, 1, 2, 3 More set
> > > > Clear 0, 1, 2
> > > > Read 2: Records 3, 4, 5, 6
> > > > Clear 3, 4, 5 More not set, but catch it with the condition above.
> > > > Read 3: 6 only
> > > > Clear 6
> > > >
> > > > However, I think a valid implementation could do the following
> > > > (imagine a ring buffer with a pointer to the 'next' record to read out and
> > > > each record has a 'valid' flag to deal with corner cases around
> > > > sequences such as read log once, start reading again and some
> > > > clears occur using handles obtained from first read - note that
> > > > that case isn't ruled out by the spec as far as I can see).
> > >
> > > I believe this is a violation because the next pointer can't be advanced until
> > > the record is cleared. Otherwise the device is not returning items in temporal
> > > order based on what is in the log.
> >
> > Ah. This is where we disagree. The temporal order is (potentially?) unconnected
> > from the clearing. The device did return them in temporal order, we just didn't
> > take any notice of record 3 being returned.
>
> :-/
>
> > A valid reading of that temporal order comment is actually the other way around
> > that the device must not reset its idea of temporal order until all records
> > have been read (reading 3 twice is not in temporal order - imagine we had
> > read 5 each time and it becomes more obvious as the read order becomes
> > 0,1,2,3,4,3,4,5,6,7 etc which is clearly not in temporal order by any normal
> > reading of the term).
>
> Well I guess. My reading was that it must return the first element temporally
> within the list at the time of the Get operation.
>
> So in this example since 3 is still in the list it must return it first. Each
> read is considered atomic from the others. Yes as long as 0 is in the queue it
> will be returned.
>
> But I can see it your way too...
That pesky text under the More Event Records flag doesn't mention clearing when it
says "The host should continue to retrieve
records using this command, until this indicator is no longer set by the
device."
I wish it did :(
>
> >
> > The more I read this, the more I think the current implementation
> > is not compliant with the specification at all.
> >
> > I'm not seeing a spec mention of 'reseting' the ordering on clearing records
> > (which might have been a good thing in the first place but too late now).
>
> There is no resetting of order. Only that the device does not consider the
> previous reads when determining which events to return on any individual Get
> call.
Sure, see above quote though.
>
> >
> > >
> > > >
> > > > Read 1: Records 0, 1, 2, 3 More set. 'next' pointer points to record 4.
> > > > Clear 0, 1, 2
> > > > Read 2: Records 4, 5, 6 More not set. 'next' pointer points to record 7.
> > > > Clear 4, 5, 6
> > > >
> > > > Skipping record 3.
> > > >
> > > > So I think we have to absorb the full mailbox payload each time to guarantee
> > > > we don't skip events or process them out of order (which is what would happen
> > > > if we relied on a retry loop - we aren't allowed to clear them out of
> > > > order anyway 8.2.9.2.3 "Events shall be cleared in temporal order. The device
> > > > shall verify the event record handles specified in the input payload are in
> > > > temporal order. ... ").
> > > > Obviously that temporal order thing is only relevant if we get my second
> > > > example occurring on real hardware. I think the spec is vague enough
> > > > to allow that implementation. Would have been easy to specify this originally
> > > > but it probably won't go in as errata so we need to cope with all the
> > > > flexibility that is present.
> > >
> > > :-( Yea coulda, woulda, shoulda... ;-)
> > >
> > > >
> > > > What fun and oh for a parameter to control how many records are returned!
> > >
> > > Yea. But I really don't think there is a problem unless someone really takes
> > > liberties with the spec. I think it boils down to how one interprets _when_ a
> > > record is removed from the log.
> >
> > This is nothing to do with removal. The wording we have is just about reading
> > and I think a strict reading of the spec would say your assumption of a reset of the
> > read pointer on clear is NOT a valid implementation. There is separate wording
> > about clears being in temporal order, but that doesn't affect the Get Event
> > Records handling.
> >
> > >
> > > If the record is removed when it is returned (as in your 'next' pointer
> > > example) then why have a clear at all?
> >
> > Because if your software crashes, you don't have a handshake to reestablish
> > state. If that happens you read the whole log until MORE is not set and
> > then read it again to get a clean list. It's a messy situation that has
> > been discussed before for GET POISON LIST which has the same nasty handling
> > of MORE. (look in appropriate forum for resolution to that one that we can't
> > yet discuss here!)
>
> I can see the similarities, but I think events are a more ephemeral item which
> it makes sense to clear once they are consumed. The idea that they should be
> left for others to consume does not make sense to me. Whereas poison is
> something which could be a permanent marker and should be left in a list.
Agreed - but both sections use the same wording for the More flag, so we need
to interpret it the same way.
>
> >
> > Also, allows for non destructive readback (debugging tools might take a look
> > having paused the normal handling).
>
> That is true.
>
> >
> > > If my interpretation is correct then
> > > the next available entry is the one which has not been cleared.
> >
> > If that is the case the language in "More Event Records" doesn't work
> > "The host should continue to retrieve records using this command, until
> > this indicator is no longer set by the device"
> >
> > With your reading of the spec, if we clear nothing, we'd keep getting the
> > first set of records and only be able to read more by clearing them...
> >
>
> Yea.
>
> >
> > > Therefore in
> > > your example 'next' is not incremented until clear has been called. I think
> > > that implementation is also supported by the idea that records must be cleared
> > > in temporal order. Otherwise I think devices would get confused.
> >
> > Not hard for device to do this (how I now read the spec) properly.
> >
> > Two pointers:
> > 1) Next to clear: CLEAR
> > 2) Next to read: READ
> >
> > Advance the READ pointer on Get Event Records
>
> And loop back to the start on a further read... I'm looking at changing the
> code for this, but I think making it fully robust under a memory allocation
> failure is going to be tedious unless we punt.
If we get a memory allocation failure, perhaps we could do the following
horrible hack:
1. Allocate a small buffer.
2. Read once.
3. Hopefully we get the full record - in which case, success.
4. Clear those records.
5. If we have not dealt with all the records, read again until More Event
   Records is not set (it may already be clear if everything fitted in the
   buffer).
6. Go back to 2.
If we think a valid implementation might reset the read pointer on clear, then
there is a variant where we make use of the fact that the handles are constant
- read 3 records, clear 2, and then use the handle of the remaining one to
identify whether we have the next 3 to clear or not...
>
> > For CLEAR, check that the requested clears are handled in order and that
> > they are before the READ pointer.
> >
> > Maybe we should just take it to appropriate spec forum to seek a clarification?
>
> Probably. I've not paid attention lately.
>
> I've sent a separate email with you cc'ed. Perhaps we can get some
> clarification before I completely rework this.
Fingers crossed.
Thanks,
Jonathan
>
> Ira
>
> >
> > Jonathan
> >
> > >
> > > FWIW the qemu implementation is based on my interpretation ATM.
> > >
> > > Ira
> > >
> > > >
> > > > Jonathan
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > > + payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> > > > > > > +}
> > > > > >
> > > >
> > > > >
> > > >
> >
On Tue, Nov 29, 2022 at 12:26:20PM +0000, Jonathan Cameron wrote:
> On Mon, 28 Nov 2022 15:30:12 -0800
> Ira Weiny <[email protected]> wrote:
>
[snip]
> > > A valid reading of that temporal order comment is actually the other way around
> > > that the device must not reset it's idea of temporal order until all records
> > > have been read (reading 3 twice is not in temporal order - imagine we had
> > > read 5 each time and it becomes more obvious as the read order becomes
> > > 0,1,2,3,4,3,4,5,6,7 etc which is clearly not in temporal order by any normal
> > > reading of the term.
> >
> > Well I guess. My reading was that it must return the first element temporally
> > within the list at the time of the Get operation.
> >
> > So in this example since 3 is still in the list it must return it first. Each
> > read is considered atomic from the others. Yes as long as 0 is in the queue it
> > will be returned.
> >
> > But I can see it your way too...
>
> That pesky text under More Event Records flag doesn't mention clearing when it
> says "The host should continue to retrieve
> records using this command, until this indicator is no longer set by the
> device."
>
> I wish it did :(
>
As I have reviewed these in my head again I have come to the conclusion that
the More Event Records flag is useless. Let me explain:
The Clear All Records flag is useless because an event which occurs between
the Get and the Clear All operations will be dropped without the host having
seen it.
However, while clearing records based on the handles read, additional events
could come in. Because of the way the interrupts are specified the host can't
be sure that those new events will cause a zero to non-zero transition. This
is because there is no way to guarantee all the events were cleared at the
moment the events came in.
I believe this is what you mentioned in another email about needing an 'extra
read' at the end to ensure there was nothing more to be read. But based on
that logic the only thing that matters is the Get Events output Event Record
Count. If it is not 0, keep on reading, because while the host is clearing the
records another event could come in.
In other words, the only way to be sure that all records are seen is to do a
Get and see the number of records equal to 0. Thus any further events will
trigger an interrupt and we can safely exit the loop.
Ira
Basically the loop looks like:

	int nr_rec;

	do {
		... <Get Events> ...
		nr_rec = le16_to_cpu(payload->record_count);
		... <for each record trace> ...
		... <for each record clear> ...
	} while (nr_rec);
On Thu, Nov 17, 2022 at 11:22:35AM +0000, Jonathan Cameron wrote:
> On Wed, 16 Nov 2022 17:23:58 -0800
> Ira Weiny <[email protected]> wrote:
>
> > On Wed, Nov 16, 2022 at 03:35:28PM +0000, Jonathan Cameron wrote:
> > > On Thu, 10 Nov 2022 10:57:54 -0800
> > > [email protected] wrote:
> > >
> > > > From: Ira Weiny <[email protected]>
> > > >
> > > > CXL rev 3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.
> > > >
> > > > Determine if the event read is memory module record and if so trace the
> > > > record.
> > > >
> > > > Signed-off-by: Ira Weiny <[email protected]>
> > > >
> > > Noticed that we have a mixture of fully capitalized and not for flags.
> > > With that either explained or tidied up:
> > >
> > > Reviewed-by: Jonathan Cameron <[email protected]>
> > >
> > > > +/*
> > > > + * Device Health Information - DHI
> > > > + *
> > > > + * CXL res 3.0 section 8.2.9.8.3.1; Table 8-100
> > > > + */
> > > > +#define CXL_DHI_HS_MAINTENANCE_NEEDED BIT(0)
> > > > +#define CXL_DHI_HS_PERFORMANCE_DEGRADED BIT(1)
> > > > +#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED BIT(2)
> > > > +#define show_health_status_flags(flags) __print_flags(flags, "|", \
> > > > + { CXL_DHI_HS_MAINTENANCE_NEEDED, "Maintenance Needed" }, \
> > > > + { CXL_DHI_HS_PERFORMANCE_DEGRADED, "Performance Degraded" }, \
> > > > + { CXL_DHI_HS_HW_REPLACEMENT_NEEDED, "Replacement Needed" } \
> > >
> > > Why are we sometime using capitals for flags (e.g patch 5) and not other times?
> >
> > Not sure what you mean. Do you mean this from patch 5?
> Nope
>
> +#define CXL_DPA_VOLATILE BIT(0)
> +#define CXL_DPA_NOT_REPAIRABLE BIT(1)
> +#define show_dpa_flags(flags) __print_flags(flags, "|", \
> + { CXL_DPA_VOLATILE, "VOLATILE" }, \
> + { CXL_DPA_NOT_REPAIRABLE, "NOT_REPAIRABLE" } \
> +)
> +
>
> Where they are all capitals. I thought that was maybe a flags vs other fields
> thing but it doesn't seem to be.
I've now made all flags capital on this and other patches.
Ira
>
>
> >
> > ...
> > { CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT, "Uncorrectable Event" }, \
> > { CXL_GMER_EVT_DESC_THRESHOLD_EVENT, "Threshold event" }, \
> > { CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW, "Poison List Overflow" } \
> > ...
> >
> > Threshold event was a mistake. This is the capitalization the spec uses.
> >
> > Bit[0]: Uncorrectable Event: When set, indicates the reported event is
> > ^^^^^^^^^^^^^^^^^^^
> > uncorrectable by the device. When cleared, indicates the reported
> > event was corrected by the device.
> >
> > Bit[1]: Threshold Event: When set, the event is the result of a
> > ^^^^^^^^^^^^^^^
> > threshold on the device having been reached. When cleared, the event
> > is not the result of a threshold limit.
> >
> > Bit[2]: Poison List Overflow Event: When set, the Poison List has
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^
> > overflowed, and this event is not in the Poison List. When cleared, the
> > Poison List has not overflowed.
> >
> >
> > I'll update this 'Event' in patch 5. Probably need to add 'Event' to the
> > Poison List...
> >
> > Ira
>
On Wed, Nov 16, 2022 at 02:40:21PM +0000, Jonathan Cameron wrote:
> On Thu, 10 Nov 2022 10:57:55 -0800
> [email protected] wrote:
>
> > From: Ira Weiny <[email protected]>
> >
> > CXL device events are signaled via interrupts. Each event log may have
> > a different interrupt message number. These message numbers are
> > reported in the Get Event Interrupt Policy mailbox command.
> >
> > Add interrupt support for event logs. Interrupts are allocated as
> > shared interrupts. Therefore, all or some event logs can share the same
> > message number.
> >
> > The driver must deal with the possibility that dynamic capacity is not
> > yet supported by a device it sees. Fallback and retry without dynamic
> > capacity if the first attempt fails.
> >
> > Device capacity event logs interrupt as part of the informational event
> > log. Check the event status to see which log has data.
> >
> > Signed-off-by: Ira Weiny <[email protected]>
> >
> Hi Ira,
>
> A few comments inline.
Thanks for the review!
>
> Thanks,
>
> Jonathan
>
> > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > index 879b228a98a0..1e6762af2a00 100644
> > --- a/drivers/cxl/core/mbox.c
> > +++ b/drivers/cxl/core/mbox.c
>
> > /**
> > * cxl_mem_get_event_records - Get Event Records from the device
> > @@ -867,6 +870,52 @@ void cxl_mem_get_event_records(struct cxl_dev_state *cxlds)
> > }
> > EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
> >
> > +int cxl_event_config_msgnums(struct cxl_dev_state *cxlds)
> > +{
> > + struct cxl_event_interrupt_policy *policy = &cxlds->evt_int_policy;
> > + size_t policy_size = sizeof(*policy);
> > + bool retry = true;
> > + int rc;
> > +
> > + policy->info_settings = CXL_INT_MSI_MSIX;
> > + policy->warn_settings = CXL_INT_MSI_MSIX;
> > + policy->failure_settings = CXL_INT_MSI_MSIX;
> > + policy->fatal_settings = CXL_INT_MSI_MSIX;
> > + policy->dyn_cap_settings = CXL_INT_MSI_MSIX;
> > +
> > +again:
> > + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_SET_EVT_INT_POLICY,
> > + policy, policy_size, NULL, 0);
> > + if (rc < 0) {
> > + /*
> > + * If the device does not support dynamic capacity it may fail
> > + * the command due to an invalid payload. Retry without
> > + * dynamic capacity.
> > + */
>
> There are a number of ways to discover if DCD is supported that aren't based
> on try and retry like this. 9.13.3 has "basic sequence to utilize Dynamic Capacity"
> That calls out:
> Verify the necessary Dynamic Capacity commands are returned in the CEL.
>
> First I'm not sure we should set the interrupt on for DCD until we have a lot
> more of the flow handled, secondly even then we should figure out if it is supported
> at a higher level than this command and pass that info down here.
I'm not sure I really agree. The events are just traced, so this functionality
is really orthogonal to whether any other DCD support is present. Regardless,
as I said on the call, I think deferring this is the right way to go for now.
>
>
> > + if (retry) {
> > + retry = false;
> > + policy->dyn_cap_settings = 0;
> > + policy_size = sizeof(*policy) - sizeof(policy->dyn_cap_settings);
> > + goto again;
> > + }
> > + dev_err(cxlds->dev, "Failed to set event interrupt policy : %d",
> > + rc);
> > + memset(policy, CXL_INT_NONE, sizeof(*policy));
>
> Relying on all the fields being 1 byte is a bit error prone. I'd just set them all
> individually in the interests of more readable code.
Done.
>
> > + return rc;
> > + }
> > +
> > + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVT_INT_POLICY, NULL, 0,
> > + policy, policy_size);
>
> Add a comment on why you are reading this back (to get the msgnums in the upper
> bits) as it's not obvious to a casual reader.
Done.
>
> > + if (rc < 0) {
> > + dev_err(cxlds->dev, "Failed to get event interrupt policy : %d",
> > + rc);
> > + return rc;
> > + }
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cxl_event_config_msgnums, CXL);
> > +
>
> ...
>
> > diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> > index e0d511575b45..64b2e2671043 100644
> > --- a/drivers/cxl/pci.c
> > +++ b/drivers/cxl/pci.c
> > @@ -458,6 +458,138 @@ static void cxl_pci_alloc_irq_vectors(struct cxl_dev_state *cxlds)
> > cxlds->nr_irq_vecs = nvecs;
> > }
> >
> > +struct cxl_event_irq_id {
> > + struct cxl_dev_state *cxlds;
> > + u32 status;
> > + unsigned int msgnum;
> msgnum is only here for freeing the interrupt - I'd rather we fixed
> that by using standard infrastructure (or adding some - see below).
>
> status is an indirect way of allowing us to share an interrupt handler.
> You could do that by registering a trivial wrapper for each instead.
> Then all you have left is the cxl_dev_state which could be passed
> in directly as the callback parameter removing need to have this
> structure at all. I think that might be neater.
It does avoid allocating this structure, which I like.
I've made the change.
>
> > +};
> > +
> > +static irqreturn_t cxl_event_int_thread(int irq, void *id)
> > +{
> > + struct cxl_event_irq_id *cxlid = id;
> > + struct cxl_dev_state *cxlds = cxlid->cxlds;
> > +
> > + if (cxlid->status & CXLDEV_EVENT_STATUS_INFO)
> > + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_INFO);
> > + if (cxlid->status & CXLDEV_EVENT_STATUS_WARN)
> > + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_WARN);
> > + if (cxlid->status & CXLDEV_EVENT_STATUS_FAIL)
> > + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FAIL);
> > + if (cxlid->status & CXLDEV_EVENT_STATUS_FATAL)
> > + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_FATAL);
> > + if (cxlid->status & CXLDEV_EVENT_STATUS_DYNAMIC_CAP)
> > + cxl_mem_get_records_log(cxlds, CXL_EVENT_TYPE_DYNAMIC_CAP);
> > +
> > + return IRQ_HANDLED;
> > +}
> > +
> > +static irqreturn_t cxl_event_int_handler(int irq, void *id)
> > +{
> > + struct cxl_event_irq_id *cxlid = id;
> > + struct cxl_dev_state *cxlds = cxlid->cxlds;
> > + u32 status = readl(cxlds->regs.status + CXLDEV_DEV_EVENT_STATUS_OFFSET);
> > +
> > + if (cxlid->status & status)
> > + return IRQ_WAKE_THREAD;
> > + return IRQ_HANDLED;
>
> If status is not set, return IRQ_NONE.
> Ah. I see Dave raised this as well.
Yep done.
>
> > +}
>
> ...
>
> > +static int cxl_request_event_irq(struct cxl_dev_state *cxlds,
> > + enum cxl_event_log_type log_type,
> > + u8 setting)
> > +{
> > + struct device *dev = cxlds->dev;
> > + struct pci_dev *pdev = to_pci_dev(dev);
> > + struct cxl_event_irq_id *id;
> > + unsigned int msgnum = CXL_EVENT_INT_MSGNUM(setting);
> > + int irq;
> > +
> > + /* Disabled irq is not an error */
> > + if (!cxl_evt_int_is_msi(setting) || msgnum > cxlds->nr_irq_vecs) {
>
> I don't think that second condition can occur. The language under table 8-52
> (I think) means that it will move around if there aren't enough vectors
> (for MSI - MSI-X is more complex, but the result is the same).
Based on the other review this is now just a bool msi_enabled, which is used to
determine whether this should be set up at all.
>
> > + dev_dbg(dev, "Event interrupt not enabled; %s %u %d\n",
> > + cxl_event_log_type_str(CXL_EVENT_TYPE_INFO),
> > + msgnum, cxlds->nr_irq_vecs);
> > + return 0;
> > + }
> > +
> > + id = devm_kzalloc(dev, sizeof(*id), GFP_KERNEL);
> > + if (!id)
> > + return -ENOMEM;
> > +
> > + id->cxlds = cxlds;
> > + id->msgnum = msgnum;
> > + id->status = log_type_to_status(log_type);
> > +
> > + irq = pci_request_irq(pdev, id->msgnum, cxl_event_int_handler,
> > + cxl_event_int_thread, id,
> > + "%s:event-log-%s", dev_name(dev),
> > + cxl_event_log_type_str(log_type));
> > + if (irq)
> > + return irq;
> > +
> > + devm_add_action_or_reset(dev, cxl_free_event_irq, id);
>
> Hmm, there's no pcim_request_irq(); maybe this is the time to propose one
> (separately from this patch so we don't get delayed by it!)
Perhaps. But not tonight... ;-)
>
> We discussed this way back in DOE series (I'd forgotten but lore found
> it for me). There I suggested just calling
> devm_request_threaded_irq() directly as a work around.
Yeah, that works fine. One issue is that we lose the printf-style formatting of the irq names:
...
29: ... PCI-MSI 100663300-edge 0000:c0:00.0:event-log-Fatal
30: ... PCI-MSI 100663301-edge 0000:c0:00.0:event-log-Failure
31: ... PCI-MSI 100663302-edge 0000:c0:00.0:event-log-Warning
32: ... PCI-MSI 100663303-edge 0000:c0:00.0:event-log-Informational
...
Thanks,
Ira
>
> > + return 0;
> > +}
> > +
> > +static void cxl_event_irqsetup(struct cxl_dev_state *cxlds)
> > +{
> > + struct device *dev = cxlds->dev;
> > + u8 setting;
> > +
> > + if (cxl_event_config_msgnums(cxlds))
> > + return;
> > +
> > + /*
> > + * Dynamic Capacity shares the info message number
> > + * Nothing to be done except check the status bit in the
> > + * irq thread.
> > + */
> > + setting = cxlds->evt_int_policy.info_settings;
> > + if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_INFO, setting))
> > + dev_err(dev, "Failed to get interrupt for %s event log\n",
> > + cxl_event_log_type_str(CXL_EVENT_TYPE_INFO));
> > +
> > + setting = cxlds->evt_int_policy.warn_settings;
> > + if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_WARN, setting))
> > + dev_err(dev, "Failed to get interrupt for %s event log\n",
> > + cxl_event_log_type_str(CXL_EVENT_TYPE_WARN));
> > +
> > + setting = cxlds->evt_int_policy.failure_settings;
> > + if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_FAIL, setting))
> > + dev_err(dev, "Failed to get interrupt for %s event log\n",
> > + cxl_event_log_type_str(CXL_EVENT_TYPE_FAIL));
> > +
> > + setting = cxlds->evt_int_policy.fatal_settings;
> > + if (cxl_request_event_irq(cxlds, CXL_EVENT_TYPE_FATAL, setting))
> > + dev_err(dev, "Failed to get interrupt for %s event log\n",
> > + cxl_event_log_type_str(CXL_EVENT_TYPE_FATAL));
> > +}
>
On Tue, 29 Nov 2022 21:09:58 -0800
Ira Weiny <[email protected]> wrote:
> On Tue, Nov 29, 2022 at 12:26:20PM +0000, Jonathan Cameron wrote:
> > On Mon, 28 Nov 2022 15:30:12 -0800
> > Ira Weiny <[email protected]> wrote:
> >
>
> [snip]
>
> > > > A valid reading of that temporal order comment is actually the other way around:
> > > > that the device must not reset its idea of temporal order until all records
> > > > have been read. (Reading 3 twice is not in temporal order - imagine we had
> > > > read 5 each time, and it becomes more obvious as the read order becomes
> > > > 0,1,2,3,4,3,4,5,6,7 etc., which is clearly not in temporal order by any
> > > > normal reading of the term.)
> > >
> > > Well I guess. My reading was that it must return the first element temporally
> > > within the list at the time of the Get operation.
> > >
> > > So in this example since 3 is still in the list it must return it first. Each
> > > read is considered atomic from the others. Yes as long as 0 is in the queue it
> > > will be returned.
> > >
> > > But I can see it your way too...
> >
> > That pesky text under More Event Records flag doesn't mention clearing when it
> > says "The host should continue to retrieve
> > records using this command, until this indicator is no longer set by the
> > device."
> >
> > I wish it did :(
> >
>
> As I have reviewed these in my head again I have come to the conclusion that
> the More Event Records flag is useless. Let me explain:
>
> The Clear all Records flag is useless because an event which occurs between the
> Get and Clear all operations will be dropped without the host having seen it.
It can still be used to get a known clean sheet if you don't care about a bunch
of records at initial boot, because there is no data in flight yet, etc.
Agreed it is of no use if you care about the content of the records.
Make sure interrupts are enabled before re-checking whether there are new
records, to close that race.
>
> However, while clearing records based on the handles read, additional events
> could come in. Because of the way the interrupts are specified the host can't
> be sure that those new events will cause a zero to non-zero transition. This
> is because there is no way to guarantee all the events were cleared at the
> moment the events came in.
>
> I believe this is what you mentioned in another email about needing an 'extra
> read' at the end to ensure there was nothing more to be read. But based on
> that logic the only thing that matters is the Get Event.Record
> Count. If it is not 0 keep on reading because while the host is clearing the
> records another event could come in.
>
> In other words, the only way to be sure that all records are seen is to do a
> Get and see the number of records equal to 0. Thus any further events will
> trigger an interrupt and we can safely exit the loop.
Agreed - a standard race to close whenever we have a FIFO with edge interrupts
on how full it is.
The More Event Records flag is useful for a different potential pattern: a
non-destructive read with a later clear, or a non-destructive read for debug.
	int nr_rec;
	<list>

round_we_go:
	do {
		... <for each record: trace and add to list> ...
	} while (!MORE);

	for_each_list_entry() {
		/* clear records one at a time */
	}

	nr_rec = le16_to_cpu(payload->record_count);
	if (nr_rec)
		goto round_we_go;
	...
>
> Ira
>
> Basically the loop looks like:
>
> 	int nr_rec;
>
> 	do {
> 		... <Get Events> ...
>
> 		nr_rec = le16_to_cpu(payload->record_count);
>
> 		... <for each record trace> ...
> 		... <for each record clear> ...
>
> 	} while (nr_rec);
>