2024-05-07 19:07:57

by Fabio M. De Francesco

Subject: [RFC PATCH 0/3] ACPI: extlog: Log and trace similarly to GHES

When Firmware First is enabled, the BIOS handles errors first and then
makes them available to the kernel via Common Platform Error Record
(CPER) sections (UEFI 2.10, Appendix N). Linux parses the CPER sections
via two similar paths, ELOG and GHES.

Currently, ELOG and GHES show some inconsistencies in how they print to
the kernel log as well as in how they report to userspace via trace
events.

This short series aims to make these two competing paths behave similarly.

This is an RFC mainly because of patch 3/3, which enables kernel logging
even in the presence of userspace consumers of trace events. I'm not sure
whether this behavior is wanted.

Fabio M. De Francesco (3):
ACPI: extlog: Trace CPER Non-standard Section Body
ACPI: extlog: Trace PCI Express Error Section
ACPI: extlog: Make print_extlog_rcd() log unconditionally

drivers/acpi/acpi_extlog.c | 44 +++++++++++++++++++++++++++++++++-----
drivers/ras/debugfs.c | 6 ------
include/linux/ras.h | 2 --
3 files changed, 39 insertions(+), 13 deletions(-)

--
2.45.0



2024-05-07 19:08:09

by Fabio M. De Francesco

Subject: [RFC PATCH 1/3] ACPI: extlog: Trace CPER Non-standard Section Body

Make extlog_print() (ELOG) trace the "Non-standard Section Body" reported
by firmware to the OS via the Common Platform Error Record (CPER) (UEFI
v2.10, Appendix N.2.3).

This adds further debug information and makes ELOG logs consistent with
ghes_do_proc() (GHES).

Cc: Dan Williams <[email protected]>
Signed-off-by: Fabio M. De Francesco <[email protected]>
---
drivers/acpi/acpi_extlog.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index ca87a0939135..4e62d7235d33 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -182,6 +182,12 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
 			if (gdata->error_data_length >= sizeof(*mem))
 				trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
 						       (u8)gdata->error_severity);
+		} else {
+			void *err = acpi_hest_get_payload(gdata);
+
+			trace_non_standard_event(sec_type, fru_id, fru_text,
+						 gdata->error_severity, err,
+						 gdata->error_data_length);
 		}
 	}

--
2.45.0
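
For readers following along, the dispatch that patch 1/3 extends can be sketched in userspace C roughly as below. This is only an illustration under assumptions: every sketch_* name is hypothetical, the GUID bytes are placeholders rather than the real CPER_SEC_PLATFORM_MEM value, and memcmp() stands in for the kernel's guid_equal().

```c
#include <stdint.h>
#include <string.h>

/* 16-byte section-type identifier, compared bytewise. */
struct sketch_guid {
	uint8_t b[16];
};

/* Which trace event the section would be routed to. */
enum sketch_handler {
	SKETCH_TRACE_MEM,          /* stands in for trace_extlog_mem_event() */
	SKETCH_TRACE_NON_STANDARD, /* stands in for trace_non_standard_event() */
};

/* Placeholder bytes, NOT the real CPER_SEC_PLATFORM_MEM GUID. */
static const struct sketch_guid sketch_sec_mem = {
	{ 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
	  0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0x10 }
};

/*
 * Known section types get their dedicated handler; anything else falls
 * through to the generic "non-standard" handler instead of being dropped.
 */
static enum sketch_handler
sketch_pick_handler(const struct sketch_guid *sec_type)
{
	if (memcmp(sec_type, &sketch_sec_mem, sizeof(*sec_type)) == 0)
		return SKETCH_TRACE_MEM;
	return SKETCH_TRACE_NON_STANDARD;
}
```

The point of the fallback branch is that a section whose type is not recognized still produces a trace event carrying its raw payload.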


2024-05-07 19:08:53

by Fabio M. De Francesco

Subject: [RFC PATCH 3/3] ACPI: extlog: Make print_extlog_rcd() log unconditionally

Make print_extlog_rcd() log unconditionally to report errors even if
userspace is not consuming trace events. Remove ras_userspace_consumers()
because it is no longer used.

This change makes extlog_print() (ELOG) act consistently with ghes_proc()
(GHES).

Cc: Dan Williams <[email protected]>
Signed-off-by: Fabio M. De Francesco <[email protected]>
---
drivers/acpi/acpi_extlog.c | 6 +-----
drivers/ras/debugfs.c | 6 ------
include/linux/ras.h | 2 --
3 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index c167e391ba43..7bb1204080ef 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -185,10 +185,7 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,

 	tmp = (struct acpi_hest_generic_status *)elog_buf;

-	if (!ras_userspace_consumers()) {
-		print_extlog_rcd(NULL, tmp, cpu);
-		goto out;
-	}
+	print_extlog_rcd(NULL, tmp, cpu);

 	/* log event via trace */
 	err_seq++;
@@ -222,7 +219,6 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
 		}
 	}

-out:
 	mce->kflags |= MCE_HANDLED_EXTLOG;
 	return NOTIFY_OK;
 }
diff --git a/drivers/ras/debugfs.c b/drivers/ras/debugfs.c
index 42afd3de68b2..6633844acdd6 100644
--- a/drivers/ras/debugfs.c
+++ b/drivers/ras/debugfs.c
@@ -13,12 +13,6 @@ struct dentry *ras_get_debugfs_root(void)
 }
 EXPORT_SYMBOL_GPL(ras_get_debugfs_root);

-int ras_userspace_consumers(void)
-{
-	return atomic_read(&trace_count);
-}
-EXPORT_SYMBOL_GPL(ras_userspace_consumers);
-
 static int trace_show(struct seq_file *m, void *v)
 {
 	return 0;
diff --git a/include/linux/ras.h b/include/linux/ras.h
index a64182bc72ad..98840f16b697 100644
--- a/include/linux/ras.h
+++ b/include/linux/ras.h
@@ -7,11 +7,9 @@
 #include <linux/cper.h>

 #ifdef CONFIG_DEBUG_FS
-int ras_userspace_consumers(void);
 void ras_debugfs_init(void);
 int ras_add_daemon_trace(void);
 #else
-static inline int ras_userspace_consumers(void) { return 0; }
 static inline void ras_debugfs_init(void) { }
 static inline int ras_add_daemon_trace(void) { return 0; }
 #endif
--
2.45.0
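
The control-flow change in patch 3/3 can be modelled in a few lines of userspace C. This is a simplified sketch, not the driver code: the sketch_* names are hypothetical, the counters stand in for print_extlog_rcd() and the trace events, and "traced" here means "a trace event reached a listening consumer".

```c
#include <stdbool.h>

/* Counts how often each reporting channel delivered something. */
struct sketch_counts {
	int logged; /* record printed to the kernel log */
	int traced; /* trace event seen by a userspace consumer */
};

/* Before the patch: log only when nobody consumes trace events. */
static void sketch_old_flow(bool have_consumers, struct sketch_counts *c)
{
	if (!have_consumers) {
		c->logged++;
		return; /* models the removed "goto out" */
	}
	c->traced++;
}

/* After the patch: always log, then emit the trace events as before. */
static void sketch_new_flow(bool have_consumers, struct sketch_counts *c)
{
	c->logged++;
	if (have_consumers)
		c->traced++;
}
```

With a consumer attached, the old flow leaves the kernel log empty while the new flow reports on both channels, which is the behavior change the RFC asks about.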


2024-05-07 22:55:20

by Fabio M. De Francesco

Subject: [RFC PATCH 2/3] ACPI: extlog: Trace PCI Express Error Section

Currently, extlog_print() (ELOG) only reports the CPER PCIe section (UEFI
v2.10, Appendix N.2.7) to the kernel log via print_extlog_rcd(). By
contrast, the similar ghes_do_proc() (GHES) prints to the kernel log and
also calls pci_print_aer() to report via the ftrace infrastructure.

Report the CPER PCIe Error section via the ftrace infrastructure as well
by calling pci_print_aer(), so that ELOG acts consistently with GHES.

Cc: Dan Williams <[email protected]>
Signed-off-by: Fabio M. De Francesco <[email protected]>
---
drivers/acpi/acpi_extlog.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)

diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index 4e62d7235d33..c167e391ba43 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -131,6 +131,32 @@ static int print_extlog_rcd(const char *pfx,
 	return 1;
 }

+static void extlog_print_pcie(struct cper_sec_pcie *pcie_err,
+			      int aer_severity)
+{
+	struct aer_capability_regs *aer;
+	struct pci_dev *pdev;
+	unsigned int devfn;
+	unsigned int bus;
+	int domain;
+
+	if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
+	    pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO) {
+		aer = (struct aer_capability_regs *)pcie_err->aer_info;
+		domain = pcie_err->device_id.segment;
+		bus = pcie_err->device_id.bus;
+		devfn = PCI_DEVFN(pcie_err->device_id.device,
+				  pcie_err->device_id.function);
+		pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
+		if (!pdev) {
+			pr_err("no pci_dev for %04x:%02x:%02x.%x\n",
+			       domain, bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
+			return;
+		}
+		pci_print_aer(pdev, aer_severity, aer);
+	}
+}
+
 static int extlog_print(struct notifier_block *nb, unsigned long val,
 			void *data)
 {
@@ -182,6 +208,11 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
 			if (gdata->error_data_length >= sizeof(*mem))
 				trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
 						       (u8)gdata->error_severity);
+		} else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
+			struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
+			int aer_severity = cper_severity_to_aer(gdata->error_severity);
+
+			extlog_print_pcie(pcie_err, aer_severity);
 		} else {
 			void *err = acpi_hest_get_payload(gdata);

@@ -331,3 +362,4 @@ module_exit(extlog_exit);
 MODULE_AUTHOR("Chen, Gong <[email protected]>");
 MODULE_DESCRIPTION("Extended MCA Error Log Driver");
 MODULE_LICENSE("GPL");
+MODULE_IMPORT_NS(CXL);
--
2.45.0
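
To make the gating and address decoding in extlog_print_pcie() easier to follow, here is a small userspace C sketch of the same two steps. It is an illustration under assumptions: the sketch_* names are hypothetical, the two validation-bit values are assumed to match CPER_PCIE_VALID_DEVICE_ID and CPER_PCIE_VALID_AER_INFO, and the devfn helpers use the same bit layout as the kernel's PCI_DEVFN(), PCI_SLOT() and PCI_FUNC() macros.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to match the kernel's CPER_PCIE_VALID_* bit values. */
#define SKETCH_VALID_DEVICE_ID 0x0008u
#define SKETCH_VALID_AER_INFO  0x0080u

/* Pack a 5-bit slot and a 3-bit function into one devfn byte. */
static unsigned int sketch_devfn(unsigned int slot, unsigned int func)
{
	return ((slot & 0x1f) << 3) | (func & 0x07);
}

static unsigned int sketch_slot(unsigned int devfn)
{
	return (devfn >> 3) & 0x1f;
}

static unsigned int sketch_func(unsigned int devfn)
{
	return devfn & 0x07;
}

/* Reduced stand-in for the device_id member of struct cper_sec_pcie. */
struct sketch_device_id {
	uint16_t segment;
	uint8_t bus;
	uint8_t device;
	uint8_t function;
};

/* AER data is only usable when both the device ID and AER info are valid. */
static bool sketch_can_report_aer(uint64_t validation_bits)
{
	return (validation_bits & SKETCH_VALID_DEVICE_ID) &&
	       (validation_bits & SKETCH_VALID_AER_INFO);
}

/* Format domain:bus:slot.func the way the pr_err() in the patch does. */
static int sketch_format_bdf(char *buf, size_t len,
			     const struct sketch_device_id *id)
{
	unsigned int devfn = sketch_devfn(id->device, id->function);

	return snprintf(buf, len, "%04x:%02x:%02x.%x",
			(unsigned int)id->segment, (unsigned int)id->bus,
			sketch_slot(devfn), sketch_func(devfn));
}
```

For example, a section with segment 0, bus 0x3a, device 0 and function 1 would be formatted as "0000:3a:00.1", and would only be handed on for AER reporting if both validation bits are set.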