2024-05-27 15:06:09

by Fabio M. De Francesco

Subject: [PATCH 0/2] Make ELOG log and trace consistently with GHES

When Firmware First is enabled, the BIOS handles errors first and then
makes them available to the kernel via Common Platform Error Record
(CPER) sections (UEFI 2.10, Appendix N). Linux parses the CPER sections
via one of two similar paths: ELOG or GHES.

Currently, ELOG and GHES are inconsistent in how they print to the
kernel log and in how they report to userspace via trace events.

Make the two paths behave consistently with respect to logging and
tracing.

--- Changes for v1 ---

- Drop the RFC prefix and restart from PATCH v1
- Drop patch 3/3 because the discussion on it has not yet been
  settled
- Drop namespacing in export of pci_print_aer() (Dan)
- Don't use '#ifdef' in *.c files (Dan)
- Drop a reference on pdev after operation is complete (Dan)
- Don't log an error message if pdev is NULL (Dan)
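
A note on the '#ifdef' item above: the usual way to keep conditionals
out of *.c files is to provide a static inline stub in the header when
the option is disabled, so callers compile unchanged. The following is
only a minimal user-space sketch of that pattern. CONFIG_DEMO_AER,
demo_severity_to_aer() and demo_report() are hypothetical names made up
for illustration; this is not the kernel code from this series.

```c
#include <stdio.h>

/* Hypothetical option; leave it undefined to exercise the stub. */
/* #define CONFIG_DEMO_AER 1 */

/* --- what would live in the header --- */
#if defined(CONFIG_DEMO_AER)
int demo_severity_to_aer(int cper_severity);
#else
/* Stub: lets callers build unchanged when the feature is off. */
static inline int demo_severity_to_aer(int cper_severity)
{
	(void)cper_severity;
	return 0;
}
#endif

/* --- what would live in the .c file: no #ifdef needed --- */
static int demo_report(int cper_severity)
{
	int sev = demo_severity_to_aer(cper_severity);

	printf("aer severity: %d\n", sev);
	return sev;
}
```

With the option left undefined, demo_report() always goes through the
stub and gets 0 back, whatever severity it is passed.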

--- Changes for RFC v2 ---

- 0/3: rework the subject line and the letter.
- 1/3: no changes.
- 2/3: trace CPER PCIe Section only if CONFIG_ACPI_APEI_PCIEAER
  is defined; the kernel test robot reported the use of two
  undefined symbols because the test for the config option was
  missing; rewrite the subject line and part of commit message.
- 3/3: no changes.

Fabio M. De Francesco (2):
ACPI: extlog: Trace CPER Non-standard Section Body
ACPI: extlog: Trace CPER PCI Express Error Section

drivers/acpi/acpi_extlog.c | 35 +++++++++++++++++++++++++++++++++++
drivers/pci/pcie/aer.c | 2 +-
include/linux/aer.h | 9 ++++++---
3 files changed, 42 insertions(+), 4 deletions(-)

--
2.45.1



2024-05-27 15:06:25

by Fabio M. De Francesco

Subject: [PATCH 1/2] ACPI: extlog: Trace CPER Non-standard Section Body

In extlog_print(), trace the "Non-standard Section Body" reported by
firmware to the OS via the Common Platform Error Record (CPER) (UEFI
v2.10, Appendix N.2.3). This adds further debug information and makes
ELOG log consistently with ghes_do_proc() (GHES).

Cc: Dan Williams <[email protected]>
Signed-off-by: Fabio M. De Francesco <[email protected]>
---
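
Not part of the commit: a minimal user-space sketch of the dispatch
that results from this patch, assuming the only earlier branch is the
platform memory one, as the hunk context suggests. Every section type
other than platform memory, including PCIe at this point in the series,
falls through to the non-standard trace event. pick_handler() and the
enum are hypothetical names; the GUID strings are my reading of the
UEFI spec and should be checked against include/linux/cper.h.

```c
#include <string.h>

/* Section-type GUIDs in text form (assumed values, please verify). */
#define SEC_PLATFORM_MEM "a5bc1114-6f64-4ede-b863-3e83ed7c83b1"
#define SEC_PCIE         "d995e954-bbc1-430f-ad91-b44dcb3c6f35"

/* Which trace event a section would be routed to (hypothetical enum). */
enum handler { H_MEM_EVENT, H_NON_STANDARD_EVENT };

static enum handler pick_handler(const char *sec_type)
{
	/* Platform memory keeps its dedicated trace event. */
	if (strcmp(sec_type, SEC_PLATFORM_MEM) == 0)
		return H_MEM_EVENT;

	/* New else branch: everything else is traced as non-standard. */
	return H_NON_STANDARD_EVENT;
}
```
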
drivers/acpi/acpi_extlog.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index f055609d4b64..e025ae390737 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -179,6 +179,12 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
 			if (gdata->error_data_length >= sizeof(*mem))
 				trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
 						       (u8)gdata->error_severity);
+		} else {
+			void *err = acpi_hest_get_payload(gdata);
+
+			trace_non_standard_event(sec_type, fru_id, fru_text,
+						 gdata->error_severity, err,
+						 gdata->error_data_length);
 		}
 	}

--
2.45.1


2024-05-27 16:08:46

by Fabio M. De Francesco

Subject: [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section

Currently, extlog_print() (ELOG) reports the CPER PCIe Error section
(UEFI v2.10, Appendix N.2.7) only to the kernel log, via
print_extlog_rcd(). In contrast, the similar ghes_do_proc() (GHES)
prints to the kernel log and also calls pci_print_aer() to report via
the ftrace infrastructure.

Make ELOG act consistently with GHES by also reporting the CPER PCIe
Error section via the ftrace infrastructure, calling pci_print_aer().

Cc: Dan Williams <[email protected]>
Signed-off-by: Fabio M. De Francesco <[email protected]>
---
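
Not part of the commit: a minimal user-space sketch of the checks
extlog_print_pcie() performs before calling pci_print_aer(). Both the
device ID and the AER info must be flagged valid, the device and
function numbers are packed into a devfn, and the CPER severity is
mapped to an AER severity. All demo_* names are hypothetical. The
constants mirror my reading of include/linux/cper.h, include/linux/aer.h
and PCI_DEVFN(), and should be checked against the tree.

```c
#include <stdint.h>

/* Assumed to match CPER_PCIE_VALID_DEVICE_ID / CPER_PCIE_VALID_AER_INFO. */
#define DEMO_VALID_DEVICE_ID 0x0008u
#define DEMO_VALID_AER_INFO  0x0080u

/* Same packing as PCI_DEVFN(): 5-bit device number, 3-bit function. */
#define DEMO_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/* Assumed ordering of the CPER and AER severity values. */
enum { DEMO_SEV_RECOVERABLE, DEMO_SEV_FATAL, DEMO_SEV_CORRECTED,
       DEMO_SEV_INFORMATIONAL };
enum { DEMO_AER_NONFATAL, DEMO_AER_FATAL, DEMO_AER_CORRECTABLE };

/* Cut-down stand-in for struct cper_sec_pcie. */
struct demo_pcie_sec {
	uint64_t validation_bits;
	uint8_t device;
	uint8_t function;
};

/* Map a CPER severity to an AER severity; unknown values are correctable. */
static int demo_severity_to_aer(int cper_severity)
{
	switch (cper_severity) {
	case DEMO_SEV_RECOVERABLE:
		return DEMO_AER_NONFATAL;
	case DEMO_SEV_FATAL:
		return DEMO_AER_FATAL;
	default:
		return DEMO_AER_CORRECTABLE;
	}
}

/* Return the devfn, or -1 if the section lacks a required field. */
static int demo_devfn(const struct demo_pcie_sec *sec)
{
	uint64_t need = DEMO_VALID_DEVICE_ID | DEMO_VALID_AER_INFO;

	if ((sec->validation_bits & need) != need)
		return -1;

	return DEMO_DEVFN(sec->device, sec->function);
}
```

If the assumed values are right, the cper_severity_to_aer() stub added
to aer.h returns 0, which would read as AER_NONFATAL for every input
when CONFIG_ACPI_APEI_PCIEAER is disabled.
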
drivers/acpi/acpi_extlog.c | 30 ++++++++++++++++++++++++++++++
drivers/pci/pcie/aer.c | 2 +-
include/linux/aer.h | 13 +++++++++++--
3 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index e025ae390737..007ce96f8672 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -131,6 +131,32 @@ static int print_extlog_rcd(const char *pfx,
 	return 1;
 }

+static void extlog_print_pcie(struct cper_sec_pcie *pcie_err,
+			      int severity)
+{
+	struct aer_capability_regs *aer;
+	struct pci_dev *pdev;
+	unsigned int devfn;
+	unsigned int bus;
+	int aer_severity;
+	int domain;
+
+	if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
+	    pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO) {
+		aer_severity = cper_severity_to_aer(severity);
+		aer = (struct aer_capability_regs *)pcie_err->aer_info;
+		domain = pcie_err->device_id.segment;
+		bus = pcie_err->device_id.bus;
+		devfn = PCI_DEVFN(pcie_err->device_id.device,
+				  pcie_err->device_id.function);
+		pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
+		if (!pdev)
+			return;
+		pci_print_aer(pdev, aer_severity, aer);
+		pci_dev_put(pdev);
+	}
+}
+
 static int extlog_print(struct notifier_block *nb, unsigned long val,
 			void *data)
 {
@@ -179,6 +205,10 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
 			if (gdata->error_data_length >= sizeof(*mem))
 				trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
 						       (u8)gdata->error_severity);
+		} else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
+			struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
+
+			extlog_print_pcie(pcie_err, gdata->error_severity);
 		} else {
 			void *err = acpi_hest_get_payload(gdata);

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index ac6293c24976..794aa15527ba 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -801,7 +801,7 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
 	trace_aer_event(dev_name(&dev->dev), (status & ~mask),
 			aer_severity, tlp_header_valid, &aer->header_log);
 }
-EXPORT_SYMBOL_NS_GPL(pci_print_aer, CXL);
+EXPORT_SYMBOL_GPL(pci_print_aer);

 /**
  * add_error_device - list device to be handled
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 4b97f38f3fcf..fbc82206045c 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -42,17 +42,26 @@ int pcie_read_tlp_log(struct pci_dev *dev, int where, struct pcie_tlp_log *log);
 #if defined(CONFIG_PCIEAER)
 int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
 int pcie_aer_is_native(struct pci_dev *dev);
+void pci_print_aer(struct pci_dev *dev, int aer_severity,
+		   struct aer_capability_regs *aer);
 #else
 static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
 	return -EINVAL;
 }
+
 static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
+static inline void pci_print_aer(struct pci_dev *dev, int aer_severity,
+				 struct aer_capability_regs *aer)
+{ }
 #endif

-void pci_print_aer(struct pci_dev *dev, int aer_severity,
-		    struct aer_capability_regs *aer);
+#if defined(CONFIG_ACPI_APEI_PCIEAER)
 int cper_severity_to_aer(int cper_severity);
+#else
+static inline int cper_severity_to_aer(int cper_severity) { return 0; }
+#endif
+
 void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn,
 		       int severity, struct aer_capability_regs *aer_regs);
 #endif //_AER_H_
--
2.45.1