2020-02-07 10:33:26

by Shiju Jose

Subject: [PATCH v4 0/2] ACPI: APEI: Add support to notify the vendor specific HW errors

Currently, vendor drivers cannot recover from vendor-specific
recoverable HW errors that are reported to the APEI driver in
vendor-defined sections, because the APEI driver does not forward
these errors to the vendor drivers.

This patch set:
1. adds an interface to the APEI driver that lets vendor drivers
register event handling functions for the corresponding vendor-specific
HW errors, and reports those errors to the registered vendor driver.

2. adds a driver to handle HiSilicon hip08 PCIe controller errors,
as an example application of the above APEI interface.

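To make the proposed dispatch semantics concrete, here is a minimal single-threaded userspace model, assuming simplified stand-in types: handlers are keyed by section GUID plus a per-device data pointer, the first handler that returns GHES_EVENT_HANDLED claims the record, and an unclaimed record falls back to log_non_standard_event(). The names model_guid_t, register_handler(), dispatch() and the fake_* types are hypothetical; the real patch keeps an RCU-protected list_head under a mutex, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; not the kernel's definitions. */
typedef struct { unsigned char b[16]; } model_guid_t;
enum { GHES_EVENT_NONE = 0, GHES_EVENT_HANDLED = 1 };
typedef int (*model_handler_t)(const void *payload, int sev, void *data);

struct handler_entry {
	struct handler_entry *next;
	model_guid_t sec_type;    /* section type this handler wants */
	model_handler_t handler;
	void *data;               /* per-device cookie, e.g. a platform_device */
};

static struct handler_entry *handlers;

/* Add at the head of the list, as list_add_rcu() does in the patch. */
static int register_handler(const model_guid_t *sec_type,
			    model_handler_t handler, void *data)
{
	struct handler_entry *e = calloc(1, sizeof(*e));

	if (!e)
		return -1;
	e->sec_type = *sec_type;
	e->handler = handler;
	e->data = data;
	e->next = handlers;
	handlers = e;
	return 0;
}

/*
 * Offer the record to every handler registered for this GUID until one
 * claims it. Returns false if nobody did, in which case GHES would log it.
 */
static bool dispatch(const model_guid_t *sec_type, const void *payload, int sev)
{
	for (struct handler_entry *e = handlers; e; e = e->next) {
		if (memcmp(&e->sec_type, sec_type, sizeof(*sec_type)) != 0)
			continue;
		if (e->handler(payload, sev, e->data) == GHES_EVENT_HANDLED)
			return true;
	}
	return false;
}

/* Example handler that only claims records for its own socket. */
struct fake_dev { unsigned char socket; int handled; };
struct fake_record { unsigned char socket_id; };

static int socket_filter_handler(const void *payload, int sev, void *data)
{
	const struct fake_record *rec = payload;
	struct fake_dev *dev = data;

	(void)sev;
	if (rec->socket_id != dev->socket)
		return GHES_EVENT_NONE;
	dev->handled++;
	return GHES_EVENT_HANDLED;
}
```

Registering two devices under the same GUID and dispatching a socket-1 record reaches only the socket-1 device; a record for an unknown socket or GUID makes dispatch() return false, so it would still be logged as before.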
Changes:

V4:
1. Fix for the smatch warning in the PCIe error driver:
warn: should '((((1))) << (9 + i))' be a 64 bit type?
if (err->val_bits & BIT(HISI_PCIE_LOCAL_VALID_ERR_MISC + i))
^^^ This should be BIT_ULL() because it goes up to 9 + 32.

V3:
1. Fix the comments from Bjorn Helgaas.

V2:
1. Changes in the HiSilicon PCIe controller's error handling driver
for the comments from Bjorn Helgaas.

2. Changes in the APEI interface to support reporting vendor errors
for a module with multiple devices that share the same section type.
The error handler uses the socket ID, sub-module ID, etc. to distinguish
the devices.

V1:
1. Fix comments from James Morse.

2. Add a driver to handle HiSilicon hip08 PCIe controller errors,
as an application of the above interface.

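The V4 smatch fix comes down to the width of the shifted constant. The ERR_MISC valid bits occupy bits 9 through 41 of the 64-bit val_bits field, and the kernel's BIT() shifts an unsigned long, which is only 32 bits wide on 32-bit builds (the Kconfig entry allows COMPILE_TEST), so shifts of 32 or more are undefined there. The sketch below uses simplified local copies of the macros; MODEL_BIT, MODEL_BIT_ULL and count_valid_misc() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified local copies of the kernel macros from <linux/bits.h>. */
#define MODEL_BIT(nr)     (1UL << (nr))   /* 32 bits wide on 32-bit builds */
#define MODEL_BIT_ULL(nr) (1ULL << (nr))  /* always at least 64 bits wide */

#define HISI_PCIE_LOCAL_VALID_ERR_MISC 9
#define HISI_PCIE_ERR_MISC_REGS        33

/* Count the ERR_MISC registers that val_bits marks as valid (bits 9..41). */
static unsigned int count_valid_misc(uint64_t val_bits)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < HISI_PCIE_ERR_MISC_REGS; i++)
		if (val_bits & MODEL_BIT_ULL(HISI_PCIE_LOCAL_VALID_ERR_MISC + i))
			n++;
	return n;
}
```

The last register (i = 32) tests bit 41, which only the 64-bit form can represent portably.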
Shiju Jose (1):
ACPI: APEI: Add support to notify the vendor specific HW errors

Yicong Yang (1):
PCI: HIP: Add handling of HiSilicon HIP PCIe controller errors

drivers/acpi/apei/ghes.c | 116 ++++++++++-
drivers/pci/controller/Kconfig | 8 +
drivers/pci/controller/Makefile | 1 +
drivers/pci/controller/pcie-hisi-error.c | 334 +++++++++++++++++++++++++++++++
include/acpi/ghes.h | 56 ++++++
5 files changed, 510 insertions(+), 5 deletions(-)
create mode 100644 drivers/pci/controller/pcie-hisi-error.c

--
1.9.1



2020-02-07 10:33:26

by Shiju Jose

Subject: [PATCH v4 1/2] ACPI: APEI: Add support to notify the vendor specific HW errors

Currently, APEI does not report vendor-specific HW errors, received
in vendor-defined table entries, to the vendor drivers, so those
drivers cannot attempt recovery.

This patch adds support for registering and unregistering error
handling functions for vendor-specific HW errors, and for notifying
the registered kernel driver when such an error is received.

Signed-off-by: Shiju Jose <[email protected]>
---
drivers/acpi/apei/ghes.c | 116 +++++++++++++++++++++++++++++++++++++++++++++--
include/acpi/ghes.h | 56 +++++++++++++++++++++++
2 files changed, 167 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 103acbb..69e18d7 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -490,6 +490,109 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
#endif
}

+struct ghes_event_notify {
+ struct list_head list;
+ struct rcu_head rcu_head;
+ guid_t sec_type; /* guid of the error record */
+ ghes_event_handler_t event_handler; /* event handler function */
+ void *data; /* handler driver's private data if any */
+};
+
+/* List to store the registered event handling functions */
+static DEFINE_MUTEX(ghes_event_notify_mutex);
+static LIST_HEAD(ghes_event_handler_list);
+
+/**
+ * ghes_register_event_handler - register an event handling
+ * function for the non-fatal HW errors.
+ * @sec_type: sec_type of the corresponding CPER to be notified.
+ * @event_handler: pointer to the error handling function.
+ * @data: handler driver's private data.
+ *
+ * return 0 : SUCCESS, non-zero : FAIL
+ */
+int ghes_register_event_handler(guid_t sec_type,
+ ghes_event_handler_t event_handler,
+ void *data)
+{
+ struct ghes_event_notify *event_notify;
+
+ event_notify = kzalloc(sizeof(*event_notify), GFP_KERNEL);
+ if (!event_notify)
+ return -ENOMEM;
+
+ event_notify->event_handler = event_handler;
+ guid_copy(&event_notify->sec_type, &sec_type);
+ event_notify->data = data;
+
+ mutex_lock(&ghes_event_notify_mutex);
+ list_add_rcu(&event_notify->list, &ghes_event_handler_list);
+ mutex_unlock(&ghes_event_notify_mutex);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(ghes_register_event_handler);
+
+/**
+ * ghes_unregister_event_handler - unregister the previously
+ * registered event handling function.
+ * @sec_type: sec_type of the corresponding CPER.
+ * @data: driver specific data to distinguish devices.
+ */
+void ghes_unregister_event_handler(guid_t sec_type, void *data)
+{
+ struct ghes_event_notify *event_notify;
+ bool found = false;
+
+ mutex_lock(&ghes_event_notify_mutex);
+ rcu_read_lock();
+ list_for_each_entry_rcu(event_notify,
+ &ghes_event_handler_list, list) {
+ if (guid_equal(&event_notify->sec_type, &sec_type)) {
+ if (data != event_notify->data)
+ continue;
+ list_del_rcu(&event_notify->list);
+ found = true;
+ break;
+ }
+ }
+ rcu_read_unlock();
+ mutex_unlock(&ghes_event_notify_mutex);
+
+ if (!found) {
+ pr_err("Tried to unregister a GHES event handler that has not been registered\n");
+ return;
+ }
+
+ synchronize_rcu();
+ kfree(event_notify);
+}
+EXPORT_SYMBOL_GPL(ghes_unregister_event_handler);
+
+static int ghes_handle_non_standard_event(guid_t *sec_type,
+ struct acpi_hest_generic_data *gdata, int sev)
+{
+ struct ghes_event_notify *event_notify;
+ bool found = false;
+ int ret;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(event_notify,
+ &ghes_event_handler_list, list) {
+ if (guid_equal(&event_notify->sec_type, sec_type)) {
+ ret = event_notify->event_handler(gdata, sev,
+ event_notify->data);
+ if (!ret)
+ continue;
+ found = true;
+ break;
+ }
+ }
+ rcu_read_unlock();
+
+ return found;
+}
+
static void ghes_do_proc(struct ghes *ghes,
const struct acpi_hest_generic_status *estatus)
{
@@ -525,11 +628,14 @@ static void ghes_do_proc(struct ghes *ghes,

log_arm_hw_error(err);
} else {
- void *err = acpi_hest_get_payload(gdata);
-
- log_non_standard_event(sec_type, fru_id, fru_text,
- sec_sev, err,
- gdata->error_data_length);
+ if (!ghes_handle_non_standard_event(sec_type, gdata,
+ sev)) {
+ void *err = acpi_hest_get_payload(gdata);
+
+ log_non_standard_event(sec_type, fru_id,
+ fru_text, sec_sev, err,
+ gdata->error_data_length);
+ }
}
}
}
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index e3f1cdd..e3387cf 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -50,6 +50,62 @@ enum {
GHES_SEV_PANIC = 0x3,
};

+enum {
+ GHES_EVENT_NONE = 0x0,
+ GHES_EVENT_HANDLED = 0x1,
+};
+
+/**
+ * typedef ghes_event_handler_t - event handling function
+ * for the non-fatal HW errors.
+ *
+ * @gdata: acpi_hest_generic_data.
+ * @sev: error severity of the entire error event defined in the
+ * ACPI spec table generic error status block.
+ * @data: handler driver's private data.
+ *
+ * Return : GHES_EVENT_NONE - event not handled, GHES_EVENT_HANDLED - handled.
+ *
+ * The error handling function is responsible for logging the error.
+ * It may be called in interrupt context, so it must not sleep.
+ */
+typedef int (*ghes_event_handler_t)(struct acpi_hest_generic_data *gdata,
+ int sev, void *data);
+
+#ifdef CONFIG_ACPI_APEI_GHES
+/**
+ * ghes_register_event_handler - register an event handling
+ * function for the non-fatal HW errors.
+ * @sec_type: sec_type of the corresponding CPER to be notified.
+ * @event_handler: pointer to the event handling function.
+ * @data: handler driver's private data.
+ *
+ * Return : 0 - SUCCESS, non-zero - FAIL.
+ */
+int ghes_register_event_handler(guid_t sec_type,
+ ghes_event_handler_t event_handler,
+ void *data);
+
+/**
+ * ghes_unregister_event_handler - unregister the previously
+ * registered event handling function.
+ * @sec_type: sec_type of the corresponding CPER.
+ * @data: driver specific data to distinguish devices.
+ */
+void ghes_unregister_event_handler(guid_t sec_type, void *data);
+#else
+static inline int ghes_register_event_handler(guid_t sec_type,
+ ghes_event_handler_t event_handler,
+ void *data)
+{
+ return -ENODEV;
+}
+
+static inline void ghes_unregister_event_handler(guid_t sec_type, void *data)
+{
+}
+#endif
+
int ghes_estatus_pool_init(int num_ghes);

/* From drivers/edac/ghes_edac.c */
--
1.9.1


2020-02-07 10:34:32

by Shiju Jose

Subject: [PATCH v4 2/2] PCI: HIP: Add handling of HiSilicon HIP PCIe controller errors

From: Yicong Yang <[email protected]>

The HiSilicon HIP PCIe controller is capable of handling errors
on root ports and performing a port reset separately at each root port.

This patch adds an error handling driver for the HIP PCIe controller to
log and report recoverable errors. The driver performs a root port reset
and restores the link status after recovery.

Following are some of the PCIe controller's recoverable errors:
1. completion transmission timeout error
2. CRS retry counter over the threshold error
3. ECC 2-bit errors
4. AXI bresponse/rresponse errors, etc.

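For the root port reset, the driver converts the firmware-reported (core_id, port_id) pair into a global root port number, which it uses as the PCI device number of the root port (PCI_DEVFN(port_id, 0)), and then splits it back into the core and per-core port arguments of the RST method. The sketch below copies the three HISI_PCIE_* macros from pcie-hisi-error.c into a standalone check; port_id_round_trips() is a hypothetical helper. The arithmetic implies that only even per-core port numbers below 16 survive the round trip, which presumably matches the hardware layout, though the patch does not say so.

```c
#include <assert.h>

/* Copied from pcie-hisi-error.c. */
#define HISI_PCIE_CORE_ID(v)       ((v) >> 3)
#define HISI_PCIE_PORT_ID(core, v) (((v) >> 1) + ((core) << 3))
#define HISI_PCIE_CORE_PORT_ID(v)  (((v) % 8) << 1)

/*
 * Hypothetical helper: map (core, per-core port) to the global root port
 * number used as the PCI device number, then back to the RST arguments.
 */
static int port_id_round_trips(unsigned int core, unsigned int port)
{
	unsigned int global = HISI_PCIE_PORT_ID(core, port);

	return HISI_PCIE_CORE_ID(global) == core &&
	       HISI_PCIE_CORE_PORT_ID(global) == port;
}
```

For example, core 1, port 4 maps to root port 10 (device 0x0a on the root bus), and the RST method receives core 1, port 4 again.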
Also fix the following Smatch warning:
warn: should '((((1))) << (9 + i))' be a 64 bit type?
if (err->val_bits & BIT(HISI_PCIE_LOCAL_VALID_ERR_MISC + i))
^^^ This should be BIT_ULL() because it goes up to 9 + 32.
Reported-by: kbuild test robot <[email protected]>
Reported-by: Dan Carpenter <[email protected]>

Signed-off-by: Yicong Yang <[email protected]>
Signed-off-by: Shiju Jose <[email protected]>
---
drivers/pci/controller/Kconfig | 8 +
drivers/pci/controller/Makefile | 1 +
drivers/pci/controller/pcie-hisi-error.c | 334 +++++++++++++++++++++++++++++++
3 files changed, 343 insertions(+)
create mode 100644 drivers/pci/controller/pcie-hisi-error.c

diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig
index c77069c..5dad1ca 100644
--- a/drivers/pci/controller/Kconfig
+++ b/drivers/pci/controller/Kconfig
@@ -260,6 +260,14 @@ config PCI_HYPERV_INTERFACE
The Hyper-V PCI Interface is a helper driver allows other drivers to
have a common interface with the Hyper-V PCI frontend driver.

+config PCIE_HISI_ERR
+ depends on ARM64 || COMPILE_TEST
+ depends on ACPI
+ bool "HiSilicon HIP PCIe controller error handling driver"
+ help
+ Say Y here if you want error handling support
+ for the PCIe controller's errors on HiSilicon HIP SoCs.
+
source "drivers/pci/controller/dwc/Kconfig"
source "drivers/pci/controller/cadence/Kconfig"
endmenu
diff --git a/drivers/pci/controller/Makefile b/drivers/pci/controller/Makefile
index 3d4f597..2d1565f 100644
--- a/drivers/pci/controller/Makefile
+++ b/drivers/pci/controller/Makefile
@@ -28,6 +28,7 @@ obj-$(CONFIG_PCIE_MEDIATEK) += pcie-mediatek.o
obj-$(CONFIG_PCIE_MOBIVEIL) += pcie-mobiveil.o
obj-$(CONFIG_PCIE_TANGO_SMP8759) += pcie-tango.o
obj-$(CONFIG_VMD) += vmd.o
+obj-$(CONFIG_PCIE_HISI_ERR) += pcie-hisi-error.o
# pcie-hisi.o quirks are needed even without CONFIG_PCIE_DW
obj-y += dwc/

diff --git a/drivers/pci/controller/pcie-hisi-error.c b/drivers/pci/controller/pcie-hisi-error.c
new file mode 100644
index 0000000..7867612
--- /dev/null
+++ b/drivers/pci/controller/pcie-hisi-error.c
@@ -0,0 +1,334 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for handling the PCIe controller errors on
+ * HiSilicon HIP SoCs.
+ *
+ * Copyright (c) 2018-2019 HiSilicon Limited.
+ */
+
+#include <linux/acpi.h>
+#include <acpi/ghes.h>
+#include <linux/delay.h>
+#include <linux/pci.h>
+#include <linux/platform_device.h>
+#include <linux/kfifo.h>
+#include <linux/spinlock.h>
+
+#include "../pci.h"
+
+#define HISI_PCIE_ERR_RECOVER_RING_SIZE 16
+#define HISI_PCIE_ERR_INFO_SIZE 1024
+
+/* HISI PCIe controller error definitions */
+#define HISI_PCIE_ERR_MISC_REGS 33
+
+#define HISI_PCIE_SUB_MODULE_ID_AP 0
+#define HISI_PCIE_SUB_MODULE_ID_TL 1
+#define HISI_PCIE_SUB_MODULE_ID_MAC 2
+#define HISI_PCIE_SUB_MODULE_ID_DL 3
+#define HISI_PCIE_SUB_MODULE_ID_SDI 4
+
+#define HISI_PCIE_LOCAL_VALID_VERSION BIT(0)
+#define HISI_PCIE_LOCAL_VALID_SOC_ID BIT(1)
+#define HISI_PCIE_LOCAL_VALID_SOCKET_ID BIT(2)
+#define HISI_PCIE_LOCAL_VALID_NIMBUS_ID BIT(3)
+#define HISI_PCIE_LOCAL_VALID_SUB_MODULE_ID BIT(4)
+#define HISI_PCIE_LOCAL_VALID_CORE_ID BIT(5)
+#define HISI_PCIE_LOCAL_VALID_PORT_ID BIT(6)
+#define HISI_PCIE_LOCAL_VALID_ERR_TYPE BIT(7)
+#define HISI_PCIE_LOCAL_VALID_ERR_SEVERITY BIT(8)
+#define HISI_PCIE_LOCAL_VALID_ERR_MISC 9
+
+#define HISI_ERR_SEV_RECOVERABLE 0
+#define HISI_ERR_SEV_FATAL 1
+#define HISI_ERR_SEV_CORRECTED 2
+#define HISI_ERR_SEV_NONE 3
+
+static guid_t hisi_pcie_sec_type = GUID_INIT(0xB2889FC9, 0xE7D7, 0x4F9D,
+ 0xA8, 0x67, 0xAF, 0x42, 0xE9, 0x8B, 0xE7, 0x72);
+
+#define HISI_PCIE_CORE_ID(v) ((v) >> 3)
+#define HISI_PCIE_PORT_ID(core, v) (((v) >> 1) + ((core) << 3))
+#define HISI_PCIE_CORE_PORT_ID(v) (((v) % 8) << 1)
+
+struct hisi_pcie_err_data {
+ u64 val_bits;
+ u8 version;
+ u8 soc_id;
+ u8 socket_id;
+ u8 nimbus_id;
+ u8 sub_module_id;
+ u8 core_id;
+ u8 port_id;
+ u8 err_severity;
+ u16 err_type;
+ u8 reserv[2];
+ u32 err_misc[HISI_PCIE_ERR_MISC_REGS];
+};
+
+struct hisi_pcie_err_info {
+ struct hisi_pcie_err_data err_data;
+ struct platform_device *pdev;
+};
+
+static char *hisi_pcie_sub_module_name(u8 id)
+{
+ switch (id) {
+ case HISI_PCIE_SUB_MODULE_ID_AP: return "AP Layer";
+ case HISI_PCIE_SUB_MODULE_ID_TL: return "TL Layer";
+ case HISI_PCIE_SUB_MODULE_ID_MAC: return "MAC Layer";
+ case HISI_PCIE_SUB_MODULE_ID_DL: return "DL Layer";
+ case HISI_PCIE_SUB_MODULE_ID_SDI: return "SDI Layer";
+ }
+
+ return "unknown";
+}
+
+static char *hisi_pcie_err_severity(u8 err_sev)
+{
+ switch (err_sev) {
+ case HISI_ERR_SEV_RECOVERABLE: return "recoverable";
+ case HISI_ERR_SEV_FATAL: return "fatal";
+ case HISI_ERR_SEV_CORRECTED: return "corrected";
+ case HISI_ERR_SEV_NONE: return "none";
+ }
+
+ return "unknown";
+}
+
+static int hisi_pcie_port_reset(struct platform_device *pdev,
+ u32 chip_id, u32 port_id)
+{
+ struct device *dev = &pdev->dev;
+ acpi_handle handle = ACPI_HANDLE(dev);
+ union acpi_object arg[3];
+ struct acpi_object_list arg_list;
+ acpi_status s;
+ unsigned long long data = 0;
+
+ arg[0].type = ACPI_TYPE_INTEGER;
+ arg[0].integer.value = chip_id;
+ arg[1].type = ACPI_TYPE_INTEGER;
+ arg[1].integer.value = HISI_PCIE_CORE_ID(port_id);
+ arg[2].type = ACPI_TYPE_INTEGER;
+ arg[2].integer.value = HISI_PCIE_CORE_PORT_ID(port_id);
+
+ arg_list.count = 3;
+ arg_list.pointer = arg;
+
+ /* Call the ACPI handle to reset root port */
+ s = acpi_evaluate_integer(handle, "RST", &arg_list, &data);
+ if (ACPI_FAILURE(s)) {
+ dev_err(dev, "No RST method\n");
+ return -EIO;
+ }
+
+ if (data) {
+ dev_err(dev, "Failed to reset\n");
+ return -EIO;
+ }
+
+ return 0;
+}
+
+static int hisi_pcie_port_do_recovery(struct platform_device *dev,
+ u32 chip_id, u32 port_id)
+{
+ acpi_status s;
+ struct device *device = &dev->dev;
+ acpi_handle root_handle = ACPI_HANDLE(device);
+ struct acpi_pci_root *pci_root;
+ struct pci_bus *root_bus;
+ struct pci_dev *pdev;
+ u32 domain, busnr, devfn;
+
+ s = acpi_get_parent(root_handle, &root_handle);
+ if (ACPI_FAILURE(s))
+ return -ENODEV;
+ pci_root = acpi_pci_find_root(root_handle);
+ if (!pci_root)
+ return -ENODEV;
+ root_bus = pci_root->bus;
+ domain = pci_root->segment;
+
+ busnr = root_bus->number;
+ devfn = PCI_DEVFN(port_id, 0);
+ pdev = pci_get_domain_bus_and_slot(domain, busnr, devfn);
+ if (!pdev) {
+ dev_info(device, "Failed to get root port %04x:%02x:%02x.%d device\n",
+ domain, busnr, PCI_SLOT(devfn), PCI_FUNC(devfn));
+ return -ENODEV;
+ }
+
+ pci_stop_and_remove_bus_device_locked(pdev);
+ pci_dev_put(pdev);
+
+ if (hisi_pcie_port_reset(dev, chip_id, port_id))
+ return -EIO;
+
+ /*
+ * The PCI spec v5.0 sec 6.6.1 requires subordinate devices
+ * to complete initialization within 1s after a hot reset.
+ * The time may be shorter if Readiness Notification
+ * mechanisms are used, but wait the full 1s here to cover
+ * all cases.
+ */
+ ssleep(1UL);
+
+ /* add root port and downstream devices */
+ pci_lock_rescan_remove();
+ pci_rescan_bus(root_bus);
+ pci_unlock_rescan_remove();
+
+ return 0;
+}
+
+static void hisi_pcie_handle_one_error(const struct hisi_pcie_err_data *err,
+ struct platform_device *pdev)
+{
+ char buf[HISI_PCIE_ERR_INFO_SIZE];
+ char *p = buf, *end = buf + sizeof(buf);
+ struct device *dev = &pdev->dev;
+ u32 i;
+ int rc;
+
+ if (err->val_bits == 0) {
+ dev_warn(dev, "%s: no valid error information\n", __func__);
+ return;
+ }
+
+ /* Logging; scnprintf() never advances p past end on truncation */
+ p += scnprintf(p, end - p, "[ Table version=%d ", err->version);
+ if (err->val_bits & HISI_PCIE_LOCAL_VALID_SOC_ID)
+ p += scnprintf(p, end - p, "SOC ID=%d ", err->soc_id);
+
+ if (err->val_bits & HISI_PCIE_LOCAL_VALID_SOCKET_ID)
+ p += scnprintf(p, end - p, "socket ID=%d ", err->socket_id);
+
+ if (err->val_bits & HISI_PCIE_LOCAL_VALID_NIMBUS_ID)
+ p += scnprintf(p, end - p, "nimbus ID=%d ", err->nimbus_id);
+
+ if (err->val_bits & HISI_PCIE_LOCAL_VALID_SUB_MODULE_ID)
+ p += scnprintf(p, end - p, "sub module=%s ",
+ hisi_pcie_sub_module_name(err->sub_module_id));
+
+ if (err->val_bits & HISI_PCIE_LOCAL_VALID_CORE_ID)
+ p += scnprintf(p, end - p, "core ID=core%d ", err->core_id);
+
+ if (err->val_bits & HISI_PCIE_LOCAL_VALID_PORT_ID)
+ p += scnprintf(p, end - p, "port ID=port%d ", err->port_id);
+
+ if (err->val_bits & HISI_PCIE_LOCAL_VALID_ERR_SEVERITY)
+ p += scnprintf(p, end - p, "error severity=%s ",
+ hisi_pcie_err_severity(err->err_severity));
+
+ if (err->val_bits & HISI_PCIE_LOCAL_VALID_ERR_TYPE)
+ p += scnprintf(p, end - p, "error type=0x%x ", err->err_type);
+
+ p += scnprintf(p, end - p, "]\n");
+ dev_info(dev, "\nHISI : HIP : PCIe controller error\n");
+ dev_info(dev, "%s\n", buf);
+
+ dev_info(dev, "Reg Dump:\n");
+ for (i = 0; i < HISI_PCIE_ERR_MISC_REGS; i++) {
+ if (err->val_bits & BIT_ULL(HISI_PCIE_LOCAL_VALID_ERR_MISC + i))
+ dev_info(dev,
+ "ERR_MISC_%d=0x%x\n", i, err->err_misc[i]);
+ }
+
+ /* Recovery for the PCIe controller errors */
+ if (err->err_severity == HISI_ERR_SEV_RECOVERABLE) {
+ /* try to reset the PCI port for error recovery */
+ rc = hisi_pcie_port_do_recovery(pdev, err->socket_id,
+ HISI_PCIE_PORT_ID(err->core_id, err->port_id));
+ if (rc) {
+ dev_info(dev, "failed to reset HiSilicon PCIe port\n");
+ return;
+ }
+ }
+}
+
+static DEFINE_KFIFO(hisi_pcie_err_recover_ring, struct hisi_pcie_err_info,
+ HISI_PCIE_ERR_RECOVER_RING_SIZE);
+static DEFINE_SPINLOCK(hisi_pcie_err_recover_ring_lock);
+
+static void hisi_pcie_err_recover_work_func(struct work_struct *work)
+{
+ struct hisi_pcie_err_info pcie_err_entry;
+
+ while (kfifo_get(&hisi_pcie_err_recover_ring, &pcie_err_entry)) {
+ hisi_pcie_handle_one_error(&pcie_err_entry.err_data,
+ pcie_err_entry.pdev);
+ }
+}
+
+static DECLARE_WORK(hisi_pcie_err_recover_work,
+ hisi_pcie_err_recover_work_func);
+
+static int hisi_pcie_error_handle(struct acpi_hest_generic_data *gdata,
+ int sev, void *data)
+{
+ const struct hisi_pcie_err_data *err_data =
+ acpi_hest_get_payload(gdata);
+ struct hisi_pcie_err_info err_info;
+ struct platform_device *pdev = data;
+ struct device *dev = &pdev->dev;
+ u8 socket;
+
+ if (device_property_read_u8(dev, "socket", &socket))
+ return GHES_EVENT_NONE;
+
+ if (err_data->socket_id != socket)
+ return GHES_EVENT_NONE;
+
+ memcpy(&err_info.err_data, err_data, sizeof(*err_data));
+ err_info.pdev = pdev;
+
+ if (kfifo_in_spinlocked(&hisi_pcie_err_recover_ring, &err_info, 1,
+ &hisi_pcie_err_recover_ring_lock))
+ schedule_work(&hisi_pcie_err_recover_work);
+ else
+ dev_warn(dev, "queue full when recovering PCIe controller error\n");
+
+ return GHES_EVENT_HANDLED;
+}
+
+static int hisi_pcie_err_handler_probe(struct platform_device *pdev)
+{
+ int ret;
+
+ ret = ghes_register_event_handler(hisi_pcie_sec_type,
+ hisi_pcie_error_handle, pdev);
+ if (ret) {
+ dev_err(&pdev->dev, "%s : ghes_register_event_handler failed\n",
+ __func__);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int hisi_pcie_err_handler_remove(struct platform_device *pdev)
+{
+ ghes_unregister_event_handler(hisi_pcie_sec_type, pdev);
+
+ return 0;
+}
+
+static const struct acpi_device_id hisi_pcie_acpi_match[] = {
+ { "HISI0361", 0 },
+ { }
+};
+
+static struct platform_driver hisi_pcie_err_handler_driver = {
+ .driver = {
+ .name = "hisi-pcie-err-handler",
+ .acpi_match_table = hisi_pcie_acpi_match,
+ },
+ .probe = hisi_pcie_err_handler_probe,
+ .remove = hisi_pcie_err_handler_remove,
+};
+module_platform_driver(hisi_pcie_err_handler_driver);
+
+MODULE_DESCRIPTION("HiSilicon HIP PCIe controller error handling driver");
+MODULE_LICENSE("GPL v2");
--
1.9.1

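Because the GHES handler may run in interrupt context, pcie-hisi-error.c does not reset the port directly: hisi_pcie_error_handle() copies the record into a 16-entry kfifo and schedules a work item, whose function drains the fifo and performs the sleeping recovery (ssleep(), bus rescan). The sketch below models that producer/consumer split in userspace with a power-of-two ring and free-running indices, similar in spirit to kfifo; struct err_ring, ring_put(), ring_get(), handle_one_error() and recover_work() are hypothetical names, and the spinlock that serializes producers in the driver is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define RING_SIZE 16   /* HISI_PCIE_ERR_RECOVER_RING_SIZE; a power of two */

struct err_info { unsigned char socket_id; unsigned short err_type; };

/* Free-running in/out indices, masked on access. */
struct err_ring {
	struct err_info buf[RING_SIZE];
	unsigned int in, out;
};

static void ring_init(struct err_ring *r)
{
	memset(r, 0, sizeof(*r));
}

/* Producer side (interrupt context in the driver): copy and return at once. */
static bool ring_put(struct err_ring *r, const struct err_info *e)
{
	if (r->in - r->out == RING_SIZE)
		return false;          /* full: the driver warns and drops the record */
	r->buf[r->in & (RING_SIZE - 1)] = *e;
	r->in++;
	return true;
}

/* Consumer side (process context): take the oldest record. */
static bool ring_get(struct err_ring *r, struct err_info *e)
{
	if (r->in == r->out)
		return false;
	*e = r->buf[r->out & (RING_SIZE - 1)];
	r->out++;
	return true;
}

static unsigned int recovered;

/* Stand-in for hisi_pcie_handle_one_error(), which may sleep. */
static void handle_one_error(const struct err_info *e)
{
	(void)e;
	recovered++;
}

/* Like hisi_pcie_err_recover_work_func(): drain everything queued so far. */
static void recover_work(struct err_ring *r)
{
	struct err_info e;

	while (ring_get(r, &e))
		handle_one_error(&e);
}
```

The driver only takes a spinlock on the enqueue side (kfifo_in_spinlocked()); this appears to rely on kfifo needing no extra locking for a single reader, since all dequeues happen in the one work item.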

2020-03-09 09:24:34

by Shiju Jose

Subject: RE: [PATCH v4 0/2] ACPI: APEI: Add support to notify the vendor specific HW errors

Hi All,

Gentle reminder on this patch set.

Thanks,
Shiju

>-----Original Message-----
>From: [email protected] [mailto:linux-acpi-
>[email protected]] On Behalf Of Shiju Jose
>Sent: 07 February 2020 10:32
>To: [email protected]; [email protected]; linux-
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected]; [email protected]; [email protected];
>[email protected]; [email protected];
>[email protected]
>Cc: Linuxarm <[email protected]>; Jonathan Cameron
><[email protected]>; tanxiaofei <[email protected]>;
>yangyicong <[email protected]>; Shiju Jose <[email protected]>
>Subject: [PATCH v4 0/2] ACPI: APEI: Add support to notify the vendor specific HW errors
>
>Presently the vendor drivers are unable to do the recovery for the vendor
>specific recoverable HW errors, reported to the APEI driver in the vendor
>defined sections, because APEI driver does not support reporting the same to
>the vendor drivers.
>
>This patch set
>1. add an interface to the APEI driver to enable the vendor drivers to register
>the event handling functions for the corresponding vendor specific HW errors
>and report the error to the vendor driver.
>
>2. add driver to handle HiSilicon hip08 PCIe controller's errors
> which is an example application of the above APEI interface.
>
>Changes:
>
>V4:
>1. Fix for the smatch warning in the PCIe error driver:
> warn: should '((((1))) << (9 + i))' be a 64 bit type?
> if (err->val_bits & BIT(HISI_PCIE_LOCAL_VALID_ERR_MISC + i))
> ^^^ This should be BIT_ULL() because it goes up to 9 + 32.
>
>V3:
>1. Fix the comments from Bjorn Helgaas.
>
>V2:
>1. Changes in the HiSilicon PCIe controller's error handling driver
> for the comments from Bjorn Helgaas.
>
>2. Changes in the APEI interface to support reporting the vendor error
> for module with multiple devices, but use the same section type.
> In the error handler will use socket id/sub module id etc to distinguish
> the device.
>
>V1:
>1. Fix comments from James Morse.
>
>2. add driver to handle HiSilicon hip08 PCIe controller's errors,
> which is an application of the above interface.
>
>Shiju Jose (1):
> ACPI: APEI: Add support to notify the vendor specific HW errors
>
>Yicong Yang (1):
> PCI: HIP: Add handling of HiSilicon HIP PCIe controller errors
>
> drivers/acpi/apei/ghes.c | 116 ++++++++++-
> drivers/pci/controller/Kconfig | 8 +
> drivers/pci/controller/Makefile | 1 +
> drivers/pci/controller/pcie-hisi-error.c | 334 +++++++++++++++++++++++++++++++
> include/acpi/ghes.h | 56 ++++++
> 5 files changed, 510 insertions(+), 5 deletions(-)
> create mode 100644 drivers/pci/controller/pcie-hisi-error.c
>
>--
>1.9.1
>

2020-03-11 17:28:22

by James Morse

Subject: Re: [PATCH v4 0/2] ACPI: APEI: Add support to notify the vendor specific HW errors

Hi Shiju,

On 09/03/2020 09:23, Shiju Jose wrote:
> Gentle reminder on this patch set.

Your cover-letter has:
| X-Mailer: git-send-email 2.19.2.windows.1
| In-Reply-To: <Shiju Jose>
| References: <Shiju Jose>

Which causes my mail client to thread this with year-old mail ... hence I've only just
seen this. Other people may have the same problem.
If you're feeding these headers into git-send-email, it expects the value from the
original message's 'Message-Id'... but you don't want this for a cover letter!


Thanks,

James

2020-03-11 17:31:01

by James Morse

Subject: Re: [PATCH v4 1/2] ACPI: APEI: Add support to notify the vendor specific HW errors

Hi Shiju,

On 07/02/2020 10:31, Shiju Jose wrote:
> Presently APEI does not support reporting the vendor specific
> HW errors, received in the vendor defined table entries, to the
> vendor drivers for any recovery.
>
> This patch adds the support to register and unregister the
> error handling function for the vendor specific HW errors and
> notify the registered kernel driver.

Is it possible to use the kernel's existing atomic_notifier_chain_register() API for this?

The one thing that can't be done in the same way is the GUID filtering in ghes.c. Each
driver would need to check if the call matched a GUID they knew about, and return
NOTIFY_DONE if they "don't care".

I think this patch would be a lot smaller if it was tweaked to be able to use the existing
API. If there is a reason not to use it, it would be good to know what it is.
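A minimal userspace model of what this suggestion could look like, assuming a simplified notifier chain: the return-code values are copied from include/linux/notifier.h, while model_notifier_block, call_chain(), struct vendor_event and the two callbacks are hypothetical. Each callback checks the section GUID itself and returns NOTIFY_DONE when it does not care; a callback that handles the record returns NOTIFY_STOP, whose NOTIFY_STOP_MASK bit ends the walk. In the kernel, registration and the call would go through atomic_notifier_chain_register() and atomic_notifier_call_chain().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return-code values copied from include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

/* Simplified chain node: no priority, no RCU. */
struct model_notifier_block {
	int (*notifier_call)(struct model_notifier_block *nb,
			     unsigned long action, void *data);
	struct model_notifier_block *next;
};

/* Walk the chain like notifier_call_chain(): stop once a callback sets the stop bit. */
static int call_chain(struct model_notifier_block *nb,
		      unsigned long action, void *data)
{
	int ret = NOTIFY_DONE;

	for (; nb; nb = nb->next) {
		ret = nb->notifier_call(nb, action, data);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Hypothetical payload GHES could pass: section GUID plus the raw record. */
struct vendor_event {
	unsigned char sec_type[16];
	const void *record;
};

/* Byte layout of GUID_INIT(0xB2889FC9, 0xE7D7, 0x4F9D, 0xA8, 0x67, ...). */
static const unsigned char hisi_guid[16] = {
	0xC9, 0x9F, 0x88, 0xB2, 0xD7, 0xE7, 0x9D, 0x4F,
	0xA8, 0x67, 0xAF, 0x42, 0xE9, 0x8B, 0xE7, 0x72 };
static int hisi_calls, other_calls;

/* Each driver filters on the GUID itself. */
static int hisi_notifier(struct model_notifier_block *nb,
			 unsigned long action, void *data)
{
	const struct vendor_event *ev = data;

	(void)nb;
	(void)action;
	if (memcmp(ev->sec_type, hisi_guid, sizeof(hisi_guid)) != 0)
		return NOTIFY_DONE;   /* not ours: let the next callback look */
	hisi_calls++;
	return NOTIFY_STOP;           /* handled: end the walk */
}

static int other_notifier(struct model_notifier_block *nb,
			  unsigned long action, void *data)
{
	(void)nb;
	(void)action;
	(void)data;
	other_calls++;
	return NOTIFY_DONE;
}
```

With the HiSilicon callback first in the chain, a B2889FC9 record stops at it, while any other GUID falls through to the next callback.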


> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 103acbb..69e18d7 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -490,6 +490,109 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)

> +/**
> + * ghes_unregister_event_handler - unregister the previously
> + * registered event handling function.
> + * @sec_type: sec_type of the corresponding CPER.
> + * @data: driver specific data to distinguish devices.
> + */
> +void ghes_unregister_event_handler(guid_t sec_type, void *data)
> +{
> + struct ghes_event_notify *event_notify;
> + bool found = false;
> +
> + mutex_lock(&ghes_event_notify_mutex);
> + rcu_read_lock();
> + list_for_each_entry_rcu(event_notify,
> + &ghes_event_handler_list, list) {
> + if (guid_equal(&event_notify->sec_type, &sec_type)) {

> + if (data != event_notify->data)

It looks like you need multiple drivers to handle the same GUID because of multiple root
ports. Can't the handler lookup the right device?


> + continue;
> + list_del_rcu(&event_notify->list);
> + found = true;
> + break;
> + }
> + }
> + rcu_read_unlock();
> + mutex_unlock(&ghes_event_notify_mutex);
> +
> + if (!found) {
> + pr_err("Tried to unregister a GHES event handler that has not been registered\n");
> + return;
> + }
> +
> + synchronize_rcu();
> + kfree(event_notify);
> +}
> +EXPORT_SYMBOL_GPL(ghes_unregister_event_handler);

> @@ -525,11 +628,14 @@ static void ghes_do_proc(struct ghes *ghes,
>
> log_arm_hw_error(err);
> } else {
> - void *err = acpi_hest_get_payload(gdata);
> -
> - log_non_standard_event(sec_type, fru_id, fru_text,
> - sec_sev, err,
> - gdata->error_data_length);
> + if (!ghes_handle_non_standard_event(sec_type, gdata,
> + sev)) {
> + void *err = acpi_hest_get_payload(gdata);
> +
> + log_non_standard_event(sec_type, fru_id,
> + fru_text, sec_sev, err,
> + gdata->error_data_length);
> + }

So, a side effect of the kernel handling these is they no longer get logged out of trace
points?

I guess the driver that claims this logs some more accurate information. Are there expected
to be any user-space programs doing something useful with B2889FC9... today?


Thanks,

James

2020-03-12 12:12:48

by Shiju Jose

Subject: RE: [PATCH v4 1/2] ACPI: APEI: Add support to notify the vendor specific HW errors

Hi James,

Thanks for reviewing the code.

>-----Original Message-----
>From: [email protected] [mailto:linux-pci-
>[email protected]] On Behalf Of James Morse
>Sent: 11 March 2020 17:30
>To: Shiju Jose <[email protected]>
>Cc: [email protected]; [email protected]; linux-
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected];
>[email protected]; Linuxarm <[email protected]>; Jonathan Cameron
><[email protected]>; tanxiaofei <[email protected]>;
>yangyicong <[email protected]>
>Subject: Re: [PATCH v4 1/2] ACPI: APEI: Add support to notify the vendor specific HW errors
>
>Hi Shiju,
>
>On 07/02/2020 10:31, Shiju Jose wrote:
>> Presently APEI does not support reporting the vendor specific HW
>> errors, received in the vendor defined table entries, to the vendor
>> drivers for any recovery.
>>
>> This patch adds the support to register and unregister the error
>> handling function for the vendor specific HW errors and notify the
>> registered kernel driver.
>
>Is it possible to use the kernel's existing atomic_notifier_chain_register() API for
>this?
>
>The one thing that can't be done in the same way is the GUID filtering in ghes.c.
>Each driver would need to check if the call matched a GUID they knew about,
>and return NOTIFY_DONE if they "don't care".
>
>I think this patch would be a lot smaller if it was tweaked to be able to use the
>existing API. If there is a reason not to use it, it would be good to know what it
>is.
I think using atomic_notifier_chain_register() has the following limitations:
1. All the registered error handlers would be called, even for errors not related to those handlers.
This may also lead to mishandling of the error information if a handler does not
implement GUID checking, etc.
2. atomic_notifier_chain_register() (notifier_chain_register()) does not appear to support
passing the handler's private data at registration time, to be passed back later
to the handler in the callback notifier_fn_t(..., void *data).
3. It was also difficult to pass the GHES error data (acpi_hest_generic_data) and the GUID
of the received error to the handler through the notifier chain callback interface.

Sorry if I did not understand your suggestion correctly.

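On point 2, the usual kernel idiom is to embed the notifier_block in the driver's per-device structure and recover that structure in the callback with container_of(); the GUID and the acpi_hest_generic_data pointer (point 3) could then travel together in the void *data argument of the chain call. A minimal userspace sketch of the container_of() part, with hypothetical names (model_container_of, struct hisi_pcie_err_dev, hisi_pcie_notify) and NOTIFY_* values copied locally:

```c
#include <assert.h>
#include <stddef.h>

#define NOTIFY_DONE 0x0000                 /* values from include/linux/notifier.h */
#define NOTIFY_STOP (0x0001 | 0x8000)

/* Like the kernel's container_of(), minus its type checking. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct model_notifier_block {
	int (*notifier_call)(struct model_notifier_block *nb,
			     unsigned long action, void *data);
};

/* Hypothetical per-device state with the notifier embedded in it. */
struct hisi_pcie_err_dev {
	unsigned char socket;
	int handled;
	struct model_notifier_block nb;
};

struct model_record { unsigned char socket_id; };

static int hisi_pcie_notify(struct model_notifier_block *nb,
			    unsigned long action, void *data)
{
	/* Recover this device's private data from the embedded notifier. */
	struct hisi_pcie_err_dev *dev =
		model_container_of(nb, struct hisi_pcie_err_dev, nb);
	const struct model_record *rec = data;

	(void)action;
	if (rec->socket_id != dev->socket)
		return NOTIFY_DONE;   /* another socket's device may claim it */
	dev->handled++;
	return NOTIFY_STOP;
}
```

Each device registers its own embedded notifier, so no separate private-data argument is needed at registration time.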
>
>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 103acbb..69e18d7 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -490,6 +490,109 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
>
>> +/**
>> + * ghes_unregister_event_handler - unregister the previously
>> + * registered event handling function.
>> + * @sec_type: sec_type of the corresponding CPER.
>> + * @data: driver specific data to distinguish devices.
>> + */
>> +void ghes_unregister_event_handler(guid_t sec_type, void *data)
>> +{
>> + struct ghes_event_notify *event_notify;
>> + bool found = false;
>> +
>> + mutex_lock(&ghes_event_notify_mutex);
>> + rcu_read_lock();
>> + list_for_each_entry_rcu(event_notify,
>> + &ghes_event_handler_list, list) {
>> + if (guid_equal(&event_notify->sec_type, &sec_type)) {
>
>> + if (data != event_notify->data)
>
>It looks like you need multiple drivers to handle the same GUID because of
>multiple root ports. Can't the handler lookup the right device?
This check is needed because a GUID may be shared among multiple devices handled by one driver,
as in the B2889FC9 driver (pcie-hisi-error.c).

>
>
>> + continue;
>> + list_del_rcu(&event_notify->list);
>> + found = true;
>> + break;
>> + }
>> + }
>> + rcu_read_unlock();
>> + mutex_unlock(&ghes_event_notify_mutex);
>> +
>> + if (!found) {
>> + pr_err("Tried to unregister a GHES event handler that has not been registered\n");
>> + return;
>> + }
>> +
>> + synchronize_rcu();
>> + kfree(event_notify);
>> +}
>> +EXPORT_SYMBOL_GPL(ghes_unregister_event_handler);
>
>> @@ -525,11 +628,14 @@ static void ghes_do_proc(struct ghes *ghes,
>>
>> log_arm_hw_error(err);
>> } else {
>> - void *err = acpi_hest_get_payload(gdata);
>> -
>> - log_non_standard_event(sec_type, fru_id, fru_text,
>> - sec_sev, err,
>> - gdata->error_data_length);
>> + if (!ghes_handle_non_standard_event(sec_type, gdata,
>> + sev)) {
>> + void *err = acpi_hest_get_payload(gdata);
>> +
>> + log_non_standard_event(sec_type, fru_id,
>> + fru_text, sec_sev, err,
>> + gdata->error_data_length);
>> + }
>
>So, a side effect of the kernel handling these is they no longer get logged out of
>trace points?
>
>I guess the driver that claims this logs some more accurate information. Are
>there expected to be any user-space programs doing something useful with
>B2889FC9... today?
No corresponding user-space programs are expected for the B2889FC9 driver.
The driver is mainly for error recovery and basic error decoding and logging.
We previously added error logging for B2889FC9 to rasdaemon.
>
>
>Thanks,
>
>James

Thanks,
Shiju

2020-03-13 15:18:20

by James Morse

Subject: Re: [PATCH v4 1/2] ACPI: APEI: Add support to notify the vendor specific HW errors

Hi Shiju,

On 3/12/20 12:10 PM, Shiju Jose wrote:
>> On 07/02/2020 10:31, Shiju Jose wrote:
>>> Presently APEI does not support reporting the vendor specific HW
>>> errors, received in the vendor defined table entries, to the vendor
>>> drivers for any recovery.
>>>
>>> This patch adds the support to register and unregister the error
>>> handling function for the vendor specific HW errors and notify the
>>> registered kernel driver.
>>
>> Is it possible to use the kernel's existing atomic_notifier_chain_register() API for
>> this?
>>
>> The one thing that can't be done in the same way is the GUID filtering in ghes.c.
>> Each driver would need to check if the call matched a GUID they knew about,
>> and return NOTIFY_DONE if they "don't care".
>>
>> I think this patch would be a lot smaller if it was tweaked to be able to use the
>> existing API. If there is a reason not to use it, it would be good to know what it
>> is.
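A minimal userspace model of the per-driver GUID filtering suggested above, for illustration only. It assumes simplified stand-ins: `guid_model_t` and `guid_model_equal()` replace the kernel's `guid_t`/`guid_equal()`, the callback takes the section type directly instead of using the real `notifier_fn_t(struct notifier_block *, unsigned long, void *)` signature, and the GUID bytes are placeholders rather than the real vendor section type. Only the `NOTIFY_*` values are taken from `include/linux/notifier.h`.

```c
#include <stdbool.h>
#include <string.h>

/* Return codes as defined in include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000 /* don't care */
#define NOTIFY_OK        0x0001 /* suits me */
#define NOTIFY_STOP_MASK 0x8000 /* don't call further */
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

/* Simplified stand-in for the kernel's guid_t. */
typedef struct {
	unsigned char b[16];
} guid_model_t;

static bool guid_model_equal(const guid_model_t *a, const guid_model_t *b)
{
	return memcmp(a->b, b->b, sizeof(a->b)) == 0;
}

/* Placeholder section type; not the real vendor GUID. */
static const guid_model_t my_vendor_sec_type = { { 0xaa, 0xbb } };

/*
 * Each registered handler sees every vendor record, so it checks the
 * section type itself and returns NOTIFY_DONE for records it does not own.
 */
static int vendor_err_notify(const guid_model_t *sec_type, void *payload)
{
	if (!guid_model_equal(sec_type, &my_vendor_sec_type))
		return NOTIFY_DONE;

	/* Decode the payload and attempt recovery here. */
	(void)payload;

	/* Claimed: stop the chain so later handlers are not called. */
	return NOTIFY_STOP;
}
```

The filtering moves from ghes.c into each handler, which is the trade-off discussed in the rest of the thread.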

> I think using atomic_notifier_chain_register() has the following limitations:
> 1. All the registered error handlers would get called, even when an error is not related to them.

The notifier chain provides NOTIFY_STOP_MASK, so that one of the handlers
can say the work is done. We only expect a handful of these, so I don't
think there is going to be a scalability problem.
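To illustrate the NOTIFY_STOP_MASK behaviour, here is a simplified userspace model of the chain walk. `struct notifier_block_model`, `call_chain_model()`, `ignore_cb()` and `claim_cb()` are hypothetical stand-ins; the kernel's own `notifier_call_chain()` also handles priorities and RCU, and atomic chains invoke callbacks under `rcu_read_lock()`, so those callbacks must not sleep.

```c
#include <stddef.h>

/* Return codes as defined in include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

/* Simplified stand-in for struct notifier_block (no priority field). */
struct notifier_block_model {
	int (*notifier_call)(struct notifier_block_model *nb,
			     unsigned long action, void *data);
	struct notifier_block_model *next;
};

/*
 * Call each handler in order and stop at the first one whose return
 * value has NOTIFY_STOP_MASK set. *nr_calls counts handlers invoked.
 */
static int call_chain_model(struct notifier_block_model *head,
			    unsigned long action, void *data, int *nr_calls)
{
	struct notifier_block_model *nb;
	int ret = NOTIFY_DONE;

	*nr_calls = 0;
	for (nb = head; nb; nb = nb->next) {
		ret = nb->notifier_call(nb, action, data);
		(*nr_calls)++;
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Hypothetical handlers: one ignores the record, one claims it. */
static int ignore_cb(struct notifier_block_model *nb,
		     unsigned long action, void *data)
{
	(void)nb;
	(void)action;
	(void)data;
	return NOTIFY_DONE;
}

static int claim_cb(struct notifier_block_model *nb,
		    unsigned long action, void *data)
{
	(void)nb;
	(void)action;
	(void)data;
	return NOTIFY_STOP;
}
```

With a chain of ignore_cb, claim_cb, ignore_cb, only the first two handlers run and the walk returns NOTIFY_STOP.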


> Also this may lead to mishandling of the error information if a handler does not
> implement GUID checking etc.

Which would be a bug we can fix.
There is no point worrying about bugs in out-of-tree code.


> 2. atomic_notifier_chain_register() (notifier_chain_register()) does not appear to support
> passing the handler's private data at registration time, to be passed back later to the
> handler in the callback notifier_fn_t(..., void *data).

The callback is provided with the struct notifier_block. A bit of
container_of() magic will give you whatever structure you embedded it in!
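A sketch of the container_of() pattern, assuming the driver embeds its notifier block in its own private structure. The macro below is a simplified userspace version without the kernel's type checking, and `struct notifier_block_model`, `struct hisi_err_priv` and `hisi_err_notify()` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Userspace container_of(), without the kernel's type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for struct notifier_block. */
struct notifier_block_model {
	int (*notifier_call)(struct notifier_block_model *nb,
			     unsigned long action, void *data);
};

/* Hypothetical driver state with the notifier block embedded in it. */
struct hisi_err_priv {
	int socket_id;
	unsigned int nr_handled;
	struct notifier_block_model nb;
};

static int hisi_err_notify(struct notifier_block_model *nb,
			   unsigned long action, void *data)
{
	/* Step back from the embedded member to the enclosing struct. */
	struct hisi_err_priv *priv =
		container_of(nb, struct hisi_err_priv, nb);

	(void)action;
	(void)data;
	priv->nr_handled++;
	return 0x0001; /* NOTIFY_OK */
}
```

Registration would then hand `&priv->nb` to the chain, so no separate private-data argument is needed at registration time.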


> 3. It is also difficult to pass the GHES error data (acpi_hest_generic_data) and the GUID
> of the received error to the handler through the notifier chain callback interface.

Here you've lost me. Because you need to pass more than one thing? Can't
we have a struct for that?

But, isn't it all in struct acpi_hest_generic_data already? That is
where the guid and severity come from.
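A sketch of passing a single pointer as the notifier's `void *data`, assuming it points at the GHES section header. `struct gdata_model` is a trimmed-down, hypothetical stand-in for ACPICA's `struct acpi_hest_generic_data` (the real structure has more fields, and the payload is reached with `acpi_hest_get_payload()`); `decode_vendor_record()` and the GUID bytes are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Trimmed-down stand-in for ACPICA's struct acpi_hest_generic_data:
 * only the fields used below are kept.
 */
struct gdata_model {
	uint8_t section_type[16];
	uint32_t error_severity;
	uint32_t error_data_length;
};

/* Placeholder section type; not the real vendor GUID. */
static const uint8_t my_sec_type[16] = { 0xaa, 0xbb };

/*
 * The notifier's void *data is a single pointer, but the section type,
 * severity and payload length are all reachable through it.
 */
static bool decode_vendor_record(void *data, uint32_t *severity,
				 uint32_t *len)
{
	const struct gdata_model *gdata = data;

	if (memcmp(gdata->section_type, my_sec_type, sizeof(my_sec_type)))
		return false; /* not ours: the callback would return NOTIFY_DONE */

	*severity = gdata->error_severity;
	*len = gdata->error_data_length;
	return true;
}
```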


>>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>>> index 103acbb..69e18d7 100644
>>> --- a/drivers/acpi/apei/ghes.c
>>> +++ b/drivers/acpi/apei/ghes.c
>>> @@ -490,6 +490,109 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
>>
>>> +/**
>>> + * ghes_unregister_event_handler - unregister the previously
>>> + * registered event handling function.
>>> + * @sec_type: sec_type of the corresponding CPER.
>>> + * @data: driver specific data to distinguish devices.
>>> + */
>>> +void ghes_unregister_event_handler(guid_t sec_type, void *data) {
>>> + struct ghes_event_notify *event_notify;
>>> + bool found = false;
>>> +
>>> + mutex_lock(&ghes_event_notify_mutex);
>>> + rcu_read_lock();
>>> + list_for_each_entry_rcu(event_notify,
>>> + &ghes_event_handler_list, list) {
>>> + if (guid_equal(&event_notify->sec_type, &sec_type)) {
>>
>>> + if (data != event_notify->data)
>>
>> It looks like you need multiple drivers to handle the same GUID because of
>> multiple root ports. Can't the handler lookup the right device?

> This check is needed because the GUID is shared among multiple devices handled by one driver,
> as in the B2889FC9 driver (pcie-hisi-error.c).

(we should stop calling it by its guid ... does it have a name?!)


This must be some kind of error collector for a bus, right?

I agree we may need to have multiple drivers register to handle vendor
events, but it looks like you are registering the same handler multiple
times, with different private structures.

Can't it find the affected device from the error description?
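A sketch of a single handler that finds the affected device from the record, instead of registering the same handler once per device with different private data. The record fields `socket_id` and `module_id` are assumptions loosely based on the cover letter's mention of socket and sub-module IDs; the real HiSilicon record layout may differ, and the plain array stands in for whatever device list (and locking) the driver actually uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record fields identifying the reporting controller. */
struct vendor_err_record {
	uint8_t socket_id;
	uint8_t module_id;
};

/* Hypothetical per-controller state, one entry per probed device. */
struct pcie_ctrl {
	uint8_t socket_id;
	uint8_t module_id;
	unsigned int nr_errors;
};

static struct pcie_ctrl *find_ctrl(struct pcie_ctrl *ctrls, size_t n,
				   const struct vendor_err_record *rec)
{
	for (size_t i = 0; i < n; i++) {
		if (ctrls[i].socket_id == rec->socket_id &&
		    ctrls[i].module_id == rec->module_id)
			return &ctrls[i];
	}
	return NULL;
}

/*
 * One handler for the whole driver: the affected device is looked up
 * from the error record rather than from per-registration private data.
 */
static int handle_vendor_error(struct pcie_ctrl *ctrls, size_t n,
			       const struct vendor_err_record *rec)
{
	struct pcie_ctrl *ctrl = find_ctrl(ctrls, n, rec);

	if (!ctrl)
		return 0x0000; /* NOTIFY_DONE: no matching controller */

	ctrl->nr_errors++;
	return 0x0001; /* NOTIFY_OK */
}
```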


>>> @@ -525,11 +628,14 @@ static void ghes_do_proc(struct ghes *ghes,
>>>
>>> log_arm_hw_error(err);
>>> } else {
>>> - void *err = acpi_hest_get_payload(gdata);
>>> -
>>> - log_non_standard_event(sec_type, fru_id, fru_text,
>>> - sec_sev, err,
>>> - gdata->error_data_length);
>>> + if (!ghes_handle_non_standard_event(sec_type, gdata,
>>> + sev)) {
>>> + void *err = acpi_hest_get_payload(gdata);
>>> +
>>> + log_non_standard_event(sec_type, fru_id,
>>> + fru_text, sec_sev, err,
>>> + gdata->error_data_length);
>>> + }
>>
>> So, a side effect of the kernel handling these is they no longer get logged out of
>> trace points?
>>
>> I guess the driver that claims this logs some more accurate information. Are
>> there expected to be any user-space programs doing something useful with
>> B2889FC9... today?

> No corresponding user-space programs are expected for the B2889FC9 driver.
> The driver is mainly for error recovery and basic error decoding and logging.

> We previously added error logging for B2889FC9 to rasdaemon.

So this series would break the error logging in rasdaemon.

User-space would need to be upgraded to receive the trace information
from the specific driver instead. (how does it know?!)

Could we log_non_standard_event() unconditionally, maybe adding a field
to indicate that a driver claimed it, so there may be more data
somewhere else...
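A sketch of the suggestion to log unconditionally and flag records that a driver claimed. Everything here is hypothetical: `log_non_standard_event_model()` stands in for `log_non_standard_event()` (whose real signature has no `claimed` argument), the fixed-size `trace_buf` stands in for the trace event, and `sample_handler()` stands in for a registered vendor driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical trace record: payload length plus the proposed flag. */
struct non_std_event {
	size_t len;
	bool claimed; /* true if a kernel driver also handled the record */
};

#define MAX_EVENTS 8
static struct non_std_event trace_buf[MAX_EVENTS];
static size_t nr_events;

/* Stand-in for the trace point behind log_non_standard_event(). */
static void log_non_standard_event_model(size_t len, bool claimed)
{
	if (nr_events < MAX_EVENTS)
		trace_buf[nr_events++] = (struct non_std_event){ len, claimed };
}

/* Hypothetical vendor handler; returns true when it claims the record. */
typedef bool (*vendor_handler_fn)(size_t len);

static bool sample_handler(size_t len)
{
	return len > 0;
}

static void process_vendor_section(size_t len, vendor_handler_fn handler)
{
	/* Offer the record to a driver first (NULL: none registered). */
	bool claimed = handler && handler(len);

	/* Log unconditionally so existing user space still sees the event. */
	log_non_standard_event_model(len, claimed);
}
```

Unlike the v4 patch, the trace event is emitted whether or not a driver handled the record, so rasdaemon keeps receiving it and can use the flag to look for more data elsewhere.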


Thanks,

James

2020-03-13 17:10:30

by Shiju Jose

Subject: RE: [PATCH v4 1/2] ACPI: APEI: Add support to notify the vendor specific HW errors

Hi James,

>-----Original Message-----
>From: James Morse [mailto:[email protected]]
>Sent: 13 March 2020 15:17
>To: Shiju Jose <[email protected]>
>Cc: [email protected]; [email protected]; linux-
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected];
>[email protected]; Linuxarm <[email protected]>; Jonathan Cameron
><[email protected]>; tanxiaofei <[email protected]>;
>yangyicong <[email protected]>
>Subject: Re: [PATCH v4 1/2] ACPI: APEI: Add support to notify the vendor
>specific HW errors
>
>Hi Shiju,
>
>On 3/12/20 12:10 PM, Shiju Jose wrote:
>>> On 07/02/2020 10:31, Shiju Jose wrote:
>>>> Presently APEI does not support reporting the vendor specific HW
>>>> errors, received in the vendor defined table entries, to the vendor
>>>> drivers for any recovery.
>>>>
>>>> This patch adds the support to register and unregister the error
>>>> handling function for the vendor specific HW errors and notify the
>>>> registered kernel driver.
>>>
>>> Is it possible to use the kernel's existing
>>> atomic_notifier_chain_register() API for this?
>>>
>>> The one thing that can't be done in the same way is the GUID filtering in ghes.c.
>>> Each driver would need to check if the call matched a GUID they knew
>>> about, and return NOTIFY_DONE if they "don't care".
>>>
>>> I think this patch would be a lot smaller if it was tweaked to be
>>> able to use the existing API. If there is a reason not to use it, it
>>> would be good to know what it is.
>
>> I think using atomic_notifier_chain_register() has the following limitations:
>> 1. All the registered error handlers would get called, even when an error is not related to them.
>
>The notifier chain provides NOTIFY_STOP_MASK, so that one of the handlers can
>say the work is done. We only expect a handful of these, so I don't think there is
>going to be a scalability problem.
OK. I will try reporting the errors through an atomic notifier chain and test it.

>
>
>> Also, this may lead to mishandling of the error information if a handler does not
>> implement GUID checking, etc.
>
>Which would be a bug we can fix.
>There is no point worrying about bugs in out-of-tree code.
Ok.

>
>
>> 2. atomic_notifier_chain_register() (notifier_chain_register()) does not appear to support
>> passing the handler's private data at registration time, to be passed back later to the
>> handler in the callback notifier_fn_t(..., void *data).
>
>The callback is provided with the struct notifier_block. A bit of
>container_of() magic will give you whatever structure you embedded it in!
Ok. I will check this.

>
>
>> 3. It is also difficult to pass the GHES error data (acpi_hest_generic_data) and the GUID
>> of the received error to the handler through the notifier chain callback interface.
>
>Here you've lost me. Because you need to pass more than one thing? Can't we
>have a struct for that?
>
>But, isn't it all in struct acpi_hest_generic_data already? That is where the guid
>and severity come from.
OK, right.

>
>
>>>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>>>> index 103acbb..69e18d7 100644
>>>> --- a/drivers/acpi/apei/ghes.c
>>>> +++ b/drivers/acpi/apei/ghes.c
>>>> @@ -490,6 +490,109 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
>>>
>>>> +/**
>>>> + * ghes_unregister_event_handler - unregister the previously
>>>> + * registered event handling function.
>>>> + * @sec_type: sec_type of the corresponding CPER.
>>>> + * @data: driver specific data to distinguish devices.
>>>> + */
>>>> +void ghes_unregister_event_handler(guid_t sec_type, void *data) {
>>>> + struct ghes_event_notify *event_notify;
>>>> + bool found = false;
>>>> +
>>>> + mutex_lock(&ghes_event_notify_mutex);
>>>> + rcu_read_lock();
>>>> + list_for_each_entry_rcu(event_notify,
>>>> + &ghes_event_handler_list, list) {
>>>> + if (guid_equal(&event_notify->sec_type, &sec_type)) {
>>>
>>>> + if (data != event_notify->data)
>>>
>>> It looks like you need multiple drivers to handle the same GUID
>>> because of multiple root ports. Can't the handler lookup the right device?
>
>> This check is needed because the GUID is shared among multiple devices handled by one
>> driver, as in the B2889FC9 driver (pcie-hisi-error.c).
>
>(we should stop calling it by its guid ... does it have a name?!)
>
>
>This must be some kind of error collector for a bus, right?
>
>I agree we may need to have multiple drivers register to handle vendor events,
>but it looks like you are registering the same handler multiple times, with
>different private structures.
>
>Can't it find the affected device from the error description?
Yes. We already have the code in the PCIe error handling driver to identify the right device
from the error information.

>
>
>>>> @@ -525,11 +628,14 @@ static void ghes_do_proc(struct ghes *ghes,
>>>>
>>>> log_arm_hw_error(err);
>>>> } else {
>>>> - void *err = acpi_hest_get_payload(gdata);
>>>> -
>>>> - log_non_standard_event(sec_type, fru_id, fru_text,
>>>> - sec_sev, err,
>>>> - gdata->error_data_length);
>>>> + if (!ghes_handle_non_standard_event(sec_type, gdata,
>>>> + sev)) {
>>>> + void *err = acpi_hest_get_payload(gdata);
>>>> +
>>>> + log_non_standard_event(sec_type, fru_id,
>>>> + fru_text, sec_sev, err,
>>>> + gdata->error_data_length);
>>>> + }
>>>
>>> So, a side effect of the kernel handling these is they no longer get
>>> logged out of trace points?
>>>
>>> I guess the driver that claims this logs some more accurate
>>> information. Are there expected to be any user-space programs doing
>>> something useful with B2889FC9... today?
>
>> No corresponding user-space programs are expected for the B2889FC9 driver.
>> The driver is mainly for error recovery and basic error decoding and logging.
>
>> We previously added error logging for B2889FC9 to rasdaemon.
>
>So this series would break the error logging in rasdaemon.
It does not affect the logging information presented to the user for HiSilicon PCIe controller errors,
because the level of logging information is the same in rasdaemon and in the
newly added HiSilicon PCIe controller error handling driver.
>
>User-space would need to be upgraded to receive the trace information from
>the specific driver instead. (how does it know?!)
>
>Could we log_non_standard_event() unconditionally, maybe adding a field to
>indicate that a driver claimed it, so there may be more data somewhere else...
Sure, I will look into adding a field to indicate that a driver claimed the error, and
always calling log_non_standard_event().
>
>
>Thanks,
>
>James

Thanks,
Shiju