From: Sean V Kelley <[email protected]>
Changes since v8 [1] and based on discussion [2] and pci/err tree [3]:
- No functional changes. Tested with AER injection.
PCI/AER: Apply function level reset to RCiEP on fatal error
- Remove. Handle with pcie_flr() directly when adding linked RCEC to AER/ERR.
PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR
- Just call pcie_flr() and remove the need for wrapping with flr_on_rciep(). Note it
appears that a check on pcie_has_flr() (as also used in flr_on_rciep()) relates to
hardware-specific quirks, so I've added it.
- Consolidate AER register setting in aer_root_reset() with a test for the non-native case.
With that change, simplify "state == pci_channel_io_frozen" case by removing tests for the
non-native case. Also simplify pci_walk_bridge().
(Bjorn Helgaas)
[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/lkml/20201009213011.GA3504871@bjorn-Precision-5520/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/err
Root Complex Event Collectors (RCEC) provide support for terminating error
and PME messages from Root Complex Integrated Endpoints (RCiEPs). An RCEC
resides on a Bus in the Root Complex. Multiple RCECs can in fact reside on
a single bus. An RCEC will explicitly declare supported RCiEPs through the
Root Complex Endpoint Association Extended Capability.
(See PCIe 5.0-1, sections 1.3.2.3 (RCiEP), and 7.9.10 (RCEC Ext. Cap.))
The kernel lacks handling for these RCECs and the error messages received
from their respective associated RCiEPs. More recently, a new CPU
interconnect, Compute Express Link (CXL), depends on RCEC capabilities for
error messaging from RCiEP devices that support CXL 1.1.
DocLink: https://www.computeexpresslink.org/
This use case is not limited to CXL. Existing hardware already includes
support for RCECs, such as the Denverton microserver product family, and
more hardware is expected to follow.
(See Intel Document, Order number: 33061-003US)
So services such as AER or PME could be associated with an RCEC driver.
In the case of CXL, if an RCiEP (i.e., CXL 1.1 device) is associated with a
platform's RCEC it shall signal PME and AER error conditions through that
RCEC.
Towards the above use cases, add the missing RCEC class and extend the
PCIe Root Port and service drivers to allow association of RCiEPs to their
respective parent RCEC and facilitate handling of terminating error and PME
messages.
Tested-by: Jonathan Cameron <[email protected]> #non-native/no RCEC
Qiuxu Zhuo (4):
PCI/RCEC: Add RCEC class code and extended capability
PCI/RCEC: Bind RCEC devices to the Root Port driver
PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR
PCI/AER: Add RCEC AER error injection support
Sean V Kelley (11):
PCI/RCEC: Cache RCEC capabilities in pci_init_capabilities()
PCI/ERR: Rename reset_link() to reset_subordinates()
PCI/ERR: Simplify by using pci_upstream_bridge()
PCI/ERR: Simplify by computing pci_pcie_type() once
PCI/ERR: Use "bridge" for clarity in pcie_do_recovery()
PCI/ERR: Avoid negated conditional for clarity
PCI/ERR: Add pci_walk_bridge() to pcie_do_recovery()
PCI/ERR: Limit AER resets in pcie_do_recovery()
PCI/RCEC: Add pcie_link_rcec() to associate RCiEPs
PCI/AER: Add pcie_walk_rcec() to RCEC AER handling
PCI/PME: Add pcie_walk_rcec() to RCEC PME handling
drivers/pci/pci.h | 29 ++++-
drivers/pci/pcie/Makefile | 2 +-
drivers/pci/pcie/aer.c | 82 ++++++++++----
drivers/pci/pcie/aer_inject.c | 5 +-
drivers/pci/pcie/err.c | 93 +++++++++++-----
drivers/pci/pcie/pme.c | 15 ++-
drivers/pci/pcie/portdrv_core.c | 9 +-
drivers/pci/pcie/portdrv_pci.c | 8 +-
drivers/pci/pcie/rcec.c | 190 ++++++++++++++++++++++++++++++++
drivers/pci/probe.c | 2 +
include/linux/pci.h | 5 +
include/linux/pci_ids.h | 1 +
include/uapi/linux/pci_regs.h | 7 ++
13 files changed, 384 insertions(+), 64 deletions(-)
create mode 100644 drivers/pci/pcie/rcec.c
--
2.28.0
From: Sean V Kelley <[email protected]>
reset_link() appears to be misnamed. The point is to reset any devices
below a given bridge, so rename it to reset_subordinates() to make it clear
that we are passing a bridge with the intent to reset the devices below it.
[bhelgaas: fix reset_subordinate_device() typo, shorten name]
Suggested-by: Bjorn Helgaas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean V Kelley <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Acked-by: Jonathan Cameron <[email protected]>
---
drivers/pci/pci.h | 4 ++--
drivers/pci/pcie/err.c | 8 ++++----
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index af98b7d2134b..bc2340971a50 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -573,8 +573,8 @@ static inline int pci_dev_specific_disable_acs_redir(struct pci_dev *dev)
/* PCI error reporting and recovery */
pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
- pci_channel_state_t state,
- pci_ers_result_t (*reset_link)(struct pci_dev *pdev));
+ pci_channel_state_t state,
+ pci_ers_result_t (*reset_subordinates)(struct pci_dev *pdev));
bool pcie_wait_for_link(struct pci_dev *pdev, bool active);
#ifdef CONFIG_PCIEASPM
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index c543f419d8f9..db149c6ce4fb 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -147,8 +147,8 @@ static int report_resume(struct pci_dev *dev, void *data)
}
pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
- pci_channel_state_t state,
- pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
+ pci_channel_state_t state,
+ pci_ers_result_t (*reset_subordinates)(struct pci_dev *pdev))
{
pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
struct pci_bus *bus;
@@ -165,9 +165,9 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_dbg(dev, "broadcast error_detected message\n");
if (state == pci_channel_io_frozen) {
pci_walk_bus(bus, report_frozen_detected, &status);
- status = reset_link(dev);
+ status = reset_subordinates(dev);
if (status != PCI_ERS_RESULT_RECOVERED) {
- pci_warn(dev, "link reset failed\n");
+ pci_warn(dev, "subordinate device reset failed\n");
goto failed;
}
} else {
--
2.28.0
From: Sean V Kelley <[email protected]>
Instead of calling pci_pcie_type(dev) twice, call it once and save the
result. No functional change intended.
[bhelgaas: split to separate patch]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean V Kelley <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Acked-by: Jonathan Cameron <[email protected]>
---
drivers/pci/pcie/err.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 05f61da5ed9d..7a5af873d8bc 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -150,6 +150,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_channel_state_t state,
pci_ers_result_t (*reset_subordinates)(struct pci_dev *pdev))
{
+ int type = pci_pcie_type(dev);
pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
struct pci_bus *bus;
@@ -157,8 +158,8 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
* Error recovery runs on all subordinates of the first downstream port.
* If the downstream port detected the error, it is cleared at the end.
*/
- if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
- pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
+ if (!(type == PCI_EXP_TYPE_ROOT_PORT ||
+ type == PCI_EXP_TYPE_DOWNSTREAM))
dev = pci_upstream_bridge(dev);
bus = dev->subordinate;
--
2.28.0
From: Qiuxu Zhuo <[email protected]>
If a Root Complex Integrated Endpoint (RCiEP) is implemented, it may signal
errors through a Root Complex Event Collector (RCEC). Each RCiEP must be
associated with no more than one RCEC.
For an RCEC (which is technically not a Bridge), error messages "received"
from associated RCiEPs must be enabled for "transmission" in order to cause
a System Error via the Root Control register or (when the Advanced Error
Reporting Capability is present) reporting via the Root Error Command
register and logging in the Root Error Status register and Error Source
Identification register.
Given the commonality with Root Ports and the need to also support AER and
PME services for RCECs, extend the Root Port driver to support RCEC devices
by adding the RCEC Class ID to the driver structure.
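As a rough illustration of why adding the class entry is enough for the driver core to bind to an RCEC, here is a minimal userspace sketch of class/mask matching. The struct and function names (`class_id`, `port_class_matches`) are hypothetical stand-ins, not kernel symbols; the comparison mirrors the usual "XOR, then mask" test on the 24-bit class code, and a mask of ~0 means the programming interface must match as well.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_CLASS_BRIDGE_PCI	0x0604
#define PCI_CLASS_SYSTEM_RCEC	0x0807	/* added by this series */

/* Hypothetical stand-in for the class fields of struct pci_device_id */
struct class_id {
	uint32_t dev_class;	/* base class, sub-class, prog-if */
	uint32_t class_mask;
};

/* Same three entries as port_pci_ids[] after this patch */
static const struct class_id port_ids[] = {
	{ (PCI_CLASS_BRIDGE_PCI << 8) | 0x00, ~0u },
	{ (PCI_CLASS_BRIDGE_PCI << 8) | 0x01, ~0u },
	{ (PCI_CLASS_SYSTEM_RCEC << 8) | 0x00, ~0u },
};

/* Return true if the 24-bit class code matches any table entry */
bool port_class_matches(uint32_t dev_class)
{
	size_t i;

	for (i = 0; i < sizeof(port_ids) / sizeof(port_ids[0]); i++) {
		if (!((port_ids[i].dev_class ^ dev_class) &
		      port_ids[i].class_mask))
			return true;
	}
	return false;
}
```

With this table, a device reporting class 0x080700 matches, while 0x080701 (different programming interface) does not.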
Co-developed-by: Sean V Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean V Kelley <[email protected]>
Signed-off-by: Qiuxu Zhuo <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
---
drivers/pci/pcie/portdrv_pci.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 3a3ce40ae1ab..4d880679b9b1 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -106,7 +106,8 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
if (!pci_is_pcie(dev) ||
((pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
(pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM) &&
- (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM)))
+ (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM) &&
+ (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)))
return -ENODEV;
status = pcie_port_device_register(dev);
@@ -195,6 +196,8 @@ static const struct pci_device_id port_pci_ids[] = {
{ PCI_DEVICE_CLASS(((PCI_CLASS_BRIDGE_PCI << 8) | 0x00), ~0) },
/* subtractive decode PCI-to-PCI bridge, class type is 060401h */
{ PCI_DEVICE_CLASS(((PCI_CLASS_BRIDGE_PCI << 8) | 0x01), ~0) },
+ /* handle any Root Complex Event Collector */
+ { PCI_DEVICE_CLASS(((PCI_CLASS_SYSTEM_RCEC << 8) | 0x00), ~0) },
{ },
};
--
2.28.0
From: Sean V Kelley <[email protected]>
Extend support for Root Complex Event Collectors by decoding and caching
the RCEC Endpoint Association Extended Capabilities when enumerating. Use
that cached information for later error source reporting. See PCIe r5.0,
sec 7.9.10.
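To make the use of the cached data more concrete, the following userspace sketch shows one way the cached association could be consulted. It is an assumption, not the helper used by this series: `rciep_is_associated()` and its parameters are hypothetical, and it assumes the bitmap describes device numbers on the RCEC's own bus while the nextbusn..lastbusn range describes other buses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same layout as the cached struct rcec_ea in this patch */
struct rcec_ea {
	uint8_t nextbusn;
	uint8_t lastbusn;
	uint32_t bitmap;
};

/*
 * Hypothetical check: is the RCiEP at (rciep_bus, rciep_devn) associated
 * with the RCEC that lives on rcec_bus and owns the cached data @ea?
 */
bool rciep_is_associated(const struct rcec_ea *ea, uint8_t rcec_bus,
			 uint8_t rciep_bus, uint8_t rciep_devn)
{
	/* Same bus as the RCEC: one bitmap bit per device number */
	if (rciep_bus == rcec_bus)
		return rciep_devn < 32 &&
		       (ea->bitmap & (UINT32_C(1) << rciep_devn));

	/*
	 * Other buses: covered by the bus number range. With the
	 * nextbusn = 0xff, lastbusn = 0x00 defaults the range is empty,
	 * so no capability version check is needed here.
	 */
	return rciep_bus >= ea->nextbusn && rciep_bus <= ea->lastbusn;
}
```

The empty-range default is what lets later code skip the capability version test entirely.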
[bhelgaas: make pci_rcec_init() void, set dev->rcec_ea after filling it]
Suggested-by: Bjorn Helgaas <[email protected]>
Co-developed-by: Qiuxu Zhuo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Qiuxu Zhuo <[email protected]>
Signed-off-by: Sean V Kelley <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
---
drivers/pci/pci.h | 17 +++++++++++
drivers/pci/pcie/Makefile | 2 +-
drivers/pci/pcie/rcec.c | 59 +++++++++++++++++++++++++++++++++++++++
drivers/pci/probe.c | 2 ++
include/linux/pci.h | 4 +++
5 files changed, 83 insertions(+), 1 deletion(-)
create mode 100644 drivers/pci/pcie/rcec.c
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index fa12f7cbc1a0..af98b7d2134b 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -449,6 +449,15 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info);
void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
#endif /* CONFIG_PCIEAER */
+#ifdef CONFIG_PCIEPORTBUS
+/* Cached RCEC Endpoint Association */
+struct rcec_ea {
+ u8 nextbusn;
+ u8 lastbusn;
+ u32 bitmap;
+};
+#endif
+
#ifdef CONFIG_PCIE_DPC
void pci_save_dpc_state(struct pci_dev *dev);
void pci_restore_dpc_state(struct pci_dev *dev);
@@ -461,6 +470,14 @@ static inline void pci_restore_dpc_state(struct pci_dev *dev) {}
static inline void pci_dpc_init(struct pci_dev *pdev) {}
#endif
+#ifdef CONFIG_PCIEPORTBUS
+void pci_rcec_init(struct pci_dev *dev);
+void pci_rcec_exit(struct pci_dev *dev);
+#else
+static inline void pci_rcec_init(struct pci_dev *dev) {}
+static inline void pci_rcec_exit(struct pci_dev *dev) {}
+#endif
+
#ifdef CONFIG_PCI_ATS
/* Address Translation Service */
void pci_ats_init(struct pci_dev *dev);
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 68da9280ff11..d9697892fa3e 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -2,7 +2,7 @@
#
# Makefile for PCI Express features and port driver
-pcieportdrv-y := portdrv_core.o portdrv_pci.o err.o
+pcieportdrv-y := portdrv_core.o portdrv_pci.o err.o rcec.o
obj-$(CONFIG_PCIEPORTBUS) += pcieportdrv.o
diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
new file mode 100644
index 000000000000..038e9d706d5f
--- /dev/null
+++ b/drivers/pci/pcie/rcec.c
@@ -0,0 +1,59 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Root Complex Event Collector Support
+ *
+ * Authors:
+ * Sean V Kelley <[email protected]>
+ * Qiuxu Zhuo <[email protected]>
+ *
+ * Copyright (C) 2020 Intel Corp.
+ */
+
+#include <linux/kernel.h>
+#include <linux/pci.h>
+#include <linux/pci_regs.h>
+
+#include "../pci.h"
+
+void pci_rcec_init(struct pci_dev *dev)
+{
+ struct rcec_ea *rcec_ea;
+ u32 rcec, hdr, busn;
+ u8 ver;
+
+ /* Only for Root Complex Event Collectors */
+ if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)
+ return;
+
+ rcec = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_RCEC);
+ if (!rcec)
+ return;
+
+ rcec_ea = kzalloc(sizeof(*rcec_ea), GFP_KERNEL);
+ if (!rcec_ea)
+ return;
+
+ pci_read_config_dword(dev, rcec + PCI_RCEC_RCIEP_BITMAP,
+ &rcec_ea->bitmap);
+
+ /* Check whether RCEC BUSN register is present */
+ pci_read_config_dword(dev, rcec, &hdr);
+ ver = PCI_EXT_CAP_VER(hdr);
+ if (ver >= PCI_RCEC_BUSN_REG_VER) {
+ pci_read_config_dword(dev, rcec + PCI_RCEC_BUSN, &busn);
+ rcec_ea->nextbusn = PCI_RCEC_BUSN_NEXT(busn);
+ rcec_ea->lastbusn = PCI_RCEC_BUSN_LAST(busn);
+ } else {
+ /* Avoid later ver check by setting nextbusn */
+ rcec_ea->nextbusn = 0xff;
+ rcec_ea->lastbusn = 0x00;
+ }
+
+ dev->rcec_ea = rcec_ea;
+}
+
+void pci_rcec_exit(struct pci_dev *dev)
+{
+ kfree(dev->rcec_ea);
+ dev->rcec_ea = NULL;
+}
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 03d37128a24f..25f01f841f2d 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2201,6 +2201,7 @@ static void pci_configure_device(struct pci_dev *dev)
static void pci_release_capabilities(struct pci_dev *dev)
{
pci_aer_exit(dev);
+ pci_rcec_exit(dev);
pci_vpd_release(dev);
pci_iov_release(dev);
pci_free_cap_save_buffers(dev);
@@ -2400,6 +2401,7 @@ static void pci_init_capabilities(struct pci_dev *dev)
pci_ptm_init(dev); /* Precision Time Measurement */
pci_aer_init(dev); /* Advanced Error Reporting */
pci_dpc_init(dev); /* Downstream Port Containment */
+ pci_rcec_init(dev); /* Root Complex Event Collector */
pcie_report_downtraining(dev);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 835530605c0d..2290439e8bc0 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -304,6 +304,7 @@ struct pcie_link_state;
struct pci_vpd;
struct pci_sriov;
struct pci_p2pdma;
+struct rcec_ea;
/* The pci_dev structure describes PCI devices */
struct pci_dev {
@@ -326,6 +327,9 @@ struct pci_dev {
#ifdef CONFIG_PCIEAER
u16 aer_cap; /* AER capability offset */
struct aer_stats *aer_stats; /* AER stats for this device */
+#endif
+#ifdef CONFIG_PCIEPORTBUS
+ struct rcec_ea *rcec_ea; /* RCEC cached endpoint association */
#endif
u8 pcie_cap; /* PCIe capability offset */
u8 msi_cap; /* MSI capability offset */
--
2.28.0
From: Qiuxu Zhuo <[email protected]>
A PCIe Root Complex Event Collector (RCEC) has base class 0x08, sub-class
0x07, and programming interface 0x00. Add the class code 0x0807 to
identify RCEC devices and add #defines for the RCEC Endpoint Association
Extended Capability.
See PCIe r5.0, sec 1.3.4 ("Root Complex Event Collector") and sec 7.9.10
("Root Complex Event Collector Endpoint Association Extended Capability").
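For illustration, the new register helpers can be exercised in userspace as below. The macros are copied from this patch; PCI_EXT_CAP_VER() is the existing extended capability header helper. The `busn_range` struct and both functions are hypothetical names used only for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Existing helper: version field of an extended capability header */
#define PCI_EXT_CAP_VER(header)	(((header) >> 16) & 0xf)

/* Added by this patch */
#define PCI_RCEC_BUSN_REG_VER	0x02
#define PCI_RCEC_BUSN_NEXT(x)	(((x) >> 8) & 0xff)
#define PCI_RCEC_BUSN_LAST(x)	(((x) >> 16) & 0xff)

/* Hypothetical container for the decoded bus range */
struct busn_range {
	uint8_t next;
	uint8_t last;
};

/* The Associated Bus Numbers register exists only from version 2 on */
bool rcec_has_busn(uint32_t cap_header)
{
	return PCI_EXT_CAP_VER(cap_header) >= PCI_RCEC_BUSN_REG_VER;
}

/* Split the raw Associated Bus Numbers dword into its two fields */
struct busn_range decode_rcec_busn(uint32_t busn)
{
	struct busn_range r = {
		.next = PCI_RCEC_BUSN_NEXT(busn),
		.last = PCI_RCEC_BUSN_LAST(busn),
	};

	return r;
}
```

For a raw register value of 0x00201000, the decoded range is buses 0x10 through 0x20.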
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Qiuxu Zhuo <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
---
include/linux/pci_ids.h | 1 +
include/uapi/linux/pci_regs.h | 7 +++++++
2 files changed, 8 insertions(+)
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 1ab1e24bcbce..d8156a5dbee8 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -81,6 +81,7 @@
#define PCI_CLASS_SYSTEM_RTC 0x0803
#define PCI_CLASS_SYSTEM_PCI_HOTPLUG 0x0804
#define PCI_CLASS_SYSTEM_SDHCI 0x0805
+#define PCI_CLASS_SYSTEM_RCEC 0x0807
#define PCI_CLASS_SYSTEM_OTHER 0x0880
#define PCI_BASE_CLASS_INPUT 0x09
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index f9701410d3b5..f6475a9e63d8 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -828,6 +828,13 @@
#define PCI_PWR_CAP_BUDGET(x) ((x) & 1) /* Included in system budget */
#define PCI_EXT_CAP_PWR_SIZEOF 16
+/* Root Complex Event Collector Endpoint Association */
+#define PCI_RCEC_RCIEP_BITMAP 4 /* Associated Bitmap for RCiEPs */
+#define PCI_RCEC_BUSN 8 /* RCEC Associated Bus Numbers */
+#define PCI_RCEC_BUSN_REG_VER 0x02 /* Least version with BUSN present */
+#define PCI_RCEC_BUSN_NEXT(x) (((x) >> 8) & 0xff)
+#define PCI_RCEC_BUSN_LAST(x) (((x) >> 16) & 0xff)
+
/* Vendor-Specific (VSEC, PCI_EXT_CAP_ID_VNDR) */
#define PCI_VNDR_HEADER 4 /* Vendor-Specific Header */
#define PCI_VNDR_HEADER_ID(x) ((x) & 0xffff)
--
2.28.0
From: Qiuxu Zhuo <[email protected]>
Root Complex Event Collectors (RCEC) appear as peers to Root Ports and may
also have the AER capability.
Add RCEC support to the AER error injection driver.
Co-developed-by: Sean V Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Qiuxu Zhuo <[email protected]>
Signed-off-by: Sean V Kelley <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
---
drivers/pci/pcie/aer_inject.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pcie/aer_inject.c b/drivers/pci/pcie/aer_inject.c
index c2cbf425afc5..767f8859b99b 100644
--- a/drivers/pci/pcie/aer_inject.c
+++ b/drivers/pci/pcie/aer_inject.c
@@ -333,8 +333,11 @@ static int aer_inject(struct aer_error_inj *einj)
if (!dev)
return -ENODEV;
rpdev = pcie_find_root_port(dev);
+ /* If Root Port not found, try to find an RCEC */
+ if (!rpdev)
+ rpdev = dev->rcec;
if (!rpdev) {
- pci_err(dev, "Root port not found\n");
+ pci_err(dev, "Neither Root Port nor RCEC found\n");
ret = -ENODEV;
goto out_put;
}
--
2.28.0
From: Sean V Kelley <[email protected]>
Root Complex Event Collectors (RCEC) appear as peers to Root Ports and also
have the AER capability. In addition, actions need to be taken for
associated RCiEPs. In such cases the RCECs will need to be walked in order
to find and act upon their respective RCiEPs.
Extend the existing ability to link the RCECs with a walking function
pcie_walk_rcec(). Add RCEC support to the current AER service driver and
attach the AER service driver to the RCEC device.
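The callback pattern can be sketched in userspace as follows. This is a simplified model, not the kernel implementation: `fake_dev`, `walk_rcec_sketch()` and `count_cb()` are hypothetical, the device list is a plain array rather than a PCI bus, and association is reduced to a bitmap test on the device number. As in walk_rcec_helper() in this patch, the helper only forwards RCiEPs and ignores the callback's return value.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_EXP_TYPE_ROOT_PORT	0x4
#define PCI_EXP_TYPE_RC_END	0x9
#define PCI_EXP_TYPE_RC_EC	0xa

/* Hypothetical stand-in for struct pci_dev */
struct fake_dev {
	int pcie_type;
	uint8_t devn;
};

struct walk_data {
	uint32_t bitmap;	/* association bitmap of the RCEC */
	int (*user_callback)(struct fake_dev *, void *);
	void *user_data;
};

/* Forward only associated RCiEPs to the user callback */
int walk_helper(struct fake_dev *dev, void *data)
{
	struct walk_data *wd = data;

	if (dev->pcie_type == PCI_EXP_TYPE_RC_END && dev->devn < 32 &&
	    (wd->bitmap & (UINT32_C(1) << dev->devn)))
		wd->user_callback(dev, wd->user_data);

	return 0;
}

void walk_rcec_sketch(struct fake_dev *devs, size_t n, uint32_t bitmap,
		      int (*cb)(struct fake_dev *, void *), void *userdata)
{
	struct walk_data wd = {
		.bitmap = bitmap,
		.user_callback = cb,
		.user_data = userdata,
	};
	size_t i;

	for (i = 0; i < n; i++)
		walk_helper(&devs[i], &wd);
}

/* Example callback: count the devices visited */
int count_cb(struct fake_dev *dev, void *data)
{
	(void)dev;
	++*(int *)data;
	return 0;
}
```

A caller such as find_source_device() passes its own iterator and context the same way count_cb() and a counter are passed here.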
[bhelgaas: kernel doc, whitespace cleanup]
Co-developed-by: Qiuxu Zhuo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Qiuxu Zhuo <[email protected]>
Signed-off-by: Sean V Kelley <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
---
drivers/pci/pci.h | 6 ++++++
drivers/pci/pcie/aer.c | 29 ++++++++++++++++++++++-------
drivers/pci/pcie/rcec.c | 37 +++++++++++++++++++++++++++++++++++++
3 files changed, 65 insertions(+), 7 deletions(-)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9e43a265c006..4090f2f98bf6 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -474,10 +474,16 @@ static inline void pci_dpc_init(struct pci_dev *pdev) {}
void pci_rcec_init(struct pci_dev *dev);
void pci_rcec_exit(struct pci_dev *dev);
void pcie_link_rcec(struct pci_dev *rcec);
+void pcie_walk_rcec(struct pci_dev *rcec,
+ int (*cb)(struct pci_dev *, void *),
+ void *userdata);
#else
static inline void pci_rcec_init(struct pci_dev *dev) {}
static inline void pci_rcec_exit(struct pci_dev *dev) {}
static inline void pcie_link_rcec(struct pci_dev *rcec) {}
+static inline void pcie_walk_rcec(struct pci_dev *rcec,
+ int (*cb)(struct pci_dev *, void *),
+ void *userdata) {}
#endif
#ifdef CONFIG_PCI_ATS
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 083f69b67bfd..8244a29ef334 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -300,7 +300,8 @@ int pci_aer_raw_clear_status(struct pci_dev *dev)
return -EIO;
port_type = pci_pcie_type(dev);
- if (port_type == PCI_EXP_TYPE_ROOT_PORT) {
+ if (port_type == PCI_EXP_TYPE_ROOT_PORT ||
+ port_type == PCI_EXP_TYPE_RC_EC) {
pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &status);
pci_write_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, status);
}
@@ -595,7 +596,8 @@ static umode_t aer_stats_attrs_are_visible(struct kobject *kobj,
if ((a == &dev_attr_aer_rootport_total_err_cor.attr ||
a == &dev_attr_aer_rootport_total_err_fatal.attr ||
a == &dev_attr_aer_rootport_total_err_nonfatal.attr) &&
- pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT)
+ ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
+ (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_EC)))
return 0;
return a->mode;
@@ -916,7 +918,10 @@ static bool find_source_device(struct pci_dev *parent,
if (result)
return true;
- pci_walk_bus(parent->subordinate, find_device_iter, e_info);
+ if (pci_pcie_type(parent) == PCI_EXP_TYPE_RC_EC)
+ pcie_walk_rcec(parent, find_device_iter, e_info);
+ else
+ pci_walk_bus(parent->subordinate, find_device_iter, e_info);
if (!e_info->error_dev_num) {
pci_info(parent, "can't find device of ID%04x\n", e_info->id);
@@ -1053,6 +1058,7 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
if (!(info->status & ~info->mask))
return 0;
} else if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
+ pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC ||
pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
info->severity == AER_NONFATAL) {
@@ -1205,6 +1211,7 @@ static int set_device_error_reporting(struct pci_dev *dev, void *data)
int type = pci_pcie_type(dev);
if ((type == PCI_EXP_TYPE_ROOT_PORT) ||
+ (type == PCI_EXP_TYPE_RC_EC) ||
(type == PCI_EXP_TYPE_UPSTREAM) ||
(type == PCI_EXP_TYPE_DOWNSTREAM)) {
if (enable)
@@ -1229,9 +1236,12 @@ static void set_downstream_devices_error_reporting(struct pci_dev *dev,
{
set_device_error_reporting(dev, &enable);
- if (!dev->subordinate)
- return;
- pci_walk_bus(dev->subordinate, set_device_error_reporting, &enable);
+ if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)
+ pcie_walk_rcec(dev, set_device_error_reporting, &enable);
+ else if (dev->subordinate)
+ pci_walk_bus(dev->subordinate, set_device_error_reporting,
+ &enable);
+
}
/**
@@ -1329,6 +1339,11 @@ static int aer_probe(struct pcie_device *dev)
struct device *device = &dev->device;
struct pci_dev *port = dev->port;
+ /* Limit to Root Ports or Root Complex Event Collectors */
+ if ((pci_pcie_type(port) != PCI_EXP_TYPE_RC_EC) &&
+ (pci_pcie_type(port) != PCI_EXP_TYPE_ROOT_PORT))
+ return -ENODEV;
+
rpc = devm_kzalloc(device, sizeof(struct aer_rpc), GFP_KERNEL);
if (!rpc)
return -ENOMEM;
@@ -1407,7 +1422,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
static struct pcie_port_service_driver aerdriver = {
.name = "aer",
- .port_type = PCI_EXP_TYPE_ROOT_PORT,
+ .port_type = PCIE_ANY_PORT,
.service = PCIE_PORT_SERVICE_AER,
.probe = aer_probe,
diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
index cdec277cbd62..2c5c552994e4 100644
--- a/drivers/pci/pcie/rcec.c
+++ b/drivers/pci/pcie/rcec.c
@@ -53,6 +53,18 @@ static int link_rcec_helper(struct pci_dev *dev, void *data)
return 0;
}
+static int walk_rcec_helper(struct pci_dev *dev, void *data)
+{
+ struct walk_rcec_data *rcec_data = data;
+ struct pci_dev *rcec = rcec_data->rcec;
+
+ if ((pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) &&
+ rcec_assoc_rciep(rcec, dev))
+ rcec_data->user_callback(dev, rcec_data->user_data);
+
+ return 0;
+}
+
static void walk_rcec(int (*cb)(struct pci_dev *dev, void *data),
void *userdata)
{
@@ -109,6 +121,31 @@ void pcie_link_rcec(struct pci_dev *rcec)
walk_rcec(link_rcec_helper, &rcec_data);
}
+/**
+ * pcie_walk_rcec - Walk RCiEP devices associated with RCEC and call callback.
+ * @rcec: RCEC whose RCiEP devices should be walked
+ * @cb: Callback to be called for each RCiEP device found
+ * @userdata: Arbitrary pointer to be passed to callback
+ *
+ * Walk the given RCEC. Call the callback on each RCiEP found.
+ *
+ * If @cb returns anything other than 0, break out.
+ */
+void pcie_walk_rcec(struct pci_dev *rcec, int (*cb)(struct pci_dev *, void *),
+ void *userdata)
+{
+ struct walk_rcec_data rcec_data;
+
+ if (!rcec->rcec_ea)
+ return;
+
+ rcec_data.rcec = rcec;
+ rcec_data.user_callback = cb;
+ rcec_data.user_data = userdata;
+
+ walk_rcec(walk_rcec_helper, &rcec_data);
+}
+
void pci_rcec_init(struct pci_dev *dev)
{
struct rcec_ea *rcec_ea;
--
2.28.0
From: Qiuxu Zhuo <[email protected]>
When attempting error recovery for an RCiEP associated with an RCEC device,
there needs to be a way to update the Root Error Status, the Uncorrectable
Error Status and the Uncorrectable Error Severity of the parent RCEC. In
some non-native cases in which there is no OS-visible device associated
with the RCiEP, there is nothing to act upon as the firmware is acting
before the OS.
Add handling for the linked RCEC in AER/ERR while taking into account
non-native cases.
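The "read, then write back the same value" sequence used to clear Root Error Status relies on the status bits being write-1-to-clear. A minimal userspace model of that behavior, with hypothetical names (`rw1c_reg`, `rw1c_write()`, `clear_all_pending()`), might look like this:

```c
#include <stdint.h>

/* Hypothetical model of a write-1-to-clear status register */
struct rw1c_reg {
	uint32_t bits;
};

uint32_t rw1c_read(const struct rw1c_reg *r)
{
	return r->bits;
}

/* Writing 1 to a bit clears it; writing 0 leaves it unchanged */
void rw1c_write(struct rw1c_reg *r, uint32_t val)
{
	r->bits &= ~val;
}

/* Same idiom as the Root Error Status clear in aer_root_reset() */
void clear_all_pending(struct rw1c_reg *r)
{
	rw1c_write(r, rw1c_read(r));
}
```

Writing back exactly what was read clears only the bits that were pending at read time, so a status bit set between the read and the write is not lost.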
Co-developed-by: Sean V Kelley <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean V Kelley <[email protected]>
Signed-off-by: Qiuxu Zhuo <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
---
drivers/pci/pcie/aer.c | 53 ++++++++++++++++++++++++++++++------------
drivers/pci/pcie/err.c | 20 ++++++++--------
2 files changed, 48 insertions(+), 25 deletions(-)
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 65dff5f3457a..083f69b67bfd 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1357,27 +1357,50 @@ static int aer_probe(struct pcie_device *dev)
*/
static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
{
- int aer = dev->aer_cap;
+ int type = pci_pcie_type(dev);
+ struct pci_dev *root;
+ int aer = 0;
+ int rc = 0;
u32 reg32;
- int rc;
+ if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END)
+ /*
+ * The reset should only clear the Root Error Status
+ * of the RCEC. Only perform this for the
+ * native case, i.e., an RCEC is present.
+ */
+ root = dev->rcec;
+ else
+ root = dev;
- /* Disable Root's interrupt in response to error messages */
- pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
- reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
- pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
+ if (root)
+ aer = root->aer_cap;
- rc = pci_bus_error_reset(dev);
- pci_info(dev, "Root Port link has been reset\n");
+ if (aer) {
+ /* Disable Root's interrupt in response to error messages */
+ pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
+ reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
+ pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
- /* Clear Root Error Status */
- pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &reg32);
- pci_write_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, reg32);
+ /* Clear Root Error Status */
+ pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, &reg32);
+ pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32);
- /* Enable Root Port's interrupt in response to error messages */
- pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
- reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
- pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
+ /* Enable Root Port's interrupt in response to error messages */
+ pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
+ reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
+ pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
+ }
+
+ if ((type == PCI_EXP_TYPE_RC_EC) || (type == PCI_EXP_TYPE_RC_END)) {
+ if (pcie_has_flr(root)) {
+ rc = pcie_flr(root);
+ pci_info(dev, "has been reset (%d)\n", rc);
+ }
+ } else {
+ rc = pci_bus_error_reset(root);
+ pci_info(dev, "Root Port link has been reset (%d)\n", rc);
+ }
return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED;
}
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 7883c9791562..cbc5abfe767b 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -148,10 +148,10 @@ static int report_resume(struct pci_dev *dev, void *data)
/**
* pci_walk_bridge - walk bridges potentially AER affected
- * @bridge: bridge which may be a Port, an RCEC with associated RCiEPs,
- * or an RCiEP associated with an RCEC
- * @cb: callback to be called for each device found
- * @userdata: arbitrary pointer to be passed to callback
+ * @bridge: bridge which may be an RCEC with associated RCiEPs,
+ * or a Port.
+ * @cb: callback to be called for each device found
+ * @userdata: arbitrary pointer to be passed to callback.
*
* If the device provided is a bridge, walk the subordinate bus, including
* any bridged devices on buses under this bus. Call the provided callback
@@ -164,8 +164,14 @@ static void pci_walk_bridge(struct pci_dev *bridge,
int (*cb)(struct pci_dev *, void *),
void *userdata)
{
+ /*
+ * In a non-native case where there is no OS-visible reporting
+ * device the bridge will be NULL, i.e., no RCEC, no Downstream Port.
+ */
if (bridge->subordinate)
pci_walk_bus(bridge->subordinate, cb, userdata);
+ else if (bridge->rcec)
+ cb(bridge->rcec, userdata);
else
cb(bridge, userdata);
}
@@ -194,12 +200,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_dbg(bridge, "broadcast error_detected message\n");
if (state == pci_channel_io_frozen) {
pci_walk_bridge(bridge, report_frozen_detected, &status);
- if (type == PCI_EXP_TYPE_RC_END) {
- pci_warn(dev, "subordinate device reset not possible for RCiEP\n");
- status = PCI_ERS_RESULT_NONE;
- goto failed;
- }
-
status = reset_subordinates(bridge);
if (status != PCI_ERS_RESULT_RECOVERED) {
pci_warn(bridge, "subordinate device reset failed\n");
--
2.28.0
From: Sean V Kelley <[email protected]>
Use pci_upstream_bridge() in place of dev->bus->self. No functional change
intended.
[bhelgaas: split to separate patch]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean V Kelley <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Acked-by: Jonathan Cameron <[email protected]>
---
drivers/pci/pcie/err.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index db149c6ce4fb..05f61da5ed9d 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -159,7 +159,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
*/
if (!(pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM))
- dev = dev->bus->self;
+ dev = pci_upstream_bridge(dev);
bus = dev->subordinate;
pci_dbg(dev, "broadcast error_detected message\n");
--
2.28.0
From: Sean V Kelley <[email protected]>
Root Complex Event Collectors (RCEC) appear as peers of Root Ports and also
have the PME capability. As with AER, there is a need to be able to walk
the RCiEPs associated with their RCEC for purposes of acting upon them with
callbacks.
Add RCEC support through the use of pcie_walk_rcec() to the current PME
service driver and attach the PME service driver to the RCEC device.
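Because the service driver now registers with PCIE_ANY_PORT, the probe routine itself has to reject port types other than Root Ports and RCECs. The sketch below isolates that filter in userspace; `pme_probe_type_check()` is a hypothetical name, and the type values are the PCI Express Capabilities Device/Port Type encodings.

```c
#include <errno.h>

#define PCI_EXP_TYPE_ROOT_PORT	0x4	/* Root Port */
#define PCI_EXP_TYPE_UPSTREAM	0x5	/* Switch Upstream Port */
#define PCI_EXP_TYPE_DOWNSTREAM	0x6	/* Switch Downstream Port */
#define PCI_EXP_TYPE_RC_EC	0xa	/* Root Complex Event Collector */

/*
 * Hypothetical helper mirroring the check added at the top of
 * pcie_pme_probe(): accept only Root Ports and RCECs.
 */
int pme_probe_type_check(int type)
{
	if (type != PCI_EXP_TYPE_RC_EC && type != PCI_EXP_TYPE_ROOT_PORT)
		return -ENODEV;

	return 0;
}
```

Switch ports still reach the probe call under PCIE_ANY_PORT, but they are turned away with -ENODEV before any allocation happens.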
Co-developed-by: Qiuxu Zhuo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Qiuxu Zhuo <[email protected]>
Signed-off-by: Sean V Kelley <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
---
drivers/pci/pcie/pme.c | 15 +++++++++++----
drivers/pci/pcie/portdrv_core.c | 9 +++------
2 files changed, 14 insertions(+), 10 deletions(-)
diff --git a/drivers/pci/pcie/pme.c b/drivers/pci/pcie/pme.c
index 6a32970bb731..87799166c96a 100644
--- a/drivers/pci/pcie/pme.c
+++ b/drivers/pci/pcie/pme.c
@@ -310,7 +310,10 @@ static int pcie_pme_can_wakeup(struct pci_dev *dev, void *ign)
static void pcie_pme_mark_devices(struct pci_dev *port)
{
pcie_pme_can_wakeup(port, NULL);
- if (port->subordinate)
+
+ if (pci_pcie_type(port) == PCI_EXP_TYPE_RC_EC)
+ pcie_walk_rcec(port, pcie_pme_can_wakeup, NULL);
+ else if (port->subordinate)
pci_walk_bus(port->subordinate, pcie_pme_can_wakeup, NULL);
}
@@ -320,10 +323,15 @@ static void pcie_pme_mark_devices(struct pci_dev *port)
*/
static int pcie_pme_probe(struct pcie_device *srv)
{
- struct pci_dev *port;
+ struct pci_dev *port = srv->port;
struct pcie_pme_service_data *data;
int ret;
+ /* Limit to Root Ports or Root Complex Event Collectors */
+ if ((pci_pcie_type(port) != PCI_EXP_TYPE_RC_EC) &&
+ (pci_pcie_type(port) != PCI_EXP_TYPE_ROOT_PORT))
+ return -ENODEV;
+
data = kzalloc(sizeof(*data), GFP_KERNEL);
if (!data)
return -ENOMEM;
@@ -333,7 +341,6 @@ static int pcie_pme_probe(struct pcie_device *srv)
data->srv = srv;
set_service_data(srv, data);
- port = srv->port;
pcie_pme_interrupt_enable(port, false);
pcie_clear_root_pme_status(port);
@@ -445,7 +452,7 @@ static void pcie_pme_remove(struct pcie_device *srv)
static struct pcie_port_service_driver pcie_pme_driver = {
.name = "pcie_pme",
- .port_type = PCI_EXP_TYPE_ROOT_PORT,
+ .port_type = PCIE_ANY_PORT,
.service = PCIE_PORT_SERVICE_PME,
.probe = pcie_pme_probe,
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 50a9522ab07d..e1fed6649c41 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -233,12 +233,9 @@ static int get_port_device_capability(struct pci_dev *dev)
}
#endif
- /*
- * Root ports are capable of generating PME too. Root Complex
- * Event Collectors can also generate PMEs, but we don't handle
- * those yet.
- */
- if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT &&
+ /* Root Ports and Root Complex Event Collectors may generate PMEs */
+ if ((pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
+ pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC) &&
(pcie_ports_native || host->native_pme)) {
services |= PCIE_PORT_SERVICE_PME;
--
2.28.0
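
For illustration only, the rule that get_port_device_capability() applies after this patch can be restated as a small user-space function. This is a sketch under stated assumptions, not kernel code: enum demo_port_type, DEMO_SERVICE_PME and demo_pme_service() are hypothetical names, and the two booleans stand in for pcie_ports_native and host->native_pme.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the PCI_EXP_TYPE_* values used in the patch. */
enum demo_port_type {
	DEMO_TYPE_ENDPOINT,
	DEMO_TYPE_ROOT_PORT,
	DEMO_TYPE_DOWNSTREAM,
	DEMO_TYPE_RC_EC,
};

/* Hypothetical stand-in for PCIE_PORT_SERVICE_PME. */
#define DEMO_SERVICE_PME 0x1

/*
 * Return the PME service bit when the device is a Root Port or an RCEC
 * and the OS is allowed to handle PME natively; otherwise return 0.
 */
static int demo_pme_service(enum demo_port_type type, bool ports_native,
			    bool native_pme)
{
	if ((type == DEMO_TYPE_ROOT_PORT || type == DEMO_TYPE_RC_EC) &&
	    (ports_native || native_pme))
		return DEMO_SERVICE_PME;

	return 0;
}
```

In this sketch a Switch Downstream Port never gets the PME service, which mirrors the early -ENODEV return added to pcie_pme_probe().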
From: Sean V Kelley <[email protected]>
pcie_do_recovery() may be called with "dev" being either a bridge (Root
Port or Switch Downstream Port) or an Endpoint. The bulk of the function
deals with the bridge, so if we start with an Endpoint, we reset "dev" to
be the bridge leading to it.
For clarity, replace "dev" in the body of the function with "bridge". No
functional change intended.
[bhelgaas: commit log, split pieces out so this is pure rename, also
replace "dev" with "bridge" in messages and pci_uevent_ers()]
Suggested-by: Bjorn Helgaas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean V Kelley <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Acked-by: Jonathan Cameron <[email protected]>
---
drivers/pci/pcie/err.c | 37 ++++++++++++++++++++-----------------
1 file changed, 20 insertions(+), 17 deletions(-)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 7a5af873d8bc..46a5b84f8842 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -151,24 +151,27 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_ers_result_t (*reset_subordinates)(struct pci_dev *pdev))
{
int type = pci_pcie_type(dev);
- pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
+ struct pci_dev *bridge;
struct pci_bus *bus;
+ pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
/*
- * Error recovery runs on all subordinates of the first downstream port.
- * If the downstream port detected the error, it is cleared at the end.
+ * Error recovery runs on all subordinates of the bridge. If the
+ * bridge detected the error, it is cleared at the end.
*/
if (!(type == PCI_EXP_TYPE_ROOT_PORT ||
type == PCI_EXP_TYPE_DOWNSTREAM))
- dev = pci_upstream_bridge(dev);
- bus = dev->subordinate;
+ bridge = pci_upstream_bridge(dev);
+ else
+ bridge = dev;
- pci_dbg(dev, "broadcast error_detected message\n");
+ bus = bridge->subordinate;
+ pci_dbg(bridge, "broadcast error_detected message\n");
if (state == pci_channel_io_frozen) {
pci_walk_bus(bus, report_frozen_detected, &status);
- status = reset_subordinates(dev);
+ status = reset_subordinates(bridge);
if (status != PCI_ERS_RESULT_RECOVERED) {
- pci_warn(dev, "subordinate device reset failed\n");
+ pci_warn(bridge, "subordinate device reset failed\n");
goto failed;
}
} else {
@@ -177,7 +180,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
if (status == PCI_ERS_RESULT_CAN_RECOVER) {
status = PCI_ERS_RESULT_RECOVERED;
- pci_dbg(dev, "broadcast mmio_enabled message\n");
+ pci_dbg(bridge, "broadcast mmio_enabled message\n");
pci_walk_bus(bus, report_mmio_enabled, &status);
}
@@ -188,27 +191,27 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
* drivers' slot_reset callbacks?
*/
status = PCI_ERS_RESULT_RECOVERED;
- pci_dbg(dev, "broadcast slot_reset message\n");
+ pci_dbg(bridge, "broadcast slot_reset message\n");
pci_walk_bus(bus, report_slot_reset, &status);
}
if (status != PCI_ERS_RESULT_RECOVERED)
goto failed;
- pci_dbg(dev, "broadcast resume message\n");
+ pci_dbg(bridge, "broadcast resume message\n");
pci_walk_bus(bus, report_resume, &status);
- if (pcie_aer_is_native(dev))
- pcie_clear_device_status(dev);
- pci_aer_clear_nonfatal_status(dev);
- pci_info(dev, "device recovery successful\n");
+ if (pcie_aer_is_native(bridge))
+ pcie_clear_device_status(bridge);
+ pci_aer_clear_nonfatal_status(bridge);
+ pci_info(bridge, "device recovery successful\n");
return status;
failed:
- pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
+ pci_uevent_ers(bridge, PCI_ERS_RESULT_DISCONNECT);
/* TODO: Should kernel panic here? */
- pci_info(dev, "device recovery failed\n");
+ pci_info(bridge, "device recovery failed\n");
return status;
}
--
2.28.0
From: Sean V Kelley <[email protected]>
A Root Complex Event Collector terminates error and PME messages from
associated RCiEPs.
Use the RCEC Endpoint Association Extended Capability to identify
associated RCiEPs. Link the associated RCiEPs as the RCECs are enumerated.
Co-developed-by: Qiuxu Zhuo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Qiuxu Zhuo <[email protected]>
Signed-off-by: Sean V Kelley <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
---
drivers/pci/pci.h | 2 +
drivers/pci/pcie/portdrv_pci.c | 3 ++
drivers/pci/pcie/rcec.c | 94 ++++++++++++++++++++++++++++++++++
include/linux/pci.h | 1 +
4 files changed, 100 insertions(+)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index bc2340971a50..9e43a265c006 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -473,9 +473,11 @@ static inline void pci_dpc_init(struct pci_dev *pdev) {}
#ifdef CONFIG_PCIEPORTBUS
void pci_rcec_init(struct pci_dev *dev);
void pci_rcec_exit(struct pci_dev *dev);
+void pcie_link_rcec(struct pci_dev *rcec);
#else
static inline void pci_rcec_init(struct pci_dev *dev) {}
static inline void pci_rcec_exit(struct pci_dev *dev) {}
+static inline void pcie_link_rcec(struct pci_dev *rcec) {}
#endif
#ifdef CONFIG_PCI_ATS
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 4d880679b9b1..dbeb0155c2c3 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -110,6 +110,9 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
(pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)))
return -ENODEV;
+ if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)
+ pcie_link_rcec(dev);
+
status = pcie_port_device_register(dev);
if (status)
return status;
diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
index 038e9d706d5f..cdec277cbd62 100644
--- a/drivers/pci/pcie/rcec.c
+++ b/drivers/pci/pcie/rcec.c
@@ -15,6 +15,100 @@
#include "../pci.h"
+struct walk_rcec_data {
+ struct pci_dev *rcec;
+ int (*user_callback)(struct pci_dev *dev, void *data);
+ void *user_data;
+};
+
+static bool rcec_assoc_rciep(struct pci_dev *rcec, struct pci_dev *rciep)
+{
+ unsigned long bitmap = rcec->rcec_ea->bitmap;
+ unsigned int devn;
+
+ /* An RCiEP found on a different bus in range */
+ if (rcec->bus->number != rciep->bus->number)
+ return true;
+
+ /* Same bus, so check bitmap */
+ for_each_set_bit(devn, &bitmap, 32)
+ if (devn == PCI_SLOT(rciep->devfn))
+ return true;
+
+ return false;
+}
+
+static int link_rcec_helper(struct pci_dev *dev, void *data)
+{
+ struct walk_rcec_data *rcec_data = data;
+ struct pci_dev *rcec = rcec_data->rcec;
+
+ if ((pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END) &&
+ rcec_assoc_rciep(rcec, dev)) {
+ dev->rcec = rcec;
+ pci_dbg(dev, "PME & error events signaled via %s\n",
+ pci_name(rcec));
+ }
+
+ return 0;
+}
+
+static void walk_rcec(int (*cb)(struct pci_dev *dev, void *data),
+ void *userdata)
+{
+ struct walk_rcec_data *rcec_data = userdata;
+ struct pci_dev *rcec = rcec_data->rcec;
+ u8 nextbusn, lastbusn;
+ struct pci_bus *bus;
+ unsigned int bnr;
+
+ if (!rcec->rcec_ea)
+ return;
+
+ /* Walk own bus for bitmap based association */
+ pci_walk_bus(rcec->bus, cb, rcec_data);
+
+ nextbusn = rcec->rcec_ea->nextbusn;
+ lastbusn = rcec->rcec_ea->lastbusn;
+
+ /* All RCiEP devices are on the same bus as the RCEC */
+ if (nextbusn == 0xff && lastbusn == 0x00)
+ return;
+
+ for (bnr = nextbusn; bnr <= lastbusn; bnr++) {
+ /* No association indicated (PCIe 5.0-1, 7.9.10.3) */
+ if (bnr == rcec->bus->number)
+ continue;
+
+ bus = pci_find_bus(pci_domain_nr(rcec->bus), bnr);
+ if (!bus)
+ continue;
+
+ /* Find RCiEP devices on the given bus ranges */
+ pci_walk_bus(bus, cb, rcec_data);
+ }
+}
+
+/**
+ * pcie_link_rcec - Link RCiEP devices associated with RCEC.
+ * @rcec: RCEC whose RCiEP devices should be linked.
+ *
+ * Link the given RCEC to each RCiEP device found.
+ */
+void pcie_link_rcec(struct pci_dev *rcec)
+{
+ struct walk_rcec_data rcec_data;
+
+ if (!rcec->rcec_ea)
+ return;
+
+ rcec_data.rcec = rcec;
+ rcec_data.user_callback = NULL;
+ rcec_data.user_data = NULL;
+
+ walk_rcec(link_rcec_helper, &rcec_data);
+}
+
void pci_rcec_init(struct pci_dev *dev)
{
struct rcec_ea *rcec_ea;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 2290439e8bc0..e546b16b13c1 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -330,6 +330,7 @@ struct pci_dev {
#endif
#ifdef CONFIG_PCIEPORTBUS
struct rcec_ea *rcec_ea; /* RCEC cached endpoint association */
+ struct pci_dev *rcec; /* Associated RCEC device */
#endif
u8 pcie_cap; /* PCIe capability offset */
u8 msi_cap; /* MSI capability offset */
--
2.28.0
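
The association rule that rcec_assoc_rciep() and walk_rcec() implement together can be tried outside the kernel. The following is a minimal user-space sketch, not the kernel implementation: struct demo_rcec_ea, DEMO_SLOT() and demo_assoc() are hypothetical names, and the sketch assumes that bit n of the bitmap refers to device number n on the RCEC's own bus, while nextbusn..lastbusn describe additional buses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached endpoint association data. */
struct demo_rcec_ea {
	uint8_t nextbusn;	/* first additional bus, 0xff if none */
	uint8_t lastbusn;	/* last additional bus, 0x00 if none */
	uint32_t bitmap;	/* bit n set: device n on the RCEC's bus */
};

/* Device number encoded in a devfn value (assumed 5-bit slot, 3-bit func). */
#define DEMO_SLOT(devfn)	(((devfn) >> 3) & 0x1f)

/* Return true if the RCiEP at (rciep_bus, rciep_devfn) is associated. */
static bool demo_assoc(const struct demo_rcec_ea *ea, uint8_t rcec_bus,
		       uint8_t rciep_bus, uint8_t rciep_devfn)
{
	/* Same bus as the RCEC: consult the device bitmap */
	if (rciep_bus == rcec_bus)
		return (ea->bitmap & (1u << DEMO_SLOT(rciep_devfn))) != 0;

	/* nextbusn = 0xff with lastbusn = 0x00 means no additional buses */
	if (ea->nextbusn == 0xff && ea->lastbusn == 0x00)
		return false;

	/* Different bus: associated only if it falls inside the range */
	return rciep_bus >= ea->nextbusn && rciep_bus <= ea->lastbusn;
}
```

Unlike the patch, which restricts the bus range in walk_rcec() and then treats any device on a different bus as associated, the sketch folds both checks into one function so it can be called on its own.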
From: Sean V Kelley <[email protected]>
Consolidate subordinate bus checks with pci_walk_bus() into
pci_walk_bridge() for walking below potentially AER-affected bridges.
[bhelgaas: fix kerneldoc]
Suggested-by: Bjorn Helgaas <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean V Kelley <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
---
drivers/pci/pcie/err.c | 30 +++++++++++++++++++++++-------
1 file changed, 23 insertions(+), 7 deletions(-)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 931e75f2549d..8b53aecdb43d 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -146,13 +146,30 @@ static int report_resume(struct pci_dev *dev, void *data)
return 0;
}
+/**
+ * pci_walk_bridge - walk bridges potentially AER affected
+ * @bridge: bridge which may be a Port
+ * @cb: callback to be called for each device found
+ * @userdata: arbitrary pointer to be passed to callback
+ *
+ * If the device provided is a bridge, walk the subordinate bus, including
+ * any bridged devices on buses under this bus. Call the provided callback
+ * on each device found.
+ */
+static void pci_walk_bridge(struct pci_dev *bridge,
+ int (*cb)(struct pci_dev *, void *),
+ void *userdata)
+{
+ if (bridge->subordinate)
+ pci_walk_bus(bridge->subordinate, cb, userdata);
+}
+
pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_channel_state_t state,
pci_ers_result_t (*reset_subordinates)(struct pci_dev *pdev))
{
int type = pci_pcie_type(dev);
struct pci_dev *bridge;
- struct pci_bus *bus;
pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
/*
@@ -165,23 +182,22 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
else
bridge = pci_upstream_bridge(dev);
- bus = bridge->subordinate;
pci_dbg(bridge, "broadcast error_detected message\n");
if (state == pci_channel_io_frozen) {
- pci_walk_bus(bus, report_frozen_detected, &status);
+ pci_walk_bridge(bridge, report_frozen_detected, &status);
status = reset_subordinates(bridge);
if (status != PCI_ERS_RESULT_RECOVERED) {
pci_warn(bridge, "subordinate device reset failed\n");
goto failed;
}
} else {
- pci_walk_bus(bus, report_normal_detected, &status);
+ pci_walk_bridge(bridge, report_normal_detected, &status);
}
if (status == PCI_ERS_RESULT_CAN_RECOVER) {
status = PCI_ERS_RESULT_RECOVERED;
pci_dbg(bridge, "broadcast mmio_enabled message\n");
- pci_walk_bus(bus, report_mmio_enabled, &status);
+ pci_walk_bridge(bridge, report_mmio_enabled, &status);
}
if (status == PCI_ERS_RESULT_NEED_RESET) {
@@ -192,14 +208,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
*/
status = PCI_ERS_RESULT_RECOVERED;
pci_dbg(bridge, "broadcast slot_reset message\n");
- pci_walk_bus(bus, report_slot_reset, &status);
+ pci_walk_bridge(bridge, report_slot_reset, &status);
}
if (status != PCI_ERS_RESULT_RECOVERED)
goto failed;
pci_dbg(bridge, "broadcast resume message\n");
- pci_walk_bus(bus, report_resume, &status);
+ pci_walk_bridge(bridge, report_resume, &status);
if (pcie_aer_is_native(bridge))
pcie_clear_device_status(bridge);
--
2.28.0
From: Sean V Kelley <[email protected]>
In some cases a bridge may not exist because the controlling hardware is
handled only by firmware and is therefore not visible to the OS. This
scenario is also possible in future use cases involving non-native use of
RCECs by firmware.
Explicitly apply conditional logic around these resets by limiting them to
Root Ports and Downstream Ports.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean V Kelley <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Acked-by: Jonathan Cameron <[email protected]>
---
drivers/pci/pcie/err.c | 31 +++++++++++++++++++++++++------
1 file changed, 25 insertions(+), 6 deletions(-)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 8b53aecdb43d..7883c9791562 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -148,13 +148,17 @@ static int report_resume(struct pci_dev *dev, void *data)
/**
* pci_walk_bridge - walk bridges potentially AER affected
- * @bridge: bridge which may be a Port
+ * @bridge: bridge which may be a Port, an RCEC with associated RCiEPs,
+ * or an RCiEP associated with an RCEC
* @cb: callback to be called for each device found
* @userdata: arbitrary pointer to be passed to callback
*
* If the device provided is a bridge, walk the subordinate bus, including
* any bridged devices on buses under this bus. Call the provided callback
* on each device found.
+ *
+ * If the device provided has no subordinate bus, call the callback on the
+ * device itself.
*/
static void pci_walk_bridge(struct pci_dev *bridge,
int (*cb)(struct pci_dev *, void *),
@@ -162,6 +166,8 @@ static void pci_walk_bridge(struct pci_dev *bridge,
{
if (bridge->subordinate)
pci_walk_bus(bridge->subordinate, cb, userdata);
+ else
+ cb(bridge, userdata);
}
pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
@@ -174,10 +180,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
/*
* Error recovery runs on all subordinates of the bridge. If the
- * bridge detected the error, it is cleared at the end.
+ * bridge detected the error, it is cleared at the end. For RCiEPs
+ * we should reset just the RCiEP itself.
*/
if (type == PCI_EXP_TYPE_ROOT_PORT ||
- type == PCI_EXP_TYPE_DOWNSTREAM)
+ type == PCI_EXP_TYPE_DOWNSTREAM ||
+ type == PCI_EXP_TYPE_RC_EC ||
+ type == PCI_EXP_TYPE_RC_END)
bridge = dev;
else
bridge = pci_upstream_bridge(dev);
@@ -185,6 +194,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_dbg(bridge, "broadcast error_detected message\n");
if (state == pci_channel_io_frozen) {
pci_walk_bridge(bridge, report_frozen_detected, &status);
+ if (type == PCI_EXP_TYPE_RC_END) {
+ pci_warn(dev, "subordinate device reset not possible for RCiEP\n");
+ status = PCI_ERS_RESULT_NONE;
+ goto failed;
+ }
+
status = reset_subordinates(bridge);
if (status != PCI_ERS_RESULT_RECOVERED) {
pci_warn(bridge, "subordinate device reset failed\n");
@@ -217,9 +232,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_dbg(bridge, "broadcast resume message\n");
pci_walk_bridge(bridge, report_resume, &status);
- if (pcie_aer_is_native(bridge))
- pcie_clear_device_status(bridge);
- pci_aer_clear_nonfatal_status(bridge);
+ if (type == PCI_EXP_TYPE_ROOT_PORT ||
+ type == PCI_EXP_TYPE_DOWNSTREAM ||
+ type == PCI_EXP_TYPE_RC_EC) {
+ if (pcie_aer_is_native(bridge))
+ pcie_clear_device_status(bridge);
+ pci_aer_clear_nonfatal_status(bridge);
+ }
pci_info(bridge, "device recovery successful\n");
return status;
--
2.28.0
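
The two behaviours this patch introduces, choosing which device acts as "bridge" and falling back to the device itself when there is no subordinate bus, can be modelled in a few lines of user-space C. This is only a sketch under simplifying assumptions: struct demo_dev, demo_pick_bridge(), demo_walk_bridge() and demo_count() are hypothetical names, the subordinate bus is modelled as a flat array, and the walk is a single level, whereas pci_walk_bus() also descends into buses below bridges.

```c
#include <stddef.h>

enum demo_type {
	DEMO_ENDPOINT,
	DEMO_ROOT_PORT,
	DEMO_DOWNSTREAM,
	DEMO_RC_EC,
	DEMO_RC_END,
};

struct demo_dev {
	enum demo_type type;
	struct demo_dev *upstream;	/* bridge leading to this device */
	struct demo_dev **children;	/* subordinate bus, NULL if none */
	size_t nr_children;
};

/* Ports, RCECs and RCiEPs are handled directly; others use their bridge. */
static struct demo_dev *demo_pick_bridge(struct demo_dev *dev)
{
	switch (dev->type) {
	case DEMO_ROOT_PORT:
	case DEMO_DOWNSTREAM:
	case DEMO_RC_EC:
	case DEMO_RC_END:
		return dev;
	default:
		return dev->upstream;
	}
}

/* Call cb on each device below the bridge, or on the bridge itself. */
static void demo_walk_bridge(struct demo_dev *bridge,
			     int (*cb)(struct demo_dev *, void *),
			     void *userdata)
{
	size_t i;

	if (bridge->children) {
		for (i = 0; i < bridge->nr_children; i++)
			cb(bridge->children[i], userdata);
	} else {
		cb(bridge, userdata);
	}
}

/* Example callback: count how many devices were visited. */
static int demo_count(struct demo_dev *dev, void *data)
{
	(void)dev;
	++*(int *)data;
	return 0;
}
```

With this model an RCiEP, which has no subordinate bus, receives the callback itself, while an ordinary Endpoint is reached through the Root Port above it.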
From: Sean V Kelley <[email protected]>
Reverse the sense of the Root Port/Downstream Port conditional for clarity.
No functional change intended.
[bhelgaas: split to separate patch]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean V Kelley <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Acked-by: Jonathan Cameron <[email protected]>
---
drivers/pci/pcie/err.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 46a5b84f8842..931e75f2549d 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -159,11 +159,11 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
* Error recovery runs on all subordinates of the bridge. If the
* bridge detected the error, it is cleared at the end.
*/
- if (!(type == PCI_EXP_TYPE_ROOT_PORT ||
- type == PCI_EXP_TYPE_DOWNSTREAM))
- bridge = pci_upstream_bridge(dev);
- else
+ if (type == PCI_EXP_TYPE_ROOT_PORT ||
+ type == PCI_EXP_TYPE_DOWNSTREAM)
bridge = dev;
+ else
+ bridge = pci_upstream_bridge(dev);
bus = bridge->subordinate;
pci_dbg(bridge, "broadcast error_detected message\n");
--
2.28.0
On Thu, Oct 15, 2020 at 05:11:07PM -0700, Sean V Kelley wrote:
> From: Sean V Kelley <[email protected]>
>
> Consolidate subordinate bus checks with pci_walk_bus() into
> pci_walk_bridge() for walking below potentially AER affected bridges.
>
> [bhelgaas: fix kerneldoc]
> Suggested-by: Bjorn Helgaas <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Sean V Kelley <[email protected]>
> Signed-off-by: Bjorn Helgaas <[email protected]>
> ---
> drivers/pci/pcie/err.c | 30 +++++++++++++++++++++++-------
> 1 file changed, 23 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 931e75f2549d..8b53aecdb43d 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -146,13 +146,30 @@ static int report_resume(struct pci_dev *dev, void *data)
> return 0;
> }
>
> +/**
> + * pci_walk_bridge - walk bridges potentially AER affected
> + * @bridge: bridge which may be a Port
> + * @cb: callback to be called for each device found
> + * @userdata: arbitrary pointer to be passed to callback
> + *
> + * If the device provided is a bridge, walk the subordinate bus, including
> + * any bridged devices on buses under this bus. Call the provided callback
> + * on each device found.
> + */
> +static void pci_walk_bridge(struct pci_dev *bridge,
> + int (*cb)(struct pci_dev *, void *),
> + void *userdata)
> +{
> + if (bridge->subordinate)
Remind me why we add this bridge->subordinate test? I see that we're
going to need it later, but I think we should add the test in the same
patch that adds the case where "bridge->subordinate == NULL" becomes
possible.
Or else a note in this commit log about what's happening.
AFAICT, this test is literally the only possible functional change in
this patch, so the commit log should mention it.
> + pci_walk_bus(bridge->subordinate, cb, userdata);
> +}
> +
> pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> pci_channel_state_t state,
> pci_ers_result_t (*reset_subordinates)(struct pci_dev *pdev))
> {
> int type = pci_pcie_type(dev);
> struct pci_dev *bridge;
> - struct pci_bus *bus;
> pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
>
> /*
> @@ -165,23 +182,22 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> else
> bridge = pci_upstream_bridge(dev);
>
> - bus = bridge->subordinate;
> pci_dbg(bridge, "broadcast error_detected message\n");
> if (state == pci_channel_io_frozen) {
> - pci_walk_bus(bus, report_frozen_detected, &status);
> + pci_walk_bridge(bridge, report_frozen_detected, &status);
> status = reset_subordinates(bridge);
> if (status != PCI_ERS_RESULT_RECOVERED) {
> pci_warn(bridge, "subordinate device reset failed\n");
> goto failed;
> }
> } else {
> - pci_walk_bus(bus, report_normal_detected, &status);
> + pci_walk_bridge(bridge, report_normal_detected, &status);
> }
>
> if (status == PCI_ERS_RESULT_CAN_RECOVER) {
> status = PCI_ERS_RESULT_RECOVERED;
> pci_dbg(bridge, "broadcast mmio_enabled message\n");
> - pci_walk_bus(bus, report_mmio_enabled, &status);
> + pci_walk_bridge(bridge, report_mmio_enabled, &status);
> }
>
> if (status == PCI_ERS_RESULT_NEED_RESET) {
> @@ -192,14 +208,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> */
> status = PCI_ERS_RESULT_RECOVERED;
> pci_dbg(bridge, "broadcast slot_reset message\n");
> - pci_walk_bus(bus, report_slot_reset, &status);
> + pci_walk_bridge(bridge, report_slot_reset, &status);
> }
>
> if (status != PCI_ERS_RESULT_RECOVERED)
> goto failed;
>
> pci_dbg(bridge, "broadcast resume message\n");
> - pci_walk_bus(bus, report_resume, &status);
> + pci_walk_bridge(bridge, report_resume, &status);
>
> if (pcie_aer_is_native(bridge))
> pcie_clear_device_status(bridge);
> --
> 2.28.0
>
On 16 Oct 2020, at 10:22, Bjorn Helgaas wrote:
> On Thu, Oct 15, 2020 at 05:11:08PM -0700, Sean V Kelley wrote:
>> From: Sean V Kelley <[email protected]>
>>
>> In some cases a bridge may not exist as the hardware controlling may
>> be
>> handled only by firmware and so is not visible to the OS. This
>> scenario is
>> also possible in future use cases involving non-native use of RCECs
>> by
>> firmware.
>>
>> Explicitly apply conditional logic around these resets by limiting
>> them to
>> Root Ports and Downstream Ports.
>>
>> Link:
>> https://lore.kernel.org/r/[email protected]
>> Signed-off-by: Sean V Kelley <[email protected]>
>> Signed-off-by: Bjorn Helgaas <[email protected]>
>> Acked-by: Jonathan Cameron <[email protected]>
>> ---
>> drivers/pci/pcie/err.c | 31 +++++++++++++++++++++++++------
>> 1 file changed, 25 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>> index 8b53aecdb43d..7883c9791562 100644
>> --- a/drivers/pci/pcie/err.c
>> +++ b/drivers/pci/pcie/err.c
>> @@ -148,13 +148,17 @@ static int report_resume(struct pci_dev *dev,
>> void *data)
>>
>> /**
>> * pci_walk_bridge - walk bridges potentially AER affected
>> - * @bridge: bridge which may be a Port
>> + * @bridge: bridge which may be a Port, an RCEC with associated
>> RCiEPs,
>> + * or an RCiEP associated with an RCEC
>> * @cb: callback to be called for each device found
>> * @userdata: arbitrary pointer to be passed to callback
>> *
>> * If the device provided is a bridge, walk the subordinate bus,
>> including
>> * any bridged devices on buses under this bus. Call the provided
>> callback
>> * on each device found.
>> + *
>> + * If the device provided has no subordinate bus, call the callback
>> on the
>> + * device itself.
>> */
>> static void pci_walk_bridge(struct pci_dev *bridge,
>> int (*cb)(struct pci_dev *, void *),
>> @@ -162,6 +166,8 @@ static void pci_walk_bridge(struct pci_dev
>> *bridge,
>> {
>> if (bridge->subordinate)
>> pci_walk_bus(bridge->subordinate, cb, userdata);
>> + else
>> + cb(bridge, userdata);
>
> Looks like *this* is the patch where the "no subordinate bus" case
> becomes possible? If you agree, I can just move the test here, no
> need to repost.
Agreed, this is the patch where the check is needed, as we are opening
things up for walking RC_ECs and RC_ENDs as below.
Sean
>
>> }
>>
>> pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>> @@ -174,10 +180,13 @@ pci_ers_result_t pcie_do_recovery(struct
>> pci_dev *dev,
>>
>> /*
>> * Error recovery runs on all subordinates of the bridge. If the
>> - * bridge detected the error, it is cleared at the end.
>> + * bridge detected the error, it is cleared at the end. For RCiEPs
>> + * we should reset just the RCiEP itself.
>> */
>> if (type == PCI_EXP_TYPE_ROOT_PORT ||
>> - type == PCI_EXP_TYPE_DOWNSTREAM)
>> + type == PCI_EXP_TYPE_DOWNSTREAM ||
>> + type == PCI_EXP_TYPE_RC_EC ||
>> + type == PCI_EXP_TYPE_RC_END)
>> bridge = dev;
>> else
>> bridge = pci_upstream_bridge(dev);
>> @@ -185,6 +194,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev
>> *dev,
>> pci_dbg(bridge, "broadcast error_detected message\n");
>> if (state == pci_channel_io_frozen) {
>> pci_walk_bridge(bridge, report_frozen_detected, &status);
>> + if (type == PCI_EXP_TYPE_RC_END) {
>> + pci_warn(dev, "subordinate device reset not possible for
>> RCiEP\n");
>> + status = PCI_ERS_RESULT_NONE;
>> + goto failed;
>> + }
>> +
>> status = reset_subordinates(bridge);
>> if (status != PCI_ERS_RESULT_RECOVERED) {
>> pci_warn(bridge, "subordinate device reset failed\n");
>> @@ -217,9 +232,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev
>> *dev,
>> pci_dbg(bridge, "broadcast resume message\n");
>> pci_walk_bridge(bridge, report_resume, &status);
>>
>> - if (pcie_aer_is_native(bridge))
>> - pcie_clear_device_status(bridge);
>> - pci_aer_clear_nonfatal_status(bridge);
>> + if (type == PCI_EXP_TYPE_ROOT_PORT ||
>> + type == PCI_EXP_TYPE_DOWNSTREAM ||
>> + type == PCI_EXP_TYPE_RC_EC) {
>> + if (pcie_aer_is_native(bridge))
>> + pcie_clear_device_status(bridge);
>> + pci_aer_clear_nonfatal_status(bridge);
>> + }
>> pci_info(bridge, "device recovery successful\n");
>> return status;
>>
>> --
>> 2.28.0
>>
On Thu, Oct 15, 2020 at 05:11:08PM -0700, Sean V Kelley wrote:
> From: Sean V Kelley <[email protected]>
>
> In some cases a bridge may not exist as the hardware controlling may be
> handled only by firmware and so is not visible to the OS. This scenario is
> also possible in future use cases involving non-native use of RCECs by
> firmware.
>
> Explicitly apply conditional logic around these resets by limiting them to
> Root Ports and Downstream Ports.
>
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Sean V Kelley <[email protected]>
> Signed-off-by: Bjorn Helgaas <[email protected]>
> Acked-by: Jonathan Cameron <[email protected]>
> ---
> drivers/pci/pcie/err.c | 31 +++++++++++++++++++++++++------
> 1 file changed, 25 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 8b53aecdb43d..7883c9791562 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -148,13 +148,17 @@ static int report_resume(struct pci_dev *dev, void *data)
>
> /**
> * pci_walk_bridge - walk bridges potentially AER affected
> - * @bridge: bridge which may be a Port
> + * @bridge: bridge which may be a Port, an RCEC with associated RCiEPs,
> + * or an RCiEP associated with an RCEC
> * @cb: callback to be called for each device found
> * @userdata: arbitrary pointer to be passed to callback
> *
> * If the device provided is a bridge, walk the subordinate bus, including
> * any bridged devices on buses under this bus. Call the provided callback
> * on each device found.
> + *
> + * If the device provided has no subordinate bus, call the callback on the
> + * device itself.
> */
> static void pci_walk_bridge(struct pci_dev *bridge,
> int (*cb)(struct pci_dev *, void *),
> @@ -162,6 +166,8 @@ static void pci_walk_bridge(struct pci_dev *bridge,
> {
> if (bridge->subordinate)
> pci_walk_bus(bridge->subordinate, cb, userdata);
> + else
> + cb(bridge, userdata);
Looks like *this* is the patch where the "no subordinate bus" case
becomes possible? If you agree, I can just move the test here, no
need to repost.
> }
>
> pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> @@ -174,10 +180,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>
> /*
> * Error recovery runs on all subordinates of the bridge. If the
> - * bridge detected the error, it is cleared at the end.
> + * bridge detected the error, it is cleared at the end. For RCiEPs
> + * we should reset just the RCiEP itself.
> */
> if (type == PCI_EXP_TYPE_ROOT_PORT ||
> - type == PCI_EXP_TYPE_DOWNSTREAM)
> + type == PCI_EXP_TYPE_DOWNSTREAM ||
> + type == PCI_EXP_TYPE_RC_EC ||
> + type == PCI_EXP_TYPE_RC_END)
> bridge = dev;
> else
> bridge = pci_upstream_bridge(dev);
> @@ -185,6 +194,12 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> pci_dbg(bridge, "broadcast error_detected message\n");
> if (state == pci_channel_io_frozen) {
> pci_walk_bridge(bridge, report_frozen_detected, &status);
> + if (type == PCI_EXP_TYPE_RC_END) {
> + pci_warn(dev, "subordinate device reset not possible for RCiEP\n");
> + status = PCI_ERS_RESULT_NONE;
> + goto failed;
> + }
> +
> status = reset_subordinates(bridge);
> if (status != PCI_ERS_RESULT_RECOVERED) {
> pci_warn(bridge, "subordinate device reset failed\n");
> @@ -217,9 +232,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> pci_dbg(bridge, "broadcast resume message\n");
> pci_walk_bridge(bridge, report_resume, &status);
>
> - if (pcie_aer_is_native(bridge))
> - pcie_clear_device_status(bridge);
> - pci_aer_clear_nonfatal_status(bridge);
> + if (type == PCI_EXP_TYPE_ROOT_PORT ||
> + type == PCI_EXP_TYPE_DOWNSTREAM ||
> + type == PCI_EXP_TYPE_RC_EC) {
> + if (pcie_aer_is_native(bridge))
> + pcie_clear_device_status(bridge);
> + pci_aer_clear_nonfatal_status(bridge);
> + }
> pci_info(bridge, "device recovery successful\n");
> return status;
>
> --
> 2.28.0
>
[+to Jonathan]
On Thu, Oct 15, 2020 at 05:11:10PM -0700, Sean V Kelley wrote:
> From: Qiuxu Zhuo <[email protected]>
>
> When attempting error recovery for an RCiEP associated with an RCEC device,
> there needs to be a way to update the Root Error Status, the Uncorrectable
> Error Status and the Uncorrectable Error Severity of the parent RCEC. In
> some non-native cases in which there is no OS-visible device associated
> with the RCiEP, there is nothing to act upon as the firmware is acting
> before the OS.
>
> Add handling for the linked RCEC in AER/ERR while taking into account
> non-native cases.
>
> Co-developed-by: Sean V Kelley <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Sean V Kelley <[email protected]>
> Signed-off-by: Qiuxu Zhuo <[email protected]>
> Signed-off-by: Bjorn Helgaas <[email protected]>
> Reviewed-by: Jonathan Cameron <[email protected]>
> ---
> drivers/pci/pcie/aer.c | 53 ++++++++++++++++++++++++++++++------------
> drivers/pci/pcie/err.c | 20 ++++++++--------
> 2 files changed, 48 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 65dff5f3457a..083f69b67bfd 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1357,27 +1357,50 @@ static int aer_probe(struct pcie_device *dev)
> */
> static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
> {
> - int aer = dev->aer_cap;
> + int type = pci_pcie_type(dev);
> + struct pci_dev *root;
> + int aer = 0;
> + int rc = 0;
> u32 reg32;
> - int rc;
>
> + if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END)
"type == PCI_EXP_TYPE_RC_END"
> + /*
> + * The reset should only clear the Root Error Status
> + * of the RCEC. Only perform this for the
> + * native case, i.e., an RCEC is present.
> + */
> + root = dev->rcec;
> + else
> + root = dev;
>
> - /* Disable Root's interrupt in response to error messages */
> - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
> - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
> + if (root)
> + aer = dev->aer_cap;
>
> - rc = pci_bus_error_reset(dev);
> - pci_info(dev, "Root Port link has been reset\n");
> + if (aer) {
> + /* Disable Root's interrupt in response to error messages */
> + pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> + reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
> + pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
Not directly related to *this* patch, but my assumption was that in
the APEI case, the firmware should retain ownership of the AER
Capability, so the OS should not touch PCI_ERR_ROOT_COMMAND and
PCI_ERR_ROOT_STATUS.

But this code appears to ignore that ownership. Jonathan, you must
have looked at this recently for 068c29a248b6 ("PCI/ERR: Clear PCIe
Device Status errors only if OS owns AER"). Do you have any insight
about this?
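
To make the question concrete, here is a minimal user-space sketch of
the kind of ownership gate I have in mind. It is a model, not kernel
code: struct mock_host, struct mock_root and
clear_root_status_if_owned() are hypothetical names, "native_aer" just
stands for "the OS was granted control of AER", and Root Error Status
is modelled as a write-1-to-clear register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the host bridge: did the OS get AER control? */
struct mock_host {
	bool native_aer;
};

/* Hypothetical stand-in for a Root Port or RCEC with an AER capability. */
struct mock_root {
	const struct mock_host *host;
	uint32_t root_status;	/* models PCI_ERR_ROOT_STATUS (RW1C bits) */
};

/* RW1C write: each bit written as 1 clears the corresponding status bit. */
static void mock_write_root_status(struct mock_root *root, uint32_t val)
{
	root->root_status &= ~val;
}

/*
 * Clear Root Error Status only when the OS owns AER. Returns true if the
 * register was touched, false if it was left for firmware to handle.
 */
static bool clear_root_status_if_owned(struct mock_root *root)
{
	assert(root && root->host);

	if (!root->host->native_aer)
		return false;

	/* Read-then-write-back idiom: writing the set bits clears them. */
	mock_write_root_status(root, root->root_status);
	return true;
}
```

Whether the firmware-owned case should really be a no-op is exactly
the open question; the sketch only shows where such a test would sit.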
> - /* Clear Root Error Status */
> - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &reg32);
> - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, reg32);
> + /* Clear Root Error Status */
> + pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, &reg32);
> + pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32);
>
> - /* Enable Root Port's interrupt in response to error messages */
> - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
> - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
> + /* Enable Root Port's interrupt in response to error messages */
> + pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> + reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
> + pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
> + }
> +
> + if ((type == PCI_EXP_TYPE_RC_EC) || (type == PCI_EXP_TYPE_RC_END)) {
> + if (pcie_has_flr(root)) {
> + rc = pcie_flr(root);
> + pci_info(dev, "has been reset (%d)\n", rc);
> + }
> + } else {
> + rc = pci_bus_error_reset(root);
Don't we want "dev" for both the FLR and pci_bus_error_reset()? I
think "root == dev" except when dev is an RCiEP. When dev is an
RCiEP, "root" is the RCEC (if present), and we want to reset the
RCiEP, not the RCEC.
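
A small user-space sketch of the distinction, assuming my reading is
right: the device whose Root Error registers get updated and the
device that gets reset are chosen separately. struct mock_dev,
aer_register_target() and reset_target() are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum mock_type { MOCK_ROOT_PORT, MOCK_RC_EC, MOCK_RC_END };

/* Hypothetical stand-in for struct pci_dev. */
struct mock_dev {
	enum mock_type type;
	struct mock_dev *rcec;	/* linked RCEC for an RCiEP, NULL if none visible */
};

/*
 * Device whose Root Error registers would be updated. For an RCiEP this
 * is the linked RCEC, which may be NULL in the non-native case.
 */
static struct mock_dev *aer_register_target(struct mock_dev *dev)
{
	assert(dev);
	return dev->type == MOCK_RC_END ? dev->rcec : dev;
}

/* Device that would actually be reset: always the device passed in. */
static struct mock_dev *reset_target(struct mock_dev *dev)
{
	assert(dev);
	return dev;
}
```

With this split, an RCiEP error updates the RCEC's registers but
resets the RCiEP itself, and an RCiEP with no visible RCEC still has a
valid reset target.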
> + pci_info(dev, "Root Port link has been reset (%d)\n", rc);
> + }
There are a couple changes here that I think should be split out.

Based on my theory that when firmware retains control of AER, the OS
should not touch PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS, and any
updates to them would have to be done by firmware before we get here,
I suggested reordering this:

  - clear PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
  - do reset
  - clear PCI_ERR_ROOT_STATUS (for APEI, presumably done by firmware?)
  - enable PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK

to this:

  - clear PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
  - clear PCI_ERR_ROOT_STATUS
  - enable PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
  - do reset

If my theory is correct, I think we should still reorder this, but:

  - It's a significant behavior change that deserves its own patch so
    we can document/bisect/revert.

  - I'm not sure why we clear the PCI_ERR_ROOT_COMMAND error reporting
    bits. In the new "clear COMMAND, clear STATUS, enable COMMAND"
    order, it looks superfluous. There's no reason to disable error
    reporting while clearing the status bits.

    The current "clear, reset, enable" order suggests that the reset
    might cause errors that we should ignore. I don't know whether
    that's the case or not. It dates from 6c2b374d7485 ("PCI-Express
    AER implemetation: AER core and aerdriver"), which doesn't
    elaborate.

  - Should we also test for OS ownership of AER before touching
    PCI_ERR_ROOT_STATUS?

  - If we remove the PCI_ERR_ROOT_COMMAND fiddling (and I tentatively
    think we *should* unless we can justify it), that would also
    deserve its own patch. Possibly (1) remove PCI_ERR_ROOT_COMMAND
    fiddling, (2) reorder PCI_ERR_ROOT_STATUS clearing and reset, (3)
    test for OS ownership of AER (?), (4) the rest of this patch.
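
The difference between the two orders can be sketched as a user-space
trace. This is only a model under my assumptions above: the step
names, struct mock_trace and reset_is_masked() are hypothetical, and
no real config space is touched. It shows that in the current order
the reset runs with the root interrupt disabled, while in the
reordered sequence it does not, which is why the disable/enable pair
then looks superfluous.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum mock_step {
	STEP_DISABLE_ROOT_INTR,	/* clear ROOT_PORT_INTR_ON_MESG_MASK */
	STEP_CLEAR_ROOT_STATUS,	/* clear PCI_ERR_ROOT_STATUS */
	STEP_ENABLE_ROOT_INTR,	/* set ROOT_PORT_INTR_ON_MESG_MASK */
	STEP_RESET,		/* FLR or bus reset */
};

#define MOCK_MAX_STEPS 4

struct mock_trace {
	enum mock_step steps[MOCK_MAX_STEPS];
	size_t n;
};

static void record(struct mock_trace *t, enum mock_step s)
{
	assert(t->n < MOCK_MAX_STEPS);
	t->steps[t->n++] = s;
}

/* Current order: the reset runs while the root interrupt is disabled. */
static void current_order(struct mock_trace *t)
{
	record(t, STEP_DISABLE_ROOT_INTR);
	record(t, STEP_RESET);
	record(t, STEP_CLEAR_ROOT_STATUS);
	record(t, STEP_ENABLE_ROOT_INTR);
}

/* Reordered sequence: all register updates finish before the reset. */
static void proposed_order(struct mock_trace *t)
{
	record(t, STEP_DISABLE_ROOT_INTR);
	record(t, STEP_CLEAR_ROOT_STATUS);
	record(t, STEP_ENABLE_ROOT_INTR);
	record(t, STEP_RESET);
}

/* Returns true if the reset happens while the root interrupt is disabled. */
static bool reset_is_masked(const struct mock_trace *t)
{
	bool masked = false;

	for (size_t i = 0; i < t->n; i++) {
		if (t->steps[i] == STEP_DISABLE_ROOT_INTR)
			masked = true;
		else if (t->steps[i] == STEP_ENABLE_ROOT_INTR)
			masked = false;
		else if (t->steps[i] == STEP_RESET)
			return masked;
	}
	return false;
}
```

If errors caused by the reset itself really do need to be ignored, the
reordered sequence loses that property, which is one more reason to
make the reorder a separate, revertable patch.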
> return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED;
> }
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 7883c9791562..cbc5abfe767b 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -148,10 +148,10 @@ static int report_resume(struct pci_dev *dev, void *data)
>
> /**
> * pci_walk_bridge - walk bridges potentially AER affected
> - * @bridge: bridge which may be a Port, an RCEC with associated RCiEPs,
> - * or an RCiEP associated with an RCEC
> - * @cb: callback to be called for each device found
> - * @userdata: arbitrary pointer to be passed to callback
> + * @bridge bridge which may be an RCEC with associated RCiEPs,
> + * or a Port.
> + * @cb callback to be called for each device found
> + * @userdata arbitrary pointer to be passed to callback.
> *
> * If the device provided is a bridge, walk the subordinate bus, including
> * any bridged devices on buses under this bus. Call the provided callback
> @@ -164,8 +164,14 @@ static void pci_walk_bridge(struct pci_dev *bridge,
> int (*cb)(struct pci_dev *, void *),
> void *userdata)
> {
> + /*
> + * In a non-native case where there is no OS-visible reporting
> + * device the bridge will be NULL, i.e., no RCEC, no Downstream Port.
> + */
> if (bridge->subordinate)
> pci_walk_bus(bridge->subordinate, cb, userdata);
> + else if (bridge->rcec)
> + cb(bridge->rcec, userdata);
> else
> cb(bridge, userdata);
> }
> @@ -194,12 +200,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> pci_dbg(bridge, "broadcast error_detected message\n");
> if (state == pci_channel_io_frozen) {
> pci_walk_bridge(bridge, report_frozen_detected, &status);
> - if (type == PCI_EXP_TYPE_RC_END) {
> - pci_warn(dev, "subordinate device reset not possible for RCiEP\n");
> - status = PCI_ERS_RESULT_NONE;
> - goto failed;
> - }
> -
> status = reset_subordinates(bridge);
> if (status != PCI_ERS_RESULT_RECOVERED) {
> pci_warn(bridge, "subordinate device reset failed\n");
> --
> 2.28.0
>
[+cc Christoph, Ethan, Sinan, Keith; sorry should have cc'd you to
begin with since you're looking at this code too. Particularly
interested in your thoughts about whether we should be touching
PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS when we don't own AER.]
On 10/16/20 3:29 PM, Bjorn Helgaas wrote:
> [+cc Christoph, Ethan, Sinan, Keith; sorry should have cc'd you to
> begin with since you're looking at this code too. Particularly
> interested in your thoughts about whether we should be touching
> PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS when we don't own AER.]
This part is not very clear in the ACPI spec or the PCI Firmware spec.
IMO, since APEI notifies the OS about the error, I guess we are
allowed to clear the PCI_ERR_ROOT_STATUS register after handling the
error.
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
On 10/16/20 3:29 PM, Bjorn Helgaas wrote:
> [+cc Christoph, Ethan, Sinan, Keith; sorry should have cc'd you to
> begin with since you're looking at this code too. Particularly
> interested in your thoughts about whether we should be touching
> PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS when we don't own AER.]
This part is not very clear in the ACPI spec or the PCI Firmware spec.
IMO, since APEI notifies the OS about the error, I guess we are
allowed to clear the PCI_ERR_ROOT_STATUS register after handling the
error (similar to the EDR case).
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
On 16 Oct 2020, at 13:30, Bjorn Helgaas wrote:
> [+to Jonathan]
>
> On Thu, Oct 15, 2020 at 05:11:10PM -0700, Sean V Kelley wrote:
>> From: Qiuxu Zhuo <[email protected]>
>>
>> When attempting error recovery for an RCiEP associated with an RCEC
>> device,
>> there needs to be a way to update the Root Error Status, the
>> Uncorrectable
>> Error Status and the Uncorrectable Error Severity of the parent RCEC.
>> In
>> some non-native cases in which there is no OS-visible device
>> associated
>> with the RCiEP, there is nothing to act upon as the firmware is
>> acting
>> before the OS.
>>
>> Add handling for the linked RCEC in AER/ERR while taking into account
>> non-native cases.
>>
>> Co-developed-by: Sean V Kelley <[email protected]>
>> Link:
>> https://lore.kernel.org/r/[email protected]
>> Signed-off-by: Sean V Kelley <[email protected]>
>> Signed-off-by: Qiuxu Zhuo <[email protected]>
>> Signed-off-by: Bjorn Helgaas <[email protected]>
>> Reviewed-by: Jonathan Cameron <[email protected]>
>> ---
>> drivers/pci/pcie/aer.c | 53
>> ++++++++++++++++++++++++++++++------------
>> drivers/pci/pcie/err.c | 20 ++++++++--------
>> 2 files changed, 48 insertions(+), 25 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index 65dff5f3457a..083f69b67bfd 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -1357,27 +1357,50 @@ static int aer_probe(struct pcie_device *dev)
>> */
>> static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
>> {
>> - int aer = dev->aer_cap;
>> + int type = pci_pcie_type(dev);
>> + struct pci_dev *root;
>> + int aer = 0;
>> + int rc = 0;
>> u32 reg32;
>> - int rc;
>>
>> + if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END)
>
> "type == PCI_EXP_TYPE_RC_END"
Right, I merged your suggested changes which added the type. Will
correct.
>
>> + /*
>> + * The reset should only clear the Root Error Status
>> + * of the RCEC. Only perform this for the
>> + * native case, i.e., an RCEC is present.
>> + */
>> + root = dev->rcec;
>> + else
>> + root = dev;
>>
>> - /* Disable Root's interrupt in response to error messages */
>> - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
>> - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
>> - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
>> + if (root)
>> + aer = dev->aer_cap;
>>
>> - rc = pci_bus_error_reset(dev);
>> - pci_info(dev, "Root Port link has been reset\n");
>> + if (aer) {
>> + /* Disable Root's interrupt in response to error messages */
>> + pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
>> + reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
>> + pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
>
> Not directly related to *this* patch, but my assumption was that in
> the APEI case, the firmware should retain ownership of the AER
> Capability, so the OS should not touch PCI_ERR_ROOT_COMMAND and
> PCI_ERR_ROOT_STATUS.
>
> But this code appears to ignore that ownership. Jonathan, you must
> have looked at this recently for 068c29a248b6 ("PCI/ERR: Clear PCIe
> Device Status errors only if OS owns AER"). Do you have any insight
> about this?
>
>> - /* Clear Root Error Status */
>> - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &reg32);
>> - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, reg32);
>> + /* Clear Root Error Status */
>> + pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, &reg32);
>> + pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32);
>>
>> - /* Enable Root Port's interrupt in response to error messages */
>> - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
>> - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
>> - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
>> + /* Enable Root Port's interrupt in response to error messages */
>> + pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
>> + reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
>> + pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
>> + }
>> +
>> + if ((type == PCI_EXP_TYPE_RC_EC) || (type == PCI_EXP_TYPE_RC_END))
>> {
>> + if (pcie_has_flr(root)) {
>> + rc = pcie_flr(root);
>> + pci_info(dev, "has been reset (%d)\n", rc);
>> + }
>> + } else {
>> + rc = pci_bus_error_reset(root);
>
> Don't we want "dev" for both the FLR and pci_bus_error_reset()? I
> think "root == dev" except when dev is an RCiEP. When dev is an
> RCiEP, "root" is the RCEC (if present), and we want to reset the
> RCiEP, not the RCEC.
Right, when I used the goto in the earlier incarnation, I always set root
to dev at the start. In the merged version, root needs to be dev in every
case except RC_END where an RCEC exists. Will change without bringing
back the goto…
+ struct pci_dev *root = dev;
…
+non_native:
+ if ((type == PCI_EXP_TYPE_RC_EC) || (type ==
PCI_EXP_TYPE_RC_END)) {
+ rc = flr_on_rc(root);
+ pci_info(dev, "has been reset (%d)\n", rc);
+ } else {
+ rc = pci_bus_error_reset(root);
+ pci_info(dev, "Root Port link has been reset (%d)\n",
rc);
+ }
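The distinction raised above (clear the Root Error Status on the RCEC, but
reset the device that reported the error) can be sketched in user space as
follows. This is only an illustration under stated assumptions: struct
mock_dev, enum mock_pcie_type, status_target() and reset_target() are
hypothetical names, not kernel APIs, and the real aer_root_reset() may end
up structured differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's device types; not the real API. */
enum mock_pcie_type { MOCK_ROOT_PORT, MOCK_RC_EC, MOCK_RC_END };

struct mock_dev {
	enum mock_pcie_type type;
	struct mock_dev *rcec;	/* linked RCEC, NULL if none is OS-visible */
};

/*
 * Device whose Root Error Status would be cleared. For an RCiEP this is
 * the linked RCEC, which may be NULL in the non-native case.
 */
static struct mock_dev *status_target(struct mock_dev *dev)
{
	assert(dev != NULL);
	if (dev->type == MOCK_RC_END)
		return dev->rcec;
	return dev;
}

/*
 * Device that is actually reset (FLR or bus reset). Per the review
 * comment, this is always the device passed in, never the RCEC.
 */
static struct mock_dev *reset_target(struct mock_dev *dev)
{
	assert(dev != NULL);
	return dev;
}
```

With an RCiEP linked to an RCEC, status_target() yields the RCEC while
reset_target() yields the RCiEP itself; for a Root Port both yield the
port.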
>
>> + pci_info(dev, "Root Port link has been reset (%d)\n", rc);
>> + }
>
> There are a couple changes here that I think should be split out.
>
> Based on my theory that when firmware retains control of AER, the OS
> should not touch PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS, and any
> updates to them would have to be done by firmware before we get here,
> I suggested reordering this:
>
> - clear PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
> - do reset
> - clear PCI_ERR_ROOT_STATUS (for APEI, presumably done by firmware?)
> - enable PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
>
> to this:
>
> - clear PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
> - clear PCI_ERR_ROOT_STATUS
> - enable PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
> - do reset
>
> If my theory is correct, I think we should still reorder this, but:
>
> - It's a significant behavior change that deserves its own patch so
> we can document/bisect/revert.
>
> - I'm not sure why we clear the PCI_ERR_ROOT_COMMAND error reporting
> bits. In the new "clear COMMAND, clear STATUS, enable COMMAND"
> order, it looks superfluous. There's no reason to disable error
> reporting while clearing the status bits.
>
> The current "clear, reset, enable" order suggests that the reset
> might cause errors that we should ignore. I don't know whether
> that's the case or not. It dates from 6c2b374d7485 ("PCI-Express
> AER implemetation: AER core and aerdriver"), which doesn't
> elaborate.
>
> - Should we also test for OS ownership of AER before touching
> PCI_ERR_ROOT_STATUS?
>
> - If we remove the PCI_ERR_ROOT_COMMAND fiddling (and I tentatively
> think we *should* unless we can justify it), that would also
> deserve its own patch. Possibly (1) remove PCI_ERR_ROOT_COMMAND
> fiddling, (2) reorder PCI_ERR_ROOT_STATUS clearing and reset, (3)
> test for OS ownership of AER (?), (4) the rest of this patch.
You’ve highlighted some good questions.
I think we should remove the fiddling until we have a clearer picture
and put that into its own patch.
Sean
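For readers following the "clear PCI_ERR_ROOT_STATUS" step in the ordering
discussion above: the patch clears the register by reading it and writing
the same value back, which works because the Root Error Status bits are
write-1-to-clear. A minimal user-space model of that idiom, assuming a
hypothetical struct mock_rw1c_reg and mock_* helpers (none of these are
kernel APIs):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a write-1-to-clear (RW1C) status register. */
struct mock_rw1c_reg {
	uint32_t bits;
};

static uint32_t mock_read(const struct mock_rw1c_reg *reg)
{
	assert(reg != NULL);
	return reg->bits;
}

/* Writing 1 to a bit clears it; writing 0 leaves the bit unchanged. */
static void mock_write(struct mock_rw1c_reg *reg, uint32_t val)
{
	assert(reg != NULL);
	reg->bits &= ~val;
}

/*
 * Read, then write the value back: clears exactly the bits that were
 * set at the time of the read, mirroring the read/write pair in the patch.
 */
static void mock_clear_status(struct mock_rw1c_reg *reg)
{
	uint32_t val = mock_read(reg);

	mock_write(reg, val);
}
```

Nothing in this model depends on the interrupt-enable bits in
PCI_ERR_ROOT_COMMAND, which is consistent with the observation that
toggling them around the status clear looks superfluous.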
>
>> return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED;
>> }
>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>> index 7883c9791562..cbc5abfe767b 100644
>> --- a/drivers/pci/pcie/err.c
>> +++ b/drivers/pci/pcie/err.c
>> @@ -148,10 +148,10 @@ static int report_resume(struct pci_dev *dev,
>> void *data)
>>
>> /**
>> * pci_walk_bridge - walk bridges potentially AER affected
>> - * @bridge: bridge which may be a Port, an RCEC with associated
>> RCiEPs,
>> - * or an RCiEP associated with an RCEC
>> - * @cb: callback to be called for each device found
>> - * @userdata: arbitrary pointer to be passed to callback
>> + * @bridge bridge which may be an RCEC with associated RCiEPs,
>> + * or a Port.
>> + * @cb callback to be called for each device found
>> + * @userdata arbitrary pointer to be passed to callback.
>> *
>> * If the device provided is a bridge, walk the subordinate bus,
>> including
>> * any bridged devices on buses under this bus. Call the provided
>> callback
>> @@ -164,8 +164,14 @@ static void pci_walk_bridge(struct pci_dev
>> *bridge,
>> int (*cb)(struct pci_dev *, void *),
>> void *userdata)
>> {
>> + /*
>> + * In a non-native case where there is no OS-visible reporting
>> + * device the bridge will be NULL, i.e., no RCEC, no Downstream
>> Port.
>> + */
>> if (bridge->subordinate)
>> pci_walk_bus(bridge->subordinate, cb, userdata);
>> + else if (bridge->rcec)
>> + cb(bridge->rcec, userdata);
>> else
>> cb(bridge, userdata);
>> }
>> @@ -194,12 +200,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev
>> *dev,
>> pci_dbg(bridge, "broadcast error_detected message\n");
>> if (state == pci_channel_io_frozen) {
>> pci_walk_bridge(bridge, report_frozen_detected, &status);
>> - if (type == PCI_EXP_TYPE_RC_END) {
>> - pci_warn(dev, "subordinate device reset not possible for
>> RCiEP\n");
>> - status = PCI_ERS_RESULT_NONE;
>> - goto failed;
>> - }
>> -
>> status = reset_subordinates(bridge);
>> if (status != PCI_ERS_RESULT_RECOVERED) {
>> pci_warn(bridge, "subordinate device reset failed\n");
>> --
>> 2.28.0
>>
On Sat, Oct 17, 2020 at 6:29 AM Bjorn Helgaas <[email protected]> wrote:
>
> [+cc Christoph, Ethan, Sinan, Keith; sorry should have cc'd you to
> begin with since you're looking at this code too. Particularly
> interested in your thoughts about whether we should be touching
> PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS when we don't own AER.]
aer_root_reset() has the prefix 'aer_', so it looks like a function of
the AER driver that will only be called by the AER driver at runtime. If
so, it is up to the owner (the AER driver) to know whether OSPM has been
granted control. But in fact some of the functions and runtime services
of the AER driver are also shared by the GHES driver (at run time) and
the DPC driver (at compile time?), etc., so it is confusing now.
Shall we move some of the shared functions and runtime services to
pci/err.c? If so, it would be just like pcie_do_recovery(), which is
shared by firmware_first mode GHES
ghes_probe()
->ghes_irq_func
->ghes_proc
->ghes_do_proc()
->ghes_handle_aer()
->aer_recover_work_func()
->pcie_do_recovery()
->aer_root_reset()
and the AER driver, etc. If AER wants to do some access that might
conflict with firmware (or firmware in an embedded controller), it should
check _OSC etc. first. Blindly writing PCI_ERR_ROOT_COMMAND or clearing
PCI_ERR_ROOT_STATUS is *likely* to cause errors in the error handling
itself.
Thanks,
Ethan
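The "check ownership first" idea above can be sketched as a guard around
the status clear. This is a user-space illustration only: struct mock_root
and its os_owns_aer flag are hypothetical stand-ins for the result of _OSC
negotiation. In the kernel the test would presumably be something like
pcie_aer_is_native(), but that is an assumption here, not what the patch
does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a Root Port or RCEC; not a kernel structure. */
struct mock_root {
	bool os_owns_aer;	/* assumed outcome of _OSC negotiation */
	uint32_t root_status;	/* models PCI_ERR_ROOT_STATUS (RW1C) */
};

/*
 * Clear the modelled Root Error Status only when the OS owns AER.
 * Returns true if the OS cleared it, false if it was left to firmware.
 */
static bool mock_clear_root_status(struct mock_root *root)
{
	uint32_t val;

	assert(root != NULL);
	if (!root->os_owns_aer)
		return false;

	val = root->root_status;	/* read current status */
	root->root_status &= ~val;	/* write-back clears the set bits */
	return true;
}
```

In the firmware-first (GHES) path shown in the call chain, such a guard
would leave the register untouched and rely on firmware to have cleared it
before notifying the OS.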
>
> On Fri, Oct 16, 2020 at 03:30:37PM -0500, Bjorn Helgaas wrote:
> > [+to Jonathan]
> >
> > On Thu, Oct 15, 2020 at 05:11:10PM -0700, Sean V Kelley wrote:
> > > From: Qiuxu Zhuo <[email protected]>
> > >
> > > When attempting error recovery for an RCiEP associated with an RCEC device,
> > > there needs to be a way to update the Root Error Status, the Uncorrectable
> > > Error Status and the Uncorrectable Error Severity of the parent RCEC. In
> > > some non-native cases in which there is no OS-visible device associated
> > > with the RCiEP, there is nothing to act upon as the firmware is acting
> > > before the OS.
> > >
> > > Add handling for the linked RCEC in AER/ERR while taking into account
> > > non-native cases.
> > >
> > > Co-developed-by: Sean V Kelley <[email protected]>
> > > Link: https://lore.kernel.org/r/[email protected]
> > > Signed-off-by: Sean V Kelley <[email protected]>
> > > Signed-off-by: Qiuxu Zhuo <[email protected]>
> > > Signed-off-by: Bjorn Helgaas <[email protected]>
> > > Reviewed-by: Jonathan Cameron <[email protected]>
> > > ---
> > > drivers/pci/pcie/aer.c | 53 ++++++++++++++++++++++++++++++------------
> > > drivers/pci/pcie/err.c | 20 ++++++++--------
> > > 2 files changed, 48 insertions(+), 25 deletions(-)
> > >
> > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > index 65dff5f3457a..083f69b67bfd 100644
> > > --- a/drivers/pci/pcie/aer.c
> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -1357,27 +1357,50 @@ static int aer_probe(struct pcie_device *dev)
> > > */
> > > static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
> > > {
> > > - int aer = dev->aer_cap;
> > > + int type = pci_pcie_type(dev);
> > > + struct pci_dev *root;
> > > + int aer = 0;
> > > + int rc = 0;
> > > u32 reg32;
> > > - int rc;
> > >
> > > + if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END)
> >
> > "type == PCI_EXP_TYPE_RC_END"
> >
> > > + /*
> > > + * The reset should only clear the Root Error Status
> > > + * of the RCEC. Only perform this for the
> > > + * native case, i.e., an RCEC is present.
> > > + */
> > > + root = dev->rcec;
> > > + else
> > > + root = dev;
> > >
> > > - /* Disable Root's interrupt in response to error messages */
> > > - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> > > - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
> > > - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
> > > + if (root)
> > > + aer = dev->aer_cap;
> > >
> > > - rc = pci_bus_error_reset(dev);
> > > - pci_info(dev, "Root Port link has been reset\n");
> > > + if (aer) {
> > > + /* Disable Root's interrupt in response to error messages */
> > > + pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> > > + reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
> > > + pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
> >
> > Not directly related to *this* patch, but my assumption was that in
> > the APEI case, the firmware should retain ownership of the AER
> > Capability, so the OS should not touch PCI_ERR_ROOT_COMMAND and
> > PCI_ERR_ROOT_STATUS.
> >
> > But this code appears to ignore that ownership. Jonathan, you must
> > have looked at this recently for 068c29a248b6 ("PCI/ERR: Clear PCIe
> > Device Status errors only if OS owns AER"). Do you have any insight
> > about this?
> >
> > > - /* Clear Root Error Status */
> > > - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &reg32);
> > > - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, reg32);
> > > + /* Clear Root Error Status */
> > > + pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, &reg32);
> > > + pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32);
> > >
> > > - /* Enable Root Port's interrupt in response to error messages */
> > > - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> > > - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
> > > - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
> > > + /* Enable Root Port's interrupt in response to error messages */
> > > + pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
> > > + reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
> > > + pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
> > > + }
> > > +
> > > + if ((type == PCI_EXP_TYPE_RC_EC) || (type == PCI_EXP_TYPE_RC_END)) {
> > > + if (pcie_has_flr(root)) {
> > > + rc = pcie_flr(root);
> > > + pci_info(dev, "has been reset (%d)\n", rc);
> > > + }
> > > + } else {
> > > + rc = pci_bus_error_reset(root);
> >
> > Don't we want "dev" for both the FLR and pci_bus_error_reset()? I
> > think "root == dev" except when dev is an RCiEP. When dev is an
> > RCiEP, "root" is the RCEC (if present), and we want to reset the
> > RCiEP, not the RCEC.
> >
> > > + pci_info(dev, "Root Port link has been reset (%d)\n", rc);
> > > + }
> >
> > There are a couple changes here that I think should be split out.
> >
> > Based on my theory that when firmware retains control of AER, the OS
> > should not touch PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS, and any
> > updates to them would have to be done by firmware before we get here,
> > I suggested reordering this:
> >
> > - clear PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
> > - do reset
> > - clear PCI_ERR_ROOT_STATUS (for APEI, presumably done by firmware?)
> > - enable PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
> >
> > to this:
> >
> > - clear PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
> > - clear PCI_ERR_ROOT_STATUS
> > - enable PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
> > - do reset
> >
> > If my theory is correct, I think we should still reorder this, but:
> >
> > - It's a significant behavior change that deserves its own patch so
> > we can document/bisect/revert.
> >
> > - I'm not sure why we clear the PCI_ERR_ROOT_COMMAND error reporting
> > bits. In the new "clear COMMAND, clear STATUS, enable COMMAND"
> > order, it looks superfluous. There's no reason to disable error
> > reporting while clearing the status bits.
> >
> > The current "clear, reset, enable" order suggests that the reset
> > might cause errors that we should ignore. I don't know whether
> > that's the case or not. It dates from 6c2b374d7485 ("PCI-Express
> > AER implemetation: AER core and aerdriver"), which doesn't
> > elaborate.
> >
> > - Should we also test for OS ownership of AER before touching
> > PCI_ERR_ROOT_STATUS?
> >
> > - If we remove the PCI_ERR_ROOT_COMMAND fiddling (and I tentatively
> > think we *should* unless we can justify it), that would also
> > deserve its own patch. Possibly (1) remove PCI_ERR_ROOT_COMMAND
> > fiddling, (2) reorder PCI_ERR_ROOT_STATUS clearing and reset, (3)
> > test for OS ownership of AER (?), (4) the rest of this patch.
> >
> > > return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED;
> > > }
> > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > > index 7883c9791562..cbc5abfe767b 100644
> > > --- a/drivers/pci/pcie/err.c
> > > +++ b/drivers/pci/pcie/err.c
> > > @@ -148,10 +148,10 @@ static int report_resume(struct pci_dev *dev, void *data)
> > >
> > > /**
> > > * pci_walk_bridge - walk bridges potentially AER affected
> > > - * @bridge: bridge which may be a Port, an RCEC with associated RCiEPs,
> > > - * or an RCiEP associated with an RCEC
> > > - * @cb: callback to be called for each device found
> > > - * @userdata: arbitrary pointer to be passed to callback
> > > + * @bridge bridge which may be an RCEC with associated RCiEPs,
> > > + * or a Port.
> > > + * @cb callback to be called for each device found
> > > + * @userdata arbitrary pointer to be passed to callback.
> > > *
> > > * If the device provided is a bridge, walk the subordinate bus, including
> > > * any bridged devices on buses under this bus. Call the provided callback
> > > @@ -164,8 +164,14 @@ static void pci_walk_bridge(struct pci_dev *bridge,
> > > int (*cb)(struct pci_dev *, void *),
> > > void *userdata)
> > > {
> > > + /*
> > > + * In a non-native case where there is no OS-visible reporting
> > > + * device the bridge will be NULL, i.e., no RCEC, no Downstream Port.
> > > + */
> > > if (bridge->subordinate)
> > > pci_walk_bus(bridge->subordinate, cb, userdata);
> > > + else if (bridge->rcec)
> > > + cb(bridge->rcec, userdata);
> > > else
> > > cb(bridge, userdata);
> > > }
> > > @@ -194,12 +200,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> > > pci_dbg(bridge, "broadcast error_detected message\n");
> > > if (state == pci_channel_io_frozen) {
> > > pci_walk_bridge(bridge, report_frozen_detected, &status);
> > > - if (type == PCI_EXP_TYPE_RC_END) {
> > > - pci_warn(dev, "subordinate device reset not possible for RCiEP\n");
> > > - status = PCI_ERS_RESULT_NONE;
> > > - goto failed;
> > > - }
> > > -
> > > status = reset_subordinates(bridge);
> > > if (status != PCI_ERS_RESULT_RECOVERED) {
> > > pci_warn(bridge, "subordinate device reset failed\n");
> > > --
> > > 2.28.0
> > >
On 19 Oct 2020, at 3:49, Ethan Zhao wrote:
> On Sat, Oct 17, 2020 at 6:29 AM Bjorn Helgaas <[email protected]>
> wrote:
>>
>> [+cc Christoph, Ethan, Sinan, Keith; sorry should have cc'd you to
>> begin with since you're looking at this code too. Particularly
>> interested in your thoughts about whether we should be touching
>> PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS when we don't own AER.]
>
> aer_root_reset() function has a prefix 'aer_', looks like it's a
> function of aer driver, will
> only be called by aer driver at runtime. if so it's up to the
> owner/aer to know if OSPM is
> granted to init. while actually some of the functions and runtime
> service of
> aer driver is also shared by GHES driver (running time) and DPC driver
> (compiling time ?)
> etc. then it is confused now.
>
> Shall we move some of the shared functions and running time service to
> pci/err.c ?
> if so , just like pcie_do_recovery(), it's share by firmware_first
> mode GHES
> ghes_probe()
> ->ghes_irq_func
> ->ghes_proc
> ->ghes_do_proc()
> ->ghes_handle_aer()
> ->aer_recover_work_func()
> ->pcie_do_recovery()
> ->aer_root_reset()
>
> and aer driver etc. if aer wants to do some access might conflict
> with firmware(or
> firmware in embedded controller) should check _OSC_ etc first.
> blindly issue
> PCI_ERR_ROOT_COMMAND or clear PCI_ERR_ROOT_STATUS *likely*
> cause errors by error handling itself.
If _OSC negotiation ends up with FW being in control of AER, that means
OS is not in charge and should not be messing with AER I guess. That
seems appropriate to me then.
Thanks,
Sean
>
> Thanks,
> Ethan
>
>>
>> On Fri, Oct 16, 2020 at 03:30:37PM -0500, Bjorn Helgaas wrote:
>>> [+to Jonathan]
>>>
>>> On Thu, Oct 15, 2020 at 05:11:10PM -0700, Sean V Kelley wrote:
>>>> From: Qiuxu Zhuo <[email protected]>
>>>>
>>>> When attempting error recovery for an RCiEP associated with an RCEC
>>>> device,
>>>> there needs to be a way to update the Root Error Status, the
>>>> Uncorrectable
>>>> Error Status and the Uncorrectable Error Severity of the parent
>>>> RCEC. In
>>>> some non-native cases in which there is no OS-visible device
>>>> associated
>>>> with the RCiEP, there is nothing to act upon as the firmware is
>>>> acting
>>>> before the OS.
>>>>
>>>> Add handling for the linked RCEC in AER/ERR while taking into
>>>> account
>>>> non-native cases.
>>>>
>>>> Co-developed-by: Sean V Kelley <[email protected]>
>>>> Link:
>>>> https://lore.kernel.org/r/[email protected]
>>>> Signed-off-by: Sean V Kelley <[email protected]>
>>>> Signed-off-by: Qiuxu Zhuo <[email protected]>
>>>> Signed-off-by: Bjorn Helgaas <[email protected]>
>>>> Reviewed-by: Jonathan Cameron <[email protected]>
>>>> ---
>>>> drivers/pci/pcie/aer.c | 53
>>>> ++++++++++++++++++++++++++++++------------
>>>> drivers/pci/pcie/err.c | 20 ++++++++--------
>>>> 2 files changed, 48 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>>>> index 65dff5f3457a..083f69b67bfd 100644
>>>> --- a/drivers/pci/pcie/aer.c
>>>> +++ b/drivers/pci/pcie/aer.c
>>>> @@ -1357,27 +1357,50 @@ static int aer_probe(struct pcie_device
>>>> *dev)
>>>> */
>>>> static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
>>>> {
>>>> - int aer = dev->aer_cap;
>>>> + int type = pci_pcie_type(dev);
>>>> + struct pci_dev *root;
>>>> + int aer = 0;
>>>> + int rc = 0;
>>>> u32 reg32;
>>>> - int rc;
>>>>
>>>> + if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END)
>>>
>>> "type == PCI_EXP_TYPE_RC_END"
>>>
>>>> + /*
>>>> + * The reset should only clear the Root Error Status
>>>> + * of the RCEC. Only perform this for the
>>>> + * native case, i.e., an RCEC is present.
>>>> + */
>>>> + root = dev->rcec;
>>>> + else
>>>> + root = dev;
>>>>
>>>> - /* Disable Root's interrupt in response to error messages */
>>>> - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
>>>> - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
>>>> - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
>>>> + if (root)
>>>> + aer = dev->aer_cap;
>>>>
>>>> - rc = pci_bus_error_reset(dev);
>>>> - pci_info(dev, "Root Port link has been reset\n");
>>>> + if (aer) {
>>>> + /* Disable Root's interrupt in response to error
>>>> messages */
>>>> + pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND,
>>>> &reg32);
>>>> + reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
>>>> + pci_write_config_dword(root, aer +
>>>> PCI_ERR_ROOT_COMMAND, reg32);
>>>
>>> Not directly related to *this* patch, but my assumption was that in
>>> the APEI case, the firmware should retain ownership of the AER
>>> Capability, so the OS should not touch PCI_ERR_ROOT_COMMAND and
>>> PCI_ERR_ROOT_STATUS.
>>>
>>> But this code appears to ignore that ownership. Jonathan, you must
>>> have looked at this recently for 068c29a248b6 ("PCI/ERR: Clear PCIe
>>> Device Status errors only if OS owns AER"). Do you have any insight
>>> about this?
>>>
>>>> - /* Clear Root Error Status */
>>>> - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &reg32);
>>>> - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, reg32);
>>>> + /* Clear Root Error Status */
>>>> + pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS,
>>>> &reg32);
>>>> + pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS,
>>>> reg32);
>>>>
>>>> - /* Enable Root Port's interrupt in response to error messages
>>>> */
>>>> - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
>>>> - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
>>>> - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
>>>> + /* Enable Root Port's interrupt in response to error
>>>> messages */
>>>> + pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND,
>>>> &reg32);
>>>> + reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
>>>> + pci_write_config_dword(root, aer +
>>>> PCI_ERR_ROOT_COMMAND, reg32);
>>>> + }
>>>> +
>>>> + if ((type == PCI_EXP_TYPE_RC_EC) || (type ==
>>>> PCI_EXP_TYPE_RC_END)) {
>>>> + if (pcie_has_flr(root)) {
>>>> + rc = pcie_flr(root);
>>>> + pci_info(dev, "has been reset (%d)\n", rc);
>>>> + }
>>>> + } else {
>>>> + rc = pci_bus_error_reset(root);
>>>
>>> Don't we want "dev" for both the FLR and pci_bus_error_reset()? I
>>> think "root == dev" except when dev is an RCiEP. When dev is an
>>> RCiEP, "root" is the RCEC (if present), and we want to reset the
>>> RCiEP, not the RCEC.
>>>
>>>> + pci_info(dev, "Root Port link has been reset (%d)\n",
>>>> rc);
>>>> + }
>>>
>>> There are a couple changes here that I think should be split out.
>>>
>>> Based on my theory that when firmware retains control of AER, the OS
>>> should not touch PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS, and
>>> any
>>> updates to them would have to be done by firmware before we get
>>> here,
>>> I suggested reordering this:
>>>
>>> - clear PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
>>> - do reset
>>> - clear PCI_ERR_ROOT_STATUS (for APEI, presumably done by
>>> firmware?)
>>> - enable PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
>>>
>>> to this:
>>>
>>> - clear PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
>>> - clear PCI_ERR_ROOT_STATUS
>>> - enable PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
>>> - do reset
>>>
>>> If my theory is correct, I think we should still reorder this, but:
>>>
>>> - It's a significant behavior change that deserves its own patch
>>> so
>>> we can document/bisect/revert.
>>>
>>> - I'm not sure why we clear the PCI_ERR_ROOT_COMMAND error
>>> reporting
>>> bits. In the new "clear COMMAND, clear STATUS, enable COMMAND"
>>> order, it looks superfluous. There's no reason to disable error
>>> reporting while clearing the status bits.
>>>
>>> The current "clear, reset, enable" order suggests that the reset
>>> might cause errors that we should ignore. I don't know whether
>>> that's the case or not. It dates from 6c2b374d7485
>>> ("PCI-Express
>>> AER implemetation: AER core and aerdriver"), which doesn't
>>> elaborate.
>>>
>>> - Should we also test for OS ownership of AER before touching
>>> PCI_ERR_ROOT_STATUS?
>>>
>>> - If we remove the PCI_ERR_ROOT_COMMAND fiddling (and I
>>> tentatively
>>> think we *should* unless we can justify it), that would also
>>> deserve its own patch. Possibly (1) remove PCI_ERR_ROOT_COMMAND
>>> fiddling, (2) reorder PCI_ERR_ROOT_STATUS clearing and reset,
>>> (3)
>>> test for OS ownership of AER (?), (4) the rest of this patch.
>>>
>>>> return rc ? PCI_ERS_RESULT_DISCONNECT :
>>>> PCI_ERS_RESULT_RECOVERED;
>>>> }
>>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>>>> index 7883c9791562..cbc5abfe767b 100644
>>>> --- a/drivers/pci/pcie/err.c
>>>> +++ b/drivers/pci/pcie/err.c
>>>> @@ -148,10 +148,10 @@ static int report_resume(struct pci_dev *dev,
>>>> void *data)
>>>>
>>>> /**
>>>> * pci_walk_bridge - walk bridges potentially AER affected
>>>> - * @bridge: bridge which may be a Port, an RCEC with
>>>> associated RCiEPs,
>>>> - * or an RCiEP associated with an RCEC
>>>> - * @cb: callback to be called for each device found
>>>> - * @userdata: arbitrary pointer to be passed to callback
>>>> + * @bridge bridge which may be an RCEC with associated RCiEPs,
>>>> + * or a Port.
>>>> + * @cb callback to be called for each device found
>>>> + * @userdata arbitrary pointer to be passed to callback.
>>>> *
>>>> * If the device provided is a bridge, walk the subordinate bus,
>>>> including
>>>> * any bridged devices on buses under this bus. Call the provided
>>>> callback
>>>> @@ -164,8 +164,14 @@ static void pci_walk_bridge(struct pci_dev
>>>> *bridge,
>>>> int (*cb)(struct pci_dev *, void *),
>>>> void *userdata)
>>>> {
>>>> + /*
>>>> + * In a non-native case where there is no OS-visible reporting
>>>> + * device the bridge will be NULL, i.e., no RCEC, no Downstream
>>>> Port.
>>>> + */
>>>> if (bridge->subordinate)
>>>> pci_walk_bus(bridge->subordinate, cb, userdata);
>>>> + else if (bridge->rcec)
>>>> + cb(bridge->rcec, userdata);
>>>> else
>>>> cb(bridge, userdata);
>>>> }
>>>> @@ -194,12 +200,6 @@ pci_ers_result_t pcie_do_recovery(struct
>>>> pci_dev *dev,
>>>> pci_dbg(bridge, "broadcast error_detected message\n");
>>>> if (state == pci_channel_io_frozen) {
>>>> pci_walk_bridge(bridge, report_frozen_detected,
>>>> &status);
>>>> - if (type == PCI_EXP_TYPE_RC_END) {
>>>> - pci_warn(dev, "subordinate device reset not
>>>> possible for RCiEP\n");
>>>> - status = PCI_ERS_RESULT_NONE;
>>>> - goto failed;
>>>> - }
>>>> -
>>>> status = reset_subordinates(bridge);
>>>> if (status != PCI_ERS_RESULT_RECOVERED) {
>>>> pci_warn(bridge, "subordinate device reset
>>>> failed\n");
>>>> --
>>>> 2.28.0
>>>>
On 10/19/20 11:31 AM, Sean V Kelley wrote:
> On 19 Oct 2020, at 3:49, Ethan Zhao wrote:
>
>> On Sat, Oct 17, 2020 at 6:29 AM Bjorn Helgaas <[email protected]> wrote:
>>>
>>> [+cc Christoph, Ethan, Sinan, Keith; sorry should have cc'd you to
>>> begin with since you're looking at this code too. Particularly
>>> interested in your thoughts about whether we should be touching
>>> PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS when we don't own AER.]
>>
>> The aer_root_reset() function has an 'aer_' prefix, so it looks like a
>> function of the AER driver that will only be called by the AER driver
>> at runtime. If so, it's up to the owner (AER) to know whether OSPM has
>> been granted control. But some of the functions and runtime services of
>> the AER driver are actually also shared by the GHES driver (at runtime)
>> and the DPC driver (at compile time?), etc., so it is confusing now.
>>
>> Shall we move some of the shared functions and runtime services to
>> pci/err.c? If so, just like pcie_do_recovery(), it is shared by
>> firmware-first mode GHES:
>> ghes_probe()
>> ->ghes_irq_func
>> ->ghes_proc
>> ->ghes_do_proc()
>> ->ghes_handle_aer()
>> ->aer_recover_work_func()
>> ->pcie_do_recovery()
>> ->aer_root_reset()
>>
>> and by the AER driver, etc. If AER wants to do an access that might
>> conflict with firmware (or firmware in an embedded controller), it should
>> check _OSC etc. first. Blindly writing PCI_ERR_ROOT_COMMAND or clearing
>> PCI_ERR_ROOT_STATUS is *likely* to cause errors in the error handling
>> itself.
>
> If _OSC negotiation ends up with FW being in control of AER, that means OS is not in charge and
> should not be messing with AER I guess. That seems appropriate to me then.
But APEI-based notification is more of a hybrid approach (firmware first detects the
error and then notifies the OS). Since the spec does not clarify what the OS is allowed
to do, it's a bit of a gray area now. My point is that since firmware allows the OS to
process the error by sending the notification, I think it's OK to clear the status once
the error is handled.
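For reference, a minimal userspace sketch of what "clear the status" means in aer_root_reset(): the code reads PCI_ERR_ROOT_STATUS and writes the same value back, which relies on the status bits being write-1-to-clear. Nothing here is kernel API; `struct mock_root`, `mock_cfg_read_status()`, `mock_cfg_write_status()` and `clear_root_status()` are hypothetical stand-ins for config space access.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the AER Root Error Status register. */
struct mock_root {
	uint32_t root_status;
};

/* Model a config read of the status register. */
static uint32_t mock_cfg_read_status(const struct mock_root *r)
{
	return r->root_status;
}

/*
 * Model a config write with write-1-to-clear semantics: every 1 bit
 * written clears that status bit, every 0 bit written leaves it alone.
 */
static void mock_cfg_write_status(struct mock_root *r, uint32_t val)
{
	r->root_status &= ~val;
}

/* Read the latched bits and write them back; return what was latched. */
static uint32_t clear_root_status(struct mock_root *r)
{
	uint32_t reg32 = mock_cfg_read_status(r);

	mock_cfg_write_status(r, reg32);
	return reg32;
}
```

Writing back exactly what was read clears only the bits that were already latched at the time of the read; a bit that latches in between is not lost. Whether the OS is the right party to do this in the APEI case is the open question.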
>
> Thanks,
>
> Sean
>
>
>
>>
>> Thanks,
>> Ethan
>>
>>>
>>> On Fri, Oct 16, 2020 at 03:30:37PM -0500, Bjorn Helgaas wrote:
>>>> [+to Jonathan]
>>>>
>>>> On Thu, Oct 15, 2020 at 05:11:10PM -0700, Sean V Kelley wrote:
>>>>> From: Qiuxu Zhuo <[email protected]>
>>>>>
>>>>> When attempting error recovery for an RCiEP associated with an RCEC device,
>>>>> there needs to be a way to update the Root Error Status, the Uncorrectable
>>>>> Error Status and the Uncorrectable Error Severity of the parent RCEC. In
>>>>> some non-native cases in which there is no OS-visible device associated
>>>>> with the RCiEP, there is nothing to act upon as the firmware is acting
>>>>> before the OS.
>>>>>
>>>>> Add handling for the linked RCEC in AER/ERR while taking into account
>>>>> non-native cases.
>>>>>
>>>>> Co-developed-by: Sean V Kelley <[email protected]>
>>>>> Link: https://lore.kernel.org/r/[email protected]
>>>>> Signed-off-by: Sean V Kelley <[email protected]>
>>>>> Signed-off-by: Qiuxu Zhuo <[email protected]>
>>>>> Signed-off-by: Bjorn Helgaas <[email protected]>
>>>>> Reviewed-by: Jonathan Cameron <[email protected]>
>>>>> ---
>>>>> drivers/pci/pcie/aer.c | 53 ++++++++++++++++++++++++++++++------------
>>>>> drivers/pci/pcie/err.c | 20 ++++++++--------
>>>>> 2 files changed, 48 insertions(+), 25 deletions(-)
>>>>>
>>>>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>>>>> index 65dff5f3457a..083f69b67bfd 100644
>>>>> --- a/drivers/pci/pcie/aer.c
>>>>> +++ b/drivers/pci/pcie/aer.c
>>>>> @@ -1357,27 +1357,50 @@ static int aer_probe(struct pcie_device *dev)
>>>>> */
>>>>> static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
>>>>> {
>>>>> - int aer = dev->aer_cap;
>>>>> + int type = pci_pcie_type(dev);
>>>>> + struct pci_dev *root;
>>>>> + int aer = 0;
>>>>> + int rc = 0;
>>>>> u32 reg32;
>>>>> - int rc;
>>>>>
>>>>> + if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END)
>>>>
>>>> "type == PCI_EXP_TYPE_RC_END"
>>>>
>>>>> + /*
>>>>> + * The reset should only clear the Root Error Status
>>>>> + * of the RCEC. Only perform this for the
>>>>> + * native case, i.e., an RCEC is present.
>>>>> + */
>>>>> + root = dev->rcec;
>>>>> + else
>>>>> + root = dev;
>>>>>
>>>>> - /* Disable Root's interrupt in response to error messages */
>>>>> - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
>>>>> - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
>>>>> - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
>>>>> + if (root)
>>>>> + aer = dev->aer_cap;
>>>>>
>>>>> - rc = pci_bus_error_reset(dev);
>>>>> - pci_info(dev, "Root Port link has been reset\n");
>>>>> + if (aer) {
>>>>> + /* Disable Root's interrupt in response to error messages */
>>>>> + pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
>>>>> + reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
>>>>> + pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
>>>>
>>>> Not directly related to *this* patch, but my assumption was that in
>>>> the APEI case, the firmware should retain ownership of the AER
>>>> Capability, so the OS should not touch PCI_ERR_ROOT_COMMAND and
>>>> PCI_ERR_ROOT_STATUS.
>>>>
>>>> But this code appears to ignore that ownership. Jonathan, you must
>>>> have looked at this recently for 068c29a248b6 ("PCI/ERR: Clear PCIe
>>>> Device Status errors only if OS owns AER"). Do you have any insight
>>>> about this?
>>>>
>>>>> - /* Clear Root Error Status */
>>>>> - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &reg32);
>>>>> - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, reg32);
>>>>> + /* Clear Root Error Status */
>>>>> + pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, &reg32);
>>>>> + pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32);
>>>>>
>>>>> - /* Enable Root Port's interrupt in response to error messages */
>>>>> - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
>>>>> - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
>>>>> - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
>>>>> + /* Enable Root Port's interrupt in response to error messages */
>>>>> + pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
>>>>> + reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
>>>>> + pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
>>>>> + }
>>>>> +
>>>>> + if ((type == PCI_EXP_TYPE_RC_EC) || (type == PCI_EXP_TYPE_RC_END)) {
>>>>> + if (pcie_has_flr(root)) {
>>>>> + rc = pcie_flr(root);
>>>>> + pci_info(dev, "has been reset (%d)\n", rc);
>>>>> + }
>>>>> + } else {
>>>>> + rc = pci_bus_error_reset(root);
>>>>
>>>> Don't we want "dev" for both the FLR and pci_bus_error_reset()? I
>>>> think "root == dev" except when dev is an RCiEP. When dev is an
>>>> RCiEP, "root" is the RCEC (if present), and we want to reset the
>>>> RCiEP, not the RCEC.
>>>>
>>>>> + pci_info(dev, "Root Port link has been reset (%d)\n", rc);
>>>>> + }
>>>>
>>>> There are a couple changes here that I think should be split out.
>>>>
>>>> Based on my theory that when firmware retains control of AER, the OS
>>>> should not touch PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS, and any
>>>> updates to them would have to be done by firmware before we get here,
>>>> I suggested reordering this:
>>>>
>>>> - clear PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
>>>> - do reset
>>>> - clear PCI_ERR_ROOT_STATUS (for APEI, presumably done by firmware?)
>>>> - enable PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
>>>>
>>>> to this:
>>>>
>>>> - clear PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
>>>> - clear PCI_ERR_ROOT_STATUS
>>>> - enable PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
>>>> - do reset
>>>>
>>>> If my theory is correct, I think we should still reorder this, but:
>>>>
>>>> - It's a significant behavior change that deserves its own patch so
>>>> we can document/bisect/revert.
>>>>
>>>> - I'm not sure why we clear the PCI_ERR_ROOT_COMMAND error reporting
>>>> bits. In the new "clear COMMAND, clear STATUS, enable COMMAND"
>>>> order, it looks superfluous. There's no reason to disable error
>>>> reporting while clearing the status bits.
>>>>
>>>> The current "clear, reset, enable" order suggests that the reset
>>>> might cause errors that we should ignore. I don't know whether
>>>> that's the case or not. It dates from 6c2b374d7485 ("PCI-Express
>>>> AER implemetation: AER core and aerdriver"), which doesn't
>>>> elaborate.
>>>>
>>>> - Should we also test for OS ownership of AER before touching
>>>> PCI_ERR_ROOT_STATUS?
>>>>
>>>> - If we remove the PCI_ERR_ROOT_COMMAND fiddling (and I tentatively
>>>> think we *should* unless we can justify it), that would also
>>>> deserve its own patch. Possibly (1) remove PCI_ERR_ROOT_COMMAND
>>>> fiddling, (2) reorder PCI_ERR_ROOT_STATUS clearing and reset, (3)
>>>> test for OS ownership of AER (?), (4) the rest of this patch.
>>>>
>>>>> return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED;
>>>>> }
>>>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>>>>> index 7883c9791562..cbc5abfe767b 100644
>>>>> --- a/drivers/pci/pcie/err.c
>>>>> +++ b/drivers/pci/pcie/err.c
>>>>> @@ -148,10 +148,10 @@ static int report_resume(struct pci_dev *dev, void *data)
>>>>>
>>>>> /**
>>>>> * pci_walk_bridge - walk bridges potentially AER affected
>>>>> - * @bridge: bridge which may be a Port, an RCEC with associated RCiEPs,
>>>>> - * or an RCiEP associated with an RCEC
>>>>> - * @cb: callback to be called for each device found
>>>>> - * @userdata: arbitrary pointer to be passed to callback
>>>>> + * @bridge bridge which may be an RCEC with associated RCiEPs,
>>>>> + * or a Port.
>>>>> + * @cb callback to be called for each device found
>>>>> + * @userdata arbitrary pointer to be passed to callback.
>>>>> *
>>>>> * If the device provided is a bridge, walk the subordinate bus, including
>>>>> * any bridged devices on buses under this bus. Call the provided callback
>>>>> @@ -164,8 +164,14 @@ static void pci_walk_bridge(struct pci_dev *bridge,
>>>>> int (*cb)(struct pci_dev *, void *),
>>>>> void *userdata)
>>>>> {
>>>>> + /*
>>>>> + * In a non-native case where there is no OS-visible reporting
>>>>> + * device the bridge will be NULL, i.e., no RCEC, no Downstream Port.
>>>>> + */
>>>>> if (bridge->subordinate)
>>>>> pci_walk_bus(bridge->subordinate, cb, userdata);
>>>>> + else if (bridge->rcec)
>>>>> + cb(bridge->rcec, userdata);
>>>>> else
>>>>> cb(bridge, userdata);
>>>>> }
>>>>> @@ -194,12 +200,6 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>>>> pci_dbg(bridge, "broadcast error_detected message\n");
>>>>> if (state == pci_channel_io_frozen) {
>>>>> pci_walk_bridge(bridge, report_frozen_detected, &status);
>>>>> - if (type == PCI_EXP_TYPE_RC_END) {
>>>>> - pci_warn(dev, "subordinate device reset not possible for RCiEP\n");
>>>>> - status = PCI_ERS_RESULT_NONE;
>>>>> - goto failed;
>>>>> - }
>>>>> -
>>>>> status = reset_subordinates(bridge);
>>>>> if (status != PCI_ERS_RESULT_RECOVERED) {
>>>>> pci_warn(bridge, "subordinate device reset failed\n");
>>>>> --
>>>>> 2.28.0
>>>>>
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
On Sat, 2020-10-17 at 09:14 -0700, Sean V Kelley wrote:
> On 16 Oct 2020, at 13:30, Bjorn Helgaas wrote:
>
> > [+to Jonathan]
> >
> > On Thu, Oct 15, 2020 at 05:11:10PM -0700, Sean V Kelley wrote:
> > > From: Qiuxu Zhuo <[email protected]>
> > >
> > > When attempting error recovery for an RCiEP associated with an
> > > RCEC
> > > device,
> > > there needs to be a way to update the Root Error Status, the
> > > Uncorrectable
> > > Error Status and the Uncorrectable Error Severity of the parent
> > > RCEC.
> > > In
> > > some non-native cases in which there is no OS-visible device
> > > associated
> > > with the RCiEP, there is nothing to act upon as the firmware is
> > > acting
> > > before the OS.
> > >
> > > Add handling for the linked RCEC in AER/ERR while taking into
> > > account
> > > non-native cases.
> > >
> > > Co-developed-by: Sean V Kelley <[email protected]>
> > > Link:
> > > https://lore.kernel.org/r/[email protected]
> > > Signed-off-by: Sean V Kelley <[email protected]>
> > > Signed-off-by: Qiuxu Zhuo <[email protected]>
> > > Signed-off-by: Bjorn Helgaas <[email protected]>
> > > Reviewed-by: Jonathan Cameron <[email protected]>
> > > ---
> > > drivers/pci/pcie/aer.c | 53
> > > ++++++++++++++++++++++++++++++------------
> > > drivers/pci/pcie/err.c | 20 ++++++++--------
> > > 2 files changed, 48 insertions(+), 25 deletions(-)
> > >
> > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > index 65dff5f3457a..083f69b67bfd 100644
> > > --- a/drivers/pci/pcie/aer.c
> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -1357,27 +1357,50 @@ static int aer_probe(struct pcie_device
> > > *dev)
> > > */
> > > static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
> > > {
> > > - int aer = dev->aer_cap;
> > > + int type = pci_pcie_type(dev);
> > > + struct pci_dev *root;
> > > + int aer = 0;
> > > + int rc = 0;
> > > u32 reg32;
> > > - int rc;
> > >
> > > + if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END)
> >
> > "type == PCI_EXP_TYPE_RC_END"
>
> Right, I merged your suggested changes which added the type. Will
> correct.
>
> >
> > > + /*
> > > + * The reset should only clear the Root Error
> > > Status
> > > + * of the RCEC. Only perform this for the
> > > + * native case, i.e., an RCEC is present.
> > > + */
> > > + root = dev->rcec;
> > > + else
> > > + root = dev;
> > >
> > > - /* Disable Root's interrupt in response to error messages
> > > */
> > > - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND,
> > > &reg32);
> > > - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
> > > - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND,
> > > reg32);
> > > + if (root)
> > > + aer = dev->aer_cap;
> > >
> > > - rc = pci_bus_error_reset(dev);
> > > - pci_info(dev, "Root Port link has been reset\n");
> > > + if (aer) {
> > > + /* Disable Root's interrupt in response to error
> > > messages */
> > > + pci_read_config_dword(root, aer +
> > > PCI_ERR_ROOT_COMMAND, &reg32);
> > > + reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
> > > + pci_write_config_dword(root, aer +
> > > PCI_ERR_ROOT_COMMAND, reg32);
> >
> > Not directly related to *this* patch, but my assumption was that in
> > the APEI case, the firmware should retain ownership of the AER
> > Capability, so the OS should not touch PCI_ERR_ROOT_COMMAND and
> > PCI_ERR_ROOT_STATUS.
> >
> > But this code appears to ignore that ownership. Jonathan, you must
> > have looked at this recently for 068c29a248b6 ("PCI/ERR: Clear PCIe
> > Device Status errors only if OS owns AER"). Do you have any
> > insight
> > about this?
> >
> > > - /* Clear Root Error Status */
> > > - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS,
> > > &reg32);
> > > - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_STATUS,
> > > reg32);
> > > + /* Clear Root Error Status */
> > > + pci_read_config_dword(root, aer +
> > > PCI_ERR_ROOT_STATUS, &reg32);
> > > + pci_write_config_dword(root, aer +
> > > PCI_ERR_ROOT_STATUS, reg32);
> > >
> > > - /* Enable Root Port's interrupt in response to error
> > > messages */
> > > - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND,
> > > &reg32);
> > > - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
> > > - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND,
> > > reg32);
> > > + /* Enable Root Port's interrupt in response to
> > > error messages */
> > > + pci_read_config_dword(root, aer +
> > > PCI_ERR_ROOT_COMMAND, &reg32);
> > > + reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
> > > + pci_write_config_dword(root, aer +
> > > PCI_ERR_ROOT_COMMAND, reg32);
> > > + }
> > > +
> > > + if ((type == PCI_EXP_TYPE_RC_EC) || (type ==
> > > PCI_EXP_TYPE_RC_END))
> > > {
> > > + if (pcie_has_flr(root)) {
> > > + rc = pcie_flr(root);
> > > + pci_info(dev, "has been reset (%d)\n",
> > > rc);
> > > + }
> > > + } else {
> > > + rc = pci_bus_error_reset(root);
> >
> > Don't we want "dev" for both the FLR and pci_bus_error_reset()? I
> > think "root == dev" except when dev is an RCiEP. When dev is an
> > RCiEP, "root" is the RCEC (if present), and we want to reset the
> > RCiEP, not the RCEC.
>
> Right, when I did the goto in the earlier incarnation, I always set
> root
> to dev at the start and in the merge it needs to be dev always except
> for the RC_END where RCEC exists. Will change without bringing back
> the
> goto…
>
> + struct pci_dev *root = dev;
>
> …
>
> +non_native:
> + if ((type == PCI_EXP_TYPE_RC_EC) || (type ==
> PCI_EXP_TYPE_RC_END)) {
> + rc = flr_on_rc(root);
> + pci_info(dev, "has been reset (%d)\n", rc);
> + } else {
> + rc = pci_bus_error_reset(root);
> + pci_info(dev, "Root Port link has been reset (%d)\n",
> rc);
> + }
>
>
> >
> > > + pci_info(dev, "Root Port link has been reset
> > > (%d)\n", rc);
> > > + }
> >
> > There are a couple changes here that I think should be split out.
> >
> > Based on my theory that when firmware retains control of AER, the
> > OS
> > should not touch PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS, and
> > any
> > updates to them would have to be done by firmware before we get
> > here,
> > I suggested reordering this:
> >
> > - clear PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
> > - do reset
> > - clear PCI_ERR_ROOT_STATUS (for APEI, presumably done by
> > firmware?)
> > - enable PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
> >
> > to this:
> >
> > - clear PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
> > - clear PCI_ERR_ROOT_STATUS
> > - enable PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
> > - do reset
> >
> > If my theory is correct, I think we should still reorder this, but:
> >
> > - It's a significant behavior change that deserves its own patch
> > so
> > we can document/bisect/revert.
> >
> > - I'm not sure why we clear the PCI_ERR_ROOT_COMMAND error
> > reporting
> > bits. In the new "clear COMMAND, clear STATUS, enable COMMAND"
> > order, it looks superfluous. There's no reason to disable
> > error
> > reporting while clearing the status bits.
> >
> > The current "clear, reset, enable" order suggests that the
> > reset
> > might cause errors that we should ignore. I don't know whether
> > that's the case or not. It dates from 6c2b374d7485 ("PCI-
> > Express
> > AER implemetation: AER core and aerdriver"), which doesn't
> > elaborate.
> >
> > - Should we also test for OS ownership of AER before touching
> > PCI_ERR_ROOT_STATUS?
> >
> > - If we remove the PCI_ERR_ROOT_COMMAND fiddling (and I
> > tentatively
> > think we *should* unless we can justify it), that would also
> > deserve its own patch. Possibly (1) remove
> > PCI_ERR_ROOT_COMMAND
> > fiddling, (2) reorder PCI_ERR_ROOT_STATUS clearing and reset,
> > (3)
> > test for OS ownership of AER (?), (4) the rest of this patch.
>
> You’ve highlighted some good questions.
Reading Ethan's reply and also thinking about separation from an _OSC
perspective, perhaps something like this could be done with a check in
aer_root_reset():
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 65dff5f3457a..70bf637042ff 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1357,27 +1357,46 @@ static int aer_probe(struct pcie_device *dev)
*/
static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
{
+ int type = pci_pcie_type(dev);
+ struct pci_dev *root = dev;
int aer = dev->aer_cap;
+ int rc = 0;
u32 reg32;
- int rc;
+ if (!pcie_aer_is_native(dev))
+ return PCI_ERS_RESULT_RECOVERED;
+
+ if (type == PCI_EXP_TYPE_RC_END)
+ /*
+ * The reset should only clear the Root Error Status
+ * of the RCEC. Only perform this for the
+ * native case, i.e., an RCEC is present.
+ */
+ root = dev->rcec;
/* Disable Root's interrupt in response to error messages */
- pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
+ pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
- pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
-
- rc = pci_bus_error_reset(dev);
- pci_info(dev, "Root Port link has been reset\n");
+ pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
/* Clear Root Error Status */
- pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, &reg32);
- pci_write_config_dword(dev, aer + PCI_ERR_ROOT_STATUS, reg32);
+ pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, &reg32);
+ pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32);
/* Enable Root Port's interrupt in response to error messages */
- pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, &reg32);
+ pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, &reg32);
reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
- pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND, reg32);
+ pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32);
+
+ if (type == PCI_EXP_TYPE_RC_EC || type == PCI_EXP_TYPE_RC_END) {
+ if (pcie_has_flr(dev)) {
+ rc = pcie_flr(dev);
+ pci_info(dev, "has been reset (%d)\n", rc);
+ }
+ } else {
+ rc = pci_bus_error_reset(dev);
+ pci_info(dev, "Root Port link has been reset (%d)\n", rc);
+ }
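To spell out who gets what in the draft above, here is a small userspace model of the decision flow only: which device has its Root Error registers touched, and which device is reset and how. It is a sketch under assumptions, not kernel code. `struct mock_dev`, `struct mock_plan` and `plan_root_reset()` are hypothetical names, and the model treats a missing RCEC as "skip the register writes", which the draft diff does not check for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device types, loosely mirroring the PCI_EXP_TYPE_* cases. */
enum mock_type { MOCK_ROOT_PORT, MOCK_RC_END, MOCK_RC_EC };
enum mock_reset { RESET_NONE, RESET_FLR, RESET_BUS };

struct mock_dev {
	enum mock_type type;
	bool aer_native;        /* models pcie_aer_is_native(dev) */
	bool has_flr;           /* models pcie_has_flr(dev) */
	struct mock_dev *rcec;  /* linked RCEC for an RCiEP, may be NULL */
};

struct mock_plan {
	const struct mock_dev *reg_target;   /* Root Error regs; NULL = none */
	const struct mock_dev *reset_target; /* device to reset; NULL = none */
	enum mock_reset reset;
};

static struct mock_plan plan_root_reset(const struct mock_dev *dev)
{
	struct mock_plan p = { NULL, NULL, RESET_NONE };

	/* Firmware owns AER: leave registers and reset alone. */
	if (!dev->aer_native)
		return p;

	/* For an RCiEP the Root Error registers live in the RCEC. */
	p.reg_target = (dev->type == MOCK_RC_END) ? dev->rcec : dev;

	/* The reset itself always targets dev, never the RCEC. */
	if (dev->type == MOCK_RC_EC || dev->type == MOCK_RC_END) {
		if (dev->has_flr) {
			p.reset_target = dev;
			p.reset = RESET_FLR;
		}
	} else {
		p.reset_target = dev;
		p.reset = RESET_BUS;
	}
	return p;
}
```

For an RCiEP with a linked RCEC this yields register writes on the RCEC and an FLR on the RCiEP, which matches the "reset the RCiEP, not the RCEC" comment from the review.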
>
> I think we should remove the fiddling until we have a clearer picture
> and put that into its own patch.
>
> Sean
> >
> > > return rc ? PCI_ERS_RESULT_DISCONNECT :
> > > PCI_ERS_RESULT_RECOVERED;
> > > }
> > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > > index 7883c9791562..cbc5abfe767b 100644
> > > --- a/drivers/pci/pcie/err.c
> > > +++ b/drivers/pci/pcie/err.c
> > > @@ -148,10 +148,10 @@ static int report_resume(struct pci_dev
> > > *dev,
> > > void *data)
> > >
> > > /**
> > > * pci_walk_bridge - walk bridges potentially AER affected
> > > - * @bridge: bridge which may be a Port, an RCEC with
> > > associated
> > > RCiEPs,
> > > - * or an RCiEP associated with an RCEC
> > > - * @cb: callback to be called for each device
> > > found
> > > - * @userdata: arbitrary pointer to be passed to callback
> > > + * @bridge bridge which may be an RCEC with associated RCiEPs,
> > > + * or a Port.
> > > + * @cb callback to be called for each device found
> > > + * @userdata arbitrary pointer to be passed to callback.
> > > *
> > > * If the device provided is a bridge, walk the subordinate bus,
> > > including
> > > * any bridged devices on buses under this bus. Call the
> > > provided
> > > callback
> > > @@ -164,8 +164,14 @@ static void pci_walk_bridge(struct pci_dev
> > > *bridge,
> > > int (*cb)(struct pci_dev *, void *),
> > > void *userdata)
> > > {
> > > + /*
> > > + * In a non-native case where there is no OS-visible
> > > reporting
> > > + * device the bridge will be NULL, i.e., no RCEC, no
> > > Downstream
> > > Port.
> > > + */
> > > if (bridge->subordinate)
> > > pci_walk_bus(bridge->subordinate, cb, userdata);
> > > + else if (bridge->rcec)
> > > + cb(bridge->rcec, userdata);
> > > else
> > > cb(bridge, userdata);
> > > }
> > > @@ -194,12 +200,6 @@ pci_ers_result_t pcie_do_recovery(struct
> > > pci_dev
> > > *dev,
> > > pci_dbg(bridge, "broadcast error_detected message\n");
> > > if (state == pci_channel_io_frozen) {
> > > pci_walk_bridge(bridge, report_frozen_detected,
> > > &status);
> > > - if (type == PCI_EXP_TYPE_RC_END) {
> > > - pci_warn(dev, "subordinate device reset
> > > not possible for
> > > RCiEP\n");
> > > - status = PCI_ERS_RESULT_NONE;
> > > - goto failed;
> > > - }
> > > -
> > > status = reset_subordinates(bridge);
> > > if (status != PCI_ERS_RESULT_RECOVERED) {
> > > pci_warn(bridge, "subordinate device
> > > reset failed\n");
> > > --
> > > 2.28.0
> > >
On 19 Oct 2020, at 11:59, Kuppuswamy, Sathyanarayanan wrote:
> On 10/19/20 11:31 AM, Sean V Kelley wrote:
>> On 19 Oct 2020, at 3:49, Ethan Zhao wrote:
>>
>>> On Sat, Oct 17, 2020 at 6:29 AM Bjorn Helgaas <[email protected]>
>>> wrote:
>>>>
>>>> [+cc Christoph, Ethan, Sinan, Keith; sorry should have cc'd you to
>>>> begin with since you're looking at this code too. Particularly
>>>> interested in your thoughts about whether we should be touching
>>>> PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS when we don't own
>>>> AER.]
>>>
>>> The aer_root_reset() function has an 'aer_' prefix, so it looks like a
>>> function of the AER driver that will only be called by the AER driver
>>> at runtime. If so, it's up to the owner (AER) to know whether OSPM has
>>> been granted control. But some of the functions and runtime services of
>>> the AER driver are actually also shared by the GHES driver (at runtime)
>>> and the DPC driver (at compile time?), etc., so it is confusing now.
>>>
>>> Shall we move some of the shared functions and runtime services to
>>> pci/err.c? If so, just like pcie_do_recovery(), it is shared by
>>> firmware-first mode GHES:
>>> ghes_probe()
>>> ->ghes_irq_func
>>> ->ghes_proc
>>> ->ghes_do_proc()
>>> ->ghes_handle_aer()
>>> ->aer_recover_work_func()
>>> ->pcie_do_recovery()
>>> ->aer_root_reset()
>>>
>>> and by the AER driver, etc. If AER wants to do an access that might
>>> conflict with firmware (or firmware in an embedded controller), it should
>>> check _OSC etc. first. Blindly writing PCI_ERR_ROOT_COMMAND or clearing
>>> PCI_ERR_ROOT_STATUS is *likely* to cause errors in the error handling
>>> itself.
>>
>> If _OSC negotiation ends up with FW being in control of AER, that
>> means OS is not in charge and should not be messing with AER I guess.
>> That seems appropriate to me then.
> But APEI-based notification is more of a hybrid approach (firmware
> first detects the error and then notifies the OS). Since the spec does
> not clarify what the OS is allowed to do, it's a bit of a gray area now.
> My point is that since firmware allows the OS to process the error by
> sending the notification, I think it's OK to clear the status once the
> error is handled.
I don’t disagree as long as AER is granted to the OS via _OSC. But if
it’s not granted explicitly via _OSC even in the APEI case where
it’s either an SCI or NMI and not an MSI, I’m unsure whether the OS
should be touching those registers.
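To make the two positions concrete, here is a tiny userspace sketch of both policies side by side. It only models the decision, not any real firmware interface. `struct mock_event` and both `may_touch_root_regs_*()` helpers are hypothetical names, and treating "notification came via SCI or NMI" as the APEI case is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* How the error reached the OS (hypothetical model). */
enum mock_notify { NOTIFY_MSI, NOTIFY_SCI, NOTIFY_NMI };

struct mock_event {
	bool osc_granted_aer;     /* AER control granted to the OS via _OSC */
	enum mock_notify source;  /* native MSI vs firmware SCI/NMI */
};

/* Strict reading: only an explicit _OSC grant lets the OS touch
 * PCI_ERR_ROOT_COMMAND / PCI_ERR_ROOT_STATUS. */
static bool may_touch_root_regs_strict(const struct mock_event *ev)
{
	return ev->osc_granted_aer;
}

/* Permissive reading: a firmware notification (SCI/NMI) is itself
 * taken as permission to clear status once the error is handled. */
static bool may_touch_root_regs_permissive(const struct mock_event *ev)
{
	return ev->osc_granted_aer || ev->source != NOTIFY_MSI;
}
```

The two readings only differ for an SCI/NMI notification without an _OSC grant, which is exactly the case I'm unsure about.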
Sean
>>
>> Thanks,
>>
>> Sean
>>
>>
>>
>>>
>>> Thanks,
>>> Ethan
>>>
>>>>
>>>> On Fri, Oct 16, 2020 at 03:30:37PM -0500, Bjorn Helgaas wrote:
>>>>> [+to Jonathan]
>>>>>
>>>>> On Thu, Oct 15, 2020 at 05:11:10PM -0700, Sean V Kelley wrote:
>>>>>> From: Qiuxu Zhuo <[email protected]>
>>>>>>
>>>>>> When attempting error recovery for an RCiEP associated with an
>>>>>> RCEC device,
>>>>>> there needs to be a way to update the Root Error Status, the
>>>>>> Uncorrectable
>>>>>> Error Status and the Uncorrectable Error Severity of the parent
>>>>>> RCEC. In
>>>>>> some non-native cases in which there is no OS-visible device
>>>>>> associated
>>>>>> with the RCiEP, there is nothing to act upon as the firmware is
>>>>>> acting
>>>>>> before the OS.
>>>>>>
>>>>>> Add handling for the linked RCEC in AER/ERR while taking into
>>>>>> account
>>>>>> non-native cases.
>>>>>>
>>>>>> Co-developed-by: Sean V Kelley <[email protected]>
>>>>>> Link:
>>>>>> https://lore.kernel.org/r/[email protected]
>>>>>> Signed-off-by: Sean V Kelley <[email protected]>
>>>>>> Signed-off-by: Qiuxu Zhuo <[email protected]>
>>>>>> Signed-off-by: Bjorn Helgaas <[email protected]>
>>>>>> Reviewed-by: Jonathan Cameron <[email protected]>
>>>>>> ---
>>>>>> drivers/pci/pcie/aer.c | 53
>>>>>> ++++++++++++++++++++++++++++++------------
>>>>>> drivers/pci/pcie/err.c | 20 ++++++++--------
>>>>>> 2 files changed, 48 insertions(+), 25 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>>>>>> index 65dff5f3457a..083f69b67bfd 100644
>>>>>> --- a/drivers/pci/pcie/aer.c
>>>>>> +++ b/drivers/pci/pcie/aer.c
>>>>>> @@ -1357,27 +1357,50 @@ static int aer_probe(struct pcie_device
>>>>>> *dev)
>>>>>> */
>>>>>> static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
>>>>>> {
>>>>>> - int aer = dev->aer_cap;
>>>>>> + int type = pci_pcie_type(dev);
>>>>>> + struct pci_dev *root;
>>>>>> + int aer = 0;
>>>>>> + int rc = 0;
>>>>>> u32 reg32;
>>>>>> - int rc;
>>>>>>
>>>>>> + if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END)
>>>>>
>>>>> "type == PCI_EXP_TYPE_RC_END"
>>>>>
>>>>>> + /*
>>>>>> + * The reset should only clear the Root
>>>>>> Error Status
>>>>>> + * of the RCEC. Only perform this for the
>>>>>> + * native case, i.e., an RCEC is present.
>>>>>> + */
>>>>>> + root = dev->rcec;
>>>>>> + else
>>>>>> + root = dev;
>>>>>>
>>>>>> - /* Disable Root's interrupt in response to error messages
>>>>>> */
>>>>>> - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND,
>>>>>> &reg32);
>>>>>> - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
>>>>>> - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND,
>>>>>> reg32);
>>>>>> + if (root)
>>>>>> + aer = dev->aer_cap;
>>>>>>
>>>>>> - rc = pci_bus_error_reset(dev);
>>>>>> - pci_info(dev, "Root Port link has been reset\n");
>>>>>> + if (aer) {
>>>>>> + /* Disable Root's interrupt in response to
>>>>>> error messages */
>>>>>> + pci_read_config_dword(root, aer +
>>>>>> PCI_ERR_ROOT_COMMAND, &reg32);
>>>>>> + reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
>>>>>> + pci_write_config_dword(root, aer +
>>>>>> PCI_ERR_ROOT_COMMAND, reg32);
>>>>>
>>>>> Not directly related to *this* patch, but my assumption was that
>>>>> in
>>>>> the APEI case, the firmware should retain ownership of the AER
>>>>> Capability, so the OS should not touch PCI_ERR_ROOT_COMMAND and
>>>>> PCI_ERR_ROOT_STATUS.
>>>>>
>>>>> But this code appears to ignore that ownership. Jonathan, you
>>>>> must
>>>>> have looked at this recently for 068c29a248b6 ("PCI/ERR: Clear
>>>>> PCIe
>>>>> Device Status errors only if OS owns AER"). Do you have any
>>>>> insight
>>>>> about this?
>>>>>
>>>>>> - /* Clear Root Error Status */
>>>>>> - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS,
>>>>>> &reg32);
>>>>>> - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_STATUS,
>>>>>> reg32);
>>>>>> + /* Clear Root Error Status */
>>>>>> + pci_read_config_dword(root, aer +
>>>>>> PCI_ERR_ROOT_STATUS, &reg32);
>>>>>> + pci_write_config_dword(root, aer +
>>>>>> PCI_ERR_ROOT_STATUS, reg32);
>>>>>>
>>>>>> - /* Enable Root Port's interrupt in response to error
>>>>>> messages */
>>>>>> - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND,
>>>>>> &reg32);
>>>>>> - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
>>>>>> - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND,
>>>>>> reg32);
>>>>>> + /* Enable Root Port's interrupt in response
>>>>>> to error messages */
>>>>>> + pci_read_config_dword(root, aer +
>>>>>> PCI_ERR_ROOT_COMMAND, &reg32);
>>>>>> + reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
>>>>>> + pci_write_config_dword(root, aer +
>>>>>> PCI_ERR_ROOT_COMMAND, reg32);
>>>>>> + }
>>>>>> +
>>>>>> + if ((type == PCI_EXP_TYPE_RC_EC) || (type ==
>>>>>> PCI_EXP_TYPE_RC_END)) {
>>>>>> + if (pcie_has_flr(root)) {
>>>>>> + rc = pcie_flr(root);
>>>>>> + pci_info(dev, "has been
>>>>>> reset (%d)\n", rc);
>>>>>> + }
>>>>>> + } else {
>>>>>> + rc = pci_bus_error_reset(root);
>>>>>
>>>>> Don't we want "dev" for both the FLR and pci_bus_error_reset()?
>>>>> I
>>>>> think "root == dev" except when dev is an RCiEP. When dev is an
>>>>> RCiEP, "root" is the RCEC (if present), and we want to reset the
>>>>> RCiEP, not the RCEC.
>>>>>
>>>>>> + pci_info(dev, "Root Port link has been
>>>>>> reset (%d)\n", rc);
>>>>>> + }
>>>>>
>>>>> There are a couple changes here that I think should be split out.
>>>>>
>>>>> Based on my theory that when firmware retains control of AER, the
>>>>> OS
>>>>> should not touch PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS, and
>>>>> any
>>>>> updates to them would have to be done by firmware before we get
>>>>> here,
>>>>> I suggested reordering this:
>>>>>
>>>>> - clear PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
>>>>> - do reset
>>>>> - clear PCI_ERR_ROOT_STATUS (for APEI, presumably done by
>>>>> firmware?)
>>>>> - enable PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
>>>>>
>>>>> to this:
>>>>>
>>>>> - clear PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
>>>>> - clear PCI_ERR_ROOT_STATUS
>>>>> - enable PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
>>>>> - do reset
>>>>>
>>>>> If my theory is correct, I think we should still reorder this,
>>>>> but:
>>>>>
>>>>> - It's a significant behavior change that deserves its own
>>>>> patch so
>>>>> we can document/bisect/revert.
>>>>>
>>>>> - I'm not sure why we clear the PCI_ERR_ROOT_COMMAND error
>>>>> reporting
>>>>> bits. In the new "clear COMMAND, clear STATUS, enable
>>>>> COMMAND"
>>>>> order, it looks superfluous. There's no reason to disable
>>>>> error
>>>>> reporting while clearing the status bits.
>>>>>
>>>>> The current "clear, reset, enable" order suggests that the
>>>>> reset
>>>>> might cause errors that we should ignore. I don't know
>>>>> whether
>>>>> that's the case or not. It dates from 6c2b374d7485
>>>>> ("PCI-Express
>>>>> AER implemetation: AER core and aerdriver"), which doesn't
>>>>> elaborate.
>>>>>
>>>>> - Should we also test for OS ownership of AER before touching
>>>>> PCI_ERR_ROOT_STATUS?
>>>>>
>>>>> - If we remove the PCI_ERR_ROOT_COMMAND fiddling (and I
>>>>> tentatively
>>>>> think we *should* unless we can justify it), that would
>>>>> also
>>>>> deserve its own patch. Possibly (1) remove
>>>>> PCI_ERR_ROOT_COMMAND
>>>>> fiddling, (2) reorder PCI_ERR_ROOT_STATUS clearing and
>>>>> reset, (3)
>>>>> test for OS ownership of AER (?), (4) the rest of this
>>>>> patch.
>>>>>
>>>>>> return rc ? PCI_ERS_RESULT_DISCONNECT :
>>>>>> PCI_ERS_RESULT_RECOVERED;
>>>>>> }
>>>>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>>>>>> index 7883c9791562..cbc5abfe767b 100644
>>>>>> --- a/drivers/pci/pcie/err.c
>>>>>> +++ b/drivers/pci/pcie/err.c
>>>>>> @@ -148,10 +148,10 @@ static int report_resume(struct pci_dev
>>>>>> *dev, void *data)
>>>>>>
>>>>>> /**
>>>>>> * pci_walk_bridge - walk bridges potentially AER affected
>>>>>> - * @bridge: bridge which may be a Port, an RCEC
>>>>>> with associated RCiEPs,
>>>>>> - * or an RCiEP associated with an RCEC
>>>>>> - * @cb: callback to be called for each
>>>>>> device found
>>>>>> - * @userdata: arbitrary pointer to be passed to
>>>>>> callback
>>>>>> + * @bridge bridge which may be an RCEC with associated
>>>>>> RCiEPs,
>>>>>> + * or a Port.
>>>>>> + * @cb callback to be called for each device found
>>>>>> + * @userdata arbitrary pointer to be passed to callback.
>>>>>> *
>>>>>> * If the device provided is a bridge, walk the subordinate
>>>>>> bus, including
>>>>>> * any bridged devices on buses under this bus. Call the
>>>>>> provided callback
>>>>>> @@ -164,8 +164,14 @@ static void pci_walk_bridge(struct pci_dev
>>>>>> *bridge,
>>>>>> int (*cb)(struct
>>>>>> pci_dev *, void *),
>>>>>> void *userdata)
>>>>>> {
>>>>>> + /*
>>>>>> + * In a non-native case where there is no OS-visible
>>>>>> reporting
>>>>>> + * device the bridge will be NULL, i.e., no RCEC, no
>>>>>> Downstream Port.
>>>>>> + */
>>>>>> if (bridge->subordinate)
>>>>>> pci_walk_bus(bridge->subordinate, cb,
>>>>>> userdata);
>>>>>> + else if (bridge->rcec)
>>>>>> + cb(bridge->rcec, userdata);
>>>>>> else
>>>>>> cb(bridge, userdata);
>>>>>> }
>>>>>> @@ -194,12 +200,6 @@ pci_ers_result_t pcie_do_recovery(struct
>>>>>> pci_dev *dev,
>>>>>> pci_dbg(bridge, "broadcast error_detected message\n");
>>>>>> if (state == pci_channel_io_frozen) {
>>>>>> pci_walk_bridge(bridge,
>>>>>> report_frozen_detected, &status);
>>>>>> - if (type == PCI_EXP_TYPE_RC_END) {
>>>>>> - pci_warn(dev, "subordinate
>>>>>> device reset not possible for RCiEP\n");
>>>>>> - status =
>>>>>> PCI_ERS_RESULT_NONE;
>>>>>> - goto failed;
>>>>>> - }
>>>>>> -
>>>>>> status = reset_subordinates(bridge);
>>>>>> if (status != PCI_ERS_RESULT_RECOVERED) {
>>>>>> pci_warn(bridge,
>>>>>> "subordinate device reset failed\n");
>>>>>> --
>>>>>> 2.28.0
>>>>>>
> --
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer
On Mon, 19 Oct 2020 13:50:17 -0700
Sean V Kelley <[email protected]> wrote:
> On 19 Oct 2020, at 11:59, Kuppuswamy, Sathyanarayanan wrote:
>
> > On 10/19/20 11:31 AM, Sean V Kelley wrote:
> >> On 19 Oct 2020, at 3:49, Ethan Zhao wrote:
> >>
> >>> On Sat, Oct 17, 2020 at 6:29 AM Bjorn Helgaas <[email protected]>
> >>> wrote:
> >>>>
> >>>> [+cc Christoph, Ethan, Sinan, Keith; sorry should have cc'd you to
> >>>> begin with since you're looking at this code too. Particularly
> >>>> interested in your thoughts about whether we should be touching
> >>>> PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS when we don't own
> >>>> AER.]
> >>>
> >>> The aer_root_reset() function has an 'aer_' prefix, so it looks like
> >>> a function of the AER driver that will only be called by the AER
> >>> driver at runtime. If so, it's up to the owner (AER) to know whether
> >>> OSPM was granted control at init. But some of the functions and
> >>> runtime services of the AER driver are actually also shared by the
> >>> GHES driver (at run time) and the DPC driver (at compile time?),
> >>> etc., so it is confusing now.
> >>>
> >>> Shall we move some of the shared functions and runtime services to
> >>> pci/err.c? pcie_do_recovery() is already like that: it's shared by
> >>> the firmware_first mode GHES path
> >>>
> >>> ghes_probe()
> >>> ->ghes_irq_func
> >>> ->ghes_proc
> >>> ->ghes_do_proc()
> >>> ->ghes_handle_aer()
> >>> ->aer_recover_work_func()
> >>> ->pcie_do_recovery()
> >>> ->aer_root_reset()
> >>>
> >>> and by the AER driver, etc. If AER wants to do some access that might
> >>> conflict with firmware (or firmware in an embedded controller), it
> >>> should check _OSC etc. first. Blindly writing PCI_ERR_ROOT_COMMAND or
> >>> clearing PCI_ERR_ROOT_STATUS is *likely* to cause errors in the error
> >>> handling itself.
> >>
> >> If _OSC negotiation ends up with FW being in control of AER, that
> >> means OS is not in charge and should not be messing with AER I guess.
> >> That seems appropriate to me then.
> > But APEI based notification is more like a hybrid approach (firmware
> > first detects the error and notifies OS). Since spec does not clarify
> > what OS is allowed to do, it's a bit of a gray area now. My point is,
> > since firmware allows OS to process the error by sending the
> > notification, I think it's OK to clear the status once the error is
> > handled.
>
> I don’t disagree as long as AER is granted to the OS via _OSC. But if
> it’s not granted explicitly via _OSC even in the APEI case where
> it’s either an SCI or NMI and not an MSI, I’m unsure whether the OS
> should be touching those registers.

My assumption was indeed this. If AER hasn't been granted to the OS,
it shouldn't be doing anything involving AER itself. It should constrain
itself to dealing with the End Points etc due to the need there for
driver interaction.

I fully agree with the comment that the specifications aren't entirely
clear on these cases.

It is possible that no one is currently generating the particular
combination of severity bits in the APEI path to actually hit this.
It requires the outer record to be marked recoverable, but the inner
part to be marked fatal. Kind of an odd mix.

In the GHES case, you get to this path by having a
Generic Error Status Block - recoverable (must not be fatal to avoid panic()
in APEI layer) containing one or more Generic Error Blocks, one of
which is fatal.
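
To make that combination concrete, here is a minimal userspace sketch
(not kernel code). The enum and the function name are hypothetical and
only model the rule described above: the outer status block has to be
recoverable, and at least one of the blocks inside it has to be fatal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical severity levels; these are not the kernel's definitions. */
enum sev { SEV_CORRECTED, SEV_RECOVERABLE, SEV_FATAL };

/*
 * Model of the rule described above: the fatal recovery path is reached
 * only when the outer status block is marked recoverable and at least
 * one inner block is marked fatal.
 */
static bool reaches_fatal_recovery(enum sev outer, const enum sev *inner,
				   size_t n)
{
	size_t i;

	assert(inner != NULL || n == 0);

	/* A fatal outer block is modelled as panic(): recovery never runs. */
	if (outer != SEV_RECOVERABLE)
		return false;

	for (i = 0; i < n; i++)
		if (inner[i] == SEV_FATAL)
			return true;

	return false;
}
```

Calling it with a recoverable outer block and an inner array that
contains a fatal entry is the only input for which this model returns
true.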

Response of our firmware team is that this particular combination is
probably crazy.

So good to clean up this corner, but it is probably not a problem
anyone has actually hit so far.

Jonathan
>
> Sean
>
> >>
> >> Thanks,
> >>
> >> Sean
> >>
> >>
> >>
> >>>
> >>> Thanks,
> >>> Ethan
> >>>
> >>>>
> >>>> On Fri, Oct 16, 2020 at 03:30:37PM -0500, Bjorn Helgaas wrote:
> >>>>> [+to Jonathan]
> >>>>>
> >>>>> On Thu, Oct 15, 2020 at 05:11:10PM -0700, Sean V Kelley wrote:
> >>>>>> From: Qiuxu Zhuo <[email protected]>
> >>>>>>
> >>>>>> When attempting error recovery for an RCiEP associated with an
> >>>>>> RCEC device,
> >>>>>> there needs to be a way to update the Root Error Status, the
> >>>>>> Uncorrectable
> >>>>>> Error Status and the Uncorrectable Error Severity of the parent
> >>>>>> RCEC. In
> >>>>>> some non-native cases in which there is no OS-visible device
> >>>>>> associated
> >>>>>> with the RCiEP, there is nothing to act upon as the firmware is
> >>>>>> acting
> >>>>>> before the OS.
> >>>>>>
> >>>>>> Add handling for the linked RCEC in AER/ERR while taking into
> >>>>>> account
> >>>>>> non-native cases.
> >>>>>>
> >>>>>> Co-developed-by: Sean V Kelley <[email protected]>
> >>>>>> Link:
> >>>>>> https://lore.kernel.org/r/[email protected]
> >>>>>> Signed-off-by: Sean V Kelley <[email protected]>
> >>>>>> Signed-off-by: Qiuxu Zhuo <[email protected]>
> >>>>>> Signed-off-by: Bjorn Helgaas <[email protected]>
> >>>>>> Reviewed-by: Jonathan Cameron <[email protected]>
> >>>>>> ---
> >>>>>> drivers/pci/pcie/aer.c | 53
> >>>>>> ++++++++++++++++++++++++++++++------------
> >>>>>> drivers/pci/pcie/err.c | 20 ++++++++--------
> >>>>>> 2 files changed, 48 insertions(+), 25 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> >>>>>> index 65dff5f3457a..083f69b67bfd 100644
> >>>>>> --- a/drivers/pci/pcie/aer.c
> >>>>>> +++ b/drivers/pci/pcie/aer.c
> >>>>>> @@ -1357,27 +1357,50 @@ static int aer_probe(struct pcie_device
> >>>>>> *dev)
> >>>>>> */
> >>>>>> static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
> >>>>>> {
> >>>>>> - int aer = dev->aer_cap;
> >>>>>> + int type = pci_pcie_type(dev);
> >>>>>> + struct pci_dev *root;
> >>>>>> + int aer = 0;
> >>>>>> + int rc = 0;
> >>>>>> u32 reg32;
> >>>>>> - int rc;
> >>>>>>
> >>>>>> + if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END)
> >>>>>
> >>>>> "type == PCI_EXP_TYPE_RC_END"
> >>>>>
> >>>>>> + /*
> >>>>>> + * The reset should only clear the Root
> >>>>>> Error Status
> >>>>>> + * of the RCEC. Only perform this for the
> >>>>>> + * native case, i.e., an RCEC is present.
> >>>>>> + */
> >>>>>> + root = dev->rcec;
> >>>>>> + else
> >>>>>> + root = dev;
> >>>>>>
> >>>>>> - /* Disable Root's interrupt in response to error messages
> >>>>>> */
> >>>>>> - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND,
> >>>>>> &reg32);
> >>>>>> - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
> >>>>>> - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND,
> >>>>>> reg32);
> >>>>>> + if (root)
> >>>>>> + aer = dev->aer_cap;
> >>>>>>
> >>>>>> - rc = pci_bus_error_reset(dev);
> >>>>>> - pci_info(dev, "Root Port link has been reset\n");
> >>>>>> + if (aer) {
> >>>>>> + /* Disable Root's interrupt in response to
> >>>>>> error messages */
> >>>>>> + pci_read_config_dword(root, aer +
> >>>>>> PCI_ERR_ROOT_COMMAND, &reg32);
> >>>>>> + reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
> >>>>>> + pci_write_config_dword(root, aer +
> >>>>>> PCI_ERR_ROOT_COMMAND, reg32);
> >>>>>
> >>>>> Not directly related to *this* patch, but my assumption was that
> >>>>> in
> >>>>> the APEI case, the firmware should retain ownership of the AER
> >>>>> Capability, so the OS should not touch PCI_ERR_ROOT_COMMAND and
> >>>>> PCI_ERR_ROOT_STATUS.
> >>>>>
> >>>>> But this code appears to ignore that ownership. Jonathan, you
> >>>>> must
> >>>>> have looked at this recently for 068c29a248b6 ("PCI/ERR: Clear
> >>>>> PCIe
> >>>>> Device Status errors only if OS owns AER"). Do you have any
> >>>>> insight
> >>>>> about this?
> >>>>>
> >>>>>> - /* Clear Root Error Status */
> >>>>>> - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_STATUS,
> >>>>>> &reg32);
> >>>>>> - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_STATUS,
> >>>>>> reg32);
> >>>>>> + /* Clear Root Error Status */
> >>>>>> + pci_read_config_dword(root, aer +
> >>>>>> PCI_ERR_ROOT_STATUS, &reg32);
> >>>>>> + pci_write_config_dword(root, aer +
> >>>>>> PCI_ERR_ROOT_STATUS, reg32);
> >>>>>>
> >>>>>> - /* Enable Root Port's interrupt in response to error
> >>>>>> messages */
> >>>>>> - pci_read_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND,
> >>>>>> &reg32);
> >>>>>> - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
> >>>>>> - pci_write_config_dword(dev, aer + PCI_ERR_ROOT_COMMAND,
> >>>>>> reg32);
> >>>>>> + /* Enable Root Port's interrupt in response
> >>>>>> to error messages */
> >>>>>> + pci_read_config_dword(root, aer +
> >>>>>> PCI_ERR_ROOT_COMMAND, &reg32);
> >>>>>> + reg32 |= ROOT_PORT_INTR_ON_MESG_MASK;
> >>>>>> + pci_write_config_dword(root, aer +
> >>>>>> PCI_ERR_ROOT_COMMAND, reg32);
> >>>>>> + }
> >>>>>> +
> >>>>>> + if ((type == PCI_EXP_TYPE_RC_EC) || (type ==
> >>>>>> PCI_EXP_TYPE_RC_END)) {
> >>>>>> + if (pcie_has_flr(root)) {
> >>>>>> + rc = pcie_flr(root);
> >>>>>> + pci_info(dev, "has been
> >>>>>> reset (%d)\n", rc);
> >>>>>> + }
> >>>>>> + } else {
> >>>>>> + rc = pci_bus_error_reset(root);
> >>>>>
> >>>>> Don't we want "dev" for both the FLR and pci_bus_error_reset()?
> >>>>> I
> >>>>> think "root == dev" except when dev is an RCiEP. When dev is an
> >>>>> RCiEP, "root" is the RCEC (if present), and we want to reset the
> >>>>> RCiEP, not the RCEC.
> >>>>>
> >>>>>> + pci_info(dev, "Root Port link has been
> >>>>>> reset (%d)\n", rc);
> >>>>>> + }
> >>>>>
> >>>>> There are a couple changes here that I think should be split out.
> >>>>>
> >>>>> Based on my theory that when firmware retains control of AER, the
> >>>>> OS
> >>>>> should not touch PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS, and
> >>>>> any
> >>>>> updates to them would have to be done by firmware before we get
> >>>>> here,
> >>>>> I suggested reordering this:
> >>>>>
> >>>>> - clear PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
> >>>>> - do reset
> >>>>> - clear PCI_ERR_ROOT_STATUS (for APEI, presumably done by
> >>>>> firmware?)
> >>>>> - enable PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
> >>>>>
> >>>>> to this:
> >>>>>
> >>>>> - clear PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
> >>>>> - clear PCI_ERR_ROOT_STATUS
> >>>>> - enable PCI_ERR_ROOT_COMMAND ROOT_PORT_INTR_ON_MESG_MASK
> >>>>> - do reset
> >>>>>
> >>>>> If my theory is correct, I think we should still reorder this,
> >>>>> but:
> >>>>>
> >>>>> - It's a significant behavior change that deserves its own
> >>>>> patch so
> >>>>> we can document/bisect/revert.
> >>>>>
> >>>>> - I'm not sure why we clear the PCI_ERR_ROOT_COMMAND error
> >>>>> reporting
> >>>>> bits. In the new "clear COMMAND, clear STATUS, enable
> >>>>> COMMAND"
> >>>>> order, it looks superfluous. There's no reason to disable
> >>>>> error
> >>>>> reporting while clearing the status bits.
> >>>>>
> >>>>> The current "clear, reset, enable" order suggests that the
> >>>>> reset
> >>>>> might cause errors that we should ignore. I don't know
> >>>>> whether
> >>>>> that's the case or not. It dates from 6c2b374d7485
> >>>>> ("PCI-Express
> >>>>> AER implemetation: AER core and aerdriver"), which doesn't
> >>>>> elaborate.
> >>>>>
> >>>>> - Should we also test for OS ownership of AER before touching
> >>>>> PCI_ERR_ROOT_STATUS?
> >>>>>
> >>>>> - If we remove the PCI_ERR_ROOT_COMMAND fiddling (and I
> >>>>> tentatively
> >>>>> think we *should* unless we can justify it), that would
> >>>>> also
> >>>>> deserve its own patch. Possibly (1) remove
> >>>>> PCI_ERR_ROOT_COMMAND
> >>>>> fiddling, (2) reorder PCI_ERR_ROOT_STATUS clearing and
> >>>>> reset, (3)
> >>>>> test for OS ownership of AER (?), (4) the rest of this
> >>>>> patch.
> >>>>>
> >>>>>> return rc ? PCI_ERS_RESULT_DISCONNECT :
> >>>>>> PCI_ERS_RESULT_RECOVERED;
> >>>>>> }
> >>>>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> >>>>>> index 7883c9791562..cbc5abfe767b 100644
> >>>>>> --- a/drivers/pci/pcie/err.c
> >>>>>> +++ b/drivers/pci/pcie/err.c
> >>>>>> @@ -148,10 +148,10 @@ static int report_resume(struct pci_dev
> >>>>>> *dev, void *data)
> >>>>>>
> >>>>>> /**
> >>>>>> * pci_walk_bridge - walk bridges potentially AER affected
> >>>>>> - * @bridge: bridge which may be a Port, an RCEC
> >>>>>> with associated RCiEPs,
> >>>>>> - * or an RCiEP associated with an RCEC
> >>>>>> - * @cb: callback to be called for each
> >>>>>> device found
> >>>>>> - * @userdata: arbitrary pointer to be passed to
> >>>>>> callback
> >>>>>> + * @bridge bridge which may be an RCEC with associated
> >>>>>> RCiEPs,
> >>>>>> + * or a Port.
> >>>>>> + * @cb callback to be called for each device found
> >>>>>> + * @userdata arbitrary pointer to be passed to callback.
> >>>>>> *
> >>>>>> * If the device provided is a bridge, walk the subordinate
> >>>>>> bus, including
> >>>>>> * any bridged devices on buses under this bus. Call the
> >>>>>> provided callback
> >>>>>> @@ -164,8 +164,14 @@ static void pci_walk_bridge(struct pci_dev
> >>>>>> *bridge,
> >>>>>> int (*cb)(struct
> >>>>>> pci_dev *, void *),
> >>>>>> void *userdata)
> >>>>>> {
> >>>>>> + /*
> >>>>>> + * In a non-native case where there is no OS-visible
> >>>>>> reporting
> >>>>>> + * device the bridge will be NULL, i.e., no RCEC, no
> >>>>>> Downstream Port.
> >>>>>> + */
> >>>>>> if (bridge->subordinate)
> >>>>>> pci_walk_bus(bridge->subordinate, cb,
> >>>>>> userdata);
> >>>>>> + else if (bridge->rcec)
> >>>>>> + cb(bridge->rcec, userdata);
> >>>>>> else
> >>>>>> cb(bridge, userdata);
> >>>>>> }
> >>>>>> @@ -194,12 +200,6 @@ pci_ers_result_t pcie_do_recovery(struct
> >>>>>> pci_dev *dev,
> >>>>>> pci_dbg(bridge, "broadcast error_detected message\n");
> >>>>>> if (state == pci_channel_io_frozen) {
> >>>>>> pci_walk_bridge(bridge,
> >>>>>> report_frozen_detected, &status);
> >>>>>> - if (type == PCI_EXP_TYPE_RC_END) {
> >>>>>> - pci_warn(dev, "subordinate
> >>>>>> device reset not possible for RCiEP\n");
> >>>>>> - status =
> >>>>>> PCI_ERS_RESULT_NONE;
> >>>>>> - goto failed;
> >>>>>> - }
> >>>>>> -
> >>>>>> status = reset_subordinates(bridge);
> >>>>>> if (status != PCI_ERS_RESULT_RECOVERED) {
> >>>>>> pci_warn(bridge,
> >>>>>> "subordinate device reset failed\n");
> >>>>>> --
> >>>>>> 2.28.0
> >>>>>>
> > --
> > Sathyanarayanan Kuppuswamy
> > Linux Kernel Developer
On Tue, Oct 20, 2020 at 01:59:20PM +0100, Jonathan Cameron wrote:
> On Mon, 19 Oct 2020 13:50:17 -0700
> Sean V Kelley <[email protected]> wrote:
> > On 19 Oct 2020, at 11:59, Kuppuswamy, Sathyanarayanan wrote:
> > > On 10/19/20 11:31 AM, Sean V Kelley wrote:
> > >> On 19 Oct 2020, at 3:49, Ethan Zhao wrote:
> > >>> On Sat, Oct 17, 2020 at 6:29 AM Bjorn Helgaas <[email protected]>
> > >>> wrote:
> > >>>>
> > >>>> [+cc Christoph, Ethan, Sinan, Keith; sorry should have cc'd you to
> > >>>> begin with since you're looking at this code too. Particularly
> > >>>> interested in your thoughts about whether we should be touching
> > >>>> PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS when we don't own
> > >>>> AER.]
> > >>>
> > >>> The aer_root_reset() function has an 'aer_' prefix, so it looks like
> > >>> a function of the AER driver that will only be called by the AER
> > >>> driver at runtime. If so, it's up to the owner (AER) to know whether
> > >>> OSPM was granted control at init. But some of the functions and
> > >>> runtime services of the AER driver are actually also shared by the
> > >>> GHES driver (at run time) and the DPC driver (at compile time?),
> > >>> etc., so it is confusing now.
> > >>>
> > >>> Shall we move some of the shared functions and runtime services to
> > >>> pci/err.c? pcie_do_recovery() is already like that: it's shared by
> > >>> the firmware_first mode GHES path
> > >>>
> > >>> ghes_probe()
> > >>> ->ghes_irq_func
> > >>> ->ghes_proc
> > >>> ->ghes_do_proc()
> > >>> ->ghes_handle_aer()
> > >>> ->aer_recover_work_func()
> > >>> ->pcie_do_recovery()
> > >>> ->aer_root_reset()
> > >>>
> > >>> and by the AER driver, etc. If AER wants to do some access that might
> > >>> conflict with firmware (or firmware in an embedded controller), it
> > >>> should check _OSC etc. first. Blindly writing PCI_ERR_ROOT_COMMAND or
> > >>> clearing PCI_ERR_ROOT_STATUS is *likely* to cause errors in the error
> > >>> handling itself.
> > >>
> > >> If _OSC negotiation ends up with FW being in control of AER, that
> > >> means OS is not in charge and should not be messing with AER I guess.
> > >> That seems appropriate to me then.
> > >
> > > But APEI based notification is more like a hybrid approach (firmware
> > > first detects the error and notifies OS). Since spec does not clarify
> > > what OS is allowed to do, it's a bit of a gray area now. My point is,
> > > since firmware allows OS to process the error by sending the
> > > notification, I think it's OK to clear the status once the error is
> > > handled.
> >
> > I don’t disagree as long as AER is granted to the OS via _OSC. But if
> > it’s not granted explicitly via _OSC even in the APEI case where
> > it’s either an SCI or NMI and not an MSI, I’m unsure whether the OS
> > should be touching those registers.
>
> My assumption was indeed this. If AER hasn't been granted to the OS,
> it shouldn't be doing anything involving AER itself. It should constrain
> itself to dealing with the End Points etc due to the need there for
> driver interaction.
>
> I fully agree with the comment that the specifications aren't entirely
> clear on these cases.
>
> It is possible that no one is currently generating the particular
> combination of severity bits in the APEI path to actually hit this.
> It requires the outer record to be marked recoverable, but the inner
> part to be marked fatal. Kind of an odd mix.
>
> In the GHES case, you get to this path by having a
> Generic Error Status Block - recoverable (must not be fatal to avoid panic()
> in APEI layer) containing one or more Generic Error Blocks, one of
> which is fatal.
>
> Response of our firmware team is that this particular combination is
> probably crazy.

Thanks a lot for researching this and outlining these details. I
hadn't worked all that out. It makes me a lot less worried about
breaking something if we tweak this.

> So good to clean up this corner, but it is probably not a problem
> anyone has actually hit so far.

IMO if the OS has not been granted AER control via _OSC, it shouldn't
be touching these registers. I don't want to speculate based on what
the intent might have been with APEI. If the intent was that the OS
should write those registers even if it doesn't own the AER
capability, that could easily have been put in the spec.

IIUC APEI enables cases where the device with Root Error Command and
Root Error Status registers (or device-specific equivalents) is not
even visible to the OS. In those cases the OS *cannot* fiddle with
them.

I assume APEI tells us about an error with an Endpoint. I do not
think we should be groping around for an upstream device and poking
things in it. I'm not even 100% comfortable with the fact that we
find an upstream device and reset all devices below it. Maybe that's
OK as a "bigger hammer," but I don't know about it being the default
approach.

I was hoping to get this series merged for v5.10, but I don't think
it's really practical to merge it at this stage with four days left in
the merge window. When we do merge this, I propose tightening up
aer_root_reset() along these lines as preliminary patches, since this
has nothing to do with RCEC/RCiEP, and if they *do* cause any issues,
I don't want these patches to be implicated.
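
For reference, one possible reading of the tightened-up sequence,
written as a tiny userspace model rather than kernel code. The enum and
function name are hypothetical. It assumes all three ideas floated
earlier in the thread are applied: no PCI_ERR_ROOT_COMMAND fiddling,
PCI_ERR_ROOT_STATUS cleared before the reset, and the status clear
skipped when the OS does not own AER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step labels for the modelled sequence. */
enum step { STEP_CLEAR_ROOT_STATUS, STEP_RESET };

/*
 * Fill 'out' with the modelled sequence and return the number of steps.
 * 'out' must have room for two entries.
 */
static size_t model_root_reset_steps(bool os_owns_aer, enum step *out)
{
	size_t n = 0;

	assert(out != NULL);

	if (os_owns_aer)
		out[n++] = STEP_CLEAR_ROOT_STATUS;	/* status first, only if owned */

	out[n++] = STEP_RESET;	/* the reset itself happens either way */
	return n;
}
```

This is only meant to pin down the ordering; whether the reset should
happen at all in the non-native case is a separate question.
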
Bjorn