This started as a nudge from Keith, who pointed out that it doesn't make sense
to disable AER services when only one device has a FIRMWARE_FIRST HEST.
I won't re-phrase the points in the original patch [1]. The patch started a
long discussion in the ACPI Software Working Group (ASWG). The nearly unanimous
conclusion is that my original interpretation is correct.
I'd like to quote one of the tables that was produced as part of that
conversation:
(_OSC AER Control, HEST AER Structure FFS) = (0, 0)
* OSPM is prevented from writing to the PCI Express AER registers.
* OSPM has no guidance on how AER errors are being handled – but it
does know that it is not in control of AER registers. PCI-e errors
that make it to the OS (via NMI, etc) would be treated as spurious
since access to the AER registers isn’t allowed for proper sourcing.
(_OSC AER Control, HEST AER Structure FFS) = (0, 1)
* OSPM is prevented from writing to the PCI Express AER registers.
* OSPM is being given guidance that Firmware is handling AER errors and
those interrupts are routed to the platform. Firmware may pass along
error information via GHES
(_OSC AER Control, HEST AER Structure FFS) = (0, Does not exist)
* OSPM is prevented from writing to the PCI Express AER registers.
* OSPM has no guidance on how AER errors are being handled – but it
does know that it is not in control of AER registers. PCI-e errors
that make it to the OS (via NMI, etc) would be treated as spurious
since access to the AER registers isn’t allowed for proper sourcing.
(_OSC AER Control, HEST AER Structure FFS) = (1, 0)
* OSPM is in control of writing to the PCI Express AER registers.
* OSPM is being given guidance that AER errors will interrupt the OS
directly and that the OS is expected to handle all AER capability
structure read/clears for the devices with this attribute (or all if
the Global Bit is set.)
(_OSC AER Control, HEST AER Structure FFS) = (1, 1)
* OSPM is in control of writing to the PCI Express AER registers.
* OSPM is being given guidance that although OS is in control of AER
read/writes – the actual interrupt is being routed to the platform
first.
* Subsequent fields with masks/enables should be performed by the OS
during initialization on behalf of firmware. These are to be honoured
in this mode because with FF, the firmware needs to be able to handle
the errors it expects and not be given errors it was not expecting to
handle.
* Firmware may pass along error information via GHES, or generate an OS
interrupt and allow the OS to interrogate AER status directly via the
AER capability structures.
(_OSC AER Control, HEST AER Structure FFS) = (0, Does not exist)
* OSPM is in control of writing to the PCI Express AER registers.
* OSPM has no guidance from the platform and is in complete control of
AER error handling.
There may be one caveat. Someone mentioned in the original discussions that
there may exist machines which make the assumption that HEST is authoritative,
but did not identify any such machine. We should keep in mind that they may
require a quirk.
Alex
[1] https://lkml.org/lkml/2018/11/16/202
Changes since v1:
* Started 6-month conversation in ASWG
* Re-phrased commit message to reflect some of the points in ASWG discussion
Alexandru Gagniuc (2):
PCI/AER: Do not use APEI/HEST to disable AER services globally
PCI/AER: Determine AER ownership based on _OSC instead of HEST
drivers/acpi/pci_root.c | 9 +----
drivers/pci/pcie/aer.c | 82 ++--------------------------------------
include/linux/pci-acpi.h | 6 ---
3 files changed, 5 insertions(+), 92 deletions(-)
--
2.19.2
HEST is used to describe the meaning of errors received as part of ACPI
Platform Error Interfaces (APEI), however the correct way to determine
AER ownership is the _OSC method.
The ACPI spec allows _OSC and HEST to say things that might not make
sense on a specific hardware implementation.
For example _OSC can say "OS has control" on the root bus, while HEST
says "This PCIe device is firmware-first". This is fine when the
platform can differentiate error sources at the root complex. On x86
platforms, AER errors can be routed to either to an MSI vector
(native AER), or to SMM (firmware-first). As this is done at the root
complex level, the example above would not make sense.
Notice that in neither case is it correct to go to HEST to determine
AER ownership. This is the conclusion of a six-months discussion in
the ACPI Software Working Group.
pci_dev->__aer_firmware_first is used to prevent modification of AER
registers when firmware owns AER. This is synonymous with the AER
ownership negotiated during _OSC. Thus _OSC is the correct way to
use to set this flag, not HEST.
Signed-off-by: Alexandru Gagniuc <[email protected]>
---
drivers/pci/pcie/aer.c | 57 ++----------------------------------------
1 file changed, 2 insertions(+), 55 deletions(-)
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index d029979e61f6..08cee30b3ef3 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -237,66 +237,13 @@ static inline bool hest_match_type(struct acpi_hest_header *hest_hdr,
return false;
}
-struct aer_hest_parse_info {
- struct pci_dev *pci_dev;
- int firmware_first;
-};
-
-static int hest_source_is_pcie_aer(struct acpi_hest_header *hest_hdr)
-{
- if (hest_hdr->type == ACPI_HEST_TYPE_AER_ROOT_PORT ||
- hest_hdr->type == ACPI_HEST_TYPE_AER_ENDPOINT ||
- hest_hdr->type == ACPI_HEST_TYPE_AER_BRIDGE)
- return 1;
- return 0;
-}
-
-static int aer_hest_parse(struct acpi_hest_header *hest_hdr, void *data)
-{
- struct aer_hest_parse_info *info = data;
- struct acpi_hest_aer_common *p;
- int ff;
-
- if (!hest_source_is_pcie_aer(hest_hdr))
- return 0;
-
- p = (struct acpi_hest_aer_common *)(hest_hdr + 1);
- ff = !!(p->flags & ACPI_HEST_FIRMWARE_FIRST);
-
- /*
- * If no specific device is supplied, determine whether
- * FIRMWARE_FIRST is set for *any* PCIe device.
- */
- if (!info->pci_dev) {
- info->firmware_first |= ff;
- return 0;
- }
- /* Otherwise, check the specific device */
- if (p->flags & ACPI_HEST_GLOBAL) {
- if (hest_match_type(hest_hdr, info->pci_dev))
- info->firmware_first = ff;
- } else
- if (hest_match_pci(p, info->pci_dev))
- info->firmware_first = ff;
-
- return 0;
-}
static void aer_set_firmware_first(struct pci_dev *pci_dev)
{
- int rc;
- struct aer_hest_parse_info info = {
- .pci_dev = pci_dev,
- .firmware_first = 0,
- };
+ struct pci_host_bridge *host = pci_find_host_bridge(pci_dev->bus);
- rc = apei_hest_parse(aer_hest_parse, &info);
-
- if (rc)
- pci_dev->__aer_firmware_first = 0;
- else
- pci_dev->__aer_firmware_first = info.firmware_first;
+ pci_dev->__aer_firmware_first = !host->native_aer;
pci_dev->__aer_firmware_first_valid = 1;
}
--
2.19.2
As part of the ACPI Platform Error Interfaces (APEI), the HEST table
describes the meaning of errors sources. Although HEST is related to
ownership of AER, the gatekeeper for AER ownership is the _OSC method.
HEST can identify error sources as firmware-first with granularity
ranging from device-level to global. It's not uncommon for HEST to
say "all AER errors sent via of APEI are firmware-first" by setting
the firmware-first and global bits. It's still allowable to do this
and grant the OS AER control over part of the PCIe topology.
Because there is quite some flexibility in how HEST and _OSC can
interact, it is wrong to assume that global firmware-first implies
_OSC will never grant AER control. If we don't ask for control, we are
not going to get it, and we may have entire parts of the PCIe tree
that do not report errors. Thus, ask for AER ownership, regardless of
HEST.
Signed-off-by: Alexandru Gagniuc <[email protected]>
---
drivers/acpi/pci_root.c | 9 ++-------
drivers/pci/pcie/aer.c | 25 +------------------------
include/linux/pci-acpi.h | 6 ------
3 files changed, 3 insertions(+), 37 deletions(-)
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 707aafc7c2aa..32b2053bb0fa 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -491,13 +491,8 @@ static void negotiate_os_control(struct acpi_pci_root *root, int *no_aspm,
if (IS_ENABLED(CONFIG_HOTPLUG_PCI_SHPC))
control |= OSC_PCI_SHPC_NATIVE_HP_CONTROL;
- if (pci_aer_available()) {
- if (aer_acpi_firmware_first())
- dev_info(&device->dev,
- "PCIe AER handled by firmware\n");
- else
- control |= OSC_PCI_EXPRESS_AER_CONTROL;
- }
+ if (pci_aer_available())
+ control |= OSC_PCI_EXPRESS_AER_CONTROL;
requested = control;
status = acpi_pci_osc_control_set(handle, &control,
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index f8fc2114ad39..d029979e61f6 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -313,29 +313,6 @@ int pcie_aer_get_firmware_first(struct pci_dev *dev)
return dev->__aer_firmware_first;
}
-static bool aer_firmware_first;
-
-/**
- * aer_acpi_firmware_first - Check if APEI should control AER.
- */
-bool aer_acpi_firmware_first(void)
-{
- static bool parsed = false;
- struct aer_hest_parse_info info = {
- .pci_dev = NULL, /* Check all PCIe devices */
- .firmware_first = 0,
- };
-
- if (pcie_ports_native)
- return false;
-
- if (!parsed) {
- apei_hest_parse(aer_hest_parse, &info);
- aer_firmware_first = info.firmware_first;
- parsed = true;
- }
- return aer_firmware_first;
-}
#endif
#define PCI_EXP_AER_FLAGS (PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
@@ -1450,7 +1427,7 @@ static struct pcie_port_service_driver aerdriver = {
*/
int __init pcie_aer_init(void)
{
- if (!pci_aer_available() || aer_acpi_firmware_first())
+ if (!pci_aer_available())
return -ENXIO;
return pcie_port_service_register(&aerdriver);
}
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 8082b612f561..2e9c0b973eba 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -116,10 +116,4 @@ static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
#endif /* CONFIG_ACPI */
-#ifdef CONFIG_ACPI_APEI
-extern bool aer_acpi_firmware_first(void);
-#else
-static inline bool aer_acpi_firmware_first(void) { return false; }
-#endif
-
#endif /* _PCI_ACPI_H_ */
--
2.19.2