2019-12-09 16:07:30

by Andre Przywara

[permalink] [raw]
Subject: [PATCH] pcie: Add quirk for the Arm Neoverse N1SDP platform

From: Deepak Pandey <[email protected]>

The Arm N1SDP SoC suffers from some PCIe integration issues, most
prominently config space accesses to not existing BDFs being answered
with a bus abort, resulting in an SError.
To mitigate this, the firmware scans the bus before boot (catching the
SErrors) and creates a table with valid BDFs, which acts as a filter for
Linux' config space accesses.

Add code consulting the table as an ACPI PCIe quirk, also register the
corresponding device tree based description of the host controller.
Also fix the other two minor issues on the way, namely not being fully
ECAM compliant and config space accesses being restricted to 32-bit
accesses only.

This allows the Arm Neoverse N1SDP board to boot Linux without crashing
and to access *any* devices (there are no platform devices except UART).

Signed-off-by: Deepak Pandey <[email protected]>
[Sudipto: extend to cover the CCIX root port as well]
Signed-off-by: Sudipto Paul <[email protected]>
[Andre: fix coding style issues, rewrite some parts, add DT support]
Signed-off-by: Andre Przywara <[email protected]>
---
arch/arm64/configs/defconfig | 1 +
drivers/acpi/pci_mcfg.c | 7 +
drivers/pci/controller/Kconfig | 11 ++
drivers/pci/controller/Makefile | 1 +
drivers/pci/controller/pcie-n1sdp.c | 196 ++++++++++++++++++++++++++++
include/linux/pci-ecam.h | 2 +
6 files changed, 218 insertions(+)
create mode 100644 drivers/pci/controller/pcie-n1sdp.c

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 6a83ba2aea3e..58124ef5070b 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -177,6 +177,7 @@ CONFIG_NET_9P=y
CONFIG_NET_9P_VIRTIO=y
CONFIG_PCI=y
CONFIG_PCIEPORTBUS=y
+CONFIG_PCI_QUIRKS=y
CONFIG_PCI_IOV=y
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_ACPI=y
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
index 6b347d9920cc..7a2b41b9ab57 100644
--- a/drivers/acpi/pci_mcfg.c
+++ b/drivers/acpi/pci_mcfg.c
@@ -142,6 +142,13 @@ static struct mcfg_fixup mcfg_quirks[] = {
XGENE_V2_ECAM_MCFG(4, 0),
XGENE_V2_ECAM_MCFG(4, 1),
XGENE_V2_ECAM_MCFG(4, 2),
+
+#define N1SDP_ECAM_MCFG(rev, seg, ops) \
+ {"ARMLTD", "ARMN1SDP", rev, seg, MCFG_BUS_ANY, ops }
+
+ /* N1SDP SoC with v1 PCIe controller */
+ N1SDP_ECAM_MCFG(0x20181101, 0, &pci_n1sdp_pcie_ecam_ops),
+ N1SDP_ECAM_MCFG(0x20181101, 1, &pci_n1sdp_ccix_ecam_ops),
};

static char mcfg_oem_id[ACPI_OEM_ID_SIZE];
diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig
index c77069c8ee5d..45700d32f02e 100644
--- a/drivers/pci/controller/Kconfig
+++ b/drivers/pci/controller/Kconfig
@@ -37,6 +37,17 @@ config PCI_FTPCI100
depends on OF
default ARCH_GEMINI

+config PCIE_HOST_N1SDP_ECAM
+ bool "ARM N1SDP PCIe Controller"
+ depends on ARM64
+ depends on OF || (ACPI && PCI_QUIRKS)
+ select PCI_HOST_COMMON
+ default y if ARCH_VEXPRESS
+ help
+ Say Y here if you want PCIe support for the Arm N1SDP platform.
+ The controller is ECAM compliant, but needs a quirk to workaround
+ an integration issue.
+
config PCI_TEGRA
bool "NVIDIA Tegra PCIe controller"
depends on ARCH_TEGRA || COMPILE_TEST
diff --git a/drivers/pci/controller/Makefile b/drivers/pci/controller/Makefile
index 3d4f597f15ce..5f47fefbd67d 100644
--- a/drivers/pci/controller/Makefile
+++ b/drivers/pci/controller/Makefile
@@ -28,6 +28,7 @@ obj-$(CONFIG_PCIE_MEDIATEK) += pcie-mediatek.o
obj-$(CONFIG_PCIE_MOBIVEIL) += pcie-mobiveil.o
obj-$(CONFIG_PCIE_TANGO_SMP8759) += pcie-tango.o
obj-$(CONFIG_VMD) += vmd.o
+obj-$(CONFIG_PCIE_HOST_N1SDP_ECAM) += pcie-n1sdp.o
# pcie-hisi.o quirks are needed even without CONFIG_PCIE_DW
obj-y += dwc/

diff --git a/drivers/pci/controller/pcie-n1sdp.c b/drivers/pci/controller/pcie-n1sdp.c
new file mode 100644
index 000000000000..620ab221466c
--- /dev/null
+++ b/drivers/pci/controller/pcie-n1sdp.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018/2019 ARM Ltd.
+ *
+ * This quirk is to mask the following issues:
+ * - PCIE SLVERR: config space accesses to invalid PCIe BDFs cause a bus
+ * error (signalled as an asynchronous SError)
+ * - MCFG BDF mapping: the root complex is mapped separately from the device
+ * config space
+ * - Non 32-bit accesses to config space are not supported.
+ *
+ * At boot time the SCP board firmware creates a discovery table with
+ * the root complex' base address and the valid BDF values, discovered while
+ * scanning the config space and catching the SErrors.
+ * Linux responds only to the EPs listed in this table, returning NULL
+ * for the rest.
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/ioport.h>
+#include <linux/sizes.h>
+#include <linux/of_pci.h>
+#include <linux/of.h>
+#include <linux/pci-ecam.h>
+#include <linux/platform_device.h>
+#include <linux/module.h>
+
+/* Platform specific values as hardcoded in the firmware. */
+#define AP_NS_SHARED_MEM_BASE 0x06000000
+#define MAX_SEGMENTS 2 /* Two PCIe root complexes. */
+#define BDF_TABLE_SIZE SZ_16K
+
+/*
+ * Shared memory layout as written by the SCP upon boot time:
+ * ----
+ * Discover data header --> RC base address
+ * \-> BDF Count
+ * Discover data --> BDF 0...n
+ * ----
+ */
+struct pcie_discovery_data {
+ u32 rc_base_addr;
+ u32 nr_bdfs;
+ u32 valid_bdfs[0];
+} *pcie_discovery_data[MAX_SEGMENTS];
+
+void __iomem *rc_remapped_addr[MAX_SEGMENTS];
+
+/*
+ * map_bus() is called before we do a config space access for a certain
+ * device. We use this to check whether this device is valid, avoiding
+ * config space accesses which would result in an SError otherwise.
+ */
+static void __iomem *pci_n1sdp_map_bus(struct pci_bus *bus, unsigned int devfn,
+ int where)
+{
+ struct pci_config_window *cfg = bus->sysdata;
+ unsigned int devfn_shift = cfg->ops->bus_shift - 8;
+ unsigned int busn = bus->number;
+ unsigned int segment = bus->domain_nr;
+ unsigned int bdf_addr;
+ unsigned int table_count, i;
+
+ if (segment >= MAX_SEGMENTS ||
+ busn < cfg->busr.start || busn > cfg->busr.end)
+ return NULL;
+
+ /* The PCIe root complex has a separate config space mapping. */
+ if (busn == 0 && devfn == 0)
+ return rc_remapped_addr[segment] + where;
+
+ busn -= cfg->busr.start;
+ bdf_addr = (busn << cfg->ops->bus_shift) + (devfn << devfn_shift);
+ table_count = pcie_discovery_data[segment]->nr_bdfs;
+ for (i = 0; i < table_count; i++) {
+ if (bdf_addr == pcie_discovery_data[segment]->valid_bdfs[i])
+ return pci_ecam_map_bus(bus, devfn, where);
+ }
+
+ return NULL;
+}
+
+static int pci_n1sdp_init(struct pci_config_window *cfg, unsigned int segment)
+{
+ phys_addr_t table_base;
+ struct device *dev = cfg->parent;
+ struct pcie_discovery_data *shared_data;
+ size_t bdfs_size;
+
+ if (segment >= MAX_SEGMENTS)
+ return -ENODEV;
+
+ table_base = AP_NS_SHARED_MEM_BASE + segment * BDF_TABLE_SIZE;
+
+ if (!request_mem_region(table_base, BDF_TABLE_SIZE,
+ "PCIe valid BDFs")) {
+ dev_err(dev, "PCIe BDF shared region request failed\n");
+ return -ENOMEM;
+ }
+
+ shared_data = devm_ioremap(dev,
+ table_base, BDF_TABLE_SIZE);
+ if (!shared_data)
+ return -ENOMEM;
+
+ /* Copy the valid BDFs structure to allocated normal memory. */
+ bdfs_size = sizeof(struct pcie_discovery_data) +
+ sizeof(u32) * shared_data->nr_bdfs;
+ pcie_discovery_data[segment] = devm_kmalloc(dev, bdfs_size, GFP_KERNEL);
+ if (!pcie_discovery_data[segment])
+ return -ENOMEM;
+
+ memcpy_fromio(pcie_discovery_data[segment], shared_data, bdfs_size);
+
+ rc_remapped_addr[segment] = devm_ioremap_nocache(dev,
+ shared_data->rc_base_addr,
+ PCI_CFG_SPACE_EXP_SIZE);
+ if (!rc_remapped_addr[segment]) {
+ dev_err(dev, "Cannot remap root port base\n");
+ return -ENOMEM;
+ }
+
+ devm_iounmap(dev, shared_data);
+
+ return 0;
+}
+
+static int pci_n1sdp_pcie_init(struct pci_config_window *cfg)
+{
+ return pci_n1sdp_init(cfg, 0);
+}
+
+static int pci_n1sdp_ccix_init(struct pci_config_window *cfg)
+{
+ return pci_n1sdp_init(cfg, 1);
+}
+
+struct pci_ecam_ops pci_n1sdp_pcie_ecam_ops = {
+ .bus_shift = 20,
+ .init = pci_n1sdp_pcie_init,
+ .pci_ops = {
+ .map_bus = pci_n1sdp_map_bus,
+ .read = pci_generic_config_read32,
+ .write = pci_generic_config_write32,
+ }
+};
+
+struct pci_ecam_ops pci_n1sdp_ccix_ecam_ops = {
+ .bus_shift = 20,
+ .init = pci_n1sdp_ccix_init,
+ .pci_ops = {
+ .map_bus = pci_n1sdp_map_bus,
+ .read = pci_generic_config_read32,
+ .write = pci_generic_config_write32,
+ }
+};
+
+static const struct of_device_id n1sdp_pcie_of_match[] = {
+ { .compatible = "arm,n1sdp-pcie" },
+ { },
+};
+MODULE_DEVICE_TABLE(of, n1sdp_pcie_of_match);
+
+static int n1sdp_pcie_probe(struct platform_device *pdev)
+{
+ const struct device_node *of_node = pdev->dev.of_node;
+ u32 segment;
+
+ if (of_property_read_u32(of_node, "linux,pci-domain", &segment)) {
+ dev_err(&pdev->dev, "N1SDP PCI controllers require linux,pci-domain property\n");
+ return -EINVAL;
+ }
+
+ switch (segment) {
+ case 0:
+ return pci_host_common_probe(pdev, &pci_n1sdp_pcie_ecam_ops);
+ case 1:
+ return pci_host_common_probe(pdev, &pci_n1sdp_ccix_ecam_ops);
+ }
+
+ dev_err(&pdev->dev, "Invalid segment number, must be smaller than %d\n",
+ MAX_SEGMENTS);
+
+ return -EINVAL;
+}
+
+static struct platform_driver n1sdp_pcie_driver = {
+ .driver = {
+ .name = KBUILD_MODNAME,
+ .of_match_table = n1sdp_pcie_of_match,
+ .suppress_bind_attrs = true,
+ },
+ .probe = n1sdp_pcie_probe,
+};
+builtin_platform_driver(n1sdp_pcie_driver);
diff --git a/include/linux/pci-ecam.h b/include/linux/pci-ecam.h
index a73164c85e78..03cdea69f4e8 100644
--- a/include/linux/pci-ecam.h
+++ b/include/linux/pci-ecam.h
@@ -57,6 +57,8 @@ extern struct pci_ecam_ops pci_thunder_ecam_ops; /* Cavium ThunderX 1.x */
extern struct pci_ecam_ops xgene_v1_pcie_ecam_ops; /* APM X-Gene PCIe v1 */
extern struct pci_ecam_ops xgene_v2_pcie_ecam_ops; /* APM X-Gene PCIe v2.x */
extern struct pci_ecam_ops al_pcie_ops; /* Amazon Annapurna Labs PCIe */
+extern struct pci_ecam_ops pci_n1sdp_pcie_ecam_ops; /* Arm N1SDP PCIe */
+extern struct pci_ecam_ops pci_n1sdp_ccix_ecam_ops; /* Arm N1SDP PCIe */
#endif

#ifdef CONFIG_PCI_HOST_COMMON
--
2.17.1


2019-12-09 16:28:44

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH] pcie: Add quirk for the Arm Neoverse N1SDP platform

On Mon, Dec 09, 2019 at 04:06:38PM +0000, Andre Przywara wrote:
> From: Deepak Pandey <[email protected]>
>
> The Arm N1SDP SoC suffers from some PCIe integration issues, most
> prominently config space accesses to not existing BDFs being answered
> with a bus abort, resulting in an SError.

"Do as I say, not as I do"?

> To mitigate this, the firmware scans the bus before boot (catching the
> SErrors) and creates a table with valid BDFs, which acts as a filter for
> Linux' config space accesses.
>
> Add code consulting the table as an ACPI PCIe quirk, also register the
> corresponding device tree based description of the host controller.
> Also fix the other two minor issues on the way, namely not being fully
> ECAM compliant and config space accesses being restricted to 32-bit
> accesses only.
>
> This allows the Arm Neoverse N1SDP board to boot Linux without crashing
> and to access *any* devices (there are no platform devices except UART).
>
> Signed-off-by: Deepak Pandey <[email protected]>
> [Sudipto: extend to cover the CCIX root port as well]
> Signed-off-by: Sudipto Paul <[email protected]>
> [Andre: fix coding style issues, rewrite some parts, add DT support]
> Signed-off-by: Andre Przywara <[email protected]>
> ---
> arch/arm64/configs/defconfig | 1 +
> drivers/acpi/pci_mcfg.c | 7 +
> drivers/pci/controller/Kconfig | 11 ++
> drivers/pci/controller/Makefile | 1 +
> drivers/pci/controller/pcie-n1sdp.c | 196 ++++++++++++++++++++++++++++
> include/linux/pci-ecam.h | 2 +
> 6 files changed, 218 insertions(+)
> create mode 100644 drivers/pci/controller/pcie-n1sdp.c

Where can I buy one of these? They're "unreleased" according to:

https://community.arm.com/developer/tools-software/oss-platforms/w/docs/440/neoverse-n1-sdp

and I don't think we should wreck upstream because of a platform that
doesn't exist.

Will

2019-12-10 14:43:26

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] pcie: Add quirk for the Arm Neoverse N1SDP platform

On Mon, Dec 09, 2019 at 04:06:38PM +0000, Andre Przywara wrote:
> From: Deepak Pandey <[email protected]>
>
> The Arm N1SDP SoC suffers from some PCIe integration issues, most
> prominently config space accesses to not existing BDFs being answered
> with a bus abort, resulting in an SError.

Can we tease this apart a little more? Linux doesn't program all the
bits that control error signaling, so even on hardware that works
perfectly, much of this behavior is determined by what firmware did.
I wonder if Linux could be more careful about this.

"Bus abort" is not a term used in PCIe. IIUC, a config read to a
device that doesn't exist should terminate with an Unsupported Request
completion, e.g., see the implementation note in PCIe r5.0 sec 2.3.1.

The UR should be an uncorrectable non-fatal error (Table 6-5), and
Figures 6-2 and 6-3 show how it should be handled and when it should
be signaled as a system error. In case you don't have a copy of the
spec, I extracted those two figures and put them at [1].

Can you collect "lspci -vvxxx" output to see if we can correlate it
with those figures and the behavior you see?

[1] https://drive.google.com/file/d/1ihhdQvr0a7ZEJG-3gPddw1Tq7cTFAsah/view?usp=sharing

> To mitigate this, the firmware scans the bus before boot (catching the
> SErrors) and creates a table with valid BDFs, which acts as a filter for
> Linux' config space accesses.
>
> Add code consulting the table as an ACPI PCIe quirk, also register the
> corresponding device tree based description of the host controller.
> Also fix the other two minor issues on the way, namely not being fully
> ECAM compliant and config space accesses being restricted to 32-bit
> accesses only.

As I'm sure you've noticed, controllers that support only 32-bit
config writes are not spec compliant and devices may not work
correctly. The comment in pci_generic_config_write32() explains why.

You may not trip over this problem frequently, but I wouldn't call it
a "minor" issue because when you *do* trip over it, you have no
indication that a register was corrupted.

Even ECAM compliance is not really minor -- if this controller were
fully compliant with the spec, you would need ZERO Linux changes to
support it. Every quirk like this means additional maintenance
burden, and it's not just a one-time thing. It means old kernels that
*should* "just work" on your system will not work unless somebody
backports the quirk.

> This allows the Arm Neoverse N1SDP board to boot Linux without crashing
> and to access *any* devices (there are no platform devices except UART).

2019-12-11 11:02:06

by Andre Przywara

[permalink] [raw]
Subject: Re: [PATCH] pcie: Add quirk for the Arm Neoverse N1SDP platform

On Tue, 10 Dec 2019 08:41:15 -0600
Bjorn Helgaas <[email protected]> wrote:

Hi Bjorn,

> On Mon, Dec 09, 2019 at 04:06:38PM +0000, Andre Przywara wrote:
> > From: Deepak Pandey <[email protected]>
> >
> > The Arm N1SDP SoC suffers from some PCIe integration issues, most
> > prominently config space accesses to not existing BDFs being answered
> > with a bus abort, resulting in an SError.
>
> Can we tease this apart a little more? Linux doesn't program all the
> bits that control error signaling, so even on hardware that works
> perfectly, much of this behavior is determined by what firmware did.
> I wonder if Linux could be more careful about this.
>
> "Bus abort" is not a term used in PCIe.

Yes, sorry, that was my sloppy term, also aiming more at the CPU side of the bus, between the cores and the RC.

> IIUC, a config read to a
> device that doesn't exist should terminate with an Unsupported Request
> completion, e.g., see the implementation note in PCIe r5.0 sec 2.3.1.

Yes, that's what Lorenzo mentioned as well.

> The UR should be an uncorrectable non-fatal error (Table 6-5), and
> Figures 6-2 and 6-3 show how it should be handled and when it should
> be signaled as a system error. In case you don't have a copy of the
> spec, I extracted those two figures and put them at [1].

Thanks for that.
So in the last few months we tossed several ideas around how to work-around this without kernel intervention, all of them turned out to be not working. There are indeed registers in the RC that influence error reporting to the CPU side, but even if we could suppress (or catch) the SError, we can't recover and fixup the read transaction to the CPU. Even Lorenzo gave up on this ;-) As far as I understood this, there are gates missing which are supposed to translate this specific UR into a valid "all-1s" response.

> Can you collect "lspci -vvxxx" output to see if we can correlate it
> with those figures and the behavior you see?

Attached.

>
> [1] https://drive.google.com/file/d/1ihhdQvr0a7ZEJG-3gPddw1Tq7cTFAsah/view?usp=sharing
>
> > To mitigate this, the firmware scans the bus before boot (catching the
> > SErrors) and creates a table with valid BDFs, which acts as a filter for
> > Linux' config space accesses.
> >
> > Add code consulting the table as an ACPI PCIe quirk, also register the
> > corresponding device tree based description of the host controller.
> > Also fix the other two minor issues on the way, namely not being fully
> > ECAM compliant and config space accesses being restricted to 32-bit
> > accesses only.
>
> As I'm sure you've noticed, controllers that support only 32-bit
> config writes are not spec compliant and devices may not work
> correctly. The comment in pci_generic_config_write32() explains why.

Yes, I understand. Actually it seems to work fine in my experiments without that part of the patch, but the hardware guys insisted that it's needed. I will double check.

> You may not trip over this problem frequently, but I wouldn't call it
> a "minor" issue because when you *do* trip over it, you have no
> indication that a register was corrupted.
>
> Even ECAM compliance is not really minor -- if this controller were
> fully compliant with the spec, you would need ZERO Linux changes to
> support it. Every quirk like this means additional maintenance
> burden, and it's not just a one-time thing. It means old kernels that
> *should* "just work" on your system will not work unless somebody
> backports the quirk.

I am well aware of that, and we had quite some discussions internally, with quite some opposition.
The point is that this board has virtually everything behind PCI (which is good!), so not having working PCI renders this virtually useless. And installing Linux brings back warm and fuzzy memories of 1990's boot floppies - just without the actual disks ;-)
So anything that improves this situation in any way is helpful.

On the technical side this is a quirk, in the ACPI quirk framework(!), so it shouldn't affect anyone without the magic ACPI ID strings. People could even compile it out if needed. Plus the actual code is contained in a single, new file.
So, yes, I see that it's not pretty and we should not have broken hardware in the first place, but this is probably as good as this will get.

Cheers,
Andre

> > This allows the Arm Neoverse N1SDP board to boot Linux without crashing
> > and to access *any* devices (there are no platform devices except UART).


Attachments:
(No filename) (4.46 kB)
lspci_n1sdp.txt (69.47 kB)
Download all attachments

2019-12-11 20:19:01

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] pcie: Add quirk for the Arm Neoverse N1SDP platform

On Wed, Dec 11, 2019 at 11:00:49AM +0000, Andre Przywara wrote:
> On Tue, 10 Dec 2019 08:41:15 -0600
> Bjorn Helgaas <[email protected]> wrote:
> > On Mon, Dec 09, 2019 at 04:06:38PM +0000, Andre Przywara wrote:
> > > From: Deepak Pandey <[email protected]>
> > >
> > > The Arm N1SDP SoC suffers from some PCIe integration issues, most
> > > prominently config space accesses to not existing BDFs being answered
> > > with a bus abort, resulting in an SError.
> >
> > Can we tease this apart a little more? Linux doesn't program all the
> > bits that control error signaling, so even on hardware that works
> > perfectly, much of this behavior is determined by what firmware did.
> > I wonder if Linux could be more careful about this.
> >
> > "Bus abort" is not a term used in PCIe.
>
> Yes, sorry, that was my sloppy term, also aiming more at the CPU
> side of the bus, between the cores and the RC.
>
> > IIUC, a config read to a
> > device that doesn't exist should terminate with an Unsupported Request
> > completion, e.g., see the implementation note in PCIe r5.0 sec 2.3.1.
>
> Yes, that's what Lorenzo mentioned as well.
>
> > The UR should be an uncorrectable non-fatal error (Table 6-5), and
> > Figures 6-2 and 6-3 show how it should be handled and when it should
> > be signaled as a system error. In case you don't have a copy of the
> > spec, I extracted those two figures and put them at [1].
>
> Thanks for that.
> So in the last few months we tossed several ideas around how to
> work-around this without kernel intervention, all of them turned out
> to be not working. There are indeed registers in the RC that
> influence error reporting to the CPU side, but even if we could
> suppress (or catch) the SError, we can't recover and fixup the read
> transaction to the CPU. Even Lorenzo gave up on this ;-) As far as I
> understood this, there are gates missing which are supposed to
> translate this specific UR into a valid "all-1s" response.

But the commit log says firmware scanned the bus (catching the
SErrors). Shouldn't Linux be able to catch them the same way?

The "all-1s" response directly from hardware is typical of most
platforms, but I don't think it's strictly required by the PCIe spec
and I don't think it's absolutely essential even to Linux. If you can
catch the SErrors, isn't there a way for software to fabricate that
all-1s data and continue after the read?

> > Even ECAM compliance is not really minor -- if this controller were
> > fully compliant with the spec, you would need ZERO Linux changes to
> > support it. Every quirk like this means additional maintenance
> > burden, and it's not just a one-time thing. It means old kernels that
> > *should* "just work" on your system will not work unless somebody
> > backports the quirk.
>
> I am well aware of that, and we had quite some discussions
> internally, with quite some opposition. ...

The main point is that *future* silicon should be designed to avoid
this issue. I hope at least that part was not controversial.

If we want to take advantage of the generic PCI code supplied by
Linux, we have to expect that the hardware will play by the rules of
PCI.

Bjorn

2019-12-12 11:06:53

by Andre Przywara

[permalink] [raw]
Subject: Re: [PATCH] pcie: Add quirk for the Arm Neoverse N1SDP platform

On 11/12/2019 20:17, Bjorn Helgaas wrote:

Hi Bjorn,

> On Wed, Dec 11, 2019 at 11:00:49AM +0000, Andre Przywara wrote:
>> On Tue, 10 Dec 2019 08:41:15 -0600
>> Bjorn Helgaas <[email protected]> wrote:
>>> On Mon, Dec 09, 2019 at 04:06:38PM +0000, Andre Przywara wrote:
>>>> From: Deepak Pandey <[email protected]>
>>>>
>>>> The Arm N1SDP SoC suffers from some PCIe integration issues, most
>>>> prominently config space accesses to not existing BDFs being answered
>>>> with a bus abort, resulting in an SError.
>>>
>>> Can we tease this apart a little more? Linux doesn't program all the
>>> bits that control error signaling, so even on hardware that works
>>> perfectly, much of this behavior is determined by what firmware did.
>>> I wonder if Linux could be more careful about this.
>>>
>>> "Bus abort" is not a term used in PCIe.
>>
>> Yes, sorry, that was my sloppy term, also aiming more at the CPU
>> side of the bus, between the cores and the RC.
>>
>>> IIUC, a config read to a
>>> device that doesn't exist should terminate with an Unsupported Request
>>> completion, e.g., see the implementation note in PCIe r5.0 sec 2.3.1.
>>
>> Yes, that's what Lorenzo mentioned as well.
>>
>>> The UR should be an uncorrectable non-fatal error (Table 6-5), and
>>> Figures 6-2 and 6-3 show how it should be handled and when it should
>>> be signaled as a system error. In case you don't have a copy of the
>>> spec, I extracted those two figures and put them at [1].
>>
>> Thanks for that.
>> So in the last few months we tossed several ideas around how to
>> work-around this without kernel intervention, all of them turned out
>> to be not working. There are indeed registers in the RC that
>> influence error reporting to the CPU side, but even if we could
>> suppress (or catch) the SError, we can't recover and fixup the read
>> transaction to the CPU. Even Lorenzo gave up on this ;-) As far as I
>> understood this, there are gates missing which are supposed to
>> translate this specific UR into a valid "all-1s" response.
>
> But the commit log says firmware scanned the bus (catching the
> SErrors). Shouldn't Linux be able to catch them the same way?

Not really. The scanning is done by the SCP management processor, which is a Cortex-M class core on the same bus. So it's a simple, single core running very early after power-on, when the actual AP cores are still off. The SError handler is set to just increase a value, then to return. This value is then checked before and after a config space access for a given
BDF: https://git.linaro.org/landing-teams/working/arm/n1sdp-pcie-quirk.git/tree/scp

On the AP cores that run Linux later on this is quite different: The SError is asynchronous, imprecise (inexact) and has no syndrome information. That means we can't attribute this anymore to the faulting instruction, we don't even know if it happened due to this config space access. The CPU might have executed later instructions already, so the state is broken at this point. SError basically means: the system is screwed up.
Because this is quite common for SErrors, we don't even allow to register SError handlers in arm64 Linux.

So even if we could somehow handle this is in Linux, it would be a much greater and intrusive hack, so I'd rather stick with this version.

> The "all-1s" response directly from hardware is typical of most
> platforms, but I don't think it's strictly required by the PCIe spec
> and I don't think it's absolutely essential even to Linux. If you can
> catch the SErrors, isn't there a way for software to fabricate that
> all-1s data and continue after the read?

That was an idea we had as well, but due to the points mentioned above this is not possible.

>>> Even ECAM compliance is not really minor -- if this controller were
>>> fully compliant with the spec, you would need ZERO Linux changes to
>>> support it. Every quirk like this means additional maintenance
>>> burden, and it's not just a one-time thing. It means old kernels that
>>> *should* "just work" on your system will not work unless somebody
>>> backports the quirk.
>>
>> I am well aware of that, and we had quite some discussions
>> internally, with quite some opposition. ...
>
> The main point is that *future* silicon should be designed to avoid
> this issue. I hope at least that part was not controversial.

Yes, the design goal was to be completely standards (SBSA, ACPI, ECAM) compliant, it was just the usual "things happen" that wasn't spotted in time.

> If we want to take advantage of the generic PCI code supplied by
> Linux, we have to expect that the hardware will play by the rules of
> PCI.

You don't need to convince me ;-), but I think the lesson has been learned.

Cheers,
Andre.

2019-12-12 12:38:55

by Andrew Murray

[permalink] [raw]
Subject: Re: [PATCH] pcie: Add quirk for the Arm Neoverse N1SDP platform

On Mon, Dec 09, 2019 at 04:06:38PM +0000, Andre Przywara wrote:
> From: Deepak Pandey <[email protected]>
>
> The Arm N1SDP SoC suffers from some PCIe integration issues, most
> prominently config space accesses to not existing BDFs being answered
> with a bus abort, resulting in an SError.

It wouldn't be a surprise if the host controller handled UR completions
differently depending on if they were in response to a Type 0 configuration
request or a Type 1 configuration request. (I think I've seen this before).

Have you verified that you still get a bus abort when you attempt to
perform a config read of a non-existent device downstream of the PCIe switch?
(and thus as a response to a Type 1 request).

I ask because if this is the case, and knowing that the PCIe switch is
fixed, then it would be possible to simplify this quirk (by just making
assumptions of the presence of devices in bus 0 rather than all the busses).


> To mitigate this, the firmware scans the bus before boot (catching the
> SErrors) and creates a table with valid BDFs, which acts as a filter for
> Linux' config space accesses.
>
> Add code consulting the table as an ACPI PCIe quirk, also register the
> corresponding device tree based description of the host controller.
> Also fix the other two minor issues on the way, namely not being fully
> ECAM compliant and config space accesses being restricted to 32-bit
> accesses only.
>
> This allows the Arm Neoverse N1SDP board to boot Linux without crashing
> and to access *any* devices (there are no platform devices except UART).

This implies that this quirk has no side-effects and everything will work
as expected - but this is only true for the simple case. For example hot
plug won't work, SR-IOV, and others won't work.

Also what happens for devices that return CRS? - does this also result in an
abort? Does that mean that the firmware will consider these devices as not
present instead of not ready yet? If this is an issue, then FLR of devices
will also create issues (resulting in SErrors for users).

I think it would be helpful to update this commit message to indicate that
this makes it work better, but it may be broken in certain ways.


>
> Signed-off-by: Deepak Pandey <[email protected]>
> [Sudipto: extend to cover the CCIX root port as well]
> Signed-off-by: Sudipto Paul <[email protected]>
> [Andre: fix coding style issues, rewrite some parts, add DT support]
> Signed-off-by: Andre Przywara <[email protected]>
> ---
> arch/arm64/configs/defconfig | 1 +
> drivers/acpi/pci_mcfg.c | 7 +
> drivers/pci/controller/Kconfig | 11 ++
> drivers/pci/controller/Makefile | 1 +
> drivers/pci/controller/pcie-n1sdp.c | 196 ++++++++++++++++++++++++++++
> include/linux/pci-ecam.h | 2 +
> 6 files changed, 218 insertions(+)
> create mode 100644 drivers/pci/controller/pcie-n1sdp.c
>
> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> index 6a83ba2aea3e..58124ef5070b 100644
> --- a/arch/arm64/configs/defconfig
> +++ b/arch/arm64/configs/defconfig
> @@ -177,6 +177,7 @@ CONFIG_NET_9P=y
> CONFIG_NET_9P_VIRTIO=y
> CONFIG_PCI=y
> CONFIG_PCIEPORTBUS=y
> +CONFIG_PCI_QUIRKS=y
> CONFIG_PCI_IOV=y
> CONFIG_HOTPLUG_PCI=y
> CONFIG_HOTPLUG_PCI_ACPI=y
> diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
> index 6b347d9920cc..7a2b41b9ab57 100644
> --- a/drivers/acpi/pci_mcfg.c
> +++ b/drivers/acpi/pci_mcfg.c
> @@ -142,6 +142,13 @@ static struct mcfg_fixup mcfg_quirks[] = {
> XGENE_V2_ECAM_MCFG(4, 0),
> XGENE_V2_ECAM_MCFG(4, 1),
> XGENE_V2_ECAM_MCFG(4, 2),
> +
> +#define N1SDP_ECAM_MCFG(rev, seg, ops) \
> + {"ARMLTD", "ARMN1SDP", rev, seg, MCFG_BUS_ANY, ops }
> +
> + /* N1SDP SoC with v1 PCIe controller */
> + N1SDP_ECAM_MCFG(0x20181101, 0, &pci_n1sdp_pcie_ecam_ops),
> + N1SDP_ECAM_MCFG(0x20181101, 1, &pci_n1sdp_ccix_ecam_ops),
> };
>
> static char mcfg_oem_id[ACPI_OEM_ID_SIZE];
> diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig
> index c77069c8ee5d..45700d32f02e 100644
> --- a/drivers/pci/controller/Kconfig
> +++ b/drivers/pci/controller/Kconfig
> @@ -37,6 +37,17 @@ config PCI_FTPCI100
> depends on OF
> default ARCH_GEMINI
>
> +config PCIE_HOST_N1SDP_ECAM
> + bool "ARM N1SDP PCIe Controller"
> + depends on ARM64
> + depends on OF || (ACPI && PCI_QUIRKS)
> + select PCI_HOST_COMMON
> + default y if ARCH_VEXPRESS
> + help
> + Say Y here if you want PCIe support for the Arm N1SDP platform.
> + The controller is ECAM compliant, but needs a quirk to workaround
> + an integration issue.

Again - please indicate the scope of support provided.

> +
> config PCI_TEGRA
> bool "NVIDIA Tegra PCIe controller"
> depends on ARCH_TEGRA || COMPILE_TEST
> diff --git a/drivers/pci/controller/Makefile b/drivers/pci/controller/Makefile
> index 3d4f597f15ce..5f47fefbd67d 100644
> --- a/drivers/pci/controller/Makefile
> +++ b/drivers/pci/controller/Makefile
> @@ -28,6 +28,7 @@ obj-$(CONFIG_PCIE_MEDIATEK) += pcie-mediatek.o
> obj-$(CONFIG_PCIE_MOBIVEIL) += pcie-mobiveil.o
> obj-$(CONFIG_PCIE_TANGO_SMP8759) += pcie-tango.o
> obj-$(CONFIG_VMD) += vmd.o
> +obj-$(CONFIG_PCIE_HOST_N1SDP_ECAM) += pcie-n1sdp.o
> # pcie-hisi.o quirks are needed even without CONFIG_PCIE_DW
> obj-y += dwc/
>
> diff --git a/drivers/pci/controller/pcie-n1sdp.c b/drivers/pci/controller/pcie-n1sdp.c
> new file mode 100644
> index 000000000000..620ab221466c
> --- /dev/null
> +++ b/drivers/pci/controller/pcie-n1sdp.c
> @@ -0,0 +1,196 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2018/2019 ARM Ltd.
> + *
> + * This quirk is to mask the following issues:
> + * - PCIE SLVERR: config space accesses to invalid PCIe BDFs cause a bus
> + * error (signalled as an asynchronous SError)
> + * - MCFG BDF mapping: the root complex is mapped separately from the device
> + * config space
> + * - Non 32-bit accesses to config space are not supported.
> + *
> + * At boot time the SCP board firmware creates a discovery table with
> + * the root complex' base address and the valid BDF values, discovered while
> + * scanning the config space and catching the SErrors.
> + * Linux responds only to the EPs listed in this table, returning NULL

NIT: it will respond to the switch devices which aren't EPs.


> + * for the rest.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/init.h>
> +#include <linux/ioport.h>
> +#include <linux/sizes.h>
> +#include <linux/of_pci.h>
> +#include <linux/of.h>
> +#include <linux/pci-ecam.h>
> +#include <linux/platform_device.h>
> +#include <linux/module.h>
> +
> +/* Platform specific values as hardcoded in the firmware. */
> +#define AP_NS_SHARED_MEM_BASE 0x06000000
> +#define MAX_SEGMENTS 2 /* Two PCIe root complexes. */
> +#define BDF_TABLE_SIZE SZ_16K
> +
> +/*
> + * Shared memory layout as written by the SCP upon boot time:
> + * ----
> + * Discover data header --> RC base address
> + * \-> BDF Count
> + * Discover data --> BDF 0...n
> + * ----
> + */
> +struct pcie_discovery_data {
> + u32 rc_base_addr;
> + u32 nr_bdfs;
> + u32 valid_bdfs[0];
> +} *pcie_discovery_data[MAX_SEGMENTS];
> +
> +void __iomem *rc_remapped_addr[MAX_SEGMENTS];
> +
> +/*
> + * map_bus() is called before we do a config space access for a certain
> + * device. We use this to check whether this device is valid, avoiding
> + * config space accesses which would result in an SError otherwise.
> + */
> +static void __iomem *pci_n1sdp_map_bus(struct pci_bus *bus, unsigned int devfn,
> + int where)
> +{
> + struct pci_config_window *cfg = bus->sysdata;
> + unsigned int devfn_shift = cfg->ops->bus_shift - 8;
> + unsigned int busn = bus->number;
> + unsigned int segment = bus->domain_nr;
> + unsigned int bdf_addr;
> + unsigned int table_count, i;
> +
> + if (segment >= MAX_SEGMENTS ||
> + busn < cfg->busr.start || busn > cfg->busr.end)
> + return NULL;
> +
> + /* The PCIe root complex has a separate config space mapping. */
> + if (busn == 0 && devfn == 0)
> + return rc_remapped_addr[segment] + where;
> +
> + busn -= cfg->busr.start;
> + bdf_addr = (busn << cfg->ops->bus_shift) + (devfn << devfn_shift);
> + table_count = pcie_discovery_data[segment]->nr_bdfs;
> + for (i = 0; i < table_count; i++) {
> + if (bdf_addr == pcie_discovery_data[segment]->valid_bdfs[i])
> + return pci_ecam_map_bus(bus, devfn, where);
> + }
> +
> + return NULL;
> +}
> +
> +static int pci_n1sdp_init(struct pci_config_window *cfg, unsigned int segment)
> +{
> + phys_addr_t table_base;
> + struct device *dev = cfg->parent;
> + struct pcie_discovery_data *shared_data;
> + size_t bdfs_size;
> +
> + if (segment >= MAX_SEGMENTS)
> + return -ENODEV;
> +
> + table_base = AP_NS_SHARED_MEM_BASE + segment * BDF_TABLE_SIZE;

How can you be sure that this table is populated and isn't junk? I.e. using an older
SCP version?

> +
> + if (!request_mem_region(table_base, BDF_TABLE_SIZE,
> + "PCIe valid BDFs")) {
> + dev_err(dev, "PCIe BDF shared region request failed\n");
> + return -ENOMEM;
> + }
> +
> + shared_data = devm_ioremap(dev,
> + table_base, BDF_TABLE_SIZE);
> + if (!shared_data)
> + return -ENOMEM;
> +
> + /* Copy the valid BDFs structure to allocated normal memory. */
> + bdfs_size = sizeof(struct pcie_discovery_data) +
> + sizeof(u32) * shared_data->nr_bdfs;
> + pcie_discovery_data[segment] = devm_kmalloc(dev, bdfs_size, GFP_KERNEL);
> + if (!pcie_discovery_data[segment])
> + return -ENOMEM;
> +
> + memcpy_fromio(pcie_discovery_data[segment], shared_data, bdfs_size);
> +
> + rc_remapped_addr[segment] = devm_ioremap_nocache(dev,
> + shared_data->rc_base_addr,
> + PCI_CFG_SPACE_EXP_SIZE);
> + if (!rc_remapped_addr[segment]) {
> + dev_err(dev, "Cannot remap root port base\n");
> + return -ENOMEM;
> + }
> +
> + devm_iounmap(dev, shared_data);
> +
> + return 0;
> +}
> +
> +static int pci_n1sdp_pcie_init(struct pci_config_window *cfg)
> +{
> + return pci_n1sdp_init(cfg, 0);
> +}
> +
> +static int pci_n1sdp_ccix_init(struct pci_config_window *cfg)
> +{
> + return pci_n1sdp_init(cfg, 1);
> +}
> +
> +struct pci_ecam_ops pci_n1sdp_pcie_ecam_ops = {
> + .bus_shift = 20,
> + .init = pci_n1sdp_pcie_init,
> + .pci_ops = {
> + .map_bus = pci_n1sdp_map_bus,
> + .read = pci_generic_config_read32,
> + .write = pci_generic_config_write32,
> + }
> +};
> +
> +struct pci_ecam_ops pci_n1sdp_ccix_ecam_ops = {
> + .bus_shift = 20,
> + .init = pci_n1sdp_ccix_init,
> + .pci_ops = {
> + .map_bus = pci_n1sdp_map_bus,
> + .read = pci_generic_config_read32,
> + .write = pci_generic_config_write32,
> + }
> +};
> +
> +static const struct of_device_id n1sdp_pcie_of_match[] = {
> + { .compatible = "arm,n1sdp-pcie" },
> + { },
> +};
> +MODULE_DEVICE_TABLE(of, n1sdp_pcie_of_match);
> +
> +static int n1sdp_pcie_probe(struct platform_device *pdev)
> +{
> + const struct device_node *of_node = pdev->dev.of_node;
> + u32 segment;
> +
> + if (of_property_read_u32(of_node, "linux,pci-domain", &segment)) {
> + dev_err(&pdev->dev, "N1SDP PCI controllers require linux,pci-domain property\n");
> + return -EINVAL;
> + }

Can you use of_get_pci_domain_nr here?

Thanks,

Andrew Murray

> +
> + switch (segment) {
> + case 0:
> + return pci_host_common_probe(pdev, &pci_n1sdp_pcie_ecam_ops);
> + case 1:
> + return pci_host_common_probe(pdev, &pci_n1sdp_ccix_ecam_ops);
> + }
> +
> + dev_err(&pdev->dev, "Invalid segment number, must be smaller than %d\n",
> + MAX_SEGMENTS);
> +
> + return -EINVAL;
> +}
> +
> +static struct platform_driver n1sdp_pcie_driver = {
> + .driver = {
> + .name = KBUILD_MODNAME,
> + .of_match_table = n1sdp_pcie_of_match,
> + .suppress_bind_attrs = true,
> + },
> + .probe = n1sdp_pcie_probe,
> +};
> +builtin_platform_driver(n1sdp_pcie_driver);
> diff --git a/include/linux/pci-ecam.h b/include/linux/pci-ecam.h
> index a73164c85e78..03cdea69f4e8 100644
> --- a/include/linux/pci-ecam.h
> +++ b/include/linux/pci-ecam.h
> @@ -57,6 +57,8 @@ extern struct pci_ecam_ops pci_thunder_ecam_ops; /* Cavium ThunderX 1.x */
> extern struct pci_ecam_ops xgene_v1_pcie_ecam_ops; /* APM X-Gene PCIe v1 */
> extern struct pci_ecam_ops xgene_v2_pcie_ecam_ops; /* APM X-Gene PCIe v2.x */
> extern struct pci_ecam_ops al_pcie_ops; /* Amazon Annapurna Labs PCIe */
> +extern struct pci_ecam_ops pci_n1sdp_pcie_ecam_ops; /* Arm N1SDP PCIe */
> +extern struct pci_ecam_ops pci_n1sdp_ccix_ecam_ops; /* Arm N1SDP PCIe */
> #endif
>
> #ifdef CONFIG_PCI_HOST_COMMON
> --
> 2.17.1
>

2019-12-12 13:45:40

by Robin Murphy

[permalink] [raw]
Subject: Re: [PATCH] pcie: Add quirk for the Arm Neoverse N1SDP platform

On 12/12/2019 11:05 am, Andre Przywara wrote:
> On 11/12/2019 20:17, Bjorn Helgaas wrote:
>
> Hi Bjorn,
>
>> On Wed, Dec 11, 2019 at 11:00:49AM +0000, Andre Przywara wrote:
>>> On Tue, 10 Dec 2019 08:41:15 -0600
>>> Bjorn Helgaas <[email protected]> wrote:
>>>> On Mon, Dec 09, 2019 at 04:06:38PM +0000, Andre Przywara wrote:
>>>>> From: Deepak Pandey <[email protected]>
>>>>>
>>>>> The Arm N1SDP SoC suffers from some PCIe integration issues, most
>>>>> prominently config space accesses to not existing BDFs being answered
>>>>> with a bus abort, resulting in an SError.
>>>>
>>>> Can we tease this apart a little more? Linux doesn't program all the
>>>> bits that control error signaling, so even on hardware that works
>>>> perfectly, much of this behavior is determined by what firmware did.
>>>> I wonder if Linux could be more careful about this.
>>>>
>>>> "Bus abort" is not a term used in PCIe.
>>>
>>> Yes, sorry, that was my sloppy term, also aiming more at the CPU
>>> side of the bus, between the cores and the RC.
>>>
>>>> IIUC, a config read to a
>>>> device that doesn't exist should terminate with an Unsupported Request
>>>> completion, e.g., see the implementation note in PCIe r5.0 sec 2.3.1.
>>>
>>> Yes, that's what Lorenzo mentioned as well.
>>>
>>>> The UR should be an uncorrectable non-fatal error (Table 6-5), and
>>>> Figures 6-2 and 6-3 show how it should be handled and when it should
>>>> be signaled as a system error. In case you don't have a copy of the
>>>> spec, I extracted those two figures and put them at [1].
>>>
>>> Thanks for that.
>>> So in the last few months we tossed several ideas around how to
>>> work-around this without kernel intervention, all of them turned out
>>> to be not working. There are indeed registers in the RC that
>>> influence error reporting to the CPU side, but even if we could
>>> suppress (or catch) the SError, we can't recover and fixup the read
>>> transaction to the CPU. Even Lorenzo gave up on this ;-) As far as I
>>> understood this, there are gates missing which are supposed to
>>> translate this specific UR into a valid "all-1s" response.
>>
>> But the commit log says firmware scanned the bus (catching the
>> SErrors). Shouldn't Linux be able to catch them the same way?
>
> Not really. The scanning is done by the SCP management processor, which is a Cortex-M class core on the same bus. So it's a simple, single core running very early after power-on, when the actual AP cores are still off. The SError handler is set to just increase a value, then to return. This value is then checked before and after a config space access for a given
> BDF: https://git.linaro.org/landing-teams/working/arm/n1sdp-pcie-quirk.git/tree/scp
>
> On the AP cores that run Linux later on this is quite different: The SError is asynchronous, imprecise (inexact) and has no syndrome information. That means we can't attribute this anymore to the faulting instruction, we don't even know if it happened due to this config space access. The CPU might have executed later instructions already, so the state is broken at this point. SError basically means: the system is screwed up.
> Because this is quite common for SErrors, we don't even allow to register SError handlers in arm64 Linux.

Furthermore, on the main application processor, SError might be
delivered to EL3 firmware well beyond the reach of Linux, so we can make
zero assumptions about how it's handled and whether we'll ever see it,
or survive the result (EL3 is at liberty to say "oh, something went
wrong, I'll reset the system immediately").

Robin.
> So even if we could somehow handle this is in Linux, it would be a much greater and intrusive hack, so I'd rather stick with this version.
>
>> The "all-1s" response directly from hardware is typical of most
>> platforms, but I don't think it's strictly required by the PCIe spec
>> and I don't think it's absolutely essential even to Linux. If you can
>> catch the SErrors, isn't there a way for software to fabricate that
>> all-1s data and continue after the read?
>
> That was an idea we had as well, but due to the points mentioned above this is not possible.
>
>>>> Even ECAM compliance is not really minor -- if this controller were
>>>> fully compliant with the spec, you would need ZERO Linux changes to
>>>> support it. Every quirk like this means additional maintenance
>>>> burden, and it's not just a one-time thing. It means old kernels that
>>>> *should* "just work" on your system will not work unless somebody
>>>> backports the quirk.
>>>
>>> I am well aware of that, and we had quite some discussions
>>> internally, with quite some opposition. ...
>>
>> The main point is that *future* silicon should be designed to avoid
>> this issue. I hope at least that part was not controversial.
>
> Yes, the design goal was to be completely standards (SBSA, ACPI, ECAM) compliant, it was just the usual "things happen" that wasn't spotted in time.
>
>> If we want to take advantage of the generic PCI code supplied by
>> Linux, we have to expect that the hardware will play by the rules of
>> PCI.
>
> You don't need to convince me ;-), but I think the lesson has been learned.
>
> Cheers,
> Andre.
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>

2019-12-12 21:55:57

by Andrew Murray

[permalink] [raw]
Subject: Re: [PATCH] pcie: Add quirk for the Arm Neoverse N1SDP platform

On Tue, Dec 10, 2019 at 08:41:15AM -0600, Bjorn Helgaas wrote:
> On Mon, Dec 09, 2019 at 04:06:38PM +0000, Andre Przywara wrote:
> > From: Deepak Pandey <[email protected]>
> >
> > The Arm N1SDP SoC suffers from some PCIe integration issues, most
> > prominently config space accesses to not existing BDFs being answered
> > with a bus abort, resulting in an SError.
>
> Can we tease this apart a little more? Linux doesn't program all the
> bits that control error signaling, so even on hardware that works
> perfectly, much of this behavior is determined by what firmware did.
> I wonder if Linux could be more careful about this.
>
> "Bus abort" is not a term used in PCIe. IIUC, a config read to a
> device that doesn't exist should terminate with an Unsupported Request
> completion, e.g., see the implementation note in PCIe r5.0 sec 2.3.1.
>
> The UR should be an uncorrectable non-fatal error (Table 6-5), and
> Figures 6-2 and 6-3 show how it should be handled and when it should
> be signaled as a system error. In case you don't have a copy of the
> spec, I extracted those two figures and put them at [1].
>
> Can you collect "lspci -vvxxx" output to see if we can correlate it
> with those figures and the behavior you see?
>
> [1] https://drive.google.com/file/d/1ihhdQvr0a7ZEJG-3gPddw1Tq7cTFAsah/view?usp=sharing
>
> > To mitigate this, the firmware scans the bus before boot (catching the
> > SErrors) and creates a table with valid BDFs, which acts as a filter for
> > Linux' config space accesses.
> >
> > Add code consulting the table as an ACPI PCIe quirk, also register the
> > corresponding device tree based description of the host controller.
> > Also fix the other two minor issues on the way, namely not being fully
> > ECAM compliant and config space accesses being restricted to 32-bit
> > accesses only.
>
> As I'm sure you've noticed, controllers that support only 32-bit
> config writes are not spec compliant and devices may not work
> correctly. The comment in pci_generic_config_write32() explains why.
>
> You may not trip over this problem frequently, but I wouldn't call it
> a "minor" issue because when you *do* trip over it, you have no
> indication that a register was corrupted.
>
> Even ECAM compliance is not really minor -- if this controller were
> fully compliant with the spec, you would need ZERO Linux changes to
> support it. Every quirk like this means additional maintenance
> burden, and it's not just a one-time thing. It means old kernels that
> *should* "just work" on your system will not work unless somebody
> backports the quirk.

With regards to URs resulting in unwanted aborts or similar - this seems
to be a very common theme amongst ARM PCI controller drivers. For example
both ARM32 imx6 and ARM32 keystone have fault handlers to handle an abort
and fabricate a 0xffffffff read value.

The ARM32 rcar driver, whilst it doesn't appear to produce an abort, does
read the PCI_STATUS register after making a config read to determine if
any aborts have happened - in which case it reports
PCIBIOS_DEVICE_NOT_FOUND.

And as recently reported [1], the rockchip driver also appears to produce
aborts.

I suspect that this ARM64 controller driver won't be the last either. Thus
any solution here may form the basis of copy-cat solutions for subsequent
controllers.

From my understanding of the issues, the ARM64 serrors are imprecise and
as a result there isn't a sensible way of using them to determine that a
read is a UR. So where there are no other solutions to suppress the
generation of an abort by the controller, the only solutions that seem to
exist are 1) pre-scan the devices in firmware and only talk to those devices
in Linux - a safe option but limiting - perhaps with side effects for CRS
and 2) the approach rcar takes in using the PCI_STATUS register - though
you'd end up having to mask the serror (PSTATE.A) for a limited period of
time - a risky option (you'll miss real serrors) - but with no side effects.

(I don't know if option 2 is feasible in this case by the way).

[1] https://lore.kernel.org/linux-pci/[email protected]/2-0001-WFT-PCI-rockchip-play-game-with-unsupported-request-.patch

Thanks,

Andrew Murray

>
> > This allows the Arm Neoverse N1SDP board to boot Linux without crashing
> > and to access *any* devices (there are no platform devices except UART).

2019-12-13 14:26:14

by Andre Przywara

[permalink] [raw]
Subject: Re: [PATCH] pcie: Add quirk for the Arm Neoverse N1SDP platform

On Thu, 12 Dec 2019 12:37:48 +0000
Andrew Murray <[email protected]> wrote:

Hi Andrew,

> On Mon, Dec 09, 2019 at 04:06:38PM +0000, Andre Przywara wrote:
> > From: Deepak Pandey <[email protected]>
> >
> > The Arm N1SDP SoC suffers from some PCIe integration issues, most
> > prominently config space accesses to not existing BDFs being answered
> > with a bus abort, resulting in an SError.
>
> It wouldn't be a surprise if the host controller handled UR completions
> differently depending on if they were in response to a Type 0 configuration
> request or a Type 1 configuration request. (I think I've seen this before).

Yeah, that rings a bell here as well, I remember something with the RK3399 PCIe RC.

> Have you verified that you still get a bus abort when you attempt to
> perform a config read of a non-existent device downstream of the PCIe switch?
> (and thus as a response to a Type 1 request).

I think I have checked this (please confirm if my experiment is valid): I get SErrors for both probing non-existent devices on bus 0 and other busses.

> I ask because if this is the case, and knowing that the PCIe switch is
> fixed, then it would be possible to simplify this quirk (by just making
> assumptions of the presence of devices in bus 0 rather than all the busses).

Yeah, thanks for trying, but no luck here ;-)

> > To mitigate this, the firmware scans the bus before boot (catching the
> > SErrors) and creates a table with valid BDFs, which acts as a filter for
> > Linux' config space accesses.
> >
> > Add code consulting the table as an ACPI PCIe quirk, also register the
> > corresponding device tree based description of the host controller.
> > Also fix the other two minor issues on the way, namely not being fully
> > ECAM compliant and config space accesses being restricted to 32-bit
> > accesses only.
> >
> > This allows the Arm Neoverse N1SDP board to boot Linux without crashing
> > and to access *any* devices (there are no platform devices except UART).
>
> This implies that this quirk has no side-effects and everything will work
> as expected - but this is only true for the simple case. For example hot
> plug won't work, SR-IOV, and others won't work.

Yeah, good point, I should have mentioned this. This is really a best effort hack^Wworkaround to make the system work as best as possible.
Yes, SR-IOV won't work. We are about to evaluate our options here, but for now it's just not supported on this system.

I don't think hot plug is of a particular concern here, since the hardware doesn't support physical hot plug. I am not sure if a Thunderbolt add-in card would trigger this problem, but in the worst case it just wouldn't work.

> Also what happens for devices that return CRS? - does this also result in an
> abort?

I haven't tried, and it looks like it's hard to trigger? But chances are indeed that it could generate SErrors.

> Does that mean that the firmware will consider these devices as not
> present instead of not ready yet? If this is an issue, then FLR of devices
> will also create issues (resulting in SErrors for users).
>
> I think it would be helpful to update this commit message to indicate that
> this makes it work better, but it may be broken in certain ways.

Good point. This is really a pragmatic patch to make the system usable at all.

> > Signed-off-by: Deepak Pandey <[email protected]>
> > [Sudipto: extend to cover the CCIX root port as well]
> > Signed-off-by: Sudipto Paul <[email protected]>
> > [Andre: fix coding style issues, rewrite some parts, add DT support]
> > Signed-off-by: Andre Przywara <[email protected]>
> > ---
> > arch/arm64/configs/defconfig | 1 +
> > drivers/acpi/pci_mcfg.c | 7 +
> > drivers/pci/controller/Kconfig | 11 ++
> > drivers/pci/controller/Makefile | 1 +
> > drivers/pci/controller/pcie-n1sdp.c | 196 ++++++++++++++++++++++++++++
> > include/linux/pci-ecam.h | 2 +
> > 6 files changed, 218 insertions(+)
> > create mode 100644 drivers/pci/controller/pcie-n1sdp.c
> >
> > diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> > index 6a83ba2aea3e..58124ef5070b 100644
> > --- a/arch/arm64/configs/defconfig
> > +++ b/arch/arm64/configs/defconfig
> > @@ -177,6 +177,7 @@ CONFIG_NET_9P=y
> > CONFIG_NET_9P_VIRTIO=y
> > CONFIG_PCI=y
> > CONFIG_PCIEPORTBUS=y
> > +CONFIG_PCI_QUIRKS=y
> > CONFIG_PCI_IOV=y
> > CONFIG_HOTPLUG_PCI=y
> > CONFIG_HOTPLUG_PCI_ACPI=y
> > diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
> > index 6b347d9920cc..7a2b41b9ab57 100644
> > --- a/drivers/acpi/pci_mcfg.c
> > +++ b/drivers/acpi/pci_mcfg.c
> > @@ -142,6 +142,13 @@ static struct mcfg_fixup mcfg_quirks[] = {
> > XGENE_V2_ECAM_MCFG(4, 0),
> > XGENE_V2_ECAM_MCFG(4, 1),
> > XGENE_V2_ECAM_MCFG(4, 2),
> > +
> > +#define N1SDP_ECAM_MCFG(rev, seg, ops) \
> > + {"ARMLTD", "ARMN1SDP", rev, seg, MCFG_BUS_ANY, ops }
> > +
> > + /* N1SDP SoC with v1 PCIe controller */
> > + N1SDP_ECAM_MCFG(0x20181101, 0, &pci_n1sdp_pcie_ecam_ops),
> > + N1SDP_ECAM_MCFG(0x20181101, 1, &pci_n1sdp_ccix_ecam_ops),
> > };
> >
> > static char mcfg_oem_id[ACPI_OEM_ID_SIZE];
> > diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig
> > index c77069c8ee5d..45700d32f02e 100644
> > --- a/drivers/pci/controller/Kconfig
> > +++ b/drivers/pci/controller/Kconfig
> > @@ -37,6 +37,17 @@ config PCI_FTPCI100
> > depends on OF
> > default ARCH_GEMINI
> >
> > +config PCIE_HOST_N1SDP_ECAM
> > + bool "ARM N1SDP PCIe Controller"
> > + depends on ARM64
> > + depends on OF || (ACPI && PCI_QUIRKS)
> > + select PCI_HOST_COMMON
> > + default y if ARCH_VEXPRESS
> > + help
> > + Say Y here if you want PCIe support for the Arm N1SDP platform.
> > + The controller is ECAM compliant, but needs a quirk to workaround
> > + an integration issue.
>
> Again - please indicate the scope of support provided.
>
> > +
> > config PCI_TEGRA
> > bool "NVIDIA Tegra PCIe controller"
> > depends on ARCH_TEGRA || COMPILE_TEST
> > diff --git a/drivers/pci/controller/Makefile b/drivers/pci/controller/Makefile
> > index 3d4f597f15ce..5f47fefbd67d 100644
> > --- a/drivers/pci/controller/Makefile
> > +++ b/drivers/pci/controller/Makefile
> > @@ -28,6 +28,7 @@ obj-$(CONFIG_PCIE_MEDIATEK) += pcie-mediatek.o
> > obj-$(CONFIG_PCIE_MOBIVEIL) += pcie-mobiveil.o
> > obj-$(CONFIG_PCIE_TANGO_SMP8759) += pcie-tango.o
> > obj-$(CONFIG_VMD) += vmd.o
> > +obj-$(CONFIG_PCIE_HOST_N1SDP_ECAM) += pcie-n1sdp.o
> > # pcie-hisi.o quirks are needed even without CONFIG_PCIE_DW
> > obj-y += dwc/
> >
> > diff --git a/drivers/pci/controller/pcie-n1sdp.c b/drivers/pci/controller/pcie-n1sdp.c
> > new file mode 100644
> > index 000000000000..620ab221466c
> > --- /dev/null
> > +++ b/drivers/pci/controller/pcie-n1sdp.c
> > @@ -0,0 +1,196 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2018/2019 ARM Ltd.
> > + *
> > + * This quirk is to mask the following issues:
> > + * - PCIE SLVERR: config space accesses to invalid PCIe BDFs cause a bus
> > + * error (signalled as an asynchronous SError)
> > + * - MCFG BDF mapping: the root complex is mapped separately from the device
> > + * config space
> > + * - Non 32-bit accesses to config space are not supported.
> > + *
> > + * At boot time the SCP board firmware creates a discovery table with
> > + * the root complex' base address and the valid BDF values, discovered while
> > + * scanning the config space and catching the SErrors.
> > + * Linux responds only to the EPs listed in this table, returning NULL
>
> NIT: it will respond to the switch devices which aren't EPs.
>
>
> > + * for the rest.
> > + */
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/init.h>
> > +#include <linux/ioport.h>
> > +#include <linux/sizes.h>
> > +#include <linux/of_pci.h>
> > +#include <linux/of.h>
> > +#include <linux/pci-ecam.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/module.h>
> > +
> > +/* Platform specific values as hardcoded in the firmware. */
> > +#define AP_NS_SHARED_MEM_BASE 0x06000000
> > +#define MAX_SEGMENTS 2 /* Two PCIe root complexes. */
> > +#define BDF_TABLE_SIZE SZ_16K
> > +
> > +/*
> > + * Shared memory layout as written by the SCP upon boot time:
> > + * ----
> > + * Discover data header --> RC base address
> > + * \-> BDF Count
> > + * Discover data --> BDF 0...n
> > + * ----
> > + */
> > +struct pcie_discovery_data {
> > + u32 rc_base_addr;
> > + u32 nr_bdfs;
> > + u32 valid_bdfs[0];
> > +} *pcie_discovery_data[MAX_SEGMENTS];
> > +
> > +void __iomem *rc_remapped_addr[MAX_SEGMENTS];
> > +
> > +/*
> > + * map_bus() is called before we do a config space access for a certain
> > + * device. We use this to check whether this device is valid, avoiding
> > + * config space accesses which would result in an SError otherwise.
> > + */
> > +static void __iomem *pci_n1sdp_map_bus(struct pci_bus *bus, unsigned int devfn,
> > + int where)
> > +{
> > + struct pci_config_window *cfg = bus->sysdata;
> > + unsigned int devfn_shift = cfg->ops->bus_shift - 8;
> > + unsigned int busn = bus->number;
> > + unsigned int segment = bus->domain_nr;
> > + unsigned int bdf_addr;
> > + unsigned int table_count, i;
> > +
> > + if (segment >= MAX_SEGMENTS ||
> > + busn < cfg->busr.start || busn > cfg->busr.end)
> > + return NULL;
> > +
> > + /* The PCIe root complex has a separate config space mapping. */
> > + if (busn == 0 && devfn == 0)
> > + return rc_remapped_addr[segment] + where;
> > +
> > + busn -= cfg->busr.start;
> > + bdf_addr = (busn << cfg->ops->bus_shift) + (devfn << devfn_shift);
> > + table_count = pcie_discovery_data[segment]->nr_bdfs;
> > + for (i = 0; i < table_count; i++) {
> > + if (bdf_addr == pcie_discovery_data[segment]->valid_bdfs[i])
> > + return pci_ecam_map_bus(bus, devfn, where);
> > + }
> > +
> > + return NULL;
> > +}
> > +
> > +static int pci_n1sdp_init(struct pci_config_window *cfg, unsigned int segment)
> > +{
> > + phys_addr_t table_base;
> > + struct device *dev = cfg->parent;
> > + struct pcie_discovery_data *shared_data;
> > + size_t bdfs_size;
> > +
> > + if (segment >= MAX_SEGMENTS)
> > + return -ENODEV;
> > +
> > + table_base = AP_NS_SHARED_MEM_BASE + segment * BDF_TABLE_SIZE;
>
> How can you be sure that this table is populated and isn't junk? I.e. using an older
> SCP version?

This is just (and always was) part of the firmware. If not, it won't work. Technically we tie this to the DT compatible or the ACPI signature.

> > +
> > + if (!request_mem_region(table_base, BDF_TABLE_SIZE,
> > + "PCIe valid BDFs")) {
> > + dev_err(dev, "PCIe BDF shared region request failed\n");
> > + return -ENOMEM;
> > + }
> > +
> > + shared_data = devm_ioremap(dev,
> > + table_base, BDF_TABLE_SIZE);
> > + if (!shared_data)
> > + return -ENOMEM;
> > +
> > + /* Copy the valid BDFs structure to allocated normal memory. */
> > + bdfs_size = sizeof(struct pcie_discovery_data) +
> > + sizeof(u32) * shared_data->nr_bdfs;
> > + pcie_discovery_data[segment] = devm_kmalloc(dev, bdfs_size, GFP_KERNEL);
> > + if (!pcie_discovery_data[segment])
> > + return -ENOMEM;
> > +
> > + memcpy_fromio(pcie_discovery_data[segment], shared_data, bdfs_size);
> > +
> > + rc_remapped_addr[segment] = devm_ioremap_nocache(dev,
> > + shared_data->rc_base_addr,
> > + PCI_CFG_SPACE_EXP_SIZE);
> > + if (!rc_remapped_addr[segment]) {
> > + dev_err(dev, "Cannot remap root port base\n");
> > + return -ENOMEM;
> > + }
> > +
> > + devm_iounmap(dev, shared_data);
> > +
> > + return 0;
> > +}
> > +
> > +static int pci_n1sdp_pcie_init(struct pci_config_window *cfg)
> > +{
> > + return pci_n1sdp_init(cfg, 0);
> > +}
> > +
> > +static int pci_n1sdp_ccix_init(struct pci_config_window *cfg)
> > +{
> > + return pci_n1sdp_init(cfg, 1);
> > +}
> > +
> > +struct pci_ecam_ops pci_n1sdp_pcie_ecam_ops = {
> > + .bus_shift = 20,
> > + .init = pci_n1sdp_pcie_init,
> > + .pci_ops = {
> > + .map_bus = pci_n1sdp_map_bus,
> > + .read = pci_generic_config_read32,
> > + .write = pci_generic_config_write32,
> > + }
> > +};
> > +
> > +struct pci_ecam_ops pci_n1sdp_ccix_ecam_ops = {
> > + .bus_shift = 20,
> > + .init = pci_n1sdp_ccix_init,
> > + .pci_ops = {
> > + .map_bus = pci_n1sdp_map_bus,
> > + .read = pci_generic_config_read32,
> > + .write = pci_generic_config_write32,
> > + }
> > +};
> > +
> > +static const struct of_device_id n1sdp_pcie_of_match[] = {
> > + { .compatible = "arm,n1sdp-pcie" },
> > + { },
> > +};
> > +MODULE_DEVICE_TABLE(of, n1sdp_pcie_of_match);
> > +
> > +static int n1sdp_pcie_probe(struct platform_device *pdev)
> > +{
> > + const struct device_node *of_node = pdev->dev.of_node;
> > + u32 segment;
> > +
> > + if (of_property_read_u32(of_node, "linux,pci-domain", &segment)) {
> > + dev_err(&pdev->dev, "N1SDP PCI controllers require linux,pci-domain property\n");
> > + return -EINVAL;
> > + }
>
> Can you use of_get_pci_domain_nr here?

Ah, indeed. There seems to be an of_... function for everything ;-)

Cheers,
Andre.

> > +
> > + switch (segment) {
> > + case 0:
> > + return pci_host_common_probe(pdev, &pci_n1sdp_pcie_ecam_ops);
> > + case 1:
> > + return pci_host_common_probe(pdev, &pci_n1sdp_ccix_ecam_ops);
> > + }
> > +
> > + dev_err(&pdev->dev, "Invalid segment number, must be smaller than %d\n",
> > + MAX_SEGMENTS);
> > +
> > + return -EINVAL;
> > +}
> > +
> > +static struct platform_driver n1sdp_pcie_driver = {
> > + .driver = {
> > + .name = KBUILD_MODNAME,
> > + .of_match_table = n1sdp_pcie_of_match,
> > + .suppress_bind_attrs = true,
> > + },
> > + .probe = n1sdp_pcie_probe,
> > +};
> > +builtin_platform_driver(n1sdp_pcie_driver);
> > diff --git a/include/linux/pci-ecam.h b/include/linux/pci-ecam.h
> > index a73164c85e78..03cdea69f4e8 100644
> > --- a/include/linux/pci-ecam.h
> > +++ b/include/linux/pci-ecam.h
> > @@ -57,6 +57,8 @@ extern struct pci_ecam_ops pci_thunder_ecam_ops; /* Cavium ThunderX 1.x */
> > extern struct pci_ecam_ops xgene_v1_pcie_ecam_ops; /* APM X-Gene PCIe v1 */
> > extern struct pci_ecam_ops xgene_v2_pcie_ecam_ops; /* APM X-Gene PCIe v2.x */
> > extern struct pci_ecam_ops al_pcie_ops; /* Amazon Annapurna Labs PCIe */
> > +extern struct pci_ecam_ops pci_n1sdp_pcie_ecam_ops; /* Arm N1SDP PCIe */
> > +extern struct pci_ecam_ops pci_n1sdp_ccix_ecam_ops; /* Arm N1SDP PCIe */
> > #endif
> >
> > #ifdef CONFIG_PCI_HOST_COMMON
> > --
> > 2.17.1
> >

2019-12-13 14:40:40

by Andre Przywara

[permalink] [raw]
Subject: Re: [PATCH] pcie: Add quirk for the Arm Neoverse N1SDP platform

On Thu, 12 Dec 2019 21:07:24 +0000
Andrew Murray <[email protected]> wrote:

Hi,

> On Tue, Dec 10, 2019 at 08:41:15AM -0600, Bjorn Helgaas wrote:
> > On Mon, Dec 09, 2019 at 04:06:38PM +0000, Andre Przywara wrote:

[ ... ]

> > Even ECAM compliance is not really minor -- if this controller were
> > fully compliant with the spec, you would need ZERO Linux changes to
> > support it. Every quirk like this means additional maintenance
> > burden, and it's not just a one-time thing. It means old kernels that
> > *should* "just work" on your system will not work unless somebody
> > backports the quirk.
>
> With regards to URs resulting in unwanted aborts or similar - this seems
> to be a very common theme amongst ARM PCI controller drivers. For example
> both ARM32 imx6 and ARM32 keystone have fault handlers to handle an abort
> and fabricate a 0xffffffff read value.
>
> The ARM32 rcar driver, whilst it doesn't appear to produce an abort, does
> read the PCI_STATUS register after making a config read to determine if
> any aborts have happened - in which case it reports
> PCIBIOS_DEVICE_NOT_FOUND.
>
> And as recently reported [1], the rockchip driver also appears to produce
> aborts.
>
> I suspect that this ARM64 controller driver won't be the last either. Thus
> any solution here may form the basis of copy-cat solutions for subsequent
> controllers.

Well, I think Bjorn is aware of them, but was actually hoping that those broken controllers would go away at some point ;-)
And just to make this clear: I would categorise this issue as an integration bug, which just can't be fixed in hardware or firmware easily. It was never meant to be this way. So I am not sure we should promote generic solutions here.

> From my understanding of the issues, the ARM64 serrors are imprecise and
> as a result there isn't a sensible way of using them to determine that a
> read is a UR. So where there are no other solutions to suppress the
> generation of an abort by the controller, the only solutions that seem to
> exist are 1) pre-scan the devices in firmware and only talk to those devices
> in Linux - a safe option but limiting - perhaps with side effects for CRS
> and 2) the approach rcar takes in using the PCI_STATUS register - though
> you'd end up having to mask the serror (PSTATE.A) for a limited period of
> time - a risky option (you'll miss real serrors) - but with no side effects.
>
> (I don't know if option 2 is feasible in this case by the way).

Interesting, we might evaluate this, but mostly out of curiosity or for debugging. I don't think it's really a better option.
If there is a safe way of making this work in the majority of cases, that should be the way to go. Setting PSTATE.A sounds quite wacky to me.

Thanks,
Andre.

> [1] https://lore.kernel.org/linux-pci/[email protected]/2-0001-WFT-PCI-rockchip-play-game-with-unsupported-request-.patch
>
> Thanks,
>
> Andrew Murray
>
> >
> > > This allows the Arm Neoverse N1SDP board to boot Linux without crashing
> > > and to access *any* devices (there are no platform devices except UART).

2019-12-13 21:08:37

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] pcie: Add quirk for the Arm Neoverse N1SDP platform

On Thu, Dec 12, 2019 at 11:05:31AM +0000, Andre Przywara wrote:
> On 11/12/2019 20:17, Bjorn Helgaas wrote:
> > On Wed, Dec 11, 2019 at 11:00:49AM +0000, Andre Przywara wrote:
> >> On Tue, 10 Dec 2019 08:41:15 -0600
> >> Bjorn Helgaas <[email protected]> wrote:
> >>> On Mon, Dec 09, 2019 at 04:06:38PM +0000, Andre Przywara wrote:
> >>>> From: Deepak Pandey <[email protected]>
> >>>>
> >>>> The Arm N1SDP SoC suffers from some PCIe integration issues, most
> >>>> prominently config space accesses to not existing BDFs being answered
> >>>> with a bus abort, resulting in an SError.
> >>>
> >>> Can we tease this apart a little more? Linux doesn't program all the
> >>> bits that control error signaling, so even on hardware that works
> >>> perfectly, much of this behavior is determined by what firmware did.
> >>> I wonder if Linux could be more careful about this.
> >>>
> >>> "Bus abort" is not a term used in PCIe.
> >>
> >> Yes, sorry, that was my sloppy term, also aiming more at the CPU
> >> side of the bus, between the cores and the RC.
> >>
> >>> IIUC, a config read to a
> >>> device that doesn't exist should terminate with an Unsupported Request
> >>> completion, e.g., see the implementation note in PCIe r5.0 sec 2.3.1.
> >>
> >> Yes, that's what Lorenzo mentioned as well.
> >>
> >>> The UR should be an uncorrectable non-fatal error (Table 6-5), and
> >>> Figures 6-2 and 6-3 show how it should be handled and when it should
> >>> be signaled as a system error. In case you don't have a copy of the
> >>> spec, I extracted those two figures and put them at [1].
> >>
> >> Thanks for that.
> >> So in the last few months we tossed several ideas around how to
> >> work-around this without kernel intervention, all of them turned out
> >> to be not working. There are indeed registers in the RC that
> >> influence error reporting to the CPU side, but even if we could
> >> suppress (or catch) the SError, we can't recover and fixup the read
> >> transaction to the CPU. Even Lorenzo gave up on this ;-) As far as I
> >> understood this, there are gates missing which are supposed to
> >> translate this specific UR into a valid "all-1s" response.
> >
> > But the commit log says firmware scanned the bus (catching the
> > SErrors). Shouldn't Linux be able to catch them the same way?
>
> Not really. The scanning is done by the SCP management processor,
> which is a Cortex-M class core on the same bus. So it's a simple,
> single core running very early after power-on, when the actual AP
> cores are still off. The SError handler is set to just increase a
> value, then to return. This value is then checked before and after a
> config space access for a given BDF:
> https://git.linaro.org/landing-teams/working/arm/n1sdp-pcie-quirk.git/tree/scp
>
> On the AP cores that run Linux later on this is quite different: The
> SError is asynchronous, imprecise (inexact) and has no syndrome
> information. That means we can't attribute this anymore to the
> faulting instruction, we don't even know if it happened due to this
> config space access. The CPU might have executed later instructions
> already, so the state is broken at this point. SError basically
> means: the system is screwed up. Because this is quite common for
> SErrors, we don't even allow to register SError handlers in arm64
> Linux.
>
> So even if we could somehow handle this is in Linux, it would be a
> much greater and intrusive hack, so I'd rather stick with this
> version.

The problem is that from a PCIe point of view, UR is something we
should be able to tolerate. It happens during enumeration and also
during hotplug. It definitely does not mean "the system is screwed up
and must be rebooted."

To go back to Figure 6-3, I'm getting the impression that the "System
Error" shown at the top is *not* the "SError" you're referring to. If
they were the same, the Root Control enable bits should gate it, but
according to your lspci, those enable bits are cleared, yet you still
take SErrors.

SError is asynchronous and imprecise. Is there no way to do the
config access in a way that makes it precise, by adding some kind of
sync? There's no reason we can't single-thread config accesses and
maybe even MMIO/IO port accesses as well if necessary.

Bjorn