LinuxLists.cc - [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan

[permalink] [raw]

Subject: Re: [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan

Hi Vincenzo,

Thanks for raising this issue. Let's see what we can do to address
it.

On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote:
> Add a configurable delay to the Rockchip PCIe driver to address
> crashes that occur on some old devices, such as the Pine64 RockPro64.
>
> This issue is affecting the ARM community, but there is no
> upstream solution for it yet.

It sounds like this happens with several endpoints, right? And I
assume the endpoints work fine in other non-Rockchip systems? If
that's the case, my guess is the problem is with the Rockchip host
controller and how it's initialized, not with the endpoints.

The only delays and timeouts I see in the driver now are in
rockchip_pcie_host_init_port(), where it waits for link training to
complete. I assume the link training did completely successfully
since you don't mention either a gen1 or gen2 timeout (although the
gen2 message is a dev_dbg() that normally wouldn't go to the console).

I don't know that the spec contains a retrain timeout value. Several
other drivers use 1 second, while rockchip uses 500ms (for example,
see LINK_RETRAIN_TIMEOUT and LINK_UP_TIMEOUT).

I think we need to understand the issue better before adding a DT
property and a module parameter. Those are hard for users to deal
with. If we can figure out a value that works for everybody, it would
be better to just hard-code it in the driver and use that all the
time.

A few minor style/formatting comments below just for future reference:

> Crash dump (customized Manjaro kernel before this patch):
> [ 1.229856] SError Interrupt on CPU4, code 0xbf000002 -- SError
> [ 1.229860] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.9.9-2.0-MANJARO-ARM #1
> [ 1.229862] Hardware name: Pine64 RockPro64 v2.1 (DT)
> [ 1.229864] pstate: 60000085 (nZCv daIf -PAN -UAO BTYPE=--)
> [ 1.229866] pc : rockchip_pcie_rd_conf+0xb4/0x270
> [ 1.229868] lr : rockchip_pcie_rd_conf+0x1b4/0x270
> [ 1.229870] sp : ffff80001004b850
> [ 1.229872] x29: ffff80001004b850 x28: 0000000000000001
> [ 1.229877] x27: 0000000000000000 x26: ffff00007a795000
> [ 1.229882] x25: ffff00007a7910b0 x24: 0000000000000000
> [ 1.229887] x23: 0000000000000000 x22: ffff00007b3a4380
> [ 1.229891] x21: ffff80001004b8c4 x20: 0000000000000004
> [ 1.229895] x19: 0000000000100000 x18: 0000000000000020
> [ 1.229900] x17: 0000000000000001 x16: 0000000000000019
> [ 1.229904] x15: ffff00007b222fd8 x14: ffffffffffffffff
> [ 1.229908] x13: ffff00007a79ba1c x12: ffff00007a79b290
> [ 1.229912] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
> [ 1.229917] x9 : ff72646268756463 x8 : 0000000000000391
> [ 1.229921] x7 : ffff80001004b880 x6 : 0000000000000001
> [ 1.229925] x5 : 0000000000000000 x4 : 0000000000000000
> [ 1.229930] x3 : 0000000000c00008 x2 : 000000000080000a
> [ 1.229934] x1 : 0000000000000000 x0 : ffff800014000000
> [ 1.229939] Kernel panic - not syncing: Asynchronous SError Interrupt
> [ 1.229942] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.9.9-2.0-MANJARO-ARM #1
> [ 1.229944] Hardware name: Pine64 RockPro64 v2.1 (DT)
> [ 1.229946] Call trace:
> [ 1.229948] dump_backtrace+0x0/0x1d0
> [ 1.229949] show_stack+0x18/0x24
> [ 1.229951] dump_stack+0xc0/0x118
> [ 1.229953] panic+0x148/0x320
> [ 1.229955] nmi_panic+0x8c/0x90
> [ 1.229956] arm64_serror_panic+0x78/0x84
> [ 1.229958] do_serror+0x15c/0x160
> [ 1.229960] el1_error+0x84/0x100
> [ 1.229962] rockchip_pcie_rd_conf+0xb4/0x270
> [ 1.229964] pci_bus_read_config_dword+0x6c/0xd0
> [ 1.229966] pci_bus_generic_read_dev_vendor_id+0x34/0x1b0
> [ 1.229968] pci_scan_single_device+0xa4/0x144
> [ 1.229970] pci_scan_slot+0x40/0x12c
> [ 1.229972] pci_scan_child_bus_extend+0x58/0x34c
> [ 1.229974] pci_scan_bridge_extend+0x310/0x590
> [ 1.229976] pci_scan_child_bus_extend+0x210/0x34c
> [ 1.229978] pci_scan_root_bus_bridge+0x68/0xdc
> [ 1.229980] pci_host_probe+0x18/0xc4
> [ 1.229981] rockchip_pcie_probe+0x204/0x330

Include only the parts of the crash dump that are needed to debug the
problem or identify the problem enough to find this patch. Timestamps
probably aren't necessary. Register contents -- probably not either.

The rest of the backtrace (below here) probably isn't useful.

> [ 1.229984] platform_drv_probe+0x54/0xb0
> [ 1.229985] really_probe+0xe8/0x500
> [ 1.229987] driver_probe_device+0xd8/0xf0
> [ 1.229989] device_driver_attach+0xc0/0xcc
> [ 1.229991] __driver_attach+0xa4/0x170
> [ 1.229993] bus_for_each_dev+0x70/0xc0
> [ 1.229994] driver_attach+0x24/0x30
> [ 1.229996] bus_add_driver+0x140/0x234
> [ 1.229998] driver_register+0x78/0x130
> [ 1.230000] __platform_driver_register+0x4c/0x60
> [ 1.230002] rockchip_pcie_driver_init+0x1c/0x28
> [ 1.230004] do_one_initcall+0x54/0x1c0
> [ 1.230005] do_initcalls+0xf4/0x130
> [ 1.230007] kernel_init_freeable+0x144/0x19c
> [ 1.230009] kernel_init+0x14/0x11c
> [ 1.230011] ret_from_fork+0x10/0x34
> [ 1.230035] SMP: stopping secondary CPUs
> [ 1.230037] Kernel Offset: disabled
> [ 1.230039] CPU features: 0x0240022,2100200c
> [ 1.230041] Memory Limit: none

> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4286,6 +4286,14 @@
> any pair of devices, possibly at the cost of
> reduced performance. This also guarantees
> that hot-added devices will work.
> + pcie_rockchip_host.bus_scan_delay= [PCIE] Delay in ms before
> + scanning PCIe bus in Rockchip PCIe host driver. Some PCIe
> + cards seem to need delays that can be several hundred ms.
> + If set to greater than or equal to 0 this parameter will
> + override delay that can be set in device tree.
> + Values less than 0 the module will hit an assertion
> + during the init.
> + The default value is 0.

Generally speaking module-specific stuff like this doesn't get
documented in kernel-parameters.txt.

> +++ b/arch/arm64/boot/dts/rockchip/rk3399-rockpro64.dtsi
> @@ -663,7 +663,8 @@ &pcie0 {
> pinctrl-0 = <&pcie_perst>;
> vpcie12v-supply = <&vcc12v_dcin>;
> vpcie3v3-supply = <&vcc3v3_pcie>;
> - status = "okay";
> + bus-scan-delay-ms = <0>;
> + status = "okay";

Please don't add arbitrary whitespace changes (it looks like this
uses leading spaces instead of tabs).

> +/* bus_scan_delay - module parameter to override the
> + * device tree value, which is 0 by default. */

Please follow comment style in the file, i.e.,

/*
* bus_scan_delay - ...
*/

Wrap comments to fill 78 columns or so to match the rest of the file.

> @@ -987,6 +996,23 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
>
> rockchip_pcie_enable_interrupts(rockchip);
>
> + prob_delay = rockchip->bus_scan_delay;
> + if (bus_scan_delay)
> + prob_delay = bus_scan_delay;
> +
> + /*
> + * FIXME: This is a workaround for some devices that crash on calls to pci_host_probe()
> + * or pci_scan_root_bus_bridge(). We add a delay before bus scanning to avoid the crash.
> + * The call trace reaches rockchip_pcie_rd_conf() while attempting to read the vendor ID
> + * (pci_bus_generic_read_dev_vendor_id() is in the call stack) before panicking.
> + *
> + * I'm not sure why this workaround is effective or what causes the panic. It may be related
> + * to the cansleep value.

Wrap comments to fit in 78 columns to match the rest of the file.

> + */
> + dev_info(dev, "wait %u ms before bus scan\n", prob_delay);
> + if (prob_delay > 0)
> + msleep(prob_delay);

I don't think we want to just add a random delay here that's not
connected to anything else. I assume it could go in
rockchip_pcie_host_init_port() or perhaps rockchip_pcie_init_port()
(which deasserts resets, and there are usually timing constraints
related to deasserting resets). Hopefully Shawn can shed some light
on this.

> err = pci_host_probe(bridge);
> if (err < 0)
> goto err_remove_irq_domain;
> @@ -1055,6 +1081,11 @@ static struct platform_driver rockchip_pcie_driver = {
> };
> module_platform_driver(rockchip_pcie_driver);
>
> +/** Allow to override the device tree default configuration with
> + * a command line argument.
> + **/

Use multi-line comment style that matches the rest of the file.

> +module_param_named(bus_scan_delay, bus_scan_delay, int, S_IRUGO);

This should go right next to the bus_scan_delay definition above.

> +++ b/drivers/pci/controller/pcie-rockchip.c
> @@ -149,6 +149,11 @@ int rockchip_pcie_parse_dt(struct rockchip_pcie *rockchip)
> return PTR_ERR(rockchip->clk_pcie_pm);
> }
>
> + err = of_property_read_u32(node, "bus-scan-delay-ms", &rockchip->bus_scan_delay);
> + if (err) {
> + dev_info(dev, "no bus scan delay, default to 0 ms\n");
> + rockchip->bus_scan_delay = 0;

I hope we don't need this property at all, but if we do, I assume it
should be optional, with no message needed if it's not present.

> +++ b/drivers/pci/controller/pcie-rockchip.h
> @@ -299,6 +299,16 @@ struct rockchip_pcie {
> phys_addr_t msg_bus_addr;
> bool is_rc;
> struct resource *mem_res;
> +
> + /* It seems that the driver crashes on some
> + * older devices. To work around this, we
> + * should add a sleep delay before probing.
> + *
> + * FIXME: need more investigated with an,
> + * but looks like the problem can be related with
> + * the cansleep value?
> + **/

We need better understanding of what's going on here. Then this
comment could be made more specific, shorter, and formatted like
others.

> + u32 bus_scan_delay;

2023-05-10 01:09:06

[permalink] [raw]

Subject: Re: [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan

On Tue, May 9, 2023 at 5:19 PM Bjorn Helgaas <[email protected]> wrote:
>
> Hi Vincenzo,
>
> Thanks for raising this issue. Let's see what we can do to address
> it.
>
> On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote:
> > Add a configurable delay to the Rockchip PCIe driver to address
> > crashes that occur on some old devices, such as the Pine64 RockPro64.
> >
> > This issue is affecting the ARM community, but there is no
> > upstream solution for it yet.
>
> It sounds like this happens with several endpoints, right? And I
> assume the endpoints work fine in other non-Rockchip systems? If
> that's the case, my guess is the problem is with the Rockchip host
> controller and how it's initialized, not with the endpoints.
>
> The only delays and timeouts I see in the driver now are in
> rockchip_pcie_host_init_port(), where it waits for link training to
> complete. I assume the link training did completely successfully
> since you don't mention either a gen1 or gen2 timeout (although the
> gen2 message is a dev_dbg() that normally wouldn't go to the console).
>
> I don't know that the spec contains a retrain timeout value. Several
> other drivers use 1 second, while rockchip uses 500ms (for example,
> see LINK_RETRAIN_TIMEOUT and LINK_UP_TIMEOUT).
>
> I think we need to understand the issue better before adding a DT
> property and a module parameter. Those are hard for users to deal
> with. If we can figure out a value that works for everybody, it would
> be better to just hard-code it in the driver and use that all the
> time.

Good Evening,

The main issue with the rk3399 is the PCIe controller is buggy and
triggers a SoC panic when certain error conditions occur that should
be handled gracefully. One of those conditions is when an endpoint
requests an access to wait and retry later. Many years ago we ran that
issue to ground and with Robin Murphy's help we found that while it's
possible to gracefully handle that condition it required hijacking the
entire arm64 error handling routine. Not exactly scalable for just one
SoC. The configurable waits allow us to program reasonable times for
90% of the endpoints that come up in the normal amount of time, while
being able to adjust it for the other 10% that do not. Some require
multiple seconds before they return without error. Part of the reason
we don't want to hardcode the wait time is because the probe isn't
handled asynchronously, so the kernel appears to hang while waiting
for the timeout.

I'm curious if it's been tested with this series on top:
https://lore.kernel.org/linux-arm-kernel/[email protected]/T/
I'm particularly curious if
[v5,04/11] PCI: rockchip: Add poll and timeout to wait for PHY PLLs to be locked
makes a difference in the behavior. Please test this and see if it
improves the timeouts you need for the endpoints you're testing
against.

Very Respectfully,
Peter Geis

>
> A few minor style/formatting comments below just for future reference:
>
> > Crash dump (customized Manjaro kernel before this patch):
> > [ 1.229856] SError Interrupt on CPU4, code 0xbf000002 -- SError
> > [ 1.229860] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.9.9-2.0-MANJARO-ARM #1
> > [ 1.229862] Hardware name: Pine64 RockPro64 v2.1 (DT)
> > [ 1.229864] pstate: 60000085 (nZCv daIf -PAN -UAO BTYPE=--)
> > [ 1.229866] pc : rockchip_pcie_rd_conf+0xb4/0x270
> > [ 1.229868] lr : rockchip_pcie_rd_conf+0x1b4/0x270
> > [ 1.229870] sp : ffff80001004b850
> > [ 1.229872] x29: ffff80001004b850 x28: 0000000000000001
> > [ 1.229877] x27: 0000000000000000 x26: ffff00007a795000
> > [ 1.229882] x25: ffff00007a7910b0 x24: 0000000000000000
> > [ 1.229887] x23: 0000000000000000 x22: ffff00007b3a4380
> > [ 1.229891] x21: ffff80001004b8c4 x20: 0000000000000004
> > [ 1.229895] x19: 0000000000100000 x18: 0000000000000020
> > [ 1.229900] x17: 0000000000000001 x16: 0000000000000019
> > [ 1.229904] x15: ffff00007b222fd8 x14: ffffffffffffffff
> > [ 1.229908] x13: ffff00007a79ba1c x12: ffff00007a79b290
> > [ 1.229912] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
> > [ 1.229917] x9 : ff72646268756463 x8 : 0000000000000391
> > [ 1.229921] x7 : ffff80001004b880 x6 : 0000000000000001
> > [ 1.229925] x5 : 0000000000000000 x4 : 0000000000000000
> > [ 1.229930] x3 : 0000000000c00008 x2 : 000000000080000a
> > [ 1.229934] x1 : 0000000000000000 x0 : ffff800014000000
> > [ 1.229939] Kernel panic - not syncing: Asynchronous SError Interrupt
> > [ 1.229942] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.9.9-2.0-MANJARO-ARM #1
> > [ 1.229944] Hardware name: Pine64 RockPro64 v2.1 (DT)
> > [ 1.229946] Call trace:
> > [ 1.229948] dump_backtrace+0x0/0x1d0
> > [ 1.229949] show_stack+0x18/0x24
> > [ 1.229951] dump_stack+0xc0/0x118
> > [ 1.229953] panic+0x148/0x320
> > [ 1.229955] nmi_panic+0x8c/0x90
> > [ 1.229956] arm64_serror_panic+0x78/0x84
> > [ 1.229958] do_serror+0x15c/0x160
> > [ 1.229960] el1_error+0x84/0x100
> > [ 1.229962] rockchip_pcie_rd_conf+0xb4/0x270
> > [ 1.229964] pci_bus_read_config_dword+0x6c/0xd0
> > [ 1.229966] pci_bus_generic_read_dev_vendor_id+0x34/0x1b0
> > [ 1.229968] pci_scan_single_device+0xa4/0x144
> > [ 1.229970] pci_scan_slot+0x40/0x12c
> > [ 1.229972] pci_scan_child_bus_extend+0x58/0x34c
> > [ 1.229974] pci_scan_bridge_extend+0x310/0x590
> > [ 1.229976] pci_scan_child_bus_extend+0x210/0x34c
> > [ 1.229978] pci_scan_root_bus_bridge+0x68/0xdc
> > [ 1.229980] pci_host_probe+0x18/0xc4
> > [ 1.229981] rockchip_pcie_probe+0x204/0x330
>
> Include only the parts of the crash dump that are needed to debug the
> problem or identify the problem enough to find this patch. Timestamps
> probably aren't necessary. Register contents -- probably not either.
>
> The rest of the backtrace (below here) probably isn't useful.
>
> > [ 1.229984] platform_drv_probe+0x54/0xb0
> > [ 1.229985] really_probe+0xe8/0x500
> > [ 1.229987] driver_probe_device+0xd8/0xf0
> > [ 1.229989] device_driver_attach+0xc0/0xcc
> > [ 1.229991] __driver_attach+0xa4/0x170
> > [ 1.229993] bus_for_each_dev+0x70/0xc0
> > [ 1.229994] driver_attach+0x24/0x30
> > [ 1.229996] bus_add_driver+0x140/0x234
> > [ 1.229998] driver_register+0x78/0x130
> > [ 1.230000] __platform_driver_register+0x4c/0x60
> > [ 1.230002] rockchip_pcie_driver_init+0x1c/0x28
> > [ 1.230004] do_one_initcall+0x54/0x1c0
> > [ 1.230005] do_initcalls+0xf4/0x130
> > [ 1.230007] kernel_init_freeable+0x144/0x19c
> > [ 1.230009] kernel_init+0x14/0x11c
> > [ 1.230011] ret_from_fork+0x10/0x34
> > [ 1.230035] SMP: stopping secondary CPUs
> > [ 1.230037] Kernel Offset: disabled
> > [ 1.230039] CPU features: 0x0240022,2100200c
> > [ 1.230041] Memory Limit: none
>
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -4286,6 +4286,14 @@
> > any pair of devices, possibly at the cost of
> > reduced performance. This also guarantees
> > that hot-added devices will work.
> > + pcie_rockchip_host.bus_scan_delay= [PCIE] Delay in ms before
> > + scanning PCIe bus in Rockchip PCIe host driver. Some PCIe
> > + cards seem to need delays that can be several hundred ms.
> > + If set to greater than or equal to 0 this parameter will
> > + override delay that can be set in device tree.
> > + Values less than 0 the module will hit an assertion
> > + during the init.
> > + The default value is 0.
>
> Generally speaking module-specific stuff like this doesn't get
> documented in kernel-parameters.txt.
>
> > +++ b/arch/arm64/boot/dts/rockchip/rk3399-rockpro64.dtsi
> > @@ -663,7 +663,8 @@ &pcie0 {
> > pinctrl-0 = <&pcie_perst>;
> > vpcie12v-supply = <&vcc12v_dcin>;
> > vpcie3v3-supply = <&vcc3v3_pcie>;
> > - status = "okay";
> > + bus-scan-delay-ms = <0>;
> > + status = "okay";
>
> Please don't add arbitrary whitespace changes (it looks like this
> uses leading spaces instead of tabs).
>
> > +/* bus_scan_delay - module parameter to override the
> > + * device tree value, which is 0 by default. */
>
> Please follow comment style in the file, i.e.,
>
> /*
> * bus_scan_delay - ...
> */
>
> Wrap comments to fill 78 columns or so to match the rest of the file.
>
> > @@ -987,6 +996,23 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
> >
> > rockchip_pcie_enable_interrupts(rockchip);
> >
> > + prob_delay = rockchip->bus_scan_delay;
> > + if (bus_scan_delay)
> > + prob_delay = bus_scan_delay;
> > +
> > + /*
> > + * FIXME: This is a workaround for some devices that crash on calls to pci_host_probe()
> > + * or pci_scan_root_bus_bridge(). We add a delay before bus scanning to avoid the crash.
> > + * The call trace reaches rockchip_pcie_rd_conf() while attempting to read the vendor ID
> > + * (pci_bus_generic_read_dev_vendor_id() is in the call stack) before panicking.
> > + *
> > + * I'm not sure why this workaround is effective or what causes the panic. It may be related
> > + * to the cansleep value.
>
> Wrap comments to fit in 78 columns to match the rest of the file.
>
> > + */
> > + dev_info(dev, "wait %u ms before bus scan\n", prob_delay);
> > + if (prob_delay > 0)
> > + msleep(prob_delay);
>
> I don't think we want to just add a random delay here that's not
> connected to anything else. I assume it could go in
> rockchip_pcie_host_init_port() or perhaps rockchip_pcie_init_port()
> (which deasserts resets, and there are usually timing constraints
> related to deasserting resets). Hopefully Shawn can shed some light
> on this.
>
> > err = pci_host_probe(bridge);
> > if (err < 0)
> > goto err_remove_irq_domain;
> > @@ -1055,6 +1081,11 @@ static struct platform_driver rockchip_pcie_driver = {
> > };
> > module_platform_driver(rockchip_pcie_driver);
> >
> > +/** Allow to override the device tree default configuration with
> > + * a command line argument.
> > + **/
>
> Use multi-line comment style that matches the rest of the file.
>
> > +module_param_named(bus_scan_delay, bus_scan_delay, int, S_IRUGO);
>
> This should go right next to the bus_scan_delay definition above.
>
> > +++ b/drivers/pci/controller/pcie-rockchip.c
> > @@ -149,6 +149,11 @@ int rockchip_pcie_parse_dt(struct rockchip_pcie *rockchip)
> > return PTR_ERR(rockchip->clk_pcie_pm);
> > }
> >
> > + err = of_property_read_u32(node, "bus-scan-delay-ms", &rockchip->bus_scan_delay);
> > + if (err) {
> > + dev_info(dev, "no bus scan delay, default to 0 ms\n");
> > + rockchip->bus_scan_delay = 0;
>
> I hope we don't need this property at all, but if we do, I assume it
> should be optional, with no message needed if it's not present.
>
> > +++ b/drivers/pci/controller/pcie-rockchip.h
> > @@ -299,6 +299,16 @@ struct rockchip_pcie {
> > phys_addr_t msg_bus_addr;
> > bool is_rc;
> > struct resource *mem_res;
> > +
> > + /* It seems that the driver crashes on some
> > + * older devices. To work around this, we
> > + * should add a sleep delay before probing.
> > + *
> > + * FIXME: need more investigated with an,
> > + * but looks like the problem can be related with
> > + * the cansleep value?
> > + **/
>
> We need better understanding of what's going on here. Then this
> comment could be made more specific, shorter, and formatted like
> others.
>
> > + u32 bus_scan_delay;
>
> _______________________________________________
> Linux-rockchip mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-rockchip

2023-05-10 08:14:31

by Greg KH

[permalink] [raw]

Subject: Re: [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan

On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote:
> --- a/drivers/pci/controller/pcie-rockchip-host.c
> +++ b/drivers/pci/controller/pcie-rockchip-host.c
> @@ -38,6 +38,10 @@
> #include "../pci.h"
> #include "pcie-rockchip.h"
>
> +/* bus_scan_delay - module parameter to override the
> + * device tree value, which is 0 by default. */
> +static int bus_scan_delay = -1;

Please do not add new module parameters, this is not the 1990's anymore.
We have per-device settings everywhere, this makes that impossible.
Just use a DT value, if it is wrong, then fix the DT value! No need to
have the kernel override it, that's not what DT files are for.

thanks,

greg k-h

2023-05-10 11:06:42

[permalink] [raw]

Subject: Re: [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan

> On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote:
> > --- a/drivers/pci/controller/pcie-rockchip-host.c
> > +++ b/drivers/pci/controller/pcie-rockchip-host.c
> > @@ -38,6 +38,10 @@
> > #include "../pci.h"
> > #include "pcie-rockchip.h"
> >
> > +/* bus_scan_delay - module parameter to override the
> > + * device tree value, which is 0 by default. */
> > +static int bus_scan_delay = -1;
>
> Please do not add new module parameters, this is not the 1990's anymore.
> We have per-device settings everywhere, this makes that impossible.
> Just use a DT value, if it is wrong, then fix the DT value! No need to
> have the kernel override it, that's not what DT files are for.

Thanks!

Cheers!

Vincent.

2023-05-10 11:37:38

[permalink] [raw]

Subject: Re: [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan

> On Tue, May 9, 2023 at 5:19 PM Bjorn Helgaas <[email protected]> wrote:
> >
> > Hi Vincenzo,
> >
> > Thanks for raising this issue. Let's see what we can do to address
> > it.
> >
> > On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote:
> > > Add a configurable delay to the Rockchip PCIe driver to address
> > > crashes that occur on some old devices, such as the Pine64 RockPro64.
> > >
> > > This issue is affecting the ARM community, but there is no
> > > upstream solution for it yet.
> >
> > It sounds like this happens with several endpoints, right? And I
> > assume the endpoints work fine in other non-Rockchip systems? If
> > that's the case, my guess is the problem is with the Rockchip host
> > controller and how it's initialized, not with the endpoints.
> >
> > The only delays and timeouts I see in the driver now are in
> > rockchip_pcie_host_init_port(), where it waits for link training to
> > complete. I assume the link training did completely successfully
> > since you don't mention either a gen1 or gen2 timeout (although the
> > gen2 message is a dev_dbg() that normally wouldn't go to the console).
> >
> > I don't know that the spec contains a retrain timeout value. Several
> > other drivers use 1 second, while rockchip uses 500ms (for example,
> > see LINK_RETRAIN_TIMEOUT and LINK_UP_TIMEOUT).
> >
> > I think we need to understand the issue better before adding a DT
> > property and a module parameter. Those are hard for users to deal
> > with. If we can figure out a value that works for everybody, it would
> > be better to just hard-code it in the driver and use that all the
> > time.
>
> Good Evening,
>
> The main issue with the rk3399 is the PCIe controller is buggy and
> triggers a SoC panic when certain error conditions occur that should
> be handled gracefully. One of those conditions is when an endpoint
> requests an access to wait and retry later. Many years ago we ran that
> issue to ground and with Robin Murphy's help we found that while it's
> possible to gracefully handle that condition it required hijacking the
> entire arm64 error handling routine. Not exactly scalable for just one
> SoC. The configurable waits allow us to program reasonable times for
> 90% of the endpoints that come up in the normal amount of time, while
> being able to adjust it for the other 10% that do not. Some require
> multiple seconds before they return without error. Part of the reason
> we don't want to hardcode the wait time is because the probe isn't
> handled asynchronously, so the kernel appears to hang while waiting
> for the timeout.

Yeah, I smell a hardware bug in my code. I hate waiting in this way inside
the code, so it's usually wrong when you need to do something like that.

During my research, I also found this patch (https://bugzilla.redhat.com/show_bug.cgi?id=2134177)
that provides a fix in another possibly cleaner way.

But I don't understand the reasoning behind it, so maybe I
haven't spent enough time thinking about it.

> I'm curious if it's been tested with this series on top:
> https://lore.kernel.org/linux-arm-kernel/[email protected]/T/
> I'm particularly curious if
> [v5,04/11] PCI: rockchip: Add poll and timeout to wait for PHY PLLs to be locked
> makes a difference in the behavior. Please test this and see if it
> improves the timeouts you need for the endpoints you're testing
> against.

Mh, I can try to cherry-pick the commit and test it in my own test environment. Currently, I haven't been
able to test it due to a lack of hardware, but I'm seeking a way to obtain one.
Luckily, I have someone on the Manjaro arm team who can help me test it,
so I'll try to do that.

Cheers!

Vincent.

2023-05-10 11:52:16

[permalink] [raw]

Subject: Re: [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan

> Hi Vincenzo,

Hi :)

> Thanks for raising this issue. Let's see what we can do to address
> it.

Yeah, as I said in my cover letter, I am not happy with my solution,
but we should start somewhere to discuss it.

> > Add a configurable delay to the Rockchip PCIe driver to address
> > crashes that occur on some old devices, such as the Pine64 RockPro64.
> >
> > This issue is affecting the ARM community, but there is no
> > upstream solution for it yet.
>
> It sounds like this happens with several endpoints, right? And I
> assume the endpoints work fine in other non-Rockchip systems? If
> that's the case, my guess is the problem is with the Rockchip host
> controller and how it's initialized, not with the endpoints.

Yeah, the crash is only reproducible with the Rockchip system, or better,
the crash is reproducible only in some modern devices that use the old
Rockchip driver mentioned in this patch.

> The only delays and timeouts I see in the driver now are in
> rockchip_pcie_host_init_port(), where it waits for link training to
> complete. I assume the link training did completely successfully
> since you don't mention either a gen1 or gen2 timeout (although the
> gen2 message is a dev_dbg() that normally wouldn't go to the console).
>
> I don't know that the spec contains a retrain timeout value. Several
> other drivers use 1 second, while rockchip uses 500ms (for example,
> see LINK_RETRAIN_TIMEOUT and LINK_UP_TIMEOUT).
>
> I think we need to understand the issue better before adding a DT
> property and a module parameter. Those are hard for users to deal
> with. If we can figure out a value that works for everybody, it would
> be better to just hard-code it in the driver and use that all the
> time.

Yeah, I see, I see. This makes sense. Is there any path that I can follow in
order to better understand what's going on at the hardware level? In other
words, how can I help to understand this issue better and provide a
unique solution for everybody?

Thanks for the nits in the patch, I will take a look with a fresh mind
later in the day.

Cheers!

Vincent.

2023-05-10 19:58:44

[permalink] [raw]

Subject: Re: [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan

On Wed, May 10, 2023 at 7:16 AM Vincenzo Palazzo
<[email protected]> wrote:
>
> > On Tue, May 9, 2023 at 5:19 PM Bjorn Helgaas <[email protected]> wrote:
> > >
> > > Hi Vincenzo,
> > >
> > > Thanks for raising this issue. Let's see what we can do to address
> > > it.
> > >
> > > On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote:
> > > > Add a configurable delay to the Rockchip PCIe driver to address
> > > > crashes that occur on some old devices, such as the Pine64 RockPro64.
> > > >
> > > > This issue is affecting the ARM community, but there is no
> > > > upstream solution for it yet.
> > >
> > > It sounds like this happens with several endpoints, right? And I
> > > assume the endpoints work fine in other non-Rockchip systems? If
> > > that's the case, my guess is the problem is with the Rockchip host
> > > controller and how it's initialized, not with the endpoints.
> > >
> > > The only delays and timeouts I see in the driver now are in
> > > rockchip_pcie_host_init_port(), where it waits for link training to
> > > complete. I assume the link training did completely successfully
> > > since you don't mention either a gen1 or gen2 timeout (although the
> > > gen2 message is a dev_dbg() that normally wouldn't go to the console).
> > >
> > > I don't know that the spec contains a retrain timeout value. Several
> > > other drivers use 1 second, while rockchip uses 500ms (for example,
> > > see LINK_RETRAIN_TIMEOUT and LINK_UP_TIMEOUT).
> > >
> > > I think we need to understand the issue better before adding a DT
> > > property and a module parameter. Those are hard for users to deal
> > > with. If we can figure out a value that works for everybody, it would
> > > be better to just hard-code it in the driver and use that all the
> > > time.
> >
> > Good Evening,
> >
> > The main issue with the rk3399 is the PCIe controller is buggy and
> > triggers a SoC panic when certain error conditions occur that should
> > be handled gracefully. One of those conditions is when an endpoint
> > requests an access to wait and retry later. Many years ago we ran that
> > issue to ground and with Robin Murphy's help we found that while it's
> > possible to gracefully handle that condition it required hijacking the
> > entire arm64 error handling routine. Not exactly scalable for just one
> > SoC. The configurable waits allow us to program reasonable times for
> > 90% of the endpoints that come up in the normal amount of time, while
> > being able to adjust it for the other 10% that do not. Some require
> > multiple seconds before they return without error. Part of the reason
> > we don't want to hardcode the wait time is because the probe isn't
> > handled asynchronously, so the kernel appears to hang while waiting
> > for the timeout.
>
> Yeah, I smell a hardware bug in my code. I hate waiting in this way inside
> the code, so it's usually wrong when you need to do something like that.

Correct, it's the unfortunate way of arm64 development. Almost none of
the SoCs implement all of their hardware in a completely specification
compliant way.

>
> During my research, I also found this patch (https://bugzilla.redhat.com/show_bug.cgi?id=2134177)
> that provides a fix in another possibly cleaner way.
>
> But I don't understand the reasoning behind it, so maybe I
> haven't spent enough time thinking about it.

That is a completely different issue, unrelated to the PCI crash.

>
> > I'm curious if it's been tested with this series on top:
> > https://lore.kernel.org/linux-arm-kernel/[email protected]/T/
> > I'm particularly curious if
> > [v5,04/11] PCI: rockchip: Add poll and timeout to wait for PHY PLLs to be locked
> > makes a difference in the behavior. Please test this and see if it
> > improves the timeouts you need for the endpoints you're testing
> > against.
>
> Mh, I can try to cherry-pick the commit and test it in my own test environment. Currently, I haven't been
> able to test it due to a lack of hardware, but I'm seeking a way to obtain one.
> Luckily, I have someone on the Manjaro arm team who can help me test it,
> so I'll try to do that.
>
> Cheers!
>
> Vincent.

2023-05-10 21:18:34

[permalink] [raw]

Subject: Re: [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan

On Tue, May 09, 2023 at 08:11:29PM -0400, Peter Geis wrote:
> On Tue, May 9, 2023 at 5:19 PM Bjorn Helgaas <[email protected]> wrote:
> > On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote:
> > > Add a configurable delay to the Rockchip PCIe driver to address
> > > crashes that occur on some old devices, such as the Pine64 RockPro64.
> > >
> > > This issue is affecting the ARM community, but there is no
> > > upstream solution for it yet.
> >
> > It sounds like this happens with several endpoints, right? And I
> > assume the endpoints work fine in other non-Rockchip systems? If
> > that's the case, my guess is the problem is with the Rockchip host
> > controller and how it's initialized, not with the endpoints.
> > ...

> The main issue with the rk3399 is the PCIe controller is buggy and
> triggers a SoC panic when certain error conditions occur that should
> be handled gracefully. One of those conditions is when an endpoint
> requests an access to wait and retry later.

I assume this refers to a Completion with Request Retry Status (RRS)?

> Many years ago we ran that issue to ground and with Robin Murphy's
> help we found that while it's possible to gracefully handle that
> condition it required hijacking the entire arm64 error handling
> routine. Not exactly scalable for just one SoC.

Do you have a pointer to that discussion? The URL might save
repeating the whole exercise and could be useful for the commit log
when we try to resolve this.

> The configurable waits allow us to program reasonable times for
> 90% of the endpoints that come up in the normal amount of time, while
> being able to adjust it for the other 10% that do not. Some require
> multiple seconds before they return without error. Part of the reason
> we don't want to hardcode the wait time is because the probe isn't
> handled asynchronously, so the kernel appears to hang while waiting
> for the timeout.

Is there some way for users to figure out that they would need this
property? Or is it just "if your kernel panics on boot, try
adding or increasing "bus-scan-delay-ms" in your DT?

Bjorn

2023-05-11 01:25:20

[permalink] [raw]

Subject: Re: [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan

On Wed, May 10, 2023 at 4:48 PM Bjorn Helgaas <[email protected]> wrote:
>
> On Tue, May 09, 2023 at 08:11:29PM -0400, Peter Geis wrote:
> > On Tue, May 9, 2023 at 5:19 PM Bjorn Helgaas <[email protected]> wrote:
> > > On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote:
> > > > Add a configurable delay to the Rockchip PCIe driver to address
> > > > crashes that occur on some old devices, such as the Pine64 RockPro64.
> > > >
> > > > This issue is affecting the ARM community, but there is no
> > > > upstream solution for it yet.
> > >
> > > It sounds like this happens with several endpoints, right? And I
> > > assume the endpoints work fine in other non-Rockchip systems? If
> > > that's the case, my guess is the problem is with the Rockchip host
> > > controller and how it's initialized, not with the endpoints.
> > > ...
>
> > The main issue with the rk3399 is the PCIe controller is buggy and
> > triggers a SoC panic when certain error conditions occur that should
> > be handled gracefully. One of those conditions is when an endpoint
> > requests an access to wait and retry later.
>
> I assume this refers to a Completion with Request Retry Status (RRS)?

I'm not sure the full coverage, the test patch from Shawn Lin that
allowed the system to handle the errors has the following description:
"Native defect prevents this RC far from supporting any response from
EP which UR filed is set."

>
> > Many years ago we ran that issue to ground and with Robin Murphy's
> > help we found that while it's possible to gracefully handle that
> > condition it required hijacking the entire arm64 error handling
> > routine. Not exactly scalable for just one SoC.
>
> Do you have a pointer to that discussion? The URL might save
> repeating the whole exercise and could be useful for the commit log
> when we try to resolve this.

The link to the patch email is here, the full discussion is pretty
easy to follow:
https://lore.kernel.org/linux-pci/[email protected]/
Also:
https://lore.kernel.org/linux-rockchip/[email protected]/T/#m9c9d4a28a0d3aa064864f188b8ee3b16ce107aff

>
> > The configurable waits allow us to program reasonable times for
> > 90% of the endpoints that come up in the normal amount of time, while
> > being able to adjust it for the other 10% that do not. Some require
> > multiple seconds before they return without error. Part of the reason
> > we don't want to hardcode the wait time is because the probe isn't
> > handled asynchronously, so the kernel appears to hang while waiting
> > for the timeout.
>
> Is there some way for users to figure out that they would need this
> property? Or is it just "if your kernel panics on boot, try
> adding or increasing "bus-scan-delay-ms" in your DT?

There's a listing of tested cards at:
https://wiki.pine64.org/wiki/ROCKPro64_Hardware_compatibility

Most cards work fine that don't require a large BAR. PCIe switches are
completely dead without the above hack patch. Cards that lie in the
middle are ones that expect BIOS / EFI support to initialize, or ones
that have complex boot roms and don't initialize quickly.
But yes, it's unfortunately going to be "if you panic, increase the
delay" unless a more complete database of cards can be generated.

Very Respectfully,
Peter Geis

>
> Bjorn

2023-05-12 10:54:34

[permalink] [raw]

Subject: Re: [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan

> > > Many years ago we ran that issue to ground and with Robin Murphy's
> > > help we found that while it's possible to gracefully handle that
> > > condition it required hijacking the entire arm64 error handling
> > > routine. Not exactly scalable for just one SoC.
> >
> > Do you have a pointer to that discussion? The URL might save
> > repeating the whole exercise and could be useful for the commit log
> > when we try to resolve this.
>
> The link to the patch email is here, the full discussion is pretty
> easy to follow:
> https://lore.kernel.org/linux-pci/[email protected]/
> Also:
> https://lore.kernel.org/linux-rockchip/[email protected]/T/#m9c9d4a28a0d3aa064864f188b8ee3b16ce107aff

I have some concerns about the patch proposed in the email that you share. It seems like
it is quite extensive (code that is it not just related to the HW) just to fix a hardware
issue. I would have expected the code to fix the bug to be integrated into the driver itself,
so that if the hardware will died at some point in the future, I would expect that also the
buddy code will died with it.

However, it is possible that I may have missed something in the patch,
and my thoughts could be wrong.

> >
> > > The configurable waits allow us to program reasonable times for
> > > 90% of the endpoints that come up in the normal amount of time, while
> > > being able to adjust it for the other 10% that do not. Some require
> > > multiple seconds before they return without error. Part of the reason
> > > we don't want to hardcode the wait time is because the probe isn't
> > > handled asynchronously, so the kernel appears to hang while waiting
> > > for the timeout.
> >
> > Is there some way for users to figure out that they would need this
> > property? Or is it just "if your kernel panics on boot, try
> > adding or increasing "bus-scan-delay-ms" in your DT?
>
> There's a listing of tested cards at:
> https://wiki.pine64.org/wiki/ROCKPro64_Hardware_compatibility
>
> Most cards work fine that don't require a large BAR. PCIe switches are
> completely dead without the above hack patch. Cards that lie in the
> middle are ones that expect BIOS / EFI support to initialize, or ones
> that have complex boot roms and don't initialize quickly.
> But yes, it's unfortunately going to be "if you panic, increase the
> delay" unless a more complete database of cards can be generated.

This is really unfortunate because as mentioned in some previous emails,
using sleep time slows down the kernel. Is there any way to tell the kernel
to tell the kernel "hey we need some more time here", or in computer science
terms, load a driver asynchronously?

Thanks.

Vincent.

2023-05-12 16:41:57

[permalink] [raw]

Subject: Re: [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan

>
> Thanks for raising this issue. Let's see what we can do to address
> it.
>
> On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote:
> > Add a configurable delay to the Rockchip PCIe driver to address
> > crashes that occur on some old devices, such as the Pine64 RockPro64.
> >
> > This issue is affecting the ARM community, but there is no
> > upstream solution for it yet.
>
> It sounds like this happens with several endpoints, right? And I
> assume the endpoints work fine in other non-Rockchip systems? If
> that's the case, my guess is the problem is with the Rockchip host
> controller and how it's initialized, not with the endpoints.
>
> The only delays and timeouts I see in the driver now are in
> rockchip_pcie_host_init_port(), where it waits for link training to
> complete. I assume the link training did completely successfully
> since you don't mention either a gen1 or gen2 timeout (although the
> gen2 message is a dev_dbg() that normally wouldn't go to the console).
>
> I don't know that the spec contains a retrain timeout value. Several
> other drivers use 1 second, while rockchip uses 500ms (for example,
> see LINK_RETRAIN_TIMEOUT and LINK_UP_TIMEOUT).
>
> I think we need to understand the issue better before adding a DT
> property and a module parameter. Those are hard for users to deal
> with. If we can figure out a value that works for everybody, it would
> be better to just hard-code it in the driver and use that all the
> time.
>
> A few minor style/formatting comments below just for future reference:

Due the recent email I think it is worth make a version 2 of this
patch with your suggestion? and iterate over it another time?

Cheers!

Vincent.

2023-05-13 05:03:25

[permalink] [raw]

Subject: Re: [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan

[+cc ARM64 folks, in case you have abort handling tips; thread at:
https://lore.kernel.org/r/[email protected]]

Pine64 RockPro64 panics while enumerating some PCIe devices. Adding a
delay avoids the panic. My theory is a PCIe Request Retry Status to a
Vendor ID config read causes an abort that we don't handle.

> On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote:
>> ...
>> [ 1.229856] SError Interrupt on CPU4, code 0xbf000002 -- SError
>> [ 1.229860] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.9.9-2.0-MANJARO-ARM
>> #1
>> [ 1.229862] Hardware name: Pine64 RockPro64 v2.1 (DT)
>> [ 1.229864] pstate: 60000085 (nZCv daIf -PAN -UAO BTYPE=--)
>> [ 1.229866] pc : rockchip_pcie_rd_conf+0xb4/0x270
>> [ 1.229868] lr : rockchip_pcie_rd_conf+0x1b4/0x270
>> ...
>> [ 1.229939] Kernel panic - not syncing: Asynchronous SError Interrupt
>> ...
>> [ 1.229955] nmi_panic+0x8c/0x90
>> [ 1.229956] arm64_serror_panic+0x78/0x84
>> [ 1.229958] do_serror+0x15c/0x160
>> [ 1.229960] el1_error+0x84/0x100
>> [ 1.229962] rockchip_pcie_rd_conf+0xb4/0x270
>> [ 1.229964] pci_bus_read_config_dword+0x6c/0xd0
>> [ 1.229966] pci_bus_generic_read_dev_vendor_id+0x34/0x1b0
>> [ 1.229968] pci_scan_single_device+0xa4/0x144

On Fri, May 12, 2023 at 12:46:21PM +0200, Vincenzo Palazzo wrote:
> ... Is there any way to tell the kernel "hey we need some more time
> here"?

We enumerate PCI devices by trying to read the Vendor ID of every
possible device address (see pci_scan_slot()). On PCIe, if a device
doesn't exist at that address, the Vendor ID config read will be
terminated with Unsupported Request (UR) status. This is normal
and happens every time we enumerate devices.

The crash doesn't happen every time we enumerate, so I don't think
this UR is the problem. Also, if it *were* the problem, adding a
delay would not make any difference.

There *is* a way for a PCIe device to say "I need more time". It does
this by responding to that Vendor ID config read with Request Retry
Status (RRS, aka CRS in older specs), which means "I'm not ready yet,
but I will be ready in the future." Adding a delay would definitely
make a difference here, so my guess is this is what's happening.

Most root complexes return ~0 data to the CPU when a config read
terminates with UR or RRS. It sounds like rockchip does this for UR
but possibly not for RRS.

There is a "RRS Software Visibility" feature, which is supposed to
turn the RRS into a special value (Vendor ID == 0x0001), but per [1],
rockchip doesn't support it (lspci calls it "CRSVisible").

But the CPU load instruction corresponding to the config read has to
complete by reading *something* or else be aborted. It sounds like
it's aborted in this case. I don't know the arm64 details, but if we
could catch that abort and determine that it was an RRS and not a UR,
maybe we could fabricate the magic RRS 0x0001 value.

imx6q_pcie_abort_handler() does something like that, although I think
it's for arm32, not arm64. But obviously we already catch the abort
enough to dump the register state and panic, so maybe there's a way to
extend that?

Bjorn

[1] https://lore.kernel.org/linux-pci/CAMdYzYpOFAVq30N+O2gOxXiRtpoHpakFg3LKq3TEZq4S6Y0y0g@mail.gmail.com/

2023-05-13 12:34:04

[permalink] [raw]

Subject: Re: [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan

On Fri, May 12, 2023 at 9:24 PM Bjorn Helgaas <[email protected]> wrote:
>
> [+cc ARM64 folks, in case you have abort handling tips; thread at:
> https://lore.kernel.org/r/[email protected]]
>
> Pine64 RockPro64 panics while enumerating some PCIe devices. Adding a
> delay avoids the panic. My theory is a PCIe Request Retry Status to a
> Vendor ID config read causes an abort that we don't handle.
>
> > On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote:
> >> ...
> >> [ 1.229856] SError Interrupt on CPU4, code 0xbf000002 -- SError
> >> [ 1.229860] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.9.9-2.0-MANJARO-ARM
> >> #1
> >> [ 1.229862] Hardware name: Pine64 RockPro64 v2.1 (DT)
> >> [ 1.229864] pstate: 60000085 (nZCv daIf -PAN -UAO BTYPE=--)
> >> [ 1.229866] pc : rockchip_pcie_rd_conf+0xb4/0x270
> >> [ 1.229868] lr : rockchip_pcie_rd_conf+0x1b4/0x270
> >> ...
> >> [ 1.229939] Kernel panic - not syncing: Asynchronous SError Interrupt
> >> ...
> >> [ 1.229955] nmi_panic+0x8c/0x90
> >> [ 1.229956] arm64_serror_panic+0x78/0x84
> >> [ 1.229958] do_serror+0x15c/0x160
> >> [ 1.229960] el1_error+0x84/0x100
> >> [ 1.229962] rockchip_pcie_rd_conf+0xb4/0x270
> >> [ 1.229964] pci_bus_read_config_dword+0x6c/0xd0
> >> [ 1.229966] pci_bus_generic_read_dev_vendor_id+0x34/0x1b0
> >> [ 1.229968] pci_scan_single_device+0xa4/0x144
>
> On Fri, May 12, 2023 at 12:46:21PM +0200, Vincenzo Palazzo wrote:
> > ... Is there any way to tell the kernel "hey we need some more time
> > here"?
>
> We enumerate PCI devices by trying to read the Vendor ID of every
> possible device address (see pci_scan_slot()). On PCIe, if a device
> doesn't exist at that address, the Vendor ID config read will be
> terminated with Unsupported Request (UR) status. This is normal
> and happens every time we enumerate devices.
>
> The crash doesn't happen every time we enumerate, so I don't think
> this UR is the problem. Also, if it *were* the problem, adding a
> delay would not make any difference.

Is this behavior different if there is a switch device forwarding on
the UR? On rk3399 switches are completely non-functional because of
the panic, which is observed in the output of the dmesg in [2] with
the hack patch enabled. Considering what you just described it looks
like the forwarded UR for each non-existent device behind the switch
is causing an serror.

>
> There *is* a way for a PCIe device to say "I need more time". It does
> this by responding to that Vendor ID config read with Request Retry
> Status (RRS, aka CRS in older specs), which means "I'm not ready yet,
> but I will be ready in the future." Adding a delay would definitely
> make a difference here, so my guess is this is what's happening.
>
> Most root complexes return ~0 data to the CPU when a config read
> terminates with UR or RRS. It sounds like rockchip does this for UR
> but possibly not for RRS.
>
> There is a "RRS Software Visibility" feature, which is supposed to
> turn the RRS into a special value (Vendor ID == 0x0001), but per [1],
> rockchip doesn't support it (lspci calls it "CRSVisible").
>
> But the CPU load instruction corresponding to the config read has to
> complete by reading *something* or else be aborted. It sounds like
> it's aborted in this case. I don't know the arm64 details, but if we
> could catch that abort and determine that it was an RRS and not a UR,
> maybe we could fabricate the magic RRS 0x0001 value.
>
> imx6q_pcie_abort_handler() does something like that, although I think
> it's for arm32, not arm64. But obviously we already catch the abort
> enough to dump the register state and panic, so maybe there's a way to
> extend that?

Perhaps a hook mechanism that allows drivers to register with the
serror handler and offer to handle specific errors before the generic
code causes the system panic?

Very Respectfully,
Peter Geis

[2] https://lore.kernel.org/linux-pci/CAMdYzYqn3L7x-vc+_K6jG0EVTiPGbz8pQ-N1Q1mRbcVXE822Yg@mail.gmail.com/

>
> Bjorn
>
> [1] https://lore.kernel.org/linux-pci/CAMdYzYpOFAVq30N+O2gOxXiRtpoHpakFg3LKq3TEZq4S6Y0y0g@mail.gmail.com/

2023-05-15 11:10:31

[permalink] [raw]

Subject: Re: [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan

> >
> > There *is* a way for a PCIe device to say "I need more time". It does
> > this by responding to that Vendor ID config read with Request Retry
> > Status (RRS, aka CRS in older specs), which means "I'm not ready yet,
> > but I will be ready in the future." Adding a delay would definitely
> > make a difference here, so my guess is this is what's happening.
> >
> > Most root complexes return ~0 data to the CPU when a config read
> > terminates with UR or RRS. It sounds like rockchip does this for UR
> > but possibly not for RRS.
> >
> > There is a "RRS Software Visibility" feature, which is supposed to
> > turn the RRS into a special value (Vendor ID == 0x0001), but per [1],
> > rockchip doesn't support it (lspci calls it "CRSVisible").
> >
> > But the CPU load instruction corresponding to the config read has to
> > complete by reading *something* or else be aborted. It sounds like
> > it's aborted in this case. I don't know the arm64 details, but if we
> > could catch that abort and determine that it was an RRS and not a UR,
> > maybe we could fabricate the magic RRS 0x0001 value.
> >
> > imx6q_pcie_abort_handler() does something like that, although I think
> > it's for arm32, not arm64. But obviously we already catch the abort
> > enough to dump the register state and panic, so maybe there's a way to
> > extend that?
>
> Perhaps a hook mechanism that allows drivers to register with the
> serror handler and offer to handle specific errors before the generic
> code causes the system panic?

This sounds to me a good general solution that also help to handle
future HW like this one.

So this is a Concept Ack for me.

Cheers!

Vincent.

2023-05-15 17:32:36

[permalink] [raw]

Subject: Re: [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan

On Sat, May 13, 2023 at 07:40:12AM -0400, Peter Geis wrote:
> On Fri, May 12, 2023 at 9:24 PM Bjorn Helgaas <[email protected]> wrote:
> >
> > [+cc ARM64 folks, in case you have abort handling tips; thread at:
> > https://lore.kernel.org/r/[email protected]]
> >
> > Pine64 RockPro64 panics while enumerating some PCIe devices. Adding a
> > delay avoids the panic. My theory is a PCIe Request Retry Status to a
> > Vendor ID config read causes an abort that we don't handle.
> >
> > > On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote:
> > >> ...
> > >> [ 1.229856] SError Interrupt on CPU4, code 0xbf000002 -- SError
> > >> [ 1.229860] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.9.9-2.0-MANJARO-ARM
> > >> #1
> > >> [ 1.229862] Hardware name: Pine64 RockPro64 v2.1 (DT)
> > >> [ 1.229864] pstate: 60000085 (nZCv daIf -PAN -UAO BTYPE=--)
> > >> [ 1.229866] pc : rockchip_pcie_rd_conf+0xb4/0x270
> > >> [ 1.229868] lr : rockchip_pcie_rd_conf+0x1b4/0x270
> > >> ...
> > >> [ 1.229939] Kernel panic - not syncing: Asynchronous SError Interrupt
> > >> ...
> > >> [ 1.229955] nmi_panic+0x8c/0x90
> > >> [ 1.229956] arm64_serror_panic+0x78/0x84
> > >> [ 1.229958] do_serror+0x15c/0x160
> > >> [ 1.229960] el1_error+0x84/0x100
> > >> [ 1.229962] rockchip_pcie_rd_conf+0xb4/0x270
> > >> [ 1.229964] pci_bus_read_config_dword+0x6c/0xd0
> > >> [ 1.229966] pci_bus_generic_read_dev_vendor_id+0x34/0x1b0
> > >> [ 1.229968] pci_scan_single_device+0xa4/0x144
> >
> > On Fri, May 12, 2023 at 12:46:21PM +0200, Vincenzo Palazzo wrote:
> > > ... Is there any way to tell the kernel "hey we need some more time
> > > here"?
> >
> > We enumerate PCI devices by trying to read the Vendor ID of every
> > possible device address (see pci_scan_slot()). On PCIe, if a device
> > doesn't exist at that address, the Vendor ID config read will be
> > terminated with Unsupported Request (UR) status. This is normal
> > and happens every time we enumerate devices.
> >
> > The crash doesn't happen every time we enumerate, so I don't think
> > this UR is the problem. Also, if it *were* the problem, adding a
> > delay would not make any difference.
>
> Is this behavior different if there is a switch device forwarding on
> the UR? On rk3399 switches are completely non-functional because of
> the panic, which is observed in the output of the dmesg in [2] with
> the hack patch enabled. Considering what you just described it looks
> like the forwarded UR for each non-existent device behind the switch
> is causing an serror.

I don't know exactly what the panic looks like, but I wouldn't expect
UR handling to be different when there's a switch.

pcie-rockchip-host.c does handle devices on the root bus (00)
differently than others because rockchip_pcie_valid_device() knows
that device 00:00 is the only device on the root bus. That part makes
sense because 00:00 is built into the SoC.

I'm a little suspicious of the fact that rockchip_pcie_valid_device()
also enforces that bus 01 can only have a single device on it. No
other *_pcie_valid_device() implementations enforce that. It's true
that traditional PCIe devices can only implement device 00, but ARI
relaxes that by reusing the Device Number as extended Function Number
bits.

> > There *is* a way for a PCIe device to say "I need more time". It does
> > this by responding to that Vendor ID config read with Request Retry
> > Status (RRS, aka CRS in older specs), which means "I'm not ready yet,
> > but I will be ready in the future." Adding a delay would definitely
> > make a difference here, so my guess is this is what's happening.
> >
> > Most root complexes return ~0 data to the CPU when a config read
> > terminates with UR or RRS. It sounds like rockchip does this for UR
> > but possibly not for RRS.
> >
> > There is a "RRS Software Visibility" feature, which is supposed to
> > turn the RRS into a special value (Vendor ID == 0x0001), but per [1],
> > rockchip doesn't support it (lspci calls it "CRSVisible").
> >
> > But the CPU load instruction corresponding to the config read has to
> > complete by reading *something* or else be aborted. It sounds like
> > it's aborted in this case. I don't know the arm64 details, but if we
> > could catch that abort and determine that it was an RRS and not a UR,
> > maybe we could fabricate the magic RRS 0x0001 value.
> >
> > imx6q_pcie_abort_handler() does something like that, although I think
> > it's for arm32, not arm64. But obviously we already catch the abort
> > enough to dump the register state and panic, so maybe there's a way to
> > extend that?
>
> Perhaps a hook mechanism that allows drivers to register with the
> serror handler and offer to handle specific errors before the generic
> code causes the system panic?
>
> Very Respectfully,
> Peter Geis
>
> [2] https://lore.kernel.org/linux-pci/CAMdYzYqn3L7x-vc+_K6jG0EVTiPGbz8pQ-N1Q1mRbcVXE822Yg@mail.gmail.com/
>
> >
> > Bjorn
> >
> > [1] https://lore.kernel.org/linux-pci/CAMdYzYpOFAVq30N+O2gOxXiRtpoHpakFg3LKq3TEZq4S6Y0y0g@mail.gmail.com/
> _______________________________________________
> Linux-kernel-mentees mailing list
> [email protected]
> https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

2023-05-15 20:58:51

[permalink] [raw]

Subject: Re: [PATCH v1] drivers: pci: introduce configurable delay for Rockchip PCIe bus scan

On Mon, May 15, 2023 at 12:51 PM Bjorn Helgaas <[email protected]> wrote:
>
> On Sat, May 13, 2023 at 07:40:12AM -0400, Peter Geis wrote:
> > On Fri, May 12, 2023 at 9:24 PM Bjorn Helgaas <[email protected]> wrote:
> > >
> > > [+cc ARM64 folks, in case you have abort handling tips; thread at:
> > > https://lore.kernel.org/r/[email protected]]
> > >
> > > Pine64 RockPro64 panics while enumerating some PCIe devices. Adding a
> > > delay avoids the panic. My theory is a PCIe Request Retry Status to a
> > > Vendor ID config read causes an abort that we don't handle.
> > >
> > > > On Tue, May 09, 2023 at 05:39:12PM +0200, Vincenzo Palazzo wrote:
> > > >> ...
> > > >> [ 1.229856] SError Interrupt on CPU4, code 0xbf000002 -- SError
> > > >> [ 1.229860] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.9.9-2.0-MANJARO-ARM
> > > >> #1
> > > >> [ 1.229862] Hardware name: Pine64 RockPro64 v2.1 (DT)
> > > >> [ 1.229864] pstate: 60000085 (nZCv daIf -PAN -UAO BTYPE=--)
> > > >> [ 1.229866] pc : rockchip_pcie_rd_conf+0xb4/0x270
> > > >> [ 1.229868] lr : rockchip_pcie_rd_conf+0x1b4/0x270
> > > >> ...
> > > >> [ 1.229939] Kernel panic - not syncing: Asynchronous SError Interrupt
> > > >> ...
> > > >> [ 1.229955] nmi_panic+0x8c/0x90
> > > >> [ 1.229956] arm64_serror_panic+0x78/0x84
> > > >> [ 1.229958] do_serror+0x15c/0x160
> > > >> [ 1.229960] el1_error+0x84/0x100
> > > >> [ 1.229962] rockchip_pcie_rd_conf+0xb4/0x270
> > > >> [ 1.229964] pci_bus_read_config_dword+0x6c/0xd0
> > > >> [ 1.229966] pci_bus_generic_read_dev_vendor_id+0x34/0x1b0
> > > >> [ 1.229968] pci_scan_single_device+0xa4/0x144
> > >
> > > On Fri, May 12, 2023 at 12:46:21PM +0200, Vincenzo Palazzo wrote:
> > > > ... Is there any way to tell the kernel "hey we need some more time
> > > > here"?
> > >
> > > We enumerate PCI devices by trying to read the Vendor ID of every
> > > possible device address (see pci_scan_slot()). On PCIe, if a device
> > > doesn't exist at that address, the Vendor ID config read will be
> > > terminated with Unsupported Request (UR) status. This is normal
> > > and happens every time we enumerate devices.
> > >
> > > The crash doesn't happen every time we enumerate, so I don't think
> > > this UR is the problem. Also, if it *were* the problem, adding a
> > > delay would not make any difference.
> >
> > Is this behavior different if there is a switch device forwarding on
> > the UR? On rk3399 switches are completely non-functional because of
> > the panic, which is observed in the output of the dmesg in [2] with
> > the hack patch enabled. Considering what you just described it looks
> > like the forwarded UR for each non-existent device behind the switch
> > is causing an serror.
>
> I don't know exactly what the panic looks like, but I wouldn't expect
> UR handling to be different when there's a switch.
>
> pcie-rockchip-host.c does handle devices on the root bus (00)
> differently than others because rockchip_pcie_valid_device() knows
> that device 00:00 is the only device on the root bus. That part makes
> sense because 00:00 is built into the SoC.
>
> I'm a little suspicious of the fact that rockchip_pcie_valid_device()
> also enforces that bus 01 can only have a single device on it. No
> other *_pcie_valid_device() implementations enforce that. It's true
> that traditional PCIe devices can only implement device 00, but ARI
> relaxes that by reusing the Device Number as extended Function Number
> bits.

Bjorn, great catch, thank you!

I suspect you're actually onto the core of the problem. Looking
through various other drivers that implement _pcie_valid_device they
all appear to file similar restrictions on scanning for devices. The
drivers are all similar enough that I am starting to suspect they are
all running some version of the same bugged IP. Then I came across
advk_pcie_pio_is_running() in pci-aardvark.c, which describes our
issue pretty spot on including the same exact SError. Interestingly
enough they made a TF-A patch [3] to catch and handle the error
without ever passing it to the kernel. Other limitations they added
are ensuring reads are not attempted while the link is down.
pci-aardvark.c also implements limitations on Completion Retry Status.
It has given me ideas for solving the problem.

Very Respectfully,
Peter Geis

[3] https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7dcdac5c50

>
> > > There *is* a way for a PCIe device to say "I need more time". It does
> > > this by responding to that Vendor ID config read with Request Retry
> > > Status (RRS, aka CRS in older specs), which means "I'm not ready yet,
> > > but I will be ready in the future." Adding a delay would definitely
> > > make a difference here, so my guess is this is what's happening.
> > >
> > > Most root complexes return ~0 data to the CPU when a config read
> > > terminates with UR or RRS. It sounds like rockchip does this for UR
> > > but possibly not for RRS.
> > >
> > > There is a "RRS Software Visibility" feature, which is supposed to
> > > turn the RRS into a special value (Vendor ID == 0x0001), but per [1],
> > > rockchip doesn't support it (lspci calls it "CRSVisible").
> > >
> > > But the CPU load instruction corresponding to the config read has to
> > > complete by reading *something* or else be aborted. It sounds like
> > > it's aborted in this case. I don't know the arm64 details, but if we
> > > could catch that abort and determine that it was an RRS and not a UR,
> > > maybe we could fabricate the magic RRS 0x0001 value.
> > >
> > > imx6q_pcie_abort_handler() does something like that, although I think
> > > it's for arm32, not arm64. But obviously we already catch the abort
> > > enough to dump the register state and panic, so maybe there's a way to
> > > extend that?
> >
> > Perhaps a hook mechanism that allows drivers to register with the
> > serror handler and offer to handle specific errors before the generic
> > code causes the system panic?
> >
> > Very Respectfully,
> > Peter Geis
> >
> > [2] https://lore.kernel.org/linux-pci/CAMdYzYqn3L7x-vc+_K6jG0EVTiPGbz8pQ-N1Q1mRbcVXE822Yg@mail.gmail.com/
> >
> > >
> > > Bjorn
> > >
> > > [1] https://lore.kernel.org/linux-pci/CAMdYzYpOFAVq30N+O2gOxXiRtpoHpakFg3LKq3TEZq4S6Y0y0g@mail.gmail.com/
> > _______________________________________________
> > Linux-kernel-mentees mailing list
> > [email protected]
> > https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees

2023-07-12 16:22:09