Dear all,
my home machine stopped to boot starting from kernel version 4.12.7.
The last message I read is about resetting some USB3 bus. It's 100%
reproducible also with any recent kernel up to 4.17.0-rc3.
I bisected down to the following commit:
commit 0e1f0eaed6c20db41ff61e024b361ee3ec9d686c (tag: my_broken_xhci)
Author: Marc Zyngier <[email protected]>
Date: Tue Aug 1 20:11:08 2017 -0500
xhci: Reset Renesas uPD72020x USB controller for 32-bit DMA issue
commit 8466489ef5ba48272ba4fa4ea9f8f403306de4c7 upstream.
The Renesas uPD72020x XHCI controller seems to suffer from a really
annoying bug, where it may retain some of its DMA programming across a XHCI
reset, and despite the driver correctly programming new DMA addresses.
This is visible if the device has been using 64-bit DMA addresses, and is
then switched to using 32-bit DMA addresses. The top 32 bits of the
address (now zero) are ignored are replaced by the 32 bits from the
*previous* programming. Sticking with 64-bit DMA always works, but doesn't
seem very appropriate.
A PCI reset of the device restores the normal functionality, which is done
at probe time. Unfortunately, this has to be done before any quirk has
been discovered, hence the intrusive nature of the fix.
Tested-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Acked-by: Mathias Nyman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
diff --git drivers/usb/host/pci-quirks.c drivers/usb/host/pci-quirks.c
index 5f4ca7890435..c8f38649f749 100644
--- drivers/usb/host/pci-quirks.c
+++ drivers/usb/host/pci-quirks.c
@@ -1157,3 +1157,23 @@ static void quirk_usb_early_handoff(struct pci_dev *pdev)
}
DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_ANY_ID, PCI_ANY_ID,
PCI_CLASS_SERIAL_USB, 8, quirk_usb_early_handoff);
+
+bool usb_xhci_needs_pci_reset(struct pci_dev *pdev)
+{
+ /*
+ * Our dear uPD72020{1,2} friend only partially resets when
+ * asked to via the XHCI interface, and may end up doing DMA
+ * at the wrong addresses, as it keeps the top 32bit of some
+ * addresses from its previous programming under obscure
+ * circumstances.
+ * Give it a good wack at probe time. Unfortunately, this
+ * needs to happen before we've had a chance to discover any
+ * quirk, or the system will be in a rather bad state.
+ */
+ if (pdev->vendor == PCI_VENDOR_ID_RENESAS &&
+ (pdev->device == 0x0014 || pdev->device == 0x0015))
+ return true;
+
+ return false;
+}
+EXPORT_SYMBOL_GPL(usb_xhci_needs_pci_reset);
diff --git drivers/usb/host/pci-quirks.h drivers/usb/host/pci-quirks.h
index 655994480198..5582cbafecd4 100644
--- drivers/usb/host/pci-quirks.h
+++ drivers/usb/host/pci-quirks.h
@@ -15,6 +15,7 @@ void usb_asmedia_modifyflowcontrol(struct pci_dev *pdev);
void usb_enable_intel_xhci_ports(struct pci_dev *xhci_pdev);
void usb_disable_xhci_ports(struct pci_dev *xhci_pdev);
void sb800_prefetch(struct device *dev, int on);
+bool usb_xhci_needs_pci_reset(struct pci_dev *pdev);
#else
struct pci_dev;
static inline void usb_amd_quirk_pll_disable(void) {}
diff --git drivers/usb/host/xhci-pci.c drivers/usb/host/xhci-pci.c
index 1ef622ededfd..cefa223f9f08 100644
--- drivers/usb/host/xhci-pci.c
+++ drivers/usb/host/xhci-pci.c
@@ -285,6 +285,13 @@ static int xhci_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
driver = (struct hc_driver *)id->driver_data;
+ /* For some HW implementation, a XHCI reset is just not enough... */
+ if (usb_xhci_needs_pci_reset(dev)) {
+ dev_info(&dev->dev, "Resetting\n");
+ if (pci_reset_function_locked(dev))
+ dev_warn(&dev->dev, "Reset failed");
+ }
+
/* Prevent runtime suspending between USB-2 and USB-3 initialization */
pm_runtime_get_noresume(&dev->dev);
---
Commenting out the call to pci_reset_function_locked() makes 4.17.0-rc3
boot again so I think this bug qualifies as regression.
I investigated a bit, the culprit is pci_reset_secondary_bus().
pci_reset_function_locked ->
__pci_reset_function_locked ->
pci_reset_bridge_secondary_bus ->
pcibios_reset_secondary_bus ->
pci_reset_secondary_bus ->>>>>> here the system dies/hangs/stops
I can read the printk I put right before PCI_BRIDGE_CTL_BUS_RESET is
written in PCI_BRIDGE_CONTROL. I cannot read the printk I put right
after.
It seems my system doesn't like that PCI reset at all. I cannot swear
that it is completely frozen, some disk activity might be happening
shortly after, but my only option is to power cycle.
I cannot debug easily. I tried to boot with a patched module and just
attempted to work at it on a live machine but as soon I unload the
module I'm left without keyboard/mouse (apparently all the accessible
USB ports are going through the xhci-pci module) and at the moment I
cannot go via network.
This is the output of lspci -vt:
-[0000:00]-+-00.0 Intel Corporation 4th Gen Core Processor DRAM Controller
+-01.0-[01]----00.0 NVIDIA Corporation GM108M [GeForce 840M]
+-02.0 Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller
+-03.0 Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller
+-14.0 Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI
+-16.0 Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1
+-1a.0 Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2
+-1b.0 Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller
+-1c.0-[02]--
+-1c.2-[03]----00.0 Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
+-1c.3-[04]----00.0 Realtek Semiconductor Co., Ltd. RTS5229 PCI Express Card Reader
+-1c.4-[05]----00.0 Realtek Semiconductor Co., Ltd. RTL8821AE 802.11ac PCIe Wireless Network Adapter
+-1c.5-[06]----00.0 Renesas Technology Corp. uPD720202 USB 3.0 Host Controller
+-1d.0 Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1
+-1f.0 Intel Corporation C220 Series Chipset Family H81 Express LPC Controller
+-1f.2 Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode]
\-1f.3 Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller
The call to pci_reset_secondary_bus() is correctly applied to -1c.5-.
I suspect that -1c.5- might benefit from having PCI_DEV_FLAGS_NO_BUS_RESET
in its dev->dev_flags but I'm not sure that it's the proper fix and how
exactly it could end up there.
Any suggestion on how to proceed further?
Please ask more info if needed.
Kind regards,
Domenico
--
3B10 0CA1 8674 ACBA B4FE FCD2 CE5B CF17 9960 DE13
On 2 May 2018 at 10:06, Domenico Andreoli <[email protected]> wrote:
> Dear all,
>
> my home machine stopped to boot starting from kernel version 4.12.7.
>
> The last message I read is about resetting some USB3 bus. It's 100%
> reproducible also with any recent kernel up to 4.17.0-rc3.
>
> I bisected down to the following commit:
>
> commit 0e1f0eaed6c20db41ff61e024b361ee3ec9d686c (tag: my_broken_xhci)
> Author: Marc Zyngier <[email protected]>
> Date: Tue Aug 1 20:11:08 2017 -0500
>
> xhci: Reset Renesas uPD72020x USB controller for 32-bit DMA issue
>
Hello Domenico,
Long time no see :-)
Please refer to the following thread
https://marc.info/?l=linux-usb&m=151872295419486
for some discussion on this topic, and a possible workaround.
Regards,
Ard.
> commit 8466489ef5ba48272ba4fa4ea9f8f403306de4c7 upstream.
>
> The Renesas uPD72020x XHCI controller seems to suffer from a really
> annoying bug, where it may retain some of its DMA programming across a XHCI
> reset, and despite the driver correctly programming new DMA addresses.
> This is visible if the device has been using 64-bit DMA addresses, and is
> then switched to using 32-bit DMA addresses. The top 32 bits of the
> address (now zero) are ignored are replaced by the 32 bits from the
> *previous* programming. Sticking with 64-bit DMA always works, but doesn't
> seem very appropriate.
>
> A PCI reset of the device restores the normal functionality, which is done
> at probe time. Unfortunately, this has to be done before any quirk has
> been discovered, hence the intrusive nature of the fix.
>
> Tested-by: Ard Biesheuvel <[email protected]>
> Signed-off-by: Marc Zyngier <[email protected]>
> Signed-off-by: Bjorn Helgaas <[email protected]>
> Acked-by: Mathias Nyman <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
> diff --git drivers/usb/host/pci-quirks.c drivers/usb/host/pci-quirks.c
> index 5f4ca7890435..c8f38649f749 100644
> --- drivers/usb/host/pci-quirks.c
> +++ drivers/usb/host/pci-quirks.c
> @@ -1157,3 +1157,23 @@ static void quirk_usb_early_handoff(struct pci_dev *pdev)
> }
> DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_ANY_ID, PCI_ANY_ID,
> PCI_CLASS_SERIAL_USB, 8, quirk_usb_early_handoff);
> +
> +bool usb_xhci_needs_pci_reset(struct pci_dev *pdev)
> +{
> + /*
> + * Our dear uPD72020{1,2} friend only partially resets when
> + * asked to via the XHCI interface, and may end up doing DMA
> + * at the wrong addresses, as it keeps the top 32bit of some
> + * addresses from its previous programming under obscure
> + * circumstances.
> + * Give it a good wack at probe time. Unfortunately, this
> + * needs to happen before we've had a chance to discover any
> + * quirk, or the system will be in a rather bad state.
> + */
> + if (pdev->vendor == PCI_VENDOR_ID_RENESAS &&
> + (pdev->device == 0x0014 || pdev->device == 0x0015))
> + return true;
> +
> + return false;
> +}
> +EXPORT_SYMBOL_GPL(usb_xhci_needs_pci_reset);
> diff --git drivers/usb/host/pci-quirks.h drivers/usb/host/pci-quirks.h
> index 655994480198..5582cbafecd4 100644
> --- drivers/usb/host/pci-quirks.h
> +++ drivers/usb/host/pci-quirks.h
> @@ -15,6 +15,7 @@ void usb_asmedia_modifyflowcontrol(struct pci_dev *pdev);
> void usb_enable_intel_xhci_ports(struct pci_dev *xhci_pdev);
> void usb_disable_xhci_ports(struct pci_dev *xhci_pdev);
> void sb800_prefetch(struct device *dev, int on);
> +bool usb_xhci_needs_pci_reset(struct pci_dev *pdev);
> #else
> struct pci_dev;
> static inline void usb_amd_quirk_pll_disable(void) {}
> diff --git drivers/usb/host/xhci-pci.c drivers/usb/host/xhci-pci.c
> index 1ef622ededfd..cefa223f9f08 100644
> --- drivers/usb/host/xhci-pci.c
> +++ drivers/usb/host/xhci-pci.c
> @@ -285,6 +285,13 @@ static int xhci_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
>
> driver = (struct hc_driver *)id->driver_data;
>
> + /* For some HW implementation, a XHCI reset is just not enough... */
> + if (usb_xhci_needs_pci_reset(dev)) {
> + dev_info(&dev->dev, "Resetting\n");
> + if (pci_reset_function_locked(dev))
> + dev_warn(&dev->dev, "Reset failed");
> + }
> +
> /* Prevent runtime suspending between USB-2 and USB-3 initialization */
> pm_runtime_get_noresume(&dev->dev);
>
> ---
>
> Commenting out the call to pci_reset_function_locked() makes 4.17.0-rc3
> boot again so I think this bug qualifies as regression.
>
> I investigated a bit, the culprit is pci_reset_secondary_bus().
>
> pci_reset_function_locked ->
> __pci_reset_function_locked ->
> pci_reset_bridge_secondary_bus ->
> pcibios_reset_secondary_bus ->
> pci_reset_secondary_bus ->>>>>> here the system dies/hangs/stops
>
> I can read the printk I put right before PCI_BRIDGE_CTL_BUS_RESET is
> written in PCI_BRIDGE_CONTROL. I cannot read the printk I put right
> after.
>
> It seems my system doesn't like that PCI reset at all. I cannot swear
> that it is completely frozen, some disk activity might be happening
> shortly after, but my only option is to power cycle.
>
> I cannot debug easily. I tried to boot with a patched module and just
> attempted to work at it on a live machine but as soon I unload the
> module I'm left without keyboard/mouse (apparently all the accessible
> USB ports are going through the xhci-pci module) and at the moment I
> cannot go via network.
>
> This is the output of lspci -vt:
>
> -[0000:00]-+-00.0 Intel Corporation 4th Gen Core Processor DRAM Controller
> +-01.0-[01]----00.0 NVIDIA Corporation GM108M [GeForce 840M]
> +-02.0 Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller
> +-03.0 Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller
> +-14.0 Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI
> +-16.0 Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1
> +-1a.0 Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2
> +-1b.0 Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller
> +-1c.0-[02]--
> +-1c.2-[03]----00.0 Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> +-1c.3-[04]----00.0 Realtek Semiconductor Co., Ltd. RTS5229 PCI Express Card Reader
> +-1c.4-[05]----00.0 Realtek Semiconductor Co., Ltd. RTL8821AE 802.11ac PCIe Wireless Network Adapter
> +-1c.5-[06]----00.0 Renesas Technology Corp. uPD720202 USB 3.0 Host Controller
> +-1d.0 Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1
> +-1f.0 Intel Corporation C220 Series Chipset Family H81 Express LPC Controller
> +-1f.2 Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode]
> \-1f.3 Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller
>
> The call to pci_reset_secondary_bus() is correctly applied to -1c.5-.
>
> I suspect that -1c.5- might benefit from having PCI_DEV_FLAGS_NO_BUS_RESET
> in its dev->dev_flags but I'm not sure that it's the proper fix and how
> exactly it could end up there.
>
> Any suggestion on how to proceed further?
>
> Please ask more info if needed.
>
> Kind regards,
> Domenico
>
> --
> 3B10 0CA1 8674 ACBA B4FE FCD2 CE5B CF17 9960 DE13
On Wed, May 02, 2018 at 10:22:05AM +0200, Ard Biesheuvel wrote:
> On 2 May 2018 at 10:06, Domenico Andreoli <[email protected]> wrote:
> > Dear all,
> >
> > my home machine stopped to boot starting from kernel version 4.12.7.
> >
> > The last message I read is about resetting some USB3 bus. It's 100%
> > reproducible also with any recent kernel up to 4.17.0-rc3.
> >
> > I bisected down to the following commit:
> >
> > commit 0e1f0eaed6c20db41ff61e024b361ee3ec9d686c (tag: my_broken_xhci)
> > Author: Marc Zyngier <[email protected]>
> > Date: Tue Aug 1 20:11:08 2017 -0500
> >
> > xhci: Reset Renesas uPD72020x USB controller for 32-bit DMA issue
> >
>
> Hello Domenico,
Sir..
>
> Long time no see :-)
Instead I see you pretty much everywhere :P
>
> Please refer to the following thread
>
> https://marc.info/?l=linux-usb&m=151872295419486
>
> for some discussion on this topic, and a possible workaround.
Interesting, thanks for the pointer!
I'm happy to learn I've this nice piece of HW in my machine. It might
have at least some other issue while trying to hibernate (there is a
fair chance for it to abort the process because of some "business").
I tried the usb/uPD720202-reset branch from Marc (with some massaging
due to quirks now being >32) and my system boots again. I think it's
more because of the PCI reset is removed, I am not affected by the
original issue it was meant to fix.
>
> Regards,
> Ard.
thanks,
Dom
--
3B10 0CA1 8674 ACBA B4FE FCD2 CE5B CF17 9960 DE13
On 2 May 2018 at 11:18, Domenico Andreoli <[email protected]> wrote:
> On Wed, May 02, 2018 at 10:22:05AM +0200, Ard Biesheuvel wrote:
>> On 2 May 2018 at 10:06, Domenico Andreoli <[email protected]> wrote:
>> > Dear all,
>> >
>> > my home machine stopped to boot starting from kernel version 4.12.7.
>> >
>> > The last message I read is about resetting some USB3 bus. It's 100%
>> > reproducible also with any recent kernel up to 4.17.0-rc3.
>> >
>> > I bisected down to the following commit:
>> >
>> > commit 0e1f0eaed6c20db41ff61e024b361ee3ec9d686c (tag: my_broken_xhci)
>> > Author: Marc Zyngier <[email protected]>
>> > Date: Tue Aug 1 20:11:08 2017 -0500
>> >
>> > xhci: Reset Renesas uPD72020x USB controller for 32-bit DMA issue
>> >
>>
>> Hello Domenico,
>
> Sir..
>
>>
>> Long time no see :-)
>
> Instead I see you pretty much everywhere :P
>
>>
>> Please refer to the following thread
>>
>> https://marc.info/?l=linux-usb&m=151872295419486
>>
>> for some discussion on this topic, and a possible workaround.
>
> Interesting, thanks for the pointer!
>
> I'm happy to learn I've this nice piece of HW in my machine. It might
> have at least some other issue while trying to hibernate (there is a
> fair chance for it to abort the process because of some "business").
>
> I tried the usb/uPD720202-reset branch from Marc (with some massaging
> due to quirks now being >32) and my system boots again. I think it's
> more because of the PCI reset is removed, I am not affected by the
> original issue it was meant to fix.
>
Indeed. The majority of x86 machines never perform DMA above 4 GB in
UEFI, so the issue does not occur, although it could still affect a
kernel with IOMMU on kexec'ed from a kernel with IOMMU off.