2020-07-23 09:52:24

by Lukas Wunner

[permalink] [raw]
Subject: Re: [PCI] 3233e41d3e: WARNING:at_drivers/pci/pci.c:#pci_reset_hotplug_slot

On Thu, Jul 23, 2020 at 05:13:06PM +0800, kernel test robot wrote:
> FYI, we noticed the following commit (built with gcc-9):
[...]
> commit: 3233e41d3e8ebcd44e92da47ffed97fd49b84278 ("[PATCH] PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock")
[...]
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> [ 0.971752] WARNING: CPU: 0 PID: 1 at drivers/pci/pci.c:4905 pci_reset_hotplug_slot+0x70/0x80

Thank you, trusty robot.

I botched the call to lockdep_assert_held_write(), it should have been
conditional on "if (probe)".

Happy to respin the patch, but I'd like to hear opinions on the locking
issues surrounding xen and octeon (and the patch in general).

In particular, would a solution be entertained wherein the pci_dev is
reset by the PCI core after driver unbinding, contingent on a flag which
is set by a PCI driver to indicate that the pci_dev is returned to the
core in an unclean state?

Also, why does xen require a device reset on bind?

Thanks!

Lukas


2020-09-09 20:42:19

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PCI] 3233e41d3e: WARNING:at_drivers/pci/pci.c:#pci_reset_hotplug_slot

On Thu, Jul 23, 2020 at 11:51:52AM +0200, Lukas Wunner wrote:
> On Thu, Jul 23, 2020 at 05:13:06PM +0800, kernel test robot wrote:
> > FYI, we noticed the following commit (built with gcc-9):
> [...]
> > commit: 3233e41d3e8ebcd44e92da47ffed97fd49b84278 ("[PATCH] PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock")
> [...]
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> > [ 0.971752] WARNING: CPU: 0 PID: 1 at drivers/pci/pci.c:4905 pci_reset_hotplug_slot+0x70/0x80
>
> Thank you, trusty robot.
>
> I botched the call to lockdep_assert_held_write(), it should have been
> conditional on "if (probe)".
>
> Happy to respin the patch, but I'd like to hear opinions on the locking
> issues surrounding xen and octeon (and the patch in general).

I wish liquidio/octeon weren't a special case. Why should that driver
reset the device when unbinding when no other drivers do?

Looks like this was added by 70535350e26f ("liquidio: with embedded
f/w, don't reload f/w, issue pf flr at exit"). Maybe Rick will chime
in.

> In particular, would a solution be entertained wherein the pci_dev is
> reset by the PCI core after driver unbinding, contingent on a flag which
> is set by a PCI driver to indicate that the pci_dev is returned to the
> core in an unclean state?

How would we do this? The PCI core isn't called after unbinding, is
it? So I guess we'd have to have a queue and a worker thread to
process it?

Device removal also has nasty locking issues, and a queue might help
solve those, too. Might also help in the problematic case of
40f11adc7cd9 ("PCI: Avoid race while enabling upstream bridges"),
which we had to revert.

> Also, why does xen require a device reset on bind?
>
> Thanks!
>
> Lukas