2023-12-13 00:36:55

by Igor Mammedov

[permalink] [raw]
Subject: [RFC 0/2] PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job

Hacks to mask a race between HBA scan job and bridge re-configuration(s)
during hotplug.

I don't like it a bit but it something that could be done quickly
and solves problems that were reported.

Other options to discuss/possibly more invasive:
1: make sure pci_assign_unassigned_bridge_resources() doesn't reconfigure
bridge if it's not necessary.
2. make SCSI_SCAN_ASYNC job wait till hotplug is finished for all slots on
the bridge or somehow restart the job if it fails
3. any other ideas?


1st reported: https://lore.kernel.org/r/[email protected]

CC: Dongli Zhang <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: Fiona Ebner <[email protected]>
CC: Thomas Lamprecht <[email protected]>

Igor Mammedov (2):
PCI: acpiphp: enable slot only if it hasn't been enabled already
PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a
time

drivers/pci/hotplug/acpiphp_glue.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

--
2.39.3


2023-12-13 08:13:01

by Dongli Zhang

[permalink] [raw]
Subject: Re: [RFC 0/2] PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job

Hi Igor,

I am not able to reproduce the issue any longer with the two patches on
top of the mainline linux.

Thank you very much!

Dongli Zhang

On 12/12/23 16:36, Igor Mammedov wrote:
> Hacks to mask a race between HBA scan job and bridge re-configuration(s)
> during hotplug.
>
> I don't like it a bit but it something that could be done quickly
> and solves problems that were reported.
>
> Other options to discuss/possibly more invasive:
> 1: make sure pci_assign_unassigned_bridge_resources() doesn't reconfigure
> bridge if it's not necessary.
> 2. make SCSI_SCAN_ASYNC job wait till hotplug is finished for all slots on
> the bridge or somehow restart the job if it fails
> 3. any other ideas?
>
>
> 1st reported: https://urldefense.com/v3/__https://lore.kernel.org/r/[email protected]__;!!ACWV5N9M2RV99hQ!ORo96Nh22kv1Yj0pazd3c692djoLbWscgouJoyVG1c1CNQnYz-H7nPM7RIp8N-0qQjScZ7BgORR_Lm4oMGMl$
>
> CC: Dongli Zhang <[email protected]>
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: Fiona Ebner <[email protected]>
> CC: Thomas Lamprecht <[email protected]>
>
> Igor Mammedov (2):
> PCI: acpiphp: enable slot only if it hasn't been enabled already
> PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a
> time
>
> drivers/pci/hotplug/acpiphp_glue.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>

2023-12-13 18:11:34

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [RFC 0/2] PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job

On Wed, Dec 13, 2023 at 01:36:12AM +0100, Igor Mammedov wrote:
> Hacks to mask a race between HBA scan job and bridge re-configuration(s)
> during hotplug.
>
> I don't like it a bit but it something that could be done quickly
> and solves problems that were reported.

I agree, I don't like it either. Adding a 1s delay doesn't address
the real problem, and putting in a band-aid like this means the real
problem would likely never be addressed.

At this point the best option I see is to revert these:

cc22522fd55e2 ("PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus")
40613da52b13f ("PCI: acpiphp: Reassign resources on bridge if necessary")

I hate the fact that reverting them would mean the root bus hotplug
and ACPI bus check notifications would become issues again.

But keeping these commits even though they add a new different problem
that breaks things for somebody else seems worse to me.

Bjorn

> Other options to discuss/possibly more invasive:
> 1: make sure pci_assign_unassigned_bridge_resources() doesn't reconfigure
> bridge if it's not necessary.
> 2. make SCSI_SCAN_ASYNC job wait till hotplug is finished for all slots on
> the bridge or somehow restart the job if it fails
> 3. any other ideas?
>
>
> 1st reported: https://lore.kernel.org/r/[email protected]
>
> CC: Dongli Zhang <[email protected]>
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> CC: Fiona Ebner <[email protected]>
> CC: Thomas Lamprecht <[email protected]>
>
> Igor Mammedov (2):
> PCI: acpiphp: enable slot only if it hasn't been enabled already
> PCI: acpiphp: slowdown hotplug if hotplugging multiple devices at a
> time
>
> drivers/pci/hotplug/acpiphp_glue.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> --
> 2.39.3
>

2023-12-13 18:13:11

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC 0/2] PCI: acpiphp: workaround race between hotplug and SCSI_SCAN_ASYNC job

On Wed, Dec 13, 2023 at 7:11 PM Bjorn Helgaas <[email protected]> wrote:
>
> On Wed, Dec 13, 2023 at 01:36:12AM +0100, Igor Mammedov wrote:
> > Hacks to mask a race between HBA scan job and bridge re-configuration(s)
> > during hotplug.
> >
> > I don't like it a bit but it something that could be done quickly
> > and solves problems that were reported.
>
> I agree, I don't like it either. Adding a 1s delay doesn't address
> the real problem, and putting in a band-aid like this means the real
> problem would likely never be addressed.
>
> At this point the best option I see is to revert these:
>
> cc22522fd55e2 ("PCI: acpiphp: Use pci_assign_unassigned_bridge_resources() only for non-root bus")
> 40613da52b13f ("PCI: acpiphp: Reassign resources on bridge if necessary")
>
> I hate the fact that reverting them would mean the root bus hotplug
> and ACPI bus check notifications would become issues again.
>
> But keeping these commits even though they add a new different problem
> that breaks things for somebody else seems worse to me.

Agreed.