2023-12-08 22:24:44

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] PCI: Allocate maximum available buses to help extending the daisy chain

[+cc Mika, Maciej]

On Wed, Aug 16, 2023 at 10:49:23AM +0530, Sanath S wrote:
> In the case of Thunderbolt, it contains a PCIe switch and one or
> more hotplug-capable PCIe downstream ports where the daisy chain
> can be extended.
>
> Currently when a Thunderbolt Dock is plugged in during S5/Reboot,
> System BIOS allocates a very minimal number of buses for bridges and
> hot-plug capable PCIe downstream ports to enumerate the dock during
> boot. Because of this, we run out of bus space pretty quickly when
> more PCIe devices are attached to hotplug downstream ports in order
> to extend the chain.
>
> Before:
> +-04.0
> +-04.1-[63-c1]----00.0-[64-69]--+-00.0-[65]--
> | +-01.0-[66]--
> | +-02.0-[67]--
> | +-03.0-[68]--
> | \-04.0-[69]--
> +-08.0

Looks like a clear issue here because there's no other use for
buses 70-c1. But what would happen if there were more hotplug-capable
downstream ports, e.g., assume one at 08.1 leading to [bus c2-c7]?

The 04.1 bridge has a lot of space, but 08.1 has very little. With
this patch, would we distribute it more evenly across 04.1 and 08.1?
If not, I think we'll just have the same problem when somebody plugs
in a similar hierarchy at 08.1.

> In case of a thunderbolt capable bridge, reconfigure the buses allocated
> by BIOS to the maximum available buses. So that the hot-plug bridges gets
> maximum buses and chain can be extended to accommodate more PCIe devices.
> This fix is necessary for all the PCIe downstream ports where the daisy
> chain can be extended.
>
> After:
> +-04.0
> +-04.1-[63-c1]----00.0-[64-c1]--+-00.0-[65]--
> | +-01.0-[66-84]--
> | +-02.0-[85-a3]--
> | +-03.0-[a4-c0]--
> | \-04.0-[c1]--
> +-08.0

This doesn't look like anything specific to Thunderbolt; it's just
that we don't do a good job of reassigning bus numbers in general,
right? We shouldn't just punt and say "BIOS should have done
something" because not all machines *have* BIOS, and the OS can
reconfigure bus numbers as needed. The patch certainly isn't
Thunderbolt-specific.

I guess this patch is on hold for now because the kernel test robot
complained:
https://lore.kernel.org/r/[email protected]
and this hasn't been resolved or explained yet.

> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216000
> Signed-off-by: Sanjay R Mehta <[email protected]>
> Signed-off-by: Sanath S <[email protected]>
> ---
> drivers/pci/probe.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 8bac3ce02609..ab7e90ef2382 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1263,6 +1263,8 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
> bool fixed_buses;
> u8 fixed_sec, fixed_sub;
> int next_busnr;
> + int start = bus->busn_res.start;
> + int end = bus->busn_res.end;
>
> /*
> * Make sure the bridge is powered on to be able to access config
> @@ -1292,6 +1294,13 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
> broken = 1;
> }
>
> + /* Reconfigure, If maximum buses are not allocated */
> + if (!pass && start != 0 && end != 0xff && subordinate != end) {

I don't quite understand the test here. In the "Before" example
above, I think bus->busn_res is [bus 63-c1], and subordinate is 69.
That certainly makes this condition true, but wouldn't you also want
to reallocate bus numbers if bus->busn_res were [bus 63-ff] and
subordinate were 69?

> + pci_info(dev, "Bridge has subordinate 0x%x but max busn 0x%x, reconfiguring\n",

Most other logging here starts with lower-case, e.g., "bridge has ..."
Print the bus numbers in the typical format ("%02x"). Maybe use "%pR"
and &bus->busn_res for the first part.

> + subordinate, end);
> + broken = 1;
> + }
> +
> /*
> * Disable Master-Abort Mode during probing to avoid reporting of
> * bus errors in some architectures.
> --
> 2.34.1
>


2023-12-08 22:30:02

by Mario Limonciello

[permalink] [raw]
Subject: Re: [PATCH] PCI: Allocate maximum available buses to help extending the daisy chain

On 12/8/2023 16:24, Bjorn Helgaas wrote:
> [+cc Mika, Maciej]
>
> On Wed, Aug 16, 2023 at 10:49:23AM +0530, Sanath S wrote:
>> In the case of Thunderbolt, it contains a PCIe switch and one or
>> more hotplug-capable PCIe downstream ports where the daisy chain
>> can be extended.
>>
>> Currently when a Thunderbolt Dock is plugged in during S5/Reboot,
>> System BIOS allocates a very minimal number of buses for bridges and
>> hot-plug capable PCIe downstream ports to enumerate the dock during
>> boot. Because of this, we run out of bus space pretty quickly when
>> more PCIe devices are attached to hotplug downstream ports in order
>> to extend the chain.
>>
>> Before:
>> +-04.0
>> +-04.1-[63-c1]----00.0-[64-69]--+-00.0-[65]--
>> | +-01.0-[66]--
>> | +-02.0-[67]--
>> | +-03.0-[68]--
>> | \-04.0-[69]--
>> +-08.0
>
> Looks like a clear issue here because there's no other use for
> buses 70-c1. But what would happen if there were more hotplug-capable
> downstream ports, e.g., assume one at 08.1 leading to [bus c2-c7]?
>
> The 04.1 bridge has a lot of space, but 08.1 has very little. With
> this patch, would we distribute it more evenly across 04.1 and 08.1?
> If not, I think we'll just have the same problem when somebody plugs
> in a similar hierarchy at 08.1.
>
>> In case of a thunderbolt capable bridge, reconfigure the buses allocated
>> by BIOS to the maximum available buses. So that the hot-plug bridges gets
>> maximum buses and chain can be extended to accommodate more PCIe devices.
>> This fix is necessary for all the PCIe downstream ports where the daisy
>> chain can be extended.
>>
>> After:
>> +-04.0
>> +-04.1-[63-c1]----00.0-[64-c1]--+-00.0-[65]--
>> | +-01.0-[66-84]--
>> | +-02.0-[85-a3]--
>> | +-03.0-[a4-c0]--
>> | \-04.0-[c1]--
>> +-08.0
>
> This doesn't look like anything specific to Thunderbolt; it's just
> that we don't do a good job of reassigning bus numbers in general,
> right? We shouldn't just punt and say "BIOS should have done
> something" because not all machines *have* BIOS, and the OS can
> reconfigure bus numbers as needed. The patch certainly isn't
> Thunderbolt-specific.

From the discussions Sanath and I have been in related to this issue
the BIOS is pretty static with it's initialization under the presumption
that the OS will rebalance things if necessary.

>
> I guess this patch is on hold for now because the kernel test robot
> complained:
> https://lore.kernel.org/r/[email protected]
> and this hasn't been resolved or explained yet.
>

For this particular issue it's being approached a different way.

Windows never rebalances things but doesn't suffer from this issue.
That's because Windows actually does a "Downstream port reset" when it
encounters a USB4 router.

Sanath posted a quirk that aligned this behavior when encountering an
AMD USB4 router, but as part of the discussion I suggested that we do it
for everyone.

https://lore.kernel.org/linux-usb/[email protected]/

So Sanath has a new patch that does this that is under testing right now
and will be posted soon.

Thanks!

>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216000
>> Signed-off-by: Sanjay R Mehta <[email protected]>
>> Signed-off-by: Sanath S <[email protected]>
>> ---
>> drivers/pci/probe.c | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 8bac3ce02609..ab7e90ef2382 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -1263,6 +1263,8 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
>> bool fixed_buses;
>> u8 fixed_sec, fixed_sub;
>> int next_busnr;
>> + int start = bus->busn_res.start;
>> + int end = bus->busn_res.end;
>>
>> /*
>> * Make sure the bridge is powered on to be able to access config
>> @@ -1292,6 +1294,13 @@ static int pci_scan_bridge_extend(struct pci_bus *bus, struct pci_dev *dev,
>> broken = 1;
>> }
>>
>> + /* Reconfigure, If maximum buses are not allocated */
>> + if (!pass && start != 0 && end != 0xff && subordinate != end) {
>
> I don't quite understand the test here. In the "Before" example
> above, I think bus->busn_res is [bus 63-c1], and subordinate is 69.
> That certainly makes this condition true, but wouldn't you also want
> to reallocate bus numbers if bus->busn_res were [bus 63-ff] and
> subordinate were 69?
>
>> + pci_info(dev, "Bridge has subordinate 0x%x but max busn 0x%x, reconfiguring\n",
>
> Most other logging here starts with lower-case, e.g., "bridge has ..."
> Print the bus numbers in the typical format ("%02x"). Maybe use "%pR"
> and &bus->busn_res for the first part.
>
>> + subordinate, end);
>> + broken = 1;
>> + }
>> +
>> /*
>> * Disable Master-Abort Mode during probing to avoid reporting of
>> * bus errors in some architectures.
>> --
>> 2.34.1
>>

2023-12-08 22:44:15

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] PCI: Allocate maximum available buses to help extending the daisy chain

On Fri, Dec 08, 2023 at 04:29:42PM -0600, Mario Limonciello wrote:
> On 12/8/2023 16:24, Bjorn Helgaas wrote:
> > On Wed, Aug 16, 2023 at 10:49:23AM +0530, Sanath S wrote:
> > > In the case of Thunderbolt, it contains a PCIe switch and one or
> > > more hotplug-capable PCIe downstream ports where the daisy chain
> > > can be extended.
> > >
> > > Currently when a Thunderbolt Dock is plugged in during S5/Reboot,
> > > System BIOS allocates a very minimal number of buses for bridges and
> > > hot-plug capable PCIe downstream ports to enumerate the dock during
> > > boot. Because of this, we run out of bus space pretty quickly when
> > > more PCIe devices are attached to hotplug downstream ports in order
> > > to extend the chain.
> > >
> > > Before:
> > > +-04.0
> > > +-04.1-[63-c1]----00.0-[64-69]--+-00.0-[65]--
> > > | +-01.0-[66]--
> > > | +-02.0-[67]--
> > > | +-03.0-[68]--
> > > | \-04.0-[69]--
> > > +-08.0
> >
> > Looks like a clear issue here because there's no other use for
> > buses 70-c1. But what would happen if there were more hotplug-capable
> > downstream ports, e.g., assume one at 08.1 leading to [bus c2-c7]?
> >
> > The 04.1 bridge has a lot of space, but 08.1 has very little. With
> > this patch, would we distribute it more evenly across 04.1 and 08.1?
> > If not, I think we'll just have the same problem when somebody plugs
> > in a similar hierarchy at 08.1.
> >
> > > In case of a thunderbolt capable bridge, reconfigure the buses allocated
> > > by BIOS to the maximum available buses. So that the hot-plug bridges gets
> > > maximum buses and chain can be extended to accommodate more PCIe devices.
> > > This fix is necessary for all the PCIe downstream ports where the daisy
> > > chain can be extended.
> > >
> > > After:
> > > +-04.0
> > > +-04.1-[63-c1]----00.0-[64-c1]--+-00.0-[65]--
> > > | +-01.0-[66-84]--
> > > | +-02.0-[85-a3]--
> > > | +-03.0-[a4-c0]--
> > > | \-04.0-[c1]--
> > > +-08.0
> >
> > This doesn't look like anything specific to Thunderbolt; it's just
> > that we don't do a good job of reassigning bus numbers in general,
> > right? We shouldn't just punt and say "BIOS should have done
> > something" because not all machines *have* BIOS, and the OS can
> > reconfigure bus numbers as needed. The patch certainly isn't
> > Thunderbolt-specific.
>
> From the discussions Sanath and I have been in related to this issue
> the BIOS is pretty static with it's initialization under the
> presumption that the OS will rebalance things if necessary.
> ...

> For this particular issue it's being approached a different way.
>
> Windows never rebalances things but doesn't suffer from this issue.
> That's because Windows actually does a "Downstream port reset" when
> it encounters a USB4 router.
>
> Sanath posted a quirk that aligned this behavior when encountering
> an AMD USB4 router, but as part of the discussion I suggested that
> we do it for everyone.
>
> https://lore.kernel.org/linux-usb/[email protected]/
>
> So Sanath has a new patch that does this that is under testing right
> now and will be posted soon.

Hmm, ok. I don't know what a "downstream port reset" does or how it
resolves the bus number allocation issue, but I'm happy if you have a
fix that doesn't need PCI core changes.

Bjorn

2023-12-11 21:49:05

by Mario Limonciello

[permalink] [raw]
Subject: Re: [PATCH] PCI: Allocate maximum available buses to help extending the daisy chain

On 12/8/2023 16:44, Bjorn Helgaas wrote:
> On Fri, Dec 08, 2023 at 04:29:42PM -0600, Mario Limonciello wrote:
>> On 12/8/2023 16:24, Bjorn Helgaas wrote:
>>> On Wed, Aug 16, 2023 at 10:49:23AM +0530, Sanath S wrote:
>>>> In the case of Thunderbolt, it contains a PCIe switch and one or
>>>> more hotplug-capable PCIe downstream ports where the daisy chain
>>>> can be extended.
>>>>
>>>> Currently when a Thunderbolt Dock is plugged in during S5/Reboot,
>>>> System BIOS allocates a very minimal number of buses for bridges and
>>>> hot-plug capable PCIe downstream ports to enumerate the dock during
>>>> boot. Because of this, we run out of bus space pretty quickly when
>>>> more PCIe devices are attached to hotplug downstream ports in order
>>>> to extend the chain.
>>>>
>>>> Before:
>>>> +-04.0
>>>> +-04.1-[63-c1]----00.0-[64-69]--+-00.0-[65]--
>>>> | +-01.0-[66]--
>>>> | +-02.0-[67]--
>>>> | +-03.0-[68]--
>>>> | \-04.0-[69]--
>>>> +-08.0
>>>
>>> Looks like a clear issue here because there's no other use for
>>> buses 70-c1. But what would happen if there were more hotplug-capable
>>> downstream ports, e.g., assume one at 08.1 leading to [bus c2-c7]?
>>>
>>> The 04.1 bridge has a lot of space, but 08.1 has very little. With
>>> this patch, would we distribute it more evenly across 04.1 and 08.1?
>>> If not, I think we'll just have the same problem when somebody plugs
>>> in a similar hierarchy at 08.1.
>>>
>>>> In case of a thunderbolt capable bridge, reconfigure the buses allocated
>>>> by BIOS to the maximum available buses. So that the hot-plug bridges gets
>>>> maximum buses and chain can be extended to accommodate more PCIe devices.
>>>> This fix is necessary for all the PCIe downstream ports where the daisy
>>>> chain can be extended.
>>>>
>>>> After:
>>>> +-04.0
>>>> +-04.1-[63-c1]----00.0-[64-c1]--+-00.0-[65]--
>>>> | +-01.0-[66-84]--
>>>> | +-02.0-[85-a3]--
>>>> | +-03.0-[a4-c0]--
>>>> | \-04.0-[c1]--
>>>> +-08.0
>>>
>>> This doesn't look like anything specific to Thunderbolt; it's just
>>> that we don't do a good job of reassigning bus numbers in general,
>>> right? We shouldn't just punt and say "BIOS should have done
>>> something" because not all machines *have* BIOS, and the OS can
>>> reconfigure bus numbers as needed. The patch certainly isn't
>>> Thunderbolt-specific.
>>
>> From the discussions Sanath and I have been in related to this issue
>> the BIOS is pretty static with it's initialization under the
>> presumption that the OS will rebalance things if necessary.
>> ...
>
>> For this particular issue it's being approached a different way.
>>
>> Windows never rebalances things but doesn't suffer from this issue.
>> That's because Windows actually does a "Downstream port reset" when
>> it encounters a USB4 router.
>>
>> Sanath posted a quirk that aligned this behavior when encountering
>> an AMD USB4 router, but as part of the discussion I suggested that
>> we do it for everyone.
>>
>> https://lore.kernel.org/linux-usb/[email protected]/
>>
>> So Sanath has a new patch that does this that is under testing right
>> now and will be posted soon.
>
> Hmm, ok. I don't know what a "downstream port reset" does or how it
> resolves the bus number allocation issue, but I'm happy if you have a
> fix that doesn't need PCI core changes.
>
> Bjorn

The issue is specifically with resources that were assigned with BIOS in
this "static case". The downstream port reset ends up resetting the
topology and thus the resources get assigned by Linux instead and will
be better balanced for more devices to be daisy chained.

2023-12-12 20:55:54

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] PCI: Allocate maximum available buses to help extending the daisy chain

On Mon, Dec 11, 2023 at 03:48:41PM -0600, Mario Limonciello wrote:
> On 12/8/2023 16:44, Bjorn Helgaas wrote:
> > On Fri, Dec 08, 2023 at 04:29:42PM -0600, Mario Limonciello wrote:
> > > On 12/8/2023 16:24, Bjorn Helgaas wrote:
> > > > On Wed, Aug 16, 2023 at 10:49:23AM +0530, Sanath S wrote:
> > > > > In the case of Thunderbolt, it contains a PCIe switch and one or
> > > > > more hotplug-capable PCIe downstream ports where the daisy chain
> > > > > can be extended.
> > > > >
> > > > > Currently when a Thunderbolt Dock is plugged in during S5/Reboot,
> > > > > System BIOS allocates a very minimal number of buses for bridges and
> > > > > hot-plug capable PCIe downstream ports to enumerate the dock during
> > > > > boot. Because of this, we run out of bus space pretty quickly when
> > > > > more PCIe devices are attached to hotplug downstream ports in order
> > > > > to extend the chain.
> > > > >
> > > > > Before:
> > > > > +-04.0
> > > > > +-04.1-[63-c1]----00.0-[64-69]--+-00.0-[65]--
> > > > > | +-01.0-[66]--
> > > > > | +-02.0-[67]--
> > > > > | +-03.0-[68]--
> > > > > | \-04.0-[69]--
> > > > > +-08.0
> > > >
> > > > Looks like a clear issue here because there's no other use for
> > > > buses 70-c1. But what would happen if there were more
> > > > hotplug-capable downstream ports, e.g., assume one at 08.1
> > > > leading to [bus c2-c7]?
> > > >
> > > > The 04.1 bridge has a lot of space, but 08.1 has very little.
> > > > With this patch, would we distribute it more evenly across
> > > > 04.1 and 08.1? If not, I think we'll just have the same
> > > > problem when somebody plugs in a similar hierarchy at 08.1.
> > > >
> > > > > In case of a thunderbolt capable bridge, reconfigure the
> > > > > buses allocated by BIOS to the maximum available buses. So
> > > > > that the hot-plug bridges gets maximum buses and chain can
> > > > > be extended to accommodate more PCIe devices. This fix is
> > > > > necessary for all the PCIe downstream ports where the daisy
> > > > > chain can be extended.
> > > > >
> > > > > After:
> > > > > +-04.0
> > > > > +-04.1-[63-c1]----00.0-[64-c1]--+-00.0-[65]--
> > > > > | +-01.0-[66-84]--
> > > > > | +-02.0-[85-a3]--
> > > > > | +-03.0-[a4-c0]--
> > > > > | \-04.0-[c1]--
> > > > > +-08.0
> > > >
> > > > This doesn't look like anything specific to Thunderbolt; it's just
> > > > that we don't do a good job of reassigning bus numbers in general,
> > > > right? We shouldn't just punt and say "BIOS should have done
> > > > something" because not all machines *have* BIOS, and the OS can
> > > > reconfigure bus numbers as needed. The patch certainly isn't
> > > > Thunderbolt-specific.
> > >
> > > From the discussions Sanath and I have been in related to this issue
> > > the BIOS is pretty static with it's initialization under the
> > > presumption that the OS will rebalance things if necessary.
> > > ...
> >
> > > For this particular issue it's being approached a different way.
> > >
> > > Windows never rebalances things but doesn't suffer from this issue.
> > > That's because Windows actually does a "Downstream port reset" when
> > > it encounters a USB4 router.
> > >
> > > Sanath posted a quirk that aligned this behavior when encountering
> > > an AMD USB4 router, but as part of the discussion I suggested that
> > > we do it for everyone.
> > >
> > > https://lore.kernel.org/linux-usb/[email protected]/
> > >
> > > So Sanath has a new patch that does this that is under testing right
> > > now and will be posted soon.
> >
> > Hmm, ok. I don't know what a "downstream port reset" does or how it
> > resolves the bus number allocation issue, but I'm happy if you have a
> > fix that doesn't need PCI core changes.
>
> The issue is specifically with resources that were assigned with BIOS in
> this "static case". The downstream port reset ends up resetting the
> topology and thus the resources get assigned by Linux instead and will
> be better balanced for more devices to be daisy chained.

It sounds like the downstream port reset maybe just resets the bridge
secondary/subordinate bus numbers, which forces Linux to reassign
them? But Linux isn't smart enough to proactively reassign them? If
so, the reset sounds a little like a band-aid, not a real fix, but I'm
guessing nobody is signing up to rework that PCI core reassignment
code ;)

Bjorn