2021-07-13 07:58:23

by Kai-Heng Feng

Subject: [PATCH 1/1] PCI: Coalesce host bridge contiguous apertures without sorting

Commit 65db04053efe ("PCI: Coalesce host bridge contiguous apertures")
sorts the resources by address so the resources can be swapped.

Before:
PCI host bridge to bus 0002:00
pci_bus 0002:00: root bus resource [io 0x0000-0xffff]
pci_bus 0002:00: root bus resource [mem 0xd80000000-0xdffffffff] (bus address [0x80000000-0xffffffff])
pci_bus 0002:00: root bus resource [mem 0xc0ee00000-0xc0eefffff] (bus address [0x00000000-0x000fffff])

And after:
PCI host bridge to bus 0002:00
pci_bus 0002:00: root bus resource [io 0x0000-0xffff]
pci_bus 0002:00: root bus resource [mem 0xc0ee00000-0xc0eefffff] (bus address [0x00000000-0x000fffff])
pci_bus 0002:00: root bus resource [mem 0xd80000000-0xdffffffff] (bus address [0x80000000-0xffffffff])

However, sorting the resources makes NVMe stop working on QEMU ppc:sam460ex.

Resources in the original issue are already in ascending order:
pci_bus 0000:00: root bus resource [mem 0x10020200000-0x100303fffff window]
pci_bus 0000:00: root bus resource [mem 0x10030400000-0x100401fffff window]

So remove the sorting part to resolve the issue.

Reported-by: Guenter Roeck <[email protected]>
Fixes: 65db04053efe ("PCI: Coalesce host bridge contiguous apertures")
Signed-off-by: Kai-Heng Feng <[email protected]>
---
drivers/pci/probe.c | 31 +++++++++++++++++++++++++++----
1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 79177ac37880..5de157600466 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -877,11 +877,11 @@ static void pci_set_bus_msi_domain(struct pci_bus *bus)
static int pci_register_host_bridge(struct pci_host_bridge *bridge)
{
struct device *parent = bridge->dev.parent;
- struct resource_entry *window, *n;
+ struct resource_entry *window, *next, *n;
struct pci_bus *bus, *b;
- resource_size_t offset;
+ resource_size_t offset, next_offset;
LIST_HEAD(resources);
- struct resource *res;
+ struct resource *res, *next_res;
char addr[64], *fmt;
const char *name;
int err;
@@ -961,11 +961,34 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge)
if (nr_node_ids > 1 && pcibus_to_node(bus) == NUMA_NO_NODE)
dev_warn(&bus->dev, "Unknown NUMA node; performance will be reduced\n");

+ /* Coalesce contiguous windows */
+ resource_list_for_each_entry_safe(window, n, &resources) {
+ if (list_is_last(&window->node, &resources))
+ break;
+
+ next = list_next_entry(window, node);
+ offset = window->offset;
+ res = window->res;
+ next_offset = next->offset;
+ next_res = next->res;
+
+ if (res->flags != next_res->flags || offset != next_offset)
+ continue;
+
+ if (res->end + 1 == next_res->start) {
+ next_res->start = res->start;
+ res->flags = res->start = res->end = 0;
+ }
+ }
+
/* Add initial resources to the bus */
resource_list_for_each_entry_safe(window, n, &resources) {
- list_move_tail(&window->node, &bridge->windows);
offset = window->offset;
res = window->res;
+ if (!res->end)
+ continue;
+
+ list_move_tail(&window->node, &bridge->windows);

if (res->flags & IORESOURCE_BUS)
pci_bus_insert_busn_res(bus, bus->number, res->end);
--
2.31.1


2021-07-13 08:48:28

by Guenter Roeck

Subject: Re: [PATCH 1/1] PCI: Coalesce host bridge contiguous apertures without sorting

On 7/13/21 12:57 AM, Kai-Heng Feng wrote:
> Commit 65db04053efe ("PCI: Coalesce host bridge contiguous apertures")
> sorts the resources by address so the resources can be swapped.
>
> Before:
> PCI host bridge to bus 0002:00
> pci_bus 0002:00: root bus resource [io 0x0000-0xffff]
> pci_bus 0002:00: root bus resource [mem 0xd80000000-0xdffffffff] (bus address [0x80000000-0xffffffff])
> pci_bus 0002:00: root bus resource [mem 0xc0ee00000-0xc0eefffff] (bus address [0x00000000-0x000fffff])
>
> And after:
> PCI host bridge to bus 0002:00
> pci_bus 0002:00: root bus resource [io 0x0000-0xffff]
> pci_bus 0002:00: root bus resource [mem 0xc0ee00000-0xc0eefffff] (bus address [0x00000000-0x000fffff])
> pci_bus 0002:00: root bus resource [mem 0xd80000000-0xdffffffff] (bus address [0x80000000-0xffffffff])
>
> However, sorting the resources makes NVMe stop working on QEMU ppc:sam460ex.
>
> Resources in the original issue are already in ascending order:
> pci_bus 0000:00: root bus resource [mem 0x10020200000-0x100303fffff window]
> pci_bus 0000:00: root bus resource [mem 0x10030400000-0x100401fffff window]
>
> So remove the sorting part to resolve the issue.
>
> Reported-by: Guenter Roeck <[email protected]>
> Fixes: 65db04053efe ("PCI: Coalesce host bridge contiguous apertures")
> Signed-off-by: Kai-Heng Feng <[email protected]>

I think the original commit message would make more sense here. This patch
doesn't fix 65db04053efe, it replaces it. The commit message now misses
the point, and the patch coalesces contiguous apertures without explaining
the reason.

Guenter

> ---
> drivers/pci/probe.c | 31 +++++++++++++++++++++++++++----
> 1 file changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 79177ac37880..5de157600466 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -877,11 +877,11 @@ static void pci_set_bus_msi_domain(struct pci_bus *bus)
> static int pci_register_host_bridge(struct pci_host_bridge *bridge)
> {
> struct device *parent = bridge->dev.parent;
> - struct resource_entry *window, *n;
> + struct resource_entry *window, *next, *n;
> struct pci_bus *bus, *b;
> - resource_size_t offset;
> + resource_size_t offset, next_offset;
> LIST_HEAD(resources);
> - struct resource *res;
> + struct resource *res, *next_res;
> char addr[64], *fmt;
> const char *name;
> int err;
> @@ -961,11 +961,34 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge)
> if (nr_node_ids > 1 && pcibus_to_node(bus) == NUMA_NO_NODE)
> dev_warn(&bus->dev, "Unknown NUMA node; performance will be reduced\n");
>
> + /* Coalesce contiguous windows */
> + resource_list_for_each_entry_safe(window, n, &resources) {
> + if (list_is_last(&window->node, &resources))
> + break;
> +
> + next = list_next_entry(window, node);
> + offset = window->offset;
> + res = window->res;
> + next_offset = next->offset;
> + next_res = next->res;
> +
> + if (res->flags != next_res->flags || offset != next_offset)
> + continue;
> +
> + if (res->end + 1 == next_res->start) {
> + next_res->start = res->start;
> + res->flags = res->start = res->end = 0;
> + }
> + }
> +
> /* Add initial resources to the bus */
> resource_list_for_each_entry_safe(window, n, &resources) {
> - list_move_tail(&window->node, &bridge->windows);
> offset = window->offset;
> res = window->res;
> + if (!res->end)
> + continue;
> +
> + list_move_tail(&window->node, &bridge->windows);
>
> if (res->flags & IORESOURCE_BUS)
> pci_bus_insert_busn_res(bus, bus->number, res->end);
>

2021-07-13 08:51:35

by Kai-Heng Feng

Subject: Re: [PATCH 1/1] PCI: Coalesce host bridge contiguous apertures without sorting

On Tue, Jul 13, 2021 at 4:45 PM Guenter Roeck <[email protected]> wrote:
>
> On 7/13/21 12:57 AM, Kai-Heng Feng wrote:
> > Commit 65db04053efe ("PCI: Coalesce host bridge contiguous apertures")
> > sorts the resources by address so the resources can be swapped.
> >
> > Before:
> > PCI host bridge to bus 0002:00
> > pci_bus 0002:00: root bus resource [io 0x0000-0xffff]
> > pci_bus 0002:00: root bus resource [mem 0xd80000000-0xdffffffff] (bus address [0x80000000-0xffffffff])
> > pci_bus 0002:00: root bus resource [mem 0xc0ee00000-0xc0eefffff] (bus address [0x00000000-0x000fffff])
> >
> > And after:
> > PCI host bridge to bus 0002:00
> > pci_bus 0002:00: root bus resource [io 0x0000-0xffff]
> > pci_bus 0002:00: root bus resource [mem 0xc0ee00000-0xc0eefffff] (bus address [0x00000000-0x000fffff])
> > pci_bus 0002:00: root bus resource [mem 0xd80000000-0xdffffffff] (bus address [0x80000000-0xffffffff])
> >
> > However, sorting the resources makes NVMe stop working on QEMU ppc:sam460ex.
> >
> > Resources in the original issue are already in ascending order:
> > pci_bus 0000:00: root bus resource [mem 0x10020200000-0x100303fffff window]
> > pci_bus 0000:00: root bus resource [mem 0x10030400000-0x100401fffff window]
> >
> > So remove the sorting part to resolve the issue.
> >
> > Reported-by: Guenter Roeck <[email protected]>
> > Fixes: 65db04053efe ("PCI: Coalesce host bridge contiguous apertures")
> > Signed-off-by: Kai-Heng Feng <[email protected]>
>
> I think the original commit message would make more sense here. This patch
> doesn't fix 65db04053efe, it replaces it. The commit message now misses
> the point, and the patch coalesces contiguous apertures without explaining
> the reason.

Because the message is already in the git log, I didn't think that was
necessary.
I'll send another one with the original message along with this one.

Kai-Heng

>
> Guenter
>
> > ---
> > drivers/pci/probe.c | 31 +++++++++++++++++++++++++++----
> > 1 file changed, 27 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index 79177ac37880..5de157600466 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -877,11 +877,11 @@ static void pci_set_bus_msi_domain(struct pci_bus *bus)
> > static int pci_register_host_bridge(struct pci_host_bridge *bridge)
> > {
> > struct device *parent = bridge->dev.parent;
> > - struct resource_entry *window, *n;
> > + struct resource_entry *window, *next, *n;
> > struct pci_bus *bus, *b;
> > - resource_size_t offset;
> > + resource_size_t offset, next_offset;
> > LIST_HEAD(resources);
> > - struct resource *res;
> > + struct resource *res, *next_res;
> > char addr[64], *fmt;
> > const char *name;
> > int err;
> > @@ -961,11 +961,34 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge)
> > if (nr_node_ids > 1 && pcibus_to_node(bus) == NUMA_NO_NODE)
> > dev_warn(&bus->dev, "Unknown NUMA node; performance will be reduced\n");
> >
> > + /* Coalesce contiguous windows */
> > + resource_list_for_each_entry_safe(window, n, &resources) {
> > + if (list_is_last(&window->node, &resources))
> > + break;
> > +
> > + next = list_next_entry(window, node);
> > + offset = window->offset;
> > + res = window->res;
> > + next_offset = next->offset;
> > + next_res = next->res;
> > +
> > + if (res->flags != next_res->flags || offset != next_offset)
> > + continue;
> > +
> > + if (res->end + 1 == next_res->start) {
> > + next_res->start = res->start;
> > + res->flags = res->start = res->end = 0;
> > + }
> > + }
> > +
> > /* Add initial resources to the bus */
> > resource_list_for_each_entry_safe(window, n, &resources) {
> > - list_move_tail(&window->node, &bridge->windows);
> > offset = window->offset;
> > res = window->res;
> > + if (!res->end)
> > + continue;
> > +
> > + list_move_tail(&window->node, &bridge->windows);
> >
> > if (res->flags & IORESOURCE_BUS)
> > pci_bus_insert_busn_res(bus, bus->number, res->end);
> >
>

2021-07-13 09:10:09

by Guenter Roeck

Subject: Re: [PATCH 1/1] PCI: Coalesce host bridge contiguous apertures without sorting

On 7/13/21 1:49 AM, Kai-Heng Feng wrote:
> On Tue, Jul 13, 2021 at 4:45 PM Guenter Roeck <[email protected]> wrote:
>>
>> On 7/13/21 12:57 AM, Kai-Heng Feng wrote:
>>> Commit 65db04053efe ("PCI: Coalesce host bridge contiguous apertures")
>>> sorts the resources by address so the resources can be swapped.
>>>
>>> Before:
>>> PCI host bridge to bus 0002:00
>>> pci_bus 0002:00: root bus resource [io 0x0000-0xffff]
>>> pci_bus 0002:00: root bus resource [mem 0xd80000000-0xdffffffff] (bus address [0x80000000-0xffffffff])
>>> pci_bus 0002:00: root bus resource [mem 0xc0ee00000-0xc0eefffff] (bus address [0x00000000-0x000fffff])
>>>
>>> And after:
>>> PCI host bridge to bus 0002:00
>>> pci_bus 0002:00: root bus resource [io 0x0000-0xffff]
>>> pci_bus 0002:00: root bus resource [mem 0xc0ee00000-0xc0eefffff] (bus address [0x00000000-0x000fffff])
>>> pci_bus 0002:00: root bus resource [mem 0xd80000000-0xdffffffff] (bus address [0x80000000-0xffffffff])
>>>
>>> However, sorting the resources makes NVMe stop working on QEMU ppc:sam460ex.
>>>
>>> Resources in the original issue are already in ascending order:
>>> pci_bus 0000:00: root bus resource [mem 0x10020200000-0x100303fffff window]
>>> pci_bus 0000:00: root bus resource [mem 0x10030400000-0x100401fffff window]
>>>
>>> So remove the sorting part to resolve the issue.
>>>
>>> Reported-by: Guenter Roeck <[email protected]>
>>> Fixes: 65db04053efe ("PCI: Coalesce host bridge contiguous apertures")
>>> Signed-off-by: Kai-Heng Feng <[email protected]>
>>
>> I think the original commit message would make more sense here. This patch
>> doesn't fix 65db04053efe, it replaces it. The commit message now misses
>> the point, and the patch coalesces contiguous apertures without explaining
>> the reason.
>
> Because the message is already in the git log, I didn't think that was
> necessary.

Hmm, not my decision to make, but the original commit got reverted.
The commit log associated with this patch should still reflect what
the patch does.

Guenter

> I'll send another one with the original message along with this one.
>
> Kai-Heng
>
>>
>> Guenter
>>
>>> ---
>>> drivers/pci/probe.c | 31 +++++++++++++++++++++++++++----
>>> 1 file changed, 27 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>> index 79177ac37880..5de157600466 100644
>>> --- a/drivers/pci/probe.c
>>> +++ b/drivers/pci/probe.c
>>> @@ -877,11 +877,11 @@ static void pci_set_bus_msi_domain(struct pci_bus *bus)
>>> static int pci_register_host_bridge(struct pci_host_bridge *bridge)
>>> {
>>> struct device *parent = bridge->dev.parent;
>>> - struct resource_entry *window, *n;
>>> + struct resource_entry *window, *next, *n;
>>> struct pci_bus *bus, *b;
>>> - resource_size_t offset;
>>> + resource_size_t offset, next_offset;
>>> LIST_HEAD(resources);
>>> - struct resource *res;
>>> + struct resource *res, *next_res;
>>> char addr[64], *fmt;
>>> const char *name;
>>> int err;
>>> @@ -961,11 +961,34 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge)
>>> if (nr_node_ids > 1 && pcibus_to_node(bus) == NUMA_NO_NODE)
>>> dev_warn(&bus->dev, "Unknown NUMA node; performance will be reduced\n");
>>>
>>> + /* Coalesce contiguous windows */
>>> + resource_list_for_each_entry_safe(window, n, &resources) {
>>> + if (list_is_last(&window->node, &resources))
>>> + break;
>>> +
>>> + next = list_next_entry(window, node);
>>> + offset = window->offset;
>>> + res = window->res;
>>> + next_offset = next->offset;
>>> + next_res = next->res;
>>> +
>>> + if (res->flags != next_res->flags || offset != next_offset)
>>> + continue;
>>> +
>>> + if (res->end + 1 == next_res->start) {
>>> + next_res->start = res->start;
>>> + res->flags = res->start = res->end = 0;
>>> + }
>>> + }
>>> +
>>> /* Add initial resources to the bus */
>>> resource_list_for_each_entry_safe(window, n, &resources) {
>>> - list_move_tail(&window->node, &bridge->windows);
>>> offset = window->offset;
>>> res = window->res;
>>> + if (!res->end)
>>> + continue;
>>> +
>>> + list_move_tail(&window->node, &bridge->windows);
>>>
>>> if (res->flags & IORESOURCE_BUS)
>>> pci_bus_insert_busn_res(bus, bus->number, res->end);
>>>
>>

2021-07-13 12:51:01

by Kai-Heng Feng

Subject: [PATCH v2] PCI: Reinstate "PCI: Coalesce host bridge contiguous apertures"

Built-in graphics on HP EliteDesk 805 G6 doesn't work because graphics
can't get the BAR it needs:
pci_bus 0000:00: root bus resource [mem 0x10020200000-0x100303fffff window]
pci_bus 0000:00: root bus resource [mem 0x10030400000-0x100401fffff window]

pci 0000:00:08.1: bridge window [mem 0xd2000000-0xd23fffff]
pci 0000:00:08.1: bridge window [mem 0x10030000000-0x100401fffff 64bit pref]
pci 0000:00:08.1: can't claim BAR 15 [mem 0x10030000000-0x100401fffff 64bit pref]: no compatible bridge window
pci 0000:00:08.1: [mem 0x10030000000-0x100401fffff 64bit pref] clipped to [mem 0x10030000000-0x100303fffff 64bit pref]
pci 0000:00:08.1: bridge window [mem 0x10030000000-0x100303fffff 64bit pref]
pci 0000:07:00.0: can't claim BAR 0 [mem 0x10030000000-0x1003fffffff 64bit pref]: no compatible bridge window
pci 0000:07:00.0: can't claim BAR 2 [mem 0x10040000000-0x100401fffff 64bit pref]: no compatible bridge window

However, the root bus has two contiguous apertures that can contain the
child resource requested.

Coalesce contiguous apertures so we can allocate from the entire contiguous
region.

This is the second take of commit 65db04053efe ("PCI: Coalesce host
bridge contiguous apertures"). The original approach sorts the apertures
by address, but that makes NVMe stop working on QEMU ppc:sam460ex:
PCI host bridge to bus 0002:00
pci_bus 0002:00: root bus resource [io 0x0000-0xffff]
pci_bus 0002:00: root bus resource [mem 0xd80000000-0xdffffffff] (bus address [0x80000000-0xffffffff])
pci_bus 0002:00: root bus resource [mem 0xc0ee00000-0xc0eefffff] (bus address [0x00000000-0x000fffff])

After the offending commit:
PCI host bridge to bus 0002:00
pci_bus 0002:00: root bus resource [io 0x0000-0xffff]
pci_bus 0002:00: root bus resource [mem 0xc0ee00000-0xc0eefffff] (bus address [0x00000000-0x000fffff])
pci_bus 0002:00: root bus resource [mem 0xd80000000-0xdffffffff] (bus address [0x80000000-0xffffffff])

Since the apertures on the HP EliteDesk 805 G6 are already in ascending
order, a precautionary sort is not necessary.

Remove the sorting part to avoid the regression on ppc:sam460ex.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212013
Cc: Guenter Roeck <[email protected]>
Suggested-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Kai-Heng Feng <[email protected]>
---
v2:
- Bring back the original commit message.

drivers/pci/probe.c | 31 +++++++++++++++++++++++++++----
1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 79177ac37880..5de157600466 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -877,11 +877,11 @@ static void pci_set_bus_msi_domain(struct pci_bus *bus)
static int pci_register_host_bridge(struct pci_host_bridge *bridge)
{
struct device *parent = bridge->dev.parent;
- struct resource_entry *window, *n;
+ struct resource_entry *window, *next, *n;
struct pci_bus *bus, *b;
- resource_size_t offset;
+ resource_size_t offset, next_offset;
LIST_HEAD(resources);
- struct resource *res;
+ struct resource *res, *next_res;
char addr[64], *fmt;
const char *name;
int err;
@@ -961,11 +961,34 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge)
if (nr_node_ids > 1 && pcibus_to_node(bus) == NUMA_NO_NODE)
dev_warn(&bus->dev, "Unknown NUMA node; performance will be reduced\n");

+ /* Coalesce contiguous windows */
+ resource_list_for_each_entry_safe(window, n, &resources) {
+ if (list_is_last(&window->node, &resources))
+ break;
+
+ next = list_next_entry(window, node);
+ offset = window->offset;
+ res = window->res;
+ next_offset = next->offset;
+ next_res = next->res;
+
+ if (res->flags != next_res->flags || offset != next_offset)
+ continue;
+
+ if (res->end + 1 == next_res->start) {
+ next_res->start = res->start;
+ res->flags = res->start = res->end = 0;
+ }
+ }
+
/* Add initial resources to the bus */
resource_list_for_each_entry_safe(window, n, &resources) {
- list_move_tail(&window->node, &bridge->windows);
offset = window->offset;
res = window->res;
+ if (!res->end)
+ continue;
+
+ list_move_tail(&window->node, &bridge->windows);

if (res->flags & IORESOURCE_BUS)
pci_bus_insert_busn_res(bus, bus->number, res->end);
--
2.31.1

2021-08-09 04:25:53

by Kai-Heng Feng

Subject: [PATCH] PCI/portdrv: Disallow runtime suspend when wakeup is required but PME service isn't supported

Some platforms cannot detect ethernet hotplug once its upstream port is
runtime suspended because PME isn't enabled in _OSC. The issue can be
worked around with "pcie_ports=native".

The vendor confirmed that PME control is intentionally not granted in
_OSC because of stability issues on another OS, so we should honor that
setting here as well.

Disallow port runtime suspend when any child device requires wakeup, so
pci_pme_list_scan() can still read the PME status from the devices
behind the port.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213873
Signed-off-by: Kai-Heng Feng <[email protected]>
---
drivers/pci/pcie/portdrv_pci.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index c7ff1eea225a..e693d243c90d 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -59,14 +59,30 @@ static int pcie_port_runtime_suspend(struct device *dev)
return pcie_port_device_runtime_suspend(dev);
}

+static int pcie_port_wakeup_check(struct device *dev, void *data)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ if (!pdev)
+ return 0;
+
+ return pdev->wakeup_prepared;
+}
+
static int pcie_port_runtime_idle(struct device *dev)
{
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ if (!pcie_port_find_device(pdev, PCIE_PORT_SERVICE_PME) &&
+ device_for_each_child(dev, NULL, pcie_port_wakeup_check))
+ return -EBUSY;
+
/*
* Assume the PCI core has set bridge_d3 whenever it thinks the port
* should be good to go to D3. Everything else, including moving
* the port to D3, is handled by the PCI core.
*/
- return to_pci_dev(dev)->bridge_d3 ? 0 : -EBUSY;
+ return pdev->bridge_d3 ? 0 : -EBUSY;
}

static const struct dev_pm_ops pcie_portdrv_pm_ops = {
--
2.31.1

2021-08-09 10:22:54

by Lukas Wunner

Subject: Re: [PATCH] PCI/portdrv: Disallow runtime suspend when wakeup is required but PME service isn't supported

[cc += Mika]

On Mon, Aug 09, 2021 at 12:24:12PM +0800, Kai-Heng Feng wrote:
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213873

The last comment on this bugzilla says "BIOS will fix this."
and the status is RESOLVED WILL_NOT_FIX.

Why is the patch still necessary?


> Some platforms cannot detect ethernet hotplug once its upstream port is
> runtime suspended because PME isn't enabled in _OSC.

If PME is not handled natively, why does the NIC runtime suspend?
Shouldn't this be fixed in the NIC driver by keeping the device
runtime active if PME cannot be used?


> Disallow port runtime suspend when any child device requires wakeup, so
> pci_pme_list_scan() can still read the PME status from the devices
> behind the port.

pci_pme_list_scan() is for broken devices which fail to signal PME.
Is this NIC really among them or does PME fail merely because it's not
granted to OSPM?


> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -59,14 +59,30 @@ static int pcie_port_runtime_suspend(struct device *dev)
> return pcie_port_device_runtime_suspend(dev);
> }
>
> +static int pcie_port_wakeup_check(struct device *dev, void *data)
> +{
> + struct pci_dev *pdev = to_pci_dev(dev);
> +
> + if (!pdev)
> + return 0;
> +
> + return pdev->wakeup_prepared;
> +}
> +
> static int pcie_port_runtime_idle(struct device *dev)
> {
> + struct pci_dev *pdev = to_pci_dev(dev);
> +
> + if (!pcie_port_find_device(pdev, PCIE_PORT_SERVICE_PME) &&
> + device_for_each_child(dev, NULL, pcie_port_wakeup_check))
> + return -EBUSY;
> +
> /*
> * Assume the PCI core has set bridge_d3 whenever it thinks the port
> * should be good to go to D3. Everything else, including moving
> * the port to D3, is handled by the PCI core.
> */
> - return to_pci_dev(dev)->bridge_d3 ? 0 : -EBUSY;
> + return pdev->bridge_d3 ? 0 : -EBUSY;

If an additional check is necessary for this issue, it should be
integrated into pci_dev_check_d3cold() instead of pcie_port_runtime_idle().

Thanks,

Lukas

2021-08-09 10:51:08

by Kai-Heng Feng

Subject: Re: [PATCH] PCI/portdrv: Disallow runtime suspend when wakeup is required but PME service isn't supported

On Mon, Aug 9, 2021 at 5:47 PM Lukas Wunner <[email protected]> wrote:
>
> [cc += Mika]
>
> On Mon, Aug 09, 2021 at 12:24:12PM +0800, Kai-Heng Feng wrote:
> > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213873
>
> The last comment on this bugzilla says "BIOS will fix this."
> and the status is RESOLVED WILL_NOT_FIX.
>
> Why is the patch still necessary?

Well, let me update the bug report.

>
>
> > Some platforms cannot detect ethernet hotplug once its upstream port is
> > runtime suspended because PME isn't enabled in _OSC.
>
> If PME is not handled natively, why does the NIC runtime suspend?
> Shouldn't this be fixed in the NIC driver by keeping the device
> runtime active if PME cannot be used?

That means we need to fix every user of pci_dev_run_wake(), or fix the
issue in the pci_dev_run_wake() helper itself.
However, I am afraid that implementing the fix in pci_dev_run_wake()
may break the while loop check:
bool pci_dev_run_wake(struct pci_dev *dev)
{
        ...
        while (bus->parent) {
                struct pci_dev *bridge = bus->self;

                if (device_can_wakeup(&bridge->dev))
                        return true;

                bus = bus->parent;
        }
        ...
}

So I took the current approach.

>
>
> > Disallow port runtime suspend when any child device requires wakeup, so
> > pci_pme_list_scan() can still read the PME status from the devices
> > behind the port.
>
> pci_pme_list_scan() is for broken devices which fail to signal PME.
> Is this NIC really among them or does PME fail merely because it's not
> granted to OSPM?

The latter, PME IRQ isn't enabled because it's not granted by BIOS _OSC.

>
>
> > --- a/drivers/pci/pcie/portdrv_pci.c
> > +++ b/drivers/pci/pcie/portdrv_pci.c
> > @@ -59,14 +59,30 @@ static int pcie_port_runtime_suspend(struct device *dev)
> > return pcie_port_device_runtime_suspend(dev);
> > }
> >
> > +static int pcie_port_wakeup_check(struct device *dev, void *data)
> > +{
> > + struct pci_dev *pdev = to_pci_dev(dev);
> > +
> > + if (!pdev)
> > + return 0;
> > +
> > + return pdev->wakeup_prepared;
> > +}
> > +
> > static int pcie_port_runtime_idle(struct device *dev)
> > {
> > + struct pci_dev *pdev = to_pci_dev(dev);
> > +
> > + if (!pcie_port_find_device(pdev, PCIE_PORT_SERVICE_PME) &&
> > + device_for_each_child(dev, NULL, pcie_port_wakeup_check))
> > + return -EBUSY;
> > +
> > /*
> > * Assume the PCI core has set bridge_d3 whenever it thinks the port
> > * should be good to go to D3. Everything else, including moving
> > * the port to D3, is handled by the PCI core.
> > */
> > - return to_pci_dev(dev)->bridge_d3 ? 0 : -EBUSY;
> > + return pdev->bridge_d3 ? 0 : -EBUSY;
>
> If an additional check is necessary for this issue, it should be
> integrated into pci_dev_check_d3cold() instead of pcie_port_runtime_idle().

I think PME IRQ and D3cold are different things here.
The root port of the affected NIC doesn't support D3cold because
there's no power resource.

Kai-Heng

>
> Thanks,
>
> Lukas

2021-08-09 15:02:11

by Lukas Wunner

Subject: Re: [PATCH] PCI/portdrv: Disallow runtime suspend when wakeup is required but PME service isn't supported

[cc += Rafael]

On Mon, Aug 09, 2021 at 06:40:41PM +0800, Kai-Heng Feng wrote:
> On Mon, Aug 9, 2021 at 5:47 PM Lukas Wunner <[email protected]> wrote:
> > On Mon, Aug 09, 2021 at 12:24:12PM +0800, Kai-Heng Feng wrote:
> > > Some platforms cannot detect ethernet hotplug once its upstream port is
> > > runtime suspended because PME isn't enabled in _OSC.
> >
> > If PME is not handled natively, why does the NIC runtime suspend?
> > Shouldn't this be fixed in the NIC driver by keeping the device
> > runtime active if PME cannot be used?
>
> That means we need to fix every user of pci_dev_run_wake(), or fix the
> issue in pci_dev_run_wake() helper itself.

If PME is not granted to the OS, the only consequence is that the PME
port service is not instantiated at the root port. But PME is still
enabled for downstream devices. Maybe that's a mistake? I think the
ACPI spec is a little unclear what to do if PME control is *not* granted.
It only specifies what to do if PME control is *granted*:

"If the OS successfully receives control of this feature, it must
handle power management events as described in the PCI Express Base
Specification."

"If firmware allows the OS control of this feature, then in the context
of the _OSC method it must ensure that all PMEs are routed to root port
interrupts as described in the PCI Express Base Specification.
Additionally, after control is transferred to the OS, firmware must not
update the PME Status field in the Root Status register or the PME
Interrupt Enable field in the Root Control register. If control of this
feature was requested and denied or was not requested, firmware returns
this bit set to 0."

Perhaps something like the below is appropriate, I'm not sure.

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 091b4a4..7e64185 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3099,7 +3099,7 @@ void pci_pm_init(struct pci_dev *dev)
}

pmc &= PCI_PM_CAP_PME_MASK;
- if (pmc) {
+ if (pmc && pci_find_host_bridge(dev->bus)->native_pme) {
pci_info(dev, "PME# supported from%s%s%s%s%s\n",
(pmc & PCI_PM_CAP_PME_D0) ? " D0" : "",
(pmc & PCI_PM_CAP_PME_D1) ? " D1" : "",


> > > --- a/drivers/pci/pcie/portdrv_pci.c
> > > +++ b/drivers/pci/pcie/portdrv_pci.c
> > > @@ -59,14 +59,30 @@ static int pcie_port_runtime_suspend(struct device *dev)
> > > return pcie_port_device_runtime_suspend(dev);
> > > }
> > >
> > > +static int pcie_port_wakeup_check(struct device *dev, void *data)
> > > +{
> > > + struct pci_dev *pdev = to_pci_dev(dev);
> > > +
> > > + if (!pdev)
> > > + return 0;
> > > +
> > > + return pdev->wakeup_prepared;
> > > +}
> > > +
> > > static int pcie_port_runtime_idle(struct device *dev)
> > > {
> > > + struct pci_dev *pdev = to_pci_dev(dev);
> > > +
> > > + if (!pcie_port_find_device(pdev, PCIE_PORT_SERVICE_PME) &&
> > > + device_for_each_child(dev, NULL, pcie_port_wakeup_check))
> > > + return -EBUSY;
> > > +
> > > /*
> > > * Assume the PCI core has set bridge_d3 whenever it thinks the port
> > > * should be good to go to D3. Everything else, including moving
> > > * the port to D3, is handled by the PCI core.
> > > */
> > > - return to_pci_dev(dev)->bridge_d3 ? 0 : -EBUSY;
> > > + return pdev->bridge_d3 ? 0 : -EBUSY;
> >
> > If an additional check is necessary for this issue, it should be
> > integrated into pci_dev_check_d3cold() instead of pcie_port_runtime_idle().
>
> I think PME IRQ and D3cold are different things here.
> The root port of the affected NIC doesn't support D3cold because
> there's no power resource.

If a bridge is runtime suspended to D3, the hierarchy below it is
inaccessible, which is basically the same as if it's put in D3cold,
hence the name pci_dev_check_d3cold(). That function allows a device
to block an upstream bridge from runtime suspending because the device
is not allowed to go to D3cold. The function specifically checks whether
a device is PME-capable from D3cold. The NIC claims it's capable but
the PME event has no effect because PME control wasn't granted to the
OS and firmware neglected to set PME Interrupt Enable in the Root Control
register. We could check for this case and block runtime PM of bridges
based on the rationale that PME polling is needed to detect wakeup.
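
A minimal sketch of that idea (not an actual patch; the helper name is
made up, and whether host_bridge->native_pme plus the Root Control read
are the right conditions to test is an assumption):

#include <linux/pci.h>

/*
 * Sketch: decide whether a device can only report wakeup via PME
 * polling, i.e. the OS was not granted PME control by _OSC and
 * firmware also left PME Interrupt Enable clear at the root port.
 * pci_dev_check_d3cold() could then treat such a device as blocking
 * D3 for its upstream bridges, so the root port stays in D0 and
 * polling keeps working.
 */
static bool pci_pme_needs_polling(struct pci_dev *dev)
{
        struct pci_dev *rp = pcie_find_root_port(dev);
        u16 rtctl;

        if (pci_find_host_bridge(dev->bus)->native_pme)
                return false;   /* OS handles the PME interrupt itself */

        if (!rp)
                return false;

        pcie_capability_read_word(rp, PCI_EXP_RTCTL, &rtctl);
        return !(rtctl & PCI_EXP_RTCTL_PMEIE); /* firmware never enabled it */
}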

Thanks,

Lukas

2021-08-10 16:42:42

by Kai-Heng Feng

Subject: Re: [PATCH] PCI/portdrv: Disallow runtime suspend when wakeup is required but PME service isn't supported

On Mon, Aug 9, 2021 at 11:00 PM Lukas Wunner <[email protected]> wrote:
>
> [cc += Rafael]
>
> On Mon, Aug 09, 2021 at 06:40:41PM +0800, Kai-Heng Feng wrote:
> > On Mon, Aug 9, 2021 at 5:47 PM Lukas Wunner <[email protected]> wrote:
> > > On Mon, Aug 09, 2021 at 12:24:12PM +0800, Kai-Heng Feng wrote:
> > > > Some platforms cannot detect ethernet hotplug once its upstream port is
> > > > runtime suspended because PME isn't enabled in _OSC.
> > >
> > > If PME is not handled natively, why does the NIC runtime suspend?
> > > Shouldn't this be fixed in the NIC driver by keeping the device
> > > runtime active if PME cannot be used?
> >
> > That means we need to fix every user of pci_dev_run_wake(), or fix the
> > issue in the pci_dev_run_wake() helper itself.
>
> If PME is not granted to the OS, the only consequence is that the PME
> port service is not instantiated at the root port. But PME is still
> enabled for downstream devices. Maybe that's a mistake? I think the
> ACPI spec is a little unclear what to do if PME control is *not* granted.
> It only specifies what to do if PME control is *granted*:

So do you prefer to just disable runtime PM for the downstream device?

>
> "If the OS successfully receives control of this feature, it must
> handle power management events as described in the PCI Express Base
> Specification."
>
> "If firmware allows the OS control of this feature, then in the context
> of the _OSC method it must ensure that all PMEs are routed to root port
> interrupts as described in the PCI Express Base Specification.
> Additionally, after control is transferred to the OS, firmware must not
> update the PME Status field in the Root Status register or the PME
> Interrupt Enable field in the Root Control register. If control of this
> feature was requested and denied or was not requested, firmware returns
> this bit set to 0."
>
> Perhaps something like the below is appropriate, I'm not sure.
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 091b4a4..7e64185 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3099,7 +3099,7 @@ void pci_pm_init(struct pci_dev *dev)
> }
>
> pmc &= PCI_PM_CAP_PME_MASK;
> - if (pmc) {
> + if (pmc && pci_find_host_bridge(dev->bus)->native_pme) {
> pci_info(dev, "PME# supported from%s%s%s%s%s\n",
> (pmc & PCI_PM_CAP_PME_D0) ? " D0" : "",
> (pmc & PCI_PM_CAP_PME_D1) ? " D1" : "",
>
>

I think this will also prevent non-root port devices from using PME.

[snipped]

> >
> > I think PME IRQ and D3cold are different things here.
> > The root port of the affected NIC doesn't support D3cold because
> > there's no power resource.
>
> If a bridge is runtime suspended to D3, the hierarchy below it is
> inaccessible, which is basically the same as if it's put in D3cold,
> hence the name pci_dev_check_d3cold(). That function allows a device
> to block an upstream bridge from runtime suspending because the device
> is not allowed to go to D3cold. The function specifically checks whether
> a device is PME-capable from D3cold. The NIC claims it's capable but
> the PME event has no effect because PME control wasn't granted to the
> OS and firmware neglected to set PME Interrupt Enable in the Root Control
> register. We could check for this case and block runtime PM of bridges
> based on the rationale that PME polling is needed to detect wakeup.

So for this case, should we prevent the downstream devices from
runtime suspending, or let it suspend but keep the root port active in
order to make pci_pme_list_scan() work?

Kai-Heng

>
> Thanks,
>
> Lukas

2021-08-10 16:47:25

by Lukas Wunner

Subject: Re: [PATCH] PCI/portdrv: Disallow runtime suspend when wakeup is required but PME service isn't supported

On Tue, Aug 10, 2021 at 11:37:12PM +0800, Kai-Heng Feng wrote:
> On Mon, Aug 9, 2021 at 11:00 PM Lukas Wunner <[email protected]> wrote:
> > If PME is not granted to the OS, the only consequence is that the PME
> > port service is not instantiated at the root port. But PME is still
> > enabled for downstream devices. Maybe that's a mistake? I think the
> > ACPI spec is a little unclear what to do if PME control is *not* granted.
> > It only specifies what to do if PME control is *granted*:
>
> So do you prefer to just disable runtime PM for the downstream device?

I honestly don't know. I was just wondering whether it is okay
to enable PME on devices if control is not granted by the firmware.
The spec is fairly vague. But I guess the idea is that enabling PME
on devices is correct, just handling the interrupts is done by firmware
instead of the OS.

In your case, the endpoint device claims it can signal PME from D3cold,
which is why we allow the root port above to runtime suspend to D3hot.
The lspci output you've attached to the bugzilla indicates that yes,
signaling PME in D3cold does work, but the PME interrupt is neither
handled by the OS (because it's not allowed to) nor by firmware.

So you would like to rely on PME polling instead, which only works if the
root port remains in D0. Otherwise config space of the endpoint device
is inaccessible.
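
Roughly speaking, the polling boils down to a config-space read of the
endpoint's PMCSR (a sketch along the lines of pci_check_pme_status() in
drivers/pci/pci.c; the helper below is hypothetical, not kernel code):

#include <linux/pci.h>

/* Reading PME Status is a plain config read of the power management
 * capability; it cannot reach the endpoint while any bridge above it
 * is in D3, which is why polling needs the port to stay in D0. */
static bool endpoint_pme_status_set(struct pci_dev *dev)
{
        u16 pmcsr;

        if (!dev->pm_cap)
                return false;

        pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
        return pmcsr & PCI_PM_CTRL_PME_STATUS;
}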

I think the proper solution is that firmware should handle the PME
interrupt. You've said the vendor objects because they found PME
doesn't work reliably. Well in that case the endpoint device shouldn't
indicate that it can signal PME, at least not from D3cold. Perhaps
the vendor is able to change the endpoint device's config space so
that it doesn't claim to support PME?

If that doesn't work and thus a kernel patch is necessary, the next
question is whether changing core code is the right approach.

If you do want to change core code, I'd suggest modifying
pci_dev_check_d3cold() so that it blocks runtime PM on upstream
bridges if PME is not handled natively AND firmware failed to enable
the PME interrupt at the root port. The rationale is that upstream
bridges need to remain in D0 so that PME polling is possible.

An alternative would be a quirk for this specific laptop which clears
pdev->pme_support.

Thanks,

Lukas

2021-08-11 05:12:12

by Kai-Heng Feng

Subject: Re: [PATCH] PCI/portdrv: Disallow runtime suspend when wakeup is required but PME service isn't supported

On Wed, Aug 11, 2021 at 12:21 AM Lukas Wunner <[email protected]> wrote:
>
> On Tue, Aug 10, 2021 at 11:37:12PM +0800, Kai-Heng Feng wrote:
> > On Mon, Aug 9, 2021 at 11:00 PM Lukas Wunner <[email protected]> wrote:
> > > If PME is not granted to the OS, the only consequence is that the PME
> > > port service is not instantiated at the root port. But PME is still
> > > enabled for downstream devices. Maybe that's a mistake? I think the
> > > ACPI spec is a little unclear what to do if PME control is *not* granted.
> > > It only specifies what to do if PME control is *granted*:
> >
> > So do you prefer to just disable runtime PM for the downstream device?
>
> I honestly don't know. I was just wondering whether it is okay
> to enable PME on devices if control is not granted by the firmware.
> The spec is fairly vague. But I guess the idea is that enabling PME
> on devices is correct, just handling the interrupts is done by firmware
> instead of the OS.

Does this imply that current ACPI doesn't handle this part?

>
> In your case, the endpoint device claims it can signal PME from D3cold,
> which is why we allow the root port above to runtime suspend to D3hot.
> The lspci output you've attached to the bugzilla indicates that yes,
> signaling PME in D3cold does work, but the PME interrupt is neither
> handled by the OS (because it's not allowed to) nor by firmware.
>
> So you would like to rely on PME polling instead, which only works if the
> root port remains in D0. Otherwise config space of the endpoint device
> is inaccessible.

The Windows approach is to make the entire hierarchy stay at D0; I
think that may be a better way than relying on PME polling.

>
> I think the proper solution is that firmware should handle the PME
> interrupt. You've said the vendor objects because they found PME
> doesn't work reliably.

PME does work; what the vendor said is that enabling PME makes the system
"unstable".

> Well in that case the endpoint device shouldn't
> indicate that it can signal PME, at least not from D3cold. Perhaps
> the vendor is able to change the endpoint device's config space so
> that it doesn't claim to support PME?

This is not a viable option, and we have to consider that BIOSes from
different vendors can exhibit the same behavior.

>
> If that doesn't work and thus a kernel patch is necessary, the next
> question is whether changing core code is the right approach.

I really don't see another way, because non-granted PME is a system-wide thing...

>
> If you do want to change core code, I'd suggest modifying
> pci_dev_check_d3cold() so that it blocks runtime PM on upstream
> bridges if PME is not handled natively AND firmware failed to enable
> the PME interrupt at the root port. The rationale is that upstream
> bridges need to remain in D0 so that PME polling is possible.

How do I know that firmware failed to enable PME IRQ?

And let me see how to make pci_dev_check_d3cold() work for this case.

>
> An alternative would be a quirk for this specific laptop which clears
> pdev->pme_support.

This won't scale, because many models are affected.

Kai-Heng

>
> Thanks,
>
> Lukas

2021-08-11 07:16:30

by Lukas Wunner

Subject: Re: [PATCH] PCI/portdrv: Disallow runtime suspend when wakeup is required but PME service isn't supported

On Wed, Aug 11, 2021 at 01:06:27PM +0800, Kai-Heng Feng wrote:
> On Wed, Aug 11, 2021 at 12:21 AM Lukas Wunner <[email protected]> wrote:
> >
> > On Tue, Aug 10, 2021 at 11:37:12PM +0800, Kai-Heng Feng wrote:
> > I honestly don't know. I was just wondering whether it is okay
> > to enable PME on devices if control is not granted by the firmware.
> > The spec is fairly vague. But I guess the idea is that enabling PME
> > on devices is correct, just handling the interrupts is done by firmware
> > instead of the OS.
>
> Does this imply that current ACPI doesn't handle this part?

Apparently not. According to the "lspci-bridge-after-hotplug" output you've
attached to the bugzilla, the PME Interrupt Enable bit wasn't set in
the Root Control register. The kernel doesn't register an IRQ handler
for PME because firmware doesn't grant it control, so it's firmware's
job to enable and handle the IRQ (or poll the relevant register or
whatever).

RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
                                                 ^^^^^^^^^^

> The Windows approach is to make the entire hierarchy stay at D0; I
> think that may be a better way than relying on PME polling.

Including the endpoint device, i.e. the NIC?


> > If you do want to change core code, I'd suggest modifying
> > pci_dev_check_d3cold() so that it blocks runtime PM on upstream
> > bridges if PME is not handled natively AND firmware failed to enable
> > the PME interrupt at the root port. The rationale is that upstream
> > bridges need to remain in D0 so that PME polling is possible.
>
> How do I know that firmware failed to enable PME IRQ?

Check whether PCI_EXP_RTCTL_PMEIE was set by firmware in the Root Control
register.


> > An alternative would be a quirk for this specific laptop which clears
> > pdev->pme_support.
>
> This won't scale, because many models are affected.

We already have quirks which clear pdev->pme_support, e.g.
pci_fixup_no_d0_pme() and pci_fixup_no_msi_no_pme().
Perhaps something like that would be appropriate here.
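
For reference, a sketch of what such a quirk could look like (the
vendor/device IDs below are placeholders, not the affected NIC, and the
function name is made up):

#include <linux/pci.h>

/* Hypothetical quirk: declare that PME# is not usable on this device,
 * so the PCI core never relies on it for wakeup. */
static void pci_fixup_unreliable_pme(struct pci_dev *dev)
{
        pci_info(dev, "PME# is unreliable on this platform, disabling it\n");
        dev->pme_support = 0;
}
DECLARE_PCI_FIXUP_FINAL(0x1234, 0x5678, pci_fixup_unreliable_pme);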

Thanks,

Lukas

2021-08-12 05:23:48

by Kai-Heng Feng

Subject: Re: [PATCH] PCI/portdrv: Disallow runtime suspend when wakeup is required but PME service isn't supported

On Wed, Aug 11, 2021 at 3:11 PM Lukas Wunner <[email protected]> wrote:
>
> On Wed, Aug 11, 2021 at 01:06:27PM +0800, Kai-Heng Feng wrote:
> > On Wed, Aug 11, 2021 at 12:21 AM Lukas Wunner <[email protected]> wrote:
> > >
> > > On Tue, Aug 10, 2021 at 11:37:12PM +0800, Kai-Heng Feng wrote:
> > > I honestly don't know. I was just wondering whether it is okay
> > > to enable PME on devices if control is not granted by the firmware.
> > > The spec is fairly vague. But I guess the idea is that enabling PME
> > > on devices is correct, just handling the interrupts is done by firmware
> > > instead of the OS.
> >
> > Does this imply that current ACPI doesn't handle this part?
>
> Apparently not. According to the "lspci-bridge-after-hotplug" output you've
> attached to the bugzilla, the PME Interrupt Enable bit wasn't set in
> the Root Control register. The kernel doesn't register an IRQ handler
> for PME because firmware doesn't grant it control, so it's firmware's
> job to enable and handle the IRQ (or poll the relevant register or
> whatever).
>
> RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
>                                                  ^^^^^^^^^^

OK, I'll send a patch that checks this flag for PME capability.

>
> > The Windows approach is to make the entire hierarchy stay at D0; I
> > think that may be a better way than relying on PME polling.
>
> Including the endpoint device, i.e. the NIC?

Yes, including the endpoint device.

>
>
> > > If you do want to change core code, I'd suggest modifying
> > > pci_dev_check_d3cold() so that it blocks runtime PM on upstream
> > > bridges if PME is not handled natively AND firmware failed to enable
> > > the PME interrupt at the root port. The rationale is that upstream
> > > bridges need to remain in D0 so that PME polling is possible.
> >
> > How do I know that firmware failed to enable PME IRQ?
>
> Check whether PCI_EXP_RTCTL_PMEIE was set by firmware in the Root Control
> register.

I originally thought there might be a special ACPI method to query this info.

>
>
> > > An alternative would be a quirk for this specific laptop which clears
> > > pdev->pme_support.
> >
> > This won't scale, because many models are affected.
>
> We already have quirks which clear pdev->pme_support, e.g.
> pci_fixup_no_d0_pme() and pci_fixup_no_msi_no_pme().
> Perhaps something like that would be appropriate here.

OK, I'll take this approach.

Kai-Heng

>
> Thanks,
>
> Lukas

2021-09-29 21:37:51

by Bjorn Helgaas

Subject: Re: [PATCH v2] PCI: Reinstate "PCI: Coalesce host bridge contiguous apertures"

On Tue, Jul 13, 2021 at 08:50:07PM +0800, Kai-Heng Feng wrote:
> Built-in graphics on HP EliteDesk 805 G6 doesn't work because graphics
> can't get the BAR it needs:
> pci_bus 0000:00: root bus resource [mem 0x10020200000-0x100303fffff window]
> pci_bus 0000:00: root bus resource [mem 0x10030400000-0x100401fffff window]
>
> pci 0000:00:08.1: bridge window [mem 0xd2000000-0xd23fffff]
> pci 0000:00:08.1: bridge window [mem 0x10030000000-0x100401fffff 64bit pref]
> pci 0000:00:08.1: can't claim BAR 15 [mem 0x10030000000-0x100401fffff 64bit pref]: no compatible bridge window
> pci 0000:00:08.1: [mem 0x10030000000-0x100401fffff 64bit pref] clipped to [mem 0x10030000000-0x100303fffff 64bit pref]
> pci 0000:00:08.1: bridge window [mem 0x10030000000-0x100303fffff 64bit pref]
> pci 0000:07:00.0: can't claim BAR 0 [mem 0x10030000000-0x1003fffffff 64bit pref]: no compatible bridge window
> pci 0000:07:00.0: can't claim BAR 2 [mem 0x10040000000-0x100401fffff 64bit pref]: no compatible bridge window
>
> However, the root bus has two contiguous apertures that can contain the
> child resource requested.
>
> Coalesce contiguous apertures so we can allocate from the entire contiguous
> region.
>
> This is the second take of commit 65db04053efe ("PCI: Coalesce host
> bridge contiguous apertures"). The original approach sorts the apertures
> by address, but that makes NVMe stop working on QEMU ppc:sam460ex:
> PCI host bridge to bus 0002:00
> pci_bus 0002:00: root bus resource [io 0x0000-0xffff]
> pci_bus 0002:00: root bus resource [mem 0xd80000000-0xdffffffff] (bus address [0x80000000-0xffffffff])
> pci_bus 0002:00: root bus resource [mem 0xc0ee00000-0xc0eefffff] (bus address [0x00000000-0x000fffff])
>
> After the offending commit:
> PCI host bridge to bus 0002:00
> pci_bus 0002:00: root bus resource [io 0x0000-0xffff]
> pci_bus 0002:00: root bus resource [mem 0xc0ee00000-0xc0eefffff] (bus address [0x00000000-0x000fffff])
> pci_bus 0002:00: root bus resource [mem 0xd80000000-0xdffffffff] (bus address [0x80000000-0xffffffff])
>
> Since the apertures on the HP EliteDesk 805 G6 are already in ascending
> order, a precautionary sort is not necessary.
>
> Remove the sorting part to avoid the regression on ppc:sam460ex.
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212013
> Cc: Guenter Roeck <[email protected]>
> Suggested-by: Bjorn Helgaas <[email protected]>
> Signed-off-by: Kai-Heng Feng <[email protected]>

Applied to pci/resource for v5.16, thanks!

> ---
> v2:
> - Bring back the original commit message.
>
> drivers/pci/probe.c | 31 +++++++++++++++++++++++++++----
> 1 file changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 79177ac37880..5de157600466 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -877,11 +877,11 @@ static void pci_set_bus_msi_domain(struct pci_bus *bus)
> static int pci_register_host_bridge(struct pci_host_bridge *bridge)
> {
> struct device *parent = bridge->dev.parent;
> - struct resource_entry *window, *n;
> + struct resource_entry *window, *next, *n;
> struct pci_bus *bus, *b;
> - resource_size_t offset;
> + resource_size_t offset, next_offset;
> LIST_HEAD(resources);
> - struct resource *res;
> + struct resource *res, *next_res;
> char addr[64], *fmt;
> const char *name;
> int err;
> @@ -961,11 +961,34 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge)
> if (nr_node_ids > 1 && pcibus_to_node(bus) == NUMA_NO_NODE)
> dev_warn(&bus->dev, "Unknown NUMA node; performance will be reduced\n");
>
> + /* Coalesce contiguous windows */
> + resource_list_for_each_entry_safe(window, n, &resources) {
> + if (list_is_last(&window->node, &resources))
> + break;
> +
> + next = list_next_entry(window, node);
> + offset = window->offset;
> + res = window->res;
> + next_offset = next->offset;
> + next_res = next->res;
> +
> + if (res->flags != next_res->flags || offset != next_offset)
> + continue;
> +
> + if (res->end + 1 == next_res->start) {
> + next_res->start = res->start;
> + res->flags = res->start = res->end = 0;
> + }
> + }
> +
> /* Add initial resources to the bus */
> resource_list_for_each_entry_safe(window, n, &resources) {
> - list_move_tail(&window->node, &bridge->windows);
> offset = window->offset;
> res = window->res;
> + if (!res->end)
> + continue;
> +
> + list_move_tail(&window->node, &bridge->windows);
>
> if (res->flags & IORESOURCE_BUS)
> pci_bus_insert_busn_res(bus, bus->number, res->end);
> --
> 2.31.1
>

2021-11-17 14:12:28

by Johannes Berg

Subject: Re: [PATCH v2] PCI: Reinstate "PCI: Coalesce host bridge contiguous apertures"

Hi!

So this patch landed now ... :)

> + /* Coalesce contiguous windows */
> + resource_list_for_each_entry_safe(window, n, &resources) {
> + if (list_is_last(&window->node, &resources))
> + break;
> +
> + next = list_next_entry(window, node);
> + offset = window->offset;
> + res = window->res;
> + next_offset = next->offset;
> + next_res = next->res;
> +
> + if (res->flags != next_res->flags || offset != next_offset)
> + continue;
> +
> + if (res->end + 1 == next_res->start) {
> + next_res->start = res->start;
> + res->flags = res->start = res->end = 0;
> + }
> + }
>

Maybe this was already a problem before - but I had a stupid thing in
arch/um/drivers/virt-pci.c (busn_resource has start == end == 0), and
your changes here caused that resource to be dropped off the list.

Now, this wouldn't be a problem by itself, but we add that resource using
pci_add_resource(), which does a memory allocation, and you don't free
it here? I'm not sure it'd even be safe to free it here, so I'll just
set busn_resource to have end == 1 instead (it's all kind of virtual).
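
For context, a sketch of what that workaround amounts to (the real
definition lives in arch/um/drivers/virt-pci.c; the field values here
are illustrative):

#include <linux/ioport.h>

static struct resource busn_resource = {
        .name   = "PCI busn",
        .start  = 0,
        .end    = 1,    /* was 0; end == 0 now gets the window skipped */
        .flags  = IORESOURCE_BUS,
};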

But I still wanted to ask if this might be a problem here for others.

johannes

2022-07-13 09:46:18

by Geert Uytterhoeven

Subject: Re: [PATCH v2] PCI: Reinstate "PCI: Coalesce host bridge contiguous apertures"

Hi Johannes,

On Wed, Nov 17, 2021 at 6:41 PM Johannes Berg <[email protected]> wrote:
> So this patch landed now ... :)
>
> > + /* Coalesce contiguous windows */
> > + resource_list_for_each_entry_safe(window, n, &resources) {
> > + if (list_is_last(&window->node, &resources))
> > + break;
> > +
> > + next = list_next_entry(window, node);
> > + offset = window->offset;
> > + res = window->res;
> > + next_offset = next->offset;
> > + next_res = next->res;
> > +
> > + if (res->flags != next_res->flags || offset != next_offset)
> > + continue;
> > +
> > + if (res->end + 1 == next_res->start) {
> > + next_res->start = res->start;
> > + res->flags = res->start = res->end = 0;
> > + }
> > + }
> >
>
> Maybe this was already a problem before - but I had a stupid thing in
> arch/um/drivers/virt-pci.c (busn_resource has start == end == 0), and
> your changes here caused that resource to be dropped off the list.
>
> Now this wouldn't be a problem, but we add it using pci_add_resource()
> and then that does a memory allocation, but you don't free it here? I'm
> not sure it'd even be safe to free it here and I'll just set
> busn_resource to have end==1 instead (it's all kind of virtual).
>
> But I still wanted to ask if this might be a problem here for others.

Yes, it is. I've sent a fix:
https://lore.kernel.org/r/9c41a4372b27420c732ff5599d823e363de00c6d.1657704829.git.geert+renesas@glider.be

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds