2020-05-29 19:23:50

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: Lost PCIe PME after a914ff2d78ce ("PCI/ASPM: Don't select CONFIG_PCIEASPM by default")

[+cc Rafael, linux-kernel]

On Fri, May 29, 2020 at 08:50:46PM +0200, Heiner Kallweit wrote:
> On 28.05.2020 23:44, Heiner Kallweit wrote:
> > For whatever reason with this change (and losing ASPM control) I also
> > loose the PCIe PME interrupts. This prevents my network card from
> > resuming from runtime-suspend.
> > Reverting the change brings back ASPM control and the PCIe PME irq's.
> >
> > Affected system is a Zotac MiniPC with a N3450 CPU:
> > PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port A #1 (rev fb)
> >
> I checked a little bit further and w/o ASPM control the root ports
> don't have the PME service bit set in their capabilities.
> Not sure whether this is a chipset bug or whether there's a better
> explanation. However more chipsets may have such a behavior.

Hmm. Is the difference simply changing the PCIEASPM config symbol, or
are you booting with command-line arguments like "pcie_aspm=off"?

What's the specific PME bit that changes in the root ports? Can you
collect the "sudo lspci -vvxxxx" output with and without ASPM?

The capability bits are generally read-only as far as the PCI spec is
concerned, but devices have implementation-specific knobs that the
BIOS may use to change things. Without CONFIG_PCIEASPM, Linux will
not request control of LTR, and that could cause the BIOS to change
something. You should be able to see the LTR control difference in
the dmesg logging about _OSC.

> W/o the "default y" for ASPM control we also have the situation now
> that the config option description says "When in doubt, say Y."
> but it takes the EXPERT mode to enable it. This seems to be a little
> bit inconsistent.

We should probably remove the "if EXPERT" from the PCIEASPM kconfig.
But I would expect PME to work correctly regardless of PCIEASPM, so
removing "if EXPERT" doesn't solve the underlying problem.

Rafael, does this ring any bells for you? I don't remember a
connection between PME and ASPM, but maybe there is one.

> To cut a long story short:
> At least on some systems this change has unwanted side effects.


2020-05-29 19:43:21

by Heiner Kallweit

[permalink] [raw]
Subject: Re: Lost PCIe PME after a914ff2d78ce ("PCI/ASPM: Don't select CONFIG_PCIEASPM by default")

On 29.05.2020 21:21, Bjorn Helgaas wrote:
> [+cc Rafael, linux-kernel]
>
> On Fri, May 29, 2020 at 08:50:46PM +0200, Heiner Kallweit wrote:
>> On 28.05.2020 23:44, Heiner Kallweit wrote:
>>> For whatever reason with this change (and losing ASPM control) I also
>>> loose the PCIe PME interrupts. This prevents my network card from
>>> resuming from runtime-suspend.
>>> Reverting the change brings back ASPM control and the PCIe PME irq's.
>>>
>>> Affected system is a Zotac MiniPC with a N3450 CPU:
>>> PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port A #1 (rev fb)
>>>
>> I checked a little bit further and w/o ASPM control the root ports
>> don't have the PME service bit set in their capabilities.
>> Not sure whether this is a chipset bug or whether there's a better
>> explanation. However more chipsets may have such a behavior.
>
> Hmm. Is the difference simply changing the PCIEASPM config symbol, or
> are you booting with command-line arguments like "pcie_aspm=off"?
>
Only difference is the config symbol. My command line is plain and simple:

Command line: initrd=\intel-ucode.img initrd=\initramfs-linux.img root=/dev/sda2 rw

> What's the specific PME bit that changes in the root ports? Can you
> collect the "sudo lspci -vvxxxx" output with and without ASPM?
>
> The capability bits are generally read-only as far as the PCI spec is
> concerned, but devices have implementation-specific knobs that the
> BIOS may use to change things. Without CONFIG_PCIEASPM, Linux will
> not request control of LTR, and that could cause the BIOS to change
> something. You should be able to see the LTR control difference in
> the dmesg logging about _OSC.
>
>> W/o the "default y" for ASPM control we also have the situation now
>> that the config option description says "When in doubt, say Y."
>> but it takes the EXPERT mode to enable it. This seems to be a little
>> bit inconsistent.
>
> We should probably remove the "if EXPERT" from the PCIEASPM kconfig.
> But I would expect PME to work correctly regardless of PCIEASPM, so
> removing "if EXPERT" doesn't solve the underlying problem.
>
> Rafael, does this ring any bells for you? I don't remember a
> connection between PME and ASPM, but maybe there is one.
>
>> To cut a long story short:
>> At least on some systems this change has unwanted side effects.

lspci output w/ and w/o ASPM is attached incl. a diff.
Here comes the _OSC difference.

w/o ASPM

[ 0.386063] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig Segments MSI HPX-Type3]
[ 0.386918] acpi PNP0A08:00: _OSC: not requesting OS control; OS requires [ExtendedConfig ASPM ClockPM MSI]

w/ ASPM
[ 0.388141] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
[ 0.393648] acpi PNP0A08:00: _OSC: OS now controls [PME AER PCIeCapability LTR]

It's at least interesting that w/o ASPM OS doesn't control PME and AER.


Attachments:
lspci.diff (4.17 kB)
lspci_vvxxx_w_aspm (45.20 kB)
lspci_vvxxx_wo_aspm (45.20 kB)
Download all attachments

2020-05-29 20:13:58

by Heiner Kallweit

[permalink] [raw]
Subject: Re: Lost PCIe PME after a914ff2d78ce ("PCI/ASPM: Don't select CONFIG_PCIEASPM by default")

On 29.05.2020 21:40, Heiner Kallweit wrote:
> On 29.05.2020 21:21, Bjorn Helgaas wrote:
>> [+cc Rafael, linux-kernel]
>>
>> On Fri, May 29, 2020 at 08:50:46PM +0200, Heiner Kallweit wrote:
>>> On 28.05.2020 23:44, Heiner Kallweit wrote:
>>>> For whatever reason with this change (and losing ASPM control) I also
>>>> loose the PCIe PME interrupts. This prevents my network card from
>>>> resuming from runtime-suspend.
>>>> Reverting the change brings back ASPM control and the PCIe PME irq's.
>>>>
>>>> Affected system is a Zotac MiniPC with a N3450 CPU:
>>>> PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port A #1 (rev fb)
>>>>
>>> I checked a little bit further and w/o ASPM control the root ports
>>> don't have the PME service bit set in their capabilities.
>>> Not sure whether this is a chipset bug or whether there's a better
>>> explanation. However more chipsets may have such a behavior.
>>
>> Hmm. Is the difference simply changing the PCIEASPM config symbol, or
>> are you booting with command-line arguments like "pcie_aspm=off"?
>>
> Only difference is the config symbol. My command line is plain and simple:
>
> Command line: initrd=\intel-ucode.img initrd=\initramfs-linux.img root=/dev/sda2 rw
>
>> What's the specific PME bit that changes in the root ports? Can you
>> collect the "sudo lspci -vvxxxx" output with and without ASPM?
>>
>> The capability bits are generally read-only as far as the PCI spec is
>> concerned, but devices have implementation-specific knobs that the
>> BIOS may use to change things. Without CONFIG_PCIEASPM, Linux will
>> not request control of LTR, and that could cause the BIOS to change
>> something. You should be able to see the LTR control difference in
>> the dmesg logging about _OSC.
>>
>>> W/o the "default y" for ASPM control we also have the situation now
>>> that the config option description says "When in doubt, say Y."
>>> but it takes the EXPERT mode to enable it. This seems to be a little
>>> bit inconsistent.
>>
>> We should probably remove the "if EXPERT" from the PCIEASPM kconfig.
>> But I would expect PME to work correctly regardless of PCIEASPM, so
>> removing "if EXPERT" doesn't solve the underlying problem.
>>
>> Rafael, does this ring any bells for you? I don't remember a
>> connection between PME and ASPM, but maybe there is one.
>>
>>> To cut a long story short:
>>> At least on some systems this change has unwanted side effects.
>
> lspci output w/ and w/o ASPM is attached incl. a diff.
> Here comes the _OSC difference.
>
> w/o ASPM
>
> [ 0.386063] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig Segments MSI HPX-Type3]
> [ 0.386918] acpi PNP0A08:00: _OSC: not requesting OS control; OS requires [ExtendedConfig ASPM ClockPM MSI]
>
> w/ ASPM
> [ 0.388141] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
> [ 0.393648] acpi PNP0A08:00: _OSC: OS now controls [PME AER PCIeCapability LTR]
>
> It's at least interesting that w/o ASPM OS doesn't control PME and AER.
>

This was the right entry point, also w/o ASPM control OS states to ACPI that it
needs ASPM and ClockPM. The following patch fixes the PME issue for me.
See also the _OSC part below.


diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 9e235c1a7..8df1fa728 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -38,10 +38,15 @@ static int acpi_pci_root_scan_dependent(struct acpi_device *adev)
return 0;
}

+#ifdef CONFIG_PCIEASPM
#define ACPI_PCIE_REQ_SUPPORT (OSC_PCI_EXT_CONFIG_SUPPORT \
| OSC_PCI_ASPM_SUPPORT \
| OSC_PCI_CLOCK_PM_SUPPORT \
| OSC_PCI_MSI_SUPPORT)
+#else
+#define ACPI_PCIE_REQ_SUPPORT (OSC_PCI_EXT_CONFIG_SUPPORT \
+ | OSC_PCI_MSI_SUPPORT)
+#endif

static const struct acpi_device_id root_device_ids[] = {
{"PNP0A03", 0},
--
2.26.2


[ 0.387527] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig Segments MSI HPX-Type3]
[ 0.393033] acpi PNP0A08:00: _OSC: OS now controls [PME AER PCIeCapability]

2020-05-29 20:25:42

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: Lost PCIe PME after a914ff2d78ce ("PCI/ASPM: Don't select CONFIG_PCIEASPM by default")

[+cc Matthew]

On Fri, May 29, 2020 at 10:09:08PM +0200, Heiner Kallweit wrote:
> On 29.05.2020 21:40, Heiner Kallweit wrote:
> > On 29.05.2020 21:21, Bjorn Helgaas wrote:
> >> On Fri, May 29, 2020 at 08:50:46PM +0200, Heiner Kallweit wrote:
> >>> On 28.05.2020 23:44, Heiner Kallweit wrote:
> >>>> For whatever reason with this change (and losing ASPM control) I also
> >>>> loose the PCIe PME interrupts. This prevents my network card from
> >>>> resuming from runtime-suspend.
> >>>> Reverting the change brings back ASPM control and the PCIe PME irq's.
> >>>>
> >>>> Affected system is a Zotac MiniPC with a N3450 CPU:
> >>>> PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port A #1 (rev fb)
> >>>>
> >>> I checked a little bit further and w/o ASPM control the root ports
> >>> don't have the PME service bit set in their capabilities.
> >>> Not sure whether this is a chipset bug or whether there's a better
> >>> explanation. However more chipsets may have such a behavior.
> >>
> >> Hmm. Is the difference simply changing the PCIEASPM config symbol, or
> >> are you booting with command-line arguments like "pcie_aspm=off"?
> >>
> > Only difference is the config symbol. My command line is plain and simple:
> >
> > Command line: initrd=\intel-ucode.img initrd=\initramfs-linux.img root=/dev/sda2 rw
> >
> >> What's the specific PME bit that changes in the root ports? Can you
> >> collect the "sudo lspci -vvxxxx" output with and without ASPM?
> >>
> >> The capability bits are generally read-only as far as the PCI spec is
> >> concerned, but devices have implementation-specific knobs that the
> >> BIOS may use to change things. Without CONFIG_PCIEASPM, Linux will
> >> not request control of LTR, and that could cause the BIOS to change
> >> something. You should be able to see the LTR control difference in
> >> the dmesg logging about _OSC.
> >>
> >>> W/o the "default y" for ASPM control we also have the situation now
> >>> that the config option description says "When in doubt, say Y."
> >>> but it takes the EXPERT mode to enable it. This seems to be a little
> >>> bit inconsistent.
> >>
> >> We should probably remove the "if EXPERT" from the PCIEASPM kconfig.
> >> But I would expect PME to work correctly regardless of PCIEASPM, so
> >> removing "if EXPERT" doesn't solve the underlying problem.
> >>
> >> Rafael, does this ring any bells for you? I don't remember a
> >> connection between PME and ASPM, but maybe there is one.
> >>
> >>> To cut a long story short:
> >>> At least on some systems this change has unwanted side effects.
> >
> > lspci output w/ and w/o ASPM is attached incl. a diff.
> > Here comes the _OSC difference.
> >
> > w/o ASPM
> >
> > [ 0.386063] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig Segments MSI HPX-Type3]
> > [ 0.386918] acpi PNP0A08:00: _OSC: not requesting OS control; OS requires [ExtendedConfig ASPM ClockPM MSI]
> >
> > w/ ASPM
> > [ 0.388141] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
> > [ 0.393648] acpi PNP0A08:00: _OSC: OS now controls [PME AER PCIeCapability LTR]
> >
> > It's at least interesting that w/o ASPM OS doesn't control PME and AER.
> >
>
> This was the right entry point, also w/o ASPM control OS states to ACPI that it
> needs ASPM and ClockPM. The following patch fixes the PME issue for me.
> See also the _OSC part below.
>
>
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index 9e235c1a7..8df1fa728 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -38,10 +38,15 @@ static int acpi_pci_root_scan_dependent(struct acpi_device *adev)
> return 0;
> }
>
> +#ifdef CONFIG_PCIEASPM
> #define ACPI_PCIE_REQ_SUPPORT (OSC_PCI_EXT_CONFIG_SUPPORT \
> | OSC_PCI_ASPM_SUPPORT \
> | OSC_PCI_CLOCK_PM_SUPPORT \
> | OSC_PCI_MSI_SUPPORT)
> +#else
> +#define ACPI_PCIE_REQ_SUPPORT (OSC_PCI_EXT_CONFIG_SUPPORT \
> + | OSC_PCI_MSI_SUPPORT)
> +#endif

Yeah, that makes sense. I can't remember the details, but I'm pretty
sure there's a reason why we ask for the whole set of things. Seems
like it solved some problem. I think Matthew Garrett might have been
involved in that.

2020-05-29 20:55:54

by Heiner Kallweit

[permalink] [raw]
Subject: Re: Lost PCIe PME after a914ff2d78ce ("PCI/ASPM: Don't select CONFIG_PCIEASPM by default")

On 29.05.2020 22:21, Bjorn Helgaas wrote:
> [+cc Matthew]
>
> On Fri, May 29, 2020 at 10:09:08PM +0200, Heiner Kallweit wrote:
>> On 29.05.2020 21:40, Heiner Kallweit wrote:
>>> On 29.05.2020 21:21, Bjorn Helgaas wrote:
>>>> On Fri, May 29, 2020 at 08:50:46PM +0200, Heiner Kallweit wrote:
>>>>> On 28.05.2020 23:44, Heiner Kallweit wrote:
>>>>>> For whatever reason with this change (and losing ASPM control) I also
>>>>>> loose the PCIe PME interrupts. This prevents my network card from
>>>>>> resuming from runtime-suspend.
>>>>>> Reverting the change brings back ASPM control and the PCIe PME irq's.
>>>>>>
>>>>>> Affected system is a Zotac MiniPC with a N3450 CPU:
>>>>>> PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port A #1 (rev fb)
>>>>>>
>>>>> I checked a little bit further and w/o ASPM control the root ports
>>>>> don't have the PME service bit set in their capabilities.
>>>>> Not sure whether this is a chipset bug or whether there's a better
>>>>> explanation. However more chipsets may have such a behavior.
>>>>
>>>> Hmm. Is the difference simply changing the PCIEASPM config symbol, or
>>>> are you booting with command-line arguments like "pcie_aspm=off"?
>>>>
>>> Only difference is the config symbol. My command line is plain and simple:
>>>
>>> Command line: initrd=\intel-ucode.img initrd=\initramfs-linux.img root=/dev/sda2 rw
>>>
>>>> What's the specific PME bit that changes in the root ports? Can you
>>>> collect the "sudo lspci -vvxxxx" output with and without ASPM?
>>>>
>>>> The capability bits are generally read-only as far as the PCI spec is
>>>> concerned, but devices have implementation-specific knobs that the
>>>> BIOS may use to change things. Without CONFIG_PCIEASPM, Linux will
>>>> not request control of LTR, and that could cause the BIOS to change
>>>> something. You should be able to see the LTR control difference in
>>>> the dmesg logging about _OSC.
>>>>
>>>>> W/o the "default y" for ASPM control we also have the situation now
>>>>> that the config option description says "When in doubt, say Y."
>>>>> but it takes the EXPERT mode to enable it. This seems to be a little
>>>>> bit inconsistent.
>>>>
>>>> We should probably remove the "if EXPERT" from the PCIEASPM kconfig.
>>>> But I would expect PME to work correctly regardless of PCIEASPM, so
>>>> removing "if EXPERT" doesn't solve the underlying problem.
>>>>
>>>> Rafael, does this ring any bells for you? I don't remember a
>>>> connection between PME and ASPM, but maybe there is one.
>>>>
>>>>> To cut a long story short:
>>>>> At least on some systems this change has unwanted side effects.
>>>
>>> lspci output w/ and w/o ASPM is attached incl. a diff.
>>> Here comes the _OSC difference.
>>>
>>> w/o ASPM
>>>
>>> [ 0.386063] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig Segments MSI HPX-Type3]
>>> [ 0.386918] acpi PNP0A08:00: _OSC: not requesting OS control; OS requires [ExtendedConfig ASPM ClockPM MSI]
>>>
>>> w/ ASPM
>>> [ 0.388141] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
>>> [ 0.393648] acpi PNP0A08:00: _OSC: OS now controls [PME AER PCIeCapability LTR]
>>>
>>> It's at least interesting that w/o ASPM OS doesn't control PME and AER.
>>>
>>
>> This was the right entry point, also w/o ASPM control OS states to ACPI that it
>> needs ASPM and ClockPM. The following patch fixes the PME issue for me.
>> See also the _OSC part below.
>>
>>
>> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
>> index 9e235c1a7..8df1fa728 100644
>> --- a/drivers/acpi/pci_root.c
>> +++ b/drivers/acpi/pci_root.c
>> @@ -38,10 +38,15 @@ static int acpi_pci_root_scan_dependent(struct acpi_device *adev)
>> return 0;
>> }
>>
>> +#ifdef CONFIG_PCIEASPM
>> #define ACPI_PCIE_REQ_SUPPORT (OSC_PCI_EXT_CONFIG_SUPPORT \
>> | OSC_PCI_ASPM_SUPPORT \
>> | OSC_PCI_CLOCK_PM_SUPPORT \
>> | OSC_PCI_MSI_SUPPORT)
>> +#else
>> +#define ACPI_PCIE_REQ_SUPPORT (OSC_PCI_EXT_CONFIG_SUPPORT \
>> + | OSC_PCI_MSI_SUPPORT)
>> +#endif
>
> Yeah, that makes sense. I can't remember the details, but I'm pretty
> sure there's a reason why we ask for the whole set of things. Seems
> like it solved some problem. I think Matthew Garrett might have been
> involved in that.
>

ACPI_PCIE_REQ_SUPPORT was introduced with 415e12b23792 ("PCI/ACPI:
Request _OSC control once for each root bridge (v3)") in 2011.
It's linked to the following bug report:
https://bugzilla.kernel.org/show_bug.cgi?id=20232

I can't say whether there's a chance that the proposed patch would
bring back the issue discussed at that time.

If somebody thinks that the proposed patch isn't the right solution,
then presumably the following needs to be touched.

if ((support & ACPI_PCIE_REQ_SUPPORT) != ACPI_PCIE_REQ_SUPPORT) {
decode_osc_support(root, "not requesting OS control; OS requires",
ACPI_PCIE_REQ_SUPPORT);
return;
}

control = OSC_PCI_EXPRESS_CAPABILITY_CONTROL
| OSC_PCI_EXPRESS_PME_CONTROL;

2020-05-29 21:41:23

by Matthew Garrett

[permalink] [raw]
Subject: Re: Lost PCIe PME after a914ff2d78ce ("PCI/ASPM: Don't select CONFIG_PCIEASPM by default")

On Fri, May 29, 2020 at 03:21:35PM -0500, Bjorn Helgaas wrote:

> Yeah, that makes sense. I can't remember the details, but I'm pretty
> sure there's a reason why we ask for the whole set of things. Seems
> like it solved some problem. I think Matthew Garrett might have been
> involved in that.

This was https://bugzilla.redhat.com/show_bug.cgi?id=638912 - some
firmware misbehaves unless you pass the same set of supported
functionality as Windows does.

--
Matthew Garrett | [email protected]

2020-05-29 22:30:59

by Heiner Kallweit

[permalink] [raw]
Subject: Re: Lost PCIe PME after a914ff2d78ce ("PCI/ASPM: Don't select CONFIG_PCIEASPM by default")

On 29.05.2020 22:59, Matthew Garrett wrote:
> On Fri, May 29, 2020 at 03:21:35PM -0500, Bjorn Helgaas wrote:
>
>> Yeah, that makes sense. I can't remember the details, but I'm pretty
>> sure there's a reason why we ask for the whole set of things. Seems
>> like it solved some problem. I think Matthew Garrett might have been
>> involved in that.
>
> This was https://bugzilla.redhat.com/show_bug.cgi?id=638912 - some
> firmware misbehaves unless you pass the same set of supported
> functionality as Windows does.
>

Current situation means that PME is unusable on all systems where
pcie_aspm_support_enabled() returns false, what is basically every
system except EXPERT mode is enabled and CONFIG_PCIEASPM is set.
So we definitely need to do something.

One question is whether the system from the 10yr old bug report
actually depends on OSC_PCI_EXPRESS_LTR_CONTROL control, or whether
some other change in recent years fixed the issue.
Not sure whether the system is still available for re-testing.

If worst case we have 10yr old systems breaking with a new kernel
then we still would have the workaround to enable CONFIG_PCIEASPM
on that system.

2020-05-29 23:00:20

by Matthew Garrett

[permalink] [raw]
Subject: Re: Lost PCIe PME after a914ff2d78ce ("PCI/ASPM: Don't select CONFIG_PCIEASPM by default")

On Sat, May 30, 2020 at 12:26:17AM +0200, Heiner Kallweit wrote:

> Current situation means that PME is unusable on all systems where
> pcie_aspm_support_enabled() returns false, what is basically every
> system except EXPERT mode is enabled and CONFIG_PCIEASPM is set.
> So we definitely need to do something.

CONFIG_PCIEASPM is default y. I don't think there's huge value in
adding complexity to deal with it being disabled, given that the kernel
is then in a configuration that no vendor is testing against. There are
existing runtime mechanisms to disable it at runtime.

--
Matthew Garrett | [email protected]

2020-05-30 06:39:57

by Heiner Kallweit

[permalink] [raw]
Subject: Re: Lost PCIe PME after a914ff2d78ce ("PCI/ASPM: Don't select CONFIG_PCIEASPM by default")

On 30.05.2020 00:58, Matthew Garrett wrote:
> On Sat, May 30, 2020 at 12:26:17AM +0200, Heiner Kallweit wrote:
>
>> Current situation means that PME is unusable on all systems where
>> pcie_aspm_support_enabled() returns false, what is basically every
>> system except EXPERT mode is enabled and CONFIG_PCIEASPM is set.
>> So we definitely need to do something.
>
> CONFIG_PCIEASPM is default y. I don't think there's huge value in
> adding complexity to deal with it being disabled, given that the kernel
> is then in a configuration that no vendor is testing against. There are
> existing runtime mechanisms to disable it at runtime.
>
>
It *was* default y. This changed with a914ff2d78ce ("PCI/ASPM: Don't
select CONFIG_PCIEASPM by default") and that's what triggered the
problem. If there's no easy solution, then maybe it's best to revert
the change for now.

2020-05-30 07:16:48

by Matthew Garrett

[permalink] [raw]
Subject: Re: Lost PCIe PME after a914ff2d78ce ("PCI/ASPM: Don't select CONFIG_PCIEASPM by default")

On Sat, May 30, 2020 at 08:33:50AM +0200, Heiner Kallweit wrote:

> It *was* default y. This changed with a914ff2d78ce ("PCI/ASPM: Don't
> select CONFIG_PCIEASPM by default") and that's what triggered the
> problem. If there's no easy solution, then maybe it's best to revert
> the change for now.

Oh, sorry, I was looking at mainline. CONFIG_PCIEASPM should
*definitely* be enabled by default - platforms expect the OS to support
it. If we want to get rid of default y then I think it'd make more sense
to have a CONFIG_DISABLE_PCIEASPM that's under EXPERT, and people who
really want to disable the code can do so.

--
Matthew Garrett | [email protected]

2020-05-30 11:40:27

by Christoph Hellwig

[permalink] [raw]
Subject: Re: Lost PCIe PME after a914ff2d78ce ("PCI/ASPM: Don't select CONFIG_PCIEASPM by default")

On Sat, May 30, 2020 at 08:14:34AM +0100, Matthew Garrett wrote:
> On Sat, May 30, 2020 at 08:33:50AM +0200, Heiner Kallweit wrote:
>
> > It *was* default y. This changed with a914ff2d78ce ("PCI/ASPM: Don't
> > select CONFIG_PCIEASPM by default") and that's what triggered the
> > problem. If there's no easy solution, then maybe it's best to revert
> > the change for now.
>
> Oh, sorry, I was looking at mainline. CONFIG_PCIEASPM should
> *definitely* be enabled by default - platforms expect the OS to support
> it. If we want to get rid of default y then I think it'd make more sense
> to have a CONFIG_DISABLE_PCIEASPM that's under EXPERT, and people who
> really want to disable the code can do so.

I think the fact that the EXPERT didn't get removed in the above bug
is a defintive bug. But I'd go further and think the CONFIG_PCIEASPM
option should be removed entirely. There is absolutely no good reason
to not build this small amount of code if PCIe support is enabled.

2020-06-01 11:01:11

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: Lost PCIe PME after a914ff2d78ce ("PCI/ASPM: Don't select CONFIG_PCIEASPM by default")

On Sat, May 30, 2020 at 1:34 PM Christoph Hellwig <[email protected]> wrote:
>
> On Sat, May 30, 2020 at 08:14:34AM +0100, Matthew Garrett wrote:
> > On Sat, May 30, 2020 at 08:33:50AM +0200, Heiner Kallweit wrote:
> >
> > > It *was* default y. This changed with a914ff2d78ce ("PCI/ASPM: Don't
> > > select CONFIG_PCIEASPM by default") and that's what triggered the
> > > problem. If there's no easy solution, then maybe it's best to revert
> > > the change for now.
> >
> > Oh, sorry, I was looking at mainline. CONFIG_PCIEASPM should
> > *definitely* be enabled by default - platforms expect the OS to support
> > it. If we want to get rid of default y then I think it'd make more sense
> > to have a CONFIG_DISABLE_PCIEASPM that's under EXPERT, and people who
> > really want to disable the code can do so.
>
> I think the fact that the EXPERT didn't get removed in the above bug
> is a defintive bug. But I'd go further and think the CONFIG_PCIEASPM
> option should be removed entirely. There is absolutely no good reason
> to not build this small amount of code if PCIe support is enabled.

Well stated, thanks!

2020-06-01 15:15:58

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: Lost PCIe PME after a914ff2d78ce ("PCI/ASPM: Don't select CONFIG_PCIEASPM by default")

On Sat, May 30, 2020 at 04:33:44AM -0700, Christoph Hellwig wrote:
> On Sat, May 30, 2020 at 08:14:34AM +0100, Matthew Garrett wrote:
> > On Sat, May 30, 2020 at 08:33:50AM +0200, Heiner Kallweit wrote:
> >
> > > It *was* default y. This changed with a914ff2d78ce ("PCI/ASPM: Don't
> > > select CONFIG_PCIEASPM by default") and that's what triggered the
> > > problem. If there's no easy solution, then maybe it's best to revert
> > > the change for now.
> >
> > Oh, sorry, I was looking at mainline. CONFIG_PCIEASPM should
> > *definitely* be enabled by default - platforms expect the OS to support
> > it. If we want to get rid of default y then I think it'd make more sense
> > to have a CONFIG_DISABLE_PCIEASPM that's under EXPERT, and people who
> > really want to disable the code can do so.
>
> I think the fact that the EXPERT didn't get removed in the above bug
> is a defintive bug. But I'd go further and think the CONFIG_PCIEASPM
> option should be removed entirely. There is absolutely no good reason
> to not build this small amount of code if PCIe support is enabled.

That might be a good solution. I don't really want a situation where
leaving CONFIG_PCIEASPM unset is "known to break PME."

ASPM support has historically been fragile, and it's actually a
significant amount of code (~10KB on my x86), which is larger than
many other things for which we have config options. But we do have
boot-time and run-time ways to disable it.

Another possibility might be to make it so Linux always advertises
ASPM support, but doesn't actually *do* anything unless
CONFIG_PCIEASPM is set. I don't think advertising ASPM support means
the OS is required to change whatever configuration firmware did.

Bjorn