2022-10-21 11:11:06

by Lukasz Majczak

Subject: [BUG] Intel Apollolake: PCIe bridge "loses" capabilities after entering D3Cold state

Hi,

This is a follow-up to the discussion in “[PATCH V2] PCI/ASPM:
Save/restore L1SS Capability for suspend/resume”
(https://lore.kernel.org/lkml/[email protected]/t/)

While working with Vidya’s patch I noticed that after a
suspend/resume cycle on my Chromebook (Apollolake) the PCIe bridge loses
some of its capabilities. The missing part is:

    Capabilities: [200 v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                  PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
        L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
                   T_CommonMode=40us LTR1.2_Threshold=98304ns
        L1SubCtl2: T_PwrOn=60us

Digging further, I found that entering the D3cold state causes this
issue (D3hot seems to work fine).

With Vidya’s patch (all versions from V1 to V3) on upstream kernels
5.10/5.15, the underlying device (in my case a WiFi card) became
unavailable. V4 (which was accepted and merged) works fine (I
guess thanks to “PCI/ASPM: Refactor L1 PM Substates Control Register
programming”), but the issue is still there: after suspend/resume the
underlying device now works fine, but the mentioned capabilities are
still missing from the lspci -vvv output.
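
One way to double-check what lspci reports is to walk the extended capability list in the bridge's config space directly. The following is a minimal sketch under stated assumptions, not a definitive tool: the function name `find_ext_cap` is hypothetical, while the capability ID 0x001E (L1 PM Substates) and the header layout (ID in bits 15:0, next pointer in bits 31:20) follow the PCIe extended capability format.

```python
import struct

PCI_EXT_CAP_START = 0x100     # extended capabilities begin at this offset
PCI_EXT_CAP_ID_L1SS = 0x001E  # L1 PM Substates extended capability ID


def find_ext_cap(config: bytes, cap_id: int) -> int:
    """Return the config-space offset of an extended capability, or 0 if absent."""
    offset = PCI_EXT_CAP_START
    visited = set()  # guards against a malformed, looping capability list
    while (offset >= PCI_EXT_CAP_START
           and offset + 4 <= len(config)
           and offset not in visited):
        visited.add(offset)
        (header,) = struct.unpack_from("<I", config, offset)
        if header in (0, 0xFFFFFFFF):  # empty list or device not responding
            return 0
        if header & 0xFFFF == cap_id:  # bits 15:0 hold the capability ID
            return offset
        offset = (header >> 20) & 0xFFC  # bits 31:20 hold the next pointer
    return 0
```

On a live system the input could be the contents of the device's `config` file under `/sys/bus/pci/devices/<BDF>/`, where `<BDF>` is a placeholder for the bridge's address; reading beyond the first 64 bytes typically requires root. Given the `[200 v1]` shown above, one would expect offset 0x200 while the capability is present and 0 once it has disappeared.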

I think with the current code it does no harm to anyone, but I wanted to
give a heads-up about this.

Best regards,
Lukasz


2022-10-21 11:34:49

by Lukas Wunner

Subject: Re: [BUG] Intel Apollolake: PCIe bridge "loses" capabilities after entering D3Cold state

On Fri, Oct 21, 2022 at 12:17:35PM +0200, Lukasz Majczak wrote:
> While working with Vidya’s patch I have noticed that after
> suspend/resume cycle on my Chromebook (Apollolake) PCIe bridge loses
> its capabilities - the missing part is:
>
> Capabilities: [200 v1] L1 PM Substates
> L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
> L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> T_CommonMode=40us LTR1.2_Threshold=98304ns
> L1SubCtl2: T_PwrOn=60us
>
> Digging more I’ve found out that entering D3Cold state causes this

You mean the capability is gone from lspci after D3cold?

My understanding is that BIOS is responsible for populating config space.
So this sounds like a BIOS bug. What's the BIOS vendor and version?
(dmesg | grep DMI)

Thanks,

Lukas

2022-10-21 12:39:50

by Lukasz Majczak

Subject: Re: [BUG] Intel Apollolake: PCIe bridge "loses" capabilities after entering D3Cold state

On Fri, 21 Oct 2022 at 13:19, Lukas Wunner <[email protected]> wrote:
>
> On Fri, Oct 21, 2022 at 12:17:35PM +0200, Lukasz Majczak wrote:
> > While working with Vidya’s patch I have noticed that after
> > suspend/resume cycle on my Chromebook (Apollolake) PCIe bridge loses
> > its capabilities - the missing part is:
> >
> > Capabilities: [200 v1] L1 PM Substates
> > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> > PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
> > L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > T_CommonMode=40us LTR1.2_Threshold=98304ns
> > L1SubCtl2: T_PwrOn=60us
> >
> > Digging more I’ve found out that entering D3Cold state causes this
>
> You mean the capability is gone from lspci after D3cold?
>
> My understanding is that BIOS is responsible for populating config space.
> So this sounds like a BIOS bug. What's the BIOS vendor and version?
> (dmesg | grep DMI)
>
> Thanks,
>
> Lukas

Hi Lukas,

Here is the DMI output:

localhost ~ # dmesg | grep DMI
[ 0.000000] DMI: Google Coral/Coral, BIOS Google_Coral.10068.81.0 11/27/2018
[ 0.155420] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[ 0.447820] [drm] DMI info: DMI_BIOS_VENDOR coreboot
[ 0.447828] [drm] DMI info: DMI_BIOS_VERSION Google_Coral.10068.81.0
[ 0.447832] [drm] DMI info: DMI_BIOS_DATE 11/27/2018
[ 0.447835] [drm] DMI info: DMI_BIOS_RELEASE 4.0
[ 0.447838] [drm] DMI info: DMI_SYS_VENDOR Google
[ 0.447841] [drm] DMI info: DMI_PRODUCT_NAME Coral
[ 0.447844] [drm] DMI info: DMI_PRODUCT_VERSION rev3
[ 0.447848] [drm] DMI info: DMI_PRODUCT_FAMILY Google_Coral
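
Note that `grep DMI` also matches the unrelated HDMI line and the `[drm] DMI info` lines. A minimal sketch of picking out only the kernel's own `DMI:` line and splitting it into fields follows; the helper name `parse_dmi_line` is hypothetical, and the regular expression assumes the "system, BIOS version date" layout seen in the first line of the output above.

```python
import re

# "DMI:" followed by the system identification, BIOS version and BIOS date.
DMI_LINE = re.compile(
    r"\bDMI: (?P<system>.+), BIOS (?P<bios_version>\S+) "
    r"(?P<bios_date>\d{2}/\d{2}/\d{4})"
)


def parse_dmi_line(line: str):
    """Return the system/BIOS fields of a kernel 'DMI:' line, or None otherwise."""
    match = DMI_LINE.search(line)
    return match.groupdict() if match else None
```

Applied to the output above, only the first line yields a result; the `_OSI(Linux-Lenovo-NV-HDMI-Audio)` and `[drm] DMI info` lines return `None`.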

Yes, you are right, and in our internal discussion the vendor (Intel)
has proposed a firmware patch. However, I couldn't verify that the
issue is limited to this particular firmware/BIOS, so I decided to send
this email.

Best regards,
Lukasz

2022-10-21 15:57:01

by Radosław Biernacki

Subject: Re: [BUG] Intel Apollolake: PCIe bridge "loses" capabilities after entering D3Cold state

>> On Fri, 21 Oct 2022 at 13:19, Lukas Wunner <[email protected]> wrote:
>> >
>> > On Fri, Oct 21, 2022 at 12:17:35PM +0200, Lukasz Majczak wrote:
>> > > While working with Vidya’s patch I have noticed that after
>> > > suspend/resume cycle on my Chromebook (Apollolake) PCIe bridge loses
>> > > its capabilities - the missing part is:
>> > >
>> > > Capabilities: [200 v1] L1 PM Substates
>> > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>> > > PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
>> > > L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>> > > T_CommonMode=40us LTR1.2_Threshold=98304ns
>> > > L1SubCtl2: T_PwrOn=60us
>> > >
>> > > Digging more I’ve found out that entering D3Cold state causes this
>> >
>> > You mean the capability is gone from lspci after D3cold?
>> >
>> > My understanding is that BIOS is responsible for populating config space.
>> > So this sounds like a BIOS bug. What's the BIOS vendor and version?
>> > (dmesg | grep DMI)
>> >
>> > Thanks,
>> >
>> > Lukas
>>
>> Hi Lukas,
>>
>> Here is the DMI output:
>>
>> localhost ~ # dmesg | grep DMI
>> [ 0.000000] DMI: Google Coral/Coral, BIOS Google_Coral.10068.81.0 11/27/2018
>> [ 0.155420] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
>> [ 0.447820] [drm] DMI info: DMI_BIOS_VENDOR coreboot
>> [ 0.447828] [drm] DMI info: DMI_BIOS_VERSION Google_Coral.10068.81.0
>> [ 0.447832] [drm] DMI info: DMI_BIOS_DATE 11/27/2018
>> [ 0.447835] [drm] DMI info: DMI_BIOS_RELEASE 4.0
>> [ 0.447838] [drm] DMI info: DMI_SYS_VENDOR Google
>> [ 0.447841] [drm] DMI info: DMI_PRODUCT_NAME Coral
>> [ 0.447844] [drm] DMI info: DMI_PRODUCT_VERSION rev3
>> [ 0.447848] [drm] DMI info: DMI_PRODUCT_FAMILY Google_Coral
>>
>> Yes, you are right, and in our internal discussion the vendor (Intel)
>> has proposed a firmware patch. However, I couldn't verify that the
>> issue is limited to this particular firmware/BIOS, so I decided to send
>> this email.
>>
>> Best regards,
>> Lukasz

Lukasz, Vidya, is the change in behaviour in V4 an intentional fix for
the mentioned problems with missing devices after D3cold, or an
unintentional side effect?
Or, from another angle, can we rely on this behaviour as a hotfix for
the problems with missing devices?

As far as I understand, we should probably still update the FW in the
fleet of devices, right?

PS: Sorry for top-posting in the previous email; I forgot to switch my
Gmail client.

2022-10-21 21:49:46

by Bjorn Helgaas

Subject: Re: [BUG] Intel Apollolake: PCIe bridge "loses" capabilities after entering D3Cold state

[+cc Radosław]

On Fri, Oct 21, 2022 at 12:17:35PM +0200, Lukasz Majczak wrote:
> Hi,
>
> This is a follow-up to the discussion in “[PATCH V2] PCI/ASPM:
> Save/restore L1SS Capability for suspend/resume”
> (https://lore.kernel.org/lkml/[email protected]/t/)
>
> While working with Vidya’s patch I noticed that after a
> suspend/resume cycle on my Chromebook (Apollolake) the PCIe bridge loses
> some of its capabilities. The missing part is:
>
> Capabilities: [200 v1] L1 PM Substates
> L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
> L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> T_CommonMode=40us LTR1.2_Threshold=98304ns
> L1SubCtl2: T_PwrOn=60us
>
> Digging further, I found that entering the D3cold state causes this
> issue (D3hot seems to work fine).
>
> With Vidya’s patch (all versions from V1 to V3) on upstream kernels
> 5.10/5.15, the underlying device (in my case a WiFi card) became
> unavailable. V4 (which was accepted and merged) works fine (I
> guess thanks to “PCI/ASPM: Refactor L1 PM Substates Control Register
> programming”), but the issue is still there: after suspend/resume the
> underlying device now works fine, but the mentioned capabilities are
> still missing from the lspci -vvv output.
>
> I think with the current code it does no harm to anyone, but I wanted to
> give a heads-up about this.

Thanks a lot for following up on this! Tell me if I have this right:

  - After a fresh boot, the Root Port at 00:14.0 [8086:5ad6] has an L1
    PM Substates Capability [per 1,2].

  - You suspend and resume the system.

  - After resume, 00:14.0 no longer has an L1 PM Substates Capability,
    as in [2].

  - The 00:14.0 Root Port leads to an iwlwifi device at 01:00.0, and
    the wifi device works fine after resume.

  - On the 01:00.0 iwlwifi device, lspci -vv still shows L1.1 and L1.2
    enabled after resume, as it did in [2].

If substates are enabled at iwlwifi but not at the Root Port, that
would not be a valid scenario per spec. Per PCIe r6.0, sec 5.5.4:

  An L1 PM Substate enable bit must only be Set in the Upstream and
  Downstream Ports on a Link when the corresponding supported
  capability bit is Set by both the Upstream and Downstream Ports on
  that Link, otherwise the behavior is undefined.

So I don't know whether the L1.s states would still actually work.
(Is there any way to tell whether the iwlwifi power consumption
changes after the suspend/resume? Maybe powertop?)
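
The rule quoted from sec 5.5.4 amounts to a bit-mask check. Below is a minimal sketch, assuming the bit layout of the L1 PM Substates Capabilities and Control 1 registers as given by the `PCI_L1SS_*` masks in Linux's `pci_regs.h`. The function name `invalid_enables` is hypothetical, and treating a port whose capability has disappeared as supporting nothing (0) is an assumption made here purely for illustration.

```python
# Bit positions shared by the L1SS Capabilities and Control 1 registers.
PCIPM_L1_2 = 0x1
PCIPM_L1_1 = 0x2
ASPM_L1_2 = 0x4
ASPM_L1_1 = 0x8
SUBSTATE_MASK = PCIPM_L1_2 | PCIPM_L1_1 | ASPM_L1_2 | ASPM_L1_1


def invalid_enables(ctl1: int, local_cap: int, partner_cap: int) -> int:
    """Return the enable bits in ctl1 that are not supported by both link ends."""
    supported_by_both = local_cap & partner_cap & SUBSTATE_MASK
    return ctl1 & SUBSTATE_MASK & ~supported_by_both
```

If, for instance, all four enable bits were set on the endpoint (`ctl1 = 0xF`) while the Root Port no longer advertised the capability (`partner_cap = 0`), `invalid_enables(0xF, 0x1F, 0x0)` would return `0xF`: every enabled substate would fall into the "undefined behavior" case.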

And ASPM configuration, e.g., disabling/enabling substates via the
sysfs "l1_1_aspm" and "l1_2_aspm" files probably won't work right.
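
For reference, a minimal sketch of reading those per-substate attributes. It assumes they live in the `link/` subdirectory of the device's sysfs directory, as described in the kernel's sysfs-bus-pci ABI documentation; the helper name `read_substate_attrs` is hypothetical.

```python
from pathlib import Path

ASPM_ATTRS = ("l1_1_aspm", "l1_2_aspm")


def read_substate_attrs(device_dir):
    """Map each attribute name to True/False, or None if the file is absent."""
    link_dir = Path(device_dir) / "link"
    result = {}
    for name in ASPM_ATTRS:
        attr = link_dir / name
        if attr.exists():
            # The files contain "1" when the state is enabled, "0" otherwise.
            result[name] = attr.read_text().strip() == "1"
        else:
            result[name] = None
    return result
```

For the wifi device discussed here this might be called as `read_substate_attrs("/sys/bus/pci/devices/0000:01:00.0")`, where the `0000` domain prefix is an assumption. A `None` entry means the kernel did not expose that attribute for the device.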

Bjorn

[1] https://lore.kernel.org/lkml/20220722174212.GA1911979@bhelgaas/
[2] https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3