2018-09-12 03:42:55

by Kai-Heng Feng

Subject: Regression caused by commit 7bb05b85bc2d ("r8169: don't use MSI-X on RTL8106e")

Hi Jian-Hong,

A Dell machine with an RTL8106e stops working after S3 since this commit
was introduced.
So I am wondering whether it's possible to revert the commit and use a
DMI/subsystem ID based quirk table instead?

It's because commit bc976233a872 ("genirq/msi, x86/vector: Prevent
reservation mode for non maskable MSI") cleared the reservation mode, and I
can see this after S3:

[ 94.872838] do_IRQ: 3.33 No irq handler for vector

If the device uses MSI-X instead of MSI, the issue doesn't happen because
of reservation mode.


Hi Thomas,

Is this something that should be handled by the x86 BIOS? I ask because I
don't see this issue when I use Suspend-to-Idle, which doesn't use the BIOS
to suspend.

Kai-Heng



2018-09-12 04:57:10

by Jian-Hong Pan

Subject: Re: Regression caused by commit 7bb05b85bc2d ("r8169: don't use MSI-X on RTL8106e")

2018-09-12 11:42 GMT+08:00 Kai-Heng Feng <[email protected]>:
> Hi Jian-Hong,
>
> There's a Dell machine with RTL8106e stops to work after S3 since the commit
> introduced.
> So I am wondering if it's possible to revert the commit and use
> DMI/subsystem id based quirk table?
>
> It's because of commit bc976233a872 ("genirq/msi, x86/vector: Prevent
> reservation mode for non maskable MSI") cleared the reservation mode, and I
> can see this after S3:
>
> [ 94.872838] do_IRQ: 3.33 No irq handler for vector
>
> If the device uses MSI-X instead of MSI, the issue doesn't happen because of
> reservation mode.

Interesting! Opposite symptom!
Could you help try the patch
https://marc.info/?l=linux-pci&m=153629858601668&w=4 with and without
reverting the commit?

If the patch does not work, another suggestion: You can try falling
back to only PCI_IRQ_LEGACY.
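
Editor's note: the fallback suggested above amounts to narrowing the set of interrupt types the driver is willing to accept. The following is a minimal user-space sketch of that selection order, not the r8169 code itself. The flag values are assumed to mirror the kernel's PCI_IRQ_* definitions of that period and should be checked against your tree; `struct irq_caps` and `pick_irq_type()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the kernel's PCI_IRQ_* flags (include/linux/pci.h). */
#define PCI_IRQ_LEGACY    (1u << 0)
#define PCI_IRQ_MSI       (1u << 1)
#define PCI_IRQ_MSIX      (1u << 2)
#define PCI_IRQ_ALL_TYPES (PCI_IRQ_LEGACY | PCI_IRQ_MSI | PCI_IRQ_MSIX)

/* Hypothetical summary of what the device can deliver. */
struct irq_caps {
	bool msix;
	bool msi;
};

/*
 * Pick one interrupt type: MSI-X first, then MSI, then legacy INTx,
 * skipping every type that is not present in 'allowed'.
 * Returns the chosen flag, or 0 if nothing in 'allowed' is usable.
 */
unsigned int pick_irq_type(struct irq_caps caps, unsigned int allowed)
{
	/* The caller must allow at least one known type. */
	assert(allowed & PCI_IRQ_ALL_TYPES);

	if ((allowed & PCI_IRQ_MSIX) && caps.msix)
		return PCI_IRQ_MSIX;
	if ((allowed & PCI_IRQ_MSI) && caps.msi)
		return PCI_IRQ_MSI;
	if (allowed & PCI_IRQ_LEGACY)
		return PCI_IRQ_LEGACY;
	return 0;
}
```

In this model, passing PCI_IRQ_LEGACY | PCI_IRQ_MSI corresponds to the behaviour introduced by commit 7bb05b85bc2d, and passing only PCI_IRQ_LEGACY corresponds to the fallback proposed here.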

Regards,
Jian-Hong Pan


2018-09-12 05:58:09

by Kai-Heng Feng

Subject: Re: Regression caused by commit 7bb05b85bc2d ("r8169: don't use MSI-X on RTL8106e")

at 12:56, Jian-Hong Pan <[email protected]> wrote:

> 2018-09-12 11:42 GMT+08:00 Kai-Heng Feng <[email protected]>:
>> Hi Jian-Hong,
>>
>> There's a Dell machine with RTL8106e stops to work after S3 since the
>> commit
>> introduced.
>> So I am wondering if it's possible to revert the commit and use
>> DMI/subsystem id based quirk table?
>>
>> It's because of commit bc976233a872 ("genirq/msi, x86/vector: Prevent
>> reservation mode for non maskable MSI") cleared the reservation mode,
>> and I
>> can see this after S3:
>>
>> [ 94.872838] do_IRQ: 3.33 No irq handler for vector
>>
>> If the device uses MSI-X instead of MSI, the issue doesn't happen
>> because of
>> reservation mode.
>
> Interesting! Opposite symptom!
> Could you help try the patch
> https://marc.info/?l=linux-pci&m=153629858601668&w=4 with and without
> reverting the commit?

Same issue after applying this patch. MSI-X works, MSI doesn't work.

>
> If the patch does not work, another suggestion: You can try falling
> back to only PCI_IRQ_LEGACY.

This device is capable of using MSI-X, so I don't think falling back to
legacy interrupts is a good idea.
Instead, using a quirk table should be more appropriate.
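
Editor's note: a quirk table of the kind proposed here could look roughly like the sketch below. It is a user-space illustration under stated assumptions, not a patch: `struct msix_quirk`, `quirk_match()` and the table entries are hypothetical, and the subsystem IDs are placeholders rather than real ASUS or Dell values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per board on which MSI-X should be avoided (hypothetical). */
struct msix_quirk {
	uint16_t subsys_vendor;
	uint16_t subsys_device;
};

/* Placeholder IDs for illustration only; not taken from real hardware. */
static const struct msix_quirk example_no_msix_quirks[] = {
	{ 0x1234, 0x0001 },
	{ 0x1234, 0x0002 },
};

#define EXAMPLE_QUIRK_COUNT \
	(sizeof(example_no_msix_quirks) / sizeof(example_no_msix_quirks[0]))

/* Return true if the given subsystem vendor/device pair is in the table. */
bool quirk_match(const struct msix_quirk *table, size_t count,
		 uint16_t vendor, uint16_t device)
{
	/* A NULL table is only acceptable when it is declared empty. */
	assert(table != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (table[i].subsys_vendor == vendor &&
		    table[i].subsys_device == device)
			return true;
	}
	return false;
}
```

With such a table the driver would keep MSI-X as the default and drop it only for the listed boards, so machines like the Dell one reported here would be unaffected.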

Kai-Heng




2018-09-12 06:33:35

by Thomas Gleixner

Subject: Re: Regression caused by commit 7bb05b85bc2d ("r8169: don't use MSI-X on RTL8106e")

On Wed, 12 Sep 2018, Kai-Heng Feng wrote:

> There's a Dell machine with RTL8106e stops to work after S3 since the
> commit introduced. So I am wondering if it's possible to revert the
> commit and use DMI/subsystem id based quirk table?

Probably.

> It's because of commit bc976233a872 ("genirq/msi, x86/vector: Prevent
> reservation mode for non maskable MSI") cleared the reservation mode, and I
> can see this after S3:
>
> [ 94.872838] do_IRQ: 3.33 No irq handler for vector

It's not because of that commit, really. There is an interrupt sent after
resume to the wrong vector for whatever reason. It seems the MSI vector
cannot be masked in the device, but the driver should quiesce the device to
a point where it does not send interrupts.

> If the device uses MSI-X instead of MSI, the issue doesn't happen because of
> reservation mode.

Reservation mode has absolutely nothing to do with that. What prevents the
issue is the fact that MSI-X can be masked by the IRQ core.
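
Editor's note: the distinction drawn above can be illustrated with a toy model. This is not kernel code; `struct vector_state`, `enum delivery` and `raise_irq()` are hypothetical names, and the model assumes only what is stated in this thread: a masked MSI-X entry holds the interrupt back, while an MSI that cannot be masked is sent regardless of whether a handler is installed for the target vector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum delivery {
	DELIVERED,     /* interrupt reached an installed handler */
	HELD_PENDING,  /* interrupt held back because the entry is masked */
	NO_HANDLER     /* interrupt hit a vector without a handler */
};

struct vector_state {
	bool maskable;       /* true for MSI-X, false for non-maskable MSI */
	bool masked;         /* mask bit, meaningful only if maskable */
	bool handler_ready;  /* a handler is installed for the vector */
};

/* Model what happens when the device raises its interrupt. */
enum delivery raise_irq(const struct vector_state *v)
{
	assert(v != NULL);
	/* A non-maskable source can never be in the masked state. */
	assert(v->maskable || !v->masked);

	if (v->maskable && v->masked)
		return HELD_PENDING;
	return v->handler_ready ? DELIVERED : NO_HANDLER;
}
```

In this model, NO_HANDLER corresponds to the "No irq handler for vector" message: it can only occur when the interrupt source is not masked at the moment the device fires, which is why quiescing the device in the driver matters for plain MSI.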

> Is it something should be handled by x86 BIOS? Because I don't see this issue
> when I use Suspend-to-Idle, which doesn't use BIOS to do suspend.

Suspend to idle works completely differently and I don't see the BIOS at
fault here. It's more an issue of MSI not being maskable on that device,
which can't be fixed in the BIOS, or it's some half-quiesced state which is
used when suspending, and that's a pure driver issue.

Thanks,

tglx

2018-09-12 08:20:15

by Kai-Heng Feng

Subject: Re: Regression caused by commit 7bb05b85bc2d ("r8169: don't use MSI-X on RTL8106e")

at 14:32, Thomas Gleixner <[email protected]> wrote:

> On Wed, 12 Sep 2018, Kai-Heng Feng wrote:
>
>> There's a Dell machine with RTL8106e stops to work after S3 since the
>> commit introduced. So I am wondering if it's possible to revert the
>> commit and use DMI/subsystem id based quirk table?
>
> Probably.

Hopefully Jian-Hong can cook up a quirk table for the issue.

>
>> It's because of commit bc976233a872 ("genirq/msi, x86/vector: Prevent
>> reservation mode for non maskable MSI") cleared the reservation mode,
>> and I
>> can see this after S3:
>>
>> [ 94.872838] do_IRQ: 3.33 No irq handler for vector
>
> It's not because of that commit, really. There is a interrupt sent after
> resume to the wrong vector for whatever reason. The MSI vector cannot be
> masked it seems in the device, but the driver should quiescen the device to
> a point where it does not send interrupts.

Understood.

>
>> If the device uses MSI-X instead of MSI, the issue doesn't happen
>> because of
>> reservation mode.
>
> Reservation mode has absolutely nothing to do with that. What prevents the
> issue is the fact that MSI-X can be masked by the IRQ core.

So in this case I think keeping the device on MSI-X is the better route; it's
MSI-X capable anyway.

>
>> Is it something should be handled by x86 BIOS? Because I don't see this
>> issue
>> when I use Suspend-to-Idle, which doesn't use BIOS to do suspend.
>
> Suspend to idle works completely different and I don't see the BIOS at
> fault here. it's more an issue of MSI not being maskable on that device,
> which can't be fixed in BIOS or it's some half quiescened state which is
> used when suspending and that's a pure driver issue.

Understood.
Thanks for all the info!

Kai-Heng




2018-09-13 05:53:41

by Jian-Hong Pan

Subject: Re: Regression caused by commit 7bb05b85bc2d ("r8169: don't use MSI-X on RTL8106e")

2018-09-12 16:19 GMT+08:00 Kai-Heng Feng <[email protected]>:
> at 14:32, Thomas Gleixner <[email protected]> wrote:
>
>> On Wed, 12 Sep 2018, Kai-Heng Feng wrote:
>>
>>> There's a Dell machine with RTL8106e stops to work after S3 since the
>>> commit introduced. So I am wondering if it's possible to revert the
>>> commit and use DMI/subsystem id based quirk table?
>>
>>
>> Probably.
>
>
> Hopefully Jian-Hong can cook up a quirk table for the issue.

The r8169 module gets nothing in the PCI BAR after the system resumes, which
makes MSI-X fail on some ASUS laptops equipped with the RTL8106e chip.
https://www.spinics.net/lists/linux-pci/msg75598.html

Actually, I am waiting for the patch "PCI: Reprogram bridge prefetch
registers on resume" to be merged.
https://marc.info/?l=linux-pm&m=153680987814299&w=2

It fixes the problem of drivers getting nothing in the PCI BAR after the
system resumes.

After that, I can remove the fallback code for the RTL8106e.

Heiner, any comment?

Regards,
Jian-Hong Pan


2018-09-21 17:11:03

by Andy Shevchenko

Subject: Re: Regression caused by commit 7bb05b85bc2d ("r8169: don't use MSI-X on RTL8106e")

On Thu, Sep 13, 2018 at 8:53 AM Jian-Hong Pan <[email protected]> wrote:
>
> 2018-09-12 16:19 GMT+08:00 Kai-Heng Feng <[email protected]>:
> > at 14:32, Thomas Gleixner <[email protected]> wrote:
> >
> >> On Wed, 12 Sep 2018, Kai-Heng Feng wrote:
> >>
> >>> There's a Dell machine with RTL8106e stops to work after S3 since the
> >>> commit introduced. So I am wondering if it's possible to revert the
> >>> commit and use DMI/subsystem id based quirk table?
> >>
> >>
> >> Probably.

Have you seen this thread:
https://patchwork.ozlabs.org/cover/968924/

and this one:
https://patchwork.kernel.org/patch/10583229/

?




--
With Best Regards,
Andy Shevchenko

2018-09-27 09:04:36

by Jian-Hong Pan

Subject: Re: Regression caused by commit 7bb05b85bc2d ("r8169: don't use MSI-X on RTL8106e")

On Sat, Sep 22, 2018 at 1:08 AM, Andy Shevchenko <[email protected]> wrote:
>
> On Thu, Sep 13, 2018 at 8:53 AM Jian-Hong Pan <[email protected]> wrote:
> >
> > 2018-09-12 16:19 GMT+08:00 Kai-Heng Feng <[email protected]>:
> > > at 14:32, Thomas Gleixner <[email protected]> wrote:
> > >
> > >> On Wed, 12 Sep 2018, Kai-Heng Feng wrote:
> > >>
> > >>> There's a Dell machine with RTL8106e stops to work after S3 since the
> > >>> commit introduced. So I am wondering if it's possible to revert the
> > >>> commit and use DMI/subsystem id based quirk table?
> > >>
> > >>
> > >> Probably.
>
> Have you seen this thread:
> https://patchwork.ozlabs.org/cover/968924/
>
> and this one:
> https://patchwork.kernel.org/patch/10583229/

Yes, that is the one. It is also discussed in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=201181
The revert patch has now been submitted: https://lkml.org/lkml/2018/9/27/224
Thanks for the information anyway. :)

Regards,
Jian-Hong Pan
