2019-04-29 19:50:19

by Takashi Iwai

Subject: [REGRESSION 5.0.8] Dell thunderbolt dock broken (xhci_hcd and thunderbolt)

Hi,

we've got a regression report wrt xhci_hcd and thunderbolt on a Dell
machine. 5.0.7 is confirmed to work, so it must be a regression
introduced by 5.0.8.

The details are found in openSUSE Bugzilla entry:
https://bugzilla.opensuse.org/show_bug.cgi?id=1132943

The probe of xhci_hcd on the dock fails like:
[ 6.269062] pcieport 0000:3a:00.0: enabling device (0006 -> 0007)
[ 6.270027] pcieport 0000:3b:03.0: enabling device (0006 -> 0007)
[ 6.270758] xhci_hcd 0000:3c:00.0: init 0000:3c:00.0 fail, -16
[ 6.270764] xhci_hcd: probe of 0000:3c:00.0 failed with error -16
[ 6.271002] xhci_hcd 0000:3d:00.0: init 0000:3d:00.0 fail, -16
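
For readers decoding the log: kernel functions return negative errno values, and on Linux errno 16 is EBUSY, so "-16" here reads as -EBUSY. A minimal sketch to look the number up (describe_kernel_error() is a hypothetical helper, not a kernel or library function, and the message text comes from the local C library):

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Map a negative kernel-style return value (e.g. -16) to its errno name."""
    # errno.errorcode maps positive errno numbers to symbolic names.
    name = errno.errorcode.get(-code, "UNKNOWN")
    return f"{code} = -{name} ({os.strerror(-code)})"


print(describe_kernel_error(-16))
```

This only identifies the error code; it does not say which resource the xhci_hcd probe found busy.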

and later on, thunderbolt gives warnings:
[ 30.232676] thunderbolt 0000:05:00.0: unexpected hop count: 1023
[ 30.232957] ------------[ cut here ]------------
[ 30.232958] thunderbolt 0000:05:00.0: interrupt for TX ring 0 is already enabled
[ 30.232974] WARNING: CPU: 3 PID: 1009 at drivers/thunderbolt/nhi.c:107 ring_interrupt_active+0x1ea/0x230 [thunderbolt]


I blindly suspected commit 3943af9d01e9 and asked for a test of a
kernel with it reverted, but in vain. It has now also been confirmed
that the problem is present with the latest 5.1-rc.

I've put some people who might be interested and the reporter (Michael)
on Cc. If anyone has an idea, feel free to join the Bugzilla entry, or
let me know if any help is needed from the distro side.


Thanks!

Takashi


2019-04-29 20:05:41

by Mika Westerberg

Subject: Re: [REGRESSION 5.0.8] Dell thunderbolt dock broken (xhci_hcd and thunderbolt)

On Mon, Apr 29, 2019 at 09:47:15PM +0200, Takashi Iwai wrote:
> Hi,

Hi,

> we've got a regression report wrt xhci_hcd and thunderbolt on a Dell
> machine. 5.0.7 is confirmed to work, so it must be a regression
> introduced by 5.0.8.
>
> The details are found in openSUSE Bugzilla entry:
> https://bugzilla.opensuse.org/show_bug.cgi?id=1132943
>
> The probe of xhci_hcd on the dock fails like:
> [ 6.269062] pcieport 0000:3a:00.0: enabling device (0006 -> 0007)
> [ 6.270027] pcieport 0000:3b:03.0: enabling device (0006 -> 0007)
> [ 6.270758] xhci_hcd 0000:3c:00.0: init 0000:3c:00.0 fail, -16
> [ 6.270764] xhci_hcd: probe of 0000:3c:00.0 failed with error -16
> [ 6.271002] xhci_hcd 0000:3d:00.0: init 0000:3d:00.0 fail, -16
>
> and later on, thunderbolt gives warnings:
> [ 30.232676] thunderbolt 0000:05:00.0: unexpected hop count: 1023
> [ 30.232957] ------------[ cut here ]------------
> [ 30.232958] thunderbolt 0000:05:00.0: interrupt for TX ring 0 is already enabled
> [ 30.232974] WARNING: CPU: 3 PID: 1009 at drivers/thunderbolt/nhi.c:107 ring_interrupt_active+0x1ea/0x230 [thunderbolt]
>
>
> I blindly suspected the commit 3943af9d01e9 and asked for a reverted
> kernel, but in vain. And now it was confirmed that the problem is
> present with the latest 5.1-rc, too.
>
> I put some people who might have interest and the reporter (Michael)
> to Cc. If anyone has an idea, feel free to join to the Bugzilla, or
> let me know if any help needed from the distro side.

Since it also exists in 5.1-rcX, it would be good if someone
who sees the problem (Michael?) could bisect it.
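
For anyone unfamiliar with bisecting: the idea is a binary search over the commits between a known-good and a known-bad kernel. The sketch below is only an illustration under simplifying assumptions (a linear history, every commit after the culprit also bad, the last commit known bad). The commit names, first_bad() and is_bad() are hypothetical, and the real "git bisect" handles non-linear history as well.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit; assumes commits[-1] is known to be bad."""
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # culprit is at mid or earlier
        else:
            lo = mid + 1      # culprit is after mid
    return commits[lo]


# Hypothetical range of 100 commits between the good and the bad kernel.
commits = [f"c{i}" for i in range(1, 101)]
tested = []


def is_bad(commit: str) -> bool:
    """Stand-in for 'build this commit, boot with the dock, check dmesg'."""
    tested.append(commit)
    return int(commit[1:]) >= 37  # pretend c37 introduced the regression


culprit = first_bad(commits, is_bad)
print(culprit, "found after", len(tested), "test boots")
```

The point is that the number of test boots grows only logarithmically with the size of the commit range, so the tester just has to report "good" or "bad" for each build without interpreting anything.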

2019-04-29 20:11:08

by Bjorn Helgaas

Subject: Re: [REGRESSION 5.0.8] Dell thunderbolt dock broken (xhci_hcd and thunderbolt)

[+cc linux-pci]

On Mon, Apr 29, 2019 at 2:55 PM Mika Westerberg
<[email protected]> wrote:
>
> On Mon, Apr 29, 2019 at 09:47:15PM +0200, Takashi Iwai wrote:
> > Hi,
>
> Hi,
>
> > we've got a regression report wrt xhci_hcd and thunderbolt on a Dell
> > machine. 5.0.7 is confirmed to work, so it must be a regression
> > introduced by 5.0.8.
> >
> > The details are found in openSUSE Bugzilla entry:
> > https://bugzilla.opensuse.org/show_bug.cgi?id=1132943
> >
> > The probe of xhci_hcd on the dock fails like:
> > [ 6.269062] pcieport 0000:3a:00.0: enabling device (0006 -> 0007)
> > [ 6.270027] pcieport 0000:3b:03.0: enabling device (0006 -> 0007)
> > [ 6.270758] xhci_hcd 0000:3c:00.0: init 0000:3c:00.0 fail, -16
> > [ 6.270764] xhci_hcd: probe of 0000:3c:00.0 failed with error -16
> > [ 6.271002] xhci_hcd 0000:3d:00.0: init 0000:3d:00.0 fail, -16
> >
> > and later on, thunderbolt gives warnings:
> > [ 30.232676] thunderbolt 0000:05:00.0: unexpected hop count: 1023
> > [ 30.232957] ------------[ cut here ]------------
> > [ 30.232958] thunderbolt 0000:05:00.0: interrupt for TX ring 0 is already enabled
> > [ 30.232974] WARNING: CPU: 3 PID: 1009 at drivers/thunderbolt/nhi.c:107 ring_interrupt_active+0x1ea/0x230 [thunderbolt]
> >
> >
> > I blindly suspected the commit 3943af9d01e9 and asked for a reverted
> > kernel, but in vain. And now it was confirmed that the problem is
> > present with the latest 5.1-rc, too.
> >
> > I put some people who might have interest and the reporter (Michael)
> > to Cc. If anyone has an idea, feel free to join to the Bugzilla, or
> > let me know if any help needed from the distro side.
>
> Since it exists in 5.1-rcX also it would be good if someone
> who see the problem (Michael?) could bisect it.

2019-04-29 20:15:08

by Mika Westerberg

Subject: Re: [REGRESSION 5.0.8] Dell thunderbolt dock broken (xhci_hcd and thunderbolt)

On Mon, Apr 29, 2019 at 10:03:00PM +0200, Michael Hirmke wrote:
> Hi all,
>
> >On Mon, Apr 29, 2019 at 09:47:15PM +0200, Takashi Iwai wrote:
> >> Hi,
>
> >Hi,
>
> >> we've got a regression report wrt xhci_hcd and thunderbolt on a Dell
> >> machine. 5.0.7 is confirmed to work, so it must be a regression
> >> introduced by 5.0.8.
> >>
> >> The details are found in openSUSE Bugzilla entry:
> >> https://bugzilla.opensuse.org/show_bug.cgi?id=1132943
> >>
> [...]
> >>
> >> I blindly suspected the commit 3943af9d01e9 and asked for a reverted
> >> kernel, but in vain. And now it was confirmed that the problem is
> >> present with the latest 5.1-rc, too.
> >>
> >> I put some people who might have interest and the reporter (Michael)
> >> to Cc. If anyone has an idea, feel free to join to the Bugzilla, or
> >> let me know if any help needed from the distro side.
>
> >Since it exists in 5.1-rcX also it would be good if someone
> >who see the problem (Michael?) could bisect it.
>
> I know the meaning of bisecting, but I'm not really a developer, so I am
> probably not able to interpret the results.

No worries.

I'm adding Christian, who reported a similar (same?) problem last week.
Christian, this seems to exist in v5.1-rc6 at least. Can you try to
bisect it on your side?

I also have an XPS 9370 but not that particular dock. I will check
tomorrow whether I can reproduce it as well.

2019-04-29 20:15:27

by opensuse

Subject: Re: [REGRESSION 5.0.8] Dell thunderbolt dock broken (xhci_hcd and thunderbolt)

Hi all,

>On Mon, Apr 29, 2019 at 09:47:15PM +0200, Takashi Iwai wrote:
>> Hi,

>Hi,

>> we've got a regression report wrt xhci_hcd and thunderbolt on a Dell
>> machine. 5.0.7 is confirmed to work, so it must be a regression
>> introduced by 5.0.8.
>>
>> The details are found in openSUSE Bugzilla entry:
>> https://bugzilla.opensuse.org/show_bug.cgi?id=1132943
>>
[...]
>>
>> I blindly suspected the commit 3943af9d01e9 and asked for a reverted
>> kernel, but in vain. And now it was confirmed that the problem is
>> present with the latest 5.1-rc, too.
>>
>> I put some people who might have interest and the reporter (Michael)
>> to Cc. If anyone has an idea, feel free to join to the Bugzilla, or
>> let me know if any help needed from the distro side.

>Since it exists in 5.1-rcX also it would be good if someone
>who see the problem (Michael?) could bisect it.

I know the meaning of bisecting, but I'm not really a developer, so I am
probably not able to interpret the results.

Sorry.

Bye.
Michael.
--
Michael Hirmke

2019-04-29 20:39:02

by Mika Westerberg

Subject: Re: [REGRESSION 5.0.8] Dell thunderbolt dock broken (xhci_hcd and thunderbolt)

On Mon, Apr 29, 2019 at 11:13:47PM +0300, Mika Westerberg wrote:
> On Mon, Apr 29, 2019 at 10:03:00PM +0200, Michael Hirmke wrote:
> > Hi all,
> >
> > >On Mon, Apr 29, 2019 at 09:47:15PM +0200, Takashi Iwai wrote:
> > >> Hi,
> >
> > >Hi,
> >
> > >> we've got a regression report wrt xhci_hcd and thunderbolt on a Dell
> > >> machine. 5.0.7 is confirmed to work, so it must be a regression
> > >> introduced by 5.0.8.
> > >>
> > >> The details are found in openSUSE Bugzilla entry:
> > >> https://bugzilla.opensuse.org/show_bug.cgi?id=1132943
> > >>
> > [...]
> > >>
> > >> I blindly suspected the commit 3943af9d01e9 and asked for a reverted
> > >> kernel, but in vain. And now it was confirmed that the problem is
> > >> present with the latest 5.1-rc, too.
> > >>
> > >> I put some people who might have interest and the reporter (Michael)
> > >> to Cc. If anyone has an idea, feel free to join to the Bugzilla, or
> > >> let me know if any help needed from the distro side.
> >
> > >Since it exists in 5.1-rcX also it would be good if someone
> > >who see the problem (Michael?) could bisect it.
> >
> > I know the meaning of bisecting, but I'm not really a developer, so I am
> > probably not able to interpret the results.
>
> No worries.
>
> I'm adding Christian who reported similar (same?) problem last week.
> Christian, this seems to exist in v5.1-rc6 at least. Can you try to
> bisect it on your side?
>
> I also have XPS 9370 but not that particular dock. I will check tomorrow
> if I can reproduce it as well.

There aren't too many changes between 5.0.7 and 5.0.8 that touch
PCI/ACPI. This is just a shot in the dark but could you try to revert:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/commit/?h=linux-5.0.y&id=da6a87fb0ad43ae811519d2e0aa325c7f792b13a

and see if it makes any difference?

2019-04-29 22:09:20

by Takashi Iwai

Subject: Re: [REGRESSION 5.0.8] Dell thunderbolt dock broken (xhci_hcd and thunderbolt)

On Mon, 29 Apr 2019 22:36:30 +0200,
Mika Westerberg wrote:
>
> On Mon, Apr 29, 2019 at 11:13:47PM +0300, Mika Westerberg wrote:
> > On Mon, Apr 29, 2019 at 10:03:00PM +0200, Michael Hirmke wrote:
> > > Hi all,
> > >
> > > >On Mon, Apr 29, 2019 at 09:47:15PM +0200, Takashi Iwai wrote:
> > > >> Hi,
> > >
> > > >Hi,
> > >
> > > >> we've got a regression report wrt xhci_hcd and thunderbolt on a Dell
> > > >> machine. 5.0.7 is confirmed to work, so it must be a regression
> > > >> introduced by 5.0.8.
> > > >>
> > > >> The details are found in openSUSE Bugzilla entry:
> > > >> https://bugzilla.opensuse.org/show_bug.cgi?id=1132943
> > > >>
> > > [...]
> > > >>
> > > >> I blindly suspected the commit 3943af9d01e9 and asked for a reverted
> > > >> kernel, but in vain. And now it was confirmed that the problem is
> > > >> present with the latest 5.1-rc, too.
> > > >>
> > > >> I put some people who might have interest and the reporter (Michael)
> > > >> to Cc. If anyone has an idea, feel free to join to the Bugzilla, or
> > > >> let me know if any help needed from the distro side.
> > >
> > > >Since it exists in 5.1-rcX also it would be good if someone
> > > >who see the problem (Michael?) could bisect it.
> > >
> > > I know the meaning of bisecting, but I'm not really a developer, so I am
> > > probably not able to interpret the results.
> >
> > No worries.
> >
> > I'm adding Christian who reported similar (same?) problem last week.
> > Christian, this seems to exist in v5.1-rc6 at least. Can you try to
> > bisect it on your side?
> >
> > I also have XPS 9370 but not that particular dock. I will check tomorrow
> > if I can reproduce it as well.
>
> There aren't too many changes between 5.0.7 and 5.0.8 that touch
> PCI/ACPI. This is just a shot in the dark but could you try to revert:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/commit/?h=linux-5.0.y&id=da6a87fb0ad43ae811519d2e0aa325c7f792b13a
>
> and see if it makes any difference?

OK, I'm building a test kernel package with the revert in the OBS
home:tiwai:bsc1133486 repo. The new kernel will be
kernel-default-5.0.10-*g8edeab8:
http://download.opensuse.org/repositories/home:/tiwai:/bsc1133486/standard/

Michael, once the new kernel is ready, please give it a try.


thanks,

Takashi

2019-04-30 08:41:32

by opensuse

Subject: Re: [REGRESSION 5.0.8] Dell thunderbolt dock broken (xhci_hcd and thunderbolt)

Hi Takashi,

[...]
>>> I also have XPS 9370 but not that particular dock. I will check tomorrow
>>> if I can reproduce it as well.
>>
>> There aren't too many changes between 5.0.7 and 5.0.8 that touch
>> PCI/ACPI. This is just a shot in the dark but could you try to revert:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/commit/?h=linux-5.0.y&id=da6a87fb0ad43ae811519d2e0aa325c7f792b13a
>>
>> and see if it makes any difference?

>OK, I'm building a test kernel package with the revert in OBS
>home:tiwai:bsc1133486 repo. A new kernel will be
>kernel-default-5.0.10-*g8edeab8:
> http://download.opensuse.org/repositories/home:/tiwai:/bsc1133486/standard/

>Michael, once when the new kernel is ready, please give it a try.

As far as I can see, the state is back to normal with this kernel.
No more error messages or crashing modules, and all devices seem to work
as expected.
The only thing is that the external devices connected to the Thunderbolt
dock come up a little more slowly than with 5.0.7, but this is
nothing I'd worry about.

>thanks,

Thank *you*.

>Takashi

Bye.
Michael.
--
Michael Hirmke

2019-04-30 09:01:42

by Mika Westerberg

Subject: Re: [REGRESSION 5.0.8] Dell thunderbolt dock broken (xhci_hcd and thunderbolt)

+Rafael, Furquan and linux-acpi

(The original thread is here https://lore.kernel.org/lkml/[email protected]/T/#u)

On Tue, Apr 30, 2019 at 10:39:00AM +0200, Michael Hirmke wrote:
> Hi Takashi,
>
> [...]
> >>> I also have XPS 9370 but not that particular dock. I will check tomorrow
> >>> if I can reproduce it as well.
> >>
> >> There aren't too many changes between 5.0.7 and 5.0.8 that touch
> >> PCI/ACPI. This is just a shot in the dark but could you try to revert:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/commit/?h=linux-5.0.y&id=da6a87fb0ad43ae811519d2e0aa325c7f792b13a
> >>
> >> and see if it makes any difference?
>
> >OK, I'm building a test kernel package with the revert in OBS
> >home:tiwai:bsc1133486 repo. A new kernel will be
> >kernel-default-5.0.10-*g8edeab8:
> > http://download.opensuse.org/repositories/home:/tiwai:/bsc1133486/standard/
>
> >Michael, once when the new kernel is ready, please give it a try.
>
> as far as I can see, state is back to normal with this kernel.
> No more error messages or crashing modules and all devices seem to work
> as expected.
> Only thing is, that the external devices connected to the Thunderbolt
> dock are coming up a little bit slower than with 5.0.7 - but this is
> nothing, I'd worry about.

Thanks for testing.

Rafael, it seems that commit c8b1917c8987 ("ACPICA: Clear status of GPEs
before enabling them") causes problems with Thunderbolt controllers if
you boot with a device (dock) connected.

I think the reason is the same one that was fixed in v4.14 by commit
ecc1165b8b74 ("ACPICA: Dispatch active GPEs at init time"), which the
above commit essentially undoes, if I understand it correctly.

2019-04-30 09:40:09

by Rafael J. Wysocki

Subject: Re: [REGRESSION 5.0.8] Dell thunderbolt dock broken (xhci_hcd and thunderbolt)

On Tue, Apr 30, 2019 at 11:00 AM Mika Westerberg
<[email protected]> wrote:
>
> +Rafael, Furquan and linux-acpi
>
> (The original thread is here https://lore.kernel.org/lkml/[email protected]/T/#u)
>
> On Tue, Apr 30, 2019 at 10:39:00AM +0200, Michael Hirmke wrote:
> > Hi Takashi,
> >
> > [...]
> > >>> I also have XPS 9370 but not that particular dock. I will check tomorrow
> > >>> if I can reproduce it as well.
> > >>
> > >> There aren't too many changes between 5.0.7 and 5.0.8 that touch
> > >> PCI/ACPI. This is just a shot in the dark but could you try to revert:
> > >>
> > >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/commit/?h=linux-5.0.y&id=da6a87fb0ad43ae811519d2e0aa325c7f792b13a
> > >>
> > >> and see if it makes any difference?
> >
> > >OK, I'm building a test kernel package with the revert in OBS
> > >home:tiwai:bsc1133486 repo. A new kernel will be
> > >kernel-default-5.0.10-*g8edeab8:
> > > http://download.opensuse.org/repositories/home:/tiwai:/bsc1133486/standard/
> >
> > >Michael, once when the new kernel is ready, please give it a try.
> >
> > as far as I can see, state is back to normal with this kernel.
> > No more error messages or crashing modules and all devices seem to work
> > as expected.
> > Only thing is, that the external devices connected to the Thunderbolt
> > dock are coming up a little bit slower than with 5.0.7 - but this is
> > nothing, I'd worry about.
>
> Thanks for testing.
>
> Rafael, it seems that commit c8b1917c8987 ("ACPICA: Clear status of GPEs
> before enabling them") causes problem with Thunderbolt controllers if
> you boot with device (dock) connected.
>
> I think the reason is the same that got fixed in v4.14 with commit
> ecc1165b8b74 ("ACPICA: Dispatch active GPEs at init time") which the
> above commit essentially undoes if I understand it correctly.

OK, I'll queue up a revert of that one then, thanks!

Erik, I think that commit c8b1917c8987 has been picked up by the
upstream ACPICA already. If I'm not mistaken, it needs to be reverted
from there as well.

2019-05-02 11:50:03

by Greg Kroah-Hartman

Subject: Re: [REGRESSION 5.0.8] Dell thunderbolt dock broken (xhci_hcd and thunderbolt)

On Tue, Apr 30, 2019 at 11:37:48AM +0200, Rafael J. Wysocki wrote:
> On Tue, Apr 30, 2019 at 11:00 AM Mika Westerberg
> <[email protected]> wrote:
> >
> > +Rafael, Furquan and linux-acpi
> >
> > (The original thread is here https://lore.kernel.org/lkml/[email protected]/T/#u)
> >
> > On Tue, Apr 30, 2019 at 10:39:00AM +0200, Michael Hirmke wrote:
> > > Hi Takashi,
> > >
> > > [...]
> > > >>> I also have XPS 9370 but not that particular dock. I will check tomorrow
> > > >>> if I can reproduce it as well.
> > > >>
> > > >> There aren't too many changes between 5.0.7 and 5.0.8 that touch
> > > >> PCI/ACPI. This is just a shot in the dark but could you try to revert:
> > > >>
> > > >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/commit/?h=linux-5.0.y&id=da6a87fb0ad43ae811519d2e0aa325c7f792b13a
> > > >>
> > > >> and see if it makes any difference?
> > >
> > > >OK, I'm building a test kernel package with the revert in OBS
> > > >home:tiwai:bsc1133486 repo. A new kernel will be
> > > >kernel-default-5.0.10-*g8edeab8:
> > > > http://download.opensuse.org/repositories/home:/tiwai:/bsc1133486/standard/
> > >
> > > >Michael, once when the new kernel is ready, please give it a try.
> > >
> > > as far as I can see, state is back to normal with this kernel.
> > > No more error messages or crashing modules and all devices seem to work
> > > as expected.
> > > Only thing is, that the external devices connected to the Thunderbolt
> > > dock are coming up a little bit slower than with 5.0.7 - but this is
> > > nothing, I'd worry about.
> >
> > Thanks for testing.
> >
> > Rafael, it seems that commit c8b1917c8987 ("ACPICA: Clear status of GPEs
> > before enabling them") causes problem with Thunderbolt controllers if
> > you boot with device (dock) connected.
> >
> > I think the reason is the same that got fixed in v4.14 with commit
> > ecc1165b8b74 ("ACPICA: Dispatch active GPEs at init time") which the
> > above commit essentially undoes if I understand it correctly.
>
> OK, I'll queue up a revert of that one then, thanks!
>
> Erik, I think that commit c8b1917c8987 has been picked up by the
> upstream ACPICA already. If I'm not mistaken, it needs to be reverted
> from there as well.

I've queued the revert up in the stable trees as it has hit Linus's tree
now, and will push out a new round of stable kernels soon.

thanks,

greg k-h

2019-05-03 23:36:39

by Furquan Shaikh

Subject: Re: [REGRESSION 5.0.8] Dell thunderbolt dock broken (xhci_hcd and thunderbolt)

On Thu, May 2, 2019 at 4:48 AM Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Tue, Apr 30, 2019 at 11:37:48AM +0200, Rafael J. Wysocki wrote:
> > On Tue, Apr 30, 2019 at 11:00 AM Mika Westerberg
> > <[email protected]> wrote:
> > >
> > > +Rafael, Furquan and linux-acpi
> > >
> > > (The original thread is here https://lore.kernel.org/lkml/[email protected]/T/#u)
> > >
> > > On Tue, Apr 30, 2019 at 10:39:00AM +0200, Michael Hirmke wrote:
> > > > Hi Takashi,
> > > >
> > > > [...]
> > > > >>> I also have XPS 9370 but not that particular dock. I will check tomorrow
> > > > >>> if I can reproduce it as well.
> > > > >>
> > > > >> There aren't too many changes between 5.0.7 and 5.0.8 that touch
> > > > >> PCI/ACPI. This is just a shot in the dark but could you try to revert:
> > > > >>
> > > > >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/commit/?h=linux-5.0.y&id=da6a87fb0ad43ae811519d2e0aa325c7f792b13a
> > > > >>
> > > > >> and see if it makes any difference?
> > > >
> > > > >OK, I'm building a test kernel package with the revert in OBS
> > > > >home:tiwai:bsc1133486 repo. A new kernel will be
> > > > >kernel-default-5.0.10-*g8edeab8:
> > > > > http://download.opensuse.org/repositories/home:/tiwai:/bsc1133486/standard/
> > > >
> > > > >Michael, once when the new kernel is ready, please give it a try.
> > > >
> > > > as far as I can see, state is back to normal with this kernel.
> > > > No more error messages or crashing modules and all devices seem to work
> > > > as expected.
> > > > Only thing is, that the external devices connected to the Thunderbolt
> > > > dock are coming up a little bit slower than with 5.0.7 - but this is
> > > > nothing, I'd worry about.
> > >
> > > Thanks for testing.
> > >
> > > Rafael, it seems that commit c8b1917c8987 ("ACPICA: Clear status of GPEs
> > > before enabling them") causes problem with Thunderbolt controllers if
> > > you boot with device (dock) connected.
> > >
> > > I think the reason is the same that got fixed in v4.14 with commit
> > > ecc1165b8b74 ("ACPICA: Dispatch active GPEs at init time") which the
> > > above commit essentially undoes if I understand it correctly.
> >
> > OK, I'll queue up a revert of that one then, thanks!
> >
> > Erik, I think that commit c8b1917c8987 has been picked up by the
> > upstream ACPICA already. If I'm not mistaken, it needs to be reverted
> > from there as well.
>
> I've queued the revert up in the stable trees as it has hit Linus's tree
> now, and will push out a new round of stable kernels soon.
>
> thanks,
>
> greg k-h

Thanks for reporting the issue, and apologies for the breakage. When I
pushed the patch, my understanding was that device drivers do not
depend on stale GPE events to take any action.

I am curious to understand the behavior of the thunderbolt device,
since I do not have one to test with. The failure seems to be a result
of having either an edge-triggered interrupt or a pulse interrupt that
indicates some kind of ready condition to the kernel driver. All the
runtime GPEs seem to be initialized as part of acpi_init before the ACPI
bus is scanned. So, is this some special requirement for thunderbolt
that needs the GPE enabled before the device can actually be probed?
And are the GPEs that go active before being enabled then used as a way
to call into an ACPI method to enable something that is essential for
probing the device?

The other question I have is: given that handling of GPE events that
were active before being enabled is required at least for some set of
devices (e.g. thunderbolt), what is a good way to solve the original
problem that the reverted patch was addressing, i.e. stale events
resulting in spurious wakes on wakeup GPEs? One way I can think of is
clearing the status of GPEs when they are set up for wake
(acpi_setup_gpe_for_wake). What do you think?
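
To make the proposal concrete, here is a toy model of it. This is only a sketch and not ACPICA code: GpeState, setup_gpe_for_wake() and enable_gpe() are hypothetical stand-ins, and the model assumes that a latched status bit is dispatched as soon as the GPE is enabled.

```python
from dataclasses import dataclass


@dataclass
class GpeState:
    status: bool = False   # latched when the event fires
    enabled: bool = False
    wake: bool = False


def setup_gpe_for_wake(gpe: GpeState) -> None:
    """Hypothetical stand-in for the wake setup path."""
    gpe.wake = True
    gpe.status = False     # proposed: drop stale events here, and only here


def enable_gpe(gpe: GpeState) -> bool:
    """Enable the GPE; return True if a latched event gets dispatched."""
    gpe.enabled = True
    pending = gpe.status
    gpe.status = False     # dispatching consumes the latched event
    return pending


# Runtime GPE that fired before being enabled: the event is still handled.
runtime = GpeState(status=True)
runtime_dispatched = enable_gpe(runtime)

# Wake GPE with a stale event: cleared at setup, so no spurious wake later.
wake = GpeState(status=True)
setup_gpe_for_wake(wake)
wake_dispatched = enable_gpe(wake)

print("runtime GPE dispatched:", runtime_dispatched)
print("wake GPE dispatched:", wake_dispatched)
```

In this model, runtime GPEs keep the dispatch-at-enable behaviour that the thunderbolt case apparently needs, while only wake GPEs lose their stale status.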

2019-05-06 06:46:45

by Mika Westerberg

Subject: Re: [REGRESSION 5.0.8] Dell thunderbolt dock broken (xhci_hcd and thunderbolt)

On Fri, May 03, 2019 at 04:35:02PM -0700, Furquan Shaikh wrote:
> Thanks for reporting the issue and apologize for the breakage. When I
> pushed the patch, my understanding was that the device drivers do not
> depend on stale GPE events to take any action.
>
> I am curious to understand the behavior for the thunderbolt device
> since I do not have one to test with. The failure seems to be a result
> of either having a edge-triggered interrupt or a pulse interrupt which
> indicates some kind of ready condition to the kernel driver. All the
> runtime GPEs seem to be initialized as part of acpi_init before ACPI
> bus is scanned. So, is this some special kind of requirement for
> thunderbolt that requires GPE enabled before the device can actually
> be probed. And so the GPEs going active before being enabled are then
> used as a way to call into ACPI Method to enable something which is
> essential for probing of device?

IIRC the idea is that when you boot with a TBT device connected (this is
only for the BIOS-assisted/ACPI enumeration mode), the Thunderbolt host
router (the device with PCIe switch + xHCI + NHI) is configured in two
phases. The basic configuration is done in the ASL code, which then waits
for a synchronization event (signal) from the SMI hotplug handler that
allows it to continue. The GPE, which can be either edge or level, is then
used to call the SMI hotplug handler to initialize the host router and
its resources properly.

If this is not done, the PCI stack finds the host router half-configured,
causing the failure.
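
The sequence above can be illustrated with a toy model. This is only a sketch, not ACPICA or firmware code: the Gpe class and boot_with_dock() are hypothetical, and the clear_stale_status flag merely stands in for the behaviour this thread attributes to commit c8b1917c8987, with the dispatch-on-enable path standing in for ecc1165b8b74.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Gpe:
    handler: Callable[[], None]
    status: bool = False    # latched when the event fires
    enabled: bool = False

    def fire(self) -> None:
        """Hardware raises the event; it is handled only if enabled."""
        self.status = True
        if self.enabled:
            self._dispatch()

    def _dispatch(self) -> None:
        self.status = False
        self.handler()

    def enable(self, clear_stale_status: bool) -> None:
        if clear_stale_status:
            self.status = False   # event that fired before enabling is lost
        self.enabled = True
        if self.status:
            self._dispatch()      # dispatch an event that was already active


def boot_with_dock(clear_stale_status: bool) -> bool:
    """Return True if the host router ends up fully configured."""
    state = {"configured": False}

    def hotplug_handler() -> None:
        # Second configuration phase, triggered through the GPE.
        state["configured"] = True

    gpe = Gpe(handler=hotplug_handler)
    gpe.fire()                      # dock present at boot: fires before enable
    gpe.enable(clear_stale_status)  # OS enables runtime GPEs later
    return state["configured"]


print("dispatch active GPEs:", boot_with_dock(clear_stale_status=False))
print("clear status first:  ", boot_with_dock(clear_stale_status=True))
```

With the status cleared before enabling, the handler never runs in this model, which corresponds to the half-configured host router.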

> The other question I have is given that handling of GPE events that
> were active before being enabled is required at least for some set of
> devices (e.g. thunderbolt), what is a good way to solve the original
> problem that was being addressed by the patch being reverted i.e.
> stale events resulting in spurious wakes on wakeup GPEs. One way I can
> think of is clearing the status of GPEs when they are setup for
> wake(acpi_setup_gpe_for_wake). What do you think?

Sounds good to me.

2019-05-09 11:53:30

by Furquan Shaikh

Subject: Re: [REGRESSION 5.0.8] Dell thunderbolt dock broken (xhci_hcd and thunderbolt)

On Sun, May 5, 2019 at 11:45 PM Mika Westerberg
<[email protected]> wrote:
>
> On Fri, May 03, 2019 at 04:35:02PM -0700, Furquan Shaikh wrote:
> > Thanks for reporting the issue and apologize for the breakage. When I
> > pushed the patch, my understanding was that the device drivers do not
> > depend on stale GPE events to take any action.
> >
> > I am curious to understand the behavior for the thunderbolt device
> > since I do not have one to test with. The failure seems to be a result
> > of either having a edge-triggered interrupt or a pulse interrupt which
> > indicates some kind of ready condition to the kernel driver. All the
> > runtime GPEs seem to be initialized as part of acpi_init before ACPI
> > bus is scanned. So, is this some special kind of requirement for
> > thunderbolt that requires GPE enabled before the device can actually
> > be probed. And so the GPEs going active before being enabled are then
> > used as a way to call into ACPI Method to enable something which is
> > essential for probing of device?
>
> IIRC the idea is that when you boot with a TBT device connected (this is
> only for the BIOS assisted/ACPI enumeration mode) the Thunderbolt host
> router (the device with PCIe switch + xHCI + NHI) is configured in two
> phases. The basic configuration is done in the ASL code that then waits
> for a synchronization event (signal) from the SMI hotplug handler that
> allows it to continue. The GPE which can be either edge or level is then
> used to call the SMI hotplug handler to initialize the host router and
> its resources properly.
>
> If this is not done the PCI stack finds the host router half-configured
> causing the failure.

Thanks for the explanation!

>
> > The other question I have is given that handling of GPE events that
> > were active before being enabled is required at least for some set of
> > devices (e.g. thunderbolt), what is a good way to solve the original
> > problem that was being addressed by the patch being reverted i.e.
> > stale events resulting in spurious wakes on wakeup GPEs. One way I can
> > think of is clearing the status of GPEs when they are setup for
> > wake(acpi_setup_gpe_for_wake). What do you think?
>
> Sounds good to me.

I will work on this and test it out to see how it goes. Thanks!