2024-05-05 01:12:35

by Micha Albert

Subject: [REGRESSION] Thunderbolt Host Reset Change Causes eGPU Disconnection from 6.8.7=>6.8.8

Hello,

I have an AMD Radeon 6600 XT GPU in a cheap Thunderbolt eGPU board. In 6.8.7, this works as expected, and my Plymouth screen (including the LUKS password prompt) shows on my 2 monitors connected to the GPU as well as my main laptop screen. Upon entering the password, I'm put into userspace as expected. However, upon upgrading to 6.8.8, I will be greeted with the regular password prompt, but after entering my password and waiting for it to be accepted, my eGPU will reset and not function. I can tell that it resets since I can hear the click of my ATX power supply turning off and on again, and the status LED of the eGPU board goes from green to blue and back to green, all in less than a second.

I talked to a friend, and we found out that the kernel parameter thunderbolt.host_reset=false fixes the issue. He also thinks that commits cc4c94 (59a54c upstream) and 11371c (ec8162 upstream) look suspicious. I've attached the output of dmesg when the error was occurring, since I'm still able to use my laptop normally when this happens, just not with my eGPU and its connected displays.

Sincerely,
Micha Albert


Attachments:
kernel-log-thunderbolt-error.log (119.08 kB)

2024-05-05 05:00:10

by Thorsten Leemhuis

Subject: Re: [REGRESSION] Thunderbolt Host Reset Change Causes eGPU Disconnection from 6.8.7=>6.8.8

[CCing Mario, who asked for the two suspected commits to be backported]

On 05.05.24 03:12, Micha Albert wrote:
>
>     I have an AMD Radeon 6600 XT GPU in a cheap Thunderbolt eGPU board.
> In 6.8.7, this works as expected, and my Plymouth screen (including the
> LUKS password prompt) shows on my 2 monitors connected to the GPU as
> well as my main laptop screen. Upon entering the password, I'm put into
> userspace as expected. However, upon upgrading to 6.8.8, I will be
> greeted with the regular password prompt, but after entering my password
> and waiting for it to be accepted, my eGPU will reset and not function.
> I can tell that it resets since I can hear the click of my ATX power
> supply turning off and on again, and the status LED of the eGPU board
> goes from green to blue and back to green, all in less than a second.
>
>    I talked to a friend, and we found out that the kernel parameter
> thunderbolt.host_reset=false fixes the issue. He also thinks that
> commits cc4c94 (59a54c upstream) and 11371c (ec8162 upstream) look
> suspicious. I've attached the output of dmesg when the error was
> occurring, since I'm still able to use my laptop normally when this
> happens, just not with my eGPU and its connected displays.

Thx for the report. Could you please test if 6.9-rc6 (or a later
snapshot; or -rc7, which should be out in about ~18 hours) is affected
as well? That would be really important to know.

It would also be great if you could try reverting the two patches you
mentioned and see if they are really what's causing this. There iirc are
two more; maybe you might need to revert some or all of them in the
order they were applied.

Ciao, Thorsten

P.s.: To be sure the issue doesn't fall through the cracks unnoticed,
I'm adding it to regzbot, the Linux kernel regression tracking bot:

#regzbot ^introduced v6.8.7..v6.8.8
#regzbot title thunderbolt: eGPU disconnected during boot

2024-05-05 12:37:20

by Mario Limonciello

Subject: Re: [REGRESSION] Thunderbolt Host Reset Change Causes eGPU Disconnection from 6.8.7=>6.8.8



On 5/4/24 23:59, Linux regression tracking (Thorsten Leemhuis) wrote:
> [CCing Mario, who asked for the two suspected commits to be backported]
>
> On 05.05.24 03:12, Micha Albert wrote:
>>
>>     I have an AMD Radeon 6600 XT GPU in a cheap Thunderbolt eGPU board.
>> In 6.8.7, this works as expected, and my Plymouth screen (including the
>> LUKS password prompt) shows on my 2 monitors connected to the GPU as
>> well as my main laptop screen. Upon entering the password, I'm put into
>> userspace as expected. However, upon upgrading to 6.8.8, I will be
>> greeted with the regular password prompt, but after entering my password
>> and waiting for it to be accepted, my eGPU will reset and not function.
>> I can tell that it resets since I can hear the click of my ATX power
>> supply turning off and on again, and the status LED of the eGPU board
>> goes from green to blue and back to green, all in less than a second.
>>
>>    I talked to a friend, and we found out that the kernel parameter
>> thunderbolt.host_reset=false fixes the issue. He also thinks that
>> commits cc4c94 (59a54c upstream) and 11371c (ec8162 upstream) look
>> suspicious. I've attached the output of dmesg when the error was
>> occurring, since I'm still able to use my laptop normally when this
>> happens, just not with my eGPU and its connected displays.
>
> Thx for the report. Could you please test if 6.9-rc6 (or a later
> snapshot; or -rc7, which should be out in about ~18 hours) is affected
> as well? That would be really important to know.
>
> It would also be great if you could try reverting the two patches you
> mentioned and see if they are really what's causing this. There iirc are
> two more; maybe you might need to revert some or all of them in the
> order they were applied.

There are two other things that I think would be good to check in order
to understand this issue.

1) Is it related to trusted devices handling?

You can try applying the following patch to either 6.8.y or 6.9-rc:

https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=iommu/fixes&id=0f91d0795741c12cee200667648669a91b568735

2) Is it because you have amdgpu in your initramfs but not thunderbolt?

If so, there's very likely an ordering issue.

[ 2.325788] [drm] GPU posting now...
[ 30.360701] ACPI: bus type thunderbolt registered

Can you remove amdgpu from your initramfs and wait for it to start up
after you pivot rootfs? Does this still happen?
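
The ordering question can be checked from a saved kernel log. The following is only a sketch: it assumes the two messages quoted above appear in the log, and the function name `tb_probe_order` is made up for illustration.

```shell
#!/bin/sh
# Read a kernel log on stdin and report whether amdgpu posted the GPU
# before or after the thunderbolt bus type was registered.
# The function name is hypothetical; the matched strings are the two
# log messages quoted in this mail.
tb_probe_order() {
    awk '
    function stamp(line,   s) {
        # Pull the number out of a leading "[   12.345678]" timestamp.
        if (match(line, /\[ *[0-9]+\.[0-9]+\]/) == 0) return -1
        s = substr(line, RSTART + 1, RLENGTH - 2)
        return s + 0
    }
    /\[drm\] GPU posting now/         { gpu = stamp($0) }
    /bus type thunderbolt registered/ { tb = stamp($0) }
    END {
        if (gpu == "" || tb == "") { print "unknown"; exit }
        if (gpu < tb) print "amdgpu-first"
        else          print "thunderbolt-first"
    }'
}
```

For example, `dmesg | tb_probe_order` on the affected boot would be expected to print `amdgpu-first` for the log excerpt above.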

>
> Ciao, Thorsten
>
> P.s.: To be sure the issue doesn't fall through the cracks unnoticed,
> I'm adding it to regzbot, the Linux kernel regression tracking bot:
>
> #regzbot ^introduced v6.8.7..v6.8.8
> #regzbot title thunderbolt: eGPU disconnected during boot
>

2024-05-05 14:23:56

by Mario Limonciello

Subject: Re: [REGRESSION] Thunderbolt Host Reset Change Causes eGPU Disconnection from 6.8.7=>6.8.8

On 5/5/2024 07:37, Mario Limonciello wrote:
>
>
> On 5/4/24 23:59, Linux regression tracking (Thorsten Leemhuis) wrote:
>> [CCing Mario, who asked for the two suspected commits to be backported]
>>
>> On 05.05.24 03:12, Micha Albert wrote:
>>>
>>>      I have an AMD Radeon 6600 XT GPU in a cheap Thunderbolt eGPU board.
>>> In 6.8.7, this works as expected, and my Plymouth screen (including the
>>> LUKS password prompt) shows on my 2 monitors connected to the GPU as
>>> well as my main laptop screen. Upon entering the password, I'm put into
>>> userspace as expected. However, upon upgrading to 6.8.8, I will be
>>> greeted with the regular password prompt, but after entering my password
>>> and waiting for it to be accepted, my eGPU will reset and not function.
>>> I can tell that it resets since I can hear the click of my ATX power
>>> supply turning off and on again, and the status LED of the eGPU board
>>> goes from green to blue and back to green, all in less than a second.
>>>
>>>     I talked to a friend, and we found out that the kernel parameter
>>> thunderbolt.host_reset=false fixes the issue. He also thinks that
>>> commits cc4c94 (59a54c upstream) and 11371c (ec8162 upstream) look
>>> suspicious. I've attached the output of dmesg when the error was
>>> occurring, since I'm still able to use my laptop normally when this
>>> happens, just not with my eGPU and its connected displays.
>>
>> Thx for the report. Could you please test if 6.9-rc6 (or a later
>> snapshot; or -rc7, which should be out in about ~18 hours) is affected
>> as well? That would be really important to know.
>>
>> It would also be great if you could try reverting the two patches you
>> mentioned and see if they are really what's causing this. There iirc are
>> two more; maybe you might need to revert some or all of them in the
>> order they were applied.
>
> There are two other things that I think would be good to understand this
> issue.
>
> 1) Is it related to trusted devices handling?
>
> You can try to apply it both to 6.8.y or to 6.9-rc.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=iommu/fixes&id=0f91d0795741c12cee200667648669a91b568735
>
> 2) Is it because you have amdgpu in your initramfs but not thunderbolt?
>
> If so; there's very likely an ordering issue.
>
> [    2.325788] [drm] GPU posting now...
> [   30.360701] ACPI: bus type thunderbolt registered
>
> Can you remove amdgpu from your initramfs and wait for it to startup
> after you pivot rootfs?  Does this still happen?
>

One more thought. When you say it will "not function", is it authorized in
thunderbolt sysfs?

See
https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/thunderbolt.rst

Is it still showing up in lspci?
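
A minimal sketch of the authorization check, assuming the sysfs layout described in the linked admin guide (each device directory under /sys/bus/thunderbolt/devices carries an `authorized` attribute). The function name `tb_list_authorized` and its optional directory argument are illustrative only and not part of any existing tool.

```shell
#!/bin/sh
# Print "<device> authorized=<value>" for every Thunderbolt device that
# exposes an "authorized" attribute. The directory argument is optional
# and exists only so the function can be pointed at a test tree.
tb_list_authorized() {
    dir="${1:-/sys/bus/thunderbolt/devices}"
    for dev in "$dir"/*; do
        # Entries without a readable "authorized" file are skipped.
        [ -r "$dev/authorized" ] || continue
        printf '%s authorized=%s\n' "$(basename "$dev")" "$(cat "$dev/authorized")"
    done
}
```

Running `tb_list_authorized` and then `lspci` on both 6.8.7 and 6.8.8 should show whether the eGPU is authorized (a non-zero value) and whether it is still enumerated on the PCI bus.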

>>
>> Ciao, Thorsten
>>
>> P.s.: To be sure the issue doesn't fall through the cracks unnoticed,
>> I'm adding it to regzbot, the Linux kernel regression tracking bot:
>>
>> #regzbot ^introduced v6.8.7..v6.8.8
>> #regzbot title thunderbolt: eGPU disconnected during boot
>>


2024-05-06 12:26:17

by Gia

Subject: Re: [REGRESSION] Thunderbolt Host Reset Change Causes eGPU Disconnection from 6.8.7=>6.8.8

Hello, from 6.8.7=>6.8.8 I run into a similar problem with my Caldigit
TS3 Plus Thunderbolt 3 dock.

After the update I see this message on boot "xHCI host controller not
responding, assume dead" and the dock is not working anymore. Kernel
6.8.7 works great.

2024-05-06 12:54:18

by Thorsten Leemhuis

Subject: Re: [REGRESSION] Thunderbolt Host Reset Change Causes eGPU Disconnection from 6.8.7=>6.8.8

[CCing Mario, who asked for the two suspected commits to be backported]

On 06.05.24 14:24, Gia wrote:
> Hello, from 6.8.7=>6.8.8 I run into a similar problem with my Caldigit
> TS3 Plus Thunderbolt 3 dock.
>
> After the update I see this message on boot "xHCI host controller not
> responding, assume dead" and the dock is not working anymore. Kernel
> 6.8.7 works great.

Thx for the report. Could you make the kernel log (journalctl -k/dmesg)
accessible somewhere?

And have you looked into the other stuff that Mario suggested in the
other thread? See the following mail and the reply to it for details:

https://lore.kernel.org/all/[email protected]/T/#u

Ciao, Thorsten

P.S.: To be sure the issue doesn't fall through the cracks unnoticed,
I'm adding it to regzbot, the Linux kernel regression tracking bot:

#regzbot ^introduced v6.8.7..v6.8.8
#regzbot title thunderbolt: TB3 dock problems, xHCI host controller not responding, assume dead

2024-05-20 09:19:56

by Gia

Subject: Re: [REGRESSION] Thunderbolt Host Reset Change Causes eGPU Disconnection from 6.8.7=>6.8.8

Hi Thorsten,

I'll try to provide a kernel log ASAP; it's not that easy because when
I run into this issue my keyboard isn't working. The kernel parameter
that Mario suggested, thunderbolt.host_reset=false, fixes the issue!

I can add that without the suggested kernel parameter the issue
persists with the latest Arch Linux kernel 6.9.1.

I also found another report of the issue on the Arch Linux forum:
https://bbs.archlinux.org/viewtopic.php?id=295824


On Mon, May 6, 2024 at 2:53 PM Linux regression tracking (Thorsten
Leemhuis) <[email protected]> wrote:
>
> [CCing Mario, who asked for the two suspected commits to be backported]
>
> On 06.05.24 14:24, Gia wrote:
> > Hello, from 6.8.7=>6.8.8 I run into a similar problem with my Caldigit
> > TS3 Plus Thunderbolt 3 dock.
> >
> > After the update I see this message on boot "xHCI host controller not
> > responding, assume dead" and the dock is not working anymore. Kernel
> > 6.8.7 works great.
>
> Thx for the report. Could you make the kernel log (journalctl -k/dmesg)
> accessible somewhere?
>
> And have you looked into the other stuff that Mario suggested in the
> other thread? See the following mail and the reply to it for details:
>
> https://lore.kernel.org/all/[email protected]/T/#u
>
> Ciao, Thorsten
>
> P.S.: To be sure the issue doesn't fall through the cracks unnoticed,
> I'm adding it to regzbot, the Linux kernel regression tracking bot:
>
> #regzbot ^introduced v6.8.7..v6.8.8
> #regzbot title thunderbolt: TB3 dock problems, xHCI host controller not
> responding, assume dead

2024-05-20 13:43:41

by Mario Limonciello

Subject: Re: [REGRESSION] Thunderbolt Host Reset Change Causes eGPU Disconnection from 6.8.7=>6.8.8

Can we please get kernel logs for each of these two sets of options on the
kernel command line?

thunderbolt.dyndbg=+p
thunderbolt.dyndbg=+p thunderbolt.host_reset=false

Also what is the value for:

$ cat /sys/bus/thunderbolt/devices/domain0/iommu_dma_protection

That won't change in the two cases, but it will be really helpful to
understand this issue.
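
Before saving each log it helps to confirm which options the running kernel was actually booted with. This is a sketch only; the helper names `cmdline_has` and `tb_boot_label` and the output labels are invented for illustration.

```shell
#!/bin/sh
# Succeed if the given kernel parameter appears as a whole word in a
# command-line file (default: /proc/cmdline).
cmdline_has() {
    param="$1"
    file="${2:-/proc/cmdline}"
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# Print a label for the current boot, based on whether the host reset
# was disabled on the command line. Labels are arbitrary examples.
tb_boot_label() {
    file="${1:-/proc/cmdline}"
    if cmdline_has thunderbolt.host_reset=false "$file"; then
        echo "dyndbg_host_reset_false"
    else
        echo "dyndbg"
    fi
}
```

One possible use is `dmesg > "dmesg_$(tb_boot_label).txt"`, run once per boot, so the two logs end up in clearly named files.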

On 5/20/2024 04:19, Gia wrote:
> Hi Thorsten,
>
> I'll try to provide a kernel log ASAP, it's not that easy because when
> I run into this issue my keyboard isn't working. The kernel parameter
> that Mario suggested, thunderbolt.host_reset=false, fixes the issue!
>
> I can add that without the suggested kernel parameter the issue
> persists with the latest Archlinux kernel 6.9.1.
>
> I also found another report of the issue on Archlinux forum:
> https://bbs.archlinux.org/viewtopic.php?id=295824
>
>
> On Mon, May 6, 2024 at 2:53 PM Linux regression tracking (Thorsten
> Leemhuis) <[email protected]> wrote:
>>
>> [CCing Mario, who asked for the two suspected commits to be backported]
>>
>> On 06.05.24 14:24, Gia wrote:
>>> Hello, from 6.8.7=>6.8.8 I run into a similar problem with my Caldigit
>>> TS3 Plus Thunderbolt 3 dock.
>>>
>>> After the update I see this message on boot "xHCI host controller not
>>> responding, assume dead" and the dock is not working anymore. Kernel
>>> 6.8.7 works great.
>>
>> Thx for the report. Could you make the kernel log (journalctl -k/dmesg)
>> accessible somewhere?
>>
>> And have you looked into the other stuff that Mario suggested in the
>> other thread? See the following mail and the reply to it for details:
>>
>> https://lore.kernel.org/all/[email protected]/T/#u
>>
>> Ciao, Thorsten
>>
>> P.S.: To be sure the issue doesn't fall through the cracks unnoticed,
>> I'm adding it to regzbot, the Linux kernel regression tracking bot:
>>
>> #regzbot ^introduced v6.8.7..v6.8.8
>> #regzbot title thunderbolt: TB3 dock problems, xHCI host controller not
>> responding, assume dead


2024-05-20 14:40:11

by Christian Heusel

Subject: Re: [REGRESSION][BISECTED] "xHCI host controller not responding, assume dead" on stable kernel > 6.8.7

On 24/05/06 02:53PM, Linux regression tracking (Thorsten Leemhuis) wrote:
> [CCing Mario, who asked for the two suspected commits to be backported]
>
> On 06.05.24 14:24, Gia wrote:
> > Hello, from 6.8.7=>6.8.8 I run into a similar problem with my Caldigit
> > TS3 Plus Thunderbolt 3 dock.
> >
> > After the update I see this message on boot "xHCI host controller not
> > responding, assume dead" and the dock is not working anymore. Kernel
> > 6.8.7 works great.

We now have some further information on the matter as somebody was kind
enough to bisect the issue in the [Arch Linux Forums][0]:

cc4c94a5f6c4 ("thunderbolt: Reset topology created by the boot firmware")

This is a stable commit id, the relevant mainline commit is:

59a54c5f3dbd ("thunderbolt: Reset topology created by the boot firmware")

The other reporter created [an issue][1] in our bugtracker, which I'll
leave here just for completeness' sake.

Reported-by: Benjamin Böhmke <[email protected]>
Reported-by: Gia <[email protected]>
Bisected-by: Benjamin Böhmke <[email protected]>

The person doing the bisection also offered to chime in here if further
debugging is needed!

Also CC'ing the commit authors & subsystem maintainers for this report.

Cheers,
Christian

[0]: https://bbs.archlinux.org/viewtopic.php?pid=2172526
[1]: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/48

#regzbot introduced: 59a54c5f3dbd
#regzbot link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/48


Attachments:
(No filename) (1.53 kB)
signature.asc (849.00 B)

2024-05-20 14:42:01

by Mario Limonciello

Subject: Re: [REGRESSION][BISECTED] "xHCI host controller not responding, assume dead" on stable kernel > 6.8.7

On 5/20/2024 09:39, Christian Heusel wrote:
> On 24/05/06 02:53PM, Linux regression tracking (Thorsten Leemhuis) wrote:
>> [CCing Mario, who asked for the two suspected commits to be backported]
>>
>> On 06.05.24 14:24, Gia wrote:
>>> Hello, from 6.8.7=>6.8.8 I run into a similar problem with my Caldigit
>>> TS3 Plus Thunderbolt 3 dock.
>>>
>>> After the update I see this message on boot "xHCI host controller not
>>> responding, assume dead" and the dock is not working anymore. Kernel
>>> 6.8.7 works great.
>
> We now have some further information on the matter as somebody was kind
> enough to bisect the issue in the [Arch Linux Forums][0]:
>
> cc4c94a5f6c4 ("thunderbolt: Reset topology created by the boot firmware")
>
> This is a stable commit id, the relevant mainline commit is:
>
> 59a54c5f3dbd ("thunderbolt: Reset topology created by the boot firmware")
>
> The other reporter created [a issue][1] in our bugtracker, which I'll
> leave here just for completeness sake.
>
> Reported-by: Benjamin Böhmke <[email protected]>
> Reported-by: Gia <[email protected]>
> Bisected-by: Benjamin Böhmke <[email protected]>
>
> The person doing the bisection also offered to chime in here if further
> debugging is needed!
>
> Also CC'ing the Commitauthors & Subsystem Maintainers for this report.
>
> Cheers,
> Christian
>
> [0]: https://bbs.archlinux.org/viewtopic.php?pid=2172526
> [1]: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/48
>
> #regzbot introduced: 59a54c5f3dbd
> #regzbot link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/48

As I mentioned in my other email, I would like to collate logs on a
kernel Bugzilla, with these two cases:

thunderbolt.dyndbg=+p
thunderbolt.dyndbg=+p thunderbolt.host_reset=false

Also what is the value for:

$ cat /sys/bus/thunderbolt/devices/domain0/iommu_dma_protection

2024-05-20 15:23:21

by Benjamin Böhmke

Subject: Re: [REGRESSION][BISECTED] "xHCI host controller not responding, assume dead" on stable kernel > 6.8.7

On Monday, May 20, 2024 16:41 CEST, Mario Limonciello <[email protected]> wrote:

> On 5/20/2024 09:39, Christian Heusel wrote:
> > On 24/05/06 02:53PM, Linux regression tracking (Thorsten Leemhuis) wrote:
> >> [CCing Mario, who asked for the two suspected commits to be backported]
> >>
> >> On 06.05.24 14:24, Gia wrote:
> >>> Hello, from 6.8.7=>6.8.8 I run into a similar problem with my Caldigit
> >>> TS3 Plus Thunderbolt 3 dock.
> >>>
> >>> After the update I see this message on boot "xHCI host controller not
> >>> responding, assume dead" and the dock is not working anymore. Kernel
> >>> 6.8.7 works great.
> >
> > We now have some further information on the matter as somebody was kind
> > enough to bisect the issue in the [Arch Linux Forums][0]:
> >
> > cc4c94a5f6c4 ("thunderbolt: Reset topology created by the boot firmware")
> >
> > This is a stable commit id, the relevant mainline commit is:
> >
> > 59a54c5f3dbd ("thunderbolt: Reset topology created by the boot firmware")
> >
> > The other reporter created [a issue][1] in our bugtracker, which I'll
> > leave here just for completeness sake.
> >
> > Reported-by: Benjamin Böhmke <[email protected]>
> > Reported-by: Gia <[email protected]>
> > Bisected-by: Benjamin Böhmke <[email protected]>
> >
> > The person doing the bisection also offered to chime in here if further
> > debugging is needed!
> >
> > Also CC'ing the Commitauthors & Subsystem Maintainers for this report.
> >
> > Cheers,
> > Christian
> >
> > [0]: https://bbs.archlinux.org/viewtopic.php?pid=2172526
> > [1]: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/48
> >
> > #regzbot introduced: 59a54c5f3dbd
> > #regzbot link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/48
>
> As I mentioned in my other email I would like to collate logs onto a
> kernel Bugzilla. With these two cases:
>
> thunderbolt.dyndbg=+p
> thunderbolt.dyndbg=+p thunderbolt.host_reset=false
>
> Also what is the value for:
>
> $ cat /sys/bus/thunderbolt/devices/domain0/iommu_dma_protection

I attached the requested kernel logs as text files (hope this is ok).
In both cases I used the stable Arch Linux kernel 6.9.1.

The iommu_dma_protection is "1" in both cases.

Best Regards
Benjamin


Attachments:
dmesg_tb_dbg__reset_false.txt (107.40 kB)
dmesg_tb_dbg.txt (128.12 kB)

2024-05-20 15:58:50

by Gia

Subject: Re: [REGRESSION][BISECTED] "xHCI host controller not responding, assume dead" on stable kernel > 6.8.7

Hi Mario,

In my case, the value in both cases for:

$ cat /sys/bus/thunderbolt/devices/domain0/iommu_dma_protection

is 0.

Output of sudo journalctl -k with kernel option thunderbolt.dyndbg=+p:
https://codeshare.io/qAXLoj

Output of sudo dmesg with kernel option thunderbolt.dyndbg=+p:
https://codeshare.io/zlPgRb

Output of sudo journalctl -k with kernel options thunderbolt.dyndbg=+p
thunderbolt.host_reset=false:
https://codeshare.io/Lj3rPV

Output of sudo dmesg with kernel options thunderbolt.dyndbg=+p
thunderbolt.host_reset=false:
https://codeshare.io/beQw36

Best

Giacomo

On Mon, May 20, 2024 at 4:41 PM Mario Limonciello
<[email protected]> wrote:
>
> On 5/20/2024 09:39, Christian Heusel wrote:
> > On 24/05/06 02:53PM, Linux regression tracking (Thorsten Leemhuis) wrote:
> >> [CCing Mario, who asked for the two suspected commits to be backported]
> >>
> >> On 06.05.24 14:24, Gia wrote:
> >>> Hello, from 6.8.7=>6.8.8 I run into a similar problem with my Caldigit
> >>> TS3 Plus Thunderbolt 3 dock.
> >>>
> >>> After the update I see this message on boot "xHCI host controller not
> >>> responding, assume dead" and the dock is not working anymore. Kernel
> >>> 6.8.7 works great.
> >
> > We now have some further information on the matter as somebody was kind
> > enough to bisect the issue in the [Arch Linux Forums][0]:
> >
> > cc4c94a5f6c4 ("thunderbolt: Reset topology created by the boot firmware")
> >
> > This is a stable commit id, the relevant mainline commit is:
> >
> > 59a54c5f3dbd ("thunderbolt: Reset topology created by the boot firmware")
> >
> > The other reporter created [a issue][1] in our bugtracker, which I'll
> > leave here just for completeness sake.
> >
> > Reported-by: Benjamin Böhmke <[email protected]>
> > Reported-by: Gia <[email protected]>
> > Bisected-by: Benjamin Böhmke <[email protected]>
> >
> > The person doing the bisection also offered to chime in here if further
> > debugging is needed!
> >
> > Also CC'ing the Commitauthors & Subsystem Maintainers for this report.
> >
> > Cheers,
> > Christian
> >
> > [0]: https://bbs.archlinux.org/viewtopic.php?pid=2172526
> > [1]: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/48
> >
> > #regzbot introduced: 59a54c5f3dbd
> > #regzbot link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/48
>
> As I mentioned in my other email I would like to collate logs onto a
> kernel Bugzilla. With these two cases:
>
> thunderbolt.dyndbg=+p
> thunderbolt.dyndbg=+p thunderbolt.host_reset=false
>
> Also what is the value for:
>
> $ cat /sys/bus/thunderbolt/devices/domain0/iommu_dma_protection

2024-05-20 16:21:59

by Mika Westerberg

Subject: Re: [REGRESSION][BISECTED] "xHCI host controller not responding, assume dead" on stable kernel > 6.8.7

Hi,

On Mon, May 20, 2024 at 05:12:40PM +0200, Benjamin Böhmke wrote:
> On Monday, May 20, 2024 16:41 CEST, Mario Limonciello <[email protected]> wrote:
>
> > On 5/20/2024 09:39, Christian Heusel wrote:
> > > On 24/05/06 02:53PM, Linux regression tracking (Thorsten Leemhuis) wrote:
> > >> [CCing Mario, who asked for the two suspected commits to be backported]
> > >>
> > >> On 06.05.24 14:24, Gia wrote:
> > >>> Hello, from 6.8.7=>6.8.8 I run into a similar problem with my Caldigit
> > >>> TS3 Plus Thunderbolt 3 dock.
> > >>>
> > >>> After the update I see this message on boot "xHCI host controller not
> > >>> responding, assume dead" and the dock is not working anymore. Kernel
> > >>> 6.8.7 works great.
> > >
> > > We now have some further information on the matter as somebody was kind
> > > enough to bisect the issue in the [Arch Linux Forums][0]:
> > >
> > > cc4c94a5f6c4 ("thunderbolt: Reset topology created by the boot firmware")
> > >
> > > This is a stable commit id, the relevant mainline commit is:
> > >
> > > 59a54c5f3dbd ("thunderbolt: Reset topology created by the boot firmware")
> > >
> > > The other reporter created [a issue][1] in our bugtracker, which I'll
> > > leave here just for completeness sake.
> > >
> > > Reported-by: Benjamin Böhmke <[email protected]>
> > > Reported-by: Gia <[email protected]>
> > > Bisected-by: Benjamin Böhmke <[email protected]>
> > >
> > > The person doing the bisection also offered to chime in here if further
> > > debugging is needed!
> > >
> > > Also CC'ing the Commitauthors & Subsystem Maintainers for this report.
> > >
> > > Cheers,
> > > Christian
> > >
> > > [0]: https://bbs.archlinux.org/viewtopic.php?pid=2172526
> > > [1]: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/48
> > >
> > > #regzbot introduced: 59a54c5f3dbd
> > > #regzbot link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/48
> >
> > As I mentioned in my other email I would like to collate logs onto a
> > kernel Bugzilla. With these two cases:
> >
> > thunderbolt.dyndbg=+p
> > thunderbolt.dyndbg=+p thunderbolt.host_reset=false
> >
> > Also what is the value for:
> >
> > $ cat /sys/bus/thunderbolt/devices/domain0/iommu_dma_protection
>
> I attached the requested kernel logs as text files (hope this is ok).
> In both cases I used the stable ArchLinux kernel 6.9.1
>
> The iommu_dma_protection is both cases "1".
>
> Best Regards
> Benjamin

After reset the link comes up just fine but there is one thing that I
noticed:

> [ 8.225355] thunderbolt 0-0:1.1: NVM version 7.0
> [ 8.225360] thunderbolt 0-0:1.1: new retimer found, vendor=0x8087 device=0x15ee
> [ 8.226410] thunderbolt 0000:00:0d.2: current switch config:
> [ 8.226413] thunderbolt 0000:00:0d.2: Thunderbolt 3 Switch: 8086:15ef (Revision: 6, TB Version: 16)
> [ 8.226417] thunderbolt 0000:00:0d.2: Max Port Number: 13
> [ 8.226420] thunderbolt 0000:00:0d.2: Config:
> [ 8.226421] thunderbolt 0000:00:0d.2: Upstream Port Number: 0 Depth: 0 Route String: 0x0 Enabled: 0, PlugEventsDelay: 10ms
> [ 8.226424] thunderbolt 0000:00:0d.2: unknown1: 0x0 unknown4: 0x0
> [ 8.227755] iwlwifi 0000:00:14.3: Registered PHC clock: iwlwifi-PTP, with index: 0
> [ 8.234944] thunderbolt 0000:00:0d.2: initializing Switch at 0x1 (depth: 1, up port: 1)
> [ 8.246755] thunderbolt 0000:00:0d.2: acking hot plug event on 1:2
> [ 8.267378] thunderbolt 0000:00:0d.2: 1: reading DROM (length: 0x6d)
> [ 8.879296] thunderbolt 0000:00:0d.2: 1: DROM version: 1
> [ 8.880631] thunderbolt 0000:00:0d.2: 1: uid: 0x3d600630c86400
> [ 8.884540] thunderbolt 0000:00:0d.2: Port 1: 8086:15ef (Revision: 6, TB Version: 1, Type: Port (0x1))
> [ 8.884562] thunderbolt 0000:00:0d.2: Max hop id (in/out): 19/19
> [ 8.884564] thunderbolt 0000:00:0d.2: Max counters: 16
> [ 8.884566] thunderbolt 0000:00:0d.2: NFC Credits: 0x3c00000
> [ 8.884567] thunderbolt 0000:00:0d.2: Credits (total/control): 60/2
> [ 8.887782] thunderbolt 0000:00:0d.2: Port 2: 8086:15ef (Revision: 6, TB Version: 1, Type: Port (0x1))
> [ 8.887787] thunderbolt 0000:00:0d.2: Max hop id (in/out): 19/19
> [ 8.887789] thunderbolt 0000:00:0d.2: Max counters: 16
> [ 8.887791] thunderbolt 0000:00:0d.2: NFC Credits: 0x3c00000
> [ 8.887792] thunderbolt 0000:00:0d.2: Credits (total/control): 60/2
> [ 8.887794] thunderbolt 0000:00:0d.2: 1:3: disabled by eeprom
> [ 8.887795] thunderbolt 0000:00:0d.2: 1:4: disabled by eeprom
> [ 8.887796] thunderbolt 0000:00:0d.2: 1:5: disabled by eeprom
> [ 8.887797] thunderbolt 0000:00:0d.2: 1:6: disabled by eeprom
> [ 8.887798] thunderbolt 0000:00:0d.2: 1:7: disabled by eeprom
> [ 8.888053] thunderbolt 0000:00:0d.2: Port 8: 8086:15ef (Revision: 6, TB Version: 1, Type: PCIe (0x100102))
> [ 8.888056] thunderbolt 0000:00:0d.2: Max hop id (in/out): 8/8
> [ 8.888057] thunderbolt 0000:00:0d.2: Max counters: 2
> [ 8.888058] thunderbolt 0000:00:0d.2: NFC Credits: 0x800000
> [ 8.888059] thunderbolt 0000:00:0d.2: Credits (total/control): 8/0
> [ 8.888848] thunderbolt 0000:00:0d.2: Port 9: 8086:15ef (Revision: 6, TB Version: 1, Type: PCIe (0x100101))
> [ 8.888850] thunderbolt 0000:00:0d.2: Max hop id (in/out): 8/8
> [ 8.888851] thunderbolt 0000:00:0d.2: Max counters: 2
> [ 8.888852] thunderbolt 0000:00:0d.2: NFC Credits: 0x800000
> [ 8.888852] thunderbolt 0000:00:0d.2: Credits (total/control): 8/0
> [ 8.889379] thunderbolt 0000:00:0d.2: Port 10: 8086:15ef (Revision: 6, TB Version: 1, Type: DP/HDMI (0xe0102))
> [ 8.889381] thunderbolt 0000:00:0d.2: Max hop id (in/out): 9/9
> [ 8.889382] thunderbolt 0000:00:0d.2: Max counters: 2
> [ 8.889383] thunderbolt 0000:00:0d.2: NFC Credits: 0x800000
> [ 8.889384] thunderbolt 0000:00:0d.2: Credits (total/control): 8/0
> [ 8.890457] thunderbolt 0000:00:0d.2: Port 11: 8086:15ef (Revision: 6, TB Version: 1, Type: DP/HDMI (0xe0102))
> [ 8.890459] thunderbolt 0000:00:0d.2: Max hop id (in/out): 9/9
> [ 8.890460] thunderbolt 0000:00:0d.2: Max counters: 2
> [ 8.890461] thunderbolt 0000:00:0d.2: NFC Credits: 0x800000
> [ 8.890462] thunderbolt 0000:00:0d.2: Credits (total/control): 8/0
> [ 8.890721] thunderbolt 0000:00:0d.2: Port 12: 8086:15ea (Revision: 6, TB Version: 1, Type: Inactive (0x0))
> [ 8.890723] thunderbolt 0000:00:0d.2: Max hop id (in/out): 8/8
> [ 8.890724] thunderbolt 0000:00:0d.2: Max counters: 2
> [ 8.890725] thunderbolt 0000:00:0d.2: NFC Credits: 0x800000
> [ 8.890726] thunderbolt 0000:00:0d.2: Credits (total/control): 8/0
> [ 8.891534] thunderbolt 0000:00:0d.2: Port 13: 8086:15ea (Revision: 6, TB Version: 1, Type: Inactive (0x0))
> [ 8.891545] thunderbolt 0000:00:0d.2: Max hop id (in/out): 8/8
> [ 8.891551] thunderbolt 0000:00:0d.2: Max counters: 2
> [ 8.891557] thunderbolt 0000:00:0d.2: NFC Credits: 0x800000
> [ 8.891564] thunderbolt 0000:00:0d.2: Credits (total/control): 8/0
> [ 8.891825] thunderbolt 0000:00:0d.2: 1: current link speed 10.0 Gb/s

Here the link speed is 10 Gb/s instead of 20 Gb/s, which limits the
bandwidth available for DP tunneling.
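
To compare the negotiated speed across the two logs, the relevant debug lines can be pulled out with a small filter. This is a sketch that assumes the "current link speed" message format shown in the quoted log; the function name `tb_link_speeds` is hypothetical.

```shell
#!/bin/sh
# Read a kernel log on stdin and print the route and negotiated speed
# from every thunderbolt "current link speed" debug line.
tb_link_speeds() {
    sed -n 's|.*thunderbolt [^ ]*: \([0-9]*\): current link speed \([0-9.]*\) Gb/s.*|route \1: \2 Gb/s|p'
}
```

For the quoted line above, `tb_link_speeds` prints `route 1: 10.0 Gb/s`; running it over each of the two attached logs would show whether the speed differs between them.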

..

> [ 9.297112] pci 0000:05:00.0: [8086:15f0] type 00 class 0x0c0330 PCIe Endpoint
> [ 9.297146] pci 0000:05:00.0: BAR 0 [mem 0x00000000-0x0000ffff]
> [ 9.297249] pci 0000:05:00.0: enabling Extended Tags
> [ 9.297479] pci 0000:05:00.0: supports D1 D2
> [ 9.297481] pci 0000:05:00.0: PME# supported from D0 D1 D2 D3hot D3cold
> [ 9.297717] pci 0000:05:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:00:07.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)

The xHCI comes up just fine though.

> [ 9.300388] xhci_hcd 0000:05:00.0: xHCI Host Controller
> [ 9.300397] xhci_hcd 0000:05:00.0: new USB bus registered, assigned bus number 5
> [ 9.301802] xhci_hcd 0000:05:00.0: hcc params 0x200077c1 hci version 0x110 quirks 0x0000000200009810
> [ 9.302393] xhci_hcd 0000:05:00.0: xHCI Host Controller
> [ 9.302398] xhci_hcd 0000:05:00.0: new USB bus registered, assigned bus number 6
> [ 9.302401] xhci_hcd 0000:05:00.0: Host supports USB 3.1 Enhanced SuperSpeed
> [ 9.302459] usb usb5: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.09
> [ 9.302462] usb usb5: New USB device strings: Mfr=3, Product=2, SerialNumber=1
> [ 9.302465] usb usb5: Product: xHCI Host Controller
> [ 9.302466] usb usb5: Manufacturer: Linux 6.9.1-arch1-1 xhci-hcd
> [ 9.302468] usb usb5: SerialNumber: 0000:05:00.0
> [ 9.302783] hub 5-0:1.0: USB hub found
> [ 9.302794] hub 5-0:1.0: 2 ports detected
> [ 9.302992] usb usb6: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 6.09
> [ 9.302995] usb usb6: New USB device strings: Mfr=3, Product=2, SerialNumber=1
> [ 9.302997] usb usb6: Product: xHCI Host Controller
> [ 9.302998] usb usb6: Manufacturer: Linux 6.9.1-arch1-1 xhci-hcd
> [ 9.303000] usb usb6: SerialNumber: 0000:05:00.0
> [ 9.303557] hub 6-0:1.0: USB hub found
> [ 9.303567] hub 6-0:1.0: 2 ports detected
> [ 9.552443] usb 5-1: new high-speed USB device number 2 using xhci_hcd
> [ 10.130905] thunderbolt 0000:00:0d.2: 0:5 <-> 1:10 (DP): DPRX read done
> [ 10.131029] thunderbolt 0000:00:0d.2: 0:5 <-> 1:10 (DP): consumed bandwidth 0/17280 Mb/s
> [ 10.131047] thunderbolt 0000:00:0d.2: bandwidth consumption changed, re-calculating estimated bandwidth
> [ 10.131051] thunderbolt 0000:00:0d.2: re-calculating bandwidth estimation for group 1
> [ 10.131198] thunderbolt 0000:00:0d.2: bandwidth estimation for group 1 done
> [ 10.131206] thunderbolt 0000:00:0d.2: bandwidth re-calculation done
> [ 10.131212] thunderbolt 0000:00:0d.2: 1: TMU: mode change uni-directional, LowRes -> uni-directional, HiFi requested
> [ 10.135515] thunderbolt 0000:00:0d.2: 1: TMU: mode set to: uni-directional, HiFi
> [ 10.136473] thunderbolt 0000:00:0d.2: 0:6: DP IN available
> [ 10.136606] thunderbolt 0000:00:0d.2: 1:10: DP OUT in use
> [ 10.136610] thunderbolt 0000:00:0d.2: 0:6: no suitable DP OUT adapter available, not tunneling
> [ 10.136743] thunderbolt 0000:00:0d.2: 1:11: DP OUT resource available after hotplug
> [ 10.136748] thunderbolt 0000:00:0d.2: looking for DP IN <-> DP OUT pairs:
> [ 10.136876] thunderbolt 0000:00:0d.2: 0:5: DP IN in use
> [ 10.137568] thunderbolt 0000:00:0d.2: 0:6: DP IN available
> [ 10.137687] thunderbolt 0000:00:0d.2: 1:10: DP OUT in use
> [ 10.137820] thunderbolt 0000:00:0d.2: 1:11: DP OUT available
> [ 10.139280] thunderbolt 0000:00:0d.2: 0: allocated DP resource for port 6
> [ 10.139286] thunderbolt 0000:00:0d.2: 0:6: attached to bandwidth group 1
> [ 10.139694] thunderbolt 0000:00:0d.2: 0:1: link maximum bandwidth 18000/18000 Mb/s
> [ 10.140680] thunderbolt 0000:00:0d.2: 0:5 <-> 1:10 (DP): DPRX read done
> [ 10.140829] thunderbolt 0000:00:0d.2: 0:5 <-> 1:10 (DP): consumed bandwidth 0/17280 Mb/s
> [ 10.140963] thunderbolt 0000:00:0d.2: 1:1: link maximum bandwidth 18000/18000 Mb/s
> [ 10.141892] thunderbolt 0000:00:0d.2: 0:5 <-> 1:10 (DP): DPRX read done
> [ 10.142027] thunderbolt 0000:00:0d.2: 0:5 <-> 1:10 (DP): consumed bandwidth 0/17280 Mb/s
> [ 10.142033] thunderbolt 0000:00:0d.2: available bandwidth for new DP tunnel 18000/720 Mb/s
> [ 10.142052] thunderbolt 0000:00:0d.2: 0:6 <-> 1:11 (DP): activating
> [ 10.143353] thunderbolt 0000:00:0d.2: 0:6 <-> 1:11 (DP): DP IN maximum supported bandwidth 8100 Mb/s x4 = 25920 Mb/s
> [ 10.143360] thunderbolt 0000:00:0d.2: 0:6 <-> 1:11 (DP): DP OUT maximum supported bandwidth 5400 Mb/s x4 = 17280 Mb/s
> [ 10.143366] thunderbolt 0000:00:0d.2: 0:6 <-> 1:11 (DP): not enough bandwidth
> [ 10.143371] thunderbolt 0000:00:0d.2: 1:11: DP tunnel activation failed, aborting

However, the second DP tunnel fails because there is not enough bandwidth left.
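
As a rough illustration of the arithmetic behind this failure, the sketch below recomputes the figures printed in the log. It assumes that the DP payload rate is 80% of the raw lane rate (8b/10b encoding) and that the first tunnel keeps its full 17280 Mb/s reserved; both assumptions match the logged numbers, but this is not the driver's actual code, and the helper name is made up for the example.

```python
def dp_payload_mbps(rate_mbps_per_lane, lanes):
    """Payload bandwidth of a DP link after 8b/10b encoding (80% of raw)."""
    return rate_mbps_per_lane * lanes * 8 // 10

# "link maximum bandwidth 18000/18000 Mb/s" from the log
LINK_MAX = 18000

# First tunnel 0:5 <-> 1:10: "consumed bandwidth 0/17280 Mb/s"
first_tunnel = dp_payload_mbps(5400, 4)

# What is left for the second tunnel 0:6 <-> 1:11
available = LINK_MAX - first_tunnel

# The second tunnel is capped by the smaller of DP IN and DP OUT
requested = min(dp_payload_mbps(8100, 4), dp_payload_mbps(5400, 4))

# Smallest standard DP configuration (RBR, one lane), for comparison
smallest = dp_payload_mbps(1620, 1)

print(available, requested, smallest)
```

With these inputs the remaining 720 Mb/s is below even the smallest DP configuration, which is consistent with the "not enough bandwidth" message. On a link running at 20G per lane the same first tunnel would leave far more headroom.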

> [ 10.143489] thunderbolt 0000:00:0d.2: 0:6: detached from bandwidth group 1
> [ 10.144883] thunderbolt 0000:00:0d.2: 0: released DP resource for port 6
> [ 14.902955] usb 5-1: unable to get BOS descriptor set
> [ 14.906143] usb 5-1: New USB device found, idVendor=2188, idProduct=0610, bcdDevice=70.42
> [ 14.906167] usb 5-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> [ 14.906175] usb 5-1: Product: USB2.1 Hub
> [ 14.906183] usb 5-1: Manufacturer: CalDigit, Inc.
> [ 14.908660] hub 5-1:1.0: USB hub found
> [ 14.909135] hub 5-1:1.0: 4 ports detected
> [ 15.026182] usb 6-1: new SuperSpeed Plus Gen 2x1 USB device number 2 using xhci_hcd
> [ 15.050199] usb 6-1: New USB device found, idVendor=2188, idProduct=0625, bcdDevice=70.42
> [ 15.050223] usb 6-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> [ 15.050231] usb 6-1: Product: USB3.1 Gen2 Hub
> [ 15.050237] usb 6-1: Manufacturer: CalDigit, Inc.
> [ 15.053712] hub 6-1:1.0: USB hub found
> [ 15.054279] hub 6-1:1.0: 4 ports detected
> [ 15.215877] usb 5-1.4: new high-speed USB device number 3 using xhci_hcd
> [ 15.333676] usb 5-1.4: New USB device found, idVendor=2188, idProduct=0611, bcdDevice=93.06
> [ 15.333703] usb 5-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> [ 15.333711] usb 5-1.4: Product: USB2.1 Hub
> [ 15.333718] usb 5-1.4: Manufacturer: CalDigit, Inc.
> [ 15.336484] hub 5-1.4:1.0: USB hub found
> [ 15.336797] hub 5-1.4:1.0: 4 ports detected
> [ 15.402943] usb 6-1.1: new SuperSpeed USB device number 3 using xhci_hcd
> [ 15.425589] usb 6-1.1: New USB device found, idVendor=2188, idProduct=0754, bcdDevice= 0.06
> [ 15.425615] usb 6-1.1: New USB device strings: Mfr=3, Product=4, SerialNumber=2
> [ 15.425623] usb 6-1.1: Product: USB-C Pro Card Reader
> [ 15.425691] usb 6-1.1: Manufacturer: CalDigit
> [ 15.425697] usb 6-1.1: SerialNumber: 000000000006
> [ 15.432231] usb-storage 6-1.1:1.0: USB Mass Storage device detected
> [ 15.433690] scsi host0: usb-storage 6-1.1:1.0
> [ 15.506218] usb 6-1.4: new SuperSpeed USB device number 4 using xhci_hcd
> [ 15.528220] usb 6-1.4: New USB device found, idVendor=2188, idProduct=0620, bcdDevice=93.06
> [ 15.528237] usb 6-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> [ 15.528241] usb 6-1.4: Product: USB3.1 Gen1 Hub
> [ 15.528244] usb 6-1.4: Manufacturer: CalDigit, Inc.
> [ 15.531198] hub 6-1.4:1.0: USB hub found
> [ 15.531506] hub 6-1.4:1.0: 4 ports detected
> [ 15.649217] usb 5-1.4.1: new high-speed USB device number 4 using xhci_hcd
> [ 15.989548] usb 6-1.4.4: new SuperSpeed USB device number 5 using xhci_hcd
> [ 16.007996] usb 6-1.4.4: New USB device found, idVendor=0bda, idProduct=8153, bcdDevice=31.00
> [ 16.008021] usb 6-1.4.4: New USB device strings: Mfr=1, Product=2, SerialNumber=6
> [ 16.008029] usb 6-1.4.4: Product: USB 10/100/1000 LAN
> [ 16.008035] usb 6-1.4.4: Manufacturer: Realtek
> [ 16.008040] usb 6-1.4.4: SerialNumber: 001001000
> [ 16.090287] r8152-cfgselector 6-1.4.4: reset SuperSpeed USB device number 5 using xhci_hcd
> [ 16.136796] r8152 6-1.4.4:1.0: load rtl8153b-2 v2 04/27/23 successfully
> [ 16.171430] r8152 6-1.4.4:1.0 eth0: v1.12.13
> [ 16.209513] r8152 6-1.4.4:1.0 enp5s0u1u4u4: renamed from eth0
> [ 16.453330] scsi 0:0:0:0: Direct-Access CalDigit SD Card Reader 0006 PQ: 0 ANSI: 6
> [ 16.454420] sd 0:0:0:0: Attached scsi generic sg0 type 0
> [ 16.455908] sd 0:0:0:0: [sda] Media removed, stopped polling
> [ 16.457173] sd 0:0:0:0: [sda] Attached SCSI removable disk
> [ 16.497559] usb 5-1.4.1: New USB device found, idVendor=2188, idProduct=4042, bcdDevice= 0.06
> [ 16.497567] usb 5-1.4.1: New USB device strings: Mfr=3, Product=1, SerialNumber=0
> [ 16.497570] usb 5-1.4.1: Product: CalDigit USB-C Pro Audio
> [ 16.497572] usb 5-1.4.1: Manufacturer: CalDigit Inc.
> [ 16.920216] ucsi_acpi USBC000:00: possible UCSI driver bug 1
> [ 17.494492] input: CalDigit Inc. CalDigit USB-C Pro Audio as /devices/pci0000:00/0000:00:07.0/0000:03:00.0/0000:04:02.0/0000:05:00.0/usb5/5-1/5-1.4/5-1.4.1/5-1.4.1:1.3/0003:2188:4042.0005/input/input20
> [ 17.550258] hid-generic 0003:2188:4042.0005: input,hidraw2: USB HID v1.11 Device [CalDigit Inc. CalDigit USB-C Pro Audio] on usb-0000:05:00.0-1.4.1/input3
> [ 19.609816] r8152 6-1.4.4:1.0 enp5s0u1u4u4: carrier on

All the USB devices seem to work fine (assuming I read this right).

There is the DP tunneling limitation, but other than that, in what way
does the dock not work? At least reading this log, everything else seems
to be fine except the second monitor.

Now it is interesting why the link is only 20G (two lanes at 10G each)
and not 40G. I do have this same device and it gets the link up as 40G
(20G per lane) just fine:

[ 17.867868] thunderbolt 0000:00:0d.2: 1: current link speed 20.0 Gb/s
[ 17.867869] thunderbolt 0000:00:0d.2: 1: current link width symmetric, single lane
[ 17.868437] thunderbolt 0000:00:0d.2: 0:1: total credits changed 120 -> 60
[ 17.868625] thunderbolt 0000:00:0d.2: 0:2: total credits changed 0 -> 60
[ 17.872472] thunderbolt 0000:00:0d.2: 1: TMU: current mode: bi-directional, HiFi
[ 17.872608] thunderbolt 0-1: new device found, vendor=0x3d device=0x11
[ 17.879102] thunderbolt 0-1: CalDigit, Inc. TS3 Plus

Do you use a Thunderbolt cable or a regular Type-C one? A Thunderbolt
cable has the lightning symbol on the connector.
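
For anyone who wants to check the negotiated link without digging through dmesg, the following minimal sketch reads the per-device link attributes that the Thunderbolt driver exposes in sysfs (`rx_speed`, `rx_lanes`, `tx_speed`, `tx_lanes`). The device name `0-1` is taken from the log line "thunderbolt 0-1: new device found" and may differ on other systems; the function name is only illustrative.

```python
from pathlib import Path

LINK_ATTRS = ("rx_speed", "rx_lanes", "tx_speed", "tx_lanes")

def read_link(device_dir):
    """Return the link attributes of one Thunderbolt device directory.

    Attributes that are missing (for example on the host router) are
    reported as None instead of raising an error.
    """
    base = Path(device_dir)
    info = {}
    for name in LINK_ATTRS:
        attr = base / name
        info[name] = attr.read_text().strip() if attr.exists() else None
    return info

if __name__ == "__main__":
    # Assumed device path; adjust to the device shown in your own log.
    print(read_link("/sys/bus/thunderbolt/devices/0-1"))
```

A per-lane speed of 10 Gb/s with two lanes corresponds to the 20G link discussed above, which is what one would expect from a cable that does not support the full Thunderbolt 3 rate.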

2024-05-20 16:53:42

by Benjamin Böhmke

Subject: Re: [REGRESSION][BISECTED] "xHCI host controller not responding, assume dead" on stable kernel > 6.8.7

On Monday, May 20, 2024 18:21 CEST, Mika Westerberg <[email protected]> wrote:

> Hi,
>
> On Mon, May 20, 2024 at 05:12:40PM +0200, Benjamin Böhmke wrote:
> > On Monday, May 20, 2024 16:41 CEST, Mario Limonciello <[email protected]> wrote:
> >
> > > On 5/20/2024 09:39, Christian Heusel wrote:
> > > > On 24/05/06 02:53PM, Linux regression tracking (Thorsten Leemhuis) wrote:
> > > >> [CCing Mario, who asked for the two suspected commits to be backported]
> > > >>
> > > >> On 06.05.24 14:24, Gia wrote:
> > > >>> Hello, from 6.8.7=>6.8.8 I run into a similar problem with my Caldigit
> > > >>> TS3 Plus Thunderbolt 3 dock.
> > > >>>
> > > >>> After the update I see this message on boot "xHCI host controller not
> > > >>> responding, assume dead" and the dock is not working anymore. Kernel
> > > >>> 6.8.7 works great.
> > > >
> > > > We now have some further information on the matter as somebody was kind
> > > > enough to bisect the issue in the [Arch Linux Forums][0]:
> > > >
> > > > cc4c94a5f6c4 ("thunderbolt: Reset topology created by the boot firmware")
> > > >
> > > > This is a stable commit id, the relevant mainline commit is:
> > > >
> > > > 59a54c5f3dbd ("thunderbolt: Reset topology created by the boot firmware")
> > > >
> > > > The other reporter created [an issue][1] in our bugtracker, which I'll
> > > > leave here just for completeness sake.
> > > >
> > > > Reported-by: Benjamin Böhmke <[email protected]>
> > > > Reported-by: Gia <[email protected]>
> > > > Bisected-by: Benjamin Böhmke <[email protected]>
> > > >
> > > > The person doing the bisection also offered to chime in here if further
> > > > debugging is needed!
> > > >
> > > > Also CC'ing the Commitauthors & Subsystem Maintainers for this report.
> > > >
> > > > Cheers,
> > > > Christian
> > > >
> > > > [0]: https://bbs.archlinux.org/viewtopic.php?pid=2172526
> > > > [1]: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/48
> > > >
> > > > #regzbot introduced: 59a54c5f3dbd
> > > > #regzbot link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/48
> > >
> > > As I mentioned in my other email I would like to collate logs onto a
> > > kernel Bugzilla. With these two cases:
> > >
> > > thunderbolt.dyndbg=+p
> > > thunderbolt.dyndbg=+p thunderbolt.host_reset=false
> > >
> > > Also what is the value for:
> > >
> > > $ cat /sys/bus/thunderbolt/devices/domain0/iommu_dma_protection
> >
> > I attached the requested kernel logs as text files (hope this is ok).
> > In both cases I used the stable ArchLinux kernel 6.9.1
> >
> > The iommu_dma_protection is "1" in both cases.
> >
> > Best Regards
> > Benjamin
>
> After reset the link comes up just fine but there is one thing that I
> noticed:
>
> > [ 8.225355] thunderbolt 0-0:1.1: NVM version 7.0
> > [ 8.225360] thunderbolt 0-0:1.1: new retimer found, vendor=0x8087 device=0x15ee
> > [ 8.226410] thunderbolt 0000:00:0d.2: current switch config:
> > [ 8.226413] thunderbolt 0000:00:0d.2: Thunderbolt 3 Switch: 8086:15ef (Revision: 6, TB Version: 16)
> > [ 8.226417] thunderbolt 0000:00:0d.2: Max Port Number: 13
> > [ 8.226420] thunderbolt 0000:00:0d.2: Config:
> > [ 8.226421] thunderbolt 0000:00:0d.2: Upstream Port Number: 0 Depth: 0 Route String: 0x0 Enabled: 0, PlugEventsDelay: 10ms
> > [ 8.226424] thunderbolt 0000:00:0d.2: unknown1: 0x0 unknown4: 0x0
> > [ 8.227755] iwlwifi 0000:00:14.3: Registered PHC clock: iwlwifi-PTP, with index: 0
> > [ 8.234944] thunderbolt 0000:00:0d.2: initializing Switch at 0x1 (depth: 1, up port: 1)
> > [ 8.246755] thunderbolt 0000:00:0d.2: acking hot plug event on 1:2
> > [ 8.267378] thunderbolt 0000:00:0d.2: 1: reading DROM (length: 0x6d)
> > [ 8.879296] thunderbolt 0000:00:0d.2: 1: DROM version: 1
> > [ 8.880631] thunderbolt 0000:00:0d.2: 1: uid: 0x3d600630c86400
> > [ 8.884540] thunderbolt 0000:00:0d.2: Port 1: 8086:15ef (Revision: 6, TB Version: 1, Type: Port (0x1))
> > [ 8.884562] thunderbolt 0000:00:0d.2: Max hop id (in/out): 19/19
> > [ 8.884564] thunderbolt 0000:00:0d.2: Max counters: 16
> > [ 8.884566] thunderbolt 0000:00:0d.2: NFC Credits: 0x3c00000
> > [ 8.884567] thunderbolt 0000:00:0d.2: Credits (total/control): 60/2
> > [ 8.887782] thunderbolt 0000:00:0d.2: Port 2: 8086:15ef (Revision: 6, TB Version: 1, Type: Port (0x1))
> > [ 8.887787] thunderbolt 0000:00:0d.2: Max hop id (in/out): 19/19
> > [ 8.887789] thunderbolt 0000:00:0d.2: Max counters: 16
> > [ 8.887791] thunderbolt 0000:00:0d.2: NFC Credits: 0x3c00000
> > [ 8.887792] thunderbolt 0000:00:0d.2: Credits (total/control): 60/2
> > [ 8.887794] thunderbolt 0000:00:0d.2: 1:3: disabled by eeprom
> > [ 8.887795] thunderbolt 0000:00:0d.2: 1:4: disabled by eeprom
> > [ 8.887796] thunderbolt 0000:00:0d.2: 1:5: disabled by eeprom
> > [ 8.887797] thunderbolt 0000:00:0d.2: 1:6: disabled by eeprom
> > [ 8.887798] thunderbolt 0000:00:0d.2: 1:7: disabled by eeprom
> > [ 8.888053] thunderbolt 0000:00:0d.2: Port 8: 8086:15ef (Revision: 6, TB Version: 1, Type: PCIe (0x100102))
> > [ 8.888056] thunderbolt 0000:00:0d.2: Max hop id (in/out): 8/8
> > [ 8.888057] thunderbolt 0000:00:0d.2: Max counters: 2
> > [ 8.888058] thunderbolt 0000:00:0d.2: NFC Credits: 0x800000
> > [ 8.888059] thunderbolt 0000:00:0d.2: Credits (total/control): 8/0
> > [ 8.888848] thunderbolt 0000:00:0d.2: Port 9: 8086:15ef (Revision: 6, TB Version: 1, Type: PCIe (0x100101))
> > [ 8.888850] thunderbolt 0000:00:0d.2: Max hop id (in/out): 8/8
> > [ 8.888851] thunderbolt 0000:00:0d.2: Max counters: 2
> > [ 8.888852] thunderbolt 0000:00:0d.2: NFC Credits: 0x800000
> > [ 8.888852] thunderbolt 0000:00:0d.2: Credits (total/control): 8/0
> > [ 8.889379] thunderbolt 0000:00:0d.2: Port 10: 8086:15ef (Revision: 6, TB Version: 1, Type: DP/HDMI (0xe0102))
> > [ 8.889381] thunderbolt 0000:00:0d.2: Max hop id (in/out): 9/9
> > [ 8.889382] thunderbolt 0000:00:0d.2: Max counters: 2
> > [ 8.889383] thunderbolt 0000:00:0d.2: NFC Credits: 0x800000
> > [ 8.889384] thunderbolt 0000:00:0d.2: Credits (total/control): 8/0
> > [ 8.890457] thunderbolt 0000:00:0d.2: Port 11: 8086:15ef (Revision: 6, TB Version: 1, Type: DP/HDMI (0xe0102))
> > [ 8.890459] thunderbolt 0000:00:0d.2: Max hop id (in/out): 9/9
> > [ 8.890460] thunderbolt 0000:00:0d.2: Max counters: 2
> > [ 8.890461] thunderbolt 0000:00:0d.2: NFC Credits: 0x800000
> > [ 8.890462] thunderbolt 0000:00:0d.2: Credits (total/control): 8/0
> > [ 8.890721] thunderbolt 0000:00:0d.2: Port 12: 8086:15ea (Revision: 6, TB Version: 1, Type: Inactive (0x0))
> > [ 8.890723] thunderbolt 0000:00:0d.2: Max hop id (in/out): 8/8
> > [ 8.890724] thunderbolt 0000:00:0d.2: Max counters: 2
> > [ 8.890725] thunderbolt 0000:00:0d.2: NFC Credits: 0x800000
> > [ 8.890726] thunderbolt 0000:00:0d.2: Credits (total/control): 8/0
> > [ 8.891534] thunderbolt 0000:00:0d.2: Port 13: 8086:15ea (Revision: 6, TB Version: 1, Type: Inactive (0x0))
> > [ 8.891545] thunderbolt 0000:00:0d.2: Max hop id (in/out): 8/8
> > [ 8.891551] thunderbolt 0000:00:0d.2: Max counters: 2
> > [ 8.891557] thunderbolt 0000:00:0d.2: NFC Credits: 0x800000
> > [ 8.891564] thunderbolt 0000:00:0d.2: Credits (total/control): 8/0
> > [ 8.891825] thunderbolt 0000:00:0d.2: 1: current link speed 10.0 Gb/s
>
> Here the link speed is 10G per lane instead of 20G, which limits the
> bandwidth available for DP tunneling.
>
> ...
>
> > [ 9.297112] pci 0000:05:00.0: [8086:15f0] type 00 class 0x0c0330 PCIe Endpoint
> > [ 9.297146] pci 0000:05:00.0: BAR 0 [mem 0x00000000-0x0000ffff]
> > [ 9.297249] pci 0000:05:00.0: enabling Extended Tags
> > [ 9.297479] pci 0000:05:00.0: supports D1 D2
> > [ 9.297481] pci 0000:05:00.0: PME# supported from D0 D1 D2 D3hot D3cold
> > [ 9.297717] pci 0000:05:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:00:07.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
>
> The xHCI comes up just fine though.
>
> > [ 9.300388] xhci_hcd 0000:05:00.0: xHCI Host Controller
> > [ 9.300397] xhci_hcd 0000:05:00.0: new USB bus registered, assigned bus number 5
> > [ 9.301802] xhci_hcd 0000:05:00.0: hcc params 0x200077c1 hci version 0x110 quirks 0x0000000200009810
> > [ 9.302393] xhci_hcd 0000:05:00.0: xHCI Host Controller
> > [ 9.302398] xhci_hcd 0000:05:00.0: new USB bus registered, assigned bus number 6
> > [ 9.302401] xhci_hcd 0000:05:00.0: Host supports USB 3.1 Enhanced SuperSpeed
> > [ 9.302459] usb usb5: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.09
> > [ 9.302462] usb usb5: New USB device strings: Mfr=3, Product=2, SerialNumber=1
> > [ 9.302465] usb usb5: Product: xHCI Host Controller
> > [ 9.302466] usb usb5: Manufacturer: Linux 6.9.1-arch1-1 xhci-hcd
> > [ 9.302468] usb usb5: SerialNumber: 0000:05:00.0
> > [ 9.302783] hub 5-0:1.0: USB hub found
> > [ 9.302794] hub 5-0:1.0: 2 ports detected
> > [ 9.302992] usb usb6: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 6.09
> > [ 9.302995] usb usb6: New USB device strings: Mfr=3, Product=2, SerialNumber=1
> > [ 9.302997] usb usb6: Product: xHCI Host Controller
> > [ 9.302998] usb usb6: Manufacturer: Linux 6.9.1-arch1-1 xhci-hcd
> > [ 9.303000] usb usb6: SerialNumber: 0000:05:00.0
> > [ 9.303557] hub 6-0:1.0: USB hub found
> > [ 9.303567] hub 6-0:1.0: 2 ports detected
> > [ 9.552443] usb 5-1: new high-speed USB device number 2 using xhci_hcd
> > [ 10.130905] thunderbolt 0000:00:0d.2: 0:5 <-> 1:10 (DP): DPRX read done
> > [ 10.131029] thunderbolt 0000:00:0d.2: 0:5 <-> 1:10 (DP): consumed bandwidth 0/17280 Mb/s
> > [ 10.131047] thunderbolt 0000:00:0d.2: bandwidth consumption changed, re-calculating estimated bandwidth
> > [ 10.131051] thunderbolt 0000:00:0d.2: re-calculating bandwidth estimation for group 1
> > [ 10.131198] thunderbolt 0000:00:0d.2: bandwidth estimation for group 1 done
> > [ 10.131206] thunderbolt 0000:00:0d.2: bandwidth re-calculation done
> > [ 10.131212] thunderbolt 0000:00:0d.2: 1: TMU: mode change uni-directional, LowRes -> uni-directional, HiFi requested
> > [ 10.135515] thunderbolt 0000:00:0d.2: 1: TMU: mode set to: uni-directional, HiFi
> > [ 10.136473] thunderbolt 0000:00:0d.2: 0:6: DP IN available
> > [ 10.136606] thunderbolt 0000:00:0d.2: 1:10: DP OUT in use
> > [ 10.136610] thunderbolt 0000:00:0d.2: 0:6: no suitable DP OUT adapter available, not tunneling
> > [ 10.136743] thunderbolt 0000:00:0d.2: 1:11: DP OUT resource available after hotplug
> > [ 10.136748] thunderbolt 0000:00:0d.2: looking for DP IN <-> DP OUT pairs:
> > [ 10.136876] thunderbolt 0000:00:0d.2: 0:5: DP IN in use
> > [ 10.137568] thunderbolt 0000:00:0d.2: 0:6: DP IN available
> > [ 10.137687] thunderbolt 0000:00:0d.2: 1:10: DP OUT in use
> > [ 10.137820] thunderbolt 0000:00:0d.2: 1:11: DP OUT available
> > [ 10.139280] thunderbolt 0000:00:0d.2: 0: allocated DP resource for port 6
> > [ 10.139286] thunderbolt 0000:00:0d.2: 0:6: attached to bandwidth group 1
> > [ 10.139694] thunderbolt 0000:00:0d.2: 0:1: link maximum bandwidth 18000/18000 Mb/s
> > [ 10.140680] thunderbolt 0000:00:0d.2: 0:5 <-> 1:10 (DP): DPRX read done
> > [ 10.140829] thunderbolt 0000:00:0d.2: 0:5 <-> 1:10 (DP): consumed bandwidth 0/17280 Mb/s
> > [ 10.140963] thunderbolt 0000:00:0d.2: 1:1: link maximum bandwidth 18000/18000 Mb/s
> > [ 10.141892] thunderbolt 0000:00:0d.2: 0:5 <-> 1:10 (DP): DPRX read done
> > [ 10.142027] thunderbolt 0000:00:0d.2: 0:5 <-> 1:10 (DP): consumed bandwidth 0/17280 Mb/s
> > [ 10.142033] thunderbolt 0000:00:0d.2: available bandwidth for new DP tunnel 18000/720 Mb/s
> > [ 10.142052] thunderbolt 0000:00:0d.2: 0:6 <-> 1:11 (DP): activating
> > [ 10.143353] thunderbolt 0000:00:0d.2: 0:6 <-> 1:11 (DP): DP IN maximum supported bandwidth 8100 Mb/s x4 = 25920 Mb/s
> > [ 10.143360] thunderbolt 0000:00:0d.2: 0:6 <-> 1:11 (DP): DP OUT maximum supported bandwidth 5400 Mb/s x4 = 17280 Mb/s
> > [ 10.143366] thunderbolt 0000:00:0d.2: 0:6 <-> 1:11 (DP): not enough bandwidth
> > [ 10.143371] thunderbolt 0000:00:0d.2: 1:11: DP tunnel activation failed, aborting
>
> However, the second DP tunnel fails because there is not enough bandwidth left.
>
> > [ 10.143489] thunderbolt 0000:00:0d.2: 0:6: detached from bandwidth group 1
> > [ 10.144883] thunderbolt 0000:00:0d.2: 0: released DP resource for port 6
> > [ 14.902955] usb 5-1: unable to get BOS descriptor set
> > [ 14.906143] usb 5-1: New USB device found, idVendor=2188, idProduct=0610, bcdDevice=70.42
> > [ 14.906167] usb 5-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> > [ 14.906175] usb 5-1: Product: USB2.1 Hub
> > [ 14.906183] usb 5-1: Manufacturer: CalDigit, Inc.
> > [ 14.908660] hub 5-1:1.0: USB hub found
> > [ 14.909135] hub 5-1:1.0: 4 ports detected
> > [ 15.026182] usb 6-1: new SuperSpeed Plus Gen 2x1 USB device number 2 using xhci_hcd
> > [ 15.050199] usb 6-1: New USB device found, idVendor=2188, idProduct=0625, bcdDevice=70.42
> > [ 15.050223] usb 6-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> > [ 15.050231] usb 6-1: Product: USB3.1 Gen2 Hub
> > [ 15.050237] usb 6-1: Manufacturer: CalDigit, Inc.
> > [ 15.053712] hub 6-1:1.0: USB hub found
> > [ 15.054279] hub 6-1:1.0: 4 ports detected
> > [ 15.215877] usb 5-1.4: new high-speed USB device number 3 using xhci_hcd
> > [ 15.333676] usb 5-1.4: New USB device found, idVendor=2188, idProduct=0611, bcdDevice=93.06
> > [ 15.333703] usb 5-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> > [ 15.333711] usb 5-1.4: Product: USB2.1 Hub
> > [ 15.333718] usb 5-1.4: Manufacturer: CalDigit, Inc.
> > [ 15.336484] hub 5-1.4:1.0: USB hub found
> > [ 15.336797] hub 5-1.4:1.0: 4 ports detected
> > [ 15.402943] usb 6-1.1: new SuperSpeed USB device number 3 using xhci_hcd
> > [ 15.425589] usb 6-1.1: New USB device found, idVendor=2188, idProduct=0754, bcdDevice= 0.06
> > [ 15.425615] usb 6-1.1: New USB device strings: Mfr=3, Product=4, SerialNumber=2
> > [ 15.425623] usb 6-1.1: Product: USB-C Pro Card Reader
> > [ 15.425691] usb 6-1.1: Manufacturer: CalDigit
> > [ 15.425697] usb 6-1.1: SerialNumber: 000000000006
> > [ 15.432231] usb-storage 6-1.1:1.0: USB Mass Storage device detected
> > [ 15.433690] scsi host0: usb-storage 6-1.1:1.0
> > [ 15.506218] usb 6-1.4: new SuperSpeed USB device number 4 using xhci_hcd
> > [ 15.528220] usb 6-1.4: New USB device found, idVendor=2188, idProduct=0620, bcdDevice=93.06
> > [ 15.528237] usb 6-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> > [ 15.528241] usb 6-1.4: Product: USB3.1 Gen1 Hub
> > [ 15.528244] usb 6-1.4: Manufacturer: CalDigit, Inc.
> > [ 15.531198] hub 6-1.4:1.0: USB hub found
> > [ 15.531506] hub 6-1.4:1.0: 4 ports detected
> > [ 15.649217] usb 5-1.4.1: new high-speed USB device number 4 using xhci_hcd
> > [ 15.989548] usb 6-1.4.4: new SuperSpeed USB device number 5 using xhci_hcd
> > [ 16.007996] usb 6-1.4.4: New USB device found, idVendor=0bda, idProduct=8153, bcdDevice=31.00
> > [ 16.008021] usb 6-1.4.4: New USB device strings: Mfr=1, Product=2, SerialNumber=6
> > [ 16.008029] usb 6-1.4.4: Product: USB 10/100/1000 LAN
> > [ 16.008035] usb 6-1.4.4: Manufacturer: Realtek
> > [ 16.008040] usb 6-1.4.4: SerialNumber: 001001000
> > [ 16.090287] r8152-cfgselector 6-1.4.4: reset SuperSpeed USB device number 5 using xhci_hcd
> > [ 16.136796] r8152 6-1.4.4:1.0: load rtl8153b-2 v2 04/27/23 successfully
> > [ 16.171430] r8152 6-1.4.4:1.0 eth0: v1.12.13
> > [ 16.209513] r8152 6-1.4.4:1.0 enp5s0u1u4u4: renamed from eth0
> > [ 16.453330] scsi 0:0:0:0: Direct-Access CalDigit SD Card Reader 0006 PQ: 0 ANSI: 6
> > [ 16.454420] sd 0:0:0:0: Attached scsi generic sg0 type 0
> > [ 16.455908] sd 0:0:0:0: [sda] Media removed, stopped polling
> > [ 16.457173] sd 0:0:0:0: [sda] Attached SCSI removable disk
> > [ 16.497559] usb 5-1.4.1: New USB device found, idVendor=2188, idProduct=4042, bcdDevice= 0.06
> > [ 16.497567] usb 5-1.4.1: New USB device strings: Mfr=3, Product=1, SerialNumber=0
> > [ 16.497570] usb 5-1.4.1: Product: CalDigit USB-C Pro Audio
> > [ 16.497572] usb 5-1.4.1: Manufacturer: CalDigit Inc.
> > [ 16.920216] ucsi_acpi USBC000:00: possible UCSI driver bug 1
> > [ 17.494492] input: CalDigit Inc. CalDigit USB-C Pro Audio as /devices/pci0000:00/0000:00:07.0/0000:03:00.0/0000:04:02.0/0000:05:00.0/usb5/5-1/5-1.4/5-1.4.1/5-1.4.1:1.3/0003:2188:4042.0005/input/input20
> > [ 17.550258] hid-generic 0003:2188:4042.0005: input,hidraw2: USB HID v1.11 Device [CalDigit Inc. CalDigit USB-C Pro Audio] on usb-0000:05:00.0-1.4.1/input3
> > [ 19.609816] r8152 6-1.4.4:1.0 enp5s0u1u4u4: carrier on
>
> All the USB devices seem to work fine (assuming I read this right).

To keep the log small, I unplugged all USB devices from the dock.
But even when they are connected, I don't have issues with them.

>
> There is the DP tunneling limitation, but other than that, in what way
> does the dock not work? At least reading this log, everything else seems
> to be fine except the second monitor.

Exactly, only the second monitor is/was not working.

>
> Now it is interesting why the link is only 20G (two lanes at 10G each)
> and not 40G. I do have this same device and it gets the link up as 40G
> (20G per lane) just fine:
>
> [ 17.867868] thunderbolt 0000:00:0d.2: 1: current link speed 20.0 Gb/s
> [ 17.867869] thunderbolt 0000:00:0d.2: 1: current link width symmetric, single lane
> [ 17.868437] thunderbolt 0000:00:0d.2: 0:1: total credits changed 120 -> 60
> [ 17.868625] thunderbolt 0000:00:0d.2: 0:2: total credits changed 0 -> 60
> [ 17.872472] thunderbolt 0000:00:0d.2: 1: TMU: current mode: bi-directional, HiFi
> [ 17.872608] thunderbolt 0-1: new device found, vendor=0x3d device=0x11
> [ 17.879102] thunderbolt 0-1: CalDigit, Inc. TS3 Plus
>

My dock is a slightly different model (see https://www.caldigit.com/usb-c-pro-dock/).
I don't have a CalDigit TS3 Plus.

> Do you use a Thunderbolt cable or a regular Type-C one? A Thunderbolt
> cable has the lightning symbol on the connector.

The dock was connected with a Thunderbolt cable that I had used for a couple of years without any issues.
Based on the hint, I replaced the cable, and the issue is now gone for me.

I still don't understand why this happened, as it had been working great for years and still works with kernels 6.8.7 and older.
But nevertheless, sorry if I wasted anyone's time because of broken hardware.

Best Regards
Benjamin


2024-05-20 17:31:40

by Gia

Subject: Re: [REGRESSION][BISECTED] "xHCI host controller not responding, assume dead" on stable kernel > 6.8.7

In my case, I use the official Thunderbolt cable that came with my
CalDigit TS3 Plus, and yet the log (attached to a previous email)
says "current link speed 10.0 Gb/s". I also just tried a good-quality
USB4 cable, and nothing changed.

Maybe it's relevant to note that I'm using the Thunderbolt dock via
USB4 on a PC with an AMD Ryzen 7 7735HS.
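
To compare logs from different machines and cables, the "current link speed" lines can be pulled out of a saved dmesg file. The sketch below is one way to do that; it assumes the debug messages were enabled with thunderbolt.dyndbg=+p as requested earlier in the thread, and the regular expression is written against the log format quoted in this thread rather than taken from the driver source.

```python
import re

# Matches e.g. "thunderbolt 0000:00:0d.2: 1: current link speed 10.0 Gb/s"
LINK_SPEED = re.compile(
    r"thunderbolt \S+: ([0-9a-f]+): current link speed ([\d.]+) Gb/s"
)

def link_speeds(log_text):
    """Return (router, per-lane speed in Gb/s) pairs found in a kernel log."""
    return [(m.group(1), float(m.group(2))) for m in LINK_SPEED.finditer(log_text)]

sample = (
    "[ 8.891564] thunderbolt 0000:00:0d.2: Credits (total/control): 8/0\n"
    "[ 8.891825] thunderbolt 0000:00:0d.2: 1: current link speed 10.0 Gb/s\n"
)
print(link_speeds(sample))  # [('1', 10.0)]
```

Running it over the logs taken with and without thunderbolt.host_reset=false would show directly whether the host reset changes the negotiated speed.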

On Mon, May 20, 2024 at 6:53 PM Benjamin Böhmke <[email protected]> wrote:
>
> On Monday, May 20, 2024 18:21 CEST, Mika Westerberg <[email protected]> wrote:
>
> > Hi,
> >
> > On Mon, May 20, 2024 at 05:12:40PM +0200, Benjamin Böhmke wrote:
> > > On Monday, May 20, 2024 16:41 CEST, Mario Limonciello <[email protected]> wrote:
> > >
> > > > On 5/20/2024 09:39, Christian Heusel wrote:
> > > > > On 24/05/06 02:53PM, Linux regression tracking (Thorsten Leemhuis) wrote:
> > > > >> [CCing Mario, who asked for the two suspected commits to be backported]
> > > > >>
> > > > >> On 06.05.24 14:24, Gia wrote:
> > > > >>> Hello, from 6.8.7=>6.8.8 I run into a similar problem with my Caldigit
> > > > >>> TS3 Plus Thunderbolt 3 dock.
> > > > >>>
> > > > >>> After the update I see this message on boot "xHCI host controller not
> > > > >>> responding, assume dead" and the dock is not working anymore. Kernel
> > > > >>> 6.8.7 works great.
> > > > >
> > > > > We now have some further information on the matter as somebody was kind
> > > > > enough to bisect the issue in the [Arch Linux Forums][0]:
> > > > >
> > > > > cc4c94a5f6c4 ("thunderbolt: Reset topology created by the boot firmware")
> > > > >
> > > > > This is a stable commit id, the relevant mainline commit is:
> > > > >
> > > > > 59a54c5f3dbd ("thunderbolt: Reset topology created by the boot firmware")
> > > > >
> > > > > The other reporter created [an issue][1] in our bugtracker, which I'll
> > > > > leave here just for completeness sake.
> > > > >
> > > > > Reported-by: Benjamin Böhmke <[email protected]>
> > > > > Reported-by: Gia <[email protected]>
> > > > > Bisected-by: Benjamin Böhmke <[email protected]>
> > > > >
> > > > > The person doing the bisection also offered to chime in here if further
> > > > > debugging is needed!
> > > > >
> > > > > Also CC'ing the Commitauthors & Subsystem Maintainers for this report.
> > > > >
> > > > > Cheers,
> > > > > Christian
> > > > >
> > > > > [0]: https://bbs.archlinux.org/viewtopic.php?pid=2172526
> > > > > [1]: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/48
> > > > >
> > > > > #regzbot introduced: 59a54c5f3dbd
> > > > > #regzbot link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/48
> > > >
> > > > As I mentioned in my other email I would like to collate logs onto a
> > > > kernel Bugzilla. With these two cases:
> > > >
> > > > thunderbolt.dyndbg=+p
> > > > thunderbolt.dyndbg=+p thunderbolt.host_reset=false
> > > >
> > > > Also what is the value for:
> > > >
> > > > $ cat /sys/bus/thunderbolt/devices/domain0/iommu_dma_protection
> > >
> > > I attached the requested kernel logs as text files (hope this is ok).
> > > In both cases I used the stable ArchLinux kernel 6.9.1
> > >
> > > The iommu_dma_protection is "1" in both cases.
> > >
> > > Best Regards
> > > Benjamin
> >
> > After reset the link comes up just fine but there is one thing that I
> > noticed:
> >
> > > [ 8.225355] thunderbolt 0-0:1.1: NVM version 7.0
> > > [ 8.225360] thunderbolt 0-0:1.1: new retimer found, vendor=0x8087 device=0x15ee
> > > [ 8.226410] thunderbolt 0000:00:0d.2: current switch config:
> > > [ 8.226413] thunderbolt 0000:00:0d.2: Thunderbolt 3 Switch: 8086:15ef (Revision: 6, TB Version: 16)
> > > [ 8.226417] thunderbolt 0000:00:0d.2: Max Port Number: 13
> > > [ 8.226420] thunderbolt 0000:00:0d.2: Config:
> > > [ 8.226421] thunderbolt 0000:00:0d.2: Upstream Port Number: 0 Depth: 0 Route String: 0x0 Enabled: 0, PlugEventsDelay: 10ms
> > > [ 8.226424] thunderbolt 0000:00:0d.2: unknown1: 0x0 unknown4: 0x0
> > > [ 8.227755] iwlwifi 0000:00:14.3: Registered PHC clock: iwlwifi-PTP, with index: 0
> > > [ 8.234944] thunderbolt 0000:00:0d.2: initializing Switch at 0x1 (depth: 1, up port: 1)
> > > [ 8.246755] thunderbolt 0000:00:0d.2: acking hot plug event on 1:2
> > > [ 8.267378] thunderbolt 0000:00:0d.2: 1: reading DROM (length: 0x6d)
> > > [ 8.879296] thunderbolt 0000:00:0d.2: 1: DROM version: 1
> > > [ 8.880631] thunderbolt 0000:00:0d.2: 1: uid: 0x3d600630c86400
> > > [ 8.884540] thunderbolt 0000:00:0d.2: Port 1: 8086:15ef (Revision: 6, TB Version: 1, Type: Port (0x1))
> > > [ 8.884562] thunderbolt 0000:00:0d.2: Max hop id (in/out): 19/19
> > > [ 8.884564] thunderbolt 0000:00:0d.2: Max counters: 16
> > > [ 8.884566] thunderbolt 0000:00:0d.2: NFC Credits: 0x3c00000
> > > [ 8.884567] thunderbolt 0000:00:0d.2: Credits (total/control): 60/2
> > > [ 8.887782] thunderbolt 0000:00:0d.2: Port 2: 8086:15ef (Revision: 6, TB Version: 1, Type: Port (0x1))
> > > [ 8.887787] thunderbolt 0000:00:0d.2: Max hop id (in/out): 19/19
> > > [ 8.887789] thunderbolt 0000:00:0d.2: Max counters: 16
> > > [ 8.887791] thunderbolt 0000:00:0d.2: NFC Credits: 0x3c00000
> > > [ 8.887792] thunderbolt 0000:00:0d.2: Credits (total/control): 60/2
> > > [ 8.887794] thunderbolt 0000:00:0d.2: 1:3: disabled by eeprom
> > > [ 8.887795] thunderbolt 0000:00:0d.2: 1:4: disabled by eeprom
> > > [ 8.887796] thunderbolt 0000:00:0d.2: 1:5: disabled by eeprom
> > > [ 8.887797] thunderbolt 0000:00:0d.2: 1:6: disabled by eeprom
> > > [ 8.887798] thunderbolt 0000:00:0d.2: 1:7: disabled by eeprom
> > > [ 8.888053] thunderbolt 0000:00:0d.2: Port 8: 8086:15ef (Revision: 6, TB Version: 1, Type: PCIe (0x100102))
> > > [ 8.888056] thunderbolt 0000:00:0d.2: Max hop id (in/out): 8/8
> > > [ 8.888057] thunderbolt 0000:00:0d.2: Max counters: 2
> > > [ 8.888058] thunderbolt 0000:00:0d.2: NFC Credits: 0x800000
> > > [ 8.888059] thunderbolt 0000:00:0d.2: Credits (total/control): 8/0
> > > [ 8.888848] thunderbolt 0000:00:0d.2: Port 9: 8086:15ef (Revision: 6, TB Version: 1, Type: PCIe (0x100101))
> > > [ 8.888850] thunderbolt 0000:00:0d.2: Max hop id (in/out): 8/8
> > > [ 8.888851] thunderbolt 0000:00:0d.2: Max counters: 2
> > > [ 8.888852] thunderbolt 0000:00:0d.2: NFC Credits: 0x800000
> > > [ 8.888852] thunderbolt 0000:00:0d.2: Credits (total/control): 8/0
> > > [ 8.889379] thunderbolt 0000:00:0d.2: Port 10: 8086:15ef (Revision: 6, TB Version: 1, Type: DP/HDMI (0xe0102))
> > > [ 8.889381] thunderbolt 0000:00:0d.2: Max hop id (in/out): 9/9
> > > [ 8.889382] thunderbolt 0000:00:0d.2: Max counters: 2
> > > [ 8.889383] thunderbolt 0000:00:0d.2: NFC Credits: 0x800000
> > > [ 8.889384] thunderbolt 0000:00:0d.2: Credits (total/control): 8/0
> > > [ 8.890457] thunderbolt 0000:00:0d.2: Port 11: 8086:15ef (Revision: 6, TB Version: 1, Type: DP/HDMI (0xe0102))
> > > [ 8.890459] thunderbolt 0000:00:0d.2: Max hop id (in/out): 9/9
> > > [ 8.890460] thunderbolt 0000:00:0d.2: Max counters: 2
> > > [ 8.890461] thunderbolt 0000:00:0d.2: NFC Credits: 0x800000
> > > [ 8.890462] thunderbolt 0000:00:0d.2: Credits (total/control): 8/0
> > > [ 8.890721] thunderbolt 0000:00:0d.2: Port 12: 8086:15ea (Revision: 6, TB Version: 1, Type: Inactive (0x0))
> > > [ 8.890723] thunderbolt 0000:00:0d.2: Max hop id (in/out): 8/8
> > > [ 8.890724] thunderbolt 0000:00:0d.2: Max counters: 2
> > > [ 8.890725] thunderbolt 0000:00:0d.2: NFC Credits: 0x800000
> > > [ 8.890726] thunderbolt 0000:00:0d.2: Credits (total/control): 8/0
> > > [ 8.891534] thunderbolt 0000:00:0d.2: Port 13: 8086:15ea (Revision: 6, TB Version: 1, Type: Inactive (0x0))
> > > [ 8.891545] thunderbolt 0000:00:0d.2: Max hop id (in/out): 8/8
> > > [ 8.891551] thunderbolt 0000:00:0d.2: Max counters: 2
> > > [ 8.891557] thunderbolt 0000:00:0d.2: NFC Credits: 0x800000
> > > [ 8.891564] thunderbolt 0000:00:0d.2: Credits (total/control): 8/0
> > > [ 8.891825] thunderbolt 0000:00:0d.2: 1: current link speed 10.0 Gb/s
> >
> > Here it is 10G instead of 20G which limits the bandwidth available for
> > DP tunneling.
> >
> > ...
> >
> > > [ 9.297112] pci 0000:05:00.0: [8086:15f0] type 00 class 0x0c0330 PCIe Endpoint
> > > [ 9.297146] pci 0000:05:00.0: BAR 0 [mem 0x00000000-0x0000ffff]
> > > [ 9.297249] pci 0000:05:00.0: enabling Extended Tags
> > > [ 9.297479] pci 0000:05:00.0: supports D1 D2
> > > [ 9.297481] pci 0000:05:00.0: PME# supported from D0 D1 D2 D3hot D3cold
> > > [ 9.297717] pci 0000:05:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:00:07.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
> >
> > The xHCI comes up just fine though.
> >
> > > [ 9.300388] xhci_hcd 0000:05:00.0: xHCI Host Controller
> > > [ 9.300397] xhci_hcd 0000:05:00.0: new USB bus registered, assigned bus number 5
> > > [ 9.301802] xhci_hcd 0000:05:00.0: hcc params 0x200077c1 hci version 0x110 quirks 0x0000000200009810
> > > [ 9.302393] xhci_hcd 0000:05:00.0: xHCI Host Controller
> > > [ 9.302398] xhci_hcd 0000:05:00.0: new USB bus registered, assigned bus number 6
> > > [ 9.302401] xhci_hcd 0000:05:00.0: Host supports USB 3.1 Enhanced SuperSpeed
> > > [ 9.302459] usb usb5: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.09
> > > [ 9.302462] usb usb5: New USB device strings: Mfr=3, Product=2, SerialNumber=1
> > > [ 9.302465] usb usb5: Product: xHCI Host Controller
> > > [ 9.302466] usb usb5: Manufacturer: Linux 6.9.1-arch1-1 xhci-hcd
> > > [ 9.302468] usb usb5: SerialNumber: 0000:05:00.0
> > > [ 9.302783] hub 5-0:1.0: USB hub found
> > > [ 9.302794] hub 5-0:1.0: 2 ports detected
> > > [ 9.302992] usb usb6: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 6.09
> > > [ 9.302995] usb usb6: New USB device strings: Mfr=3, Product=2, SerialNumber=1
> > > [ 9.302997] usb usb6: Product: xHCI Host Controller
> > > [ 9.302998] usb usb6: Manufacturer: Linux 6.9.1-arch1-1 xhci-hcd
> > > [ 9.303000] usb usb6: SerialNumber: 0000:05:00.0
> > > [ 9.303557] hub 6-0:1.0: USB hub found
> > > [ 9.303567] hub 6-0:1.0: 2 ports detected
> > > [ 9.552443] usb 5-1: new high-speed USB device number 2 using xhci_hcd
> > > [ 10.130905] thunderbolt 0000:00:0d.2: 0:5 <-> 1:10 (DP): DPRX read done
> > > [ 10.131029] thunderbolt 0000:00:0d.2: 0:5 <-> 1:10 (DP): consumed bandwidth 0/17280 Mb/s
> > > [ 10.131047] thunderbolt 0000:00:0d.2: bandwidth consumption changed, re-calculating estimated bandwidth
> > > [ 10.131051] thunderbolt 0000:00:0d.2: re-calculating bandwidth estimation for group 1
> > > [ 10.131198] thunderbolt 0000:00:0d.2: bandwidth estimation for group 1 done
> > > [ 10.131206] thunderbolt 0000:00:0d.2: bandwidth re-calculation done
> > > [ 10.131212] thunderbolt 0000:00:0d.2: 1: TMU: mode change uni-directional, LowRes -> uni-directional, HiFi requested
> > > [ 10.135515] thunderbolt 0000:00:0d.2: 1: TMU: mode set to: uni-directional, HiFi
> > > [ 10.136473] thunderbolt 0000:00:0d.2: 0:6: DP IN available
> > > [ 10.136606] thunderbolt 0000:00:0d.2: 1:10: DP OUT in use
> > > [ 10.136610] thunderbolt 0000:00:0d.2: 0:6: no suitable DP OUT adapter available, not tunneling
> > > [ 10.136743] thunderbolt 0000:00:0d.2: 1:11: DP OUT resource available after hotplug
> > > [ 10.136748] thunderbolt 0000:00:0d.2: looking for DP IN <-> DP OUT pairs:
> > > [ 10.136876] thunderbolt 0000:00:0d.2: 0:5: DP IN in use
> > > [ 10.137568] thunderbolt 0000:00:0d.2: 0:6: DP IN available
> > > [ 10.137687] thunderbolt 0000:00:0d.2: 1:10: DP OUT in use
> > > [ 10.137820] thunderbolt 0000:00:0d.2: 1:11: DP OUT available
> > > [ 10.139280] thunderbolt 0000:00:0d.2: 0: allocated DP resource for port 6
> > > [ 10.139286] thunderbolt 0000:00:0d.2: 0:6: attached to bandwidth group 1
> > > [ 10.139694] thunderbolt 0000:00:0d.2: 0:1: link maximum bandwidth 18000/18000 Mb/s
> > > [ 10.140680] thunderbolt 0000:00:0d.2: 0:5 <-> 1:10 (DP): DPRX read done
> > > [ 10.140829] thunderbolt 0000:00:0d.2: 0:5 <-> 1:10 (DP): consumed bandwidth 0/17280 Mb/s
> > > [ 10.140963] thunderbolt 0000:00:0d.2: 1:1: link maximum bandwidth 18000/18000 Mb/s
> > > [ 10.141892] thunderbolt 0000:00:0d.2: 0:5 <-> 1:10 (DP): DPRX read done
> > > [ 10.142027] thunderbolt 0000:00:0d.2: 0:5 <-> 1:10 (DP): consumed bandwidth 0/17280 Mb/s
> > > [ 10.142033] thunderbolt 0000:00:0d.2: available bandwidth for new DP tunnel 18000/720 Mb/s
> > > [ 10.142052] thunderbolt 0000:00:0d.2: 0:6 <-> 1:11 (DP): activating
> > > [ 10.143353] thunderbolt 0000:00:0d.2: 0:6 <-> 1:11 (DP): DP IN maximum supported bandwidth 8100 Mb/s x4 = 25920 Mb/s
> > > [ 10.143360] thunderbolt 0000:00:0d.2: 0:6 <-> 1:11 (DP): DP OUT maximum supported bandwidth 5400 Mb/s x4 = 17280 Mb/s
> > > [ 10.143366] thunderbolt 0000:00:0d.2: 0:6 <-> 1:11 (DP): not enough bandwidth
> > > [ 10.143371] thunderbolt 0000:00:0d.2: 1:11: DP tunnel activation failed, aborting
> >
> > However, the second DP tunnel fails because of no bandwidth.
> >
> > > [ 10.143489] thunderbolt 0000:00:0d.2: 0:6: detached from bandwidth group 1
> > > [ 10.144883] thunderbolt 0000:00:0d.2: 0: released DP resource for port 6
> > > [ 14.902955] usb 5-1: unable to get BOS descriptor set
> > > [ 14.906143] usb 5-1: New USB device found, idVendor=2188, idProduct=0610, bcdDevice=70.42
> > > [ 14.906167] usb 5-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> > > [ 14.906175] usb 5-1: Product: USB2.1 Hub
> > > [ 14.906183] usb 5-1: Manufacturer: CalDigit, Inc.
> > > [ 14.908660] hub 5-1:1.0: USB hub found
> > > [ 14.909135] hub 5-1:1.0: 4 ports detected
> > > [ 15.026182] usb 6-1: new SuperSpeed Plus Gen 2x1 USB device number 2 using xhci_hcd
> > > [ 15.050199] usb 6-1: New USB device found, idVendor=2188, idProduct=0625, bcdDevice=70.42
> > > [ 15.050223] usb 6-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> > > [ 15.050231] usb 6-1: Product: USB3.1 Gen2 Hub
> > > [ 15.050237] usb 6-1: Manufacturer: CalDigit, Inc.
> > > [ 15.053712] hub 6-1:1.0: USB hub found
> > > [ 15.054279] hub 6-1:1.0: 4 ports detected
> > > [ 15.215877] usb 5-1.4: new high-speed USB device number 3 using xhci_hcd
> > > [ 15.333676] usb 5-1.4: New USB device found, idVendor=2188, idProduct=0611, bcdDevice=93.06
> > > [ 15.333703] usb 5-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> > > [ 15.333711] usb 5-1.4: Product: USB2.1 Hub
> > > [ 15.333718] usb 5-1.4: Manufacturer: CalDigit, Inc.
> > > [ 15.336484] hub 5-1.4:1.0: USB hub found
> > > [ 15.336797] hub 5-1.4:1.0: 4 ports detected
> > > [ 15.402943] usb 6-1.1: new SuperSpeed USB device number 3 using xhci_hcd
> > > [ 15.425589] usb 6-1.1: New USB device found, idVendor=2188, idProduct=0754, bcdDevice= 0.06
> > > [ 15.425615] usb 6-1.1: New USB device strings: Mfr=3, Product=4, SerialNumber=2
> > > [ 15.425623] usb 6-1.1: Product: USB-C Pro Card Reader
> > > [ 15.425691] usb 6-1.1: Manufacturer: CalDigit
> > > [ 15.425697] usb 6-1.1: SerialNumber: 000000000006
> > > [ 15.432231] usb-storage 6-1.1:1.0: USB Mass Storage device detected
> > > [ 15.433690] scsi host0: usb-storage 6-1.1:1.0
> > > [ 15.506218] usb 6-1.4: new SuperSpeed USB device number 4 using xhci_hcd
> > > [ 15.528220] usb 6-1.4: New USB device found, idVendor=2188, idProduct=0620, bcdDevice=93.06
> > > [ 15.528237] usb 6-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> > > [ 15.528241] usb 6-1.4: Product: USB3.1 Gen1 Hub
> > > [ 15.528244] usb 6-1.4: Manufacturer: CalDigit, Inc.
> > > [ 15.531198] hub 6-1.4:1.0: USB hub found
> > > [ 15.531506] hub 6-1.4:1.0: 4 ports detected
> > > [ 15.649217] usb 5-1.4.1: new high-speed USB device number 4 using xhci_hcd
> > > [ 15.989548] usb 6-1.4.4: new SuperSpeed USB device number 5 using xhci_hcd
> > > [ 16.007996] usb 6-1.4.4: New USB device found, idVendor=0bda, idProduct=8153, bcdDevice=31.00
> > > [ 16.008021] usb 6-1.4.4: New USB device strings: Mfr=1, Product=2, SerialNumber=6
> > > [ 16.008029] usb 6-1.4.4: Product: USB 10/100/1000 LAN
> > > [ 16.008035] usb 6-1.4.4: Manufacturer: Realtek
> > > [ 16.008040] usb 6-1.4.4: SerialNumber: 001001000
> > > [ 16.090287] r8152-cfgselector 6-1.4.4: reset SuperSpeed USB device number 5 using xhci_hcd
> > > [ 16.136796] r8152 6-1.4.4:1.0: load rtl8153b-2 v2 04/27/23 successfully
> > > [ 16.171430] r8152 6-1.4.4:1.0 eth0: v1.12.13
> > > [ 16.209513] r8152 6-1.4.4:1.0 enp5s0u1u4u4: renamed from eth0
> > > [ 16.453330] scsi 0:0:0:0: Direct-Access CalDigit SD Card Reader 0006 PQ: 0 ANSI: 6
> > > [ 16.454420] sd 0:0:0:0: Attached scsi generic sg0 type 0
> > > [ 16.455908] sd 0:0:0:0: [sda] Media removed, stopped polling
> > > [ 16.457173] sd 0:0:0:0: [sda] Attached SCSI removable disk
> > > [ 16.497559] usb 5-1.4.1: New USB device found, idVendor=2188, idProduct=4042, bcdDevice= 0.06
> > > [ 16.497567] usb 5-1.4.1: New USB device strings: Mfr=3, Product=1, SerialNumber=0
> > > [ 16.497570] usb 5-1.4.1: Product: CalDigit USB-C Pro Audio
> > > [ 16.497572] usb 5-1.4.1: Manufacturer: CalDigit Inc.
> > > [ 16.920216] ucsi_acpi USBC000:00: possible UCSI driver bug 1
> > > [ 17.494492] input: CalDigit Inc. CalDigit USB-C Pro Audio as /devices/pci0000:00/0000:00:07.0/0000:03:00.0/0000:04:02.0/0000:05:00.0/usb5/5-1/5-1.4/5-1.4.1/5-1.4.1:1.3/0003:2188:4042.0005/input/input20
> > > [ 17.550258] hid-generic 0003:2188:4042.0005: input,hidraw2: USB HID v1.11 Device [CalDigit Inc. CalDigit USB-C Pro Audio] on usb-0000:05:00.0-1.4.1/input3
> > > [ 19.609816] r8152 6-1.4.4:1.0 enp5s0u1u4u4: carrier on
> >
> > All the USB devices seem to work fine (assuming I read this right).
>
> To keep the log small I unplugged all USB devices from the dock.
> But even if connected I don't have issues with them.
>
> >
> > There is the DP tunneling limitation but other than that how the dock
> > does not work? At least reading this log everything else seems to be
> > fine except the second monitor?
>
> Exactly only the second monitor is/was not working.
>
> >
> > Now it is interesting why the link is only 20G and not 40G. I do have
> > this same device and it gets the link up as 40G just fine:
> >
> > [ 17.867868] thunderbolt 0000:00:0d.2: 1: current link speed 20.0 Gb/s
> > [ 17.867869] thunderbolt 0000:00:0d.2: 1: current link width symmetric, single lane
> > [ 17.868437] thunderbolt 0000:00:0d.2: 0:1: total credits changed 120 -> 60
> > [ 17.868625] thunderbolt 0000:00:0d.2: 0:2: total credits changed 0 -> 60
> > [ 17.872472] thunderbolt 0000:00:0d.2: 1: TMU: current mode: bi-directional, HiFi
> > [ 17.872608] thunderbolt 0-1: new device found, vendor=0x3d device=0x11
> > [ 17.879102] thunderbolt 0-1: CalDigit, Inc. TS3 Plus
> >
>
> My dock is a little different model (see https://www.caldigit.com/usb-c-pro-dock/)
> I don't have a CalDigit TS3 Plus.
>
> > Do you use a Thunderbolt cable or some regular type-C one? There is the
> > lightning symbol on the connector when it is Thunderbolt one.
>
> The dock was connected with a Thunderbolt cable, that I used for a couple of years without any issues.
> Based on the hint I replaced the cable and the issue is now gone for me.
>
> I still don't understand why this happened as it was working great for years and is still working with kernels 6.8.7 or older.
> But nevertheless sorry if I wasted time of anyone because of broken hardware.
>
> Best Regards
> Benjamin
>

2024-05-21 09:47:25

by Mika Westerberg

Subject: Re: [REGRESSION][BISECTED] "xHCI host controller not responding, assume dead" on stable kernel > 6.8.7

Hi,

On Mon, May 20, 2024 at 06:53:18PM +0200, Benjamin Böhmke wrote:
> > All the USB devices seem to work fine (assuming I read this right).
>
> To keep the log small I unplugged all USB devices from the dock.
> But even if connected I don't have issues with them.

Okay that's good to know.

Yeah, in the dmesg it might seem odd that the xHCI is "gone" for a while
when we do the USB4 topology reset, but it comes back after the tunnels
get re-created.

> > There is the DP tunneling limitation but other than that how the dock
> > does not work? At least reading this log everything else seems to be
> > fine except the second monitor?
>
> Exactly only the second monitor is/was not working.

Got it.

> > Now it is interesting why the link is only 20G and not 40G. I do have
> > this same device and it gets the link up as 40G just fine:
> >
> > [ 17.867868] thunderbolt 0000:00:0d.2: 1: current link speed 20.0 Gb/s
> > [ 17.867869] thunderbolt 0000:00:0d.2: 1: current link width symmetric, single lane
> > [ 17.868437] thunderbolt 0000:00:0d.2: 0:1: total credits changed 120 -> 60
> > [ 17.868625] thunderbolt 0000:00:0d.2: 0:2: total credits changed 0 -> 60
> > [ 17.872472] thunderbolt 0000:00:0d.2: 1: TMU: current mode: bi-directional, HiFi
> > [ 17.872608] thunderbolt 0-1: new device found, vendor=0x3d device=0x11
> > [ 17.879102] thunderbolt 0-1: CalDigit, Inc. TS3 Plus
> >
>
> My dock is a little different model (see https://www.caldigit.com/usb-c-pro-dock/)
> I don't have a CalDigit TS3 Plus.

Indeed, my mistake.

> > Do you use a Thunderbolt cable or some regular type-C one? There is the
> > lightning symbol on the connector when it is Thunderbolt one.
>
> The dock was connected with a Thunderbolt cable, that I used for a
> couple of years without any issues. Based on the hint I replaced the
> cable and the issue is now gone for me.
>
> I still don't understand why this happened as it was working great for
> years and is still working with kernels 6.8.7 or older. But
> nevertheless sorry if I wasted time of anyone because of broken
> hardware.

I think the BIOS CM already creates the "first" tunnel using reduced
capabilities, which lets the "second" tunnel fit in the 18G link. Now
that we do the reset, the "first" tunnel is re-created with max
capabilities, so the "second" no longer fits.

But now you get the full 40G link :)
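
As a rough illustration of the bandwidth arithmetic behind this, here is
a minimal Python sketch using only the figures from the quoted dmesg
(18000 Mb/s link maximum, 17280 Mb/s consumed by the first DP tunnel,
25920/17280 Mb/s DP IN/OUT maximums). The function name and the simple
"required <= available" check are assumptions made for illustration; the
driver's real allocation logic is more involved and is not modeled here.

```python
def dp_tunnel_fits(link_max, consumed, dp_in_max, dp_out_max):
    """Return (available, required, fits) in Mb/s.

    Simplified model: a new tunnel needs the smaller of the DP IN and
    DP OUT maximums, and fits only if that much bandwidth is left.
    """
    available = link_max - consumed
    required = min(dp_in_max, dp_out_max)
    return available, required, required <= available


# Values taken from the log lines quoted earlier in the thread.
available, required, fits = dp_tunnel_fits(
    link_max=18000,    # "link maximum bandwidth 18000/18000 Mb/s"
    consumed=17280,    # "consumed bandwidth 0/17280 Mb/s" (first tunnel)
    dp_in_max=25920,   # "DP IN maximum supported bandwidth ... = 25920 Mb/s"
    dp_out_max=17280,  # "DP OUT maximum supported bandwidth ... = 17280 Mb/s"
)
print(f"available={available} required={required} fits={fits}")
```

The 720 Mb/s left over matches the "available bandwidth for new DP tunnel
18000/720 Mb/s" line in the log, which is far below what the second
tunnel asks for.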

2024-05-21 10:53:15

by Mario Limonciello

Subject: Re: [REGRESSION][BISECTED] "xHCI host controller not responding, assume dead" on stable kernel > 6.8.7


>> I still don't understand why this happened as it was working great for
>> years and is still working with kernels 6.8.7 or older. But
>> nevertheless sorry if I wasted time of anyone because of broken
>> hardware.
>
> I think the BIOS CM creates the "first" tunnel using reduced
> capabilities already so this makes the "second" tunnel fit there in the
> 18G link. Now that we do the reset the "first" tunnel is re-created with
> max capabilities and that makes the "second" not to fit there anymore.
>
> But now you get the full 40G link :)

Well that's awesome! That confirms there were other issues besides the
one Sanath found that get fixed by not reusing BIOS tunnels.

2024-05-21 11:12:08

by Mika Westerberg

Subject: Re: [REGRESSION][BISECTED] "xHCI host controller not responding, assume dead" on stable kernel > 6.8.7

Hi,

On Mon, May 20, 2024 at 07:30:28PM +0200, Gia wrote:
> In my case I use the official Thunderbolt cable that came with my
> CalDigit TS3 Plus and yet the log - attached in a previous email -
> says current link speed 10.0 Gb/s. I just tried a good quality USB4
> cable too and nothing changed.

I will take a look at your logs today, but in the meantime can you run
the following command on the system with the dock connected?

# tbdump -r 0 -a 1 -vv -N 2 LANE_ADP_CS_0

Here tbdump comes from https://github.com/intel/tbtools. It should be
pretty straightforward to build, but let me know if you hit any issues
(unfortunately there is no binary package available at this time).

The '-a 1' should match the adapter the dock is connected to. You can
get it for instance like this (this is an example from my system):

# tblist
Domain 0 Route 0: 8087:7eb2 Intel Gen14
Domain 0 Route 1: 003d:0011 CalDigit, Inc. TS3 Plus
Domain 1 Route 0: 8087:7eb2 Intel Gen14

Here the CalDigit has "Route 1", which means I use "-a 1" above. It could
also be "Domain 0 Route 3", in which case replace the "-a 1" with "-a 3".

This command should dump two lane adapter registers LANE_ADP_CS_0/1 that
show the link capabilities.
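
For anyone who wants to script the lookup, the route can be pulled out
of the tblist output with a few lines of Python. This is only a sketch:
the line format is assumed from the example output above (it is not
taken from the tbtools documentation), and the function name and regular
expression are made up for illustration.

```python
import re

# Assumed line shape, based on the sample above:
#   "Domain <n> Route <route>: <vendor>:<device> <name>"
TBLIST_LINE = re.compile(
    r"^Domain (\d+) Route ([0-9a-fA-F]+): "
    r"([0-9a-fA-F]{4}):([0-9a-fA-F]{4}) (.+)$"
)


def find_routes(tblist_output, name_fragment):
    """Return (domain, route) pairs for devices whose name matches."""
    matches = []
    for line in tblist_output.splitlines():
        m = TBLIST_LINE.match(line.strip())
        if m and name_fragment.lower() in m.group(5).lower():
            # Keep the route as text; it is not interpreted here.
            matches.append((int(m.group(1)), m.group(2)))
    return matches


SAMPLE = """\
Domain 0 Route 0: 8087:7eb2 Intel Gen14
Domain 0 Route 1: 003d:0011 CalDigit, Inc. TS3 Plus
Domain 1 Route 0: 8087:7eb2 Intel Gen14
"""

print(find_routes(SAMPLE, "CalDigit"))
```

With the sample above this reports domain 0, route 1, which corresponds
to the "-a 1" used in the tbdump command.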

2024-05-21 11:12:23

by Mika Westerberg

Subject: Re: [REGRESSION][BISECTED] "xHCI host controller not responding, assume dead" on stable kernel > 6.8.7

Hi,

On Mon, May 20, 2024 at 05:57:42PM +0200, Gia wrote:
> Hi Mario,
>
> In my case in both cases the value for:
>
> $ cat /sys/bus/thunderbolt/devices/domain0/iommu_dma_protection
>
> is 0.
>
> Output of sudo journalctl -k with kernel option thunderbolt.dyndbg=+p:
> https://codeshare.io/qAXLoj
>
> Output of sudo dmesg with kernel option thunderbolt.dyndbg=+p:
> https://codeshare.io/zlPgRb

I see you have "pcie_aspm=off" in the kernel command line. That kind of
affects things. Can you drop that and see if it changes anything? And
also provide a new full dmesg with "thunderbolt.dyndbg=+p" in the
command line (with pcie_aspm=off dropped)?

Also, is there any particular reason you have it there?
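
Before collecting a new dmesg, it can help to confirm which of the
parameters mentioned in this thread are actually set on the running
kernel. The following is a minimal Python sketch that reads
/proc/cmdline; the helper name is made up for illustration, and the
parser simply splits on whitespace, so quoted values containing spaces
are not handled.

```python
from pathlib import Path


def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Flags without "=" map to None. Quoted values with spaces are not
    handled by this simplified parser.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    if proc.exists():
        params = parse_cmdline(proc.read_text())
        # Parameters discussed in this thread.
        for name in ("pcie_aspm", "thunderbolt.dyndbg",
                     "thunderbolt.host_reset"):
            print(f"{name} = {params.get(name, '<not set>')}")
```

If pcie_aspm still shows up as "off" after the boot entry was edited, the
change did not take effect for that boot.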