2022-05-29 14:24:01

by Peter Sutton

Subject: [Bug] [Deadlock] Kernel thread deadlock in rfcomm socket release when connect interrupted

Hi,

Compile the attached C program (gcc bug.c -lbluetooth) and execute:

$ ./a.out

Interrupt (^C/SIGINT) during the connect. The process should hang and
the Bluetooth socket will now be in deadlock.

Kernel thread stack:

[May29 12:23] INFO: task krfcommd:902 blocked for more than 122 seconds.
[ +0.000009] Tainted: P OE 5.18.0-arch1-1 #1
[ +0.000004] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ +0.000002] task:krfcommd state:D stack: 0 pid: 902 ppid:
2 flags:0x00004000
[ +0.000010] Call Trace:
[ +0.000003] <TASK>
[ +0.000007] __schedule+0x37c/0x11f0
[ +0.000013] ? __schedule+0x384/0x11f0
[ +0.000012] ? l2cap_chan_create+0x138/0x180 [bluetooth
da0a812fd33c72f9c94149bd973bd9835fc8aa63]
[ +0.000104] schedule+0x4f/0xb0
[ +0.000008] schedule_preempt_disabled+0x15/0x20
[ +0.000009] __mutex_lock.constprop.0+0x2d0/0x480
[ +0.000012] rfcomm_run+0x152/0x1900 [rfcomm
70c711e71e4c70ddabda45ec756f02d9606ec257]
[ +0.000018] ? ttwu_do_wakeup+0x17/0x160
[ +0.000011] ? _raw_spin_rq_lock_irqsave+0x20/0x20
[ +0.000010] ? rfcomm_check_accept+0xa0/0xa0 [rfcomm
70c711e71e4c70ddabda45ec756f02d9606ec257]
[ +0.000015] kthread+0xde/0x110
[ +0.000011] ? kthread_complete_and_exit+0x20/0x20
[ +0.000010] ret_from_fork+0x22/0x30
[ +0.000012] </TASK>

Task stack:

[ +0.000003] INFO: task a.out:1035 blocked for more than 122 seconds.
[ +0.000004] Tainted: P OE 5.18.0-arch1-1 #1
[ +0.000003] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ +0.000001] task:a.out state:D stack: 0 pid: 1035 ppid:
817 flags:0x00004006
[ +0.000008] Call Trace:
[ +0.000002] <TASK>
[ +0.000003] __schedule+0x37c/0x11f0
[ +0.000009] ? __mod_memcg_state+0x2f/0x70
[ +0.000008] schedule+0x4f/0xb0
[ +0.000007] __lock_sock+0x7d/0xc0
[ +0.000010] ? cpuacct_percpu_seq_show+0x20/0x20
[ +0.000009] lock_sock_nested+0x48/0x50
[ +0.000009] rfcomm_sk_state_change+0x2b/0x120 [rfcomm
70c711e71e4c70ddabda45ec756f02d9606ec257]
[ +0.000018] __rfcomm_dlc_close+0x99/0x210 [rfcomm
70c711e71e4c70ddabda45ec756f02d9606ec257]
[ +0.000015] rfcomm_dlc_close+0x6e/0xb0 [rfcomm
70c711e71e4c70ddabda45ec756f02d9606ec257]
[ +0.000015] __rfcomm_sock_close+0x2e/0xe0 [rfcomm
70c711e71e4c70ddabda45ec756f02d9606ec257]
[ +0.000017] rfcomm_sock_shutdown+0x65/0xa0 [rfcomm
70c711e71e4c70ddabda45ec756f02d9606ec257]
[ +0.000016] rfcomm_sock_release+0x32/0xb0 [rfcomm
70c711e71e4c70ddabda45ec756f02d9606ec257]
[ +0.000016] __sock_release+0x3d/0xa0
[ +0.000010] sock_close+0x15/0x20
[ +0.000009] __fput+0x89/0x240
[ +0.000011] task_work_run+0x60/0x90
[ +0.000007] do_exit+0x337/0xac0
[ +0.000010] ? del_timer_sync+0x73/0xb0
[ +0.000006] do_group_exit+0x31/0xa0
[ +0.000009] get_signal+0x986/0x990
[ +0.000007] ? bt_sock_wait_state+0x124/0x1a0 [bluetooth
da0a812fd33c72f9c94149bd973bd9835fc8aa63]
[ +0.000060] ? wake_up_q+0x90/0x90
[ +0.000010] arch_do_signal_or_restart+0x48/0x760
[ +0.000012] exit_to_user_mode_prepare+0xd3/0x140
[ +0.000008] syscall_exit_to_user_mode+0x26/0x50
[ +0.000006] do_syscall_64+0x6b/0x90
[ +0.000009] ? exc_page_fault+0x74/0x170
[ +0.000009] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ +0.000007] RIP: 0033:0x7f4ab4f13557
[ +0.000006] RSP: 002b:00007fff5b37cc38 EFLAGS: 00000246 ORIG_RAX:
000000000000002a
[ +0.000007] RAX: fffffffffffffffc RBX: 00007fff5b37cd78 RCX: 00007f4ab4f13557
[ +0.000004] RDX: 000000000000000a RSI: 00007fff5b37cc4e RDI: 0000000000000003
[ +0.000004] RBP: 00007fff5b37cc60 R08: 0fffffffffffffff R09: 0000000000000000
[ +0.000003] R10: 00007f4ab4e075e0 R11: 0000000000000246 R12: 0000000000000000
[ +0.000003] R13: 00007fff5b37cd88 R14: 0000562da1cefde0 R15: 00007f4ab5214000
[ +0.000007] </TASK>

Process stack:

[<0>] __lock_sock+0x7d/0xc0
[<0>] lock_sock_nested+0x48/0x50
[<0>] rfcomm_sk_state_change+0x2b/0x120 [rfcomm]
[<0>] __rfcomm_dlc_close+0x99/0x210 [rfcomm]
[<0>] rfcomm_dlc_close+0x6e/0xb0 [rfcomm]
[<0>] __rfcomm_sock_close+0x2e/0xe0 [rfcomm]
[<0>] rfcomm_sock_shutdown+0x65/0xa0 [rfcomm]
[<0>] rfcomm_sock_release+0x32/0xb0 [rfcomm]
[<0>] __sock_release+0x3d/0xa0
[<0>] sock_close+0x15/0x20
[<0>] __fput+0x89/0x240
[<0>] task_work_run+0x60/0x90
[<0>] do_exit+0x337/0xac0
[<0>] do_group_exit+0x31/0xa0
[<0>] get_signal+0x986/0x990
[<0>] arch_do_signal_or_restart+0x48/0x760
[<0>] exit_to_user_mode_prepare+0xd3/0x140
[<0>] syscall_exit_to_user_mode+0x26/0x50
[<0>] do_syscall_64+0x6b/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Replicated by Matt (CC'ed, running 5.15.39) on different hardware and
by Lloyd (CC'ed) on the same hardware, with the same stack trace. Tested
on up-to-date Arch Linux (5.18.0).

Let me know if you need anything else. Cheers
--
Pete.


Attachments:
bug.c (419.00 B)

2022-05-30 07:34:28

by Paul Menzel

Subject: Re: [Bug] [Deadlock] Kernel thread deadlock in rfcomm socket release when connect interrupted

Dear Pete,


Thank you for your email with a reproducer.

On 29.05.22 at 13:42, Peter Sutton wrote:

> Compile the attached C program (gcc -lbluetooth bug.c) and execute:
>
> $ ./a.out
>
> Interrupt (^C/SIGINT) during the connect. The process should hang and
> the Bluetooth socket will now be in deadlock.
>
> Kernel thread stack:

Google Mail’s compositor wraps lines after 72 characters, which makes
the trace harder to read.

> [May29 12:23] INFO: task krfcommd:902 blocked for more than 122 seconds.
> [ +0.000009] Tainted: P OE 5.18.0-arch1-1 #1
> [ +0.000004] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ +0.000002] task:krfcommd state:D stack: 0 pid: 902 ppid: 2 flags:0x00004000
> [ +0.000010] Call Trace:
> [ +0.000003] <TASK>
> [ +0.000007] __schedule+0x37c/0x11f0
> [ +0.000013] ? __schedule+0x384/0x11f0
> [ +0.000012] ? l2cap_chan_create+0x138/0x180 [bluetooth da0a812fd33c72f9c94149bd973bd9835fc8aa63]
> [ +0.000104] schedule+0x4f/0xb0
> [ +0.000008] schedule_preempt_disabled+0x15/0x20
> [ +0.000009] __mutex_lock.constprop.0+0x2d0/0x480
> [ +0.000012] rfcomm_run+0x152/0x1900 [rfcomm 70c711e71e4c70ddabda45ec756f02d9606ec257]
> [ +0.000018] ? ttwu_do_wakeup+0x17/0x160
> [ +0.000011] ? _raw_spin_rq_lock_irqsave+0x20/0x20
> [ +0.000010] ? rfcomm_check_accept+0xa0/0xa0 [rfcomm 70c711e71e4c70ddabda45ec756f02d9606ec257]
> [ +0.000015] kthread+0xde/0x110
> [ +0.000011] ? kthread_complete_and_exit+0x20/0x20
> [ +0.000010] ret_from_fork+0x22/0x30
> [ +0.000012] </TASK>
>
> Task stack:
>
> [ +0.000003] INFO: task a.out:1035 blocked for more than 122 seconds.
> [ +0.000004] Tainted: P OE 5.18.0-arch1-1 #1
> [ +0.000003] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ +0.000001] task:a.out state:D stack: 0 pid: 1035 ppid: 817 flags:0x00004006
> [ +0.000008] Call Trace:
> [ +0.000002] <TASK>
> [ +0.000003] __schedule+0x37c/0x11f0
> [ +0.000009] ? __mod_memcg_state+0x2f/0x70
> [ +0.000008] schedule+0x4f/0xb0
> [ +0.000007] __lock_sock+0x7d/0xc0
> [ +0.000010] ? cpuacct_percpu_seq_show+0x20/0x20
> [ +0.000009] lock_sock_nested+0x48/0x50
> [ +0.000009] rfcomm_sk_state_change+0x2b/0x120 [rfcomm 70c711e71e4c70ddabda45ec756f02d9606ec257]
> [ +0.000018] __rfcomm_dlc_close+0x99/0x210 [rfcomm 70c711e71e4c70ddabda45ec756f02d9606ec257]
> [ +0.000015] rfcomm_dlc_close+0x6e/0xb0 [rfcomm 70c711e71e4c70ddabda45ec756f02d9606ec257]
> [ +0.000015] __rfcomm_sock_close+0x2e/0xe0 [rfcomm 70c711e71e4c70ddabda45ec756f02d9606ec257]
> [ +0.000017] rfcomm_sock_shutdown+0x65/0xa0 [rfcomm 70c711e71e4c70ddabda45ec756f02d9606ec257]
> [ +0.000016] rfcomm_sock_release+0x32/0xb0 [rfcomm 70c711e71e4c70ddabda45ec756f02d9606ec257]
> [ +0.000016] __sock_release+0x3d/0xa0
> [ +0.000010] sock_close+0x15/0x20
> [ +0.000009] __fput+0x89/0x240
> [ +0.000011] task_work_run+0x60/0x90
> [ +0.000007] do_exit+0x337/0xac0
> [ +0.000010] ? del_timer_sync+0x73/0xb0
> [ +0.000006] do_group_exit+0x31/0xa0
> [ +0.000009] get_signal+0x986/0x990
> [ +0.000007] ? bt_sock_wait_state+0x124/0x1a0 [bluetooth da0a812fd33c72f9c94149bd973bd9835fc8aa63]
> [ +0.000060] ? wake_up_q+0x90/0x90
> [ +0.000010] arch_do_signal_or_restart+0x48/0x760
> [ +0.000012] exit_to_user_mode_prepare+0xd3/0x140
> [ +0.000008] syscall_exit_to_user_mode+0x26/0x50
> [ +0.000006] do_syscall_64+0x6b/0x90
> [ +0.000009] ? exc_page_fault+0x74/0x170
> [ +0.000009] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ +0.000007] RIP: 0033:0x7f4ab4f13557
> [ +0.000006] RSP: 002b:00007fff5b37cc38 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
> [ +0.000007] RAX: fffffffffffffffc RBX: 00007fff5b37cd78 RCX: 00007f4ab4f13557
> [ +0.000004] RDX: 000000000000000a RSI: 00007fff5b37cc4e RDI: 0000000000000003
> [ +0.000004] RBP: 00007fff5b37cc60 R08: 0fffffffffffffff R09: 0000000000000000
> [ +0.000003] R10: 00007f4ab4e075e0 R11: 0000000000000246 R12: 0000000000000000
> [ +0.000003] R13: 00007fff5b37cd88 R14: 0000562da1cefde0 R15: 00007f4ab5214000
> [ +0.000007] </TASK>
>
> Process stack:
>
> [<0>] __lock_sock+0x7d/0xc0
> [<0>] lock_sock_nested+0x48/0x50
> [<0>] rfcomm_sk_state_change+0x2b/0x120 [rfcomm]
> [<0>] __rfcomm_dlc_close+0x99/0x210 [rfcomm]
> [<0>] rfcomm_dlc_close+0x6e/0xb0 [rfcomm]
> [<0>] __rfcomm_sock_close+0x2e/0xe0 [rfcomm]
> [<0>] rfcomm_sock_shutdown+0x65/0xa0 [rfcomm]
> [<0>] rfcomm_sock_release+0x32/0xb0 [rfcomm]
> [<0>] __sock_release+0x3d/0xa0
> [<0>] sock_close+0x15/0x20
> [<0>] __fput+0x89/0x240
> [<0>] task_work_run+0x60/0x90
> [<0>] do_exit+0x337/0xac0
> [<0>] do_group_exit+0x31/0xa0
> [<0>] get_signal+0x986/0x990
> [<0>] arch_do_signal_or_restart+0x48/0x760
> [<0>] exit_to_user_mode_prepare+0xd3/0x140
> [<0>] syscall_exit_to_user_mode+0x26/0x50
> [<0>] do_syscall_64+0x6b/0x90
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> Replicated by Matt (CC'ed running 5.15.39) on different hardware and
> Lloyd (CC'ed) on same hardware with same stack trace. Tested on
> up-to-date Arch Linux (5.18.0).

What hardware is that?

> Let me know if you need anything else.

As a lot of patches are also applied to the stable series, do you know
whether this is a regression? Does it work with Linux 5.15(.0) or 5.10?


Kind regards,

Paul


> --
> Pete.

In case you care: the standard signature delimiter has a trailing
space: `-- ` [1].


[1]: https://en.wikipedia.org/wiki/Signature_block#Standard_delimiter

2022-05-30 14:04:40

by Peter Sutton

Subject: Re: [Bug] [Deadlock] Kernel thread deadlock in rfcomm socket release when connect interrupted

Commit b7ce436a5d798bc59e71797952566608a4b4626b is the probable cause.
I compiled a custom Arch Linux kernel package [1] and the bug was
present. Reverting the commit fixed the bug. Below is the reply I was
writing before Matt found the suspect commit and I tested with the
custom kernel.

> What hardware is that?

$ dmesg | grep iwlwifi
Me: Intel(R) Dual Band Wireless AC 8260, REV=0x204
Matt: Intel(R) Dual Band Wireless AC 8265, REV=0x230

We both get:

$ lsusb | grep Bluetooth
Me & Matt: Bus 001 Device 006: ID 8087:0a2b Intel Corp. Bluetooth
wireless interface

> As a lot of patches are also applied to the stable series, do you know,
> if this is a regression? Does it work with Linux 5.15(.0) or 5.10?

Bug is present on current Arch Linux LTS kernel:

$ uname -a
Linux taffer 5.15.43-1-lts #1 SMP Wed, 25 May 2022 14:08:34 +0000
x86_64 GNU/Linux

Matt tested on 5.10.115 and the bug is not present, so I guess it's a
regression. Anecdotally, we encountered this behaviour about a year ago
(difficult to say exactly), then it went away but came back about 1 or
2 months ago. All of this is on Arch Linux, which I update about once a
week.

[1] https://wiki.archlinux.org/title/Kernel/Arch_Build_System

2022-09-11 15:44:15

by Peter Sutton

Subject: Re: [Bug] [Deadlock] Kernel thread deadlock in rfcomm socket release when connect interrupted

Just following this up. Is there anything I can do to help fix this?
Running a custom kernel is a real pain. I've been running with the
commit revert and upgrading with Arch Linux kernel releases with no
issue.

Thanks,


Pete.

On Mon, 30 May 2022 at 12:44, Peter Sutton <[email protected]> wrote:
>
> Commit b7ce436a5d798bc59e71797952566608a4b4626b is the probable cause.
> I compiled a custom Arch Linux kernel package [1] and the bug was
> present. Reverting the commit fixed the bug. Below is the reply I was
> writing before Matt found the suspect commit and I tested with the
> custom kernel.
>
> > What hardware is that?
>
> $ dmesg | grep iwlwifi
> Me: Intel(R) Dual Band Wireless AC 8260, REV=0x204
> Matt: Intel(R) Dual Band Wireless AC 8265, REV=0x230
>
> We both get:
>
> $ lsusb | grep Bluetooth
> Me & Matt: Bus 001 Device 006: ID 8087:0a2b Intel Corp. Bluetooth
> wireless interface
>
> > As a lot of patches are also applied to the stable series, do you know,
> > if this is a regression? Does it work with Linux 5.15(.0) or 5.10?
>
> Bug is present on current Arch Linux LTS kernel:
>
> $ uname -a
> Linux taffer 5.15.43-1-lts #1 SMP Wed, 25 May 2022 14:08:34 +0000
> x86_64 GNU/Linux
>
> Matt tested on 5.10.115 and the bug is not present. So I guess it's a
> regression. Anecdotally, we encountered this behaviour 1 yr ago
> (difficult to say exactly), then it went away but came back about 1 or
> 2 months ago. All of this is on Arch Linux, I update about once a
> week.
>
> [1] https://wiki.archlinux.org/title/Kernel/Arch_Build_System

2022-09-12 06:50:56

by Paul Menzel

Subject: Re: [Bug] [Deadlock] Kernel thread deadlock in rfcomm socket release when connect interrupted

[Cc: +regressions]

#regzbot ^introduced: b7ce436a5d798bc59e71797952566608a4b4626b
#regzbot title: [Bug] [Deadlock] Kernel thread deadlock in rfcomm socket
release when connect interrupted

Dear Pete,


On 11.09.22 at 17:42, Peter Sutton wrote:
> Just following this up. Is there anything I can do to help fix this?
> Running a custom kernel is a real pain. I've been running with the
> commit revert and upgrading with Arch Linux kernel releases with no
> issue.

(Please do not top post.)

Have you tested bluetooth-next already? Regardless, the offending commit,
present in Linux since 5.15-rc1, should be reverted.


Kind regards,

Paul


> On Mon, 30 May 2022 at 12:44, Peter Sutton <[email protected]> wrote:
>>
>> Commit b7ce436a5d798bc59e71797952566608a4b4626b is the probable cause.
>> I compiled a custom Arch Linux kernel package [1] and the bug was
>> present. Reverting the commit fixed the bug. Below is the reply I was
>> writing before Matt found the suspect commit and I tested with the
>> custom kernel.
>>
>>> What hardware is that?
>>
>> $ dmesg | grep iwlwifi
>> Me: Intel(R) Dual Band Wireless AC 8260, REV=0x204
>> Matt: Intel(R) Dual Band Wireless AC 8265, REV=0x230
>>
>> We both get:
>>
>> $ lsusb | grep Bluetooth
>> Me & Matt: Bus 001 Device 006: ID 8087:0a2b Intel Corp. Bluetooth wireless interface
>>
>>> As a lot of patches are also applied to the stable series, do you know,
>>> if this is a regression? Does it work with Linux 5.15(.0) or 5.10?
>>
>> Bug is present on current Arch Linux LTS kernel:
>>
>> $ uname -a
>> Linux taffer 5.15.43-1-lts #1 SMP Wed, 25 May 2022 14:08:34 +0000 x86_64 GNU/Linux
>>
>> Matt tested on 5.10.115 and the bug is not present. So I guess it's a
>> regression. Anecdotally, we encountered this behaviour 1 yr ago
>> (difficult to say exactly), then it went away but came back about 1 or
>> 2 months ago. All of this is on Arch Linux, I update about once a
>> week.
>>
>> [1] https://wiki.archlinux.org/title/Kernel/Arch_Build_System

2022-09-13 16:49:47

by Desmond Cheong Zhi Xi

Subject: Re: [Bug] [Deadlock] Kernel thread deadlock in rfcomm socket release when connect interrupted

Hi all,

On 13/9/22 08:20, Peter Sutton wrote:
>> a fix for a deadlock for RFCOMM sk state change was posted last year already:
>>
>> https://lore.kernel.org/all/[email protected]/
>>
>> It seems it never went anywhere, unless I'm missing something. Is that
>> maybe the same problem or somehow related?
>
> I mentioned this on the Arch Linux Matrix channel. The `linux` package
> maintainer said they had encountered the same and added the linked
> patch to the Arch Linux kernel package but removed it because it
> wasn't merged (which explains why my issue went away then came back).
> Anyway, we compiled a 5.19.8 `linux` package with the patch (which
> fixes my issue) and they said they'll add the patch back to the linux
> package.
Thanks for following up on this issue. I'd completely missed it until
recently, and Luiz has suggested an alternative approach to fixing the
problem here:

https://lore.kernel.org/lkml/CABBYNZJxzA0U5bL6d0KtAkZw6yfUSNcpaH3Oh=xZFZdER8FCog@mail.gmail.com/

I'm planning to take a crack at it, but probably don't have the cycles
for another week or so.

Best wishes,
Desmond

Subject: Re: [Bug] [Deadlock] Kernel thread deadlock in rfcomm socket release when connect interrupted



On 13.09.22 16:20, Peter Sutton wrote:
>> a fix for a deadlock for RFCOMM sk state change was posted last year already:
>>
>> https://lore.kernel.org/all/[email protected]/
>>
>> It seems it never went anywhere, unless I'm missing something. Is that
>> maybe the same problem or somehow related?
>
> I mentioned this on the Arch Linux Matrix channel. The `linux` package
> maintainer said they had encountered the same and added the linked
> patch to the Arch Linux kernel package but removed it because it
> wasn't merged (which explains why my issue went away then came back).
> Anyway, we compiled a 5.19.8 `linux` package with the patch (which
> fixes my issue) and they said they'll add the patch back to the linux
> package.

Well, that's fine and hopefully will solve your issue soon, but the arch
maintainers are right: this should be fixed upstream.

Luiz, is there a reason why that patch wasn't merged? What is needed to
get this merged, ideally while adding a "Link:
https://lore.kernel.org/all/CAD+dNTsbuU4w+Y_P7o+VEN7BYCAbZuwZx2+tH+OTzCdcZF82YA@mail.gmail.com/"
tag?

Ciao, Thorsten

Subject: Re: [Bug] [Deadlock] Kernel thread deadlock in rfcomm socket release when connect interrupted

On 12.09.22 07:23, Paul Menzel wrote:
> [Cc: +regressions]
>
> #regzbot ^introduced: b7ce436a5d798bc59e71797952566608a4b4626b

thx for this

> #regzbot title: [Bug] [Deadlock] Kernel thread deadlock in rfcomm socket
> release when connect interrupted

BTW & JFYI: regzbot will automatically use the mail's subject as title
by default, so it seems in this case that "#regzbot title:" is superfluous.

> On 11.09.22 at 17:42, Peter Sutton wrote:
>> Just following this up. Is there anything I can do to help fix this?
>> Running a custom kernel is a real pain. I've been running with the
>> commit revert and upgrading with Arch Linux kernel releases with no
>> issue.
>
> (Please do not top post.)
>
> Have you tested bluetooth-next already? Regardless, the offending commit
> present in Linux since 5.15-rc1 should be reverted.

Well, I'd be a bit more careful here, as reverting commits after so much
time can easily cause other regressions.

>> On Mon, 30 May 2022 at 12:44, Peter Sutton <[email protected]>
>> wrote:
>>>
>>> Commit b7ce436a5d798bc59e71797952566608a4b4626b is the probable cause.
>>> I compiled a custom Arch Linux kernel package [1] and the bug was
>>> present. Reverting the commit fixed the bug. Below is the reply I was
>>> writing before Matt found the suspect commit and I tested with the
>>> custom kernel.

Anyway, the main reason why I write this: I'm currently traveling and
only took a very quick look into this, but a fix for a deadlock for
RFCOMM sk state change was posted last year already:

https://lore.kernel.org/all/[email protected]/

It seems it never went anywhere, unless I'm missing something. Is that
maybe the same problem or somehow related?

>>>> What hardware is that?
>>>
>>> $ dmesg | grep iwlwifi
>>> Me: Intel(R) Dual Band Wireless AC 8260, REV=0x204
>>> Matt: Intel(R) Dual Band Wireless AC 8265, REV=0x230
>>>
>>> We both get:
>>>
>>> $ lsusb | grep Bluetooth
>>> Me & Matt: Bus 001 Device 006: ID 8087:0a2b Intel Corp. Bluetooth
>>> wireless interface
>>>
>>>> As a lot of patches are also applied to the stable series, do you know,
>>>> if this is a regression? Does it work with Linux 5.15(.0) or 5.10?
>>>
>>> Bug is present on current Arch Linux LTS kernel:
>>>
>>> $ uname -a
>>> Linux taffer 5.15.43-1-lts #1 SMP Wed, 25 May 2022 14:08:34 +0000
>>> x86_64 GNU/Linux
>>>
>>> Matt tested on 5.10.115 and the bug is not present. So I guess it's a
>>> regression. Anecdotally, we encountered this behaviour 1 yr ago
>>> (difficult to say exactly), then it went away but came back about 1 or
>>> 2 months ago. All of this is on Arch Linux, I update about once a
>>> week.
>>>
>>> [1] https://wiki.archlinux.org/title/Kernel/Arch_Build_System

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

2022-09-13 17:12:41

by Peter Sutton

Subject: Re: [Bug] [Deadlock] Kernel thread deadlock in rfcomm socket release when connect interrupted

> a fix for a deadlock for RFCOMM sk state change was posted last year already:
>
> https://lore.kernel.org/all/[email protected]/
>
> It seems it never went anywhere, unless I'm missing something. Is that
> maybe the same problem or somehow related?

I mentioned this on the Arch Linux Matrix channel. The `linux` package
maintainer said they had encountered the same and added the linked
patch to the Arch Linux kernel package but removed it because it
wasn't merged (which explains why my issue went away then came back).
Anyway, we compiled a 5.19.8 `linux` package with the patch (which
fixes my issue) and they said they'll add the patch back to the linux
package.

Subject: Re: [Bug] [Deadlock] Kernel thread deadlock in rfcomm socket release when connect interrupted #forregzbot

TWIMC: this mail is primarily sent for documentation purposes and for
regzbot, my Linux kernel regression tracking bot. These mails usually
contain '#forregzbot' in the subject, to make them easy to spot and filter.

On 13.09.22 17:37, Thorsten Leemhuis wrote:
>
> On 13.09.22 16:20, Peter Sutton wrote:
>>> a fix for a deadlock for RFCOMM sk state change was posted last year already:
>>>
>>> https://lore.kernel.org/all/[email protected]/
>>>
>>> It seems it never went anywhere, unless I'm missing something. Is that
>>> maybe the same problem or somehow related?
>>
>> I mentioned this on the Arch Linux Matrix channel. The `linux` package
>> maintainer said they had encountered the same and added the linked
>> patch to the Arch Linux kernel package but removed it because it
>> wasn't merged (which explains why my issue went away then came back).
>> Anyway, we compiled a 5.19.8 `linux` package with the patch (which
>> fixes my issue) and they said they'll add the patch back to the linux
>> package.
>
> Well, that's fine and hopefully will solve your issue soon, but the arch
> maintainers are right: this should be fixed upstream.
>
> Luiz, is there a reason why that patch wasn't merged? What is needed to
> get this merged, ideally while adding a "Link:
> https://lore.kernel.org/all/CAD+dNTsbuU4w+Y_P7o+VEN7BYCAbZuwZx2+tH+OTzCdcZF82YA@mail.gmail.com/"
> tag?

#regzbot fixed-by: 812e92b824c1db16

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.