2024-04-21 13:54:47

by Jeremy Lainé

Subject: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

Hello!

After upgrading my kernel to Debian's latest version (6.1.85), I
started encountering systematic kernel BUGs at boot, making the
bluetooth stack unusable. I initially reported this to Debian's bug
tracker:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1069301

.. but have since confirmed that this is reproducible with vanilla
kernels, including the latest 6.1.y version (6.1.87).

I tried various kernel versions (straight from kernel.org) to pinpoint
when the problem started occurring and the results are:

- linux 6.1.80 => OK
- linux 6.1.82 => OK
- linux 6.1.83 => BUG
- linux 6.1.85 => BUG
- linux 6.1.87 => BUG

I have included a trace below, and full system details are available
in the Debian bug listed above. Can you suggest any other tests I can
perform to help diagnose the origin of the problem?

[ 22.660847] list_del corruption, ffff94d9f6302000->prev is LIST_POISON2 (dead000000000122)
[ 22.660887] ------------[ cut here ]------------
[ 22.660890] kernel BUG at lib/list_debug.c:56!
[ 22.660907] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 22.660917] CPU: 10 PID: 139 Comm: kworker/u25:0 Not tainted 6.1.0-20-amd64 #1 Debian 6.1.85-1
[ 22.660929] Hardware name: Dell Inc. XPS 9315/00KRKP, BIOS 1.19.1 03/14/2024
[ 22.660936] Workqueue: hci0 hci_cmd_sync_work [bluetooth]
[ 22.661128] RIP: 0010:__list_del_entry_valid.cold+0x4b/0x6f
[ 22.661147] Code: fe ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 48 18 7a 9f e8 14 a1 fe ff 0f 0b 48 89 fe 48 89 ca 48 c7 c7 10 18 7a 9f e8 00 a1 fe ff <0f> 0b 48 89 fe 48 c7 c7 d8 17 7a 9f e8 ef a0 fe ff 0f 0b 48 89 fe
[ 22.661156] RSP: 0000:ffffae0e406efde0 EFLAGS: 00010246
[ 22.661164] RAX: 000000000000004e RBX: ffff94d9f6302000 RCX: 0000000000000027
[ 22.661172] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff94dfaf8a03a0
[ 22.661177] RBP: ffff94d859392000 R08: 0000000000000000 R09: ffffae0e406efc78
[ 22.661182] R10: 0000000000000003 R11: ffffffff9fed4448 R12: ffff94d859392000
[ 22.661187] R13: ffff94d859392770 R14: ffff94d858cb9800 R15: dead000000000100
[ 22.661194] FS: 0000000000000000(0000) GS:ffff94dfaf880000(0000) knlGS:0000000000000000
[ 22.661202] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 22.661208] CR2: 00007f423c024038 CR3: 0000000799c04000 CR4: 0000000000750ee0
[ 22.661214] PKRU: 55555554
[ 22.661218] Call Trace:
[ 22.661225] <TASK>
[ 22.661232] ? __die_body.cold+0x1a/0x1f
[ 22.661246] ? die+0x2a/0x50
[ 22.661257] ? do_trap+0xc5/0x110
[ 22.661268] ? __list_del_entry_valid.cold+0x4b/0x6f
[ 22.661279] ? do_error_trap+0x6a/0x90
[ 22.661289] ? __list_del_entry_valid.cold+0x4b/0x6f
[ 22.661298] ? exc_invalid_op+0x4c/0x60
[ 22.661307] ? __list_del_entry_valid.cold+0x4b/0x6f
[ 22.661316] ? asm_exc_invalid_op+0x16/0x20
[ 22.661328] ? __list_del_entry_valid.cold+0x4b/0x6f
[ 22.661337] hci_conn_del+0x136/0x3e0 [bluetooth]
[ 22.661466] hci_abort_conn_sync+0xaa/0x230 [bluetooth]
[ 22.661632] ? abort_conn_sync+0x3d/0x70 [bluetooth]
[ 22.661751] hci_cmd_sync_work+0x9f/0x150 [bluetooth]
[ 22.661915] process_one_work+0x1c4/0x380
[ 22.661929] worker_thread+0x4d/0x380
[ 22.661940] ? rescuer_thread+0x3a0/0x3a0
[ 22.661950] kthread+0xd7/0x100
[ 22.661959] ? kthread_complete_and_exit+0x20/0x20
[ 22.661969] ret_from_fork+0x1f/0x30
[ 22.661984] </TASK>
[ 22.661987] Modules linked in: ctr ccm nft_chain_nat xt_MASQUERADE
nf_nat nf_conntrack_netlink br_netfilter bridge stp llc xfrm_user
xfrm_algo nvme_fabrics rfcomm snd_seq_dummy snd_hrtimer snd_seq
snd_seq_device cmac algif_hash algif_skcipher af_alg snd_ctl_led
snd_soc_sof_sdw snd_soc_intel_hda_dsp_common snd_sof_probes
snd_soc_intel_sof_maxim_common snd_soc_rt715_sdca snd_soc_rt1316_sdw
regmap_sdw_mbq snd_hda_codec_hdmi regmap_sdw overlay ip6t_REJECT
nf_reject_ipv6 xt_hl ip6_tables ip6t_rt ipt_REJECT nf_reject_ipv4
xt_LOG qrtr nf_log_syslog nft_limit bnep ipmi_devintf ipmi_msghandler
xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables libcrc32c nfnetlink
binfmt_misc nls_ascii nls_cp437 vfat fat x86_pkg_temp_thermal
intel_powerclamp coretemp snd_soc_dmic snd_sof_pci_intel_tgl
snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation
soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp
snd_sof snd_sof_utils
[ 22.662122] snd_soc_hdac_hda snd_hda_ext_core
snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core kvm_intel
snd_compress btusb soundwire_bus btrtl kvm btbcm snd_hda_intel btintel
snd_intel_dspcfg btmtk dell_laptop snd_intel_sdw_acpi irqbypass
ledtrig_audio bluetooth snd_hda_codec i915 snd_hda_core rapl mei_hdcp
intel_rapl_msr snd_hwdep processor_thermal_device_pci dell_wmi joydev
hid_sensor_als intel_cstate jitterentropy_rng processor_thermal_device
snd_pcm hid_sensor_trigger processor_thermal_rfim dell_smbios
ucsi_acpi dcdbas hid_sensor_iio_common processor_thermal_mbox
drm_buddy intel_uncore iwlmvm pcspkr drbg iTCO_wdt typec_ucsi
dell_wmi_sysman snd_timer industrialio_triggered_buffer
drm_display_helper processor_thermal_rapl mei_me dell_wmi_descriptor
firmware_attributes_class kfifo_buf wmi_bmof ansi_cprng intel_pmc_bxt
cec snd roles intel_rapl_common ecdh_generic iTCO_vendor_support
int3403_thermal watchdog ecc industrialio mei soundcore typec
int3400_thermal rc_core mac80211
[ 22.662253] int340x_thermal_zone intel_pmc_core button intel_hid
acpi_thermal_rel sparse_keymap ttm acpi_pad acpi_tad drm_kms_helper
libarc4 igen6_edac i2c_algo_bit ac evdev hid_multitouch serio_raw
iwlwifi cfg80211 rfkill msr parport_pc ppdev lp drm parport fuse loop
efi_pstore configfs efivarfs ip_tables x_tables autofs4 ext4 crc16
mbcache jbd2 crc32c_generic usbhid hid_sensor_custom hid_sensor_hub
dm_crypt dm_mod intel_ishtp_hid nvme nvme_core t10_pi
crc64_rocksoft_generic crc64_rocksoft crc_t10dif crct10dif_generic
crc64 ahci libahci crct10dif_pclmul crct10dif_common libata
crc32_pclmul crc32c_intel scsi_mod spi_pxa2xx_platform
ghash_clmulni_intel dw_dmac hid_generic sha512_ssse3 scsi_common
dw_dmac_core xhci_pci sha512_generic sha256_ssse3 xhci_hcd sha1_ssse3
usbcore i2c_hid_acpi intel_lpss_pci aesni_intel video intel_ish_ipc
i2c_i801 i2c_hid intel_lpss psmouse thunderbolt crypto_simd cryptd
i2c_smbus vmd intel_ishtp usb_common idma64 hid battery wmi
[ 22.662422] ---[ end trace 0000000000000000 ]---
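
A note on reading the first line of the trace: `dead000000000122` is the kernel's LIST_POISON2 marker, which `list_del()` writes into an entry's `->prev` after unlinking it, so the BUG suggests that the same entry was deleted from a list a second time. The minimal Python sketch below classifies such values. It assumes the x86-64 default of `0xdead000000000000` for POISON_POINTER_DELTA and the offsets 0x100 and 0x122 from include/linux/poison.h; the function name is hypothetical and only for illustration.

```python
# Assumed x86-64 default (CONFIG_ILLEGAL_POINTER_VALUE); other
# architectures or configs may use a different delta.
POISON_POINTER_DELTA = 0xDEAD000000000000
LIST_POISON1 = POISON_POINTER_DELTA + 0x100  # stored in ->next by list_del()
LIST_POISON2 = POISON_POINTER_DELTA + 0x122  # stored in ->prev by list_del()


def classify_list_pointer(value: int) -> str:
    """Return the poison name for a pointer value taken from an oops."""
    if value == LIST_POISON1:
        return "LIST_POISON1"
    if value == LIST_POISON2:
        return "LIST_POISON2"
    return "not a list poison value"


if __name__ == "__main__":
    # ->prev value reported in the list_del corruption message
    print(classify_list_pointer(int("dead000000000122", 16)))
    # R15 in the register dump
    print(classify_list_pointer(int("dead000000000100", 16)))
```

Under these assumptions, the R15 value in the register dump (`dead000000000100`) is the matching LIST_POISON1 marker.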

Cheers,

Jeremy


2024-04-21 21:01:23

by Paul Menzel

Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

#regzbot introduced: v6.1.82..v6.1.83


Dear Jeremy,


On 21.04.24 at 15:54, Jeremy Lainé wrote:

> After upgrading my kernel to Debian's latest version (6.1.85), I
> started encountering systematic kernel BUGs at boot, making the
> bluetooth stack unusable. I initially reported this to Debian's bug
> tracker:
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1069301
>
> .. but have since confirmed that this is reproducible with vanilla
> kernels, including the latest 6.1.y version (6.1.87).

Thank you for reporting this and taking the time to pinpoint the version.

> I tried various kernel versions (straight from kernel.org) to pinpoint
> when the problem started occurring and the results are:
>
> - linux 6.1.80 => OK
> - linux 6.1.82 => OK
> - linux 6.1.83 => BUG
> - linux 6.1.85 => BUG
> - linux 6.1.87 => BUG
>
> I have included a trace below, and full system details are available
> in the Debian bug listed above. Can you suggest any other tests I can
> perform to help diagnose the origin of the problem?

Would you be so kind as to go the extra mile and bisect the commits between
6.1.82 and 6.1.83 [1]?

> [ 22.660847] list_del corruption, ffff94d9f6302000->prev is LIST_POISON2 (dead000000000122)
> [ 22.660887] ------------[ cut here ]------------
> [ 22.660890] kernel BUG at lib/list_debug.c:56!
> [ 22.660907] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [ 22.660917] CPU: 10 PID: 139 Comm: kworker/u25:0 Not tainted 6.1.0-20-amd64 #1 Debian 6.1.85-1
> [ 22.660929] Hardware name: Dell Inc. XPS 9315/00KRKP, BIOS 1.19.1 03/14/2024
> [ 22.660936] Workqueue: hci0 hci_cmd_sync_work [bluetooth]
> [ 22.661128] RIP: 0010:__list_del_entry_valid.cold+0x4b/0x6f
> [ 22.661147] Code: fe ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 48 18 7a 9f e8 14 a1 fe ff 0f 0b 48 89 fe 48 89 ca 48 c7 c7 10 18 7a 9f e8 00 a1 fe ff <0f> 0b 48 89 fe 48 c7 c7 d8 17 7a 9f e8 ef a0 fe ff 0f 0b 48 89 fe
> [ 22.661156] RSP: 0000:ffffae0e406efde0 EFLAGS: 00010246
> [ 22.661164] RAX: 000000000000004e RBX: ffff94d9f6302000 RCX: 0000000000000027
> [ 22.661172] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff94dfaf8a03a0
> [ 22.661177] RBP: ffff94d859392000 R08: 0000000000000000 R09: ffffae0e406efc78
> [ 22.661182] R10: 0000000000000003 R11: ffffffff9fed4448 R12: ffff94d859392000
> [ 22.661187] R13: ffff94d859392770 R14: ffff94d858cb9800 R15: dead000000000100
> [ 22.661194] FS: 0000000000000000(0000) GS:ffff94dfaf880000(0000) knlGS:0000000000000000
> [ 22.661202] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 22.661208] CR2: 00007f423c024038 CR3: 0000000799c04000 CR4: 0000000000750ee0
> [ 22.661214] PKRU: 55555554
> [ 22.661218] Call Trace:
> [ 22.661225] <TASK>
> [ 22.661232] ? __die_body.cold+0x1a/0x1f
> [ 22.661246] ? die+0x2a/0x50
> [ 22.661257] ? do_trap+0xc5/0x110
> [ 22.661268] ? __list_del_entry_valid.cold+0x4b/0x6f
> [ 22.661279] ? do_error_trap+0x6a/0x90
> [ 22.661289] ? __list_del_entry_valid.cold+0x4b/0x6f
> [ 22.661298] ? exc_invalid_op+0x4c/0x60
> [ 22.661307] ? __list_del_entry_valid.cold+0x4b/0x6f
> [ 22.661316] ? asm_exc_invalid_op+0x16/0x20
> [ 22.661328] ? __list_del_entry_valid.cold+0x4b/0x6f
> [ 22.661337] hci_conn_del+0x136/0x3e0 [bluetooth]
> [ 22.661466] hci_abort_conn_sync+0xaa/0x230 [bluetooth]
> [ 22.661632] ? abort_conn_sync+0x3d/0x70 [bluetooth]
> [ 22.661751] hci_cmd_sync_work+0x9f/0x150 [bluetooth]
> [ 22.661915] process_one_work+0x1c4/0x380
> [ 22.661929] worker_thread+0x4d/0x380
> [ 22.661940] ? rescuer_thread+0x3a0/0x3a0
> [ 22.661950] kthread+0xd7/0x100
> [ 22.661959] ? kthread_complete_and_exit+0x20/0x20
> [ 22.661969] ret_from_fork+0x1f/0x30
> [ 22.661984] </TASK>

You can pipe the output through `scripts/decodecode` and it should show
more information.

[…]


Kind regards,

Paul


[1]:
https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html

2024-04-21 23:17:31

by Jeremy Lainé

Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

Hi Paul,

On Sun, Apr 21, 2024 at 11:01 PM Paul Menzel wrote:
>
> Would you be so kind as to go the extra mile and bisect the commits between
> 6.1.82 and 6.1.83 [1]?
>

Thanks for the link to the instructions, here's the bisect log:

git bisect start
# status: waiting for both good and bad commits
# good: [d7543167affd372819a94879b8b1e8b9b12547d9] Linux 6.1.82
git bisect good d7543167affd372819a94879b8b1e8b9b12547d9
# status: waiting for bad commit, 1 good commit known
# bad: [e5cd595e23c1a075359a337c0e5c3a4f2dc28dd1] Linux 6.1.83
git bisect bad e5cd595e23c1a075359a337c0e5c3a4f2dc28dd1
# bad: [440e278cb53b8dd6627c32e84950350083c39d35] net: kcm: fix incorrect parameter validation in the kcm_getsockopt) function
git bisect bad 440e278cb53b8dd6627c32e84950350083c39d35
# good: [a4116bd6ee5e1c1b65a61ed9221657615a2f45bf] arm64: dts: imx8mm-kontron: Disable pull resistors for SD card signals on BL OSM-S board
git bisect good a4116bd6ee5e1c1b65a61ed9221657615a2f45bf
# good: [e16c33dd9967b7f20987bf653acc4f605836127b] net: mctp: copy skb ext data when fragmenting
git bisect good e16c33dd9967b7f20987bf653acc4f605836127b
# bad: [6083089ab00631617f9eac678df3ab050a9d837a] Bluetooth: hci_conn: Consolidate code for aborting connections
git bisect bad 6083089ab00631617f9eac678df3ab050a9d837a
# good: [934212a623cbab851848b6de377eb476718c3e4c] SUNRPC: fix some memleaks in gssx_dec_option_array
git bisect good 934212a623cbab851848b6de377eb476718c3e4c
# good: [8499af0616cf76e6cbe811107e3f5b33bd472041] igb: Fix missing time sync events
git bisect good 8499af0616cf76e6cbe811107e3f5b33bd472041
# good: [653a17a99d752ffde175d4bc96154f2a3642f400] Bluetooth: Remove superfluous call to hci_conn_check_pending()
git bisect good 653a17a99d752ffde175d4bc96154f2a3642f400
# good: [1023de27cd1d0d692e70fe6d6d5cee9fff9b9c84] Bluetooth: Cancel sync command before suspend and power off
git bisect good 1023de27cd1d0d692e70fe6d6d5cee9fff9b9c84
# good: [ac7a47aaa7944efc94e4fc23cc438b7bd9cc222c] Bluetooth: hci_sync: Only allow hci_cmd_sync_queue if running
git bisect good ac7a47aaa7944efc94e4fc23cc438b7bd9cc222c
# first bad commit: [6083089ab00631617f9eac678df3ab050a9d837a] Bluetooth: hci_conn: Consolidate code for aborting connections
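
For anyone who wants to pull the result out of such a log mechanically, here is a minimal Python sketch. It is not part of the original bisection; `first_bad_commit` is a hypothetical helper, and the only assumption is the `# first bad commit: [<sha>]` comment format that appears in the log above. The embedded sample is a shortened excerpt of that log.

```python
import re
from typing import Optional

# Matches the summary comment that closes a finished bisect log.
FIRST_BAD = re.compile(r"^# first bad commit: \[([0-9a-f]{40})\]", re.MULTILINE)


def first_bad_commit(bisect_log: str) -> Optional[str]:
    """Return the full hash of the first bad commit, or None if absent."""
    match = FIRST_BAD.search(bisect_log)
    return match.group(1) if match else None


# Shortened excerpt of the bisect log from this message.
SAMPLE_LOG = """\
git bisect start
# good: [d7543167affd372819a94879b8b1e8b9b12547d9] Linux 6.1.82
git bisect good d7543167affd372819a94879b8b1e8b9b12547d9
# bad: [e5cd595e23c1a075359a337c0e5c3a4f2dc28dd1] Linux 6.1.83
git bisect bad e5cd595e23c1a075359a337c0e5c3a4f2dc28dd1
# first bad commit: [6083089ab00631617f9eac678df3ab050a9d837a]
Bluetooth: hci_conn: Consolidate code for aborting connections
"""

if __name__ == "__main__":
    print(first_bad_commit(SAMPLE_LOG))
```

Because the pattern only looks at the bracketed hash, it works whether or not a mail client has wrapped the commit subject onto the next line.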


> You can pipe the output through `scripts/decodecode` and it should show
> more information.

This was the output of running the dmesg snippet through `scripts/decodecode`:

All code
========
0: fe (bad)
1: ff 0f decl (%rdi)
3: 0b 48 89 or -0x77(%rax),%ecx
6: f2 48 89 fe repnz mov %rdi,%rsi
a: 48 c7 c7 48 18 7a 9f mov $0xffffffff9f7a1848,%rdi
11: e8 14 a1 fe ff call 0xfffffffffffea12a
16: 0f 0b ud2
18: 48 89 fe mov %rdi,%rsi
1b: 48 89 ca mov %rcx,%rdx
1e: 48 c7 c7 10 18 7a 9f mov $0xffffffff9f7a1810,%rdi
25: e8 00 a1 fe ff call 0xfffffffffffea12a
2a:* 0f 0b ud2 <-- trapping instruction
2c: 48 89 fe mov %rdi,%rsi
2f: 48 c7 c7 d8 17 7a 9f mov $0xffffffff9f7a17d8,%rdi
36: e8 ef a0 fe ff call 0xfffffffffffea12a
3b: 0f 0b ud2
3d: 48 89 fe mov %rdi,%rsi

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 48 89 fe mov %rdi,%rsi
5: 48 c7 c7 d8 17 7a 9f mov $0xffffffff9f7a17d8,%rdi
c: e8 ef a0 fe ff call 0xfffffffffffea100
11: 0f 0b ud2
13: 48 89 fe mov %rdi,%rsi

Best regards,
Jeremy

Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

On 22.04.24 01:17, Jeremy Lainé wrote:
>
> On Sun, Apr 21, 2024 at 11:01 PM Paul Menzel wrote:
>>
>> Would you be so kind as to go the extra mile and bisect the commits between
>> 6.1.82 and 6.1.83 [1]?
>>
>
> Thanks for the link to the instructions, here's the bisect log:

Thx! Did you also test if mainline (e.g. 6.9-rc5) is affected? Without
this we won't know if this is something the stable team or the regular
bluetooth developers have to handle.

Ciao, Thorsten

> git bisect start
> # status: waiting for both good and bad commits
> # good: [d7543167affd372819a94879b8b1e8b9b12547d9] Linux 6.1.82
> git bisect good d7543167affd372819a94879b8b1e8b9b12547d9
> # status: waiting for bad commit, 1 good commit known
> # bad: [e5cd595e23c1a075359a337c0e5c3a4f2dc28dd1] Linux 6.1.83
> git bisect bad e5cd595e23c1a075359a337c0e5c3a4f2dc28dd1
> # bad: [440e278cb53b8dd6627c32e84950350083c39d35] net: kcm: fix incorrect parameter validation in the kcm_getsockopt) function
> git bisect bad 440e278cb53b8dd6627c32e84950350083c39d35
> # good: [a4116bd6ee5e1c1b65a61ed9221657615a2f45bf] arm64: dts: imx8mm-kontron: Disable pull resistors for SD card signals on BL OSM-S board
> git bisect good a4116bd6ee5e1c1b65a61ed9221657615a2f45bf
> # good: [e16c33dd9967b7f20987bf653acc4f605836127b] net: mctp: copy skb ext data when fragmenting
> git bisect good e16c33dd9967b7f20987bf653acc4f605836127b
> # bad: [6083089ab00631617f9eac678df3ab050a9d837a] Bluetooth: hci_conn: Consolidate code for aborting connections
> git bisect bad 6083089ab00631617f9eac678df3ab050a9d837a
> # good: [934212a623cbab851848b6de377eb476718c3e4c] SUNRPC: fix some memleaks in gssx_dec_option_array
> git bisect good 934212a623cbab851848b6de377eb476718c3e4c
> # good: [8499af0616cf76e6cbe811107e3f5b33bd472041] igb: Fix missing time sync events
> git bisect good 8499af0616cf76e6cbe811107e3f5b33bd472041
> # good: [653a17a99d752ffde175d4bc96154f2a3642f400] Bluetooth: Remove superfluous call to hci_conn_check_pending()
> git bisect good 653a17a99d752ffde175d4bc96154f2a3642f400
> # good: [1023de27cd1d0d692e70fe6d6d5cee9fff9b9c84] Bluetooth: Cancel sync command before suspend and power off
> git bisect good 1023de27cd1d0d692e70fe6d6d5cee9fff9b9c84
> # good: [ac7a47aaa7944efc94e4fc23cc438b7bd9cc222c] Bluetooth: hci_sync: Only allow hci_cmd_sync_queue if running
> git bisect good ac7a47aaa7944efc94e4fc23cc438b7bd9cc222c
> # first bad commit: [6083089ab00631617f9eac678df3ab050a9d837a] Bluetooth: hci_conn: Consolidate code for aborting connections
>
>
>> You can pipe the output through `scripts/decodecode` and it should show
>> more information.
>
> This was the output of running the dmesg snippet through `scripts/decodecode`:
>
> All code
> ========
> 0: fe (bad)
> 1: ff 0f decl (%rdi)
> 3: 0b 48 89 or -0x77(%rax),%ecx
> 6: f2 48 89 fe repnz mov %rdi,%rsi
> a: 48 c7 c7 48 18 7a 9f mov $0xffffffff9f7a1848,%rdi
> 11: e8 14 a1 fe ff call 0xfffffffffffea12a
> 16: 0f 0b ud2
> 18: 48 89 fe mov %rdi,%rsi
> 1b: 48 89 ca mov %rcx,%rdx
> 1e: 48 c7 c7 10 18 7a 9f mov $0xffffffff9f7a1810,%rdi
> 25: e8 00 a1 fe ff call 0xfffffffffffea12a
> 2a:* 0f 0b ud2 <-- trapping instruction
> 2c: 48 89 fe mov %rdi,%rsi
> 2f: 48 c7 c7 d8 17 7a 9f mov $0xffffffff9f7a17d8,%rdi
> 36: e8 ef a0 fe ff call 0xfffffffffffea12a
> 3b: 0f 0b ud2
> 3d: 48 89 fe mov %rdi,%rsi
>
> Code starting with the faulting instruction
> ===========================================
> 0: 0f 0b ud2
> 2: 48 89 fe mov %rdi,%rsi
> 5: 48 c7 c7 d8 17 7a 9f mov $0xffffffff9f7a17d8,%rdi
> c: e8 ef a0 fe ff call 0xfffffffffffea100
> 11: 0f 0b ud2
> 13: 48 89 fe mov %rdi,%rsi
>
> Best regards,
> Jeremy
>
>

2024-04-22 08:27:40

by Jeremy Lainé

Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

Hi Thorsten,

On Mon, Apr 22, 2024 at 7:41 AM Linux regression tracking (Thorsten
Leemhuis) wrote:
>
> Thx! Did you also test if mainline (e.g. 6.9-rc5) is affected? Without
> this we won't know if this is something the stable team or the regular
> bluetooth developers have to handle.

I'm now running 6.9-rc5 and have not been able to reproduce the issue,
so it does sound like it's an issue for the stable team.

Cheers,
Jeremy

Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

Hi stable team (and Bluetooth maintainers), I noticed a regression
report about a BT problem in 6.1.y:

On 21.04.24 15:54, Jeremy Lainé wrote:
>
> After upgrading my kernel to Debian's latest version (6.1.85), I
> started encountering systematic kernel BUGs at boot, making the
> bluetooth stack unusable. I initially reported this to Debian's bug
> tracker:
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1069301
>
> .. but have since confirmed that this is reproducible with vanilla
> kernels, including the latest 6.1.y version (6.1.87).
>
> I tried various kernel versions (straight from kernel.org) to pinpoint
> when the problem started occurring and the results are:

Jeremy later wrote:
> # first bad commit: [6083089ab00631617f9eac678df3ab050a9d837a]
> Bluetooth: hci_conn: Consolidate code for aborting connections
https://lore.kernel.org/all/[email protected]/

That's a13f316e90fdb1 ("Bluetooth: hci_conn: Consolidate code for
aborting connections") [v6.6-rc1, v6.1.83 (6083089ab00631)]

FWIW, there is a fix for the mainline commit under review:
https://lore.kernel.org/all/[email protected]/

But it is likely unrelated, as Jeremy later also wrote:
> I'm now running 6.9-rc5 and have not been able to reproduce the issue,
https://lore.kernel.org/all/CADRbXaA2yFjMo=_8_ZTubPbrrmWH9yx+aG5pUadnk395koonXg@mail.gmail.com/

Makes me wonder if 6.1.y is missing some other change a13f316e90fdb1
depends on.

Ciao, Thorsten

> I have included a trace below, and full system details are available
> in the Debian bug listed above. Can you suggest any other tests I can
> perform to help diagnose the origin of the problem?
>
> [ 22.660847] list_del corruption, ffff94d9f6302000->prev is LIST_POISON2 (dead000000000122)
> [ 22.660887] ------------[ cut here ]------------
> [ 22.660890] kernel BUG at lib/list_debug.c:56!
> [ 22.660907] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [ 22.660917] CPU: 10 PID: 139 Comm: kworker/u25:0 Not tainted 6.1.0-20-amd64 #1 Debian 6.1.85-1
> [ 22.660929] Hardware name: Dell Inc. XPS 9315/00KRKP, BIOS 1.19.1 03/14/2024
> [ 22.660936] Workqueue: hci0 hci_cmd_sync_work [bluetooth]
> [ 22.661128] RIP: 0010:__list_del_entry_valid.cold+0x4b/0x6f
> [ 22.661147] Code: fe ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 48 18 7a 9f e8 14 a1 fe ff 0f 0b 48 89 fe 48 89 ca 48 c7 c7 10 18 7a 9f e8 00 a1 fe ff <0f> 0b 48 89 fe 48 c7 c7 d8 17 7a 9f e8 ef a0 fe ff 0f 0b 48 89 fe
> [ 22.661156] RSP: 0000:ffffae0e406efde0 EFLAGS: 00010246
> [ 22.661164] RAX: 000000000000004e RBX: ffff94d9f6302000 RCX: 0000000000000027
> [ 22.661172] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff94dfaf8a03a0
> [ 22.661177] RBP: ffff94d859392000 R08: 0000000000000000 R09: ffffae0e406efc78
> [ 22.661182] R10: 0000000000000003 R11: ffffffff9fed4448 R12: ffff94d859392000
> [ 22.661187] R13: ffff94d859392770 R14: ffff94d858cb9800 R15: dead000000000100
> [ 22.661194] FS: 0000000000000000(0000) GS:ffff94dfaf880000(0000) knlGS:0000000000000000
> [ 22.661202] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 22.661208] CR2: 00007f423c024038 CR3: 0000000799c04000 CR4: 0000000000750ee0
> [ 22.661214] PKRU: 55555554
> [ 22.661218] Call Trace:
> [ 22.661225] <TASK>
> [ 22.661232] ? __die_body.cold+0x1a/0x1f
> [ 22.661246] ? die+0x2a/0x50
> [ 22.661257] ? do_trap+0xc5/0x110
> [ 22.661268] ? __list_del_entry_valid.cold+0x4b/0x6f
> [ 22.661279] ? do_error_trap+0x6a/0x90
> [ 22.661289] ? __list_del_entry_valid.cold+0x4b/0x6f
> [ 22.661298] ? exc_invalid_op+0x4c/0x60
> [ 22.661307] ? __list_del_entry_valid.cold+0x4b/0x6f
> [ 22.661316] ? asm_exc_invalid_op+0x16/0x20
> [ 22.661328] ? __list_del_entry_valid.cold+0x4b/0x6f
> [ 22.661337] hci_conn_del+0x136/0x3e0 [bluetooth]
> [ 22.661466] hci_abort_conn_sync+0xaa/0x230 [bluetooth]
> [ 22.661632] ? abort_conn_sync+0x3d/0x70 [bluetooth]
> [ 22.661751] hci_cmd_sync_work+0x9f/0x150 [bluetooth]
> [ 22.661915] process_one_work+0x1c4/0x380
> [ 22.661929] worker_thread+0x4d/0x380
> [ 22.661940] ? rescuer_thread+0x3a0/0x3a0
> [ 22.661950] kthread+0xd7/0x100
> [ 22.661959] ? kthread_complete_and_exit+0x20/0x20
> [ 22.661969] ret_from_fork+0x1f/0x30
> [ 22.661984] </TASK>
> [ 22.661987] Modules linked in: ctr ccm nft_chain_nat xt_MASQUERADE
> nf_nat nf_conntrack_netlink br_netfilter bridge stp llc xfrm_user
> xfrm_algo nvme_fabrics rfcomm snd_seq_dummy snd_hrtimer snd_seq
> snd_seq_device cmac algif_hash algif_skcipher af_alg snd_ctl_led
> snd_soc_sof_sdw snd_soc_intel_hda_dsp_common snd_sof_probes
> snd_soc_intel_sof_maxim_common snd_soc_rt715_sdca snd_soc_rt1316_sdw
> regmap_sdw_mbq snd_hda_codec_hdmi regmap_sdw overlay ip6t_REJECT
> nf_reject_ipv6 xt_hl ip6_tables ip6t_rt ipt_REJECT nf_reject_ipv4
> xt_LOG qrtr nf_log_syslog nft_limit bnep ipmi_devintf ipmi_msghandler
> xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables libcrc32c nfnetlink
> binfmt_misc nls_ascii nls_cp437 vfat fat x86_pkg_temp_thermal
> intel_powerclamp coretemp snd_soc_dmic snd_sof_pci_intel_tgl
> snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation
> soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp
> snd_sof snd_sof_utils
> [ 22.662122] snd_soc_hdac_hda snd_hda_ext_core
> snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core kvm_intel
> snd_compress btusb soundwire_bus btrtl kvm btbcm snd_hda_intel btintel
> snd_intel_dspcfg btmtk dell_laptop snd_intel_sdw_acpi irqbypass
> ledtrig_audio bluetooth snd_hda_codec i915 snd_hda_core rapl mei_hdcp
> intel_rapl_msr snd_hwdep processor_thermal_device_pci dell_wmi joydev
> hid_sensor_als intel_cstate jitterentropy_rng processor_thermal_device
> snd_pcm hid_sensor_trigger processor_thermal_rfim dell_smbios
> ucsi_acpi dcdbas hid_sensor_iio_common processor_thermal_mbox
> drm_buddy intel_uncore iwlmvm pcspkr drbg iTCO_wdt typec_ucsi
> dell_wmi_sysman snd_timer industrialio_triggered_buffer
> drm_display_helper processor_thermal_rapl mei_me dell_wmi_descriptor
> firmware_attributes_class kfifo_buf wmi_bmof ansi_cprng intel_pmc_bxt
> cec snd roles intel_rapl_common ecdh_generic iTCO_vendor_support
> int3403_thermal watchdog ecc industrialio mei soundcore typec
> int3400_thermal rc_core mac80211
> [ 22.662253] int340x_thermal_zone intel_pmc_core button intel_hid
> acpi_thermal_rel sparse_keymap ttm acpi_pad acpi_tad drm_kms_helper
> libarc4 igen6_edac i2c_algo_bit ac evdev hid_multitouch serio_raw
> iwlwifi cfg80211 rfkill msr parport_pc ppdev lp drm parport fuse loop
> efi_pstore configfs efivarfs ip_tables x_tables autofs4 ext4 crc16
> mbcache jbd2 crc32c_generic usbhid hid_sensor_custom hid_sensor_hub
> dm_crypt dm_mod intel_ishtp_hid nvme nvme_core t10_pi
> crc64_rocksoft_generic crc64_rocksoft crc_t10dif crct10dif_generic
> crc64 ahci libahci crct10dif_pclmul crct10dif_common libata
> crc32_pclmul crc32c_intel scsi_mod spi_pxa2xx_platform
> ghash_clmulni_intel dw_dmac hid_generic sha512_ssse3 scsi_common
> dw_dmac_core xhci_pci sha512_generic sha256_ssse3 xhci_hcd sha1_ssse3
> usbcore i2c_hid_acpi intel_lpss_pci aesni_intel video intel_ish_ipc
> i2c_i801 i2c_hid intel_lpss psmouse thunderbolt crypto_simd cryptd
> i2c_smbus vmd intel_ishtp usb_common idma64 hid battery wmi
> [ 22.662422] ---[ end trace 0000000000000000 ]---
>
> Cheers,
>
> Jeremy


#regzbot ^introduced 6083089ab0063
#regzbot title Bluetooth kernel BUG with Intel AX211
#regzbot duplicate:
https://lore.kernel.org/all/[email protected]/
#regzbot ignore-activity

Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

On 22.04.24 11:56, Linux regression tracking (Thorsten Leemhuis) wrote:
> Hi stable team (and Bluetooth maintainers), I noticed a regression
> report about a BT problem in 6.1.y:

Hmmm. Nothing has happened since then (or I missed it). So it seems the
Bluetooth maintainers don't care about stable-specific problems (they
are free to do so!) or are busy with other work (happens!).

So we either need to find the cause (likely a missing backport) some
other way or maybe revert the culprit in the 6.1.y series. Jeremy,
did you check whether the latter is an option? If not, could you do that
please? And could you also check whether cherry-picking c7eaf80bfb0c8c
("Bluetooth: Fix hci_link_tx_to RCU lock usage") [v6.6-rc5] into 6.1.y
helps? It's just a wild guess, but it contains a Fixes: tag for the
commit in question.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

> On 21.04.24 15:54, Jeremy Lainé wrote:
>>
>> After upgrading my kernel to Debian's latest version (6.1.85), I
>> started encountering systematic kernel BUGs at boot, making the
>> bluetooth stack unusable. I initially reported this to Debian's bug
>> tracker:
>>
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1069301
>>
>> .. but have since confirmed that this is reproducible with vanilla
>> kernels, including the latest 6.1.y version (6.1.87).
>>
>> I tried various kernel versions (straight from kernel.org) to pinpoint
>> when the problem started occurring and the results are:
>
> Jeremy later wrote:
>> # first bad commit: [6083089ab00631617f9eac678df3ab050a9d837a]
>> Bluetooth: hci_conn: Consolidate code for aborting connections
> https://lore.kernel.org/all/[email protected]/
>
> That's a13f316e90fdb1 ("Bluetooth: hci_conn: Consolidate code for
> aborting connections") [v6.6-rc1, v6.1.83 (6083089ab00631)]
>
> FWIW, there is a fix for the mainline commit under review:
> https://lore.kernel.org/all/[email protected]/
>
> But it is likely unrelated, as Jeremy later also wrote:
>> I'm now running 6.9-rc5 and have not been able to reproduce the issue,
> https://lore.kernel.org/all/CADRbXaA2yFjMo=_8_ZTubPbrrmWH9yx+aG5pUadnk395koonXg@mail.gmail.com/
>
> Makes me wonder if 6.1.y is missing some other change a13f316e90fdb1
> depends on.
>
> Ciao, Thorsten
>
>> I have included a trace below, and full system details are available
>> in the Debian bug listed above. Can you suggest any other tests I can
>> perform to help diagnose the origin of the problem?
>>
>> [ 22.660847] list_del corruption, ffff94d9f6302000->prev is LIST_POISON2 (dead000000000122)
>> [ 22.660887] ------------[ cut here ]------------
>> [ 22.660890] kernel BUG at lib/list_debug.c:56!
>> [ 22.660907] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>> [ 22.660917] CPU: 10 PID: 139 Comm: kworker/u25:0 Not tainted 6.1.0-20-amd64 #1 Debian 6.1.85-1
>> [ 22.660929] Hardware name: Dell Inc. XPS 9315/00KRKP, BIOS 1.19.1 03/14/2024
>> [ 22.660936] Workqueue: hci0 hci_cmd_sync_work [bluetooth]
>> [ 22.661128] RIP: 0010:__list_del_entry_valid.cold+0x4b/0x6f
>> [ 22.661147] Code: fe ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 48 18 7a 9f e8 14 a1 fe ff 0f 0b 48 89 fe 48 89 ca 48 c7 c7 10 18 7a 9f e8 00 a1 fe ff <0f> 0b 48 89 fe 48 c7 c7 d8 17 7a 9f e8 ef a0 fe ff 0f 0b 48 89 fe
>> [ 22.661156] RSP: 0000:ffffae0e406efde0 EFLAGS: 00010246
>> [ 22.661164] RAX: 000000000000004e RBX: ffff94d9f6302000 RCX: 0000000000000027
>> [ 22.661172] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff94dfaf8a03a0
>> [ 22.661177] RBP: ffff94d859392000 R08: 0000000000000000 R09: ffffae0e406efc78
>> [ 22.661182] R10: 0000000000000003 R11: ffffffff9fed4448 R12: ffff94d859392000
>> [ 22.661187] R13: ffff94d859392770 R14: ffff94d858cb9800 R15: dead000000000100
>> [ 22.661194] FS: 0000000000000000(0000) GS:ffff94dfaf880000(0000) knlGS:0000000000000000
>> [ 22.661202] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 22.661208] CR2: 00007f423c024038 CR3: 0000000799c04000 CR4: 0000000000750ee0
>> [ 22.661214] PKRU: 55555554
>> [ 22.661218] Call Trace:
>> [ 22.661225] <TASK>
>> [ 22.661232] ? __die_body.cold+0x1a/0x1f
>> [ 22.661246] ? die+0x2a/0x50
>> [ 22.661257] ? do_trap+0xc5/0x110
>> [ 22.661268] ? __list_del_entry_valid.cold+0x4b/0x6f
>> [ 22.661279] ? do_error_trap+0x6a/0x90
>> [ 22.661289] ? __list_del_entry_valid.cold+0x4b/0x6f
>> [ 22.661298] ? exc_invalid_op+0x4c/0x60
>> [ 22.661307] ? __list_del_entry_valid.cold+0x4b/0x6f
>> [ 22.661316] ? asm_exc_invalid_op+0x16/0x20
>> [ 22.661328] ? __list_del_entry_valid.cold+0x4b/0x6f
>> [ 22.661337] hci_conn_del+0x136/0x3e0 [bluetooth]
>> [ 22.661466] hci_abort_conn_sync+0xaa/0x230 [bluetooth]
>> [ 22.661632] ? abort_conn_sync+0x3d/0x70 [bluetooth]
>> [ 22.661751] hci_cmd_sync_work+0x9f/0x150 [bluetooth]
>> [ 22.661915] process_one_work+0x1c4/0x380
>> [ 22.661929] worker_thread+0x4d/0x380
>> [ 22.661940] ? rescuer_thread+0x3a0/0x3a0
>> [ 22.661950] kthread+0xd7/0x100
>> [ 22.661959] ? kthread_complete_and_exit+0x20/0x20
>> [ 22.661969] ret_from_fork+0x1f/0x30
>> [ 22.661984] </TASK>
>> [ 22.661987] Modules linked in: ctr ccm nft_chain_nat xt_MASQUERADE
>> nf_nat nf_conntrack_netlink br_netfilter bridge stp llc xfrm_user
>> xfrm_algo nvme_fabrics rfcomm snd_seq_dummy snd_hrtimer snd_seq
>> snd_seq_device cmac algif_hash algif_skcipher af_alg snd_ctl_led
>> snd_soc_sof_sdw snd_soc_intel_hda_dsp_common snd_sof_probes
>> snd_soc_intel_sof_maxim_common snd_soc_rt715_sdca snd_soc_rt1316_sdw
>> regmap_sdw_mbq snd_hda_codec_hdmi regmap_sdw overlay ip6t_REJECT
>> nf_reject_ipv6 xt_hl ip6_tables ip6t_rt ipt_REJECT nf_reject_ipv4
>> xt_LOG qrtr nf_log_syslog nft_limit bnep ipmi_devintf ipmi_msghandler
>> xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack
>> nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables libcrc32c nfnetlink
>> binfmt_misc nls_ascii nls_cp437 vfat fat x86_pkg_temp_thermal
>> intel_powerclamp coretemp snd_soc_dmic snd_sof_pci_intel_tgl
>> snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation
>> soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp
>> snd_sof snd_sof_utils
>> [ 22.662122] snd_soc_hdac_hda snd_hda_ext_core
>> snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core kvm_intel
>> snd_compress btusb soundwire_bus btrtl kvm btbcm snd_hda_intel btintel
>> snd_intel_dspcfg btmtk dell_laptop snd_intel_sdw_acpi irqbypass
>> ledtrig_audio bluetooth snd_hda_codec i915 snd_hda_core rapl mei_hdcp
>> intel_rapl_msr snd_hwdep processor_thermal_device_pci dell_wmi joydev
>> hid_sensor_als intel_cstate jitterentropy_rng processor_thermal_device
>> snd_pcm hid_sensor_trigger processor_thermal_rfim dell_smbios
>> ucsi_acpi dcdbas hid_sensor_iio_common processor_thermal_mbox
>> drm_buddy intel_uncore iwlmvm pcspkr drbg iTCO_wdt typec_ucsi
>> dell_wmi_sysman snd_timer industrialio_triggered_buffer
>> drm_display_helper processor_thermal_rapl mei_me dell_wmi_descriptor
>> firmware_attributes_class kfifo_buf wmi_bmof ansi_cprng intel_pmc_bxt
>> cec snd roles intel_rapl_common ecdh_generic iTCO_vendor_support
>> int3403_thermal watchdog ecc industrialio mei soundcore typec
>> int3400_thermal rc_core mac80211
>> [ 22.662253] int340x_thermal_zone intel_pmc_core button intel_hid
>> acpi_thermal_rel sparse_keymap ttm acpi_pad acpi_tad drm_kms_helper
>> libarc4 igen6_edac i2c_algo_bit ac evdev hid_multitouch serio_raw
>> iwlwifi cfg80211 rfkill msr parport_pc ppdev lp drm parport fuse loop
>> efi_pstore configfs efivarfs ip_tables x_tables autofs4 ext4 crc16
>> mbcache jbd2 crc32c_generic usbhid hid_sensor_custom hid_sensor_hub
>> dm_crypt dm_mod intel_ishtp_hid nvme nvme_core t10_pi
>> crc64_rocksoft_generic crc64_rocksoft crc_t10dif crct10dif_generic
>> crc64 ahci libahci crct10dif_pclmul crct10dif_common libata
>> crc32_pclmul crc32c_intel scsi_mod spi_pxa2xx_platform
>> ghash_clmulni_intel dw_dmac hid_generic sha512_ssse3 scsi_common
>> dw_dmac_core xhci_pci sha512_generic sha256_ssse3 xhci_hcd sha1_ssse3
>> usbcore i2c_hid_acpi intel_lpss_pci aesni_intel video intel_ish_ipc
>> i2c_i801 i2c_hid intel_lpss psmouse thunderbolt crypto_simd cryptd
>> i2c_smbus vmd intel_ishtp usb_common idma64 hid battery wmi
>> [ 22.662422] ---[ end trace 0000000000000000 ]---
>>
>> Cheers,
>>
>> Jeremy
>
>
> #regzbot ^introduced 6083089ab0063
> #regzbot title Bluetooth kernel BUG with Intel AX211
> #regzbot duplicate:
> https://lore.kernel.org/all/[email protected]/
> #regzbot ignore-activity
>
>

2024-04-29 18:28:36

by Jeremy Lainé

[permalink] [raw]
Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

Hi Thorsten,

On Mon, Apr 29, 2024 at 12:24 PM Linux regression tracking (Thorsten
Leemhuis) <[email protected]> wrote:
>
> So we either need to find the cause (likely a missing backport) through
> some other way or maybe revert the culprit in the 6.1.y series. Jeremy,
> did you try if the latter is an option? If not: could you do that
> please? And could you also check whether cherry-picking c7eaf80bfb0c8c
> ("Bluetooth: Fix hci_link_tx_to RCU lock usage") [v6.6-rc5] into 6.1.y
> helps? It's just a wild guess, but it contains a Fixes: tag for the
> commit in question.

I gave it a try, and sadly I'm still hitting the exact same bug when I
cherry-pick the patch you mentioned on top of 6.1.y (at tag v6.1.87).

Thanks for trying. Is there any other patch that looks like a good candidate?

Jeremy

Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

On 29.04.24 20:28, Jeremy Lainé wrote:
>
> On Mon, Apr 29, 2024 at 12:24 PM Linux regression tracking (Thorsten
> Leemhuis) <[email protected]> wrote:
>>
>> So we either need to find the cause (likely a missing backport) through
>> some other way or maybe revert the culprit in the 6.1.y series. Jeremy,
>> did you try if the latter is an option? If not: could you do that
>> please? And could you also check whether cherry-picking c7eaf80bfb0c8c
>> ("Bluetooth: Fix hci_link_tx_to RCU lock usage") [v6.6-rc5] into 6.1.y
>> helps? It's just a wild guess, but it contains a Fixes: tag for the
>> commit in question.
>
> I gave it a try, and sadly I'm still hitting the exact same bug when I
> cherry-pick the patch you mentioned on top of 6.1.y (at tag v6.1.87).
>
> Thanks for trying. Is there any other patch that looks like a good candidate?

Well, did you try what I suggested earlier (see above) and check whether
a revert of 6083089ab00631617f9eac678df3ab050a9d837a on top of the latest
6.1.y helps?

Ciao, Thorsten