2024-04-21 13:54:47

by Jeremy Lainé

Subject: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

Hello!

After upgrading my kernel to Debian's latest version (6.1.85), I
started encountering systematic kernel BUGs at boot, making the
bluetooth stack unusable. I initially reported this to Debian's bug
tracker:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1069301

.. but have since confirmed that this is reproducible with vanilla
kernels, including the latest 6.1.y version (6.1.87).

I tried various kernel versions (straight from kernel.org) to pinpoint
when the problem started occurring and the results are:

- linux 6.1.80 => OK
- linux 6.1.82 => OK
- linux 6.1.83 => BUG
- linux 6.1.85 => BUG
- linux 6.1.87 => BUG
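
The results above already narrow the regression to a single stable release. As a minimal sketch (the `regression_window` helper and the report format it parses are assumptions made for illustration, not part of any kernel tooling), the range can be derived mechanically from such a list:

```python
import re

# Test results exactly as listed in the report.
REPORT = """\
- linux 6.1.80 => OK
- linux 6.1.82 => OK
- linux 6.1.83 => BUG
- linux 6.1.85 => BUG
- linux 6.1.87 => BUG
"""


def version_key(version):
    """Turn '6.1.83' into (6, 1, 83) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def regression_window(report):
    """Return 'v<last good>..v<first bad>' for a list of OK/BUG results."""
    good, bad = [], []
    for line in report.splitlines():
        match = re.fullmatch(r"- linux (\d+(?:\.\d+)*) => (OK|BUG)", line.strip())
        if match:
            (good if match.group(2) == "OK" else bad).append(match.group(1))
    if not good or not bad:
        raise ValueError("need at least one good and one bad result")
    last_good = max(good, key=version_key)
    first_bad = min(bad, key=version_key)
    # Bisection only makes sense if every good version precedes every bad one.
    if version_key(last_good) >= version_key(first_bad):
        raise ValueError("results are not monotonic")
    return f"v{last_good}..v{first_bad}"


print(regression_window(REPORT))  # v6.1.82..v6.1.83
```

The resulting range is the one a bisection between the two release tags would cover.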

I have included a trace below, and full system details are available
in the Debian bug listed above. Can you suggest any other tests I can
perform to help diagnose the origin of the problem?

[ 22.660847] list_del corruption, ffff94d9f6302000->prev is LIST_POISON2 (dead000000000122)
[ 22.660887] ------------[ cut here ]------------
[ 22.660890] kernel BUG at lib/list_debug.c:56!
[ 22.660907] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 22.660917] CPU: 10 PID: 139 Comm: kworker/u25:0 Not tainted 6.1.0-20-amd64 #1 Debian 6.1.85-1
[ 22.660929] Hardware name: Dell Inc. XPS 9315/00KRKP, BIOS 1.19.1 03/14/2024
[ 22.660936] Workqueue: hci0 hci_cmd_sync_work [bluetooth]
[ 22.661128] RIP: 0010:__list_del_entry_valid.cold+0x4b/0x6f
[ 22.661147] Code: fe ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 48 18 7a 9f e8 14 a1 fe ff 0f 0b 48 89 fe 48 89 ca 48 c7 c7 10 18 7a 9f e8 00 a1 fe ff <0f> 0b 48 89 fe 48 c7 c7 d8 17 7a 9f e8 ef a0 fe ff 0f 0b 48 89 fe
[ 22.661156] RSP: 0000:ffffae0e406efde0 EFLAGS: 00010246
[ 22.661164] RAX: 000000000000004e RBX: ffff94d9f6302000 RCX: 0000000000000027
[ 22.661172] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff94dfaf8a03a0
[ 22.661177] RBP: ffff94d859392000 R08: 0000000000000000 R09: ffffae0e406efc78
[ 22.661182] R10: 0000000000000003 R11: ffffffff9fed4448 R12: ffff94d859392000
[ 22.661187] R13: ffff94d859392770 R14: ffff94d858cb9800 R15: dead000000000100
[ 22.661194] FS: 0000000000000000(0000) GS:ffff94dfaf880000(0000) knlGS:0000000000000000
[ 22.661202] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 22.661208] CR2: 00007f423c024038 CR3: 0000000799c04000 CR4: 0000000000750ee0
[ 22.661214] PKRU: 55555554
[ 22.661218] Call Trace:
[ 22.661225] <TASK>
[ 22.661232] ? __die_body.cold+0x1a/0x1f
[ 22.661246] ? die+0x2a/0x50
[ 22.661257] ? do_trap+0xc5/0x110
[ 22.661268] ? __list_del_entry_valid.cold+0x4b/0x6f
[ 22.661279] ? do_error_trap+0x6a/0x90
[ 22.661289] ? __list_del_entry_valid.cold+0x4b/0x6f
[ 22.661298] ? exc_invalid_op+0x4c/0x60
[ 22.661307] ? __list_del_entry_valid.cold+0x4b/0x6f
[ 22.661316] ? asm_exc_invalid_op+0x16/0x20
[ 22.661328] ? __list_del_entry_valid.cold+0x4b/0x6f
[ 22.661337] hci_conn_del+0x136/0x3e0 [bluetooth]
[ 22.661466] hci_abort_conn_sync+0xaa/0x230 [bluetooth]
[ 22.661632] ? abort_conn_sync+0x3d/0x70 [bluetooth]
[ 22.661751] hci_cmd_sync_work+0x9f/0x150 [bluetooth]
[ 22.661915] process_one_work+0x1c4/0x380
[ 22.661929] worker_thread+0x4d/0x380
[ 22.661940] ? rescuer_thread+0x3a0/0x3a0
[ 22.661950] kthread+0xd7/0x100
[ 22.661959] ? kthread_complete_and_exit+0x20/0x20
[ 22.661969] ret_from_fork+0x1f/0x30
[ 22.661984] </TASK>
[ 22.661987] Modules linked in: ctr ccm nft_chain_nat xt_MASQUERADE
nf_nat nf_conntrack_netlink br_netfilter bridge stp llc xfrm_user
xfrm_algo nvme_fabrics rfcomm snd_seq_dummy snd_hrtimer snd_seq
snd_seq_device cmac algif_hash algif_skcipher af_alg snd_ctl_led
snd_soc_sof_sdw snd_soc_intel_hda_dsp_common snd_sof_probes
snd_soc_intel_sof_maxim_common snd_soc_rt715_sdca snd_soc_rt1316_sdw
regmap_sdw_mbq snd_hda_codec_hdmi regmap_sdw overlay ip6t_REJECT
nf_reject_ipv6 xt_hl ip6_tables ip6t_rt ipt_REJECT nf_reject_ipv4
xt_LOG qrtr nf_log_syslog nft_limit bnep ipmi_devintf ipmi_msghandler
xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables libcrc32c nfnetlink
binfmt_misc nls_ascii nls_cp437 vfat fat x86_pkg_temp_thermal
intel_powerclamp coretemp snd_soc_dmic snd_sof_pci_intel_tgl
snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation
soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp
snd_sof snd_sof_utils
[ 22.662122] snd_soc_hdac_hda snd_hda_ext_core
snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core kvm_intel
snd_compress btusb soundwire_bus btrtl kvm btbcm snd_hda_intel btintel
snd_intel_dspcfg btmtk dell_laptop snd_intel_sdw_acpi irqbypass
ledtrig_audio bluetooth snd_hda_codec i915 snd_hda_core rapl mei_hdcp
intel_rapl_msr snd_hwdep processor_thermal_device_pci dell_wmi joydev
hid_sensor_als intel_cstate jitterentropy_rng processor_thermal_device
snd_pcm hid_sensor_trigger processor_thermal_rfim dell_smbios
ucsi_acpi dcdbas hid_sensor_iio_common processor_thermal_mbox
drm_buddy intel_uncore iwlmvm pcspkr drbg iTCO_wdt typec_ucsi
dell_wmi_sysman snd_timer industrialio_triggered_buffer
drm_display_helper processor_thermal_rapl mei_me dell_wmi_descriptor
firmware_attributes_class kfifo_buf wmi_bmof ansi_cprng intel_pmc_bxt
cec snd roles intel_rapl_common ecdh_generic iTCO_vendor_support
int3403_thermal watchdog ecc industrialio mei soundcore typec
int3400_thermal rc_core mac80211
[ 22.662253] int340x_thermal_zone intel_pmc_core button intel_hid
acpi_thermal_rel sparse_keymap ttm acpi_pad acpi_tad drm_kms_helper
libarc4 igen6_edac i2c_algo_bit ac evdev hid_multitouch serio_raw
iwlwifi cfg80211 rfkill msr parport_pc ppdev lp drm parport fuse loop
efi_pstore configfs efivarfs ip_tables x_tables autofs4 ext4 crc16
mbcache jbd2 crc32c_generic usbhid hid_sensor_custom hid_sensor_hub
dm_crypt dm_mod intel_ishtp_hid nvme nvme_core t10_pi
crc64_rocksoft_generic crc64_rocksoft crc_t10dif crct10dif_generic
crc64 ahci libahci crct10dif_pclmul crct10dif_common libata
crc32_pclmul crc32c_intel scsi_mod spi_pxa2xx_platform
ghash_clmulni_intel dw_dmac hid_generic sha512_ssse3 scsi_common
dw_dmac_core xhci_pci sha512_generic sha256_ssse3 xhci_hcd sha1_ssse3
usbcore i2c_hid_acpi intel_lpss_pci aesni_intel video intel_ish_ipc
i2c_i801 i2c_hid intel_lpss psmouse thunderbolt crypto_simd cryptd
i2c_smbus vmd intel_ishtp usb_common idma64 hid battery wmi
[ 22.662422] ---[ end trace 0000000000000000 ]---
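
On the first line of the trace: when the kernel removes an entry from a linked list it overwrites the entry's pointers with poison values, and LIST_POISON2 (dead000000000122 on x86-64) is the value written to ->prev. Finding that value on a later deletion suggests the entry had already been removed once. The following is a simplified Python model of that check, not kernel code; the `Node` class and the function names are made up for illustration, and only `prev` is poisoned, which is what the RCU variant of list deletion does.

```python
LIST_POISON2 = 0xDEAD000000000122  # value stored in ->prev after deletion


class Node:
    """A circular doubly linked list node, like struct list_head."""

    def __init__(self, name):
        self.name = name
        self.next = self
        self.prev = self


def list_add(new, head):
    """Insert `new` right after `head`."""
    new.next = head.next
    new.prev = head
    head.next.prev = new
    head.next = new


def list_del_checked(entry):
    """Unlink `entry`, refusing to do so if it was already unlinked."""
    if entry.prev == LIST_POISON2:
        raise RuntimeError(
            f"list_del corruption, {entry.name}->prev is "
            f"LIST_POISON2 ({LIST_POISON2:016x})"
        )
    entry.prev.next = entry.next
    entry.next.prev = entry.prev
    entry.prev = LIST_POISON2  # mark the entry as deleted


head = Node("head")
conn = Node("conn")
list_add(conn, head)
list_del_checked(conn)      # first deletion succeeds
try:
    list_del_checked(conn)  # second deletion trips the check
except RuntimeError as err:
    print(err)
```

In the kernel the equivalent check ends in BUG(), which is why the report shows an invalid-opcode trap rather than an error return.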

Cheers,

Jeremy


2024-04-21 21:01:23

by Paul Menzel

Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

#regzbot introduced: v6.1.82..v6.1.83


Dear Jeremy,


Am 21.04.24 um 15:54 schrieb Jeremy Lainé:

> After upgrading my kernel to Debian's latest version (6.1.85), I
> started encountering systematic kernel BUGs at boot, making the
> bluetooth stack unusable. I initially reported this to Debian's bug
> tracker:
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1069301
>
> .. but have since confirmed that this is reproducible with vanilla
> kernels, including the latest 6.1.y version (6.1.87).

Thank you for reporting this and taking the time to pinpoint the version.

> I tried various kernel versions (straight from kernel.org) to pinpoint
> when the problem started occurring and the results are:
>
> - linux 6.1.80 => OK
> - linux 6.1.82 => OK
> - linux 6.1.83 => BUG
> - linux 6.1.85 => BUG
> - linux 6.1.87 => BUG
>
> I have included a trace below, and full system details are available
> in the Debian bug listed above. Can you suggest any other tests I can
> perform to help diagnose the origin of the problem?

Would you be so kind as to go the extra mile and bisect the commit between
6.1.82 and 6.1.83 [1]?

> [ 22.660847] list_del corruption, ffff94d9f6302000->prev is LIST_POISON2 (dead000000000122)
> [ 22.660887] ------------[ cut here ]------------
> [ 22.660890] kernel BUG at lib/list_debug.c:56!
> [ 22.660907] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [ 22.660917] CPU: 10 PID: 139 Comm: kworker/u25:0 Not tainted 6.1.0-20-amd64 #1 Debian 6.1.85-1
> [ 22.660929] Hardware name: Dell Inc. XPS 9315/00KRKP, BIOS 1.19.1 03/14/2024
> [ 22.660936] Workqueue: hci0 hci_cmd_sync_work [bluetooth]
> [ 22.661128] RIP: 0010:__list_del_entry_valid.cold+0x4b/0x6f
> [ 22.661147] Code: fe ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 48 18 7a 9f e8 14 a1 fe ff 0f 0b 48 89 fe 48 89 ca 48 c7 c7 10 18 7a 9f e8 00 a1 fe ff <0f> 0b 48 89 fe 48 c7 c7 d8 17 7a 9f e8 ef a0 fe ff 0f 0b 48 89 fe
> [ 22.661156] RSP: 0000:ffffae0e406efde0 EFLAGS: 00010246
> [ 22.661164] RAX: 000000000000004e RBX: ffff94d9f6302000 RCX: 0000000000000027
> [ 22.661172] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff94dfaf8a03a0
> [ 22.661177] RBP: ffff94d859392000 R08: 0000000000000000 R09: ffffae0e406efc78
> [ 22.661182] R10: 0000000000000003 R11: ffffffff9fed4448 R12: ffff94d859392000
> [ 22.661187] R13: ffff94d859392770 R14: ffff94d858cb9800 R15: dead000000000100
> [ 22.661194] FS: 0000000000000000(0000) GS:ffff94dfaf880000(0000) knlGS:0000000000000000
> [ 22.661202] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 22.661208] CR2: 00007f423c024038 CR3: 0000000799c04000 CR4: 0000000000750ee0
> [ 22.661214] PKRU: 55555554
> [ 22.661218] Call Trace:
> [ 22.661225] <TASK>
> [ 22.661232] ? __die_body.cold+0x1a/0x1f
> [ 22.661246] ? die+0x2a/0x50
> [ 22.661257] ? do_trap+0xc5/0x110
> [ 22.661268] ? __list_del_entry_valid.cold+0x4b/0x6f
> [ 22.661279] ? do_error_trap+0x6a/0x90
> [ 22.661289] ? __list_del_entry_valid.cold+0x4b/0x6f
> [ 22.661298] ? exc_invalid_op+0x4c/0x60
> [ 22.661307] ? __list_del_entry_valid.cold+0x4b/0x6f
> [ 22.661316] ? asm_exc_invalid_op+0x16/0x20
> [ 22.661328] ? __list_del_entry_valid.cold+0x4b/0x6f
> [ 22.661337] hci_conn_del+0x136/0x3e0 [bluetooth]
> [ 22.661466] hci_abort_conn_sync+0xaa/0x230 [bluetooth]
> [ 22.661632] ? abort_conn_sync+0x3d/0x70 [bluetooth]
> [ 22.661751] hci_cmd_sync_work+0x9f/0x150 [bluetooth]
> [ 22.661915] process_one_work+0x1c4/0x380
> [ 22.661929] worker_thread+0x4d/0x380
> [ 22.661940] ? rescuer_thread+0x3a0/0x3a0
> [ 22.661950] kthread+0xd7/0x100
> [ 22.661959] ? kthread_complete_and_exit+0x20/0x20
> [ 22.661969] ret_from_fork+0x1f/0x30
> [ 22.661984] </TASK>

You can pipe the output through `scripts/decodecode` and it should show
more information.

[…]


Kind regards,

Paul


[1]:
https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html

2024-04-21 23:17:31

by Jeremy Lainé

Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

Hi Paul,

On Sun, Apr 21, 2024 at 11:01 PM Paul Menzel <[email protected]> wrote:
>
> Would you be so kind as to go the extra mile and bisect the commit between
> 6.1.82 and 6.1.83 [1]?
>

Thanks for the link to the instructions, here's the bisect log:

git bisect start
# status: waiting for both good and bad commits
# good: [d7543167affd372819a94879b8b1e8b9b12547d9] Linux 6.1.82
git bisect good d7543167affd372819a94879b8b1e8b9b12547d9
# status: waiting for bad commit, 1 good commit known
# bad: [e5cd595e23c1a075359a337c0e5c3a4f2dc28dd1] Linux 6.1.83
git bisect bad e5cd595e23c1a075359a337c0e5c3a4f2dc28dd1
# bad: [440e278cb53b8dd6627c32e84950350083c39d35] net: kcm: fix incorrect parameter validation in the kcm_getsockopt) function
git bisect bad 440e278cb53b8dd6627c32e84950350083c39d35
# good: [a4116bd6ee5e1c1b65a61ed9221657615a2f45bf] arm64: dts: imx8mm-kontron: Disable pull resistors for SD card signals on BL OSM-S board
git bisect good a4116bd6ee5e1c1b65a61ed9221657615a2f45bf
# good: [e16c33dd9967b7f20987bf653acc4f605836127b] net: mctp: copy skb ext data when fragmenting
git bisect good e16c33dd9967b7f20987bf653acc4f605836127b
# bad: [6083089ab00631617f9eac678df3ab050a9d837a] Bluetooth: hci_conn: Consolidate code for aborting connections
git bisect bad 6083089ab00631617f9eac678df3ab050a9d837a
# good: [934212a623cbab851848b6de377eb476718c3e4c] SUNRPC: fix some memleaks in gssx_dec_option_array
git bisect good 934212a623cbab851848b6de377eb476718c3e4c
# good: [8499af0616cf76e6cbe811107e3f5b33bd472041] igb: Fix missing time sync events
git bisect good 8499af0616cf76e6cbe811107e3f5b33bd472041
# good: [653a17a99d752ffde175d4bc96154f2a3642f400] Bluetooth: Remove superfluous call to hci_conn_check_pending()
git bisect good 653a17a99d752ffde175d4bc96154f2a3642f400
# good: [1023de27cd1d0d692e70fe6d6d5cee9fff9b9c84] Bluetooth: Cancel sync command before suspend and power off
git bisect good 1023de27cd1d0d692e70fe6d6d5cee9fff9b9c84
# good: [ac7a47aaa7944efc94e4fc23cc438b7bd9cc222c] Bluetooth: hci_sync: Only allow hci_cmd_sync_queue if running
git bisect good ac7a47aaa7944efc94e4fc23cc438b7bd9cc222c
# first bad commit: [6083089ab00631617f9eac678df3ab050a9d837a] Bluetooth: hci_conn: Consolidate code for aborting connections
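
The log above is a binary search over the commits between the two release tags. As a rough sketch of the idea only (it assumes a purely linear history, whereas git bisect also handles merges; the commit names `c0`..`c1000` and the `is_bad` predicate are hypothetical stand-ins for "build, boot and check for the BUG"):

```python
def first_bad(commits, is_bad):
    """Find the first bad commit in a list ordered oldest to newest.

    commits[0] is assumed to be known good and commits[-1] known bad,
    mirroring the initial `git bisect good` / `git bisect bad` marks.
    Returns the first bad commit and the number of commits tested.
    """
    lo, hi = 0, len(commits) - 1  # lo: newest known good, hi: oldest known bad
    steps = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        steps += 1
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], steps


# Hypothetical history of 1001 commits in which commit c700 introduces the bug.
history = [f"c{i}" for i in range(1001)]
culprit, tested = first_bad(history, lambda c: int(c[1:]) >= 700)
print(culprit, tested)
```

Each test halves the remaining range, so even a stable release with many hundreds of commits needs only about ten builds.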


> You can pipe the output through `scripts/decodecode` and it should show
> more information.

This was the output of running the dmesg snippet through `scripts/decodecode`:

All code
========
0: fe (bad)
1: ff 0f decl (%rdi)
3: 0b 48 89 or -0x77(%rax),%ecx
6: f2 48 89 fe repnz mov %rdi,%rsi
a: 48 c7 c7 48 18 7a 9f mov $0xffffffff9f7a1848,%rdi
11: e8 14 a1 fe ff call 0xfffffffffffea12a
16: 0f 0b ud2
18: 48 89 fe mov %rdi,%rsi
1b: 48 89 ca mov %rcx,%rdx
1e: 48 c7 c7 10 18 7a 9f mov $0xffffffff9f7a1810,%rdi
25: e8 00 a1 fe ff call 0xfffffffffffea12a
2a:* 0f 0b ud2 <-- trapping instruction
2c: 48 89 fe mov %rdi,%rsi
2f: 48 c7 c7 d8 17 7a 9f mov $0xffffffff9f7a17d8,%rdi
36: e8 ef a0 fe ff call 0xfffffffffffea12a
3b: 0f 0b ud2
3d: 48 89 fe mov %rdi,%rsi

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 48 89 fe mov %rdi,%rsi
5: 48 c7 c7 d8 17 7a 9f mov $0xffffffff9f7a17d8,%rdi
c: e8 ef a0 fe ff call 0xfffffffffffea100
11: 0f 0b ud2
13: 48 89 fe mov %rdi,%rsi
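
The offset of the trapping instruction can be cross-checked by hand: in the `Code:` line of the oops, the byte in angle brackets is where the fault occurred. A small Python sketch of that lookup (the `trapping_offset` helper is made up for illustration; the byte string is copied from the trace):

```python
# Bytes from the "Code:" line of the oops; <..> marks the faulting byte.
CODE_LINE = (
    "fe ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 48 18 7a 9f e8 14 a1 fe ff "
    "0f 0b 48 89 fe 48 89 ca 48 c7 c7 10 18 7a 9f e8 00 a1 fe ff <0f> 0b "
    "48 89 fe 48 c7 c7 d8 17 7a 9f e8 ef a0 fe ff 0f 0b 48 89 fe"
)


def trapping_offset(code_line):
    """Return (offset, first two opcode bytes) of the marked instruction."""
    tokens = code_line.split()
    for offset, token in enumerate(tokens):
        if token.startswith("<") and token.endswith(">"):
            opcode = bytes(
                int(t.strip("<>"), 16) for t in tokens[offset:offset + 2]
            )
            return offset, opcode
    raise ValueError("no byte marked with <..> in the code line")


offset, opcode = trapping_offset(CODE_LINE)
print(hex(offset), opcode.hex(" "))  # 0x2a 0f 0b
```

Offset 0x2a matches the line flagged as the trapping instruction, and `0f 0b` is the encoding of `ud2`, the instruction the kernel uses to implement BUG() on x86.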

Best regards,
Jeremy

2024-04-22 05:50:52

by Thorsten Leemhuis

Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

On 22.04.24 01:17, Jeremy Lainé wrote:
>
> On Sun, Apr 21, 2024 at 11:01 PM Paul Menzel <[email protected]> wrote:
>>
>> Would you be so kind as to go the extra mile and bisect the commit between
>> 6.1.82 and 6.1.83 [1]?
>>
>
> Thanks for the link to the instructions, here's the bisect log:

Thx! Did you also test if mainline (e.g. 6.9-rc5) is affected? Without
this we won't know if this is something the stable team or the regular
bluetooth developers have to handle.

Ciao, Thorsten

> git bisect start
> # status: waiting for both good and bad commits
> # good: [d7543167affd372819a94879b8b1e8b9b12547d9] Linux 6.1.82
> git bisect good d7543167affd372819a94879b8b1e8b9b12547d9
> # status: waiting for bad commit, 1 good commit known
> # bad: [e5cd595e23c1a075359a337c0e5c3a4f2dc28dd1] Linux 6.1.83
> git bisect bad e5cd595e23c1a075359a337c0e5c3a4f2dc28dd1
> # bad: [440e278cb53b8dd6627c32e84950350083c39d35] net: kcm: fix
> incorrect parameter validation in the kcm_getsockopt) function
> git bisect bad 440e278cb53b8dd6627c32e84950350083c39d35
> # good: [a4116bd6ee5e1c1b65a61ed9221657615a2f45bf] arm64: dts:
> imx8mm-kontron: Disable pull resistors for SD card signals on BL OSM-S
> board
> git bisect good a4116bd6ee5e1c1b65a61ed9221657615a2f45bf
> # good: [e16c33dd9967b7f20987bf653acc4f605836127b] net: mctp: copy skb
> ext data when fragmenting
> git bisect good e16c33dd9967b7f20987bf653acc4f605836127b
> # bad: [6083089ab00631617f9eac678df3ab050a9d837a] Bluetooth: hci_conn:
> Consolidate code for aborting connections
> git bisect bad 6083089ab00631617f9eac678df3ab050a9d837a
> # good: [934212a623cbab851848b6de377eb476718c3e4c] SUNRPC: fix some
> memleaks in gssx_dec_option_array
> git bisect good 934212a623cbab851848b6de377eb476718c3e4c
> # good: [8499af0616cf76e6cbe811107e3f5b33bd472041] igb: Fix missing
> time sync events
> git bisect good 8499af0616cf76e6cbe811107e3f5b33bd472041
> # good: [653a17a99d752ffde175d4bc96154f2a3642f400] Bluetooth: Remove
> superfluous call to hci_conn_check_pending()
> git bisect good 653a17a99d752ffde175d4bc96154f2a3642f400
> # good: [1023de27cd1d0d692e70fe6d6d5cee9fff9b9c84] Bluetooth: Cancel
> sync command before suspend and power off
> git bisect good 1023de27cd1d0d692e70fe6d6d5cee9fff9b9c84
> # good: [ac7a47aaa7944efc94e4fc23cc438b7bd9cc222c] Bluetooth:
> hci_sync: Only allow hci_cmd_sync_queue if running
> git bisect good ac7a47aaa7944efc94e4fc23cc438b7bd9cc222c
> # first bad commit: [6083089ab00631617f9eac678df3ab050a9d837a]
> Bluetooth: hci_conn: Consolidate code for aborting connections
>
>
>> You can pipe the output through `scripts/decodecode` and it should show
>> more information.
>
> This was the output of running the dmesg snippet through `scripts/decodecode`:
>
> All code
> ========
> 0: fe (bad)
> 1: ff 0f decl (%rdi)
> 3: 0b 48 89 or -0x77(%rax),%ecx
> 6: f2 48 89 fe repnz mov %rdi,%rsi
> a: 48 c7 c7 48 18 7a 9f mov $0xffffffff9f7a1848,%rdi
> 11: e8 14 a1 fe ff call 0xfffffffffffea12a
> 16: 0f 0b ud2
> 18: 48 89 fe mov %rdi,%rsi
> 1b: 48 89 ca mov %rcx,%rdx
> 1e: 48 c7 c7 10 18 7a 9f mov $0xffffffff9f7a1810,%rdi
> 25: e8 00 a1 fe ff call 0xfffffffffffea12a
> 2a:* 0f 0b ud2 <-- trapping instruction
> 2c: 48 89 fe mov %rdi,%rsi
> 2f: 48 c7 c7 d8 17 7a 9f mov $0xffffffff9f7a17d8,%rdi
> 36: e8 ef a0 fe ff call 0xfffffffffffea12a
> 3b: 0f 0b ud2
> 3d: 48 89 fe mov %rdi,%rsi
>
> Code starting with the faulting instruction
> ===========================================
> 0: 0f 0b ud2
> 2: 48 89 fe mov %rdi,%rsi
> 5: 48 c7 c7 d8 17 7a 9f mov $0xffffffff9f7a17d8,%rdi
> c: e8 ef a0 fe ff call 0xfffffffffffea100
> 11: 0f 0b ud2
> 13: 48 89 fe mov %rdi,%rsi
>
> Best regards,
> Jeremy
>
>

2024-04-22 08:27:40

by Jeremy Lainé

Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

Hi Thorsten,

On Mon, Apr 22, 2024 at 7:41 AM Linux regression tracking (Thorsten
Leemhuis) <[email protected]> wrote:
>
> Thx! Did you also test if mainline (e.g. 6.9-rc5) is affected? Without
> this we won't know if this is something the stable team or the regular
> bluetooth developers have to handle.

I'm now running 6.9-rc5 and have not been able to reproduce the issue,
so it does sound like it's an issue for the stable team.

Cheers,
Jeremy

2024-04-22 09:57:33

by Thorsten Leemhuis

Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

Hi stable team (and Bluetooth maintainers), I noticed a regression
report about a BT problem in 6.1.y:

On 21.04.24 15:54, Jeremy Lainé wrote:
>
> After upgrading my kernel to Debian's latest version (6.1.85), I
> started encountering systematic kernel BUGs at boot, making the
> bluetooth stack unusable. I initially reported this to Debian's bug
> tracker:
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1069301
>
> .. but have since confirmed that this is reproducible with vanilla
> kernels, including the latest 6.1.y version (6.1.87).
>
> I tried various kernel versions (straight from kernel.org) to pinpoint
> when the problem started occurring and the results are:

Jeremy later wrote:
> # first bad commit: [6083089ab00631617f9eac678df3ab050a9d837a]
> Bluetooth: hci_conn: Consolidate code for aborting connections
https://lore.kernel.org/all/[email protected]/

That's a13f316e90fdb1 ("Bluetooth: hci_conn: Consolidate code for
aborting connections") [v6.6-rc1, v6.1.83 (6083089ab00631)]

FWIW, there is a fix for the mainline commit under review:
https://lore.kernel.org/all/[email protected]/

But it is likely unrelated, as Jeremy later also wrote:
> I'm now running 6.9-rc5 and have not been able to reproduce the issue,
https://lore.kernel.org/all/CADRbXaA2yFjMo=_8_ZTubPbrrmWH9yx+aG5pUadnk395koonXg@mail.gmail.com/

Makes me wonder if 6.1.y is missing some other change a13f316e90fdb1
depends on.

Ciao, Thorsten

> I have included a trace below, and full system details are available
> in the Debian bug listed above. Can you suggest any other tests I can
> perform to help diagnose the origin of the problem?
>
> [ 22.660847] list_del corruption, ffff94d9f6302000->prev is
> LIST_POISON2 (dead000000000122)
> [ 22.660887] ------------[ cut here ]------------
> [ 22.660890] kernel BUG at lib/list_debug.c:56!
> [ 22.660907] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [ 22.660917] CPU: 10 PID: 139 Comm: kworker/u25:0 Not tainted
> 6.1.0-20-amd64 #1 Debian 6.1.85-1
> [ 22.660929] Hardware name: Dell Inc. XPS 9315/00KRKP, BIOS 1.19.1 03/14/2024
> [ 22.660936] Workqueue: hci0 hci_cmd_sync_work [bluetooth]
> [ 22.661128] RIP: 0010:__list_del_entry_valid.cold+0x4b/0x6f
> [ 22.661147] Code: fe ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 48 18 7a
> 9f e8 14 a1 fe ff 0f 0b 48 89 fe 48 89 ca 48 c7 c7 10 18 7a 9f e8 00
> a1 fe ff <0f> 0b 48 89 fe 48 c7 c7 d8 17 7a 9f e8 ef a0 fe ff 0f 0b 48
> 89 fe
> [ 22.661156] RSP: 0000:ffffae0e406efde0 EFLAGS: 00010246
> [ 22.661164] RAX: 000000000000004e RBX: ffff94d9f6302000 RCX: 0000000000000027
> [ 22.661172] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff94dfaf8a03a0
> [ 22.661177] RBP: ffff94d859392000 R08: 0000000000000000 R09: ffffae0e406efc78
> [ 22.661182] R10: 0000000000000003 R11: ffffffff9fed4448 R12: ffff94d859392000
> [ 22.661187] R13: ffff94d859392770 R14: ffff94d858cb9800 R15: dead000000000100
> [ 22.661194] FS: 0000000000000000(0000) GS:ffff94dfaf880000(0000)
> knlGS:0000000000000000
> [ 22.661202] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 22.661208] CR2: 00007f423c024038 CR3: 0000000799c04000 CR4: 0000000000750ee0
> [ 22.661214] PKRU: 55555554
> [ 22.661218] Call Trace:
> [ 22.661225] <TASK>
> [ 22.661232] ? __die_body.cold+0x1a/0x1f
> [ 22.661246] ? die+0x2a/0x50
> [ 22.661257] ? do_trap+0xc5/0x110
> [ 22.661268] ? __list_del_entry_valid.cold+0x4b/0x6f
> [ 22.661279] ? do_error_trap+0x6a/0x90
> [ 22.661289] ? __list_del_entry_valid.cold+0x4b/0x6f
> [ 22.661298] ? exc_invalid_op+0x4c/0x60
> [ 22.661307] ? __list_del_entry_valid.cold+0x4b/0x6f
> [ 22.661316] ? asm_exc_invalid_op+0x16/0x20
> [ 22.661328] ? __list_del_entry_valid.cold+0x4b/0x6f
> [ 22.661337] hci_conn_del+0x136/0x3e0 [bluetooth]
> [ 22.661466] hci_abort_conn_sync+0xaa/0x230 [bluetooth]
> [ 22.661632] ? abort_conn_sync+0x3d/0x70 [bluetooth]
> [ 22.661751] hci_cmd_sync_work+0x9f/0x150 [bluetooth]
> [ 22.661915] process_one_work+0x1c4/0x380
> [ 22.661929] worker_thread+0x4d/0x380
> [ 22.661940] ? rescuer_thread+0x3a0/0x3a0
> [ 22.661950] kthread+0xd7/0x100
> [ 22.661959] ? kthread_complete_and_exit+0x20/0x20
> [ 22.661969] ret_from_fork+0x1f/0x30
> [ 22.661984] </TASK>
> [ 22.661987] Modules linked in: ctr ccm nft_chain_nat xt_MASQUERADE
> nf_nat nf_conntrack_netlink br_netfilter bridge stp llc xfrm_user
> xfrm_algo nvme_fabrics rfcomm snd_seq_dummy snd_hrtimer snd_seq
> snd_seq_device cmac algif_hash algif_skcipher af_alg snd_ctl_led
> snd_soc_sof_sdw snd_soc_intel_hda_dsp_common snd_sof_probes
> snd_soc_intel_sof_maxim_common snd_soc_rt715_sdca snd_soc_rt1316_sdw
> regmap_sdw_mbq snd_hda_codec_hdmi regmap_sdw overlay ip6t_REJECT
> nf_reject_ipv6 xt_hl ip6_tables ip6t_rt ipt_REJECT nf_reject_ipv4
> xt_LOG qrtr nf_log_syslog nft_limit bnep ipmi_devintf ipmi_msghandler
> xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables libcrc32c nfnetlink
> binfmt_misc nls_ascii nls_cp437 vfat fat x86_pkg_temp_thermal
> intel_powerclamp coretemp snd_soc_dmic snd_sof_pci_intel_tgl
> snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation
> soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp
> snd_sof snd_sof_utils
> [ 22.662122] snd_soc_hdac_hda snd_hda_ext_core
> snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core kvm_intel
> snd_compress btusb soundwire_bus btrtl kvm btbcm snd_hda_intel btintel
> snd_intel_dspcfg btmtk dell_laptop snd_intel_sdw_acpi irqbypass
> ledtrig_audio bluetooth snd_hda_codec i915 snd_hda_core rapl mei_hdcp
> intel_rapl_msr snd_hwdep processor_thermal_device_pci dell_wmi joydev
> hid_sensor_als intel_cstate jitterentropy_rng processor_thermal_device
> snd_pcm hid_sensor_trigger processor_thermal_rfim dell_smbios
> ucsi_acpi dcdbas hid_sensor_iio_common processor_thermal_mbox
> drm_buddy intel_uncore iwlmvm pcspkr drbg iTCO_wdt typec_ucsi
> dell_wmi_sysman snd_timer industrialio_triggered_buffer
> drm_display_helper processor_thermal_rapl mei_me dell_wmi_descriptor
> firmware_attributes_class kfifo_buf wmi_bmof ansi_cprng intel_pmc_bxt
> cec snd roles intel_rapl_common ecdh_generic iTCO_vendor_support
> int3403_thermal watchdog ecc industrialio mei soundcore typec
> int3400_thermal rc_core mac80211
> [ 22.662253] int340x_thermal_zone intel_pmc_core button intel_hid
> acpi_thermal_rel sparse_keymap ttm acpi_pad acpi_tad drm_kms_helper
> libarc4 igen6_edac i2c_algo_bit ac evdev hid_multitouch serio_raw
> iwlwifi cfg80211 rfkill msr parport_pc ppdev lp drm parport fuse loop
> efi_pstore configfs efivarfs ip_tables x_tables autofs4 ext4 crc16
> mbcache jbd2 crc32c_generic usbhid hid_sensor_custom hid_sensor_hub
> dm_crypt dm_mod intel_ishtp_hid nvme nvme_core t10_pi
> crc64_rocksoft_generic crc64_rocksoft crc_t10dif crct10dif_generic
> crc64 ahci libahci crct10dif_pclmul crct10dif_common libata
> crc32_pclmul crc32c_intel scsi_mod spi_pxa2xx_platform
> ghash_clmulni_intel dw_dmac hid_generic sha512_ssse3 scsi_common
> dw_dmac_core xhci_pci sha512_generic sha256_ssse3 xhci_hcd sha1_ssse3
> usbcore i2c_hid_acpi intel_lpss_pci aesni_intel video intel_ish_ipc
> i2c_i801 i2c_hid intel_lpss psmouse thunderbolt crypto_simd cryptd
> i2c_smbus vmd intel_ishtp usb_common idma64 hid battery wmi
> [ 22.662422] ---[ end trace 0000000000000000 ]---
>
> Cheers,
>
> Jeremy


#regzbot ^introduced 6083089ab0063
#regzbot title Bluetooth kernel BUG with Intel AX211
#regzbot duplicate:
https://lore.kernel.org/all/[email protected]/
#regzbot ignore-activit

2024-04-29 10:25:07

by Thorsten Leemhuis

Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

On 22.04.24 11:56, Linux regression tracking (Thorsten Leemhuis) wrote:
> Hi stable team (and Bluetooth maintainers), I noticed a regression
> report about a BT problem in 6.1.y:

Hmmm. Nothing has happened since then (or I missed it). So it seems the
Bluetooth maintainers don't care about stable-specific problems (they
are free to do so!) or are busy with other work (happens!).

So we either need to find the cause (likely a missing backport) in
some other way or maybe revert the culprit in the 6.1.y series. Jeremy,
did you check whether the latter is an option? If not: could you do that
please? And could you also check whether cherry-picking c7eaf80bfb0c8c
("Bluetooth: Fix hci_link_tx_to RCU lock usage") [v6.6-rc5] into 6.1.y
helps? It's just a wild guess, but it contains a Fixes: tag for the
commit in question.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

> On 21.04.24 15:54, Jeremy Lainé wrote:
>>
>> After upgrading my kernel to Debian's latest version (6.1.85), I
>> started encountering systematic kernel BUGs at boot, making the
>> bluetooth stack unusable. I initially reported this to Debian's bug
>> tracker:
>>
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1069301
>>
>> .. but have since confirmed that this is reproducible with vanilla
>> kernels, including the latest 6.1.y version (6.1.87).
>>
>> I tried various kernel versions (straight from kernel.org) to pinpoint
>> when the problem started occurring and the results are:
>
> Jeremy later wrote:
>> # first bad commit: [6083089ab00631617f9eac678df3ab050a9d837a]
>> Bluetooth: hci_conn: Consolidate code for aborting connections
> https://lore.kernel.org/all/[email protected]/
>
> That's a13f316e90fdb1 ("Bluetooth: hci_conn: Consolidate code for
> aborting connections") [v6.6-rc1, v6.1.83 (6083089ab00631)]
>
> FWIW, there is a fix for the mainline commit under review:
> https://lore.kernel.org/all/[email protected]/
>
> But it is likely unrelated, as Jeremy later also wrote:
>> I'm now running 6.9-rc5 and have not been able to reproduce the issue,
> https://lore.kernel.org/all/CADRbXaA2yFjMo=_8_ZTubPbrrmWH9yx+aG5pUadnk395koonXg@mail.gmail.com/
>
> Makes me wonder if 6.1.y is missing some other change a13f316e90fdb1
> depends on.
>
> Ciao, Thorsten
>
>> I have included a trace below, and full system details are available
>> in the Debian bug listed above. Can you suggest any other tests I can
>> perform to help diagnose the origin of the problem?
>>
>> [ 22.660847] list_del corruption, ffff94d9f6302000->prev is
>> LIST_POISON2 (dead000000000122)
>> [ 22.660887] ------------[ cut here ]------------
>> [ 22.660890] kernel BUG at lib/list_debug.c:56!
>> [ 22.660907] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>> [ 22.660917] CPU: 10 PID: 139 Comm: kworker/u25:0 Not tainted
>> 6.1.0-20-amd64 #1 Debian 6.1.85-1
>> [ 22.660929] Hardware name: Dell Inc. XPS 9315/00KRKP, BIOS 1.19.1 03/14/2024
>> [ 22.660936] Workqueue: hci0 hci_cmd_sync_work [bluetooth]
>> [ 22.661128] RIP: 0010:__list_del_entry_valid.cold+0x4b/0x6f
>> [ 22.661147] Code: fe ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 48 18 7a
>> 9f e8 14 a1 fe ff 0f 0b 48 89 fe 48 89 ca 48 c7 c7 10 18 7a 9f e8 00
>> a1 fe ff <0f> 0b 48 89 fe 48 c7 c7 d8 17 7a 9f e8 ef a0 fe ff 0f 0b 48
>> 89 fe
>> [ 22.661156] RSP: 0000:ffffae0e406efde0 EFLAGS: 00010246
>> [ 22.661164] RAX: 000000000000004e RBX: ffff94d9f6302000 RCX: 0000000000000027
>> [ 22.661172] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff94dfaf8a03a0
>> [ 22.661177] RBP: ffff94d859392000 R08: 0000000000000000 R09: ffffae0e406efc78
>> [ 22.661182] R10: 0000000000000003 R11: ffffffff9fed4448 R12: ffff94d859392000
>> [ 22.661187] R13: ffff94d859392770 R14: ffff94d858cb9800 R15: dead000000000100
>> [ 22.661194] FS: 0000000000000000(0000) GS:ffff94dfaf880000(0000)
>> knlGS:0000000000000000
>> [ 22.661202] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 22.661208] CR2: 00007f423c024038 CR3: 0000000799c04000 CR4: 0000000000750ee0
>> [ 22.661214] PKRU: 55555554
>> [ 22.661218] Call Trace:
>> [ 22.661225] <TASK>
>> [ 22.661232] ? __die_body.cold+0x1a/0x1f
>> [ 22.661246] ? die+0x2a/0x50
>> [ 22.661257] ? do_trap+0xc5/0x110
>> [ 22.661268] ? __list_del_entry_valid.cold+0x4b/0x6f
>> [ 22.661279] ? do_error_trap+0x6a/0x90
>> [ 22.661289] ? __list_del_entry_valid.cold+0x4b/0x6f
>> [ 22.661298] ? exc_invalid_op+0x4c/0x60
>> [ 22.661307] ? __list_del_entry_valid.cold+0x4b/0x6f
>> [ 22.661316] ? asm_exc_invalid_op+0x16/0x20
>> [ 22.661328] ? __list_del_entry_valid.cold+0x4b/0x6f
>> [ 22.661337] hci_conn_del+0x136/0x3e0 [bluetooth]
>> [ 22.661466] hci_abort_conn_sync+0xaa/0x230 [bluetooth]
>> [ 22.661632] ? abort_conn_sync+0x3d/0x70 [bluetooth]
>> [ 22.661751] hci_cmd_sync_work+0x9f/0x150 [bluetooth]
>> [ 22.661915] process_one_work+0x1c4/0x380
>> [ 22.661929] worker_thread+0x4d/0x380
>> [ 22.661940] ? rescuer_thread+0x3a0/0x3a0
>> [ 22.661950] kthread+0xd7/0x100
>> [ 22.661959] ? kthread_complete_and_exit+0x20/0x20
>> [ 22.661969] ret_from_fork+0x1f/0x30
>> [ 22.661984] </TASK>
>> [ 22.661987] Modules linked in: ctr ccm nft_chain_nat xt_MASQUERADE
>> nf_nat nf_conntrack_netlink br_netfilter bridge stp llc xfrm_user
>> xfrm_algo nvme_fabrics rfcomm snd_seq_dummy snd_hrtimer snd_seq
>> snd_seq_device cmac algif_hash algif_skcipher af_alg snd_ctl_led
>> snd_soc_sof_sdw snd_soc_intel_hda_dsp_common snd_sof_probes
>> snd_soc_intel_sof_maxim_common snd_soc_rt715_sdca snd_soc_rt1316_sdw
>> regmap_sdw_mbq snd_hda_codec_hdmi regmap_sdw overlay ip6t_REJECT
>> nf_reject_ipv6 xt_hl ip6_tables ip6t_rt ipt_REJECT nf_reject_ipv4
>> xt_LOG qrtr nf_log_syslog nft_limit bnep ipmi_devintf ipmi_msghandler
>> xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack
>> nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables libcrc32c nfnetlink
>> binfmt_misc nls_ascii nls_cp437 vfat fat x86_pkg_temp_thermal
>> intel_powerclamp coretemp snd_soc_dmic snd_sof_pci_intel_tgl
>> snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation
>> soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp
>> snd_sof snd_sof_utils
>> [ 22.662122] snd_soc_hdac_hda snd_hda_ext_core
>> snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core kvm_intel
>> snd_compress btusb soundwire_bus btrtl kvm btbcm snd_hda_intel btintel
>> snd_intel_dspcfg btmtk dell_laptop snd_intel_sdw_acpi irqbypass
>> ledtrig_audio bluetooth snd_hda_codec i915 snd_hda_core rapl mei_hdcp
>> intel_rapl_msr snd_hwdep processor_thermal_device_pci dell_wmi joydev
>> hid_sensor_als intel_cstate jitterentropy_rng processor_thermal_device
>> snd_pcm hid_sensor_trigger processor_thermal_rfim dell_smbios
>> ucsi_acpi dcdbas hid_sensor_iio_common processor_thermal_mbox
>> drm_buddy intel_uncore iwlmvm pcspkr drbg iTCO_wdt typec_ucsi
>> dell_wmi_sysman snd_timer industrialio_triggered_buffer
>> drm_display_helper processor_thermal_rapl mei_me dell_wmi_descriptor
>> firmware_attributes_class kfifo_buf wmi_bmof ansi_cprng intel_pmc_bxt
>> cec snd roles intel_rapl_common ecdh_generic iTCO_vendor_support
>> int3403_thermal watchdog ecc industrialio mei soundcore typec
>> int3400_thermal rc_core mac80211
>> [ 22.662253] int340x_thermal_zone intel_pmc_core button intel_hid
>> acpi_thermal_rel sparse_keymap ttm acpi_pad acpi_tad drm_kms_helper
>> libarc4 igen6_edac i2c_algo_bit ac evdev hid_multitouch serio_raw
>> iwlwifi cfg80211 rfkill msr parport_pc ppdev lp drm parport fuse loop
>> efi_pstore configfs efivarfs ip_tables x_tables autofs4 ext4 crc16
>> mbcache jbd2 crc32c_generic usbhid hid_sensor_custom hid_sensor_hub
>> dm_crypt dm_mod intel_ishtp_hid nvme nvme_core t10_pi
>> crc64_rocksoft_generic crc64_rocksoft crc_t10dif crct10dif_generic
>> crc64 ahci libahci crct10dif_pclmul crct10dif_common libata
>> crc32_pclmul crc32c_intel scsi_mod spi_pxa2xx_platform
>> ghash_clmulni_intel dw_dmac hid_generic sha512_ssse3 scsi_common
>> dw_dmac_core xhci_pci sha512_generic sha256_ssse3 xhci_hcd sha1_ssse3
>> usbcore i2c_hid_acpi intel_lpss_pci aesni_intel video intel_ish_ipc
>> i2c_i801 i2c_hid intel_lpss psmouse thunderbolt crypto_simd cryptd
>> i2c_smbus vmd intel_ishtp usb_common idma64 hid battery wmi
>> [ 22.662422] ---[ end trace 0000000000000000 ]---
>>
>> Cheers,
>>
>> Jeremy
>
>
> #regzbot ^introduced 6083089ab0063
> #regzbot title Bluetooth kernel BUG with Intel AX211
> #regzbot duplicate:
> https://lore.kernel.org/all/[email protected]/
> #regzbot ignore-activity
>
>

2024-04-29 18:28:36

by Jeremy Lainé

Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

Hi Thorsten,

On Mon, Apr 29, 2024 at 12:24 PM Linux regression tracking (Thorsten
Leemhuis) <[email protected]> wrote:
>
> So we either need to find the cause (likely a missing backport) through
> some other way or maybe revert the culprit in the 6.1.y series. Jeremy,
> did you try if the latter is an option? If not: could you do that
> please? And could you also check if cherry-picking c7eaf80bfb0c8c
> ("Bluetooth: Fix hci_link_tx_to RCU lock usage") [v6.6-rc5] into 6.1.y
> helps? It's just a wild guess, but it contains a Fixes: tag for the
> commit in question.

I gave it a try, and sadly I'm still hitting the exact same bug when I
cherry-pick the patch you mentioned on top of 6.1.y (at tag v6.1.87).

Thanks for trying, is there any other patch that looks like a good candidate?

Jeremy

2024-04-29 18:51:38

by Thorsten Leemhuis

Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

On 29.04.24 20:28, Jeremy Lainé wrote:
>
> On Mon, Apr 29, 2024 at 12:24 PM Linux regression tracking (Thorsten
> Leemhuis) <[email protected]> wrote:
>>
>> So we either need to find the cause (likely a missing backport) through
>> some other way or maybe revert the culprit in the 6.1.y series. Jeremy,
>> did you try if the latter is an option? If not: could you do that
>> please? And could you also check if cherry-picking c7eaf80bfb0c8c
>> ("Bluetooth: Fix hci_link_tx_to RCU lock usage") [v6.6-rc5] into 6.1.y
>> helps? It's just a wild guess, but it contains a Fixes: tag for the
>> commit in question.
>
> I gave it a try, and sadly I'm still hitting the exact same bug when I
> cherry-pick the patch you mentioned on top of 6.1.y (at tag v6.1.87).
>
> Thanks for trying, is there any other patch that looks like a good candidate?

Well, did you try what I suggested earlier (see above) and check if a
revert of 6083089ab00631617f9eac678df3ab050a9d837a on top of the latest
6.1.y helps?

Ciao, Thorsten

2024-05-28 21:02:40

by Mike

Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

On 29.04.24 20:46, Linux regression tracking (Thorsten Leemhuis) wrote:
> Well, did you try what I suggested earlier (see above) and check if a
> revert of 6083089ab00631617f9eac678df3ab050a9d837a on top of latest 6.1.y
> helps?

Hello Thorsten, Jeremy,
I hope you don't mind if I jump into the conversation trying to help.
I'm also experiencing this bug (with an Intel AX200) and I haven't seen
any updates in this thread for a month.

I tried reverting 6083089ab00631617f9eac678df3ab050a9d837a
on top of 6.1.91 and it looks much better: it's been 10 days, and the BT
and the system are stable.
Previously, I encountered the mentioned "kernel BUG" at each boot, and I
was unable to stop/kill the bluetoothd process.
Let me know if/how I can help further.
M.

2024-05-29 09:07:10

by Thorsten Leemhuis

Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

On 28.05.24 22:54, Mike wrote:
> On 29.04.24 20:46, Linux regression tracking (Thorsten Leemhuis) wrote:
>> Well, did you try what I suggested earlier (see above) and check if a
>> revert of 6083089ab00631617f9eac678df3ab050a9d837a on top of latest 6.1.y
>> helps?
>
> I hope you don't mind if I jump into the conversation trying to help.

On the contrary, thx for providing the needed information!

> I tried reverting 6083089ab00631617f9eac678df3ab050a9d837a
> on top of 6.1.91 and it looks much better: it's been 10 days, and the BT
> and the system are stable.
> Previously, I encountered the mentioned "kernel BUG" at each boot,

Might be a good idea to share that trace, as the developers might want
to confirm it's really the same bug.

> and I was unable to stop/kill the bluetoothd process.
> Let me know if/how I can help further.

Jeremy Lainé already confirmed that 6.9-rc5[1] worked fine and that
another fix for the culprit did not help[2]. Therefore we just have
three options left:

1. test another fix for the culprit I found on lore -- but note, this is
just a shot in the dark
https://lore.kernel.org/all/[email protected]/

2. revert 6083089ab00631617f9eac678df3ab050a9d837a in 6.1.y if that is
still possible and neither creates an even bigger regression nor leads
to some security vulnerability

3. motivate the BT developers to look into this (some other patch the
culprit depends on might be missing), even if this strictly speaking is
a problem they are free to ignore.

Maybe give "1." a try; then we'll ask Greg for "2.", unless this
discussion or something else leads to "3."

Ciao, Thorsten

[1]
https://lore.kernel.org/all/CADRbXaA2yFjMo=_8_ZTubPbrrmWH9yx+aG5pUadnk395koonXg@mail.gmail.com/
[2]
https://lore.kernel.org/all/CADRbXaBkkGmqnibGvcAF2YH5CjLRJ2bnnix1xKozKdw_Hv3qNg@mail.gmail.com/

2024-06-03 20:04:15

by Mike

Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

On 29.05.24 11:06, Thorsten Leemhuis wrote:
> Might be a good idea to share it, the developers might want to confirm
> it's really the same bug.

I'm attaching the stacktrace [1] and decodecode [2] at the end, generated
on 6.1.92 vanilla+patch (1.).

> 1. test another fix for the culprit I found on lore -- but note, this is
> just a shot in the dark
> https://lore.kernel.org/all/[email protected]/

Looks like it was a miss :(
I tested the recent release version 6.1.92, and the bug is still
reproducible.
Interestingly, I encountered fewer occurrences with this release.
I then applied the patch mentioned (1.), but the bug is still
(immediately) reproducible.
The stack traces are the same for version 6.1.92, both with and without
the patch.

I understand that 6.9-rc5[1] worked fine, but I guess it will take some
time to be included in Debian stable, so having a patch for 6.1.x will be
much appreciated.
I do not have the time to follow the vanilla (latest) release, as is
likely the case for many other Linux users.

Let me know if there's anything else useful I can do for you.
Thank you,
Mike

[1]
2024-06-03T21:04:49.730983+02:00 mike kernel: [   24.110172] kernel BUG
at lib/list_debug.c:56!
2024-06-03T21:04:49.730984+02:00 mike kernel: [   24.110181] invalid
opcode: 0000 [#1] PREEMPT SMP NOPTI
2024-06-03T21:04:49.730985+02:00 mike kernel: [   24.110184] CPU: 2 PID:
868 Comm: kworker/u65:2 Not tainted 6.1.92 #2
2024-06-03T21:04:49.730985+02:00 mike kernel: [   24.110187] Hardware
name: Micro-Star International Co., Ltd. MS-7B93/MPG X570 GAMING PRO
CARBON WIFI (MS-7B93), BIOS 1.M0 04/02/2024
2024-06-03T21:04:49.730986+02:00 mike kernel: [   24.110191] Workqueue:
hci0 hci_cmd_sync_work [bluetooth]
2024-06-03T21:04:49.730986+02:00 mike kernel: [   24.110234] RIP:
0010:__list_del_entry_valid.cold+0x4b/0x6f
2024-06-03T21:04:49.730987+02:00 mike kernel: [   24.110240] Code: fe ff
0f 0b 48 89 f2 48 89 fe 48 c7 c7 c0 2d fa a6 e8 07 a1 fe ff 0f 0b 48 89
fe 48 89 ca 48 c7 c7 88 2d fa a6 e8 f3 a0 fe ff <0f> 0b 48 89 fe 48 c7
c7 50 2d fa a6 e8 e2 a0 fe ff 0f 0b 48 89 fe
2024-06-03T21:04:49.730987+02:00 mike kernel: [   24.110243] RSP:
0018:ffffb5fe04863de0 EFLAGS: 00010246
2024-06-03T21:04:49.730988+02:00 mike kernel: [   24.110247] RAX:
000000000000004e RBX: ffff9bff53430800 RCX: 0000000000000027
2024-06-03T21:04:49.730988+02:00 mike kernel: [   24.110249] RDX:
0000000000000000 RSI: 0000000000000001 RDI: ffff9c064eaa03a0
2024-06-03T21:04:49.730988+02:00 mike kernel: [   24.110252] RBP:
ffff9bff4d2ce000 R08: 0000000000000000 R09: ffffb5fe04863c78
2024-06-03T21:04:49.730989+02:00 mike kernel: [   24.110254] R10:
0000000000000003 R11: ffff9c066f2fc3e8 R12: ffff9bff4d2ce000
2024-06-03T21:04:49.730997+02:00 mike kernel: [   24.110256] R13:
ffff9bff4d2ce770 R14: ffff9bff62e919c0 R15: dead000000000100
2024-06-03T21:04:49.730997+02:00 mike kernel: [   24.110259] FS:
0000000000000000(0000) GS:ffff9c064ea80000(0000) knlGS:0000000000000000
2024-06-03T21:04:49.730997+02:00 mike kernel: [   24.110262] CS: 0010
DS: 0000 ES: 0000 CR0: 0000000080050033
2024-06-03T21:04:49.730998+02:00 mike kernel: [   24.110265] CR2:
000055ff08f14638 CR3: 0000000169804000 CR4: 0000000000350ee0
2024-06-03T21:04:49.730998+02:00 mike kernel: [   24.110268] Call Trace:
2024-06-03T21:04:49.730999+02:00 mike kernel: [   24.110270] <TASK>
2024-06-03T21:04:49.730999+02:00 mike kernel: [   24.110273]  ?
__die_body.cold+0x1a/0x1f
2024-06-03T21:04:49.730999+02:00 mike kernel: [   24.110278]  ?
die+0x2a/0x50
2024-06-03T21:04:49.731000+02:00 mike kernel: [   24.110283]  ?
do_trap+0xc5/0x110
2024-06-03T21:04:49.731000+02:00 mike kernel: [   24.110287]  ?
__list_del_entry_valid.cold+0x4b/0x6f
2024-06-03T21:04:49.731000+02:00 mike kernel: [   24.110293]  ?
do_error_trap+0x6a/0x90
2024-06-03T21:04:49.731001+02:00 mike kernel: [   24.110296]  ?
__list_del_entry_valid.cold+0x4b/0x6f
2024-06-03T21:04:49.731002+02:00 mike kernel: [   24.110301]  ?
exc_invalid_op+0x4c/0x60
2024-06-03T21:04:49.731002+02:00 mike kernel: [   24.110305]  ?
__list_del_entry_valid.cold+0x4b/0x6f
2024-06-03T21:04:49.731002+02:00 mike kernel: [   24.110309]  ?
asm_exc_invalid_op+0x16/0x20
2024-06-03T21:04:49.731003+02:00 mike kernel: [   24.110316]  ?
__list_del_entry_valid.cold+0x4b/0x6f
2024-06-03T21:04:49.731003+02:00 mike kernel: [   24.110321]
hci_conn_del+0x136/0x3e0 [bluetooth]
2024-06-03T21:04:49.731003+02:00 mike kernel: [   24.110357]
hci_abort_conn_sync+0xaa/0x230 [bluetooth]
2024-06-03T21:04:49.731004+02:00 mike kernel: [   24.110395]  ?
srso_return_thunk+0x5/0x10
2024-06-03T21:04:49.731004+02:00 mike kernel: [   24.110399]  ?
abort_conn_sync+0x3d/0x70 [bluetooth]
2024-06-03T21:04:49.731004+02:00 mike kernel: [   24.110435]
hci_cmd_sync_work+0xa2/0x150 [bluetooth]
2024-06-03T21:04:49.731005+02:00 mike kernel: [   24.110471]
process_one_work+0x1c7/0x380
2024-06-03T21:04:49.731005+02:00 mike kernel: [   24.110477]
worker_thread+0x4d/0x380
2024-06-03T21:04:49.731005+02:00 mike kernel: [   24.110482]  ?
rescuer_thread+0x3a0/0x3a0
2024-06-03T21:04:49.731006+02:00 mike kernel: [   24.110486]
kthread+0xda/0x100
2024-06-03T21:04:49.731006+02:00 mike kernel: [   24.110490]  ?
kthread_complete_and_exit+0x20/0x20
2024-06-03T21:04:49.731006+02:00 mike kernel: [   24.110494]
ret_from_fork+0x22/0x30
2024-06-03T21:04:49.731007+02:00 mike kernel: [   24.110503] </TASK>
2024-06-03T21:04:49.731007+02:00 mike kernel: [   24.110505] Modules
linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device
xt_CHECKSUM tun uhid bridge stp llc qrtr cpufreq_powersave
cpufreq_userspace cpufreq_conservative cpufreq_ondemand cmac algif_hash
algif_skcipher af_alg bnep uinput nft_chain_nat xt_MASQUERADE nf_nat
xt_LOG nf_log_syslog xt_mac ipt_REJECT nf_reject_ipv4 xt_conntrack
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp xt_pkttype
nft_compat sunrpc binfmt_misc nf_tables nfnetlink pktcdvd nls_ascii
nls_cp437 vfat fat snd_hda_codec_realtek snd_hda_codec_generic
ledtrig_audio snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common
edac_mce_amd btusb snd_hda_intel btrtl snd_intel_dspcfg btbcm
snd_intel_sdw_acpi btintel kvm_amd iwlmvm btmtk snd_hda_codec ccp
bluetooth mac80211 snd_hda_core libarc4 snd_hwdep jitterentropy_rng
snd_pcm kvm snd_timer irqbypass iwlwifi drbg snd sp5100_tco rapl
wmi_bmof soundcore ansi_cprng k10temp watchdog cfg80211 ecdh_generic ecc
rfkill joydev evdev button acpi_cpufreq sg msr
2024-06-03T21:04:49.731007+02:00 mike kernel: [   24.110606] dm_crypt
loop fuse efi_pstore configfs ip_tables x_tables autofs4 ext4 crc16
mbcache jbd2 btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c
crc32c_generic efivarfs linear dm_mirror dm_region_hash dm_log
hid_logitech_hidpp hid_logitech_dj hid_generic dm_mod raid1 usbhid hid
md_mod amdgpu drm_ttm_helper ttm video crc32_pclmul gpu_sched uas
crc32c_intel usb_storage sr_mod drm_buddy sd_mod ghash_clmulni_intel
cdrom sha512_ssse3 drm_display_helper sha512_generic drm_kms_helper
sha256_ssse3 nvme ahci sha1_ssse3 xhci_pci nvme_core libahci drm
xhci_hcd t10_pi aesni_intel libata crypto_simd cec cryptd
crc64_rocksoft_generic usbcore igb rc_core i2c_piix4 crc64_rocksoft
scsi_mod crc_t10dif crct10dif_generic usb_common dca crct10dif_pclmul
scsi_common i2c_algo_bit crc64 crct10dif_common wmi
2024-06-03T21:04:49.731008+02:00 mike kernel: [   24.110695] ---[ end
trace 0000000000000000 ]---
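
To check whether two reports show the same crash path, it can help to compare only the reliable frames of each call trace, ignoring the entries marked "?" and the instruction offsets, which differ between builds. The Python sketch below is one way to do that. The helper name reliable_frames and the regular expression are illustrative assumptions, not part of any kernel tooling, and the sample input is a short excerpt of the two traces posted in this thread.

```python
import re

# "symbol+0xoff/0xsize", optionally preceded by "?" (unreliable frame)
# and optionally followed by " [module]". Hypothetical helper pattern.
FRAME_RE = re.compile(
    r"(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?:[ \t]+\[(\w+)\])?"
)


def reliable_frames(trace_text):
    """Return (symbol, module) pairs for frames not marked '?'.

    Offsets are dropped on purpose; matching across newlines lets the
    function cope with traces that a mail client has wrapped.
    """
    frames = []
    for match in FRAME_RE.finditer(trace_text):
        unreliable, symbol, module = match.groups()
        if unreliable is None:
            frames.append((symbol, module))
    return frames


# Excerpt of the first report (dmesg format, one frame per line).
JEREMY = """\
[ 22.661328] ? __list_del_entry_valid.cold+0x4b/0x6f
[ 22.661337] hci_conn_del+0x136/0x3e0 [bluetooth]
[ 22.661466] hci_abort_conn_sync+0xaa/0x230 [bluetooth]
[ 22.661632] ? abort_conn_sync+0x3d/0x70 [bluetooth]
[ 22.661751] hci_cmd_sync_work+0x9f/0x150 [bluetooth]
"""

# Excerpt of the second report (wrapped by the mail client).
MIKE = """\
[   24.110316]  ?
__list_del_entry_valid.cold+0x4b/0x6f
[   24.110321]
hci_conn_del+0x136/0x3e0 [bluetooth]
[   24.110357]
hci_abort_conn_sync+0xaa/0x230 [bluetooth]
[   24.110395]  ?
srso_return_thunk+0x5/0x10
[   24.110399]  ?
abort_conn_sync+0x3d/0x70 [bluetooth]
[   24.110435]
hci_cmd_sync_work+0xa2/0x150 [bluetooth]
"""

same_path = reliable_frames(JEREMY) == reliable_frames(MIKE)
```

On these excerpts both traces reduce to hci_conn_del, hci_abort_conn_sync and hci_cmd_sync_work, all in the bluetooth module, so same_path is true even though the offsets in hci_cmd_sync_work differ.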

[2]
2024-06-03T21:04:49.731009+02:00 mike kernel: [ 24.243204] Code: fe ff
0f 0b 48 89 f2 48 89 fe 48 c7 c7 c0 2d fa a6 e8 07 a1 fe ff 0f 0b 48 89
fe 48 89 ca 48 c7 c7 88 2d fa a6 e8 f3 a0 fe ff <0f> 0b 48 89 fe 48 c7
c7 50 2d fa a6 e8 e2 a0 fe ff 0f 0b 48 89 fe
All code
========
   0:   fe                      (bad)
   1:   ff 0f                   decl   (%rdi)
   3:   0b 48 89                or     -0x77(%rax),%ecx
   6:   f2 48 89 fe             repnz mov %rdi,%rsi
   a:   48 c7 c7 c0 2d fa a6    mov    $0xffffffffa6fa2dc0,%rdi
  11:   e8 07 a1 fe ff          call   0xfffffffffffea11d
  16:   0f 0b                   ud2
  18:   48 89 fe                mov    %rdi,%rsi
  1b:   48 89 ca                mov    %rcx,%rdx
  1e:   48 c7 c7 88 2d fa a6    mov    $0xffffffffa6fa2d88,%rdi
  25:   e8 f3 a0 fe ff          call   0xfffffffffffea11d
  2a:*  0f 0b                   ud2             <-- trapping instruction
  2c:   48 89 fe                mov    %rdi,%rsi
  2f:   48 c7 c7 50 2d fa a6    mov    $0xffffffffa6fa2d50,%rdi
  36:   e8 e2 a0 fe ff          call   0xfffffffffffea11d
  3b:   0f 0b                   ud2
  3d:   48 89 fe                mov    %rdi,%rsi

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   48 89 fe                mov    %rdi,%rsi
   5:   48 c7 c7 50 2d fa a6    mov    $0xffffffffa6fa2d50,%rdi
   c:   e8 e2 a0 fe ff          call   0xfffffffffffea0f3
  11:   0f 0b                   ud2
  13:   48 89 fe                mov    %rdi,%rsi
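
The BUG at lib/list_debug.c fires when the list debugging code finds that the entry being removed already carries a poison pointer. Jeremy's original report shows the message "->prev is LIST_POISON2 (dead000000000122)", which is what one would expect if the same entry were unlinked twice with an RCU-style delete, since that kind of delete poisons only prev. The toy model below illustrates that mechanism in Python. It is a sketch under that assumption, not the kernel code and not a diagnosis of the root cause: the Node class and the helper names are hypothetical stand-ins for struct list_head and the kernel helpers.

```python
# Poison values as seen on x86_64 in the traces of this thread.
LIST_POISON1 = 0xDEAD000000000100
LIST_POISON2 = 0xDEAD000000000122


class ListCorruption(Exception):
    """Stands in for the BUG() raised by the list debugging checks."""


class Node:
    """Toy stand-in for struct list_head; links are Nodes or poison ints."""

    def __init__(self, name):
        self.name = name
        self.next = self
        self.prev = self


def list_add(new, head):
    """Insert `new` right after `head`."""
    new.next, new.prev = head.next, head
    head.next.prev = new
    head.next = new


def check_del_valid(entry):
    """Refuse to unlink an entry that already carries a poison pointer."""
    if entry.next == LIST_POISON1:
        raise ListCorruption(
            f"{entry.name}->next is LIST_POISON1 ({LIST_POISON1:016x})")
    if entry.prev == LIST_POISON2:
        raise ListCorruption(
            f"{entry.name}->prev is LIST_POISON2 ({LIST_POISON2:016x})")


def list_del_rcu(entry):
    """Unlink `entry` and poison only prev, as an RCU-style delete does."""
    check_del_valid(entry)
    entry.prev.next = entry.next
    entry.next.prev = entry.prev
    entry.prev = LIST_POISON2  # next stays valid for concurrent readers


def double_delete_message():
    """Delete the same entry twice and return the resulting complaint."""
    head = Node("head")
    conn = Node("conn")
    list_add(conn, head)
    list_del_rcu(conn)          # first removal succeeds
    try:
        list_del_rcu(conn)      # second removal trips the check
    except ListCorruption as exc:
        return str(exc)
    return None
```

In this model the second removal is rejected with a "->prev is LIST_POISON2" complaint, mirroring the wording of the kernel message; which code path performs the second removal in 6.1.y is exactly what remains to be found.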


2024-06-06 10:42:41

by Thorsten Leemhuis

Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

On 03.06.24 22:03, Mike wrote:
> On 29.05.24 11:06, Thorsten Leemhuis wrote:
>> Might be a good idea to share it, the developers might want to confirm
>> it's really the same bug.
> I'm attaching the stacktrace [1] and decodecode [2] at the end, generated
> on 6.1.92 vanilla+patch (1.).
> [...]
> I understand that 6.9-rc5[1] worked fine, but I guess it will take some
> time to be included in Debian stable, so having a patch for 6.1.x will be
> much appreciated.
> I do not have the time to follow the vanilla (latest) release, as is
> likely the case for many other Linux users.
>
> Let me know if there's anything else useful I can do for you.
> Thank you,

Still no reaction from the bluetooth developers. Guess they are busy
and/or do not care about 6.1.y. In that case:

@Greg: do you happen to have an idea how the 6.1.y commit a13f316e90fdb1
("Bluetooth: hci_conn: Consolidate code for aborting connections") might
cause this, or whether it's missing some prerequisite? If not, I wonder if
reverting that patch from 6.1.y might be the best move to resolve this
regression. Mike earlier in
https://lore.kernel.org/all/[email protected]/
confirmed that this fixed the problem in tests. Jeremy (who started the
thread and afaics has the same problem) did not reply.

Ciao, Thorsten


2024-06-12 12:06:14

by Greg Kroah-Hartman

Subject: Re: Bluetooth kernel BUG with Intel AX211 (regression in 6.1.83)

On Thu, Jun 06, 2024 at 12:18:18PM +0200, Thorsten Leemhuis wrote:
> On 03.06.24 22:03, Mike wrote:
> > On 29.05.24 11:06, Thorsten Leemhuis wrote:
> >> Might be a good idea to share it, the developers might want to confirm
> >> it's really the same bug.
> > I'm attaching the stacktrace [1] and decodecode [2] at the end, generated
> > on 6.1.92 vanilla+patch (1.).
> > [...]
> > I understand that 6.9-rc5[1] worked fine, but I guess it will take some
> > time to be included in Debian stable, so having a patch for 6.1.x will be
> > much appreciated.
> > I do not have the time to follow the vanilla (latest) release, as is
> > likely the case for many other Linux users.
> >
> > Let me know if there's anything else useful I can do for you.
> > Thank you,
>
> Still no reaction from the bluetooth developers. Guess they are busy
> and/or do not care about 6.1.y. In that case:
>
> @Greg: do you happen to have an idea how the 6.1.y commit a13f316e90fdb1
> ("Bluetooth: hci_conn: Consolidate code for aborting connections") might
> cause this, or whether it's missing some prerequisite? If not, I wonder if
> reverting that patch from 6.1.y might be the best move to resolve this
> regression. Mike earlier in
> https://lore.kernel.org/all/[email protected]/
> confirmed that this fixed the problem in tests. Jeremy (who started the
> thread and afaics has the same problem) did not reply.

How was this reverted? I get a bunch of conflicts as this commit was
added as a dependency of a patch later in the series.

So if this wants to be reverted from 6.1.y, can someone send me the
revert that has been tested to work?

thanks,

greg k-h