2024-01-10 15:16:10

by Mikhail Gavrilov

Subject: [BUG] Unloading mt7921e module cause use-after-free

Greetings,
To reproduce the bug, just run:
# rmmod mt7921e

Backtrace:
BUG: KASAN: use-after-free in tasklet_action_common.isra.0+0x6a4/0x7a0
Read of size 8 at addr ffff888146806748 by task ksoftirqd/5/48
CPU: 5 PID: 48 Comm: ksoftirqd/5 Tainted: G W L -------
--- 6.8.0-0.rc0.20240109git9f8413c4a66f.1.fc40.x86_64+debug #1
Hardware name: Micro-Star International Co., Ltd. MS-7D73/MPG B650I
EDGE WIFI (MS-7D73), BIOS 1.81 01/05/2024
Call Trace:
<TASK>
dump_stack_lvl+0x76/0xd0
print_report+0xcf/0x670
? tasklet_action_common.isra.0+0x6a4/0x7a0
kasan_report+0xa6/0xe0
? tasklet_action_common.isra.0+0x6a4/0x7a0
tasklet_action_common.isra.0+0x6a4/0x7a0
__do_softirq+0x215/0x8b9
? __pfx___do_softirq+0x10/0x10
? run_ksoftirqd+0x73/0x80
? __pfx_run_ksoftirqd+0x10/0x10
run_ksoftirqd+0x4b/0x80
smpboot_thread_fn+0x56d/0x900
? __kthread_parkme+0xbd/0x1f0
? __pfx_smpboot_thread_fn+0x10/0x10
kthread+0x2f2/0x3d0
? _raw_spin_unlock_irq+0x28/0x60
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x70
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
The buggy address belongs to the physical page:
page:0000000021f6fa86 refcount:0 mapcount:0 mapping:0000000000000000
index:0x1 pfn:0x146806
flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
page_type: 0xffffffff()
raw: 0017ffffc0000000 0000000000000000 dead000000000122 0000000000000000
raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888146806600: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff888146806680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffff888146806700: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^
ffff888146806780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff888146806800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

Demonstration: https://youtu.be/4dSuQp0aPkQ

I probably wouldn't have paid attention to this, because in practice I
never need to unload the mt7921e module.
But since commit 9270270d62191b7549296721e8d5f3dc0df01563 I see a
use-after-free on every system shutdown and reboot.

mikhail@secondary-ws ~/p/g/linux ((fcc51acf)|BISECTING)> git bisect good
9270270d62191b7549296721e8d5f3dc0df01563 is the first bad commit
commit 9270270d62191b7549296721e8d5f3dc0df01563
Author: Deren Wu <[email protected]>
Date: Tue Feb 14 10:49:57 2023 +0800

wifi: mt76: mt7921: fix PCI DMA hang after reboot

mt7921 just stop some workers and clean up chip status before reboot.
In stress test, there are working activities still running at the period
of .shutdown callback and that would cause some hosts cannot recover
DMA after reboot. To avoid the floating state in reboot, we use
mt7921_pci_remove() to fully deinit all resources.

Fixes: f23a0cea8bd6 ("wifi: mt76: mt7921e: add pci .shutdown() support")
Signed-off-by: Deren Wu <[email protected]>
Reviewed-by: AngeloGioacchino Del Regno
<[email protected]>
Signed-off-by: Felix Fietkau <[email protected]>

drivers/net/wireless/mediatek/mt76/mt7921/pci.c | 12 +-----------
1 file changed, 1 insertion(+), 11 deletions(-)

The oldest kernel I could build is 5.17, and on that kernel the
use-after-free has a different backtrace:
BUG: KASAN: use-after-free in mt7921_irq_handler+0xd8/0x100 [mt7921e]
Read of size 8 at addr ffff88824a7d3b78 by task rmmod/11115
CPU: 28 PID: 11115 Comm: rmmod Tainted: G W L 5.17.0 #10
Hardware name: Micro-Star International Co., Ltd. MS-7D73/MPG B650I
EDGE WIFI (MS-7D73), BIOS 1.81 01/05/2024
Call Trace:
<TASK>
dump_stack_lvl+0x6f/0xa0
print_address_description.constprop.0+0x1f/0x190
? mt7921_irq_handler+0xd8/0x100 [mt7921e]
? mt7921_irq_handler+0xd8/0x100 [mt7921e]
kasan_report.cold+0x7f/0x11b
? mt7921_irq_handler+0xd8/0x100 [mt7921e]
mt7921_irq_handler+0xd8/0x100 [mt7921e]
free_irq+0x627/0xaa0
devm_free_irq+0x94/0xd0
? devm_request_any_context_irq+0x160/0x160
? kobject_put+0x18d/0x4a0
mt7921_pci_remove+0x153/0x190 [mt7921e]
pci_device_remove+0xa2/0x1d0
__device_release_driver+0x346/0x6e0
driver_detach+0x1ef/0x2c0
bus_remove_driver+0xe7/0x2d0
? __check_object_size+0x57/0x310
pci_unregister_driver+0x26/0x250
__do_sys_delete_module+0x307/0x510
? free_module+0x6a0/0x6a0
? fpregs_assert_state_consistent+0x4b/0xb0
? rcu_read_lock_sched_held+0x10/0x70
? syscall_enter_from_user_mode+0x20/0x70
? trace_hardirqs_on+0x1c/0x130
do_syscall_64+0x5c/0x80
? trace_hardirqs_on_prepare+0x72/0x160
? do_syscall_64+0x68/0x80
? trace_hardirqs_on_prepare+0x72/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc83aad105b
Code: 73 01 c3 48 8b 0d bd 8d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66
2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 8b 0d 8d 8d 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc384c28c8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 0000560eec64a750 RCX: 00007fc83aad105b
RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000560eec64a7b8
RBP: 00007ffc384c28f0 R08: 1999999999999999 R09: 0000000000000000
R10: 00007fc83ab49ac0 R11: 0000000000000206 R12: 0000000000000000
R13: 00007ffc384c2b60 R14: 0000560eec64a750 R15: 0000000000000000
</TASK>
The buggy address belongs to the page:
page:00000000f94118a1 refcount:0 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x24a7d3
flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
raw: 0017ffffc0000000 0000000000000000 ffffea000929f488 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88824a7d3a00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff88824a7d3a80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffff88824a7d3b00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^
ffff88824a7d3b80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff88824a7d3c00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
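
In case it helps anyone compare logs, here is a minimal Python sketch for
pulling the KASAN report headers out of a saved dmesg. The regex and the
name kasan_reports() are my own assumptions, modelled only on the two
report headers quoted above; this is not an existing kernel tool.

```python
import re

# Matches the first line of a KASAN report as it appears in this thread, e.g.
# "BUG: KASAN: use-after-free in mt7921_irq_handler+0xd8/0x100 [mt7921e]".
# The trailing "[module]" part is optional (absent for built-in code).
KASAN_RE = re.compile(
    r"BUG: KASAN: (?P<kind>[\w-]+) in (?P<func>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
)


def kasan_reports(log_text):
    """Return (kind, function, module-or-None) for each KASAN report header."""
    return [
        (m["kind"], m["func"], m["module"])
        for m in KASAN_RE.finditer(log_text)
    ]
```

Feeding it the text of a saved log (for example the contents of the
attached dmesg files) should give one tuple per report, which makes it
easy to see that the 6.8 and 5.17 reports fault in different functions.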

All kernel logs and the .config are attached to this message.
What do you think?

--
Best Regards,
Mike Gavrilov.


Attachments:
dmesg-6.8.zip (50.17 kB)
dmesg-5.17.0.zip (51.91 kB)
.config.zip (58.90 kB)
build-error-5.16.zip (1.99 kB)

2024-01-12 08:06:45

by Mikhail Gavrilov

Subject: Re: [BUG] Unloading mt7921e module cause use-after-free

On Fri, Jan 12, 2024 at 12:31 PM Mikhail Gavrilov
<[email protected]> wrote:
>
> Thanks, this patch looks good to me.
> Demonstration: https://youtu.be/nKnA2ftVoXw
>
> Tested-by: Mikhail Gavrilov <[email protected]>
>

I noticed a DMA-API warning:
------------[ cut here ]------------
DMA-API: pci 0000:0f:00.0: device driver has pending DMA allocations
while released from device [count=21]
One of leaked entries details: [device address=0x00000000ffbda000]
[size=4096 bytes] [mapped with DMA_FROM_DEVICE] [mapped as single]
WARNING: CPU: 13 PID: 11252 at kernel/dma/debug.c:863
dma_debug_device_change+0x276/0x3d0
Modules linked in: mt7921e(-) uinput rfcomm snd_seq_dummy snd_hrtimer
nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet
nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4
nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 ip_set nf_tables qrtr bnep sunrpc binfmt_misc
intel_rapl_msr intel_rapl_common mt7921_common mt792x_lib
snd_hda_codec_hdmi mt76 edac_mce_amd snd_hda_intel snd_intel_dspcfg
snd_intel_sdw_acpi mac80211 snd_usb_audio kvm_amd snd_hda_codec
snd_usbmidi_lib btusb snd_hda_core uvcvideo snd_ump btrtl kvm ntel
snd_hwdep btbcm uvc snd_seq vfat btmtk videobuf2_vmalloc
videobuf2_memops libarc4 fat snd_seq_device videobuf2_v4l2 bluetooth
snd_pcm irqbypass videobuf2_common snd_timer cfg80211 rapl of snd
joydev pcspkr mc k10temp i2c_piix4 soundcore rfkill gpio_amdpt
gpio_generic loop nfnetlink zram amdgpu amdxcp i2c_algo_bit
drm_ttm_helper ttm drm_exec crct10dif_pclmul gpu_sched
crc32_pclmul crc32c_intel drm_suballoc_helper polyval_clmulni
drm_buddy polyval_generic drm_display_helper ghash_clmulni_intel nvme
sha512_ssse3 sha256_ssse3 ccp nvme_core sha1_ssse3 sp5100_tco ek
nvme_auth video wmi ip6_tables ip_tables fuse [last unloaded: mt7921e]
CPU: 13 PID: 11252 Comm: rmmod Tainted: G W L 6.7.0-check-fix+ #70
Hardware name: Micro-Star International Co., Ltd. MS-7D73/MPG B650I
EDGE WIFI (MS-7D73), BIOS 1.81 01/05/2024
RIP: 0010:dma_debug_device_change+0x276/0x3d0
Code: 54 24 08 e8 5c d7 c8 01 48 8b 54 24 08 48 89 c6 ff 34 24 49 89
e9 49 89 d8 44 89 e1 41 56 48 c7 c7 40 46 52 b3 e8 1a c9 d6 ff <0f> 0b
5a 59 4d 85 ed 74 49 48 c7 c7 80 47 52 b3 e8 15 04 f5 ff
RSP: 0018:ffffc90009c77bf0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 00000000ffbda000 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 0000000000000004 RDI: 0000000000000001
RBP: 0000000000001000 R08: 0000000000000001 R09: ffffed11f587fff9
R10: ffff888fac3fffcb R11: 0000000000000000 R12: 0000000000000015
R13: ffff888109b02180 R14: ffffffffb35265e0 R15: ffff88811418e0c0
FS: 00007f5a1e4177c0(0000) GS:ffff888fac200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8b1d394054 CR3: 0000000134cb8000 CR4: 0000000000f50ef0
PKRU: 55555554
Call Trace:
<TASK>
? __warn+0xcd/0x2b0
? __pfx_vprintk_emit+0x10/0x10
? dma_debug_device_change+0x276/0x3d0
? report_bug+0x2ea/0x390
? handle_bug+0x79/0xa0
? exc_invalid_op+0x17/0x40
? asm_exc_invalid_op+0x1a/0x20
? dma_debug_device_change+0x276/0x3d0
notifier_call_chain+0xa0/0x2a0
blocking_notifier_call_chain+0x64/0x90
bus_notify+0x51/0x70
device_release_driver_internal+0x42d/0x540
driver_detach+0xc5/0x180
bus_remove_driver+0x11e/0x2a0
? __check_object_size+0x5b/0x680
pci_unregister_driver+0x2a/0x250
__do_sys_delete_module+0x350/0x580
? __pfx___do_sys_delete_module+0x10/0x10
? syscall_exit_to_user_mode+0xce/0x2b0
? do_syscall_64+0xab/0x190
? rcu_is_watching+0x15/0xb0
? syscall_exit_to_user_mode+0xb6/0x2b0
? trace_hardirqs_on_prepare+0xe3/0x100
? do_syscall_64+0x58/0x190
do_syscall_64+0x9b/0x190
? trace_hardirqs_on_prepare+0xe3/0x100
entry_SYSCALL_64_after_hwframe+0x6e/0x76
RIP: 0033:0x7f5a1dd2d05b
Code: 73 01 c3 48 8b 0d bd 8d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66
2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 8b 0d 8d 8d 0c 00 f7 d8 64 89 01
RSP: 002b:00007ffc5ae71b68 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 000055e490fee760 RCX: 00007f5a1dd2d05b
RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055e490fee7c8
RBP: 00007ffc5ae71b90 R08: 1999999999999999 R09: 0000000000000000
R10: 00007f5a1dda5ac0 R11: 0000000000000206 R12: 0000000000000000
R13: 00007ffc5ae71df0 R14: 000055e490fee760 R15: 0000000000000000
</TASK>
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<ffffffffb023b291>] copy_process+0x2111/0x88e0
softirqs last enabled at (0): [<ffffffffb023b2f3>] copy_process+0x2173/0x88e0
softirqs last disabled at (0): [<0000000000000000>] 0x0
---[ end trace 0000000000000000 ]---
DMA-API: Mapped at:
debug_dma_map_page+0x60/0x3c0
dma_map_page_attrs+0x2fc/0xba0
page_pool_dma_map+0xaf/0x2d0
__page_pool_alloc_pages_slow+0x36f/0xab0
page_pool_alloc_frag+0x4fa/0x9c0

It happens when I unload the mt7921e driver while some application
(speedtest in my case) is generating heavy network traffic in the
background.

Demonstration: https://youtu.be/XYO4ueVlh90

But I am not sure that it is related to the mt7921e driver.
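
To check whether the leak count changes between runs, a small Python
sketch like the following could extract it from a saved log. The regex is
an assumption based only on the warning text quoted above, and
dma_leaks() is a hypothetical helper name. Whitespace is matched loosely
because mail clients wrap the line.

```python
import re

# dmesg prints this warning on a single line; in mail it is often wrapped,
# so every gap between words is matched as "\s+" (spaces or newlines).
LEAK_RE = re.compile(
    r"DMA-API:\s+(?P<bus>\w+)\s+(?P<dev>\S+):\s+device\s+driver\s+has\s+"
    r"pending\s+DMA\s+allocations\s+while\s+released\s+from\s+device\s+"
    r"\[count=(?P<count>\d+)\]"
)


def dma_leaks(log_text):
    """Return (bus, device, leaked-mapping count) for each DMA-API leak warning."""
    return [
        (m["bus"], m["dev"], int(m["count"]))
        for m in LEAK_RE.finditer(log_text)
    ]
```

Comparing the counts from an idle unload and from an unload under heavy
traffic would show whether the number of pending mappings depends on the
network load.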

--
Best Regards,
Mike Gavrilov.


Attachments:
dmesg-6.8-with-patch.zip (228.00 B)