Hello,
When using modified Syzkaller to fuzz the Linux kernel-5.19, the
following crash was triggered.
We would appreciate a CVE ID if this is a security issue.
HEAD commit: 3d7cb6b04c3f Linux-5.19
git tree: upstream
console output:
https://drive.google.com/file/d/1NmOGWcfPnY2kSrS0nOwvG1AZ923jFQ3p/view?usp=sharing
kernel config: https://drive.google.com/file/d/1wgIUDwP5ho29AM-K7HhysSTfWFpfXYkG/view?usp=sharing
syz repro: https://drive.google.com/file/d/16hUTEGw4IcPQA9CZvoF7I5la42TlU-Cx/view?usp=sharing
C reproducer: https://drive.google.com/file/d/1YvgzTvV4qaSZPiD4D1IWGL4GuapzHD2w/view?usp=sharing
Environment:
Ubuntu 20.04 on Linux 5.4.0
QEMU 4.2.1:
qemu-system-x86_64 \
-m 2G \
-smp 2 \
-kernel /home/workdir/bzImage \
-append "console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
-drive file=/home/workdir/stretch.img,format=raw \
-net user,host=10.0.2.10,hostfwd=tcp:127.0.0.1:10021-:22 \
-net nic,model=e1000 \
-enable-kvm \
-nographic \
-pidfile vm.pid \
2>&1 | tee vm.log
If you fix this issue, please add the following tag to the commit:
Reported-by Jiacheng Xu<[email protected]>
============================================
WARNING: possible recursive locking detected
5.19.0 #1 Not tainted
--------------------------------------------
syz-executor/9064 is trying to acquire lock:
ffff888026b13130 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0},
at: lock_sock include/net/sock.h:1677 [inline]
ffff888026b13130 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0},
at: rfcomm_sk_state_change+0x6e/0x3a0 net/bluetooth/rfcomm/sock.c:73
but task is already holding lock:
ffff888026b13130 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0},
at: lock_sock include/net/sock.h:1677 [inline]
ffff888026b13130 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0},
at: rfcomm_sock_shutdown+0x57/0x220 net/bluetooth/rfcomm/sock.c:902
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM);
lock(sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM);
*** DEADLOCK ***
May be due to missing lock nesting notation
4 locks held by syz-executor/9064:
#0: ffff888110dac410 (&sb->s_type->i_mutex_key#12){+.+.}-{3:3}, at:
inode_lock include/linux/fs.h:741 [inline]
#0: ffff888110dac410 (&sb->s_type->i_mutex_key#12){+.+.}-{3:3}, at:
__sock_release+0x86/0x280 net/socket.c:649
#1: ffff888026b13130
(sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: lock_sock
include/net/sock.h:1677 [inline]
#1: ffff888026b13130
(sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at:
rfcomm_sock_shutdown+0x57/0x220 net/bluetooth/rfcomm/sock.c:902
#2: ffffffff8d7d8428 (rfcomm_mutex){+.+.}-{3:3}, at:
rfcomm_dlc_close+0x34/0x240 net/bluetooth/rfcomm/core.c:507
#3: ffff8880155d2d28 (&d->lock){+.+.}-{3:3}, at:
__rfcomm_dlc_close+0x157/0x710 net/bluetooth/rfcomm/core.c:487
stack backtrace:
CPU: 0 PID: 9064 Comm: syz-executor Not tainted 5.19.0 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_deadlock_bug kernel/locking/lockdep.c:2988 [inline]
check_deadlock kernel/locking/lockdep.c:3031 [inline]
validate_chain kernel/locking/lockdep.c:3816 [inline]
__lock_acquire.cold+0x152/0x3ca kernel/locking/lockdep.c:5053
lock_acquire kernel/locking/lockdep.c:5665 [inline]
lock_acquire+0x1ab/0x580 kernel/locking/lockdep.c:5630
lock_sock_nested+0x36/0xf0 net/core/sock.c:3389
lock_sock include/net/sock.h:1677 [inline]
rfcomm_sk_state_change+0x6e/0x3a0 net/bluetooth/rfcomm/sock.c:73
__rfcomm_dlc_close+0x1ab/0x710 net/bluetooth/rfcomm/core.c:489
rfcomm_dlc_close+0x1ea/0x240 net/bluetooth/rfcomm/core.c:520
__rfcomm_sock_close+0xda/0x260 net/bluetooth/rfcomm/sock.c:220
rfcomm_sock_shutdown+0xf4/0x220 net/bluetooth/rfcomm/sock.c:905
rfcomm_sock_release+0x5f/0x140 net/bluetooth/rfcomm/sock.c:925
__sock_release+0xcd/0x280 net/socket.c:650
sock_close+0x18/0x20 net/socket.c:1365
__fput+0x277/0x9d0 fs/file_table.c:317
task_work_run+0xe0/0x1a0 kernel/task_work.c:177
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0xaf5/0x2da0 kernel/exit.c:795
do_group_exit+0xd2/0x2f0 kernel/exit.c:925
get_signal+0x2842/0x2870 kernel/signal.c:2857
arch_do_signal_or_restart+0x82/0x2270 arch/x86/kernel/signal.c:869
exit_to_user_mode_loop kernel/entry/common.c:166 [inline]
exit_to_user_mode_prepare+0x174/0x260 kernel/entry/common.c:201
__syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294
do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f26c3295dfd
Code: Unable to access opcode bytes at RIP 0x7f26c3295dd3.
RSP: 002b:00007f26c43fcc58 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: fffffffffffffffc RBX: 00007f26c33bc0a0 RCX: 00007f26c3295dfd
RDX: 0000000000000080 RSI: 0000000020000000 RDI: 0000000000000004
RBP: 00007f26c32ff4c1 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f26c33bc0a0
R13: 00007ffc2c88f2df R14: 00007ffc2c88f480 R15: 00007f26c43fcdc0
</TASK>
Hi,
I believe the deadlock is more than possible but actually real.
I got a poc that could stably trigger the deadlock.
poc: https://drive.google.com/file/d/1PjqvMtHsrrGM1MIRGKl_zJGR-teAMMQy/view?usp=sharing
Description/Root cause:
In rfcomm_sock_shutdown(), lock_sock() is called when releasing and
shutting down socket.
However, lock_sock() has to be called once more when the sk_state is
changed because the
lock is not always held when rfcomm_sk_state_change() is called. One
such call stack is:
rfcomm_sock_shutdown():
lock_sock();
__rfcomm_sock_close():
rfcomm_dlc_close():
__rfcomm_dlc_close():
rfcomm_dlc_lock();
rfcomm_sk_state_change():
lock_sock();
Besides the recursive deadlock, there is also an
issue of a lock hierarchy inversion between rfcomm_dlc_lock() and
lock_sock() if the socket is locked in rfcomm_sk_state_change().
Reference: https://lore.kernel.org/all/[email protected]/
On Sun, Aug 28, 2022 at 12:19 AM Jiacheng Xu <[email protected]> wrote:
>
> Hello,
>
> When using modified Syzkaller to fuzz the Linux kernel-5.19, the
> following crash was triggered.
> We would appreciate a CVE ID if this is a security issue.
>
> HEAD commit: 3d7cb6b04c3f Linux-5.19
> git tree: upstream
>
> console output:
> https://drive.google.com/file/d/1NmOGWcfPnY2kSrS0nOwvG1AZ923jFQ3p/view?usp=sharing
> kernel config: https://drive.google.com/file/d/1wgIUDwP5ho29AM-K7HhysSTfWFpfXYkG/view?usp=sharing
> syz repro: https://drive.google.com/file/d/16hUTEGw4IcPQA9CZvoF7I5la42TlU-Cx/view?usp=sharing
> C reproducer: https://drive.google.com/file/d/1YvgzTvV4qaSZPiD4D1IWGL4GuapzHD2w/view?usp=sharing
>
> Environment:
> Ubuntu 20.04 on Linux 5.4.0
> QEMU 4.2.1:
> qemu-system-x86_64 \
> -m 2G \
> -smp 2 \
> -kernel /home/workdir/bzImage \
> -append "console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
> -drive file=/home/workdir/stretch.img,format=raw \
> -net user,host=10.0.2.10,hostfwd=tcp:127.0.0.1:10021-:22 \
> -net nic,model=e1000 \
> -enable-kvm \
> -nographic \
> -pidfile vm.pid \
> 2>&1 | tee vm.log
>
> If you fix this issue, please add the following tag to the commit:
> Reported-by Jiacheng Xu<[email protected]>
>
> ============================================
> WARNING: possible recursive locking detected
> 5.19.0 #1 Not tainted
> --------------------------------------------
> syz-executor/9064 is trying to acquire lock:
> ffff888026b13130 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0},
> at: lock_sock include/net/sock.h:1677 [inline]
> ffff888026b13130 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0},
> at: rfcomm_sk_state_change+0x6e/0x3a0 net/bluetooth/rfcomm/sock.c:73
>
> but task is already holding lock:
> ffff888026b13130 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0},
> at: lock_sock include/net/sock.h:1677 [inline]
> ffff888026b13130 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0},
> at: rfcomm_sock_shutdown+0x57/0x220 net/bluetooth/rfcomm/sock.c:902
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM);
> lock(sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 4 locks held by syz-executor/9064:
> #0: ffff888110dac410 (&sb->s_type->i_mutex_key#12){+.+.}-{3:3}, at:
> inode_lock include/linux/fs.h:741 [inline]
> #0: ffff888110dac410 (&sb->s_type->i_mutex_key#12){+.+.}-{3:3}, at:
> __sock_release+0x86/0x280 net/socket.c:649
> #1: ffff888026b13130
> (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: lock_sock
> include/net/sock.h:1677 [inline]
> #1: ffff888026b13130
> (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at:
> rfcomm_sock_shutdown+0x57/0x220 net/bluetooth/rfcomm/sock.c:902
> #2: ffffffff8d7d8428 (rfcomm_mutex){+.+.}-{3:3}, at:
> rfcomm_dlc_close+0x34/0x240 net/bluetooth/rfcomm/core.c:507
> #3: ffff8880155d2d28 (&d->lock){+.+.}-{3:3}, at:
> __rfcomm_dlc_close+0x157/0x710 net/bluetooth/rfcomm/core.c:487
>
> stack backtrace:
> CPU: 0 PID: 9064 Comm: syz-executor Not tainted 5.19.0 #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.13.0-1ubuntu1.1 04/01/2014
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:88 [inline]
> dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
> print_deadlock_bug kernel/locking/lockdep.c:2988 [inline]
> check_deadlock kernel/locking/lockdep.c:3031 [inline]
> validate_chain kernel/locking/lockdep.c:3816 [inline]
> __lock_acquire.cold+0x152/0x3ca kernel/locking/lockdep.c:5053
> lock_acquire kernel/locking/lockdep.c:5665 [inline]
> lock_acquire+0x1ab/0x580 kernel/locking/lockdep.c:5630
> lock_sock_nested+0x36/0xf0 net/core/sock.c:3389
> lock_sock include/net/sock.h:1677 [inline]
> rfcomm_sk_state_change+0x6e/0x3a0 net/bluetooth/rfcomm/sock.c:73
> __rfcomm_dlc_close+0x1ab/0x710 net/bluetooth/rfcomm/core.c:489
> rfcomm_dlc_close+0x1ea/0x240 net/bluetooth/rfcomm/core.c:520
> __rfcomm_sock_close+0xda/0x260 net/bluetooth/rfcomm/sock.c:220
> rfcomm_sock_shutdown+0xf4/0x220 net/bluetooth/rfcomm/sock.c:905
> rfcomm_sock_release+0x5f/0x140 net/bluetooth/rfcomm/sock.c:925
> __sock_release+0xcd/0x280 net/socket.c:650
> sock_close+0x18/0x20 net/socket.c:1365
> __fput+0x277/0x9d0 fs/file_table.c:317
> task_work_run+0xe0/0x1a0 kernel/task_work.c:177
> exit_task_work include/linux/task_work.h:38 [inline]
> do_exit+0xaf5/0x2da0 kernel/exit.c:795
> do_group_exit+0xd2/0x2f0 kernel/exit.c:925
> get_signal+0x2842/0x2870 kernel/signal.c:2857
> arch_do_signal_or_restart+0x82/0x2270 arch/x86/kernel/signal.c:869
> exit_to_user_mode_loop kernel/entry/common.c:166 [inline]
> exit_to_user_mode_prepare+0x174/0x260 kernel/entry/common.c:201
> __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
> syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294
> do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7f26c3295dfd
> Code: Unable to access opcode bytes at RIP 0x7f26c3295dd3.
> RSP: 002b:00007f26c43fcc58 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
> RAX: fffffffffffffffc RBX: 00007f26c33bc0a0 RCX: 00007f26c3295dfd
> RDX: 0000000000000080 RSI: 0000000020000000 RDI: 0000000000000004
> RBP: 00007f26c32ff4c1 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f26c33bc0a0
> R13: 00007ffc2c88f2df R14: 00007ffc2c88f480 R15: 00007f26c43fcdc0
> </TASK>
+cc Bluetooth and Networking maintainers
Hi Jiacheng,
On 28/8/22 04:03, Jiacheng Xu wrote:
> Hi,
>
> I believe the deadlock is more than possible but actually real.
> I got a poc that could stably trigger the deadlock.
>
> poc: https://drive.google.com/file/d/1PjqvMtHsrrGM1MIRGKl_zJGR-teAMMQy/view?usp=sharing
>
> Description/Root cause:
> In rfcomm_sock_shutdown(), lock_sock() is called when releasing and
> shutting down socket.
> However, lock_sock() has to be called once more when the sk_state is
> changed because the
> lock is not always held when rfcomm_sk_state_change() is called. One
> such call stack is:
>
> rfcomm_sock_shutdown():
> lock_sock();
> __rfcomm_sock_close():
> rfcomm_dlc_close():
> __rfcomm_dlc_close():
> rfcomm_dlc_lock();
> rfcomm_sk_state_change():
> lock_sock();
>
> Besides the recursive deadlock, there is also an
> issue of a lock hierarchy inversion between rfcomm_dlc_lock() and
> lock_sock() if the socket is locked in rfcomm_sk_state_change().
Thanks for the poc and for following the trail all the way to the root
cause - this was a known issue and I didn't realize the patch wasn't
applied.
> > Reference:
https://lore.kernel.org/all/[email protected]/
>
Fwiw, I tested the patch again with syzbot. It still applies cleanly to
the head of bluetooth-next and seems to address the root cause.
Any thoughts from the maintainers on this issue and the proposed fix?
Best,
Desmond
Hi Desmond,
On Mon, Aug 29, 2022 at 11:48 PM Desmond Cheong Zhi Xi
<[email protected]> wrote:
>
> +cc Bluetooth and Networking maintainers
>
> Hi Jiacheng,
>
> On 28/8/22 04:03, Jiacheng Xu wrote:
> > Hi,
> >
> > I believe the deadlock is more than possible but actually real.
> > I got a poc that could stably trigger the deadlock.
> >
> > poc: https://drive.google.com/file/d/1PjqvMtHsrrGM1MIRGKl_zJGR-teAMMQy/view?usp=sharing
> >
> > Description/Root cause:
> > In rfcomm_sock_shutdown(), lock_sock() is called when releasing and
> > shutting down socket.
> > However, lock_sock() has to be called once more when the sk_state is
> > changed because the
> > lock is not always held when rfcomm_sk_state_change() is called. One
> > such call stack is:
> >
> > rfcomm_sock_shutdown():
> > lock_sock();
> > __rfcomm_sock_close():
> > rfcomm_dlc_close():
> > __rfcomm_dlc_close():
> > rfcomm_dlc_lock();
> > rfcomm_sk_state_change():
> > lock_sock();
> >
> > Besides the recursive deadlock, there is also an
> > issue of a lock hierarchy inversion between rfcomm_dlc_lock() and
> > lock_sock() if the socket is locked in rfcomm_sk_state_change().
>
>
> Thanks for the poc and for following the trail all the way to the root
> cause - this was a known issue and I didn't realize the patch wasn't
> applied.
>
> > > Reference:
> https://lore.kernel.org/all/[email protected]/
> >
>
> Fwiw, I tested the patch again with syzbot. It still applies cleanly to
> the head of bluetooth-next and seems to address the root cause.
>
> Any thoughts from the maintainers on this issue and the proposed fix?
We probably need to introduce a test to rfcomm-tester to reproduce
this sort of problem, I also would like to avoid introducing a work
just to trigger a state change since we don't have such problem on the
likes of L2CAP socket so perhaps we need to rework the code a little
bit to avoid the locking problems.
> Best,
> Desmond
--
Luiz Augusto von Dentz
Hi Desmond,
On Tue, Aug 30, 2022 at 10:41 AM Luiz Augusto von Dentz
<[email protected]> wrote:
>
> Hi Desmond,
>
> On Mon, Aug 29, 2022 at 11:48 PM Desmond Cheong Zhi Xi
> <[email protected]> wrote:
> >
> > +cc Bluetooth and Networking maintainers
> >
> > Hi Jiacheng,
> >
> > On 28/8/22 04:03, Jiacheng Xu wrote:
> > > Hi,
> > >
> > > I believe the deadlock is more than possible but actually real.
> > > I got a poc that could stably trigger the deadlock.
> > >
> > > poc: https://drive.google.com/file/d/1PjqvMtHsrrGM1MIRGKl_zJGR-teAMMQy/view?usp=sharing
> > >
> > > Description/Root cause:
> > > In rfcomm_sock_shutdown(), lock_sock() is called when releasing and
> > > shutting down socket.
> > > However, lock_sock() has to be called once more when the sk_state is
> > > changed because the
> > > lock is not always held when rfcomm_sk_state_change() is called. One
> > > such call stack is:
> > >
> > > rfcomm_sock_shutdown():
> > > lock_sock();
> > > __rfcomm_sock_close():
> > > rfcomm_dlc_close():
> > > __rfcomm_dlc_close():
> > > rfcomm_dlc_lock();
> > > rfcomm_sk_state_change():
> > > lock_sock();
> > >
> > > Besides the recursive deadlock, there is also an
> > > issue of a lock hierarchy inversion between rfcomm_dlc_lock() and
> > > lock_sock() if the socket is locked in rfcomm_sk_state_change().
> >
> >
> > Thanks for the poc and for following the trail all the way to the root
> > cause - this was a known issue and I didn't realize the patch wasn't
> > applied.
> >
> > > > Reference:
> > https://lore.kernel.org/all/[email protected]/
> > >
> >
> > Fwiw, I tested the patch again with syzbot. It still applies cleanly to
> > the head of bluetooth-next and seems to address the root cause.
> >
> > Any thoughts from the maintainers on this issue and the proposed fix?
>
> We probably need to introduce a test to rfcomm-tester to reproduce
> this sort of problem, I also would like to avoid introducing a work
> just to trigger a state change since we don't have such problem on the
> likes of L2CAP socket so perhaps we need to rework the code a little
> bit to avoid the locking problems.
It looks like for L2CAP we use lock_sock_nested on teardown, we don't
have the exact same behavior in RFCOMM but I think that might be worth
a try if we can use that instead of introducing yet another work item.
> > Best,
> > Desmond
>
>
>
> --
> Luiz Augusto von Dentz
--
Luiz Augusto von Dentz