Hi,
I think my previous report got blocked because it contained HTML
subparts, so I am resending it. Our bpf runtime fuzzer (a customized
syzkaller) triggered a lockdep warning in the bpf subsystem that
indicates a potential deadlock. We can trigger this bug on v5.15.25
and v5.19. A BPF PoC follows below, and the lockdep warning is
attached at the end.
#include "/usr/local/include/vmlinux.h"
#include "/usr/include/bpf/bpf_helpers.h"
#define __uint(name, val) int (*name)[val]
#define __type(name, val) typeof(val) *name
#define __array(name, val) typeof(val) *name[]
#define SEC(name) \
_Pragma("GCC diagnostic push") \
_Pragma("GCC diagnostic ignored \"-Wignored-attributes\"") \
__attribute__((section(name), used)) \
_Pragma("GCC diagnostic pop")
#define DEFINE_BPF_MAP(the_map, TypeOfMap, MapFlags, TypeOfKey, \
                       TypeOfValue, MaxEntries) \
struct { \
__uint(type, TypeOfMap); \
__uint(map_flags, (MapFlags)); \
__uint(max_entries, (MaxEntries)); \
__type(key, TypeOfKey); \
__type(value, TypeOfValue); \
} the_map SEC(".maps");
DEFINE_BPF_MAP(map_0, BPF_MAP_TYPE_SOCKHASH, 0, uint32_t, uint32_t, 1005);
SEC("tp/sched/sched_switch")
int func(__u64 *ctx) {
	uint32_t v0 = 0;
	uint64_t v1 = 0;
	v1 = bpf_map_delete_elem(&map_0, &v0);
	return 0;
}
char _license[] SEC("license") = "GPL";
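
In case it helps with reproduction, a loader along the following lines
should be enough to load and attach the PoC (only a sketch using
standard libbpf calls; the object name "poc.o" is a placeholder for
whatever the program above is compiled into, and error handling is
simplified assuming libbpf >= 1.0 null-on-error semantics):

#include <unistd.h>
#include <bpf/libbpf.h>

int main(void)
{
	struct bpf_object *obj;
	struct bpf_program *prog;
	struct bpf_link *link;

	/* "poc.o" is a placeholder: the PoC above built with clang -target bpf. */
	obj = bpf_object__open_file("poc.o", NULL);
	if (!obj)
		return 1;
	if (bpf_object__load(obj))
		return 1;

	prog = bpf_object__find_program_by_name(obj, "func");
	if (!prog)
		return 1;

	/* SEC("tp/sched/sched_switch") lets libbpf pick the attach point. */
	link = bpf_program__attach(prog);
	if (!link)
		return 1;

	/* sched_switch fires constantly, so the map delete runs from
	 * tracepoint context almost immediately; just stay attached. */
	pause();

	bpf_link__destroy(link);
	bpf_object__close(obj);
	return 0;
}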
WARNING: CPU: 1 PID: 0 at kernel/softirq.c:376 __local_bh_enable_ip+0xc6/0x110
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.19.0+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:__local_bh_enable_ip+0xc6/0x110
Code: 00 ff ff 00 74 39 65 ff 0d 17 e8 25 4f e8 62 bb 3c 00 fb 0f 1f 44 00 00 5b 5d c3 cc cc cc cc 65 8b 05 7a ee 25 4f 85 c0 75 9b <0f> 0b eb 97 e8 41 ba 3c 00 eb a4 48 89 ef e8 b7 6e 16 00 eb ad 65
RSP: 0018:ffff888001b1fac8 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000201 RCX: 1ffffffff6a5dcf1
RDX: 0000000000000000 RSI: 0000000000000201 RDI: ffffffffb2c89dfd
RBP: ffffffffb2c89dfd R08: 0000000000000000 R09: ffffc90000a173cb
R10: fffff52000142e79 R11: 0000000000000001 R12: 00000000fffffffe
R13: ffffc90000a173c8 R14: 00000000049396b8 R15: 0000000000000004
FS:  0000000000000000(0000) GS:ffff888054a80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b30b20000 CR3: 0000000006eb2004 CR4: 0000000000370ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 sock_hash_delete_elem+0x20d/0x290
 bpf_prog_ca95c5394311bbec_func+0x2d/0x31
 trace_call_bpf+0x274/0x5f0
 ? tracing_prog_func_proto+0x490/0x490
 perf_trace_run_bpf_submit+0x96/0x1c0
 perf_trace_sched_switch+0x452/0x6c0
 ? psi_task_switch+0x3ae/0x4c0
 ? trace_raw_output_sched_wake_idle_without_ipi+0xb0/0xb0
 ? do_raw_spin_lock+0x125/0x270
 ? pick_next_task_fair+0x46a/0xee0
 __schedule+0x134f/0x28c0
 ? io_schedule_timeout+0x150/0x150
 ? tick_nohz_idle_exit+0x13a/0x3d0
 ? lockdep_hardirqs_on_prepare+0x27b/0x3f0
 ? trace_hardirqs_on+0x2d/0x100
 schedule_idle+0x5c/0xa0
 do_idle+0x2e0/0x550
 ? arch_cpu_idle_exit+0x40/0x40
 ? lockdep_hardirqs_on_prepare+0x27b/0x3f0
 ? trace_hardirqs_on+0x2d/0x100
 cpu_startup_entry+0x19/0x20
 start_secondary+0x241/0x2d0
 ? set_cpu_sibling_map+0x1ed0/0x1ed0
 ? set_bringup_idt_handler.constprop.0+0x98/0xc0
 ? start_cpu0+0xc/0xc
 secondary_startup_64_no_verify+0xd3/0xdb
 </TASK>
irq event stamp: 304481
hardirqs last  enabled at (304479): [<ffffffffb1043fca>] tick_nohz_idle_exit+0x13a/0x3d0
hardirqs last disabled at (304480): [<ffffffffb33a2e07>] __schedule+0x14d7/0x28c0
softirqs last  enabled at (304354): [<ffffffffb0dd1079>] __irq_exit_rcu+0x189/0x1f0
softirqs last disabled at (304481): [<ffffffffb2c89cbc>] sock_hash_delete_elem+0xcc/0x290
---[ end trace 0000000000000000 ]---

======================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
5.19.0+ #2 Not tainted
------------------------------------------------------
syz-executor.0/1299 [HC0[0]:SC0[2]:HE0:SE0] is trying to acquire:
ffffc90000a173e0 (&htab->buckets[i].lock){+...}-{2:2}, at: sock_hash_delete_elem+0xcc/0x290

and this task is already holding:
ffff888054a42c98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0x29a/0x28c0
which would create a new lock dependency:
 (&rq->__lock){-.-.}-{2:2} -> (&htab->buckets[i].lock){+...}-{2:2}

but this new dependency connects a HARDIRQ-irq-safe lock:
 (&rq->__lock){-.-.}-{2:2}

... which became HARDIRQ-irq-safe at:
  lock_acquire+0x1a1/0x500
  _raw_spin_lock_nested+0x2a/0x40
  scheduler_tick+0x9f/0x770
  update_process_times+0x10f/0x150
  tick_periodic+0x72/0x230
  tick_handle_periodic+0x46/0x120
  timer_interrupt+0x4a/0x80
  __handle_irq_event_percpu+0x214/0x7b0
  handle_irq_event+0xac/0x1e0
  handle_level_irq+0x245/0x6e0
  __common_interrupt+0x6c/0x160
  common_interrupt+0x78/0xa0
  asm_common_interrupt+0x22/0x40
  __x86_return_thunk+0x0/0x8
  _raw_spin_unlock_irqrestore+0x33/0x50
  __setup_irq+0x101a/0x1ca0
  request_threaded_irq+0x2b7/0x3e0
  hpet_time_init+0x2d/0x4b
  x86_late_time_init+0x63/0xa6
  start_kernel+0x443/0x517
  secondary_startup_64_no_verify+0xd3/0xdb

to a HARDIRQ-irq-unsafe lock:
 (&htab->buckets[i].lock){+...}-{2:2}

... which became HARDIRQ-irq-unsafe at:
...
  lock_acquire+0x1a1/0x500
  _raw_spin_lock_bh+0x34/0x40
  sock_hash_free+0x124/0x970
  __sys_bpf+0x39c4/0x6070
  __x64_sys_bpf+0x7a/0xc0
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x63/0xcd
other info that might help us debug this:
Possible interrupt unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&htab->buckets[i].lock);
                               local_irq_disable();
                               lock(&rq->__lock);
                               lock(&htab->buckets[i].lock);
  <Interrupt>
    lock(&rq->__lock);
*** DEADLOCK ***
2 locks held by syz-executor.0/1299:
#0: ffff888054a42c98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0x29a/0x28c0
#1: ffffffffb4bc25c0 (rcu_read_lock){....}-{1:2}, at: trace_call_bpf+0xa0/0x5f0
the dependencies between HARDIRQ-irq-safe lock and the holding lock:
-> (&rq->__lock){-.-.}-{2:2} {
IN-HARDIRQ-W at:
lock_acquire+0x1a1/0x500
_raw_spin_lock_nested+0x2a/0x40
scheduler_tick+0x9f/0x770
update_process_times+0x10f/0x150
tick_periodic+0x72/0x230
tick_handle_periodic+0x46/0x120
timer_interrupt+0x4a/0x80
__handle_irq_event_percpu+0x214/0x7b0
handle_irq_event+0xac/0x1e0
handle_level_irq+0x245/0x6e0
__common_interrupt+0x6c/0x160
common_interrupt+0x78/0xa0
asm_common_interrupt+0x22/0x40
__x86_return_thunk+0x0/0x8
_raw_spin_unlock_irqrestore+0x33/0x50
__setup_irq+0x101a/0x1ca0
request_threaded_irq+0x2b7/0x3e0
hpet_time_init+0x2d/0x4b
x86_late_time_init+0x63/0xa6
start_kernel+0x443/0x517
secondary_startup_64_no_verify+0xd3/0xdb
IN-SOFTIRQ-W at:
lock_acquire+0x1a1/0x500
_raw_spin_lock_nested+0x2a/0x40
try_to_wake_up+0x4b6/0x1650
call_timer_fn+0x187/0x590
__run_timers.part.0+0x66b/0xa20
run_timer_softirq+0x85/0x130
__do_softirq+0x1c2/0x845
__irq_exit_rcu+0x189/0x1f0
irq_exit_rcu+0xa/0x20
sysvec_apic_timer_interrupt+0x6f/0x90
asm_sysvec_apic_timer_interrupt+0x16/0x20
kmemleak_alloc+0x11/0x80
kmem_cache_alloc_trace+0x2ae/0x4a0
kprobe_add_ksym_blacklist+0xf4/0x300
kprobe_add_area_blacklist+0x6f/0xb0
init_kprobes+0x129/0x310
do_one_initcall+0xf0/0x550
kernel_init_freeable+0x50b/0x7c4
kernel_init+0x1f/0x1f0
ret_from_fork+0x22/0x30
INITIAL USE at:
lock_acquire+0x1a1/0x500
_raw_spin_lock_nested+0x2a/0x40
raw_spin_rq_lock_nested+0x11/0x20
_raw_spin_rq_lock_irqsave+0x25/0x50
rq_attach_root+0x25/0x340
sched_init+0x938/0xe26
start_kernel+0x1a8/0x517
secondary_startup_64_no_verify+0xd3/0xdb
}
... key at: [<ffffffffb5ddd400>] __key.265+0x0/0x40
the dependencies between the lock to be acquired
and HARDIRQ-irq-unsafe lock:
-> (&htab->buckets[i].lock){+...}-{2:2} {
HARDIRQ-ON-W at:
lock_acquire+0x1a1/0x500
_raw_spin_lock_bh+0x34/0x40
sock_hash_free+0x124/0x970
__sys_bpf+0x39c4/0x6070
__x64_sys_bpf+0x7a/0xc0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
INITIAL USE at:
lock_acquire+0x1a1/0x500
_raw_spin_lock_bh+0x34/0x40
sock_hash_free+0x124/0x970
__sys_bpf+0x39c4/0x6070
__x64_sys_bpf+0x7a/0xc0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
}
... key at: [<ffffffffb70688e0>] __key.0+0x0/0x40
... acquired at:
lock_acquire+0x1a1/0x500
_raw_spin_lock_bh+0x34/0x40
sock_hash_delete_elem+0xcc/0x290
bpf_prog_ca95c5394311bbec_func+0x2d/0x31
trace_call_bpf+0x274/0x5f0
perf_trace_run_bpf_submit+0x96/0x1c0
perf_trace_sched_switch+0x452/0x6c0
__schedule+0x134f/0x28c0
schedule+0xd4/0x1f0
exit_to_user_mode_prepare+0x124/0x230
syscall_exit_to_user_mode+0x16/0x50
do_syscall_64+0x48/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
stack backtrace:
CPU: 0 PID: 1299 Comm: syz-executor.0 Tainted: G W 5.19.0+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x9c/0xc9
check_irq_usage.cold+0x4ab/0x666
? print_shortest_lock_dependencies_backwards+0x80/0x80
? check_path.constprop.0+0x24/0x50
? register_lock_class+0xb8/0x1120
? lock_is_held_type+0xa6/0x120
? lockdep_lock+0xbe/0x1c0
? call_rcu_zapped+0xc0/0xc0
__lock_acquire+0x293f/0x5320
? lockdep_hardirqs_on_prepare+0x3f0/0x3f0
? __lock_acquire+0x15c3/0x5320
lock_acquire+0x1a1/0x500
? sock_hash_delete_elem+0xcc/0x290
? lock_release+0x720/0x720
? lock_acquire+0x1a1/0x500
? trace_call_bpf+0xa0/0x5f0
? lock_release+0x720/0x720
? __sanitizer_cov_trace_switch+0x50/0x90
_raw_spin_lock_bh+0x34/0x40
? sock_hash_delete_elem+0xcc/0x290
sock_hash_delete_elem+0xcc/0x290
bpf_prog_ca95c5394311bbec_func+0x2d/0x31
trace_call_bpf+0x274/0x5f0
? tracing_prog_func_proto+0x490/0x490
perf_trace_run_bpf_submit+0x96/0x1c0
perf_trace_sched_switch+0x452/0x6c0
? psi_task_switch+0x186/0x4c0
? trace_raw_output_sched_wake_idle_without_ipi+0xb0/0xb0
? do_raw_spin_lock+0x125/0x270
? pick_next_task_fair+0x46a/0xee0
__schedule+0x134f/0x28c0
? io_schedule_timeout+0x150/0x150
? fput+0x30/0x1a0
? ksys_write+0x1a8/0x260
schedule+0xd4/0x1f0
exit_to_user_mode_prepare+0x124/0x230
syscall_exit_to_user_mode+0x16/0x50
do_syscall_64+0x48/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f79ef08636f
Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 99 fd ff ff 48 8b 54
24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d
00 f0 ff ff 77 2d 44 89 c7 48 89 44 24 08 e8 cc fd ff ff 48
RSP: 002b:00007f79f0337300 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: 0000000000000011 RBX: 0000000000000011 RCX: 00007f79ef08636f
RDX: 0000000000000011 RSI: 00007f79f03374d0 RDI: 0000000000000002
RBP: 00007f79f03374d0 R08: 0000000000000000 R09: 0000000000000011
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000011
R13: 00007f79ef202080 R14: 0000000000000011 R15: 00007f79ef203ba0
</TASK>
Thank you,
Hsin-Wei Hung
--
Computer Science Department
University of California, Irvine
Hi,
On Mon, Feb 20, 2023 at 07:39 AM -06, Hsin-Wei Hung wrote:
> I think my previous report got blocked since it contained HTML
> subparts so I am sending it again. Our bpf runtime fuzzer (a
> customized syzkaller) triggered a lockdep warning in the bpf subsystem
> indicating a potential deadlock. We are able to trigger this bug on
> v5.15.25 and v5.19. The following code is a BPF PoC, and the lockdep
> warning is attached at the end.
Not sure if you've seen John's reply to the previous report:
https://lore.kernel.org/all/[email protected]/
Are you also fuzzing any newer kernel versions? Or was v5.19 the latest?
Did syzkaller find a reproducer?
Thanks,
Jakub
Hi Jakub,
Thanks for following up. Sorry that I did not receive the previous reply.
The latest version I tested is 5.19 (3d7cb6b04c3f) and we can reproduce the
issue with the BPF PoC included. Since we modified Syzkaller, we do not
have a Syzkaller reproducer.
I will follow John's suggestion and test the latest kernel and the bpf
tree. I will follow up if the issue still reproduces.
Thanks,
Hsin-Wei
Hi,
Just a quick update. I can still trigger the lockdep warning on bpf
tree (5b7c4cabbb65).
Thanks,
Hsin-Wei
Hsin-Wei Hung wrote:
> Hi,
>
> Just a quick update. I can still trigger the lockdep warning on bpf
> tree (5b7c4cabbb65).
>
> Thanks,
> Hsin-Wei
Thanks, I'll take a look.
On Mon, Feb 20, 2023 at 07:39:59AM -0600, Hsin-Wei Hung wrote:
> DEFINE_BPF_MAP(map_0, BPF_MAP_TYPE_SOCKHASH, 0, uint32_t, uint32_t, 1005);
> SEC("tp/sched/sched_switch")
> int func(__u64 *ctx) {
> uint32_t v0 = 0;
> uint64_t v1 = 0;
> v1 = bpf_map_delete_elem(&map_0, &v0);
> return 0;
> }
It looks like we have to disable hardirqs (instead of just softirqs) in
order to use sockmap safely from an interrupt context like the
sched/sched_switch tracepoint.
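
Roughly the direction that implies, as a sketch only (simplified
structure and function names, not the actual net/core/sock_map.c
code):

#include <linux/spinlock.h>
#include <linux/list.h>
#include <linux/rculist.h>

/* Simplified stand-in for a sockhash bucket, for illustration only. */
struct shtab_bucket_sketch {
	struct hlist_head head;
	spinlock_t lock;
};

static void bucket_delete_node(struct shtab_bucket_sketch *bucket,
			       struct hlist_node *node)
{
	unsigned long flags;

	/* spin_lock_bh() only masks softirqs, so the bucket lock stays
	 * HARDIRQ-unsafe while the sched_switch path now takes it with
	 * the HARDIRQ-safe rq->__lock already held. Masking hardirqs
	 * around the critical section removes that inversion and also
	 * avoids re-enabling BH with interrupts disabled.
	 */
	spin_lock_irqsave(&bucket->lock, flags);
	hlist_del_rcu(node);
	spin_unlock_irqrestore(&bucket->lock, flags);
}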
Thanks.