Hi,

The following warning can be triggered with the C reproducer in
the link. The repro starts 32 threads, each of which attaches a
tracepoint program to `ext4_mark_inode_dirty`. The prog loads the
following insns, which simply send a signal to the current process
and then wait.

There seem to be issues in the irq_work queued by
`bpf_send_signal` and handled by `do_bpf_send_signal`; I am also
wondering what happens if the task recorded in
`send_signal_irq_work` has already exited by the time the callback
is invoked.
This can be reproduced on:
HEAD commit: 6d0c4b11e743 ("libbpf: Poison strlcpy()")
git tree: bpf-next
console output: https://pastebin.com/raw/ZtVM15Jx
kernel config : https://pastebin.com/raw/nt6XW0Sz
C reproducer : https://pastebin.com/raw/NHqy5tR6
func#0 @0
0: R1=ctx(off=0,imm=0) R10=fp0
0: (18) r0 = 0x0 ; R0_w=0
2: (18) r6 = 0x0 ; R6_w=0
4: (18) r7 = 0x0 ; R7_w=0
6: (18) r8 = 0x0 ; R8_w=0
8: (18) r9 = 0x0 ; R9_w=0
10: (18) r1 = 0x700000007 ; R1_w=30064771079
12: (85) call bpf_send_signal#109 ; R0_w=scalar()
13: (95) exit
processed 8 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(lock->magic != lock)
WARNING: CPU: 3 PID: 41309 at kernel/locking/mutex-debug.c:74 debug_mutex_unlock+0x1fc/0x250 kernel/locking/mutex-debug.c:74
Modules linked in:
CPU: 3 PID: 41309 Comm: systemd-udevd Not tainted 6.2.0-rc2-00302-g6d0c4b11e743 #153
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:debug_mutex_unlock+0x1fc/0x250 kernel/locking/mutex-debug.c:74
Code: 84 c0 75 59 8b 15 34 92 3a 11 85 d2 0f 84 c5 fe ff ff e9 e6 fe ff ff 48 c7 c6 a0 50 4c 8a 48 c7 c7 a0 4e 4c 8a e8 9e bb 5f 08 <0f> 0b eb b2 e8 bb 9d 6e 00 e9 be fe ff ff e8 b1 9d 6e 00 e9 8b fe
RSP: 0018:ffffc90012b7f6f8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff88804145a2d0 RCX: 0000000000000000
RDX: ffff88803f468000 RSI: ffffffff81671400 RDI: fffff5200256fed1
RBP: ffffffff929e2a00 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000080000001 R11: ffffffff90d3f0c3 R12: ffff88804145a2d8
R13: fffffbfff253c540 R14: ffffffff929e2a00 R15: ffffc90012b7f798
FS: 00007f36a937e8c0(0000) GS:ffff888135c80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056181882c020 CR3: 000000002a9a2000 CR4: 0000000000750ee0
PKRU: 55555554
Call Trace:
<TASK>
__mutex_unlock_slowpath+0x197/0x630 kernel/locking/mutex.c:933
devlink_compat_running_version+0x128/0x6c0 net/devlink/leftover.c:12237
dev_ethtool+0x55f/0x53c0 net/ethtool/ioctl.c:3091
dev_ioctl+0x29e/0x1050 net/core/dev_ioctl.c:524
sock_do_ioctl+0x1be/0x250 net/socket.c:1183
sock_ioctl+0x205/0x6a0 net/socket.c:1286
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x189/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f36a94a55f7
Code: 00 00 00 48 8b 05 99 c8 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 69 c8 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007ffd663b3d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00005618188043f0 RCX: 00007f36a94a55f7
RDX: 00007ffd663b3e60 RSI: 0000000000008946 RDI: 0000000000000006
RBP: 00007ffd663b3e90 R08: 0000561818887fa0 R09: 0000000000000000
R10: 00007f36a937e6c0 R11: 0000000000000246 R12: 0000561818887fa0
R13: 0000561818832c70 R14: 00007ffd663b3e60 R15: 0000000000000007
</TASK>
On 1/8/23 11:44 PM, Hao Sun wrote:
> Hi,
>
> The following warning can be triggered with the C reproducer in
> the link. The repro starts 32 threads, each attaches a tracepoint
> into `ext4_mark_inode_dirty`. The prog loads the following insns
> that simply sends signal to current proc, and then wait.
>
> Seems issues in queued irq_work with `do_bpf_send_signal`, also
> I'm wondering what if the task in `send_signal_irq_work` exited,
> at the time the callback invoked.
Somehow, I cannot reproduce the issue in my qemu environment
with the below kernel config and C reproducer.

But could you try the following patch to see whether it
fixes the issue in your environment?
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 23ce498bca97..1b26d51caf31 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -831,6 +831,7 @@ static void do_bpf_send_signal(struct irq_work *entry)
 
 	work = container_of(entry, struct send_signal_irq_work, irq_work);
 	group_send_sig_info(work->sig, SEND_SIG_PRIV, work->task, work->type);
+	put_task_struct(work->task);
 }
 
 static int bpf_send_signal_common(u32 sig, enum pid_type type)
@@ -862,7 +863,7 @@ static int bpf_send_signal_common(u32 sig, enum pid_type type)
 	 * to the irq_work. The current task may change when queued
 	 * irq works get executed.
 	 */
-	work->task = current;
+	work->task = get_task_struct(current);
 	work->sig = sig;
 	work->type = type;
 	irq_work_queue(&work->irq_work);
> [...]
> On 13 Jan 2023, at 2:45 PM, Yonghong Song <[email protected]> wrote:
>
> On 1/8/23 11:44 PM, Hao Sun wrote:
>> [...]
>
> Somehow, I cannot reproduce the issue in my qemu environment
> with below kernel config and C reproducer.
>
> But could you try the following patch to see whether it
> fixed the issue in your environment?
I tested the below patch on my local machine; it seems to fix the
issue. Before applying the patch, the reproducer could still
trigger the reported warning on a latest bpf-next build; after
applying the patch, the warning no longer appears.

The test was conducted on: dfff86f8eb6a ("Merge branch 'samples/bpf:
modernize BPF functionality test programs'")
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 23ce498bca97..1b26d51caf31 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -831,6 +831,7 @@ static void do_bpf_send_signal(struct irq_work *entry)
>
> work = container_of(entry, struct send_signal_irq_work, irq_work);
> group_send_sig_info(work->sig, SEND_SIG_PRIV, work->task, work->type);
> + put_task_struct(work->task);
> }
>
> static int bpf_send_signal_common(u32 sig, enum pid_type type)
> @@ -862,7 +863,7 @@ static int bpf_send_signal_common(u32 sig, enum pid_type type)
> * to the irq_work. The current task may change when queued
> * irq works get executed.
> */
> - work->task = current;
> + work->task = get_task_struct(current);
> work->sig = sig;
> work->type = type;
> irq_work_queue(&work->irq_work);
>
On 1/15/23 6:24 PM, Hao Sun wrote:
>
>> [...]
>
> Tested the below patch on my local machine, seems fixed the issue.
>
> Before applying the patch, the reproducer can still trigger the
> reported issue on a latest bpf-next build; After applying the
> patch, the warning no longer appears.
>
> The test is conducted on: dfff86f8eb6a (“Merge branch 'samples/bpf:
> modernize BPF functionality test programs'")
Thanks for testing. I will submit a patch shortly with your
Reported-by and Tested-by.