Hi,

The following warning can be triggered with the C reproducer linked below.
Syzbot also reported this several days ago; according to the syzbot
dashboard
(https://syzkaller.appspot.com/bug?id=179313fb375161d50a98311a28b8e2fc5f7350f9),
Jiri posted a fix that uses the bpf prog `active` field, merged as commit
05b24ff9b2cfab ("bpf: Prevent bpf program recursion..."). But the warning
can still be triggered on 247f34f7b803 (Linux v6.1-rc2), which already
contains that patch, so this still seems to be an issue.
HEAD commit: 247f34f7b803 Linux 6.1-rc2
git tree: upstream
console output: https://pastebin.com/raw/kNw8JCu5
kernel config: https://pastebin.com/raw/sE5QK5HL
C reproducer: https://pastebin.com/raw/X96ASi27
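The reproducer boils down to attaching several raw tracepoint programs to
the contention_begin tracepoint, each of which just calls bpf_trace_printk.
A minimal sketch of one such program (illustrative only, not the exact
reproducer; builds with clang -target bpf):

    // sketch: one raw_tp program on contention_begin calling bpf_printk,
    // which uses the bpf_trace_printk helper underneath
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    char LICENSE[] SEC("license") = "GPL";

    SEC("raw_tracepoint/contention_begin")
    int on_contention(struct bpf_raw_tracepoint_args *ctx)
    {
            bpf_printk("contention");
            return 0;
    }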
------------[ cut here ]------------
WARNING: CPU: 6 PID: 6850 at kernel/bpf/helpers.c:770
try_get_fmt_tmp_buf kernel/bpf/helpers.c:770 [inline]
WARNING: CPU: 6 PID: 6850 at kernel/bpf/helpers.c:770
bpf_bprintf_prepare+0xf6a/0x1170 kernel/bpf/helpers.c:818
Modules linked in:
CPU: 6 PID: 6850 Comm: a.out Not tainted 6.1.0-rc2-dirty #23
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:try_get_fmt_tmp_buf kernel/bpf/helpers.c:770 [inline]
RIP: 0010:bpf_bprintf_prepare+0xf6a/0x1170 kernel/bpf/helpers.c:818
Code: c6 e8 ba 51 00 07 83 c0 01 48 98 48 01 c5 48 89 6c 24 08 e8 b8
0a eb ff 8d 6b 02 83 44 24 10 01 e9 2d f5 ff ff e8 a6 0a eb ff <0f> 0b
65 ff 0d 85 bf 7c 7e bf 01 00 00 00 41 bc f0 ff ff ff e8 2d
RSP: 0018:ffffc90015a96c20 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffff88814f8057c0
RDX: 0000000000000000 RSI: ffff88814f8057c0 RDI: 0000000000000002
RBP: ffffc90015a96d50 R08: ffffffff818681ba R09: 0000000000000003
R10: 0000000000000005 R11: fffffbfff1a25ab2 R12: 0000000000000003
R13: 0000000000000004 R14: ffffc90015a96e08 R15: 0000000000000003
FS: 00007f012f4d2440(0000) GS:ffff8880b9d80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f012f19ac28 CR3: 0000000148da3000 CR4: 0000000000350ee0
Call Trace:
<TASK>
____bpf_trace_printk kernel/trace/bpf_trace.c:385 [inline]
bpf_trace_printk+0xab/0x420 kernel/trace/bpf_trace.c:376
bpf_prog_0605f9f479290f07+0x2f/0x33
bpf_dispatcher_nop_func include/linux/bpf.h:963 [inline]
__bpf_prog_run include/linux/filter.h:600 [inline]
bpf_prog_run include/linux/filter.h:607 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2254 [inline]
bpf_trace_run2+0x14d/0x3d0 kernel/trace/bpf_trace.c:2293
__bpf_trace_contention_begin+0xb5/0xf0 include/trace/events/lock.h:95
__traceiter_contention_begin+0x56/0x90 include/trace/events/lock.h:95
trace_contention_begin include/trace/events/lock.h:95 [inline]
__pv_queued_spin_lock_slowpath+0x542/0xff0 kernel/locking/qspinlock.c:405
pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:591 [inline]
queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline]
queued_spin_lock include/asm-generic/qspinlock.h:114 [inline]
do_raw_spin_lock+0x204/0x2d0 kernel/locking/spinlock_debug.c:115
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline]
_raw_spin_lock_irqsave+0x41/0x50 kernel/locking/spinlock.c:162
____bpf_trace_printk kernel/trace/bpf_trace.c:390 [inline]
bpf_trace_printk+0xcf/0x420 kernel/trace/bpf_trace.c:376
</TASK>
Regards
Hao Sun
On Thu, Oct 27, 2022 at 10:27:28AM +0800, Hao Sun wrote:
SNIP
> C reproducer: https://pastebin.com/raw/X96ASi27
hi,
right, that fix addressed the issue for a single bpf program, but it does
not prevent the recursion when there are multiple programs hooked on the
contention_begin tracepoint, each calling bpf_trace_printk

I'm not sure we can do something there.. will check

do you run just the reproducer, or do you also load the machine somehow?
I cannot hit the issue so far
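for reference, the per-program guard that commit adds to __bpf_trace_run
looks roughly like this (paraphrased):

    static __always_inline void
    __bpf_trace_run(struct bpf_prog *prog, u64 *args)
    {
            cant_sleep();
            /* skip the program if it is already running on this cpu */
            if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
                    bpf_prog_inc_misses_counter(prog);
                    goto out;
            }
            rcu_read_lock();
            (void) bpf_prog_run(prog, args);
            rcu_read_unlock();
    out:
            this_cpu_dec(*(prog->active));
    }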
thanks,
jirka
On Thu, Oct 27, 2022 at 7:24 PM Jiri Olsa <[email protected]> wrote:
SNIP
> do you run just the reproducer, or do you also load the machine somehow?
> I cannot hit the issue so far
>
Hi,

my last email had formatting issues, so I'm resending it here.

I built the kernel with the config in the link, which contains
CONFIG_CMDLINE="earlyprintk=serial net.ifnames=0
sysctl.kernel.hung_task_all_cpu_backtrace=1 panic_on_warn=1 ...", booted
it with a normal qemu setup, and then the warning can be triggered by
running the reproducer.
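A typical invocation looks roughly like this (image path and options are
placeholders, not the exact command):

    qemu-system-x86_64 -m 2G -smp 8 \
        -kernel arch/x86/boot/bzImage \
        -append "console=ttyS0 root=/dev/sda rw" \
        -drive file=image.img,format=raw \
        -enable-kvm -nographic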
Also, I'm willing to test a proposed patch, if any.
Thanks
Hao
On Thu, Oct 27, 2022 at 07:45:16PM +0800, Hao Sun wrote:
SNIP
>
> Also, I'm willing to test a proposed patch, if any.
fyi I reproduced that.. will check if we can do anything about that
thanks,
jirka
On Wed, Nov 02, 2022 at 03:28:47PM +0100, Jiri Olsa wrote:
SNIP
> fyi I reproduced that.. will check if we can do anything about that
I reproduced this with a set of 8 programs, all hooked to the
contention_begin tracepoint, and here's what I think is happening:

all programs (prog1 .. prog8) just call the bpf_trace_printk helper, and
I'm running 'perf bench sched messaging' to load the machine

at some point some contended lock triggers trace_contention_begin:
trace_contention_begin
  __traceiter_contention_begin  <-- iterates all functions attached to tracepoint
    __bpf_trace_run(prog1)
      prog1->active = 1
      bpf_prog_run(prog1)
        bpf_trace_printk
          bpf_bprintf_prepare  <-- takes buffer 1 out of 3
          raw_spin_lock_irqsave(trace_printk_lock)

          # we have a single global trace_printk_lock, so we will trigger
          # its trace_contention_begin at some point

            trace_contention_begin
              __traceiter_contention_begin
                __bpf_trace_run(prog1)
                  prog1->active blocks  <-- prog1 is already 'running', skipping the execution
                __bpf_trace_run(prog2)
                  prog2->active = 1
                  bpf_prog_run(prog2)
                    bpf_trace_printk
                      bpf_bprintf_prepare  <-- takes buffer 2 out of 3
                      raw_spin_lock_irqsave(trace_printk_lock)
                        trace_contention_begin
                          __traceiter_contention_begin
                            __bpf_trace_run(prog1)
                              prog1->active blocks  <-- prog1 is already 'running', skipping the execution
                            __bpf_trace_run(prog2)
                              prog2->active blocks  <-- prog2 is already 'running', skipping the execution
                            __bpf_trace_run(prog3)
                              prog3->active = 1
                              bpf_prog_run(prog3)
                                bpf_trace_printk
                                  bpf_bprintf_prepare  <-- takes buffer 3 out of 3
                                  raw_spin_lock_irqsave(trace_printk_lock)
                                    trace_contention_begin
                                      __traceiter_contention_begin
                                        __bpf_trace_run(prog1)
                                          prog1->active blocks  <-- prog1 is already 'running', skipping the execution
                                        __bpf_trace_run(prog2)
                                          prog2->active blocks  <-- prog2 is already 'running', skipping the execution
                                        __bpf_trace_run(prog3)
                                          prog3->active blocks  <-- prog3 is already 'running', skipping the execution
                                        __bpf_trace_run(prog4)
                                          prog4->active = 1
                                          bpf_prog_run(prog4)
                                            bpf_trace_printk
                                              bpf_bprintf_prepare  <-- tries to take buffer 4 out of 3 -> WARNING
the exact path varies with the contention on trace_printk_lock, so I saw
different nesting among the 8 programs, but all runs eventually ended up
at 4 levels of nesting and hit the warning
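for context, the WARNING fires in try_get_fmt_tmp_buf, which hands out at
most MAX_BPRINTF_NEST_LEVEL (3) per-CPU buffers; paraphrased from
kernel/bpf/helpers.c:

    static int try_get_fmt_tmp_buf(char **tmp_buf)
    {
            struct bpf_bprintf_buffers *bufs;
            int nest_level;

            preempt_disable();
            nest_level = this_cpu_inc_return(bpf_bprintf_nest_level);
            /* the 4th nested user on this cpu has no buffer left */
            if (WARN_ON_ONCE(nest_level > MAX_BPRINTF_NEST_LEVEL)) {
                    this_cpu_dec(bpf_bprintf_nest_level);
                    preempt_enable();
                    return -EBUSY;
            }
            bufs = this_cpu_ptr(&bpf_bprintf_bufs);
            *tmp_buf = bufs->tmp_bufs[nest_level - 1];

            return 0;
    }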
I think we could move the 'active' flag protection from the program to the
tracepoint level (patch below), to prevent nested execution of the same
tracepoint, so it'd look like:
trace_contention_begin
  __traceiter_contention_begin
    __bpf_trace_run(prog1) {
      contention_begin.active = 1
      bpf_prog_run(prog1)
        bpf_trace_printk
          bpf_bprintf_prepare
          raw_spin_lock_irqsave(trace_printk_lock)
            trace_contention_begin
              __traceiter_contention_begin
                __bpf_trace_run(prog1)
                  blocked because contention_begin.active == 1
                __bpf_trace_run(prog2)
                  blocked because contention_begin.active == 1
                __bpf_trace_run(prog3)
                ...
                __bpf_trace_run(prog8)
                  blocked because contention_begin.active == 1
          raw_spin_unlock_irqrestore
          bpf_bprintf_cleanup
      contention_begin.active = 0
    }
    __bpf_trace_run(prog2) {
      contention_begin.active = 1
      bpf_prog_run(prog2)
      ...
      contention_begin.active = 0
    }
do we need bpf program execution in nested tracepoints?
we could actually allow 3 nesting levels for this case.. thoughts?
thanks,
jirka
---
diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index 6a13220d2d27..5a354ae096e5 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -78,11 +78,15 @@
#define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
#define __BPF_DECLARE_TRACE(call, proto, args) \
+static DEFINE_PER_CPU(int, __bpf_trace_tp_active_##call); \
static notrace void \
__bpf_trace_##call(void *__data, proto) \
{ \
struct bpf_prog *prog = __data; \
- CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
+ \
+ if (likely(this_cpu_inc_return(__bpf_trace_tp_active_##call) == 1)) \
+ CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
+ this_cpu_dec(__bpf_trace_tp_active_##call); \
}
#undef DECLARE_EVENT_CLASS
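with this change, the generated handler for e.g. contention_begin would
expand to roughly this (the proto comes from include/trace/events/lock.h):

    static DEFINE_PER_CPU(int, __bpf_trace_tp_active_contention_begin);

    static notrace void
    __bpf_trace_contention_begin(void *__data, void *lock, unsigned int flags)
    {
            struct bpf_prog *prog = __data;

            /* run progs only when this tracepoint is not already
             * active on this cpu */
            if (likely(this_cpu_inc_return(__bpf_trace_tp_active_contention_begin) == 1))
                    bpf_trace_run2(prog, (u64)(unsigned long)lock, (u64)flags);
            this_cpu_dec(__bpf_trace_tp_active_contention_begin);
    }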
On Mon, Nov 7, 2022 at 4:31 AM Jiri Olsa <[email protected]> wrote:
SNIP
> #define __BPF_DECLARE_TRACE(call, proto, args) \
> +static DEFINE_PER_CPU(int, __bpf_trace_tp_active_##call); \
> static notrace void \
> __bpf_trace_##call(void *__data, proto) \
> { \
> struct bpf_prog *prog = __data; \
> - CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
> + \
> + if (likely(this_cpu_inc_return(__bpf_trace_tp_active_##call) == 1)) \
> + CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
> + this_cpu_dec(__bpf_trace_tp_active_##call); \
> }
This approach will hurt real use cases where
multiple and different raw_tp progs run on the same cpu.
Instead let's disallow attaching to trace_contention and
potentially any other hook with similar recursion properties.
Another option is to add a recursion check to trace_contention itself.
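a hypothetical, untested sketch of that last option, guarding the
tracepoint at its call site (names made up for illustration):

    static DEFINE_PER_CPU(int, contention_tp_active);

    /* would be called instead of trace_contention_begin() in the
     * lock slowpaths; drops nested contention events on this cpu */
    static inline void do_trace_contention_begin(void *lock, unsigned int flags)
    {
            if (this_cpu_inc_return(contention_tp_active) == 1)
                    trace_contention_begin(lock, flags);
            this_cpu_dec(contention_tp_active);
    }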
On Mon, Nov 07, 2022 at 12:49:01PM -0800, Alexei Starovoitov wrote:
SNIP
> This approach will hurt real use cases where
> multiple and different raw_tp progs run on the same cpu.
would 2 levels of nesting help in here?

I can imagine the change above would break the use case where we want to
trigger a tracepoint in irq context that interrupted a task that's already
in the same tracepoint

with 2 levels of nesting we would still trigger that tracepoint from irq
and would still be safe with the bpf_bprintf_prepare buffers

what other use case do I miss?
thanks,
jirka
On Wed, Nov 9, 2022 at 5:49 AM Jiri Olsa <[email protected]> wrote:
SNIP
> with 2 levels of nesting we would still trigger that tracepoint from irq
> and would still be safe with the bpf_bprintf_prepare buffers
How would these 2 levels work?
On Wed, Nov 09, 2022 at 11:41:54AM -0800, Alexei Starovoitov wrote:
SNIP
>
> How would these 2 levels work?
just using the active counter like below, but I haven't tested it yet
jirka
---
diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index 6a13220d2d27..ca5dd34478b7 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -78,11 +78,15 @@
#define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
#define __BPF_DECLARE_TRACE(call, proto, args) \
+static DEFINE_PER_CPU(int, __bpf_trace_tp_active_##call); \
static notrace void \
__bpf_trace_##call(void *__data, proto) \
{ \
struct bpf_prog *prog = __data; \
- CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
+ \
+ if (likely(this_cpu_inc_return(__bpf_trace_tp_active_##call) < 3)) \
+ CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
+ this_cpu_dec(__bpf_trace_tp_active_##call); \
}
#undef DECLARE_EVENT_CLASS
On Thu, Nov 10, 2022 at 12:53:16AM +0100, Jiri Olsa wrote:
SNIP
>
> just using the active counter like below, but I haven't tested it yet
>
> jirka
seems to be working; with the '< 3' bound each tracepoint handler can nest
at most twice per CPU, so this path takes at most two of the three
bpf_bprintf_prepare buffers at once

Hao Sun, could you please test this patch?
thanks,
jirka
>
>
> ---
> diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> index 6a13220d2d27..ca5dd34478b7 100644
> --- a/include/trace/bpf_probe.h
> +++ b/include/trace/bpf_probe.h
> @@ -78,11 +78,15 @@
> #define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
>
> #define __BPF_DECLARE_TRACE(call, proto, args) \
> +static DEFINE_PER_CPU(int, __bpf_trace_tp_active_##call); \
> static notrace void \
> __bpf_trace_##call(void *__data, proto) \
> { \
> struct bpf_prog *prog = __data; \
> - CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
> + \
> + if (likely(this_cpu_inc_return(__bpf_trace_tp_active_##call) < 3)) \
> + CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
> + this_cpu_dec(__bpf_trace_tp_active_##call); \
> }
>
> #undef DECLARE_EVENT_CLASS
On Fri, Nov 11, 2022 at 10:45 PM Jiri Olsa <[email protected]> wrote:
SNIP
> seems to be working
> Hao Sun, could you please test this patch?
>
Hi Jirka,

I've tested the proposed patch: the warning from bpf_bprintf_prepare can
no longer be triggered, but the reproducer can still trigger the following
lockdep warning. My test was conducted on:

commit: f67dd6ce0723 Merge tag 'slab-for-6.1-rc4-fixes'
git tree: upstream
kernel config: https://pastebin.com/raw/sE5QK5HL
C reproducer: https://pastebin.com/raw/X96ASi27
console log *before* the patch: https://pastebin.com/raw/eSCUNFrd
console log *after* the patch: https://pastebin.com/raw/tzcmdWZt
============================================
WARNING: possible recursive locking detected
6.1.0-rc4-00020-gf67dd6ce0723-dirty #11 Not tainted
--------------------------------------------
a.out/6703 is trying to acquire lock:
ffffffff8ce0ea18 (trace_printk_lock){....}-{2:2}, at:
____bpf_trace_printk kernel/trace/bpf_trace.c:390 [inline]
ffffffff8ce0ea18 (trace_printk_lock){....}-{2:2}, at:
bpf_trace_printk+0xcf/0x170 kernel/trace/bpf_trace.c:376
but task is already holding lock:
ffffffff8ce0ea18 (trace_printk_lock){....}-{2:2}, at:
____bpf_trace_printk kernel/trace/bpf_trace.c:390 [inline]
ffffffff8ce0ea18 (trace_printk_lock){....}-{2:2}, at:
bpf_trace_printk+0xcf/0x170 kernel/trace/bpf_trace.c:376
other info that might help us debug this:
Possible unsafe locking scenario:

      CPU0
      ----
 lock(trace_printk_lock);
 lock(trace_printk_lock);

*** DEADLOCK ***
May be due to missing lock nesting notation
4 locks held by a.out/6703:
#0: ffffffff8ce02fc8 (event_mutex){+.+.}-{3:3}, at:
__ftrace_set_clr_event kernel/trace/trace_events.c:1060 [inline]
#0: ffffffff8ce02fc8 (event_mutex){+.+.}-{3:3}, at:
trace_set_clr_event+0xd5/0x140 kernel/trace/trace_events.c:1126
#1: ffffffff8cd85f00 (rcu_read_lock){....}-{1:2}, at: __bpf_trace_run
kernel/trace/bpf_trace.c:2249 [inline]
#1: ffffffff8cd85f00 (rcu_read_lock){....}-{1:2}, at:
bpf_trace_run2+0xb9/0x3d0 kernel/trace/bpf_trace.c:2293
#2: ffffffff8ce0ea18 (trace_printk_lock){....}-{2:2}, at:
____bpf_trace_printk kernel/trace/bpf_trace.c:390 [inline]
#2: ffffffff8ce0ea18 (trace_printk_lock){....}-{2:2}, at:
bpf_trace_printk+0xcf/0x170 kernel/trace/bpf_trace.c:376
#3: ffffffff8cd85f00 (rcu_read_lock){....}-{1:2}, at: __bpf_trace_run
kernel/trace/bpf_trace.c:2249 [inline]
#3: ffffffff8cd85f00 (rcu_read_lock){....}-{1:2}, at:
bpf_trace_run2+0xb9/0x3d0 kernel/trace/bpf_trace.c:2293
stack backtrace:
CPU: 7 PID: 6703 Comm: a.out Not tainted 6.1.0-rc4-00020-gf67dd6ce0723-dirty #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux
1.16.0-3-3 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xfc/0x174 lib/dump_stack.c:106
print_deadlock_bug kernel/locking/lockdep.c:2990 [inline]
check_deadlock kernel/locking/lockdep.c:3033 [inline]
validate_chain kernel/locking/lockdep.c:3818 [inline]
__lock_acquire.cold+0x119/0x3b9 kernel/locking/lockdep.c:5055
lock_acquire kernel/locking/lockdep.c:5668 [inline]
lock_acquire+0x1dc/0x600 kernel/locking/lockdep.c:5633
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x36/0x50 kernel/locking/spinlock.c:162
____bpf_trace_printk kernel/trace/bpf_trace.c:390 [inline]
bpf_trace_printk+0xcf/0x170 kernel/trace/bpf_trace.c:376
___bpf_prog_run+0x42e5/0x8de0 kernel/bpf/core.c:1818
__bpf_prog_run32+0x99/0xe0 kernel/bpf/core.c:2041
bpf_dispatcher_nop_func include/linux/bpf.h:964 [inline]
__bpf_prog_run include/linux/filter.h:600 [inline]
bpf_prog_run include/linux/filter.h:607 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2254 [inline]
bpf_trace_run2+0x14d/0x3d0 kernel/trace/bpf_trace.c:2293
__bpf_trace_contention_begin+0xc8/0x120 include/trace/events/lock.h:95
__traceiter_contention_begin+0x56/0x90 include/trace/events/lock.h:95
trace_contention_begin.constprop.0+0x143/0x240 include/trace/events/lock.h:95
__pv_queued_spin_lock_slowpath+0xfd/0xc70 kernel/locking/qspinlock.c:405
pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:591 [inline]
queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline]
queued_spin_lock include/asm-generic/qspinlock.h:114 [inline]
do_raw_spin_lock+0x20a/0x2b0 kernel/locking/spinlock_debug.c:115
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline]
_raw_spin_lock_irqsave+0x3e/0x50 kernel/locking/spinlock.c:162
____bpf_trace_printk kernel/trace/bpf_trace.c:390 [inline]
bpf_trace_printk+0xcf/0x170 kernel/trace/bpf_trace.c:376
___bpf_prog_run+0x42e5/0x8de0 kernel/bpf/core.c:1818
__bpf_prog_run32+0x99/0xe0 kernel/bpf/core.c:2041
bpf_dispatcher_nop_func include/linux/bpf.h:964 [inline]
__bpf_prog_run include/linux/filter.h:600 [inline]
bpf_prog_run include/linux/filter.h:607 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2254 [inline]
bpf_trace_run2+0x14d/0x3d0 kernel/trace/bpf_trace.c:2293
__bpf_trace_contention_begin+0xc8/0x120 include/trace/events/lock.h:95
__traceiter_contention_begin+0x56/0x90 include/trace/events/lock.h:95
trace_contention_begin+0x129/0x1e0 include/trace/events/lock.h:95
__mutex_lock_common kernel/locking/mutex.c:605 [inline]
__mutex_lock+0x15a/0x12d0 kernel/locking/mutex.c:747
__ftrace_set_clr_event kernel/trace/trace_events.c:1060 [inline]
trace_set_clr_event+0xd5/0x140 kernel/trace/trace_events.c:1126
__set_printk_clr_event kernel/trace/bpf_trace.c:419 [inline]
bpf_get_trace_printk_proto kernel/trace/bpf_trace.c:425 [inline]
bpf_tracing_func_proto+0x476/0x5a0 kernel/trace/bpf_trace.c:1422
raw_tp_prog_func_proto+0x4f/0x60 kernel/trace/bpf_trace.c:1885
check_helper_call+0x20a/0x94d0 kernel/bpf/verifier.c:7255
do_check kernel/bpf/verifier.c:12384 [inline]
do_check_common+0x6b3e/0xdf20 kernel/bpf/verifier.c:14643
do_check_main kernel/bpf/verifier.c:14706 [inline]
bpf_check+0x714c/0xa9b0 kernel/bpf/verifier.c:15276
bpf_prog_load+0xfb1/0x2110 kernel/bpf/syscall.c:2605
__sys_bpf+0xaba/0x5520 kernel/bpf/syscall.c:4965
__do_sys_bpf kernel/bpf/syscall.c:5069 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5067 [inline]
__x64_sys_bpf+0x74/0xb0 kernel/bpf/syscall.c:5067
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fe9f4ee4469
Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48
RSP: 002b:00007fff7c845438 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe9f4ee4469
RDX: 0000000000000048 RSI: 0000000020000080 RDI: 0000000000000005
RBP: 00007fff7c845450 R08: 00007fe9f4f2e160 R09: 00007fff7c845450
R10: 0000000020001cc2 R11: 0000000000000246 R12: 000055643c800c00
R13: 00007fff7c845570 R14: 0000000000000000 R15: 0000000000000000
</TASK>
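
For reference, the cycle lockdep complains about here: the program attached
to contention_begin calls bpf_trace_printk, which takes trace_printk_lock;
when that lock contends, the contention_begin tracepoint fires again and
re-enters the same program, which then tries to take trace_printk_lock a
second time. A minimal sketch of such a program (illustrative only, not the
exact reproducer) looks like:

/* Sketch of a raw_tp program that can re-enter itself through
 * trace_printk_lock contention: bpf_printk() expands to the
 * bpf_trace_printk helper, which takes trace_printk_lock, and
 * contention on that lock fires contention_begin again.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("raw_tracepoint/contention_begin")
int on_contention(struct bpf_raw_tracepoint_args *ctx)
{
	bpf_printk("lock %llx contended", ctx->args[0]);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
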
Regards
Hao
> thanks,
> jirka
> >
> >
> > ---
> > diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> > index 6a13220d2d27..ca5dd34478b7 100644
> > --- a/include/trace/bpf_probe.h
> > +++ b/include/trace/bpf_probe.h
> > @@ -78,11 +78,15 @@
> > #define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
> >
> > #define __BPF_DECLARE_TRACE(call, proto, args) \
> > +static DEFINE_PER_CPU(int, __bpf_trace_tp_active_##call); \
> > static notrace void \
> > __bpf_trace_##call(void *__data, proto) \
> > { \
> > struct bpf_prog *prog = __data; \
> > - CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
> > + \
> > + if (likely(this_cpu_inc_return(__bpf_trace_tp_active_##call) < 3)) \
> > + CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
> > + this_cpu_dec(__bpf_trace_tp_active_##call); \
> > }
> >
> > #undef DECLARE_EVENT_CLASS
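
For reference, hand-expanding the macro above for a single tracepoint
(contention_begin, whose proto is a lock pointer plus flags) gives roughly
the sketch below: the per-CPU counter lets at most two invocations nest on
a CPU (e.g. task context plus one irq that interrupts it) and silently
skips any deeper recursion:

static DEFINE_PER_CPU(int, __bpf_trace_tp_active_contention_begin);

static notrace void
__bpf_trace_contention_begin(void *__data, void *lock, unsigned int flags)
{
	struct bpf_prog *prog = __data;

	/* run the prog only on the first two nesting levels on this CPU */
	if (likely(this_cpu_inc_return(__bpf_trace_tp_active_contention_begin) < 3))
		bpf_trace_run2(prog, (u64)(long)lock, (u64)flags);
	this_cpu_dec(__bpf_trace_tp_active_contention_begin);
}
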
On Sat, Nov 12, 2022 at 12:02:50AM +0800, Hao Sun wrote:
> On Fri, Nov 11, 2022 at 22:45, Jiri Olsa <[email protected]> wrote:
> >
> > On Thu, Nov 10, 2022 at 12:53:16AM +0100, Jiri Olsa wrote:
> >
> > SNIP
> >
> > > > > > > ---
> > > > > > > diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> > > > > > > index 6a13220d2d27..5a354ae096e5 100644
> > > > > > > --- a/include/trace/bpf_probe.h
> > > > > > > +++ b/include/trace/bpf_probe.h
> > > > > > > @@ -78,11 +78,15 @@
> > > > > > > #define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
> > > > > > >
> > > > > > > #define __BPF_DECLARE_TRACE(call, proto, args) \
> > > > > > > +static DEFINE_PER_CPU(int, __bpf_trace_tp_active_##call); \
> > > > > > > static notrace void \
> > > > > > > __bpf_trace_##call(void *__data, proto) \
> > > > > > > { \
> > > > > > > struct bpf_prog *prog = __data; \
> > > > > > > - CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
> > > > > > > + \
> > > > > > > + if (likely(this_cpu_inc_return(__bpf_trace_tp_active_##call) == 1)) \
> > > > > > > + CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
> > > > > > > + this_cpu_dec(__bpf_trace_tp_active_##call); \
> > > > > > > }
> > > > > >
> > > > > > This approach will hurt real use cases where
> > > > > > multiple and different raw_tp progs run on the same cpu.
> > > > >
> > > > > would the 2 levels of nesting help in here?
> > > > >
> > > > > I can imagine the change above would break the use case where we want
> > > > > to trigger a tracepoint in irq context that interrupted a task that's
> > > > > already in the same tracepoint
> > > > >
> > > > > with 2 levels of nesting we would trigger that tracepoint from irq and
> > > > > would still be safe with the bpf_bprintf_prepare buffer
> > > >
> > > > How would these 2 levels work?
> > >
> > > just using the active counter like below, but I haven't tested it yet
> > >
> > > jirka
> >
> > seems to be working
> > Hao Sun, could you please test this patch?
> >
>
> Hi Jirka,
>
> I've tested the proposed patch; the warning from bpf_bprintf_prepare is no
> longer triggered, but the reproducer can still trigger the following
> lockdep warning. My test was conducted on:
>
> commit: f67dd6ce0723 Merge tag 'slab-for-6.1-rc4-fixes'
> git tree: upstream
> kernel config: https://pastebin.com/raw/sE5QK5HL
> C reproducer: https://pastebin.com/raw/X96ASi27
> console log *before* the patch: https://pastebin.com/raw/eSCUNFrd
> console log *after* the patch: https://pastebin.com/raw/tzcmdWZt
thanks for testing.. I can't reproduce this for some reason

I'll check some more and perhaps go with denying bpf attachment
for this tracepoint as Alexei suggested

jirka
>
> ============================================
> WARNING: possible recursive locking detected
> 6.1.0-rc4-00020-gf67dd6ce0723-dirty #11 Not tainted
> --------------------------------------------
> a.out/6703 is trying to acquire lock:
> ffffffff8ce0ea18 (trace_printk_lock){....}-{2:2}, at:
> ____bpf_trace_printk kernel/trace/bpf_trace.c:390 [inline]
> ffffffff8ce0ea18 (trace_printk_lock){....}-{2:2}, at:
> bpf_trace_printk+0xcf/0x170 kernel/trace/bpf_trace.c:376
>
> but task is already holding lock:
> ffffffff8ce0ea18 (trace_printk_lock){....}-{2:2}, at:
> ____bpf_trace_printk kernel/trace/bpf_trace.c:390 [inline]
> ffffffff8ce0ea18 (trace_printk_lock){....}-{2:2}, at:
> bpf_trace_printk+0xcf/0x170 kernel/trace/bpf_trace.c:376
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(trace_printk_lock);
> lock(trace_printk_lock);
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 4 locks held by a.out/6703:
> #0: ffffffff8ce02fc8 (event_mutex){+.+.}-{3:3}, at:
> __ftrace_set_clr_event kernel/trace/trace_events.c:1060 [inline]
> #0: ffffffff8ce02fc8 (event_mutex){+.+.}-{3:3}, at:
> trace_set_clr_event+0xd5/0x140 kernel/trace/trace_events.c:1126
> #1: ffffffff8cd85f00 (rcu_read_lock){....}-{1:2}, at: __bpf_trace_run
> kernel/trace/bpf_trace.c:2249 [inline]
> #1: ffffffff8cd85f00 (rcu_read_lock){....}-{1:2}, at:
> bpf_trace_run2+0xb9/0x3d0 kernel/trace/bpf_trace.c:2293
> #2: ffffffff8ce0ea18 (trace_printk_lock){....}-{2:2}, at:
> ____bpf_trace_printk kernel/trace/bpf_trace.c:390 [inline]
> #2: ffffffff8ce0ea18 (trace_printk_lock){....}-{2:2}, at:
> bpf_trace_printk+0xcf/0x170 kernel/trace/bpf_trace.c:376
> #3: ffffffff8cd85f00 (rcu_read_lock){....}-{1:2}, at: __bpf_trace_run
> kernel/trace/bpf_trace.c:2249 [inline]
> #3: ffffffff8cd85f00 (rcu_read_lock){....}-{1:2}, at:
> bpf_trace_run2+0xb9/0x3d0 kernel/trace/bpf_trace.c:2293
>
> stack backtrace:
> CPU: 7 PID: 6703 Comm: a.out Not tainted 6.1.0-rc4-00020-gf67dd6ce0723-dirty #11
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux
> 1.16.0-3-3 04/01/2014
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:88 [inline]
> dump_stack_lvl+0xfc/0x174 lib/dump_stack.c:106
> print_deadlock_bug kernel/locking/lockdep.c:2990 [inline]
> check_deadlock kernel/locking/lockdep.c:3033 [inline]
> validate_chain kernel/locking/lockdep.c:3818 [inline]
> __lock_acquire.cold+0x119/0x3b9 kernel/locking/lockdep.c:5055
> lock_acquire kernel/locking/lockdep.c:5668 [inline]
> lock_acquire+0x1dc/0x600 kernel/locking/lockdep.c:5633
> __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> _raw_spin_lock_irqsave+0x36/0x50 kernel/locking/spinlock.c:162
> ____bpf_trace_printk kernel/trace/bpf_trace.c:390 [inline]
> bpf_trace_printk+0xcf/0x170 kernel/trace/bpf_trace.c:376
> ___bpf_prog_run+0x42e5/0x8de0 kernel/bpf/core.c:1818
> __bpf_prog_run32+0x99/0xe0 kernel/bpf/core.c:2041
> bpf_dispatcher_nop_func include/linux/bpf.h:964 [inline]
> __bpf_prog_run include/linux/filter.h:600 [inline]
> bpf_prog_run include/linux/filter.h:607 [inline]
> __bpf_trace_run kernel/trace/bpf_trace.c:2254 [inline]
> bpf_trace_run2+0x14d/0x3d0 kernel/trace/bpf_trace.c:2293
> __bpf_trace_contention_begin+0xc8/0x120 include/trace/events/lock.h:95
> __traceiter_contention_begin+0x56/0x90 include/trace/events/lock.h:95
> trace_contention_begin.constprop.0+0x143/0x240 include/trace/events/lock.h:95
> __pv_queued_spin_lock_slowpath+0xfd/0xc70 kernel/locking/qspinlock.c:405
> pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:591 [inline]
> queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline]
> queued_spin_lock include/asm-generic/qspinlock.h:114 [inline]
> do_raw_spin_lock+0x20a/0x2b0 kernel/locking/spinlock_debug.c:115
> __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline]
> _raw_spin_lock_irqsave+0x3e/0x50 kernel/locking/spinlock.c:162
> ____bpf_trace_printk kernel/trace/bpf_trace.c:390 [inline]
> bpf_trace_printk+0xcf/0x170 kernel/trace/bpf_trace.c:376
> ___bpf_prog_run+0x42e5/0x8de0 kernel/bpf/core.c:1818
> __bpf_prog_run32+0x99/0xe0 kernel/bpf/core.c:2041
> bpf_dispatcher_nop_func include/linux/bpf.h:964 [inline]
> __bpf_prog_run include/linux/filter.h:600 [inline]
> bpf_prog_run include/linux/filter.h:607 [inline]
> __bpf_trace_run kernel/trace/bpf_trace.c:2254 [inline]
> bpf_trace_run2+0x14d/0x3d0 kernel/trace/bpf_trace.c:2293
> __bpf_trace_contention_begin+0xc8/0x120 include/trace/events/lock.h:95
> __traceiter_contention_begin+0x56/0x90 include/trace/events/lock.h:95
> trace_contention_begin+0x129/0x1e0 include/trace/events/lock.h:95
> __mutex_lock_common kernel/locking/mutex.c:605 [inline]
> __mutex_lock+0x15a/0x12d0 kernel/locking/mutex.c:747
> __ftrace_set_clr_event kernel/trace/trace_events.c:1060 [inline]
> trace_set_clr_event+0xd5/0x140 kernel/trace/trace_events.c:1126
> __set_printk_clr_event kernel/trace/bpf_trace.c:419 [inline]
> bpf_get_trace_printk_proto kernel/trace/bpf_trace.c:425 [inline]
> bpf_tracing_func_proto+0x476/0x5a0 kernel/trace/bpf_trace.c:1422
> raw_tp_prog_func_proto+0x4f/0x60 kernel/trace/bpf_trace.c:1885
> check_helper_call+0x20a/0x94d0 kernel/bpf/verifier.c:7255
> do_check kernel/bpf/verifier.c:12384 [inline]
> do_check_common+0x6b3e/0xdf20 kernel/bpf/verifier.c:14643
> do_check_main kernel/bpf/verifier.c:14706 [inline]
> bpf_check+0x714c/0xa9b0 kernel/bpf/verifier.c:15276
> bpf_prog_load+0xfb1/0x2110 kernel/bpf/syscall.c:2605
> __sys_bpf+0xaba/0x5520 kernel/bpf/syscall.c:4965
> __do_sys_bpf kernel/bpf/syscall.c:5069 [inline]
> __se_sys_bpf kernel/bpf/syscall.c:5067 [inline]
> __x64_sys_bpf+0x74/0xb0 kernel/bpf/syscall.c:5067
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7fe9f4ee4469
> Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48
> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
> 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48
> RSP: 002b:00007fff7c845438 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe9f4ee4469
> RDX: 0000000000000048 RSI: 0000000020000080 RDI: 0000000000000005
> RBP: 00007fff7c845450 R08: 00007fe9f4f2e160 R09: 00007fff7c845450
> R10: 0000000020001cc2 R11: 0000000000000246 R12: 000055643c800c00
> R13: 00007fff7c845570 R14: 0000000000000000 R15: 0000000000000000
> </TASK>
>
> Regards
> Hao
>
> > thanks,
> > jirka
> > >
> > >
> > > ---
> > > diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> > > index 6a13220d2d27..ca5dd34478b7 100644
> > > --- a/include/trace/bpf_probe.h
> > > +++ b/include/trace/bpf_probe.h
> > > @@ -78,11 +78,15 @@
> > > #define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
> > >
> > > #define __BPF_DECLARE_TRACE(call, proto, args) \
> > > +static DEFINE_PER_CPU(int, __bpf_trace_tp_active_##call); \
> > > static notrace void \
> > > __bpf_trace_##call(void *__data, proto) \
> > > { \
> > > struct bpf_prog *prog = __data; \
> > > - CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
> > > + \
> > > + if (likely(this_cpu_inc_return(__bpf_trace_tp_active_##call) < 3)) \
> > > + CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
> > > + this_cpu_dec(__bpf_trace_tp_active_##call); \
> > > }
> > >
> > > #undef DECLARE_EVENT_CLASS
On Mon, Nov 14, 2022 at 09:04:57AM +0100, Jiri Olsa wrote:
> On Sat, Nov 12, 2022 at 12:02:50AM +0800, Hao Sun wrote:
> > On Fri, Nov 11, 2022 at 22:45, Jiri Olsa <[email protected]> wrote:
> > >
> > > On Thu, Nov 10, 2022 at 12:53:16AM +0100, Jiri Olsa wrote:
> > >
> > > SNIP
> > >
> > > > > > > > ---
> > > > > > > > diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> > > > > > > > index 6a13220d2d27..5a354ae096e5 100644
> > > > > > > > --- a/include/trace/bpf_probe.h
> > > > > > > > +++ b/include/trace/bpf_probe.h
> > > > > > > > @@ -78,11 +78,15 @@
> > > > > > > > #define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
> > > > > > > >
> > > > > > > > #define __BPF_DECLARE_TRACE(call, proto, args) \
> > > > > > > > +static DEFINE_PER_CPU(int, __bpf_trace_tp_active_##call); \
> > > > > > > > static notrace void \
> > > > > > > > __bpf_trace_##call(void *__data, proto) \
> > > > > > > > { \
> > > > > > > > struct bpf_prog *prog = __data; \
> > > > > > > > - CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
> > > > > > > > + \
> > > > > > > > + if (likely(this_cpu_inc_return(__bpf_trace_tp_active_##call) == 1)) \
> > > > > > > > + CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
> > > > > > > > + this_cpu_dec(__bpf_trace_tp_active_##call); \
> > > > > > > > }
> > > > > > >
> > > > > > > This approach will hurt real use cases where
> > > > > > > multiple and different raw_tp progs run on the same cpu.
> > > > > >
> > > > > > would the 2 levels of nesting help in here?
> > > > > >
> > > > > > I can imagine the change above would break the use case where we want
> > > > > > to trigger a tracepoint in irq context that interrupted a task that's
> > > > > > already in the same tracepoint
> > > > > >
> > > > > > with 2 levels of nesting we would trigger that tracepoint from irq and
> > > > > > would still be safe with the bpf_bprintf_prepare buffer
> > > > >
> > > > > How would these 2 levels work?
> > > >
> > > > just using the active counter like below, but I haven't tested it yet
> > > >
> > > > jirka
> > >
> > > seems to be working
> > > Hao Sun, could you please test this patch?
> > >
> >
> > Hi Jirka,
> >
> > I've tested the proposed patch; the warning from bpf_bprintf_prepare is no
> > longer triggered, but the reproducer can still trigger the following
> > lockdep warning. My test was conducted on:
> >
> > commit: f67dd6ce0723 Merge tag 'slab-for-6.1-rc4-fixes'
> > git tree: upstream
> > kernel config: https://pastebin.com/raw/sE5QK5HL
> > C reproducer: https://pastebin.com/raw/X96ASi27
> > console log *before* the patch: https://pastebin.com/raw/eSCUNFrd
> > console log *after* the patch: https://pastebin.com/raw/tzcmdWZt
>
> thanks for testing.. I can't reproduce this for some reason
>
> I'll check some more and perhaps go with denying bpf attachment
> for this tracepoint as Alexei suggested
the change below won't allow attaching a bpf program that uses any
printk helper to the contention_begin and bpf_trace_printk tracepoints

I still need to test it on the machine that reproduced the issue
for me.. meanwhile any feedback is appreciated

thanks,
jirka
---
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 798aec816970..d88e0741b381 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1257,7 +1257,8 @@ struct bpf_prog {
enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
call_get_func_ip:1, /* Do we call get_func_ip() */
- tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
+ tstamp_type_access:1, /* Accessed __sk_buff->tstamp_type */
+ call_printk:1; /* Do we call trace_printk/trace_vprintk */
enum bpf_prog_type type; /* Type of BPF program */
enum bpf_attach_type expected_attach_type; /* For some prog types */
u32 len; /* Number of filter blocks */
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 20749bd9db71..fd2725624fed 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -742,7 +742,7 @@ void perf_event_detach_bpf_prog(struct perf_event *event);
int perf_event_query_prog_array(struct perf_event *event, void __user *info);
int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
-struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name);
+struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name, struct bpf_prog *prog);
void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp);
int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
u32 *fd_type, const char **buf,
@@ -775,7 +775,8 @@ static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf
{
return -EOPNOTSUPP;
}
-static inline struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name)
+static inline struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name,
+ struct bpf_prog *prog)
{
return NULL;
}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 85532d301124..d6081e8336c6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3281,7 +3281,7 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
return -EINVAL;
}

- btp = bpf_get_raw_tracepoint(tp_name);
+ btp = bpf_get_raw_tracepoint(tp_name, prog);
if (!btp)
return -ENOENT;

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 07c0259dfc1a..9862345d9249 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7572,6 +7572,10 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
set_user_ringbuf_callback_state);
break;
+ case BPF_FUNC_trace_printk:
+ case BPF_FUNC_trace_vprintk:
+ env->prog->call_printk = 1;
+ break;
}

if (err)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index f2d8d070d024..9a4652a05690 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2229,10 +2229,32 @@ int perf_event_query_prog_array(struct perf_event *event, void __user *info)
extern struct bpf_raw_event_map __start__bpf_raw_tp[];
extern struct bpf_raw_event_map __stop__bpf_raw_tp[];

-struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name)
+static int check_printk_denylist(const char *name, struct bpf_prog *prog)
+{
+ static const char *denylist[] = {
+ "contention_begin",
+ "bpf_trace_printk",
+ };
+ int i;
+
+ if (!prog->call_printk)
+ return 0;
+
+ for (i = 0; i < ARRAY_SIZE(denylist); i++) {
+ if (!strcmp(denylist[i], name))
+ return 1;
+ }
+ return 0;
+}
+
+struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name,
+ struct bpf_prog *prog)
{
struct bpf_raw_event_map *btp = __start__bpf_raw_tp;

+ if (check_printk_denylist(name, prog))
+ return NULL;
+
for (; btp < __stop__bpf_raw_tp; btp++) {
if (!strcmp(btp->tp->name, name))
return btp;
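
To illustrate the intended effect (a sketch under assumed libbpf 1.0
conventions, not part of the patch): a program that the verifier saw
calling the trace_printk helper has call_printk set, so attaching it to a
denylisted tracepoint makes bpf_get_raw_tracepoint() return NULL and the
raw_tp link attach fail with -ENOENT:

/* Hypothetical user-space check: prog calls bpf_trace_printk, so with
 * the patch above the raw tracepoint lookup for "contention_begin"
 * returns NULL and the attach is rejected with -ENOENT.  Error handling
 * assumes libbpf 1.0, where attach returns NULL and sets errno.
 */
#include <stdio.h>
#include <errno.h>
#include <bpf/libbpf.h>

static int try_attach(struct bpf_program *prog)
{
	struct bpf_link *link;

	link = bpf_program__attach_raw_tracepoint(prog, "contention_begin");
	if (!link) {
		fprintf(stderr, "attach failed: %d (expect -ENOENT)\n", -errno);
		return -errno;
	}
	bpf_link__destroy(link);
	return 0;
}
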
On Tue, Nov 15, 2022 at 06:47, Jiri Olsa <[email protected]> wrote:
>
> On Mon, Nov 14, 2022 at 09:04:57AM +0100, Jiri Olsa wrote:
> > On Sat, Nov 12, 2022 at 12:02:50AM +0800, Hao Sun wrote:
> > > On Fri, Nov 11, 2022 at 22:45, Jiri Olsa <[email protected]> wrote:
> > > >
> > > > On Thu, Nov 10, 2022 at 12:53:16AM +0100, Jiri Olsa wrote:
> > > >
> > > > SNIP
> > > >
> > > > > > > > > ---
> > > > > > > > > diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> > > > > > > > > index 6a13220d2d27..5a354ae096e5 100644
> > > > > > > > > --- a/include/trace/bpf_probe.h
> > > > > > > > > +++ b/include/trace/bpf_probe.h
> > > > > > > > > @@ -78,11 +78,15 @@
> > > > > > > > > #define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
> > > > > > > > >
> > > > > > > > > #define __BPF_DECLARE_TRACE(call, proto, args) \
> > > > > > > > > +static DEFINE_PER_CPU(int, __bpf_trace_tp_active_##call); \
> > > > > > > > > static notrace void \
> > > > > > > > > __bpf_trace_##call(void *__data, proto) \
> > > > > > > > > { \
> > > > > > > > > struct bpf_prog *prog = __data; \
> > > > > > > > > - CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
> > > > > > > > > + \
> > > > > > > > > + if (likely(this_cpu_inc_return(__bpf_trace_tp_active_##call) == 1)) \
> > > > > > > > > + CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args)); \
> > > > > > > > > + this_cpu_dec(__bpf_trace_tp_active_##call); \
> > > > > > > > > }
> > > > > > > >
> > > > > > > > This approach will hurt real use cases where
> > > > > > > > multiple and different raw_tp progs run on the same cpu.
> > > > > > >
> > > > > > > would the 2 levels of nesting help in here?
> > > > > > >
> > > > > > > I can imagine the change above would break the use case where we want
> > > > > > > to trigger a tracepoint in irq context that interrupted a task that's
> > > > > > > already in the same tracepoint
> > > > > > >
> > > > > > > with 2 levels of nesting we would trigger that tracepoint from irq and
> > > > > > > would still be safe with the bpf_bprintf_prepare buffer
> > > > > >
> > > > > > How would these 2 levels work?
> > > > >
> > > > > just using the active counter like below, but I haven't tested it yet
> > > > >
> > > > > jirka
> > > >
> > > > seems to be working
> > > > Hao Sun, could you please test this patch?
> > > >
> > >
> > > Hi Jirka,
> > >
> > > I've tested the proposed patch; the warning from bpf_bprintf_prepare is no
> > > longer triggered, but the reproducer can still trigger the following
> > > lockdep warning. My test was conducted on:
> > >
> > > commit: f67dd6ce0723 Merge tag 'slab-for-6.1-rc4-fixes'
> > > git tree: upstream
> > > kernel config: https://pastebin.com/raw/sE5QK5HL
> > > C reproducer: https://pastebin.com/raw/X96ASi27
> > > console log *before* the patch: https://pastebin.com/raw/eSCUNFrd
> > > console log *after* the patch: https://pastebin.com/raw/tzcmdWZt
> >
> > thanks for testing.. I can't reproduce this for some reason
> >
> > I'll check some more and perhaps go with denying bpf attachment
> > for this tracepoint as Alexei suggested
>
> the change below won't allow attaching a bpf program that uses any
> printk helper to the contention_begin and bpf_trace_printk tracepoints
>
> I still need to test it on the machine that reproduced the issue
> for me.. meanwhile any feedback is appreciated
>
Hi,

I tested this on my machine; with this patch applied, the C reproducer no
longer triggers any issue. The test was conducted on:

commit: f67dd6ce0723 Merge tag 'slab-for-6.1-rc4-fixes'
git tree: upstream
kernel config: https://pastebin.com/raw/sE5QK5HL
C reproducer: https://pastebin.com/raw/X96ASi27
full console log *before* the patch: https://pastebin.com/raw/n3x55RDr
full console log *after* the patch: https://pastebin.com/raw/7HdxnCnL

Thanks
Hao
> thanks,
> jirka
>
> ---
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 798aec816970..d88e0741b381 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1257,7 +1257,8 @@ struct bpf_prog {
> enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
> call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
> call_get_func_ip:1, /* Do we call get_func_ip() */
> - tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
> + tstamp_type_access:1, /* Accessed __sk_buff->tstamp_type */
> + call_printk:1; /* Do we call trace_printk/trace_vprintk */
> enum bpf_prog_type type; /* Type of BPF program */
> enum bpf_attach_type expected_attach_type; /* For some prog types */
> u32 len; /* Number of filter blocks */
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 20749bd9db71..fd2725624fed 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -742,7 +742,7 @@ void perf_event_detach_bpf_prog(struct perf_event *event);
> int perf_event_query_prog_array(struct perf_event *event, void __user *info);
> int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
> int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
> -struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name);
> +struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name, struct bpf_prog *prog);
> void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp);
> int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
> u32 *fd_type, const char **buf,
> @@ -775,7 +775,8 @@ static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf
> {
> return -EOPNOTSUPP;
> }
> -static inline struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name)
> +static inline struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name,
> + struct bpf_prog *prog)
> {
> return NULL;
> }
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 85532d301124..d6081e8336c6 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3281,7 +3281,7 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
> return -EINVAL;
> }
>
> - btp = bpf_get_raw_tracepoint(tp_name);
> + btp = bpf_get_raw_tracepoint(tp_name, prog);
> if (!btp)
> return -ENOENT;
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 07c0259dfc1a..9862345d9249 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -7572,6 +7572,10 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
> set_user_ringbuf_callback_state);
> break;
> + case BPF_FUNC_trace_printk:
> + case BPF_FUNC_trace_vprintk:
> + env->prog->call_printk = 1;
> + break;
> }
>
> if (err)
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index f2d8d070d024..9a4652a05690 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -2229,10 +2229,32 @@ int perf_event_query_prog_array(struct perf_event *event, void __user *info)
> extern struct bpf_raw_event_map __start__bpf_raw_tp[];
> extern struct bpf_raw_event_map __stop__bpf_raw_tp[];
>
> -struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name)
> +static int check_printk_denylist(const char *name, struct bpf_prog *prog)
> +{
> + static const char *denylist[] = {
> + "contention_begin",
> + "bpf_trace_printk",
> + };
> + int i;
> +
> + if (!prog->call_printk)
> + return 0;
> +
> + for (i = 0; i < ARRAY_SIZE(denylist); i++) {
> + if (!strcmp(denylist[i], name))
> + return 1;
> + }
> + return 0;
> +}
> +
> +struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name,
> + struct bpf_prog *prog)
> {
> struct bpf_raw_event_map *btp = __start__bpf_raw_tp;
>
> + if (check_printk_denylist(name, prog))
> + return NULL;
> +
> for (; btp < __stop__bpf_raw_tp; btp++) {
> if (!strcmp(btp->tp->name, name))
> return btp;
On Tue, Nov 15, 2022 at 09:48:27AM +0800, Hao Sun wrote:
SNIP
> > > > Hi Jirka,
> > > >
> > > > I've tested the proposed patch; the warning from bpf_bprintf_prepare is no
> > > > longer triggered, but the reproducer can still trigger the following
> > > > lockdep warning. My test was conducted on:
> > > >
> > > > commit: f67dd6ce0723 Merge tag 'slab-for-6.1-rc4-fixes'
> > > > git tree: upstream
> > > > kernel config: https://pastebin.com/raw/sE5QK5HL
> > > > C reproducer: https://pastebin.com/raw/X96ASi27
> > > > console log *before* the patch: https://pastebin.com/raw/eSCUNFrd
> > > > console log *after* the patch: https://pastebin.com/raw/tzcmdWZt
> > >
> > > thanks for testing.. I can't reproduce this for some reason
> > >
> > > I'll check some more and perhaps go with denying bpf attachment
> > > for this tracepoint as Alexei suggested
> >
> > the change below won't allow attaching a bpf program that uses any
> > printk helper to the contention_begin and bpf_trace_printk tracepoints
> >
> > I still need to test it on the machine that reproduced the issue
> > for me.. meanwhile any feedback is appreciated
> >
>
> Hi,
>
> I tested this on my machine; with this patch applied, the C reproducer no
> longer triggers any issue. The test was conducted on:
>
> commit: f67dd6ce0723 Merge tag 'slab-for-6.1-rc4-fixes'
> git tree: upstream
> kernel config: https://pastebin.com/raw/sE5QK5HL
> C reproducer: https://pastebin.com/raw/X96ASi27
> full console log *before* the patch: https://pastebin.com/raw/n3x55RDr
> full console log *after* the patch: https://pastebin.com/raw/7HdxnCnL
thanks, looks good on my end as well
jirka