2022-10-06 14:08:27

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH] perf: Fix missing SIGTRAPs

On Thu, Oct 06, 2022 at 03:33PM +0200, Peter Zijlstra wrote:
>
> OK, so the below seems to pass the concurrent sigtrap_threads test for
> me and doesn't have that horrible irq_work_sync hackery.
>
> Does it work for you too?

I'm getting this (courtesy of syzkaller):

| BUG: using smp_processor_id() in preemptible [00000000] code: syz-executor.8/22848
| caller is perf_swevent_get_recursion_context+0x13/0x80
| CPU: 0 PID: 22860 Comm: syz-executor.6 Not tainted 6.0.0-rc3-00017-g1472d7e42f41 #64
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
| Call Trace:
| <TASK>
| dump_stack_lvl+0x6a/0x86
| check_preemption_disabled+0xdf/0xf0
| perf_swevent_get_recursion_context+0x13/0x80
| perf_pending_task+0xf/0x80
| task_work_run+0x73/0xc0
| do_exit+0x459/0xf20
| do_group_exit+0x3f/0xe0
| get_signal+0xe04/0xe60
| arch_do_signal_or_restart+0x3f/0x780
| exit_to_user_mode_prepare+0x135/0x1a0
| irqentry_exit_to_user_mode+0x6/0x30
| asm_sysvec_irq_work+0x16/0x20

That one I could fix up with:

| diff --git a/kernel/events/core.c b/kernel/events/core.c
| index 9319af6013f1..2f1d51b50be7 100644
| --- a/kernel/events/core.c
| +++ b/kernel/events/core.c
| @@ -6563,6 +6563,7 @@ static void perf_pending_task(struct callback_head *head)
| * If we 'fail' here, that's OK, it means recursion is already disabled
| * and we won't recurse 'further'.
| */
| + preempt_disable_notrace();
| rctx = perf_swevent_get_recursion_context();
|
| if (event->pending_work) {
| @@ -6573,6 +6574,7 @@ static void perf_pending_task(struct callback_head *head)
|
| if (rctx >= 0)
| perf_swevent_put_recursion_context(rctx);
| + preempt_enable_notrace();
| }
|
| #ifdef CONFIG_GUEST_PERF_EVENTS

But following that, I get:

| ======================================================
| WARNING: possible circular locking dependency detected
| 6.0.0-rc3-00017-g1472d7e42f41-dirty #65 Not tainted
| ------------------------------------------------------
| syz-executor.11/13018 is trying to acquire lock:
| ffffffffbb754a18 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0xa/0x30 kernel/locking/semaphore.c:139
|
| but task is already holding lock:
| ffff8ea992e00e20 (&ctx->lock){-.-.}-{2:2}, at: perf_event_context_sched_out kernel/events/core.c:3499 [inline]
| ffff8ea992e00e20 (&ctx->lock){-.-.}-{2:2}, at: __perf_event_task_sched_out+0x29e/0xb50 kernel/events/core.c:3608
|
| which lock already depends on the new lock.
|
| << snip ... lockdep unhappy we're trying to WARN >>
|
| WARNING: CPU: 3 PID: 13018 at kernel/events/core.c:2288 event_sched_out+0x3f2/0x410 kernel/events/core.c:2288
| Modules linked in:
| CPU: 3 PID: 13018 Comm: syz-executor.11 Not tainted 6.0.0-rc3-00017-g1472d7e42f41-dirty #65
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
| RIP: 0010:event_sched_out+0x3f2/0x410 kernel/events/core.c:2288
| Code: ff ff e8 21 b2 f9 ff 65 8b 05 76 67 7f 46 85 c0 0f 84 0f ff ff ff e8 0d b2 f9 ff 90 0f 0b 90 e9 01 ff ff ff e8 ff b1 f9 ff 90 <0f> 0b 90 e9 3b fe ff ff e8 f1 b1 f9 ff 90 0f 0b 90 e9 01 ff ff ff
| RSP: 0018:ffffa69c8931f9a8 EFLAGS: 00010012
| RAX: 0000000040000000 RBX: ffff8ea99526f1c8 RCX: ffffffffb9824d01
| RDX: ffff8ea9934a0040 RSI: 0000000000000000 RDI: ffff8ea99526f1c8
| RBP: ffff8ea992e00e00 R08: 0000000000000000 R09: 0000000000000000
| R10: 00000000820822cd R11: 00000000d820822c R12: ffff8eacafcf2e50
| R13: ffff8eacafcf2e58 R14: ffffffffbb62e9a0 R15: ffff8ea992e00ef8
| FS: 00007fdcddbb6640(0000) GS:ffff8eacafcc0000(0000) knlGS:0000000000000000
| CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| CR2: 00007fdcddbb5fa8 CR3: 0000000112846004 CR4: 0000000000770ee0
| DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
| DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
| PKRU: 55555554
| Call Trace:
| <TASK>
| group_sched_out.part.0+0x5c/0xe0 kernel/events/core.c:2320
| group_sched_out kernel/events/core.c:2315 [inline]
| ctx_sched_out+0x35d/0x3c0 kernel/events/core.c:3288
| task_ctx_sched_out+0x3d/0x60 kernel/events/core.c:2657
| perf_event_context_sched_out kernel/events/core.c:3505 [inline]
| __perf_event_task_sched_out+0x31b/0xb50 kernel/events/core.c:3608
| perf_event_task_sched_out include/linux/perf_event.h:1266 [inline]
| prepare_task_switch kernel/sched/core.c:4992 [inline]
| context_switch kernel/sched/core.c:5134 [inline]
| __schedule+0x4f8/0xb20 kernel/sched/core.c:6494
| preempt_schedule_irq+0x39/0x70 kernel/sched/core.c:6806
| irqentry_exit+0x32/0x90 kernel/entry/common.c:428
| asm_sysvec_apic_timer_interrupt+0x16/0x20 arch/x86/include/asm/idtentry.h:649
| RIP: 0010:try_to_freeze include/linux/freezer.h:66 [inline]
| RIP: 0010:freezer_count include/linux/freezer.h:128 [inline]
| RIP: 0010:coredump_wait fs/coredump.c:407 [inline]
| RIP: 0010:do_coredump+0x1193/0x1b60 fs/coredump.c:563
| Code: d3 25 df ff 83 e3 08 0f 84 f0 03 00 00 e8 c5 25 df ff 48 f7 85 88 fe ff ff 00 01 00 00 0f 85 7e fc ff ff 31 db e9 83 fc ff ff <e8> a8 25 df ff e8 63 43 d3 ff e9 d2 f1 ff ff e8 99 25 df ff 48 85
| RSP: 0018:ffffa69c8931fc30 EFLAGS: 00000246
| RAX: 7fffffffffffffff RBX: ffff8ea9934a0040 RCX: 0000000000000000
| RDX: 0000000000000001 RSI: ffffffffbb4ab491 RDI: 00000000ffffffff
| RBP: ffffa69c8931fdc0 R08: 0000000000000001 R09: 0000000000000001
| R10: 00000000ffffffff R11: 00000000ffffffff R12: ffff8ea9934a0040
| R13: ffffffffbb792620 R14: 0000000000000108 R15: 0000000000000001
| get_signal+0xe56/0xe60 kernel/signal.c:2843
| arch_do_signal_or_restart+0x3f/0x780 arch/x86/kernel/signal.c:869
| exit_to_user_mode_loop kernel/entry/common.c:166 [inline]
| exit_to_user_mode_prepare+0x135/0x1a0 kernel/entry/common.c:201
| __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
| syscall_exit_to_user_mode+0x1a/0x50 kernel/entry/common.c:294
| do_syscall_64+0x48/0x90 arch/x86/entry/common.c:86
| entry_SYSCALL_64_after_hwframe+0x64/0xce
| RIP: 0033:0x7fdcddc48549
| Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
| RSP: 002b:00007fdcddbb60f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
| RAX: 0000000000000001 RBX: 00007fdcddd62f88 RCX: 00007fdcddc48549
| RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00007fdcddd62f8c
| RBP: 00007fdcddd62f80 R08: 000000000000001e R09: 0000000000000000
| R10: 0000000000000003 R11: 0000000000000246 R12: 00007fdcddd62f8c
| R13: 00007ffc4136118f R14: 0000000000000000 R15: 00007fdcddbb6640
| </TASK>
| irq event stamp: 128
| hardirqs last enabled at (127): [<ffffffffba9f7237>] irqentry_exit+0x37/0x90 kernel/entry/common.c:431
| hardirqs last disabled at (128): [<ffffffffba9faa5b>] __schedule+0x6cb/0xb20 kernel/sched/core.c:6393
| softirqs last enabled at (106): [<ffffffffbae0034f>] softirq_handle_end kernel/softirq.c:414 [inline]
| softirqs last enabled at (106): [<ffffffffbae0034f>] __do_softirq+0x34f/0x4d5 kernel/softirq.c:600
| softirqs last disabled at (99): [<ffffffffb9693821>] invoke_softirq kernel/softirq.c:445 [inline]
| softirqs last disabled at (99): [<ffffffffb9693821>] __irq_exit_rcu+0xb1/0x120 kernel/softirq.c:650
| ---[ end trace 0000000000000000 ]---
| BUG: kernel NULL pointer dereference, address: 0000000000000000
| #PF: supervisor instruction fetch in kernel mode
| #PF: error_code(0x0010) - not-present page
| PGD 8000000112e91067 P4D 8000000112e91067 PUD 114481067 PMD 0
| Oops: 0010 [#1] PREEMPT SMP PTI
| CPU: 1 PID: 13018 Comm: syz-executor.11 Tainted: G W 6.0.0-rc3-00017-g1472d7e42f41-dirty #65
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
| RIP: 0010:0x0
| Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
| RSP: 0018:ffffa69c8931fd18 EFLAGS: 00010293
| RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffb96be917
| RDX: ffff8ea9934a0040 RSI: 0000000000000000 RDI: ffff8ea99526f620
| RBP: ffff8ea9934a0040 R08: 0000000000000000 R09: 0000000000000000
| R10: 0000000000000001 R11: ffffffffb9cb7eaf R12: ffff8ea9934a08f0
| R13: ffff8ea992fc9cf8 R14: ffff8ea98c65dec0 R15: ffff8ea9934a0828
| FS: 0000000000000000(0000) GS:ffff8eacafc40000(0000) knlGS:0000000000000000
| CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| CR2: ffffffffffffffd6 CR3: 000000011343a003 CR4: 0000000000770ee0
| DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
| DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
| PKRU: 55555554
| Call Trace:
| <TASK>
| task_work_run+0x73/0xc0 kernel/task_work.c:177
| exit_task_work include/linux/task_work.h:38 [inline]
| do_exit+0x459/0xf20 kernel/exit.c:795
| do_group_exit+0x3f/0xe0 kernel/exit.c:925
| get_signal+0xe04/0xe60 kernel/signal.c:2857
| arch_do_signal_or_restart+0x3f/0x780 arch/x86/kernel/signal.c:869
| exit_to_user_mode_loop kernel/entry/common.c:166 [inline]
| exit_to_user_mode_prepare+0x135/0x1a0 kernel/entry/common.c:201
| __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
| syscall_exit_to_user_mode+0x1a/0x50 kernel/entry/common.c:294
| do_syscall_64+0x48/0x90 arch/x86/entry/common.c:86
| entry_SYSCALL_64_after_hwframe+0x64/0xce
| RIP: 0033:0x7fdcddc48549
| Code: Unable to access opcode bytes at RIP 0x7fdcddc4851f.
| RSP: 002b:00007fdcddbb60f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
| RAX: 0000000000000001 RBX: 00007fdcddd62f88 RCX: 00007fdcddc48549
| RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00007fdcddd62f8c
| RBP: 00007fdcddd62f80 R08: 000000000000001e R09: 0000000000000000
| R10: 0000000000000003 R11: 0000000000000246 R12: 00007fdcddd62f8c
| R13: 00007ffc4136118f R14: 0000000000000000 R15: 00007fdcddbb6640
| </TASK>
| Modules linked in:
| CR2: 0000000000000000
| ---[ end trace 0000000000000000 ]---
| RIP: 0010:0x0
| Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
| RSP: 0018:ffffa69c8931fd18 EFLAGS: 00010293
| RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffb96be917
| RDX: ffff8ea9934a0040 RSI: 0000000000000000 RDI: ffff8ea99526f620
| RBP: ffff8ea9934a0040 R08: 0000000000000000 R09: 0000000000000000
| R10: 0000000000000001 R11: ffffffffb9cb7eaf R12: ffff8ea9934a08f0
| R13: ffff8ea992fc9cf8 R14: ffff8ea98c65dec0 R15: ffff8ea9934a0828
| FS: 0000000000000000(0000) GS:ffff8eacafc40000(0000) knlGS:0000000000000000
| CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| CR2: ffffffffffffffd6 CR3: 000000011343a003 CR4: 0000000000770ee0
| DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
| DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
| PKRU: 55555554
| ----------------
| Code disassembly (best guess):
| 0: d3 25 df ff 83 e3 shll %cl,-0x1c7c0021(%rip) # 0xe383ffe5
| 6: 08 0f or %cl,(%rdi)
| 8: 84 f0 test %dh,%al
| a: 03 00 add (%rax),%eax
| c: 00 e8 add %ch,%al
| e: c5 25 df ff vpandn %ymm7,%ymm11,%ymm15
| 12: 48 f7 85 88 fe ff ff testq $0x100,-0x178(%rbp)
| 19: 00 01 00 00
| 1d: 0f 85 7e fc ff ff jne 0xfffffca1
| 23: 31 db xor %ebx,%ebx
| 25: e9 83 fc ff ff jmp 0xfffffcad
| * 2a: e8 a8 25 df ff call 0xffdf25d7 <-- trapping instruction
| 2f: e8 63 43 d3 ff call 0xffd34397
| 34: e9 d2 f1 ff ff jmp 0xfffff20b
| 39: e8 99 25 df ff call 0xffdf25d7
| 3e: 48 rex.W
| 3f: 85 .byte 0x85


So something isn't quite right yet. Unfortunately I don't have a good
reproducer. :-/

Thanks,
-- Marco


2022-10-06 16:10:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] perf: Fix missing SIGTRAPs

On Thu, Oct 06, 2022 at 03:59:55PM +0200, Marco Elver wrote:

> That one I could fix up with:
>
> | diff --git a/kernel/events/core.c b/kernel/events/core.c
> | index 9319af6013f1..2f1d51b50be7 100644
> | --- a/kernel/events/core.c
> | +++ b/kernel/events/core.c
> | @@ -6563,6 +6563,7 @@ static void perf_pending_task(struct callback_head *head)
> | * If we 'fail' here, that's OK, it means recursion is already disabled
> | * and we won't recurse 'further'.
> | */
> | + preempt_disable_notrace();
> | rctx = perf_swevent_get_recursion_context();
> |
> | if (event->pending_work) {
> | @@ -6573,6 +6574,7 @@ static void perf_pending_task(struct callback_head *head)
> |
> | if (rctx >= 0)
> | perf_swevent_put_recursion_context(rctx);
> | + preempt_enable_notrace();
> | }
> |
> | #ifdef CONFIG_GUEST_PERF_EVENTS

Right, thanks! It appears I only have lockdep enabled but not the
preempt warning :/

> But following that, I get:
>

> | WARNING: CPU: 3 PID: 13018 at kernel/events/core.c:2288 event_sched_out+0x3f2/0x410 kernel/events/core.c:2288

I'm taking this is (my line numbers are slightly different):

WARN_ON_ONCE(event->pending_work);



> So something isn't quite right yet. Unfortunately I don't have a good
> reproducer. :-/

This can happen if we get two consecutive event_sched_out() and both
instances will have pending_sigtrap set. This can happen when the event
that has sigtrap set also triggers in kernel space.

You then get task_work list corruption and *boom*.

I'm thinking the below might be the simplest solution; we can only send
a single signal after all.


--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2293,9 +2293,10 @@ event_sched_out(struct perf_event *event
*/
local_dec(&event->ctx->nr_pending);
} else {
- WARN_ON_ONCE(event->pending_work);
- event->pending_work = 1;
- task_work_add(current, &event->pending_task, TWA_RESUME);
+ if (!event->pending_work) {
+ event->pending_work = 1;
+ task_work_add(current, &event->pending_task, TWA_RESUME);
+ }
}
}

2022-10-07 09:51:16

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH] perf: Fix missing SIGTRAPs

On Thu, Oct 06, 2022 at 06:02PM +0200, Peter Zijlstra wrote:

> This can happen if we get two consecutive event_sched_out() and both
> instances will have pending_sigtrap set. This can happen when the event
> that has sigtrap set also triggers in kernel space.
>
> You then get task_work list corruption and *boom*.
>
> I'm thinking the below might be the simplest solution; we can only send
> a single signal after all.

That worked. In addition I had to disable the ctx->task != current check
if we're in task_work, because presumably the event might have already
been disabled/moved??

At least with all the below fixups, things seem to work (tests +
light fuzzing).

Thanks,
-- Marco

------ >8 ------

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9319af6013f1..29ed6e58906b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2285,9 +2285,10 @@ event_sched_out(struct perf_event *event,
*/
local_dec(&event->ctx->nr_pending);
} else {
- WARN_ON_ONCE(event->pending_work);
- event->pending_work = 1;
- task_work_add(current, &event->pending_task, TWA_RESUME);
+ if (!event->pending_work) {
+ event->pending_work = 1;
+ task_work_add(current, &event->pending_task, TWA_RESUME);
+ }
}
}

@@ -6455,18 +6456,19 @@ void perf_event_wakeup(struct perf_event *event)
}
}

-static void perf_sigtrap(struct perf_event *event)
+static void perf_sigtrap(struct perf_event *event, bool in_task_work)
{
/*
* We'd expect this to only occur if the irq_work is delayed and either
* ctx->task or current has changed in the meantime. This can be the
* case on architectures that do not implement arch_irq_work_raise().
*/
- if (WARN_ON_ONCE(event->ctx->task != current))
+ if (WARN_ON_ONCE(!in_task_work && event->ctx->task != current))
return;

/*
- * perf_pending_irq() can race with the task exiting.
+ * Both perf_pending_task() and perf_pending_irq() can race with the
+ * task exiting.
*/
if (current->flags & PF_EXITING)
return;
@@ -6496,7 +6498,7 @@ static void __perf_pending_irq(struct perf_event *event)
if (event->pending_sigtrap) {
event->pending_sigtrap = 0;
local_dec(&event->ctx->nr_pending);
- perf_sigtrap(event);
+ perf_sigtrap(event, false);
}
if (event->pending_disable) {
event->pending_disable = 0;
@@ -6563,16 +6565,18 @@ static void perf_pending_task(struct callback_head *head)
* If we 'fail' here, that's OK, it means recursion is already disabled
* and we won't recurse 'further'.
*/
+ preempt_disable_notrace();
rctx = perf_swevent_get_recursion_context();

if (event->pending_work) {
event->pending_work = 0;
local_dec(&event->ctx->nr_pending);
- perf_sigtrap(event);
+ perf_sigtrap(event, true);
}

if (rctx >= 0)
perf_swevent_put_recursion_context(rctx);
+ preempt_enable_notrace();
}

#ifdef CONFIG_GUEST_PERF_EVENTS

2022-10-07 13:12:01

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] perf: Fix missing SIGTRAPs

On Fri, Oct 07, 2022 at 11:37:34AM +0200, Marco Elver wrote:

> That worked. In addition I had to disable the ctx->task != current check
> if we're in task_work, because presumably the event might have already
> been disabled/moved??

Uhmmm... uhhh... damn. (wall-time was significantly longer)

Does this help?

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6490,8 +6490,8 @@ static void __perf_pending_irq(struct pe
if (cpu == smp_processor_id()) {
if (event->pending_sigtrap) {
event->pending_sigtrap = 0;
- local_dec(&event->ctx->nr_pending);
perf_sigtrap(event);
+ local_dec(&event->ctx->nr_pending);
}
if (event->pending_disable) {
event->pending_disable = 0;
@@ -6563,8 +6563,8 @@ static void perf_pending_task(struct cal

if (event->pending_work) {
event->pending_work = 0;
- local_dec(&event->ctx->nr_pending);
perf_sigtrap(event);
+ local_dec(&event->ctx->nr_pending);
}

if (rctx >= 0)

2022-10-07 14:20:53

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH] perf: Fix missing SIGTRAPs

On Fri, Oct 07, 2022 at 03:09PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 07, 2022 at 11:37:34AM +0200, Marco Elver wrote:
>
> > That worked. In addition I had to disable the ctx->task != current check
> > if we're in task_work, because presumably the event might have already
> > been disabled/moved??
>
> Uhmmm... uhhh... damn. (wall-time was significantly longer)
>
> Does this help?

No unfortunately - still see:

[ 82.300827] ------------[ cut here ]------------
[ 82.301680] WARNING: CPU: 0 PID: 976 at kernel/events/core.c:6466 perf_sigtrap+0x60/0x70
[ 82.303069] Modules linked in:
[ 82.303524] CPU: 0 PID: 976 Comm: missed_breakpoi Not tainted 6.0.0-rc3-00017-g1472d7e42f41-dirty #68
[ 82.304825] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[ 82.306204] RIP: 0010:perf_sigtrap+0x60/0x70
[ 82.306858] Code: e6 59 fa ff 48 8b 93 50 01 00 00 8b b3 d8 00 00 00 48 8b bb 30 04 00 00 e8 dd cf e8 ff 5b 5d e9 c6 59 fa ff e8 c1 59 fa ff 90 <0f> 0b 90 5b 5d e9 b6 59 fa ff 66 0f 1f 44 00 00 e8 ab 59 fa ff bf
[ 82.309515] RSP: 0000:ffffa52041cbbee0 EFLAGS: 00010293
[ 82.310295] RAX: 0000000000000000 RBX: ffff902fc966a228 RCX: ffffffffa761a53f
[ 82.311336] RDX: ffff902fca39c340 RSI: 0000000000000000 RDI: ffff902fc966a228
[ 82.312376] RBP: ffff902fca39c340 R08: 0000000000000001 R09: 0000000000000001
[ 82.313412] R10: 00000000ffffffff R11: 00000000ffffffff R12: ffff902fca39cbf0
[ 82.314456] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 82.315561] FS: 00007fbae0636700(0000) GS:ffff9032efc00000(0000) knlGS:0000000000000000
[ 82.316815] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 82.317708] CR2: 000000001082d318 CR3: 0000000109430002 CR4: 0000000000770ef0
[ 82.318839] DR0: 00000000008aca98 DR1: 00000000008acb38 DR2: 00000000008acae8
[ 82.319955] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[ 82.321068] PKRU: 55555554
[ 82.321505] Call Trace:
[ 82.321913] <TASK>
[ 82.322267] perf_pending_task+0x7d/0xa0
[ 82.322900] task_work_run+0x73/0xc0
[ 82.323476] exit_to_user_mode_prepare+0x19d/0x1a0
[ 82.324209] irqentry_exit_to_user_mode+0x6/0x30
[ 82.324887] asm_sysvec_call_function_single+0x16/0x20
[ 82.325623] RIP: 0033:0x27d10b
[ 82.326092] Code: 43 08 48 8d 04 80 48 c1 e0 04 48 8d 0d 5e f9 62 00 48 01 c8 48 83 c0 08 b9 01 00 00 00 66 90 48 8b 10 48 39 ca 75 f8 88 48 41 <f0> 48 ff 40 08 48 8b 50 10 48 39 ca 75 f7 88 48 43 f0 48 ff 40 18
[ 82.328696] RSP: 002b:00007fbae0635a60 EFLAGS: 00000246
[ 82.329470] RAX: 00000000008acaa8 RBX: 000024073fc007d0 RCX: 0000000000001add
[ 82.330521] RDX: 0000000000001add RSI: 0000000000000070 RDI: 0000000000000007
[ 82.331557] RBP: 00007fbae0635a70 R08: 00007fbae0636700 R09: 00007fbae0636700
[ 82.332593] R10: 00007fbae06369d0 R11: 0000000000000202 R12: 00007fbae06369d0
[ 82.333630] R13: 00007ffe8139de16 R14: 00007fbae0636d1c R15: 00007fbae0635a80
[ 82.334713] </TASK>
[ 82.335093] irq event stamp: 546455
[ 82.335657] hardirqs last enabled at (546465): [<ffffffffa7513ef6>] __up_console_sem+0x66/0x70
[ 82.337032] hardirqs last disabled at (546476): [<ffffffffa7513edb>] __up_console_sem+0x4b/0x70
[ 82.338414] softirqs last enabled at (546084): [<ffffffffa8c0034f>] __do_softirq+0x34f/0x4d5
[ 82.339769] softirqs last disabled at (546079): [<ffffffffa7493821>] __irq_exit_rcu+0xb1/0x120
[ 82.341128] ---[ end trace 0000000000000000 ]---

I now have this on top:

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9319af6013f1..7de83c42d312 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2285,9 +2285,10 @@ event_sched_out(struct perf_event *event,
*/
local_dec(&event->ctx->nr_pending);
} else {
- WARN_ON_ONCE(event->pending_work);
- event->pending_work = 1;
- task_work_add(current, &event->pending_task, TWA_RESUME);
+ if (!event->pending_work) {
+ event->pending_work = 1;
+ task_work_add(current, &event->pending_task, TWA_RESUME);
+ }
}
}

@@ -6466,7 +6467,8 @@ static void perf_sigtrap(struct perf_event *event)
return;

/*
- * perf_pending_irq() can race with the task exiting.
+ * Both perf_pending_task() and perf_pending_irq() can race with the
+ * task exiting.
*/
if (current->flags & PF_EXITING)
return;
@@ -6495,8 +6497,8 @@ static void __perf_pending_irq(struct perf_event *event)
if (cpu == smp_processor_id()) {
if (event->pending_sigtrap) {
event->pending_sigtrap = 0;
- local_dec(&event->ctx->nr_pending);
perf_sigtrap(event);
+ local_dec(&event->ctx->nr_pending);
}
if (event->pending_disable) {
event->pending_disable = 0;
@@ -6563,16 +6565,18 @@ static void perf_pending_task(struct callback_head *head)
* If we 'fail' here, that's OK, it means recursion is already disabled
* and we won't recurse 'further'.
*/
+ preempt_disable_notrace();
rctx = perf_swevent_get_recursion_context();

if (event->pending_work) {
event->pending_work = 0;
- local_dec(&event->ctx->nr_pending);
perf_sigtrap(event);
+ local_dec(&event->ctx->nr_pending);
}

if (rctx >= 0)
perf_swevent_put_recursion_context(rctx);
+ preempt_enable_notrace();
}

#ifdef CONFIG_GUEST_PERF_EVENTS

I'm throwing more WARN_ON()s at it to see what's going on...

2022-10-07 17:13:40

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH] perf: Fix missing SIGTRAPs

On Fri, Oct 07, 2022 at 03:58PM +0200, Marco Elver wrote:
> On Fri, Oct 07, 2022 at 03:09PM +0200, Peter Zijlstra wrote:
> > On Fri, Oct 07, 2022 at 11:37:34AM +0200, Marco Elver wrote:
> >
> > > That worked. In addition I had to disable the ctx->task != current check
> > > if we're in task_work, because presumably the event might have already
> > > been disabled/moved??
> >
> > Uhmmm... uhhh... damn. (wall-time was significantly longer)
> >
> > Does this help?
>
> No unfortunately - still see:
>
> [ 82.300827] ------------[ cut here ]------------
> [ 82.301680] WARNING: CPU: 0 PID: 976 at kernel/events/core.c:6466 perf_sigtrap+0x60/0x70

Whenever the warning fires, I see that event->state is OFF.

2022-10-08 09:43:18

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH] perf: Fix missing SIGTRAPs

On Fri, Oct 07, 2022 at 06:14PM +0200, Marco Elver wrote:
> On Fri, Oct 07, 2022 at 03:58PM +0200, Marco Elver wrote:
> > On Fri, Oct 07, 2022 at 03:09PM +0200, Peter Zijlstra wrote:
> > > On Fri, Oct 07, 2022 at 11:37:34AM +0200, Marco Elver wrote:
> > >
> > > > That worked. In addition I had to disable the ctx->task != current check
> > > > if we're in task_work, because presumably the event might have already
> > > > been disabled/moved??
> > >
> > > Uhmmm... uhhh... damn. (wall-time was significantly longer)
> > >
> > > Does this help?
> >
> > No unfortunately - still see:
> >
> > [ 82.300827] ------------[ cut here ]------------
> > [ 82.301680] WARNING: CPU: 0 PID: 976 at kernel/events/core.c:6466 perf_sigtrap+0x60/0x70
>
> Whenever the warning fires, I see that event->state is OFF.

The below patch to the sigtrap_threads test can repro the issue (when
run lots of them concurrently again). It also illustrates the original
problem we're trying to solve, where the event never gets rearmed again
and the test times out (doesn't happen with the almost-working fix).

Thanks,
-- Marco

------ >8 ------

From 98d225bda6d94dd793a1d0c77ae4b301c364166e Mon Sep 17 00:00:00 2001
From: Marco Elver <[email protected]>
Date: Sat, 8 Oct 2022 10:26:58 +0200
Subject: [PATCH] selftests/perf_events: Add a SIGTRAP stress test with
disables

Add a SIGTRAP stress test that exercises repeatedly enabling/disabling
an event while it concurrently keeps firing.

Signed-off-by: Marco Elver <[email protected]>
---
.../selftests/perf_events/sigtrap_threads.c | 35 +++++++++++++++++--
1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/perf_events/sigtrap_threads.c b/tools/testing/selftests/perf_events/sigtrap_threads.c
index 6d849dc2bee0..d1d8483ac628 100644
--- a/tools/testing/selftests/perf_events/sigtrap_threads.c
+++ b/tools/testing/selftests/perf_events/sigtrap_threads.c
@@ -62,6 +62,8 @@ static struct perf_event_attr make_event_attr(bool enabled, volatile void *addr,
.remove_on_exec = 1, /* Required by sigtrap. */
.sigtrap = 1, /* Request synchronous SIGTRAP on event. */
.sig_data = TEST_SIG_DATA(addr, id),
+ .exclude_kernel = 1, /* To allow */
+ .exclude_hv = 1, /* running as !root */
};
return attr;
}
@@ -93,9 +95,13 @@ static void *test_thread(void *arg)

__atomic_fetch_add(&ctx.tids_want_signal, tid, __ATOMIC_RELAXED);
iter = ctx.iterate_on; /* read */
- for (i = 0; i < iter - 1; i++) {
- __atomic_fetch_add(&ctx.tids_want_signal, tid, __ATOMIC_RELAXED);
- ctx.iterate_on = iter; /* idempotent write */
+ if (iter >= 0) {
+ for (i = 0; i < iter - 1; i++) {
+ __atomic_fetch_add(&ctx.tids_want_signal, tid, __ATOMIC_RELAXED);
+ ctx.iterate_on = iter; /* idempotent write */
+ }
+ } else {
+ while (ctx.iterate_on);
}

return NULL;
@@ -208,4 +214,27 @@ TEST_F(sigtrap_threads, signal_stress)
EXPECT_EQ(ctx.first_siginfo.si_perf_data, TEST_SIG_DATA(&ctx.iterate_on, 0));
}

+TEST_F(sigtrap_threads, signal_stress_with_disable)
+{
+ const int target_count = NUM_THREADS * 3000;
+ int i;
+
+ ctx.iterate_on = -1;
+
+ EXPECT_EQ(ioctl(self->fd, PERF_EVENT_IOC_ENABLE, 0), 0);
+ pthread_barrier_wait(&self->barrier);
+ while (__atomic_load_n(&ctx.signal_count, __ATOMIC_RELAXED) < target_count) {
+ EXPECT_EQ(ioctl(self->fd, PERF_EVENT_IOC_DISABLE, 0), 0);
+ EXPECT_EQ(ioctl(self->fd, PERF_EVENT_IOC_ENABLE, 0), 0);
+ }
+ ctx.iterate_on = 0;
+ for (i = 0; i < NUM_THREADS; i++)
+ ASSERT_EQ(pthread_join(self->threads[i], NULL), 0);
+ EXPECT_EQ(ioctl(self->fd, PERF_EVENT_IOC_DISABLE, 0), 0);
+
+ EXPECT_EQ(ctx.first_siginfo.si_addr, &ctx.iterate_on);
+ EXPECT_EQ(ctx.first_siginfo.si_perf_type, PERF_TYPE_BREAKPOINT);
+ EXPECT_EQ(ctx.first_siginfo.si_perf_data, TEST_SIG_DATA(&ctx.iterate_on, 0));
+}
+
TEST_HARNESS_MAIN
--
2.38.0.rc1.362.ged0d419d3c-goog

2022-10-08 13:52:56

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] perf: Fix missing SIGTRAPs

On Sat, Oct 08, 2022 at 10:41:28AM +0200, Marco Elver wrote:
> The below patch to the sigtrap_threads test can repro the issue (when
> run lots of them concurrently again). It also illustrates the original
> problem we're trying to solve, where the event never gets rearmed again
> and the test times out (doesn't happen with the almost-working fix).

Excellent, that helps. Also, I'm an idiot ;-)

The below seems to fix it for me.

---
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3441,7 +3448,8 @@ static void perf_event_context_sched_out
perf_pmu_disable(pmu);

/* PMIs are disabled; ctx->nr_pending is stable. */
- if (local_read(&ctx->nr_pending)) {
+ if (local_read(&ctx->nr_pending) ||
+ local_read(&next_ctx->nr_pending)) {
/*
* Must not swap out ctx when there's pending
* events that rely on the ctx->task relation.

2022-10-08 14:06:44

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] perf: Fix missing SIGTRAPs

On Fri, Oct 07, 2022 at 03:58:03PM +0200, Marco Elver wrote:
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 9319af6013f1..7de83c42d312 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2285,9 +2285,10 @@ event_sched_out(struct perf_event *event,
> */
> local_dec(&event->ctx->nr_pending);
> } else {
> - WARN_ON_ONCE(event->pending_work);
> - event->pending_work = 1;
> - task_work_add(current, &event->pending_task, TWA_RESUME);
> + if (!event->pending_work) {
> + event->pending_work = 1;
> + task_work_add(current, &event->pending_task, TWA_RESUME);
> + }
else {
local_dec(&event->ctx->nr_pending);
}
> }
> }

That whole thing can be written much saner like:

if (event->pending_sigtrap) {
event->pending_sigtrap = 0;
if (state != PERF_EVENT_STATE_OFF &&
!event->pending_work) {
event->pending_work = 1;
local_inc(&event->ctx->nr_pending);
task_work_add(current, &event->pending_task, TWA_RESUME);
}
local_dec(&event->ctx->nr_pending);
}

Except now we have two nr_pending ops -- I'm torn.

2022-10-08 15:04:18

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] perf: Fix missing SIGTRAPs

On Sat, Oct 08, 2022 at 03:51:24PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 07, 2022 at 03:58:03PM +0200, Marco Elver wrote:
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 9319af6013f1..7de83c42d312 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -2285,9 +2285,10 @@ event_sched_out(struct perf_event *event,
> > */
> > local_dec(&event->ctx->nr_pending);
> > } else {
> > - WARN_ON_ONCE(event->pending_work);
> > - event->pending_work = 1;
> > - task_work_add(current, &event->pending_task, TWA_RESUME);
> > + if (!event->pending_work) {
> > + event->pending_work = 1;
> > + task_work_add(current, &event->pending_task, TWA_RESUME);
> > + }
> else {
> local_dec(&event->ctx->nr_pending);
> }
> > }
> > }
>
> That whole thing can be written much saner like:
>
> if (event->pending_sigtrap) {
> event->pending_sigtrap = 0;
> if (state != PERF_EVENT_STATE_OFF &&
> !event->pending_work) {
> event->pending_work = 1;
> local_inc(&event->ctx->nr_pending);
> task_work_add(current, &event->pending_task, TWA_RESUME);
> }
> local_dec(&event->ctx->nr_pending);
> }
>
> Except now we have two nr_pending ops -- I'm torn.

I've settled for:

+ if (event->pending_sigtrap) {
+ bool dec = true;
+
+ event->pending_sigtrap = 0;
+ if (state != PERF_EVENT_STATE_OFF &&
+ !event->pending_work) {
+ event->pending_work = 1;
+ dec = false;
+ task_work_add(current, &event->pending_task, TWA_RESUME);
+ }
+ if (dec)
+ local_dec(&event->ctx->nr_pending);
+ }

2022-10-10 20:59:11

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH] perf: Fix missing SIGTRAPs

On Sat, Oct 08, 2022 at 02:41PM +0200, Peter Zijlstra wrote:
> On Sat, Oct 08, 2022 at 10:41:28AM +0200, Marco Elver wrote:
> > The below patch to the sigtrap_threads test can repro the issue (when
> > run lots of them concurrently again). It also illustrates the original
> > problem we're trying to solve, where the event never gets rearmed again
> > and the test times out (doesn't happen with the almost-working fix).
>
> Excellent, that helps. Also, I'm an idiot ;-)
>
> The below seems to fix it for me.
>
> ---
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3441,7 +3448,8 @@ static void perf_event_context_sched_out
> perf_pmu_disable(pmu);
>
> /* PMIs are disabled; ctx->nr_pending is stable. */
> - if (local_read(&ctx->nr_pending)) {
> + if (local_read(&ctx->nr_pending) ||
> + local_read(&next_ctx->nr_pending)) {
> /*
> * Must not swap out ctx when there's pending
> * events that rely on the ctx->task relation.

Yup, that fixes it.

Can you send a v2 with all the fixups? Just to make sure I've tested the
right thing.

I'll also send the patch for the selftest addition once I gave it a good
spin.

Thanks,
-- Marco