2023-07-25 20:32:37

by Arnaldo Carvalho de Melo

Subject: 'perf test sigtrap' failing on PREEMPT_RT_FULL

Hi Marco, Peter,

I got a report that 'perf test sigtrap' test failed on a
PREEMPT_RT_FULL kernel, one that had up to:

commit 97ba62b278674293762c3d91f724f1bb922f04e0
Author: Marco Elver <[email protected]>
Date: Thu Apr 8 12:36:01 2021 +0200

perf: Add support for SIGTRAP on perf events


It failed with no SIGTRAP delivered: none of the expected 15000
signals (from nr-threads and iterations) arrived.

Then I tried backporting up to the perf subsystem refactorings,
and then to what is in
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/log/?h=linux-6.4.y-rt,
but in both cases I ended up with the splat at the end of this message.

I'll continue investigating, but thought it would be a good time
to report it.

- Arnaldo


[ 52.848925] BUG: scheduling while atomic: perf/6549/0x00000002
[ 52.848925] BUG: scheduling while atomic: perf/6547/0x00000002
[ 52.848931] Modules linked in:
[ 52.848931] Modules linked in:
[ 52.848932] nft_fib_inet
[ 52.848931] BUG: scheduling while atomic: perf/6548/0x00000002
[ 52.848932] nft_fib_inet nft_fib_ipv4

<SNIP>

[ 52.849055] Preemption disabled at:
[ 52.849055] Preemption disabled at:
[ 52.849055] Preemption disabled at:
[ 52.849056] [<0000000000000000>] 0x0
[ 52.849056] [<0000000000000000>] 0x0
[ 52.849056] [<0000000000000000>] 0x0
[ 52.849061] CPU: 2 PID: 6549 Comm: perf Not tainted 6.4.0-rt6+ #2
[ 52.849064] Hardware name: LENOVO 427623U/427623U, BIOS 8BET45WW (1.25 ) 05/18/2011
[ 52.849066] Call Trace:
[ 52.849069] <TASK>
[ 52.849071] dump_stack_lvl+0x33/0x50
[ 52.849077] __schedule_bug+0x9a/0xb0
[ 52.849082] schedule_debug.constprop.0+0x10f/0x140
[ 52.849086] __schedule+0x50/0x6c0
[ 52.849090] ? _raw_spin_lock+0x13/0x40
[ 52.849093] ? task_blocks_on_rt_mutex.constprop.0.isra.0+0x1b2/0x440
[ 52.849096] schedule_rtlock+0x1e/0x40
[ 52.849099] rtlock_slowlock_locked+0xf2/0x360
[ 52.849102] ? perf_ctx_enable+0x44/0x60
[ 52.849106] rt_spin_lock+0x41/0x60
[ 52.849109] do_send_sig_info+0x32/0xb0
[ 52.849112] send_sig_perf+0x70/0x90
[ 52.849116] perf_pending_task+0xb1/0xd0
[ 52.849119] task_work_run+0x59/0x90
[ 52.849123] exit_to_user_mode_loop+0x128/0x130
[ 52.849128] exit_to_user_mode_prepare+0xbd/0xd0
[ 52.849131] irqentry_exit_to_user_mode+0x5/0x30
[ 52.849135] asm_sysvec_irq_work+0x16/0x20
[ 52.849138] RIP: 0033:0x55aa823bed63
[ 52.849141] Code: 8b 04 25 28 00 00 00 48 89 45 e8 31 c0 e8 e5 03 f5 ff 4c 89 e7 48 89 c3 e8 da 1c f5 ff f0 01 1d a3 f4 91 00 8b 05 a5 f4 91 00 <83> f8 01 7e 1f 89 d9 31 d2 0f 1f 40 00 f0 01 0d 89 f4 91 00 8b 05
[ 52.849143] RSP: 002b:00007f31341ccdb0 EFLAGS: 00000206
[ 52.849145] RAX: 0000000000000bb8 RBX: 0000000000001995 RCX: 00007f31369aab44
[ 52.849147] RDX: 0000000000000000 RSI: 0000000000000081 RDI: 00007fffcdbf8384
[ 52.849148] RBP: 00007f31341ccdd0 R08: 00007fffcdbf8380 R09: 0000000000000006
[ 52.849150] R10: 0000000000000000 R11: 0000000000000286 R12: 00007fffcdbf8380
[ 52.849151] R13: 000000000000000d R14: 00007f31369ac530 R15: 0000000000000000
[ 52.849156] </TASK>
[ 52.849157] CPU: 3 PID: 6547 Comm: perf Not tainted 6.4.0-rt6+ #2

[acme@nine linux]$ git log --oneline -10
d37d728e9a66 (HEAD, tag: v6.4-rt6, linux-rt-devel/linux-6.4.y-rt) v6.4-rt6
4d1139baae8b mm/page_alloc: Use write_seqlock_irqsave() instead write_seqlock() + local_irq_save().
dc93c1f07d48 seqlock: Do the lockdep annotation before locking in do_write_seqcount_begin_nested()
a3f6be6e5353 printk: Check only for migration in printk_deferred_*().
6dc15eb2a631 bpf: Remove in_atomic() from bpf_link_put().
0ccbab373cd7 (tag: v6.4-rt5) v6.4-rt5
786cdf91804d Merge tag 'v6.4' into linux-6.4.y-rt
6995e2de6891 (tag: v6.4, linux-rt-devel/master, linux-rt-devel/linux-6.4.y) Linux 6.4
e3b2e2c14bcc Merge tag 'i2c-for-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
547cc9be86f4 Merge tag 'perf_urgent_for_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
[acme@nine linux]$

[acme@nine linux]$ uname -a
Linux nine 6.4.0-rt6+ #2 SMP PREEMPT_RT Sat Jul 22 08:52:53 -03 2023 x86_64 x86_64 x86_64 GNU/Linux

[acme@nine linux]$ cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="9.2 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.2"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux 9.2 (Plow)"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_BUGZILLA_PRODUCT_VERSION=9.2
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.2"
[acme@nine linux]$


2023-07-26 06:58:08

by Mike Galbraith

Subject: Re: 'perf test sigtrap' failing on PREEMPT_RT_FULL

On Tue, 2023-07-25 at 17:15 -0300, Arnaldo Carvalho de Melo wrote:
> Hi Marco, Peter,
>
>         I got a report that 'perf test sigtrap' test failed on a
> PREEMPT_RT_FULL kernel, one that had up to:
>
> commit 97ba62b278674293762c3d91f724f1bb922f04e0
> Author: Marco Elver <[email protected]>
> Date:   Thu Apr 8 12:36:01 2021 +0200
>
>     perf: Add support for SIGTRAP on perf events
> ...

> [   52.848925] BUG: scheduling while atomic: perf/6549/0x00000002

Had bf9ad37dc8a not been reverted due to insufficient beauty, you could
trivially make the sigtrap test a happy camper (wart tested in tip-rt).

-Mike

@@ -1829,6 +1869,9 @@ int send_sig_perf(void __user *addr, u32
TRAP_PERF_FLAG_ASYNC :
0;

+ if (force_sig_delayed(&info, current))
+ return 0;
+
return send_sig_info(info.si_signo, &info, current);
}




2023-07-26 15:57:54

by Arnaldo Carvalho de Melo

Subject: Re: 'perf test sigtrap' failing on PREEMPT_RT_FULL

Em Wed, Jul 26, 2023 at 08:10:45AM +0200, Mike Galbraith escreveu:
> On Tue, 2023-07-25 at 17:15 -0300, Arnaldo Carvalho de Melo wrote:
> > Hi Marco, Peter,

> >         I got a report that 'perf test sigtrap' test failed on a
> > PREEMPT_RT_FULL kernel, one that had up to:

> > commit 97ba62b278674293762c3d91f724f1bb922f04e0
> > Author: Marco Elver <[email protected]>
> > Date:   Thu Apr 8 12:36:01 2021 +0200

> >     perf: Add support for SIGTRAP on perf events
> > ...

> > [   52.848925] BUG: scheduling while atomic: perf/6549/0x00000002

> Had bf9ad37dc8a not been reverted due to insufficient beauty, you could
> trivially make the sigtrap test a happy camper (wart tested in tip-rt).

Yeah, I cherry-picked bf9ad37dc8a:

Author: Oleg Nesterov <[email protected]>
Date: Tue Jul 14 14:26:34 2015 +0200

signal, x86: Delay calling signals in atomic on RT enabled kernels

Applied your force_sig_delayed() call to send_sig_perf() and got:

[root@nine ~]# perf test sigtrap
73: Sigtrap : Ok
[root@nine ~]#

Happy camper indeed.

[acme@nine linux]$ git log --oneline -5
24f75a478a32 (HEAD) signal, x86: Delay calling signals in atomic on RT enabled kernels
d37d728e9a66 (tag: v6.4-rt6, linux-rt-devel/linux-6.4.y-rt) v6.4-rt6
4d1139baae8b mm/page_alloc: Use write_seqlock_irqsave() instead write_seqlock() + local_irq_save().
dc93c1f07d48 seqlock: Do the lockdep annotation before locking in do_write_seqcount_begin_nested()
a3f6be6e5353 printk: Check only for migration in printk_deferred_*().
[acme@nine linux]$ git diff
diff --git a/kernel/signal.c b/kernel/signal.c
index 464e68a8a273..f186e0d85381 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1868,6 +1868,9 @@ int send_sig_perf(void __user *addr, u32 type, u64 sig_data)
TRAP_PERF_FLAG_ASYNC :
0;

+ if (force_sig_delayed(&info, current))
+ return 0;
+
return send_sig_info(info.si_signo, &info, current);
}

[acme@nine linux]$ uname -a
Linux nine 6.4.0-rt6+ #3 SMP PREEMPT_RT Wed Jul 26 11:46:12 -03 2023 x86_64 x86_64 x86_64 GNU/Linux
[acme@nine linux]$ perf test sigtrap
71: Sigtrap : Ok
[acme@nine linux]$

- Arnaldo

> @@ -1829,6 +1869,9 @@ int send_sig_perf(void __user *addr, u32
> TRAP_PERF_FLAG_ASYNC :
> 0;
>
> + if (force_sig_delayed(&info, current))
> + return 0;
> +
> return send_sig_info(info.si_signo, &info, current);
> }

by Sebastian Andrzej Siewior

Subject: Re: 'perf test sigtrap' failing on PREEMPT_RT_FULL

On 2023-07-26 08:10:45 [+0200], Mike Galbraith wrote:
> > [   52.848925] BUG: scheduling while atomic: perf/6549/0x00000002
>
> Had bf9ad37dc8a not been reverted due to insufficient beauty, you could
> trivially make the sigtrap test a happy camper (wart tested in tip-rt).

Thank you for the pointer Mike.

I guess we need this preempt_disable_notrace() in perf_pending_task()
due to context accounting in get_recursion_context(). Would a
migrate_disable() be sufficient or could we send the signal outside of
the preempt-disabled block?

This is also used in perf_pending_irq() and on PREEMPT_RT this is
invoked from softirq context which is preemptible.

Sebastian

2024-01-04 22:36:10

by Arnaldo Carvalho de Melo

Subject: Re: 'perf test sigtrap' failing on PREEMPT_RT_FULL

Em Fri, Jul 28, 2023 at 05:07:18PM +0200, Sebastian Andrzej Siewior escreveu:
> On 2023-07-26 08:10:45 [+0200], Mike Galbraith wrote:
> > > [   52.848925] BUG: scheduling while atomic: perf/6549/0x00000002

> > Had bf9ad37dc8a not been reverted due to insufficient beauty, you could
> > trivially make the sigtrap test a happy camper (wart tested in tip-rt).

> Thank you for the pointer Mike.

> I guess we need this preempt_disable_notrace() in perf_pending_task()
> due to context accounting in get_recursion_context(). Would a
> migrate_disable() be sufficient or could we send the signal outside of
> the preempt-disabled block?

I got back to this. I still need to go over all the callers of
perf_swevent_get_recursion_context() again; at first glance there are
other places with preempt_disable()/enable(), but just switching
perf_pending_task() to migrate_disable()/migrate_enable() makes this
specific test work:

[acme@nine linux]$ git log --oneline -5
086dab66d504 (HEAD -> linux-rt-devel/linux-6.7.y-rt/send_sig_perf.fix, tag: v6.7-rc5-rt5, linux-rt-devel/linux-6.7.y-rt) v6.7-rc5-rt5
29e0d951f39b printk: Update the printk series.
2308ecc8ce88 (tag: v6.7-rc5-rt4) v6.7-rc5-rt4
10d5f3551216 Merge tag 'v6.7-rc5' into linux-6.7.y-rt
a39b6ac3781d (tag: v6.7-rc5, linux-rt-devel/master, linux-rt-devel/linux-6.7.y) Linux 6.7-rc5
[acme@nine linux]$ git diff
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c9d123e13b57..a9b9ef60f6b3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6801,7 +6801,7 @@ static void perf_pending_task(struct callback_head *head)
* If we 'fail' here, that's OK, it means recursion is already disabled
* and we won't recurse 'further'.
*/
- preempt_disable_notrace();
+ migrate_disable();
rctx = perf_swevent_get_recursion_context();

if (event->pending_work) {
@@ -6812,7 +6812,7 @@ static void perf_pending_task(struct callback_head *head)

if (rctx >= 0)
perf_swevent_put_recursion_context(rctx);
- preempt_enable_notrace();
+ migrate_enable();

put_event(event);
}
[acme@nine linux]$ uname -a
Linux nine 6.7.0-rc5-rt5.sigtrap-fix-dirty #2 SMP PREEMPT_RT Thu Jan 4 18:11:44 -03 2024 x86_64 x86_64 x86_64 GNU/Linux
[acme@nine linux]$ sudo su -
[sudo] password for acme:
[root@nine ~]#
[root@nine ~]# perf test sigtrap
68: Sigtrap : Ok
[root@nine ~]#
[root@nine ~]# perf probe -L perf_pending_task
<perf_pending_task@/home/acme/git/linux/kernel/events/core.c:0>
0 static void perf_pending_task(struct callback_head *head)
{
2 struct perf_event *event = container_of(head, struct perf_event, pending_task);
3 int rctx;

/*
* If we 'fail' here, that's OK, it means recursion is already disabled
* and we won't recurse 'further'.
*/
migrate_disable();
10 rctx = perf_swevent_get_recursion_context();

12 if (event->pending_work) {
13 event->pending_work = 0;
14 perf_sigtrap(event);
15 local_dec(&event->ctx->nr_pending);
}

18 if (rctx >= 0)
19 perf_swevent_put_recursion_context(rctx);
20 migrate_enable();

22 put_event(event);
}

#ifdef CONFIG_GUEST_PERF_EVENTS

[root@nine ~]# perf probe perf_pending_task
Added new event:
probe:perf_pending_task (on perf_pending_task)

You can now use it in all perf tools, such as:

perf record -e probe:perf_pending_task -aR sleep 1

[root@nine ~]# perf trace --max-events=1 -e probe:perf_pending_task/max-stack=6/ perf test sigtrap
68: Sigtrap : Ok
0.000 :9608/9608 probe:perf_pending_task(__probe_ip: -2064408784)
perf_pending_task ([kernel.kallsyms])
task_work_run ([kernel.kallsyms])
exit_to_user_mode_loop ([kernel.kallsyms])
exit_to_user_mode_prepare ([kernel.kallsyms])
irqentry_exit_to_user_mode ([kernel.kallsyms])
asm_sysvec_irq_work ([kernel.kallsyms])
[root@nine ~]#

[root@nine ~]# head -5 /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="9.2 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.2"
[root@nine ~]#

I did the test without the above patch and the original problem is
reproduced.

> This is also used in perf_pending_irq() and on PREEMPT_RT this is
> invoked from softirq context which is preemptible.

Right.

- Arnaldo

2024-02-21 19:38:15

by Arnaldo Carvalho de Melo

Subject: Re: 'perf test sigtrap' failing on PREEMPT_RT_FULL

On Thu, 4 Jan 2024 19:35:57 -0300, Arnaldo Carvalho de Melo wrote:
> Em Fri, Jul 28, 2023 at 05:07:18PM +0200, Sebastian Andrzej Siewior escreveu:
> > On 2023-07-26 08:10:45 [+0200], Mike Galbraith wrote:
> > > > [   52.848925] BUG: scheduling while atomic: perf/6549/0x00000002

> > > Had bf9ad37dc8a not been reverted due to insufficient beauty, you could
> > > trivially make the sigtrap test a happy camper (wart tested in tip-rt).

> > Thank you for the pointer Mike.

> > I guess we need this preempt_disable_notrace() in perf_pending_task()
> > due to context accounting in get_recursion_context(). Would a
> > migrate_disable() be sufficient or could we send the signal outside of
> > the preempt-disabled block?

> I got back to this, need to go again over all the callers of
> perf_swevent_get_recursion_context(), from the first quick glance there
> are other places with preempt_disable()/enable(), but doing just the
> switch to migrate disable/enable on perf_pending_task() makes this
> specific test to work:

> [acme@nine linux]$ git log --oneline -5
> 086dab66d504 (HEAD -> linux-rt-devel/linux-6.7.y-rt/send_sig_perf.fix, tag: v6.7-rc5-rt5, linux-rt-devel/linux-6.7.y-rt) v6.7-rc5-rt5
> 29e0d951f39b printk: Update the printk series.
> 2308ecc8ce88 (tag: v6.7-rc5-rt4) v6.7-rc5-rt4
> 10d5f3551216 Merge tag 'v6.7-rc5' into linux-6.7.y-rt
> a39b6ac3781d (tag: v6.7-rc5, linux-rt-devel/master, linux-rt-devel/linux-6.7.y) Linux 6.7-rc5
> [acme@nine linux]$ git diff
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index c9d123e13b57..a9b9ef60f6b3 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6801,7 +6801,7 @@ static void perf_pending_task(struct callback_head *head)
> * If we 'fail' here, that's OK, it means recursion is already disabled
> * and we won't recurse 'further'.
> */
>- preempt_disable_notrace();
>+ migrate_disable();
> rctx = perf_swevent_get_recursion_context();

Pardon my ignorance, is it safe to call preempt_count() with preemption
enabled on PREEMPT_RT, or at least in the context being discussed here?

Because:

perf_swevent_get_recursion_context()
get_recursion_context()
interrupt_context_level()
preempt_count()

And:

int perf_swevent_get_recursion_context(void)
{
struct swevent_htable *swhash = this_cpu_ptr(&swevent_htable);

return get_recursion_context(swhash->recursion);
}

> if (event->pending_work) {
> @@ -6812,7 +6812,7 @@ static void perf_pending_task(struct callback_head *head)

> if (rctx >= 0)
> perf_swevent_put_recursion_context(rctx);
> - preempt_enable_notrace();
> + migrate_enable();

> put_event(event);
> }
> [acme@nine linux]$ uname -a
> Linux nine 6.7.0-rc5-rt5.sigtrap-fix-dirty #2 SMP PREEMPT_RT Thu Jan 4 18:11:44 -03 2024 x86_64 x86_64 x86_64 GNU/Linux
> [acme@nine linux]$ sudo su -
> [sudo] password for acme:
> [root@nine ~]#
> [root@nine ~]# perf test sigtrap
> 68: Sigtrap : Ok
> [root@nine ~]#
> [root@nine ~]# perf probe -L perf_pending_task
> <perf_pending_task@/home/acme/git/linux/kernel/events/core.c:0>
> 0 static void perf_pending_task(struct callback_head *head)
> {
> 2 struct perf_event *event = container_of(head, struct perf_event, pending_task);
> 3 int rctx;

> /*
> * If we 'fail' here, that's OK, it means recursion is already disabled
> * and we won't recurse 'further'.
> */
> migrate_disable();
> 10 rctx = perf_swevent_get_recursion_context();
>
> 12 if (event->pending_work) {
> 13 event->pending_work = 0;
> 14 perf_sigtrap(event);
> 15 local_dec(&event->ctx->nr_pending);
> }
>
> 18 if (rctx >= 0)
> 19 perf_swevent_put_recursion_context(rctx);
> 20 migrate_enable();

> 22 put_event(event);
> }

> #ifdef CONFIG_GUEST_PERF_EVENTS

> [root@nine ~]# perf probe perf_pending_task
> Added new event:
> probe:perf_pending_task (on perf_pending_task)

> You can now use it in all perf tools, such as:

> perf record -e probe:perf_pending_task -aR sleep 1

> [root@nine ~]# perf trace --max-events=1 -e probe:perf_pending_task/max-stack=6/ perf test sigtrap
> 68: Sigtrap : Ok
> 0.000 :9608/9608 probe:perf_pending_task(__probe_ip: -2064408784)
> perf_pending_task ([kernel.kallsyms])
> task_work_run ([kernel.kallsyms])
> exit_to_user_mode_loop ([kernel.kallsyms])
> exit_to_user_mode_prepare ([kernel.kallsyms])
> irqentry_exit_to_user_mode ([kernel.kallsyms])
> asm_sysvec_irq_work ([kernel.kallsyms])
> [root@nine ~]#

> [root@nine ~]# head -5 /etc/os-release
> NAME="Red Hat Enterprise Linux"
> VERSION="9.2 (Plow)"
> ID="rhel"
> ID_LIKE="fedora"
> VERSION_ID="9.2"
> [root@nine ~]#

> I did the test without the above patch and the original problem is
> reproduced.

> > This is also used in perf_pending_irq() and on PREEMPT_RT this is
> > invoked from softirq context which is preemptible.

Hmm, and yet when going through perf_pending_irq() we don't hit that
'scheduling while atomic' splat.

- Arnaldo

2024-03-06 16:28:05

by Arnaldo Carvalho de Melo

Subject: Re: 'perf test sigtrap' failing on PREEMPT_RT_FULL

On Wed, Mar 06, 2024 at 10:06:30AM -0300, Arnaldo Carvalho de Melo wrote:
> > On Thu, 4 Jan 2024 19:35:57 -0300, Arnaldo Carvalho de Melo wrote:
> > > +++ b/kernel/events/core.c
> > > @@ -6801,7 +6801,7 @@ static void perf_pending_task(struct callback_head *head)
> > > * If we 'fail' here, that's OK, it means recursion is already disabled
> > > * and we won't recurse 'further'.
> > > */
> > >- preempt_disable_notrace();
> > >+ migrate_disable();
> > > rctx = perf_swevent_get_recursion_context();
>
> > Pardon my ignorance, is it safe to call preempt_count() with preemption
> > enabled on PREEMPT_RT, or at least in the context being discussed here?
>
> > Because:
>
> > perf_swevent_get_recursion_context()
> > get_recursion_context()
> > interrupt_context_level()
> > preempt_count()
>
> > And:
>
> > int perf_swevent_get_recursion_context(void)
> > {
> > struct swevent_htable *swhash = this_cpu_ptr(&swevent_htable);
> >
> > return get_recursion_context(swhash->recursion);
> > }
>
> Seems to be enough because perf_pending_task is a irq_work callback and

s/irq_work/task_work/ but that also doesn't re-enter, I think

> that is guaranteed not to reentry?
>
> Artem's tests with a RHEL kernel seems to indicate that, ditto for my,
> will test with upstream linux-6.8.y-rt.
>
> But there is a lot more happening in perf_sigtrap and I'm not sure if
> the irq_work callback gets preempted we would not race with something
> else.
>
> Marco, Mike, ideas?

Looking at:

commit ca6c21327c6af02b7eec31ce4b9a740a18c6c13f
Author: Peter Zijlstra <[email protected]>
Date: Thu Oct 6 15:00:39 2022 +0200

perf: Fix missing SIGTRAPs

Marco reported:

Due to the implementation of how SIGTRAP are delivered if
perf_event_attr::sigtrap is set, we've noticed 3 issues:

1. Missing SIGTRAP due to a race with event_sched_out() (more
details below).

2. Hardware PMU events being disabled due to returning 1 from
perf_event_overflow(). The only way to re-enable the event is
for user space to first "properly" disable the event and then
re-enable it.

3. The inability to automatically disable an event after a
specified number of overflows via PERF_EVENT_IOC_REFRESH.

The worst of the 3 issues is problem (1), which occurs when a
pending_disable is "consumed" by a racing event_sched_out(), observed
as follows:

-------------------------------------------------------------

That is what introduced perf_pending_task(). I'm now unsure we can just
disable migration, as event_sched_out() seems to require being called
under a raw_spin_lock, and that disables preemption...

- Arnaldo

by Sebastian Andrzej Siewior

Subject: Re: 'perf test sigtrap' failing on PREEMPT_RT_FULL

On 2024-03-06 13:27:06 [-0300], Arnaldo Carvalho de Melo wrote:
> That is what introduces perf_pending_task(), I'm now unsure we can just
> disable migration, as event_sched_out() seems to require being called
> under a raw_spin_lock and that disables preemption...

Not sure what the best course of action is here but based on what I
learned last time you reported this I think we need delayed signals…
Let me look into this. We had it and then removed it because we had no
users of it at some point but probably nobody took perf into account.

> - Arnaldo

Sebastian

by Sebastian Andrzej Siewior

Subject: Re: 'perf test sigtrap' failing on PREEMPT_RT_FULL

On 2024-03-06 17:54:45 [+0100], Sebastian Andrzej Siewior wrote:
> Not sure what the best course of action is here but based on what I
> learned last time you reported this I think we need delayed signals…
> Let me look into this. We had it and then removed it because we had no
> users of it at some point but probably nobody took perf into account.

=> https://lore.kernel.org/[email protected]

> - Arnaldo

Sebastian