Subject: [PATCH v3 3/4] perf: Remove perf_swevent_get_recursion_context() from perf_pending_task().

perf_swevent_get_recursion_context() is supposed to avoid recursion.
This requires to remain on the same CPU in order to decrement/ increment
the same counter. This is done by using preempt_disable(). Having
preemption disabled while sending a signal leads to locking problems on
PREEMPT_RT because sighand, a spinlock_t, becomes a sleeping lock.

This callback runs in task context and currently delivers only a signal
to "itself". Any kind of recusrion protection in this context is not
required.

Remove recursion protection in perf_pending_task().

Tested-by: Marco Elver <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Reported-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
kernel/events/core.c | 12 ------------
1 file changed, 12 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index e0b2da8de485f..5400f7ed2f98b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6785,14 +6785,6 @@ static void perf_pending_irq(struct irq_work *entry)
static void perf_pending_task(struct callback_head *head)
{
struct perf_event *event = container_of(head, struct perf_event, pending_task);
- int rctx;
-
- /*
- * If we 'fail' here, that's OK, it means recursion is already disabled
- * and we won't recurse 'further'.
- */
- preempt_disable_notrace();
- rctx = perf_swevent_get_recursion_context();

if (event->pending_work) {
event->pending_work = 0;
@@ -6800,10 +6792,6 @@ static void perf_pending_task(struct callback_head *head)
local_dec(&event->ctx->nr_pending);
}

- if (rctx >= 0)
- perf_swevent_put_recursion_context(rctx);
- preempt_enable_notrace();
-
put_event(event);
}

--
2.43.0



2024-04-08 22:06:11

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] perf: Remove perf_swevent_get_recursion_context() from perf_pending_task().

Le Fri, Mar 22, 2024 at 07:48:23AM +0100, Sebastian Andrzej Siewior a ?crit :
> perf_swevent_get_recursion_context() is supposed to avoid recursion.
> This requires to remain on the same CPU in order to decrement/ increment
> the same counter. This is done by using preempt_disable(). Having
> preemption disabled while sending a signal leads to locking problems on
> PREEMPT_RT because sighand, a spinlock_t, becomes a sleeping lock.
>
> This callback runs in task context and currently delivers only a signal
> to "itself". Any kind of recusrion protection in this context is not
> required.
>
> Remove recursion protection in perf_pending_task().
>
> Tested-by: Marco Elver <[email protected]>
> Tested-by: Arnaldo Carvalho de Melo <[email protected]>
> Reported-by: Arnaldo Carvalho de Melo <[email protected]>
> Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
> ---
> kernel/events/core.c | 12 ------------
> 1 file changed, 12 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index e0b2da8de485f..5400f7ed2f98b 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6785,14 +6785,6 @@ static void perf_pending_irq(struct irq_work *entry)
> static void perf_pending_task(struct callback_head *head)
> {
> struct perf_event *event = container_of(head, struct perf_event, pending_task);
> - int rctx;
> -
> - /*
> - * If we 'fail' here, that's OK, it means recursion is already disabled
> - * and we won't recurse 'further'.
> - */
> - preempt_disable_notrace();
> - rctx = perf_swevent_get_recursion_context();
>
> if (event->pending_work) {
> event->pending_work = 0;
> @@ -6800,10 +6792,6 @@ static void perf_pending_task(struct callback_head *head)
> local_dec(&event->ctx->nr_pending);
> }
>
> - if (rctx >= 0)
> - perf_swevent_put_recursion_context(rctx);
> - preempt_enable_notrace();

Well, if a software event happens during perf_sigtrap(), the task work
may be requeued endlessly and the task may get stuck in task_work_run()...

> -
> put_event(event);
> }
>
> --
> 2.43.0
>
>

Subject: Re: [PATCH v3 3/4] perf: Remove perf_swevent_get_recursion_context() from perf_pending_task().

On 2024-04-09 00:06:00 [+0200], Frederic Weisbecker wrote:
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index e0b2da8de485f..5400f7ed2f98b 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -6785,14 +6785,6 @@ static void perf_pending_irq(struct irq_work *entry)
> > static void perf_pending_task(struct callback_head *head)
> > {
> > struct perf_event *event = container_of(head, struct perf_event, pending_task);
> > - int rctx;
> > -
> > - /*
> > - * If we 'fail' here, that's OK, it means recursion is already disabled
> > - * and we won't recurse 'further'.
> > - */
> > - preempt_disable_notrace();
> > - rctx = perf_swevent_get_recursion_context();
> >
> > if (event->pending_work) {
> > event->pending_work = 0;
> > @@ -6800,10 +6792,6 @@ static void perf_pending_task(struct callback_head *head)
> > local_dec(&event->ctx->nr_pending);
> > }
> >
> > - if (rctx >= 0)
> > - perf_swevent_put_recursion_context(rctx);
> > - preempt_enable_notrace();
>
> Well, if a software event happens during perf_sigtrap(), the task work
> may be requeued endlessly and the task may get stuck in task_work_run()...

The last time I checked it had no users in the task context. How would
that happen?

Sebastian

2024-04-09 10:35:57

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] perf: Remove perf_swevent_get_recursion_context() from perf_pending_task().

Le Tue, Apr 09, 2024 at 08:25:01AM +0200, Sebastian Andrzej Siewior a ?crit :
> On 2024-04-09 00:06:00 [+0200], Frederic Weisbecker wrote:
> > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > index e0b2da8de485f..5400f7ed2f98b 100644
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -6785,14 +6785,6 @@ static void perf_pending_irq(struct irq_work *entry)
> > > static void perf_pending_task(struct callback_head *head)
> > > {
> > > struct perf_event *event = container_of(head, struct perf_event, pending_task);
> > > - int rctx;
> > > -
> > > - /*
> > > - * If we 'fail' here, that's OK, it means recursion is already disabled
> > > - * and we won't recurse 'further'.
> > > - */
> > > - preempt_disable_notrace();
> > > - rctx = perf_swevent_get_recursion_context();
> > >
> > > if (event->pending_work) {
> > > event->pending_work = 0;
> > > @@ -6800,10 +6792,6 @@ static void perf_pending_task(struct callback_head *head)
> > > local_dec(&event->ctx->nr_pending);
> > > }
> > >
> > > - if (rctx >= 0)
> > > - perf_swevent_put_recursion_context(rctx);
> > > - preempt_enable_notrace();
> >
> > Well, if a software event happens during perf_sigtrap(), the task work
> > may be requeued endlessly and the task may get stuck in task_work_run()...
>
> The last time I checked it had no users in the task context. How would
> that happen?

I guess many tracepoint events would do the trick. Such as trace_lock_acquire()
for example.

Thanks.

Subject: Re: [PATCH v3 3/4] perf: Remove perf_swevent_get_recursion_context() from perf_pending_task().

On 2024-04-09 12:35:46 [+0200], Frederic Weisbecker wrote:
> > > > @@ -6800,10 +6792,6 @@ static void perf_pending_task(struct callback_head *head)
> > > > local_dec(&event->ctx->nr_pending);
> > > > }
> > > >
> > > > - if (rctx >= 0)
> > > > - perf_swevent_put_recursion_context(rctx);
> > > > - preempt_enable_notrace();
> > >
> > > Well, if a software event happens during perf_sigtrap(), the task work
> > > may be requeued endlessly and the task may get stuck in task_work_run()...
> >
> > The last time I checked it had no users in the task context. How would
> > that happen?
>
> I guess many tracepoint events would do the trick. Such as trace_lock_acquire()
> for example.

So the perf_trace_buf_alloc() is invoked from that trace point and
avoids the recursion. And any trace event from within perf_sigtrap()
would trigger the endless loop?

> Thanks.

Sebastian

2024-04-09 12:02:09

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] perf: Remove perf_swevent_get_recursion_context() from perf_pending_task().

Le Tue, Apr 09, 2024 at 12:54:05PM +0200, Sebastian Andrzej Siewior a ?crit :
> On 2024-04-09 12:35:46 [+0200], Frederic Weisbecker wrote:
> > > > > @@ -6800,10 +6792,6 @@ static void perf_pending_task(struct callback_head *head)
> > > > > local_dec(&event->ctx->nr_pending);
> > > > > }
> > > > >
> > > > > - if (rctx >= 0)
> > > > > - perf_swevent_put_recursion_context(rctx);
> > > > > - preempt_enable_notrace();
> > > >
> > > > Well, if a software event happens during perf_sigtrap(), the task work
> > > > may be requeued endlessly and the task may get stuck in task_work_run()...
> > >
> > > The last time I checked it had no users in the task context. How would
> > > that happen?
> >
> > I guess many tracepoint events would do the trick. Such as trace_lock_acquire()
> > for example.
>
> So the perf_trace_buf_alloc() is invoked from that trace point and
> avoids the recursion. And any trace event from within perf_sigtrap()
> would trigger the endless loop?

No sure I'm following:

1) event->perf_event_overflow() -> task_work_add()
//return to userspace
2) task_work_run() -> perf_pending_task() -> perf_sigtrap() -> tracepoint event
-> perf_event_overflow() -> task_work_add()
3) task_work_run() -> perf_pending_task() -> etc...

What am I missing?

Thanks.

Subject: Re: [PATCH v3 3/4] perf: Remove perf_swevent_get_recursion_context() from perf_pending_task().

On 2024-04-09 14:00:49 [+0200], Frederic Weisbecker wrote:
> Le Tue, Apr 09, 2024 at 12:54:05PM +0200, Sebastian Andrzej Siewior a écrit :
> > On 2024-04-09 12:35:46 [+0200], Frederic Weisbecker wrote:
> > > > > > @@ -6800,10 +6792,6 @@ static void perf_pending_task(struct callback_head *head)
> > > > > > local_dec(&event->ctx->nr_pending);
> > > > > > }
> > > > > >
> > > > > > - if (rctx >= 0)
> > > > > > - perf_swevent_put_recursion_context(rctx);
> > > > > > - preempt_enable_notrace();
> > > > >
> > > > > Well, if a software event happens during perf_sigtrap(), the task work
> > > > > may be requeued endlessly and the task may get stuck in task_work_run()...
> > > >
> > > > The last time I checked it had no users in the task context. How would
> > > > that happen?
> > >
> > > I guess many tracepoint events would do the trick. Such as trace_lock_acquire()
> > > for example.
> >
> > So the perf_trace_buf_alloc() is invoked from that trace point and
> > avoids the recursion. And any trace event from within perf_sigtrap()
> > would trigger the endless loop?
>
> No sure I'm following:
>
> 1) event->perf_event_overflow() -> task_work_add()
> //return to userspace
> 2) task_work_run() -> perf_pending_task() -> perf_sigtrap() -> tracepoint event
> -> perf_event_overflow() -> task_work_add()
> 3) task_work_run() -> perf_pending_task() -> etc...
>
> What am I missing?

Yes, that is what I tried to say. Anyway, I misunderstood the concept
before. That means we need to keep that counter here and a
migrate_disable() is needed to avoid CPU migration which is sad.

> Thanks.

Sebastian

2024-04-10 10:39:40

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] perf: Remove perf_swevent_get_recursion_context() from perf_pending_task().

Le Tue, Apr 09, 2024 at 03:33:36PM +0200, Sebastian Andrzej Siewior a ?crit :
> On 2024-04-09 14:00:49 [+0200], Frederic Weisbecker wrote:
> > Le Tue, Apr 09, 2024 at 12:54:05PM +0200, Sebastian Andrzej Siewior a ?crit :
> > > On 2024-04-09 12:35:46 [+0200], Frederic Weisbecker wrote:
> > > > > > > @@ -6800,10 +6792,6 @@ static void perf_pending_task(struct callback_head *head)
> > > > > > > local_dec(&event->ctx->nr_pending);
> > > > > > > }
> > > > > > >
> > > > > > > - if (rctx >= 0)
> > > > > > > - perf_swevent_put_recursion_context(rctx);
> > > > > > > - preempt_enable_notrace();
> > > > > >
> > > > > > Well, if a software event happens during perf_sigtrap(), the task work
> > > > > > may be requeued endlessly and the task may get stuck in task_work_run()...
> > > > >
> > > > > The last time I checked it had no users in the task context. How would
> > > > > that happen?
> > > >
> > > > I guess many tracepoint events would do the trick. Such as trace_lock_acquire()
> > > > for example.
> > >
> > > So the perf_trace_buf_alloc() is invoked from that trace point and
> > > avoids the recursion. And any trace event from within perf_sigtrap()
> > > would trigger the endless loop?
> >
> > No sure I'm following:
> >
> > 1) event->perf_event_overflow() -> task_work_add()
> > //return to userspace
> > 2) task_work_run() -> perf_pending_task() -> perf_sigtrap() -> tracepoint event
> > -> perf_event_overflow() -> task_work_add()
> > 3) task_work_run() -> perf_pending_task() -> etc...
> >
> > What am I missing?
>
> Yes, that is what I tried to say.

Oh ok.. :o)

> Anyway, I misunderstood the concept
> before. That means we need to keep that counter here and a
> migrate_disable() is needed to avoid CPU migration which is sad.

I fear that won't work either. The task is then pinned but another
task can come up on that CPU and its software events will be ignored...

Some alternatives:

_ Clear event->pending_work = 0 after perf_sigtrap(), preventing an
event in there from adding a new task work. We may miss a signal though...

_ Make the recursion context per task on -RT...

> > Thanks.
>
> Sebastian

Subject: Re: [PATCH v3 3/4] perf: Remove perf_swevent_get_recursion_context() from perf_pending_task().

On 2024-04-10 12:38:50 [+0200], Frederic Weisbecker wrote:
> > Anyway, I misunderstood the concept
> > before. That means we need to keep that counter here and a
> > migrate_disable() is needed to avoid CPU migration which is sad.
>
> I fear that won't work either. The task is then pinned but another
> task can come up on that CPU and its software events will be ignored...

oh, right.

> Some alternatives:
>
> _ Clear event->pending_work = 0 after perf_sigtrap(), preventing an
> event in there from adding a new task work. We may miss a signal though...
>
> _ Make the recursion context per task on -RT...

The per-task counter would be indeed the easiest thing to do. But then
only for task context, right?

But why would we miss a signal if we clean event->pending_work late?
Isn't cleaning late same as clearing in
perf_swevent_put_recursion_context()?
If clearing pending_work late works, I would prefer to avoid yet another
per-task counter if possible.

> > > Thanks.

Sebastian

2024-04-10 14:04:37

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v3 3/4] perf: Remove perf_swevent_get_recursion_context() from perf_pending_task().

Le Wed, Apr 10, 2024 at 02:51:26PM +0200, Sebastian Andrzej Siewior a ?crit :
> On 2024-04-10 12:38:50 [+0200], Frederic Weisbecker wrote:
> > Some alternatives:
> >
> > _ Clear event->pending_work = 0 after perf_sigtrap(), preventing an
> > event in there from adding a new task work. We may miss a signal though...
> >
> > _ Make the recursion context per task on -RT...
>
> The per-task counter would be indeed the easiest thing to do. But then
> only for task context, right?

It should work for CPU context as well. A context switch shouldn't be
considered as recursion. Hopefully...

>
> But why would we miss a signal if we clean event->pending_work late?
> Isn't cleaning late same as clearing in
> perf_swevent_put_recursion_context()?

Not exactly. perf_swevent_get_recursion_context() avoids a new software event
altogether from being recorded. It is completely ignored. Whereas clearing late
event->pending_work records new software events but ignores the signal.

> If clearing pending_work late works, I would prefer to avoid yet another
> per-task counter if possible.

Yeah I know :-/

Thanks.

>
> > > > Thanks.
>
> Sebastian