2024-04-18 10:16:49

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v2] perf/core: Fix missing wakeup when waiting for context reference

On Thu, Apr 18, 2024 at 08:03:56AM +0000, Haifeng Xu wrote:
> In our production environment, we found many hung tasks which are
> blocked for more than 18 hours. Their call traces are like this:
>
> [346278.191038] __schedule+0x2d8/0x890
> [346278.191046] schedule+0x4e/0xb0
> [346278.191049] perf_event_free_task+0x220/0x270
> [346278.191056] ? init_wait_var_entry+0x50/0x50
> [346278.191060] copy_process+0x663/0x18d0
> [346278.191068] kernel_clone+0x9d/0x3d0
> [346278.191072] __do_sys_clone+0x5d/0x80
> [346278.191076] __x64_sys_clone+0x25/0x30
> [346278.191079] do_syscall_64+0x5c/0xc0
> [346278.191083] ? syscall_exit_to_user_mode+0x27/0x50
> [346278.191086] ? do_syscall_64+0x69/0xc0
> [346278.191088] ? irqentry_exit_to_user_mode+0x9/0x20
> [346278.191092] ? irqentry_exit+0x19/0x30
> [346278.191095] ? exc_page_fault+0x89/0x160
> [346278.191097] ? asm_exc_page_fault+0x8/0x30
> [346278.191102] entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> The task was waiting for the refcount become to 1, but from the vmcore,
> we found the refcount has already been 1. It seems that the task didn't
> get woken up by perf_event_release_kernel() and got stuck forever. The
> below scenario may cause the problem.
>
> Thread A Thread B
> ... ...
> perf_event_free_task perf_event_release_kernel
> ...
> acquire event->child_mutex
> ...
> get_ctx
> ... release event->child_mutex
> acquire ctx->mutex
> ...
> perf_free_event (acquire/release event->child_mutex)
> ...
> release ctx->mutex
> wait_var_event
> acquire ctx->mutex
> acquire event->child_mutex
> # move existing events to free_list
> release event->child_mutex
> release ctx->mutex
> put_ctx
> ... ...
>
> In this case, all events of the ctx have been freed, so we couldn't
> find the ctx in free_list and Thread A will miss the wakeup. It's thus
> necessary to add a wakeup after dropping the reference.
>
> Fixes: 1cf8dfe8a661 ("perf/core: Fix race between close() and fork()")
> Signed-off-by: Haifeng Xu <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>

Hint: always ask before putting someone else's Signed-off-by tag ;-)
And anyway you don't need it here.

Reviewed-by: Frederic Weisbecker <[email protected]>


> ---
> Changes since v1
> - Add the fixed tag.
> - Simplify v1's patch. (Frederic)
> ---
> kernel/events/core.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 4f0c45ab8d7d..15c35070db6a 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5340,6 +5340,7 @@ int perf_event_release_kernel(struct perf_event *event)
> again:
> mutex_lock(&event->child_mutex);
> list_for_each_entry(child, &event->child_list, child_list) {
> + void *var = NULL;
>
> /*
> * Cannot change, child events are not migrated, see the
> @@ -5380,11 +5381,23 @@ int perf_event_release_kernel(struct perf_event *event)
> * this can't be the last reference.
> */
> put_event(event);
> + } else {
> + var = &ctx->refcount;
> }
>
> mutex_unlock(&event->child_mutex);
> mutex_unlock(&ctx->mutex);
> put_ctx(ctx);
> +
> + if (var) {
> + /*
> + * If perf_event_free_task() has deleted all events from the
> + * ctx while the child_mutex got released above, make sure to
> + * notify about the preceding put_ctx().
> + */
> + smp_mb(); /* pairs with wait_var_event() */
> + wake_up_var(var);
> + }
> goto again;
> }
> mutex_unlock(&event->child_mutex);
> --
> 2.25.1
>


2024-04-18 11:34:52

by Haifeng Xu

[permalink] [raw]
Subject: Re: [PATCH v2] perf/core: Fix missing wakeup when waiting for context reference



On 2024/4/18 18:03, Frederic Weisbecker wrote:
> On Thu, Apr 18, 2024 at 08:03:56AM +0000, Haifeng Xu wrote:
>> In our production environment, we found many hung tasks which are
>> blocked for more than 18 hours. Their call traces are like this:
>>
>> [346278.191038] __schedule+0x2d8/0x890
>> [346278.191046] schedule+0x4e/0xb0
>> [346278.191049] perf_event_free_task+0x220/0x270
>> [346278.191056] ? init_wait_var_entry+0x50/0x50
>> [346278.191060] copy_process+0x663/0x18d0
>> [346278.191068] kernel_clone+0x9d/0x3d0
>> [346278.191072] __do_sys_clone+0x5d/0x80
>> [346278.191076] __x64_sys_clone+0x25/0x30
>> [346278.191079] do_syscall_64+0x5c/0xc0
>> [346278.191083] ? syscall_exit_to_user_mode+0x27/0x50
>> [346278.191086] ? do_syscall_64+0x69/0xc0
>> [346278.191088] ? irqentry_exit_to_user_mode+0x9/0x20
>> [346278.191092] ? irqentry_exit+0x19/0x30
>> [346278.191095] ? exc_page_fault+0x89/0x160
>> [346278.191097] ? asm_exc_page_fault+0x8/0x30
>> [346278.191102] entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>> The task was waiting for the refcount become to 1, but from the vmcore,
>> we found the refcount has already been 1. It seems that the task didn't
>> get woken up by perf_event_release_kernel() and got stuck forever. The
>> below scenario may cause the problem.
>>
>> Thread A Thread B
>> ... ...
>> perf_event_free_task perf_event_release_kernel
>> ...
>> acquire event->child_mutex
>> ...
>> get_ctx
>> ... release event->child_mutex
>> acquire ctx->mutex
>> ...
>> perf_free_event (acquire/release event->child_mutex)
>> ...
>> release ctx->mutex
>> wait_var_event
>> acquire ctx->mutex
>> acquire event->child_mutex
>> # move existing events to free_list
>> release event->child_mutex
>> release ctx->mutex
>> put_ctx
>> ... ...
>>
>> In this case, all events of the ctx have been freed, so we couldn't
>> find the ctx in free_list and Thread A will miss the wakeup. It's thus
>> necessary to add a wakeup after dropping the reference.
>>
>> Fixes: 1cf8dfe8a661 ("perf/core: Fix race between close() and fork()")
>> Signed-off-by: Haifeng Xu <[email protected]>
>> Signed-off-by: Frederic Weisbecker <[email protected]>
>
> Hint: always ask before putting someone else's Signed-off-by tag ;-)
> And anyway you don't need it here.

Sorry, Frederic. I'll resend this with Reviewed-by tag.

Thanks!
>
> Reviewed-by: Frederic Weisbecker <[email protected]>
>
>
>> ---
>> Changes since v1
>> - Add the fixed tag.
>> - Simplify v1's patch. (Frederic)
>> ---
>> kernel/events/core.c | 13 +++++++++++++
>> 1 file changed, 13 insertions(+)
>>
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 4f0c45ab8d7d..15c35070db6a 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -5340,6 +5340,7 @@ int perf_event_release_kernel(struct perf_event *event)
>> again:
>> mutex_lock(&event->child_mutex);
>> list_for_each_entry(child, &event->child_list, child_list) {
>> + void *var = NULL;
>>
>> /*
>> * Cannot change, child events are not migrated, see the
>> @@ -5380,11 +5381,23 @@ int perf_event_release_kernel(struct perf_event *event)
>> * this can't be the last reference.
>> */
>> put_event(event);
>> + } else {
>> + var = &ctx->refcount;
>> }
>>
>> mutex_unlock(&event->child_mutex);
>> mutex_unlock(&ctx->mutex);
>> put_ctx(ctx);
>> +
>> + if (var) {
>> + /*
>> + * If perf_event_free_task() has deleted all events from the
>> + * ctx while the child_mutex got released above, make sure to
>> + * notify about the preceding put_ctx().
>> + */
>> + smp_mb(); /* pairs with wait_var_event() */
>> + wake_up_var(var);
>> + }
>> goto again;
>> }
>> mutex_unlock(&event->child_mutex);
>> --
>> 2.25.1
>>