2024-03-30 02:58:11

by Wen Yang

Subject: [RESEND PATCH v3] exit: move trace_sched_process_exit earlier in do_exit()

From: Wen Yang <[email protected]>

In a safety-critical system, when a process exits abnormally, the
monitor should be notified as soon as possible.

Commit 2d4bcf886e42 ("exit: Remove profile_task_exit & profile_munmap")
simplified the code, but also removed profile_task_exit(), which may
prevent third-party kernel modules from detecting process exits promptly.

Rather than adding an extra tracepoint, it is better to move the
existing trace_sched_process_exit() earlier in do_exit(), since any tracer
interested in the point where a task is really reclaimed can use
trace_sched_process_free(), called from delayed_put_task_struct(). [1]

Andrew raised a concern:
: If userspace is awaiting this notification to say "it's now OK to read
: the dump file" then it could break things?
The nearby proc_exit_connector() can be used for this purpose, and we
could not find any code that depends on the location of
trace_sched_process_exit().

Oleg initially proposed this change, Steven provided further detailed
suggestions, and Mathieu carefully checked the historical code and said:
: I've checked with Matthew Khouzam (maintainer of Trace Compass)
: which care about this tracepoint, and we have not identified any
: significant impact of moving it on its model of the scheduler, other
: than slightly changing its timing.
: I've also checked quickly in lttng-analyses and have not found
: any code that care about its specific placement.
: So I would say go ahead and move it earlier in do_exit(), it's
: fine by me. [2]

[1]: https://lore.kernel.org/all/[email protected]/
[2]: https://lore.kernel.org/all/[email protected]/

Suggested-by: Oleg Nesterov <[email protected]>
Suggested-by: Steven Rostedt <[email protected]>
Suggested-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Wen Yang <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
---
kernel/exit.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 493647fd7c07..2cff6533cb39 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -826,6 +826,7 @@ void __noreturn do_exit(long code)
 
 	WARN_ON(tsk->plug);
 
+	trace_sched_process_exit(tsk);
 	kcov_task_exit(tsk);
 	kmsan_task_exit(tsk);
 
@@ -866,7 +867,6 @@ void __noreturn do_exit(long code)
 
 	if (group_dead)
 		acct_process();
-	trace_sched_process_exit(tsk);
 
 	exit_sem(tsk);
 	exit_shm(tsk);
--
2.25.1



2024-03-30 10:28:50

by Ingo Molnar

Subject: Re: [RESEND PATCH v3] exit: move trace_sched_process_exit earlier in do_exit()


* [email protected] <[email protected]> wrote:

> From: Wen Yang <[email protected]>
>
> In a safety-critical system, when a process exits abnormally, the
> monitor should be notified as soon as possible.

If this event is so critical to catch, a probe can be put on do_exit().
This will be superior to your patch, because it will notify about the
event even sooner.

> Commit 2d4bcf886e42 ("exit: Remove profile_task_exit &
> profile_munmap") simplified the code, but also removed
> profile_task_exit(), which may prevent third-party kernel modules
> from detecting process exits promptly.

Could you point out an example of such third-party kernel modules, and
why we should care about them?

> Rather than adding an extra tracepoint, it is better to move the
> existing trace_sched_process_exit() earlier in do_exit(), since any
> tracer interested in the point where a task is really reclaimed can
> use trace_sched_process_free(), called from
> delayed_put_task_struct(). [1]

I disagree, I think this scheduler tracepoint should be moved even
*later* in the exit sequence, and be combined with
sched_autogroup_exit_task(), so that the scheduler only has a single
exit-notification callback in essence.

Until this is all done cleanly no tree should pick up this change:

NAKed-by: Ingo Molnar <[email protected]>

Thanks,

Ingo