2021-03-29 02:41:23

by qianli zhao

[permalink] [raw]
Subject: [PATCH V4] exit: trigger panic when global init has exited

From: Qianli Zhao <[email protected]>

When init sub-threads running on different CPUs exit at the same time,
zap_pid_ns_processe()->BUG() may be happened(timing is as below),move
panic() before set PF_EXITING to fix this problem.

In addition,if panic() after other sub-threads finish do_exit(),
some key variables (task->mm,task->nsproxy etc) of sub-thread will be lost,
which makes it difficult to parse coredump from fulldump,checking SIGNAL_GROUP_EXIT
to prevent init sub-threads exit.

[ 24.705376] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
[ 24.705382] CPU: 4 PID: 552 Comm: init Tainted: G S O 4.14.180-perf-g4483caa8ae80-dirty #1
[ 24.705390] kernel BUG at include/linux/pid_namespace.h:98!

PID: 552 CPU: 4 COMMAND: "init"
PID: 1 CPU: 7 COMMAND: "init"
core4 core7
... sys_exit_group()
do_group_exit()
- sig->flags = SIGNAL_GROUP_EXIT
- zap_other_threads()
do_exit() //PF_EXITING is set
ret_to_user()
do_notify_resume()
get_signal()
- signal_group_exit
- goto fatal;
do_group_exit()
do_exit() //PF_EXITING is set
- panic("Attempted to kill init! exitcode=0x%08x\n")
exit_notify()
find_alive_thread() //no alive sub-threads
zap_pid_ns_processes()//CONFIG_PID_NS is not set
BUG()

Signed-off-by: Qianli Zhao <[email protected]>
---
V4:
- Changelog update

V3:
- Use group_dead instead of thread_group_empty() to test single init exit.

V2:
- Changelog update
- Remove wrong useage of SIGNAL_UNKILLABLE.
- Add thread_group_empty() test to handle single init thread exit

---
kernel/exit.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 04029e3..f95f8dc 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -766,6 +766,17 @@ void __noreturn do_exit(long code)

validate_creds_for_do_exit(tsk);

+ group_dead = atomic_dec_and_test(&tsk->signal->live);
+ /*
+ * If global init has exited,
+ * panic immediately to get a useable coredump.
+ */
+ if (unlikely(is_global_init(tsk) &&
+ (group_dead || (tsk->signal->flags & SIGNAL_GROUP_EXIT)))) {
+ panic("Attempted to kill init! exitcode=0x%08x\n",
+ tsk->signal->group_exit_code ?: (int)code);
+ }
+
/*
* We're taking recursive faults here in do_exit. Safest is to just
* leave this task alone and wait for reboot.
@@ -784,16 +795,8 @@ void __noreturn do_exit(long code)
if (tsk->mm)
sync_mm_rss(tsk->mm);
acct_update_integrals(tsk);
- group_dead = atomic_dec_and_test(&tsk->signal->live);
- if (group_dead) {
- /*
- * If the last thread of global init has exited, panic
- * immediately to get a useable coredump.
- */
- if (unlikely(is_global_init(tsk)))
- panic("Attempted to kill init! exitcode=0x%08x\n",
- tsk->signal->group_exit_code ?: (int)code);

+ if (group_dead) {
#ifdef CONFIG_POSIX_TIMERS
hrtimer_cancel(&tsk->signal->real_timer);
exit_itimers(tsk->signal);
--
1.9.1


2021-03-30 02:29:00

by qianli zhao

[permalink] [raw]
Subject: Re: [PATCH V4] exit: trigger panic when global init has exited

Hi, Eric, Oleg

Any comment?

From the previous discussions, i think this change is necessary, but
we need to confirm that move the decrement of signal->live is a
safe.Here are some of my considerations
There are three places that are going to be called besides do_exit().
1. current_is_single_threaded()
current_is_single_threaded() is used to check current process just has
a single thread,my patch just moved the "signal->live--" position,this
won't change anything,current_is_single_threaded() maybe get different
value, after my patch,there is no change from the current logic.

2.css_task_iter_advance()
Same as above,css_task_iter_advance() just read "signal->live",this
may return different value,but it same before my patch.
css_task_iter_advance() cgroup_threadgroup_change_begin() held around
setting PF_EXITING before signal->live is decremented,
cgroup_threadgroup_rwsem(cgroup_threadgroup_change_begin()) is used
for user to get expect stable threadgroup,cgroup has no dependencies
on setting PF_EXITING or signal->live decrement.

3.copy_process()
copy_process() is called by fork(),copy_process will incremental
"signal->live",signal->live is atomic operation,there is no race, the
patch only move position,i don't see any new dependency problems

Moving the decrement position mainly changes the order in which
variables are assigned,we need to check if the change in the order of
assignment has any side effects on other callers.
i think acct_update_integrals(),sync_mm_rss() mainly updated some
data,only exit_signals() and sched_exit() need attention.
cgroup_threadgroup_change_begin() is called in exit_signals(),and
css_task_iter_advance used "signal->live",it seems like it might be a
little related.
cgroup_threadgroup_change_begin() just give stable threadgroup for
user,and css_task_iter_advance only check if group is dead, decrement
of signal->live and sets PF_EXITING seems like safe.

From my current analysis, this is safe.