2024-06-09 14:25:33

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 0/1] exit: kill signal_struct->quick_threads

Hello,

Eric, I can't understand why the commit ("signal: Guarantee that
SIGNAL_GROUP_EXIT is set on process exit") added the new
quick_threads counter. And why, if we forget about --quick_threads,
synchronize_group_exit() has to take siglock unconditionally.
Did I miss something obvious?

Tejun, could you check the note about css_task_iter_advance() in
the changelog? And can't it use thread_group_empty() instead?

Oleg.



2024-06-09 14:25:57

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 1/1] exit: kill signal_struct->quick_threads

do_exit() can simply decrement signal_struct->live earlier and call
synchronize_group_exit() only if group_dead is true. Also, this code
can avoid spin_lock_irq(siglock) if SIGNAL_GROUP_EXIT is already set.

The only "nontrivial" user of signal->live is css_task_iter_advance()
but it should not be affected. With or without this change the task
is the exiting/exited leader which has already passed cgroup_exit(),
atomic_read(signal->live) can race with sub-threads but this is fine.

Signed-off-by: Oleg Nesterov <[email protected]>
---
include/linux/sched/signal.h | 1 -
kernel/exit.c | 12 +++++++-----
kernel/fork.c | 2 --
3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 0a0e23c45406..f98aae40d7e6 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -95,7 +95,6 @@ struct signal_struct {
refcount_t sigcnt;
atomic_t live;
int nr_threads;
- int quick_threads;
struct list_head thread_head;

wait_queue_head_t wait_chldexit; /* for wait4() */
diff --git a/kernel/exit.c b/kernel/exit.c
index f95a2c1338a8..11bac37e2bc6 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -800,10 +800,11 @@ static void synchronize_group_exit(struct task_struct *tsk, long code)
struct sighand_struct *sighand = tsk->sighand;
struct signal_struct *signal = tsk->signal;

+ if (READ_ONCE(signal->flags) & SIGNAL_GROUP_EXIT)
+ return;
+
spin_lock_irq(&sighand->siglock);
- signal->quick_threads--;
- if ((signal->quick_threads == 0) &&
- !(signal->flags & SIGNAL_GROUP_EXIT)) {
+ if (!(signal->flags & SIGNAL_GROUP_EXIT)) {
signal->flags = SIGNAL_GROUP_EXIT;
signal->group_exit_code = code;
signal->group_stop_count = 0;
@@ -818,7 +819,9 @@ void __noreturn do_exit(long code)

WARN_ON(irqs_disabled());

- synchronize_group_exit(tsk, code);
+ group_dead = atomic_dec_and_test(&tsk->signal->live);
+ if (group_dead)
+ synchronize_group_exit(tsk, code);

WARN_ON(tsk->plug);

@@ -833,7 +836,6 @@ void __noreturn do_exit(long code)
exit_signals(tsk); /* sets PF_EXITING */

acct_update_integrals(tsk);
- group_dead = atomic_dec_and_test(&tsk->signal->live);
if (group_dead) {
/*
* If the last thread of global init has exited, panic
diff --git a/kernel/fork.c b/kernel/fork.c
index 99076dbe27d8..4c361d2bdc12 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1861,7 +1861,6 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
return -ENOMEM;

sig->nr_threads = 1;
- sig->quick_threads = 1;
atomic_set(&sig->live, 1);
refcount_set(&sig->sigcnt, 1);

@@ -2590,7 +2589,6 @@ __latent_entropy struct task_struct *copy_process(
__this_cpu_inc(process_counts);
} else {
current->signal->nr_threads++;
- current->signal->quick_threads++;
atomic_inc(&current->signal->live);
refcount_inc(&current->signal->sigcnt);
task_join_group_stop(p);
--
2.25.1.362.g51ebf55



2024-06-09 18:41:36

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 0/1] exit: kill signal_struct->quick_threads

On 06/09, Oleg Nesterov wrote:
>
> Tejun, could you check the note about css_task_iter_advance() in
> the changelog?

Yes,

> And can't it use thread_group_empty() instead?

Please forget. I'll write another email.

Oleg.


2024-06-10 10:55:27

by Oleg Nesterov

[permalink] [raw]
Subject: Q: css_task_iter_advance() && dying_tasks

I never understood the code in kernel/cgroup/ even remotely, most probably
I missed something, but let me ask a couple of stupid questions anyway.

cgroup_exit() does

css_set_move_task(tsk, cset, NULL, false);
list_add_tail(&tsk->cg_list, &cset->dying_tasks);

but unless I am totally confused css_task_iter_advance() always ignores
the "dying" sub-threads, so perhaps it should do, say,

css_set_move_task(tsk, cset, NULL, false);
if (delay_group_leader(tsk))
list_add_tail(&tsk->cg_list, &cset->dying_tasks);

and then cgroup_release() can check list_empty(cg_list) before it takes
css_set_lock.

No ?

Or, perhaps we can do even better? Can't cgroup_exit() do something like

// group_dead should be passed from do_exit()

css_set_move_task(tsk, cset, NULL, false);

if (thread_group_leader(tsk) && !group_dead)
list_add_tail(&tsk->cg_list, &cset->dying_tasks);

else if (!thread_group_leader(tsk) && group_dead) {
leader = tsk->group_leader;
if (!list_empty(leader->cg_list) {
css_set_skip_task_iters(task_css_set(leader), leader);
list_del_init(&leader->cg_list);
}
}

and then

- kill the atomic_read(&task->signal->live)) check in
css_task_iter_advance()

- kill the code under css_set_lock in cgroup_release()

Oleg.


2024-06-10 11:10:50

by Oleg Nesterov

[permalink] [raw]
Subject: Re: Q: css_task_iter_advance() && dying_tasks

Sorry for the spam, forgot to mention. Either way the usage of group_dead
or atomic_read(signal->live) in these paths doesn't look "perfect", but
this is another thing. The pseudo code below tries to mimic the current
logic but again, I'm afraid I misread this code completely.


On 06/10, Oleg Nesterov wrote:
>
> I never understood the code in kernel/cgroup/ even remotely, most probably
> I missed something, but let me ask a couple of stupid questions anyway.
>
> cgroup_exit() does
>
> css_set_move_task(tsk, cset, NULL, false);
> list_add_tail(&tsk->cg_list, &cset->dying_tasks);
>
> but unless I am totally confused css_task_iter_advance() always ignores
> the "dying" sub-threads, so perhaps it should do, say,
>
> css_set_move_task(tsk, cset, NULL, false);
> if (delay_group_leader(tsk))
> list_add_tail(&tsk->cg_list, &cset->dying_tasks);
>
> and then cgroup_release() can check list_empty(cg_list) before it takes
> css_set_lock.
>
> No ?
>
> Or, perhaps we can do even better? Can't cgroup_exit() do something like
>
> // group_dead should be passed from do_exit()
>
> css_set_move_task(tsk, cset, NULL, false);
>
> if (thread_group_leader(tsk) && !group_dead)
> list_add_tail(&tsk->cg_list, &cset->dying_tasks);
>
> else if (!thread_group_leader(tsk) && group_dead) {
> leader = tsk->group_leader;
> if (!list_empty(leader->cg_list) {
> css_set_skip_task_iters(task_css_set(leader), leader);
> list_del_init(&leader->cg_list);
> }
> }
>
> and then
>
> - kill the atomic_read(&task->signal->live)) check in
> css_task_iter_advance()
>
> - kill the code under css_set_lock in cgroup_release()
>
> Oleg.


2024-06-10 13:15:56

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/1] exit: kill signal_struct->quick_threads

Oleg Nesterov <[email protected]> writes:

> Hello,
>
> Eric, I can't understand why the commit ("signal: Guarantee that
> SIGNAL_GROUP_EXIT is set on process exit") added the new
> quick_threads counter. And why, if we forget about --quick_threads,
> synchronize_group_exit() has to take siglock unconditionally.
> Did I miss something obvious?

At a minimum it is the exact same locking as everywhere else that sets
signal->flags, signal->group_exit_code, and signal->group_stop_count
uses.

So it would probably require some significant reason to not use
the same locking and complicate reasoning about the code. I suspect
setting those values without siglock held is likely to lead to
interesting races.

May I ask which direction you are coming at this from? Are you trying
to reduce the cost of do_exit? Are you interested in untangling the
mess that is exiting threads in a process?

I have a branch around that I was slowly working through to detangle
the entire mess. And if you are interested I can dig it back up.
My memory is I had all of the known issues worked through but I still
needed to feed the code through code review and merge it in small steps
to ensure I don't introduce regressions.

That is where signal->quick_threads comes from. In the work it is a
part of I wind up moving the decrement up much sooner to the point where
individual threads decide to exit. The decrement of signal->live comes
much too late to be useful in that context.

It is also part of me wanting to be able to uniformly use
SIGNAL_GROUP_EXIT and signal->group_exit_code when talking about the
process state, and p->exit_code when talking about the per task state.

At the moment I am staring at wait_task_zombie and trying to understand
how:

status = (p->signal->flags & SIGNAL_GROUP_EXIT)
? p->signal->group_exit_code : p->exit_code;

works without any locks or barriers.

Eric







2024-06-10 15:31:06

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 0/1] exit: kill signal_struct->quick_threads

Hi Eric, thanks for looking at this.

Let me answer your questions out-of-order. But, before anything else,
do you see anything wrong in 1/1 ?

On 06/10, Eric W. Biederman wrote:
>
> May I ask which direction you are coming at this from? Are you trying
> to reduce the cost of do_exit? Are you interested in untangling the
> mess that is exiting threads in a process?

I am trying to understand why do we need another counter.

And, I'd like to cleanup the usage of task->signal->live, I think it
should be avoided (if possible) when task != current. IIRC, we even
discussed this some time ago but I can't find any reference.

See also another thread about css_task_iter_advance().

> > Eric, I can't understand why the commit ("signal: Guarantee that
> > SIGNAL_GROUP_EXIT is set on process exit") added the new
> > quick_threads counter. And why, if we forget about --quick_threads,
> > synchronize_group_exit() has to take siglock unconditionally.
> > Did I miss something obvious?
>
> At a minimum it is the exact same locking as everywhere else that sets
> signal->flags, signal->group_exit_code, and signal->group_stop_count
> uses.
>
> So it would probably require some significant reason to not use
> the same locking and complicate reasoning about the code. I suspect
> setting those values without siglock held is likely to lead to
> interesting races.

I guess I was not clear. Of course, SIGNAL_GROUP_EXIT must be always
set under ->siglock. But I think synchronize_group_exit() can just
return if SIGNAL_GROUP_EXIT is already set? If nothing else, this is
what do_group_exit() does.

Or I misunderstood you?

> That is where signal->quick_threads comes from. In the work it is a
> part of I wind up moving the decrement up much sooner to the point where
> individual threads decide to exit. The decrement of signal->live comes
> much too late to be useful in that context.

And that is why this patch moves the decrement of signal->live to the
start of do_exit().

> It is also part of me wanting to be able to uniformly use
> SIGNAL_GROUP_EXIT and signal->group_exit_code when talking about the
> process state, and p->exit_code when talking about the per task state.

Agreed,

> At the moment I am staring at wait_task_zombie and trying to understand
> how:
>
> status = (p->signal->flags & SIGNAL_GROUP_EXIT)
> ? p->signal->group_exit_code : p->exit_code;
>
> works without any locks or barriers.

Agreed, at first glance this looks worrying without siglock... I'll try
to take a look, perhaps we can simply kill the SIGNAL_GROUP_EXIT check,
not sure.

But this patch should not make any difference ?

Oleg.


2024-06-10 15:44:15

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 0/1] exit: kill signal_struct->quick_threads

Forgot to mention ...

On 06/10, Oleg Nesterov wrote:
>
> I guess I was not clear. Of course, SIGNAL_GROUP_EXIT must be always
> set under ->siglock. But I think synchronize_group_exit() can just
> return if SIGNAL_GROUP_EXIT is already set? If nothing else, this is
> what do_group_exit() does.

Just in case... I guess do_group_exit() needs some cleanups too. For
example,

if (sig->flags & SIGNAL_GROUP_EXIT)
exit_code = sig->group_exit_code;

without siglock is racy, exit_code can be wrong (not a big problem).
LOAD(sig->flags) can make KCSAN unhappy. Etc.

But this all has nothing to do with 1/1 ?

Oleg.


2024-06-10 16:42:59

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 0/1] exit: kill signal_struct->quick_threads

On 06/10, Oleg Nesterov wrote:
>
> > At the moment I am staring at wait_task_zombie and trying to understand
> > how:
> >
> > status = (p->signal->flags & SIGNAL_GROUP_EXIT)
> > ? p->signal->group_exit_code : p->exit_code;
> >
> > works without any locks or barriers.
>
> Agreed, at first glance this looks worrying without siglock... I'll try
> to take a look, perhaps we can simply kill the SIGNAL_GROUP_EXIT check,
> not sure.

Still not sure, will check tomorrow, but probably yes. We can rely on the
the exit_state == EXIT_ZOMBIE check in the caller and tasklist_lock.
OTOH, we should probably take ptrace into account...

In short, I am still not sure ;) but I agree this code needs some changes,
the comments at least.

> But this patch should not make any difference ?

Yes.

Oleg.


2024-06-10 20:03:35

by Tejun Heo

[permalink] [raw]
Subject: Re: Q: css_task_iter_advance() && dying_tasks

On Mon, Jun 10, 2024 at 01:08:52PM +0200, Oleg Nesterov wrote:
> Sorry for the spam, forgot to mention. Either way the usage of group_dead
> or atomic_read(signal->live) in these paths doesn't look "perfect", but
> this is another thing. The pseudo code below tries to mimic the current
> logic but again, I'm afraid I misread this code completely.

The usage of signal->live there is something I added without much thinking.
I just needed something which goes off after all sub-threads are gone.
Getting @group_dead from do_exit() sounds perfectly fine to me.

Thanks.

--
tejun

2024-06-10 20:04:57

by Tejun Heo

[permalink] [raw]
Subject: Re: Q: css_task_iter_advance() && dying_tasks

Hello, Oleg.

On Mon, Jun 10, 2024 at 12:50:28PM +0200, Oleg Nesterov wrote:
> I never understood the code in kernel/cgroup/ even remotely, most probably
> I missed something, but let me ask a couple of stupid questions anyway.
>
> cgroup_exit() does
>
> css_set_move_task(tsk, cset, NULL, false);
> list_add_tail(&tsk->cg_list, &cset->dying_tasks);
>
> but unless I am totally confused css_task_iter_advance() always ignores
> the "dying" sub-threads, so perhaps it should do, say,
>
> css_set_move_task(tsk, cset, NULL, false);
> if (delay_group_leader(tsk))
> list_add_tail(&tsk->cg_list, &cset->dying_tasks);
>
> and then cgroup_release() can check list_empty(cg_list) before it takes
> css_set_lock.
>
> No ?

Yeah, I think so. The current code hasn't broken for quite a while but it
also hasn't received much attention after the iterator updates which were
very much in the spirit of just getting it to work. I don't think you're
missing anything.

> Or, perhaps we can do even better? Can't cgroup_exit() do something like
>
> // group_dead should be passed from do_exit()
>
> css_set_move_task(tsk, cset, NULL, false);
>
> if (thread_group_leader(tsk) && !group_dead)
> list_add_tail(&tsk->cg_list, &cset->dying_tasks);
>
> else if (!thread_group_leader(tsk) && group_dead) {
> leader = tsk->group_leader;
> if (!list_empty(leader->cg_list) {
> css_set_skip_task_iters(task_css_set(leader), leader);
> list_del_init(&leader->cg_list);
> }
> }
>
> and then
>
> - kill the atomic_read(&task->signal->live)) check in
> css_task_iter_advance()
>
> - kill the code under css_set_lock in cgroup_release()

That does sound a lot better than the current code. Care to send the patch?

Thanks.

--
tejun

2024-06-13 15:54:24

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 0/1] exit: kill signal_struct->quick_threads

So...

Eric, do you agree with this patch or not?

Tejun, sorry for delay, I'll try to send the patch which cleanups
(at least in my opinion) the ->dying_tasks logic as soon as I have
time. But just in case... no, cgroup_exit() can't rely on group_exit
passed from the caller, I was wrong ;)


On 06/10, Oleg Nesterov wrote:
>
> Hi Eric, thanks for looking at this.
>
> Let me answer your questions out-of-order. But, before anything else,
> do you see anything wrong in 1/1 ?
>
> On 06/10, Eric W. Biederman wrote:
> >
> > May I ask which direction you are coming at this from? Are you trying
> > to reduce the cost of do_exit? Are you interested in untangling the
> > mess that is exiting threads in a process?
>
> I am trying to understand why do we need another counter.
>
> And, I'd like to cleanup the usage of task->signal->live, I think it
> should be avoided (if possible) when task != current. IIRC, we even
> discussed this some time ago but I can't find any reference.
>
> See also another thread about css_task_iter_advance().
>
> > > Eric, I can't understand why the commit ("signal: Guarantee that
> > > SIGNAL_GROUP_EXIT is set on process exit") added the new
> > > quick_threads counter. And why, if we forget about --quick_threads,
> > > synchronize_group_exit() has to take siglock unconditionally.
> > > Did I miss something obvious?
> >
> > At a minimum it is the exact same locking as everywhere else that sets
> > signal->flags, signal->group_exit_code, and signal->group_stop_count
> > uses.
> >
> > So it would probably require some significant reason to not use
> > the same locking and complicate reasoning about the code. I suspect
> > setting those values without siglock held is likely to lead to
> > interesting races.
>
> I guess I was not clear. Of course, SIGNAL_GROUP_EXIT must be always
> set under ->siglock. But I think synchronize_group_exit() can just
> return if SIGNAL_GROUP_EXIT is already set? If nothing else, this is
> what do_group_exit() does.
>
> Or I misunderstood you?
>
> > That is where signal->quick_threads comes from. In the work it is a
> > part of I wind up moving the decrement up much sooner to the point where
> > individual threads decide to exit. The decrement of signal->live comes
> > much too late to be useful in that context.
>
> And that is why this patch moves the decrement of signal->live to the
> start of do_exit().
>
> > It is also part of me wanting to be able to uniformly use
> > SIGNAL_GROUP_EXIT and signal->group_exit_code when talking about the
> > process state, and p->exit_code when talking about the per task state.
>
> Agreed,
>
> > At the moment I am staring at wait_task_zombie and trying to understand
> > how:
> >
> > status = (p->signal->flags & SIGNAL_GROUP_EXIT)
> > ? p->signal->group_exit_code : p->exit_code;
> >
> > works without any locks or barriers.
>
> Agreed, at first glance this looks worrying without siglock... I'll try
> to take a look, perhaps we can simply kill the SIGNAL_GROUP_EXIT check,
> not sure.
>
> But this patch should not make any difference ?
>
> Oleg.


2024-06-15 14:54:27

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/1] exit: kill signal_struct->quick_threads

Oleg Nesterov <[email protected]> writes:

> So...
>
> Eric, do you agree with this patch or not?

I really don't.

I think skipping some work if SIGNAL_GROUP_EXIT is already
set is not necessarily wrong.

I think we need the quick_threads count, and related cleanups.
I was hoping to be able to post a patchset with this reply
to explain things, but it looks like that is still a couple
of days off.

Where I was going, and where I think we should go with quick_threads is
an exit path that is much easier to reason about and maintain. But the
decrement needs to move into the same code that sets SIGNAL_GROUP_EXIT.
Which makes it something very different from signal->live.

Eric