2015-02-05 13:13:26

by Kirill Tkhai

Subject: [PATCH] de_thread: Move notify_count write under lock


The write operation may be reordered with the setting of group_exit_task.
If so, this fires in exit_notify().

It does not look like a good idea to add SMP barriers for this case, especially
in exit_notify(), so let's put the notify_count write under the write lock.

Signed-off-by: Kirill Tkhai <[email protected]>
---
fs/exec.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/exec.c b/fs/exec.c
index ad8798e..42782d5 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -920,10 +920,16 @@ static int de_thread(struct task_struct *tsk)
         if (!thread_group_leader(tsk)) {
                 struct task_struct *leader = tsk->group_leader;

-                sig->notify_count = -1; /* for exit_notify() */
                 for (;;) {
                         threadgroup_change_begin(tsk);
                         write_lock_irq(&tasklist_lock);
+                        /*
+                         * We could set it once outside the for() cycle, but
+                         * this would require SMP barriers there and in
+                         * exit_notify(), because the write operation may
+                         * be reordered with the setting of group_exit_task.
+                         */
+                        sig->notify_count = -1; /* for exit_notify() */
                         if (likely(leader->exit_state))
                                 break;
                         __set_current_state(TASK_KILLABLE);



2015-02-05 13:39:47

by Oleg Nesterov

Subject: Re: [PATCH] de_thread: Move notify_count write under lock

On 02/05, Kirill Tkhai wrote:
>
> The write operation may be reordered with the setting of group_exit_task.
> If so, this fires in exit_notify().

How?

OK, yes, "sig->notify_count = -1" can be reordered with the last unlock,
but we do not care?

group_exit_task + notify_count is only checked under the same lock, and
"notify_count = -1" can't happen until de_thread() sees it is zero.

Could you explain in more detail why this is bad?
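
For reference, the reader side in question, the check in exit_notify()
(kernel/exit.c), is done with tasklist_lock write-held:

        /* mt-exec, de_thread() is waiting for group leader */
        if (unlikely(tsk->signal->notify_count < 0))
                wake_up_process(tsk->signal->group_exit_task);
        write_unlock_irq(&tasklist_lock);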


> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -920,10 +920,16 @@ static int de_thread(struct task_struct *tsk)
>          if (!thread_group_leader(tsk)) {
>                  struct task_struct *leader = tsk->group_leader;
>
> -                sig->notify_count = -1; /* for exit_notify() */
>                  for (;;) {
>                          threadgroup_change_begin(tsk);
>                          write_lock_irq(&tasklist_lock);
> +                        /*
> +                         * We could set it once outside the for() cycle, but
> +                         * this would require SMP barriers there and in
> +                         * exit_notify(), because the write operation may
> +                         * be reordered with the setting of group_exit_task.
> +                         */
> +                        sig->notify_count = -1; /* for exit_notify() */
>                          if (likely(leader->exit_state))
>                                  break;
>                          __set_current_state(TASK_KILLABLE);

Perhaps something like this makes sense anyway to make the code clearer,
but in this case I'd suggest setting ->notify_count after we check
->exit_state. And without the (afaics!) misleading comment...

Or did I miss something?

Oleg.

2015-02-05 14:15:09

by Kirill Tkhai

Subject: Re: [PATCH] de_thread: Move notify_count write under lock

On Thu, 05/02/2015 at 14:38 +0100, Oleg Nesterov wrote:
> On 02/05, Kirill Tkhai wrote:
> >
> > The write operation may be reordered with the setting of group_exit_task.
> > If so, this fires in exit_notify().
>
> How?
>
> OK, yes, "sig->notify_count = -1" can be reordered with the last unlock,
> but we do not care?
>
> group_exit_task + notify_count is only checked under the same lock, and
> "notify_count = -1" can't happen until de_thread() sees it is zero.
>
> Could you explain in more detail why this is bad?

Can't exit_notify() see tsk->signal->notify_count == -1 before
tsk->signal->group_exit_task?

As I see in Documentation/memory-barriers.txt:

RELEASE operation implication:
Memory operations issued after the RELEASE may be completed before the
RELEASE operation has completed.


>
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -920,10 +920,16 @@ static int de_thread(struct task_struct *tsk)
> >          if (!thread_group_leader(tsk)) {
> >                  struct task_struct *leader = tsk->group_leader;
> >
> > -                sig->notify_count = -1; /* for exit_notify() */
> >                  for (;;) {
> >                          threadgroup_change_begin(tsk);
> >                          write_lock_irq(&tasklist_lock);
> > +                        /*
> > +                         * We could set it once outside the for() cycle, but
> > +                         * this would require SMP barriers there and in
> > +                         * exit_notify(), because the write operation may
> > +                         * be reordered with the setting of group_exit_task.
> > +                         */
> > +                        sig->notify_count = -1; /* for exit_notify() */
> >                          if (likely(leader->exit_state))
> >                                  break;
> >                          __set_current_state(TASK_KILLABLE);
>
> Perhaps something like this makes sense anyway to make the code clearer,
> but in this case I'd suggest setting ->notify_count after we check
> ->exit_state. And without the (afaics!) misleading comment...
>
> Or did I miss something?
>
> Oleg.
>

2015-02-05 14:27:43

by Kirill Tkhai

Subject: Re: [PATCH] de_thread: Move notify_count write under lock

On Thu, 05/02/2015 at 17:15 +0300, Kirill Tkhai wrote:
> On Thu, 05/02/2015 at 14:38 +0100, Oleg Nesterov wrote:
> > On 02/05, Kirill Tkhai wrote:
> > >
> > > The write operation may be reordered with the setting of group_exit_task.
> > > If so, this fires in exit_notify().
> >
> > How?
> >
> > OK, yes, "sig->notify_count = -1" can be reordered with the last unlock,
> > but we do not care?
> >
> > group_exit_task + notify_count is only checked under the same lock, and
> > "notify_count = -1" can't happen until de_thread() sees it is zero.
> >
> > Could you explain in more detail why this is bad?
>
> Can't exit_notify() see tsk->signal->notify_count == -1 before
> tsk->signal->group_exit_task?
>
> As I see in Documentation/memory-barriers.txt:
>
> RELEASE operation implication:
> Memory operations issued after the RELEASE may be completed before the
> RELEASE operation has completed.

Thread group leader (I)                         Thread (II)

exit_notify()                                   de_thread()

                                                sig->group_exit_task = tsk;
                                                sig->notify_count = zap_other_threads(tsk); // == 1
                                                if (!thread_group_leader(tsk))
                                                        sig->notify_count--; // == 0

                                                spin_unlock_irq(lock);

                                                sig->notify_count = -1;

if (tsk->signal->notify_count < 0) (== -1)

wake_up_process(tsk->signal->group_exit_task); (garbage in group_exit_task)
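
With the store moved under tasklist_lock, this window closes: exit_notify()
does its check with the same lock held, so once it observes notify_count == -1
it also observes everything de_thread() wrote before releasing the lock,
including the earlier group_exit_task store. Roughly:

/* Thread (II): de_thread(), patched */
write_lock_irq(&tasklist_lock);
sig->notify_count = -1;         /* ordered after the group_exit_task store
                                   for any later tasklist_lock holder */
/* ... */
write_unlock_irq(&tasklist_lock);

/* Thread group leader (I): exit_notify() */
write_lock_irq(&tasklist_lock);
if (unlikely(tsk->signal->notify_count < 0))
        wake_up_process(tsk->signal->group_exit_task);  /* valid pointer now */
write_unlock_irq(&tasklist_lock);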




> >
> > > --- a/fs/exec.c
> > > +++ b/fs/exec.c
> > > @@ -920,10 +920,16 @@ static int de_thread(struct task_struct *tsk)
> > >          if (!thread_group_leader(tsk)) {
> > >                  struct task_struct *leader = tsk->group_leader;
> > >
> > > -                sig->notify_count = -1; /* for exit_notify() */
> > >                  for (;;) {
> > >                          threadgroup_change_begin(tsk);
> > >                          write_lock_irq(&tasklist_lock);
> > > +                        /*
> > > +                         * We could set it once outside the for() cycle, but
> > > +                         * this would require SMP barriers there and in
> > > +                         * exit_notify(), because the write operation may
> > > +                         * be reordered with the setting of group_exit_task.
> > > +                         */
> > > +                        sig->notify_count = -1; /* for exit_notify() */
> > >                          if (likely(leader->exit_state))
> > >                                  break;
> > >                          __set_current_state(TASK_KILLABLE);
> >
> > Perhaps something like this makes sense anyway to make the code clearer,
> > but in this case I'd suggest setting ->notify_count after we check
> > ->exit_state. And without the (afaics!) misleading comment...
> >
> > Or did I miss something?
> >
> > Oleg.
> >
>

2015-02-05 16:11:56

by Kirill Tkhai

Subject: Re: [PATCH] de_thread: Move notify_count write under lock

On Thu, 05/02/2015 at 14:38 +0100, Oleg Nesterov wrote:
> On 02/05, Kirill Tkhai wrote:
> >
> > The write operation may be reordered with the setting of group_exit_task.
> > If so, this fires in exit_notify().
>
> How?
>
> OK, yes, "sig->notify_count = -1" can be reordered with the last unlock,
> but we do not care?
>
> group_exit_task + notify_count is only checked under the same lock, and
> "notify_count = -1" can't happen until de_thread() sees it is zero.
>
> Could you explain in more detail why this is bad?
>
>
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -920,10 +920,16 @@ static int de_thread(struct task_struct *tsk)
> >          if (!thread_group_leader(tsk)) {
> >                  struct task_struct *leader = tsk->group_leader;
> >
> > -                sig->notify_count = -1; /* for exit_notify() */
> >                  for (;;) {
> >                          threadgroup_change_begin(tsk);
> >                          write_lock_irq(&tasklist_lock);
> > +                        /*
> > +                         * We could set it once outside the for() cycle, but
> > +                         * this would require SMP barriers there and in
> > +                         * exit_notify(), because the write operation may
> > +                         * be reordered with the setting of group_exit_task.
> > +                         */
> > +                        sig->notify_count = -1; /* for exit_notify() */
> >                          if (likely(leader->exit_state))
> >                                  break;
> >                          __set_current_state(TASK_KILLABLE);
>
> Perhaps something like this makes sense anyway to make the code clearer,
> but in this case I'd suggest setting ->notify_count after we check
> ->exit_state. And without the (afaics!) misleading comment...
>
> Or did I miss something?

Another solution is in the patch below.

Can't (sig->notify_count == -1) be visible earlier than tsk->signal->group_exit_task
in exit_notify()?

tasklist_lock is held in exit_notify(), but de_thread()'s actions (the notify_count
and group_exit_task writes) are independent of it (another lock is held there).

diff --git a/fs/exec.c b/fs/exec.c
index ad8798e..e3235b7 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -920,6 +920,7 @@ static int de_thread(struct task_struct *tsk)
         if (!thread_group_leader(tsk)) {
                 struct task_struct *leader = tsk->group_leader;

+                smp_wmb(); /* Pairs with smp_rmb() in exit_notify */
                 sig->notify_count = -1; /* for exit_notify() */
                 for (;;) {
                         threadgroup_change_begin(tsk);
diff --git a/kernel/exit.c b/kernel/exit.c
index 6806c55..665fe0e 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -615,8 +615,10 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
         list_add(&tsk->ptrace_entry, &dead);

         /* mt-exec, de_thread() is waiting for group leader */
-        if (unlikely(tsk->signal->notify_count < 0))
+        if (unlikely(tsk->signal->notify_count < 0)) {
+                smp_rmb(); /* Pairs with smp_wmb() in de_thread */
                 wake_up_process(tsk->signal->group_exit_task);
+        }
         write_unlock_irq(&tasklist_lock);

         list_for_each_entry_safe(p, n, &dead, ptrace_entry) {

2015-02-05 16:49:44

by Kirill Tkhai

Subject: Re: [PATCH] de_thread: Move notify_count write under lock

Ah, the earlier sig->notify_count = zap_other_threads(tsk) writes the same variable.

Sorry for the noise.

Thanks,
Kirill

On Thu, 05/02/2015 at 19:11 +0300, Kirill Tkhai wrote:
> On Thu, 05/02/2015 at 14:38 +0100, Oleg Nesterov wrote:
> > On 02/05, Kirill Tkhai wrote:
> > >
> > > The write operation may be reordered with the setting of group_exit_task.
> > > If so, this fires in exit_notify().
> >
> > How?
> >
> > OK, yes, "sig->notify_count = -1" can be reordered with the last unlock,
> > but we do not care?
> >
> > group_exit_task + notify_count is only checked under the same lock, and
> > "notify_count = -1" can't happen until de_thread() sees it is zero.
> >
> > Could you explain in more detail why this is bad?
> >
> >
> > > --- a/fs/exec.c
> > > +++ b/fs/exec.c
> > > @@ -920,10 +920,16 @@ static int de_thread(struct task_struct *tsk)
> > >          if (!thread_group_leader(tsk)) {
> > >                  struct task_struct *leader = tsk->group_leader;
> > >
> > > -                sig->notify_count = -1; /* for exit_notify() */
> > >                  for (;;) {
> > >                          threadgroup_change_begin(tsk);
> > >                          write_lock_irq(&tasklist_lock);
> > > +                        /*
> > > +                         * We could set it once outside the for() cycle, but
> > > +                         * this would require SMP barriers there and in
> > > +                         * exit_notify(), because the write operation may
> > > +                         * be reordered with the setting of group_exit_task.
> > > +                         */
> > > +                        sig->notify_count = -1; /* for exit_notify() */
> > >                          if (likely(leader->exit_state))
> > >                                  break;
> > >                          __set_current_state(TASK_KILLABLE);
> >
> > Perhaps something like this makes sense anyway to make the code clearer,
> > but in this case I'd suggest setting ->notify_count after we check
> > ->exit_state. And without the (afaics!) misleading comment...
> >
> > Or did I miss something?
>
> Another solution is in the patch below.
>
> Can't (sig->notify_count == -1) be visible earlier than tsk->signal->group_exit_task
> in exit_notify()?
>
> tasklist_lock is held in exit_notify(), but de_thread()'s actions (the notify_count
> and group_exit_task writes) are independent of it (another lock is held there).
>
> diff --git a/fs/exec.c b/fs/exec.c
> index ad8798e..e3235b7 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -920,6 +920,7 @@ static int de_thread(struct task_struct *tsk)
>          if (!thread_group_leader(tsk)) {
>                  struct task_struct *leader = tsk->group_leader;
>
> +                smp_wmb(); /* Pairs with smp_rmb() in exit_notify */
>                  sig->notify_count = -1; /* for exit_notify() */
>                  for (;;) {
>                          threadgroup_change_begin(tsk);
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 6806c55..665fe0e 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -615,8 +615,10 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
>          list_add(&tsk->ptrace_entry, &dead);
>
>          /* mt-exec, de_thread() is waiting for group leader */
> -        if (unlikely(tsk->signal->notify_count < 0))
> +        if (unlikely(tsk->signal->notify_count < 0)) {
> +                smp_rmb(); /* Pairs with smp_wmb() in de_thread */
>                  wake_up_process(tsk->signal->group_exit_task);
> +        }
>          write_unlock_irq(&tasklist_lock);
>
>          list_for_each_entry_safe(p, n, &dead, ptrace_entry) {
>

2015-02-05 17:10:18

by Oleg Nesterov

Subject: Re: [PATCH] de_thread: Move notify_count write under lock

On 02/05, Kirill Tkhai wrote:
>
> On Thu, 05/02/2015 at 17:15 +0300, Kirill Tkhai wrote:
> > On Thu, 05/02/2015 at 14:38 +0100, Oleg Nesterov wrote:
> > > On 02/05, Kirill Tkhai wrote:
> > > >
> > > > The write operation may be reordered with the setting of group_exit_task.
> > > > If so, this fires in exit_notify().
> > >
> > > How?
> > >
> > > OK, yes, "sig->notify_count = -1" can be reordered with the last unlock,
> > > but we do not care?
> > >
> > > group_exit_task + notify_count is only checked under the same lock, and
> > > "notify_count = -1" can't happen until de_thread() sees it is zero.
> > >
> > > Could you explain in more detail why this is bad?
> >
> > Can't exit_notify() see tsk->signal->notify_count == -1 before
> > tsk->signal->group_exit_task?

Ah, I am starting to understand, thanks!

And sorry, it turns out I didn't read the changelog/comment carefully
enough.

> > As I see in Documentation/memory-barriers.txt:
> >
> > RELEASE operation implication:
> > Memory operations issued after the RELEASE may be completed before the
> > RELEASE operation has completed.

Yes, sure. But I think this doesn't matter, and this is why I was confused.
I still do not think they can be reordered on the CPU which does de_thread(),
but this doesn't matter either, and perhaps I am wrong again.

> if (tsk->signal->notify_count < 0) (== -1)
>
> wake_up_process(tsk->signal->group_exit_task); (garbage in group_exit_task)

Yes, thanks for correcting me. Whatever de_thread() does, another CPU can
observe these changes in a different order. Not to mention we do not even
have the compiler barrier.

Still. Can't you make the comment more explicit? Say, something like

        /*
         * Do this under tasklist_lock to ensure that
         * exit_notify() can't miss ->group_exit_task
         */
        sig->notify_count = -1;

although when I re-read your comment it no longer looks confusing to me,
so I won't insist. As for "set notify_count=-1 after ->exit_state check",
this is cosmetic too.

Acked-by: Oleg Nesterov <[email protected]>


> > > > --- a/fs/exec.c
> > > > +++ b/fs/exec.c
> > > > @@ -920,10 +920,16 @@ static int de_thread(struct task_struct *tsk)
> > > >          if (!thread_group_leader(tsk)) {
> > > >                  struct task_struct *leader = tsk->group_leader;
> > > >
> > > > -                sig->notify_count = -1; /* for exit_notify() */
> > > >                  for (;;) {
> > > >                          threadgroup_change_begin(tsk);
> > > >                          write_lock_irq(&tasklist_lock);
> > > > +                        /*
> > > > +                         * We could set it once outside the for() cycle, but
> > > > +                         * this would require SMP barriers there and in
> > > > +                         * exit_notify(), because the write operation may
> > > > +                         * be reordered with the setting of group_exit_task.
> > > > +                         */
> > > > +                        sig->notify_count = -1; /* for exit_notify() */
> > > >                          if (likely(leader->exit_state))
> > > >                                  break;
> > > >                          __set_current_state(TASK_KILLABLE);
> > >
> > > Perhaps something like this makes sense anyway to make the code clearer,
> > > but in this case I'd suggest setting ->notify_count after we check
> > > ->exit_state. And without the (afaics!) misleading comment...
> > >
> > > Or did I miss something?
> > >
> > > Oleg.
> > >
> >
>
>

2015-02-05 17:20:16

by Oleg Nesterov

Subject: Re: [PATCH] de_thread: Move notify_count write under lock

On 02/05, Kirill Tkhai wrote:
>
> Another solution is in the patch below.

I like your previous patch more ;)

> Can't (sig->notify_count == -1) be visible earlier than tsk->signal->group_exit_task
> in exit_notify()?

Yes, please see another email I sent. Sorry again for missing your point.

> tasklist_lock is held in exit_notify(), but de_thread()'s actions (the notify_count
> and group_exit_task writes) are independent of it (another lock is held there).
^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes, I missed this.

Oleg.

2015-02-23 20:20:00

by Oleg Nesterov

Subject: Re: [PATCH] de_thread: Move notify_count write under lock

I still think that the changelog and the comment look a bit confusing...
they could simply say that exit_notify() can see these STOREs out of order.
And we can set ->notify_count after the ->exit_state check, but again this
is cosmetic, I won't insist. The main problem with this patch is that it
was ignored ;)

Kirill, could you resend? Feel free to add my ack.
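
To spell out what I mean, the resent version could look something like this
(a sketch only; whether ->notify_count is set before or after the
->exit_state check is the cosmetic part):

                for (;;) {
                        threadgroup_change_begin(tsk);
                        write_lock_irq(&tasklist_lock);
                        /*
                         * Do this under tasklist_lock to ensure that
                         * exit_notify() can't miss ->group_exit_task
                         */
                        sig->notify_count = -1;
                        if (likely(leader->exit_state))
                                break;
                        __set_current_state(TASK_KILLABLE);
                        /* ... */
                }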

On 02/05, Kirill Tkhai wrote:
>
> The write operation may be reordered with the setting of group_exit_task.
> If so, this fires in exit_notify().
>
> It does not look like a good idea to add SMP barriers for this case, especially
> in exit_notify(), so let's put the notify_count write under the write lock.
>
> Signed-off-by: Kirill Tkhai <[email protected]>
> ---
> fs/exec.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index ad8798e..42782d5 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -920,10 +920,16 @@ static int de_thread(struct task_struct *tsk)
>          if (!thread_group_leader(tsk)) {
>                  struct task_struct *leader = tsk->group_leader;
>
> -                sig->notify_count = -1; /* for exit_notify() */
>                  for (;;) {
>                          threadgroup_change_begin(tsk);
>                          write_lock_irq(&tasklist_lock);
> +                        /*
> +                         * We could set it once outside the for() cycle, but
> +                         * this would require SMP barriers there and in
> +                         * exit_notify(), because the write operation may
> +                         * be reordered with the setting of group_exit_task.
> +                         */
> +                        sig->notify_count = -1; /* for exit_notify() */
>                          if (likely(leader->exit_state))
>                                  break;
>                          __set_current_state(TASK_KILLABLE);
>
>
>