Message-ID: <1423152711.6933.19.camel@tkhai>
Subject: Re: [PATCH] de_thread: Move notify_count write under lock
From: Kirill Tkhai <ktkhai@parallels.com>
To: Oleg Nesterov <oleg@redhat.com>
CC: <linux-kernel@vger.kernel.org>, Andrew Morton <akpm@linux-foundation.org>,
        <tkhai@yandex.ru>
Date: Thu, 5 Feb 2015 19:11:51 +0300
In-Reply-To: <20150205133829.GA8322@redhat.com>
References: <1423142000.6933.3.camel@tkhai>
	 <20150205133829.GA8322@redhat.com>
Organization: Parallels
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3110
Lines: 87

On Thu, 05/02/2015 at 14:38 +0100, Oleg Nesterov wrote:
> On 02/05, Kirill Tkhai wrote:
> >
> > The write operation may be reordered with the setting of group_exit_task.
> > If so, this fires in exit_notify().
> 
> How?
> 
> OK, yes, "sig->notify_count = -1" can be reordered with the last unlock,
> but we do not care?
> 
> group_exit_task + notify_count is only checked under the same lock, and
> "notify_count = -1" can't happen until de_thread() sees it is zero.
> 
> Could you explain why this is bad in more details?
> 
> 
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -920,10 +920,16 @@ static int de_thread(struct task_struct *tsk)
> >  	if (!thread_group_leader(tsk)) {
> >  		struct task_struct *leader = tsk->group_leader;
> >
> > -		sig->notify_count = -1;	/* for exit_notify() */
> >  		for (;;) {
> >  			threadgroup_change_begin(tsk);
> >  			write_lock_irq(&tasklist_lock);
> > +			/*
> > +			 * We could set it once outside the for() cycle, but
> > +			 * this requires to use SMP barriers there and in
> > +			 * exit_notify(), because the write operation may
> > +			 * be reordered with the setting of group_exit_task.
> > +			 */
> > +			sig->notify_count = -1;	/* for exit_notify() */
> >  			if (likely(leader->exit_state))
> >  				break;
> >  			__set_current_state(TASK_KILLABLE);
> 
> Perhaps something like this makes sense anyway to make the code more
> clear, but in this case I'd suggest to set ->notify_count after we
> check ->exit_state. And without the (afaics!) misleading comment...
> 
> Or I missed something?

Another solution is in the patch below.

Can't (sig->notify_count == -1) become visible in exit_notify() earlier than
the write to tsk->signal->group_exit_task does?

exit_notify() holds tasklist_lock, but the de_thread() writes to notify_count
and group_exit_task do not depend on that lock (another lock is held there).
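
To illustrate the ordering I mean, here is a minimal userspace sketch, not
kernel code. It assumes C11 fences as rough stand-ins for smp_wmb()/smp_rmb(),
and all names (publisher, observer, run_pairing_demo, waiter) are hypothetical.
The two globals only mimic the roles of sig->group_exit_task and
sig->notify_count.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for sig->group_exit_task and sig->notify_count. */
static _Atomic(int *) group_exit_task;
static atomic_int notify_count;
static int waiter = 42;	/* stands in for the exec'ing task */

/* Writer side, modelled loosely on de_thread(). */
static void *publisher(void *arg)
{
	(void)arg;
	atomic_store_explicit(&group_exit_task, &waiter, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* role of smp_wmb() */
	atomic_store_explicit(&notify_count, -1, memory_order_relaxed);
	return NULL;
}

/* Reader side, modelled loosely on exit_notify(); returns the "task" to wake. */
static int *observer(void)
{
	while (atomic_load_explicit(&notify_count, memory_order_relaxed) >= 0)
		;	/* spin until the flag becomes visible */
	atomic_thread_fence(memory_order_acquire);	/* role of smp_rmb() */
	return atomic_load_explicit(&group_exit_task, memory_order_relaxed);
}

/* Runs one publisher against one observer; returns 0 if the pointer was seen. */
static int run_pairing_demo(void)
{
	pthread_t t;
	int *seen;
	int rc;

	atomic_store(&group_exit_task, NULL);
	atomic_store(&notify_count, 0);
	rc = pthread_create(&t, NULL, publisher, NULL);
	assert(rc == 0);
	(void)rc;
	seen = observer();
	pthread_join(t, NULL);
	return (seen == &waiter) ? 0 : 1;
}
```

With both fences in place, an observer that sees notify_count < 0 must also
see the pointer. Without them, nothing in this model forbids reading a stale
NULL.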

diff --git a/fs/exec.c b/fs/exec.c
index ad8798e..e3235b7 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -920,6 +920,7 @@ static int de_thread(struct task_struct *tsk)
 	if (!thread_group_leader(tsk)) {
 		struct task_struct *leader = tsk->group_leader;
 
+		smp_wmb(); /* Pairs with smp_rmb() in exit_notify */
 		sig->notify_count = -1;	/* for exit_notify() */
 		for (;;) {
 			threadgroup_change_begin(tsk);
diff --git a/kernel/exit.c b/kernel/exit.c
index 6806c55..665fe0e 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -615,8 +615,10 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
 		list_add(&tsk->ptrace_entry, &dead);
 
 	/* mt-exec, de_thread() is waiting for group leader */
-	if (unlikely(tsk->signal->notify_count < 0))
+	if (unlikely(tsk->signal->notify_count < 0)) {
+		smp_rmb(); /* Pairs with smp_wmb() in de_thread */
 		wake_up_process(tsk->signal->group_exit_task);
+	}
 	write_unlock_irq(&tasklist_lock);
 
 	list_for_each_entry_safe(p, n, &dead, ptrace_entry) {

