Message-ID: <1423152711.6933.19.camel@tkhai>
Subject: Re: [PATCH] de_thread: Move notify_count write under lock
From: Kirill Tkhai
To: Oleg Nesterov
CC: Andrew Morton
Date: Thu, 5 Feb 2015 19:11:51 +0300
In-Reply-To: <20150205133829.GA8322@redhat.com>
References: <1423142000.6933.3.camel@tkhai> <20150205133829.GA8322@redhat.com>
Organization: Parallels

On Thu, 05/02/2015 at 14:38 +0100, Oleg Nesterov wrote:
> On 02/05, Kirill Tkhai wrote:
> >
> > The write operation may be reordered with the setting of group_exit_task.
> > If so, this fires in exit_notify().
>
> How?
>
> OK, yes, "sig->notify_count = -1" can be reordered with the last unlock,
> but we do not care?
>
> group_exit_task + notify_count is only checked under the same lock, and
> "notify_count = -1" can't happen until de_thread() sees it is zero.
>
> Could you explain why this is bad in more details?
> >
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -920,10 +920,16 @@ static int de_thread(struct task_struct *tsk)
> >  	if (!thread_group_leader(tsk)) {
> >  		struct task_struct *leader = tsk->group_leader;
> >  
> > -		sig->notify_count = -1;	/* for exit_notify() */
> >  		for (;;) {
> >  			threadgroup_change_begin(tsk);
> >  			write_lock_irq(&tasklist_lock);
> > +			/*
> > +			 * We could set it once outside the for() cycle, but
> > +			 * this requires to use SMP barriers there and in
> > +			 * exit_notify(), because the write operation may
> > +			 * be reordered with the setting of group_exit_task.
> > +			 */
> > +			sig->notify_count = -1;	/* for exit_notify() */
> >  			if (likely(leader->exit_state))
> >  				break;
> >  			__set_current_state(TASK_KILLABLE);
>
> Perhaps something like this makes sense anyway to make the code more
> clear, but in this case I'd suggest to set ->notify_count after we
> check ->exit_state. And without the (afaics!) misleading comment...
>
> Or I missed something?

Another solution is in the patch below.

Can't (sig->notify_count == -1) become visible earlier than
tsk->signal->group_exit_task in exit_notify()? tasklist_lock is held in
exit_notify(), but the de_thread() actions (the notify_count and
group_exit_task writes) are independent of it (another lock is held there).
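To make the ordering I am worried about concrete, here is a rough userspace
sketch of the pattern. It is only an illustration, not kernel code: the
struct fake_signal, waiter_task and run_once() names are made up for the
example, and C11 fences are assumed as a loose analogue of smp_wmb() and
smp_rmb() (a release/acquire fence pair is somewhat stronger than the kernel
barriers). The writer publishes the pointer and then the flag; the reader
sees the flag and then reads the pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the two signal_struct fields; not kernel code. */
struct fake_signal {
	_Atomic(int *) group_exit_task;	/* stands in for the task pointer */
	atomic_int notify_count;
};

static struct fake_signal sig;
static int waiter_task = 42;	/* dummy object standing in for "tsk" */

/* de_thread() side: publish the pointer first, then the flag. */
static void *writer(void *arg)
{
	(void)arg;
	atomic_store_explicit(&sig.group_exit_task, &waiter_task,
			      memory_order_relaxed);
	/* Assumed rough analogue of smp_wmb(): orders the two stores. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&sig.notify_count, -1, memory_order_relaxed);
	return NULL;
}

/* exit_notify() side: wait for the flag, then read the pointer. */
static void *reader(void *arg)
{
	(void)arg;
	while (atomic_load_explicit(&sig.notify_count,
				    memory_order_relaxed) >= 0)
		;	/* spin until the flag is visible */
	/* Assumed rough analogue of smp_rmb(): orders the two loads. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&sig.group_exit_task,
				    memory_order_relaxed);
}

/* Runs one writer/reader pair and returns the pointer the reader saw. */
static int *run_once(void)
{
	pthread_t w, r;
	void *seen = NULL;

	atomic_store(&sig.group_exit_task, NULL);
	atomic_store(&sig.notify_count, 0);
	pthread_create(&r, NULL, reader, NULL);
	pthread_create(&w, NULL, writer, NULL);
	pthread_join(w, NULL);
	pthread_join(r, &seen);
	return seen;
}
```

With both fences in place, run_once() always returns &waiter_task. Without
them, the C11 model lets the reader observe the flag and still read a stale
NULL pointer, which is the kind of reordering the question above is about.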
diff --git a/fs/exec.c b/fs/exec.c
index ad8798e..e3235b7 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -920,6 +920,7 @@ static int de_thread(struct task_struct *tsk)
 	if (!thread_group_leader(tsk)) {
 		struct task_struct *leader = tsk->group_leader;
 
+		smp_wmb(); /* Pairs with smp_rmb() in exit_notify */
 		sig->notify_count = -1;	/* for exit_notify() */
 		for (;;) {
 			threadgroup_change_begin(tsk);
diff --git a/kernel/exit.c b/kernel/exit.c
index 6806c55..665fe0e 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -615,8 +615,10 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
 		list_add(&tsk->ptrace_entry, &dead);
 
 	/* mt-exec, de_thread() is waiting for group leader */
-	if (unlikely(tsk->signal->notify_count < 0))
+	if (unlikely(tsk->signal->notify_count < 0)) {
+		smp_rmb(); /* Pairs with smp_wmb() in de_thread */
 		wake_up_process(tsk->signal->group_exit_task);
+	}
 	write_unlock_irq(&tasklist_lock);
 
 	list_for_each_entry_safe(p, n, &dead, ptrace_entry) {