Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758641AbYA3J1w (ORCPT ); Wed, 30 Jan 2008 04:27:52 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1755282AbYA3J1d (ORCPT ); Wed, 30 Jan 2008 04:27:33 -0500 Received: from x346.tv-sign.ru ([89.108.83.215]:35555 "EHLO mail.screens.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755684AbYA3J1a (ORCPT ); Wed, 30 Jan 2008 04:27:30 -0500 Date: Wed, 30 Jan 2008 12:30:18 +0300 From: Oleg Nesterov To: "Eric W. Biederman" Cc: Andrew Morton , Ingo Molnar , "Paul E. McKenney" , Pavel Emelyanov , linux-kernel@vger.kernel.org Subject: Re: [PATCH] fix tasklist + find_pid() with CONFIG_PREEMPT_RCU Message-ID: <20080130093018.GA88@tv-sign.ru> References: <20080129164019.GA2060@tv-sign.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.11 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3651 Lines: 105 On 01/29, Eric W. Biederman wrote: > > Oleg Nesterov writes: > > > With CONFIG_PREEMPT_RCU read_lock(tasklist_lock) doesn't imply rcu_read_lock(), > > but find_pid_ns()->hlist_for_each_entry_rcu() should be safe under tasklist. > > > > Usually it is, detach_pid() is always called under write_lock(tasklist_lock), > > but copy_process() calls free_pid() lockless. > > > > "#ifdef CONFIG_PREEMPT_RCU" is added mostly as documentation, perhaps it is > > too ugly and should be removed. > > > > Signed-off-by: Oleg Nesterov > > > > --- MM/kernel/fork.c~PR_RCU 2008-01-27 17:09:47.000000000 +0300 > > +++ MM/kernel/fork.c 2008-01-29 19:23:44.000000000 +0300 > > @@ -1335,8 +1335,19 @@ static struct task_struct *copy_process( > > return p; > > > > bad_fork_free_pid: > > - if (pid != &init_struct_pid) > > + if (pid != &init_struct_pid) { > > +#ifdef CONFIG_PREEMPT_RCU > > + /* > > + * read_lock(tasklist_lock) doesn't imply rcu_read_lock(), > > + * make sure find_pid() is safe under read_lock(tasklist). > > + */ > > + write_lock_irq(&tasklist_lock); > > +#endif > > free_pid(pid); > > +#ifdef CONFIG_PREEMPT_RCU > > + write_unlock_irq(&tasklist_lock); > > +#endif > > + } > > bad_fork_cleanup_namespaces: > > exit_task_namespaces(p); > > bad_fork_cleanup_keys: > > Ok. I believe I see what problem you are trying to fix. That > a pid returned from find_pid might disappear if we are not rcu > protected. Not only. Any find_pid() is unsafe under tasklist, even the find_pid(1). Because free_pid() mangles the pid_hash[hash] while find_pid() scans the same list. > This patch in the simplest form is wrong because it is confusing. > > We currently appear to have two options. > 1) Force all pid hash table access and pid accesses that > do not get a count to be covered under rcu_read_lock. I agree, we can (and should) convert most of these read_lock(tasklist)'s to rcu_read_lock(). But we have a lot of them. > 2) To modify the locking requirements for free_pid to require > the tasklist_lock. > > However this second approach is horribly brittle, as it > will break if we ever have intermediate entries in the > hash table protected by pidmap_lock. > > Using the tasklist_lock to still guarantee we see the list, the entire > list, and exactly the list for proper implementation of kill to > process groups and sessions still seems sane. And this means that attach_pid() and detach_pid() need write_lock(tasklist) anyway. So copy_process()->free_pid() is the only case when we modify the pid_hash[] list without tasklist. > So let's just remove the guarantee of find_pid being usable with > just the tasklist_lock held. > > Eric > > diff --git a/include/linux/pid.h b/include/linux/pid.h > index e29a900..0ffb8cc 100644 > --- a/include/linux/pid.h > +++ b/include/linux/pid.h > @@ -100,8 +100,7 @@ struct pid_namespace; > extern struct pid_namespace init_pid_ns; > > /* > - * look up a PID in the hash table. Must be called with the tasklist_lock > - * or rcu_read_lock() held. > + * look up a PID in the hash table. Must be called with the rcu_read_lock() held. Imho, we should first fix all users of read_lock(tasklist)+find_..._pid(). So I still think this patch makes sense as a trivial fix for now, until we add the necessary rcu_read_lock()s. However this race is very unlikely, perhaps we can live with it. Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/