To: akpm@linux-foundation.org
Cc: oleg@redhat.com, tglx@linutronix.de, linux-kernel@vger.kernel.org,
       paulmck@linux.vnet.ibm.com, linux-security-module@vger.kernel.org
Subject: Re: [PATCH] Update comment on find_task_by_pid_ns
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
References: <20100208132101.GA7129@redhat.com>
	<alpine.LFD.2.00.1002081802510.2811@localhost.localdomain>
	<20100208171643.GA19230@redhat.com>
	<201002090642.EBE48414.HLJVFOQFSOFOMt@I-love.SAKURA.ne.jp>
	<20100209140818.43bb9770.akpm@linux-foundation.org>
In-Reply-To: <20100209140818.43bb9770.akpm@linux-foundation.org>
Message-Id: <201002111021.EJG17183.FtOVFLSFQOJMHO@I-love.SAKURA.ne.jp>
Date: Thu, 11 Feb 2010 10:21:15 +0900
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2312
Lines: 65

Andrew Morton wrote:
> > What should we do? Adding rcu_read_lock()/rcu_read_unlock() to each
> > callers? Or adding rcu_read_lock()/rcu_read_unlock() inside
> > find_task_by_pid_ns()?
> 
> Putting rcu_read_lock() in the callee isn't a complete solution. 
> Because the function would still be returning a task_struct* without
> any locking held and without taking a reference against it.  So that
> pointer is useless to the caller!
> 
> We could add a new function which looks up the task and then takes a
> reference on it, inside suitable locks.  The caller would then use the
> task_struct and then remember to call put_task_struct() to unpin it. 
> This prevents the task_struct from getting freed while it's being
> manipulated, but it doesn't prevent fields within it from being altered
> - that's up to the caller to sort out.

The code handling "struct task_struct" is too complicated for me to follow
fully, but my understanding is that a task is torn down as follows:

(1) tasklist_lock is acquired for writing.

(2) The exiting task's "struct task_struct" is removed from the task list.

(3) tasklist_lock is released.

(4) Wait for RCU grace period.

(5) kfree() members of "struct task_struct".

(6) kfree() "struct task_struct" itself.

If the above sequence is correct, I think that in

	rcu_read_lock();
	task = find_task_by_pid_ns();
	if (task)
		do_something(task);
	rcu_read_unlock();

do_something() can safely access all members of task without
read_lock(&tasklist_lock), except task->prev (I don't know the exact member
name) and task->usage, because do_something() finishes its work before (5).
I think we need to call find_task_by_pid_ns() with both
read_lock(&tasklist_lock) and rcu_read_lock() held

	read_lock(&tasklist_lock);
	rcu_read_lock();
	task = find_task_by_pid_ns();
	if (task)
		do_something(task);
	rcu_read_unlock();
	read_unlock(&tasklist_lock);

only when do_something() wants to access task->prev or task->usage.
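
To make the ordering argument concrete, here is a toy single-threaded
userspace model of steps (1) to (6). It is only a sketch of the sequence as I
described it, not the kernel's RCU implementation, and it does not show that
the real teardown path follows this sequence. Every toy_* name is invented for
illustration; the "freed" flag stands in for kfree().

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: every toy_* name is hypothetical, not a kernel API. */
#define TOY_MAX_PENDING 8

struct toy_task {
	int pid;
	int freed;	/* set to 1 where steps (5)/(6) would kfree() */
};

static int toy_rcu_nesting;	/* depth of read-side critical sections */
static struct toy_task *toy_pending[TOY_MAX_PENDING];
static size_t toy_npending;

/* Steps (5) and (6): run deferred frees, but only if no reader is active. */
static void toy_run_pending(void)
{
	size_t i;

	if (toy_rcu_nesting > 0)
		return;		/* the "grace period" has not ended yet */
	for (i = 0; i < toy_npending; i++)
		toy_pending[i]->freed = 1;
	toy_npending = 0;
}

static void toy_rcu_read_lock(void)
{
	toy_rcu_nesting++;
}

static void toy_rcu_read_unlock(void)
{
	assert(toy_rcu_nesting > 0);
	toy_rcu_nesting--;
	toy_run_pending();
}

/*
 * Steps (1) to (4): unlinking under tasklist_lock is not modelled here;
 * the free is queued until every reader has left its critical section.
 */
static void toy_release_task(struct toy_task *t)
{
	assert(toy_npending < TOY_MAX_PENDING);
	toy_pending[toy_npending++] = t;
	toy_run_pending();
}
```

In this model, a task released while a reader is inside
toy_rcu_read_lock()/toy_rcu_read_unlock() keeps freed == 0 until the reader
unlocks, which is the property the do_something() example relies on.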

> 
> One fix is to go through all those callsites and add the rcu_read_lock.
> That kinda sucks.  Perhaps writing the new function which returns a
> pinned task_struct is better?
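
For discussion, the "lookup and pin" function could look roughly like the
userspace model below. This is only a sketch under my own assumptions: the
pin_* names are hypothetical, the table replaces the real pid lookup, and the
plain counter replaces task->usage, which the kernel would change via
get_task_struct()/put_task_struct().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: every pin_* name is hypothetical, not a kernel API. */
struct pin_task {
	int pid;
	int usage;	/* stands in for task->usage */
};

static struct pin_task pin_tasks[] = { { 1, 1 }, { 42, 1 } };
static int pin_rcu_nesting;

static void pin_rcu_read_lock(void)
{
	pin_rcu_nesting++;
}

static void pin_rcu_read_unlock(void)
{
	assert(pin_rcu_nesting > 0);
	pin_rcu_nesting--;
}

/* Stand-in for find_task_by_pid_ns(): the caller must hold the read lock. */
static struct pin_task *pin_find_task(int pid)
{
	size_t i;

	assert(pin_rcu_nesting > 0);
	for (i = 0; i < sizeof(pin_tasks) / sizeof(pin_tasks[0]); i++)
		if (pin_tasks[i].pid == pid)
			return &pin_tasks[i];
	return NULL;
}

/* Look the task up and take a reference before leaving the read section. */
static struct pin_task *pin_find_get_task(int pid)
{
	struct pin_task *t;

	pin_rcu_read_lock();
	t = pin_find_task(pid);
	if (t)
		t->usage++;	/* the kernel would call get_task_struct() */
	pin_rcu_read_unlock();
	return t;
}

/* The caller drops the reference when it is done with the task. */
static void pin_put_task(struct pin_task *t)
{
	assert(t->usage > 0);
	t->usage--;
}
```

A caller would then do t = pin_find_get_task(pid), use t if it is not NULL,
and finish with pin_put_task(t). As you said, this only keeps the structure
from being freed; it does not stop its fields from changing.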