On Tue, Oct 12, 2010 at 12:08:46AM -0700, Greg Thelen wrote:
> I observe a failing rcu_dereference_check() in linux-next (found in
> mmotm-2010-10-07-14-08). An extra rcu assertion in
> find_task_by_pid_ns() was added by:
> commit 4221a9918e38b7494cee341dda7b7b4bb8c04bde
> Author: Tetsuo Handa <[email protected]>
> Date: Sat Jun 26 01:08:19 2010 +0900
>
> Add RCU check for find_task_by_vpid().
>
> This extra assertion causes an rcu_dereference_check() failure during
> boot in a 512 MiB VM. I would be happy to try out proposed patches for
> this issue. My config includes:
> CONFIG_PREEMPT=y
> CONFIG_LOCKDEP=y
> CONFIG_PROVE_LOCKING=y
> CONFIG_PROVE_RCU=y
>
> The console error:
>
> Begin: Running /scripts/local-bottom ...
> Done.
> Done.
> Begin: Running /scripts/init-bottom ...
> Done.
> [ 3.394348]
> [ 3.394349] ===================================================
> [ 3.395162] [ INFO: suspicious rcu_dereference_check() usage. ]
> [ 3.395786] ---------------------------------------------------
> [ 3.396452] kernel/pid.c:419 invoked rcu_dereference_check() without protection!
> [ 3.397483]
> [ 3.397484] other info that might help us debug this:
> [ 3.397485]
> [ 3.398363]
> [ 3.398364] rcu_scheduler_active = 1, debug_locks = 0
> [ 3.399073] 1 lock held by ureadahead/1438:
> [ 3.399515] #0: (tasklist_lock){.+.+..}, at: [<ffffffff811c1d1a>] sys_ioprio_set+0x8a/0x3f0
> [ 3.400500]
> [ 3.400501] stack backtrace:
> [ 3.401036] Pid: 1438, comm: ureadahead Not tainted 2.6.36-dbg-DEV #10
> [ 3.401717] Call Trace:
> [ 3.401996] [<ffffffff810c720b>] lockdep_rcu_dereference+0xbb/0xc0
> [ 3.402742] [<ffffffff810aebb1>] find_task_by_pid_ns+0x81/0x90
> [ 3.403445] [<ffffffff810aebe2>] find_task_by_vpid+0x22/0x30
> [ 3.404146] [<ffffffff811c2074>] sys_ioprio_set+0x3e4/0x3f0
> [ 3.404756] [<ffffffff815c5919>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 3.405455] [<ffffffff8104331b>] system_call_fastpath+0x16/0x1b
>
>
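For reference, the check that fires here amounts (if I remember the commit
correctly) to the assertion below; read-holding tasklist_lock does not count
as an RCU read-side critical section, so rcu_read_lock_held() is false and
lockdep complains:

/* kernel/pid.c -- sketch of the check added by 4221a991, from memory */
struct task_struct *find_task_by_pid_ns(pid_t nr, struct pid_namespace *ns)
{
        rcu_lockdep_assert(rcu_read_lock_held());
        return pid_task(find_pid_ns(nr, ns), PIDTYPE_PID);
}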
> ioprio_set() contains a comment warning against the use of
> rcu_read_lock() to avoid this warning:
> /*
> * We want IOPRIO_WHO_PGRP/IOPRIO_WHO_USER to be "atomic",
> * so we can't use rcu_read_lock(). See re-copy of ->ioprio
> * in copy_process().
> */
>
> So I'm not sure what the best fix is.
I must defer to Oleg, who wrote the comment. But please see below.
> Also I see that sys_ioprio_get() has a similar problem that might be
> addressed with:
There is a patch from Sergey Senozhatsky currently in -mm that encloses
a subset of this code (both ioprio_set and ioprio_get) in rcu_read_lock()
and rcu_read_unlock(), see http://lkml.org/lkml/2010/10/29/168.
Thanx, Paul
> diff --git a/fs/ioprio.c b/fs/ioprio.c
> index 748cfb9..02eed30 100644
> --- a/fs/ioprio.c
> +++ b/fs/ioprio.c
> @@ -197,6 +197,7 @@ SYSCALL_DEFINE2(ioprio_get, int, which, int, who)
> int ret = -ESRCH;
> int tmpio;
>
> + rcu_read_lock();
> read_lock(&tasklist_lock);
> switch (which) {
> case IOPRIO_WHO_PROCESS:
> @@ -251,5 +252,6 @@ SYSCALL_DEFINE2(ioprio_get, int, which, int, who)
> }
>
> read_unlock(&tasklist_lock);
> + rcu_read_unlock();
> return ret;
> }
>
> sys_ioprio_get() didn't have an explicit warning against the use of
> rcu_read_lock(), but that doesn't mean this is a good patch.
>
> --
> Greg
On 11/07, Paul E. McKenney wrote:
>
> On Tue, Oct 12, 2010 at 12:08:46AM -0700, Greg Thelen wrote:
> >
> > ioprio_set() contains a comment warning against the use of
> > rcu_read_lock() to avoid this warning:
> > /*
> > * We want IOPRIO_WHO_PGRP/IOPRIO_WHO_USER to be "atomic",
> > * so we can't use rcu_read_lock(). See re-copy of ->ioprio
> > * in copy_process().
> > */
> >
> > So I'm not sure what the best fix is.
(please note that "we can't use rcu_read_lock()" actually meant
rcu_read_lock() is not _enough_)
> I must defer to Oleg, who wrote the comment. But please see below.
I added this comment to explain some oddities in copy_process().
Nobody confirmed my understanding was correct ;)
In any case, this comment doesn't look right today. This code was
changed by fd0928df98b9578be8a786ac0cb78a47a5e17a20
"ioprio: move io priority from task_struct to io_context"; after that,
tasklist_lock can't help to make sys_ioprio_set(IOPRIO_WHO_PGRP) atomic.
I think tasklist_lock can be removed now.
And, as Paul pointed out, we need rcu_read_lock() anyway, it was
already added by Sergey.
Oleg.
On Mon, Nov 08, 2010 at 04:15:09PM +0100, Oleg Nesterov wrote:
> On 11/07, Paul E. McKenney wrote:
> >
> > On Tue, Oct 12, 2010 at 12:08:46AM -0700, Greg Thelen wrote:
> > >
> > > ioprio_set() contains a comment warning against the use of
> > > rcu_read_lock() to avoid this warning:
> > > /*
> > > * We want IOPRIO_WHO_PGRP/IOPRIO_WHO_USER to be "atomic",
> > > * so we can't use rcu_read_lock(). See re-copy of ->ioprio
> > > * in copy_process().
> > > */
> > >
> > > So I'm not sure what the best fix is.
>
> (please note that "we can't use rcu_read_lock()" actually meant
> rcu_read_lock() is not _enough_)
>
> > I must defer to Oleg, who wrote the comment. But please see below.
>
> I added this comment to explain some oddities in copy_process().
> Nobody confirmed my understanding was correct ;)
>
> In any case, this comment doesn't look right today. This code was
> changed by fd0928df98b9578be8a786ac0cb78a47a5e17a20
> "ioprio: move io priority from task_struct to io_context"; after that,
> tasklist_lock can't help to make sys_ioprio_set(IOPRIO_WHO_PGRP) atomic.
>
> I think tasklist_lock can be removed now.
>
> And, as Paul pointed out, we need rcu_read_lock() anyway, it was
> already added by Sergey.
Thank you, Oleg! Greg, would you be willing to update your patch
to remove the comment? (Perhaps tasklist_lock as well...)
Thanx, Paul
On 11/09, Paul E. McKenney wrote:
>
> Thank you, Oleg! Greg, would you be willing to update your patch
> to remove the comment? (Perhaps tasklist_lock as well...)
Agreed, I think tasklist_lock should be killed.
But wait. Whatever we do, isn't this code racy? I do not see why, say,
sys_ioprio_set(IOPRIO_WHO_PROCESS) can't install ->io_context after
this task has already passed exit_io_context().
Jens, am I missing something?
Oleg.
(another try with the proper email address)
On 11/09, Paul E. McKenney wrote:
>
> Thank you, Oleg! Greg, would you be willing to update your patch
> to remove the comment? (Perhaps tasklist_lock as well...)
Agreed, I think tasklist_lock should be killed.
But wait. Whatever we do, isn't this code racy? I do not see why, say,
sys_ioprio_set(IOPRIO_WHO_PROCESS) can't install ->io_context after
this task has already passed exit_io_context().
Jens, am I missing something?
Oleg.
On 2010-11-10 17:02, Oleg Nesterov wrote:
> (another try with the proper email address)
>
> On 11/09, Paul E. McKenney wrote:
>>
>> Thank you, Oleg! Greg, would you be willing to update your patch
>> to remove the comment? (Perhaps tasklist_lock as well...)
>
> Agreed, I think tasklist_lock should be killed.
>
>
> But wait. Whatever we do, isn't this code racy? I do not see why, say,
> sys_ioprio_set(IOPRIO_WHO_PROCESS) can't install ->io_context after
> this task has already passed exit_io_context().
>
> Jens, am I missing something?
Not sure, I think the original intent was for the tasklist_lock to
protect from a concurrent exit, but that looks like nonsense and it was
just there to protect the task lookup.
How about moving the ->io_context check and exit_io_context() in
do_exit() under the task lock? Coupled with a check for PF_EXITING in
set_task_ioprio().
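A rough, untested sketch of that direction (permission/cred checks and the
cfq/io_context teardown details are omitted; alloc_io_context() and
put_io_context() as in block/blk-ioc.c of this era):

/* sketch only: detach ->io_context under task_lock() at exit time */
void exit_io_context(struct task_struct *task)
{
        struct io_context *ioc;

        task_lock(task);                /* now taken unconditionally */
        ioc = task->io_context;
        task->io_context = NULL;
        task_unlock(task);

        if (ioc)
                put_io_context(ioc);    /* real code also drops elevator state */
}

/* sketch only: refuse to install an io_context into an exiting task */
static int set_task_ioprio(struct task_struct *task, int ioprio)
{
        struct io_context *ioc;
        int err = 0;

        task_lock(task);
        if (task->flags & PF_EXITING) {
                err = -ESRCH;           /* lost the race with do_exit() */
                goto out;
        }

        ioc = task->io_context;
        if (!ioc) {
                ioc = alloc_io_context(GFP_ATOMIC, -1);
                if (!ioc) {
                        err = -ENOMEM;
                        goto out;
                }
                task->io_context = ioc;
        }

        ioc->ioprio = ioprio;
        ioc->ioprio_changed = 1;
out:
        task_unlock(task);
        return err;
}

The idea being that once PF_EXITING is checked under the same task_lock()
that exit_io_context() takes, a late ioprio_set() can no longer install a
fresh ->io_context behind the exiting task's back.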
--
Jens Axboe
On 11/11, Jens Axboe wrote:
>
> On 2010-11-10 17:02, Oleg Nesterov wrote:
> >
> > But wait. Whatever we do, isn't this code racy? I do not see why, say,
> > sys_ioprio_set(IOPRIO_WHO_PROCESS) can't install ->io_context after
> > this task has already passed exit_io_context().
> >
> > Jens, am I missing something?
>
> Not sure, I think the original intent was for the tasklist_lock to
> protect from a concurrent exit, but that looks like nonsense and it was
> just there to protect the task lookup.
Probably. After that (perhaps) there was another reason, see
5b160f5e "copy_process: cosmetic ->ioprio tweak"
cf342e52 "Don't need to disable interrupts for tasklist_lock"
But this was dismissed by
fd0928df "ioprio: move io priority from task_struct to io_context"
> How about moving the ->io_context check and exit_io_context() in
> do_exit() under the task lock? Coupled with a check for PF_EXITING in
> set_task_ioprio().
Yes, I thought about this too. The only drawback is that we should
take task_lock() unconditionally in exit_io_context().
Btw, in theory get_task_ioprio() is racy too. "ret = p->io_context->ioprio"
can lead to use-after-free. Probably needs task_lock() as well.
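Something like this, perhaps (untested sketch, assuming the current shape of
get_task_ioprio() in fs/ioprio.c):

/* sketch: read ->io_context under task_lock() so it cannot be freed under us */
static int get_task_ioprio(struct task_struct *p)
{
        int ret;

        ret = security_task_getioprio(p);
        if (ret)
                goto out;

        ret = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, IOPRIO_NORM);
        task_lock(p);
        if (p->io_context)
                ret = p->io_context->ioprio;
        task_unlock(p);
out:
        return ret;
}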
Hmm. And copy_io_context() has no callers ;)
Oleg.
On 2010-11-11 13:30, Oleg Nesterov wrote:
> On 11/11, Jens Axboe wrote:
>>
>> On 2010-11-10 17:02, Oleg Nesterov wrote:
>>>
>>> But wait. Whatever we do, isn't this code racy? I do not see why, say,
>>> sys_ioprio_set(IOPRIO_WHO_PROCESS) can't install ->io_context after
>>> this task has already passed exit_io_context().
>>>
>>> Jens, am I missing something?
>>
>> Not sure, I think the original intent was for the tasklist_lock to
>> protect from a concurrent exit, but that looks like nonsense and it was
>> just there to protect the task lookup.
>
> Probably. After that (perhaps) there was another reason, see
>
> 5b160f5e "copy_process: cosmetic ->ioprio tweak"
> cf342e52 "Don't need to disable interrupts for tasklist_lock"
>
> But this was dismissed by
>
> fd0928df "ioprio: move io priority from task_struct to io_context"
>
>> How about moving the ->io_context check and exit_io_context() in
>> do_exit() under the task lock? Coupled with a check for PF_EXITING in
>> set_task_ioprio().
>
> Yes, I thought about this too. The only drawback is that we should
> take task_lock() unconditionally in exit_io_context().
Sure, not a big problem.
> Btw, in theory get_task_ioprio() is racy too. "ret = p->io_context->ioprio"
> can lead to use-after-free. Probably needs task_lock() as well.
Indeed...
> Hmm. And copy_io_context() has no callers ;)
Good find. It was previously used by the AS io scheduler, seems there
are no users left anymore. I queued up a patch to kill it.
--
Jens Axboe
Jens Axboe <[email protected]> writes:
> On 2010-11-11 13:30, Oleg Nesterov wrote:
>> On 11/11, Jens Axboe wrote:
>>>
>>> On 2010-11-10 17:02, Oleg Nesterov wrote:
>>>>
>>>> But wait. Whatever we do, isn't this code racy? I do not see why, say,
>>>> sys_ioprio_set(IOPRIO_WHO_PROCESS) can't install ->io_context after
>>>> this task has already passed exit_io_context().
>>>>
>>>> Jens, am I missing something?
>>>
>>> Not sure, I think the original intent was for the tasklist_lock to
>>> protect from a concurrent exit, but that looks like nonsense and it was
>>> just there to protect the task lookup.
>>
>> Probably. After that (perhaps) there was another reason, see
>>
>> 5b160f5e "copy_process: cosmetic ->ioprio tweak"
>> cf342e52 "Don't need to disable interrupts for tasklist_lock"
>>
>> But this was dismissed by
>>
>> fd0928df "ioprio: move io priority from task_struct to io_context"
>>
>>> How about moving the ->io_context check and exit_io_context() in
>>> do_exit() under the task lock? Coupled with a check for PF_EXITING in
>>> set_task_ioprio().
>>
>> Yes, I thought about this too. The only drawback is that we should
>> take task_lock() unconditionally in exit_io_context().
>
> Sure, not a big problem.
>
>> Btw, in theory get_task_ioprio() is racy too. "ret = p->io_context->ioprio"
>> can lead to use-after-free. Probably needs task_lock() as well.
>
> Indeed...
>
>> Hmm. And copy_io_context() has no callers ;)
>
> Good find. It was previously used by the AS io scheduler, seems there
> are no users left anymore. I queued up a patch to kill it.
From this thread I gather the following changes are being proposed:
a) my original report added rcu_read_lock() to sys_ioprio_get() and
claims that "something" is needed in sys_ioprio_set().
c) http://lkml.org/lkml/2010/10/29/168 added rcu locks to both
sys_ioprio_get() and sys_ioprio_set() thus addressing the issues
raised in a). However, I do not see this patch in -mm.
I just retested and confirmed that this warning still exists in
unmodified mmotm-2010-11-09-15-31:
Call Trace:
[<ffffffff8109befc>] lockdep_rcu_dereference+0xaa/0xb3
[<ffffffff81088aaf>] find_task_by_pid_ns+0x44/0x5d
[<ffffffff81088aea>] find_task_by_vpid+0x22/0x24
[<ffffffff81155ad2>] sys_ioprio_set+0xb4/0x29e
[<ffffffff81476819>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff8105c409>] sysenter_dispatch+0x7/0x2c
[<ffffffff814767da>] ? trace_hardirqs_on_thunk+0x3a/0x3f
I can resubmit my patch, but want to know if there is a reason that
http://lkml.org/lkml/2010/10/29/168 did not make it into either -mm
or linux-next?
d) the sys_ioprio_set() comment indicating that "we can't use
rcu_read_lock()" needs to be updated to be more clear. I'm not sure
what this should be updated to, which leads into the next
sub-topic...
e) possibly removing tasklist_lock, though there seems to be some
concern that this might introduce a task->io_context usage race. I
think Jens is going to address this issue.
--
Greg
On 11/11, Greg Thelen wrote:
>
> a) my original report added rcu_read_lock() to sys_ioprio_get() and
> claims that "something" is needed in sys_ioprio_set().
>
> c) http://lkml.org/lkml/2010/10/29/168 added rcu locks to both
> sys_ioprio_get() and sys_ioprio_set() thus addressing the issues
> raised in a). However, I do not see this patch in -mm.
Well, I do not know what happened with this patch, but
> I can resubmit my patch, but want to know if there is a reason that
> http://lkml.org/lkml/2010/10/29/168 did not make it into either -mm
> or linux-next?
I am looking at http://lkml.org/lkml/2010/10/29/168 now, and I think
it should be dropped, or you can submit your patch on top of it.
It only adds rcu_read_lock() around find_task_by_vpid(), but we can
use rcu_read_lock() instead of tasklist_lock.
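That is, the lookup side of sys_ioprio_set() would end up with roughly this
shape (sketch only; the IOPRIO_WHO_PGRP and IOPRIO_WHO_USER cases, and
sys_ioprio_get(), would be converted the same way):

        rcu_read_lock();
        switch (which) {
        case IOPRIO_WHO_PROCESS:
                if (!who)
                        p = current;
                else
                        p = find_task_by_vpid(who);
                if (p)
                        ret = set_task_ioprio(p, ioprio);
                break;
        /* IOPRIO_WHO_PGRP and IOPRIO_WHO_USER elided */
        }
        rcu_read_unlock();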
> d) the sys_ioprio_set() comment indicating that "we can't use
> rcu_read_lock()" needs to be updated to be more clear. I'm not sure
> what this should be updated to, which leads into the next
> sub-topic...
It should just be removed. It doesn't match reality today.
> e) possibly removing tasklist_lock,
Yes.
> though there seems to be some
> concern that this might introduce a task->io_context usage race.
No!
I am sorry for the confusion; those ->io_context races are completely
orthogonal to s/tasklist/rcu/.
Oleg.