2010-08-16 16:57:39

by Vegard Nossum

[permalink] [raw]
Subject: 2.6.35+vfs-scale: INFO: suspicious rcu_dereference_check() usage (kernel/exit.c:1387)

Hi,

I was testing your VFS scaling tree*, but I get the following in my dmesg:

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/exit.c:1387 invoked rcu_dereference_check() without protection!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 1
2 locks held by bash/2878:
#0: (tasklist_lock){.+.+..}, at: [<ffffffff8103a275>] do_wait+0xb3/0x200
#1: (&(&sighand->siglock)->rlock){-.....}, at: [<ffffffff810399e9>]
wait_consider_task+0x2ca/0xaa3

stack backtrace:
Pid: 2878, comm: bash Not tainted 2.6.35-00079-g4067b85 #7
Call Trace:
[<ffffffff8105e9f2>] lockdep_rcu_dereference+0x9b/0xa6
[<ffffffff8103a036>] wait_consider_task+0x917/0xaa3
[<ffffffff8102cf36>] ? add_preempt_count+0xc2/0xc9
[<ffffffff8103a29d>] do_wait+0xdb/0x200
[<ffffffff8103a45b>] sys_wait4+0x99/0xd3
[<ffffffff810387e5>] ? child_wait_callback+0x0/0x5d
[<ffffffff8109be66>] ? might_fault+0x96/0xbc
[<ffffffff810737a6>] compat_sys_wait4+0x8a/0xc1
[<ffffffff8109be83>] ? might_fault+0xb3/0xbc
[<ffffffff8109be38>] ? might_fault+0x68/0xbc
[<ffffffff815d02c3>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff8102862c>] sys32_waitpid+0xb/0xd
[<ffffffff81027bb4>] sysenter_dispatch+0x7/0x30
[<ffffffff815d0284>] ? trace_hardirqs_on_thunk+0x3a/0x3f

I don't think it's related to the VFS patches, however, so I added
some other (maybe relevant) CCs too.

At the time I was simply invoking gcc from bash.


Vegard

* git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin.git
branch vfs-scale


2010-08-16 17:01:11

by Vegard Nossum

[permalink] [raw]
Subject: Re: 2.6.35+vfs-scale: INFO: suspicious rcu_dereference_check() usage (kernel/exit.c:1387)

Nick Piggin might update his git config because the commits in the
vfs-scale tree appear to bear an invalid e-mail address. Please reply
to this e-mail instead. Thanks.

Vegard


---------- Forwarded message ----------
From: Vegard Nossum <[email protected]>
Date: 16 August 2010 18:57
Subject: 2.6.35+vfs-scale: INFO: suspicious rcu_dereference_check()
usage (kernel/exit.c:1387)
To: Nick Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>, David Howells
<[email protected]>, "Paul E. McKenney"
<[email protected]>, LKML <[email protected]>


Hi,

I was testing your VFS scaling tree*, but I get the following in my dmesg:

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/exit.c:1387 invoked rcu_dereference_check() without protection!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 1
2 locks held by bash/2878:
 #0:  (tasklist_lock){.+.+..}, at: [<ffffffff8103a275>] do_wait+0xb3/0x200
 #1:  (&(&sighand->siglock)->rlock){-.....}, at: [<ffffffff810399e9>]
wait_consider_task+0x2ca/0xaa3

stack backtrace:
Pid: 2878, comm: bash Not tainted 2.6.35-00079-g4067b85 #7
Call Trace:
 [<ffffffff8105e9f2>] lockdep_rcu_dereference+0x9b/0xa6
 [<ffffffff8103a036>] wait_consider_task+0x917/0xaa3
 [<ffffffff8102cf36>] ? add_preempt_count+0xc2/0xc9
 [<ffffffff8103a29d>] do_wait+0xdb/0x200
 [<ffffffff8103a45b>] sys_wait4+0x99/0xd3
 [<ffffffff810387e5>] ? child_wait_callback+0x0/0x5d
 [<ffffffff8109be66>] ? might_fault+0x96/0xbc
 [<ffffffff810737a6>] compat_sys_wait4+0x8a/0xc1
 [<ffffffff8109be83>] ? might_fault+0xb3/0xbc
 [<ffffffff8109be38>] ? might_fault+0x68/0xbc
 [<ffffffff815d02c3>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff8102862c>] sys32_waitpid+0xb/0xd
 [<ffffffff81027bb4>] sysenter_dispatch+0x7/0x30
 [<ffffffff815d0284>] ? trace_hardirqs_on_thunk+0x3a/0x3f

I don't think it's related to the VFS patches, however, so I added
some other (maybe relevant) CCs too.

At the time I was simply invoking gcc from bash.


Vegard

* git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin.git
branch vfs-scale

2010-08-16 17:03:42

by Vegard Nossum

[permalink] [raw]
Subject: Re: 2.6.35+vfs-scale: INFO: suspicious rcu_dereference_check() usage (kernel/exit.c:1387)

On the third try it will hopefully work.


Vegard

---------- Forwarded message ----------
From: Vegard Nossum <[email protected]>
Date: 16 August 2010 19:01
Subject: Re: 2.6.35+vfs-scale: INFO: suspicious
rcu_dereference_check() usage (kernel/exit.c:1387)
To: Nick Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>, David Howells
<[email protected]>, "Paul E. McKenney"
<[email protected]>, LKML <[email protected]>


Nick Piggin might update his git config because the commits in the
vfs-scale tree appear to bear an invalid e-mail address. Please reply
to this e-mail instead. Thanks.

Vegard


---------- Forwarded message ----------
From: Vegard Nossum <[email protected]>
Date: 16 August 2010 18:57
Subject: 2.6.35+vfs-scale: INFO: suspicious rcu_dereference_check()
usage (kernel/exit.c:1387)
To: Nick Piggin <[email protected]>
Cc: Peter Zijlstra <[email protected]>, David Howells
<[email protected]>, "Paul E. McKenney"
<[email protected]>, LKML <[email protected]>


Hi,

I was testing your VFS scaling tree*, but I get the following in my dmesg:

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/exit.c:1387 invoked rcu_dereference_check() without protection!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 1
2 locks held by bash/2878:
 #0:  (tasklist_lock){.+.+..}, at: [<ffffffff8103a275>] do_wait+0xb3/0x200
 #1:  (&(&sighand->siglock)->rlock){-.....}, at: [<ffffffff810399e9>]
wait_consider_task+0x2ca/0xaa3

stack backtrace:
Pid: 2878, comm: bash Not tainted 2.6.35-00079-g4067b85 #7
Call Trace:
 [<ffffffff8105e9f2>] lockdep_rcu_dereference+0x9b/0xa6
 [<ffffffff8103a036>] wait_consider_task+0x917/0xaa3
 [<ffffffff8102cf36>] ? add_preempt_count+0xc2/0xc9
 [<ffffffff8103a29d>] do_wait+0xdb/0x200
 [<ffffffff8103a45b>] sys_wait4+0x99/0xd3
 [<ffffffff810387e5>] ? child_wait_callback+0x0/0x5d
 [<ffffffff8109be66>] ? might_fault+0x96/0xbc
 [<ffffffff810737a6>] compat_sys_wait4+0x8a/0xc1
 [<ffffffff8109be83>] ? might_fault+0xb3/0xbc
 [<ffffffff8109be38>] ? might_fault+0x68/0xbc
 [<ffffffff815d02c3>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff8102862c>] sys32_waitpid+0xb/0xd
 [<ffffffff81027bb4>] sysenter_dispatch+0x7/0x30
 [<ffffffff815d0284>] ? trace_hardirqs_on_thunk+0x3a/0x3f

I don't think it's related to the VFS patches, however, so I added
some other (maybe relevant) CCs too.

At the time I was simply invoking gcc from bash.


Vegard

* git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin.git
branch vfs-scale

2010-08-16 20:01:36

by David Howells

[permalink] [raw]
Subject: Re: 2.6.35+vfs-scale: INFO: suspicious rcu_dereference_check() usage (kernel/exit.c:1387)


Vegard Nossum <[email protected]> wrote:

> ?#0: ?(tasklist_lock){.+.+..}, at: [<ffffffff8103a275>] do_wait+0xb3/0x200
> ?#1: ?(&(&sighand->siglock)->rlock){-.....}, at: [<ffffffff810399e9>]
> wait_consider_task+0x2ca/0xaa3
>
> stack backtrace:
> Pid: 2878, comm: bash Not tainted 2.6.35-00079-g4067b85 #7
> Call Trace:
> ?[<ffffffff8105e9f2>] lockdep_rcu_dereference+0x9b/0xa6
> ?[<ffffffff8103a036>] wait_consider_task+0x917/0xaa3

The attached patch should fix this.

David
---
From: Daniel J Blueman <[email protected]>
Subject: [PATCH] Fix unprotected access to task credentials in waitid()

Using a program like the following:

#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main() {
id_t id;
siginfo_t infop;
pid_t res;

id = fork();
if (id == 0) { sleep(1); exit(0); }
kill(id, SIGSTOP);
alarm(1);
waitid(P_PID, id, &infop, WCONTINUED);
return 0;
}

to call waitid() on a stopped process results in access to the child task's
credentials without the RCU read lock being held - which may be replaced in the
meantime - eliciting the following warning:

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/exit.c:1460 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 1
2 locks held by waitid02/22252:
#0: (tasklist_lock){.?.?..}, at: [<ffffffff81061ce5>] do_wait+0xc5/0x310
#1: (&(&sighand->siglock)->rlock){-.-...}, at: [<ffffffff810611da>]
wait_consider_task+0x19a/0xbe0

stack backtrace:
Pid: 22252, comm: waitid02 Not tainted 2.6.35-323cd+ #3
Call Trace:
[<ffffffff81095da4>] lockdep_rcu_dereference+0xa4/0xc0
[<ffffffff81061b31>] wait_consider_task+0xaf1/0xbe0
[<ffffffff81061d15>] do_wait+0xf5/0x310
[<ffffffff810620b6>] sys_waitid+0x86/0x1f0
[<ffffffff8105fce0>] ? child_wait_callback+0x0/0x70
[<ffffffff81003282>] system_call_fastpath+0x16/0x1b

This is fixed by holding the RCU read lock in wait_task_continued() to ensure
that the task's current credentials aren't destroyed between us reading the
cred pointer and us reading the UID from those credentials.

Furthermore, protect wait_task_stopped() in the same way.

We don't need to keep holding the RCU read lock once we've read the UID from
the credentials as holding the RCU read lock doesn't stop the target task from
changing its creds under us - so the credentials may be outdated immediately
after we've read the pointer, lock or no lock.

Signed-off-by: Daniel J Blueman <[email protected]>
Signed-off-by: David Howells <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
---

kernel/exit.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)


diff --git a/kernel/exit.c b/kernel/exit.c
index 671ed56..0312022 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1386,8 +1386,7 @@ static int wait_task_stopped(struct wait_opts *wo,
if (!unlikely(wo->wo_flags & WNOWAIT))
*p_code = 0;

- /* don't need the RCU readlock here as we're holding a spinlock */
- uid = __task_cred(p)->uid;
+ uid = task_uid(p);
unlock_sig:
spin_unlock_irq(&p->sighand->siglock);
if (!exit_code)
@@ -1460,7 +1459,7 @@ static int wait_task_continued(struct wait_opts *wo, struct task_struct *p)
}
if (!unlikely(wo->wo_flags & WNOWAIT))
p->signal->flags &= ~SIGNAL_STOP_CONTINUED;
- uid = __task_cred(p)->uid;
+ uid = task_uid(p);
spin_unlock_irq(&p->sighand->siglock);

pid = task_pid_vnr(p);

2010-08-16 20:21:29

by Paul E. McKenney

[permalink] [raw]
Subject: Re: 2.6.35+vfs-scale: INFO: suspicious rcu_dereference_check() usage (kernel/exit.c:1387)

On Mon, Aug 16, 2010 at 07:03:40PM +0200, Vegard Nossum wrote:
> On the third try it will hopefully work.
>
>
> Vegard
>
> ---------- Forwarded message ----------
> From: Vegard Nossum <[email protected]>
> Date: 16 August 2010 19:01
> Subject: Re: 2.6.35+vfs-scale: INFO: suspicious
> rcu_dereference_check() usage (kernel/exit.c:1387)
> To: Nick Piggin <[email protected]>
> Cc: Peter Zijlstra <[email protected]>, David Howells
> <[email protected]>, "Paul E. McKenney"
> <[email protected]>, LKML <[email protected]>
>
>
> Nick Piggin might update his git config because the commits in the
> vfs-scale tree appear to bear an invalid e-mail address. Please reply
> to this e-mail instead. Thanks.
>
> Vegard
>
>
> ---------- Forwarded message ----------
> From: Vegard Nossum <[email protected]>
> Date: 16 August 2010 18:57
> Subject: 2.6.35+vfs-scale: INFO: suspicious rcu_dereference_check()
> usage (kernel/exit.c:1387)
> To: Nick Piggin <[email protected]>
> Cc: Peter Zijlstra <[email protected]>, David Howells
> <[email protected]>, "Paul E. McKenney"
> <[email protected]>, LKML <[email protected]>
>
>
> Hi,
>
> I was testing your VFS scaling tree*, but I get the following in my dmesg:
>
> ===================================================
> [ INFO: suspicious rcu_dereference_check() usage. ]
> ---------------------------------------------------
> kernel/exit.c:1387 invoked rcu_dereference_check() without protection!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 1, debug_locks = 1
> 2 locks held by bash/2878:
> ?#0: ?(tasklist_lock){.+.+..}, at: [<ffffffff8103a275>] do_wait+0xb3/0x200
> ?#1: ?(&(&sighand->siglock)->rlock){-.....}, at: [<ffffffff810399e9>]
> wait_consider_task+0x2ca/0xaa3

Thank you for locating this!

OK, so we aren't holding rcu_read_lock(), nor is this task dead, so
something needs some help.

We get from wait_consider_task() as follows: wait_consider_task()
calls security_task_wait(), which calls either selinux_task_wait() or
smack_task_wait(). Let's start with selinux_task_wait(), which calls
task_has_perm(), which does a couple of __task_cred(), but does so
under rcu_read_lock() protection, so presumably we didn't come in
that way.

Dave, am I missing a code path somewhere here?

Thanx, Paul

> stack backtrace:
> Pid: 2878, comm: bash Not tainted 2.6.35-00079-g4067b85 #7
> Call Trace:
> ?[<ffffffff8105e9f2>] lockdep_rcu_dereference+0x9b/0xa6
> ?[<ffffffff8103a036>] wait_consider_task+0x917/0xaa3
> ?[<ffffffff8102cf36>] ? add_preempt_count+0xc2/0xc9
> ?[<ffffffff8103a29d>] do_wait+0xdb/0x200
> ?[<ffffffff8103a45b>] sys_wait4+0x99/0xd3
> ?[<ffffffff810387e5>] ? child_wait_callback+0x0/0x5d
> ?[<ffffffff8109be66>] ? might_fault+0x96/0xbc
> ?[<ffffffff810737a6>] compat_sys_wait4+0x8a/0xc1
> ?[<ffffffff8109be83>] ? might_fault+0xb3/0xbc
> ?[<ffffffff8109be38>] ? might_fault+0x68/0xbc
> ?[<ffffffff815d02c3>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> ?[<ffffffff8102862c>] sys32_waitpid+0xb/0xd
> ?[<ffffffff81027bb4>] sysenter_dispatch+0x7/0x30
> ?[<ffffffff815d0284>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>
> I don't think it's related to the VFS patches, however, so I added
> some other (maybe relevant) CCs too.
>
> At the time I was simply invoking gcc from bash.
>
>
> Vegard
>
> * git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin.git
> branch vfs-scale

2010-08-17 18:12:44

by Oleg Nesterov

[permalink] [raw]
Subject: Re: 2.6.35+vfs-scale: INFO: suspicious rcu_dereference_check() usage (kernel/exit.c:1387)

On 08/16, David Howells wrote:
>
> Vegard Nossum <[email protected]> wrote:
>
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -1386,8 +1386,7 @@ static int wait_task_stopped(struct wait_opts *wo,
> if (!unlikely(wo->wo_flags & WNOWAIT))
> *p_code = 0;
>
> - /* don't need the RCU readlock here as we're holding a spinlock */
> - uid = __task_cred(p)->uid;
> + uid = task_uid(p);
> unlock_sig:
> spin_unlock_irq(&p->sighand->siglock);
> if (!exit_code)
> @@ -1460,7 +1459,7 @@ static int wait_task_continued(struct wait_opts *wo, struct task_struct *p)
> }
> if (!unlikely(wo->wo_flags & WNOWAIT))
> p->signal->flags &= ~SIGNAL_STOP_CONTINUED;
> - uid = __task_cred(p)->uid;
> + uid = task_uid(p);
> spin_unlock_irq(&p->sighand->siglock);
>
> pid = task_pid_vnr(p);

Acked-by: Oleg Nesterov <[email protected]>


Vegard, perhaps you can take care of wait_task_zombie() too?
Of course, if you wish.

Like wait_task_stopped(), it doesn't really need RCU lock. But if
we change it to use task_uid() to shut up rcu_dereference_check(),
then we can kill task_is_dead().

Oleg.