2010-04-21 20:01:57

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/urgent] fix several lockdep splats, allow multiple splats

Hello!

This patchset contains four RCU lockdep splat fixes, courtesy of David
Howells, Peter Zijlstra, and Trond Myklebust, as well as an enhancement
by Lai Jiangshan that permits collecting more than one RCU lockdep splat
per boot.

Thanx, Paul

------------------------------------------------------------------------

b/fs/nfs/delegation.c | 42 ++++++++++++++++++++++++++++--------------
b/include/linux/rcupdate.h | 15 +++++++++++----
b/kernel/cgroup_freezer.c | 5 ++++-
b/kernel/lockdep.c | 2 --
b/kernel/sched.c | 10 ++++++++++
fs/nfs/delegation.c | 44 +++++++++++++++++++++++---------------------
6 files changed, 76 insertions(+), 42 deletions(-)


2010-04-21 20:02:24

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/urgent 3/5] rcu: leave lockdep enabled after RCU lockdep splat

From: Lai Jiangshan <[email protected]>

There is no need to disable lockdep after an RCU lockdep splat, so
remove the debug_lockdeps_off() from lockdep_rcu_dereference().
To avoid repeated lockdep splats, use a static variable in the
inlined rcu_dereference_check() and rcu_dereference_protected()
macros so that a given instance splats only once, but so that
multiple instances can be detected per boot.

Requested-by: Eric Paris <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
Tested-by: Eric Paris <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
include/linux/rcupdate.h | 15 +++++++++++----
kernel/lockdep.c | 2 --
2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 07db2fe..ec9ab49 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -190,6 +190,15 @@ static inline int rcu_read_lock_sched_held(void)

#ifdef CONFIG_PROVE_RCU

+#define __do_rcu_dereference_check(c) \
+ do { \
+ static bool __warned; \
+ if (debug_lockdep_rcu_enabled() && !__warned && !(c)) { \
+ __warned = true; \
+ lockdep_rcu_dereference(__FILE__, __LINE__); \
+ } \
+ } while (0)
+
/**
* rcu_dereference_check - rcu_dereference with debug checking
* @p: The pointer to read, prior to dereferencing
@@ -219,8 +228,7 @@ static inline int rcu_read_lock_sched_held(void)
*/
#define rcu_dereference_check(p, c) \
({ \
- if (debug_lockdep_rcu_enabled() && !(c)) \
- lockdep_rcu_dereference(__FILE__, __LINE__); \
+ __do_rcu_dereference_check(c); \
rcu_dereference_raw(p); \
})

@@ -237,8 +245,7 @@ static inline int rcu_read_lock_sched_held(void)
*/
#define rcu_dereference_protected(p, c) \
({ \
- if (debug_lockdep_rcu_enabled() && !(c)) \
- lockdep_rcu_dereference(__FILE__, __LINE__); \
+ __do_rcu_dereference_check(c); \
(p); \
})

diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 2594e1c..03dd1fa 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -3801,8 +3801,6 @@ void lockdep_rcu_dereference(const char *file, const int line)
{
struct task_struct *curr = current;

- if (!debug_locks_off())
- return;
printk("\n===================================================\n");
printk( "[ INFO: suspicious rcu_dereference_check() usage. ]\n");
printk( "---------------------------------------------------\n");
--
1.7.0

2010-04-21 20:02:28

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/urgent 1/5] rcu: Fix RCU lockdep splat in set_task_cpu on fork path

From: Peter Zijlstra <[email protected]>

Add an RCU read-side critical section to suppress this false positive.

Located-by: Eric Paris <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/sched.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 6af210a..14c44ec 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -323,6 +323,15 @@ static inline struct task_group *task_group(struct task_struct *p)
/* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
{
+ /*
+ * Strictly speaking this rcu_read_lock() is not needed since the
+ * task_group is tied to the cgroup, which in turn can never go away
+ * as long as there are tasks attached to it.
+ *
+ * However since task_group() uses task_subsys_state() which is an
+ * rcu_dereference() user, this quiets CONFIG_PROVE_RCU.
+ */
+ rcu_read_lock();
#ifdef CONFIG_FAIR_GROUP_SCHED
p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
p->se.parent = task_group(p)->se[cpu];
@@ -332,6 +341,7 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
p->rt.rt_rq = task_group(p)->rt_rq[cpu];
p->rt.parent = task_group(p)->rt_se[cpu];
#endif
+ rcu_read_unlock();
}

#else
--
1.7.0

2010-04-21 20:02:29

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/urgent 2/5] rcu: fix RCU lockdep splat on freezer_fork path

Add an RCU read-side critical section to suppress this false positive.

Located-by: Eric Paris <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Acked-by: Li Zefan <[email protected]>
---
kernel/cgroup_freezer.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index da5e139..e5c0244 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -205,9 +205,12 @@ static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
* No lock is needed, since the task isn't on tasklist yet,
* so it can't be moved to another cgroup, which means the
* freezer won't be removed and will be valid during this
- * function call.
+ * function call. Nevertheless, apply RCU read-side critical
+ * section to suppress RCU lockdep false positives.
*/
+ rcu_read_lock();
freezer = task_freezer(task);
+ rcu_read_unlock();

/*
* The root cgroup is non-freezable, so we can skip the
--
1.7.0

2010-04-21 20:02:22

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/urgent 5/5] NFS: Fix RCU issues in the NFSv4 delegation code

From: David Howells <[email protected]>

Fix a number of RCU issues in the NFSv4 delegation code.

(1) delegation->cred doesn't need to be RCU protected as it's essentially an
invariant refcounted structure.

By the time we get to nfs_free_delegation(), the delegation is being
released, so no one else should be attempting to use the saved
credentials, and they can be cleared.

However, since the list of delegations could still be under traversal at
this point by such as nfs_client_return_marked_delegations(), the cred
should be released in nfs_do_free_delegation() rather than in
nfs_free_delegation(). Simply using rcu_assign_pointer() to clear it is
insufficient as that doesn't stop the cred from being destroyed, and nor
does calling put_rpccred() after call_rcu(), given that the latter is
asynchronous.

(2) nfs_detach_delegation_locked() and nfs_inode_set_delegation() should use
rcu_derefence_protected() because they can only be called if
nfs_client::cl_lock is held, and that guards against anyone changing
nfsi->delegation under it. Furthermore, the barrier imposed by
rcu_dereference() is superfluous, given that the spin_lock() is also a
barrier.

(3) nfs_detach_delegation_locked() is now passed a pointer to the nfs_client
struct so that it can issue lockdep advice based on clp->cl_lock for (2).

(4) nfs_inode_return_delegation_noreclaim() and nfs_inode_return_delegation()
should use rcu_access_pointer() outside the spinlocked region as they
merely examine the pointer and don't follow it, thus rendering unnecessary
the need to impose a partial ordering over the one item of interest.

These result in an RCU warning like the following:

[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
fs/nfs/delegation.c:332 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by mount.nfs4/2281:
#0: (&type->s_umount_key#34){+.+...}, at: [<ffffffff810b25b4>] deactivate_super+0x60/0x80
#1: (iprune_sem){+.+...}, at: [<ffffffff810c332a>] invalidate_inodes+0x39/0x13a

stack backtrace:
Pid: 2281, comm: mount.nfs4 Not tainted 2.6.34-rc1-cachefs #110
Call Trace:
[<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
[<ffffffffa00b4591>] nfs_inode_return_delegation_noreclaim+0x5b/0xa0 [nfs]
[<ffffffffa0095d63>] nfs4_clear_inode+0x11/0x1e [nfs]
[<ffffffff810c2d92>] clear_inode+0x9e/0xf8
[<ffffffff810c3028>] dispose_list+0x67/0x10e
[<ffffffff810c340d>] invalidate_inodes+0x11c/0x13a
[<ffffffff810b1dc1>] generic_shutdown_super+0x42/0xf4
[<ffffffff810b1ebe>] kill_anon_super+0x11/0x4f
[<ffffffffa009893c>] nfs4_kill_super+0x3f/0x72 [nfs]
[<ffffffff810b25bc>] deactivate_super+0x68/0x80
[<ffffffff810c6744>] mntput_no_expire+0xbb/0xf8
[<ffffffff810c681b>] release_mounts+0x9a/0xb0
[<ffffffff810c689b>] put_mnt_ns+0x6a/0x79
[<ffffffffa00983a1>] nfs_follow_remote_path+0x5a/0x146 [nfs]
[<ffffffffa0098334>] ? nfs_do_root_mount+0x82/0x95 [nfs]
[<ffffffffa00985a9>] nfs4_try_mount+0x75/0xaf [nfs]
[<ffffffffa0098874>] nfs4_get_sb+0x291/0x31a [nfs]
[<ffffffff810b2059>] vfs_kern_mount+0xb8/0x177
[<ffffffff810b2176>] do_kern_mount+0x48/0xe8
[<ffffffff810c810b>] do_mount+0x782/0x7f9
[<ffffffff810c8205>] sys_mount+0x83/0xbe
[<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b

Also on:

fs/nfs/delegation.c:215 invoked rcu_dereference_check() without protection!
[<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
[<ffffffffa00b4223>] nfs_inode_set_delegation+0xfe/0x219 [nfs]
[<ffffffffa00a9c6f>] nfs4_opendata_to_nfs4_state+0x2c2/0x30d [nfs]
[<ffffffffa00aa15d>] nfs4_do_open+0x2a6/0x3a6 [nfs]
...

And:

fs/nfs/delegation.c:40 invoked rcu_dereference_check() without protection!
[<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
[<ffffffffa00b3bef>] nfs_free_delegation+0x3d/0x6e [nfs]
[<ffffffffa00b3e71>] nfs_do_return_delegation+0x26/0x30 [nfs]
[<ffffffffa00b406a>] __nfs_inode_return_delegation+0x1ef/0x1fe [nfs]
[<ffffffffa00b448a>] nfs_client_return_marked_delegations+0xc9/0x124 [nfs]
...

Signed-off-by: David Howells <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
fs/nfs/delegation.c | 44 +++++++++++++++++++++++---------------------
1 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index 8d9ec49..ea61d26 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -24,6 +24,8 @@

static void nfs_do_free_delegation(struct nfs_delegation *delegation)
{
+ if (delegation->cred)
+ put_rpccred(delegation->cred);
kfree(delegation);
}

@@ -36,13 +38,7 @@ static void nfs_free_delegation_callback(struct rcu_head *head)

static void nfs_free_delegation(struct nfs_delegation *delegation)
{
- struct rpc_cred *cred;
-
- cred = rcu_dereference(delegation->cred);
- rcu_assign_pointer(delegation->cred, NULL);
call_rcu(&delegation->rcu, nfs_free_delegation_callback);
- if (cred)
- put_rpccred(cred);
}

void nfs_mark_delegation_referenced(struct nfs_delegation *delegation)
@@ -180,9 +176,13 @@ static struct inode *nfs_delegation_grab_inode(struct nfs_delegation *delegation
return inode;
}

-static struct nfs_delegation *nfs_detach_delegation_locked(struct nfs_inode *nfsi, const nfs4_stateid *stateid)
+static struct nfs_delegation *nfs_detach_delegation_locked(struct nfs_inode *nfsi,
+ const nfs4_stateid *stateid,
+ struct nfs_client *clp)
{
- struct nfs_delegation *delegation = rcu_dereference(nfsi->delegation);
+ struct nfs_delegation *delegation =
+ rcu_dereference_protected(nfsi->delegation,
+ lockdep_is_held(&clp->cl_lock));

if (delegation == NULL)
goto nomatch;
@@ -209,7 +209,7 @@ int nfs_inode_set_delegation(struct inode *inode, struct rpc_cred *cred, struct
{
struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
struct nfs_inode *nfsi = NFS_I(inode);
- struct nfs_delegation *delegation;
+ struct nfs_delegation *delegation, *old_delegation;
struct nfs_delegation *freeme = NULL;
int status = 0;

@@ -227,10 +227,12 @@ int nfs_inode_set_delegation(struct inode *inode, struct rpc_cred *cred, struct
spin_lock_init(&delegation->lock);

spin_lock(&clp->cl_lock);
- if (rcu_dereference(nfsi->delegation) != NULL) {
- if (memcmp(&delegation->stateid, &nfsi->delegation->stateid,
- sizeof(delegation->stateid)) == 0 &&
- delegation->type == nfsi->delegation->type) {
+ old_delegation = rcu_dereference_protected(nfsi->delegation,
+ lockdep_is_held(&clp->cl_lock));
+ if (old_delegation != NULL) {
+ if (memcmp(&delegation->stateid, &old_delegation->stateid,
+ sizeof(old_delegation->stateid)) == 0 &&
+ delegation->type == old_delegation->type) {
goto out;
}
/*
@@ -240,12 +242,12 @@ int nfs_inode_set_delegation(struct inode *inode, struct rpc_cred *cred, struct
dfprintk(FILE, "%s: server %s handed out "
"a duplicate delegation!\n",
__func__, clp->cl_hostname);
- if (delegation->type <= nfsi->delegation->type) {
+ if (delegation->type <= old_delegation->type) {
freeme = delegation;
delegation = NULL;
goto out;
}
- freeme = nfs_detach_delegation_locked(nfsi, NULL);
+ freeme = nfs_detach_delegation_locked(nfsi, NULL, clp);
}
list_add_rcu(&delegation->super_list, &clp->cl_delegations);
nfsi->delegation_state = delegation->type;
@@ -315,7 +317,7 @@ restart:
if (inode == NULL)
continue;
spin_lock(&clp->cl_lock);
- delegation = nfs_detach_delegation_locked(NFS_I(inode), NULL);
+ delegation = nfs_detach_delegation_locked(NFS_I(inode), NULL, clp);
spin_unlock(&clp->cl_lock);
rcu_read_unlock();
if (delegation != NULL) {
@@ -344,9 +346,9 @@ void nfs_inode_return_delegation_noreclaim(struct inode *inode)
struct nfs_inode *nfsi = NFS_I(inode);
struct nfs_delegation *delegation;

- if (rcu_dereference(nfsi->delegation) != NULL) {
+ if (rcu_access_pointer(nfsi->delegation) != NULL) {
spin_lock(&clp->cl_lock);
- delegation = nfs_detach_delegation_locked(nfsi, NULL);
+ delegation = nfs_detach_delegation_locked(nfsi, NULL, clp);
spin_unlock(&clp->cl_lock);
if (delegation != NULL)
nfs_do_return_delegation(inode, delegation, 0);
@@ -360,9 +362,9 @@ int nfs_inode_return_delegation(struct inode *inode)
struct nfs_delegation *delegation;
int err = 0;

- if (rcu_dereference(nfsi->delegation) != NULL) {
+ if (rcu_access_pointer(nfsi->delegation) != NULL) {
spin_lock(&clp->cl_lock);
- delegation = nfs_detach_delegation_locked(nfsi, NULL);
+ delegation = nfs_detach_delegation_locked(nfsi, NULL, clp);
spin_unlock(&clp->cl_lock);
if (delegation != NULL) {
nfs_msync_inode(inode);
@@ -540,7 +542,7 @@ restart:
if (inode == NULL)
continue;
spin_lock(&clp->cl_lock);
- delegation = nfs_detach_delegation_locked(NFS_I(inode), NULL);
+ delegation = nfs_detach_delegation_locked(NFS_I(inode), NULL, clp);
spin_unlock(&clp->cl_lock);
rcu_read_unlock();
if (delegation != NULL)
--
1.7.0

2010-04-21 20:03:19

by Paul E. McKenney

[permalink] [raw]
Subject: [PATCH tip/core/urgent 4/5] NFSv4: Fix the locking in nfs_inode_reclaim_delegation()

From: Trond Myklebust <[email protected]>

Ensure that we correctly rcu-dereference the delegation itself, and that we
protect against removal while we're changing the contents.

Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: David Howells <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
fs/nfs/delegation.c | 42 ++++++++++++++++++++++++++++--------------
1 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index 1567124..8d9ec49 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -129,21 +129,35 @@ again:
*/
void nfs_inode_reclaim_delegation(struct inode *inode, struct rpc_cred *cred, struct nfs_openres *res)
{
- struct nfs_delegation *delegation = NFS_I(inode)->delegation;
- struct rpc_cred *oldcred;
+ struct nfs_delegation *delegation;
+ struct rpc_cred *oldcred = NULL;

- if (delegation == NULL)
- return;
- memcpy(delegation->stateid.data, res->delegation.data,
- sizeof(delegation->stateid.data));
- delegation->type = res->delegation_type;
- delegation->maxsize = res->maxsize;
- oldcred = delegation->cred;
- delegation->cred = get_rpccred(cred);
- clear_bit(NFS_DELEGATION_NEED_RECLAIM, &delegation->flags);
- NFS_I(inode)->delegation_state = delegation->type;
- smp_wmb();
- put_rpccred(oldcred);
+ rcu_read_lock();
+ delegation = rcu_dereference(NFS_I(inode)->delegation);
+ if (delegation != NULL) {
+ spin_lock(&delegation->lock);
+ if (delegation->inode != NULL) {
+ memcpy(delegation->stateid.data, res->delegation.data,
+ sizeof(delegation->stateid.data));
+ delegation->type = res->delegation_type;
+ delegation->maxsize = res->maxsize;
+ oldcred = delegation->cred;
+ delegation->cred = get_rpccred(cred);
+ clear_bit(NFS_DELEGATION_NEED_RECLAIM,
+ &delegation->flags);
+ NFS_I(inode)->delegation_state = delegation->type;
+ spin_unlock(&delegation->lock);
+ put_rpccred(oldcred);
+ rcu_read_unlock();
+ } else {
+ /* We appear to have raced with a delegation return. */
+ spin_unlock(&delegation->lock);
+ rcu_read_unlock();
+ nfs_inode_set_delegation(inode, cred, res);
+ }
+ } else {
+ rcu_read_unlock();
+ }
}

static int nfs_do_return_delegation(struct inode *inode, struct nfs_delegation *delegation, int issync)
--
1.7.0

2010-04-30 16:58:48

by Paul E. McKenney

[permalink] [raw]
Subject: [tip:core/urgent] rcu: Fix RCU lockdep splat on freezer_fork path

Commit-ID: 8b46f880841aac821af8efa6581bb0e46b8b9845
Gitweb: http://git.kernel.org/tip/8b46f880841aac821af8efa6581bb0e46b8b9845
Author: Paul E. McKenney <[email protected]>
AuthorDate: Wed, 21 Apr 2010 13:02:08 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 30 Apr 2010 12:03:17 +0200

rcu: Fix RCU lockdep splat on freezer_fork path

Add an RCU read-side critical section to suppress this false
positive.

Located-by: Eric Paris <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Acked-by: Li Zefan <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/cgroup_freezer.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c
index da5e139..e5c0244 100644
--- a/kernel/cgroup_freezer.c
+++ b/kernel/cgroup_freezer.c
@@ -205,9 +205,12 @@ static void freezer_fork(struct cgroup_subsys *ss, struct task_struct *task)
* No lock is needed, since the task isn't on tasklist yet,
* so it can't be moved to another cgroup, which means the
* freezer won't be removed and will be valid during this
- * function call.
+ * function call. Nevertheless, apply RCU read-side critical
+ * section to suppress RCU lockdep false positives.
*/
+ rcu_read_lock();
freezer = task_freezer(task);
+ rcu_read_unlock();

/*
* The root cgroup is non-freezable, so we can skip the

2010-04-30 17:03:57

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH tip/core/urgent] fix several lockdep splats, allow multiple splats

* Ingo Molnar ([email protected]) wrote:
>
> * Paul E. McKenney <[email protected]> wrote:
>
> > Hello!
> >
> > This patchset contains four RCU lockdep splat fixes, courtesy of David
> > Howells, Peter Zijlstra, and Trond Myklebust, [...]
>
> I've applied #1 and #2 - but shouldnt #4 and #5 go via the NFS tree?
>
> > [...] as well as an enhancement by Lai Jiangshan that permits collecting
> > more than one RCU lockdep splat per boot.
>
> Hm, this #3 patch i disagree with quite fundamentally: one of the big virtues
> of lockdep is that it complains only once and then shuts up and lets the
> system work. It allows distro debug kernels to have lockdep enabled, etc.
>
> One bugreport per bootup per user is the most we can expect really. Not
> disabling it risks getting a stream of repeat messages, annoyed testers and
> gives us _less_ bugreports in the end.
>
> Also, often the _first_ warning is the most reliable one - sometimes there's
> interactions, and the first bug causing a second warning as well, etc. So
> reporting just the highest-quality (i.e. first) issue we detect is the best
> approach.

I recommend creating a kernel command line parameter that would tweak
the number of messages printed by lockdep. The default would indeed by 1
message, but people in a debugging marathon can specify a larger value
so they won't have to reboot between each individual lockdep error.

Thanks,

Mathieu

>
> Thanks,
>
> Ingo

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-04-30 18:03:22

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/urgent] fix several lockdep splats, allow multiple splats

On Fri, Apr 30, 2010 at 12:16:45PM -0400, [email protected] wrote:
> On Fri, 30 Apr 2010 11:33:34 EDT, Mathieu Desnoyers said:
>
> > I recommend creating a kernel command line parameter that would tweak
> > the number of messages printed by lockdep. The default would indeed by 1
> > message, but people in a debugging marathon can specify a larger value
> > so they won't have to reboot between each individual lockdep error.
>
> Yeah, that would rock for development kernels - playing whack-a-mole with
> a half-dozen new lockdep whinges can easily stretch out for quite some time.

The RCU-lockdep splats are a bit different in nature than the
deadlock-related splats that lockdep normally prints. The RCU-lockdep
splats are transient in nature, and it is easy to apply WARN_ON_ONCE().
In contrast, if you permit multiple deadlock-related lockdep splats,
you tend to get lots of warnings about the same deadlock cycle.

So how about an additional kernel configuration variable, default
disabled, perhaps named CONFIG_PROVE_RCU_MULTIPLE, that allows a
single boot to see multiple messages? Unlike the dyntick-idle
WARN_ON()s that generated multi-gigabyte console logs in a great
hurry, I haven't yet seen excessive quantities of RCU-lockdep splats,
so I don't see the need for an integer limit.

Thoughts?

Thanx, Paul

2010-04-30 18:12:27

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH tip/core/urgent] fix several lockdep splats, allow multiple splats

* Paul E. McKenney ([email protected]) wrote:
> On Fri, Apr 30, 2010 at 12:16:45PM -0400, [email protected] wrote:
> > On Fri, 30 Apr 2010 11:33:34 EDT, Mathieu Desnoyers said:
> >
> > > I recommend creating a kernel command line parameter that would tweak
> > > the number of messages printed by lockdep. The default would indeed by 1
> > > message, but people in a debugging marathon can specify a larger value
> > > so they won't have to reboot between each individual lockdep error.
> >
> > Yeah, that would rock for development kernels - playing whack-a-mole with
> > a half-dozen new lockdep whinges can easily stretch out for quite some time.
>
> The RCU-lockdep splats are a bit different in nature than the
> deadlock-related splats that lockdep normally prints. The RCU-lockdep
> splats are transient in nature, and it is easy to apply WARN_ON_ONCE().
> In contrast, if you permit multiple deadlock-related lockdep splats,
> you tend to get lots of warnings about the same deadlock cycle.
>
> So how about an additional kernel configuration variable, default
> disabled, perhaps named CONFIG_PROVE_RCU_MULTIPLE, that allows a
> single boot to see multiple messages? Unlike the dyntick-idle
> WARN_ON()s that generated multi-gigabyte console logs in a great
> hurry, I haven't yet seen excessive quantities of RCU-lockdep splats,
> so I don't see the need for an integer limit.
>
> Thoughts?

Ideally we don't want to flood the console with thousands of instances
of the same RCU-lockdep splat (think of a missing read lock on a common
code path). Therefore I think keeping an integer limit is relevant here.
I agree that this integer limit could be selected by a CONFIG_ option
rather than by a kernel parameter, as it will typically only be used on
development kernels with "kernel hacking" enabled anyway. There is not
much point in bloating the kernel code with an extra debug-only kernel
parameter parsing.

Thanks,

Mathieu

>
> Thanx, Paul

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-04-30 18:32:42

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/urgent] fix several lockdep splats, allow multiple splats

On Fri, Apr 30, 2010 at 02:12:01PM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney ([email protected]) wrote:
> > On Fri, Apr 30, 2010 at 12:16:45PM -0400, [email protected] wrote:
> > > On Fri, 30 Apr 2010 11:33:34 EDT, Mathieu Desnoyers said:
> > >
> > > > I recommend creating a kernel command line parameter that would tweak
> > > > the number of messages printed by lockdep. The default would indeed by 1
> > > > message, but people in a debugging marathon can specify a larger value
> > > > so they won't have to reboot between each individual lockdep error.
> > >
> > > Yeah, that would rock for development kernels - playing whack-a-mole with
> > > a half-dozen new lockdep whinges can easily stretch out for quite some time.
> >
> > The RCU-lockdep splats are a bit different in nature than the
> > deadlock-related splats that lockdep normally prints. The RCU-lockdep
> > splats are transient in nature, and it is easy to apply WARN_ON_ONCE().
> > In contrast, if you permit multiple deadlock-related lockdep splats,
> > you tend to get lots of warnings about the same deadlock cycle.
> >
> > So how about an additional kernel configuration variable, default
> > disabled, perhaps named CONFIG_PROVE_RCU_MULTIPLE, that allows a
> > single boot to see multiple messages? Unlike the dyntick-idle
> > WARN_ON()s that generated multi-gigabyte console logs in a great
> > hurry, I haven't yet seen excessive quantities of RCU-lockdep splats,
> > so I don't see the need for an integer limit.
> >
> > Thoughts?
>
> Ideally we don't want to flood the console with thousands of instances
> of the same RCU-lockdep splat (think of a missing read lock on a common
> code path). Therefore I think keeping an integer limit is relevant here.
> I agree that this integer limit could be selected by a CONFIG_ option
> rather than by a kernel parameter, as it will typically only be used on
> development kernels with "kernel hacking" enabled anyway. There is not
> much point in bloating the kernel code with an extra debug-only kernel
> parameter parsing.

We already limit via WARN_ON_ONCE(), and there are fewer than 500 lines
of code in the kernel that can give RCU lockdep splats, so I really believe
that we are OK without an overall limit for the foreseeable future.

Thanx, Paul

2010-04-30 18:33:40

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH tip/core/urgent] fix several lockdep splats, allow multiple splats


* Paul E. McKenney <[email protected]> wrote:

> Hello!
>
> This patchset contains four RCU lockdep splat fixes, courtesy of David
> Howells, Peter Zijlstra, and Trond Myklebust, [...]

I've applied #1 and #2 - but shouldnt #4 and #5 go via the NFS tree?

> [...] as well as an enhancement by Lai Jiangshan that permits collecting
> more than one RCU lockdep splat per boot.

Hm, this #3 patch i disagree with quite fundamentally: one of the big virtues
of lockdep is that it complains only once and then shuts up and lets the
system work. It allows distro debug kernels to have lockdep enabled, etc.

One bugreport per bootup per user is the most we can expect really. Not
disabling it risks getting a stream of repeat messages, annoyed testers and
gives us _less_ bugreports in the end.

Also, often the _first_ warning is the most reliable one - sometimes there's
interactions, and the first bug causing a second warning as well, etc. So
reporting just the highest-quality (i.e. first) issue we detect is the best
approach.

Thanks,

Ingo

2010-04-30 19:09:33

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH tip/core/urgent] fix several lockdep splats, allow multiple splats

* Paul E. McKenney ([email protected]) wrote:
> On Fri, Apr 30, 2010 at 02:12:01PM -0400, Mathieu Desnoyers wrote:
> > * Paul E. McKenney ([email protected]) wrote:
> > > On Fri, Apr 30, 2010 at 12:16:45PM -0400, [email protected] wrote:
> > > > On Fri, 30 Apr 2010 11:33:34 EDT, Mathieu Desnoyers said:
> > > >
> > > > > I recommend creating a kernel command line parameter that would tweak
> > > > > the number of messages printed by lockdep. The default would indeed by 1
> > > > > message, but people in a debugging marathon can specify a larger value
> > > > > so they won't have to reboot between each individual lockdep error.
> > > >
> > > > Yeah, that would rock for development kernels - playing whack-a-mole with
> > > > a half-dozen new lockdep whinges can easily stretch out for quite some time.
> > >
> > > The RCU-lockdep splats are a bit different in nature than the
> > > deadlock-related splats that lockdep normally prints. The RCU-lockdep
> > > splats are transient in nature, and it is easy to apply WARN_ON_ONCE().
> > > In contrast, if you permit multiple deadlock-related lockdep splats,
> > > you tend to get lots of warnings about the same deadlock cycle.
> > >
> > > So how about an additional kernel configuration variable, default
> > > disabled, perhaps named CONFIG_PROVE_RCU_MULTIPLE, that allows a
> > > single boot to see multiple messages? Unlike the dyntick-idle
> > > WARN_ON()s that generated multi-gigabyte console logs in a great
> > > hurry, I haven't yet seen excessive quantities of RCU-lockdep splats,
> > > so I don't see the need for an integer limit.
> > >
> > > Thoughts?
> >
> > Ideally we don't want to flood the console with thousands of instances
> > of the same RCU-lockdep splat (think of a missing read lock on a common
> > code path). Therefore I think keeping an integer limit is relevant here.
> > I agree that this integer limit could be selected by a CONFIG_ option
> > rather than by a kernel parameter, as it will typically only be used on
> > development kernels with "kernel hacking" enabled anyway. There is not
> > much point in bloating the kernel code with an extra debug-only kernel
> > parameter parsing.
>
> We already limit via WARN_ON_ONCE(), and there are fewer than 500 lines
> of code in the kernel that can give RCU lockdep splats, so I really believe
> that we are OK without an overall limit for the foreseeable future.

Your idea makes sense then.

Thanks,

Mathieu

>
> Thanx, Paul

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-04-30 16:59:34

by Peter Zijlstra

[permalink] [raw]
Subject: [tip:core/urgent] rcu: Fix RCU lockdep splat in set_task_cpu on fork path

Commit-ID: 8b08ca52f5942c21564bbb90ccfb61053f2c26a1
Gitweb: http://git.kernel.org/tip/8b08ca52f5942c21564bbb90ccfb61053f2c26a1
Author: Peter Zijlstra <[email protected]>
AuthorDate: Wed, 21 Apr 2010 13:02:07 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 30 Apr 2010 12:03:17 +0200

rcu: Fix RCU lockdep splat in set_task_cpu on fork path

Add an RCU read-side critical section to suppress this false
positive.

Located-by: Eric Paris <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index de0bd26..3c2a54f 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -323,6 +323,15 @@ static inline struct task_group *task_group(struct task_struct *p)
/* Change a task's cfs_rq and parent entity if it moves across CPUs/groups */
static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
{
+ /*
+ * Strictly speaking this rcu_read_lock() is not needed since the
+ * task_group is tied to the cgroup, which in turn can never go away
+ * as long as there are tasks attached to it.
+ *
+ * However since task_group() uses task_subsys_state() which is an
+ * rcu_dereference() user, this quiets CONFIG_PROVE_RCU.
+ */
+ rcu_read_lock();
#ifdef CONFIG_FAIR_GROUP_SCHED
p->se.cfs_rq = task_group(p)->cfs_rq[cpu];
p->se.parent = task_group(p)->se[cpu];
@@ -332,6 +341,7 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
p->rt.rt_rq = task_group(p)->rt_rq[cpu];
p->rt.parent = task_group(p)->rt_se[cpu];
#endif
+ rcu_read_unlock();
}

#else

2010-04-30 19:41:31

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: [PATCH tip/core/urgent] fix several lockdep splats, allow multiple splats

On Fri, 30 Apr 2010 11:33:34 EDT, Mathieu Desnoyers said:

> I recommend creating a kernel command line parameter that would tweak
> the number of messages printed by lockdep. The default would indeed by 1
> message, but people in a debugging marathon can specify a larger value
> so they won't have to reboot between each individual lockdep error.

Yeah, that would rock for development kernels - playing whack-a-mole with
a half-dozen new lockdep whinges can easily stretch out for quite some time.


Attachments:
(No filename) (227.00 B)

2010-04-30 23:47:45

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH tip/core/urgent] fix several lockdep splats, allow multiple splats

On Fri, Apr 30, 2010 at 12:07:49PM +0200, Ingo Molnar wrote:
>
> * Paul E. McKenney <[email protected]> wrote:
>
> > Hello!
> >
> > This patchset contains four RCU lockdep splat fixes, courtesy of David
> > Howells, Peter Zijlstra, and Trond Myklebust, [...]
>
> I've applied #1 and #2 - but shouldnt #4 and #5 go via the NFS tree?

Good point -- I will forward them on to Trond.

Thanx, Paul