2010-01-29 07:04:17

by Cong Wang

[permalink] [raw]
Subject: [Patch 0/2] sysfs: fix s_active lockdep warning


Recently we met a lockdep warning from sysfs during s2ram or cpu hotplug.
As reported by several people, it is something like:

[ 6967.926563] ACPI: Preparing to enter system sleep state S3
[ 6967.956156] Disabling non-boot CPUs ...
[ 6967.970401]
[ 6967.970408] =============================================
[ 6967.970419] [ INFO: possible recursive locking detected ]
[ 6967.970431] 2.6.33-rc2-git6 #27
[ 6967.970439] ---------------------------------------------
[ 6967.970450] pm-suspend/22147 is trying to acquire lock:
[ 6967.970460] (s_active){++++.+}, at: [<c10d2941>]
sysfs_hash_and_remove+0x3d/0x4f
[ 6967.970493]
[ 6967.970497] but task is already holding lock:
[ 6967.970506] (s_active){++++.+}, at: [<c10d4110>]
sysfs_get_active_two+0x16/0x36
[...]

Eric already provides a patch for this[1], but it still can't fix the
problem. I add the missing part of Eric's patch and send these two patches
together, hopefully we can fix the warning completely.

1. http://lkml.org/lkml/2010/1/10/282


Reported-by: Benjamin Herrenschmidt <[email protected]>
Reported-by: Larry Finger <[email protected]>
Reported-by: Miles Lane <[email protected]>
Reported-by: Heiko Carstens <[email protected]>
Signed-off-by: WANG Cong <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>


2010-01-29 07:04:27

by Cong Wang

[permalink] [raw]
Subject: [Patch 1/2] sysfs: add support for lockdep subclasses to s_active

From: Eric W. Biederman <[email protected]>
Date: Fri, 29 Jan 2010 14:03:39 +0800
Subject: [PATCH 1/2] sysfs: add support for lockdep subclasses to s_active

We have apparently valid cases where the code for a sysfs attribute
removes other sysfs attributes. Without support for subclasses
lockdep flags a possible recursive lock problem as it figures
the first sysfs attribute could be attempting to remove itself.

By adding support for sysfs subclasses we can teach lockdep to
distinguish between different types of sysfs attributes and not
get confused.

Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: WANG Cong <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>

---
fs/sysfs/dir.c | 14 ++++++++++++--
include/linux/sysfs.h | 7 +++++++
kernel/power/power.h | 15 ++++++++-------
3 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 699f371..e0cd4a0 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -95,9 +95,14 @@ static void sysfs_unlink_sibling(struct sysfs_dirent *sd)
*/
static struct sysfs_dirent *sysfs_get_active(struct sysfs_dirent *sd)
{
+ int subclass;
if (unlikely(!sd))
return NULL;

+ subclass = SYSFS_ATTR_NORMAL;
+ if (sysfs_type(sd) == SYSFS_KOBJ_ATTR)
+ subclass = sd->s_attr.attr->subclass;
+
while (1) {
int v, t;

@@ -107,7 +112,7 @@ static struct sysfs_dirent *sysfs_get_active(struct sysfs_dirent *sd)

t = atomic_cmpxchg(&sd->s_active, v, v + 1);
if (likely(t == v)) {
- rwsem_acquire_read(&sd->dep_map, 0, 1, _RET_IP_);
+ rwsem_acquire_read(&sd->dep_map, subclass, 1, _RET_IP_);
return sd;
}
if (t < 0)
@@ -192,12 +197,17 @@ void sysfs_put_active_two(struct sysfs_dirent *sd)
static void sysfs_deactivate(struct sysfs_dirent *sd)
{
DECLARE_COMPLETION_ONSTACK(wait);
+ int subclass;
int v;

BUG_ON(sd->s_sibling || !(sd->s_flags & SYSFS_FLAG_REMOVED));
sd->s_sibling = (void *)&wait;

- rwsem_acquire(&sd->dep_map, 0, 0, _RET_IP_);
+ subclass = SYSFS_ATTR_NORMAL;
+ if (sysfs_type(sd) == SYSFS_KOBJ_ATTR)
+ subclass = sd->s_attr.attr->subclass;
+
+ rwsem_acquire(&sd->dep_map, subclass, 0, _RET_IP_);
/* atomic_add_return() is a mb(), put_active() will always see
* the updated sd->s_sibling.
*/
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index cfa8308..2f50fec 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -20,6 +20,12 @@
struct kobject;
struct module;

+enum sysfs_attr_lock_class
+{
+ SYSFS_ATTR_NORMAL,
+ SYSFS_ATTR_PM_CONTROL,
+};
+
/* FIXME
* The *owner field is no longer used.
* x86 tree has been cleaned up. The owner
@@ -29,6 +35,7 @@ struct attribute {
const char *name;
struct module *owner;
mode_t mode;
+ enum sysfs_attr_lock_class subclass;
};

struct attribute_group {
diff --git a/kernel/power/power.h b/kernel/power/power.h
index 46c5a26..0459f27 100644
--- a/kernel/power/power.h
+++ b/kernel/power/power.h
@@ -54,13 +54,14 @@ extern int hibernation_platform_enter(void);
extern int pfn_is_nosave(unsigned long);

#define power_attr(_name) \
-static struct kobj_attribute _name##_attr = { \
- .attr = { \
- .name = __stringify(_name), \
- .mode = 0644, \
- }, \
- .show = _name##_show, \
- .store = _name##_store, \
+static struct kobj_attribute _name##_attr = { \
+ .attr = { \
+ .name = __stringify(_name), \
+ .mode = 0644, \
+ .subclass = SYSFS_ATTR_PM_CONTROL, \
+ }, \
+ .show = _name##_show, \
+ .store = _name##_store, \
}

/* Preferred image size in bytes (default 500 MB) */
--
1.5.5.6

2010-01-29 07:04:38

by Cong Wang

[permalink] [raw]
Subject: [PATCH 2/2] sysfs: fix the incomplete part of subclass support for s_active

From: Amerigo Wang <[email protected]>
Date: Fri, 29 Jan 2010 14:18:58 +0800
Subject: [PATCH 2/2] sysfs: fix the incomplete part of subclass support for s_active

Patch 1/2 missed the initialization part for subclass, which I think
is the reason why it still can't stop the lockdep warning. This depends
on patch 1/2.

Compile test only.

Signed-off-by: WANG Cong <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>

---
fs/sysfs/dir.c | 1 -
fs/sysfs/file.c | 2 ++
fs/sysfs/sysfs.h | 6 +++---
3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index e0cd4a0..6b63198 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -364,7 +364,6 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)

atomic_set(&sd->s_count, 1);
atomic_set(&sd->s_active, 0);
- sysfs_dirent_init_lockdep(sd);

sd->s_name = name;
sd->s_mode = mode;
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index dc30d9e..abf30d7 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -509,6 +509,8 @@ int sysfs_add_file_mode(struct sysfs_dirent *dir_sd,
if (!sd)
return -ENOMEM;
sd->s_attr.attr = (void *)attr;
+ if ((type & SYSFS_TYPE_MASK) == SYSFS_KOBJ_ATTR)
+ sysfs_dirent_init_lockdep(sd, sd->s_attr.attr->subclass);

sysfs_addrm_start(&acxt, dir_sd);
rc = sysfs_add_one(&acxt, sd);
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index cdd9377..7e78f2e 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -89,14 +89,14 @@ static inline unsigned int sysfs_type(struct sysfs_dirent *sd)
}

#ifdef CONFIG_DEBUG_LOCK_ALLOC
-#define sysfs_dirent_init_lockdep(sd) \
+#define sysfs_dirent_init_lockdep(sd, c) \
do { \
static struct lock_class_key __key; \
\
- lockdep_init_map(&sd->dep_map, "s_active", &__key, 0); \
+ lockdep_init_map(&sd->dep_map, "s_active", &__key, c); \
} while(0)
#else
-#define sysfs_dirent_init_lockdep(sd) do {} while(0)
+#define sysfs_dirent_init_lockdep(sd, c) do {} while(0)
#endif

/*
--
1.5.5.6

2010-01-29 07:22:04

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

Amerigo Wang <[email protected]> writes:

> Recently we met a lockdep warning from sysfs during s2ram or cpu hotplug.
> As reported by several people, it is something like:
>
> [ 6967.926563] ACPI: Preparing to enter system sleep state S3
> [ 6967.956156] Disabling non-boot CPUs ...
> [ 6967.970401]
> [ 6967.970408] =============================================
> [ 6967.970419] [ INFO: possible recursive locking detected ]
> [ 6967.970431] 2.6.33-rc2-git6 #27
> [ 6967.970439] ---------------------------------------------
> [ 6967.970450] pm-suspend/22147 is trying to acquire lock:
> [ 6967.970460] (s_active){++++.+}, at: [<c10d2941>]
> sysfs_hash_and_remove+0x3d/0x4f
> [ 6967.970493]
> [ 6967.970497] but task is already holding lock:
> [ 6967.970506] (s_active){++++.+}, at: [<c10d4110>]
> sysfs_get_active_two+0x16/0x36
> [...]
>
> Eric already provides a patch for this[1], but it still can't fix the
> problem. I add the missing part of Eric's patch and send these two patches
> together, hopefully we can fix the warning completely.
>
> 1. http://lkml.org/lkml/2010/1/10/282
>
>
> Reported-by: Benjamin Herrenschmidt <[email protected]>
> Reported-by: Larry Finger <[email protected]>
> Reported-by: Miles Lane <[email protected]>
> Reported-by: Heiko Carstens <[email protected]>
> Signed-off-by: WANG Cong <[email protected]>
> Signed-off-by: Eric W. Biederman <[email protected]>
> Cc: Tejun Heo <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>

Thanks for following up on this.

I suspect we may want to create a separate class for each sysfs file
instead of playing whack-a-mole and creating a subclass each time we
have problems.

I don't see why the rules for one sysfs file should be the same as for
any other sysfs file.

Eric

2010-01-29 08:36:04

by Cong Wang

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

Eric W. Biederman wrote:
> Amerigo Wang <[email protected]> writes:
>
>> Recently we met a lockdep warning from sysfs during s2ram or cpu hotplug.
>> As reported by several people, it is something like:
>>
>> [ 6967.926563] ACPI: Preparing to enter system sleep state S3
>> [ 6967.956156] Disabling non-boot CPUs ...
>> [ 6967.970401]
>> [ 6967.970408] =============================================
>> [ 6967.970419] [ INFO: possible recursive locking detected ]
>> [ 6967.970431] 2.6.33-rc2-git6 #27
>> [ 6967.970439] ---------------------------------------------
>> [ 6967.970450] pm-suspend/22147 is trying to acquire lock:
>> [ 6967.970460] (s_active){++++.+}, at: [<c10d2941>]
>> sysfs_hash_and_remove+0x3d/0x4f
>> [ 6967.970493]
>> [ 6967.970497] but task is already holding lock:
>> [ 6967.970506] (s_active){++++.+}, at: [<c10d4110>]
>> sysfs_get_active_two+0x16/0x36
>> [...]
>>
>> Eric already provides a patch for this[1], but it still can't fix the
>> problem. I add the missing part of Eric's patch and send these two patches
>> together, hopefully we can fix the warning completely.
>>
>> 1. http://lkml.org/lkml/2010/1/10/282
>>
>>
>> Reported-by: Benjamin Herrenschmidt <[email protected]>
>> Reported-by: Larry Finger <[email protected]>
>> Reported-by: Miles Lane <[email protected]>
>> Reported-by: Heiko Carstens <[email protected]>
>> Signed-off-by: WANG Cong <[email protected]>
>> Signed-off-by: Eric W. Biederman <[email protected]>
>> Cc: Tejun Heo <[email protected]>
>> Cc: Greg Kroah-Hartman <[email protected]>
>
> Thanks for following up on this.
>
> I suspect we may want to create a separate class for each sysfs file
> instead of playing whack-a-mole and creating a subclass each time we
> have problems.
>
> I don't see why the rules for one sysfs file should be the same as for
> any other sysfs file.
>

I am confused, we don't know who created sysfs files unless we
separate them by subclasses, the way of your patch is very straight
ward.

2010-01-29 13:44:23

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

Cong Wang <[email protected]> writes:

> Eric W. Biederman wrote:
>> Amerigo Wang <[email protected]> writes:
>>
>>> Recently we met a lockdep warning from sysfs during s2ram or cpu hotplug.
>>> As reported by several people, it is something like:
>>>
>>> [ 6967.926563] ACPI: Preparing to enter system sleep state S3
>>> [ 6967.956156] Disabling non-boot CPUs ...
>>> [ 6967.970401]
>>> [ 6967.970408] =============================================
>>> [ 6967.970419] [ INFO: possible recursive locking detected ]
>>> [ 6967.970431] 2.6.33-rc2-git6 #27
>>> [ 6967.970439] ---------------------------------------------
>>> [ 6967.970450] pm-suspend/22147 is trying to acquire lock:
>>> [ 6967.970460] (s_active){++++.+}, at: [<c10d2941>]
>>> sysfs_hash_and_remove+0x3d/0x4f
>>> [ 6967.970493]
>>> [ 6967.970497] but task is already holding lock:
>>> [ 6967.970506] (s_active){++++.+}, at: [<c10d4110>]
>>> sysfs_get_active_two+0x16/0x36
>>> [...]
>>>
>>> Eric already provides a patch for this[1], but it still can't fix the
>>> problem. I add the missing part of Eric's patch and send these two patches
>>> together, hopefully we can fix the warning completely.
>>>
>>> 1. http://lkml.org/lkml/2010/1/10/282
>>>
>>>
>>> Reported-by: Benjamin Herrenschmidt <[email protected]>
>>> Reported-by: Larry Finger <[email protected]>
>>> Reported-by: Miles Lane <[email protected]>
>>> Reported-by: Heiko Carstens <[email protected]>
>>> Signed-off-by: WANG Cong <[email protected]>
>>> Signed-off-by: Eric W. Biederman <[email protected]>
>>> Cc: Tejun Heo <[email protected]>
>>> Cc: Greg Kroah-Hartman <[email protected]>
>>
>> Thanks for following up on this.
>>
>> I suspect we may want to create a separate class for each sysfs file
>> instead of playing whack-a-mole and creating a subclass each time we
>> have problems.
>>
>> I don't see why the rules for one sysfs file should be the same as for
>> any other sysfs file.
>>
>
> I am confused, we don't know who created sysfs files unless we
> separate them by subclasses, the way of your patch is very straight
> ward.

The assumption is that all entities in a class are used very similarly.
What I was suggesting is that it may make sense, and be simpler to have
a separate __key value and thus place each sysfs file in it's class.

Doing that is a lot more for lockdep to track, but it would not produce
the confusing false positives that we see now. From the reports I have
seen we may have more that 16 subclasses in sysfs, and it will likely
take us a while to find them all.

Eric

2010-01-29 14:23:27

by Greg KH

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Fri, Jan 29, 2010 at 05:44:07AM -0800, Eric W. Biederman wrote:
> Cong Wang <[email protected]> writes:
>
> > Eric W. Biederman wrote:
> >> Amerigo Wang <[email protected]> writes:
> >>
> >>> Recently we met a lockdep warning from sysfs during s2ram or cpu hotplug.
> >>> As reported by several people, it is something like:
> >>>
> >>> [ 6967.926563] ACPI: Preparing to enter system sleep state S3
> >>> [ 6967.956156] Disabling non-boot CPUs ...
> >>> [ 6967.970401]
> >>> [ 6967.970408] =============================================
> >>> [ 6967.970419] [ INFO: possible recursive locking detected ]
> >>> [ 6967.970431] 2.6.33-rc2-git6 #27
> >>> [ 6967.970439] ---------------------------------------------
> >>> [ 6967.970450] pm-suspend/22147 is trying to acquire lock:
> >>> [ 6967.970460] (s_active){++++.+}, at: [<c10d2941>]
> >>> sysfs_hash_and_remove+0x3d/0x4f
> >>> [ 6967.970493]
> >>> [ 6967.970497] but task is already holding lock:
> >>> [ 6967.970506] (s_active){++++.+}, at: [<c10d4110>]
> >>> sysfs_get_active_two+0x16/0x36
> >>> [...]
> >>>
> >>> Eric already provides a patch for this[1], but it still can't fix the
> >>> problem. I add the missing part of Eric's patch and send these two patches
> >>> together, hopefully we can fix the warning completely.
> >>>
> >>> 1. http://lkml.org/lkml/2010/1/10/282
> >>>
> >>>
> >>> Reported-by: Benjamin Herrenschmidt <[email protected]>
> >>> Reported-by: Larry Finger <[email protected]>
> >>> Reported-by: Miles Lane <[email protected]>
> >>> Reported-by: Heiko Carstens <[email protected]>
> >>> Signed-off-by: WANG Cong <[email protected]>
> >>> Signed-off-by: Eric W. Biederman <[email protected]>
> >>> Cc: Tejun Heo <[email protected]>
> >>> Cc: Greg Kroah-Hartman <[email protected]>
> >>
> >> Thanks for following up on this.
> >>
> >> I suspect we may want to create a separate class for each sysfs file
> >> instead of playing whack-a-mole and creating a subclass each time we
> >> have problems.
> >>
> >> I don't see why the rules for one sysfs file should be the same as for
> >> any other sysfs file.
> >>
> >
> > I am confused, we don't know who created sysfs files unless we
> > separate them by subclasses, the way of your patch is very straight
> > ward.
>
> The assumption is that all entities in a class are used very similarly.
> What I was suggesting is that it may make sense, and be simpler to have
> a separate __key value and thus place each sysfs file in it's class.
>
> Doing that is a lot more for lockdep to track, but it would not produce
> the confusing false positives that we see now. From the reports I have
> seen we may have more that 16 subclasses in sysfs, and it will likely
> take us a while to find them all.

Heh, this whole mess is the very reason we didn't add lockdep support to
the driver core. Nested devices that all look alike from the driver
core, are really different objects and the locking lifetimes are
separate, but lockdep can't see that.

I suggest we just remove the original patch, as it seems to be causing
way too many problems.

Any objections to that?

thanks,

greg k-h

2010-01-29 17:59:40

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Fri, 2010-01-29 at 06:22 -0800, Greg KH wrote:
>
> Heh, this whole mess is the very reason we didn't add lockdep support to
> the driver core. Nested devices that all look alike from the driver
> core, are really different objects and the locking lifetimes are
> separate, but lockdep can't see that.

And here I through Alan Stern had a handle on making the driver core
play nice.

2010-01-29 18:03:27

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Thu, 2010-01-28 at 23:21 -0800, Eric W. Biederman wrote:
>
> I suspect we may want to create a separate class for each sysfs file
> instead of playing whack-a-mole and creating a subclass each time we
> have problems.

Well, you can make a class every time you have a problem, I don't think
you can make a class for each file since classes have to have static
storage, also subclasses are limited to 8.

2010-01-29 18:11:31

by Greg KH

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Fri, Jan 29, 2010 at 06:57:28PM +0100, Peter Zijlstra wrote:
> On Fri, 2010-01-29 at 06:22 -0800, Greg KH wrote:
> >
> > Heh, this whole mess is the very reason we didn't add lockdep support to
> > the driver core. Nested devices that all look alike from the driver
> > core, are really different objects and the locking lifetimes are
> > separate, but lockdep can't see that.
>
> And here I through Alan Stern had a handle on making the driver core
> play nice.

It's not the driver core that is the issue here, it's that lockdep can't
handle the tree structure of devices that is represented in the kernel.

I don't think it is a driver core problem, but rather, a lockdep issue.

thanks,

greg k-h

2010-01-29 18:15:20

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Fri, 2010-01-29 at 10:10 -0800, Greg KH wrote:
> On Fri, Jan 29, 2010 at 06:57:28PM +0100, Peter Zijlstra wrote:
> > On Fri, 2010-01-29 at 06:22 -0800, Greg KH wrote:
> > >
> > > Heh, this whole mess is the very reason we didn't add lockdep support to
> > > the driver core. Nested devices that all look alike from the driver
> > > core, are really different objects and the locking lifetimes are
> > > separate, but lockdep can't see that.
> >
> > And here I through Alan Stern had a handle on making the driver core
> > play nice.
>
> It's not the driver core that is the issue here, it's that lockdep can't
> handle the tree structure of devices that is represented in the kernel.
>
> I don't think it is a driver core problem, but rather, a lockdep issue.

Right, we've been over that and I think I added enough lockdep
annotations to make it work for the device tree. At least, Alan and I
seemed to agree on that last time we talked about it.


2010-01-29 18:22:16

by Greg KH

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Fri, Jan 29, 2010 at 07:14:20PM +0100, Peter Zijlstra wrote:
> On Fri, 2010-01-29 at 10:10 -0800, Greg KH wrote:
> > On Fri, Jan 29, 2010 at 06:57:28PM +0100, Peter Zijlstra wrote:
> > > On Fri, 2010-01-29 at 06:22 -0800, Greg KH wrote:
> > > >
> > > > Heh, this whole mess is the very reason we didn't add lockdep support to
> > > > the driver core. Nested devices that all look alike from the driver
> > > > core, are really different objects and the locking lifetimes are
> > > > separate, but lockdep can't see that.
> > >
> > > And here I through Alan Stern had a handle on making the driver core
> > > play nice.
> >
> > It's not the driver core that is the issue here, it's that lockdep can't
> > handle the tree structure of devices that is represented in the kernel.
> >
> > I don't think it is a driver core problem, but rather, a lockdep issue.
>
> Right, we've been over that and I think I added enough lockdep
> annotations to make it work for the device tree. At least, Alan and I
> seemed to agree on that last time we talked about it.

Ah, I didn't realize that, very nice.

If so, then this sysfs lock stuff should be able to use those
annotations and we shouldn't have this issue, right?

thanks,

greg k-h

2010-01-29 20:11:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Fri, 2010-01-29 at 10:21 -0800, Greg KH wrote:
> On Fri, Jan 29, 2010 at 07:14:20PM +0100, Peter Zijlstra wrote:
> > On Fri, 2010-01-29 at 10:10 -0800, Greg KH wrote:
> > > On Fri, Jan 29, 2010 at 06:57:28PM +0100, Peter Zijlstra wrote:
> > > > On Fri, 2010-01-29 at 06:22 -0800, Greg KH wrote:
> > > > >
> > > > > Heh, this whole mess is the very reason we didn't add lockdep support to
> > > > > the driver core. Nested devices that all look alike from the driver
> > > > > core, are really different objects and the locking lifetimes are
> > > > > separate, but lockdep can't see that.
> > > >
> > > > And here I through Alan Stern had a handle on making the driver core
> > > > play nice.
> > >
> > > It's not the driver core that is the issue here, it's that lockdep can't
> > > handle the tree structure of devices that is represented in the kernel.
> > >
> > > I don't think it is a driver core problem, but rather, a lockdep issue.
> >
> > Right, we've been over that and I think I added enough lockdep
> > annotations to make it work for the device tree. At least, Alan and I
> > seemed to agree on that last time we talked about it.
>
> Ah, I didn't realize that, very nice.
>
> If so, then this sysfs lock stuff should be able to use those
> annotations and we shouldn't have this issue, right?

I really wouldn't know, I've not yet looked at sysfs to see what the
particular issue is. But possibly, if you say the problem space is
similar.

2010-01-29 20:25:40

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

Greg KH <[email protected]> writes:

> Heh, this whole mess is the very reason we didn't add lockdep support to
> the driver core. Nested devices that all look alike from the driver
> core, are really different objects and the locking lifetimes are
> separate, but lockdep can't see that.
>
> I suggest we just remove the original patch, as it seems to be causing
> way too many problems.
>
> Any objections to that?

I think the hit rate for real problems has been about 25-50%. Of the
false positives a lot of those have been, code that is at least
questionable.

Furthermore there are problems we can find this way that we won't know
about any other way. Unfortunately I haven't had much time to do
anything kernel related lately, or I would have done more with this.
My comment was about simply about finding a good way to increase the
signal to noise ration so investigations can reasonably start with the
presumption that code lockdep is complaining about real problems.

The deadlocks that we can hit in sysfs are very nasty to find, they
have persisted for years, and they pop back up after they are fixed.
So far the pain from lockdep annotations seems a lot lower.

Right now annotating with subclasses as Amerigo is attempting will work,
and remove the false positives. I was simply hoping to find a faster
way to get there.

So yes, I do object to removing the original patch. Let's put in the
work to find a good path to remove the handful of cases that cause
false positives.

It's a shame we aren't getting stack overflow errors on the same paths
that are removing sysfs attributes from the callback handlers of sysfs
attributes, or we could just rule out that questionable practice and
call all of the lockdep warnings valid. Unfortunately that would just
be the tail wagging the horse.

Eric

2010-01-29 20:30:53

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

Peter Zijlstra <[email protected]> writes:

> On Fri, 2010-01-29 at 10:21 -0800, Greg KH wrote:
>> On Fri, Jan 29, 2010 at 07:14:20PM +0100, Peter Zijlstra wrote:
>> > On Fri, 2010-01-29 at 10:10 -0800, Greg KH wrote:
>> > > On Fri, Jan 29, 2010 at 06:57:28PM +0100, Peter Zijlstra wrote:
>> > > > On Fri, 2010-01-29 at 06:22 -0800, Greg KH wrote:
>> > > > >
>> > > > > Heh, this whole mess is the very reason we didn't add lockdep support to
>> > > > > the driver core. Nested devices that all look alike from the driver
>> > > > > core, are really different objects and the locking lifetimes are
>> > > > > separate, but lockdep can't see that.
>> > > >
>> > > > And here I through Alan Stern had a handle on making the driver core
>> > > > play nice.
>> > >
>> > > It's not the driver core that is the issue here, it's that lockdep can't
>> > > handle the tree structure of devices that is represented in the kernel.
>> > >
>> > > I don't think it is a driver core problem, but rather, a lockdep issue.
>> >
>> > Right, we've been over that and I think I added enough lockdep
>> > annotations to make it work for the device tree. At least, Alan and I
>> > seemed to agree on that last time we talked about it.
>>
>> Ah, I didn't realize that, very nice.
>>
>> If so, then this sysfs lock stuff should be able to use those
>> annotations and we shouldn't have this issue, right?
>
> I really wouldn't know, I've not yet looked at sysfs to see what the
> particular issue is. But possibly, if you say the problem space is
> similar.

We get false positives when the code of a sysfs attribute
synchronously removes other sysfs attributes. In general that is not
safe due to hotplug etc, but there are specific instances of static
sysfs entries like the pm_core where it appears to be safe.

I am not familiar with the device core lockdep issues. Are they similar?

Eric

2010-01-30 05:31:34

by Greg KH

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Fri, Jan 29, 2010 at 12:25:22PM -0800, Eric W. Biederman wrote:
> Greg KH <[email protected]> writes:
>
> > Heh, this whole mess is the very reason we didn't add lockdep support to
> > the driver core. Nested devices that all look alike from the driver
> > core, are really different objects and the locking lifetimes are
> > separate, but lockdep can't see that.
> >
> > I suggest we just remove the original patch, as it seems to be causing
> > way too many problems.
> >
> > Any objections to that?
>
> I think the hit rate for real problems has been about 25-50%. Of the
> false positives a lot of those have been, code that is at least
> questionable.
>
> Furthermore there are problems we can find this way that we won't know
> about any other way. Unfortunately I haven't had much time to do
> anything kernel related lately, or I would have done more with this.
> My comment was about simply about finding a good way to increase the
> signal to noise ration so investigations can reasonably start with the
> presumption that code lockdep is complaining about real problems.
>
> The deadlocks that we can hit in sysfs are very nasty to find, they
> have persisted for years, and they pop back up after they are fixed.
> So far the pain from lockdep annotations seems a lot lower.
>
> Right now annotating with subclasses as Amerigo is attempting will work,
> and remove the false positives. I was simply hoping to find a faster
> way to get there.
>
> So yes, I do object to removing the original patch. Let's put in the
> work to find a good path to remove the handful of cases that cause
> false positives.

Ok, that sounds good to me.

thanks,

greg k-h

2010-02-04 11:39:04

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Fri, 2010-01-29 at 12:30 -0800, Eric W. Biederman wrote:

> We get false positives when the code of a sysfs attribute
> synchronously removes other sysfs attributes. In general that is not
> safe due to hotplug etc, but there are specific instances of static
> sysfs entries like the pm_core where it appears to be safe.
>
> I am not familiar with the device core lockdep issues. Are they similar?

The device tree had the problem that we could basically hold a device
lock and an unspecified number of parent locks (iirc this was due to
device probing, where we hold the bus lock while probing/adding child
device, recursively).

If we place each dev->lock into the same class (which would naively
happen), then this would lead to recursive lock warnings. The proposed
solution for this is to create MAX_LOCK_DEPTH classes and assign them to
the dev->lock depending on the depth in the device tree (Alan said that
MAX_LOCK_DEPTH is sufficient for all practical cases).

static struct lock_class_key dev_tree_classes[MAX_LOCK_DEPTH];

device_add() or thereabouts would have something like:

#ifdef CONFIG_PROVE_LOCKING
BUG_ON(dev->depth >= MAX_LOCK_DEPTH);
lockdep_set_class(dev->lock, &dev_tree_classes[dev->depth]);
#endif


Then there was a problem were we could lock all child devices while
holding the parent device lock (forgot why though), this would, on
taking the second child dev->lock, again lead to recursive lock
warnings.

We have an annotation for that: lock_nest_lock (currently only
spin_lock_nest_lock exists, but mutex_lock_nest_lock is easily created),
and this would allow you to do things like:

mutex_lock(&parent->lock);
for_each_device_child(child, parent) {
mutex_lock_nest_lock(&child->lock, &parent->lock);
...
}


I hope this helps in figuring out the sysfs case..

2010-02-04 16:35:32

by Alan Stern

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

Greg:

You have accepted Thomas's patch "drivers/base: Convert dev->sem to
mutex". It generates lockdep violations galore during device probing
and removal! Luckily lockdep is smart enough only to print the first
occurrence. Here's what I get early on during bootup:

[ 0.149911] ACPI: EC: Look up EC in DSDT
[ 0.170665] ACPI: Executed 1 blocks of module-level executable AML code
[ 0.198111] ACPI: Interpreter enabled
[ 0.198267] ACPI: (supports S0 S3 S4 S5)
[ 0.198802] ACPI: Using IOAPIC for interrupt routing
[ 0.266493]
[ 0.266496] =============================================
[ 0.266775] [ INFO: possible recursive locking detected ]
[ 0.266917] 2.6.33-rc6 #1
[ 0.267051] ---------------------------------------------
[ 0.267192] swapper/1 is trying to acquire lock:
[ 0.267332] (&dev->mutex){+.+...}, at: [<c11496be>] __driver_attach+0x38/0x63
[ 0.267683]
[ 0.267685] but task is already holding lock:
[ 0.267953] (&dev->mutex){+.+...}, at: [<c11496b2>] __driver_attach+0x2c/0x63
[ 0.268000]
[ 0.268000] other info that might help us debug this:
[ 0.268000] 1 lock held by swapper/1:
[ 0.268000] #0: (&dev->mutex){+.+...}, at: [<c11496b2>] __driver_attach+0x2c/0x63
[ 0.268000]
[ 0.268000] stack backtrace:
[ 0.268000] Pid: 1, comm: swapper Not tainted 2.6.33-rc6 #1
[ 0.268000] Call Trace:
[ 0.268000] [<c11c819e>] ? printk+0xf/0x11
[ 0.268000] [<c1041c9b>] __lock_acquire+0x804/0xb47
[ 0.268000] [<c10b2026>] ? sysfs_addrm_finish+0x19/0xe2
[ 0.268000] [<c1042020>] lock_acquire+0x42/0x59
[ 0.268000] [<c11496be>] ? __driver_attach+0x38/0x63
[ 0.268000] [<c11c90c6>] __mutex_lock_common+0x39/0x38f
[ 0.268000] [<c11496be>] ? __driver_attach+0x38/0x63
[ 0.268000] [<c11c94ab>] mutex_lock_nested+0x2b/0x33
[ 0.268000] [<c11496be>] ? __driver_attach+0x38/0x63
[ 0.268000] [<c11496be>] __driver_attach+0x38/0x63
[ 0.268000] [<c1148e0a>] bus_for_each_dev+0x3d/0x67
[ 0.268000] [<c11494cf>] driver_attach+0x14/0x16
[ 0.268000] [<c1149686>] ? __driver_attach+0x0/0x63
[ 0.268000] [<c11491c1>] bus_add_driver+0x92/0x1c5
[ 0.268000] [<c114990f>] driver_register+0x79/0xe0
[ 0.268000] [<c1106d32>] acpi_bus_register_driver+0x3a/0x3c
[ 0.268000] [<c131999f>] acpi_power_init+0x3f/0x5e
[ 0.268000] [<c1319422>] acpi_init+0x28e/0x2c8
[ 0.268000] [<c1319194>] ? acpi_init+0x0/0x2c8
[ 0.268000] [<c1001139>] do_one_initcall+0x4c/0x136
[ 0.268000] [<c130130b>] kernel_init+0x11c/0x16d
[ 0.268000] [<c13011ef>] ? kernel_init+0x0/0x16d
[ 0.268000] [<c1002cba>] kernel_thread_helper+0x6/0x10
[ 0.268485] ACPI: Power Resource [GFAN] (on)


On Thu, 4 Feb 2010, Peter Zijlstra wrote:

> The device tree had the problem that we could basically hold a device
> lock and an unspecified number of parent locks (iirc this was due to
> device probing, where we hold the bus lock while probing/adding child
> device, recursively).
>
> If we place each dev->lock into the same class (which would naively
> happen), then this would lead to recursive lock warnings. The proposed
> solution for this is to create MAX_LOCK_DEPTH classes and assign them to
> the dev->lock depending on the depth in the device tree (Alan said that
> MAX_LOCK_DEPTH is sufficient for all practical cases).
>
> static struct lock_class_key dev_tree_classes[MAX_LOCK_DEPTH];
>
> device_add() or thereabouts would have something like:
>
> #ifdef CONFIG_PROVE_LOCKING
> BUG_ON(dev->depth >= MAX_LOCK_DEPTH);
> lockdep_set_class(dev->lock, &dev_tree_classes[dev->depth]);
> #endif

Unfortunately this doesn't really work. Here is a patch implementing
the scheme:

Index: usb-2.6/drivers/base/core.c
===================================================================
--- usb-2.6.orig/drivers/base/core.c
+++ usb-2.6/drivers/base/core.c
@@ -22,6 +22,7 @@
#include <linux/kallsyms.h>
#include <linux/mutex.h>
#include <linux/async.h>
+#include <linux/sched.h>

#include "base.h"
#include "power/power.h"
@@ -671,6 +672,26 @@ static void setup_parent(struct device *
dev->kobj.parent = kobj;
}

+#ifdef CONFIG_PROVE_LOCKING
+static struct lock_class_key dev_tree_classes[MAX_LOCK_DEPTH];
+
+static void setup_mutex_depth(struct device *dev, struct device *parent)
+{
+ int depth = 0;
+
+ /* Dynamically determine the device's depth in the device tree */
+ while (parent) {
+ ++depth;
+ parent = parent->parent;
+ }
+ BUG_ON(depth > MAX_LOCK_DEPTH);
+ lockdep_set_class(&dev->mutex, &dev_tree_classes[depth]);
+}
+#else
+static inline void setup_mutex_depth(struct device *dev,
+ struct device *parent) {}
+#endif
+
static int device_add_class_symlinks(struct device *dev)
{
int error;
@@ -912,6 +933,7 @@ int device_add(struct device *dev)

parent = get_device(dev->parent);
setup_parent(dev, parent);
+ setup_mutex_depth(dev, parent);

/* use parent numa_node */
if (parent)


This doesn't address the fact that we really have multiple device trees
(for example, class devices are handled separately from normal
devices). With the above patch installed, I still get lockdep
violations farther on during boot:

[ 0.272332] pci_bus 0000:00: on NUMA node 0
[ 0.272355] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[ 0.273503] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P4._PRT]
[ 0.279205]
[ 0.279208] =============================================
[ 0.279485] [ INFO: possible recursive locking detected ]
[ 0.279628] 2.6.33-rc6 #2
[ 0.279763] ---------------------------------------------
[ 0.279905] swapper/1 is trying to acquire lock:
[ 0.280000] (&dev_tree_classes[depth]#2){+.+.+.}, at: [<c1149776>] device_attach+0x14/0x6e
[ 0.280000]
[ 0.280000] but task is already holding lock:
[ 0.280000] (&dev_tree_classes[depth]#2){+.+.+.}, at: [<c11496de>] __driver_attach+0x2c/0x63
[ 0.280000]
[ 0.280000] other info that might help us debug this:
[ 0.280000] 2 locks held by swapper/1:
[ 0.280000] #0: (&dev_tree_classes[depth]#2){+.+.+.}, at: [<c11496de>] __driver_attach+0x2c/0x63
[ 0.280000] #1: (&dev_tree_classes[depth]#3){+.+.+.}, at: [<c11496ea>] __driver_attach+0x38/0x63
[ 0.280000]
[ 0.280000] stack backtrace:
[ 0.280000] Pid: 1, comm: swapper Not tainted 2.6.33-rc6 #2
[ 0.280000] Call Trace:
[ 0.280000] [<c11c81ce>] ? printk+0xf/0x11
[ 0.280000] [<c1041c9b>] __lock_acquire+0x804/0xb47
[ 0.280000] [<c101a73d>] ? spin_unlock_irqrestore+0x8/0xa
[ 0.280000] [<c101a891>] ? __wake_up+0x32/0x3b
[ 0.280000] [<c1042020>] lock_acquire+0x42/0x59
[ 0.280000] [<c1149776>] ? device_attach+0x14/0x6e
[ 0.280000] [<c11c90f6>] __mutex_lock_common+0x39/0x38f
[ 0.280000] [<c1149776>] ? device_attach+0x14/0x6e
[ 0.280000] [<c1040e2e>] ? trace_hardirqs_on+0xb/0xd
[ 0.280000] [<c10ed5a7>] ? kobject_uevent_env+0x2e9/0x30a
[ 0.280000] [<c10ed5a7>] ? kobject_uevent_env+0x2e9/0x30a
[ 0.280000] [<c11c94db>] mutex_lock_nested+0x2b/0x33
[ 0.280000] [<c1149776>] ? device_attach+0x14/0x6e
[ 0.280000] [<c1149776>] device_attach+0x14/0x6e
[ 0.280000] [<c1148aa1>] bus_probe_device+0x1b/0x30
[ 0.280000] [<c1147b6c>] device_add+0x310/0x458
[ 0.280000] [<c10f96ac>] pci_bus_add_device+0xf/0x30
[ 0.280000] [<c10f96f0>] pci_bus_add_devices+0x23/0xdd
[ 0.280000] [<c11c011b>] ? acpi_pci_root_add+0x1cf/0x1ff
[ 0.280000] [<c11088a3>] acpi_pci_root_start+0x11/0x15
[ 0.280000] [<c1106370>] acpi_start_single_object+0x1e/0x3f
[ 0.280000] [<c11064a9>] acpi_device_probe+0x78/0xf4
[ 0.280000] [<c1149632>] driver_probe_device+0x87/0x107
[ 0.280000] [<c11496f9>] __driver_attach+0x47/0x63
[ 0.280000] [<c1148e36>] bus_for_each_dev+0x3d/0x67
[ 0.280000] [<c11494fb>] driver_attach+0x14/0x16
[ 0.280000] [<c11496b2>] ? __driver_attach+0x0/0x63
[ 0.280000] [<c11491ed>] bus_add_driver+0x92/0x1c5
[ 0.280000] [<c1319798>] ? acpi_pci_root_init+0x0/0x25
[ 0.280000] [<c114993b>] driver_register+0x79/0xe0
[ 0.280000] [<c1319798>] ? acpi_pci_root_init+0x0/0x25
[ 0.280000] [<c1106d32>] acpi_bus_register_driver+0x3a/0x3c
[ 0.280000] [<c13197ae>] acpi_pci_root_init+0x16/0x25
[ 0.280000] [<c1001139>] do_one_initcall+0x4c/0x136
[ 0.280000] [<c130130b>] kernel_init+0x11c/0x16d
[ 0.280000] [<c13011ef>] ? kernel_init+0x0/0x16d
[ 0.280000] [<c1002cba>] kernel_thread_helper+0x6/0x10
[ 0.328206] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 *10 11 12 14 15)
[ 0.329223] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 *5 6 7 10 11 12 14 15)


> Then there was a problem were we could lock all child devices while
> holding the parent device lock (forgot why though), this would, on
> taking the second child dev->lock, again lead to recursive lock
> warnings.

AFAIK, the code that used to do this is no longer present. There may
be other places where it is still done, but I'm not aware of any.

However in view of the other difficulties, it still doesn't seem
possible to make device mutexes work with lockdep. I suggest removing
Thomas's patch.

Alan Stern

2010-02-04 16:44:06

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Thu, 2010-02-04 at 11:35 -0500, Alan Stern wrote:
> Greg:
>
> You have accepted Thomas's patch "drivers/base: Convert dev->sem to
> mutex". It generates lockdep violations galore during device probing
> and removal! Luckily lockdep is smart enough only to print the first
> occurrence. Here's what I get early on during bootup:

<snip lockdep splat>

> On Thu, 4 Feb 2010, Peter Zijlstra wrote:
>
> > The device tree had the problem that we could basically hold a device
> > lock and an unspecified number of parent locks (iirc this was due to
> > device probing, where we hold the bus lock while probing/adding child
> > device, recursively).
> >
> > If we place each dev->lock into the same class (which would naively
> > happen), then this would lead to recursive lock warnings. The proposed
> > solution for this is to create MAX_LOCK_DEPTH classes and assign them to
> > the dev->lock depending on the depth in the device tree (Alan said that
> > MAX_LOCK_DEPTH is sufficient for all practical cases).
> >
> > static struct lock_class_key dev_tree_classes[MAX_LOCK_DEPTH];
> >
> > device_add() or thereabouts would have something like:
> >
> > #ifdef CONFIG_PROVE_LOCKING
> > BUG_ON(dev->depth >= MAX_LOCK_DEPTH);
> > lockdep_set_class(dev->lock, &dev_tree_classes[dev->depth]);
> > #endif
>
> Unfortunately this doesn't really work. Here is a patch implementing
> the scheme:
>
> Index: usb-2.6/drivers/base/core.c
> ===================================================================
> --- usb-2.6.orig/drivers/base/core.c
> +++ usb-2.6/drivers/base/core.c
> @@ -22,6 +22,7 @@
> #include <linux/kallsyms.h>
> #include <linux/mutex.h>
> #include <linux/async.h>
> +#include <linux/sched.h>
>
> #include "base.h"
> #include "power/power.h"
> @@ -671,6 +672,26 @@ static void setup_parent(struct device *
> dev->kobj.parent = kobj;
> }
>
> +#ifdef CONFIG_PROVE_LOCKING
> +static struct lock_class_key dev_tree_classes[MAX_LOCK_DEPTH];
> +
> +static void setup_mutex_depth(struct device *dev, struct device *parent)
> +{
> + int depth = 0;
> +
> + /* Dynamically determine the device's depth in the device tree */
> + while (parent) {
> + ++depth;
> + parent = parent->parent;
> + }
> + BUG_ON(depth > MAX_LOCK_DEPTH);
> + lockdep_set_class(&dev->mutex, &dev_tree_classes[depth]);
> +}
> +#else
> +static inline void setup_mutex_depth(struct device *dev,
> + struct device *parent) {}
> +#endif
> +
> static int device_add_class_symlinks(struct device *dev)
> {
> int error;
> @@ -912,6 +933,7 @@ int device_add(struct device *dev)
>
> parent = get_device(dev->parent);
> setup_parent(dev, parent);
> + setup_mutex_depth(dev, parent);
>
> /* use parent numa_node */
> if (parent)
>
>
> This doesn't address the fact that we really have multiple device trees
> (for example, class devices are handled separately from normal
> devices). With the above patch installed, I still get lockdep
> violations farther on during boot:

<snip lockdep splat>

Hmm, so you have multiple interacting trees? I had understood you only
had a single device tree. So how many trees are there, is that fixed?
Does the device know what tree it is going to end up in?

If yes, then you can extend the setup_mutex_depth() function to pick a
different class stack for each tree.

2010-02-04 16:46:03

by Greg KH

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Thu, Feb 04, 2010 at 11:35:28AM -0500, Alan Stern wrote:
> Greg:
>
> You have accepted Thomas's patch "drivers/base: Convert dev->sem to
> mutex". It generates lockdep violations galore during device probing
> and removal! Luckily lockdep is smart enough only to print the first
> occurrence. Here's what I get early on during bootup:

Ugh, I forgot this is why we haven't done this yet.

Thomas, you didn't see these warnings?

> [ 0.149911] ACPI: EC: Look up EC in DSDT
> [ 0.170665] ACPI: Executed 1 blocks of module-level executable AML code
> [ 0.198111] ACPI: Interpreter enabled
> [ 0.198267] ACPI: (supports S0 S3 S4 S5)
> [ 0.198802] ACPI: Using IOAPIC for interrupt routing
> [ 0.266493]
> [ 0.266496] =============================================
> [ 0.266775] [ INFO: possible recursive locking detected ]
> [ 0.266917] 2.6.33-rc6 #1
> [ 0.267051] ---------------------------------------------
> [ 0.267192] swapper/1 is trying to acquire lock:
> [ 0.267332] (&dev->mutex){+.+...}, at: [<c11496be>] __driver_attach+0x38/0x63
> [ 0.267683]
> [ 0.267685] but task is already holding lock:
> [ 0.267953] (&dev->mutex){+.+...}, at: [<c11496b2>] __driver_attach+0x2c/0x63
> [ 0.268000]
> [ 0.268000] other info that might help us debug this:
> [ 0.268000] 1 lock held by swapper/1:
> [ 0.268000] #0: (&dev->mutex){+.+...}, at: [<c11496b2>] __driver_attach+0x2c/0x63
> [ 0.268000]
> [ 0.268000] stack backtrace:
> [ 0.268000] Pid: 1, comm: swapper Not tainted 2.6.33-rc6 #1
> [ 0.268000] Call Trace:
> [ 0.268000] [<c11c819e>] ? printk+0xf/0x11
> [ 0.268000] [<c1041c9b>] __lock_acquire+0x804/0xb47
> [ 0.268000] [<c10b2026>] ? sysfs_addrm_finish+0x19/0xe2
> [ 0.268000] [<c1042020>] lock_acquire+0x42/0x59
> [ 0.268000] [<c11496be>] ? __driver_attach+0x38/0x63
> [ 0.268000] [<c11c90c6>] __mutex_lock_common+0x39/0x38f
> [ 0.268000] [<c11496be>] ? __driver_attach+0x38/0x63
> [ 0.268000] [<c11c94ab>] mutex_lock_nested+0x2b/0x33
> [ 0.268000] [<c11496be>] ? __driver_attach+0x38/0x63
> [ 0.268000] [<c11496be>] __driver_attach+0x38/0x63
> [ 0.268000] [<c1148e0a>] bus_for_each_dev+0x3d/0x67
> [ 0.268000] [<c11494cf>] driver_attach+0x14/0x16
> [ 0.268000] [<c1149686>] ? __driver_attach+0x0/0x63
> [ 0.268000] [<c11491c1>] bus_add_driver+0x92/0x1c5
> [ 0.268000] [<c114990f>] driver_register+0x79/0xe0
> [ 0.268000] [<c1106d32>] acpi_bus_register_driver+0x3a/0x3c
> [ 0.268000] [<c131999f>] acpi_power_init+0x3f/0x5e
> [ 0.268000] [<c1319422>] acpi_init+0x28e/0x2c8
> [ 0.268000] [<c1319194>] ? acpi_init+0x0/0x2c8
> [ 0.268000] [<c1001139>] do_one_initcall+0x4c/0x136
> [ 0.268000] [<c130130b>] kernel_init+0x11c/0x16d
> [ 0.268000] [<c13011ef>] ? kernel_init+0x0/0x16d
> [ 0.268000] [<c1002cba>] kernel_thread_helper+0x6/0x10
> [ 0.268485] ACPI: Power Resource [GFAN] (on)
>
>
> On Thu, 4 Feb 2010, Peter Zijlstra wrote:
>
> > The device tree had the problem that we could basically hold a device
> > lock and an unspecified number of parent locks (iirc this was due to
> > device probing, where we hold the bus lock while probing/adding child
> > device, recursively).
> >
> > If we place each dev->lock into the same class (which would naively
> > happen), then this would lead to recursive lock warnings. The proposed
> > solution for this is to create MAX_LOCK_DEPTH classes and assign them to
> > the dev->lock depending on the depth in the device tree (Alan said that
> > MAX_LOCK_DEPTH is sufficient for all practical cases).
> >
> > static struct lock_class_key dev_tree_classes[MAX_LOCK_DEPTH];
> >
> > device_add() or thereabouts would have something like:
> >
> > #ifdef CONFIG_PROVE_LOCKING
> > BUG_ON(dev->depth >= MAX_LOCK_DEPTH);
> > lockdep_set_class(dev->lock, &dev_tree_classes[dev->depth]);
> > #endif
>
> Unfortunately this doesn't really work. Here is a patch implementing
> the scheme:
>
> Index: usb-2.6/drivers/base/core.c
> ===================================================================
> --- usb-2.6.orig/drivers/base/core.c
> +++ usb-2.6/drivers/base/core.c
> @@ -22,6 +22,7 @@
> #include <linux/kallsyms.h>
> #include <linux/mutex.h>
> #include <linux/async.h>
> +#include <linux/sched.h>
>
> #include "base.h"
> #include "power/power.h"
> @@ -671,6 +672,26 @@ static void setup_parent(struct device *
> dev->kobj.parent = kobj;
> }
>
> +#ifdef CONFIG_PROVE_LOCKING
> +static struct lock_class_key dev_tree_classes[MAX_LOCK_DEPTH];
> +
> +static void setup_mutex_depth(struct device *dev, struct device *parent)
> +{
> + int depth = 0;
> +
> + /* Dynamically determine the device's depth in the device tree */
> + while (parent) {
> + ++depth;
> + parent = parent->parent;
> + }
> + BUG_ON(depth > MAX_LOCK_DEPTH);
> + lockdep_set_class(&dev->mutex, &dev_tree_classes[depth]);
> +}
> +#else
> +static inline void setup_mutex_depth(struct device *dev,
> + struct device *parent) {}
> +#endif
> +
> static int device_add_class_symlinks(struct device *dev)
> {
> int error;
> @@ -912,6 +933,7 @@ int device_add(struct device *dev)
>
> parent = get_device(dev->parent);
> setup_parent(dev, parent);
> + setup_mutex_depth(dev, parent);
>
> /* use parent numa_node */
> if (parent)
>
>
> This doesn't address the fact that we really have multiple device trees
> (for example, class devices are handled separately from normal
> devices). With the above patch installed, I still get lockdep
> violations farther on during boot:
>
> [ 0.272332] pci_bus 0000:00: on NUMA node 0
> [ 0.272355] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
> [ 0.273503] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P4._PRT]
> [ 0.279205]
> [ 0.279208] =============================================
> [ 0.279485] [ INFO: possible recursive locking detected ]
> [ 0.279628] 2.6.33-rc6 #2
> [ 0.279763] ---------------------------------------------
> [ 0.279905] swapper/1 is trying to acquire lock:
> [ 0.280000] (&dev_tree_classes[depth]#2){+.+.+.}, at: [<c1149776>] device_attach+0x14/0x6e
> [ 0.280000]
> [ 0.280000] but task is already holding lock:
> [ 0.280000] (&dev_tree_classes[depth]#2){+.+.+.}, at: [<c11496de>] __driver_attach+0x2c/0x63
> [ 0.280000]
> [ 0.280000] other info that might help us debug this:
> [ 0.280000] 2 locks held by swapper/1:
> [ 0.280000] #0: (&dev_tree_classes[depth]#2){+.+.+.}, at: [<c11496de>] __driver_attach+0x2c/0x63
> [ 0.280000] #1: (&dev_tree_classes[depth]#3){+.+.+.}, at: [<c11496ea>] __driver_attach+0x38/0x63
> [ 0.280000]
> [ 0.280000] stack backtrace:
> [ 0.280000] Pid: 1, comm: swapper Not tainted 2.6.33-rc6 #2
> [ 0.280000] Call Trace:
> [ 0.280000] [<c11c81ce>] ? printk+0xf/0x11
> [ 0.280000] [<c1041c9b>] __lock_acquire+0x804/0xb47
> [ 0.280000] [<c101a73d>] ? spin_unlock_irqrestore+0x8/0xa
> [ 0.280000] [<c101a891>] ? __wake_up+0x32/0x3b
> [ 0.280000] [<c1042020>] lock_acquire+0x42/0x59
> [ 0.280000] [<c1149776>] ? device_attach+0x14/0x6e
> [ 0.280000] [<c11c90f6>] __mutex_lock_common+0x39/0x38f
> [ 0.280000] [<c1149776>] ? device_attach+0x14/0x6e
> [ 0.280000] [<c1040e2e>] ? trace_hardirqs_on+0xb/0xd
> [ 0.280000] [<c10ed5a7>] ? kobject_uevent_env+0x2e9/0x30a
> [ 0.280000] [<c10ed5a7>] ? kobject_uevent_env+0x2e9/0x30a
> [ 0.280000] [<c11c94db>] mutex_lock_nested+0x2b/0x33
> [ 0.280000] [<c1149776>] ? device_attach+0x14/0x6e
> [ 0.280000] [<c1149776>] device_attach+0x14/0x6e
> [ 0.280000] [<c1148aa1>] bus_probe_device+0x1b/0x30
> [ 0.280000] [<c1147b6c>] device_add+0x310/0x458
> [ 0.280000] [<c10f96ac>] pci_bus_add_device+0xf/0x30
> [ 0.280000] [<c10f96f0>] pci_bus_add_devices+0x23/0xdd
> [ 0.280000] [<c11c011b>] ? acpi_pci_root_add+0x1cf/0x1ff
> [ 0.280000] [<c11088a3>] acpi_pci_root_start+0x11/0x15
> [ 0.280000] [<c1106370>] acpi_start_single_object+0x1e/0x3f
> [ 0.280000] [<c11064a9>] acpi_device_probe+0x78/0xf4
> [ 0.280000] [<c1149632>] driver_probe_device+0x87/0x107
> [ 0.280000] [<c11496f9>] __driver_attach+0x47/0x63
> [ 0.280000] [<c1148e36>] bus_for_each_dev+0x3d/0x67
> [ 0.280000] [<c11494fb>] driver_attach+0x14/0x16
> [ 0.280000] [<c11496b2>] ? __driver_attach+0x0/0x63
> [ 0.280000] [<c11491ed>] bus_add_driver+0x92/0x1c5
> [ 0.280000] [<c1319798>] ? acpi_pci_root_init+0x0/0x25
> [ 0.280000] [<c114993b>] driver_register+0x79/0xe0
> [ 0.280000] [<c1319798>] ? acpi_pci_root_init+0x0/0x25
> [ 0.280000] [<c1106d32>] acpi_bus_register_driver+0x3a/0x3c
> [ 0.280000] [<c13197ae>] acpi_pci_root_init+0x16/0x25
> [ 0.280000] [<c1001139>] do_one_initcall+0x4c/0x136
> [ 0.280000] [<c130130b>] kernel_init+0x11c/0x16d
> [ 0.280000] [<c13011ef>] ? kernel_init+0x0/0x16d
> [ 0.280000] [<c1002cba>] kernel_thread_helper+0x6/0x10
> [ 0.328206] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 *10 11 12 14 15)
> [ 0.329223] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 *5 6 7 10 11 12 14 15)
>
>
> > Then there was a problem were we could lock all child devices while
> > holding the parent device lock (forgot why though), this would, on
> > taking the second child dev->lock, again lead to recursive lock
> > warnings.
>
> AFAIK, the code that used to do this is no longer present. There may
> be other places where it is still done, but I'm not aware of any.
>
> However in view of the other difficulties, it still doesn't seem
> possible to make device mutexes work with lockdep. I suggest removing
> Thomas's patch.

Unless Thomas or anyone else thinks of something that can solve these
problems, I will do so.

thanks,

greg k-h

2010-02-04 16:47:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Thu, 2010-02-04 at 11:35 -0500, Alan Stern wrote:
> +#include <linux/sched.h>

#include <linux/lockdep.h>

should be sufficient I think.

2010-02-04 17:06:04

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Thu, 4 Feb 2010, Greg KH wrote:
> > However in view of the other difficulties, it still doesn't seem
> > possible to make device mutexes work with lockdep. I suggest removing
> > Thomas's patch.
>
> Unless Thomas or anyone else thinks of something that can solve these
> problems, I will do so.

Hmm, I have not seen those lockdep splats on -rt where we have the
mutex conversion for quite some time already. Need to look at it.

Thanks,

tglx

2010-02-04 18:37:06

by Alan Stern

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Thu, 4 Feb 2010, Peter Zijlstra wrote:

> > This doesn't address the fact that we really have multiple device trees
> > (for example, class devices are handled separately from normal
> > devices). With the above patch installed, I still get lockdep
> > violations farther on during boot:
>
> <snip lockdep splat>
>
> Hmm, so you have multiple interacting trees? I had understood you only
> had a single device tree.

The real situation is kind of complicated, and I'm not familiar with
all the details. But it's certainly true that a driver will want to
work with (and lock!) multiple struct device's that don't have a
parent-child relation in the tree. The simplest example is regular
devices together with class devices, and another might be PCI devices
together with their "shadow" ACPI devices.

> So how many trees are there, is that fixed?
> Does the device know what tree it is going to end up in?

The driver generally knows, but AFAIK that information is not passed
back to the driver core. At least, not directly -- you might say that
it could be deduced from the parent pointer, assuming the core already
knows all about the parent.

> If yes, then you can extend the setup_mutex_depth() function to pick a
> different class stack for each tree.

Maybe this could be done. But for now perhaps a compromise is in
order. We could make the switch from semaphores to mutexes while
avoiding lockdep issues by assigning the device mutexes to a
"don't-verify" class. Is there such a thing, or could it be added?

Alan Stern

2010-02-04 18:40:18

by Alan Stern

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Thu, 4 Feb 2010, Peter Zijlstra wrote:

> On Thu, 2010-02-04 at 11:35 -0500, Alan Stern wrote:
> > +#include <linux/sched.h>
>
> #include <linux/lockdep.h>
>
> should be sufficient I think.

No, it's not. It leaves MAX_LOCK_DEPTH undeclared. Beats me why that
symbol ended up in sched.h...

Alan Stern

2010-02-05 03:07:42

by Cong Wang

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

Alan Stern wrote:
> On Thu, 4 Feb 2010, Peter Zijlstra wrote:
>
>> On Thu, 2010-02-04 at 11:35 -0500, Alan Stern wrote:
>>> +#include <linux/sched.h>
>> #include <linux/lockdep.h>
>>
>> should be sufficient I think.
>
> No, it's not. It leaves MAX_LOCK_DEPTH undeclared. Beats me why that
> symbol ended up in sched.h...
>

because task_struct->held_locks uses it, but it seems better to move
it into lockdep.h...

2010-02-05 03:41:53

by Cong Wang

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

Peter Zijlstra wrote:
> On Fri, 2010-01-29 at 12:30 -0800, Eric W. Biederman wrote:
>
>> We get false positives when the code of a sysfs attribute
>> synchronously removes other sysfs attributes. In general that is not
>> safe due to hotplug etc, but there are specific instances of static
>> sysfs entries like the pm_core where it appears to be safe.
>>
>> I am not familiar with the device core lockdep issues. Are they similar?
>
> The device tree had the problem that we could basically hold a device
> lock and an unspecified number of parent locks (iirc this was due to
> device probing, where we hold the bus lock while probing/adding child
> device, recursively).
>
> If we place each dev->lock into the same class (which would naively
> happen), then this would lead to recursive lock warnings. The proposed
> solution for this is to create MAX_LOCK_DEPTH classes and assign them to
> the dev->lock depending on the depth in the device tree (Alan said that
> MAX_LOCK_DEPTH is sufficient for all practical cases).
>
> static struct lock_class_key dev_tree_classes[MAX_LOCK_DEPTH];
>
> device_add() or thereabouts would have something like:
>
> #ifdef CONFIG_PROVE_LOCKING
> BUG_ON(dev->depth >= MAX_LOCK_DEPTH);
> lockdep_set_class(dev->lock, &dev_tree_classes[dev->depth]);
> #endif
>
>

Nice explanation!

I see, we should set the class of the lock when after we holding it...
I will update my patch.

Thank you!

2010-02-05 04:06:57

by Alan Stern

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Fri, 5 Feb 2010, Cong Wang wrote:

> Alan Stern wrote:
> > On Thu, 4 Feb 2010, Peter Zijlstra wrote:
> >
> >> On Thu, 2010-02-04 at 11:35 -0500, Alan Stern wrote:
> >>> +#include <linux/sched.h>
> >> #include <linux/lockdep.h>
> >>
> >> should be sufficient I think.
> >
> > No, it's not. It leaves MAX_LOCK_DEPTH undeclared. Beats me why that
> > symbol ended up in sched.h...
> >
>
> because task_struct->held_locks uses it, but it seems better to move
> it into lockdep.h...

Be my guest! :-)

Alan Stern

2010-02-05 08:55:15

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

Peter Zijlstra <[email protected]> writes:

> On Fri, 2010-01-29 at 12:30 -0800, Eric W. Biederman wrote:
>
>> We get false positives when the code of a sysfs attribute
>> synchronously removes other sysfs attributes. In general that is not
>> safe due to hotplug etc, but there are specific instances of static
>> sysfs entries like the pm_core where it appears to be safe.
>>
>> I am not familiar with the device core lockdep issues. Are they similar?
>
> The device tree had the problem that we could basically hold a device
> lock and an unspecified number of parent locks (iirc this was due to
> device probing, where we hold the bus lock while probing/adding child
> device, recursively).
>
> If we place each dev->lock into the same class (which would naively
> happen), then this would lead to recursive lock warnings. The proposed
> solution for this is to create MAX_LOCK_DEPTH classes and assign them to
> the dev->lock depending on the depth in the device tree (Alan said that
> MAX_LOCK_DEPTH is sufficient for all practical cases).
>
> static struct lock_class_key dev_tree_classes[MAX_LOCK_DEPTH];
>
> device_add() or thereabouts would have something like:
>
> #ifdef CONFIG_PROVE_LOCKING
> BUG_ON(dev->depth >= MAX_LOCK_DEPTH);
> lockdep_set_class(dev->lock, &dev_tree_classes[dev->depth]);
> #endif
>
>
> Then there was a problem were we could lock all child devices while
> holding the parent device lock (forgot why though), this would, on
> taking the second child dev->lock, again lead to recursive lock
> warnings.
>
> We have an annotation for that: lock_nest_lock (currently only
> spin_lock_nest_lock exists, but mutex_lock_nest_lock is easily created),
> and this would allow you to do things like:
>
> mutex_lock(&parent->lock);
> for_each_device_child(child, parent) {
> mutex_lock_nest_lock(&child->lock, &parent->lock);
> ...
> }
>
>
> I hope this helps in figuring out the sysfs case..

Interesting. The sysfs attributes in general don't have this problem
because the common case is effectively a reader/writer lock where we
can have all of the readers we want.

In the sysfs case the classic problem is when an attribute grabs a
lock and that lock is also held while the attribute is being deleted.
I first found this with the rtnl_lock but there are other cases
as well.

The particularly tricky case to handle in lockdep is when that
other lock also the s_active lock from another part of the tree. There
is an entire Alice style rabbit whole there, I am trying to figure out.

Eric

2010-02-05 10:19:28

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Thu, 2010-02-04 at 13:37 -0500, Alan Stern wrote:
> On Thu, 4 Feb 2010, Peter Zijlstra wrote:
>
> > > This doesn't address the fact that we really have multiple device trees
> > > (for example, class devices are handled separately from normal
> > > devices). With the above patch installed, I still get lockdep
> > > violations farther on during boot:
> >
> > <snip lockdep splat>
> >
> > Hmm, so you have multiple interacting trees? I had understood you only
> > had a single device tree.
>
> The real situation is kind of complicated, and I'm not familiar with
> all the details. But it's certainly true that a driver will want to
> work with (and lock!) multiple struct device's that don't have a
> parent-child relation in the tree. The simplest example is regular
> devices together with class devices, and another might be PCI devices
> together with their "shadow" ACPI devices.
>
> > So how many trees are there, is that fixed?
> > Does the device know what tree it is going to end up in?
>
> The driver generally knows, but AFAIK that information is not passed
> back to the driver core. At least, not directly -- you might say that
> it could be deduced from the parent pointer, assuming the core already
> knows all about the parent.
>
> > If yes, then you can extend the setup_mutex_depth() function to pick a
> > different class stack for each tree.
>
> Maybe this could be done.

Right, so this device stuff is much more complicated than I was led to
believe ;-)

So the device core doesn't know, so how are you guys making sure there
really are no deadlocks hidden in there somewhere?

> But for now perhaps a compromise is in
> order. We could make the switch from semaphores to mutexes while
> avoiding lockdep issues by assigning the device mutexes to a
> "don't-verify" class. Is there such a thing, or could it be added?

Something like the below might work, but it should go along with a
checkpatch.pl mod to ensure we don't grow any new users (just don't feel
like brushing up my perl fu enough to actually make sense of that
script)

---
include/linux/lockdep.h | 2 ++
kernel/lockdep.c | 5 +++++
2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 9ccf0e2..4e30ab4 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -40,6 +40,8 @@ struct lock_class_key {
struct lockdep_subclass_key subkeys[MAX_LOCKDEP_SUBCLASSES];
};

+extern struct lock_class_key __lockdep_no_validate__;
+
#define LOCKSTAT_POINTS 4

/*
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index c62ec14..af65a34 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -2716,6 +2716,8 @@ void lockdep_init_map(struct lockdep_map *lock, const char *name,
}
EXPORT_SYMBOL_GPL(lockdep_init_map);

+struct lock_class_key __lockdep_no_validate__;
+
/*
* This gets called for every mutex_lock*()/spin_lock*() operation.
* We maintain the dependency maps and validate the locking attempt:
@@ -2750,6 +2752,9 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
return 0;
}

+ if (lock->key == &__lockdep_no_validate__)
+ check = 1;
+
if (!subclass)
class = lock->class_cache;
/*

2010-02-05 15:31:00

by Alan Stern

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Fri, 5 Feb 2010, Peter Zijlstra wrote:

> Right, so this device stuff is much more complicated than I was led to
> believe ;-)

Haven't I told you all along that tree-structured locking is
complicated? :-)

> So the device core doesn't know, so how are you guys making sure there
> really are no deadlocks hidden in there somewhere?

In the code I've seen, deadlocks are avoided by always taking the locks
in the same order. But who knows? Maybe there _are_ some hidden
deadlocks lurking. For now we can't rely on lockdep to find them,
though, because it gets sidetracked by all the false positives.

> > But for now perhaps a compromise is in
> > order. We could make the switch from semaphores to mutexes while
> > avoiding lockdep issues by assigning the device mutexes to a
> > "don't-verify" class. Is there such a thing, or could it be added?
>
> Something like the below might work, but it should go along with a
> checkpatch.pl mod to ensure we don't grow any new users (just don't feel
> like brushing up my perl fu enough to actually make sense of that
> script)

I'll try it out in the next few days, and if it looks good maybe the
checkpatch maintainers can lend some assistance before it gets
submitted.

Alan Stern

2010-02-05 15:43:26

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Fri, 2010-02-05 at 10:30 -0500, Alan Stern wrote:
> On Fri, 5 Feb 2010, Peter Zijlstra wrote:
>
> > Right, so this device stuff is much more complicated than I was led to
> > believe ;-)
>
> Haven't I told you all along that tree-structured locking is
> complicated? :-)

Well, regular tree's aren't all that complicated, but multiple
inter-locking trees is a whole different story indeed.

2010-02-07 09:22:36

by Dave Young

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Fri, Feb 05, 2010 at 04:41:57PM +0100, Peter Zijlstra wrote:
> On Fri, 2010-02-05 at 10:30 -0500, Alan Stern wrote:
> > On Fri, 5 Feb 2010, Peter Zijlstra wrote:
> >
> > > Right, so this device stuff is much more complicated than I was led to
> > > believe ;-)
> >
> > Haven't I told you all along that tree-structured locking is
> > complicated? :-)
>
> Well, regular tree's aren't all that complicated, but multiple
> inter-locking trees is a whole different story indeed.
>

I ever tried converting device semaphore to mutex, but failed with same issue.

At least now there's no lockdep solution for it, so I recommend revert
the mutex converting patch.

following lockdep warning with rc6-mm1:

[ 0.397123]
[ 0.397124] =============================================
[ 0.397359] [ INFO: possible recursive locking detected ]
[ 0.397480] 2.6.33-rc6-mm1 #1
[ 0.397596] ---------------------------------------------
[ 0.397717] swapper/1 is trying to acquire lock:
[ 0.397836] (&dev->mutex){+.+...}, at: [<c12662e4>] __driver_attach+0x38/0x63
[ 0.398162]
[ 0.398162] but task is already holding lock:
[ 0.398393] (&dev->mutex){+.+...}, at: [<c12662d8>] __driver_attach+0x2c/0x63
[ 0.399999]
[ 0.399999] other info that might help us debug this:
[ 0.399999] 1 lock held by swapper/1:
[ 0.399999] #0: (&dev->mutex){+.+...}, at: [<c12662d8>] __driver_attach+0x2c/0x63
[ 0.399999]
[ 0.399999] stack backtrace:
[ 0.399999] Pid: 1, comm: swapper Not tainted 2.6.33-rc6-mm1 #1
[ 0.399999] Call Trace:
[ 0.399999] [<c13d5851>] ? printk+0xf/0x11
[ 0.399999] [<c105460c>] __lock_acquire+0x880/0xc0d
[ 0.399999] [<c13d66e4>] ? __mutex_unlock_slowpath+0xf0/0xff
[ 0.399999] [<c1054a34>] lock_acquire+0x9b/0xb8
[ 0.399999] [<c12662e4>] ? __driver_attach+0x38/0x63
[ 0.399999] [<c13d685c>] __mutex_lock_common+0x35/0x2f2
[ 0.399999] [<c12662e4>] ? __driver_attach+0x38/0x63
[ 0.399999] [<c13d6bb7>] mutex_lock_nested+0x30/0x38
[ 0.399999] [<c12662e4>] ? __driver_attach+0x38/0x63
[ 0.399999] [<c12662e4>] __driver_attach+0x38/0x63
[ 0.399999] [<c1265a43>] bus_for_each_dev+0x3d/0x67
[ 0.399999] [<c1265f8f>] driver_attach+0x14/0x16
[ 0.399999] [<c12662ac>] ? __driver_attach+0x0/0x63
[ 0.399999] [<c126541c>] bus_add_driver+0xc4/0x20c
[ 0.399999] [<c1266553>] driver_register+0x8b/0xeb
[ 0.399999] [<c11d8e3a>] acpi_bus_register_driver+0x3a/0x3c
[ 0.399999] [<c165dc9a>] acpi_ec_init+0x2b/0x4a
[ 0.399999] [<c165daf8>] acpi_init+0x289/0x2c8
[ 0.399999] [<c165d86f>] ? acpi_init+0x0/0x2c8
[ 0.399999] [<c1001139>] do_one_initcall+0x4c/0x13f
[ 0.399999] [<c163d315>] kernel_init+0x123/0x174
[ 0.399999] [<c163d1f2>] ? kernel_init+0x0/0x174
[ 0.399999] [<c10035ba>] kernel_thread_helper+0x6/0x10
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2010-02-08 03:06:20

by Cong Wang

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

Alan Stern wrote:
> On Fri, 5 Feb 2010, Peter Zijlstra wrote:
>
>> Right, so this device stuff is much more complicated than I was led to
>> believe ;-)
>
> Haven't I told you all along that tree-structured locking is
> complicated? :-)
>
>> So the device core doesn't know, so how are you guys making sure there
>> really are no deadlocks hidden in there somewhere?
>
> In the code I've seen, deadlocks are avoided by always taking the locks
> in the same order. But who knows? Maybe there _are_ some hidden
> deadlocks lurking. For now we can't rely on lockdep to find them,
> though, because it gets sidetracked by all the false positives.
>

This is almost the same with the sysfs case...

Thanks.

2010-02-08 03:06:57

by Cong Wang

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

Dave Young wrote:
> On Fri, Feb 05, 2010 at 04:41:57PM +0100, Peter Zijlstra wrote:
>> On Fri, 2010-02-05 at 10:30 -0500, Alan Stern wrote:
>>> On Fri, 5 Feb 2010, Peter Zijlstra wrote:
>>>
>>>> Right, so this device stuff is much more complicated than I was led to
>>>> believe ;-)
>>> Haven't I told you all along that tree-structured locking is
>>> complicated? :-)
>> Well, regular tree's aren't all that complicated, but multiple
>> inter-locking trees is a whole different story indeed.
>>
>
> I ever tried converting device semaphore to mutex, but failed with same issue.
>
> At least now there's no lockdep solution for it, so I recommend revert
> the mutex converting patch.
>
> following lockdep warning with rc6-mm1:
>
> [ 0.397123]
> [ 0.397124] =============================================
> [ 0.397359] [ INFO: possible recursive locking detected ]
> [ 0.397480] 2.6.33-rc6-mm1 #1
> [ 0.397596] ---------------------------------------------
> [ 0.397717] swapper/1 is trying to acquire lock:
> [ 0.397836] (&dev->mutex){+.+...}, at: [<c12662e4>] __driver_attach+0x38/0x63
> [ 0.398162]
> [ 0.398162] but task is already holding lock:
> [ 0.398393] (&dev->mutex){+.+...}, at: [<c12662d8>] __driver_attach+0x2c/0x63
> [ 0.399999]

Alan already provided a patch for this issue earlier in this thread.

Thanks.

2010-02-08 03:14:56

by Dave Young

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Mon, Feb 8, 2010 at 11:08 AM, Cong Wang <[email protected]> wrote:
> Dave Young wrote:
>>
>> On Fri, Feb 05, 2010 at 04:41:57PM +0100, Peter Zijlstra wrote:
>>>
>>> On Fri, 2010-02-05 at 10:30 -0500, Alan Stern wrote:
>>>>
>>>> On Fri, 5 Feb 2010, Peter Zijlstra wrote:
>>>>
>>>>> Right, so this device stuff is much more complicated than I was led to
>>>>> believe ;-)
>>>>
>>>> Haven't I told you all along that tree-structured locking is
>>>> complicated?  :-)
>>>
>>> Well, regular tree's aren't all that complicated, but multiple
>>> inter-locking trees is a whole different story indeed.
>>>
>>
>> I ever tried converting device semaphore to mutex, but failed with same
>> issue.
>>
>> At least now there's no lockdep solution for it, so I recommend revert
>> the mutex converting patch.
>>
>> following lockdep warning with rc6-mm1:
>>
>> [    0.397123] [    0.397124]
>> =============================================
>> [    0.397359] [ INFO: possible recursive locking detected ]
>> [    0.397480] 2.6.33-rc6-mm1 #1
>> [    0.397596] ---------------------------------------------
>> [    0.397717] swapper/1 is trying to acquire lock:
>> [    0.397836]  (&dev->mutex){+.+...}, at: [<c12662e4>]
>> __driver_attach+0x38/0x63
>> [    0.398162] [    0.398162] but task is already holding lock:
>> [    0.398393]  (&dev->mutex){+.+...}, at: [<c12662d8>]
>> __driver_attach+0x2c/0x63
>> [    0.399999]
>
> Alan already provided a patch for this issue earlier in this thread.

Yes, but device locks can not be classified with regular tree style.
Please read the whole thread.

>
> Thanks.
>
>



--
Regards
dave

2010-02-08 03:28:30

by Cong Wang

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

Dave Young wrote:
> On Mon, Feb 8, 2010 at 11:08 AM, Cong Wang <[email protected]> wrote:
>> Dave Young wrote:
>>> On Fri, Feb 05, 2010 at 04:41:57PM +0100, Peter Zijlstra wrote:
>>>> On Fri, 2010-02-05 at 10:30 -0500, Alan Stern wrote:
>>>>> On Fri, 5 Feb 2010, Peter Zijlstra wrote:
>>>>>
>>>>>> Right, so this device stuff is much more complicated than I was led to
>>>>>> believe ;-)
>>>>> Haven't I told you all along that tree-structured locking is
>>>>> complicated? :-)
>>>> Well, regular tree's aren't all that complicated, but multiple
>>>> inter-locking trees is a whole different story indeed.
>>>>
>>> I ever tried converting device semaphore to mutex, but failed with same
>>> issue.
>>>
>>> At least now there's no lockdep solution for it, so I recommend revert
>>> the mutex converting patch.
>>>
>>> following lockdep warning with rc6-mm1:
>>>
>>> [ 0.397123] [ 0.397124]
>>> =============================================
>>> [ 0.397359] [ INFO: possible recursive locking detected ]
>>> [ 0.397480] 2.6.33-rc6-mm1 #1
>>> [ 0.397596] ---------------------------------------------
>>> [ 0.397717] swapper/1 is trying to acquire lock:
>>> [ 0.397836] (&dev->mutex){+.+...}, at: [<c12662e4>]
>>> __driver_attach+0x38/0x63
>>> [ 0.398162] [ 0.398162] but task is already holding lock:
>>> [ 0.398393] (&dev->mutex){+.+...}, at: [<c12662d8>]
>>> __driver_attach+0x2c/0x63
>>> [ 0.399999]
>> Alan already provided a patch for this issue earlier in this thread.
>
> Yes, but device locks can not be classified with regular tree style.

True, Alan mentioned the device trees could be more than one,
which is the difference with the sysfs, I think, where we only
have one tree.

> Please read the whole thread.

Surely I did.

2010-02-08 15:38:56

by Alan Stern

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Fri, 5 Feb 2010, Peter Zijlstra wrote:

> > But for now perhaps a compromise is in
> > order. We could make the switch from semaphores to mutexes while
> > avoiding lockdep issues by assigning the device mutexes to a
> > "don't-verify" class. Is there such a thing, or could it be added?
>
> Something like the below might work, but it should go along with a
> checkpatch.pl mod to ensure we don't grow any new users (just don't feel
> like brushing up my perl fu enough to actually make sense of that
> script)

I tried this out and it does work; no more lockdep warnings during
bootup. The complete patch is below.

But before we apply this, it might be worthwhile to spend a little time
figuring out how many independent device trees there really are and
whether they can be handled with separate depth-associated lockdep
classes. It's not at all obvious that they can.

Alan Stern


Index: usb-2.6/include/linux/lockdep.h
===================================================================
--- usb-2.6.orig/include/linux/lockdep.h
+++ usb-2.6/include/linux/lockdep.h
@@ -40,6 +40,8 @@ struct lock_class_key {
struct lockdep_subclass_key subkeys[MAX_LOCKDEP_SUBCLASSES];
};

+extern struct lock_class_key __lockdep_no_validate__;
+
#define LOCKSTAT_POINTS 4

/*
Index: usb-2.6/kernel/lockdep.c
===================================================================
--- usb-2.6.orig/kernel/lockdep.c
+++ usb-2.6/kernel/lockdep.c
@@ -2716,6 +2716,8 @@ void lockdep_init_map(struct lockdep_map
}
EXPORT_SYMBOL_GPL(lockdep_init_map);

+struct lock_class_key __lockdep_no_validate__;
+
/*
* This gets called for every mutex_lock*()/spin_lock*() operation.
* We maintain the dependency maps and validate the locking attempt:
@@ -2750,6 +2752,9 @@ static int __lock_acquire(struct lockdep
return 0;
}

+ if (lock->key == &__lockdep_no_validate__)
+ check = 1;
+
if (!subclass)
class = lock->class_cache;
/*
Index: usb-2.6/drivers/base/core.c
===================================================================
--- usb-2.6.orig/drivers/base/core.c
+++ usb-2.6/drivers/base/core.c
@@ -559,6 +559,7 @@ void device_initialize(struct device *de
kobject_init(&dev->kobj, &device_ktype);
INIT_LIST_HEAD(&dev->dma_pools);
mutex_init(&dev->mutex);
+ lockdep_set_class(&dev->mutex, &__lockdep_no_validate__);
spin_lock_init(&dev->devres_lock);
INIT_LIST_HEAD(&dev->devres_head);
device_init_wakeup(dev, 0);

2010-02-26 19:36:35

by Alan Stern

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Thu, 4 Feb 2010, Thomas Gleixner wrote:

> On Thu, 4 Feb 2010, Greg KH wrote:
> > > However in view of the other difficulties, it still doesn't seem
> > > possible to make device mutexes work with lockdep. I suggest removing
> > > Thomas's patch.
> >
> > Unless Thomas or anyone else thinks of something that can solve these
> > problems, I will do so.
>
> Hmm, I have not seen those lockdep splats on -rt where we have the
> mutex conversion for quite some time already. Need to look at it.

Thomas, did you ever figure out what was going on? It would be nice to
know why the -rt kernels don't have any lockdep problems associated
with the device mutexes.

Alan Stern

2010-02-26 20:55:48

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [Patch 0/2] sysfs: fix s_active lockdep warning

On Fri, 26 Feb 2010, Alan Stern wrote:
> On Thu, 4 Feb 2010, Thomas Gleixner wrote:
>
> > On Thu, 4 Feb 2010, Greg KH wrote:
> > > > However in view of the other difficulties, it still doesn't seem
> > > > possible to make device mutexes work with lockdep. I suggest removing
> > > > Thomas's patch.
> > >
> > > Unless Thomas or anyone else thinks of something that can solve these
> > > problems, I will do so.
> >
> > Hmm, I have not seen those lockdep splats on -rt where we have the
> > mutex conversion for quite some time already. Need to look at it.
>
> Thomas, did you ever figure out what was going on? It would be nice to
> know why the -rt kernels don't have any lockdep problems associated
> with the device mutexes.

Yes, it just did not happen on the systems which I tested with lockdep
enabled, but I found a box where it triggered.

Sorry should have tested better. :(

Thanks,

tglx