2010-02-08 09:53:36

by Cong Wang

Subject: [Patch 1/2] sysfs: add lockdep class support to s_active

Several people have recently reported a lockdep warning from sysfs
during s2ram. It looks something like this:

[ 6967.926563] ACPI: Preparing to enter system sleep state S3
[ 6967.956156] Disabling non-boot CPUs ...
[ 6967.970401]
[ 6967.970408] =============================================
[ 6967.970419] [ INFO: possible recursive locking detected ]
[ 6967.970431] 2.6.33-rc2-git6 #27
[ 6967.970439] ---------------------------------------------
[ 6967.970450] pm-suspend/22147 is trying to acquire lock:
[ 6967.970460] (s_active){++++.+}, at: [<c10d2941>] sysfs_hash_and_remove+0x3d/0x4f
[ 6967.970493]
[ 6967.970497] but task is already holding lock:
[ 6967.970506] (s_active){++++.+}, at: [<c10d4110>] sysfs_get_active_two+0x16/0x36
[...]

Eric has already provided a patch for this [1], but it does not fix the
problem. Based on his work and Peter's suggestion, I wrote this patch,
which will hopefully fix the warning completely.

This patch puts the sysfs s_active lock into two classes, one for PM
attributes and one for everything else, so that lockdep can tell them
apart.
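
To illustrate why splitting the class silences the report, here is a minimal
user-space sketch. It is not kernel code: `toy_task`, `toy_acquire` and
`toy_release` are hypothetical names, and the check ignores the read/write
distinction and the dependency chains that the real validator tracks. It only
models the rule that holding two locks of the same class is flagged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Lock classes, mirroring the enum this patch adds to include/linux/sysfs.h. */
enum sysfs_attr_lock_class {
	SYSFS_ATTR_NORMAL,
	SYSFS_ATTR_PM_CONTROL,
	SYSFS_NR_CLASSES,
};

#define TOY_MAX_HELD 8

/* Hypothetical stand-in for the per-task list of held locks. */
struct toy_task {
	enum sysfs_attr_lock_class held[TOY_MAX_HELD];
	size_t depth;
};

/*
 * Record an acquisition. Returns false when the task already holds a
 * lock of the same class, which is the situation reported above as
 * "possible recursive locking detected". The lock is recorded either way.
 */
static bool toy_acquire(struct toy_task *t, enum sysfs_attr_lock_class cls)
{
	bool recursive = false;
	size_t i;

	assert(t->depth < TOY_MAX_HELD);
	assert(cls < SYSFS_NR_CLASSES);

	for (i = 0; i < t->depth; i++)
		if (t->held[i] == cls)
			recursive = true;

	t->held[t->depth++] = cls;
	return !recursive;
}

/* Drop the most recently acquired lock. */
static void toy_release(struct toy_task *t)
{
	assert(t->depth > 0);
	t->depth--;
}
```

In this model, taking a PM-class lock and then a normal-class lock passes the
check, while taking two normal-class locks does not. That corresponds to the
before and after of this patch for the s2ram path.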

Using a workqueue to do the cleanup work is another option, as Eric
pointed out. I am not sure whether it is better than this approach;
that depends on whether we want to eliminate all similar cases that
hold the same class of locks, or just this one case. Please comment.
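
For comparison, the workqueue alternative can be sketched in user space as
follows. This is only an illustration under my own assumptions: the cleanup is
queued from the store handler and runs after the handler has dropped s_active.
All `toy_*` names are hypothetical, and the queue is a plain array rather than
the kernel workqueue API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WORK 4

typedef void (*toy_work_fn)(void *data);

/* Hypothetical single-threaded stand-in for a workqueue. */
struct toy_workqueue {
	toy_work_fn fn[TOY_MAX_WORK];
	void *data[TOY_MAX_WORK];
	size_t count;
};

struct toy_state {
	bool s_active_held;       /* set while the store handler runs */
	bool removed;             /* the cleanup has run */
	bool removed_while_held;  /* the cleanup ran under s_active */
};

/* Queue a callback instead of running it immediately. */
static void toy_queue_work(struct toy_workqueue *wq, toy_work_fn fn, void *data)
{
	assert(wq->count < TOY_MAX_WORK);
	wq->fn[wq->count] = fn;
	wq->data[wq->count] = data;
	wq->count++;
}

/* Run all queued callbacks and return how many were executed. */
static size_t toy_flush(struct toy_workqueue *wq)
{
	size_t i, ran = 0;

	for (i = 0; i < wq->count; i++) {
		wq->fn[i](wq->data[i]);
		ran++;
	}
	wq->count = 0;
	return ran;
}

/* The cleanup: note whether it ran while s_active was still held. */
static void toy_remove_files(void *data)
{
	struct toy_state *st = data;

	st->removed = true;
	if (st->s_active_held)
		st->removed_while_held = true;
}

/* Store handler: holds s_active and only queues the removal. */
static void toy_store(struct toy_workqueue *wq, struct toy_state *st)
{
	st->s_active_held = true;
	toy_queue_work(wq, toy_remove_files, st);
	st->s_active_held = false;
}
```

The trade-off is that deferral avoids the nested acquisition altogether, while
the lock-class approach keeps the nesting and only tells lockdep that the two
locks are different.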

I tested this patch, and it fixes the problem.

1. http://lkml.org/lkml/2010/1/10/282


Reported-by: Larry Finger <[email protected]>
Reported-by: Miles Lane <[email protected]>
Reported-by: Heiko Carstens <[email protected]>
Signed-off-by: WANG Cong <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>

---
fs/sysfs/dir.c | 1 -
fs/sysfs/file.c | 7 +++++++
fs/sysfs/sysfs.h | 11 -----------
include/linux/sysfs.h | 7 +++++++
kernel/power/power.h | 15 ++++++++-------
5 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 699f371..d7de269 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -354,7 +354,6 @@ struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type)

atomic_set(&sd->s_count, 1);
atomic_set(&sd->s_active, 0);
- sysfs_dirent_init_lockdep(sd);

sd->s_name = name;
sd->s_mode = mode;
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index dc30d9e..97e397a 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -24,6 +24,8 @@

#include "sysfs.h"

+static struct lock_class_key sysfs_classes[SYSFS_NR_CLASSES];
+
/* used in crash dumps to help with debugging */
static char last_sysfs_file[PATH_MAX];
void sysfs_printk_last_file(void)
@@ -504,11 +506,16 @@ int sysfs_add_file_mode(struct sysfs_dirent *dir_sd,
struct sysfs_addrm_cxt acxt;
struct sysfs_dirent *sd;
int rc;
+ int class;

sd = sysfs_new_dirent(attr->name, mode, type);
if (!sd)
return -ENOMEM;
sd->s_attr.attr = (void *)attr;
+ class = SYSFS_ATTR_NORMAL;
+ if (sysfs_type(sd) == SYSFS_KOBJ_ATTR)
+ class = sd->s_attr.attr->class;
+ lockdep_set_class_and_name(sd, &sysfs_classes[class], "s_active");

sysfs_addrm_start(&acxt, dir_sd);
rc = sysfs_add_one(&acxt, sd);
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index cdd9377..dde4d73 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -88,17 +88,6 @@ static inline unsigned int sysfs_type(struct sysfs_dirent *sd)
return sd->s_flags & SYSFS_TYPE_MASK;
}

-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-#define sysfs_dirent_init_lockdep(sd) \
-do { \
- static struct lock_class_key __key; \
- \
- lockdep_init_map(&sd->dep_map, "s_active", &__key, 0); \
-} while(0)
-#else
-#define sysfs_dirent_init_lockdep(sd) do {} while(0)
-#endif
-
/*
* Context structure to be used while adding/removing nodes.
*/
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index cfa8308..2b91b74 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -20,6 +20,12 @@
struct kobject;
struct module;

+enum sysfs_attr_lock_class {
+ SYSFS_ATTR_NORMAL,
+ SYSFS_ATTR_PM_CONTROL,
+ SYSFS_NR_CLASSES,
+};
+
/* FIXME
* The *owner field is no longer used.
* x86 tree has been cleaned up. The owner
@@ -29,6 +35,7 @@ struct attribute {
const char *name;
struct module *owner;
mode_t mode;
+ enum sysfs_attr_lock_class class;
};

struct attribute_group {
diff --git a/kernel/power/power.h b/kernel/power/power.h
index 46c5a26..67a6fe7 100644
--- a/kernel/power/power.h
+++ b/kernel/power/power.h
@@ -54,13 +54,14 @@ extern int hibernation_platform_enter(void);
extern int pfn_is_nosave(unsigned long);

#define power_attr(_name) \
-static struct kobj_attribute _name##_attr = { \
- .attr = { \
- .name = __stringify(_name), \
- .mode = 0644, \
- }, \
- .show = _name##_show, \
- .store = _name##_store, \
+static struct kobj_attribute _name##_attr = { \
+ .attr = { \
+ .name = __stringify(_name), \
+ .mode = 0644, \
+ .class = SYSFS_ATTR_PM_CONTROL, \
+ }, \
+ .show = _name##_show, \
+ .store = _name##_store, \
}

/* Preferred image size in bytes (default 500 MB) */
--
1.5.5.6


2010-02-08 09:54:09

by Cong Wang

Subject: [Patch 2/2] block: add sysfs lockdep class for iosched


This is similar to the previous PM case: in iosched, we hold an
s_active lock while storing to "scheduler", and at the same time we
want to remove the "iosched/*" files.

This patch depends on the previous one. I tested it on my machine,
and it fixes the problem.
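
The macro conversion in the diff below can be tried out in user space with a
sketch like this one. `toy_attribute`, `toy_queue_entry`, `toy_queue_rw_attr`
and the `toy_requests_*` handlers are hypothetical stand-ins for the kernel
structures, the field is named `lock_class` here rather than `class`, and the
initial value of 128 is arbitrary. The point is only that one macro stamps the
same lock class onto every entry it defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum sysfs_attr_lock_class {
	SYSFS_ATTR_NORMAL,
	SYSFS_ATTR_PM_CONTROL,
	SYSFS_ATTR_IOSCHED,
	SYSFS_NR_CLASSES,
};

/* Hypothetical user-space stand-in for struct attribute. */
struct toy_attribute {
	const char *name;
	unsigned int mode;
	enum sysfs_attr_lock_class lock_class;
};

/* Hypothetical user-space stand-in for struct queue_sysfs_entry. */
struct toy_queue_entry {
	struct toy_attribute attr;
	int (*show)(char *page, size_t len);
	int (*store)(const char *page);
};

/* Define a read-write entry; 0644 corresponds to S_IRUGO | S_IWUSR. */
#define toy_queue_rw_attr(_name, _filename)			\
static struct toy_queue_entry _name##_entry = {			\
	.attr = {						\
		.name = _filename,				\
		.mode = 0644,					\
		.lock_class = SYSFS_ATTR_IOSCHED,		\
	},							\
	.show = _name##_show,					\
	.store = _name##_store,					\
}

static int toy_requests_value = 128;	/* arbitrary starting value */

/* Format the current value into page; returns the number of characters. */
static int toy_requests_show(char *page, size_t len)
{
	assert(page != NULL && len > 0);
	return snprintf(page, len, "%d\n", toy_requests_value);
}

/* Parse a decimal value from page; returns 0 on success, -1 on error. */
static int toy_requests_store(const char *page)
{
	assert(page != NULL);
	return sscanf(page, "%d", &toy_requests_value) == 1 ? 0 : -1;
}

toy_queue_rw_attr(toy_requests, "nr_requests");
```

As in the real patch, the handler names are derived from the first macro
argument, so `toy_requests_show` and `toy_requests_store` must exist before
the macro is used.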

Reported-by: Hugh Dickins <[email protected]>
Signed-off-by: WANG Cong <[email protected]>
Cc: Jens Axboe <[email protected]>

---
block/blk-sysfs.c | 120 +++++++++++++++----------------------------------
include/linux/sysfs.h | 1 +
2 files changed, 38 insertions(+), 83 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 8606c95..f863d4d 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -6,6 +6,7 @@
#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/blktrace_api.h>
+#include <linux/sysfs.h>

#include "blk.h"

@@ -254,105 +255,58 @@ static ssize_t queue_iostats_store(struct request_queue *q, const char *page,
return ret;
}

-static struct queue_sysfs_entry queue_requests_entry = {
- .attr = {.name = "nr_requests", .mode = S_IRUGO | S_IWUSR },
- .show = queue_requests_show,
- .store = queue_requests_store,
-};
-
-static struct queue_sysfs_entry queue_ra_entry = {
- .attr = {.name = "read_ahead_kb", .mode = S_IRUGO | S_IWUSR },
- .show = queue_ra_show,
- .store = queue_ra_store,
-};
+#define queue_sysfs_rw_attr(_name, _filename) \
+static struct queue_sysfs_entry _name##_entry = { \
+ .attr = { \
+ .name = _filename, \
+ .mode = S_IRUGO | S_IWUSR, \
+ .class = SYSFS_ATTR_IOSCHED, \
+ }, \
+ .show = _name##_show, \
+ .store = _name##_store, \
+}

-static struct queue_sysfs_entry queue_max_sectors_entry = {
- .attr = {.name = "max_sectors_kb", .mode = S_IRUGO | S_IWUSR },
- .show = queue_max_sectors_show,
- .store = queue_max_sectors_store,
-};
+#define queue_sysfs_ro_attr(_name, _filename) \
+static struct queue_sysfs_entry _name##_entry = { \
+ .attr = { \
+ .name = _filename, \
+ .mode = S_IRUGO, \
+ .class = SYSFS_ATTR_IOSCHED, \
+ }, \
+ .show = _name##_show, \
+}

-static struct queue_sysfs_entry queue_max_hw_sectors_entry = {
- .attr = {.name = "max_hw_sectors_kb", .mode = S_IRUGO },
- .show = queue_max_hw_sectors_show,
-};

-static struct queue_sysfs_entry queue_iosched_entry = {
- .attr = {.name = "scheduler", .mode = S_IRUGO | S_IWUSR },
- .show = elv_iosched_show,
- .store = elv_iosched_store,
-};
+queue_sysfs_rw_attr(queue_requests, "nr_requests");
+queue_sysfs_rw_attr(queue_ra, "read_ahead_kb");
+queue_sysfs_rw_attr(queue_max_sectors, "max_sectors_kb");
+queue_sysfs_ro_attr(queue_max_hw_sectors, "max_hw_sectors_kb");
+queue_sysfs_rw_attr(elv_iosched, "scheduler");
+queue_sysfs_ro_attr(queue_logical_block_size, "logical_block_size");

static struct queue_sysfs_entry queue_hw_sector_size_entry = {
.attr = {.name = "hw_sector_size", .mode = S_IRUGO },
.show = queue_logical_block_size_show,
};

-static struct queue_sysfs_entry queue_logical_block_size_entry = {
- .attr = {.name = "logical_block_size", .mode = S_IRUGO },
- .show = queue_logical_block_size_show,
-};
-
-static struct queue_sysfs_entry queue_physical_block_size_entry = {
- .attr = {.name = "physical_block_size", .mode = S_IRUGO },
- .show = queue_physical_block_size_show,
-};
+queue_sysfs_ro_attr(queue_physical_block_size, "physical_block_size");
+queue_sysfs_ro_attr(queue_io_min, "minimum_io_size");
+queue_sysfs_ro_attr(queue_io_opt, "optimal_io_size");
+queue_sysfs_ro_attr(queue_discard_granularity, "discard_granularity");
+queue_sysfs_ro_attr(queue_discard_max, "discard_max_bytes");
+queue_sysfs_ro_attr(queue_discard_zeroes_data, "discard_zeroes_data");

-static struct queue_sysfs_entry queue_io_min_entry = {
- .attr = {.name = "minimum_io_size", .mode = S_IRUGO },
- .show = queue_io_min_show,
-};
-
-static struct queue_sysfs_entry queue_io_opt_entry = {
- .attr = {.name = "optimal_io_size", .mode = S_IRUGO },
- .show = queue_io_opt_show,
-};
-
-static struct queue_sysfs_entry queue_discard_granularity_entry = {
- .attr = {.name = "discard_granularity", .mode = S_IRUGO },
- .show = queue_discard_granularity_show,
-};
-
-static struct queue_sysfs_entry queue_discard_max_entry = {
- .attr = {.name = "discard_max_bytes", .mode = S_IRUGO },
- .show = queue_discard_max_show,
-};
-
-static struct queue_sysfs_entry queue_discard_zeroes_data_entry = {
- .attr = {.name = "discard_zeroes_data", .mode = S_IRUGO },
- .show = queue_discard_zeroes_data_show,
-};
-
-static struct queue_sysfs_entry queue_nonrot_entry = {
- .attr = {.name = "rotational", .mode = S_IRUGO | S_IWUSR },
- .show = queue_nonrot_show,
- .store = queue_nonrot_store,
-};
-
-static struct queue_sysfs_entry queue_nomerges_entry = {
- .attr = {.name = "nomerges", .mode = S_IRUGO | S_IWUSR },
- .show = queue_nomerges_show,
- .store = queue_nomerges_store,
-};
-
-static struct queue_sysfs_entry queue_rq_affinity_entry = {
- .attr = {.name = "rq_affinity", .mode = S_IRUGO | S_IWUSR },
- .show = queue_rq_affinity_show,
- .store = queue_rq_affinity_store,
-};
-
-static struct queue_sysfs_entry queue_iostats_entry = {
- .attr = {.name = "iostats", .mode = S_IRUGO | S_IWUSR },
- .show = queue_iostats_show,
- .store = queue_iostats_store,
-};
+queue_sysfs_rw_attr(queue_nonrot, "rotational");
+queue_sysfs_rw_attr(queue_nomerges, "nomerges");
+queue_sysfs_rw_attr(queue_rq_affinity, "rq_affinity");
+queue_sysfs_rw_attr(queue_iostats, "iostats");

static struct attribute *default_attrs[] = {
&queue_requests_entry.attr,
&queue_ra_entry.attr,
&queue_max_hw_sectors_entry.attr,
&queue_max_sectors_entry.attr,
- &queue_iosched_entry.attr,
+ &elv_iosched_entry.attr,
&queue_hw_sector_size_entry.attr,
&queue_logical_block_size_entry.attr,
&queue_physical_block_size_entry.attr,
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index 2b91b74..3a91008 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -23,6 +23,7 @@ struct module;
enum sysfs_attr_lock_class {
SYSFS_ATTR_NORMAL,
SYSFS_ATTR_PM_CONTROL,
+ SYSFS_ATTR_IOSCHED,
SYSFS_NR_CLASSES,
};

--
1.5.5.6

2010-02-08 20:50:24

by Larry Finger

Subject: Re: [Patch 2/2] block: add sysfs lockdep class for iosched

On 02/08/2010 03:52 AM, Amerigo Wang wrote:
> Similar to the previous PM case, in iosched, we hold an s_active
> lock to store "scheduler", meanwhile we want to remove "iosched/*"
> files.
>
> This patch depends on the previous one. I tested it on my machine,
> it fixes the problem.
>
> Reported-by: Hugh Dickins <[email protected]>
> Signed-off-by: WANG Cong <[email protected]>
> Cc: Jens Axboe <[email protected]>

After applying the 2 patches to 2.6.33-rc7, I get the following:

ACPI: bus type pci registered
PCI: MMCONFIG for domain 0000 [bus 00-09] at [mem 0xe0000000-0xe09fffff] (base 0xe0000000)
PCI: MMCONFIG at [mem 0xe0000000-0xe09fffff] reserved in E820
PCI: Using configuration type 1 for base access
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
Pid: 1, comm: swapper Not tainted 2.6.33-rc7-Linus-00010-g6339204-dirty #181
Call Trace:
[<ffffffff8107c6e6>] __lock_acquire+0xf86/0x1d30
[<ffffffff81078e7f>] ? lockdep_init_map+0x5f/0x5d0
[<ffffffff8107d52b>] lock_acquire+0x9b/0x120
[<ffffffff81167a93>] ? sysfs_addrm_finish+0x43/0x70
[<ffffffff81167243>] sysfs_deactivate+0xc3/0x110
[<ffffffff81167a93>] ? sysfs_addrm_finish+0x43/0x70
[<ffffffff813124d3>] ? mutex_lock_nested+0x243/0x300
[<ffffffff81167a93>] sysfs_addrm_finish+0x43/0x70
[<ffffffff81167af6>] remove_dir+0x36/0x40
[<ffffffff81167b09>] sysfs_remove_subdir+0x9/0x10
[<ffffffff81168ff6>] sysfs_remove_group+0x66/0xf0
[<ffffffff81861555>] param_sysfs_init+0x102/0x277
[<ffffffff8124a5bd>] ? sysdev_create_file+0xd/0x10
[<ffffffff8130fe46>] ? register_cpu+0xa3/0xa5
[<ffffffff81861453>] ? param_sysfs_init+0x0/0x277
[<ffffffff810001d7>] do_one_initcall+0x37/0x190
[<ffffffff8184c6d0>] kernel_init+0x14f/0x1a5
[<ffffffff81003bd4>] kernel_thread_helper+0x4/0x10
[<ffffffff8131417c>] ? restore_args+0x0/0x30
[<ffffffff8184c581>] ? kernel_init+0x0/0x1a5
[<ffffffff81003bd0>] ? kernel_thread_helper+0x0/0x10

This dump does not occur with standard 2.6.33-rc7. As the above turns off the
locking correctness validator, I cannot really test to see what happens when
suspending.

Larry

2010-02-09 02:54:49

by Cong Wang

Subject: Re: [Patch 2/2] block: add sysfs lockdep class for iosched

Larry Finger wrote:
> On 02/08/2010 03:52 AM, Amerigo Wang wrote:
>> Similar to the previous PM case, in iosched, we hold an s_active
>> lock to store "scheduler", meanwhile we want to remove "iosched/*"
>> files.
>>
>> This patch depends on the previous one. I tested it on my machine,
>> it fixes the problem.
>>
>> Reported-by: Hugh Dickins <[email protected]>
>> Signed-off-by: WANG Cong <[email protected]>
>> Cc: Jens Axboe <[email protected]>
>
> After applying the 2 patches to 2.6.33-rc7, I get the following:
>
> ACPI: bus type pci registered
> PCI: MMCONFIG for domain 0000 [bus 00-09] at [mem 0xe0000000-0xe09fffff] (base 0xe0000000)
> PCI: MMCONFIG at [mem 0xe0000000-0xe09fffff] reserved in E820
> PCI: Using configuration type 1 for base access
> INFO: trying to register non-static key.
> the code is fine but needs lockdep annotation.
> turning off the locking correctness validator.
> Pid: 1, comm: swapper Not tainted 2.6.33-rc7-Linus-00010-g6339204-dirty #181
> Call Trace:
> [<ffffffff8107c6e6>] __lock_acquire+0xf86/0x1d30
> [<ffffffff81078e7f>] ? lockdep_init_map+0x5f/0x5d0
> [<ffffffff8107d52b>] lock_acquire+0x9b/0x120
> [<ffffffff81167a93>] ? sysfs_addrm_finish+0x43/0x70
> [<ffffffff81167243>] sysfs_deactivate+0xc3/0x110
> [<ffffffff81167a93>] ? sysfs_addrm_finish+0x43/0x70
> [<ffffffff813124d3>] ? mutex_lock_nested+0x243/0x300
> [<ffffffff81167a93>] sysfs_addrm_finish+0x43/0x70
> [<ffffffff81167af6>] remove_dir+0x36/0x40
> [<ffffffff81167b09>] sysfs_remove_subdir+0x9/0x10
> [<ffffffff81168ff6>] sysfs_remove_group+0x66/0xf0
> [<ffffffff81861555>] param_sysfs_init+0x102/0x277
> [<ffffffff8124a5bd>] ? sysdev_create_file+0xd/0x10
> [<ffffffff8130fe46>] ? register_cpu+0xa3/0xa5
> [<ffffffff81861453>] ? param_sysfs_init+0x0/0x277
> [<ffffffff810001d7>] do_one_initcall+0x37/0x190
> [<ffffffff8184c6d0>] kernel_init+0x14f/0x1a5
> [<ffffffff81003bd4>] kernel_thread_helper+0x4/0x10
> [<ffffffff8131417c>] ? restore_args+0x0/0x30
> [<ffffffff8184c581>] ? kernel_init+0x0/0x1a5
> [<ffffffff81003bd0>] ? kernel_thread_helper+0x0/0x10
>
> This dump does not occur with standard 2.6.33-rc7. As the above turns off the
> locking correctness validator, I cannot really test to see what happens when
> suspending.
>

Ouch! I forgot to add the annotations to sysfs dirs...

Thanks much for the report, I will send an updated version soon!
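
A rough user-space sketch of what goes wrong, and of one possible way to
cover directories. This is not the updated patch: all `toy_*` names are
hypothetical, and falling back to SYSFS_ATTR_NORMAL for non-attribute entries
is only an assumption for illustration. With patch 1/2, only files created
through sysfs_add_file_mode() get a key, so a directory reaches
sysfs_deactivate() without one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sysfs_attr_lock_class {
	SYSFS_ATTR_NORMAL,
	SYSFS_ATTR_PM_CONTROL,
	SYSFS_ATTR_IOSCHED,
	SYSFS_NR_CLASSES,
};

/* Hypothetical subset of the sysfs dirent types. */
enum toy_dirent_type { TOY_DIR, TOY_KOBJ_ATTR, TOY_KOBJ_LINK };

/* Hypothetical stand-in for struct lock_class_key. */
struct toy_key { char dummy; };

/* One statically allocated key per class, as in fs/sysfs/file.c. */
static struct toy_key toy_classes[SYSFS_NR_CLASSES];

struct toy_dirent {
	enum toy_dirent_type type;
	const struct toy_key *key;	/* NULL until annotated */
};

/* True if key is one of the statically allocated class keys. */
static bool toy_key_is_static(const struct toy_key *key)
{
	size_t i;

	for (i = 0; i < SYSFS_NR_CLASSES; i++)
		if (key == &toy_classes[i])
			return true;
	return false;
}

/*
 * Annotate every dirent, whatever its type. Attribute files use the
 * class requested by the attribute; everything else, including
 * directories, falls back to SYSFS_ATTR_NORMAL (an assumption).
 */
static void toy_annotate(struct toy_dirent *sd,
			 enum sysfs_attr_lock_class attr_class)
{
	enum sysfs_attr_lock_class cls = SYSFS_ATTR_NORMAL;

	assert(attr_class < SYSFS_NR_CLASSES);
	if (sd->type == TOY_KOBJ_ATTR)
		cls = attr_class;
	sd->key = &toy_classes[cls];
}
```

In this model an unannotated directory fails the `toy_key_is_static()` check,
which mirrors the "trying to register non-static key" message in the report.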