2008-11-21 16:41:28

by Aneesh Kumar K.V

Subject: [PATCH -V4] ext4: Fix lockdep recursive locking warning

Indicate that the group locks can be taken in loop.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/mballoc.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 7293209..1fa311c 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2413,6 +2413,9 @@ ext4_mb_store_history(struct ext4_allocation_context *ac)
#define ext4_mb_history_init(sb)
#endif

+#ifdef CONFIG_LOCKDEP
+static struct lock_class_key alloc_sem_key[NR_BG_LOCKS];
+#endif

/* Create and initialize ext4_group_info data for the given group. */
int ext4_mb_add_groupinfo(struct super_block *sb, ext4_group_t group,
@@ -2473,8 +2476,14 @@ int ext4_mb_add_groupinfo(struct super_block *sb, ext4_group_t group,
}

INIT_LIST_HEAD(&meta_group_info[i]->bb_prealloc_list);
- init_rwsem(&meta_group_info[i]->alloc_sem);
+#ifdef CONFIG_LOCKDEP
+ __init_rwsem(&meta_group_info[i]->alloc_sem,
+ "&meta_group_info[i]->alloc_sem",
+ &alloc_sem_key[i]);
meta_group_info[i]->bb_free_root.rb_node = NULL;;
+#else
+ init_rwsem(&meta_group_info[i]->alloc_sem);
+#endif

#ifdef DOUBLE_CHECK
{
--
1.6.0.4.735.gea4f



2008-11-21 16:48:37

by Aneesh Kumar K.V

Subject: patchqueue update

Hi Ted,

Along with this change you can drop the patch
aneesh-8-fix-double-free-of-blocks from the patchqueue.
The changes are not needed. We were finding double frees
due to a race in the uninit bg code, which I am fixing in the
series sent after this mail.

-aneesh

2008-11-22 23:00:47

by Theodore Ts'o

Subject: Re: [PATCH -V4] ext4: Fix lockdep recursive locking warning

On Fri, Nov 21, 2008 at 10:10:46PM +0530, Aneesh Kumar K.V wrote:
> Indicate that the group locks can be taken in loop.

I've been looking at this patch more closely, and I think there's a
major problem here. You've statically declared alloc_sem_key to be
NR_BG_LOCKS:

> +#ifdef CONFIG_LOCKDEP
> +static struct lock_class_key alloc_sem_key[NR_BG_LOCKS];
> +#endif

NR_BG_LOCKS is defined in include/linux/blockgroup_lock.h, and is 4 if
NR_CPUS is 1 or 2, 8 if NR_CPUS is 3, 16 if NR_CPUS is between 4 and
7, 32 if NR_CPUS is between 8 and 15, and so on.

It gets used this way:

> +#ifdef CONFIG_LOCKDEP
> + __init_rwsem(&meta_group_info[i]->alloc_sem,
> + "&meta_group_info[i]->alloc_sem",
> + &alloc_sem_key[i]);

But i is set thusly:

i = group & (EXT4_DESC_PER_BLOCK(sb) - 1);

which means i is between 0 and 127 if the filesystem has a 4k block
size....

It's also not clear to me that this will do the right thing if there
are multiple ext4 filesystems mounted. Since we are using a static
array for the lockdep class keys, that means that sb->s_group_info[x]
for one filesystem is considered in the same lockdep class as
sb->s_group_info[x] for another filesystem. This could cause false
positives if there are multiple ext4 filesystems mounted and two CPUs
are simultaneously accessing the filesystems and then access the two
s_group_info structures in different orders. Am I missing something?

- Ted


2008-11-23 04:44:19

by Theodore Ts'o

Subject: Re: [PATCH -V4] ext4: Fix lockdep recursive locking warning

On Sat, Nov 22, 2008 at 03:46:25PM -0500, Theodore Tso wrote:
> On Fri, Nov 21, 2008 at 10:10:46PM +0530, Aneesh Kumar K.V wrote:
> > Indicate that the group locks can be taken in loop.
>
> I've been looking at this patch more closely, and I think there's a
> major problem here.

OK, after looking at this in yet more detail (and having changed
planes in Dallas :-), I am more than ever convinced this patch is not
right. We have an rw_sem for each block group, grp->alloc_sem, which
is allocated in groups of meta blockgroups. The whole reason we
should keep them in the same class is that if, for some reason, the
multiblock allocator takes two block groups' alloc_sems, and one code
path takes them out of order (say, bg 4, then bg 2, while another
does bg 2, then bg 4), we would get a deadlock.

I'm guessing that what caused the problem for you was
ext4_mb_init_group(), which if you are using 1k filesystems, tries to
grab multiple grp->alloc_sem's. In each place where we find those, we
need to use down_write_nested --- see Documentation/lockdep-design.txt.

If there are any other places in mballoc.c which grab multiple
alloc_sem's at the same time, we'll have to define new subclasses.

- Ted

2008-11-23 16:39:01

by Aneesh Kumar K.V

Subject: Re: [PATCH -V4] ext4: Fix lockdep recursive locking warning

On Sat, Nov 22, 2008 at 09:49:11PM -0500, Theodore Tso wrote:
> On Sat, Nov 22, 2008 at 03:46:25PM -0500, Theodore Tso wrote:
> > On Fri, Nov 21, 2008 at 10:10:46PM +0530, Aneesh Kumar K.V wrote:
> > > Indicate that the group locks can be taken in loop.
> >
> > I've been looking at this patch more closely, and I think there's a
> > major problem here.
>
> OK, after looking at this in yet more detail (and having changed
> planes in Dallas :-), I am more than ever convinced this patch is not
> right. We have an rw_sem for each block group, grp->alloc_sem, which
> is allocated in groups of meta blockgroups. The whole reason we
> should keep them in the same class is that if, for some reason, the
> multiblock allocator takes two block groups' alloc_sems, and one code
> path takes them out of order (say, bg 4, then bg 2, while another
> does bg 2, then bg 4), we would get a deadlock.
>
> I'm guessing that what caused the problem for you was
> ext4_mb_init_group(), which if you are using 1k filesystems, tries to
> grab multiple grp->alloc_sem's. In each place where we find those, we
> need to use down_write_nested --- see Documentation/lockdep-design.txt.

Correct

>
> If there are any other places in mballoc.c which grab multiple
> alloc_sem's at the same time, we'll have to define new subclasses.

No. That is the only call site.

How about the patch below? We can have more than 2 groups in a page
depending on the page size and blocksize. So instead of using
single_depth I guess we should use the relative group number?

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 1fa311c..891ce41 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1783,7 +1783,7 @@ static int ext4_mb_init_group(struct super_block *sb, ext4_group_t group)
* no block allocation going on in any
* of that groups
*/
- down_write(&grp->alloc_sem);
+ down_write_nested(&grp->alloc_sem, i);
}
/*
* make sure we look at only those groups

2008-11-23 18:35:39

by Theodore Ts'o

Subject: Re: [PATCH -V4] ext4: Fix lockdep recursive locking warning

On Sun, Nov 23, 2008 at 10:03:49PM +0530, Aneesh Kumar K.V wrote:
>
> How about the patch below? We can have more than 2 groups in a page
> depending on the page size and blocksize. So instead of using
> single_depth I guess we should use the relative group number?

That should work. The maximum number of subclasses that we can have
by default is 8. With 16k pages, that will barely be enough for 1k
blocksize file systems (since we lock alloc_sem for
page_size/(2*fs_block_size) block groups). If we need more than that,
we might be better off just locking the entire filesystem against
block allocations, since after all this is a pretty rare case; it's
used only when we resize or when the filesystem is getting mounted.

- Ted

2008-11-24 05:05:38

by Theodore Ts'o

Subject: Re: [PATCH -V4] ext4: Fix lockdep recursive locking warning

I've added your patch to the patch queue, using the following commit
comment, and using it to replace
aneesh-9-fix-lockdeep-recursive-locking-warning in the patch queue.

Please note that the commit description explains the problem you
were trying to solve, gives some notes about why the fix works, and
says what the limitations of the approach might be. This is the kind
of commit log we should strive for. We've been complimented for the
clarity of our commit logs, and much of that is because I've been
rewriting the changelog messages. If everyone who submits patches
could strive to meet similar standards, I'd greatly appreciate it.

- Ted

ext4: Fix lockdep recursive locking warning

From: "Aneesh Kumar K.V" <[email protected]>

In ext4_mb_init_group(), if the filesystem block size is less than
PAGE_SIZE/2, the code tries to grab alloc_sem for multiple block
groups in a loop. We need to allow for this by using
down_write_nested() and passing in the loop index as a lock subclass
number. This works because no other code path needs to take multiple
alloc_sem's. Note that lockdep will fail for filesystem blocksizes
smaller than PAGE_SIZE/16 (e.g., a 1k filesystem blocksize with
a 32k page size, or a 2k filesystem blocksize with a 64k page size,
etc.).