2008-07-18 19:58:13

by Eric Sandeen

[permalink] [raw]
Subject: [PATCH 2/2] lock block groups when initializing

I noticed when filling a 1T filesystem with 4 threads using the
fs_mark benchmark:

fs_mark -d /mnt/test -D 256 -n 100000 -t 4 -s 20480 -F -S 0

that I occasionally got checksum mismatch errors:

EXT4-fs error (device sdb): ext4_init_inode_bitmap: Checksum bad for group 6935

etc. I'd reliably get 4-5 of them during the run.

It appears that the problem is likely a race to init the bg's
when the uninit_bg feature is enabled.

With the patch below, which adds sb_bgl_locking around initialization,
I was able to complete several runs with no errors or warnings.

Signed-off-by: Eric Sandeen <[email protected]>
---

Index: linux-2.6/fs/ext4/balloc.c
===================================================================
--- linux-2.6.orig/fs/ext4/balloc.c 2008-07-18 12:37:19.826353478 -0500
+++ linux-2.6/fs/ext4/balloc.c 2008-07-18 12:37:40.731354157 -0500
@@ -321,12 +321,15 @@ ext4_read_block_bitmap(struct super_bloc
if (bh_uptodate_or_lock(bh))
return bh;

+ spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
ext4_init_block_bitmap(sb, bh, block_group, desc);
set_buffer_uptodate(bh);
unlock_buffer(bh);
+ spin_unlock(sb_bgl_lock(EXT4_SB(sb), block_group));
return bh;
}
+ spin_unlock(sb_bgl_lock(EXT4_SB(sb), block_group));
if (bh_submit_read(bh) < 0) {
put_bh(bh);
ext4_error(sb, __func__,
Index: linux-2.6/fs/ext4/ialloc.c
===================================================================
--- linux-2.6.orig/fs/ext4/ialloc.c 2008-07-18 12:37:19.828353479 -0500
+++ linux-2.6/fs/ext4/ialloc.c 2008-07-18 12:37:40.732353959 -0500
@@ -118,12 +118,15 @@ ext4_read_inode_bitmap(struct super_bloc
if (bh_uptodate_or_lock(bh))
return bh;

+ spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
ext4_init_inode_bitmap(sb, bh, block_group, desc);
set_buffer_uptodate(bh);
unlock_buffer(bh);
+ spin_unlock(sb_bgl_lock(EXT4_SB(sb), block_group));
return bh;
}
+ spin_unlock(sb_bgl_lock(EXT4_SB(sb), block_group));
if (bh_submit_read(bh) < 0) {
put_bh(bh);
ext4_error(sb, __func__,
@@ -735,7 +738,7 @@ got:

/* When marking the block group with
* ~EXT4_BG_INODE_UNINIT we don't want to depend
- * on the value of bg_itable_unsed even though
+ * on the value of bg_itable_unused even though
* mke2fs could have initialized the same for us.
* Instead we calculated the value below
*/
Index: linux-2.6/fs/ext4/mballoc.c
===================================================================
--- linux-2.6.orig/fs/ext4/mballoc.c 2008-07-18 14:46:10.873291452 -0500
+++ linux-2.6/fs/ext4/mballoc.c 2008-07-18 14:56:11.329291114 -0500
@@ -787,13 +787,16 @@ static int ext4_mb_init_cache(struct pag
if (bh_uptodate_or_lock(bh[i]))
continue;

+ spin_lock(sb_bgl_lock(EXT4_SB(sb), first_group + i));
if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
ext4_init_block_bitmap(sb, bh[i],
first_group + i, desc);
set_buffer_uptodate(bh[i]);
unlock_buffer(bh[i]);
+ spin_unlock(sb_bgl_lock(EXT4_SB(sb), first_group + i));
continue;
}
+ spin_unlock(sb_bgl_lock(EXT4_SB(sb), first_group + i));
get_bh(bh[i]);
bh[i]->b_end_io = end_buffer_read_sync;
submit_bh(READ, bh[i]);Index: linux-2.6/fs/ext4/super.c
===================================================================
--- linux-2.6.orig/fs/ext4/super.c 2008-07-18 12:35:13.110291141 -0500
+++ linux-2.6/fs/ext4/super.c 2008-07-18 12:37:40.784353748 -0500
@@ -1621,6 +1621,7 @@ static int ext4_check_descriptors(struct
"(block %llu)!", i, inode_table);
return 0;
}
+ spin_lock(sb_bgl_lock(sbi, i));
if (!ext4_group_desc_csum_verify(sbi, i, gdp)) {
printk(KERN_ERR "EXT4-fs: ext4_check_descriptors: "
"Checksum for group %lu failed (%u!=%u)\n",
@@ -1628,6 +1629,7 @@ static int ext4_check_descriptors(struct
gdp)), le16_to_cpu(gdp->bg_checksum));
return 0;
}
+ spin_unlock(sb_bgl_lock(sbi, i));
if (!flexbg_flag)
first_block += EXT4_BLOCKS_PER_GROUP(sb);
}



2008-07-19 14:50:59

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 2/2] lock block groups when initializing

I've added both of these patches to the ext4 patch queue for testing.

- Ted

2008-07-19 16:11:07

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH 2/2] lock block groups when initializing

Theodore Tso wrote:
> I've added both of these patches to the ext4 patch queue for testing.
>
> - Ted

Thanks, I'm hesitant to add my own patches w/o some review first :)

-Eric

2008-07-19 17:19:31

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 2/2] lock block groups when initializing

On Sat, Jul 19, 2008 at 11:11:04AM -0500, Eric Sandeen wrote:
> Theodore Tso wrote:
> > I've added both of these patches to the ext4 patch queue for testing.
> >
> > - Ted
>
> Thanks, I'm hesitant to add my own patches w/o some review first :)
>

I haven't done a full review yet; just a "quick eyeball" of the patch,
and then I put it on my laptop and noted that it hadn't destroyed my
home directory. :-)

I'll try to do a full review in the few days; I'll move it into the
stable queue and push it and a few other patches to the mainline
(although it appears that Linus went on vacation before pushing out
-rc1; Alan mentioned in another message that Linus might be gone for
"a week or two", which surprises me if he really went on vacation for
that length of time without pushing out -rc1 first.)

- Ted


2008-07-21 05:11:47

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH 2/2] lock block groups when initializing

On Jul 18, 2008 14:58 -0500, Eric Sandeen wrote:
> I noticed when filling a 1T filesystem with 4 threads using the
> fs_mark benchmark that I occasionally got checksum mismatch errors:
>
> It appears that the problem is likely a race to init the bg's
> when the uninit_bg feature is enabled.
>
> With the patch below, which adds sb_bgl_locking around initialization,
> I was able to complete several runs with no errors or warnings.

Thanks for finding this.

> Signed-off-by: Eric Sandeen <[email protected]>

You can add Acked-by: Andreas Dilger <[email protected]>

> @@ -321,12 +321,15 @@ ext4_read_block_bitmap(struct super_bloc
> if (bh_uptodate_or_lock(bh))
> return bh;
>
> + spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
> if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
> ext4_init_block_bitmap(sb, bh, block_group, desc);
> set_buffer_uptodate(bh);
> unlock_buffer(bh);
> + spin_unlock(sb_bgl_lock(EXT4_SB(sb), block_group));
> return bh;
> }

Since this flag will only ever change from set to unset, and will in
many cases not be set, we should first check if it is set without the
spin_lock(), and then if set re-check under the lock. This avoids putting
an extra lock in this path. While it may seem that we are already locking
and slow because of the locked buffer_head, the sb_bgl_lock() is hashed
so many groups will share the same lock and we may cause contention on
other CPUs accessing other groups needlessly.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.