2009-02-20 23:31:39

by Xiang Wang

[permalink] [raw]
Subject: fsck errors encountered when applying patch "ext4: fix BUG when calling ext4_error with locked block group"

We are recently picking some patches selectively from the ext4-stable
branch of the ext4 git tree and applied them to our internal ext4
tree(mostly based on a 2.6.26 kernel with some of our own changes),
and when we applied the following patch:

"ext4: fix BUG when calling ext4_error with locked block group"
http://git.kernel.org/?p=linux/kernel/git/tytso/ext4.git;a=commit;h=be8f3df12cddeb352dd624fba9bf46a2de5711f3

We hit filesystem errors reported by fsck after we run dbench, an
example of the error is as follows:
// run dbench
dbench complete!
starting fsck...
e2fsck 1.41.3 (12-Oct-2008)
/dev/hdk3 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free inodes count wrong (45793274, counted=45793269).
Fix? no


/dev/hdk3: ********** WARNING: Filesystem still has errors **********

/dev/hdk3: 6/45793280 files (0.0% non-contiguous), 2891716/183143000 blocks

This problem seems to be the number of free inodes stored in the ext4
super block does not match the number counted by reading the inode
bitmaps.

Then I looked into the patch, especially the diff in the
ext4_commit_super in fs/ext4/super.c
http://git.kernel.org/?p=linux/kernel/git/tytso/ext4.git;a=blobdiff;f=fs/ext4/super.c;h=c53cab1e0a7fca1d9406f9bbb7c9cb661bae0567;hp=ed0406de6cae7379df9f72f272ade1a18df3966b;hb=be8f3df12cddeb352dd624fba9bf46a2de5711f3;hpb=e8470671cf71ec6361b71b3c95a1a1392c5cfa75

@@ -2868,8 +2906,11 @@ static void ext4_commit_super(struct super_block *sb,
set_buffer_uptodate(sbh);
}
es->s_wtime = cpu_to_le32(get_seconds());
- ext4_free_blocks_count_set(es, ext4_count_free_blocks(sb));
- es->s_free_inodes_count = cpu_to_le32(ext4_count_free_inodes(sb));
+ ext4_free_blocks_count_set(es, percpu_counter_sum_positive(
+ &EXT4_SB(sb)->s_freeblocks_counter));
+ es->s_free_inodes_count = cpu_to_le32(percpu_counter_sum_positive(
+ &EXT4_SB(sb)->s_freeinodes_counter));
+
BUFFER_TRACE(sbh, "marking dirty");
mark_buffer_dirty(sbh);
if (sync) {

seems like the new code only looks into the s_freeinodes_counter field
while the old code calls ext4_count_free_inodes(sb) and calculates the
count by adding up the free inode number from each block group.

So I tried reverting this particular portion of the patch, and reran
the dbench with the newly built kernel a couple of times, and the fsck
showed the file system to be clean.

I am just curious to see if anyone has ever seen this problem as I do
and whether there is a later fix for this. Of course, since I did not
apply all the patches from the ext4-stable branch, nor did I apply
patches on a public ext4 tree(I am only working on our internal tree),
that might already be a big problem. Still I would like to see why my
reverting this portion of the patch seems to be a temporarily fix?

thanks,
Xiang


2009-02-21 01:30:01

by Theodore Ts'o

[permalink] [raw]
Subject: Re: fsck errors encountered when applying patch "ext4: fix BUG when calling ext4_error with locked block group"

On Fri, Feb 20, 2009 at 03:31:36PM -0800, Xiang Wang wrote:
> We hit filesystem errors reported by fsck after we run dbench, an
> example of the error is as follows:
> // run dbench
> dbench complete!
> starting fsck...
> e2fsck 1.41.3 (12-Oct-2008)
> /dev/hdk3 contains a file system with errors, check forced.

Hmm, that's interesting. Can you check the system logs and see what
sort of errors were flagged by ext4? There may have been some sort of
issue which caused the filesystem to set the ERROR_FS flag. It would
be useful to see what the filesystem complained about.

> This problem seems to be the number of free inodes stored in the ext4
> super block does not match the number counted by reading the inode
> bitmaps.
>
> Then I looked into the patch, especially the diff in the
> ext4_commit_super in fs/ext4/super.c
> http://git.kernel.org/?p=linux/kernel/git/tytso/ext4.git;a=blobdiff;f=fs/ext4/super.c;h=c53cab1e0a7fca1d9406f9bbb7c9cb661bae0567;hp=ed0406de6cae7379df9f72f272ade1a18df3966b;hb=be8f3df12cddeb352dd624fba9bf46a2de5711f3;hpb=e8470671cf71ec6361b71b3c95a1a1392c5cfa75
>
> @@ -2868,8 +2906,11 @@ static void ext4_commit_super(struct super_block *sb,
> set_buffer_uptodate(sbh);
> }
> es->s_wtime = cpu_to_le32(get_seconds());
> - ext4_free_blocks_count_set(es, ext4_count_free_blocks(sb));
> - es->s_free_inodes_count = cpu_to_le32(ext4_count_free_inodes(sb));
> + ext4_free_blocks_count_set(es, percpu_counter_sum_positive(
> + &EXT4_SB(sb)->s_freeblocks_counter));
> + es->s_free_inodes_count = cpu_to_le32(percpu_counter_sum_positive(
> + &EXT4_SB(sb)->s_freeinodes_counter));
> +
> BUFFER_TRACE(sbh, "marking dirty");
> mark_buffer_dirty(sbh);
> if (sync) {
>
> seems like the new code only looks into the s_freeinodes_counter field
> while the old code calls ext4_count_free_inodes(sb) and calculates the
> count by adding up the free inode number from each block group.

I'm guessing the problem here is that you didn't take these commits:

02d211688727ad02bb4555b1aa8ae2de16b21b39
71c5576fbd809f2015f4eddf72e501e298720cf3

... which fixes a problem which would cause inaccuracies in the percpu
counters due to races that would be tend to show up under a workload
such as dbench. Read the commit description of these two commits very
carefully, since there are potentially other users of the percpu
counters, and this changes the semantics of percpu_counter_sum().

Note that the percpu_counter race conditions can lead to problems
where the delayed allocation code might not notice that the disk has
filled, leading to data being lost without the application getting an
ENOSPC return from a write() system call.

One of the downsides of trying to backport a lot of changes to a
kernel version as old as 2.6.26 is that you miss changes like this.
You might want to think about leapfrogging to a newer kernel at some
point....

- Ted


2009-02-21 01:37:41

by Michael Rubin

[permalink] [raw]
Subject: Re: fsck errors encountered when applying patch "ext4: fix BUG when calling ext4_error with locked block group"

On Fri, Feb 20, 2009 at 5:29 PM, Theodore Tso <[email protected]> wrote:
> One of the downsides of trying to backport a lot of changes to a
> kernel version as old as 2.6.26 is that you miss changes like this.
> You might want to think about leapfrogging to a newer kernel at some
> point....

We know. Our qualifying process is not the most light weight and the
kernel moves fast. Normally we take a snapshot and qualify it, trying
to take upstream patches when we can and then also publishing bugs we
find. The problem is that with ext4 still undergoing active dev we
want to be able to keep our ext4 portion of the tree as up to date as
possible.

I admit it's a bit of a balancing act.

Hopefully we won't be causing a lot of undue burden on this list.

Thanks for this.

Michael Rubin

2009-02-21 05:00:19

by Theodore Ts'o

[permalink] [raw]
Subject: Re: fsck errors encountered when applying patch "ext4: fix BUG when calling ext4_error with locked block group"

On Fri, Feb 20, 2009 at 05:37:35PM -0800, Michael Rubin wrote:
> We know. Our qualifying process is not the most light weight and the
> kernel moves fast. Normally we take a snapshot and qualify it, trying
> to take upstream patches when we can and then also publishing bugs we
> find. The problem is that with ext4 still undergoing active dev we
> want to be able to keep our ext4 portion of the tree as up to date as
> possible.

I understand, and it's not a burden to answer questions like this. I
was just pointing out the effort that it will likely take to backport
the percpu counter patches, since you will need to scan the your
sources and make sure the behavioural changes in percpu_counter_sum
isn't going to cause problems for you elsewhere, and that this sort of
thing is probably going to get harder as time goes by, not easier. I
know how painful it can be, since I've been having a hard time
backporting fixes to the 2.6.27 stable tree.

The good news is that ext4 development is settling down, so if you
manage to take another snapshot around 2.6.29 or 2.6.30, I suspect
life will be much easier (at least as far as backporting patches for
ext4 is concerned.)

Best regards,

- Ted

2009-02-25 18:33:56

by Xiang Wang

[permalink] [raw]
Subject: Re: fsck errors encountered when applying patch "ext4: fix BUG when calling ext4_error with locked block group"

Hey Ted,

Good news!
You are right! Taking the two commits as you suggested cleared our
bug. Thank you very much for pointing that out.

We may still be back-porting patches for a while until our internal
tree catches up with 2.6.29 or 2.6.30.
But we definitely will be much more careful in taking patches.

Thanks,
Xiang

On Fri, Feb 20, 2009 at 5:50 PM, Theodore Tso <[email protected]> wrote:
> On Fri, Feb 20, 2009 at 05:37:35PM -0800, Michael Rubin wrote:
>> We know. Our qualifying process is not the most light weight and the
>> kernel moves fast. Normally we take a snapshot and qualify it, trying
>> to take upstream patches when we can and then also publishing bugs we
>> find. The problem is that with ext4 still undergoing active dev we
>> want to be able to keep our ext4 portion of the tree as up to date as
>> possible.
>
> I understand, and it's not a burden to answer questions like this. ?I
> was just pointing out the effort that it will likely take to backport
> the percpu counter patches, since you will need to scan the your
> sources and make sure the behavioural changes in percpu_counter_sum
> isn't going to cause problems for you elsewhere, and that this sort of
> thing is probably going to get harder as time goes by, not easier. ?I
> know how painful it can be, since I've been having a hard time
> backporting fixes to the 2.6.27 stable tree.
>
> The good news is that ext4 development is settling down, so if you
> manage to take another snapshot around 2.6.29 or 2.6.30, I suspect
> life will be much easier (at least as far as backporting patches for
> ext4 is concerned.)
>
> Best regards,
>
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?- Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>