2013-10-22 07:45:59

by Akira Fujita

[permalink] [raw]
Subject: [PATCH] mke2fs: Fix block bitmaps initalization with -O ^resize_inode

If we create ext4 filesystem without resize_inode feature,
mke2fs command does not initialize block groups
which have backup superblock and/or group descriptor block
(With meta_bg feature, backup superblock and group descriptor block
are located separately).
So we have to fix block bitmaps when we run "e2fsck -b superblock device".
This patch fixes the issue by initializing block bitmaps correctly.

Steps to reproduce:
1. mke2fs -t ext4 -b 4096 -O ^resize_inode device
2. e2fsck -b 32768 DEV
<snip>
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: +(32768--32769) +(98304--98305) +(163840--163841)
Fix<y>?

Signed-off-by: Akira Fujita <[email protected]>
---
lib/ext2fs/alloc_sb.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/lib/ext2fs/alloc_sb.c b/lib/ext2fs/alloc_sb.c
index 223ec51..5419d0d 100644
--- a/lib/ext2fs/alloc_sb.c
+++ b/lib/ext2fs/alloc_sb.c
@@ -58,23 +58,25 @@ int ext2fs_reserve_super_and_bgd(ext2_filsys fs,
old_desc_blocks =
fs->desc_blocks + fs->super->s_reserved_gdt_blocks;

- if (super_blk || (group == 0))
+ if (super_blk || (group == 0)) {
+ ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);
ext2fs_mark_block_bitmap2(bmap, super_blk);
+ }
if ((group == 0) && (fs->blocksize == 1024) &&
EXT2FS_CLUSTER_RATIO(fs) > 1)
ext2fs_mark_block_bitmap2(bmap, 0);

if (old_desc_blk) {
- if (fs->super->s_reserved_gdt_blocks && fs->block_map == bmap)
- ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);
num_blocks = old_desc_blocks;
if (old_desc_blk + num_blocks >= ext2fs_blocks_count(fs->super))
num_blocks = ext2fs_blocks_count(fs->super) -
old_desc_blk;
ext2fs_mark_block_bitmap_range2(bmap, old_desc_blk, num_blocks);
}
- if (new_desc_blk)
+ if (new_desc_blk) {
+ ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);
ext2fs_mark_block_bitmap2(bmap, new_desc_blk);
+ }

num_blocks = ext2fs_group_blocks_count(fs, group);
num_blocks -= 2 + fs->inode_blocks_per_group + used_blks;



2013-11-22 08:10:17

by Akira Fujita

[permalink] [raw]
Subject: RE: [PATCH] mke2fs: Fix block bitmaps initalization with -O ^resize_inode

Hi Ted,

Could you review this patch?

Best,
Akira Fujita

>-----Original Message-----
>From: [email protected] [mailto:[email protected]] On Behalf Of
>Akira Fujita
>Sent: Tuesday, October 22, 2013 4:40 PM
>To: Theodore Tso; ext4 development
>Subject: [PATCH] mke2fs: Fix block bitmaps initalization with -O ^resize_inode
>
>If we create ext4 filesystem without resize_inode feature, mke2fs command does not initialize
block
>groups which have backup superblock and/or group descriptor block (With meta_bg feature,
backup
>superblock and group descriptor block are located separately).
>So we have to fix block bitmaps when we run "e2fsck -b superblock device".
>This patch fixes the issue by initializing block bitmaps correctly.
>
>Steps to reproduce:
>1. mke2fs -t ext4 -b 4096 -O ^resize_inode device 2. e2fsck -b 32768 DEV
> <snip>
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Block bitmap differences: +(32768--32769) +(98304--98305) +(163840--163841)
> Fix<y>?
>
>Signed-off-by: Akira Fujita <[email protected]>
>---
> lib/ext2fs/alloc_sb.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/lib/ext2fs/alloc_sb.c
>b/lib/ext2fs/alloc_sb.c index 223ec51..5419d0d 100644
>--- a/lib/ext2fs/alloc_sb.c
>+++ b/lib/ext2fs/alloc_sb.c
>@@ -58,23 +58,25 @@ int ext2fs_reserve_super_and_bgd(ext2_filsys fs,
> old_desc_blocks =
> fs->desc_blocks + fs->super->s_reserved_gdt_blocks;
>
>- if (super_blk || (group == 0))
>+ if (super_blk || (group == 0)) {
>+ ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);
> ext2fs_mark_block_bitmap2(bmap, super_blk);
>+ }
> if ((group == 0) && (fs->blocksize == 1024) &&
> EXT2FS_CLUSTER_RATIO(fs) > 1)
> ext2fs_mark_block_bitmap2(bmap, 0);
>
> if (old_desc_blk) {
>- if (fs->super->s_reserved_gdt_blocks && fs->block_map == bmap)
>- ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);
> num_blocks = old_desc_blocks;
> if (old_desc_blk + num_blocks >= ext2fs_blocks_count(fs->super))
> num_blocks = ext2fs_blocks_count(fs->super) -
> old_desc_blk;
> ext2fs_mark_block_bitmap_range2(bmap, old_desc_blk, num_blocks);
> }
>- if (new_desc_blk)
>+ if (new_desc_blk) {
>+ ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);
> ext2fs_mark_block_bitmap2(bmap, new_desc_blk);
>+ }
>
> num_blocks = ext2fs_group_blocks_count(fs, group);
> num_blocks -= 2 + fs->inode_blocks_per_group + used_blks;
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message
to
>[email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html


2013-11-23 01:33:45

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH] mke2fs: Fix block bitmaps initalization with -O ^resize_inode

On Tue, Oct 22, 2013 at 04:39:38PM +0900, Akira Fujita wrote:
> If we create ext4 filesystem without resize_inode feature,
> mke2fs command does not initialize block groups
> which have backup superblock and/or group descriptor block
> (With meta_bg feature, backup superblock and group descriptor block
> are located separately).
> So we have to fix block bitmaps when we run "e2fsck -b superblock device".
> This patch fixes the issue by initializing block bitmaps correctly.
>
> Steps to reproduce:
> 1. mke2fs -t ext4 -b 4096 -O ^resize_inode device
> 2. e2fsck -b 32768 DEV
> <snip>
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Block bitmap differences: +(32768--32769) +(98304--98305) +(163840--163841)
> Fix<y>?
>
> Signed-off-by: Akira Fujita <[email protected]>
> ---
> lib/ext2fs/alloc_sb.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
> diff --git a/lib/ext2fs/alloc_sb.c b/lib/ext2fs/alloc_sb.c
> index 223ec51..5419d0d 100644
> --- a/lib/ext2fs/alloc_sb.c
> +++ b/lib/ext2fs/alloc_sb.c
> @@ -58,23 +58,25 @@ int ext2fs_reserve_super_and_bgd(ext2_filsys fs,
> old_desc_blocks =
> fs->desc_blocks + fs->super->s_reserved_gdt_blocks;
>
> - if (super_blk || (group == 0))
> + if (super_blk || (group == 0)) {
> + ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);

I'm not sure I agree with clearing the uninit flag unconditionally. Consider
the case where bmap != fs->block_map, which happens in mark_table_blocks()
(e2fsck pass1) and ext2fs_check_desc (libext2fs). In these two cases, the
function will mark superblock and group descriptor blocks in a separate bitmap,
leaving everything under fs alone. IOW: It doesn't make sense to mess with the
fs' uninit flags if we're not marking the fs' bitmap.

As you point out, however, if we /are/ being called with fs->block_bitmap ==
bmap, then we're probably initializing the fs and it makes sense to clear the
uninit flags.

Maybe something like this, just after the call to ext2fs_super_and_bgd_loc2?

if (fs->block_map == bmap && (new_desc_block || old_desc_block || super_blk))
ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);

<shrug>

I've been wondering for a while -- if we have a FS with meta_bg and
s_first_meta_bg > 0, does that mean we have "old" style group descriptor
layouts up to whatever block group s_first_meta_bg points to? And why do we
set old_desc_blocks to s_first_meta_bg, since the former is used as a block
length offset for old_desc_blk for block groups beneath s_first_meta_bg?
Group descriptors aren't the same size as a block.

This all seems kinda moot since the only initialization I can find for
s_first_meta_bg sets it to zero.

<confused>

> ext2fs_mark_block_bitmap2(bmap, super_blk);
> + }
> if ((group == 0) && (fs->blocksize == 1024) &&
> EXT2FS_CLUSTER_RATIO(fs) > 1)
> ext2fs_mark_block_bitmap2(bmap, 0);
>
> if (old_desc_blk) {
> - if (fs->super->s_reserved_gdt_blocks && fs->block_map == bmap)
> - ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);

Why remove this?

> num_blocks = old_desc_blocks;
> if (old_desc_blk + num_blocks >= ext2fs_blocks_count(fs->super))
> num_blocks = ext2fs_blocks_count(fs->super) -
> old_desc_blk;
> ext2fs_mark_block_bitmap_range2(bmap, old_desc_blk, num_blocks);
> }
> - if (new_desc_blk)
> + if (new_desc_blk) {
> + ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);

Same comment here as that first chunk of reply.

--D
> ext2fs_mark_block_bitmap2(bmap, new_desc_blk);
> + }
>
> num_blocks = ext2fs_group_blocks_count(fs, group);
> num_blocks -= 2 + fs->inode_blocks_per_group + used_blks;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-11-26 01:27:17

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH] mke2fs: Fix block bitmaps initalization with -O ^resize_inode

Does the following patch fix your error? I think I've started to hit it too,
now that I added ^resize_inode,^meta_bg to the mix.

--D

libext2fs: clear BLOCK_UNINIT any time a group has metadata blocks

If ext2fs_reserve_super_and_bgd() is called with the filesystem's
block bitmap, we clear the group's BLOCK_UNINIT flag if the group has
an old-style (pre-metabg) group descriptor block with reserved
descriptor blocks prior to marking the old-style descriptor blocks in
use.

Unfortunately, this is not sufficient -- if the group has no reserved
blocks yet isn't a meta_bg filesystem (^resize_inode,^meta_bg), we
forget to clear the flag, so the filesystem thinks all blocks are
free, which is disastrous.

Therefore, if we know that we're going to mark any of the bitmaps,
make sure that we clear BLOCK_UNINIT.

This is based on an earlier patch by Akira Fujita.

> Steps to reproduce:
> 1. mke2fs -t ext4 -b 4096 -O ^resize_inode device
> 2. e2fsck -b 32768 DEV
> <snip>
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Block bitmap differences: +(32768--32769) +(98304--98305) +(163840--163841)
> Fix<y>?

Reported-by: Akira Fujita <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
---
lib/ext2fs/alloc_sb.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/lib/ext2fs/alloc_sb.c b/lib/ext2fs/alloc_sb.c
index 223ec51..f1898df 100644
--- a/lib/ext2fs/alloc_sb.c
+++ b/lib/ext2fs/alloc_sb.c
@@ -52,6 +52,15 @@ int ext2fs_reserve_super_and_bgd(ext2_filsys fs,
ext2fs_super_and_bgd_loc2(fs, group, &super_blk,
&old_desc_blk, &new_desc_blk, &used_blks);

+ /*
+ * If we're marking blocks in the main block bitmap, be sure that we
+ * clear the group's BLOCK_UNINIT flag if this group has metadata
+ * blocks allocated.
+ */
+ if (fs->block_map == bmap &&
+ (new_desc_blk || old_desc_blk || super_blk || (group == 0)))
+ ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);
+
if (fs->super->s_feature_incompat & EXT2_FEATURE_INCOMPAT_META_BG)
old_desc_blocks = fs->super->s_first_meta_bg;
else
@@ -65,8 +74,6 @@ int ext2fs_reserve_super_and_bgd(ext2_filsys fs,
ext2fs_mark_block_bitmap2(bmap, 0);

if (old_desc_blk) {
- if (fs->super->s_reserved_gdt_blocks && fs->block_map == bmap)
- ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);
num_blocks = old_desc_blocks;
if (old_desc_blk + num_blocks >= ext2fs_blocks_count(fs->super))
num_blocks = ext2fs_blocks_count(fs->super) -

2013-11-26 08:17:33

by Akira Fujita

[permalink] [raw]
Subject: RE: [PATCH] mke2fs: Fix block bitmaps initalization with -O ^resize_inode

Hi Darrick,

With your patch, e2fsck can't fix the filesystem created with existing mke2fs without
resize_inode.
Because "fs->block_map == bmap" condition is in ext2fs_reserve_super_and_bgd().

But if the condition removed (like my previous patch) we can detect and
fix filesystem inconsistency as follows.

1.Create filesystem with existing mke2fs command
# mke2fs -t ext4 -O ^resize_inode DEVICE

2. (With my patch) e2fsck removes uninit_bg flag then fixes metadata inconsistency
# ./e2fsck/e2fsck -f DEVICE
e2fsck 1.43-WIP (8-Jul-2013)
One or more block group descriptor checksums are invalid. Fix<y>? yes
Group descriptor 1 checksum is 0x0343, should be 0xc4ba. FIXED.
Group descriptor 3 checksum is 0x17f2, should be 0xd00b. FIXED.
Group descriptor 5 checksum is 0xf36c, should be 0x3495. FIXED.
Group descriptor 7 checksum is 0x3e90, should be 0xf969. FIXED.
Group descriptor 9 checksum is 0xa31e, should be 0x64e7. FIXED.
Group descriptor 25 checksum is 0x165a, should be 0xd1a3. FIXED.
Group descriptor 27 checksum is 0xce4c, should be 0x09b5. FIXED.
Group descriptor 49 checksum is 0x502b, should be 0x97d2. FIXED.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: +(32768--32769) +(98304--98305) +(163840--163841) +(229376--229377)
+(294912--294913) +(819200--819201) +(884736--884737) +(1605632--1605633)
Fix<y>? yes

/dev/loop0: ***** FILE SYSTEM WAS MODIFIED *****
/dev/loop0: 11/656640 files (0.0% non-contiguous), 73991/2621440 blocks

Regards,
Akira Fujita

>-----Original Message-----
>From: Darrick J. Wong [mailto:[email protected]]
>Sent: Tuesday, November 26, 2013 10:27 AM
>To: Akira Fujita
>Cc: Theodore Tso; ext4 development
>Subject: Re: [PATCH] mke2fs: Fix block bitmaps initalization with -O ^resize_inode
>
>Does the following patch fix your error? I think I've started to hit it too, now that I added
>^resize_inode,^meta_bg to the mix.
>
>--D
>
>libext2fs: clear BLOCK_UNINIT any time a group has metadata blocks
>
>If ext2fs_reserve_super_and_bgd() is called with the filesystem's block bitmap, we clear the
group's
>BLOCK_UNINIT flag if the group has an old-style (pre-metabg) group descriptor block with
reserved
>descriptor blocks prior to marking the old-style descriptor blocks in use.
>
>Unfortunately, this is not sufficient -- if the group has no reserved blocks yet isn't a
meta_bg
>filesystem (^resize_inode,^meta_bg), we forget to clear the flag, so the filesystem thinks all
>blocks are free, which is disastrous.
>
>Therefore, if we know that we're going to mark any of the bitmaps, make sure that we clear
>BLOCK_UNINIT.
>
>This is based on an earlier patch by Akira Fujita.
>
>> Steps to reproduce:
>> 1. mke2fs -t ext4 -b 4096 -O ^resize_inode device 2. e2fsck -b 32768
>> DEV
>> <snip>
>> Pass 1: Checking inodes, blocks, and sizes
>> Pass 2: Checking directory structure
>> Pass 3: Checking directory connectivity
>> Pass 4: Checking reference counts
>> Pass 5: Checking group summary information
>> Block bitmap differences: +(32768--32769) +(98304--98305) +(163840--163841)
>> Fix<y>?
>
>Reported-by: Akira Fujita <[email protected]>
>Signed-off-by: Darrick J. Wong <[email protected]>
>---
> lib/ext2fs/alloc_sb.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
>diff --git a/lib/ext2fs/alloc_sb.c b/lib/ext2fs/alloc_sb.c index 223ec51..f1898df 100644
>--- a/lib/ext2fs/alloc_sb.c
>+++ b/lib/ext2fs/alloc_sb.c
>@@ -52,6 +52,15 @@ int ext2fs_reserve_super_and_bgd(ext2_filsys fs,
> ext2fs_super_and_bgd_loc2(fs, group, &super_blk,
> &old_desc_blk, &new_desc_blk, &used_blks);
>
>+ /*
>+ * If we're marking blocks in the main block bitmap, be sure that we
>+ * clear the group's BLOCK_UNINIT flag if this group has metadata
>+ * blocks allocated.
>+ */
>+ if (fs->block_map == bmap &&
>+ (new_desc_blk || old_desc_blk || super_blk || (group == 0)))
>+ ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);
>+
> if (fs->super->s_feature_incompat & EXT2_FEATURE_INCOMPAT_META_BG)
> old_desc_blocks = fs->super->s_first_meta_bg;
> else
>@@ -65,8 +74,6 @@ int ext2fs_reserve_super_and_bgd(ext2_filsys fs,
> ext2fs_mark_block_bitmap2(bmap, 0);
>
> if (old_desc_blk) {
>- if (fs->super->s_reserved_gdt_blocks && fs->block_map == bmap)
>- ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);
> num_blocks = old_desc_blocks;
> if (old_desc_blk + num_blocks >= ext2fs_blocks_count(fs->super))
> num_blocks = ext2fs_blocks_count(fs->super) -


2013-11-27 00:55:12

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH] mke2fs: Fix block bitmaps initalization with -O ^resize_inode

On Tue, Nov 26, 2013 at 05:10:56PM +0900, Akira Fujita wrote:
> Hi Darrick,
>
> With your patch, e2fsck can't fix the filesystem created with existing mke2fs without
> resize_inode.
> Because "fs->block_map == bmap" condition is in ext2fs_reserve_super_and_bgd().

Hmm, I think I see what's going on here.

First, I think you're arguing that BLOCK_UNINIT should only be set on a group
if it's completely empty. I think you're also arguing that even if the group
contains only its own superblocks, group descriptors, bitmaps, or inode tables,
BLOCK_UNINIT should not be set. Let's call that condition (block group
contains only group data, but BLOCK_UNINIT is set) "(1)".

Note: If a group contains some other group's metadata, BLOCK_UNINIT should not
(and does not seem to) be set.

(If I'm wrong, please let me know.)

The kernel, when it needs to load a bitmap, checks for BLOCK_UNINIT; if set, it
calls ext4_init_block_bitmap() to zero the bitmap and mark the relevant
sb/gdt/bmap/table bits, if necessary. That takes care of (1) in the kernel.

e2fsck pass5 also checks for BLOCK_UNINIT; if it's set, then it first compares
block_found_map against a zeroed bitmap (which falls back to no_optimize mode),
and then it secondly compares each block individually against the ranges that
ext2fs_super_and_bgd_loc2(), ext2fs_*_bitmap_loc(), and
ext2fs_inode_table_loc() return. That takes care of (1) in e2fsck.

libext2fs, however, checks for BLOCK_UNINIT; if it's set, it zeroes that part
of the bitmap and moves on without bothering to mark the sb/gdt/bmap/table
bits. This does NOT take care of (1), and opens a window where callers can
allocate blocks that already hold group metadata! (Let's call this (2).)

The fuse2fs bug that I was seeing was that if (2) happens, then fuse2fs sees a
zero bitmap and blindly hands away group metadata, which is broken. With
yesterday's patch, (1) doesn't happen and all seems ok.

I think in your e2fsck test, e2fsck tests block_found_map against a correctly
constructed bitmap (with ext2fs_super_and_bgd_loc2) if BLOCK_UNINIT is set,
which is why it can't fix (1) unless e2fsck is fed a backup superblock (or
there's a corrupt a group descriptor checksum), either of which clears
BLOCK_UNINIT before pass5).

So I'm wondering, is there more to the bug report than just "I ran mkfs and
prodded e2fsck into reporting errors"?

Given that both the kernel and e2fsck know how to calculate the proper block
bitmap when BLOCK_UNINIT is set, it doesn't seem to be an error case if
BLOCK_UNINIT is set on a group that contains superblocks, group descriptors,
bitmaps, or inode tables. It seems to me that if we can calculate a bitmap,
then there's no need to record it, and we can use BLOCK_UNINIT.

What do you think of this solution: Change read_bitmaps to mark the location of
sb/gdt/bmap/inode blocks in the in-core block bitmap for BLOCK_UNINIT groups.
This fixes the problem that fs->block_bitmap doesn't accurately reflect the
real state of the filesystem (as evidenced by dumpe2fs/debugfs reporting more
free blocks than the summar. I think we can also eliminate the skip_group code
from the block bitmap check in pass5. That should speed up pass5 since we can
also use the fast bitmap compare.

I'll send out a test patch.

Of course, the tests seem coded to pass things like:

> Group 2: (Blocks 16385-24576) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
> Block bitmap at 16385 (+0), Inode bitmap at 16386 (+1)
^^^^^ ^^^^^
> Inode table at 16387-16642 (+2)
^^^^^^^^^^^
> 7934 free blocks, 2048 free inodes, 0 directories, 2048 unused inodes
> Free blocks: 16385-24576
^^^^^
> Free inodes: 4097-6144

<grumble>

--D

> But if the condition removed (like my previous patch) we can detect and
> fix filesystem inconsistency as follows.
>
> 1.Create filesystem with existing mke2fs command
> # mke2fs -t ext4 -O ^resize_inode DEVICE
>
> 2. (With my patch) e2fsck removes uninit_bg flag then fixes metadata inconsistency
> # ./e2fsck/e2fsck -f DEVICE
> e2fsck 1.43-WIP (8-Jul-2013)
> One or more block group descriptor checksums are invalid. Fix<y>? yes
> Group descriptor 1 checksum is 0x0343, should be 0xc4ba. FIXED.
> Group descriptor 3 checksum is 0x17f2, should be 0xd00b. FIXED.
> Group descriptor 5 checksum is 0xf36c, should be 0x3495. FIXED.
> Group descriptor 7 checksum is 0x3e90, should be 0xf969. FIXED.
> Group descriptor 9 checksum is 0xa31e, should be 0x64e7. FIXED.
> Group descriptor 25 checksum is 0x165a, should be 0xd1a3. FIXED.
> Group descriptor 27 checksum is 0xce4c, should be 0x09b5. FIXED.
> Group descriptor 49 checksum is 0x502b, should be 0x97d2. FIXED.
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Block bitmap differences: +(32768--32769) +(98304--98305) +(163840--163841) +(229376--229377)
> +(294912--294913) +(819200--819201) +(884736--884737) +(1605632--1605633)
> Fix<y>? yes
>
> /dev/loop0: ***** FILE SYSTEM WAS MODIFIED *****
> /dev/loop0: 11/656640 files (0.0% non-contiguous), 73991/2621440 blocks
>
> Regards,
> Akira Fujita
>
> >-----Original Message-----
> >From: Darrick J. Wong [mailto:[email protected]]
> >Sent: Tuesday, November 26, 2013 10:27 AM
> >To: Akira Fujita
> >Cc: Theodore Tso; ext4 development
> >Subject: Re: [PATCH] mke2fs: Fix block bitmaps initalization with -O ^resize_inode
> >
> >Does the following patch fix your error? I think I've started to hit it too, now that I added
> >^resize_inode,^meta_bg to the mix.
> >
> >--D
> >
> >libext2fs: clear BLOCK_UNINIT any time a group has metadata blocks
> >
> >If ext2fs_reserve_super_and_bgd() is called with the filesystem's block bitmap, we clear the
> group's
> >BLOCK_UNINIT flag if the group has an old-style (pre-metabg) group descriptor block with
> reserved
> >descriptor blocks prior to marking the old-style descriptor blocks in use.
> >
> >Unfortunately, this is not sufficient -- if the group has no reserved blocks yet isn't a
> meta_bg
> >filesystem (^resize_inode,^meta_bg), we forget to clear the flag, so the filesystem thinks all
> >blocks are free, which is disastrous.
> >
> >Therefore, if we know that we're going to mark any of the bitmaps, make sure that we clear
> >BLOCK_UNINIT.
> >
> >This is based on an earlier patch by Akira Fujita.
> >
> >> Steps to reproduce:
> >> 1. mke2fs -t ext4 -b 4096 -O ^resize_inode device 2. e2fsck -b 32768
> >> DEV
> >> <snip>
> >> Pass 1: Checking inodes, blocks, and sizes
> >> Pass 2: Checking directory structure
> >> Pass 3: Checking directory connectivity
> >> Pass 4: Checking reference counts
> >> Pass 5: Checking group summary information
> >> Block bitmap differences: +(32768--32769) +(98304--98305) +(163840--163841)
> >> Fix<y>?
> >
> >Reported-by: Akira Fujita <[email protected]>
> >Signed-off-by: Darrick J. Wong <[email protected]>
> >---
> > lib/ext2fs/alloc_sb.c | 11 +++++++++--
> > 1 file changed, 9 insertions(+), 2 deletions(-)
> >
> >diff --git a/lib/ext2fs/alloc_sb.c b/lib/ext2fs/alloc_sb.c index 223ec51..f1898df 100644
> >--- a/lib/ext2fs/alloc_sb.c
> >+++ b/lib/ext2fs/alloc_sb.c
> >@@ -52,6 +52,15 @@ int ext2fs_reserve_super_and_bgd(ext2_filsys fs,
> > ext2fs_super_and_bgd_loc2(fs, group, &super_blk,
> > &old_desc_blk, &new_desc_blk, &used_blks);
> >
> >+ /*
> >+ * If we're marking blocks in the main block bitmap, be sure that we
> >+ * clear the group's BLOCK_UNINIT flag if this group has metadata
> >+ * blocks allocated.
> >+ */
> >+ if (fs->block_map == bmap &&
> >+ (new_desc_blk || old_desc_blk || super_blk || (group == 0)))
> >+ ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);
> >+
> > if (fs->super->s_feature_incompat & EXT2_FEATURE_INCOMPAT_META_BG)
> > old_desc_blocks = fs->super->s_first_meta_bg;
> > else
> >@@ -65,8 +74,6 @@ int ext2fs_reserve_super_and_bgd(ext2_filsys fs,
> > ext2fs_mark_block_bitmap2(bmap, 0);
> >
> > if (old_desc_blk) {
> >- if (fs->super->s_reserved_gdt_blocks && fs->block_map == bmap)
> >- ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);
> > num_blocks = old_desc_blocks;
> > if (old_desc_blk + num_blocks >= ext2fs_blocks_count(fs->super))
> > num_blocks = ext2fs_blocks_count(fs->super) -
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-11-28 08:56:56

by Akira Fujita

[permalink] [raw]
Subject: RE: [PATCH] mke2fs: Fix block bitmaps initalization with -O ^resize_inode

>On Tue, Nov 26, 2013 at 05:10:56PM +0900, Akira Fujita wrote:
>> Hi Darrick,
>>
>> With your patch, e2fsck can't fix the filesystem created with existing
>> mke2fs without resize_inode.
>> Because "fs->block_map == bmap" condition is in ext2fs_reserve_super_and_bgd().
>
>Hmm, I think I see what's going on here.
>
>First, I think you're arguing that BLOCK_UNINIT should only be set on a group if it's
completely
>empty. I think you're also arguing that even if the group contains only its own superblocks,
group
>descriptors, bitmaps, or inode tables, BLOCK_UNINIT should not be set. Let's call that
condition
>(block group contains only group data, but BLOCK_UNINIT is set) "(1)".
>
>Note: If a group contains some other group's metadata, BLOCK_UNINIT should not (and does not
seem
>to) be set.
>
>(If I'm wrong, please let me know.)

Yes. Current ext4, it seems to me that BLOCK_UNINIT is set under the following rules:

1 sb(backup sb)/gdt/bitmap/itable exist (e.g. #BG0) -> BLOCK_UNINIT off
2 backup sb/gdt exist (e.g. #BG5) -> BLOCK_UNINIT off
3 bitmap/itable exist (e.g. #BG16) -> BLOCK_UNINIT off
4 Completely empty (e.g. #BG6) -> BLOCK_UNINIT on
Note: #BGN is in ext4 created by default option

And there is a bug that block group which has backup super block
and group descriptor block are not write out to device in wrtie_bitmaps()
when BLOCK_UNINIT is set.
So at this time, I attempted to fix this by cleaning BLOCK_UNIIT.


>The kernel, when it needs to load a bitmap, checks for BLOCK_UNINIT; if set, it calls
>ext4_init_block_bitmap() to zero the bitmap and mark the relevant sb/gdt/bmap/table bits, if
>necessary. That takes care of (1) in the kernel.
>
>e2fsck pass5 also checks for BLOCK_UNINIT; if it's set, then it first compares block_found_map
>against a zeroed bitmap (which falls back to no_optimize mode), and then it secondly compares
each
>block individually against the ranges that ext2fs_super_and_bgd_loc2(), ext2fs_*_bitmap_loc(),
and
>ext2fs_inode_table_loc() return. That takes care of (1) in e2fsck.
>
>libext2fs, however, checks for BLOCK_UNINIT; if it's set, it zeroes that part of the bitmap
and
>moves on without bothering to mark the sb/gdt/bmap/table bits. This does NOT take care of
(1),
>and opens a window where callers can allocate blocks that already hold group metadata! (Let's
call
>this (2).)
>
>The fuse2fs bug that I was seeing was that if (2) happens, then fuse2fs sees a zero bitmap and
blindly
>hands away group metadata, which is broken. With yesterday's patch, (1) doesn't happen and
all
>seems ok.
>
>I think in your e2fsck test, e2fsck tests block_found_map against a correctly constructed
bitmap
>(with ext2fs_super_and_bgd_loc2) if BLOCK_UNINIT is set, which is why it can't fix (1) unless
e2fsck
>is fed a backup superblock (or there's a corrupt a group descriptor checksum), either of which
clears
>BLOCK_UNINIT before pass5).
>
>So I'm wondering, is there more to the bug report than just "I ran mkfs and prodded e2fsck
into
>reporting errors"?

I guess there is another bug.

>Given that both the kernel and e2fsck know how to calculate the proper block bitmap when
BLOCK_UNINIT
>is set, it doesn't seem to be an error case if BLOCK_UNINIT is set on a group that contains
superblocks,
>group descriptors, bitmaps, or inode tables. It seems to me that if we can calculate a
bitmap,
>then there's no need to record it, and we can use BLOCK_UNINIT.
>
>What do you think of this solution: Change read_bitmaps to mark the location of
sb/gdt/bmap/inode
>blocks in the in-core block bitmap for BLOCK_UNINIT groups.
>This fixes the problem that fs->block_bitmap doesn't accurately reflect the real state of the
>filesystem (as evidenced by dumpe2fs/debugfs reporting more free blocks than the summar. I
think
>we can also eliminate the skip_group code from the block bitmap check in pass5. That should
speed
>up pass5 since we can also use the fast bitmap compare.

Speeding up bitmap check in pass5 sounds nice,
but setting BLOCK_UNINIT for block groups which have metadata means
we'll change the definition of BLOCK_UNIIT... :s

Regards,
Akira Fujita

>I'll send out a test patch.
>
>Of course, the tests seem coded to pass things like:
>
>> Group 2: (Blocks 16385-24576) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
>> Block bitmap at 16385 (+0), Inode bitmap at 16386 (+1)
> ^^^^^ ^^^^^
>> Inode table at 16387-16642 (+2)
> ^^^^^^^^^^^
>> 7934 free blocks, 2048 free inodes, 0 directories, 2048 unused inodes
>> Free blocks: 16385-24576
> ^^^^^
>> Free inodes: 4097-6144
>
><grumble>
>
>--D
>
>> But if the condition removed (like my previous patch) we can detect
>> and fix filesystem inconsistency as follows.
>>
>> 1.Create filesystem with existing mke2fs command # mke2fs -t ext4 -O
>> ^resize_inode DEVICE
>>
>> 2. (With my patch) e2fsck removes uninit_bg flag then fixes metadata
>> inconsistency # ./e2fsck/e2fsck -f DEVICE e2fsck 1.43-WIP (8-Jul-2013)
>> One or more block group descriptor checksums are invalid. Fix<y>? yes
>> Group descriptor 1 checksum is 0x0343, should be 0xc4ba. FIXED.
>> Group descriptor 3 checksum is 0x17f2, should be 0xd00b. FIXED.
>> Group descriptor 5 checksum is 0xf36c, should be 0x3495. FIXED.
>> Group descriptor 7 checksum is 0x3e90, should be 0xf969. FIXED.
>> Group descriptor 9 checksum is 0xa31e, should be 0x64e7. FIXED.
>> Group descriptor 25 checksum is 0x165a, should be 0xd1a3. FIXED.
>> Group descriptor 27 checksum is 0xce4c, should be 0x09b5. FIXED.
>> Group descriptor 49 checksum is 0x502b, should be 0x97d2. FIXED.
>> Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory
>> structure Pass 3: Checking directory connectivity Pass 4: Checking
>> reference counts Pass 5: Checking group summary information Block
>> bitmap differences: +(32768--32769) +(98304--98305) +(163840--163841)
>> +(229376--229377)
>> +(294912--294913) +(819200--819201) +(884736--884737)
>> ++(1605632--1605633)
>> Fix<y>? yes
>>
>> /dev/loop0: ***** FILE SYSTEM WAS MODIFIED *****
>> /dev/loop0: 11/656640 files (0.0% non-contiguous), 73991/2621440
>> blocks
>>
>> Regards,
>> Akira Fujita
>>
>> >-----Original Message-----
>> >From: Darrick J. Wong [mailto:[email protected]]
>> >Sent: Tuesday, November 26, 2013 10:27 AM
>> >To: Akira Fujita
>> >Cc: Theodore Tso; ext4 development
>> >Subject: Re: [PATCH] mke2fs: Fix block bitmaps initalization with -O
>> >^resize_inode
>> >
>> >Does the following patch fix your error? I think I've started to hit
>> >it too, now that I added ^resize_inode,^meta_bg to the mix.
>> >
>> >--D
>> >
>> >libext2fs: clear BLOCK_UNINIT any time a group has metadata blocks
>> >
>> >If ext2fs_reserve_super_and_bgd() is called with the filesystem's
>> >block bitmap, we clear the
>> group's
>> >BLOCK_UNINIT flag if the group has an old-style (pre-metabg) group
>> >descriptor block with
>> reserved
>> >descriptor blocks prior to marking the old-style descriptor blocks in use.
>> >
>> >Unfortunately, this is not sufficient -- if the group has no reserved
>> >blocks yet isn't a
>> meta_bg
>> >filesystem (^resize_inode,^meta_bg), we forget to clear the flag, so
>> >the filesystem thinks all blocks are free, which is disastrous.
>> >
>> >Therefore, if we know that we're going to mark any of the bitmaps,
>> >make sure that we clear BLOCK_UNINIT.
>> >
>> >This is based on an earlier patch by Akira Fujita.
>> >
>> >> Steps to reproduce:
>> >> 1. mke2fs -t ext4 -b 4096 -O ^resize_inode device 2. e2fsck -b
>> >> 32768 DEV
>> >> <snip>
>> >> Pass 1: Checking inodes, blocks, and sizes
>> >> Pass 2: Checking directory structure
>> >> Pass 3: Checking directory connectivity
>> >> Pass 4: Checking reference counts
>> >> Pass 5: Checking group summary information
>> >> Block bitmap differences: +(32768--32769) +(98304--98305) +(163840--163841)
>> >> Fix<y>?
>> >
>> >Reported-by: Akira Fujita <[email protected]>
>> >Signed-off-by: Darrick J. Wong <[email protected]>
>> >---
>> > lib/ext2fs/alloc_sb.c | 11 +++++++++--
>> > 1 file changed, 9 insertions(+), 2 deletions(-)
>> >
>> >diff --git a/lib/ext2fs/alloc_sb.c b/lib/ext2fs/alloc_sb.c index
>> >223ec51..f1898df 100644
>> >--- a/lib/ext2fs/alloc_sb.c
>> >+++ b/lib/ext2fs/alloc_sb.c
>> >@@ -52,6 +52,15 @@ int ext2fs_reserve_super_and_bgd(ext2_filsys fs,
>> > ext2fs_super_and_bgd_loc2(fs, group, &super_blk,
>> > &old_desc_blk, &new_desc_blk, &used_blks);
>> >
>> >+ /*
>> >+ * If we're marking blocks in the main block bitmap, be sure that we
>> >+ * clear the group's BLOCK_UNINIT flag if this group has metadata
>> >+ * blocks allocated.
>> >+ */
>> >+ if (fs->block_map == bmap &&
>> >+ (new_desc_blk || old_desc_blk || super_blk || (group == 0)))
>> >+ ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);
>> >+
>> > if (fs->super->s_feature_incompat & EXT2_FEATURE_INCOMPAT_META_BG)
>> > old_desc_blocks = fs->super->s_first_meta_bg;
>> > else
>> >@@ -65,8 +74,6 @@ int ext2fs_reserve_super_and_bgd(ext2_filsys fs,
>> > ext2fs_mark_block_bitmap2(bmap, 0);
>> >
>> > if (old_desc_blk) {
>> >- if (fs->super->s_reserved_gdt_blocks && fs->block_map == bmap)
>> >- ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);
>> > num_blocks = old_desc_blocks;
>> > if (old_desc_blk + num_blocks >= ext2fs_blocks_count(fs->super))
>> > num_blocks = ext2fs_blocks_count(fs->super) -
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4"
>> in the body of a message to [email protected] More majordomo
>> info at http://vger.kernel.org/majordomo-info.html
>--
>To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message
to
>[email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html


2013-11-30 20:06:34

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH] mke2fs: Fix block bitmaps initalization with -O ^resize_inode

Hi Ted, I was hoping you might resolve a question for us:

(please scroll down a bit)

On Thu, Nov 28, 2013 at 05:50:19PM +0900, Akira Fujita wrote:
> >On Tue, Nov 26, 2013 at 05:10:56PM +0900, Akira Fujita wrote:
> >> Hi Darrick,
> >>
> >> With your patch, e2fsck can't fix the filesystem created with existing
> >> mke2fs without resize_inode.
> >> Because "fs->block_map == bmap" condition is in ext2fs_reserve_super_and_bgd().
> >
> >Hmm, I think I see what's going on here.
> >
> >First, I think you're arguing that BLOCK_UNINIT should only be set on a group if it's
> completely
> >empty. I think you're also arguing that even if the group contains only its own superblocks,
> group
> >descriptors, bitmaps, or inode tables, BLOCK_UNINIT should not be set. Let's call that
> condition
> >(block group contains only group data, but BLOCK_UNINIT is set) "(1)".
> >
> >Note: If a group contains some other group's metadata, BLOCK_UNINIT should not (and does not
> seem
> >to) be set.
> >
> >(If I'm wrong, please let me know.)
>
> Yes. Current ext4, it seems to me that BLOCK_UNINIT is set under the following rules:
>
> 1 sb(backup sb)/gdt/bitmap/itable exist (e.g. #BG0) -> BLOCK_UNINIT off
> 2 backup sb/gdt exist (e.g. #BG5) -> BLOCK_UNINIT off
> 3 bitmap/itable exist (e.g. #BG16) -> BLOCK_UNINIT off
> 4 Completely empty (e.g. #BG6) -> BLOCK_UNINIT on
> Note: #BGN is in ext4 created by default option
>
> And there is a bug that block group which has backup super block
> and group descriptor block are not write out to device in wrtie_bitmaps()
> when BLOCK_UNINIT is set.
> So at this time, I attempted to fix this by cleaning BLOCK_UNIIT.

Is it the case that a group should only have BLOCK_UNINIT set if the group is
totally empty (no group metadata, no blocks allocated to files) as Akira says?

Or is it the case that a group can have BLOCK_UNINIT set if the group contains
group metadata but no blocks are allocated to files?

The kernel and e2fsck contain code to construct bitmaps at run time with the
group metadata blocks marked in-use (thus handling case #2). Unfortunately,
libext2fs does not do this; it stuffs in a zeroed bitmap any time it sees
BLOCK_UNINIT (case #1).

It seems that you can mkfs -O ^resize_inode and it'll produce an fs containing
groups that have superblocks/group descriptors but BLOCK_UNINIT set. The
kernel handles it correctly, but things like debugfs report that every block in
that group is free.

For sure I'd like to fix libext2fs to ensure that it marks the group metadata
blocks in-use in the in-memory bitmap when it's loading the block bitmap for a
BLOCK_UNINIT group.

As for whether or not a group can contain group data and have BLOCK_UNINIT set,
well, I defer to you. :)

--D

>
>
> >The kernel, when it needs to load a bitmap, checks for BLOCK_UNINIT; if set, it calls
> >ext4_init_block_bitmap() to zero the bitmap and mark the relevant sb/gdt/bmap/table bits, if
> >necessary. That takes care of (1) in the kernel.
> >
> >e2fsck pass5 also checks for BLOCK_UNINIT; if it's set, then it first compares block_found_map
> >against a zeroed bitmap (which falls back to no_optimize mode), and then it secondly compares
> each
> >block individually against the ranges that ext2fs_super_and_bgd_loc2(), ext2fs_*_bitmap_loc(),
> and
> >ext2fs_inode_table_loc() return. That takes care of (1) in e2fsck.
> >
> >libext2fs, however, checks for BLOCK_UNINIT; if it's set, it zeroes that part of the bitmap
> and
> >moves on without bothering to mark the sb/gdt/bmap/table bits. This does NOT take care of
> (1),
> >and opens a window where callers can allocate blocks that already hold group metadata! (Let's
> call
> >this (2).)
> >
> >The fuse2fs bug that I was seeing was that if (2) happens, then fuse2fs sees a zero bitmap and
> blindly
> >hands away group metadata, which is broken. With yesterday's patch, (1) doesn't happen and
> all
> >seems ok.
> >
> >I think in your e2fsck test, e2fsck tests block_found_map against a correctly constructed
> bitmap
> >(with ext2fs_super_and_bgd_loc2) if BLOCK_UNINIT is set, which is why it can't fix (1) unless
> e2fsck
> >is fed a backup superblock (or there's a corrupt a group descriptor checksum), either of which
> clears
> >BLOCK_UNINIT before pass5).
> >
> >So I'm wondering, is there more to the bug report than just "I ran mkfs and prodded e2fsck
> into
> >reporting errors"?
>
> I guess there is another bug.
>
> >Given that both the kernel and e2fsck know how to calculate the proper block bitmap when
> BLOCK_UNINIT
> >is set, it doesn't seem to be an error case if BLOCK_UNINIT is set on a group that contains
> superblocks,
> >group descriptors, bitmaps, or inode tables. It seems to me that if we can calculate a
> bitmap,
> >then there's no need to record it, and we can use BLOCK_UNINIT.
> >
> >What do you think of this solution: Change read_bitmaps to mark the location of
> sb/gdt/bmap/inode
> >blocks in the in-core block bitmap for BLOCK_UNINIT groups.
> >This fixes the problem that fs->block_bitmap doesn't accurately reflect the real state of the
> >filesystem (as evidenced by dumpe2fs/debugfs reporting more free blocks than the summar. I
> think
> >we can also eliminate the skip_group code from the block bitmap check in pass5. That should
> speed
> >up pass5 since we can also use the fast bitmap compare.
>
> Speeding up bitmap check in pass5 sounds nice,
> but setting BLOCK_UNINIT for block groups which have metadata means
> we'll change the definition of BLOCK_UNIIT... :s
>
> Regards,
> Akira Fujita
>
> >I'll send out a test patch.
> >
> >Of course, the tests seem coded to pass things like:
> >
> >> Group 2: (Blocks 16385-24576) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
> >> Block bitmap at 16385 (+0), Inode bitmap at 16386 (+1)
> > ^^^^^ ^^^^^
> >> Inode table at 16387-16642 (+2)
> > ^^^^^^^^^^^
> >> 7934 free blocks, 2048 free inodes, 0 directories, 2048 unused inodes
> >> Free blocks: 16385-24576
> > ^^^^^
> >> Free inodes: 4097-6144
> >
> ><grumble>
> >
> >--D
> >
> >> But if the condition removed (like my previous patch) we can detect
> >> and fix filesystem inconsistency as follows.
> >>
> >> 1.Create filesystem with existing mke2fs command # mke2fs -t ext4 -O
> >> ^resize_inode DEVICE
> >>
> >> 2. (With my patch) e2fsck removes uninit_bg flag then fixes metadata
> >> inconsistency # ./e2fsck/e2fsck -f DEVICE e2fsck 1.43-WIP (8-Jul-2013)
> >> One or more block group descriptor checksums are invalid. Fix<y>? yes
> >> Group descriptor 1 checksum is 0x0343, should be 0xc4ba. FIXED.
> >> Group descriptor 3 checksum is 0x17f2, should be 0xd00b. FIXED.
> >> Group descriptor 5 checksum is 0xf36c, should be 0x3495. FIXED.
> >> Group descriptor 7 checksum is 0x3e90, should be 0xf969. FIXED.
> >> Group descriptor 9 checksum is 0xa31e, should be 0x64e7. FIXED.
> >> Group descriptor 25 checksum is 0x165a, should be 0xd1a3. FIXED.
> >> Group descriptor 27 checksum is 0xce4c, should be 0x09b5. FIXED.
> >> Group descriptor 49 checksum is 0x502b, should be 0x97d2. FIXED.
> >> Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory
> >> structure Pass 3: Checking directory connectivity Pass 4: Checking
> >> reference counts Pass 5: Checking group summary information Block
> >> bitmap differences: +(32768--32769) +(98304--98305) +(163840--163841)
> >> +(229376--229377)
> >> +(294912--294913) +(819200--819201) +(884736--884737)
> >> ++(1605632--1605633)
> >> Fix<y>? yes
> >>
> >> /dev/loop0: ***** FILE SYSTEM WAS MODIFIED *****
> >> /dev/loop0: 11/656640 files (0.0% non-contiguous), 73991/2621440
> >> blocks
> >>
> >> Regards,
> >> Akira Fujita
> >>
> >> >-----Original Message-----
> >> >From: Darrick J. Wong [mailto:[email protected]]
> >> >Sent: Tuesday, November 26, 2013 10:27 AM
> >> >To: Akira Fujita
> >> >Cc: Theodore Tso; ext4 development
> >> >Subject: Re: [PATCH] mke2fs: Fix block bitmaps initalization with -O
> >> >^resize_inode
> >> >
> >> >Does the following patch fix your error? I think I've started to hit
> >> >it too, now that I added ^resize_inode,^meta_bg to the mix.
> >> >
> >> >--D
> >> >
> >> >libext2fs: clear BLOCK_UNINIT any time a group has metadata blocks
> >> >
> >> >If ext2fs_reserve_super_and_bgd() is called with the filesystem's
> >> >block bitmap, we clear the
> >> group's
> >> >BLOCK_UNINIT flag if the group has an old-style (pre-metabg) group
> >> >descriptor block with
> >> reserved
> >> >descriptor blocks prior to marking the old-style descriptor blocks in use.
> >> >
> >> >Unfortunately, this is not sufficient -- if the group has no reserved
> >> >blocks yet isn't a
> >> meta_bg
> >> >filesystem (^resize_inode,^meta_bg), we forget to clear the flag, so
> >> >the filesystem thinks all blocks are free, which is disastrous.
> >> >
> >> >Therefore, if we know that we're going to mark any of the bitmaps,
> >> >make sure that we clear BLOCK_UNINIT.
> >> >
> >> >This is based on an earlier patch by Akira Fujita.
> >> >
> >> >> Steps to reproduce:
> >> >> 1. mke2fs -t ext4 -b 4096 -O ^resize_inode device 2. e2fsck -b
> >> >> 32768 DEV
> >> >> <snip>
> >> >> Pass 1: Checking inodes, blocks, and sizes
> >> >> Pass 2: Checking directory structure
> >> >> Pass 3: Checking directory connectivity
> >> >> Pass 4: Checking reference counts
> >> >> Pass 5: Checking group summary information
> >> >> Block bitmap differences: +(32768--32769) +(98304--98305) +(163840--163841)
> >> >> Fix<y>?
> >> >
> >> >Reported-by: Akira Fujita <[email protected]>
> >> >Signed-off-by: Darrick J. Wong <[email protected]>
> >> >---
> >> > lib/ext2fs/alloc_sb.c | 11 +++++++++--
> >> > 1 file changed, 9 insertions(+), 2 deletions(-)
> >> >
> >> >diff --git a/lib/ext2fs/alloc_sb.c b/lib/ext2fs/alloc_sb.c index
> >> >223ec51..f1898df 100644
> >> >--- a/lib/ext2fs/alloc_sb.c
> >> >+++ b/lib/ext2fs/alloc_sb.c
> >> >@@ -52,6 +52,15 @@ int ext2fs_reserve_super_and_bgd(ext2_filsys fs,
> >> > ext2fs_super_and_bgd_loc2(fs, group, &super_blk,
> >> > &old_desc_blk, &new_desc_blk, &used_blks);
> >> >
> >> >+ /*
> >> >+ * If we're marking blocks in the main block bitmap, be sure that we
> >> >+ * clear the group's BLOCK_UNINIT flag if this group has metadata
> >> >+ * blocks allocated.
> >> >+ */
> >> >+ if (fs->block_map == bmap &&
> >> >+ (new_desc_blk || old_desc_blk || super_blk || (group == 0)))
> >> >+ ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);
> >> >+
> >> > if (fs->super->s_feature_incompat & EXT2_FEATURE_INCOMPAT_META_BG)
> >> > old_desc_blocks = fs->super->s_first_meta_bg;
> >> > else
> >> >@@ -65,8 +74,6 @@ int ext2fs_reserve_super_and_bgd(ext2_filsys fs,
> >> > ext2fs_mark_block_bitmap2(bmap, 0);
> >> >
> >> > if (old_desc_blk) {
> >> >- if (fs->super->s_reserved_gdt_blocks && fs->block_map == bmap)
> >> >- ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);
> >> > num_blocks = old_desc_blocks;
> >> > if (old_desc_blk + num_blocks >= ext2fs_blocks_count(fs->super))
> >> > num_blocks = ext2fs_blocks_count(fs->super) -
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-ext4"
> >> in the body of a message to [email protected] More majordomo
> >> info at http://vger.kernel.org/majordomo-info.html
> >--
> >To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message
> to
> >[email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2013-12-04 23:44:48

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] mke2fs: Fix block bitmaps initalization with -O ^resize_inode

On Sat, Nov 30, 2013 at 12:06:24PM -0800, Darrick J. Wong wrote:
> Hi Ted, I was hoping you might resolve a question for us:
> > And there is a bug that block group which has backup super block
> > and group descriptor block are not write out to device in wrtie_bitmaps()
> > when BLOCK_UNINIT is set.
> > So at this time, I attempted to fix this by cleaning BLOCK_UNIIT.
>
> Is it the case that a group should only have BLOCK_UNINIT set if the group is
> totally empty (no group metadata, no blocks allocated to files) as Akira says?
>
> Or is it the case that a group can have BLOCK_UNINIT set if the group contains
> group metadata but no blocks are allocated to files?

The meaning of BLOCK_UNINIT is that the contents of that block group's
block allocation bitmap is not initialized. This causes libext2fs to
skip writing the block allocation bitmap, and to also skip reading the
block allocation bitmap (which is why it substitutes all zero's
instead of reading the allocation bitmap block).

Before allocating a block from a block group that has the BLOCK_UNINIT
flag set, it is important that the kernel or the userspace library
first initialize the block allocation bitmap and clear the
BLOCK_UNINIT flag.

When allocating blocks, implementations MUST be able to initiaize the
allocation bitmap for block groups which has the block group's own
metadata blocks (backup superblock and bg descriptor blocks if any,
reserved bg blocks, the allocation bitmaps, and inode table blocks) in
use.

Implementations SHOULD be able to initialize bitmaps for block groups
that have metadata blocks from other block groups if the case of
flex_bg. However, historically there were some implementations that
didn't handle this correctly, which is why mke2fs initializes the
block bitmap and clears BLOCK_UNINIT in block groups that have
metadata blocks for other block groups.

Optionally, implementations MAY set the BLOCK_UNINIT bit after data
blocks have been deallocated from a block group such that the only
blocks in use are the block group's metadata groups.

Also, some implementations MAY clear the BLOCK_UNINIT bit and
initialize the block allocation bitmap early --- for example, when
allocating an inode in the block group. This shouldn't be required,
however, and so implementations SHOULD correctly handle a situation
where an inode has been allocated in the inode table, but BLOCK_UNINIT
is set.

All of this basically boils down to the two rules of thumb:

1) The BLOCK_UNINIT bit is fundamentally about whether the block
allocation bitmap is valid, and whether mke2fs can skip needing to
initialize the block, and whether e2fsck, dumpe2fs, debugfs, etc. can
skip reading said allocation bitmap.

2) The IETF rule of "be conservative in what you send, and liberal in
what you accept" applies.

Does that help?

Thanks,

- Ted

2013-12-05 01:22:18

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH] mke2fs: Fix block bitmaps initalization with -O ^resize_inode

Hi Ted, thank you for providing some historical context. :)

On Wed, Dec 04, 2013 at 06:44:35PM -0500, Theodore Ts'o wrote:
> On Sat, Nov 30, 2013 at 12:06:24PM -0800, Darrick J. Wong wrote:
> > Hi Ted, I was hoping you might resolve a question for us:
> > > And there is a bug that block group which has backup super block
> > > and group descriptor block are not write out to device in wrtie_bitmaps()
> > > when BLOCK_UNINIT is set.
> > > So at this time, I attempted to fix this by cleaning BLOCK_UNIIT.
> >
> > Is it the case that a group should only have BLOCK_UNINIT set if the group is
> > totally empty (no group metadata, no blocks allocated to files) as Akira says?
> >
> > Or is it the case that a group can have BLOCK_UNINIT set if the group contains
> > group metadata but no blocks are allocated to files?
>
> The meaning of BLOCK_UNINIT is that the contents of that block group's
> block allocation bitmap is not initialized. This causes libext2fs to
> skip writing the block allocation bitmap, and to also skip reading the
> block allocation bitmap (which is why it substitutes all zero's
> instead of reading the allocation bitmap block).
>
> Before allocating a block from a block group that has the BLOCK_UNINIT
> flag set, it is important that the kernel or the userspace library
> first initialize the block allocation bitmap and clear the
> BLOCK_UNINIT flag.
>
> When allocating blocks, implementations MUST be able to initiaize the
> allocation bitmap for block groups which has the block group's own
> metadata blocks (backup superblock and bg descriptor blocks if any,
> reserved bg blocks, the allocation bitmaps, and inode table blocks) in
> use.

I'll call this ^^^^^ (B).

> Implementations SHOULD be able to initialize bitmaps for block groups
> that have metadata blocks from other block groups if the case of
> flex_bg. However, historically there were some implementations that
> didn't handle this correctly, which is why mke2fs initializes the
> block bitmap and clears BLOCK_UNINIT in block groups that have
> metadata blocks for other block groups.
>
> Optionally, implementations MAY set the BLOCK_UNINIT bit after data
> blocks have been deallocated from a block group such that the only
> blocks in use are the block group's metadata groups.

For the record, neither e2fsprogs nor the kernel do this -- the only code that
sets BLOCK_UNINIT is the fs grow code in resize2fs.

> Also, some implementations MAY clear the BLOCK_UNINIT bit and
> initialize the block allocation bitmap early --- for example, when
> allocating an inode in the block group. This shouldn't be required,
> however, and so implementations SHOULD correctly handle a situation
> where an inode has been allocated in the inode table, but BLOCK_UNINIT
> is set.
>
> All of this basically boils down to the two rules of thumb:
>
> 1) The BLOCK_UNINIT bit is fundamentally about whether the block
> allocation bitmap is valid, and whether mke2fs can skip needing to
> initialize the block, and whether e2fsck, dumpe2fs, debugfs, etc. can
> skip reading said allocation bitmap.

Right now, e2fsck and resize2fs take care of (B) on their own.

One of my patches fixes everything else (debugfs, dumpe2fs, fuse2fs, tune2fs)
to take care of (B) by doing it in the library. I can't think of a scenario
where it'd be useful to run around with something like this:

# mke2fs -t ext4 /dev/vda -O ^resize_inode,meta_bg
# dumpe2fs /dev/vda
<snip>
Group 1: (Blocks 32768-65535) [INODE_UNINIT, BLOCK_UNINIT]
Checksum 0xa85c, unused inodes 8192
Backup superblock at 32768, Group descriptor at 32769
^^^^^ ^^^^^
Block bitmap at 3 (bg #0 + 3), Inode bitmap at 19 (bg #0 + 19)
Inode table at 546-1057 (bg #0 + 546)
32766 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
Free blocks: 32768-65535
^^^^^ <------------------ HEY!
Free inodes: 8193-16384
<snip>
# debugfs /dev/vda -R 'testb 32768'
Block 32768 not in use

Notice that blocks 32768-32769 are claimed by the group, but fs->block_map
thinks those blocks are free. ext2fs_open() hands back to client programs a fs
handle in which the block bitmap is in that broken state.

> 2) The IETF rule of "be conservative in what you send, and liberal in
> what you accept" applies.

I'm not convinced that we /need/ Akira's patch to clear BLOCK_UNINIT on any
group containing its own metadata, but I doubt it'd harm anything other than
make e2fsck slower.

It would certainly be the conservative-send route though.

--D

>
> Does that help?
>
> Thanks,
>
> - Ted

2013-12-05 02:03:46

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] mke2fs: Fix block bitmaps initalization with -O ^resize_inode

On Wed, Dec 04, 2013 at 05:22:10PM -0800, Darrick J. Wong wrote:
> > 2) The IETF rule of "be conservative in what you send, and liberal in
> > what you accept" applies.
>
> I'm not convinced that we /need/ Akira's patch to clear BLOCK_UNINIT on any
> group containing its own metadata, but I doubt it'd harm anything other than
> make e2fsck slower.
>
> It would certainly be the conservative-send route though.

The place where we are being conserviative what we send is that we
clear BLOCK_UNINIT for block groups that don't have any data blocks,
but which has metadata blocks belonging to *other* block groups,
because there were some kernel implementations in the past that didn't
handle this correctly.

But if you have a block group that has only its metadata, that's
perfectly fine. And that's easy to test; if you create a file system
like this:

touch /tmp/foo.img
mke2fs -t ext2 -O uninit_bg /tmp/foo.img 1800000

... then by definition every single block group has its own metadata,
and if there were problems with block groups that had its own metadata
blocks, we wouldn't be able to set BLOCK_UNINIT on any block group at
all.

It looks like we are currently clearing BLOCK_UNINIT for block groups
that contain superblocks and backup superblocs. To be honest, I don't
remember why we are currently doing this. I *think* the kernel and
all modern e2fsprogs should be able to do the right thing, if we set
BLOCK_UNINIT on all block groups.

- Ted

2013-12-05 02:09:42

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] mke2fs: Fix block bitmaps initalization with -O ^resize_inode

On Wed, Dec 04, 2013 at 05:22:10PM -0800, Darrick J. Wong wrote:
> Right now, e2fsck and resize2fs take care of (B) on their own.
>
> One of my patches fixes everything else (debugfs, dumpe2fs, fuse2fs, tune2fs)
> to take care of (B) by doing it in the library. I can't think of a scenario
> where it'd be useful to run around with something like this:
>
> # mke2fs -t ext4 /dev/vda -O ^resize_inode,meta_bg
> # dumpe2fs /dev/vda
> <snip>
> Group 1: (Blocks 32768-65535) [INODE_UNINIT, BLOCK_UNINIT]
> Checksum 0xa85c, unused inodes 8192
> Backup superblock at 32768, Group descriptor at 32769
> ^^^^^ ^^^^^
> Block bitmap at 3 (bg #0 + 3), Inode bitmap at 19 (bg #0 + 19)
> Inode table at 546-1057 (bg #0 + 546)
> 32766 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
> Free blocks: 32768-65535
> ^^^^^ <------------------ HEY!
> Free inodes: 8193-16384
>
> <snip>
> # debugfs /dev/vda -R 'testb 32768'
> Block 32768 not in use

I'm not that concerned about dumpe2fs and debugfs, since that's just a
display issue. What's more important is that lib/ext2fs/alloc.c
correctly handles allocations --- see check_block_uninit() and
check_inode_uninit() in lib/ext2fs/alloc.c. We do take care of
initializing the portion of the allocation bitmap.

This is good, because it means that e2tools will do the right thing
when it writes files to an ext4 file system that has BLOCK_UNINIT set
in a block group.

- Ted

2013-12-05 05:01:07

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH] mke2fs: Fix block bitmaps initalization with -O ^resize_inode

On Wed, Dec 04, 2013 at 09:09:40PM -0500, Theodore Ts'o wrote:
> On Wed, Dec 04, 2013 at 05:22:10PM -0800, Darrick J. Wong wrote:
> > Right now, e2fsck and resize2fs take care of (B) on their own.
> >
> > One of my patches fixes everything else (debugfs, dumpe2fs, fuse2fs, tune2fs)
> > to take care of (B) by doing it in the library. I can't think of a scenario
> > where it'd be useful to run around with something like this:
> >
> > # mke2fs -t ext4 /dev/vda -O ^resize_inode,meta_bg
> > # dumpe2fs /dev/vda
> > <snip>
> > Group 1: (Blocks 32768-65535) [INODE_UNINIT, BLOCK_UNINIT]
> > Checksum 0xa85c, unused inodes 8192
> > Backup superblock at 32768, Group descriptor at 32769
> > ^^^^^ ^^^^^
> > Block bitmap at 3 (bg #0 + 3), Inode bitmap at 19 (bg #0 + 19)
> > Inode table at 546-1057 (bg #0 + 546)
> > 32766 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
> > Free blocks: 32768-65535
> > ^^^^^ <------------------ HEY!
> > Free inodes: 8193-16384
> >
> > <snip>
> > # debugfs /dev/vda -R 'testb 32768'
> > Block 32768 not in use
>
> I'm not that concerned about dumpe2fs and debugfs, since that's just a
> display issue. What's more important is that lib/ext2fs/alloc.c

I disagree that testb returning wrong results is a merely display issue. I
wasn't expecting to ffb a block I know is free and have other blocks suddenly
"become" in use:

debugfs: testb 32768
Block 32768 not in use
debugfs: ffb 1 65500
Free blocks found: 65500
debugfs: testb 32768
Block 32768 marked in use
debugfs: q
ext2fs_write_block_bitmap: Attempt to write to filesystem opened read-only
ext2fs_close: Attempt to write to filesystem opened read-only

I shouldn't have to make debugfs modify the fs just to have testb return
correct results. I don't think this qualifies as "least surprise".

Anyway, I'll send the patch along and see what everyone thinks of it.

> correctly handles allocations --- see check_block_uninit() and
> check_inode_uninit() in lib/ext2fs/alloc.c. We do take care of
> initializing the portion of the allocation bitmap.
>
> This is good, because it means that e2tools will do the right thing
> when it writes files to an ext4 file system that has BLOCK_UNINIT set
> in a block group.

I agree that doing the initialization late is better than not doing it at all,
so at least we're surprising and not broken. :)

--D

2013-12-05 05:04:17

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH] mke2fs: Fix block bitmaps initalization with -O ^resize_inode

[resend without accidentally dropping the cc's]

On Wed, Dec 04, 2013 at 09:03:40PM -0500, Theodore Ts'o wrote:
> On Wed, Dec 04, 2013 at 05:22:10PM -0800, Darrick J. Wong wrote:
> > > 2) The IETF rule of "be conservative in what you send, and liberal in
> > > what you accept" applies.
> >
> > I'm not convinced that we /need/ Akira's patch to clear BLOCK_UNINIT on any
> > group containing its own metadata, but I doubt it'd harm anything other than
> > make e2fsck slower.
> >
> > It would certainly be the conservative-send route though.
>
> The place where we are being conserviative what we send is that we
> clear BLOCK_UNINIT for block groups that don't have any data blocks,
> but which has metadata blocks belonging to *other* block groups,
> because there were some kernel implementations in the past that didn't
> handle this correctly.
>
> But if you have a block group that has only its metadata, that's
> perfectly fine. And that's easy to test; if you create a file system
> like this:
>
> touch /tmp/foo.img
> mke2fs -t ext2 -O uninit_bg /tmp/foo.img 1800000
>
> ... then by definition every single block group has its own metadata,
> and if there were problems with block groups that had its own metadata
> blocks, we wouldn't be able to set BLOCK_UNINIT on any block group at
> all.
>
> It looks like we are currently clearing BLOCK_UNINIT for block groups
> that contain superblocks and backup superblocs. To be honest, I don't
> remember why we are currently doing this. I *think* the kernel and

Looking at the git history, that chunk of ext2fs_reserve_super_and_bgd() landed
there when you were trying to clean up libext2fs, though there's no obvious
reason why we need to unset BLOCK_UNINIT there too. Oh well, it can't hurt.

> all modern e2fsprogs should be able to do the right thing, if we set
> BLOCK_UNINIT on all block groups.

Fortunately, they do, and have since the initial ext4dev code in 2006-7.

--D
>
> - Ted