2015-05-20 01:10:13

by Phillip Susi

Subject: Unused block group, but all blocks not free?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

I broke out the old e2defrag for an experiment today ( wondering why
resize2fs refuses to shrink a volume below half its size even though
it is only 33% used; thought I would try packing all files as far to
the left as possible and try again ) and it bailed out because the
free block count listed in the superblock does not match what the
block allocation bitmaps indicate. It turns out this is due to some
block groups that are unused, and thus have uninitialized allocation
bitmaps, yet somehow claim to not have all of their blocks free. The
stats output from debugfs shows this:

Group 25: block bitmap at 524297, inode bitmap at 524313, inode table
at 528928
32383 free blocks, 8192 free inodes, 0 used directories,
8192 unused inodes
[Inode not init, Block not init, Checksum 0x45a8]


How on earth can this be? The block allocation bitmap is
uninitialized, therefore all bits are assumed to be clear. Yet the
free blocks count is only 32383 instead of 32768. How can a block
group be totally unused, and yet not have all of its blocks free?

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBCgAGBQJVW970AAoJENRVrw2cjl5RU5cH/2CM88pWTMD52KXWAT+1RfK8
oTuwSRdIyLlNF63kYicHAyd0TKHJ/qUJwmyZMJUfdDxbraMSYsGNtzUOaaL5hcgJ
qwJfDXO3UmSLILSt8xGpgBH0TfybPLsBSzzrnyvq1Wk79a2HNAnOzTZJX+g7iDhP
iePN3QEmG78xYK/V9gsiFO/PFh5KBXjhsdlLHkZgHLKUzWncfEzgtGU8m47PQLWq
w4NEn+KBXr6k8jxx/btB3halgH0+70eejVNdM34SHx4ZvcckStz71aNT4six3os2
euOMLMQol8fgKCu/eCRSIT/oruF5F7op467Q3gP3JHp3e7V+21jW8f1ykO9xG8Y=
=DhLq
-----END PGP SIGNATURE-----


2015-05-20 15:10:11

by Theodore Ts'o

Subject: Re: Unused block group, but all blocks not free?

On Tue, May 19, 2015 at 09:10:12PM -0400, Phillip Susi wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> I broke out the old e2defrag for an experiment today ( wondering why
> resize2fs refuses to shrink a volume below half its size even though
> it is only 33% used; thought I would try packing all files as far to
> the left as possible and try again ) and it bailed out because the
> free block count listed in the superblock does not match what the
> block allocation bitmaps indicate. It turns out this is due to some
> block groups that are unused, and thus have uninitialized allocation
> bitmaps, yet somehow claim to not have all of their blocks free. The
> stats output from debugfs shows this:
>
> Group 25: block bitmap at 524297, inode bitmap at 524313, inode table
> at 528928
> 32383 free blocks, 8192 free inodes, 0 used directories,
> 8192 unused inodes
> [Inode not init, Block not init, Checksum 0x45a8]
>
>
> How on earth can this be? The block allocation bitmap is
> uninitialized, therefore all bits are assumed to be clear. Yet the
> free blocks count is only 32383 instead of 32768. How can a block
> group be totally unused, and yet not have all of its blocks free?

Have you checked to see if the metadata for other block groups is
taking up space in the block group? This can happen when using the
flex_bg layout. (And without flex_bg, *all* block groups will
always have blocks in use, for their own metadata blocks.)

- Ted

2015-05-20 15:17:16

by Phillip Susi

Subject: Re: Unused block group, but all blocks not free?

On 5/20/2015 11:10 AM, Theodore Ts'o wrote:
> Have you checked to see if the metadata for other block groups is
> taking up space in the block group? This can happen when using the
> flex_bg layout. (And without flex_bg, *all* block groups will
> always have blocks in use, for their own metadata blocks.)

That was my first thought and no, they aren't. The metadata is in bg 0
and bg 32, not 25. Even still, when the metadata is there, shouldn't
the allocation bitmap mark those blocks as in use rather than be
uninitialized?


2015-05-20 16:31:51

by Theodore Ts'o

Subject: Re: Unused block group, but all blocks not free?

On Wed, May 20, 2015 at 11:15:59AM -0400, Phil Susi wrote:
> On 5/20/2015 11:10 AM, Theodore Ts'o wrote:
> >Have you checked to see if the metadata for other block groups is
> >taking up space in the block group? This can happen when using the
> >flex_bg layout. (And without flex_bg, *all* block groups will
> >always have blocks in use, for their own metadata blocks.)
>
> That was my first thought and no, they aren't. The metadata is in bg 0 and
> bg 32, not 25. Even still, when the metadata is there, shouldn't the
> allocation bitmap mark those blocks as in use rather than be uninitialized?

As an optimization, if we can reconstruct the allocation bitmap from
the block group descriptors, we'll leave the block allocation bitmap
uninitialized, so that programs like e2fsck don't have to read the
bitmap block. For a mostly empty file system, this optimization is
quite noticeable.

Can you send me a compressed raw e2image of the file system so I can
take a look?

- Ted

2015-05-21 23:59:44

by Theodore Ts'o

Subject: Re: Unused block group, but all blocks not free?

On Wed, May 20, 2015 at 06:05:53PM -0400, Phillip Susi wrote:
> On 05/20/2015 12:31 PM, Theodore Ts'o wrote:
> > As an optimization, if we can reconstruct the allocation bitmap from
> > the block group descriptors, we'll leave the block allocation bitmap
> > uninitialized, so that programs like e2fsck don't have to read the
> > bitmap block. For a mostly empty file system, this optimization is
> > quite noticeable.
>
> Ahh, so for block groups that have uninitialized bitmaps, I suppose I'll
> need to add a check to reconstruct the used blocks from the group
> descriptor table pointers.
>
> > Can you send me a compressed raw e2image of the file system so I can
> > take a look?
>
> Here it is.

Ah, so it's pretty self-explanatory. From the dumpe2fs of the image:

Group 25: (Blocks 819200-851967) csum 0x45a8 [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
                  ^^^^^^^^^^^^^
  Backup superblock at 819200, Group descriptors at 819201-819201
                       ^^^^^^                       ^^^^^^^^^^^^^
  Reserved GDT blocks at 819202-819584
                         ^^^^^^^^^^^^^
  Block bitmap at 524297 (bg #16 + 9)
  Inode bitmap at 524313 (bg #16 + 25)
  Inode table at 528928-529439 (bg #16 + 4640)
  32383 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
  Free blocks: 819585-851967
  Free inodes: 204801-212992

One of the things which we didn't do as part of flex_bg was to change
the location of the backup superblocks and block group descriptor
blocks (which perhaps we should have done, but oh, well). There is
the sparse_super2 feature which would allow us to reduce the number of
backup superblocks down to two, one, or zero, but that would require
people to use newer versions of e2fsprogs, so it's not something we've
enabled yet for wider use (although it is something I've been using at
$WORK).

- Ted
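
The dumpe2fs figures above can be cross-checked with a short sketch. The
Python below is illustrative only: `GroupLayout` and `blocks_in_use` are
hypothetical names, not part of libext2fs or e2defrag, and the sketch
assumes a file system without meta_bg or bigalloc. It rebuilds the set of
in-use blocks for a BLOCK_UNINIT group from the layout information and
compares the result with the reported free block count.

```python
from dataclasses import dataclass


@dataclass
class GroupLayout:
    """Simplified, hypothetical view of one block group."""
    first_block: int          # first block number of the group
    blocks_per_group: int
    has_super: bool           # holds a backup superblock and GDT copy
    gdt_blocks: int           # blocks occupied by the group descriptors
    reserved_gdt_blocks: int  # blocks reserved for growing the GDT


def blocks_in_use(group, metadata_blocks):
    """Blocks that are in use in `group` even though BLOCK_UNINIT is set.

    `metadata_blocks` lists bitmap and inode-table blocks; with flex_bg
    these may live in a different group than the one they describe.
    """
    used = set()
    if group.has_super:
        count = 1 + group.gdt_blocks + group.reserved_gdt_blocks
        used.update(range(group.first_block, group.first_block + count))
    last_block = group.first_block + group.blocks_per_group - 1
    used.update(b for b in metadata_blocks
                if group.first_block <= b <= last_block)
    return used


# Values taken from the dumpe2fs output for group 25.
group25 = GroupLayout(first_block=819200, blocks_per_group=32768,
                      has_super=True, gdt_blocks=1, reserved_gdt_blocks=383)
# Group 25's own bitmaps and inode table sit in group 16 per dumpe2fs.
group25_metadata = [524297, 524313, *range(528928, 529440)]

used = blocks_in_use(group25, group25_metadata)
free = group25.blocks_per_group - len(used)
print(len(used), free)  # prints "385 32383"
```

The 385 blocks are the backup superblock (1), the group descriptor copy
(1), and the reserved GDT blocks (383), which accounts for the gap between
32768 and the 32383 free blocks reported. A real check would pass the
metadata blocks of every group, not just those of group 25.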




2015-05-22 00:11:30

by Phillip Susi

Subject: Re: Unused block group, but all blocks not free?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 05/21/2015 07:59 PM, Theodore Ts'o wrote:
> Ah, so it's pretty self-explanatory. From the dumpe2fs of the
> image:
>
> Group 25: (Blocks 819200-851967) csum 0x45a8 [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
>                   ^^^^^^^^^^^^^
>   Backup superblock at 819200, Group descriptors at 819201-819201
>                        ^^^^^^                       ^^^^^^^^^^^^^
>   Reserved GDT blocks at 819202-819584
>                          ^^^^^^^^^^^^^
>   Block bitmap at 524297 (bg #16 + 9)
>   Inode bitmap at 524313 (bg #16 + 25)
>   Inode table at 528928-529439 (bg #16 + 4640)
>   32383 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
>   Free blocks: 819585-851967
>   Free inodes: 204801-212992

Interesting... I don't get that information from debugfs.
Specifically it doesn't give the blocks that comprise the group ( but
that is calculated easily enough ), nor list which blocks are reserved
for the GDT. I'm using 1.42.12, is that a more recent feature?

I had actually just figured out that 25 is where the superblock and
GDT backups are, and therefore, should also contain part of the resize
inode. It seems that e2defrag adds the blocks owned by reserved
inodes after checking the free count so I'll have to rejigger it to do
that first.
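
Both calculations mentioned here, the block range of a group and whether
it carries superblock and GDT backups, are easy to sketch. The Python
below is a hedged illustration with hypothetical function names
(`group_block_range`, `has_backup_super`), not code from e2fsprogs or
e2defrag. It assumes the sparse_super feature is enabled, under which
backups live in groups 0 and 1 and in groups whose number is a power of
3, 5, or 7, and it assumes a first data block of 0.

```python
def group_block_range(group, blocks_per_group, first_data_block=0):
    """Return (first, last) block numbers of a full-sized block group."""
    first = first_data_block + group * blocks_per_group
    return first, first + blocks_per_group - 1


def is_power_of(n, base):
    """True if n == base**k for some k >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def has_backup_super(group):
    """True if the group holds a superblock copy under sparse_super."""
    if group <= 1:
        return True
    return any(is_power_of(group, base) for base in (3, 5, 7))


print(group_block_range(25, 32768))
# prints "(819200, 851967)"
print([g for g in range(50) if has_backup_super(g)])
# prints "[0, 1, 3, 5, 7, 9, 25, 27, 49]"
```

Group 25 qualifies because 25 is 5 squared. The last group of a file
system may be shorter than `blocks_per_group`, which this sketch ignores.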

I wonder though, why I didn't see these problems the last time I used
e2defrag, which was probably 2 years ago and was on a filesystem with
a resize inode, and flex_bg and uninitialized bitmaps. At the time it
seemed that uninitialized bitmaps were not used on groups with blocks
used for these sorts of things. I guess that must also be a more
recent change.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBCgAGBQJVXnNxAAoJENRVrw2cjl5RQ9IH/2fO3y4PkV6lUMnyfzfoDxGX
vpv9U1jcKa1KVkbycT7ghB5JF2PugdWA5Y2PB0yrNKpQjhNTcMXx7aa1pAx8JbBc
zzYoOFMyMw6nTPdOz+rLH+VZwI/SqGyy6uLsaIdoQ/FJ5Xcq2mpgRQUt7DQNYXEW
xGdtGWiyxFdVt6SmBOSS0SDfyYnRmv/erqy1BvpZGDL6syVjPfxTcTHaHOY6wKL2
sWcvLO2ynR+ry/c+d6XnWDZb5dz9dZQTZaCj2FVEEsFMDFtQQEu0Eb5/LWGLnQap
TSUfub1fGhRW9ciJBkoC52XKwiMVWKBi59/VpYixIN2s0S+Tiu4+Yn/x1BP3udk=
=Re+Z
-----END PGP SIGNATURE-----

2015-05-22 02:28:41

by Theodore Ts'o

Subject: Re: Unused block group, but all blocks not free?

On Thu, May 21, 2015 at 08:08:17PM -0400, Phillip Susi wrote:
>
> I wonder though, why I didn't see these problems the last time I used
> e2defrag, which was probably 2 years ago and was on a filesystem with
> a resize inode, and flex_bg and uninitialized bitmaps. At the time it
> seemed that uninitialized bitmaps were not used on groups with blocks
> used for these sorts of things. I guess that must also be a more
> recent change.

The change was that for uninitialized block bitmaps, dumpe2fs used to
display incorrect information (that is, it would claim all of the
blocks were free, even though in fact that was not true).

People complained this wasn't actually accurate, so what we are
currently doing is considered more correct(tm).

- Ted

2015-05-22 12:38:13

by Phillip Susi

Subject: Re: Unused block group, but all blocks not free?

On 5/21/2015 10:28 PM, Theodore Ts'o wrote:
> On Thu, May 21, 2015 at 08:08:17PM -0400, Phillip Susi wrote:
> The change was that for uninitialized block bitmaps, dumpe2fs used to
> display incorrect information (that is, it would claim all of the
> blocks were free, even though in fact that was not true).

It seems to have actually been a change on disk, since e2defrag *used*
to count the number of free blocks assuming an uninitialized bitmap
meant that they were all free, and get the same number reported in the
superblock. This was probably prior to the change you are thinking of.

IIRC, when I looked using debugfs, the block groups containing metadata
always had their bitmaps initialized.


2015-05-23 03:07:09

by Theodore Ts'o

Subject: Re: Unused block group, but all blocks not free?

On Fri, May 22, 2015 at 08:37:26AM -0400, Phil Susi wrote:
> On 5/21/2015 10:28 PM, Theodore Ts'o wrote:
> >On Thu, May 21, 2015 at 08:08:17PM -0400, Phillip Susi wrote:
> >The change was that for uninitialized block bitmaps, dumpe2fs used to
> >display incorrect information (that is, it would claim all of the
> >blocks were free, even though in fact that was not true).
>
> It seems to have actually been a change on disk, since e2defrag *used* to
> count the number of free blocks assuming an uninitialized bitmap meant that
> they were all free, and get the same number reported in the superblock.
> This was probably prior to the change you are thinking of.

The change was in libext2fs; it now actually initializes the portion of
the bitmap coming from uninitialized block groups instead of leaving
that portion of the bitmap as all zero.

> IIRC, when I looked using debugfs, the block groups containing metadata
> always had their bitmaps initialized.

That's a different change. We checked that the kernel and e2fsprogs
were doing the right thing if block groups containing metadata were
left uninitialized (and in fact had been doing the right thing for a
long time), and so we started allowing mke2fs to mark those block
groups as uninitialized.

We didn't check e2defrag because I didn't realize you had resurrected
it. (At least when I last looked at it, I was too scared about its
error handling, etc., and so I had deliberately declined to try to get
it into e2fsprogs, as it was something I wasn't willing to support. So
you are a braver person than I....)

- Ted

2015-05-23 15:39:57

by Phillip Susi

Subject: Re: Unused block group, but all blocks not free?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 05/22/2015 11:05 PM, Theodore Ts'o wrote:
> We didn't check e2defrag because I didn't realize you had
> resurrected it. (At least when I last looked at it, I was too
> scared about its error handling, etc., and so I had deliberately
> declined to try to get it into e2fsprogs, as it was something I
> wasn't willing to support. So you are a braver person than I....)

Of course... I rescued it a few years ago after Debian removed it (
before e4defrag was around ), got it fixed up to work on modern
filesystems, and it's hosted at launchpad.net/e2defrag. I break it out
every now and again for performance testing since it does a full
defrag rather than only files ( maximizing large free extents ) and has
the ability to pack files in a specified order, such as the order they
are read during boot. Well, that and I still find the ANSI block map
just as fun to watch as when I first used it in the '90s on a 486 with
2 MB of RAM ;)

I really like the algorithm you guys came up with back then for moving
blocks from where they are to their optimal position in mostly large
sequential chunks and without having to move them more than once.
Yeah, it's not crash resilient, but it is FAST.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBCgAGBQJVYJ9IAAoJENRVrw2cjl5R5+YIAIjUyZ869wMO1LOMpcmjt0Rs
XRt2VJHwkT8y1U+STsEeMOOR5TydR+xWyG4ExaoPW+WOzROQNs+vbEH9tQZvBVNm
e6BLnKPPAYX392Ar3siufF87ivcGlM3SyQxrZgaj+EIlSpVbOVE6pnx2aSLhPhz5
znXL0sEioK+F/KxHNRfQyAtqPOc/OkJ1l1csanPUJFlEgQEuu4rSgaLvc3e4Y1pO
M0b09dqQHLJcGOSn+me22laLKAlTxhZWYkRoJrHielgX7sCukYMe5hEu6Nc/Gzt4
T/rAca/vmM9R5owExrbNydwNExnwZcB0bZT5ZBqU+9U/BwouJalKkGUhsH+zbpE=
=ahhy
-----END PGP SIGNATURE-----