I've just pushed out a large number of updates to e2fsprogs's git
repository. The maint branch is close to what I hope to release as
1.42.7. It has the flex_bg resize2fs fixes, and is gcc -Wall clean.
Please test it out, and let me know if you see any issues before we do a
release.
One known issue. There still seem to be some resize2fs problems when
the 64-bit feature is set when doing off-line (unmounted) resizing:
touch /mnt/testfs
truncate -s 8T /mnt/testfs
mke2fs -t ext4 -O 64bit /mnt/testfs
e2fsck -fy /mnt/testfs
truncate -s 20T /mnt/testfs
resize2fs -p /mnt/testfs
e2fsck -fy /mnt/testfs
#
# Pass 5 errors found:
#
# Free blocks count wrong (1031374305, counted=5326341601).
# Fix<y>? yes
truncate -s 21T /mnt/testfs
resize2fs -p /mnt/testfs
e2fsck -fy /mnt/testfs
#
# Pass 5 errors found:
#
# Block bitmap differences: +(1073774592--1073807359)
# Fix<y>? yes
# Free blocks count wrong for group #32769 (32768, counted=0).
# Fix<y>? yes
# Free blocks count wrong (1297725793, counted=5592660321).
# Fix<y>? yes
Apparently somehow the blocks associated with the journal inode got
released as part of the resize2fs. I'm not sure what's going on, but
this is clearly something we'll want to fix up before we release 1.42.7.
Still, it's a much less serious set of problems than we had before.
- Ted
On Tue, Jan 01, 2013 at 10:24:20PM -0500, Theodore Ts'o wrote:
>
> One known issue. There still seem to be some resize2fs problems when
> the 64-bit feature is set when doing off-line (unmounted) resizing.
One thought.... since the problems are limited to issues with the bg
accounting and block bitmap, one hack we could do if we can't figure
out the bug soon enough would be to throw in a check so that, when we
are doing an off-line resize of a 64-bit file system, we mark the file
system as needing an fsck and ask the user to run fsck before using
the file system.
It's ugly, and it will screw up programs like parted which invoke
resize2fs, but (a) it's better than what happens if people try using
resize2fs with 64-bit file systems with current versions of e2fsprogs,
and (b) at least online resizing with 64-bit file systems works
(assuming you have a sufficiently new kernel).
Not something I want to do, but if it takes too long to track down
these last defects with resize2fs, I'd rather get 1.42.7 out the door
sooner rather than later, given the other known bugs in resize2fs and
e2fsck which are unfixed in 1.42.6 and earlier versions (see the
updated RELEASE-NOTES for more details).
- Ted
On 1/2/13 9:27 AM, Theodore Ts'o wrote:
> On Tue, Jan 01, 2013 at 10:24:20PM -0500, Theodore Ts'o wrote:
>>
>> One known issue. There still seem to be some resize2fs problems when
>> the 64-bit feature is set when doing off-line (unmounted) resizing.
>
> One thought.... since the problems are limited to issues with the bg
> accounting and block bitmap, one hack we could do if we can't figure
> out the bug soon enough would be to throw in a check so that, when we
> are doing an off-line resize of a 64-bit file system, we mark the file
> system as needing an fsck and ask the user to run fsck before using
> the file system.
>
> It's ugly, and it will screw up programs like parted which invoke
> resize2fs, but (a) it's better than what happens if people try using
> resize2fs with 64-bit file systems with current versions of e2fsprogs,
> and (b) at least online resizing with 64-bit file systems works
> (assuming you have a sufficiently new kernel).
>
> Not something I want to do, but if it takes too long to track down
> these last defects with resize2fs, I'd rather get 1.42.7 out the door
> sooner rather than later, given the other known bugs in resize2fs and
> e2fsck which are unfixed in 1.42.6 and earlier versions (see the
> updated RELEASE-NOTES for more details).
Apologies for not following more closely, but is this problem
a new regression, an old regression, or something that has never
worked?
-Eric
On Wed, Jan 02, 2013 at 09:54:26AM -0600, Eric Sandeen wrote:
>
> Apologies for not following more closely, but is this problem
> a new regression, an old regression, or something that has never
> worked?
I'm not 100% sure, since we had done _some_ 64-bit off-line resize
testing back when we merged 64-bit support, but it's possible that
this was a problem that had been missed.
Part of the problem is we don't have any automated regression testing
for resize2fs, since creating test file systems is slow --- doing a
complete set of tests would probably take hours and hours, and would
require having a file system capable of 64-bit logical block numbers
(such as XFS) mounted, and/or require using device mapper with
thin provisioning.
The much more serious problems were resizing ext4 file systems
(specifically, file systems with the flex_bg feature enabled) when we
had run out of reserved gdt blocks in the resize inode, or if there
was no resize inode at all. There was a safety check protecting users
who fell in the latter category, but if you deliberately created a
file system with a smaller resize inode, and then tried to resize to a
file system size larger than the resize inode, the result was inode
table corruption, as George Spelvin discovered. This specific
resize2fs problem was not unique to 64-bit file systems, but was much
more likely to trigger with large 64-bit file systems.
I'm pretty sure we have two separate problems going on at this point.
One is that in some cases, the free blocks count is corrupted after a
64-bit resize. That one seems pretty easy to find and fix; we're
probably overflowing a 32-bit blk_t somewhere that needs to be a
blk64_t. The other one is a mysterious problem where apparently the
blocks associated with the journal inode get marked as cleared after
an off-line resize. This is the one which is scarier, but thinking
about it, we can probably find this using some debugging code in the
block bitmap functions to trigger a breakpoint when those blocks get
cleared, so we can figure out what is happening at that point.
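For what it's worth, the numbers in the e2fsck output from the
reproduction at the top of the thread are consistent with the 32-bit
overflow theory. A quick sketch in shell arithmetic (the low32 helper
is made up for illustration, and this assumes a shell with 64-bit
arithmetic, such as bash):

```shell
# Hypothetical helper: keep only the low 32 bits of a block count,
# which is what storing it in a 32-bit variable would do.
low32() {
    echo $(( $1 & 0xFFFFFFFF ))
}

# First resize (8T -> 20T): e2fsck said "1031374305, counted=5326341601".
echo $(( 5326341601 - 1031374305 ))   # prints 4294967296, i.e. 2^32
low32 5326341601                      # prints 1031374305

# Second resize (20T -> 21T): e2fsck said "1297725793, counted=5592660321".
low32 5592660321                      # prints 1297693025
echo $(( 1297725793 - $(low32 5592660321) ))   # prints 32768
```

The leftover 32768 in the second case matches the one group (#32769)
whose free blocks count was off by 32768, so it looks like the same
truncation on top of the bitmap problem. This doesn't say where in the
code the truncation happens, of course.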
After we fix these two problems, the sort of testing we should do to
make sure off-line resizing is sane would be to fill a file system
with some test data, checksum all of the data files, run resize2fs,
then run e2fsck on the resulting file system, and recheck the
checksums of the data files to make sure nothing got crunched.
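A rough sketch of what that could look like, assuming GNU coreutils
(sha256sum, truncate) and a file system image that can be loop-mounted.
The function names and arguments are made up for illustration, and the
mount step needs root:

```shell
# Hypothetical helpers; names and arguments are illustrative only.

# Write a checksum manifest for every regular file under directory $1 to $2.
record_checksums() {
    (cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2) > "$2"
}

# Re-check directory $1 against manifest $2; non-zero exit on any mismatch.
verify_checksums() {
    (cd "$1" && sha256sum --quiet -c) < "$2"
}

# Grow image $1 to size $2 off-line, fsck it, then loop-mount it read-only
# on $3 and compare against manifest $4.  Needs root for mount/umount.
resize_and_check() {
    truncate -s "$2" "$1" &&
    resize2fs -p "$1" &&
    e2fsck -fy "$1" &&          # non-zero exit means e2fsck found problems
    mount -o loop,ro "$1" "$3" || return 1
    verify_checksums "$3" "$4"
    status=$?
    umount "$3"
    return $status
}
```

The idea would be to mount the image, fill it with data, run
record_checksums on the mount point, unmount, and then call
resize_and_check for each size step. Keep the manifest outside the
file system being tested.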
- Ted