Can't remember how I stumbled on this testcase, but mounting
an ext3 filesystem with "-t ext4" and then resizing leads to trouble.
With -o nodelalloc, the newly added space isn't seen by the allocator
and we get ENOSPC for the extending writes in the script below.
Without -o nodelalloc, the writes worked but I got an umount hang.
Without -t ext4 (but letting ext4.ko handle the ext3 mount) it seems
to work fine.
Haven't looked into it much at all yet but wanted to put it out
there for posterity.
(script requires xfs_io, sorry, could be changed to dd I suppose)
-Eric
#!/bin/bash
# Initial setup to create block devices prior to test:
# /root/fallocate -l 100g fsfile
# losetup /dev/loop0 fsfile
# pvcreate /dev/loop0
# vgcreate VG /dev/loop0
COUNT=20
mkdir -p mnt
umount mnt &>/dev/null
lvremove -f /dev/VG/LV
lvcreate -L 75G -n LV /dev/VG
mkfs.ext3 -K /dev/VG/LV
mount -t ext4 -o nodelalloc /dev/VG/LV mnt/
for I in `seq 1 $COUNT`; do mkdir mnt/dir$I; dd if=/dev/zero of=mnt/dir$I/file$I bs=1M count=4096; done
echo "before growing:"
df mnt/
umount mnt
mount -t ext4 -o nodelalloc /dev/VG/LV mnt/
lvextend -L +5g /dev/VG/LV
echo "growing:"
resize2fs /dev/VG/LV
echo "done growing:"
df mnt/
# This gets ENOSPC for all of them
echo "try extending files:"
for I in `seq 1 $COUNT`; do xfs_io -f -F -c "pwrite -b 60m 4g 120m" mnt/dir$I/file$I; done
df mnt/
umount mnt
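The note above says the xfs_io step could be changed to dd. A minimal sketch of that substitution follows; it is untested against the bug itself. The helper name `extend_file_dd` is hypothetical, and the mapping from `pwrite -b 60m 4g 120m` to "write 120 MiB at a 4 GiB offset" is an assumption about what the extending write needs to do. dd writes zeroes where xfs_io writes a non-zero fill pattern, and the I/O size differs, which should not matter for block allocation.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): write LEN_MIB MiB
# of zeroes into FILE starting at OFFSET_MIB MiB, leaving existing data
# in place (conv=notrunc), so the file is extended rather than rewritten.
extend_file_dd() {
    local file=$1 offset_mib=$2 len_mib=$3
    dd if=/dev/zero of="$file" bs=1M seek="$offset_mib" count="$len_mib" \
        conv=notrunc
}
```

In the loop, `extend_file_dd mnt/dir$I/file$I 4096 120` would stand in for the xfs_io call; dd exits non-zero on ENOSPC, so the failure should still be visible.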
On Mon, Feb 18, 2013 at 03:41:11PM -0600, Eric Sandeen wrote:
> Can't remember how I stumbled on this testcase, but mounting
> an ext3 filesystem with "-t ext4" and then resizing leads to trouble.
>
> With -o nodelalloc, the newly added space isn't seen by the allocator
> and we get ENOSPC for the extending writes in the script below.
>
> Without -o nodelalloc, the writes worked but I got an umount hang.
>
> Without -t ext4 (but letting ext4.ko handle the ext3 mount) it seems
> to work fine.
>
> Haven't looked into it much at all yet but wanted to put it out
> there for posterity.
At least one of the problems is that ext4_alloc_blocks() is buggy if
it is asked to allocate one or more indirect blocks, and then it
doesn't have room to allocate any direct blocks. In that case,
ext4_alloc_blocks() does not return ENOSPC, and so ext4_alloc_branch()
doesn't fail. But since the number of direct blocks allocated is
zero, ext4_splice_branch() will not actually initialize the indirect
block, and then we end up looping forever and calling
ext4_mballoc_alloc() --- demonstrating that one of the best definitions
of insanity is doing the same thing over and over again and expecting
a different result:
flush-254:32-2913 [001] .... 1073.028245: ext4_mballoc_alloc: dev 254,32 inode 21824 orig 0/4031/64@4981 goal 0/4029/2048@4096 result 0/0/0@0 blks 0 grps 0 cr 3 flags 0x0c20 tail 0 broken 0
flush-254:32-2913 [001] .... 1073.050655: ext4_mballoc_alloc: dev 254,32 inode 21824 orig 0/4031/64@4981 goal 0/4029/2048@4096 result 0/0/0@0 blks 0 grps 0 cr 3 flags 0x0c20 tail 0 broken 0
flush-254:32-2913 [001] .... 1073.073034: ext4_mballoc_alloc: dev 254,32 inode 21824 orig 0/4031/64@4981 goal 0/4029/2048@4096 result 0/0/0@0 blks 0 grps 0 cr 3 flags 0x0c20 tail 0 broken 0
flush-254:32-2913 [001] .... 1073.112163: ext4_mballoc_alloc: dev 254,32 inode 21824 orig 0/4031/64@4981 goal 0/4029/2048@4096 result 0/0/0@0 blks 0 grps 0 cr 3 flags 0x0c20 tail 0 broken 0
I suspect the right way to deal with this is to nuke
ext4_alloc_blocks() from orbit, and change ext4_alloc_branch() to
allocate the indirect and direct blocks directly, calling
ext4_new_meta_block() and ext4_mb_new_blocks() directly. What we have
right now is pretty gross....
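One way to spot this kind of loop in a captured trace is to drop the timestamp from each event and count identical lines. The sketch below assumes the trace text has the same layout as the lines quoted above; the function name `count_repeated_allocs` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read ftrace output on stdin, keep only
# ext4_mballoc_alloc events, strip the "seconds.microseconds:" timestamp,
# and print each distinct event with the number of times it occurred,
# most frequent first.
count_repeated_allocs() {
    grep 'ext4_mballoc_alloc:' \
        | sed -E 's/ [0-9]+\.[0-9]+: / /' \
        | sort | uniq -c | sort -rn
}
```

Fed from a saved trace file (for example a copy of /sys/kernel/debug/tracing/trace, assuming the ext4 tracepoints were enabled), a single event with a very large count and "result 0/0/0@0 blks 0" would be consistent with the loop described here.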
The other problem is why resizing isn't adding the blocks so that they
are visible to the allocator. Since we are using the same code path
for ext3 and ext4 file systems, I have a sneaking suspicion that we're
not actually making all of the newly added blocks available on ext4
file systems either; it may be something like not making the first
block in each flex_bg group available (which happens to be all of the
newly grown blocks for ext3 file systems).
As near as I can tell this isn't a regression, but since this is a
pretty serious bug, it's something we should try to fix during the
3.8 development cycle.
- Ted
On 2/21/13 12:07 AM, Theodore Ts'o wrote:
> On Mon, Feb 18, 2013 at 03:41:11PM -0600, Eric Sandeen wrote:
>> Can't remember how I stumbled on this testcase, but mounting
>> an ext3 filesystem with "-t ext4" and then resizing leads to trouble.
>>
>> With -o nodelalloc, the newly added space isn't seen by the allocator
>> and we get ENOSPC for the extending writes in the script below.
>>
>> Without -o nodelalloc, the writes worked but I got an umount hang.
>>
>> Without -t ext4 (but letting ext4.ko handle the ext3 mount) it seems
>> to work fine.
>>
>> Haven't looked into it much at all yet but wanted to put it out
>> there for posterity.
>
> At least one of the problems is that ext4_alloc_blocks() is buggy if
> it is asked to allocate one or more indirect blocks, and then it
> doesn't have room to allocate any direct blocks. In that case,
> ext4_alloc_blocks() does not return ENOSPC, and so ext4_alloc_branch()
> doesn't fail. But since the number of direct blocks allocated is
> zero, ext4_splice_branch() will not actually initialize the indirect
> block, and then we end up looping forever and calling
> ext4_mballoc_alloc() --- demonstrating that one of the best definitions
> of insanity is doing the same thing over and over again and expecting
> a different result:
>
> flush-254:32-2913 [001] .... 1073.028245: ext4_mballoc_alloc: dev 254,32 inode 21824 orig 0/4031/64@4981 goal 0/4029/2048@4096 result 0/0/0@0 blks 0 grps 0 cr 3 flags 0x0c20 tail 0 broken 0
> flush-254:32-2913 [001] .... 1073.050655: ext4_mballoc_alloc: dev 254,32 inode 21824 orig 0/4031/64@4981 goal 0/4029/2048@4096 result 0/0/0@0 blks 0 grps 0 cr 3 flags 0x0c20 tail 0 broken 0
> flush-254:32-2913 [001] .... 1073.073034: ext4_mballoc_alloc: dev 254,32 inode 21824 orig 0/4031/64@4981 goal 0/4029/2048@4096 result 0/0/0@0 blks 0 grps 0 cr 3 flags 0x0c20 tail 0 broken 0
> flush-254:32-2913 [001] .... 1073.112163: ext4_mballoc_alloc: dev 254,32 inode 21824 orig 0/4031/64@4981 goal 0/4029/2048@4096 result 0/0/0@0 blks 0 grps 0 cr 3 flags 0x0c20 tail 0 broken 0
>
> I suspect the right way to deal with this is to nuke
> ext4_alloc_blocks() from orbit, and change ext4_alloc_branch() to
> allocate the indirect and direct blocks directly, calling
> ext4_new_meta_block() and ext4_mb_new_blocks() directly. What we have
> right now is pretty gross....
>
> The other problem is why resizing isn't adding the blocks so that they
> are visible to the allocator. Since we are using the same code path
> for ext3 and ext4 file systems, I have a sneaking suspicion that we're
> not actually making all of the newly allocated blocks for ext4 file
> systems available too, but it's something like we're not making the
> first block in each flex_bg group available (and that happens to be all
> of the newly grown blocks for ext3 file systems).
>
> As near as I can tell this isn't a regression, but since this is a
> pretty serious bug, it's something we should try to fix during the
> 3.8 development cycle.
I think you're correct that it's not a regression. Now I remember that
the same basic bug came up w/ a RHEL customer who was doing this
"mkfs ext3; mount -t ext4 -o nodelalloc" business. Which really isn't
tested or supported, but it's still quite the odd corner case.
(Being RHEL, though, it wasn't clear that the same bug persisted upstream,
but it appears that it does. The failure was more noticeable on the older
RHEL kernel because it spewed the ext4 allocation context errors as well.)
Thanks,
-Eric
> - Ted
>
On Mon, Feb 18, 2013 at 03:41:11PM -0600, Eric Sandeen wrote:
> Can't remember how I stumbled on this testcase, but mounting
> an ext3 filesystem with "-t ext4" and then resizing leads to trouble.
Is there a bugzilla entry for this? I found the problem and a fix;
patch follows in a moment.
Also, here's a simplified repro that doesn't require LVM. The bug was
introduced in commit fb0a387dcdc, so it goes back to 2.6.32. It only
affects block allocations for files that aren't extent mapped, and is
caused by the fact that the online resizer wasn't updating
s_blockfile_groups (which was introduced in commit fb0a387dcdc).
- Ted
#!/bin/bash
COUNT=15
SIZE_1=15G
SIZE_2=16G
DEVICE=/dev/vdc
XFS_IO=/root/xfstests/bin/xfs_io
mkdir -p mnt
umount mnt &>/dev/null
mkfs.ext3 $DEVICE $SIZE_1
mount -t ext4 -o nodelalloc $DEVICE mnt/
for I in `seq 1 $COUNT`; do mkdir mnt/dir$I; dd if=/dev/zero of=mnt/dir$I/file$I bs=1M count=1024; done
echo "before growing:"
df mnt/
umount mnt
mount -t ext4 -o nodelalloc $DEVICE mnt/
echo "growing:"
#export RESIZE2FS_KERNEL_VERSION=3.2.0
strace -o /tmp/resize2fs.strace resize2fs $DEVICE $SIZE_2
echo "done growing:"
df mnt/
# This gets ENOSPC for all of them
echo "try extending files:"
for I in `seq 1 $COUNT`; do $XFS_IO -f -F -c "pwrite -b 4m 1G 50m" mnt/dir$I/file$I; done
df mnt/
umount mnt
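Since the bug only affects files that aren't extent mapped, it can help to confirm which kind of file a test run actually created. A small sketch using lsattr, which shows an `e` flag for files that use extents; the helper names `has_extent_flag` and `report_mapping` are hypothetical, and the "no e flag means indirect-mapped" reading assumes lsattr works on the file system in question.

```shell
#!/bin/bash
# Hypothetical helper: succeed if an lsattr flag string (e.g.
# "-------------e--") contains the 'e' (extents) flag.
has_extent_flag() {
    case $1 in
        *e*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Hypothetical wrapper: print the apparent mapping type of each path.
report_mapping() {
    local path flags
    for path in "$@"; do
        # First column of lsattr output is the flag string.
        flags=$(lsattr -d "$path" 2>/dev/null | awk '{print $1}')
        if has_extent_flag "$flags"; then
            echo "$path: extent-mapped"
        else
            echo "$path: indirect-mapped (or flags unavailable)"
        fi
    done
}
```

On the mkfs.ext3 file system from the repro, `report_mapping mnt/dir1/file1` would be expected to report an indirect-mapped file, since the extents feature is not enabled there.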
Commit fb0a387dcdc restricts block allocations for indirect-mapped
files to block groups less than s_blockfile_groups. However, the
online resizing code wasn't setting s_blockfile_groups, so the newly
added block groups were not available for non-extent mapped files.
Reported-by: Eric Sandeen <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/resize.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index e349853..08d2312 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1341,6 +1341,8 @@ static void ext4_update_super(struct super_block *sb,
 
 	/* Update the global fs size fields */
 	sbi->s_groups_count += flex_gd->count;
+	sbi->s_blockfile_groups = min_t(ext4_group_t, sbi->s_groups_count,
+			(EXT4_MAX_BLOCK_FILE_PHYS / EXT4_BLOCKS_PER_GROUP(sb)));
 
 	/* Update the reserved block counts only once the new group is
 	 * active. */
--
1.7.12.rc0.22.gcdd159b
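The effect of the two added lines can be worked through with shell arithmetic for the 15G to 16G resize in the repro. This is a rough sketch, not kernel code. It assumes 4 KiB blocks, 8 * block size blocks per group, and EXT4_MAX_BLOCK_FILE_PHYS equal to 0xFFFFFFFF; none of these values are stated in the thread. The group count is approximated by simple division.

```shell
#!/bin/bash
# Assumed values (not taken from the thread):
BLOCK_SIZE=4096                          # 4 KiB blocks
BLOCKS_PER_GROUP=$((8 * BLOCK_SIZE))     # one bitmap block covers a group
MAX_BLOCK_FILE_PHYS=$((0xFFFFFFFF))      # highest block an indirect map can address

# Approximate number of block groups for a file system of N GiB.
groups_for_size_gib() {
    local gib=$1
    local blocks=$((gib * 1024 * 1024 * 1024 / BLOCK_SIZE))
    echo $(( (blocks + BLOCKS_PER_GROUP - 1) / BLOCKS_PER_GROUP ))
}

# Same shape as the min_t() expression in the patch:
# min(groups_count, MAX_BLOCK_FILE_PHYS / BLOCKS_PER_GROUP).
blockfile_groups() {
    local groups=$1
    local limit=$((MAX_BLOCK_FILE_PHYS / BLOCKS_PER_GROUP))
    if [ "$groups" -lt "$limit" ]; then echo "$groups"; else echo "$limit"; fi
}

before=$(groups_for_size_gib 15)
after=$(groups_for_size_gib 16)
echo "groups before resize: $before, after: $after"
echo "s_blockfile_groups with the fix: $(blockfile_groups "$after")"
echo "groups hidden from indirect-mapped files without the fix: $((after - before))"
```

Under these assumptions the file system grows from 120 to 128 groups. Without the fix, s_blockfile_groups stays at the mount-time value, so the 8 new groups are never offered to indirect-mapped files, which matches the ENOSPC in the repro.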
On 4/21/13 6:30 PM, Theodore Ts'o wrote:
> On Mon, Feb 18, 2013 at 03:41:11PM -0600, Eric Sandeen wrote:
>> Can't remember how I stumbled on this testcase, but mounting
>> an ext3 filesystem with "-t ext4" and then resizing leads to trouble.
>
> Is there a bugzilla entry for this? I found the problem and a fix;
> patch follows in a moment.
Sorry, I took a look and I think I failed to file a bug.
> Also, here's a simplified repro that doesn't require LVM. The bug was
> introduced in commit fb0a387dcdc, so it goes back to 2.6.32.
Argh. Karma! Sorry about that, and thanks for taking a look.
-Eric
> It only
> affects block allocations for files that aren't extent mapped, and is
> caused by the fact that the online resizer wasn't updating
> s_blockfile_groups (which was introduced in commit fb0a387dcdc).
>
> - Ted
>
On Sun, Apr 21, 2013 at 08:38:14PM -0500, Eric Sandeen wrote:
> > Also, here's a simplified repro that doesn't require LVM. The bug was
> > introduced in commit fb0a387dcdc, so it goes back to 2.6.32.
>
> Argh. Karma! Sorry about that, and thanks for taking a look.
No problem! Could you do me a favor and package up the repro as a
generic test case for xfstests? I figure this would be a good thing
to delegate to you, since you have a lot more experience creating
tests for xfstests.
- Ted
On 4/21/13 8:47 PM, Theodore Ts'o wrote:
> On Sun, Apr 21, 2013 at 08:38:14PM -0500, Eric Sandeen wrote:
>>> Also, here's a simplified repro that doesn't require LVM. The bug was
>>> introduced in commit fb0a387dcdc, so it goes back to 2.6.32.
>>
>> Argh. Karma! Sorry about that, and thanks for taking a look.
>
> No problem! Could you do me a favor and package up the repro as a
> generic test case for xfstests? I figure this would be a good thing
> to delegate to you, since you have a lot more experience creating
> tests for xfstests.
Normally I'd say doing is a great way to learn, but I'm grateful
to you for fixing my regression, so - sure thing :)
Thanks,
-Eric
> - Ted
>