2020-04-16 20:44:27

by Eric Sandeen

Subject: strange allocator behavior on a 2k block fs, skipping free blocks

This got picked up by xfstests generic/018 on a 2k block filesystem when it
failed to defragment a file into 1 extent as expected.

For some reason, the allocator is skipping over free blocks when it allocates
the donor file. The attached image shows this behavior - if you do:

# bunzip2 ext4.img.qcow.bz2
# qemu-img convert -O raw ext4.img.qcow ext4.img
# mkdir -p mnt
# mount -o loop ext4.img mnt/
# fallocate -l 20480 mnt/newfile
# filefrag -v mnt/newfile
Filesystem type is: ef53
File size of mnt/newfile is 20480 (10 blocks of 2048 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       1:      16962..     16963:      2:             unwritten
   1:        2..       9:      16968..     16975:      8:      16964: unwritten,eof
mnt/newfile: 2 extents found

it allocates 2 extents, even though the blocks in between the extents are free:

# dumpe2fs test.img | grep -w 16964
dumpe2fs 1.42.9 (28-Dec-2013)
Free blocks: 16964-16967, 16976-17407, 17410-17919, 17922-18431, 18434-18943, 18946-19455, 19457-19967, 19969-32767
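
To make the gap explicit, here is a small sketch that cross-checks the two outputs above. It only uses the numbers already shown; the helper names (parse_free_ranges, gap_is_free) are made up for illustration and are not part of e2fsprogs.

```python
import re

# Extents reported by filefrag above (inclusive physical block ranges).
extents = [(16962, 16963), (16968, 16975)]

# "Free blocks:" line copied from the dumpe2fs output above.
free_line = ("Free blocks: 16964-16967, 16976-17407, 17410-17919, "
             "17922-18431, 18434-18943, 18946-19455, 19457-19967, 19969-32767")


def parse_free_ranges(line):
    """Return inclusive (start, end) ranges from a dumpe2fs 'Free blocks:' line."""
    ranges = []
    for start, end in re.findall(r"(\d+)(?:-(\d+))?", line.split(":", 1)[1]):
        # A single free block is printed without a "-end" part.
        ranges.append((int(start), int(end or start)))
    return ranges


def gap_is_free(extents, free_ranges):
    """True if every block between the two extents lies inside one free range."""
    gap_start = extents[0][1] + 1   # first block after extent 0
    gap_end = extents[1][0] - 1     # last block before extent 1
    return any(lo <= gap_start and gap_end <= hi for lo, hi in free_ranges)


free_ranges = parse_free_ranges(free_line)
print(gap_is_free(extents, free_ranges))
```

The gap between the extents is blocks 16964-16967, which matches the first free range exactly.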

I suppose this isn't critical, as defrag is best-effort and the allocator doesn't ever guarantee contiguous allocations, but it still seems a little odd so just thought I'd highlight it.

Thanks,
-Eric


Attachments:
ext4.img.qcow.bz2 (29.57 kB)

2020-04-19 16:50:27

by Ritesh Harjani

Subject: Re: strange allocator behavior on a 2k block fs, skipping free blocks

Hello All,

On 4/17/20 12:46 AM, Eric Sandeen wrote:
> This got picked up by xfstests generic/018 on a 2k block filesystem when it
> failed to defragment a file into 1 extent as expected.
>
> For some reason, the allocator is skipping over free blocks when it allocates
> the donor file. The attached image shows this behavior - if you do:
>
> # bunzip2 ext4.img.qcow.bz2
> # qemu-img convert -O raw ext4.img.qcow ext4.img
> # mkdir -p mnt
> # mount -o loop ext4.img mnt/
> # fallocate -l 20480 mnt/newfile
> # filefrag -v mnt/newfile
> Filesystem type is: ef53
> File size of mnt/newfile is 20480 (10 blocks of 2048 bytes)
>  ext:     logical_offset:        physical_offset: length:   expected: flags:
>    0:        0..       1:      16962..     16963:      2:             unwritten
>    1:        2..       9:      16968..     16975:      8:      16964: unwritten,eof
> mnt/newfile: 2 extents found
>
> it allocates 2 extents, even though the blocks in between the extents are free:
>
> # dumpe2fs test.img | grep -w 16964
> dumpe2fs 1.42.9 (28-Dec-2013)
> Free blocks: 16964-16967, 16976-17407, 17410-17919, 17922-18431, 18434-18943, 18946-19455, 19457-19967, 19969-32767
>

My initial investigation, which I also verified with logs, suggests that
the following is happening:

1. fallocate first requests a length of 10 blocks. (Please note that the
   fallocate path does not set EXT4_MB_HINT_TRY_GOAL.)
   -> Since a length of 10 is not a power of 2, we choose allocation
      criteria 1 and go to ext4_mb_scan_aligned() with a stripe size
      of 2. That function only looks for 2 blocks (since the stripe
      size is 2 blocks), so ext4_map_blocks returns just those 2 blocks
      as the allocated blocks.
      This is where blocks (16962, 16963) come from.

2. The fallocate path then requests the remaining length, which is 8.
   Since 8 = 2^3 is a power of 2, we use criteria 0 this time and take
   the allocation path via ext4_mb_simple_scan_group().

In this 2nd iteration, the buddy structures are scanned to find the right
fit for the request. That's why we see two extents in the results above.
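
The two passes described above can be sketched as a toy model. This is not the kernel code: choose_criteria and simulate_fallocate are hypothetical names, the model ignores every other condition the real allocator checks, and it simply assumes a non-striped pass is granted in full.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


def choose_criteria(length):
    """Criteria 0 for power-of-2 request lengths, criteria 1 otherwise,
    following the description in this mail (a simplification)."""
    return 0 if is_power_of_two(length) else 1


def simulate_fallocate(total_blocks, stripe):
    """Return a list of (criteria, blocks_granted), one entry per pass."""
    passes = []
    remaining = total_blocks
    while remaining > 0:
        cr = choose_criteria(remaining)
        if cr == 1 and stripe > 0:
            # The aligned scan only looks for one stripe's worth of blocks.
            granted = min(stripe, remaining)
        else:
            # Assumption: the whole remaining request is satisfied at once.
            granted = remaining
        passes.append((cr, granted))
        remaining -= granted
    return passes


print(simulate_fallocate(10, 2))  # [(1, 2), (0, 8)]
```

With a request of 10 blocks and a stripe size of 2, the model reproduces the 2 + 8 split seen in the filefrag output.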

I guess that if we set the stripe size to 0, we will not see this
problem.

> I suppose this isn't critical, as defrag is best-effort and the allocator doesn't ever guarantee contiguous allocations, but it still seems a little odd so just thought I'd highlight it.

Others can say whether this is really a problem that needs fixing in
the long run.

-ritesh