2009-12-22 10:44:38

by Vyacheslav Dubeyko

Subject: About strange behaviour of ext4 allocation algorithm

Hello all,

I think I have found some strange behaviour in the ext4 allocation algorithm. Maybe I am wrong or am using outdated code, but this allocation policy looks strange to me.

First of all, I created an ext4 volume of 100 MB in size (mkfs.ext4 -b 1024 -L ext4 /dev/sdb1). After volume creation I have the following free space map ([group; begin; end]):
[group=0; begin=3815; end=8192],
[group=1; begin=8451; end=16384],
[group=2; begin=16385; end=24576],
[group=3; begin=24835; end=32768],
[group=4; begin=32769; end=40960],
[group=5; begin=41219; end=49152],
[group=6; begin=53249; end=57344],
[group=7; begin=57603; end=65536],
[group=8; begin=65537; end=73728],
[group=9; begin=73987; end=81920],
[group=10; begin=81921; end=90112],
[group=11; begin=90113; end=98304],
[group=12; begin=98305; end=106495],
[group=13; begin=106497; end=112419].

Then I mounted the created volume and generated a regular file of 95 MB in size on it with the command: dd if=/dev/urandom of=/ext4/001.bin bs=1048576 count=95. For this file I have the following extent tree ([LogicalBlock; PhysicalBlock; NumberOfBlocks]):

Depth = 1: [logical=0; physical=92161; size=1]
Depth = 0:
[logical=0; physical=4097; size=4096],
[logical=4096; physical=10241; size=14336],
[logical=18432; physical=26625; size=14336],
[logical=32768; physical=43009; size=6144],
[logical=38912; physical=53249; size=4096],
[logical=43008; physical=59393; size=14336],
[logical=57344; physical=75777; size=14336],
[logical=71680; physical=109569; size=2048],
[logical=73728; physical=92162; size=2047],
[logical=75775; physical=94753; size=1],
[logical=75776; physical=8451; size=1790],
[logical=77566; physical=24835; size=258],
[logical=77824; physical=41219; size=1790],
[logical=79614; physical=57603; size=970],
[logical=80584; physical=90825; size=1336],
[logical=81920; physical=94209; size=112],
[logical=82032; physical=111617; size=436],
[logical=82468; physical=94757; size=14812].
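
As a sanity check on the listing above, here is a minimal sketch that verifies the leaf extents cover the file without logical gaps and that their total length matches the dd command (95 x 1048576 bytes with 1024-byte blocks). The names EXTENTS, BLOCK_SIZE and check_extents are made up for this illustration and are not part of any ext4 tool.

```python
# Leaf extents copied from the listing: (logical, physical, length in blocks).
EXTENTS = [
    (0, 4097, 4096), (4096, 10241, 14336), (18432, 26625, 14336),
    (32768, 43009, 6144), (38912, 53249, 4096), (43008, 59393, 14336),
    (57344, 75777, 14336), (71680, 109569, 2048), (73728, 92162, 2047),
    (75775, 94753, 1), (75776, 8451, 1790), (77566, 24835, 258),
    (77824, 41219, 1790), (79614, 57603, 970), (80584, 90825, 1336),
    (81920, 94209, 112), (82032, 111617, 436), (82468, 94757, 14812),
]

BLOCK_SIZE = 1024  # from "mkfs.ext4 -b 1024"


def check_extents(extents):
    """Check that logical ranges are contiguous; return the total block count."""
    expected_logical = 0
    for logical, _physical, length in extents:
        if logical != expected_logical:
            raise ValueError(f"gap or overlap at logical block {logical}")
        expected_logical += length
    return expected_logical


total_blocks = check_extents(EXTENTS)
print(len(EXTENTS), "extents,", total_blocks, "blocks,",
      total_blocks * BLOCK_SIZE, "bytes")
```

The 18 leaf extents add up to 97280 blocks, which is exactly 95 x 1048576 bytes, so the listing is internally consistent.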

This allocation map for the file is strange.

Firstly, I can see that the extents [0; 4097; 4096], [4096; 10241; 14336], [18432; 26625; 14336], [32768; 43009; 6144], [43008; 59393; 14336], [57344; 75777; 14336], [71680; 109569; 2048] begin inside free space ranges (not at the start of a free range). But why? If this is a reservation policy for metadata blocks, then I don't understand why the index block of the extent tree [0; 92161; 1] is allocated so far from the beginning of the volume.
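
To put numbers on this first observation, here is a minimal sketch that reports how far into its free range each of these extents starts. It assumes the [begin; end] ranges in the free space map are inclusive, and that block 1 is the first data block (usual for a 1 KiB block size, but an assumption here). FREE_RUNS, FIRST_PASS_STARTS and offset_into_free_run are hypothetical names for this illustration.

```python
# Free space map from the listing, as (begin, end); assumed inclusive.
FREE_RUNS = [
    (3815, 8192), (8451, 16384), (16385, 24576), (24835, 32768),
    (32769, 40960), (41219, 49152), (53249, 57344), (57603, 65536),
    (65537, 73728), (73987, 81920), (81921, 90112), (90113, 98304),
    (98305, 106495), (106497, 112419),
]

# Physical start blocks of the extents named in the first observation.
FIRST_PASS_STARTS = [4097, 10241, 26625, 43009, 59393, 75777, 109569]


def offset_into_free_run(physical, runs):
    """Return the distance from the start of the free run containing `physical`."""
    for begin, end in runs:
        if begin <= physical <= end:
            return physical - begin
    return None  # block is not inside any listed free run


offsets = [offset_into_free_run(p, FREE_RUNS) for p in FIRST_PASS_STARTS]
# Remainder relative to block 1 (assumed first data block), modulo 1024 blocks.
remainders = [(p - 1) % 1024 for p in FIRST_PASS_STARTS]

for start, off, rem in zip(FIRST_PASS_STARTS, offsets, remainders):
    print(f"start={start} offset_in_free_run={off} (start-1)%1024={rem}")
```

The skipped prefix is 1790 blocks for five of the seven extents, and each start is a multiple of 1024 blocks past block 1. This arithmetic only describes the pattern; it does not by itself establish why the allocator chose these starts.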

Secondly, it is strange that after the extent [71680; 109569; 2048] the allocation algorithm first found [73728; 92162; 2047] and [75775; 94753; 1], and only then tried to search from the beginning of the volume [75776; 8451; 1790] (however, the free space [group=0; begin=3815; end=4096] was excluded from the search).

Thirdly, I can't understand why, during the "first search cycle" ([0; 4097; 4096] - [71680; 109569; 2048]), the allocation algorithm could not find the [82032; 111617; 436] extent. And why, after the [81920; 94209; 112] extent, did it find [82032; 111617; 436] instead of [82468; 94757; 14812]? Such strange block allocations are not a rare occurrence for larger files.

As far as I can see, the existing allocation algorithm increases the number of extents in the tree. The 95 MB file has 18 extents in the tree, but the volume initially (before allocation) had 13 free spaces (which is enough for the file allocation). Is it a bug or a feature?
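
For comparison with the extent count, here is a minimal sketch that merges free ranges which touch across group boundaries and totals the free blocks. It assumes the [begin; end] ranges are inclusive. It says nothing about what the allocator can achieve in practice (for example, whether a single extent may span groups). FREE_RUNS and merge_adjacent are hypothetical names.

```python
# Free space map from the listing, as (begin, end); assumed inclusive.
FREE_RUNS = [
    (3815, 8192), (8451, 16384), (16385, 24576), (24835, 32768),
    (32769, 40960), (41219, 49152), (53249, 57344), (57603, 65536),
    (65537, 73728), (73987, 81920), (81921, 90112), (90113, 98304),
    (98305, 106495), (106497, 112419),
]


def merge_adjacent(runs):
    """Merge runs where one ends exactly one block before the next begins."""
    merged = []
    for begin, end in sorted(runs):
        if merged and merged[-1][1] + 1 == begin:
            merged[-1] = (merged[-1][0], end)  # extend the previous run
        else:
            merged.append((begin, end))
    return merged


merged_runs = merge_adjacent(FREE_RUNS)
total_free = sum(end - begin + 1 for begin, end in FREE_RUNS)

print(len(FREE_RUNS), "listed ranges ->", len(merged_runs), "contiguous runs")
print("total free blocks:", total_free)
```

Under these assumptions the listed ranges collapse into 8 physically contiguous runs holding 103218 free blocks, against the 97280 data blocks plus one index block that the file needs.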

--
Vyacheslav Dubeyko <[email protected]>
Acronis


2009-12-22 22:24:43

by Andreas Dilger

Subject: Re: About strange behaviour of ext4 allocation algorithm

On 2009-12-22, at 03:42, Vyacheslav Dubeyko wrote:
> I think that I found some strange behaviour in ext4 allocation
> algorithm. Maybe I wrong or use not actual code but such allocation
> policy is strange from my point of view.

What kernel version are you using? I know Ted has looked into some
allocation problems, specifically related to uninitialized groups, but
I don't know when they were fixed.

> First of all, I has created ext4 volume of 100 Mb in size (mkfs.ext4
> -b 1024 -L ext4 /dev/sdb1).

If you delete your file, without reformatting the filesystem, and then
re-run the test, does it produce the same results? If not, then it is
likely you are seeing the problem with uninitialized groups that was
fixed a month or two ago.

> And I have such free space map after volume creation ([group; begin;
> end]):
> [group=0; begin=3815; end=8192],
> [group=1; begin=8451; end=16384],
> [group=2; begin=16385; end=24576],
> [group=3; begin=24835; end=32768],
> [group=4; begin=32769; end=40960],
> [group=5; begin=41219; end=49152],
> [group=6; begin=53249; end=57344],
> [group=7; begin=57603; end=65536],
> [group=8; begin=65537; end=73728],
> [group=9; begin=73987; end=81920],
> [group=10; begin=81921; end=90112],
> [group=11; begin=90113; end=98304],
> [group=12; begin=98305; end=106495],
> [group=13; begin=106497; end=112419].
>
> Then I mount created volume and has generated a regular file of 95
> Mb in size on it by command: dd if=/dev/urandom of=/ext4/001.bin
> bs=1048576 count=95. And for the file I have such extents' tree
> ([LogicalBlock; PhysicalBlock; NumberOfBlocks]):
>
> Depth = 1: [logical=0; physical=92161; size=1]
> Depth = 0:
> [logical=0; physical=4097; size=4096],
> [logical=4096; physical=10241; size=14336],
> [logical=18432; physical=26625; size=14336],
> [logical=32768; physical=43009; size=6144],
> [logical=38912; physical=53249; size=4096],
> [logical=43008; physical=59393; size=14336],
> [logical=57344; physical=75777; size=14336],
> [logical=71680; physical=109569; size=2048],
> [logical=73728; physical=92162; size=2047],
> [logical=75775; physical=94753; size=1],
> [logical=75776; physical=8451; size=1790],
> [logical=77566; physical=24835; size=258],
> [logical=77824; physical=41219; size=1790],
> [logical=79614; physical=57603; size=970],
> [logical=80584; physical=90825; size=1336],
> [logical=81920; physical=94209; size=112],
> [logical=82032; physical=111617; size=436],
> [logical=82468; physical=94757; size=14812].
>
> Such used space allocation map for file is strange.
>
> Firstly, I can see that extents [0; 4097; 4096], [4096; 10241;
> 14336], [18432; 26625; 14336], [32768; 43009; 6144], [43008; 59393;
> 14336], [57344; 75777; 14336], [71680; 109569; 2048] begins inside
> free spaces (not from begin of free space). But why? If it is a
> reserve policy for metadata blocks then I don't understand why index
> block of extents' tree [0; 92161; 1] allocates such far from volume
> begin.
>
> Secondly, it is strange that after extent [71680; 109569; 2048]
> allocation algorithm has found firstly [73728; 92162; 2047], [75775;
> 94753; 1] and only then try to search from volume begin [75776;
> 8451; 1790] (however, free space [group=0; begin=3815; end=4096] has
> excluded from search).
>
> Thirdly, I can't understand why during "first search cycle" ([0;
> 4097; 4096] - [71680; 109569; 2048]) allocation algorithm can't find
> [82032; 111617; 436] extent. And why after [81920; 94209; 112]
> extent it is found [82032; 111617; 436] instead of [82468; 94757;
> 14812]? Such strange block allocations is not rare occurence for
> files of greater size.

One problem with mballoc is that it is only doing "local optimal"
searching for freespace. It is searching consecutive block groups,
and if it doesn't find an optimal allocation, it uses the best
available one.
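
A toy sketch of that "local optimal" behaviour, to make the description concrete. This is not mballoc's code: the function pick_group, its parameters, and the idea of summarising each group by its largest free run are simplifications invented for this illustration.

```python
def pick_group(largest_free_run, start_group, wanted):
    """Toy model of a local search over consecutive block groups.

    Scan groups in order starting at start_group (wrapping around).
    Return (group, length) for the first group that can satisfy `wanted`
    in full; if none can, fall back to the best run seen during the scan.
    """
    count = len(largest_free_run)
    best_group, best_len = None, 0
    for step in range(count):
        group = (start_group + step) % count
        run = largest_free_run[group]
        if run >= wanted:
            return group, wanted  # good enough: stop searching here
        if run > best_len:
            best_group, best_len = group, run
    return best_group, best_len


# Largest free run per group for the first four groups of the listing
# (assuming inclusive ranges): 4378, 7934, 8192, 7934 blocks.
runs = [4378, 7934, 8192, 7934]
print(pick_group(runs, 0, 8192))   # request fits in one group
print(pick_group(runs, 0, 16384))  # request fits nowhere: best available
```

Because the scan stops at the first acceptable group and never compares against groups further away, a layout that looks better globally can be missed.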

> As I can see existing allocation algorithm grows extents count in
> tree. The file of 95 Mb has 18 extents in the tree. But volume
> initially (before allocation) had 13 free spaces (that enough for
> file allocation). Is it bug or feature?


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2009-12-23 10:53:07

by Vyacheslav Dubeyko

Subject: RE: About strange behaviour of ext4 allocation algorithm



> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Andreas Dilger
> Sent: Wednesday, December 23, 2009 1:25 AM
> To: Dubeyko, Vyacheslav
> Cc: [email protected]
> Subject: Re: About strange behaviour of ext4 allocation algorithm
>
> On 2009-12-22, at 03:42, Vyacheslav Dubeyko wrote:
>> I think that I found some strange behaviour in ext4 allocation
>> algorithm. Maybe I wrong or use not actual code but such allocation
>> policy is strange from my point of view.
>
> What kernel version are you using? I know Ted has looked into some allocation problems, specifically related to uninitialized groups, but I don't know
> when they were fixed.

I use kernel: 2.6.29.4-167.fc11.i686.PAE #1 SMP Wed May 27 17:28:22 EDT 2009 i686 i686 i386 GNU/Linux.

>> First of all, I has created ext4 volume of 100 Mb in size (mkfs.ext4
>> -b 1024 -L ext4 /dev/sdb1).
>
> If you delete your file, without reformatting the filesystem, and then re-run the test, does it produce the same results? If not, then it is likely you are
> seeing the problem with uninitialized groups that was fixed a month or two ago.

After deleting the file and re-running the test (without reformatting the filesystem) I get a slightly different extent tree. The index block (depth of the tree = 1) has moved and several extents have different sizes. But the nature of the extent sequence is the same.

--
Vyacheslav Dubeyko <[email protected]>
Acronis


2009-12-23 12:07:58

by Theodore Ts'o

Subject: Re: About strange behaviour of ext4 allocation algorithm

On Wed, Dec 23, 2009 at 01:52:48PM +0300, Vyacheslav Dubeyko wrote:
>
> I use kernel: 2.6.29.4-167.fc11.i686.PAE #1 SMP Wed May 27 17:28:22
> EDT 2009 i686 i686 i386 GNU/Linux.

Yeah, that was before a massive number of changes to the ext4
allocator. The changes to the allocators which speed up fsck
described here:

http://thunk.org/tytso/blog/2009/02/26/fast-ext4-fsck-times-revisited/

all went in *after* 2.6.29. That is, how the block and inode
allocators work changed significantly between 2.6.29 and 2.6.31.

> > If you delete your file, without reformatting the filesystem, and
> > then re-run the test, does it produce the same results? If not,
> > then it is likely you are seeing the problem with uninitialized
> > groups that was fixed a month or two ago.
>
> After deletion of the file and re-run test (without reformatting the
> filesystem) I have slightly different extents' tree. Index block
> (depth of the tree = 1) has changed place and several extents has
> another sizes. But nature of the extents' sequence is the same.

The change which Andreas was referring to --- taking out the bias
against opening up uninitialized block groups for allocations until
absolutely necessary, which had a tendency to cause unnecessary
fragmentation --- was merged into mainline between 2.6.30 and 2.6.31.

Best regards,

- Ted