2007-05-23 18:08:17

by Coly Li

Subject: [RFC 4/5] inode reservation v0.1 (benchmark result)

The current patch keeps inodes from different directories from being mixed
together in the inode table. The benchmark therefore emulates a situation
that would otherwise mix the inodes of different sub-directories together,
and records the time taken to remove them all.

In the first part, 16 inodes are reserved for each newly created directory.
The 14 files in each directory therefore use only 1 reserved block in the
inode table, so the benchmark result is obviously the best case :-)

Environment:
1) create 9890 directories, then create files in each directory alternately
2) kernel version 2.6.20-mm; the ext4 subdir-inode-reservation patch is
based on 2.6.20-mm
3) 14 files in each subdirectory. 9890 sub directories in
mount_point/mailbox/
4) mount with option data=writeback
5) each operation followed by a reboot
6) EXT4_INIT_RESERVE_INODES = 16
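
The creation pattern in steps 1) and 3) can be sketched as below. This is
only an illustration of "create files in each directory alternately", not
the script that was used for the benchmark; the function name and the
dirNNNNN/fileNN naming are assumptions. The demo uses a small directory
count in a temporary directory rather than 9890 directories on a real
mount point.

```python
import os
import tempfile


def create_interleaved(root, num_dirs, files_per_dir):
    """Create num_dirs subdirectories, then add files to them round-robin."""
    dirs = []
    for d in range(num_dirs):
        path = os.path.join(root, "dir%05d" % d)
        os.mkdir(path)
        dirs.append(path)
    # Outer loop over the file index, inner loop over the directories, so
    # that consecutive file creations always land in different directories.
    for f in range(files_per_dir):
        for path in dirs:
            with open(os.path.join(path, "file%02d" % f), "w"):
                pass
    return dirs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        dirs = create_interleaved(root, num_dirs=20, files_per_dir=14)
        print(len(dirs), "directories,", len(os.listdir(dirs[0])), "files each")
```

Without any reservation, this ordering is what would be expected to spread
each directory's inodes across the inode table.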

===================== data=writeback ================================
remove directories and files by rm -rf:
* ext3
real 16m56.979s
user 0m0.156s
sys 0m21.449s

* ext4org
real 18m38.809s
user 0m0.636s
sys 0m37.422s

* ext4inores
real 7m57.437s
user 0m0.452s
sys 0m34.698s


===================== data=ordered ================================
remove directories and files by rm -rf:
* ext3
real 17m23.435s
user 0m0.140s
sys 0m21.709s

* ext4org
real 17m39.515s
user 0m0.120s
sys 0m22.097s

* ext4inores
real 7m41.365s
user 0m0.196s
sys 0m24.210s

===================== data=journal ================================
remove directories and files by rm -rf:
* ext3
real 12m50.545s
user 0m0.152s
sys 0m22.725s

* ext4org
real 13m43.910s
user 0m0.196s
sys 0m23.161s

* ext4inores
real 7m49.915s
user 0m0.168s
sys 0m23.633s
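
To compare the runs, the "real" times can be converted to seconds and
divided. The sketch below does that for ext4org against ext4inores; the
helper names are made up for illustration, and the times are copied from
the results above.

```python
import re


def to_seconds(stamp):
    """Convert a time(1)-style string such as '7m57.437s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError("unrecognised time format: %r" % stamp)
    return int(match.group(1)) * 60 + float(match.group(2))


# "real" times from the results above: (ext4org, ext4inores)
REAL_TIMES = {
    "writeback": ("18m38.809s", "7m57.437s"),
    "ordered": ("17m39.515s", "7m41.365s"),
    "journal": ("13m43.910s", "7m49.915s"),
}


def speedups(times=REAL_TIMES):
    """Ratio of unpatched to patched removal time for each journal mode."""
    return {
        mode: to_seconds(org) / to_seconds(res)
        for mode, (org, res) in times.items()
    }


if __name__ == "__main__":
    for mode, ratio in speedups().items():
        print("data=%s: %.2fx" % (mode, ratio))
```

With these figures the patched ext4 removes the tree roughly 2.3 times
faster for data=writeback and data=ordered, and about 1.8 times faster
for data=journal.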


Because of the bad design of the magic inode and its on-disk layout, there
is no performance advantage when 30 files are created alternately in each
directory. When 50 files are created alternately in each directory, the
patched ext4 takes twice as long to remove all the files and directories.

Therefore, in the next version a new on-disk layout will be used.



2007-05-24 20:21:44

by Andreas Dilger

Subject: Re: [RFC 4/5] inode reservation v0.1 (benchmark result)

On May 24, 2007 02:08 +0800, coly wrote:
> Due to the bad design of magic inode and the on-disk layout of magic
> inode. When 30 files created alternatively in each directory, no
> performance advantage exists. When 50 files created alternatively in
> each directory, the patched ext4 will use double time on removing all
> the files and directories.

I don't think the use of magic inodes is the right approach. One possibility
to avoid changing the on-disk format at all is to only do the reservation in
memory, scaling the reservation with the size of the directory.

The only issue that arises is how to regenerate the same reservation
after a remount. This might be possible to do by looking into the leaf
block at create time to see which inode numbers are already in use for
that leaf and checking whether there are free inodes in each group.

One way to get the "best" mapping is possibly to check groups in order of
decreasing number of inodes for that leaf in each group and, once a suitable
group has been found, to do a few name->hash->inode number lookups to get
the old mapping back. Once this leaf->group mapping has been established it
can be re-used for a given leaf block until that window is full.
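
A rough sketch of the group-ordering step, as a user-space model rather
than kernel code. The function names and the free_inodes dictionary are
assumptions made for the example; the only ext2/3/4 fact relied on is that
inode numbers start at 1, so inode ino lives in group
(ino - 1) / inodes_per_group. The name->hash->inode step that recovers the
exact old mapping is not modelled.

```python
from collections import Counter


def group_of(ino, inodes_per_group):
    """Block group holding inode number ino (inode numbers start at 1)."""
    return (ino - 1) // inodes_per_group


def pick_group(leaf_inodes, inodes_per_group, free_inodes):
    """Pick the group that already holds most of this leaf's inodes.

    leaf_inodes: inode numbers found in the leaf block's dir entries.
    free_inodes: hypothetical map of group number -> free inode count.
    Returns None when no group used by this leaf has a free inode.
    """
    counts = Counter(group_of(ino, inodes_per_group) for ino in leaf_inodes)
    # most_common() yields groups in order of decreasing inode count.
    for group, _count in counts.most_common():
        if free_inodes.get(group, 0) > 0:
            return group
    return None


if __name__ == "__main__":
    leaf = [12, 13, 14, 8200, 8201]
    print(pick_group(leaf, 8192, {0: 0, 1: 5}))
```

In the demo, group 0 holds three of the leaf's inodes but has none free,
so the search falls through to group 1.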

Since you need to scan all of a leaf block's dir entries in a hash block
at insert time to look for duplicate names, and the inode numbers are
in the dir entries, this shouldn't introduce any additional disk IO.

Also, regardless of what the mapping turns out to be - the goal is to place
inodes with a similar hash into nearby inodes, and this heuristic works
relatively well for that. Once the given leaf block's inode range is full,
new inodes can be allocated from a new window, as is done for a
newly created directory.
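
The window behaviour can be modelled in a few lines. This is a toy,
in-memory model under simplifying assumptions: the class name, the fixed
window size and the simple "next unreserved range" counter are all
invented for the example, and it ignores inodes that are already in use
on disk.

```python
class LeafWindows:
    """Toy model of in-memory inode windows keyed by directory leaf block."""

    def __init__(self, window_size=16):
        self.window_size = window_size
        self.next_free_start = 1  # start of the next unreserved inode range
        self.windows = {}         # leaf -> [next inode to hand out, window end)

    def _new_window(self, leaf):
        start = self.next_free_start
        self.next_free_start += self.window_size
        self.windows[leaf] = [start, start + self.window_size]

    def allocate(self, leaf):
        """Return an inode number for a new entry in the given leaf block."""
        window = self.windows.get(leaf)
        if window is None or window[0] >= window[1]:
            # No window yet, or the current one is full: open a new one.
            self._new_window(leaf)
            window = self.windows[leaf]
        ino = window[0]
        window[0] += 1
        return ino


if __name__ == "__main__":
    w = LeafWindows(window_size=4)
    print([w.allocate("A") for _ in range(3)], w.allocate("B"), w.allocate("A"))
```

Entries for the same leaf stay contiguous until its window fills, even
when allocations for another leaf are interleaved with them.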

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.