LinuxLists.cc - status update for inode reservation

2007-03-20 10:26:54

Subject: status update for inode reservation

Andreas,

1, The files I created for the benchmark are 0 bytes in size. I created
them with this script:
> for i in `seq 1 50`; do for j in `seq 1 10000`; do touch `/usr/bin/keygen | head -c 8`; done; done
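
For anyone who wants to reproduce this at a smaller scale, here is a minimal sketch of the same loop. It assumes `mktemp` as a stand-in for `/usr/bin/keygen | head -c 8`, since mktemp creates an empty file with a random 8-character name in one step. The function name `make_files` and the counts are illustrative only and are not part of the original run.

```shell
#!/bin/sh
# Create COUNT zero-byte files with random 8-character names in DIR.
# Assumption: mktemp replaces the keygen pipeline used in the original run.
make_files() {
    dir=$1
    count=$2
    mkdir -p "$dir" || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        # mktemp creates the file itself and never reuses an existing name.
        mktemp "$dir/XXXXXXXX" >/dev/null || return 1
        i=$((i + 1))
    done
}

# Example (the benchmark itself used 50 x 10000 = 500000 files):
#   make_files sub 500000
```

Unlike `touch`, mktemp creates files with mode 0600, which should not matter for a metadata benchmark but is worth knowing.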

2, Using magic inodes will not cause compatibility issues. An fsck that
does not understand magic inodes can ignore and remove them. This can
only happen when fsck is performed, and the filesystem code can rebuild
a magic inode if it cannot be found (this will take some time at mount,
because the inode table has to be read).

Best regards.

Coly

P.S. Here are the results of my benchmark:
I created 500000 zero-byte files in a directory named "sub" and recorded
the times for:
1, copy sub to another dir named "ordered1" on another hard disk.
2, copy dir "ordered1" to "ordered2" on another hard disk.
3, reboot the system and repeat 2 (change target to ordered3).
4, remove ordered3, ordered2, ordered1.
5, remove sub.
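
The steps above can be sketched as a small script. This is an illustration under assumptions, not the script that produced the numbers below: the helper names `timed` and `run_bench` are hypothetical, elapsed time is measured in whole seconds with `date +%s` rather than with `time`, and step 3 is omitted because it needs a reboot.

```shell
#!/bin/sh
# Run a command and print its label with the elapsed wall-clock seconds.
timed() {
    label=$1
    shift
    start=$(date +%s)
    "$@" || return 1
    end=$(date +%s)
    echo "$label: $((end - start))s"
}

# Run steps 1, 2, 4 and 5 from the list above.
# SRC is the directory of small files; DEST is a directory on the other disk.
run_bench() {
    src=$1
    dest=$2
    timed "copy sub to ordered1"      cp -r "$src" "$dest/ordered1" || return 1
    timed "copy ordered1 to ordered2" cp -r "$dest/ordered1" "$dest/ordered2" || return 1
    # Step 3 (reboot, then copy again to ordered3) needs a cold cache and
    # is left out of this sketch.
    timed "remove ordered2" rm -rf "$dest/ordered2" || return 1
    timed "remove ordered1" rm -rf "$dest/ordered1" || return 1
    timed "remove sub"      rm -rf "$src" || return 1
}
```

Note that `run_bench` deletes the source directory in its last step, as the original procedure does.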

From the benchmark, I found that hash-ordered inode allocation does not
improve performance much with data=journal and data=ordered.

I created 500000 new files in a dir called "sub" with this script:
for i in `seq 1 50`; do for j in `seq 1 10000`; do touch `/usr/bin/keygen | head -c 8`; done; done

==== data=writeback ====
copy sub to another dir named "ordered1":
real 7m17.616s
user 0m1.456s
sys 0m27.586s

copy dir "ordered1" to "ordered2":
real 0m45.231s
user 0m1.340s
sys 0m21.233s

reboot
copy dir "ordered2" to "ordered3":
real 1m8.764s
user 0m1.568s
sys 0m26.050s

remove ordered3 by rm -rf ordered3:
real 0m9.200s
user 0m0.168s
sys 0m8.893s

remove ordered2 by rm -rf ordered2:
real 0m12.225s
user 0m0.128s
sys 0m8.857s

remove ordered1 by rm -rf ordered1:
real 0m37.493s
user 0m0.076s
sys 0m11.089s

remove original dir "sub":
real 9m49.902s
user 0m0.220s
sys 0m14.377s

==== data=journal ====
copy sub to another dir named "ordered1":
real 6m54.151s
user 0m1.696s
sys 0m22.705s

copy dir "ordered1" to "ordered2":
real 7m7.696s
user 0m1.416s
sys 0m23.541s

reboot
copy dir "ordered2" to "ordered3":
real 10m46.649s
user 0m1.792s
sys 0m28.778s

remove ordered1 by rm -rf ordered1:
real 12m54.271s
user 0m0.192s
sys 0m15.353s

remove ordered2 by rm -rf ordered2:
real 13m37.035s
user 0m0.260s
sys 0m15.009s

remove ordered3 by rm -rf ordered3:
real 7m43.703s
user 0m0.216s
sys 0m12.117s

remove sub by rm -rf sub:
real 10m41.150s
user 0m0.188s
sys 0m13.781s

==== data=ordered ====
copy sub to another dir named "ordered1":
real 7m57.016s
user 0m1.632s
sys 0m25.558s

copy dir "ordered1" to "ordered2":
real 7m46.037s
user 0m1.604s
sys 0m24.902s

reboot
copy dir "ordered2" to "ordered3":
real 8m21.952s
user 0m1.720s
sys 0m28.290s

remove ordered1 by rm -rf ordered1:
real 10m12.652s
user 0m0.272s
sys 0m15.049s

remove ordered2 by rm -rf ordered2:
real 9m21.770s
user 0m0.220s
sys 0m15.025s

remove ordered3 by rm -rf ordered3:
real 6m32.278s
user 0m0.176s
sys 0m12.093s

remove sub by rm -rf sub:
real 10m17.966s
user 0m0.236s
sys 0m14.453s
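
To compare timings across the three modes, the `real` values can be converted to seconds and divided. The sketch below does that with awk; the helper names `to_seconds` and `slowdown` are hypothetical, and the two values passed in are the second-copy times (ordered1 to ordered2) reported above for data=ordered and data=writeback.

```shell
#!/bin/sh
# Convert a `time`-style value such as 7m17.616s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print how many times slower the first timing is than the second.
slowdown() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

# Second copy (ordered1 -> ordered2): data=ordered vs data=writeback
slowdown 7m46.037s 0m45.231s
```

For these two values the second copy is roughly ten times slower with data=ordered than with data=writeback.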

On Tue, 2007-03-20 at 03:51 -0600, Andreas Dilger wrote:
> On Mar 20, 2007 17:22 +0800, coly wrote:
> > 1, I benchmarked copying and removing a large number of files. The method
> > is the one you used and told me about before (create many files in a dir,
> > copy this dir, remove the new and original dirs).
> > * With data=journal and data=ordered, not much performance improvement
> > is gained from inode reservation, because every inode modification is
> > submitted to the journal at once. There is no chance to merge multiple
> > inode modifications in one inode table into a single journal submission.
>
> That shouldn't be true. Whether operation is data=journal or data=writeback
> the filesystem metadata (i.e. inode table, directory) will always be in the
> journal. Unless operation is always sync'd then it should still be possible
> to merge many filesystem operations into a single journal transaction (so
> that they can share the changes to the same blocks).
>
> Now, whether the implementation matches the theory is a different question.
> It would be interesting to figure out why your test results are not showing
> the same performance between data=ordered and data=writeback. How large
> are the files being unlinked? Maybe if they are large the truncate time is
> long enough that the journal transaction is being committed? Maybe with
> data=journal there is so much going into the journal that it also forces a
> commit because the journal is full?
>
> > 2, To manage the reserved inode tables for each directory,
> > especially when the number of files in a directory exceeds the current
> > reservation limit, a list is needed to manage the reserved inode
> > tables. I want to use some on-disk inodes as pointers. I think this is
> > the only way to avoid changing the ext4 on-disk metadata format.
> > The inodes used as list pointers can be assigned MAGIC numbers
> > to distinguish them from normal inodes, but fsck and mkfs would have to
> > be modified to understand these MAGIC numbers.
> > With the help of these pointers (inodes with a special MAGIC number),
> > inode reservation can be implemented more easily.
>
> If you are making a magic inode, and it needs e2fsck and mke2fs support,
> then this by nature is a change to the filesystem format (though possibly
> one that allows an easy upgrade from existing filesystems). If we need
> to change the on-disk format then there are a number of other changes we
> could make, including having "inode in directory" format, which will avoid
> this problem entirely because readdir and inode order are always the same.
>
> I would suggest emailing to the linux-ext4 list with details of findings
> (performance, tests that have been run) so that everyone can read and
> comment on it.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
>