2008-07-10 17:53:56

by Theodore Ts'o

Subject: Re: suspiciously good fsck times?

Based on the graphs which Eric posted, one interesting thing I think
you'll find if you repeat the ext3 experiment with e2fsck -t -t is
that pass 2 will take about seven times longer than pass 1. (Which is
backwards from most e2fsck runs, where pass 2 takes about half of
pass 1's run time --- although obviously that depends on how many
directory blocks you have.)

Yes, some kind of reservation window would help on ext3 --- but the
question is whether such a change would be too specific to this
benchmark or not. Most of the time directories don't grow to such a
huge size. So if you use a smallish reservation window (around 8
blocks, say) for many directories, this might lead to more filesystem
fragmentation that in the long run would cause the filesystem not to
age well; it also wouldn't help much when you have over 11 million
files in a directory, and the directory has over 100,000 blocks.
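(One way to see how badly a big directory's blocks are scattered is
filefrag from e2fsprogs, which on many kernels works on directories
as well as regular files; the path below is illustrative:)

```shell
# filefrag reports how many extents the blocks are split into --
# a rough measure of fragmentation.  Run as root for the block-map
# fallback on filesystems without FIEMAP support.
filefrag -v /mnt/bigdir
```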

I don't think delayed allocation is what's helping here either,
because the journal will force the directory blocks to be placed as
soon as we commit a transaction. I think what's saving us here is
that flex_bg and mballoc are separating the directory blocks from the
data blocks, allowing the directory blocks to be closely packed
together.
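(Whether flex_bg is actually enabled on a given filesystem is easy to
check from the superblock; device name again illustrative:)

```shell
# dumpe2fs -h prints only the superblock; flex_bg appears in the
# "Filesystem features:" line when the feature is enabled.
dumpe2fs -h /dev/sdb1 | grep -i features
```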

- Ted

2008-07-10 20:13:42

by Ric Wheeler

Subject: Re: suspiciously good fsck times?

Theodore Tso wrote:
> Based on the graphs which Eric posted, one interesting thing I think
> you'll find if you repeat the ext3 experiment with e2fsck -t -t is
> that pass 2 will take about seven times longer than pass 1. (Which is
> backwards from most e2fsck runs, where pass 2 takes about half of
> pass 1's run time --- although obviously that depends on how many
> directory blocks you have.)
>
>
Pass2 was where both spent most of their time, but I can rerun later to
validate that.

> Yes, some kind of reservation window would help on ext3 --- but the
> question is whether such a change would be too specific to this
> benchmark or not. Most of the time directories don't grow to such a
> huge size. So if you use a smallish reservation window (around 8
> blocks, say) for many directories, this might lead to more filesystem
> fragmentation that in the long run would cause the filesystem not to
> age well; it also wouldn't help much when you have over 11 million
> files in a directory, and the directory has over 100,000 blocks.
>
I think that the key is to lay out the directories (or files, for that
matter) in reasonably contiguous chunks. If we could always bump the
allocation up by enough to capture a full disk track (128k? 512k?),
you would probably be near optimal, but any significant portion of a
track would also help.

It would be interesting to rerun with the 46 million files in one
directory as well (basically, for working sets that have no natural
mapping into directories like some object based workloads).
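(A sketch of how such a single-directory working set could be
populated -- scaled way down here; the count and the temp directory
are placeholders, and a real run would use tens of millions of files:)

```shell
# Create N empty files in one flat directory.
N=1000
dir=$(mktemp -d)
i=0
while [ "$i" -lt "$N" ]; do
    : > "$dir/file$i"       # ':' with redirection creates an empty file
    i=$((i + 1))
done
echo "created $(ls "$dir" | wc -l) files in $dir"
```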

> I don't think delayed allocation is what's helping here either,
> because the journal will force the directory blocks to be placed as
> soon as we commit a transaction. I think what's saving us here is
> that flex_bg and mballoc are separating the directory blocks from the
> data blocks, allowing the directory blocks to be closely packed
> together.
>
> - Ted
>
I can try to validate that, thanks!

ric



2008-07-11 15:39:55

by Ric Wheeler

Subject: Re: suspiciously good fsck times?

Theodore Tso wrote:
> Based on the graphs which Eric posted, one interesting thing I think
> you'll find if you repeat the ext3 experiment with e2fsck -t -t is
> that pass 2 will take about seven times longer than pass 1. (Which is
> backwards from most e2fsck runs, where pass 2 takes about half of
> pass 1's run time --- although obviously that depends on how many
> directory blocks you have.)
>
> Yes, some kind of reservation window would help on ext3 --- but the
> question is whether such a change would be too specific to this
> benchmark or not. Most of the time directories don't grow to such a
> huge size. So if you use a smallish reservation window (around 8
> blocks, say) for many directories, this might lead to more filesystem
> fragmentation that in the long run would cause the filesystem not to
> age well; it also wouldn't help much when you have over 11 million
> files in a directory, and the directory has over 100,000 blocks.
>
> I don't think delayed allocation is what's helping here either,
> because the journal will force the directory blocks to be placed as
> soon as we commit a transaction. I think what's saving us here is
> that flex_bg and mballoc are separating the directory blocks from the
> data blocks, allowing the directory blocks to be closely packed
> together.
>
> - Ted
>

I made a new ext4 file system without flex_bg or uninit:

[root@localhost Perf]# /sbin/debuge4fs /dev/sdb1
debuge4fs 1.41-WIP (07-Jul-2008)
debuge4fs: feature
Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent sparse_super large_file
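(For reference, a filesystem like this can be made by masking the
features at mkfs time with something like the following; the device
name is illustrative and the command destroys its contents:)

```shell
# ^feature disables a feature that mkfs would otherwise enable by
# default; uninit_bg is the on-disk name for uninitialized block groups.
mkfs.ext4 -O ^flex_bg,^uninit_bg /dev/sdb1
```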


The fsck time was a bit slower, but still looks like 8 minutes on ext4
vs 1 hour on ext3:

[root@localhost Perf]# umount /mnt
[root@localhost Perf]# time /sbin/fsck.ext4 -t -t -f /dev/sdb1
e4fsck 1.41-WIP (07-Jul-2008)
Pass 1: Checking inodes, blocks, and sizes
Pass 1: Memory used: 43944k/69424k (36476k/7469k), time: 352.48/93.27/29.45
Pass 1: I/O read: 14914MB, write: 0MB, rate: 42.31MB/s
Pass 2: Checking directory structure
Pass 2: Memory used: 71396k/61968k (51854k/19543k), time: 73.00/50.46/ 7.65
Pass 2: I/O read: 3023MB, write: 0MB, rate: 41.41MB/s
Pass 3: Checking directory connectivity
Peak memory: Memory used: 71396k/61968k (59307k/12090k), time: 425.82/143.83/37.10
Pass 3A: Memory used: 71396k/61968k (59307k/12090k), time: 0.00/ 0.00/ 0.00
Pass 3A: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 3: Memory used: 71396k/61968k (51854k/19543k), time: 0.01/ 0.00/ 0.00
Pass 3: I/O read: 1MB, write: 0MB, rate: 76.91MB/s
Pass 4: Checking reference counts
Pass 4: Memory used: 71396k/44968k (27406k/43991k), time: 2.37/ 2.36/ 0.00
Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 5: Checking group summary information
Pass 5: Memory used: 71396k/240k (64671k/6726k), time: 63.60/ 4.98/ 0.33
Pass 5: I/O read: 37MB, write: 0MB, rate: 0.58MB/s
/dev/sdb1: 45600268/61054976 files (0.0% non-contiguous), 232657587/244190000 blocks
Memory used: 71396k/240k (64671k/6726k), time: 491.82/151.17/37.43
I/O read: 17974MB, write: 1MB, rate: 36.55MB/s

real 8m12.260s
user 2m31.167s
sys 0m37.766s


2008-07-14 21:19:27

by Andreas Dilger

Subject: Re: suspiciously good fsck times?

On Jul 10, 2008 16:13 -0400, Ric Wheeler wrote:
> It would be interesting to rerun with the 46 million files in one
> directory as well (basically, for working sets that have no natural
> mapping into directories like some object based workloads).

I think you'll hit a limit around 15M files in a single directory.
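(Back-of-the-envelope for where a limit of that order comes from,
assuming a 2-level htree, 4KB blocks, 8-byte index entries, and
roughly 60 usable entries per leaf block after the htree fill factor
-- all rough assumptions:)

```shell
block=4096
idx_per_block=$((block / 8))              # ~512 index entries per block
leaves=$((idx_per_block * idx_per_block)) # 2-level tree: ~262144 leaf blocks
per_leaf=60                               # rough average entries per leaf
echo $((leaves * per_leaf))               # on the order of 15 million entries
```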

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2008-07-15 00:48:13

by Ric Wheeler

Subject: Re: suspiciously good fsck times?

Andreas Dilger wrote:
> On Jul 10, 2008 16:13 -0400, Ric Wheeler wrote:
>
>> It would be interesting to rerun with the 46 million files in one
>> directory as well (basically, for working sets that have no natural
>> mapping into directories like some object based workloads).
>>
>
> I think you'll hit a limit around 15M files in a single directory.
>
> Cheers, Andreas
> --
>
Probably still worth a quick test, just to see how well it holds up at
the edge, thanks!

ric