Hi,
It takes 29 seconds to format 1TB partition as ext4 with 256k cluster
size on MIPS. e2fsprogs version is 1.42.7.
The most time consumer is ext2fs_convert_subcluster_bitmap which folds
30M into 500K walking through block bitmap bit by bit. Afair, it takes
less time on x86 since there are asm bit ops for x86, but mips uses
generic ones.
Of course this folding can be optimized so that it tests series of
bits by byte, short or int and does less shifts and other address
arithmetic in between, but I don't expect that to improve time
dramatically.
First thing I did is allocated per-cluster bitmap instead of per-block
bitmap in ext2fs_initialize so that conversion doesn't occur. That
dropped mke2fs time from 29s to 2.5s. e2fsck -f found this fresh fs as
a good one and mounting/writing/reading/unmounting also went good. Of
course groups are allocated at different offsets and about 60M of
usable space are lost if compared to 'slow' formatting. That's fine,
we can live with that.
Are there any long-term consequences that I've missed? Are there any
reasons for using block bitmap instead of cluster bitmap except for
meta-data space efficiency?
As a long term solution I see following options:
1) Continue using block bitmap as base but allocate mirror cluster
bitmap and replay all base block bitmap allocations to it as soon as
they happen. This will increase memory usage by block/cluster ratio,
e.g. 1.6% in case of 4k blocks / 256k clusters.
2) Use cluster bitmap as base and use small sub-cluster bitmaps for
tracking blocks inside allocated clusters (some sort of search tree).
This will reduce allocated memory significantly in case of high
cluster/block ratio .
3) Variation of (2). Allocate cluster bitmap as base. Allocate block
bitmap via mmap as a mirror, but don't commit all the memory by
memset. This is simpler than (2) since there is no search structure
and at the same time memory is not over-allocated.
Please let me know your thoughts. Thanks!
Regards,
Andrey.
On Wed, Mar 13, 2013 at 08:04:57PM +0400, Andrey Sidorov wrote:
>
> It takes 29 seconds to format 1TB partition as ext4 with 256k cluster
> size on MIPS. e2fsprogs version is 1.42.7.
> The most time consumer is ext2fs_convert_subcluster_bitmap which folds
> 30M into 500K walking through block bitmap bit by bit. Afair, it takes
> less time on x86 since there are asm bit ops for x86, but mips uses
> generic ones.
Actually, these days we use a rbtree to encode the bitmap. That means
that it should be possible to create a very efficient find_first_set
and find_first_zero functions. This would work especially well for
mke2fs since the allocation bitmap will be mostly empty. This I think
would improve things dramatically.
> First thing I did is allocated per-cluster bitmap instead of per-block
> bitmap in ext2fs_initialize so that conversion doesn't occur. That
> dropped mke2fs time from 29s to 2.5s. e2fsck -f found this fresh fs as
> a good one and mounting/writing/reading/unmounting also went good. Of
> course groups are allocated at different offsets and about 60M of
> usable space are lost if compared to 'slow' formatting. That's fine,
> we can live with that.
> Are there any long-term consequences that I've missed? Are there any
> reasons for using block bitmap instead of cluster bitmap except for
> meta-data space efficiency?
Fragmenting the block allocation bitmaps slows down operations that
need to read in multiple block bitmaps in sequence. This includes
dumpe2fs, e2fsck, and in some cases, block allocation. So making this
change is not zero-cost. I agree that 30 seconds for mke2fs is
non-optimal, but I'm surprised you're finding this to be a
showstopper. I assume you're worried about substantially bigger file
systems than just 1TB?
- Ted
On Wed, Mar 13, 2013 at 11:00 PM, Theodore Ts'o <[email protected]> wrote:
> Actually, these days we use a rbtree to encode the bitmap. That means
> that it should be possible to create a very efficient find_first_set
> and find_first_zero functions. This would work especially well for
> mke2fs since the allocation bitmap will be mostly empty. This I think
> would improve things dramatically.
Oh, thanks for pointing that! I was mostly reading v1.42 that is more
than year old and missed the fact 1.42.7 has rbtree bitmap
implementation.
That explains why formatting bigalloc became slower compared to 1.42.
In this case ffs/ffz seem like a right choice, I'll try them. I'll
also try mirroring as it won't waste much RAM and might be faster.
> Fragmenting the block allocation bitmaps slows down operations that
> need to read in multiple block bitmaps in sequence. This includes
> dumpe2fs, e2fsck, and in some cases, block allocation. So making this
> change is not zero-cost. I agree that 30 seconds for mke2fs is
> non-optimal, but I'm surprised you're finding this to be a
> showstopper. I assume you're worried about substantially bigger file
> systems than just 1TB?
1TB is what we have. It is a show stopper because it increases first
boot time a lot compared to total boot time. This is bad because first
boot happens on factory and thus should be as quick as possible.
On Fri, Mar 15, 2013 at 01:53:58AM +0400, Andrey Sidorov wrote:
>
> Oh, thanks for pointing that! I was mostly reading v1.42 that is more
> than year old and missed the fact 1.42.7 has rbtree bitmap
> implementation.
> That explains why formatting bigalloc became slower compared to 1.42.
> In this case ffs/ffz seem like a right choice, I'll try them. I'll
> also try mirroring as it won't waste much RAM and might be faster.
As you'll discover fairly quickly, we haven't implemented ffs/ffz for
rbtree bitmaps yet. We've implemented find_first_zero for for
bitarrays (and for the generic bitmap framework in lib/ext2fs/bitops.h
and lib/ext2fs/gen_bitmap*.c), but not in lib/ext2fs/blkmap64_rb.c
yet. More critically for the mke2fs's use case in
convert_subcluster_bitmap, will be the find_first_set operation.
Adding this has been on my todo list for a while, but I just haven't
had the time. Note that test cases for this will be critically
important, especially if we start using them in correctness-critical
code such as e2fsck.
- Ted