2004-03-30 08:50:37

by Mingming Cao

Subject: [RFC, PATCH] Reservation based ext3 preallocation

Hi, Andrew, Ted, All,

Ext3 preallocation is currently missing. This is a first cut of a
prototype for reservation-based ext3 preallocation, based on the ideas
suggested by Andrew and Ted. The implementation is incomplete, but I
would like to hear your opinions on the current design.

What I have done in this version of the prototype:
1) basic reservation structure and operations
2) reservation based ext3 block allocation
3) reservation window allocation
4) block allocation when fs reservation is turned off

For 1) Use a sorted doubly linked list for the per-filesystem
reservation list, like vm_region does. The operations on the doubly
linked list are abstracted, so later, if necessary, we could easily
replace it with a more sophisticated tree structure.

Each inode has a reservation structure inside its ext3_inode_info
structure. Each reservation structure contains (start, end, list_head,
goal_window_size).
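The structure and list operations above can be sketched at user level roughly as follows; the field and function names are illustrative, not the patch's actual identifiers:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-level sketch of the per-inode reservation window
 * and the sorted per-filesystem doubly linked list described above. */
struct rsv_window {
	unsigned int start;		/* first block of the window */
	unsigned int end;		/* last block of the window */
	unsigned int goal_window_size;	/* desired size of the next window */
	struct rsv_window *prev, *next;	/* sorted doubly linked list links */
};

/* Insert 'win' after a dummy head, keeping the list sorted by start
 * block. Because callers only see operations like this one, the list
 * could later be swapped for a tree without touching them. */
static void rsv_insert(struct rsv_window *head, struct rsv_window *win)
{
	struct rsv_window *cur = head;

	while (cur->next && cur->next->start < win->start)
		cur = cur->next;
	win->next = cur->next;
	win->prev = cur;
	if (cur->next)
		cur->next->prev = win;
	cur->next = win;
}
```
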

For 2) The basic idea is: when we try to allocate a new block for an
inode, and the inode already has a reservation window, we try to
allocate from there first.

If it does not have a reservation window, we allocate a block and
make a reservation window for it. Instead of doing the block allocation
first and the reservation window allocation second, we make the
reservation window first, then allocate a block within that window. The
new reservation window has at least one free block and does not overlap
any other reservation window. This way we avoid searching the
reservation list over and over when we find a free bit on the bitmap
but are not sure whether it belongs to somebody else's reservation
window.

For 3) To allocate a new reservation window, we search the part of the
filesystem reservation list that falls into the group we are trying to
allocate a block from. A goal block guides where we want the new
reservation window to start. If we already have an old reservation
window, we discard it first, then search the part of the list after the
old window; otherwise the sub-list starts from the beginning of the
group. The new reservation window may cross a group boundary, and it
must contain at least one free block.
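A rough user-level sketch of that gap search (the names and the array representation are mine, not the patch's; the check that the gap actually contains a free block is omitted):

```c
#include <assert.h>

/* Hypothetical sketch: given the sorted reservations [start,end] that
 * overlap the region being searched, find the first gap of at least
 * 'size' blocks at or after 'goal' and ending before 'limit'. */
struct win { unsigned int start, end; };

static int find_gap(const struct win *rsv, int n,
		    unsigned int goal, unsigned int limit,
		    unsigned int size, unsigned int *out_start)
{
	unsigned int cursor = goal;
	int i;

	for (i = 0; i < n; i++) {
		if (rsv[i].end < cursor)
			continue;		/* entirely before the cursor */
		if (rsv[i].start >= cursor + size)
			break;			/* gap before this window fits */
		cursor = rsv[i].end + 1;	/* skip past this reservation */
	}
	if (cursor + size > limit)
		return -1;			/* no room left in the region */
	*out_start = cursor;
	return 0;
}
```
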

For 4) If the filesystem has reservations turned off, the code path for
new block allocation is the same as the current code -- just call
ext3_try_to_allocate() with a NULL reservation window pointer.

The above logic has been verified with a user-level simulation program.
The attached prototype patch (against the 2.6.4 kernel) compiles and
boots. I have done initial testing of the patch on a 2-way PIII 700MHz
box.

Below is the debugfs output after running a simple test. The test has 8
threads, each sequentially writing 20MB to a different file at the same
time, in the same directory, on a freshly created ext3 filesystem.
Basically, after applying the patch, the filesystem is much less
fragmented:

before(2.6.4 kernel):

Inode: 12 Type: regular Mode: 0644 Flags: 0x0 Generation: 3375236196
.....
BLOCKS:
(0):8716, (1):8718, (2):8720, (3):8722, (4):8724, (5):8726, (6):8728,
(7):8730, (8):8732, (9):8734, (10):8736, (11):8738, (IND):8741,
(12):8742, (13):8744, (14):8746, (15):8748, (16):8750, (17):8752,
(18):8754, (19):8756, (20):8758, (21):8760, (22):8762, (23):8764,
(24):8766, (25):8768, (26):8770, (27):8772, (28):8774, (29):8776,
(30):8778, (31):8780, (32):8782, (33):8784, (34):8786, (35):8788,
(36):8790, (37):8792, (38):8794, (39):8796, (40):8798, (41):8800,
(42):8802, (43):8804, (44):8806, (45):8808, (46):8810, (47):8812,
(48):8814, (49):8816, (50):8818, (51):8820, (52):8822, (53):8824,
(54):8826, (55):8828, (56):8830, (57):8832, (58):8834, (59):8836,
(60):8838, (61):8840, (62):8842, (63):8844, (64):8846, (65):8848,
(66):8850, (67):8852, (68):8854, (69):8856, (70):8858, (71):8860,
(72):8862, (73):8864, (74):8866, (75):8868, (76):8870, (77):8872,
(78):8874, (79):8876, (80):8878, (81):8880,
......
......
......

after applying the patch (reservation window size is 128 blocks):
Inode: 15 Type: regular Mode: 0644 Flags: 0x0 Generation: 2351221293
......
BLOCKS:
(0-11):24576-24587, (IND):24588, (12-1035):24592-25615, (DIND):25616,
(IND):25624,(1036-2027):25632-26623, (2028-2059):37116-37147,
(IND):37148, (2060-2151):37152-37243, (2152-2279):37372-37499,
(2280-2407):37756-37883, (2408-2535):38012-38139,
(2536-2663):38268-38395, (2664-2791):38524-38651,
(2792-2919):38780-38907, (2920-3083):43132-43295, (IND):43296,
(3084-3167):43304-43387, (3168-3551):43516-43899,
(3552-3679):44028-44155, (3680-3807):44284-44411,
(3808-3935):44540-44667, (3936-4063):44924-45051,
(4064-4107):45180-45223, (IND):45224, (4108-4183):45232-45307,
(4184-4567):45436-45819, (4568-4695):45948-46075, (4696-4823):46204-46331,
(4824-4828):46875-46879, (4829-4956):48380-48507,
(4957-4999):48636-48678

TOTAL: 5006

Things to do:
1) Dynamically increase the reservation window size for individual
files.
2) Prevent bogus early ENOSPC errors when the filesystem is fully
reserved.
3) Preserve the reservation window across file open/close for files
that are frequently appended to.
4) Play with other tree structures to replace the sorted doubly linked
list for the reservation list, if necessary.

Before working on the above todos, I would like to hear your valuable
comments, suggestions, and ideas on the current design of this
reservation based ext3 preallocation. The patch is attached and is
against the 2.6.4 kernel.

Thanks!

Mingming

diffstat ext3_reservation-7.diff
fs/ext3/balloc.c           |  485 ++++++++++++++++++++++++++++++++++++++++++--
fs/ext3/ialloc.c           |    6
fs/ext3/inode.c            |    9
fs/ext3/super.c            |    7
include/linux/ext3_fs.h    |    4
include/linux/ext3_fs_i.h  |   12 +
include/linux/ext3_fs_sb.h |    4
7 files changed, 511 insertions(+), 16 deletions(-)


Attachments:
ext3_reservation-7.diff (22.76 kB)

2004-03-30 09:45:49

by Andrew Morton

Subject: Re: [RFC, PATCH] Reservation based ext3 preallocation

Mingming Cao <[email protected]> wrote:
>
> Ext3 preallocation is currently missing.

I think this is heading the right way.

- Please use u32 for block numbers everywhere. In a number of places you
are using int, and that may go wrong if the block numbers wrap negative
(I'm not sure that ext3 supports 8TB, but it's the right thing to do).

- Using ext3_find_next_zero_bit(bitmap_bh->b_data, ...) in
alloc_new_reservation() is risky. There are some circumstances when you
have a huge number of "free" blocks in ->b_data, but they are all unfree
in ->b_committed_data. You could end up with astronomical search
complexity in there. You should search both bitmaps to find a block
which really is allocatable. Otherwise you'll have
ext3_try_to_allocate() failing 20,000 times in succession and much CPU
will be burnt.

- I suspect ext3_try_to_allocate_with_rsv() could be reorganised a bit to
reduce the goto spaghetti?

- Please provide a mount option which enables the feature, defaulting to
"off".

- Make sure that you have a many-small-file test. Say, untar a kernel
tree onto a clean filesystem and make sure that reading all the files in
the tree is nice and fast.

This is to check that the reservation is being discarded appropriately
on file close, and that those small files are contiguous on-disk. If we
accidentally leave gaps in between them the many-small-file bandwidth
takes a dive.

- There's a little program called `bmap' in
http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz which
can be used to dump out a file's block allocation map, to check
fragmentation.

Apart from that, looking good. Where are the benchmarks? ;)

2004-03-30 17:15:11

by Badari Pulavarty

Subject: Re: [RFC, PATCH] Reservation based ext3 preallocation

On Tuesday 30 March 2004 01:45 am, Andrew Morton wrote:

Andrew,

> - Using ext3_find_next_zero_bit(bitmap_bh->b_data in
> alloc_new_reservation() is risky. There are some circumstances when you
> have a huge number of "free" blocks in ->b_data, but they are all unfree
> in ->b_committed_data. You could end up with astronomical search
> complexity in there. You should search both bitmaps to find a block
> which really is allocatable. Otherwise you'll have
> ext3_try_to_allocate() failing 20,000 times in succession and much CPU
> will be burnt.

Can you explain this a little more? What do b_data and b_committed_data
represent? We are assuming that b_data will always be uptodate.

Maybe we should use ext3_test_allocatable() also.
Mingming, what was the reason for using ext3_find_next_zero_bit() only?
We had this discussion earlier, but I forgot :(

> - I suspect ext3_try_to_allocate_with_rsv() could be reorganised a bit to
> reduce the goto spaghetti?

will do :)

>
> - Please provide a mount option which enables the feature, defaulting to
> "off".

Sure.

>
> - Make sure that you have a many-small-file test. Say, untar a kernel
> tree onto a clean filesystem and make sure that reading all the files in
> the tree is nice and fast.
>
> This is to check that the reservation is being discarded appropriately
> on file close, and that those small files are contiguous on-disk. If we
> accidentally leave gaps in between them the many-small-file bandwidth
> takes a dive.

Hmm. Ted proposed that we should keep the reservation after file close.
We weren't sure about this either. It's on our TODO list.

>
> - There's a little program called `bmap' in
> http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz which
> can be used to dump out a file's block allocation map, to check
> fragmentation.

Thanks, will use that. We are using debugfs for now. Do you have any
tools to dump out what's in the journal? I want to understand the log
format, etc. Just curious.

>
> Apart from that, looking good. Where are the benchmarks? ;)

We are first concentrating on the tiobench regression. We see a clear
degradation with tiobench on ext3, since it creates lots of files in the
same directory. Once we are happy with tiobench, we will go for others:
kernel untars, rawiobench, etc.

Thanks,
Badari

2004-03-30 18:12:11

by Alex Tomas

Subject: Re: [Ext2-devel] Re: [RFC, PATCH] Reservation based ext3 preallocation

On Tue, 2004-03-30 at 21:07, Badari Pulavarty wrote:

> Can you explain this a little more ? What does b->data and b->commited_data
> represent ? We are assuming that b->data will always be uptodate.
>

b_data represents the actual information about used/free blocks.
b_committed_data represents blocks that were freed during the current
transaction; these blocks must not be allocated. There is a good note
about this just before ext3_test_allocatable() in balloc.c.


2004-03-30 18:14:51

by Badari Pulavarty

Subject: Re: [Ext2-devel] Re: [RFC, PATCH] Reservation based ext3 preallocation

On Tuesday 30 March 2004 09:12 am, Alex Tomas wrote:
> On Tue, 2004-03-30 at 21:07, Badari Pulavarty wrote:
> > Can you explain this a little more ? What does b->data and
> > b->commited_data represent ? We are assuming that b->data will always be
> > uptodate.
>
> b_data represents actual information about used/free blocks.
> b_committed_data represents blocks that freed during current
> transaction. these blocks must not be allocated. there is good
> note about this just before ext3_test_allocatable() in balloc.c

Yes. I read the note after sending the mail.

Thanks,
Badari

2004-03-30 18:18:19

by Mingming Cao

Subject: Re: [Ext2-devel] Re: [RFC, PATCH] Reservation based ext3 preallocation

On Tue, 2004-03-30 at 09:07, Badari Pulavarty wrote:
> On Tuesday 30 March 2004 01:45 am, Andrew Morton wrote:
>
> > I think this is heading the right way.
Andrew, thanks for your comments and quick response! Will make the
changes you suggested.

> Andrew,
>
> > - Using ext3_find_next_zero_bit(bitmap_bh->b_data in
> > alloc_new_reservation() is risky.

> Can you explain this a little more ? What does b->data and b->commited_data
> represent ? We are assuming that b->data will always be uptodate.
>
> May be we should use ext3_test_allocatable() also.
> Mingming, what was the reason for using ext3_find_next_zero_bit() only ?
> We had this discussion earlier, but I forgot :(

I thought that just using ext3_find_next_zero_bit() would probably be
okay since, once we get a reservation window that has a possible free
block, ext3_try_to_allocate() will check both the block group bitmap and
the copy of the last-committed bitmap inside that window range anyway
before doing the real allocation. If there is no truly free block in
both bitmaps, ext3_try_to_allocate() will fail and we will look for a
new reservation window.

But as Andrew said, this may cause ext3_try_to_allocate() to be called
unnecessarily many, many times...

We could do the same thing as in find_next_usable_block():

	/*
	 * The bitmap search --- search forward alternately through the
	 * actual bitmap and the last-committed copy until we find a
	 * bit free in both
	 */
	struct journal_head *jh = bh2jh(bh);

	while (here < maxblocks) {
		next = ext3_find_next_zero_bit(bh->b_data, maxblocks, here);
		if (next >= maxblocks)
			return -1;
		if (ext3_test_allocatable(next, bh))
			return next;
		jbd_lock_bh_state(bh);
		if (jh->b_committed_data)
			here = ext3_find_next_zero_bit(jh->b_committed_data,
						       maxblocks, next);
		jbd_unlock_bh_state(bh);
	}

Maybe make this an inline function... ext2 does not need to care about
the journalling stuff.
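The alternating two-bitmap search can be simulated at user level; in this sketch, find_zero_bit() and test_bit_at() are stand-ins for ext3_find_next_zero_bit() and friends, the two arrays stand in for b_data and b_committed_data, and journal locking is omitted:

```c
#include <assert.h>

/* Bit i is "used" when bitmap[i/8] has bit (i%8) set. */
static int test_bit_at(const unsigned char *map, int i)
{
	return (map[i / 8] >> (i % 8)) & 1;
}

/* Stand-in for ext3_find_next_zero_bit(): first clear bit >= start. */
static int find_zero_bit(const unsigned char *map, int max, int start)
{
	int i;

	for (i = start; i < max; i++)
		if (!test_bit_at(map, i))
			return i;
	return max;
}

/* Search forward alternately through both bitmaps until a bit is found
 * that is free in both; return -1 if no such bit exists. */
static int find_allocatable(const unsigned char *data,
			    const unsigned char *committed, int max)
{
	int here = 0;

	while (here < max) {
		int next = find_zero_bit(data, max, here);

		if (next >= max)
			return -1;
		if (!test_bit_at(committed, next))
			return next;	/* free in both copies */
		here = find_zero_bit(committed, max, next);
	}
	return -1;
}
```
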

> > - Make sure that you have a many-small-file test. Say, untar a kernel
> > tree onto a clean filesystem and make sure that reading all the files in
> > the tree is nice and fast.
Haven't got a chance to verify that but good point. Will do.
> >
> > This is to check that the reservation is being discarded appropriately
> > on file close, and that those small files are contiguous on-disk. If we
> > accidentally leave gaps in between them the many-small-file bandwidth
> > takes a dive.
>
> Hmm. Ted proposed that we should keep reservation after file close.
> We weren't sure about this either. Its on our TODO list.

Only some files need to keep their reservation across file open/close:
those opened with the append flag, or files like /var/log that multiple
processes frequently open/write/close. Maybe a file attribute or an
open flag could be used for this purpose. For regular files, I think
the reservation should be discarded at close. Untarring a kernel tree
just opens files for write, writes, and closes them.

>
> >
> > - There's a little program called `bmap' in
> > http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz which
> > can be used to dump out a file's block allocation map, to check
> > fragmentation.
>
Thanks again.

Mingming



2004-03-30 18:37:29

by Andrew Morton

Subject: Re: [RFC, PATCH] Reservation based ext3 preallocation

Badari Pulavarty <[email protected]> wrote:
>
> On Tuesday 30 March 2004 01:45 am, Andrew Morton wrote:
>
> Andrew,
>
> > - Using ext3_find_next_zero_bit(bitmap_bh->b_data in
> > alloc_new_reservation() is risky. There are some circumstances when you
> > have a huge number of "free" blocks in ->b_data, but they are all unfree
> > in ->b_committed_data. You could end up with astronomical search
> > complexity in there. You should search both bitmaps to find a block
> > which really is allocatable. Otherwise you'll have
> > ext3_try_to_allocate() failing 20,000 times in succession and much CPU
> > will be burnt.
>
> Can you explain this a little more ? What does b->data and b->commited_data
> represent ? We are assuming that b->data will always be uptodate.

The comment Alex pointed to is splendid ;)

> May be we should use ext3_test_allocatable() also.

I think so.

> > - There's a little program called `bmap' in
> > http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz which
> > can be used to dump out a file's block allocation map, to check
> > fragmentation.
>
> Thanks. will use that. We are using debugfs for now. Do you have any tools
> to dump out whats in journal ? I want to understand log format etc..
> Just curious.

I cannot think of any. It wouldn't surprise me if e2fsck had a debug mode
which printed out this info, but I have not looked.

> >
> > Apart from that, looking good. Where are the benchmarks? ;)
>
> We are first concentrating on tiobench regression. We see clear
> degrade with tiobench on ext3, since it creates lots of files in the
> same directory. Once we are happy with tiobench, we go for others
> kernel untars, rawiobench etc.

OK.. dbench on SMP hardware shows poor layout also.

2004-04-03 01:40:38

by Mingming Cao

Subject: Re: [Ext2-devel] Re: [RFC, PATCH] Reservation based ext3 preallocation

Hi Andrew,
Here is the second version of the ext3 reservation patch: mostly bug
fixes plus the changes you suggested last time.

It's a stable version now. We have done overnight fsx tests on a 2-CPU
PIII 700MHz box and did not see any issues. We also ran tiobench and
untar tests.

> - Please use u32 for block numbers everywhere.
I tried to make the changes as you suggested and use u32 for block
numbers everywhere, then hit a problem: in several places, especially
when allocating a block or finding a free block on the bitmap, the
function returns the block number on success and -1 to indicate
failure. We can't use u32 in those cases; ext3_new_block() and
ext3_free_block() do the same thing. So I converted block numbers from
int to u32 wherever I could, and kept int where the value can also mean
failure. Hmm... is there a good way to do this?
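One common way out of the "u32 block number vs. -1 on failure" conflict, sketched here as a suggestion rather than anything the patch does, is to carry failure only in the return value and pass the block number through an out-parameter:

```c
#include <assert.h>

typedef unsigned int u32;

/* Hypothetical bitmap search: bit i is "used" when map[i/8] has bit
 * (i%8) set. On success, return 0 and store the block number (a u32,
 * never negative) in *out; on failure, return -1. */
static int find_free(const unsigned char *map, u32 max, u32 *out)
{
	u32 i;

	for (i = 0; i < max; i++) {
		if (!((map[i / 8] >> (i % 8)) & 1)) {
			*out = i;	/* block number travels separately */
			return 0;
		}
	}
	return -1;			/* failure lives only in the int */
}
```
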

> You should search both bitmaps to find a block
> which really is allocatable. Otherwise you'll have
> ext3_try_to_allocate() failing 20,000 times in succession and much CPU
> will be burnt.
Done.

> - I suspect ext3_try_to_allocate_with_rsv() could be reorganised a bit to
> reduce the goto spaghetti?
Re-organized. I merged the conditions together and put them into a
single loop. Now it does not contain any goto :-) See if the code looks
cleaner now...

> - Please provide a mount option which enables the feature, defaulting to
> "off".
Planning to do that right after this version. Besides enabling/disabling
the reservation feature, I am thinking of letting the mount option also
set the default reservation window size (in blocks) when the fs is
mounted, with just one single mount option: "prealloc_window=n". When
n=0 it means off; when n>0 it means on, and the default ext3 reservation
window size for each file is n blocks (or 8 blocks, if 0 < n < 8).
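The proposed option semantics could be sketched like this; the helper name and the constant are illustrative, with the 0-means-off and 8-block-floor rules taken from the description above:

```c
#include <assert.h>

/* Sketch of the proposed "prealloc_window=n" semantics: n=0 disables
 * reservations entirely; any positive n below the 8-block floor is
 * rounded up to 8; larger values are used as given. */
#define RSV_MIN_WINDOW 8

/* Returns the effective default window size; 0 means "feature off". */
static unsigned int rsv_window_from_option(unsigned int n)
{
	if (n == 0)
		return 0;
	return n < RSV_MIN_WINDOW ? RSV_MIN_WINDOW : n;
}
```
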

> - Make sure that you have a many-small-file test. Say, untar a kernel
> tree onto a clean filesystem and make sure that reading all the files in
> the tree is nice and fast.
Yes, I have done this: untarred a linux-2.6.4 kernel tree onto a clean
ext3 filesystem and verified that the reservation is being discarded
and that the small files are contiguous on-disk.

> Apart from that, looking good. Where are the benchmarks? ;)
A simple dd test is about 4 times faster with the patch on a 2-way PIII
700MHz box:

# cat /tmp/dd-large.sh
dd if=/dev/zero of=x1 bs=4k count=5000 &
dd if=/dev/zero of=x2 bs=4k count=5000 &
dd if=/dev/zero of=x3 bs=4k count=5000 &
dd if=/dev/zero of=x4 bs=4k count=5000 &
dd if=/dev/zero of=x5 bs=4k count=5000 &
dd if=/dev/zero of=x6 bs=4k count=5000 &
dd if=/dev/zero of=x7 bs=4k count=5000 &
dd if=/dev/zero of=x8 bs=4k count=5000 &
vanilla linux-2.6.4 kernel:
# time /tmp/dd-large.sh
real 0m0.431s
user 0m0.001s
sys 0m0.009s

linux-2.6.4 kernel + ext3 reservation patch, window size 128 blocks:
# time /tmp/dd-large.sh
real 0m0.098s
user 0m0.001s
sys 0m0.009s

We also ran tiobench sequential write tests on ext2, jfs, and ext3 with
different numbers of threads (1, 4, 8, 16, 32 and 64); block size is
4k, file size is 4000k. For ext3, we tried different reservation sizes
from 8 to 128 blocks, as well as no reservation at all. The tests were
done on an 8-CPU PIII 700MHz i386 machine with 4GB of memory, on the
linux-2.6.4 kernel + patch and the linux-2.6.4-mm1 kernel.

Attached is a graphic showing the tiobench results. It shows that
before the patch, sequential write throughput on ext3 is pretty bad
compared with ext2 and jfs; with the patch, it is much better. The
block allocation on disk for the created files is also much less
fragmented.

Planning to do dbench and other regression tests later. Just wanted to
share the current status with you.

Patch attached below.

Thanks!
Mingming

fs/ext3/balloc.c           |  581 +++++++++++++++++++++++++++++++++++++++----
fs/ext3/file.c             |    3
fs/ext3/ialloc.c           |    6
fs/ext3/inode.c            |   14 -
fs/ext3/super.c            |    7
include/linux/ext3_fs.h    |    3
include/linux/ext3_fs_i.h  |   12
include/linux/ext3_fs_sb.h |    4
8 files changed, 578 insertions(+), 52 deletions(-)


Attachments:
ext3_reservation_9.diff (26.94 kB)
Throughput.png (28.71 kB)

2004-04-03 01:48:54

by Andrew Morton

Subject: Re: [Ext2-devel] Re: [RFC, PATCH] Reservation based ext3 preallocation

Mingming Cao <[email protected]> wrote:
>
> Hi Andrew,
> Here is the second version of the ext3, mostly bug fixes and made the
> changes you have suggested last time.

Great, thanks.

> Besides enable/disable the
> reservation feature, I am thinking to enable the feature that could set
> the the default reservation window size(in blocks) when the fs is
> mounted. just one single mount option:"prealloc_window=n". When n=0,
> it means turns off, when n>0, it means on, and the ext3 default
> reservation window size for each file is n blocks(or 8 blocks, if 0< n <
> 8).

hm, maybe. We should probably also provide a per-file ext3-specific ioctl
to allow specialised apps to manipulate the reservation size.

And we should grow the reservation size dynamically. I've suggested that
we double its size each time it is exhausted, up to some limit. There may
be better algorithms though.

This work doesn't help us with the slowly-growing logfile or mailbox file
problem. I guess that would require on-disk reservations, or a new
`chattr' hint or such.

2004-04-03 02:31:57

by Mingming Cao

Subject: Re: [Ext2-devel] Re: [RFC, PATCH] Reservation based ext3 preallocation

On Fri, 2004-04-02 at 17:50, Andrew Morton wrote:
> hm, maybe. We should probably also provide a per-file ext3-specific ioctl
> to allow specialised apps to manipulate the reservation size.
>
> And we should grow the reservation size dynamically. I've suggested that
> we double its size each time it is exhausted, up to some limit. There may
> be better algorithms though.
You mean when the reservation window is exhausted, right? I think this
is probably the easiest way, maybe like the readahead window does. It's
just that sometimes the reserved window does not contain many free
blocks to allocate, and we could easily reach the upper limit.

Currently, when we try to reserve a window in a block group and there
is no window big enough, we skip this group and move on to the next
group. I was thinking maybe we should keep track of the largest
available reservable window while we are looking for a new window, so
in case we can't find one of the expected size, we at least could get
one within the group.

This would try to keep the file inside its target group, and also
reduce the possibility of a bogus early ENOSPC. It's a trade-off:
there may be plenty of space in the next group... who knows. What do
you think?

Also, on the bogus early ENOSPC: the filesystem is probably relatively
full of reservations, so a latecomer who needs a new block but doesn't
have a reservation window will fail to allocate one. In this case, the
easiest way, as you said before, is to just steal a free block from
another file's reservation window. I agree it's an extreme case and
the solution is easy; my only small concern is that this favors the
inodes that came first and made reservations, while those that come in
later and need a new block have to suffer the same pain every time:
search the whole filesystem first, then end up stealing blocks.

Anyway, maybe by that point there are too many open files for writes
(so the fs is full of reservations) and it is already in trouble.

>
> This work doesn't help us with the slowly-growing logfile or mailbox file
> problem. I guess that would require on-disk reservations, or a new
> `chattr' hint or such.

Ted has suggested preserving the reservation/preallocation for those
slowly-growing logfiles or mailbox files: probably do not discard the
reservation window for those files (the logfiles) when they are closed.
The next time such a file is opened, it will allocate blocks directly
from the old reservation window. Is that what you had in mind?



2004-04-03 02:50:36

by Andrew Morton

Subject: Re: [Ext2-devel] Re: [RFC, PATCH] Reservation based ext3 preallocation

Mingming Cao <[email protected]> wrote:
>
> On Fri, 2004-04-02 at 17:50, Andrew Morton wrote:
> > hm, maybe. We should probably also provide a per-file ext3-specific ioctl
> > to allow specialised apps to manipulate the reservation size.
> >
> > And we should grow the reservation size dynamically. I've suggested that
> > we double its size each time it is exhausted, up to some limit. There may
> > be better algorithms though.
> You mean when the reservation window size is exhausted, right? I think
> this is probably the easiest way. Maybe like the readahead window does.
> Just sometimes the window reserved does not contains much free blocks to
> allocate, and we could easily reach to the upper limit.

Good point. So the reservation should be grown by "the number of blocks we
allocated in the previous window", not by "the size of the previous
window", yes?

> Currently, when try to reserve a window in a block group, if there is no
> window big enough for this, we skip this group and move on to the next
> group. I was thinking maybe we should keep track of the largest
> avaliable reservable window when we are looking for a new window, so in
> case we can't find the one with expected size, we at least could get one
> within the group.

I suspect that if you cannot get a window in the blockgroup then simply
skipping to the next blockgroup should be OK.

But I don't understand why the reservation code needs to know about
blockgroups at all, at least from a conceptual point of view.

> This will try to keep the file inside it's target group, and also reduce
> the possibility of bogus earlier ENOSPC. Just it's a trade off:there
> maybe plenty of space in the next group......who knows. What do you
> think?

Probably it's sufficient to use the inode's blockgroup's starting block as
the initial target for allocations and then just forget about blockgroups.
Simply let allocation wander further up the disk from there, with no
further consideration of blockgroups.

> Also, for the the bogus earlier ENOSPC, : the filesystem probably
> relatively full of reservations, so the late guy who need a new block
> but don't have a reservation window will failed to allocate a block. In
> this case, the easiest way as you said before is to just steal a free
> block from other file's reservation window. I agree it's the extreme
> case and the solution is easy, just a little concern that this will in
> favor of those inodes who came first and made reservations, but whose
> who come in later need a new block, every time has to suffer the same
> pain: search the whole filesystem first, then end of steal blocks every
> time.

It would be fairly weird for the entire disk to be covered by reservations,
so falling back to the current algorithm would be OK.

> Anyway maybe by that moment the there are too many open files for
> writes(so the fs is full of reservations) so it is already in trouble.
>
> >
> > This work doesn't help us with the slowly-growing logfile or mailbox file
> > problem. I guess that would require on-disk reservations, or a new
> > `chattr' hint or such.
>
> Ted has suggested to preserve the reservation/preallocation for those
> slowing growing logfile for mailbox file. Probably do not discard the
> reservation window for those files(the logfile) when they are closed.
> When it opens next time, it will allocate blocks directly from the old
> reservation window. Is that what you think?

yup, except we now have potentially millions of inodes which have active
reservations. ENOSPC and CPU consumption problems are certain.

Some combination of

- A chattr hint

- Using O_APPEND as a hint and

- Retaining an upper limit on the number of unopened inodes which have a
reservation

should fix that up. You'd need to hook into ->destroy_inode to release
reservations when inodes are reclaimed by the VM.

But this is surely phase two material.


2004-04-05 16:43:15

by Mingming Cao

Subject: Re: [Ext2-devel] Re: [RFC, PATCH] Reservation based ext3 preallocation

On Fri, 2004-04-02 at 18:50, Andrew Morton wrote:
> Mingming Cao <[email protected]> wrote:
> >
> > On Fri, 2004-04-02 at 17:50, Andrew Morton wrote:
> > > hm, maybe. We should probably also provide a per-file ext3-specific ioctl
> > > to allow specialised apps to manipulate the reservation size.
> > >
> > > And we should grow the reservation size dynamically. I've suggested that
> > > we double its size each time it is exhausted, up to some limit. There may
> > > be better algorithms though.
> > You mean when the reservation window size is exhausted, right? I think
> > this is probably the easiest way. Maybe like the readahead window does.
> > Just sometimes the window reserved does not contains much free blocks to
> > allocate, and we could easily reach to the upper limit.
>
> Good point. So the reservation should be grown by "the number of blocks we
> allocated in the previous window", not by "the size of the previous
> window", yes?
>
Yes. Maybe we add a counter to the reservation structure to keep track
of the reservation hits. Then when a new window needs to be created, we
look at the old window's hit count to determine what the window size
should be next time.
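That heuristic might be sketched as follows; the constants and the grow-by-twice-the-hits rule are assumptions for illustration, not anything from the patch:

```c
#include <assert.h>

/* Sketch of the dynamic-growth idea discussed above: size the next
 * window from the number of blocks actually allocated ("hits") in the
 * previous one rather than blindly doubling the window size, with a
 * floor and a cap. All constants here are made up. */
#define RSV_MIN_WINDOW 8
#define RSV_MAX_WINDOW 1024

static unsigned int next_window_size(unsigned int prev_hits)
{
	unsigned int next = prev_hits * 2;	/* grow with demonstrated demand */

	if (next < RSV_MIN_WINDOW)
		next = RSV_MIN_WINDOW;		/* cold files stay small */
	if (next > RSV_MAX_WINDOW)
		next = RSV_MAX_WINDOW;		/* cap hot files */
	return next;
}
```

A window that saw few hits thus stays small, which addresses the concern above about reaching the upper limit too easily.
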

> > Currently, when try to reserve a window in a block group, if there is no
> > window big enough for this, we skip this group and move on to the next
> > group. I was thinking maybe we should keep track of the largest
> > avaliable reservable window when we are looking for a new window, so in
> > case we can't find the one with expected size, we at least could get one
> > within the group.
>
> I suspect that if you cannot get a window in the blockgroup then simply
> skipping to the next blockgroup should be OK.
>
Okay.

> But I don't understand why the reservation code needs to know about
> blockgroups at all, at least from a conceptual point of view.
>
Agree that the reservation itself is a filesystem-wide concept. The
reservation window can cross a block group boundary.

> Probably it's sufficient to use the inode's blockgroup's starting block as
> the initial target for allocations and then just forget about blockgroups.
> Simply let allocation wander further up the disk from there, with no
> further consideration of blockgroups.
I think the current code's logic is the same as what you said. The
logic of the current code is: given a goal block, try to allocate a
block starting from there within the inode's block group. If that
fails, simply move on to the next group without a goal -- the search
for a free block will start from the first block of the next group. I
was trying to keep the same logic as before. So for the reservation
code, given a goal block, we will try to allocate a new reservation
window (and then allocate a block within it) starting from the given
goal block. If that fails, we will simply do reservation window
allocation in the rest of the disk, without regard to the inode's
blockgroup.

>
> It would be fairly weird for the entire disk to be covered by reservations,
> so falling back to the current algorithm would be OK.
Okay.

> > > This work doesn't help us with the slowly-growing logfile or mailbox file
> > > problem. I guess that would require on-disk reservations, or a new
> > > `chattr' hint or such.
> >
> > Ted has suggested preserving the reservation/preallocation for those
> > slowly growing logfiles or mailbox files. Probably do not discard the
> > reservation window for those files (the logfiles) when they are closed.
> > When such a file opens next time, it will allocate blocks directly from
> > the old reservation window. Is that what you think?
>
> yup, except we now have potentially millions of inodes which have active
> reservations. ENOSPC and CPU consumption problems are certain.
>
> Some combination of
>
> - A chattr hint
>
> - Using O_APPEND as a hint and
>
> - Retaining an upper limit on the number of unopened inodes which have a
> reservation
>
> should fix that up. You'd need to hook into ->destroy_inode to release
> reservations when inodes are reclaimed by the VM.
>
> But this is surely phase two material.
Okay. Will think about this more later...

Thanks for your help!

Mingming



2004-04-14 00:46:00

by Mingming Cao

[permalink] [raw]
Subject: [PATCH 0/4] ext3 block reservation patch set

Hello,

Here is a set of patches which implement the in-memory ext3 block
reservation (previously called reservation based ext3 preallocation).

[patch 1]ext3_rsv_cleanup.patch: Cleans up the old ext3 preallocation
code carried from ext2 but turned off.

[patch 2]ext3_rsv_base.patch: Implements the base of in-memory block
reservation and block allocation from reservation window.

[patch 3]ext3_rsv_mount.patch: Adds features on top of the
ext3_rsv_base.patch:
- deal with earlier bogus -ENOSPC error
- do block reservation only for regular files
- make the ext3 reservation feature a mount option:
new mount option added: reservation
- A pair of file ioctl commands are added for application to control
the block reservation window size.

[patch 4]ext3_rsv_dw.patch: adjust the reservation window size
dynamically:
Start from the default reservation window size; if the hit ratio of
the reservation window is more than 50%, we double the reservation
window size next time, up to a certain upper limit.

Here are some numbers collected with dbench on an 8-way PIII 700MHz:

dbench average throughputs on 4 runs
==================================================
Threads ext3 ext3+rsv(8) ext3+rsv+dw
1 103 104(0%) 105(1%)
4 144 286(98%) 256(77%)
8 118 197(66%) 210(77%)
16 113 160(41%) 177(56%)
32 61 123(101%) 150(145%)
64 41 82(100%) 85(107%)

And some numbers on tiobench sequential write:

tiobench Sequential Writes throughput (improvements)
=====================================================================
Threads ext2 ext3 ext3+rsv(8)(%) ext3+rsv(128)(%) ext3+rsv+dw(%)
1 26 23 25(8%) 26(13%) 26(13%)
4 17 4 14(250%) 24(500%) 25(525%)
8 15 7 13(85%) 23(228%) 24(242%)
16 16 13 12(-7%) 22(69%) 24(84%)
32 15 3 12(300%) 23(666%) 23(666%)
64 14 1 11(1000%) 22(2100%) 23(2200%)

Note: each run of the test was on a freshly created ext3 filesystem.

We have also run fsx tests on an 8-way with the 2.6.4 kernel plus the
patch set for a whole weekend on a freshly created ext3 filesystem, as
well as on a 4-way with the root filesystem as ext3 plus all the
changes. Other tests include 8-thread dd tests and untarring a kernel
source tree.

Besides looking at the performance numbers and verifying the
functionality, we also checked the block allocation layout for each file
generated during the test: the blocks for a file are more contiguous
with the reservation mount option on, especially when we dynamically
increase the reservation window size in the sequential write cases.

Andrew, is this something that you would consider for -mm tree?

Thanks again to Andrew, Ted and Badari for their ideas and help on this
project. I would really appreciate any comments and feedback.


Mingming


2004-04-14 00:48:32

by Mingming Cao

[permalink] [raw]
Subject: [PATCH 1/4] ext3 block reservation patch set -- ext3 preallocation cleanup

> [patch 1]ext3_rsv_cleanup.patch: Cleans up the old ext3 preallocation
> code carried from ext2 but turned off.

diffstat ext3_rsv_cleanup.patch
 fs/ext3/balloc.c          |    3 -
 fs/ext3/file.c            |    2 -
 fs/ext3/ialloc.c          |    4 --
 fs/ext3/inode.c           |   91 ----------------------------------------------
 fs/ext3/xattr.c           |    2 -
 include/linux/ext3_fs.h   |    9 ----
 include/linux/ext3_fs_i.h |    4 --
 7 files changed, 4 insertions(+), 111 deletions(-)



Attachments:
ext3_rsv_cleanup.patch (7.35 kB)

2004-04-14 00:52:48

by Mingming Cao

[permalink] [raw]
Subject: [PATCH 3/4] ext3 block reservation patch set --mount and ioctl feature

> [patch 3]ext3_rsv_mount.patch: Adds features on top of the
> ext3_rsv_base.patch:
> - deal with earlier bogus -ENOSPC error
> - do block reservation only for regular file
> - make the ext3 reservation feature as a mount option:
> new mount option added: reservation
> - A pair of file ioctl commands are added for application to control
> the block reservation window size.
>
diffstat ext3_rsv_mount.patch
 fs/ext3/balloc.c          |   49 ++++++++++++++++++++++++++++++++++++----------
 fs/ext3/ialloc.c          |    2 -
 fs/ext3/inode.c           |    2 -
 fs/ext3/ioctl.c           |   23 +++++++++++++++++++++
 fs/ext3/super.c           |   20 ++++++++++++++++--
 include/linux/ext3_fs.h   |   42 ++++++++++++++++++++++-----------------
 include/linux/ext3_fs_i.h |    2 -
 7 files changed, 107 insertions(+), 33 deletions(-)


Attachments:
ext3_rsv_mount.patch (11.75 kB)

2004-04-14 00:51:28

by Mingming Cao

[permalink] [raw]
Subject: [PATCH 2/4] ext3 block reservation patch set --ext3 block reservation

> [patch 2]ext3_rsv_base.patch: Implements the base of in-memory block
> reservation and block allocation from reservation window.
-basic reservation structure and operations
-reservation based ext3 block allocation

diffstat ext3_rsv_base.patch
 fs/ext3/balloc.c           |  548 ++++++++++++++++++++++++++++++++++++++++-----
 fs/ext3/file.c             |    2
 fs/ext3/ialloc.c           |    5
 fs/ext3/inode.c            |    9
 fs/ext3/super.c            |    7
 include/linux/ext3_fs.h    |    6
 include/linux/ext3_fs_i.h  |   12
 include/linux/ext3_fs_sb.h |    4
 8 files changed, 542 insertions(+), 51 deletions(-)


Attachments:
ext3_rsv_base.patch (26.41 kB)

2004-04-14 00:54:01

by Mingming Cao

[permalink] [raw]
Subject: [PATCH 4/4] ext3 block reservation patch set -- dynamically increase reservation window

> [patch 4]ext3_rsv_dw.patch: adjust the reservation window size
> dynamically:
> Start from the default reservation window size; if the hit ratio of
> the reservation window is more than 50%, we will double the reservation
> window size next time, up to a certain upper limit.

diffstat ext3_rsv_dw.patch
fs/ext3/balloc.c | 13 ++++++++++++-
fs/ext3/ialloc.c | 3 ++-
fs/ext3/super.c | 1 +
include/linux/ext3_fs_i.h | 1 +
4 files changed, 16 insertions(+), 2 deletions(-)


Attachments:
ext3_rsv_dw.patch (2.77 kB)

2004-04-14 02:48:32

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 0/4] ext3 block reservation patch set

Mingming Cao <[email protected]> wrote:
>
> Here is a set of patches which implement the in-memory ext3 block
> reservation (previously called reservation based ext3 preallocation).

Great, thanks. Let's get these in the pipeline.

A few thoughts, from a five-minute read:


- The majority of in-core inodes are not open for reading, and we've
added 24 bytes to the inode just for inodes which are open for writing.

At some stage we should stop aggregating struct reserve_window into the
inode and dynamically allocate it. We can move i_next_alloc_block,
i_next_alloc_goal and possibly other fields in there too.

At which point it has the wrong name ;) Should be `write_state' or
something.

It's not clear when we should free up the write_state. I guess we
could leave it around for the remaining lifetime of the inode - that'd
still be a net win.

Is this something you can look at as a low-priority activity?

- You're performing ext3_discard_reservation() in ext3_release_file().
Note that the file may still have pending allocations at this stage: say,
open a file, map it MAP_SHARED, dirty some pages which lie over file
holes then close the file again.

Later, the VM will come along and write those dirty pages into the
file, at which point allocations need to be performed. But we have no
reservation data and, later, we may have no inode->write_state at all.

What will happen?

- Have you tested and profiled this with a huge number of open files? At
what stage do we get into search complexity problems?

- Why do we discard the file's reservation on every iput()? iput's are
relatively common operations. (see fs/fs-writeback.c)

- What locking protects rsv_alloc_hit? i_sem is not held during
VM-initiated writeout. Maybe an atomic_t there, or just say that if we
race and the number is a bit inaccurate, we don't care?

2004-04-14 16:23:29

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [PATCH 0/4] ext3 block reservation patch set

On Tuesday 13 April 2004 07:47 pm, Andrew Morton wrote:
> Mingming Cao <[email protected]> wrote:
> > Here is a set of patches which implement the in-memory ext3 block
> > reservation (previously called reservation based ext3 preallocation).
>
> Great, thanks. Let's get these in the pipeline.
>
> A few thoughts, from a five-minute read:
>
>
> - The majority of in-core inodes are not open for reading, and we've
> added 24 bytes to the inode just for inodes which are open for writing.
>
> At some stage we should stop aggregating struct reserve_window into the
> inode and dynamically allocate it. We can move i_next_alloc_block,
> i_next_alloc_goal and possibly other fields in there too.
>
> At which point it has the wrong name ;) Should be `write_state' or
> something.
>
> It's not clear when we should free up the write_state. I guess we
> could leave it around for the remaining lifetime of the inode - that'd
> still be a net win.
>
> Is this something you can look at as a low-priority activity?

Good point!! We will surely look at it.
>
> - You're performing ext3_discard_reservation() in ext3_release_file().
> Note that the file may still have pending allocations at this stage: say,
> open a file, map it MAP_SHARED, dirty some pages which lie over file
> holes then close the file again.
..
> - Why do we discard the file's reservation on every iput()? iput's are
> relatively common operations. (see fs/fs-writeback.c)

We just followed the old prealloc code: wherever preallocation was
dropped, we dropped the reservation. Maybe that's overkill. We will look at it.

What's the best place to drop the reservation?

> - Have you tested and profiled this with a huge number of open files? At
> what stage do we get into search complexity problems?

It's on our TODO list. But our original thought was that we only have to
search the current block group's reservations to get a window. So, if we
have lots and lots of reservations in a single block group, the search
gets complicated. We were thinking of adding (dummy) anchors in the list
to represent the beginning of each block group, so that we can get to
the start of a block group quickly. But so far, we haven't done anything.

We are also looking at RB trees to see how we can make use of them. Our
problem is that we are interested in finding a big enough hole in the
tree to put our reservation in. We need to look closely.


> - What locking protects rsv_alloc_hit? i_sem is not held during
> VM-initiated writeout. Maybe an atomic_t there, or just say that if we
> race and the number is a bit inaccurate, we don't care?

We need to at least change it to an atomic_t.

Mingming, I don't see any check to enforce the maximum. Am I missing something?

We really appreciate your comments.

Thanks,
Badari



2004-04-14 16:54:50

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [PATCH 0/4] ext3 block reservation patch set

On Tuesday 13 April 2004 07:47 pm, Andrew Morton wrote:

> - You're performing ext3_discard_reservation() in ext3_release_file().
> Note that the file may still have pending allocations at this stage: say,
> open a file, map it MAP_SHARED, dirty some pages which lie over file
> holes then close the file again.
>
> Later, the VM will come along and write those dirty pages into the
> file, at which point allocations need to be performed. But we have no
> reservation data and, later, we may have no inode->write_state at all.
>
> What will happen?

Block allocations can happen after ext3_release_file()? In that case,
we would have dropped all our reservations at the time of the last file
close. But if allocations happen later, the current code will start a
new reservation window and allocate from there.

> - Have you tested and profiled this with a huge number of open files? At
> what stage do we get into search complexity problems?

Come to think of it, the current code has a pretty bad search algorithm.
We need to fix that. We hold the spinlock for the entire search; that's
why our CPU utilization is pretty high.

Thanks,
Badari

2004-04-14 17:24:19

by Mingming Cao

[permalink] [raw]
Subject: Re: [PATCH 0/4] ext3 block reservation patch set

On Tue, 2004-04-13 at 19:47, Andrew Morton wrote:
> Mingming Cao <[email protected]> wrote:
> >
> > Here is a set of patches which implement the in-memory ext3 block
> > reservation (previously called reservation based ext3 preallocation).
>
> Great, thanks. Let's get these in the pipeline.
>
> A few thoughts, from a five-minute read:
>
>
> - The majority of in-core inodes are not open for reading, and we've
> added 24 bytes to the inode just for inodes which are open for writing.
Yes, the structure gets bigger as we add more stuff into it. It may not
be worth putting it inside the ext3_inode_info structure just for files
open for write... I agree!
>
> At some stage we should stop aggregating struct reserve_window into the
> inode and dynamically allocate it. We can move i_next_alloc_block,
> i_next_alloc_goal and possibly other fields in there too.
>
> At which point it has the wrong name ;) Should be `write_state' or
> something.
>
> It's not clear when we should free up the write_state. I guess we
> could leave it around for the remaining lifetime of the inode - that'd
> still be a net win.
We could free up the write_state at the time of ext3_discard_reservation()
(not at the time when we allocate a new reservation window),

or, later, if we preserve reservations for slowly growing files, we could
release the write_state at the time the inode is released.

> Is this something you can look at as a low-priority activity?
>
Sure!
> - You're performing ext3_discard_reservation() in ext3_release_file().
> Note that the file may still have pending allocations at this stage: say,
> open a file, map it MAP_SHARED, dirty some pages which lie over file
> holes then close the file again.
>
> Later, the VM will come along and write those dirty pages into the
> file, at which point allocations need to be performed. But we have no
> reservation data and, later, we may have no inode->write_state at all.
>
> What will happen?
>
In this case, we will allocate a new reservation window for it.
Nothing bad will happen; we probably just waste a previously allocated
reservation window... but I am not sure.

My question is: if the file is opened for the first time, mapped, and we
dirty pages in the file hole, will there really be any disk block
allocation involved? If not, we do not have a reservation window at all,
and ext3_discard_reservation() will detect that and do nothing.

> - Have you tested and profiled this with a huge number of open files? At
> what stage do we get into search complexity problems?
>
Not yet. The current search complexity is O(n): if you don't have a
reservation, you need O(n) to move the search head to the place where
you want to search for a new reservation; finding the hole size between
two reservation windows is just O(1) for a sorted doubly linked list,
and we need O(n) to look for a reservable window after that, so the
complexity is O(n) + O(1) * O(n) = O(n).
If you already have an old reservation, we remember where to start
the search, so the complexity is O(1) + O(n).

The current implementation is worse than O(n): every time an inode does
not have a reservation window, it searches from the head of the
per-filesystem reservation window list. If it fails within the group, it
moves to the next group and starts the search from the head of the list
again.

This could be fixed by forgetting about block group boundaries
altogether (removing the for loop in ext3_new_block) and making it
search for a block filesystem-wide :)

I have a concern about a red-black tree: it takes O(log n) to get to
where you want to start, but it also takes an O(log n) comparison to
find the hole size between two adjacent windows. And to find a
reservable window, we need to browse the whole red-black tree in the
worst case, so the complexity is
O(log n) + O(log n) * O(n) = O(n log n)

Am I right?

> - Why do we discard the file's reservation on every iput()? iput's are
> relatively common operations. (see fs/fs-writeback.c)
>
Yes... you are right! I intended to call ext3_discard_reservation only
when the usage count of the inode is 0. I looked at the ext2
preallocation code; it calls ext2_discard_preallocation in
ext2_put_inode(), so I thought that was the place. But it seems
ext3_put_inode() is called every time iput() is called. We should call
ext3_discard_reservation in iput_final(). We should fix this in ext2 too.

> - What locking protects rsv_alloc_hit? i_sem is not held during
> VM-initiated writeout. Maybe an atomic_t there, or just say that if we
> race and the number is a bit inaccurate, we don't care?
>
Currently no lock protects rsv_alloc_hit. The reason is that it is just
a heuristic indicator of whether we should enlarge the reservation
window size next time. Even the hit ratio (50%) is just a rough guess,
so a little inaccuracy would not hurt much; adding another lock is
probably not worth it.

Thanks,

Mingming

2004-04-14 17:37:33

by Mingming Cao

[permalink] [raw]
Subject: Re: [PATCH 0/4] ext3 block reservation patch set

On Wed, 2004-04-14 at 09:11, Badari Pulavarty wrote:
> > - What locking protects rsv_alloc_hit? i_sem is not held during
> > VM-initiated writeout. Maybe an atomic_t there, or just say that if we
> > race and the number is a bit inaccurate, we don't care?
>
> Mingming, I don't see any check to enforce the maximum. Am I missing something?

Nice catch! I forgot to check the maximum when growing the window
size... fixed it. :)

Thanks.


2004-04-14 23:03:25

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 0/4] ext3 block reservation patch set

Badari Pulavarty <[email protected]> wrote:
>
> > - Why do we discard the file's reservation on every iput()? iput's are
> > relatively common operations. (see fs/fs-writeback.c)
>
> We just followed old prealloc code. Where ever preallocation is dropped
> we dropped reservation. May be thats overkill. We will look at it.
>
> Whats the best place to drop the reservation ?

You know, I wish I had an easy answer to that, but I don't. It's a matter
of sticking a printk in there, running careful tests, making sure that
we're doing the right thing at the right time.

As we discussed earlier, it could be that in some situations we should
hold onto the reservation window after the file has been closed: the
slowly-growing mbox or logfile problem. But without causing bandwidth
regressions in the many-small-files scenario.

> > - Have you tested and profiled this with a huge number of open files? At
> > what stage do we get into search complexity problems?
>
> In our TODO list. But our original thought was, we have to search only the
> current block group reservations to get a window. So, if we have lots & lots
> of reservations in a single block group - search gets complicated. We were
> thinking of adding (dummy) anchors in the list to represent begining of each
> block group, so that we can get to the start of a block group quickly. But
> so far, we haven't done anything.

hm, I need to look at the new code more closely. I was hoping that we
could divorce the reservation windows from any knowledge of blockgroups.
Is that not the case?

> We are also looking at RB tree and see how we can make use of it. Our problem
> is, we are interested in finding out a big enough hole in the tree to put our
> reservation. We need to look closely.

This sounds awfully like get_unmapped_area().


2004-04-14 23:09:59

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 0/4] ext3 block reservation patch set

Mingming Cao <[email protected]> wrote:
>
> > It's not clear when we should free up the write_state. I guess we
> > could leave it around for the remaining lifetime of the inode - that'd
> > still be a net win.

> We could free up the write_state at the time of ext3_discard_allocation(),
> (not at the time when we allocate a new reservation window)
>
> or later if we preserve reservation for slow growing files, we release
> the write_state at the time the inode is released.

That sounds appropriate.

> > - You're performing ext3_discard_reservation() in ext3_release_file().
> > Note that the file may still have pending allocations at this stage: say,
> > open a file, map it MAP_SHARED, dirty some pages which lie over file
> > holes then close the file again.
> >
> > Later, the VM will come along and write those dirty pages into the
> > file, at which point allocations need to be performed. But we have no
> > reservation data and, later, we may have no inode->write_state at all.
> >
> > What will happen?
> >
> In this case, we will allocation a new reservation window for it.
> Nothing bad will happen. We probably just waste a previously allocated
> reservation window...but I am not sure.
>
> My question is, if the file is first time opened, mapped, and we dirty
> pages in the file hole, will there any really disk block allocation
> involved there?

There might be, and there might not be. It depends on timing, memory
pressure, application activity, etc.

> The current implementation is worse than O(n): every time an inode does
> not have a reservation window, it searches from the head of the
> per-filesystem reservation window list. If it fails within the group,
> it moves to the next group and starts the search from the head of the
> list again.

Same problem exists in arch_get_unmapped_area(). We have a funny little
heuristic (free_area_cache) in there to speed up the common case.

> This could be fixed by forgetting about block group boundaries
> altogether (removing the for loop in ext3_new_block) and making it
> search for a block filesystem-wide :)

I do think we should do this. Does it have any disadvantages?

> I have a concern about a red-black tree: it takes O(log n) to get to
> where you want to start, but it also takes an O(log n) comparison to
> find the hole size between two adjacent windows. And to find a
> reservable window, we need to browse the whole red-black tree in the
> worst case, so the complexity is
> O(log n) + O(log n) * O(n) = O(n log n)
>
> Am I right?

Think so. rbtrees are optimised for lookup, not for
get-me-a-suitably-sized-hole.


> > - Why do we discard the file's reservation on every iput()? iput's are
> > relatively common operations. (see fs/fs-writeback.c)
> >
> Yes... you are right! I intended to call ext3_discard_reservation only
> when the usage count of the inode is 0. I looked at the ext2
> preallocation code; it calls ext2_discard_preallocation in
> ext2_put_inode(), so I thought that was the place. But it seems
> ext3_put_inode() is called every time iput() is called. We should call
> ext3_discard_reservation in iput_final(). We should fix this in ext2 too.

Could be so.

> > - What locking protects rsv_alloc_hit? i_sem is not held during
> > VM-initiated writeout. Maybe an atomic_t there, or just say that if we
> > race and the number is a bit inaccurate, we don't care?
> >
> Currently no lock protects rsv_alloc_hit. The reason is that it is just
> a heuristic indicator of whether we should enlarge the reservation
> window size next time. Even the hit ratio (50%) is just a rough guess,
> so a little inaccuracy would not hurt much; adding another lock is
> probably not worth it.

I'd agree with that.

2004-04-14 23:29:29

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [PATCH 0/4] ext3 block reservation patch set

On Wednesday 14 April 2004 04:02 pm, Andrew Morton wrote:


> > In our TODO list. But our original thought was, we have to search only
> > the current block group reservations to get a window. So, if we have lots
> > & lots of reservations in a single block group - search gets complicated.
> > We were thinking of adding (dummy) anchors in the list to represent
> > begining of each block group, so that we can get to the start of a block
> > group quickly. But so far, we haven't done anything.
>
> hm, I need to look at the new code more closely. I was hoping that we
> could divorce the reservation windows from any knowledge of blockgroups.
> Is that not the case?

The reservation window code kind of knows the group boundaries. The
reason we did this was that we wanted to fit it into the existing
ext3_get_newblock() code easily. ext3_get_newblock() operates on each
group and passes a bitmap for each group to work on. The current code
looks for a reservation window in the given group (since we need the
bitmap to verify that there is something allocatable in that group).
To make the reservation window ignore groups, we may need to do some
major surgery on ext3_get_newblock().


> > We are also looking at RB tree and see how we can make use of it. Our
> > problem is, we are interested in finding out a big enough hole in the
> > tree to put our reservation. We need to look closely.

> This sounds awfully like get_unmapped_area().

That was the first place I looked; I need to look at it one more time to
see if we can reuse the logic.

Thanks,
Badari

2004-04-14 23:37:38

by Mingming Cao

[permalink] [raw]
Subject: Re: [PATCH 0/4] ext3 block reservation patch set

On Wed, 2004-04-14 at 16:07, Andrew Morton wrote:

> > The current implementation is worse than O(n): every time an inode
> > does not have a reservation window, it searches from the head of the
> > per-filesystem reservation window list. If it fails within the group,
> > it moves to the next group and starts the search from the head of the
> > list again.
>
> Same problem exists in arch_get_unmapped_area(). We have a funny little
> heuristic (free_area_cache) in there to speed up the common case.
Actually, we only hit this worse-than-O(n) case when the file has just
been opened for write (without a reservation window). In the normal
case, if we have an old reservation window, we start from the old
window, instead of from the head of the whole filesystem list, to search
for the next reservable hole.

>
> > This could be fixed by forgetting about block group boundaries
> > altogether (removing the for loop in ext3_new_block) and making it
> > search for a block filesystem-wide :)
>
> I do think we should do this. Does it have any disadvantages?
>
I re-looked at the code today; my concern is that we may end up with big
changes to the existing code. Need to think about it more...



2004-04-21 23:28:21

by Mingming Cao

[permalink] [raw]
Subject: [PATCH] Lazy discard ext3 reservation window patch

Andrew,

This patch contains several changes against the ext3 reservation code in
the 2.6.5-mm6 tree:

Lazy Discard Reservation Window:
This patch does lazy discard: keep the old reservation window
temporarily until we find the new reservation window, and only do a
remove/add if the new reservation window is located somewhere different
from the old one. (The reservation code in the mm6 tree discards the old
one first, then searches for the new one.) Two reasons:
- If ext3_find_goal() does a good job, the reservation windows on the
list should not be very close to each other. So an inode's new
reservation window is likely located just next to its old one; its
position in the whole list is unchanged, so there is no need to remove
the old one and then add the new one to the list in the same location.
Just update the start block and end block.
- If we fail to find a new reservation in the goal group and move the
search on to the next group, having the old reservation around
temporarily allows us to search the list directly after the old window.
Otherwise we lose where we were and have to start from the beginning of
the list. Eventually the old window is discarded when we find a new one.

Other changes:
- Add a check to enforce the maximum when dynamically increasing the
window size.
- ext3_discard_reservation() should not be called on every iput(). It is
now moved to ext3_delete_inode(), so it is only called on the last
iput() when i_nlink is 0.
- Remove #ifdef EXT3_RESERVATION, since we made reservation a mount
option.
- Only allow applications to modify a file's reservation window size
when the fs is mounted with reservation and the operation is performed
on regular files.

This patch should apply to 2.6.5-mm6. I have tested it through many dd
tests, untar tests, dbench and tiobench.

Thanks!

Mingming


Attachments:
ext3_reservation_lazydiscard.patch (6.93 kB)

2004-04-27 15:27:56

by Mary Edie Meredith

[permalink] [raw]
Subject: Re: [PATCH 0/4] ext3 block reservation patch set


To test the benefit of Mingming's ext3 reservation
patchset, we ran tiobench on 2-way systems on STP
using 2.6.6-rc2-mm1 versus 2.6.6-rc2-mm1 patched to
force the ext3 filesystem to be built without
reservation.

The results show increased throughput for more than
one thread, not only for sequential write but also
for random write, sequential read, and random read.
Latency is also decreased in all cases.

Raw data can be found:
-2 way 2.6.6-rc2-mm1
http://khack.osdl.org/stp/292223/results/tiobench-ext3.txt
-2 way 2.6.6-rc2-mm1 noreservation default
http://khack.osdl.org/stp/292225/results/tiobench-ext3.txt

Judith compared the two runs by plotting the
results at: http://developer.osdl.org/judith/tiobench/ext3-reserve/

Here are some interesting ones:
Throughput results:
-Random write throughput 128k
http://developer.osdl.org/judith/tiobench/ext3-reserve/through.ext3.2CPU.RW.128.png
-Random write throughput 4k
http://developer.osdl.org/judith/tiobench/ext3-reserve/through.ext3.2CPU.RW.4.png
-Sequential write throughput 4k
http://developer.osdl.org/judith/tiobench/ext3-reserve/through.ext3.2CPU.SW.4.png
-Sequential write throughput 128k
http://developer.osdl.org/judith/tiobench/ext3-reserve/through.ext3.2CPU.SW.128.png

Latency is reduced almost across the board.
-Example: Latency figures for Random write 4k:
http://developer.osdl.org/judith/tiobench/ext3-reserve/lat.ext3.2CPU.RW.4.png

Mary Edie Meredith
Open Source Development Labs
503-626-2455 x42
[email protected]

Mingming Cao wrote:
> Hello,
>
> Here is a set of patches which implement the in-memory ext3 block
> reservation (previously called reservation based ext3 preallocation).
>