2006-10-27 07:23:15

by Takashi Sato

Subject: Re: [RFC] Ext3 online defrag


Hi,

> TT> On Mon, Oct 23, 2006 at 02:27:10PM +0200, Jan Kara wrote:
> >> Hello,
> >>
> >> I've written a simple patch implementing ext3 ioctl for file
> >> relocation. Basically you call ioctl on a file, give it list of blocks
> >> and it relocates the file into given blocks (provided they are still
> >> free). The idea is to use it as a kernel part of ext3 online
> >> defragmenter (or generally disk access optimizer).
>
>isn't that a kernel responsibility to find/allocate target blocks?
>wouldn't it be better to specify a desirable target group and a
>minimal acceptable chunk of free blocks?

Agreed.

I am considering an online defrag function for ext4, and I think
your patch set for multi-block allocation (below) would be useful
for finding contiguous free blocks for the defragmentation.

"[RFC] extents,mballoc,delalloc for 2.6.16.8"
http://marc.theaimsgroup.com/?l=linux-ext4&m=114669168616780&w=2
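
To make the "target group plus minimal acceptable chunk" idea concrete, here is a rough userspace sketch in Python. It is not how mballoc searches for space; it is only a linear scan over toy block bitmaps. The names `find_free_run` and `find_in_groups`, and the list-of-lists bitmap layout (1 = used, 0 = free), are assumptions made for illustration.

```python
def find_free_run(bitmap, min_len):
    """Return (start, length) of the first free run of at least min_len
    blocks in one group's bitmap, or None if there is no such run."""
    run_start = None
    for i, used in enumerate(bitmap):
        if not used:
            if run_start is None:
                run_start = i          # a new free run begins here
        else:
            if run_start is not None and i - run_start >= min_len:
                return run_start, i - run_start
            run_start = None           # run ended and was too short
    # Handle a free run that extends to the end of the group.
    if run_start is not None and len(bitmap) - run_start >= min_len:
        return run_start, len(bitmap) - run_start
    return None


def find_in_groups(groups, goal_group, min_len):
    """Try the goal group first, then the others in order, wrapping around.
    Returns (group, start, length) or None."""
    n = len(groups)
    for step in range(n):
        g = (goal_group + step) % n
        run = find_free_run(groups[g], min_len)
        if run is not None:
            return g, run[0], run[1]
    return None
```

With this shape of interface the caller only states a goal group and a minimum chunk size, and the search falls back to other groups when the goal group is too fragmented.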

I will send a patch with a simple defrag implementation for ext4 later.

Cheers, Takashi


2006-10-27 07:42:46

by Alex Tomas

Subject: Re: [RFC] Ext3 online defrag


I've been reworking mballoc with a few new features:

1) in-core preallocation
like the existing reservation, but can preallocate a few pieces for a file

2) locality groups
to maintain groups of related files and flush them together.
say, two users are unpacking a kernel. with delayed allocation
we've got a bunch of files from both in cache. then we flush the
first set (a few MBs) of files from one user, then from the other.
this way write I/Os will be large enough to achieve good
throughput and files are still localized enough to be read later
at a good rate.

3) scalable reservation
required for delayed allocation to avoid -ENOSPC at flush time.
current version uses per-sb spinlock.
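
As a rough illustration of item 2 (locality groups), the Python sketch below batches interleaved dirty files by a locality key so that each group can be flushed together. It is a toy model, not the mballoc code: the function name `batches_by_locality`, the tuple layout, and the choice of the user name as the locality key are all assumptions.

```python
from collections import OrderedDict


def batches_by_locality(dirty_files):
    """Group dirty files by locality key, keeping first-seen order.

    dirty_files: iterable of (locality_key, file_name, size_in_blocks).
    Returns a list of (locality_key, [file_name, ...], total_blocks),
    one entry per group, in the order the groups first appeared.
    """
    groups = OrderedDict()
    for key, name, blocks in dirty_files:
        names, total = groups.get(key, ([], 0))
        names.append(name)
        groups[key] = (names, total + blocks)
    return [(key, names, total) for key, (names, total) in groups.items()]
```

Given files from two users arriving interleaved in the cache, this returns one batch per user, so each flush covers a larger and more localized set of blocks than flushing in arrival order would.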

probably we could add something for defragmentation?

thanks, Alex

>>>>> sho (s) writes:

s> I am considering an online defrag function for ext4, and I think
s> your patch set for multi-block allocation (below) would be useful
s> for finding contiguous free blocks for the defragmentation.

s> "[RFC] extents,mballoc,delalloc for 2.6.16.8"
s> http://marc.theaimsgroup.com/?l=linux-ext4&m=114669168616780&w=2

s> I will send a patch with a simple defrag implementation for ext4 later.

s> Cheers, Takashi

2006-10-27 13:53:58

by Eric Sandeen

Subject: Re: [RFC] Ext3 online defrag

Alex Tomas wrote:
> 3) scalable reservation
> required for delayed allocation to avoid -ENOSPC at flush time.
> current version uses per-sb spinlock.

Can you elaborate on this issue? Shouldn't delayed allocation decrement free
space immediately, and only the actual block location choice is delayed? Or is
this due to potential extra metadata space required as blocks are allocated?

Thanks,

-Eric

2006-10-27 14:04:31

by Alex Tomas

Subject: Re: [RFC] Ext3 online defrag

>>>>> Eric Sandeen (ES) writes:

ES> Alex Tomas wrote:
>> 3) scalable reservation
>> required for delayed allocation to avoid -ENOSPC at flush time.
>> current version uses per-sb spinlock.

ES> Can you elaborate on this issue? Shouldn't delayed allocation
ES> decrement free space immediately, and only the actual block location
ES> choice is delayed? Or is this due to potential extra metadata space
ES> required as blocks are allocated?

exactly. in this case, reservation has nothing to do with allocation
or preallocation of real blocks. this is just a *per-sb counter* of
blocks reserved for allocation at flush time. it includes all
non-allocated-yet blocks and metadata needed to allocate them (bitmaps,
group descriptors, extent tree blocks, etc). the previous version
of mballoc has reservation, but it doesn't scale very well, being
a single global counter protected by a spinlock. at least, in many
regular loads I observed the reservation function in the top 30 of
oprofile.
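
The reservation described above can be modelled roughly as follows. This is a userspace Python sketch, not kernel code: the class name `BlockReservation`, its methods, and the caller-supplied metadata estimate are assumptions. It deliberately uses one global counter behind one lock, which is the shape said above not to scale.

```python
import errno
import threading


class BlockReservation:
    """Toy model of a per-superblock reservation counter."""

    def __init__(self, free_blocks):
        self._lock = threading.Lock()   # stands in for the per-sb spinlock
        self.free = free_blocks         # blocks not yet allocated on disk
        self.reserved = 0               # blocks promised to delayed writes

    def reserve(self, data_blocks, metadata_blocks):
        """Called at write time: fail early instead of at flush time."""
        need = data_blocks + metadata_blocks
        with self._lock:
            if self.reserved + need > self.free:
                raise OSError(errno.ENOSPC, "No space left on device")
            self.reserved += need

    def flush(self, reserved, allocated):
        """Called at flush time: drop the reservation and account for the
        blocks that were really allocated (data plus metadata)."""
        if allocated > reserved:
            raise ValueError("allocated more blocks than were reserved")
        with self._lock:
            self.reserved -= reserved
            self.free -= allocated
```

Every write and every flush takes the same lock here, so the counter becomes a contention point as the number of CPUs grows.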



thanks, Alex

2006-10-27 14:24:24

by Eric Sandeen

Subject: Re: [RFC] Ext3 online defrag

Alex Tomas wrote:
>>>>>> Eric Sandeen (ES) writes:
>
> ES> Alex Tomas wrote:
> >> 3) scalable reservation
> >> required for delayed allocation to avoid -ENOSPC at flush time.
> >> current version uses per-sb spinlock.
>
> ES> Can you elaborate on this issue? Shouldn't delayed allocation
> ES> decrement free space immediately, and only the actual block location
> ES> choice is delayed? Or is this due to potential extra metadata space
> ES> required as blocks are allocated?
>
> exactly. in this case, reservation has nothing to do with allocation
> or preallocation of real blocks. this is just a *per-sb counter* of
> blocks reserved for allocation at flush time. it includes all
> non-allocated-yet blocks and metadata needed to allocate them (bitmaps,
> group descriptors, extent tree blocks, etc). the previous version
> of mballoc has reservation, but it doesn't scale very well, being
> a single global counter protected by a spinlock. at least, in many
> regular loads I observed the reservation function in the top 30 of
> oprofile.

Thanks. XFS recently made similar scalability changes in this area, see the 2006
OLS paper, if you're interested.

-Eric

2006-10-27 14:39:34

by Alex Tomas

Subject: Re: [RFC] Ext3 online defrag


interested, definitely.

thanks, Alex

>>>>> Eric Sandeen (ES) writes:

ES> Thanks. XFS recently made similar scalability changes in this area,
ES> see the 2006 OLS paper, if you're interested.

ES> -Eric