2010-11-09 04:14:26

by Amir Goldstein

[permalink] [raw]
Subject: [RFC] Self healing extent map OR opportunistic de-fragmentation

Hi Ted,

I would like to propose a simple idea for automatically de-fragmenting a file.
How difficult the implementation would be, I do not know yet,
but since you talked about modifying the delayed allocation path,
I thought the idea was worth mentioning.

The idea is to use every page write as an opportunity to reduce file
fragmentation (no extra I/O charges included, no donor inodes, etc.).

There are many strategies to do this, but the simplest one would be
something like this:

Extents A and B are logically adjacent, but not physically adjacent.
When re-writing the first block of extent B, try to extend extent A,
and if that fails, fall back to writing to the first block of extent B
in place.
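As a toy illustration of the first strategy (userspace Python, not ext4 code; the extent list and free-block set are made-up structures for the example, not ext4 internals):

```python
# Strategy 1 sketch: on rewrite of the first block of an extent, try to
# grow the physically preceding extent by one block instead of writing
# in place.  Purely illustrative; block numbers are arbitrary.

def try_extend_prev(extents, free, i):
    """extents: list of [logical_start, physical_start, length];
    free: set of free physical block numbers.
    Returns True if the rewritten block was relocated next to extents[i-1],
    False if the caller should fall back to an in-place write."""
    prev, cur = extents[i - 1], extents[i]
    candidate = prev[1] + prev[2]           # block right after prev's end
    if candidate not in free:
        return False                        # fall back: rewrite in place
    free.discard(candidate)                 # claim the contiguous block
    free.add(cur[1])                        # old first block is released
    prev[2] += 1                            # prev grows by one block
    cur[0] += 1; cur[1] += 1; cur[2] -= 1   # cur shrinks from the front
    if cur[2] == 0:
        extents.pop(i)                      # fully absorbed into prev
    return True
```

In this model, repeated rewrites of extent B's leading block gradually absorb B into A, so the extent map self-heals as a side effect of ordinary write traffic.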

Another simple strategy is:
When about to re-write all blocks of extents A and B, try to allocate
a single new extent for writing all the blocks, and if that fails,
fall back to writing to A and B in place.
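The second strategy can be sketched the same way (again a userspace toy model with invented structures, not the real allocator; a real implementation would go through mballoc rather than scan free blocks linearly):

```python
# Strategy 2 sketch: when fully rewriting two logically adjacent extents,
# look for one contiguous free run big enough for both; on success,
# replace the pair with a single extent and release the old blocks.

def try_merge_rewrite(extents, free, i):
    """extents: list of [logical_start, physical_start, length];
    free: set of free physical block numbers.
    Returns True if extents[i] and extents[i+1] were replaced by one
    new extent, False if the caller should rewrite both in place."""
    a, b = extents[i], extents[i + 1]
    need = a[2] + b[2]
    for start in sorted(free):              # naive free-run search
        if all(start + k in free for k in range(need)):
            for k in range(need):           # claim the new run
                free.discard(start + k)
            for blk in range(a[1], a[1] + a[2]):
                free.add(blk)               # release A's old blocks
            for blk in range(b[1], b[1] + b[2]):
                free.add(blk)               # release B's old blocks
            extents[i] = [a[0], start, need]
            extents.pop(i + 1)
            return True
    return False                            # no run found: fall back
```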

The biggest change, of course, is the notion that an already mapped
block may still need to be allocated (not a new notion to me) and that
existing blocks may need to be freed on flush_alloc_da_page().
As I told you regarding snapshots, in data!=ordered mode, that could
lead to data corruption in the existing file (the uninitialized extent
flag is not sufficient information when changing a block's mapping).

However, if you do intend to follow through with your plan for
flushing metadata in end_io, I believe you will then already have
enough information in memory to achieve opportunistic
de-fragmentation at the same price.

The use case for this, besides healing the fragmentation caused by
snapshot move-on-rewrite, is a highly fragmented ext2/3 fs which was
mounted as ext4.
Old ext2/3 files are slowly being deleted while new (still fragmented)
extent-mapped files are being created.
This vicious cycle cannot end until there is enough contiguous free
space for writing new files, which may never happen. Online
de-fragmentation will not help in this case either.
With opportunistic de-fragmentation, if the extent-mapped files are
being re-written, the health of the file system will constantly
improve over time.
BTW, is this use case relevant for upgraded Google chunk servers?

Amir.


2010-11-09 08:58:23

by Andreas Dilger

[permalink] [raw]
Subject: Re: [RFC] Self healing extent map OR opportunistic de-fragmentation

On 2010-11-08, at 21:14, Amir Goldstein wrote:
> I would like to propose a simple idea how to automatically de-fragment a file.

[snip]

> The use case for this, besides healing the fragmentation caused by
> snapshot move-on-rewrite, is a highly fragmented ext2/3 fs which was mounted as ext4. Old ext2/3 files are slowly being deleted while new (still fragmented) extent-mapped files are being created.
> This vicious cycle cannot end until there is enough contiguous free
> space for writing new files, which may never happen.

This will only happen when the free space is _very_ low. Normally, in a situation like this, mballoc will allocate the largest contiguous chunks of free space, reducing fragmentation as new files are written, and allocations to highly fragmented block groups will be avoided until the free chunks in those groups have grown larger.

> Online de-fragmentation will not help in this case either.
> With opportunistic de-fragmentation, if the extent-mapped files are
> being re-written, the health of the file system will constantly improve over time. BTW, is this use case relevant for upgraded Google chunk servers?

While this is true in theory, the problem is that in most cases files are not overwritten in place. Commonly, when files are "rewritten" they are truncated and new blocks allocated, or a new file is written and renamed in place of the old file. Only in rare cases, like databases, are files rewritten in-place.

Cheers, Andreas


2010-11-09 09:35:08

by Amir Goldstein

[permalink] [raw]
Subject: Re: [RFC] Self healing extent map OR opportunistic de-fragmentation

On Tue, Nov 9, 2010 at 10:58 AM, Andreas Dilger
<[email protected]> wrote:
> On 2010-11-08, at 21:14, Amir Goldstein wrote:
>> I would like to propose a simple idea how to automatically de-fragment a file.
>
> [snip]
>
>> The use case for this, besides healing the fragmentation caused by
>> snapshot move-on-rewrite, is a highly fragmented ext2/3 fs which was mounted as ext4.
>> Old ext2/3 files are slowly being deleted while new (still fragmented) extent-mapped files are being created.
>> This vicious cycle cannot end until there is enough contiguous free
>> space for writing new files, which may never happen.
>
> This will only happen when the free space is _very_ low. Normally, in a situation like this, mballoc will allocate the
> largest contiguous chunks of free space, reducing fragmentation as new files are written, and allocations to
> highly fragmented block groups will be avoided until the free chunks in those groups have grown larger.
>
>> Online de-fragmentation will not help in this case either.
>> With opportunistic de-fragmentation, if the extent-mapped files are
>> being re-written, the health of the file system will constantly improve over time.
>> BTW, is this use case relevant for upgraded Google chunk servers?
>
> While this is true in theory, the problem is that in most cases files are not overwritten in place.
> Commonly, when files are "rewritten" they are truncated and new blocks allocated, or a new file is written and renamed in place of the old file.
> Only in rare cases, like databases, are files rewritten in-place.
>

Oh, I know that, which is why it is a bit annoying to invest a lot of
effort to solve the fragmentation caused by snapshot move-on-rewrite.
In those rare use cases, the file may end up like Swiss cheese after a
while. However, I realized that the access pattern of those
applications is not all that random. A rewrite to offset X has a high
likelihood of repeating more than once (an update to a DB record or a
write of a metadata block in a virtual disk).
I figured I could use the opportunity of the subsequent rewrites to
restore the file blocks to their original location, without paying a
performance trade-off.

The question is, are there other use cases out there that can benefit
from opportunistic de-fragmentation?

Amir.