2011-02-22 21:19:09

by Chris Mason

Subject: fiemap bugs on sparse files.

[ resend, sorry if this is a dup ]

Hi everyone,

We've had reports on btrfs that cp is giving us files full of zeros
instead of actually copying them. It was tracked down to a bug with
the btrfs fiemap implementation where it was returning holes for
delalloc ranges.

Newer versions of cp are trusting fiemap to tell it where the holes
are, which does seem like a pretty neat trick.
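
To make the failure mode concrete, here is a minimal Python sketch of a copy that trusts an extent list. It is a toy model, not what cp does internally: `sparse_copy` and the (offset, length) extent list are hypothetical stand-ins for a fiemap-driven copy, and the "file" is just bytes in memory.

```python
def sparse_copy(src, extents):
    """Copy only the byte ranges listed in `extents`.

    `extents` is a list of (offset, length) pairs standing in for what a
    fiemap query would report. Ranges that are not listed are treated as
    holes and left as zeros in the destination.
    """
    dst = bytearray(len(src))  # a fresh sparse file reads back as zeros
    for offset, length in extents:
        dst[offset:offset + length] = src[offset:offset + length]
    return bytes(dst)


data = b"A" * 4096 + b"B" * 4096  # 8 KiB of dirty data, not yet on disk

# Correct report: the delalloc range is listed as an extent.
good = sparse_copy(data, [(0, 8192)])

# Buggy report: the delalloc range is missing, so it looks like a hole.
bad = sparse_copy(data, [])
```

With the buggy report the destination has the right size but contains only zeros, which matches the symptom described above.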

I decided to give xfs and ext4 a shot with a few test cases too. xfs
passed all the ones btrfs was getting wrong, and ext4 got the basic
delalloc case right.

# mkfs.ext4 /dev/xxx
# mount /dev/xxx /mnt
# dd if=/dev/zero of=/mnt/foo bs=1M count=1
# fiemap-test foo
ext: 0 logical: [ 0.. 255] phys: 0.. 255 flags: 0x007 tot: 256
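
The `flags` column in this output can be decoded with a small Python sketch. The bit values are the ones defined in the kernel's fiemap.h header; only a subset is listed, and `decode_flags` is a hypothetical helper, not part of fiemap-test.

```python
# Subset of the extent flag bits from the kernel's fiemap.h header.
FIEMAP_EXTENT_FLAGS = {
    0x001: "FIEMAP_EXTENT_LAST",       # last extent in the file
    0x002: "FIEMAP_EXTENT_UNKNOWN",    # physical location not known yet
    0x004: "FIEMAP_EXTENT_DELALLOC",   # delayed allocation, no blocks assigned
    0x800: "FIEMAP_EXTENT_UNWRITTEN",  # allocated but not yet written
}


def decode_flags(flags):
    """Return the names of the known flag bits set in `flags`."""
    return [name for bit, name in sorted(FIEMAP_EXTENT_FLAGS.items())
            if flags & bit]


delalloc_case = decode_flags(0x007)
synced_case = decode_flags(0x001)
```

So 0x007 reads as LAST | UNKNOWN | DELALLOC, which is what a dirty, not-yet-allocated extent should look like, while 0x001 (seen after a sync) is a plain allocated extent that happens to be the last one.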

Hooray! But once we throw a hole in, things go bad:

# mkfs.ext4 /dev/xxx
# mount /dev/xxx /mnt
# dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=1
# fiemap-test foo
< no output >

We've got a delalloc extent after the hole and ext4 fiemap didn't find
it. If I run sync to kick the delalloc out:

# sync
# fiemap-test foo
ext: 0 logical: [ 256.. 511] phys: 34048.. 34303 flags: 0x001 tot: 256
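
The logical block numbers follow directly from the dd arguments. A short Python sketch of the arithmetic, assuming 4 KiB filesystem blocks (an assumption, though it is consistent with 1 MiB showing up as 256 blocks); `dd_block_range` is a hypothetical helper:

```python
BLOCK_SIZE = 4096  # assumption: 4 KiB filesystem blocks


def dd_block_range(bs, count, seek=0, block_size=BLOCK_SIZE):
    """Return (first, last) logical block touched by
    `dd bs=<bs> count=<count> seek=<seek>`."""
    start_byte = seek * bs
    end_byte = start_byte + count * bs   # exclusive end
    first = start_byte // block_size
    last = (end_byte - 1) // block_size
    return first, last


no_hole = dd_block_range(1 << 20, count=1)             # first test
with_hole = dd_block_range(1 << 20, count=1, seek=1)   # second test
```

Under that assumption the first test covers blocks 0..255 and the second covers 256..511, leaving blocks 0..255 as the hole.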

fiemap-test is sitting in my /usr/local/bin, and I have no idea how it
got there. It's full of pretty comments so I know it isn't mine, but
you can grab it here:

http://oss.oracle.com/~mason/fiemap-test.c

xfsqa has a fiemap program too.

-chris


2011-02-23 08:59:26

by Yongqiang Yang

Subject: Re: fiemap bugs on sparse files.

On Wed, Feb 23, 2011 at 5:18 AM, Chris Mason <[email protected]> wrote:
> [ resend, sorry if this is a dup ]
>
> Hi everyone,
>
> We've had reports on btrfs that cp is giving us files full of zeros
> instead of actually copying them. It was tracked down to a bug with
> the btrfs fiemap implementation where it was returning holes for
> delalloc ranges.
>
> Newer versions of cp are trusting fiemap to tell it where the holes
> are, which does seem like a pretty neat trick.
>
> I decided to give xfs and ext4 a shot with a few test cases too. xfs
> passed all the ones btrfs was getting wrong, and ext4 got the basic
> delalloc case right.
>
> # mkfs.ext4 /dev/xxx
> # mount /dev/xxx /mnt
> # dd if=/dev/zero of=/mnt/foo bs=1M count=1
> # fiemap-test foo
> ext: 0 logical: [ 0.. 255] phys: 0.. 255 flags: 0x007 tot: 256
>
> Hooray! But once we throw a hole in, things go bad:
>
> # mkfs.ext4 /dev/xxx
> # mount /dev/xxx /mnt
> # dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=1
> # fiemap-test foo
> < no output >
Actually, there is no extent in the extent tree yet, so
ext4_ext_walk_space() passes ext4_ext_fiemap_cb() a struct
ext4_ext_cache covering the whole requested length. But
ext4_ext_fiemap_cb() only looks up the page containing the start
block, via find_get_page(). If find_get_page() returns NULL,
ext4_ext_fiemap_cb() treats the whole requested range as empty and
returns that range.

In the first case, find_get_page() succeeds.

It seems that if find_get_page() fails, we should get the number of
pages in the page cache and correct the range to be returned.

Right?

If right I will send a patch.
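
The behaviour described above can be modelled in a few lines of Python. This is only a sketch of the described logic, not the kernel code: `classify_gap_first_page_only` is a hypothetical name, a set of page indexes stands in for the page cache, and one page is assumed to equal one filesystem block.

```python
def classify_gap_first_page_only(dirty_pages, start, length):
    """Classify a gap that has no extent in the extent tree.

    Only the page at `start` is probed; `length` is deliberately ignored,
    which is the flaw described above.
    """
    return "delalloc" if start in dirty_pages else "hole"


# Case 1: dd without seek dirties pages 0..255.
case1 = classify_gap_first_page_only(set(range(0, 256)), start=0, length=512)

# Case 2: dd with seek=1 dirties pages 256..511; page 0 is not cached.
case2 = classify_gap_first_page_only(set(range(256, 512)), start=0, length=512)
```

Case 1 is reported as delalloc, while in case 2 the dirty pages at 256..511 are never examined and the whole range is taken to be a hole.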

>
> We've got a delalloc extent after the hole and ext4 fiemap didn't find
> it. If I run sync to kick the delalloc out:
>
> # sync
> # fiemap-test foo
> ext: 0 logical: [ 256.. 511] phys: 34048.. 34303 flags: 0x001 tot: 256

Now there is at least one extent in the extent tree, and
ext4_ext_walk_space() will adjust the request range passed to
ext4_ext_fiemap_cb().
>
> fiemap-test is sitting in my /usr/local/bin, and I have no idea how it
> got there. It's full of pretty comments so I know it isn't mine, but
> you can grab it here:
>
> http://oss.oracle.com/~mason/fiemap-test.c
>
> xfsqa has a fiemap program too.
>
> -chris
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>



--
Best Wishes
Yongqiang Yang

2011-02-23 09:34:43

by Yongqiang Yang

Subject: Re: fiemap bugs on sparse files.

On Wed, Feb 23, 2011 at 4:59 PM, Yongqiang Yang <[email protected]> wrote:
> On Wed, Feb 23, 2011 at 5:18 AM, Chris Mason <[email protected]> wrote:
>> [ resend, sorry if this is a dup ]
>>
>> Hi everyone,
>>
>> We've had reports on btrfs that cp is giving us files full of zeros
>> instead of actually copying them. It was tracked down to a bug with
>> the btrfs fiemap implementation where it was returning holes for
>> delalloc ranges.
>>
>> Newer versions of cp are trusting fiemap to tell it where the holes
>> are, which does seem like a pretty neat trick.
>>
>> I decided to give xfs and ext4 a shot with a few test cases too. xfs
>> passed all the ones btrfs was getting wrong, and ext4 got the basic
>> delalloc case right.
>>
>> # mkfs.ext4 /dev/xxx
>> # mount /dev/xxx /mnt
>> # dd if=/dev/zero of=/mnt/foo bs=1M count=1
>> # fiemap-test foo
>> ext: 0 logical: [ 0.. 255] phys: 0.. 255 flags: 0x007 tot: 256
>>
>> Hooray! But once we throw a hole in, things go bad:
>>
>> # mkfs.ext4 /dev/xxx
>> # mount /dev/xxx /mnt
>> # dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=1
>> # fiemap-test foo
>> < no output >
> Actually, there is no extent in the extent tree yet, so
> ext4_ext_walk_space() passes ext4_ext_fiemap_cb() a struct
> ext4_ext_cache covering the whole requested length. But
> ext4_ext_fiemap_cb() only looks up the page containing the start
> block, via find_get_page(). If find_get_page() returns NULL,
> ext4_ext_fiemap_cb() treats the whole requested range as empty and
> returns that range.
>
> In the first case, find_get_page() succeeds.
>
> It seems that if find_get_page() fails, we should get the number of
> pages in the page cache and correct the range to be returned.
We can call find_get_pages() with nr_pages=1 instead, and adjust the
range using page->index if the page returned is not the one containing
the start block.
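
A rough Python model of this idea, for illustration only. `first_cached_page` is a hypothetical stand-in for a find_get_pages()-style lookup limited to one result, a set of page indexes stands in for the page cache, and treating every cached page as delalloc is a simplification (the real code would still have to check the buffers on the page).

```python
def first_cached_page(cached_pages, start, end):
    """Return the lowest cached page index in [start, end], or None."""
    candidates = [p for p in cached_pages if start <= p <= end]
    return min(candidates) if candidates else None


def leading_piece(cached_pages, start, end):
    """Return (kind, first, last) for the first piece of the gap."""
    page = first_cached_page(cached_pages, start, end)
    if page is None:
        return ("hole", start, end)       # nothing cached: all hole
    if page > start:
        return ("hole", start, page - 1)  # trim the hole at page index
    last = start
    while last < end and (last + 1) in cached_pages:
        last += 1                         # extend over contiguous pages
    return ("delalloc", start, last)


cached = set(range(256, 512))             # the seek=1 test case
first = leading_piece(cached, 0, 511)
second = leading_piece(cached, 256, 511)
```

In this model the hole is cut short at page 255, and a second call starting at page 256 finds the delalloc range instead of skipping it.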


>
> Right?
>
> If right I will send a patch.
>
>>
>> We've got a delalloc extent after the hole and ext4 fiemap didn't find
>> it. If I run sync to kick the delalloc out:
>>
>> # sync
>> # fiemap-test foo
>> ext: 0 logical: [ 256.. 511] phys: 34048.. 34303 flags: 0x001 tot: 256
>
> Now there is at least one extent in the extent tree, and
> ext4_ext_walk_space() will adjust the request range passed to
> ext4_ext_fiemap_cb().
>>
>> fiemap-test is sitting in my /usr/local/bin, and I have no idea how it
>> got there. It's full of pretty comments so I know it isn't mine, but
>> you can grab it here:
>>
>> http://oss.oracle.com/~mason/fiemap-test.c
>>
>> xfsqa has a fiemap program too.
>>
>> -chris
>
>
>
> --
> Best Wishes
> Yongqiang Yang
>



--
Best Wishes
Yongqiang Yang

2011-02-23 15:34:42

by Eric Sandeen

Subject: Re: fiemap bugs on sparse files.

On 2/23/11 3:34 AM, Yongqiang Yang wrote:
> On Wed, Feb 23, 2011 at 4:59 PM, Yongqiang Yang <[email protected]> wrote:
>> On Wed, Feb 23, 2011 at 5:18 AM, Chris Mason <[email protected]> wrote:
>>> [ resend, sorry if this is a dup ]
>>>
>>> Hi everyone,
>>>
>>> We've had reports on btrfs that cp is giving us files full of zeros
>>> instead of actually copying them. It was tracked down to a bug with
>>> the btrfs fiemap implementation where it was returning holes for
>>> delalloc ranges.
>>>
>>> Newer versions of cp are trusting fiemap to tell it where the holes
>>> are, which does seem like a pretty neat trick.
>>>
>>> I decided to give xfs and ext4 a shot with a few test cases too. xfs
>>> passed all the ones btrfs was getting wrong, and ext4 got the basic
>>> delalloc case right.
>>>
>>> # mkfs.ext4 /dev/xxx
>>> # mount /dev/xxx /mnt
>>> # dd if=/dev/zero of=/mnt/foo bs=1M count=1
>>> # fiemap-test foo
>>> ext: 0 logical: [ 0.. 255] phys: 0.. 255 flags: 0x007 tot: 256
>>>
>>> Hooray! But once we throw a hole in, things go bad:
>>>
>>> # mkfs.ext4 /dev/xxx
>>> # mount /dev/xxx /mnt
>>> # dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=1
>>> # fiemap-test foo
>>> < no output >
>> Actually, there is no extent in the extent tree yet, so
>> ext4_ext_walk_space() passes ext4_ext_fiemap_cb() a struct
>> ext4_ext_cache covering the whole requested length. But
>> ext4_ext_fiemap_cb() only looks up the page containing the start
>> block, via find_get_page(). If find_get_page() returns NULL,
>> ext4_ext_fiemap_cb() treats the whole requested range as empty and
>> returns that range.
>>
>> In the first case, find_get_page() succeeds.
>>
>> It seems that if find_get_page() fails, we should get the number of
>> pages in the page cache and correct the range to be returned.
> We can call find_get_pages() with nr_pages=1 instead, and adjust the
> range using page->index if the page returned is not the one containing
> the start block.
>
>
>>
>> Right?
>>
>> If right I will send a patch.

Your analysis is correct; the way it's working now is pretty broken (my fault, I'm afraid).

Right now we only look at the first page in a "gap" to see if it's delalloc; we need to search through any dirty pages in the gap, since the first page may be a hole, with delalloc ranges coming later.

We need some variant of page cache search, yes.
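
As a sketch of what searching the whole gap could look like, here is a small Python model that walks every page in the gap and collects the dirty runs. It is illustrative only: `delalloc_runs` is a hypothetical name, a set of page indexes stands in for the dirty pages in the page cache, and a real implementation would use a page cache lookup rather than probing each index.

```python
def delalloc_runs(dirty_pages, start, end):
    """Return a list of (first, last) runs of dirty pages within
    [start, end], in ascending order."""
    runs = []
    run_start = None
    for page in range(start, end + 1):
        if page in dirty_pages:
            if run_start is None:
                run_start = page          # a new run begins here
        elif run_start is not None:
            runs.append((run_start, page - 1))
            run_start = None
    if run_start is not None:
        runs.append((run_start, end))     # run extends to the end of the gap
    return runs


# A gap whose first page is a hole, with delalloc ranges coming later.
dirty = set(range(256, 512)) | {600}
runs = delalloc_runs(dirty, 0, 1023)
```

Here the gap starts with a hole, yet both later delalloc ranges are found, which is the case the first-page check misses.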

-Eric