2011-10-01 14:01:35

by Dave Young

[permalink] [raw]
Subject: [BUG] copy file result with zero

Hi,

Weird problem: when I build an app from source with
make; make install
and run the command, I get "cannot execute binary file".

hexdump shows the installed binary is full of zeros.

Is it related to the ext4 FIEMAP problem described below?
http://lwn.net/Articles/429349/

I finally managed to find a way to reproduce this:
just cp an ELF binary A to file B, then cp B to file C, and you will get:
A == B != C

i.e.
cp /bin/ls ls1
cp ls1 ls2

ls2 will be filled with zeros
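
The symptom described above (A == B != C, with C zero-filled) can be checked without reading hexdump output by eye. The following is a minimal sketch; the helper names `is_all_zero` and `same_content` are made up for illustration and are not part of any tool mentioned in this thread.

```python
import filecmp


def is_all_zero(path, chunk_size=1 << 20):
    """Return True if every byte in the file is 0x00 (an empty file also counts)."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return True
            # bytes.count(0) counts zero bytes; any other byte means real data.
            if chunk.count(0) != len(chunk):
                return False


def same_content(a, b):
    """Byte-for-byte comparison; shallow=False forces the contents to be read."""
    return filecmp.cmp(a, b, shallow=False)
```

With the reproduction above, one would expect `same_content("/bin/ls", "ls1")` to hold while `is_all_zero("ls2")` reports the corrupted copy.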

Below is a strace log of install, kernel version is 3.1.0-rc6+
geteuid() = 0
umask(0) = 022
stat("/tmp/vpnc", 0x7fff85363710) = -1 ENOENT (No such file or directory)
stat("vpnc", {st_mode=S_IFREG|0755, st_size=368662, ...}) = 0
lstat("/tmp/vpnc", 0x7fff85363250) = -1 ENOENT (No such file or directory)
open("vpnc", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0755, st_size=368662, ...}) = 0
open("/tmp/vpnc", O_WRONLY|O_CREAT|O_EXCL, 0755) = 4
fstat(4, {st_mode=S_IFREG|0755, st_size=0, ...}) = 0
uname({sys="Linux", node="darkstar", ...}) = 0
ioctl(3, FS_IOC_FIEMAP, 0x7fff85361f60) = 0
ftruncate(4, 368662) = 0
fsetxattr(4, "system.posix_acl_access",
"\x02\x00\x00\x00\x01\x00\x06\x00\xff\xff\xff\xff\x04\x00\x00\x00\xff\xff\xff\xff
\x00\x00\x00\xff\xff\xff\xff", 28, 0) = 0
close(4) = 0
close(3) = 0
chmod("/tmp/vpnc", 0755) = 0
close(0) = 0
close(1) = 0
close(2) = 0
exit_group(0) = ?
--
Regards
Dave


2011-10-01 14:39:03

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [BUG] copy file result with zero

On Sat, Oct 01, 2011 at 10:01:35PM +0800, Dave Young wrote:
> Hi,
>
> Weird problem, when I build app from source,
> make; make install
> run the command, but got "cannot execute binary file"
>
> hexdump shows the installed binary is full of zero
>
> Is it related to ext4 fiemap problem described below?
> http://lwn.net/Articles/429349/

There is general agreement that /bin/cp should not have been relying
on FIEMAP, and I believe the more recent versions of /bin/cp have
removed that code by default pending implementation of
SEEK_HOLE/SEEK_DATA. That being said, ext4 had a workaround to its
FIEMAP implementation that landed in 2.6.39, and you're using
3.1.0-rc6.
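
For readers unfamiliar with SEEK_HOLE/SEEK_DATA, here is a rough sketch of a hole-aware copy built on them. It is not how coreutils implements cp; the function names are invented for illustration, and it assumes a Linux Python build where `os.SEEK_DATA` and `os.SEEK_HOLE` are available. On filesystems without real hole support the kernel treats the whole file as data, so the loop degrades to a plain copy.

```python
import errno
import os


def write_all(fd, data):
    """Write the whole buffer, retrying on short writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def sparse_copy(src_path, dst_path, bufsize=1 << 16):
    """Copy src to dst, skipping regions that lseek() reports as holes."""
    sfd = os.open(src_path, os.O_RDONLY)
    try:
        dfd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            size = os.fstat(sfd).st_size
            offset = 0
            while offset < size:
                try:
                    data_start = os.lseek(sfd, offset, os.SEEK_DATA)
                except OSError as exc:
                    if exc.errno == errno.ENXIO:
                        break  # only a trailing hole remains
                    raise
                data_end = os.lseek(sfd, data_start, os.SEEK_HOLE)
                os.lseek(sfd, data_start, os.SEEK_SET)
                os.lseek(dfd, data_start, os.SEEK_SET)
                remaining = data_end - data_start
                while remaining > 0:
                    chunk = os.read(sfd, min(bufsize, remaining))
                    if not chunk:
                        break  # source shrank while copying
                    write_all(dfd, chunk)
                    remaining -= len(chunk)
                offset = data_end
            # Extend the destination so a trailing hole keeps the file size.
            os.ftruncate(dfd, size)
        finally:
            os.close(dfd)
    finally:
        os.close(sfd)
```

Unlike an extent map, this interface only asks the kernel where data can be read, which is the question a copy tool actually needs answered.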

> I finally managed to find the way to reproduce this:
> just cp a elf binary A to file B, then cp B to file C, then you will get:
> A == B != C
>
> ie.
> cp /bin/ls ls1
> cp ls1 ls2
>
> ls2 will be filled with zero

If you add a "sync" between the two copies, does that work around the
problem? I bet it will...
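
As an illustration of the ordering this workaround relies on, the sketch below flushes a freshly written copy before anything else reads its extent map. Note the substitution: the suggestion above is a system-wide "sync", while this sketch uses `os.fsync()` on the one destination file, on the assumption that flushing that file is enough to get its delayed blocks allocated. The function name is hypothetical, and Python's own copy routine does not use FIEMAP; the point is only the flush step.

```python
import os
import shutil


def copy_and_flush(src, dst):
    """Copy src to dst, then fsync dst so its data is written out before reuse."""
    shutil.copyfile(src, dst)
    fd = os.open(dst, os.O_RDONLY)
    try:
        # Forces dirty pages for this file to disk; with delayed allocation
        # this is when the blocks actually get allocated.
        os.fsync(fd)
    finally:
        os.close(fd)
```

`os.sync()` would be the closer equivalent of running "sync" between the two cp commands, at the cost of flushing everything.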

My suggestion is to upgrade to a newer version of coreutils that
doesn't try to use FIEMAP.

- Ted

2011-10-01 23:37:33

by Dave Young

[permalink] [raw]
Subject: Re: [BUG] copy file result with zero

On Sat, Oct 1, 2011 at 10:39 PM, Ted Ts'o <[email protected]> wrote:
> On Sat, Oct 01, 2011 at 10:01:35PM +0800, Dave Young wrote:
>> Hi,
>>
>> Weird problem, when I build app from source,
>> make; make install
>> run the command, but got "cannot execute binary file"
>>
>> hexdump shows the installed binary is full of zero
>>
>> Is it related to ext4 fiemap problem described below?
>> http://lwn.net/Articles/429349/
>
> There is general agreement that /bin/cp should not have been relying
> on FIEMAP, and I believe the more recent versions of /bin/cp have
> removed that code by default pending implementation of
> SEEK_HOLE/SEEK_DATA.  That being said, ext4 had a workaround to its
> FIEMAP implementation that landed in 2.6.39, and you're using
> 3.1.0-rc6.

Do you mean it should work in 3.1.0-rc6 even with a cp that depends on FIEMAP?

>
>> I finally managed to find the way to reproduce this:
>> just cp a elf binary A  to file B, then cp B to file C,  then you will get:
>> A == B != C
>>
>> ie.
>> cp /bin/ls ls1
>> cp ls1 ls2
>>
>> ls2 will be filled with zero
>
> If you add a "sync" between the two copies, does that work around the
> problem?  I bet it will...

Yes, it works

>
> My suggestion is to upgrade to a newer version of coreutils that
> doesn't try to use FIEMAP.

Thanks, will try

>
>                                        - Ted
>



--
Regards
Dave

2011-10-02 06:41:17

by Jeff Liu

[permalink] [raw]
Subject: Re: [BUG] copy file result with zero



> On Sat, Oct 1, 2011 at 10:39 PM, Ted Ts'o <[email protected]> wrote:
>> On Sat, Oct 01, 2011 at 10:01:35PM +0800, Dave Young wrote:
>>> Hi,
>>>
>>> Weird problem, when I build app from source,
>>> make; make install
>>> run the command, but got "cannot execute binary file"
>>>
>>> hexdump shows the installed binary is full of zero
>>>
>>> Is it related to ext4 fiemap problem described below?
>>> http://lwn.net/Articles/429349/
>>
>> There is general agreement that /bin/cp should not have been relying
>> on FIEMAP, and I believe the more recent versions of /bin/cp have
>> removed that code by default pending implementation of
>> SEEK_HOLE/SEEK_DATA. That being said, ext4 had a workaround to its
>> FIEMAP implementation that landed in 2.6.39, and you're using
>> 3.1.0-rc6.

Actually, upstream cp(1) uses FIEMAP only if the source file is sparse; otherwise it does a normal, i.e. block-based, copy.


Thanks,
-Jeff


2011-10-02 06:59:40

by Andreas Dilger

[permalink] [raw]
Subject: Re: [BUG] copy file result with zero

On 2011-10-01, at 11:41 PM, Jeff liu wrote:
>> On Sat, Oct 1, 2011 at 10:39 PM, Ted Ts'o <[email protected]> wrote:
>>> On Sat, Oct 01, 2011 at 10:01:35PM +0800, Dave Young wrote:
>>>> Hi,
>>>>
>>>> Weird problem, when I build app from source,
>>>> make; make install
>>>> run the command, but got "cannot execute binary file"
>>>>
>>>> hexdump shows the installed binary is full of zero
>>>>
>>>> Is it related to ext4 fiemap problem described below?
>>>> http://lwn.net/Articles/429349/
>>>
>>> There is general agreement that /bin/cp should not have been relying
>>> on FIEMAP, and I believe the more recent versions of /bin/cp have
>>> removed that code by default pending implementation of
>>> SEEK_HOLE/SEEK_DATA. That being said, ext4 had a workaround to its
>>> FIEMAP implementation that landed in 2.6.39, and you're using
>>> 3.1.0-rc6.
>
> Actually, upstream cp(1) using FIEMAP only if the source file is sparse, or else, it will do normal copy, i.e, block based.

My understanding is that cp uses the blocks count to determine whether the file is sparse or not. In the case of delayed allocation (where blocks are not yet allocated, if they are not reflected in the i_blocks count) it might mistakenly think that the file is sparse.

Given the danger of this bug, it is important to ensure ext4 returns DELALLOC extents for pages in the page cache. I think Yongqiang Yang just submitted a patch series to do this for ext4, so it would be important to verify it fixes this problem.



Cheers, Andreas






2011-10-02 07:02:22

by Andreas Dilger

[permalink] [raw]
Subject: Re: [BUG] copy file result with zero

On 2011-10-01, at 11:41 PM, Jeff liu wrote:
>> On Sat, Oct 1, 2011 at 10:39 PM, Ted Ts'o <[email protected]> wrote:
>>> On Sat, Oct 01, 2011 at 10:01:35PM +0800, Dave Young wrote:
>>>> Weird problem, when I build app from source,
>>>> make; make install
>>>> run the command, but got "cannot execute binary file"
>>>>
>>>> hexdump shows the installed binary is full of zero
>>>>
>>>> Is it related to ext4 fiemap problem described below?
>>>> http://lwn.net/Articles/429349/
>>>
>>> There is general agreement that /bin/cp should not have been relying
>>> on FIEMAP, and I believe the more recent versions of /bin/cp have
>>> removed that code by default pending implementation of
>>> SEEK_HOLE/SEEK_DATA. That being said, ext4 had a workaround to its
>>> FIEMAP implementation that landed in 2.6.39, and you're using
>>> 3.1.0-rc6.
>
> Actually, upstream cp(1) using FIEMAP only if the source file is sparse, or else, it will do normal copy, i.e, block based.

Are there any distros that are shipping with a version of cp that depends on FIEMAP? That would dramatically increase the severity of this problem, since orders of magnitude more users will hit the problem.

Dave, what distro were you seeing this problem on, and had you installed/upgraded your coreutils and/or kernel yourself?



Cheers, Andreas






2011-10-02 07:15:30

by Jeff Liu

[permalink] [raw]
Subject: Re: [BUG] copy file result with zero


On 2011-10-2, at 3:59 PM, Andreas Dilger wrote:

> On 2011-10-01, at 11:41 PM, Jeff liu wrote:
>>> On Sat, Oct 1, 2011 at 10:39 PM, Ted Ts'o <[email protected]> wrote:
>>>> On Sat, Oct 01, 2011 at 10:01:35PM +0800, Dave Young wrote:
>>>>> Hi,
>>>>>
>>>>> Weird problem, when I build app from source,
>>>>> make; make install
>>>>> run the command, but got "cannot execute binary file"
>>>>>
>>>>> hexdump shows the installed binary is full of zero
>>>>>
>>>>> Is it related to ext4 fiemap problem described below?
>>>>> http://lwn.net/Articles/429349/
>>>>
>>>> There is general agreement that /bin/cp should not have been relying
>>>> on FIEMAP, and I believe the more recent versions of /bin/cp have
>>>> removed that code by default pending implementation of
>>>> SEEK_HOLE/SEEK_DATA. That being said, ext4 had a workaround to its
>>>> FIEMAP implementation that landed in 2.6.39, and you're using
>>>> 3.1.0-rc6.
>>
>> Actually, upstream cp(1) using FIEMAP only if the source file is sparse, or else, it will do normal copy, i.e, block based.
>
> My understanding is that cp uses the blocks count to determine whether the file is sparse or not.
Yes, it uses the block count to determine that.

> In the case of delayed allocation (where blocks are not yet allocated, if they are not reflected in the i_blocks count) it might mistakenly think that the file is sparse.
Thanks for pointing this out, I missed this case.
So for Dave's issue, even if he updates to upstream coreutils, the problem will still occur occasionally with delayed allocation unless sync is run in between.

>
> Given the danger of this bug, it is important to ensure ext4 returns DELALLOC extents for pages in the page cache. I think Yongqiang Yang just submitted a patch series to do this for ext4, so it would be important to verify it fixes this problem.


Thanks,
-Jeff

2011-10-02 08:43:20

by Dave Young

[permalink] [raw]
Subject: Re: [BUG] copy file result with zero

On Sun, Oct 2, 2011 at 4:02 PM, Andreas Dilger <[email protected]> wrote:
> On 2011-10-01, at 11:41 PM, Jeff liu wrote:
>>> On Sat, Oct 1, 2011 at 10:39 PM, Ted Ts'o <[email protected]> wrote:
>>>> On Sat, Oct 01, 2011 at 10:01:35PM +0800, Dave Young wrote:
>>>>> Weird problem, when I build app from source,
>>>>> make; make install
>>>>> run the command, but got "cannot execute binary file"
>>>>>
>>>>> hexdump shows the installed binary is full of zero
>>>>>
>>>>> Is it related to ext4 fiemap problem described below?
>>>>> http://lwn.net/Articles/429349/
>>>>
>>>> There is general agreement that /bin/cp should not have been relying
>>>> on FIEMAP, and I believe the more recent versions of /bin/cp have
>>>> removed that code by default pending implementation of
>>>> SEEK_HOLE/SEEK_DATA.  That being said, ext4 had a workaround to its
>>>> FIEMAP implementation that landed in 2.6.39, and you're using
>>>> 3.1.0-rc6.
>>
>> Actually, upstream cp(1) using FIEMAP only if the source file is sparse,  or else, it will do normal copy, i.e, block based.
>
> Are there any distros that are shipping with a version of cp that depends on FIEMAP?  That would dramatically increase the severity of this problem, since orders of magnitude more users will hit the problem.

I'm not sure whether it depends on FIEMAP, but I don't think my version is that old.

>
> Dave, what distro were you seeing this problem on, and had you installed/upgraded your coreutils and/or kernel yourself?

Slackware 13.37, coreutils 8.11
The kernel is always built by me from Linus's git tree.




--
Regards
Dave

2011-10-02 08:46:30

by Dave Young

[permalink] [raw]
Subject: Re: [BUG] copy file result with zero

2011/10/2 Jeff liu <[email protected]>:
>
> On 2011-10-2, at 3:59 PM, Andreas Dilger wrote:
>
>> On 2011-10-01, at 11:41 PM, Jeff liu wrote:
>>>> On Sat, Oct 1, 2011 at 10:39 PM, Ted Ts'o <[email protected]> wrote:
>>>>> On Sat, Oct 01, 2011 at 10:01:35PM +0800, Dave Young wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Weird problem, when I build app from source,
>>>>>> make; make install
>>>>>> run the command, but got "cannot execute binary file"
>>>>>>
>>>>>> hexdump shows the installed binary is full of zero
>>>>>>
>>>>>> Is it related to ext4 fiemap problem described below?
>>>>>> http://lwn.net/Articles/429349/
>>>>>
>>>>> There is general agreement that /bin/cp should not have been relying
>>>>> on FIEMAP, and I believe the more recent versions of /bin/cp have
>>>>> removed that code by default pending implementation of
>>>>> SEEK_HOLE/SEEK_DATA.  That being said, ext4 had a workaround to its
>>>>> FIEMAP implementation that landed in 2.6.39, and you're using
>>>>> 3.1.0-rc6.
>>>
>>> Actually, upstream cp(1) using FIEMAP only if the source file is sparse,  or else, it will do normal copy, i.e, block based.
>>
>> My understanding is that cp uses the blocks count to determine whether the file is sparse or not.
> Yes, it based on blocks count to determine that.
>
>> In the case of delayed allocation (where blocks are not yet allocated, if they are not reflected in the i_blocks count) it might mistakenly think that the file is sparse.

I think this might be my case

> Thanks for pointing this out, I missed this case.
> So for Dave's issue, even if he updated to the upstream Coreutils, this issue will still exists occasionally for delayed allocation, if not  run sync in between times.

Not just occasionally: I have been able to reproduce it easily lately.




--
Regards
Dave

2011-10-02 11:54:44

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [BUG] copy file result with zero

On Sun, Oct 02, 2011 at 12:59:22AM -0700, Andreas Dilger wrote:
> My understanding is that cp uses the blocks count to determine whether the file is sparse or not. In the case of delayed allocation (where blocks are not yet allocated, if they are not reflected in the i_blocks count) it might mistakenly think that the file is sparse.

Ext4 fortunately is smart enough to add the delalloc blocks to st_blocks
for stat, just like all other filesystems implementing delayed
allocation.


2011-10-03 09:26:39

by Yongqiang Yang

[permalink] [raw]
Subject: Re: [BUG] copy file result with zero

On Sun, Oct 2, 2011 at 3:59 PM, Andreas Dilger <[email protected]> wrote:
> On 2011-10-01, at 11:41 PM, Jeff liu wrote:
>>> On Sat, Oct 1, 2011 at 10:39 PM, Ted Ts'o <[email protected]> wrote:
>>>> On Sat, Oct 01, 2011 at 10:01:35PM +0800, Dave Young wrote:
>>>>> Hi,
>>>>>
>>>>> Weird problem, when I build app from source,
>>>>> make; make install
>>>>> run the command, but got "cannot execute binary file"
>>>>>
>>>>> hexdump shows the installed binary is full of zero
>>>>>
>>>>> Is it related to ext4 fiemap problem described below?
>>>>> http://lwn.net/Articles/429349/
>>>>
>>>> There is general agreement that /bin/cp should not have been relying
>>>> on FIEMAP, and I believe the more recent versions of /bin/cp have
>>>> removed that code by default pending implementation of
>>>> SEEK_HOLE/SEEK_DATA. That being said, ext4 had a workaround to its
>>>> FIEMAP implementation that landed in 2.6.39, and you're using
>>>> 3.1.0-rc6.
>>
>> Actually, upstream cp(1) using FIEMAP only if the source file is sparse, or else, it will do normal copy, i.e, block based.
>
> My understanding is that cp uses the blocks count to determine whether the file is sparse or not. In the case of delayed allocation (where blocks are not yet allocated, if they are not reflected in the i_blocks count) it might mistakenly think that the file is sparse.
>
> Given the danger of this bug, it is important to ensure ext4 returns DELALLOC extents for pages in the page cache. I think Yongqiang Yang just submitted a patch series to do this for ext4, so it would be important to verify it fixes this problem.
It seems that the patch "ext4: in fiemap use FIEMAP_EXTENT_LAST flag for
last extent" (http://www.spinics.net/lists/linux-ext4/msg25698.html)
that Lukas submitted makes FIEMAP ignore delayed extents beyond the
last allocated block. E.g. for the layout AAAHHHHDDDD
(A - allocated, H - hole, D - delayed alloc), the ending delayed
extent is ignored.
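
The AAAHHHHDDDD example can be played through with a toy model. This is not kernel code; it only mimics the described behaviour (reporting stops at the last allocated extent) and shows what an extent-driven copy then produces. All names are hypothetical.

```python
def extents(layout):
    """Group a layout such as 'AAAHHHHDDDD' into (kind, start, length) runs, skipping holes."""
    runs = []
    start = 0
    for i in range(1, len(layout) + 1):
        if i == len(layout) or layout[i] != layout[start]:
            if layout[start] != "H":
                runs.append((layout[start], start, i - start))
            start = i
    return runs


def buggy_fiemap(layout):
    """Model of the reported bug: nothing past the last allocated extent is returned."""
    runs = extents(layout)
    last_alloc = max((i for i, run in enumerate(runs) if run[0] == "A"), default=-1)
    return runs[: last_alloc + 1]


def copy_by_extents(data, reported):
    """Copy only the reported extents into a zero-filled buffer of the same size."""
    out = bytearray(len(data))
    for _kind, start, length in reported:
        out[start:start + length] = data[start:start + length]
    return bytes(out)
```

In this model a file whose blocks are all still delayed ("DDDD") yields no extents at all, so the copy is only sized and never written, which matches the FIEMAP call followed directly by ftruncate in the strace log at the top of the thread.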

Yongqiang.



--
Best Wishes
Yongqiang Yang

2011-10-03 11:08:58

by Pádraig Brady

[permalink] [raw]
Subject: Re: [BUG] copy file result with zero

On 10/02/2011 09:43 AM, Dave Young wrote:
> On Sun, Oct 2, 2011 at 4:02 PM, Andreas Dilger <[email protected]> wrote:
>> On 2011-10-01, at 11:41 PM, Jeff liu wrote:
>>>> On Sat, Oct 1, 2011 at 10:39 PM, Ted Ts'o <[email protected]> wrote:
>>>>> On Sat, Oct 01, 2011 at 10:01:35PM +0800, Dave Young wrote:
>>>>>> Weird problem, when I build app from source,
>>>>>> make; make install
>>>>>> run the command, but got "cannot execute binary file"
>>>>>>
>>>>>> hexdump shows the installed binary is full of zero
>>>>>>
>>>>>> Is it related to ext4 fiemap problem described below?
>>>>>> http://lwn.net/Articles/429349/
>>>>>
>>>>> There is general agreement that /bin/cp should not have been relying
>>>>> on FIEMAP, and I believe the more recent versions of /bin/cp have
>>>>> removed that code by default pending implementation of
>>>>> SEEK_HOLE/SEEK_DATA. That being said, ext4 had a workaround to its
>>>>> FIEMAP implementation that landed in 2.6.39, and you're using
>>>>> 3.1.0-rc6.
>>>
>>> Actually, upstream cp(1) using FIEMAP only if the source file is sparse, or else, it will do normal copy, i.e, block based.
>>
>> Are there any distros that are shipping with a version of cp that depends on FIEMAP? That would dramatically increase the severity of this problem, since orders of magnitude more users will hit the problem.
>
> I'm not sure if it depends on FIEMAP, I think it should be not so old.
>
>>
>> Dave, what distro were you seeing this problem on, and had you installed/upgraded your coreutils and/or kernel yourself?
>
> Slackware 13.37, coreutils 8.11
> kernel is always built from linus's git by myself

Coreutils 8.11 was only out for 13 days
before 8.12 was released specifically to avoid this issue.
Slackware should update.

Coreutils 8.12 only uses a fiemap based copy for
sparse files, where it will do a sync first.
The sparseness heuristic is st_blocks < st_size / st_blksize
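
A sketch of that heuristic, for illustration only. The message above writes the divisor as st_blksize; this sketch instead assumes the 512-byte unit in which Linux counts st_blocks, which is my assumption about the intent and not something confirmed in this thread. The function names are hypothetical.

```python
import os

# Assumption: st_blocks is reported in 512-byte units, as on Linux.
ST_BLOCK_UNIT = 512


def looks_sparse(st_size, st_blocks, unit=ST_BLOCK_UNIT):
    """True if fewer blocks are accounted for than the file size would need."""
    return st_blocks < st_size // unit


def path_looks_sparse(path):
    """Apply the heuristic to a file on disk via os.stat()."""
    st = os.stat(path)
    return looks_sparse(st.st_size, st.st_blocks)
```

For the 368662-byte binary in the original strace log, the threshold is 368662 // 512 = 720 blocks: any smaller st_blocks value would send the copy down the sparse path.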

cheers,
Pádraig.

2011-10-03 13:11:41

by Lukas Czerner

[permalink] [raw]
Subject: Re: [BUG] copy file result with zero

On Mon, 3 Oct 2011, Yongqiang Yang wrote:

> On Sun, Oct 2, 2011 at 3:59 PM, Andreas Dilger <[email protected]> wrote:
> > On 2011-10-01, at 11:41 PM, Jeff liu wrote:
> >>> On Sat, Oct 1, 2011 at 10:39 PM, Ted Ts'o <[email protected]> wrote:
> >>>> On Sat, Oct 01, 2011 at 10:01:35PM +0800, Dave Young wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Weird problem, when I build app from source,
> >>>>> make; make install
> >>>>> run the command, but got "cannot execute binary file"
> >>>>>
> >>>>> hexdump shows the installed binary is full of zero
> >>>>>
> >>>>> Is it related to ext4 fiemap problem described below?
> >>>>> http://lwn.net/Articles/429349/
> >>>>
> >>>> There is general agreement that /bin/cp should not have been relying
> >>>> on FIEMAP, and I believe the more recent versions of /bin/cp have
> >>>> removed that code by default pending implementation of
> >>>> SEEK_HOLE/SEEK_DATA. That being said, ext4 had a workaround to its
> >>>> FIEMAP implementation that landed in 2.6.39, and you're using
> >>>> 3.1.0-rc6.
> >>
> >> Actually, upstream cp(1) using FIEMAP only if the source file is sparse, or else, it will do normal copy, i.e, block based.
> >
> > My understanding is that cp uses the blocks count to determine whether the file is sparse or not. In the case of delayed allocation (where blocks are not yet allocated, if they are not reflected in the i_blocks count) it might mistakenly think that the file is sparse.
> >
> > Given the danger of this bug, it is important to ensure ext4 returns DELALLOC extents for pages in the page cache. I think Yongqiang Yang just submitted a patch series to do this for ext4, so it would be important to verify it fixes this problem.
> It seemed the patch[ ext4: in fiemap use FIEMAP_EXTENT_LAST flag for
> last extent] (http://www.spinics.net/lists/linux-ext4/msg25698.html)
> Lukas submitted on FIEMAP which ignores delayed extents beyond the
> last allocated block. e.g. AAAHHHHDDDD
> A - allocated, H - hole, D - delayed alloc, then the ending delayed
> extent is ignored.

Oops, you're right. I think that the best solution would be to revert
the commit

c03f8aa9abdd517477c2021ea1251939b4da49e6
ext4: use FIEMAP_EXTENT_LAST flag for last extent in fiemap

and then fix the original problem with your delayed extent tree
solution, where we can easily check not only for next allocated extent,
but also for next delayed extent to see if the current one is last or
not.

Currently, the problem is that at the point where we fill the fiemap
extent with fiemap_fill_next_extent() we do not have enough information
to say whether the extent is really the last one or not. And currently
there is no easy way to check for the next delayed extent (which will be
fixed with your delayed extent tree).
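
To make the lookahead requirement concrete, here is a toy sketch in which two sorted lists stand in for the allocated extents and the delayed extent tree. It is an assumption-laden illustration, not the proposed kernel patch: an extent is flagged as last only when nothing follows it in either list.

```python
import heapq


def flag_last(allocated, delayed):
    """Merge two sorted (start, length) lists; flag only the final extent as last.

    Returns (start, length, kind, is_last) tuples, kind being 'A' or 'D'.
    """
    merged = list(heapq.merge(
        ((start, length, "A") for start, length in allocated),
        ((start, length, "D") for start, length in delayed),
    ))
    final = len(merged) - 1
    return [(s, l, k, i == final) for i, (s, l, k) in enumerate(merged)]
```

Looking only at the allocated list would flag the extent at offset 0 as last in the AAAHHHHDDDD layout, which is the mistake discussed above.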

I do not know how "ready" your patches are. Is it possible to wait for
them to be ready and fix it in your patch set? That means reverting the
mentioned commit and reimplementing fiemap with the delayed extent tree.

Thanks!
-Lukas


2011-10-03 14:18:30

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [BUG] copy file result with zero

On Mon, Oct 03, 2011 at 03:11:30PM +0200, Lukas Czerner wrote:
>
> Oops, you're right. I think that the best solution would be to revert
> the commit
>
> c03f8aa9abdd517477c2021ea1251939b4da49e6
> ext4: use FIEMAP_EXTENT_LAST flag for last extent in fiemap
>
> and then fix the original problem with your delayed extent tree
> solution, where we can easily check not only for next allocated extent,
> but also for next delayed extent to see if the current one is last or
> not.
> ...
>
> I do not know how "ready" are your patches..Is it possible to wait for
> them to be ready and fix it in your patch set ? That means, revert the
> mentioned commit and reimplement fiemap with delayed extent tree.

Sigh, yeah, we need to fix this to avoid the hang in xfstests #252 but
users losing data even if the coreutils release was only out there for
13 days is bad juju.

I'm working on reviewing the kernel patch backlog this week, and I'll
give this series top priority.

Thanks to Yongqiang and Lukas for looking into this!

- Ted

2011-10-03 15:52:08

by Lukas Czerner

[permalink] [raw]
Subject: Re: [BUG] copy file result with zero

On Mon, 3 Oct 2011, Ted Ts'o wrote:

> On Mon, Oct 03, 2011 at 03:11:30PM +0200, Lukas Czerner wrote:
> >
> > Oops, you're right. I think that the best solution would be to revert
> > the commit
> >
> > c03f8aa9abdd517477c2021ea1251939b4da49e6
> > ext4: use FIEMAP_EXTENT_LAST flag for last extent in fiemap
> >
> > and then fix the original problem with your delayed extent tree
> > solution, where we can easily check not only for next allocated extent,
> > but also for next delayed extent to see if the current one is last or
> > not.
> > ...
> >
> > I do not know how "ready" are your patches..Is it possible to wait for
> > them to be ready and fix it in your patch set ? That means, revert the
> > mentioned commit and reimplement fiemap with delayed extent tree.
>
> Sigh, yeah, we need to fix this to avoid the hang in xfstests #252 but
> users losing data even if the coreutils release was only out there for
> 13 days is bad juju.
>
> I'm working on reviewing the kernel patch backlog this week, and I'll
> give this series one priority.

Actually the series needs to be changed to fix the problem. I'll
comment on the appropriate patch.

Thanks!
-Lukas
>
> Thanks to Yongqiang and Lukas for looking into this!
>
> - Ted