Hi list!
It seems I have found a serious regression in compressed btrfs in
kernel 2.6.37. When I create a small file (smaller than the block size)
and then cp/mv it to *another* file system, the destination file ends up
filled with zero bytes, as many as the source file is long. Case in point:
% echo foobar > foobar
% hexdump -C foobar
00000000 66 6f 6f 62 61 72 0a |foobar.|
00000007
% mv foobar /tmp
% hexdump -C /tmp/foobar
00000000 00 00 00 00 00 00 00 |.......|
00000007
% cp foobar foobar2
% hexdump -C foobar2
00000000 00 00 00 00 00 00 00 |.......|
00000007
Via strace I found that mv doesn't even attempt to read anything:
open("foobar", O_RDONLY|O_NOFOLLOW) = 3
fstat(3, {st_mode=S_IFREG|0664, st_size=7, ...}) = 0
open("/tmp/foobar", O_WRONLY|O_CREAT|O_EXCL, 0600) = 4
fstat(4, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
ioctl(3, FS_IOC_FIEMAP, 0x7fff62f6bfa0) = 0
write(4, "\0\0\0\0\0\0\0", 7) = 7
What's that, is FS_IOC_FIEMAP telling it that it's a sparse file?
Compare with ext4:
ioctl(3, FS_IOC_FIEMAP, 0x7fff2c576a90) = 0
lseek(3, 0, SEEK_SET) = 0
read(3, "foobar\n", 4096) = 7
write(4, "foobar\n", 7) = 7
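One way to read the two traces above: a sparse-aware copier that trusts the extent map reads only the ranges the map reports and treats everything else as a hole. The Python sketch below is a simplified model of that idea, not the actual coreutils code; `copy_by_extents` and its `(offset, length)` tuples are hypothetical names used for illustration, and the real cp in the trace writes the zero bytes explicitly rather than leaving a hole.

```python
import os
import tempfile


def copy_by_extents(src_path, dst_path, extents):
    """Copy only the byte ranges listed in `extents`; leave the rest as holes.

    `extents` is a list of (offset, length) tuples standing in for what an
    extent-mapping call such as FIEMAP would report (hypothetical model).
    """
    size = os.path.getsize(src_path)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for offset, length in extents:
            # Never read past the end of the source file.
            to_read = max(0, min(length, size - offset))
            src.seek(offset)
            data = src.read(to_read)
            dst.seek(offset)
            dst.write(data)
        # Extend the destination to the source size; ranges that were never
        # written read back as zero bytes.
        dst.truncate(size)


def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "foobar")
        with open(src, "wb") as f:
            f.write(b"foobar\n")

        # The extent map reports the data: the copy is faithful.
        good = os.path.join(tmp, "good")
        copy_by_extents(src, good, [(0, 4096)])
        print(read_bytes(good))

        # The extent map is empty: the copy is all zero bytes.
        bad = os.path.join(tmp, "bad")
        copy_by_extents(src, bad, [])
        print(read_bytes(bad))
```

With an empty extent list the destination has the right length but no data, which matches the symptom in the hexdump output.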
I'm currently running 2.6.37 on x86_64, using Arch Linux -testing with
coreutils 8.10. The filesystem is mounted from LVM2 at /usr/src with -o
noatime,compress.
This only seems to occur with compressed file systems (either zlib or
LZO). A person on IRC also reproduced the same problem in 2.6.38-rc.
I'm pretty sure this used to work correctly around 2.6.35 or 2.6.36.
This is 100% reproducible here. If anyone has trouble reproducing
this, I can dig further and provide information as needed.
Regards,
Marti
On Sun, Feb 13, 2011 at 05:49:42PM +0200, Marti Raudsepp wrote:
> Hi list!
>
> It seems I have found a serious regression in compressed btrfs in
> kernel 2.6.37. When creating a small file (less than the block size)
> and then cp/mv it to *another* file system, an appropriate number of
> zeroes gets written to the destination file. Case in point:
[snip]
Does the same problem happen when you use cp --sparse=never? Thanks,
Josef
On Sun, Feb 13, 2011 at 17:57, Josef Bacik <[email protected]> wrote:
> Does the same problem happen when you use cp --sparse=never?
You are right. cp --sparse=never does not cause data loss.
Regards,
Marti
On Sun, Feb 13, 2011 at 06:07:36PM +0200, Marti Raudsepp wrote:
> On Sun, Feb 13, 2011 at 17:57, Josef Bacik <[email protected]> wrote:
> > Does the same problem happen when you use cp --sparse=never?
>
> You are right. cp --sparse=never does not cause data loss.
>
So fiemap probably isn't doing the right thing when compression is enabled,
which doesn't surprise me since we don't do the right thing with delalloc either.
I will try to get to this soon. Thanks,
Josef
On Sun, Feb 13, 2011 at 05:49:42PM +0200, Marti Raudsepp wrote:
> Hi list!
>
> It seems I have found a serious regression in compressed btrfs in
> kernel 2.6.37. When creating a small file (less than the block size)
> and then cp/mv it to *another* file system, an appropriate number of
> zeroes gets written to the destination file. Case in point:
[snip]
> I'm currently running on 2.6.37, x86_64 using Arch Linux -testing with
> coreutils 8.10. Filesystem is mounted from LVM2 to /usr/src with -o
> noatime,compress
>
> This only seems to occur with compressed file systems (either zlib or
> LZO). A person on IRC also reproduced the same problem in 2.6.38-rc.
> I'm pretty sure this used to work correctly around 2.6.35 or 2.6.36.
This would seem to be the same effect that we've had reported on
IRC by at least two Gentoo users, of files full of zeroes in their
build system. We'll follow up with them over there and see if it's the
same bug.
Hugo.
--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- I must be musical: I've got *loads* of CDs ---
Excerpts from Josef Bacik's message of 2011-02-13 11:13:30 -0500:
> On Sun, Feb 13, 2011 at 06:07:36PM +0200, Marti Raudsepp wrote:
> > On Sun, Feb 13, 2011 at 17:57, Josef Bacik <[email protected]> wrote:
> > > Does the same problem happen when you use cp --sparse=never?
> >
> > You are right. cp --sparse=never does not cause data loss.
> >
>
> So fiemap probably isn't doing the right thing when compression is enabled,
> which doesn't surprise me since we don't do the right thing with delalloc either.
> I will try to get to this soon. Thanks,
This might be a bug in the cp code. We're setting the disk extent to
zero but setting different flags to say we're inline and compressed.
The cp fiemap code might be ignoring the flags?
Or, it could just be delalloc ;)
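To check which flags the kernel actually reports for such a file, the extent map can be queried directly. The following Python sketch calls FS_IOC_FIEMAP and decodes the per-extent flags; the struct layouts and flag values are taken from <linux/fiemap.h>, while the helper names (`fiemap`, `parse_extents`, `decode_flags`) and the limit of 32 extents are illustrative assumptions. Which flags btrfs sets for a compressed inline extent on a given kernel is something to observe, not something this sketch asserts.

```python
import fcntl
import struct
import sys

FS_IOC_FIEMAP = 0xC020660B    # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x0001     # ask the kernel to sync the file before mapping

# Extent flags from <linux/fiemap.h>.
EXTENT_FLAGS = {
    0x0001: "LAST",
    0x0002: "UNKNOWN",
    0x0004: "DELALLOC",
    0x0008: "ENCODED",
    0x0080: "DATA_ENCRYPTED",
    0x0100: "NOT_ALIGNED",
    0x0200: "DATA_INLINE",
    0x0400: "DATA_TAIL",
    0x0800: "UNWRITTEN",
    0x1000: "MERGED",
}

HEADER = struct.Struct("=QQIIII")     # struct fiemap, without the extent array
EXTENT = struct.Struct("=QQQ2QI3I")   # struct fiemap_extent


def decode_flags(flags):
    """Return the names of all known flag bits set in `flags`."""
    return [name for bit, name in sorted(EXTENT_FLAGS.items()) if flags & bit]


def parse_extents(buf):
    """Parse a buffer filled in by FS_IOC_FIEMAP into a list of dicts."""
    mapped = HEADER.unpack_from(buf, 0)[3]   # fm_mapped_extents
    extents = []
    for i in range(mapped):
        fields = EXTENT.unpack_from(buf, HEADER.size + i * EXTENT.size)
        extents.append({
            "logical": fields[0],
            "physical": fields[1],
            "length": fields[2],
            "flags": decode_flags(fields[5]),
        })
    return extents


def fiemap(path, max_extents=32):
    """Query up to `max_extents` extents of `path` (Linux only)."""
    buf = bytearray(HEADER.size + max_extents * EXTENT.size)
    # fm_start=0, fm_length=max offset, fm_flags, mapped=0, count, reserved=0
    HEADER.pack_into(buf, 0, 0, 0xFFFFFFFFFFFFFFFF, FIEMAP_FLAG_SYNC,
                     0, max_extents, 0)
    with open(path, "rb") as f:
        fcntl.ioctl(f.fileno(), FS_IOC_FIEMAP, buf, True)
    return parse_extents(buf)


if __name__ == "__main__" and len(sys.argv) > 1:
    for extent in fiemap(sys.argv[1]):
        print(extent)
```

Running it on a freshly written file and again after a sync shows whether the extent list and its flags change in between.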
-chris
On Mon, Feb 14, 2011 at 17:01, Chris Mason <[email protected]> wrote:
> Or, it could just be delalloc ;)
I suspect delalloc. After creating the file, filefrag reports "1
extent found", but for some reason it doesn't actually print out
details of the extent.
After a "sync" call, the extent appears and "cp" starts working as expected:
% rm -f foo bar
% echo foo > foo
% sync
% filefrag -v foo
Filesystem type is: 9123683e
File size of foo is 4 (1 block, blocksize 4096)
ext logical physical expected length flags
  0       0        0            4096 not_aligned,inline,eof
foo: 1 extent found
% cp foo bar
% hexdump bar
0000000 6f66 0a6f
0000004
Without sync:
% rm -f foo bar
% echo foo > foo
% filefrag -v foo
Filesystem type is: 9123683e
File size of foo is 4 (1 block, blocksize 4096)
ext logical physical expected length flags
foo: 1 extent found
% cp foo bar
% hexdump bar
0000000 0000 0000
0000004
Regards,
Marti
Excerpts from Marti Raudsepp's message of 2011-02-14 12:58:17 -0500:
> On Mon, Feb 14, 2011 at 17:01, Chris Mason <[email protected]> wrote:
> > Or, it could just be delalloc ;)
>
> I suspect delalloc. After creating the file, filefrag reports "1
> extent found", but for some reason it doesn't actually print out
> details of the extent.
>
> After a "sync" call, the extent appears and "cp" starts working as expected:
Great, that's a ton easier than fixing cp.
-chris
On 14/02/11 17:58, Marti Raudsepp wrote:
> On Mon, Feb 14, 2011 at 17:01, Chris Mason <[email protected]> wrote:
>> Or, it could just be delalloc ;)
>
> I suspect delalloc. After creating the file, filefrag reports "1
> extent found", but for some reason it doesn't actually print out
> details of the extent.
That's a bug in `filefrag -v` that I noticed independently yesterday.
Without -v it will correctly report 0 extents.
I've already suggested a patch to fix upstream.
> After a "sync" call, the extent appears and "cp" starts working as expected:
About that sync.
I've noticed, on ext4 over loopback at least (and I suspect btrfs is the same),
that specifying FIEMAP_FLAG_SYNC (which cp does) is ineffective.
I worked around this for cp tests by explicitly syncing with:
dd if=/dev/null of=foo conv=notrunc,fdatasync
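For test scripts that are not shell-based, roughly the same per-file flush can be done from Python. This is a sketch of what the dd invocation above appears to do (open for writing without truncating, write nothing, then fdatasync); the function name `flush_file_data` is made up for illustration.

```python
import os
import tempfile


def flush_file_data(path):
    """Force already-written data for `path` to disk without modifying it."""
    # O_WRONLY without O_TRUNC leaves the existing contents untouched.
    fd = os.open(path, os.O_WRONLY)
    try:
        os.fdatasync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "foo")
        with open(path, "wb") as f:
            f.write(b"foo\n")
        flush_file_data(path)
        with open(path, "rb") as f:
            print(f.read())
```

Unlike a global `sync`, this only touches the one file, which keeps a test from depending on unrelated dirty data.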
> % rm -f foo bar
> % echo foo > foo
> % sync
> % filefrag -v foo
> Filesystem type is: 9123683e
> File size of foo is 4 (1 block, blocksize 4096)
> ext logical physical expected length flags
>   0       0        0            4096 not_aligned,inline,eof
> foo: 1 extent found
> % cp foo bar
> % hexdump bar
> 0000000 6f66 0a6f
> 0000004
OK that's fine for normal files.
cp (from coreutils >= 8.10) may still do the wrong thing
as it currently ignores FIEMAP_EXTENT_DATA_ENCRYPTED and FIEMAP_EXTENT_ENCODED
as I've already reported:
http://www.mail-archive.com/[email protected]/msg08356.html
I'd appreciate some `filefrag -v` output from a large compressed file.
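A conservative policy for a copier, given the flags discussed above, is to use the extent map only when nothing in it looks suspicious and otherwise fall back to a plain read/write loop. The Python sketch below shows one possible rule; it is not what coreutils does, the function name and the `(logical, length, flags)` tuple shape are hypothetical, and the flag values are from <linux/fiemap.h>.

```python
# Flag values from <linux/fiemap.h>.
FIEMAP_EXTENT_UNKNOWN = 0x0002
FIEMAP_EXTENT_DELALLOC = 0x0004
FIEMAP_EXTENT_ENCODED = 0x0008
FIEMAP_EXTENT_DATA_ENCRYPTED = 0x0080

# Extents carrying any of these flags are not trusted for extent-based copying.
UNSAFE_FLAGS = (
    FIEMAP_EXTENT_UNKNOWN
    | FIEMAP_EXTENT_DELALLOC
    | FIEMAP_EXTENT_ENCODED
    | FIEMAP_EXTENT_DATA_ENCRYPTED
)


def can_copy_by_extents(file_size, extents):
    """Decide whether an extent-based (sparse-aware) copy looks safe.

    `extents` is a list of (logical, length, flags) tuples.
    Returning False means: fall back to reading the whole file.
    """
    # A non-empty file with no extents may be fully sparse, or may simply
    # not be on disk yet; reading it normally is safe in both cases.
    if file_size > 0 and not extents:
        return False
    return all((flags & UNSAFE_FLAGS) == 0 for _logical, _length, flags in extents)
```

The trade-off is that a genuinely all-hole file is copied by reading it, so the extent-map shortcut is lost in that case, but the data is never replaced by zeroes.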
cheers,
Pádraig.
On Tue, Feb 15, 2011 at 11:30:38AM +0000, Pádraig Brady wrote:
> On 14/02/11 17:58, Marti Raudsepp wrote:
> > On Mon, Feb 14, 2011 at 17:01, Chris Mason <[email protected]> wrote:
> >> Or, it could just be delalloc ;)
> >
> > I suspect delalloc. After creating the file, filefrag reports "1
> > extent found", but for some reason it doesn't actually print out
> > details of the extent.
>
> That's a bug in `filefrag -v` that I noticed independently yesterday.
> Without -v it will correctly report 0 extents.
> I've already suggested a patch to fix upstream.
>
> > After a "sync" call, the extent appears and "cp" starts working as expected:
>
> About that sync.
> I've noticed on ext4 loop back at least (and I suspect BTRFS is the same)
> that specifying FIEMAP_FLAG_SYNC (which cp does) is ineffective.
> I worked around this for cp tests by explicitly syncing with:
> dd if=/dev/null of=foo conv=notrunc,fdatasync
>
Well, that's not good; that's all taken care of in the generic code before it
gets to the fs. I'll take a look at that when I try to fix delalloc fiemap for
btrfs. Thanks,
Josef