2012-02-10 19:10:43

by Lukas Czerner

Subject: Punch hole problem on PAGE_SIZE > blocksize

Hi Allison,

I found a quite disturbing problem when testing loop discard support on
file systems where PAGE_SIZE > blocksize. The result is that the file
system image is completely destroyed, but the underlying file system
seems ok. I have seen these messages in the logs:

EXT4-fs error (device sdb): ext4_ext_search_left:1221: inode #12: comm
flush-8:16: ix (2248761) != EXT_FIRST_INDEX (0) (depth 1)!
EXT4-fs (sdb): delayed block allocation failed for inode 12 at logical
offset 2258177 with max blocks 64 with error -5
EXT4-fs (sdb): This should not happen!! Data will be lost

and

EXT4-fs error (device sdd2): ext4_ext_get_blocks: inode #12: (comm
loop0) bad extent address iblock: 34479, depth: 3 pblock 0

Steps to reproduce

mkfs.ext4 -b1024 /dev/sdb
mount /dev/sdb /mnt/test2
dd if=/dev/zero of=/mnt/test2/file bs=1M count=4096
losetup /dev/loop0 /mnt/test2/file

cd xfstests

export TEST_DIR=/mnt/test
export TEST_DEV=/dev/sda
export SCRATCH_DEV=/dev/loop0
export SCRATCH_MNT=/mnt/test1
export MKFS_OPTIONS="-F -b1024"
export MOUNT_OPTIONS="-o discard"
export FSTYP="ext4"

while ./check 251; do echo "OK"; done

...and just wait and watch the logs.

Do you have any idea what might be the problem?

Thanks!
-Lukas


2012-02-12 09:33:17

by Allison Henderson

Subject: Re: Punch hole problem on PAGE_SIZE > blocksize

Hi Lukas,

I'm having some trouble getting the bug to reproduce. I have the
dm-crypt module, but when I get to the test loop, I get "mount: unknown
filesystem type 'crypto_LUKS'". Is there something else I need to do or
install? Without being able to dig into it, I can't think of why it
would do that; I have not seen it produce that error before. :( Thx!

Allison Henderson


2012-02-12 10:31:48

by Lukas Czerner

Subject: Re: Punch hole problem on PAGE_SIZE > blocksize

Hi Allison,

I do not understand it either; there is no dm-crypt involved in this
scenario. One thing that comes to my mind is that TEST_DEV (in my case
/dev/sda) needs to contain a valid file system, but that is just how
xfstests works. Please let me know if you still have problems
reproducing it.

Thanks!
-Lukas

2012-02-12 16:42:50

by Allison Henderson

Subject: Re: Punch hole problem on PAGE_SIZE > blocksize

Ok, I got it. It was my fault: I had forgotten that I had used the
scratch partition for an encryption test a while back. Sorry! I am
getting a "[not run] FSTRIM is not supported" though, so I think I need a
device that supports discard. I will poke around Monday and see if I
can borrow one from somebody.
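
One way to check whether a block device advertises discard support is to
read its queue limits in sysfs; a value of 0 in discard_max_bytes means
the device does not support discard. The following is only a sketch: the
helper name discard_supported and the SYSFS_ROOT override are made up for
illustration and are not part of xfstests.

```shell
# Hypothetical helper: succeeds if the named block device reports a
# non-zero discard_max_bytes in sysfs.
# SYSFS_ROOT is an illustrative override (defaults to /sys) so the
# function can be exercised against a fake directory tree.
discard_supported() {
    _f="${SYSFS_ROOT:-/sys}/block/$1/queue/discard_max_bytes"
    [ -r "$_f" ] && [ "$(cat "$_f")" -gt 0 ]
}

# Example: check the loop device used as SCRATCH_DEV in this thread.
if discard_supported loop0; then
    echo "loop0 supports discard"
else
    echo "loop0 does not support discard"
fi
```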

Allison Henderson


2012-02-12 19:10:02

by Lukas Czerner

Subject: Re: Punch hole problem on PAGE_SIZE > blocksize

Well, you'll have to try that on a recent upstream kernel, or at least
one which has my loop discard support patch
dfaa2ef68e80c378e610e3c8c536f1c239e8d3ef
so that the discard command in the loop driver is converted into a punch
hole on the backing file. But I have been able to reproduce it just with dd
and fallocate: just create a file with some extents and then punch a hole
the size of the file and you'll get the error. I believe that there is
some kind of off-by-one error (probably blocks count vs. block
number :)). I have not had time to investigate this issue, but I'll look
into it soon.
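
The dd and fallocate sequence described above might look roughly like the
sketch below. The exact commands were not given in the thread, so the
file name, chunk layout and the TARGET_DIR variable are assumptions. To
exercise the reported problem, TARGET_DIR would have to point at a
directory on an ext4 file system created with -b1024; on any other file
system this only demonstrates the sequence of steps.

```shell
# Hypothetical location; falls back to a temporary directory.
dir=${TARGET_DIR:-$(mktemp -d)}
file="$dir/punchme"

# Write four 1 MiB chunks separated by 1 MiB gaps, so the file ends up
# with several extents rather than a single contiguous one.
for i in 0 2 4 6; do
    dd if=/dev/zero of="$file" bs=1M count=1 seek="$i" \
       conv=notrunc,fsync status=none
done

size=$(stat -c %s "$file")

# Punch a hole covering the whole file while keeping its size.
if fallocate --punch-hole --keep-size -o 0 -l "$size" "$file"; then
    echo "punched $size bytes in $file"
else
    echo "punch hole is not supported on this file system"
fi
```

After running it against the affected setup, the kernel log is the place
to look for EXT4-fs errors like the ones quoted at the start of the thread.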

Thanks!
-Lukas