2011-05-11 17:26:29

by Kazuya Mio

[permalink] [raw]
Subject: [PATCH] ext4: Fix max file size of extent format file

We hit BUG_ON in ext4_ext_put_gap_in_cache() while creating a file
whose size is the max file size of extent format. Because the extent cache
length is 0 when we allocate two blocks to the file offset 2^32-2, and then
the offset 2^32-1. To fix it, we decrease the max file size to
(2^32-2)*blocksize. In this way, we would be able to allocate a block up to
the offset 2^32-2. I think there is no data loss because we can read all files
created before applying this patch.

How to reproduce:
I'm running 2.6.39-rc6. Note that i386 architecture and 4KB blocksize cannot
reproduce this problem.

# dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-2))
# sync
# dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-1))

Signed-off-by: Kazuya Mio <[email protected]>
---
fs/ext4/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 8553dfb..fce0249 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2248,8 +2248,8 @@ static loff_t ext4_max_size(int blkbits, int has_huge_files)

/* 32-bit extent-start container, ee_block */
res = 1LL << 32;
- res <<= blkbits;
res -= 1;
+ res <<= blkbits;

/* Sanity check against vm- & vfs- imposed limits */
if (res > upper_limit)



2011-05-13 06:34:49

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix max file size of extent format file

On Wed, 11 May 2011, Kazuya Mio wrote:

> We hit BUG_ON in ext4_ext_put_gap_in_cache() while creating a file
> whose size is the max file size of extent format. Because the extent cache
> length is 0 when we allocate two blocks to the file offset 2^32-2, and then
> the offset 2^32-1. To fix it, we decrease the max file size to
> (2^32-2)*blocksize. In this way, we would be able to allocate a block up to
> the offset 2^32-2. I think there is no data loss because we can read all files
> created before applying this patch.
>
> How to reproduce:
> I'm running 2.6.39-rc6. Note that i386 architecture and 4KB blocksize cannot
> reproduce this problem.
>
> # dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-2))
> # sync
> # dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-1))

Hi Kazuya,

Thanks for the patch, however I think that there is a better solution
than lowering the max file size, which is not necessary. I would rather
fix ext4_ext_put_gap_in_cache() and allow to invalidate the cache by
setting the length to zero.

Please see the following patch, if it works for you.

Thanks!
-Lukas

>
> Signed-off-by: Kazuya Mio <[email protected]>
> ---
> fs/ext4/super.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 8553dfb..fce0249 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -2248,8 +2248,8 @@ static loff_t ext4_max_size(int blkbits, int has_huge_files)
>
> /* 32-bit extent-start container, ee_block */
> res = 1LL << 32;
> - res <<= blkbits;
> res -= 1;
> + res <<= blkbits;
>
> /* Sanity check against vm- & vfs- imposed limits */
> if (res > upper_limit)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2011-05-13 06:37:04

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH] ext4: invalidate gap cache when writing extents last block

Kazuya Mio reported that he was able to hit BUG_ON(next == lblock)
in ext4_ext_put_gap_in_cache() while creating a sparse file in extent
format and fill the tail of file up to its end. We will hit the BUG_ON
when we write the last block (2^32-1) into the sparse file.

That is because due to defensive programming we planted a lot of
BUG_ON's to prevent the length of the gap cache to be zero, but in this
case it actually will be zero, because there will be no gap at the end
of the file.

We could fix that as Kazuya Mio suggested by lowering the max file size
of extent format file by one block. But I do not think this is necessary
and we should rather fix the BUG_ON's to allow invalidating the gap
cache by setting its lenght to zero and this is what this commit is
doing.

The bug which this commit fixes can be reproduced as follows:

dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-2))
sync
dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-1))

Reported-by: Kazuya Mio <[email protected]>
Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/extents.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 4890d6f..779ca49 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1944,7 +1944,6 @@ ext4_ext_put_in_cache(struct inode *inode, ext4_lblk_t block,
__u32 len, ext4_fsblk_t start)
{
struct ext4_ext_cache *cex;
- BUG_ON(len == 0);
spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
cex = &EXT4_I(inode)->i_cached_extent;
cex->ec_block = block;
@@ -1991,7 +1990,7 @@ ext4_ext_put_gap_in_cache(struct inode *inode, struct ext4_ext_path *path,
le32_to_cpu(ex->ee_block),
ext4_ext_get_actual_len(ex),
block);
- BUG_ON(next == lblock);
+ BUG_ON((next == lblock) && (next != EXT_MAX_BLOCK));
len = next - lblock;
} else {
lblock = len = 0;
--
1.7.4.4


2011-05-13 08:56:14

by Kazuya Mio

[permalink] [raw]
Subject: Re: [PATCH] ext4: invalidate gap cache when writing extents last block

2011/05/13 15:36, Lukas Czerner wrote:
> That is because due to defensive programming we planted a lot of
> BUG_ON's to prevent the length of the gap cache to be zero, but in this
> case it actually will be zero, because there will be no gap at the end
> of the file.

Um, I think there is a block (blocksize -1 byte) in gap.
And this gap should be used for the next block searching.
According to ext4_ext_in_cache(), len=0 means the special case
which we have no cache extent, so len=0 should be disallowed.

Moreover, if we create the file which has 2^32-1 offset,
we can't get extent which starts from this offset with FIEMAP ioctl.
That's why I think the maximum file size should be 2^32-1 * blocksize.

Regards,
Kazuya Mio

2011-05-24 08:34:31

by Kazuya Mio

[permalink] [raw]
Subject: Re: [PATCH] ext4: invalidate gap cache when writing extents last block

Hi Lukas,
How do you think about my comment?

Regards,
Kazuya Mio

2011/05/13 17:55, Kazuya Mio wrote:
> 2011/05/13 15:36, Lukas Czerner wrote:
>> That is because due to defensive programming we planted a lot of
>> BUG_ON's to prevent the length of the gap cache to be zero, but in this
>> case it actually will be zero, because there will be no gap at the end
>> of the file.
>
> Um, I think there is a block (blocksize -1 byte) in gap.
> And this gap should be used for the next block searching.
> According to ext4_ext_in_cache(), len=0 means the special case
> which we have no cache extent, so len=0 should be disallowed.
>
> Moreover, if we create the file which has 2^32-1 offset,
> we can't get extent which starts from this offset with FIEMAP ioctl.
> That's why I think the maximum file size should be 2^32-1 * blocksize.
>
> Regards,
> Kazuya Mio

2011-05-24 08:57:24

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH] ext4: invalidate gap cache when writing extents last block

Hi Kazuya,

I am really sorry for late answer. I think you're partly right. my
solution is not good, but I still think that your is not good as well. I
need to look at this again and more closely, sorry.

What do you think about this: ext4_ext_next_allocated_block() should
return next allocated block, however instead it in some cases returns
EXT_MAX_BLOCK, which points at the last logical block in a file, which
however in some cases might be equal to the last not allocated block,
not first allocated block in subsequent extent. And boom, we have (next
== lblock).

So if we want to really return next allocated block (or more
specifically, next block which we can not allocate), we should in those
cases return EXT_MAX_BLOCK+1. And we should do this in
ext4_ext_put_gap_in_cache() as well when there is no extent yet. Also
note that as I said EXT_MAX_BLOCK means maximum logical block, however
we use it as lenght in ext4_ext_put_gap_in_cache() which does not sound
right either. It seems all a little bit messy :-/. I need to look at it
and try it to see if it would work, but I think it does make sense.

What do you think?

Thanks!
-Lukas

On Tue, 24 May 2011, Kazuya Mio wrote:

> Hi Lukas,
> How do you think about my comment?
>
> Regards,
> Kazuya Mio
>
> 2011/05/13 17:55, Kazuya Mio wrote:
> > 2011/05/13 15:36, Lukas Czerner wrote:
> >> That is because due to defensive programming we planted a lot of
> >> BUG_ON's to prevent the length of the gap cache to be zero, but in this
> >> case it actually will be zero, because there will be no gap at the end
> >> of the file.
> >
> > Um, I think there is a block (blocksize -1 byte) in gap.
> > And this gap should be used for the next block searching.
> > According to ext4_ext_in_cache(), len=0 means the special case
> > which we have no cache extent, so len=0 should be disallowed.
> >
> > Moreover, if we create the file which has 2^32-1 offset,
> > we can't get extent which starts from this offset with FIEMAP ioctl.
> > That's why I think the maximum file size should be 2^32-1 * blocksize.
> >
> > Regards,
> > Kazuya Mio
>

--

2011-05-25 08:59:54

by Kazuya Mio

[permalink] [raw]
Subject: Re: [PATCH] ext4: invalidate gap cache when writing extents last block

2011/05/24 17:57, Lukas Czerner wrote:
> Hi Kazuya,
>
> I am really sorry for late answer. I think you're partly right. my
> solution is not good, but I still think that your is not good as well. I
> need to look at this again and more closely, sorry.
>
> What do you think about this: ext4_ext_next_allocated_block() should
> return next allocated block, however instead it in some cases returns
> EXT_MAX_BLOCK, which points at the last logical block in a file, which
> however in some cases might be equal to the last not allocated block,
> not first allocated block in subsequent extent. And boom, we have (next
> == lblock).

I think so, too.

> So if we want to really return next allocated block (or more
> specifically, next block which we can not allocate), we should in those
> cases return EXT_MAX_BLOCK+1. And we should do this in
> ext4_ext_put_gap_in_cache() as well when there is no extent yet. Also
> note that as I said EXT_MAX_BLOCK means maximum logical block, however
> we use it as lenght in ext4_ext_put_gap_in_cache() which does not sound
> right either. It seems all a little bit messy :-/. I need to look at it
> and try it to see if it would work, but I think it does make sense.
>
> What do you think?

AFAIK, it's the best way to fix this problem. But I have no idea that doesn't
increase the size of struct ext4_ext_cache. I'm looking forward to seeing
the patch.

Regards,
Kazuya Mio

2011-06-03 17:35:57

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix max file size of extent format file

On 2011-05-13, at 12:34 AM, Lukas Czerner wrote:
> On Wed, 11 May 2011, Kazuya Mio wrote:
>> We hit BUG_ON in ext4_ext_put_gap_in_cache() while creating a file
>> whose size is the max file size of extent format. Because the extent cache
>> length is 0 when we allocate two blocks to the file offset 2^32-2, and then
>> the offset 2^32-1. To fix it, we decrease the max file size to
>> (2^32-2)*blocksize. In this way, we would be able to allocate a block up to
>> the offset 2^32-2. I think there is no data loss because we can read all files
>> created before applying this patch.
>>
>> How to reproduce:
>> I'm running 2.6.39-rc6. Note that i386 architecture and 4KB blocksize cannot
>> reproduce this problem.
>>
>> # dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-2))
>> # sync
>> # dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-1))
>
> Hi Kazuya,
>
> Thanks for the patch, however I think that there is a better solution
> than lowering the max file size, which is not necessary. I would rather
> fix ext4_ext_put_gap_in_cache() and allow to invalidate the cache by
> setting the length to zero.
>
> Please see the following patch, if it works for you.

This thread has stalled. Until we can come up with a better fix, I think
we should land Kazuya's patch for the current merge window and then worry
about allocating that last 4kB at the end of the 16TB file later (2.3e-6%
of the file :-).

>> Signed-off-by: Kazuya Mio <[email protected]>
>> ---
>> fs/ext4/super.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>> index 8553dfb..fce0249 100644
>> --- a/fs/ext4/super.c
>> +++ b/fs/ext4/super.c
>> @@ -2248,8 +2248,8 @@ static loff_t ext4_max_size(int blkbits, int has_huge_files)
>>
>> /* 32-bit extent-start container, ee_block */
>> res = 1LL << 32;
>> - res <<= blkbits;
>> res -= 1;
>> + res <<= blkbits;
>>
>> /* Sanity check against vm- & vfs- imposed limits */
>> if (res > upper_limit)
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


Cheers, Andreas






2011-06-03 18:08:13

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix max file size of extent format file

On Fri, 3 Jun 2011, Andreas Dilger wrote:

> On 2011-05-13, at 12:34 AM, Lukas Czerner wrote:
> > On Wed, 11 May 2011, Kazuya Mio wrote:
> >> We hit BUG_ON in ext4_ext_put_gap_in_cache() while creating a file
> >> whose size is the max file size of extent format. Because the extent cache
> >> length is 0 when we allocate two blocks to the file offset 2^32-2, and then
> >> the offset 2^32-1. To fix it, we decrease the max file size to
> >> (2^32-2)*blocksize. In this way, we would be able to allocate a block up to
> >> the offset 2^32-2. I think there is no data loss because we can read all files
> >> created before applying this patch.
> >>
> >> How to reproduce:
> >> I'm running 2.6.39-rc6. Note that i386 architecture and 4KB blocksize cannot
> >> reproduce this problem.
> >>
> >> # dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-2))
> >> # sync
> >> # dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-1))
> >
> > Hi Kazuya,
> >
> > Thanks for the patch, however I think that there is a better solution
> > than lowering the max file size, which is not necessary. I would rather
> > fix ext4_ext_put_gap_in_cache() and allow to invalidate the cache by
> > setting the length to zero.
> >
> > Please see the following patch, if it works for you.
>
> This thread has stalled. Until we can come up with a better fix, I think
> we should land Kazuya's patch for the current merge window and then worry
> about allocating that last 4kB at the end of the 16TB file later (2.3e-6%
> of the file :-).

Wait just a sec, I am testing new version of my patch. I have tried and
failed with my approach. It seems that there is no way to do that
without also changing the struct ext4_ext_cache and other structures,
resulting in different on-disk format.

So I think that Kazuyas fix is good after all, however there is
unconsistency in using EXT_MAX_BLOCK and in comments as well. So I fixed
that as well.

Also I hit another bug, while testing this change so I'll send both as
a series shortly.

Thanks!
-Lukas

>
> >> Signed-off-by: Kazuya Mio <[email protected]>
> >> ---
> >> fs/ext4/super.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> >> index 8553dfb..fce0249 100644
> >> --- a/fs/ext4/super.c
> >> +++ b/fs/ext4/super.c
> >> @@ -2248,8 +2248,8 @@ static loff_t ext4_max_size(int blkbits, int has_huge_files)
> >>
> >> /* 32-bit extent-start container, ee_block */
> >> res = 1LL << 32;
> >> - res <<= blkbits;
> >> res -= 1;
> >> + res <<= blkbits;
> >>
> >> /* Sanity check against vm- & vfs- imposed limits */
> >> if (res > upper_limit)
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> >> the body of a message to [email protected]
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
> Cheers, Andreas
>
>
>
>
>
>

--