2014-10-01 21:33:36

by Eric Sandeen

Subject: [PATCH] ext4: ensure LARGE_FILE feature when mounting delalloc

Delalloc write journal reservations only reserve 1 credit,
to update the inode if necessary. However, it may happen
once in a filesystem's lifetime that a file will cross
the 2G threshold, and require the LARGE_FILE feature to
be set in the superblock as well, if it was not set already.

This overruns the transaction reservation, and can be
demonstrated simply on any ext4 filesystem without the LARGE_FILE
feature already set:

dd if=/dev/zero of=testfile bs=1 seek=2147483646 count=1 \
conv=notrunc of=testfile
sync
dd if=/dev/zero of=testfile bs=1 seek=2147483647 count=1 \
conv=notrunc of=testfile

leads to:

EXT4-fs: ext4_do_update_inode:4296: aborting transaction: error 28 in __ext4_handle_dirty_super
EXT4-fs error (device loop0) in ext4_do_update_inode:4301: error 28
EXT4-fs error (device loop0) in ext4_reserve_inode_write:4757: Readonly filesystem
EXT4-fs error (device loop0) in ext4_dirty_inode:4876: error 28
EXT4-fs error (device loop0) in ext4_da_write_end:2685: error 28
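
A rough user-space sketch of why the second write fails, assuming a
simplified model in which each distinct metadata block dirtied under a
handle consumes one reserved credit. The names toy_handle,
toy_dirty_metadata and toy_update_inode are hypothetical and are not jbd2
or ext4 APIs. The point is only that a 1-credit reservation cannot cover
both the inode and the superblock, and that the overrun surfaces as ENOSPC
(the "error 28" in the log above).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a journal handle: it only tracks remaining credits. */
struct toy_handle {
	int credits_left;
};

/* Hypothetical helper: dirtying one metadata block costs one credit. */
int toy_dirty_metadata(struct toy_handle *h)
{
	assert(h != NULL);
	if (h->credits_left <= 0)
		return -ENOSPC;	/* reservation overrun */
	h->credits_left--;
	return 0;
}

/*
 * Model of an inode update: the inode block is always dirtied; the
 * superblock is dirtied as well when LARGE_FILE has to be set.
 */
int toy_update_inode(struct toy_handle *h, int must_set_large_file)
{
	int err = toy_dirty_metadata(h);	/* inode block */

	if (!err && must_set_large_file)
		err = toy_dirty_metadata(h);	/* superblock */
	return err;
}
```

In this model, a handle started with credits_left = 1 makes
toy_update_inode(&h, 1) return -ENOSPC, while a handle started with 2
credits succeeds.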

It simplifies things if we ensure that when we are running
with delalloc, we have LARGE_FILE set already; that way we
don't have to potentially set it later during a file write.

For any fs of sufficient size, LARGE_FILE is usually set
simply due to the size of the resize inode. And for ext4,
HUGE_FILE is set by default.

LARGE_FILE is a decades-old compatibility flag, so at this
point there is little risk of backwards compatibility problems
by enabling it when the filesystem is mounted as ext4.

So just set LARGE_FILE if we are mounted delalloc, if it's
not set already, and be done with it.

Signed-off-by: Eric Sandeen <[email protected]>
---

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 0b28b36..8e56d7e 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3576,6 +3576,20 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 			clear_opt(sb, DELALLOC);
 	}

+	/*
+	 * Adding the LARGE_FILES feature to the superblock adds
+	 * unnecessary complication to journal credit calculations
+	 * when delalloc is enabled. This is a decades-old feature,
+	 * so just enable it now to simplify things.
+	 */
+	if (test_opt(sb, DELALLOC) && !(sb->s_flags & MS_RDONLY) &&
+	    EXT4_HAS_COMPAT_FEATURE(sb, EXT4_FEATURE_COMPAT_HAS_JOURNAL) &&
+	    !EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_LARGE_FILE)) {
+		ext4_update_dynamic_rev(sb);
+		EXT4_SET_RO_COMPAT_FEATURE(sb,
+					   EXT4_FEATURE_RO_COMPAT_LARGE_FILE);
+	}
+
 	sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
 		(test_opt(sb, POSIX_ACL) ? MS_POSIXACL : 0);




2014-10-02 01:26:20

by Andreas Dilger

Subject: Re: [PATCH] ext4: ensure LARGE_FILE feature when mounting delalloc

On Oct 1, 2014, at 3:33 PM, Eric Sandeen <[email protected]> wrote:
> +	if (test_opt(sb, DELALLOC) && !(sb->s_flags & MS_RDONLY) &&
> +	    EXT4_HAS_COMPAT_FEATURE(sb, EXT4_FEATURE_COMPAT_HAS_JOURNAL) &&
> +	    !EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_LARGE_FILE)) {
> +		ext4_update_dynamic_rev(sb);
> +		EXT4_SET_RO_COMPAT_FEATURE(sb,
> +					   EXT4_FEATURE_RO_COMPAT_LARGE_FILE);

This sets the superblock flag, but doesn't actually mark the superblock
dirty. Later in ext4_fill_super() it is possible that this buffer_head
is discarded without writing it out:

	if (sb->s_blocksize != blocksize) {
		:
		:
		brelse(bh);

While this isn't completely fatal (the next mount would enable this
flag again), it could cause some errors to appear in e2fsck if large
files are created without the large_file feature in the superblock.
It would probably be safer to mark the superblock dirty in this case
so that it is written out. No need to sync it, I think:

	ext4_commit_super(sb, 0);

Also, it looks like it is possible to enable delalloc via remount, so
this feature check/set should also be added there?

Cheers, Andreas

2014-10-02 02:15:21

by Eric Sandeen

Subject: Re: [PATCH] ext4: ensure LARGE_FILE feature when mounting delalloc

On 10/1/14 8:26 PM, Andreas Dilger wrote:
> On Oct 1, 2014, at 3:33 PM, Eric Sandeen <[email protected]> wrote:
>> +	if (test_opt(sb, DELALLOC) && !(sb->s_flags & MS_RDONLY) &&
>> +	    EXT4_HAS_COMPAT_FEATURE(sb, EXT4_FEATURE_COMPAT_HAS_JOURNAL) &&
>> +	    !EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_LARGE_FILE)) {
>> +		ext4_update_dynamic_rev(sb);
>> +		EXT4_SET_RO_COMPAT_FEATURE(sb,
>> +					   EXT4_FEATURE_RO_COMPAT_LARGE_FILE);
>
> This sets the superblock flag, but doesn't actually mark the superblock
> dirty. Later in ext4_fill_super() it is possible that this buffer_head
> is discarded without writing it out:
>
> if (sb->s_blocksize != blocksize) {
> :
> :
> brelse(bh);

sorry, I missed this; skipped to the end too fast.

> While this isn't completely fatal (the next mount would enable this
> flag again), it could cause some errors to appear in e2fsck if large
> files are created without the large_file feature in the superblock.
> It would probably be safer to mark the superblock dirty in this case
> so that it is written out. No need to sync it I think
>
> ext4_commit_super(sb, 0);
>
> Also, it looks like it is possible to enable delalloc via remount, so
> this feature check/set should also be added there?

oh, bleah. I guess so.

Thanks for the review, will send V2.

-Eric



2014-10-02 15:28:40

by Eric Sandeen

Subject: [PATCH] ext4: fix reservation overflow in ext4_da_write_begin

Delalloc write journal reservations only reserve 1 credit,
to update the inode if necessary. However, it may happen
once in a filesystem's lifetime that a file will cross
the 2G threshold, and require the LARGE_FILE feature to
be set in the superblock as well, if it was not set already.

This overruns the transaction reservation, and can be
demonstrated simply on any ext4 filesystem without the LARGE_FILE
feature already set:

dd if=/dev/zero of=testfile bs=1 seek=2147483646 count=1 \
conv=notrunc of=testfile
sync
dd if=/dev/zero of=testfile bs=1 seek=2147483647 count=1 \
conv=notrunc of=testfile

leads to:

EXT4-fs: ext4_do_update_inode:4296: aborting transaction: error 28 in __ext4_handle_dirty_super
EXT4-fs error (device loop0) in ext4_do_update_inode:4301: error 28
EXT4-fs error (device loop0) in ext4_reserve_inode_write:4757: Readonly filesystem
EXT4-fs error (device loop0) in ext4_dirty_inode:4876: error 28
EXT4-fs error (device loop0) in ext4_da_write_end:2685: error 28

Adjust the number of credits based on whether the flag is
already set, and whether the current write may extend past the
LARGE_FILE limit.

Signed-off-by: Eric Sandeen <[email protected]>
---

Ok, how's this ... I do like this a lot better than the set-flag-on-
mount-or-remount, which started to get a bit icky.


diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3aa26e9..8d362c2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2515,6 +2515,20 @@ static int ext4_nonda_switch(struct super_block *sb)
 	return 0;
 }

+/* We always reserve for an inode update; the superblock could be there too */
+static int ext4_da_write_credits(struct inode *inode, loff_t pos, unsigned len)
+{
+	if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+				EXT4_FEATURE_RO_COMPAT_LARGE_FILE))
+		return 1;
+
+	if (pos + len <= 0x7fffffffULL)
+		return 1;
+
+	/* We might need to update the superblock to set LARGE_FILE */
+	return 2;
+}
+
 static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
 			       loff_t pos, unsigned len, unsigned flags,
 			       struct page **pagep, void **fsdata)
@@ -2565,7 +2579,8 @@ retry_grab:
 	 * of file which has an already mapped buffer.
 	 */
 retry_journal:
-	handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE, 1);
+	handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE,
+				ext4_da_write_credits(inode, pos, len));
 	if (IS_ERR(handle)) {
 		page_cache_release(page);
 		return PTR_ERR(handle);
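
As a cross-check against the dd reproducer in the description, here is a
user-space model of the same credit calculation. The has_large_file
parameter is an assumption standing in for the
EXT4_HAS_RO_COMPAT_FEATURE() test, and toy_da_write_credits is a
hypothetical name; only the comparison against 0x7fffffff is taken from
the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Largest file size that does not need LARGE_FILE: 2^31 - 1 bytes. */
#define TOY_LARGE_FILE_LIMIT 0x7fffffffULL

/*
 * Returns the number of journal credits to reserve for a delalloc write
 * of len bytes at offset pos. has_large_file is 1 if the feature flag is
 * already set in the (modeled) superblock, 0 otherwise.
 */
int toy_da_write_credits(int has_large_file, uint64_t pos, unsigned len)
{
	assert(has_large_file == 0 || has_large_file == 1);

	/* Flag already set: the superblock will not be touched. */
	if (has_large_file)
		return 1;

	/* Write ends at or below the limit: only the inode is updated. */
	if (pos + len <= TOY_LARGE_FILE_LIMIT)
		return 1;

	/* The superblock may have to be updated to set LARGE_FILE. */
	return 2;
}
```

Without the flag set, the first dd (pos 2147483646, len 1) ends exactly at
0x7fffffff and gets 1 credit; the second dd (pos 2147483647, len 1) ends
at 0x80000000 and gets 2.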


2014-10-02 21:00:18

by Andreas Dilger

Subject: Re: [PATCH] ext4: fix reservation overflow in ext4_da_write_begin

On Oct 2, 2014, at 9:28 AM, Eric Sandeen <[email protected]> wrote:
> Signed-off-by: Eric Sandeen <[email protected]>

Reviewed-by: Andreas Dilger <[email protected]>

> +/* We always reserve for an inode update; the superblock could be there too */
> +static int ext4_da_write_credits(struct inode *inode, loff_t pos, unsigned len)
> +{
> +	if (EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,

This could be marked "likely()" I suspect, but not critical.

> +				EXT4_FEATURE_RO_COMPAT_LARGE_FILE))
> +		return 1;
> +
> +	if (pos + len <= 0x7fffffffULL)
> +		return 1;
> +
> +	/* We might need to update the superblock to set LARGE_FILE */
> +	return 2;
> +}


Cheers, Andreas

2014-10-11 23:52:06

by Theodore Ts'o

Subject: Re: [PATCH] ext4: fix reservation overflow in ext4_da_write_begin

On Thu, Oct 02, 2014 at 03:00:23PM -0600, Andreas Dilger wrote:
> On Oct 2, 2014, at 9:28 AM, Eric Sandeen <[email protected]> wrote:
> > Signed-off-by: Eric Sandeen <[email protected]>
>
> Reviewed-by: Andreas Dilger <[email protected]>

Applied, thanks. I added the likely() qualifier per Andreas'
suggestion.

- Ted
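
The applied version is not shown in this thread. A minimal user-space
sketch of what the likely() annotation could look like: toy_likely mirrors
the kernel's likely() macro, which wraps __builtin_expect (a GCC/Clang
builtin), and has_large_file is a hypothetical stand-in for the
EXT4_HAS_RO_COMPAT_FEATURE() test. The hint only tells the compiler which
branch to favor; it does not change the value returned.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the kernel's likely(): hint that x is usually true. */
#define toy_likely(x) __builtin_expect(!!(x), 1)

/*
 * Credit calculation with the common case (LARGE_FILE already set)
 * marked as the expected branch.
 */
int toy_da_write_credits_hinted(int has_large_file, uint64_t pos, unsigned len)
{
	assert(has_large_file == 0 || has_large_file == 1);

	if (toy_likely(has_large_file))
		return 1;

	if (pos + len <= 0x7fffffffULL)
		return 1;

	/* Rare path: the superblock may need LARGE_FILE set. */
	return 2;
}
```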