2009-12-11 23:07:04

by Eric Sandeen

Subject: [PATCH 2/2] ext4: flush delalloc blocks when space is low

Creating many small files in rapid succession on a small
filesystem can lead to spurious ENOSPC; on a 104MB filesystem:

for i in `seq 1 22500`; do
	echo -n > $SCRATCH_MNT/$i
	echo XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > $SCRATCH_MNT/$i
done

leads to ENOSPC even though after a sync, 40% of the fs is free
again.

This is because we reserve worst-case metadata for delalloc writes,
and when the data is actually allocated, that worst-case reservation
is usually not needed.
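The effect can be illustrated with a toy model. Everything in this sketch is an assumption made for illustration: struct toy_fs, the toy_* helpers, and the numbers (3 blocks of worst-case metadata per one-block file, none of it actually used) are made up, and ext4's real accounting is more involved. It only shows the shape of the problem: reservations are sized for the worst case, so reservable space runs out long before the blocks do, and writeback returns most of it.

```c
#include <assert.h>
#include <errno.h>

/* Toy model of delayed-allocation accounting (hypothetical, not ext4's).
 *   free     - blocks not yet allocated on disk
 *   reserved - blocks promised to dirty delalloc data + worst-case metadata */
struct toy_fs {
	long free;
	long reserved;
};

/* At write() time: reserve the data block(s) plus worst-case metadata. */
static int toy_reserve(struct toy_fs *fs, long data, long md_worst)
{
	long need = data + md_worst;

	if (fs->free - fs->reserved < need)
		return -ENOSPC;	/* spurious if most of md_worst goes unused */
	fs->reserved += need;
	return 0;
}

/* At writeback time: blocks are really allocated, so drop the whole
 * reservation and charge only the metadata that was actually needed. */
static void toy_writeback(struct toy_fs *fs, long data, long md_worst,
			  long md_actual)
{
	assert(md_actual <= md_worst);
	fs->reserved -= data + md_worst;
	fs->free -= data + md_actual;
	assert(fs->reserved >= 0 && fs->free >= 0);
}

/* Reserve one-block files until the first ENOSPC; return how many fit. */
static long toy_fill(struct toy_fs *fs, long md_worst)
{
	long n = 0;

	while (toy_reserve(fs, 1, md_worst) == 0)
		n++;
	return n;
}
```

With 1000 free blocks and these made-up numbers, toy_fill() stops after 250 files with every block reserved but none yet allocated. Once all 250 are written back with no extra metadata, 750 blocks are free again and another 187 reservations fit.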

When free space is low, kicking off an async writeback will start
converting that worst-case space usage into something more realistic,
almost always freeing up space to continue.

This resolves the testcase for me, and survives all 4 generic
ENOSPC tests in xfstests.

We'll still need a hard synchronous sync to squeeze out the last bit,
but this fixes things up to a large degree.

Signed-off-by: Eric Sandeen <[email protected]>
---

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5c5bc5d..5b3f468 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3024,11 +3024,18 @@ static int ext4_nonda_switch(struct super_block *sb)
if (2 * free_blocks < 3 * dirty_blocks ||
free_blocks < (dirty_blocks + EXT4_FREEBLOCKS_WATERMARK)) {
/*
- * free block count is less that 150% of dirty blocks
- * or free blocks is less that watermark
+ * free block count is less than 150% of dirty blocks
+ * or free blocks is less than watermark
*/
return 1;
}
+ /*
+ * Even if we don't switch but are nearing capacity,
+ * start pushing delalloc when 1/2 of free blocks are dirty.
+ */
+ if (free_blocks < 2 * dirty_blocks)
+ writeback_inodes_sb_if_idle(sb);
+
return 0;
}




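To see where the new check sits relative to the existing ones, the thresholds in ext4_nonda_switch() can be modeled in user space. This is a sketch, not kernel code: nonda_decide() and enum nonda_action are hypothetical names, the real function only returns 1 or 0 and calls writeback_inodes_sb_if_idle(sb) as a side effect, and the value of EXT4_FREEBLOCKS_WATERMARK is passed in as a plain parameter rather than assumed.

```c
#include <assert.h>

/* Hypothetical three-way view of ext4_nonda_switch() after this patch. */
enum nonda_action {
	DA_CONTINUE,		/* return 0, no writeback started */
	DA_CONTINUE_FLUSH,	/* return 0 after writeback_inodes_sb_if_idle(sb) */
	DA_SWITCH_TO_NONDA,	/* return 1: fall back to non-delalloc writes */
};

static enum nonda_action nonda_decide(long long free_blocks,
				      long long dirty_blocks,
				      long long watermark)
{
	assert(free_blocks >= 0 && dirty_blocks >= 0 && watermark >= 0);
	/* free < 150% of dirty, or free < dirty + watermark (pre-existing) */
	if (2 * free_blocks < 3 * dirty_blocks ||
	    free_blocks < dirty_blocks + watermark)
		return DA_SWITCH_TO_NONDA;
	/* new in this patch: more than half of the free blocks are dirty */
	if (free_blocks < 2 * dirty_blocks)
		return DA_CONTINUE_FLUSH;
	return DA_CONTINUE;
}
```

With watermark = 0 and free_blocks = 100, the async writeback is kicked only for dirty_blocks from 51 to 66, that is, more than half but at most two thirds of the free count. At 67 or more, 2 * free_blocks < 3 * dirty_blocks holds and the existing switch to non-delalloc writes takes over.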
2009-12-16 20:46:03

by Jan Kara

Subject: Re: [PATCH 2/2] ext4: flush delalloc blocks when space is low

> Creating many small files in rapid succession on a small
> filesystem can lead to spurious ENOSPC; on a 104MB filesystem:
>
> for i in `seq 1 22500`; do
> echo -n > $SCRATCH_MNT/$i
> echo XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > $SCRATCH_MNT/$i
> done
>
> leads to ENOSPC even though after a sync, 40% of the fs is free
> again.
>
> This is because we reserve worst-case metadata for delalloc writes,
> and when the data is actually allocated, that worst-case reservation
> is usually not needed.
>
> When free space is low, kicking off an async writeback will start
> converting that worst-case space usage into something more realistic,
> almost always freeing up space to continue.
>
> This resolves the testcase for me, and survives all 4 generic
> ENOSPC tests in xfstests.
>
> We'll still need a hard synchronous sync to squeeze out the last bit,
> but this fixes things up to a large degree.
>
> Signed-off-by: Eric Sandeen <[email protected]>
Looks good.

Acked-by: Jan Kara <[email protected]>

Honza
> ---
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 5c5bc5d..5b3f468 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3024,11 +3024,18 @@ static int ext4_nonda_switch(struct super_block *sb)
> if (2 * free_blocks < 3 * dirty_blocks ||
> free_blocks < (dirty_blocks + EXT4_FREEBLOCKS_WATERMARK)) {
> /*
> - * free block count is less that 150% of dirty blocks
> - * or free blocks is less that watermark
> + * free block count is less than 150% of dirty blocks
> + * or free blocks is less than watermark
> */
> return 1;
> }
> + /*
> + * Even if we don't switch but are nearing capacity,
> + * start pushing delalloc when 1/2 of free blocks are dirty.
> + */
> + if (free_blocks < 2 * dirty_blocks)
> + writeback_inodes_sb_if_idle(sb);
> +
> return 0;
> }
>
>
--
Jan Kara <[email protected]>
SuSE CR Labs

2009-12-23 13:01:26

by Theodore Ts'o

Subject: Re: [PATCH 2/2] ext4: flush delalloc blocks when space is low

On Wed, Dec 16, 2009 at 09:46:02PM +0100, Jan Kara wrote:
> > Creating many small files in rapid succession on a small
> > filesystem can lead to spurious ENOSPC; on a 104MB filesystem:
> >
> > for i in `seq 1 22500`; do
> > echo -n > $SCRATCH_MNT/$i
> > echo XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > $SCRATCH_MNT/$i
> > done
> >
> > leads to ENOSPC even though after a sync, 40% of the fs is free
> > again.
> >
> > This is because we reserve worst-case metadata for delalloc writes,
> > and when the data is actually allocated, that worst-case reservation
> > is usually not needed.
> >
> > When free space is low, kicking off an async writeback will start
> > converting that worst-case space usage into something more realistic,
> > almost always freeing up space to continue.
> >
> > This resolves the testcase for me, and survives all 4 generic
> > ENOSPC tests in xfstests.
> >
> > We'll still need a hard synchronous sync to squeeze out the last bit,
> > but this fixes things up to a large degree.
> >
> > Signed-off-by: Eric Sandeen <[email protected]>

Thanks, added to the ext4 patch queue.

- Ted