2010-01-06 19:22:48

by Aneesh Kumar K.V

[permalink] [raw]
Subject: [PATCH 1/3] ext4: Fix quota accounting error with fallocate

When we fallocate a region of the file which we had recently written,
and which is still in the page cache marked as delayed allocated blocks
we need to make sure we don't do the quota update on writepage path.
This is because the needed quota updated would have already be done
by fallocate.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
---
fs/ext4/ext4.h | 2 ++
fs/ext4/extents.c | 21 +++++++++++++++++++++
fs/ext4/inode.c | 35 +++++++++++++++++++++++------------
3 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index af7b626..b98de17 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1443,6 +1443,8 @@ extern int ext4_block_truncate_page(handle_t *handle,
extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
extern qsize_t *ext4_get_reserved_space(struct inode *inode);
extern int flush_aio_dio_completed_IO(struct inode *inode);
+extern void ext4_da_update_reserve_space(struct inode *inode,
+ int used, int quota_claim);
/* ioctl.c */
extern long ext4_ioctl(struct file *, unsigned int, unsigned long);
extern long ext4_compat_ioctl(struct file *, unsigned int, unsigned long);
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 7d7b74e..3b6ff72 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3132,7 +3132,19 @@ out:
unmap_underlying_metadata_blocks(inode->i_sb->s_bdev,
newblock + max_blocks,
allocated - max_blocks);
+ allocated = max_blocks;
}
+
+ /*
+ * If we have done fallocate with the offset that is already
+ * delayed allocated, we would have block reservation
+ * and quota reservation done in the delayed write path.
+ * But fallocate would have already updated quota and block
+ * count for this offset. So cancel these reservation
+ */
+ if (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE)
+ ext4_da_update_reserve_space(inode, allocated, 0);
+
map_out:
set_buffer_mapped(bh_result);
out1:
@@ -3368,9 +3380,18 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
/* previous routine could use block we allocated */
newblock = ext_pblock(&newex);
allocated = ext4_ext_get_actual_len(&newex);
+ if (allocated > max_blocks)
+ allocated = max_blocks;
set_buffer_new(bh_result);

/*
+ * Update reserved blocks/metadata blocks after successful
+ * block allocation which had been deferred till now.
+ */
+ if (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE)
+ ext4_da_update_reserve_space(inode, allocated, 1);
+
+ /*
* Cache the extent and update transaction to commit on fdatasync only
* when it is _not_ an uninitialized extent.
*/
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c818972..3d1a1d6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1053,7 +1053,8 @@ static int ext4_calc_metadata_amount(struct inode *inode, sector_t lblock)
* Called with i_data_sem down, which is important since we can call
* ext4_discard_preallocations() from here.
*/
-static void ext4_da_update_reserve_space(struct inode *inode, int used)
+void ext4_da_update_reserve_space(struct inode *inode,
+ int used, int quota_claim)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
struct ext4_inode_info *ei = EXT4_I(inode);
@@ -1090,9 +1091,17 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);

/* Update quota subsystem */
- vfs_dq_claim_block(inode, used);
- if (mdb_free)
- vfs_dq_release_reservation_block(inode, mdb_free);
+ if (quota_claim) {
+ vfs_dq_claim_block(inode, used);
+ if (mdb_free)
+ vfs_dq_release_reservation_block(inode, mdb_free);
+ } else {
+ /*
+ * This is a request to cancel the reservation. So just
+ * update the resevation and cancel the quota reservation
+ */
+ vfs_dq_release_reservation_block(inode, mdb_free + used);
+ }

/*
* If we have done all the pending block allocations and if
@@ -1292,18 +1301,20 @@ int ext4_get_blocks(handle_t *handle, struct inode *inode, sector_t block,
*/
EXT4_I(inode)->i_state &= ~EXT4_STATE_EXT_MIGRATE;
}
- }

+ /*
+ * Update reserved blocks/metadata blocks after successful
+ * block allocation which had been deferred till now. We don't
+ * support fallocate for non extent files. So we can update
+ * reserve space here.
+ */
+ if ((retval > 0) &&
+ (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE))
+ ext4_da_update_reserve_space(inode, retval, 1);
+ }
if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
EXT4_I(inode)->i_delalloc_reserved_flag = 0;

- /*
- * Update reserved blocks/metadata blocks after successful
- * block allocation which had been deferred till now.
- */
- if ((retval > 0) && (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE))
- ext4_da_update_reserve_space(inode, retval);


2010-01-06 19:22:51

by Aneesh Kumar K.V

[permalink] [raw]
Subject: [PATCH 2/3] ext4: Drop EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE flag

We should update reserve space if it is delalloc buffer
and thtat is indicated by EXT4_GET_BLOCKS_DELALLOC_RESERVE flag.
So use EXT4_GET_BLOCKS_DELALLOC_RESERVE in place of
EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE

Signed-off-by: Aneesh Kumar K.V <[email protected]>
---
fs/ext4/ext4.h | 7 ++-----
fs/ext4/extents.c | 4 ++--
fs/ext4/inode.c | 8 ++++----
3 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b98de17..874d169 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -361,14 +361,11 @@ struct ext4_new_group_data {
so set the magic i_delalloc_reserve_flag after taking the
inode allocation semaphore for */
#define EXT4_GET_BLOCKS_DELALLOC_RESERVE 0x0004
- /* Call ext4_da_update_reserve_space() after successfully
- allocating the blocks */
-#define EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE 0x0008
/* caller is from the direct IO path, request to creation of an
unitialized extents if not allocated, split the uninitialized
extent if blocks has been preallocated already*/
-#define EXT4_GET_BLOCKS_DIO 0x0010
-#define EXT4_GET_BLOCKS_CONVERT 0x0020
+#define EXT4_GET_BLOCKS_DIO 0x0008
+#define EXT4_GET_BLOCKS_CONVERT 0x0010
#define EXT4_GET_BLOCKS_DIO_CREATE_EXT (EXT4_GET_BLOCKS_DIO|\
EXT4_GET_BLOCKS_CREATE_UNINIT_EXT)
/* Convert extent to initialized after direct IO complete */
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 3b6ff72..765a482 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3142,7 +3142,7 @@ out:
* But fallocate would have already updated quota and block
* count for this offset. So cancel these reservation
*/
- if (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE)
+ if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
ext4_da_update_reserve_space(inode, allocated, 0);

map_out:
@@ -3388,7 +3388,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
* Update reserved blocks/metadata blocks after successful
* block allocation which had been deferred till now.
*/
- if (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE)
+ if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
ext4_da_update_reserve_space(inode, allocated, 1);

/*
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3d1a1d6..fec4ea1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1309,7 +1309,7 @@ int ext4_get_blocks(handle_t *handle, struct inode *inode, sector_t block,
* reserve space here.
*/
if ((retval > 0) &&
- (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE))
+ (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE))
ext4_da_update_reserve_space(inode, retval, 1);
}
if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
@@ -2224,10 +2224,10 @@ static int mpage_da_map_blocks(struct mpage_da_data *mpd)
* variables are updated after the blocks have been allocated.
*/
new.b_state = 0;
- get_blocks_flags = (EXT4_GET_BLOCKS_CREATE |
- EXT4_GET_BLOCKS_DELALLOC_RESERVE);
+ get_blocks_flags = EXT4_GET_BLOCKS_CREATE ;
if (mpd->b_state & (1 << BH_Delay))
- get_blocks_flags |= EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE;
+ get_blocks_flags |= EXT4_GET_BLOCKS_DELALLOC_RESERVE;
+
blks = ext4_get_blocks(handle, mpd->inode, next, max_blocks,
&new, get_blocks_flags);
if (blks < 0) {
--
1.6.6.75.g37bae


2010-01-06 19:22:50

by Aneesh Kumar K.V

[permalink] [raw]
Subject: [PATCH 3/3] ext4: unmap the underlying metadata when allocating blocks via fallocate

This become important when we are running with nojournal mode. We
may end up allocating recently freed metablocks for fallocate. We
want to make sure we unmap the old mapping so that when we convert
the fallocated uninitialized extent to initialized extent we don't
have the old mapping around. Leaving the old mapping can cause
file system corruption

Now that we unmap old metadata blocks we need not return blocks
allocated from fallocate area as new.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
---
fs/ext4/ext4.h | 14 ++++++++++++++
fs/ext4/extents.c | 18 ++++--------------
fs/ext4/inode.c | 18 +-----------------
3 files changed, 19 insertions(+), 31 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 874d169..110a31f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1773,6 +1773,20 @@ static inline void set_bitmap_uptodate(struct buffer_head *bh)
set_bit(BH_BITMAP_UPTODATE, &(bh)->b_state);
}

+/*
+ * __unmap_underlying_bh_blocks - just a helper function to unmap
+ * set of blocks described by @bh
+ */
+static inline void __unmap_underlying_bh_blocks(struct inode *inode,
+ struct buffer_head *bh)
+{
+ struct block_device *bdev = inode->i_sb->s_bdev;
+ int blocks, i;
+
+ blocks = bh->b_size >> inode->i_blkbits;
+ for (i = 0; i < blocks; i++)
+ unmap_underlying_metadata(bdev, bh->b_blocknr + i);
+}
#endif /* __KERNEL__ */

#endif /* _EXT4_H */
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 765a482..7b4f4cc 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3120,21 +3120,9 @@ out:
goto out2;
} else
allocated = ret;
- set_buffer_new(bh_result);
- /*
- * if we allocated more blocks than requested
- * we need to make sure we unmap the extra block
- * allocated. The actual needed block will get
- * unmapped later when we find the buffer_head marked
- * new.
- */
- if (allocated > max_blocks) {
- unmap_underlying_metadata_blocks(inode->i_sb->s_bdev,
- newblock + max_blocks,
- allocated - max_blocks);
- allocated = max_blocks;
- }

+ if (allocated > max_blocks)
+ allocated = max_blocks;
/*
* If we have done fallocate with the offset that is already
* delayed allocated, we would have block reservation
@@ -3570,6 +3558,8 @@ retry:
ret2 = ext4_journal_stop(handle);
break;
}
+ if (buffer_new(&map_bh))
+ __unmap_underlying_bh_blocks(inode, &map_bh);
if ((block + ret) >= (EXT4_BLOCK_ALIGN(offset + len,
blkbits) >> blkbits))
new_size = offset + len;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fec4ea1..831c7cd 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2104,22 +2104,6 @@ static void mpage_put_bnr_to_bhs(struct mpage_da_data *mpd, sector_t logical,
}
}

-
-/*
- * __unmap_underlying_blocks - just a helper function to unmap
- * set of blocks described by @bh
- */
-static inline void __unmap_underlying_blocks(struct inode *inode,
- struct buffer_head *bh)
-{
- struct block_device *bdev = inode->i_sb->s_bdev;
- int blocks, i;
-
- blocks = bh->b_size >> inode->i_blkbits;
- for (i = 0; i < blocks; i++)
- unmap_underlying_metadata(bdev, bh->b_blocknr + i);
-}

2010-01-06 19:47:28

by Mingming Cao

[permalink] [raw]
Subject: Re: [PATCH 1/3] ext4: Fix quota accounting error with fallocate

On Thu, 2010-01-07 at 00:52 +0530, Aneesh Kumar K.V wrote:
> When we fallocate a region of the file which we had recently written,
> and which is still in the page cache marked as delayed allocated blocks
> we need to make sure we don't do the quota update on writepage path.
> This is because the needed quota updated would have already be done
> by fallocate.
>
> Signed-off-by: Aneesh Kumar K.V <[email protected]>
> ---
> fs/ext4/ext4.h | 2 ++
> fs/ext4/extents.c | 21 +++++++++++++++++++++
> fs/ext4/inode.c | 35 +++++++++++++++++++++++------------
> 3 files changed, 46 insertions(+), 12 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index af7b626..b98de17 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1443,6 +1443,8 @@ extern int ext4_block_truncate_page(handle_t *handle,
> extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
> extern qsize_t *ext4_get_reserved_space(struct inode *inode);
> extern int flush_aio_dio_completed_IO(struct inode *inode);
> +extern void ext4_da_update_reserve_space(struct inode *inode,
> + int used, int quota_claim);
> /* ioctl.c */
> extern long ext4_ioctl(struct file *, unsigned int, unsigned long);
> extern long ext4_compat_ioctl(struct file *, unsigned int, unsigned long);
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 7d7b74e..3b6ff72 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3132,7 +3132,19 @@ out:
> unmap_underlying_metadata_blocks(inode->i_sb->s_bdev,
> newblock + max_blocks,
> allocated - max_blocks);
> + allocated = max_blocks;
> }
> +
> + /*
> + * If we have done fallocate with the offset that is already
> + * delayed allocated, we would have block reservation
> + * and quota reservation done in the delayed write path.
> + * But fallocate would have already updated quota and block
> + * count for this offset. So cancel these reservation
> + */
> + if (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE)
> + ext4_da_update_reserve_space(inode, allocated, 0);
> +

Looks right to me, we are only updating the reserve space in the delayed
buffered IO case, but avoid updating the quota.

> map_out:
> set_buffer_mapped(bh_result);
> out1:
> @@ -3368,9 +3380,18 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
> /* previous routine could use block we allocated */
> newblock = ext_pblock(&newex);
> allocated = ext4_ext_get_actual_len(&newex);
> + if (allocated > max_blocks)
> + allocated = max_blocks;
> set_buffer_new(bh_result);
>
> /*
> + * Update reserved blocks/metadata blocks after successful
> + * block allocation which had been deferred till now.
> + */
> + if (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE)
> + ext4_da_update_reserve_space(inode, allocated, 1);
> +
> + /*
> * Cache the extent and update transaction to commit on fdatasync only
> * when it is _not_ an uninitialized extent.
> */
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index c818972..3d1a1d6 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1053,7 +1053,8 @@ static int ext4_calc_metadata_amount(struct inode *inode, sector_t lblock)
> * Called with i_data_sem down, which is important since we can call
> * ext4_discard_preallocations() from here.
> */
> -static void ext4_da_update_reserve_space(struct inode *inode, int used)
> +void ext4_da_update_reserve_space(struct inode *inode,
> + int used, int quota_claim)
> {
> struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> struct ext4_inode_info *ei = EXT4_I(inode);
> @@ -1090,9 +1091,17 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
> spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
>
> /* Update quota subsystem */
> - vfs_dq_claim_block(inode, used);
> - if (mdb_free)
> - vfs_dq_release_reservation_block(inode, mdb_free);
> + if (quota_claim) {
> + vfs_dq_claim_block(inode, used);
> + if (mdb_free)
> + vfs_dq_release_reservation_block(inode, mdb_free);
> + } else {
> + /*
> + * This is a request to cancel the reservation. So just
> + * update the resevation and cancel the quota reservation
> + */
> + vfs_dq_release_reservation_block(inode, mdb_free + used);
> + }
>
> /*
> * If we have done all the pending block allocations and if
> @@ -1292,18 +1301,20 @@ int ext4_get_blocks(handle_t *handle, struct inode *inode, sector_t block,
> */
> EXT4_I(inode)->i_state &= ~EXT4_STATE_EXT_MIGRATE;
> }
> - }
>
> + /*
> + * Update reserved blocks/metadata blocks after successful
> + * block allocation which had been deferred till now. We don't
> + * support fallocate for non extent files. So we can update
> + * reserve space here.
> + */
> + if ((retval > 0) &&
> + (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE))
> + ext4_da_update_reserve_space(inode, retval, 1);
> + }
> if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
> EXT4_I(inode)->i_delalloc_reserved_flag = 0;
>
> - /*
> - * Update reserved blocks/metadata blocks after successful
> - * block allocation which had been deferred till now.
> - */
> - if ((retval > 0) && (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE))
> - ext4_da_update_reserve_space(inode, retval);
> -
> up_write((&EXT4_I(inode)->i_data_sem));
> if (retval > 0 && buffer_mapped(bh)) {
> int ret = check_block_validity(inode, "file system "

Looks good,
Reviewed-by: Mingming Cao<[email protected]>


2010-01-06 19:50:57

by Mingming Cao

[permalink] [raw]
Subject: Re: [PATCH 2/3] ext4: Drop EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE flag

On Thu, 2010-01-07 at 00:52 +0530, Aneesh Kumar K.V wrote:
> We should update reserve space if it is delalloc buffer
> and thtat is indicated by EXT4_GET_BLOCKS_DELALLOC_RESERVE flag.
> So use EXT4_GET_BLOCKS_DELALLOC_RESERVE in place of
> EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE
>

That's a nice cleanup. With the old name, it confused me a bit on your
first patch where I thought even direct IO gets unnecessary reservation
update.

> Signed-off-by: Aneesh Kumar K.V <[email protected]>
> ---
> fs/ext4/ext4.h | 7 ++-----
> fs/ext4/extents.c | 4 ++--
> fs/ext4/inode.c | 8 ++++----
> 3 files changed, 8 insertions(+), 11 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index b98de17..874d169 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -361,14 +361,11 @@ struct ext4_new_group_data {
> so set the magic i_delalloc_reserve_flag after taking the
> inode allocation semaphore for */
> #define EXT4_GET_BLOCKS_DELALLOC_RESERVE 0x0004
> - /* Call ext4_da_update_reserve_space() after successfully
> - allocating the blocks */
> -#define EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE 0x0008
> /* caller is from the direct IO path, request to creation of an
> unitialized extents if not allocated, split the uninitialized
> extent if blocks has been preallocated already*/
> -#define EXT4_GET_BLOCKS_DIO 0x0010
> -#define EXT4_GET_BLOCKS_CONVERT 0x0020
> +#define EXT4_GET_BLOCKS_DIO 0x0008
> +#define EXT4_GET_BLOCKS_CONVERT 0x0010
> #define EXT4_GET_BLOCKS_DIO_CREATE_EXT (EXT4_GET_BLOCKS_DIO|\
> EXT4_GET_BLOCKS_CREATE_UNINIT_EXT)
> /* Convert extent to initialized after direct IO complete */
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 3b6ff72..765a482 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3142,7 +3142,7 @@ out:
> * But fallocate would have already updated quota and block
> * count for this offset. So cancel these reservation
> */
> - if (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE)
> + if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
> ext4_da_update_reserve_space(inode, allocated, 0);
>
> map_out:
> @@ -3388,7 +3388,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
> * Update reserved blocks/metadata blocks after successful
> * block allocation which had been deferred till now.
> */
> - if (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE)
> + if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
> ext4_da_update_reserve_space(inode, allocated, 1);
>
> /*
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 3d1a1d6..fec4ea1 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1309,7 +1309,7 @@ int ext4_get_blocks(handle_t *handle, struct inode *inode, sector_t block,
> * reserve space here.
> */
> if ((retval > 0) &&
> - (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE))
> + (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE))
> ext4_da_update_reserve_space(inode, retval, 1);
> }
> if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
> @@ -2224,10 +2224,10 @@ static int mpage_da_map_blocks(struct mpage_da_data *mpd)
> * variables are updated after the blocks have been allocated.
> */
> new.b_state = 0;
> - get_blocks_flags = (EXT4_GET_BLOCKS_CREATE |
> - EXT4_GET_BLOCKS_DELALLOC_RESERVE);
> + get_blocks_flags = EXT4_GET_BLOCKS_CREATE ;
> if (mpd->b_state & (1 << BH_Delay))
> - get_blocks_flags |= EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE;
> + get_blocks_flags |= EXT4_GET_BLOCKS_DELALLOC_RESERVE;
> +
> blks = ext4_get_blocks(handle, mpd->inode, next, max_blocks,
> &new, get_blocks_flags);
> if (blks < 0) {



2010-01-06 20:20:47

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH 3/3] ext4: unmap the underlying metadata when allocating blocks via fallocate

Aneesh Kumar K.V wrote:
> This become important when we are running with nojournal mode. We
> may end up allocating recently freed metablocks for fallocate. We
> want to make sure we unmap the old mapping so that when we convert
> the fallocated uninitialized extent to initialized extent we don't
> have the old mapping around. Leaving the old mapping can cause
> file system corruption

if you could devise a testcase for that, it'd be great - it should
go into the test suite ...

-Eric

> Now that we unmap old metadata blocks we need not return blocks
> allocated from fallocate area as new.
>
> Signed-off-by: Aneesh Kumar K.V <[email protected]>
> ---
> fs/ext4/ext4.h | 14 ++++++++++++++
> fs/ext4/extents.c | 18 ++++--------------
> fs/ext4/inode.c | 18 +-----------------
> 3 files changed, 19 insertions(+), 31 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 874d169..110a31f 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1773,6 +1773,20 @@ static inline void set_bitmap_uptodate(struct buffer_head *bh)
> set_bit(BH_BITMAP_UPTODATE, &(bh)->b_state);
> }
>
> +/*
> + * __unmap_underlying_bh_blocks - just a helper function to unmap
> + * set of blocks described by @bh
> + */
> +static inline void __unmap_underlying_bh_blocks(struct inode *inode,
> + struct buffer_head *bh)
> +{
> + struct block_device *bdev = inode->i_sb->s_bdev;
> + int blocks, i;
> +
> + blocks = bh->b_size >> inode->i_blkbits;
> + for (i = 0; i < blocks; i++)
> + unmap_underlying_metadata(bdev, bh->b_blocknr + i);
> +}
> #endif /* __KERNEL__ */
>
> #endif /* _EXT4_H */
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 765a482..7b4f4cc 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3120,21 +3120,9 @@ out:
> goto out2;
> } else
> allocated = ret;
> - set_buffer_new(bh_result);
> - /*
> - * if we allocated more blocks than requested
> - * we need to make sure we unmap the extra block
> - * allocated. The actual needed block will get
> - * unmapped later when we find the buffer_head marked
> - * new.
> - */
> - if (allocated > max_blocks) {
> - unmap_underlying_metadata_blocks(inode->i_sb->s_bdev,
> - newblock + max_blocks,
> - allocated - max_blocks);
> - allocated = max_blocks;
> - }
>
> + if (allocated > max_blocks)
> + allocated = max_blocks;
> /*
> * If we have done fallocate with the offset that is already
> * delayed allocated, we would have block reservation
> @@ -3570,6 +3558,8 @@ retry:
> ret2 = ext4_journal_stop(handle);
> break;
> }
> + if (buffer_new(&map_bh))
> + __unmap_underlying_bh_blocks(inode, &map_bh);
> if ((block + ret) >= (EXT4_BLOCK_ALIGN(offset + len,
> blkbits) >> blkbits))
> new_size = offset + len;
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index fec4ea1..831c7cd 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2104,22 +2104,6 @@ static void mpage_put_bnr_to_bhs(struct mpage_da_data *mpd, sector_t logical,
> }
> }
>
> -
> -/*
> - * __unmap_underlying_blocks - just a helper function to unmap
> - * set of blocks described by @bh
> - */
> -static inline void __unmap_underlying_blocks(struct inode *inode,
> - struct buffer_head *bh)
> -{
> - struct block_device *bdev = inode->i_sb->s_bdev;
> - int blocks, i;
> -
> - blocks = bh->b_size >> inode->i_blkbits;
> - for (i = 0; i < blocks; i++)
> - unmap_underlying_metadata(bdev, bh->b_blocknr + i);
> -}
> -
> static void ext4_da_block_invalidatepages(struct mpage_da_data *mpd,
> sector_t logical, long blk_cnt)
> {
> @@ -2274,7 +2258,7 @@ static int mpage_da_map_blocks(struct mpage_da_data *mpd)
> new.b_size = (blks << mpd->inode->i_blkbits);
>
> if (buffer_new(&new))
> - __unmap_underlying_blocks(mpd->inode, &new);
> + __unmap_underlying_bh_blocks(mpd->inode, &new);
>
> /*
> * If blocks are delayed marked, we need to


2010-01-13 14:55:53

by Aneesh Kumar K.V

[permalink] [raw]
Subject: [PATCH -V2] ext4: Fix quota accounting error with fallocate

When we fallocate a region of the file which we had recently written,
and which is still in the page cache marked as delayed allocated blocks
we need to make sure we don't do the quota update on writepage path.
This is because the needed quota updated would have already be done
by fallocate.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
---
Changes from V1:
Fix the i_block corruption due to not claiming quota
for the new meta data block allocated during uninit to init
extent conversion. Updated ext4_da_update_reserve_space to
claim for only metadata block.

fs/ext4/ext4.h | 2 ++
fs/ext4/extents.c | 21 +++++++++++++++++++++
fs/ext4/inode.c | 44 +++++++++++++++++++++++++++++++-------------
3 files changed, 54 insertions(+), 13 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index af7b626..b98de17 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1443,6 +1443,8 @@ extern int ext4_block_truncate_page(handle_t *handle,
extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
extern qsize_t *ext4_get_reserved_space(struct inode *inode);
extern int flush_aio_dio_completed_IO(struct inode *inode);
+extern void ext4_da_update_reserve_space(struct inode *inode,
+ int used, int quota_claim);
/* ioctl.c */
extern long ext4_ioctl(struct file *, unsigned int, unsigned long);
extern long ext4_compat_ioctl(struct file *, unsigned int, unsigned long);
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 7d7b74e..3b6ff72 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3132,7 +3132,19 @@ out:
unmap_underlying_metadata_blocks(inode->i_sb->s_bdev,
newblock + max_blocks,
allocated - max_blocks);
+ allocated = max_blocks;
}
+
+ /*
+ * If we have done fallocate with the offset that is already
+ * delayed allocated, we would have block reservation
+ * and quota reservation done in the delayed write path.
+ * But fallocate would have already updated quota and block
+ * count for this offset. So cancel these reservation
+ */
+ if (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE)
+ ext4_da_update_reserve_space(inode, allocated, 0);
+
map_out:
set_buffer_mapped(bh_result);
out1:
@@ -3368,9 +3380,18 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
/* previous routine could use block we allocated */
newblock = ext_pblock(&newex);
allocated = ext4_ext_get_actual_len(&newex);
+ if (allocated > max_blocks)
+ allocated = max_blocks;
set_buffer_new(bh_result);

/*
+ * Update reserved blocks/metadata blocks after successful
+ * block allocation which had been deferred till now.
+ */
+ if (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE)
+ ext4_da_update_reserve_space(inode, allocated, 1);
+
+ /*
* Cache the extent and update transaction to commit on fdatasync only
* when it is _not_ an uninitialized extent.
*/
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c818972..cc9440a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1053,11 +1053,12 @@ static int ext4_calc_metadata_amount(struct inode *inode, sector_t lblock)
* Called with i_data_sem down, which is important since we can call
* ext4_discard_preallocations() from here.
*/
-static void ext4_da_update_reserve_space(struct inode *inode, int used)
+void ext4_da_update_reserve_space(struct inode *inode,
+ int used, int quota_claim)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
struct ext4_inode_info *ei = EXT4_I(inode);
- int mdb_free = 0;
+ int mdb_free = 0, allocated_meta_blocks = 0;

spin_lock(&ei->i_block_reservation_lock);
if (unlikely(used > ei->i_reserved_data_blocks)) {
@@ -1073,6 +1074,7 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
ei->i_reserved_data_blocks -= used;
used += ei->i_allocated_meta_blocks;
ei->i_reserved_meta_blocks -= ei->i_allocated_meta_blocks;
+ allocated_meta_blocks = ei->i_allocated_meta_blocks;
ei->i_allocated_meta_blocks = 0;
percpu_counter_sub(&sbi->s_dirtyblocks_counter, used);

@@ -1090,9 +1092,23 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);

/* Update quota subsystem */
- vfs_dq_claim_block(inode, used);
- if (mdb_free)
- vfs_dq_release_reservation_block(inode, mdb_free);
+ if (quota_claim) {
+ vfs_dq_claim_block(inode, used);
+ if (mdb_free)
+ vfs_dq_release_reservation_block(inode, mdb_free);
+ } else {
+ /*
+ * We did fallocate with an offset that is already delayed
+ * allocated. So on delayed allocated writeback we should
+ * not update the quota for allocated blocks. But then
+ * converting an fallocate region to initialized region would
+ * have caused a metadata allocation. So claim quota for
+ * that
+ */
+ if (allocated_meta_blocks)
+ vfs_dq_claim_block(inode, allocated_meta_blocks);
+ vfs_dq_release_reservation_block(inode, mdb_free + used);
+ }

/*
* If we have done all the pending block allocations and if
@@ -1292,18 +1308,20 @@ int ext4_get_blocks(handle_t *handle, struct inode *inode, sector_t block,
*/
EXT4_I(inode)->i_state &= ~EXT4_STATE_EXT_MIGRATE;
}
- }

+ /*
+ * Update reserved blocks/metadata blocks after successful
+ * block allocation which had been deferred till now. We don't
+ * support fallocate for non extent files. So we can update
+ * reserve space here.
+ */
+ if ((retval > 0) &&
+ (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE))
+ ext4_da_update_reserve_space(inode, retval, 1);
+ }
if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
EXT4_I(inode)->i_delalloc_reserved_flag = 0;

- /*
- * Update reserved blocks/metadata blocks after successful
- * block allocation which had been deferred till now.
- */
- if ((retval > 0) && (flags & EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE))
- ext4_da_update_reserve_space(inode, retval);

2010-01-14 17:00:54

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH 3/3] ext4: unmap the underlying metadata when allocating blocks via fallocate

Aneesh Kumar K.V wrote:
> This become important when we are running with nojournal mode. We
> may end up allocating recently freed metablocks for fallocate. We
> want to make sure we unmap the old mapping so that when we convert
> the fallocated uninitialized extent to initialized extent we don't
> have the old mapping around. Leaving the old mapping can cause
> file system corruption
>
> Now that we unmap old metadata blocks we need not return blocks
> allocated from fallocate area as new.

With this, unmap_underlying_metadata_blocks is no longer used
and can be removed I guess.

-Eric

> Signed-off-by: Aneesh Kumar K.V <[email protected]>
> ---
> fs/ext4/ext4.h | 14 ++++++++++++++
> fs/ext4/extents.c | 18 ++++--------------
> fs/ext4/inode.c | 18 +-----------------
> 3 files changed, 19 insertions(+), 31 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 874d169..110a31f 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1773,6 +1773,20 @@ static inline void set_bitmap_uptodate(struct buffer_head *bh)
> set_bit(BH_BITMAP_UPTODATE, &(bh)->b_state);
> }
>
> +/*
> + * __unmap_underlying_bh_blocks - just a helper function to unmap
> + * set of blocks described by @bh
> + */
> +static inline void __unmap_underlying_bh_blocks(struct inode *inode,
> + struct buffer_head *bh)
> +{
> + struct block_device *bdev = inode->i_sb->s_bdev;
> + int blocks, i;
> +
> + blocks = bh->b_size >> inode->i_blkbits;
> + for (i = 0; i < blocks; i++)
> + unmap_underlying_metadata(bdev, bh->b_blocknr + i);
> +}
> #endif /* __KERNEL__ */
>
> #endif /* _EXT4_H */
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 765a482..7b4f4cc 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3120,21 +3120,9 @@ out:
> goto out2;
> } else
> allocated = ret;
> - set_buffer_new(bh_result);
> - /*
> - * if we allocated more blocks than requested
> - * we need to make sure we unmap the extra block
> - * allocated. The actual needed block will get
> - * unmapped later when we find the buffer_head marked
> - * new.
> - */
> - if (allocated > max_blocks) {
> - unmap_underlying_metadata_blocks(inode->i_sb->s_bdev,
> - newblock + max_blocks,
> - allocated - max_blocks);
> - allocated = max_blocks;
> - }
>
> + if (allocated > max_blocks)
> + allocated = max_blocks;
> /*
> * If we have done fallocate with the offset that is already
> * delayed allocated, we would have block reservation
> @@ -3570,6 +3558,8 @@ retry:
> ret2 = ext4_journal_stop(handle);
> break;
> }
> + if (buffer_new(&map_bh))
> + __unmap_underlying_bh_blocks(inode, &map_bh);
> if ((block + ret) >= (EXT4_BLOCK_ALIGN(offset + len,
> blkbits) >> blkbits))
> new_size = offset + len;
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index fec4ea1..831c7cd 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2104,22 +2104,6 @@ static void mpage_put_bnr_to_bhs(struct mpage_da_data *mpd, sector_t logical,
> }
> }
>
> -
> -/*
> - * __unmap_underlying_blocks - just a helper function to unmap
> - * set of blocks described by @bh
> - */
> -static inline void __unmap_underlying_blocks(struct inode *inode,
> - struct buffer_head *bh)
> -{
> - struct block_device *bdev = inode->i_sb->s_bdev;
> - int blocks, i;
> -
> - blocks = bh->b_size >> inode->i_blkbits;
> - for (i = 0; i < blocks; i++)
> - unmap_underlying_metadata(bdev, bh->b_blocknr + i);
> -}
> -
> static void ext4_da_block_invalidatepages(struct mpage_da_data *mpd,
> sector_t logical, long blk_cnt)
> {
> @@ -2274,7 +2258,7 @@ static int mpage_da_map_blocks(struct mpage_da_data *mpd)
> new.b_size = (blks << mpd->inode->i_blkbits);
>
> if (buffer_new(&new))
> - __unmap_underlying_blocks(mpd->inode, &new);
> + __unmap_underlying_bh_blocks(mpd->inode, &new);
>
> /*
> * If blocks are delayed marked, we need to


2010-01-14 18:31:07

by Aneesh Kumar K.V

[permalink] [raw]
Subject: [PATCH -V2] ext4: unmap the underlying metadata when allocating blocks via fallocate

On Thu, 14 Jan 2010 11:00:39 -0600, Eric Sandeen <[email protected]> wrote:
> Aneesh Kumar K.V wrote:
> > This become important when we are running with nojournal mode. We
> > may end up allocating recently freed metablocks for fallocate. We
> > want to make sure we unmap the old mapping so that when we convert
> > the fallocated uninitialized extent to initialized extent we don't
> > have the old mapping around. Leaving the old mapping can cause
> > file system corruption
> >
> > Now that we unmap old metadata blocks we need not return blocks
> > allocated from fallocate area as new.
>
> With this, unmap_underlying_metadata_blocks is no longer used
> and can be removed I guess.
>

Yes. Updated patch below

>From 947ca1e49381bd76b85b44a1d41336a105e421ad Mon Sep 17 00:00:00 2001
From: Aneesh Kumar K.V <[email protected]>
Date: Thu, 7 Jan 2010 00:48:12 +0530
Subject: [PATCH] ext4: unmap the underlying metadata when allocating blocks via fallocate

This become important when we are running with nojournal mode. We
may end up allocating recently freed metablocks for fallocate. We
want to make sure we unmap the old mapping so that when we convert
the fallocated uninitialized extent to initialized extent we don't
have the old mapping around. Leaving the old mapping can cause
file system corruption

Now that we unmap old metadata blocks we need not return blocks
allocated from fallocate area as new.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
---
fs/ext4/ext4.h | 14 ++++++++++++++
fs/ext4/extents.c | 26 ++++----------------------
fs/ext4/inode.c | 18 +-----------------
3 files changed, 19 insertions(+), 39 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 874d169..110a31f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1773,6 +1773,20 @@ static inline void set_bitmap_uptodate(struct buffer_head *bh)
set_bit(BH_BITMAP_UPTODATE, &(bh)->b_state);
}

+/*
+ * __unmap_underlying_bh_blocks - just a helper function to unmap
+ * set of blocks described by @bh
+ */
+static inline void __unmap_underlying_bh_blocks(struct inode *inode,
+ struct buffer_head *bh)
+{
+ struct block_device *bdev = inode->i_sb->s_bdev;
+ int blocks, i;
+
+ blocks = bh->b_size >> inode->i_blkbits;
+ for (i = 0; i < blocks; i++)
+ unmap_underlying_metadata(bdev, bh->b_blocknr + i);
+}
#endif /* __KERNEL__ */

#endif /* _EXT4_H */
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 765a482..8a20a5e 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3038,14 +3038,6 @@ out:
return err;
}

-static void unmap_underlying_metadata_blocks(struct block_device *bdev,
- sector_t block, int count)
-{
- int i;
- for (i = 0; i < count; i++)
- unmap_underlying_metadata(bdev, block + i);
-}
-
static int
ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode,
ext4_lblk_t iblock, unsigned int max_blocks,
@@ -3120,21 +3112,9 @@ out:
goto out2;
} else
allocated = ret;
- set_buffer_new(bh_result);
- /*
- * if we allocated more blocks than requested
- * we need to make sure we unmap the extra block
- * allocated. The actual needed block will get
- * unmapped later when we find the buffer_head marked
- * new.
- */
- if (allocated > max_blocks) {
- unmap_underlying_metadata_blocks(inode->i_sb->s_bdev,
- newblock + max_blocks,
- allocated - max_blocks);
- allocated = max_blocks;
- }

+ if (allocated > max_blocks)
+ allocated = max_blocks;
/*
* If we have done fallocate with the offset that is already
* delayed allocated, we would have block reservation
@@ -3570,6 +3550,8 @@ retry:
ret2 = ext4_journal_stop(handle);
break;
}
+ if (buffer_new(&map_bh))
+ __unmap_underlying_bh_blocks(inode, &map_bh);
if ((block + ret) >= (EXT4_BLOCK_ALIGN(offset + len,
blkbits) >> blkbits))
new_size = offset + len;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index cbf56da..f1b66e6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2111,22 +2111,6 @@ static void mpage_put_bnr_to_bhs(struct mpage_da_data *mpd, sector_t logical,
}
}

-
-/*
- * __unmap_underlying_blocks - just a helper function to unmap
- * set of blocks described by @bh
- */
-static inline void __unmap_underlying_blocks(struct inode *inode,
- struct buffer_head *bh)
-{
- struct block_device *bdev = inode->i_sb->s_bdev;
- int blocks, i;
-
- blocks = bh->b_size >> inode->i_blkbits;
- for (i = 0; i < blocks; i++)
- unmap_underlying_metadata(bdev, bh->b_blocknr + i);
-}

2010-01-14 19:32:11

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH -V2] ext4: unmap the underlying metadata when allocating blocks via fallocate

Aneesh Kumar K. V wrote:
> On Thu, 14 Jan 2010 11:00:39 -0600, Eric Sandeen <[email protected]> wrote:
>> Aneesh Kumar K.V wrote:
>>> This become important when we are running with nojournal mode. We
>>> may end up allocating recently freed metablocks for fallocate. We
>>> want to make sure we unmap the old mapping so that when we convert
>>> the fallocated uninitialized extent to initialized extent we don't
>>> have the old mapping around. Leaving the old mapping can cause
>>> file system corruption
>>>
>>> Now that we unmap old metadata blocks we need not return blocks
>>> allocated from fallocate area as new.
>> With this, unmap_underlying_metadata_blocks is no longer used
>> and can be removed I guess.
>>
>
> Yes. Updated patch below

Ok, with the latest versions of these 3 patches, fsstress / 013
is working here, modulo the blocks-past-eof-from-fallocate problem
that remains.

Thanks,
-Eric

2010-01-24 18:47:54

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH -V2] ext4: unmap the underlying metadata when allocating blocks via fallocate

On Fri, Jan 15, 2010 at 12:00:58AM +0530, Aneesh Kumar K. V wrote:
> From 947ca1e49381bd76b85b44a1d41336a105e421ad Mon Sep 17 00:00:00 2001
> From: Aneesh Kumar K.V <[email protected]>
> Date: Thu, 7 Jan 2010 00:48:12 +0530
> Subject: [PATCH] ext4: unmap the underlying metadata when allocating blocks via fallocate
>
> This become important when we are running with nojournal mode. We
> may end up allocating recently freed metablocks for fallocate. We
> want to make sure we unmap the old mapping so that when we convert
> the fallocated uninitialized extent to initialized extent we don't
> have the old mapping around. Leaving the old mapping can cause
> file system corruption
>
> Now that we unmap old metadata blocks we need not return blocks
> allocated from fallocate area as new.

Is this patch still strictly necessary given commit 515f41c? This
clears the blocks when we convert the extent from uninitialized to
initialized.

Hmmm, or is the problem that this fixes the problem only for the
zero'ed out blocks, but we can still get burned on the write path?
No, we fix that up in mpage_da_map_blocks.

So do we need this patch? It seems more efficient to clear it when we
actually write or zero out the blocks; I thought that was the
conclusion we came to when we were discussing curtw's patch (515f41c
above).

If we now decide that we need call unmap_underlying_blocks() at
fallocate time, maybe we should revert 515f41c? No point doing the
work twice; we probably have better uses for the CPU. :-)

- Ted

2010-01-25 05:12:09

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH -V2] ext4: unmap the underlying metadata when allocating blocks via fallocate

On Sun, 24 Jan 2010 13:24:46 -0500, [email protected] wrote:
> On Fri, Jan 15, 2010 at 12:00:58AM +0530, Aneesh Kumar K. V wrote:
> > From 947ca1e49381bd76b85b44a1d41336a105e421ad Mon Sep 17 00:00:00 2001
> > From: Aneesh Kumar K.V <[email protected]>
> > Date: Thu, 7 Jan 2010 00:48:12 +0530
> > Subject: [PATCH] ext4: unmap the underlying metadata when allocating blocks via fallocate
> >
> > This become important when we are running with nojournal mode. We
> > may end up allocating recently freed metablocks for fallocate. We
> > want to make sure we unmap the old mapping so that when we convert
> > the fallocated uninitialized extent to initialized extent we don't
> > have the old mapping around. Leaving the old mapping can cause
> > file system corruption
> >
> > Now that we unmap old metadata blocks we need not return blocks
> > allocated from fallocate area as new.
>
> Is this patch still strictly necessary given commit 515f41c? This
> clears the blocks when we convert the extent from uninitialized to
> initialized.
>
> Hmmm, or is the problem that this fixes the problem only for the
> zero'ed out blocks, but we can still get burned on the write path?
> No, we fix that up in mpage_da_map_blocks.
>
> So do we need this patch? It seems more efficient to clear it when we
> actually write or zero out the blocks; I thought that was the
> conclusion we came to when we were discussing curtw's patch (515f41c
> above).
>
> If we now decide that we need call unmap_underlying_blocks() at
> fallocate time, maybe we should revert 515f41c? No point doing the
> work twice; we probably have better uses for the CPU. :-)

My goal was to see if we can drop the BH_New flag when returning
preallocated blocks. But now i look at it again i guess you don't need
to take this patch. We cannot unmap underlying meta data during
fallocate because even if we remove the old mapping for the blocks,
we could reboot and later somebody(e2fsprogs) can directly read the blocks
and create a bh with old data. So i guess we should be doing
unmap_underlying_blocks when we are returning blocks to userspace or
when we actually zero out blocks.

-aneesh




2010-01-25 09:02:54

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH -V2] ext4: unmap the underlying metadata when allocating blocks via fallocate

On Mon, Jan 25, 2010 at 10:42:05AM +0530, Aneesh Kumar K. V wrote:
> My goal was to see if we can drop the BH_New flag when returning
> preallocated blocks. But now i look at it again i guess you don't need
> to take this patch. We cannot unmap underlying meta data during
> fallocate because even if we remove the old mapping for the blocks,
> we could reboot and later somebody(e2fsprogs) can directly read the blocks
> and create a bh with old data. So i guess we should be doing
> unmap_underlying_blocks when we are returning blocks to userspace or
> when we actually zero out blocks.

OK, so I'll drop this patch from the patch queue, and will do a full
test run (XFSQA -g auto using both 1k and 4k block sizes) of the
following patches:

Aneesh Kumar K.V (3):
ext4: Handle -EDQUOT error on write
ext4: Fix quota accounting error with fallocate
ext4: Drop EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE flag

Of all of the patches in the ext4 patch queue, these seem to be the
ones which need to be urgently pushed to Linus before 2.6.33.

Any objections, or nominations for other patches that should go in
before 2.6.33, please speak up now...

- Ted

2010-01-25 15:43:32

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH -V2] ext4: unmap the underlying metadata when allocating blocks via fallocate

[email protected] wrote:
> On Mon, Jan 25, 2010 at 10:42:05AM +0530, Aneesh Kumar K. V wrote:
>> My goal was to see if we can drop the BH_New flag when returning
>> preallocated blocks. But now i look at it again i guess you don't need
>> to take this patch. We cannot unmap underlying meta data during
>> fallocate because even if we remove the old mapping for the blocks,
>> we could reboot and later somebody(e2fsprogs) can directly read the blocks
>> and create a bh with old data. So i guess we should be doing
>> unmap_underlying_blocks when we are returning blocks to userspace or
>> when we actually zero out blocks.
>
> OK, so I'll drop this patch from the patch queue, and will do a full
> test run (XFSQA -g auto using both 1k and 4k block sizes) of the
> following patches:
>
> Aneesh Kumar K.V (3):
> ext4: Handle -EDQUOT error on write
> ext4: Fix quota accounting error with fallocate
> ext4: Drop EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE flag

You may wish to add the patch from Jiaying (via me) as well, to handle
the blocks-past-eof case, if it's had sufficient review.

-Eric

> Of all of the patches in the ext4 patch queue, these seem to be the
> ones which need to be urgently pushed to Linus before 2.6.33.
>
> Any objections, or nominations for other patches that should go in
> before 2.6.33, please speak up now...
>
> - Ted


2010-01-25 17:55:29

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH -V2] ext4: unmap the underlying metadata when allocating blocks via fallocate

On Mon, Jan 25, 2010 at 09:43:17AM -0600, Eric Sandeen wrote:
> > Aneesh Kumar K.V (3):
> > ext4: Handle -EDQUOT error on write
> > ext4: Fix quota accounting error with fallocate
> > ext4: Drop EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE flag
>
> You may wish to add the patch from Jiaying (via me) as well, to handle
> the blocks-past-eof case, if it's had sufficient review.

I'm going to merge it, but I only want to push regressions to Linus at
this stage. We can add the blocks-past-EOF for the next merge window,
and mark it for merging via -stable, but it's really late for
non-regression bug fixes at this point.

- Ted

2010-01-25 18:21:18

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH -V2] ext4: unmap the underlying metadata when allocating blocks via fallocate

On Mon, 25 Jan 2010 12:55:12 -0500, [email protected] wrote:
> On Mon, Jan 25, 2010 at 09:43:17AM -0600, Eric Sandeen wrote:
> > > Aneesh Kumar K.V (3):
> > > ext4: Handle -EDQUOT error on write
> > > ext4: Fix quota accounting error with fallocate
> > > ext4: Drop EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE flag
> >
> > You may wish to add the patch from Jiaying (via me) as well, to handle
> > the blocks-past-eof case, if it's had sufficient review.
>
> I'm going to merge it, but I only want to push regressions to Linus at
> this stage. We can add the blocks-past-EOF for the next merge window,
> and mark it for merging via -stable, but it's really late for
> non-regression bug fixes at this point.

I guess we can push i_size corruption with blocks past EOF in this merge
window. The other part of the patch get/set flags via chattr can be
pushed for the next merge window.

-aneesh