2008-02-15 18:16:59

by Aneesh Kumar K.V

[permalink] [raw]
Subject: [PATCH] ext4: When reading from fallocated blocks make sure we return zero.

fallocate blocks are considered as sparse area and read from them should
return zero. ext4_ext_get_blocks should return zero for read request.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
---
fs/ext4/extents.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 3efbfd1..5b22f71 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2379,8 +2379,14 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
}
if (create == EXT4_CREATE_UNINITIALIZED_EXT)
goto out;
- if (!create)
+ if (!create) {
+ /*
+ * read request should return zero blocks
+ * allocated
+ */
+ allocated = 0;
goto out2;
+ }

ret = ext4_ext_convert_to_initialized(handle, inode,
path, iblock,
--
1.5.4.1.97.g40aab-dirty


2008-02-15 19:43:26

by Mingming Cao

[permalink] [raw]
Subject: Re: [PATCH] ext4: When reading from fallocated blocks make sure we return zero.

On Fri, 2008-02-15 at 23:46 +0530, Aneesh Kumar K.V wrote:
> fallocate blocks are considered as sparse area and read from them should
> return zero. ext4_ext_get_blocks should return zero for read request.
>

The patch itself looks harmless, but I still don't see how this could
fix the problem you described at irc: a write hit a BUG_ON() in
fs/buffer.c saying the buffer is not mapped. Could you add more details
here?

Thanks
Mingming

> Signed-off-by: Aneesh Kumar K.V <[email protected]>
> ---
> fs/ext4/extents.c | 8 +++++++-
> 1 files changed, 7 insertions(+), 1 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 3efbfd1..5b22f71 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2379,8 +2379,14 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
> }
> if (create == EXT4_CREATE_UNINITIALIZED_EXT)
> goto out;
> - if (!create)
> + if (!create) {
> + /*
> + * read request should return zero blocks
> + * allocated
> + */
> + allocated = 0;
> goto out2;
> + }
>
> ret = ext4_ext_convert_to_initialized(handle, inode,
> path, iblock,

2008-02-16 03:23:42

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH] ext4: When reading from fallocated blocks make sure we return zero.

On Fri, Feb 15, 2008 at 11:43:04AM -0800, Mingming Cao wrote:
> On Fri, 2008-02-15 at 23:46 +0530, Aneesh Kumar K.V wrote:
> > fallocate blocks are considered as sparse area and read from them should
> > return zero. ext4_ext_get_blocks should return zero for read request.
> >
>
> The patch itself looks harmless, but I still don't see how this could
> fix the problem you described at irc: a write hit a BUG_ON() in
> fs/buffer.c saying the buffer is not mapped. Could you add more details
> here?

Write will take the below call chain

ext4_write_begin
block_write_begin
__block_prepare_write
ext4_getblock
ext4_get_blocks_wrap
(1) ext4_ext_get_blocks with create = 0 return allocated
ll_rw_block if buffer not uptodate.
submit_bh
BUG_ON(!buffer_mapped(bh))


ext4_ext_get_blocks at (1) should have returned 0. That would cause
ext4_get_blocks_wrap to again call ext4_ext_get_blocks with create = 1
and that would have returned us the buffer head which is mapped. This
would also result in splitting the extent to initialized and
uninitialized one.


-aneesh

2008-02-18 07:45:31

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH] ext4: When reading from fallocated blocks make sure we return zero.

On Sat, Feb 16, 2008 at 08:53:34AM +0530, Aneesh Kumar K.V wrote:
> On Fri, Feb 15, 2008 at 11:43:04AM -0800, Mingming Cao wrote:
> > On Fri, 2008-02-15 at 23:46 +0530, Aneesh Kumar K.V wrote:
> > > fallocate blocks are considered as sparse area and read from them should
> > > return zero. ext4_ext_get_blocks should return zero for read request.
> > >
> >
> > The patch itself looks harmless, but I still don't see how this could
> > fix the problem you described at irc: a write hit a BUG_ON() in
> > fs/buffer.c saying the buffer is not mapped. Could you add more details
> > here?
>
> Write will take the below call chain
>
> ext4_write_begin
> block_write_begin
> __block_prepare_write
> ext4_getblock
> ext4_get_blocks_wrap
> (1) ext4_ext_get_blocks with create = 0 return allocated
> ll_rw_block if buffer not uptodate.
> submit_bh
> BUG_ON(!buffer_mapped(bh))
>
>
> ext4_ext_get_blocks at (1) should have returned 0. That would cause
> ext4_get_blocks_wrap to again call ext4_ext_get_blocks with create = 1
> and that would have returned us the buffer head which is mapped. This
> would also result in splitting the extent to initialized and
> uninitialized one.
>

The change is also needed to get mmap on fallocate space to work.

------------[ cut here ]------------
WARNING: at fs/buffer.c:1680 __block_write_full_page+0x101/0x2f1()
Modules linked in:
Pid: 2478, comm: mmaptest Not tainted 2.6.25-rc1 #12
[<c0120e84>] warn_on_slowpath+0x41/0x51
[<c0108c00>] ? native_sched_clock+0x2d/0x9f
[<c013b44e>] ? __lock_acquire+0xacb/0xb13
[<c013b44e>] ? __lock_acquire+0xacb/0xb13
[<c0180f97>] __block_write_full_page+0x101/0x2f1
[<c01d053f>] ? ext4_get_block+0x0/0xc0
[<c018124f>] block_write_full_page+0xc8/0xd1
[<c01d053f>] ? ext4_get_block+0x0/0xc0
[<c01d1a36>] ext4_ordered_writepage+0xad/0x146
[<c01cec12>] ? bget_one+0x0/0xb
[<c014c5dd>] __writepage+0xb/0x25
[<c014cab2>] write_cache_pages+0x180/0x287
[<c014c5d2>] ? __writepage+0x0/0x25
[<c01527d5>] ? __do_fault+0x2e2/0x324
[<c0147889>] ? unlock_page+0x25/0x28
[<c014cbd6>] generic_writepages+0x1d/0x27
[<c014cc0c>] do_writepages+0x2c/0x34
[<c0147fe1>] __filemap_fdatawrite_range+0x5b/0x67
[<c01481ba>] filemap_fdatawrite+0x15/0x17
[<c017ea3d>] do_fsync+0x2c/0x9a
[<c017eacb>] __do_fsync+0x20/0x2f
[<c017eaf9>] sys_fsync+0xd/0xf
[<c0104992>] sysenter_past_esp+0x5f/0xa5
=======================
---[ end trace 5ba60b430e0af601 ]---


-aneesh

2008-02-19 00:14:41

by Mingming Cao

[permalink] [raw]
Subject: Re: [PATCH] ext4: When reading from fallocated blocks make sure we return zero.

On Sat, 2008-02-16 at 08:53 +0530, Aneesh Kumar K.V wrote:
> On Fri, Feb 15, 2008 at 11:43:04AM -0800, Mingming Cao wrote:
> > On Fri, 2008-02-15 at 23:46 +0530, Aneesh Kumar K.V wrote:
> > > fallocate blocks are considered as sparse area and read from them should
> > > return zero. ext4_ext_get_blocks should return zero for read request.
> > >
> >
> > The patch itself looks harmless, but I still don't see how this could
> > fix the problem you described at irc: a write hit a BUG_ON() in
> > fs/buffer.c saying the buffer is not mapped. Could you add more details
> > here?
>
> Write will take the below call chain
>
> ext4_write_begin
> block_write_begin
> __block_prepare_write
> ext4_getblock
> ext4_get_blocks_wrap
> (1) ext4_ext_get_blocks with create = 0 return allocated
> ll_rw_block if buffer not uptodate.
> submit_bh
> BUG_ON(!buffer_mapped(bh))
>
>
> ext4_ext_get_blocks at (1) should have returned 0. That would cause
> ext4_get_blocks_wrap to again call ext4_ext_get_blocks with create = 1
> and that would have returned us the buffer head which is mapped. This
> would also result in splitting the extent to initialized and
> uninitialized one.
>

I see what is happening. Your fix seems treated preallocated blocks as
holes equally when get_blocks() with create = 0. This works for the
current case, but now I think the patch has side effect to delayed
allocation.

How about the following patch?

Regards,
Mingming

ext4: ext4_get_blocks_wrap fix for writing to preallocated
From: Mingming Cao <[email protected]>

This patch fixed a issue with wrting to a preallocated blocks.
A write hit a BUG_ON() in fs/buffer.c saying the buffer is not mapped.

On the write path, ext4_get_block_wrap() is called with create=1, but it
will pass create=0 down to the underlying ext4ext_get_blocks()
to do a look up first. In the preallocation case, ext4_ext_get_blocks()
with create = 0, will return number of blocks pre-allocated and buffer
head unmapped. ext4_get_blocks_wrap() thinks it succeeds too early, without
checking if it needs again call ext4_ext_get_blocks with create = 1
which would do proper handling for writing to preallocated blocks:
split the extent to initialized and uninitialized one and
returns the mapped buffer head.

Treating preallocated blocks as holes equally(i.e. ignoring the number of blocks
pre-allocated and returns 0) when get_blocks() is called with create = 0 is not enough.
ext4_ext_get_blocks() needs to differentiate these two cases for delayed allocation
purpose, as for holes it need to do reservation and prepare for later
delayed allocation, but for pre-allocated blocks it needs skip that work.

It would makes things more clear if we have clear definition of what
get_blocks() return value means.

Similar to ext4_get_blocks_handle(), the following
* return > 0, # of blocks already allocated
* if these are pre-allocated blocks and create = 0
* buffer head is unmapped
* otherwise blocks are mapped.
*
* return = 0, if plain look up failed (blocks have not been allocated)
* buffer head is unmapped
*
* return < 0, error case.

The for the write path, at ext4_ext_get_blocks_wrap(), it could check the
buffer_mapped() status for preallocated extent before quit too early.

Signed-off-by: Mingming Cao <[email protected]>

---
fs/ext4/extents.c | 13 +++++++++++++
fs/ext4/inode.c | 41 ++++++++++++++++++++++++++++++++++++++---
2 files changed, 51 insertions(+), 3 deletions(-)

Index: linux-2.6.25-rc2/fs/ext4/extents.c
===================================================================
--- linux-2.6.25-rc2.orig/fs/ext4/extents.c 2008-02-18 15:39:52.000000000 -0800
+++ linux-2.6.25-rc2/fs/ext4/extents.c 2008-02-18 15:43:33.000000000 -0800
@@ -2285,9 +2285,22 @@ out:
}

/*
+ * Block allocation/map/preallocation routine for extents based files
+ *
+ *
* Need to be called with
* down_read(&EXT4_I(inode)->i_data_sem) if not allocating file system block
* (ie, create is zero). Otherwise down_write(&EXT4_I(inode)->i_data_sem)
+ *
+ * return > 0, number of of blocks already mapped/allocated
+ * if these are pre-allocated blocks, buffer head is unmapped if
+ * create = 0 (look up only)
+ * otherwise blocks are mapped
+ *
+ * return = 0, if plain look up failed (blocks have not been allocated)
+ * buffer head is unmapped
+ *
+ * return < 0, error case.
*/
int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
ext4_lblk_t iblock,
Index: linux-2.6.25-rc2/fs/ext4/inode.c
===================================================================
--- linux-2.6.25-rc2.orig/fs/ext4/inode.c 2008-02-18 15:07:00.000000000 -0800
+++ linux-2.6.25-rc2/fs/ext4/inode.c 2008-02-18 15:43:59.000000000 -0800
@@ -908,6 +908,26 @@ out:
*/
#define DIO_CREDITS 25

+
+/*
+ * ext4 get_block() wrapper function
+ * It first do a look up, returns if the blocks already mapped. Otherwise
+ * it takes the write sem and do block allocation
+ *
+ * If file type is extents based, call with ext4_ext_get_blocks()
+ * Otherwise, call with ext4_get_blocks_handle() to handle indirect mapping
+ * based files
+ *
+ * return > 0, number of of blocks already mapped/allocated
+ * if these are pre-allocated blocks, buffer head is unmapped if
+ * create = 0 (look up only)
+ * otherwise blocks are mapped
+ *
+ * return = 0, if plain look up failed (blocks have not been allocated)
+ * buffer head is unmapped
+ *
+ * return < 0, error case.
+ */
int ext4_get_blocks_wrap(handle_t *handle, struct inode *inode, sector_t block,
unsigned long max_blocks, struct buffer_head *bh,
int create, int extend_disksize)
@@ -926,12 +946,27 @@ int ext4_get_blocks_wrap(handle_t *handl
inode, block, max_blocks, bh, 0, 0);
}
up_read((&EXT4_I(inode)->i_data_sem));
- if (!create || (retval > 0))
+
+ /* If it is only a block(s) look up */
+ if (!create )
+ return retval;
+
+ /*
+ * Returns if the blocks have already allocated
+ *
+ * Note that if blocks have been preallocated
+ * ext4_ext_get_block() returns with buffer head unmapped.
+ * Write to a preallocated space needs to split
+ * the preallocated extents, thus needs to update
+ * i_data
+ */
+ if (retval > 0 && buffer_mapped(bh))
return retval;

/*
- * We need to allocate new blocks which will result
- * in i_data update
+ * New blocks and preallocation handling will possiblely result
+ * in i_data update, take the write sem, and call get_blocks()
+ * with create = 1
*/
down_write((&EXT4_I(inode)->i_data_sem));
/*

2008-02-19 03:19:23

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH] ext4: When reading from fallocated blocks make sure we return zero.

On Mon, Feb 18, 2008 at 04:14:34PM -0800, Mingming Cao wrote:
> On Sat, 2008-02-16 at 08:53 +0530, Aneesh Kumar K.V wrote:
>
> How about the following patch?
>
> Regards,
> Mingming
>
> ext4: ext4_get_blocks_wrap fix for writing to preallocated
> From: Mingming Cao <[email protected]>
>
> This patch fixed a issue with wrting to a preallocated blocks.
> A write hit a BUG_ON() in fs/buffer.c saying the buffer is not mapped.
>
> On the write path, ext4_get_block_wrap() is called with create=1, but it
> will pass create=0 down to the underlying ext4ext_get_blocks()
> to do a look up first. In the preallocation case, ext4_ext_get_blocks()
> with create = 0, will return number of blocks pre-allocated and buffer
> head unmapped. ext4_get_blocks_wrap() thinks it succeeds too early, without
> checking if it needs again call ext4_ext_get_blocks with create = 1
> which would do proper handling for writing to preallocated blocks:
> split the extent to initialized and uninitialized one and
> returns the mapped buffer head.
>
> Treating preallocated blocks as holes equally(i.e. ignoring the number of blocks
> pre-allocated and returns 0) when get_blocks() is called with create = 0 is not enough.
> ext4_ext_get_blocks() needs to differentiate these two cases for delayed allocation
> purpose, as for holes it need to do reservation and prepare for later
> delayed allocation, but for pre-allocated blocks it needs skip that work.
>
> It would makes things more clear if we have clear definition of what
> get_blocks() return value means.
>
> Similar to ext4_get_blocks_handle(), the following
> * return > 0, # of blocks already allocated
> * if these are pre-allocated blocks and create = 0
> * buffer head is unmapped
> * otherwise blocks are mapped.
> *
> * return = 0, if plain look up failed (blocks have not been allocated)
> * buffer head is unmapped
> *
> * return < 0, error case.
>
> The for the write path, at ext4_ext_get_blocks_wrap(), it could check the
> buffer_mapped() status for preallocated extent before quit too early.
>
> Signed-off-by: Mingming Cao <[email protected]>

Reviewed-by: Aneesh Kumar K.V <[email protected]>



I guess we also need to make sure the buffer head have the mapped bit
set. Something like the patch below.

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index bc7081f..69ccda9 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2294,6 +2294,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
struct ext4_allocation_request ar;

__clear_bit(BH_New, &bh_result->b_state);
+ __clear_bit(BH_Mapped, &bh_result->b_state);
ext_debug("blocks %u/%lu requested for inode %u\n",
iblock, max_blocks, inode->i_ino);


2008-02-19 05:24:16

by Mingming Cao

[permalink] [raw]
Subject: Re: [PATCH] ext4: When reading from fallocated blocks make sure we return zero.

On Tue, 2008-02-19 at 08:49 +0530, Aneesh Kumar K.V wrote:
> On Mon, Feb 18, 2008 at 04:14:34PM -0800, Mingming Cao wrote:
> > On Sat, 2008-02-16 at 08:53 +0530, Aneesh Kumar K.V wrote:
> >
> > How about the following patch?
> >
> > Regards,
> > Mingming
> >
> > ext4: ext4_get_blocks_wrap fix for writing to preallocated
> > From: Mingming Cao <[email protected]>
> >
> > This patch fixed a issue with wrting to a preallocated blocks.
> > A write hit a BUG_ON() in fs/buffer.c saying the buffer is not mapped.
> >
> > On the write path, ext4_get_block_wrap() is called with create=1, but it
> > will pass create=0 down to the underlying ext4ext_get_blocks()
> > to do a look up first. In the preallocation case, ext4_ext_get_blocks()
> > with create = 0, will return number of blocks pre-allocated and buffer
> > head unmapped. ext4_get_blocks_wrap() thinks it succeeds too early, without
> > checking if it needs again call ext4_ext_get_blocks with create = 1
> > which would do proper handling for writing to preallocated blocks:
> > split the extent to initialized and uninitialized one and
> > returns the mapped buffer head.
> >
> > Treating preallocated blocks as holes equally(i.e. ignoring the number of blocks
> > pre-allocated and returns 0) when get_blocks() is called with create = 0 is not enough.
> > ext4_ext_get_blocks() needs to differentiate these two cases for delayed allocation
> > purpose, as for holes it need to do reservation and prepare for later
> > delayed allocation, but for pre-allocated blocks it needs skip that work.
> >
> > It would makes things more clear if we have clear definition of what
> > get_blocks() return value means.
> >
> > Similar to ext4_get_blocks_handle(), the following
> > * return > 0, # of blocks already allocated
> > * if these are pre-allocated blocks and create = 0
> > * buffer head is unmapped
> > * otherwise blocks are mapped.
> > *
> > * return = 0, if plain look up failed (blocks have not been allocated)
> > * buffer head is unmapped
> > *
> > * return < 0, error case.
> >
> > The for the write path, at ext4_ext_get_blocks_wrap(), it could check the
> > buffer_mapped() status for preallocated extent before quit too early.
> >
> > Signed-off-by: Mingming Cao <[email protected]>
>
> Reviewed-by: Aneesh Kumar K.V <[email protected]>
>
>
>
> I guess we also need to make sure the buffer head have the mapped bit
> set. Something like the patch below.
>
Good point. I modified the patch with clear_buffer_mapped() added at the
begining of the wrapper function.

Mingming
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index bc7081f..69ccda9 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2294,6 +2294,7 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
> struct ext4_allocation_request ar;
>
> __clear_bit(BH_New, &bh_result->b_state);
> + __clear_bit(BH_Mapped, &bh_result->b_state);
> ext_debug("blocks %u/%lu requested for inode %u\n",
> iblock, max_blocks, inode->i_ino);
>