2009-04-29 11:13:41

by Aneesh Kumar K.V

[permalink] [raw]
Subject: [PATCH 1/2 for 2.6.31] ext4: Use -1 as the fake block number for delayed new buffer_head

Block number '0' should not be used as the fake block number for
the delayed new buffer. This will result in vfs calling umap_underlying_metadata for
block number '0'. So use -1 instead.

Signed-off-by: Aneesh Kumar K.V <[email protected]>

---
fs/ext4/inode.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e91f978..3afd5f1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2318,7 +2318,7 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
/* not enough space to reserve */
return ret;

- map_bh(bh_result, inode->i_sb, 0);
+ map_bh(bh_result, inode->i_sb, -1);
set_buffer_new(bh_result);
set_buffer_delay(bh_result);
} else if (ret > 0) {
--
tg: (56a50ad..) ext4_corruption_fix (depends on: master)


2009-04-29 11:13:52

by Aneesh Kumar K.V

[permalink] [raw]
Subject: [PATCH 2/2 for 2.6.31] ext4: Mark the unwritten buffer_head mapped during write_begin

This avoid multiple get_block calls during write. Now that we have unwritten buffer marked
as mapped, we need to make sure writepages will handle unwritten buffer_heads also.

With this patch we have the below:

ext4_ext_get_block returns unmapped, unwritten, buffer head when called with create = 0
for prealloc space. This make sure we handle the read path and non delalloc case correctly.
Even though the buffer head is marked unmapped we have valid b_blocknr and b_bdev values
in the buffer_head.

ext4_da_get_block_prep called for block resrevation will now return mapped, unwritten, new
buffer_head for prealloc space. This make sure we don't do multiple get_block calls for
write to same offset. Also marking it new make sure sub-block zeroing of buffered writes
happen correctly.


Signed-off-by: Aneesh Kumar K.V <[email protected]>

---
fs/ext4/extents.c | 6 +++-
fs/ext4/inode.c | 78 ++++++++++++++++++++++++++++++++++++----------------
2 files changed, 58 insertions(+), 26 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index e403321..1d002e0 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2865,6 +2865,8 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
if (create == EXT4_CREATE_UNINITIALIZED_EXT)
goto out;
if (!create) {
+ if (allocated > max_blocks)
+ allocated = max_blocks;
/*
* We have blocks reserved already. We
* return allocated blocks so that delalloc
@@ -2872,9 +2874,9 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
* the buffer head will be unmapped so that
* a read from the block returns 0s.
*/
- if (allocated > max_blocks)
- allocated = max_blocks;
set_buffer_unwritten(bh_result);
+ bh_result->b_bdev = inode->i_sb->s_bdev;
+ bh_result->b_blocknr = newblock;
goto out2;
}

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3afd5f1..00d4a94 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1839,7 +1839,7 @@ static int mpage_da_submit_io(struct mpage_da_data *mpd)
* @logical - first logical block to start assignment with
*
* the function goes through all passed space and put actual disk
- * block numbers into buffer heads, dropping BH_Delay
+ * block numbers into buffer heads, dropping BH_Delay and BH_Unwritten
*/
static void mpage_put_bnr_to_bhs(struct mpage_da_data *mpd, sector_t logical,
struct buffer_head *exbh)
@@ -1889,16 +1889,24 @@ static void mpage_put_bnr_to_bhs(struct mpage_da_data *mpd, sector_t logical,
do {
if (cur_logical >= logical + blocks)
break;
- if (buffer_delay(bh)) {
- bh->b_blocknr = pblock;
- clear_buffer_delay(bh);
- bh->b_bdev = inode->i_sb->s_bdev;
- } else if (buffer_unwritten(bh)) {
- bh->b_blocknr = pblock;
- clear_buffer_unwritten(bh);
- set_buffer_mapped(bh);
- set_buffer_new(bh);
- bh->b_bdev = inode->i_sb->s_bdev;
+
+ if (buffer_delay(bh) ||
+ buffer_unwritten(bh)) {
+
+ BUG_ON(bh->b_bdev != inode->i_sb->s_bdev);
+
+ if (buffer_delay(bh)) {
+ clear_buffer_delay(bh);
+ bh->b_blocknr = pblock;
+ } else {
+ /*
+ * unwritten already should have
+ * blocknr assigned. Verify that
+ */
+ clear_buffer_unwritten(bh);
+ BUG_ON(bh->b_blocknr != pblock);
+ }
+
} else if (buffer_mapped(bh))
BUG_ON(bh->b_blocknr != pblock);

@@ -2040,8 +2048,10 @@ static int mpage_da_map_blocks(struct mpage_da_data *mpd)
* We consider only non-mapped and non-allocated blocks
*/
if ((mpd->b_state & (1 << BH_Mapped)) &&
- !(mpd->b_state & (1 << BH_Delay)))
+ !(mpd->b_state & (1 << BH_Delay)) &&
+ !(mpd->b_state & (1 << BH_Unwritten)))
return 0;
+
new.b_state = mpd->b_state;
new.b_blocknr = 0;
new.b_size = mpd->b_size;
@@ -2179,6 +2189,17 @@ flush_it:
return;
}

+static int ext4_bh_unmapped_or_delay(handle_t *handle, struct buffer_head *bh)
+{
+ /*
+ * unmapped buffer is possible for holes.
+ * delay buffer is possible with delayed allocation.
+ * We also need to consider unwritten buffer as unmapped.
+ */
+ return ((!buffer_mapped(bh) || buffer_delay(bh) ||
+ buffer_unwritten(bh)) && buffer_dirty(bh));
+}
+
/*
* __mpage_da_writepage - finds extent of pages and blocks
*
@@ -2263,8 +2284,7 @@ static int __mpage_da_writepage(struct page *page,
* Otherwise we won't make progress
* with the page in ext4_da_writepage
*/
- if (buffer_dirty(bh) &&
- (!buffer_mapped(bh) || buffer_delay(bh))) {
+ if (ext4_bh_unmapped_or_delay(NULL, bh)) {
mpage_add_bh_to_extent(mpd, logical,
bh->b_size,
bh->b_state);
@@ -2292,6 +2312,14 @@ static int __mpage_da_writepage(struct page *page,
/*
* this is a special callback for ->write_begin() only
* it's intention is to return mapped block or reserve space
+ *
+ * For delayed buffer_head we have BH_Mapped, BH_New, BH_Delay set.
+ * We also have b_blocknr = -1 and b_bdev initialized properly
+ *
+ * For unwritten buffer_head we have BH_Mapped, BH_New, BH_Unwritten set.
+ * We also have b_blocknr = physicalblock mapping unwritten extent and b_bdev
+ * initialized properly.
+ *
*/
static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
@@ -2323,21 +2351,23 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
set_buffer_delay(bh_result);
} else if (ret > 0) {
bh_result->b_size = (ret << inode->i_blkbits);
+ if (buffer_unwritten(bh_result)) {
+ /* a delayed write to unwritten bh should
+ * be marked new and mapped. Mapped ensures
+ * that we don't do get_block multiple times
+ * when we write to the same offset and new
+ * ensures that we do proper zero out for
+ * partial write
+ */
+ set_buffer_new(bh_result);
+ set_buffer_mapped(bh_result);
+ }
ret = 0;
}

return ret;
}

-static int ext4_bh_unmapped_or_delay(handle_t *handle, struct buffer_head *bh)
-{
- /*
- * unmapped buffer is possible for holes.
- * delay buffer is possible with delayed allocation
- */
- return ((!buffer_mapped(bh) || buffer_delay(bh)) && buffer_dirty(bh));
-}

2009-05-01 19:05:56

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH 1/2 for 2.6.31] ext4: Use -1 as the fake block number for delayed new buffer_head

On Fri, May 01, 2009 at 02:58:41PM -0400, Theodore Tso wrote:
> You mean "for 2.6.30", right?
>

The reason for me to post two series one for 2.6.30 and other for 2.6.31
was the [Patch 2/2] for 2.6.31 needs more testing. I added this specific
patch in both the series to make sure we don't miss the change in case
we decided not to make any changes for 2.6.30. Also in my testing I created
different topgit branches with different dependencies. So having [PATCH
1/2 ] in both the series helped in testing with topgit branches.

-aneesh

2009-05-01 22:59:32

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 1/2 for 2.6.31] ext4: Use -1 as the fake block number for delayed new buffer_head

On Sat, May 02, 2009 at 12:35:46AM +0530, Aneesh Kumar K.V wrote:
> On Fri, May 01, 2009 at 02:58:41PM -0400, Theodore Tso wrote:
> > You mean "for 2.6.30", right?
> >
>
> The reason for me to post two series one for 2.6.30 and other for 2.6.31
> was the [Patch 2/2] for 2.6.31 needs more testing. I added this specific
> patch in both the series to make sure we don't miss the change in case
> we decided not to make any changes for 2.6.30. Also in my testing I created
> different topgit branches with different dependencies. So having [PATCH
> 1/2 ] in both the series helped in testing with topgit branches.

Yeah, but you didn't label the other series as "for 2.6.30". To makes
matter worse, the fact that patch #2 in what I think is your 2.6.30
patch series (the V4 series?) is the same as patch #1 of your 2.6.31
series, and your "2.6.31" series doesn't have a patch backing out the
2.6.30 changes (I assume you need to do that, right?), left me as a
very confused maintainer about.

OK, so what I have in the patch queue is the V4 version, somewhat
modified, and I'll ignore the "for 2.6.31" patches for now. When
you're ready, please send me patches versus the end of the stable
series of the ext4 patch queue, and please give me this kind of
context.

If you could verify what's in the patch queue, I'd appreciate it.

Thanks,

- Ted