2009-05-27 15:58:20

by Aneesh Kumar K.V

[permalink] [raw]
Subject: [PATCH 1/3] ext4: Don't look at buffer_heads outside i_size.

Buffer heads outside i_size will be unmapped. So when we
are doing "walk_page_buffers" limit ourself to i_size.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
---
fs/ext4/inode.c | 29 ++++++++++++++++++-----------
1 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3b07a6c..f0f0065 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2514,7 +2514,7 @@ static int ext4_da_writepage(struct page *page,
* all are mapped and non delay. We don't want to
* do block allocation here.
*/
- ret = block_prepare_write(page, 0, PAGE_CACHE_SIZE,
+ ret = block_prepare_write(page, 0, len,
noalloc_get_block_write);
if (!ret) {
page_bufs = page_buffers(page);
@@ -2536,7 +2536,7 @@ static int ext4_da_writepage(struct page *page,
return 0;
}
/* now mark the buffer_heads as dirty and uptodate */
- block_commit_write(page, 0, PAGE_CACHE_SIZE);
+ block_commit_write(page, 0, len);
}

if (test_opt(inode->i_sb, NOBH) && ext4_should_writeback_data(inode))
@@ -3210,6 +3210,8 @@ static int ext4_normal_writepage(struct page *page,
static int __ext4_journalled_writepage(struct page *page,
struct writeback_control *wbc)
{
+ loff_t size;
+ unsigned int len;
struct address_space *mapping = page->mapping;
struct inode *inode = mapping->host;
struct buffer_head *page_bufs;
@@ -3217,14 +3219,19 @@ static int __ext4_journalled_writepage(struct page *page,
int ret = 0;
int err;

- ret = block_prepare_write(page, 0, PAGE_CACHE_SIZE,
+ size = i_size_read(inode);
+ if (page->index == size >> PAGE_CACHE_SHIFT)
+ len = size & ~PAGE_CACHE_MASK;
+ else
+ len = PAGE_CACHE_SIZE;
+
+ ret = block_prepare_write(page, 0, len,
noalloc_get_block_write);
if (ret != 0)
goto out_unlock;

page_bufs = page_buffers(page);
- walk_page_buffers(handle, page_bufs, 0, PAGE_CACHE_SIZE, NULL,
- bget_one);
+ walk_page_buffers(handle, page_bufs, 0, len, NULL, bget_one);
/* As soon as we unlock the page, it can go away, but we have
* references to buffers so we are safe */
unlock_page(page);
@@ -3235,19 +3242,19 @@ static int __ext4_journalled_writepage(struct page *page,
goto out;
}

- ret = walk_page_buffers(handle, page_bufs, 0,
- PAGE_CACHE_SIZE, NULL, do_journal_get_write_access);
+ ret = walk_page_buffers(handle, page_bufs, 0, len,
+ NULL, do_journal_get_write_access);

- err = walk_page_buffers(handle, page_bufs, 0,
- PAGE_CACHE_SIZE, NULL, write_end_fn);
+ err = walk_page_buffers(handle, page_bufs, 0, len,
+ NULL, write_end_fn);
if (ret == 0)
ret = err;
err = ext4_journal_stop(handle);
if (!ret)
ret = err;

- walk_page_buffers(handle, page_bufs, 0,
- PAGE_CACHE_SIZE, NULL, bput_one);
+ walk_page_buffers(handle, page_bufs, 0, len,
+ NULL, bput_one);
EXT4_I(inode)->i_state |= EXT4_STATE_JDATA;
goto out;

--
1.6.3.1.145.gb74d77



2009-05-27 15:58:24

by Aneesh Kumar K.V

[permalink] [raw]
Subject: [PATCH 2/3] ext4: Check for only delay or unwritten buffer_heads

Even with changes to make pages writeprotech on truncate/i_size update we
can still see buffer_heads which are not mapped in the writepage
callback. Consider the below case.

1) truncate(f, 1024)
2) mmap(f, 0, 4096)
3) a[0] = 'a'
4) truncate(f, 4096)
5) writepage(...)

Now if we get a writepage callback immediately after (4) and before an
attempt to write at any other offset via mmap address (which implies we
are yet to get a pagefault and do a get_block) what we would have is the
page which is dirty have first block allocated and the other three
buffer_heads unmapped.

In the above case the writepage should go ahead and try to write
the first blocks and clear the page_dirty flag. Because the further
attempt to write to the page will again create a fault and result in
allocating blocks and marking page dirty. Also if we don't write
any other offset via mmap address we would still have written the first
block to the disk and rest of the space will be considered as a hole.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
---
fs/ext4/inode.c | 21 +++++++--------------
1 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f0f0065..1efb296 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2239,15 +2239,9 @@ static void mpage_add_bh_to_extent(struct mpage_da_data *mpd,
return;
}

-static int ext4_bh_unmapped_or_delay(handle_t *handle, struct buffer_head *bh)
+static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh)
{
- /*
- * unmapped buffer is possible for holes.
- * delay buffer is possible with delayed allocation.
- * We also need to consider unwritten buffer as unmapped.
- */
- return (!buffer_mapped(bh) || buffer_delay(bh) ||
- buffer_unwritten(bh)) && buffer_dirty(bh);
+ return (buffer_delay(bh) || buffer_unwritten(bh)) && buffer_dirty(bh);
}

/*
@@ -2334,7 +2328,7 @@ static int __mpage_da_writepage(struct page *page,
* Otherwise we won't make progress
* with the page in ext4_da_writepage
*/
- if (ext4_bh_unmapped_or_delay(NULL, bh)) {
+ if (ext4_bh_delay_or_unwritten(NULL, bh)) {
mpage_add_bh_to_extent(mpd, logical,
bh->b_size,
bh->b_state);
@@ -2451,7 +2445,6 @@ static int noalloc_get_block_write(struct inode *inode, sector_t iblock,
* so call get_block_wrap with create = 0
*/
ret = ext4_get_blocks(NULL, inode, iblock, max_blocks, bh_result, 0);
- BUG_ON(create && ret == 0);
if (ret > 0) {
bh_result->b_size = (ret << inode->i_blkbits);
ret = 0;
@@ -2487,7 +2480,7 @@ static int ext4_da_writepage(struct page *page,
if (page_has_buffers(page)) {
page_bufs = page_buffers(page);
if (walk_page_buffers(NULL, page_bufs, 0, len, NULL,
- ext4_bh_unmapped_or_delay)) {
+ ext4_bh_delay_or_unwritten)) {
/*
* We don't want to do block allocation
* So redirty the page and return
@@ -2520,7 +2513,7 @@ static int ext4_da_writepage(struct page *page,
page_bufs = page_buffers(page);
/* check whether all are mapped and non delay */
if (walk_page_buffers(NULL, page_bufs, 0, len, NULL,
- ext4_bh_unmapped_or_delay)) {
+ ext4_bh_delay_or_unwritten)) {
redirty_page_for_writepage(wbc, page);
unlock_page(page);
return 0;
@@ -3196,7 +3189,7 @@ static int ext4_normal_writepage(struct page *page,
* happily proceed with mapping them and writing the page.
*/
BUG_ON(walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
- ext4_bh_unmapped_or_delay));
+ ext4_bh_delay_or_unwritten));
}

if (!ext4_journal_current_handle())
@@ -3291,7 +3284,7 @@ static int ext4_journalled_writepage(struct page *page,
* happily proceed with mapping them and writing the page.
*/
BUG_ON(walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
- ext4_bh_unmapped_or_delay));
+ ext4_bh_delay_or_unwritten));
}

if (ext4_journal_current_handle())
--
1.6.3.1.145.gb74d77


2009-05-27 15:58:30

by Aneesh Kumar K.V

[permalink] [raw]
Subject: [PATCH 3/3] ext4: Add generic writepage callback

Even with changes to make pages writeprotect on truncate/i_size update we
can still see buffer_heads which are not mapped in the writepage
callback. Consider the below case.

1) truncate(f, 1024)
2) mmap(f, 0, 4096)
3) a[0] = 'a'
4) truncate(f, 4096)
5) writepage(...)

Now if we get a writepage callback immediately after (4) and before an
attempt to write at any other offset via mmap address (which implies we
are yet to get a pagefault and do a get_block) what we would have is the
page which is dirty have first block allocated and the other three
buffer_heads unmapped.

In the above case the writepage should go ahead and try to write
the first blocks and clear the page_dirty flag. Because the further
attempt to write to the page will again create a fault and result in
allocating blocks and marking page dirty. Also if we don't write
any other offset via mmap address we would still have written the first
block to the disk and rest of the space will be considered as a hole.

Delayed allocation writepage callback already did most of these. So
extend it to allow block_write_full_page on pages with unmapped buffer_heads.
We pass noalloc_get_block_write as the call back which ensures we don't
do any block allocation in writepage callback. That is need because of
the locking order between page lock and journal_start.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
---
fs/ext4/inode.c | 338 +++++++++++++++++--------------------------------------
1 files changed, 104 insertions(+), 234 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1efb296..c1ddaaf 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2326,7 +2326,7 @@ static int __mpage_da_writepage(struct page *page,
* We need to try to allocate
* unmapped blocks in the same page.
* Otherwise we won't make progress
- * with the page in ext4_da_writepage
+ * with the page in ext4_writepage
*/
if (ext4_bh_delay_or_unwritten(NULL, bh)) {
mpage_add_bh_to_extent(mpd, logical,
@@ -2452,14 +2452,102 @@ static int noalloc_get_block_write(struct inode *inode, sector_t iblock,
return ret;
}

+static int bget_one(handle_t *handle, struct buffer_head *bh)
+{
+ get_bh(bh);
+ return 0;
+}
+
+static int bput_one(handle_t *handle, struct buffer_head *bh)
+{
+ put_bh(bh);
+ return 0;
+}
+
+static int __ext4_journalled_writepage(struct page *page,
+ struct writeback_control *wbc,
+ unsigned int len)
+{
+ struct address_space *mapping = page->mapping;
+ struct inode *inode = mapping->host;
+ struct buffer_head *page_bufs;
+ handle_t *handle = NULL;
+ int ret = 0;
+ int err;
+
+ page_bufs = page_buffers(page);
+ BUG_ON(!page_bufs);
+ walk_page_buffers(handle, page_bufs, 0, len, NULL, bget_one);
+ /* As soon as we unlock the page, it can go away, but we have
+ * references to buffers so we are safe */
+ unlock_page(page);
+
+ handle = ext4_journal_start(inode, ext4_writepage_trans_blocks(inode));
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ goto out;
+ }
+
+ ret = walk_page_buffers(handle, page_bufs, 0, len,
+ NULL, do_journal_get_write_access);
+
+ err = walk_page_buffers(handle, page_bufs, 0, len,
+ NULL, write_end_fn);
+ if (ret == 0)
+ ret = err;
+ err = ext4_journal_stop(handle);
+ if (!ret)
+ ret = err;
+
+ walk_page_buffers(handle, page_bufs, 0, len,
+ NULL, bput_one);
+ EXT4_I(inode)->i_state |= EXT4_STATE_JDATA;
+out:
+ return ret;
+}
+
/*
+ * Note that we don't need to start a transaction unless we're journaling data
+ * because we should have holes filled from ext4_page_mkwrite(). We even don't
+ * need to file the inode to the transaction's list in ordered mode because if
+ * we are writing back data added by write(), the inode is already there and if
+ * we are writing back data modified via mmap(), noone guarantees in which
+ * transaction the data will hit the disk. In case we are journaling data, we
+ * cannot start transaction directly because transaction start ranks above page
+ * lock so we have to do some magic.
+ *
* This function can get called via...
* - ext4_da_writepages after taking page lock (have journal handle)
* - journal_submit_inode_data_buffers (no journal handle)
* - shrink_page_list via pdflush (no journal handle)
* - grab_page_cache when doing write_begin (have journal handle)
+ *
+ * We don't do any block allocation in this function. If we have page with
+ * multiple blocks we need to write those buffer_heads that are mapped. This
+ * is important for mmaped based write. So if we do with blocksize 1K
+ * truncate(f, 1024);
+ * a = mmap(f, 0, 4096);
+ * a[0] = 'a';
+ * truncate(f, 4096);
+ * we have in the page first buffer_head mapped via page_mkwrite call back
+ * but other bufer_heads would be unmapped but dirty(dirty done via the
+ * do_wp_page). So writepage should write the first block. If we modify
+ * the mmap area beyond 1024 we will again get a page_fault and the
+ * page_mkwrite callback will do the block allocation and mark the
+ * buffer_heads mapped.
+ *
+ * We redirty the page if we have any buffer_heads that is either delay or
+ * unwritten in the page.
+ *
+ * We can get recursively called as show below.
+ *
+ * ext4_writepage() -> kmalloc() -> __alloc_pages() -> page_launder() ->
+ * ext4_writepage()
+ *
+ * But since we don't do any block allocation we should not deadlock.
+ * Page also have the dirty flag cleared so we don't get recurive page_lock.
*/
-static int ext4_da_writepage(struct page *page,
+static int ext4_writepage(struct page *page,
struct writeback_control *wbc)
{
int ret = 0;
@@ -2468,7 +2556,7 @@ static int ext4_da_writepage(struct page *page,
struct buffer_head *page_bufs;
struct inode *inode = page->mapping->host;

- trace_mark(ext4_da_writepage,
+ trace_mark(ext4_writepage,
"dev %s ino %lu page_index %lu",
inode->i_sb->s_id, inode->i_ino, page->index);
size = i_size_read(inode);
@@ -2532,6 +2620,15 @@ static int ext4_da_writepage(struct page *page,
block_commit_write(page, 0, len);
}

+ if (PageChecked(page) && ext4_should_journal_data(inode)) {
+ /*
+ * It's mmapped pagecache. Add buffers and journal it. There
+ * doesn't seem much point in redirtying the page here.
+ */
+ ClearPageChecked(page);
+ return __ext4_journalled_writepage(page, wbc, len);
+ }
+
if (test_opt(inode->i_sb, NOBH) && ext4_should_writeback_data(inode))
ret = nobh_writepage(page, noalloc_get_block_write, wbc);
else
@@ -3085,233 +3182,6 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
return generic_block_bmap(mapping, block, ext4_get_block);
}

-static int bget_one(handle_t *handle, struct buffer_head *bh)
-{
- get_bh(bh);
- return 0;
-}
-
-static int bput_one(handle_t *handle, struct buffer_head *bh)
-{
- put_bh(bh);
- return 0;
-}
-
-/*
- * Note that we don't need to start a transaction unless we're journaling data
- * because we should have holes filled from ext4_page_mkwrite(). We even don't
- * need to file the inode to the transaction's list in ordered mode because if
- * we are writing back data added by write(), the inode is already there and if
- * we are writing back data modified via mmap(), noone guarantees in which
- * transaction the data will hit the disk. In case we are journaling data, we
- * cannot start transaction directly because transaction start ranks above page
- * lock so we have to do some magic.
- *
- * In all journaling modes block_write_full_page() will start the I/O.
- *
- * Problem:
- *
- * ext4_writepage() -> kmalloc() -> __alloc_pages() -> page_launder() ->
- * ext4_writepage()
- *
- * Similar for:
- *
- * ext4_file_write() -> generic_file_write() -> __alloc_pages() -> ...
- *
- * Same applies to ext4_get_block(). We will deadlock on various things like
- * lock_journal and i_data_sem
- *
- * Setting PF_MEMALLOC here doesn't work - too many internal memory
- * allocations fail.
- *
- * 16May01: If we're reentered then journal_current_handle() will be
- * non-zero. We simply *return*.
- *
- * 1 July 2001: @@@ FIXME:
- * In journalled data mode, a data buffer may be metadata against the
- * current transaction. But the same file is part of a shared mapping
- * and someone does a writepage() on it.
- *
- * We will move the buffer onto the async_data list, but *after* it has
- * been dirtied. So there's a small window where we have dirty data on
- * BJ_Metadata.
- *
- * Note that this only applies to the last partial page in the file. The
- * bit which block_write_full_page() uses prepare/commit for. (That's
- * broken code anyway: it's wrong for msync()).
- *
- * It's a rare case: affects the final partial page, for journalled data
- * where the file is subject to bith write() and writepage() in the same
- * transction. To fix it we'll need a custom block_write_full_page().
- * We'll probably need that anyway for journalling writepage() output.
- *
- * We don't honour synchronous mounts for writepage(). That would be
- * disastrous. Any write() or metadata operation will sync the fs for
- * us.
- *
- */
-static int __ext4_normal_writepage(struct page *page,
- struct writeback_control *wbc)
-{
- struct inode *inode = page->mapping->host;
-
- if (test_opt(inode->i_sb, NOBH))
- return nobh_writepage(page, noalloc_get_block_write, wbc);
- else
- return block_write_full_page(page, noalloc_get_block_write,
- wbc);
-}
-
-static int ext4_normal_writepage(struct page *page,
- struct writeback_control *wbc)
-{
- struct inode *inode = page->mapping->host;
- loff_t size = i_size_read(inode);
- loff_t len;
-
- trace_mark(ext4_normal_writepage,
- "dev %s ino %lu page_index %lu",
- inode->i_sb->s_id, inode->i_ino, page->index);
- J_ASSERT(PageLocked(page));
- if (page->index == size >> PAGE_CACHE_SHIFT)
- len = size & ~PAGE_CACHE_MASK;
- else
- len = PAGE_CACHE_SIZE;
-
- if (page_has_buffers(page)) {
- /* if page has buffers it should all be mapped
- * and allocated. If there are not buffers attached
- * to the page we know the page is dirty but it lost
- * buffers. That means that at some moment in time
- * after write_begin() / write_end() has been called
- * all buffers have been clean and thus they must have been
- * written at least once. So they are all mapped and we can
- * happily proceed with mapping them and writing the page.
- */
- BUG_ON(walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
- ext4_bh_delay_or_unwritten));
- }
-
- if (!ext4_journal_current_handle())
- return __ext4_normal_writepage(page, wbc);
-
- redirty_page_for_writepage(wbc, page);
- unlock_page(page);
- return 0;
-}
-
-static int __ext4_journalled_writepage(struct page *page,
- struct writeback_control *wbc)
-{
- loff_t size;
- unsigned int len;
- struct address_space *mapping = page->mapping;
- struct inode *inode = mapping->host;
- struct buffer_head *page_bufs;
- handle_t *handle = NULL;
- int ret = 0;
- int err;
-
- size = i_size_read(inode);
- if (page->index == size >> PAGE_CACHE_SHIFT)
- len = size & ~PAGE_CACHE_MASK;
- else
- len = PAGE_CACHE_SIZE;
-
- ret = block_prepare_write(page, 0, len,
- noalloc_get_block_write);
- if (ret != 0)
- goto out_unlock;
-
- page_bufs = page_buffers(page);
- walk_page_buffers(handle, page_bufs, 0, len, NULL, bget_one);
- /* As soon as we unlock the page, it can go away, but we have
- * references to buffers so we are safe */
- unlock_page(page);
-
- handle = ext4_journal_start(inode, ext4_writepage_trans_blocks(inode));
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- goto out;
- }
-
- ret = walk_page_buffers(handle, page_bufs, 0, len,
- NULL, do_journal_get_write_access);
-
- err = walk_page_buffers(handle, page_bufs, 0, len,
- NULL, write_end_fn);
- if (ret == 0)
- ret = err;
- err = ext4_journal_stop(handle);
- if (!ret)
- ret = err;
-
- walk_page_buffers(handle, page_bufs, 0, len,
- NULL, bput_one);
- EXT4_I(inode)->i_state |= EXT4_STATE_JDATA;
- goto out;
-
-out_unlock:
- unlock_page(page);
-out:
- return ret;
-}
-
-static int ext4_journalled_writepage(struct page *page,
- struct writeback_control *wbc)
-{
- struct inode *inode = page->mapping->host;
- loff_t size = i_size_read(inode);
- loff_t len;
-
- trace_mark(ext4_journalled_writepage,
- "dev %s ino %lu page_index %lu",
- inode->i_sb->s_id, inode->i_ino, page->index);
- J_ASSERT(PageLocked(page));
- if (page->index == size >> PAGE_CACHE_SHIFT)
- len = size & ~PAGE_CACHE_MASK;
- else
- len = PAGE_CACHE_SIZE;
-
- if (page_has_buffers(page)) {
- /* if page has buffers it should all be mapped
- * and allocated. If there are not buffers attached
- * to the page we know the page is dirty but it lost
- * buffers. That means that at some moment in time
- * after write_begin() / write_end() has been called
- * all buffers have been clean and thus they must have been
- * written at least once. So they are all mapped and we can
- * happily proceed with mapping them and writing the page.
- */
- BUG_ON(walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
- ext4_bh_delay_or_unwritten));
- }
-
- if (ext4_journal_current_handle())
- goto no_write;
-
- if (PageChecked(page)) {
- /*
- * It's mmapped pagecache. Add buffers and journal it. There
- * doesn't seem much point in redirtying the page here.
- */
- ClearPageChecked(page);
- return __ext4_journalled_writepage(page, wbc);
- } else {
- /*
- * It may be a page full of checkpoint-mode buffers. We don't
- * really know unless we go poke around in the buffer_heads.
- * But block_write_full_page will do the right thing.
- */
- return block_write_full_page(page, noalloc_get_block_write,
- wbc);
- }
-no_write:
- redirty_page_for_writepage(wbc, page);
- unlock_page(page);
- return 0;
-}
-
static int ext4_readpage(struct file *file, struct page *page)
{
return mpage_readpage(page, ext4_get_block);
@@ -3458,7 +3328,7 @@ static int ext4_journalled_set_page_dirty(struct page *page)
static const struct address_space_operations ext4_ordered_aops = {
.readpage = ext4_readpage,
.readpages = ext4_readpages,
- .writepage = ext4_normal_writepage,
+ .writepage = ext4_writepage,
.sync_page = block_sync_page,
.write_begin = ext4_write_begin,
.write_end = ext4_ordered_write_end,
@@ -3474,7 +3344,7 @@ static int ext4_journalled_set_page_dirty(struct page *page)
static const struct address_space_operations ext4_writeback_aops = {
.readpage = ext4_readpage,
.readpages = ext4_readpages,
- .writepage = ext4_normal_writepage,
+ .writepage = ext4_writepage,
.sync_page = block_sync_page,
.write_begin = ext4_write_begin,
.write_end = ext4_writeback_write_end,
@@ -3490,7 +3360,7 @@ static int ext4_journalled_set_page_dirty(struct page *page)
static const struct address_space_operations ext4_journalled_aops = {
.readpage = ext4_readpage,
.readpages = ext4_readpages,
- .writepage = ext4_journalled_writepage,
+ .writepage = ext4_writepage,
.sync_page = block_sync_page,
.write_begin = ext4_write_begin,
.write_end = ext4_journalled_write_end,
@@ -3505,7 +3375,7 @@ static int ext4_journalled_set_page_dirty(struct page *page)
static const struct address_space_operations ext4_da_aops = {
.readpage = ext4_readpage,
.readpages = ext4_readpages,
- .writepage = ext4_da_writepage,
+ .writepage = ext4_writepage,
.writepages = ext4_da_writepages,
.sync_page = block_sync_page,
.write_begin = ext4_da_write_begin,
--
1.6.3.1.145.gb74d77


2009-05-27 16:27:39

by Josef Bacik

[permalink] [raw]
Subject: Re: [PATCH 1/3] ext4: Don't look at buffer_heads outside i_size.

On Wed, May 27, 2009 at 09:28:06PM +0530, Aneesh Kumar K.V wrote:
> Buffer heads outside i_size will be unmapped. So when we
> are doing "walk_page_buffers" limit ourself to i_size.
>
> Signed-off-by: Aneesh Kumar K.V <[email protected]>
> ---
> fs/ext4/inode.c | 29 ++++++++++++++++++-----------
> 1 files changed, 18 insertions(+), 11 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 3b07a6c..f0f0065 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2514,7 +2514,7 @@ static int ext4_da_writepage(struct page *page,
> * all are mapped and non delay. We don't want to
> * do block allocation here.
> */
> - ret = block_prepare_write(page, 0, PAGE_CACHE_SIZE,
> + ret = block_prepare_write(page, 0, len,
> noalloc_get_block_write);
> if (!ret) {
> page_bufs = page_buffers(page);
> @@ -2536,7 +2536,7 @@ static int ext4_da_writepage(struct page *page,
> return 0;
> }
> /* now mark the buffer_heads as dirty and uptodate */
> - block_commit_write(page, 0, PAGE_CACHE_SIZE);
> + block_commit_write(page, 0, len);
> }
>
> if (test_opt(inode->i_sb, NOBH) && ext4_should_writeback_data(inode))
> @@ -3210,6 +3210,8 @@ static int ext4_normal_writepage(struct page *page,
> static int __ext4_journalled_writepage(struct page *page,
> struct writeback_control *wbc)
> {
> + loff_t size;
> + unsigned int len;
> struct address_space *mapping = page->mapping;
> struct inode *inode = mapping->host;
> struct buffer_head *page_bufs;
> @@ -3217,14 +3219,19 @@ static int __ext4_journalled_writepage(struct page *page,
> int ret = 0;
> int err;
>
> - ret = block_prepare_write(page, 0, PAGE_CACHE_SIZE,
> + size = i_size_read(inode);
> + if (page->index == size >> PAGE_CACHE_SHIFT)
> + len = size & ~PAGE_CACHE_MASK;
> + else
> + len = PAGE_CACHE_SIZE;
> +
> + ret = block_prepare_write(page, 0, len,
> noalloc_get_block_write);
> if (ret != 0)
> goto out_unlock;
>
> page_bufs = page_buffers(page);
> - walk_page_buffers(handle, page_bufs, 0, PAGE_CACHE_SIZE, NULL,
> - bget_one);
> + walk_page_buffers(handle, page_bufs, 0, len, NULL, bget_one);
> /* As soon as we unlock the page, it can go away, but we have
> * references to buffers so we are safe */
> unlock_page(page);
> @@ -3235,19 +3242,19 @@ static int __ext4_journalled_writepage(struct page *page,
> goto out;
> }
>
> - ret = walk_page_buffers(handle, page_bufs, 0,
> - PAGE_CACHE_SIZE, NULL, do_journal_get_write_access);
> + ret = walk_page_buffers(handle, page_bufs, 0, len,
> + NULL, do_journal_get_write_access);
>
> - err = walk_page_buffers(handle, page_bufs, 0,
> - PAGE_CACHE_SIZE, NULL, write_end_fn);
> + err = walk_page_buffers(handle, page_bufs, 0, len,
> + NULL, write_end_fn);
> if (ret == 0)
> ret = err;
> err = ext4_journal_stop(handle);
> if (!ret)
> ret = err;
>
> - walk_page_buffers(handle, page_bufs, 0,
> - PAGE_CACHE_SIZE, NULL, bput_one);
> + walk_page_buffers(handle, page_bufs, 0, len,
> + NULL, bput_one);
> EXT4_I(inode)->i_state |= EXT4_STATE_JDATA;
> goto out;
>

This feels a bit silly since its a pretty simple patch, but I took time to
review it so,

Reviewed-by: Josef Bacik <[email protected]>

2009-05-28 08:21:27

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 1/3] ext4: Don't look at buffer_heads outside i_size.

On Wed 27-05-09 21:28:06, Aneesh Kumar K.V wrote:
> Buffer heads outside i_size will be unmapped. So when we
> are doing "walk_page_buffers" limit ourself to i_size.
>
> Signed-off-by: Aneesh Kumar K.V <[email protected]>
The patch looks fine.
Acked-by: Jan Kara <[email protected]>

Honza

> ---
> fs/ext4/inode.c | 29 ++++++++++++++++++-----------
> 1 files changed, 18 insertions(+), 11 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 3b07a6c..f0f0065 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2514,7 +2514,7 @@ static int ext4_da_writepage(struct page *page,
> * all are mapped and non delay. We don't want to
> * do block allocation here.
> */
> - ret = block_prepare_write(page, 0, PAGE_CACHE_SIZE,
> + ret = block_prepare_write(page, 0, len,
> noalloc_get_block_write);
> if (!ret) {
> page_bufs = page_buffers(page);
> @@ -2536,7 +2536,7 @@ static int ext4_da_writepage(struct page *page,
> return 0;
> }
> /* now mark the buffer_heads as dirty and uptodate */
> - block_commit_write(page, 0, PAGE_CACHE_SIZE);
> + block_commit_write(page, 0, len);
> }
>
> if (test_opt(inode->i_sb, NOBH) && ext4_should_writeback_data(inode))
> @@ -3210,6 +3210,8 @@ static int ext4_normal_writepage(struct page *page,
> static int __ext4_journalled_writepage(struct page *page,
> struct writeback_control *wbc)
> {
> + loff_t size;
> + unsigned int len;
> struct address_space *mapping = page->mapping;
> struct inode *inode = mapping->host;
> struct buffer_head *page_bufs;
> @@ -3217,14 +3219,19 @@ static int __ext4_journalled_writepage(struct page *page,
> int ret = 0;
> int err;
>
> - ret = block_prepare_write(page, 0, PAGE_CACHE_SIZE,
> + size = i_size_read(inode);
> + if (page->index == size >> PAGE_CACHE_SHIFT)
> + len = size & ~PAGE_CACHE_MASK;
> + else
> + len = PAGE_CACHE_SIZE;
> +
> + ret = block_prepare_write(page, 0, len,
> noalloc_get_block_write);
> if (ret != 0)
> goto out_unlock;
>
> page_bufs = page_buffers(page);
> - walk_page_buffers(handle, page_bufs, 0, PAGE_CACHE_SIZE, NULL,
> - bget_one);
> + walk_page_buffers(handle, page_bufs, 0, len, NULL, bget_one);
> /* As soon as we unlock the page, it can go away, but we have
> * references to buffers so we are safe */
> unlock_page(page);
> @@ -3235,19 +3242,19 @@ static int __ext4_journalled_writepage(struct page *page,
> goto out;
> }
>
> - ret = walk_page_buffers(handle, page_bufs, 0,
> - PAGE_CACHE_SIZE, NULL, do_journal_get_write_access);
> + ret = walk_page_buffers(handle, page_bufs, 0, len,
> + NULL, do_journal_get_write_access);
>
> - err = walk_page_buffers(handle, page_bufs, 0,
> - PAGE_CACHE_SIZE, NULL, write_end_fn);
> + err = walk_page_buffers(handle, page_bufs, 0, len,
> + NULL, write_end_fn);
> if (ret == 0)
> ret = err;
> err = ext4_journal_stop(handle);
> if (!ret)
> ret = err;
>
> - walk_page_buffers(handle, page_bufs, 0,
> - PAGE_CACHE_SIZE, NULL, bput_one);
> + walk_page_buffers(handle, page_bufs, 0, len,
> + NULL, bput_one);
> EXT4_I(inode)->i_state |= EXT4_STATE_JDATA;
> goto out;
>
> --
> 1.6.3.1.145.gb74d77
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2009-05-28 08:27:10

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 2/3] ext4: Check for only delay or unwritten buffer_heads

On Wed 27-05-09 21:28:07, Aneesh Kumar K.V wrote:
> Even with changes to make pages writeprotech on truncate/i_size update we
> can still see buffer_heads which are not mapped in the writepage
> callback. Consider the below case.
>
> 1) truncate(f, 1024)
> 2) mmap(f, 0, 4096)
> 3) a[0] = 'a'
> 4) truncate(f, 4096)
> 5) writepage(...)
>
> Now if we get a writepage callback immediately after (4) and before an
> attempt to write at any other offset via mmap address (which implies we
> are yet to get a pagefault and do a get_block) what we would have is the
> page which is dirty have first block allocated and the other three
> buffer_heads unmapped.
>
> In the above case the writepage should go ahead and try to write
> the first blocks and clear the page_dirty flag. Because the further
> attempt to write to the page will again create a fault and result in
> allocating blocks and marking page dirty. Also if we don't write
> any other offset via mmap address we would still have written the first
> block to the disk and rest of the space will be considered as a hole.
OK, but this requires my patches to not cause data loss, doesn't it?
Nothing prevents user from writing data into the full page just after
truncate(f, 4096) before the writepage is called. And without my patches,
fault will not happen for such user write.
Just that we should have this dependency in mind. Otherwise the patch
looks fine to me.

Honza

> Signed-off-by: Aneesh Kumar K.V <[email protected]>
> ---
> fs/ext4/inode.c | 21 +++++++--------------
> 1 files changed, 7 insertions(+), 14 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index f0f0065..1efb296 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2239,15 +2239,9 @@ static void mpage_add_bh_to_extent(struct mpage_da_data *mpd,
> return;
> }
>
> -static int ext4_bh_unmapped_or_delay(handle_t *handle, struct buffer_head *bh)
> +static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh)
> {
> - /*
> - * unmapped buffer is possible for holes.
> - * delay buffer is possible with delayed allocation.
> - * We also need to consider unwritten buffer as unmapped.
> - */
> - return (!buffer_mapped(bh) || buffer_delay(bh) ||
> - buffer_unwritten(bh)) && buffer_dirty(bh);
> + return (buffer_delay(bh) || buffer_unwritten(bh)) && buffer_dirty(bh);
> }
>
> /*
> @@ -2334,7 +2328,7 @@ static int __mpage_da_writepage(struct page *page,
> * Otherwise we won't make progress
> * with the page in ext4_da_writepage
> */
> - if (ext4_bh_unmapped_or_delay(NULL, bh)) {
> + if (ext4_bh_delay_or_unwritten(NULL, bh)) {
> mpage_add_bh_to_extent(mpd, logical,
> bh->b_size,
> bh->b_state);
> @@ -2451,7 +2445,6 @@ static int noalloc_get_block_write(struct inode *inode, sector_t iblock,
> * so call get_block_wrap with create = 0
> */
> ret = ext4_get_blocks(NULL, inode, iblock, max_blocks, bh_result, 0);
> - BUG_ON(create && ret == 0);
> if (ret > 0) {
> bh_result->b_size = (ret << inode->i_blkbits);
> ret = 0;
> @@ -2487,7 +2480,7 @@ static int ext4_da_writepage(struct page *page,
> if (page_has_buffers(page)) {
> page_bufs = page_buffers(page);
> if (walk_page_buffers(NULL, page_bufs, 0, len, NULL,
> - ext4_bh_unmapped_or_delay)) {
> + ext4_bh_delay_or_unwritten)) {
> /*
> * We don't want to do block allocation
> * So redirty the page and return
> @@ -2520,7 +2513,7 @@ static int ext4_da_writepage(struct page *page,
> page_bufs = page_buffers(page);
> /* check whether all are mapped and non delay */
> if (walk_page_buffers(NULL, page_bufs, 0, len, NULL,
> - ext4_bh_unmapped_or_delay)) {
> + ext4_bh_delay_or_unwritten)) {
> redirty_page_for_writepage(wbc, page);
> unlock_page(page);
> return 0;
> @@ -3196,7 +3189,7 @@ static int ext4_normal_writepage(struct page *page,
> * happily proceed with mapping them and writing the page.
> */
> BUG_ON(walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
> - ext4_bh_unmapped_or_delay));
> + ext4_bh_delay_or_unwritten));
> }
>
> if (!ext4_journal_current_handle())
> @@ -3291,7 +3284,7 @@ static int ext4_journalled_writepage(struct page *page,
> * happily proceed with mapping them and writing the page.
> */
> BUG_ON(walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
> - ext4_bh_unmapped_or_delay));
> + ext4_bh_delay_or_unwritten));
> }
>
> if (ext4_journal_current_handle())
> --
> 1.6.3.1.145.gb74d77
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2009-05-28 08:33:19

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 3/3] ext4: Add generic writepage callback

On Wed 27-05-09 21:28:08, Aneesh Kumar K.V wrote:
> Even with changes to make pages writeprotect on truncate/i_size update we
> can still see buffer_heads which are not mapped in the writepage
> callback. Consider the below case.
>
> 1) truncate(f, 1024)
> 2) mmap(f, 0, 4096)
> 3) a[0] = 'a'
> 4) truncate(f, 4096)
> 5) writepage(...)
>
> Now if we get a writepage callback immediately after (4) and before an
> attempt to write at any other offset via mmap address (which implies we
> are yet to get a pagefault and do a get_block) what we would have is the
> page which is dirty have first block allocated and the other three
> buffer_heads unmapped.
>
> In the above case the writepage should go ahead and try to write
> the first blocks and clear the page_dirty flag. Because the further
> attempt to write to the page will again create a fault and result in
> allocating blocks and marking page dirty. Also if we don't write
> any other offset via mmap address we would still have written the first
> block to the disk and rest of the space will be considered as a hole.
>
> Delayed allocation writepage callback already did most of these. So
> extend it to allow block_write_full_page on pages with unmapped buffer_heads.
> We pass noalloc_get_block_write as the call back which ensures we don't
> do any block allocation in writepage callback. That is need because of
> the locking order between page lock and journal_start.
Looks fine.
Acked-by: Jan Kara <[email protected]>

Honza
>
> Signed-off-by: Aneesh Kumar K.V <[email protected]>
> ---
> fs/ext4/inode.c | 338 +++++++++++++++++--------------------------------------
> 1 files changed, 104 insertions(+), 234 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 1efb296..c1ddaaf 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2326,7 +2326,7 @@ static int __mpage_da_writepage(struct page *page,
> * We need to try to allocate
> * unmapped blocks in the same page.
> * Otherwise we won't make progress
> - * with the page in ext4_da_writepage
> + * with the page in ext4_writepage
> */
> if (ext4_bh_delay_or_unwritten(NULL, bh)) {
> mpage_add_bh_to_extent(mpd, logical,
> @@ -2452,14 +2452,102 @@ static int noalloc_get_block_write(struct inode *inode, sector_t iblock,
> return ret;
> }
>
> +static int bget_one(handle_t *handle, struct buffer_head *bh)
> +{
> + get_bh(bh);
> + return 0;
> +}
> +
> +static int bput_one(handle_t *handle, struct buffer_head *bh)
> +{
> + put_bh(bh);
> + return 0;
> +}
> +
> +static int __ext4_journalled_writepage(struct page *page,
> + struct writeback_control *wbc,
> + unsigned int len)
> +{
> + struct address_space *mapping = page->mapping;
> + struct inode *inode = mapping->host;
> + struct buffer_head *page_bufs;
> + handle_t *handle = NULL;
> + int ret = 0;
> + int err;
> +
> + page_bufs = page_buffers(page);
> + BUG_ON(!page_bufs);
> + walk_page_buffers(handle, page_bufs, 0, len, NULL, bget_one);
> + /* As soon as we unlock the page, it can go away, but we have
> + * references to buffers so we are safe */
> + unlock_page(page);
> +
> + handle = ext4_journal_start(inode, ext4_writepage_trans_blocks(inode));
> + if (IS_ERR(handle)) {
> + ret = PTR_ERR(handle);
> + goto out;
> + }
> +
> + ret = walk_page_buffers(handle, page_bufs, 0, len,
> + NULL, do_journal_get_write_access);
> +
> + err = walk_page_buffers(handle, page_bufs, 0, len,
> + NULL, write_end_fn);
> + if (ret == 0)
> + ret = err;
> + err = ext4_journal_stop(handle);
> + if (!ret)
> + ret = err;
> +
> + walk_page_buffers(handle, page_bufs, 0, len,
> + NULL, bput_one);
> + EXT4_I(inode)->i_state |= EXT4_STATE_JDATA;
> +out:
> + return ret;
> +}
> +
> /*
> + * Note that we don't need to start a transaction unless we're journaling data
> + * because we should have holes filled from ext4_page_mkwrite(). We even don't
> + * need to file the inode to the transaction's list in ordered mode because if
> + * we are writing back data added by write(), the inode is already there and if
> + * we are writing back data modified via mmap(), noone guarantees in which
> + * transaction the data will hit the disk. In case we are journaling data, we
> + * cannot start transaction directly because transaction start ranks above page
> + * lock so we have to do some magic.
> + *
> * This function can get called via...
> * - ext4_da_writepages after taking page lock (have journal handle)
> * - journal_submit_inode_data_buffers (no journal handle)
> * - shrink_page_list via pdflush (no journal handle)
> * - grab_page_cache when doing write_begin (have journal handle)
> + *
> + * We don't do any block allocation in this function. If we have page with
> + * multiple blocks we need to write those buffer_heads that are mapped. This
> + * is important for mmaped based write. So if we do with blocksize 1K
> + * truncate(f, 1024);
> + * a = mmap(f, 0, 4096);
> + * a[0] = 'a';
> + * truncate(f, 4096);
> + * we have in the page first buffer_head mapped via page_mkwrite call back
> + * but other bufer_heads would be unmapped but dirty(dirty done via the
> + * do_wp_page). So writepage should write the first block. If we modify
> + * the mmap area beyond 1024 we will again get a page_fault and the
> + * page_mkwrite callback will do the block allocation and mark the
> + * buffer_heads mapped.
> + *
> + * We redirty the page if we have any buffer_heads that is either delay or
> + * unwritten in the page.
> + *
> + * We can get recursively called as show below.
> + *
> + * ext4_writepage() -> kmalloc() -> __alloc_pages() -> page_launder() ->
> + * ext4_writepage()
> + *
> + * But since we don't do any block allocation we should not deadlock.
> + * Page also have the dirty flag cleared so we don't get recurive page_lock.
> */
> -static int ext4_da_writepage(struct page *page,
> +static int ext4_writepage(struct page *page,
> struct writeback_control *wbc)
> {
> int ret = 0;
> @@ -2468,7 +2556,7 @@ static int ext4_da_writepage(struct page *page,
> struct buffer_head *page_bufs;
> struct inode *inode = page->mapping->host;
>
> - trace_mark(ext4_da_writepage,
> + trace_mark(ext4_writepage,
> "dev %s ino %lu page_index %lu",
> inode->i_sb->s_id, inode->i_ino, page->index);
> size = i_size_read(inode);
> @@ -2532,6 +2620,15 @@ static int ext4_da_writepage(struct page *page,
> block_commit_write(page, 0, len);
> }
>
> + if (PageChecked(page) && ext4_should_journal_data(inode)) {
> + /*
> + * It's mmapped pagecache. Add buffers and journal it. There
> + * doesn't seem much point in redirtying the page here.
> + */
> + ClearPageChecked(page);
> + return __ext4_journalled_writepage(page, wbc, len);
> + }
> +
> if (test_opt(inode->i_sb, NOBH) && ext4_should_writeback_data(inode))
> ret = nobh_writepage(page, noalloc_get_block_write, wbc);
> else
> @@ -3085,233 +3182,6 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
> return generic_block_bmap(mapping, block, ext4_get_block);
> }
>
> -static int bget_one(handle_t *handle, struct buffer_head *bh)
> -{
> - get_bh(bh);
> - return 0;
> -}
> -
> -static int bput_one(handle_t *handle, struct buffer_head *bh)
> -{
> - put_bh(bh);
> - return 0;
> -}
> -
> -/*
> - * Note that we don't need to start a transaction unless we're journaling data
> - * because we should have holes filled from ext4_page_mkwrite(). We even don't
> - * need to file the inode to the transaction's list in ordered mode because if
> - * we are writing back data added by write(), the inode is already there and if
> - * we are writing back data modified via mmap(), noone guarantees in which
> - * transaction the data will hit the disk. In case we are journaling data, we
> - * cannot start transaction directly because transaction start ranks above page
> - * lock so we have to do some magic.
> - *
> - * In all journaling modes block_write_full_page() will start the I/O.
> - *
> - * Problem:
> - *
> - * ext4_writepage() -> kmalloc() -> __alloc_pages() -> page_launder() ->
> - * ext4_writepage()
> - *
> - * Similar for:
> - *
> - * ext4_file_write() -> generic_file_write() -> __alloc_pages() -> ...
> - *
> - * Same applies to ext4_get_block(). We will deadlock on various things like
> - * lock_journal and i_data_sem
> - *
> - * Setting PF_MEMALLOC here doesn't work - too many internal memory
> - * allocations fail.
> - *
> - * 16May01: If we're reentered then journal_current_handle() will be
> - * non-zero. We simply *return*.
> - *
> - * 1 July 2001: @@@ FIXME:
> - * In journalled data mode, a data buffer may be metadata against the
> - * current transaction. But the same file is part of a shared mapping
> - * and someone does a writepage() on it.
> - *
> - * We will move the buffer onto the async_data list, but *after* it has
> - * been dirtied. So there's a small window where we have dirty data on
> - * BJ_Metadata.
> - *
> - * Note that this only applies to the last partial page in the file. The
> - * bit which block_write_full_page() uses prepare/commit for. (That's
> - * broken code anyway: it's wrong for msync()).
> - *
> - * It's a rare case: affects the final partial page, for journalled data
> - * where the file is subject to bith write() and writepage() in the same
> - * transction. To fix it we'll need a custom block_write_full_page().
> - * We'll probably need that anyway for journalling writepage() output.
> - *
> - * We don't honour synchronous mounts for writepage(). That would be
> - * disastrous. Any write() or metadata operation will sync the fs for
> - * us.
> - *
> - */
> -static int __ext4_normal_writepage(struct page *page,
> - struct writeback_control *wbc)
> -{
> - struct inode *inode = page->mapping->host;
> -
> - if (test_opt(inode->i_sb, NOBH))
> - return nobh_writepage(page, noalloc_get_block_write, wbc);
> - else
> - return block_write_full_page(page, noalloc_get_block_write,
> - wbc);
> -}
> -
> -static int ext4_normal_writepage(struct page *page,
> - struct writeback_control *wbc)
> -{
> - struct inode *inode = page->mapping->host;
> - loff_t size = i_size_read(inode);
> - loff_t len;
> -
> - trace_mark(ext4_normal_writepage,
> - "dev %s ino %lu page_index %lu",
> - inode->i_sb->s_id, inode->i_ino, page->index);
> - J_ASSERT(PageLocked(page));
> - if (page->index == size >> PAGE_CACHE_SHIFT)
> - len = size & ~PAGE_CACHE_MASK;
> - else
> - len = PAGE_CACHE_SIZE;
> -
> - if (page_has_buffers(page)) {
> - /* if page has buffers it should all be mapped
> - * and allocated. If there are not buffers attached
> - * to the page we know the page is dirty but it lost
> - * buffers. That means that at some moment in time
> - * after write_begin() / write_end() has been called
> - * all buffers have been clean and thus they must have been
> - * written at least once. So they are all mapped and we can
> - * happily proceed with mapping them and writing the page.
> - */
> - BUG_ON(walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
> - ext4_bh_delay_or_unwritten));
> - }
> -
> - if (!ext4_journal_current_handle())
> - return __ext4_normal_writepage(page, wbc);
> -
> - redirty_page_for_writepage(wbc, page);
> - unlock_page(page);
> - return 0;
> -}
> -
> -static int __ext4_journalled_writepage(struct page *page,
> - struct writeback_control *wbc)
> -{
> - loff_t size;
> - unsigned int len;
> - struct address_space *mapping = page->mapping;
> - struct inode *inode = mapping->host;
> - struct buffer_head *page_bufs;
> - handle_t *handle = NULL;
> - int ret = 0;
> - int err;
> -
> - size = i_size_read(inode);
> - if (page->index == size >> PAGE_CACHE_SHIFT)
> - len = size & ~PAGE_CACHE_MASK;
> - else
> - len = PAGE_CACHE_SIZE;
> -
> - ret = block_prepare_write(page, 0, len,
> - noalloc_get_block_write);
> - if (ret != 0)
> - goto out_unlock;
> -
> - page_bufs = page_buffers(page);
> - walk_page_buffers(handle, page_bufs, 0, len, NULL, bget_one);
> - /* As soon as we unlock the page, it can go away, but we have
> - * references to buffers so we are safe */
> - unlock_page(page);
> -
> - handle = ext4_journal_start(inode, ext4_writepage_trans_blocks(inode));
> - if (IS_ERR(handle)) {
> - ret = PTR_ERR(handle);
> - goto out;
> - }
> -
> - ret = walk_page_buffers(handle, page_bufs, 0, len,
> - NULL, do_journal_get_write_access);
> -
> - err = walk_page_buffers(handle, page_bufs, 0, len,
> - NULL, write_end_fn);
> - if (ret == 0)
> - ret = err;
> - err = ext4_journal_stop(handle);
> - if (!ret)
> - ret = err;
> -
> - walk_page_buffers(handle, page_bufs, 0, len,
> - NULL, bput_one);
> - EXT4_I(inode)->i_state |= EXT4_STATE_JDATA;
> - goto out;
> -
> -out_unlock:
> - unlock_page(page);
> -out:
> - return ret;
> -}
> -
> -static int ext4_journalled_writepage(struct page *page,
> - struct writeback_control *wbc)
> -{
> - struct inode *inode = page->mapping->host;
> - loff_t size = i_size_read(inode);
> - loff_t len;
> -
> - trace_mark(ext4_journalled_writepage,
> - "dev %s ino %lu page_index %lu",
> - inode->i_sb->s_id, inode->i_ino, page->index);
> - J_ASSERT(PageLocked(page));
> - if (page->index == size >> PAGE_CACHE_SHIFT)
> - len = size & ~PAGE_CACHE_MASK;
> - else
> - len = PAGE_CACHE_SIZE;
> -
> - if (page_has_buffers(page)) {
> - /* if page has buffers it should all be mapped
> - * and allocated. If there are not buffers attached
> - * to the page we know the page is dirty but it lost
> - * buffers. That means that at some moment in time
> - * after write_begin() / write_end() has been called
> - * all buffers have been clean and thus they must have been
> - * written at least once. So they are all mapped and we can
> - * happily proceed with mapping them and writing the page.
> - */
> - BUG_ON(walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
> - ext4_bh_delay_or_unwritten));
> - }
> -
> - if (ext4_journal_current_handle())
> - goto no_write;
> -
> - if (PageChecked(page)) {
> - /*
> - * It's mmapped pagecache. Add buffers and journal it. There
> - * doesn't seem much point in redirtying the page here.
> - */
> - ClearPageChecked(page);
> - return __ext4_journalled_writepage(page, wbc);
> - } else {
> - /*
> - * It may be a page full of checkpoint-mode buffers. We don't
> - * really know unless we go poke around in the buffer_heads.
> - * But block_write_full_page will do the right thing.
> - */
> - return block_write_full_page(page, noalloc_get_block_write,
> - wbc);
> - }
> -no_write:
> - redirty_page_for_writepage(wbc, page);
> - unlock_page(page);
> - return 0;
> -}
> -
> static int ext4_readpage(struct file *file, struct page *page)
> {
> return mpage_readpage(page, ext4_get_block);
> @@ -3458,7 +3328,7 @@ static int ext4_journalled_set_page_dirty(struct page *page)
> static const struct address_space_operations ext4_ordered_aops = {
> .readpage = ext4_readpage,
> .readpages = ext4_readpages,
> - .writepage = ext4_normal_writepage,
> + .writepage = ext4_writepage,
> .sync_page = block_sync_page,
> .write_begin = ext4_write_begin,
> .write_end = ext4_ordered_write_end,
> @@ -3474,7 +3344,7 @@ static int ext4_journalled_set_page_dirty(struct page *page)
> static const struct address_space_operations ext4_writeback_aops = {
> .readpage = ext4_readpage,
> .readpages = ext4_readpages,
> - .writepage = ext4_normal_writepage,
> + .writepage = ext4_writepage,
> .sync_page = block_sync_page,
> .write_begin = ext4_write_begin,
> .write_end = ext4_writeback_write_end,
> @@ -3490,7 +3360,7 @@ static int ext4_journalled_set_page_dirty(struct page *page)
> static const struct address_space_operations ext4_journalled_aops = {
> .readpage = ext4_readpage,
> .readpages = ext4_readpages,
> - .writepage = ext4_journalled_writepage,
> + .writepage = ext4_writepage,
> .sync_page = block_sync_page,
> .write_begin = ext4_write_begin,
> .write_end = ext4_journalled_write_end,
> @@ -3505,7 +3375,7 @@ static int ext4_journalled_set_page_dirty(struct page *page)
> static const struct address_space_operations ext4_da_aops = {
> .readpage = ext4_readpage,
> .readpages = ext4_readpages,
> - .writepage = ext4_da_writepage,
> + .writepage = ext4_writepage,
> .writepages = ext4_da_writepages,
> .sync_page = block_sync_page,
> .write_begin = ext4_da_write_begin,
> --
> 1.6.3.1.145.gb74d77
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2009-05-28 09:49:51

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH 2/3] ext4: Check for only delay or unwritten buffer_heads

On Thu, May 28, 2009 at 10:27:11AM +0200, Jan Kara wrote:
> On Wed 27-05-09 21:28:07, Aneesh Kumar K.V wrote:
> > Even with changes to make pages writeprotech on truncate/i_size update we
> > can still see buffer_heads which are not mapped in the writepage
> > callback. Consider the below case.
> >
> > 1) truncate(f, 1024)
> > 2) mmap(f, 0, 4096)
> > 3) a[0] = 'a'
> > 4) truncate(f, 4096)
> > 5) writepage(...)
> >
> > Now if we get a writepage callback immediately after (4) and before an
> > attempt to write at any other offset via mmap address (which implies we
> > are yet to get a pagefault and do a get_block) what we would have is the
> > page which is dirty have first block allocated and the other three
> > buffer_heads unmapped.
> >
> > In the above case the writepage should go ahead and try to write
> > the first blocks and clear the page_dirty flag. Because the further
> > attempt to write to the page will again create a fault and result in
> > allocating blocks and marking page dirty. Also if we don't write
> > any other offset via mmap address we would still have written the first
> > block to the disk and rest of the space will be considered as a hole.
> OK, but this requires my patches to not cause data loss, doesn't it?
> Nothing prevents user from writing data into the full page just after
> truncate(f, 4096) before the writepage is called. And without my patches,
> fault will not happen for such user write.
> Just that we should have this dependency in mind. Otherwise the patch
> looks fine to me.

Yes the entire series is on top of your patches

-aneesh

2009-06-03 22:53:14

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 3/3] ext4: Add generic writepage callback

Hi Aneesh,

In the future, *please* give me patches versus the patch queue. This
patch moves functions around, so applying it is painful in the
extreme, and it's very painful to check and see whether it undoes
patches later in the patch.

In fact, can you redo these patches (2/3 and 3/3), versus the latest
in the patch queue?

Many thanks,

- Ted

2009-06-03 22:57:35

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 3/3] ext4: Add generic writepage callback

On Wed, Jun 03, 2009 at 06:53:08PM -0400, Theodore Tso wrote:
> Hi Aneesh,
>
> In the future, *please* give me patches versus the patch queue. This
> patch moves functions around, so applying it is painful in the
> extreme, and it's very painful to check and see whether it undoes
> patches later in the patch.
>
> In fact, can you redo these patches (2/3 and 3/3), versus the latest
> in the patch queue?

P.S. And if you can avoid making the patch move functions around,
it'll make it a lot more likely that I'll be able to backport it to
the -stable series. In its current form it's essentially impossible
that I would ever consider trying to backport it to 2.6.29, or
(goodness help us, 2.6.27).

- Ted

2009-06-04 06:29:04

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH 3/3] ext4: Add generic writepage callback

On Wed, Jun 03, 2009 at 06:53:08PM -0400, Theodore Tso wrote:
> Hi Aneesh,
>
> In the future, *please* give me patches versus the patch queue. This
> patch moves functions around, so applying it is painful in the
> extreme, and it's very painful to check and see whether it undoes
> patches later in the patch.

I have done is against patch queue. After the stable patch. But I guess
some patches you added before the stable cause the patch apply to fail.

I have regenerated the patch and sent -v2 version. Also i split the
coding moving changes in to a new patch

>
> In fact, can you redo these patches (2/3 and 3/3), versus the latest
> in the patch queue?
>

Done. I sent you 4 patches which should be all you need to take.

-aneesh