2011-02-13 00:57:24

by Theodore Ts'o

Subject: [PATCH,RFC 0/7] Simplify buffered write submissions, part II

This has been a background project I've been working on for the past
couple of weeks. It cleans up the middle part of the buffered write
pages a bit. The main features of this patch series are:

*) Every single patch removes more lines of code than it adds, with a
net total removal of nearly 100 lines of code from fs/ext4/inode.c
*) The ext4_da_writepages() codepath no longer abuses the
clear_page_dirty_for_io() function, which means we no longer
need to call redirty_page_for_writepage(). This removes
unneeded work, which is goodness.
*) We no longer start journal handles if they are not needed. This
should improve performance and improve SMP scalability on
parallel random write workloads when the journal is enabled
in the best way possible --- don't take locks when they aren't
needed!

There is still more cleanup that needs to be done, but since these
patches should improve performance by themselves, it seems worthwhile
for me to send these out as-is, and ask people to take a look. What do
you guys think?

- Ted

Theodore Ts'o (7):
ext4: fold __mpage_da_writepage() into write_cache_pages_da()
ext4: simple cleanups to write_cache_pages_da()
ext4: clear the dirty bit for a page in writeback at the last minute
ext4: remove page_skipped hackery in ext4_da_writepages()
ext4: don't lock the next page in write_cache_pages if not needed
ext4: move setup of the mpd structure to write_cache_pages_da()
ext4: move ext4_journal_start/stop to mpage_da_map_and_submit()

fs/ext4/ext4.h | 3 +-
fs/ext4/inode.c | 428 +++++++++++++++++++++----------------------------------
2 files changed, 167 insertions(+), 264 deletions(-)

--
1.7.3.1



2011-02-13 00:15:55

by Theodore Ts'o

Subject: [PATCH,RFC 4/7] ext4: remove page_skipped hackery in ext4_da_writepages()

Because the ext4 page writeback codepath had been prematurely calling
clear_page_dirty_for_io(), if it turned out that a particular page
couldn't be written out during a particular pass of
write_cache_pages_da(), the page would have to get redirtied by
calling redirty_page_for_writepage(). Not only was this wasted work,
but redirty_page_for_writepage() would increment wbc->pages_skipped to
signal to writeback_sb_inodes() that buffers were locked, and that it
should skip this inode until later.

Since this signal was incorrect in ext4's case --- which was caused by
ext4's historically incorrect use of write_cache_pages() ---
ext4_da_writepages() saved and restored wbc->pages_skipped to avoid
confusing writeback_sb_inodes().

Now that we've fixed ext4 to call clear_page_dirty_for_io() right
before initiating the page I/O, we can nuke the page_skipped
save/restore hackery, and breathe a sigh of relief.
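
To make the removed workaround concrete, here is a minimal userspace
sketch of the counter bookkeeping described above. It is only a model:
struct wbc_model, redirty_model(), old_pass() and new_pass() are
hypothetical names invented for illustration and are not kernel code.

```c
#include <assert.h>

/* Hypothetical stand-in for the one writeback_control field used here. */
struct wbc_model {
	long pages_skipped;
};

/* Models the side effect of redirtying a page: the counter is bumped. */
static void redirty_model(struct wbc_model *wbc)
{
	wbc->pages_skipped++;
}

/*
 * Old behaviour: pages were marked clean early, so every page that did
 * not fit in the current extent had to be redirtied, and the counter
 * was then restored by hand.
 */
static long old_pass(struct wbc_model *wbc, int pages_not_merged)
{
	long saved = wbc->pages_skipped;
	int i;

	assert(pages_not_merged >= 0);
	for (i = 0; i < pages_not_merged; i++)
		redirty_model(wbc);
	wbc->pages_skipped = saved;	/* the save/restore workaround */
	return wbc->pages_skipped;
}

/* New behaviour: pages stay dirty until I/O starts, so nothing to undo. */
static long new_pass(struct wbc_model *wbc, int pages_not_merged)
{
	(void)pages_not_merged;	/* unmerged pages are simply left dirty */
	return wbc->pages_skipped;
}
```

In this model both passes leave the counter unchanged, but only the old
one needs the extra save and restore to get there.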

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/inode.c | 10 ----------
1 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3eca465..6dfdc0e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2900,7 +2900,6 @@ static int ext4_da_writepages(struct address_space *mapping,
struct mpage_da_data mpd;
struct inode *inode = mapping->host;
int pages_written = 0;
- long pages_skipped;
unsigned int max_pages;
int range_cyclic, cycled = 1, io_done = 0;
int needed_blocks, ret = 0;
@@ -2986,8 +2985,6 @@ static int ext4_da_writepages(struct address_space *mapping,
mpd.wbc = wbc;
mpd.inode = mapping->host;

- pages_skipped = wbc->pages_skipped;
-
retry:
if (wbc->sync_mode == WB_SYNC_ALL)
tag_pages_for_writeback(mapping, index, end);
@@ -3047,7 +3044,6 @@ retry:
* and try again
*/
jbd2_journal_force_commit_nested(sbi->s_journal);
- wbc->pages_skipped = pages_skipped;
ret = 0;
} else if (ret == MPAGE_DA_EXTENT_TAIL) {
/*
@@ -3055,7 +3051,6 @@ retry:
* rest of the pages
*/
pages_written += mpd.pages_written;
- wbc->pages_skipped = pages_skipped;
ret = 0;
io_done = 1;
} else if (wbc->nr_to_write)
@@ -3073,11 +3068,6 @@ retry:
wbc->range_end = mapping->writeback_index - 1;
goto retry;
}
- if (pages_skipped != wbc->pages_skipped)
- ext4_msg(inode->i_sb, KERN_CRIT,
- "This should not happen leaving %s "
- "with nr_to_write = %ld ret = %d",
- __func__, wbc->nr_to_write, ret);

/* Update index */
wbc->range_cyclic = range_cyclic;
--
1.7.3.1


2011-02-13 00:16:00

by Theodore Ts'o

Subject: [PATCH,RFC 5/7] ext4: don't lock the next page in write_cache_pages if not needed

If we have accumulated a contiguous region of memory to be written
out, and the next page cannot be added to this region, don't bother
locking (and then unlocking) the page before writing out the memory.
In the unlikely event that the next page was being written back by some
other CPU, we can also skip waiting for that page to finish writeback.
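
The check that moves ahead of lock_page() can be sketched in userspace
as follows. This is an illustrative model only: struct extent_model,
must_flush_before_lock() and add_page() are hypothetical names, with
the two fields borrowed from the first_page/next_page members that
appear in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of mpage_da_data: the page range gathered so far. */
struct extent_model {
	unsigned long first_page;	/* first page index in the extent */
	unsigned long next_page;	/* index that would extend the extent */
};

/*
 * Mirrors the test hoisted above lock_page(): true when the page cannot
 * be merged and a non-empty extent is already pending.
 */
static bool must_flush_before_lock(const struct extent_model *e,
				   unsigned long index)
{
	return e->next_page != index && e->next_page != e->first_page;
}

/* Add a page; a non-contiguous page starts a new extent. */
static void add_page(struct extent_model *e, unsigned long index)
{
	/* Callers are expected to have flushed any pending extent first. */
	assert(!must_flush_before_lock(e, index));
	if (e->next_page != index)
		e->first_page = index;
	e->next_page = index + 1;
}
```

Because the decision depends only on page indexes, it can be made
without taking the page lock at all.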

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/inode.c | 27 ++++++++++-----------------
1 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6dfdc0e..2ac64e3 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2761,6 +2761,16 @@ static int write_cache_pages_da(struct address_space *mapping,

*done_index = page->index + 1;

+ /*
+ * If we can't merge this page, and we have
+ * accumulated an contiguous region, write it
+ */
+ if ((mpd->next_page != page->index) &&
+ (mpd->next_page != mpd->first_page)) {
+ mpage_da_map_and_submit(mpd);
+ goto ret_extent_tail;
+ }
+
lock_page(page);

/*
@@ -2784,25 +2794,8 @@ static int write_cache_pages_da(struct address_space *mapping,

BUG_ON(PageWriteback(page));

- /*
- * Can we merge this page to current extent?
- */
if (mpd->next_page != page->index) {
/*
- * Nope, we can't. So, we map
- * non-allocated blocks and start IO
- * on them
- */
- if (mpd->next_page != mpd->first_page) {
- mpage_da_map_and_submit(mpd);
- /*
- * skip rest of the page in the page_vec
- */
- unlock_page(page);
- goto ret_extent_tail;
- }

2011-02-13 00:57:24

by Theodore Ts'o

Subject: [PATCH,RFC 1/7] ext4: fold __mpage_da_writepage() into write_cache_pages_da()

Fold the __mpage_da_writepage() function into write_cache_pages_da().
This will give us opportunities to clean up and simplify the resulting
code.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/inode.c | 206 ++++++++++++++++++++++++-------------------------------
1 files changed, 91 insertions(+), 115 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9f7f9e4..627729f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2438,102 +2438,6 @@ static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh)
}

/*
- * __mpage_da_writepage - finds extent of pages and blocks
- *
- * @page: page to consider
- * @wbc: not used, we just follow rules
- * @data: context
- *
- * The function finds extents of pages and scan them for all blocks.
- */
-static int __mpage_da_writepage(struct page *page,
- struct writeback_control *wbc,
- struct mpage_da_data *mpd)
-{
- struct inode *inode = mpd->inode;
- struct buffer_head *bh, *head;
- sector_t logical;
-
- /*
- * Can we merge this page to current extent?
- */
- if (mpd->next_page != page->index) {
- /*
- * Nope, we can't. So, we map non-allocated blocks
- * and start IO on them
- */
- if (mpd->next_page != mpd->first_page) {
- mpage_da_map_and_submit(mpd);
- /*
- * skip rest of the page in the page_vec
- */
- redirty_page_for_writepage(wbc, page);
- unlock_page(page);
- return MPAGE_DA_EXTENT_TAIL;
- }
-
- /*
- * Start next extent of pages ...
- */
- mpd->first_page = page->index;
-
- /*
- * ... and blocks
- */
- mpd->b_size = 0;
- mpd->b_state = 0;
- mpd->b_blocknr = 0;
- }
-
- mpd->next_page = page->index + 1;
- logical = (sector_t) page->index <<
- (PAGE_CACHE_SHIFT - inode->i_blkbits);
-
- if (!page_has_buffers(page)) {
- mpage_add_bh_to_extent(mpd, logical, PAGE_CACHE_SIZE,
- (1 << BH_Dirty) | (1 << BH_Uptodate));
- if (mpd->io_done)
- return MPAGE_DA_EXTENT_TAIL;
- } else {
- /*
- * Page with regular buffer heads, just add all dirty ones
- */
- head = page_buffers(page);
- bh = head;
- do {
- BUG_ON(buffer_locked(bh));
- /*
- * We need to try to allocate
- * unmapped blocks in the same page.
- * Otherwise we won't make progress
- * with the page in ext4_writepage
- */
- if (ext4_bh_delay_or_unwritten(NULL, bh)) {
- mpage_add_bh_to_extent(mpd, logical,
- bh->b_size,
- bh->b_state);
- if (mpd->io_done)
- return MPAGE_DA_EXTENT_TAIL;
- } else if (buffer_dirty(bh) && (buffer_mapped(bh))) {
- /*
- * mapped dirty buffer. We need to update
- * the b_state because we look at
- * b_state in mpage_da_map_blocks. We don't
- * update b_size because if we find an
- * unmapped buffer_head later we need to
- * use the b_state flag of that buffer_head.
- */
- if (mpd->b_size == 0)
- mpd->b_state = bh->b_state & BH_FLAGS;
- }
- logical++;
- } while ((bh = bh->b_this_page) != head);
- }
-
- return 0;
-}
-
-/*
* This is a special get_blocks_t callback which is used by
* ext4_da_write_begin(). It will either return mapped block or
* reserve space for a single block.
@@ -2811,18 +2715,17 @@ static int ext4_da_writepages_trans_blocks(struct inode *inode)

/*
* write_cache_pages_da - walk the list of dirty pages of the given
- * address space and call the callback function (which usually writes
- * the pages).
- *
- * This is a forked version of write_cache_pages(). Differences:
- * Range cyclic is ignored.
- * no_nrwrite_index_update is always presumed true
+ * address space and accumulate pages that need writing, and call
+ * mpage_da_map_and_submit to map the pages and then write them.
*/
static int write_cache_pages_da(struct address_space *mapping,
struct writeback_control *wbc,
struct mpage_da_data *mpd,
pgoff_t *done_index)
{
+ struct inode *inode = mpd->inode;
+ struct buffer_head *bh, *head;
+ sector_t logical;
int ret = 0;
int done = 0;
struct pagevec pvec;
@@ -2899,17 +2802,90 @@ continue_unlock:
if (!clear_page_dirty_for_io(page))
goto continue_unlock;

- ret = __mpage_da_writepage(page, wbc, mpd);
- if (unlikely(ret)) {
- if (ret == AOP_WRITEPAGE_ACTIVATE) {
+ /* BEGIN __mpage_da_writepage */
+
+ /*
+ * Can we merge this page to current extent?
+ */
+ if (mpd->next_page != page->index) {
+ /*
+ * Nope, we can't. So, we map
+ * non-allocated blocks and start IO
+ * on them
+ */
+ if (mpd->next_page != mpd->first_page) {
+ mpage_da_map_and_submit(mpd);
+ /*
+ * skip rest of the page in the page_vec
+ */
+ redirty_page_for_writepage(wbc, page);
unlock_page(page);
- ret = 0;
- } else {
- done = 1;
- break;
+ ret = MPAGE_DA_EXTENT_TAIL;
+ goto out;
+ }
+
+ /*
+ * Start next extent of pages and blocks
+ */
+ mpd->first_page = page->index;
+ mpd->b_size = 0;
+ mpd->b_state = 0;
+ mpd->b_blocknr = 0;
+ }
+
+ mpd->next_page = page->index + 1;
+ logical = (sector_t) page->index <<
+ (PAGE_CACHE_SHIFT - inode->i_blkbits);
+
+ if (!page_has_buffers(page)) {
+ mpage_add_bh_to_extent(mpd, logical, PAGE_CACHE_SIZE,
+ (1 << BH_Dirty) | (1 << BH_Uptodate));
+ if (mpd->io_done) {
+ ret = MPAGE_DA_EXTENT_TAIL;
+ goto out;
}
+ } else {
+ /*
+ * Page with regular buffer heads, just add all dirty ones
+ */
+ head = page_buffers(page);
+ bh = head;
+ do {
+ BUG_ON(buffer_locked(bh));
+ /*
+ * We need to try to allocate
+ * unmapped blocks in the same page.
+ * Otherwise we won't make progress
+ * with the page in ext4_writepage
+ */
+ if (ext4_bh_delay_or_unwritten(NULL, bh)) {
+ mpage_add_bh_to_extent(mpd, logical,
+ bh->b_size,
+ bh->b_state);
+ if (mpd->io_done) {
+ ret = MPAGE_DA_EXTENT_TAIL;
+ goto out;
+ }
+ } else if (buffer_dirty(bh) && (buffer_mapped(bh))) {
+ /*
+ * mapped dirty buffer. We need to update
+ * the b_state because we look at
+ * b_state in mpage_da_map_blocks. We don't
+ * update b_size because if we find an
+ * unmapped buffer_head later we need to
+ * use the b_state flag of that buffer_head.
+ */
+ if (mpd->b_size == 0)
+ mpd->b_state = bh->b_state & BH_FLAGS;
+ }
+ logical++;
+ } while ((bh = bh->b_this_page) != head);
}

+ ret = 0;
+
+ /* END __mpage_da_writepage */
+
if (nr_to_write > 0) {
nr_to_write--;
if (nr_to_write == 0 &&
@@ -2933,6 +2909,10 @@ continue_unlock:
cond_resched();
}
return ret;
+out:
+ pagevec_release(&pvec);
+ cond_resched();
+ return ret;
}


@@ -3059,13 +3039,9 @@ retry:
}

/*
- * Now call __mpage_da_writepage to find the next
+ * Now call write_cache_pages_da() to find the next
* contiguous region of logical blocks that need
- * blocks to be allocated by ext4. We don't actually
- * submit the blocks for I/O here, even though
- * write_cache_pages thinks it will, and will set the
- * pages as clean for write before calling
- * __mpage_da_writepage().
+ * blocks to be allocated by ext4 and submit them.
*/
mpd.b_size = 0;
mpd.b_state = 0;
--
1.7.3.1


2011-02-13 00:57:28

by Theodore Ts'o

Subject: [PATCH,RFC 3/7] ext4: clear the dirty bit for a page in writeback at the last minute

Move when we call clear_page_dirty_for_io() to just before we actually
write the page. This simplifies the code somewhat, and avoids marking
pages as clean and then needing to remark them as dirty later.
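
As a rough illustration of the ordering change, the following userspace
sketch clears a modelled dirty flag only at the point where a page is
actually submitted. struct page_model and submit_pages() are
hypothetical names; the "mappable" flag is an assumption standing in
for "block allocation succeeded for this page".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page state; not the kernel's struct page. */
struct page_model {
	bool dirty;	/* models the page dirty bit */
	bool mappable;	/* assumption: blocks could be allocated */
};

/* Returns the number of pages submitted for I/O. */
static int submit_pages(struct page_model *pages, int n)
{
	int i, submitted = 0;

	for (i = 0; i < n; i++) {
		if (!pages[i].mappable)
			continue;	/* skipped: still dirty, nothing to undo */
		assert(pages[i].dirty);
		pages[i].dirty = false;	/* cleared right before the write */
		submitted++;
	}
	return submitted;
}
```

A skipped page never had its dirty flag touched, so no redirtying step
is needed for it.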

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/inode.c | 28 +++++++++++-----------------
1 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e230f4f..3eca465 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2060,7 +2060,7 @@ static int mpage_da_submit_io(struct mpage_da_data *mpd,
if (nr_pages == 0)
break;
for (i = 0; i < nr_pages; i++) {
- int commit_write = 0, redirty_page = 0;
+ int commit_write = 0, skip_page = 0;
struct page *page = pvec.pages[i];

index = page->index;
@@ -2086,14 +2086,12 @@ static int mpage_da_submit_io(struct mpage_da_data *mpd,
* If the page does not have buffers (for
* whatever reason), try to create them using
* __block_write_begin. If this fails,
- * redirty the page and move on.
+ * skip the page and move on.
*/
if (!page_has_buffers(page)) {
if (__block_write_begin(page, 0, len,
noalloc_get_block_write)) {
- redirty_page:
- redirty_page_for_writepage(mpd->wbc,
- page);
+ skip_page:
unlock_page(page);
continue;
}
@@ -2104,7 +2102,7 @@ static int mpage_da_submit_io(struct mpage_da_data *mpd,
block_start = 0;
do {
if (!bh)
- goto redirty_page;
+ goto skip_page;
if (map && (cur_logical >= map->m_lblk) &&
(cur_logical <= (map->m_lblk +
(map->m_len - 1)))) {
@@ -2120,22 +2118,23 @@ static int mpage_da_submit_io(struct mpage_da_data *mpd,
clear_buffer_unwritten(bh);
}

- /* redirty page if block allocation undone */
+ /* skip page if block allocation undone */
if (buffer_delay(bh) || buffer_unwritten(bh))
- redirty_page = 1;
+ skip_page = 1;
bh = bh->b_this_page;
block_start += bh->b_size;
cur_logical++;
pblock++;
} while (bh != page_bufs);

- if (redirty_page)
- goto redirty_page;
+ if (skip_page)
+ goto skip_page;

if (commit_write)
/* mark the buffer_heads as dirty & uptodate */
block_commit_write(page, 0, len);

+ clear_page_dirty_for_io(page);
/*
* Delalloc doesn't support data journalling,
* but eventually maybe we'll lift this
@@ -2279,9 +2278,8 @@ static void mpage_da_map_and_submit(struct mpage_da_data *mpd)
err = blks;
/*
* If get block returns EAGAIN or ENOSPC and there
- * appears to be free blocks we will call
- * ext4_writepage() for all of the pages which will
- * just redirty the pages.
+ * appears to be free blocks we will just let
+ * mpage_da_submit_io() unlock all of the pages.
*/
if (err == -EAGAIN)
goto submit_io;
@@ -2777,7 +2775,6 @@ static int write_cache_pages_da(struct address_space *mapping,
(PageWriteback(page) &&
(wbc->sync_mode == WB_SYNC_NONE)) ||
unlikely(page->mapping != mapping)) {
- continue_unlock:
unlock_page(page);
continue;
}
@@ -2786,8 +2783,6 @@ static int write_cache_pages_da(struct address_space *mapping,
wait_on_page_writeback(page);

BUG_ON(PageWriteback(page));
- if (!clear_page_dirty_for_io(page))
- goto continue_unlock;

/*
* Can we merge this page to current extent?
@@ -2803,7 +2798,6 @@ static int write_cache_pages_da(struct address_space *mapping,
/*
* skip rest of the page in the page_vec
*/
- redirty_page_for_writepage(wbc, page);
unlock_page(page);
goto ret_extent_tail;
}
--
1.7.3.1


2011-02-13 00:57:28

by Theodore Ts'o

Subject: [PATCH,RFC 2/7] ext4: simple cleanups to write_cache_pages_da()

Eliminate duplicate code, unneeded variables, etc., to make it easier
to understand the code. No behavioral changes were made in this patch.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/inode.c | 115 +++++++++++++++++++++++--------------------------------
1 files changed, 48 insertions(+), 67 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 627729f..e230f4f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2723,17 +2723,14 @@ static int write_cache_pages_da(struct address_space *mapping,
struct mpage_da_data *mpd,
pgoff_t *done_index)
{
- struct inode *inode = mpd->inode;
- struct buffer_head *bh, *head;
- sector_t logical;
- int ret = 0;
- int done = 0;
- struct pagevec pvec;
- unsigned nr_pages;
- pgoff_t index;
- pgoff_t end; /* Inclusive */
- long nr_to_write = wbc->nr_to_write;
- int tag;
+ struct buffer_head *bh, *head;
+ struct inode *inode = mpd->inode;
+ struct pagevec pvec;
+ unsigned int nr_pages;
+ sector_t logical;
+ pgoff_t index, end;
+ long nr_to_write = wbc->nr_to_write;
+ int i, tag, ret = 0;

pagevec_init(&pvec, 0);
index = wbc->range_start >> PAGE_CACHE_SHIFT;
@@ -2745,13 +2742,11 @@ static int write_cache_pages_da(struct address_space *mapping,
tag = PAGECACHE_TAG_DIRTY;

*done_index = index;
- while (!done && (index <= end)) {
- int i;
-
+ while (index <= end) {
nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag,
min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1);
if (nr_pages == 0)
- break;
+ return 0;

for (i = 0; i < nr_pages; i++) {
struct page *page = pvec.pages[i];
@@ -2763,47 +2758,37 @@ static int write_cache_pages_da(struct address_space *mapping,
* mapping. However, page->index will not change
* because we have a reference on the page.
*/
- if (page->index > end) {
- done = 1;
- break;
- }
+ if (page->index > end)
+ goto out;

*done_index = page->index + 1;

lock_page(page);

/*
- * Page truncated or invalidated. We can freely skip it
- * then, even for data integrity operations: the page
- * has disappeared concurrently, so there could be no
- * real expectation of this data interity operation
- * even if there is now a new, dirty page at the same
- * pagecache address.
+ * If the page is no longer dirty, or its
+ * mapping no longer corresponds to inode we
+ * are writing (which means it has been
+ * truncated or invalidated), or the page is
+ * already under writeback and we are not
+ * doing a data integrity writeback, skip the page
*/
- if (unlikely(page->mapping != mapping)) {
-continue_unlock:
+ if (!PageDirty(page) ||
+ (PageWriteback(page) &&
+ (wbc->sync_mode == WB_SYNC_NONE)) ||
+ unlikely(page->mapping != mapping)) {
+ continue_unlock:
unlock_page(page);
continue;
}

- if (!PageDirty(page)) {
- /* someone wrote it for us */
- goto continue_unlock;
- }
-
- if (PageWriteback(page)) {
- if (wbc->sync_mode != WB_SYNC_NONE)
- wait_on_page_writeback(page);
- else
- goto continue_unlock;
- }
+ if (PageWriteback(page))
+ wait_on_page_writeback(page);

BUG_ON(PageWriteback(page));
if (!clear_page_dirty_for_io(page))
goto continue_unlock;

- /* BEGIN __mpage_da_writepage */
-
/*
* Can we merge this page to current extent?
*/
@@ -2820,8 +2805,7 @@ continue_unlock:
*/
redirty_page_for_writepage(wbc, page);
unlock_page(page);
- ret = MPAGE_DA_EXTENT_TAIL;
- goto out;
+ goto ret_extent_tail;
}

/*
@@ -2838,15 +2822,15 @@ continue_unlock:
(PAGE_CACHE_SHIFT - inode->i_blkbits);

if (!page_has_buffers(page)) {
- mpage_add_bh_to_extent(mpd, logical, PAGE_CACHE_SIZE,
+ mpage_add_bh_to_extent(mpd, logical,
+ PAGE_CACHE_SIZE,
(1 << BH_Dirty) | (1 << BH_Uptodate));
- if (mpd->io_done) {
- ret = MPAGE_DA_EXTENT_TAIL;
- goto out;
- }
+ if (mpd->io_done)
+ goto ret_extent_tail;
} else {
/*
- * Page with regular buffer heads, just add all dirty ones
+ * Page with regular buffer heads,
+ * just add all dirty ones
*/
head = page_buffers(page);
bh = head;
@@ -2862,18 +2846,19 @@ continue_unlock:
mpage_add_bh_to_extent(mpd, logical,
bh->b_size,
bh->b_state);
- if (mpd->io_done) {
- ret = MPAGE_DA_EXTENT_TAIL;
- goto out;
- }
+ if (mpd->io_done)
+ goto ret_extent_tail;
} else if (buffer_dirty(bh) && (buffer_mapped(bh))) {
/*
- * mapped dirty buffer. We need to update
- * the b_state because we look at
- * b_state in mpage_da_map_blocks. We don't
- * update b_size because if we find an
- * unmapped buffer_head later we need to
- * use the b_state flag of that buffer_head.
+ * mapped dirty buffer. We need
+ * to update the b_state
+ * because we look at b_state
+ * in mpage_da_map_blocks. We
+ * don't update b_size because
+ * if we find an unmapped
+ * buffer_head later we need to
+ * use the b_state flag of that
+ * buffer_head.
*/
if (mpd->b_size == 0)
mpd->b_state = bh->b_state & BH_FLAGS;
@@ -2882,14 +2867,10 @@ continue_unlock:
} while ((bh = bh->b_this_page) != head);
}

- ret = 0;
-
- /* END __mpage_da_writepage */
-
if (nr_to_write > 0) {
nr_to_write--;
if (nr_to_write == 0 &&
- wbc->sync_mode == WB_SYNC_NONE) {
+ wbc->sync_mode == WB_SYNC_NONE)
/*
* We stop writing back only if we are
* not doing integrity sync. In case of
@@ -2900,15 +2881,15 @@ continue_unlock:
* pages, but have not synced all of the
* old dirty pages.
*/
- done = 1;
- break;
- }
+ goto out;
}
}
pagevec_release(&pvec);
cond_resched();
}
- return ret;
+ return 0;
+ret_extent_tail:
+ ret = MPAGE_DA_EXTENT_TAIL;
out:
pagevec_release(&pvec);
cond_resched();
--
1.7.3.1


2011-02-13 00:57:29

by Theodore Ts'o

Subject: [PATCH,RFC 6/7] ext4: move setup of the mpd structure to write_cache_pages_da()

Move the initialization of all of the fields of the mpd structure to
write_cache_pages_da(). This simplifies the code considerably.

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/inode.c | 29 +++++++----------------------
1 files changed, 7 insertions(+), 22 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2ac64e3..235a90e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2714,7 +2714,8 @@ static int ext4_da_writepages_trans_blocks(struct inode *inode)
/*
* write_cache_pages_da - walk the list of dirty pages of the given
* address space and accumulate pages that need writing, and call
- * mpage_da_map_and_submit to map the pages and then write them.
+ * mpage_da_map_and_submit to map a single contiguous memory region
+ * and then write them.
*/
static int write_cache_pages_da(struct address_space *mapping,
struct writeback_control *wbc,
@@ -2722,7 +2723,7 @@ static int write_cache_pages_da(struct address_space *mapping,
pgoff_t *done_index)
{
struct buffer_head *bh, *head;
- struct inode *inode = mpd->inode;
+ struct inode *inode = mapping->host;
struct pagevec pvec;
unsigned int nr_pages;
sector_t logical;
@@ -2730,6 +2731,9 @@ static int write_cache_pages_da(struct address_space *mapping,
long nr_to_write = wbc->nr_to_write;
int i, tag, ret = 0;

+ memset(mpd, 0, sizeof(struct mpage_da_data));
+ mpd->wbc = wbc;
+ mpd->inode = inode;
pagevec_init(&pvec, 0);
index = wbc->range_start >> PAGE_CACHE_SHIFT;
end = wbc->range_end >> PAGE_CACHE_SHIFT;
@@ -2794,16 +2798,8 @@ static int write_cache_pages_da(struct address_space *mapping,

BUG_ON(PageWriteback(page));

- if (mpd->next_page != page->index) {
- /*
- * Start next extent of pages and blocks
- */
+ if (mpd->next_page != page->index)
mpd->first_page = page->index;
- mpd->b_size = 0;
- mpd->b_state = 0;
- mpd->b_blocknr = 0;
- }
-
mpd->next_page = page->index + 1;
logical = (sector_t) page->index <<
(PAGE_CACHE_SHIFT - inode->i_blkbits);
@@ -2975,9 +2971,6 @@ static int ext4_da_writepages(struct address_space *mapping,
wbc->nr_to_write = desired_nr_to_write;
}

- mpd.wbc = wbc;
- mpd.inode = mapping->host;

2011-02-13 00:57:24

by Theodore Ts'o

Subject: [PATCH,RFC 7/7] ext4: move ext4_journal_start/stop to mpage_da_map_and_submit()

Previously, ext4_da_writepages() was responsible for calling
ext4_journal_start() and ext4_journal_stop(). If the blocks had
already been allocated (we don't support journal=data in
ext4_da_writepages), then there's no need to start a new journal
handle.

By moving ext4_journal_start/stop calls to mpage_da_map_and_submit()
we should significantly reduce the CPU usage (and cache line bouncing)
if the journal is enabled. This should (hopefully!) be especially
noticeable on large SMP systems.
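
The effect can be sketched with a small userspace model that counts how
often a handle would be started. Everything here is hypothetical
(struct extent_state, map_and_submit() and the counters); in
particular, the single needs_allocation flag is an assumed
simplification of the buffer-state test in mpage_da_map_and_submit().

```c
#include <assert.h>
#include <stdbool.h>

static int handles_started;	/* models calls to ext4_journal_start() */
static int handles_stopped;	/* models calls to ext4_journal_stop() */
static int ios_submitted;	/* models calls to mpage_da_submit_io() */

/* Hypothetical summary of an accumulated extent. */
struct extent_state {
	unsigned long size;	/* bytes accumulated; 0 means nothing */
	bool needs_allocation;	/* assumption: delayed or unwritten blocks */
};

static void map_and_submit(const struct extent_state *e)
{
	if (e->size == 0 || !e->needs_allocation)
		goto submit_io;	/* fast path: no journal handle at all */

	handles_started++;
	/* ... block allocation would happen here ... */
	handles_stopped++;
	assert(handles_started == handles_stopped);

submit_io:
	ios_submitted++;
}
```

In this model, extents whose blocks are already mapped reach the I/O
submission step without ever touching the journal counters.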

Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 3 +-
fs/ext4/inode.c | 125 ++++++++++++++++++++++++-------------------------------
2 files changed, 56 insertions(+), 72 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 3aa0b72..be5c9e7 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -164,7 +164,8 @@ struct mpage_da_data {
unsigned long b_state; /* state of the extent */
unsigned long first_page, next_page; /* extent of pages */
struct writeback_control *wbc;
- int io_done;
+ int io_done:1;
+ int stop_writepages:1;
int pages_written;
int retval;
};
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 235a90e..ad1dc38 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2225,12 +2225,13 @@ static void ext4_print_free_blocks(struct inode *inode)
*/
static void mpage_da_map_and_submit(struct mpage_da_data *mpd)
{
- int err, blks, get_blocks_flags;
+ int err, blks, get_blocks_flags, needed_blocks;
struct ext4_map_blocks map, *mapp = NULL;
sector_t next = mpd->b_blocknr;
unsigned max_blocks = mpd->b_size >> mpd->inode->i_blkbits;
loff_t disksize = EXT4_I(mpd->inode)->i_disksize;
- handle_t *handle = NULL;
+ struct inode *inode = mpd->inode;
+ handle_t *handle;

/*
* If the blocks are mapped already, or we couldn't accumulate
@@ -2242,8 +2243,28 @@ static void mpage_da_map_and_submit(struct mpage_da_data *mpd)
!(mpd->b_state & (1 << BH_Unwritten))))
goto submit_io;

- handle = ext4_journal_current_handle();
- BUG_ON(!handle);
+ /*
+ * Calculate the number of journal credits needed. In the
+ * non-extent case, the journal credits needed to insert
+ * nrblocks contiguous blocks is dependent on number of
+ * contiguous blocks. So we will limit this value to a sane
+ * value.
+ */
+ needed_blocks = EXT4_I(inode)->i_reserved_data_blocks;
+ if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) &&
+ (needed_blocks > EXT4_MAX_TRANS_DATA))
+ needed_blocks = EXT4_MAX_TRANS_DATA;
+ needed_blocks = ext4_chunk_trans_blocks(inode, needed_blocks);
+
+ /* start a new transaction */
+ handle = ext4_journal_start(inode, needed_blocks);
+ if (IS_ERR(handle)) {
+ ext4_msg(inode->i_sb, KERN_CRIT, "%s: jbd2_start: "
+ "%ld pages, ino %lu; err %ld", __func__,
+ mpd->wbc->nr_to_write, inode->i_ino, PTR_ERR(handle));
+ mpd->stop_writepages = 1;
+ goto submit_io;
+ }

/*
* Call ext4_map_blocks() to allocate any delayed allocation
@@ -2266,15 +2287,16 @@ static void mpage_da_map_and_submit(struct mpage_da_data *mpd)
map.m_lblk = next;
map.m_len = max_blocks;
get_blocks_flags = EXT4_GET_BLOCKS_CREATE;
- if (ext4_should_dioread_nolock(mpd->inode))
+ if (ext4_should_dioread_nolock(inode))
get_blocks_flags |= EXT4_GET_BLOCKS_IO_CREATE_EXT;
if (mpd->b_state & (1 << BH_Delay))
get_blocks_flags |= EXT4_GET_BLOCKS_DELALLOC_RESERVE;

- blks = ext4_map_blocks(handle, mpd->inode, &map, get_blocks_flags);
+ blks = ext4_map_blocks(handle, inode, &map, get_blocks_flags);
if (blks < 0) {
- struct super_block *sb = mpd->inode->i_sb;
+ struct super_block *sb = inode->i_sb;

+ ext4_journal_stop(handle);
err = blks;
/*
* If get block returns EAGAIN or ENOSPC and there
@@ -2301,32 +2323,32 @@ static void mpage_da_map_and_submit(struct mpage_da_data *mpd)
ext4_msg(sb, KERN_CRIT,
"delayed block allocation failed for inode %lu "
"at logical offset %llu with max blocks %zd "
- "with error %d", mpd->inode->i_ino,
+ "with error %d", inode->i_ino,
(unsigned long long) next,
- mpd->b_size >> mpd->inode->i_blkbits, err);
+ mpd->b_size >> inode->i_blkbits, err);
ext4_msg(sb, KERN_CRIT,
"This should not happen!! Data will be lost\n");
if (err == -ENOSPC)
- ext4_print_free_blocks(mpd->inode);
+ ext4_print_free_blocks(inode);
}
/* invalidate all the pages */
ext4_da_block_invalidatepages(mpd, next,
- mpd->b_size >> mpd->inode->i_blkbits);
+ mpd->b_size >> inode->i_blkbits);
return;
}
BUG_ON(blks == 0);

mapp = &map;
if (map.m_flags & EXT4_MAP_NEW) {
- struct block_device *bdev = mpd->inode->i_sb->s_bdev;
+ struct block_device *bdev = inode->i_sb->s_bdev;
int i;

for (i = 0; i < map.m_len; i++)
unmap_underlying_metadata(bdev, map.m_pblk + i);
}

- if (ext4_should_order_data(mpd->inode)) {
- err = ext4_jbd2_file_inode(handle, mpd->inode);
+ if (ext4_should_order_data(inode)) {
+ err = ext4_jbd2_file_inode(handle, inode);
if (err)
/* This only happens if the journal is aborted */
return;
@@ -2335,19 +2357,24 @@ static void mpage_da_map_and_submit(struct mpage_da_data *mpd)
/*
* Update on-disk size along with block allocation.
*/
- disksize = ((loff_t) next + blks) << mpd->inode->i_blkbits;
- if (disksize > i_size_read(mpd->inode))
- disksize = i_size_read(mpd->inode);
- if (disksize > EXT4_I(mpd->inode)->i_disksize) {
- ext4_update_i_disksize(mpd->inode, disksize);
- err = ext4_mark_inode_dirty(handle, mpd->inode);
+ disksize = ((loff_t) next + blks) << inode->i_blkbits;
+ if (disksize > i_size_read(inode))
+ disksize = i_size_read(inode);
+ if (disksize > EXT4_I(inode)->i_disksize) {
+ ext4_update_i_disksize(inode, disksize);
+ err = ext4_mark_inode_dirty(handle, inode);
if (err)
- ext4_error(mpd->inode->i_sb,
+ ext4_error(inode->i_sb,
"Failed to mark inode %lu dirty",
- mpd->inode->i_ino);
+ inode->i_ino);
}
+ ext4_journal_stop(handle);

submit_io:
+ /*
+ * This also doubles as the the way we unlock all of the pages
+ * in case of an error. Hacky, but it works...
+ */
mpage_da_submit_io(mpd, mapp);
mpd->io_done = 1;
}
@@ -2687,31 +2714,6 @@ static int ext4_writepage(struct page *page,
}

/*
- * This is called via ext4_da_writepages() to
- * calulate the total number of credits to reserve to fit
- * a single extent allocation into a single transaction,
- * ext4_da_writpeages() will loop calling this before
- * the block allocation.
- */
-
-static int ext4_da_writepages_trans_blocks(struct inode *inode)
-{
- int max_blocks = EXT4_I(inode)->i_reserved_data_blocks;
-
- /*
- * With non-extent format the journal credit needed to
- * insert nrblocks contiguous block is dependent on
- * number of contiguous block. So we will limit
- * number of contiguous block to a sane value
- */
- if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) &&
- (max_blocks > EXT4_MAX_TRANS_DATA))
- max_blocks = EXT4_MAX_TRANS_DATA;
-
- return ext4_chunk_trans_blocks(inode, max_blocks);
-}
-
-/*
* write_cache_pages_da - walk the list of dirty pages of the given
* address space and accumulate pages that need writing, and call
* mpage_da_map_and_submit to map a single contiguous memory region
@@ -2885,13 +2887,12 @@ static int ext4_da_writepages(struct address_space *mapping,
{
pgoff_t index;
int range_whole = 0;
- handle_t *handle = NULL;
struct mpage_da_data mpd;
struct inode *inode = mapping->host;
int pages_written = 0;
unsigned int max_pages;
int range_cyclic, cycled = 1, io_done = 0;
- int needed_blocks, ret = 0;
+ int ret = 0;
long desired_nr_to_write, nr_to_writebump = 0;
loff_t range_start = wbc->range_start;
struct ext4_sb_info *sbi = EXT4_SB(mapping->host->i_sb);
@@ -2899,6 +2900,7 @@ static int ext4_da_writepages(struct address_space *mapping,
pgoff_t end;

trace_ext4_da_writepages(inode, wbc);
+ BUG_ON(ext4_should_journal_data(inode));

/*
* No pages to write? This is mainly a kludge to avoid starting
@@ -2976,28 +2978,8 @@ retry:
tag_pages_for_writeback(mapping, index, end);

while (!ret && wbc->nr_to_write > 0) {
-
/*
- * we insert one extent at a time. So we need
- * credit needed for single extent allocation.
- * journalled mode is currently not supported
- * by delalloc
- */
- BUG_ON(ext4_should_journal_data(inode));
- needed_blocks = ext4_da_writepages_trans_blocks(inode);
-
- /* start a new transaction*/
- handle = ext4_journal_start(inode, needed_blocks);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- ext4_msg(inode->i_sb, KERN_CRIT, "%s: jbd2_start: "
- "%ld pages, ino %lu; err %d", __func__,
- wbc->nr_to_write, inode->i_ino, ret);
- goto out_writepages;
- }

2011-02-13 01:30:20

by Josef Bacik

Subject: Re: [PATCH,RFC 1/7] ext4: fold __mpage_da_writepage() into write_cache_pages_da()

On Sat, Feb 12, 2011 at 07:15:51PM -0500, Theodore Ts'o wrote:
> Fold the __mpage_da_writepage() function into write_cache_pages_da().
> This will give us opportunities to clean up and simplify the resulting
> code.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/ext4/inode.c | 206 ++++++++++++++++++++++++-------------------------------
> 1 files changed, 91 insertions(+), 115 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 9f7f9e4..627729f 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2438,102 +2438,6 @@ static int ext4_bh_delay_or_unwritten(handle_t *handle, struct buffer_head *bh)
> }
>
> /*
> - * __mpage_da_writepage - finds extent of pages and blocks
> - *
> - * @page: page to consider
> - * @wbc: not used, we just follow rules
> - * @data: context
> - *
> - * The function finds extents of pages and scan them for all blocks.
> - */
> -static int __mpage_da_writepage(struct page *page,
> - struct writeback_control *wbc,
> - struct mpage_da_data *mpd)
> -{
> - struct inode *inode = mpd->inode;
> - struct buffer_head *bh, *head;
> - sector_t logical;
> -
> - /*
> - * Can we merge this page to current extent?
> - */
> - if (mpd->next_page != page->index) {
> - /*
> - * Nope, we can't. So, we map non-allocated blocks
> - * and start IO on them
> - */
> - if (mpd->next_page != mpd->first_page) {
> - mpage_da_map_and_submit(mpd);
> - /*
> - * skip rest of the page in the page_vec
> - */
> - redirty_page_for_writepage(wbc, page);
> - unlock_page(page);
> - return MPAGE_DA_EXTENT_TAIL;
> - }
> -
> - /*
> - * Start next extent of pages ...
> - */
> - mpd->first_page = page->index;
> -
> - /*
> - * ... and blocks
> - */
> - mpd->b_size = 0;
> - mpd->b_state = 0;
> - mpd->b_blocknr = 0;
> - }
> -
> - mpd->next_page = page->index + 1;
> - logical = (sector_t) page->index <<
> - (PAGE_CACHE_SHIFT - inode->i_blkbits);
> -
> - if (!page_has_buffers(page)) {
> - mpage_add_bh_to_extent(mpd, logical, PAGE_CACHE_SIZE,
> - (1 << BH_Dirty) | (1 << BH_Uptodate));
> - if (mpd->io_done)
> - return MPAGE_DA_EXTENT_TAIL;
> - } else {
> - /*
> - * Page with regular buffer heads, just add all dirty ones
> - */
> - head = page_buffers(page);
> - bh = head;
> - do {
> - BUG_ON(buffer_locked(bh));
> - /*
> - * We need to try to allocate
> - * unmapped blocks in the same page.
> - * Otherwise we won't make progress
> - * with the page in ext4_writepage
> - */
> - if (ext4_bh_delay_or_unwritten(NULL, bh)) {
> - mpage_add_bh_to_extent(mpd, logical,
> - bh->b_size,
> - bh->b_state);
> - if (mpd->io_done)
> - return MPAGE_DA_EXTENT_TAIL;
> - } else if (buffer_dirty(bh) && (buffer_mapped(bh))) {
> - /*
> - * mapped dirty buffer. We need to update
> - * the b_state because we look at
> - * b_state in mpage_da_map_blocks. We don't
> - * update b_size because if we find an
> - * unmapped buffer_head later we need to
> - * use the b_state flag of that buffer_head.
> - */
> - if (mpd->b_size == 0)
> - mpd->b_state = bh->b_state & BH_FLAGS;
> - }
> - logical++;
> - } while ((bh = bh->b_this_page) != head);
> - }
> -
> - return 0;
> -}
> -
> -/*
> * This is a special get_blocks_t callback which is used by
> * ext4_da_write_begin(). It will either return mapped block or
> * reserve space for a single block.
> @@ -2811,18 +2715,17 @@ static int ext4_da_writepages_trans_blocks(struct inode *inode)
>
> /*
> * write_cache_pages_da - walk the list of dirty pages of the given
> - * address space and call the callback function (which usually writes
> - * the pages).
> - *
> - * This is a forked version of write_cache_pages(). Differences:
> - * Range cyclic is ignored.
> - * no_nrwrite_index_update is always presumed true
> + * address space and accumulate pages that need writing, and call
> + * mpage_da_map_and_submit to map the pages and then write them.
> */
> static int write_cache_pages_da(struct address_space *mapping,
> struct writeback_control *wbc,
> struct mpage_da_data *mpd,
> pgoff_t *done_index)
> {
> + struct inode *inode = mpd->inode;
> + struct buffer_head *bh, *head;
> + sector_t logical;
> int ret = 0;
> int done = 0;
> struct pagevec pvec;
> @@ -2899,17 +2802,90 @@ continue_unlock:
> if (!clear_page_dirty_for_io(page))
> goto continue_unlock;
>
> - ret = __mpage_da_writepage(page, wbc, mpd);
> - if (unlikely(ret)) {
> - if (ret == AOP_WRITEPAGE_ACTIVATE) {
> + /* BEGIN __mpage_da_writepage */
> +
> + /*
> + * Can we merge this page to current extent?
> + */
> + if (mpd->next_page != page->index) {
> + /*
> + * Nope, we can't. So, we map
> + * non-allocated blocks and start IO
> + * on them
> + */
> + if (mpd->next_page != mpd->first_page) {
> + mpage_da_map_and_submit(mpd);
> + /*
> + * skip rest of the page in the page_vec
> + */
> + redirty_page_for_writepage(wbc, page);
> unlock_page(page);
> - ret = 0;
> - } else {
> - done = 1;
> - break;
> + ret = MPAGE_DA_EXTENT_TAIL;
> + goto out;
> + }
> +
> + /*
> + * Start next extent of pages and blocks
> + */
> + mpd->first_page = page->index;
> + mpd->b_size = 0;
> + mpd->b_state = 0;
> + mpd->b_blocknr = 0;
> + }
> +
> + mpd->next_page = page->index + 1;
> + logical = (sector_t) page->index <<
> + (PAGE_CACHE_SHIFT - inode->i_blkbits);
> +
> + if (!page_has_buffers(page)) {
> + mpage_add_bh_to_extent(mpd, logical, PAGE_CACHE_SIZE,
> + (1 << BH_Dirty) | (1 << BH_Uptodate));
> + if (mpd->io_done) {
> + ret = MPAGE_DA_EXTENT_TAIL;
> + goto out;
> }
> + } else {
> + /*
> + * Page with regular buffer heads, just add all dirty ones
> + */
> + head = page_buffers(page);
> + bh = head;
> + do {
> + BUG_ON(buffer_locked(bh));
> + /*
> + * We need to try to allocate
> + * unmapped blocks in the same page.
> + * Otherwise we won't make progress
> + * with the page in ext4_writepage
> + */
> + if (ext4_bh_delay_or_unwritten(NULL, bh)) {
> + mpage_add_bh_to_extent(mpd, logical,
> + bh->b_size,
> + bh->b_state);
> + if (mpd->io_done) {
> + ret = MPAGE_DA_EXTENT_TAIL;
> + goto out;
> + }
> + } else if (buffer_dirty(bh) && (buffer_mapped(bh))) {
> + /*
> + * mapped dirty buffer. We need to update
> + * the b_state because we look at
> + * b_state in mpage_da_map_blocks. We don't
> + * update b_size because if we find an
> + * unmapped buffer_head later we need to
> + * use the b_state flag of that buffer_head.
> + */
> + if (mpd->b_size == 0)
> + mpd->b_state = bh->b_state & BH_FLAGS;
> + }
> + logical++;
> + } while ((bh = bh->b_this_page) != head);
> }
>
> + ret = 0;
> +
> + /* END __mpage_da_writepage */
> +
> if (nr_to_write > 0) {
> nr_to_write--;
> if (nr_to_write == 0 &&
> @@ -2933,6 +2909,10 @@ continue_unlock:
> cond_resched();
> }
> return ret;
> +out:
> + pagevec_release(&pvec);
> + cond_resched();
> + return ret;
> }

Do we really need the cond_resched() here? Seems like it will just add
unwanted/unneeded latencies. Thanks,

Josef

2011-02-13 01:35:56

by Josef Bacik

Subject: Re: [PATCH,RFC 2/7] ext4: simple cleanups to write_cache_pages_da()

On Sat, Feb 12, 2011 at 07:15:52PM -0500, Theodore Ts'o wrote:
> Eliminate duplicate code, unneeded variables, etc., to make it easier
> to understand the code. No behavioral changes were made in this patch.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/ext4/inode.c | 115 +++++++++++++++++++++++--------------------------------
> 1 files changed, 48 insertions(+), 67 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 627729f..e230f4f 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2723,17 +2723,14 @@ static int write_cache_pages_da(struct address_space *mapping,
> struct mpage_da_data *mpd,
> pgoff_t *done_index)
> {
> - struct inode *inode = mpd->inode;
> - struct buffer_head *bh, *head;
> - sector_t logical;
> - int ret = 0;
> - int done = 0;
> - struct pagevec pvec;
> - unsigned nr_pages;
> - pgoff_t index;
> - pgoff_t end; /* Inclusive */
> - long nr_to_write = wbc->nr_to_write;
> - int tag;
> + struct buffer_head *bh, *head;
> + struct inode *inode = mpd->inode;
> + struct pagevec pvec;
> + unsigned int nr_pages;
> + sector_t logical;
> + pgoff_t index, end;
> + long nr_to_write = wbc->nr_to_write;
> + int i, tag, ret = 0;
>
> pagevec_init(&pvec, 0);
> index = wbc->range_start >> PAGE_CACHE_SHIFT;
> @@ -2745,13 +2742,11 @@ static int write_cache_pages_da(struct address_space *mapping,
> tag = PAGECACHE_TAG_DIRTY;
>
> *done_index = index;
> - while (!done && (index <= end)) {
> - int i;
> -
> + while (index <= end) {
> nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag,
> min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1);
> if (nr_pages == 0)
> - break;
> + return 0;
>
> for (i = 0; i < nr_pages; i++) {
> struct page *page = pvec.pages[i];
> @@ -2763,47 +2758,37 @@ static int write_cache_pages_da(struct address_space *mapping,
> * mapping. However, page->index will not change
> * because we have a reference on the page.
> */
> - if (page->index > end) {
> - done = 1;
> - break;
> - }
> + if (page->index > end)
> + goto out;
>
> *done_index = page->index + 1;
>
> lock_page(page);
>
> /*
> - * Page truncated or invalidated. We can freely skip it
> - * then, even for data integrity operations: the page
> - * has disappeared concurrently, so there could be no
> - * real expectation of this data interity operation
> - * even if there is now a new, dirty page at the same
> - * pagecache address.
> + * If the page is no longer dirty, or its
> + * mapping no longer corresponds to inode we
> + * are writing (which means it has been
> + * truncated or invalidated), or the page is
> + * already under writeback and we are not
> + * doing a data integrity writeback, skip the page
> */
> - if (unlikely(page->mapping != mapping)) {
> -continue_unlock:
> + if (!PageDirty(page) ||
> + (PageWriteback(page) &&
> + (wbc->sync_mode == WB_SYNC_NONE)) ||
> + unlikely(page->mapping != mapping)) {
> + continue_unlock:

Formatting is wrong here. Everything else looks fine. Thanks,

Josef

2011-02-13 01:39:44

by Josef Bacik

Subject: Re: [PATCH,RFC 3/7] ext4: clear the dirty bit for a page in writeback at the last minute

On Sat, Feb 12, 2011 at 07:15:53PM -0500, Theodore Ts'o wrote:
> Move when we call clear_page_dirty_for_io() to just before we actually
> write the page. This simplifies the code somewhat, and avoids marking
> pages as clean and then needing to remark them as dirty later.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/ext4/inode.c | 28 +++++++++++-----------------
> 1 files changed, 11 insertions(+), 17 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index e230f4f..3eca465 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2060,7 +2060,7 @@ static int mpage_da_submit_io(struct mpage_da_data *mpd,
> if (nr_pages == 0)
> break;
> for (i = 0; i < nr_pages; i++) {
> - int commit_write = 0, redirty_page = 0;
> + int commit_write = 0, skip_page = 0;
> struct page *page = pvec.pages[i];
>
> index = page->index;
> @@ -2086,14 +2086,12 @@ static int mpage_da_submit_io(struct mpage_da_data *mpd,
> * If the page does not have buffers (for
> * whatever reason), try to create them using
> * __block_write_begin. If this fails,
> - * redirty the page and move on.
> + * skip the page and move on.
> */
> if (!page_has_buffers(page)) {
> if (__block_write_begin(page, 0, len,
> noalloc_get_block_write)) {
> - redirty_page:
> - redirty_page_for_writepage(mpd->wbc,
> - page);
> + skip_page:

Hmm, so it looks like it's been done like this before. I guess if that's the
way you want it then it's OK, I just find it hard to read. Other than that this
looks good. Thanks,

Josef

2011-02-13 01:41:49

by Josef Bacik

Subject: Re: [PATCH,RFC 4/7] ext4: remove page_skipped hackery in ext4_da_writepages()

On Sat, Feb 12, 2011 at 07:15:54PM -0500, Theodore Ts'o wrote:
> Because the ext4 page writeback codepath had been prematurely calling
> clear_page_dirty_for_io(), if it turned out that a particular page
> couldn't be written out during a particular pass of
> write_cache_pages_da(), the page would have to get redirtied by
> calling redirty_page_for_writepage(). Not only was this wasted work,
> but redirty_page_for_writepage() would increment wbc->pages_skipped to
> signal to writeback_sb_inodes() that buffers were locked, and that it
> should skip this inode until later.
>
> Since this signal was incorrect in ext4's case --- which was caused by
> ext4's historically incorrect use of write_cache_pages() ---
> ext4_da_writepages() saved and restored wbc->pages_skipped to avoid
> confusing writeback_sb_inodes().
>
> Now that we've fixed ext4 to call clear_page_dirty_for_io() right
> before initiating the page I/O, we can nuke the page_skipped
> save/restore hackery, and breathe a sigh of relief.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/ext4/inode.c | 10 ----------
> 1 files changed, 0 insertions(+), 10 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 3eca465..6dfdc0e 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2900,7 +2900,6 @@ static int ext4_da_writepages(struct address_space *mapping,
> struct mpage_da_data mpd;
> struct inode *inode = mapping->host;
> int pages_written = 0;
> - long pages_skipped;
> unsigned int max_pages;
> int range_cyclic, cycled = 1, io_done = 0;
> int needed_blocks, ret = 0;
> @@ -2986,8 +2985,6 @@ static int ext4_da_writepages(struct address_space *mapping,
> mpd.wbc = wbc;
> mpd.inode = mapping->host;
>
> - pages_skipped = wbc->pages_skipped;
> -
> retry:
> if (wbc->sync_mode == WB_SYNC_ALL)
> tag_pages_for_writeback(mapping, index, end);
> @@ -3047,7 +3044,6 @@ retry:
> * and try again
> */
> jbd2_journal_force_commit_nested(sbi->s_journal);
> - wbc->pages_skipped = pages_skipped;
> ret = 0;
> } else if (ret == MPAGE_DA_EXTENT_TAIL) {
> /*
> @@ -3055,7 +3051,6 @@ retry:
> * rest of the pages
> */
> pages_written += mpd.pages_written;
> - wbc->pages_skipped = pages_skipped;
> ret = 0;
> io_done = 1;
> } else if (wbc->nr_to_write)
> @@ -3073,11 +3068,6 @@ retry:
> wbc->range_end = mapping->writeback_index - 1;
> goto retry;
> }
> - if (pages_skipped != wbc->pages_skipped)
> - ext4_msg(inode->i_sb, KERN_CRIT,
> - "This should not happen leaving %s "
> - "with nr_to_write = %ld ret = %d",
> - __func__, wbc->nr_to_write, ret);
>
> /* Update index */
> wbc->range_cyclic = range_cyclic;

Looks good.

Josef

2011-02-13 01:42:58

by Josef Bacik

Subject: Re: [PATCH,RFC 5/7] ext4: don't lock the next page in write_cache_pages if not needed

On Sat, Feb 12, 2011 at 07:15:55PM -0500, Theodore Ts'o wrote:
> If we have accumulated a contiguous region of memory to be written
> out, and the next page cannot be added to this region, don't bother
> locking (and then unlocking) the page before writing out the memory. In
> the unlikely event that the next page was being written back by some
> other CPU, we can also skip waiting for that page to finish writeback.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/ext4/inode.c | 27 ++++++++++-----------------
> 1 files changed, 10 insertions(+), 17 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 6dfdc0e..2ac64e3 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2761,6 +2761,16 @@ static int write_cache_pages_da(struct address_space *mapping,
>
> *done_index = page->index + 1;
>
> + /*
> + * If we can't merge this page, and we have
> + * accumulated an contiguous region, write it
> + */
> + if ((mpd->next_page != page->index) &&
> + (mpd->next_page != mpd->first_page)) {
> + mpage_da_map_and_submit(mpd);
> + goto ret_extent_tail;
> + }
> +
> lock_page(page);
>
> /*
> @@ -2784,25 +2794,8 @@ static int write_cache_pages_da(struct address_space *mapping,
>
> BUG_ON(PageWriteback(page));
>
> - /*
> - * Can we merge this page to current extent?
> - */
> if (mpd->next_page != page->index) {
> /*
> - * Nope, we can't. So, we map
> - * non-allocated blocks and start IO
> - * on them
> - */
> - if (mpd->next_page != mpd->first_page) {
> - mpage_da_map_and_submit(mpd);
> - /*
> - * skip rest of the page in the page_vec
> - */
> - unlock_page(page);
> - goto ret_extent_tail;
> - }
> -
> - /*
> * Start next extent of pages and blocks
> */
> mpd->first_page = page->index;

Looks good.

Josef

2011-02-13 01:44:52

by Josef Bacik

Subject: Re: [PATCH,RFC 6/7] ext4: move setup of the mpd structure to write_cache_pages_da()

On Sat, Feb 12, 2011 at 07:15:56PM -0500, Theodore Ts'o wrote:
> Move the initialization of all of the fields of the mpd structure to
> write_cache_pages_da(). This simplifies the code considerably.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
> ---
> fs/ext4/inode.c | 29 +++++++----------------------
> 1 files changed, 7 insertions(+), 22 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 2ac64e3..235a90e 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2714,7 +2714,8 @@ static int ext4_da_writepages_trans_blocks(struct inode *inode)
> /*
> * write_cache_pages_da - walk the list of dirty pages of the given
> * address space and accumulate pages that need writing, and call
> - * mpage_da_map_and_submit to map the pages and then write them.
> + * mpage_da_map_and_submit to map a single contiguous memory region
> + * and then write them.
> */
> static int write_cache_pages_da(struct address_space *mapping,
> struct writeback_control *wbc,
> @@ -2722,7 +2723,7 @@ static int write_cache_pages_da(struct address_space *mapping,
> pgoff_t *done_index)
> {
> struct buffer_head *bh, *head;
> - struct inode *inode = mpd->inode;
> + struct inode *inode = mapping->host;
> struct pagevec pvec;
> unsigned int nr_pages;
> sector_t logical;
> @@ -2730,6 +2731,9 @@ static int write_cache_pages_da(struct address_space *mapping,
> long nr_to_write = wbc->nr_to_write;
> int i, tag, ret = 0;
>
> + memset(mpd, 0, sizeof(struct mpage_da_data));
> + mpd->wbc = wbc;
> + mpd->inode = inode;
> pagevec_init(&pvec, 0);
> index = wbc->range_start >> PAGE_CACHE_SHIFT;
> end = wbc->range_end >> PAGE_CACHE_SHIFT;
> @@ -2794,16 +2798,8 @@ static int write_cache_pages_da(struct address_space *mapping,
>
> BUG_ON(PageWriteback(page));
>
> - if (mpd->next_page != page->index) {
> - /*
> - * Start next extent of pages and blocks
> - */
> + if (mpd->next_page != page->index)
> mpd->first_page = page->index;
> - mpd->b_size = 0;
> - mpd->b_state = 0;
> - mpd->b_blocknr = 0;
> - }
> -
> mpd->next_page = page->index + 1;
> logical = (sector_t) page->index <<
> (PAGE_CACHE_SHIFT - inode->i_blkbits);
> @@ -2975,9 +2971,6 @@ static int ext4_da_writepages(struct address_space *mapping,
> wbc->nr_to_write = desired_nr_to_write;
> }
>
> - mpd.wbc = wbc;
> - mpd.inode = mapping->host;
> -
> retry:
> if (wbc->sync_mode == WB_SYNC_ALL)
> tag_pages_for_writeback(mapping, index, end);
> @@ -3008,14 +3001,6 @@ retry:
> * contiguous region of logical blocks that need
> * blocks to be allocated by ext4 and submit them.
> */
> - mpd.b_size = 0;
> - mpd.b_state = 0;
> - mpd.b_blocknr = 0;
> - mpd.first_page = 0;
> - mpd.next_page = 0;
> - mpd.io_done = 0;
> - mpd.pages_written = 0;
> - mpd.retval = 0;
> ret = write_cache_pages_da(mapping, wbc, &mpd, &done_index);
> /*
> * If we have a contiguous extent of pages and we

Looks good,

Josef

2011-02-13 05:42:34

by Theodore Ts'o

Subject: Re: [PATCH,RFC 1/7] ext4: fold __mpage_da_writepage() into write_cache_pages_da()

On Sat, Feb 12, 2011 at 08:25:29PM -0500, Josef Bacik wrote:
> > +out:
> > + pagevec_release(&pvec);
> > + cond_resched();
> > + return ret;
> > }
>
> Do we really need the cond_resched() here? Seems like it will just add
> unwanted/unneeded latencies.

The cond_resched is from the original write_cache_pages(), and if you
follow the code movement, it goes all the way back to fs/mpage.c's
__mpage_writepages() from 2.6.11 (the beginning of time as far as the
Linux 2.6's git repository is concerned).

The basic idea is that given that writeback threads are basically
running in a tight loop trying to push out dirty pages, you need to
eventually give other processes a chance to run --- especially on a UP
system! I do wonder whether we are checking way too much, though.
The cond_resched() I'd be tempted to take out is not the one at the
end of the function, but the one at the end of the while loop.

That would allow us to complete the writeback for a particular
inode before letting another process run, which would gain some
efficiency at the cost of a bit more scheduling unfairness. But given
that a particular writeback call is capped at writing out a relatively
small amount of data anyway, that would seem to be OK.

But even XFS has a cond_resched in xfs_cluster_write() (in
fs/xfs/linux-2.6/xfs_aops.c) so I'd want to do a lot of thinking,
testing, and benchmarking before removing that call to cond_resched().
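
As a rough illustration of the two call sites being compared, here is a
userspace toy model, not kernel code. It assumes sched_yield() as a
stand-in for cond_resched(), and every name in it (wb_stats, yield_point,
writeback_model, BATCH) is hypothetical. It only counts how often the loop
would offer to give up the CPU under each placement.

```c
#include <assert.h>
#include <sched.h>

/* Arbitrary batch size for the illustration; not taken from the kernel. */
#define BATCH 14

struct wb_stats {
	long pages_written;
	long yields;
};

/* Stand-in for cond_resched(): offer the CPU to someone else, and count it. */
static void yield_point(struct wb_stats *st)
{
	sched_yield();
	st->yields++;
}

/*
 * "Write" nr_pages in batches.  If yield_per_batch is nonzero, yield after
 * every batch (the call at the end of the while loop).  Either way, yield
 * once before returning (the call at the end of the function).
 */
static struct wb_stats writeback_model(long nr_pages, int yield_per_batch)
{
	struct wb_stats st = { 0, 0 };
	long index = 0;

	assert(nr_pages >= 0);
	while (index < nr_pages) {
		long n = nr_pages - index;

		if (n > BATCH)
			n = BATCH;
		st.pages_written += n;	/* pretend the batch was submitted */
		index += n;
		if (yield_per_batch)
			yield_point(&st);
	}
	yield_point(&st);
	return st;
}
```

With 1024 pages, the per-batch variant reaches a yield point once per batch
plus once at the end, while the other variant reaches exactly one. Whether
the extra yield points cost anything measurable is the part that would need
real benchmarking.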

- Ted

2011-02-13 12:53:12

by Josef Bacik

Subject: Re: [PATCH,RFC 1/7] ext4: fold __mpage_da_writepage() into write_cache_pages_da()

On Sun, Feb 13, 2011 at 12:42:35AM -0500, Ted Ts'o wrote:
> On Sat, Feb 12, 2011 at 08:25:29PM -0500, Josef Bacik wrote:
> > > +out:
> > > + pagevec_release(&pvec);
> > > + cond_resched();
> > > + return ret;
> > > }
> >
> > Do we really need the cond_resched() here? Seems like it will just add
> > unwanted/unneeded latencies.
>
> The cond_resched is from the original write_cache_pages(), and if you
> follow the code movement, it goes all the way back to fs/mpage.c's
> __mpage_writepages() from 2.6.11 (the beginning of time as far as the
> Linux 2.6's git repository is concerned).
>
> The basic idea is that given that writeback threads are basically
> running in a tight loop trying to push out dirty pages, you need to
> eventually give other processes a chance to run --- especially on a UP
> system! I do wonder whether we are checking way too much, though.
> The cond_resched() I'd be tempted to take out is not the one at the
> end of the function, but the one at the end of the while loop.
>
> That would allow us to complete the writeback for a particular
> inode before letting another process run, which would gain some
> efficiency at the cost of a bit more scheduling unfairness. But given
> that a particular writeback call is capped at writing out a relatively
> small amount of data anyway, that would seem to be OK.
>
> But even XFS has a cond_resched in xfs_cluster_write() (in
> fs/xfs/linux-2.6/xfs_aops.c) so I'd want to do a lot of thinking,
> testing, and benchmarking before removing that call to cond_resched().
>

Ah I didn't look at anybody else. My thinking was we only really need it in one
place, and we have it in the while() loop. But you are right, it probably makes
more sense to drop the one in the while loop and then have it before we go back
to the main writeback code. Thanks,

Josef

2011-02-18 04:23:55

by Theodore Ts'o

Subject: Re: [PATCH,RFC 7/7] ext4: move ext4_journal_start/stop to mpage_da_map_and_submit()

On Sat, Feb 12, 2011 at 07:15:57PM -0500, Theodore Ts'o wrote:
> Previously, ext4_da_writepages() was responsible for calling
> ext4_journal_start() and ext4_journal_stop(). If the blocks had
> already been allocated (we don't support journal=data in
> ext4_da_writepages), then there's no need to start a new journal
> handle.
>
> By moving ext4_journal_start/stop calls to mpage_da_map_and_submit()
> we should significantly reduce the cpu usage (and cache line bouncing)
> if the journal is enabled. This should (hopefully!) be especially
> noticeable on large SMP systems.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>

Argh, it turns out this doesn't work. I was getting sporadic
deadlocks and I finally figured out the problem. If a process is
holding page locks, it can't call ext4_journal_start() safely in
data=ordered, since there's a chance that there won't be enough
transaction credits and a new transaction will be started. And at
that point, in data=ordered mode, we may end up calling
journal_submit_inode_data_buffers(), which could try to write back the
inode pages in question --- which are already locked.

This means that we need to start the journal handle long before we
know whether or not we really need it. Boo, hiss!
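
The ordering problem can be sketched with a single-threaded toy model.
This is an assumption-laden illustration, not the jbd2 code: toy_page,
toy_journal, toy_commit and toy_journal_start are all hypothetical names,
and where the real kernel would block forever the model returns -EDEADLK
so the cycle is visible.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_page {
	bool locked;
};

struct toy_journal {
	int credits_left;		/* credits left in the running transaction */
	int credits_per_txn;		/* credits a fresh transaction starts with */
	struct toy_page *ordered;	/* data pages a commit must write back first */
	int nr_ordered;
};

/*
 * Commit the running transaction.  In this model of data=ordered mode the
 * commit first writes back the inode's data pages, which means taking
 * each page lock.
 */
static int toy_commit(struct toy_journal *j)
{
	for (int i = 0; i < j->nr_ordered; i++)
		if (j->ordered[i].locked)
			return -EDEADLK;	/* the real code would wait here */
	j->credits_left = j->credits_per_txn;
	return 0;
}

/* Start a handle; if credits run short, a new transaction is needed. */
static int toy_journal_start(struct toy_journal *j, int needed)
{
	assert(needed <= j->credits_per_txn);
	if (j->credits_left < needed) {
		int err = toy_commit(j);

		if (err)
			return err;
	}
	j->credits_left -= needed;
	return 0;
}
```

Starting the handle before any page is locked always succeeds in this
model. Starting it while holding a page lock succeeds only as long as the
running transaction still has enough credits, which would be consistent
with the deadlocks being sporadic rather than constant.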

The only way to solve this problem is to do what I've been planning
for a while, which is to add support in ext4_map_blocks() for a
mode where it will allocate a region of blocks, but *not* update the
extent map. It will have to store the allocation in an in-memory
cache, so that if other CPUs try to request a logical block, they will
get the right answer. However, the actual on-disk extent map can't be
updated until *after* the data is safely written on disk (and the
pages can thus be unlocked).

Once we do that, we'll also be able to ditch ordered mode for good,
since it means that there won't be any chance of stale data being
revealed, without any of the performance disasters involved with
data=ordered mode.
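
The deferred-mapping idea above can be sketched as a small data
structure. This is only a guess at the shape of such a cache, written as
a userspace toy: none of these names (toy_extent, toy_map, toy_inode,
toy_alloc_pending, toy_map_block, toy_io_done) exist in ext4, and locking,
extent merging and error recovery are all left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EXTENTS 16

struct toy_extent {
	unsigned long lblk;	/* first logical block */
	unsigned long pblk;	/* first physical block */
	unsigned long len;	/* length in blocks */
};

struct toy_map {
	struct toy_extent ext[MAX_EXTENTS];
	size_t nr;
};

struct toy_inode {
	struct toy_map on_disk;	/* stands in for the journalled extent map */
	struct toy_map pending;	/* allocated, but data not yet on disk */
};

static bool map_lookup(const struct toy_map *m, unsigned long lblk,
		       unsigned long *pblk)
{
	for (size_t i = 0; i < m->nr; i++) {
		const struct toy_extent *e = &m->ext[i];

		if (lblk >= e->lblk && lblk - e->lblk < e->len) {
			*pblk = e->pblk + (lblk - e->lblk);
			return true;
		}
	}
	return false;
}

static int map_add(struct toy_map *m, struct toy_extent e)
{
	if (m->nr == MAX_EXTENTS)
		return -1;
	m->ext[m->nr++] = e;
	return 0;
}

/* Allocate blocks but record them only in memory. */
static int toy_alloc_pending(struct toy_inode *ino, unsigned long lblk,
			     unsigned long pblk, unsigned long len)
{
	struct toy_extent e = { lblk, pblk, len };

	assert(len > 0);
	return map_add(&ino->pending, e);
}

/* Lookups consult both maps, so other callers see pending allocations. */
static bool toy_map_block(const struct toy_inode *ino, unsigned long lblk,
			  unsigned long *pblk)
{
	return map_lookup(&ino->on_disk, lblk, pblk) ||
	       map_lookup(&ino->pending, lblk, pblk);
}

/* After the data I/O completes, publish the extent to the on-disk map. */
static int toy_io_done(struct toy_inode *ino, unsigned long lblk)
{
	for (size_t i = 0; i < ino->pending.nr; i++) {
		if (ino->pending.ext[i].lblk != lblk)
			continue;
		if (map_add(&ino->on_disk, ino->pending.ext[i]))
			return -1;
		ino->pending.ext[i] = ino->pending.ext[--ino->pending.nr];
		return 0;
	}
	return -1;
}
```

In this sketch only toy_io_done() would need a journal handle, and it runs
after the data is written, when no page locks need to be held. That is the
property the proposal depends on.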

I have no idea what these changes will do to Amir's snapshot plans,
but sorry, getting this right is going to be higher priority.

I may end up submitting the rest of this patch series without this
last patch, since it does clean up the code paths a lot, and it should
result in a few small performance improvements --- the big performance
improvement, found in this patch, we'll have to skip until we can fix
up the writeback submission.

- Ted

2011-02-18 10:42:14

by Amir Goldstein

Subject: Re: [PATCH,RFC 7/7] ext4: move ext4_journal_start/stop to mpage_da_map_and_submit()

On Fri, Feb 18, 2011 at 6:23 AM, Ted Ts'o <[email protected]> wrote:
> On Sat, Feb 12, 2011 at 07:15:57PM -0500, Theodore Ts'o wrote:
>> Previously, ext4_da_writepages() was responsible for calling
>> ext4_journal_start() and ext4_journal_stop(). If the blocks had
>> already been allocated (we don't support journal=data in
>> ext4_da_writepages), then there's no need to start a new journal
>> handle.
>>
>> By moving ext4_journal_start/stop calls to mpage_da_map_and_submit()
>> we should significantly reduce the cpu usage (and cache line bouncing)
>> if the journal is enabled. This should (hopefully!) be especially
>> noticeable on large SMP systems.
>>
>> Signed-off-by: "Theodore Ts'o" <[email protected]>
>
> Argh, it turns out this doesn't work. I was getting sporadic
> deadlocks and I finally figured out the problem. If a process is
> holding page locks, it can't call ext4_journal_start() safely in
> data=ordered, since there's a chance that there won't be enough
> transaction credits and a new transaction will be started. And at
> that point, in data=ordered mode, we may end up calling
> journal_submit_inode_data_buffers(), which could try to write back the
> inode pages in question --- which are already locked.
>
> This means that we need to start the journal handle long before we
> know whether or not we really need it. Boo, hiss!
>
> The only way to solve this problem is to do what I've been planning
> for a while, which is to add support in ext4_map_blocks() for a
> mode where it will allocate a region of blocks, but *not* update the
> extent map. It will have to store the allocation in an in-memory
> cache, so that if other CPUs try to request a logical block, they will
> get the right answer. However, the actual on-disk extent map can't be
> updated until *after* the data is safely written on disk (and the
> pages can thus be unlocked).
>
> Once we do that, we'll also be able to ditch ordered mode for good,
> since it means that there won't be any chance of stale data being
> revealed, without any of the performance disasters involved with
> data=ordered mode.
>
> I have no idea what these changes will do to Amir's snapshot plans,
> but sorry, getting this right is going to be higher priority.

If anything, memory-only data allocations would be a great contribution
to extent data move-on-write :-)

It would allow me to split the extent in-memory and defer the decision,
whether to split the extent on-disk or wait for copy-on-write to complete,
to data writeback time.

By that time, async copy-on-write sequence may have already completed
and fragmentation can be avoided.

If you are looking for someone to execute your plan, or write some
experimental code, I think that Yongqiang would be up for the task
(hope that's OK with Yongqiang)

>
> I may end up submitting the rest of this patch series without this
> last patch, since it does clean up the code paths a lot, and it should
> result in a few small performance improvements --- the big performance
> improvement, found in this patch, we'll have to skip until we can fix
> up the writeback submission.
>
> - Ted
>

2011-02-18 11:44:06

by Yongqiang Yang

Subject: Re: [PATCH,RFC 7/7] ext4: move ext4_journal_start/stop to mpage_da_map_and_submit()

On Fri, Feb 18, 2011 at 6:42 PM, Amir Goldstein <[email protected]> wrote:
>
> On Fri, Feb 18, 2011 at 6:23 AM, Ted Ts'o <[email protected]> wrote:
> > On Sat, Feb 12, 2011 at 07:15:57PM -0500, Theodore Ts'o wrote:
> >> Previously, ext4_da_writepages() was responsible for calling
> >> ext4_journal_start() and ext4_journal_stop(). If the blocks had
> >> already been allocated (we don't support journal=data in
> >> ext4_da_writepages), then there's no need to start a new journal
> >> handle.
> >>
> >> By moving ext4_journal_start/stop calls to mpage_da_map_and_submit()
> >> we should significantly reduce the cpu usage (and cache line bouncing)
> >> if the journal is enabled. This should (hopefully!) be especially
> >> noticeable on large SMP systems.
> >>
> >> Signed-off-by: "Theodore Ts'o" <[email protected]>
> >
> > Argh, it turns out this doesn't work. I was getting sporadic
> > deadlocks and I finally figured out the problem. If a process is
> > holding page locks, it can't call ext4_journal_start() safely in
> > data=ordered, since there's a chance that there won't be enough
> > transaction credits and a new transaction will be started. And at
> > that point, in data=ordered mode, we may end up calling
> > journal_submit_inode_data_buffers(), which could try to write back the
> > inode pages in question --- which are already locked.
> >
> > This means that we need to start the journal handle long before we
> > know whether or not we really need it. Boo, hiss!
> >
> > The only way to solve this problem is to do what I've been planning
> > for a while, which is to add support in ext4_map_blocks() for a
> > mode where it will allocate a region of blocks, but *not* update the
> > extent map. It will have to store the allocation in an in-memory
> > cache, so that if other CPUs try to request a logical block, they will
> > get the right answer. However, the actual on-disk extent map can't be
> > updated until *after* the data is safely written on disk (and the
> > pages can thus be unlocked).
> >
> > Once we do that, we'll also be able to ditch ordered mode for good,
> > since it means that there won't be any chance of stale data being
> > revealed, without any of the performance disasters involved with
> > data=ordered mode.
> >
> > I have no idea what these changes will do to Amir's snapshot plans,
> > but sorry, getting this right is going to be higher priority.
>
> If anything, memory-only data allocations would be a great contribution
> to extent data move-on-write :-)
>
> It would allow me to split the extent in-memory and defer the decision,
> whether to split the extent on-disk or wait for copy-on-write to complete,
> to data writeback time.
>
> By that time, async copy-on-write sequence may have already completed
> and fragmentation can be avoided.
>
> If you are looking for someone to execute your plan, or write some
> experimental code, I think that Yongqiang would be up for the task
> (hope that's OK with Yongqiang)
No problem with me.
>
> >
> > I may end up submitting the rest of this patch series without this
> > last patch, since it does clean up the code paths a lot, and it should
> > result in a few small performance improvements --- the big performance
> > improvement, found in this patch, we'll have to skip until we can fix
> > up the writeback submission.
> >
> > - Ted
> >



--
Best Wishes
Yongqiang Yang