Hi all,
this is basically an evolution of the series Matthew Wilcox originally
sent in June. Based on comments from Jan and Brian this now actually
untangles some of the more confusing conditionals in the writeback code
before refactoring it into the iterator. Because of that all the
later patches needed a fair amount of rebasing, and I've not carried any
reviewed-by tags over.
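
To give a rough idea of where the series ends up, a ->writepages
implementation can eventually drop the writepage_t callback and iterate
over its dirty folios directly.  A minimal sketch (example_writepages()
and example_write_folio() are made-up names, the iterator comes from the
last patches of the series):

        static int example_writepages(struct address_space *mapping,
                        struct writeback_control *wbc)
        {
                struct folio *folio;
                int error;

                for_each_writeback_folio(mapping, wbc, folio, error)
                        error = example_write_folio(folio, wbc);
                return error;
        }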
The original cover letter is below:
Dave Howells doesn't like the indirect function call imposed by
write_cache_pages(), so refactor it into an iterator. I took the
opportunity to add the ability to iterate a folio_batch without having
an external variable.
This is against next-20230623. If you try to apply it on top of a tree
which doesn't include the pagevec removal series, IT WILL CRASH because
it won't reinitialise folio_batch->i and the iteration will index out
of bounds.
I have a feeling the 'done' parameter could have a better name, but I
can't think what it might be.
Diffstat:
include/linux/pagevec.h | 18 ++
include/linux/writeback.h | 19 ++
mm/page-writeback.c | 333 +++++++++++++++++++++++++---------------------
3 files changed, 220 insertions(+), 150 deletions(-)
From: "Matthew Wilcox (Oracle)" <[email protected]>
Instead of having a 'done' variable that controls the nested loops,
have a writeback_finish() that can be returned directly. This involves
keeping more things in writeback_control, but it's just moving stuff
allocated on the stack to being allocated slightly earlier on the stack.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
[hch: heavily rebased, reordered and commented struct writeback_control]
Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/writeback.h | 6 +++
mm/page-writeback.c | 79 ++++++++++++++++++++-------------------
2 files changed, 47 insertions(+), 38 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 833ec38fc3e0c9..390f2dd03cf27e 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -11,6 +11,7 @@
#include <linux/flex_proportions.h>
#include <linux/backing-dev-defs.h>
#include <linux/blk_types.h>
+#include <linux/pagevec.h>
struct bio;
@@ -40,6 +41,7 @@ enum writeback_sync_modes {
* in a manner such that unspecified fields are set to zero.
*/
struct writeback_control {
+ /* public fields that can be set and/or consumed by the caller: */
long nr_to_write; /* Write this many pages, and decrement
this for each page written */
long pages_skipped; /* Pages which were not written */
@@ -77,6 +79,10 @@ struct writeback_control {
*/
struct swap_iocb **swap_plug;
+ /* internal fields used by the ->writepages implementation: */
+ struct folio_batch fbatch;
+ int err;
+
#ifdef CONFIG_CGROUP_WRITEBACK
struct bdi_writeback *wb; /* wb this writeback is issued under */
struct inode *inode; /* inode being written out */
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index c798c0d6d0abb4..564d5faf562ba7 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2360,6 +2360,29 @@ void tag_pages_for_writeback(struct address_space *mapping,
}
EXPORT_SYMBOL(tag_pages_for_writeback);
+static void writeback_finish(struct address_space *mapping,
+ struct writeback_control *wbc, pgoff_t done_index)
+{
+ folio_batch_release(&wbc->fbatch);
+
+ /*
+ * For range cyclic writeback we need to remember where we stopped so
+ * that we can continue there next time we are called. If we hit the
+ * last page and there is more work to be done, wrap back to the start
+ * of the file.
+ *
+ * For non-cyclic writeback we always start looking up at the beginning
+ * of the file if we are called again, which can only happen due to
+ * -ENOMEM from the file system.
+ */
+ if (wbc->range_cyclic) {
+ if (wbc->err || wbc->nr_to_write <= 0)
+ mapping->writeback_index = done_index;
+ else
+ mapping->writeback_index = 0;
+ }
+}
+
/**
* write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
* @mapping: address space structure to write
@@ -2395,17 +2418,12 @@ int write_cache_pages(struct address_space *mapping,
struct writeback_control *wbc, writepage_t writepage,
void *data)
{
- int ret = 0;
- int done = 0;
int error;
- struct folio_batch fbatch;
int nr_folios;
pgoff_t index;
pgoff_t end; /* Inclusive */
- pgoff_t done_index;
xa_mark_t tag;
- folio_batch_init(&fbatch);
if (wbc->range_cyclic) {
index = mapping->writeback_index; /* prev offset */
end = -1;
@@ -2419,22 +2437,23 @@ int write_cache_pages(struct address_space *mapping,
} else {
tag = PAGECACHE_TAG_DIRTY;
}
- done_index = index;
- while (!done && (index <= end)) {
+
+ folio_batch_init(&wbc->fbatch);
+ wbc->err = 0;
+
+ while (index <= end) {
int i;
nr_folios = filemap_get_folios_tag(mapping, &index, end,
- tag, &fbatch);
+ tag, &wbc->fbatch);
if (nr_folios == 0)
break;
for (i = 0; i < nr_folios; i++) {
- struct folio *folio = fbatch.folios[i];
+ struct folio *folio = wbc->fbatch.folios[i];
unsigned long nr;
- done_index = folio->index;
-
folio_lock(folio);
/*
@@ -2481,6 +2500,9 @@ int write_cache_pages(struct address_space *mapping,
folio_unlock(folio);
error = 0;
}
+
+ if (error && !wbc->err)
+ wbc->err = error;
/*
* For integrity sync we have to keep going until we
@@ -2496,38 +2518,19 @@ int write_cache_pages(struct address_space *mapping,
* off and media errors won't choke writeout for the
* entire file.
*/
- if (error && !ret)
- ret = error;
- if (wbc->sync_mode == WB_SYNC_NONE) {
- if (ret || wbc->nr_to_write <= 0) {
- done_index = folio->index + nr;
- done = 1;
- break;
- }
+ if (wbc->sync_mode == WB_SYNC_NONE &&
+ (wbc->err || wbc->nr_to_write <= 0)) {
+ writeback_finish(mapping, wbc,
+ folio->index + nr);
+ return error;
}
}
- folio_batch_release(&fbatch);
+ folio_batch_release(&wbc->fbatch);
cond_resched();
}
- /*
- * For range cyclic writeback we need to remember where we stopped so
- * that we can continue there next time we are called. If we hit the
- * last page and there is more work to be done, wrap back to the start
- * of the file.
- *
- * For non-cyclic writeback we always start looking up at the beginning
- * of the file if we are called again, which can only happen due to
- * -ENOMEM from the file system.
- */
- if (wbc->range_cyclic) {
- if (done)
- mapping->writeback_index = done_index;
- else
- mapping->writeback_index = 0;
- }
-
- return ret;
+ writeback_finish(mapping, wbc, 0);
+ return 0;
}
EXPORT_SYMBOL(write_cache_pages);
--
2.39.2
From: "Matthew Wilcox (Oracle)" <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
[hch: split from a larger patch]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/writeback.h | 2 --
1 file changed, 2 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 083387c00f0c8b..833ec38fc3e0c9 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -364,8 +364,6 @@ bool wb_over_bg_thresh(struct bdi_writeback *wb);
typedef int (*writepage_t)(struct folio *folio, struct writeback_control *wbc,
void *data);
-void tag_pages_for_writeback(struct address_space *mapping,
- pgoff_t start, pgoff_t end);
int write_cache_pages(struct address_space *mapping,
struct writeback_control *wbc, writepage_t writepage,
void *data);
--
2.39.2
Rework the way we deal with the cleanup after the writepage call. First
handle the magic AOP_WRITEPAGE_ACTIVATE separately from real error
returns to get it out of the way of the actual error handling. Then
merge the code that sets ret for integrity vs non-integrity writeback.
For non-integrity writeback the loop is terminated on the first error,
so ret can never already be non-zero at that point. Then use a single
block to check for non-integrity writeback to consolidate the cases
where it returns for either an error or for running off the end of
nr_to_write.
Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/page-writeback.c | 62 +++++++++++++++++++++------------------------
1 file changed, 29 insertions(+), 33 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 8e312d73475646..7ed6c2bc8dd51c 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2474,43 +2474,39 @@ int write_cache_pages(struct address_space *mapping,
error = writepage(folio, wbc, data);
nr = folio_nr_pages(folio);
wbc->nr_to_write -= nr;
- if (unlikely(error)) {
- /*
- * Handle errors according to the type of
- * writeback. There's no need to continue for
- * background writeback. Just push done_index
- * past this page so media errors won't choke
- * writeout for the entire file. For integrity
- * writeback, we must process the entire dirty
- * set regardless of errors because the fs may
- * still have state to clear for each page. In
- * that case we continue processing and return
- * the first error.
- */
- if (error == AOP_WRITEPAGE_ACTIVATE) {
- folio_unlock(folio);
- error = 0;
- } else if (wbc->sync_mode != WB_SYNC_ALL) {
- ret = error;
- done_index = folio->index + nr;
- done = 1;
- break;
- }
- if (!ret)
- ret = error;
+
+ /*
+ * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return
+ * value. Eventually all instances should just unlock
+ * the folio themselves and return 0;
+ */
+ if (error == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ error = 0;
}
/*
- * We stop writing back only if we are not doing
- * integrity sync. In case of integrity sync we have to
- * keep going until we have written all the pages
- * we tagged for writeback prior to entering this loop.
+ * For integrity sync we have to keep going until we
+ * have written all the folios we tagged for writeback
+ * prior to entering this loop, even if we run past
+ * wbc->nr_to_write or encounter errors. This is
+ * because the file system may still have state to clear
+ * for each folio. We'll eventually return the first
+ * error encountered.
+ *
+ * For background writeback just push done_index past
+ * this folio so that we can just restart where we left
+ * off and media errors won't choke writeout for the
+ * entire file.
*/
- done_index = folio->index + nr;
- if (wbc->nr_to_write <= 0 &&
- wbc->sync_mode == WB_SYNC_NONE) {
- done = 1;
- break;
+ if (error && !ret)
+ ret = error;
+ if (wbc->sync_mode == WB_SYNC_NONE) {
+ if (ret || wbc->nr_to_write <= 0) {
+ done_index = folio->index + nr;
+ done = 1;
+ break;
+ }
}
}
folio_batch_release(&fbatch);
--
2.39.2
From: "Matthew Wilcox (Oracle)" <[email protected]>
Reduce write_cache_pages() by about 30 lines; much of it is commentary,
but it all bundles nicely into an obvious function.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/page-writeback.c | 59 ++++++++++++++++++++++++---------------------
1 file changed, 32 insertions(+), 27 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 798e5264dc0353..2a004c0b9bdfbf 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2406,6 +2406,36 @@ static void writeback_get_batch(struct address_space *mapping,
wbc_to_tag(wbc), &wbc->fbatch);
}
+static bool should_writeback_folio(struct address_space *mapping,
+ struct writeback_control *wbc, struct folio *folio)
+{
+ /*
+ * Folio truncated or invalidated. We can freely skip it then,
+ * even for data integrity operations: the folio has disappeared
+ * concurrently, so there could be no real expectation of this
+ * data integrity operation even if there is now a new, dirty
+ * folio at the same pagecache index.
+ */
+ if (unlikely(folio->mapping != mapping))
+ return false;
+
+ /* Did somebody write it for us? */
+ if (!folio_test_dirty(folio))
+ return false;
+
+ if (folio_test_writeback(folio)) {
+ if (wbc->sync_mode == WB_SYNC_NONE)
+ return false;
+ folio_wait_writeback(folio);
+ }
+
+ BUG_ON(folio_test_writeback(folio));
+ if (!folio_clear_dirty_for_io(folio))
+ return false;
+
+ return true;
+}
+
/**
* write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
* @mapping: address space structure to write
@@ -2470,38 +2500,13 @@ int write_cache_pages(struct address_space *mapping,
unsigned long nr;
folio_lock(folio);
-
- /*
- * Page truncated or invalidated. We can freely skip it
- * then, even for data integrity operations: the page
- * has disappeared concurrently, so there could be no
- * real expectation of this data integrity operation
- * even if there is now a new, dirty page at the same
- * pagecache address.
- */
- if (unlikely(folio->mapping != mapping)) {
-continue_unlock:
+ if (!should_writeback_folio(mapping, wbc, folio)) {
folio_unlock(folio);
continue;
}
- if (!folio_test_dirty(folio)) {
- /* someone wrote it for us */
- goto continue_unlock;
- }
-
- if (folio_test_writeback(folio)) {
- if (wbc->sync_mode != WB_SYNC_NONE)
- folio_wait_writeback(folio);
- else
- goto continue_unlock;
- }
-
- BUG_ON(folio_test_writeback(folio));
- if (!folio_clear_dirty_for_io(folio))
- goto continue_unlock;
-
trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
+
error = writepage(folio, wbc, data);
nr = folio_nr_pages(folio);
wbc->nr_to_write -= nr;
--
2.39.2
When exiting write_cache_pages early due to a non-integrity write
failure, wbc->nr_to_write currently doesn't account for the folio
we just failed to write. This doesn't matter because the callers
always ignore the value on a failure, but moving the update to
common code will allow us to simplify the code, so do it.
Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/page-writeback.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index b13ea243edb6b2..8e312d73475646 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2473,6 +2473,7 @@ int write_cache_pages(struct address_space *mapping,
trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
error = writepage(folio, wbc, data);
nr = folio_nr_pages(folio);
+ wbc->nr_to_write -= nr;
if (unlikely(error)) {
/*
* Handle errors according to the type of
@@ -2506,7 +2507,6 @@ int write_cache_pages(struct address_space *mapping,
* we tagged for writeback prior to entering this loop.
*/
done_index = folio->index + nr;
- wbc->nr_to_write -= nr;
if (wbc->nr_to_write <= 0 &&
wbc->sync_mode == WB_SYNC_NONE) {
done = 1;
--
2.39.2
From: "Matthew Wilcox (Oracle)" <[email protected]>
Instead of keeping our own local iterator variable, use the one just
added to folio_batch.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/page-writeback.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index c7983ea3040be4..70f42fd9a95b5d 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2397,13 +2397,19 @@ static pgoff_t wbc_end(struct writeback_control *wbc)
return wbc->range_end >> PAGE_SHIFT;
}
-static void writeback_get_batch(struct address_space *mapping,
+static struct folio *writeback_get_next(struct address_space *mapping,
struct writeback_control *wbc)
{
+ struct folio *folio = folio_batch_next(&wbc->fbatch);
+
+ if (folio)
+ return folio;
+
folio_batch_release(&wbc->fbatch);
cond_resched();
filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
wbc_to_tag(wbc), &wbc->fbatch);
+ return folio_batch_next(&wbc->fbatch);
}
static bool should_writeback_folio(struct address_space *mapping,
@@ -2473,7 +2479,6 @@ int write_cache_pages(struct address_space *mapping,
{
int error;
pgoff_t end; /* Inclusive */
- int i = 0;
if (wbc->range_cyclic) {
wbc->index = mapping->writeback_index; /* prev offset */
@@ -2489,18 +2494,12 @@ int write_cache_pages(struct address_space *mapping,
wbc->err = 0;
for (;;) {
- struct folio *folio;
+ struct folio *folio = writeback_get_next(mapping, wbc);
unsigned long nr;
- if (i == wbc->fbatch.nr) {
- writeback_get_batch(mapping, wbc);
- i = 0;
- }
- if (wbc->fbatch.nr == 0)
+ if (!folio)
break;
- folio = wbc->fbatch.folios[i++];
-
folio_lock(folio);
if (!should_writeback_folio(mapping, wbc, folio)) {
folio_unlock(folio);
--
2.39.2
From: "Matthew Wilcox (Oracle)" <[email protected]>
This simple helper will be the basis of the writeback iterator.
To make this work, we need to remember the current index
and end positions in writeback_control.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
[hch: heavily rebased, add helpers to get the tag and end index,
don't keep the end index in struct writeback_control]
Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/writeback.h | 1 +
mm/page-writeback.c | 49 +++++++++++++++++++++++++--------------
2 files changed, 32 insertions(+), 18 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 390f2dd03cf27e..195393981ccb5c 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -81,6 +81,7 @@ struct writeback_control {
/* internal fields used by the ->writepages implementation: */
struct folio_batch fbatch;
+ pgoff_t index;
int err;
#ifdef CONFIG_CGROUP_WRITEBACK
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 564d5faf562ba7..798e5264dc0353 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2383,6 +2383,29 @@ static void writeback_finish(struct address_space *mapping,
}
}
+static xa_mark_t wbc_to_tag(struct writeback_control *wbc)
+{
+ if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
+ return PAGECACHE_TAG_TOWRITE;
+ return PAGECACHE_TAG_DIRTY;
+}
+
+static pgoff_t wbc_end(struct writeback_control *wbc)
+{
+ if (wbc->range_cyclic)
+ return -1;
+ return wbc->range_end >> PAGE_SHIFT;
+}
+
+static void writeback_get_batch(struct address_space *mapping,
+ struct writeback_control *wbc)
+{
+ folio_batch_release(&wbc->fbatch);
+ cond_resched();
+ filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
+ wbc_to_tag(wbc), &wbc->fbatch);
+}
+
/**
* write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
* @mapping: address space structure to write
@@ -2419,38 +2442,30 @@ int write_cache_pages(struct address_space *mapping,
void *data)
{
int error;
- int nr_folios;
- pgoff_t index;
pgoff_t end; /* Inclusive */
- xa_mark_t tag;
if (wbc->range_cyclic) {
- index = mapping->writeback_index; /* prev offset */
+ wbc->index = mapping->writeback_index; /* prev offset */
end = -1;
} else {
- index = wbc->range_start >> PAGE_SHIFT;
+ wbc->index = wbc->range_start >> PAGE_SHIFT;
end = wbc->range_end >> PAGE_SHIFT;
}
- if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) {
- tag_pages_for_writeback(mapping, index, end);
- tag = PAGECACHE_TAG_TOWRITE;
- } else {
- tag = PAGECACHE_TAG_DIRTY;
- }
+ if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
+ tag_pages_for_writeback(mapping, wbc->index, end);
folio_batch_init(&wbc->fbatch);
wbc->err = 0;
- while (index <= end) {
+ while (wbc->index <= end) {
int i;
- nr_folios = filemap_get_folios_tag(mapping, &index, end,
- tag, &wbc->fbatch);
+ writeback_get_batch(mapping, wbc);
- if (nr_folios == 0)
+ if (wbc->fbatch.nr == 0)
break;
- for (i = 0; i < nr_folios; i++) {
+ for (i = 0; i < wbc->fbatch.nr; i++) {
struct folio *folio = wbc->fbatch.folios[i];
unsigned long nr;
@@ -2525,8 +2540,6 @@ int write_cache_pages(struct address_space *mapping,
return error;
}
}
- folio_batch_release(&wbc->fbatch);
- cond_resched();
}
writeback_finish(mapping, wbc, 0);
--
2.39.2
From: "Matthew Wilcox (Oracle)" <[email protected]>
Collapse the two nested loops into one. This is needed as a step
towards turning this into an iterator.
Note that this drops the "index <= end" check in the previous outer loop
and just relies on filemap_get_folios_tag() to return 0 entries when
index > end. This actually has a subtle implication when end == -1
because then the returned index will be -1 as well, and thus if there is
a page present at index -1, we could be looping indefinitely. But as the
comment in filemap_get_folios_tag() documents, this is already broken
anyway, so we should not worry about it here either. The fix for that
would probably be a change to the filemap_get_folios_tag() calling
convention.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
[hch: updated the commit log based on feedback from Jan Kara]
Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/page-writeback.c | 94 ++++++++++++++++++++++-----------------------
1 file changed, 46 insertions(+), 48 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 2a004c0b9bdfbf..c7983ea3040be4 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2473,6 +2473,7 @@ int write_cache_pages(struct address_space *mapping,
{
int error;
pgoff_t end; /* Inclusive */
+ int i = 0;
if (wbc->range_cyclic) {
wbc->index = mapping->writeback_index; /* prev offset */
@@ -2487,63 +2488,60 @@ int write_cache_pages(struct address_space *mapping,
folio_batch_init(&wbc->fbatch);
wbc->err = 0;
- while (wbc->index <= end) {
- int i;
-
- writeback_get_batch(mapping, wbc);
+ for (;;) {
+ struct folio *folio;
+ unsigned long nr;
+ if (i == wbc->fbatch.nr) {
+ writeback_get_batch(mapping, wbc);
+ i = 0;
+ }
if (wbc->fbatch.nr == 0)
break;
- for (i = 0; i < wbc->fbatch.nr; i++) {
- struct folio *folio = wbc->fbatch.folios[i];
- unsigned long nr;
+ folio = wbc->fbatch.folios[i++];
- folio_lock(folio);
- if (!should_writeback_folio(mapping, wbc, folio)) {
- folio_unlock(folio);
- continue;
- }
+ folio_lock(folio);
+ if (!should_writeback_folio(mapping, wbc, folio)) {
+ folio_unlock(folio);
+ continue;
+ }
- trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
+ trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
- error = writepage(folio, wbc, data);
- nr = folio_nr_pages(folio);
- wbc->nr_to_write -= nr;
+ error = writepage(folio, wbc, data);
+ nr = folio_nr_pages(folio);
+ wbc->nr_to_write -= nr;
- /*
- * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return
- * value. Eventually all instances should just unlock
- * the folio themselves and return 0;
- */
- if (error == AOP_WRITEPAGE_ACTIVATE) {
- folio_unlock(folio);
- error = 0;
- }
-
- if (error && !wbc->err)
- wbc->err = error;
+ /*
+ * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return value.
+ * Eventually all instances should just unlock the folio
+ * themselves and return 0;
+ */
+ if (error == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ error = 0;
+ }
- /*
- * For integrity sync we have to keep going until we
- * have written all the folios we tagged for writeback
- * prior to entering this loop, even if we run past
- * wbc->nr_to_write or encounter errors. This is
- * because the file system may still have state to clear
- * for each folio. We'll eventually return the first
- * error encountered.
- *
- * For background writeback just push done_index past
- * this folio so that we can just restart where we left
- * off and media errors won't choke writeout for the
- * entire file.
- */
- if (wbc->sync_mode == WB_SYNC_NONE &&
- (wbc->err || wbc->nr_to_write <= 0)) {
- writeback_finish(mapping, wbc,
- folio->index + nr);
- return error;
- }
+ if (error && !wbc->err)
+ wbc->err = error;
+
+ /*
+ * For integrity sync we have to keep going until we have
+ * written all the folios we tagged for writeback prior to
+ * entering this loop, even if we run past wbc->nr_to_write or
+ * encounter errors. This is because the file system may still
+ * have state to clear for each folio. We'll eventually return
+ * the first error encountered.
+ *
+ * For background writeback just push done_index past this folio
+ * so that we can just restart where we left off and media
+ * errors won't choke writeout for the entire file.
+ */
+ if (wbc->sync_mode == WB_SYNC_NONE &&
+ (wbc->err || wbc->nr_to_write <= 0)) {
+ writeback_finish(mapping, wbc, folio->index + nr);
+ return error;
}
}
--
2.39.2
From: "Matthew Wilcox (Oracle)" <[email protected]>
Add a loop counter inside the folio_batch to let us iterate from 0-nr
instead of decrementing nr and treating the batch as a stack. It would
generate some very weird and suboptimal I/O patterns for page writeback
to iterate over the batch as a stack.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/pagevec.h | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 87cc678adc850b..fcc06c300a72c3 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -27,6 +27,7 @@ struct folio;
*/
struct folio_batch {
unsigned char nr;
+ unsigned char i;
bool percpu_pvec_drained;
struct folio *folios[PAGEVEC_SIZE];
};
@@ -40,12 +41,14 @@ struct folio_batch {
static inline void folio_batch_init(struct folio_batch *fbatch)
{
fbatch->nr = 0;
+ fbatch->i = 0;
fbatch->percpu_pvec_drained = false;
}
static inline void folio_batch_reinit(struct folio_batch *fbatch)
{
fbatch->nr = 0;
+ fbatch->i = 0;
}
static inline unsigned int folio_batch_count(struct folio_batch *fbatch)
@@ -75,6 +78,21 @@ static inline unsigned folio_batch_add(struct folio_batch *fbatch,
return folio_batch_space(fbatch);
}
+/**
+ * folio_batch_next - Return the next folio to process.
+ * @fbatch: The folio batch being processed.
+ *
+ * Use this function to implement a queue of folios.
+ *
+ * Return: The next folio in the queue, or NULL if the queue is empty.
+ */
+static inline struct folio *folio_batch_next(struct folio_batch *fbatch)
+{
+ if (fbatch->i == fbatch->nr)
+ return NULL;
+ return fbatch->folios[fbatch->i++];
+}
+
void __folio_batch_release(struct folio_batch *pvec);
static inline void folio_batch_release(struct folio_batch *fbatch)
--
2.39.2
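
With this in place a caller can consume a batch previously filled by
e.g. filemap_get_folios_tag() in FIFO order without carrying its own
index.  A minimal sketch, with process_folio() standing in for the real
per-folio work:

        struct folio *folio;

        while ((folio = folio_batch_next(&fbatch)) != NULL)
                process_folio(folio);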
From: "Matthew Wilcox (Oracle)" <[email protected]>
Pull the post-processing of the writepage_t callback into a
separate function. That means having writeback_get_folio() call
writeback_finish() and return NULL when we naturally run out of
folios.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/page-writeback.c | 85 ++++++++++++++++++++++++---------------------
1 file changed, 45 insertions(+), 40 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 9d37dd5e58ffb6..0771f19950081f 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2449,8 +2449,10 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
for (;;) {
folio = writeback_get_next(mapping, wbc);
- if (!folio)
+ if (!folio) {
+ writeback_finish(mapping, wbc, 0);
return NULL;
+ }
folio_lock(folio);
if (likely(should_writeback_folio(mapping, wbc, folio)))
break;
@@ -2477,6 +2479,46 @@ static struct folio *writeback_iter_init(struct address_space *mapping,
return writeback_get_folio(mapping, wbc);
}
+static struct folio *writeback_iter_next(struct address_space *mapping,
+ struct writeback_control *wbc, struct folio *folio, int error)
+{
+ unsigned long nr = folio_nr_pages(folio);
+
+ wbc->nr_to_write -= nr;
+
+ /*
+ * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return value.
+ * Eventually all instances should just unlock the folio themselves and
+ * return 0;
+ */
+ if (error == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ error = 0;
+ }
+
+ if (error && !wbc->err)
+ wbc->err = error;
+
+ /*
+ * For integrity sync we have to keep going until we have written all
+ * the folios we tagged for writeback prior to entering the writeback
+ * loop, even if we run past wbc->nr_to_write or encounter errors.
+ * This is because the file system may still have state to clear for
+ * each folio. We'll eventually return the first error encountered.
+ *
+ * For background writeback just push done_index past this folio so that
+ * we can just restart where we left off and media errors won't choke
+ * writeout for the entire file.
+ */
+ if (wbc->sync_mode == WB_SYNC_NONE &&
+ (wbc->err || wbc->nr_to_write <= 0)) {
+ writeback_finish(mapping, wbc, folio->index + nr);
+ return NULL;
+ }
+
+ return writeback_get_folio(mapping, wbc);
+}
+
/**
* write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
* @mapping: address space structure to write
@@ -2517,47 +2559,10 @@ int write_cache_pages(struct address_space *mapping,
for (folio = writeback_iter_init(mapping, wbc);
folio;
- folio = writeback_get_folio(mapping, wbc)) {
- unsigned long nr;
-
+ folio = writeback_iter_next(mapping, wbc, folio, error))
error = writepage(folio, wbc, data);
- nr = folio_nr_pages(folio);
- wbc->nr_to_write -= nr;
-
- /*
- * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return value.
- * Eventually all instances should just unlock the folio
- * themselves and return 0;
- */
- if (error == AOP_WRITEPAGE_ACTIVATE) {
- folio_unlock(folio);
- error = 0;
- }
-
- if (error && !wbc->err)
- wbc->err = error;
- /*
- * For integrity sync we have to keep going until we have
- * written all the folios we tagged for writeback prior to
- * entering this loop, even if we run past wbc->nr_to_write or
- * encounter errors. This is because the file system may still
- * have state to clear for each folio. We'll eventually return
- * the first error encountered.
- *
- * For background writeback just push done_index past this folio
- * so that we can just restart where we left off and media
- * errors won't choke writeout for the entire file.
- */
- if (wbc->sync_mode == WB_SYNC_NONE &&
- (wbc->err || wbc->nr_to_write <= 0)) {
- writeback_finish(mapping, wbc, folio->index + nr);
- return error;
- }
- }
-
- writeback_finish(mapping, wbc, 0);
- return 0;
+ return wbc->err;
}
EXPORT_SYMBOL(write_cache_pages);
--
2.39.2
From: "Matthew Wilcox (Oracle)" <[email protected]>
Move the loop for should-we-write-this-folio to its own function.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/page-writeback.c | 31 +++++++++++++++++++++----------
1 file changed, 21 insertions(+), 10 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index efcfffa800884d..9d37dd5e58ffb6 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2442,6 +2442,25 @@ static bool should_writeback_folio(struct address_space *mapping,
return true;
}
+static struct folio *writeback_get_folio(struct address_space *mapping,
+ struct writeback_control *wbc)
+{
+ struct folio *folio;
+
+ for (;;) {
+ folio = writeback_get_next(mapping, wbc);
+ if (!folio)
+ return NULL;
+ folio_lock(folio);
+ if (likely(should_writeback_folio(mapping, wbc, folio)))
+ break;
+ folio_unlock(folio);
+ }
+
+ trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
+ return folio;
+}
+
static struct folio *writeback_iter_init(struct address_space *mapping,
struct writeback_control *wbc)
{
@@ -2455,7 +2474,7 @@ static struct folio *writeback_iter_init(struct address_space *mapping,
wbc->err = 0;
folio_batch_init(&wbc->fbatch);
- return writeback_get_next(mapping, wbc);
+ return writeback_get_folio(mapping, wbc);
}
/**
@@ -2498,17 +2517,9 @@ int write_cache_pages(struct address_space *mapping,
for (folio = writeback_iter_init(mapping, wbc);
folio;
- folio = writeback_get_next(mapping, wbc)) {
+ folio = writeback_get_folio(mapping, wbc)) {
unsigned long nr;
- folio_lock(folio);
- if (!should_writeback_folio(mapping, wbc, folio)) {
- folio_unlock(folio);
- continue;
- }
-
- trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
-
error = writepage(folio, wbc, data);
nr = folio_nr_pages(folio);
wbc->nr_to_write -= nr;
--
2.39.2
From: "Matthew Wilcox (Oracle)" <[email protected]>
Use the new for_each_writeback_folio() directly instead of indirecting
through a callback.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/page-writeback.c | 27 +++++++++++++++------------
1 file changed, 15 insertions(+), 12 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index fbffd30a9cc93f..d3c2c78e0c67ce 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2564,13 +2564,21 @@ int write_cache_pages(struct address_space *mapping,
}
EXPORT_SYMBOL(write_cache_pages);
-static int writepage_cb(struct folio *folio, struct writeback_control *wbc,
- void *data)
+static int writeback_use_writepage(struct address_space *mapping,
+ struct writeback_control *wbc)
{
- struct address_space *mapping = data;
- int ret = mapping->a_ops->writepage(&folio->page, wbc);
- mapping_set_error(mapping, ret);
- return ret;
+ struct blk_plug plug;
+ struct folio *folio;
+ int err;
+
+ blk_start_plug(&plug);
+ for_each_writeback_folio(mapping, wbc, folio, err) {
+ err = mapping->a_ops->writepage(&folio->page, wbc);
+ mapping_set_error(mapping, err);
+ }
+ blk_finish_plug(&plug);
+
+ return err;
}
int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
@@ -2586,12 +2594,7 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
if (mapping->a_ops->writepages) {
ret = mapping->a_ops->writepages(mapping, wbc);
} else if (mapping->a_ops->writepage) {
- struct blk_plug plug;
-
- blk_start_plug(&plug);
- ret = write_cache_pages(mapping, wbc, writepage_cb,
- mapping);
- blk_finish_plug(&plug);
+ ret = writeback_use_writepage(mapping, wbc);
} else {
/* deal with chardevs and other special files */
ret = 0;
--
2.39.2
Don't refer to write_cache_pages, which is now just a wrapper for the
writeback iterator.
Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/page-writeback.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index d3c2c78e0c67ce..bc69044fd063e8 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2325,18 +2325,18 @@ void __init page_writeback_init(void)
}
/**
- * tag_pages_for_writeback - tag pages to be written by write_cache_pages
+ * tag_pages_for_writeback - tag pages to be written by writeback
* @mapping: address space structure to write
* @start: starting page index
* @end: ending page index (inclusive)
*
* This function scans the page range from @start to @end (inclusive) and tags
- * all pages that have DIRTY tag set with a special TOWRITE tag. The idea is
- * that write_cache_pages (or whoever calls this function) will then use
- * TOWRITE tag to identify pages eligible for writeback. This mechanism is
- * used to avoid livelocking of writeback by a process steadily creating new
- * dirty pages in the file (thus it is important for this function to be quick
- * so that it can tag pages faster than a dirtying process can create them).
+ * all pages that have DIRTY tag set with a special TOWRITE tag. The caller
+ * can then use the TOWRITE tag to identify pages eligible for writeback.
+ * This mechanism is used to avoid livelocking of writeback by a process
+ * steadily creating new dirty pages in the file (thus it is important for this
+ * function to be quick so that it can tag pages faster than a dirtying process
+ * can create them).
*/
void tag_pages_for_writeback(struct address_space *mapping,
pgoff_t start, pgoff_t end)
--
2.39.2
From: "Matthew Wilcox (Oracle)" <[email protected]>
Make writeback_iter_init() return the first folio in the batch so that
we can use it in a typical for() pattern.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/page-writeback.c | 39 ++++++++++++++++++++-------------------
1 file changed, 20 insertions(+), 19 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 70f42fd9a95b5d..efcfffa800884d 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2442,6 +2442,22 @@ static bool should_writeback_folio(struct address_space *mapping,
return true;
}
+static struct folio *writeback_iter_init(struct address_space *mapping,
+ struct writeback_control *wbc)
+{
+ if (wbc->range_cyclic)
+ wbc->index = mapping->writeback_index; /* prev offset */
+ else
+ wbc->index = wbc->range_start >> PAGE_SHIFT;
+
+ if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
+ tag_pages_for_writeback(mapping, wbc->index, wbc_end(wbc));
+
+ wbc->err = 0;
+ folio_batch_init(&wbc->fbatch);
+ return writeback_get_next(mapping, wbc);
+}
+
/**
* write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
* @mapping: address space structure to write
@@ -2477,29 +2493,14 @@ int write_cache_pages(struct address_space *mapping,
struct writeback_control *wbc, writepage_t writepage,
void *data)
{
+ struct folio *folio;
int error;
- pgoff_t end; /* Inclusive */
- if (wbc->range_cyclic) {
- wbc->index = mapping->writeback_index; /* prev offset */
- end = -1;
- } else {
- wbc->index = wbc->range_start >> PAGE_SHIFT;
- end = wbc->range_end >> PAGE_SHIFT;
- }
- if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
- tag_pages_for_writeback(mapping, wbc->index, end);
-
- folio_batch_init(&wbc->fbatch);
- wbc->err = 0;
-
- for (;;) {
- struct folio *folio = writeback_get_next(mapping, wbc);
+ for (folio = writeback_iter_init(mapping, wbc);
+ folio;
+ folio = writeback_get_next(mapping, wbc)) {
unsigned long nr;
- if (!folio)
- break;
-
folio_lock(folio);
if (!should_writeback_folio(mapping, wbc, folio)) {
folio_unlock(folio);
--
2.39.2
From: "Matthew Wilcox (Oracle)" <[email protected]>
Wrap up the iterator with a nice bit of syntactic sugar. Now the
caller doesn't need to know about wbc->err and can just return error;
the iterator takes care of storing errors correctly.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/writeback.h | 10 ++++++++++
mm/page-writeback.c | 8 +++-----
2 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 195393981ccb5c..1c1a543070c17b 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -368,6 +368,16 @@ int balance_dirty_pages_ratelimited_flags(struct address_space *mapping,
bool wb_over_bg_thresh(struct bdi_writeback *wb);
+struct folio *writeback_iter_init(struct address_space *mapping,
+ struct writeback_control *wbc);
+struct folio *writeback_iter_next(struct address_space *mapping,
+ struct writeback_control *wbc, struct folio *folio, int error);
+
+#define for_each_writeback_folio(mapping, wbc, folio, error) \
+ for (folio = writeback_iter_init(mapping, wbc); \
+ folio || ((error = wbc->err), false); \
+ folio = writeback_iter_next(mapping, wbc, folio, error))
+
typedef int (*writepage_t)(struct folio *folio, struct writeback_control *wbc,
void *data);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 0771f19950081f..fbffd30a9cc93f 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2463,7 +2463,7 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
return folio;
}
-static struct folio *writeback_iter_init(struct address_space *mapping,
+struct folio *writeback_iter_init(struct address_space *mapping,
struct writeback_control *wbc)
{
if (wbc->range_cyclic)
@@ -2479,7 +2479,7 @@ static struct folio *writeback_iter_init(struct address_space *mapping,
return writeback_get_folio(mapping, wbc);
}
-static struct folio *writeback_iter_next(struct address_space *mapping,
+struct folio *writeback_iter_next(struct address_space *mapping,
struct writeback_control *wbc, struct folio *folio, int error)
{
unsigned long nr = folio_nr_pages(folio);
@@ -2557,9 +2557,7 @@ int write_cache_pages(struct address_space *mapping,
struct folio *folio;
int error;
- for (folio = writeback_iter_init(mapping, wbc);
- folio;
- folio = writeback_iter_next(mapping, wbc, folio, error))
+ for_each_writeback_folio(mapping, wbc, folio, error)
error = writepage(folio, wbc, data);
return wbc->err;
--
2.39.2
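
For reference, the ||-condition in the macro uses the comma operator so
that wbc->err is copied into the caller's error variable exactly once,
when the iterator is exhausted.  Desugared, the loop in
write_cache_pages() behaves like this sketch:

        folio = writeback_iter_init(mapping, wbc);
        while (folio) {
                error = writepage(folio, wbc, data);
                folio = writeback_iter_next(mapping, wbc, folio, error);
        }
        error = wbc->err;       /* reported back by the exit condition */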
On Mon, Dec 18, 2023 at 04:35:36PM +0100, Christoph Hellwig wrote:
> Hi all,
>
> this is basically an evolution of the series Matthew Wilcox originally
> sent in June. Based on comments from Jan and Brian this now actually
> untangles some of the more confusing conditionals in the writeback code
> before refactoring it into the iterator. Because of that all the
> later patches needed a fair amount of rebasing, and I've not carried any
> reviewed-by tags over.
>
> The original cover letter is below:
>
> Dave Howells doesn't like the indirect function call imposed by
> write_cache_pages(), so refactor it into an iterator. I took the
> opportunity to add the ability to iterate a folio_batch without having
> an external variable.
>
> This is against next-20230623. If you try to apply it on top of a tree
> which doesn't include the pagevec removal series, IT WILL CRASH because
> it won't reinitialise folio_batch->i and the iteration will index out
> of bounds.
>
> I have a feeling the 'done' parameter could have a better name, but I
> can't think what it might be.
>
> Diffstat:
> include/linux/pagevec.h | 18 ++
> include/linux/writeback.h | 19 ++
> mm/page-writeback.c | 333 +++++++++++++++++++++++++---------------------
> 3 files changed, 220 insertions(+), 150 deletions(-)
I've just done a quick scan of the code - nothing stands out to me
as problematic, and I like how much cleaner the result is.
Acked-by: Dave Chinner <[email protected]>
--
Dave Chinner
[email protected]
On Mon 18-12-23 16:35:38, Christoph Hellwig wrote:
> When exiting write_cache_pages early due to a non-integrity write
> failure, wbc->nr_to_write currently doesn't account for the folio
> we just failed to write. This doesn't matter because the callers
> always ingore the value on a failure, but moving the update to
> common code will allow to simply the code, so do it.
^^^ simplify
> Signed-off-by: Christoph Hellwig <[email protected]>
Otherwise the patch looks good. Feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Honza
> ---
> mm/page-writeback.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index b13ea243edb6b2..8e312d73475646 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2473,6 +2473,7 @@ int write_cache_pages(struct address_space *mapping,
> trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
> error = writepage(folio, wbc, data);
> nr = folio_nr_pages(folio);
> + wbc->nr_to_write -= nr;
> if (unlikely(error)) {
> /*
> * Handle errors according to the type of
> @@ -2506,7 +2507,6 @@ int write_cache_pages(struct address_space *mapping,
> * we tagged for writeback prior to entering this loop.
> */
> done_index = folio->index + nr;
> - wbc->nr_to_write -= nr;
> if (wbc->nr_to_write <= 0 &&
> wbc->sync_mode == WB_SYNC_NONE) {
> done = 1;
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon 18-12-23 16:35:39, Christoph Hellwig wrote:
> Rework we deal with the cleanup after the writepage call. First handle
^^ the way
> the magic AOP_WRITEPAGE_ACTIVATE separately from real error returns to
> get it out of the way of the actual error handling. Then merge the
> code to set ret for integrity vs non-integrity writeback. For
> non-integrity writeback the loop is terminated on the first error, so
> ret will never be non-zero. Then use a single block to check for
> non-integrity writewack to consolidate the cases where it returns for
> either an error or running off the end of nr_to_write.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
Otherwise looks good. Feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Honza
> ---
> mm/page-writeback.c | 62 +++++++++++++++++++++------------------------
> 1 file changed, 29 insertions(+), 33 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 8e312d73475646..7ed6c2bc8dd51c 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2474,43 +2474,39 @@ int write_cache_pages(struct address_space *mapping,
> error = writepage(folio, wbc, data);
> nr = folio_nr_pages(folio);
> wbc->nr_to_write -= nr;
> - if (unlikely(error)) {
> - /*
> - * Handle errors according to the type of
> - * writeback. There's no need to continue for
> - * background writeback. Just push done_index
> - * past this page so media errors won't choke
> - * writeout for the entire file. For integrity
> - * writeback, we must process the entire dirty
> - * set regardless of errors because the fs may
> - * still have state to clear for each page. In
> - * that case we continue processing and return
> - * the first error.
> - */
> - if (error == AOP_WRITEPAGE_ACTIVATE) {
> - folio_unlock(folio);
> - error = 0;
> - } else if (wbc->sync_mode != WB_SYNC_ALL) {
> - ret = error;
> - done_index = folio->index + nr;
> - done = 1;
> - break;
> - }
> - if (!ret)
> - ret = error;
> +
> + /*
> + * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return
> + * value. Eventually all instances should just unlock
> + * the folio themselves and return 0;
> + */
> + if (error == AOP_WRITEPAGE_ACTIVATE) {
> + folio_unlock(folio);
> + error = 0;
> }
>
> /*
> - * We stop writing back only if we are not doing
> - * integrity sync. In case of integrity sync we have to
> - * keep going until we have written all the pages
> - * we tagged for writeback prior to entering this loop.
> + * For integrity sync we have to keep going until we
> + * have written all the folios we tagged for writeback
> + * prior to entering this loop, even if we run past
> + * wbc->nr_to_write or encounter errors. This is
> + * because the file system may still have state to clear
> + * for each folio. We'll eventually return the first
> + * error encountered.
> + *
> + * For background writeback just push done_index past
> + * this folio so that we can just restart where we left
> + * off and media errors won't choke writeout for the
> + * entire file.
> */
> - done_index = folio->index + nr;
> - if (wbc->nr_to_write <= 0 &&
> - wbc->sync_mode == WB_SYNC_NONE) {
> - done = 1;
> - break;
> + if (error && !ret)
> + ret = error;
> + if (wbc->sync_mode == WB_SYNC_NONE) {
> + if (ret || wbc->nr_to_write <= 0) {
> + done_index = folio->index + nr;
> + done = 1;
> + break;
> + }
> }
> }
> folio_batch_release(&fbatch);
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon 18-12-23 16:35:41, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> [hch: split from a larger patch]
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Sure. Feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Honza
> ---
> include/linux/writeback.h | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 083387c00f0c8b..833ec38fc3e0c9 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -364,8 +364,6 @@ bool wb_over_bg_thresh(struct bdi_writeback *wb);
> typedef int (*writepage_t)(struct folio *folio, struct writeback_control *wbc,
> void *data);
>
> -void tag_pages_for_writeback(struct address_space *mapping,
> - pgoff_t start, pgoff_t end);
> int write_cache_pages(struct address_space *mapping,
> struct writeback_control *wbc, writepage_t writepage,
> void *data);
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon 18-12-23 16:35:42, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> Instead of having a 'done' variable that controls the nested loops,
> have a writeback_finish() that can be returned directly. This involves
> keeping more things in writeback_control, but it's just moving stuff
> allocated on the stack to being allocated slightly earlier on the stack.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> [hch: heavily rebased, reordered and commented struct writeback_control]
> Signed-off-by: Christoph Hellwig <[email protected]>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Honza
> ---
> include/linux/writeback.h | 6 +++
> mm/page-writeback.c | 79 ++++++++++++++++++++-------------------
> 2 files changed, 47 insertions(+), 38 deletions(-)
>
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 833ec38fc3e0c9..390f2dd03cf27e 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -11,6 +11,7 @@
> #include <linux/flex_proportions.h>
> #include <linux/backing-dev-defs.h>
> #include <linux/blk_types.h>
> +#include <linux/pagevec.h>
>
> struct bio;
>
> @@ -40,6 +41,7 @@ enum writeback_sync_modes {
> * in a manner such that unspecified fields are set to zero.
> */
> struct writeback_control {
> + /* public fields that can be set and/or consumed by the caller: */
> long nr_to_write; /* Write this many pages, and decrement
> this for each page written */
> long pages_skipped; /* Pages which were not written */
> @@ -77,6 +79,10 @@ struct writeback_control {
> */
> struct swap_iocb **swap_plug;
>
> + /* internal fields used by the ->writepages implementation: */
> + struct folio_batch fbatch;
> + int err;
> +
> #ifdef CONFIG_CGROUP_WRITEBACK
> struct bdi_writeback *wb; /* wb this writeback is issued under */
> struct inode *inode; /* inode being written out */
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index c798c0d6d0abb4..564d5faf562ba7 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2360,6 +2360,29 @@ void tag_pages_for_writeback(struct address_space *mapping,
> }
> EXPORT_SYMBOL(tag_pages_for_writeback);
>
> +static void writeback_finish(struct address_space *mapping,
> + struct writeback_control *wbc, pgoff_t done_index)
> +{
> + folio_batch_release(&wbc->fbatch);
> +
> + /*
> + * For range cyclic writeback we need to remember where we stopped so
> + * that we can continue there next time we are called. If we hit the
> + * last page and there is more work to be done, wrap back to the start
> + * of the file.
> + *
> + * For non-cyclic writeback we always start looking up at the beginning
> + * of the file if we are called again, which can only happen due to
> + * -ENOMEM from the file system.
> + */
> + if (wbc->range_cyclic) {
> + if (wbc->err || wbc->nr_to_write <= 0)
> + mapping->writeback_index = done_index;
> + else
> + mapping->writeback_index = 0;
> + }
> +}
> +
> /**
> * write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
> * @mapping: address space structure to write
> @@ -2395,17 +2418,12 @@ int write_cache_pages(struct address_space *mapping,
> struct writeback_control *wbc, writepage_t writepage,
> void *data)
> {
> - int ret = 0;
> - int done = 0;
> int error;
> - struct folio_batch fbatch;
> int nr_folios;
> pgoff_t index;
> pgoff_t end; /* Inclusive */
> - pgoff_t done_index;
> xa_mark_t tag;
>
> - folio_batch_init(&fbatch);
> if (wbc->range_cyclic) {
> index = mapping->writeback_index; /* prev offset */
> end = -1;
> @@ -2419,22 +2437,23 @@ int write_cache_pages(struct address_space *mapping,
> } else {
> tag = PAGECACHE_TAG_DIRTY;
> }
> - done_index = index;
> - while (!done && (index <= end)) {
> +
> + folio_batch_init(&wbc->fbatch);
> + wbc->err = 0;
> +
> + while (index <= end) {
> int i;
>
> nr_folios = filemap_get_folios_tag(mapping, &index, end,
> - tag, &fbatch);
> + tag, &wbc->fbatch);
>
> if (nr_folios == 0)
> break;
>
> for (i = 0; i < nr_folios; i++) {
> - struct folio *folio = fbatch.folios[i];
> + struct folio *folio = wbc->fbatch.folios[i];
> unsigned long nr;
>
> - done_index = folio->index;
> -
> folio_lock(folio);
>
> /*
> @@ -2481,6 +2500,9 @@ int write_cache_pages(struct address_space *mapping,
> folio_unlock(folio);
> error = 0;
> }
> +
> + if (error && !wbc->err)
> + wbc->err = error;
>
> /*
> * For integrity sync we have to keep going until we
> @@ -2496,38 +2518,19 @@ int write_cache_pages(struct address_space *mapping,
> * off and media errors won't choke writeout for the
> * entire file.
> */
> - if (error && !ret)
> - ret = error;
> - if (wbc->sync_mode == WB_SYNC_NONE) {
> - if (ret || wbc->nr_to_write <= 0) {
> - done_index = folio->index + nr;
> - done = 1;
> - break;
> - }
> + if (wbc->sync_mode == WB_SYNC_NONE &&
> + (wbc->err || wbc->nr_to_write <= 0)) {
> + writeback_finish(mapping, wbc,
> + folio->index + nr);
> + return error;
> }
> }
> - folio_batch_release(&fbatch);
> + folio_batch_release(&wbc->fbatch);
> cond_resched();
> }
>
> - /*
> - * For range cyclic writeback we need to remember where we stopped so
> - * that we can continue there next time we are called. If we hit the
> - * last page and there is more work to be done, wrap back to the start
> - * of the file.
> - *
> - * For non-cyclic writeback we always start looking up at the beginning
> - * of the file if we are called again, which can only happen due to
> - * -ENOMEM from the file system.
> - */
> - if (wbc->range_cyclic) {
> - if (done)
> - mapping->writeback_index = done_index;
> - else
> - mapping->writeback_index = 0;
> - }
> -
> - return ret;
> + writeback_finish(mapping, wbc, 0);
> + return 0;
> }
> EXPORT_SYMBOL(write_cache_pages);
>
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon 18-12-23 16:35:43, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> This simple helper will be the basis of the writeback iterator.
> To make this work, we need to remember the current index
> and end positions in writeback_control.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> [hch: heavily rebased, add helpers to get the tag and end index,
> don't keep the end index in struct writeback_control]
> Signed-off-by: Christoph Hellwig <[email protected]>
Just two nits below. However you decide about them feel free to add:
Reviewed-by: Jan Kara <[email protected]>
> +static pgoff_t wbc_end(struct writeback_control *wbc)
> +{
> + if (wbc->range_cyclic)
> + return -1;
> + return wbc->range_end >> PAGE_SHIFT;
> +}
> +
> +static void writeback_get_batch(struct address_space *mapping,
> + struct writeback_control *wbc)
> +{
> + folio_batch_release(&wbc->fbatch);
> + cond_resched();
I'd prefer to have cond_resched() explicitly in the writeback loop instead
of hidden here in writeback_get_batch() where it logically does not make
too much sense to me...
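I.e. something like this (just a sketch, with writeback_get_batch()
reduced to the batch release + lookup):

        for (;;) {
                writeback_get_batch(mapping, wbc);
                if (wbc->fbatch.nr == 0)
                        break;
                /* ... process the folios in wbc->fbatch ... */
                cond_resched();
        }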
> + filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
> + wbc_to_tag(wbc), &wbc->fbatch);
> +}
> +
> /**
> * write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
> * @mapping: address space structure to write
> @@ -2419,38 +2442,30 @@ int write_cache_pages(struct address_space *mapping,
> void *data)
> {
> int error;
> - int nr_folios;
> - pgoff_t index;
> pgoff_t end; /* Inclusive */
> - xa_mark_t tag;
>
> if (wbc->range_cyclic) {
> - index = mapping->writeback_index; /* prev offset */
> + wbc->index = mapping->writeback_index; /* prev offset */
> end = -1;
> } else {
> - index = wbc->range_start >> PAGE_SHIFT;
> + wbc->index = wbc->range_start >> PAGE_SHIFT;
> end = wbc->range_end >> PAGE_SHIFT;
> }
Maybe we should have:
end = wbc_end(wbc);
when we have the helper? But I guess this gets cleaned up in later patches
anyway so whatever.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon 18-12-23 16:35:44, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> Reduce write_cache_pages() by about 30 lines; much of it is commentary,
> but it all bundles nicely into an obvious function.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> Signed-off-by: Christoph Hellwig <[email protected]>
One nit below, otherwise feel free to add:
Reviewed-by: Jan Kara <[email protected]>
> +static bool should_writeback_folio(struct address_space *mapping,
> + struct writeback_control *wbc, struct folio *folio)
> +{
I'd call this function folio_prepare_writeback() or something like that to
make it clearer that this function is not only about the decision whether
we want to write the folio or not but also that we clear the dirty bit &
writeprotect the folio in page tables.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon 18-12-23 16:35:45, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> Collapse the two nested loops into one. This is needed as a step
> towards turning this into an iterator.
>
> Note that this drops the "index <= end" check in the previous outer loop
> and just relies on filemap_get_folios_tag() to return 0 entries when
> index > end. This actually has a subtle implication when end == -1
> because then the returned index will be -1 as well, and thus if there is
> a page present at index -1, we could be looping indefinitely. But as the
> comment in filemap_get_folios_tag() documents, this is already broken
> anyway, so we should not worry about it here either. The fix for that
> would probably be a change to the filemap_get_folios_tag() calling
> convention.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> [hch: updated the commit log based on feedback from Jan Kara]
> Signed-off-by: Christoph Hellwig <[email protected]>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Honza
> ---
> mm/page-writeback.c | 94 ++++++++++++++++++++++-----------------------
> 1 file changed, 46 insertions(+), 48 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 2a004c0b9bdfbf..c7983ea3040be4 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2473,6 +2473,7 @@ int write_cache_pages(struct address_space *mapping,
> {
> int error;
> pgoff_t end; /* Inclusive */
> + int i = 0;
>
> if (wbc->range_cyclic) {
> wbc->index = mapping->writeback_index; /* prev offset */
> @@ -2487,63 +2488,60 @@ int write_cache_pages(struct address_space *mapping,
> folio_batch_init(&wbc->fbatch);
> wbc->err = 0;
>
> - while (wbc->index <= end) {
> - int i;
> -
> - writeback_get_batch(mapping, wbc);
> + for (;;) {
> + struct folio *folio;
> + unsigned long nr;
>
> + if (i == wbc->fbatch.nr) {
> + writeback_get_batch(mapping, wbc);
> + i = 0;
> + }
> if (wbc->fbatch.nr == 0)
> break;
>
> - for (i = 0; i < wbc->fbatch.nr; i++) {
> - struct folio *folio = wbc->fbatch.folios[i];
> - unsigned long nr;
> + folio = wbc->fbatch.folios[i++];
>
> - folio_lock(folio);
> - if (!should_writeback_folio(mapping, wbc, folio)) {
> - folio_unlock(folio);
> - continue;
> - }
> + folio_lock(folio);
> + if (!should_writeback_folio(mapping, wbc, folio)) {
> + folio_unlock(folio);
> + continue;
> + }
>
> - trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
> + trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
>
> - error = writepage(folio, wbc, data);
> - nr = folio_nr_pages(folio);
> - wbc->nr_to_write -= nr;
> + error = writepage(folio, wbc, data);
> + nr = folio_nr_pages(folio);
> + wbc->nr_to_write -= nr;
>
> - /*
> - * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return
> - * value. Eventually all instances should just unlock
> - * the folio themselves and return 0;
> - */
> - if (error == AOP_WRITEPAGE_ACTIVATE) {
> - folio_unlock(folio);
> - error = 0;
> - }
> -
> - if (error && !wbc->err)
> - wbc->err = error;
> + /*
> + * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return value.
> + * Eventually all instances should just unlock the folio
> + * themselves and return 0;
> + */
> + if (error == AOP_WRITEPAGE_ACTIVATE) {
> + folio_unlock(folio);
> + error = 0;
> + }
>
> - /*
> - * For integrity sync we have to keep going until we
> - * have written all the folios we tagged for writeback
> - * prior to entering this loop, even if we run past
> - * wbc->nr_to_write or encounter errors. This is
> - * because the file system may still have state to clear
> - * for each folio. We'll eventually return the first
> - * error encountered.
> - *
> - * For background writeback just push done_index past
> - * this folio so that we can just restart where we left
> - * off and media errors won't choke writeout for the
> - * entire file.
> - */
> - if (wbc->sync_mode == WB_SYNC_NONE &&
> - (wbc->err || wbc->nr_to_write <= 0)) {
> - writeback_finish(mapping, wbc,
> - folio->index + nr);
> - return error;
> - }
> + if (error && !wbc->err)
> + wbc->err = error;
> +
> + /*
> + * For integrity sync we have to keep going until we have
> + * written all the folios we tagged for writeback prior to
> + * entering this loop, even if we run past wbc->nr_to_write or
> + * encounter errors. This is because the file system may still
> + * have state to clear for each folio. We'll eventually return
> + * the first error encountered.
> + *
> + * For background writeback just push done_index past this folio
> + * so that we can just restart where we left off and media
> + * errors won't choke writeout for the entire file.
> + */
> + if (wbc->sync_mode == WB_SYNC_NONE &&
> + (wbc->err || wbc->nr_to_write <= 0)) {
> + writeback_finish(mapping, wbc, folio->index + nr);
> + return error;
> }
> }
>
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon 18-12-23 16:35:46, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> Add a loop counter inside the folio_batch to let us iterate from 0 to nr
> instead of decrementing nr and treating the batch as a stack. It would
> generate some very weird and suboptimal I/O patterns for page writeback
> to iterate over the batch as a stack.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> Signed-off-by: Christoph Hellwig <[email protected]>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Honza
> ---
> include/linux/pagevec.h | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
> index 87cc678adc850b..fcc06c300a72c3 100644
> --- a/include/linux/pagevec.h
> +++ b/include/linux/pagevec.h
> @@ -27,6 +27,7 @@ struct folio;
> */
> struct folio_batch {
> unsigned char nr;
> + unsigned char i;
> bool percpu_pvec_drained;
> struct folio *folios[PAGEVEC_SIZE];
> };
> @@ -40,12 +41,14 @@ struct folio_batch {
> static inline void folio_batch_init(struct folio_batch *fbatch)
> {
> fbatch->nr = 0;
> + fbatch->i = 0;
> fbatch->percpu_pvec_drained = false;
> }
>
> static inline void folio_batch_reinit(struct folio_batch *fbatch)
> {
> fbatch->nr = 0;
> + fbatch->i = 0;
> }
>
> static inline unsigned int folio_batch_count(struct folio_batch *fbatch)
> @@ -75,6 +78,21 @@ static inline unsigned folio_batch_add(struct folio_batch *fbatch,
> return folio_batch_space(fbatch);
> }
>
> +/**
> + * folio_batch_next - Return the next folio to process.
> + * @fbatch: The folio batch being processed.
> + *
> + * Use this function to implement a queue of folios.
> + *
> + * Return: The next folio in the queue, or NULL if the queue is empty.
> + */
> +static inline struct folio *folio_batch_next(struct folio_batch *fbatch)
> +{
> + if (fbatch->i == fbatch->nr)
> + return NULL;
> + return fbatch->folios[fbatch->i++];
> +}
> +
> void __folio_batch_release(struct folio_batch *pvec);
>
> static inline void folio_batch_release(struct folio_batch *fbatch)
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon 18-12-23 16:35:47, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> Instead of keeping our own local iterator variable, use the one just
> added to folio_batch.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> Signed-off-by: Christoph Hellwig <[email protected]>
Nice. Feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Honza
> ---
> mm/page-writeback.c | 19 +++++++++----------
> 1 file changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index c7983ea3040be4..70f42fd9a95b5d 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2397,13 +2397,19 @@ static pgoff_t wbc_end(struct writeback_control *wbc)
> return wbc->range_end >> PAGE_SHIFT;
> }
>
> -static void writeback_get_batch(struct address_space *mapping,
> +static struct folio *writeback_get_next(struct address_space *mapping,
> struct writeback_control *wbc)
> {
> + struct folio *folio = folio_batch_next(&wbc->fbatch);
> +
> + if (folio)
> + return folio;
> +
> folio_batch_release(&wbc->fbatch);
> cond_resched();
> filemap_get_folios_tag(mapping, &wbc->index, wbc_end(wbc),
> wbc_to_tag(wbc), &wbc->fbatch);
> + return folio_batch_next(&wbc->fbatch);
> }
>
> static bool should_writeback_folio(struct address_space *mapping,
> @@ -2473,7 +2479,6 @@ int write_cache_pages(struct address_space *mapping,
> {
> int error;
> pgoff_t end; /* Inclusive */
> - int i = 0;
>
> if (wbc->range_cyclic) {
> wbc->index = mapping->writeback_index; /* prev offset */
> @@ -2489,18 +2494,12 @@ int write_cache_pages(struct address_space *mapping,
> wbc->err = 0;
>
> for (;;) {
> - struct folio *folio;
> + struct folio *folio = writeback_get_next(mapping, wbc);
> unsigned long nr;
>
> - if (i == wbc->fbatch.nr) {
> - writeback_get_batch(mapping, wbc);
> - i = 0;
> - }
> - if (wbc->fbatch.nr == 0)
> + if (!folio)
> break;
>
> - folio = wbc->fbatch.folios[i++];
> -
> folio_lock(folio);
> if (!should_writeback_folio(mapping, wbc, folio)) {
> folio_unlock(folio);
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon 18-12-23 16:35:48, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> Make it return the first folio in the batch so that we can use it
> in a typical for() pattern.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> Signed-off-by: Christoph Hellwig <[email protected]>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Honza
> ---
> mm/page-writeback.c | 39 ++++++++++++++++++++-------------------
> 1 file changed, 20 insertions(+), 19 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 70f42fd9a95b5d..efcfffa800884d 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2442,6 +2442,22 @@ static bool should_writeback_folio(struct address_space *mapping,
> return true;
> }
>
> +static struct folio *writeback_iter_init(struct address_space *mapping,
> + struct writeback_control *wbc)
> +{
> + if (wbc->range_cyclic)
> + wbc->index = mapping->writeback_index; /* prev offset */
> + else
> + wbc->index = wbc->range_start >> PAGE_SHIFT;
> +
> + if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
> + tag_pages_for_writeback(mapping, wbc->index, wbc_end(wbc));
> +
> + wbc->err = 0;
> + folio_batch_init(&wbc->fbatch);
> + return writeback_get_next(mapping, wbc);
> +}
> +
> /**
> * write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
> * @mapping: address space structure to write
> @@ -2477,29 +2493,14 @@ int write_cache_pages(struct address_space *mapping,
> struct writeback_control *wbc, writepage_t writepage,
> void *data)
> {
> + struct folio *folio;
> int error;
> - pgoff_t end; /* Inclusive */
>
> - if (wbc->range_cyclic) {
> - wbc->index = mapping->writeback_index; /* prev offset */
> - end = -1;
> - } else {
> - wbc->index = wbc->range_start >> PAGE_SHIFT;
> - end = wbc->range_end >> PAGE_SHIFT;
> - }
> - if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
> - tag_pages_for_writeback(mapping, wbc->index, end);
> -
> - folio_batch_init(&wbc->fbatch);
> - wbc->err = 0;
> -
> - for (;;) {
> - struct folio *folio = writeback_get_next(mapping, wbc);
> + for (folio = writeback_iter_init(mapping, wbc);
> + folio;
> + folio = writeback_get_next(mapping, wbc)) {
> unsigned long nr;
>
> - if (!folio)
> - break;
> -
> folio_lock(folio);
> if (!should_writeback_folio(mapping, wbc, folio)) {
> folio_unlock(folio);
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon 18-12-23 16:35:49, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> Move the loop for should-we-write-this-folio to its own function.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> Signed-off-by: Christoph Hellwig <[email protected]>
Looks good to me. Feel free to add:
Reviewed-by: Jan Kara <[email protected]>
But I'd note that the call stack depth of similarly named helper functions
(with more to come later in the series) is getting a bit confusing. Maybe
we should inline writeback_get_next() into its single caller
writeback_get_folio() to reduce confusion a bit...
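E.g. something like this (sketch of the combined helper, untested, based on
the code as this patch leaves it):

	static struct folio *writeback_get_folio(struct address_space *mapping,
			struct writeback_control *wbc)
	{
		struct folio *folio;

		for (;;) {
			folio = folio_batch_next(&wbc->fbatch);
			if (!folio) {
				/* batch exhausted: refill from the mapping */
				folio_batch_release(&wbc->fbatch);
				cond_resched();
				filemap_get_folios_tag(mapping, &wbc->index,
						wbc_end(wbc), wbc_to_tag(wbc),
						&wbc->fbatch);
				folio = folio_batch_next(&wbc->fbatch);
				if (!folio)
					return NULL;
			}
			folio_lock(folio);
			if (likely(should_writeback_folio(mapping, wbc, folio)))
				break;
			folio_unlock(folio);
		}

		trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
		return folio;
	}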
Honza
> +static struct folio *writeback_get_folio(struct address_space *mapping,
> + struct writeback_control *wbc)
> +{
> + struct folio *folio;
> +
> + for (;;) {
> + folio = writeback_get_next(mapping, wbc);
> + if (!folio)
> + return NULL;
> + folio_lock(folio);
> + if (likely(should_writeback_folio(mapping, wbc, folio)))
> + break;
> + folio_unlock(folio);
> + }
> +
> + trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
> + return folio;
> +}
> +
> static struct folio *writeback_iter_init(struct address_space *mapping,
> struct writeback_control *wbc)
> {
> @@ -2455,7 +2474,7 @@ static struct folio *writeback_iter_init(struct address_space *mapping,
>
> wbc->err = 0;
> folio_batch_init(&wbc->fbatch);
> - return writeback_get_next(mapping, wbc);
> + return writeback_get_folio(mapping, wbc);
> }
>
> /**
> @@ -2498,17 +2517,9 @@ int write_cache_pages(struct address_space *mapping,
>
> for (folio = writeback_iter_init(mapping, wbc);
> folio;
> - folio = writeback_get_next(mapping, wbc)) {
> + folio = writeback_get_folio(mapping, wbc)) {
> unsigned long nr;
>
> - folio_lock(folio);
> - if (!should_writeback_folio(mapping, wbc, folio)) {
> - folio_unlock(folio);
> - continue;
> - }
> -
> - trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
> -
> error = writepage(folio, wbc, data);
> nr = folio_nr_pages(folio);
> wbc->nr_to_write -= nr;
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon 18-12-23 16:35:50, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> Pull the post-processing of the writepage_t callback into a
> separate function. That means changing writeback_finish() to
> return NULL, and writeback_get_next() to call writeback_finish()
> when we naturally run out of folios.
The part about writeback_finish() does not seem to be true anymore -
writeback_finish() still returns void, and it is writeback_get_folio() that
now calls it when we naturally run out of folios.
Otherwise feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Honza
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> mm/page-writeback.c | 85 ++++++++++++++++++++++++---------------------
> 1 file changed, 45 insertions(+), 40 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 9d37dd5e58ffb6..0771f19950081f 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2449,8 +2449,10 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
>
> for (;;) {
> folio = writeback_get_next(mapping, wbc);
> - if (!folio)
> + if (!folio) {
> + writeback_finish(mapping, wbc, 0);
> return NULL;
> + }
> folio_lock(folio);
> if (likely(should_writeback_folio(mapping, wbc, folio)))
> break;
> @@ -2477,6 +2479,46 @@ static struct folio *writeback_iter_init(struct address_space *mapping,
> return writeback_get_folio(mapping, wbc);
> }
>
> +static struct folio *writeback_iter_next(struct address_space *mapping,
> + struct writeback_control *wbc, struct folio *folio, int error)
> +{
> + unsigned long nr = folio_nr_pages(folio);
> +
> + wbc->nr_to_write -= nr;
> +
> + /*
> + * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return value.
> + * Eventually all instances should just unlock the folio themselves and
> + * return 0;
> + */
> + if (error == AOP_WRITEPAGE_ACTIVATE) {
> + folio_unlock(folio);
> + error = 0;
> + }
> +
> + if (error && !wbc->err)
> + wbc->err = error;
> +
> + /*
> + * For integrity sync we have to keep going until we have written all
> + * the folios we tagged for writeback prior to entering the writeback
> + * loop, even if we run past wbc->nr_to_write or encounter errors.
> + * This is because the file system may still have state to clear for
> + * each folio. We'll eventually return the first error encountered.
> + *
> + * For background writeback just push done_index past this folio so that
> + * we can just restart where we left off and media errors won't choke
> + * writeout for the entire file.
> + */
> + if (wbc->sync_mode == WB_SYNC_NONE &&
> + (wbc->err || wbc->nr_to_write <= 0)) {
> + writeback_finish(mapping, wbc, folio->index + nr);
> + return NULL;
> + }
> +
> + return writeback_get_folio(mapping, wbc);
> +}
> +
> /**
> * write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
> * @mapping: address space structure to write
> @@ -2517,47 +2559,10 @@ int write_cache_pages(struct address_space *mapping,
>
> for (folio = writeback_iter_init(mapping, wbc);
> folio;
> - folio = writeback_get_folio(mapping, wbc)) {
> - unsigned long nr;
> -
> + folio = writeback_iter_next(mapping, wbc, folio, error))
> error = writepage(folio, wbc, data);
> - nr = folio_nr_pages(folio);
> - wbc->nr_to_write -= nr;
> -
> - /*
> - * Handle the legacy AOP_WRITEPAGE_ACTIVATE magic return value.
> - * Eventually all instances should just unlock the folio
> - * themselves and return 0;
> - */
> - if (error == AOP_WRITEPAGE_ACTIVATE) {
> - folio_unlock(folio);
> - error = 0;
> - }
> -
> - if (error && !wbc->err)
> - wbc->err = error;
>
> - /*
> - * For integrity sync we have to keep going until we have
> - * written all the folios we tagged for writeback prior to
> - * entering this loop, even if we run past wbc->nr_to_write or
> - * encounter errors. This is because the file system may still
> - * have state to clear for each folio. We'll eventually return
> - * the first error encountered.
> - *
> - * For background writeback just push done_index past this folio
> - * so that we can just restart where we left off and media
> - * errors won't choke writeout for the entire file.
> - */
> - if (wbc->sync_mode == WB_SYNC_NONE &&
> - (wbc->err || wbc->nr_to_write <= 0)) {
> - writeback_finish(mapping, wbc, folio->index + nr);
> - return error;
> - }
> - }
> -
> - writeback_finish(mapping, wbc, 0);
> - return 0;
> + return wbc->err;
> }
> EXPORT_SYMBOL(write_cache_pages);
>
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon 18-12-23 16:35:51, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> Wrap up the iterator with a nice bit of syntactic sugar. Now the
> caller doesn't need to know about wbc->err and can just return error,
> not knowing that the iterator took care of storing errors correctly.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> Signed-off-by: Christoph Hellwig <[email protected]>
Not sure if the trick with the 'error' variable isn't a bit too clever for us
;) We'll see how many bugs it will cause in the future... Feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Honza
> +#define for_each_writeback_folio(mapping, wbc, folio, error) \
> + for (folio = writeback_iter_init(mapping, wbc); \
> + folio || ((error = wbc->err), false); \
> + folio = writeback_iter_next(mapping, wbc, folio, error))
> +
> typedef int (*writepage_t)(struct folio *folio, struct writeback_control *wbc,
> void *data);
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 0771f19950081f..fbffd30a9cc93f 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2463,7 +2463,7 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
> return folio;
> }
>
> -static struct folio *writeback_iter_init(struct address_space *mapping,
> +struct folio *writeback_iter_init(struct address_space *mapping,
> struct writeback_control *wbc)
> {
> if (wbc->range_cyclic)
> @@ -2479,7 +2479,7 @@ static struct folio *writeback_iter_init(struct address_space *mapping,
> return writeback_get_folio(mapping, wbc);
> }
>
> -static struct folio *writeback_iter_next(struct address_space *mapping,
> +struct folio *writeback_iter_next(struct address_space *mapping,
> struct writeback_control *wbc, struct folio *folio, int error)
> {
> unsigned long nr = folio_nr_pages(folio);
> @@ -2557,9 +2557,7 @@ int write_cache_pages(struct address_space *mapping,
> struct folio *folio;
> int error;
>
> - for (folio = writeback_iter_init(mapping, wbc);
> - folio;
> - folio = writeback_iter_next(mapping, wbc, folio, error))
> + for_each_writeback_folio(mapping, wbc, folio, error)
> error = writepage(folio, wbc, data);
>
> return wbc->err;
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon 18-12-23 16:35:52, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> Use the new for_each_writeback_folio() directly instead of indirecting
> through a callback.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> Signed-off-by: Christoph Hellwig <[email protected]>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Honza
> ---
> mm/page-writeback.c | 27 +++++++++++++++------------
> 1 file changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index fbffd30a9cc93f..d3c2c78e0c67ce 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2564,13 +2564,21 @@ int write_cache_pages(struct address_space *mapping,
> }
> EXPORT_SYMBOL(write_cache_pages);
>
> -static int writepage_cb(struct folio *folio, struct writeback_control *wbc,
> - void *data)
> +static int writeback_use_writepage(struct address_space *mapping,
> + struct writeback_control *wbc)
> {
> - struct address_space *mapping = data;
> - int ret = mapping->a_ops->writepage(&folio->page, wbc);
> - mapping_set_error(mapping, ret);
> - return ret;
> + struct blk_plug plug;
> + struct folio *folio;
> + int err;
> +
> + blk_start_plug(&plug);
> + for_each_writeback_folio(mapping, wbc, folio, err) {
> + err = mapping->a_ops->writepage(&folio->page, wbc);
> + mapping_set_error(mapping, err);
> + }
> + blk_finish_plug(&plug);
> +
> + return err;
> }
>
> int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
> @@ -2586,12 +2594,7 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
> if (mapping->a_ops->writepages) {
> ret = mapping->a_ops->writepages(mapping, wbc);
> } else if (mapping->a_ops->writepage) {
> - struct blk_plug plug;
> -
> - blk_start_plug(&plug);
> - ret = write_cache_pages(mapping, wbc, writepage_cb,
> - mapping);
> - blk_finish_plug(&plug);
> + ret = writeback_use_writepage(mapping, wbc);
> } else {
> /* deal with chardevs and other special files */
> ret = 0;
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon 18-12-23 16:35:53, Christoph Hellwig wrote:
> Don't refer to write_cache_pages, which now is just a wrapper for the
> writeback iterator.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <[email protected]>
Honza
> ---
> mm/page-writeback.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index d3c2c78e0c67ce..bc69044fd063e8 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2325,18 +2325,18 @@ void __init page_writeback_init(void)
> }
>
> /**
> - * tag_pages_for_writeback - tag pages to be written by write_cache_pages
> + * tag_pages_for_writeback - tag pages to be written by writeback
> * @mapping: address space structure to write
> * @start: starting page index
> * @end: ending page index (inclusive)
> *
> * This function scans the page range from @start to @end (inclusive) and tags
> - * all pages that have DIRTY tag set with a special TOWRITE tag. The idea is
> - * that write_cache_pages (or whoever calls this function) will then use
> - * TOWRITE tag to identify pages eligible for writeback. This mechanism is
> - * used to avoid livelocking of writeback by a process steadily creating new
> - * dirty pages in the file (thus it is important for this function to be quick
> - * so that it can tag pages faster than a dirtying process can create them).
> + * all pages that have DIRTY tag set with a special TOWRITE tag. The caller
> + * can then use the TOWRITE tag to identify pages eligible for writeback.
> + * This mechanism is used to avoid livelocking of writeback by a process
> + * steadily creating new dirty pages in the file (thus it is important for this
> + * function to be quick so that it can tag pages faster than a dirtying process
> + * can create them).
> */
> void tag_pages_for_writeback(struct address_space *mapping,
> pgoff_t start, pgoff_t end)
> --
> 2.39.2
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Thu, Dec 21, 2023 at 12:17:43PM +0100, Jan Kara wrote:
> > +static void writeback_get_batch(struct address_space *mapping,
> > + struct writeback_control *wbc)
> > +{
> > + folio_batch_release(&wbc->fbatch);
> > + cond_resched();
>
> I'd prefer to have cond_resched() explicitly in the writeback loop instead
> of hidden here in writeback_get_batch() where it logically does not make
> too much sense to me...
Based on the final state after this series, where would you place it?
(That being said, there is a discussion underway on lkml to maybe
kill cond_resched entirely as part of sorting out the preemption
model mess; at that point this would become a moot point anyway)
> > } else {
> > - index = wbc->range_start >> PAGE_SHIFT;
> > + wbc->index = wbc->range_start >> PAGE_SHIFT;
> > end = wbc->range_end >> PAGE_SHIFT;
> > }
>
> Maybe we should have:
> end = wbc_end(wbc);
>
> when we have the helper? But I guess this gets cleaned up in later patches
> anyway so whatever.
Yeah, this end just goes away. I can convert it here, but that feels
like pointless churn to me.
On Thu, Dec 21, 2023 at 12:22:06PM +0100, Jan Kara wrote:
> > +static bool should_writeback_folio(struct address_space *mapping,
> > + struct writeback_control *wbc, struct folio *folio)
> > +{
>
> I'd call this function folio_prepare_writeback() or something like that to
> make it clearer that this function is not only about the decision whether
> we want to write the folio or not, but that it also clears the dirty bit &
> write-protects the folio in the page tables.
Fine with me. I'll wait for willy to chime in if he has a strong
preference.
On Thu, Dec 21, 2023 at 12:41:53PM +0100, Jan Kara wrote:
> But I'd note that the call stack depth of similarly named helper functions
> (with more to come later in the series) is getting a bit confusing. Maybe
> we should inline writeback_get_next() into its single caller
> writeback_get_folio() to reduce confusion a bit...
I just hacked that up based on the fully applied series and that looks
good to me.
On Thu 21-12-23 13:22:33, Christoph Hellwig wrote:
> On Thu, Dec 21, 2023 at 12:17:43PM +0100, Jan Kara wrote:
> > > +static void writeback_get_batch(struct address_space *mapping,
> > > + struct writeback_control *wbc)
> > > +{
> > > + folio_batch_release(&wbc->fbatch);
> > > + cond_resched();
> >
> > I'd prefer to have cond_resched() explicitly in the writeback loop instead
> > of hidden here in writeback_get_batch() where it logically does not make
> > too much sense to me...
>
> Based on the final state after this series, where would you place it?
I guess writeback_get_folio() would be fine... which is where it naturally
lands with the inlining I already suggested, so probably nothing to do here.
> (That being said, there is a discussion underway on lkml to maybe
> kill cond_resched entirely as part of sorting out the preemption
> model mess; at that point this would become a moot point anyway)
>
> > > } else {
> > > - index = wbc->range_start >> PAGE_SHIFT;
> > > + wbc->index = wbc->range_start >> PAGE_SHIFT;
> > > end = wbc->range_end >> PAGE_SHIFT;
> > > }
> >
> > Maybe we should have:
> > end = wbc_end(wbc);
> >
> > when we have the helper? But I guess this gets cleaned up in later patches
> > anyway so whatever.
>
> Yeah, this end just goes away. I can convert it here, but that feels
> like pointless churn to me.
Agreed. Just leave it alone.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Thu 21-12-23 13:25:35, Christoph Hellwig wrote:
> On Thu, Dec 21, 2023 at 12:41:53PM +0100, Jan Kara wrote:
> > But I'd note that the call stack depth of similarly named helper functions
> > (with more to come later in the series) is getting a bit confusing. Maybe
> > we should inline writeback_get_next() into its single caller
> > writeback_get_folio() to reduce confusion a bit...
>
> I just hacked that up based on the fully applied series and that looks
> good to me.
Yeah, cleanup on top works for me so that you don't have to rebase.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Thu, Dec 21, 2023 at 12:51:49PM +0100, Jan Kara wrote:
> On Mon 18-12-23 16:35:51, Christoph Hellwig wrote:
> > From: "Matthew Wilcox (Oracle)" <[email protected]>
> >
> > Wrap up the iterator with a nice bit of syntactic sugar. Now the
> > caller doesn't need to know about wbc->err and can just return error,
> > not knowing that the iterator took care of storing errors correctly.
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> > Signed-off-by: Christoph Hellwig <[email protected]>
>
> Not sure if the trick with the 'error' variable isn't a bit too clever for us
> ;) We'll see how many bugs it will cause in the future...
It's a bit too much syntactic sugar for my taste, but if we want a magic
for_each macro I can't really see a good way around it. I personally wouldn't
mind a version where the writeback_get_folio() call moves out of
writeback_iter_init() and the pattern would look more like:

struct folio *folio = NULL;

writeback_iter_init(mapping, wbc);
while ((folio = writeback_iter_next(mapping, wbc, folio))) {
	wbc->err = <do something>
}
return wbc->err;
On Thu 21-12-23 13:29:10, Christoph Hellwig wrote:
> On Thu, Dec 21, 2023 at 12:51:49PM +0100, Jan Kara wrote:
> > On Mon 18-12-23 16:35:51, Christoph Hellwig wrote:
> > > From: "Matthew Wilcox (Oracle)" <[email protected]>
> > >
> > > Wrap up the iterator with a nice bit of syntactic sugar. Now the
> > > caller doesn't need to know about wbc->err and can just return error,
> > > not knowing that the iterator took care of storing errors correctly.
> > >
> > > Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> > > Signed-off-by: Christoph Hellwig <[email protected]>
> >
> > Not sure if the trick with the 'error' variable isn't a bit too clever for us
> > ;) We'll see how many bugs it will cause in the future...
>
> It's a bit too much syntactic sugar for my taste, but if we want a magic
> for_each macro I can't really see a good way around it. I personally wouldn't
Agreed. The macro is kind of neat but magic like this tends to bite us in
surprising ways. E.g. if someone breaks out of the loop, things will go
really wrong (missing writeback_finish() call). That would actually be a
good use case for the cleanup handlers PeterZ has been promoting - we could
make sure writeback_finish() is called whenever we exit the loop block.
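To illustrate the failure mode (not real code; writepage/data as in
write_cache_pages()):

	int err;
	struct folio *folio;

	for_each_writeback_folio(mapping, wbc, folio, err) {
		err = writepage(folio, wbc, data);
		if (err)
			break;	/* bug: writeback_finish() never runs, so the
				 * fbatch is not released and the cyclic
				 * ->writeback_index is not saved */
	}

The iterator only calls writeback_finish() when the loop terminates through
its own condition.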
> mind a version where the writeback_get_folio() call moves out of
> writeback_iter_init() and the pattern would look more like:
>
> struct folio *folio = NULL;
>
> writeback_iter_init(mapping, wbc);
> while ((folio = writeback_iter_next(mapping, wbc, folio))) {
> 	wbc->err = <do something>
> }
>
> return wbc->err;
That would work for me as well. But I don't feel too strongly about this
either way.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR