2023-12-14 13:26:41

by Christoph Hellwig

Subject: Convert write_cache_pages() to an iterator v2

Hi all,

this is basically a repost of willy's series of the same name from June.
I picked the latest version from his (now apparently defunct) git tree,
rebased it to mainline (no conflict, nor for linux-next), reordered
the new fields in struct writeback_control to document what is interface
vs. internal, and temporarily dropped the iomap patch due to a conflict
in the VFS tree.

willy: let me know if me reposting it like this is fine, or if you
want me to stop. I'd just really like to see it merged :)
Note that patch 4 is missing your signoff, so we'd need that before
proceeding anyway.

The original cover letter is below:

Dave Howells doesn't like the indirect function call imposed by
write_cache_pages(), so refactor it into an iterator. I took the
opportunity to add the ability to iterate a folio_batch without having
an external variable.

This is against next-20230623. If you try to apply it on top of a tree
which doesn't include the pagevec removal series, IT WILL CRASH because
it won't reinitialise folio_batch->i and the iteration will index out
of bounds.

I have a feeling the 'done' parameter could have a better name, but I
can't think what it might be.

Diffstat:
include/linux/pagevec.h | 18 ++
include/linux/writeback.h | 24 +++
mm/page-writeback.c | 313 +++++++++++++++++++++++++---------------------
3 files changed, 211 insertions(+), 144 deletions(-)
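
For orientation, the calling convention the series converges on (patches 10
and 11 below) lets a caller walk the dirty folios directly instead of
passing a writepage_t callback. A minimal sketch of such a caller, where
writepage_fn() is only a placeholder for whatever the filesystem does per
folio:

	struct folio *folio;
	int error;

	for_each_writeback_folio(mapping, wbc, folio, error) {
		/* write this folio back; the iterator records any error */
		error = writepage_fn(folio, wbc);
	}
	/* 'error' now holds wbc->err, the first error recorded by the walk */
	return error;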


2023-12-14 13:26:45

by Christoph Hellwig

Subject: [PATCH 04/11] writeback: Simplify the loops in write_cache_pages()

From: "Matthew Wilcox (Oracle)" <[email protected]>

Collapse the two nested loops into one. This is needed as a step
towards turning this into an iterator.

Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/page-writeback.c | 98 ++++++++++++++++++++++-----------------------
1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 5a3df8665ff4f9..2087d16115710e 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2460,6 +2460,7 @@ int write_cache_pages(struct address_space *mapping,
void *data)
{
int error;
+ int i = 0;

if (wbc->range_cyclic) {
wbc->index = mapping->writeback_index; /* prev offset */
@@ -2477,67 +2478,66 @@ int write_cache_pages(struct address_space *mapping,
folio_batch_init(&wbc->fbatch);
wbc->err = 0;

- while (wbc->index <= wbc->end) {
- int i;
-
- writeback_get_batch(mapping, wbc);
+ for (;;) {
+ struct folio *folio;
+ unsigned long nr;

+ if (i == wbc->fbatch.nr) {
+ writeback_get_batch(mapping, wbc);
+ i = 0;
+ }
if (wbc->fbatch.nr == 0)
break;

- for (i = 0; i < wbc->fbatch.nr; i++) {
- struct folio *folio = wbc->fbatch.folios[i];
- unsigned long nr;
+ folio = wbc->fbatch.folios[i++];

- wbc->done_index = folio->index;
+ wbc->done_index = folio->index;

- folio_lock(folio);
- if (!should_writeback_folio(mapping, wbc, folio)) {
- folio_unlock(folio);
- continue;
- }
+ folio_lock(folio);
+ if (!should_writeback_folio(mapping, wbc, folio)) {
+ folio_unlock(folio);
+ continue;
+ }

- trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
-
- error = writepage(folio, wbc, data);
- nr = folio_nr_pages(folio);
- if (unlikely(error)) {
- /*
- * Handle errors according to the type of
- * writeback. There's no need to continue for
- * background writeback. Just push done_index
- * past this page so media errors won't choke
- * writeout for the entire file. For integrity
- * writeback, we must process the entire dirty
- * set regardless of errors because the fs may
- * still have state to clear for each page. In
- * that case we continue processing and return
- * the first error.
- */
- if (error == AOP_WRITEPAGE_ACTIVATE) {
- folio_unlock(folio);
- error = 0;
- } else if (wbc->sync_mode != WB_SYNC_ALL) {
- wbc->err = error;
- wbc->done_index = folio->index + nr;
- return writeback_finish(mapping,
- wbc, true);
- }
- if (!wbc->err)
- wbc->err = error;
- }
+ trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));

+ error = writepage(folio, wbc, data);
+ nr = folio_nr_pages(folio);
+ if (unlikely(error)) {
/*
- * We stop writing back only if we are not doing
- * integrity sync. In case of integrity sync we have to
- * keep going until we have written all the pages
- * we tagged for writeback prior to entering this loop.
+ * Handle errors according to the type of
+ * writeback. There's no need to continue for
+ * background writeback. Just push done_index
+ * past this page so media errors won't choke
+ * writeout for the entire file. For integrity
+ * writeback, we must process the entire dirty
+ * set regardless of errors because the fs may
+ * still have state to clear for each page. In
+ * that case we continue processing and return
+ * the first error.
*/
- wbc->nr_to_write -= nr;
- if (wbc->nr_to_write <= 0 &&
- wbc->sync_mode == WB_SYNC_NONE)
+ if (error == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ error = 0;
+ } else if (wbc->sync_mode != WB_SYNC_ALL) {
+ wbc->err = error;
+ wbc->done_index = folio->index + nr;
return writeback_finish(mapping, wbc, true);
+ }
+ if (!wbc->err)
+ wbc->err = error;
}
+
+ /*
+ * We stop writing back only if we are not doing
+ * integrity sync. In case of integrity sync we have to
+ * keep going until we have written all the pages
+ * we tagged for writeback prior to entering this loop.
+ */
+ wbc->nr_to_write -= nr;
+ if (wbc->nr_to_write <= 0 &&
+ wbc->sync_mode == WB_SYNC_NONE)
+ return writeback_finish(mapping, wbc, true);
}

return writeback_finish(mapping, wbc, false);
--
2.39.2

2023-12-14 13:26:46

by Christoph Hellwig

Subject: [PATCH 10/11] writeback: Add for_each_writeback_folio()

From: "Matthew Wilcox (Oracle)" <[email protected]>

Wrap up the iterator with a nice bit of syntactic sugar. Now the
caller doesn't need to know about wbc->err and can just return the error,
without having to know that the iterator took care of storing it correctly.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/writeback.h | 14 +++++++++++---
mm/page-writeback.c | 11 ++++-------
2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index be960f92ad9dbd..b5fcf91cf18bdd 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -371,14 +371,22 @@ int balance_dirty_pages_ratelimited_flags(struct address_space *mapping,

bool wb_over_bg_thresh(struct bdi_writeback *wb);

+struct folio *writeback_iter_init(struct address_space *mapping,
+ struct writeback_control *wbc);
+struct folio *writeback_iter_next(struct address_space *mapping,
+ struct writeback_control *wbc, struct folio *folio, int error);
+
+#define for_each_writeback_folio(mapping, wbc, folio, error) \
+ for (folio = writeback_iter_init(mapping, wbc); \
+ folio || ((error = wbc->err), false); \
+ folio = writeback_iter_next(mapping, wbc, folio, error))
+
typedef int (*writepage_t)(struct folio *folio, struct writeback_control *wbc,
void *data);
-
-void tag_pages_for_writeback(struct address_space *mapping,
- pgoff_t start, pgoff_t end);
int write_cache_pages(struct address_space *mapping,
struct writeback_control *wbc, writepage_t writepage,
void *data);
+
int do_writepages(struct address_space *mapping, struct writeback_control *wbc);
void writeback_set_ratelimit(void);
void tag_pages_for_writeback(struct address_space *mapping,
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 4fae912f7a86e2..e4a1444502ccd4 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2450,7 +2450,7 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
return folio;
}

-static struct folio *writeback_iter_init(struct address_space *mapping,
+struct folio *writeback_iter_init(struct address_space *mapping,
struct writeback_control *wbc)
{
if (wbc->range_cyclic) {
@@ -2472,7 +2472,7 @@ static struct folio *writeback_iter_init(struct address_space *mapping,
return writeback_get_folio(mapping, wbc);
}

-static struct folio *writeback_iter_next(struct address_space *mapping,
+struct folio *writeback_iter_next(struct address_space *mapping,
struct writeback_control *wbc, struct folio *folio, int error)
{
unsigned long nr = folio_nr_pages(folio);
@@ -2551,13 +2551,10 @@ int write_cache_pages(struct address_space *mapping,
struct folio *folio;
int error;

- for (folio = writeback_iter_init(mapping, wbc);
- folio;
- folio = writeback_iter_next(mapping, wbc, folio, error)) {
+ for_each_writeback_folio(mapping, wbc, folio, error)
error = writepage(folio, wbc, data);
- }

- return wbc->err;
+ return error;
}
EXPORT_SYMBOL(write_cache_pages);

--
2.39.2
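
For anyone puzzling over the loop condition in the macro, this is roughly
what it amounts to for a straight-through caller (a control-flow sketch,
not a literal expansion; the comma expression is what copies wbc->err into
'error' once the iterator returns NULL):

	folio = writeback_iter_init(mapping, wbc);
	while (folio) {
		error = writepage(folio, wbc, data);	/* the caller's loop body */
		folio = writeback_iter_next(mapping, wbc, folio, error);
	}
	error = wbc->err;	/* done by the condition once folio == NULL */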

2023-12-14 13:26:50

by Christoph Hellwig

Subject: [PATCH 09/11] writeback: Factor writeback_iter_next() out of write_cache_pages()

From: "Matthew Wilcox (Oracle)" <[email protected]>

Pull the post-processing of the writepage_t callback into a
separate function. That means changing writeback_finish() to
return NULL, and writeback_get_next() to call writeback_finish()
when we naturally run out of folios.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/page-writeback.c | 89 +++++++++++++++++++++++----------------------
1 file changed, 46 insertions(+), 43 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index b0accca1f4bfa7..4fae912f7a86e2 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2360,7 +2360,7 @@ void tag_pages_for_writeback(struct address_space *mapping,
}
EXPORT_SYMBOL(tag_pages_for_writeback);

-static int writeback_finish(struct address_space *mapping,
+static struct folio *writeback_finish(struct address_space *mapping,
struct writeback_control *wbc, bool done)
{
folio_batch_release(&wbc->fbatch);
@@ -2375,7 +2375,7 @@ static int writeback_finish(struct address_space *mapping,
if (wbc->range_cyclic || (wbc->range_whole && wbc->nr_to_write > 0))
mapping->writeback_index = wbc->done_index;

- return wbc->err;
+ return NULL;
}

static struct folio *writeback_get_next(struct address_space *mapping,
@@ -2437,7 +2437,7 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
for (;;) {
folio = writeback_get_next(mapping, wbc);
if (!folio)
- return NULL;
+ return writeback_finish(mapping, wbc, false);
wbc->done_index = folio->index;

folio_lock(folio);
@@ -2472,6 +2472,47 @@ static struct folio *writeback_iter_init(struct address_space *mapping,
return writeback_get_folio(mapping, wbc);
}

+static struct folio *writeback_iter_next(struct address_space *mapping,
+ struct writeback_control *wbc, struct folio *folio, int error)
+{
+ unsigned long nr = folio_nr_pages(folio);
+
+ if (unlikely(error)) {
+ /*
+ * Handle errors according to the type of writeback.
+ * There's no need to continue for background writeback.
+ * Just push done_index past this folio so media
+ * errors won't choke writeout for the entire file.
+ * For integrity writeback, we must process the entire
+ * dirty set regardless of errors because the fs may
+ * still have state to clear for each folio. In that
+ * case we continue processing and return the first error.
+ */
+ if (error == AOP_WRITEPAGE_ACTIVATE) {
+ folio_unlock(folio);
+ error = 0;
+ } else if (wbc->sync_mode != WB_SYNC_ALL) {
+ wbc->err = error;
+ wbc->done_index = folio->index + nr;
+ return writeback_finish(mapping, wbc, true);
+ }
+ if (!wbc->err)
+ wbc->err = error;
+ }
+
+ /*
+ * We stop writing back only if we are not doing integrity
+ * sync. In case of integrity sync we have to keep going until
+ * we have written all the folios we tagged for writeback prior
+ * to entering this loop.
+ */
+ wbc->nr_to_write -= nr;
+ if (wbc->nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE)
+ return writeback_finish(mapping, wbc, true);
+
+ return writeback_get_folio(mapping, wbc);
+}
+
/**
* write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
* @mapping: address space structure to write
@@ -2512,49 +2553,11 @@ int write_cache_pages(struct address_space *mapping,

for (folio = writeback_iter_init(mapping, wbc);
folio;
- folio = writeback_get_folio(mapping, wbc)) {
- unsigned long nr;
-
+ folio = writeback_iter_next(mapping, wbc, folio, error)) {
error = writepage(folio, wbc, data);
- nr = folio_nr_pages(folio);
- if (unlikely(error)) {
- /*
- * Handle errors according to the type of
- * writeback. There's no need to continue for
- * background writeback. Just push done_index
- * past this page so media errors won't choke
- * writeout for the entire file. For integrity
- * writeback, we must process the entire dirty
- * set regardless of errors because the fs may
- * still have state to clear for each page. In
- * that case we continue processing and return
- * the first error.
- */
- if (error == AOP_WRITEPAGE_ACTIVATE) {
- folio_unlock(folio);
- error = 0;
- } else if (wbc->sync_mode != WB_SYNC_ALL) {
- wbc->err = error;
- wbc->done_index = folio->index + nr;
- return writeback_finish(mapping, wbc, true);
- }
- if (!wbc->err)
- wbc->err = error;
- }
-
- /*
- * We stop writing back only if we are not doing
- * integrity sync. In case of integrity sync we have to
- * keep going until we have written all the pages
- * we tagged for writeback prior to entering this loop.
- */
- wbc->nr_to_write -= nr;
- if (wbc->nr_to_write <= 0 &&
- wbc->sync_mode == WB_SYNC_NONE)
- return writeback_finish(mapping, wbc, true);
}

- return writeback_finish(mapping, wbc, false);
+ return wbc->err;
}
EXPORT_SYMBOL(write_cache_pages);

--
2.39.2

2023-12-14 13:26:50

by Christoph Hellwig

Subject: [PATCH 05/11] pagevec: Add ability to iterate a queue

From: "Matthew Wilcox (Oracle)" <[email protected]>

Add a loop counter inside the folio_batch to let us iterate from 0 to nr
instead of decrementing nr and treating the batch as a stack. It would
generate some very weird and suboptimal I/O patterns for page writeback
to iterate over the batch as a stack.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/pagevec.h | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 87cc678adc850b..fcc06c300a72c3 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -27,6 +27,7 @@ struct folio;
*/
struct folio_batch {
unsigned char nr;
+ unsigned char i;
bool percpu_pvec_drained;
struct folio *folios[PAGEVEC_SIZE];
};
@@ -40,12 +41,14 @@ struct folio_batch {
static inline void folio_batch_init(struct folio_batch *fbatch)
{
fbatch->nr = 0;
+ fbatch->i = 0;
fbatch->percpu_pvec_drained = false;
}

static inline void folio_batch_reinit(struct folio_batch *fbatch)
{
fbatch->nr = 0;
+ fbatch->i = 0;
}

static inline unsigned int folio_batch_count(struct folio_batch *fbatch)
@@ -75,6 +78,21 @@ static inline unsigned folio_batch_add(struct folio_batch *fbatch,
return folio_batch_space(fbatch);
}

+/**
+ * folio_batch_next - Return the next folio to process.
+ * @fbatch: The folio batch being processed.
+ *
+ * Use this function to implement a queue of folios.
+ *
+ * Return: The next folio in the queue, or NULL if the queue is empty.
+ */
+static inline struct folio *folio_batch_next(struct folio_batch *fbatch)
+{
+ if (fbatch->i == fbatch->nr)
+ return NULL;
+ return fbatch->folios[fbatch->i++];
+}
+
void __folio_batch_release(struct folio_batch *pvec);

static inline void folio_batch_release(struct folio_batch *fbatch)
--
2.39.2
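
A minimal usage sketch for the queue interface this adds (not part of the
series; process_folio() is a placeholder for whatever the caller does with
each folio):

	struct folio_batch fbatch;
	struct folio *folio;

	folio_batch_init(&fbatch);
	/* ... fill the batch, e.g. via filemap_get_folios_tag() ... */
	while ((folio = folio_batch_next(&fbatch)) != NULL) {
		/* folios come back in insertion order, 0 .. nr - 1 */
		process_folio(folio);
	}
	folio_batch_release(&fbatch);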

2023-12-14 13:26:49

by Christoph Hellwig

Subject: [PATCH 07/11] writeback: Factor writeback_iter_init() out of write_cache_pages()

From: "Matthew Wilcox (Oracle)" <[email protected]>

Make it return the first folio in the batch so that we can use it
in a typical for() pattern.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/page-writeback.c | 47 +++++++++++++++++++++++++--------------------
1 file changed, 26 insertions(+), 21 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 2243a0d1b2d3c7..8c220c6a7f824d 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2429,6 +2429,28 @@ static bool should_writeback_folio(struct address_space *mapping,
return true;
}

+static struct folio *writeback_iter_init(struct address_space *mapping,
+ struct writeback_control *wbc)
+{
+ if (wbc->range_cyclic) {
+ wbc->index = mapping->writeback_index; /* prev offset */
+ wbc->end = -1;
+ } else {
+ wbc->index = wbc->range_start >> PAGE_SHIFT;
+ wbc->end = wbc->range_end >> PAGE_SHIFT;
+ if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
+ wbc->range_whole = 1;
+ }
+ if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
+ tag_pages_for_writeback(mapping, wbc->index, wbc->end);
+
+ wbc->done_index = wbc->index;
+ folio_batch_init(&wbc->fbatch);
+ wbc->err = 0;
+
+ return writeback_get_next(mapping, wbc);
+}
+
/**
* write_cache_pages - walk the list of dirty pages of the given address space and write all of them.
* @mapping: address space structure to write
@@ -2464,31 +2486,14 @@ int write_cache_pages(struct address_space *mapping,
struct writeback_control *wbc, writepage_t writepage,
void *data)
{
+ struct folio *folio;
int error;

- if (wbc->range_cyclic) {
- wbc->index = mapping->writeback_index; /* prev offset */
- wbc->end = -1;
- } else {
- wbc->index = wbc->range_start >> PAGE_SHIFT;
- wbc->end = wbc->range_end >> PAGE_SHIFT;
- if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
- wbc->range_whole = 1;
- }
- if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
- tag_pages_for_writeback(mapping, wbc->index, wbc->end);
-
- wbc->done_index = wbc->index;
- folio_batch_init(&wbc->fbatch);
- wbc->err = 0;
-
- for (;;) {
- struct folio *folio = writeback_get_next(mapping, wbc);
+ for (folio = writeback_iter_init(mapping, wbc);
+ folio;
+ folio = writeback_get_next(mapping, wbc)) {
unsigned long nr;

- if (!folio)
- break;
-
wbc->done_index = folio->index;

folio_lock(folio);
--
2.39.2

2023-12-14 13:26:51

by Christoph Hellwig

Subject: [PATCH 11/11] writeback: Remove a use of write_cache_pages() from do_writepages()

From: "Matthew Wilcox (Oracle)" <[email protected]>

Use the new for_each_writeback_folio() directly instead of indirecting
through a callback.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/page-writeback.c | 27 +++++++++++++++------------
1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index e4a1444502ccd4..338021bdd136b9 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2558,13 +2558,21 @@ int write_cache_pages(struct address_space *mapping,
}
EXPORT_SYMBOL(write_cache_pages);

-static int writepage_cb(struct folio *folio, struct writeback_control *wbc,
- void *data)
+static int writeback_use_writepage(struct address_space *mapping,
+ struct writeback_control *wbc)
{
- struct address_space *mapping = data;
- int ret = mapping->a_ops->writepage(&folio->page, wbc);
- mapping_set_error(mapping, ret);
- return ret;
+ struct blk_plug plug;
+ struct folio *folio;
+ int err;
+
+ blk_start_plug(&plug);
+ for_each_writeback_folio(mapping, wbc, folio, err) {
+ err = mapping->a_ops->writepage(&folio->page, wbc);
+ mapping_set_error(mapping, err);
+ }
+ blk_finish_plug(&plug);
+
+ return err;
}

int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
@@ -2580,12 +2588,7 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
if (mapping->a_ops->writepages) {
ret = mapping->a_ops->writepages(mapping, wbc);
} else if (mapping->a_ops->writepage) {
- struct blk_plug plug;
-
- blk_start_plug(&plug);
- ret = write_cache_pages(mapping, wbc, writepage_cb,
- mapping);
- blk_finish_plug(&plug);
+ ret = writeback_use_writepage(mapping, wbc);
} else {
/* deal with chardevs and other special files */
ret = 0;
--
2.39.2

2023-12-15 14:24:53

by Brian Foster

Subject: Re: [PATCH 09/11] writeback: Factor writeback_iter_next() out of write_cache_pages()

On Thu, Dec 14, 2023 at 02:25:42PM +0100, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> Pull the post-processing of the writepage_t callback into a
> separate function. That means changing writeback_finish() to
> return NULL, and writeback_get_next() to call writeback_finish()
> when we naturally run out of folios.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> mm/page-writeback.c | 89 +++++++++++++++++++++++----------------------
> 1 file changed, 46 insertions(+), 43 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index b0accca1f4bfa7..4fae912f7a86e2 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2360,7 +2360,7 @@ void tag_pages_for_writeback(struct address_space *mapping,
> }
> EXPORT_SYMBOL(tag_pages_for_writeback);
>
> -static int writeback_finish(struct address_space *mapping,
> +static struct folio *writeback_finish(struct address_space *mapping,
> struct writeback_control *wbc, bool done)
> {
> folio_batch_release(&wbc->fbatch);
> @@ -2375,7 +2375,7 @@ static int writeback_finish(struct address_space *mapping,
> if (wbc->range_cyclic || (wbc->range_whole && wbc->nr_to_write > 0))
> mapping->writeback_index = wbc->done_index;
>
> - return wbc->err;
> + return NULL;

The series looks reasonable to me on a first pass, but this stood out to
me as really odd. I was initially wondering if it made sense to use an
ERR_PTR() here or something, but on further staring it kind of seems
like this is better off being factored out of the internal iteration
paths. Untested and Probably Broken patch (based on this one) below as a
quick reference, but just a thought...

BTW it would be nicer to just drop the ->done field entirely as Jan has
already suggested, but I just stuffed it in wbc for simplicity.

Brian

--- 8< ---

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index be960f92ad9d..0babca17a2c0 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -84,6 +84,7 @@ struct writeback_control {
pgoff_t index;
pgoff_t end; /* inclusive */
pgoff_t done_index;
+ bool done;
int err;
unsigned range_whole:1; /* entire file */

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 4fae912f7a86..3ee2058a2559 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2360,8 +2360,8 @@ void tag_pages_for_writeback(struct address_space *mapping,
}
EXPORT_SYMBOL(tag_pages_for_writeback);

-static struct folio *writeback_finish(struct address_space *mapping,
- struct writeback_control *wbc, bool done)
+static int writeback_iter_finish(struct address_space *mapping,
+ struct writeback_control *wbc)
{
folio_batch_release(&wbc->fbatch);

@@ -2370,12 +2370,12 @@ static struct folio *writeback_finish(struct address_space *mapping,
* wrap the index back to the start of the file for the next
* time we are called.
*/
- if (wbc->range_cyclic && !done)
+ if (wbc->range_cyclic && !wbc->done)
wbc->done_index = 0;
if (wbc->range_cyclic || (wbc->range_whole && wbc->nr_to_write > 0))
mapping->writeback_index = wbc->done_index;

- return NULL;
+ return wbc->err;
}

static struct folio *writeback_get_next(struct address_space *mapping,
@@ -2434,19 +2434,16 @@ static struct folio *writeback_get_folio(struct address_space *mapping,
{
struct folio *folio;

- for (;;) {
- folio = writeback_get_next(mapping, wbc);
- if (!folio)
- return writeback_finish(mapping, wbc, false);
+ while ((folio = writeback_get_next(mapping, wbc)) != NULL) {
wbc->done_index = folio->index;
-
folio_lock(folio);
if (likely(should_writeback_folio(mapping, wbc, folio)))
break;
folio_unlock(folio);
}

- trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
+ if (folio)
+ trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
return folio;
}

@@ -2466,6 +2463,7 @@ static struct folio *writeback_iter_init(struct address_space *mapping,
tag_pages_for_writeback(mapping, wbc->index, wbc->end);

wbc->done_index = wbc->index;
+ wbc->done = false;
folio_batch_init(&wbc->fbatch);
wbc->err = 0;

@@ -2494,7 +2492,8 @@ static struct folio *writeback_iter_next(struct address_space *mapping,
} else if (wbc->sync_mode != WB_SYNC_ALL) {
wbc->err = error;
wbc->done_index = folio->index + nr;
- return writeback_finish(mapping, wbc, true);
+ wbc->done = true;
+ return NULL;
}
if (!wbc->err)
wbc->err = error;
@@ -2507,8 +2506,10 @@ static struct folio *writeback_iter_next(struct address_space *mapping,
* to entering this loop.
*/
wbc->nr_to_write -= nr;
- if (wbc->nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE)
- return writeback_finish(mapping, wbc, true);
+ if (wbc->nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE) {
+ wbc->done = true;
+ return NULL;
+ }

return writeback_get_folio(mapping, wbc);
}
@@ -2557,7 +2558,7 @@ int write_cache_pages(struct address_space *mapping,
error = writepage(folio, wbc, data);
}

- return wbc->err;
+ return writeback_iter_finish(mapping, wbc);
}
EXPORT_SYMBOL(write_cache_pages);



2023-12-15 14:26:12

by Jan Kara

Subject: Re: [PATCH 04/11] writeback: Simplify the loops in write_cache_pages()

On Thu 14-12-23 14:25:37, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> Collapse the two nested loops into one. This is needed as a step
> towards turning this into an iterator.
>
> Signed-off-by: Christoph Hellwig <[email protected]>

It would be good to mention in the changelog that we drop the condition
index <= end and just rely on filemap_get_folios_tag() to return 0 entries
when index > end. This actually has a subtle implication when end == -1
because then the returned index will be -1 as well and thus if there is a
page present at index -1, we could be looping indefinitely. But I think
that's mostly a theoretical concern so I'd be fine with just mentioning
this subtlety in the changelog and possibly in a comment in the code.
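
A plain-C sketch of the wraparound described above (not kernel code; the
point is only that an unsigned cursor advanced past (pgoff_t)-1 wraps to 0,
leaving an empty batch from filemap_get_folios_tag() as the sole remaining
stop condition):

	unsigned long end = -1UL;	/* range_cyclic sets wbc->end = -1 */
	unsigned long index = end;	/* a (theoretical) page at index -1 */

	index++;			/* advancing past it wraps the cursor to 0 */
	/* so "index > end" can never become true */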

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2023-12-16 04:51:48

by Christoph Hellwig

Subject: Re: [PATCH 09/11] writeback: Factor writeback_iter_next() out of write_cache_pages()

On Fri, Dec 15, 2023 at 09:17:05AM -0500, Brian Foster wrote:
> + while ((folio = writeback_get_next(mapping, wbc)) != NULL) {
> wbc->done_index = folio->index;
> folio_lock(folio);
> if (likely(should_writeback_folio(mapping, wbc, folio)))
> break;
> folio_unlock(folio);
> }
>
> + if (folio)
> + trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
> return folio;

I posted a very similar transformation in reply to willy's first posting
of the series :) I guess that's 2:1 now and I might as well go ahead
and do something like that now :)

2023-12-16 06:16:41

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 04/11] writeback: Simplify the loops in write_cache_pages()

On Thu, Dec 14, 2023 at 02:25:37PM +0100, Christoph Hellwig wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> Collapse the two nested loops into one. This is needed as a step
> towards turning this into an iterator.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>

> Signed-off-by: Christoph Hellwig <[email protected]>