2023-06-28 19:01:55

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH] writeback: Account the number of pages written back

nr_to_write is a count of pages, so we need to decrease it by the number
of pages in the folio we just wrote, not by 1. Most callers specify
either LONG_MAX or 1, so are unaffected, but writeback_sb_inodes()
might end up writing 512x as many pages as it asked for.

Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/page-writeback.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 1d17fb1ec863..d3f42009bb70 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2434,6 +2434,7 @@ int write_cache_pages(struct address_space *mapping,

for (i = 0; i < nr_folios; i++) {
struct folio *folio = fbatch.folios[i];
+ unsigned long nr;

done_index = folio->index;

@@ -2471,6 +2472,7 @@ int write_cache_pages(struct address_space *mapping,

trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
error = writepage(folio, wbc, data);
+ nr = folio_nr_pages(folio);
if (unlikely(error)) {
/*
* Handle errors according to the type of
@@ -2489,8 +2491,7 @@ int write_cache_pages(struct address_space *mapping,
error = 0;
} else if (wbc->sync_mode != WB_SYNC_ALL) {
ret = error;
- done_index = folio->index +
- folio_nr_pages(folio);
+ done_index = folio->index + nr;
done = 1;
break;
}
@@ -2504,7 +2505,8 @@ int write_cache_pages(struct address_space *mapping,
* keep going until we have written all the pages
* we tagged for writeback prior to entering this loop.
*/
- if (--wbc->nr_to_write <= 0 &&
+ wbc->nr_to_write -= nr;
+ if (wbc->nr_to_write <= 0 &&
wbc->sync_mode == WB_SYNC_NONE) {
done = 1;
break;
--
2.39.2



2023-06-28 22:24:17

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH] writeback: Account the number of pages written back

On Wed, Jun 28, 2023 at 07:55:48PM +0100, Matthew Wilcox (Oracle) wrote:
> nr_to_write is a count of pages, so we need to decrease it by the number
> of pages in the folio we just wrote, not by 1. Most callers specify
> either LONG_MAX or 1, so are unaffected, but writeback_sb_inodes()
> might end up writing 512x as many pages as it asked for.
>
> Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> mm/page-writeback.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 1d17fb1ec863..d3f42009bb70 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2434,6 +2434,7 @@ int write_cache_pages(struct address_space *mapping,
>
> for (i = 0; i < nr_folios; i++) {
> struct folio *folio = fbatch.folios[i];
> + unsigned long nr;
>
> done_index = folio->index;
>
> @@ -2471,6 +2472,7 @@ int write_cache_pages(struct address_space *mapping,
>
> trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
> error = writepage(folio, wbc, data);
> + nr = folio_nr_pages(folio);

This should really be done before writepage() is called, right? By
the time the writepage() returns, the folio can be unlocked, the
entire write completed and the folio partially invalidated which may
try to split the folio...

Even if this can't happen (folio refcount is elevated, right?), it
makes much more sense to me to sample the size of the folio while it
is obviously locked and not going to change...

Other than that, change looks good.

-Dave.
--
Dave Chinner
[email protected]

2023-06-29 00:06:26

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] writeback: Account the number of pages written back

On Thu, Jun 29, 2023 at 07:53:44AM +1000, Dave Chinner wrote:
> On Wed, Jun 28, 2023 at 07:55:48PM +0100, Matthew Wilcox (Oracle) wrote:
> > nr_to_write is a count of pages, so we need to decrease it by the number
> > of pages in the folio we just wrote, not by 1. Most callers specify
> > either LONG_MAX or 1, so are unaffected, but writeback_sb_inodes()
> > might end up writing 512x as many pages as it asked for.
> >
> > Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
> > Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> > ---
> > mm/page-writeback.c | 8 +++++---
> > 1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 1d17fb1ec863..d3f42009bb70 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -2434,6 +2434,7 @@ int write_cache_pages(struct address_space *mapping,
> >
> > for (i = 0; i < nr_folios; i++) {
> > struct folio *folio = fbatch.folios[i];
> > + unsigned long nr;
> >
> > done_index = folio->index;
> >
> > @@ -2471,6 +2472,7 @@ int write_cache_pages(struct address_space *mapping,
> >
> > trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
> > error = writepage(folio, wbc, data);
> > + nr = folio_nr_pages(folio);
>
> This should really be done before writepage() is called, right? By
> the time the writepage() returns, the folio can be unlocked, the
> entire write completed and the folio partially invalidated which may
> try to split the folio...
>
> Even if this can't happen (folio refcount is elevated, right?), it
> makes much more sense to me to sample the size of the folio while it
> is obviously locked and not going to change...

It can't change because of the refcount we hold (that's put in the call
to folio_batch_release()). I didn't want to call it before the call to
writepage() because that makes the compiler stick it on the stack instead
of a local variable. Also, when we transform this into an iterator (see
patches posted yesterday), we'd have to stash it away in the iterator.

2023-06-29 01:29:23

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH] writeback: Account the number of pages written back

On Thu, Jun 29, 2023 at 01:01:59AM +0100, Matthew Wilcox wrote:
> On Thu, Jun 29, 2023 at 07:53:44AM +1000, Dave Chinner wrote:
> > On Wed, Jun 28, 2023 at 07:55:48PM +0100, Matthew Wilcox (Oracle) wrote:
> > > nr_to_write is a count of pages, so we need to decrease it by the number
> > > of pages in the folio we just wrote, not by 1. Most callers specify
> > > either LONG_MAX or 1, so are unaffected, but writeback_sb_inodes()
> > > might end up writing 512x as many pages as it asked for.
> > >
> > > Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
> > > Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> > > ---
> > > mm/page-writeback.c | 8 +++++---
> > > 1 file changed, 5 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > > index 1d17fb1ec863..d3f42009bb70 100644
> > > --- a/mm/page-writeback.c
> > > +++ b/mm/page-writeback.c
> > > @@ -2434,6 +2434,7 @@ int write_cache_pages(struct address_space *mapping,
> > >
> > > for (i = 0; i < nr_folios; i++) {
> > > struct folio *folio = fbatch.folios[i];
> > > + unsigned long nr;
> > >
> > > done_index = folio->index;
> > >
> > > @@ -2471,6 +2472,7 @@ int write_cache_pages(struct address_space *mapping,
> > >
> > > trace_wbc_writepage(wbc, inode_to_bdi(mapping->host));
> > > error = writepage(folio, wbc, data);
> > > + nr = folio_nr_pages(folio);
> >
> > This should really be done before writepage() is called, right? By
> > the time the writepage() returns, the folio can be unlocked, the
> > entire write completed and the folio partially invalidated which may
> > try to split the folio...
> >
> > Even if this can't happen (folio refcount is elevated, right?), it
> > makes much more sense to me to sample the size of the folio while it
> > is obviously locked and not going to change...
>
> It can't change because of the refcount we hold (that's put in the call
> to folio_batch_release()). I didn't want to call it before the call to
> writepage() because that makes the compiler stick it on the stack instead
> of a local variable.

I don't care for micro-optimisations when the result is code
that looks dodgy and suspect and requires lots of additional
thinking about to determine that it is safe.

> Also, when we transform this into an iterator (see
> patches posted yesterday), we'd have to stash it away in the iterator.

That's no big deal, either.

-Dave.
--
Dave Chinner
[email protected]

2023-07-02 20:41:10

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] writeback: Account the number of pages written back

On Wed, 28 Jun 2023 19:55:48 +0100 "Matthew Wilcox (Oracle)" <[email protected]> wrote:

> nr_to_write is a count of pages, so we need to decrease it by the number
> of pages in the folio we just wrote, not by 1. Most callers specify
> either LONG_MAX or 1, so are unaffected, but writeback_sb_inodes()
> might end up writing 512x as many pages as it asked for.

512 is a big number, Should we backport this?

> Fixes: 793917d997df ("mm/readahead: Add large folio readahead")

I'm not seeing how a readahead change messed up writeback accounting?



2023-07-03 02:57:01

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] writeback: Account the number of pages written back

On Sun, Jul 02, 2023 at 01:06:15PM -0700, Andrew Morton wrote:
> On Wed, 28 Jun 2023 19:55:48 +0100 "Matthew Wilcox (Oracle)" <[email protected]> wrote:
>
> > nr_to_write is a count of pages, so we need to decrease it by the number
> > of pages in the folio we just wrote, not by 1. Most callers specify
> > either LONG_MAX or 1, so are unaffected, but writeback_sb_inodes()
> > might end up writing 512x as many pages as it asked for.
>
> 512 is a big number, Should we backport this?

I'm really not sure. Maybe? I'm hoping one of the bots comes up with a
meaningful performance change as a result of this patch and we find out.

> > Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
>
> I'm not seeing how a readahead change messed up writeback accounting?

That was the first patch which allowed large folios to be added to the
page cache. Until that point, this was latent. We could probably argue
for one of a dozen other commits around the same time.

2023-07-03 21:48:28

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH] writeback: Account the number of pages written back

On Mon, Jul 03, 2023 at 03:13:15AM +0100, Matthew Wilcox wrote:
> On Sun, Jul 02, 2023 at 01:06:15PM -0700, Andrew Morton wrote:
> > On Wed, 28 Jun 2023 19:55:48 +0100 "Matthew Wilcox (Oracle)" <[email protected]> wrote:
> >
> > > nr_to_write is a count of pages, so we need to decrease it by the number
> > > of pages in the folio we just wrote, not by 1. Most callers specify
> > > either LONG_MAX or 1, so are unaffected, but writeback_sb_inodes()
> > > might end up writing 512x as many pages as it asked for.
> >
> > 512 is a big number, Should we backport this?
>
> I'm really not sure. Maybe? I'm hoping one of the bots comes up with a
> meaningful performance change as a result of this patch and we find out.

XFS is the only filesystem this would affect, right? AFAIA, nothing
else enables large folios and uses writeback through
write_cache_pages() at this point...

In which case, I'd be surprised if much difference, if any, gets
noticed by anyone.

-Dave.
--
Dave Chinner
[email protected]

2023-07-04 18:07:06

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] writeback: Account the number of pages written back

On Tue, Jul 04, 2023 at 07:37:17AM +1000, Dave Chinner wrote:
> On Mon, Jul 03, 2023 at 03:13:15AM +0100, Matthew Wilcox wrote:
> > On Sun, Jul 02, 2023 at 01:06:15PM -0700, Andrew Morton wrote:
> > > On Wed, 28 Jun 2023 19:55:48 +0100 "Matthew Wilcox (Oracle)" <[email protected]> wrote:
> > >
> > > > nr_to_write is a count of pages, so we need to decrease it by the number
> > > > of pages in the folio we just wrote, not by 1. Most callers specify
> > > > either LONG_MAX or 1, so are unaffected, but writeback_sb_inodes()
> > > > might end up writing 512x as many pages as it asked for.
> > >
> > > 512 is a big number, Should we backport this?
> >
> > I'm really not sure. Maybe? I'm hoping one of the bots comes up with a
> > meaningful performance change as a result of this patch and we find out.
>
> XFS is the only filesystem this would affect, right? AFAIA, nothing
> else enables large folios and uses writeback through
> write_cache_pages() at this point...

Good point. Still, Intel's 0day has squawked about a loss of performance
when large folios have _stopped_ being used, so they are at least testing
with XFS.

2023-07-06 14:45:21

by Christoph Hellwig

[permalink] [raw]