2010-08-19 14:15:27

by Jeff Layton

[permalink] [raw]
Subject: why are WB_SYNC_NONE COMMITs being done with FLUSH_SYNC set ?

I'm looking at backporting some upstream changes to earlier kernels,
and ran across something I don't quite understand...

In nfs_commit_unstable_pages, we set the flags to FLUSH_SYNC. We then
zero out the flags if wbc->nonblocking or wbc->for_background is set.

Shouldn't we also clear it out if wbc->sync_mode == WB_SYNC_NONE ?
WB_SYNC_NONE means "don't wait on anything", so shouldn't that include
not waiting on the COMMIT to complete?

For discussion purposes, here's a patch that shows what I'm talking
about (completely untested):

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 874972d..2fca906 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1440,7 +1440,8 @@ static int nfs_commit_unstable_pages(struct inode *inode, struct writeback_contr
nfsi->ncommit <= (nfsi->npages >> 1))
goto out_mark_dirty;

- if (wbc->nonblocking || wbc->for_background)
+ if (wbc->nonblocking || wbc->for_background ||
+ wbc->sync_mode == WB_SYNC_NONE)
flags = 0;
ret = nfs_commit_inode(inode, flags);
if (ret >= 0) {

--
Jeff Layton <[email protected]>


2010-08-19 14:37:11

by Christoph Hellwig

[permalink] [raw]
Subject: Re: why are WB_SYNC_NONE COMMITs being done with FLUSH_SYNC set ?

On Thu, Aug 19, 2010 at 10:15:25AM -0400, Jeff Layton wrote:
> I'm looking at backporting some upstream changes to earlier kernels,
> and ran across something I don't quite understand...
>
> In nfs_commit_unstable_pages, we set the flags to FLUSH_SYNC. We then
> zero out the flags if wbc->nonblocking or wbc->for_background is set.
>
> Shouldn't we also clear it out if wbc->sync_mode == WB_SYNC_NONE ?
> WB_SYNC_NONE means "don't wait on anything", so shouldn't that include
> not waiting on the COMMIT to complete?

I've been trying to figure out what the nonblocking flag is supposed
to mean for a while now.

It basically disappeared in commit 0d99519efef15fd0cf84a849492c7b1deee1e4b7

"writeback: remove unused nonblocking and congestion checks"

from Wu. What's left these days is a couple of places in local copies
of write_cache_pages (afs, cifs), and a couple of checks in random
writepages instances (afs, block_write_full_page, ceph, nfs, reiserfs, xfs)
and the use in nfs_write_inode. It's only actually set for memory
migration and pageout, that is VM writeback.

To me it really doesn't make much sense, but maybe someone has a better
idea what it is for.

> + if (wbc->nonblocking || wbc->for_background ||
> + wbc->sync_mode == WB_SYNC_NONE)

You could remove the nonblocking and for_background checks as
these impliy WB_SYNC_NONE.


2010-08-20 11:28:07

by Jeff Layton

[permalink] [raw]
Subject: Re: why are WB_SYNC_NONE COMMITs being done with FLUSH_SYNC set ?

On Fri, 20 Aug 2010 05:19:04 -0400
Christoph Hellwig <[email protected]> wrote:

> On Fri, Aug 20, 2010 at 07:55:53AM +0800, Wu Fengguang wrote:
> > Since migration and pageout still set nonblocking for ->writepage, we
> > may keep them in the near future, until VM does not start IO on itself.
>
> Why does pageout() and memory migration need to be even more
> non-blocking than the already non-blockig WB_SYNC_NONE writeout?
>

Just an idle thought on this...

I think a lot of the confusion here comes from the fact that we have
sync_mode and a bunch of flags, and it's not at all clear how
filesystems are supposed to treat the union of them. There are also
possible unions of flags/sync_modes that never happen in practice. It's
not always obvious though and as filesystem implementors we have to
consider the possibility that they might occur (consider WB_SYNC_ALL +
for_background).

Perhaps a lot of this confusion could be lifted by getting rid of the
extra flags and adding new sync_mode's. Maybe something like:

WB_SYNC_ALL /* wait on everything to complete */
WB_SYNC_NONE /* don't wait on anything */
WB_SYNC_FOR_RECLAIM /* sync for reclaim */
WB_SYNC_FOR_KUPDATED /* sync by kupdate */
...etc...

That does mean that all of the filesystem specific code may need to be
touched when new modes are added and removed. I think it would be
clearer though about what you're supposed to do in ->writepages.

--
Jeff Layton <[email protected]>

2010-08-30 19:23:06

by Trond Myklebust

[permalink] [raw]
Subject: Re: why are WB_SYNC_NONE COMMITs being done with FLUSH_SYNC set ?

On Fri, 2010-08-20 at 21:23 +0800, Wu Fengguang wrote:
> > > Here's a lightly tested patch that turns the check for the two flags
> > > into a check for WB_SYNC_NONE. It seems to do the right thing, but I
> > > don't have a clear testcase for it. Does this look reasonable?
> >
> > Looks fine to me. I'll queue it up for the post-2.6.36 merge window...
>
> Trond, I just created a patch that removes the wbc->nonblocking
> definition and all its references except NFS. So there will be merge
> dependencies. What should we do? To push both patches to Andrew's -mm
> tree?
>
> Thanks,
> Fengguang

Do you want to include it as part of your series? Just remember to add
an

Acked-by: Trond Myklebust <[email protected]>

Cheers
Trond


2010-08-19 23:56:07

by Wu Fengguang

[permalink] [raw]
Subject: Re: why are WB_SYNC_NONE COMMITs being done with FLUSH_SYNC set ?

On Thu, Aug 19, 2010 at 10:37:10AM -0400, Christoph Hellwig wrote:
> On Thu, Aug 19, 2010 at 10:15:25AM -0400, Jeff Layton wrote:
> > I'm looking at backporting some upstream changes to earlier kernels,
> > and ran across something I don't quite understand...
> >
> > In nfs_commit_unstable_pages, we set the flags to FLUSH_SYNC. We then
> > zero out the flags if wbc->nonblocking or wbc->for_background is set.
> >
> > Shouldn't we also clear it out if wbc->sync_mode == WB_SYNC_NONE ?
> > WB_SYNC_NONE means "don't wait on anything", so shouldn't that include
> > not waiting on the COMMIT to complete?
>
> I've been trying to figure out what the nonblocking flag is supposed
> to mean for a while now.
>
> It basically disappeared in commit 0d99519efef15fd0cf84a849492c7b1deee1e4b7
>
> "writeback: remove unused nonblocking and congestion checks"
>
> from Wu. What's left these days is a couple of places in local copies
> of write_cache_pages (afs, cifs), and a couple of checks in random
> writepages instances (afs, block_write_full_page, ceph, nfs, reiserfs, xfs)
> and the use in nfs_write_inode.

In principle all nonblocking checks in ->writepages should be removed.

(My original patch does have chunks for afs/cifs that somehow get
dropped in the process, and missed ceph because it's not upstream
when I started patch..)

> It's only actually set for memory
> migration and pageout, that is VM writeback.
>
> To me it really doesn't make much sense, but maybe someone has a better
> idea what it is for.

Since migration and pageout still set nonblocking for ->writepage, we
may keep them in the near future, until VM does not start IO on itself.

> > + if (wbc->nonblocking || wbc->for_background ||
> > + wbc->sync_mode == WB_SYNC_NONE)
>
> You could remove the nonblocking and for_background checks as
> these impliy WB_SYNC_NONE.

Agreed.

Thanks,
Fengguang
---
writeback: remove useless nonblocking checks in ->writepages

This removes more deadcode that was somehow missed by commit 0d99519efef
(writeback: remove unused nonblocking and congestion checks).

The nonblocking checks in ->writepages are no longer used because the
flusher now prefer to block on get_request_wait() than to skip inodes on
IO congestion. The latter will lead to more seeky IO.

CC: Chris Mason <[email protected]>
CC: Jens Axboe <[email protected]>
CC: Christoph Hellwig <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>
---
fs/afs/write.c | 16 +---------------
fs/cifs/file.c | 10 ----------
2 files changed, 1 insertion(+), 25 deletions(-)

--- linux-next.orig/fs/afs/write.c 2010-06-24 14:32:01.000000000 +0800
+++ linux-next/fs/afs/write.c 2010-08-20 07:03:01.000000000 +0800
@@ -455,8 +455,6 @@ int afs_writepage(struct page *page, str
}

wbc->nr_to_write -= ret;
- if (wbc->nonblocking && bdi_write_congested(bdi))
- wbc->encountered_congestion = 1;

_leave(" = 0");
return 0;
@@ -529,11 +527,6 @@ static int afs_writepages_region(struct

wbc->nr_to_write -= ret;

- if (wbc->nonblocking && bdi_write_congested(bdi)) {
- wbc->encountered_congestion = 1;
- break;
- }
-
cond_resched();
} while (index < end && wbc->nr_to_write > 0);

@@ -554,18 +547,11 @@ int afs_writepages(struct address_space

_enter("");

- if (wbc->nonblocking && bdi_write_congested(bdi)) {
- wbc->encountered_congestion = 1;
- _leave(" = 0 [congest]");
- return 0;
- }
-
if (wbc->range_cyclic) {
start = mapping->writeback_index;
end = -1;
ret = afs_writepages_region(mapping, wbc, start, end, &next);
- if (start > 0 && wbc->nr_to_write > 0 && ret == 0 &&
- !(wbc->nonblocking && wbc->encountered_congestion))
+ if (start > 0 && wbc->nr_to_write > 0 && ret == 0)
ret = afs_writepages_region(mapping, wbc, 0, start,
&next);
mapping->writeback_index = next;
--- linux-next.orig/fs/cifs/file.c 2010-08-20 06:57:11.000000000 +0800
+++ linux-next/fs/cifs/file.c 2010-08-20 07:03:01.000000000 +0800
@@ -1379,16 +1379,6 @@ static int cifs_writepages(struct addres
return generic_writepages(mapping, wbc);


- /*
- * BB: Is this meaningful for a non-block-device file system?
- * If it is, we should test it again after we do I/O
- */
- if (wbc->nonblocking && bdi_write_congested(bdi)) {
- wbc->encountered_congestion = 1;
- kfree(iov);
- return 0;
- }
-
xid = GetXid();

pagevec_init(&pvec, 0);

2010-08-19 19:16:29

by Jeff Layton

[permalink] [raw]
Subject: Re: why are WB_SYNC_NONE COMMITs being done with FLUSH_SYNC set ?

On Thu, 19 Aug 2010 10:58:25 -0400
Trond Myklebust <[email protected]> wrote:

> On Thu, 2010-08-19 at 10:37 -0400, Christoph Hellwig wrote:
> > On Thu, Aug 19, 2010 at 10:15:25AM -0400, Jeff Layton wrote:
> > > I'm looking at backporting some upstream changes to earlier kernels,
> > > and ran across something I don't quite understand...
> > >
> > > In nfs_commit_unstable_pages, we set the flags to FLUSH_SYNC. We then
> > > zero out the flags if wbc->nonblocking or wbc->for_background is set.
> > >
> > > Shouldn't we also clear it out if wbc->sync_mode == WB_SYNC_NONE ?
> > > WB_SYNC_NONE means "don't wait on anything", so shouldn't that include
> > > not waiting on the COMMIT to complete?
> >
> > I've been trying to figure out what the nonblocking flag is supposed
> > to mean for a while now.
> >
> > It basically disappeared in commit 0d99519efef15fd0cf84a849492c7b1deee1e4b7
> >
> > "writeback: remove unused nonblocking and congestion checks"
> >
> > from Wu. What's left these days is a couple of places in local copies
> > of write_cache_pages (afs, cifs), and a couple of checks in random
> > writepages instances (afs, block_write_full_page, ceph, nfs, reiserfs, xfs)
> > and the use in nfs_write_inode. It's only actually set for memory
> > migration and pageout, that is VM writeback.
> >
> > To me it really doesn't make much sense, but maybe someone has a better
> > idea what it is for.
> >
> > > + if (wbc->nonblocking || wbc->for_background ||
> > > + wbc->sync_mode == WB_SYNC_NONE)
> >
> > You could remove the nonblocking and for_background checks as
> > these impliy WB_SYNC_NONE.
>
> To me that sounds fine. I've also been trying to wrap my head around the
> differences between 'nonblocking', 'for_background', 'for_reclaim' and
> 'for_kupdate' and how the filesystem is supposed to treat them.
>
> Aside from the above, I've used 'for_reclaim', 'for_kupdate' and
> 'for_background' in order to adjust the RPC request's queuing priority
> (high in the case of 'for_reclaim' and low for the other two).
>

Here's a lightly tested patch that turns the check for the two flags
into a check for WB_SYNC_NONE. It seems to do the right thing, but I
don't have a clear testcase for it. Does this look reasonable?

------------------[snip]------------------------

NFS: don't use FLUSH_SYNC on WB_SYNC_NONE COMMIT calls

WB_SYNC_NONE is supposed to mean "don't wait on anything". That should
also include not waiting for COMMIT calls to complete.

WB_SYNC_NONE is also implied when wbc->nonblocking or
wbc->for_background are set, so we can replace those checks in
nfs_commit_unstable_pages with a check for WB_SYNC_NONE.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/write.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 874972d..35bd7d0 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1436,12 +1436,12 @@ static int nfs_commit_unstable_pages(struct inode *inode, struct writeback_contr
/* Don't commit yet if this is a non-blocking flush and there are
* lots of outstanding writes for this mapping.
*/
- if (wbc->sync_mode == WB_SYNC_NONE &&
- nfsi->ncommit <= (nfsi->npages >> 1))
- goto out_mark_dirty;
-
- if (wbc->nonblocking || wbc->for_background)
+ if (wbc->sync_mode == WB_SYNC_NONE) {
+ if (nfsi->ncommit <= (nfsi->npages >> 1))
+ goto out_mark_dirty;
flags = 0;
+ }
+
ret = nfs_commit_inode(inode, flags);
if (ret >= 0) {
if (wbc->sync_mode == WB_SYNC_NONE) {

--
1.5.5.6

2010-08-19 15:24:11

by Christoph Hellwig

[permalink] [raw]
Subject: Re: why are WB_SYNC_NONE COMMITs being done with FLUSH_SYNC set ?

On Thu, Aug 19, 2010 at 10:58:25AM -0400, Trond Myklebust wrote:
> To me that sounds fine. I've also been trying to wrap my head around the
> differences between 'nonblocking', 'for_background', 'for_reclaim' and
> 'for_kupdate' and how the filesystem is supposed to treat them.

Yeah, it's not clear to me either. for_background is in fact only used
in nfs, for the priority and the nfs_commit_inode flags, for_kupdate
is only used in nfs, and in a really weird spot in btrfs, and
for_reclaim is used in nfs, and two places in nilfs2 and in shmemfs.

> Aside from the above, I've used 'for_reclaim', 'for_kupdate' and
> 'for_background' in order to adjust the RPC request's queuing priority
> (high in the case of 'for_reclaim' and low for the other two).

Right now writepage calls to the filesystem can come from various
places:

- the flusher threads
- VM reclaim (kswapd, memcg, direct reclaim)
- memory migration
- filemap_fdatawrite & other calls directly from FS code, also
including fsync

We have WB_SYNC_ALL set for the second, data integrity pass when doing
a sync from the flusher threads, and when doing data integrity writes
from fs context (most fsync but also a few others). All these obviously
are high priority. It's not too easy to set priorities for the others
in my opinion.


2010-08-20 12:26:40

by Wu Fengguang

[permalink] [raw]
Subject: Re: why are WB_SYNC_NONE COMMITs being done with FLUSH_SYNC set ?

On Fri, Aug 20, 2010 at 05:19:04AM -0400, Christoph Hellwig wrote:
> On Fri, Aug 20, 2010 at 07:55:53AM +0800, Wu Fengguang wrote:
> > Since migration and pageout still set nonblocking for ->writepage, we
> > may keep them in the near future, until VM does not start IO on itself.
>
> Why does pageout() and memory migration need to be even more
> non-blocking than the already non-blockig WB_SYNC_NONE writeout?

Good question! So the other nonblocking checks in ->writepage are also
redundant code, let's rip them all!

Thanks,
Fengguang
---

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 72a5d64..bd11c0e 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -45,8 +45,6 @@ struct writeback_control {
loff_t range_start;
loff_t range_end;

- unsigned nonblocking:1; /* Don't get stuck on request queues */
- unsigned encountered_congestion:1; /* An output: a queue is full */
unsigned for_kupdate:1; /* A kupdate writeback */
unsigned for_background:1; /* A background writeback */
unsigned for_reclaim:1; /* Invoked from the page allocator */
diff --git a/fs/buffer.c b/fs/buffer.c
index 3e7dca2..c9a7a80 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1706,7 +1706,7 @@ static int __block_write_full_page(struct inode *inode, struct page *page,
* and kswapd activity, but those code paths have their own
* higher-level throttling.
*/
- if (wbc->sync_mode != WB_SYNC_NONE || !wbc->nonblocking) {
+ if (wbc->sync_mode != WB_SYNC_NONE) {
lock_buffer(bh);
} else if (!trylock_buffer(bh)) {
redirty_page_for_writepage(wbc, page);
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index f3b071f..939739c 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -55,7 +55,7 @@ static int gfs2_aspace_writepage(struct page *page, struct writeback_control *wb
* activity, but those code paths have their own higher-level
* throttling.
*/
- if (wbc->sync_mode != WB_SYNC_NONE || !wbc->nonblocking) {
+ if (wbc->sync_mode != WB_SYNC_NONE) {
lock_buffer(bh);
} else if (!trylock_buffer(bh)) {
redirty_page_for_writepage(wbc, page);
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index caa7583..c1f9389 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -2438,7 +2438,7 @@ static int reiserfs_write_full_page(struct page *page,
/* from this point on, we know the buffer is mapped to a
* real block and not a direct item
*/
- if (wbc->sync_mode != WB_SYNC_NONE || !wbc->nonblocking) {
+ if (wbc->sync_mode != WB_SYNC_NONE) {
lock_buffer(bh);
} else {
if (!trylock_buffer(bh)) {
diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index 15412fe..ec495a4 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -1139,8 +1139,7 @@ xfs_vm_writepage(
type = IO_DELAY;
flags = BMAPI_ALLOCATE;

- if (wbc->sync_mode == WB_SYNC_NONE &&
- wbc->nonblocking)
+ if (wbc->sync_mode == WB_SYNC_NONE)
flags |= BMAPI_TRYLOCK;
}

diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index f345f66..0bb01ab 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -96,8 +96,6 @@ DECLARE_EVENT_CLASS(wbc_class,
__field(long, nr_to_write)
__field(long, pages_skipped)
__field(int, sync_mode)
- __field(int, nonblocking)
- __field(int, encountered_congestion)
__field(int, for_kupdate)
__field(int, for_background)
__field(int, for_reclaim)
diff --git a/mm/migrate.c b/mm/migrate.c
index 38e7cad..1e5914a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -431,7 +431,6 @@ static int writeout(struct address_space *mapping, struct page *page)
.nr_to_write = 1,
.range_start = 0,
.range_end = LLONG_MAX,
- .nonblocking = 1,
.for_reclaim = 1
};
int rc;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 225a759..3520523 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -376,7 +376,6 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
.nr_to_write = SWAP_CLUSTER_MAX,
.range_start = 0,
.range_end = LLONG_MAX,
- .nonblocking = 1,
.for_reclaim = 1,
};


2010-08-20 00:50:15

by Jeff Layton

[permalink] [raw]
Subject: Re: why are WB_SYNC_NONE COMMITs being done with FLUSH_SYNC set ?

On Fri, 20 Aug 2010 08:33:08 +0800
Wu Fengguang <[email protected]> wrote:

> > Here's a lightly tested patch that turns the check for the two flags
> > into a check for WB_SYNC_NONE. It seems to do the right thing, but I
> > don't have a clear testcase for it. Does this look reasonable?
>
> Yes, I don't see any problems.
>
> > ------------------[snip]------------------------
> >
> > NFS: don't use FLUSH_SYNC on WB_SYNC_NONE COMMIT calls
> >
> > WB_SYNC_NONE is supposed to mean "don't wait on anything". That should
> > also include not waiting for COMMIT calls to complete.
> >
> > WB_SYNC_NONE is also implied when wbc->nonblocking or
> > wbc->for_background are set, so we can replace those checks in
> > nfs_commit_unstable_pages with a check for WB_SYNC_NONE.
> >
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/nfs/write.c | 10 +++++-----
> > 1 files changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > index 874972d..35bd7d0 100644
> > --- a/fs/nfs/write.c
> > +++ b/fs/nfs/write.c
> > @@ -1436,12 +1436,12 @@ static int nfs_commit_unstable_pages(struct inode *inode, struct writeback_contr
> > /* Don't commit yet if this is a non-blocking flush and there are
> > * lots of outstanding writes for this mapping.
> > */
> > - if (wbc->sync_mode == WB_SYNC_NONE &&
> > - nfsi->ncommit <= (nfsi->npages >> 1))
> > - goto out_mark_dirty;
> > -
> > - if (wbc->nonblocking || wbc->for_background)
> > + if (wbc->sync_mode == WB_SYNC_NONE) {
> > + if (nfsi->ncommit <= (nfsi->npages >> 1))
> > + goto out_mark_dirty;
> > flags = 0;
> > + }
> > +
>
> nitpick: I'd slightly prefer an one-line change
>
> - if (wbc->nonblocking || wbc->for_background)
> + if (wbc->sync_mode == WB_SYNC_NONE)
> flags = 0;
>
> That way the patch will look more obvious and "git blame" friendly,
> and the original "Don't commit.." comment will best match its code.
>
> Reviewed-by: Wu Fengguang <[email protected]>
>
> Thanks,
> Fengguang

Either way. I figured it would be slightly more efficient to just check
WB_SYNC_NONE once in that function. I suppose we could just fix up the
comments instead. Let me know if I should respin the patch with updated
comments...

Thanks for the review...
--
Jeff Layton <[email protected]>

2010-08-19 15:11:32

by Jeff Layton

[permalink] [raw]
Subject: Re: why are WB_SYNC_NONE COMMITs being done with FLUSH_SYNC set ?

On Thu, 19 Aug 2010 10:58:25 -0400
Trond Myklebust <[email protected]> wrote:

> On Thu, 2010-08-19 at 10:37 -0400, Christoph Hellwig wrote:
> > On Thu, Aug 19, 2010 at 10:15:25AM -0400, Jeff Layton wrote:
> > > I'm looking at backporting some upstream changes to earlier kernels,
> > > and ran across something I don't quite understand...
> > >
> > > In nfs_commit_unstable_pages, we set the flags to FLUSH_SYNC. We then
> > > zero out the flags if wbc->nonblocking or wbc->for_background is set.
> > >
> > > Shouldn't we also clear it out if wbc->sync_mode == WB_SYNC_NONE ?
> > > WB_SYNC_NONE means "don't wait on anything", so shouldn't that include
> > > not waiting on the COMMIT to complete?
> >
> > I've been trying to figure out what the nonblocking flag is supposed
> > to mean for a while now.
> >
> > It basically disappeared in commit 0d99519efef15fd0cf84a849492c7b1deee1e4b7
> >
> > "writeback: remove unused nonblocking and congestion checks"
> >
> > from Wu. What's left these days is a couple of places in local copies
> > of write_cache_pages (afs, cifs), and a couple of checks in random
> > writepages instances (afs, block_write_full_page, ceph, nfs, reiserfs, xfs)
> > and the use in nfs_write_inode. It's only actually set for memory
> > migration and pageout, that is VM writeback.
> >
> > To me it really doesn't make much sense, but maybe someone has a better
> > idea what it is for.
> >
> > > + if (wbc->nonblocking || wbc->for_background ||
> > > + wbc->sync_mode == WB_SYNC_NONE)
> >
> > You could remove the nonblocking and for_background checks as
> > these impliy WB_SYNC_NONE.
>
> To me that sounds fine. I've also been trying to wrap my head around the
> differences between 'nonblocking', 'for_background', 'for_reclaim' and
> 'for_kupdate' and how the filesystem is supposed to treat them.
>
> Aside from the above, I've used 'for_reclaim', 'for_kupdate' and
> 'for_background' in order to adjust the RPC request's queuing priority
> (high in the case of 'for_reclaim' and low for the other two).
>

Ok, I don't really have a great way to test the above change though
aside from sticking it into the backport I'm working on for RHEL5
(2.6.18).

I suspect that the existing flag checks probably cover a lot of the
WB_SYNC_NONE cases already. Changing it to a check for WB_SYNC_NONE
would help me as RHEL5 doesn't have the for_background flag...

Cheers,
--
Jeff Layton <[email protected]>

2010-08-20 02:35:50

by Sage Weil

[permalink] [raw]
Subject: Re: why are WB_SYNC_NONE COMMITs being done with FLUSH_SYNC set ?

On Fri, 20 Aug 2010, Wu Fengguang wrote:

> [add CC to afs/cifs/ceph maintainers]
>
> On Fri, Aug 20, 2010 at 07:55:53AM +0800, Wu Fengguang wrote:
> > On Thu, Aug 19, 2010 at 10:37:10AM -0400, Christoph Hellwig wrote:
> > > On Thu, Aug 19, 2010 at 10:15:25AM -0400, Jeff Layton wrote:
> > > > I'm looking at backporting some upstream changes to earlier kernels,
> > > > and ran across something I don't quite understand...
> > > >
> > > > In nfs_commit_unstable_pages, we set the flags to FLUSH_SYNC. We then
> > > > zero out the flags if wbc->nonblocking or wbc->for_background is set.
> > > >
> > > > Shouldn't we also clear it out if wbc->sync_mode == WB_SYNC_NONE ?
> > > > WB_SYNC_NONE means "don't wait on anything", so shouldn't that include
> > > > not waiting on the COMMIT to complete?
> > >
> > > I've been trying to figure out what the nonblocking flag is supposed
> > > to mean for a while now.
> > >
> > > It basically disappeared in commit 0d99519efef15fd0cf84a849492c7b1deee1e4b7
> > >
> > > "writeback: remove unused nonblocking and congestion checks"
> > >
> > > from Wu. What's left these days is a couple of places in local copies
> > > of write_cache_pages (afs, cifs), and a couple of checks in random
> > > writepages instances (afs, block_write_full_page, ceph, nfs, reiserfs, xfs)
> > > and the use in nfs_write_inode.
> >
> > In principle all nonblocking checks in ->writepages should be removed.
> >
> > (My original patch does have chunks for afs/cifs that somehow get
> > dropped in the process, and missed ceph because it's not upstream
> > when I started patch..)

I'll queue up a fix for Ceph's ->writepages in my tree.

Thanks!
sage


> >
> > > It's only actually set for memory
> > > migration and pageout, that is VM writeback.
> > >
> > > To me it really doesn't make much sense, but maybe someone has a better
> > > idea what it is for.
> >
> > Since migration and pageout still set nonblocking for ->writepage, we
> > may keep them in the near future, until VM does not start IO on itself.
> >
> > > > + if (wbc->nonblocking || wbc->for_background ||
> > > > + wbc->sync_mode == WB_SYNC_NONE)
> > >
> > > You could remove the nonblocking and for_background checks as
> > > these impliy WB_SYNC_NONE.
> >
> > Agreed.
> >
> > Thanks,
> > Fengguang
>

2010-08-20 13:20:49

by Wu Fengguang

[permalink] [raw]
Subject: Re: why are WB_SYNC_NONE COMMITs being done with FLUSH_SYNC set ?

On Thu, Aug 19, 2010 at 08:53:32PM -0400, Jeff Layton wrote:
> On Fri, 20 Aug 2010 08:33:08 +0800
> Wu Fengguang <[email protected]> wrote:
>
> > > Here's a lightly tested patch that turns the check for the two flags
> > > into a check for WB_SYNC_NONE. It seems to do the right thing, but I
> > > don't have a clear testcase for it. Does this look reasonable?
> >
> > Yes, I don't see any problems.
> >
> > > ------------------[snip]------------------------
> > >
> > > NFS: don't use FLUSH_SYNC on WB_SYNC_NONE COMMIT calls
> > >
> > > WB_SYNC_NONE is supposed to mean "don't wait on anything". That should
> > > also include not waiting for COMMIT calls to complete.
> > >
> > > WB_SYNC_NONE is also implied when wbc->nonblocking or
> > > wbc->for_background are set, so we can replace those checks in
> > > nfs_commit_unstable_pages with a check for WB_SYNC_NONE.
> > >
> > > Signed-off-by: Jeff Layton <[email protected]>
> > > ---
> > > fs/nfs/write.c | 10 +++++-----
> > > 1 files changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > > index 874972d..35bd7d0 100644
> > > --- a/fs/nfs/write.c
> > > +++ b/fs/nfs/write.c
> > > @@ -1436,12 +1436,12 @@ static int nfs_commit_unstable_pages(struct inode *inode, struct writeback_contr
> > > /* Don't commit yet if this is a non-blocking flush and there are
> > > * lots of outstanding writes for this mapping.
> > > */
> > > - if (wbc->sync_mode == WB_SYNC_NONE &&
> > > - nfsi->ncommit <= (nfsi->npages >> 1))
> > > - goto out_mark_dirty;
> > > -
> > > - if (wbc->nonblocking || wbc->for_background)
> > > + if (wbc->sync_mode == WB_SYNC_NONE) {
> > > + if (nfsi->ncommit <= (nfsi->npages >> 1))
> > > + goto out_mark_dirty;
> > > flags = 0;
> > > + }
> > > +
> >
> > nitpick: I'd slightly prefer an one-line change
> >
> > - if (wbc->nonblocking || wbc->for_background)
> > + if (wbc->sync_mode == WB_SYNC_NONE)
> > flags = 0;
> >
> > That way the patch will look more obvious and "git blame" friendly,
> > and the original "Don't commit.." comment will best match its code.
> >
> > Reviewed-by: Wu Fengguang <[email protected]>
> >
> > Thanks,
> > Fengguang
>
> Either way. I figured it would be slightly more efficient to just check
> WB_SYNC_NONE once in that function. I suppose we could just fix up the
> comments instead. Let me know if I should respin the patch with updated
> comments...

OK, please do the comments.

> Thanks for the review...

You are welcome.

Thanks,
Fengguang


2010-08-20 12:44:35

by Wu Fengguang

[permalink] [raw]
Subject: Re: why are WB_SYNC_NONE COMMITs being done with FLUSH_SYNC set ?

On Fri, Aug 20, 2010 at 07:27:57AM -0400, Jeff Layton wrote:
> On Fri, 20 Aug 2010 05:19:04 -0400
> Christoph Hellwig <[email protected]> wrote:
>
> > On Fri, Aug 20, 2010 at 07:55:53AM +0800, Wu Fengguang wrote:
> > > Since migration and pageout still set nonblocking for ->writepage, we
> > > may keep them in the near future, until VM does not start IO on itself.
> >
> > Why does pageout() and memory migration need to be even more
> > non-blocking than the already non-blockig WB_SYNC_NONE writeout?
> >
>
> Just an idle thought on this...
>
> I think a lot of the confusion here comes from the fact that we have
> sync_mode and a bunch of flags, and it's not at all clear how
> filesystems are supposed to treat the union of them. There are also
> possible unions of flags/sync_modes that never happen in practice. It's
> not always obvious though and as filesystem implementors we have to
> consider the possibility that they might occur (consider WB_SYNC_ALL +
> for_background).
>
> Perhaps a lot of this confusion could be lifted by getting rid of the
> extra flags and adding new sync_mode's. Maybe something like:
>
> WB_SYNC_ALL /* wait on everything to complete */
> WB_SYNC_NONE /* don't wait on anything */
> WB_SYNC_FOR_RECLAIM /* sync for reclaim */
> WB_SYNC_FOR_KUPDATED /* sync by kupdate */
> ...etc...
>
> That does mean that all of the filesystem specific code may need to be
> touched when new modes are added and removed. I think it would be
> clearer though about what you're supposed to do in ->writepages.

No, we are moving towards the other direction :)

I just removed the definition of wbc->nonblocking and
wbc->encountered_congestion and all of the references.

Sorry for the confusion!

Thanks,
Fengguang

2010-08-20 13:23:20

by Wu Fengguang

[permalink] [raw]
Subject: Re: why are WB_SYNC_NONE COMMITs being done with FLUSH_SYNC set ?

> > Here's a lightly tested patch that turns the check for the two flags
> > into a check for WB_SYNC_NONE. It seems to do the right thing, but I
> > don't have a clear testcase for it. Does this look reasonable?
>
> Looks fine to me. I'll queue it up for the post-2.6.36 merge window...

Trond, I just created a patch that removes the wbc->nonblocking
definition and all its references except NFS. So there will be merge
dependencies. What should we do? To push both patches to Andrew's -mm
tree?

Thanks,
Fengguang

2010-08-20 09:19:09

by Christoph Hellwig

[permalink] [raw]
Subject: Re: why are WB_SYNC_NONE COMMITs being done with FLUSH_SYNC set ?

On Fri, Aug 20, 2010 at 07:55:53AM +0800, Wu Fengguang wrote:
> Since migration and pageout still set nonblocking for ->writepage, we
> may keep them in the near future, until VM does not start IO on itself.

Why does pageout() and memory migration need to be even more
non-blocking than the already non-blockig WB_SYNC_NONE writeout?


2010-08-19 14:58:47

by Trond Myklebust

[permalink] [raw]
Subject: Re: why are WB_SYNC_NONE COMMITs being done with FLUSH_SYNC set ?

On Thu, 2010-08-19 at 10:37 -0400, Christoph Hellwig wrote:
> On Thu, Aug 19, 2010 at 10:15:25AM -0400, Jeff Layton wrote:
> > I'm looking at backporting some upstream changes to earlier kernels,
> > and ran across something I don't quite understand...
> >
> > In nfs_commit_unstable_pages, we set the flags to FLUSH_SYNC. We then
> > zero out the flags if wbc->nonblocking or wbc->for_background is set.
> >
> > Shouldn't we also clear it out if wbc->sync_mode == WB_SYNC_NONE ?
> > WB_SYNC_NONE means "don't wait on anything", so shouldn't that include
> > not waiting on the COMMIT to complete?
>
> I've been trying to figure out what the nonblocking flag is supposed
> to mean for a while now.
>
> It basically disappeared in commit 0d99519efef15fd0cf84a849492c7b1deee1e4b7
>
> "writeback: remove unused nonblocking and congestion checks"
>
> from Wu. What's left these days is a couple of places in local copies
> of write_cache_pages (afs, cifs), and a couple of checks in random
> writepages instances (afs, block_write_full_page, ceph, nfs, reiserfs, xfs)
> and the use in nfs_write_inode. It's only actually set for memory
> migration and pageout, that is VM writeback.
>
> To me it really doesn't make much sense, but maybe someone has a better
> idea what it is for.
>
> > + if (wbc->nonblocking || wbc->for_background ||
> > + wbc->sync_mode == WB_SYNC_NONE)
>
> You could remove the nonblocking and for_background checks as
> these impliy WB_SYNC_NONE.

To me that sounds fine. I've also been trying to wrap my head around the
differences between 'nonblocking', 'for_background', 'for_reclaim' and
'for_kupdate' and how the filesystem is supposed to treat them.

Aside from the above, I've used 'for_reclaim', 'for_kupdate' and
'for_background' in order to adjust the RPC request's queuing priority
(high in the case of 'for_reclaim' and low for the other two).

Cheers
Trond


2010-08-20 00:33:31

by Wu Fengguang

[permalink] [raw]
Subject: Re: why are WB_SYNC_NONE COMMITs being done with FLUSH_SYNC set ?

> Here's a lightly tested patch that turns the check for the two flags
> into a check for WB_SYNC_NONE. It seems to do the right thing, but I
> don't have a clear testcase for it. Does this look reasonable?

Yes, I don't see any problems.

> ------------------[snip]------------------------
>
> NFS: don't use FLUSH_SYNC on WB_SYNC_NONE COMMIT calls
>
> WB_SYNC_NONE is supposed to mean "don't wait on anything". That should
> also include not waiting for COMMIT calls to complete.
>
> WB_SYNC_NONE is also implied when wbc->nonblocking or
> wbc->for_background are set, so we can replace those checks in
> nfs_commit_unstable_pages with a check for WB_SYNC_NONE.
>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/nfs/write.c | 10 +++++-----
> 1 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 874972d..35bd7d0 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1436,12 +1436,12 @@ static int nfs_commit_unstable_pages(struct inode *inode, struct writeback_contr
> /* Don't commit yet if this is a non-blocking flush and there are
> * lots of outstanding writes for this mapping.
> */
> - if (wbc->sync_mode == WB_SYNC_NONE &&
> - nfsi->ncommit <= (nfsi->npages >> 1))
> - goto out_mark_dirty;
> -
> - if (wbc->nonblocking || wbc->for_background)
> + if (wbc->sync_mode == WB_SYNC_NONE) {
> + if (nfsi->ncommit <= (nfsi->npages >> 1))
> + goto out_mark_dirty;
> flags = 0;
> + }
> +

nitpick: I'd slightly prefer an one-line change

- if (wbc->nonblocking || wbc->for_background)
+ if (wbc->sync_mode == WB_SYNC_NONE)
flags = 0;

That way the patch will look more obvious and "git blame" friendly,
and the original "Don't commit.." comment will best match its code.

Reviewed-by: Wu Fengguang <[email protected]>

Thanks,
Fengguang

2010-08-20 00:03:14

by Wu Fengguang

[permalink] [raw]
Subject: Re: why are WB_SYNC_NONE COMMITs being done with FLUSH_SYNC set ?

[add CC to afs/cifs/ceph maintainers]

On Fri, Aug 20, 2010 at 07:55:53AM +0800, Wu Fengguang wrote:
> On Thu, Aug 19, 2010 at 10:37:10AM -0400, Christoph Hellwig wrote:
> > On Thu, Aug 19, 2010 at 10:15:25AM -0400, Jeff Layton wrote:
> > > I'm looking at backporting some upstream changes to earlier kernels,
> > > and ran across something I don't quite understand...
> > >
> > > In nfs_commit_unstable_pages, we set the flags to FLUSH_SYNC. We then
> > > zero out the flags if wbc->nonblocking or wbc->for_background is set.
> > >
> > > Shouldn't we also clear it out if wbc->sync_mode == WB_SYNC_NONE ?
> > > WB_SYNC_NONE means "don't wait on anything", so shouldn't that include
> > > not waiting on the COMMIT to complete?
> >
> > I've been trying to figure out what the nonblocking flag is supposed
> > to mean for a while now.
> >
> > It basically disappeared in commit 0d99519efef15fd0cf84a849492c7b1deee1e4b7
> >
> > "writeback: remove unused nonblocking and congestion checks"
> >
> > from Wu. What's left these days is a couple of places in local copies
> > of write_cache_pages (afs, cifs), and a couple of checks in random
> > writepages instances (afs, block_write_full_page, ceph, nfs, reiserfs, xfs)
> > and the use in nfs_write_inode.
>
> In principle all nonblocking checks in ->writepages should be removed.
>
> (My original patch does have chunks for afs/cifs that somehow get
> dropped in the process, and missed ceph because it's not upstream
> when I started patch..)
>
> > It's only actually set for memory
> > migration and pageout, that is VM writeback.
> >
> > To me it really doesn't make much sense, but maybe someone has a better
> > idea what it is for.
>
> Since migration and pageout still set nonblocking for ->writepage, we
> may keep them in the near future, until VM does not start IO on itself.
>
> > > + if (wbc->nonblocking || wbc->for_background ||
> > > + wbc->sync_mode == WB_SYNC_NONE)
> >
> > You could remove the nonblocking and for_background checks as
> > these impliy WB_SYNC_NONE.
>
> Agreed.
>
> Thanks,
> Fengguang


Attachments:
(No filename) (2.04 kB)
writeback-remove-congested-checks.patch (2.64 kB)
Download all attachments

2010-08-30 23:53:17

by Wu Fengguang

[permalink] [raw]
Subject: Re: why are WB_SYNC_NONE COMMITs being done with FLUSH_SYNC set ?

On Mon, Aug 30, 2010 at 03:22:54PM -0400, Trond Myklebust wrote:
> On Fri, 2010-08-20 at 21:23 +0800, Wu Fengguang wrote:
> > > > Here's a lightly tested patch that turns the check for the two flags
> > > > into a check for WB_SYNC_NONE. It seems to do the right thing, but I
> > > > don't have a clear testcase for it. Does this look reasonable?
> > >
> > > Looks fine to me. I'll queue it up for the post-2.6.36 merge window...
> >
> > Trond, I just created a patch that removes the wbc->nonblocking
> > definition and all its references except NFS. So there will be merge
> > dependencies. What should we do? To push both patches to Andrew's -mm
> > tree?
> >
> > Thanks,
> > Fengguang
>
> Do you want to include it as part of your series? Just remember to add
> an
>
> Acked-by: Trond Myklebust <[email protected]>

Thanks. Please keep the NFS patches in your tree. I've send a patch
to Andrew Morton which removes the other references but keeps the
definitions. So that there won't be compile errors when the patches
are pushed at different time.

Thanks,
Fengguang

2010-08-19 19:43:29

by Trond Myklebust

[permalink] [raw]
Subject: Re: why are WB_SYNC_NONE COMMITs being done with FLUSH_SYNC set ?

On Thu, 2010-08-19 at 15:16 -0400, Jeff Layton wrote:
> On Thu, 19 Aug 2010 10:58:25 -0400
> Trond Myklebust <[email protected]> wrote:
>
> > On Thu, 2010-08-19 at 10:37 -0400, Christoph Hellwig wrote:
> > > On Thu, Aug 19, 2010 at 10:15:25AM -0400, Jeff Layton wrote:
> > > > I'm looking at backporting some upstream changes to earlier kernels,
> > > > and ran across something I don't quite understand...
> > > >
> > > > In nfs_commit_unstable_pages, we set the flags to FLUSH_SYNC. We then
> > > > zero out the flags if wbc->nonblocking or wbc->for_background is set.
> > > >
> > > > Shouldn't we also clear it out if wbc->sync_mode == WB_SYNC_NONE ?
> > > > WB_SYNC_NONE means "don't wait on anything", so shouldn't that include
> > > > not waiting on the COMMIT to complete?
> > >
> > > I've been trying to figure out what the nonblocking flag is supposed
> > > to mean for a while now.
> > >
> > > It basically disappeared in commit 0d99519efef15fd0cf84a849492c7b1deee1e4b7
> > >
> > > "writeback: remove unused nonblocking and congestion checks"
> > >
> > > from Wu. What's left these days is a couple of places in local copies
> > > of write_cache_pages (afs, cifs), and a couple of checks in random
> > > writepages instances (afs, block_write_full_page, ceph, nfs, reiserfs, xfs)
> > > and the use in nfs_write_inode. It's only actually set for memory
> > > migration and pageout, that is VM writeback.
> > >
> > > To me it really doesn't make much sense, but maybe someone has a better
> > > idea what it is for.
> > >
> > > > + if (wbc->nonblocking || wbc->for_background ||
> > > > + wbc->sync_mode == WB_SYNC_NONE)
> > >
> > > You could remove the nonblocking and for_background checks as
> > > these impliy WB_SYNC_NONE.
> >
> > To me that sounds fine. I've also been trying to wrap my head around the
> > differences between 'nonblocking', 'for_background', 'for_reclaim' and
> > 'for_kupdate' and how the filesystem is supposed to treat them.
> >
> > Aside from the above, I've used 'for_reclaim', 'for_kupdate' and
> > 'for_background' in order to adjust the RPC request's queuing priority
> > (high in the case of 'for_reclaim' and low for the other two).
> >
>
> Here's a lightly tested patch that turns the check for the two flags
> into a check for WB_SYNC_NONE. It seems to do the right thing, but I
> don't have a clear testcase for it. Does this look reasonable?

Looks fine to me. I'll queue it up for the post-2.6.36 merge window...

Cheers
Trond