LinuxLists.cc - Re: [GIT PULL] fscache: I/O API modernisation and netfs helper library

2021-02-09 19:21:51

Subject: Re: [GIT PULL] fscache: I/O API modernisation and netfs helper library

So I'm looking at this early, because I have more time now than I will
have during the merge window, and honestly, your pull requests have
been problematic in the past.

The PG_fscache bit waiting functions are completely crazy. The comment
about "this will wake up others" is actively wrong, and the waiting
function looks insane, because you're mixing the two names for
"fscache" which makes the code look totally incomprehensible. Why
would we wait for PF_fscache, when PG_private_2 was set? Yes, I know
why, but the code looks entirely nonsensical.

So just looking at the support infrastructure changes, I get a big "Hmm".

But the thing that makes me go "No, I won't pull this", is that it has
all the same hallmark signs of trouble that I've complained about
before: I see absolutely zero sign of "this has more developers
involved".

There's not a single ack from a VM person for the VM changes. There's
no sign that this isn't yet another "David Howells went off alone and
did something that absolutely nobody else cared about".

See my problem? I need to be convinced that this makes sense outside
of your world, and it's not yet another thing that will cause problems
down the line because nobody else really ever used it or cared about
it until we hit a snag.

Linus

2021-02-09 20:05:21

by Jeff Layton

[permalink] [raw]

Subject: Re: [GIT PULL] fscache: I/O API modernisation and netfs helper library

On Tue, 2021-02-09 at 11:06 -0800, Linus Torvalds wrote:
> So I'm looking at this early, because I have more time now than I will
> have during the merge window, and honestly, your pull requests have
> been problematic in the past.
>
> The PG_fscache bit waiting functions are completely crazy. The comment
> about "this will wake up others" is actively wrong, and the waiting
> function looks insane, because you're mixing the two names for
> "fscache" which makes the code look totally incomprehensible. Why
> would we wait for PF_fscache, when PG_private_2 was set? Yes, I know
> why, but the code looks entirely nonsensical.
>
> So just looking at the support infrastructure changes, I get a big "Hmm".
>
> But the thing that makes me go "No, I won't pull this", is that it has
> all the same hallmark signs of trouble that I've complained about
> before: I see absolutely zero sign of "this has more developers
> involved".
>
> There's not a single ack from a VM person for the VM changes. There's
> no sign that this isn't yet another "David Howells went off alone and
> did something that absolutely nobody else cared about".
>
> See my problem? I need to be convinced that this makes sense outside
> of your world, and it's not yet another thing that will cause problems
> down the line because nobody else really ever used it or cared about
> it until we hit a snag.
>
> ??????????????????Linus
>

I (and several other developers) have been working with David on this
for the last year or so. Would it help if I gave this on the netfs lib
work and the fscache patches?

Reviewed-and-tested-by: Jeff Layton <[email protected]>

My testing has mainly been with ceph. My main interest is that this
allows us to drop a fairly significant chunk of rather nasty code from
fs/ceph. The netfs read helper infrastructure makes a _lot_ more sense
for a networked filesystem, IMO.

The legacy fscache code has some significant bugs too, and this gives it
a path to making better use of more modern kernel features. It should
also be set up so that filesystems can be converted piecemeal.

I'd really like to see this go in.

Cheers,
Jeff

2021-02-09 20:44:37

by Matthew Wilcox

[permalink] [raw]

Subject: Re: [GIT PULL] fscache: I/O API modernisation and netfs helper library

On Tue, Feb 09, 2021 at 11:06:41AM -0800, Linus Torvalds wrote:
> So I'm looking at this early, because I have more time now than I will
> have during the merge window, and honestly, your pull requests have
> been problematic in the past.

Thanks for looking at this early.

> The PG_fscache bit waiting functions are completely crazy. The comment
> about "this will wake up others" is actively wrong, and the waiting
> function looks insane, because you're mixing the two names for
> "fscache" which makes the code look totally incomprehensible. Why
> would we wait for PF_fscache, when PG_private_2 was set? Yes, I know
> why, but the code looks entirely nonsensical.

Yeah, I have trouble with the private2 vs fscache bit too. I've been
trying to persuade David that he doesn't actually need an fscache
bit at all; he can just increment the page's refcount to prevent it
from being freed while he writes data to the cache.

> But the thing that makes me go "No, I won't pull this", is that it has
> all the same hallmark signs of trouble that I've complained about
> before: I see absolutely zero sign of "this has more developers
> involved".

I've been involved! I really want to get rid of the address_space
readpages op. The only remaining users are the filesystems which
use fscache and it's hard to convert them with the old infrastructure.
I'm not 100% convinced that the new infrastructure is good, but I am
convinced that it's better than the old infrastructure.

> There's not a single ack from a VM person for the VM changes. There's
> no sign that this isn't yet another "David Howells went off alone and
> did something that absolutely nobody else cared about".

I'm pretty bad about sending R-b for patches when I've only given them
a quick once-over. I tend to only do it for patches that I've given an
appropriately deep amount of thought to (eg almost none for spelling
fixes and days for page cache related patches). I'll see what I feel
comfortable with for this patchset.

> See my problem? I need to be convinced that this makes sense outside
> of your world, and it's not yet another thing that will cause problems
> down the line because nobody else really ever used it or cared about
> it until we hit a snag.

My major concern is that we haven't had any feedback from Trond or Anna.
I don't know if they're just too busy or if there's something else going
on, but it'd be nice to hear something.

2021-02-09 21:55:27

by David Howells

[permalink] [raw]

Subject: Re: [GIT PULL] fscache: I/O API modernisation and netfs helper library

Linus Torvalds <[email protected]> wrote:

> The PG_fscache bit waiting functions are completely crazy. The comment
> about "this will wake up others" is actively wrong,

You mean this?

/**
* unlock_page_fscache - Unlock a page pinned with PG_fscache
* @page: The page
*
* Unlocks the page and wakes up sleepers in wait_on_page_fscache(). Also
* wakes those waiting for the lock and writeback bits because the wakeup
* mechanism is shared. But that's OK - those sleepers will just go back to
* sleep.
*/

Actually, you're right. The wakeup check func is evaluated by the
waker-upper. I can fix the comment with a patch.

> and the waiting function looks insane, because you're mixing the two names
> for "fscache" which makes the code look totally incomprehensible. Why would
> we wait for PF_fscache, when PG_private_2 was set? Yes, I know why, but the
> code looks entirely nonsensical.

IIRC someone insisted that I should make it a generic name and put the
accessor functions in the fscache headers (which means they aren't available
to core code), but I don't remember who (maybe Andrew? it was before mid-2007)
- kind of like PG_checked is an alias for PG_owner_priv_1.

I'd be quite happy to move the accessors for PG_fscache to the
linux/page-flags.h as that would simplify things.

David

2021-02-09 23:00:08

by Linus Torvalds

[permalink] [raw]

Subject: Re: [GIT PULL] fscache: I/O API modernisation and netfs helper library

On Tue, Feb 9, 2021 at 12:21 PM Matthew Wilcox <[email protected]> wrote:
>
> Yeah, I have trouble with the private2 vs fscache bit too. I've been
> trying to persuade David that he doesn't actually need an fscache
> bit at all; he can just increment the page's refcount to prevent it
> from being freed while he writes data to the cache.

Does the code not hold a refcount already?

Honestly, the fact that writeback doesn't take a refcount, and then
has magic "if writeback is set, don't free" code in other parts of the
VM layer has been a problem already, when the wakeup ended up
"leaking" from a previous page to a new allocation.

I very much hope the fscache bit does not make similar mistakes,
because the rest of the VM will _not_ have special "if fscache is set,
then we won't do X" the way we do for writeback.

So I think the fscache code needs to hold a refcount regardless, and
that the fscache bit is set the page has to have a reference.

So what are the current lifetime rules for the fscache bit?

Linus

2021-02-09 23:00:16

by David Wysochanski

[permalink] [raw]

Subject: Re: [GIT PULL] fscache: I/O API modernisation and netfs helper library

On Tue, Feb 9, 2021 at 2:07 PM Linus Torvalds
<[email protected]> wrote:
>
> So I'm looking at this early, because I have more time now than I will
> have during the merge window, and honestly, your pull requests have
> been problematic in the past.
>
> The PG_fscache bit waiting functions are completely crazy. The comment
> about "this will wake up others" is actively wrong, and the waiting
> function looks insane, because you're mixing the two names for
> "fscache" which makes the code look totally incomprehensible. Why
> would we wait for PF_fscache, when PG_private_2 was set? Yes, I know
> why, but the code looks entirely nonsensical.
>
> So just looking at the support infrastructure changes, I get a big "Hmm".
>
> But the thing that makes me go "No, I won't pull this", is that it has
> all the same hallmark signs of trouble that I've complained about
> before: I see absolutely zero sign of "this has more developers
> involved".
>
> There's not a single ack from a VM person for the VM changes. There's
> no sign that this isn't yet another "David Howells went off alone and
> did something that absolutely nobody else cared about".
>

I care about it.

I cannot speak to your concerns about the infrastructure changes, nor
can I comment about a given maintainers involvement or lack thereof.
However, I can tell you what my involvement has been. I got involved
with it because some of our customers use fscache with NFS and I've
supported it. I saw dhowells rewriting it to greatly simplify the
code and make it easier to debug and wanted to support the effort.

I have been working on the NFS conversion as dhowells has been
evolving the fscache-iter API. David first posted the series I think
in Dec 2019 and I started with NFS about mid-year 2020, and had my
first post of NFS patches in July:
https://marc.info/?l=linux-nfs&m=159482591232752&w=2

One thing that came out of the earlier iterations as a result of my
testing was the need for the network filesystem to be able to 'cap'
the IO size based on its parameters, hence the "clamp_length()"
function. So the next iteration dhowells further evolved it into a
'netfs' API and a 'fscache' API, and my November post was based on
that:
https://marc.info/?l=linux-nfs&m=160596540022461&w=2

Each iteration has greatly simplified the interface to the network
filesystem until today where the API is pretty simple. I have done
extensive tests with each iteration with all the main NFS versions,
specific unit tests, xfstests, etc. However my test matrix did not
hit enough fscache + pNFS servers, and I found a problem too late to
include in his pull request. This is mostly why my patches were not
included to convert NFS to the new fscache API, but I intend to work
out the remaining issues for the next merge window, and I'll have an
opportunity to do more testing last week of Feb with the NFS "remote
bakeathon". My most recent post was at the end of Jan, and Anna is
taking the first 5 refactoring patches in the next merge window:
https://marc.info/?l=linux-nfs&m=161184595127618&w=2

I do not have the skills of a Trond or Anna NFS developers, but I have
worked in this in earnest and intend to see it through to completion
and support NFS and fscache work. I have received some feedback on
the NFS patches though it's not been a lot, I do know I have some
things to address still. With open source, no feedback is hard to
draw conclusions other than it's not "super popular" area, but we
always knew that about fscache - it's an "add on" that some customers
require but not everyone. I know Trond speaks up when I make a mistake
and/or something will cause a problem, so I consider the silence
mostly a positive sign.

> See my problem? I need to be convinced that this makes sense outside
> of your world, and it's not yet another thing that will cause problems
> down the line because nobody else really ever used it or cared about
> it until we hit a snag.
>
> Linus
>

2021-02-09 23:54:49

by David Howells

[permalink] [raw]

Subject: Re: [GIT PULL] fscache: I/O API modernisation and netfs helper library

Matthew Wilcox <[email protected]> wrote:

> Yeah, I have trouble with the private2 vs fscache bit too. I've been
> trying to persuade David that he doesn't actually need an fscache
> bit at all; he can just increment the page's refcount to prevent it
> from being freed while he writes data to the cache.

That's not what the bit is primarily being used for. It's being used to
prevent the starting of a second write to the cache whilst the first is in
progress and also to prevent modification whilst DMA to the cache is in
progress. This isn't so obvious in this cut-down patchset, but comes more in
to play with full caching of local writes in my fscache-iter branch.

I can't easily share PG_writeback for this because each bit covers a write to
a different place. PG_writeback covers the write to the server and PG_fscache
the write to the cache. These writes may get split up differently and will
most likely finish at different times.

If I have to share PG_writeback, that will mean storing both states for each
page somewhere else and then "OR'ing" them together to drive PG_writeback.

David

2021-02-10 16:35:03

by David Howells

[permalink] [raw]

Subject: Re: [GIT PULL] fscache: I/O API modernisation and netfs helper library

Linus Torvalds <[email protected]> wrote:

> The PG_fscache bit waiting functions are completely crazy. The comment
> about "this will wake up others" is actively wrong, and the waiting
> function looks insane, because you're mixing the two names for
> "fscache" which makes the code look totally incomprehensible. Why
> would we wait for PF_fscache, when PG_private_2 was set? Yes, I know
> why, but the code looks entirely nonsensical.

How about the attached change to make it more coherent and fix the doc
comment?

David
---
commit 9a28f7e68602193ce020a41f855f71cc55f693b9
Author: David Howells <[email protected]>
Date: Wed Feb 10 10:53:02 2021 +0000

netfs: Rename unlock_page_fscache() and wait_on_page_fscache()

Rename unlock_page_fscache() to unlock_page_private_2() and
wait_on_page_fscache() to wait_on_page_private_2() and change the
references to PG_fscache to PG_private_2 also. This makes these functions
look more generic and doesn't mix the terminology.

Fix the kdoc comment as the wake up mechanism doesn't wake up all the
sleepers. Note the example usage case for the functions in conjunction
with the cache also.

Alias the functions in linux/netfs.h.

Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: David Howells <[email protected]>

diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 2ffdef1ded91..d4cb6e6f704c 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -24,6 +24,8 @@
#define ClearPageFsCache(page) ClearPagePrivate2((page))
#define TestSetPageFsCache(page) TestSetPagePrivate2((page))
#define TestClearPageFsCache(page) TestClearPagePrivate2((page))
+#define wait_on_page_fscache(page) wait_on_page_private_2((page))
+#define unlock_page_fscache(page) unlock_page_private_2((page))

enum netfs_read_source {
NETFS_FILL_WITH_ZEROES,
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 4935ad6171c1..a88ccc9ab0b1 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -591,7 +591,7 @@ extern int __lock_page_async(struct page *page, struct wait_page_queue *wait);
extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
unsigned int flags);
extern void unlock_page(struct page *page);
-extern void unlock_page_fscache(struct page *page);
+extern void unlock_page_private_2(struct page *page);

/*
* Return true if the page was successfully locked
@@ -683,16 +683,17 @@ static inline int wait_on_page_locked_killable(struct page *page)
}

/**
- * wait_on_page_fscache - Wait for PG_fscache to be cleared on a page
+ * wait_on_page_private_2 - Wait for PG_private_2 to be cleared on a page
* @page: The page
*
- * Wait for the fscache mark to be removed from a page, usually signifying the
- * completion of a write from that page to the cache.
+ * Wait for the PG_private_2 page bit to be removed from a page. This is, for
+ * example, used to handle a netfs page being written to a local disk cache,
+ * thereby allowing writes to the cache for the same page to be serialised.
*/
-static inline void wait_on_page_fscache(struct page *page)
+static inline void wait_on_page_private_2(struct page *page)
{
if (PagePrivate2(page))
- wait_on_page_bit(compound_head(page), PG_fscache);
+ wait_on_page_bit(compound_head(page), PG_private_2);
}

extern void put_and_wait_on_page_locked(struct page *page);
diff --git a/mm/filemap.c b/mm/filemap.c
index 91fcae006d64..7d321152d579 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1467,22 +1467,24 @@ void unlock_page(struct page *page)
EXPORT_SYMBOL(unlock_page);

/**
- * unlock_page_fscache - Unlock a page pinned with PG_fscache
+ * unlock_page_private_2 - Unlock a page that's locked with PG_private_2
* @page: The page
*
- * Unlocks the page and wakes up sleepers in wait_on_page_fscache(). Also
- * wakes those waiting for the lock and writeback bits because the wakeup
- * mechanism is shared. But that's OK - those sleepers will just go back to
- * sleep.
+ * Unlocks a page that's locked with PG_private_2 and wakes up sleepers in
+ * wait_on_page_private_2().
+ *
+ * This is, for example, used when a netfs page is being written to a local
+ * disk cache, thereby allowing writes to the cache for the same page to be
+ * serialised.
*/
-void unlock_page_fscache(struct page *page)
+void unlock_page_private_2(struct page *page)
{
page = compound_head(page);
VM_BUG_ON_PAGE(!PagePrivate2(page), page);
- clear_bit_unlock(PG_fscache, &page->flags);
- wake_up_page_bit(page, PG_fscache);
+ clear_bit_unlock(PG_private_2, &page->flags);
+ wake_up_page_bit(page, PG_private_2);
}
-EXPORT_SYMBOL(unlock_page_fscache);
+EXPORT_SYMBOL(unlock_page_private_2);

/**
* end_page_writeback - end writeback against a page

2021-02-10 16:36:46

by David Howells

[permalink] [raw]

Subject: Re: [GIT PULL] fscache: I/O API modernisation and netfs helper library

> Linus Torvalds <[email protected]> wrote:
>
> > The PG_fscache bit waiting functions are completely crazy. The comment
> > about "this will wake up others" is actively wrong, and the waiting
> > function looks insane, because you're mixing the two names for
> > "fscache" which makes the code look totally incomprehensible. Why
> > would we wait for PF_fscache, when PG_private_2 was set? Yes, I know
> > why, but the code looks entirely nonsensical.
>
> How about the attached change to make it more coherent and fix the doc
> comment?

Then I could follow it up with this patch here, moving towards dropping the
PG_fscache alias for the new API.

David
---
commit b415fafb07166732933242e938626fc6cdbdbc5b
Author: David Howells <[email protected]>
Date: Wed Feb 10 11:22:59 2021 +0000

netfs: Move towards dropping the PG_fscache alias for PG_private_2

Move towards dropping the PG_fscache alias for PG_private_2 with the new
fscache I/O API and netfs helper lib (this does not alter the old API).

Signed-off-by: David Howells <[email protected]>

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 8f28d4f4cfd7..bb8c6d501292 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -438,7 +438,7 @@ static void afs_invalidatepage(struct page *page, unsigned int offset,
if (PagePrivate(page))
afs_invalidate_dirty(page, offset, length);

- wait_on_page_fscache(page);
+ wait_on_page_private_2(page);
_leave("");
}

@@ -457,10 +457,10 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags)
/* deny if page is being written to the cache and the caller hasn't
* elected to wait */
#ifdef CONFIG_AFS_FSCACHE
- if (PageFsCache(page)) {
+ if (PagePrivate2(page)) {
if (!(gfp_flags & __GFP_DIRECT_RECLAIM) || !(gfp_flags & __GFP_FS))
return false;
- wait_on_page_fscache(page);
+ wait_on_page_private_2(page);
}
#endif

diff --git a/fs/afs/write.c b/fs/afs/write.c
index e672833c99bc..9c554981f53b 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -118,7 +118,7 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
}

#ifdef CONFIG_AFS_FSCACHE
- wait_on_page_fscache(page);
+ wait_on_page_private_2(page);
#endif

index = page->index;
@@ -929,8 +929,8 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
* be modified. We then assume the entire page will need writing back.
*/
#ifdef CONFIG_AFS_FSCACHE
- if (PageFsCache(page) &&
- wait_on_page_bit_killable(page, PG_fscache) < 0)
+ if (PagePrivate2(page) &&
+ wait_on_page_bit_killable(page, PG_private_2) < 0)
return VM_FAULT_RETRY;
#endif

@@ -1026,6 +1026,6 @@ int afs_launder_page(struct page *page)

trace_afs_page_dirty(vnode, tracepoint_string("laundered"), page);
detach_page_private(page);
- wait_on_page_fscache(page);
+ wait_on_page_private_2(page);
return ret;
}
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 0dd64d31eff6..fd2498567b49 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -147,7 +147,7 @@ static void ceph_invalidatepage(struct page *page, unsigned int offset,
struct ceph_inode_info *ci;
struct ceph_snap_context *snapc = page_snap_context(page);

- wait_on_page_fscache(page);
+ wait_on_page_private_2(page);

inode = page->mapping->host;
ci = ceph_inode(inode);
@@ -176,10 +176,10 @@ static int ceph_releasepage(struct page *page, gfp_t gfp_flags)
dout("%p releasepage %p idx %lu (%sdirty)\n", page->mapping->host,
page, page->index, PageDirty(page) ? "" : "not ");

- if (PageFsCache(page)) {
+ if (PagePrivate2(page)) {
if (!(gfp_flags & __GFP_DIRECT_RECLAIM) || !(gfp_flags & __GFP_FS))
return 0;
- wait_on_page_fscache(page);
+ wait_on_page_private_2(page);
}
return !PagePrivate(page);
}
@@ -1258,7 +1258,7 @@ static int ceph_write_begin(struct file *file, struct address_space *mapping,
&ceph_netfs_write_begin_ops, NULL);
out:
if (r == 0)
- wait_on_page_fscache(page);
+ wait_on_page_private_2(page);
if (r < 0) {
if (page)
put_page(page);
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 156941e0de61..9018224693e9 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -223,7 +223,7 @@ static void netfs_rreq_completed(struct netfs_read_request *rreq)

/*
* Deal with the completion of writing the data to the cache. We have to clear
- * the PG_fscache bits on the pages involved and release the caller's ref.
+ * the PG_private_2 bits on the pages involved and release the caller's ref.
*
* May be called in softirq mode and we inherit a ref from the caller.
*/
@@ -246,7 +246,7 @@ static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq)
if (have_unlocked && page->index <= unlocked)
continue;
unlocked = page->index;
- unlock_page_fscache(page);
+ unlock_page_private_2(page);
have_unlocked = true;
}
}
@@ -357,7 +357,7 @@ static void netfs_rreq_write_to_cache(struct netfs_read_request *rreq)
}

/*
- * Unlock the pages in a read operation. We need to set PG_fscache on any
+ * Unlock the pages in a read operation. We need to set PG_private_2 on any
* pages we're going to write back before we unlock them.
*/
static void netfs_rreq_unlock(struct netfs_read_request *rreq)
@@ -404,7 +404,7 @@ static void netfs_rreq_unlock(struct netfs_read_request *rreq)
break;
}
if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags))
- SetPageFsCache(page);
+ SetPagePrivate2(page);
pg_failed |= subreq_failed;
if (pgend < iopos + subreq->len)
break;
@@ -1142,7 +1142,7 @@ int netfs_write_begin(struct file *file, struct address_space *mapping,
goto error;

have_page:
- wait_on_page_fscache(page);
+ wait_on_page_private_2(page);
have_page_no_wait:
if (netfs_priv)
ops->cleanup(netfs_priv, mapping);
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 3f177faa0ac2..ccf533288291 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -29,6 +29,17 @@
#define fscache_cookie_valid(cookie) (0)
#endif

+#ifndef FSCACHE_USE_NEW_IO_API
+/*
+ * Overload PG_private_2 to give us PG_fscache - this is used to indicate that
+ * a page is currently backed by a local disk cache
+ */
+#define PageFsCache(page) PagePrivate2((page))
+#define SetPageFsCache(page) SetPagePrivate2((page))
+#define ClearPageFsCache(page) ClearPagePrivate2((page))
+#define TestSetPageFsCache(page) TestSetPagePrivate2((page))
+#define TestClearPageFsCache(page) TestClearPagePrivate2((page))
+#endif

/* pattern used to fill dead space in an index entry */
#define FSCACHE_INDEX_DEADFILL_PATTERN 0x79
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index d4cb6e6f704c..be03c3b6f0dc 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -15,18 +15,6 @@
#include <linux/workqueue.h>
#include <linux/fs.h>

-/*
- * Overload PG_private_2 to give us PG_fscache - this is used to indicate that
- * a page is currently backed by a local disk cache
- */
-#define PageFsCache(page) PagePrivate2((page))
-#define SetPageFsCache(page) SetPagePrivate2((page))
-#define ClearPageFsCache(page) ClearPagePrivate2((page))
-#define TestSetPageFsCache(page) TestSetPagePrivate2((page))
-#define TestClearPageFsCache(page) TestClearPagePrivate2((page))
-#define wait_on_page_fscache(page) wait_on_page_private_2((page))
-#define unlock_page_fscache(page) unlock_page_private_2((page))
-
enum netfs_read_source {
NETFS_FILL_WITH_ZEROES,
NETFS_DOWNLOAD_FROM_SERVER,
diff --git a/mm/filemap.c b/mm/filemap.c
index 7d321152d579..e7b2a1e2c40b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3604,7 +3604,7 @@ EXPORT_SYMBOL(generic_file_write_iter);
* The address_space is to try to release any data against the page
* (presumably at page->private).
*
- * This may also be called if PG_fscache is set on a page, indicating that the
+ * This may also be called if PG_private_2 is set on a page, indicating that the
* page is known to the local caching routines.
*
* The @gfp_mask argument specifies whether I/O may be performed to release
diff --git a/mm/readahead.c b/mm/readahead.c
index 4446dada0bc2..01209a46e834 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -40,7 +40,7 @@ EXPORT_SYMBOL_GPL(file_ra_state_init);

/*
* see if a page needs releasing upon read_cache_pages() failure
- * - the caller of read_cache_pages() may have set PG_private or PG_fscache
+ * - the caller of read_cache_pages() may have set PG_private or PG_private_2
* before calling, such as the NFS fs marking pages that are cached locally
* on disk, thus we need to give the fs a chance to clean up in the event of
* an error

2021-02-10 16:40:43

by David Howells

[permalink] [raw]

Subject: Re: [GIT PULL] fscache: I/O API modernisation and netfs helper library

Linus Torvalds <[email protected]> wrote:

> Does the code not hold a refcount already?

The attached patch will do that. Note that it's currently based on top of the
patch that drops the PG_fscache alias, so it refers to PG_private_2.

I've run all three patches through xfstests over afs, both with and without a
cache, and Jeff has tested ceph with them.

David
---
commit 803a09110b41b9f6091a517fc8f5c4b15475048c
Author: David Howells <[email protected]>
Date: Wed Feb 10 11:35:15 2021 +0000

netfs: Hold a ref on a page when PG_private_2 is set

Take a reference on a page when PG_private_2 is set and drop it once the
bit is unlocked.

Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: David Howells <[email protected]>

diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 9018224693e9..043d96ca2aad 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -10,6 +10,7 @@
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/pagemap.h>
+#include <linux/pagevec.h>
#include <linux/slab.h>
#include <linux/uio.h>
#include <linux/sched/mm.h>
@@ -230,10 +231,13 @@ static void netfs_rreq_completed(struct netfs_read_request *rreq)
static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq)
{
struct netfs_read_subrequest *subreq;
+ struct pagevec pvec;
struct page *page;
pgoff_t unlocked = 0;
bool have_unlocked = false;

+ pagevec_init(&pvec);
+
rcu_read_lock();

list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
@@ -247,6 +251,8 @@ static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq)
continue;
unlocked = page->index;
unlock_page_private_2(page);
+ if (pagevec_add(&pvec, page) == 0)
+ pagevec_release(&pvec);
have_unlocked = true;
}
}
@@ -403,8 +409,10 @@ static void netfs_rreq_unlock(struct netfs_read_request *rreq)
pg_failed = true;
break;
}
- if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags))
+ if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) {
+ get_page(page);
SetPagePrivate2(page);
+ }
pg_failed |= subreq_failed;
if (pgend < iopos + subreq->len)
break;