2021-02-09 16:14:28

by David Howells

[permalink] [raw]
Subject: [GIT PULL] fscache: I/O API modernisation and netfs helper library

Hi Linus,

Can you pull this during the upcoming merge window? It provides a more
modern I/O API for fscache and moves some common pieces out of network
filesystems into a common helper library. This request only includes
modifications for afs and ceph.

Dave Wysochanski has a patch series for nfs. Normal nfs works fine and
passes various tests, but it turned out pnfs has a problem that wasn't
discovered until quite late - pnfs does splitting of requests itself and
sending them to various places, but it will need to cooperate more closely
with the netfs lib over this.

I've given Dominique Martinet a patch for 9p and Steve French a partial
patch for cifs, but neither of those is going to be ready for this merge
window.

The main features of this request are:

(1) Institution of a helper library for network filesystems. The first
phase of this handles ->readpage(), ->readahead() and part of
->write_begin() on behalf of the netfs, requiring the netfs to provide
a common vector to perform a read to some part of a file.

This allows handling of the following to be (at least partially) moved
out of all the network filesystems and consolidated in one place:

- changes in VM vectors (Matthew Wilcox's work)
- transparent huge page support
- shaping of reads
- readahead expansion
- fs alignment/granularity (ceph, pnfs)
- cache alignment/granularity
- slicing of reads
- rsize
- keeping multiple read in flight } Steve French would like
- multichannel distribution } but for the future
- multiserver distribution (ceph, pnfs)
- stitching together reads from the cache and reads from the net
- copying data read from the server into the cache
- retry/reissue handling
- fallback after cache failure
- short reads
- fscrypt data crypting (Jeff Layton is considering for the future)

(2) Adding an alternate cache I/O API for use with the netfs lib that
makes use of kiocbs in the cache to do direct I/O between the cache
files and the netfs pages.

This is intended to replace the current I/O API that calls the backing
fs readpage op and than snooping the wait queues for completion to
read and using vfs_write() to write. It wasn't possible to do
in-kernel DIO when I first wrote cachefiles - but using kiocbs makes
it a lot simpler and more robust (and it uses a lot less memory).

(3) Add an ITER_XARRAY iov_iter that allows I/O iteration to be done on an
xarray of pinned pages (such as inode->i_mapping->i_pages), thereby
avoiding the need to allocate a bvec array to represent this.

This is used to present a set of netfs pages to the cache to do DIO on
and is also used by afs to present netfs pages to sendmsg. It could
also be used by unencrypted cifs to pass the pages to the TCP socket
it uses (if it's doing TCP) and my patch for 9p (which isn't included
here) can make use of it too.

(4) Make afs use the above. It passes the same xfstests (and has the same
failures) as the unpatched afs client.

(5) Make ceph use the above (I've merged a branch from Jeff Layton for this).
This also passes xfstests.

David
---
The following changes since commit 9791581c049c10929e97098374dd1716a81fefcc:

Merge tag 'for-5.11-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux (2021-01-20 14:15:33 -0800)

are available in the Git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/fscache-ioapi-20210203

for you to fetch changes up to 1df6bf2cc0fad1a5b2b32b7b0066b13175ad1ce4:

netfs: Fix kerneldoc on netfs_subreq_terminated() (2021-02-03 11:17:57 +0000)

----------------------------------------------------------------
fscache I/O API rework and netfs changes

----------------------------------------------------------------
David Howells (29):
iov_iter: Add ITER_XARRAY
vm: Add wait/unlock functions for PG_fscache
mm: Implement readahead_control pageset expansion
vfs: Export rw_verify_area() for use by cachefiles
netfs: Make a netfs helper module
netfs: Provide readahead and readpage netfs helpers
netfs: Add tracepoints
netfs: Gather stats
netfs: Add write_begin helper
netfs: Define an interface to talk to a cache
fscache, cachefiles: Add alternate API to use kiocb for read/write to cache
afs: Disable use of the fscache I/O routines
afs: Pass page into dirty region helpers to provide THP size
afs: Print the operation debug_id when logging an unexpected data version
afs: Move key to afs_read struct
afs: Don't truncate iter during data fetch
afs: Log remote unmarshalling errors
afs: Set up the iov_iter before calling afs_extract_data()
afs: Use ITER_XARRAY for writing
afs: Wait on PG_fscache before modifying/releasing a page
afs: Extract writeback extension into its own function
afs: Prepare for use of THPs
afs: Use the fs operation ops to handle FetchData completion
afs: Use new fscache read helper API
Merge branch 'fscache-netfs-lib' into fscache-next
Merge branch 'ceph-netfs-lib' of https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux into fscache-next
netfs: Fix various bits of error handling
afs: Fix error handling in afs_req_issue_op()
netfs: Fix kerneldoc on netfs_subreq_terminated()

Jeff Layton (7):
ceph: disable old fscache readpage handling
ceph: rework PageFsCache handling
ceph: fix fscache invalidation
ceph: convert readpage to fscache read helper
ceph: plug write_begin into read helper
ceph: convert ceph_readpages to ceph_readahead
ceph: fix an oops in error handling in ceph_netfs_issue_op

fs/Kconfig | 1 +
fs/Makefile | 1 +
fs/afs/Kconfig | 1 +
fs/afs/dir.c | 225 +++++---
fs/afs/file.c | 470 ++++-------------
fs/afs/fs_operation.c | 4 +-
fs/afs/fsclient.c | 108 ++--
fs/afs/inode.c | 7 +-
fs/afs/internal.h | 58 +-
fs/afs/rxrpc.c | 150 ++----
fs/afs/write.c | 610 ++++++++++++----------
fs/afs/yfsclient.c | 82 +--
fs/cachefiles/Makefile | 1 +
fs/cachefiles/interface.c | 5 +-
fs/cachefiles/internal.h | 9 +
fs/cachefiles/rdwr2.c | 412 +++++++++++++++
fs/ceph/Kconfig | 1 +
fs/ceph/addr.c | 535 ++++++++-----------
fs/ceph/cache.c | 125 -----
fs/ceph/cache.h | 101 +---
fs/ceph/caps.c | 10 +-
fs/ceph/inode.c | 1 +
fs/ceph/super.h | 1 +
fs/fscache/Kconfig | 1 +
fs/fscache/Makefile | 3 +-
fs/fscache/internal.h | 3 +
fs/fscache/page.c | 2 +-
fs/fscache/page2.c | 117 +++++
fs/fscache/stats.c | 1 +
fs/internal.h | 5 -
fs/netfs/Kconfig | 23 +
fs/netfs/Makefile | 5 +
fs/netfs/internal.h | 97 ++++
fs/netfs/read_helper.c | 1161 +++++++++++++++++++++++++++++++++++++++++
fs/netfs/stats.c | 59 +++
fs/read_write.c | 1 +
include/linux/fs.h | 1 +
include/linux/fscache-cache.h | 4 +
include/linux/fscache.h | 40 +-
include/linux/netfs.h | 167 ++++++
include/linux/pagemap.h | 16 +
include/linux/uio.h | 11 +
include/net/af_rxrpc.h | 2 +-
include/trace/events/afs.h | 74 ++-
include/trace/events/netfs.h | 201 +++++++
lib/iov_iter.c | 313 ++++++++++-
mm/filemap.c | 18 +
mm/readahead.c | 70 +++
net/rxrpc/recvmsg.c | 9 +-
49 files changed, 3749 insertions(+), 1573 deletions(-)
create mode 100644 fs/cachefiles/rdwr2.c
create mode 100644 fs/fscache/page2.c
create mode 100644 fs/netfs/Kconfig
create mode 100644 fs/netfs/Makefile
create mode 100644 fs/netfs/internal.h
create mode 100644 fs/netfs/read_helper.c
create mode 100644 fs/netfs/stats.c
create mode 100644 include/linux/netfs.h
create mode 100644 include/trace/events/netfs.h


2021-02-10 20:46:01

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] fscache: I/O API modernisation and netfs helper library

On Wed, Feb 10, 2021 at 8:33 AM David Howells <[email protected]> wrote:
>
> Then I could follow it up with this patch here, moving towards dropping the
> PG_fscache alias for the new API.

So I don't mind the alias per se, but I did mind the odd mixing of
names for the same thing.

So I think your change to make it be named "wait_on_page_private_2()"
fixed that mixing, but I also think that it's probably then a good
idea to have aliases in place for filesystems that actually include
the fscache.h header.

Put another way: I think that it would be even better to simply just
have a function like

static inline void wait_on_page_fscache(struct page *page)
{
if (PagePrivate2(page))
wait_on_page_bit(page, PG_private_2);
}

and make that be *not* in <linux/pagemap.h>, but simply be in
<linux/fscache.h> under that big comment about how PG_private_2 is
used for the fscache bit. You already have that comment, putting the
above kind of helper function right there would very much explain why
a "wait for fscache bit" function then uses the PagePrivate2 function
to test the bit. Agreed?

Alternatively, since that header file already has

#define PageFsCache(page) PagePrivate2((page))

you could also just write the above as

static inline void wait_on_page_fscache(struct page *page)
{
if (PageFsCache(page))
wait_on_page_bit(page, PG_fscache);
}

and now it is even more obvious. And there's no odd mixing of
"fscache" and "private_2", it's all consistent.

IOW, I'm not against "wait_on_page_fscache()" as a function, but I
*am* against the odd _mixing_ of things without a big explanation,
where the code itself looks very odd and questionable.

And I think the "fscache" waiting functions should not be visible to
any core VM or filesystem code - it should be limited explicitly to
those filesystems that use fscache, and include that header file.

Wouldn't that make sense?

Also, honestly, I really *REALLY* want your commit messages to talk
about who has been cc'd, who has been part of development, and point
to the PUBLIC MAILING LISTS WHERE THAT DISCUSSION WAS TAKING PLACE, so
that I can actually see that "yes, other people were involved"

No, I don't require this in general, but exactly because of the
history we have, I really really want to see that. I want to see a

Link: https://lore.kernel.org/r/....

and the Cc's - or better yet, the Reviewed-by's etc - so that when I
get a pull request, it really is very obvious to me when I look at it
that others really have been involved.

So if I continue to see just

Signed-off-by: David Howells <[email protected]>

at the end of the commit messages, I will not pull.

Yes, in this thread a couple of people have piped up and said that
they were part of the discussion and that they are interested, but if
I have to start asking around just to see that, then it's too little,
too late.

No more of this "it looks like David Howells did things in private". I
want links I can follow to see the discussion, and I really want to
see that others really have been involved.

Ok?

Linus

2021-02-12 16:45:44

by David Wysochanski

[permalink] [raw]
Subject: Re: [GIT PULL] fscache: I/O API modernisation and netfs helper library

On Thu, Feb 11, 2021 at 6:20 PM David Howells <[email protected]> wrote:
>
> Linus Torvalds <[email protected]> wrote:
>
> > Also, honestly, I really *REALLY* want your commit messages to talk
> > about who has been cc'd, who has been part of development, and point
> > to the PUBLIC MAILING LISTS WHERE THAT DISCUSSION WAS TAKING PLACE, so
> > that I can actually see that "yes, other people were involved"
>
> Most of the development discussion took place on IRC and waving snippets of
> code about in pastebin rather than email - the latency of email is just too
> high. There's not a great deal I can do about that now as I haven't kept IRC
> logs. I can do that in future if you want.
>
> > No, I don't require this in general, but exactly because of the
> > history we have, I really really want to see that. I want to see a
> >
> > Link: https://lore.kernel.org/r/....
>
> I can add links to where I've posted the stuff for review. Do you want this
> on a per-patch basis or just in the cover for now?
>
> Also, do you want things like these:
>
> https://lore.kernel.org/linux-fsdevel/[email protected]/
> https://lore.kernel.org/linux-fsdevel/[email protected]/
>
> which pertain to the overall fscache rewrite, but where the relevant changes
> didn't end up included in this particular patchset? Or this:
>
> https://listman.redhat.com/archives/linux-cachefs/2020-December/msg00000.html
>
> where someone was testing the overall patchset of which this is a subset?
>
> > and the Cc's - or better yet, the Reviewed-by's etc - so that when I
> > get a pull request, it really is very obvious to me when I look at it
> > that others really have been involved.
> >
> > So if I continue to see just
> >
> > Signed-off-by: David Howells <[email protected]>
> >
> > at the end of the commit messages, I will not pull.
> >
> > Yes, in this thread a couple of people have piped up and said that
> > they were part of the discussion and that they are interested, but if
> > I have to start asking around just to see that, then it's too little,
> > too late.
> >
> > No more of this "it looks like David Howells did things in private". I
> > want links I can follow to see the discussion, and I really want to
> > see that others really have been involved.
> >
> > Ok?
>
> Sure.
>
> I can go and edit in link pointers into the existing patches if you want and
> add Jeff's Review-and-tested-by into the appropriate ones. You would be able
> to compare against the existing tag, so it wouldn't entirely invalidate the
> testing.
>
You can add my Tested-by for your fscache-next branch series ending at
commit 235299002012 netfs: Hold a ref on a page when PG_private_2 is set
This series includes your commit c723f0232c9f8928b3b15786499637bda3121f41
discussed a little earlier in this email thread.

I ran over 24 hours of NFS tests (unit, connectathon, xfstests,
various servers and all NFS versions) on your latest series
and it looks good. Note I did not run against pNFS servers
due to known issue, and I did not do more advanced tests like
error injections. I did get one OOM on xfstest generic/551 on
one testbed, but that same' test passed on another testbed,
so it's not clear what is happening there and it could very
well be testbed or NFS related.

In addition, I reviewed various patches in the series, especially the
API portions of the netfs patches, so for those, Reviewed-by is
appropriate as well. I have also reviewed some of the internals
of the other infrastructure patches, but my review is more limited
there.





> Also, do you want links inserting into all the patches of the two keyrings
> pull requests I've sent you?
>
> David
>

2021-02-15 01:14:17

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] fscache: I/O API modernisation and netfs helper library

On Sun, Feb 14, 2021 at 4:23 PM David Howells <[email protected]> wrote:
>
> Anyway, I have posted my fscache modernisation patches multiple times for
> public review, I have tried to involve the wider community in aspects of the
> development on public mailing lists and I have been including the maintainers
> in to/cc.

So then add those links and the cc's to the commit logs, so that I can
*see* them.

I'm done with this discussion.

If I see a pull request from you, I DO NOT WANT TO HAVE TO HAVE A
WEEK-LONG EMAIL THREAD ABOUT HOW I CANNOT SEE THAT IT HAS EVER SEEN
ANY REVIEW.

So if all I see is "Signed-off-by:" from you, I will promptly throw
that pull request into the garbage, because it's just not worth my
time to try to have to get you kicking and screaming to show that
others have been involved.

Can you not understand that?

When I get that pull request, I need to see that yes, this has been
reviewed, people have been involved, and yes, it's been in linux-next.

I want to see "reviewed-by" and "tested-by", I want to see "cc", and I
want to see links to submission threads with discussion showing that
others actually were involved.

I do *not* want to see just a single signed-off-by line from you, and
then have to ask for "has anybody else actually seen this and reviewed
it".

Look, here's an entirely unrelated example from a single fairly recent
trivial one-liner memory leak fix:

Fixes: 87c715dcde63 ("scsi: scsi_debug: Add per_host_store option")
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Douglas Gilbert <[email protected]>
Signed-off-by: Maurizio Lombardi <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

that's from a quite a trivial commit. Yes, it's trivial, but it could
still be wrong, of course. And if somebody ever reports that it causes
problems despite how simple it was, look at what I have: I have three
people to contact, and I have a pointer to the actual original
submission of the patch.

Do we have that for all our commits? No. But it's also not at all
unusual any more, and in fact many commits have even more, with
testing etc.

And yes, sometimes the test results and acks come back later after
you've already pushed the changes out etc, and no, it's generally not
worth rebasing for that - maybe others have now started to rely on
whatever public branch you have. Which is why the "Link:" is useful,
so that if things come in later, the discussion can still be found.
But quite often, you shouldn't have pushed out some final branch
before you've gotten at least *some* positive response from people, so
I do kind of expect some "Acked-by" etc in the commit itself.

THAT is what you need to aim for.

And yes, I'm picking on you. Because we've had this problem before.
I've complained when you've sent me pull requests that don't even
build, that you in fact had been told by linux-next didn't build, and
you still sent them to me.

And as a result, I've asked for more involvement from other people before.

So now I'm clarifying that requirement - I absolutely need to see
that it has actually seen testing, that it has seen other people being
involved, and that it isn't just you throwing spaghetti at the wall to
see what sticks.

And I'm not going to do that for every pull request. I want to see
that data *in* the pull request itself.

Linus