2021-04-26 23:07:38

by David Howells

Subject: [GIT PULL] Network fs helper library & fscache kiocb API

Hi Linus,

Here's a set of patches for 5.13 to begin the process of overhauling the
local caching API for network filesystems. This set consists of two parts:

(1) Add a helper library to handle the new VM readahead interface. This
is intended to be used unconditionally by the filesystem (whether or
not caching is enabled) and provides a common framework for doing
caching, transparent huge pages and, in the future, possibly fscrypt
and read bandwidth maximisation. It also allows the netfs and the
cache to align, expand and slice up a read request from the VM in
various ways; the netfs need only provide a function to read a stretch
of data to the pagecache and the helper takes care of the rest.

(2) Add an alternative fscache/cachefiles I/O API that uses the kiocb
facility to do async DIO to transfer data to/from the netfs's pages,
rather than using readpage with wait queue snooping on one side and
vfs_write() on the other. It also uses less memory, since it doesn't
do buffered I/O on the backing file.

Note that this uses SEEK_HOLE/SEEK_DATA to locate the data available
to be read from the cache. Whilst this is an improvement over the
bmap interface, it still has a problem with regard to a modern
extent-based filesystem inserting or removing bridging blocks of
zeros. Fixing that requires a much greater overhaul.
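
As a rough userspace illustration of the SEEK_HOLE/SEEK_DATA probing noted
above (this is not the cachefiles code, and next_data_extent() is a
hypothetical name chosen for the sketch), the next stretch of data in a
backing file can be located with two lseek() calls:

```c
#define _GNU_SOURCE           /* exposes SEEK_DATA / SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: find the next extent of data at or after 'start'.
 * Returns 1 and fills *data_start / *data_end (exclusive) if data exists,
 * 0 if there is no data at or after 'start', and -1 on any other error.
 */
static int next_data_extent(int fd, off_t start,
			    off_t *data_start, off_t *data_end)
{
	off_t d, h;

	assert(data_start && data_end);

	d = lseek(fd, start, SEEK_DATA);
	if (d == (off_t)-1)
		return errno == ENXIO ? 0 : -1;	/* ENXIO: nothing but hole to EOF */

	h = lseek(fd, d, SEEK_HOLE);		/* EOF counts as an implicit hole */
	if (h == (off_t)-1)
		return -1;

	*data_start = d;
	*data_end = h;
	return 1;
}
```

The answer only reflects what the backing filesystem chooses to report as
allocated, which is why bridging blocks of zeros inserted or removed by that
filesystem can make the result misleading.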

This is a step towards overhauling the fscache API. The change is opt-in
on the part of the network filesystem. A netfs should not try to mix the
old and the new API because of conflicting ways of handling pages and the
PG_fscache page flag and because it would be mixing DIO with buffered I/O.
Further, the helper library can't be used with the old API.

This does not change any of the fscache cookie handling APIs or the way
invalidation is done at this time.

In the near term, I intend to deprecate and remove the old I/O API
(fscache_allocate_page{,s}(), fscache_read_or_alloc_page{,s}(),
fscache_write_page() and fscache_uncache_page()) and eventually replace
most of fscache/cachefiles with something simpler and easier to follow.

This patchset contains the following parts:

(1) Some helper patches, including provision of an ITER_XARRAY iov
iterator and a function to do readahead expansion.

(2) Patches to add the netfs helper library.

(3) A patch to add the fscache/cachefiles kiocb API.

(4) A pair of patches to fix some review issues in the ITER_XARRAY and
read helpers as spotted by Al and Willy.
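
To make the "align, expand and slice up a read request" idea from the helper
library description more concrete, here is a toy userspace sketch. It is not
the netfs code: the toy_range type, the toy_expand() and toy_slice() names,
the power-of-two granule and the clamp to the file size are all assumptions
made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte range; 'len' bytes starting at 'start'. */
struct toy_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Expand a request outwards to granule boundaries (granule must be a
 * power of two) and clamp the end to the file size.  Assumes that
 * start + len + granule does not overflow a uint64_t.
 */
static struct toy_range toy_expand(struct toy_range req, uint64_t granule,
				   uint64_t i_size)
{
	uint64_t start, end;

	assert(granule != 0 && (granule & (granule - 1)) == 0);

	start = req.start & ~(granule - 1);
	end = (req.start + req.len + granule - 1) & ~(granule - 1);
	if (end > i_size)
		end = i_size;
	if (end < start)
		end = start;
	return (struct toy_range){ start, end - start };
}

/*
 * Slice a range into pieces of at most slice_max bytes, writing up to
 * out_cap pieces into 'out'.  Returns the number of pieces written.
 */
static size_t toy_slice(struct toy_range r, uint64_t slice_max,
			struct toy_range *out, size_t out_cap)
{
	uint64_t pos = r.start, end = r.start + r.len;
	size_t n = 0;

	assert(slice_max != 0);

	while (pos < end && n < out_cap) {
		uint64_t len = end - pos;

		if (len > slice_max)
			len = slice_max;
		out[n++] = (struct toy_range){ pos, len };
		pos += len;
	}
	return n;
}
```

With a 4096-byte granule, a request for 10000 bytes at offset 5000 expands to
12288 bytes at offset 4096, which an 8192-byte slice limit splits into two
pieces. In the real library each slice would presumably be satisfied from the
cache or the server independently; the sketch only shows the arithmetic.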

Jeff Layton has patches to add support in Ceph for this that he intends for
this merge window. I have a set of patches to support AFS that I will post
a separate pull request for.

With this, AFS without a cache passes all expected xfstests; with a cache,
there's an extra failure, but that's also there before these patches.
Fixing that probably requires a greater overhaul. Ceph also passes the
expected tests.

I also have patches in a separate branch to tidy up the handling of
PG_fscache/PG_private_2 and their contribution to page refcounting in the
core kernel here, but I haven't included them in this set and will route
them separately.


Changes
=======

Fixed some ITER_XARRAY issues spotted by Al Viro[14].

Fixed a kernel doc issue and a couple of potential integer overflows
in the read helpers spotted by Matthew Wilcox[15].

ver #7:
Put some missing compound_head() calls in the *_page_private_2()
functions[11].

Included a patch from Matthew Wilcox to make it possible to modify
the readahead_control descriptor in a filesystem without occasionally
triggering a BUG in the VM core[12].

Renamed iter_xarray_copy_pages() to iter_xarray_populate_pages() as
it doesn't copy the contents of the pages, but rather fills out a
list of pages[13].

ver #6:
Merged in some fixes and added an additional tracepoint[8], including
fixing the amalgamation of contiguous subrequests that are to be
written to the cache.

Added/merged some patches from Matthew Wilcox to make
readahead_expand() appropriately adjust the trigger for the next
readahead[9]. Also included is a patch to kerneldocify the
file_ra_state struct.

Altered netfs_write_begin() to use DEFINE_READAHEAD()[10].

Split the afs patches out into their own branch.

ver #5:
Fixed some review comments from Matthew Wilcox:

- Put a comment into netfs_readahead() to indicate why there's a loop
that puts, but doesn't unlock, "unconsumed" pages at the end when
it could just return said pages to the caller to dispose of[6].
(This is because of where those pages are marked consumed.)

- Use the page_file_mapping() and page_index() helper functions
rather than accessing the page struct directly[6].

- Better names for the PG_private_2 and PG_fscache wrangling
functions[7]. Came up with
{set,end,wait_for}_page_private_2() and aliased these for fscache.

Moved the taking of/dropping a page ref for the PG_private_2 flag
into the set and end functions.

ver #4:
Fixed some review comments from Christoph Hellwig, including dropping
the export of rw_verify_area()[3] and some minor stuff[4].

Moved the declaration of readahead_expand() to a better location[5].

Rebased to v5.12-rc2 and added a bunch of references into individual
commits.

Dropped Ceph support - that will go through the maintainer's tree.

Added interface documentation for the netfs helper library.

ver #3:
Rolled in the bug fixes.

Adjusted the functions that unlock and wait for PG_fscache according
to Linus's suggestion[1].

Hold a ref on a page when PG_fscache is set as per Linus's
suggestion[2].

Dropped NFS support and added Ceph support.

ver #2:
Fixed some bugs and added NFS support.

Link: https://lore.kernel.org/r/CAHk-=wh+2gbF7XEjYc=HV9w_2uVzVf7vs60BPz0gFA=+pUm3ww@mail.gmail.com/ [1]
Link: https://lore.kernel.org/r/CAHk-=wjgA-74ddehziVk=XAEMTKswPu1Yw4uaro1R3ibs27ztw@mail.gmail.com/ [2]
Link: https://lore.kernel.org/r/[email protected]/ [3]
Link: https://lore.kernel.org/r/[email protected]/ [4]
Link: https://lore.kernel.org/r/[email protected]/ [5]
Link: https://lore.kernel.org/r/[email protected]/ [6]
Link: https://lore.kernel.org/r/[email protected]/ [7]
Link: https://lore.kernel.org/r/161781041339.463527.18139104281901492882.stgit@warthog.procyon.org.uk/ [8]
Link: https://lore.kernel.org/r/[email protected]/ [9]
Link: https://lore.kernel.org/r/[email protected]/ [10]
Link: https://lore.kernel.org/r/[email protected]/ [11]
Link: https://lore.kernel.org/r/[email protected]/ [12]
Link: https://lore.kernel.org/r/[email protected] [13]
Link: https://lore.kernel.org/r/[email protected] [14]
Link: https://lore.kernel.org/r/[email protected] [15]

References
==========

These patches have been published for review before, firstly as part of a
larger set:

Link: https://lore.kernel.org/r/158861203563.340223.7585359869938129395.stgit@warthog.procyon.org.uk/

Link: https://lore.kernel.org/r/159465766378.1376105.11619976251039287525.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/159465784033.1376674.18106463693989811037.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/159465821598.1377938.2046362270225008168.stgit@warthog.procyon.org.uk/

Link: https://lore.kernel.org/r/160588455242.3465195.3214733858273019178.stgit@warthog.procyon.org.uk/

Then as a cut-down set:

Link: https://lore.kernel.org/r/161118128472.1232039.11746799833066425131.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/161161025063.2537118.2009249444682241405.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653784755.2770958.11820491619308713741.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789062190.6155.12711584466338493050.stgit@warthog.procyon.org.uk/ # v6
Link: https://lore.kernel.org/r/161918446704.3145707.14418606303992174310.stgit@warthog.procyon.org.uk # v7

Proposals/information about the design have been published here:

Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/

And requests for information:

Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/

I've posted partial patches to try and help 9p and cifs along:

Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/

David
---
The following changes since commit 4ee998b0ef8b6d7b1267cd4d953182224929abba:

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux (2021-03-24 11:26:50 -0700)

are available in the Git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/netfs-lib-20210426

for you to fetch changes up to 53b776c77aca99b663a5512a04abc27670d61058:

netfs: Miscellaneous fixes (2021-04-26 23:23:41 +0100)

----------------------------------------------------------------
Network filesystem helper library

----------------------------------------------------------------
David Howells (16):
iov_iter: Add ITER_XARRAY
mm: Add set/end/wait functions for PG_private_2
mm: Implement readahead_control pageset expansion
netfs: Make a netfs helper module
netfs: Documentation for helper library
netfs, mm: Move PG_fscache helper funcs to linux/netfs.h
netfs, mm: Add set/end/wait_on_page_fscache() aliases
netfs: Provide readahead and readpage netfs helpers
netfs: Add tracepoints
netfs: Gather stats
netfs: Add write_begin helper
netfs: Define an interface to talk to a cache
netfs: Add a tracepoint to log failures that would be otherwise unseen
fscache, cachefiles: Add alternate API to use kiocb for read/write to cache
iov_iter: Four fixes for ITER_XARRAY
netfs: Miscellaneous fixes

Matthew Wilcox (Oracle) (3):
mm/filemap: Pass the file_ra_state in the ractl
fs: Document file_ra_state
mm/readahead: Handle ractl nr_pages being modified

Documentation/filesystems/index.rst | 1 +
Documentation/filesystems/netfs_library.rst | 526 ++++++++++++
fs/Kconfig | 1 +
fs/Makefile | 1 +
fs/cachefiles/Makefile | 1 +
fs/cachefiles/interface.c | 5 +-
fs/cachefiles/internal.h | 9 +
fs/cachefiles/io.c | 420 ++++++++++
fs/ext4/verity.c | 2 +-
fs/f2fs/file.c | 2 +-
fs/f2fs/verity.c | 2 +-
fs/fscache/Kconfig | 1 +
fs/fscache/Makefile | 1 +
fs/fscache/internal.h | 4 +
fs/fscache/io.c | 116 +++
fs/fscache/page.c | 2 +-
fs/fscache/stats.c | 1 +
fs/netfs/Kconfig | 23 +
fs/netfs/Makefile | 5 +
fs/netfs/internal.h | 97 +++
fs/netfs/read_helper.c | 1185 +++++++++++++++++++++++++++
fs/netfs/stats.c | 59 ++
include/linux/fs.h | 24 +-
include/linux/fscache-cache.h | 4 +
include/linux/fscache.h | 50 +-
include/linux/netfs.h | 234 ++++++
include/linux/pagemap.h | 42 +-
include/linux/uio.h | 10 +
include/trace/events/netfs.h | 261 ++++++
lib/iov_iter.c | 318 ++++++-
mm/filemap.c | 65 +-
mm/internal.h | 7 +-
mm/readahead.c | 101 ++-
33 files changed, 3503 insertions(+), 77 deletions(-)
create mode 100644 Documentation/filesystems/netfs_library.rst
create mode 100644 fs/cachefiles/io.c
create mode 100644 fs/fscache/io.c
create mode 100644 fs/netfs/Kconfig
create mode 100644 fs/netfs/Makefile
create mode 100644 fs/netfs/internal.h
create mode 100644 fs/netfs/read_helper.c
create mode 100644 fs/netfs/stats.c
create mode 100644 include/linux/netfs.h
create mode 100644 include/trace/events/netfs.h


2021-04-27 00:14:37

by David Howells

Subject: [GIT PULL] afs: Preparation for fscache overhaul

Hi Linus,

Here's a set of patches for the AFS filesystem for 5.13 to begin the
process of overhauling the use of the fscache API by AFS and the
introduction of support for features such as Transparent Huge Pages (THPs).

(1) Add some support for THPs, including using core VM helper functions to
find details of pages.

(2) Use the ITER_XARRAY I/O iterator to mediate access to the pagecache as
this handles THPs and doesn't require allocation of large bvec arrays.

(3) Delegate address_space read/pre-write I/O methods for AFS to the netfs
helper library. A method is provided to the library that allows it to
issue a read against the server.

This includes a change in use for PG_fscache (it now indicates a DIO
write in progress from the marked page), so a number of waits need to
be deployed for it.

(4) Split the core AFS writeback function to make it easier to modify in
future patches to handle writing to the cache. [This might feasibly
make more sense moved out into my fscache-iter branch].
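
The PG_fscache convention described in (3), together with the rule that a page
ref is held while the flag is set, can be modelled in userspace roughly as
follows. This is a toy sketch only: struct toy_page and the toy_*() functions
are hypothetical stand-ins, and the busy-wait with sched_yield() replaces the
proper sleeping wait that kernel code would use.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page: a refcount plus one "busy" flag. */
struct toy_page {
	atomic_int  refcount;		/* models the page refcount */
	atomic_bool fscache_busy;	/* models PG_fscache */
};

/* Mark a write to the cache as in progress; the flag pins the page. */
static void toy_set_page_fscache(struct toy_page *p)
{
	bool was_set;

	atomic_fetch_add(&p->refcount, 1);
	was_set = atomic_exchange(&p->fscache_busy, true);
	assert(!was_set);		/* one write in flight per page */
	(void)was_set;
}

/* The write has finished: clear the flag, then drop the pinning ref. */
static void toy_end_page_fscache(struct toy_page *p)
{
	atomic_store(&p->fscache_busy, false);
	atomic_fetch_sub(&p->refcount, 1);
}

/* Callers that want to modify or release the page wait here first. */
static void toy_wait_on_page_fscache(struct toy_page *p)
{
	while (atomic_load(&p->fscache_busy))
		sched_yield();
}
```

The point of the model is the ordering: anything that wants to modify or
release the page calls the wait function first, so the data being written to
the cache cannot change underneath the in-flight write.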

I've tested these with "xfstests -g quick" against an AFS volume (xfstests
needs patching to make it work). With this, AFS without a cache passes all
expected xfstests; with a cache, there's an extra failure, but that's also
there before these patches. Fixing that probably requires a greater
overhaul (as can be found on my fscache-iter branch, but that's for a later
time).

Thanks should go to Marc Dionne and Jeff Altman of AuriStor for exercising
the patches in their test farm also.


Changes
=======

These patches are dependent on the netfs-lib branch and have been posted in
association with them. The changes relevant to these patches are:

ver #6:
Split the afs patches out into their own branch.

ver #5:
Fixed some review comments from Matthew Wilcox:

- Better names for the PG_private_2 and PG_fscache wrangling
functions[3]. Came up with
{set,end,wait_for}_page_private_2() and aliased these for fscache.

Moved the taking of/dropping a page ref for the PG_private_2 flag
into the set and end functions.

ver #4:
Rebased to v5.12-rc2 and added a bunch of references into individual
commits.

ver #3:
Adjusted the functions that unlock and wait for PG_fscache according
to Linus's suggestion[1].

Hold a ref on a page when PG_fscache is set as per Linus's
suggestion[2].

Link: https://lore.kernel.org/r/CAHk-=wh+2gbF7XEjYc=HV9w_2uVzVf7vs60BPz0gFA=+pUm3ww@mail.gmail.com/ [1]
Link: https://lore.kernel.org/r/CAHk-=wjgA-74ddehziVk=XAEMTKswPu1Yw4uaro1R3ibs27ztw@mail.gmail.com/ [2]
Link: https://lore.kernel.org/r/[email protected]/ [3]

References
==========

These patches have been published for review before, firstly as part of a
larger set:

Link: https://lore.kernel.org/r/158861203563.340223.7585359869938129395.stgit@warthog.procyon.org.uk/

Link: https://lore.kernel.org/r/159465766378.1376105.11619976251039287525.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/159465784033.1376674.18106463693989811037.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/159465821598.1377938.2046362270225008168.stgit@warthog.procyon.org.uk/

Link: https://lore.kernel.org/r/160588455242.3465195.3214733858273019178.stgit@warthog.procyon.org.uk/

Then as a cut-down set:

Link: https://lore.kernel.org/r/161118128472.1232039.11746799833066425131.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/161161025063.2537118.2009249444682241405.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653784755.2770958.11820491619308713741.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789062190.6155.12711584466338493050.stgit@warthog.procyon.org.uk/ # v6
Link: https://lore.kernel.org/r/161918446704.3145707.14418606303992174310.stgit@warthog.procyon.org.uk # v7

David
---
The following changes since commit 26aaeffcafe6cbb7c3978fa6ed7555122f8c9f8c:

fscache, cachefiles: Add alternate API to use kiocb for read/write to cache (2021-04-23 10:14:32 +0100)

are available in the Git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/afs-netfs-lib-20210426

for you to fetch changes up to 3003bbd0697b659944237f3459489cb596ba196c:

afs: Use the netfs_write_begin() helper (2021-04-23 10:17:28 +0100)

----------------------------------------------------------------
AFS: Use the new netfs lib

----------------------------------------------------------------
David Howells (14):
afs: Disable use of the fscache I/O routines
afs: Pass page into dirty region helpers to provide THP size
afs: Print the operation debug_id when logging an unexpected data version
afs: Move key to afs_read struct
afs: Don't truncate iter during data fetch
afs: Log remote unmarshalling errors
afs: Set up the iov_iter before calling afs_extract_data()
afs: Use ITER_XARRAY for writing
afs: Wait on PG_fscache before modifying/releasing a page
afs: Extract writeback extension into its own function
afs: Prepare for use of THPs
afs: Use the fs operation ops to handle FetchData completion
afs: Use new netfs lib read helper API
afs: Use the netfs_write_begin() helper

fs/afs/Kconfig | 1 +
fs/afs/dir.c | 225 +++++++++++-----
fs/afs/file.c | 483 +++++++++------------------------
fs/afs/fs_operation.c | 4 +-
fs/afs/fsclient.c | 108 +++-----
fs/afs/inode.c | 7 +-
fs/afs/internal.h | 59 ++--
fs/afs/rxrpc.c | 150 ++++-------
fs/afs/write.c | 657 +++++++++++++++++++++++----------------------
fs/afs/yfsclient.c | 82 ++----
include/net/af_rxrpc.h | 2 +-
include/trace/events/afs.h | 74 +++--
net/rxrpc/recvmsg.c | 9 +-
13 files changed, 805 insertions(+), 1056 deletions(-)

2021-04-27 20:34:33

by pr-tracker-bot

Subject: Re: [GIT PULL] Network fs helper library & fscache kiocb API

The pull request you sent on Tue, 27 Apr 2021 00:06:44 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/netfs-lib-20210426

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/820c4bae40cb56466cfed6409e00d0f5165a990c

Thank you!

--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

2021-04-27 20:37:51

by Jeff Layton

Subject: Re: [GIT PULL] Network fs helper library & fscache kiocb API

On Tue, 2021-04-27 at 20:32 +0000, [email protected] wrote:
> The pull request you sent on Tue, 27 Apr 2021 00:06:44 +0100:
>
> > git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/netfs-lib-20210426
>
> has been merged into torvalds/linux.git:
> https://git.kernel.org/torvalds/c/820c4bae40cb56466cfed6409e00d0f5165a990c
>
> Thank you!
>

Hi Ilya,

With this, we should be clear to send a PR to Linus for what's in
master. The patches that Viro was carrying are in mainline now too.

Cheers,
--
Jeff Layton <[email protected]>