2023-12-13 15:24:19

by David Howells

Subject: [PATCH v4 00/39] netfs, afs, 9p: Delegate high-level I/O to netfslib

Hi Jeff, Steve, Dominique,

I have been working on my netfslib helpers to the point that I can run
xfstests on AFS to completion (both with write-back buffering and, with a
small patch, write-through buffering in the pagecache). I have a patch for
9P, but am currently unable to test it.

The patches remove a little over 800 lines from AFS, 300 from 9P, albeit with
around 3000 lines added to netfs. Hopefully, I will be able to remove a bunch
of lines from Ceph too.

I've split the CIFS patches out to a separate branch, cifs-netfs, where a
further 2000+ lines are removed. I can run a certain amount of xfstests on
CIFS, though I'm running into ksmbd issues and not all the tests work
correctly because of issues between fallocate and what the SMB protocol
actually supports.

I've also dropped the content-crypto patches out for the moment as they're
only usable by the ceph changes which I'm still working on.

The patch to use PG_writeback instead of PG_fscache for writing to the
cache has also been deferred, pending 9p, afs, ceph and cifs all being
converted.

The main aims of these patches are to get high-level I/O and knowledge of
the pagecache out of the filesystem drivers as much as possible and to get
rid, as much as possible, of the knowledge that pages/folios exist.

Further, I would like to see ->write_begin, ->write_end and ->launder_folio
go away.

Features that these patches add to what is already in netfslib:

(1) NFS-style (and Ceph-style) locking around DIO vs buffered I/O calls to
prevent these from happening at the same time. mmap'd I/O can, of
necessity, happen at any time ignoring these locks.

(2) Support for unbuffered I/O. The data is kept in the bounce buffer and
the pagecache is not used. This can be turned on with an inode flag.

(3) Support for direct I/O. This is basically unbuffered I/O with some
extra restrictions and no RMW.

(4) Support for using a bounce buffer in an operation. The bounce buffer
may be bigger than the target data/buffer, allowing for crypto
rounding.

(5) ->write_begin() and ->write_end() are ignored in favour of merging all
of that into one function, netfs_perform_write(), thereby avoiding the
function pointer traversals.

(6) Support for write-through caching in the pagecache.
netfs_perform_write() adds the pages it modifies to an I/O operation
as it goes and directly marks them writeback rather than dirty. When
writing back from write-through, it limits the range written back.
This should allow CIFS to deal with byte-range mandatory locks
correctly.

(7) O_*SYNC and RWF_*SYNC writes use write-through rather than writing to
the pagecache and then flushing afterwards. An AIO O_*SYNC write will
notify of completion when the sub-writes all complete.

(8) Support for write-streaming where modified data is held in !uptodate
folios, with a private struct attached indicating the range that is
valid.

(9) Support for write grouping, multiplexing a pointer to a group in the
folio private data with the write-streaming data. The writepages
algorithm only writes stuff back that's in the nominated group. This
is intended for use by Ceph to write its snaps in order.

(10) Skipping reads for which we know the server could only supply zeros or
EOF (for instance if we've done a local write that leaves a hole in
the file and extends the local inode size).
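
To illustrate item (10), here is a minimal userspace sketch of the idea, not
the netfslib code. The struct, its two fields and both helpers are
hypothetical names made up for the example; the only assumption taken from
the description above is that there is an offset at or above which the
server can only supply zeros or EOF, and that a local write extending the
file does not move that offset.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's struct netfs_inode. */
struct model_inode {
	long long i_size;	/* local file size */
	long long zero_point;	/* at or above this offset the server can
				 * only supply zeros or EOF (assumption) */
};

/* Model a local write at [pos, pos + len) that may extend the file. */
static void model_local_write(struct model_inode *ino, long long pos,
			      long long len)
{
	assert(pos >= 0 && len > 0);
	if (pos + len > ino->i_size)
		ino->i_size = pos + len;	/* a hole may now precede pos */
	/* zero_point is left alone: the server still has nothing up there. */
}

/* True if a read starting at pos need not be sent to the server. */
static bool model_read_can_skip(const struct model_inode *ino, long long pos)
{
	return pos >= ino->zero_point;
}
```

With zero_point at 4096, writing 100 bytes at offset 1MiB grows i_size to
1048676; a later read of the hole at offset 8192 can be filled with zeros
locally, whereas a read at offset 0 still has to go to the server. Data that
was written locally is expected to be found in the pagecache, so the check
only matters for folios that are not already present.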

General notes:

(1) The fscache module is merged into the netfslib module to avoid cyclic
exported symbol usage that prevents either module from being loaded.

(2) Some helpers from fscache are reassigned to netfslib by name.

(3) netfslib now makes use of folio->private, which means the filesystem
can't use it.

(4) The filesystem provides wrappers to call the write helpers, allowing
it to do pre-validation, oplock/capability fetching and the passing in
of write group info.

(5) I want to try flushing the data when tearing down an inode before
invalidating it to try and render launder_folio unnecessary.

(6) Write-through caching will generate and dispatch write subrequests as
it gathers enough data to hit wsize and has whole pages that at least
span that size. This needs to be a bit more flexible, allowing for a
filesystem such as CIFS to have a variable wsize.

(7) The filesystem driver is just given read and write calls with an
iov_iter describing the data/buffer to use. Ideally, they don't see
pages or folios at all. A function, extract_iter_to_sg(), is already
available to decant part of an iterator into a scatterlist for crypto
purposes.
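
As a rough illustration of note (6), the following userspace sketch models
how write-through might gather data and dispatch a subrequest each time
wsize is reached. It is not the netfslib implementation: the wt_state struct,
wt_advance(), wt_finish() and MODEL_PAGE_SIZE are hypothetical names, and the
model assumes a fixed wsize that is a multiple of the page size and a write
that starts page-aligned, so every full subrequest covers whole pages.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for the model */

/* Hypothetical accumulator; the names are illustrative, not netfslib's. */
struct wt_state {
	size_t wsize;		/* maximum size of one write subrequest */
	size_t pending;		/* bytes gathered but not yet dispatched */
	unsigned int dispatched; /* subrequests issued so far */
};

/*
 * Account for len more bytes copied into the pagecache and issue a
 * subrequest for every full wsize gathered.  Returns the number of
 * subrequests dispatched by this call.
 */
static unsigned int wt_advance(struct wt_state *st, size_t len)
{
	unsigned int n = 0;

	assert(st->wsize > 0 && st->wsize % MODEL_PAGE_SIZE == 0);
	st->pending += len;
	while (st->pending >= st->wsize) {
		st->pending -= st->wsize;
		st->dispatched++;
		n++;
	}
	return n;
}

/* At the end of the write, flush any remainder as one short subrequest. */
static unsigned int wt_finish(struct wt_state *st)
{
	if (st->pending == 0)
		return 0;
	st->pending = 0;
	st->dispatched++;
	return 1;
}
```

With wsize set to 16384, three calls adding 10000, 10000 and 40000 bytes
dispatch 0, 1 and 2 subrequests respectively, and wt_finish() flushes the
remaining 10848 bytes as a fourth. A variable wsize, as CIFS would need,
would mean re-reading the limit before each dispatch rather than fixing it
up front.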


9P notes:

(1) I haven't managed to test this as I haven't been able to get Ganesha
to work correctly with 9P.

(2) Writes should now occur in larger-than-page-sized chunks.

(3) It should be possible to turn on multipage folio support in 9P now.


Changes
=======
ver #4)
- Slimmed down the branch:
  - Split the cifs-related patches off to a separate branch (cifs-netfs)
  - Deferred the content-encryption to the in-progress ceph changes.
  - Deferred the use-PG_writeback rather than PG_fscache patch
- Rebased on a later linux-next with afs-rotation patches.

ver #3)
- Moved the fscache module into netfslib to avoid export cycles.
- Fixed a bunch of bugs.
- Got CIFS to pass as much of xfstests as possible.
- Added a patch to make 9P use all the helpers.
- Added a patch to stop using PG_fscache, but rather dirty pages on
reading and have writepages write to the cache.

ver #2)
- Folded the addition of NETFS_RREQ_NONBLOCK/BLOCKED into first patch that
uses them.
- Folded addition of rsize member into first user.
- Don't set rsize in ceph (yet) and set it in kafs to 256KiB. cifs sets
it dynamically.
- Moved direct_bv next to direct_bv_count in struct netfs_io_request and
labelled it with a __counted_by().
- Passed flags into netfs_xa_store_and_mark() rather than two bools.
- Removed netfs_set_up_buffer() as it wasn't used.

David

Link: https://lore.kernel.org/r/[email protected]/ # v1
Link: https://lore.kernel.org/r/[email protected]/ # v2

David Howells (39):
netfs, fscache: Move fs/fscache/* into fs/netfs/
netfs, fscache: Combine fscache with netfs
netfs, fscache: Remove ->begin_cache_operation
netfs, fscache: Move /proc/fs/fscache to /proc/fs/netfs and put in a
symlink
netfs: Move pinning-for-writeback from fscache to netfs
netfs: Add a procfile to list in-progress requests
netfs: Allow the netfs to make the io (sub)request alloc larger
netfs: Add a ->free_subrequest() op
afs: Don't use folio->private to record partial modification
netfs: Provide invalidate_folio and release_folio calls
netfs: Implement unbuffered/DIO vs buffered I/O locking
netfs: Add iov_iters to (sub)requests to describe various buffers
netfs: Add support for DIO buffering
netfs: Provide tools to create a buffer in an xarray
netfs: Add bounce buffering support
netfs: Add func to calculate pagecount/size-limited span of an
iterator
netfs: Limit subrequest by size or number of segments
netfs: Export netfs_put_subrequest() and some tracepoints
netfs: Extend the netfs_io_*request structs to handle writes
netfs: Add a hook to allow tell the netfs to update its i_size
netfs: Make netfs_put_request() handle a NULL pointer
netfs: Make the refcounting of netfs_begin_read() easier to use
netfs: Prep to use folio->private for write grouping and streaming
write
netfs: Dispatch write requests to process a writeback slice
netfs: Provide func to copy data to pagecache for buffered write
netfs: Make netfs_read_folio() handle streaming-write pages
netfs: Allocate multipage folios in the writepath
netfs: Implement support for unbuffered/DIO read
netfs: Implement unbuffered/DIO write support
netfs: Implement buffered write API
netfs: Allow buffered shared-writeable mmap through
netfs_page_mkwrite()
netfs: Provide netfs_file_read_iter()
netfs, cachefiles: Pass upper bound length to allow expansion
netfs: Provide a writepages implementation
netfs: Provide a launder_folio implementation
netfs: Implement a write-through caching option
netfs: Optimise away reads above the point at which there can be no
data
afs: Use the netfs write helpers
9p: Use netfslib read/write_iter

Documentation/filesystems/netfs_library.rst | 23 +-
MAINTAINERS | 2 +-
fs/9p/vfs_addr.c | 352 +----
fs/9p/vfs_file.c | 89 +-
fs/9p/vfs_inode.c | 5 +-
fs/9p/vfs_super.c | 14 +-
fs/Kconfig | 1 -
fs/Makefile | 1 -
fs/afs/file.c | 213 +--
fs/afs/inode.c | 26 +-
fs/afs/internal.h | 72 +-
fs/afs/super.c | 2 +-
fs/afs/write.c | 826 +----------
fs/cachefiles/internal.h | 2 +-
fs/cachefiles/io.c | 10 +-
fs/cachefiles/ondemand.c | 2 +-
fs/ceph/addr.c | 25 +-
fs/ceph/cache.h | 35 +-
fs/ceph/inode.c | 2 +-
fs/fs-writeback.c | 10 +-
fs/fscache/Kconfig | 40 -
fs/fscache/Makefile | 16 -
fs/fscache/internal.h | 277 ----
fs/netfs/Kconfig | 39 +
fs/netfs/Makefile | 22 +-
fs/netfs/buffered_read.c | 229 ++-
fs/netfs/buffered_write.c | 1247 +++++++++++++++++
fs/netfs/direct_read.c | 252 ++++
fs/netfs/direct_write.c | 170 +++
fs/{fscache/cache.c => netfs/fscache_cache.c} | 0
.../cookie.c => netfs/fscache_cookie.c} | 0
fs/netfs/fscache_internal.h | 14 +
fs/{fscache/io.c => netfs/fscache_io.c} | 42 +-
fs/{fscache/main.c => netfs/fscache_main.c} | 25 +-
fs/{fscache/proc.c => netfs/fscache_proc.c} | 23 +-
fs/{fscache/stats.c => netfs/fscache_stats.c} | 4 +-
.../volume.c => netfs/fscache_volume.c} | 0
fs/netfs/internal.h | 288 ++++
fs/netfs/io.c | 214 ++-
fs/netfs/iterator.c | 97 ++
fs/netfs/locking.c | 215 +++
fs/netfs/main.c | 110 ++
fs/netfs/misc.c | 260 ++++
fs/netfs/objects.c | 63 +-
fs/netfs/output.c | 478 +++++++
fs/netfs/stats.c | 31 +-
fs/nfs/Kconfig | 4 +-
fs/nfs/fscache.c | 7 -
fs/smb/client/cifsfs.c | 9 +-
fs/smb/client/file.c | 18 +-
fs/smb/client/fscache.c | 2 +-
include/linux/fs.h | 2 +-
include/linux/fscache.h | 45 -
include/linux/netfs.h | 176 ++-
include/linux/writeback.h | 2 +-
include/trace/events/afs.h | 31 -
include/trace/events/netfs.h | 155 +-
mm/filemap.c | 1 +
58 files changed, 4197 insertions(+), 2123 deletions(-)
delete mode 100644 fs/fscache/Kconfig
delete mode 100644 fs/fscache/Makefile
delete mode 100644 fs/fscache/internal.h
create mode 100644 fs/netfs/buffered_write.c
create mode 100644 fs/netfs/direct_read.c
create mode 100644 fs/netfs/direct_write.c
rename fs/{fscache/cache.c => netfs/fscache_cache.c} (100%)
rename fs/{fscache/cookie.c => netfs/fscache_cookie.c} (100%)
create mode 100644 fs/netfs/fscache_internal.h
rename fs/{fscache/io.c => netfs/fscache_io.c} (86%)
rename fs/{fscache/main.c => netfs/fscache_main.c} (84%)
rename fs/{fscache/proc.c => netfs/fscache_proc.c} (58%)
rename fs/{fscache/stats.c => netfs/fscache_stats.c} (97%)
rename fs/{fscache/volume.c => netfs/fscache_volume.c} (100%)
create mode 100644 fs/netfs/locking.c
create mode 100644 fs/netfs/misc.c
create mode 100644 fs/netfs/output.c



2023-12-13 15:24:42

by David Howells

Subject: [PATCH v4 01/39] netfs, fscache: Move fs/fscache/* into fs/netfs/

There's a problem with dependencies between netfslib and fscache as each
wants to access some functions of the other. Deal with this by moving
fs/fscache/* into fs/netfs/ and renaming those files to begin with
"fscache_".

For the moment, the moved files are changed as little as possible and an
fscache module is still built. A subsequent patch will integrate them.

Signed-off-by: David Howells <[email protected]>
cc: Jeff Layton <[email protected]>
cc: Christian Brauner <[email protected]>
cc: [email protected]
cc: [email protected]
---
MAINTAINERS | 2 +-
fs/Kconfig | 1 -
fs/Makefile | 1 -
fs/fscache/Kconfig | 40 -------------------
fs/fscache/Makefile | 16 --------
fs/netfs/Kconfig | 39 ++++++++++++++++++
fs/netfs/Makefile | 14 ++++++-
fs/{fscache/cache.c => netfs/fscache_cache.c} | 0
.../cookie.c => netfs/fscache_cookie.c} | 0
.../internal.h => netfs/fscache_internal.h} | 0
fs/{fscache/io.c => netfs/fscache_io.c} | 0
fs/{fscache/main.c => netfs/fscache_main.c} | 0
fs/{fscache/proc.c => netfs/fscache_proc.c} | 0
fs/{fscache/stats.c => netfs/fscache_stats.c} | 0
.../volume.c => netfs/fscache_volume.c} | 0
fs/netfs/internal.h | 5 +++
fs/netfs/main.c | 5 ++-
17 files changed, 61 insertions(+), 62 deletions(-)
delete mode 100644 fs/fscache/Kconfig
delete mode 100644 fs/fscache/Makefile
rename fs/{fscache/cache.c => netfs/fscache_cache.c} (100%)
rename fs/{fscache/cookie.c => netfs/fscache_cookie.c} (100%)
rename fs/{fscache/internal.h => netfs/fscache_internal.h} (100%)
rename fs/{fscache/io.c => netfs/fscache_io.c} (100%)
rename fs/{fscache/main.c => netfs/fscache_main.c} (100%)
rename fs/{fscache/proc.c => netfs/fscache_proc.c} (100%)
rename fs/{fscache/stats.c => netfs/fscache_stats.c} (100%)
rename fs/{fscache/volume.c => netfs/fscache_volume.c} (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 902708b4530d..10eff1e83ec1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8617,7 +8617,7 @@ M: David Howells <[email protected]>
L: [email protected] (moderated for non-subscribers)
S: Supported
F: Documentation/filesystems/caching/
-F: fs/fscache/
+F: fs/netfs/fscache-*
F: include/linux/fscache*.h

FSCRYPT: FILE SYSTEM LEVEL ENCRYPTION SUPPORT
diff --git a/fs/Kconfig b/fs/Kconfig
index cf62d86b514f..26c3821bf1fb 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -140,7 +140,6 @@ source "fs/overlayfs/Kconfig"
menu "Caches"

source "fs/netfs/Kconfig"
-source "fs/fscache/Kconfig"
source "fs/cachefiles/Kconfig"

endmenu
diff --git a/fs/Makefile b/fs/Makefile
index 75522f88e763..af7632368e98 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -60,7 +60,6 @@ obj-$(CONFIG_DLM) += dlm/

# Do not add any filesystems before this line
obj-$(CONFIG_NETFS_SUPPORT) += netfs/
-obj-$(CONFIG_FSCACHE) += fscache/
obj-$(CONFIG_REISERFS_FS) += reiserfs/
obj-$(CONFIG_EXT4_FS) += ext4/
# We place ext4 before ext2 so that clean ext3 root fs's do NOT mount using the
diff --git a/fs/fscache/Kconfig b/fs/fscache/Kconfig
deleted file mode 100644
index b313a978ae0a..000000000000
--- a/fs/fscache/Kconfig
+++ /dev/null
@@ -1,40 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-
-config FSCACHE
- tristate "General filesystem local caching manager"
- select NETFS_SUPPORT
- help
- This option enables a generic filesystem caching manager that can be
- used by various network and other filesystems to cache data locally.
- Different sorts of caches can be plugged in, depending on the
- resources available.
-
- See Documentation/filesystems/caching/fscache.rst for more information.
-
-config FSCACHE_STATS
- bool "Gather statistical information on local caching"
- depends on FSCACHE && PROC_FS
- select NETFS_STATS
- help
- This option causes statistical information to be gathered on local
- caching and exported through file:
-
- /proc/fs/fscache/stats
-
- The gathering of statistics adds a certain amount of overhead to
- execution as there are a quite a few stats gathered, and on a
- multi-CPU system these may be on cachelines that keep bouncing
- between CPUs. On the other hand, the stats are very useful for
- debugging purposes. Saying 'Y' here is recommended.
-
- See Documentation/filesystems/caching/fscache.rst for more information.
-
-config FSCACHE_DEBUG
- bool "Debug FS-Cache"
- depends on FSCACHE
- help
- This permits debugging to be dynamically enabled in the local caching
- management module. If this is set, the debugging output may be
- enabled by setting bits in /sys/modules/fscache/parameter/debug.
-
- See Documentation/filesystems/caching/fscache.rst for more information.
diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile
deleted file mode 100644
index afb090ea16c4..000000000000
--- a/fs/fscache/Makefile
+++ /dev/null
@@ -1,16 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-#
-# Makefile for general filesystem caching code
-#
-
-fscache-y := \
- cache.o \
- cookie.o \
- io.o \
- main.o \
- volume.o
-
-fscache-$(CONFIG_PROC_FS) += proc.o
-fscache-$(CONFIG_FSCACHE_STATS) += stats.o
-
-obj-$(CONFIG_FSCACHE) := fscache.o
diff --git a/fs/netfs/Kconfig b/fs/netfs/Kconfig
index b4db21022cb4..b4378688357c 100644
--- a/fs/netfs/Kconfig
+++ b/fs/netfs/Kconfig
@@ -21,3 +21,42 @@ config NETFS_STATS
multi-CPU system these may be on cachelines that keep bouncing
between CPUs. On the other hand, the stats are very useful for
debugging purposes. Saying 'Y' here is recommended.
+
+config FSCACHE
+ tristate "General filesystem local caching manager"
+ select NETFS_SUPPORT
+ help
+ This option enables a generic filesystem caching manager that can be
+ used by various network and other filesystems to cache data locally.
+ Different sorts of caches can be plugged in, depending on the
+ resources available.
+
+ See Documentation/filesystems/caching/fscache.rst for more information.
+
+config FSCACHE_STATS
+ bool "Gather statistical information on local caching"
+ depends on FSCACHE && PROC_FS
+ select NETFS_STATS
+ help
+ This option causes statistical information to be gathered on local
+ caching and exported through file:
+
+ /proc/fs/fscache/stats
+
+ The gathering of statistics adds a certain amount of overhead to
+ execution as there are a quite a few stats gathered, and on a
+ multi-CPU system these may be on cachelines that keep bouncing
+ between CPUs. On the other hand, the stats are very useful for
+ debugging purposes. Saying 'Y' here is recommended.
+
+ See Documentation/filesystems/caching/fscache.rst for more information.
+
+config FSCACHE_DEBUG
+ bool "Debug FS-Cache"
+ depends on FSCACHE
+ help
+ This permits debugging to be dynamically enabled in the local caching
+ management module. If this is set, the debugging output may be
+ enabled by setting bits in /sys/modules/fscache/parameter/debug.
+
+ See Documentation/filesystems/caching/fscache.rst for more information.
diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index 386d6fb92793..bbb2b824bd5e 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -1,5 +1,17 @@
# SPDX-License-Identifier: GPL-2.0

+fscache-y := \
+ fscache_cache.o \
+ fscache_cookie.o \
+ fscache_io.o \
+ fscache_main.o \
+ fscache_volume.o
+
+fscache-$(CONFIG_PROC_FS) += fscache_proc.o
+fscache-$(CONFIG_FSCACHE_STATS) += fscache_stats.o
+
+obj-$(CONFIG_FSCACHE) := fscache.o
+
netfs-y := \
buffered_read.o \
io.o \
@@ -9,4 +21,4 @@ netfs-y := \

netfs-$(CONFIG_NETFS_STATS) += stats.o

-obj-$(CONFIG_NETFS_SUPPORT) := netfs.o
+obj-$(CONFIG_NETFS_SUPPORT) += netfs.o
diff --git a/fs/fscache/cache.c b/fs/netfs/fscache_cache.c
similarity index 100%
rename from fs/fscache/cache.c
rename to fs/netfs/fscache_cache.c
diff --git a/fs/fscache/cookie.c b/fs/netfs/fscache_cookie.c
similarity index 100%
rename from fs/fscache/cookie.c
rename to fs/netfs/fscache_cookie.c
diff --git a/fs/fscache/internal.h b/fs/netfs/fscache_internal.h
similarity index 100%
rename from fs/fscache/internal.h
rename to fs/netfs/fscache_internal.h
diff --git a/fs/fscache/io.c b/fs/netfs/fscache_io.c
similarity index 100%
rename from fs/fscache/io.c
rename to fs/netfs/fscache_io.c
diff --git a/fs/fscache/main.c b/fs/netfs/fscache_main.c
similarity index 100%
rename from fs/fscache/main.c
rename to fs/netfs/fscache_main.c
diff --git a/fs/fscache/proc.c b/fs/netfs/fscache_proc.c
similarity index 100%
rename from fs/fscache/proc.c
rename to fs/netfs/fscache_proc.c
diff --git a/fs/fscache/stats.c b/fs/netfs/fscache_stats.c
similarity index 100%
rename from fs/fscache/stats.c
rename to fs/netfs/fscache_stats.c
diff --git a/fs/fscache/volume.c b/fs/netfs/fscache_volume.c
similarity index 100%
rename from fs/fscache/volume.c
rename to fs/netfs/fscache_volume.c
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index 43fac1b14e40..e96432499eb2 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -5,9 +5,12 @@
* Written by David Howells ([email protected])
*/

+#include <linux/slab.h>
+#include <linux/seq_file.h>
#include <linux/netfs.h>
#include <linux/fscache.h>
#include <trace/events/netfs.h>
+#include "fscache_internal.h"

#ifdef pr_fmt
#undef pr_fmt
@@ -107,6 +110,7 @@ static inline bool netfs_is_cache_enabled(struct netfs_inode *ctx)
/*
* debug tracing
*/
+#if 0
#define dbgprintk(FMT, ...) \
printk("[%-6.6s] "FMT"\n", current->comm, ##__VA_ARGS__)

@@ -143,3 +147,4 @@ do { \
#define _leave(FMT, ...) no_printk("<== %s()"FMT"", __func__, ##__VA_ARGS__)
#define _debug(FMT, ...) no_printk(FMT, ##__VA_ARGS__)
#endif
+#endif
diff --git a/fs/netfs/main.c b/fs/netfs/main.c
index 068568702957..237c54a01d97 100644
--- a/fs/netfs/main.c
+++ b/fs/netfs/main.c
@@ -8,8 +8,8 @@
#include <linux/module.h>
#include <linux/export.h>
#include "internal.h"
-#define CREATE_TRACE_POINTS
-#include <trace/events/netfs.h>
+//#define CREATE_TRACE_POINTS
+//#include <trace/events/netfs.h>

MODULE_DESCRIPTION("Network fs support");
MODULE_AUTHOR("Red Hat, Inc.");
@@ -18,3 +18,4 @@ MODULE_LICENSE("GPL");
unsigned netfs_debug;
module_param_named(debug, netfs_debug, uint, S_IWUSR | S_IRUGO);
MODULE_PARM_DESC(netfs_debug, "Netfs support debugging mask");
+


2023-12-14 14:12:03

by Jeff Layton

Subject: Re: [PATCH v4 00/39] netfs, afs, 9p: Delegate high-level I/O to netfslib

On Wed, 2023-12-13 at 15:23 +0000, David Howells wrote:
> Hi Jeff, Steve, Dominique,
>
> I have been working on my netfslib helpers to the point that I can run
> xfstests on AFS to completion (both with write-back buffering and, with a
> small patch, write-through buffering in the pagecache). I have a patch for
> 9P, but am currently unable to test it.
>
> The patches remove a little over 800 lines from AFS, 300 from 9P, albeit with
> around 3000 lines added to netfs. Hopefully, I will be able to remove a bunch
> of lines from Ceph too.
>
> I've split the CIFS patches out to a separate branch, cifs-netfs, where a
> further 2000+ lines are removed. I can run a certain amount of xfstests on
> CIFS, though I'm running into ksmbd issues and not all the tests work
> correctly because of issues between fallocate and what the SMB protocol
> actually supports.
>
> I've also dropped the content-crypto patches out for the moment as they're
> only usable by the ceph changes which I'm still working on.
>
> The patch to use PG_writeback instead of PG_fscache for writing to the
> cache has also been deferred, pending 9p, afs, ceph and cifs all being
> converted.
>
> The main aims of these patches are to get high-level I/O and knowledge of
> the pagecache out of the filesystem drivers as much as possible and to get
> rid, as much as possible, of the knowledge that pages/folios exist.
>
> Further, I would like to see ->write_begin, ->write_end and ->launder_folio
> go away.
>
> Features that are added by these patches to that which is already there in
> netfslib:
>
> (1) NFS-style (and Ceph-style) locking around DIO vs buffered I/O calls to
> prevent these from happening at the same time. mmap'd I/O can, of
> necessity, happen at any time ignoring these locks.
>
> (2) Support for unbuffered I/O. The data is kept in the bounce buffer and
> the pagecache is not used. This can be turned on with an inode flag.
>
> (3) Support for direct I/O. This is basically unbuffered I/O with some
> extra restrictions and no RMW.
>
> (4) Support for using a bounce buffer in an operation. The bounce buffer
> may be bigger than the target data/buffer, allowing for crypto
> rounding.
>
> (5) ->write_begin() and ->write_end() are ignored in favour of merging all
> of that into one function, netfs_perform_write(), thereby avoiding the
> function pointer traversals.
>
> (6) Support for write-through caching in the pagecache.
> netfs_perform_write() adds the pages it modifies to an I/O operation
> as it goes and directly marks them writeback rather than dirty. When
> writing back from write-through, it limits the range written back.
> This should allow CIFS to deal with byte-range mandatory locks
> correctly.
>
> (7) O_*SYNC and RWF_*SYNC writes use write-through rather than writing to
> the pagecache and then flushing afterwards. An AIO O_*SYNC write will
> notify of completion when the sub-writes all complete.
>
> (8) Support for write-streaming where modified data is held in !uptodate
> folios, with a private struct attached indicating the range that is
> valid.
>
> (9) Support for write grouping, multiplexing a pointer to a group in the
> folio private data with the write-streaming data. The writepages
> algorithm only writes stuff back that's in the nominated group. This
> is intended for use by Ceph to write its snaps in order.
>
> (10) Skipping reads for which we know the server could only supply zeros or
> EOF (for instance if we've done a local write that leaves a hole in
> the file and extends the local inode size).
>
> General notes:
>
> (1) The fscache module is merged into the netfslib module to avoid cyclic
> exported symbol usage that prevents either module from being loaded.
>
> (2) Some helpers from fscache are reassigned to netfslib by name.
>
> (3) netfslib now makes use of folio->private, which means the filesystem
> can't use it.
>
> (4) The filesystem provides wrappers to call the write helpers, allowing
> it to do pre-validation, oplock/capability fetching and the passing in
> of write group info.
>
> (5) I want to try flushing the data when tearing down an inode before
> invalidating it to try and render launder_folio unnecessary.
>
> (6) Write-through caching will generate and dispatch write subrequests as
> it gathers enough data to hit wsize and has whole pages that at least
> span that size. This needs to be a bit more flexible, allowing for a
> filesystem such as CIFS to have a variable wsize.
>
> (7) The filesystem driver is just given read and write calls with an
> iov_iter describing the data/buffer to use. Ideally, they don't see
> pages or folios at all. A function, extract_iter_to_sg(), is already
> available to decant part of an iterator into a scatterlist for crypto
> purposes.
>
>
> 9P notes:
>
> (1) I haven't managed to test this as I haven't been able to get Ganesha
> to work correctly with 9P.
>
> (2) Writes should now occur in larger-than-page-sized chunks.
>
> (3) It should be possible to turn on multipage folio support in 9P now.
>
>
> Changes
> =======
> ver #4)
> - Slimmed down the branch:
> - Split the cifs-related patches off to a separate branch (cifs-netfs)
> - Deferred the content-encryption to the in-progress ceph changes.
> - Deferred the use-PG_writeback rather than PG_fscache patch
> - Rebased on a later linux-next with afs-rotation patches.
>
> ver #3)
> - Moved the fscache module into netfslib to avoid export cycles.
> - Fixed a bunch of bugs.
> - Got CIFS to pass as much of xfstests as possible.
> - Added a patch to make 9P use all the helpers.
> - Added a patch to stop using PG_fscache, but rather dirty pages on
> reading and have writepages write to the cache.
>
> ver #2)
> - Folded the addition of NETFS_RREQ_NONBLOCK/BLOCKED into first patch that
> uses them.
> - Folded addition of rsize member into first user.
> - Don't set rsize in ceph (yet) and set it in kafs to 256KiB. cifs sets
> it dynamically.
> - Moved direct_bv next to direct_bv_count in struct netfs_io_request and
> labelled it with a __counted_by().
> - Passed flags into netfs_xa_store_and_mark() rather than two bools.
> - Removed netfs_set_up_buffer() as it wasn't used.
>
> David
>
> Link: https://lore.kernel.org/r/[email protected]/ # v1
> Link: https://lore.kernel.org/r/[email protected]/ # v2
>
> David Howells (39):
> netfs, fscache: Move fs/fscache/* into fs/netfs/
> netfs, fscache: Combine fscache with netfs
> netfs, fscache: Remove ->begin_cache_operation
> netfs, fscache: Move /proc/fs/fscache to /proc/fs/netfs and put in a
> symlink
> netfs: Move pinning-for-writeback from fscache to netfs
> netfs: Add a procfile to list in-progress requests
> netfs: Allow the netfs to make the io (sub)request alloc larger
> netfs: Add a ->free_subrequest() op
> afs: Don't use folio->private to record partial modification
> netfs: Provide invalidate_folio and release_folio calls
> netfs: Implement unbuffered/DIO vs buffered I/O locking
> netfs: Add iov_iters to (sub)requests to describe various buffers
> netfs: Add support for DIO buffering
> netfs: Provide tools to create a buffer in an xarray
> netfs: Add bounce buffering support
> netfs: Add func to calculate pagecount/size-limited span of an
> iterator
> netfs: Limit subrequest by size or number of segments
> netfs: Export netfs_put_subrequest() and some tracepoints
> netfs: Extend the netfs_io_*request structs to handle writes
> netfs: Add a hook to allow tell the netfs to update its i_size
> netfs: Make netfs_put_request() handle a NULL pointer
> netfs: Make the refcounting of netfs_begin_read() easier to use
> netfs: Prep to use folio->private for write grouping and streaming
> write
> netfs: Dispatch write requests to process a writeback slice
> netfs: Provide func to copy data to pagecache for buffered write
> netfs: Make netfs_read_folio() handle streaming-write pages
> netfs: Allocate multipage folios in the writepath
> netfs: Implement support for unbuffered/DIO read
> netfs: Implement unbuffered/DIO write support
> netfs: Implement buffered write API
> netfs: Allow buffered shared-writeable mmap through
> netfs_page_mkwrite()
> netfs: Provide netfs_file_read_iter()
> netfs, cachefiles: Pass upper bound length to allow expansion
> netfs: Provide a writepages implementation
> netfs: Provide a launder_folio implementation
> netfs: Implement a write-through caching option
> netfs: Optimise away reads above the point at which there can be no
> data
> afs: Use the netfs write helpers
> 9p: Use netfslib read/write_iter
>
> Documentation/filesystems/netfs_library.rst | 23 +-
> MAINTAINERS | 2 +-
> fs/9p/vfs_addr.c | 352 +----
> fs/9p/vfs_file.c | 89 +-
> fs/9p/vfs_inode.c | 5 +-
> fs/9p/vfs_super.c | 14 +-
> fs/Kconfig | 1 -
> fs/Makefile | 1 -
> fs/afs/file.c | 213 +--
> fs/afs/inode.c | 26 +-
> fs/afs/internal.h | 72 +-
> fs/afs/super.c | 2 +-
> fs/afs/write.c | 826 +----------
> fs/cachefiles/internal.h | 2 +-
> fs/cachefiles/io.c | 10 +-
> fs/cachefiles/ondemand.c | 2 +-
> fs/ceph/addr.c | 25 +-
> fs/ceph/cache.h | 35 +-
> fs/ceph/inode.c | 2 +-
> fs/fs-writeback.c | 10 +-
> fs/fscache/Kconfig | 40 -
> fs/fscache/Makefile | 16 -
> fs/fscache/internal.h | 277 ----
> fs/netfs/Kconfig | 39 +
> fs/netfs/Makefile | 22 +-
> fs/netfs/buffered_read.c | 229 ++-
> fs/netfs/buffered_write.c | 1247 +++++++++++++++++
> fs/netfs/direct_read.c | 252 ++++
> fs/netfs/direct_write.c | 170 +++
> fs/{fscache/cache.c => netfs/fscache_cache.c} | 0
> .../cookie.c => netfs/fscache_cookie.c} | 0
> fs/netfs/fscache_internal.h | 14 +
> fs/{fscache/io.c => netfs/fscache_io.c} | 42 +-
> fs/{fscache/main.c => netfs/fscache_main.c} | 25 +-
> fs/{fscache/proc.c => netfs/fscache_proc.c} | 23 +-
> fs/{fscache/stats.c => netfs/fscache_stats.c} | 4 +-
> .../volume.c => netfs/fscache_volume.c} | 0
> fs/netfs/internal.h | 288 ++++
> fs/netfs/io.c | 214 ++-
> fs/netfs/iterator.c | 97 ++
> fs/netfs/locking.c | 215 +++
> fs/netfs/main.c | 110 ++
> fs/netfs/misc.c | 260 ++++
> fs/netfs/objects.c | 63 +-
> fs/netfs/output.c | 478 +++++++
> fs/netfs/stats.c | 31 +-
> fs/nfs/Kconfig | 4 +-
> fs/nfs/fscache.c | 7 -
> fs/smb/client/cifsfs.c | 9 +-
> fs/smb/client/file.c | 18 +-
> fs/smb/client/fscache.c | 2 +-
> include/linux/fs.h | 2 +-
> include/linux/fscache.h | 45 -
> include/linux/netfs.h | 176 ++-
> include/linux/writeback.h | 2 +-
> include/trace/events/afs.h | 31 -
> include/trace/events/netfs.h | 155 +-
> mm/filemap.c | 1 +
> 58 files changed, 4197 insertions(+), 2123 deletions(-)
> delete mode 100644 fs/fscache/Kconfig
> delete mode 100644 fs/fscache/Makefile
> delete mode 100644 fs/fscache/internal.h
> create mode 100644 fs/netfs/buffered_write.c
> create mode 100644 fs/netfs/direct_read.c
> create mode 100644 fs/netfs/direct_write.c
> rename fs/{fscache/cache.c => netfs/fscache_cache.c} (100%)
> rename fs/{fscache/cookie.c => netfs/fscache_cookie.c} (100%)
> create mode 100644 fs/netfs/fscache_internal.h
> rename fs/{fscache/io.c => netfs/fscache_io.c} (86%)
> rename fs/{fscache/main.c => netfs/fscache_main.c} (84%)
> rename fs/{fscache/proc.c => netfs/fscache_proc.c} (58%)
> rename fs/{fscache/stats.c => netfs/fscache_stats.c} (97%)
> rename fs/{fscache/volume.c => netfs/fscache_volume.c} (100%)
> create mode 100644 fs/netfs/locking.c
> create mode 100644 fs/netfs/misc.c
> create mode 100644 fs/netfs/output.c
>

This all looks pretty great, David. Nice work! I had comments on a few
of the patches, but most are no big deal. It'd be nice to get this into
linux-next soon.

On the ones where I didn't have comments, you can add:

Reviewed-by: Jeff Layton <[email protected]>

2023-12-15 12:03:33

by Christian Brauner

Subject: Re: [PATCH v4 00/39] netfs, afs, 9p: Delegate high-level I/O to netfslib

On Wed, 13 Dec 2023 15:23:10 +0000, David Howells wrote:
> I have been working on my netfslib helpers to the point that I can run
> xfstests on AFS to completion (both with write-back buffering and, with a
> small patch, write-through buffering in the pagecache). I have a patch for
> 9P, but am currently unable to test it.
>
> The patches remove a little over 800 lines from AFS, 300 from 9P, albeit with
> around 3000 lines added to netfs. Hopefully, I will be able to remove a bunch
> of lines from Ceph too.
>
> [...]

Ok, that's on vfs.netfs for now. It's based on vfs.rw as that has splice
changes that would cause needless conflicts. It helps to not have such
series based on -next.

Fwiw, I'd rather have this based on a mainline tag in the future. Linus
has stated loads of times that he doesn't mind handling merge conflicts
and for me it's a lot easier if I have a stable mainline tag. linux-next
is too volatile. Thanks!

---

Applied to the vfs.netfs branch of the vfs/vfs.git tree.
Patches in the vfs.netfs branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review of the original patch series, so that we can drop it if needed.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible, patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.netfs

[01/39] netfs, fscache: Move fs/fscache/* into fs/netfs/
https://git.kernel.org/vfs/vfs/c/94029f4c6459
[02/39] netfs, fscache: Combine fscache with netfs
https://git.kernel.org/vfs/vfs/c/77eb7aa4805e
[03/39] netfs, fscache: Remove ->begin_cache_operation
https://git.kernel.org/vfs/vfs/c/a7f70e4b4ebf
[04/39] netfs, fscache: Move /proc/fs/fscache to /proc/fs/netfs and put in a symlink
https://git.kernel.org/vfs/vfs/c/131e9eb7bd1f
[05/39] netfs: Move pinning-for-writeback from fscache to netfs
https://git.kernel.org/vfs/vfs/c/1792e1940f54
[06/39] netfs: Add a procfile to list in-progress requests
https://git.kernel.org/vfs/vfs/c/1491057f69dc
[07/39] netfs: Allow the netfs to make the io (sub)request alloc larger
https://git.kernel.org/vfs/vfs/c/6c3efd20150f
[08/39] netfs: Add a ->free_subrequest() op
https://git.kernel.org/vfs/vfs/c/e0b44a08ac20
[09/39] afs: Don't use folio->private to record partial modification
https://git.kernel.org/vfs/vfs/c/9d2a996de9a2
[10/39] netfs: Provide invalidate_folio and release_folio calls
https://git.kernel.org/vfs/vfs/c/6136f4723a2e
[11/39] netfs: Implement unbuffered/DIO vs buffered I/O locking
https://git.kernel.org/vfs/vfs/c/1243d122feca
[12/39] netfs: Add iov_iters to (sub)requests to describe various buffers
https://git.kernel.org/vfs/vfs/c/a164fd03f073
[13/39] netfs: Add support for DIO buffering
https://git.kernel.org/vfs/vfs/c/669e8c33691d
[14/39] netfs: Provide tools to create a buffer in an xarray
https://git.kernel.org/vfs/vfs/c/c554dc89292d
[15/39] netfs: Add bounce buffering support
https://git.kernel.org/vfs/vfs/c/476c24c3e80b
[16/39] netfs: Add func to calculate pagecount/size-limited span of an iterator
https://git.kernel.org/vfs/vfs/c/25d0f84de71d
[17/39] netfs: Limit subrequest by size or number of segments
https://git.kernel.org/vfs/vfs/c/53ee4e38619a
[18/39] netfs: Export netfs_put_subrequest() and some tracepoints
https://git.kernel.org/vfs/vfs/c/ac3fc1846a06
[19/39] netfs: Extend the netfs_io_*request structs to handle writes
https://git.kernel.org/vfs/vfs/c/90999722fa0b
[20/39] netfs: Add a hook to allow tell the netfs to update its i_size
https://git.kernel.org/vfs/vfs/c/27dfd078db66
[21/39] netfs: Make netfs_put_request() handle a NULL pointer
https://git.kernel.org/vfs/vfs/c/0ffd2319fb64
[22/39] netfs: Make the refcounting of netfs_begin_read() easier to use
https://git.kernel.org/vfs/vfs/c/f7125395caba
[23/39] netfs: Prep to use folio->private for write grouping and streaming write
https://git.kernel.org/vfs/vfs/c/acadf22234e3
[24/39] netfs: Dispatch write requests to process a writeback slice
https://git.kernel.org/vfs/vfs/c/17c2b775e3f4
[25/39] netfs: Provide func to copy data to pagecache for buffered write
https://git.kernel.org/vfs/vfs/c/dd6ed9717a0b
[26/39] netfs: Make netfs_read_folio() handle streaming-write pages
https://git.kernel.org/vfs/vfs/c/c958b464f07f
[27/39] netfs: Allocate multipage folios in the writepath
https://git.kernel.org/vfs/vfs/c/6076cc863769
[28/39] netfs: Implement support for unbuffered/DIO read
https://git.kernel.org/vfs/vfs/c/9409fe70ca46
[29/39] netfs: Implement unbuffered/DIO write support
https://git.kernel.org/vfs/vfs/c/7acd7b902241
[30/39] netfs: Implement buffered write API
https://git.kernel.org/vfs/vfs/c/7b1321366337
[31/39] netfs: Allow buffered shared-writeable mmap through netfs_page_mkwrite()
https://git.kernel.org/vfs/vfs/c/d156da6e235c
[32/39] netfs: Provide netfs_file_read_iter()
https://git.kernel.org/vfs/vfs/c/899ae1e25a64
[33/39] netfs, cachefiles: Pass upper bound length to allow expansion
https://git.kernel.org/vfs/vfs/c/52882c158a30
[34/39] netfs: Provide a writepages implementation
https://git.kernel.org/vfs/vfs/c/02bf7b4afdba
[35/39] netfs: Provide a launder_folio implementation
https://git.kernel.org/vfs/vfs/c/cf4e16d98659
[36/39] netfs: Implement a write-through caching option
https://git.kernel.org/vfs/vfs/c/7bf6f13f4a63
[37/39] netfs: Optimise away reads above the point at which there can be no data
https://git.kernel.org/vfs/vfs/c/fad15293bd0d
[38/39] afs: Use the netfs write helpers
https://git.kernel.org/vfs/vfs/c/0095df30ad7b
[39/39] 9p: Use netfslib read/write_iter
https://git.kernel.org/vfs/vfs/c/361e79613421

2023-12-15 13:33:54

by Dominique Martinet

Subject: Re: [PATCH v4 00/39] netfs, afs, 9p: Delegate high-level I/O to netfslib

Christian Brauner wrote on Fri, Dec 15, 2023 at 01:03:14PM +0100:
> tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
> branch: vfs.netfs

This doesn't seem to build:
-------
CC [M] fs/netfs/buffered_write.o
fs/netfs/buffered_write.c: In function ‘netfs_kill_pages’:
fs/netfs/buffered_write.c:569:17: error: implicit declaration of function ‘generic_error_remove_folio’; did you mean ‘generic_error_remove_page’? [-Werror=implicit-function-declaration]
  569 |                 generic_error_remove_folio(mapping, folio);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~
      |                 generic_error_remove_page
cc1: some warnings being treated as errors
-------

This helper is present in -next (as of today's next) as commit
af7628d6ec19 ("fs: convert error_remove_page to error_remove_folio"),
apparently from akpm's mm-stable:
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm mm-stable

(which obviously has some conflicts in afs when trying to merge...)


I'll go back to dhowells' tree to finally test 9p a bit;
sorry for the lack of involvement, I'm just low on time all around.


Good luck (?),
--
Dominique Martinet | Asmadeus

2023-12-18 11:06:19

by Christian Brauner

Subject: Re: [PATCH v4 00/39] netfs, afs, 9p: Delegate high-level I/O to netfslib

On Fri, Dec 15, 2023 at 10:29:43PM +0900, Dominique Martinet wrote:
> Christian Brauner wrote on Fri, Dec 15, 2023 at 01:03:14PM +0100:
> > tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
> > branch: vfs.netfs
>
> This doesn't seem to build:

Yeah, I'm aware. That's why I didn't push it out. I couldn't finish the
rebase completely on Friday.

2023-12-20 10:09:02

by David Howells

Subject: Re: [PATCH v4 00/39] netfs, afs, 9p: Delegate high-level I/O to netfslib

Dominique Martinet <[email protected]> wrote:

> I'll go back to dhowells' tree to finally test 9p a bit;
> sorry for the lack of involvement, I'm just low on time all around.

I've rebased my tree on -rc6 rather than linux-next for Christian to pull.

Ganesha keeps falling over:

[root@carina build]# valgrind ./ganesha.nfsd -L /var/log/ganesha/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -F
==38960== Memcheck, a memory error detector
==38960== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==38960== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==38960== Command: ./ganesha.nfsd -L /var/log/ganesha/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -F
==38960==
==38960== Thread 138:
==38960== Invalid read of size 4
==38960== at 0x4DC32D6: pthread_cond_signal@@GLIBC_2.3.2 (pthread_cond_signal.c:41)
==38960== by 0x489700C: sync_cb (fsal_helper.c:1837)
==38960== by 0x49D79DF: mdc_read_super_cb (mdcache_file.c:559)
==38960== by 0x49D7AC5: mdc_read_cb (mdcache_file.c:582)
==38960== by 0x7B4B81F: vfs_read2 (file.c:1317)
==38960== by 0x49D7BCF: mdcache_read2 (mdcache_file.c:617)
==38960== by 0x4897173: fsal_read (fsal_helper.c:1849)
==38960== by 0x4A10FD4: _9p_read (9p_read.c:134)
==38960== by 0x4A0A024: _9p_process_buffer (9p_interpreter.c:181)
==38960== by 0x4A09DCB: _9p_tcp_process_request (9p_interpreter.c:133)
==38960== by 0x48CE182: _9p_execute (9p_dispatcher.c:315)
==38960== by 0x48CE508: _9p_worker_run (9p_dispatcher.c:412)
==38960== Address 0x24 is not stack'd, malloc'd or (recently) free'd
==38960==
==38960==
==38960== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==38960== Access not within mapped region at address 0x24
==38960== at 0x4DC32D6: pthread_cond_signal@@GLIBC_2.3.2 (pthread_cond_signal.c:41)
==38960== by 0x489700C: sync_cb (fsal_helper.c:1837)
==38960== by 0x49D79DF: mdc_read_super_cb (mdcache_file.c:559)
==38960== by 0x49D7AC5: mdc_read_cb (mdcache_file.c:582)
==38960== by 0x7B4B81F: vfs_read2 (file.c:1317)
==38960== by 0x49D7BCF: mdcache_read2 (mdcache_file.c:617)
==38960== by 0x4897173: fsal_read (fsal_helper.c:1849)
==38960== by 0x4A10FD4: _9p_read (9p_read.c:134)
==38960== by 0x4A0A024: _9p_process_buffer (9p_interpreter.c:181)
==38960== by 0x4A09DCB: _9p_tcp_process_request (9p_interpreter.c:133)
==38960== by 0x48CE182: _9p_execute (9p_dispatcher.c:315)
==38960== by 0x48CE508: _9p_worker_run (9p_dispatcher.c:412)

David
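
A note on the fault address in the trace above: 0x24 is small enough that
it looks like a field offset applied to a NULL pointer rather than a wild
pointer. The sketch below mirrors, from memory, the layout of glibc's
internal condition-variable struct as it stood in the 2.3x releases. That
layout is an assumption, and struct cond_layout and fault_address() are
hypothetical stand-ins, not glibc or Ganesha names. Under that assumption,
the waiter-reference word, which pthread_cond_signal() loads as a 32-bit
value, sits at offset 0x24. That would be consistent with sync_cb()
signalling through a NULL condition-variable pointer, though the trace
alone does not prove it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the glibc 2.3x condvar layout (assumed from
 * memory, not copied from any header).  Fixed-width types are used so
 * that the offsets do not depend on the host C library.
 */
struct cond_layout {
	uint64_t wseq;         /* offset 0x00 */
	uint64_t g1_start;     /* offset 0x08 */
	uint32_t g_refs[2];    /* offset 0x10 */
	uint32_t g_size[2];    /* offset 0x18 */
	uint32_t g1_orig_size; /* offset 0x20 */
	uint32_t wrefs;        /* offset 0x24: 4-byte waiter-reference word */
	uint32_t g_signals[2]; /* offset 0x28 */
};

/* Compile-time check that the mirrored field lands on the faulting offset. */
static_assert(offsetof(struct cond_layout, wrefs) == 0x24,
	      "wrefs is expected at offset 0x24 in this layout");

/*
 * Address that a 4-byte load of ->wrefs would touch for a given condvar
 * pointer.  For a NULL pointer this is just the field offset.
 */
uintptr_t fault_address(const void *cond_ptr)
{
	return (uintptr_t)cond_ptr + offsetof(struct cond_layout, wrefs);
}
```

With this layout, fault_address(NULL) evaluates to 0x24, matching both the
"Invalid read of size 4" and the "Address 0x24" lines that valgrind reports.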


2023-12-20 13:26:19

by Christian Brauner

Subject: Re: [PATCH v4 00/39] netfs, afs, 9p: Delegate high-level I/O to netfslib

On Wed, Dec 20, 2023 at 10:04:26AM +0000, David Howells wrote:
> Dominique Martinet <[email protected]> wrote:
>
> > I'll go back to dhowells' tree to finally test 9p a bit;
> > sorry for the lack of involvement, I'm just low on time all around.
>
> I've rebased my tree on -rc6 rather than linux-next for Christian to pull.

Pulled. Thank you, David. It's on vfs.netfs.