2021-12-22 23:13:40

by David Howells

[permalink] [raw]
Subject: [PATCH v4 00/68] fscache, cachefiles: Rewrite


Here's a set of patches implements a rewrite of the fscache driver and a
matching rewrite of the cachefiles driver, significantly simplifying the
code compared to what's upstream, removing the complex operation scheduling
and object state machine in favour of something much smaller and simpler.

The patchset is structured such that the first few patches disable fscache
use by the network filesystems using it, remove the cachefiles driver
entirely and as much of the fscache driver as can be got away with without
causing build failures in the network filesystems. The patches after that
recreate fscache and then cachefiles, attempting to add the pieces in a
logical order. Finally, the filesystems are reenabled and then the very
last patch changes the documentation.


WHY REWRITE?
============

Fscache's operation scheduling API was intended to handle sequencing of
cache operations, which were all required (where possible) to run
asynchronously in parallel with the operations being done by the network
filesystem, whilst allowing the cache to be brought online and offline and
to interrupt service for invalidation.

With the advent of the tmpfile capacity in the VFS, however, an opportunity
arises to do invalidation much more simply, without having to wait for I/O
that's actually in progress: Cachefiles can simply create a tmpfile, cut
over the file pointer for the backing object attached to a cookie and
abandon the in-progress I/O, dismissing it upon completion.

Future work here would involve using Omar Sandoval's vfs_link() with
AT_LINK_REPLACE[1] to allow an extant file to be displaced by a new hard
link from a tmpfile as currently I have to unlink the old file first.

These patches can also simplify the object state handling as I/O operations
to the cache don't all have to be brought to a stop in order to invalidate
a file. To that end, and with an eye on to writing a new backing cache
model in the future, I've taken the opportunity to simplify the indexing
structure.

I've separated the index cookie concept from the file cookie concept by C
type now. The former is now called a "volume cookie" (struct
fscache_volume) and there is a container of file cookies. There are then
just the two levels. All the index cookie levels are collapsed into a
single volume cookie, and this has a single printable string as a key. For
instance, an AFS volume would have a key of something like
"afs,example.com,1000555", combining the filesystem name, cell name and
volume ID. This is freeform, but must not have '/' chars in it.

I've also eliminated all pointers back from fscache into the network
filesystem. This required the duplication of a little bit of data in the
cookie (cookie key, coherency data and file size), but it's not actually
that much. This gets rid of problems with making sure we keep netfs data
structures around so that the cache can access them.

These patches mean that most of the code that was in the drivers before is
simply gone and those drivers are now almost entirely new code. That being
the case, there doesn't seem any particular reason to try and maintain
bisectability across it. Further, there has to be a point in the middle
where things are cut over as there's a single point everything has to go
through (ie. /dev/cachefiles) and it can't be in use by two drivers at
once.


ISSUES YET OUTSTANDING
======================

There are some issues still outstanding, unaddressed by this patchset, that
will need fixing in future patchsets, but that don't stop this series from
being usable:

(1) The cachefiles driver needs to stop using the backing filesystem's
metadata to store information about what parts of the cache are
populated. This is not reliable with modern extent-based filesystems.

Fixing this is deferred to a separate patchset as it involves
negotiation with the network filesystem and the VM as to how much data
to download to fulfil a read - which brings me on to (2)...

(2) NFS and CIFS do not take account of how the cache would like I/O to be
structured to meet its granularity requirements. Previously, the
cache used page granularity, which was fine as the network filesystems
also dealt in page granularity, and the backing filesystem (ext4, xfs
or whatever) did whatever it did out of sight. However, we now have
folios to deal with and the cache will now have to store its own
metadata to track its contents.

The change I'm looking at making for cachefiles is to store content
bitmaps in one or more xattrs and making a bit in the map correspond
to something like a 256KiB block. However, the size of an xattr and
the fact that they have to be read/updated in one go means that I'm
looking at covering 1GiB of data per 512-byte map and storing each map
in an xattr. Cachefiles has the potential to grow into a fully
fledged filesystem of its very own if I'm not careful.

However, I'm also looking at changing things even more radically and
going to a different model of how the cache is arranged and managed -
one that's more akin to the way, say, openafs does things - which
brings me on to (3)...

(3) The way cachefilesd does culling is very inefficient for large caches
and it would be better to move it into the kernel if I can as
cachefilesd has to keep asking the kernel if it can cull a file.
Changing the way the backend works would allow this to be addressed.


BITS THAT MAY BE CONTROVERSIAL
==============================

There are some bits I've added that may be controversial:

(1) I've provided a flag, S_KERNEL_FILE, that cachefiles uses to check if
a files is already being used by some other kernel service (e.g. a
duplicate cachefiles cache in the same directory) and reject it if it
is. This isn't entirely necessary, but it helps prevent accidental
data corruption.

I don't want to use S_SWAPFILE as that has other effects, but quite
possibly swapon() should set S_KERNEL_FILE too.

Note that it doesn't prevent userspace from interfering, though
perhaps it should. (I have made it prevent a marked directory from
being rmdir-able).

(2) Cachefiles wants to keep the backing file for a cookie open whilst we
might need to write to it from network filesystem writeback. The
problem is that the network filesystem unuses its cookie when its file
is closed, and so we have nothing pinning the cachefiles file open and
it will get closed automatically after a short time to avoid
EMFILE/ENFILE problems.

Reopening the cache file, however, is a problem if this is being done
due to writeback triggered by exit(). Some filesystems will oops if
we try to open a file in that context because they want to access
current->fs or suchlike.

To get around this, I added the following:

(A) An inode flag, I_PINNING_FSCACHE_WB, to be set on a network
filesystem inode to indicate that we have a usage count on the
cookie caching that inode.

(B) A flag in struct writeback_control, unpinned_fscache_wb, that is
set when __writeback_single_inode() clears the last dirty page
from i_pages - at which point it clears I_PINNING_FSCACHE_WB and
sets this flag.

This has to be done here so that clearing I_PINNING_FSCACHE_WB can
be done atomically with the check of PAGECACHE_TAG_DIRTY that
clears I_DIRTY_PAGES.

(C) A function, fscache_set_page_dirty(), which if it is not set, sets
I_PINNING_FSCACHE_WB and calls fscache_use_cookie() to pin the
cache resources.

(D) A function, fscache_unpin_writeback(), to be called by
->write_inode() to unuse the cookie.

(E) A function, fscache_clear_inode_writeback(), to be called when the
inode is evicted, before clear_inode() is called. This cleans up
any lingering I_PINNING_FSCACHE_WB.

The network filesystem can then use these tools to make sure that
fscache_write_to_cache() can write locally modified data to the cache
as well as to the server.

For the future, I'm working on write helpers for netfs lib that should
allow this facility to be removed by keeping track of the dirty
regions separately - but that's incomplete at the moment and is also
going to be affected by folios, one way or another, since it deals
with pages.


These patches can be found also on:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-rewrite

David


Changes
=======
ver #4:
- Dropped a pair of patches to try and cope with multipage folios in
afs_write_begin/end() - it should really be done in the caller[7].
- Fixed the use of sizeof with memset in cifs.
- Removed an extraneous kdoc param.
- Added a patch to add a tracepoint for fscache_use/unuse_cookie().
- In cifs, tcon->vol_create_time is __le64 so doesn't need cpu_to_le64().
- Add an expanded version of a patch to use current_is_kswapd() instead of
!gfpflags_allow_blocking()[8].
- Removed a couple of debugging print statements.

ver #3:
- Fixed a race in the cookie state machine between LRU discard and
relinquishment[4].
- Fixed up the hashing to make it portable[5].
- Fixed up some netfs coherency data to make it portable.
- Fixed some missing NFS_FSCACHE=n fallback functions in nfs[6].
- Added a patch to store volume coherency data in an xattr.
- Added a check that the cookie is unhashed before being freed.
- Fixed fscache to use remove_proc_subtree() to remove /proc/fs/fscache/.

ver #2:
- Fix an unused-var warning due to CONFIG_9P_FSCACHE=n.
- Use gfpflags_allow_blocking() rather than using flag directly.
- Fixed some error logging in a couple of cachefiles functions.
- Fixed an error check in the fscache volume allocation.
- Need to unmark an inode we've moved to the graveyard before unlocking.
- Upgraded to -rc4 to allow for upstream changes to cifs.
- Should only change to inval state if can get access to cache.
- Don't hold n_accesses elevated whilst cache is bound to a cookie, but
rather add a flag that prevents the state machine from being queued when
n_accesses reaches 0.
- Remove the unused cookie pointer field from the fscache_acquire
tracepoint.
- Added missing transition to LRU_DISCARDING state.
- Added two ceph patches from Jeff Layton[2].
- Remove NFS_INO_FSCACHE as it's no longer used.
- In NFS, need to unuse a cookie on file-release, not inode-clear.
- Filled in the NFS cache I/O routines, borrowing from the previously posted
fallback I/O code[3].


Link: https://lore.kernel.org/r/[email protected]/ [1]
Link: https://lore.kernel.org/r/[email protected]/ [2]
Link: https://lore.kernel.org/r/163189108292.2509237.12615909591150927232.stgit@warthog.procyon.org.uk/ [3]
Link: https://lore.kernel.org/r/[email protected]/ [4]
Link: https://lore.kernel.org/r/CAHk-=whtkzB446+hX0zdLsdcUJsJ=8_-0S1mE_R+YurThfUbLA@mail.gmail.com [5]
Link: https://lore.kernel.org/r/61b90f3d.H1IkoeQfEsGNhvq9%[email protected]/ [6]
Link: https://lore.kernel.org/r/CAHk-=wh2dr=NgVSVj0sw-gSuzhxhLRV5FymfPS146zGgF4kBjA@mail.gmail.com/ [7]
Link: https://lore.kernel.org/r/[email protected]/ [8]


References
==========

These patches have been published for review before, firstly as part of a
larger set:

Link: https://lore.kernel.org/r/158861203563.340223.7585359869938129395.stgit@warthog.procyon.org.uk/

Link: https://lore.kernel.org/r/159465766378.1376105.11619976251039287525.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/159465784033.1376674.18106463693989811037.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/159465821598.1377938.2046362270225008168.stgit@warthog.procyon.org.uk/

Link: https://lore.kernel.org/r/160588455242.3465195.3214733858273019178.stgit@warthog.procyon.org.uk/

Then as a cut-down set:

Link: https://lore.kernel.org/r/161118128472.1232039.11746799833066425131.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/161161025063.2537118.2009249444682241405.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653784755.2770958.11820491619308713741.stgit@warthog.procyon.org.uk/ # v5

I split out a set to just restructure the I/O, which got merged back in to
this one:

Link: https://lore.kernel.org/r/163363935000.1980952.15279841414072653108.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/163189104510.2509237.10805032055807259087.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163363935000.1980952.15279841414072653108.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/163551653404.1877519.12363794970541005441.stgit@warthog.procyon.org.uk/ # v4

... and a larger set to do the conversion, also merged back into this one:

Link: https://lore.kernel.org/r/163456861570.2614702.14754548462706508617.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163492911924.1038219.13107463173777870713.stgit@warthog.procyon.org.uk/ # v2

Older versions of this one:

Link: https://lore.kernel.org/r/163819575444.215744.318477214576928110.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906878733.143852.5604115678965006622.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967073889.1823006.12237147297060239168.stgit@warthog.procyon.org.uk/ # v3

Proposals/information about the design have been published here:

Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/

And requests for information:

Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/

I've posted partial patches to try and help 9p and cifs along:

Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/
Link: https://lore.kernel.org/r/[email protected]/

---
Dave Wysochanski (1):
nfs: Convert to new fscache volume/cookie API

David Howells (65):
fscache, cachefiles: Disable configuration
cachefiles: Delete the cachefiles driver pending rewrite
fscache: Remove the contents of the fscache driver, pending rewrite
netfs: Display the netfs inode number in the netfs_read tracepoint
netfs: Pass a flag to ->prepare_write() to say if there's no alloc'd space
fscache: Introduce new driver
fscache: Implement a hash function
fscache: Implement cache registration
fscache: Implement volume registration
fscache: Implement cookie registration
fscache: Implement cache-level access helpers
fscache: Implement volume-level access helpers
fscache: Implement cookie-level access helpers
fscache: Implement functions add/remove a cache
fscache: Provide and use cache methods to lookup/create/free a volume
fscache: Add a function for a cache backend to note an I/O error
fscache: Implement simple cookie state machine
fscache: Implement cookie user counting and resource pinning
fscache: Implement cookie invalidation
fscache: Provide a means to begin an operation
fscache: Count data storage objects in a cache
fscache: Provide read/write stat counters for the cache
fscache: Provide a function to let the netfs update its coherency data
netfs: Pass more information on how to deal with a hole in the cache
fscache: Implement raw I/O interface
fscache: Implement higher-level write I/O interface
vfs, fscache: Implement pinning of cache usage for writeback
fscache: Provide a function to note the release of a page
fscache: Provide a function to resize a cookie
cachefiles: Introduce rewritten driver
cachefiles: Define structs
cachefiles: Add some error injection support
cachefiles: Add a couple of tracepoints for logging errors
cachefiles: Add cache error reporting macro
cachefiles: Add security derivation
cachefiles: Register a miscdev and parse commands over it
cachefiles: Provide a function to check how much space there is
vfs, cachefiles: Mark a backing file in use with an inode flag
cachefiles: Implement a function to get/create a directory in the cache
cachefiles: Implement cache registration and withdrawal
cachefiles: Implement volume support
cachefiles: Add tracepoints for calls to the VFS
cachefiles: Implement object lifecycle funcs
cachefiles: Implement key to filename encoding
cachefiles: Implement metadata/coherency data storage in xattrs
cachefiles: Mark a backing file in use with an inode flag
cachefiles: Implement culling daemon commands
cachefiles: Implement backing file wrangling
cachefiles: Implement begin and end I/O operation
cachefiles: Implement cookie resize for truncate
cachefiles: Implement the I/O routines
fscache, cachefiles: Store the volume coherency data
cachefiles: Allow cachefiles to actually function
fscache, cachefiles: Display stats of no-space events
fscache, cachefiles: Display stat of culling events
afs: Convert afs to use the new fscache API
afs: Copy local writes to the cache when writing to the server
afs: Skip truncation on the server of data we haven't written yet
9p: Use fscache indexing rewrite and reenable caching
9p: Copy local writes to the cache when writing to the server
nfs: Implement cache I/O by accessing the cache directly
cifs: Support fscache indexing rewrite (untested)
fscache: Rewrite documentation
fscache: Add a tracepoint for cookie use/unuse
9p, afs, ceph, cifs, nfs: Use current_is_kswapd() rather than gfpflags_allow_blocking()

Jeff Layton (2):
ceph: conversion to new fscache API
ceph: add fscache writeback support


.../filesystems/caching/backend-api.rst | 850 ++++------
.../filesystems/caching/cachefiles.rst | 6 +-
Documentation/filesystems/caching/fscache.rst | 525 ++----
Documentation/filesystems/caching/index.rst | 4 +-
.../filesystems/caching/netfs-api.rst | 1136 ++++---------
Documentation/filesystems/caching/object.rst | 313 ----
.../filesystems/caching/operations.rst | 210 ---
Documentation/filesystems/netfs_library.rst | 16 +-
fs/9p/Kconfig | 2 +-
fs/9p/cache.c | 195 +--
fs/9p/cache.h | 25 +-
fs/9p/v9fs.c | 17 +-
fs/9p/v9fs.h | 13 +-
fs/9p/vfs_addr.c | 57 +-
fs/9p/vfs_dir.c | 13 +
fs/9p/vfs_file.c | 3 +-
fs/9p/vfs_inode.c | 26 +-
fs/9p/vfs_inode_dotl.c | 3 +-
fs/9p/vfs_super.c | 3 +
fs/afs/Kconfig | 2 +-
fs/afs/Makefile | 3 -
fs/afs/cache.c | 68 -
fs/afs/cell.c | 12 -
fs/afs/file.c | 38 +-
fs/afs/inode.c | 101 +-
fs/afs/internal.h | 37 +-
fs/afs/main.c | 14 -
fs/afs/super.c | 1 +
fs/afs/volume.c | 29 +-
fs/afs/write.c | 88 +-
fs/cachefiles/Kconfig | 7 +
fs/cachefiles/Makefile | 6 +-
fs/cachefiles/bind.c | 278 ----
fs/cachefiles/cache.c | 378 +++++
fs/cachefiles/daemon.c | 180 +-
fs/cachefiles/error_inject.c | 46 +
fs/cachefiles/interface.c | 747 ++++-----
fs/cachefiles/internal.h | 270 +--
fs/cachefiles/io.c | 330 +++-
fs/cachefiles/key.c | 201 ++-
fs/cachefiles/main.c | 22 +-
fs/cachefiles/namei.c | 1223 ++++++--------
fs/cachefiles/rdwr.c | 972 -----------
fs/cachefiles/security.c | 2 +-
fs/cachefiles/volume.c | 139 ++
fs/cachefiles/xattr.c | 421 ++---
fs/ceph/Kconfig | 2 +-
fs/ceph/addr.c | 102 +-
fs/ceph/cache.c | 218 +--
fs/ceph/cache.h | 97 +-
fs/ceph/caps.c | 3 +-
fs/ceph/file.c | 13 +-
fs/ceph/inode.c | 22 +-
fs/ceph/super.c | 10 +-
fs/ceph/super.h | 3 +-
fs/cifs/Kconfig | 2 +-
fs/cifs/Makefile | 2 +-
fs/cifs/cache.c | 105 --
fs/cifs/cifsfs.c | 11 +-
fs/cifs/cifsglob.h | 5 +-
fs/cifs/connect.c | 12 -
fs/cifs/file.c | 64 +-
fs/cifs/fscache.c | 333 +---
fs/cifs/fscache.h | 126 +-
fs/cifs/inode.c | 36 +-
fs/fs-writeback.c | 8 +
fs/fscache/Makefile | 6 +-
fs/fscache/cache.c | 618 +++----
fs/fscache/cookie.c | 1448 +++++++++--------
fs/fscache/fsdef.c | 98 --
fs/fscache/internal.h | 317 +---
fs/fscache/io.c | 376 ++++-
fs/fscache/main.c | 147 +-
fs/fscache/netfs.c | 74 -
fs/fscache/object.c | 1125 -------------
fs/fscache/operation.c | 633 -------
fs/fscache/page.c | 1242 --------------
fs/fscache/proc.c | 47 +-
fs/fscache/stats.c | 293 +---
fs/fscache/volume.c | 517 ++++++
fs/namei.c | 3 +-
fs/netfs/read_helper.c | 10 +-
fs/nfs/Kconfig | 2 +-
fs/nfs/Makefile | 2 +-
fs/nfs/client.c | 4 -
fs/nfs/direct.c | 2 +
fs/nfs/file.c | 13 +-
fs/nfs/fscache-index.c | 140 --
fs/nfs/fscache.c | 490 ++----
fs/nfs/fscache.h | 180 +-
fs/nfs/inode.c | 11 +-
fs/nfs/nfstrace.h | 1 -
fs/nfs/read.c | 25 +-
fs/nfs/super.c | 28 +-
fs/nfs/write.c | 8 +-
include/linux/fs.h | 4 +
include/linux/fscache-cache.h | 614 ++-----
include/linux/fscache.h | 1021 +++++-------
include/linux/netfs.h | 15 +-
include/linux/nfs_fs.h | 1 -
include/linux/nfs_fs_sb.h | 9 +-
include/linux/writeback.h | 1 +
include/trace/events/cachefiles.h | 527 ++++--
include/trace/events/fscache.h | 642 ++++----
include/trace/events/netfs.h | 5 +-
105 files changed, 7396 insertions(+), 13509 deletions(-)
delete mode 100644 Documentation/filesystems/caching/object.rst
delete mode 100644 Documentation/filesystems/caching/operations.rst
delete mode 100644 fs/afs/cache.c
delete mode 100644 fs/cachefiles/bind.c
create mode 100644 fs/cachefiles/cache.c
create mode 100644 fs/cachefiles/error_inject.c
delete mode 100644 fs/cachefiles/rdwr.c
create mode 100644 fs/cachefiles/volume.c
delete mode 100644 fs/cifs/cache.c
delete mode 100644 fs/fscache/fsdef.c
delete mode 100644 fs/fscache/netfs.c
delete mode 100644 fs/fscache/object.c
delete mode 100644 fs/fscache/operation.c
delete mode 100644 fs/fscache/page.c
create mode 100644 fs/fscache/volume.c
delete mode 100644 fs/nfs/fscache-index.c




2021-12-22 23:13:57

by David Howells

[permalink] [raw]
Subject: [PATCH v4 01/68] fscache, cachefiles: Disable configuration

Disable fscache and cachefiles in Kconfig whilst it is rewritten.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819576672.215744.12444272479560406780.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906882835.143852.11073015983885872901.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967075113.1823006.277316290062782998.stgit@warthog.procyon.org.uk/ # v3
---

fs/9p/Kconfig | 2 +-
fs/afs/Kconfig | 2 +-
fs/ceph/Kconfig | 2 +-
fs/cifs/Kconfig | 2 +-
fs/fscache/Kconfig | 3 +++
fs/nfs/Kconfig | 2 +-
6 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/9p/Kconfig b/fs/9p/Kconfig
index d7bc93447c85..b3d33b3ddb98 100644
--- a/fs/9p/Kconfig
+++ b/fs/9p/Kconfig
@@ -14,7 +14,7 @@ config 9P_FS
if 9P_FS
config 9P_FSCACHE
bool "Enable 9P client caching support"
- depends on 9P_FS=m && FSCACHE || 9P_FS=y && FSCACHE=y
+ depends on 9P_FS=m && FSCACHE_OLD_API || 9P_FS=y && FSCACHE_OLD_API=y
help
Choose Y here to enable persistent, read-only local
caching support for 9p clients using FS-Cache
diff --git a/fs/afs/Kconfig b/fs/afs/Kconfig
index fc8ba9142f2f..c40cdfcc25d1 100644
--- a/fs/afs/Kconfig
+++ b/fs/afs/Kconfig
@@ -25,7 +25,7 @@ config AFS_DEBUG

config AFS_FSCACHE
bool "Provide AFS client caching support"
- depends on AFS_FS=m && FSCACHE || AFS_FS=y && FSCACHE=y
+ depends on AFS_FS=m && FSCACHE_OLD_API || AFS_FS=y && FSCACHE_OLD_API=y
help
Say Y here if you want AFS data to be cached locally on disk through
the generic filesystem cache manager
diff --git a/fs/ceph/Kconfig b/fs/ceph/Kconfig
index 94df854147d3..61f123356c3e 100644
--- a/fs/ceph/Kconfig
+++ b/fs/ceph/Kconfig
@@ -21,7 +21,7 @@ config CEPH_FS
if CEPH_FS
config CEPH_FSCACHE
bool "Enable Ceph client caching support"
- depends on CEPH_FS=m && FSCACHE || CEPH_FS=y && FSCACHE=y
+ depends on CEPH_FS=m && FSCACHE_OLD_API || CEPH_FS=y && FSCACHE_OLD_API=y
help
Choose Y here to enable persistent, read-only local
caching support for Ceph clients using FS-Cache
diff --git a/fs/cifs/Kconfig b/fs/cifs/Kconfig
index 3b7e3b9e4fd2..346ae8716deb 100644
--- a/fs/cifs/Kconfig
+++ b/fs/cifs/Kconfig
@@ -188,7 +188,7 @@ config CIFS_SMB_DIRECT

config CIFS_FSCACHE
bool "Provide CIFS client caching support"
- depends on CIFS=m && FSCACHE || CIFS=y && FSCACHE=y
+ depends on CIFS=m && FSCACHE_OLD_API || CIFS=y && FSCACHE_OLD_API=y
help
Makes CIFS FS-Cache capable. Say Y here if you want your CIFS data
to be cached locally on disk through the general filesystem cache
diff --git a/fs/fscache/Kconfig b/fs/fscache/Kconfig
index b313a978ae0a..76316c4a3fb7 100644
--- a/fs/fscache/Kconfig
+++ b/fs/fscache/Kconfig
@@ -38,3 +38,6 @@ config FSCACHE_DEBUG
enabled by setting bits in /sys/modules/fscache/parameter/debug.

See Documentation/filesystems/caching/fscache.rst for more information.
+
+config FSCACHE_OLD_API
+ bool
diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index 14a72224b657..bdc11b89eac5 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -170,7 +170,7 @@ config ROOT_NFS

config NFS_FSCACHE
bool "Provide NFS client caching support"
- depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y
+ depends on NFS_FS=m && FSCACHE_OLD_API || NFS_FS=y && FSCACHE_OLD_API=y
help
Say Y here if you want NFS data to be cached locally on disc through
the general filesystem cache manager



2021-12-22 23:15:28

by David Howells

[permalink] [raw]
Subject: [PATCH v4 04/68] netfs: Display the netfs inode number in the netfs_read tracepoint

Display the netfs inode number in the netfs_read tracepoint so that this
can be used to correlate with the cachefiles_prep_read tracepoint.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819581097.215744.17476611915583897051.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906885903.143852.12229407815154182247.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967078164.1823006.15286989199782861123.stgit@warthog.procyon.org.uk/ # v3
---

include/trace/events/netfs.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 4d470bffd9f1..e6f4ebbb4c69 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -135,6 +135,7 @@ TRACE_EVENT(netfs_read,
__field(loff_t, start )
__field(size_t, len )
__field(enum netfs_read_trace, what )
+ __field(unsigned int, netfs_inode )
),

TP_fast_assign(
@@ -143,12 +144,14 @@ TRACE_EVENT(netfs_read,
__entry->start = start;
__entry->len = len;
__entry->what = what;
+ __entry->netfs_inode = rreq->inode->i_ino;
),

- TP_printk("R=%08x %s c=%08x s=%llx %zx",
+ TP_printk("R=%08x %s c=%08x ni=%x s=%llx %zx",
__entry->rreq,
__print_symbolic(__entry->what, netfs_read_traces),
__entry->cookie,
+ __entry->netfs_inode,
__entry->start, __entry->len)
);




2021-12-22 23:15:29

by David Howells

[permalink] [raw]
Subject: [PATCH v4 05/68] netfs: Pass a flag to ->prepare_write() to say if there's no alloc'd space

Pass a flag to ->prepare_write() to indicate if there's definitely no
space allocated in the cache yet (for instance if we've already checked as
we were asked to do a read).

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819583123.215744.12783808230464471417.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906886835.143852.6689886781122679769.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967079100.1823006.12889542712309574359.stgit@warthog.procyon.org.uk/ # v3
---

fs/netfs/read_helper.c | 2 +-
include/linux/netfs.h | 3 ++-
2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 75c76cbb27cc..9dd76b8914f2 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -323,7 +323,7 @@ static void netfs_rreq_do_write_to_cache(struct netfs_read_request *rreq)
}

ret = cres->ops->prepare_write(cres, &subreq->start, &subreq->len,
- rreq->i_size);
+ rreq->i_size, true);
if (ret < 0) {
trace_netfs_failure(rreq, subreq, ret, netfs_fail_prepare_write);
trace_netfs_sreq(subreq, netfs_sreq_trace_write_skip);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index ca0683b9e3d1..1ea22fc48818 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -232,7 +232,8 @@ struct netfs_cache_ops {
* actually do.
*/
int (*prepare_write)(struct netfs_cache_resources *cres,
- loff_t *_start, size_t *_len, loff_t i_size);
+ loff_t *_start, size_t *_len, loff_t i_size,
+ bool no_space_allocated_yet);
};

struct readahead_control;



2021-12-22 23:15:39

by David Howells

[permalink] [raw]
Subject: [PATCH v4 06/68] fscache: Introduce new driver

Introduce basic skeleton of the new, rewritten fscache driver.

Changes
=======
ver #3:
- Use remove_proc_subtree(), not remove_proc_entry() to remove a populated
dir.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819584034.215744.4290533472390439030.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906887770.143852.3577888294989185666.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967080039.1823006.5702921801104057922.stgit@warthog.procyon.org.uk/ # v3
---

fs/Makefile | 1
fs/fscache/Kconfig | 39 +++++++++
fs/fscache/Makefile | 12 +++
fs/fscache/internal.h | 183 ++++++++++++++++++++++++++++++++++++++++
fs/fscache/main.c | 65 ++++++++++++++
fs/fscache/proc.c | 42 +++++++++
fs/fscache/stats.c | 22 +++++
include/linux/fscache-cache.h | 2
include/linux/fscache.h | 6 +
include/trace/events/fscache.h | 49 +++++++++++
10 files changed, 420 insertions(+), 1 deletion(-)
create mode 100644 fs/fscache/Makefile
create mode 100644 fs/fscache/internal.h
create mode 100644 fs/fscache/main.c
create mode 100644 fs/fscache/proc.c
create mode 100644 fs/fscache/stats.c
create mode 100644 include/trace/events/fscache.h

diff --git a/fs/Makefile b/fs/Makefile
index 23ddd0803d14..290815f3fd31 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -67,6 +67,7 @@ obj-$(CONFIG_DLM) += dlm/

# Do not add any filesystems before this line
obj-$(CONFIG_NETFS_SUPPORT) += netfs/
+obj-$(CONFIG_FSCACHE) += fscache/
obj-$(CONFIG_REISERFS_FS) += reiserfs/
obj-$(CONFIG_EXT4_FS) += ext4/
# We place ext4 before ext2 so that clean ext3 root fs's do NOT mount using the
diff --git a/fs/fscache/Kconfig b/fs/fscache/Kconfig
index 6440484d9461..76316c4a3fb7 100644
--- a/fs/fscache/Kconfig
+++ b/fs/fscache/Kconfig
@@ -1,4 +1,43 @@
# SPDX-License-Identifier: GPL-2.0-only

+config FSCACHE
+ tristate "General filesystem local caching manager"
+ select NETFS_SUPPORT
+ help
+ This option enables a generic filesystem caching manager that can be
+ used by various network and other filesystems to cache data locally.
+ Different sorts of caches can be plugged in, depending on the
+ resources available.
+
+ See Documentation/filesystems/caching/fscache.rst for more information.
+
+config FSCACHE_STATS
+ bool "Gather statistical information on local caching"
+ depends on FSCACHE && PROC_FS
+ select NETFS_STATS
+ help
+ This option causes statistical information to be gathered on local
+ caching and exported through file:
+
+ /proc/fs/fscache/stats
+
+ The gathering of statistics adds a certain amount of overhead to
+ execution as there are a quite a few stats gathered, and on a
+ multi-CPU system these may be on cachelines that keep bouncing
+ between CPUs. On the other hand, the stats are very useful for
+ debugging purposes. Saying 'Y' here is recommended.
+
+ See Documentation/filesystems/caching/fscache.rst for more information.
+
+config FSCACHE_DEBUG
+ bool "Debug FS-Cache"
+ depends on FSCACHE
+ help
+ This permits debugging to be dynamically enabled in the local caching
+ management module. If this is set, the debugging output may be
+ enabled by setting bits in /sys/modules/fscache/parameter/debug.
+
+ See Documentation/filesystems/caching/fscache.rst for more information.
+
config FSCACHE_OLD_API
bool
diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile
new file mode 100644
index 000000000000..f9722de32247
--- /dev/null
+++ b/fs/fscache/Makefile
@@ -0,0 +1,12 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for general filesystem caching code
+#
+
+fscache-y := \
+ main.o
+
+fscache-$(CONFIG_PROC_FS) += proc.o
+fscache-$(CONFIG_FSCACHE_STATS) += stats.o
+
+obj-$(CONFIG_FSCACHE) := fscache.o
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
new file mode 100644
index 000000000000..ea52f8594a77
--- /dev/null
+++ b/fs/fscache/internal.h
@@ -0,0 +1,183 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Internal definitions for FS-Cache
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) "FS-Cache: " fmt
+
+#include <linux/slab.h>
+#include <linux/fscache-cache.h>
+#include <trace/events/fscache.h>
+#include <linux/sched.h>
+#include <linux/seq_file.h>
+
+/*
+ * main.c
+ */
+extern unsigned fscache_debug;
+
+/*
+ * proc.c
+ */
+#ifdef CONFIG_PROC_FS
+extern int __init fscache_proc_init(void);
+extern void fscache_proc_cleanup(void);
+#else
+#define fscache_proc_init() (0)
+#define fscache_proc_cleanup() do {} while (0)
+#endif
+
+/*
+ * stats.c
+ */
+#ifdef CONFIG_FSCACHE_STATS
+
+static inline void fscache_stat(atomic_t *stat)
+{
+ atomic_inc(stat);
+}
+
+static inline void fscache_stat_d(atomic_t *stat)
+{
+ atomic_dec(stat);
+}
+
+#define __fscache_stat(stat) (stat)
+
+int fscache_stats_show(struct seq_file *m, void *v);
+#else
+
+#define __fscache_stat(stat) (NULL)
+#define fscache_stat(stat) do {} while (0)
+#define fscache_stat_d(stat) do {} while (0)
+#endif
+
+
+/*****************************************************************************/
+/*
+ * debug tracing
+ */
+#define dbgprintk(FMT, ...) \
+ printk("[%-6.6s] "FMT"\n", current->comm, ##__VA_ARGS__)
+
+#define kenter(FMT, ...) dbgprintk("==> %s("FMT")", __func__, ##__VA_ARGS__)
+#define kleave(FMT, ...) dbgprintk("<== %s()"FMT"", __func__, ##__VA_ARGS__)
+#define kdebug(FMT, ...) dbgprintk(FMT, ##__VA_ARGS__)
+
+#define kjournal(FMT, ...) no_printk(FMT, ##__VA_ARGS__)
+
+#ifdef __KDEBUG
+#define _enter(FMT, ...) kenter(FMT, ##__VA_ARGS__)
+#define _leave(FMT, ...) kleave(FMT, ##__VA_ARGS__)
+#define _debug(FMT, ...) kdebug(FMT, ##__VA_ARGS__)
+
+#elif defined(CONFIG_FSCACHE_DEBUG)
+#define _enter(FMT, ...) \
+do { \
+ if (__do_kdebug(ENTER)) \
+ kenter(FMT, ##__VA_ARGS__); \
+} while (0)
+
+#define _leave(FMT, ...) \
+do { \
+ if (__do_kdebug(LEAVE)) \
+ kleave(FMT, ##__VA_ARGS__); \
+} while (0)
+
+#define _debug(FMT, ...) \
+do { \
+ if (__do_kdebug(DEBUG)) \
+ kdebug(FMT, ##__VA_ARGS__); \
+} while (0)
+
+#else
+#define _enter(FMT, ...) no_printk("==> %s("FMT")", __func__, ##__VA_ARGS__)
+#define _leave(FMT, ...) no_printk("<== %s()"FMT"", __func__, ##__VA_ARGS__)
+#define _debug(FMT, ...) no_printk(FMT, ##__VA_ARGS__)
+#endif
+
+/*
+ * determine whether a particular optional debugging point should be logged
+ * - we need to go through three steps to persuade cpp to correctly join the
+ * shorthand in FSCACHE_DEBUG_LEVEL with its prefix
+ */
+#define ____do_kdebug(LEVEL, POINT) \
+ unlikely((fscache_debug & \
+ (FSCACHE_POINT_##POINT << (FSCACHE_DEBUG_ ## LEVEL * 3))))
+#define ___do_kdebug(LEVEL, POINT) \
+ ____do_kdebug(LEVEL, POINT)
+#define __do_kdebug(POINT) \
+ ___do_kdebug(FSCACHE_DEBUG_LEVEL, POINT)
+
+#define FSCACHE_DEBUG_CACHE 0
+#define FSCACHE_DEBUG_COOKIE 1
+#define FSCACHE_DEBUG_OBJECT 2
+#define FSCACHE_DEBUG_OPERATION 3
+
+#define FSCACHE_POINT_ENTER 1
+#define FSCACHE_POINT_LEAVE 2
+#define FSCACHE_POINT_DEBUG 4
+
+#ifndef FSCACHE_DEBUG_LEVEL
+#define FSCACHE_DEBUG_LEVEL CACHE
+#endif
+
+/*
+ * assertions
+ */
+#if 1 /* defined(__KDEBUGALL) */
+
+#define ASSERT(X) \
+do { \
+ if (unlikely(!(X))) { \
+ pr_err("\n"); \
+ pr_err("Assertion failed\n"); \
+ BUG(); \
+ } \
+} while (0)
+
+#define ASSERTCMP(X, OP, Y) \
+do { \
+ if (unlikely(!((X) OP (Y)))) { \
+ pr_err("\n"); \
+ pr_err("Assertion failed\n"); \
+ pr_err("%lx " #OP " %lx is false\n", \
+ (unsigned long)(X), (unsigned long)(Y)); \
+ BUG(); \
+ } \
+} while (0)
+
+#define ASSERTIF(C, X) \
+do { \
+ if (unlikely((C) && !(X))) { \
+ pr_err("\n"); \
+ pr_err("Assertion failed\n"); \
+ BUG(); \
+ } \
+} while (0)
+
+#define ASSERTIFCMP(C, X, OP, Y) \
+do { \
+ if (unlikely((C) && !((X) OP (Y)))) { \
+ pr_err("\n"); \
+ pr_err("Assertion failed\n"); \
+ pr_err("%lx " #OP " %lx is false\n", \
+ (unsigned long)(X), (unsigned long)(Y)); \
+ BUG(); \
+ } \
+} while (0)
+
+#else
+
+#define ASSERT(X) do {} while (0)
+#define ASSERTCMP(X, OP, Y) do {} while (0)
+#define ASSERTIF(C, X) do {} while (0)
+#define ASSERTIFCMP(C, X, OP, Y) do {} while (0)
+
+#endif /* assert or not */
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
new file mode 100644
index 000000000000..819de2ee1276
--- /dev/null
+++ b/fs/fscache/main.c
@@ -0,0 +1,65 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* General filesystem local caching manager
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#define FSCACHE_DEBUG_LEVEL CACHE
+#include <linux/module.h>
+#include <linux/init.h>
+#define CREATE_TRACE_POINTS
+#include "internal.h"
+
+MODULE_DESCRIPTION("FS Cache Manager");
+MODULE_AUTHOR("Red Hat, Inc.");
+MODULE_LICENSE("GPL");
+
+unsigned fscache_debug;
+module_param_named(debug, fscache_debug, uint,
+ S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(fscache_debug,
+ "FS-Cache debugging mask");
+
+struct workqueue_struct *fscache_wq;
+EXPORT_SYMBOL(fscache_wq);
+
+/*
+ * initialise the fs caching module
+ */
+static int __init fscache_init(void)
+{
+ int ret = -ENOMEM;
+
+ fscache_wq = alloc_workqueue("fscache", WQ_UNBOUND | WQ_FREEZABLE, 0);
+ if (!fscache_wq)
+ goto error_wq;
+
+ ret = fscache_proc_init();
+ if (ret < 0)
+ goto error_proc;
+
+ pr_notice("Loaded\n");
+ return 0;
+
+error_proc:
+ destroy_workqueue(fscache_wq);
+error_wq:
+ return ret;
+}
+
+fs_initcall(fscache_init);
+
+/*
+ * clean up on module removal
+ */
+static void __exit fscache_exit(void)
+{
+ _enter("");
+
+ fscache_proc_cleanup();
+ destroy_workqueue(fscache_wq);
+ pr_notice("Unloaded\n");
+}
+
+module_exit(fscache_exit);
diff --git a/fs/fscache/proc.c b/fs/fscache/proc.c
new file mode 100644
index 000000000000..4d866ac41776
--- /dev/null
+++ b/fs/fscache/proc.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* FS-Cache statistics viewing interface
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#define FSCACHE_DEBUG_LEVEL CACHE
+#include <linux/module.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include "internal.h"
+
+/*
+ * initialise the /proc/fs/fscache/ directory
+ */
+int __init fscache_proc_init(void)
+{
+ if (!proc_mkdir("fs/fscache", NULL))
+ goto error_dir;
+
+#ifdef CONFIG_FSCACHE_STATS
+ if (!proc_create_single("fs/fscache/stats", S_IFREG | 0444, NULL,
+ fscache_stats_show))
+ goto error;
+#endif
+
+ return 0;
+
+error:
+ remove_proc_entry("fs/fscache", NULL);
+error_dir:
+ return -ENOMEM;
+}
+
+/*
+ * clean up the /proc/fs/fscache/ directory
+ */
+void fscache_proc_cleanup(void)
+{
+ remove_proc_subtree("fs/fscache", NULL);
+}
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
new file mode 100644
index 000000000000..bd92f93e1680
--- /dev/null
+++ b/fs/fscache/stats.c
@@ -0,0 +1,22 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* FS-Cache statistics
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#define FSCACHE_DEBUG_LEVEL CACHE
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include "internal.h"
+
+/*
+ * display the general statistics
+ */
+int fscache_stats_show(struct seq_file *m, void *v)
+{
+ seq_puts(m, "FS-Cache statistics\n");
+
+ netfs_stats_show(m);
+ return 0;
+}
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 47f21a53ac4b..d6910a913918 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -16,4 +16,6 @@

#include <linux/fscache.h>

+extern struct workqueue_struct *fscache_wq;
+
#endif /* _LINUX_FSCACHE_CACHE_H */
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 0364a4ca16f6..1cf90c252aac 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -1,7 +1,7 @@
/* SPDX-License-Identifier: GPL-2.0-or-later */
/* General filesystem caching interface
*
- * Copyright (C) 2004-2007 Red Hat, Inc. All Rights Reserved.
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
* Written by David Howells ([email protected])
*
* NOTE!!! See:
@@ -18,9 +18,13 @@
#include <linux/netfs.h>

#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
+#define __fscache_available (1)
+#define fscache_available() (1)
#define fscache_cookie_valid(cookie) (cookie)
#define fscache_cookie_enabled(cookie) (cookie)
#else
+#define __fscache_available (0)
+#define fscache_available() (0)
#define fscache_cookie_valid(cookie) (0)
#define fscache_cookie_enabled(cookie) (0)
#endif
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
new file mode 100644
index 000000000000..fe214c5cc87f
--- /dev/null
+++ b/include/trace/events/fscache.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* FS-Cache tracepoints
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM fscache
+
+#if !defined(_TRACE_FSCACHE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_FSCACHE_H
+
+#include <linux/fscache.h>
+#include <linux/tracepoint.h>
+
+/*
+ * Define enums for tracing information.
+ */
+#ifndef __FSCACHE_DECLARE_TRACE_ENUMS_ONCE_ONLY
+#define __FSCACHE_DECLARE_TRACE_ENUMS_ONCE_ONLY
+
+#endif
+
+/*
+ * Declare tracing information enums and their string mappings for display.
+ */
+
+/*
+ * Export enum symbols via userspace.
+ */
+#undef EM
+#undef E_
+#define EM(a, b) TRACE_DEFINE_ENUM(a);
+#define E_(a, b) TRACE_DEFINE_ENUM(a);
+
+/*
+ * Now redefine the EM() and E_() macros to map the enums to the strings that
+ * will be printed in the output.
+ */
+#undef EM
+#undef E_
+#define EM(a, b) { a, b },
+#define E_(a, b) { a, b }
+
+
+#endif /* _TRACE_FSCACHE_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>



2021-12-22 23:15:51

by David Howells

[permalink] [raw]
Subject: [PATCH v4 07/68] fscache: Implement a hash function

Implement a function to generate hashes. It needs to be stable over time
and endianness-independent as the hashes will appear on disk in future
patches. It can assume that its input is a multiple of four bytes in size
and alignment.

This is borrowed from the VFS and simplified. le32_to_cpu() is added to
make it endianness-independent.

Changes
=======
ver #3:
- Read the data being hashed in an endianness-independent way[1].
- Change the size parameter to be in bytes rather than words.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/CAHk-=whtkzB446+hX0zdLsdcUJsJ=8_-0S1mE_R+YurThfUbLA@mail.gmail.com [1]
Link: https://lore.kernel.org/r/163819586113.215744.1699465806130102367.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906888735.143852.10944614318596881429.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967082342.1823006.8915671045444488742.stgit@warthog.procyon.org.uk/ # v3
---

fs/fscache/internal.h | 2 ++
fs/fscache/main.c | 40 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 42 insertions(+)

diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index ea52f8594a77..f345bdb018ba 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -22,6 +22,8 @@
*/
extern unsigned fscache_debug;

+extern unsigned int fscache_hash(unsigned int salt, const void *data, size_t len);
+
/*
* proc.c
*/
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index 819de2ee1276..687b34903d5b 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -24,6 +24,46 @@ MODULE_PARM_DESC(fscache_debug,
struct workqueue_struct *fscache_wq;
EXPORT_SYMBOL(fscache_wq);

+/*
+ * Mixing scores (in bits) for (7,20):
+ * Input delta: 1-bit 2-bit
+ * 1 round: 330.3 9201.6
+ * 2 rounds: 1246.4 25475.4
+ * 3 rounds: 1907.1 31295.1
+ * 4 rounds: 2042.3 31718.6
+ * Perfect: 2048 31744
+ * (32*64) (32*31/2 * 64)
+ */
+#define HASH_MIX(x, y, a) \
+ ( x ^= (a), \
+ y ^= x, x = rol32(x, 7),\
+ x += y, y = rol32(y,20),\
+ y *= 9 )
+
+static inline unsigned int fold_hash(unsigned long x, unsigned long y)
+{
+ /* Use arch-optimized multiply if one exists */
+ return __hash_32(y ^ __hash_32(x));
+}
+
+/*
+ * Generate a hash. This is derived from full_name_hash(), but we want to be
+ * sure it is arch independent and that it doesn't change as bits of the
+ * computed hash value might appear on disk. The caller must guarantee that
+ * the source data is a multiple of four bytes in size.
+ */
+unsigned int fscache_hash(unsigned int salt, const void *data, size_t len)
+{
+ const __le32 *p = data;
+ unsigned int a, x = 0, y = salt, n = len / sizeof(__le32);
+
+ for (; n; n--) {
+ a = le32_to_cpu(*p++);
+ HASH_MIX(x, y, a);
+ }
+ return fold_hash(x, y);
+}
+
/*
* initialise the fs caching module
*/



2021-12-22 23:16:02

by David Howells

[permalink] [raw]
Subject: [PATCH v4 08/68] fscache: Implement cache registration

Implement a register of caches and provide functions to manage it.

Two functions are provided for the cache backend to use:

(1) Acquire a cache cookie:

struct fscache_cache *fscache_acquire_cache(const char *name)

This gets the cache cookie for a cache of the specified name and moves
it to the preparation state. If a nameless cache cookie exists, that
will be given this name and used.

(2) Relinquish a cache cookie:

void fscache_relinquish_cache(struct fscache_cache *cache);

This relinquishes a cache cookie, cleans it and makes it available if
it's still referenced by a network filesystem.

Note that network filesystems don't deal with cache cookies directly, but
rather go straight to the volume registration.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819587157.215744.13523139317322503286.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906889665.143852.10378009165231294456.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967085081.1823006.2218944206363626210.stgit@warthog.procyon.org.uk/ # v3
---

fs/fscache/Makefile | 1
fs/fscache/cache.c | 274 ++++++++++++++++++++++++++++++++++++++++
fs/fscache/internal.h | 33 +++++
fs/fscache/proc.c | 4 +
include/linux/fscache-cache.h | 34 +++++
include/trace/events/fscache.h | 43 ++++++
6 files changed, 389 insertions(+)
create mode 100644 fs/fscache/cache.c

diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile
index f9722de32247..d9fc22c18090 100644
--- a/fs/fscache/Makefile
+++ b/fs/fscache/Makefile
@@ -4,6 +4,7 @@
#

fscache-y := \
+ cache.o \
main.o

fscache-$(CONFIG_PROC_FS) += proc.o
diff --git a/fs/fscache/cache.c b/fs/fscache/cache.c
new file mode 100644
index 000000000000..8db77bb9f8e2
--- /dev/null
+++ b/fs/fscache/cache.c
@@ -0,0 +1,274 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* FS-Cache cache handling
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#define FSCACHE_DEBUG_LEVEL CACHE
+#include <linux/export.h>
+#include <linux/slab.h>
+#include "internal.h"
+
+static LIST_HEAD(fscache_caches);
+DECLARE_RWSEM(fscache_addremove_sem);
+EXPORT_SYMBOL(fscache_addremove_sem);
+
+static atomic_t fscache_cache_debug_id;
+
+/*
+ * Allocate a cache cookie.
+ */
+static struct fscache_cache *fscache_alloc_cache(const char *name)
+{
+ struct fscache_cache *cache;
+
+ cache = kzalloc(sizeof(*cache), GFP_KERNEL);
+ if (cache) {
+ if (name) {
+ cache->name = kstrdup(name, GFP_KERNEL);
+ if (!cache->name) {
+ kfree(cache);
+ return NULL;
+ }
+ }
+ refcount_set(&cache->ref, 1);
+ INIT_LIST_HEAD(&cache->cache_link);
+ cache->debug_id = atomic_inc_return(&fscache_cache_debug_id);
+ }
+ return cache;
+}
+
+static bool fscache_get_cache_maybe(struct fscache_cache *cache,
+ enum fscache_cache_trace where)
+{
+ bool success;
+ int ref;
+
+ success = __refcount_inc_not_zero(&cache->ref, &ref);
+ if (success)
+ trace_fscache_cache(cache->debug_id, ref + 1, where);
+ return success;
+}
+
+/*
+ * Look up a cache cookie.
+ */
+struct fscache_cache *fscache_lookup_cache(const char *name, bool is_cache)
+{
+ struct fscache_cache *candidate, *cache, *unnamed = NULL;
+
+ /* firstly check for the existence of the cache under read lock */
+ down_read(&fscache_addremove_sem);
+
+ list_for_each_entry(cache, &fscache_caches, cache_link) {
+ if (cache->name && name && strcmp(cache->name, name) == 0 &&
+ fscache_get_cache_maybe(cache, fscache_cache_get_acquire))
+ goto got_cache_r;
+ if (!cache->name && !name &&
+ fscache_get_cache_maybe(cache, fscache_cache_get_acquire))
+ goto got_cache_r;
+ }
+
+ if (!name) {
+ list_for_each_entry(cache, &fscache_caches, cache_link) {
+ if (cache->name &&
+ fscache_get_cache_maybe(cache, fscache_cache_get_acquire))
+ goto got_cache_r;
+ }
+ }
+
+ up_read(&fscache_addremove_sem);
+
+ /* the cache does not exist - create a candidate */
+ candidate = fscache_alloc_cache(name);
+ if (!candidate)
+ return ERR_PTR(-ENOMEM);
+
+ /* write lock, search again and add if still not present */
+ down_write(&fscache_addremove_sem);
+
+ list_for_each_entry(cache, &fscache_caches, cache_link) {
+ if (cache->name && name && strcmp(cache->name, name) == 0 &&
+ fscache_get_cache_maybe(cache, fscache_cache_get_acquire))
+ goto got_cache_w;
+ if (!cache->name) {
+ unnamed = cache;
+ if (!name &&
+ fscache_get_cache_maybe(cache, fscache_cache_get_acquire))
+ goto got_cache_w;
+ }
+ }
+
+ if (unnamed && is_cache &&
+ fscache_get_cache_maybe(unnamed, fscache_cache_get_acquire))
+ goto use_unnamed_cache;
+
+ if (!name) {
+ list_for_each_entry(cache, &fscache_caches, cache_link) {
+ if (cache->name &&
+ fscache_get_cache_maybe(cache, fscache_cache_get_acquire))
+ goto got_cache_w;
+ }
+ }
+
+ list_add_tail(&candidate->cache_link, &fscache_caches);
+ trace_fscache_cache(candidate->debug_id,
+ refcount_read(&candidate->ref),
+ fscache_cache_new_acquire);
+ up_write(&fscache_addremove_sem);
+ return candidate;
+
+got_cache_r:
+ up_read(&fscache_addremove_sem);
+ return cache;
+use_unnamed_cache:
+ cache = unnamed;
+ cache->name = candidate->name;
+ candidate->name = NULL;
+got_cache_w:
+ up_write(&fscache_addremove_sem);
+ kfree(candidate->name);
+ kfree(candidate);
+ return cache;
+}
+
+/**
+ * fscache_acquire_cache - Acquire a cache-level cookie.
+ * @name: The name of the cache.
+ *
+ * Get a cookie to represent an actual cache. If a name is given and there is
+ * a nameless cache record available, this will acquire that and set its name,
+ * directing all the volumes using it to this cache.
+ *
+ * The cache will be switched over to the preparing state if not currently in
+ * use, otherwise -EBUSY will be returned.
+ */
+struct fscache_cache *fscache_acquire_cache(const char *name)
+{
+ struct fscache_cache *cache;
+
+ ASSERT(name);
+ cache = fscache_lookup_cache(name, true);
+ if (IS_ERR(cache))
+ return cache;
+
+ if (!fscache_set_cache_state_maybe(cache,
+ FSCACHE_CACHE_IS_NOT_PRESENT,
+ FSCACHE_CACHE_IS_PREPARING)) {
+ pr_warn("Cache tag %s in use\n", name);
+ fscache_put_cache(cache, fscache_cache_put_cache);
+ return ERR_PTR(-EBUSY);
+ }
+
+ return cache;
+}
+EXPORT_SYMBOL(fscache_acquire_cache);
+
+/**
+ * fscache_put_cache - Release a cache-level cookie.
+ * @cache: The cache cookie to be released
+ * @where: An indication of where the release happened
+ *
+ * Release the caller's reference on a cache-level cookie. The @where
+ * indication should give information about the circumstances in which the call
+ * occurs and will be logged through a tracepoint.
+ */
+void fscache_put_cache(struct fscache_cache *cache,
+ enum fscache_cache_trace where)
+{
+ unsigned int debug_id = cache->debug_id;
+ bool zero;
+ int ref;
+
+ if (IS_ERR_OR_NULL(cache))
+ return;
+
+ zero = __refcount_dec_and_test(&cache->ref, &ref);
+ trace_fscache_cache(debug_id, ref - 1, where);
+
+ if (zero) {
+ down_write(&fscache_addremove_sem);
+ list_del_init(&cache->cache_link);
+ up_write(&fscache_addremove_sem);
+ kfree(cache->name);
+ kfree(cache);
+ }
+}
+
+/**
+ * fscache_relinquish_cache - Reset cache state and release cookie
+ * @cache: The cache cookie to be released
+ *
+ * Reset the state of a cache and release the caller's reference on a cache
+ * cookie.
+ */
+void fscache_relinquish_cache(struct fscache_cache *cache)
+{
+ enum fscache_cache_trace where =
+ (cache->state == FSCACHE_CACHE_IS_PREPARING) ?
+ fscache_cache_put_prep_failed :
+ fscache_cache_put_relinquish;
+
+ cache->cache_priv = NULL;
+ smp_store_release(&cache->state, FSCACHE_CACHE_IS_NOT_PRESENT);
+ fscache_put_cache(cache, where);
+}
+EXPORT_SYMBOL(fscache_relinquish_cache);
+
+#ifdef CONFIG_PROC_FS
+static const char fscache_cache_states[NR__FSCACHE_CACHE_STATE] = "-PAEW";
+
+/*
+ * Generate a list of caches in /proc/fs/fscache/caches
+ */
+static int fscache_caches_seq_show(struct seq_file *m, void *v)
+{
+ struct fscache_cache *cache;
+
+ if (v == &fscache_caches) {
+ seq_puts(m,
+ "CACHE REF VOLS OBJS ACCES S NAME\n"
+ "======== ===== ===== ===== ===== = ===============\n"
+ );
+ return 0;
+ }
+
+ cache = list_entry(v, struct fscache_cache, cache_link);
+ seq_printf(m,
+ "%08x %5d %5d %5d %5d %c %s\n",
+ cache->debug_id,
+ refcount_read(&cache->ref),
+ atomic_read(&cache->n_volumes),
+ atomic_read(&cache->object_count),
+ atomic_read(&cache->n_accesses),
+ fscache_cache_states[cache->state],
+ cache->name ?: "-");
+ return 0;
+}
+
+static void *fscache_caches_seq_start(struct seq_file *m, loff_t *_pos)
+ __acquires(fscache_addremove_sem)
+{
+ down_read(&fscache_addremove_sem);
+ return seq_list_start_head(&fscache_caches, *_pos);
+}
+
+static void *fscache_caches_seq_next(struct seq_file *m, void *v, loff_t *_pos)
+{
+ return seq_list_next(v, &fscache_caches, _pos);
+}
+
+static void fscache_caches_seq_stop(struct seq_file *m, void *v)
+ __releases(fscache_addremove_sem)
+{
+ up_read(&fscache_addremove_sem);
+}
+
+const struct seq_operations fscache_caches_seq_ops = {
+ .start = fscache_caches_seq_start,
+ .next = fscache_caches_seq_next,
+ .stop = fscache_caches_seq_stop,
+ .show = fscache_caches_seq_show,
+};
+#endif /* CONFIG_PROC_FS */
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index f345bdb018ba..8fd39e7735fc 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -17,6 +17,39 @@
#include <linux/sched.h>
#include <linux/seq_file.h>

+/*
+ * cache.c
+ */
+#ifdef CONFIG_PROC_FS
+extern const struct seq_operations fscache_caches_seq_ops;
+#endif
+struct fscache_cache *fscache_lookup_cache(const char *name, bool is_cache);
+void fscache_put_cache(struct fscache_cache *cache, enum fscache_cache_trace where);
+
+static inline enum fscache_cache_state fscache_cache_state(const struct fscache_cache *cache)
+{
+ return smp_load_acquire(&cache->state);
+}
+
+static inline bool fscache_cache_is_live(const struct fscache_cache *cache)
+{
+ return fscache_cache_state(cache) == FSCACHE_CACHE_IS_ACTIVE;
+}
+
+static inline void fscache_set_cache_state(struct fscache_cache *cache,
+ enum fscache_cache_state new_state)
+{
+ smp_store_release(&cache->state, new_state);
+
+}
+
+static inline bool fscache_set_cache_state_maybe(struct fscache_cache *cache,
+ enum fscache_cache_state old_state,
+ enum fscache_cache_state new_state)
+{
+ return try_cmpxchg_release(&cache->state, &old_state, new_state);
+}
+
/*
* main.c
*/
diff --git a/fs/fscache/proc.c b/fs/fscache/proc.c
index 4d866ac41776..93b925709e09 100644
--- a/fs/fscache/proc.c
+++ b/fs/fscache/proc.c
@@ -19,6 +19,10 @@ int __init fscache_proc_init(void)
if (!proc_mkdir("fs/fscache", NULL))
goto error_dir;

+ if (!proc_create_seq("fs/fscache/caches", S_IFREG | 0444, NULL,
+ &fscache_caches_seq_ops))
+ goto error;
+
#ifdef CONFIG_FSCACHE_STATS
if (!proc_create_single("fs/fscache/stats", S_IFREG | 0444, NULL,
fscache_stats_show))
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index d6910a913918..18cd5c9877bb 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -16,6 +16,40 @@

#include <linux/fscache.h>

+enum fscache_cache_trace;
+enum fscache_access_trace;
+
+enum fscache_cache_state {
+ FSCACHE_CACHE_IS_NOT_PRESENT, /* No cache is present for this name */
+ FSCACHE_CACHE_IS_PREPARING, /* A cache is preparing to come live */
+ FSCACHE_CACHE_IS_ACTIVE, /* Attached cache is active and can be used */
+ FSCACHE_CACHE_GOT_IOERROR, /* Attached cache stopped on I/O error */
+ FSCACHE_CACHE_IS_WITHDRAWN, /* Attached cache is being withdrawn */
+#define NR__FSCACHE_CACHE_STATE (FSCACHE_CACHE_IS_WITHDRAWN + 1)
+};
+
+/*
+ * Cache cookie.
+ */
+struct fscache_cache {
+ struct list_head cache_link; /* Link in cache list */
+ void *cache_priv; /* Private cache data (or NULL) */
+ refcount_t ref;
+ atomic_t n_volumes; /* Number of active volumes; */
+ atomic_t n_accesses; /* Number of in-progress accesses on the cache */
+ atomic_t object_count; /* no. of live objects in this cache */
+ unsigned int debug_id;
+ enum fscache_cache_state state;
+ char *name;
+};
+
extern struct workqueue_struct *fscache_wq;

+/*
+ * out-of-line cache backend functions
+ */
+extern struct rw_semaphore fscache_addremove_sem;
+extern struct fscache_cache *fscache_acquire_cache(const char *name);
+extern void fscache_relinquish_cache(struct fscache_cache *cache);
+
#endif /* _LINUX_FSCACHE_CACHE_H */
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index fe214c5cc87f..3b8e0597b2c1 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -19,11 +19,27 @@
#ifndef __FSCACHE_DECLARE_TRACE_ENUMS_ONCE_ONLY
#define __FSCACHE_DECLARE_TRACE_ENUMS_ONCE_ONLY

+enum fscache_cache_trace {
+ fscache_cache_collision,
+ fscache_cache_get_acquire,
+ fscache_cache_new_acquire,
+ fscache_cache_put_cache,
+ fscache_cache_put_prep_failed,
+ fscache_cache_put_relinquish,
+};
+
#endif

/*
* Declare tracing information enums and their string mappings for display.
*/
+#define fscache_cache_traces \
+ EM(fscache_cache_collision, "*COLLIDE*") \
+ EM(fscache_cache_get_acquire, "GET acq ") \
+ EM(fscache_cache_new_acquire, "NEW acq ") \
+ EM(fscache_cache_put_cache, "PUT cache") \
+ EM(fscache_cache_put_prep_failed, "PUT pfail") \
+ E_(fscache_cache_put_relinquish, "PUT relnq")

/*
* Export enum symbols via userspace.
@@ -33,6 +49,8 @@
#define EM(a, b) TRACE_DEFINE_ENUM(a);
#define E_(a, b) TRACE_DEFINE_ENUM(a);

+fscache_cache_traces;
+
/*
* Now redefine the EM() and E_() macros to map the enums to the strings that
* will be printed in the output.
@@ -43,6 +61,31 @@
#define E_(a, b) { a, b }


+TRACE_EVENT(fscache_cache,
+ TP_PROTO(unsigned int cache_debug_id,
+ int usage,
+ enum fscache_cache_trace where),
+
+ TP_ARGS(cache_debug_id, usage, where),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, cache )
+ __field(int, usage )
+ __field(enum fscache_cache_trace, where )
+ ),
+
+ TP_fast_assign(
+ __entry->cache = cache_debug_id;
+ __entry->usage = usage;
+ __entry->where = where;
+ ),
+
+ TP_printk("C=%08x %s r=%d",
+ __entry->cache,
+ __print_symbolic(__entry->where, fscache_cache_traces),
+ __entry->usage)
+ );
+
#endif /* _TRACE_FSCACHE_H */

/* This part must be outside protection */



2021-12-22 23:16:29

by David Howells

[permalink] [raw]
Subject: [PATCH v4 09/68] fscache: Implement volume registration

Add functions to the fscache API to allow volumes to be acquired and
relinquished by the network filesystem. A volume is an index of data
storage cache objects. A volume is represented by a volume cookie in the
API. A filesystem would typically create a volume for a superblock and
then create per-inode cookies within it.

To request a volume, the filesystem calls:

struct fscache_volume *
fscache_acquire_volume(const char *volume_key,
const char *cache_name,
const void *coherency_data,
size_t coherency_len)

The volume_key is a printable string used to match the volume in the cache.
It should not contain any '/' characters. For AFS, for example, this would
be "afs,<cellname>,<volume_id>", e.g. "afs,example.com,523001".

The cache_name can be NULL, but if not it should be a string indicating the
name of the cache to use if there's more than one available.

The coherency data, if given, is an arbitrarily-sized blob that's attached
to the volume and is compared when the volume is looked up. If it doesn't
match, the old volume is judged to be out of date and it and everything
within it is discarded.

Acquiring a volume twice concurrently is disallowed, though the function
will wait if an old volume cookie is being relinquishing.


When a network filesystem has finished with a volume, it should return the
volume cookie by calling:

void
fscache_relinquish_volume(struct fscache_volume *volume,
const void *coherency_data,
bool invalidate)

If invalidate is true, the entire volume will be discarded; if false, the
volume will be synced and the coherency data will be updated.

Changes
=======
ver #4:
- Removed an extraneous param from kdoc on fscache_relinquish_volume()[3].

ver #3:
- fscache_hash()'s size parameter is now in bytes. Use __le32 as the unit
to round up to.
- When comparing cookies, simply see if the attributes are the same rather
than subtracting them to produce a strcmp-style return[2].
- Make the coherency data an arbitrary blob rather than a u64, but don't
store it for the moment.

ver #2:
- Fix error check[1].
- Make a fscache_acquire_volume() return errors, including EBUSY if a
conflicting volume cookie already exists. No error is printed now -
that's left to the netfs.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/20211203095608.GC2480@kili/ [1]
Link: https://lore.kernel.org/r/CAHk-=whtkzB446+hX0zdLsdcUJsJ=8_-0S1mE_R+YurThfUbLA@mail.gmail.com/ [2]
Link: https://lore.kernel.org/r/[email protected]/ [3]
Link: https://lore.kernel.org/r/163819588944.215744.1629085755564865996.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906890630.143852.13972180614535611154.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967086836.1823006.8191672796841981763.stgit@warthog.procyon.org.uk/ # v3
---

fs/fscache/Makefile | 3
fs/fscache/internal.h | 14 ++
fs/fscache/proc.c | 4
fs/fscache/stats.c | 12 +
fs/fscache/volume.c | 340 ++++++++++++++++++++++++++++++++++++++++
include/linux/fscache.h | 84 ++++++++++
include/trace/events/fscache.h | 61 +++++++
7 files changed, 516 insertions(+), 2 deletions(-)
create mode 100644 fs/fscache/volume.c

diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile
index d9fc22c18090..bb5282ae682f 100644
--- a/fs/fscache/Makefile
+++ b/fs/fscache/Makefile
@@ -5,7 +5,8 @@

fscache-y := \
cache.o \
- main.o
+ main.o \
+ volume.o

fscache-$(CONFIG_PROC_FS) += proc.o
fscache-$(CONFIG_FSCACHE_STATS) += stats.o
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 8fd39e7735fc..07dc9cbc2280 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -72,6 +72,9 @@ extern void fscache_proc_cleanup(void);
* stats.c
*/
#ifdef CONFIG_FSCACHE_STATS
+extern atomic_t fscache_n_volumes;
+extern atomic_t fscache_n_volumes_collision;
+extern atomic_t fscache_n_volumes_nomem;

static inline void fscache_stat(atomic_t *stat)
{
@@ -93,6 +96,17 @@ int fscache_stats_show(struct seq_file *m, void *v);
#define fscache_stat_d(stat) do {} while (0)
#endif

+/*
+ * volume.c
+ */
+extern const struct seq_operations fscache_volumes_seq_ops;
+
+struct fscache_volume *fscache_get_volume(struct fscache_volume *volume,
+ enum fscache_volume_trace where);
+void fscache_put_volume(struct fscache_volume *volume,
+ enum fscache_volume_trace where);
+void fscache_create_volume(struct fscache_volume *volume, bool wait);
+

/*****************************************************************************/
/*
diff --git a/fs/fscache/proc.c b/fs/fscache/proc.c
index 93b925709e09..bc6ecbdd065d 100644
--- a/fs/fscache/proc.c
+++ b/fs/fscache/proc.c
@@ -23,6 +23,10 @@ int __init fscache_proc_init(void)
&fscache_caches_seq_ops))
goto error;

+ if (!proc_create_seq("fs/fscache/volumes", S_IFREG | 0444, NULL,
+ &fscache_volumes_seq_ops))
+ goto error;
+
#ifdef CONFIG_FSCACHE_STATS
if (!proc_create_single("fs/fscache/stats", S_IFREG | 0444, NULL,
fscache_stats_show))
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index bd92f93e1680..b811a4d03585 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -10,12 +10,24 @@
#include <linux/seq_file.h>
#include "internal.h"

+/*
+ * operation counters
+ */
+atomic_t fscache_n_volumes;
+atomic_t fscache_n_volumes_collision;
+atomic_t fscache_n_volumes_nomem;
+
/*
* display the general statistics
*/
int fscache_stats_show(struct seq_file *m, void *v)
{
seq_puts(m, "FS-Cache statistics\n");
+ seq_printf(m, "Cookies: v=%d vcol=%u voom=%u\n",
+ atomic_read(&fscache_n_volumes),
+ atomic_read(&fscache_n_volumes_collision),
+ atomic_read(&fscache_n_volumes_nomem)
+ );

netfs_stats_show(m);
return 0;
diff --git a/fs/fscache/volume.c b/fs/fscache/volume.c
new file mode 100644
index 000000000000..630894fefd02
--- /dev/null
+++ b/fs/fscache/volume.c
@@ -0,0 +1,340 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Volume-level cache cookie handling.
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#define FSCACHE_DEBUG_LEVEL COOKIE
+#include <linux/export.h>
+#include <linux/slab.h>
+#include "internal.h"
+
+#define fscache_volume_hash_shift 10
+static struct hlist_bl_head fscache_volume_hash[1 << fscache_volume_hash_shift];
+static atomic_t fscache_volume_debug_id;
+static LIST_HEAD(fscache_volumes);
+
+struct fscache_volume *fscache_get_volume(struct fscache_volume *volume,
+ enum fscache_volume_trace where)
+{
+ int ref;
+
+ __refcount_inc(&volume->ref, &ref);
+ trace_fscache_volume(volume->debug_id, ref + 1, where);
+ return volume;
+}
+
+static void fscache_see_volume(struct fscache_volume *volume,
+ enum fscache_volume_trace where)
+{
+ int ref = refcount_read(&volume->ref);
+
+ trace_fscache_volume(volume->debug_id, ref, where);
+}
+
+static bool fscache_volume_same(const struct fscache_volume *a,
+ const struct fscache_volume *b)
+{
+ size_t klen;
+
+ if (a->key_hash != b->key_hash ||
+ a->cache != b->cache ||
+ a->key[0] != b->key[0])
+ return false;
+
+ klen = round_up(a->key[0] + 1, sizeof(__le32));
+ return memcmp(a->key, b->key, klen) == 0;
+}
+
+static bool fscache_is_acquire_pending(struct fscache_volume *volume)
+{
+ return test_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &volume->flags);
+}
+
+static void fscache_wait_on_volume_collision(struct fscache_volume *candidate,
+ unsigned int collidee_debug_id)
+{
+ wait_var_event_timeout(&candidate->flags,
+ fscache_is_acquire_pending(candidate), 20 * HZ);
+ if (!fscache_is_acquire_pending(candidate)) {
+ pr_notice("Potential volume collision new=%08x old=%08x",
+ candidate->debug_id, collidee_debug_id);
+ fscache_stat(&fscache_n_volumes_collision);
+ wait_var_event(&candidate->flags, fscache_is_acquire_pending(candidate));
+ }
+}
+
+/*
+ * Attempt to insert the new volume into the hash. If there's a collision, we
+ * wait for the old volume to complete if it's being relinquished and an error
+ * otherwise.
+ */
+static bool fscache_hash_volume(struct fscache_volume *candidate)
+{
+ struct fscache_volume *cursor;
+ struct hlist_bl_head *h;
+ struct hlist_bl_node *p;
+ unsigned int bucket, collidee_debug_id = 0;
+
+ bucket = candidate->key_hash & (ARRAY_SIZE(fscache_volume_hash) - 1);
+ h = &fscache_volume_hash[bucket];
+
+ hlist_bl_lock(h);
+ hlist_bl_for_each_entry(cursor, p, h, hash_link) {
+ if (fscache_volume_same(candidate, cursor)) {
+ if (!test_bit(FSCACHE_VOLUME_RELINQUISHED, &cursor->flags))
+ goto collision;
+ fscache_see_volume(cursor, fscache_volume_get_hash_collision);
+ set_bit(FSCACHE_VOLUME_COLLIDED_WITH, &cursor->flags);
+ set_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &candidate->flags);
+ collidee_debug_id = cursor->debug_id;
+ break;
+ }
+ }
+
+ hlist_bl_add_head(&candidate->hash_link, h);
+ hlist_bl_unlock(h);
+
+ if (test_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &candidate->flags))
+ fscache_wait_on_volume_collision(candidate, collidee_debug_id);
+ return true;
+
+collision:
+ fscache_see_volume(cursor, fscache_volume_collision);
+ hlist_bl_unlock(h);
+ return false;
+}
+
+/*
+ * Allocate and initialise a volume representation cookie.
+ */
+static struct fscache_volume *fscache_alloc_volume(const char *volume_key,
+ const char *cache_name,
+ const void *coherency_data,
+ size_t coherency_len)
+{
+ struct fscache_volume *volume;
+ struct fscache_cache *cache;
+ size_t klen, hlen;
+ char *key;
+
+ cache = fscache_lookup_cache(cache_name, false);
+ if (IS_ERR(cache))
+ return NULL;
+
+ volume = kzalloc(sizeof(*volume), GFP_KERNEL);
+ if (!volume)
+ goto err_cache;
+
+ volume->cache = cache;
+ INIT_LIST_HEAD(&volume->proc_link);
+ INIT_WORK(&volume->work, NULL /* PLACEHOLDER */);
+ refcount_set(&volume->ref, 1);
+ spin_lock_init(&volume->lock);
+
+ /* Stick the length on the front of the key and pad it out to make
+ * hashing easier.
+ */
+ klen = strlen(volume_key);
+ hlen = round_up(1 + klen + 1, sizeof(__le32));
+ key = kzalloc(hlen, GFP_KERNEL);
+ if (!key)
+ goto err_vol;
+ key[0] = klen;
+ memcpy(key + 1, volume_key, klen);
+
+ volume->key = key;
+ volume->key_hash = fscache_hash(0, key, hlen);
+
+ volume->debug_id = atomic_inc_return(&fscache_volume_debug_id);
+ down_write(&fscache_addremove_sem);
+ atomic_inc(&cache->n_volumes);
+ list_add_tail(&volume->proc_link, &fscache_volumes);
+ fscache_see_volume(volume, fscache_volume_new_acquire);
+ fscache_stat(&fscache_n_volumes);
+ up_write(&fscache_addremove_sem);
+ _leave(" = v=%x", volume->debug_id);
+ return volume;
+
+err_vol:
+ kfree(volume);
+err_cache:
+ fscache_put_cache(cache, fscache_cache_put_alloc_volume);
+ fscache_stat(&fscache_n_volumes_nomem);
+ return NULL;
+}
+
+/*
+ * Acquire a volume representation cookie and link it to a (proposed) cache.
+ */
+struct fscache_volume *__fscache_acquire_volume(const char *volume_key,
+ const char *cache_name,
+ const void *coherency_data,
+ size_t coherency_len)
+{
+ struct fscache_volume *volume;
+
+ volume = fscache_alloc_volume(volume_key, cache_name,
+ coherency_data, coherency_len);
+ if (!volume)
+ return ERR_PTR(-ENOMEM);
+
+ if (!fscache_hash_volume(volume)) {
+ fscache_put_volume(volume, fscache_volume_put_hash_collision);
+ return ERR_PTR(-EBUSY);
+ }
+
+ // PLACEHOLDER: Create the volume if we have a cache available
+ return volume;
+}
+EXPORT_SYMBOL(__fscache_acquire_volume);
+
+static void fscache_wake_pending_volume(struct fscache_volume *volume,
+ struct hlist_bl_head *h)
+{
+ struct fscache_volume *cursor;
+ struct hlist_bl_node *p;
+
+ hlist_bl_for_each_entry(cursor, p, h, hash_link) {
+ if (fscache_volume_same(cursor, volume)) {
+ fscache_see_volume(cursor, fscache_volume_see_hash_wake);
+ clear_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &cursor->flags);
+ wake_up_bit(&cursor->flags, FSCACHE_VOLUME_ACQUIRE_PENDING);
+ return;
+ }
+ }
+}
+
+/*
+ * Remove a volume cookie from the hash table.
+ */
+static void fscache_unhash_volume(struct fscache_volume *volume)
+{
+ struct hlist_bl_head *h;
+ unsigned int bucket;
+
+ bucket = volume->key_hash & (ARRAY_SIZE(fscache_volume_hash) - 1);
+ h = &fscache_volume_hash[bucket];
+
+ hlist_bl_lock(h);
+ hlist_bl_del(&volume->hash_link);
+ if (test_bit(FSCACHE_VOLUME_COLLIDED_WITH, &volume->flags))
+ fscache_wake_pending_volume(volume, h);
+ hlist_bl_unlock(h);
+}
+
+/*
+ * Drop a cache's volume attachments.
+ */
+static void fscache_free_volume(struct fscache_volume *volume)
+{
+ struct fscache_cache *cache = volume->cache;
+
+ if (volume->cache_priv) {
+ // PLACEHOLDER: Detach any attached cache
+ }
+
+ down_write(&fscache_addremove_sem);
+ list_del_init(&volume->proc_link);
+ atomic_dec(&volume->cache->n_volumes);
+ up_write(&fscache_addremove_sem);
+
+ if (!hlist_bl_unhashed(&volume->hash_link))
+ fscache_unhash_volume(volume);
+
+ trace_fscache_volume(volume->debug_id, 0, fscache_volume_free);
+ kfree(volume->key);
+ kfree(volume);
+ fscache_stat_d(&fscache_n_volumes);
+ fscache_put_cache(cache, fscache_cache_put_volume);
+}
+
+/*
+ * Drop a reference to a volume cookie.
+ */
+void fscache_put_volume(struct fscache_volume *volume,
+ enum fscache_volume_trace where)
+{
+ if (volume) {
+ unsigned int debug_id = volume->debug_id;
+ bool zero;
+ int ref;
+
+ zero = __refcount_dec_and_test(&volume->ref, &ref);
+ trace_fscache_volume(debug_id, ref - 1, where);
+ if (zero)
+ fscache_free_volume(volume);
+ }
+}
+
+/*
+ * Relinquish a volume representation cookie.
+ */
+void __fscache_relinquish_volume(struct fscache_volume *volume,
+ const void *coherency_data,
+ bool invalidate)
+{
+ if (WARN_ON(test_and_set_bit(FSCACHE_VOLUME_RELINQUISHED, &volume->flags)))
+ return;
+
+ if (invalidate)
+ set_bit(FSCACHE_VOLUME_INVALIDATE, &volume->flags);
+
+ fscache_put_volume(volume, fscache_volume_put_relinquish);
+}
+EXPORT_SYMBOL(__fscache_relinquish_volume);
+
+#ifdef CONFIG_PROC_FS
+/*
+ * Generate a list of volumes in /proc/fs/fscache/volumes
+ */
+static int fscache_volumes_seq_show(struct seq_file *m, void *v)
+{
+ struct fscache_volume *volume;
+
+ if (v == &fscache_volumes) {
+ seq_puts(m,
+ "VOLUME REF nCOOK ACC FL CACHE KEY\n"
+ "======== ===== ===== === == =============== ================\n");
+ return 0;
+ }
+
+ volume = list_entry(v, struct fscache_volume, proc_link);
+ seq_printf(m,
+ "%08x %5d %5d %3d %02lx %-15.15s %s\n",
+ volume->debug_id,
+ refcount_read(&volume->ref),
+ atomic_read(&volume->n_cookies),
+ atomic_read(&volume->n_accesses),
+ volume->flags,
+ volume->cache->name ?: "-",
+ volume->key + 1);
+ return 0;
+}
+
+static void *fscache_volumes_seq_start(struct seq_file *m, loff_t *_pos)
+ __acquires(&fscache_addremove_sem)
+{
+ down_read(&fscache_addremove_sem);
+ return seq_list_start_head(&fscache_volumes, *_pos);
+}
+
+static void *fscache_volumes_seq_next(struct seq_file *m, void *v, loff_t *_pos)
+{
+ return seq_list_next(v, &fscache_volumes, _pos);
+}
+
+static void fscache_volumes_seq_stop(struct seq_file *m, void *v)
+ __releases(&fscache_addremove_sem)
+{
+ up_read(&fscache_addremove_sem);
+}
+
+const struct seq_operations fscache_volumes_seq_ops = {
+ .start = fscache_volumes_seq_start,
+ .next = fscache_volumes_seq_next,
+ .stop = fscache_volumes_seq_stop,
+ .show = fscache_volumes_seq_show,
+};
+#endif /* CONFIG_PROC_FS */
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 1cf90c252aac..131a741a6652 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -20,13 +20,97 @@
#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
#define __fscache_available (1)
#define fscache_available() (1)
+#define fscache_volume_valid(volume) (volume)
#define fscache_cookie_valid(cookie) (cookie)
#define fscache_cookie_enabled(cookie) (cookie)
#else
#define __fscache_available (0)
#define fscache_available() (0)
+#define fscache_volume_valid(volume) (0)
#define fscache_cookie_valid(cookie) (0)
#define fscache_cookie_enabled(cookie) (0)
#endif

+/*
+ * Volume representation cookie.
+ */
+struct fscache_volume {
+ refcount_t ref;
+ atomic_t n_cookies; /* Number of data cookies in volume */
+ atomic_t n_accesses; /* Number of cache accesses in progress */
+ unsigned int debug_id;
+ unsigned int key_hash; /* Hash of key string */
+ char *key; /* Volume ID, eg. "[email protected]@1234" */
+ struct list_head proc_link; /* Link in /proc/fs/fscache/volumes */
+ struct hlist_bl_node hash_link; /* Link in hash table */
+ struct work_struct work;
+ struct fscache_cache *cache; /* The cache in which this resides */
+ void *cache_priv; /* Cache private data */
+ spinlock_t lock;
+ unsigned long flags;
+#define FSCACHE_VOLUME_RELINQUISHED 0 /* Volume is being cleaned up */
+#define FSCACHE_VOLUME_INVALIDATE 1 /* Volume was invalidated */
+#define FSCACHE_VOLUME_COLLIDED_WITH 2 /* Volume was collided with */
+#define FSCACHE_VOLUME_ACQUIRE_PENDING 3 /* Volume is waiting to complete acquisition */
+#define FSCACHE_VOLUME_CREATING 4 /* Volume is being created on disk */
+};
+
+/*
+ * slow-path functions for when there is actually caching available, and the
+ * netfs does actually have a valid token
+ * - these are not to be called directly
+ * - these are undefined symbols when FS-Cache is not configured and the
+ * optimiser takes care of not using them
+ */
+extern struct fscache_volume *__fscache_acquire_volume(const char *, const char *,
+ const void *, size_t);
+extern void __fscache_relinquish_volume(struct fscache_volume *, const void *, bool);
+
+/**
+ * fscache_acquire_volume - Register a volume as desiring caching services
+ * @volume_key: An identification string for the volume
+ * @cache_name: The name of the cache to use (or NULL for the default)
+ * @coherency_data: Piece of arbitrary coherency data to check (or NULL)
+ * @coherency_len: The size of the coherency data
+ *
+ * Register a volume as desiring caching services if they're available. The
+ * caller must provide an identifier for the volume and may also indicate which
+ * cache it should be in. If a preexisting volume entry is found in the cache,
+ * the coherency data must match otherwise the entry will be invalidated.
+ *
+ * Returns a cookie pointer on success, -ENOMEM if out of memory or -EBUSY if a
+ * cache volume of that name is already acquired. Note that "NULL" is a valid
+ * cookie pointer and can be returned if caching is refused.
+ */
+static inline
+struct fscache_volume *fscache_acquire_volume(const char *volume_key,
+ const char *cache_name,
+ const void *coherency_data,
+ size_t coherency_len)
+{
+ if (!fscache_available())
+ return NULL;
+ return __fscache_acquire_volume(volume_key, cache_name,
+ coherency_data, coherency_len);
+}
+
+/**
+ * fscache_relinquish_volume - Cease caching a volume
+ * @volume: The volume cookie
+ * @coherency_data: Piece of arbitrary coherency data to set (or NULL)
+ * @invalidate: True if the volume should be invalidated
+ *
+ * Indicate that a filesystem no longer desires caching services for a volume.
+ * The caller must have relinquished all file cookies prior to calling this.
+ * The stored coherency data is updated.
+ */
+static inline
+void fscache_relinquish_volume(struct fscache_volume *volume,
+ const void *coherency_data,
+ bool invalidate)
+{
+ if (fscache_volume_valid(volume))
+ __fscache_relinquish_volume(volume, coherency_data, invalidate);
+}
+
#endif /* _LINUX_FSCACHE_H */
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index 3b8e0597b2c1..eeb3e7d88e20 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -23,9 +23,26 @@ enum fscache_cache_trace {
fscache_cache_collision,
fscache_cache_get_acquire,
fscache_cache_new_acquire,
+ fscache_cache_put_alloc_volume,
fscache_cache_put_cache,
fscache_cache_put_prep_failed,
fscache_cache_put_relinquish,
+ fscache_cache_put_volume,
+};
+
+enum fscache_volume_trace {
+ fscache_volume_collision,
+ fscache_volume_get_cookie,
+ fscache_volume_get_create_work,
+ fscache_volume_get_hash_collision,
+ fscache_volume_free,
+ fscache_volume_new_acquire,
+ fscache_volume_put_cookie,
+ fscache_volume_put_create_work,
+ fscache_volume_put_hash_collision,
+ fscache_volume_put_relinquish,
+ fscache_volume_see_create_work,
+ fscache_volume_see_hash_wake,
};

#endif
@@ -37,9 +54,25 @@ enum fscache_cache_trace {
EM(fscache_cache_collision, "*COLLIDE*") \
EM(fscache_cache_get_acquire, "GET acq ") \
EM(fscache_cache_new_acquire, "NEW acq ") \
+ EM(fscache_cache_put_alloc_volume, "PUT alvol") \
EM(fscache_cache_put_cache, "PUT cache") \
EM(fscache_cache_put_prep_failed, "PUT pfail") \
- E_(fscache_cache_put_relinquish, "PUT relnq")
+ EM(fscache_cache_put_relinquish, "PUT relnq") \
+ E_(fscache_cache_put_volume, "PUT vol ")
+
+#define fscache_volume_traces \
+ EM(fscache_volume_collision, "*COLLIDE*") \
+ EM(fscache_volume_get_cookie, "GET cook ") \
+ EM(fscache_volume_get_create_work, "GET creat") \
+ EM(fscache_volume_get_hash_collision, "GET hcoll") \
+ EM(fscache_volume_free, "FREE ") \
+ EM(fscache_volume_new_acquire, "NEW acq ") \
+ EM(fscache_volume_put_cookie, "PUT cook ") \
+ EM(fscache_volume_put_create_work, "PUT creat") \
+ EM(fscache_volume_put_hash_collision, "PUT hcoll") \
+ EM(fscache_volume_put_relinquish, "PUT relnq") \
+ EM(fscache_volume_see_create_work, "SEE creat") \
+ E_(fscache_volume_see_hash_wake, "SEE hwake")

/*
* Export enum symbols via userspace.
@@ -50,6 +83,7 @@ enum fscache_cache_trace {
#define E_(a, b) TRACE_DEFINE_ENUM(a);

fscache_cache_traces;
+fscache_volume_traces;

/*
* Now redefine the EM() and E_() macros to map the enums to the strings that
@@ -86,6 +120,31 @@ TRACE_EVENT(fscache_cache,
__entry->usage)
);

+TRACE_EVENT(fscache_volume,
+ TP_PROTO(unsigned int volume_debug_id,
+ int usage,
+ enum fscache_volume_trace where),
+
+ TP_ARGS(volume_debug_id, usage, where),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, volume )
+ __field(int, usage )
+ __field(enum fscache_volume_trace, where )
+ ),
+
+ TP_fast_assign(
+ __entry->volume = volume_debug_id;
+ __entry->usage = usage;
+ __entry->where = where;
+ ),
+
+ TP_printk("V=%08x %s u=%d",
+ __entry->volume,
+ __print_symbolic(__entry->where, fscache_volume_traces),
+ __entry->usage)
+ );
+
#endif /* _TRACE_FSCACHE_H */

/* This part must be outside protection */



2021-12-22 23:16:43

by David Howells

[permalink] [raw]
Subject: [PATCH v4 10/68] fscache: Implement cookie registration

Add functions to the fscache API to allow data file cookies to be acquired
and relinquished by the network filesystem. It is intended that the
filesystem will create such cookies per-inode under a volume.

To request a cookie, the filesystem should call:

struct fscache_cookie *
fscache_acquire_cookie(struct fscache_volume *volume,
u8 advice,
const void *index_key,
size_t index_key_len,
const void *aux_data,
size_t aux_data_len,
loff_t object_size)


The filesystem must first have created a volume cookie, which is passed in
here. If it passes in NULL then the function will just return a NULL
cookie.

A binary key should be passed in index_key and is of size index_key_len.
This is saved in the cookie and is used to locate the associated data in
the cache.

A coherency data buffer of size aux_data_len will be allocated and
initialised from the buffer pointed to by aux_data. This is used to
validate cache objects when they're opened and is stored on disk with them
when they're committed. The data is stored in the cookie and will be
updateable by various functions in later patches.

The object_size must also be given. This is also used to perform a
coherency check and to size the backing storage appropriately.

This function disallows a cookie from being acquired twice in parallel,
though it will cause the second user to wait if the first is busy
relinquishing its cookie.


When a network filesystem has finished with a cookie, it should call:

void
fscache_relinquish_cookie(struct fscache_volume *volume,
bool retire)

If retire is true, any backing data will be discarded immediately.

Changes
=======
ver #3:
- fscache_hash()'s size parameter is now in bytes. Use __le32 as the unit
to round up to.
- When comparing cookies, simply see if the attributes are the same rather
than subtracting them to produce a strcmp-style return[1].
- Add a check to see if the cookie is still hashed at the point of
freeing.

ver #2:
- Don't hold n_accesses elevated whilst cache is bound to a cookie, but
rather add a flag that prevents the state machine from being queued when
n_accesses reaches 0.
- Remove the unused cookie pointer field from the fscache_acquire
tracepoint.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/CAHk-=whtkzB446+hX0zdLsdcUJsJ=8_-0S1mE_R+YurThfUbLA@mail.gmail.com/ [1]
Link: https://lore.kernel.org/r/163819590658.215744.14934902514281054323.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906891983.143852.6219772337558577395.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967088507.1823006.12659006350221417165.stgit@warthog.procyon.org.uk/ # v3
---

fs/fscache/Makefile | 1
fs/fscache/cookie.c | 497 ++++++++++++++++++++++++++++++++++++++++
fs/fscache/internal.h | 23 ++
fs/fscache/main.c | 12 +
fs/fscache/proc.c | 4
fs/fscache/stats.c | 28 ++
include/linux/fscache-cache.h | 22 ++
include/linux/fscache.h | 134 +++++++++++
include/trace/events/fscache.h | 111 +++++++++
9 files changed, 831 insertions(+), 1 deletion(-)
create mode 100644 fs/fscache/cookie.c

diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile
index bb5282ae682f..bcc79615f93a 100644
--- a/fs/fscache/Makefile
+++ b/fs/fscache/Makefile
@@ -5,6 +5,7 @@

fscache-y := \
cache.o \
+ cookie.o \
main.o \
volume.o

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
new file mode 100644
index 000000000000..438b0098aa73
--- /dev/null
+++ b/fs/fscache/cookie.c
@@ -0,0 +1,497 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* netfs cookie management
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ *
+ * See Documentation/filesystems/caching/netfs-api.rst for more information on
+ * the netfs API.
+ */
+
+#define FSCACHE_DEBUG_LEVEL COOKIE
+#include <linux/module.h>
+#include <linux/slab.h>
+#include "internal.h"
+
+struct kmem_cache *fscache_cookie_jar;
+
+static void fscache_drop_cookie(struct fscache_cookie *cookie);
+
+#define fscache_cookie_hash_shift 15
+static struct hlist_bl_head fscache_cookie_hash[1 << fscache_cookie_hash_shift];
+static LIST_HEAD(fscache_cookies);
+static DEFINE_RWLOCK(fscache_cookies_lock);
+static const char fscache_cookie_states[FSCACHE_COOKIE_STATE__NR] = "-LCAFWRD";
+
+void fscache_print_cookie(struct fscache_cookie *cookie, char prefix)
+{
+ const u8 *k;
+
+ pr_err("%c-cookie c=%08x [fl=%lx na=%u nA=%u s=%c]\n",
+ prefix,
+ cookie->debug_id,
+ cookie->flags,
+ atomic_read(&cookie->n_active),
+ atomic_read(&cookie->n_accesses),
+ fscache_cookie_states[cookie->state]);
+ pr_err("%c-cookie V=%08x [%s]\n",
+ prefix,
+ cookie->volume->debug_id,
+ cookie->volume->key);
+
+ k = (cookie->key_len <= sizeof(cookie->inline_key)) ?
+ cookie->inline_key : cookie->key;
+ pr_err("%c-key=[%u] '%*phN'\n", prefix, cookie->key_len, cookie->key_len, k);
+}
+
+static void fscache_free_cookie(struct fscache_cookie *cookie)
+{
+ if (WARN_ON_ONCE(test_bit(FSCACHE_COOKIE_IS_HASHED, &cookie->flags))) {
+ fscache_print_cookie(cookie, 'F');
+ return;
+ }
+
+ write_lock(&fscache_cookies_lock);
+ list_del(&cookie->proc_link);
+ write_unlock(&fscache_cookies_lock);
+ if (cookie->aux_len > sizeof(cookie->inline_aux))
+ kfree(cookie->aux);
+ if (cookie->key_len > sizeof(cookie->inline_key))
+ kfree(cookie->key);
+ fscache_stat_d(&fscache_n_cookies);
+ kmem_cache_free(fscache_cookie_jar, cookie);
+}
+
+static inline void wake_up_cookie_state(struct fscache_cookie *cookie)
+{
+ /* Use a barrier to ensure that waiters see the state variable
+ * change, as spin_unlock doesn't guarantee a barrier.
+ *
+ * See comments over wake_up_bit() and waitqueue_active().
+ */
+ smp_mb();
+ wake_up_var(&cookie->state);
+}
+
+static void __fscache_set_cookie_state(struct fscache_cookie *cookie,
+ enum fscache_cookie_state state)
+{
+ cookie->state = state;
+}
+
+/*
+ * Change the state a cookie is at and wake up anyone waiting for that - but
+ * only if the cookie isn't already marked as being in a cleanup state.
+ */
+void fscache_set_cookie_state(struct fscache_cookie *cookie,
+ enum fscache_cookie_state state)
+{
+ bool changed = false;
+
+ spin_lock(&cookie->lock);
+ switch (cookie->state) {
+ case FSCACHE_COOKIE_STATE_RELINQUISHING:
+ break;
+ default:
+ __fscache_set_cookie_state(cookie, state);
+ changed = true;
+ break;
+ }
+ spin_unlock(&cookie->lock);
+ if (changed)
+ wake_up_cookie_state(cookie);
+}
+EXPORT_SYMBOL(fscache_set_cookie_state);
+
+/*
+ * Set the index key in a cookie. The cookie struct has space for a 16-byte
+ * key plus length and hash, but if that's not big enough, it's instead a
+ * pointer to a buffer containing 3 bytes of hash, 1 byte of length and then
+ * the key data.
+ */
+static int fscache_set_key(struct fscache_cookie *cookie,
+ const void *index_key, size_t index_key_len)
+{
+ void *buf;
+ size_t buf_size;
+
+ buf_size = round_up(index_key_len, sizeof(__le32));
+
+ if (index_key_len > sizeof(cookie->inline_key)) {
+ buf = kzalloc(buf_size, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+ cookie->key = buf;
+ } else {
+ buf = cookie->inline_key;
+ }
+
+ memcpy(buf, index_key, index_key_len);
+ cookie->key_hash = fscache_hash(cookie->volume->key_hash,
+ buf, buf_size);
+ return 0;
+}
+
+static bool fscache_cookie_same(const struct fscache_cookie *a,
+ const struct fscache_cookie *b)
+{
+ const void *ka, *kb;
+
+ if (a->key_hash != b->key_hash ||
+ a->volume != b->volume ||
+ a->key_len != b->key_len)
+ return false;
+
+ if (a->key_len <= sizeof(a->inline_key)) {
+ ka = &a->inline_key;
+ kb = &b->inline_key;
+ } else {
+ ka = a->key;
+ kb = b->key;
+ }
+ return memcmp(ka, kb, a->key_len) == 0;
+}
+
+static atomic_t fscache_cookie_debug_id = ATOMIC_INIT(1);
+
+/*
+ * Allocate a cookie.
+ */
+static struct fscache_cookie *fscache_alloc_cookie(
+ struct fscache_volume *volume,
+ u8 advice,
+ const void *index_key, size_t index_key_len,
+ const void *aux_data, size_t aux_data_len,
+ loff_t object_size)
+{
+ struct fscache_cookie *cookie;
+
+ /* allocate and initialise a cookie */
+ cookie = kmem_cache_zalloc(fscache_cookie_jar, GFP_KERNEL);
+ if (!cookie)
+ return NULL;
+ fscache_stat(&fscache_n_cookies);
+
+ cookie->volume = volume;
+ cookie->advice = advice;
+ cookie->key_len = index_key_len;
+ cookie->aux_len = aux_data_len;
+ cookie->object_size = object_size;
+ if (object_size == 0)
+ __set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
+
+ if (fscache_set_key(cookie, index_key, index_key_len) < 0)
+ goto nomem;
+
+ if (cookie->aux_len <= sizeof(cookie->inline_aux)) {
+ memcpy(cookie->inline_aux, aux_data, cookie->aux_len);
+ } else {
+ cookie->aux = kmemdup(aux_data, cookie->aux_len, GFP_KERNEL);
+ if (!cookie->aux)
+ goto nomem;
+ }
+
+ refcount_set(&cookie->ref, 1);
+ cookie->debug_id = atomic_inc_return(&fscache_cookie_debug_id);
+ cookie->state = FSCACHE_COOKIE_STATE_QUIESCENT;
+ spin_lock_init(&cookie->lock);
+ INIT_LIST_HEAD(&cookie->commit_link);
+ INIT_WORK(&cookie->work, NULL /* PLACEHOLDER */);
+
+ write_lock(&fscache_cookies_lock);
+ list_add_tail(&cookie->proc_link, &fscache_cookies);
+ write_unlock(&fscache_cookies_lock);
+ fscache_see_cookie(cookie, fscache_cookie_new_acquire);
+ return cookie;
+
+nomem:
+ fscache_free_cookie(cookie);
+ return NULL;
+}
+
+static void fscache_wait_on_collision(struct fscache_cookie *candidate,
+ struct fscache_cookie *wait_for)
+{
+ enum fscache_cookie_state *statep = &wait_for->state;
+
+ wait_var_event_timeout(statep, READ_ONCE(*statep) == FSCACHE_COOKIE_STATE_DROPPED,
+ 20 * HZ);
+ if (READ_ONCE(*statep) != FSCACHE_COOKIE_STATE_DROPPED) {
+ pr_notice("Potential collision c=%08x old: c=%08x",
+ candidate->debug_id, wait_for->debug_id);
+ wait_var_event(statep, READ_ONCE(*statep) == FSCACHE_COOKIE_STATE_DROPPED);
+ }
+}
+
+/*
+ * Attempt to insert the new cookie into the hash. If there's a collision, we
+ * wait for the old cookie to complete if it's being relinquished and an error
+ * otherwise.
+ */
+static bool fscache_hash_cookie(struct fscache_cookie *candidate)
+{
+ struct fscache_cookie *cursor, *wait_for = NULL;
+ struct hlist_bl_head *h;
+ struct hlist_bl_node *p;
+ unsigned int bucket;
+
+ bucket = candidate->key_hash & (ARRAY_SIZE(fscache_cookie_hash) - 1);
+ h = &fscache_cookie_hash[bucket];
+
+ hlist_bl_lock(h);
+ hlist_bl_for_each_entry(cursor, p, h, hash_link) {
+ if (fscache_cookie_same(candidate, cursor)) {
+ if (!test_bit(FSCACHE_COOKIE_RELINQUISHED, &cursor->flags))
+ goto collision;
+ wait_for = fscache_get_cookie(cursor,
+ fscache_cookie_get_hash_collision);
+ break;
+ }
+ }
+
+ fscache_get_volume(candidate->volume, fscache_volume_get_cookie);
+ atomic_inc(&candidate->volume->n_cookies);
+ hlist_bl_add_head(&candidate->hash_link, h);
+ set_bit(FSCACHE_COOKIE_IS_HASHED, &candidate->flags);
+ hlist_bl_unlock(h);
+
+ if (wait_for) {
+ fscache_wait_on_collision(candidate, wait_for);
+ fscache_put_cookie(wait_for, fscache_cookie_put_hash_collision);
+ }
+ return true;
+
+collision:
+ trace_fscache_cookie(cursor->debug_id, refcount_read(&cursor->ref),
+ fscache_cookie_collision);
+ pr_err("Duplicate cookie detected\n");
+ fscache_print_cookie(cursor, 'O');
+ fscache_print_cookie(candidate, 'N');
+ hlist_bl_unlock(h);
+ return false;
+}
+
+/*
+ * Request a cookie to represent a data storage object within a volume.
+ *
+ * We never let on to the netfs about errors. We may set a negative cookie
+ * pointer, but that's okay
+ */
+struct fscache_cookie *__fscache_acquire_cookie(
+ struct fscache_volume *volume,
+ u8 advice,
+ const void *index_key, size_t index_key_len,
+ const void *aux_data, size_t aux_data_len,
+ loff_t object_size)
+{
+ struct fscache_cookie *cookie;
+
+ _enter("V=%x", volume->debug_id);
+
+ if (!index_key || !index_key_len || index_key_len > 255 || aux_data_len > 255)
+ return NULL;
+ if (!aux_data || !aux_data_len) {
+ aux_data = NULL;
+ aux_data_len = 0;
+ }
+
+ fscache_stat(&fscache_n_acquires);
+
+ cookie = fscache_alloc_cookie(volume, advice,
+ index_key, index_key_len,
+ aux_data, aux_data_len,
+ object_size);
+ if (!cookie) {
+ fscache_stat(&fscache_n_acquires_oom);
+ return NULL;
+ }
+
+ if (!fscache_hash_cookie(cookie)) {
+ fscache_see_cookie(cookie, fscache_cookie_discard);
+ fscache_free_cookie(cookie);
+ return NULL;
+ }
+
+ trace_fscache_acquire(cookie);
+ fscache_stat(&fscache_n_acquires_ok);
+ _leave(" = c=%08x", cookie->debug_id);
+ return cookie;
+}
+EXPORT_SYMBOL(__fscache_acquire_cookie);
+
+/*
+ * Remove a cookie from the hash table.
+ */
+static void fscache_unhash_cookie(struct fscache_cookie *cookie)
+{
+ struct hlist_bl_head *h;
+ unsigned int bucket;
+
+ bucket = cookie->key_hash & (ARRAY_SIZE(fscache_cookie_hash) - 1);
+ h = &fscache_cookie_hash[bucket];
+
+ hlist_bl_lock(h);
+ hlist_bl_del(&cookie->hash_link);
+ clear_bit(FSCACHE_COOKIE_IS_HASHED, &cookie->flags);
+ hlist_bl_unlock(h);
+}
+
+/*
+ * Finalise a cookie after all its resources have been disposed of.
+ */
+static void fscache_drop_cookie(struct fscache_cookie *cookie)
+{
+ spin_lock(&cookie->lock);
+ __fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_DROPPED);
+ spin_unlock(&cookie->lock);
+ wake_up_cookie_state(cookie);
+
+ fscache_unhash_cookie(cookie);
+ fscache_stat(&fscache_n_relinquishes_dropped);
+}
+
+/*
+ * Allow the netfs to release a cookie back to the cache.
+ * - the object will be marked as recyclable on disk if retire is true
+ */
+void __fscache_relinquish_cookie(struct fscache_cookie *cookie, bool retire)
+{
+ fscache_stat(&fscache_n_relinquishes);
+ if (retire)
+ fscache_stat(&fscache_n_relinquishes_retire);
+
+ _enter("c=%08x{%d},%d",
+ cookie->debug_id, atomic_read(&cookie->n_active), retire);
+
+ if (WARN(test_and_set_bit(FSCACHE_COOKIE_RELINQUISHED, &cookie->flags),
+ "Cookie c=%x already relinquished\n", cookie->debug_id))
+ return;
+
+ if (retire)
+ set_bit(FSCACHE_COOKIE_RETIRED, &cookie->flags);
+ trace_fscache_relinquish(cookie, retire);
+
+ ASSERTCMP(atomic_read(&cookie->n_active), ==, 0);
+ ASSERTCMP(atomic_read(&cookie->volume->n_cookies), >, 0);
+ atomic_dec(&cookie->volume->n_cookies);
+
+ set_bit(FSCACHE_COOKIE_DO_RELINQUISH, &cookie->flags);
+
+ if (test_bit(FSCACHE_COOKIE_HAS_BEEN_CACHED, &cookie->flags))
+ ; // PLACEHOLDER: Do something here if the cookie was cached
+ else
+ fscache_drop_cookie(cookie);
+ fscache_put_cookie(cookie, fscache_cookie_put_relinquish);
+}
+EXPORT_SYMBOL(__fscache_relinquish_cookie);
+
+/*
+ * Drop a reference to a cookie.
+ */
+void fscache_put_cookie(struct fscache_cookie *cookie,
+ enum fscache_cookie_trace where)
+{
+ struct fscache_volume *volume = cookie->volume;
+ unsigned int cookie_debug_id = cookie->debug_id;
+ bool zero;
+ int ref;
+
+ zero = __refcount_dec_and_test(&cookie->ref, &ref);
+ trace_fscache_cookie(cookie_debug_id, ref - 1, where);
+ if (zero) {
+ fscache_free_cookie(cookie);
+ fscache_put_volume(volume, fscache_volume_put_cookie);
+ }
+}
+EXPORT_SYMBOL(fscache_put_cookie);
+
+/*
+ * Get a reference to a cookie.
+ */
+struct fscache_cookie *fscache_get_cookie(struct fscache_cookie *cookie,
+ enum fscache_cookie_trace where)
+{
+ int ref;
+
+ __refcount_inc(&cookie->ref, &ref);
+ trace_fscache_cookie(cookie->debug_id, ref + 1, where);
+ return cookie;
+}
+EXPORT_SYMBOL(fscache_get_cookie);
+
+/*
+ * Generate a list of extant cookies in /proc/fs/fscache/cookies
+ */
+static int fscache_cookies_seq_show(struct seq_file *m, void *v)
+{
+ struct fscache_cookie *cookie;
+ unsigned int keylen = 0, auxlen = 0;
+ u8 *p;
+
+ if (v == &fscache_cookies) {
+ seq_puts(m,
+ "COOKIE VOLUME REF ACT ACC S FL DEF \n"
+ "======== ======== === === === = == ================\n"
+ );
+ return 0;
+ }
+
+ cookie = list_entry(v, struct fscache_cookie, proc_link);
+
+ seq_printf(m,
+ "%08x %08x %3d %3d %3d %c %02lx",
+ cookie->debug_id,
+ cookie->volume->debug_id,
+ refcount_read(&cookie->ref),
+ atomic_read(&cookie->n_active),
+ atomic_read(&cookie->n_accesses),
+ fscache_cookie_states[cookie->state],
+ cookie->flags);
+
+ keylen = cookie->key_len;
+ auxlen = cookie->aux_len;
+
+ if (keylen > 0 || auxlen > 0) {
+ seq_puts(m, " ");
+ p = keylen <= sizeof(cookie->inline_key) ?
+ cookie->inline_key : cookie->key;
+ for (; keylen > 0; keylen--)
+ seq_printf(m, "%02x", *p++);
+ if (auxlen > 0) {
+ seq_puts(m, ", ");
+ p = auxlen <= sizeof(cookie->inline_aux) ?
+ cookie->inline_aux : cookie->aux;
+ for (; auxlen > 0; auxlen--)
+ seq_printf(m, "%02x", *p++);
+ }
+ }
+
+ seq_puts(m, "\n");
+ return 0;
+}
+
+static void *fscache_cookies_seq_start(struct seq_file *m, loff_t *_pos)
+ __acquires(fscache_cookies_lock)
+{
+ read_lock(&fscache_cookies_lock);
+ return seq_list_start_head(&fscache_cookies, *_pos);
+}
+
+static void *fscache_cookies_seq_next(struct seq_file *m, void *v, loff_t *_pos)
+{
+ return seq_list_next(v, &fscache_cookies, _pos);
+}
+
+static void fscache_cookies_seq_stop(struct seq_file *m, void *v)
+ __releases(rcu)
+{
+ read_unlock(&fscache_cookies_lock);
+}
+
+
+const struct seq_operations fscache_cookies_seq_ops = {
+ .start = fscache_cookies_seq_start,
+ .next = fscache_cookies_seq_next,
+ .stop = fscache_cookies_seq_stop,
+ .show = fscache_cookies_seq_show,
+};
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 07dc9cbc2280..71c897757d44 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -50,6 +50,20 @@ static inline bool fscache_set_cache_state_maybe(struct fscache_cache *cache,
return try_cmpxchg_release(&cache->state, &old_state, new_state);
}

+/*
+ * cookie.c
+ */
+extern struct kmem_cache *fscache_cookie_jar;
+extern const struct seq_operations fscache_cookies_seq_ops;
+
+extern void fscache_print_cookie(struct fscache_cookie *cookie, char prefix);
+static inline void fscache_see_cookie(struct fscache_cookie *cookie,
+ enum fscache_cookie_trace where)
+{
+ trace_fscache_cookie(cookie->debug_id, refcount_read(&cookie->ref),
+ where);
+}
+
/*
* main.c
*/
@@ -75,6 +89,15 @@ extern void fscache_proc_cleanup(void);
extern atomic_t fscache_n_volumes;
extern atomic_t fscache_n_volumes_collision;
extern atomic_t fscache_n_volumes_nomem;
+extern atomic_t fscache_n_cookies;
+
+extern atomic_t fscache_n_acquires;
+extern atomic_t fscache_n_acquires_ok;
+extern atomic_t fscache_n_acquires_oom;
+
+extern atomic_t fscache_n_relinquishes;
+extern atomic_t fscache_n_relinquishes_retire;
+extern atomic_t fscache_n_relinquishes_dropped;

static inline void fscache_stat(atomic_t *stat)
{
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index 687b34903d5b..ae493e9ca1c9 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -79,9 +79,20 @@ static int __init fscache_init(void)
if (ret < 0)
goto error_proc;

+ fscache_cookie_jar = kmem_cache_create("fscache_cookie_jar",
+ sizeof(struct fscache_cookie),
+ 0, 0, NULL);
+ if (!fscache_cookie_jar) {
+ pr_notice("Failed to allocate a cookie jar\n");
+ ret = -ENOMEM;
+ goto error_cookie_jar;
+ }
+
pr_notice("Loaded\n");
return 0;

+error_cookie_jar:
+ fscache_proc_cleanup();
error_proc:
destroy_workqueue(fscache_wq);
error_wq:
@@ -97,6 +108,7 @@ static void __exit fscache_exit(void)
{
_enter("");

+ kmem_cache_destroy(fscache_cookie_jar);
fscache_proc_cleanup();
destroy_workqueue(fscache_wq);
pr_notice("Unloaded\n");
diff --git a/fs/fscache/proc.c b/fs/fscache/proc.c
index bc6ecbdd065d..dc3b0e9c8cce 100644
--- a/fs/fscache/proc.c
+++ b/fs/fscache/proc.c
@@ -27,6 +27,10 @@ int __init fscache_proc_init(void)
&fscache_volumes_seq_ops))
goto error;

+ if (!proc_create_seq("fs/fscache/cookies", S_IFREG | 0444, NULL,
+ &fscache_cookies_seq_ops))
+ goto error;
+
#ifdef CONFIG_FSCACHE_STATS
if (!proc_create_single("fs/fscache/stats", S_IFREG | 0444, NULL,
fscache_stats_show))
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index b811a4d03585..252e883ae148 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -16,6 +16,18 @@
atomic_t fscache_n_volumes;
atomic_t fscache_n_volumes_collision;
atomic_t fscache_n_volumes_nomem;
+atomic_t fscache_n_cookies;
+
+atomic_t fscache_n_acquires;
+atomic_t fscache_n_acquires_ok;
+atomic_t fscache_n_acquires_oom;
+
+atomic_t fscache_n_updates;
+EXPORT_SYMBOL(fscache_n_updates);
+
+atomic_t fscache_n_relinquishes;
+atomic_t fscache_n_relinquishes_retire;
+atomic_t fscache_n_relinquishes_dropped;

/*
* display the general statistics
@@ -23,12 +35,26 @@ atomic_t fscache_n_volumes_nomem;
int fscache_stats_show(struct seq_file *m, void *v)
{
seq_puts(m, "FS-Cache statistics\n");
- seq_printf(m, "Cookies: v=%d vcol=%u voom=%u\n",
+ seq_printf(m, "Cookies: n=%d v=%d vcol=%u voom=%u\n",
+ atomic_read(&fscache_n_cookies),
atomic_read(&fscache_n_volumes),
atomic_read(&fscache_n_volumes_collision),
atomic_read(&fscache_n_volumes_nomem)
);

+ seq_printf(m, "Acquire: n=%u ok=%u oom=%u\n",
+ atomic_read(&fscache_n_acquires),
+ atomic_read(&fscache_n_acquires_ok),
+ atomic_read(&fscache_n_acquires_oom));
+
+ seq_printf(m, "Updates: n=%u\n",
+ atomic_read(&fscache_n_updates));
+
+ seq_printf(m, "Relinqs: n=%u rtr=%u drop=%u\n",
+ atomic_read(&fscache_n_relinquishes),
+ atomic_read(&fscache_n_relinquishes_retire),
+ atomic_read(&fscache_n_relinquishes_dropped));
+
netfs_stats_show(m);
return 0;
}
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 18cd5c9877bb..c4355b888c91 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -17,6 +17,7 @@
#include <linux/fscache.h>

enum fscache_cache_trace;
+enum fscache_cookie_trace;
enum fscache_access_trace;

enum fscache_cache_state {
@@ -52,4 +53,25 @@ extern struct rw_semaphore fscache_addremove_sem;
extern struct fscache_cache *fscache_acquire_cache(const char *name);
extern void fscache_relinquish_cache(struct fscache_cache *cache);

+extern struct fscache_cookie *fscache_get_cookie(struct fscache_cookie *cookie,
+ enum fscache_cookie_trace where);
+extern void fscache_put_cookie(struct fscache_cookie *cookie,
+ enum fscache_cookie_trace where);
+extern void fscache_set_cookie_state(struct fscache_cookie *cookie,
+ enum fscache_cookie_state state);
+
+/**
+ * fscache_get_key - Get a pointer to the cookie key
+ * @cookie: The cookie to query
+ *
+ * Return a pointer to the where a cookie's key is stored.
+ */
+static inline void *fscache_get_key(struct fscache_cookie *cookie)
+{
+ if (cookie->key_len <= sizeof(cookie->inline_key))
+ return cookie->inline_key;
+ else
+ return cookie->key;
+}
+
#endif /* _LINUX_FSCACHE_CACHE_H */
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 131a741a6652..4450d17c11e8 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -31,6 +31,27 @@
#define fscache_cookie_enabled(cookie) (0)
#endif

+struct fscache_cookie;
+
+#define FSCACHE_ADV_SINGLE_CHUNK 0x01 /* The object is a single chunk of data */
+#define FSCACHE_ADV_WRITE_CACHE 0x00 /* Do cache if written to locally */
+#define FSCACHE_ADV_WRITE_NOCACHE 0x02 /* Don't cache if written to locally */
+
+/*
+ * Data object state.
+ */
+enum fscache_cookie_state {
+ FSCACHE_COOKIE_STATE_QUIESCENT, /* The cookie is uncached */
+ FSCACHE_COOKIE_STATE_LOOKING_UP, /* The cache object is being looked up */
+ FSCACHE_COOKIE_STATE_CREATING, /* The cache object is being created */
+ FSCACHE_COOKIE_STATE_ACTIVE, /* The cache is active, readable and writable */
+ FSCACHE_COOKIE_STATE_FAILED, /* The cache failed, withdraw to clear */
+ FSCACHE_COOKIE_STATE_WITHDRAWING, /* The cookie is being withdrawn */
+ FSCACHE_COOKIE_STATE_RELINQUISHING, /* The cookie is being relinquished */
+ FSCACHE_COOKIE_STATE_DROPPED, /* The cookie has been dropped */
+#define FSCACHE_COOKIE_STATE__NR (FSCACHE_COOKIE_STATE_DROPPED + 1)
+} __attribute__((mode(byte)));
+
/*
* Volume representation cookie.
*/
@@ -55,6 +76,60 @@ struct fscache_volume {
#define FSCACHE_VOLUME_CREATING 4 /* Volume is being created on disk */
};

+/*
+ * Data file representation cookie.
+ * - a file will only appear in one cache
+ * - a request to cache a file may or may not be honoured, subject to
+ * constraints such as disk space
+ * - indices are created on disk just-in-time
+ */
+struct fscache_cookie {
+ refcount_t ref;
+ atomic_t n_active; /* number of active users of cookie */
+ atomic_t n_accesses; /* Number of cache accesses in progress */
+ unsigned int debug_id;
+ unsigned int inval_counter; /* Number of invalidations made */
+ spinlock_t lock;
+ struct fscache_volume *volume; /* Parent volume of this file. */
+ void *cache_priv; /* Cache-side representation */
+ struct hlist_bl_node hash_link; /* Link in hash table */
+ struct list_head proc_link; /* Link in proc list */
+ struct list_head commit_link; /* Link in commit queue */
+ struct work_struct work; /* Commit/relinq/withdraw work */
+ loff_t object_size; /* Size of the netfs object */
+ unsigned long unused_at; /* Time at which unused (jiffies) */
+ unsigned long flags;
+#define FSCACHE_COOKIE_RELINQUISHED 0 /* T if cookie has been relinquished */
+#define FSCACHE_COOKIE_RETIRED 1 /* T if this cookie has retired on relinq */
+#define FSCACHE_COOKIE_IS_CACHING 2 /* T if this cookie is cached */
+#define FSCACHE_COOKIE_NO_DATA_TO_READ 3 /* T if this cookie has nothing to read */
+#define FSCACHE_COOKIE_NEEDS_UPDATE 4 /* T if attrs have been updated */
+#define FSCACHE_COOKIE_HAS_BEEN_CACHED 5 /* T if cookie needs withdraw-on-relinq */
+#define FSCACHE_COOKIE_DISABLED 6 /* T if cookie has been disabled */
+#define FSCACHE_COOKIE_LOCAL_WRITE 7 /* T if cookie has been modified locally */
+#define FSCACHE_COOKIE_NO_ACCESS_WAKE 8 /* T if no wake when n_accesses goes 0 */
+#define FSCACHE_COOKIE_DO_RELINQUISH 9 /* T if this cookie needs relinquishment */
+#define FSCACHE_COOKIE_DO_WITHDRAW 10 /* T if this cookie needs withdrawing */
+#define FSCACHE_COOKIE_DO_LRU_DISCARD 11 /* T if this cookie needs LRU discard */
+#define FSCACHE_COOKIE_DO_PREP_TO_WRITE 12 /* T if cookie needs write preparation */
+#define FSCACHE_COOKIE_HAVE_DATA 13 /* T if this cookie has data stored */
+#define FSCACHE_COOKIE_IS_HASHED 14 /* T if this cookie is hashed */
+
+ enum fscache_cookie_state state;
+ u8 advice; /* FSCACHE_ADV_* */
+ u8 key_len; /* Length of index key */
+ u8 aux_len; /* Length of auxiliary data */
+ u32 key_hash; /* Hash of volume, key, len */
+ union {
+ void *key; /* Index key */
+ u8 inline_key[16]; /* - If the key is short enough */
+ };
+ union {
+ void *aux; /* Auxiliary data */
+ u8 inline_aux[8]; /* - If the aux data is short enough */
+ };
+};
+
/*
* slow-path functions for when there is actually caching available, and the
* netfs does actually have a valid token
@@ -66,6 +141,14 @@ extern struct fscache_volume *__fscache_acquire_volume(const char *, const char
const void *, size_t);
extern void __fscache_relinquish_volume(struct fscache_volume *, const void *, bool);

+extern struct fscache_cookie *__fscache_acquire_cookie(
+ struct fscache_volume *,
+ u8,
+ const void *, size_t,
+ const void *, size_t,
+ loff_t);
+extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
+
/**
* fscache_acquire_volume - Register a volume as desiring caching services
* @volume_key: An identification string for the volume
@@ -113,4 +196,55 @@ void fscache_relinquish_volume(struct fscache_volume *volume,
__fscache_relinquish_volume(volume, coherency_data, invalidate);
}

+/**
+ * fscache_acquire_cookie - Acquire a cookie to represent a cache object
+ * @volume: The volume in which to locate/create this cookie
+ * @advice: Advice flags (FSCACHE_COOKIE_ADV_*)
+ * @index_key: The index key for this cookie
+ * @index_key_len: Size of the index key
+ * @aux_data: The auxiliary data for the cookie (may be NULL)
+ * @aux_data_len: Size of the auxiliary data buffer
+ * @object_size: The initial size of object
+ *
+ * Acquire a cookie to represent a data file within the given cache volume.
+ *
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
+ * description.
+ */
+static inline
+struct fscache_cookie *fscache_acquire_cookie(struct fscache_volume *volume,
+ u8 advice,
+ const void *index_key,
+ size_t index_key_len,
+ const void *aux_data,
+ size_t aux_data_len,
+ loff_t object_size)
+{
+ if (!fscache_volume_valid(volume))
+ return NULL;
+ return __fscache_acquire_cookie(volume, advice,
+ index_key, index_key_len,
+ aux_data, aux_data_len,
+ object_size);
+}
+
+/**
+ * fscache_relinquish_cookie - Return the cookie to the cache, maybe discarding
+ * it
+ * @cookie: The cookie being returned
+ * @retire: True if the cache object the cookie represents is to be discarded
+ *
+ * This function returns a cookie to the cache, forcibly discarding the
+ * associated cache object if retire is set to true.
+ *
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
+ * description.
+ */
+static inline
+void fscache_relinquish_cookie(struct fscache_cookie *cookie, bool retire)
+{
+ if (fscache_cookie_valid(cookie))
+ __fscache_relinquish_cookie(cookie, retire);
+}
+
#endif /* _LINUX_FSCACHE_H */
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index eeb3e7d88e20..9286e1c4b2ac 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -45,6 +45,23 @@ enum fscache_volume_trace {
fscache_volume_see_hash_wake,
};

+enum fscache_cookie_trace {
+ fscache_cookie_collision,
+ fscache_cookie_discard,
+ fscache_cookie_get_end_access,
+ fscache_cookie_get_hash_collision,
+ fscache_cookie_new_acquire,
+ fscache_cookie_put_hash_collision,
+ fscache_cookie_put_over_queued,
+ fscache_cookie_put_relinquish,
+ fscache_cookie_put_withdrawn,
+ fscache_cookie_put_work,
+ fscache_cookie_see_active,
+ fscache_cookie_see_relinquish,
+ fscache_cookie_see_withdraw,
+ fscache_cookie_see_work,
+};
+
#endif

/*
@@ -74,6 +91,22 @@ enum fscache_volume_trace {
EM(fscache_volume_see_create_work, "SEE creat") \
E_(fscache_volume_see_hash_wake, "SEE hwake")

+#define fscache_cookie_traces \
+ EM(fscache_cookie_collision, "*COLLIDE*") \
+ EM(fscache_cookie_discard, "DISCARD ") \
+ EM(fscache_cookie_get_hash_collision, "GET hcoll") \
+ EM(fscache_cookie_get_end_access, "GQ endac") \
+ EM(fscache_cookie_new_acquire, "NEW acq ") \
+ EM(fscache_cookie_put_hash_collision, "PUT hcoll") \
+ EM(fscache_cookie_put_over_queued, "PQ overq") \
+ EM(fscache_cookie_put_relinquish, "PUT relnq") \
+ EM(fscache_cookie_put_withdrawn, "PUT wthdn") \
+ EM(fscache_cookie_put_work, "PQ work ") \
+ EM(fscache_cookie_see_active, "- activ") \
+ EM(fscache_cookie_see_relinquish, "- x-rlq") \
+ EM(fscache_cookie_see_withdraw, "- x-wth") \
+ E_(fscache_cookie_see_work, "- work ")
+
/*
* Export enum symbols via userspace.
*/
@@ -84,6 +117,7 @@ enum fscache_volume_trace {

fscache_cache_traces;
fscache_volume_traces;
+fscache_cookie_traces;

/*
* Now redefine the EM() and E_() macros to map the enums to the strings that
@@ -145,6 +179,83 @@ TRACE_EVENT(fscache_volume,
__entry->usage)
);

+TRACE_EVENT(fscache_cookie,
+ TP_PROTO(unsigned int cookie_debug_id,
+ int ref,
+ enum fscache_cookie_trace where),
+
+ TP_ARGS(cookie_debug_id, ref, where),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, cookie )
+ __field(int, ref )
+ __field(enum fscache_cookie_trace, where )
+ ),
+
+ TP_fast_assign(
+ __entry->cookie = cookie_debug_id;
+ __entry->ref = ref;
+ __entry->where = where;
+ ),
+
+ TP_printk("c=%08x %s r=%d",
+ __entry->cookie,
+ __print_symbolic(__entry->where, fscache_cookie_traces),
+ __entry->ref)
+ );
+
+TRACE_EVENT(fscache_acquire,
+ TP_PROTO(struct fscache_cookie *cookie),
+
+ TP_ARGS(cookie),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, cookie )
+ __field(unsigned int, volume )
+ __field(int, v_ref )
+ __field(int, v_n_cookies )
+ ),
+
+ TP_fast_assign(
+ __entry->cookie = cookie->debug_id;
+ __entry->volume = cookie->volume->debug_id;
+ __entry->v_ref = refcount_read(&cookie->volume->ref);
+ __entry->v_n_cookies = atomic_read(&cookie->volume->n_cookies);
+ ),
+
+ TP_printk("c=%08x V=%08x vr=%d vc=%d",
+ __entry->cookie,
+ __entry->volume, __entry->v_ref, __entry->v_n_cookies)
+ );
+
+TRACE_EVENT(fscache_relinquish,
+ TP_PROTO(struct fscache_cookie *cookie, bool retire),
+
+ TP_ARGS(cookie, retire),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, cookie )
+ __field(unsigned int, volume )
+ __field(int, ref )
+ __field(int, n_active )
+ __field(u8, flags )
+ __field(bool, retire )
+ ),
+
+ TP_fast_assign(
+ __entry->cookie = cookie->debug_id;
+ __entry->volume = cookie->volume->debug_id;
+ __entry->ref = refcount_read(&cookie->ref);
+ __entry->n_active = atomic_read(&cookie->n_active);
+ __entry->flags = cookie->flags;
+ __entry->retire = retire;
+ ),
+
+ TP_printk("c=%08x V=%08x r=%d U=%d f=%02x rt=%u",
+ __entry->cookie, __entry->volume, __entry->ref,
+ __entry->n_active, __entry->flags, __entry->retire)
+ );
+
#endif /* _TRACE_FSCACHE_H */

/* This part must be outside protection */



2021-12-22 23:16:58

by David Howells

[permalink] [raw]
Subject: [PATCH v4 11/68] fscache: Implement cache-level access helpers

Add a pair of functions to pin/unpin a cache that we're wanting to do a
high-level access to (such as creating or removing a volume):

bool fscache_begin_cache_access(struct fscache_cache *cache,
enum fscache_access_trace why);
void fscache_end_cache_access(struct fscache_cache *cache,
enum fscache_access_trace why);

The way the access gate works/will work is:

(1) If the cache tests as not live (state is not FSCACHE_CACHE_IS_ACTIVE),
then we return false to indicate access was not permitted.

(2) If the cache tests as live, then we increment the n_accesses count and
then recheck the liveness, ending the access if it ceased to be live.

(3) When we end the access, we decrement n_accesses and wake up the any
waiters if it reaches 0.

(4) Whilst the cache is caching, n_accesses is kept artificially
incremented to prevent wakeups from happening.

(5) When the cache is taken offline, the state is changed to prevent new
accesses, n_accesses is decremented and we wait for n_accesses to
become 0.

Note that some of this is implemented in a later patch.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819593239.215744.7537428720603638088.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906893368.143852.14164004598465617981.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967093977.1823006.6967886507023056409.stgit@warthog.procyon.org.uk/ # v3
---

fs/fscache/cache.c | 62 ++++++++++++++++++++++++++++++++++++++++
fs/fscache/internal.h | 2 +
fs/fscache/main.c | 2 +
include/trace/events/fscache.h | 41 ++++++++++++++++++++++++++
4 files changed, 107 insertions(+)

diff --git a/fs/fscache/cache.c b/fs/fscache/cache.c
index 8db77bb9f8e2..e867cff53a70 100644
--- a/fs/fscache/cache.c
+++ b/fs/fscache/cache.c
@@ -216,6 +216,68 @@ void fscache_relinquish_cache(struct fscache_cache *cache)
}
EXPORT_SYMBOL(fscache_relinquish_cache);

+/**
+ * fscache_begin_cache_access - Pin a cache so it can be accessed
+ * @cache: The cache-level cookie
+ * @why: An indication of the circumstances of the access for tracing
+ *
+ * Attempt to pin the cache to prevent it from going away whilst we're
+ * accessing it and returns true if successful. This works as follows:
+ *
+ * (1) If the cache tests as not live (state is not FSCACHE_CACHE_IS_ACTIVE),
+ * then we return false to indicate access was not permitted.
+ *
+ * (2) If the cache tests as live, then we increment the n_accesses count and
+ * then recheck the liveness, ending the access if it ceased to be live.
+ *
+ * (3) When we end the access, we decrement n_accesses and wake up the any
+ * waiters if it reaches 0.
+ *
+ * (4) Whilst the cache is caching, n_accesses is kept artificially
+ * incremented to prevent wakeups from happening.
+ *
+ * (5) When the cache is taken offline, the state is changed to prevent new
+ * accesses, n_accesses is decremented and we wait for n_accesses to
+ * become 0.
+ */
+bool fscache_begin_cache_access(struct fscache_cache *cache, enum fscache_access_trace why)
+{
+ int n_accesses;
+
+ if (!fscache_cache_is_live(cache))
+ return false;
+
+ n_accesses = atomic_inc_return(&cache->n_accesses);
+ smp_mb__after_atomic(); /* Reread live flag after n_accesses */
+ trace_fscache_access_cache(cache->debug_id, refcount_read(&cache->ref),
+ n_accesses, why);
+ if (!fscache_cache_is_live(cache)) {
+ fscache_end_cache_access(cache, fscache_access_unlive);
+ return false;
+ }
+ return true;
+}
+
+/**
+ * fscache_end_cache_access - Unpin a cache at the end of an access.
+ * @cache: The cache-level cookie
+ * @why: An indication of the circumstances of the access for tracing
+ *
+ * Unpin a cache after we've accessed it. The @why indicator is merely
+ * provided for tracing purposes.
+ */
+void fscache_end_cache_access(struct fscache_cache *cache, enum fscache_access_trace why)
+{
+ int n_accesses;
+
+ smp_mb__before_atomic();
+ n_accesses = atomic_dec_return(&cache->n_accesses);
+ trace_fscache_access_cache(cache->debug_id, refcount_read(&cache->ref),
+ n_accesses, why);
+ if (n_accesses == 0)
+ wake_up_var(&cache->n_accesses);
+}
+
#ifdef CONFIG_PROC_FS
static const char fscache_cache_states[NR__FSCACHE_CACHE_STATE] = "-PAEW";

diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 71c897757d44..be29816b37ef 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -23,6 +23,8 @@
#ifdef CONFIG_PROC_FS
extern const struct seq_operations fscache_caches_seq_ops;
#endif
+bool fscache_begin_cache_access(struct fscache_cache *cache, enum fscache_access_trace why);
+void fscache_end_cache_access(struct fscache_cache *cache, enum fscache_access_trace why);
struct fscache_cache *fscache_lookup_cache(const char *name, bool is_cache);
void fscache_put_cache(struct fscache_cache *cache, enum fscache_cache_trace where);

diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index ae493e9ca1c9..876f4bee5840 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -21,6 +21,8 @@ module_param_named(debug, fscache_debug, uint,
MODULE_PARM_DESC(fscache_debug,
"FS-Cache debugging mask");

+EXPORT_TRACEPOINT_SYMBOL(fscache_access_cache);
+
struct workqueue_struct *fscache_wq;
EXPORT_SYMBOL(fscache_wq);

diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index 9286e1c4b2ac..734966bc49e1 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -62,6 +62,12 @@ enum fscache_cookie_trace {
fscache_cookie_see_work,
};

+enum fscache_access_trace {
+ fscache_access_cache_pin,
+ fscache_access_cache_unpin,
+ fscache_access_unlive,
+};
+
#endif

/*
@@ -107,6 +113,11 @@ enum fscache_cookie_trace {
EM(fscache_cookie_see_withdraw, "- x-wth") \
E_(fscache_cookie_see_work, "- work ")

+#define fscache_access_traces \
+ EM(fscache_access_cache_pin, "PIN cache ") \
+ EM(fscache_access_cache_unpin, "UNPIN cache ") \
+ E_(fscache_access_unlive, "END unlive ")
+
/*
* Export enum symbols via userspace.
*/
@@ -118,6 +129,7 @@ enum fscache_cookie_trace {
fscache_cache_traces;
fscache_volume_traces;
fscache_cookie_traces;
+fscache_access_traces;

/*
* Now redefine the EM() and E_() macros to map the enums to the strings that
@@ -204,6 +216,35 @@ TRACE_EVENT(fscache_cookie,
__entry->ref)
);

+TRACE_EVENT(fscache_access_cache,
+ TP_PROTO(unsigned int cache_debug_id,
+ int ref,
+ int n_accesses,
+ enum fscache_access_trace why),
+
+ TP_ARGS(cache_debug_id, ref, n_accesses, why),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, cache )
+ __field(int, ref )
+ __field(int, n_accesses )
+ __field(enum fscache_access_trace, why )
+ ),
+
+ TP_fast_assign(
+ __entry->cache = cache_debug_id;
+ __entry->ref = ref;
+ __entry->n_accesses = n_accesses;
+ __entry->why = why;
+ ),
+
+ TP_printk("C=%08x %s r=%d a=%d",
+ __entry->cache,
+ __print_symbolic(__entry->why, fscache_access_traces),
+ __entry->ref,
+ __entry->n_accesses)
+ );
+
TRACE_EVENT(fscache_acquire,
TP_PROTO(struct fscache_cookie *cookie),




2021-12-22 23:17:13

by David Howells

[permalink] [raw]
Subject: [PATCH v4 12/68] fscache: Implement volume-level access helpers

Add a pair of helper functions to manage access to a volume, pinning the
volume in place for the duration to prevent cache withdrawal from removing
it:

bool fscache_begin_volume_access(struct fscache_volume *volume,
enum fscache_access_trace why);
void fscache_end_volume_access(struct fscache_volume *volume,
enum fscache_access_trace why);

The way the access gate on the volume works/will work is:

(1) If the cache tests as not live (state is not FSCACHE_CACHE_IS_ACTIVE),
then we return false to indicate access was not permitted.

(2) If the cache tests as live, then we increment the volume's n_accesses
count and then recheck the cache liveness, ending the access if it
ceased to be live.

(3) When we end the access, we decrement the volume's n_accesses and wake
up the any waiters if it reaches 0.

(4) Whilst the cache is caching, the volume's n_accesses is kept
artificially incremented to prevent wakeups from happening.

(5) When the cache is taken offline, the state is changed to prevent new
accesses, the volume's n_accesses is decremented and we wait for it to
become 0.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819594158.215744.8285859817391683254.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906894315.143852.5454793807544710479.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967095028.1823006.9173132503876627466.stgit@warthog.procyon.org.uk/ # v3
---

fs/fscache/internal.h | 3 +
fs/fscache/main.c | 1
fs/fscache/volume.c | 84 ++++++++++++++++++++++++++++++++++++++++
include/linux/fscache-cache.h | 4 ++
include/trace/events/fscache.h | 34 ++++++++++++++++
5 files changed, 126 insertions(+)

diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index be29816b37ef..91a4ea08ec0b 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -130,6 +130,9 @@ struct fscache_volume *fscache_get_volume(struct fscache_volume *volume,
enum fscache_volume_trace where);
void fscache_put_volume(struct fscache_volume *volume,
enum fscache_volume_trace where);
+bool fscache_begin_volume_access(struct fscache_volume *volume,
+ struct fscache_cookie *cookie,
+ enum fscache_access_trace why);
void fscache_create_volume(struct fscache_volume *volume, bool wait);


diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index 876f4bee5840..6cab5d99ba4c 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -22,6 +22,7 @@ MODULE_PARM_DESC(fscache_debug,
"FS-Cache debugging mask");

EXPORT_TRACEPOINT_SYMBOL(fscache_access_cache);
+EXPORT_TRACEPOINT_SYMBOL(fscache_access_volume);

struct workqueue_struct *fscache_wq;
EXPORT_SYMBOL(fscache_wq);
diff --git a/fs/fscache/volume.c b/fs/fscache/volume.c
index 630894fefd02..20497f0f10bb 100644
--- a/fs/fscache/volume.c
+++ b/fs/fscache/volume.c
@@ -33,6 +33,90 @@ static void fscache_see_volume(struct fscache_volume *volume,
trace_fscache_volume(volume->debug_id, ref, where);
}

+/*
+ * Pin the cache behind a volume so that we can access it.
+ */
+static void __fscache_begin_volume_access(struct fscache_volume *volume,
+ struct fscache_cookie *cookie,
+ enum fscache_access_trace why)
+{
+ int n_accesses;
+
+ n_accesses = atomic_inc_return(&volume->n_accesses);
+ smp_mb__after_atomic();
+ trace_fscache_access_volume(volume->debug_id, cookie ? cookie->debug_id : 0,
+ refcount_read(&volume->ref),
+ n_accesses, why);
+}
+
+/**
+ * fscache_begin_volume_access - Pin a cache so a volume can be accessed
+ * @volume: The volume cookie
+ * @cookie: A datafile cookie for a tracing reference (or NULL)
+ * @why: An indication of the circumstances of the access for tracing
+ *
+ * Attempt to pin the cache to prevent it from going away whilst we're
+ * accessing a volume and returns true if successful. This works as follows:
+ *
+ * (1) If the cache tests as not live (state is not FSCACHE_CACHE_IS_ACTIVE),
+ * then we return false to indicate access was not permitted.
+ *
+ * (2) If the cache tests as live, then we increment the volume's n_accesses
+ * count and then recheck the cache liveness, ending the access if it
+ * ceased to be live.
+ *
+ * (3) When we end the access, we decrement the volume's n_accesses and wake
+ * up the any waiters if it reaches 0.
+ *
+ * (4) Whilst the cache is caching, the volume's n_accesses is kept
+ * artificially incremented to prevent wakeups from happening.
+ *
+ * (5) When the cache is taken offline, the state is changed to prevent new
+ * accesses, the volume's n_accesses is decremented and we wait for it to
+ * become 0.
+ *
+ * The datafile @cookie and the @why indicator are merely provided for tracing
+ * purposes.
+ */
+bool fscache_begin_volume_access(struct fscache_volume *volume,
+ struct fscache_cookie *cookie,
+ enum fscache_access_trace why)
+{
+ if (!fscache_cache_is_live(volume->cache))
+ return false;
+ __fscache_begin_volume_access(volume, cookie, why);
+ if (!fscache_cache_is_live(volume->cache)) {
+ fscache_end_volume_access(volume, cookie, fscache_access_unlive);
+ return false;
+ }
+ return true;
+}
+
+/**
+ * fscache_end_volume_access - Unpin a cache at the end of an access.
+ * @volume: The volume cookie
+ * @cookie: A datafile cookie for a tracing reference (or NULL)
+ * @why: An indication of the circumstances of the access for tracing
+ *
+ * Unpin a cache volume after we've accessed it. The datafile @cookie and the
+ * @why indicator are merely provided for tracing purposes.
+ */
+void fscache_end_volume_access(struct fscache_volume *volume,
+ struct fscache_cookie *cookie,
+ enum fscache_access_trace why)
+{
+ int n_accesses;
+
+ smp_mb__before_atomic();
+ n_accesses = atomic_dec_return(&volume->n_accesses);
+ trace_fscache_access_volume(volume->debug_id, cookie ? cookie->debug_id : 0,
+ refcount_read(&volume->ref),
+ n_accesses, why);
+ if (n_accesses == 0)
+ wake_up_var(&volume->n_accesses);
+}
+EXPORT_SYMBOL(fscache_end_volume_access);
+
static bool fscache_volume_same(const struct fscache_volume *a,
const struct fscache_volume *b)
{
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index c4355b888c91..fbbd8a2afe12 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -53,6 +53,10 @@ extern struct rw_semaphore fscache_addremove_sem;
extern struct fscache_cache *fscache_acquire_cache(const char *name);
extern void fscache_relinquish_cache(struct fscache_cache *cache);

+extern void fscache_end_volume_access(struct fscache_volume *volume,
+ struct fscache_cookie *cookie,
+ enum fscache_access_trace why);
+
extern struct fscache_cookie *fscache_get_cookie(struct fscache_cookie *cookie,
enum fscache_cookie_trace where);
extern void fscache_put_cookie(struct fscache_cookie *cookie,
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index 734966bc49e1..4f40cfa52469 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -43,6 +43,7 @@ enum fscache_volume_trace {
fscache_volume_put_relinquish,
fscache_volume_see_create_work,
fscache_volume_see_hash_wake,
+ fscache_volume_wait_create_work,
};

enum fscache_cookie_trace {
@@ -245,6 +246,39 @@ TRACE_EVENT(fscache_access_cache,
__entry->n_accesses)
);

+TRACE_EVENT(fscache_access_volume,
+ TP_PROTO(unsigned int volume_debug_id,
+ unsigned int cookie_debug_id,
+ int ref,
+ int n_accesses,
+ enum fscache_access_trace why),
+
+ TP_ARGS(volume_debug_id, cookie_debug_id, ref, n_accesses, why),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, volume )
+ __field(unsigned int, cookie )
+ __field(int, ref )
+ __field(int, n_accesses )
+ __field(enum fscache_access_trace, why )
+ ),
+
+ TP_fast_assign(
+ __entry->volume = volume_debug_id;
+ __entry->cookie = cookie_debug_id;
+ __entry->ref = ref;
+ __entry->n_accesses = n_accesses;
+ __entry->why = why;
+ ),
+
+ TP_printk("V=%08x c=%08x %s r=%d a=%d",
+ __entry->volume,
+ __entry->cookie,
+ __print_symbolic(__entry->why, fscache_access_traces),
+ __entry->ref,
+ __entry->n_accesses)
+ );
+
TRACE_EVENT(fscache_acquire,
TP_PROTO(struct fscache_cookie *cookie),




2021-12-22 23:17:37

by David Howells

[permalink] [raw]
Subject: [PATCH v4 13/68] fscache: Implement cookie-level access helpers

Add a number of helper functions to manage access to a cookie, pinning the
cache object in place for the duration to prevent cache withdrawal from
removing it:

(1) void fscache_init_access_gate(struct fscache_cookie *cookie);

This function initialises the access count when a cache binds to a
cookie. An extra ref is taken on the access count to prevent wakeups
while the cache is active. We're only interested in the wakeup when a
cookie is being withdrawn and we're waiting for it to quiesce - at
which point the counter will be decremented before the wait.

The FSCACHE_COOKIE_NACC_ELEVATED flag is set on the cookie to keep
track of the extra ref in order to handle a race between
relinquishment and withdrawal both trying to drop the extra ref.

(2) bool fscache_begin_cookie_access(struct fscache_cookie *cookie,
enum fscache_access_trace why);

This function attempts to begin access upon a cookie, pinning it in
place if it's cached. If successful, it returns true and leaves a the
access count incremented.

(3) void fscache_end_cookie_access(struct fscache_cookie *cookie,
enum fscache_access_trace why);

This function drops the access count obtained by (2), permitting
object withdrawal to take place when it reaches zero.

A tracepoint is provided to track changes to the access counter on a
cookie.

Changes
=======
ver #2:
- Don't hold n_accesses elevated whilst cache is bound to a cookie, but
rather add a flag that prevents the state machine from being queued when
n_accesses reaches 0.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819595085.215744.1706073049250505427.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906895313.143852.10141619544149102193.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967095980.1823006.1133648159424418877.stgit@warthog.procyon.org.uk/ # v3
---

fs/fscache/cookie.c | 98 ++++++++++++++++++++++++++++++++++++++++
fs/fscache/internal.h | 3 +
fs/fscache/main.c | 1
include/linux/fscache-cache.h | 2 +
include/trace/events/fscache.h | 29 ++++++++++++
5 files changed, 133 insertions(+)

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 438b0098aa73..04d2127bd354 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -62,6 +62,104 @@ static void fscache_free_cookie(struct fscache_cookie *cookie)
kmem_cache_free(fscache_cookie_jar, cookie);
}

+/*
+ * Initialise the access gate on a cookie by setting a flag to prevent the
+ * state machine from being queued when the access counter transitions to 0.
+ * We're only interested in this when we withdraw caching services from the
+ * cookie.
+ */
+static void fscache_init_access_gate(struct fscache_cookie *cookie)
+{
+ int n_accesses;
+
+ n_accesses = atomic_read(&cookie->n_accesses);
+ trace_fscache_access(cookie->debug_id, refcount_read(&cookie->ref),
+ n_accesses, fscache_access_cache_pin);
+ set_bit(FSCACHE_COOKIE_NO_ACCESS_WAKE, &cookie->flags);
+}
+
+/**
+ * fscache_end_cookie_access - Unpin a cache at the end of an access.
+ * @cookie: A data file cookie
+ * @why: An indication of the circumstances of the access for tracing
+ *
+ * Unpin a cache cookie after we've accessed it and bring a deferred
+ * relinquishment or withdrawal state into effect.
+ *
+ * The @why indicator is provided for tracing purposes.
+ */
+void fscache_end_cookie_access(struct fscache_cookie *cookie,
+ enum fscache_access_trace why)
+{
+ int n_accesses;
+
+ smp_mb__before_atomic();
+ n_accesses = atomic_dec_return(&cookie->n_accesses);
+ trace_fscache_access(cookie->debug_id, refcount_read(&cookie->ref),
+ n_accesses, why);
+ if (n_accesses == 0 &&
+ !test_bit(FSCACHE_COOKIE_NO_ACCESS_WAKE, &cookie->flags)) {
+ // PLACEHOLDER: Need to poke the state machine
+ }
+}
+EXPORT_SYMBOL(fscache_end_cookie_access);
+
+/*
+ * Pin the cache behind a cookie so that we can access it.
+ */
+static void __fscache_begin_cookie_access(struct fscache_cookie *cookie,
+ enum fscache_access_trace why)
+{
+ int n_accesses;
+
+ n_accesses = atomic_inc_return(&cookie->n_accesses);
+ smp_mb__after_atomic(); /* (Future) read state after is-caching.
+ * Reread n_accesses after is-caching
+ */
+ trace_fscache_access(cookie->debug_id, refcount_read(&cookie->ref),
+ n_accesses, why);
+}
+
+/**
+ * fscache_begin_cookie_access - Pin a cache so data can be accessed
+ * @cookie: A data file cookie
+ * @why: An indication of the circumstances of the access for tracing
+ *
+ * Attempt to pin the cache to prevent it from going away whilst we're
+ * accessing data and returns true if successful. This works as follows:
+ *
+ * (1) If the cookie is not being cached (ie. FSCACHE_COOKIE_IS_CACHING is not
+ * set), we return false to indicate access was not permitted.
+ *
+ * (2) If the cookie is being cached, we increment its n_accesses count and
+ * then recheck the IS_CACHING flag, ending the access if it got cleared.
+ *
+ * (3) When we end the access, we decrement the cookie's n_accesses and wake
+ * up the any waiters if it reaches 0.
+ *
+ * (4) Whilst the cookie is actively being cached, its n_accesses is kept
+ * artificially incremented to prevent wakeups from happening.
+ *
+ * (5) When the cache is taken offline or if the cookie is culled, the flag is
+ * cleared to prevent new accesses, the cookie's n_accesses is decremented
+ * and we wait for it to become 0.
+ *
+ * The @why indicator are merely provided for tracing purposes.
+ */
+bool fscache_begin_cookie_access(struct fscache_cookie *cookie,
+ enum fscache_access_trace why)
+{
+ if (!test_bit(FSCACHE_COOKIE_IS_CACHING, &cookie->flags))
+ return false;
+ __fscache_begin_cookie_access(cookie, why);
+ if (!test_bit(FSCACHE_COOKIE_IS_CACHING, &cookie->flags) ||
+ !fscache_cache_is_live(cookie->volume->cache)) {
+ fscache_end_cookie_access(cookie, fscache_access_unlive);
+ return false;
+ }
+ return true;
+}
+
static inline void wake_up_cookie_state(struct fscache_cookie *cookie)
{
/* Use a barrier to ensure that waiters see the state variable
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 91a4ea08ec0b..e0d8ef212e82 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -59,6 +59,9 @@ extern struct kmem_cache *fscache_cookie_jar;
extern const struct seq_operations fscache_cookies_seq_ops;

extern void fscache_print_cookie(struct fscache_cookie *cookie, char prefix);
+extern bool fscache_begin_cookie_access(struct fscache_cookie *cookie,
+ enum fscache_access_trace why);
+
static inline void fscache_see_cookie(struct fscache_cookie *cookie,
enum fscache_cookie_trace where)
{
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index 6cab5d99ba4c..dad85fd84f6f 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -23,6 +23,7 @@ MODULE_PARM_DESC(fscache_debug,

EXPORT_TRACEPOINT_SYMBOL(fscache_access_cache);
EXPORT_TRACEPOINT_SYMBOL(fscache_access_volume);
+EXPORT_TRACEPOINT_SYMBOL(fscache_access);

struct workqueue_struct *fscache_wq;
EXPORT_SYMBOL(fscache_wq);
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index fbbd8a2afe12..66624407ba84 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -61,6 +61,8 @@ extern struct fscache_cookie *fscache_get_cookie(struct fscache_cookie *cookie,
enum fscache_cookie_trace where);
extern void fscache_put_cookie(struct fscache_cookie *cookie,
enum fscache_cookie_trace where);
+extern void fscache_end_cookie_access(struct fscache_cookie *cookie,
+ enum fscache_access_trace why);
extern void fscache_set_cookie_state(struct fscache_cookie *cookie,
enum fscache_cookie_state state);

diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index 4f40cfa52469..b1a962adfd16 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -279,6 +279,35 @@ TRACE_EVENT(fscache_access_volume,
__entry->n_accesses)
);

+TRACE_EVENT(fscache_access,
+ TP_PROTO(unsigned int cookie_debug_id,
+ int ref,
+ int n_accesses,
+ enum fscache_access_trace why),
+
+ TP_ARGS(cookie_debug_id, ref, n_accesses, why),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, cookie )
+ __field(int, ref )
+ __field(int, n_accesses )
+ __field(enum fscache_access_trace, why )
+ ),
+
+ TP_fast_assign(
+ __entry->cookie = cookie_debug_id;
+ __entry->ref = ref;
+ __entry->n_accesses = n_accesses;
+ __entry->why = why;
+ ),
+
+ TP_printk("c=%08x %s r=%d a=%d",
+ __entry->cookie,
+ __print_symbolic(__entry->why, fscache_access_traces),
+ __entry->ref,
+ __entry->n_accesses)
+ );
+
TRACE_EVENT(fscache_acquire,
TP_PROTO(struct fscache_cookie *cookie),




2021-12-22 23:18:00

by David Howells

[permalink] [raw]
Subject: [PATCH v4 14/68] fscache: Implement functions add/remove a cache

Implement functions to allow the cache backend to add or remove a cache:

(1) Declare a cache to be live:

int fscache_add_cache(struct fscache_cache *cache,
const struct fscache_cache_ops *ops,
void *cache_priv);

Take a previously acquired cache cookie, set the operations table and
private data and mark the cache open for access.

(2) Withdraw a cache from service:

void fscache_withdraw_cache(struct fscache_cache *cache);

This marks the cache as withdrawn and thus prevents further
cache-level and volume-level accesses.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819596022.215744.8799712491432238827.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906896599.143852.17049208999019262884.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967097870.1823006.3470041000971522030.stgit@warthog.procyon.org.uk/ # v3
---

fs/fscache/cache.c | 70 +++++++++++++++++++++++++++++++++++++++++
include/linux/fscache-cache.h | 13 ++++++++
2 files changed, 83 insertions(+)

diff --git a/fs/fscache/cache.c b/fs/fscache/cache.c
index e867cff53a70..bbd102be91c4 100644
--- a/fs/fscache/cache.c
+++ b/fs/fscache/cache.c
@@ -210,12 +210,55 @@ void fscache_relinquish_cache(struct fscache_cache *cache)
fscache_cache_put_prep_failed :
fscache_cache_put_relinquish;

+ cache->ops = NULL;
cache->cache_priv = NULL;
smp_store_release(&cache->state, FSCACHE_CACHE_IS_NOT_PRESENT);
fscache_put_cache(cache, where);
}
EXPORT_SYMBOL(fscache_relinquish_cache);

+/**
+ * fscache_add_cache - Declare a cache as being open for business
+ * @cache: The cache-level cookie representing the cache
+ * @ops: Table of cache operations to use
+ * @cache_priv: Private data for the cache record
+ *
+ * Add a cache to the system, making it available for netfs's to use.
+ *
+ * See Documentation/filesystems/caching/backend-api.rst for a complete
+ * description.
+ */
+int fscache_add_cache(struct fscache_cache *cache,
+ const struct fscache_cache_ops *ops,
+ void *cache_priv)
+{
+ int n_accesses;
+
+ _enter("{%s,%s}", ops->name, cache->name);
+
+ BUG_ON(fscache_cache_state(cache) != FSCACHE_CACHE_IS_PREPARING);
+
+ /* Get a ref on the cache cookie and keep its n_accesses counter raised
+ * by 1 to prevent wakeups from transitioning it to 0 until we're
+ * withdrawing caching services from it.
+ */
+ n_accesses = atomic_inc_return(&cache->n_accesses);
+ trace_fscache_access_cache(cache->debug_id, refcount_read(&cache->ref),
+ n_accesses, fscache_access_cache_pin);
+
+ down_write(&fscache_addremove_sem);
+
+ cache->ops = ops;
+ cache->cache_priv = cache_priv;
+ fscache_set_cache_state(cache, FSCACHE_CACHE_IS_ACTIVE);
+
+ up_write(&fscache_addremove_sem);
+ pr_notice("Cache \"%s\" added (type %s)\n", cache->name, ops->name);
+ _leave(" = 0 [%s]", cache->name);
+ return 0;
+}
+EXPORT_SYMBOL(fscache_add_cache);
+
/**
* fscache_begin_cache_access - Pin a cache so it can be accessed
* @cache: The cache-level cookie
@@ -278,6 +321,33 @@ void fscache_end_cache_access(struct fscache_cache *cache, enum fscache_access_t
wake_up_var(&cache->n_accesses);
}

+/**
+ * fscache_withdraw_cache - Withdraw a cache from the active service
+ * @cache: The cache cookie
+ *
+ * Begin the process of withdrawing a cache from service. This stops new
+ * cache-level and volume-level accesses from taking place and waits for
+ * currently ongoing cache-level accesses to end.
+ */
+void fscache_withdraw_cache(struct fscache_cache *cache)
+{
+ int n_accesses;
+
+ pr_notice("Withdrawing cache \"%s\" (%u objs)\n",
+ cache->name, atomic_read(&cache->object_count));
+
+ fscache_set_cache_state(cache, FSCACHE_CACHE_IS_WITHDRAWN);
+
+ /* Allow wakeups on dec-to-0 */
+ n_accesses = atomic_dec_return(&cache->n_accesses);
+ trace_fscache_access_cache(cache->debug_id, refcount_read(&cache->ref),
+ n_accesses, fscache_access_cache_unpin);
+
+ wait_var_event(&cache->n_accesses,
+ atomic_read(&cache->n_accesses) == 0);
+}
+EXPORT_SYMBOL(fscache_withdraw_cache);
+
#ifdef CONFIG_PROC_FS
static const char fscache_cache_states[NR__FSCACHE_CACHE_STATE] = "-PAEW";

diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 66624407ba84..f78add6e7823 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -33,6 +33,7 @@ enum fscache_cache_state {
* Cache cookie.
*/
struct fscache_cache {
+ const struct fscache_cache_ops *ops;
struct list_head cache_link; /* Link in cache list */
void *cache_priv; /* Private cache data (or NULL) */
refcount_t ref;
@@ -44,6 +45,14 @@ struct fscache_cache {
char *name;
};

+/*
+ * cache operations
+ */
+struct fscache_cache_ops {
+ /* name of cache provider */
+ const char *name;
+};
+
extern struct workqueue_struct *fscache_wq;

/*
@@ -52,6 +61,10 @@ extern struct workqueue_struct *fscache_wq;
extern struct rw_semaphore fscache_addremove_sem;
extern struct fscache_cache *fscache_acquire_cache(const char *name);
extern void fscache_relinquish_cache(struct fscache_cache *cache);
+extern int fscache_add_cache(struct fscache_cache *cache,
+ const struct fscache_cache_ops *ops,
+ void *cache_priv);
+extern void fscache_withdraw_cache(struct fscache_cache *cache);

extern void fscache_end_volume_access(struct fscache_volume *volume,
struct fscache_cookie *cookie,



2021-12-22 23:18:11

by David Howells

[permalink] [raw]
Subject: [PATCH v4 15/68] fscache: Provide and use cache methods to lookup/create/free a volume

Add cache methods to lookup, create and remove a volume.

Looking up or creating the volume requires the cache pinning for access;
freeing the volume requires the volume pinning for access. The
->acquire_volume() method is used to ask the cache backend to lookup and,
if necessary, create a volume; the ->free_volume() method is used to free
the resources for a volume.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819597821.215744.5225318658134989949.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906898645.143852.8537799955945956818.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967099771.1823006.1455197910571061835.stgit@warthog.procyon.org.uk/ # v3
---

fs/fscache/volume.c | 89 +++++++++++++++++++++++++++++++++++++++-
include/linux/fscache-cache.h | 7 +++
include/trace/events/fscache.h | 11 ++++-
3 files changed, 103 insertions(+), 4 deletions(-)

diff --git a/fs/fscache/volume.c b/fs/fscache/volume.c
index 20497f0f10bb..e1a8e92a6adb 100644
--- a/fs/fscache/volume.c
+++ b/fs/fscache/volume.c
@@ -15,6 +15,8 @@ static struct hlist_bl_head fscache_volume_hash[1 << fscache_volume_hash_shift];
static atomic_t fscache_volume_debug_id;
static LIST_HEAD(fscache_volumes);

+static void fscache_create_volume_work(struct work_struct *work);
+
struct fscache_volume *fscache_get_volume(struct fscache_volume *volume,
enum fscache_volume_trace where)
{
@@ -213,7 +215,7 @@ static struct fscache_volume *fscache_alloc_volume(const char *volume_key,

volume->cache = cache;
INIT_LIST_HEAD(&volume->proc_link);
- INIT_WORK(&volume->work, NULL /* PLACEHOLDER */);
+ INIT_WORK(&volume->work, fscache_create_volume_work);
refcount_set(&volume->ref, 1);
spin_lock_init(&volume->lock);

@@ -249,6 +251,58 @@ static struct fscache_volume *fscache_alloc_volume(const char *volume_key,
return NULL;
}

+/*
+ * Create a volume's representation on disk. Have a volume ref and a cache
+ * access we have to release.
+ */
+static void fscache_create_volume_work(struct work_struct *work)
+{
+ const struct fscache_cache_ops *ops;
+ struct fscache_volume *volume =
+ container_of(work, struct fscache_volume, work);
+
+ fscache_see_volume(volume, fscache_volume_see_create_work);
+
+ ops = volume->cache->ops;
+ if (ops->acquire_volume)
+ ops->acquire_volume(volume);
+ fscache_end_cache_access(volume->cache,
+ fscache_access_acquire_volume_end);
+
+ clear_bit_unlock(FSCACHE_VOLUME_CREATING, &volume->flags);
+ wake_up_bit(&volume->flags, FSCACHE_VOLUME_CREATING);
+ fscache_put_volume(volume, fscache_volume_put_create_work);
+}
+
+/*
+ * Dispatch a worker thread to create a volume's representation on disk.
+ */
+void fscache_create_volume(struct fscache_volume *volume, bool wait)
+{
+ if (test_and_set_bit(FSCACHE_VOLUME_CREATING, &volume->flags))
+ goto maybe_wait;
+ if (volume->cache_priv)
+ goto no_wait; /* We raced */
+ if (!fscache_begin_cache_access(volume->cache,
+ fscache_access_acquire_volume))
+ goto no_wait;
+
+ fscache_get_volume(volume, fscache_volume_get_create_work);
+ if (!schedule_work(&volume->work))
+ fscache_put_volume(volume, fscache_volume_put_create_work);
+
+maybe_wait:
+ if (wait) {
+ fscache_see_volume(volume, fscache_volume_wait_create_work);
+ wait_on_bit(&volume->flags, FSCACHE_VOLUME_CREATING,
+ TASK_UNINTERRUPTIBLE);
+ }
+ return;
+no_wait:
+ clear_bit_unlock(FSCACHE_VOLUME_CREATING, &volume->flags);
+ wake_up_bit(&volume->flags, FSCACHE_VOLUME_CREATING);
+}
+
/*
* Acquire a volume representation cookie and link it to a (proposed) cache.
*/
@@ -269,7 +323,7 @@ struct fscache_volume *__fscache_acquire_volume(const char *volume_key,
return ERR_PTR(-EBUSY);
}

- // PLACEHOLDER: Create the volume if we have a cache available
+ fscache_create_volume(volume, false);
return volume;
}
EXPORT_SYMBOL(__fscache_acquire_volume);
@@ -316,7 +370,12 @@ static void fscache_free_volume(struct fscache_volume *volume)
struct fscache_cache *cache = volume->cache;

if (volume->cache_priv) {
- // PLACEHOLDER: Detach any attached cache
+ __fscache_begin_volume_access(volume, NULL,
+ fscache_access_relinquish_volume);
+ if (volume->cache_priv)
+ cache->ops->free_volume(volume);
+ fscache_end_volume_access(volume, NULL,
+ fscache_access_relinquish_volume_end);
}

down_write(&fscache_addremove_sem);
@@ -369,6 +428,30 @@ void __fscache_relinquish_volume(struct fscache_volume *volume,
}
EXPORT_SYMBOL(__fscache_relinquish_volume);

+/**
+ * fscache_withdraw_volume - Withdraw a volume from being cached
+ * @volume: Volume cookie
+ *
+ * Withdraw a cache volume from service, waiting for all accesses to complete
+ * before returning.
+ */
+void fscache_withdraw_volume(struct fscache_volume *volume)
+{
+ int n_accesses;
+
+ _debug("withdraw V=%x", volume->debug_id);
+
+ /* Allow wakeups on dec-to-0 */
+ n_accesses = atomic_dec_return(&volume->n_accesses);
+ trace_fscache_access_volume(volume->debug_id, 0,
+ refcount_read(&volume->ref),
+ n_accesses, fscache_access_cache_unpin);
+
+ wait_var_event(&volume->n_accesses,
+ atomic_read(&volume->n_accesses) == 0);
+}
+EXPORT_SYMBOL(fscache_withdraw_volume);
+
#ifdef CONFIG_PROC_FS
/*
* Generate a list of volumes in /proc/fs/fscache/volumes
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index f78add6e7823..a10b66ca3544 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -51,6 +51,12 @@ struct fscache_cache {
struct fscache_cache_ops {
/* name of cache provider */
const char *name;
+
+ /* Acquire a volume */
+ void (*acquire_volume)(struct fscache_volume *volume);
+
+ /* Free the cache's data attached to a volume */
+ void (*free_volume)(struct fscache_volume *volume);
};

extern struct workqueue_struct *fscache_wq;
@@ -65,6 +71,7 @@ extern int fscache_add_cache(struct fscache_cache *cache,
const struct fscache_cache_ops *ops,
void *cache_priv);
extern void fscache_withdraw_cache(struct fscache_cache *cache);
+extern void fscache_withdraw_volume(struct fscache_volume *volume);

extern void fscache_end_volume_access(struct fscache_volume *volume,
struct fscache_cookie *cookie,
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index b1a962adfd16..1d576bd8112e 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -64,8 +64,12 @@ enum fscache_cookie_trace {
};

enum fscache_access_trace {
+ fscache_access_acquire_volume,
+ fscache_access_acquire_volume_end,
fscache_access_cache_pin,
fscache_access_cache_unpin,
+ fscache_access_relinquish_volume,
+ fscache_access_relinquish_volume_end,
fscache_access_unlive,
};

@@ -96,7 +100,8 @@ enum fscache_access_trace {
EM(fscache_volume_put_hash_collision, "PUT hcoll") \
EM(fscache_volume_put_relinquish, "PUT relnq") \
EM(fscache_volume_see_create_work, "SEE creat") \
- E_(fscache_volume_see_hash_wake, "SEE hwake")
+ EM(fscache_volume_see_hash_wake, "SEE hwake") \
+ E_(fscache_volume_wait_create_work, "WAIT crea")

#define fscache_cookie_traces \
EM(fscache_cookie_collision, "*COLLIDE*") \
@@ -115,8 +120,12 @@ enum fscache_access_trace {
E_(fscache_cookie_see_work, "- work ")

#define fscache_access_traces \
+ EM(fscache_access_acquire_volume, "BEGIN acq_vol") \
+ EM(fscache_access_acquire_volume_end, "END acq_vol") \
EM(fscache_access_cache_pin, "PIN cache ") \
EM(fscache_access_cache_unpin, "UNPIN cache ") \
+ EM(fscache_access_relinquish_volume, "BEGIN rlq_vol") \
+ EM(fscache_access_relinquish_volume_end,"END rlq_vol") \
E_(fscache_access_unlive, "END unlive ")

/*



2021-12-22 23:18:33

by David Howells

[permalink] [raw]
Subject: [PATCH v4 16/68] fscache: Add a function for a cache backend to note an I/O error

Add a function to the backend API to note an I/O error in a cache.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819598741.215744.891281275151382095.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906901316.143852.15225412215771586528.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967100721.1823006.16435671567428949398.stgit@warthog.procyon.org.uk/ # v3
---

fs/fscache/cache.c | 20 ++++++++++++++++++++
include/linux/fscache-cache.h | 2 ++
2 files changed, 22 insertions(+)

diff --git a/fs/fscache/cache.c b/fs/fscache/cache.c
index bbd102be91c4..25eac61f1c29 100644
--- a/fs/fscache/cache.c
+++ b/fs/fscache/cache.c
@@ -321,6 +321,26 @@ void fscache_end_cache_access(struct fscache_cache *cache, enum fscache_access_t
wake_up_var(&cache->n_accesses);
}

+/**
+ * fscache_io_error - Note a cache I/O error
+ * @cache: The record describing the cache
+ *
+ * Note that an I/O error occurred in a cache and that it should no longer be
+ * used for anything. This also reports the error into the kernel log.
+ *
+ * See Documentation/filesystems/caching/backend-api.rst for a complete
+ * description.
+ */
+void fscache_io_error(struct fscache_cache *cache)
+{
+ if (fscache_set_cache_state_maybe(cache,
+ FSCACHE_CACHE_IS_ACTIVE,
+ FSCACHE_CACHE_GOT_IOERROR))
+ pr_err("Cache '%s' stopped due to I/O error\n",
+ cache->name);
+}
+EXPORT_SYMBOL(fscache_io_error);
+
/**
* fscache_withdraw_cache - Withdraw a cache from the active service
* @cache: The cache cookie
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index a10b66ca3544..936ef731bbc7 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -73,6 +73,8 @@ extern int fscache_add_cache(struct fscache_cache *cache,
extern void fscache_withdraw_cache(struct fscache_cache *cache);
extern void fscache_withdraw_volume(struct fscache_volume *volume);

+extern void fscache_io_error(struct fscache_cache *cache);
+
extern void fscache_end_volume_access(struct fscache_volume *volume,
struct fscache_cookie *cookie,
enum fscache_access_trace why);



2021-12-22 23:18:41

by David Howells

[permalink] [raw]
Subject: [PATCH v4 17/68] fscache: Implement simple cookie state machine

Implement a very simple cookie state machine to handle lookup,
invalidation, withdrawal, relinquishment and, to be added later, commit on
LRU discard.

Three cache methods are provided: ->lookup_cookie() to look up and, if
necessary, create a data storage object; ->withdraw_cookie() to free the
resources associated with that object and potentially delete it; and
->prepare_to_write(), to do prepare for changes to the cached data to be
modified locally.

Changes
=======
ver #3:
- Fix a race between LRU discard and relinquishment whereby the former
would override the latter and thus the latter would never happen[1].

ver #2:
- Don't hold n_accesses elevated whilst cache is bound to a cookie, but
rather add a flag that prevents the state machine from being queued when
n_accesses reaches 0.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ [1]
Link: https://lore.kernel.org/r/163819599657.215744.15799615296912341745.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906903925.143852.1805855338154353867.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967105456.1823006.14730395299835841776.stgit@warthog.procyon.org.uk/ # v3
---

fs/fscache/cookie.c | 313 +++++++++++++++++++++++++++++++++++-----
include/linux/fscache-cache.h | 27 +++
include/trace/events/fscache.h | 4 +
3 files changed, 300 insertions(+), 44 deletions(-)

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 04d2127bd354..336046de08ee 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -15,7 +15,8 @@

struct kmem_cache *fscache_cookie_jar;

-static void fscache_drop_cookie(struct fscache_cookie *cookie);
+static void fscache_cookie_worker(struct work_struct *work);
+static void fscache_unhash_cookie(struct fscache_cookie *cookie);

#define fscache_cookie_hash_shift 15
static struct hlist_bl_head fscache_cookie_hash[1 << fscache_cookie_hash_shift];
@@ -62,6 +63,19 @@ static void fscache_free_cookie(struct fscache_cookie *cookie)
kmem_cache_free(fscache_cookie_jar, cookie);
}

+static void __fscache_queue_cookie(struct fscache_cookie *cookie)
+{
+ if (!queue_work(fscache_wq, &cookie->work))
+ fscache_put_cookie(cookie, fscache_cookie_put_over_queued);
+}
+
+static void fscache_queue_cookie(struct fscache_cookie *cookie,
+ enum fscache_cookie_trace where)
+{
+ fscache_get_cookie(cookie, where);
+ __fscache_queue_cookie(cookie);
+}
+
/*
* Initialise the access gate on a cookie by setting a flag to prevent the
* state machine from being queued when the access counter transitions to 0.
@@ -98,9 +112,8 @@ void fscache_end_cookie_access(struct fscache_cookie *cookie,
trace_fscache_access(cookie->debug_id, refcount_read(&cookie->ref),
n_accesses, why);
if (n_accesses == 0 &&
- !test_bit(FSCACHE_COOKIE_NO_ACCESS_WAKE, &cookie->flags)) {
- // PLACEHOLDER: Need to poke the state machine
- }
+ !test_bit(FSCACHE_COOKIE_NO_ACCESS_WAKE, &cookie->flags))
+ fscache_queue_cookie(cookie, fscache_cookie_get_end_access);
}
EXPORT_SYMBOL(fscache_end_cookie_access);

@@ -171,35 +184,58 @@ static inline void wake_up_cookie_state(struct fscache_cookie *cookie)
wake_up_var(&cookie->state);
}

+/*
+ * Change the state a cookie is at and wake up anyone waiting for that. Impose
+ * an ordering between the stuff stored in the cookie and the state member.
+ * Paired with fscache_cookie_state().
+ */
static void __fscache_set_cookie_state(struct fscache_cookie *cookie,
enum fscache_cookie_state state)
{
- cookie->state = state;
+ smp_store_release(&cookie->state, state);
}

-/*
- * Change the state a cookie is at and wake up anyone waiting for that - but
- * only if the cookie isn't already marked as being in a cleanup state.
- */
-void fscache_set_cookie_state(struct fscache_cookie *cookie,
- enum fscache_cookie_state state)
+static void fscache_set_cookie_state(struct fscache_cookie *cookie,
+ enum fscache_cookie_state state)
{
- bool changed = false;
-
spin_lock(&cookie->lock);
- switch (cookie->state) {
- case FSCACHE_COOKIE_STATE_RELINQUISHING:
- break;
- default:
- __fscache_set_cookie_state(cookie, state);
- changed = true;
- break;
- }
+ __fscache_set_cookie_state(cookie, state);
spin_unlock(&cookie->lock);
- if (changed)
- wake_up_cookie_state(cookie);
+ wake_up_cookie_state(cookie);
+}
+
+/**
+ * fscache_cookie_lookup_negative - Note negative lookup
+ * @cookie: The cookie that was being looked up
+ *
+ * Note that some part of the metadata path in the cache doesn't exist and so
+ * we can release any waiting readers in the certain knowledge that there's
+ * nothing for them to actually read.
+ *
+ * This function uses no locking and must only be called from the state machine.
+ */
+void fscache_cookie_lookup_negative(struct fscache_cookie *cookie)
+{
+ set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
+ fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_CREATING);
+}
+EXPORT_SYMBOL(fscache_cookie_lookup_negative);
+
+/**
+ * fscache_caching_failed - Report that a failure stopped caching on a cookie
+ * @cookie: The cookie that was affected
+ *
+ * Tell fscache that caching on a cookie needs to be stopped due to some sort
+ * of failure.
+ *
+ * This function uses no locking and must only be called from the state machine.
+ */
+void fscache_caching_failed(struct fscache_cookie *cookie)
+{
+ clear_bit(FSCACHE_COOKIE_IS_CACHING, &cookie->flags);
+ fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_FAILED);
}
-EXPORT_SYMBOL(fscache_set_cookie_state);
+EXPORT_SYMBOL(fscache_caching_failed);

/*
* Set the index key in a cookie. The cookie struct has space for a 16-byte
@@ -291,10 +327,10 @@ static struct fscache_cookie *fscache_alloc_cookie(

refcount_set(&cookie->ref, 1);
cookie->debug_id = atomic_inc_return(&fscache_cookie_debug_id);
- cookie->state = FSCACHE_COOKIE_STATE_QUIESCENT;
spin_lock_init(&cookie->lock);
INIT_LIST_HEAD(&cookie->commit_link);
- INIT_WORK(&cookie->work, NULL /* PLACEHOLDER */);
+ INIT_WORK(&cookie->work, fscache_cookie_worker);
+ __fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_QUIESCENT);

write_lock(&fscache_cookies_lock);
list_add_tail(&cookie->proc_link, &fscache_cookies);
@@ -417,6 +453,192 @@ struct fscache_cookie *__fscache_acquire_cookie(
}
EXPORT_SYMBOL(__fscache_acquire_cookie);

+/*
+ * Prepare a cache object to be written to.
+ */
+static void fscache_prepare_to_write(struct fscache_cookie *cookie)
+{
+ cookie->volume->cache->ops->prepare_to_write(cookie);
+}
+
+/*
+ * Look up a cookie in the cache.
+ */
+static void fscache_perform_lookup(struct fscache_cookie *cookie)
+{
+ enum fscache_access_trace trace = fscache_access_lookup_cookie_end_failed;
+ bool need_withdraw = false;
+
+ _enter("");
+
+ if (!cookie->volume->cache_priv) {
+ fscache_create_volume(cookie->volume, true);
+ if (!cookie->volume->cache_priv) {
+ fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_QUIESCENT);
+ goto out;
+ }
+ }
+
+ if (!cookie->volume->cache->ops->lookup_cookie(cookie)) {
+ if (cookie->state != FSCACHE_COOKIE_STATE_FAILED)
+ fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_QUIESCENT);
+ need_withdraw = true;
+ _leave(" [fail]");
+ goto out;
+ }
+
+ fscache_see_cookie(cookie, fscache_cookie_see_active);
+ fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_ACTIVE);
+ trace = fscache_access_lookup_cookie_end;
+
+out:
+ fscache_end_cookie_access(cookie, trace);
+ if (need_withdraw)
+ fscache_withdraw_cookie(cookie);
+ fscache_end_volume_access(cookie->volume, cookie, trace);
+}
+
+/*
+ * Perform work upon the cookie, such as committing its cache state,
+ * relinquishing it or withdrawing the backing cache. We're protected from the
+ * cache going away under us as object withdrawal must come through this
+ * non-reentrant work item.
+ */
+static void fscache_cookie_state_machine(struct fscache_cookie *cookie)
+{
+ enum fscache_cookie_state state;
+ bool wake = false;
+
+ _enter("c=%x", cookie->debug_id);
+
+again:
+ spin_lock(&cookie->lock);
+again_locked:
+ state = cookie->state;
+ switch (state) {
+ case FSCACHE_COOKIE_STATE_QUIESCENT:
+ /* The QUIESCENT state is jumped to the LOOKING_UP state by
+ * fscache_use_cookie().
+ */
+
+ if (atomic_read(&cookie->n_accesses) == 0 &&
+ test_bit(FSCACHE_COOKIE_DO_RELINQUISH, &cookie->flags)) {
+ __fscache_set_cookie_state(cookie,
+ FSCACHE_COOKIE_STATE_RELINQUISHING);
+ wake = true;
+ goto again_locked;
+ }
+ break;
+
+ case FSCACHE_COOKIE_STATE_LOOKING_UP:
+ spin_unlock(&cookie->lock);
+ fscache_init_access_gate(cookie);
+ fscache_perform_lookup(cookie);
+ goto again;
+
+ case FSCACHE_COOKIE_STATE_ACTIVE:
+ if (test_and_clear_bit(FSCACHE_COOKIE_DO_PREP_TO_WRITE, &cookie->flags)) {
+ spin_unlock(&cookie->lock);
+ fscache_prepare_to_write(cookie);
+ spin_lock(&cookie->lock);
+ }
+ fallthrough;
+
+ case FSCACHE_COOKIE_STATE_FAILED:
+ if (atomic_read(&cookie->n_accesses) != 0)
+ break;
+ if (test_bit(FSCACHE_COOKIE_DO_RELINQUISH, &cookie->flags)) {
+ __fscache_set_cookie_state(cookie,
+ FSCACHE_COOKIE_STATE_RELINQUISHING);
+ wake = true;
+ goto again_locked;
+ }
+ if (test_bit(FSCACHE_COOKIE_DO_WITHDRAW, &cookie->flags)) {
+ __fscache_set_cookie_state(cookie,
+ FSCACHE_COOKIE_STATE_WITHDRAWING);
+ wake = true;
+ goto again_locked;
+ }
+ break;
+
+ case FSCACHE_COOKIE_STATE_RELINQUISHING:
+ case FSCACHE_COOKIE_STATE_WITHDRAWING:
+ if (cookie->cache_priv) {
+ spin_unlock(&cookie->lock);
+ cookie->volume->cache->ops->withdraw_cookie(cookie);
+ spin_lock(&cookie->lock);
+ }
+
+ switch (state) {
+ case FSCACHE_COOKIE_STATE_RELINQUISHING:
+ fscache_see_cookie(cookie, fscache_cookie_see_relinquish);
+ fscache_unhash_cookie(cookie);
+ __fscache_set_cookie_state(cookie,
+ FSCACHE_COOKIE_STATE_DROPPED);
+ wake = true;
+ goto out;
+ case FSCACHE_COOKIE_STATE_WITHDRAWING:
+ fscache_see_cookie(cookie, fscache_cookie_see_withdraw);
+ break;
+ default:
+ BUG();
+ }
+
+ clear_bit(FSCACHE_COOKIE_NEEDS_UPDATE, &cookie->flags);
+ clear_bit(FSCACHE_COOKIE_DO_WITHDRAW, &cookie->flags);
+ clear_bit(FSCACHE_COOKIE_DO_LRU_DISCARD, &cookie->flags);
+ clear_bit(FSCACHE_COOKIE_DO_PREP_TO_WRITE, &cookie->flags);
+ set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
+ __fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_QUIESCENT);
+ wake = true;
+ goto again_locked;
+
+ case FSCACHE_COOKIE_STATE_DROPPED:
+ break;
+
+ default:
+ WARN_ONCE(1, "Cookie %x in unexpected state %u\n",
+ cookie->debug_id, state);
+ break;
+ }
+
+out:
+ spin_unlock(&cookie->lock);
+ if (wake)
+ wake_up_cookie_state(cookie);
+ _leave("");
+}
+
+static void fscache_cookie_worker(struct work_struct *work)
+{
+ struct fscache_cookie *cookie = container_of(work, struct fscache_cookie, work);
+
+ fscache_see_cookie(cookie, fscache_cookie_see_work);
+ fscache_cookie_state_machine(cookie);
+ fscache_put_cookie(cookie, fscache_cookie_put_work);
+}
+
+/*
+ * Wait for the object to become inactive. The cookie's work item will be
+ * scheduled when someone transitions n_accesses to 0 - but if someone's
+ * already done that, schedule it anyway.
+ */
+static void __fscache_withdraw_cookie(struct fscache_cookie *cookie)
+{
+ int n_accesses;
+ bool unpinned;
+
+ unpinned = test_and_clear_bit(FSCACHE_COOKIE_NO_ACCESS_WAKE, &cookie->flags);
+
+ /* Need to read the access count after unpinning */
+ n_accesses = atomic_read(&cookie->n_accesses);
+ if (unpinned)
+ trace_fscache_access(cookie->debug_id, refcount_read(&cookie->ref),
+ n_accesses, fscache_access_cache_unpin);
+ if (n_accesses == 0)
+ fscache_queue_cookie(cookie, fscache_cookie_get_end_access);
+}
+
/*
* Remove a cookie from the hash table.
*/
@@ -432,21 +654,27 @@ static void fscache_unhash_cookie(struct fscache_cookie *cookie)
hlist_bl_del(&cookie->hash_link);
clear_bit(FSCACHE_COOKIE_IS_HASHED, &cookie->flags);
hlist_bl_unlock(h);
+ fscache_stat(&fscache_n_relinquishes_dropped);
}

-/*
- * Finalise a cookie after all its resources have been disposed of.
- */
-static void fscache_drop_cookie(struct fscache_cookie *cookie)
+static void fscache_drop_withdraw_cookie(struct fscache_cookie *cookie)
{
- spin_lock(&cookie->lock);
- __fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_DROPPED);
- spin_unlock(&cookie->lock);
- wake_up_cookie_state(cookie);
+ __fscache_withdraw_cookie(cookie);
+}

- fscache_unhash_cookie(cookie);
- fscache_stat(&fscache_n_relinquishes_dropped);
+/**
+ * fscache_withdraw_cookie - Mark a cookie for withdrawal
+ * @cookie: The cookie to be withdrawn.
+ *
+ * Allow the cache backend to withdraw the backing for a cookie for its own
+ * reasons, even if that cookie is in active use.
+ */
+void fscache_withdraw_cookie(struct fscache_cookie *cookie)
+{
+ set_bit(FSCACHE_COOKIE_DO_WITHDRAW, &cookie->flags);
+ fscache_drop_withdraw_cookie(cookie);
}
+EXPORT_SYMBOL(fscache_withdraw_cookie);

/*
* Allow the netfs to release a cookie back to the cache.
@@ -473,12 +701,13 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie, bool retire)
ASSERTCMP(atomic_read(&cookie->volume->n_cookies), >, 0);
atomic_dec(&cookie->volume->n_cookies);

- set_bit(FSCACHE_COOKIE_DO_RELINQUISH, &cookie->flags);
-
- if (test_bit(FSCACHE_COOKIE_HAS_BEEN_CACHED, &cookie->flags))
- ; // PLACEHOLDER: Do something here if the cookie was cached
- else
- fscache_drop_cookie(cookie);
+ if (test_bit(FSCACHE_COOKIE_HAS_BEEN_CACHED, &cookie->flags)) {
+ set_bit(FSCACHE_COOKIE_DO_RELINQUISH, &cookie->flags);
+ fscache_drop_withdraw_cookie(cookie);
+ } else {
+ fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_DROPPED);
+ fscache_unhash_cookie(cookie);
+ }
fscache_put_cookie(cookie, fscache_cookie_put_relinquish);
}
EXPORT_SYMBOL(__fscache_relinquish_cookie);
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 936ef731bbc7..ae6a75976450 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -57,6 +57,15 @@ struct fscache_cache_ops {

/* Free the cache's data attached to a volume */
void (*free_volume)(struct fscache_volume *volume);
+
+ /* Look up a cookie in the cache */
+ bool (*lookup_cookie)(struct fscache_cookie *cookie);
+
+ /* Withdraw an object without any cookie access counts held */
+ void (*withdraw_cookie)(struct fscache_cookie *cookie);
+
+ /* Prepare to write to a live cache object */
+ void (*prepare_to_write)(struct fscache_cookie *cookie);
};

extern struct workqueue_struct *fscache_wq;
@@ -72,6 +81,7 @@ extern int fscache_add_cache(struct fscache_cache *cache,
void *cache_priv);
extern void fscache_withdraw_cache(struct fscache_cache *cache);
extern void fscache_withdraw_volume(struct fscache_volume *volume);
+extern void fscache_withdraw_cookie(struct fscache_cookie *cookie);

extern void fscache_io_error(struct fscache_cache *cache);

@@ -85,8 +95,21 @@ extern void fscache_put_cookie(struct fscache_cookie *cookie,
enum fscache_cookie_trace where);
extern void fscache_end_cookie_access(struct fscache_cookie *cookie,
enum fscache_access_trace why);
-extern void fscache_set_cookie_state(struct fscache_cookie *cookie,
- enum fscache_cookie_state state);
+extern void fscache_cookie_lookup_negative(struct fscache_cookie *cookie);
+extern void fscache_caching_failed(struct fscache_cookie *cookie);
+
+/**
+ * fscache_cookie_state - Read the state of a cookie
+ * @cookie: The cookie to query
+ *
+ * Get the state of a cookie, imposing an ordering between the cookie contents
+ * and the state value. Paired with fscache_set_cookie_state().
+ */
+static inline
+enum fscache_cookie_state fscache_cookie_state(struct fscache_cookie *cookie)
+{
+ return smp_load_acquire(&cookie->state);
+}

/**
* fscache_get_key - Get a pointer to the cookie key
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index 1d576bd8112e..030c97bb9c8b 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -68,6 +68,8 @@ enum fscache_access_trace {
fscache_access_acquire_volume_end,
fscache_access_cache_pin,
fscache_access_cache_unpin,
+ fscache_access_lookup_cookie_end,
+ fscache_access_lookup_cookie_end_failed,
fscache_access_relinquish_volume,
fscache_access_relinquish_volume_end,
fscache_access_unlive,
@@ -124,6 +126,8 @@ enum fscache_access_trace {
EM(fscache_access_acquire_volume_end, "END acq_vol") \
EM(fscache_access_cache_pin, "PIN cache ") \
EM(fscache_access_cache_unpin, "UNPIN cache ") \
+ EM(fscache_access_lookup_cookie_end, "END lookup ") \
+ EM(fscache_access_lookup_cookie_end_failed,"END lookupf") \
EM(fscache_access_relinquish_volume, "BEGIN rlq_vol") \
EM(fscache_access_relinquish_volume_end,"END rlq_vol") \
E_(fscache_access_unlive, "END unlive ")



2021-12-22 23:19:01

by David Howells

[permalink] [raw]
Subject: [PATCH v4 18/68] fscache: Implement cookie user counting and resource pinning

Provide a pair of functions to count the number of users of a cookie (open
files, writeback, invalidation, resizing, reads, writes), to obtain and pin
resources for the cookie and to prevent culling for the whilst there are
users.

The first function marks a cookie as being in use:

void fscache_use_cookie(struct fscache_cookie *cookie,
bool will_modify);

The caller should indicate the cookie to use and whether or not the caller
is in a context that may modify the cookie (e.g. a file open O_RDWR).

If the cookie is not already resourced, fscache will ask the cache backend
in the background to do whatever it needs to look up, create or otherwise
obtain the resources necessary to access data. This is pinned to the
cookie and may not be culled, though it may be withdrawn if the cache as a
whole is withdrawn.

The second function removes the in-use mark from a cookie and, optionally,
updates the coherency data:

void fscache_unuse_cookie(struct fscache_cookie *cookie,
const void *aux_data,
const loff_t *object_size);

If non-NULL, the aux_data buffer and/or the object_size will be saved into
the cookie and will be set on the backing store when the object is
committed.

If this removes the last usage on a cookie, the cookie is placed onto an
LRU list from which it will be removed and closed after a couple of seconds
if it doesn't get reused. This prevents resource overload in the cache -
in particular it prevents it from holding too many files open.

Changes
=======
ver #2:
- Fix fscache_unuse_cookie() to use atomic_dec_and_lock() to avoid a
potential race if the cookie gets reused before it completes the
unusement.
- Added missing transition to LRU_DISCARDING state.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819600612.215744.13678350304176542741.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906907567.143852.16979631199380722019.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967106467.1823006.6790864931048582667.stgit@warthog.procyon.org.uk/ # v3
---

fs/fscache/cookie.c | 218 ++++++++++++++++++++++++++++++++++++++++
fs/fscache/internal.h | 5 +
fs/fscache/stats.c | 12 ++
include/linux/fscache.h | 82 +++++++++++++++
include/trace/events/fscache.h | 12 ++
5 files changed, 327 insertions(+), 2 deletions(-)

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 336046de08ee..2f5ee717f2bb 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -15,6 +15,8 @@

struct kmem_cache *fscache_cookie_jar;

+static void fscache_cookie_lru_timed_out(struct timer_list *timer);
+static void fscache_cookie_lru_worker(struct work_struct *work);
static void fscache_cookie_worker(struct work_struct *work);
static void fscache_unhash_cookie(struct fscache_cookie *cookie);

@@ -22,7 +24,12 @@ static void fscache_unhash_cookie(struct fscache_cookie *cookie);
static struct hlist_bl_head fscache_cookie_hash[1 << fscache_cookie_hash_shift];
static LIST_HEAD(fscache_cookies);
static DEFINE_RWLOCK(fscache_cookies_lock);
-static const char fscache_cookie_states[FSCACHE_COOKIE_STATE__NR] = "-LCAFWRD";
+static LIST_HEAD(fscache_cookie_lru);
+static DEFINE_SPINLOCK(fscache_cookie_lru_lock);
+DEFINE_TIMER(fscache_cookie_lru_timer, fscache_cookie_lru_timed_out);
+static DECLARE_WORK(fscache_cookie_lru_work, fscache_cookie_lru_worker);
+static const char fscache_cookie_states[FSCACHE_COOKIE_STATE__NR] = "-LCAFUWRD";
+unsigned int fscache_lru_cookie_timeout = 10 * HZ;

void fscache_print_cookie(struct fscache_cookie *cookie, char prefix)
{
@@ -47,6 +54,14 @@ void fscache_print_cookie(struct fscache_cookie *cookie, char prefix)

static void fscache_free_cookie(struct fscache_cookie *cookie)
{
+ if (WARN_ON_ONCE(!list_empty(&cookie->commit_link))) {
+ spin_lock(&fscache_cookie_lru_lock);
+ list_del_init(&cookie->commit_link);
+ spin_unlock(&fscache_cookie_lru_lock);
+ fscache_stat_d(&fscache_n_cookies_lru);
+ fscache_stat(&fscache_n_cookies_lru_removed);
+ }
+
if (WARN_ON_ONCE(test_bit(FSCACHE_COOKIE_IS_HASHED, &cookie->flags))) {
fscache_print_cookie(cookie, 'F');
return;
@@ -498,6 +513,126 @@ static void fscache_perform_lookup(struct fscache_cookie *cookie)
fscache_end_volume_access(cookie->volume, cookie, trace);
}

+/*
+ * Begin the process of looking up a cookie. We offload the actual process to
+ * a worker thread.
+ */
+static bool fscache_begin_lookup(struct fscache_cookie *cookie, bool will_modify)
+{
+ if (will_modify) {
+ set_bit(FSCACHE_COOKIE_LOCAL_WRITE, &cookie->flags);
+ set_bit(FSCACHE_COOKIE_DO_PREP_TO_WRITE, &cookie->flags);
+ }
+ if (!fscache_begin_volume_access(cookie->volume, cookie,
+ fscache_access_lookup_cookie))
+ return false;
+
+ __fscache_begin_cookie_access(cookie, fscache_access_lookup_cookie);
+ __fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_LOOKING_UP);
+ set_bit(FSCACHE_COOKIE_IS_CACHING, &cookie->flags);
+ set_bit(FSCACHE_COOKIE_HAS_BEEN_CACHED, &cookie->flags);
+ return true;
+}
+
+/*
+ * Start using the cookie for I/O. This prevents the backing object from being
+ * reaped by VM pressure.
+ */
+void __fscache_use_cookie(struct fscache_cookie *cookie, bool will_modify)
+{
+ enum fscache_cookie_state state;
+ bool queue = false;
+
+ _enter("c=%08x", cookie->debug_id);
+
+ if (WARN(test_bit(FSCACHE_COOKIE_RELINQUISHED, &cookie->flags),
+ "Trying to use relinquished cookie\n"))
+ return;
+
+ spin_lock(&cookie->lock);
+
+ atomic_inc(&cookie->n_active);
+
+again:
+ state = fscache_cookie_state(cookie);
+ switch (state) {
+ case FSCACHE_COOKIE_STATE_QUIESCENT:
+ queue = fscache_begin_lookup(cookie, will_modify);
+ break;
+
+ case FSCACHE_COOKIE_STATE_LOOKING_UP:
+ case FSCACHE_COOKIE_STATE_CREATING:
+ if (will_modify)
+ set_bit(FSCACHE_COOKIE_LOCAL_WRITE, &cookie->flags);
+ break;
+ case FSCACHE_COOKIE_STATE_ACTIVE:
+ if (will_modify &&
+ !test_and_set_bit(FSCACHE_COOKIE_LOCAL_WRITE, &cookie->flags)) {
+ set_bit(FSCACHE_COOKIE_DO_PREP_TO_WRITE, &cookie->flags);
+ queue = true;
+ }
+ break;
+
+ case FSCACHE_COOKIE_STATE_FAILED:
+ case FSCACHE_COOKIE_STATE_WITHDRAWING:
+ break;
+
+ case FSCACHE_COOKIE_STATE_LRU_DISCARDING:
+ spin_unlock(&cookie->lock);
+ wait_var_event(&cookie->state,
+ fscache_cookie_state(cookie) !=
+ FSCACHE_COOKIE_STATE_LRU_DISCARDING);
+ spin_lock(&cookie->lock);
+ goto again;
+
+ case FSCACHE_COOKIE_STATE_DROPPED:
+ case FSCACHE_COOKIE_STATE_RELINQUISHING:
+ WARN(1, "Can't use cookie in state %u\n", state);
+ break;
+ }
+
+ spin_unlock(&cookie->lock);
+ if (queue)
+ fscache_queue_cookie(cookie, fscache_cookie_get_use_work);
+ _leave("");
+}
+EXPORT_SYMBOL(__fscache_use_cookie);
+
+static void fscache_unuse_cookie_locked(struct fscache_cookie *cookie)
+{
+ clear_bit(FSCACHE_COOKIE_DISABLED, &cookie->flags);
+ if (!test_bit(FSCACHE_COOKIE_IS_CACHING, &cookie->flags))
+ return;
+
+ cookie->unused_at = jiffies;
+ spin_lock(&fscache_cookie_lru_lock);
+ if (list_empty(&cookie->commit_link)) {
+ fscache_get_cookie(cookie, fscache_cookie_get_lru);
+ fscache_stat(&fscache_n_cookies_lru);
+ }
+ list_move_tail(&cookie->commit_link, &fscache_cookie_lru);
+
+ spin_unlock(&fscache_cookie_lru_lock);
+ timer_reduce(&fscache_cookie_lru_timer,
+ jiffies + fscache_lru_cookie_timeout);
+}
+
+/*
+ * Stop using the cookie for I/O.
+ */
+void __fscache_unuse_cookie(struct fscache_cookie *cookie,
+ const void *aux_data, const loff_t *object_size)
+{
+ if (aux_data || object_size)
+ __fscache_update_cookie(cookie, aux_data, object_size);
+
+ if (atomic_dec_and_lock(&cookie->n_active, &cookie->lock)) {
+ fscache_unuse_cookie_locked(cookie);
+ spin_unlock(&cookie->lock);
+ }
+}
+EXPORT_SYMBOL(__fscache_unuse_cookie);
+
/*
* Perform work upon the cookie, such as committing its cache state,
* relinquishing it or withdrawing the backing cache. We're protected from the
@@ -542,6 +677,12 @@ static void fscache_cookie_state_machine(struct fscache_cookie *cookie)
fscache_prepare_to_write(cookie);
spin_lock(&cookie->lock);
}
+ if (test_bit(FSCACHE_COOKIE_DO_LRU_DISCARD, &cookie->flags)) {
+ __fscache_set_cookie_state(cookie,
+ FSCACHE_COOKIE_STATE_LRU_DISCARDING);
+ wake = true;
+ goto again_locked;
+ }
fallthrough;

case FSCACHE_COOKIE_STATE_FAILED:
@@ -561,6 +702,7 @@ static void fscache_cookie_state_machine(struct fscache_cookie *cookie)
}
break;

+ case FSCACHE_COOKIE_STATE_LRU_DISCARDING:
case FSCACHE_COOKIE_STATE_RELINQUISHING:
case FSCACHE_COOKIE_STATE_WITHDRAWING:
if (cookie->cache_priv) {
@@ -577,6 +719,9 @@ static void fscache_cookie_state_machine(struct fscache_cookie *cookie)
FSCACHE_COOKIE_STATE_DROPPED);
wake = true;
goto out;
+ case FSCACHE_COOKIE_STATE_LRU_DISCARDING:
+ fscache_see_cookie(cookie, fscache_cookie_see_lru_discard);
+ break;
case FSCACHE_COOKIE_STATE_WITHDRAWING:
fscache_see_cookie(cookie, fscache_cookie_see_withdraw);
break;
@@ -639,6 +784,76 @@ static void __fscache_withdraw_cookie(struct fscache_cookie *cookie)
fscache_queue_cookie(cookie, fscache_cookie_get_end_access);
}

+static void fscache_cookie_lru_do_one(struct fscache_cookie *cookie)
+{
+ fscache_see_cookie(cookie, fscache_cookie_see_lru_do_one);
+
+ spin_lock(&cookie->lock);
+ if (cookie->state != FSCACHE_COOKIE_STATE_ACTIVE ||
+ time_before(jiffies, cookie->unused_at + fscache_lru_cookie_timeout) ||
+ atomic_read(&cookie->n_active) > 0) {
+ spin_unlock(&cookie->lock);
+ fscache_stat(&fscache_n_cookies_lru_removed);
+ } else {
+ set_bit(FSCACHE_COOKIE_DO_LRU_DISCARD, &cookie->flags);
+ spin_unlock(&cookie->lock);
+ fscache_stat(&fscache_n_cookies_lru_expired);
+ _debug("lru c=%x", cookie->debug_id);
+ __fscache_withdraw_cookie(cookie);
+ }
+
+ fscache_put_cookie(cookie, fscache_cookie_put_lru);
+}
+
+static void fscache_cookie_lru_worker(struct work_struct *work)
+{
+ struct fscache_cookie *cookie;
+ unsigned long unused_at;
+
+ spin_lock(&fscache_cookie_lru_lock);
+
+ while (!list_empty(&fscache_cookie_lru)) {
+ cookie = list_first_entry(&fscache_cookie_lru,
+ struct fscache_cookie, commit_link);
+ unused_at = cookie->unused_at + fscache_lru_cookie_timeout;
+ if (time_before(jiffies, unused_at)) {
+ timer_reduce(&fscache_cookie_lru_timer, unused_at);
+ break;
+ }
+
+ list_del_init(&cookie->commit_link);
+ fscache_stat_d(&fscache_n_cookies_lru);
+ spin_unlock(&fscache_cookie_lru_lock);
+ fscache_cookie_lru_do_one(cookie);
+ spin_lock(&fscache_cookie_lru_lock);
+ }
+
+ spin_unlock(&fscache_cookie_lru_lock);
+}
+
+static void fscache_cookie_lru_timed_out(struct timer_list *timer)
+{
+ queue_work(fscache_wq, &fscache_cookie_lru_work);
+}
+
+static void fscache_cookie_drop_from_lru(struct fscache_cookie *cookie)
+{
+ bool need_put = false;
+
+ if (!list_empty(&cookie->commit_link)) {
+ spin_lock(&fscache_cookie_lru_lock);
+ if (!list_empty(&cookie->commit_link)) {
+ list_del_init(&cookie->commit_link);
+ fscache_stat_d(&fscache_n_cookies_lru);
+ fscache_stat(&fscache_n_cookies_lru_dropped);
+ need_put = true;
+ }
+ spin_unlock(&fscache_cookie_lru_lock);
+ if (need_put)
+ fscache_put_cookie(cookie, fscache_cookie_put_lru);
+ }
+}
+
/*
* Remove a cookie from the hash table.
*/
@@ -659,6 +874,7 @@ static void fscache_unhash_cookie(struct fscache_cookie *cookie)

static void fscache_drop_withdraw_cookie(struct fscache_cookie *cookie)
{
+ fscache_cookie_drop_from_lru(cookie);
__fscache_withdraw_cookie(cookie);
}

diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index e0d8ef212e82..ca938e00eaa0 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -57,6 +57,7 @@ static inline bool fscache_set_cache_state_maybe(struct fscache_cache *cache,
*/
extern struct kmem_cache *fscache_cookie_jar;
extern const struct seq_operations fscache_cookies_seq_ops;
+extern struct timer_list fscache_cookie_lru_timer;

extern void fscache_print_cookie(struct fscache_cookie *cookie, char prefix);
extern bool fscache_begin_cookie_access(struct fscache_cookie *cookie,
@@ -95,6 +96,10 @@ extern atomic_t fscache_n_volumes;
extern atomic_t fscache_n_volumes_collision;
extern atomic_t fscache_n_volumes_nomem;
extern atomic_t fscache_n_cookies;
+extern atomic_t fscache_n_cookies_lru;
+extern atomic_t fscache_n_cookies_lru_expired;
+extern atomic_t fscache_n_cookies_lru_removed;
+extern atomic_t fscache_n_cookies_lru_dropped;

extern atomic_t fscache_n_acquires;
extern atomic_t fscache_n_acquires_ok;
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index 252e883ae148..5aa4bd9fe207 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -17,6 +17,10 @@ atomic_t fscache_n_volumes;
atomic_t fscache_n_volumes_collision;
atomic_t fscache_n_volumes_nomem;
atomic_t fscache_n_cookies;
+atomic_t fscache_n_cookies_lru;
+atomic_t fscache_n_cookies_lru_expired;
+atomic_t fscache_n_cookies_lru_removed;
+atomic_t fscache_n_cookies_lru_dropped;

atomic_t fscache_n_acquires;
atomic_t fscache_n_acquires_ok;
@@ -47,6 +51,14 @@ int fscache_stats_show(struct seq_file *m, void *v)
atomic_read(&fscache_n_acquires_ok),
atomic_read(&fscache_n_acquires_oom));

+ seq_printf(m, "LRU : n=%u exp=%u rmv=%u drp=%u at=%ld\n",
+ atomic_read(&fscache_n_cookies_lru),
+ atomic_read(&fscache_n_cookies_lru_expired),
+ atomic_read(&fscache_n_cookies_lru_removed),
+ atomic_read(&fscache_n_cookies_lru_dropped),
+ timer_pending(&fscache_cookie_lru_timer) ?
+ fscache_cookie_lru_timer.expires - jiffies : 0);
+
seq_printf(m, "Updates: n=%u\n",
atomic_read(&fscache_n_updates));

diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 4450d17c11e8..e6c321e5bf73 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -22,12 +22,14 @@
#define fscache_available() (1)
#define fscache_volume_valid(volume) (volume)
#define fscache_cookie_valid(cookie) (cookie)
-#define fscache_cookie_enabled(cookie) (cookie)
+#define fscache_resources_valid(cres) ((cres)->cache_priv)
+#define fscache_cookie_enabled(cookie) (cookie && !test_bit(FSCACHE_COOKIE_DISABLED, &cookie->flags))
#else
#define __fscache_available (0)
#define fscache_available() (0)
#define fscache_volume_valid(volume) (0)
#define fscache_cookie_valid(cookie) (0)
+#define fscache_resources_valid(cres) (false)
#define fscache_cookie_enabled(cookie) (0)
#endif

@@ -46,6 +48,7 @@ enum fscache_cookie_state {
FSCACHE_COOKIE_STATE_CREATING, /* The cache object is being created */
FSCACHE_COOKIE_STATE_ACTIVE, /* The cache is active, readable and writable */
FSCACHE_COOKIE_STATE_FAILED, /* The cache failed, withdraw to clear */
+ FSCACHE_COOKIE_STATE_LRU_DISCARDING, /* The cookie is being discarded by the LRU */
FSCACHE_COOKIE_STATE_WITHDRAWING, /* The cookie is being withdrawn */
FSCACHE_COOKIE_STATE_RELINQUISHING, /* The cookie is being relinquished */
FSCACHE_COOKIE_STATE_DROPPED, /* The cookie has been dropped */
@@ -147,6 +150,8 @@ extern struct fscache_cookie *__fscache_acquire_cookie(
const void *, size_t,
const void *, size_t,
loff_t);
+extern void __fscache_use_cookie(struct fscache_cookie *, bool);
+extern void __fscache_unuse_cookie(struct fscache_cookie *, const void *, const loff_t *);
extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);

/**
@@ -228,6 +233,39 @@ struct fscache_cookie *fscache_acquire_cookie(struct fscache_volume *volume,
object_size);
}

+/**
+ * fscache_use_cookie - Request usage of cookie attached to an object
+ * @object: Object description
+ * @will_modify: If cache is expected to be modified locally
+ *
+ * Request usage of the cookie attached to an object. The caller should tell
+ * the cache if the object's contents are about to be modified locally and then
+ * the cache can apply the policy that has been set to handle this case.
+ */
+static inline void fscache_use_cookie(struct fscache_cookie *cookie,
+ bool will_modify)
+{
+ if (fscache_cookie_valid(cookie))
+ __fscache_use_cookie(cookie, will_modify);
+}
+
+/**
+ * fscache_unuse_cookie - Cease usage of cookie attached to an object
+ * @object: Object description
+ * @aux_data: Updated auxiliary data (or NULL)
+ * @object_size: Revised size of the object (or NULL)
+ *
+ * Cease usage of the cookie attached to an object. When the users count
+ * reaches zero then the cookie relinquishment will be permitted to proceed.
+ */
+static inline void fscache_unuse_cookie(struct fscache_cookie *cookie,
+ const void *aux_data,
+ const loff_t *object_size)
+{
+ if (fscache_cookie_valid(cookie))
+ __fscache_unuse_cookie(cookie, aux_data, object_size);
+}
+
/**
* fscache_relinquish_cookie - Return the cookie to the cache, maybe discarding
* it
@@ -247,4 +285,46 @@ void fscache_relinquish_cookie(struct fscache_cookie *cookie, bool retire)
__fscache_relinquish_cookie(cookie, retire);
}

+/*
+ * Find the auxiliary data on a cookie.
+ */
+static inline void *fscache_get_aux(struct fscache_cookie *cookie)
+{
+ if (cookie->aux_len <= sizeof(cookie->inline_aux))
+ return cookie->inline_aux;
+ else
+ return cookie->aux;
+}
+
+/*
+ * Update the auxiliary data on a cookie.
+ */
+static inline
+void fscache_update_aux(struct fscache_cookie *cookie,
+ const void *aux_data, const loff_t *object_size)
+{
+ void *p = fscache_get_aux(cookie);
+
+ if (aux_data && p)
+ memcpy(p, aux_data, cookie->aux_len);
+ if (object_size)
+ cookie->object_size = *object_size;
+}
+
+#ifdef CONFIG_FSCACHE_STATS
+extern atomic_t fscache_n_updates;
+#endif
+
+static inline
+void __fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data,
+ const loff_t *object_size)
+{
+#ifdef CONFIG_FSCACHE_STATS
+ atomic_inc(&fscache_n_updates);
+#endif
+ fscache_update_aux(cookie, aux_data, object_size);
+ smp_wmb();
+ set_bit(FSCACHE_COOKIE_NEEDS_UPDATE, &cookie->flags);
+}
+
#endif /* _LINUX_FSCACHE_H */
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index 030c97bb9c8b..b0409b1fad23 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -51,13 +51,18 @@ enum fscache_cookie_trace {
fscache_cookie_discard,
fscache_cookie_get_end_access,
fscache_cookie_get_hash_collision,
+ fscache_cookie_get_lru,
+ fscache_cookie_get_use_work,
fscache_cookie_new_acquire,
fscache_cookie_put_hash_collision,
+ fscache_cookie_put_lru,
fscache_cookie_put_over_queued,
fscache_cookie_put_relinquish,
fscache_cookie_put_withdrawn,
fscache_cookie_put_work,
fscache_cookie_see_active,
+ fscache_cookie_see_lru_discard,
+ fscache_cookie_see_lru_do_one,
fscache_cookie_see_relinquish,
fscache_cookie_see_withdraw,
fscache_cookie_see_work,
@@ -68,6 +73,7 @@ enum fscache_access_trace {
fscache_access_acquire_volume_end,
fscache_access_cache_pin,
fscache_access_cache_unpin,
+ fscache_access_lookup_cookie,
fscache_access_lookup_cookie_end,
fscache_access_lookup_cookie_end_failed,
fscache_access_relinquish_volume,
@@ -110,13 +116,18 @@ enum fscache_access_trace {
EM(fscache_cookie_discard, "DISCARD ") \
EM(fscache_cookie_get_hash_collision, "GET hcoll") \
EM(fscache_cookie_get_end_access, "GQ endac") \
+ EM(fscache_cookie_get_lru, "GET lru ") \
+ EM(fscache_cookie_get_use_work, "GQ use ") \
EM(fscache_cookie_new_acquire, "NEW acq ") \
EM(fscache_cookie_put_hash_collision, "PUT hcoll") \
+ EM(fscache_cookie_put_lru, "PUT lru ") \
EM(fscache_cookie_put_over_queued, "PQ overq") \
EM(fscache_cookie_put_relinquish, "PUT relnq") \
EM(fscache_cookie_put_withdrawn, "PUT wthdn") \
EM(fscache_cookie_put_work, "PQ work ") \
EM(fscache_cookie_see_active, "- activ") \
+ EM(fscache_cookie_see_lru_discard, "- x-lru") \
+ EM(fscache_cookie_see_lru_do_one, "- lrudo") \
EM(fscache_cookie_see_relinquish, "- x-rlq") \
EM(fscache_cookie_see_withdraw, "- x-wth") \
E_(fscache_cookie_see_work, "- work ")
@@ -126,6 +137,7 @@ enum fscache_access_trace {
EM(fscache_access_acquire_volume_end, "END acq_vol") \
EM(fscache_access_cache_pin, "PIN cache ") \
EM(fscache_access_cache_unpin, "UNPIN cache ") \
+ EM(fscache_access_lookup_cookie, "BEGIN lookup ") \
EM(fscache_access_lookup_cookie_end, "END lookup ") \
EM(fscache_access_lookup_cookie_end_failed,"END lookupf") \
EM(fscache_access_relinquish_volume, "BEGIN rlq_vol") \



2021-12-22 23:19:11

by David Howells

[permalink] [raw]
Subject: [PATCH v4 19/68] fscache: Implement cookie invalidation

Add a function to invalidate the cache behind a cookie:

void fscache_invalidate(struct fscache_cookie *cookie,
const void *aux_data,
loff_t size,
unsigned int flags)

This causes any cached data for the specified cookie to be discarded. If
the cookie is marked as being in use, a new cache object will be created if
possible and future I/O will use that instead. In-flight I/O should be
abandoned (writes) or reconsidered (reads). Each time it is called
cookie->inval_counter is incremented and this can be used to detect
invalidation at the end of an I/O operation.

The coherency data attached to the cookie can be updated and the cookie
size should be reset. One flag is available, FSCACHE_INVAL_DIO_WRITE,
which should be used to indicate invalidation due to a DIO write on a
file. This will temporarily disable caching for this cookie.

Changes
=======
ver #2:
- Should only change to inval state if can get access to cache.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819602231.215744.11206598147269491575.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906909707.143852.18056070560477964891.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967107447.1823006.5945029409592119962.stgit@warthog.procyon.org.uk/ # v3
---

fs/fscache/cookie.c | 88 ++++++++++++++++++++++++++++++++++++++++
fs/fscache/internal.h | 2 +
fs/fscache/stats.c | 5 ++
include/linux/fscache-cache.h | 4 ++
include/linux/fscache.h | 31 ++++++++++++++
include/linux/netfs.h | 1
include/trace/events/fscache.h | 25 +++++++++++
7 files changed, 155 insertions(+), 1 deletion(-)

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 2f5ee717f2bb..a7ea7d1db032 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -19,6 +19,7 @@ static void fscache_cookie_lru_timed_out(struct timer_list *timer);
static void fscache_cookie_lru_worker(struct work_struct *work);
static void fscache_cookie_worker(struct work_struct *work);
static void fscache_unhash_cookie(struct fscache_cookie *cookie);
+static void fscache_perform_invalidation(struct fscache_cookie *cookie);

#define fscache_cookie_hash_shift 15
static struct hlist_bl_head fscache_cookie_hash[1 << fscache_cookie_hash_shift];
@@ -28,7 +29,7 @@ static LIST_HEAD(fscache_cookie_lru);
static DEFINE_SPINLOCK(fscache_cookie_lru_lock);
DEFINE_TIMER(fscache_cookie_lru_timer, fscache_cookie_lru_timed_out);
static DECLARE_WORK(fscache_cookie_lru_work, fscache_cookie_lru_worker);
-static const char fscache_cookie_states[FSCACHE_COOKIE_STATE__NR] = "-LCAFUWRD";
+static const char fscache_cookie_states[FSCACHE_COOKIE_STATE__NR] = "-LCAIFUWRD";
unsigned int fscache_lru_cookie_timeout = 10 * HZ;

void fscache_print_cookie(struct fscache_cookie *cookie, char prefix)
@@ -236,6 +237,19 @@ void fscache_cookie_lookup_negative(struct fscache_cookie *cookie)
}
EXPORT_SYMBOL(fscache_cookie_lookup_negative);

+/**
+ * fscache_resume_after_invalidation - Allow I/O to resume after invalidation
+ * @cookie: The cookie that was invalidated
+ *
+ * Tell fscache that invalidation is sufficiently complete that I/O can be
+ * allowed again.
+ */
+void fscache_resume_after_invalidation(struct fscache_cookie *cookie)
+{
+ fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_ACTIVE);
+}
+EXPORT_SYMBOL(fscache_resume_after_invalidation);
+
/**
* fscache_caching_failed - Report that a failure stopped caching on a cookie
* @cookie: The cookie that was affected
@@ -566,6 +580,7 @@ void __fscache_use_cookie(struct fscache_cookie *cookie, bool will_modify)
set_bit(FSCACHE_COOKIE_LOCAL_WRITE, &cookie->flags);
break;
case FSCACHE_COOKIE_STATE_ACTIVE:
+ case FSCACHE_COOKIE_STATE_INVALIDATING:
if (will_modify &&
!test_and_set_bit(FSCACHE_COOKIE_LOCAL_WRITE, &cookie->flags)) {
set_bit(FSCACHE_COOKIE_DO_PREP_TO_WRITE, &cookie->flags);
@@ -671,6 +686,11 @@ static void fscache_cookie_state_machine(struct fscache_cookie *cookie)
fscache_perform_lookup(cookie);
goto again;

+ case FSCACHE_COOKIE_STATE_INVALIDATING:
+ spin_unlock(&cookie->lock);
+ fscache_perform_invalidation(cookie);
+ goto again;
+
case FSCACHE_COOKIE_STATE_ACTIVE:
if (test_and_clear_bit(FSCACHE_COOKIE_DO_PREP_TO_WRITE, &cookie->flags)) {
spin_unlock(&cookie->lock);
@@ -962,6 +982,72 @@ struct fscache_cookie *fscache_get_cookie(struct fscache_cookie *cookie,
}
EXPORT_SYMBOL(fscache_get_cookie);

+/*
+ * Ask the cache to effect invalidation of a cookie.
+ */
+static void fscache_perform_invalidation(struct fscache_cookie *cookie)
+{
+ if (!cookie->volume->cache->ops->invalidate_cookie(cookie))
+ fscache_caching_failed(cookie);
+ fscache_end_cookie_access(cookie, fscache_access_invalidate_cookie_end);
+}
+
+/*
+ * Invalidate an object.
+ */
+void __fscache_invalidate(struct fscache_cookie *cookie,
+ const void *aux_data, loff_t new_size,
+ unsigned int flags)
+{
+ bool is_caching;
+
+ _enter("c=%x", cookie->debug_id);
+
+ fscache_stat(&fscache_n_invalidates);
+
+ if (WARN(test_bit(FSCACHE_COOKIE_RELINQUISHED, &cookie->flags),
+ "Trying to invalidate relinquished cookie\n"))
+ return;
+
+ if ((flags & FSCACHE_INVAL_DIO_WRITE) &&
+ test_and_set_bit(FSCACHE_COOKIE_DISABLED, &cookie->flags))
+ return;
+
+ spin_lock(&cookie->lock);
+ set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
+ fscache_update_aux(cookie, aux_data, &new_size);
+ cookie->inval_counter++;
+ trace_fscache_invalidate(cookie, new_size);
+
+ switch (cookie->state) {
+ case FSCACHE_COOKIE_STATE_INVALIDATING: /* is_still_valid will catch it */
+ default:
+ spin_unlock(&cookie->lock);
+ _leave(" [no %u]", cookie->state);
+ return;
+
+ case FSCACHE_COOKIE_STATE_LOOKING_UP:
+ case FSCACHE_COOKIE_STATE_CREATING:
+ spin_unlock(&cookie->lock);
+ _leave(" [look %x]", cookie->inval_counter);
+ return;
+
+ case FSCACHE_COOKIE_STATE_ACTIVE:
+ is_caching = fscache_begin_cookie_access(
+ cookie, fscache_access_invalidate_cookie);
+ if (is_caching)
+ __fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_INVALIDATING);
+ spin_unlock(&cookie->lock);
+ wake_up_cookie_state(cookie);
+
+ if (is_caching)
+ fscache_queue_cookie(cookie, fscache_cookie_get_inval_work);
+ _leave(" [inv]");
+ return;
+ }
+}
+EXPORT_SYMBOL(__fscache_invalidate);
+
/*
* Generate a list of extant cookies in /proc/fs/fscache/cookies
*/
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index ca938e00eaa0..7fb83d216360 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -105,6 +105,8 @@ extern atomic_t fscache_n_acquires;
extern atomic_t fscache_n_acquires_ok;
extern atomic_t fscache_n_acquires_oom;

+extern atomic_t fscache_n_invalidates;
+
extern atomic_t fscache_n_relinquishes;
extern atomic_t fscache_n_relinquishes_retire;
extern atomic_t fscache_n_relinquishes_dropped;
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index 5aa4bd9fe207..cdbb672a274f 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -26,6 +26,8 @@ atomic_t fscache_n_acquires;
atomic_t fscache_n_acquires_ok;
atomic_t fscache_n_acquires_oom;

+atomic_t fscache_n_invalidates;
+
atomic_t fscache_n_updates;
EXPORT_SYMBOL(fscache_n_updates);

@@ -59,6 +61,9 @@ int fscache_stats_show(struct seq_file *m, void *v)
timer_pending(&fscache_cookie_lru_timer) ?
fscache_cookie_lru_timer.expires - jiffies : 0);

+ seq_printf(m, "Invals : n=%u\n",
+ atomic_read(&fscache_n_invalidates));
+
seq_printf(m, "Updates: n=%u\n",
atomic_read(&fscache_n_updates));

diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index ae6a75976450..1ad56bfd9d72 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -64,6 +64,9 @@ struct fscache_cache_ops {
/* Withdraw an object without any cookie access counts held */
void (*withdraw_cookie)(struct fscache_cookie *cookie);

+ /* Invalidate an object */
+ bool (*invalidate_cookie)(struct fscache_cookie *cookie);
+
/* Prepare to write to a live cache object */
void (*prepare_to_write)(struct fscache_cookie *cookie);
};
@@ -96,6 +99,7 @@ extern void fscache_put_cookie(struct fscache_cookie *cookie,
extern void fscache_end_cookie_access(struct fscache_cookie *cookie,
enum fscache_access_trace why);
extern void fscache_cookie_lookup_negative(struct fscache_cookie *cookie);
+extern void fscache_resume_after_invalidation(struct fscache_cookie *cookie);
extern void fscache_caching_failed(struct fscache_cookie *cookie);

/**
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index e6c321e5bf73..0f36d1fac237 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -39,6 +39,8 @@ struct fscache_cookie;
#define FSCACHE_ADV_WRITE_CACHE 0x00 /* Do cache if written to locally */
#define FSCACHE_ADV_WRITE_NOCACHE 0x02 /* Don't cache if written to locally */

+#define FSCACHE_INVAL_DIO_WRITE 0x01 /* Invalidate due to DIO write */
+
/*
* Data object state.
*/
@@ -47,6 +49,7 @@ enum fscache_cookie_state {
FSCACHE_COOKIE_STATE_LOOKING_UP, /* The cache object is being looked up */
FSCACHE_COOKIE_STATE_CREATING, /* The cache object is being created */
FSCACHE_COOKIE_STATE_ACTIVE, /* The cache is active, readable and writable */
+ FSCACHE_COOKIE_STATE_INVALIDATING, /* The cache is being invalidated */
FSCACHE_COOKIE_STATE_FAILED, /* The cache failed, withdraw to clear */
FSCACHE_COOKIE_STATE_LRU_DISCARDING, /* The cookie is being discarded by the LRU */
FSCACHE_COOKIE_STATE_WITHDRAWING, /* The cookie is being withdrawn */
@@ -153,6 +156,7 @@ extern struct fscache_cookie *__fscache_acquire_cookie(
extern void __fscache_use_cookie(struct fscache_cookie *, bool);
extern void __fscache_unuse_cookie(struct fscache_cookie *, const void *, const loff_t *);
extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
+extern void __fscache_invalidate(struct fscache_cookie *, const void *, loff_t, unsigned int);

/**
* fscache_acquire_volume - Register a volume as desiring caching services
@@ -327,4 +331,31 @@ void __fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data
set_bit(FSCACHE_COOKIE_NEEDS_UPDATE, &cookie->flags);
}

+/**
+ * fscache_invalidate - Notify cache that an object needs invalidation
+ * @cookie: The cookie representing the cache object
+ * @aux_data: The updated auxiliary data for the cookie (may be NULL)
+ * @size: The revised size of the object.
+ * @flags: Invalidation flags (FSCACHE_INVAL_*)
+ *
+ * Notify the cache that an object is needs to be invalidated and that it
+ * should abort any retrievals or stores it is doing on the cache. This
+ * increments inval_counter on the cookie which can be used by the caller to
+ * reconsider I/O requests as they complete.
+ *
+ * If @flags has FSCACHE_INVAL_DIO_WRITE set, this indicates that this is due
+ * to a direct I/O write and will cause caching to be disabled on this cookie
+ * until it is completely unused.
+ *
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
+ * description.
+ */
+static inline
+void fscache_invalidate(struct fscache_cookie *cookie,
+ const void *aux_data, loff_t size, unsigned int flags)
+{
+ if (fscache_cookie_enabled(cookie))
+ __fscache_invalidate(cookie, aux_data, size, flags);
+}
+
#endif /* _LINUX_FSCACHE_H */
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 1ea22fc48818..5a46fde65759 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -124,6 +124,7 @@ struct netfs_cache_resources {
void *cache_priv;
void *cache_priv2;
unsigned int debug_id; /* Cookie debug ID */
+ unsigned int inval_counter; /* object->inval_counter at begin_op */
};

/*
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index b0409b1fad23..294792881434 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -51,6 +51,7 @@ enum fscache_cookie_trace {
fscache_cookie_discard,
fscache_cookie_get_end_access,
fscache_cookie_get_hash_collision,
+ fscache_cookie_get_inval_work,
fscache_cookie_get_lru,
fscache_cookie_get_use_work,
fscache_cookie_new_acquire,
@@ -73,6 +74,8 @@ enum fscache_access_trace {
fscache_access_acquire_volume_end,
fscache_access_cache_pin,
fscache_access_cache_unpin,
+ fscache_access_invalidate_cookie,
+ fscache_access_invalidate_cookie_end,
fscache_access_lookup_cookie,
fscache_access_lookup_cookie_end,
fscache_access_lookup_cookie_end_failed,
@@ -116,6 +119,7 @@ enum fscache_access_trace {
EM(fscache_cookie_discard, "DISCARD ") \
EM(fscache_cookie_get_hash_collision, "GET hcoll") \
EM(fscache_cookie_get_end_access, "GQ endac") \
+ EM(fscache_cookie_get_inval_work, "GQ inval") \
EM(fscache_cookie_get_lru, "GET lru ") \
EM(fscache_cookie_get_use_work, "GQ use ") \
EM(fscache_cookie_new_acquire, "NEW acq ") \
@@ -137,6 +141,8 @@ enum fscache_access_trace {
EM(fscache_access_acquire_volume_end, "END acq_vol") \
EM(fscache_access_cache_pin, "PIN cache ") \
EM(fscache_access_cache_unpin, "UNPIN cache ") \
+ EM(fscache_access_invalidate_cookie, "BEGIN inval ") \
+ EM(fscache_access_invalidate_cookie_end,"END inval ") \
EM(fscache_access_lookup_cookie, "BEGIN lookup ") \
EM(fscache_access_lookup_cookie_end, "END lookup ") \
EM(fscache_access_lookup_cookie_end_failed,"END lookupf") \
@@ -385,6 +391,25 @@ TRACE_EVENT(fscache_relinquish,
__entry->n_active, __entry->flags, __entry->retire)
);

+TRACE_EVENT(fscache_invalidate,
+ TP_PROTO(struct fscache_cookie *cookie, loff_t new_size),
+
+ TP_ARGS(cookie, new_size),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, cookie )
+ __field(loff_t, new_size )
+ ),
+
+ TP_fast_assign(
+ __entry->cookie = cookie->debug_id;
+ __entry->new_size = new_size;
+ ),
+
+ TP_printk("c=%08x sz=%llx",
+ __entry->cookie, __entry->new_size)
+ );
+
#endif /* _TRACE_FSCACHE_H */

/* This part must be outside protection */



2021-12-22 23:19:28

by David Howells

[permalink] [raw]
Subject: [PATCH v4 20/68] fscache: Provide a means to begin an operation

Provide a function to begin a read operation:

int fscache_begin_read_operation(
struct netfs_cache_resources *cres,
struct fscache_cookie *cookie)

This is primarily intended to be called by network filesystems on behalf of
netfslib, but may also be called to use the I/O access functions directly.
It attaches the resources required by the cache to cres struct from the
supplied cookie.

This holds access to the cache behind the cookie for the duration of the
operation and forces cache withdrawal and cookie invalidation to perform
synchronisation on the operation. cres->inval_counter is set from the
cookie at this point so that it can be compared at the end of the
operation.

Note that this does not guarantee that the cache state is fully set up and
able to perform I/O immediately; looking up and creation may be left in
progress in the background. The operations intended to be called by the
network filesystem, such as reading and writing, are expected to wait for
the cookie to move to the correct state.

This will, however, potentially sleep, waiting for a certain minimum state
to be set or for operations such as invalidate to advance far enough that
I/O can resume.


Also provide a function for the cache to call to wait for the cache object
to get to a state where it can be used for certain things:

bool fscache_wait_for_operation(struct netfs_cache_resources *cres,
enum fscache_want_stage stage);

This looks at the cache resources provided by the begin function and waits
for them to get to an appropriate stage. There's a choice of wanting just
some parameters (FSCACHE_WANT_PARAM) or the ability to do I/O
(FSCACHE_WANT_READ or FSCACHE_WANT_WRITE).

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819603692.215744.146724961588817028.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906910672.143852.13856103384424986357.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967110245.1823006.2239170567540431836.stgit@warthog.procyon.org.uk/ # v3
---

fs/fscache/Makefile | 1
fs/fscache/internal.h | 11 +++
fs/fscache/io.c | 151 ++++++++++++++++++++++++++++++++++++++++
include/linux/fscache-cache.h | 11 +++
include/linux/fscache.h | 49 +++++++++++++
include/trace/events/fscache.h | 6 ++
6 files changed, 229 insertions(+)
create mode 100644 fs/fscache/io.c

diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile
index bcc79615f93a..afb090ea16c4 100644
--- a/fs/fscache/Makefile
+++ b/fs/fscache/Makefile
@@ -6,6 +6,7 @@
fscache-y := \
cache.o \
cookie.o \
+ io.o \
main.o \
volume.o

diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 7fb83d216360..017bf3d346a4 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -70,6 +70,17 @@ static inline void fscache_see_cookie(struct fscache_cookie *cookie,
where);
}

+/*
+ * io.c
+ */
+static inline void fscache_end_operation(struct netfs_cache_resources *cres)
+{
+ const struct netfs_cache_ops *ops = fscache_operation_valid(cres);
+
+ if (ops)
+ ops->end_operation(cres);
+}
+
/*
* main.c
*/
diff --git a/fs/fscache/io.c b/fs/fscache/io.c
new file mode 100644
index 000000000000..460a43473019
--- /dev/null
+++ b/fs/fscache/io.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Cache data I/O routines
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+#define FSCACHE_DEBUG_LEVEL OPERATION
+#include <linux/fscache-cache.h>
+#include <linux/uio.h>
+#include <linux/bvec.h>
+#include <linux/slab.h>
+#include <linux/uio.h>
+#include "internal.h"
+
+/**
+ * fscache_wait_for_operation - Wait for an object become accessible
+ * @cres: The cache resources for the operation being performed
+ * @want_state: The minimum state the object must be at
+ *
+ * See if the target cache object is at the specified minimum state of
+ * accessibility yet, and if not, wait for it.
+ */
+bool fscache_wait_for_operation(struct netfs_cache_resources *cres,
+ enum fscache_want_state want_state)
+{
+ struct fscache_cookie *cookie = fscache_cres_cookie(cres);
+ enum fscache_cookie_state state;
+
+again:
+ if (!fscache_cache_is_live(cookie->volume->cache)) {
+ _leave(" [broken]");
+ return false;
+ }
+
+ state = fscache_cookie_state(cookie);
+ _enter("c=%08x{%u},%x", cookie->debug_id, state, want_state);
+
+ switch (state) {
+ case FSCACHE_COOKIE_STATE_CREATING:
+ case FSCACHE_COOKIE_STATE_INVALIDATING:
+ if (want_state == FSCACHE_WANT_PARAMS)
+ goto ready; /* There can be no content */
+ fallthrough;
+ case FSCACHE_COOKIE_STATE_LOOKING_UP:
+ case FSCACHE_COOKIE_STATE_LRU_DISCARDING:
+ wait_var_event(&cookie->state,
+ fscache_cookie_state(cookie) != state);
+ goto again;
+
+ case FSCACHE_COOKIE_STATE_ACTIVE:
+ goto ready;
+ case FSCACHE_COOKIE_STATE_DROPPED:
+ case FSCACHE_COOKIE_STATE_RELINQUISHING:
+ default:
+ _leave(" [not live]");
+ return false;
+ }
+
+ready:
+ if (!cres->cache_priv2)
+ return cookie->volume->cache->ops->begin_operation(cres, want_state);
+ return true;
+}
+EXPORT_SYMBOL(fscache_wait_for_operation);
+
+/*
+ * Begin an I/O operation on the cache, waiting till we reach the right state.
+ *
+ * Attaches the resources required to the operation resources record.
+ */
+static int fscache_begin_operation(struct netfs_cache_resources *cres,
+ struct fscache_cookie *cookie,
+ enum fscache_want_state want_state,
+ enum fscache_access_trace why)
+{
+ enum fscache_cookie_state state;
+ long timeo;
+ bool once_only = false;
+
+ cres->ops = NULL;
+ cres->cache_priv = cookie;
+ cres->cache_priv2 = NULL;
+ cres->debug_id = cookie->debug_id;
+ cres->inval_counter = cookie->inval_counter;
+
+ if (!fscache_begin_cookie_access(cookie, why))
+ return -ENOBUFS;
+
+again:
+ spin_lock(&cookie->lock);
+
+ state = fscache_cookie_state(cookie);
+ _enter("c=%08x{%u},%x", cookie->debug_id, state, want_state);
+
+ switch (state) {
+ case FSCACHE_COOKIE_STATE_LOOKING_UP:
+ case FSCACHE_COOKIE_STATE_LRU_DISCARDING:
+ case FSCACHE_COOKIE_STATE_INVALIDATING:
+ goto wait_for_file_wrangling;
+ case FSCACHE_COOKIE_STATE_CREATING:
+ if (want_state == FSCACHE_WANT_PARAMS)
+ goto ready; /* There can be no content */
+ goto wait_for_file_wrangling;
+ case FSCACHE_COOKIE_STATE_ACTIVE:
+ goto ready;
+ case FSCACHE_COOKIE_STATE_DROPPED:
+ case FSCACHE_COOKIE_STATE_RELINQUISHING:
+ WARN(1, "Can't use cookie in state %u\n", cookie->state);
+ goto not_live;
+ default:
+ goto not_live;
+ }
+
+ready:
+ spin_unlock(&cookie->lock);
+ if (!cookie->volume->cache->ops->begin_operation(cres, want_state))
+ goto failed;
+ return 0;
+
+wait_for_file_wrangling:
+ spin_unlock(&cookie->lock);
+ trace_fscache_access(cookie->debug_id, refcount_read(&cookie->ref),
+ atomic_read(&cookie->n_accesses),
+ fscache_access_io_wait);
+ timeo = wait_var_event_timeout(&cookie->state,
+ fscache_cookie_state(cookie) != state, 20 * HZ);
+ if (timeo <= 1 && !once_only) {
+ pr_warn("%s: cookie state change wait timed out: cookie->state=%u state=%u",
+ __func__, fscache_cookie_state(cookie), state);
+ fscache_print_cookie(cookie, 'O');
+ once_only = true;
+ }
+ goto again;
+
+not_live:
+ spin_unlock(&cookie->lock);
+failed:
+ cres->cache_priv = NULL;
+ cres->ops = NULL;
+ fscache_end_cookie_access(cookie, fscache_access_io_not_live);
+ _leave(" = -ENOBUFS");
+ return -ENOBUFS;
+}
+
+int __fscache_begin_read_operation(struct netfs_cache_resources *cres,
+ struct fscache_cookie *cookie)
+{
+ return fscache_begin_operation(cres, cookie, FSCACHE_WANT_PARAMS,
+ fscache_access_io_read);
+}
+EXPORT_SYMBOL(__fscache_begin_read_operation);
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 1ad56bfd9d72..566497cf5f13 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -67,6 +67,10 @@ struct fscache_cache_ops {
/* Invalidate an object */
bool (*invalidate_cookie)(struct fscache_cookie *cookie);

+ /* Begin an operation for the netfs lib */
+ bool (*begin_operation)(struct netfs_cache_resources *cres,
+ enum fscache_want_state want_state);
+
/* Prepare to write to a live cache object */
void (*prepare_to_write)(struct fscache_cookie *cookie);
};
@@ -101,6 +105,8 @@ extern void fscache_end_cookie_access(struct fscache_cookie *cookie,
extern void fscache_cookie_lookup_negative(struct fscache_cookie *cookie);
extern void fscache_resume_after_invalidation(struct fscache_cookie *cookie);
extern void fscache_caching_failed(struct fscache_cookie *cookie);
+extern bool fscache_wait_for_operation(struct netfs_cache_resources *cred,
+ enum fscache_want_state state);

/**
* fscache_cookie_state - Read the state of a cookie
@@ -129,4 +135,9 @@ static inline void *fscache_get_key(struct fscache_cookie *cookie)
return cookie->key;
}

+static inline struct fscache_cookie *fscache_cres_cookie(struct netfs_cache_resources *cres)
+{
+ return cres->cache_priv;
+}
+
#endif /* _LINUX_FSCACHE_CACHE_H */
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 0f36d1fac237..7cdc63c4fe35 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -41,6 +41,12 @@ struct fscache_cookie;

#define FSCACHE_INVAL_DIO_WRITE 0x01 /* Invalidate due to DIO write */

+enum fscache_want_state {
+ FSCACHE_WANT_PARAMS,
+ FSCACHE_WANT_WRITE,
+ FSCACHE_WANT_READ,
+};
+
/*
* Data object state.
*/
@@ -157,6 +163,7 @@ extern void __fscache_use_cookie(struct fscache_cookie *, bool);
extern void __fscache_unuse_cookie(struct fscache_cookie *, const void *, const loff_t *);
extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
extern void __fscache_invalidate(struct fscache_cookie *, const void *, loff_t, unsigned int);
+extern int __fscache_begin_read_operation(struct netfs_cache_resources *, struct fscache_cookie *);

/**
* fscache_acquire_volume - Register a volume as desiring caching services
@@ -358,4 +365,46 @@ void fscache_invalidate(struct fscache_cookie *cookie,
__fscache_invalidate(cookie, aux_data, size, flags);
}

+/**
+ * fscache_operation_valid - Return true if operations resources are usable
+ * @cres: The resources to check.
+ *
+ * Returns a pointer to the operations table if usable or NULL if not.
+ */
+static inline
+const struct netfs_cache_ops *fscache_operation_valid(const struct netfs_cache_resources *cres)
+{
+ return fscache_resources_valid(cres) ? cres->ops : NULL;
+}
+
+/**
+ * fscache_begin_read_operation - Begin a read operation for the netfs lib
+ * @cres: The cache resources for the read being performed
+ * @cookie: The cookie representing the cache object
+ *
+ * Begin a read operation on behalf of the netfs helper library. @cres
+ * indicates the cache resources to which the operation state should be
+ * attached; @cookie indicates the cache object that will be accessed.
+ *
+ * This is intended to be called from the ->begin_cache_operation() netfs lib
+ * operation as implemented by the network filesystem.
+ *
+ * @cres->inval_counter is set from @cookie->inval_counter for comparison at
+ * the end of the operation. This allows invalidation during the operation to
+ * be detected by the caller.
+ *
+ * Returns:
+ * * 0 - Success
+ * * -ENOBUFS - No caching available
+ * * Other error code from the cache, such as -ENOMEM.
+ */
+static inline
+int fscache_begin_read_operation(struct netfs_cache_resources *cres,
+ struct fscache_cookie *cookie)
+{
+ if (fscache_cookie_enabled(cookie))
+ return __fscache_begin_read_operation(cres, cookie);
+ return -ENOBUFS;
+}
+
#endif /* _LINUX_FSCACHE_H */
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index 294792881434..9f78c903b00a 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -76,6 +76,9 @@ enum fscache_access_trace {
fscache_access_cache_unpin,
fscache_access_invalidate_cookie,
fscache_access_invalidate_cookie_end,
+ fscache_access_io_not_live,
+ fscache_access_io_read,
+ fscache_access_io_wait,
fscache_access_lookup_cookie,
fscache_access_lookup_cookie_end,
fscache_access_lookup_cookie_end_failed,
@@ -143,6 +146,9 @@ enum fscache_access_trace {
EM(fscache_access_cache_unpin, "UNPIN cache ") \
EM(fscache_access_invalidate_cookie, "BEGIN inval ") \
EM(fscache_access_invalidate_cookie_end,"END inval ") \
+ EM(fscache_access_io_not_live, "END io_notl") \
+ EM(fscache_access_io_read, "BEGIN io_read") \
+ EM(fscache_access_io_wait, "WAIT io ") \
EM(fscache_access_lookup_cookie, "BEGIN lookup ") \
EM(fscache_access_lookup_cookie_end, "END lookup ") \
EM(fscache_access_lookup_cookie_end_failed,"END lookupf") \



2021-12-22 23:19:48

by David Howells

[permalink] [raw]
Subject: [PATCH v4 21/68] fscache: Count data storage objects in a cache

Count the data storage objects that are currently allocated in a cache.
This is used to pin certain cache structures until cache withdrawal is
complete.

Three helpers are provided to manage and make use of the count:

(1) void fscache_count_object(struct fscache_cache *cache);

This should be called by the cache backend to note that an object has
been allocated and attached to the cache.

(2) void fscache_uncount_object(struct fscache_cache *cache);

This should be called by the backend to note that an object has been
destroyed. This sends a wakeup event that allows cache withdrawal to
proceed if it was waiting for that object.

(3) void fscache_wait_for_objects(struct fscache_cache *cache);

This can be used by the backend to wait for all outstanding cache
object to be destroyed.

Each cache's counter is displayed as part of /proc/fs/fscache/caches.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819608594.215744.1812706538117388252.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906911646.143852.168184059935530127.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967111846.1823006.9868154941573671255.stgit@warthog.procyon.org.uk/ # v3
---

fs/fscache/cache.c | 2 ++
include/linux/fscache-cache.h | 39 +++++++++++++++++++++++++++++++++++++++
2 files changed, 41 insertions(+)

diff --git a/fs/fscache/cache.c b/fs/fscache/cache.c
index 25eac61f1c29..2749933852a9 100644
--- a/fs/fscache/cache.c
+++ b/fs/fscache/cache.c
@@ -13,6 +13,8 @@
static LIST_HEAD(fscache_caches);
DECLARE_RWSEM(fscache_addremove_sem);
EXPORT_SYMBOL(fscache_addremove_sem);
+DECLARE_WAIT_QUEUE_HEAD(fscache_clearance_waiters);
+EXPORT_SYMBOL(fscache_clearance_waiters);

static atomic_t fscache_cache_debug_id;

diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 566497cf5f13..337335d7a5e2 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -76,6 +76,7 @@ struct fscache_cache_ops {
};

extern struct workqueue_struct *fscache_wq;
+extern wait_queue_head_t fscache_clearance_waiters;

/*
* out-of-line cache backend functions
@@ -140,4 +141,42 @@ static inline struct fscache_cookie *fscache_cres_cookie(struct netfs_cache_reso
return cres->cache_priv;
}

+/**
+ * fscache_count_object - Tell fscache that an object has been added
+ * @cache: The cache to account to
+ *
+ * Tell fscache that an object has been added to the cache. This prevents the
+ * cache from tearing down the cache structure until the object is uncounted.
+ */
+static inline void fscache_count_object(struct fscache_cache *cache)
+{
+ atomic_inc(&cache->object_count);
+}
+
+/**
+ * fscache_uncount_object - Tell fscache that an object has been removed
+ * @cache: The cache to account to
+ *
+ * Tell fscache that an object has been removed from the cache and will no
+ * longer be accessed. After this point, the cache cookie may be destroyed.
+ */
+static inline void fscache_uncount_object(struct fscache_cache *cache)
+{
+ if (atomic_dec_and_test(&cache->object_count))
+ wake_up_all(&fscache_clearance_waiters);
+}
+
+/**
+ * fscache_wait_for_objects - Wait for all objects to be withdrawn
+ * @cache: The cache to query
+ *
+ * Wait for all extant objects in a cache to finish being withdrawn
+ * and go away.
+ */
+static inline void fscache_wait_for_objects(struct fscache_cache *cache)
+{
+ wait_event(fscache_clearance_waiters,
+ atomic_read(&cache->object_count) == 0);
+}
+
#endif /* _LINUX_FSCACHE_CACHE_H */



2021-12-22 23:19:51

by David Howells

[permalink] [raw]
Subject: [PATCH v4 22/68] fscache: Provide read/write stat counters for the cache

Provide read/write stat counters for the cache backend to use.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819609532.215744.10821082637727410554.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906912598.143852.12960327989649429069.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967113830.1823006.3222957649202368162.stgit@warthog.procyon.org.uk/ # v3
---

fs/fscache/stats.c | 9 +++++++++
include/linux/fscache-cache.h | 10 ++++++++++
2 files changed, 19 insertions(+)

diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index cdbb672a274f..db42beb1ba3f 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -35,6 +35,11 @@ atomic_t fscache_n_relinquishes;
atomic_t fscache_n_relinquishes_retire;
atomic_t fscache_n_relinquishes_dropped;

+atomic_t fscache_n_read;
+EXPORT_SYMBOL(fscache_n_read);
+atomic_t fscache_n_write;
+EXPORT_SYMBOL(fscache_n_write);
+
/*
* display the general statistics
*/
@@ -72,6 +77,10 @@ int fscache_stats_show(struct seq_file *m, void *v)
atomic_read(&fscache_n_relinquishes_retire),
atomic_read(&fscache_n_relinquishes_dropped));

+ seq_printf(m, "IO : rd=%u wr=%u\n",
+ atomic_read(&fscache_n_read),
+ atomic_read(&fscache_n_write));
+
netfs_stats_show(m);
return 0;
}
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 337335d7a5e2..796c8b5c5305 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -179,4 +179,14 @@ static inline void fscache_wait_for_objects(struct fscache_cache *cache)
atomic_read(&cache->object_count) == 0);
}

+#ifdef CONFIG_FSCACHE_STATS
+extern atomic_t fscache_n_read;
+extern atomic_t fscache_n_write;
+#define fscache_count_read() atomic_inc(&fscache_n_read)
+#define fscache_count_write() atomic_inc(&fscache_n_write)
+#else
+#define fscache_count_read() do {} while(0)
+#define fscache_count_write() do {} while(0)
+#endif
+
#endif /* _LINUX_FSCACHE_CACHE_H */



2021-12-22 23:20:00

by David Howells

[permalink] [raw]
Subject: [PATCH v4 23/68] fscache: Provide a function to let the netfs update its coherency data

Provide a function to let the netfs update its coherency data:

void fscache_update_cookie(struct fscache_cookie *cookie,
const void *aux_data,
const loff_t *object_size);

This will update the auxiliary data and/or the size of the object attached
to a cookie if either pointer is not-NULL and flag that the disk needs to
be updated.

Note that fscache_unuse_cookie() also allows this to be done.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819610438.215744.4223265964131424954.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906913530.143852.18150303220217653820.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967117795.1823006.7493373142653442595.stgit@warthog.procyon.org.uk/ # v3
---

include/linux/fscache.h | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 7cdc63c4fe35..fc77648c8af6 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -338,6 +338,28 @@ void __fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data
set_bit(FSCACHE_COOKIE_NEEDS_UPDATE, &cookie->flags);
}

+/**
+ * fscache_update_cookie - Request that a cache object be updated
+ * @cookie: The cookie representing the cache object
+ * @aux_data: The updated auxiliary data for the cookie (may be NULL)
+ * @object_size: The current size of the object (may be NULL)
+ *
+ * Request an update of the index data for the cache object associated with the
+ * cookie. The auxiliary data on the cookie will be updated first if @aux_data
+ * is set and the object size will be updated and the object possibly trimmed
+ * if @object_size is set.
+ *
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
+ * description.
+ */
+static inline
+void fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data,
+ const loff_t *object_size)
+{
+ if (fscache_cookie_enabled(cookie))
+ __fscache_update_cookie(cookie, aux_data, object_size);
+}
+
/**
* fscache_invalidate - Notify cache that an object needs invalidation
* @cookie: The cookie representing the cache object



2021-12-22 23:20:17

by David Howells

[permalink] [raw]
Subject: [PATCH v4 24/68] netfs: Pass more information on how to deal with a hole in the cache

Pass more information to the cache on how to deal with a hole if it
encounters one when trying to read from the cache. Three options are
provided:

(1) NETFS_READ_HOLE_IGNORE. Read the hole along with the data, assuming
it to be a punched-out extent by the backing filesystem.

(2) NETFS_READ_HOLE_CLEAR. If there's a hole, erase the requested region
of the cache and clear the read buffer.

(3) NETFS_READ_HOLE_FAIL. Fail the read if a hole is detected.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819612321.215744.9738308885948264476.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906914460.143852.6284247083607910189.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967119923.1823006.15637375885194297582.stgit@warthog.procyon.org.uk/ # v3
---

fs/netfs/read_helper.c | 8 ++++----
include/linux/netfs.h | 11 ++++++++++-
2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 9dd76b8914f2..6169659857b3 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -170,7 +170,7 @@ static void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error
*/
static void netfs_read_from_cache(struct netfs_read_request *rreq,
struct netfs_read_subrequest *subreq,
- bool seek_data)
+ enum netfs_read_from_hole read_hole)
{
struct netfs_cache_resources *cres = &rreq->cache_resources;
struct iov_iter iter;
@@ -180,7 +180,7 @@ static void netfs_read_from_cache(struct netfs_read_request *rreq,
subreq->start + subreq->transferred,
subreq->len - subreq->transferred);

- cres->ops->read(cres, subreq->start, &iter, seek_data,
+ cres->ops->read(cres, subreq->start, &iter, read_hole,
netfs_cache_read_terminated, subreq);
}

@@ -461,7 +461,7 @@ static void netfs_rreq_short_read(struct netfs_read_request *rreq,
netfs_get_read_subrequest(subreq);
atomic_inc(&rreq->nr_rd_ops);
if (subreq->source == NETFS_READ_FROM_CACHE)
- netfs_read_from_cache(rreq, subreq, true);
+ netfs_read_from_cache(rreq, subreq, NETFS_READ_HOLE_CLEAR);
else
netfs_read_from_server(rreq, subreq);
}
@@ -789,7 +789,7 @@ static bool netfs_rreq_submit_slice(struct netfs_read_request *rreq,
netfs_read_from_server(rreq, subreq);
break;
case NETFS_READ_FROM_CACHE:
- netfs_read_from_cache(rreq, subreq, false);
+ netfs_read_from_cache(rreq, subreq, NETFS_READ_HOLE_IGNORE);
break;
default:
BUG();
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 5a46fde65759..b46c39d98bbd 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -196,6 +196,15 @@ struct netfs_read_request_ops {
void (*cleanup)(struct address_space *mapping, void *netfs_priv);
};

+/*
+ * How to handle reading from a hole.
+ */
+enum netfs_read_from_hole {
+ NETFS_READ_HOLE_IGNORE,
+ NETFS_READ_HOLE_CLEAR,
+ NETFS_READ_HOLE_FAIL,
+};
+
/*
* Table of operations for access to a cache. This is obtained by
* rreq->ops->begin_cache_operation().
@@ -208,7 +217,7 @@ struct netfs_cache_ops {
int (*read)(struct netfs_cache_resources *cres,
loff_t start_pos,
struct iov_iter *iter,
- bool seek_data,
+ enum netfs_read_from_hole read_hole,
netfs_io_terminated_t term_func,
void *term_func_priv);




2021-12-22 23:20:30

by David Howells

[permalink] [raw]
Subject: [PATCH v4 25/68] fscache: Implement raw I/O interface

Provide a pair of functions to perform raw I/O on the cache. The first
function allows an arbitrary asynchronous direct-IO read to be made against
a cache object, though the read should be aligned and sized appropriately
for the backing device:

int fscache_read(struct netfs_cache_resources *cres,
loff_t start_pos,
struct iov_iter *iter,
enum netfs_read_from_hole read_hole,
netfs_io_terminated_t term_func,
void *term_func_priv);

The cache resources must have been previously initialised by
fscache_begin_read_operation(). A read operation is sent to the backing
filesystem, starting at start_pos within the file. The size of the read is
specified by the iterator, as is the location of the output buffer.

If there is a hole in the data it can be ignored and left to the backing
filesystem to deal with (NETFS_READ_HOLE_IGNORE), a hole at the beginning
can be skipped over and the buffer padded with zeros
(NETFS_READ_HOLE_CLEAR) or -ENODATA can be given (NETFS_READ_HOLE_FAIL).

If term_func is not NULL, the operation may be performed asynchronously.
Upon completion, successful or otherwise, (*term_func)() will be called and
passed term_func_priv, along with an error or the amount of data
transferred. If the op is run asynchronously, fscache_read() will return
-EIOCBQUEUED.

The second function allows an arbitrary asynchronous direct-IO write to be
made against a cache object, though the write should be aligned and sized
appropriately for the backing device:

int fscache_write(struct netfs_cache_resources *cres,
loff_t start_pos,
struct iov_iter *iter,
netfs_io_terminated_t term_func,
void *term_func_priv);

This works in very similar way to fscache_read(), except that there's no
need to deal with holes (they're just overwritten).

The caller is responsible for preventing concurrent overlapping writes.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819613224.215744.7877577215582621254.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906915386.143852.16936177636106480724.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967122632.1823006.7487049517698562172.stgit@warthog.procyon.org.uk/ # v3
---

include/linux/fscache.h | 74 ++++++++++++++++++++++++++++++++++++++++
include/trace/events/fscache.h | 2 +
2 files changed, 76 insertions(+)

diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index fc77648c8af6..ae753cae0fdd 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -429,4 +429,78 @@ int fscache_begin_read_operation(struct netfs_cache_resources *cres,
return -ENOBUFS;
}

+/**
+ * fscache_read - Start a read from the cache.
+ * @cres: The cache resources to use
+ * @start_pos: The beginning file offset in the cache file
+ * @iter: The buffer to fill - and also the length
+ * @read_hole: How to handle a hole in the data.
+ * @term_func: The function to call upon completion
+ * @term_func_priv: The private data for @term_func
+ *
+ * Start a read from the cache. @cres indicates the cache object to read from
+ * and must be obtained by a call to fscache_begin_operation() beforehand.
+ *
+ * The data is read into the iterator, @iter, and that also indicates the size
+ * of the operation. @start_pos is the start position in the file, though if
+ * @seek_data is set appropriately, the cache can use SEEK_DATA to find the
+ * next piece of data, writing zeros for the hole into the iterator.
+ *
+ * Upon termination of the operation, @term_func will be called and supplied
+ * with @term_func_priv plus the amount of data written, if successful, or the
+ * error code otherwise.
+ *
+ * @read_hole indicates how a partially populated region in the cache should be
+ * handled. It can be one of a number of settings:
+ *
+ * NETFS_READ_HOLE_IGNORE - Just try to read (may return a short read).
+ *
+ * NETFS_READ_HOLE_CLEAR - Seek for data, clearing the part of the buffer
+ * skipped over, then do as for IGNORE.
+ *
+ * NETFS_READ_HOLE_FAIL - Give ENODATA if we encounter a hole.
+ */
+static inline
+int fscache_read(struct netfs_cache_resources *cres,
+ loff_t start_pos,
+ struct iov_iter *iter,
+ enum netfs_read_from_hole read_hole,
+ netfs_io_terminated_t term_func,
+ void *term_func_priv)
+{
+ const struct netfs_cache_ops *ops = fscache_operation_valid(cres);
+ return ops->read(cres, start_pos, iter, read_hole,
+ term_func, term_func_priv);
+}
+
+/**
+ * fscache_write - Start a write to the cache.
+ * @cres: The cache resources to use
+ * @start_pos: The beginning file offset in the cache file
+ * @iter: The data to write - and also the length
+ * @term_func: The function to call upon completion
+ * @term_func_priv: The private data for @term_func
+ *
+ * Start a write to the cache. @cres indicates the cache object to write to and
+ * must be obtained by a call to fscache_begin_operation() beforehand.
+ *
+ * The data to be written is obtained from the iterator, @iter, and that also
+ * indicates the size of the operation. @start_pos is the start position in
+ * the file.
+ *
+ * Upon termination of the operation, @term_func will be called and supplied
+ * with @term_func_priv plus the amount of data written, if successful, or the
+ * error code otherwise.
+ */
+static inline
+int fscache_write(struct netfs_cache_resources *cres,
+ loff_t start_pos,
+ struct iov_iter *iter,
+ netfs_io_terminated_t term_func,
+ void *term_func_priv)
+{
+ const struct netfs_cache_ops *ops = fscache_operation_valid(cres);
+ return ops->write(cres, start_pos, iter, term_func, term_func_priv);
+}
+
#endif /* _LINUX_FSCACHE_H */
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index 9f78c903b00a..2459d75659cf 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -79,6 +79,7 @@ enum fscache_access_trace {
fscache_access_io_not_live,
fscache_access_io_read,
fscache_access_io_wait,
+ fscache_access_io_write,
fscache_access_lookup_cookie,
fscache_access_lookup_cookie_end,
fscache_access_lookup_cookie_end_failed,
@@ -149,6 +150,7 @@ enum fscache_access_trace {
EM(fscache_access_io_not_live, "END io_notl") \
EM(fscache_access_io_read, "BEGIN io_read") \
EM(fscache_access_io_wait, "WAIT io ") \
+ EM(fscache_access_io_write, "BEGIN io_writ") \
EM(fscache_access_lookup_cookie, "BEGIN lookup ") \
EM(fscache_access_lookup_cookie_end, "END lookup ") \
EM(fscache_access_lookup_cookie_end_failed,"END lookupf") \



2021-12-22 23:20:57

by David Howells

[permalink] [raw]
Subject: [PATCH v4 26/68] fscache: Implement higher-level write I/O interface

Provide a higher-level function than fscache_write() to perform a write
from an inode's pagecache to the cache, whilst fending off concurrent
writes by means of the PG_fscache mark on a page:

void fscache_write_to_cache(struct fscache_cookie *cookie,
struct address_space *mapping,
loff_t start,
size_t len,
loff_t i_size,
netfs_io_terminated_t term_func,
void *term_func_priv,
bool caching);

If caching is false, this function does nothing except call (*term_func)()
if given. It assumes that, in such a case, PG_fscache will not have been
set on the pages.

Otherwise, if caching is true, this function requires the source pages to
have had PG_fscache set on them before calling. start and len define the
region of the file to be modified and i_size indicates the new file size.
The source pages are extracted from the mapping.

term_func and term_func_priv work as for fscache_write(). The PG_fscache
marks will be cleared at the end of the operation, before term_func is
called or the function otherwise returns.

There is an additonal helper function to clear the PG_fscache bits from a
range of pages:

void fscache_clear_page_bits(struct fscache_cookie *cookie,
struct address_space *mapping,
loff_t start, size_t len,
bool caching);

If caching is true, the pages to be managed are expected to be located on
mapping in the range defined by start and len. If caching is false, it
does nothing.

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819614155.215744.5528123235123721230.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906916346.143852.15632773570362489926.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967123599.1823006.12946816026724657428.stgit@warthog.procyon.org.uk/ # v3
---

fs/fscache/io.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++
include/linux/fscache.h | 63 ++++++++++++++++++++++++++++
2 files changed, 167 insertions(+)

diff --git a/fs/fscache/io.c b/fs/fscache/io.c
index 460a43473019..74cde7acf434 100644
--- a/fs/fscache/io.c
+++ b/fs/fscache/io.c
@@ -149,3 +149,107 @@ int __fscache_begin_read_operation(struct netfs_cache_resources *cres,
fscache_access_io_read);
}
EXPORT_SYMBOL(__fscache_begin_read_operation);
+
+struct fscache_write_request {
+ struct netfs_cache_resources cache_resources;
+ struct address_space *mapping;
+ loff_t start;
+ size_t len;
+ bool set_bits;
+ netfs_io_terminated_t term_func;
+ void *term_func_priv;
+};
+
+void __fscache_clear_page_bits(struct address_space *mapping,
+ loff_t start, size_t len)
+{
+ pgoff_t first = start / PAGE_SIZE;
+ pgoff_t last = (start + len - 1) / PAGE_SIZE;
+ struct page *page;
+
+ if (len) {
+ XA_STATE(xas, &mapping->i_pages, first);
+
+ rcu_read_lock();
+ xas_for_each(&xas, page, last) {
+ end_page_fscache(page);
+ }
+ rcu_read_unlock();
+ }
+}
+EXPORT_SYMBOL(__fscache_clear_page_bits);
+
+/*
+ * Deal with the completion of writing the data to the cache.
+ */
+static void fscache_wreq_done(void *priv, ssize_t transferred_or_error,
+ bool was_async)
+{
+ struct fscache_write_request *wreq = priv;
+
+ fscache_clear_page_bits(fscache_cres_cookie(&wreq->cache_resources),
+ wreq->mapping, wreq->start, wreq->len,
+ wreq->set_bits);
+
+ if (wreq->term_func)
+ wreq->term_func(wreq->term_func_priv, transferred_or_error,
+ was_async);
+ fscache_end_operation(&wreq->cache_resources);
+ kfree(wreq);
+}
+
+void __fscache_write_to_cache(struct fscache_cookie *cookie,
+ struct address_space *mapping,
+ loff_t start, size_t len, loff_t i_size,
+ netfs_io_terminated_t term_func,
+ void *term_func_priv,
+ bool cond)
+{
+ struct fscache_write_request *wreq;
+ struct netfs_cache_resources *cres;
+ struct iov_iter iter;
+ int ret = -ENOBUFS;
+
+ if (len == 0)
+ goto abandon;
+
+ _enter("%llx,%zx", start, len);
+
+ wreq = kzalloc(sizeof(struct fscache_write_request), GFP_NOFS);
+ if (!wreq)
+ goto abandon;
+ wreq->mapping = mapping;
+ wreq->start = start;
+ wreq->len = len;
+ wreq->set_bits = cond;
+ wreq->term_func = term_func;
+ wreq->term_func_priv = term_func_priv;
+
+ cres = &wreq->cache_resources;
+ if (fscache_begin_operation(cres, cookie, FSCACHE_WANT_WRITE,
+ fscache_access_io_write) < 0)
+ goto abandon_free;
+
+ ret = cres->ops->prepare_write(cres, &start, &len, i_size, false);
+ if (ret < 0)
+ goto abandon_end;
+
+ /* TODO: Consider clearing page bits now for space the write isn't
+ * covering. This is more complicated than it appears when THPs are
+ * taken into account.
+ */
+
+ iov_iter_xarray(&iter, WRITE, &mapping->i_pages, start, len);
+ fscache_write(cres, start, &iter, fscache_wreq_done, wreq);
+ return;
+
+abandon_end:
+ return fscache_wreq_done(wreq, ret, false);
+abandon_free:
+ kfree(wreq);
+abandon:
+ fscache_clear_page_bits(cookie, mapping, start, len, cond);
+ if (term_func)
+ term_func(term_func_priv, ret, false);
+}
+EXPORT_SYMBOL(__fscache_write_to_cache);
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index ae753cae0fdd..9d469613e16c 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -165,6 +165,11 @@ extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
extern void __fscache_invalidate(struct fscache_cookie *, const void *, loff_t, unsigned int);
extern int __fscache_begin_read_operation(struct netfs_cache_resources *, struct fscache_cookie *);

+extern void __fscache_write_to_cache(struct fscache_cookie *, struct address_space *,
+ loff_t, size_t, loff_t, netfs_io_terminated_t, void *,
+ bool);
+extern void __fscache_clear_page_bits(struct address_space *, loff_t, size_t);
+
/**
* fscache_acquire_volume - Register a volume as desiring caching services
* @volume_key: An identification string for the volume
@@ -503,4 +508,62 @@ int fscache_write(struct netfs_cache_resources *cres,
return ops->write(cres, start_pos, iter, term_func, term_func_priv);
}

+/**
+ * fscache_clear_page_bits - Clear the PG_fscache bits from a set of pages
+ * @cookie: The cookie representing the cache object
+ * @mapping: The netfs inode to use as the source
+ * @start: The start position in @mapping
+ * @len: The amount of data to unlock
+ * @caching: If PG_fscache has been set
+ *
+ * Clear the PG_fscache flag from a sequence of pages and wake up anyone who's
+ * waiting.
+ */
+static inline void fscache_clear_page_bits(struct fscache_cookie *cookie,
+ struct address_space *mapping,
+ loff_t start, size_t len,
+ bool caching)
+{
+ if (caching)
+ __fscache_clear_page_bits(mapping, start, len);
+}
+
+/**
+ * fscache_write_to_cache - Save a write to the cache and clear PG_fscache
+ * @cookie: The cookie representing the cache object
+ * @mapping: The netfs inode to use as the source
+ * @start: The start position in @mapping
+ * @len: The amount of data to write back
+ * @i_size: The new size of the inode
+ * @term_func: The function to call upon completion
+ * @term_func_priv: The private data for @term_func
+ * @caching: If PG_fscache has been set
+ *
+ * Helper function for a netfs to write dirty data from an inode into the cache
+ * object that's backing it.
+ *
+ * @start and @len describe the range of the data. This does not need to be
+ * page-aligned, but to satisfy DIO requirements, the cache may expand it up to
+ * the page boundaries on either end. All the pages covering the range must be
+ * marked with PG_fscache.
+ *
+ * If given, @term_func will be called upon completion and supplied with
+ * @term_func_priv. Note that the PG_fscache flags will have been cleared by
+ * this point, so the netfs must retain its own pin on the mapping.
+ */
+static inline void fscache_write_to_cache(struct fscache_cookie *cookie,
+ struct address_space *mapping,
+ loff_t start, size_t len, loff_t i_size,
+ netfs_io_terminated_t term_func,
+ void *term_func_priv,
+ bool caching)
+{
+ if (caching)
+ __fscache_write_to_cache(cookie, mapping, start, len, i_size,
+ term_func, term_func_priv, caching);
+ else if (term_func)
+ term_func(term_func_priv, -ENOBUFS, false);
+
+}
+
#endif /* _LINUX_FSCACHE_H */



2021-12-22 23:21:06

by David Howells

[permalink] [raw]
Subject: [PATCH v4 27/68] vfs, fscache: Implement pinning of cache usage for writeback

Cachefiles has a problem in that it needs to keep the backing file for a
cookie open whilst there are local modifications pending that need to be
written to it. However, we don't want to keep the file open indefinitely,
as that causes EMFILE/ENFILE/ENOMEM problems.

Reopening the cache file, however, is a problem if this is being done due
to writeback triggered by exit(). Some filesystems will oops if we try to
open a file in that context because they want to access current->fs or
other resources that have already been dismantled.

To get around this, I added the following:

(1) An inode flag, I_PINNING_FSCACHE_WB, to be set on a network filesystem
inode to indicate that we have a usage count on the cookie caching
that inode.

(2) A flag in struct writeback_control, unpinned_fscache_wb, that is set
when __writeback_single_inode() clears the last dirty page from
i_pages - at which point it clears I_PINNING_FSCACHE_WB and sets this
flag.

This has to be done here so that clearing I_PINNING_FSCACHE_WB can be
done atomically with the check of PAGECACHE_TAG_DIRTY that clears
I_DIRTY_PAGES.

(3) A function, fscache_set_page_dirty(), which if it is not set, sets
I_PINNING_FSCACHE_WB and calls fscache_use_cookie() to pin the cache
resources.

(4) A function, fscache_unpin_writeback(), to be called by ->write_inode()
to unuse the cookie.

(5) A function, fscache_clear_inode_writeback(), to be called when the
inode is evicted, before clear_inode() is called. This cleans up any
lingering I_PINNING_FSCACHE_WB.

The network filesystem can then use these tools to make sure that
fscache_write_to_cache() can write locally modified data to the cache as
well as to the server.

For the future, I'm working on write helpers for netfs lib that should
allow this facility to be removed by keeping track of the dirty regions
separately - but that's incomplete at the moment and is also going to be
affected by folios, one way or another, since it deals with pages

Signed-off-by: David Howells <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819615157.215744.17623791756928043114.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906917856.143852.8224898306177154573.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967124567.1823006.14188359004568060298.stgit@warthog.procyon.org.uk/ # v3
---

fs/fs-writeback.c | 8 ++++++++
fs/fscache/io.c | 38 ++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 3 +++
include/linux/fscache.h | 41 +++++++++++++++++++++++++++++++++++++++++
include/linux/writeback.h | 1 +
5 files changed, 91 insertions(+)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 67f0e88eed01..8294a60ce323 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1666,6 +1666,13 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)

if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
inode->i_state |= I_DIRTY_PAGES;
+ else if (unlikely(inode->i_state & I_PINNING_FSCACHE_WB)) {
+ if (!(inode->i_state & I_DIRTY_PAGES)) {
+ inode->i_state &= ~I_PINNING_FSCACHE_WB;
+ wbc->unpinned_fscache_wb = true;
+ dirty |= I_PINNING_FSCACHE_WB; /* Cause write_inode */
+ }
+ }

spin_unlock(&inode->i_lock);

@@ -1675,6 +1682,7 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
if (ret == 0)
ret = err;
}
+ wbc->unpinned_fscache_wb = false;
trace_writeback_single_inode(inode, wbc, nr_to_write);
return ret;
}
diff --git a/fs/fscache/io.c b/fs/fscache/io.c
index 74cde7acf434..e9e5d6758ea8 100644
--- a/fs/fscache/io.c
+++ b/fs/fscache/io.c
@@ -150,6 +150,44 @@ int __fscache_begin_read_operation(struct netfs_cache_resources *cres,
}
EXPORT_SYMBOL(__fscache_begin_read_operation);

+/**
+ * fscache_set_page_dirty - Mark page dirty and pin a cache object for writeback
+ * @page: The page being dirtied
+ * @cookie: The cookie referring to the cache object
+ *
+ * Set the dirty flag on a page and pin an in-use cache object in memory when
+ * dirtying a page so that writeback can later write to it. This is intended
+ * to be called from the filesystem's ->set_page_dirty() method.
+ *
+ * Returns 1 if PG_dirty was set on the page, 0 otherwise.
+ */
+int fscache_set_page_dirty(struct page *page, struct fscache_cookie *cookie)
+{
+ struct inode *inode = page->mapping->host;
+ bool need_use = false;
+
+ _enter("");
+
+ if (!__set_page_dirty_nobuffers(page))
+ return 0;
+ if (!fscache_cookie_valid(cookie))
+ return 1;
+
+ if (!(inode->i_state & I_PINNING_FSCACHE_WB)) {
+ spin_lock(&inode->i_lock);
+ if (!(inode->i_state & I_PINNING_FSCACHE_WB)) {
+ inode->i_state |= I_PINNING_FSCACHE_WB;
+ need_use = true;
+ }
+ spin_unlock(&inode->i_lock);
+
+ if (need_use)
+ fscache_use_cookie(cookie, true);
+ }
+ return 1;
+}
+EXPORT_SYMBOL(fscache_set_page_dirty);
+
struct fscache_write_request {
struct netfs_cache_resources cache_resources;
struct address_space *mapping;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index bbf812ce89a8..2c0b8e77d9ab 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2418,6 +2418,8 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
* Used to detect that mark_inode_dirty() should not move
* inode between dirty lists.
*
+ * I_PINNING_FSCACHE_WB Inode is pinning an fscache object for writeback.
+ *
* Q: What is the difference between I_WILL_FREE and I_FREEING?
*/
#define I_DIRTY_SYNC (1 << 0)
@@ -2440,6 +2442,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
#define I_CREATING (1 << 15)
#define I_DONTCACHE (1 << 16)
#define I_SYNC_QUEUED (1 << 17)
+#define I_PINNING_FSCACHE_WB (1 << 18)

#define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
#define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 9d469613e16c..18e725671594 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -16,6 +16,7 @@

#include <linux/fs.h>
#include <linux/netfs.h>
+#include <linux/writeback.h>

#if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE)
#define __fscache_available (1)
@@ -566,4 +567,44 @@ static inline void fscache_write_to_cache(struct fscache_cookie *cookie,

}

+#if __fscache_available
+extern int fscache_set_page_dirty(struct page *page, struct fscache_cookie *cookie);
+#else
+#define fscache_set_page_dirty(PAGE, COOKIE) (__set_page_dirty_nobuffers((PAGE)))
+#endif
+
+/**
+ * fscache_unpin_writeback - Unpin writeback resources
+ * @wbc: The writeback control
+ * @cookie: The cookie referring to the cache object
+ *
+ * Unpin the writeback resources pinned by fscache_set_page_dirty(). This is
+ * intended to be called by the netfs's ->write_inode() method.
+ */
+static inline void fscache_unpin_writeback(struct writeback_control *wbc,
+ struct fscache_cookie *cookie)
+{
+ if (wbc->unpinned_fscache_wb)
+ fscache_unuse_cookie(cookie, NULL, NULL);
+}
+
+/**
+ * fscache_clear_inode_writeback - Clear writeback resources pinned by an inode
+ * @cookie: The cookie referring to the cache object
+ * @inode: The inode to clean up
+ * @aux: Auxiliary data to apply to the inode
+ *
+ * Clear any writeback resources held by an inode when the inode is evicted.
+ * This must be called before clear_inode() is called.
+ */
+static inline void fscache_clear_inode_writeback(struct fscache_cookie *cookie,
+ struct inode *inode,
+ const void *aux)
+{
+ if (inode->i_state & I_PINNING_FSCACHE_WB) {
+ loff_t i_size = i_size_read(inode);
+ fscache_unuse_cookie(cookie, aux, &i_size);
+ }
+}
+
#endif /* _LINUX_FSCACHE_H */
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 3bfd487d1dd2..fec248ab1fec 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -68,6 +68,7 @@ struct writeback_control {
unsigned for_reclaim:1; /* Invoked from the page allocator */
unsigned range_cyclic:1; /* range_start is cyclic */
unsigned for_sync:1; /* sync(2) WB_SYNC_ALL writeback */
+ unsigned unpinned_fscache_wb:1; /* Cleared I_PINNING_FSCACHE_WB */

/*
* When writeback IOs are bounced through async layers, only the



2021-12-22 23:21:20

by David Howells

[permalink] [raw]
Subject: [PATCH v4 28/68] fscache: Provide a function to note the release of a page

Provide a function to be called from a network filesystem's releasepage
method to indicate that a page has been released that might have been a
reflection of data upon the server - and now that data must be reloaded
from the server or the cache.

This is used to end an optimisation for empty files, in particular files
that have just been created locally, whereby we know there cannot yet be
any data that we would need to read from the server or the cache.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819617128.215744.4725572296135656508.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906920354.143852.7511819614661372008.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967128061.1823006.611781655060034988.stgit@warthog.procyon.org.uk/ # v3
---

include/linux/fscache.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 18e725671594..28ce258c1f87 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -607,4 +607,20 @@ static inline void fscache_clear_inode_writeback(struct fscache_cookie *cookie,
}
}

+/**
+ * fscache_note_page_release - Note that a netfs page got released
+ * @cookie: The cookie corresponding to the file
+ *
+ * Note that a page that has been copied to the cache has been released. This
+ * means that future reads will need to look in the cache to see if it's there.
+ */
+static inline
+void fscache_note_page_release(struct fscache_cookie *cookie)
+{
+ if (cookie &&
+ test_bit(FSCACHE_COOKIE_HAVE_DATA, &cookie->flags) &&
+ test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags))
+ clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
+}
+
#endif /* _LINUX_FSCACHE_H */



2021-12-22 23:21:34

by David Howells

[permalink] [raw]
Subject: [PATCH v4 29/68] fscache: Provide a function to resize a cookie

Provide a function to change the size of the storage attached to a cookie,
to match the size of the file being cached when it's changed by truncate or
fallocate:

void fscache_resize_cookie(struct fscache_cookie *cookie,
loff_t new_size);

This acts synchronously and is expected to run under the inode lock of the
caller.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819621839.215744.7895597119803515402.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906922387.143852.16394459879816147793.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967128998.1823006.10740669081985775576.stgit@warthog.procyon.org.uk/ # v3
---

fs/fscache/internal.h | 3 +++
fs/fscache/io.c | 25 +++++++++++++++++++++++++
fs/fscache/stats.c | 9 +++++++--
include/linux/fscache-cache.h | 4 ++++
include/linux/fscache.h | 18 ++++++++++++++++++
include/trace/events/fscache.h | 25 +++++++++++++++++++++++++
6 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 017bf3d346a4..f121c21590dc 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -122,6 +122,9 @@ extern atomic_t fscache_n_relinquishes;
extern atomic_t fscache_n_relinquishes_retire;
extern atomic_t fscache_n_relinquishes_dropped;

+extern atomic_t fscache_n_resizes;
+extern atomic_t fscache_n_resizes_null;
+
static inline void fscache_stat(atomic_t *stat)
{
atomic_inc(stat);
diff --git a/fs/fscache/io.c b/fs/fscache/io.c
index e9e5d6758ea8..bed7628a5a9d 100644
--- a/fs/fscache/io.c
+++ b/fs/fscache/io.c
@@ -291,3 +291,28 @@ void __fscache_write_to_cache(struct fscache_cookie *cookie,
term_func(term_func_priv, ret, false);
}
EXPORT_SYMBOL(__fscache_write_to_cache);
+
+/*
+ * Change the size of a backing object.
+ */
+void __fscache_resize_cookie(struct fscache_cookie *cookie, loff_t new_size)
+{
+ struct netfs_cache_resources cres;
+
+ trace_fscache_resize(cookie, new_size);
+ if (fscache_begin_operation(&cres, cookie, FSCACHE_WANT_WRITE,
+ fscache_access_io_resize) == 0) {
+ fscache_stat(&fscache_n_resizes);
+ set_bit(FSCACHE_COOKIE_NEEDS_UPDATE, &cookie->flags);
+
+ /* We cannot defer a resize as we need to do it inside the
+ * netfs's inode lock so that we're serialised with respect to
+ * writes.
+ */
+ cookie->volume->cache->ops->resize_cookie(&cres, new_size);
+ fscache_end_operation(&cres);
+ } else {
+ fscache_stat(&fscache_n_resizes_null);
+ }
+}
+EXPORT_SYMBOL(__fscache_resize_cookie);
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index db42beb1ba3f..798ee68b3e9d 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -35,6 +35,9 @@ atomic_t fscache_n_relinquishes;
atomic_t fscache_n_relinquishes_retire;
atomic_t fscache_n_relinquishes_dropped;

+atomic_t fscache_n_resizes;
+atomic_t fscache_n_resizes_null;
+
atomic_t fscache_n_read;
EXPORT_SYMBOL(fscache_n_read);
atomic_t fscache_n_write;
@@ -69,8 +72,10 @@ int fscache_stats_show(struct seq_file *m, void *v)
seq_printf(m, "Invals : n=%u\n",
atomic_read(&fscache_n_invalidates));

- seq_printf(m, "Updates: n=%u\n",
- atomic_read(&fscache_n_updates));
+ seq_printf(m, "Updates: n=%u rsz=%u rsn=%u\n",
+ atomic_read(&fscache_n_updates),
+ atomic_read(&fscache_n_resizes),
+ atomic_read(&fscache_n_resizes_null));

seq_printf(m, "Relinqs: n=%u rtr=%u drop=%u\n",
atomic_read(&fscache_n_relinquishes),
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 796c8b5c5305..3fa4902dc87c 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -64,6 +64,10 @@ struct fscache_cache_ops {
/* Withdraw an object without any cookie access counts held */
void (*withdraw_cookie)(struct fscache_cookie *cookie);

+ /* Change the size of a data object */
+ void (*resize_cookie)(struct netfs_cache_resources *cres,
+ loff_t new_size);
+
/* Invalidate an object */
bool (*invalidate_cookie)(struct fscache_cookie *cookie);

diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 28ce258c1f87..86b1c0db1de5 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -163,6 +163,7 @@ extern struct fscache_cookie *__fscache_acquire_cookie(
extern void __fscache_use_cookie(struct fscache_cookie *, bool);
extern void __fscache_unuse_cookie(struct fscache_cookie *, const void *, const loff_t *);
extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
+extern void __fscache_resize_cookie(struct fscache_cookie *, loff_t);
extern void __fscache_invalidate(struct fscache_cookie *, const void *, loff_t, unsigned int);
extern int __fscache_begin_read_operation(struct netfs_cache_resources *, struct fscache_cookie *);

@@ -366,6 +367,23 @@ void fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data,
__fscache_update_cookie(cookie, aux_data, object_size);
}

+/**
+ * fscache_resize_cookie - Request that a cache object be resized
+ * @cookie: The cookie representing the cache object
+ * @new_size: The new size of the object (may be NULL)
+ *
+ * Request that the size of an object be changed.
+ *
+ * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * description.
+ */
+static inline
+void fscache_resize_cookie(struct fscache_cookie *cookie, loff_t new_size)
+{
+ if (fscache_cookie_enabled(cookie))
+ __fscache_resize_cookie(cookie, new_size);
+}
+
/**
* fscache_invalidate - Notify cache that an object needs invalidation
* @cookie: The cookie representing the cache object
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index 2459d75659cf..5fa37a8b4ec7 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -78,6 +78,7 @@ enum fscache_access_trace {
fscache_access_invalidate_cookie_end,
fscache_access_io_not_live,
fscache_access_io_read,
+ fscache_access_io_resize,
fscache_access_io_wait,
fscache_access_io_write,
fscache_access_lookup_cookie,
@@ -149,6 +150,7 @@ enum fscache_access_trace {
EM(fscache_access_invalidate_cookie_end,"END inval ") \
EM(fscache_access_io_not_live, "END io_notl") \
EM(fscache_access_io_read, "BEGIN io_read") \
+ EM(fscache_access_io_resize, "BEGIN io_resz") \
EM(fscache_access_io_wait, "WAIT io ") \
EM(fscache_access_io_write, "BEGIN io_writ") \
EM(fscache_access_lookup_cookie, "BEGIN lookup ") \
@@ -418,6 +420,29 @@ TRACE_EVENT(fscache_invalidate,
__entry->cookie, __entry->new_size)
);

+TRACE_EVENT(fscache_resize,
+ TP_PROTO(struct fscache_cookie *cookie, loff_t new_size),
+
+ TP_ARGS(cookie, new_size),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, cookie )
+ __field(loff_t, old_size )
+ __field(loff_t, new_size )
+ ),
+
+ TP_fast_assign(
+ __entry->cookie = cookie->debug_id;
+ __entry->old_size = cookie->object_size;
+ __entry->new_size = new_size;
+ ),
+
+ TP_printk("c=%08x os=%08llx sz=%08llx",
+ __entry->cookie,
+ __entry->old_size,
+ __entry->new_size)
+ );
+
#endif /* _TRACE_FSCACHE_H */

/* This part must be outside protection */



2021-12-22 23:21:45

by David Howells

[permalink] [raw]
Subject: [PATCH v4 30/68] cachefiles: Introduce rewritten driver

Introduce basic skeleton of the rewritten cachefiles driver including
config options so that it can be enabled for compilation.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819622766.215744.9108359326983195047.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906923341.143852.3856498104256721447.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967130320.1823006.15791456613198441566.stgit@warthog.procyon.org.uk/ # v3
---

fs/Kconfig | 1
fs/Makefile | 1
fs/cachefiles/Kconfig | 21 +++++++
fs/cachefiles/Makefile | 9 +++
fs/cachefiles/internal.h | 115 +++++++++++++++++++++++++++++++++++++
fs/cachefiles/main.c | 53 +++++++++++++++++
include/trace/events/cachefiles.h | 49 ++++++++++++++++
7 files changed, 249 insertions(+)
create mode 100644 fs/cachefiles/Kconfig
create mode 100644 fs/cachefiles/Makefile
create mode 100644 fs/cachefiles/internal.h
create mode 100644 fs/cachefiles/main.c
create mode 100644 include/trace/events/cachefiles.h

diff --git a/fs/Kconfig b/fs/Kconfig
index 86e311377e6e..a6313a969bc5 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -132,6 +132,7 @@ menu "Caches"

source "fs/netfs/Kconfig"
source "fs/fscache/Kconfig"
+source "fs/cachefiles/Kconfig"

endmenu

diff --git a/fs/Makefile b/fs/Makefile
index 290815f3fd31..84c5e4cdfee5 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -125,6 +125,7 @@ obj-$(CONFIG_AFS_FS) += afs/
obj-$(CONFIG_NILFS2_FS) += nilfs2/
obj-$(CONFIG_BEFS_FS) += befs/
obj-$(CONFIG_HOSTFS) += hostfs/
+obj-$(CONFIG_CACHEFILES) += cachefiles/
obj-$(CONFIG_DEBUG_FS) += debugfs/
obj-$(CONFIG_TRACING) += tracefs/
obj-$(CONFIG_OCFS2_FS) += ocfs2/
diff --git a/fs/cachefiles/Kconfig b/fs/cachefiles/Kconfig
new file mode 100644
index 000000000000..6827b40f7ddc
--- /dev/null
+++ b/fs/cachefiles/Kconfig
@@ -0,0 +1,21 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config CACHEFILES
+ tristate "Filesystem caching on files"
+ depends on FSCACHE && BLOCK
+ help
+ This permits use of a mounted filesystem as a cache for other
+ filesystems - primarily networking filesystems - thus allowing fast
+ local disk to enhance the speed of slower devices.
+
+ See Documentation/filesystems/caching/cachefiles.rst for more
+ information.
+
+config CACHEFILES_DEBUG
+ bool "Debug CacheFiles"
+ depends on CACHEFILES
+ help
+ This permits debugging to be dynamically enabled in the filesystem
+ caching on files module. If this is set, the debugging output may be
+ enabled by setting bits in /sys/modules/cachefiles/parameter/debug or
+ by including a debugging specifier in /etc/cachefilesd.conf.
diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
new file mode 100644
index 000000000000..a7f3e982e249
--- /dev/null
+++ b/fs/cachefiles/Makefile
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for caching in a mounted filesystem
+#
+
+cachefiles-y := \
+ main.o
+
+obj-$(CONFIG_CACHEFILES) := cachefiles.o
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
new file mode 100644
index 000000000000..26e0e23d7702
--- /dev/null
+++ b/fs/cachefiles/internal.h
@@ -0,0 +1,115 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* General netfs cache on cache files internal defs
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+
+#define pr_fmt(fmt) "CacheFiles: " fmt
+
+
+#include <linux/fscache-cache.h>
+#include <linux/cred.h>
+#include <linux/security.h>
+
+
+/*
+ * Debug tracing.
+ */
+extern unsigned cachefiles_debug;
+#define CACHEFILES_DEBUG_KENTER 1
+#define CACHEFILES_DEBUG_KLEAVE 2
+#define CACHEFILES_DEBUG_KDEBUG 4
+
+#define dbgprintk(FMT, ...) \
+ printk(KERN_DEBUG "[%-6.6s] "FMT"\n", current->comm, ##__VA_ARGS__)
+
+#define kenter(FMT, ...) dbgprintk("==> %s("FMT")", __func__, ##__VA_ARGS__)
+#define kleave(FMT, ...) dbgprintk("<== %s()"FMT"", __func__, ##__VA_ARGS__)
+#define kdebug(FMT, ...) dbgprintk(FMT, ##__VA_ARGS__)
+
+
+#if defined(__KDEBUG)
+#define _enter(FMT, ...) kenter(FMT, ##__VA_ARGS__)
+#define _leave(FMT, ...) kleave(FMT, ##__VA_ARGS__)
+#define _debug(FMT, ...) kdebug(FMT, ##__VA_ARGS__)
+
+#elif defined(CONFIG_CACHEFILES_DEBUG)
+#define _enter(FMT, ...) \
+do { \
+ if (cachefiles_debug & CACHEFILES_DEBUG_KENTER) \
+ kenter(FMT, ##__VA_ARGS__); \
+} while (0)
+
+#define _leave(FMT, ...) \
+do { \
+ if (cachefiles_debug & CACHEFILES_DEBUG_KLEAVE) \
+ kleave(FMT, ##__VA_ARGS__); \
+} while (0)
+
+#define _debug(FMT, ...) \
+do { \
+ if (cachefiles_debug & CACHEFILES_DEBUG_KDEBUG) \
+ kdebug(FMT, ##__VA_ARGS__); \
+} while (0)
+
+#else
+#define _enter(FMT, ...) no_printk("==> %s("FMT")", __func__, ##__VA_ARGS__)
+#define _leave(FMT, ...) no_printk("<== %s()"FMT"", __func__, ##__VA_ARGS__)
+#define _debug(FMT, ...) no_printk(FMT, ##__VA_ARGS__)
+#endif
+
+#if 1 /* defined(__KDEBUGALL) */
+
+#define ASSERT(X) \
+do { \
+ if (unlikely(!(X))) { \
+ pr_err("\n"); \
+ pr_err("Assertion failed\n"); \
+ BUG(); \
+ } \
+} while (0)
+
+#define ASSERTCMP(X, OP, Y) \
+do { \
+ if (unlikely(!((X) OP (Y)))) { \
+ pr_err("\n"); \
+ pr_err("Assertion failed\n"); \
+ pr_err("%lx " #OP " %lx is false\n", \
+ (unsigned long)(X), (unsigned long)(Y)); \
+ BUG(); \
+ } \
+} while (0)
+
+#define ASSERTIF(C, X) \
+do { \
+ if (unlikely((C) && !(X))) { \
+ pr_err("\n"); \
+ pr_err("Assertion failed\n"); \
+ BUG(); \
+ } \
+} while (0)
+
+#define ASSERTIFCMP(C, X, OP, Y) \
+do { \
+ if (unlikely((C) && !((X) OP (Y)))) { \
+ pr_err("\n"); \
+ pr_err("Assertion failed\n"); \
+ pr_err("%lx " #OP " %lx is false\n", \
+ (unsigned long)(X), (unsigned long)(Y)); \
+ BUG(); \
+ } \
+} while (0)
+
+#else
+
+#define ASSERT(X) do {} while (0)
+#define ASSERTCMP(X, OP, Y) do {} while (0)
+#define ASSERTIF(C, X) do {} while (0)
+#define ASSERTIFCMP(C, X, OP, Y) do {} while (0)
+
+#endif
diff --git a/fs/cachefiles/main.c b/fs/cachefiles/main.c
new file mode 100644
index 000000000000..47bc1cc078de
--- /dev/null
+++ b/fs/cachefiles/main.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Network filesystem caching backend to use cache files on a premounted
+ * filesystem
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/completion.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/namei.h>
+#include <linux/mount.h>
+#include <linux/statfs.h>
+#include <linux/sysctl.h>
+#include <linux/miscdevice.h>
+#include <linux/netfs.h>
+#include <trace/events/netfs.h>
+#define CREATE_TRACE_POINTS
+#include "internal.h"
+
+unsigned cachefiles_debug;
+module_param_named(debug, cachefiles_debug, uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(cachefiles_debug, "CacheFiles debugging mask");
+
+MODULE_DESCRIPTION("Mounted-filesystem based cache");
+MODULE_AUTHOR("Red Hat, Inc.");
+MODULE_LICENSE("GPL");
+
+/*
+ * initialise the fs caching module
+ */
+static int __init cachefiles_init(void)
+{
+ pr_info("Loaded\n");
+ return 0;
+}
+
+fs_initcall(cachefiles_init);
+
+/*
+ * clean up on module removal
+ */
+static void __exit cachefiles_exit(void)
+{
+ pr_info("Unloading\n");
+}
+
+module_exit(cachefiles_exit);
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
new file mode 100644
index 000000000000..5ee0aabb20be
--- /dev/null
+++ b/include/trace/events/cachefiles.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* CacheFiles tracepoints
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM cachefiles
+
+#if !defined(_TRACE_CACHEFILES_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_CACHEFILES_H
+
+#include <linux/tracepoint.h>
+
+/*
+ * Define enums for tracing information.
+ */
+#ifndef __CACHEFILES_DECLARE_TRACE_ENUMS_ONCE_ONLY
+#define __CACHEFILES_DECLARE_TRACE_ENUMS_ONCE_ONLY
+
+#endif
+
+/*
+ * Define enum -> string mappings for display.
+ */
+
+
+/*
+ * Export enum symbols via userspace.
+ */
+#undef EM
+#undef E_
+#define EM(a, b) TRACE_DEFINE_ENUM(a);
+#define E_(a, b) TRACE_DEFINE_ENUM(a);
+
+/*
+ * Now redefine the EM() and E_() macros to map the enums to the strings that
+ * will be printed in the output.
+ */
+#undef EM
+#undef E_
+#define EM(a, b) { a, b },
+#define E_(a, b) { a, b }
+
+
+#endif /* _TRACE_CACHEFILES_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>



2021-12-22 23:22:05

by David Howells

[permalink] [raw]
Subject: [PATCH v4 31/68] cachefiles: Define structs

Define the cachefiles_cache struct that's going to carry the cache-level
parameters and state of a cache.

Define the beginning of the cachefiles_object struct that's going to carry
the state for a data storage object. For the moment this is just a
debugging ID for logging purposes.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819623690.215744.2824739137193655547.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906924292.143852.15881439716653984905.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967131405.1823006.4480555941533935597.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/internal.h | 46 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 26e0e23d7702..cff4b2a5f928 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -16,6 +16,52 @@
#include <linux/cred.h>
#include <linux/security.h>

+struct cachefiles_cache;
+struct cachefiles_object;
+
+/*
+ * Data file records.
+ */
+struct cachefiles_object {
+ int debug_id; /* debugging ID */
+};
+
+/*
+ * Cache files cache definition
+ */
+struct cachefiles_cache {
+ struct vfsmount *mnt; /* mountpoint holding the cache */
+ struct file *cachefilesd; /* manager daemon handle */
+ const struct cred *cache_cred; /* security override for accessing cache */
+ struct mutex daemon_mutex; /* command serialisation mutex */
+ wait_queue_head_t daemon_pollwq; /* poll waitqueue for daemon */
+ atomic_t gravecounter; /* graveyard uniquifier */
+ atomic_t f_released; /* number of objects released lately */
+ atomic_long_t b_released; /* number of blocks released lately */
+ unsigned frun_percent; /* when to stop culling (% files) */
+ unsigned fcull_percent; /* when to start culling (% files) */
+ unsigned fstop_percent; /* when to stop allocating (% files) */
+ unsigned brun_percent; /* when to stop culling (% blocks) */
+ unsigned bcull_percent; /* when to start culling (% blocks) */
+ unsigned bstop_percent; /* when to stop allocating (% blocks) */
+ unsigned bsize; /* cache's block size */
+ unsigned bshift; /* min(ilog2(PAGE_SIZE / bsize), 0) */
+ uint64_t frun; /* when to stop culling */
+ uint64_t fcull; /* when to start culling */
+ uint64_t fstop; /* when to stop allocating */
+ sector_t brun; /* when to stop culling */
+ sector_t bcull; /* when to start culling */
+ sector_t bstop; /* when to stop allocating */
+ unsigned long flags;
+#define CACHEFILES_READY 0 /* T if cache prepared */
+#define CACHEFILES_DEAD 1 /* T if cache dead */
+#define CACHEFILES_CULLING 2 /* T if cull engaged */
+#define CACHEFILES_STATE_CHANGED 3 /* T if state changed (poll trigger) */
+ char *rootdirname; /* name of cache root directory */
+ char *secctx; /* LSM security context */
+ char *tag; /* cache binding tag */
+};
+

/*
* Debug tracing.



2021-12-22 23:22:23

by David Howells

[permalink] [raw]
Subject: [PATCH v4 32/68] cachefiles: Add some error injection support

Add support for injecting ENOSPC or EIO errors. This needs to be enabled
by CONFIG_CACHEFILES_ERROR_INJECTION=y. Once enabled, ENOSPC on things
like write and mkdir can be triggered by:

echo 1 >/proc/sys/cachefiles/error_injection

and EIO can be triggered on most operations by:

echo 2 >/proc/sys/cachefiles/error_injection

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819624706.215744.6911916249119962943.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906925343.143852.5465695512984025812.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967134412.1823006.7354285948280296595.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/Kconfig | 7 ++++++
fs/cachefiles/Makefile | 2 ++
fs/cachefiles/error_inject.c | 46 ++++++++++++++++++++++++++++++++++++++++++
fs/cachefiles/internal.h | 42 +++++++++++++++++++++++++++++++++++++-
fs/cachefiles/main.c | 12 +++++++++++
5 files changed, 108 insertions(+), 1 deletion(-)
create mode 100644 fs/cachefiles/error_inject.c

diff --git a/fs/cachefiles/Kconfig b/fs/cachefiles/Kconfig
index 6827b40f7ddc..719faeeda168 100644
--- a/fs/cachefiles/Kconfig
+++ b/fs/cachefiles/Kconfig
@@ -19,3 +19,10 @@ config CACHEFILES_DEBUG
caching on files module. If this is set, the debugging output may be
enabled by setting bits in /sys/modules/cachefiles/parameter/debug or
by including a debugging specifier in /etc/cachefilesd.conf.
+
+config CACHEFILES_ERROR_INJECTION
+ bool "Provide error injection for cachefiles"
+ depends on CACHEFILES && SYSCTL
+ help
+ This permits error injection to be enabled in cachefiles whilst a
+ cache is in service.
diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index a7f3e982e249..183fb5f3b8b1 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -6,4 +6,6 @@
cachefiles-y := \
main.o

+cachefiles-$(CONFIG_CACHEFILES_ERROR_INJECTION) += error_inject.o
+
obj-$(CONFIG_CACHEFILES) := cachefiles.o
diff --git a/fs/cachefiles/error_inject.c b/fs/cachefiles/error_inject.c
new file mode 100644
index 000000000000..58f8aec964e4
--- /dev/null
+++ b/fs/cachefiles/error_inject.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Error injection handling.
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include <linux/sysctl.h>
+#include "internal.h"
+
+unsigned int cachefiles_error_injection_state;
+
+static struct ctl_table_header *cachefiles_sysctl;
+static struct ctl_table cachefiles_sysctls[] = {
+ {
+ .procname = "error_injection",
+ .data = &cachefiles_error_injection_state,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_douintvec,
+ },
+ {}
+};
+
+static struct ctl_table cachefiles_sysctls_root[] = {
+ {
+ .procname = "cachefiles",
+ .mode = 0555,
+ .child = cachefiles_sysctls,
+ },
+ {}
+};
+
+int __init cachefiles_register_error_injection(void)
+{
+ cachefiles_sysctl = register_sysctl_table(cachefiles_sysctls_root);
+ if (!cachefiles_sysctl)
+ return -ENOMEM;
+ return 0;
+
+}
+
+void cachefiles_unregister_error_injection(void)
+{
+ unregister_sysctl_table(cachefiles_sysctl);
+}
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index cff4b2a5f928..1f2fea902d3e 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -64,7 +64,47 @@ struct cachefiles_cache {


/*
- * Debug tracing.
+ * error_inject.c
+ */
+#ifdef CONFIG_CACHEFILES_ERROR_INJECTION
+extern unsigned int cachefiles_error_injection_state;
+extern int cachefiles_register_error_injection(void);
+extern void cachefiles_unregister_error_injection(void);
+
+#else
+#define cachefiles_error_injection_state 0
+
+static inline int cachefiles_register_error_injection(void)
+{
+ return 0;
+}
+
+static inline void cachefiles_unregister_error_injection(void)
+{
+}
+#endif
+
+
+static inline int cachefiles_inject_read_error(void)
+{
+ return cachefiles_error_injection_state & 2 ? -EIO : 0;
+}
+
+static inline int cachefiles_inject_write_error(void)
+{
+ return cachefiles_error_injection_state & 2 ? -EIO :
+ cachefiles_error_injection_state & 1 ? -ENOSPC :
+ 0;
+}
+
+static inline int cachefiles_inject_remove_error(void)
+{
+ return cachefiles_error_injection_state & 2 ? -EIO : 0;
+}
+
+
+/*
+ * Debug tracing
*/
extern unsigned cachefiles_debug;
#define CACHEFILES_DEBUG_KENTER 1
diff --git a/fs/cachefiles/main.c b/fs/cachefiles/main.c
index 47bc1cc078de..387d42c7185f 100644
--- a/fs/cachefiles/main.c
+++ b/fs/cachefiles/main.c
@@ -36,8 +36,18 @@ MODULE_LICENSE("GPL");
*/
static int __init cachefiles_init(void)
{
+ int ret;
+
+ ret = cachefiles_register_error_injection();
+ if (ret < 0)
+ goto error_einj;
+
pr_info("Loaded\n");
return 0;
+
+error_einj:
+ pr_err("failed to register: %d\n", ret);
+ return ret;
}

fs_initcall(cachefiles_init);
@@ -48,6 +58,8 @@ fs_initcall(cachefiles_init);
static void __exit cachefiles_exit(void)
{
pr_info("Unloading\n");
+
+ cachefiles_unregister_error_injection();
}

module_exit(cachefiles_exit);



2021-12-22 23:22:46

by David Howells

[permalink] [raw]
Subject: [PATCH v4 33/68] cachefiles: Add a couple of tracepoints for logging errors

Add two trace points to log errors, one for vfs operations like mkdir or
create, and one for I/O operations, like read, write or truncate.

Also add the beginnings of a struct that is going to represent a data file
and place a debugging ID in it for the tracepoints to record.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819625632.215744.17907340966178411033.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906926297.143852.18267924605548658911.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967135390.1823006.2512120406360156424.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/internal.h | 1
include/trace/events/cachefiles.h | 94 +++++++++++++++++++++++++++++++++++++
2 files changed, 95 insertions(+)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 1f2fea902d3e..b51146a29aca 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -62,6 +62,7 @@ struct cachefiles_cache {
char *tag; /* cache binding tag */
};

+#include <trace/events/cachefiles.h>

/*
* error_inject.c
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index 5ee0aabb20be..9bd5a8a60801 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -18,11 +18,49 @@
#ifndef __CACHEFILES_DECLARE_TRACE_ENUMS_ONCE_ONLY
#define __CACHEFILES_DECLARE_TRACE_ENUMS_ONCE_ONLY

+enum cachefiles_error_trace {
+ cachefiles_trace_fallocate_error,
+ cachefiles_trace_getxattr_error,
+ cachefiles_trace_link_error,
+ cachefiles_trace_lookup_error,
+ cachefiles_trace_mkdir_error,
+ cachefiles_trace_notify_change_error,
+ cachefiles_trace_open_error,
+ cachefiles_trace_read_error,
+ cachefiles_trace_remxattr_error,
+ cachefiles_trace_rename_error,
+ cachefiles_trace_seek_error,
+ cachefiles_trace_setxattr_error,
+ cachefiles_trace_statfs_error,
+ cachefiles_trace_tmpfile_error,
+ cachefiles_trace_trunc_error,
+ cachefiles_trace_unlink_error,
+ cachefiles_trace_write_error,
+};
+
#endif

/*
* Define enum -> string mappings for display.
*/
+#define cachefiles_error_traces \
+ EM(cachefiles_trace_fallocate_error, "fallocate") \
+ EM(cachefiles_trace_getxattr_error, "getxattr") \
+ EM(cachefiles_trace_link_error, "link") \
+ EM(cachefiles_trace_lookup_error, "lookup") \
+ EM(cachefiles_trace_mkdir_error, "mkdir") \
+ EM(cachefiles_trace_notify_change_error, "notify_change") \
+ EM(cachefiles_trace_open_error, "open") \
+ EM(cachefiles_trace_read_error, "read") \
+ EM(cachefiles_trace_remxattr_error, "remxattr") \
+ EM(cachefiles_trace_rename_error, "rename") \
+ EM(cachefiles_trace_seek_error, "seek") \
+ EM(cachefiles_trace_setxattr_error, "setxattr") \
+ EM(cachefiles_trace_statfs_error, "statfs") \
+ EM(cachefiles_trace_tmpfile_error, "tmpfile") \
+ EM(cachefiles_trace_trunc_error, "trunc") \
+ EM(cachefiles_trace_unlink_error, "unlink") \
+ E_(cachefiles_trace_write_error, "write")


/*
@@ -33,6 +71,8 @@
#define EM(a, b) TRACE_DEFINE_ENUM(a);
#define E_(a, b) TRACE_DEFINE_ENUM(a);

+cachefiles_error_traces;
+
/*
* Now redefine the EM() and E_() macros to map the enums to the strings that
* will be printed in the output.
@@ -43,6 +83,60 @@
#define E_(a, b) { a, b }


+TRACE_EVENT(cachefiles_vfs_error,
+ TP_PROTO(struct cachefiles_object *obj, struct inode *backer,
+ int error, enum cachefiles_error_trace where),
+
+ TP_ARGS(obj, backer, error, where),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, obj )
+ __field(unsigned int, backer )
+ __field(enum cachefiles_error_trace, where )
+ __field(short, error )
+ ),
+
+ TP_fast_assign(
+ __entry->obj = obj ? obj->debug_id : 0;
+ __entry->backer = backer->i_ino;
+ __entry->error = error;
+ __entry->where = where;
+ ),
+
+ TP_printk("o=%08x b=%08x %s e=%d",
+ __entry->obj,
+ __entry->backer,
+ __print_symbolic(__entry->where, cachefiles_error_traces),
+ __entry->error)
+ );
+
+TRACE_EVENT(cachefiles_io_error,
+ TP_PROTO(struct cachefiles_object *obj, struct inode *backer,
+ int error, enum cachefiles_error_trace where),
+
+ TP_ARGS(obj, backer, error, where),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, obj )
+ __field(unsigned int, backer )
+ __field(enum cachefiles_error_trace, where )
+ __field(short, error )
+ ),
+
+ TP_fast_assign(
+ __entry->obj = obj ? obj->debug_id : 0;
+ __entry->backer = backer->i_ino;
+ __entry->error = error;
+ __entry->where = where;
+ ),
+
+ TP_printk("o=%08x b=%08x %s e=%d",
+ __entry->obj,
+ __entry->backer,
+ __print_symbolic(__entry->where, cachefiles_error_traces),
+ __entry->error)
+ );
+
#endif /* _TRACE_CACHEFILES_H */

/* This part must be outside protection */



2021-12-22 23:22:50

by David Howells

[permalink] [raw]
Subject: [PATCH v4 34/68] cachefiles: Add cache error reporting macro

Add a macro to report a cache I/O error and to tell fscache that the cache
is in trouble.

Also add a pointer to the fscache cache cookie from the cachefiles_cache
struct as we need that to pass to fscache_io_error().

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819626562.215744.1503690975344731661.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906927235.143852.13694625647880837563.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967137158.1823006.2065038830569321335.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/internal.h | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index b51146a29aca..b2adcb59b4ce 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -30,6 +30,7 @@ struct cachefiles_object {
* Cache files cache definition
*/
struct cachefiles_cache {
+ struct fscache_cache *cache; /* Cache cookie */
struct vfsmount *mnt; /* mountpoint holding the cache */
struct file *cachefilesd; /* manager daemon handle */
const struct cred *cache_cred; /* security override for accessing cache */
@@ -103,6 +104,16 @@ static inline int cachefiles_inject_remove_error(void)
return cachefiles_error_injection_state & 2 ? -EIO : 0;
}

+/*
+ * Error handling
+ */
+#define cachefiles_io_error(___cache, FMT, ...) \
+do { \
+ pr_err("I/O Error: " FMT"\n", ##__VA_ARGS__); \
+ fscache_io_error((___cache)->cache); \
+ set_bit(CACHEFILES_DEAD, &(___cache)->flags); \
+} while (0)
+

/*
* Debug tracing



2021-12-22 23:23:09

by David Howells

[permalink] [raw]
Subject: [PATCH v4 35/68] cachefiles: Add security derivation

Implement code to derive a new set of creds for the cachefiles to use when
making VFS or I/O calls and to change the auditing info since the
application interacting with the network filesystem is not accessing the
cache directly. Cachefiles uses override_creds() to change the effective
creds temporarily.

set_security_override_from_ctx() is called to derive the LSM 'label' that
the cachefiles driver will act with. set_create_files_as() is called to
determine the LSM 'label' that will be applied to files and directories
created in the cache. These functions alter the new creds.

Also implement a couple of functions to wrap the calls to begin/end cred
overriding.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819627469.215744.3603633690679962985.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906928172.143852.15886637013364286786.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967138138.1823006.7620933448261939504.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/Makefile | 3 +
fs/cachefiles/internal.h | 20 ++++++++
fs/cachefiles/security.c | 112 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 134 insertions(+), 1 deletion(-)
create mode 100644 fs/cachefiles/security.c

diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index 183fb5f3b8b1..28bbb0d14868 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -4,7 +4,8 @@
#

cachefiles-y := \
- main.o
+ main.o \
+ security.o

cachefiles-$(CONFIG_CACHEFILES_ERROR_INJECTION) += error_inject.o

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index b2adcb59b4ce..e57ce5ef875c 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -104,6 +104,26 @@ static inline int cachefiles_inject_remove_error(void)
return cachefiles_error_injection_state & 2 ? -EIO : 0;
}

+/*
+ * security.c
+ */
+extern int cachefiles_get_security_ID(struct cachefiles_cache *cache);
+extern int cachefiles_determine_cache_security(struct cachefiles_cache *cache,
+ struct dentry *root,
+ const struct cred **_saved_cred);
+
+static inline void cachefiles_begin_secure(struct cachefiles_cache *cache,
+ const struct cred **_saved_cred)
+{
+ *_saved_cred = override_creds(cache->cache_cred);
+}
+
+static inline void cachefiles_end_secure(struct cachefiles_cache *cache,
+ const struct cred *saved_cred)
+{
+ revert_creds(saved_cred);
+}
+
/*
* Error handling
*/
diff --git a/fs/cachefiles/security.c b/fs/cachefiles/security.c
new file mode 100644
index 000000000000..fe777164f1d8
--- /dev/null
+++ b/fs/cachefiles/security.c
@@ -0,0 +1,112 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* CacheFiles security management
+ *
+ * Copyright (C) 2007, 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include <linux/fs.h>
+#include <linux/cred.h>
+#include "internal.h"
+
+/*
+ * determine the security context within which we access the cache from within
+ * the kernel
+ */
+int cachefiles_get_security_ID(struct cachefiles_cache *cache)
+{
+ struct cred *new;
+ int ret;
+
+ _enter("{%s}", cache->secctx);
+
+ new = prepare_kernel_cred(current);
+ if (!new) {
+ ret = -ENOMEM;
+ goto error;
+ }
+
+ if (cache->secctx) {
+ ret = set_security_override_from_ctx(new, cache->secctx);
+ if (ret < 0) {
+ put_cred(new);
+ pr_err("Security denies permission to nominate security context: error %d\n",
+ ret);
+ goto error;
+ }
+ }
+
+ cache->cache_cred = new;
+ ret = 0;
+error:
+ _leave(" = %d", ret);
+ return ret;
+}
+
+/*
+ * see if mkdir and create can be performed in the root directory
+ */
+static int cachefiles_check_cache_dir(struct cachefiles_cache *cache,
+ struct dentry *root)
+{
+ int ret;
+
+ ret = security_inode_mkdir(d_backing_inode(root), root, 0);
+ if (ret < 0) {
+ pr_err("Security denies permission to make dirs: error %d",
+ ret);
+ return ret;
+ }
+
+ ret = security_inode_create(d_backing_inode(root), root, 0);
+ if (ret < 0)
+ pr_err("Security denies permission to create files: error %d",
+ ret);
+
+ return ret;
+}
+
+/*
+ * check the security details of the on-disk cache
+ * - must be called with security override in force
+ * - must return with a security override in force - even in the case of an
+ * error
+ */
+int cachefiles_determine_cache_security(struct cachefiles_cache *cache,
+ struct dentry *root,
+ const struct cred **_saved_cred)
+{
+ struct cred *new;
+ int ret;
+
+ _enter("");
+
+ /* duplicate the cache creds for COW (the override is currently in
+ * force, so we can use prepare_creds() to do this) */
+ new = prepare_creds();
+ if (!new)
+ return -ENOMEM;
+
+ cachefiles_end_secure(cache, *_saved_cred);
+
+ /* use the cache root dir's security context as the basis with
+ * which create files */
+ ret = set_create_files_as(new, d_backing_inode(root));
+ if (ret < 0) {
+ abort_creds(new);
+ cachefiles_begin_secure(cache, _saved_cred);
+ _leave(" = %d [cfa]", ret);
+ return ret;
+ }
+
+ put_cred(cache->cache_cred);
+ cache->cache_cred = new;
+
+ cachefiles_begin_secure(cache, _saved_cred);
+ ret = cachefiles_check_cache_dir(cache, root);
+
+ if (ret == -EOPNOTSUPP)
+ ret = 0;
+ _leave(" = %d", ret);
+ return ret;
+}



2021-12-22 23:23:24

by David Howells

[permalink] [raw]
Subject: [PATCH v4 36/68] cachefiles: Register a miscdev and parse commands over it

Register a misc device with which to talk to the daemon. The misc device
holds a cache set up through it around and closing the device kills the
cache.

cachefilesd communicates with the kernel by passing it single-line text
commands. Parse these and use them to parameterise the cache state. This
does not implement the command to actually bring a cache online. That's
left for later.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819628388.215744.17712097043607299608.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906929128.143852.14065207858943654011.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967139085.1823006.3514846391807454287.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/Makefile | 1
fs/cachefiles/daemon.c | 725 ++++++++++++++++++++++++++++++++++++++++++++++
fs/cachefiles/internal.h | 14 +
fs/cachefiles/main.c | 12 +
4 files changed, 752 insertions(+)
create mode 100644 fs/cachefiles/daemon.c

diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index 28bbb0d14868..f008524bb78f 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -4,6 +4,7 @@
#

cachefiles-y := \
+ daemon.o \
main.o \
security.o

diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
new file mode 100644
index 000000000000..4cfb7c8b37d0
--- /dev/null
+++ b/fs/cachefiles/daemon.c
@@ -0,0 +1,725 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Daemon interface
+ *
+ * Copyright (C) 2007, 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/completion.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/namei.h>
+#include <linux/poll.h>
+#include <linux/mount.h>
+#include <linux/statfs.h>
+#include <linux/ctype.h>
+#include <linux/string.h>
+#include <linux/fs_struct.h>
+#include "internal.h"
+
+static int cachefiles_daemon_open(struct inode *, struct file *);
+static int cachefiles_daemon_release(struct inode *, struct file *);
+static ssize_t cachefiles_daemon_read(struct file *, char __user *, size_t,
+ loff_t *);
+static ssize_t cachefiles_daemon_write(struct file *, const char __user *,
+ size_t, loff_t *);
+static __poll_t cachefiles_daemon_poll(struct file *,
+ struct poll_table_struct *);
+static int cachefiles_daemon_frun(struct cachefiles_cache *, char *);
+static int cachefiles_daemon_fcull(struct cachefiles_cache *, char *);
+static int cachefiles_daemon_fstop(struct cachefiles_cache *, char *);
+static int cachefiles_daemon_brun(struct cachefiles_cache *, char *);
+static int cachefiles_daemon_bcull(struct cachefiles_cache *, char *);
+static int cachefiles_daemon_bstop(struct cachefiles_cache *, char *);
+static int cachefiles_daemon_cull(struct cachefiles_cache *, char *);
+static int cachefiles_daemon_debug(struct cachefiles_cache *, char *);
+static int cachefiles_daemon_dir(struct cachefiles_cache *, char *);
+static int cachefiles_daemon_inuse(struct cachefiles_cache *, char *);
+static int cachefiles_daemon_secctx(struct cachefiles_cache *, char *);
+static int cachefiles_daemon_tag(struct cachefiles_cache *, char *);
+static int cachefiles_daemon_bind(struct cachefiles_cache *, char *);
+static void cachefiles_daemon_unbind(struct cachefiles_cache *);
+
+static unsigned long cachefiles_open;
+
+const struct file_operations cachefiles_daemon_fops = {
+ .owner = THIS_MODULE,
+ .open = cachefiles_daemon_open,
+ .release = cachefiles_daemon_release,
+ .read = cachefiles_daemon_read,
+ .write = cachefiles_daemon_write,
+ .poll = cachefiles_daemon_poll,
+ .llseek = noop_llseek,
+};
+
+struct cachefiles_daemon_cmd {
+ char name[8];
+ int (*handler)(struct cachefiles_cache *cache, char *args);
+};
+
+static const struct cachefiles_daemon_cmd cachefiles_daemon_cmds[] = {
+ { "bind", cachefiles_daemon_bind },
+ { "brun", cachefiles_daemon_brun },
+ { "bcull", cachefiles_daemon_bcull },
+ { "bstop", cachefiles_daemon_bstop },
+ { "cull", cachefiles_daemon_cull },
+ { "debug", cachefiles_daemon_debug },
+ { "dir", cachefiles_daemon_dir },
+ { "frun", cachefiles_daemon_frun },
+ { "fcull", cachefiles_daemon_fcull },
+ { "fstop", cachefiles_daemon_fstop },
+ { "inuse", cachefiles_daemon_inuse },
+ { "secctx", cachefiles_daemon_secctx },
+ { "tag", cachefiles_daemon_tag },
+ { "", NULL }
+};
+
+
+/*
+ * Prepare a cache for caching.
+ */
+static int cachefiles_daemon_open(struct inode *inode, struct file *file)
+{
+ struct cachefiles_cache *cache;
+
+ _enter("");
+
+ /* only the superuser may do this */
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ /* the cachefiles device may only be open once at a time */
+ if (xchg(&cachefiles_open, 1) == 1)
+ return -EBUSY;
+
+ /* allocate a cache record */
+ cache = kzalloc(sizeof(struct cachefiles_cache), GFP_KERNEL);
+ if (!cache) {
+ cachefiles_open = 0;
+ return -ENOMEM;
+ }
+
+ mutex_init(&cache->daemon_mutex);
+ init_waitqueue_head(&cache->daemon_pollwq);
+
+ /* set default caching limits
+ * - limit at 1% free space and/or free files
+ * - cull below 5% free space and/or free files
+ * - cease culling above 7% free space and/or free files
+ */
+ cache->frun_percent = 7;
+ cache->fcull_percent = 5;
+ cache->fstop_percent = 1;
+ cache->brun_percent = 7;
+ cache->bcull_percent = 5;
+ cache->bstop_percent = 1;
+
+ file->private_data = cache;
+ cache->cachefilesd = file;
+ return 0;
+}
+
+/*
+ * Release a cache.
+ */
+static int cachefiles_daemon_release(struct inode *inode, struct file *file)
+{
+ struct cachefiles_cache *cache = file->private_data;
+
+ _enter("");
+
+ ASSERT(cache);
+
+ set_bit(CACHEFILES_DEAD, &cache->flags);
+
+ cachefiles_daemon_unbind(cache);
+
+ /* clean up the control file interface */
+ cache->cachefilesd = NULL;
+ file->private_data = NULL;
+ cachefiles_open = 0;
+
+ kfree(cache);
+
+ _leave("");
+ return 0;
+}
+
+/*
+ * Read the cache state.
+ */
+static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
+ size_t buflen, loff_t *pos)
+{
+ struct cachefiles_cache *cache = file->private_data;
+ unsigned long long b_released;
+ unsigned f_released;
+ char buffer[256];
+ int n;
+
+ //_enter(",,%zu,", buflen);
+
+ if (!test_bit(CACHEFILES_READY, &cache->flags))
+ return 0;
+
+ /* check how much space the cache has */
+ // PLACEHOLDER: Check space
+
+ /* summarise */
+ f_released = atomic_xchg(&cache->f_released, 0);
+ b_released = atomic_long_xchg(&cache->b_released, 0);
+ clear_bit(CACHEFILES_STATE_CHANGED, &cache->flags);
+
+ n = snprintf(buffer, sizeof(buffer),
+ "cull=%c"
+ " frun=%llx"
+ " fcull=%llx"
+ " fstop=%llx"
+ " brun=%llx"
+ " bcull=%llx"
+ " bstop=%llx"
+ " freleased=%x"
+ " breleased=%llx",
+ test_bit(CACHEFILES_CULLING, &cache->flags) ? '1' : '0',
+ (unsigned long long) cache->frun,
+ (unsigned long long) cache->fcull,
+ (unsigned long long) cache->fstop,
+ (unsigned long long) cache->brun,
+ (unsigned long long) cache->bcull,
+ (unsigned long long) cache->bstop,
+ f_released,
+ b_released);
+
+ if (n > buflen)
+ return -EMSGSIZE;
+
+ if (copy_to_user(_buffer, buffer, n) != 0)
+ return -EFAULT;
+
+ return n;
+}
+
+/*
+ * Take a command from cachefilesd, parse it and act on it.
+ */
+static ssize_t cachefiles_daemon_write(struct file *file,
+ const char __user *_data,
+ size_t datalen,
+ loff_t *pos)
+{
+ const struct cachefiles_daemon_cmd *cmd;
+ struct cachefiles_cache *cache = file->private_data;
+ ssize_t ret;
+ char *data, *args, *cp;
+
+ //_enter(",,%zu,", datalen);
+
+ ASSERT(cache);
+
+ if (test_bit(CACHEFILES_DEAD, &cache->flags))
+ return -EIO;
+
+ if (datalen > PAGE_SIZE - 1)
+ return -EOPNOTSUPP;
+
+ /* drag the command string into the kernel so we can parse it */
+ data = memdup_user_nul(_data, datalen);
+ if (IS_ERR(data))
+ return PTR_ERR(data);
+
+ ret = -EINVAL;
+ if (memchr(data, '\0', datalen))
+ goto error;
+
+ /* strip any newline */
+ cp = memchr(data, '\n', datalen);
+ if (cp) {
+ if (cp == data)
+ goto error;
+
+ *cp = '\0';
+ }
+
+ /* parse the command */
+ ret = -EOPNOTSUPP;
+
+ for (args = data; *args; args++)
+ if (isspace(*args))
+ break;
+ if (*args) {
+ if (args == data)
+ goto error;
+ *args = '\0';
+ args = skip_spaces(++args);
+ }
+
+ /* run the appropriate command handler */
+ for (cmd = cachefiles_daemon_cmds; cmd->name[0]; cmd++)
+ if (strcmp(cmd->name, data) == 0)
+ goto found_command;
+
+error:
+ kfree(data);
+ //_leave(" = %zd", ret);
+ return ret;
+
+found_command:
+ mutex_lock(&cache->daemon_mutex);
+
+ ret = -EIO;
+ if (!test_bit(CACHEFILES_DEAD, &cache->flags))
+ ret = cmd->handler(cache, args);
+
+ mutex_unlock(&cache->daemon_mutex);
+
+ if (ret == 0)
+ ret = datalen;
+ goto error;
+}
+
+/*
+ * Poll for culling state
+ * - use EPOLLOUT to indicate culling state
+ */
+static __poll_t cachefiles_daemon_poll(struct file *file,
+ struct poll_table_struct *poll)
+{
+ struct cachefiles_cache *cache = file->private_data;
+ __poll_t mask;
+
+ poll_wait(file, &cache->daemon_pollwq, poll);
+ mask = 0;
+
+ if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags))
+ mask |= EPOLLIN;
+
+ if (test_bit(CACHEFILES_CULLING, &cache->flags))
+ mask |= EPOLLOUT;
+
+ return mask;
+}
+
+/*
+ * Give a range error for cache space constraints
+ * - can be tail-called
+ */
+static int cachefiles_daemon_range_error(struct cachefiles_cache *cache,
+ char *args)
+{
+ pr_err("Free space limits must be in range 0%%<=stop<cull<run<100%%\n");
+
+ return -EINVAL;
+}
+
+/*
+ * Set the percentage of files at which to stop culling
+ * - command: "frun <N>%"
+ */
+static int cachefiles_daemon_frun(struct cachefiles_cache *cache, char *args)
+{
+ unsigned long frun;
+
+ _enter(",%s", args);
+
+ if (!*args)
+ return -EINVAL;
+
+ frun = simple_strtoul(args, &args, 10);
+ if (args[0] != '%' || args[1] != '\0')
+ return -EINVAL;
+
+ if (frun <= cache->fcull_percent || frun >= 100)
+ return cachefiles_daemon_range_error(cache, args);
+
+ cache->frun_percent = frun;
+ return 0;
+}
+
+/*
+ * Set the percentage of files at which to start culling
+ * - command: "fcull <N>%"
+ */
+static int cachefiles_daemon_fcull(struct cachefiles_cache *cache, char *args)
+{
+ unsigned long fcull;
+
+ _enter(",%s", args);
+
+ if (!*args)
+ return -EINVAL;
+
+ fcull = simple_strtoul(args, &args, 10);
+ if (args[0] != '%' || args[1] != '\0')
+ return -EINVAL;
+
+ if (fcull <= cache->fstop_percent || fcull >= cache->frun_percent)
+ return cachefiles_daemon_range_error(cache, args);
+
+ cache->fcull_percent = fcull;
+ return 0;
+}
+
+/*
+ * Set the percentage of files at which to stop allocating
+ * - command: "fstop <N>%"
+ */
+static int cachefiles_daemon_fstop(struct cachefiles_cache *cache, char *args)
+{
+ unsigned long fstop;
+
+ _enter(",%s", args);
+
+ if (!*args)
+ return -EINVAL;
+
+ fstop = simple_strtoul(args, &args, 10);
+ if (args[0] != '%' || args[1] != '\0')
+ return -EINVAL;
+
+ if (fstop >= cache->fcull_percent)
+ return cachefiles_daemon_range_error(cache, args);
+
+ cache->fstop_percent = fstop;
+ return 0;
+}
+
+/*
+ * Set the percentage of blocks at which to stop culling
+ * - command: "brun <N>%"
+ */
+static int cachefiles_daemon_brun(struct cachefiles_cache *cache, char *args)
+{
+ unsigned long brun;
+
+ _enter(",%s", args);
+
+ if (!*args)
+ return -EINVAL;
+
+ brun = simple_strtoul(args, &args, 10);
+ if (args[0] != '%' || args[1] != '\0')
+ return -EINVAL;
+
+ if (brun <= cache->bcull_percent || brun >= 100)
+ return cachefiles_daemon_range_error(cache, args);
+
+ cache->brun_percent = brun;
+ return 0;
+}
+
+/*
+ * Set the percentage of blocks at which to start culling
+ * - command: "bcull <N>%"
+ */
+static int cachefiles_daemon_bcull(struct cachefiles_cache *cache, char *args)
+{
+ unsigned long bcull;
+
+ _enter(",%s", args);
+
+ if (!*args)
+ return -EINVAL;
+
+ bcull = simple_strtoul(args, &args, 10);
+ if (args[0] != '%' || args[1] != '\0')
+ return -EINVAL;
+
+ if (bcull <= cache->bstop_percent || bcull >= cache->brun_percent)
+ return cachefiles_daemon_range_error(cache, args);
+
+ cache->bcull_percent = bcull;
+ return 0;
+}
+
+/*
+ * Set the percentage of blocks at which to stop allocating
+ * - command: "bstop <N>%"
+ */
+static int cachefiles_daemon_bstop(struct cachefiles_cache *cache, char *args)
+{
+ unsigned long bstop;
+
+ _enter(",%s", args);
+
+ if (!*args)
+ return -EINVAL;
+
+ bstop = simple_strtoul(args, &args, 10);
+ if (args[0] != '%' || args[1] != '\0')
+ return -EINVAL;
+
+ if (bstop >= cache->bcull_percent)
+ return cachefiles_daemon_range_error(cache, args);
+
+ cache->bstop_percent = bstop;
+ return 0;
+}
+
+/*
+ * Set the cache directory
+ * - command: "dir <name>"
+ */
+static int cachefiles_daemon_dir(struct cachefiles_cache *cache, char *args)
+{
+ char *dir;
+
+ _enter(",%s", args);
+
+ if (!*args) {
+ pr_err("Empty directory specified\n");
+ return -EINVAL;
+ }
+
+ if (cache->rootdirname) {
+ pr_err("Second cache directory specified\n");
+ return -EEXIST;
+ }
+
+ dir = kstrdup(args, GFP_KERNEL);
+ if (!dir)
+ return -ENOMEM;
+
+ cache->rootdirname = dir;
+ return 0;
+}
+
+/*
+ * Set the cache security context
+ * - command: "secctx <ctx>"
+ */
+static int cachefiles_daemon_secctx(struct cachefiles_cache *cache, char *args)
+{
+ char *secctx;
+
+ _enter(",%s", args);
+
+ if (!*args) {
+ pr_err("Empty security context specified\n");
+ return -EINVAL;
+ }
+
+ if (cache->secctx) {
+ pr_err("Second security context specified\n");
+ return -EINVAL;
+ }
+
+ secctx = kstrdup(args, GFP_KERNEL);
+ if (!secctx)
+ return -ENOMEM;
+
+ cache->secctx = secctx;
+ return 0;
+}
+
+/*
+ * Set the cache tag
+ * - command: "tag <name>"
+ */
+static int cachefiles_daemon_tag(struct cachefiles_cache *cache, char *args)
+{
+ char *tag;
+
+ _enter(",%s", args);
+
+ if (!*args) {
+ pr_err("Empty tag specified\n");
+ return -EINVAL;
+ }
+
+ if (cache->tag)
+ return -EEXIST;
+
+ tag = kstrdup(args, GFP_KERNEL);
+ if (!tag)
+ return -ENOMEM;
+
+ cache->tag = tag;
+ return 0;
+}
+
+/*
+ * Request a node in the cache be culled from the current working directory
+ * - command: "cull <name>"
+ */
+static int cachefiles_daemon_cull(struct cachefiles_cache *cache, char *args)
+{
+ struct path path;
+ const struct cred *saved_cred;
+ int ret;
+
+ _enter(",%s", args);
+
+ if (strchr(args, '/'))
+ goto inval;
+
+ if (!test_bit(CACHEFILES_READY, &cache->flags)) {
+ pr_err("cull applied to unready cache\n");
+ return -EIO;
+ }
+
+ if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
+ pr_err("cull applied to dead cache\n");
+ return -EIO;
+ }
+
+ get_fs_pwd(current->fs, &path);
+
+ if (!d_can_lookup(path.dentry))
+ goto notdir;
+
+ cachefiles_begin_secure(cache, &saved_cred);
+ ret = -ENOANO; // PLACEHOLDER: Do culling
+ cachefiles_end_secure(cache, saved_cred);
+
+ path_put(&path);
+ _leave(" = %d", ret);
+ return ret;
+
+notdir:
+ path_put(&path);
+ pr_err("cull command requires dirfd to be a directory\n");
+ return -ENOTDIR;
+
+inval:
+ pr_err("cull command requires dirfd and filename\n");
+ return -EINVAL;
+}
+
+/*
+ * Set debugging mode
+ * - command: "debug <mask>"
+ */
+static int cachefiles_daemon_debug(struct cachefiles_cache *cache, char *args)
+{
+ unsigned long mask;
+
+ _enter(",%s", args);
+
+ mask = simple_strtoul(args, &args, 0);
+ if (args[0] != '\0')
+ goto inval;
+
+ cachefiles_debug = mask;
+ _leave(" = 0");
+ return 0;
+
+inval:
+ pr_err("debug command requires mask\n");
+ return -EINVAL;
+}
+
+/*
+ * Find out whether an object in the current working directory is in use or not
+ * - command: "inuse <name>"
+ */
+static int cachefiles_daemon_inuse(struct cachefiles_cache *cache, char *args)
+{
+ struct path path;
+ const struct cred *saved_cred;
+ int ret;
+
+ //_enter(",%s", args);
+
+ if (strchr(args, '/'))
+ goto inval;
+
+ if (!test_bit(CACHEFILES_READY, &cache->flags)) {
+ pr_err("inuse applied to unready cache\n");
+ return -EIO;
+ }
+
+ if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
+ pr_err("inuse applied to dead cache\n");
+ return -EIO;
+ }
+
+ get_fs_pwd(current->fs, &path);
+
+ if (!d_can_lookup(path.dentry))
+ goto notdir;
+
+ cachefiles_begin_secure(cache, &saved_cred);
+ ret = -ENOANO; // PLACEHOLDER: Check if in use
+ cachefiles_end_secure(cache, saved_cred);
+
+ path_put(&path);
+ //_leave(" = %d", ret);
+ return ret;
+
+notdir:
+ path_put(&path);
+ pr_err("inuse command requires dirfd to be a directory\n");
+ return -ENOTDIR;
+
+inval:
+ pr_err("inuse command requires dirfd and filename\n");
+ return -EINVAL;
+}
+
+/*
+ * Bind a directory as a cache
+ */
+static int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)
+{
+ _enter("{%u,%u,%u,%u,%u,%u},%s",
+ cache->frun_percent,
+ cache->fcull_percent,
+ cache->fstop_percent,
+ cache->brun_percent,
+ cache->bcull_percent,
+ cache->bstop_percent,
+ args);
+
+ if (cache->fstop_percent >= cache->fcull_percent ||
+ cache->fcull_percent >= cache->frun_percent ||
+ cache->frun_percent >= 100)
+ return -ERANGE;
+
+ if (cache->bstop_percent >= cache->bcull_percent ||
+ cache->bcull_percent >= cache->brun_percent ||
+ cache->brun_percent >= 100)
+ return -ERANGE;
+
+ if (*args) {
+ pr_err("'bind' command doesn't take an argument\n");
+ return -EINVAL;
+ }
+
+ if (!cache->rootdirname) {
+ pr_err("No cache directory specified\n");
+ return -EINVAL;
+ }
+
+ /* Don't permit already bound caches to be re-bound */
+ if (test_bit(CACHEFILES_READY, &cache->flags)) {
+ pr_err("Cache already bound\n");
+ return -EBUSY;
+ }
+
+ pr_warn("Cache is disabled for development\n");
+ return -ENOANO; // Don't allow the cache to operate yet
+}
+
+/*
+ * Unbind a cache.
+ */
+static void cachefiles_daemon_unbind(struct cachefiles_cache *cache)
+{
+ _enter("");
+
+ if (test_bit(CACHEFILES_READY, &cache->flags)) {
+ // PLACEHOLDER: Withdraw cache
+ }
+
+ mntput(cache->mnt);
+
+ kfree(cache->rootdirname);
+ kfree(cache->secctx);
+ kfree(cache->tag);
+
+ _leave("");
+}
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index e57ce5ef875c..7fd5429715ea 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -65,6 +65,20 @@ struct cachefiles_cache {

#include <trace/events/cachefiles.h>

+/*
+ * note change of state for daemon
+ */
+static inline void cachefiles_state_changed(struct cachefiles_cache *cache)
+{
+ set_bit(CACHEFILES_STATE_CHANGED, &cache->flags);
+ wake_up_all(&cache->daemon_pollwq);
+}
+
+/*
+ * daemon.c
+ */
+extern const struct file_operations cachefiles_daemon_fops;
+
/*
* error_inject.c
*/
diff --git a/fs/cachefiles/main.c b/fs/cachefiles/main.c
index 387d42c7185f..533e3067d80f 100644
--- a/fs/cachefiles/main.c
+++ b/fs/cachefiles/main.c
@@ -31,6 +31,12 @@ MODULE_DESCRIPTION("Mounted-filesystem based cache");
MODULE_AUTHOR("Red Hat, Inc.");
MODULE_LICENSE("GPL");

+static struct miscdevice cachefiles_dev = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "cachefiles",
+ .fops = &cachefiles_daemon_fops,
+};
+
/*
* initialise the fs caching module
*/
@@ -41,10 +47,15 @@ static int __init cachefiles_init(void)
ret = cachefiles_register_error_injection();
if (ret < 0)
goto error_einj;
+ ret = misc_register(&cachefiles_dev);
+ if (ret < 0)
+ goto error_dev;

pr_info("Loaded\n");
return 0;

+error_dev:
+ cachefiles_unregister_error_injection();
error_einj:
pr_err("failed to register: %d\n", ret);
return ret;
@@ -59,6 +70,7 @@ static void __exit cachefiles_exit(void)
{
pr_info("Unloading\n");

+ misc_deregister(&cachefiles_dev);
cachefiles_unregister_error_injection();
}




2021-12-22 23:23:36

by David Howells

[permalink] [raw]
Subject: [PATCH v4 37/68] cachefiles: Provide a function to check how much space there is

Provide a function to check how much space there is. This also flips the
state on the cache and will signal the daemon to inform it of the change
and to ask it to do some culling if necessary.

We will also need to subtract the amount of data currently being written to
the cache (cache->b_writing) from the amount of available space to avoid
hitting ENOSPC accidentally.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819629322.215744.13457425294680841213.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906930100.143852.1681026700865762069.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967140058.1823006.7781243664702837128.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/Makefile | 1
fs/cachefiles/cache.c | 103 ++++++++++++++++++++++++++++++++++++++++++++++
fs/cachefiles/daemon.c | 2 -
fs/cachefiles/internal.h | 7 +++
4 files changed, 112 insertions(+), 1 deletion(-)
create mode 100644 fs/cachefiles/cache.c

diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index f008524bb78f..463e3d608b75 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -4,6 +4,7 @@
#

cachefiles-y := \
+ cache.o \
daemon.o \
main.o \
security.o
diff --git a/fs/cachefiles/cache.c b/fs/cachefiles/cache.c
new file mode 100644
index 000000000000..73636f89eefa
--- /dev/null
+++ b/fs/cachefiles/cache.c
@@ -0,0 +1,103 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Manage high-level VFS aspects of a cache.
+ *
+ * Copyright (C) 2007, 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include <linux/slab.h>
+#include <linux/statfs.h>
+#include <linux/namei.h>
+#include "internal.h"
+
+/*
+ * See if we have space for a number of pages and/or a number of files in the
+ * cache
+ */
+int cachefiles_has_space(struct cachefiles_cache *cache,
+ unsigned fnr, unsigned bnr)
+{
+ struct kstatfs stats;
+ u64 b_avail, b_writing;
+ int ret;
+
+ struct path path = {
+ .mnt = cache->mnt,
+ .dentry = cache->mnt->mnt_root,
+ };
+
+ //_enter("{%llu,%llu,%llu,%llu,%llu,%llu},%u,%u",
+ // (unsigned long long) cache->frun,
+ // (unsigned long long) cache->fcull,
+ // (unsigned long long) cache->fstop,
+ // (unsigned long long) cache->brun,
+ // (unsigned long long) cache->bcull,
+ // (unsigned long long) cache->bstop,
+ // fnr, bnr);
+
+ /* find out how many pages of blockdev are available */
+ memset(&stats, 0, sizeof(stats));
+
+ ret = vfs_statfs(&path, &stats);
+ if (ret < 0) {
+ trace_cachefiles_vfs_error(NULL, d_inode(path.dentry), ret,
+ cachefiles_trace_statfs_error);
+ if (ret == -EIO)
+ cachefiles_io_error(cache, "statfs failed");
+ _leave(" = %d", ret);
+ return ret;
+ }
+
+ b_avail = stats.f_bavail >> cache->bshift;
+ b_writing = atomic_long_read(&cache->b_writing);
+ if (b_avail > b_writing)
+ b_avail -= b_writing;
+ else
+ b_avail = 0;
+
+ //_debug("avail %llu,%llu",
+ // (unsigned long long)stats.f_ffree,
+ // (unsigned long long)b_avail);
+
+ /* see if there is sufficient space */
+ if (stats.f_ffree > fnr)
+ stats.f_ffree -= fnr;
+ else
+ stats.f_ffree = 0;
+
+ if (b_avail > bnr)
+ b_avail -= bnr;
+ else
+ b_avail = 0;
+
+ ret = -ENOBUFS;
+ if (stats.f_ffree < cache->fstop ||
+ b_avail < cache->bstop)
+ goto begin_cull;
+
+ ret = 0;
+ if (stats.f_ffree < cache->fcull ||
+ b_avail < cache->bcull)
+ goto begin_cull;
+
+ if (test_bit(CACHEFILES_CULLING, &cache->flags) &&
+ stats.f_ffree >= cache->frun &&
+ b_avail >= cache->brun &&
+ test_and_clear_bit(CACHEFILES_CULLING, &cache->flags)
+ ) {
+ _debug("cease culling");
+ cachefiles_state_changed(cache);
+ }
+
+ //_leave(" = 0");
+ return 0;
+
+begin_cull:
+ if (!test_and_set_bit(CACHEFILES_CULLING, &cache->flags)) {
+ _debug("### CULL CACHE ###");
+ cachefiles_state_changed(cache);
+ }
+
+ _leave(" = %d", ret);
+ return ret;
+}
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 4cfb7c8b37d0..7d4691614cec 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -167,7 +167,7 @@ static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
return 0;

/* check how much space the cache has */
- // PLACEHOLDER: Check space
+ cachefiles_has_space(cache, 0, 0);

/* summarise */
f_released = atomic_xchg(&cache->f_released, 0);
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 7fd5429715ea..3783a3e01027 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -39,6 +39,7 @@ struct cachefiles_cache {
atomic_t gravecounter; /* graveyard uniquifier */
atomic_t f_released; /* number of objects released lately */
atomic_long_t b_released; /* number of blocks released lately */
+ atomic_long_t b_writing; /* Number of blocks being written */
unsigned frun_percent; /* when to stop culling (% files) */
unsigned fcull_percent; /* when to start culling (% files) */
unsigned fstop_percent; /* when to stop allocating (% files) */
@@ -74,6 +75,12 @@ static inline void cachefiles_state_changed(struct cachefiles_cache *cache)
wake_up_all(&cache->daemon_pollwq);
}

+/*
+ * cache.c
+ */
+extern int cachefiles_has_space(struct cachefiles_cache *cache,
+ unsigned fnr, unsigned bnr);
+
/*
* daemon.c
*/



2021-12-22 23:23:45

by David Howells

[permalink] [raw]
Subject: [PATCH v4 38/68] vfs, cachefiles: Mark a backing file in use with an inode flag

Use an inode flag, S_KERNEL_FILE, to mark that a backing file is in use by
the kernel to prevent cachefiles or other kernel services from interfering
with that file.

Alter rmdir to reject attempts to remove a directory marked with this flag.
This is used by cachefiles to prevent cachefilesd from removing them.

Using S_SWAPFILE instead isn't really viable as that has other effects in
the I/O paths.

Changes
=======
ver #3:
- Check for the object pointer being NULL in the tracepoints rather than
the caller.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819630256.215744.4815885535039369574.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906931596.143852.8642051223094013028.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967141000.1823006.12920680657559677789.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/Makefile | 1 +
fs/cachefiles/namei.c | 43 +++++++++++++++++++++++++++++++++++++
fs/namei.c | 3 ++-
include/linux/fs.h | 1 +
include/trace/events/cachefiles.h | 42 ++++++++++++++++++++++++++++++++++++
5 files changed, 89 insertions(+), 1 deletion(-)
create mode 100644 fs/cachefiles/namei.c

diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index 463e3d608b75..e0b092ca077f 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -7,6 +7,7 @@ cachefiles-y := \
cache.o \
daemon.o \
main.o \
+ namei.o \
security.o

cachefiles-$(CONFIG_CACHEFILES_ERROR_INJECTION) += error_inject.o
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
new file mode 100644
index 000000000000..913f83f1c900
--- /dev/null
+++ b/fs/cachefiles/namei.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* CacheFiles path walking and related routines
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include <linux/fs.h>
+#include "internal.h"
+
+/*
+ * Mark the backing file as being a cache file if it's not already in use. The
+ * mark tells the culling request command that it's not allowed to cull the
+ * file or directory. The caller must hold the inode lock.
+ */
+static bool __cachefiles_mark_inode_in_use(struct cachefiles_object *object,
+ struct dentry *dentry)
+{
+ struct inode *inode = d_backing_inode(dentry);
+ bool can_use = false;
+
+ if (!(inode->i_flags & S_KERNEL_FILE)) {
+ inode->i_flags |= S_KERNEL_FILE;
+ trace_cachefiles_mark_active(object, inode);
+ can_use = true;
+ } else {
+ pr_notice("cachefiles: Inode already in use: %pd\n", dentry);
+ }
+
+ return can_use;
+}
+
+/*
+ * Unmark a backing inode. The caller must hold the inode lock.
+ */
+static void __cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
+ struct dentry *dentry)
+{
+ struct inode *inode = d_backing_inode(dentry);
+
+ inode->i_flags &= ~S_KERNEL_FILE;
+ trace_cachefiles_mark_inactive(object, inode);
+}
diff --git a/fs/namei.c b/fs/namei.c
index 1f9d2187c765..d81f04f8d818 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3958,7 +3958,8 @@ int vfs_rmdir(struct user_namespace *mnt_userns, struct inode *dir,
inode_lock(dentry->d_inode);

error = -EBUSY;
- if (is_local_mountpoint(dentry))
+ if (is_local_mountpoint(dentry) ||
+ (dentry->d_inode->i_flags & S_KERNEL_FILE))
goto out;

error = security_inode_rmdir(dir, dentry);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2c0b8e77d9ab..bcf1ca430139 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2249,6 +2249,7 @@ struct super_operations {
#define S_ENCRYPTED (1 << 14) /* Encrypted file (using fs/crypto/) */
#define S_CASEFOLD (1 << 15) /* Casefolded file */
#define S_VERITY (1 << 16) /* Verity file (using fs/verity/) */
+#define S_KERNEL_FILE (1 << 17) /* File is in use by the kernel (eg. fs/cachefiles) */

/*
* Note that nosuid etc flags are inode-specific: setting some file-system
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index 9bd5a8a60801..6331cd29880d 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -83,6 +83,48 @@ cachefiles_error_traces;
#define E_(a, b) { a, b }


+TRACE_EVENT(cachefiles_mark_active,
+ TP_PROTO(struct cachefiles_object *obj,
+ struct inode *inode),
+
+ TP_ARGS(obj, inode),
+
+ /* Note that obj may be NULL */
+ TP_STRUCT__entry(
+ __field(unsigned int, obj )
+ __field(ino_t, inode )
+ ),
+
+ TP_fast_assign(
+ __entry->obj = obj ? obj->debug_id : 0;
+ __entry->inode = inode->i_ino;
+ ),
+
+ TP_printk("o=%08x i=%lx",
+ __entry->obj, __entry->inode)
+ );
+
+TRACE_EVENT(cachefiles_mark_inactive,
+ TP_PROTO(struct cachefiles_object *obj,
+ struct inode *inode),
+
+ TP_ARGS(obj, inode),
+
+ /* Note that obj may be NULL */
+ TP_STRUCT__entry(
+ __field(unsigned int, obj )
+ __field(ino_t, inode )
+ ),
+
+ TP_fast_assign(
+ __entry->obj = obj ? obj->debug_id : 0;
+ __entry->inode = inode->i_ino;
+ ),
+
+ TP_printk("o=%08x i=%lx",
+ __entry->obj, __entry->inode)
+ );
+
TRACE_EVENT(cachefiles_vfs_error,
TP_PROTO(struct cachefiles_object *obj, struct inode *backer,
int error, enum cachefiles_error_trace where),



2021-12-22 23:24:05

by David Howells

[permalink] [raw]
Subject: [PATCH v4 39/68] cachefiles: Implement a function to get/create a directory in the cache

Implement a function to get/create structural directories in the cache.
This is used for setting up a cache and creating volume substructures. The
directory in memory are marked with the S_KERNEL_FILE inode flag whilst
they're in use to tell rmdir to reject attempts to remove them.

Changes
=======
ver #3:
- Return an indication as to whether the directory was freshly created.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819631182.215744.3322471539523262619.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906933130.143852.962088616746509062.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967141952.1823006.7832985646370603833.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/internal.h | 9 +++
fs/cachefiles/namei.c | 141 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 150 insertions(+)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 3783a3e01027..48768a3ab105 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -125,6 +125,15 @@ static inline int cachefiles_inject_remove_error(void)
return cachefiles_error_injection_state & 2 ? -EIO : 0;
}

+/*
+ * namei.c
+ */
+extern struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
+ struct dentry *dir,
+ const char *name,
+ bool *_is_new);
+extern void cachefiles_put_directory(struct dentry *dir);
+
/*
* security.c
*/
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 913f83f1c900..11a33209ab5f 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -6,6 +6,7 @@
*/

#include <linux/fs.h>
+#include <linux/namei.h>
#include "internal.h"

/*
@@ -41,3 +42,143 @@ static void __cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
inode->i_flags &= ~S_KERNEL_FILE;
trace_cachefiles_mark_inactive(object, inode);
}
+
+/*
+ * get a subdirectory
+ */
+struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
+ struct dentry *dir,
+ const char *dirname,
+ bool *_is_new)
+{
+ struct dentry *subdir;
+ struct path path;
+ int ret;
+
+ _enter(",,%s", dirname);
+
+ /* search the current directory for the element name */
+ inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
+
+retry:
+ ret = cachefiles_inject_read_error();
+ if (ret == 0)
+ subdir = lookup_one_len(dirname, dir, strlen(dirname));
+ else
+ subdir = ERR_PTR(ret);
+ if (IS_ERR(subdir)) {
+ trace_cachefiles_vfs_error(NULL, d_backing_inode(dir),
+ PTR_ERR(subdir),
+ cachefiles_trace_lookup_error);
+ if (PTR_ERR(subdir) == -ENOMEM)
+ goto nomem_d_alloc;
+ goto lookup_error;
+ }
+
+ _debug("subdir -> %pd %s",
+ subdir, d_backing_inode(subdir) ? "positive" : "negative");
+
+ /* we need to create the subdir if it doesn't exist yet */
+ if (d_is_negative(subdir)) {
+ ret = cachefiles_has_space(cache, 1, 0);
+ if (ret < 0)
+ goto mkdir_error;
+
+ _debug("attempt mkdir");
+
+ path.mnt = cache->mnt;
+ path.dentry = dir;
+ ret = security_path_mkdir(&path, subdir, 0700);
+ if (ret < 0)
+ goto mkdir_error;
+ ret = cachefiles_inject_write_error();
+ if (ret == 0)
+ ret = vfs_mkdir(&init_user_ns, d_inode(dir), subdir, 0700);
+ if (ret < 0) {
+ trace_cachefiles_vfs_error(NULL, d_inode(dir), ret,
+ cachefiles_trace_mkdir_error);
+ goto mkdir_error;
+ }
+
+ if (unlikely(d_unhashed(subdir))) {
+ cachefiles_put_directory(subdir);
+ goto retry;
+ }
+ ASSERT(d_backing_inode(subdir));
+
+ _debug("mkdir -> %pd{ino=%lu}",
+ subdir, d_backing_inode(subdir)->i_ino);
+ if (_is_new)
+ *_is_new = true;
+ }
+
+ /* Tell rmdir() it's not allowed to delete the subdir */
+ inode_lock(d_inode(subdir));
+ inode_unlock(d_inode(dir));
+
+ if (!__cachefiles_mark_inode_in_use(NULL, subdir))
+ goto mark_error;
+
+ inode_unlock(d_inode(subdir));
+
+ /* we need to make sure the subdir is a directory */
+ ASSERT(d_backing_inode(subdir));
+
+ if (!d_can_lookup(subdir)) {
+ pr_err("%s is not a directory\n", dirname);
+ ret = -EIO;
+ goto check_error;
+ }
+
+ ret = -EPERM;
+ if (!(d_backing_inode(subdir)->i_opflags & IOP_XATTR) ||
+ !d_backing_inode(subdir)->i_op->lookup ||
+ !d_backing_inode(subdir)->i_op->mkdir ||
+ !d_backing_inode(subdir)->i_op->rename ||
+ !d_backing_inode(subdir)->i_op->rmdir ||
+ !d_backing_inode(subdir)->i_op->unlink)
+ goto check_error;
+
+ _leave(" = [%lu]", d_backing_inode(subdir)->i_ino);
+ return subdir;
+
+check_error:
+ cachefiles_put_directory(subdir);
+ _leave(" = %d [check]", ret);
+ return ERR_PTR(ret);
+
+mark_error:
+ inode_unlock(d_inode(subdir));
+ dput(subdir);
+ return ERR_PTR(-EBUSY);
+
+mkdir_error:
+ inode_unlock(d_inode(dir));
+ dput(subdir);
+ pr_err("mkdir %s failed with error %d\n", dirname, ret);
+ return ERR_PTR(ret);
+
+lookup_error:
+ inode_unlock(d_inode(dir));
+ ret = PTR_ERR(subdir);
+ pr_err("Lookup %s failed with error %d\n", dirname, ret);
+ return ERR_PTR(ret);
+
+nomem_d_alloc:
+ inode_unlock(d_inode(dir));
+ _leave(" = -ENOMEM");
+ return ERR_PTR(-ENOMEM);
+}
+
+/*
+ * Put a subdirectory.
+ */
+void cachefiles_put_directory(struct dentry *dir)
+{
+ if (dir) {
+ inode_lock(dir->d_inode);
+ __cachefiles_unmark_inode_in_use(NULL, dir);
+ inode_unlock(dir->d_inode);
+ dput(dir);
+ }
+}



2021-12-22 23:24:18

by David Howells

[permalink] [raw]
Subject: [PATCH v4 40/68] cachefiles: Implement cache registration and withdrawal

Do the following:

(1) Fill out cachefiles_daemon_add_cache() so that it sets up the cache
directories and registers the cache with cachefiles.

(2) Add a function to do the top-level part of cache withdrawal and
unregistration.

(3) Add a function to sync a cache.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819633175.215744.10857127598041268340.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906935445.143852.15545194974036410029.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967142904.1823006.244055483596047072.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/Makefile | 1
fs/cachefiles/cache.c | 207 +++++++++++++++++++++++++++++++++++++++++++++
fs/cachefiles/daemon.c | 8 +-
fs/cachefiles/interface.c | 18 ++++
fs/cachefiles/internal.h | 9 ++
5 files changed, 240 insertions(+), 3 deletions(-)
create mode 100644 fs/cachefiles/interface.c

diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index e0b092ca077f..92af5daee8ce 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -6,6 +6,7 @@
cachefiles-y := \
cache.o \
daemon.o \
+ interface.o \
main.o \
namei.o \
security.o
diff --git a/fs/cachefiles/cache.c b/fs/cachefiles/cache.c
index 73636f89eefa..0462e7af87fb 100644
--- a/fs/cachefiles/cache.c
+++ b/fs/cachefiles/cache.c
@@ -10,6 +10,166 @@
#include <linux/namei.h>
#include "internal.h"

+/*
+ * Bring a cache online.
+ */
+int cachefiles_add_cache(struct cachefiles_cache *cache)
+{
+ struct fscache_cache *cache_cookie;
+ struct path path;
+ struct kstatfs stats;
+ struct dentry *graveyard, *cachedir, *root;
+ const struct cred *saved_cred;
+ int ret;
+
+ _enter("");
+
+ cache_cookie = fscache_acquire_cache(cache->tag);
+ if (IS_ERR(cache_cookie))
+ return PTR_ERR(cache_cookie);
+
+ /* we want to work under the module's security ID */
+ ret = cachefiles_get_security_ID(cache);
+ if (ret < 0)
+ goto error_getsec;
+
+ cachefiles_begin_secure(cache, &saved_cred);
+
+ /* look up the directory at the root of the cache */
+ ret = kern_path(cache->rootdirname, LOOKUP_DIRECTORY, &path);
+ if (ret < 0)
+ goto error_open_root;
+
+ cache->mnt = path.mnt;
+ root = path.dentry;
+
+ ret = -EINVAL;
+ if (mnt_user_ns(path.mnt) != &init_user_ns) {
+ pr_warn("File cache on idmapped mounts not supported");
+ goto error_unsupported;
+ }
+
+ /* check parameters */
+ ret = -EOPNOTSUPP;
+ if (d_is_negative(root) ||
+ !d_backing_inode(root)->i_op->lookup ||
+ !d_backing_inode(root)->i_op->mkdir ||
+ !(d_backing_inode(root)->i_opflags & IOP_XATTR) ||
+ !root->d_sb->s_op->statfs ||
+ !root->d_sb->s_op->sync_fs ||
+ root->d_sb->s_blocksize > PAGE_SIZE)
+ goto error_unsupported;
+
+ ret = -EROFS;
+ if (sb_rdonly(root->d_sb))
+ goto error_unsupported;
+
+ /* determine the security of the on-disk cache as this governs
+ * security ID of files we create */
+ ret = cachefiles_determine_cache_security(cache, root, &saved_cred);
+ if (ret < 0)
+ goto error_unsupported;
+
+ /* get the cache size and blocksize */
+ ret = vfs_statfs(&path, &stats);
+ if (ret < 0)
+ goto error_unsupported;
+
+ ret = -ERANGE;
+ if (stats.f_bsize <= 0)
+ goto error_unsupported;
+
+ ret = -EOPNOTSUPP;
+ if (stats.f_bsize > PAGE_SIZE)
+ goto error_unsupported;
+
+ cache->bsize = stats.f_bsize;
+ cache->bshift = 0;
+ if (stats.f_bsize < PAGE_SIZE)
+ cache->bshift = PAGE_SHIFT - ilog2(stats.f_bsize);
+
+ _debug("blksize %u (shift %u)",
+ cache->bsize, cache->bshift);
+
+ _debug("size %llu, avail %llu",
+ (unsigned long long) stats.f_blocks,
+ (unsigned long long) stats.f_bavail);
+
+ /* set up caching limits */
+ do_div(stats.f_files, 100);
+ cache->fstop = stats.f_files * cache->fstop_percent;
+ cache->fcull = stats.f_files * cache->fcull_percent;
+ cache->frun = stats.f_files * cache->frun_percent;
+
+ _debug("limits {%llu,%llu,%llu} files",
+ (unsigned long long) cache->frun,
+ (unsigned long long) cache->fcull,
+ (unsigned long long) cache->fstop);
+
+ stats.f_blocks >>= cache->bshift;
+ do_div(stats.f_blocks, 100);
+ cache->bstop = stats.f_blocks * cache->bstop_percent;
+ cache->bcull = stats.f_blocks * cache->bcull_percent;
+ cache->brun = stats.f_blocks * cache->brun_percent;
+
+ _debug("limits {%llu,%llu,%llu} blocks",
+ (unsigned long long) cache->brun,
+ (unsigned long long) cache->bcull,
+ (unsigned long long) cache->bstop);
+
+ /* get the cache directory and check its type */
+ cachedir = cachefiles_get_directory(cache, root, "cache", NULL);
+ if (IS_ERR(cachedir)) {
+ ret = PTR_ERR(cachedir);
+ goto error_unsupported;
+ }
+
+ cache->store = cachedir;
+
+ /* get the graveyard directory */
+ graveyard = cachefiles_get_directory(cache, root, "graveyard", NULL);
+ if (IS_ERR(graveyard)) {
+ ret = PTR_ERR(graveyard);
+ goto error_unsupported;
+ }
+
+ cache->graveyard = graveyard;
+ cache->cache = cache_cookie;
+
+ ret = fscache_add_cache(cache_cookie, &cachefiles_cache_ops, cache);
+ if (ret < 0)
+ goto error_add_cache;
+
+ /* done */
+ set_bit(CACHEFILES_READY, &cache->flags);
+ dput(root);
+
+ pr_info("File cache on %s registered\n", cache_cookie->name);
+
+ /* check how much space the cache has */
+ cachefiles_has_space(cache, 0, 0);
+ cachefiles_end_secure(cache, saved_cred);
+ _leave(" = 0 [%px]", cache->cache);
+ return 0;
+
+error_add_cache:
+ cachefiles_put_directory(cache->graveyard);
+ cache->graveyard = NULL;
+error_unsupported:
+ cachefiles_put_directory(cache->store);
+ cache->store = NULL;
+ mntput(cache->mnt);
+ cache->mnt = NULL;
+ dput(root);
+error_open_root:
+ cachefiles_end_secure(cache, saved_cred);
+error_getsec:
+ fscache_relinquish_cache(cache_cookie);
+ cache->cache = NULL;
+ pr_err("Failed to register: %d\n", ret);
+ return ret;
+}
+
/*
* See if we have space for a number of pages and/or a number of files in the
* cache
@@ -101,3 +261,50 @@ int cachefiles_has_space(struct cachefiles_cache *cache,
_leave(" = %d", ret);
return ret;
}
+
+/*
+ * Sync a cache to backing disk.
+ */
+static void cachefiles_sync_cache(struct cachefiles_cache *cache)
+{
+ const struct cred *saved_cred;
+ int ret;
+
+ _enter("%s", cache->cache->name);
+
+ /* make sure all pages pinned by operations on behalf of the netfs are
+ * written to disc */
+ cachefiles_begin_secure(cache, &saved_cred);
+ down_read(&cache->mnt->mnt_sb->s_umount);
+ ret = sync_filesystem(cache->mnt->mnt_sb);
+ up_read(&cache->mnt->mnt_sb->s_umount);
+ cachefiles_end_secure(cache, saved_cred);
+
+ if (ret == -EIO)
+ cachefiles_io_error(cache,
+ "Attempt to sync backing fs superblock returned error %d",
+ ret);
+}
+
+/*
+ * Withdraw cache objects.
+ */
+void cachefiles_withdraw_cache(struct cachefiles_cache *cache)
+{
+ struct fscache_cache *fscache = cache->cache;
+
+ pr_info("File cache on %s unregistering\n", fscache->name);
+
+ fscache_withdraw_cache(fscache);
+
+ /* we now have to destroy all the active objects pertaining to this
+ * cache - which we do by passing them off to thread pool to be
+ * disposed of */
+ // PLACEHOLDER: Withdraw objects
+ fscache_wait_for_objects(fscache);
+
+ // PLACEHOLDER: Withdraw volume
+ cachefiles_sync_cache(cache);
+ cache->cache = NULL;
+ fscache_relinquish_cache(fscache);
+}
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 7d4691614cec..a449ee661987 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -702,6 +702,7 @@ static int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)

pr_warn("Cache is disabled for development\n");
return -ENOANO; // Don't allow the cache to operate yet
+ //return cachefiles_add_cache(cache);
}

/*
@@ -711,10 +712,11 @@ static void cachefiles_daemon_unbind(struct cachefiles_cache *cache)
{
_enter("");

- if (test_bit(CACHEFILES_READY, &cache->flags)) {
- // PLACEHOLDER: Withdraw cache
- }
+ if (test_bit(CACHEFILES_READY, &cache->flags))
+ cachefiles_withdraw_cache(cache);

+ cachefiles_put_directory(cache->graveyard);
+ cachefiles_put_directory(cache->store);
mntput(cache->mnt);

kfree(cache->rootdirname);
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
new file mode 100644
index 000000000000..564ea8fa6641
--- /dev/null
+++ b/fs/cachefiles/interface.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* FS-Cache interface to CacheFiles
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include <linux/slab.h>
+#include <linux/mount.h>
+#include <linux/xattr.h>
+#include <linux/file.h>
+#include <linux/falloc.h>
+#include <trace/events/fscache.h>
+#include "internal.h"
+
+const struct fscache_cache_ops cachefiles_cache_ops = {
+ .name = "cachefiles",
+};
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 48768a3ab105..77e874c2bbe7 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -32,6 +32,8 @@ struct cachefiles_object {
struct cachefiles_cache {
struct fscache_cache *cache; /* Cache cookie */
struct vfsmount *mnt; /* mountpoint holding the cache */
+ struct dentry *store; /* Directory into which live objects go */
+ struct dentry *graveyard; /* directory into which dead objects go */
struct file *cachefilesd; /* manager daemon handle */
const struct cred *cache_cred; /* security override for accessing cache */
struct mutex daemon_mutex; /* command serialisation mutex */
@@ -78,8 +80,10 @@ static inline void cachefiles_state_changed(struct cachefiles_cache *cache)
/*
* cache.c
*/
+extern int cachefiles_add_cache(struct cachefiles_cache *cache);
extern int cachefiles_has_space(struct cachefiles_cache *cache,
unsigned fnr, unsigned bnr);
+extern void cachefiles_withdraw_cache(struct cachefiles_cache *cache);

/*
* daemon.c
@@ -125,6 +129,11 @@ static inline int cachefiles_inject_remove_error(void)
return cachefiles_error_injection_state & 2 ? -EIO : 0;
}

+/*
+ * interface.c
+ */
+extern const struct fscache_cache_ops cachefiles_cache_ops;
+
/*
* namei.c
*/



2021-12-22 23:24:26

by David Howells

[permalink] [raw]
Subject: [PATCH v4 41/68] cachefiles: Implement volume support

Implement support for creating the directory layout for a volume on disk
and setting up and withdrawing volume caching.

Each volume has a directory named for the volume key under the root of the
cache (prefixed with an 'I' to indicate to cachefilesd that it's an index)
and then creates a bunch of hash bucket subdirectories under that (named as
'@' plus a hex number) in which cookie files will be created.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819635314.215744.13081522301564537723.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906936397.143852.17788457778396467161.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967143860.1823006.7185205806080225038.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/Makefile | 3 +
fs/cachefiles/cache.c | 28 ++++++++++-
fs/cachefiles/daemon.c | 2 +
fs/cachefiles/interface.c | 2 +
fs/cachefiles/internal.h | 20 ++++++++
fs/cachefiles/volume.c | 118 +++++++++++++++++++++++++++++++++++++++++++++
6 files changed, 171 insertions(+), 2 deletions(-)
create mode 100644 fs/cachefiles/volume.c

diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index 92af5daee8ce..d67210ece9cd 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -9,7 +9,8 @@ cachefiles-y := \
interface.o \
main.o \
namei.o \
- security.o
+ security.o \
+ volume.o

cachefiles-$(CONFIG_CACHEFILES_ERROR_INJECTION) += error_inject.o

diff --git a/fs/cachefiles/cache.c b/fs/cachefiles/cache.c
index 0462e7af87fb..c4b9280ca0cd 100644
--- a/fs/cachefiles/cache.c
+++ b/fs/cachefiles/cache.c
@@ -262,6 +262,32 @@ int cachefiles_has_space(struct cachefiles_cache *cache,
return ret;
}

+/*
+ * Withdraw volumes.
+ */
+static void cachefiles_withdraw_volumes(struct cachefiles_cache *cache)
+{
+ _enter("");
+
+ for (;;) {
+ struct cachefiles_volume *volume = NULL;
+
+ spin_lock(&cache->object_list_lock);
+ if (!list_empty(&cache->volumes)) {
+ volume = list_first_entry(&cache->volumes,
+ struct cachefiles_volume, cache_link);
+ list_del_init(&volume->cache_link);
+ }
+ spin_unlock(&cache->object_list_lock);
+ if (!volume)
+ break;
+
+ cachefiles_withdraw_volume(volume);
+ }
+
+ _leave("");
+}
+
/*
* Sync a cache to backing disk.
*/
@@ -303,7 +329,7 @@ void cachefiles_withdraw_cache(struct cachefiles_cache *cache)
// PLACEHOLDER: Withdraw objects
fscache_wait_for_objects(fscache);

- // PLACEHOLDER: Withdraw volume
+ cachefiles_withdraw_volumes(cache);
cachefiles_sync_cache(cache);
cache->cache = NULL;
fscache_relinquish_cache(fscache);
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index a449ee661987..337597a4e30c 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -105,6 +105,8 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file)

mutex_init(&cache->daemon_mutex);
init_waitqueue_head(&cache->daemon_pollwq);
+ INIT_LIST_HEAD(&cache->volumes);
+ spin_lock_init(&cache->object_list_lock);

/* set default caching limits
* - limit at 1% free space and/or free files
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 564ea8fa6641..1793e46bd3e7 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -15,4 +15,6 @@

const struct fscache_cache_ops cachefiles_cache_ops = {
.name = "cachefiles",
+ .acquire_volume = cachefiles_acquire_volume,
+ .free_volume = cachefiles_free_volume,
};
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 77e874c2bbe7..ab0e9307be7b 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -19,6 +19,17 @@
struct cachefiles_cache;
struct cachefiles_object;

+/*
+ * Cached volume representation.
+ */
+struct cachefiles_volume {
+ struct cachefiles_cache *cache;
+ struct list_head cache_link; /* Link in cache->volumes */
+ struct fscache_volume *vcookie; /* The netfs's representation */
+ struct dentry *dentry; /* The volume dentry */
+ struct dentry *fanout[256]; /* Fanout subdirs */
+};
+
/*
* Data file records.
*/
@@ -35,6 +46,8 @@ struct cachefiles_cache {
struct dentry *store; /* Directory into which live objects go */
struct dentry *graveyard; /* directory into which dead objects go */
struct file *cachefilesd; /* manager daemon handle */
+ struct list_head volumes; /* List of volume objects */
+ spinlock_t object_list_lock; /* Lock for volumes and object_list */
const struct cred *cache_cred; /* security override for accessing cache */
struct mutex daemon_mutex; /* command serialisation mutex */
wait_queue_head_t daemon_pollwq; /* poll waitqueue for daemon */
@@ -163,6 +176,13 @@ static inline void cachefiles_end_secure(struct cachefiles_cache *cache,
revert_creds(saved_cred);
}

+/*
+ * volume.c
+ */
+void cachefiles_acquire_volume(struct fscache_volume *volume);
+void cachefiles_free_volume(struct fscache_volume *volume);
+void cachefiles_withdraw_volume(struct cachefiles_volume *volume);
+
/*
* Error handling
*/
diff --git a/fs/cachefiles/volume.c b/fs/cachefiles/volume.c
new file mode 100644
index 000000000000..4a14f5e72764
--- /dev/null
+++ b/fs/cachefiles/volume.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Volume handling.
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include "internal.h"
+#include <trace/events/fscache.h>
+
+/*
+ * Allocate and set up a volume representation. We make sure all the fanout
+ * directories are created and pinned.
+ */
+void cachefiles_acquire_volume(struct fscache_volume *vcookie)
+{
+ struct cachefiles_volume *volume;
+ struct cachefiles_cache *cache = vcookie->cache->cache_priv;
+ const struct cred *saved_cred;
+ struct dentry *vdentry, *fan;
+ size_t len;
+ char *name;
+ int n_accesses, i;
+
+ _enter("");
+
+ volume = kzalloc(sizeof(struct cachefiles_volume), GFP_KERNEL);
+ if (!volume)
+ return;
+ volume->vcookie = vcookie;
+ volume->cache = cache;
+ INIT_LIST_HEAD(&volume->cache_link);
+
+ cachefiles_begin_secure(cache, &saved_cred);
+
+ len = vcookie->key[0];
+ name = kmalloc(len + 3, GFP_NOFS);
+ if (!name)
+ goto error_vol;
+ name[0] = 'I';
+ memcpy(name + 1, vcookie->key + 1, len);
+ name[len + 1] = 0;
+
+ vdentry = cachefiles_get_directory(cache, cache->store, name, NULL);
+ if (IS_ERR(vdentry))
+ goto error_name;
+ volume->dentry = vdentry;
+
+ for (i = 0; i < 256; i++) {
+ sprintf(name, "@%02x", i);
+ fan = cachefiles_get_directory(cache, vdentry, name, NULL);
+ if (IS_ERR(fan))
+ goto error_fan;
+ volume->fanout[i] = fan;
+ }
+
+ cachefiles_end_secure(cache, saved_cred);
+
+ vcookie->cache_priv = volume;
+ n_accesses = atomic_inc_return(&vcookie->n_accesses); /* Stop wakeups on dec-to-0 */
+ trace_fscache_access_volume(vcookie->debug_id, 0,
+ refcount_read(&vcookie->ref),
+ n_accesses, fscache_access_cache_pin);
+
+ spin_lock(&cache->object_list_lock);
+ list_add(&volume->cache_link, &volume->cache->volumes);
+ spin_unlock(&cache->object_list_lock);
+
+ kfree(name);
+ return;
+
+error_fan:
+ for (i = 0; i < 256; i++)
+ cachefiles_put_directory(volume->fanout[i]);
+ cachefiles_put_directory(volume->dentry);
+error_name:
+ kfree(name);
+error_vol:
+ kfree(volume);
+ cachefiles_end_secure(cache, saved_cred);
+}
+
+/*
+ * Release a volume representation.
+ */
+static void __cachefiles_free_volume(struct cachefiles_volume *volume)
+{
+ int i;
+
+ _enter("");
+
+ volume->vcookie->cache_priv = NULL;
+
+ for (i = 0; i < 256; i++)
+ cachefiles_put_directory(volume->fanout[i]);
+ cachefiles_put_directory(volume->dentry);
+ kfree(volume);
+}
+
+void cachefiles_free_volume(struct fscache_volume *vcookie)
+{
+ struct cachefiles_volume *volume = vcookie->cache_priv;
+
+ if (volume) {
+ spin_lock(&volume->cache->object_list_lock);
+ list_del_init(&volume->cache_link);
+ spin_unlock(&volume->cache->object_list_lock);
+ __cachefiles_free_volume(volume);
+ }
+}
+
+void cachefiles_withdraw_volume(struct cachefiles_volume *volume)
+{
+ fscache_withdraw_volume(volume->vcookie);
+ __cachefiles_free_volume(volume);
+}



2021-12-22 23:25:00

by David Howells

[permalink] [raw]
Subject: [PATCH v4 42/68] cachefiles: Add tracepoints for calls to the VFS

Add tracepoints in cachefiles to monitor when it does various VFS
operations, such as mkdir.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819638517.215744.12773133137536579766.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906938316.143852.17227990869551737803.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967147139.1823006.4909879317496543392.stgit@warthog.procyon.org.uk/ # v3
---

include/trace/events/cachefiles.h | 176 +++++++++++++++++++++++++++++++++++++
1 file changed, 176 insertions(+)

diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index 6331cd29880d..5975ea4977b2 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -18,6 +18,21 @@
#ifndef __CACHEFILES_DECLARE_TRACE_ENUMS_ONCE_ONLY
#define __CACHEFILES_DECLARE_TRACE_ENUMS_ONCE_ONLY

+enum fscache_why_object_killed {
+ FSCACHE_OBJECT_IS_STALE,
+ FSCACHE_OBJECT_IS_WEIRD,
+ FSCACHE_OBJECT_INVALIDATED,
+ FSCACHE_OBJECT_NO_SPACE,
+ FSCACHE_OBJECT_WAS_RETIRED,
+ FSCACHE_OBJECT_WAS_CULLED,
+};
+
+enum cachefiles_trunc_trace {
+ cachefiles_trunc_dio_adjust,
+ cachefiles_trunc_expand_tmpfile,
+ cachefiles_trunc_shrink,
+};
+
enum cachefiles_error_trace {
cachefiles_trace_fallocate_error,
cachefiles_trace_getxattr_error,
@@ -43,6 +58,19 @@ enum cachefiles_error_trace {
/*
* Define enum -> string mappings for display.
*/
+#define cachefiles_obj_kill_traces \
+ EM(FSCACHE_OBJECT_IS_STALE, "stale") \
+ EM(FSCACHE_OBJECT_IS_WEIRD, "weird") \
+ EM(FSCACHE_OBJECT_INVALIDATED, "inval") \
+ EM(FSCACHE_OBJECT_NO_SPACE, "no_space") \
+ EM(FSCACHE_OBJECT_WAS_RETIRED, "was_retired") \
+ E_(FSCACHE_OBJECT_WAS_CULLED, "was_culled")
+
+#define cachefiles_trunc_traces \
+ EM(cachefiles_trunc_dio_adjust, "DIOADJ") \
+ EM(cachefiles_trunc_expand_tmpfile, "EXPTMP") \
+ E_(cachefiles_trunc_shrink, "SHRINK")
+
#define cachefiles_error_traces \
EM(cachefiles_trace_fallocate_error, "fallocate") \
EM(cachefiles_trace_getxattr_error, "getxattr") \
@@ -71,6 +99,8 @@ enum cachefiles_error_trace {
#define EM(a, b) TRACE_DEFINE_ENUM(a);
#define E_(a, b) TRACE_DEFINE_ENUM(a);

+cachefiles_obj_kill_traces;
+cachefiles_trunc_traces;
cachefiles_error_traces;

/*
@@ -83,6 +113,152 @@ cachefiles_error_traces;
#define E_(a, b) { a, b }


+TRACE_EVENT(cachefiles_lookup,
+ TP_PROTO(struct cachefiles_object *obj,
+ struct dentry *de),
+
+ TP_ARGS(obj, de),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, obj )
+ __field(short, error )
+ __field(unsigned long, ino )
+ ),
+
+ TP_fast_assign(
+ __entry->obj = obj->debug_id;
+ __entry->ino = (!IS_ERR(de) && d_backing_inode(de) ?
+ d_backing_inode(de)->i_ino : 0);
+ __entry->error = IS_ERR(de) ? PTR_ERR(de) : 0;
+ ),
+
+ TP_printk("o=%08x i=%lx e=%d",
+ __entry->obj, __entry->ino, __entry->error)
+ );
+
+TRACE_EVENT(cachefiles_tmpfile,
+ TP_PROTO(struct cachefiles_object *obj, struct inode *backer),
+
+ TP_ARGS(obj, backer),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, obj )
+ __field(unsigned int, backer )
+ ),
+
+ TP_fast_assign(
+ __entry->obj = obj->debug_id;
+ __entry->backer = backer->i_ino;
+ ),
+
+ TP_printk("o=%08x b=%08x",
+ __entry->obj,
+ __entry->backer)
+ );
+
+TRACE_EVENT(cachefiles_link,
+ TP_PROTO(struct cachefiles_object *obj, struct inode *backer),
+
+ TP_ARGS(obj, backer),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, obj )
+ __field(unsigned int, backer )
+ ),
+
+ TP_fast_assign(
+ __entry->obj = obj->debug_id;
+ __entry->backer = backer->i_ino;
+ ),
+
+ TP_printk("o=%08x b=%08x",
+ __entry->obj,
+ __entry->backer)
+ );
+
+TRACE_EVENT(cachefiles_unlink,
+ TP_PROTO(struct cachefiles_object *obj,
+ struct dentry *de,
+ enum fscache_why_object_killed why),
+
+ TP_ARGS(obj, de, why),
+
+ /* Note that obj may be NULL */
+ TP_STRUCT__entry(
+ __field(unsigned int, obj )
+ __field(struct dentry *, de )
+ __field(enum fscache_why_object_killed, why )
+ ),
+
+ TP_fast_assign(
+ __entry->obj = obj ? obj->debug_id : UINT_MAX;
+ __entry->de = de;
+ __entry->why = why;
+ ),
+
+ TP_printk("o=%08x d=%p w=%s",
+ __entry->obj, __entry->de,
+ __print_symbolic(__entry->why, cachefiles_obj_kill_traces))
+ );
+
+TRACE_EVENT(cachefiles_rename,
+ TP_PROTO(struct cachefiles_object *obj,
+ struct dentry *de,
+ struct dentry *to,
+ enum fscache_why_object_killed why),
+
+ TP_ARGS(obj, de, to, why),
+
+ /* Note that obj may be NULL */
+ TP_STRUCT__entry(
+ __field(unsigned int, obj )
+ __field(struct dentry *, de )
+ __field(struct dentry *, to )
+ __field(enum fscache_why_object_killed, why )
+ ),
+
+ TP_fast_assign(
+ __entry->obj = obj ? obj->debug_id : UINT_MAX;
+ __entry->de = de;
+ __entry->to = to;
+ __entry->why = why;
+ ),
+
+ TP_printk("o=%08x d=%p t=%p w=%s",
+ __entry->obj, __entry->de, __entry->to,
+ __print_symbolic(__entry->why, cachefiles_obj_kill_traces))
+ );
+
+TRACE_EVENT(cachefiles_trunc,
+ TP_PROTO(struct cachefiles_object *obj, struct inode *backer,
+ loff_t from, loff_t to, enum cachefiles_trunc_trace why),
+
+ TP_ARGS(obj, backer, from, to, why),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, obj )
+ __field(unsigned int, backer )
+ __field(enum cachefiles_trunc_trace, why )
+ __field(loff_t, from )
+ __field(loff_t, to )
+ ),
+
+ TP_fast_assign(
+ __entry->obj = obj->debug_id;
+ __entry->backer = backer->i_ino;
+ __entry->from = from;
+ __entry->to = to;
+ __entry->why = why;
+ ),
+
+ TP_printk("o=%08x b=%08x %s l=%llx->%llx",
+ __entry->obj,
+ __entry->backer,
+ __print_symbolic(__entry->why, cachefiles_trunc_traces),
+ __entry->from,
+ __entry->to)
+ );
+
TRACE_EVENT(cachefiles_mark_active,
TP_PROTO(struct cachefiles_object *obj,
struct inode *inode),



2021-12-22 23:25:04

by David Howells

[permalink] [raw]
Subject: [PATCH v4 43/68] cachefiles: Implement object lifecycle funcs

Implement allocate, get, see and put functions for the cachefiles_object
struct. The members of the struct we're going to need are also added.

Additionally, implement a lifecycle tracepoint.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819639457.215744.4600093239395728232.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906939569.143852.3594314410666551982.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967148857.1823006.6332962598220464364.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/interface.c | 86 +++++++++++++++++++++++++++++++++++++
fs/cachefiles/internal.h | 35 ++++++++++++++-
fs/cachefiles/main.c | 16 +++++++
include/trace/events/cachefiles.h | 58 +++++++++++++++++++++++++
include/trace/events/fscache.h | 4 ++
5 files changed, 197 insertions(+), 2 deletions(-)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 1793e46bd3e7..68bb7b6c4945 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -13,6 +13,92 @@
#include <trace/events/fscache.h>
#include "internal.h"

+static atomic_t cachefiles_object_debug_id;
+
+/*
+ * Allocate a cache object record.
+ */
+static
+struct cachefiles_object *cachefiles_alloc_object(struct fscache_cookie *cookie)
+{
+ struct fscache_volume *vcookie = cookie->volume;
+ struct cachefiles_volume *volume = vcookie->cache_priv;
+ struct cachefiles_object *object;
+
+ _enter("{%s},%x,", vcookie->key, cookie->debug_id);
+
+ object = kmem_cache_zalloc(cachefiles_object_jar, GFP_KERNEL);
+ if (!object)
+ return NULL;
+
+ refcount_set(&object->ref, 1);
+
+ spin_lock_init(&object->lock);
+ INIT_LIST_HEAD(&object->cache_link);
+ object->volume = volume;
+ object->debug_id = atomic_inc_return(&cachefiles_object_debug_id);
+ object->cookie = fscache_get_cookie(cookie, fscache_cookie_get_attach_object);
+
+ fscache_count_object(vcookie->cache);
+ trace_cachefiles_ref(object->debug_id, cookie->debug_id, 1,
+ cachefiles_obj_new);
+ return object;
+}
+
+/*
+ * Note that an object has been seen.
+ */
+void cachefiles_see_object(struct cachefiles_object *object,
+ enum cachefiles_obj_ref_trace why)
+{
+ trace_cachefiles_ref(object->debug_id, object->cookie->debug_id,
+ refcount_read(&object->ref), why);
+}
+
+/*
+ * Increment the usage count on an object;
+ */
+struct cachefiles_object *cachefiles_grab_object(struct cachefiles_object *object,
+ enum cachefiles_obj_ref_trace why)
+{
+ int r;
+
+ __refcount_inc(&object->ref, &r);
+ trace_cachefiles_ref(object->debug_id, object->cookie->debug_id, r, why);
+ return object;
+}
+
+/*
+ * dispose of a reference to an object
+ */
+void cachefiles_put_object(struct cachefiles_object *object,
+ enum cachefiles_obj_ref_trace why)
+{
+ unsigned int object_debug_id = object->debug_id;
+ unsigned int cookie_debug_id = object->cookie->debug_id;
+ struct fscache_cache *cache;
+ bool done;
+ int r;
+
+ done = __refcount_dec_and_test(&object->ref, &r);
+ trace_cachefiles_ref(object_debug_id, cookie_debug_id, r, why);
+ if (done) {
+ _debug("- kill object OBJ%x", object_debug_id);
+
+ ASSERTCMP(object->file, ==, NULL);
+
+ kfree(object->d_name);
+
+ cache = object->volume->cache->cache;
+ fscache_put_cookie(object->cookie, fscache_cookie_put_object);
+ object->cookie = NULL;
+ kmem_cache_free(cachefiles_object_jar, object);
+ fscache_uncount_object(cache);
+ }
+
+ _leave("");
+}
+
const struct fscache_cache_ops cachefiles_cache_ops = {
.name = "cachefiles",
.acquire_volume = cachefiles_acquire_volume,
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index ab0e9307be7b..8763ee4a0df2 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -19,6 +19,16 @@
struct cachefiles_cache;
struct cachefiles_object;

+enum cachefiles_content {
+ /* These values are saved on disk */
+ CACHEFILES_CONTENT_NO_DATA = 0, /* No content stored */
+ CACHEFILES_CONTENT_SINGLE = 1, /* Content is monolithic, all is present */
+ CACHEFILES_CONTENT_ALL = 2, /* Content is all present, no map */
+ CACHEFILES_CONTENT_BACKFS_MAP = 3, /* Content is piecemeal, mapped through backing fs */
+ CACHEFILES_CONTENT_DIRTY = 4, /* Content is dirty (only seen on disk) */
+ nr__cachefiles_content
+};
+
/*
* Cached volume representation.
*/
@@ -31,10 +41,20 @@ struct cachefiles_volume {
};

/*
- * Data file records.
+ * Backing file state.
*/
struct cachefiles_object {
- int debug_id; /* debugging ID */
+ struct fscache_cookie *cookie; /* Netfs data storage object cookie */
+ struct cachefiles_volume *volume; /* Cache volume that holds this object */
+ struct list_head cache_link; /* Link in cache->*_list */
+ struct file *file; /* The file representing this object */
+ char *d_name; /* Backing file name */
+ int debug_id;
+ spinlock_t lock;
+ refcount_t ref;
+ u8 d_name_len; /* Length of filename */
+ enum cachefiles_content content_info:8; /* Info about content presence */
+ unsigned long flags;
};

/*
@@ -146,6 +166,17 @@ static inline int cachefiles_inject_remove_error(void)
* interface.c
*/
extern const struct fscache_cache_ops cachefiles_cache_ops;
+extern void cachefiles_see_object(struct cachefiles_object *object,
+ enum cachefiles_obj_ref_trace why);
+extern struct cachefiles_object *cachefiles_grab_object(struct cachefiles_object *object,
+ enum cachefiles_obj_ref_trace why);
+extern void cachefiles_put_object(struct cachefiles_object *object,
+ enum cachefiles_obj_ref_trace why);
+
+/*
+ * main.c
+ */
+extern struct kmem_cache *cachefiles_object_jar;

/*
* namei.c
diff --git a/fs/cachefiles/main.c b/fs/cachefiles/main.c
index 533e3067d80f..3f369c6f816d 100644
--- a/fs/cachefiles/main.c
+++ b/fs/cachefiles/main.c
@@ -31,6 +31,8 @@ MODULE_DESCRIPTION("Mounted-filesystem based cache");
MODULE_AUTHOR("Red Hat, Inc.");
MODULE_LICENSE("GPL");

+struct kmem_cache *cachefiles_object_jar;
+
static struct miscdevice cachefiles_dev = {
.minor = MISC_DYNAMIC_MINOR,
.name = "cachefiles",
@@ -51,9 +53,22 @@ static int __init cachefiles_init(void)
if (ret < 0)
goto error_dev;

+ /* create an object jar */
+ ret = -ENOMEM;
+ cachefiles_object_jar =
+ kmem_cache_create("cachefiles_object_jar",
+ sizeof(struct cachefiles_object),
+ 0, SLAB_HWCACHE_ALIGN, NULL);
+ if (!cachefiles_object_jar) {
+ pr_notice("Failed to allocate an object jar\n");
+ goto error_object_jar;
+ }
+
pr_info("Loaded\n");
return 0;

+error_object_jar:
+ misc_deregister(&cachefiles_dev);
error_dev:
cachefiles_unregister_error_injection();
error_einj:
@@ -70,6 +85,7 @@ static void __exit cachefiles_exit(void)
{
pr_info("Unloading\n");

+ kmem_cache_destroy(cachefiles_object_jar);
misc_deregister(&cachefiles_dev);
cachefiles_unregister_error_injection();
}
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index 5975ea4977b2..54815cc776ba 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -18,6 +18,21 @@
#ifndef __CACHEFILES_DECLARE_TRACE_ENUMS_ONCE_ONLY
#define __CACHEFILES_DECLARE_TRACE_ENUMS_ONCE_ONLY

+enum cachefiles_obj_ref_trace {
+ cachefiles_obj_get_ioreq,
+ cachefiles_obj_new,
+ cachefiles_obj_put_alloc_fail,
+ cachefiles_obj_put_detach,
+ cachefiles_obj_put_ioreq,
+ cachefiles_obj_see_clean_commit,
+ cachefiles_obj_see_clean_delete,
+ cachefiles_obj_see_clean_drop_tmp,
+ cachefiles_obj_see_lookup_cookie,
+ cachefiles_obj_see_lookup_failed,
+ cachefiles_obj_see_withdraw_cookie,
+ cachefiles_obj_see_withdrawal,
+};
+
enum fscache_why_object_killed {
FSCACHE_OBJECT_IS_STALE,
FSCACHE_OBJECT_IS_WEIRD,
@@ -66,6 +81,20 @@ enum cachefiles_error_trace {
EM(FSCACHE_OBJECT_WAS_RETIRED, "was_retired") \
E_(FSCACHE_OBJECT_WAS_CULLED, "was_culled")

+#define cachefiles_obj_ref_traces \
+ EM(cachefiles_obj_get_ioreq, "GET ioreq") \
+ EM(cachefiles_obj_new, "NEW obj") \
+ EM(cachefiles_obj_put_alloc_fail, "PUT alloc_fail") \
+ EM(cachefiles_obj_put_detach, "PUT detach") \
+ EM(cachefiles_obj_put_ioreq, "PUT ioreq") \
+ EM(cachefiles_obj_see_clean_commit, "SEE clean_commit") \
+ EM(cachefiles_obj_see_clean_delete, "SEE clean_delete") \
+ EM(cachefiles_obj_see_clean_drop_tmp, "SEE clean_drop_tmp") \
+ EM(cachefiles_obj_see_lookup_cookie, "SEE lookup_cookie") \
+ EM(cachefiles_obj_see_lookup_failed, "SEE lookup_failed") \
+ EM(cachefiles_obj_see_withdraw_cookie, "SEE withdraw_cookie") \
+ E_(cachefiles_obj_see_withdrawal, "SEE withdrawal")
+
#define cachefiles_trunc_traces \
EM(cachefiles_trunc_dio_adjust, "DIOADJ") \
EM(cachefiles_trunc_expand_tmpfile, "EXPTMP") \
@@ -100,6 +129,7 @@ enum cachefiles_error_trace {
#define E_(a, b) TRACE_DEFINE_ENUM(a);

cachefiles_obj_kill_traces;
+cachefiles_obj_ref_traces;
cachefiles_trunc_traces;
cachefiles_error_traces;

@@ -113,6 +143,34 @@ cachefiles_error_traces;
#define E_(a, b) { a, b }


+TRACE_EVENT(cachefiles_ref,
+ TP_PROTO(unsigned int object_debug_id,
+ unsigned int cookie_debug_id,
+ int usage,
+ enum cachefiles_obj_ref_trace why),
+
+ TP_ARGS(object_debug_id, cookie_debug_id, usage, why),
+
+ /* Note that obj may be NULL */
+ TP_STRUCT__entry(
+ __field(unsigned int, obj )
+ __field(unsigned int, cookie )
+ __field(enum cachefiles_obj_ref_trace, why )
+ __field(int, usage )
+ ),
+
+ TP_fast_assign(
+ __entry->obj = object_debug_id;
+ __entry->cookie = cookie_debug_id;
+ __entry->usage = usage;
+ __entry->why = why;
+ ),
+
+ TP_printk("c=%08x o=%08x u=%d %s",
+ __entry->cookie, __entry->obj, __entry->usage,
+ __print_symbolic(__entry->why, cachefiles_obj_ref_traces))
+ );
+
TRACE_EVENT(cachefiles_lookup,
TP_PROTO(struct cachefiles_object *obj,
struct dentry *de),
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index 5fa37a8b4ec7..d9d830296ec3 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -49,6 +49,7 @@ enum fscache_volume_trace {
enum fscache_cookie_trace {
fscache_cookie_collision,
fscache_cookie_discard,
+ fscache_cookie_get_attach_object,
fscache_cookie_get_end_access,
fscache_cookie_get_hash_collision,
fscache_cookie_get_inval_work,
@@ -57,6 +58,7 @@ enum fscache_cookie_trace {
fscache_cookie_new_acquire,
fscache_cookie_put_hash_collision,
fscache_cookie_put_lru,
+ fscache_cookie_put_object,
fscache_cookie_put_over_queued,
fscache_cookie_put_relinquish,
fscache_cookie_put_withdrawn,
@@ -122,6 +124,7 @@ enum fscache_access_trace {
#define fscache_cookie_traces \
EM(fscache_cookie_collision, "*COLLIDE*") \
EM(fscache_cookie_discard, "DISCARD ") \
+ EM(fscache_cookie_get_attach_object, "GET attch") \
EM(fscache_cookie_get_hash_collision, "GET hcoll") \
EM(fscache_cookie_get_end_access, "GQ endac") \
EM(fscache_cookie_get_inval_work, "GQ inval") \
@@ -130,6 +133,7 @@ enum fscache_access_trace {
EM(fscache_cookie_new_acquire, "NEW acq ") \
EM(fscache_cookie_put_hash_collision, "PUT hcoll") \
EM(fscache_cookie_put_lru, "PUT lru ") \
+ EM(fscache_cookie_put_object, "PUT obj ") \
EM(fscache_cookie_put_over_queued, "PQ overq") \
EM(fscache_cookie_put_relinquish, "PUT relnq") \
EM(fscache_cookie_put_withdrawn, "PUT wthdn") \



2021-12-22 23:25:17

by David Howells

[permalink] [raw]
Subject: [PATCH v4 44/68] cachefiles: Implement key to filename encoding

Implement a function to encode a binary cookie key as something that can be
used as a filename. Four options are considered:

(1) All printable chars with no '/' characters. Prepend a 'D' to indicate
the encoding but otherwise use as-is.

(2) Appears to be an array of __be32. Encode as 'S' plus a list of
hex-encoded 32-bit ints separated by commas. If a number is 0, it is
rendered as "" instead of "0".

(3) Appears to be an array of __le32. Encoded as (2) but with a 'T'
encoding prefix.

(4) Encoded as base64 with an 'E' prefix plus a second char indicating how
much padding is involved. A non-standard base64 encoding is used
because '/' cannot be used in the encoded form.

If (1) is not possible, whichever of (2), (3) or (4) produces the shortest
string is selected (hex-encoding a number may be less dense than base64
encoding it).

Note that the prefix characters have to be selected from the set [DEIJST@]
lest cachefilesd remove the files because it recognise the name.

Changes
=======
ver #2:
- Fix a short allocation that didn't allow for a string terminator[1]

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ [1]
Link: https://lore.kernel.org/r/163819640393.215744.15212364106412961104.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906940529.143852.17352132319136117053.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967149827.1823006.6088580775428487961.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/Makefile | 1
fs/cachefiles/internal.h | 5 ++
fs/cachefiles/key.c | 138 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 144 insertions(+)
create mode 100644 fs/cachefiles/key.c

diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index d67210ece9cd..6f025940a65c 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -7,6 +7,7 @@ cachefiles-y := \
cache.o \
daemon.o \
interface.o \
+ key.o \
main.o \
namei.o \
security.o \
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 8763ee4a0df2..dbc37f5d4714 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -173,6 +173,11 @@ extern struct cachefiles_object *cachefiles_grab_object(struct cachefiles_object
extern void cachefiles_put_object(struct cachefiles_object *object,
enum cachefiles_obj_ref_trace why);

+/*
+ * key.c
+ */
+extern bool cachefiles_cook_key(struct cachefiles_object *object);
+
/*
* main.c
*/
diff --git a/fs/cachefiles/key.c b/fs/cachefiles/key.c
new file mode 100644
index 000000000000..bf935e25bdbe
--- /dev/null
+++ b/fs/cachefiles/key.c
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Key to pathname encoder
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include <linux/slab.h>
+#include "internal.h"
+
+static const char cachefiles_charmap[64] =
+ "0123456789" /* 0 - 9 */
+ "abcdefghijklmnopqrstuvwxyz" /* 10 - 35 */
+ "ABCDEFGHIJKLMNOPQRSTUVWXYZ" /* 36 - 61 */
+ "_-" /* 62 - 63 */
+ ;
+
+static const char cachefiles_filecharmap[256] = {
+ /* we skip space and tab and control chars */
+ [33 ... 46] = 1, /* '!' -> '.' */
+ /* we skip '/' as it's significant to pathwalk */
+ [48 ... 127] = 1, /* '0' -> '~' */
+};
+
+static inline unsigned int how_many_hex_digits(unsigned int x)
+{
+ return x ? round_up(ilog2(x) + 1, 4) / 4 : 0;
+}
+
+/*
+ * turn the raw key into something cooked
+ * - the key may be up to NAME_MAX in length (including the length word)
+ * - "base64" encode the strange keys, mapping 3 bytes of raw to four of
+ * cooked
+ * - need to cut the cooked key into 252 char lengths (189 raw bytes)
+ */
+bool cachefiles_cook_key(struct cachefiles_object *object)
+{
+ const u8 *key = fscache_get_key(object->cookie), *kend;
+ unsigned char ch;
+ unsigned int acc, i, n, nle, nbe, keylen = object->cookie->key_len;
+ unsigned int b64len, len, print, pad;
+ char *name, sep;
+
+ _enter(",%u,%*phN", keylen, keylen, key);
+
+ BUG_ON(keylen > NAME_MAX - 3);
+
+ print = 1;
+ for (i = 0; i < keylen; i++) {
+ ch = key[i];
+ print &= cachefiles_filecharmap[ch];
+ }
+
+ /* If the path is usable ASCII, then we render it directly */
+ if (print) {
+ len = 1 + keylen;
+ name = kmalloc(len + 1, GFP_KERNEL);
+ if (!name)
+ return false;
+
+ name[0] = 'D'; /* Data object type, string encoding */
+ memcpy(name + 1, key, keylen);
+ goto success;
+ }
+
+ /* See if it makes sense to encode it as "hex,hex,hex" for each 32-bit
+ * chunk. We rely on the key having been padded out to a whole number
+ * of 32-bit words.
+ */
+ n = round_up(keylen, 4);
+ nbe = nle = 0;
+ for (i = 0; i < n; i += 4) {
+ u32 be = be32_to_cpu(*(__be32 *)(key + i));
+ u32 le = le32_to_cpu(*(__le32 *)(key + i));
+
+ nbe += 1 + how_many_hex_digits(be);
+ nle += 1 + how_many_hex_digits(le);
+ }
+
+ b64len = DIV_ROUND_UP(keylen, 3);
+ pad = b64len * 3 - keylen;
+ b64len = 2 + b64len * 4; /* Length if we base64-encode it */
+ _debug("len=%u nbe=%u nle=%u b64=%u", keylen, nbe, nle, b64len);
+ if (nbe < b64len || nle < b64len) {
+ unsigned int nlen = min(nbe, nle) + 1;
+ name = kmalloc(nlen, GFP_KERNEL);
+ if (!name)
+ return false;
+ sep = (nbe <= nle) ? 'S' : 'T'; /* Encoding indicator */
+ len = 0;
+ for (i = 0; i < n; i += 4) {
+ u32 x;
+ if (nbe <= nle)
+ x = be32_to_cpu(*(__be32 *)(key + i));
+ else
+ x = le32_to_cpu(*(__le32 *)(key + i));
+ name[len++] = sep;
+ if (x != 0)
+ len += snprintf(name + len, nlen - len, "%x", x);
+ sep = ',';
+ }
+ goto success;
+ }
+
+ /* We need to base64-encode it */
+ name = kmalloc(b64len + 1, GFP_KERNEL);
+ if (!name)
+ return false;
+
+ name[0] = 'E';
+ name[1] = '0' + pad;
+ len = 2;
+ kend = key + keylen;
+ do {
+ acc = *key++;
+ if (key < kend) {
+ acc |= *key++ << 8;
+ if (key < kend)
+ acc |= *key++ << 16;
+ }
+
+ name[len++] = cachefiles_charmap[acc & 63];
+ acc >>= 6;
+ name[len++] = cachefiles_charmap[acc & 63];
+ acc >>= 6;
+ name[len++] = cachefiles_charmap[acc & 63];
+ acc >>= 6;
+ name[len++] = cachefiles_charmap[acc & 63];
+ } while (key < kend);
+
+success:
+ name[len] = 0;
+ object->d_name = name;
+ object->d_name_len = len;
+ _leave(" = %s", object->d_name);
+ return true;
+}



2021-12-22 23:25:29

by David Howells

[permalink] [raw]
Subject: [PATCH v4 45/68] cachefiles: Implement metadata/coherency data storage in xattrs

Use an xattr on each backing file in the cache to store some metadata, such
as the content type and the coherency data.

Five content types are defined:

(0) No content stored.

(1) The file contains a single monolithic blob and must be all or nothing.
This would be used for something like an AFS directory or a symlink.

(2) The file is populated with content completely up to a point with
nothing beyond that.

(3) The file has a map attached and is sparsely populated. This would be
stored in one or more additional xattrs.

(4) The file is dirty, being in the process of local modification and the
contents are not necessarily represented correctly by the metadata.
The file should be deleted if this is seen on binding.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819641320.215744.16346770087799536862.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906942248.143852.5423738045012094252.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967151734.1823006.9301249989443622576.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/Makefile | 3 -
fs/cachefiles/internal.h | 21 ++++
fs/cachefiles/xattr.c | 181 +++++++++++++++++++++++++++++++++++++
include/trace/events/cachefiles.h | 56 +++++++++++
4 files changed, 260 insertions(+), 1 deletion(-)
create mode 100644 fs/cachefiles/xattr.c

diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index 6f025940a65c..cb7a6bcf51eb 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -11,7 +11,8 @@ cachefiles-y := \
main.o \
namei.o \
security.o \
- volume.o
+ volume.o \
+ xattr.o

cachefiles-$(CONFIG_CACHEFILES_ERROR_INJECTION) += error_inject.o

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index dbc37f5d4714..01071e7a7c02 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -55,6 +55,7 @@ struct cachefiles_object {
u8 d_name_len; /* Length of filename */
enum cachefiles_content content_info:8; /* Info about content presence */
unsigned long flags;
+#define CACHEFILES_OBJECT_USING_TMPFILE 0 /* Have an unlinked tmpfile */
};

/*
@@ -219,6 +220,17 @@ void cachefiles_acquire_volume(struct fscache_volume *volume);
void cachefiles_free_volume(struct fscache_volume *volume);
void cachefiles_withdraw_volume(struct cachefiles_volume *volume);

+/*
+ * xattr.c
+ */
+extern int cachefiles_set_object_xattr(struct cachefiles_object *object);
+extern int cachefiles_check_auxdata(struct cachefiles_object *object,
+ struct file *file);
+extern int cachefiles_remove_object_xattr(struct cachefiles_cache *cache,
+ struct cachefiles_object *object,
+ struct dentry *dentry);
+extern void cachefiles_prepare_to_write(struct fscache_cookie *cookie);
+
/*
* Error handling
*/
@@ -229,6 +241,15 @@ do { \
set_bit(CACHEFILES_DEAD, &(___cache)->flags); \
} while (0)

+#define cachefiles_io_error_obj(object, FMT, ...) \
+do { \
+ struct cachefiles_cache *___cache; \
+ \
+ ___cache = (object)->volume->cache; \
+ cachefiles_io_error(___cache, FMT " [o=%08x]", ##__VA_ARGS__, \
+ (object)->debug_id); \
+} while (0)
+

/*
* Debug tracing
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
new file mode 100644
index 000000000000..0601c46a22ef
--- /dev/null
+++ b/fs/cachefiles/xattr.c
@@ -0,0 +1,181 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* CacheFiles extended attribute management
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/fsnotify.h>
+#include <linux/quotaops.h>
+#include <linux/xattr.h>
+#include <linux/slab.h>
+#include "internal.h"
+
+#define CACHEFILES_COOKIE_TYPE_DATA 1
+
+struct cachefiles_xattr {
+ __be64 object_size; /* Actual size of the object */
+ __be64 zero_point; /* Size after which server has no data not written by us */
+ __u8 type; /* Type of object */
+ __u8 content; /* Content presence (enum cachefiles_content) */
+ __u8 data[]; /* netfs coherency data */
+} __packed;
+
+static const char cachefiles_xattr_cache[] =
+ XATTR_USER_PREFIX "CacheFiles.cache";
+
+/*
+ * set the state xattr on a cache file
+ */
+int cachefiles_set_object_xattr(struct cachefiles_object *object)
+{
+ struct cachefiles_xattr *buf;
+ struct dentry *dentry;
+ struct file *file = object->file;
+ unsigned int len = object->cookie->aux_len;
+ int ret;
+
+ if (!file)
+ return -ESTALE;
+ dentry = file->f_path.dentry;
+
+ _enter("%x,#%d", object->debug_id, len);
+
+ buf = kmalloc(sizeof(struct cachefiles_xattr) + len, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ buf->object_size = cpu_to_be64(object->cookie->object_size);
+ buf->zero_point = 0;
+ buf->type = CACHEFILES_COOKIE_TYPE_DATA;
+ buf->content = object->content_info;
+ if (test_bit(FSCACHE_COOKIE_LOCAL_WRITE, &object->cookie->flags))
+ buf->content = CACHEFILES_CONTENT_DIRTY;
+ if (len > 0)
+ memcpy(buf->data, fscache_get_aux(object->cookie), len);
+
+ ret = cachefiles_inject_write_error();
+ if (ret == 0)
+ ret = vfs_setxattr(&init_user_ns, dentry, cachefiles_xattr_cache,
+ buf, sizeof(struct cachefiles_xattr) + len, 0);
+ if (ret < 0) {
+ trace_cachefiles_vfs_error(object, file_inode(file), ret,
+ cachefiles_trace_setxattr_error);
+ trace_cachefiles_coherency(object, file_inode(file)->i_ino,
+ buf->content,
+ cachefiles_coherency_set_fail);
+ if (ret != -ENOMEM)
+ cachefiles_io_error_obj(
+ object,
+ "Failed to set xattr with error %d", ret);
+ } else {
+ trace_cachefiles_coherency(object, file_inode(file)->i_ino,
+ buf->content,
+ cachefiles_coherency_set_ok);
+ }
+
+ kfree(buf);
+ _leave(" = %d", ret);
+ return ret;
+}
+
+/*
+ * check the consistency between the backing cache and the FS-Cache cookie
+ */
+int cachefiles_check_auxdata(struct cachefiles_object *object, struct file *file)
+{
+ struct cachefiles_xattr *buf;
+ struct dentry *dentry = file->f_path.dentry;
+ unsigned int len = object->cookie->aux_len, tlen;
+ const void *p = fscache_get_aux(object->cookie);
+ enum cachefiles_coherency_trace why;
+ ssize_t xlen;
+ int ret = -ESTALE;
+
+ tlen = sizeof(struct cachefiles_xattr) + len;
+ buf = kmalloc(tlen, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ xlen = cachefiles_inject_read_error();
+ if (xlen == 0)
+ xlen = vfs_getxattr(&init_user_ns, dentry, cachefiles_xattr_cache, buf, tlen);
+ if (xlen != tlen) {
+ if (xlen < 0)
+ trace_cachefiles_vfs_error(object, file_inode(file), xlen,
+ cachefiles_trace_getxattr_error);
+ if (xlen == -EIO)
+ cachefiles_io_error_obj(
+ object,
+ "Failed to read aux with error %zd", xlen);
+ why = cachefiles_coherency_check_xattr;
+ } else if (buf->type != CACHEFILES_COOKIE_TYPE_DATA) {
+ why = cachefiles_coherency_check_type;
+ } else if (memcmp(buf->data, p, len) != 0) {
+ why = cachefiles_coherency_check_aux;
+ } else if (be64_to_cpu(buf->object_size) != object->cookie->object_size) {
+ why = cachefiles_coherency_check_objsize;
+ } else if (buf->content == CACHEFILES_CONTENT_DIRTY) {
+ // TODO: Begin conflict resolution
+ pr_warn("Dirty object in cache\n");
+ why = cachefiles_coherency_check_dirty;
+ } else {
+ why = cachefiles_coherency_check_ok;
+ ret = 0;
+ }
+
+ trace_cachefiles_coherency(object, file_inode(file)->i_ino,
+ buf->content, why);
+ kfree(buf);
+ return ret;
+}
+
+/*
+ * remove the object's xattr to mark it stale
+ */
+int cachefiles_remove_object_xattr(struct cachefiles_cache *cache,
+ struct cachefiles_object *object,
+ struct dentry *dentry)
+{
+ int ret;
+
+ ret = cachefiles_inject_remove_error();
+ if (ret == 0)
+ ret = vfs_removexattr(&init_user_ns, dentry, cachefiles_xattr_cache);
+ if (ret < 0) {
+ trace_cachefiles_vfs_error(object, d_inode(dentry), ret,
+ cachefiles_trace_remxattr_error);
+ if (ret == -ENOENT || ret == -ENODATA)
+ ret = 0;
+ else if (ret != -ENOMEM)
+ cachefiles_io_error(cache,
+ "Can't remove xattr from %lu"
+ " (error %d)",
+ d_backing_inode(dentry)->i_ino, -ret);
+ }
+
+ _leave(" = %d", ret);
+ return ret;
+}
+
+/*
+ * Stick a marker on the cache object to indicate that it's dirty.
+ */
+void cachefiles_prepare_to_write(struct fscache_cookie *cookie)
+{
+ const struct cred *saved_cred;
+ struct cachefiles_object *object = cookie->cache_priv;
+ struct cachefiles_cache *cache = object->volume->cache;
+
+ _enter("c=%08x", object->cookie->debug_id);
+
+ if (!test_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags)) {
+ cachefiles_begin_secure(cache, &saved_cred);
+ cachefiles_set_object_xattr(object);
+ cachefiles_end_secure(cache, saved_cred);
+ }
+}
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index 54815cc776ba..98b1eee4a7a8 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -42,6 +42,19 @@ enum fscache_why_object_killed {
FSCACHE_OBJECT_WAS_CULLED,
};

+enum cachefiles_coherency_trace {
+ cachefiles_coherency_check_aux,
+ cachefiles_coherency_check_content,
+ cachefiles_coherency_check_dirty,
+ cachefiles_coherency_check_len,
+ cachefiles_coherency_check_objsize,
+ cachefiles_coherency_check_ok,
+ cachefiles_coherency_check_type,
+ cachefiles_coherency_check_xattr,
+ cachefiles_coherency_set_fail,
+ cachefiles_coherency_set_ok,
+};
+
enum cachefiles_trunc_trace {
cachefiles_trunc_dio_adjust,
cachefiles_trunc_expand_tmpfile,
@@ -95,6 +108,18 @@ enum cachefiles_error_trace {
EM(cachefiles_obj_see_withdraw_cookie, "SEE withdraw_cookie") \
E_(cachefiles_obj_see_withdrawal, "SEE withdrawal")

+#define cachefiles_coherency_traces \
+ EM(cachefiles_coherency_check_aux, "BAD aux ") \
+ EM(cachefiles_coherency_check_content, "BAD cont") \
+ EM(cachefiles_coherency_check_dirty, "BAD dirt") \
+ EM(cachefiles_coherency_check_len, "BAD len ") \
+ EM(cachefiles_coherency_check_objsize, "BAD osiz") \
+ EM(cachefiles_coherency_check_ok, "OK ") \
+ EM(cachefiles_coherency_check_type, "BAD type") \
+ EM(cachefiles_coherency_check_xattr, "BAD xatt") \
+ EM(cachefiles_coherency_set_fail, "SET fail") \
+ E_(cachefiles_coherency_set_ok, "SET ok ")
+
#define cachefiles_trunc_traces \
EM(cachefiles_trunc_dio_adjust, "DIOADJ") \
EM(cachefiles_trunc_expand_tmpfile, "EXPTMP") \
@@ -130,6 +155,7 @@ enum cachefiles_error_trace {

cachefiles_obj_kill_traces;
cachefiles_obj_ref_traces;
+cachefiles_coherency_traces;
cachefiles_trunc_traces;
cachefiles_error_traces;

@@ -287,6 +313,36 @@ TRACE_EVENT(cachefiles_rename,
__print_symbolic(__entry->why, cachefiles_obj_kill_traces))
);

+TRACE_EVENT(cachefiles_coherency,
+ TP_PROTO(struct cachefiles_object *obj,
+ ino_t ino,
+ enum cachefiles_content content,
+ enum cachefiles_coherency_trace why),
+
+ TP_ARGS(obj, ino, content, why),
+
+ /* Note that obj may be NULL */
+ TP_STRUCT__entry(
+ __field(unsigned int, obj )
+ __field(enum cachefiles_coherency_trace, why )
+ __field(enum cachefiles_content, content )
+ __field(u64, ino )
+ ),
+
+ TP_fast_assign(
+ __entry->obj = obj->debug_id;
+ __entry->why = why;
+ __entry->content = content;
+ __entry->ino = ino;
+ ),
+
+ TP_printk("o=%08x %s i=%llx c=%u",
+ __entry->obj,
+ __print_symbolic(__entry->why, cachefiles_coherency_traces),
+ __entry->ino,
+ __entry->content)
+ );
+
TRACE_EVENT(cachefiles_trunc,
TP_PROTO(struct cachefiles_object *obj, struct inode *backer,
loff_t from, loff_t to, enum cachefiles_trunc_trace why),



2021-12-22 23:25:52

by David Howells

[permalink] [raw]
Subject: [PATCH v4 46/68] cachefiles: Mark a backing file in use with an inode flag

Use an inode flag, S_KERNEL_FILE, to mark that a backing file is in use by
the kernel to prevent cachefiles or other kernel services from interfering
with that file.

Using S_SWAPFILE instead isn't really viable as that has other effects in
the I/O paths.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819642273.215744.6414248677118690672.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906943215.143852.16972351425323967014.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967154118.1823006.13227551961786743991.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/internal.h | 2 ++
fs/cachefiles/namei.c | 35 +++++++++++++++++++++++++++++++++++
2 files changed, 37 insertions(+)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 01071e7a7c02..7c67a70a3dff 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -187,6 +187,8 @@ extern struct kmem_cache *cachefiles_object_jar;
/*
* namei.c
*/
+extern void cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
+ struct file *file);
extern struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
struct dentry *dir,
const char *name,
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 11a33209ab5f..db60a671c3fc 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -31,6 +31,18 @@ static bool __cachefiles_mark_inode_in_use(struct cachefiles_object *object,
return can_use;
}

+static bool cachefiles_mark_inode_in_use(struct cachefiles_object *object,
+ struct dentry *dentry)
+{
+ struct inode *inode = d_backing_inode(dentry);
+ bool can_use;
+
+ inode_lock(inode);
+ can_use = __cachefiles_mark_inode_in_use(object, dentry);
+ inode_unlock(inode);
+ return can_use;
+}
+
/*
* Unmark a backing inode. The caller must hold the inode lock.
*/
@@ -43,6 +55,29 @@ static void __cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
trace_cachefiles_mark_inactive(object, inode);
}

+/*
+ * Unmark a backing inode and tell cachefilesd that there's something that can
+ * be culled.
+ */
+void cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
+ struct file *file)
+{
+ struct cachefiles_cache *cache = object->volume->cache;
+ struct inode *inode = file_inode(file);
+
+ if (inode) {
+ inode_lock(inode);
+ __cachefiles_unmark_inode_in_use(object, file->f_path.dentry);
+ inode_unlock(inode);
+
+ if (!test_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags)) {
+ atomic_long_add(inode->i_blocks, &cache->b_released);
+ if (atomic_inc_return(&cache->f_released))
+ cachefiles_state_changed(cache);
+ }
+ }
+}
+
/*
* get a subdirectory
*/



2021-12-22 23:26:17

by David Howells

[permalink] [raw]
Subject: [PATCH v4 47/68] cachefiles: Implement culling daemon commands

Implement the ability for the userspace daemon to try and cull a file or
directory in the cache. Two daemon commands are implemented:

(1) The "inuse" command. This queries if a file is in use or whether it
can be deleted. It checks the S_KERNEL_FILE flag on the inode
referred to by the specified filename.

(2) The "cull" command. This asks for a file or directory to be removed,
where removal means either unlinking it or moving it to the graveyard
directory for userspace to dismantle.

Changes
=======
ver #2:
- Fix logging of wrong error[1].
- Need to unmark an inode we've moved to the graveyard before unlocking.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/20211203094950.GA2480@kili/ [1]
Link: https://lore.kernel.org/r/163819643179.215744.13641580295708315695.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906945705.143852.8177595531814485350.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967155792.1823006.1088936326902550910.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/daemon.c | 4 -
fs/cachefiles/internal.h | 11 ++
fs/cachefiles/namei.c | 307 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 320 insertions(+), 2 deletions(-)

diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 337597a4e30c..985c3f3e6767 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -574,7 +574,7 @@ static int cachefiles_daemon_cull(struct cachefiles_cache *cache, char *args)
goto notdir;

cachefiles_begin_secure(cache, &saved_cred);
- ret = -ENOANO; // PLACEHOLDER: Do culling
+ ret = cachefiles_cull(cache, path.dentry, args);
cachefiles_end_secure(cache, saved_cred);

path_put(&path);
@@ -645,7 +645,7 @@ static int cachefiles_daemon_inuse(struct cachefiles_cache *cache, char *args)
goto notdir;

cachefiles_begin_secure(cache, &saved_cred);
- ret = -ENOANO; // PLACEHOLDER: Check if in use
+ ret = cachefiles_check_in_use(cache, path.dentry, args);
cachefiles_end_secure(cache, saved_cred);

path_put(&path);
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 7c67a70a3dff..654dbd51b965 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -189,12 +189,23 @@ extern struct kmem_cache *cachefiles_object_jar;
*/
extern void cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
struct file *file);
+extern int cachefiles_bury_object(struct cachefiles_cache *cache,
+ struct cachefiles_object *object,
+ struct dentry *dir,
+ struct dentry *rep,
+ enum fscache_why_object_killed why);
extern struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
struct dentry *dir,
const char *name,
bool *_is_new);
extern void cachefiles_put_directory(struct dentry *dir);

+extern int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
+ char *filename);
+
+extern int cachefiles_check_in_use(struct cachefiles_cache *cache,
+ struct dentry *dir, char *filename);
+
/*
* security.c
*/
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index db60a671c3fc..e87c401239f1 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -217,3 +217,310 @@ void cachefiles_put_directory(struct dentry *dir)
dput(dir);
}
}
+
+/*
+ * Remove a regular file from the cache.
+ */
+static int cachefiles_unlink(struct cachefiles_cache *cache,
+ struct cachefiles_object *object,
+ struct dentry *dir, struct dentry *dentry,
+ enum fscache_why_object_killed why)
+{
+ struct path path = {
+ .mnt = cache->mnt,
+ .dentry = dir,
+ };
+ int ret;
+
+ trace_cachefiles_unlink(object, dentry, why);
+ ret = security_path_unlink(&path, dentry);
+ if (ret < 0) {
+ cachefiles_io_error(cache, "Unlink security error");
+ return ret;
+ }
+
+ ret = cachefiles_inject_remove_error();
+ if (ret == 0) {
+ ret = vfs_unlink(&init_user_ns, d_backing_inode(dir), dentry, NULL);
+ if (ret == -EIO)
+ cachefiles_io_error(cache, "Unlink failed");
+ }
+ if (ret != 0)
+ trace_cachefiles_vfs_error(object, d_backing_inode(dir), ret,
+ cachefiles_trace_unlink_error);
+ return ret;
+}
+
+/*
+ * Delete an object representation from the cache
+ * - File backed objects are unlinked
+ * - Directory backed objects are stuffed into the graveyard for userspace to
+ * delete
+ */
+int cachefiles_bury_object(struct cachefiles_cache *cache,
+ struct cachefiles_object *object,
+ struct dentry *dir,
+ struct dentry *rep,
+ enum fscache_why_object_killed why)
+{
+ struct dentry *grave, *trap;
+ struct path path, path_to_graveyard;
+ char nbuffer[8 + 8 + 1];
+ int ret;
+
+ _enter(",'%pd','%pd'", dir, rep);
+
+ if (rep->d_parent != dir) {
+ inode_unlock(d_inode(dir));
+ _leave(" = -ESTALE");
+ return -ESTALE;
+ }
+
+ /* non-directories can just be unlinked */
+ if (!d_is_dir(rep)) {
+ dget(rep); /* Stop the dentry being negated if it's only pinned
+ * by a file struct.
+ */
+ ret = cachefiles_unlink(cache, object, dir, rep, why);
+ dput(rep);
+
+ inode_unlock(d_inode(dir));
+ _leave(" = %d", ret);
+ return ret;
+ }
+
+ /* directories have to be moved to the graveyard */
+ _debug("move stale object to graveyard");
+ inode_unlock(d_inode(dir));
+
+try_again:
+ /* first step is to make up a grave dentry in the graveyard */
+ sprintf(nbuffer, "%08x%08x",
+ (uint32_t) ktime_get_real_seconds(),
+ (uint32_t) atomic_inc_return(&cache->gravecounter));
+
+ /* do the multiway lock magic */
+ trap = lock_rename(cache->graveyard, dir);
+
+ /* do some checks before getting the grave dentry */
+ if (rep->d_parent != dir || IS_DEADDIR(d_inode(rep))) {
+ /* the entry was probably culled when we dropped the parent dir
+ * lock */
+ unlock_rename(cache->graveyard, dir);
+ _leave(" = 0 [culled?]");
+ return 0;
+ }
+
+ if (!d_can_lookup(cache->graveyard)) {
+ unlock_rename(cache->graveyard, dir);
+ cachefiles_io_error(cache, "Graveyard no longer a directory");
+ return -EIO;
+ }
+
+ if (trap == rep) {
+ unlock_rename(cache->graveyard, dir);
+ cachefiles_io_error(cache, "May not make directory loop");
+ return -EIO;
+ }
+
+ if (d_mountpoint(rep)) {
+ unlock_rename(cache->graveyard, dir);
+ cachefiles_io_error(cache, "Mountpoint in cache");
+ return -EIO;
+ }
+
+ grave = lookup_one_len(nbuffer, cache->graveyard, strlen(nbuffer));
+ if (IS_ERR(grave)) {
+ unlock_rename(cache->graveyard, dir);
+ trace_cachefiles_vfs_error(object, d_inode(cache->graveyard),
+ PTR_ERR(grave),
+ cachefiles_trace_lookup_error);
+
+ if (PTR_ERR(grave) == -ENOMEM) {
+ _leave(" = -ENOMEM");
+ return -ENOMEM;
+ }
+
+ cachefiles_io_error(cache, "Lookup error %ld", PTR_ERR(grave));
+ return -EIO;
+ }
+
+ if (d_is_positive(grave)) {
+ unlock_rename(cache->graveyard, dir);
+ dput(grave);
+ grave = NULL;
+ cond_resched();
+ goto try_again;
+ }
+
+ if (d_mountpoint(grave)) {
+ unlock_rename(cache->graveyard, dir);
+ dput(grave);
+ cachefiles_io_error(cache, "Mountpoint in graveyard");
+ return -EIO;
+ }
+
+ /* target should not be an ancestor of source */
+ if (trap == grave) {
+ unlock_rename(cache->graveyard, dir);
+ dput(grave);
+ cachefiles_io_error(cache, "May not make directory loop");
+ return -EIO;
+ }
+
+ /* attempt the rename */
+ path.mnt = cache->mnt;
+ path.dentry = dir;
+ path_to_graveyard.mnt = cache->mnt;
+ path_to_graveyard.dentry = cache->graveyard;
+ ret = security_path_rename(&path, rep, &path_to_graveyard, grave, 0);
+ if (ret < 0) {
+ cachefiles_io_error(cache, "Rename security error %d", ret);
+ } else {
+ struct renamedata rd = {
+ .old_mnt_userns = &init_user_ns,
+ .old_dir = d_inode(dir),
+ .old_dentry = rep,
+ .new_mnt_userns = &init_user_ns,
+ .new_dir = d_inode(cache->graveyard),
+ .new_dentry = grave,
+ };
+ trace_cachefiles_rename(object, rep, grave, why);
+ ret = cachefiles_inject_read_error();
+ if (ret == 0)
+ ret = vfs_rename(&rd);
+ if (ret != 0)
+ trace_cachefiles_vfs_error(object, d_inode(dir), ret,
+ cachefiles_trace_rename_error);
+ if (ret != 0 && ret != -ENOMEM)
+ cachefiles_io_error(cache,
+ "Rename failed with error %d", ret);
+ }
+
+ __cachefiles_unmark_inode_in_use(object, rep);
+ unlock_rename(cache->graveyard, dir);
+ dput(grave);
+ _leave(" = 0");
+ return 0;
+}
+
+/*
+ * Look up an inode to be checked or culled. Return -EBUSY if the inode is
+ * marked in use.
+ */
+static struct dentry *cachefiles_lookup_for_cull(struct cachefiles_cache *cache,
+ struct dentry *dir,
+ char *filename)
+{
+ struct dentry *victim;
+ int ret = -ENOENT;
+
+ inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
+
+ victim = lookup_one_len(filename, dir, strlen(filename));
+ if (IS_ERR(victim))
+ goto lookup_error;
+ if (d_is_negative(victim))
+ goto lookup_put;
+ if (d_inode(victim)->i_flags & S_KERNEL_FILE)
+ goto lookup_busy;
+ return victim;
+
+lookup_busy:
+ ret = -EBUSY;
+lookup_put:
+ inode_unlock(d_inode(dir));
+ dput(victim);
+ return ERR_PTR(ret);
+
+lookup_error:
+ inode_unlock(d_inode(dir));
+ ret = PTR_ERR(victim);
+ if (ret == -ENOENT)
+ return ERR_PTR(-ESTALE); /* Probably got retired by the netfs */
+
+ if (ret == -EIO) {
+ cachefiles_io_error(cache, "Lookup failed");
+ } else if (ret != -ENOMEM) {
+ pr_err("Internal error: %d\n", ret);
+ ret = -EIO;
+ }
+
+ return ERR_PTR(ret);
+}
+
+/*
+ * Cull an object if it's not in use
+ * - called only by cache manager daemon
+ */
+int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
+ char *filename)
+{
+ struct dentry *victim;
+ struct inode *inode;
+ int ret;
+
+ _enter(",%pd/,%s", dir, filename);
+
+ victim = cachefiles_lookup_for_cull(cache, dir, filename);
+ if (IS_ERR(victim))
+ return PTR_ERR(victim);
+
+ /* check to see if someone is using this object */
+ inode = d_inode(victim);
+ inode_lock(inode);
+ if (inode->i_flags & S_KERNEL_FILE) {
+ ret = -EBUSY;
+ } else {
+ /* Stop the cache from picking it back up */
+ inode->i_flags |= S_KERNEL_FILE;
+ ret = 0;
+ }
+ inode_unlock(inode);
+ if (ret < 0)
+ goto error_unlock;
+
+ ret = cachefiles_bury_object(cache, NULL, dir, victim,
+ FSCACHE_OBJECT_WAS_CULLED);
+ if (ret < 0)
+ goto error;
+
+ dput(victim);
+ _leave(" = 0");
+ return 0;
+
+error_unlock:
+ inode_unlock(d_inode(dir));
+error:
+ dput(victim);
+ if (ret == -ENOENT)
+ return -ESTALE; /* Probably got retired by the netfs */
+
+ if (ret != -ENOMEM) {
+ pr_err("Internal error: %d\n", ret);
+ ret = -EIO;
+ }
+
+ _leave(" = %d", ret);
+ return ret;
+}
+
+/*
+ * Find out if an object is in use or not
+ * - called only by cache manager daemon
+ * - returns -EBUSY or 0 to indicate whether an object is in use or not
+ */
+int cachefiles_check_in_use(struct cachefiles_cache *cache, struct dentry *dir,
+ char *filename)
+{
+ struct dentry *victim;
+ int ret = 0;
+
+ victim = cachefiles_lookup_for_cull(cache, dir, filename);
+ if (IS_ERR(victim))
+ return PTR_ERR(victim);
+
+ inode_unlock(d_inode(dir));
+ dput(victim);
+ return ret;
+}



2021-12-22 23:26:31

by David Howells

[permalink] [raw]
Subject: [PATCH v4 48/68] cachefiles: Implement backing file wrangling

Implement the wrangling of backing files, including the following pieces:

(1) Lookup and creation of a file on disk, using a tmpfile if the file
isn't yet present. The file is then opened, sized for DIO and the
file handle is attached to the cachefiles_object struct. The inode is
marked to indicate that it's in use by a kernel service.

(2) Invalidation of an object, creating a tmpfile and switching the file
pointer in the cachefiles object.

(3) Committing a file to disk, including setting the coherency xattr on it
and, if necessary, creating a hard link to it.

Note that this would be a good place to use Omar Sandoval's vfs_link()
with AT_LINK_REPLACE[1] as I may have to unlink an old file before I
can link a tmpfile into place.

(4) Withdrawal of open objects when a cache is being withdrawn or a cookie
is relinquished. This involves committing or discarding the file.

Changes
=======
ver #2:
- Fix logging of wrong error[1].

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/20211203094950.GA2480@kili/ [1]
Link: https://lore.kernel.org/r/163819644097.215744.4505389616742411239.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906949512.143852.14222856795032602080.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967158526.1823006.17482695321424642675.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/cache.c | 32 ++++-
fs/cachefiles/daemon.c | 1
fs/cachefiles/interface.c | 260 +++++++++++++++++++++++++++++++++++++
fs/cachefiles/internal.h | 9 +
fs/cachefiles/namei.c | 318 +++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 619 insertions(+), 1 deletion(-)

diff --git a/fs/cachefiles/cache.c b/fs/cachefiles/cache.c
index c4b9280ca0cd..e2cbbc08bad9 100644
--- a/fs/cachefiles/cache.c
+++ b/fs/cachefiles/cache.c
@@ -262,6 +262,36 @@ int cachefiles_has_space(struct cachefiles_cache *cache,
return ret;
}

+/*
+ * Mark all the objects as being out of service and queue them all for cleanup.
+ */
+static void cachefiles_withdraw_objects(struct cachefiles_cache *cache)
+{
+ struct cachefiles_object *object;
+ unsigned int count = 0;
+
+ _enter("");
+
+ spin_lock(&cache->object_list_lock);
+
+ while (!list_empty(&cache->object_list)) {
+ object = list_first_entry(&cache->object_list,
+ struct cachefiles_object, cache_link);
+ cachefiles_see_object(object, cachefiles_obj_see_withdrawal);
+ list_del_init(&object->cache_link);
+ fscache_withdraw_cookie(object->cookie);
+ count++;
+ if ((count & 63) == 0) {
+ spin_unlock(&cache->object_list_lock);
+ cond_resched();
+ spin_lock(&cache->object_list_lock);
+ }
+ }
+
+ spin_unlock(&cache->object_list_lock);
+ _leave(" [%u objs]", count);
+}
+
/*
* Withdraw volumes.
*/
@@ -326,7 +356,7 @@ void cachefiles_withdraw_cache(struct cachefiles_cache *cache)
/* we now have to destroy all the active objects pertaining to this
* cache - which we do by passing them off to thread pool to be
* disposed of */
- // PLACEHOLDER: Withdraw objects
+ cachefiles_withdraw_objects(cache);
fscache_wait_for_objects(fscache);

cachefiles_withdraw_volumes(cache);
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 985c3f3e6767..61e8740d01be 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -106,6 +106,7 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file)
mutex_init(&cache->daemon_mutex);
init_waitqueue_head(&cache->daemon_pollwq);
INIT_LIST_HEAD(&cache->volumes);
+ INIT_LIST_HEAD(&cache->object_list);
spin_lock_init(&cache->object_list_lock);

/* set default caching limits
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 68bb7b6c4945..e47c52c34071 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -99,8 +99,268 @@ void cachefiles_put_object(struct cachefiles_object *object,
_leave("");
}

+/*
+ * Adjust the size of a cache file if necessary to match the DIO size. We keep
+ * the EOF marker a multiple of DIO blocks so that we don't fall back to doing
+ * non-DIO for a partial block straddling the EOF, but we also have to be
+ * careful of someone expanding the file and accidentally accreting the
+ * padding.
+ */
+static int cachefiles_adjust_size(struct cachefiles_object *object)
+{
+ struct iattr newattrs;
+ struct file *file = object->file;
+ uint64_t ni_size;
+ loff_t oi_size;
+ int ret;
+
+ ni_size = object->cookie->object_size;
+ ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE);
+
+ _enter("{OBJ%x},[%llu]",
+ object->debug_id, (unsigned long long) ni_size);
+
+ if (!file)
+ return -ENOBUFS;
+
+ oi_size = i_size_read(file_inode(file));
+ if (oi_size == ni_size)
+ return 0;
+
+ inode_lock(file_inode(file));
+
+ /* if there's an extension to a partial page at the end of the backing
+ * file, we need to discard the partial page so that we pick up new
+ * data after it */
+ if (oi_size & ~PAGE_MASK && ni_size > oi_size) {
+ _debug("discard tail %llx", oi_size);
+ newattrs.ia_valid = ATTR_SIZE;
+ newattrs.ia_size = oi_size & PAGE_MASK;
+ ret = cachefiles_inject_remove_error();
+ if (ret == 0)
+ ret = notify_change(&init_user_ns, file->f_path.dentry,
+ &newattrs, NULL);
+ if (ret < 0)
+ goto truncate_failed;
+ }
+
+ newattrs.ia_valid = ATTR_SIZE;
+ newattrs.ia_size = ni_size;
+ ret = cachefiles_inject_write_error();
+ if (ret == 0)
+ ret = notify_change(&init_user_ns, file->f_path.dentry,
+ &newattrs, NULL);
+
+truncate_failed:
+ inode_unlock(file_inode(file));
+
+ if (ret < 0)
+ trace_cachefiles_io_error(NULL, file_inode(file), ret,
+ cachefiles_trace_notify_change_error);
+ if (ret == -EIO) {
+ cachefiles_io_error_obj(object, "Size set failed");
+ ret = -ENOBUFS;
+ }
+
+ _leave(" = %d", ret);
+ return ret;
+}
+
+/*
+ * Attempt to look up the nominated node in this cache
+ */
+static bool cachefiles_lookup_cookie(struct fscache_cookie *cookie)
+{
+ struct cachefiles_object *object;
+ struct cachefiles_cache *cache = cookie->volume->cache->cache_priv;
+ const struct cred *saved_cred;
+ bool success;
+
+ object = cachefiles_alloc_object(cookie);
+ if (!object)
+ goto fail;
+
+ _enter("{OBJ%x}", object->debug_id);
+
+ if (!cachefiles_cook_key(object))
+ goto fail_put;
+
+ cookie->cache_priv = object;
+
+ cachefiles_begin_secure(cache, &saved_cred);
+
+ success = cachefiles_look_up_object(object);
+ if (!success)
+ goto fail_withdraw;
+
+ cachefiles_see_object(object, cachefiles_obj_see_lookup_cookie);
+
+ spin_lock(&cache->object_list_lock);
+ list_add(&object->cache_link, &cache->object_list);
+ spin_unlock(&cache->object_list_lock);
+ cachefiles_adjust_size(object);
+
+ cachefiles_end_secure(cache, saved_cred);
+ _leave(" = t");
+ return true;
+
+fail_withdraw:
+ cachefiles_end_secure(cache, saved_cred);
+ cachefiles_see_object(object, cachefiles_obj_see_lookup_failed);
+ fscache_caching_failed(cookie);
+ _debug("failed c=%08x o=%08x", cookie->debug_id, object->debug_id);
+ /* The caller holds an access count on the cookie, so we need them to
+ * drop it before we can withdraw the object.
+ */
+ return false;
+
+fail_put:
+ cachefiles_put_object(object, cachefiles_obj_put_alloc_fail);
+fail:
+ return false;
+}
+
+/*
+ * Commit changes to the object as we drop it.
+ */
+static void cachefiles_commit_object(struct cachefiles_object *object,
+ struct cachefiles_cache *cache)
+{
+ bool update = false;
+
+ if (test_and_clear_bit(FSCACHE_COOKIE_LOCAL_WRITE, &object->cookie->flags))
+ update = true;
+ if (test_and_clear_bit(FSCACHE_COOKIE_NEEDS_UPDATE, &object->cookie->flags))
+ update = true;
+ if (update)
+ cachefiles_set_object_xattr(object);
+
+ if (test_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags))
+ cachefiles_commit_tmpfile(cache, object);
+}
+
+/*
+ * Finalise and object and close the VFS structs that we have.
+ */
+static void cachefiles_clean_up_object(struct cachefiles_object *object,
+ struct cachefiles_cache *cache)
+{
+ if (test_bit(FSCACHE_COOKIE_RETIRED, &object->cookie->flags)) {
+ if (!test_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags)) {
+ cachefiles_see_object(object, cachefiles_obj_see_clean_delete);
+ _debug("- inval object OBJ%x", object->debug_id);
+ cachefiles_delete_object(object, FSCACHE_OBJECT_WAS_RETIRED);
+ } else {
+ cachefiles_see_object(object, cachefiles_obj_see_clean_drop_tmp);
+ _debug("- inval object OBJ%x tmpfile", object->debug_id);
+ }
+ } else {
+ cachefiles_see_object(object, cachefiles_obj_see_clean_commit);
+ cachefiles_commit_object(object, cache);
+ }
+
+ cachefiles_unmark_inode_in_use(object, object->file);
+ if (object->file) {
+ fput(object->file);
+ object->file = NULL;
+ }
+}
+
+/*
+ * Withdraw caching for a cookie.
+ */
+static void cachefiles_withdraw_cookie(struct fscache_cookie *cookie)
+{
+ struct cachefiles_object *object = cookie->cache_priv;
+ struct cachefiles_cache *cache = object->volume->cache;
+ const struct cred *saved_cred;
+
+ _enter("o=%x", object->debug_id);
+ cachefiles_see_object(object, cachefiles_obj_see_withdraw_cookie);
+
+ if (!list_empty(&object->cache_link)) {
+ spin_lock(&cache->object_list_lock);
+ cachefiles_see_object(object, cachefiles_obj_see_withdrawal);
+ list_del_init(&object->cache_link);
+ spin_unlock(&cache->object_list_lock);
+ }
+
+ if (object->file) {
+ cachefiles_begin_secure(cache, &saved_cred);
+ cachefiles_clean_up_object(object, cache);
+ cachefiles_end_secure(cache, saved_cred);
+ }
+
+ cookie->cache_priv = NULL;
+ cachefiles_put_object(object, cachefiles_obj_put_detach);
+}
+
+/*
+ * Invalidate the storage associated with a cookie.
+ */
+static bool cachefiles_invalidate_cookie(struct fscache_cookie *cookie)
+{
+ struct cachefiles_object *object = cookie->cache_priv;
+ struct file *new_file, *old_file;
+ bool old_tmpfile;
+
+ _enter("o=%x,[%llu]", object->debug_id, object->cookie->object_size);
+
+ old_tmpfile = test_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags);
+
+ if (!object->file) {
+ fscache_resume_after_invalidation(cookie);
+ _leave(" = t [light]");
+ return true;
+ }
+
+ new_file = cachefiles_create_tmpfile(object);
+ if (IS_ERR(new_file))
+ goto failed;
+
+ /* Substitute the VFS target */
+ _debug("sub");
+ spin_lock(&object->lock);
+
+ old_file = object->file;
+ object->file = new_file;
+ object->content_info = CACHEFILES_CONTENT_NO_DATA;
+ set_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags);
+ set_bit(FSCACHE_COOKIE_NEEDS_UPDATE, &object->cookie->flags);
+
+ spin_unlock(&object->lock);
+ _debug("subbed");
+
+ /* Allow I/O to take place again */
+ fscache_resume_after_invalidation(cookie);
+
+ if (old_file) {
+ if (!old_tmpfile) {
+ struct cachefiles_volume *volume = object->volume;
+ struct dentry *fan = volume->fanout[(u8)cookie->key_hash];
+
+ inode_lock_nested(d_inode(fan), I_MUTEX_PARENT);
+ cachefiles_bury_object(volume->cache, object, fan,
+ old_file->f_path.dentry,
+ FSCACHE_OBJECT_INVALIDATED);
+ }
+ fput(old_file);
+ }
+
+ _leave(" = t");
+ return true;
+
+failed:
+ _leave(" = f");
+ return false;
+}
+
const struct fscache_cache_ops cachefiles_cache_ops = {
.name = "cachefiles",
.acquire_volume = cachefiles_acquire_volume,
.free_volume = cachefiles_free_volume,
+ .lookup_cookie = cachefiles_lookup_cookie,
+ .withdraw_cookie = cachefiles_withdraw_cookie,
+ .invalidate_cookie = cachefiles_invalidate_cookie,
+ .prepare_to_write = cachefiles_prepare_to_write,
};
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 654dbd51b965..d7aae04edc61 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -16,6 +16,8 @@
#include <linux/cred.h>
#include <linux/security.h>

+#define CACHEFILES_DIO_BLOCK_SIZE 4096
+
struct cachefiles_cache;
struct cachefiles_object;

@@ -68,6 +70,7 @@ struct cachefiles_cache {
struct dentry *graveyard; /* directory into which dead objects go */
struct file *cachefilesd; /* manager daemon handle */
struct list_head volumes; /* List of volume objects */
+ struct list_head object_list; /* List of active objects */
spinlock_t object_list_lock; /* Lock for volumes and object_list */
const struct cred *cache_cred; /* security override for accessing cache */
struct mutex daemon_mutex; /* command serialisation mutex */
@@ -194,6 +197,9 @@ extern int cachefiles_bury_object(struct cachefiles_cache *cache,
struct dentry *dir,
struct dentry *rep,
enum fscache_why_object_killed why);
+extern int cachefiles_delete_object(struct cachefiles_object *object,
+ enum fscache_why_object_killed why);
+extern bool cachefiles_look_up_object(struct cachefiles_object *object);
extern struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
struct dentry *dir,
const char *name,
@@ -205,6 +211,9 @@ extern int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,

extern int cachefiles_check_in_use(struct cachefiles_cache *cache,
struct dentry *dir, char *filename);
+extern struct file *cachefiles_create_tmpfile(struct cachefiles_object *object);
+extern bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
+ struct cachefiles_object *object);

/*
* security.c
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index e87c401239f1..b549e9f79c01 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -404,6 +404,324 @@ int cachefiles_bury_object(struct cachefiles_cache *cache,
return 0;
}

+/*
+ * Delete a cache file.
+ */
+int cachefiles_delete_object(struct cachefiles_object *object,
+ enum fscache_why_object_killed why)
+{
+ struct cachefiles_volume *volume = object->volume;
+ struct dentry *dentry = object->file->f_path.dentry;
+ struct dentry *fan = volume->fanout[(u8)object->cookie->key_hash];
+ int ret;
+
+ _enter(",OBJ%x{%pD}", object->debug_id, object->file);
+
+ /* Stop the dentry being negated if it's only pinned by a file struct. */
+ dget(dentry);
+
+ inode_lock_nested(d_backing_inode(fan), I_MUTEX_PARENT);
+ ret = cachefiles_unlink(volume->cache, object, fan, dentry, why);
+ inode_unlock(d_backing_inode(fan));
+ dput(dentry);
+ return ret;
+}
+
+/*
+ * Create a temporary file and leave it unattached and un-xattr'd until the
+ * time comes to discard the object from memory.
+ */
+struct file *cachefiles_create_tmpfile(struct cachefiles_object *object)
+{
+ struct cachefiles_volume *volume = object->volume;
+ struct cachefiles_cache *cache = volume->cache;
+ const struct cred *saved_cred;
+ struct dentry *fan = volume->fanout[(u8)object->cookie->key_hash];
+ struct file *file;
+ struct path path;
+ uint64_t ni_size = object->cookie->object_size;
+ long ret;
+
+ ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE);
+
+ cachefiles_begin_secure(cache, &saved_cred);
+
+ path.mnt = cache->mnt;
+ ret = cachefiles_inject_write_error();
+ if (ret == 0)
+ path.dentry = vfs_tmpfile(&init_user_ns, fan, S_IFREG, O_RDWR);
+ else
+ path.dentry = ERR_PTR(ret);
+ if (IS_ERR(path.dentry)) {
+ trace_cachefiles_vfs_error(object, d_inode(fan), PTR_ERR(path.dentry),
+ cachefiles_trace_tmpfile_error);
+ if (PTR_ERR(path.dentry) == -EIO)
+ cachefiles_io_error_obj(object, "Failed to create tmpfile");
+ file = ERR_CAST(path.dentry);
+ goto out;
+ }
+
+ trace_cachefiles_tmpfile(object, d_backing_inode(path.dentry));
+
+ if (!cachefiles_mark_inode_in_use(object, path.dentry)) {
+ file = ERR_PTR(-EBUSY);
+ goto out_dput;
+ }
+
+ if (ni_size > 0) {
+ trace_cachefiles_trunc(object, d_backing_inode(path.dentry), 0, ni_size,
+ cachefiles_trunc_expand_tmpfile);
+ ret = cachefiles_inject_write_error();
+ if (ret == 0)
+ ret = vfs_truncate(&path, ni_size);
+ if (ret < 0) {
+ trace_cachefiles_vfs_error(
+ object, d_backing_inode(path.dentry), ret,
+ cachefiles_trace_trunc_error);
+ file = ERR_PTR(ret);
+ goto out_dput;
+ }
+ }
+
+ file = open_with_fake_path(&path, O_RDWR | O_LARGEFILE | O_DIRECT,
+ d_backing_inode(path.dentry), cache->cache_cred);
+ if (IS_ERR(file)) {
+ trace_cachefiles_vfs_error(object, d_backing_inode(path.dentry),
+ PTR_ERR(file),
+ cachefiles_trace_open_error);
+ goto out_dput;
+ }
+ if (unlikely(!file->f_op->read_iter) ||
+ unlikely(!file->f_op->write_iter)) {
+ fput(file);
+ pr_notice("Cache does not support read_iter and write_iter\n");
+ file = ERR_PTR(-EINVAL);
+ }
+
+out_dput:
+ dput(path.dentry);
+out:
+ cachefiles_end_secure(cache, saved_cred);
+ return file;
+}
+
+/*
+ * Create a new file.
+ */
+static bool cachefiles_create_file(struct cachefiles_object *object)
+{
+ struct file *file;
+ int ret;
+
+ ret = cachefiles_has_space(object->volume->cache, 1, 0);
+ if (ret < 0)
+ return false;
+
+ file = cachefiles_create_tmpfile(object);
+ if (IS_ERR(file))
+ return false;
+
+ set_bit(FSCACHE_COOKIE_NEEDS_UPDATE, &object->cookie->flags);
+ set_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags);
+ _debug("create -> %pD{ino=%lu}", file, file_inode(file)->i_ino);
+ object->file = file;
+ return true;
+}
+
+/*
+ * Open an existing file, checking its attributes and replacing it if it is
+ * stale.
+ */
+static bool cachefiles_open_file(struct cachefiles_object *object,
+ struct dentry *dentry)
+{
+ struct cachefiles_cache *cache = object->volume->cache;
+ struct file *file;
+ struct path path;
+ int ret;
+
+ _enter("%pd", dentry);
+
+ if (!cachefiles_mark_inode_in_use(object, dentry))
+ return false;
+
+ /* We need to open a file interface onto a data file now as we can't do
+ * it on demand because writeback called from do_exit() sees
+ * current->fs == NULL - which breaks d_path() called from ext4 open.
+ */
+ path.mnt = cache->mnt;
+ path.dentry = dentry;
+ file = open_with_fake_path(&path, O_RDWR | O_LARGEFILE | O_DIRECT,
+ d_backing_inode(dentry), cache->cache_cred);
+ if (IS_ERR(file)) {
+ trace_cachefiles_vfs_error(object, d_backing_inode(dentry),
+ PTR_ERR(file),
+ cachefiles_trace_open_error);
+ goto error;
+ }
+
+ if (unlikely(!file->f_op->read_iter) ||
+ unlikely(!file->f_op->write_iter)) {
+ pr_notice("Cache does not support read_iter and write_iter\n");
+ goto error_fput;
+ }
+ _debug("file -> %pd positive", dentry);
+
+ ret = cachefiles_check_auxdata(object, file);
+ if (ret < 0)
+ goto check_failed;
+
+ object->file = file;
+
+ /* Always update the atime on an object we've just looked up (this is
+ * used to keep track of culling, and atimes are only updated by read,
+ * write and readdir but not lookup or open).
+ */
+ touch_atime(&file->f_path);
+ dput(dentry);
+ return true;
+
+check_failed:
+ fscache_cookie_lookup_negative(object->cookie);
+ cachefiles_unmark_inode_in_use(object, file);
+ if (ret == -ESTALE) {
+ fput(file);
+ dput(dentry);
+ return cachefiles_create_file(object);
+ }
+error_fput:
+ fput(file);
+error:
+ dput(dentry);
+ return false;
+}
+
+/*
+ * walk from the parent object to the child object through the backing
+ * filesystem, creating directories as we go
+ */
+bool cachefiles_look_up_object(struct cachefiles_object *object)
+{
+ struct cachefiles_volume *volume = object->volume;
+ struct dentry *dentry, *fan = volume->fanout[(u8)object->cookie->key_hash];
+ int ret;
+
+ _enter("OBJ%x,%s,", object->debug_id, object->d_name);
+
+ /* Look up path "cache/vol/fanout/file". */
+ ret = cachefiles_inject_read_error();
+ if (ret == 0)
+ dentry = lookup_positive_unlocked(object->d_name, fan,
+ object->d_name_len);
+ else
+ dentry = ERR_PTR(ret);
+ trace_cachefiles_lookup(object, dentry);
+ if (IS_ERR(dentry)) {
+ if (dentry == ERR_PTR(-ENOENT))
+ goto new_file;
+ if (dentry == ERR_PTR(-EIO))
+ cachefiles_io_error_obj(object, "Lookup failed");
+ return false;
+ }
+
+ if (!d_is_reg(dentry)) {
+ pr_err("%pd is not a file\n", dentry);
+ inode_lock_nested(d_inode(fan), I_MUTEX_PARENT);
+ ret = cachefiles_bury_object(volume->cache, object, fan, dentry,
+ FSCACHE_OBJECT_IS_WEIRD);
+ dput(dentry);
+ if (ret < 0)
+ return false;
+ goto new_file;
+ }
+
+ if (!cachefiles_open_file(object, dentry))
+ return false;
+
+ _leave(" = t [%lu]", file_inode(object->file)->i_ino);
+ return true;
+
+new_file:
+ fscache_cookie_lookup_negative(object->cookie);
+ return cachefiles_create_file(object);
+}
+
+/*
+ * Attempt to link a temporary file into its rightful place in the cache.
+ */
+bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
+ struct cachefiles_object *object)
+{
+ struct cachefiles_volume *volume = object->volume;
+ struct dentry *dentry, *fan = volume->fanout[(u8)object->cookie->key_hash];
+ bool success = false;
+ int ret;
+
+ _enter(",%pD", object->file);
+
+ inode_lock_nested(d_inode(fan), I_MUTEX_PARENT);
+ ret = cachefiles_inject_read_error();
+ if (ret == 0)
+ dentry = lookup_one_len(object->d_name, fan, object->d_name_len);
+ else
+ dentry = ERR_PTR(ret);
+ if (IS_ERR(dentry)) {
+ trace_cachefiles_vfs_error(object, d_inode(fan), PTR_ERR(dentry),
+ cachefiles_trace_lookup_error);
+ _debug("lookup fail %ld", PTR_ERR(dentry));
+ goto out_unlock;
+ }
+
+ if (!d_is_negative(dentry)) {
+ if (d_backing_inode(dentry) == file_inode(object->file)) {
+ success = true;
+ goto out_dput;
+ }
+
+ ret = cachefiles_unlink(volume->cache, object, fan, dentry,
+ FSCACHE_OBJECT_IS_STALE);
+ if (ret < 0)
+ goto out_dput;
+
+ dput(dentry);
+ ret = cachefiles_inject_read_error();
+ if (ret == 0)
+ dentry = lookup_one_len(object->d_name, fan, object->d_name_len);
+ else
+ dentry = ERR_PTR(ret);
+ if (IS_ERR(dentry)) {
+ trace_cachefiles_vfs_error(object, d_inode(fan), PTR_ERR(dentry),
+ cachefiles_trace_lookup_error);
+ _debug("lookup fail %ld", PTR_ERR(dentry));
+ goto out_unlock;
+ }
+ }
+
+ ret = cachefiles_inject_read_error();
+ if (ret == 0)
+ ret = vfs_link(object->file->f_path.dentry, &init_user_ns,
+ d_inode(fan), dentry, NULL);
+ if (ret < 0) {
+ trace_cachefiles_vfs_error(object, d_inode(fan), ret,
+ cachefiles_trace_link_error);
+ _debug("link fail %d", ret);
+ } else {
+ trace_cachefiles_link(object, file_inode(object->file));
+ spin_lock(&object->lock);
+ /* TODO: Do we want to switch the file pointer to the new dentry? */
+ clear_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags);
+ spin_unlock(&object->lock);
+ success = true;
+ }
+
+out_dput:
+ dput(dentry);
+out_unlock:
+ inode_unlock(d_inode(fan));
+ _leave(" = %u", success);
+ return success;
+}
+
/*
* Look up an inode to be checked or culled. Return -EBUSY if the inode is
* marked in use.



2021-12-22 23:26:52

by David Howells

[permalink] [raw]
Subject: [PATCH v4 49/68] cachefiles: Implement begin and end I/O operation

Implement the methods for beginning and ending an I/O operation.

When called to begin an I/O operation, we are guaranteed that the cookie
has reached a certain stage (we're called by fscache after it has done a
suitable wait).

If a file is available, we paste a ref over into the cache resources for
the I/O routines to use. This means that the object can be invalidated
whilst the I/O is ongoing without the need to synchronise as the file
pointer in the object is replaced, but the file pointer in the cache
resources is unaffected.

Ending the operation just requires ditching any refs we have and dropping
the access guarantee that fscache got for us on the cookie.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819645033.215744.2199344081658268312.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906951916.143852.9531384743995679857.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967161222.1823006.4461476204800357263.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/Makefile | 1 +
fs/cachefiles/interface.c | 1 +
fs/cachefiles/internal.h | 18 +++++++++++++
fs/cachefiles/io.c | 57 ++++++++++++++++++++++++++++++++++++++++
include/trace/events/fscache.h | 2 +
5 files changed, 79 insertions(+)
create mode 100644 fs/cachefiles/io.c

diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index cb7a6bcf51eb..16d811f1a2fa 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -7,6 +7,7 @@ cachefiles-y := \
cache.o \
daemon.o \
interface.o \
+ io.o \
key.o \
main.o \
namei.o \
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index e47c52c34071..ad9d311413ff 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -362,5 +362,6 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
.lookup_cookie = cachefiles_lookup_cookie,
.withdraw_cookie = cachefiles_withdraw_cookie,
.invalidate_cookie = cachefiles_invalidate_cookie,
+ .begin_operation = cachefiles_begin_operation,
.prepare_to_write = cachefiles_prepare_to_write,
};
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index d7aae04edc61..d5868f5514d3 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -105,6 +105,18 @@ struct cachefiles_cache {

#include <trace/events/cachefiles.h>

+static inline
+struct file *cachefiles_cres_file(struct netfs_cache_resources *cres)
+{
+ return cres->cache_priv2;
+}
+
+static inline
+struct cachefiles_object *cachefiles_cres_object(struct netfs_cache_resources *cres)
+{
+ return fscache_cres_cookie(cres)->cache_priv;
+}
+
/*
* note change of state for daemon
*/
@@ -177,6 +189,12 @@ extern struct cachefiles_object *cachefiles_grab_object(struct cachefiles_object
extern void cachefiles_put_object(struct cachefiles_object *object,
enum cachefiles_obj_ref_trace why);

+/*
+ * io.c
+ */
+extern bool cachefiles_begin_operation(struct netfs_cache_resources *cres,
+ enum fscache_want_state want_state);
+
/*
* key.c
*/
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
new file mode 100644
index 000000000000..adeb9a42fd7b
--- /dev/null
+++ b/fs/cachefiles/io.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* kiocb-using read/write
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include <linux/mount.h>
+#include <linux/slab.h>
+#include <linux/file.h>
+#include <linux/uio.h>
+#include <linux/falloc.h>
+#include <linux/sched/mm.h>
+#include <trace/events/fscache.h>
+#include "internal.h"
+
+/*
+ * Clean up an operation.
+ */
+static void cachefiles_end_operation(struct netfs_cache_resources *cres)
+{
+ struct file *file = cachefiles_cres_file(cres);
+
+ if (file)
+ fput(file);
+ fscache_end_cookie_access(fscache_cres_cookie(cres), fscache_access_io_end);
+}
+
+static const struct netfs_cache_ops cachefiles_netfs_cache_ops = {
+ .end_operation = cachefiles_end_operation,
+};
+
+/*
+ * Open the cache file when beginning a cache operation.
+ */
+bool cachefiles_begin_operation(struct netfs_cache_resources *cres,
+ enum fscache_want_state want_state)
+{
+ struct cachefiles_object *object = cachefiles_cres_object(cres);
+
+ if (!cachefiles_cres_file(cres)) {
+ cres->ops = &cachefiles_netfs_cache_ops;
+ if (object->file) {
+ spin_lock(&object->lock);
+ if (!cres->cache_priv2 && object->file)
+ cres->cache_priv2 = get_file(object->file);
+ spin_unlock(&object->lock);
+ }
+ }
+
+ if (!cachefiles_cres_file(cres) && want_state != FSCACHE_WANT_PARAMS) {
+ pr_err("failed to get cres->file\n");
+ return false;
+ }
+
+ return true;
+}
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index d9d830296ec3..1594aefadeac 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -78,6 +78,7 @@ enum fscache_access_trace {
fscache_access_cache_unpin,
fscache_access_invalidate_cookie,
fscache_access_invalidate_cookie_end,
+ fscache_access_io_end,
fscache_access_io_not_live,
fscache_access_io_read,
fscache_access_io_resize,
@@ -152,6 +153,7 @@ enum fscache_access_trace {
EM(fscache_access_cache_unpin, "UNPIN cache ") \
EM(fscache_access_invalidate_cookie, "BEGIN inval ") \
EM(fscache_access_invalidate_cookie_end,"END inval ") \
+ EM(fscache_access_io_end, "END io ") \
EM(fscache_access_io_not_live, "END io_notl") \
EM(fscache_access_io_read, "BEGIN io_read") \
EM(fscache_access_io_resize, "BEGIN io_resz") \



2021-12-22 23:27:06

by David Howells

[permalink] [raw]
Subject: [PATCH v4 50/68] cachefiles: Implement cookie resize for truncate

Implement resizing an object, using truncate and/or fallocate to adjust the
object.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819646631.215744.13819016478175576761.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906952877.143852.4140962906331914859.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967162168.1823006.5941985259926902274.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/interface.c | 78 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 78 insertions(+)

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index ad9d311413ff..51c968cd00a6 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -220,6 +220,83 @@ static bool cachefiles_lookup_cookie(struct fscache_cookie *cookie)
return false;
}

+/*
+ * Shorten the backing object to discard any dirty data and free up
+ * any unused granules.
+ */
+static bool cachefiles_shorten_object(struct cachefiles_object *object,
+ struct file *file, loff_t new_size)
+{
+ struct cachefiles_cache *cache = object->volume->cache;
+ struct inode *inode = file_inode(file);
+ loff_t i_size, dio_size;
+ int ret;
+
+ dio_size = round_up(new_size, CACHEFILES_DIO_BLOCK_SIZE);
+ i_size = i_size_read(inode);
+
+ trace_cachefiles_trunc(object, inode, i_size, dio_size,
+ cachefiles_trunc_shrink);
+ ret = cachefiles_inject_remove_error();
+ if (ret == 0)
+ ret = vfs_truncate(&file->f_path, dio_size);
+ if (ret < 0) {
+ trace_cachefiles_io_error(object, file_inode(file), ret,
+ cachefiles_trace_trunc_error);
+ cachefiles_io_error_obj(object, "Trunc-to-size failed %d", ret);
+ cachefiles_remove_object_xattr(cache, object, file->f_path.dentry);
+ return false;
+ }
+
+ if (new_size < dio_size) {
+ trace_cachefiles_trunc(object, inode, dio_size, new_size,
+ cachefiles_trunc_dio_adjust);
+ ret = cachefiles_inject_write_error();
+ if (ret == 0)
+ ret = vfs_fallocate(file, FALLOC_FL_ZERO_RANGE,
+ new_size, dio_size);
+ if (ret < 0) {
+ trace_cachefiles_io_error(object, file_inode(file), ret,
+ cachefiles_trace_fallocate_error);
+ cachefiles_io_error_obj(object, "Trunc-to-dio-size failed %d", ret);
+ cachefiles_remove_object_xattr(cache, object, file->f_path.dentry);
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * Resize the backing object.
+ */
+static void cachefiles_resize_cookie(struct netfs_cache_resources *cres,
+ loff_t new_size)
+{
+ struct cachefiles_object *object = cachefiles_cres_object(cres);
+ struct cachefiles_cache *cache = object->volume->cache;
+ struct fscache_cookie *cookie = object->cookie;
+ const struct cred *saved_cred;
+ struct file *file = cachefiles_cres_file(cres);
+ loff_t old_size = cookie->object_size;
+
+ _enter("%llu->%llu", old_size, new_size);
+
+ if (new_size < old_size) {
+ cachefiles_begin_secure(cache, &saved_cred);
+ cachefiles_shorten_object(object, file, new_size);
+ cachefiles_end_secure(cache, saved_cred);
+ object->cookie->object_size = new_size;
+ return;
+ }
+
+ /* The file is being expanded. We don't need to do anything
+ * particularly. cookie->initial_size doesn't change and so the point
+ * at which we have to download before doesn't change.
+ */
+ cookie->object_size = new_size;
+}
+
/*
* Commit changes to the object as we drop it.
*/
@@ -363,5 +440,6 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
.withdraw_cookie = cachefiles_withdraw_cookie,
.invalidate_cookie = cachefiles_invalidate_cookie,
.begin_operation = cachefiles_begin_operation,
+ .resize_cookie = cachefiles_resize_cookie,
.prepare_to_write = cachefiles_prepare_to_write,
};



2021-12-22 23:27:13

by David Howells

[permalink] [raw]
Subject: [PATCH v4 51/68] cachefiles: Implement the I/O routines

Implement the I/O routines for cachefiles. There are two sets of routines
here: preparation and actual I/O.

Preparation for read involves looking to see whether there is data present,
and how much. Netfslib tells us what it wants us to do and we have the
option of adjusting shrinking and telling it whether to read from the
cache, download from the server or simply clear a region.

Preparation for write involves checking for space and defending against
possibly running short of space, if necessary punching out a hole in the
file so that we don't leave old data in the cache if we update the
coherency information.

Then there's a read routine and a write routine. They wait for the cookie
state to move to something appropriate and then start a potentially
asynchronous direct I/O operation upon it.

Changes
=======
ver #2:
- Fix a misassigned variable[1].

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/YaZOCk9zxApPattb@archlinux-ax161/ [1]
Link: https://lore.kernel.org/r/163819647945.215744.17827962047487125939.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906954666.143852.1504887120569779407.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967163110.1823006.9206718511874339672.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/io.c | 514 +++++++++++++++++++++++++++++++++++++
include/trace/events/cachefiles.h | 121 +++++++++
2 files changed, 635 insertions(+)

diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index adeb9a42fd7b..6f4dce0cfc36 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -14,6 +14,516 @@
#include <trace/events/fscache.h>
#include "internal.h"

+struct cachefiles_kiocb {
+ struct kiocb iocb;
+ refcount_t ki_refcnt;
+ loff_t start;
+ union {
+ size_t skipped;
+ size_t len;
+ };
+ struct cachefiles_object *object;
+ netfs_io_terminated_t term_func;
+ void *term_func_priv;
+ bool was_async;
+ unsigned int inval_counter; /* Copy of cookie->inval_counter */
+ u64 b_writing;
+};
+
+static inline void cachefiles_put_kiocb(struct cachefiles_kiocb *ki)
+{
+ if (refcount_dec_and_test(&ki->ki_refcnt)) {
+ cachefiles_put_object(ki->object, cachefiles_obj_put_ioreq);
+ fput(ki->iocb.ki_filp);
+ kfree(ki);
+ }
+}
+
+/*
+ * Handle completion of a read from the cache.
+ */
+static void cachefiles_read_complete(struct kiocb *iocb, long ret)
+{
+ struct cachefiles_kiocb *ki = container_of(iocb, struct cachefiles_kiocb, iocb);
+ struct inode *inode = file_inode(ki->iocb.ki_filp);
+
+ _enter("%ld", ret);
+
+ if (ret < 0)
+ trace_cachefiles_io_error(ki->object, inode, ret,
+ cachefiles_trace_read_error);
+
+ if (ki->term_func) {
+ if (ret >= 0) {
+ if (ki->object->cookie->inval_counter == ki->inval_counter)
+ ki->skipped += ret;
+ else
+ ret = -ESTALE;
+ }
+
+ ki->term_func(ki->term_func_priv, ret, ki->was_async);
+ }
+
+ cachefiles_put_kiocb(ki);
+}
+
+/*
+ * Initiate a read from the cache.
+ */
+static int cachefiles_read(struct netfs_cache_resources *cres,
+ loff_t start_pos,
+ struct iov_iter *iter,
+ enum netfs_read_from_hole read_hole,
+ netfs_io_terminated_t term_func,
+ void *term_func_priv)
+{
+ struct cachefiles_object *object;
+ struct cachefiles_kiocb *ki;
+ struct file *file;
+ unsigned int old_nofs;
+ ssize_t ret = -ENOBUFS;
+ size_t len = iov_iter_count(iter), skipped = 0;
+
+ if (!fscache_wait_for_operation(cres, FSCACHE_WANT_READ))
+ goto presubmission_error;
+
+ fscache_count_read();
+ object = cachefiles_cres_object(cres);
+ file = cachefiles_cres_file(cres);
+
+ _enter("%pD,%li,%llx,%zx/%llx",
+ file, file_inode(file)->i_ino, start_pos, len,
+ i_size_read(file_inode(file)));
+
+ /* If the caller asked us to seek for data before doing the read, then
+ * we should do that now. If we find a gap, we fill it with zeros.
+ */
+ if (read_hole != NETFS_READ_HOLE_IGNORE) {
+ loff_t off = start_pos, off2;
+
+ off2 = cachefiles_inject_read_error();
+ if (off2 == 0)
+ off2 = vfs_llseek(file, off, SEEK_DATA);
+ if (off2 < 0 && off2 >= (loff_t)-MAX_ERRNO && off2 != -ENXIO) {
+ skipped = 0;
+ ret = off2;
+ goto presubmission_error;
+ }
+
+ if (off2 == -ENXIO || off2 >= start_pos + len) {
+ /* The region is beyond the EOF or there's no more data
+ * in the region, so clear the rest of the buffer and
+ * return success.
+ */
+ ret = -ENODATA;
+ if (read_hole == NETFS_READ_HOLE_FAIL)
+ goto presubmission_error;
+
+ iov_iter_zero(len, iter);
+ skipped = len;
+ ret = 0;
+ goto presubmission_error;
+ }
+
+ skipped = off2 - off;
+ iov_iter_zero(skipped, iter);
+ }
+
+ ret = -ENOMEM;
+ ki = kzalloc(sizeof(struct cachefiles_kiocb), GFP_KERNEL);
+ if (!ki)
+ goto presubmission_error;
+
+ refcount_set(&ki->ki_refcnt, 2);
+ ki->iocb.ki_filp = file;
+ ki->iocb.ki_pos = start_pos + skipped;
+ ki->iocb.ki_flags = IOCB_DIRECT;
+ ki->iocb.ki_hint = ki_hint_validate(file_write_hint(file));
+ ki->iocb.ki_ioprio = get_current_ioprio();
+ ki->skipped = skipped;
+ ki->object = object;
+ ki->inval_counter = cres->inval_counter;
+ ki->term_func = term_func;
+ ki->term_func_priv = term_func_priv;
+ ki->was_async = true;
+
+ if (ki->term_func)
+ ki->iocb.ki_complete = cachefiles_read_complete;
+
+ get_file(ki->iocb.ki_filp);
+ cachefiles_grab_object(object, cachefiles_obj_get_ioreq);
+
+ trace_cachefiles_read(object, file_inode(file), ki->iocb.ki_pos, len - skipped);
+ old_nofs = memalloc_nofs_save();
+ ret = cachefiles_inject_read_error();
+ if (ret == 0)
+ ret = vfs_iocb_iter_read(file, &ki->iocb, iter);
+ memalloc_nofs_restore(old_nofs);
+ switch (ret) {
+ case -EIOCBQUEUED:
+ goto in_progress;
+
+ case -ERESTARTSYS:
+ case -ERESTARTNOINTR:
+ case -ERESTARTNOHAND:
+ case -ERESTART_RESTARTBLOCK:
+ /* There's no easy way to restart the syscall since other AIO's
+ * may be already running. Just fail this IO with EINTR.
+ */
+ ret = -EINTR;
+ fallthrough;
+ default:
+ ki->was_async = false;
+ cachefiles_read_complete(&ki->iocb, ret);
+ if (ret > 0)
+ ret = 0;
+ break;
+ }
+
+in_progress:
+ cachefiles_put_kiocb(ki);
+ _leave(" = %zd", ret);
+ return ret;
+
+presubmission_error:
+ if (term_func)
+ term_func(term_func_priv, ret < 0 ? ret : skipped, false);
+ return ret;
+}
+
+/*
+ * Handle completion of a write to the cache.
+ */
+static void cachefiles_write_complete(struct kiocb *iocb, long ret)
+{
+ struct cachefiles_kiocb *ki = container_of(iocb, struct cachefiles_kiocb, iocb);
+ struct cachefiles_object *object = ki->object;
+ struct inode *inode = file_inode(ki->iocb.ki_filp);
+
+ _enter("%ld", ret);
+
+ /* Tell lockdep we inherited freeze protection from submission thread */
+ __sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
+ __sb_end_write(inode->i_sb, SB_FREEZE_WRITE);
+
+ if (ret < 0)
+ trace_cachefiles_io_error(object, inode, ret,
+ cachefiles_trace_write_error);
+
+ atomic_long_sub(ki->b_writing, &object->volume->cache->b_writing);
+ set_bit(FSCACHE_COOKIE_HAVE_DATA, &object->cookie->flags);
+ if (ki->term_func)
+ ki->term_func(ki->term_func_priv, ret, ki->was_async);
+ cachefiles_put_kiocb(ki);
+}
+
+/*
+ * Initiate a write to the cache.
+ */
+static int cachefiles_write(struct netfs_cache_resources *cres,
+ loff_t start_pos,
+ struct iov_iter *iter,
+ netfs_io_terminated_t term_func,
+ void *term_func_priv)
+{
+ struct cachefiles_object *object;
+ struct cachefiles_cache *cache;
+ struct cachefiles_kiocb *ki;
+ struct inode *inode;
+ struct file *file;
+ unsigned int old_nofs;
+ ssize_t ret = -ENOBUFS;
+ size_t len = iov_iter_count(iter);
+
+ if (!fscache_wait_for_operation(cres, FSCACHE_WANT_WRITE))
+ goto presubmission_error;
+ fscache_count_write();
+ object = cachefiles_cres_object(cres);
+ cache = object->volume->cache;
+ file = cachefiles_cres_file(cres);
+
+ _enter("%pD,%li,%llx,%zx/%llx",
+ file, file_inode(file)->i_ino, start_pos, len,
+ i_size_read(file_inode(file)));
+
+ ret = -ENOMEM;
+ ki = kzalloc(sizeof(struct cachefiles_kiocb), GFP_KERNEL);
+ if (!ki)
+ goto presubmission_error;
+
+ refcount_set(&ki->ki_refcnt, 2);
+ ki->iocb.ki_filp = file;
+ ki->iocb.ki_pos = start_pos;
+ ki->iocb.ki_flags = IOCB_DIRECT | IOCB_WRITE;
+ ki->iocb.ki_hint = ki_hint_validate(file_write_hint(file));
+ ki->iocb.ki_ioprio = get_current_ioprio();
+ ki->object = object;
+ ki->inval_counter = cres->inval_counter;
+ ki->start = start_pos;
+ ki->len = len;
+ ki->term_func = term_func;
+ ki->term_func_priv = term_func_priv;
+ ki->was_async = true;
+ ki->b_writing = (len + (1 << cache->bshift)) >> cache->bshift;
+
+ if (ki->term_func)
+ ki->iocb.ki_complete = cachefiles_write_complete;
+ atomic_long_add(ki->b_writing, &cache->b_writing);
+
+ /* Open-code file_start_write here to grab freeze protection, which
+ * will be released by another thread in aio_complete_rw(). Fool
+ * lockdep by telling it the lock got released so that it doesn't
+ * complain about the held lock when we return to userspace.
+ */
+ inode = file_inode(file);
+ __sb_start_write(inode->i_sb, SB_FREEZE_WRITE);
+ __sb_writers_release(inode->i_sb, SB_FREEZE_WRITE);
+
+ get_file(ki->iocb.ki_filp);
+ cachefiles_grab_object(object, cachefiles_obj_get_ioreq);
+
+ trace_cachefiles_write(object, inode, ki->iocb.ki_pos, len);
+ old_nofs = memalloc_nofs_save();
+ ret = cachefiles_inject_write_error();
+ if (ret == 0)
+ ret = vfs_iocb_iter_write(file, &ki->iocb, iter);
+ memalloc_nofs_restore(old_nofs);
+ switch (ret) {
+ case -EIOCBQUEUED:
+ goto in_progress;
+
+ case -ERESTARTSYS:
+ case -ERESTARTNOINTR:
+ case -ERESTARTNOHAND:
+ case -ERESTART_RESTARTBLOCK:
+ /* There's no easy way to restart the syscall since other AIO's
+ * may be already running. Just fail this IO with EINTR.
+ */
+ ret = -EINTR;
+ fallthrough;
+ default:
+ ki->was_async = false;
+ cachefiles_write_complete(&ki->iocb, ret);
+ if (ret > 0)
+ ret = 0;
+ break;
+ }
+
+in_progress:
+ cachefiles_put_kiocb(ki);
+ _leave(" = %zd", ret);
+ return ret;
+
+presubmission_error:
+ if (term_func)
+ term_func(term_func_priv, ret, false);
+ return ret;
+}
+
+/*
+ * Prepare a read operation, shortening it to a cached/uncached
+ * boundary as appropriate.
+ */
+static enum netfs_read_source cachefiles_prepare_read(struct netfs_read_subrequest *subreq,
+ loff_t i_size)
+{
+ enum cachefiles_prepare_read_trace why;
+ struct netfs_read_request *rreq = subreq->rreq;
+ struct netfs_cache_resources *cres = &rreq->cache_resources;
+ struct cachefiles_object *object;
+ struct cachefiles_cache *cache;
+ struct fscache_cookie *cookie = fscache_cres_cookie(cres);
+ const struct cred *saved_cred;
+ struct file *file = cachefiles_cres_file(cres);
+ enum netfs_read_source ret = NETFS_DOWNLOAD_FROM_SERVER;
+ loff_t off, to;
+ ino_t ino = file ? file_inode(file)->i_ino : 0;
+
+ _enter("%zx @%llx/%llx", subreq->len, subreq->start, i_size);
+
+ if (subreq->start >= i_size) {
+ ret = NETFS_FILL_WITH_ZEROES;
+ why = cachefiles_trace_read_after_eof;
+ goto out_no_object;
+ }
+
+ if (test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags)) {
+ __set_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags);
+ why = cachefiles_trace_read_no_data;
+ goto out_no_object;
+ }
+
+ /* The object and the file may be being created in the background. */
+ if (!file) {
+ why = cachefiles_trace_read_no_file;
+ if (!fscache_wait_for_operation(cres, FSCACHE_WANT_READ))
+ goto out_no_object;
+ file = cachefiles_cres_file(cres);
+ if (!file)
+ goto out_no_object;
+ ino = file_inode(file)->i_ino;
+ }
+
+ object = cachefiles_cres_object(cres);
+ cache = object->volume->cache;
+ cachefiles_begin_secure(cache, &saved_cred);
+
+ off = cachefiles_inject_read_error();
+ if (off == 0)
+ off = vfs_llseek(file, subreq->start, SEEK_DATA);
+ if (off < 0 && off >= (loff_t)-MAX_ERRNO) {
+ if (off == (loff_t)-ENXIO) {
+ why = cachefiles_trace_read_seek_nxio;
+ goto download_and_store;
+ }
+ trace_cachefiles_io_error(object, file_inode(file), off,
+ cachefiles_trace_seek_error);
+ why = cachefiles_trace_read_seek_error;
+ goto out;
+ }
+
+ if (off >= subreq->start + subreq->len) {
+ why = cachefiles_trace_read_found_hole;
+ goto download_and_store;
+ }
+
+ if (off > subreq->start) {
+ off = round_up(off, cache->bsize);
+ subreq->len = off - subreq->start;
+ why = cachefiles_trace_read_found_part;
+ goto download_and_store;
+ }
+
+ to = cachefiles_inject_read_error();
+ if (to == 0)
+ to = vfs_llseek(file, subreq->start, SEEK_HOLE);
+ if (to < 0 && to >= (loff_t)-MAX_ERRNO) {
+ trace_cachefiles_io_error(object, file_inode(file), to,
+ cachefiles_trace_seek_error);
+ why = cachefiles_trace_read_seek_error;
+ goto out;
+ }
+
+ if (to < subreq->start + subreq->len) {
+ if (subreq->start + subreq->len >= i_size)
+ to = round_up(to, cache->bsize);
+ else
+ to = round_down(to, cache->bsize);
+ subreq->len = to - subreq->start;
+ }
+
+ why = cachefiles_trace_read_have_data;
+ ret = NETFS_READ_FROM_CACHE;
+ goto out;
+
+download_and_store:
+ __set_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags);
+out:
+ cachefiles_end_secure(cache, saved_cred);
+out_no_object:
+ trace_cachefiles_prep_read(subreq, ret, why, ino);
+ return ret;
+}
+
+/*
+ * Prepare for a write to occur.
+ */
+static int __cachefiles_prepare_write(struct netfs_cache_resources *cres,
+ loff_t *_start, size_t *_len, loff_t i_size,
+ bool no_space_allocated_yet)
+{
+ struct cachefiles_object *object = cachefiles_cres_object(cres);
+ struct cachefiles_cache *cache = object->volume->cache;
+ struct file *file = cachefiles_cres_file(cres);
+ loff_t start = *_start, pos;
+ size_t len = *_len, down;
+ int ret;
+
+ /* Round to DIO size */
+ down = start - round_down(start, PAGE_SIZE);
+ *_start = start - down;
+ *_len = round_up(down + len, PAGE_SIZE);
+
+ /* We need to work out whether there's sufficient disk space to perform
+ * the write - but we can skip that check if we have space already
+ * allocated.
+ */
+ if (no_space_allocated_yet)
+ goto check_space;
+
+ pos = cachefiles_inject_read_error();
+ if (pos == 0)
+ pos = vfs_llseek(file, *_start, SEEK_DATA);
+ if (pos < 0 && pos >= (loff_t)-MAX_ERRNO) {
+ if (pos == -ENXIO)
+ goto check_space; /* Unallocated tail */
+ trace_cachefiles_io_error(object, file_inode(file), pos,
+ cachefiles_trace_seek_error);
+ return pos;
+ }
+ if ((u64)pos >= (u64)*_start + *_len)
+ goto check_space; /* Unallocated region */
+
+ /* We have a block that's at least partially filled - if we're low on
+ * space, we need to see if it's fully allocated. If it's not, we may
+ * want to cull it.
+ */
+ if (cachefiles_has_space(cache, 0, *_len / PAGE_SIZE) == 0)
+ return 0; /* Enough space to simply overwrite the whole block */
+
+ pos = cachefiles_inject_read_error();
+ if (pos == 0)
+ pos = vfs_llseek(file, *_start, SEEK_HOLE);
+ if (pos < 0 && pos >= (loff_t)-MAX_ERRNO) {
+ trace_cachefiles_io_error(object, file_inode(file), pos,
+ cachefiles_trace_seek_error);
+ return pos;
+ }
+ if ((u64)pos >= (u64)*_start + *_len)
+ return 0; /* Fully allocated */
+
+ /* Partially allocated, but insufficient space: cull. */
+ ret = cachefiles_inject_remove_error();
+ if (ret == 0)
+ ret = vfs_fallocate(file, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+ *_start, *_len);
+ if (ret < 0) {
+ trace_cachefiles_io_error(object, file_inode(file), ret,
+ cachefiles_trace_fallocate_error);
+ cachefiles_io_error_obj(object,
+ "CacheFiles: fallocate failed (%d)\n", ret);
+ ret = -EIO;
+ }
+
+ return ret;
+
+check_space:
+ return cachefiles_has_space(cache, 0, *_len / PAGE_SIZE);
+}
+
+static int cachefiles_prepare_write(struct netfs_cache_resources *cres,
+ loff_t *_start, size_t *_len, loff_t i_size,
+ bool no_space_allocated_yet)
+{
+ struct cachefiles_object *object = cachefiles_cres_object(cres);
+ struct cachefiles_cache *cache = object->volume->cache;
+ const struct cred *saved_cred;
+ int ret;
+
+ if (!cachefiles_cres_file(cres)) {
+ if (!fscache_wait_for_operation(cres, FSCACHE_WANT_WRITE))
+ return -ENOBUFS;
+ if (!cachefiles_cres_file(cres))
+ return -ENOBUFS;
+ }
+
+ cachefiles_begin_secure(cache, &saved_cred);
+ ret = __cachefiles_prepare_write(cres, _start, _len, i_size,
+ no_space_allocated_yet);
+ cachefiles_end_secure(cache, saved_cred);
+ return ret;
+}
+
/*
* Clean up an operation.
*/
@@ -28,6 +538,10 @@ static void cachefiles_end_operation(struct netfs_cache_resources *cres)

static const struct netfs_cache_ops cachefiles_netfs_cache_ops = {
.end_operation = cachefiles_end_operation,
+ .read = cachefiles_read,
+ .write = cachefiles_write,
+ .prepare_read = cachefiles_prepare_read,
+ .prepare_write = cachefiles_prepare_write,
};

/*
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index 98b1eee4a7a8..ab1376ebc3ab 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -61,6 +61,17 @@ enum cachefiles_trunc_trace {
cachefiles_trunc_shrink,
};

+enum cachefiles_prepare_read_trace {
+ cachefiles_trace_read_after_eof,
+ cachefiles_trace_read_found_hole,
+ cachefiles_trace_read_found_part,
+ cachefiles_trace_read_have_data,
+ cachefiles_trace_read_no_data,
+ cachefiles_trace_read_no_file,
+ cachefiles_trace_read_seek_error,
+ cachefiles_trace_read_seek_nxio,
+};
+
enum cachefiles_error_trace {
cachefiles_trace_fallocate_error,
cachefiles_trace_getxattr_error,
@@ -125,6 +136,16 @@ enum cachefiles_error_trace {
EM(cachefiles_trunc_expand_tmpfile, "EXPTMP") \
E_(cachefiles_trunc_shrink, "SHRINK")

+#define cachefiles_prepare_read_traces \
+ EM(cachefiles_trace_read_after_eof, "after-eof ") \
+ EM(cachefiles_trace_read_found_hole, "found-hole") \
+ EM(cachefiles_trace_read_found_part, "found-part") \
+ EM(cachefiles_trace_read_have_data, "have-data ") \
+ EM(cachefiles_trace_read_no_data, "no-data ") \
+ EM(cachefiles_trace_read_no_file, "no-file ") \
+ EM(cachefiles_trace_read_seek_error, "seek-error") \
+ E_(cachefiles_trace_read_seek_nxio, "seek-enxio")
+
#define cachefiles_error_traces \
EM(cachefiles_trace_fallocate_error, "fallocate") \
EM(cachefiles_trace_getxattr_error, "getxattr") \
@@ -157,6 +178,7 @@ cachefiles_obj_kill_traces;
cachefiles_obj_ref_traces;
cachefiles_coherency_traces;
cachefiles_trunc_traces;
+cachefiles_prepare_read_traces;
cachefiles_error_traces;

/*
@@ -343,6 +365,105 @@ TRACE_EVENT(cachefiles_coherency,
__entry->content)
);

+TRACE_EVENT(cachefiles_prep_read,
+ TP_PROTO(struct netfs_read_subrequest *sreq,
+ enum netfs_read_source source,
+ enum cachefiles_prepare_read_trace why,
+ ino_t cache_inode),
+
+ TP_ARGS(sreq, source, why, cache_inode),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, rreq )
+ __field(unsigned short, index )
+ __field(unsigned short, flags )
+ __field(enum netfs_read_source, source )
+ __field(enum cachefiles_prepare_read_trace, why )
+ __field(size_t, len )
+ __field(loff_t, start )
+ __field(unsigned int, netfs_inode )
+ __field(unsigned int, cache_inode )
+ ),
+
+ TP_fast_assign(
+ __entry->rreq = sreq->rreq->debug_id;
+ __entry->index = sreq->debug_index;
+ __entry->flags = sreq->flags;
+ __entry->source = source;
+ __entry->why = why;
+ __entry->len = sreq->len;
+ __entry->start = sreq->start;
+ __entry->netfs_inode = sreq->rreq->inode->i_ino;
+ __entry->cache_inode = cache_inode;
+ ),
+
+ TP_printk("R=%08x[%u] %s %s f=%02x s=%llx %zx ni=%x b=%x",
+ __entry->rreq, __entry->index,
+ __print_symbolic(__entry->source, netfs_sreq_sources),
+ __print_symbolic(__entry->why, cachefiles_prepare_read_traces),
+ __entry->flags,
+ __entry->start, __entry->len,
+ __entry->netfs_inode, __entry->cache_inode)
+ );
+
+TRACE_EVENT(cachefiles_read,
+ TP_PROTO(struct cachefiles_object *obj,
+ struct inode *backer,
+ loff_t start,
+ size_t len),
+
+ TP_ARGS(obj, backer, start, len),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, obj )
+ __field(unsigned int, backer )
+ __field(size_t, len )
+ __field(loff_t, start )
+ ),
+
+ TP_fast_assign(
+ __entry->obj = obj->debug_id;
+ __entry->backer = backer->i_ino;
+ __entry->start = start;
+ __entry->len = len;
+ ),
+
+ TP_printk("o=%08x b=%08x s=%llx l=%zx",
+ __entry->obj,
+ __entry->backer,
+ __entry->start,
+ __entry->len)
+ );
+
+TRACE_EVENT(cachefiles_write,
+ TP_PROTO(struct cachefiles_object *obj,
+ struct inode *backer,
+ loff_t start,
+ size_t len),
+
+ TP_ARGS(obj, backer, start, len),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, obj )
+ __field(unsigned int, backer )
+ __field(size_t, len )
+ __field(loff_t, start )
+ ),
+
+ TP_fast_assign(
+ __entry->obj = obj->debug_id;
+ __entry->backer = backer->i_ino;
+ __entry->start = start;
+ __entry->len = len;
+ ),
+
+ TP_printk("o=%08x b=%08x s=%llx l=%zx",
+ __entry->obj,
+ __entry->backer,
+ __entry->start,
+ __entry->len)
+ );
+
TRACE_EVENT(cachefiles_trunc,
TP_PROTO(struct cachefiles_object *obj, struct inode *backer,
loff_t from, loff_t to, enum cachefiles_trunc_trace why),



2021-12-22 23:27:29

by David Howells

[permalink] [raw]
Subject: [PATCH v4 52/68] fscache, cachefiles: Store the volume coherency data

Store the volume coherency data in an xattr and check it when we rebind the
volume. If it doesn't match the cache volume is moved to the graveyard and
rebuilt anew.

Changes
=======
ver #4:
- Remove a couple of debugging prints.

Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/163967164397.1823006.2950539849831291830.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/internal.h | 2 +
fs/cachefiles/volume.c | 25 +++++++++++-
fs/cachefiles/xattr.c | 78 +++++++++++++++++++++++++++++++++++++
fs/fscache/volume.c | 14 ++++++-
include/linux/fscache.h | 2 +
include/trace/events/cachefiles.h | 42 +++++++++++++++++++-
6 files changed, 157 insertions(+), 6 deletions(-)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index d5868f5514d3..abdd1b66f6b9 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -270,6 +270,8 @@ extern int cachefiles_remove_object_xattr(struct cachefiles_cache *cache,
struct cachefiles_object *object,
struct dentry *dentry);
extern void cachefiles_prepare_to_write(struct fscache_cookie *cookie);
+extern bool cachefiles_set_volume_xattr(struct cachefiles_volume *volume);
+extern int cachefiles_check_volume_xattr(struct cachefiles_volume *volume);

/*
* Error handling
diff --git a/fs/cachefiles/volume.c b/fs/cachefiles/volume.c
index 4a14f5e72764..89df0ba8ba5e 100644
--- a/fs/cachefiles/volume.c
+++ b/fs/cachefiles/volume.c
@@ -22,7 +22,8 @@ void cachefiles_acquire_volume(struct fscache_volume *vcookie)
struct dentry *vdentry, *fan;
size_t len;
char *name;
- int n_accesses, i;
+ bool is_new = false;
+ int ret, n_accesses, i;

_enter("");

@@ -43,11 +44,29 @@ void cachefiles_acquire_volume(struct fscache_volume *vcookie)
memcpy(name + 1, vcookie->key + 1, len);
name[len + 1] = 0;

- vdentry = cachefiles_get_directory(cache, cache->store, name, NULL);
+retry:
+ vdentry = cachefiles_get_directory(cache, cache->store, name, &is_new);
if (IS_ERR(vdentry))
goto error_name;
volume->dentry = vdentry;

+ if (is_new) {
+ if (!cachefiles_set_volume_xattr(volume))
+ goto error_dir;
+ } else {
+ ret = cachefiles_check_volume_xattr(volume);
+ if (ret < 0) {
+ if (ret != -ESTALE)
+ goto error_dir;
+ inode_lock_nested(d_inode(cache->store), I_MUTEX_PARENT);
+ cachefiles_bury_object(cache, NULL, cache->store, vdentry,
+ FSCACHE_VOLUME_IS_WEIRD);
+ cachefiles_put_directory(volume->dentry);
+ cond_resched();
+ goto retry;
+ }
+ }
+
for (i = 0; i < 256; i++) {
sprintf(name, "@%02x", i);
fan = cachefiles_get_directory(cache, vdentry, name, NULL);
@@ -74,6 +93,7 @@ void cachefiles_acquire_volume(struct fscache_volume *vcookie)
error_fan:
for (i = 0; i < 256; i++)
cachefiles_put_directory(volume->fanout[i]);
+error_dir:
cachefiles_put_directory(volume->dentry);
error_name:
kfree(name);
@@ -114,5 +134,6 @@ void cachefiles_free_volume(struct fscache_volume *vcookie)
void cachefiles_withdraw_volume(struct cachefiles_volume *volume)
{
fscache_withdraw_volume(volume->vcookie);
+ cachefiles_set_volume_xattr(volume);
__cachefiles_free_volume(volume);
}
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index 0601c46a22ef..83f41bd0c3a9 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -179,3 +179,81 @@ void cachefiles_prepare_to_write(struct fscache_cookie *cookie)
cachefiles_end_secure(cache, saved_cred);
}
}
+
+/*
+ * Set the state xattr on a volume directory.
+ */
+bool cachefiles_set_volume_xattr(struct cachefiles_volume *volume)
+{
+ unsigned int len = volume->vcookie->coherency_len;
+ const void *p = volume->vcookie->coherency;
+ struct dentry *dentry = volume->dentry;
+ int ret;
+
+ _enter("%x,#%d", volume->vcookie->debug_id, len);
+
+ ret = cachefiles_inject_write_error();
+ if (ret == 0)
+ ret = vfs_setxattr(&init_user_ns, dentry, cachefiles_xattr_cache,
+ p, len, 0);
+ if (ret < 0) {
+ trace_cachefiles_vfs_error(NULL, d_inode(dentry), ret,
+ cachefiles_trace_setxattr_error);
+ trace_cachefiles_vol_coherency(volume, d_inode(dentry)->i_ino,
+ cachefiles_coherency_vol_set_fail);
+ if (ret != -ENOMEM)
+ cachefiles_io_error(
+ volume->cache, "Failed to set xattr with error %d", ret);
+ } else {
+ trace_cachefiles_vol_coherency(volume, d_inode(dentry)->i_ino,
+ cachefiles_coherency_vol_set_ok);
+ }
+
+ _leave(" = %d", ret);
+ return ret == 0;
+}
+
+/*
+ * Check the consistency between the backing cache and the volume cookie.
+ */
+int cachefiles_check_volume_xattr(struct cachefiles_volume *volume)
+{
+ struct cachefiles_xattr *buf;
+ struct dentry *dentry = volume->dentry;
+ unsigned int len = volume->vcookie->coherency_len;
+ const void *p = volume->vcookie->coherency;
+ enum cachefiles_coherency_trace why;
+ ssize_t xlen;
+ int ret = -ESTALE;
+
+ _enter("");
+
+ buf = kmalloc(len, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ xlen = cachefiles_inject_read_error();
+ if (xlen == 0)
+ xlen = vfs_getxattr(&init_user_ns, dentry, cachefiles_xattr_cache, buf, len);
+ if (xlen != len) {
+ if (xlen < 0) {
+ trace_cachefiles_vfs_error(NULL, d_inode(dentry), xlen,
+ cachefiles_trace_getxattr_error);
+ if (xlen == -EIO)
+ cachefiles_io_error(
+ volume->cache,
+ "Failed to read xattr with error %zd", xlen);
+ }
+ why = cachefiles_coherency_vol_check_xattr;
+ } else if (memcmp(buf->data, p, len) != 0) {
+ why = cachefiles_coherency_vol_check_cmp;
+ } else {
+ why = cachefiles_coherency_vol_check_ok;
+ ret = 0;
+ }
+
+ trace_cachefiles_vol_coherency(volume, d_inode(dentry)->i_ino, why);
+ kfree(buf);
+ _leave(" = %d", ret);
+ return ret;
+}
diff --git a/fs/fscache/volume.c b/fs/fscache/volume.c
index e1a8e92a6adb..a57c6cbee858 100644
--- a/fs/fscache/volume.c
+++ b/fs/fscache/volume.c
@@ -205,15 +205,22 @@ static struct fscache_volume *fscache_alloc_volume(const char *volume_key,
size_t klen, hlen;
char *key;

+ if (!coherency_data)
+ coherency_len = 0;
+
cache = fscache_lookup_cache(cache_name, false);
if (IS_ERR(cache))
return NULL;

- volume = kzalloc(sizeof(*volume), GFP_KERNEL);
+ volume = kzalloc(struct_size(volume, coherency, coherency_len),
+ GFP_KERNEL);
if (!volume)
goto err_cache;

volume->cache = cache;
+ volume->coherency_len = coherency_len;
+ if (coherency_data)
+ memcpy(volume->coherency, coherency_data, coherency_len);
INIT_LIST_HEAD(&volume->proc_link);
INIT_WORK(&volume->work, fscache_create_volume_work);
refcount_set(&volume->ref, 1);
@@ -421,8 +428,11 @@ void __fscache_relinquish_volume(struct fscache_volume *volume,
if (WARN_ON(test_and_set_bit(FSCACHE_VOLUME_RELINQUISHED, &volume->flags)))
return;

- if (invalidate)
+ if (invalidate) {
set_bit(FSCACHE_VOLUME_INVALIDATE, &volume->flags);
+ } else if (coherency_data) {
+ memcpy(volume->coherency, coherency_data, volume->coherency_len);
+ }

fscache_put_volume(volume, fscache_volume_put_relinquish);
}
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 86b1c0db1de5..7bd35f60d19a 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -87,6 +87,8 @@ struct fscache_volume {
#define FSCACHE_VOLUME_COLLIDED_WITH 2 /* Volume was collided with */
#define FSCACHE_VOLUME_ACQUIRE_PENDING 3 /* Volume is waiting to complete acquisition */
#define FSCACHE_VOLUME_CREATING 4 /* Volume is being created on disk */
+ u8 coherency_len; /* Length of the coherency data */
+ u8 coherency[]; /* Coherency data */
};

/*
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index ab1376ebc3ab..1172529b5b49 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -40,6 +40,7 @@ enum fscache_why_object_killed {
FSCACHE_OBJECT_NO_SPACE,
FSCACHE_OBJECT_WAS_RETIRED,
FSCACHE_OBJECT_WAS_CULLED,
+ FSCACHE_VOLUME_IS_WEIRD,
};

enum cachefiles_coherency_trace {
@@ -53,6 +54,11 @@ enum cachefiles_coherency_trace {
cachefiles_coherency_check_xattr,
cachefiles_coherency_set_fail,
cachefiles_coherency_set_ok,
+ cachefiles_coherency_vol_check_cmp,
+ cachefiles_coherency_vol_check_ok,
+ cachefiles_coherency_vol_check_xattr,
+ cachefiles_coherency_vol_set_fail,
+ cachefiles_coherency_vol_set_ok,
};

enum cachefiles_trunc_trace {
@@ -103,7 +109,8 @@ enum cachefiles_error_trace {
EM(FSCACHE_OBJECT_INVALIDATED, "inval") \
EM(FSCACHE_OBJECT_NO_SPACE, "no_space") \
EM(FSCACHE_OBJECT_WAS_RETIRED, "was_retired") \
- E_(FSCACHE_OBJECT_WAS_CULLED, "was_culled")
+ EM(FSCACHE_OBJECT_WAS_CULLED, "was_culled") \
+ E_(FSCACHE_VOLUME_IS_WEIRD, "volume_weird")

#define cachefiles_obj_ref_traces \
EM(cachefiles_obj_get_ioreq, "GET ioreq") \
@@ -129,7 +136,12 @@ enum cachefiles_error_trace {
EM(cachefiles_coherency_check_type, "BAD type") \
EM(cachefiles_coherency_check_xattr, "BAD xatt") \
EM(cachefiles_coherency_set_fail, "SET fail") \
- E_(cachefiles_coherency_set_ok, "SET ok ")
+ EM(cachefiles_coherency_set_ok, "SET ok ") \
+ EM(cachefiles_coherency_vol_check_cmp, "VOL BAD cmp ") \
+ EM(cachefiles_coherency_vol_check_ok, "VOL OK ") \
+ EM(cachefiles_coherency_vol_check_xattr,"VOL BAD xatt") \
+ EM(cachefiles_coherency_vol_set_fail, "VOL SET fail") \
+ E_(cachefiles_coherency_vol_set_ok, "VOL SET ok ")

#define cachefiles_trunc_traces \
EM(cachefiles_trunc_dio_adjust, "DIOADJ") \
@@ -365,6 +377,32 @@ TRACE_EVENT(cachefiles_coherency,
__entry->content)
);

+TRACE_EVENT(cachefiles_vol_coherency,
+ TP_PROTO(struct cachefiles_volume *volume,
+ ino_t ino,
+ enum cachefiles_coherency_trace why),
+
+ TP_ARGS(volume, ino, why),
+
+ /* Note that obj may be NULL */
+ TP_STRUCT__entry(
+ __field(unsigned int, vol )
+ __field(enum cachefiles_coherency_trace, why )
+ __field(u64, ino )
+ ),
+
+ TP_fast_assign(
+ __entry->vol = volume->vcookie->debug_id;
+ __entry->why = why;
+ __entry->ino = ino;
+ ),
+
+ TP_printk("V=%08x %s i=%llx",
+ __entry->vol,
+ __print_symbolic(__entry->why, cachefiles_coherency_traces),
+ __entry->ino)
+ );
+
TRACE_EVENT(cachefiles_prep_read,
TP_PROTO(struct netfs_read_subrequest *sreq,
enum netfs_read_source source,



2021-12-22 23:27:44

by David Howells

[permalink] [raw]
Subject: [PATCH v4 53/68] cachefiles: Allow cachefiles to actually function

Remove the block that allowed cachefiles to be compiled but prevented it
from actually starting a cache.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819649497.215744.2872504990762846767.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906956491.143852.4951522864793559189.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967165374.1823006.14248189932202373809.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/daemon.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 61e8740d01be..45af558a696e 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -703,9 +703,7 @@ static int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)
return -EBUSY;
}

- pr_warn("Cache is disabled for development\n");
- return -ENOANO; // Don't allow the cache to operate yet
- //return cachefiles_add_cache(cache);
+ return cachefiles_add_cache(cache);
}

/*



2021-12-22 23:27:59

by David Howells

[permalink] [raw]
Subject: [PATCH v4 54/68] fscache, cachefiles: Display stats of no-space events

Add stat counters of no-space events that caused caching not to happen and
display in /proc/fs/fscache/stats.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819653216.215744.17210522251617386509.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906958369.143852.7257100711818401748.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967166917.1823006.14842444049198947892.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/cache.c | 18 +++++++++++++++---
fs/cachefiles/daemon.c | 2 +-
fs/cachefiles/internal.h | 11 +++++++++--
fs/cachefiles/io.c | 7 +++++--
fs/cachefiles/namei.c | 6 ++++--
fs/fscache/stats.c | 8 ++++++++
include/linux/fscache-cache.h | 6 ++++++
7 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/fs/cachefiles/cache.c b/fs/cachefiles/cache.c
index e2cbbc08bad9..809519286335 100644
--- a/fs/cachefiles/cache.c
+++ b/fs/cachefiles/cache.c
@@ -147,7 +147,7 @@ int cachefiles_add_cache(struct cachefiles_cache *cache)
pr_info("File cache on %s registered\n", cache_cookie->name);

/* check how much space the cache has */
- cachefiles_has_space(cache, 0, 0);
+ cachefiles_has_space(cache, 0, 0, cachefiles_has_space_check);
cachefiles_end_secure(cache, saved_cred);
_leave(" = 0 [%px]", cache->cache);
return 0;
@@ -175,7 +175,8 @@ int cachefiles_add_cache(struct cachefiles_cache *cache)
* cache
*/
int cachefiles_has_space(struct cachefiles_cache *cache,
- unsigned fnr, unsigned bnr)
+ unsigned fnr, unsigned bnr,
+ enum cachefiles_has_space_for reason)
{
struct kstatfs stats;
u64 b_avail, b_writing;
@@ -233,7 +234,7 @@ int cachefiles_has_space(struct cachefiles_cache *cache,
ret = -ENOBUFS;
if (stats.f_ffree < cache->fstop ||
b_avail < cache->bstop)
- goto begin_cull;
+ goto stop_and_begin_cull;

ret = 0;
if (stats.f_ffree < cache->fcull ||
@@ -252,6 +253,17 @@ int cachefiles_has_space(struct cachefiles_cache *cache,
//_leave(" = 0");
return 0;

+stop_and_begin_cull:
+ switch (reason) {
+ case cachefiles_has_space_for_write:
+ fscache_count_no_write_space();
+ break;
+ case cachefiles_has_space_for_create:
+ fscache_count_no_create_space();
+ break;
+ default:
+ break;
+ }
begin_cull:
if (!test_and_set_bit(CACHEFILES_CULLING, &cache->flags)) {
_debug("### CULL CACHE ###");
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 45af558a696e..40a792421fc1 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -170,7 +170,7 @@ static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
return 0;

/* check how much space the cache has */
- cachefiles_has_space(cache, 0, 0);
+ cachefiles_has_space(cache, 0, 0, cachefiles_has_space_check);

/* summarise */
f_released = atomic_xchg(&cache->f_released, 0);
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index abdd1b66f6b9..8dd54d9375b6 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -130,10 +130,17 @@ static inline void cachefiles_state_changed(struct cachefiles_cache *cache)
* cache.c
*/
extern int cachefiles_add_cache(struct cachefiles_cache *cache);
-extern int cachefiles_has_space(struct cachefiles_cache *cache,
- unsigned fnr, unsigned bnr);
extern void cachefiles_withdraw_cache(struct cachefiles_cache *cache);

+enum cachefiles_has_space_for {
+ cachefiles_has_space_check,
+ cachefiles_has_space_for_write,
+ cachefiles_has_space_for_create,
+};
+extern int cachefiles_has_space(struct cachefiles_cache *cache,
+ unsigned fnr, unsigned bnr,
+ enum cachefiles_has_space_for reason);
+
/*
* daemon.c
*/
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 6f4dce0cfc36..60b1eac2ce78 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -468,7 +468,8 @@ static int __cachefiles_prepare_write(struct netfs_cache_resources *cres,
* space, we need to see if it's fully allocated. If it's not, we may
* want to cull it.
*/
- if (cachefiles_has_space(cache, 0, *_len / PAGE_SIZE) == 0)
+ if (cachefiles_has_space(cache, 0, *_len / PAGE_SIZE,
+ cachefiles_has_space_check) == 0)
return 0; /* Enough space to simply overwrite the whole block */

pos = cachefiles_inject_read_error();
@@ -483,6 +484,7 @@ static int __cachefiles_prepare_write(struct netfs_cache_resources *cres,
return 0; /* Fully allocated */

/* Partially allocated, but insufficient space: cull. */
+ fscache_count_no_write_space();
ret = cachefiles_inject_remove_error();
if (ret == 0)
ret = vfs_fallocate(file, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
@@ -498,7 +500,8 @@ static int __cachefiles_prepare_write(struct netfs_cache_resources *cres,
return ret;

check_space:
- return cachefiles_has_space(cache, 0, *_len / PAGE_SIZE);
+ return cachefiles_has_space(cache, 0, *_len / PAGE_SIZE,
+ cachefiles_has_space_for_write);
}

static int cachefiles_prepare_write(struct netfs_cache_resources *cres,
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index b549e9f79c01..ab3ca598acac 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -115,7 +115,8 @@ struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,

/* we need to create the subdir if it doesn't exist yet */
if (d_is_negative(subdir)) {
- ret = cachefiles_has_space(cache, 1, 0);
+ ret = cachefiles_has_space(cache, 1, 0,
+ cachefiles_has_space_for_create);
if (ret < 0)
goto mkdir_error;

@@ -513,7 +514,8 @@ static bool cachefiles_create_file(struct cachefiles_object *object)
struct file *file;
int ret;

- ret = cachefiles_has_space(object->volume->cache, 1, 0);
+ ret = cachefiles_has_space(object->volume->cache, 1, 0,
+ cachefiles_has_space_for_create);
if (ret < 0)
return false;

diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index 798ee68b3e9d..db2f4e225dd9 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -42,6 +42,10 @@ atomic_t fscache_n_read;
EXPORT_SYMBOL(fscache_n_read);
atomic_t fscache_n_write;
EXPORT_SYMBOL(fscache_n_write);
+atomic_t fscache_n_no_write_space;
+EXPORT_SYMBOL(fscache_n_no_write_space);
+atomic_t fscache_n_no_create_space;
+EXPORT_SYMBOL(fscache_n_no_create_space);

/*
* display the general statistics
@@ -82,6 +86,10 @@ int fscache_stats_show(struct seq_file *m, void *v)
atomic_read(&fscache_n_relinquishes_retire),
atomic_read(&fscache_n_relinquishes_dropped));

+ seq_printf(m, "NoSpace: nwr=%u ncr=%u\n",
+ atomic_read(&fscache_n_no_write_space),
+ atomic_read(&fscache_n_no_create_space));
+
seq_printf(m, "IO : rd=%u wr=%u\n",
atomic_read(&fscache_n_read),
atomic_read(&fscache_n_write));
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 3fa4902dc87c..007e47f38610 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -186,11 +186,17 @@ static inline void fscache_wait_for_objects(struct fscache_cache *cache)
#ifdef CONFIG_FSCACHE_STATS
extern atomic_t fscache_n_read;
extern atomic_t fscache_n_write;
+extern atomic_t fscache_n_no_write_space;
+extern atomic_t fscache_n_no_create_space;
#define fscache_count_read() atomic_inc(&fscache_n_read)
#define fscache_count_write() atomic_inc(&fscache_n_write)
+#define fscache_count_no_write_space() atomic_inc(&fscache_n_no_write_space)
+#define fscache_count_no_create_space() atomic_inc(&fscache_n_no_create_space)
#else
#define fscache_count_read() do {} while(0)
#define fscache_count_write() do {} while(0)
+#define fscache_count_no_write_space() do {} while(0)
+#define fscache_count_no_create_space() do {} while(0)
#endif

#endif /* _LINUX_FSCACHE_CACHE_H */



2021-12-22 23:28:13

by David Howells

[permalink] [raw]
Subject: [PATCH v4 55/68] fscache, cachefiles: Display stat of culling events

Add a stat counter of culling events whereby the cache backend culls a file
to make space (when asked by cachefilesd in this case) and display in
/proc/fs/fscache/stats.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819654165.215744.3797804661644212436.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906961387.143852.9291157239960289090.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967168266.1823006.14436200166581605746.stgit@warthog.procyon.org.uk/ # v3
---

fs/cachefiles/namei.c | 1 +
fs/fscache/stats.c | 7 +++++--
include/linux/fscache-cache.h | 3 +++
3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index ab3ca598acac..9bd692870617 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -805,6 +805,7 @@ int cachefiles_cull(struct cachefiles_cache *cache, struct dentry *dir,
if (ret < 0)
goto error;

+ fscache_count_culled();
dput(victim);
_leave(" = 0");
return 0;
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index db2f4e225dd9..fc94e5e79f1c 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -46,6 +46,8 @@ atomic_t fscache_n_no_write_space;
EXPORT_SYMBOL(fscache_n_no_write_space);
atomic_t fscache_n_no_create_space;
EXPORT_SYMBOL(fscache_n_no_create_space);
+atomic_t fscache_n_culled;
+EXPORT_SYMBOL(fscache_n_culled);

/*
* display the general statistics
@@ -86,9 +88,10 @@ int fscache_stats_show(struct seq_file *m, void *v)
atomic_read(&fscache_n_relinquishes_retire),
atomic_read(&fscache_n_relinquishes_dropped));

- seq_printf(m, "NoSpace: nwr=%u ncr=%u\n",
+ seq_printf(m, "NoSpace: nwr=%u ncr=%u cull=%u\n",
atomic_read(&fscache_n_no_write_space),
- atomic_read(&fscache_n_no_create_space));
+ atomic_read(&fscache_n_no_create_space),
+ atomic_read(&fscache_n_culled));

seq_printf(m, "IO : rd=%u wr=%u\n",
atomic_read(&fscache_n_read),
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 007e47f38610..a174cedf4d90 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -188,15 +188,18 @@ extern atomic_t fscache_n_read;
extern atomic_t fscache_n_write;
extern atomic_t fscache_n_no_write_space;
extern atomic_t fscache_n_no_create_space;
+extern atomic_t fscache_n_culled;
#define fscache_count_read() atomic_inc(&fscache_n_read)
#define fscache_count_write() atomic_inc(&fscache_n_write)
#define fscache_count_no_write_space() atomic_inc(&fscache_n_no_write_space)
#define fscache_count_no_create_space() atomic_inc(&fscache_n_no_create_space)
+#define fscache_count_culled() atomic_inc(&fscache_n_culled)
#else
#define fscache_count_read() do {} while(0)
#define fscache_count_write() do {} while(0)
#define fscache_count_no_write_space() do {} while(0)
#define fscache_count_no_create_space() do {} while(0)
+#define fscache_count_culled() do {} while(0)
#endif

#endif /* _LINUX_FSCACHE_CACHE_H */



2021-12-22 23:28:25

by David Howells

[permalink] [raw]
Subject: [PATCH v4 56/68] afs: Convert afs to use the new fscache API

Change the afs filesystem to support the new afs driver.

The following changes have been made:

(1) The fscache_netfs struct is no more, and there's no need to register
the filesystem as a whole. There's also no longer a cell cookie.

(2) The volume cookie is now an fscache_volume cookie, allocated with
fscache_acquire_volume(). This function takes three parameters: a
string representing the "volume" in the index, a string naming the
cache to use (or NULL) and a u64 that conveys coherency metadata for
the volume.

For afs, I've made it render the volume name string as:

"afs,<cell>,<volume_id>"

and the coherency data is currently 0.

(3) The fscache_cookie_def is no more and needed information is passed
directly to fscache_acquire_cookie(). The cache no longer calls back
into the filesystem, but rather metadata changes are indicated at
other times.

fscache_acquire_cookie() is passed the same keying and coherency
information as before, except that these are now stored in big endian
form instead of cpu endian. This makes the cache more copyable.

(4) fscache_use_cookie() and fscache_unuse_cookie() are called when a file
is opened or closed to prevent a cache file from being culled and to
keep resources to hand that are needed to do I/O.

fscache_use_cookie() is given an indication if the cache is likely to
be modified locally (e.g. the file is open for writing).

fscache_unuse_cookie() is given a coherency update if we had the file
open for writing and will update that.

(5) fscache_invalidate() is now given uptodate auxiliary data and a file
size. It can also take a flag to indicate if this was due to a DIO
write. This is wrapped into afs_fscache_invalidate() now for
convenience.

(6) fscache_resize() now gets called from the finalisation of
afs_setattr(), and afs_setattr() does use/unuse of the cookie around
the call to support this.

(7) fscache_note_page_release() is called from afs_release_page().

(8) Use a killable wait in nfs_vm_page_mkwrite() when waiting for
PG_fscache to be cleared.

Render the parts of the cookie key for an afs inode cookie as big endian.

Changes
=======
ver #2:
- Use gfpflags_allow_blocking() rather than using flag directly.
- fscache_acquire_volume() now returns errors.

Signed-off-by: David Howells <[email protected]>
cc: Marc Dionne <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/163819661382.215744.1485608824741611837.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906970002.143852.17678518584089878259.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967174665.1823006.1301789965454084220.stgit@warthog.procyon.org.uk/ # v3
---

fs/afs/Kconfig | 2 +-
fs/afs/Makefile | 3 --
fs/afs/cache.c | 68 -----------------------------------------------------
fs/afs/cell.c | 12 ---------
fs/afs/file.c | 29 ++++++++++++++++++-----
fs/afs/inode.c | 50 +++++++++++++++++++--------------------
fs/afs/internal.h | 32 ++++++++++++++-----------
fs/afs/main.c | 14 -----------
fs/afs/volume.c | 29 +++++++++++++++++------
fs/afs/write.c | 1 -
10 files changed, 89 insertions(+), 151 deletions(-)
delete mode 100644 fs/afs/cache.c

diff --git a/fs/afs/Kconfig b/fs/afs/Kconfig
index c40cdfcc25d1..fc8ba9142f2f 100644
--- a/fs/afs/Kconfig
+++ b/fs/afs/Kconfig
@@ -25,7 +25,7 @@ config AFS_DEBUG

config AFS_FSCACHE
bool "Provide AFS client caching support"
- depends on AFS_FS=m && FSCACHE_OLD_API || AFS_FS=y && FSCACHE_OLD_API=y
+ depends on AFS_FS=m && FSCACHE || AFS_FS=y && FSCACHE=y
help
Say Y here if you want AFS data to be cached locally on disk through
the generic filesystem cache manager
diff --git a/fs/afs/Makefile b/fs/afs/Makefile
index 75c4e4043d1d..e8956b65d7ff 100644
--- a/fs/afs/Makefile
+++ b/fs/afs/Makefile
@@ -3,10 +3,7 @@
# Makefile for Red Hat Linux AFS client.
#

-afs-cache-$(CONFIG_AFS_FSCACHE) := cache.o
-
kafs-y := \
- $(afs-cache-y) \
addr_list.o \
callback.o \
cell.o \
diff --git a/fs/afs/cache.c b/fs/afs/cache.c
deleted file mode 100644
index 037af93e3aba..000000000000
--- a/fs/afs/cache.c
+++ /dev/null
@@ -1,68 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/* AFS caching stuff
- *
- * Copyright (C) 2008 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells ([email protected])
- */
-
-#include <linux/sched.h>
-#include "internal.h"
-
-static enum fscache_checkaux afs_vnode_cache_check_aux(void *cookie_netfs_data,
- const void *buffer,
- uint16_t buflen,
- loff_t object_size);
-
-struct fscache_netfs afs_cache_netfs = {
- .name = "afs",
- .version = 2,
-};
-
-struct fscache_cookie_def afs_cell_cache_index_def = {
- .name = "AFS.cell",
- .type = FSCACHE_COOKIE_TYPE_INDEX,
-};
-
-struct fscache_cookie_def afs_volume_cache_index_def = {
- .name = "AFS.volume",
- .type = FSCACHE_COOKIE_TYPE_INDEX,
-};
-
-struct fscache_cookie_def afs_vnode_cache_index_def = {
- .name = "AFS.vnode",
- .type = FSCACHE_COOKIE_TYPE_DATAFILE,
- .check_aux = afs_vnode_cache_check_aux,
-};
-
-/*
- * check that the auxiliary data indicates that the entry is still valid
- */
-static enum fscache_checkaux afs_vnode_cache_check_aux(void *cookie_netfs_data,
- const void *buffer,
- uint16_t buflen,
- loff_t object_size)
-{
- struct afs_vnode *vnode = cookie_netfs_data;
- struct afs_vnode_cache_aux aux;
-
- _enter("{%llx,%x,%llx},%p,%u",
- vnode->fid.vnode, vnode->fid.unique, vnode->status.data_version,
- buffer, buflen);
-
- memcpy(&aux, buffer, sizeof(aux));
-
- /* check the size of the data is what we're expecting */
- if (buflen != sizeof(aux)) {
- _leave(" = OBSOLETE [len %hx != %zx]", buflen, sizeof(aux));
- return FSCACHE_CHECKAUX_OBSOLETE;
- }
-
- if (vnode->status.data_version != aux.data_version) {
- _leave(" = OBSOLETE [vers %llx != %llx]",
- aux.data_version, vnode->status.data_version);
- return FSCACHE_CHECKAUX_OBSOLETE;
- }
-
- _leave(" = SUCCESS");
- return FSCACHE_CHECKAUX_OKAY;
-}
diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index d88407fb9bc0..07ad744eef77 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -680,13 +680,6 @@ static int afs_activate_cell(struct afs_net *net, struct afs_cell *cell)
return ret;
}

-#ifdef CONFIG_AFS_FSCACHE
- cell->cache = fscache_acquire_cookie(afs_cache_netfs.primary_index,
- &afs_cell_cache_index_def,
- cell->name, strlen(cell->name),
- NULL, 0,
- cell, 0, true);
-#endif
ret = afs_proc_cell_setup(cell);
if (ret < 0)
return ret;
@@ -723,11 +716,6 @@ static void afs_deactivate_cell(struct afs_net *net, struct afs_cell *cell)
afs_dynroot_rmdir(net, cell);
mutex_unlock(&net->proc_cells_lock);

-#ifdef CONFIG_AFS_FSCACHE
- fscache_relinquish_cookie(cell->cache, NULL, false);
- cell->cache = NULL;
-#endif
-
_leave("");
}

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 97a51e1de55c..be23635f35b8 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -158,7 +158,9 @@ int afs_open(struct inode *inode, struct file *file)

if (file->f_flags & O_TRUNC)
set_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags);
-
+
+ fscache_use_cookie(afs_vnode_cache(vnode), file->f_mode & FMODE_WRITE);
+
file->private_data = af;
_leave(" = 0");
return 0;
@@ -177,8 +179,10 @@ int afs_open(struct inode *inode, struct file *file)
*/
int afs_release(struct inode *inode, struct file *file)
{
+ struct afs_vnode_cache_aux aux;
struct afs_vnode *vnode = AFS_FS_I(inode);
struct afs_file *af = file->private_data;
+ loff_t i_size;
int ret = 0;

_enter("{%llx:%llu},", vnode->fid.vid, vnode->fid.vnode);
@@ -189,6 +193,15 @@ int afs_release(struct inode *inode, struct file *file)
file->private_data = NULL;
if (af->wb)
afs_put_wb_key(af->wb);
+
+ if ((file->f_mode & FMODE_WRITE)) {
+ i_size = i_size_read(&vnode->vfs_inode);
+ afs_set_cache_aux(vnode, &aux);
+ fscache_unuse_cookie(afs_vnode_cache(vnode), &aux, &i_size);
+ } else {
+ fscache_unuse_cookie(afs_vnode_cache(vnode), NULL, NULL);
+ }
+
key_put(af->key);
kfree(af);
afs_prune_wb_keys(vnode);
@@ -352,7 +365,9 @@ static void afs_init_rreq(struct netfs_read_request *rreq, struct file *file)

static bool afs_is_cache_enabled(struct inode *inode)
{
- return fscache_cookie_enabled(afs_vnode_cache(AFS_FS_I(inode)));
+ struct fscache_cookie *cookie = afs_vnode_cache(AFS_FS_I(inode));
+
+ return fscache_cookie_enabled(cookie) && cookie->cache_priv;
}

static int afs_begin_cache_operation(struct netfs_read_request *rreq)
@@ -360,7 +375,8 @@ static int afs_begin_cache_operation(struct netfs_read_request *rreq)
#ifdef CONFIG_AFS_FSCACHE
struct afs_vnode *vnode = AFS_FS_I(rreq->inode);

- return fscache_begin_read_operation(rreq, afs_vnode_cache(vnode));
+ return fscache_begin_read_operation(&rreq->cache_resources,
+ afs_vnode_cache(vnode));
#else
return -ENOBUFS;
#endif
@@ -482,23 +498,24 @@ static void afs_invalidatepage(struct page *page, unsigned int offset,
* release a page and clean up its private state if it's not busy
* - return true if the page can now be released, false if not
*/
-static int afs_releasepage(struct page *page, gfp_t gfp_flags)
+static int afs_releasepage(struct page *page, gfp_t gfp)
{
struct folio *folio = page_folio(page);
struct afs_vnode *vnode = AFS_FS_I(folio_inode(folio));

_enter("{{%llx:%llu}[%lu],%lx},%x",
vnode->fid.vid, vnode->fid.vnode, folio_index(folio), folio->flags,
- gfp_flags);
+ gfp);

/* deny if page is being written to the cache and the caller hasn't
* elected to wait */
#ifdef CONFIG_AFS_FSCACHE
if (folio_test_fscache(folio)) {
- if (!(gfp_flags & __GFP_DIRECT_RECLAIM) || !(gfp_flags & __GFP_FS))
+ if (!gfpflags_allow_blocking(gfp) || !(gfp & __GFP_FS))
return false;
folio_wait_fscache(folio);
}
+ fscache_note_page_release(afs_vnode_cache(vnode));
#endif

if (folio_test_private(folio)) {
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 16906eb592d9..509208825907 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -413,9 +413,9 @@ static void afs_get_inode_cache(struct afs_vnode *vnode)
{
#ifdef CONFIG_AFS_FSCACHE
struct {
- u32 vnode_id;
- u32 unique;
- u32 vnode_id_ext[2]; /* Allow for a 96-bit key */
+ __be32 vnode_id;
+ __be32 unique;
+ __be32 vnode_id_ext[2]; /* Allow for a 96-bit key */
} __packed key;
struct afs_vnode_cache_aux aux;

@@ -424,17 +424,18 @@ static void afs_get_inode_cache(struct afs_vnode *vnode)
return;
}

- key.vnode_id = vnode->fid.vnode;
- key.unique = vnode->fid.unique;
- key.vnode_id_ext[0] = vnode->fid.vnode >> 32;
- key.vnode_id_ext[1] = vnode->fid.vnode_hi;
- aux.data_version = vnode->status.data_version;
-
- vnode->cache = fscache_acquire_cookie(vnode->volume->cache,
- &afs_vnode_cache_index_def,
- &key, sizeof(key),
- &aux, sizeof(aux),
- vnode, vnode->status.size, true);
+ key.vnode_id = htonl(vnode->fid.vnode);
+ key.unique = htonl(vnode->fid.unique);
+ key.vnode_id_ext[0] = htonl(vnode->fid.vnode >> 32);
+ key.vnode_id_ext[1] = htonl(vnode->fid.vnode_hi);
+ afs_set_cache_aux(vnode, &aux);
+
+ vnode->cache = fscache_acquire_cookie(
+ vnode->volume->cache,
+ vnode->status.type == AFS_FTYPE_FILE ? 0 : FSCACHE_ADV_SINGLE_CHUNK,
+ &key, sizeof(key),
+ &aux, sizeof(aux),
+ vnode->status.size);
#endif
}

@@ -563,9 +564,7 @@ static void afs_zap_data(struct afs_vnode *vnode)
{
_enter("{%llx:%llu}", vnode->fid.vid, vnode->fid.vnode);

-#ifdef CONFIG_AFS_FSCACHE
- fscache_invalidate(vnode->cache);
-#endif
+ afs_invalidate_cache(vnode, 0);

/* nuke all the non-dirty pages that aren't locked, mapped or being
* written back in a regular file and completely discard the pages in a
@@ -786,14 +785,9 @@ void afs_evict_inode(struct inode *inode)
}

#ifdef CONFIG_AFS_FSCACHE
- {
- struct afs_vnode_cache_aux aux;
-
- aux.data_version = vnode->status.data_version;
- fscache_relinquish_cookie(vnode->cache, &aux,
- test_bit(AFS_VNODE_DELETED, &vnode->flags));
- vnode->cache = NULL;
- }
+ fscache_relinquish_cookie(vnode->cache,
+ test_bit(AFS_VNODE_DELETED, &vnode->flags));
+ vnode->cache = NULL;
#endif

afs_prune_wb_keys(vnode);
@@ -833,6 +827,9 @@ static void afs_setattr_edit_file(struct afs_operation *op)

if (size < i_size)
truncate_pagecache(inode, size);
+ if (size != i_size)
+ fscache_resize_cookie(afs_vnode_cache(vp->vnode),
+ vp->scb.status.size);
}
}

@@ -876,6 +873,8 @@ int afs_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
attr->ia_valid &= ~ATTR_SIZE;
}

+ fscache_use_cookie(afs_vnode_cache(vnode), true);
+
/* flush any dirty data outstanding on a regular file */
if (S_ISREG(vnode->vfs_inode.i_mode))
filemap_write_and_wait(vnode->vfs_inode.i_mapping);
@@ -907,6 +906,7 @@ int afs_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,

out_unlock:
up_write(&vnode->validate_lock);
+ fscache_unuse_cookie(afs_vnode_cache(vnode), NULL, NULL);
_leave(" = %d", ret);
return ret;
}
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index aa4c0d6c9780..1d80649aec72 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -14,7 +14,6 @@
#include <linux/key.h>
#include <linux/workqueue.h>
#include <linux/sched.h>
-#define FSCACHE_USE_NEW_IO_API
#include <linux/fscache.h>
#include <linux/backing-dev.h>
#include <linux/uuid.h>
@@ -364,9 +363,6 @@ struct afs_cell {
struct key *anonymous_key; /* anonymous user key for this cell */
struct work_struct manager; /* Manager for init/deinit/dns */
struct hlist_node proc_link; /* /proc cell list link */
-#ifdef CONFIG_AFS_FSCACHE
- struct fscache_cookie *cache; /* caching cookie */
-#endif
time64_t dns_expiry; /* Time AFSDB/SRV record expires */
time64_t last_inactive; /* Time of last drop of usage count */
atomic_t ref; /* Struct refcount */
@@ -590,7 +586,7 @@ struct afs_volume {
#define AFS_VOLUME_BUSY 5 /* - T if volume busy notice given */
#define AFS_VOLUME_MAYBE_NO_IBULK 6 /* - T if some servers don't have InlineBulkStatus */
#ifdef CONFIG_AFS_FSCACHE
- struct fscache_cookie *cache; /* caching cookie */
+ struct fscache_volume *cache; /* Caching cookie */
#endif
struct afs_server_list __rcu *servers; /* List of servers on which volume resides */
rwlock_t servers_lock; /* Lock for ->servers */
@@ -872,9 +868,24 @@ struct afs_operation {
* Cache auxiliary data.
*/
struct afs_vnode_cache_aux {
- u64 data_version;
+ __be64 data_version;
} __packed;

+static inline void afs_set_cache_aux(struct afs_vnode *vnode,
+ struct afs_vnode_cache_aux *aux)
+{
+ aux->data_version = cpu_to_be64(vnode->status.data_version);
+}
+
+static inline void afs_invalidate_cache(struct afs_vnode *vnode, unsigned int flags)
+{
+ struct afs_vnode_cache_aux aux;
+
+ afs_set_cache_aux(vnode, &aux);
+ fscache_invalidate(afs_vnode_cache(vnode), &aux,
+ i_size_read(&vnode->vfs_inode), flags);
+}
+
/*
* We use folio->private to hold the amount of the folio that we've written to,
* splitting the field into two parts. However, we need to represent a range
@@ -962,13 +973,6 @@ extern void afs_merge_fs_addr6(struct afs_addr_list *, __be32 *, u16);
*/
#ifdef CONFIG_AFS_FSCACHE
extern struct fscache_netfs afs_cache_netfs;
-extern struct fscache_cookie_def afs_cell_cache_index_def;
-extern struct fscache_cookie_def afs_volume_cache_index_def;
-extern struct fscache_cookie_def afs_vnode_cache_index_def;
-#else
-#define afs_cell_cache_index_def (*(struct fscache_cookie_def *) NULL)
-#define afs_volume_cache_index_def (*(struct fscache_cookie_def *) NULL)
-#define afs_vnode_cache_index_def (*(struct fscache_cookie_def *) NULL)
#endif

/*
@@ -1506,7 +1510,7 @@ extern struct afs_vlserver_list *afs_extract_vlserver_list(struct afs_cell *,
* volume.c
*/
extern struct afs_volume *afs_create_volume(struct afs_fs_context *);
-extern void afs_activate_volume(struct afs_volume *);
+extern int afs_activate_volume(struct afs_volume *);
extern void afs_deactivate_volume(struct afs_volume *);
extern struct afs_volume *afs_get_volume(struct afs_volume *, enum afs_volume_trace);
extern void afs_put_volume(struct afs_net *, struct afs_volume *, enum afs_volume_trace);
diff --git a/fs/afs/main.c b/fs/afs/main.c
index 179004b15566..eae288c8d40a 100644
--- a/fs/afs/main.c
+++ b/fs/afs/main.c
@@ -186,13 +186,6 @@ static int __init afs_init(void)
if (!afs_lock_manager)
goto error_lockmgr;

-#ifdef CONFIG_AFS_FSCACHE
- /* we want to be able to cache */
- ret = fscache_register_netfs(&afs_cache_netfs);
- if (ret < 0)
- goto error_cache;
-#endif
-
ret = register_pernet_device(&afs_net_ops);
if (ret < 0)
goto error_net;
@@ -215,10 +208,6 @@ static int __init afs_init(void)
error_fs:
unregister_pernet_device(&afs_net_ops);
error_net:
-#ifdef CONFIG_AFS_FSCACHE
- fscache_unregister_netfs(&afs_cache_netfs);
-error_cache:
-#endif
destroy_workqueue(afs_lock_manager);
error_lockmgr:
destroy_workqueue(afs_async_calls);
@@ -245,9 +234,6 @@ static void __exit afs_exit(void)
proc_remove(afs_proc_symlink);
afs_fs_exit();
unregister_pernet_device(&afs_net_ops);
-#ifdef CONFIG_AFS_FSCACHE
- fscache_unregister_netfs(&afs_cache_netfs);
-#endif
destroy_workqueue(afs_lock_manager);
destroy_workqueue(afs_async_calls);
destroy_workqueue(afs_wq);
diff --git a/fs/afs/volume.c b/fs/afs/volume.c
index f84194b791d3..94a3d247924b 100644
--- a/fs/afs/volume.c
+++ b/fs/afs/volume.c
@@ -268,15 +268,30 @@ void afs_put_volume(struct afs_net *net, struct afs_volume *volume,
/*
* Activate a volume.
*/
-void afs_activate_volume(struct afs_volume *volume)
+int afs_activate_volume(struct afs_volume *volume)
{
#ifdef CONFIG_AFS_FSCACHE
- volume->cache = fscache_acquire_cookie(volume->cell->cache,
- &afs_volume_cache_index_def,
- &volume->vid, sizeof(volume->vid),
- NULL, 0,
- volume, 0, true);
+ struct fscache_volume *vcookie;
+ char *name;
+
+ name = kasprintf(GFP_KERNEL, "afs,%s,%llx",
+ volume->cell->name, volume->vid);
+ if (!name)
+ return -ENOMEM;
+
+ vcookie = fscache_acquire_volume(name, NULL, NULL, 0);
+ if (IS_ERR(vcookie)) {
+ if (vcookie != ERR_PTR(-EBUSY)) {
+ kfree(name);
+ return PTR_ERR(vcookie);
+ }
+ pr_err("AFS: Cache volume key already in use (%s)\n", name);
+ vcookie = NULL;
+ }
+ volume->cache = vcookie;
+ kfree(name);
#endif
+ return 0;
}

/*
@@ -287,7 +302,7 @@ void afs_deactivate_volume(struct afs_volume *volume)
_enter("%s", volume->name);

#ifdef CONFIG_AFS_FSCACHE
- fscache_relinquish_cookie(volume->cache, NULL,
+ fscache_relinquish_volume(volume->cache, NULL,
test_bit(AFS_VOLUME_DELETED, &volume->flags));
volume->cache = NULL;
#endif
diff --git a/fs/afs/write.c b/fs/afs/write.c
index ca4909baf5e6..1c8ee6eae905 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -12,7 +12,6 @@
#include <linux/writeback.h>
#include <linux/pagevec.h>
#include <linux/netfs.h>
-#include <linux/fscache.h>
#include "internal.h"

/*



2021-12-22 23:28:43

by David Howells

[permalink] [raw]
Subject: [PATCH v4 57/68] afs: Copy local writes to the cache when writing to the server

When writing to the server from afs_writepage() or afs_writepages(), copy
the data to the cache object too.

To make this possible, the cookie must have its active users count
incremented when the page is dirtied and kept incremented until we manage
to clean up all the pages. This allows the writeback to take place after
the last file struct is released.

Signed-off-by: David Howells <[email protected]>
cc: Marc Dionne <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/163819662333.215744.7531373404219224438.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906970998.143852.674420788614608063.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967176564.1823006.16666056085593949570.stgit@warthog.procyon.org.uk/ # v3
---

fs/afs/file.c | 6 ++++
fs/afs/inode.c | 8 +++--
fs/afs/internal.h | 5 +++
fs/afs/super.c | 1 +
fs/afs/write.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++-------
5 files changed, 92 insertions(+), 15 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index be23635f35b8..572063dad0b3 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -416,6 +416,12 @@ static void afs_readahead(struct readahead_control *ractl)
netfs_readahead(ractl, &afs_req_ops, NULL);
}

+int afs_write_inode(struct inode *inode, struct writeback_control *wbc)
+{
+ fscache_unpin_writeback(wbc, afs_vnode_cache(AFS_FS_I(inode)));
+ return 0;
+}
+
/*
* Adjust the dirty region of the page on truncation or full invalidation,
* getting rid of the markers altogether if the region is entirely invalidated.
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 509208825907..8db902405031 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -761,9 +761,8 @@ int afs_drop_inode(struct inode *inode)
*/
void afs_evict_inode(struct inode *inode)
{
- struct afs_vnode *vnode;
-
- vnode = AFS_FS_I(inode);
+ struct afs_vnode_cache_aux aux;
+ struct afs_vnode *vnode = AFS_FS_I(inode);

_enter("{%llx:%llu.%d}",
vnode->fid.vid,
@@ -775,6 +774,9 @@ void afs_evict_inode(struct inode *inode)
ASSERTCMP(inode->i_ino, ==, vnode->fid.vnode);

truncate_inode_pages_final(&inode->i_data);
+
+ afs_set_cache_aux(vnode, &aux);
+ fscache_clear_inode_writeback(afs_vnode_cache(vnode), inode, &aux);
clear_inode(inode);

while (!list_empty(&vnode->wb_keys)) {
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 1d80649aec72..b6f02321fc09 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -1072,6 +1072,7 @@ extern int afs_release(struct inode *, struct file *);
extern int afs_fetch_data(struct afs_vnode *, struct afs_read *);
extern struct afs_read *afs_alloc_read(gfp_t);
extern void afs_put_read(struct afs_read *);
+extern int afs_write_inode(struct inode *, struct writeback_control *);

static inline struct afs_read *afs_get_read(struct afs_read *req)
{
@@ -1519,7 +1520,11 @@ extern int afs_check_volume_status(struct afs_volume *, struct afs_operation *);
/*
* write.c
*/
+#ifdef CONFIG_AFS_FSCACHE
extern int afs_set_page_dirty(struct page *);
+#else
+#define afs_set_page_dirty __set_page_dirty_nobuffers
+#endif
extern int afs_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned flags,
struct page **pagep, void **fsdata);
diff --git a/fs/afs/super.c b/fs/afs/super.c
index d110def8aa8e..af7cbd9949c5 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -55,6 +55,7 @@ int afs_net_id;
static const struct super_operations afs_super_ops = {
.statfs = afs_statfs,
.alloc_inode = afs_alloc_inode,
+ .write_inode = afs_write_inode,
.drop_inode = afs_drop_inode,
.destroy_inode = afs_destroy_inode,
.free_inode = afs_free_inode,
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 1c8ee6eae905..5e9157d0da29 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -14,14 +14,28 @@
#include <linux/netfs.h>
#include "internal.h"

+static void afs_write_to_cache(struct afs_vnode *vnode, loff_t start, size_t len,
+ loff_t i_size, bool caching);
+
+#ifdef CONFIG_AFS_FSCACHE
/*
- * mark a page as having been made dirty and thus needing writeback
+ * Mark a page as having been made dirty and thus needing writeback. We also
+ * need to pin the cache object to write back to.
*/
int afs_set_page_dirty(struct page *page)
{
- _enter("");
- return __set_page_dirty_nobuffers(page);
+ return fscache_set_page_dirty(page, afs_vnode_cache(AFS_FS_I(page->mapping->host)));
+}
+static void afs_folio_start_fscache(bool caching, struct folio *folio)
+{
+ if (caching)
+ folio_start_fscache(folio);
+}
+#else
+static void afs_folio_start_fscache(bool caching, struct folio *folio)
+{
}
+#endif

/*
* prepare to perform part of a write to a page
@@ -113,7 +127,7 @@ int afs_write_end(struct file *file, struct address_space *mapping,
unsigned long priv;
unsigned int f, from = offset_in_folio(folio, pos);
unsigned int t, to = from + copied;
- loff_t i_size, maybe_i_size;
+ loff_t i_size, write_end_pos;

_enter("{%llx:%llu},{%lx}",
vnode->fid.vid, vnode->fid.vnode, folio_index(folio));
@@ -130,15 +144,16 @@ int afs_write_end(struct file *file, struct address_space *mapping,
if (copied == 0)
goto out;

- maybe_i_size = pos + copied;
+ write_end_pos = pos + copied;

i_size = i_size_read(&vnode->vfs_inode);
- if (maybe_i_size > i_size) {
+ if (write_end_pos > i_size) {
write_seqlock(&vnode->cb_lock);
i_size = i_size_read(&vnode->vfs_inode);
- if (maybe_i_size > i_size)
- afs_set_i_size(vnode, maybe_i_size);
+ if (write_end_pos > i_size)
+ afs_set_i_size(vnode, write_end_pos);
write_sequnlock(&vnode->cb_lock);
+ fscache_update_cookie(afs_vnode_cache(vnode), NULL, &write_end_pos);
}

if (folio_test_private(folio)) {
@@ -417,6 +432,7 @@ static void afs_extend_writeback(struct address_space *mapping,
loff_t start,
loff_t max_len,
bool new_content,
+ bool caching,
unsigned int *_len)
{
struct pagevec pvec;
@@ -463,7 +479,9 @@ static void afs_extend_writeback(struct address_space *mapping,
folio_put(folio);
break;
}
- if (!folio_test_dirty(folio) || folio_test_writeback(folio)) {
+ if (!folio_test_dirty(folio) ||
+ folio_test_writeback(folio) ||
+ folio_test_fscache(folio)) {
folio_unlock(folio);
folio_put(folio);
break;
@@ -511,6 +529,7 @@ static void afs_extend_writeback(struct address_space *mapping,
BUG();
if (folio_start_writeback(folio))
BUG();
+ afs_folio_start_fscache(caching, folio);

*_count -= folio_nr_pages(folio);
folio_unlock(folio);
@@ -538,6 +557,7 @@ static ssize_t afs_write_back_from_locked_folio(struct address_space *mapping,
unsigned int offset, to, len, max_len;
loff_t i_size = i_size_read(&vnode->vfs_inode);
bool new_content = test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags);
+ bool caching = fscache_cookie_enabled(afs_vnode_cache(vnode));
long count = wbc->nr_to_write;
int ret;

@@ -545,6 +565,7 @@ static ssize_t afs_write_back_from_locked_folio(struct address_space *mapping,

if (folio_start_writeback(folio))
BUG();
+ afs_folio_start_fscache(caching, folio);

count -= folio_nr_pages(folio);

@@ -571,7 +592,8 @@ static ssize_t afs_write_back_from_locked_folio(struct address_space *mapping,
if (len < max_len &&
(to == folio_size(folio) || new_content))
afs_extend_writeback(mapping, vnode, &count,
- start, max_len, new_content, &len);
+ start, max_len, new_content,
+ caching, &len);
len = min_t(loff_t, len, max_len);
}

@@ -584,12 +606,19 @@ static ssize_t afs_write_back_from_locked_folio(struct address_space *mapping,
if (start < i_size) {
_debug("write back %x @%llx [%llx]", len, start, i_size);

+ /* Speculatively write to the cache. We have to fix this up
+ * later if the store fails.
+ */
+ afs_write_to_cache(vnode, start, len, i_size, caching);
+
iov_iter_xarray(&iter, WRITE, &mapping->i_pages, start, len);
ret = afs_store_data(vnode, &iter, start, false);
} else {
_debug("write discard %x @%llx [%llx]", len, start, i_size);

/* The dirty region was entirely beyond the EOF. */
+ fscache_clear_page_bits(afs_vnode_cache(vnode),
+ mapping, start, len, caching);
afs_pages_written_back(vnode, start, len);
ret = 0;
}
@@ -648,6 +677,10 @@ int afs_writepage(struct page *subpage, struct writeback_control *wbc)

_enter("{%lx},", folio_index(folio));

+#ifdef CONFIG_AFS_FSCACHE
+ folio_wait_fscache(folio);
+#endif
+
start = folio_index(folio) * PAGE_SIZE;
ret = afs_write_back_from_locked_folio(folio_mapping(folio), wbc,
folio, start, LLONG_MAX - start);
@@ -713,10 +746,15 @@ static int afs_writepages_region(struct address_space *mapping,
continue;
}

- if (folio_test_writeback(folio)) {
+ if (folio_test_writeback(folio) ||
+ folio_test_fscache(folio)) {
folio_unlock(folio);
- if (wbc->sync_mode != WB_SYNC_NONE)
+ if (wbc->sync_mode != WB_SYNC_NONE) {
folio_wait_writeback(folio);
+#ifdef CONFIG_AFS_FSCACHE
+ folio_wait_fscache(folio);
+#endif
+ }
folio_put(folio);
continue;
}
@@ -969,3 +1007,28 @@ int afs_launder_page(struct page *subpage)
folio_wait_fscache(folio);
return ret;
}
+
+/*
+ * Deal with the completion of writing the data to the cache.
+ */
+static void afs_write_to_cache_done(void *priv, ssize_t transferred_or_error,
+ bool was_async)
+{
+ struct afs_vnode *vnode = priv;
+
+ if (IS_ERR_VALUE(transferred_or_error) &&
+ transferred_or_error != -ENOBUFS)
+ afs_invalidate_cache(vnode, 0);
+}
+
+/*
+ * Save the write to the cache also.
+ */
+static void afs_write_to_cache(struct afs_vnode *vnode,
+ loff_t start, size_t len, loff_t i_size,
+ bool caching)
+{
+ fscache_write_to_cache(afs_vnode_cache(vnode),
+ vnode->vfs_inode.i_mapping, start, len, i_size,
+ afs_write_to_cache_done, vnode, caching);
+}



2021-12-22 23:28:53

by David Howells

[permalink] [raw]
Subject: [PATCH v4 58/68] afs: Skip truncation on the server of data we haven't written yet

Don't send a truncation RPC to the server if we're only shortening data
that's in the pagecache and is beyond the server's EOF.

Also don't automatically force writeback on setattr, but do wait to store
RPCs that are in the region to be removed on a shortening truncation.

Signed-off-by: David Howells <[email protected]>
cc: Marc Dionne <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163819663275.215744.4781075713714590913.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906972600.143852.14237659724463048094.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967177522.1823006.15336589054269480601.stgit@warthog.procyon.org.uk/ # v3
---

fs/afs/inode.c | 45 +++++++++++++++++++++++++++++++++++----------
1 file changed, 35 insertions(+), 10 deletions(-)

diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 8db902405031..5964f8aee090 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -848,42 +848,67 @@ static const struct afs_operation_ops afs_setattr_operation = {
int afs_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
struct iattr *attr)
{
+ const unsigned int supported =
+ ATTR_SIZE | ATTR_MODE | ATTR_UID | ATTR_GID |
+ ATTR_MTIME | ATTR_MTIME_SET | ATTR_TIMES_SET | ATTR_TOUCH;
struct afs_operation *op;
struct afs_vnode *vnode = AFS_FS_I(d_inode(dentry));
+ struct inode *inode = &vnode->vfs_inode;
+ loff_t i_size;
int ret;

_enter("{%llx:%llu},{n=%pd},%x",
vnode->fid.vid, vnode->fid.vnode, dentry,
attr->ia_valid);

- if (!(attr->ia_valid & (ATTR_SIZE | ATTR_MODE | ATTR_UID | ATTR_GID |
- ATTR_MTIME | ATTR_MTIME_SET | ATTR_TIMES_SET |
- ATTR_TOUCH))) {
+ if (!(attr->ia_valid & supported)) {
_leave(" = 0 [unsupported]");
return 0;
}

+ i_size = i_size_read(inode);
if (attr->ia_valid & ATTR_SIZE) {
- if (!S_ISREG(vnode->vfs_inode.i_mode))
+ if (!S_ISREG(inode->i_mode))
return -EISDIR;

- ret = inode_newsize_ok(&vnode->vfs_inode, attr->ia_size);
+ ret = inode_newsize_ok(inode, attr->ia_size);
if (ret)
return ret;

- if (attr->ia_size == i_size_read(&vnode->vfs_inode))
+ if (attr->ia_size == i_size)
attr->ia_valid &= ~ATTR_SIZE;
}

fscache_use_cookie(afs_vnode_cache(vnode), true);

- /* flush any dirty data outstanding on a regular file */
- if (S_ISREG(vnode->vfs_inode.i_mode))
- filemap_write_and_wait(vnode->vfs_inode.i_mapping);
-
/* Prevent any new writebacks from starting whilst we do this. */
down_write(&vnode->validate_lock);

+ if ((attr->ia_valid & ATTR_SIZE) && S_ISREG(inode->i_mode)) {
+ loff_t size = attr->ia_size;
+
+ /* Wait for any outstanding writes to the server to complete */
+ loff_t from = min(size, i_size);
+ loff_t to = max(size, i_size);
+ ret = filemap_fdatawait_range(inode->i_mapping, from, to);
+ if (ret < 0)
+ goto out_unlock;
+
+ /* Don't talk to the server if we're just shortening in-memory
+ * writes that haven't gone to the server yet.
+ */
+ if (!(attr->ia_valid & (supported & ~ATTR_SIZE & ~ATTR_MTIME)) &&
+ attr->ia_size < i_size &&
+ attr->ia_size > vnode->status.size) {
+ truncate_pagecache(inode, attr->ia_size);
+ fscache_resize_cookie(afs_vnode_cache(vnode),
+ attr->ia_size);
+ i_size_write(inode, attr->ia_size);
+ ret = 0;
+ goto out_unlock;
+ }
+ }
+
op = afs_alloc_operation(((attr->ia_valid & ATTR_FILE) ?
afs_file_key(attr->ia_file) : NULL),
vnode->volume);



2021-12-22 23:29:10

by David Howells

[permalink] [raw]
Subject: [PATCH v4 59/68] 9p: Use fscache indexing rewrite and reenable caching

Change the 9p filesystem to take account of the changes to fscache's
indexing rewrite and reenable caching in 9p.

The following changes have been made:

(1) The fscache_netfs struct is no more, and there's no need to register
the filesystem as a whole.

(2) The session cookie is now an fscache_volume cookie, allocated with
fscache_acquire_volume(). That takes three parameters: a string
representing the "volume" in the index, a string naming the cache to
use (or NULL) and a u64 that conveys coherency metadata for the
volume.

For 9p, I've made it render the volume name string as:

"9p,<devname>,<cachetag>"

where the cachetag is replaced by the aname if it wasn't supplied.

This probably needs rethinking a bit as the aname can have slashes in
it. It might be better to hash the cachetag and use the hash or I
could substitute commas for the slashes or something.

(3) The fscache_cookie_def is no more and needed information is passed
directly to fscache_acquire_cookie(). The cache no longer calls back
into the filesystem, but rather metadata changes are indicated at
other times.

fscache_acquire_cookie() is passed the same keying and coherency
information as before.

(4) The functions to set/reset/flush cookies are removed and
fscache_use_cookie() and fscache_unuse_cookie() are used instead.

fscache_use_cookie() is passed a flag to indicate if the cookie is
opened for writing. fscache_unuse_cookie() is passed updates for the
metadata if we changed it (ie. if the file was opened for writing).

These are called when the file is opened or closed.

(5) wait_on_page_bit[_killable]() is replaced with the specific wait
functions for the bits waited upon.

(6) I've got rid of some of the 9p-specific cache helper functions and
called things like fscache_relinquish_cookie() directly as they'll
optimise away if v9fs_inode_cookie() returns an unconditional NULL
(which will be the case if CONFIG_9P_FSCACHE=n).

(7) v9fs_vfs_setattr() is made to call fscache_resize() to change the size
of the cache object.

Notes:

(A) We should call fscache_invalidate() if we detect that the server's
copy of a file got changed by a third party, but I don't know where to
do that. We don't need to do that when allocating the cookie as we
get a check-and-invalidate when we initially bind to the cache object.

(B) The copy-to-cache-on-writeback side of things will be handled in
separate patch.

Changes
=======
ver #3:
- Canonicalise the cookie key and coherency data to make them
endianness-independent.

ver #2:
- Use gfpflags_allow_blocking() rather than using flag directly.
- fscache_acquire_volume() now returns errors.

Signed-off-by: David Howells <[email protected]>
cc: Eric Van Hensbergen <[email protected]>
cc: Latchesar Ionkov <[email protected]>
cc: Dominique Martinet <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/163819664645.215744.1555314582005286846.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906975017.143852.3459573173204394039.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967178512.1823006.17377493641569138183.stgit@warthog.procyon.org.uk/ # v3
---

fs/9p/Kconfig | 2
fs/9p/cache.c | 195 +++++++++---------------------------------------
fs/9p/cache.h | 25 +-----
fs/9p/v9fs.c | 17 +---
fs/9p/v9fs.h | 13 +++
fs/9p/vfs_addr.c | 8 +-
fs/9p/vfs_dir.c | 13 +++
fs/9p/vfs_file.c | 3 -
fs/9p/vfs_inode.c | 22 +++--
fs/9p/vfs_inode_dotl.c | 3 -
10 files changed, 91 insertions(+), 210 deletions(-)

diff --git a/fs/9p/Kconfig b/fs/9p/Kconfig
index b3d33b3ddb98..d7bc93447c85 100644
--- a/fs/9p/Kconfig
+++ b/fs/9p/Kconfig
@@ -14,7 +14,7 @@ config 9P_FS
if 9P_FS
config 9P_FSCACHE
bool "Enable 9P client caching support"
- depends on 9P_FS=m && FSCACHE_OLD_API || 9P_FS=y && FSCACHE_OLD_API=y
+ depends on 9P_FS=m && FSCACHE || 9P_FS=y && FSCACHE=y
help
Choose Y here to enable persistent, read-only local
caching support for 9p clients using FS-Cache
diff --git a/fs/9p/cache.c b/fs/9p/cache.c
index f2ba131cede1..55e108e5e133 100644
--- a/fs/9p/cache.c
+++ b/fs/9p/cache.c
@@ -16,186 +16,61 @@
#include "v9fs.h"
#include "cache.h"

-#define CACHETAG_LEN 11
-
-struct fscache_netfs v9fs_cache_netfs = {
- .name = "9p",
- .version = 0,
-};
-
-/*
- * v9fs_random_cachetag - Generate a random tag to be associated
- * with a new cache session.
- *
- * The value of jiffies is used for a fairly randomly cache tag.
- */
-
-static
-int v9fs_random_cachetag(struct v9fs_session_info *v9ses)
+int v9fs_cache_session_get_cookie(struct v9fs_session_info *v9ses,
+ const char *dev_name)
{
- v9ses->cachetag = kmalloc(CACHETAG_LEN, GFP_KERNEL);
- if (!v9ses->cachetag)
- return -ENOMEM;
+ struct fscache_volume *vcookie;
+ char *name, *p;

- return scnprintf(v9ses->cachetag, CACHETAG_LEN, "%lu", jiffies);
-}
-
-const struct fscache_cookie_def v9fs_cache_session_index_def = {
- .name = "9P.session",
- .type = FSCACHE_COOKIE_TYPE_INDEX,
-};
+ name = kasprintf(GFP_KERNEL, "9p,%s,%s",
+ dev_name, v9ses->cachetag ?: v9ses->aname);
+ if (!name)
+ return -ENOMEM;

-void v9fs_cache_session_get_cookie(struct v9fs_session_info *v9ses)
-{
- /* If no cache session tag was specified, we generate a random one. */
- if (!v9ses->cachetag) {
- if (v9fs_random_cachetag(v9ses) < 0) {
- v9ses->fscache = NULL;
- kfree(v9ses->cachetag);
- v9ses->cachetag = NULL;
- return;
+ for (p = name; *p; p++)
+ if (*p == '/')
+ *p = ';';
+
+ vcookie = fscache_acquire_volume(name, NULL, NULL, 0);
+ p9_debug(P9_DEBUG_FSC, "session %p get volume %p (%s)\n",
+ v9ses, vcookie, name);
+ if (IS_ERR(vcookie)) {
+ if (vcookie != ERR_PTR(-EBUSY)) {
+ kfree(name);
+ return PTR_ERR(vcookie);
}
+ pr_err("Cache volume key already in use (%s)\n", name);
+ vcookie = NULL;
}
-
- v9ses->fscache = fscache_acquire_cookie(v9fs_cache_netfs.primary_index,
- &v9fs_cache_session_index_def,
- v9ses->cachetag,
- strlen(v9ses->cachetag),
- NULL, 0,
- v9ses, 0, true);
- p9_debug(P9_DEBUG_FSC, "session %p get cookie %p\n",
- v9ses, v9ses->fscache);
-}
-
-void v9fs_cache_session_put_cookie(struct v9fs_session_info *v9ses)
-{
- p9_debug(P9_DEBUG_FSC, "session %p put cookie %p\n",
- v9ses, v9ses->fscache);
- fscache_relinquish_cookie(v9ses->fscache, NULL, false);
- v9ses->fscache = NULL;
-}
-
-static enum
-fscache_checkaux v9fs_cache_inode_check_aux(void *cookie_netfs_data,
- const void *buffer,
- uint16_t buflen,
- loff_t object_size)
-{
- const struct v9fs_inode *v9inode = cookie_netfs_data;
-
- if (buflen != sizeof(v9inode->qid.version))
- return FSCACHE_CHECKAUX_OBSOLETE;
-
- if (memcmp(buffer, &v9inode->qid.version,
- sizeof(v9inode->qid.version)))
- return FSCACHE_CHECKAUX_OBSOLETE;
-
- return FSCACHE_CHECKAUX_OKAY;
+ v9ses->fscache = vcookie;
+ kfree(name);
+ return 0;
}

-const struct fscache_cookie_def v9fs_cache_inode_index_def = {
- .name = "9p.inode",
- .type = FSCACHE_COOKIE_TYPE_DATAFILE,
- .check_aux = v9fs_cache_inode_check_aux,
-};
-
void v9fs_cache_inode_get_cookie(struct inode *inode)
{
struct v9fs_inode *v9inode;
struct v9fs_session_info *v9ses;
+ __le32 version;
+ __le64 path;

if (!S_ISREG(inode->i_mode))
return;

v9inode = V9FS_I(inode);
- if (v9inode->fscache)
+ if (WARN_ON(v9inode->fscache))
return;

+ version = cpu_to_le32(v9inode->qid.version);
+ path = cpu_to_le64(v9inode->qid.path);
v9ses = v9fs_inode2v9ses(inode);
- v9inode->fscache = fscache_acquire_cookie(v9ses->fscache,
- &v9fs_cache_inode_index_def,
- &v9inode->qid.path,
- sizeof(v9inode->qid.path),
- &v9inode->qid.version,
- sizeof(v9inode->qid.version),
- v9inode,
- i_size_read(&v9inode->vfs_inode),
- true);
+ v9inode->fscache =
+ fscache_acquire_cookie(v9fs_session_cache(v9ses),
+ 0,
+ &path, sizeof(path),
+ &version, sizeof(version),
+ i_size_read(&v9inode->vfs_inode));

p9_debug(P9_DEBUG_FSC, "inode %p get cookie %p\n",
inode, v9inode->fscache);
}
-
-void v9fs_cache_inode_put_cookie(struct inode *inode)
-{
- struct v9fs_inode *v9inode = V9FS_I(inode);
-
- if (!v9inode->fscache)
- return;
- p9_debug(P9_DEBUG_FSC, "inode %p put cookie %p\n",
- inode, v9inode->fscache);
-
- fscache_relinquish_cookie(v9inode->fscache, &v9inode->qid.version,
- false);
- v9inode->fscache = NULL;
-}
-
-void v9fs_cache_inode_flush_cookie(struct inode *inode)
-{
- struct v9fs_inode *v9inode = V9FS_I(inode);
-
- if (!v9inode->fscache)
- return;
- p9_debug(P9_DEBUG_FSC, "inode %p flush cookie %p\n",
- inode, v9inode->fscache);
-
- fscache_relinquish_cookie(v9inode->fscache, NULL, true);
- v9inode->fscache = NULL;
-}
-
-void v9fs_cache_inode_set_cookie(struct inode *inode, struct file *filp)
-{
- struct v9fs_inode *v9inode = V9FS_I(inode);
-
- if (!v9inode->fscache)
- return;
-
- mutex_lock(&v9inode->fscache_lock);
-
- if ((filp->f_flags & O_ACCMODE) != O_RDONLY)
- v9fs_cache_inode_flush_cookie(inode);
- else
- v9fs_cache_inode_get_cookie(inode);
-
- mutex_unlock(&v9inode->fscache_lock);
-}
-
-void v9fs_cache_inode_reset_cookie(struct inode *inode)
-{
- struct v9fs_inode *v9inode = V9FS_I(inode);
- struct v9fs_session_info *v9ses;
- struct fscache_cookie *old;
-
- if (!v9inode->fscache)
- return;
-
- old = v9inode->fscache;
-
- mutex_lock(&v9inode->fscache_lock);
- fscache_relinquish_cookie(v9inode->fscache, NULL, true);
-
- v9ses = v9fs_inode2v9ses(inode);
- v9inode->fscache = fscache_acquire_cookie(v9ses->fscache,
- &v9fs_cache_inode_index_def,
- &v9inode->qid.path,
- sizeof(v9inode->qid.path),
- &v9inode->qid.version,
- sizeof(v9inode->qid.version),
- v9inode,
- i_size_read(&v9inode->vfs_inode),
- true);
- p9_debug(P9_DEBUG_FSC, "inode %p revalidating cookie old %p new %p\n",
- inode, old, v9inode->fscache);
-
- mutex_unlock(&v9inode->fscache_lock);
-}
diff --git a/fs/9p/cache.h b/fs/9p/cache.h
index 7480b4b49fea..1923affcdc62 100644
--- a/fs/9p/cache.h
+++ b/fs/9p/cache.h
@@ -7,26 +7,15 @@

#ifndef _9P_CACHE_H
#define _9P_CACHE_H
-#define FSCACHE_USE_NEW_IO_API
+
#include <linux/fscache.h>

#ifdef CONFIG_9P_FSCACHE

-extern struct fscache_netfs v9fs_cache_netfs;
-extern const struct fscache_cookie_def v9fs_cache_session_index_def;
-extern const struct fscache_cookie_def v9fs_cache_inode_index_def;
-
-extern void v9fs_cache_session_get_cookie(struct v9fs_session_info *v9ses);
-extern void v9fs_cache_session_put_cookie(struct v9fs_session_info *v9ses);
+extern int v9fs_cache_session_get_cookie(struct v9fs_session_info *v9ses,
+ const char *dev_name);

extern void v9fs_cache_inode_get_cookie(struct inode *inode);
-extern void v9fs_cache_inode_put_cookie(struct inode *inode);
-extern void v9fs_cache_inode_flush_cookie(struct inode *inode);
-extern void v9fs_cache_inode_set_cookie(struct inode *inode, struct file *filp);
-extern void v9fs_cache_inode_reset_cookie(struct inode *inode);
-
-extern int __v9fs_cache_register(void);
-extern void __v9fs_cache_unregister(void);

#else /* CONFIG_9P_FSCACHE */

@@ -34,13 +23,5 @@ static inline void v9fs_cache_inode_get_cookie(struct inode *inode)
{
}

-static inline void v9fs_cache_inode_put_cookie(struct inode *inode)
-{
-}
-
-static inline void v9fs_cache_inode_set_cookie(struct inode *inode, struct file *file)
-{
-}
-
#endif /* CONFIG_9P_FSCACHE */
#endif /* _9P_CACHE_H */
diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c
index e32dd5f7721b..08f65c40af4f 100644
--- a/fs/9p/v9fs.c
+++ b/fs/9p/v9fs.c
@@ -469,7 +469,11 @@ struct p9_fid *v9fs_session_init(struct v9fs_session_info *v9ses,

#ifdef CONFIG_9P_FSCACHE
/* register the session for caching */
- v9fs_cache_session_get_cookie(v9ses);
+ if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE) {
+ rc = v9fs_cache_session_get_cookie(v9ses, dev_name);
+ if (rc < 0)
+ goto err_clnt;
+ }
#endif
spin_lock(&v9fs_sessionlist_lock);
list_add(&v9ses->slist, &v9fs_sessionlist);
@@ -502,8 +506,7 @@ void v9fs_session_close(struct v9fs_session_info *v9ses)
}

#ifdef CONFIG_9P_FSCACHE
- if (v9ses->fscache)
- v9fs_cache_session_put_cookie(v9ses);
+ fscache_relinquish_volume(v9fs_session_cache(v9ses), NULL, false);
kfree(v9ses->cachetag);
#endif
kfree(v9ses->uname);
@@ -665,20 +668,12 @@ static int v9fs_cache_register(void)
ret = v9fs_init_inode_cache();
if (ret < 0)
return ret;
-#ifdef CONFIG_9P_FSCACHE
- ret = fscache_register_netfs(&v9fs_cache_netfs);
- if (ret < 0)
- v9fs_destroy_inode_cache();
-#endif
return ret;
}

static void v9fs_cache_unregister(void)
{
v9fs_destroy_inode_cache();
-#ifdef CONFIG_9P_FSCACHE
- fscache_unregister_netfs(&v9fs_cache_netfs);
-#endif
}

/**
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 1647a8e63671..bc8b30205d36 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -89,7 +89,7 @@ struct v9fs_session_info {
unsigned int cache;
#ifdef CONFIG_9P_FSCACHE
char *cachetag;
- struct fscache_cookie *fscache;
+ struct fscache_volume *fscache;
#endif

char *uname; /* user name to mount as */
@@ -109,7 +109,6 @@ struct v9fs_session_info {

struct v9fs_inode {
#ifdef CONFIG_9P_FSCACHE
- struct mutex fscache_lock;
struct fscache_cookie *fscache;
#endif
struct p9_qid qid;
@@ -133,6 +132,16 @@ static inline struct fscache_cookie *v9fs_inode_cookie(struct v9fs_inode *v9inod
#endif
}

+static inline struct fscache_volume *v9fs_session_cache(struct v9fs_session_info *v9ses)
+{
+#ifdef CONFIG_9P_FSCACHE
+ return v9ses->fscache;
+#else
+ return NULL;
+#endif
+}
+
+
extern int v9fs_show_options(struct seq_file *m, struct dentry *root);

struct p9_fid *v9fs_session_init(struct v9fs_session_info *v9ses,
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 4ea8f862b9e4..4f5ce4aca317 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -76,7 +76,9 @@ static void v9fs_req_cleanup(struct address_space *mapping, void *priv)
*/
static bool v9fs_is_cache_enabled(struct inode *inode)
{
- return fscache_cookie_enabled(v9fs_inode_cookie(V9FS_I(inode)));
+ struct fscache_cookie *cookie = v9fs_inode_cookie(V9FS_I(inode));
+
+ return fscache_cookie_enabled(cookie) && cookie->cache_priv;
}

/**
@@ -88,7 +90,7 @@ static int v9fs_begin_cache_operation(struct netfs_read_request *rreq)
#ifdef CONFIG_9P_FSCACHE
struct fscache_cookie *cookie = v9fs_inode_cookie(V9FS_I(rreq->inode));

- return fscache_begin_read_operation(rreq, cookie);
+ return fscache_begin_read_operation(&rreq->cache_resources, cookie);
#else
return -ENOBUFS;
#endif
@@ -140,7 +142,7 @@ static int v9fs_release_page(struct page *page, gfp_t gfp)
return 0;
#ifdef CONFIG_9P_FSCACHE
if (folio_test_fscache(folio)) {
- if (!(gfp & __GFP_DIRECT_RECLAIM) || !(gfp & __GFP_FS))
+ if (!gfpflags_allow_blocking(gfp) || !(gfp & __GFP_FS))
return 0;
folio_wait_fscache(folio);
}
diff --git a/fs/9p/vfs_dir.c b/fs/9p/vfs_dir.c
index 8c854d8cb0cd..958680f7f23e 100644
--- a/fs/9p/vfs_dir.c
+++ b/fs/9p/vfs_dir.c
@@ -17,6 +17,7 @@
#include <linux/idr.h>
#include <linux/slab.h>
#include <linux/uio.h>
+#include <linux/fscache.h>
#include <net/9p/9p.h>
#include <net/9p/client.h>

@@ -205,7 +206,10 @@ static int v9fs_dir_readdir_dotl(struct file *file, struct dir_context *ctx)

int v9fs_dir_release(struct inode *inode, struct file *filp)
{
+ struct v9fs_inode *v9inode = V9FS_I(inode);
struct p9_fid *fid;
+ __le32 version;
+ loff_t i_size;

fid = filp->private_data;
p9_debug(P9_DEBUG_VFS, "inode: %p filp: %p fid: %d\n",
@@ -216,6 +220,15 @@ int v9fs_dir_release(struct inode *inode, struct file *filp)
spin_unlock(&inode->i_lock);
p9_client_clunk(fid);
}
+
+ if ((filp->f_mode & FMODE_WRITE)) {
+ version = cpu_to_le32(v9inode->qid.version);
+ i_size = i_size_read(inode);
+ fscache_unuse_cookie(v9fs_inode_cookie(v9inode),
+ &version, &i_size);
+ } else {
+ fscache_unuse_cookie(v9fs_inode_cookie(v9inode), NULL, NULL);
+ }
return 0;
}

diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index 612e297f3763..be72ad9edb3e 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -93,7 +93,8 @@ int v9fs_file_open(struct inode *inode, struct file *file)
}
mutex_unlock(&v9inode->v_mutex);
if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE)
- v9fs_cache_inode_set_cookie(inode, file);
+ fscache_use_cookie(v9fs_inode_cookie(v9inode),
+ file->f_mode & FMODE_WRITE);
v9fs_open_fid_add(inode, fid);
return 0;
out_error:
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 328c338ff304..00366bf1ac2c 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -233,7 +233,6 @@ struct inode *v9fs_alloc_inode(struct super_block *sb)
return NULL;
#ifdef CONFIG_9P_FSCACHE
v9inode->fscache = NULL;
- mutex_init(&v9inode->fscache_lock);
#endif
v9inode->writeback_fid = NULL;
v9inode->cache_validity = 0;
@@ -386,7 +385,7 @@ void v9fs_evict_inode(struct inode *inode)
clear_inode(inode);
filemap_fdatawrite(&inode->i_data);

- v9fs_cache_inode_put_cookie(inode);
+ fscache_relinquish_cookie(v9fs_inode_cookie(v9inode), false);
/* clunk the fid stashed in writeback_fid */
if (v9inode->writeback_fid) {
p9_client_clunk(v9inode->writeback_fid);
@@ -869,7 +868,8 @@ v9fs_vfs_atomic_open(struct inode *dir, struct dentry *dentry,

file->private_data = fid;
if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE)
- v9fs_cache_inode_set_cookie(d_inode(dentry), file);
+ fscache_use_cookie(v9fs_inode_cookie(v9inode),
+ file->f_mode & FMODE_WRITE);
v9fs_open_fid_add(inode, fid);

file->f_mode |= FMODE_CREATED;
@@ -1072,6 +1072,8 @@ static int v9fs_vfs_setattr(struct user_namespace *mnt_userns,
struct dentry *dentry, struct iattr *iattr)
{
int retval, use_dentry = 0;
+ struct inode *inode = d_inode(dentry);
+ struct v9fs_inode *v9inode = V9FS_I(inode);
struct v9fs_session_info *v9ses;
struct p9_fid *fid = NULL;
struct p9_wstat wstat;
@@ -1117,7 +1119,7 @@ static int v9fs_vfs_setattr(struct user_namespace *mnt_userns,

/* Write all dirty data */
if (d_is_reg(dentry))
- filemap_write_and_wait(d_inode(dentry)->i_mapping);
+ filemap_write_and_wait(inode->i_mapping);

retval = p9_client_wstat(fid, &wstat);

@@ -1128,13 +1130,15 @@ static int v9fs_vfs_setattr(struct user_namespace *mnt_userns,
return retval;

if ((iattr->ia_valid & ATTR_SIZE) &&
- iattr->ia_size != i_size_read(d_inode(dentry)))
- truncate_setsize(d_inode(dentry), iattr->ia_size);
+ iattr->ia_size != i_size_read(inode)) {
+ truncate_setsize(inode, iattr->ia_size);
+ fscache_resize_cookie(v9fs_inode_cookie(v9inode), iattr->ia_size);
+ }

- v9fs_invalidate_inode_attr(d_inode(dentry));
+ v9fs_invalidate_inode_attr(inode);

- setattr_copy(&init_user_ns, d_inode(dentry), iattr);
- mark_inode_dirty(d_inode(dentry));
+ setattr_copy(&init_user_ns, inode, iattr);
+ mark_inode_dirty(inode);
return 0;
}

diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 7dee89ba32e7..cae301d09cd3 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -344,7 +344,8 @@ v9fs_vfs_atomic_open_dotl(struct inode *dir, struct dentry *dentry,
goto err_clunk_old_fid;
file->private_data = ofid;
if (v9ses->cache == CACHE_LOOSE || v9ses->cache == CACHE_FSCACHE)
- v9fs_cache_inode_set_cookie(inode, file);
+ fscache_use_cookie(v9fs_inode_cookie(v9inode),
+ file->f_mode & FMODE_WRITE);
v9fs_open_fid_add(inode, ofid);
file->f_mode |= FMODE_CREATED;
out:



2021-12-22 23:29:21

by David Howells

[permalink] [raw]
Subject: [PATCH v4 60/68] 9p: Copy local writes to the cache when writing to the server

When writing to the server from v9fs_vfs_writepage(), copy the data to the
cache object too.

To make this possible, the cookie must have its active users count
incremented when the page is dirtied and kept incremented until we manage
to clean up all the pages. This allows the writeback to take place after
the last file struct is released.

This is done by taking a use on the cookie in v9fs_set_page_dirty() if we
haven't already done so (controlled by the I_PINNING_FSCACHE_WB flag) and
dropping the pin in v9fs_write_inode() if __writeback_single_inode() clears
all the outstanding dirty pages (conveyed by the unpinned_fscache_wb flag
in the writeback_control struct).

Inode eviction must also clear the flag after truncating away all the
outstanding pages.

In the future this will be handled more gracefully by netfslib.

Changes
=======
ver #3:
- Canonicalise the coherency data to make it endianness-independent.

ver #2:
- Fix an unused-var warning due to CONFIG_9P_FSCACHE=n[1].

Signed-off-by: David Howells <[email protected]>
cc: Eric Van Hensbergen <[email protected]>
cc: Latchesar Ionkov <[email protected]>
cc: Dominique Martinet <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/163819667027.215744.13815687931204222995.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906978015.143852.10646669694345706328.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967180760.1823006.5831751873616248910.stgit@warthog.procyon.org.uk/ # v3
---

fs/9p/vfs_addr.c | 46 +++++++++++++++++++++++++++++++++++++++++++++-
fs/9p/vfs_inode.c | 4 ++++
fs/9p/vfs_super.c | 3 +++
3 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 4f5ce4aca317..f3f349f460e5 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -137,6 +137,7 @@ static void v9fs_vfs_readahead(struct readahead_control *ractl)
static int v9fs_release_page(struct page *page, gfp_t gfp)
{
struct folio *folio = page_folio(page);
+ struct inode *inode = folio_inode(folio);

if (folio_test_private(folio))
return 0;
@@ -147,6 +148,7 @@ static int v9fs_release_page(struct page *page, gfp_t gfp)
folio_wait_fscache(folio);
}
#endif
+ fscache_note_page_release(v9fs_inode_cookie(V9FS_I(inode)));
return 1;
}

@@ -165,10 +167,25 @@ static void v9fs_invalidate_page(struct page *page, unsigned int offset,
folio_wait_fscache(folio);
}

+static void v9fs_write_to_cache_done(void *priv, ssize_t transferred_or_error,
+ bool was_async)
+{
+ struct v9fs_inode *v9inode = priv;
+ __le32 version;
+
+ if (IS_ERR_VALUE(transferred_or_error) &&
+ transferred_or_error != -ENOBUFS) {
+ version = cpu_to_le32(v9inode->qid.version);
+ fscache_invalidate(v9fs_inode_cookie(v9inode), &version,
+ i_size_read(&v9inode->vfs_inode), 0);
+ }
+}
+
static int v9fs_vfs_write_folio_locked(struct folio *folio)
{
struct inode *inode = folio_inode(folio);
struct v9fs_inode *v9inode = V9FS_I(inode);
+ struct fscache_cookie *cookie = v9fs_inode_cookie(v9inode);
loff_t start = folio_pos(folio);
loff_t i_size = i_size_read(inode);
struct iov_iter from;
@@ -185,10 +202,21 @@ static int v9fs_vfs_write_folio_locked(struct folio *folio)
/* We should have writeback_fid always set */
BUG_ON(!v9inode->writeback_fid);

+ folio_wait_fscache(folio);
folio_start_writeback(folio);

p9_client_write(v9inode->writeback_fid, start, &from, &err);

+ if (err == 0 &&
+ fscache_cookie_enabled(cookie) &&
+ test_bit(FSCACHE_COOKIE_IS_CACHING, &cookie->flags)) {
+ folio_start_fscache(folio);
+ fscache_write_to_cache(v9fs_inode_cookie(v9inode),
+ folio_mapping(folio), start, len, i_size,
+ v9fs_write_to_cache_done, v9inode,
+ true);
+ }
+
folio_end_writeback(folio);
return err;
}
@@ -307,6 +335,7 @@ static int v9fs_write_end(struct file *filp, struct address_space *mapping,
loff_t last_pos = pos + copied;
struct folio *folio = page_folio(subpage);
struct inode *inode = mapping->host;
+ struct v9fs_inode *v9inode = V9FS_I(inode);

p9_debug(P9_DEBUG_VFS, "filp %p, mapping %p\n", filp, mapping);

@@ -326,6 +355,7 @@ static int v9fs_write_end(struct file *filp, struct address_space *mapping,
if (last_pos > inode->i_size) {
inode_add_bytes(inode, last_pos - inode->i_size);
i_size_write(inode, last_pos);
+ fscache_update_cookie(v9fs_inode_cookie(v9inode), NULL, &last_pos);
}
folio_mark_dirty(folio);
out:
@@ -335,11 +365,25 @@ static int v9fs_write_end(struct file *filp, struct address_space *mapping,
return copied;
}

+#ifdef CONFIG_9P_FSCACHE
+/*
+ * Mark a page as having been made dirty and thus needing writeback. We also
+ * need to pin the cache object to write back to.
+ */
+static int v9fs_set_page_dirty(struct page *page)
+{
+ struct v9fs_inode *v9inode = V9FS_I(page->mapping->host);
+
+ return fscache_set_page_dirty(page, v9fs_inode_cookie(v9inode));
+}
+#else
+#define v9fs_set_page_dirty __set_page_dirty_nobuffers
+#endif

const struct address_space_operations v9fs_addr_operations = {
.readpage = v9fs_vfs_readpage,
.readahead = v9fs_vfs_readahead,
- .set_page_dirty = __set_page_dirty_nobuffers,
+ .set_page_dirty = v9fs_set_page_dirty,
.writepage = v9fs_vfs_writepage,
.write_begin = v9fs_write_begin,
.write_end = v9fs_write_end,
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 00366bf1ac2c..2a10242c79c7 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -380,8 +380,12 @@ struct inode *v9fs_get_inode(struct super_block *sb, umode_t mode, dev_t rdev)
void v9fs_evict_inode(struct inode *inode)
{
struct v9fs_inode *v9inode = V9FS_I(inode);
+ __le32 version;

truncate_inode_pages_final(&inode->i_data);
+ version = cpu_to_le32(v9inode->qid.version);
+ fscache_clear_inode_writeback(v9fs_inode_cookie(v9inode), inode,
+ &version);
clear_inode(inode);
filemap_fdatawrite(&inode->i_data);

diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index b739e02f5ef7..97e23b4e6982 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -20,6 +20,7 @@
#include <linux/slab.h>
#include <linux/statfs.h>
#include <linux/magic.h>
+#include <linux/fscache.h>
#include <net/9p/9p.h>
#include <net/9p/client.h>

@@ -309,6 +310,7 @@ static int v9fs_write_inode(struct inode *inode,
__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
return ret;
}
+ fscache_unpin_writeback(wbc, v9fs_inode_cookie(v9inode));
return 0;
}

@@ -332,6 +334,7 @@ static int v9fs_write_inode_dotl(struct inode *inode,
__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
return ret;
}
+ fscache_unpin_writeback(wbc, v9fs_inode_cookie(v9inode));
return 0;
}




2021-12-22 23:29:55

by David Howells

[permalink] [raw]
Subject: [PATCH v4 61/68] nfs: Convert to new fscache volume/cookie API

From: Dave Wysochanski <[email protected]>

Change the nfs filesystem to support fscache's indexing rewrite and
reenable caching in nfs.

The following changes have been made:

(1) The fscache_netfs struct is no more, and there's no need to register
the filesystem as a whole.

(2) The session cookie is now an fscache_volume cookie, allocated with
fscache_acquire_volume(). That takes three parameters: a string
representing the "volume" in the index, a string naming the cache to
use (or NULL) and a u64 that conveys coherency metadata for the
volume.

For nfs, I've made it render the volume name string as:

"nfs,<ver>,<family>,<address>,<port>,<fsidH>,<fsidL>*<,param>[,<uniq>]"

(3) The fscache_cookie_def is no more and needed information is passed
directly to fscache_acquire_cookie(). The cache no longer calls back
into the filesystem, but rather metadata changes are indicated at
other times.

fscache_acquire_cookie() is passed the same keying and coherency
information as before.

(4) fscache_enable/disable_cookie() have been removed.

Call fscache_use_cookie() and fscache_unuse_cookie() when a file is
opened or closed to prevent a cache file from being culled and to keep
resources to hand that are needed to do I/O.

If a file is opened for writing, we invalidate it with
FSCACHE_INVAL_DIO_WRITE in lieu of doing writeback to the cache,
thereby making it cease caching until all currently open files are
closed. This should give the same behaviour as the uptream code.
Making the cache store local modifications isn't straightforward for
NFS, so that's left for future patches.

(5) fscache_invalidate() now needs to be given uptodate auxiliary data and
a file size. It also takes a flag to indicate if this was due to a
DIO write.

(6) Call nfs_fscache_invalidate() with FSCACHE_INVAL_DIO_WRITE on a file
to which a DIO write is made.

(7) Call fscache_note_page_release() from nfs_release_page().

(8) Use a killable wait in nfs_vm_page_mkwrite() when waiting for
PG_fscache to be cleared.

(9) The functions to read and write data to/from the cache are stubbed out
pending a conversion to use netfslib.

Changes
=======
ver #3:
- Added missing =n fallback for nfs_fscache_release_file()[1][2].

ver #2:
- Use gfpflags_allow_blocking() rather than using flag directly.
- fscache_acquire_volume() now returns errors.
- Remove NFS_INO_FSCACHE as it's no longer used.
- Need to unuse a cookie on file-release, not inode-clear.

Signed-off-by: Dave Wysochanski <[email protected]>
Co-developed-by: David Howells <[email protected]>
Signed-off-by: David Howells <[email protected]>
cc: Trond Myklebust <[email protected]>
cc: Anna Schumaker <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ [1]
Link: https://lore.kernel.org/r/[email protected]/ [2]
Link: https://lore.kernel.org/r/163819668938.215744.14448852181937731615.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906979003.143852.2601189243864854724.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967182112.1823006.7791504655391213379.stgit@warthog.procyon.org.uk/ # v3
---

fs/nfs/Kconfig | 2
fs/nfs/Makefile | 2
fs/nfs/client.c | 4
fs/nfs/direct.c | 2
fs/nfs/file.c | 13 +
fs/nfs/fscache-index.c | 140 ---------------
fs/nfs/fscache.c | 434 +++++++++++----------------------------------
fs/nfs/fscache.h | 127 ++++---------
fs/nfs/inode.c | 11 -
fs/nfs/nfstrace.h | 1
fs/nfs/super.c | 28 ++-
fs/nfs/write.c | 1
include/linux/nfs_fs.h | 1
include/linux/nfs_fs_sb.h | 9 -
14 files changed, 172 insertions(+), 603 deletions(-)
delete mode 100644 fs/nfs/fscache-index.c

diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index bdc11b89eac5..14a72224b657 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -170,7 +170,7 @@ config ROOT_NFS

config NFS_FSCACHE
bool "Provide NFS client caching support"
- depends on NFS_FS=m && FSCACHE_OLD_API || NFS_FS=y && FSCACHE_OLD_API=y
+ depends on NFS_FS=m && FSCACHE || NFS_FS=y && FSCACHE=y
help
Say Y here if you want NFS data to be cached locally on disc through
the general filesystem cache manager
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 22d11fdc6deb..5f6db37f461e 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -12,7 +12,7 @@ nfs-y := client.o dir.o file.o getroot.o inode.o super.o \
export.o sysfs.o fs_context.o
nfs-$(CONFIG_ROOT_NFS) += nfsroot.o
nfs-$(CONFIG_SYSCTL) += sysctl.o
-nfs-$(CONFIG_NFS_FSCACHE) += fscache.o fscache-index.o
+nfs-$(CONFIG_NFS_FSCACHE) += fscache.o

obj-$(CONFIG_NFS_V2) += nfsv2.o
nfsv2-y := nfs2super.o proc.o nfs2xdr.o
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 1e4dc1ab9312..8d8b85b5a641 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -183,8 +183,6 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
clp->cl_net = get_net(cl_init->net);

clp->cl_principal = "*";
- nfs_fscache_get_client_cookie(clp);
-
return clp;

error_cleanup:
@@ -238,8 +236,6 @@ static void pnfs_init_server(struct nfs_server *server)
*/
void nfs_free_client(struct nfs_client *clp)
{
- nfs_fscache_release_client_cookie(clp);
-
/* -EIO all pending I/O */
if (!IS_ERR(clp->cl_rpcclient))
rpc_shutdown_client(clp->cl_rpcclient);
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 9cff8709c80a..eabfdab543c8 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -59,6 +59,7 @@
#include "internal.h"
#include "iostat.h"
#include "pnfs.h"
+#include "fscache.h"

#define NFSDBG_FACILITY NFSDBG_VFS

@@ -959,6 +960,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
} else {
result = requested;
}
+ nfs_fscache_invalidate(inode, FSCACHE_INVAL_DIO_WRITE);
out_release:
nfs_direct_req_release(dreq);
out:
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 24e7dccce355..76d76acbc594 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -84,6 +84,7 @@ nfs_file_release(struct inode *inode, struct file *filp)

nfs_inc_stats(inode, NFSIOS_VFSRELEASE);
nfs_file_clear_open_context(filp);
+ nfs_fscache_release_file(inode, filp);
return 0;
}
EXPORT_SYMBOL_GPL(nfs_file_release);
@@ -415,8 +416,7 @@ static void nfs_invalidate_page(struct page *page, unsigned int offset,
return;
/* Cancel any unstarted writes on this page */
nfs_wb_page_cancel(page_file_mapping(page)->host, page);
-
- nfs_fscache_invalidate_page(page, page->mapping->host);
+ wait_on_page_fscache(page);
}

/*
@@ -475,12 +475,11 @@ static void nfs_check_dirty_writeback(struct page *page,
static int nfs_launder_page(struct page *page)
{
struct inode *inode = page_file_mapping(page)->host;
- struct nfs_inode *nfsi = NFS_I(inode);

dfprintk(PAGECACHE, "NFS: launder_page(%ld, %llu)\n",
inode->i_ino, (long long)page_offset(page));

- nfs_fscache_wait_on_page_write(nfsi, page);
+ wait_on_page_fscache(page);
return nfs_wb_page(inode, page);
}

@@ -555,7 +554,11 @@ static vm_fault_t nfs_vm_page_mkwrite(struct vm_fault *vmf)
sb_start_pagefault(inode->i_sb);

/* make sure the cache has finished storing the page */
- nfs_fscache_wait_on_page_write(NFS_I(inode), page);
+ if (PageFsCache(page) &&
+ wait_on_page_fscache_killable(vmf->page) < 0) {
+ ret = VM_FAULT_RETRY;
+ goto out;
+ }

wait_on_bit_action(&NFS_I(inode)->flags, NFS_INO_INVALIDATING,
nfs_wait_bit_killable, TASK_KILLABLE);
diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c
deleted file mode 100644
index 573b1da9342c..000000000000
--- a/fs/nfs/fscache-index.c
+++ /dev/null
@@ -1,140 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/* NFS FS-Cache index structure definition
- *
- * Copyright (C) 2008 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells ([email protected])
- */
-
-#include <linux/init.h>
-#include <linux/kernel.h>
-#include <linux/sched.h>
-#include <linux/mm.h>
-#include <linux/nfs_fs.h>
-#include <linux/nfs_fs_sb.h>
-#include <linux/in6.h>
-#include <linux/iversion.h>
-
-#include "internal.h"
-#include "fscache.h"
-
-#define NFSDBG_FACILITY NFSDBG_FSCACHE
-
-/*
- * Define the NFS filesystem for FS-Cache. Upon registration FS-Cache sticks
- * the cookie for the top-level index object for NFS into here. The top-level
- * index can than have other cache objects inserted into it.
- */
-struct fscache_netfs nfs_fscache_netfs = {
- .name = "nfs",
- .version = 0,
-};
-
-/*
- * Register NFS for caching
- */
-int nfs_fscache_register(void)
-{
- return fscache_register_netfs(&nfs_fscache_netfs);
-}
-
-/*
- * Unregister NFS for caching
- */
-void nfs_fscache_unregister(void)
-{
- fscache_unregister_netfs(&nfs_fscache_netfs);
-}
-
-/*
- * Define the server object for FS-Cache. This is used to describe a server
- * object to fscache_acquire_cookie(). It is keyed by the NFS protocol and
- * server address parameters.
- */
-const struct fscache_cookie_def nfs_fscache_server_index_def = {
- .name = "NFS.server",
- .type = FSCACHE_COOKIE_TYPE_INDEX,
-};
-
-/*
- * Define the superblock object for FS-Cache. This is used to describe a
- * superblock object to fscache_acquire_cookie(). It is keyed by all the NFS
- * parameters that might cause a separate superblock.
- */
-const struct fscache_cookie_def nfs_fscache_super_index_def = {
- .name = "NFS.super",
- .type = FSCACHE_COOKIE_TYPE_INDEX,
-};
-
-/*
- * Consult the netfs about the state of an object
- * - This function can be absent if the index carries no state data
- * - The netfs data from the cookie being used as the target is
- * presented, as is the auxiliary data
- */
-static
-enum fscache_checkaux nfs_fscache_inode_check_aux(void *cookie_netfs_data,
- const void *data,
- uint16_t datalen,
- loff_t object_size)
-{
- struct nfs_fscache_inode_auxdata auxdata;
- struct nfs_inode *nfsi = cookie_netfs_data;
-
- if (datalen != sizeof(auxdata))
- return FSCACHE_CHECKAUX_OBSOLETE;
-
- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.mtime_sec = nfsi->vfs_inode.i_mtime.tv_sec;
- auxdata.mtime_nsec = nfsi->vfs_inode.i_mtime.tv_nsec;
- auxdata.ctime_sec = nfsi->vfs_inode.i_ctime.tv_sec;
- auxdata.ctime_nsec = nfsi->vfs_inode.i_ctime.tv_nsec;
-
- if (NFS_SERVER(&nfsi->vfs_inode)->nfs_client->rpc_ops->version == 4)
- auxdata.change_attr = inode_peek_iversion_raw(&nfsi->vfs_inode);
-
- if (memcmp(data, &auxdata, datalen) != 0)
- return FSCACHE_CHECKAUX_OBSOLETE;
-
- return FSCACHE_CHECKAUX_OKAY;
-}
-
-/*
- * Get an extra reference on a read context.
- * - This function can be absent if the completion function doesn't require a
- * context.
- * - The read context is passed back to NFS in the event that a data read on the
- * cache fails with EIO - in which case the server must be contacted to
- * retrieve the data, which requires the read context for security.
- */
-static void nfs_fh_get_context(void *cookie_netfs_data, void *context)
-{
- get_nfs_open_context(context);
-}
-
-/*
- * Release an extra reference on a read context.
- * - This function can be absent if the completion function doesn't require a
- * context.
- */
-static void nfs_fh_put_context(void *cookie_netfs_data, void *context)
-{
- if (context)
- put_nfs_open_context(context);
-}
-
-/*
- * Define the inode object for FS-Cache. This is used to describe an inode
- * object to fscache_acquire_cookie(). It is keyed by the NFS file handle for
- * an inode.
- *
- * Coherency is managed by comparing the copies of i_size, i_mtime and i_ctime
- * held in the cache auxiliary data for the data storage object with those in
- * the inode struct in memory.
- */
-const struct fscache_cookie_def nfs_fscache_inode_object_def = {
- .name = "NFS.fh",
- .type = FSCACHE_COOKIE_TYPE_DATAFILE,
- .check_aux = nfs_fscache_inode_check_aux,
- .get_context = nfs_fh_get_context,
- .put_context = nfs_fh_put_context,
-};
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index d743629e05e1..fac6438477a0 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -22,24 +22,18 @@

#define NFSDBG_FACILITY NFSDBG_FSCACHE

-static struct rb_root nfs_fscache_keys = RB_ROOT;
-static DEFINE_SPINLOCK(nfs_fscache_keys_lock);
+#define NFS_MAX_KEY_LEN 1000

-/*
- * Layout of the key for an NFS server cache object.
- */
-struct nfs_server_key {
- struct {
- uint16_t nfsversion; /* NFS protocol version */
- uint32_t minorversion; /* NFSv4 minor version */
- uint16_t family; /* address family */
- __be16 port; /* IP port */
- } hdr;
- union {
- struct in_addr ipv4_addr; /* IPv4 address */
- struct in6_addr ipv6_addr; /* IPv6 address */
- };
-} __packed;
+static bool nfs_append_int(char *key, int *_len, unsigned long long x)
+{
+ if (*_len > NFS_MAX_KEY_LEN)
+ return false;
+ if (x == 0)
+ key[(*_len)++] = ',';
+ else
+ *_len += sprintf(key + *_len, ",%llx", x);
+ return true;
+}

/*
* Get the per-client index cookie for an NFS client if the appropriate mount
@@ -47,160 +41,108 @@ struct nfs_server_key {
* - We always try and get an index cookie for the client, but get filehandle
* cookies on a per-superblock basis, depending on the mount flags
*/
-void nfs_fscache_get_client_cookie(struct nfs_client *clp)
+static bool nfs_fscache_get_client_key(struct nfs_client *clp,
+ char *key, int *_len)
{
const struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *) &clp->cl_addr;
const struct sockaddr_in *sin = (struct sockaddr_in *) &clp->cl_addr;
- struct nfs_server_key key;
- uint16_t len = sizeof(key.hdr);

- memset(&key, 0, sizeof(key));
- key.hdr.nfsversion = clp->rpc_ops->version;
- key.hdr.minorversion = clp->cl_minorversion;
- key.hdr.family = clp->cl_addr.ss_family;
+ *_len += snprintf(key + *_len, NFS_MAX_KEY_LEN - *_len,
+ ",%u.%u,%x",
+ clp->rpc_ops->version,
+ clp->cl_minorversion,
+ clp->cl_addr.ss_family);

switch (clp->cl_addr.ss_family) {
case AF_INET:
- key.hdr.port = sin->sin_port;
- key.ipv4_addr = sin->sin_addr;
- len += sizeof(key.ipv4_addr);
- break;
+ if (!nfs_append_int(key, _len, sin->sin_port) ||
+ !nfs_append_int(key, _len, sin->sin_addr.s_addr))
+ return false;
+ return true;

case AF_INET6:
- key.hdr.port = sin6->sin6_port;
- key.ipv6_addr = sin6->sin6_addr;
- len += sizeof(key.ipv6_addr);
- break;
+ if (!nfs_append_int(key, _len, sin6->sin6_port) ||
+ !nfs_append_int(key, _len, sin6->sin6_addr.s6_addr32[0]) ||
+ !nfs_append_int(key, _len, sin6->sin6_addr.s6_addr32[1]) ||
+ !nfs_append_int(key, _len, sin6->sin6_addr.s6_addr32[2]) ||
+ !nfs_append_int(key, _len, sin6->sin6_addr.s6_addr32[3]))
+ return false;
+ return true;

default:
printk(KERN_WARNING "NFS: Unknown network family '%d'\n",
clp->cl_addr.ss_family);
- clp->fscache = NULL;
- return;
+ return false;
}
-
- /* create a cache index for looking up filehandles */
- clp->fscache = fscache_acquire_cookie(nfs_fscache_netfs.primary_index,
- &nfs_fscache_server_index_def,
- &key, len,
- NULL, 0,
- clp, 0, true);
- dfprintk(FSCACHE, "NFS: get client cookie (0x%p/0x%p)\n",
- clp, clp->fscache);
-}
-
-/*
- * Dispose of a per-client cookie
- */
-void nfs_fscache_release_client_cookie(struct nfs_client *clp)
-{
- dfprintk(FSCACHE, "NFS: releasing client cookie (0x%p/0x%p)\n",
- clp, clp->fscache);
-
- fscache_relinquish_cookie(clp->fscache, NULL, false);
- clp->fscache = NULL;
}

/*
- * Get the cache cookie for an NFS superblock. We have to handle
- * uniquification here because the cache doesn't do it for us.
+ * Get the cache cookie for an NFS superblock.
*
* The default uniquifier is just an empty string, but it may be overridden
* either by the 'fsc=xxx' option to mount, or by inheriting it from the parent
* superblock across an automount point of some nature.
*/
-void nfs_fscache_get_super_cookie(struct super_block *sb, const char *uniq, int ulen)
+int nfs_fscache_get_super_cookie(struct super_block *sb, const char *uniq, int ulen)
{
- struct nfs_fscache_key *key, *xkey;
+ struct fscache_volume *vcookie;
struct nfs_server *nfss = NFS_SB(sb);
- struct rb_node **p, *parent;
- int diff;
+ unsigned int len = 3;
+ char *key;

- nfss->fscache_key = NULL;
- nfss->fscache = NULL;
- if (!uniq) {
- uniq = "";
- ulen = 1;
+ if (uniq) {
+ nfss->fscache_uniq = kmemdup_nul(uniq, ulen, GFP_KERNEL);
+ if (!nfss->fscache_uniq)
+ return -ENOMEM;
}

- key = kzalloc(sizeof(*key) + ulen, GFP_KERNEL);
+ key = kmalloc(NFS_MAX_KEY_LEN + 24, GFP_KERNEL);
if (!key)
- return;
-
- key->nfs_client = nfss->nfs_client;
- key->key.super.s_flags = sb->s_flags & NFS_SB_MASK;
- key->key.nfs_server.flags = nfss->flags;
- key->key.nfs_server.rsize = nfss->rsize;
- key->key.nfs_server.wsize = nfss->wsize;
- key->key.nfs_server.acregmin = nfss->acregmin;
- key->key.nfs_server.acregmax = nfss->acregmax;
- key->key.nfs_server.acdirmin = nfss->acdirmin;
- key->key.nfs_server.acdirmax = nfss->acdirmax;
- key->key.nfs_server.fsid = nfss->fsid;
- key->key.rpc_auth.au_flavor = nfss->client->cl_auth->au_flavor;
-
- key->key.uniq_len = ulen;
- memcpy(key->key.uniquifier, uniq, ulen);
-
- spin_lock(&nfs_fscache_keys_lock);
- p = &nfs_fscache_keys.rb_node;
- parent = NULL;
- while (*p) {
- parent = *p;
- xkey = rb_entry(parent, struct nfs_fscache_key, node);
-
- if (key->nfs_client < xkey->nfs_client)
- goto go_left;
- if (key->nfs_client > xkey->nfs_client)
- goto go_right;
-
- diff = memcmp(&key->key, &xkey->key, sizeof(key->key));
- if (diff < 0)
- goto go_left;
- if (diff > 0)
- goto go_right;
-
- if (key->key.uniq_len == 0)
- goto non_unique;
- diff = memcmp(key->key.uniquifier,
- xkey->key.uniquifier,
- key->key.uniq_len);
- if (diff < 0)
- goto go_left;
- if (diff > 0)
- goto go_right;
- goto non_unique;
-
- go_left:
- p = &(*p)->rb_left;
- continue;
- go_right:
- p = &(*p)->rb_right;
+ return -ENOMEM;
+
+ memcpy(key, "nfs", 3);
+ if (!nfs_fscache_get_client_key(nfss->nfs_client, key, &len) ||
+ !nfs_append_int(key, &len, nfss->fsid.major) ||
+ !nfs_append_int(key, &len, nfss->fsid.minor) ||
+ !nfs_append_int(key, &len, sb->s_flags & NFS_SB_MASK) ||
+ !nfs_append_int(key, &len, nfss->flags) ||
+ !nfs_append_int(key, &len, nfss->rsize) ||
+ !nfs_append_int(key, &len, nfss->wsize) ||
+ !nfs_append_int(key, &len, nfss->acregmin) ||
+ !nfs_append_int(key, &len, nfss->acregmax) ||
+ !nfs_append_int(key, &len, nfss->acdirmin) ||
+ !nfs_append_int(key, &len, nfss->acdirmax) ||
+ !nfs_append_int(key, &len, nfss->client->cl_auth->au_flavor))
+ goto out;
+
+ if (ulen > 0) {
+ if (ulen > NFS_MAX_KEY_LEN - len)
+ goto out;
+ key[len++] = ',';
+ memcpy(key + len, uniq, ulen);
+ len += ulen;
}
-
- rb_link_node(&key->node, parent, p);
- rb_insert_color(&key->node, &nfs_fscache_keys);
- spin_unlock(&nfs_fscache_keys_lock);
- nfss->fscache_key = key;
+ key[len] = 0;

/* create a cache index for looking up filehandles */
- nfss->fscache = fscache_acquire_cookie(nfss->nfs_client->fscache,
- &nfs_fscache_super_index_def,
- &key->key,
- sizeof(key->key) + ulen,
- NULL, 0,
- nfss, 0, true);
+ vcookie = fscache_acquire_volume(key,
+ NULL, /* preferred_cache */
+ NULL, 0 /* coherency_data */);
dfprintk(FSCACHE, "NFS: get superblock cookie (0x%p/0x%p)\n",
- nfss, nfss->fscache);
- return;
+ nfss, vcookie);
+ if (IS_ERR(vcookie)) {
+ if (vcookie != ERR_PTR(-EBUSY)) {
+ kfree(key);
+ return PTR_ERR(vcookie);
+ }
+ pr_err("NFS: Cache volume key already in use (%s)\n", key);
+ vcookie = NULL;
+ }
+ nfss->fscache = vcookie;

-non_unique:
- spin_unlock(&nfs_fscache_keys_lock);
+out:
kfree(key);
- nfss->fscache_key = NULL;
- nfss->fscache = NULL;
- printk(KERN_WARNING "NFS:"
- " Cache request denied due to non-unique superblock keys\n");
+ return 0;
}

/*
@@ -213,29 +155,9 @@ void nfs_fscache_release_super_cookie(struct super_block *sb)
dfprintk(FSCACHE, "NFS: releasing superblock cookie (0x%p/0x%p)\n",
nfss, nfss->fscache);

- fscache_relinquish_cookie(nfss->fscache, NULL, false);
+ fscache_relinquish_volume(nfss->fscache, NULL, false);
nfss->fscache = NULL;
-
- if (nfss->fscache_key) {
- spin_lock(&nfs_fscache_keys_lock);
- rb_erase(&nfss->fscache_key->node, &nfs_fscache_keys);
- spin_unlock(&nfs_fscache_keys_lock);
- kfree(nfss->fscache_key);
- nfss->fscache_key = NULL;
- }
-}
-
-static void nfs_fscache_update_auxdata(struct nfs_fscache_inode_auxdata *auxdata,
- struct nfs_inode *nfsi)
-{
- memset(auxdata, 0, sizeof(*auxdata));
- auxdata->mtime_sec = nfsi->vfs_inode.i_mtime.tv_sec;
- auxdata->mtime_nsec = nfsi->vfs_inode.i_mtime.tv_nsec;
- auxdata->ctime_sec = nfsi->vfs_inode.i_ctime.tv_sec;
- auxdata->ctime_nsec = nfsi->vfs_inode.i_ctime.tv_nsec;
-
- if (NFS_SERVER(&nfsi->vfs_inode)->nfs_client->rpc_ops->version == 4)
- auxdata->change_attr = inode_peek_iversion_raw(&nfsi->vfs_inode);
+ kfree(nfss->fscache_uniq);
}

/*
@@ -254,10 +176,12 @@ void nfs_fscache_init_inode(struct inode *inode)
nfs_fscache_update_auxdata(&auxdata, nfsi);

nfsi->fscache = fscache_acquire_cookie(NFS_SB(inode->i_sb)->fscache,
- &nfs_fscache_inode_object_def,
- nfsi->fh.data, nfsi->fh.size,
- &auxdata, sizeof(auxdata),
- nfsi, nfsi->vfs_inode.i_size, false);
+ 0,
+ nfsi->fh.data, /* index_key */
+ nfsi->fh.size,
+ &auxdata, /* aux_data */
+ sizeof(auxdata),
+ i_size_read(&nfsi->vfs_inode));
}

/*
@@ -265,24 +189,15 @@ void nfs_fscache_init_inode(struct inode *inode)
*/
void nfs_fscache_clear_inode(struct inode *inode)
{
- struct nfs_fscache_inode_auxdata auxdata;
struct nfs_inode *nfsi = NFS_I(inode);
struct fscache_cookie *cookie = nfs_i_fscache(inode);

dfprintk(FSCACHE, "NFS: clear cookie (0x%p/0x%p)\n", nfsi, cookie);

- nfs_fscache_update_auxdata(&auxdata, nfsi);
- fscache_relinquish_cookie(cookie, &auxdata, false);
+ fscache_relinquish_cookie(cookie, false);
nfsi->fscache = NULL;
}

-static bool nfs_fscache_can_enable(void *data)
-{
- struct inode *inode = data;
-
- return !inode_is_open_for_write(inode);
-}
-
/*
* Enable or disable caching for a file that is being opened as appropriate.
* The cookie is allocated when the inode is initialised, but is not enabled at
@@ -307,93 +222,31 @@ void nfs_fscache_open_file(struct inode *inode, struct file *filp)
struct nfs_fscache_inode_auxdata auxdata;
struct nfs_inode *nfsi = NFS_I(inode);
struct fscache_cookie *cookie = nfs_i_fscache(inode);
+ bool open_for_write = inode_is_open_for_write(inode);

if (!fscache_cookie_valid(cookie))
return;

- nfs_fscache_update_auxdata(&auxdata, nfsi);
-
- if (inode_is_open_for_write(inode)) {
+ fscache_use_cookie(cookie, open_for_write);
+ if (open_for_write) {
dfprintk(FSCACHE, "NFS: nfsi 0x%p disabling cache\n", nfsi);
- clear_bit(NFS_INO_FSCACHE, &nfsi->flags);
- fscache_disable_cookie(cookie, &auxdata, true);
- fscache_uncache_all_inode_pages(cookie, inode);
- } else {
- dfprintk(FSCACHE, "NFS: nfsi 0x%p enabling cache\n", nfsi);
- fscache_enable_cookie(cookie, &auxdata, nfsi->vfs_inode.i_size,
- nfs_fscache_can_enable, inode);
- if (fscache_cookie_enabled(cookie))
- set_bit(NFS_INO_FSCACHE, &NFS_I(inode)->flags);
+ nfs_fscache_update_auxdata(&auxdata, nfsi);
+ fscache_invalidate(cookie, &auxdata, i_size_read(inode),
+ FSCACHE_INVAL_DIO_WRITE);
}
}
EXPORT_SYMBOL_GPL(nfs_fscache_open_file);

-/*
- * Release the caching state associated with a page, if the page isn't busy
- * interacting with the cache.
- * - Returns true (can release page) or false (page busy).
- */
-int nfs_fscache_release_page(struct page *page, gfp_t gfp)
-{
- if (PageFsCache(page)) {
- struct fscache_cookie *cookie = nfs_i_fscache(page->mapping->host);
-
- BUG_ON(!cookie);
- dfprintk(FSCACHE, "NFS: fscache releasepage (0x%p/0x%p/0x%p)\n",
- cookie, page, NFS_I(page->mapping->host));
-
- if (!fscache_maybe_release_page(cookie, page, gfp))
- return 0;
-
- nfs_inc_fscache_stats(page->mapping->host,
- NFSIOS_FSCACHE_PAGES_UNCACHED);
- }
-
- return 1;
-}
-
-/*
- * Release the caching state associated with a page if undergoing complete page
- * invalidation.
- */
-void __nfs_fscache_invalidate_page(struct page *page, struct inode *inode)
+void nfs_fscache_release_file(struct inode *inode, struct file *filp)
{
+ struct nfs_fscache_inode_auxdata auxdata;
+ struct nfs_inode *nfsi = NFS_I(inode);
struct fscache_cookie *cookie = nfs_i_fscache(inode);

- BUG_ON(!cookie);
-
- dfprintk(FSCACHE, "NFS: fscache invalidatepage (0x%p/0x%p/0x%p)\n",
- cookie, page, NFS_I(inode));
-
- fscache_wait_on_page_write(cookie, page);
-
- BUG_ON(!PageLocked(page));
- fscache_uncache_page(cookie, page);
- nfs_inc_fscache_stats(page->mapping->host,
- NFSIOS_FSCACHE_PAGES_UNCACHED);
-}
-
-/*
- * Handle completion of a page being read from the cache.
- * - Called in process (keventd) context.
- */
-static void nfs_readpage_from_fscache_complete(struct page *page,
- void *context,
- int error)
-{
- dfprintk(FSCACHE,
- "NFS: readpage_from_fscache_complete (0x%p/0x%p/%d)\n",
- page, context, error);
-
- /*
- * If the read completes with an error, mark the page with PG_checked,
- * unlock the page, and let the VM reissue the readpage.
- */
- if (!error)
- SetPageUptodate(page);
- else
- SetPageChecked(page);
- unlock_page(page);
+ if (fscache_cookie_valid(cookie)) {
+ nfs_fscache_update_auxdata(&auxdata, nfsi);
+ fscache_unuse_cookie(cookie, &auxdata, NULL);
+ }
}

/*
@@ -402,8 +255,6 @@ static void nfs_readpage_from_fscache_complete(struct page *page,
int __nfs_readpage_from_fscache(struct nfs_open_context *ctx,
struct inode *inode, struct page *page)
{
- int ret;
-
dfprintk(FSCACHE,
"NFS: readpage_from_fscache(fsc:%p/p:%p(i:%lx f:%lx)/0x%p)\n",
nfs_i_fscache(inode), page, page->index, page->flags, inode);
@@ -413,31 +264,7 @@ int __nfs_readpage_from_fscache(struct nfs_open_context *ctx,
return 1;
}

- ret = fscache_read_or_alloc_page(nfs_i_fscache(inode),
- page,
- nfs_readpage_from_fscache_complete,
- ctx,
- GFP_KERNEL);
-
- switch (ret) {
- case 0: /* read BIO submitted (page in fscache) */
- dfprintk(FSCACHE,
- "NFS: readpage_from_fscache: BIO submitted\n");
- nfs_inc_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_READ_OK);
- return ret;
-
- case -ENOBUFS: /* inode not in cache */
- case -ENODATA: /* page not in cache */
- nfs_inc_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_READ_FAIL);
- dfprintk(FSCACHE,
- "NFS: readpage_from_fscache %d\n", ret);
- return 1;
-
- default:
- dfprintk(FSCACHE, "NFS: readpage_from_fscache %d\n", ret);
- nfs_inc_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_READ_FAIL);
- }
- return ret;
+ return -ENOBUFS; // TODO: Use netfslib
}

/*
@@ -449,45 +276,10 @@ int __nfs_readpages_from_fscache(struct nfs_open_context *ctx,
struct list_head *pages,
unsigned *nr_pages)
{
- unsigned npages = *nr_pages;
- int ret;
-
dfprintk(FSCACHE, "NFS: nfs_getpages_from_fscache (0x%p/%u/0x%p)\n",
- nfs_i_fscache(inode), npages, inode);
-
- ret = fscache_read_or_alloc_pages(nfs_i_fscache(inode),
- mapping, pages, nr_pages,
- nfs_readpage_from_fscache_complete,
- ctx,
- mapping_gfp_mask(mapping));
- if (*nr_pages < npages)
- nfs_add_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_READ_OK,
- npages);
- if (*nr_pages > 0)
- nfs_add_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_READ_FAIL,
- *nr_pages);
-
- switch (ret) {
- case 0: /* read submitted to the cache for all pages */
- BUG_ON(!list_empty(pages));
- BUG_ON(*nr_pages != 0);
- dfprintk(FSCACHE,
- "NFS: nfs_getpages_from_fscache: submitted\n");
-
- return ret;
-
- case -ENOBUFS: /* some pages aren't cached and can't be */
- case -ENODATA: /* some pages aren't cached */
- dfprintk(FSCACHE,
- "NFS: nfs_getpages_from_fscache: no page: %d\n", ret);
- return 1;
+ nfs_i_fscache(inode), *nr_pages, inode);

- default:
- dfprintk(FSCACHE,
- "NFS: nfs_getpages_from_fscache: ret %d\n", ret);
- }
-
- return ret;
+ return -ENOBUFS; // TODO: Use netfslib
}

/*
@@ -496,25 +288,9 @@ int __nfs_readpages_from_fscache(struct nfs_open_context *ctx,
*/
void __nfs_readpage_to_fscache(struct inode *inode, struct page *page, int sync)
{
- int ret;
-
dfprintk(FSCACHE,
"NFS: readpage_to_fscache(fsc:%p/p:%p(i:%lx f:%lx)/%d)\n",
nfs_i_fscache(inode), page, page->index, page->flags, sync);

- ret = fscache_write_page(nfs_i_fscache(inode), page,
- inode->i_size, GFP_KERNEL);
- dfprintk(FSCACHE,
- "NFS: readpage_to_fscache: p:%p(i:%lu f:%lx) ret %d\n",
- page, page->index, page->flags, ret);
-
- if (ret != 0) {
- fscache_uncache_page(nfs_i_fscache(inode), page);
- nfs_inc_fscache_stats(inode,
- NFSIOS_FSCACHE_PAGES_WRITTEN_FAIL);
- nfs_inc_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_UNCACHED);
- } else {
- nfs_inc_fscache_stats(inode,
- NFSIOS_FSCACHE_PAGES_WRITTEN_OK);
- }
+ return; // TODO: Use netfslib
}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 6754c8607230..0fa267243d26 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -12,46 +12,10 @@
#include <linux/nfs_mount.h>
#include <linux/nfs4_mount.h>
#include <linux/fscache.h>
+#include <linux/iversion.h>

#ifdef CONFIG_NFS_FSCACHE

-/*
- * set of NFS FS-Cache objects that form a superblock key
- */
-struct nfs_fscache_key {
- struct rb_node node;
- struct nfs_client *nfs_client; /* the server */
-
- /* the elements of the unique key - as used by nfs_compare_super() and
- * nfs_compare_mount_options() to distinguish superblocks */
- struct {
- struct {
- unsigned long s_flags; /* various flags
- * (& NFS_MS_MASK) */
- } super;
-
- struct {
- struct nfs_fsid fsid;
- int flags;
- unsigned int rsize; /* read size */
- unsigned int wsize; /* write size */
- unsigned int acregmin; /* attr cache timeouts */
- unsigned int acregmax;
- unsigned int acdirmin;
- unsigned int acdirmax;
- } nfs_server;
-
- struct {
- rpc_authflavor_t au_flavor;
- } rpc_auth;
-
- /* uniquifier - can be used if nfs_server.flags includes
- * NFS_MOUNT_UNSHARED */
- u8 uniq_len;
- char uniquifier[0];
- } key;
-};
-
/*
* Definition of the auxiliary data attached to NFS inode storage objects
* within the cache.
@@ -69,32 +33,18 @@ struct nfs_fscache_inode_auxdata {
u64 change_attr;
};

-/*
- * fscache-index.c
- */
-extern struct fscache_netfs nfs_fscache_netfs;
-extern const struct fscache_cookie_def nfs_fscache_server_index_def;
-extern const struct fscache_cookie_def nfs_fscache_super_index_def;
-extern const struct fscache_cookie_def nfs_fscache_inode_object_def;
-
-extern int nfs_fscache_register(void);
-extern void nfs_fscache_unregister(void);
-
/*
* fscache.c
*/
-extern void nfs_fscache_get_client_cookie(struct nfs_client *);
-extern void nfs_fscache_release_client_cookie(struct nfs_client *);
-
-extern void nfs_fscache_get_super_cookie(struct super_block *, const char *, int);
+extern int nfs_fscache_get_super_cookie(struct super_block *, const char *, int);
extern void nfs_fscache_release_super_cookie(struct super_block *);

extern void nfs_fscache_init_inode(struct inode *);
extern void nfs_fscache_clear_inode(struct inode *);
extern void nfs_fscache_open_file(struct inode *, struct file *);
+extern void nfs_fscache_release_file(struct inode *, struct file *);

extern void __nfs_fscache_invalidate_page(struct page *, struct inode *);
-extern int nfs_fscache_release_page(struct page *, gfp_t);

extern int __nfs_readpage_from_fscache(struct nfs_open_context *,
struct inode *, struct page *);
@@ -103,25 +53,17 @@ extern int __nfs_readpages_from_fscache(struct nfs_open_context *,
struct list_head *, unsigned *);
extern void __nfs_readpage_to_fscache(struct inode *, struct page *, int);

-/*
- * wait for a page to complete writing to the cache
- */
-static inline void nfs_fscache_wait_on_page_write(struct nfs_inode *nfsi,
- struct page *page)
-{
- if (PageFsCache(page))
- fscache_wait_on_page_write(nfsi->fscache, page);
-}
-
-/*
- * release the caching state associated with a page if undergoing complete page
- * invalidation
- */
-static inline void nfs_fscache_invalidate_page(struct page *page,
- struct inode *inode)
+static inline int nfs_fscache_release_page(struct page *page, gfp_t gfp)
{
- if (PageFsCache(page))
- __nfs_fscache_invalidate_page(page, inode);
+ if (PageFsCache(page)) {
+ if (!gfpflags_allow_blocking(gfp) || !(gfp & __GFP_FS))
+ return false;
+ wait_on_page_fscache(page);
+ fscache_note_page_release(nfs_i_fscache(page->mapping->host));
+ nfs_inc_fscache_stats(page->mapping->host,
+ NFSIOS_FSCACHE_PAGES_UNCACHED);
+ }
+ return true;
}

/*
@@ -163,20 +105,32 @@ static inline void nfs_readpage_to_fscache(struct inode *inode,
__nfs_readpage_to_fscache(inode, page, sync);
}

-/*
- * Invalidate the contents of fscache for this inode. This will not sleep.
- */
-static inline void nfs_fscache_invalidate(struct inode *inode)
+static inline void nfs_fscache_update_auxdata(struct nfs_fscache_inode_auxdata *auxdata,
+ struct nfs_inode *nfsi)
{
- fscache_invalidate(NFS_I(inode)->fscache);
+ memset(auxdata, 0, sizeof(*auxdata));
+ auxdata->mtime_sec = nfsi->vfs_inode.i_mtime.tv_sec;
+ auxdata->mtime_nsec = nfsi->vfs_inode.i_mtime.tv_nsec;
+ auxdata->ctime_sec = nfsi->vfs_inode.i_ctime.tv_sec;
+ auxdata->ctime_nsec = nfsi->vfs_inode.i_ctime.tv_nsec;
+
+ if (NFS_SERVER(&nfsi->vfs_inode)->nfs_client->rpc_ops->version == 4)
+ auxdata->change_attr = inode_peek_iversion_raw(&nfsi->vfs_inode);
}

/*
- * Wait for an object to finish being invalidated.
+ * Invalidate the contents of fscache for this inode. This will not sleep.
*/
-static inline void nfs_fscache_wait_on_invalidate(struct inode *inode)
+static inline void nfs_fscache_invalidate(struct inode *inode, int flags)
{
- fscache_wait_on_invalidate(NFS_I(inode)->fscache);
+ struct nfs_fscache_inode_auxdata auxdata;
+ struct nfs_inode *nfsi = NFS_I(inode);
+
+ if (nfsi->fscache) {
+ nfs_fscache_update_auxdata(&auxdata, nfsi);
+ fscache_invalidate(nfsi->fscache, &auxdata,
+ i_size_read(&nfsi->vfs_inode), flags);
+ }
}

/*
@@ -190,28 +144,18 @@ static inline const char *nfs_server_fscache_state(struct nfs_server *server)
}

#else /* CONFIG_NFS_FSCACHE */
-static inline int nfs_fscache_register(void) { return 0; }
-static inline void nfs_fscache_unregister(void) {}
-
-static inline void nfs_fscache_get_client_cookie(struct nfs_client *clp) {}
-static inline void nfs_fscache_release_client_cookie(struct nfs_client *clp) {}
-
static inline void nfs_fscache_release_super_cookie(struct super_block *sb) {}

static inline void nfs_fscache_init_inode(struct inode *inode) {}
static inline void nfs_fscache_clear_inode(struct inode *inode) {}
static inline void nfs_fscache_open_file(struct inode *inode,
struct file *filp) {}
+static inline void nfs_fscache_release_file(struct inode *inode, struct file *file) {}

static inline int nfs_fscache_release_page(struct page *page, gfp_t gfp)
{
return 1; /* True: may release page */
}
-static inline void nfs_fscache_invalidate_page(struct page *page,
- struct inode *inode) {}
-static inline void nfs_fscache_wait_on_page_write(struct nfs_inode *nfsi,
- struct page *page) {}
-
static inline int nfs_readpage_from_fscache(struct nfs_open_context *ctx,
struct inode *inode,
struct page *page)
@@ -230,8 +174,7 @@ static inline void nfs_readpage_to_fscache(struct inode *inode,
struct page *page, int sync) {}


-static inline void nfs_fscache_invalidate(struct inode *inode) {}
-static inline void nfs_fscache_wait_on_invalidate(struct inode *inode) {}
+static inline void nfs_fscache_invalidate(struct inode *inode, int flags) {}

static inline const char *nfs_server_fscache_state(struct nfs_server *server)
{
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index fda530d5e764..a918c3a834b6 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -209,7 +209,7 @@ void nfs_set_cache_invalid(struct inode *inode, unsigned long flags)
if (!nfs_has_xattr_cache(nfsi))
flags &= ~NFS_INO_INVALID_XATTR;
if (flags & NFS_INO_INVALID_DATA)
- nfs_fscache_invalidate(inode);
+ nfs_fscache_invalidate(inode, 0);
flags &= ~(NFS_INO_REVAL_PAGECACHE | NFS_INO_REVAL_FORCED);

nfsi->cache_validity |= flags;
@@ -1289,6 +1289,7 @@ static int nfs_invalidate_mapping(struct inode *inode, struct address_space *map
{
int ret;

+ nfs_fscache_invalidate(inode, 0);
if (mapping->nrpages != 0) {
if (S_ISREG(inode->i_mode)) {
ret = nfs_sync_mapping(mapping);
@@ -1300,7 +1301,6 @@ static int nfs_invalidate_mapping(struct inode *inode, struct address_space *map
return ret;
}
nfs_inc_stats(inode, NFSIOS_DATAINVALIDATE);
- nfs_fscache_wait_on_invalidate(inode);

dfprintk(PAGECACHE, "NFS: (%s/%Lu) data cache invalidated\n",
inode->i_sb->s_id,
@@ -2374,10 +2374,6 @@ static int __init init_nfs_fs(void)
if (err < 0)
goto out9;

- err = nfs_fscache_register();
- if (err < 0)
- goto out8;
-
err = nfsiod_start();
if (err)
goto out7;
@@ -2429,8 +2425,6 @@ static int __init init_nfs_fs(void)
out6:
nfsiod_stop();
out7:
- nfs_fscache_unregister();
-out8:
unregister_pernet_subsys(&nfs_net_ops);
out9:
nfs_sysfs_exit();
@@ -2445,7 +2439,6 @@ static void __exit exit_nfs_fs(void)
nfs_destroy_readpagecache();
nfs_destroy_inodecache();
nfs_destroy_nfspagecache();
- nfs_fscache_unregister();
unregister_pernet_subsys(&nfs_net_ops);
rpc_proc_unregister(&init_net, "nfs");
unregister_nfs_fs();
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index b3aee261801e..317ce27bdc4b 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -42,7 +42,6 @@
{ BIT(NFS_INO_ACL_LRU_SET), "ACL_LRU_SET" }, \
{ BIT(NFS_INO_INVALIDATING), "INVALIDATING" }, \
{ BIT(NFS_INO_FSCACHE), "FSCACHE" }, \
- { BIT(NFS_INO_FSCACHE_LOCK), "FSCACHE_LOCK" }, \
{ BIT(NFS_INO_LAYOUTCOMMIT), "NEED_LAYOUTCOMMIT" }, \
{ BIT(NFS_INO_LAYOUTCOMMITTING), "LAYOUTCOMMIT" }, \
{ BIT(NFS_INO_LAYOUTSTATS), "LAYOUTSTATS" }, \
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 3aced401735c..6ab5eeb000dc 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -1204,42 +1204,42 @@ static int nfs_compare_super(struct super_block *sb, struct fs_context *fc)
}

#ifdef CONFIG_NFS_FSCACHE
-static void nfs_get_cache_cookie(struct super_block *sb,
- struct nfs_fs_context *ctx)
+static int nfs_get_cache_cookie(struct super_block *sb,
+ struct nfs_fs_context *ctx)
{
struct nfs_server *nfss = NFS_SB(sb);
char *uniq = NULL;
int ulen = 0;

- nfss->fscache_key = NULL;
nfss->fscache = NULL;

if (!ctx)
- return;
+ return 0;

if (ctx->clone_data.sb) {
struct nfs_server *mnt_s = NFS_SB(ctx->clone_data.sb);
if (!(mnt_s->options & NFS_OPTION_FSCACHE))
- return;
- if (mnt_s->fscache_key) {
- uniq = mnt_s->fscache_key->key.uniquifier;
- ulen = mnt_s->fscache_key->key.uniq_len;
+ return 0;
+ if (mnt_s->fscache_uniq) {
+ uniq = mnt_s->fscache_uniq;
+ ulen = strlen(uniq);
}
} else {
if (!(ctx->options & NFS_OPTION_FSCACHE))
- return;
+ return 0;
if (ctx->fscache_uniq) {
uniq = ctx->fscache_uniq;
ulen = strlen(ctx->fscache_uniq);
}
}

- nfs_fscache_get_super_cookie(sb, uniq, ulen);
+ return nfs_fscache_get_super_cookie(sb, uniq, ulen);
}
#else
-static void nfs_get_cache_cookie(struct super_block *sb,
- struct nfs_fs_context *ctx)
+static int nfs_get_cache_cookie(struct super_block *sb,
+ struct nfs_fs_context *ctx)
{
+ return 0;
}
#endif

@@ -1299,7 +1299,9 @@ int nfs_get_tree_common(struct fs_context *fc)
s->s_blocksize_bits = bsize;
s->s_blocksize = 1U << bsize;
}
- nfs_get_cache_cookie(s, ctx);
+ error = nfs_get_cache_cookie(s, ctx);
+ if (error < 0)
+ goto error_splat_super;
}

error = nfs_get_root(s, fc);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 9b7619ce17a7..2b322170372a 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -294,6 +294,7 @@ static void nfs_grow_file(struct page *page, unsigned int offset, unsigned int c
nfs_inc_stats(inode, NFSIOS_EXTENDWRITE);
out:
spin_unlock(&inode->i_lock);
+ nfs_fscache_invalidate(inode, 0);
}

/* A writeback failed: mark the page as bad, and invalidate the page cache */
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 05f249f20f55..00835bacd236 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -275,7 +275,6 @@ struct nfs4_copy_state {
#define NFS_INO_ACL_LRU_SET (2) /* Inode is on the LRU list */
#define NFS_INO_INVALIDATING (3) /* inode is being invalidated */
#define NFS_INO_FSCACHE (5) /* inode can be cached by FS-Cache */
-#define NFS_INO_FSCACHE_LOCK (6) /* FS-Cache cookie management lock */
#define NFS_INO_FORCE_READDIR (7) /* force readdirplus */
#define NFS_INO_LAYOUTCOMMIT (9) /* layoutcommit required */
#define NFS_INO_LAYOUTCOMMITTING (10) /* layoutcommit inflight */
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 2a9acbfe00f0..77b2dba27bbb 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -120,11 +120,6 @@ struct nfs_client {
* This is used to generate the mv0 callback address.
*/
char cl_ipaddr[48];
-
-#ifdef CONFIG_NFS_FSCACHE
- struct fscache_cookie *fscache; /* client index cache cookie */
-#endif
-
struct net *cl_net;
struct list_head pending_cb_stateids;
};
@@ -194,8 +189,8 @@ struct nfs_server {
struct nfs_auth_info auth_info; /* parsed auth flavors */

#ifdef CONFIG_NFS_FSCACHE
- struct nfs_fscache_key *fscache_key; /* unique key for superblock */
- struct fscache_cookie *fscache; /* superblock cookie */
+ struct fscache_volume *fscache; /* superblock cookie */
+ char *fscache_uniq; /* Uniquifier (or NULL) */
#endif

u32 pnfs_blksize; /* layout_blksize attr */



2021-12-22 23:29:56

by David Howells

[permalink] [raw]
Subject: [PATCH v4 62/68] nfs: Implement cache I/O by accessing the cache directly

Move NFS to using fscache DIO API instead of the old upstream I/O API as
that has been removed. This is a stopgap solution as the intention is that
at sometime in the future, the cache will move to using larger blocks and
won't be able to store individual pages in order to deal with the potential
for data corruption due to the backing filesystem being able insert/remove
bridging blocks of zeros into its extent list[1].

NFS then reads and writes cache pages synchronously and one page at a time.

The preferred change would be to use the netfs lib, but the new I/O API can
be used directly. It's just that as the cache now needs to track data for
itself, caching blocks may exceed page size...

This code is somewhat borrowed from my "fallback I/O" patchset[2].

Changes
=======
ver #3:
- Restore lost =n fallback for nfs_fscache_release_page()[2].

Signed-off-by: David Howells <[email protected]>
cc: Dave Wysochanski <[email protected]>
cc: Trond Myklebust <[email protected]>
cc: Anna Schumaker <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected] [1]
Link: https://lore.kernel.org/r/[email protected]/ [2]
Link: https://lore.kernel.org/r/163189108292.2509237.12615909591150927232.stgit@warthog.procyon.org.uk/ [2]
Link: https://lore.kernel.org/r/163906981318.143852.17220018647843475985.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967184451.1823006.6450645559828329590.stgit@warthog.procyon.org.uk/ # v3
---

fs/fscache/io.c | 8 +++
fs/nfs/fscache.c | 126 ++++++++++++++++++++++++++++++++++++++---------
fs/nfs/fscache.h | 52 ++++---------------
fs/nfs/read.c | 25 +++------
fs/nfs/write.c | 7 ++-
include/linux/fscache.h | 28 ++++++++++
6 files changed, 163 insertions(+), 83 deletions(-)

diff --git a/fs/fscache/io.c b/fs/fscache/io.c
index bed7628a5a9d..7a769ea57720 100644
--- a/fs/fscache/io.c
+++ b/fs/fscache/io.c
@@ -150,6 +150,14 @@ int __fscache_begin_read_operation(struct netfs_cache_resources *cres,
}
EXPORT_SYMBOL(__fscache_begin_read_operation);

+int __fscache_begin_write_operation(struct netfs_cache_resources *cres,
+ struct fscache_cookie *cookie)
+{
+ return fscache_begin_operation(cres, cookie, FSCACHE_WANT_PARAMS,
+ fscache_access_io_write);
+}
+EXPORT_SYMBOL(__fscache_begin_write_operation);
+
/**
* fscache_set_page_dirty - Mark page dirty and pin a cache object for writeback
* @page: The page being dirtied
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index fac6438477a0..cfe901650ab0 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -249,48 +249,128 @@ void nfs_fscache_release_file(struct inode *inode, struct file *filp)
}
}

+static inline void fscache_end_operation(struct netfs_cache_resources *cres)
+{
+ const struct netfs_cache_ops *ops = fscache_operation_valid(cres);
+
+ if (ops)
+ ops->end_operation(cres);
+}
+
+/*
+ * Fallback page reading interface.
+ */
+static int fscache_fallback_read_page(struct inode *inode, struct page *page)
+{
+ struct netfs_cache_resources cres;
+ struct fscache_cookie *cookie = nfs_i_fscache(inode);
+ struct iov_iter iter;
+ struct bio_vec bvec[1];
+ int ret;
+
+ memset(&cres, 0, sizeof(cres));
+ bvec[0].bv_page = page;
+ bvec[0].bv_offset = 0;
+ bvec[0].bv_len = PAGE_SIZE;
+ iov_iter_bvec(&iter, READ, bvec, ARRAY_SIZE(bvec), PAGE_SIZE);
+
+ ret = fscache_begin_read_operation(&cres, cookie);
+ if (ret < 0)
+ return ret;
+
+ ret = fscache_read(&cres, page_offset(page), &iter, NETFS_READ_HOLE_FAIL,
+ NULL, NULL);
+ fscache_end_operation(&cres);
+ return ret;
+}
+
+/*
+ * Fallback page writing interface.
+ */
+static int fscache_fallback_write_page(struct inode *inode, struct page *page,
+ bool no_space_allocated_yet)
+{
+ struct netfs_cache_resources cres;
+ struct fscache_cookie *cookie = nfs_i_fscache(inode);
+ struct iov_iter iter;
+ struct bio_vec bvec[1];
+ loff_t start = page_offset(page);
+ size_t len = PAGE_SIZE;
+ int ret;
+
+ memset(&cres, 0, sizeof(cres));
+ bvec[0].bv_page = page;
+ bvec[0].bv_offset = 0;
+ bvec[0].bv_len = PAGE_SIZE;
+ iov_iter_bvec(&iter, WRITE, bvec, ARRAY_SIZE(bvec), PAGE_SIZE);
+
+ ret = fscache_begin_write_operation(&cres, cookie);
+ if (ret < 0)
+ return ret;
+
+ ret = cres.ops->prepare_write(&cres, &start, &len, i_size_read(inode),
+ no_space_allocated_yet);
+ if (ret == 0)
+ ret = fscache_write(&cres, page_offset(page), &iter, NULL, NULL);
+ fscache_end_operation(&cres);
+ return ret;
+}
+
/*
* Retrieve a page from fscache
*/
-int __nfs_readpage_from_fscache(struct nfs_open_context *ctx,
- struct inode *inode, struct page *page)
+int __nfs_readpage_from_fscache(struct inode *inode, struct page *page)
{
+ int ret;
+
dfprintk(FSCACHE,
"NFS: readpage_from_fscache(fsc:%p/p:%p(i:%lx f:%lx)/0x%p)\n",
nfs_i_fscache(inode), page, page->index, page->flags, inode);

if (PageChecked(page)) {
+ dfprintk(FSCACHE, "NFS: readpage_from_fscache: PageChecked\n");
ClearPageChecked(page);
return 1;
}

- return -ENOBUFS; // TODO: Use netfslib
-}
-
-/*
- * Retrieve a set of pages from fscache
- */
-int __nfs_readpages_from_fscache(struct nfs_open_context *ctx,
- struct inode *inode,
- struct address_space *mapping,
- struct list_head *pages,
- unsigned *nr_pages)
-{
- dfprintk(FSCACHE, "NFS: nfs_getpages_from_fscache (0x%p/%u/0x%p)\n",
- nfs_i_fscache(inode), *nr_pages, inode);
+ ret = fscache_fallback_read_page(inode, page);
+ if (ret < 0) {
+ nfs_inc_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_READ_FAIL);
+ dfprintk(FSCACHE,
+ "NFS: readpage_from_fscache failed %d\n", ret);
+ SetPageChecked(page);
+ return ret;
+ }

- return -ENOBUFS; // TODO: Use netfslib
+ /* Read completed synchronously */
+ dfprintk(FSCACHE, "NFS: readpage_from_fscache: read successful\n");
+ nfs_inc_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_READ_OK);
+ SetPageUptodate(page);
+ return 0;
}

/*
- * Store a newly fetched page in fscache
- * - PG_fscache must be set on the page
+ * Store a newly fetched page in fscache. We can be certain there's no page
+ * stored in the cache as yet otherwise we would've read it from there.
*/
-void __nfs_readpage_to_fscache(struct inode *inode, struct page *page, int sync)
+void __nfs_readpage_to_fscache(struct inode *inode, struct page *page)
{
+ int ret;
+
dfprintk(FSCACHE,
- "NFS: readpage_to_fscache(fsc:%p/p:%p(i:%lx f:%lx)/%d)\n",
- nfs_i_fscache(inode), page, page->index, page->flags, sync);
+ "NFS: readpage_to_fscache(fsc:%p/p:%p(i:%lx f:%lx))\n",
+ nfs_i_fscache(inode), page, page->index, page->flags);

- return; // TODO: Use netfslib
+ ret = fscache_fallback_write_page(inode, page, true);
+
+ dfprintk(FSCACHE,
+ "NFS: readpage_to_fscache: p:%p(i:%lu f:%lx) ret %d\n",
+ page, page->index, page->flags, ret);
+
+ if (ret != 0) {
+ nfs_inc_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_WRITTEN_FAIL);
+ nfs_inc_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_UNCACHED);
+ } else {
+ nfs_inc_fscache_stats(inode, NFSIOS_FSCACHE_PAGES_WRITTEN_OK);
+ }
}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index 0fa267243d26..e0220fc40366 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -44,14 +44,10 @@ extern void nfs_fscache_clear_inode(struct inode *);
extern void nfs_fscache_open_file(struct inode *, struct file *);
extern void nfs_fscache_release_file(struct inode *, struct file *);

-extern void __nfs_fscache_invalidate_page(struct page *, struct inode *);
-
-extern int __nfs_readpage_from_fscache(struct nfs_open_context *,
- struct inode *, struct page *);
-extern int __nfs_readpages_from_fscache(struct nfs_open_context *,
- struct inode *, struct address_space *,
- struct list_head *, unsigned *);
-extern void __nfs_readpage_to_fscache(struct inode *, struct page *, int);
+extern int __nfs_readpage_from_fscache(struct inode *, struct page *);
+extern void __nfs_read_completion_to_fscache(struct nfs_pgio_header *hdr,
+ unsigned long bytes);
+extern void __nfs_readpage_to_fscache(struct inode *, struct page *);

static inline int nfs_fscache_release_page(struct page *page, gfp_t gfp)
{
@@ -69,27 +65,11 @@ static inline int nfs_fscache_release_page(struct page *page, gfp_t gfp)
/*
* Retrieve a page from an inode data storage object.
*/
-static inline int nfs_readpage_from_fscache(struct nfs_open_context *ctx,
- struct inode *inode,
+static inline int nfs_readpage_from_fscache(struct inode *inode,
struct page *page)
{
if (NFS_I(inode)->fscache)
- return __nfs_readpage_from_fscache(ctx, inode, page);
- return -ENOBUFS;
-}
-
-/*
- * Retrieve a set of pages from an inode data storage object.
- */
-static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
- struct inode *inode,
- struct address_space *mapping,
- struct list_head *pages,
- unsigned *nr_pages)
-{
- if (NFS_I(inode)->fscache)
- return __nfs_readpages_from_fscache(ctx, inode, mapping, pages,
- nr_pages);
+ return __nfs_readpage_from_fscache(inode, page);
return -ENOBUFS;
}

@@ -98,11 +78,10 @@ static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
* in the cache.
*/
static inline void nfs_readpage_to_fscache(struct inode *inode,
- struct page *page,
- int sync)
+ struct page *page)
{
- if (PageFsCache(page))
- __nfs_readpage_to_fscache(inode, page, sync);
+ if (NFS_I(inode)->fscache)
+ __nfs_readpage_to_fscache(inode, page);
}

static inline void nfs_fscache_update_auxdata(struct nfs_fscache_inode_auxdata *auxdata,
@@ -156,22 +135,13 @@ static inline int nfs_fscache_release_page(struct page *page, gfp_t gfp)
{
return 1; /* True: may release page */
}
-static inline int nfs_readpage_from_fscache(struct nfs_open_context *ctx,
- struct inode *inode,
+static inline int nfs_readpage_from_fscache(struct inode *inode,
struct page *page)
{
return -ENOBUFS;
}
-static inline int nfs_readpages_from_fscache(struct nfs_open_context *ctx,
- struct inode *inode,
- struct address_space *mapping,
- struct list_head *pages,
- unsigned *nr_pages)
-{
- return -ENOBUFS;
-}
static inline void nfs_readpage_to_fscache(struct inode *inode,
- struct page *page, int sync) {}
+ struct page *page) {}


static inline void nfs_fscache_invalidate(struct inode *inode, int flags) {}
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index d11af2a9299c..eb00229c1a50 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -123,7 +123,7 @@ static void nfs_readpage_release(struct nfs_page *req, int error)
struct address_space *mapping = page_file_mapping(page);

if (PageUptodate(page))
- nfs_readpage_to_fscache(inode, page, 0);
+ nfs_readpage_to_fscache(inode, page);
else if (!PageError(page) && !PagePrivate(page))
generic_error_remove_page(mapping, page);
unlock_page(page);
@@ -305,6 +305,12 @@ readpage_async_filler(void *data, struct page *page)

aligned_len = min_t(unsigned int, ALIGN(len, rsize), PAGE_SIZE);

+ if (!IS_SYNC(page->mapping->host)) {
+ error = nfs_readpage_from_fscache(page->mapping->host, page);
+ if (error == 0)
+ goto out_unlock;
+ }
+
new = nfs_create_request(desc->ctx, page, 0, aligned_len);
if (IS_ERR(new))
goto out_error;
@@ -320,6 +326,7 @@ readpage_async_filler(void *data, struct page *page)
return 0;
out_error:
error = PTR_ERR(new);
+out_unlock:
unlock_page(page);
out:
return error;
@@ -366,12 +373,6 @@ int nfs_readpage(struct file *file, struct page *page)
desc.ctx = get_nfs_open_context(nfs_file_open_context(file));

xchg(&desc.ctx->error, 0);
- if (!IS_SYNC(inode)) {
- ret = nfs_readpage_from_fscache(desc.ctx, inode, page);
- if (ret == 0)
- goto out_wait;
- }
-
nfs_pageio_init_read(&desc.pgio, inode, false,
&nfs_async_read_completion_ops);

@@ -381,7 +382,6 @@ int nfs_readpage(struct file *file, struct page *page)

nfs_pageio_complete_read(&desc.pgio);
ret = desc.pgio.pg_error < 0 ? desc.pgio.pg_error : 0;
-out_wait:
if (!ret) {
ret = wait_on_page_locked_killable(page);
if (!PageUptodate(page) && !ret)
@@ -419,14 +419,6 @@ int nfs_readpages(struct file *file, struct address_space *mapping,
} else
desc.ctx = get_nfs_open_context(nfs_file_open_context(file));

- /* attempt to read as many of the pages as possible from the cache
- * - this returns -ENOBUFS immediately if the cookie is negative
- */
- ret = nfs_readpages_from_fscache(desc.ctx, inode, mapping,
- pages, &nr_pages);
- if (ret == 0)
- goto read_complete; /* all pages were read */
-
nfs_pageio_init_read(&desc.pgio, inode, false,
&nfs_async_read_completion_ops);

@@ -434,7 +426,6 @@ int nfs_readpages(struct file *file, struct address_space *mapping,

nfs_pageio_complete_read(&desc.pgio);

-read_complete:
put_nfs_open_context(desc.ctx);
out:
trace_nfs_aop_readahead_done(inode, nr_pages, ret);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 2b322170372a..987a187bd39a 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -2126,8 +2126,11 @@ int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
if (PagePrivate(page))
return -EBUSY;

- if (!nfs_fscache_release_page(page, GFP_KERNEL))
- return -EBUSY;
+ if (PageFsCache(page)) {
+ if (mode == MIGRATE_ASYNC)
+ return -EBUSY;
+ wait_on_page_fscache(page);
+ }

return migrate_page(mapping, newpage, page, mode);
}
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 7bd35f60d19a..ede50406bcb0 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -168,6 +168,7 @@ extern void __fscache_relinquish_cookie(struct fscache_cookie *, bool);
extern void __fscache_resize_cookie(struct fscache_cookie *, loff_t);
extern void __fscache_invalidate(struct fscache_cookie *, const void *, loff_t, unsigned int);
extern int __fscache_begin_read_operation(struct netfs_cache_resources *, struct fscache_cookie *);
+extern int __fscache_begin_write_operation(struct netfs_cache_resources *, struct fscache_cookie *);

extern void __fscache_write_to_cache(struct fscache_cookie *, struct address_space *,
loff_t, size_t, loff_t, netfs_io_terminated_t, void *,
@@ -499,6 +500,33 @@ int fscache_read(struct netfs_cache_resources *cres,
term_func, term_func_priv);
}

+/**
+ * fscache_begin_write_operation - Begin a write operation for the netfs lib
+ * @cres: The cache resources for the write being performed
+ * @cookie: The cookie representing the cache object
+ *
+ * Begin a write operation on behalf of the netfs helper library. @cres
+ * indicates the cache resources to which the operation state should be
+ * attached; @cookie indicates the cache object that will be accessed.
+ *
+ * @cres->inval_counter is set from @cookie->inval_counter for comparison at
+ * the end of the operation. This allows invalidation during the operation to
+ * be detected by the caller.
+ *
+ * Returns:
+ * * 0 - Success
+ * * -ENOBUFS - No caching available
+ * * Other error code from the cache, such as -ENOMEM.
+ */
+static inline
+int fscache_begin_write_operation(struct netfs_cache_resources *cres,
+ struct fscache_cookie *cookie)
+{
+ if (fscache_cookie_enabled(cookie))
+ return __fscache_begin_write_operation(cres, cookie);
+ return -ENOBUFS;
+}
+
/**
* fscache_write - Start a write to the cache.
* @cres: The cache resources to use



2021-12-22 23:30:20

by David Howells

[permalink] [raw]
Subject: [PATCH v4 63/68] cifs: Support fscache indexing rewrite (untested)

Change the cifs filesystem to take account of the changes to fscache's
indexing rewrite and reenable caching in cifs.

The following changes have been made:

(1) The fscache_netfs struct is no more, and there's no need to register
the filesystem as a whole.

(2) The session cookie is now an fscache_volume cookie, allocated with
fscache_acquire_volume(). That takes three parameters: a string
representing the "volume" in the index, a string naming the cache to
use (or NULL) and a u64 that conveys coherency metadata for the
volume.

For cifs, I've made it render the volume name string as:

"cifs,<ipaddress>,<sharename>"

where the sharename has '/' characters replaced with ';'.

This probably needs rethinking a bit as the total name could exceed
the maximum filename component length.

Further, the coherency data is currently just set to 0. It needs
something else doing with it - I wonder if it would suffice simply to
sum the resource_id, vol_create_time and vol_serial_number or maybe
hash them.

(3) The fscache_cookie_def is no more and needed information is passed
directly to fscache_acquire_cookie(). The cache no longer calls back
into the filesystem, but rather metadata changes are indicated at
other times.

fscache_acquire_cookie() is passed the same keying and coherency
information as before.

(4) The functions to set/reset cookies are removed and
fscache_use_cookie() and fscache_unuse_cookie() are used instead.

fscache_use_cookie() is passed a flag to indicate if the cookie is
opened for writing. fscache_unuse_cookie() is passed updates for the
metadata if we changed it (ie. if the file was opened for writing).

These are called when the file is opened or closed.

(5) cifs_setattr_*() are made to call fscache_resize() to change the size
of the cache object.

(6) The functions to read and write data are stubbed out pending a
conversion to use netfslib.

Changes
=======
ver #4:
- Fixed the use of sizeof with memset.
- tcon->vol_create_time is __le64 so doesn't need cpu_to_le64().

ver #3:
- Canonicalise the cifs coherency data to make the cache portable.
- Set volume coherency data.

ver #2:
- Use gfpflags_allow_blocking() rather than using flag directly.
- Upgraded to -rc4 to allow for upstream changes[1].
- fscache_acquire_volume() now returns errors.

Signed-off-by: David Howells <[email protected]>
cc: Steve French <[email protected]>
cc: Shyam Prasad N <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23b55d673d7527b093cd97b7c217c82e70cd1af0 [1]
Link: https://lore.kernel.org/r/163819671009.215744.11230627184193298714.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906982979.143852.10672081929614953210.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967187187.1823006.247415138444991444.stgit@warthog.procyon.org.uk/ # v3
---

fs/cifs/Kconfig | 2
fs/cifs/Makefile | 2
fs/cifs/cache.c | 105 ----------------
fs/cifs/cifsfs.c | 11 --
fs/cifs/cifsglob.h | 5 -
fs/cifs/connect.c | 12 --
fs/cifs/file.c | 64 +++++++---
fs/cifs/fscache.c | 333 ++++++++++++----------------------------------------
fs/cifs/fscache.h | 126 ++++++--------------
fs/cifs/inode.c | 36 ++++--
10 files changed, 191 insertions(+), 505 deletions(-)
delete mode 100644 fs/cifs/cache.c

diff --git a/fs/cifs/Kconfig b/fs/cifs/Kconfig
index 346ae8716deb..3b7e3b9e4fd2 100644
--- a/fs/cifs/Kconfig
+++ b/fs/cifs/Kconfig
@@ -188,7 +188,7 @@ config CIFS_SMB_DIRECT

config CIFS_FSCACHE
bool "Provide CIFS client caching support"
- depends on CIFS=m && FSCACHE_OLD_API || CIFS=y && FSCACHE_OLD_API=y
+ depends on CIFS=m && FSCACHE || CIFS=y && FSCACHE=y
help
Makes CIFS FS-Cache capable. Say Y here if you want your CIFS data
to be cached locally on disk through the general filesystem cache
diff --git a/fs/cifs/Makefile b/fs/cifs/Makefile
index 87fcacdf3de7..cc8fdcb35b71 100644
--- a/fs/cifs/Makefile
+++ b/fs/cifs/Makefile
@@ -25,7 +25,7 @@ cifs-$(CONFIG_CIFS_DFS_UPCALL) += cifs_dfs_ref.o dfs_cache.o

cifs-$(CONFIG_CIFS_SWN_UPCALL) += netlink.o cifs_swn.o

-cifs-$(CONFIG_CIFS_FSCACHE) += fscache.o cache.o
+cifs-$(CONFIG_CIFS_FSCACHE) += fscache.o

cifs-$(CONFIG_CIFS_SMB_DIRECT) += smbdirect.o

diff --git a/fs/cifs/cache.c b/fs/cifs/cache.c
deleted file mode 100644
index 8be57aaedab6..000000000000
--- a/fs/cifs/cache.c
+++ /dev/null
@@ -1,105 +0,0 @@
-// SPDX-License-Identifier: LGPL-2.1
-/*
- * CIFS filesystem cache index structure definitions
- *
- * Copyright (c) 2010 Novell, Inc.
- * Authors(s): Suresh Jayaraman ([email protected]>
- *
- */
-#include "fscache.h"
-#include "cifs_debug.h"
-
-/*
- * CIFS filesystem definition for FS-Cache
- */
-struct fscache_netfs cifs_fscache_netfs = {
- .name = "cifs",
- .version = 0,
-};
-
-/*
- * Register CIFS for caching with FS-Cache
- */
-int cifs_fscache_register(void)
-{
- return fscache_register_netfs(&cifs_fscache_netfs);
-}
-
-/*
- * Unregister CIFS for caching
- */
-void cifs_fscache_unregister(void)
-{
- fscache_unregister_netfs(&cifs_fscache_netfs);
-}
-
-/*
- * Server object for FS-Cache
- */
-const struct fscache_cookie_def cifs_fscache_server_index_def = {
- .name = "CIFS.server",
- .type = FSCACHE_COOKIE_TYPE_INDEX,
-};
-
-static enum
-fscache_checkaux cifs_fscache_super_check_aux(void *cookie_netfs_data,
- const void *data,
- uint16_t datalen,
- loff_t object_size)
-{
- struct cifs_fscache_super_auxdata auxdata;
- const struct cifs_tcon *tcon = cookie_netfs_data;
-
- if (datalen != sizeof(auxdata))
- return FSCACHE_CHECKAUX_OBSOLETE;
-
- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.resource_id = tcon->resource_id;
- auxdata.vol_create_time = tcon->vol_create_time;
- auxdata.vol_serial_number = tcon->vol_serial_number;
-
- if (memcmp(data, &auxdata, datalen) != 0)
- return FSCACHE_CHECKAUX_OBSOLETE;
-
- return FSCACHE_CHECKAUX_OKAY;
-}
-
-/*
- * Superblock object for FS-Cache
- */
-const struct fscache_cookie_def cifs_fscache_super_index_def = {
- .name = "CIFS.super",
- .type = FSCACHE_COOKIE_TYPE_INDEX,
- .check_aux = cifs_fscache_super_check_aux,
-};
-
-static enum
-fscache_checkaux cifs_fscache_inode_check_aux(void *cookie_netfs_data,
- const void *data,
- uint16_t datalen,
- loff_t object_size)
-{
- struct cifs_fscache_inode_auxdata auxdata;
- struct cifsInodeInfo *cifsi = cookie_netfs_data;
-
- if (datalen != sizeof(auxdata))
- return FSCACHE_CHECKAUX_OBSOLETE;
-
- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.eof = cifsi->server_eof;
- auxdata.last_write_time_sec = cifsi->vfs_inode.i_mtime.tv_sec;
- auxdata.last_change_time_sec = cifsi->vfs_inode.i_ctime.tv_sec;
- auxdata.last_write_time_nsec = cifsi->vfs_inode.i_mtime.tv_nsec;
- auxdata.last_change_time_nsec = cifsi->vfs_inode.i_ctime.tv_nsec;
-
- if (memcmp(data, &auxdata, datalen) != 0)
- return FSCACHE_CHECKAUX_OBSOLETE;
-
- return FSCACHE_CHECKAUX_OKAY;
-}
-
-const struct fscache_cookie_def cifs_fscache_inode_object_def = {
- .name = "CIFS.uniqueid",
- .type = FSCACHE_COOKIE_TYPE_DATAFILE,
- .check_aux = cifs_fscache_inode_check_aux,
-};
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index dca42aa87d30..d3f3acf340f1 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -396,6 +396,8 @@ static void
cifs_evict_inode(struct inode *inode)
{
truncate_inode_pages_final(&inode->i_data);
+ if (inode->i_state & I_PINNING_FSCACHE_WB)
+ cifs_fscache_unuse_inode_cookie(inode, true);
clear_inode(inode);
}

@@ -1624,13 +1626,9 @@ init_cifs(void)
goto out_destroy_cifsoplockd_wq;
}

- rc = cifs_fscache_register();
- if (rc)
- goto out_destroy_deferredclose_wq;
-
rc = cifs_init_inodecache();
if (rc)
- goto out_unreg_fscache;
+ goto out_destroy_deferredclose_wq;

rc = cifs_init_mids();
if (rc)
@@ -1692,8 +1690,6 @@ init_cifs(void)
cifs_destroy_mids();
out_destroy_inodecache:
cifs_destroy_inodecache();
-out_unreg_fscache:
- cifs_fscache_unregister();
out_destroy_deferredclose_wq:
destroy_workqueue(deferredclose_wq);
out_destroy_cifsoplockd_wq:
@@ -1729,7 +1725,6 @@ exit_cifs(void)
cifs_destroy_request_bufs();
cifs_destroy_mids();
cifs_destroy_inodecache();
- cifs_fscache_unregister();
destroy_workqueue(deferredclose_wq);
destroy_workqueue(cifsoplockd_wq);
destroy_workqueue(decrypt_wq);
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index be74606724c7..ba6fbb1ad8f3 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -659,9 +659,6 @@ struct TCP_Server_Info {
unsigned int total_read; /* total amount of data read in this pass */
atomic_t in_send; /* requests trying to send */
atomic_t num_waiters; /* blocked waiting to get in sendrecv */
-#ifdef CONFIG_CIFS_FSCACHE
- struct fscache_cookie *fscache; /* client index cache cookie */
-#endif
#ifdef CONFIG_CIFS_STATS2
atomic_t num_cmds[NUMBER_OF_SMB2_COMMANDS]; /* total requests by cmd */
atomic_t smb2slowcmd[NUMBER_OF_SMB2_COMMANDS]; /* count resps > 1 sec */
@@ -1117,7 +1114,7 @@ struct cifs_tcon {
__u32 max_bytes_copy;
#ifdef CONFIG_CIFS_FSCACHE
u64 resource_id; /* server resource id */
- struct fscache_cookie *fscache; /* cookie for share */
+ struct fscache_volume *fscache; /* cookie for share */
#endif
struct list_head pending_opens; /* list of incomplete opens */
struct cached_fid crfid; /* Cached root fid */
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 18448dbd762a..a52fd3a30c88 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -1396,10 +1396,6 @@ cifs_put_tcp_session(struct TCP_Server_Info *server, int from_reconnect)

cifs_crypto_secmech_release(server);

- /* fscache server cookies are based on primary channel only */
- if (!CIFS_SERVER_IS_CHAN(server))
- cifs_fscache_release_client_cookie(server);
-
kfree(server->session_key.response);
server->session_key.response = NULL;
server->session_key.len = 0;
@@ -1559,14 +1555,6 @@ cifs_get_tcp_session(struct smb3_fs_context *ctx,
list_add(&tcp_ses->tcp_ses_list, &cifs_tcp_ses_list);
spin_unlock(&cifs_tcp_ses_lock);

- /* fscache server cookies are based on primary channel only */
- if (!CIFS_SERVER_IS_CHAN(tcp_ses))
- cifs_fscache_get_client_cookie(tcp_ses);
-#ifdef CONFIG_CIFS_FSCACHE
- else
- tcp_ses->fscache = tcp_ses->primary_server->fscache;
-#endif /* CONFIG_CIFS_FSCACHE */
-
/* queue echo request delayed work */
queue_delayed_work(cifsiod_wq, &tcp_ses->echo, tcp_ses->echo_interval);

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 9fee3af83a73..22b66ce10115 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -632,7 +632,18 @@ int cifs_open(struct inode *inode, struct file *file)
goto out;
}

- cifs_fscache_set_inode_cookie(inode, file);
+
+ fscache_use_cookie(cifs_inode_cookie(file_inode(file)),
+ file->f_mode & FMODE_WRITE);
+ if (file->f_flags & O_DIRECT &&
+ (!((file->f_flags & O_ACCMODE) != O_RDONLY) ||
+ file->f_flags & O_APPEND)) {
+ struct cifs_fscache_inode_coherency_data cd;
+ cifs_fscache_fill_coherency(file_inode(file), &cd);
+ fscache_invalidate(cifs_inode_cookie(file_inode(file)),
+ &cd, i_size_read(file_inode(file)),
+ FSCACHE_INVAL_DIO_WRITE);
+ }

if ((oplock & CIFS_CREATE_ACTION) && !posix_open_ok && tcon->unix_ext) {
/*
@@ -876,6 +887,8 @@ int cifs_close(struct inode *inode, struct file *file)
struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
struct cifs_deferred_close *dclose;

+ cifs_fscache_unuse_inode_cookie(inode, file->f_mode & FMODE_WRITE);
+
if (file->private_data != NULL) {
cfile = file->private_data;
file->private_data = NULL;
@@ -886,7 +899,6 @@ int cifs_close(struct inode *inode, struct file *file)
dclose) {
if (test_and_clear_bit(CIFS_INO_MODIFIED_ATTR, &cinode->flags)) {
inode->i_ctime = inode->i_mtime = current_time(inode);
- cifs_fscache_update_inode_cookie(inode);
}
spin_lock(&cinode->deferred_lock);
cifs_add_deferred_close(cfile, dclose);
@@ -4198,10 +4210,12 @@ static vm_fault_t
cifs_page_mkwrite(struct vm_fault *vmf)
{
struct page *page = vmf->page;
- struct file *file = vmf->vma->vm_file;
- struct inode *inode = file_inode(file);

- cifs_fscache_wait_on_page_write(inode, page);
+#ifdef CONFIG_CIFS_FSCACHE
+ if (PageFsCache(page) &&
+ wait_on_page_fscache_killable(page) < 0)
+ return VM_FAULT_RETRY;
+#endif

lock_page(page);
return VM_FAULT_LOCKED;
@@ -4275,8 +4289,6 @@ cifs_readv_complete(struct work_struct *work)
if (rdata->result == 0 ||
(rdata->result == -EAGAIN && got_bytes))
cifs_readpage_to_fscache(rdata->mapping->host, page);
- else
- cifs_fscache_uncache_page(rdata->mapping->host, page);

got_bytes -= min_t(unsigned int, PAGE_SIZE, got_bytes);

@@ -4593,11 +4605,6 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
kref_put(&rdata->refcount, cifs_readdata_release);
}

- /* Any pages that have been shown to fscache but didn't get added to
- * the pagecache must be uncached before they get returned to the
- * allocator.
- */
- cifs_fscache_readpages_cancel(mapping->host, page_list);
free_xid(xid);
return rc;
}
@@ -4801,17 +4808,19 @@ static int cifs_release_page(struct page *page, gfp_t gfp)
{
if (PagePrivate(page))
return 0;
-
- return cifs_fscache_release_page(page, gfp);
+ if (PageFsCache(page)) {
+ if (!gfpflags_allow_blocking(gfp) || !(gfp & __GFP_FS))
+ return false;
+ wait_on_page_fscache(page);
+ }
+ fscache_note_page_release(cifs_inode_cookie(page->mapping->host));
+ return true;
}

static void cifs_invalidate_page(struct page *page, unsigned int offset,
unsigned int length)
{
- struct cifsInodeInfo *cifsi = CIFS_I(page->mapping->host);
-
- if (offset == 0 && length == PAGE_SIZE)
- cifs_fscache_invalidate_page(page, &cifsi->vfs_inode);
+ wait_on_page_fscache(page);
}

static int cifs_launder_page(struct page *page)
@@ -4831,7 +4840,7 @@ static int cifs_launder_page(struct page *page)
if (clear_page_dirty_for_io(page))
rc = cifs_writepage_locked(page, &wbc);

- cifs_fscache_invalidate_page(page, page->mapping->host);
+ wait_on_page_fscache(page);
return rc;
}

@@ -4988,6 +4997,19 @@ static void cifs_swap_deactivate(struct file *file)
/* do we need to unpin (or unlock) the file */
}

+/*
+ * Mark a page as having been made dirty and thus needing writeback. We also
+ * need to pin the cache object to write back to.
+ */
+#ifdef CONFIG_CIFS_FSCACHE
+static int cifs_set_page_dirty(struct page *page)
+{
+ return fscache_set_page_dirty(page, cifs_inode_cookie(page->mapping->host));
+}
+#else
+#define cifs_set_page_dirty __set_page_dirty_nobuffers
+#endif
+
const struct address_space_operations cifs_addr_ops = {
.readpage = cifs_readpage,
.readpages = cifs_readpages,
@@ -4995,7 +5017,7 @@ const struct address_space_operations cifs_addr_ops = {
.writepages = cifs_writepages,
.write_begin = cifs_write_begin,
.write_end = cifs_write_end,
- .set_page_dirty = __set_page_dirty_nobuffers,
+ .set_page_dirty = cifs_set_page_dirty,
.releasepage = cifs_release_page,
.direct_IO = cifs_direct_io,
.invalidatepage = cifs_invalidate_page,
@@ -5020,7 +5042,7 @@ const struct address_space_operations cifs_addr_ops_smallbuf = {
.writepages = cifs_writepages,
.write_begin = cifs_write_begin,
.write_end = cifs_write_end,
- .set_page_dirty = __set_page_dirty_nobuffers,
+ .set_page_dirty = cifs_set_page_dirty,
.releasepage = cifs_release_page,
.invalidatepage = cifs_invalidate_page,
.launder_page = cifs_launder_page,
diff --git a/fs/cifs/fscache.c b/fs/cifs/fscache.c
index 003c5f1f4dfb..efaac4d5ff55 100644
--- a/fs/cifs/fscache.c
+++ b/fs/cifs/fscache.c
@@ -12,250 +12,136 @@
#include "cifs_fs_sb.h"
#include "cifsproto.h"

-/*
- * Key layout of CIFS server cache index object
- */
-struct cifs_server_key {
- __u64 conn_id;
-} __packed;
-
-/*
- * Get a cookie for a server object keyed by {IPaddress,port,family} tuple
- */
-void cifs_fscache_get_client_cookie(struct TCP_Server_Info *server)
-{
- struct cifs_server_key key;
-
- /*
- * Check if cookie was already initialized so don't reinitialize it.
- * In the future, as we integrate with newer fscache features,
- * we may want to instead add a check if cookie has changed
- */
- if (server->fscache)
- return;
-
- memset(&key, 0, sizeof(key));
- key.conn_id = server->conn_id;
-
- server->fscache =
- fscache_acquire_cookie(cifs_fscache_netfs.primary_index,
- &cifs_fscache_server_index_def,
- &key, sizeof(key),
- NULL, 0,
- server, 0, true);
- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n",
- __func__, server, server->fscache);
-}
-
-void cifs_fscache_release_client_cookie(struct TCP_Server_Info *server)
+static void cifs_fscache_fill_volume_coherency(
+ struct cifs_tcon *tcon,
+ struct cifs_fscache_volume_coherency_data *cd)
{
- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n",
- __func__, server, server->fscache);
- fscache_relinquish_cookie(server->fscache, NULL, false);
- server->fscache = NULL;
+ memset(cd, 0, sizeof(*cd));
+ cd->resource_id = cpu_to_le64(tcon->resource_id);
+ cd->vol_create_time = tcon->vol_create_time;
+ cd->vol_serial_number = cpu_to_le32(tcon->vol_serial_number);
}

-void cifs_fscache_get_super_cookie(struct cifs_tcon *tcon)
+int cifs_fscache_get_super_cookie(struct cifs_tcon *tcon)
{
+ struct cifs_fscache_volume_coherency_data cd;
struct TCP_Server_Info *server = tcon->ses->server;
+ struct fscache_volume *vcookie;
+ const struct sockaddr *sa = (struct sockaddr *)&server->dstaddr;
+ size_t slen, i;
char *sharename;
- struct cifs_fscache_super_auxdata auxdata;
+ char *key;
+ int ret = -ENOMEM;

- /*
- * Check if cookie was already initialized so don't reinitialize it.
- * In the future, as we integrate with newer fscache features,
- * we may want to instead add a check if cookie has changed
- */
- if (tcon->fscache)
- return;
+ tcon->fscache = NULL;
+ switch (sa->sa_family) {
+ case AF_INET:
+ case AF_INET6:
+ break;
+ default:
+ cifs_dbg(VFS, "Unknown network family '%d'\n", sa->sa_family);
+ return -EINVAL;
+ }
+
+ memset(&key, 0, sizeof(key));

sharename = extract_sharename(tcon->treeName);
if (IS_ERR(sharename)) {
cifs_dbg(FYI, "%s: couldn't extract sharename\n", __func__);
- tcon->fscache = NULL;
- return;
+ return -EINVAL;
}

- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.resource_id = tcon->resource_id;
- auxdata.vol_create_time = tcon->vol_create_time;
- auxdata.vol_serial_number = tcon->vol_serial_number;
+ slen = strlen(sharename);
+ for (i = 0; i < slen; i++)
+ if (sharename[i] == '/')
+ sharename[i] = ';';
+
+ key = kasprintf(GFP_KERNEL, "cifs,%pISpc,%s", sa, sharename);
+ if (!key)
+ goto out;
+
+ cifs_fscache_fill_volume_coherency(tcon, &cd);
+ vcookie = fscache_acquire_volume(key,
+ NULL, /* preferred_cache */
+ &cd, sizeof(cd));
+ cifs_dbg(FYI, "%s: (%s/0x%p)\n", __func__, key, vcookie);
+ if (IS_ERR(vcookie)) {
+ if (vcookie != ERR_PTR(-EBUSY)) {
+ ret = PTR_ERR(vcookie);
+ goto out_2;
+ }
+ pr_err("Cache volume key already in use (%s)\n", key);
+ vcookie = NULL;
+ }

- tcon->fscache =
- fscache_acquire_cookie(server->fscache,
- &cifs_fscache_super_index_def,
- sharename, strlen(sharename),
- &auxdata, sizeof(auxdata),
- tcon, 0, true);
+ tcon->fscache = vcookie;
+ ret = 0;
+out_2:
+ kfree(key);
+out:
kfree(sharename);
- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n",
- __func__, server->fscache, tcon->fscache);
+ return ret;
}

void cifs_fscache_release_super_cookie(struct cifs_tcon *tcon)
{
- struct cifs_fscache_super_auxdata auxdata;
-
- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.resource_id = tcon->resource_id;
- auxdata.vol_create_time = tcon->vol_create_time;
- auxdata.vol_serial_number = tcon->vol_serial_number;
+ struct cifs_fscache_volume_coherency_data cd;

cifs_dbg(FYI, "%s: (0x%p)\n", __func__, tcon->fscache);
- fscache_relinquish_cookie(tcon->fscache, &auxdata, false);
- tcon->fscache = NULL;
-}
-
-static void cifs_fscache_acquire_inode_cookie(struct cifsInodeInfo *cifsi,
- struct cifs_tcon *tcon)
-{
- struct cifs_fscache_inode_auxdata auxdata;

- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.eof = cifsi->server_eof;
- auxdata.last_write_time_sec = cifsi->vfs_inode.i_mtime.tv_sec;
- auxdata.last_change_time_sec = cifsi->vfs_inode.i_ctime.tv_sec;
- auxdata.last_write_time_nsec = cifsi->vfs_inode.i_mtime.tv_nsec;
- auxdata.last_change_time_nsec = cifsi->vfs_inode.i_ctime.tv_nsec;
-
- cifsi->fscache =
- fscache_acquire_cookie(tcon->fscache,
- &cifs_fscache_inode_object_def,
- &cifsi->uniqueid, sizeof(cifsi->uniqueid),
- &auxdata, sizeof(auxdata),
- cifsi, cifsi->vfs_inode.i_size, true);
+ cifs_fscache_fill_volume_coherency(tcon, &cd);
+ fscache_relinquish_volume(tcon->fscache, &cd, false);
+ tcon->fscache = NULL;
}

-static void cifs_fscache_enable_inode_cookie(struct inode *inode)
+void cifs_fscache_get_inode_cookie(struct inode *inode)
{
+ struct cifs_fscache_inode_coherency_data cd;
struct cifsInodeInfo *cifsi = CIFS_I(inode);
struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);

- if (cifsi->fscache)
- return;
-
- if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_FSCACHE))
- return;
-
- cifs_fscache_acquire_inode_cookie(cifsi, tcon);
+ cifs_fscache_fill_coherency(&cifsi->vfs_inode, &cd);

- cifs_dbg(FYI, "%s: got FH cookie (0x%p/0x%p)\n",
- __func__, tcon->fscache, cifsi->fscache);
+ cifsi->fscache =
+ fscache_acquire_cookie(tcon->fscache, 0,
+ &cifsi->uniqueid, sizeof(cifsi->uniqueid),
+ &cd, sizeof(cd),
+ i_size_read(&cifsi->vfs_inode));
}

-void cifs_fscache_release_inode_cookie(struct inode *inode)
+void cifs_fscache_unuse_inode_cookie(struct inode *inode, bool update)
{
- struct cifs_fscache_inode_auxdata auxdata;
- struct cifsInodeInfo *cifsi = CIFS_I(inode);
-
- if (cifsi->fscache) {
- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.eof = cifsi->server_eof;
- auxdata.last_write_time_sec = cifsi->vfs_inode.i_mtime.tv_sec;
- auxdata.last_change_time_sec = cifsi->vfs_inode.i_ctime.tv_sec;
- auxdata.last_write_time_nsec = cifsi->vfs_inode.i_mtime.tv_nsec;
- auxdata.last_change_time_nsec = cifsi->vfs_inode.i_ctime.tv_nsec;
+ if (update) {
+ struct cifs_fscache_inode_coherency_data cd;
+ loff_t i_size = i_size_read(inode);

- cifs_dbg(FYI, "%s: (0x%p)\n", __func__, cifsi->fscache);
- /* fscache_relinquish_cookie does not seem to update auxdata */
- fscache_update_cookie(cifsi->fscache, &auxdata);
- fscache_relinquish_cookie(cifsi->fscache, &auxdata, false);
- cifsi->fscache = NULL;
+ cifs_fscache_fill_coherency(inode, &cd);
+ fscache_unuse_cookie(cifs_inode_cookie(inode), &cd, &i_size);
+ } else {
+ fscache_unuse_cookie(cifs_inode_cookie(inode), NULL, NULL);
}
}

-void cifs_fscache_update_inode_cookie(struct inode *inode)
+void cifs_fscache_release_inode_cookie(struct inode *inode)
{
- struct cifs_fscache_inode_auxdata auxdata;
struct cifsInodeInfo *cifsi = CIFS_I(inode);

if (cifsi->fscache) {
- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.eof = cifsi->server_eof;
- auxdata.last_write_time_sec = cifsi->vfs_inode.i_mtime.tv_sec;
- auxdata.last_change_time_sec = cifsi->vfs_inode.i_ctime.tv_sec;
- auxdata.last_write_time_nsec = cifsi->vfs_inode.i_mtime.tv_nsec;
- auxdata.last_change_time_nsec = cifsi->vfs_inode.i_ctime.tv_nsec;
-
cifs_dbg(FYI, "%s: (0x%p)\n", __func__, cifsi->fscache);
- fscache_update_cookie(cifsi->fscache, &auxdata);
- }
-}
-
-void cifs_fscache_set_inode_cookie(struct inode *inode, struct file *filp)
-{
- cifs_fscache_enable_inode_cookie(inode);
-}
-
-void cifs_fscache_reset_inode_cookie(struct inode *inode)
-{
- struct cifsInodeInfo *cifsi = CIFS_I(inode);
- struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
- struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
- struct fscache_cookie *old = cifsi->fscache;
-
- if (cifsi->fscache) {
- /* retire the current fscache cache and get a new one */
- fscache_relinquish_cookie(cifsi->fscache, NULL, true);
-
- cifs_fscache_acquire_inode_cookie(cifsi, tcon);
- cifs_dbg(FYI, "%s: new cookie 0x%p oldcookie 0x%p\n",
- __func__, cifsi->fscache, old);
+ fscache_relinquish_cookie(cifsi->fscache, false);
+ cifsi->fscache = NULL;
}
}

-int cifs_fscache_release_page(struct page *page, gfp_t gfp)
-{
- if (PageFsCache(page)) {
- struct inode *inode = page->mapping->host;
- struct cifsInodeInfo *cifsi = CIFS_I(inode);
-
- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n",
- __func__, page, cifsi->fscache);
- if (!fscache_maybe_release_page(cifsi->fscache, page, gfp))
- return 0;
- }
-
- return 1;
-}
-
-static void cifs_readpage_from_fscache_complete(struct page *page, void *ctx,
- int error)
-{
- cifs_dbg(FYI, "%s: (0x%p/%d)\n", __func__, page, error);
- if (!error)
- SetPageUptodate(page);
- unlock_page(page);
-}
-
/*
* Retrieve a page from FS-Cache
*/
int __cifs_readpage_from_fscache(struct inode *inode, struct page *page)
{
- int ret;
-
cifs_dbg(FYI, "%s: (fsc:%p, p:%p, i:0x%p\n",
__func__, CIFS_I(inode)->fscache, page, inode);
- ret = fscache_read_or_alloc_page(CIFS_I(inode)->fscache, page,
- cifs_readpage_from_fscache_complete,
- NULL,
- GFP_KERNEL);
- switch (ret) {
-
- case 0: /* page found in fscache, read submitted */
- cifs_dbg(FYI, "%s: submitted\n", __func__);
- return ret;
- case -ENOBUFS: /* page won't be cached */
- case -ENODATA: /* page not in cache */
- cifs_dbg(FYI, "%s: %d\n", __func__, ret);
- return 1;
-
- default:
- cifs_dbg(VFS, "unknown error ret = %d\n", ret);
- }
- return ret;
+ return -ENOBUFS; // Needs conversion to using netfslib
}

/*
@@ -266,78 +152,19 @@ int __cifs_readpages_from_fscache(struct inode *inode,
struct list_head *pages,
unsigned *nr_pages)
{
- int ret;
-
cifs_dbg(FYI, "%s: (0x%p/%u/0x%p)\n",
__func__, CIFS_I(inode)->fscache, *nr_pages, inode);
- ret = fscache_read_or_alloc_pages(CIFS_I(inode)->fscache, mapping,
- pages, nr_pages,
- cifs_readpage_from_fscache_complete,
- NULL,
- mapping_gfp_mask(mapping));
- switch (ret) {
- case 0: /* read submitted to the cache for all pages */
- cifs_dbg(FYI, "%s: submitted\n", __func__);
- return ret;
-
- case -ENOBUFS: /* some pages are not cached and can't be */
- case -ENODATA: /* some pages are not cached */
- cifs_dbg(FYI, "%s: no page\n", __func__);
- return 1;
-
- default:
- cifs_dbg(FYI, "unknown error ret = %d\n", ret);
- }
-
- return ret;
+ return -ENOBUFS; // Needs conversion to using netfslib
}

void __cifs_readpage_to_fscache(struct inode *inode, struct page *page)
{
struct cifsInodeInfo *cifsi = CIFS_I(inode);
- int ret;

WARN_ON(!cifsi->fscache);

cifs_dbg(FYI, "%s: (fsc: %p, p: %p, i: %p)\n",
__func__, cifsi->fscache, page, inode);
- ret = fscache_write_page(cifsi->fscache, page,
- cifsi->vfs_inode.i_size, GFP_KERNEL);
- if (ret != 0)
- fscache_uncache_page(cifsi->fscache, page);
-}
-
-void __cifs_fscache_readpages_cancel(struct inode *inode, struct list_head *pages)
-{
- cifs_dbg(FYI, "%s: (fsc: %p, i: %p)\n",
- __func__, CIFS_I(inode)->fscache, inode);
- fscache_readpages_cancel(CIFS_I(inode)->fscache, pages);
-}
-
-void __cifs_fscache_invalidate_page(struct page *page, struct inode *inode)
-{
- struct cifsInodeInfo *cifsi = CIFS_I(inode);
- struct fscache_cookie *cookie = cifsi->fscache;
-
- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n", __func__, page, cookie);
- fscache_wait_on_page_write(cookie, page);
- fscache_uncache_page(cookie, page);
-}
-
-void __cifs_fscache_wait_on_page_write(struct inode *inode, struct page *page)
-{
- struct cifsInodeInfo *cifsi = CIFS_I(inode);
- struct fscache_cookie *cookie = cifsi->fscache;
-
- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n", __func__, page, cookie);
- fscache_wait_on_page_write(cookie, page);
-}
-
-void __cifs_fscache_uncache_page(struct inode *inode, struct page *page)
-{
- struct cifsInodeInfo *cifsi = CIFS_I(inode);
- struct fscache_cookie *cookie = cifsi->fscache;

- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n", __func__, page, cookie);
- fscache_uncache_page(cookie, page);
+ // Needs conversion to using netfslib
}
diff --git a/fs/cifs/fscache.h b/fs/cifs/fscache.h
index 9baa1d0f22bd..0fc3f9252c84 100644
--- a/fs/cifs/fscache.h
+++ b/fs/cifs/fscache.h
@@ -13,84 +13,62 @@

#include "cifsglob.h"

-#ifdef CONFIG_CIFS_FSCACHE
-
/*
- * Auxiliary data attached to CIFS superblock within the cache
+ * Coherency data attached to CIFS volume within the cache
*/
-struct cifs_fscache_super_auxdata {
- u64 resource_id; /* unique server resource id */
+struct cifs_fscache_volume_coherency_data {
+ __le64 resource_id; /* unique server resource id */
__le64 vol_create_time;
- u32 vol_serial_number;
+ __le32 vol_serial_number;
} __packed;

/*
- * Auxiliary data attached to CIFS inode within the cache
+ * Coherency data attached to CIFS inode within the cache.
*/
-struct cifs_fscache_inode_auxdata {
- u64 last_write_time_sec;
- u64 last_change_time_sec;
- u32 last_write_time_nsec;
- u32 last_change_time_nsec;
- u64 eof;
+struct cifs_fscache_inode_coherency_data {
+ __le64 last_write_time_sec;
+ __le64 last_change_time_sec;
+ __le32 last_write_time_nsec;
+ __le32 last_change_time_nsec;
};

-/*
- * cache.c
- */
-extern struct fscache_netfs cifs_fscache_netfs;
-extern const struct fscache_cookie_def cifs_fscache_server_index_def;
-extern const struct fscache_cookie_def cifs_fscache_super_index_def;
-extern const struct fscache_cookie_def cifs_fscache_inode_object_def;
-
-extern int cifs_fscache_register(void);
-extern void cifs_fscache_unregister(void);
+#ifdef CONFIG_CIFS_FSCACHE

/*
* fscache.c
*/
-extern void cifs_fscache_get_client_cookie(struct TCP_Server_Info *);
-extern void cifs_fscache_release_client_cookie(struct TCP_Server_Info *);
-extern void cifs_fscache_get_super_cookie(struct cifs_tcon *);
+extern int cifs_fscache_get_super_cookie(struct cifs_tcon *);
extern void cifs_fscache_release_super_cookie(struct cifs_tcon *);

+extern void cifs_fscache_get_inode_cookie(struct inode *);
extern void cifs_fscache_release_inode_cookie(struct inode *);
-extern void cifs_fscache_update_inode_cookie(struct inode *inode);
-extern void cifs_fscache_set_inode_cookie(struct inode *, struct file *);
-extern void cifs_fscache_reset_inode_cookie(struct inode *);
+extern void cifs_fscache_unuse_inode_cookie(struct inode *, bool);
+
+static inline
+void cifs_fscache_fill_coherency(struct inode *inode,
+ struct cifs_fscache_inode_coherency_data *cd)
+{
+ struct cifsInodeInfo *cifsi = CIFS_I(inode);
+
+ memset(cd, 0, sizeof(*cd));
+ cd->last_write_time_sec = cpu_to_le64(cifsi->vfs_inode.i_mtime.tv_sec);
+ cd->last_write_time_nsec = cpu_to_le32(cifsi->vfs_inode.i_mtime.tv_nsec);
+ cd->last_change_time_sec = cpu_to_le64(cifsi->vfs_inode.i_ctime.tv_sec);
+ cd->last_change_time_nsec = cpu_to_le32(cifsi->vfs_inode.i_ctime.tv_nsec);
+}
+

-extern void __cifs_fscache_invalidate_page(struct page *, struct inode *);
-extern void __cifs_fscache_wait_on_page_write(struct inode *inode, struct page *page);
-extern void __cifs_fscache_uncache_page(struct inode *inode, struct page *page);
extern int cifs_fscache_release_page(struct page *page, gfp_t gfp);
extern int __cifs_readpage_from_fscache(struct inode *, struct page *);
extern int __cifs_readpages_from_fscache(struct inode *,
struct address_space *,
struct list_head *,
unsigned *);
-extern void __cifs_fscache_readpages_cancel(struct inode *, struct list_head *);
-
extern void __cifs_readpage_to_fscache(struct inode *, struct page *);

-static inline void cifs_fscache_invalidate_page(struct page *page,
- struct inode *inode)
+static inline struct fscache_cookie *cifs_inode_cookie(struct inode *inode)
{
- if (PageFsCache(page))
- __cifs_fscache_invalidate_page(page, inode);
-}
-
-static inline void cifs_fscache_wait_on_page_write(struct inode *inode,
- struct page *page)
-{
- if (PageFsCache(page))
- __cifs_fscache_wait_on_page_write(inode, page);
-}
-
-static inline void cifs_fscache_uncache_page(struct inode *inode,
- struct page *page)
-{
- if (PageFsCache(page))
- __cifs_fscache_uncache_page(inode, page);
+ return CIFS_I(inode)->fscache;
}

static inline int cifs_readpage_from_fscache(struct inode *inode,
@@ -120,41 +98,20 @@ static inline void cifs_readpage_to_fscache(struct inode *inode,
__cifs_readpage_to_fscache(inode, page);
}

-static inline void cifs_fscache_readpages_cancel(struct inode *inode,
- struct list_head *pages)
+#else /* CONFIG_CIFS_FSCACHE */
+static inline
+void cifs_fscache_fill_coherency(struct inode *inode,
+ struct cifs_fscache_inode_coherency_data *cd)
{
- if (CIFS_I(inode)->fscache)
- return __cifs_fscache_readpages_cancel(inode, pages);
}

-#else /* CONFIG_CIFS_FSCACHE */
-static inline int cifs_fscache_register(void) { return 0; }
-static inline void cifs_fscache_unregister(void) {}
-
-static inline void
-cifs_fscache_get_client_cookie(struct TCP_Server_Info *server) {}
-static inline void
-cifs_fscache_release_client_cookie(struct TCP_Server_Info *server) {}
-static inline void cifs_fscache_get_super_cookie(struct cifs_tcon *tcon) {}
-static inline void
-cifs_fscache_release_super_cookie(struct cifs_tcon *tcon) {}
+static inline int cifs_fscache_get_super_cookie(struct cifs_tcon *tcon) { return 0; }
+static inline void cifs_fscache_release_super_cookie(struct cifs_tcon *tcon) {}

+static inline void cifs_fscache_get_inode_cookie(struct inode *inode) {}
static inline void cifs_fscache_release_inode_cookie(struct inode *inode) {}
-static inline void cifs_fscache_update_inode_cookie(struct inode *inode) {}
-static inline void cifs_fscache_set_inode_cookie(struct inode *inode,
- struct file *filp) {}
-static inline void cifs_fscache_reset_inode_cookie(struct inode *inode) {}
-static inline int cifs_fscache_release_page(struct page *page, gfp_t gfp)
-{
- return 1; /* May release page */
-}
-
-static inline void cifs_fscache_invalidate_page(struct page *page,
- struct inode *inode) {}
-static inline void cifs_fscache_wait_on_page_write(struct inode *inode,
- struct page *page) {}
-static inline void cifs_fscache_uncache_page(struct inode *inode,
- struct page *page) {}
+static inline void cifs_fscache_unuse_inode_cookie(struct inode *inode, bool update) {}
+static inline struct fscache_cookie *cifs_inode_cookie(struct inode *inode) { return NULL; }

static inline int
cifs_readpage_from_fscache(struct inode *inode, struct page *page)
@@ -173,11 +130,6 @@ static inline int cifs_readpages_from_fscache(struct inode *inode,
static inline void cifs_readpage_to_fscache(struct inode *inode,
struct page *page) {}

-static inline void cifs_fscache_readpages_cancel(struct inode *inode,
- struct list_head *pages)
-{
-}
-
#endif /* CONFIG_CIFS_FSCACHE */

#endif /* _CIFS_FSCACHE_H */
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index 96d083db1737..ad0a30edfcd8 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -1298,10 +1298,7 @@ cifs_iget(struct super_block *sb, struct cifs_fattr *fattr)
inode->i_flags |= S_NOATIME | S_NOCMTIME;
if (inode->i_state & I_NEW) {
inode->i_ino = hash;
-#ifdef CONFIG_CIFS_FSCACHE
- /* initialize per-inode cache cookie pointer */
- CIFS_I(inode)->fscache = NULL;
-#endif
+ cifs_fscache_get_inode_cookie(inode);
unlock_new_inode(inode);
}
}
@@ -1376,12 +1373,18 @@ struct inode *cifs_root_iget(struct super_block *sb)
inode = ERR_PTR(rc);
}

- /*
- * The cookie is initialized from volume info returned above.
- * Inside cifs_fscache_get_super_cookie it checks
- * that we do not get super cookie twice.
- */
- cifs_fscache_get_super_cookie(tcon);
+ if (!rc) {
+ /*
+ * The cookie is initialized from volume info returned above.
+ * Inside cifs_fscache_get_super_cookie it checks
+ * that we do not get super cookie twice.
+ */
+ rc = cifs_fscache_get_super_cookie(tcon);
+ if (rc < 0) {
+ iget_failed(inode);
+ inode = ERR_PTR(rc);
+ }
+ }

out:
kfree(path);
@@ -2270,6 +2273,8 @@ cifs_dentry_needs_reval(struct dentry *dentry)
int
cifs_invalidate_mapping(struct inode *inode)
{
+ struct cifs_fscache_inode_coherency_data cd;
+ struct cifsInodeInfo *cifsi = CIFS_I(inode);
int rc = 0;

if (inode->i_mapping && inode->i_mapping->nrpages != 0) {
@@ -2279,7 +2284,8 @@ cifs_invalidate_mapping(struct inode *inode)
__func__, inode);
}

- cifs_fscache_reset_inode_cookie(inode);
+ cifs_fscache_fill_coherency(&cifsi->vfs_inode, &cd);
+ fscache_invalidate(cifs_inode_cookie(inode), &cd, i_size_read(inode), 0);
return rc;
}

@@ -2784,8 +2790,10 @@ cifs_setattr_unix(struct dentry *direntry, struct iattr *attrs)
goto out;

if ((attrs->ia_valid & ATTR_SIZE) &&
- attrs->ia_size != i_size_read(inode))
+ attrs->ia_size != i_size_read(inode)) {
truncate_setsize(inode, attrs->ia_size);
+ fscache_resize_cookie(cifs_inode_cookie(inode), attrs->ia_size);
+ }

setattr_copy(&init_user_ns, inode, attrs);
mark_inode_dirty(inode);
@@ -2980,8 +2988,10 @@ cifs_setattr_nounix(struct dentry *direntry, struct iattr *attrs)
goto cifs_setattr_exit;

if ((attrs->ia_valid & ATTR_SIZE) &&
- attrs->ia_size != i_size_read(inode))
+ attrs->ia_size != i_size_read(inode)) {
truncate_setsize(inode, attrs->ia_size);
+ fscache_resize_cookie(cifs_inode_cookie(inode), attrs->ia_size);
+ }

setattr_copy(&init_user_ns, inode, attrs);
mark_inode_dirty(inode);



2021-12-22 23:30:52

by David Howells

[permalink] [raw]
Subject: [PATCH v4 64/68] ceph: conversion to new fscache API

From: Jeff Layton <[email protected]>

Now that the fscache API has been reworked and simplified, change ceph
over to use it.

With the old API, we would only instantiate a cookie when the file was
open for reads. Change it to instantiate the cookie when the inode is
instantiated and call use/unuse when the file is opened/closed.

Also, ensure we resize the cached data on truncates, and invalidate the
cache in response to the appropriate events. This will allow us to
plumb in write support later.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]/ # v1
Link: https://lore.kernel.org/r/[email protected]/ # v2
Link: https://lore.kernel.org/r/163906984277.143852.14697110691303589000.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967188351.1823006.5065634844099079351.stgit@warthog.procyon.org.uk/ # v3
---

fs/ceph/Kconfig | 2 -
fs/ceph/addr.c | 34 +++++----
fs/ceph/cache.c | 218 +++++++++++++++----------------------------------------
fs/ceph/cache.h | 97 +++++++++++++++++-------
fs/ceph/caps.c | 3 +
fs/ceph/file.c | 13 +++
fs/ceph/inode.c | 22 ++++--
fs/ceph/super.c | 10 ---
fs/ceph/super.h | 3 -
9 files changed, 178 insertions(+), 224 deletions(-)

diff --git a/fs/ceph/Kconfig b/fs/ceph/Kconfig
index 61f123356c3e..94df854147d3 100644
--- a/fs/ceph/Kconfig
+++ b/fs/ceph/Kconfig
@@ -21,7 +21,7 @@ config CEPH_FS
if CEPH_FS
config CEPH_FSCACHE
bool "Enable Ceph client caching support"
- depends on CEPH_FS=m && FSCACHE_OLD_API || CEPH_FS=y && FSCACHE_OLD_API=y
+ depends on CEPH_FS=m && FSCACHE || CEPH_FS=y && FSCACHE=y
help
Choose Y here to enable persistent, read-only local
caching support for Ceph clients using FS-Cache
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index e53c8541f5b2..0ffc4c8d7c10 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -126,7 +126,7 @@ static int ceph_set_page_dirty(struct page *page)
BUG_ON(PagePrivate(page));
attach_page_private(page, snapc);

- return __set_page_dirty_nobuffers(page);
+ return ceph_fscache_set_page_dirty(page);
}

/*
@@ -141,8 +141,6 @@ static void ceph_invalidatepage(struct page *page, unsigned int offset,
struct ceph_inode_info *ci;
struct ceph_snap_context *snapc;

- wait_on_page_fscache(page);
-
inode = page->mapping->host;
ci = ceph_inode(inode);

@@ -153,28 +151,36 @@ static void ceph_invalidatepage(struct page *page, unsigned int offset,
}

WARN_ON(!PageLocked(page));
- if (!PagePrivate(page))
- return;
+ if (PagePrivate(page)) {
+ dout("%p invalidatepage %p idx %lu full dirty page\n",
+ inode, page, page->index);

- dout("%p invalidatepage %p idx %lu full dirty page\n",
- inode, page, page->index);
+ snapc = detach_page_private(page);
+ ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
+ ceph_put_snap_context(snapc);
+ }

- snapc = detach_page_private(page);
- ceph_put_wrbuffer_cap_refs(ci, 1, snapc);
- ceph_put_snap_context(snapc);
+ wait_on_page_fscache(page);
}

static int ceph_releasepage(struct page *page, gfp_t gfp)
{
- dout("%p releasepage %p idx %lu (%sdirty)\n", page->mapping->host,
- page, page->index, PageDirty(page) ? "" : "not ");
+ struct inode *inode = page->mapping->host;
+
+ dout("%llx:%llx releasepage %p idx %lu (%sdirty)\n",
+ ceph_vinop(inode), page,
+ page->index, PageDirty(page) ? "" : "not ");
+
+ if (PagePrivate(page))
+ return 0;

if (PageFsCache(page)) {
- if (!(gfp & __GFP_DIRECT_RECLAIM) || !(gfp & __GFP_FS))
+ if (!gfpflags_allow_blocking(gfp) || !(gfp & __GFP_FS))
return 0;
wait_on_page_fscache(page);
}
- return !PagePrivate(page);
+ ceph_fscache_note_page_release(inode);
+ return 1;
}

static void ceph_netfs_expand_readahead(struct netfs_read_request *rreq)
diff --git a/fs/ceph/cache.c b/fs/ceph/cache.c
index 457afda5498a..7d22850623ef 100644
--- a/fs/ceph/cache.c
+++ b/fs/ceph/cache.c
@@ -12,199 +12,99 @@
#include "super.h"
#include "cache.h"

-struct fscache_netfs ceph_cache_netfs = {
- .name = "ceph",
- .version = 0,
-};
-
-static DEFINE_MUTEX(ceph_fscache_lock);
-static LIST_HEAD(ceph_fscache_list);
-
-struct ceph_fscache_entry {
- struct list_head list;
- struct fscache_cookie *fscache;
- size_t uniq_len;
- /* The following members must be last */
- struct ceph_fsid fsid;
- char uniquifier[];
-};
-
-static const struct fscache_cookie_def ceph_fscache_fsid_object_def = {
- .name = "CEPH.fsid",
- .type = FSCACHE_COOKIE_TYPE_INDEX,
-};
-
-int __init ceph_fscache_register(void)
-{
- return fscache_register_netfs(&ceph_cache_netfs);
-}
-
-void ceph_fscache_unregister(void)
-{
- fscache_unregister_netfs(&ceph_cache_netfs);
-}
-
-int ceph_fscache_register_fs(struct ceph_fs_client* fsc, struct fs_context *fc)
+void ceph_fscache_register_inode_cookie(struct inode *inode)
{
- const struct ceph_fsid *fsid = &fsc->client->fsid;
- const char *fscache_uniq = fsc->mount_options->fscache_uniq;
- size_t uniq_len = fscache_uniq ? strlen(fscache_uniq) : 0;
- struct ceph_fscache_entry *ent;
- int err = 0;
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ struct ceph_fs_client *fsc = ceph_inode_to_client(inode);

- mutex_lock(&ceph_fscache_lock);
- list_for_each_entry(ent, &ceph_fscache_list, list) {
- if (memcmp(&ent->fsid, fsid, sizeof(*fsid)))
- continue;
- if (ent->uniq_len != uniq_len)
- continue;
- if (uniq_len && memcmp(ent->uniquifier, fscache_uniq, uniq_len))
- continue;
-
- errorfc(fc, "fscache cookie already registered for fsid %pU, use fsc=<uniquifier> option",
- fsid);
- err = -EBUSY;
- goto out_unlock;
- }
+ /* No caching for filesystem? */
+ if (!fsc->fscache)
+ return;

- ent = kzalloc(sizeof(*ent) + uniq_len, GFP_KERNEL);
- if (!ent) {
- err = -ENOMEM;
- goto out_unlock;
- }
+ /* Regular files only */
+ if (!S_ISREG(inode->i_mode))
+ return;

- memcpy(&ent->fsid, fsid, sizeof(*fsid));
- if (uniq_len > 0) {
- memcpy(&ent->uniquifier, fscache_uniq, uniq_len);
- ent->uniq_len = uniq_len;
- }
+ /* Only new inodes! */
+ if (!(inode->i_state & I_NEW))
+ return;

- fsc->fscache = fscache_acquire_cookie(ceph_cache_netfs.primary_index,
- &ceph_fscache_fsid_object_def,
- &ent->fsid, sizeof(ent->fsid) + uniq_len,
- NULL, 0,
- fsc, 0, true);
+ WARN_ON_ONCE(ci->fscache);

- if (fsc->fscache) {
- ent->fscache = fsc->fscache;
- list_add_tail(&ent->list, &ceph_fscache_list);
- } else {
- kfree(ent);
- errorfc(fc, "unable to register fscache cookie for fsid %pU",
- fsid);
- /* all other fs ignore this error */
- }
-out_unlock:
- mutex_unlock(&ceph_fscache_lock);
- return err;
+ ci->fscache = fscache_acquire_cookie(fsc->fscache, 0,
+ &ci->i_vino, sizeof(ci->i_vino),
+ &ci->i_version, sizeof(ci->i_version),
+ i_size_read(inode));
}

-static enum fscache_checkaux ceph_fscache_inode_check_aux(
- void *cookie_netfs_data, const void *data, uint16_t dlen,
- loff_t object_size)
+void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info* ci)
{
- struct ceph_inode_info* ci = cookie_netfs_data;
- struct inode* inode = &ci->vfs_inode;
+ struct fscache_cookie *cookie = ci->fscache;

- if (dlen != sizeof(ci->i_version) ||
- i_size_read(inode) != object_size)
- return FSCACHE_CHECKAUX_OBSOLETE;
+ fscache_relinquish_cookie(cookie, false);
+}

- if (*(u64 *)data != ci->i_version)
- return FSCACHE_CHECKAUX_OBSOLETE;
+void ceph_fscache_use_cookie(struct inode *inode, bool will_modify)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);

- dout("ceph inode 0x%p cached okay\n", ci);
- return FSCACHE_CHECKAUX_OKAY;
+ fscache_use_cookie(ci->fscache, will_modify);
}

-static const struct fscache_cookie_def ceph_fscache_inode_object_def = {
- .name = "CEPH.inode",
- .type = FSCACHE_COOKIE_TYPE_DATAFILE,
- .check_aux = ceph_fscache_inode_check_aux,
-};
-
-void ceph_fscache_register_inode_cookie(struct inode *inode)
+void ceph_fscache_unuse_cookie(struct inode *inode, bool update)
{
struct ceph_inode_info *ci = ceph_inode(inode);
- struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
-
- /* No caching for filesystem */
- if (!fsc->fscache)
- return;

- /* Only cache for regular files that are read only */
- if (!S_ISREG(inode->i_mode))
- return;
+ if (update) {
+ loff_t i_size = i_size_read(inode);

- inode_lock_nested(inode, I_MUTEX_CHILD);
- if (!ci->fscache) {
- ci->fscache = fscache_acquire_cookie(fsc->fscache,
- &ceph_fscache_inode_object_def,
- &ci->i_vino, sizeof(ci->i_vino),
- &ci->i_version, sizeof(ci->i_version),
- ci, i_size_read(inode), false);
+ fscache_unuse_cookie(ci->fscache, &ci->i_version, &i_size);
+ } else {
+ fscache_unuse_cookie(ci->fscache, NULL, NULL);
}
- inode_unlock(inode);
}

-void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info* ci)
+void ceph_fscache_update(struct inode *inode)
{
- struct fscache_cookie* cookie;
-
- if ((cookie = ci->fscache) == NULL)
- return;
-
- ci->fscache = NULL;
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ loff_t i_size = i_size_read(inode);

- fscache_relinquish_cookie(cookie, &ci->i_vino, false);
+ fscache_update_cookie(ci->fscache, &ci->i_version, &i_size);
}

-static bool ceph_fscache_can_enable(void *data)
+void ceph_fscache_invalidate(struct inode *inode, bool dio_write)
{
- struct inode *inode = data;
- return !inode_is_open_for_write(inode);
+ struct ceph_inode_info *ci = ceph_inode(inode);
+
+ fscache_invalidate(ceph_inode(inode)->fscache,
+ &ci->i_version, i_size_read(inode),
+ dio_write ? FSCACHE_INVAL_DIO_WRITE : 0);
}

-void ceph_fscache_file_set_cookie(struct inode *inode, struct file *filp)
+int ceph_fscache_register_fs(struct ceph_fs_client* fsc, struct fs_context *fc)
{
- struct ceph_inode_info *ci = ceph_inode(inode);
+ const struct ceph_fsid *fsid = &fsc->client->fsid;
+ const char *fscache_uniq = fsc->mount_options->fscache_uniq;
+ size_t uniq_len = fscache_uniq ? strlen(fscache_uniq) : 0;
+ char *name;
+ int err = 0;

- if (!fscache_cookie_valid(ci->fscache))
- return;
+ name = kasprintf(GFP_KERNEL, "ceph,%pU%s%s", fsid, uniq_len ? "," : "",
+ uniq_len ? fscache_uniq : "");
+ if (!name)
+ return -ENOMEM;

- if (inode_is_open_for_write(inode)) {
- dout("fscache_file_set_cookie %p %p disabling cache\n",
- inode, filp);
- fscache_disable_cookie(ci->fscache, &ci->i_vino, false);
- } else {
- fscache_enable_cookie(ci->fscache, &ci->i_vino, i_size_read(inode),
- ceph_fscache_can_enable, inode);
- if (fscache_cookie_enabled(ci->fscache)) {
- dout("fscache_file_set_cookie %p %p enabling cache\n",
- inode, filp);
- }
+ fsc->fscache = fscache_acquire_volume(name, NULL, NULL, 0);
+ if (IS_ERR_OR_NULL(fsc->fscache)) {
+ errorfc(fc, "Unable to register fscache cookie for %s", name);
+ err = fsc->fscache ? PTR_ERR(fsc->fscache) : -EOPNOTSUPP;
+ fsc->fscache = NULL;
}
+ kfree(name);
+ return err;
}

void ceph_fscache_unregister_fs(struct ceph_fs_client* fsc)
{
- if (fscache_cookie_valid(fsc->fscache)) {
- struct ceph_fscache_entry *ent;
- bool found = false;
-
- mutex_lock(&ceph_fscache_lock);
- list_for_each_entry(ent, &ceph_fscache_list, list) {
- if (ent->fscache == fsc->fscache) {
- list_del(&ent->list);
- kfree(ent);
- found = true;
- break;
- }
- }
- WARN_ON_ONCE(!found);
- mutex_unlock(&ceph_fscache_lock);
-
- __fscache_relinquish_cookie(fsc->fscache, NULL, false);
- }
- fsc->fscache = NULL;
+ fscache_relinquish_volume(fsc->fscache, NULL, false);
}
diff --git a/fs/ceph/cache.h b/fs/ceph/cache.h
index 058ea2a04376..09164389fa66 100644
--- a/fs/ceph/cache.h
+++ b/fs/ceph/cache.h
@@ -12,19 +12,19 @@
#include <linux/netfs.h>

#ifdef CONFIG_CEPH_FSCACHE
-
-extern struct fscache_netfs ceph_cache_netfs;
-
-int ceph_fscache_register(void);
-void ceph_fscache_unregister(void);
+#include <linux/fscache.h>

int ceph_fscache_register_fs(struct ceph_fs_client* fsc, struct fs_context *fc);
void ceph_fscache_unregister_fs(struct ceph_fs_client* fsc);

void ceph_fscache_register_inode_cookie(struct inode *inode);
void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info* ci);
-void ceph_fscache_file_set_cookie(struct inode *inode, struct file *filp);
-void ceph_fscache_revalidate_cookie(struct ceph_inode_info *ci);
+
+void ceph_fscache_use_cookie(struct inode *inode, bool will_modify);
+void ceph_fscache_unuse_cookie(struct inode *inode, bool update);
+
+void ceph_fscache_update(struct inode *inode);
+void ceph_fscache_invalidate(struct inode *inode, bool dio_write);

static inline void ceph_fscache_inode_init(struct ceph_inode_info *ci)
{
@@ -36,37 +36,51 @@ static inline struct fscache_cookie *ceph_fscache_cookie(struct ceph_inode_info
return ci->fscache;
}

-static inline void ceph_fscache_invalidate(struct inode *inode)
+static inline void ceph_fscache_resize(struct inode *inode, loff_t to)
{
- fscache_invalidate(ceph_inode(inode)->fscache);
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ struct fscache_cookie *cookie = ceph_fscache_cookie(ci);
+
+ if (cookie) {
+ ceph_fscache_use_cookie(inode, true);
+ fscache_resize_cookie(cookie, to);
+ ceph_fscache_unuse_cookie(inode, true);
+ }
}

-static inline bool ceph_is_cache_enabled(struct inode *inode)
+static inline void ceph_fscache_unpin_writeback(struct inode *inode,
+ struct writeback_control *wbc)
{
- struct fscache_cookie *cookie = ceph_fscache_cookie(ceph_inode(inode));
+ fscache_unpin_writeback(wbc, ceph_fscache_cookie(ceph_inode(inode)));
+}
+
+static inline int ceph_fscache_set_page_dirty(struct page *page)
+{
+ struct inode *inode = page->mapping->host;
+ struct ceph_inode_info *ci = ceph_inode(inode);

- if (!cookie)
- return false;
- return fscache_cookie_enabled(cookie);
+ return fscache_set_page_dirty(page, ceph_fscache_cookie(ci));
}

static inline int ceph_begin_cache_operation(struct netfs_read_request *rreq)
{
struct fscache_cookie *cookie = ceph_fscache_cookie(ceph_inode(rreq->inode));

- return fscache_begin_read_operation(rreq, cookie);
+ return fscache_begin_read_operation(&rreq->cache_resources, cookie);
}
-#else

-static inline int ceph_fscache_register(void)
+static inline bool ceph_is_cache_enabled(struct inode *inode)
{
- return 0;
+ return fscache_cookie_enabled(ceph_fscache_cookie(ceph_inode(inode)));
}

-static inline void ceph_fscache_unregister(void)
+static inline void ceph_fscache_note_page_release(struct inode *inode)
{
-}
+ struct ceph_inode_info *ci = ceph_inode(inode);

+ fscache_note_page_release(ceph_fscache_cookie(ci));
+}
+#else /* CONFIG_CEPH_FSCACHE */
static inline int ceph_fscache_register_fs(struct ceph_fs_client* fsc,
struct fs_context *fc)
{
@@ -81,28 +95,49 @@ static inline void ceph_fscache_inode_init(struct ceph_inode_info *ci)
{
}

-static inline struct fscache_cookie *ceph_fscache_cookie(struct ceph_inode_info *ci)
+static inline void ceph_fscache_register_inode_cookie(struct inode *inode)
{
- return NULL;
}

-static inline void ceph_fscache_register_inode_cookie(struct inode *inode)
+static inline void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info* ci)
{
}

-static inline void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info* ci)
+static inline void ceph_fscache_use_cookie(struct inode *inode, bool will_modify)
{
}

-static inline void ceph_fscache_file_set_cookie(struct inode *inode,
- struct file *filp)
+static inline void ceph_fscache_unuse_cookie(struct inode *inode, bool update)
{
}

-static inline void ceph_fscache_invalidate(struct inode *inode)
+static inline void ceph_fscache_update(struct inode *inode)
{
}

+static inline void ceph_fscache_invalidate(struct inode *inode, bool dio_write)
+{
+}
+
+static inline struct fscache_cookie *ceph_fscache_cookie(struct ceph_inode_info *ci)
+{
+ return NULL;
+}
+
+static inline void ceph_fscache_resize(struct inode *inode, loff_t to)
+{
+}
+
+static inline void ceph_fscache_unpin_writeback(struct inode *inode,
+ struct writeback_control *wbc)
+{
+}
+
+static inline int ceph_fscache_set_page_dirty(struct page *page)
+{
+ return __set_page_dirty_nobuffers(page);
+}
+
static inline bool ceph_is_cache_enabled(struct inode *inode)
{
return false;
@@ -112,6 +147,10 @@ static inline int ceph_begin_cache_operation(struct netfs_read_request *rreq)
{
return -ENOBUFS;
}
-#endif

-#endif /* _CEPH_CACHE_H */
+static inline void ceph_fscache_note_page_release(struct inode *inode)
+{
+}
+#endif /* CONFIG_CEPH_FSCACHE */
+
+#endif
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index b9460b6fb76f..0bc0e6c157df 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -1856,7 +1856,7 @@ static int try_nonblocking_invalidate(struct inode *inode)
u32 invalidating_gen = ci->i_rdcache_gen;

spin_unlock(&ci->i_ceph_lock);
- ceph_fscache_invalidate(inode);
+ ceph_fscache_invalidate(inode, false);
invalidate_mapping_pages(&inode->i_data, 0, -1);
spin_lock(&ci->i_ceph_lock);

@@ -2388,6 +2388,7 @@ int ceph_write_inode(struct inode *inode, struct writeback_control *wbc)
int wait = (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync);

dout("write_inode %p wait=%d\n", inode, wait);
+ ceph_fscache_unpin_writeback(inode, wbc);
if (wait) {
dirty = try_flush_caps(inode, &flush_tid);
if (dirty)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 02a0a0fd9ccd..bf1017682d09 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -248,8 +248,7 @@ static int ceph_init_file(struct inode *inode, struct file *file, int fmode)

switch (inode->i_mode & S_IFMT) {
case S_IFREG:
- ceph_fscache_register_inode_cookie(inode);
- ceph_fscache_file_set_cookie(inode, file);
+ ceph_fscache_use_cookie(inode, file->f_mode & FMODE_WRITE);
fallthrough;
case S_IFDIR:
ret = ceph_init_file_info(inode, file, fmode,
@@ -810,6 +809,7 @@ int ceph_release(struct inode *inode, struct file *file)
dout("release inode %p regular file %p\n", inode, file);
WARN_ON(!list_empty(&fi->rw_contexts));

+ ceph_fscache_unuse_cookie(inode, file->f_mode & FMODE_WRITE);
ceph_put_fmode(ci, fi->fmode, 1);

kmem_cache_free(ceph_file_cachep, fi);
@@ -1206,7 +1206,11 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
snapc, snapc ? snapc->seq : 0);

if (write) {
- int ret2 = invalidate_inode_pages2_range(inode->i_mapping,
+ int ret2;
+
+ ceph_fscache_invalidate(inode, true);
+
+ ret2 = invalidate_inode_pages2_range(inode->i_mapping,
pos >> PAGE_SHIFT,
(pos + count - 1) >> PAGE_SHIFT);
if (ret2 < 0)
@@ -1417,6 +1421,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
if (ret < 0)
return ret;

+ ceph_fscache_invalidate(inode, false);
ret = invalidate_inode_pages2_range(inode->i_mapping,
pos >> PAGE_SHIFT,
(pos + count - 1) >> PAGE_SHIFT);
@@ -2101,6 +2106,7 @@ static long ceph_fallocate(struct file *file, int mode,
goto unlock;

filemap_invalidate_lock(inode->i_mapping);
+ ceph_fscache_invalidate(inode, false);
ceph_zero_pagecache_range(inode, offset, length);
ret = ceph_zero_objects(inode, offset, length);

@@ -2425,6 +2431,7 @@ static ssize_t __ceph_copy_file_range(struct file *src_file, loff_t src_off,
goto out_caps;

/* Drop dst file cached pages */
+ ceph_fscache_invalidate(dst_inode, false);
ret = invalidate_inode_pages2_range(dst_inode->i_mapping,
dst_off >> PAGE_SHIFT,
(dst_off + len) >> PAGE_SHIFT);
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index e3322fcb2e8d..ef4a980a7bf3 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -564,6 +564,8 @@ void ceph_evict_inode(struct inode *inode)
percpu_counter_dec(&mdsc->metric.total_inodes);

truncate_inode_pages_final(&inode->i_data);
+ if (inode->i_state & I_PINNING_FSCACHE_WB)
+ ceph_fscache_unuse_cookie(inode, true);
clear_inode(inode);

ceph_fscache_unregister_inode_cookie(ci);
@@ -634,6 +636,12 @@ int ceph_fill_file_size(struct inode *inode, int issued,
}
i_size_write(inode, size);
inode->i_blocks = calc_inode_blocks(size);
+ /*
+ * If we're expanding, then we should be able to just update
+ * the existing cookie.
+ */
+ if (size > isize)
+ ceph_fscache_update(inode);
ci->i_reported_size = size;
if (truncate_seq != ci->i_truncate_seq) {
dout("truncate_seq %u -> %u\n",
@@ -666,10 +674,6 @@ int ceph_fill_file_size(struct inode *inode, int issued,
truncate_size);
ci->i_truncate_size = truncate_size;
}
-
- if (queue_trunc)
- ceph_fscache_invalidate(inode);
-
return queue_trunc;
}

@@ -1053,6 +1057,8 @@ int ceph_fill_inode(struct inode *inode, struct page *locked_page,

spin_unlock(&ci->i_ceph_lock);

+ ceph_fscache_register_inode_cookie(inode);
+
if (fill_inline)
ceph_fill_inline_data(inode, locked_page,
iinfo->inline_data, iinfo->inline_len);
@@ -1814,11 +1820,13 @@ bool ceph_inode_set_size(struct inode *inode, loff_t size)
spin_lock(&ci->i_ceph_lock);
dout("set_size %p %llu -> %llu\n", inode, i_size_read(inode), size);
i_size_write(inode, size);
+ ceph_fscache_update(inode);
inode->i_blocks = calc_inode_blocks(size);

ret = __ceph_should_report_size(ci);

spin_unlock(&ci->i_ceph_lock);
+
return ret;
}

@@ -1844,6 +1852,8 @@ static void ceph_do_invalidate_pages(struct inode *inode)
u32 orig_gen;
int check = 0;

+ ceph_fscache_invalidate(inode, false);
+
mutex_lock(&ci->i_truncate_mutex);

if (ceph_inode_is_shutdown(inode)) {
@@ -1868,7 +1878,7 @@ static void ceph_do_invalidate_pages(struct inode *inode)
orig_gen = ci->i_rdcache_gen;
spin_unlock(&ci->i_ceph_lock);

- ceph_fscache_invalidate(inode);
+ ceph_fscache_invalidate(inode, false);
if (invalidate_inode_pages2(inode->i_mapping) < 0) {
pr_err("invalidate_inode_pages2 %llx.%llx failed\n",
ceph_vinop(inode));
@@ -1937,6 +1947,7 @@ void __ceph_do_pending_vmtruncate(struct inode *inode)
ci->i_truncate_pending, to);
spin_unlock(&ci->i_ceph_lock);

+ ceph_fscache_resize(inode, to);
truncate_pagecache(inode, to);

spin_lock(&ci->i_ceph_lock);
@@ -2184,7 +2195,6 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr)
if (inode_dirty_flags)
__mark_inode_dirty(inode, inode_dirty_flags);

-
if (mask) {
req->r_inode = inode;
ihold(inode);
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index bab61232dc5a..bea89bdb534a 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -787,16 +787,10 @@ static int __init init_caches(void)
if (!ceph_wb_pagevec_pool)
goto bad_pagevec_pool;

- error = ceph_fscache_register();
- if (error)
- goto bad_fscache;
-
return 0;

-bad_fscache:
- kmem_cache_destroy(ceph_mds_request_cachep);
bad_pagevec_pool:
- mempool_destroy(ceph_wb_pagevec_pool);
+ kmem_cache_destroy(ceph_mds_request_cachep);
bad_mds_req:
kmem_cache_destroy(ceph_dir_file_cachep);
bad_dir_file:
@@ -828,8 +822,6 @@ static void destroy_caches(void)
kmem_cache_destroy(ceph_dir_file_cachep);
kmem_cache_destroy(ceph_mds_request_cachep);
mempool_destroy(ceph_wb_pagevec_pool);
-
- ceph_fscache_unregister();
}

static void __ceph_umount_begin(struct ceph_fs_client *fsc)
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index ac331aa07cfa..d0142cc5c41b 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -21,7 +21,6 @@
#include <linux/ceph/libceph.h>

#ifdef CONFIG_CEPH_FSCACHE
-#define FSCACHE_USE_NEW_IO_API
#include <linux/fscache.h>
#endif

@@ -135,7 +134,7 @@ struct ceph_fs_client {
#endif

#ifdef CONFIG_CEPH_FSCACHE
- struct fscache_cookie *fscache;
+ struct fscache_volume *fscache;
#endif
};




2021-12-22 23:31:07

by David Howells

[permalink] [raw]
Subject: [PATCH v4 65/68] ceph: add fscache writeback support

From: Jeff Layton <[email protected]>

When updating the backing store from the pagecache (a'la writepage or
writepages), write to the cache first. This allows us to keep caching
files even when they are being written, as long as we have appropriate
caps.

Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]/ # v1
Link: https://lore.kernel.org/r/[email protected]/ # v2
Link: https://lore.kernel.org/r/163906985808.143852.1383891557313186623.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967190257.1823006.16713609520911954804.stgit@warthog.procyon.org.uk/ # v3
---

fs/ceph/addr.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 59 insertions(+), 8 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 0ffc4c8d7c10..e836f8f1d4f8 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -5,7 +5,6 @@
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/pagemap.h>
-#include <linux/writeback.h> /* generic_writepages */
#include <linux/slab.h>
#include <linux/pagevec.h>
#include <linux/task_io_accounting_ops.h>
@@ -384,6 +383,38 @@ static void ceph_readahead(struct readahead_control *ractl)
netfs_readahead(ractl, &ceph_netfs_read_ops, (void *)(uintptr_t)got);
}

+#ifdef CONFIG_CEPH_FSCACHE
+static void ceph_set_page_fscache(struct page *page)
+{
+ set_page_fscache(page);
+}
+
+static void ceph_fscache_write_terminated(void *priv, ssize_t error, bool was_async)
+{
+ struct inode *inode = priv;
+
+ if (IS_ERR_VALUE(error) && error != -ENOBUFS)
+ ceph_fscache_invalidate(inode, false);
+}
+
+static void ceph_fscache_write_to_cache(struct inode *inode, u64 off, u64 len, bool caching)
+{
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ struct fscache_cookie *cookie = ceph_fscache_cookie(ci);
+
+ fscache_write_to_cache(cookie, inode->i_mapping, off, len, i_size_read(inode),
+ ceph_fscache_write_terminated, inode, caching);
+}
+#else
+static inline void ceph_set_page_fscache(struct page *page)
+{
+}
+
+static inline void ceph_fscache_write_to_cache(struct inode *inode, u64 off, u64 len, bool caching)
+{
+}
+#endif /* CONFIG_CEPH_FSCACHE */
+
struct ceph_writeback_ctl
{
loff_t i_size;
@@ -499,6 +530,7 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
struct ceph_writeback_ctl ceph_wbc;
struct ceph_osd_client *osdc = &fsc->client->osdc;
struct ceph_osd_request *req;
+ bool caching = ceph_is_cache_enabled(inode);

dout("writepage %p idx %lu\n", page, page->index);

@@ -537,16 +569,17 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
CONGESTION_ON_THRESH(fsc->mount_options->congestion_kb))
set_bdi_congested(inode_to_bdi(inode), BLK_RW_ASYNC);

- set_page_writeback(page);
req = ceph_osdc_new_request(osdc, &ci->i_layout, ceph_vino(inode), page_off, &len, 0, 1,
CEPH_OSD_OP_WRITE, CEPH_OSD_FLAG_WRITE, snapc,
ceph_wbc.truncate_seq, ceph_wbc.truncate_size,
true);
- if (IS_ERR(req)) {
- redirty_page_for_writepage(wbc, page);
- end_page_writeback(page);
+ if (IS_ERR(req))
return PTR_ERR(req);
- }
+
+ set_page_writeback(page);
+ if (caching)
+ ceph_set_page_fscache(page);
+ ceph_fscache_write_to_cache(inode, page_off, len, caching);

/* it may be a short write due to an object boundary */
WARN_ON_ONCE(len > thp_size(page));
@@ -605,6 +638,9 @@ static int ceph_writepage(struct page *page, struct writeback_control *wbc)
struct inode *inode = page->mapping->host;
BUG_ON(!inode);
ihold(inode);
+
+ wait_on_page_fscache(page);
+
err = writepage_nounlock(page, wbc);
if (err == -ERESTARTSYS) {
/* direct memory reclaimer was killed by SIGKILL. return 0
@@ -726,6 +762,7 @@ static int ceph_writepages_start(struct address_space *mapping,
struct ceph_writeback_ctl ceph_wbc;
bool should_loop, range_whole = false;
bool done = false;
+ bool caching = ceph_is_cache_enabled(inode);

dout("writepages_start %p (mode=%s)\n", inode,
wbc->sync_mode == WB_SYNC_NONE ? "NONE" :
@@ -849,7 +886,7 @@ static int ceph_writepages_start(struct address_space *mapping,
unlock_page(page);
break;
}
- if (PageWriteback(page)) {
+ if (PageWriteback(page) || PageFsCache(page)) {
if (wbc->sync_mode == WB_SYNC_NONE) {
dout("%p under writeback\n", page);
unlock_page(page);
@@ -857,6 +894,7 @@ static int ceph_writepages_start(struct address_space *mapping,
}
dout("waiting on writeback %p\n", page);
wait_on_page_writeback(page);
+ wait_on_page_fscache(page);
}

if (!clear_page_dirty_for_io(page)) {
@@ -989,9 +1027,19 @@ static int ceph_writepages_start(struct address_space *mapping,
op_idx = 0;
for (i = 0; i < locked_pages; i++) {
u64 cur_offset = page_offset(pages[i]);
+ /*
+ * Discontinuity in page range? Ceph can handle that by just passing
+ * multiple extents in the write op.
+ */
if (offset + len != cur_offset) {
+ /* If it's full, stop here */
if (op_idx + 1 == req->r_num_ops)
break;
+
+ /* Kick off an fscache write with what we have so far. */
+ ceph_fscache_write_to_cache(inode, offset, len, caching);
+
+ /* Start a new extent */
osd_req_op_extent_dup_last(req, op_idx,
cur_offset - offset);
dout("writepages got pages at %llu~%llu\n",
@@ -1002,14 +1050,17 @@ static int ceph_writepages_start(struct address_space *mapping,
osd_req_op_extent_update(req, op_idx, len);

len = 0;
- offset = cur_offset;
+ offset = cur_offset;
data_pages = pages + i;
op_idx++;
}

set_page_writeback(pages[i]);
+ if (caching)
+ ceph_set_page_fscache(pages[i]);
len += thp_size(page);
}
+ ceph_fscache_write_to_cache(inode, offset, len, caching);

if (ceph_wbc.size_stable) {
len = min(len, ceph_wbc.i_size - offset);



2021-12-22 23:31:52

by David Howells

[permalink] [raw]
Subject: [PATCH v4 67/68] fscache: Add a tracepoint for cookie use/unuse

Add a tracepoint to track fscache_use/unuse_cookie().

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
---

fs/fscache/cookie.c | 29 +++++++++++++++++++++++---
include/trace/events/fscache.h | 44 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 69 insertions(+), 4 deletions(-)

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index a7ea7d1db032..9bb1ab5fe5ed 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -556,6 +556,7 @@ void __fscache_use_cookie(struct fscache_cookie *cookie, bool will_modify)
{
enum fscache_cookie_state state;
bool queue = false;
+ int n_active;

_enter("c=%08x", cookie->debug_id);

@@ -565,7 +566,11 @@ void __fscache_use_cookie(struct fscache_cookie *cookie, bool will_modify)

spin_lock(&cookie->lock);

- atomic_inc(&cookie->n_active);
+ n_active = atomic_inc_return(&cookie->n_active);
+ trace_fscache_active(cookie->debug_id, refcount_read(&cookie->ref),
+ n_active, atomic_read(&cookie->n_accesses),
+ will_modify ?
+ fscache_active_use_modify : fscache_active_use);

again:
state = fscache_cookie_state(cookie);
@@ -638,13 +643,29 @@ static void fscache_unuse_cookie_locked(struct fscache_cookie *cookie)
void __fscache_unuse_cookie(struct fscache_cookie *cookie,
const void *aux_data, const loff_t *object_size)
{
+ unsigned int debug_id = cookie->debug_id;
+ unsigned int r = refcount_read(&cookie->ref);
+ unsigned int a = atomic_read(&cookie->n_accesses);
+ unsigned int c;
+
if (aux_data || object_size)
__fscache_update_cookie(cookie, aux_data, object_size);

- if (atomic_dec_and_lock(&cookie->n_active, &cookie->lock)) {
- fscache_unuse_cookie_locked(cookie);
- spin_unlock(&cookie->lock);
+ /* Subtract 1 from counter unless that drops it to 0 (ie. it was 1) */
+ c = atomic_fetch_add_unless(&cookie->n_active, -1, 1);
+ if (c != 1) {
+ trace_fscache_active(debug_id, r, c - 1, a, fscache_active_unuse);
+ return;
}
+
+ spin_lock(&cookie->lock);
+ r = refcount_read(&cookie->ref);
+ a = atomic_read(&cookie->n_accesses);
+ c = atomic_dec_return(&cookie->n_active);
+ trace_fscache_active(debug_id, r, c, a, fscache_active_unuse);
+ if (c == 0)
+ fscache_unuse_cookie_locked(cookie);
+ spin_unlock(&cookie->lock);
}
EXPORT_SYMBOL(__fscache_unuse_cookie);

diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index 1594aefadeac..cb3fb337e880 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -71,6 +71,12 @@ enum fscache_cookie_trace {
fscache_cookie_see_work,
};

+enum fscache_active_trace {
+ fscache_active_use,
+ fscache_active_use_modify,
+ fscache_active_unuse,
+};
+
enum fscache_access_trace {
fscache_access_acquire_volume,
fscache_access_acquire_volume_end,
@@ -146,6 +152,11 @@ enum fscache_access_trace {
EM(fscache_cookie_see_withdraw, "- x-wth") \
E_(fscache_cookie_see_work, "- work ")

+#define fscache_active_traces \
+ EM(fscache_active_use, "USE ") \
+ EM(fscache_active_use_modify, "USE-m ") \
+ E_(fscache_active_unuse, "UNUSE ")
+
#define fscache_access_traces \
EM(fscache_access_acquire_volume, "BEGIN acq_vol") \
EM(fscache_access_acquire_volume_end, "END acq_vol") \
@@ -264,6 +275,39 @@ TRACE_EVENT(fscache_cookie,
__entry->ref)
);

+TRACE_EVENT(fscache_active,
+ TP_PROTO(unsigned int cookie_debug_id,
+ int ref,
+ int n_active,
+ int n_accesses,
+ enum fscache_active_trace why),
+
+ TP_ARGS(cookie_debug_id, ref, n_active, n_accesses, why),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, cookie )
+ __field(int, ref )
+ __field(int, n_active )
+ __field(int, n_accesses )
+ __field(enum fscache_active_trace, why )
+ ),
+
+ TP_fast_assign(
+ __entry->cookie = cookie_debug_id;
+ __entry->ref = ref;
+ __entry->n_active = n_active;
+ __entry->n_accesses = n_accesses;
+ __entry->why = why;
+ ),
+
+ TP_printk("c=%08x %s r=%d a=%d c=%d",
+ __entry->cookie,
+ __print_symbolic(__entry->why, fscache_active_traces),
+ __entry->ref,
+ __entry->n_accesses,
+ __entry->n_active)
+ );
+
TRACE_EVENT(fscache_access_cache,
TP_PROTO(unsigned int cache_debug_id,
int ref,



2021-12-22 23:32:06

by David Howells

[permalink] [raw]
Subject: [PATCH v4 68/68] 9p, afs, ceph, cifs, nfs: Use current_is_kswapd() rather than gfpflags_allow_blocking()

In 9p, afs ceph, cifs and nfs, gfpflags_allow_blocking() (which wraps a
test for __GFP_DIRECT_RECLAIM being set) is used to determine if
->releasepage() should wait for the completion of a DIO write to fscache
with something like:

if (folio_test_fscache(folio)) {
if (!gfpflags_allow_blocking(gfp) || !(gfp & __GFP_FS))
return false;
folio_wait_fscache(folio);
}

Instead, current_is_kswapd() should be used instead.

Note that this is based on a patch originally by Zhaoyang Huang[1]. In
addition to extending it to the other network filesystems and putting it on
top of my fscache rewrite, it also needs to include linux/swap.h in a bunch
of places. Can current_is_kswapd() be moved to linux/mm.h?

Originally-signed-off-by: Zhaoyang Huang <[email protected]>
Co-developed-by: David Howells <[email protected]>
Signed-off-by: David Howells <[email protected]>
cc: Zhaoyang Huang <[email protected]>
cc: Dominique Martinet <[email protected]>
cc: Marc Dionne <[email protected]>
cc: Jeff Layton <[email protected]>
cc: Steve French <[email protected]>
cc: Trond Myklebust <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]/ [1]
---

fs/9p/vfs_addr.c | 3 ++-
fs/afs/file.c | 3 ++-
fs/ceph/addr.c | 3 ++-
fs/cifs/file.c | 2 +-
fs/nfs/fscache.h | 3 ++-
5 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index f3f349f460e5..c72e9f8f5f32 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -16,6 +16,7 @@
#include <linux/pagemap.h>
#include <linux/idr.h>
#include <linux/sched.h>
+#include <linux/swap.h>
#include <linux/uio.h>
#include <linux/netfs.h>
#include <net/9p/9p.h>
@@ -143,7 +144,7 @@ static int v9fs_release_page(struct page *page, gfp_t gfp)
return 0;
#ifdef CONFIG_9P_FSCACHE
if (folio_test_fscache(folio)) {
- if (!gfpflags_allow_blocking(gfp) || !(gfp & __GFP_FS))
+ if (current_is_kswapd() || !(gfp & __GFP_FS))
return 0;
folio_wait_fscache(folio);
}
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 572063dad0b3..5b98db127a1b 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -14,6 +14,7 @@
#include <linux/gfp.h>
#include <linux/task_io_accounting_ops.h>
#include <linux/mm.h>
+#include <linux/swap.h>
#include <linux/netfs.h>
#include "internal.h"

@@ -517,7 +518,7 @@ static int afs_releasepage(struct page *page, gfp_t gfp)
* elected to wait */
#ifdef CONFIG_AFS_FSCACHE
if (folio_test_fscache(folio)) {
- if (!gfpflags_allow_blocking(gfp) || !(gfp & __GFP_FS))
+ if (current_is_kswapd() || !(gfp & __GFP_FS))
return false;
folio_wait_fscache(folio);
}
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index e836f8f1d4f8..b3d9459c9bbd 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -4,6 +4,7 @@
#include <linux/backing-dev.h>
#include <linux/fs.h>
#include <linux/mm.h>
+#include <linux/swap.h>
#include <linux/pagemap.h>
#include <linux/slab.h>
#include <linux/pagevec.h>
@@ -174,7 +175,7 @@ static int ceph_releasepage(struct page *page, gfp_t gfp)
return 0;

if (PageFsCache(page)) {
- if (!gfpflags_allow_blocking(gfp) || !(gfp & __GFP_FS))
+ if (current_is_kswapd() || !(gfp & __GFP_FS))
return 0;
wait_on_page_fscache(page);
}
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 22b66ce10115..d872f6fe8e7d 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -4809,7 +4809,7 @@ static int cifs_release_page(struct page *page, gfp_t gfp)
if (PagePrivate(page))
return 0;
if (PageFsCache(page)) {
- if (!gfpflags_allow_blocking(gfp) || !(gfp & __GFP_FS))
+ if (current_is_kswapd() || !(gfp & __GFP_FS))
return false;
wait_on_page_fscache(page);
}
diff --git a/fs/nfs/fscache.h b/fs/nfs/fscache.h
index e0220fc40366..25a5c0f82392 100644
--- a/fs/nfs/fscache.h
+++ b/fs/nfs/fscache.h
@@ -8,6 +8,7 @@
#ifndef _NFS_FSCACHE_H
#define _NFS_FSCACHE_H

+#include <linux/swap.h>
#include <linux/nfs_fs.h>
#include <linux/nfs_mount.h>
#include <linux/nfs4_mount.h>
@@ -52,7 +53,7 @@ extern void __nfs_readpage_to_fscache(struct inode *, struct page *);
static inline int nfs_fscache_release_page(struct page *page, gfp_t gfp)
{
if (PageFsCache(page)) {
- if (!gfpflags_allow_blocking(gfp) || !(gfp & __GFP_FS))
+ if (current_is_kswapd() || !(gfp & __GFP_FS))
return false;
wait_on_page_fscache(page);
fscache_note_page_release(nfs_i_fscache(page->mapping->host));



2022-01-04 10:50:28

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v4 00/68] fscache, cachefiles: Rewrite

On Wed, 2021-12-22 at 23:13 +0000, David Howells wrote:
> Here's a set of patches implements a rewrite of the fscache driver and a
> matching rewrite of the cachefiles driver, significantly simplifying the
> code compared to what's upstream, removing the complex operation scheduling
> and object state machine in favour of something much smaller and simpler.
>
> The patchset is structured such that the first few patches disable fscache
> use by the network filesystems using it, remove the cachefiles driver
> entirely and as much of the fscache driver as can be got away with without
> causing build failures in the network filesystems. The patches after that
> recreate fscache and then cachefiles, attempting to add the pieces in a
> logical order. Finally, the filesystems are reenabled and then the very
> last patch changes the documentation.
>
>
> WHY REWRITE?
> ============
>
> Fscache's operation scheduling API was intended to handle sequencing of
> cache operations, which were all required (where possible) to run
> asynchronously in parallel with the operations being done by the network
> filesystem, whilst allowing the cache to be brought online and offline and
> to interrupt service for invalidation.
>
> With the advent of the tmpfile capacity in the VFS, however, an opportunity
> arises to do invalidation much more simply, without having to wait for I/O
> that's actually in progress: Cachefiles can simply create a tmpfile, cut
> over the file pointer for the backing object attached to a cookie and
> abandon the in-progress I/O, dismissing it upon completion.
>
> Future work here would involve using Omar Sandoval's vfs_link() with
> AT_LINK_REPLACE[1] to allow an extant file to be displaced by a new hard
> link from a tmpfile as currently I have to unlink the old file first.
>
> These patches can also simplify the object state handling as I/O operations
> to the cache don't all have to be brought to a stop in order to invalidate
> a file. To that end, and with an eye on to writing a new backing cache
> model in the future, I've taken the opportunity to simplify the indexing
> structure.
>
> I've separated the index cookie concept from the file cookie concept by C
> type now. The former is now called a "volume cookie" (struct
> fscache_volume) and there is a container of file cookies. There are then
> just the two levels. All the index cookie levels are collapsed into a
> single volume cookie, and this has a single printable string as a key. For
> instance, an AFS volume would have a key of something like
> "afs,example.com,1000555", combining the filesystem name, cell name and
> volume ID. This is freeform, but must not have '/' chars in it.
>
> I've also eliminated all pointers back from fscache into the network
> filesystem. This required the duplication of a little bit of data in the
> cookie (cookie key, coherency data and file size), but it's not actually
> that much. This gets rid of problems with making sure we keep netfs data
> structures around so that the cache can access them.
>
> These patches mean that most of the code that was in the drivers before is
> simply gone and those drivers are now almost entirely new code. That being
> the case, there doesn't seem any particular reason to try and maintain
> bisectability across it. Further, there has to be a point in the middle
> where things are cut over as there's a single point everything has to go
> through (ie. /dev/cachefiles) and it can't be in use by two drivers at
> once.
>
>
> ISSUES YET OUTSTANDING
> ======================
>
> There are some issues still outstanding, unaddressed by this patchset, that
> will need fixing in future patchsets, but that don't stop this series from
> being usable:
>
> (1) The cachefiles driver needs to stop using the backing filesystem's
> metadata to store information about what parts of the cache are
> populated. This is not reliable with modern extent-based filesystems.
>
> Fixing this is deferred to a separate patchset as it involves
> negotiation with the network filesystem and the VM as to how much data
> to download to fulfil a read - which brings me on to (2)...
>
> (2) NFS and CIFS do not take account of how the cache would like I/O to be
> structured to meet its granularity requirements. Previously, the
> cache used page granularity, which was fine as the network filesystems
> also dealt in page granularity, and the backing filesystem (ext4, xfs
> or whatever) did whatever it did out of sight. However, we now have
> folios to deal with and the cache will now have to store its own
> metadata to track its contents.
>
> The change I'm looking at making for cachefiles is to store content
> bitmaps in one or more xattrs and making a bit in the map correspond
> to something like a 256KiB block. However, the size of an xattr and
> the fact that they have to be read/updated in one go means that I'm
> looking at covering 1GiB of data per 512-byte map and storing each map
> in an xattr. Cachefiles has the potential to grow into a fully
> fledged filesystem of its very own if I'm not careful.
>
> However, I'm also looking at changing things even more radically and
> going to a different model of how the cache is arranged and managed -
> one that's more akin to the way, say, openafs does things - which
> brings me on to (3)...
>
> (3) The way cachefilesd does culling is very inefficient for large caches
> and it would be better to move it into the kernel if I can as
> cachefilesd has to keep asking the kernel if it can cull a file.
> Changing the way the backend works would allow this to be addressed.
>
>
> BITS THAT MAY BE CONTROVERSIAL
> ==============================
>
> There are some bits I've added that may be controversial:
>
> (1) I've provided a flag, S_KERNEL_FILE, that cachefiles uses to check if
> a files is already being used by some other kernel service (e.g. a
> duplicate cachefiles cache in the same directory) and reject it if it
> is. This isn't entirely necessary, but it helps prevent accidental
> data corruption.
>
> I don't want to use S_SWAPFILE as that has other effects, but quite
> possibly swapon() should set S_KERNEL_FILE too.
>
> Note that it doesn't prevent userspace from interfering, though
> perhaps it should. (I have made it prevent a marked directory from
> being rmdir-able).
>
> (2) Cachefiles wants to keep the backing file for a cookie open whilst we
> might need to write to it from network filesystem writeback. The
> problem is that the network filesystem unuses its cookie when its file
> is closed, and so we have nothing pinning the cachefiles file open and
> it will get closed automatically after a short time to avoid
> EMFILE/ENFILE problems.
>
> Reopening the cache file, however, is a problem if this is being done
> due to writeback triggered by exit(). Some filesystems will oops if
> we try to open a file in that context because they want to access
> current->fs or suchlike.
>
> To get around this, I added the following:
>
> (A) An inode flag, I_PINNING_FSCACHE_WB, to be set on a network
> filesystem inode to indicate that we have a usage count on the
> cookie caching that inode.
>
> (B) A flag in struct writeback_control, unpinned_fscache_wb, that is
> set when __writeback_single_inode() clears the last dirty page
> from i_pages - at which point it clears I_PINNING_FSCACHE_WB and
> sets this flag.
>
> This has to be done here so that clearing I_PINNING_FSCACHE_WB can
> be done atomically with the check of PAGECACHE_TAG_DIRTY that
> clears I_DIRTY_PAGES.
>
> (C) A function, fscache_set_page_dirty(), which if it is not set, sets
> I_PINNING_FSCACHE_WB and calls fscache_use_cookie() to pin the
> cache resources.
>
> (D) A function, fscache_unpin_writeback(), to be called by
> ->write_inode() to unuse the cookie.
>
> (E) A function, fscache_clear_inode_writeback(), to be called when the
> inode is evicted, before clear_inode() is called. This cleans up
> any lingering I_PINNING_FSCACHE_WB.
>
> The network filesystem can then use these tools to make sure that
> fscache_write_to_cache() can write locally modified data to the cache
> as well as to the server.
>
> For the future, I'm working on write helpers for netfs lib that should
> allow this facility to be removed by keeping track of the dirty
> regions separately - but that's incomplete at the moment and is also
> going to be affected by folios, one way or another, since it deals
> with pages.
>
>
> These patches can be found also on:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-rewrite
>
> David
>
>
> Changes
> =======
> ver #4:
> - Dropped a pair of patches to try and cope with multipage folios in
> afs_write_begin/end() - it should really be done in the caller[7].
> - Fixed the use of sizeof with memset in cifs.
> - Removed an extraneous kdoc param.
> - Added a patch to add a tracepoint for fscache_use/unuse_cookie().
> - In cifs, tcon->vol_create_time is __le64 so doesn't need cpu_to_le64().
> - Add an expanded version of a patch to use current_is_kswapd() instead of
> !gfpflags_allow_blocking()[8].
> - Removed a couple of debugging print statements.
>
> ver #3:
> - Fixed a race in the cookie state machine between LRU discard and
> relinquishment[4].
> - Fixed up the hashing to make it portable[5].
> - Fixed up some netfs coherency data to make it portable.
> - Fixed some missing NFS_FSCACHE=n fallback functions in nfs[6].
> - Added a patch to store volume coherency data in an xattr.
> - Added a check that the cookie is unhashed before being freed.
> - Fixed fscache to use remove_proc_subtree() to remove /proc/fs/fscache/.
>
> ver #2:
> - Fix an unused-var warning due to CONFIG_9P_FSCACHE=n.
> - Use gfpflags_allow_blocking() rather than using flag directly.
> - Fixed some error logging in a couple of cachefiles functions.
> - Fixed an error check in the fscache volume allocation.
> - Need to unmark an inode we've moved to the graveyard before unlocking.
> - Upgraded to -rc4 to allow for upstream changes to cifs.
> - Should only change to inval state if can get access to cache.
> - Don't hold n_accesses elevated whilst cache is bound to a cookie, but
> rather add a flag that prevents the state machine from being queued when
> n_accesses reaches 0.
> - Remove the unused cookie pointer field from the fscache_acquire
> tracepoint.
> - Added missing transition to LRU_DISCARDING state.
> - Added two ceph patches from Jeff Layton[2].
> - Remove NFS_INO_FSCACHE as it's no longer used.
> - In NFS, need to unuse a cookie on file-release, not inode-clear.
> - Filled in the NFS cache I/O routines, borrowing from the previously posted
> fallback I/O code[3].
>
>
> Link: https://lore.kernel.org/r/[email protected]/ [1]
> Link: https://lore.kernel.org/r/[email protected]/ [2]
> Link: https://lore.kernel.org/r/163189108292.2509237.12615909591150927232.stgit@warthog.procyon.org.uk/ [3]
> Link: https://lore.kernel.org/r/[email protected]/ [4]
> Link: https://lore.kernel.org/r/CAHk-=whtkzB446+hX0zdLsdcUJsJ=8_-0S1mE_R+YurThfUbLA@mail.gmail.com [5]
> Link: https://lore.kernel.org/r/61b90f3d.H1IkoeQfEsGNhvq9%[email protected]/ [6]
> Link: https://lore.kernel.org/r/CAHk-=wh2dr=NgVSVj0sw-gSuzhxhLRV5FymfPS146zGgF4kBjA@mail.gmail.com/ [7]
> Link: https://lore.kernel.org/r/[email protected]/ [8]
>
>
> References
> ==========
>
> These patches have been published for review before, firstly as part of a
> larger set:
>
> Link: https://lore.kernel.org/r/158861203563.340223.7585359869938129395.stgit@warthog.procyon.org.uk/
>
> Link: https://lore.kernel.org/r/159465766378.1376105.11619976251039287525.stgit@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/r/159465784033.1376674.18106463693989811037.stgit@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/r/159465821598.1377938.2046362270225008168.stgit@warthog.procyon.org.uk/
>
> Link: https://lore.kernel.org/r/160588455242.3465195.3214733858273019178.stgit@warthog.procyon.org.uk/
>
> Then as a cut-down set:
>
> Link: https://lore.kernel.org/r/161118128472.1232039.11746799833066425131.stgit@warthog.procyon.org.uk/ # v1
> Link: https://lore.kernel.org/r/161161025063.2537118.2009249444682241405.stgit@warthog.procyon.org.uk/ # v2
> Link: https://lore.kernel.org/r/161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk/ # v3
> Link: https://lore.kernel.org/r/161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk/ # v4
> Link: https://lore.kernel.org/r/161653784755.2770958.11820491619308713741.stgit@warthog.procyon.org.uk/ # v5
>
> I split out a set to just restructure the I/O, which got merged back in to
> this one:
>
> Link: https://lore.kernel.org/r/163363935000.1980952.15279841414072653108.stgit@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/r/163189104510.2509237.10805032055807259087.stgit@warthog.procyon.org.uk/ # v2
> Link: https://lore.kernel.org/r/163363935000.1980952.15279841414072653108.stgit@warthog.procyon.org.uk/ # v3
> Link: https://lore.kernel.org/r/163551653404.1877519.12363794970541005441.stgit@warthog.procyon.org.uk/ # v4
>
> ... and a larger set to do the conversion, also merged back into this one:
>
> Link: https://lore.kernel.org/r/163456861570.2614702.14754548462706508617.stgit@warthog.procyon.org.uk/ # v1
> Link: https://lore.kernel.org/r/163492911924.1038219.13107463173777870713.stgit@warthog.procyon.org.uk/ # v2
>
> Older versions of this one:
>
> Link: https://lore.kernel.org/r/163819575444.215744.318477214576928110.stgit@warthog.procyon.org.uk/ # v1
> Link: https://lore.kernel.org/r/163906878733.143852.5604115678965006622.stgit@warthog.procyon.org.uk/ # v2
> Link: https://lore.kernel.org/r/163967073889.1823006.12237147297060239168.stgit@warthog.procyon.org.uk/ # v3
>
> Proposals/information about the design have been published here:
>
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
>
> And requests for information:
>
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
>
> I've posted partial patches to try and help 9p and cifs along:
>
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
>
> ---
> Dave Wysochanski (1):
> nfs: Convert to new fscache volume/cookie API
>
> David Howells (65):
> fscache, cachefiles: Disable configuration
> cachefiles: Delete the cachefiles driver pending rewrite
> fscache: Remove the contents of the fscache driver, pending rewrite
> netfs: Display the netfs inode number in the netfs_read tracepoint
> netfs: Pass a flag to ->prepare_write() to say if there's no alloc'd space
> fscache: Introduce new driver
> fscache: Implement a hash function
> fscache: Implement cache registration
> fscache: Implement volume registration
> fscache: Implement cookie registration
> fscache: Implement cache-level access helpers
> fscache: Implement volume-level access helpers
> fscache: Implement cookie-level access helpers
> fscache: Implement functions add/remove a cache
> fscache: Provide and use cache methods to lookup/create/free a volume
> fscache: Add a function for a cache backend to note an I/O error
> fscache: Implement simple cookie state machine
> fscache: Implement cookie user counting and resource pinning
> fscache: Implement cookie invalidation
> fscache: Provide a means to begin an operation
> fscache: Count data storage objects in a cache
> fscache: Provide read/write stat counters for the cache
> fscache: Provide a function to let the netfs update its coherency data
> netfs: Pass more information on how to deal with a hole in the cache
> fscache: Implement raw I/O interface
> fscache: Implement higher-level write I/O interface
> vfs, fscache: Implement pinning of cache usage for writeback
> fscache: Provide a function to note the release of a page
> fscache: Provide a function to resize a cookie
> cachefiles: Introduce rewritten driver
> cachefiles: Define structs
> cachefiles: Add some error injection support
> cachefiles: Add a couple of tracepoints for logging errors
> cachefiles: Add cache error reporting macro
> cachefiles: Add security derivation
> cachefiles: Register a miscdev and parse commands over it
> cachefiles: Provide a function to check how much space there is
> vfs, cachefiles: Mark a backing file in use with an inode flag
> cachefiles: Implement a function to get/create a directory in the cache
> cachefiles: Implement cache registration and withdrawal
> cachefiles: Implement volume support
> cachefiles: Add tracepoints for calls to the VFS
> cachefiles: Implement object lifecycle funcs
> cachefiles: Implement key to filename encoding
> cachefiles: Implement metadata/coherency data storage in xattrs
> cachefiles: Mark a backing file in use with an inode flag
> cachefiles: Implement culling daemon commands
> cachefiles: Implement backing file wrangling
> cachefiles: Implement begin and end I/O operation
> cachefiles: Implement cookie resize for truncate
> cachefiles: Implement the I/O routines
> fscache, cachefiles: Store the volume coherency data
> cachefiles: Allow cachefiles to actually function
> fscache, cachefiles: Display stats of no-space events
> fscache, cachefiles: Display stat of culling events
> afs: Convert afs to use the new fscache API
> afs: Copy local writes to the cache when writing to the server
> afs: Skip truncation on the server of data we haven't written yet
> 9p: Use fscache indexing rewrite and reenable caching
> 9p: Copy local writes to the cache when writing to the server
> nfs: Implement cache I/O by accessing the cache directly
> cifs: Support fscache indexing rewrite (untested)
> fscache: Rewrite documentation
> fscache: Add a tracepoint for cookie use/unuse
> 9p, afs, ceph, cifs, nfs: Use current_is_kswapd() rather than gfpflags_allow_blocking()
>
> Jeff Layton (2):
> ceph: conversion to new fscache API
> ceph: add fscache writeback support
>
>
> .../filesystems/caching/backend-api.rst | 850 ++++------
> .../filesystems/caching/cachefiles.rst | 6 +-
> Documentation/filesystems/caching/fscache.rst | 525 ++----
> Documentation/filesystems/caching/index.rst | 4 +-
> .../filesystems/caching/netfs-api.rst | 1136 ++++---------
> Documentation/filesystems/caching/object.rst | 313 ----
> .../filesystems/caching/operations.rst | 210 ---
> Documentation/filesystems/netfs_library.rst | 16 +-
> fs/9p/Kconfig | 2 +-
> fs/9p/cache.c | 195 +--
> fs/9p/cache.h | 25 +-
> fs/9p/v9fs.c | 17 +-
> fs/9p/v9fs.h | 13 +-
> fs/9p/vfs_addr.c | 57 +-
> fs/9p/vfs_dir.c | 13 +
> fs/9p/vfs_file.c | 3 +-
> fs/9p/vfs_inode.c | 26 +-
> fs/9p/vfs_inode_dotl.c | 3 +-
> fs/9p/vfs_super.c | 3 +
> fs/afs/Kconfig | 2 +-
> fs/afs/Makefile | 3 -
> fs/afs/cache.c | 68 -
> fs/afs/cell.c | 12 -
> fs/afs/file.c | 38 +-
> fs/afs/inode.c | 101 +-
> fs/afs/internal.h | 37 +-
> fs/afs/main.c | 14 -
> fs/afs/super.c | 1 +
> fs/afs/volume.c | 29 +-
> fs/afs/write.c | 88 +-
> fs/cachefiles/Kconfig | 7 +
> fs/cachefiles/Makefile | 6 +-
> fs/cachefiles/bind.c | 278 ----
> fs/cachefiles/cache.c | 378 +++++
> fs/cachefiles/daemon.c | 180 +-
> fs/cachefiles/error_inject.c | 46 +
> fs/cachefiles/interface.c | 747 ++++-----
> fs/cachefiles/internal.h | 270 +--
> fs/cachefiles/io.c | 330 +++-
> fs/cachefiles/key.c | 201 ++-
> fs/cachefiles/main.c | 22 +-
> fs/cachefiles/namei.c | 1223 ++++++--------
> fs/cachefiles/rdwr.c | 972 -----------
> fs/cachefiles/security.c | 2 +-
> fs/cachefiles/volume.c | 139 ++
> fs/cachefiles/xattr.c | 421 ++---
> fs/ceph/Kconfig | 2 +-
> fs/ceph/addr.c | 102 +-
> fs/ceph/cache.c | 218 +--
> fs/ceph/cache.h | 97 +-
> fs/ceph/caps.c | 3 +-
> fs/ceph/file.c | 13 +-
> fs/ceph/inode.c | 22 +-
> fs/ceph/super.c | 10 +-
> fs/ceph/super.h | 3 +-
> fs/cifs/Kconfig | 2 +-
> fs/cifs/Makefile | 2 +-
> fs/cifs/cache.c | 105 --
> fs/cifs/cifsfs.c | 11 +-
> fs/cifs/cifsglob.h | 5 +-
> fs/cifs/connect.c | 12 -
> fs/cifs/file.c | 64 +-
> fs/cifs/fscache.c | 333 +---
> fs/cifs/fscache.h | 126 +-
> fs/cifs/inode.c | 36 +-
> fs/fs-writeback.c | 8 +
> fs/fscache/Makefile | 6 +-
> fs/fscache/cache.c | 618 +++----
> fs/fscache/cookie.c | 1448 +++++++++--------
> fs/fscache/fsdef.c | 98 --
> fs/fscache/internal.h | 317 +---
> fs/fscache/io.c | 376 ++++-
> fs/fscache/main.c | 147 +-
> fs/fscache/netfs.c | 74 -
> fs/fscache/object.c | 1125 -------------
> fs/fscache/operation.c | 633 -------
> fs/fscache/page.c | 1242 --------------
> fs/fscache/proc.c | 47 +-
> fs/fscache/stats.c | 293 +---
> fs/fscache/volume.c | 517 ++++++
> fs/namei.c | 3 +-
> fs/netfs/read_helper.c | 10 +-
> fs/nfs/Kconfig | 2 +-
> fs/nfs/Makefile | 2 +-
> fs/nfs/client.c | 4 -
> fs/nfs/direct.c | 2 +
> fs/nfs/file.c | 13 +-
> fs/nfs/fscache-index.c | 140 --
> fs/nfs/fscache.c | 490 ++----
> fs/nfs/fscache.h | 180 +-
> fs/nfs/inode.c | 11 +-
> fs/nfs/nfstrace.h | 1 -
> fs/nfs/read.c | 25 +-
> fs/nfs/super.c | 28 +-
> fs/nfs/write.c | 8 +-
> include/linux/fs.h | 4 +
> include/linux/fscache-cache.h | 614 ++-----
> include/linux/fscache.h | 1021 +++++-------
> include/linux/netfs.h | 15 +-
> include/linux/nfs_fs.h | 1 -
> include/linux/nfs_fs_sb.h | 9 +-
> include/linux/writeback.h | 1 +
> include/trace/events/cachefiles.h | 527 ++++--
> include/trace/events/fscache.h | 642 ++++----
> include/trace/events/netfs.h | 5 +-
> 105 files changed, 7396 insertions(+), 13509 deletions(-)
> delete mode 100644 Documentation/filesystems/caching/object.rst
> delete mode 100644 Documentation/filesystems/caching/operations.rst
> delete mode 100644 fs/afs/cache.c
> delete mode 100644 fs/cachefiles/bind.c
> create mode 100644 fs/cachefiles/cache.c
> create mode 100644 fs/cachefiles/error_inject.c
> delete mode 100644 fs/cachefiles/rdwr.c
> create mode 100644 fs/cachefiles/volume.c
> delete mode 100644 fs/cifs/cache.c
> delete mode 100644 fs/fscache/fsdef.c
> delete mode 100644 fs/fscache/netfs.c
> delete mode 100644 fs/fscache/object.c
> delete mode 100644 fs/fscache/operation.c
> delete mode 100644 fs/fscache/page.c
> create mode 100644 fs/fscache/volume.c
> delete mode 100644 fs/nfs/fscache-index.c
>
>

I tested these on ceph with the caches both enabled and disabled and
everything looks good.

Tested-by: Jeff Layton <[email protected]>

2022-01-04 11:28:04

by David Wysochanski

[permalink] [raw]
Subject: Re: [PATCH v4 00/68] fscache, cachefiles: Rewrite

On Wed, Dec 22, 2021 at 6:13 PM David Howells <[email protected]> wrote:
>
>
> Here's a set of patches implements a rewrite of the fscache driver and a
> matching rewrite of the cachefiles driver, significantly simplifying the
> code compared to what's upstream, removing the complex operation scheduling
> and object state machine in favour of something much smaller and simpler.
>
> The patchset is structured such that the first few patches disable fscache
> use by the network filesystems using it, remove the cachefiles driver
> entirely and as much of the fscache driver as can be got away with without
> causing build failures in the network filesystems. The patches after that
> recreate fscache and then cachefiles, attempting to add the pieces in a
> logical order. Finally, the filesystems are reenabled and then the very
> last patch changes the documentation.
>
>
> WHY REWRITE?
> ============
>
> Fscache's operation scheduling API was intended to handle sequencing of
> cache operations, which were all required (where possible) to run
> asynchronously in parallel with the operations being done by the network
> filesystem, whilst allowing the cache to be brought online and offline and
> to interrupt service for invalidation.
>
> With the advent of the tmpfile capacity in the VFS, however, an opportunity
> arises to do invalidation much more simply, without having to wait for I/O
> that's actually in progress: Cachefiles can simply create a tmpfile, cut
> over the file pointer for the backing object attached to a cookie and
> abandon the in-progress I/O, dismissing it upon completion.
>
> Future work here would involve using Omar Sandoval's vfs_link() with
> AT_LINK_REPLACE[1] to allow an extant file to be displaced by a new hard
> link from a tmpfile as currently I have to unlink the old file first.
>
> These patches can also simplify the object state handling as I/O operations
> to the cache don't all have to be brought to a stop in order to invalidate
> a file. To that end, and with an eye on to writing a new backing cache
> model in the future, I've taken the opportunity to simplify the indexing
> structure.
>
> I've separated the index cookie concept from the file cookie concept by C
> type now. The former is now called a "volume cookie" (struct
> fscache_volume) and there is a container of file cookies. There are then
> just the two levels. All the index cookie levels are collapsed into a
> single volume cookie, and this has a single printable string as a key. For
> instance, an AFS volume would have a key of something like
> "afs,example.com,1000555", combining the filesystem name, cell name and
> volume ID. This is freeform, but must not have '/' chars in it.
>
> I've also eliminated all pointers back from fscache into the network
> filesystem. This required the duplication of a little bit of data in the
> cookie (cookie key, coherency data and file size), but it's not actually
> that much. This gets rid of problems with making sure we keep netfs data
> structures around so that the cache can access them.
>
> These patches mean that most of the code that was in the drivers before is
> simply gone and those drivers are now almost entirely new code. That being
> the case, there doesn't seem any particular reason to try and maintain
> bisectability across it. Further, there has to be a point in the middle
> where things are cut over as there's a single point everything has to go
> through (ie. /dev/cachefiles) and it can't be in use by two drivers at
> once.
>
>
> ISSUES YET OUTSTANDING
> ======================
>
> There are some issues still outstanding, unaddressed by this patchset, that
> will need fixing in future patchsets, but that don't stop this series from
> being usable:
>
> (1) The cachefiles driver needs to stop using the backing filesystem's
> metadata to store information about what parts of the cache are
> populated. This is not reliable with modern extent-based filesystems.
>
> Fixing this is deferred to a separate patchset as it involves
> negotiation with the network filesystem and the VM as to how much data
> to download to fulfil a read - which brings me on to (2)...
>
> (2) NFS and CIFS do not take account of how the cache would like I/O to be
> structured to meet its granularity requirements. Previously, the
> cache used page granularity, which was fine as the network filesystems
> also dealt in page granularity, and the backing filesystem (ext4, xfs
> or whatever) did whatever it did out of sight. However, we now have
> folios to deal with and the cache will now have to store its own
> metadata to track its contents.
>
> The change I'm looking at making for cachefiles is to store content
> bitmaps in one or more xattrs and making a bit in the map correspond
> to something like a 256KiB block. However, the size of an xattr and
> the fact that they have to be read/updated in one go means that I'm
> looking at covering 1GiB of data per 512-byte map and storing each map
> in an xattr. Cachefiles has the potential to grow into a fully
> fledged filesystem of its very own if I'm not careful.
>
> However, I'm also looking at changing things even more radically and
> going to a different model of how the cache is arranged and managed -
> one that's more akin to the way, say, openafs does things - which
> brings me on to (3)...
>
> (3) The way cachefilesd does culling is very inefficient for large caches
> and it would be better to move it into the kernel if I can as
> cachefilesd has to keep asking the kernel if it can cull a file.
> Changing the way the backend works would allow this to be addressed.
>
>
> BITS THAT MAY BE CONTROVERSIAL
> ==============================
>
> There are some bits I've added that may be controversial:
>
> (1) I've provided a flag, S_KERNEL_FILE, that cachefiles uses to check if
> a files is already being used by some other kernel service (e.g. a
> duplicate cachefiles cache in the same directory) and reject it if it
> is. This isn't entirely necessary, but it helps prevent accidental
> data corruption.
>
> I don't want to use S_SWAPFILE as that has other effects, but quite
> possibly swapon() should set S_KERNEL_FILE too.
>
> Note that it doesn't prevent userspace from interfering, though
> perhaps it should. (I have made it prevent a marked directory from
> being rmdir-able).
>
> (2) Cachefiles wants to keep the backing file for a cookie open whilst we
> might need to write to it from network filesystem writeback. The
> problem is that the network filesystem unuses its cookie when its file
> is closed, and so we have nothing pinning the cachefiles file open and
> it will get closed automatically after a short time to avoid
> EMFILE/ENFILE problems.
>
> Reopening the cache file, however, is a problem if this is being done
> due to writeback triggered by exit(). Some filesystems will oops if
> we try to open a file in that context because they want to access
> current->fs or suchlike.
>
> To get around this, I added the following:
>
> (A) An inode flag, I_PINNING_FSCACHE_WB, to be set on a network
> filesystem inode to indicate that we have a usage count on the
> cookie caching that inode.
>
> (B) A flag in struct writeback_control, unpinned_fscache_wb, that is
> set when __writeback_single_inode() clears the last dirty page
> from i_pages - at which point it clears I_PINNING_FSCACHE_WB and
> sets this flag.
>
> This has to be done here so that clearing I_PINNING_FSCACHE_WB can
> be done atomically with the check of PAGECACHE_TAG_DIRTY that
> clears I_DIRTY_PAGES.
>
> (C) A function, fscache_set_page_dirty(), which if it is not set, sets
> I_PINNING_FSCACHE_WB and calls fscache_use_cookie() to pin the
> cache resources.
>
> (D) A function, fscache_unpin_writeback(), to be called by
> ->write_inode() to unuse the cookie.
>
> (E) A function, fscache_clear_inode_writeback(), to be called when the
> inode is evicted, before clear_inode() is called. This cleans up
> any lingering I_PINNING_FSCACHE_WB.
>
> The network filesystem can then use these tools to make sure that
> fscache_write_to_cache() can write locally modified data to the cache
> as well as to the server.
>
> For the future, I'm working on write helpers for netfs lib that should
> allow this facility to be removed by keeping track of the dirty
> regions separately - but that's incomplete at the moment and is also
> going to be affected by folios, one way or another, since it deals
> with pages.
>
>
> These patches can be found also on:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-rewrite
>
> David
>
>
> Changes
> =======
> ver #4:
> - Dropped a pair of patches to try and cope with multipage folios in
> afs_write_begin/end() - it should really be done in the caller[7].
> - Fixed the use of sizeof with memset in cifs.
> - Removed an extraneous kdoc param.
> - Added a patch to add a tracepoint for fscache_use/unuse_cookie().
> - In cifs, tcon->vol_create_time is __le64 so doesn't need cpu_to_le64().
> - Add an expanded version of a patch to use current_is_kswapd() instead of
> !gfpflags_allow_blocking()[8].
> - Removed a couple of debugging print statements.
>
> ver #3:
> - Fixed a race in the cookie state machine between LRU discard and
> relinquishment[4].
> - Fixed up the hashing to make it portable[5].
> - Fixed up some netfs coherency data to make it portable.
> - Fixed some missing NFS_FSCACHE=n fallback functions in nfs[6].
> - Added a patch to store volume coherency data in an xattr.
> - Added a check that the cookie is unhashed before being freed.
> - Fixed fscache to use remove_proc_subtree() to remove /proc/fs/fscache/.
>
> ver #2:
> - Fix an unused-var warning due to CONFIG_9P_FSCACHE=n.
> - Use gfpflags_allow_blocking() rather than using flag directly.
> - Fixed some error logging in a couple of cachefiles functions.
> - Fixed an error check in the fscache volume allocation.
> - Need to unmark an inode we've moved to the graveyard before unlocking.
> - Upgraded to -rc4 to allow for upstream changes to cifs.
> - Should only change to inval state if can get access to cache.
> - Don't hold n_accesses elevated whilst cache is bound to a cookie, but
> rather add a flag that prevents the state machine from being queued when
> n_accesses reaches 0.
> - Remove the unused cookie pointer field from the fscache_acquire
> tracepoint.
> - Added missing transition to LRU_DISCARDING state.
> - Added two ceph patches from Jeff Layton[2].
> - Remove NFS_INO_FSCACHE as it's no longer used.
> - In NFS, need to unuse a cookie on file-release, not inode-clear.
> - Filled in the NFS cache I/O routines, borrowing from the previously posted
> fallback I/O code[3].
>
>
> Link: https://lore.kernel.org/r/[email protected]/ [1]
> Link: https://lore.kernel.org/r/[email protected]/ [2]
> Link: https://lore.kernel.org/r/163189108292.2509237.12615909591150927232.stgit@warthog.procyon.org.uk/ [3]
> Link: https://lore.kernel.org/r/[email protected]/ [4]
> Link: https://lore.kernel.org/r/CAHk-=whtkzB446+hX0zdLsdcUJsJ=8_-0S1mE_R+YurThfUbLA@mail.gmail.com [5]
> Link: https://lore.kernel.org/r/61b90f3d.H1IkoeQfEsGNhvq9%[email protected]/ [6]
> Link: https://lore.kernel.org/r/CAHk-=wh2dr=NgVSVj0sw-gSuzhxhLRV5FymfPS146zGgF4kBjA@mail.gmail.com/ [7]
> Link: https://lore.kernel.org/r/[email protected]/ [8]
>
>
> References
> ==========
>
> These patches have been published for review before, firstly as part of a
> larger set:
>
> Link: https://lore.kernel.org/r/158861203563.340223.7585359869938129395.stgit@warthog.procyon.org.uk/
>
> Link: https://lore.kernel.org/r/159465766378.1376105.11619976251039287525.stgit@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/r/159465784033.1376674.18106463693989811037.stgit@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/r/159465821598.1377938.2046362270225008168.stgit@warthog.procyon.org.uk/
>
> Link: https://lore.kernel.org/r/160588455242.3465195.3214733858273019178.stgit@warthog.procyon.org.uk/
>
> Then as a cut-down set:
>
> Link: https://lore.kernel.org/r/161118128472.1232039.11746799833066425131.stgit@warthog.procyon.org.uk/ # v1
> Link: https://lore.kernel.org/r/161161025063.2537118.2009249444682241405.stgit@warthog.procyon.org.uk/ # v2
> Link: https://lore.kernel.org/r/161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk/ # v3
> Link: https://lore.kernel.org/r/161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk/ # v4
> Link: https://lore.kernel.org/r/161653784755.2770958.11820491619308713741.stgit@warthog.procyon.org.uk/ # v5
>
> I split out a set to just restructure the I/O, which got merged back in to
> this one:
>
> Link: https://lore.kernel.org/r/163363935000.1980952.15279841414072653108.stgit@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/r/163189104510.2509237.10805032055807259087.stgit@warthog.procyon.org.uk/ # v2
> Link: https://lore.kernel.org/r/163363935000.1980952.15279841414072653108.stgit@warthog.procyon.org.uk/ # v3
> Link: https://lore.kernel.org/r/163551653404.1877519.12363794970541005441.stgit@warthog.procyon.org.uk/ # v4
>
> ... and a larger set to do the conversion, also merged back into this one:
>
> Link: https://lore.kernel.org/r/163456861570.2614702.14754548462706508617.stgit@warthog.procyon.org.uk/ # v1
> Link: https://lore.kernel.org/r/163492911924.1038219.13107463173777870713.stgit@warthog.procyon.org.uk/ # v2
>
> Older versions of this one:
>
> Link: https://lore.kernel.org/r/163819575444.215744.318477214576928110.stgit@warthog.procyon.org.uk/ # v1
> Link: https://lore.kernel.org/r/163906878733.143852.5604115678965006622.stgit@warthog.procyon.org.uk/ # v2
> Link: https://lore.kernel.org/r/163967073889.1823006.12237147297060239168.stgit@warthog.procyon.org.uk/ # v3
>
> Proposals/information about the design have been published here:
>
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
>
> And requests for information:
>
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
>
> I've posted partial patches to try and help 9p and cifs along:
>
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
>
> ---
> Dave Wysochanski (1):
> nfs: Convert to new fscache volume/cookie API
>
> David Howells (65):
> fscache, cachefiles: Disable configuration
> cachefiles: Delete the cachefiles driver pending rewrite
> fscache: Remove the contents of the fscache driver, pending rewrite
> netfs: Display the netfs inode number in the netfs_read tracepoint
> netfs: Pass a flag to ->prepare_write() to say if there's no alloc'd space
> fscache: Introduce new driver
> fscache: Implement a hash function
> fscache: Implement cache registration
> fscache: Implement volume registration
> fscache: Implement cookie registration
> fscache: Implement cache-level access helpers
> fscache: Implement volume-level access helpers
> fscache: Implement cookie-level access helpers
> fscache: Implement functions add/remove a cache
> fscache: Provide and use cache methods to lookup/create/free a volume
> fscache: Add a function for a cache backend to note an I/O error
> fscache: Implement simple cookie state machine
> fscache: Implement cookie user counting and resource pinning
> fscache: Implement cookie invalidation
> fscache: Provide a means to begin an operation
> fscache: Count data storage objects in a cache
> fscache: Provide read/write stat counters for the cache
> fscache: Provide a function to let the netfs update its coherency data
> netfs: Pass more information on how to deal with a hole in the cache
> fscache: Implement raw I/O interface
> fscache: Implement higher-level write I/O interface
> vfs, fscache: Implement pinning of cache usage for writeback
> fscache: Provide a function to note the release of a page
> fscache: Provide a function to resize a cookie
> cachefiles: Introduce rewritten driver
> cachefiles: Define structs
> cachefiles: Add some error injection support
> cachefiles: Add a couple of tracepoints for logging errors
> cachefiles: Add cache error reporting macro
> cachefiles: Add security derivation
> cachefiles: Register a miscdev and parse commands over it
> cachefiles: Provide a function to check how much space there is
> vfs, cachefiles: Mark a backing file in use with an inode flag
> cachefiles: Implement a function to get/create a directory in the cache
> cachefiles: Implement cache registration and withdrawal
> cachefiles: Implement volume support
> cachefiles: Add tracepoints for calls to the VFS
> cachefiles: Implement object lifecycle funcs
> cachefiles: Implement key to filename encoding
> cachefiles: Implement metadata/coherency data storage in xattrs
> cachefiles: Mark a backing file in use with an inode flag
> cachefiles: Implement culling daemon commands
> cachefiles: Implement backing file wrangling
> cachefiles: Implement begin and end I/O operation
> cachefiles: Implement cookie resize for truncate
> cachefiles: Implement the I/O routines
> fscache, cachefiles: Store the volume coherency data
> cachefiles: Allow cachefiles to actually function
> fscache, cachefiles: Display stats of no-space events
> fscache, cachefiles: Display stat of culling events
> afs: Convert afs to use the new fscache API
> afs: Copy local writes to the cache when writing to the server
> afs: Skip truncation on the server of data we haven't written yet
> 9p: Use fscache indexing rewrite and reenable caching
> 9p: Copy local writes to the cache when writing to the server
> nfs: Implement cache I/O by accessing the cache directly
> cifs: Support fscache indexing rewrite (untested)
> fscache: Rewrite documentation
> fscache: Add a tracepoint for cookie use/unuse
> 9p, afs, ceph, cifs, nfs: Use current_is_kswapd() rather than gfpflags_allow_blocking()
>
> Jeff Layton (2):
> ceph: conversion to new fscache API
> ceph: add fscache writeback support
>
>
> .../filesystems/caching/backend-api.rst | 850 ++++------
> .../filesystems/caching/cachefiles.rst | 6 +-
> Documentation/filesystems/caching/fscache.rst | 525 ++----
> Documentation/filesystems/caching/index.rst | 4 +-
> .../filesystems/caching/netfs-api.rst | 1136 ++++---------
> Documentation/filesystems/caching/object.rst | 313 ----
> .../filesystems/caching/operations.rst | 210 ---
> Documentation/filesystems/netfs_library.rst | 16 +-
> fs/9p/Kconfig | 2 +-
> fs/9p/cache.c | 195 +--
> fs/9p/cache.h | 25 +-
> fs/9p/v9fs.c | 17 +-
> fs/9p/v9fs.h | 13 +-
> fs/9p/vfs_addr.c | 57 +-
> fs/9p/vfs_dir.c | 13 +
> fs/9p/vfs_file.c | 3 +-
> fs/9p/vfs_inode.c | 26 +-
> fs/9p/vfs_inode_dotl.c | 3 +-
> fs/9p/vfs_super.c | 3 +
> fs/afs/Kconfig | 2 +-
> fs/afs/Makefile | 3 -
> fs/afs/cache.c | 68 -
> fs/afs/cell.c | 12 -
> fs/afs/file.c | 38 +-
> fs/afs/inode.c | 101 +-
> fs/afs/internal.h | 37 +-
> fs/afs/main.c | 14 -
> fs/afs/super.c | 1 +
> fs/afs/volume.c | 29 +-
> fs/afs/write.c | 88 +-
> fs/cachefiles/Kconfig | 7 +
> fs/cachefiles/Makefile | 6 +-
> fs/cachefiles/bind.c | 278 ----
> fs/cachefiles/cache.c | 378 +++++
> fs/cachefiles/daemon.c | 180 +-
> fs/cachefiles/error_inject.c | 46 +
> fs/cachefiles/interface.c | 747 ++++-----
> fs/cachefiles/internal.h | 270 +--
> fs/cachefiles/io.c | 330 +++-
> fs/cachefiles/key.c | 201 ++-
> fs/cachefiles/main.c | 22 +-
> fs/cachefiles/namei.c | 1223 ++++++--------
> fs/cachefiles/rdwr.c | 972 -----------
> fs/cachefiles/security.c | 2 +-
> fs/cachefiles/volume.c | 139 ++
> fs/cachefiles/xattr.c | 421 ++---
> fs/ceph/Kconfig | 2 +-
> fs/ceph/addr.c | 102 +-
> fs/ceph/cache.c | 218 +--
> fs/ceph/cache.h | 97 +-
> fs/ceph/caps.c | 3 +-
> fs/ceph/file.c | 13 +-
> fs/ceph/inode.c | 22 +-
> fs/ceph/super.c | 10 +-
> fs/ceph/super.h | 3 +-
> fs/cifs/Kconfig | 2 +-
> fs/cifs/Makefile | 2 +-
> fs/cifs/cache.c | 105 --
> fs/cifs/cifsfs.c | 11 +-
> fs/cifs/cifsglob.h | 5 +-
> fs/cifs/connect.c | 12 -
> fs/cifs/file.c | 64 +-
> fs/cifs/fscache.c | 333 +---
> fs/cifs/fscache.h | 126 +-
> fs/cifs/inode.c | 36 +-
> fs/fs-writeback.c | 8 +
> fs/fscache/Makefile | 6 +-
> fs/fscache/cache.c | 618 +++----
> fs/fscache/cookie.c | 1448 +++++++++--------
> fs/fscache/fsdef.c | 98 --
> fs/fscache/internal.h | 317 +---
> fs/fscache/io.c | 376 ++++-
> fs/fscache/main.c | 147 +-
> fs/fscache/netfs.c | 74 -
> fs/fscache/object.c | 1125 -------------
> fs/fscache/operation.c | 633 -------
> fs/fscache/page.c | 1242 --------------
> fs/fscache/proc.c | 47 +-
> fs/fscache/stats.c | 293 +---
> fs/fscache/volume.c | 517 ++++++
> fs/namei.c | 3 +-
> fs/netfs/read_helper.c | 10 +-
> fs/nfs/Kconfig | 2 +-
> fs/nfs/Makefile | 2 +-
> fs/nfs/client.c | 4 -
> fs/nfs/direct.c | 2 +
> fs/nfs/file.c | 13 +-
> fs/nfs/fscache-index.c | 140 --
> fs/nfs/fscache.c | 490 ++----
> fs/nfs/fscache.h | 180 +-
> fs/nfs/inode.c | 11 +-
> fs/nfs/nfstrace.h | 1 -
> fs/nfs/read.c | 25 +-
> fs/nfs/super.c | 28 +-
> fs/nfs/write.c | 8 +-
> include/linux/fs.h | 4 +
> include/linux/fscache-cache.h | 614 ++-----
> include/linux/fscache.h | 1021 +++++-------
> include/linux/netfs.h | 15 +-
> include/linux/nfs_fs.h | 1 -
> include/linux/nfs_fs_sb.h | 9 +-
> include/linux/writeback.h | 1 +
> include/trace/events/cachefiles.h | 527 ++++--
> include/trace/events/fscache.h | 642 ++++----
> include/trace/events/netfs.h | 5 +-
> 105 files changed, 7396 insertions(+), 13509 deletions(-)
> delete mode 100644 Documentation/filesystems/caching/object.rst
> delete mode 100644 Documentation/filesystems/caching/operations.rst
> delete mode 100644 fs/afs/cache.c
> delete mode 100644 fs/cachefiles/bind.c
> create mode 100644 fs/cachefiles/cache.c
> create mode 100644 fs/cachefiles/error_inject.c
> delete mode 100644 fs/cachefiles/rdwr.c
> create mode 100644 fs/cachefiles/volume.c
> delete mode 100644 fs/cifs/cache.c
> delete mode 100644 fs/fscache/fsdef.c
> delete mode 100644 fs/fscache/netfs.c
> delete mode 100644 fs/fscache/object.c
> delete mode 100644 fs/fscache/operation.c
> delete mode 100644 fs/fscache/page.c
> create mode 100644 fs/fscache/volume.c
> delete mode 100644 fs/nfs/fscache-index.c
>
>

Tested this with NFS and fscache enabled and disabled looks very
stable. You can add
Tested-by: Dave Wysochanski <[email protected]>

Since this set was on top of 5.16.0-rc4, I compared runs of vanilla
5.16.0-rc4 and this set.
No new failures were seen, no oops or warns, or other stability
issues, and many runs I ran back to back without reboots.

Summary of tests
- NFS fscache unit tests: PASS
- xfstests vers=4.2,fsc,nofsc (Hammerspace pNFS flexfiles, RHEL 8u4): PASS
- xfstests vers=4.1,fsc,nofsc (Hammerspace pNFS flexfiles, NetApp
Ontap 9.x pNFS files): PASS
- xfstests vers=4.0,fsc,nofsc (NetApp Ontap 9.x pNFS files, RHEL 8u4): PASS
- xfstests vers=3 (RHEL 8u4): PASS


2022-01-06 15:55:19

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v4 28/68] fscache: Provide a function to note the release of a page

On Wed, 2021-12-22 at 23:20 +0000, David Howells wrote:
> Provide a function to be called from a network filesystem's releasepage
> method to indicate that a page has been released that might have been a
> reflection of data upon the server - and now that data must be reloaded
> from the server or the cache.
>
> This is used to end an optimisation for empty files, in particular files
> that have just been created locally, whereby we know there cannot yet be
> any data that we would need to read from the server or the cache.
>
> Signed-off-by: David Howells <[email protected]>
> cc: [email protected]
> Link: https://lore.kernel.org/r/163819617128.215744.4725572296135656508.stgit@warthog.procyon.org.uk/ # v1
> Link: https://lore.kernel.org/r/163906920354.143852.7511819614661372008.stgit@warthog.procyon.org.uk/ # v2
> Link: https://lore.kernel.org/r/163967128061.1823006.611781655060034988.stgit@warthog.procyon.org.uk/ # v3
> ---
>
> include/linux/fscache.h | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/include/linux/fscache.h b/include/linux/fscache.h
> index 18e725671594..28ce258c1f87 100644
> --- a/include/linux/fscache.h
> +++ b/include/linux/fscache.h
> @@ -607,4 +607,20 @@ static inline void fscache_clear_inode_writeback(struct fscache_cookie *cookie,
> }
> }
>
> +/**
> + * fscache_note_page_release - Note that a netfs page got released
> + * @cookie: The cookie corresponding to the file
> + *
> + * Note that a page that has been copied to the cache has been released. This
> + * means that future reads will need to look in the cache to see if it's there.
> + */
> +static inline
> +void fscache_note_page_release(struct fscache_cookie *cookie)
> +{
> + if (cookie &&
> + test_bit(FSCACHE_COOKIE_HAVE_DATA, &cookie->flags) &&
> + test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags))
> + clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
> +}
> +
> #endif /* _LINUX_FSCACHE_H */
>
>

Is this logic correct?

FSCACHE_COOKIE_HAVE_DATA gets set in cachefiles_write_complete, but will
that ever be called on a cookie that has no data? Will we ever call
cachefiles_write at all when there is no data to be written?

--
Jeff Layton <[email protected]>

2022-01-06 16:27:08

by David Howells

[permalink] [raw]
Subject: Re: [PATCH v4 28/68] fscache: Provide a function to note the release of a page

Jeff Layton <[email protected]> wrote:

> > +/**
> > + * fscache_note_page_release - Note that a netfs page got released
> > + * @cookie: The cookie corresponding to the file
> > + *
> > + * Note that a page that has been copied to the cache has been released. This
> > + * means that future reads will need to look in the cache to see if it's there.
> > + */
> > +static inline
> > +void fscache_note_page_release(struct fscache_cookie *cookie)
> > +{
> > + if (cookie &&
> > + test_bit(FSCACHE_COOKIE_HAVE_DATA, &cookie->flags) &&
> > + test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags))
> > + clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
> > +}
> > +
> > #endif /* _LINUX_FSCACHE_H */
> >
> >
>
> Is this logic correct?
>
> FSCACHE_COOKIE_HAVE_DATA gets set in cachefiles_write_complete, but will
> that ever be called on a cookie that has no data? Will we ever call
> cachefiles_write at all when there is no data to be written?

FSCACHE_COOKIE_NO_DATA_TO_READ is set if we have no data in the cache yet
(ie. the backing file lookup was negative, the file is 0 length or the cookie
got invalidated). It means that we have no data in the cache, not that the
file is necessarily empty on the server.

FSCACHE_COOKIE_HAVE_DATA is set once we've stored data in the backing file.
From that point on, we have data we *could* read - however, it's covered by
pages in the netfs pagecache until at such time one of those covering pages is
released.

So if we've written data to the cache (HAVE_DATA) and there wasn't any data in
the cache when we started (NO_DATA_TO_READ), it may no longer be true that we
can skip reading from the cache.

Read skipping is done by cachefiles_prepare_read().

Note that I'm not doing tracking on a per-page basis, but only on a per-file
basis.

David


2022-01-06 17:04:19

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v4 38/68] vfs, cachefiles: Mark a backing file in use with an inode flag

On Wed, 2021-12-22 at 23:23 +0000, David Howells wrote:
> Use an inode flag, S_KERNEL_FILE, to mark that a backing file is in use by
> the kernel to prevent cachefiles or other kernel services from interfering
> with that file.
>
> Alter rmdir to reject attempts to remove a directory marked with this flag.
> This is used by cachefiles to prevent cachefilesd from removing them.
>
> Using S_SWAPFILE instead isn't really viable as that has other effects in
> the I/O paths.
>
> Changes
> =======
> ver #3:
> - Check for the object pointer being NULL in the tracepoints rather than
> the caller.
>
> Signed-off-by: David Howells <[email protected]>
> cc: [email protected]
> Link: https://lore.kernel.org/r/163819630256.215744.4815885535039369574.stgit@warthog.procyon.org.uk/ # v1
> Link: https://lore.kernel.org/r/163906931596.143852.8642051223094013028.stgit@warthog.procyon.org.uk/ # v2
> Link: https://lore.kernel.org/r/163967141000.1823006.12920680657559677789.stgit@warthog.procyon.org.uk/ # v3
> ---
>
> fs/cachefiles/Makefile | 1 +
> fs/cachefiles/namei.c | 43 +++++++++++++++++++++++++++++++++++++
> fs/namei.c | 3 ++-
> include/linux/fs.h | 1 +
> include/trace/events/cachefiles.h | 42 ++++++++++++++++++++++++++++++++++++
> 5 files changed, 89 insertions(+), 1 deletion(-)
> create mode 100644 fs/cachefiles/namei.c
>
> diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
> index 463e3d608b75..e0b092ca077f 100644
> --- a/fs/cachefiles/Makefile
> +++ b/fs/cachefiles/Makefile
> @@ -7,6 +7,7 @@ cachefiles-y := \
> cache.o \
> daemon.o \
> main.o \
> + namei.o \
> security.o
>
> cachefiles-$(CONFIG_CACHEFILES_ERROR_INJECTION) += error_inject.o
> diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> new file mode 100644
> index 000000000000..913f83f1c900
> --- /dev/null
> +++ b/fs/cachefiles/namei.c
> @@ -0,0 +1,43 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/* CacheFiles path walking and related routines
> + *
> + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells ([email protected])
> + */
> +
> +#include <linux/fs.h>
> +#include "internal.h"
> +
> +/*
> + * Mark the backing file as being a cache file if it's not already in use. The
> + * mark tells the culling request command that it's not allowed to cull the
> + * file or directory. The caller must hold the inode lock.
> + */
> +static bool __cachefiles_mark_inode_in_use(struct cachefiles_object *object,
> + struct dentry *dentry)
> +{
> + struct inode *inode = d_backing_inode(dentry);
> + bool can_use = false;
> +
> + if (!(inode->i_flags & S_KERNEL_FILE)) {

nit: most of the other S_* flags have a corresponding IS_* macro. Should
this be:

IS_KERNEL_FILE(inode)

?

> + inode->i_flags |= S_KERNEL_FILE;
> + trace_cachefiles_mark_active(object, inode);
> + can_use = true;
> + } else {
> + pr_notice("cachefiles: Inode already in use: %pd\n", dentry);
> + }
> +
> + return can_use;
> +}
> +
> +/*
> + * Unmark a backing inode. The caller must hold the inode lock.
> + */
> +static void __cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
> + struct dentry *dentry)
> +{
> + struct inode *inode = d_backing_inode(dentry);
> +
> + inode->i_flags &= ~S_KERNEL_FILE;
> + trace_cachefiles_mark_inactive(object, inode);
> +}
> diff --git a/fs/namei.c b/fs/namei.c
> index 1f9d2187c765..d81f04f8d818 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3958,7 +3958,8 @@ int vfs_rmdir(struct user_namespace *mnt_userns, struct inode *dir,
> inode_lock(dentry->d_inode);
>
> error = -EBUSY;
> - if (is_local_mountpoint(dentry))
> + if (is_local_mountpoint(dentry) ||
> + (dentry->d_inode->i_flags & S_KERNEL_FILE))
> goto out;
>
> error = security_inode_rmdir(dir, dentry);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 2c0b8e77d9ab..bcf1ca430139 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2249,6 +2249,7 @@ struct super_operations {
> #define S_ENCRYPTED (1 << 14) /* Encrypted file (using fs/crypto/) */
> #define S_CASEFOLD (1 << 15) /* Casefolded file */
> #define S_VERITY (1 << 16) /* Verity file (using fs/verity/) */
> +#define S_KERNEL_FILE (1 << 17) /* File is in use by the kernel (eg. fs/cachefiles) */
>
> /*
> * Note that nosuid etc flags are inode-specific: setting some file-system
> diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
> index 9bd5a8a60801..6331cd29880d 100644
> --- a/include/trace/events/cachefiles.h
> +++ b/include/trace/events/cachefiles.h
> @@ -83,6 +83,48 @@ cachefiles_error_traces;
> #define E_(a, b) { a, b }
>
>
> +TRACE_EVENT(cachefiles_mark_active,
> + TP_PROTO(struct cachefiles_object *obj,
> + struct inode *inode),
> +
> + TP_ARGS(obj, inode),
> +
> + /* Note that obj may be NULL */
> + TP_STRUCT__entry(
> + __field(unsigned int, obj )
> + __field(ino_t, inode )
> + ),
> +
> + TP_fast_assign(
> + __entry->obj = obj ? obj->debug_id : 0;
> + __entry->inode = inode->i_ino;
> + ),
> +
> + TP_printk("o=%08x i=%lx",
> + __entry->obj, __entry->inode)
> + );
> +
> +TRACE_EVENT(cachefiles_mark_inactive,
> + TP_PROTO(struct cachefiles_object *obj,
> + struct inode *inode),
> +
> + TP_ARGS(obj, inode),
> +
> + /* Note that obj may be NULL */
> + TP_STRUCT__entry(
> + __field(unsigned int, obj )
> + __field(ino_t, inode )
> + ),
> +
> + TP_fast_assign(
> + __entry->obj = obj ? obj->debug_id : 0;
> + __entry->inode = inode->i_ino;
> + ),
> +
> + TP_printk("o=%08x i=%lx",
> + __entry->obj, __entry->inode)
> + );
> +
> TRACE_EVENT(cachefiles_vfs_error,
> TP_PROTO(struct cachefiles_object *obj, struct inode *backer,
> int error, enum cachefiles_error_trace where),
>
>

--
Jeff Layton <[email protected]>

2022-01-06 17:17:33

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v4 40/68] cachefiles: Implement cache registration and withdrawal

On Wed, 2021-12-22 at 23:23 +0000, David Howells wrote:
> Do the following:
>
> (1) Fill out cachefiles_daemon_add_cache() so that it sets up the cache
> directories and registers the cache with cachefiles.
>
> (2) Add a function to do the top-level part of cache withdrawal and
> unregistration.
>
> (3) Add a function to sync a cache.
>
> Signed-off-by: David Howells <[email protected]>
> cc: [email protected]
> Link: https://lore.kernel.org/r/163819633175.215744.10857127598041268340.stgit@warthog.procyon.org.uk/ # v1
> Link: https://lore.kernel.org/r/163906935445.143852.15545194974036410029.stgit@warthog.procyon.org.uk/ # v2
> Link: https://lore.kernel.org/r/163967142904.1823006.244055483596047072.stgit@warthog.procyon.org.uk/ # v3
> ---
>
> fs/cachefiles/Makefile | 1
> fs/cachefiles/cache.c | 207 +++++++++++++++++++++++++++++++++++++++++++++
> fs/cachefiles/daemon.c | 8 +-
> fs/cachefiles/interface.c | 18 ++++
> fs/cachefiles/internal.h | 9 ++
> 5 files changed, 240 insertions(+), 3 deletions(-)
> create mode 100644 fs/cachefiles/interface.c
>
> diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
> index e0b092ca077f..92af5daee8ce 100644
> --- a/fs/cachefiles/Makefile
> +++ b/fs/cachefiles/Makefile
> @@ -6,6 +6,7 @@
> cachefiles-y := \
> cache.o \
> daemon.o \
> + interface.o \
> main.o \
> namei.o \
> security.o
> diff --git a/fs/cachefiles/cache.c b/fs/cachefiles/cache.c
> index 73636f89eefa..0462e7af87fb 100644
> --- a/fs/cachefiles/cache.c
> +++ b/fs/cachefiles/cache.c
> @@ -10,6 +10,166 @@
> #include <linux/namei.h>
> #include "internal.h"
>
> +/*
> + * Bring a cache online.
> + */
> +int cachefiles_add_cache(struct cachefiles_cache *cache)
> +{
> + struct fscache_cache *cache_cookie;
> + struct path path;
> + struct kstatfs stats;
> + struct dentry *graveyard, *cachedir, *root;
> + const struct cred *saved_cred;
> + int ret;
> +
> + _enter("");
> +
> + cache_cookie = fscache_acquire_cache(cache->tag);
> + if (IS_ERR(cache_cookie))
> + return PTR_ERR(cache_cookie);
> +
> + /* we want to work under the module's security ID */
> + ret = cachefiles_get_security_ID(cache);
> + if (ret < 0)
> + goto error_getsec;
> +
> + cachefiles_begin_secure(cache, &saved_cred);
> +
> + /* look up the directory at the root of the cache */
> + ret = kern_path(cache->rootdirname, LOOKUP_DIRECTORY, &path);
> + if (ret < 0)
> + goto error_open_root;
> +
> + cache->mnt = path.mnt;
> + root = path.dentry;
> +
> + ret = -EINVAL;
> + if (mnt_user_ns(path.mnt) != &init_user_ns) {
> + pr_warn("File cache on idmapped mounts not supported");
> + goto error_unsupported;
> + }
> +
> + /* check parameters */
> + ret = -EOPNOTSUPP;
> + if (d_is_negative(root) ||
> + !d_backing_inode(root)->i_op->lookup ||
> + !d_backing_inode(root)->i_op->mkdir ||
> + !(d_backing_inode(root)->i_opflags & IOP_XATTR) ||
> + !root->d_sb->s_op->statfs ||
> + !root->d_sb->s_op->sync_fs ||
> + root->d_sb->s_blocksize > PAGE_SIZE)
> + goto error_unsupported;
> +

That's quite a collection of tests.

Most are obvious, but some comments explaining the need for others would
not be a bad thing. In particular, why is s_blocksize > PAGE_SIZE
unsupported?

Also, should you vet whether the fs supports i_op->tmpfile here ?

> + ret = -EROFS;
> + if (sb_rdonly(root->d_sb))
> + goto error_unsupported;
> +
> + /* determine the security of the on-disk cache as this governs
> + * security ID of files we create */
> + ret = cachefiles_determine_cache_security(cache, root, &saved_cred);
> + if (ret < 0)
> + goto error_unsupported;
> +
> + /* get the cache size and blocksize */
> + ret = vfs_statfs(&path, &stats);
> + if (ret < 0)
> + goto error_unsupported;
> +
> + ret = -ERANGE;
> + if (stats.f_bsize <= 0)
> + goto error_unsupported;
> +
> + ret = -EOPNOTSUPP;
> + if (stats.f_bsize > PAGE_SIZE)
> + goto error_unsupported;
> +
> + cache->bsize = stats.f_bsize;
> + cache->bshift = 0;
> + if (stats.f_bsize < PAGE_SIZE)
> + cache->bshift = PAGE_SHIFT - ilog2(stats.f_bsize);
> +
> + _debug("blksize %u (shift %u)",
> + cache->bsize, cache->bshift);
> +
> + _debug("size %llu, avail %llu",
> + (unsigned long long) stats.f_blocks,
> + (unsigned long long) stats.f_bavail);
> +
> + /* set up caching limits */
> + do_div(stats.f_files, 100);
> + cache->fstop = stats.f_files * cache->fstop_percent;
> + cache->fcull = stats.f_files * cache->fcull_percent;
> + cache->frun = stats.f_files * cache->frun_percent;
> +
> + _debug("limits {%llu,%llu,%llu} files",
> + (unsigned long long) cache->frun,
> + (unsigned long long) cache->fcull,
> + (unsigned long long) cache->fstop);
> +
> + stats.f_blocks >>= cache->bshift;
> + do_div(stats.f_blocks, 100);
> + cache->bstop = stats.f_blocks * cache->bstop_percent;
> + cache->bcull = stats.f_blocks * cache->bcull_percent;
> + cache->brun = stats.f_blocks * cache->brun_percent;
> +
> + _debug("limits {%llu,%llu,%llu} blocks",
> + (unsigned long long) cache->brun,
> + (unsigned long long) cache->bcull,
> + (unsigned long long) cache->bstop);
> +
> + /* get the cache directory and check its type */
> + cachedir = cachefiles_get_directory(cache, root, "cache", NULL);
> + if (IS_ERR(cachedir)) {
> + ret = PTR_ERR(cachedir);
> + goto error_unsupported;
> + }
> +
> + cache->store = cachedir;
> +
> + /* get the graveyard directory */
> + graveyard = cachefiles_get_directory(cache, root, "graveyard", NULL);
> + if (IS_ERR(graveyard)) {
> + ret = PTR_ERR(graveyard);
> + goto error_unsupported;
> + }
> +
> + cache->graveyard = graveyard;
> + cache->cache = cache_cookie;
> +
> + ret = fscache_add_cache(cache_cookie, &cachefiles_cache_ops, cache);
> + if (ret < 0)
> + goto error_add_cache;
> +
> + /* done */
> + set_bit(CACHEFILES_READY, &cache->flags);
> + dput(root);
> +
> + pr_info("File cache on %s registered\n", cache_cookie->name);
> +
> + /* check how much space the cache has */
> + cachefiles_has_space(cache, 0, 0);
> + cachefiles_end_secure(cache, saved_cred);
> + _leave(" = 0 [%px]", cache->cache);
> + return 0;
> +
> +error_add_cache:
> + cachefiles_put_directory(cache->graveyard);
> + cache->graveyard = NULL;
> +error_unsupported:
> + cachefiles_put_directory(cache->store);
> + cache->store = NULL;
> + mntput(cache->mnt);
> + cache->mnt = NULL;
> + dput(root);
> +error_open_root:
> + cachefiles_end_secure(cache, saved_cred);
> +error_getsec:
> + fscache_relinquish_cache(cache_cookie);
> + cache->cache = NULL;
> + pr_err("Failed to register: %d\n", ret);
> + return ret;
> +}
> +
> /*
> * See if we have space for a number of pages and/or a number of files in the
> * cache
> @@ -101,3 +261,50 @@ int cachefiles_has_space(struct cachefiles_cache *cache,
> _leave(" = %d", ret);
> return ret;
> }
> +
> +/*
> + * Sync a cache to backing disk.
> + */
> +static void cachefiles_sync_cache(struct cachefiles_cache *cache)
> +{
> + const struct cred *saved_cred;
> + int ret;
> +
> + _enter("%s", cache->cache->name);
> +
> + /* make sure all pages pinned by operations on behalf of the netfs are
> + * written to disc */
> + cachefiles_begin_secure(cache, &saved_cred);
> + down_read(&cache->mnt->mnt_sb->s_umount);
> + ret = sync_filesystem(cache->mnt->mnt_sb);
> + up_read(&cache->mnt->mnt_sb->s_umount);
> + cachefiles_end_secure(cache, saved_cred);
> +
> + if (ret == -EIO)
> + cachefiles_io_error(cache,
> + "Attempt to sync backing fs superblock returned error %d",
> + ret);
> +}
> +
> +/*
> + * Withdraw cache objects.
> + */
> +void cachefiles_withdraw_cache(struct cachefiles_cache *cache)
> +{
> + struct fscache_cache *fscache = cache->cache;
> +
> + pr_info("File cache on %s unregistering\n", fscache->name);
> +
> + fscache_withdraw_cache(fscache);
> +
> + /* we now have to destroy all the active objects pertaining to this
> + * cache - which we do by passing them off to thread pool to be
> + * disposed of */
> + // PLACEHOLDER: Withdraw objects
> + fscache_wait_for_objects(fscache);
> +
> + // PLACEHOLDER: Withdraw volume
> + cachefiles_sync_cache(cache);
> + cache->cache = NULL;
> + fscache_relinquish_cache(fscache);
> +}
> diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
> index 7d4691614cec..a449ee661987 100644
> --- a/fs/cachefiles/daemon.c
> +++ b/fs/cachefiles/daemon.c
> @@ -702,6 +702,7 @@ static int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)
>
> pr_warn("Cache is disabled for development\n");
> return -ENOANO; // Don't allow the cache to operate yet
> + //return cachefiles_add_cache(cache);
> }
>
> /*
> @@ -711,10 +712,11 @@ static void cachefiles_daemon_unbind(struct cachefiles_cache *cache)
> {
> _enter("");
>
> - if (test_bit(CACHEFILES_READY, &cache->flags)) {
> - // PLACEHOLDER: Withdraw cache
> - }
> + if (test_bit(CACHEFILES_READY, &cache->flags))
> + cachefiles_withdraw_cache(cache);
>
> + cachefiles_put_directory(cache->graveyard);
> + cachefiles_put_directory(cache->store);
> mntput(cache->mnt);
>
> kfree(cache->rootdirname);
> diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
> new file mode 100644
> index 000000000000..564ea8fa6641
> --- /dev/null
> +++ b/fs/cachefiles/interface.c
> @@ -0,0 +1,18 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/* FS-Cache interface to CacheFiles
> + *
> + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells ([email protected])
> + */
> +
> +#include <linux/slab.h>
> +#include <linux/mount.h>
> +#include <linux/xattr.h>
> +#include <linux/file.h>
> +#include <linux/falloc.h>
> +#include <trace/events/fscache.h>
> +#include "internal.h"
> +
> +const struct fscache_cache_ops cachefiles_cache_ops = {
> + .name = "cachefiles",
> +};
> diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
> index 48768a3ab105..77e874c2bbe7 100644
> --- a/fs/cachefiles/internal.h
> +++ b/fs/cachefiles/internal.h
> @@ -32,6 +32,8 @@ struct cachefiles_object {
> struct cachefiles_cache {
> struct fscache_cache *cache; /* Cache cookie */
> struct vfsmount *mnt; /* mountpoint holding the cache */
> + struct dentry *store; /* Directory into which live objects go */
> + struct dentry *graveyard; /* directory into which dead objects go */
> struct file *cachefilesd; /* manager daemon handle */
> const struct cred *cache_cred; /* security override for accessing cache */
> struct mutex daemon_mutex; /* command serialisation mutex */
> @@ -78,8 +80,10 @@ static inline void cachefiles_state_changed(struct cachefiles_cache *cache)
> /*
> * cache.c
> */
> +extern int cachefiles_add_cache(struct cachefiles_cache *cache);
> extern int cachefiles_has_space(struct cachefiles_cache *cache,
> unsigned fnr, unsigned bnr);
> +extern void cachefiles_withdraw_cache(struct cachefiles_cache *cache);
>
> /*
> * daemon.c
> @@ -125,6 +129,11 @@ static inline int cachefiles_inject_remove_error(void)
> return cachefiles_error_injection_state & 2 ? -EIO : 0;
> }
>
> +/*
> + * interface.c
> + */
> +extern const struct fscache_cache_ops cachefiles_cache_ops;
> +
> /*
> * namei.c
> */
>
>

--
Jeff Layton <[email protected]>

2022-01-06 17:43:44

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v4 44/68] cachefiles: Implement key to filename encoding

On Wed, 2021-12-22 at 23:24 +0000, David Howells wrote:
> Implement a function to encode a binary cookie key as something that can be
> used as a filename. Four options are considered:
>
> (1) All printable chars with no '/' characters. Prepend a 'D' to indicate
> the encoding but otherwise use as-is.
>
> (2) Appears to be an array of __be32. Encode as 'S' plus a list of
> hex-encoded 32-bit ints separated by commas. If a number is 0, it is
> rendered as "" instead of "0".
>
> (3) Appears to be an array of __le32. Encoded as (2) but with a 'T'
> encoding prefix.
>
> (4) Encoded as base64 with an 'E' prefix plus a second char indicating how
> much padding is involved. A non-standard base64 encoding is used
> because '/' cannot be used in the encoded form.
>
> If (1) is not possible, whichever of (2), (3) or (4) produces the shortest
> string is selected (hex-encoding a number may be less dense than base64
> encoding it).
>

Since most cookies are fairly small, is there any real benefit to
optimizing for length here? How much inflation are we talking about?

> Note that the prefix characters have to be selected from the set [DEIJST@]
> lest cachefilesd remove the files because it recognise the name.
>
> Changes
> =======
> ver #2:
> - Fix a short allocation that didn't allow for a string terminator[1]
>
> Signed-off-by: David Howells <[email protected]>
> cc: [email protected]
> Link: https://lore.kernel.org/r/[email protected]/ [1]
> Link: https://lore.kernel.org/r/163819640393.215744.15212364106412961104.stgit@warthog.procyon.org.uk/ # v1
> Link: https://lore.kernel.org/r/163906940529.143852.17352132319136117053.stgit@warthog.procyon.org.uk/ # v2
> Link: https://lore.kernel.org/r/163967149827.1823006.6088580775428487961.stgit@warthog.procyon.org.uk/ # v3
> ---
>
> fs/cachefiles/Makefile | 1
> fs/cachefiles/internal.h | 5 ++
> fs/cachefiles/key.c | 138 ++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 144 insertions(+)
> create mode 100644 fs/cachefiles/key.c
>
> diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
> index d67210ece9cd..6f025940a65c 100644
> --- a/fs/cachefiles/Makefile
> +++ b/fs/cachefiles/Makefile
> @@ -7,6 +7,7 @@ cachefiles-y := \
> cache.o \
> daemon.o \
> interface.o \
> + key.o \
> main.o \
> namei.o \
> security.o \
> diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
> index 8763ee4a0df2..dbc37f5d4714 100644
> --- a/fs/cachefiles/internal.h
> +++ b/fs/cachefiles/internal.h
> @@ -173,6 +173,11 @@ extern struct cachefiles_object *cachefiles_grab_object(struct cachefiles_object
> extern void cachefiles_put_object(struct cachefiles_object *object,
> enum cachefiles_obj_ref_trace why);
>
> +/*
> + * key.c
> + */
> +extern bool cachefiles_cook_key(struct cachefiles_object *object);
> +
> /*
> * main.c
> */
> diff --git a/fs/cachefiles/key.c b/fs/cachefiles/key.c
> new file mode 100644
> index 000000000000..bf935e25bdbe
> --- /dev/null
> +++ b/fs/cachefiles/key.c
> @@ -0,0 +1,138 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/* Key to pathname encoder
> + *
> + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells ([email protected])
> + */
> +
> +#include <linux/slab.h>
> +#include "internal.h"
> +
> +static const char cachefiles_charmap[64] =
> + "0123456789" /* 0 - 9 */
> + "abcdefghijklmnopqrstuvwxyz" /* 10 - 35 */
> + "ABCDEFGHIJKLMNOPQRSTUVWXYZ" /* 36 - 61 */
> + "_-" /* 62 - 63 */
> + ;
> +
> +static const char cachefiles_filecharmap[256] = {
> + /* we skip space and tab and control chars */
> + [33 ... 46] = 1, /* '!' -> '.' */
> + /* we skip '/' as it's significant to pathwalk */
> + [48 ... 127] = 1, /* '0' -> '~' */
> +};
> +
> +static inline unsigned int how_many_hex_digits(unsigned int x)
> +{
> + return x ? round_up(ilog2(x) + 1, 4) / 4 : 0;
> +}
> +
> +/*
> + * turn the raw key into something cooked
> + * - the key may be up to NAME_MAX in length (including the length word)
> + * - "base64" encode the strange keys, mapping 3 bytes of raw to four of
> + * cooked
> + * - need to cut the cooked key into 252 char lengths (189 raw bytes)
> + */
> +bool cachefiles_cook_key(struct cachefiles_object *object)
> +{
> + const u8 *key = fscache_get_key(object->cookie), *kend;
> + unsigned char ch;
> + unsigned int acc, i, n, nle, nbe, keylen = object->cookie->key_len;
> + unsigned int b64len, len, print, pad;
> + char *name, sep;
> +
> + _enter(",%u,%*phN", keylen, keylen, key);
> +
> + BUG_ON(keylen > NAME_MAX - 3);
> +
> + print = 1;
> + for (i = 0; i < keylen; i++) {
> + ch = key[i];
> + print &= cachefiles_filecharmap[ch];
> + }
> +
> + /* If the path is usable ASCII, then we render it directly */
> + if (print) {
> + len = 1 + keylen;
> + name = kmalloc(len + 1, GFP_KERNEL);
> + if (!name)
> + return false;
> +
> + name[0] = 'D'; /* Data object type, string encoding */
> + memcpy(name + 1, key, keylen);
> + goto success;
> + }
> +
> + /* See if it makes sense to encode it as "hex,hex,hex" for each 32-bit
> + * chunk. We rely on the key having been padded out to a whole number
> + * of 32-bit words.
> + */
> + n = round_up(keylen, 4);
> + nbe = nle = 0;
> + for (i = 0; i < n; i += 4) {
> + u32 be = be32_to_cpu(*(__be32 *)(key + i));
> + u32 le = le32_to_cpu(*(__le32 *)(key + i));
> +
> + nbe += 1 + how_many_hex_digits(be);
> + nle += 1 + how_many_hex_digits(le);
> + }
> +
> + b64len = DIV_ROUND_UP(keylen, 3);
> + pad = b64len * 3 - keylen;
> + b64len = 2 + b64len * 4; /* Length if we base64-encode it */
> + _debug("len=%u nbe=%u nle=%u b64=%u", keylen, nbe, nle, b64len);
> + if (nbe < b64len || nle < b64len) {
> + unsigned int nlen = min(nbe, nle) + 1;
> + name = kmalloc(nlen, GFP_KERNEL);
> + if (!name)
> + return false;
> + sep = (nbe <= nle) ? 'S' : 'T'; /* Encoding indicator */
> + len = 0;
> + for (i = 0; i < n; i += 4) {
> + u32 x;
> + if (nbe <= nle)
> + x = be32_to_cpu(*(__be32 *)(key + i));
> + else
> + x = le32_to_cpu(*(__le32 *)(key + i));
> + name[len++] = sep;
> + if (x != 0)
> + len += snprintf(name + len, nlen - len, "%x", x);
> + sep = ',';
> + }
> + goto success;
> + }
> +
> + /* We need to base64-encode it */
> + name = kmalloc(b64len + 1, GFP_KERNEL);
> + if (!name)
> + return false;
> +
> + name[0] = 'E';
> + name[1] = '0' + pad;
> + len = 2;
> + kend = key + keylen;
> + do {
> + acc = *key++;
> + if (key < kend) {
> + acc |= *key++ << 8;
> + if (key < kend)
> + acc |= *key++ << 16;
> + }
> +
> + name[len++] = cachefiles_charmap[acc & 63];
> + acc >>= 6;
> + name[len++] = cachefiles_charmap[acc & 63];
> + acc >>= 6;
> + name[len++] = cachefiles_charmap[acc & 63];
> + acc >>= 6;
> + name[len++] = cachefiles_charmap[acc & 63];
> + } while (key < kend);

It might be good to eventually consolidate this code with the base64
scheme that fscrypt uses. Are they compatible? If so, then that can be
done in a later merge.

> +
> +success:
> + name[len] = 0;
> + object->d_name = name;
> + object->d_name_len = len;
> + _leave(" = %s", object->d_name);
> + return true;
> +}
>
>

--
Jeff Layton <[email protected]>

2022-01-06 17:44:16

by David Howells

[permalink] [raw]
Subject: Re: [PATCH v4 40/68] cachefiles: Implement cache registration and withdrawal

Jeff Layton <[email protected]> wrote:

> > + /* check parameters */
> > + ret = -EOPNOTSUPP;
> > + if (d_is_negative(root) ||
> > + !d_backing_inode(root)->i_op->lookup ||
> > + !d_backing_inode(root)->i_op->mkdir ||
> > + !(d_backing_inode(root)->i_opflags & IOP_XATTR) ||
> > + !root->d_sb->s_op->statfs ||
> > + !root->d_sb->s_op->sync_fs ||
> > + root->d_sb->s_blocksize > PAGE_SIZE)
> > + goto error_unsupported;
> > +
>
> That's quite a collection of tests.
>
> Most are obvious, but some comments explaining the need for others would
> not be a bad thing. In particular, why is s_blocksize > PAGE_SIZE
> unsupported?

It can't do page-sized DIO requests to a filesystem with a block size larger
than a page. In the future I can work around that in conjunction with
netfslib by expanding read and write sizes.

> Also, should you vet whether the fs supports i_op->tmpfile here ?

That's a good idea.

David


2022-01-06 18:04:30

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v4 46/68] cachefiles: Mark a backing file in use with an inode flag

On Wed, 2021-12-22 at 23:25 +0000, David Howells wrote:
> Use an inode flag, S_KERNEL_FILE, to mark that a backing file is in use by
> the kernel to prevent cachefiles or other kernel services from interfering
> with that file.
>
> Using S_SWAPFILE instead isn't really viable as that has other effects in
> the I/O paths.
>
> Signed-off-by: David Howells <[email protected]>
> cc: [email protected]
> Link: https://lore.kernel.org/r/163819642273.215744.6414248677118690672.stgit@warthog.procyon.org.uk/ # v1
> Link: https://lore.kernel.org/r/163906943215.143852.16972351425323967014.stgit@warthog.procyon.org.uk/ # v2
> Link: https://lore.kernel.org/r/163967154118.1823006.13227551961786743991.stgit@warthog.procyon.org.uk/ # v3
> ---
>
> fs/cachefiles/internal.h | 2 ++
> fs/cachefiles/namei.c | 35 +++++++++++++++++++++++++++++++++++
> 2 files changed, 37 insertions(+)
>
> diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
> index 01071e7a7c02..7c67a70a3dff 100644
> --- a/fs/cachefiles/internal.h
> +++ b/fs/cachefiles/internal.h
> @@ -187,6 +187,8 @@ extern struct kmem_cache *cachefiles_object_jar;
> /*
> * namei.c
> */
> +extern void cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
> + struct file *file);
> extern struct dentry *cachefiles_get_directory(struct cachefiles_cache *cache,
> struct dentry *dir,
> const char *name,
> diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> index 11a33209ab5f..db60a671c3fc 100644
> --- a/fs/cachefiles/namei.c
> +++ b/fs/cachefiles/namei.c
> @@ -31,6 +31,18 @@ static bool __cachefiles_mark_inode_in_use(struct cachefiles_object *object,
> return can_use;
> }
>
> +static bool cachefiles_mark_inode_in_use(struct cachefiles_object *object,
> + struct dentry *dentry)
> +{
> + struct inode *inode = d_backing_inode(dentry);
> + bool can_use;
> +
> + inode_lock(inode);
> + can_use = __cachefiles_mark_inode_in_use(object, dentry);
> + inode_unlock(inode);
> + return can_use;
> +}
> +
> /*
> * Unmark a backing inode. The caller must hold the inode lock.
> */
> @@ -43,6 +55,29 @@ static void __cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
> trace_cachefiles_mark_inactive(object, inode);
> }
>
> +/*
> + * Unmark a backing inode and tell cachefilesd that there's something that can
> + * be culled.
> + */
> +void cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
> + struct file *file)
> +{
> + struct cachefiles_cache *cache = object->volume->cache;
> + struct inode *inode = file_inode(file);
> +
> + if (inode) {
> + inode_lock(inode);
> + __cachefiles_unmark_inode_in_use(object, file->f_path.dentry);
> + inode_unlock(inode);
> +
> + if (!test_bit(CACHEFILES_OBJECT_USING_TMPFILE, &object->flags)) {
> + atomic_long_add(inode->i_blocks, &cache->b_released);
> + if (atomic_inc_return(&cache->f_released))
> + cachefiles_state_changed(cache);
> + }
> + }
> +}
> +
> /*
> * get a subdirectory
> */
>
>

Probably, this patch should be merged with #38. The commit logs are the
same, and they are at least somewhat related.

--
Jeff Layton <[email protected]>

2022-01-06 18:26:18

by Marc Dionne

[permalink] [raw]
Subject: Re: [Linux-cachefs] [PATCH v4 00/68] fscache, cachefiles: Rewrite

On Wed, Dec 22, 2021 at 7:13 PM David Howells <[email protected]> wrote:
>
>
> Here's a set of patches implements a rewrite of the fscache driver and a
> matching rewrite of the cachefiles driver, significantly simplifying the
> code compared to what's upstream, removing the complex operation scheduling
> and object state machine in favour of something much smaller and simpler.
>
> The patchset is structured such that the first few patches disable fscache
> use by the network filesystems using it, remove the cachefiles driver
> entirely and as much of the fscache driver as can be got away with without
> causing build failures in the network filesystems. The patches after that
> recreate fscache and then cachefiles, attempting to add the pieces in a
> logical order. Finally, the filesystems are reenabled and then the very
> last patch changes the documentation.
>
>
> WHY REWRITE?
> ============
>
> Fscache's operation scheduling API was intended to handle sequencing of
> cache operations, which were all required (where possible) to run
> asynchronously in parallel with the operations being done by the network
> filesystem, whilst allowing the cache to be brought online and offline and
> to interrupt service for invalidation.
>
> With the advent of the tmpfile capacity in the VFS, however, an opportunity
> arises to do invalidation much more simply, without having to wait for I/O
> that's actually in progress: Cachefiles can simply create a tmpfile, cut
> over the file pointer for the backing object attached to a cookie and
> abandon the in-progress I/O, dismissing it upon completion.
>
> Future work here would involve using Omar Sandoval's vfs_link() with
> AT_LINK_REPLACE[1] to allow an extant file to be displaced by a new hard
> link from a tmpfile as currently I have to unlink the old file first.
>
> These patches can also simplify the object state handling as I/O operations
> to the cache don't all have to be brought to a stop in order to invalidate
> a file. To that end, and with an eye on to writing a new backing cache
> model in the future, I've taken the opportunity to simplify the indexing
> structure.
>
> I've separated the index cookie concept from the file cookie concept by C
> type now. The former is now called a "volume cookie" (struct
> fscache_volume) and there is a container of file cookies. There are then
> just the two levels. All the index cookie levels are collapsed into a
> single volume cookie, and this has a single printable string as a key. For
> instance, an AFS volume would have a key of something like
> "afs,example.com,1000555", combining the filesystem name, cell name and
> volume ID. This is freeform, but must not have '/' chars in it.
>
> I've also eliminated all pointers back from fscache into the network
> filesystem. This required the duplication of a little bit of data in the
> cookie (cookie key, coherency data and file size), but it's not actually
> that much. This gets rid of problems with making sure we keep netfs data
> structures around so that the cache can access them.
>
> These patches mean that most of the code that was in the drivers before is
> simply gone and those drivers are now almost entirely new code. That being
> the case, there doesn't seem any particular reason to try and maintain
> bisectability across it. Further, there has to be a point in the middle
> where things are cut over as there's a single point everything has to go
> through (ie. /dev/cachefiles) and it can't be in use by two drivers at
> once.
>
>
> ISSUES YET OUTSTANDING
> ======================
>
> There are some issues still outstanding, unaddressed by this patchset, that
> will need fixing in future patchsets, but that don't stop this series from
> being usable:
>
> (1) The cachefiles driver needs to stop using the backing filesystem's
> metadata to store information about what parts of the cache are
> populated. This is not reliable with modern extent-based filesystems.
>
> Fixing this is deferred to a separate patchset as it involves
> negotiation with the network filesystem and the VM as to how much data
> to download to fulfil a read - which brings me on to (2)...
>
> (2) NFS and CIFS do not take account of how the cache would like I/O to be
> structured to meet its granularity requirements. Previously, the
> cache used page granularity, which was fine as the network filesystems
> also dealt in page granularity, and the backing filesystem (ext4, xfs
> or whatever) did whatever it did out of sight. However, we now have
> folios to deal with and the cache will now have to store its own
> metadata to track its contents.
>
> The change I'm looking at making for cachefiles is to store content
> bitmaps in one or more xattrs and making a bit in the map correspond
> to something like a 256KiB block. However, the size of an xattr and
> the fact that they have to be read/updated in one go means that I'm
> looking at covering 1GiB of data per 512-byte map and storing each map
> in an xattr. Cachefiles has the potential to grow into a fully
> fledged filesystem of its very own if I'm not careful.
>
> However, I'm also looking at changing things even more radically and
> going to a different model of how the cache is arranged and managed -
> one that's more akin to the way, say, openafs does things - which
> brings me on to (3)...
>
> (3) The way cachefilesd does culling is very inefficient for large caches
> and it would be better to move it into the kernel if I can as
> cachefilesd has to keep asking the kernel if it can cull a file.
> Changing the way the backend works would allow this to be addressed.
>
>
> BITS THAT MAY BE CONTROVERSIAL
> ==============================
>
> There are some bits I've added that may be controversial:
>
> (1) I've provided a flag, S_KERNEL_FILE, that cachefiles uses to check if
> a files is already being used by some other kernel service (e.g. a
> duplicate cachefiles cache in the same directory) and reject it if it
> is. This isn't entirely necessary, but it helps prevent accidental
> data corruption.
>
> I don't want to use S_SWAPFILE as that has other effects, but quite
> possibly swapon() should set S_KERNEL_FILE too.
>
> Note that it doesn't prevent userspace from interfering, though
> perhaps it should. (I have made it prevent a marked directory from
> being rmdir-able).
>
> (2) Cachefiles wants to keep the backing file for a cookie open whilst we
> might need to write to it from network filesystem writeback. The
> problem is that the network filesystem unuses its cookie when its file
> is closed, and so we have nothing pinning the cachefiles file open and
> it will get closed automatically after a short time to avoid
> EMFILE/ENFILE problems.
>
> Reopening the cache file, however, is a problem if this is being done
> due to writeback triggered by exit(). Some filesystems will oops if
> we try to open a file in that context because they want to access
> current->fs or suchlike.
>
> To get around this, I added the following:
>
> (A) An inode flag, I_PINNING_FSCACHE_WB, to be set on a network
> filesystem inode to indicate that we have a usage count on the
> cookie caching that inode.
>
> (B) A flag in struct writeback_control, unpinned_fscache_wb, that is
> set when __writeback_single_inode() clears the last dirty page
> from i_pages - at which point it clears I_PINNING_FSCACHE_WB and
> sets this flag.
>
> This has to be done here so that clearing I_PINNING_FSCACHE_WB can
> be done atomically with the check of PAGECACHE_TAG_DIRTY that
> clears I_DIRTY_PAGES.
>
> (C) A function, fscache_set_page_dirty(), which if it is not set, sets
> I_PINNING_FSCACHE_WB and calls fscache_use_cookie() to pin the
> cache resources.
>
> (D) A function, fscache_unpin_writeback(), to be called by
> ->write_inode() to unuse the cookie.
>
> (E) A function, fscache_clear_inode_writeback(), to be called when the
> inode is evicted, before clear_inode() is called. This cleans up
> any lingering I_PINNING_FSCACHE_WB.
>
> The network filesystem can then use these tools to make sure that
> fscache_write_to_cache() can write locally modified data to the cache
> as well as to the server.
>
> For the future, I'm working on write helpers for netfs lib that should
> allow this facility to be removed by keeping track of the dirty
> regions separately - but that's incomplete at the moment and is also
> going to be affected by folios, one way or another, since it deals
> with pages.
>
>
> These patches can be found also on:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-rewrite
>
> David
>
>
> Changes
> =======
> ver #4:
> - Dropped a pair of patches to try and cope with multipage folios in
> afs_write_begin/end() - it should really be done in the caller[7].
> - Fixed the use of sizeof with memset in cifs.
> - Removed an extraneous kdoc param.
> - Added a patch to add a tracepoint for fscache_use/unuse_cookie().
> - In cifs, tcon->vol_create_time is __le64 so doesn't need cpu_to_le64().
> - Add an expanded version of a patch to use current_is_kswapd() instead of
> !gfpflags_allow_blocking()[8].
> - Removed a couple of debugging print statements.
>
> ver #3:
> - Fixed a race in the cookie state machine between LRU discard and
> relinquishment[4].
> - Fixed up the hashing to make it portable[5].
> - Fixed up some netfs coherency data to make it portable.
> - Fixed some missing NFS_FSCACHE=n fallback functions in nfs[6].
> - Added a patch to store volume coherency data in an xattr.
> - Added a check that the cookie is unhashed before being freed.
> - Fixed fscache to use remove_proc_subtree() to remove /proc/fs/fscache/.
>
> ver #2:
> - Fix an unused-var warning due to CONFIG_9P_FSCACHE=n.
> - Use gfpflags_allow_blocking() rather than using flag directly.
> - Fixed some error logging in a couple of cachefiles functions.
> - Fixed an error check in the fscache volume allocation.
> - Need to unmark an inode we've moved to the graveyard before unlocking.
> - Upgraded to -rc4 to allow for upstream changes to cifs.
> - Should only change to inval state if can get access to cache.
> - Don't hold n_accesses elevated whilst cache is bound to a cookie, but
> rather add a flag that prevents the state machine from being queued when
> n_accesses reaches 0.
> - Remove the unused cookie pointer field from the fscache_acquire
> tracepoint.
> - Added missing transition to LRU_DISCARDING state.
> - Added two ceph patches from Jeff Layton[2].
> - Remove NFS_INO_FSCACHE as it's no longer used.
> - In NFS, need to unuse a cookie on file-release, not inode-clear.
> - Filled in the NFS cache I/O routines, borrowing from the previously posted
> fallback I/O code[3].
>
>
> Link: https://lore.kernel.org/r/[email protected]/ [1]
> Link: https://lore.kernel.org/r/[email protected]/ [2]
> Link: https://lore.kernel.org/r/163189108292.2509237.12615909591150927232.stgit@warthog.procyon.org.uk/ [3]
> Link: https://lore.kernel.org/r/[email protected]/ [4]
> Link: https://lore.kernel.org/r/CAHk-=whtkzB446+hX0zdLsdcUJsJ=8_-0S1mE_R+YurThfUbLA@mail.gmail.com [5]
> Link: https://lore.kernel.org/r/61b90f3d.H1IkoeQfEsGNhvq9%[email protected]/ [6]
> Link: https://lore.kernel.org/r/CAHk-=wh2dr=NgVSVj0sw-gSuzhxhLRV5FymfPS146zGgF4kBjA@mail.gmail.com/ [7]
> Link: https://lore.kernel.org/r/[email protected]/ [8]
>
>
> References
> ==========
>
> These patches have been published for review before, firstly as part of a
> larger set:
>
> Link: https://lore.kernel.org/r/158861203563.340223.7585359869938129395.stgit@warthog.procyon.org.uk/
>
> Link: https://lore.kernel.org/r/159465766378.1376105.11619976251039287525.stgit@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/r/159465784033.1376674.18106463693989811037.stgit@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/r/159465821598.1377938.2046362270225008168.stgit@warthog.procyon.org.uk/
>
> Link: https://lore.kernel.org/r/160588455242.3465195.3214733858273019178.stgit@warthog.procyon.org.uk/
>
> Then as a cut-down set:
>
> Link: https://lore.kernel.org/r/161118128472.1232039.11746799833066425131.stgit@warthog.procyon.org.uk/ # v1
> Link: https://lore.kernel.org/r/161161025063.2537118.2009249444682241405.stgit@warthog.procyon.org.uk/ # v2
> Link: https://lore.kernel.org/r/161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk/ # v3
> Link: https://lore.kernel.org/r/161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk/ # v4
> Link: https://lore.kernel.org/r/161653784755.2770958.11820491619308713741.stgit@warthog.procyon.org.uk/ # v5
>
> I split out a set to just restructure the I/O, which got merged back in to
> this one:
>
> Link: https://lore.kernel.org/r/163363935000.1980952.15279841414072653108.stgit@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/r/163189104510.2509237.10805032055807259087.stgit@warthog.procyon.org.uk/ # v2
> Link: https://lore.kernel.org/r/163363935000.1980952.15279841414072653108.stgit@warthog.procyon.org.uk/ # v3
> Link: https://lore.kernel.org/r/163551653404.1877519.12363794970541005441.stgit@warthog.procyon.org.uk/ # v4
>
> ... and a larger set to do the conversion, also merged back into this one:
>
> Link: https://lore.kernel.org/r/163456861570.2614702.14754548462706508617.stgit@warthog.procyon.org.uk/ # v1
> Link: https://lore.kernel.org/r/163492911924.1038219.13107463173777870713.stgit@warthog.procyon.org.uk/ # v2
>
> Older versions of this one:
>
> Link: https://lore.kernel.org/r/163819575444.215744.318477214576928110.stgit@warthog.procyon.org.uk/ # v1
> Link: https://lore.kernel.org/r/163906878733.143852.5604115678965006622.stgit@warthog.procyon.org.uk/ # v2
> Link: https://lore.kernel.org/r/163967073889.1823006.12237147297060239168.stgit@warthog.procyon.org.uk/ # v3
>
> Proposals/information about the design have been published here:
>
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
>
> And requests for information:
>
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
>
> I've posted partial patches to try and help 9p and cifs along:
>
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
>
> ---
> Dave Wysochanski (1):
> nfs: Convert to new fscache volume/cookie API
>
> David Howells (65):
> fscache, cachefiles: Disable configuration
> cachefiles: Delete the cachefiles driver pending rewrite
> fscache: Remove the contents of the fscache driver, pending rewrite
> netfs: Display the netfs inode number in the netfs_read tracepoint
> netfs: Pass a flag to ->prepare_write() to say if there's no alloc'd space
> fscache: Introduce new driver
> fscache: Implement a hash function
> fscache: Implement cache registration
> fscache: Implement volume registration
> fscache: Implement cookie registration
> fscache: Implement cache-level access helpers
> fscache: Implement volume-level access helpers
> fscache: Implement cookie-level access helpers
> fscache: Implement functions add/remove a cache
> fscache: Provide and use cache methods to lookup/create/free a volume
> fscache: Add a function for a cache backend to note an I/O error
> fscache: Implement simple cookie state machine
> fscache: Implement cookie user counting and resource pinning
> fscache: Implement cookie invalidation
> fscache: Provide a means to begin an operation
> fscache: Count data storage objects in a cache
> fscache: Provide read/write stat counters for the cache
> fscache: Provide a function to let the netfs update its coherency data
> netfs: Pass more information on how to deal with a hole in the cache
> fscache: Implement raw I/O interface
> fscache: Implement higher-level write I/O interface
> vfs, fscache: Implement pinning of cache usage for writeback
> fscache: Provide a function to note the release of a page
> fscache: Provide a function to resize a cookie
> cachefiles: Introduce rewritten driver
> cachefiles: Define structs
> cachefiles: Add some error injection support
> cachefiles: Add a couple of tracepoints for logging errors
> cachefiles: Add cache error reporting macro
> cachefiles: Add security derivation
> cachefiles: Register a miscdev and parse commands over it
> cachefiles: Provide a function to check how much space there is
> vfs, cachefiles: Mark a backing file in use with an inode flag
> cachefiles: Implement a function to get/create a directory in the cache
> cachefiles: Implement cache registration and withdrawal
> cachefiles: Implement volume support
> cachefiles: Add tracepoints for calls to the VFS
> cachefiles: Implement object lifecycle funcs
> cachefiles: Implement key to filename encoding
> cachefiles: Implement metadata/coherency data storage in xattrs
> cachefiles: Mark a backing file in use with an inode flag
> cachefiles: Implement culling daemon commands
> cachefiles: Implement backing file wrangling
> cachefiles: Implement begin and end I/O operation
> cachefiles: Implement cookie resize for truncate
> cachefiles: Implement the I/O routines
> fscache, cachefiles: Store the volume coherency data
> cachefiles: Allow cachefiles to actually function
> fscache, cachefiles: Display stats of no-space events
> fscache, cachefiles: Display stat of culling events
> afs: Convert afs to use the new fscache API
> afs: Copy local writes to the cache when writing to the server
> afs: Skip truncation on the server of data we haven't written yet
> 9p: Use fscache indexing rewrite and reenable caching
> 9p: Copy local writes to the cache when writing to the server
> nfs: Implement cache I/O by accessing the cache directly
> cifs: Support fscache indexing rewrite (untested)
> fscache: Rewrite documentation
> fscache: Add a tracepoint for cookie use/unuse
> 9p, afs, ceph, cifs, nfs: Use current_is_kswapd() rather than gfpflags_allow_blocking()
>
> Jeff Layton (2):
> ceph: conversion to new fscache API
> ceph: add fscache writeback support
>
>
> .../filesystems/caching/backend-api.rst | 850 ++++------
> .../filesystems/caching/cachefiles.rst | 6 +-
> Documentation/filesystems/caching/fscache.rst | 525 ++----
> Documentation/filesystems/caching/index.rst | 4 +-
> .../filesystems/caching/netfs-api.rst | 1136 ++++---------
> Documentation/filesystems/caching/object.rst | 313 ----
> .../filesystems/caching/operations.rst | 210 ---
> Documentation/filesystems/netfs_library.rst | 16 +-
> fs/9p/Kconfig | 2 +-
> fs/9p/cache.c | 195 +--
> fs/9p/cache.h | 25 +-
> fs/9p/v9fs.c | 17 +-
> fs/9p/v9fs.h | 13 +-
> fs/9p/vfs_addr.c | 57 +-
> fs/9p/vfs_dir.c | 13 +
> fs/9p/vfs_file.c | 3 +-
> fs/9p/vfs_inode.c | 26 +-
> fs/9p/vfs_inode_dotl.c | 3 +-
> fs/9p/vfs_super.c | 3 +
> fs/afs/Kconfig | 2 +-
> fs/afs/Makefile | 3 -
> fs/afs/cache.c | 68 -
> fs/afs/cell.c | 12 -
> fs/afs/file.c | 38 +-
> fs/afs/inode.c | 101 +-
> fs/afs/internal.h | 37 +-
> fs/afs/main.c | 14 -
> fs/afs/super.c | 1 +
> fs/afs/volume.c | 29 +-
> fs/afs/write.c | 88 +-
> fs/cachefiles/Kconfig | 7 +
> fs/cachefiles/Makefile | 6 +-
> fs/cachefiles/bind.c | 278 ----
> fs/cachefiles/cache.c | 378 +++++
> fs/cachefiles/daemon.c | 180 +-
> fs/cachefiles/error_inject.c | 46 +
> fs/cachefiles/interface.c | 747 ++++-----
> fs/cachefiles/internal.h | 270 +--
> fs/cachefiles/io.c | 330 +++-
> fs/cachefiles/key.c | 201 ++-
> fs/cachefiles/main.c | 22 +-
> fs/cachefiles/namei.c | 1223 ++++++--------
> fs/cachefiles/rdwr.c | 972 -----------
> fs/cachefiles/security.c | 2 +-
> fs/cachefiles/volume.c | 139 ++
> fs/cachefiles/xattr.c | 421 ++---
> fs/ceph/Kconfig | 2 +-
> fs/ceph/addr.c | 102 +-
> fs/ceph/cache.c | 218 +--
> fs/ceph/cache.h | 97 +-
> fs/ceph/caps.c | 3 +-
> fs/ceph/file.c | 13 +-
> fs/ceph/inode.c | 22 +-
> fs/ceph/super.c | 10 +-
> fs/ceph/super.h | 3 +-
> fs/cifs/Kconfig | 2 +-
> fs/cifs/Makefile | 2 +-
> fs/cifs/cache.c | 105 --
> fs/cifs/cifsfs.c | 11 +-
> fs/cifs/cifsglob.h | 5 +-
> fs/cifs/connect.c | 12 -
> fs/cifs/file.c | 64 +-
> fs/cifs/fscache.c | 333 +---
> fs/cifs/fscache.h | 126 +-
> fs/cifs/inode.c | 36 +-
> fs/fs-writeback.c | 8 +
> fs/fscache/Makefile | 6 +-
> fs/fscache/cache.c | 618 +++----
> fs/fscache/cookie.c | 1448 +++++++++--------
> fs/fscache/fsdef.c | 98 --
> fs/fscache/internal.h | 317 +---
> fs/fscache/io.c | 376 ++++-
> fs/fscache/main.c | 147 +-
> fs/fscache/netfs.c | 74 -
> fs/fscache/object.c | 1125 -------------
> fs/fscache/operation.c | 633 -------
> fs/fscache/page.c | 1242 --------------
> fs/fscache/proc.c | 47 +-
> fs/fscache/stats.c | 293 +---
> fs/fscache/volume.c | 517 ++++++
> fs/namei.c | 3 +-
> fs/netfs/read_helper.c | 10 +-
> fs/nfs/Kconfig | 2 +-
> fs/nfs/Makefile | 2 +-
> fs/nfs/client.c | 4 -
> fs/nfs/direct.c | 2 +
> fs/nfs/file.c | 13 +-
> fs/nfs/fscache-index.c | 140 --
> fs/nfs/fscache.c | 490 ++----
> fs/nfs/fscache.h | 180 +-
> fs/nfs/inode.c | 11 +-
> fs/nfs/nfstrace.h | 1 -
> fs/nfs/read.c | 25 +-
> fs/nfs/super.c | 28 +-
> fs/nfs/write.c | 8 +-
> include/linux/fs.h | 4 +
> include/linux/fscache-cache.h | 614 ++-----
> include/linux/fscache.h | 1021 +++++-------
> include/linux/netfs.h | 15 +-
> include/linux/nfs_fs.h | 1 -
> include/linux/nfs_fs_sb.h | 9 +-
> include/linux/writeback.h | 1 +
> include/trace/events/cachefiles.h | 527 ++++--
> include/trace/events/fscache.h | 642 ++++----
> include/trace/events/netfs.h | 5 +-
> 105 files changed, 7396 insertions(+), 13509 deletions(-)
> delete mode 100644 Documentation/filesystems/caching/object.rst
> delete mode 100644 Documentation/filesystems/caching/operations.rst
> delete mode 100644 fs/afs/cache.c
> delete mode 100644 fs/cachefiles/bind.c
> create mode 100644 fs/cachefiles/cache.c
> create mode 100644 fs/cachefiles/error_inject.c
> delete mode 100644 fs/cachefiles/rdwr.c
> create mode 100644 fs/cachefiles/volume.c
> delete mode 100644 fs/cifs/cache.c
> delete mode 100644 fs/fscache/fsdef.c
> delete mode 100644 fs/fscache/netfs.c
> delete mode 100644 fs/fscache/object.c
> delete mode 100644 fs/fscache/operation.c
> delete mode 100644 fs/fscache/page.c
> create mode 100644 fs/fscache/volume.c
> delete mode 100644 fs/nfs/fscache-index.c
>
>

v4 passes all tests in our kafs test suite, so you can add:

Tested-by: [email protected]

Marc

2022-01-06 18:29:38

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v4 00/68] fscache, cachefiles: Rewrite

On Wed, 2021-12-22 at 23:13 +0000, David Howells wrote:
> Here's a set of patches implements a rewrite of the fscache driver and a
> matching rewrite of the cachefiles driver, significantly simplifying the
> code compared to what's upstream, removing the complex operation scheduling
> and object state machine in favour of something much smaller and simpler.
>
> The patchset is structured such that the first few patches disable fscache
> use by the network filesystems using it, remove the cachefiles driver
> entirely and as much of the fscache driver as can be got away with without
> causing build failures in the network filesystems. The patches after that
> recreate fscache and then cachefiles, attempting to add the pieces in a
> logical order. Finally, the filesystems are reenabled and then the very
> last patch changes the documentation.
>
>
> WHY REWRITE?
> ============
>
> Fscache's operation scheduling API was intended to handle sequencing of
> cache operations, which were all required (where possible) to run
> asynchronously in parallel with the operations being done by the network
> filesystem, whilst allowing the cache to be brought online and offline and
> to interrupt service for invalidation.
>
> With the advent of the tmpfile capacity in the VFS, however, an opportunity
> arises to do invalidation much more simply, without having to wait for I/O
> that's actually in progress: Cachefiles can simply create a tmpfile, cut
> over the file pointer for the backing object attached to a cookie and
> abandon the in-progress I/O, dismissing it upon completion.
>
> Future work here would involve using Omar Sandoval's vfs_link() with
> AT_LINK_REPLACE[1] to allow an extant file to be displaced by a new hard
> link from a tmpfile as currently I have to unlink the old file first.
>
> These patches can also simplify the object state handling as I/O operations
> to the cache don't all have to be brought to a stop in order to invalidate
> a file. To that end, and with an eye on to writing a new backing cache
> model in the future, I've taken the opportunity to simplify the indexing
> structure.
>
> I've separated the index cookie concept from the file cookie concept by C
> type now. The former is now called a "volume cookie" (struct
> fscache_volume) and there is a container of file cookies. There are then
> just the two levels. All the index cookie levels are collapsed into a
> single volume cookie, and this has a single printable string as a key. For
> instance, an AFS volume would have a key of something like
> "afs,example.com,1000555", combining the filesystem name, cell name and
> volume ID. This is freeform, but must not have '/' chars in it.
>
> I've also eliminated all pointers back from fscache into the network
> filesystem. This required the duplication of a little bit of data in the
> cookie (cookie key, coherency data and file size), but it's not actually
> that much. This gets rid of problems with making sure we keep netfs data
> structures around so that the cache can access them.
>
> These patches mean that most of the code that was in the drivers before is
> simply gone and those drivers are now almost entirely new code. That being
> the case, there doesn't seem any particular reason to try and maintain
> bisectability across it. Further, there has to be a point in the middle
> where things are cut over as there's a single point everything has to go
> through (ie. /dev/cachefiles) and it can't be in use by two drivers at
> once.
>
>
> ISSUES YET OUTSTANDING
> ======================
>
> There are some issues still outstanding, unaddressed by this patchset, that
> will need fixing in future patchsets, but that don't stop this series from
> being usable:
>
> (1) The cachefiles driver needs to stop using the backing filesystem's
> metadata to store information about what parts of the cache are
> populated. This is not reliable with modern extent-based filesystems.
>
> Fixing this is deferred to a separate patchset as it involves
> negotiation with the network filesystem and the VM as to how much data
> to download to fulfil a read - which brings me on to (2)...
>
> (2) NFS and CIFS do not take account of how the cache would like I/O to be
> structured to meet its granularity requirements. Previously, the
> cache used page granularity, which was fine as the network filesystems
> also dealt in page granularity, and the backing filesystem (ext4, xfs
> or whatever) did whatever it did out of sight. However, we now have
> folios to deal with and the cache will now have to store its own
> metadata to track its contents.
>
> The change I'm looking at making for cachefiles is to store content
> bitmaps in one or more xattrs and making a bit in the map correspond
> to something like a 256KiB block. However, the size of an xattr and
> the fact that they have to be read/updated in one go means that I'm
> looking at covering 1GiB of data per 512-byte map and storing each map
> in an xattr. Cachefiles has the potential to grow into a fully
> fledged filesystem of its very own if I'm not careful.
>
> However, I'm also looking at changing things even more radically and
> going to a different model of how the cache is arranged and managed -
> one that's more akin to the way, say, openafs does things - which
> brings me on to (3)...
>
> (3) The way cachefilesd does culling is very inefficient for large caches
> and it would be better to move it into the kernel if I can as
> cachefilesd has to keep asking the kernel if it can cull a file.
> Changing the way the backend works would allow this to be addressed.
>
>
> BITS THAT MAY BE CONTROVERSIAL
> ==============================
>
> There are some bits I've added that may be controversial:
>
> (1) I've provided a flag, S_KERNEL_FILE, that cachefiles uses to check if
> a files is already being used by some other kernel service (e.g. a
> duplicate cachefiles cache in the same directory) and reject it if it
> is. This isn't entirely necessary, but it helps prevent accidental
> data corruption.
>
> I don't want to use S_SWAPFILE as that has other effects, but quite
> possibly swapon() should set S_KERNEL_FILE too.
>
> Note that it doesn't prevent userspace from interfering, though
> perhaps it should. (I have made it prevent a marked directory from
> being rmdir-able).
>
> (2) Cachefiles wants to keep the backing file for a cookie open whilst we
> might need to write to it from network filesystem writeback. The
> problem is that the network filesystem unuses its cookie when its file
> is closed, and so we have nothing pinning the cachefiles file open and
> it will get closed automatically after a short time to avoid
> EMFILE/ENFILE problems.
>
> Reopening the cache file, however, is a problem if this is being done
> due to writeback triggered by exit(). Some filesystems will oops if
> we try to open a file in that context because they want to access
> current->fs or suchlike.
>
> To get around this, I added the following:
>
> (A) An inode flag, I_PINNING_FSCACHE_WB, to be set on a network
> filesystem inode to indicate that we have a usage count on the
> cookie caching that inode.
>
> (B) A flag in struct writeback_control, unpinned_fscache_wb, that is
> set when __writeback_single_inode() clears the last dirty page
> from i_pages - at which point it clears I_PINNING_FSCACHE_WB and
> sets this flag.
>
> This has to be done here so that clearing I_PINNING_FSCACHE_WB can
> be done atomically with the check of PAGECACHE_TAG_DIRTY that
> clears I_DIRTY_PAGES.
>
> (C) A function, fscache_set_page_dirty(), which if it is not set, sets
> I_PINNING_FSCACHE_WB and calls fscache_use_cookie() to pin the
> cache resources.
>
> (D) A function, fscache_unpin_writeback(), to be called by
> ->write_inode() to unuse the cookie.
>
> (E) A function, fscache_clear_inode_writeback(), to be called when the
> inode is evicted, before clear_inode() is called. This cleans up
> any lingering I_PINNING_FSCACHE_WB.
>
> The network filesystem can then use these tools to make sure that
> fscache_write_to_cache() can write locally modified data to the cache
> as well as to the server.
>
> For the future, I'm working on write helpers for netfs lib that should
> allow this facility to be removed by keeping track of the dirty
> regions separately - but that's incomplete at the moment and is also
> going to be affected by folios, one way or another, since it deals
> with pages.
>
>
> These patches can be found also on:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-rewrite
>
> David
>
>
> Changes
> =======
> ver #4:
> - Dropped a pair of patches to try and cope with multipage folios in
> afs_write_begin/end() - it should really be done in the caller[7].
> - Fixed the use of sizeof with memset in cifs.
> - Removed an extraneous kdoc param.
> - Added a patch to add a tracepoint for fscache_use/unuse_cookie().
> - In cifs, tcon->vol_create_time is __le64 so doesn't need cpu_to_le64().
> - Add an expanded version of a patch to use current_is_kswapd() instead of
> !gfpflags_allow_blocking()[8].
> - Removed a couple of debugging print statements.
>
> ver #3:
> - Fixed a race in the cookie state machine between LRU discard and
> relinquishment[4].
> - Fixed up the hashing to make it portable[5].
> - Fixed up some netfs coherency data to make it portable.
> - Fixed some missing NFS_FSCACHE=n fallback functions in nfs[6].
> - Added a patch to store volume coherency data in an xattr.
> - Added a check that the cookie is unhashed before being freed.
> - Fixed fscache to use remove_proc_subtree() to remove /proc/fs/fscache/.
>
> ver #2:
> - Fix an unused-var warning due to CONFIG_9P_FSCACHE=n.
> - Use gfpflags_allow_blocking() rather than using flag directly.
> - Fixed some error logging in a couple of cachefiles functions.
> - Fixed an error check in the fscache volume allocation.
> - Need to unmark an inode we've moved to the graveyard before unlocking.
> - Upgraded to -rc4 to allow for upstream changes to cifs.
> - Should only change to inval state if can get access to cache.
> - Don't hold n_accesses elevated whilst cache is bound to a cookie, but
> rather add a flag that prevents the state machine from being queued when
> n_accesses reaches 0.
> - Remove the unused cookie pointer field from the fscache_acquire
> tracepoint.
> - Added missing transition to LRU_DISCARDING state.
> - Added two ceph patches from Jeff Layton[2].
> - Remove NFS_INO_FSCACHE as it's no longer used.
> - In NFS, need to unuse a cookie on file-release, not inode-clear.
> - Filled in the NFS cache I/O routines, borrowing from the previously posted
> fallback I/O code[3].
>
>
> Link: https://lore.kernel.org/r/[email protected]/ [1]
> Link: https://lore.kernel.org/r/[email protected]/ [2]
> Link: https://lore.kernel.org/r/163189108292.2509237.12615909591150927232.stgit@warthog.procyon.org.uk/ [3]
> Link: https://lore.kernel.org/r/[email protected]/ [4]
> Link: https://lore.kernel.org/r/CAHk-=whtkzB446+hX0zdLsdcUJsJ=8_-0S1mE_R+YurThfUbLA@mail.gmail.com [5]
> Link: https://lore.kernel.org/r/61b90f3d.H1IkoeQfEsGNhvq9%[email protected]/ [6]
> Link: https://lore.kernel.org/r/CAHk-=wh2dr=NgVSVj0sw-gSuzhxhLRV5FymfPS146zGgF4kBjA@mail.gmail.com/ [7]
> Link: https://lore.kernel.org/r/[email protected]/ [8]
>
>
> References
> ==========
>
> These patches have been published for review before, firstly as part of a
> larger set:
>
> Link: https://lore.kernel.org/r/158861203563.340223.7585359869938129395.stgit@warthog.procyon.org.uk/
>
> Link: https://lore.kernel.org/r/159465766378.1376105.11619976251039287525.stgit@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/r/159465784033.1376674.18106463693989811037.stgit@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/r/159465821598.1377938.2046362270225008168.stgit@warthog.procyon.org.uk/
>
> Link: https://lore.kernel.org/r/160588455242.3465195.3214733858273019178.stgit@warthog.procyon.org.uk/
>
> Then as a cut-down set:
>
> Link: https://lore.kernel.org/r/161118128472.1232039.11746799833066425131.stgit@warthog.procyon.org.uk/ # v1
> Link: https://lore.kernel.org/r/161161025063.2537118.2009249444682241405.stgit@warthog.procyon.org.uk/ # v2
> Link: https://lore.kernel.org/r/161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk/ # v3
> Link: https://lore.kernel.org/r/161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk/ # v4
> Link: https://lore.kernel.org/r/161653784755.2770958.11820491619308713741.stgit@warthog.procyon.org.uk/ # v5
>
> I split out a set to just restructure the I/O, which got merged back in to
> this one:
>
> Link: https://lore.kernel.org/r/163363935000.1980952.15279841414072653108.stgit@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/r/163189104510.2509237.10805032055807259087.stgit@warthog.procyon.org.uk/ # v2
> Link: https://lore.kernel.org/r/163363935000.1980952.15279841414072653108.stgit@warthog.procyon.org.uk/ # v3
> Link: https://lore.kernel.org/r/163551653404.1877519.12363794970541005441.stgit@warthog.procyon.org.uk/ # v4
>
> ... and a larger set to do the conversion, also merged back into this one:
>
> Link: https://lore.kernel.org/r/163456861570.2614702.14754548462706508617.stgit@warthog.procyon.org.uk/ # v1
> Link: https://lore.kernel.org/r/163492911924.1038219.13107463173777870713.stgit@warthog.procyon.org.uk/ # v2
>
> Older versions of this one:
>
> Link: https://lore.kernel.org/r/163819575444.215744.318477214576928110.stgit@warthog.procyon.org.uk/ # v1
> Link: https://lore.kernel.org/r/163906878733.143852.5604115678965006622.stgit@warthog.procyon.org.uk/ # v2
> Link: https://lore.kernel.org/r/163967073889.1823006.12237147297060239168.stgit@warthog.procyon.org.uk/ # v3
>
> Proposals/information about the design have been published here:
>
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
>
> And requests for information:
>
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
>
> I've posted partial patches to try and help 9p and cifs along:
>
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
> Link: https://lore.kernel.org/r/[email protected]/
>
> ---
> Dave Wysochanski (1):
> nfs: Convert to new fscache volume/cookie API
>
> David Howells (65):
> fscache, cachefiles: Disable configuration
> cachefiles: Delete the cachefiles driver pending rewrite
> fscache: Remove the contents of the fscache driver, pending rewrite
> netfs: Display the netfs inode number in the netfs_read tracepoint
> netfs: Pass a flag to ->prepare_write() to say if there's no alloc'd space
> fscache: Introduce new driver
> fscache: Implement a hash function
> fscache: Implement cache registration
> fscache: Implement volume registration
> fscache: Implement cookie registration
> fscache: Implement cache-level access helpers
> fscache: Implement volume-level access helpers
> fscache: Implement cookie-level access helpers
> fscache: Implement functions add/remove a cache
> fscache: Provide and use cache methods to lookup/create/free a volume
> fscache: Add a function for a cache backend to note an I/O error
> fscache: Implement simple cookie state machine
> fscache: Implement cookie user counting and resource pinning
> fscache: Implement cookie invalidation
> fscache: Provide a means to begin an operation
> fscache: Count data storage objects in a cache
> fscache: Provide read/write stat counters for the cache
> fscache: Provide a function to let the netfs update its coherency data
> netfs: Pass more information on how to deal with a hole in the cache
> fscache: Implement raw I/O interface
> fscache: Implement higher-level write I/O interface
> vfs, fscache: Implement pinning of cache usage for writeback
> fscache: Provide a function to note the release of a page
> fscache: Provide a function to resize a cookie
> cachefiles: Introduce rewritten driver
> cachefiles: Define structs
> cachefiles: Add some error injection support
> cachefiles: Add a couple of tracepoints for logging errors
> cachefiles: Add cache error reporting macro
> cachefiles: Add security derivation
> cachefiles: Register a miscdev and parse commands over it
> cachefiles: Provide a function to check how much space there is
> vfs, cachefiles: Mark a backing file in use with an inode flag
> cachefiles: Implement a function to get/create a directory in the cache
> cachefiles: Implement cache registration and withdrawal
> cachefiles: Implement volume support
> cachefiles: Add tracepoints for calls to the VFS
> cachefiles: Implement object lifecycle funcs
> cachefiles: Implement key to filename encoding
> cachefiles: Implement metadata/coherency data storage in xattrs
> cachefiles: Mark a backing file in use with an inode flag
> cachefiles: Implement culling daemon commands
> cachefiles: Implement backing file wrangling
> cachefiles: Implement begin and end I/O operation
> cachefiles: Implement cookie resize for truncate
> cachefiles: Implement the I/O routines
> fscache, cachefiles: Store the volume coherency data
> cachefiles: Allow cachefiles to actually function
> fscache, cachefiles: Display stats of no-space events
> fscache, cachefiles: Display stat of culling events
> afs: Convert afs to use the new fscache API
> afs: Copy local writes to the cache when writing to the server
> afs: Skip truncation on the server of data we haven't written yet
> 9p: Use fscache indexing rewrite and reenable caching
> 9p: Copy local writes to the cache when writing to the server
> nfs: Implement cache I/O by accessing the cache directly
> cifs: Support fscache indexing rewrite (untested)
> fscache: Rewrite documentation
> fscache: Add a tracepoint for cookie use/unuse
> 9p, afs, ceph, cifs, nfs: Use current_is_kswapd() rather than gfpflags_allow_blocking()
>
> Jeff Layton (2):
> ceph: conversion to new fscache API
> ceph: add fscache writeback support
>
>
> .../filesystems/caching/backend-api.rst | 850 ++++------
> .../filesystems/caching/cachefiles.rst | 6 +-
> Documentation/filesystems/caching/fscache.rst | 525 ++----
> Documentation/filesystems/caching/index.rst | 4 +-
> .../filesystems/caching/netfs-api.rst | 1136 ++++---------
> Documentation/filesystems/caching/object.rst | 313 ----
> .../filesystems/caching/operations.rst | 210 ---
> Documentation/filesystems/netfs_library.rst | 16 +-
> fs/9p/Kconfig | 2 +-
> fs/9p/cache.c | 195 +--
> fs/9p/cache.h | 25 +-
> fs/9p/v9fs.c | 17 +-
> fs/9p/v9fs.h | 13 +-
> fs/9p/vfs_addr.c | 57 +-
> fs/9p/vfs_dir.c | 13 +
> fs/9p/vfs_file.c | 3 +-
> fs/9p/vfs_inode.c | 26 +-
> fs/9p/vfs_inode_dotl.c | 3 +-
> fs/9p/vfs_super.c | 3 +
> fs/afs/Kconfig | 2 +-
> fs/afs/Makefile | 3 -
> fs/afs/cache.c | 68 -
> fs/afs/cell.c | 12 -
> fs/afs/file.c | 38 +-
> fs/afs/inode.c | 101 +-
> fs/afs/internal.h | 37 +-
> fs/afs/main.c | 14 -
> fs/afs/super.c | 1 +
> fs/afs/volume.c | 29 +-
> fs/afs/write.c | 88 +-
> fs/cachefiles/Kconfig | 7 +
> fs/cachefiles/Makefile | 6 +-
> fs/cachefiles/bind.c | 278 ----
> fs/cachefiles/cache.c | 378 +++++
> fs/cachefiles/daemon.c | 180 +-
> fs/cachefiles/error_inject.c | 46 +
> fs/cachefiles/interface.c | 747 ++++-----
> fs/cachefiles/internal.h | 270 +--
> fs/cachefiles/io.c | 330 +++-
> fs/cachefiles/key.c | 201 ++-
> fs/cachefiles/main.c | 22 +-
> fs/cachefiles/namei.c | 1223 ++++++--------
> fs/cachefiles/rdwr.c | 972 -----------
> fs/cachefiles/security.c | 2 +-
> fs/cachefiles/volume.c | 139 ++
> fs/cachefiles/xattr.c | 421 ++---
> fs/ceph/Kconfig | 2 +-
> fs/ceph/addr.c | 102 +-
> fs/ceph/cache.c | 218 +--
> fs/ceph/cache.h | 97 +-
> fs/ceph/caps.c | 3 +-
> fs/ceph/file.c | 13 +-
> fs/ceph/inode.c | 22 +-
> fs/ceph/super.c | 10 +-
> fs/ceph/super.h | 3 +-
> fs/cifs/Kconfig | 2 +-
> fs/cifs/Makefile | 2 +-
> fs/cifs/cache.c | 105 --
> fs/cifs/cifsfs.c | 11 +-
> fs/cifs/cifsglob.h | 5 +-
> fs/cifs/connect.c | 12 -
> fs/cifs/file.c | 64 +-
> fs/cifs/fscache.c | 333 +---
> fs/cifs/fscache.h | 126 +-
> fs/cifs/inode.c | 36 +-
> fs/fs-writeback.c | 8 +
> fs/fscache/Makefile | 6 +-
> fs/fscache/cache.c | 618 +++----
> fs/fscache/cookie.c | 1448 +++++++++--------
> fs/fscache/fsdef.c | 98 --
> fs/fscache/internal.h | 317 +---
> fs/fscache/io.c | 376 ++++-
> fs/fscache/main.c | 147 +-
> fs/fscache/netfs.c | 74 -
> fs/fscache/object.c | 1125 -------------
> fs/fscache/operation.c | 633 -------
> fs/fscache/page.c | 1242 --------------
> fs/fscache/proc.c | 47 +-
> fs/fscache/stats.c | 293 +---
> fs/fscache/volume.c | 517 ++++++
> fs/namei.c | 3 +-
> fs/netfs/read_helper.c | 10 +-
> fs/nfs/Kconfig | 2 +-
> fs/nfs/Makefile | 2 +-
> fs/nfs/client.c | 4 -
> fs/nfs/direct.c | 2 +
> fs/nfs/file.c | 13 +-
> fs/nfs/fscache-index.c | 140 --
> fs/nfs/fscache.c | 490 ++----
> fs/nfs/fscache.h | 180 +-
> fs/nfs/inode.c | 11 +-
> fs/nfs/nfstrace.h | 1 -
> fs/nfs/read.c | 25 +-
> fs/nfs/super.c | 28 +-
> fs/nfs/write.c | 8 +-
> include/linux/fs.h | 4 +
> include/linux/fscache-cache.h | 614 ++-----
> include/linux/fscache.h | 1021 +++++-------
> include/linux/netfs.h | 15 +-
> include/linux/nfs_fs.h | 1 -
> include/linux/nfs_fs_sb.h | 9 +-
> include/linux/writeback.h | 1 +
> include/trace/events/cachefiles.h | 527 ++++--
> include/trace/events/fscache.h | 642 ++++----
> include/trace/events/netfs.h | 5 +-
> 105 files changed, 7396 insertions(+), 13509 deletions(-)
> delete mode 100644 Documentation/filesystems/caching/object.rst
> delete mode 100644 Documentation/filesystems/caching/operations.rst
> delete mode 100644 fs/afs/cache.c
> delete mode 100644 fs/cachefiles/bind.c
> create mode 100644 fs/cachefiles/cache.c
> create mode 100644 fs/cachefiles/error_inject.c
> delete mode 100644 fs/cachefiles/rdwr.c
> create mode 100644 fs/cachefiles/volume.c
> delete mode 100644 fs/cifs/cache.c
> delete mode 100644 fs/fscache/fsdef.c
> delete mode 100644 fs/fscache/netfs.c
> delete mode 100644 fs/fscache/object.c
> delete mode 100644 fs/fscache/operation.c
> delete mode 100644 fs/fscache/page.c
> create mode 100644 fs/fscache/volume.c
> delete mode 100644 fs/nfs/fscache-index.c
>
>

Whew, big set, but it looks good overall. I had a few nit-picky
comments, but I'm fine with fixing those up in follow-on patches so that
we don't invalidate any testing done so far.

You can add my Reviewed-by to patches 1-55 and 66-68.

For 56-63, you can add my Acked-by. I'm less familiar with those
filesystems these days, but they seem sane.

Nice work!
--
Jeff Layton <[email protected]>

2022-01-07 11:20:17

by David Howells

[permalink] [raw]
Subject: Re: [PATCH v4 44/68] cachefiles: Implement key to filename encoding

Jeff Layton <[email protected]> wrote:

> Since most cookies are fairly small, is there any real benefit to
> optimizing for length here? How much inflation are we talking about?

Taking AFS as an example, a vnode is represented at the file level by two
numbers: a 32-bit or 96-bit file ID and a 32-bit uniquifier. If it's a 96-bit
file ID, a lot of the time, the upper 64-bits will be zero, so we're talking
something like:

S421d4,1f07f34,,

instead of:

S000421d401f07f340000000000000000

or:

E0AAQh1AHwfzQAAAAAAAAAAA==

The first makes for a more readable name in the cache. The real fun is with
NFS, where the name can be very long. For one that's just 5 words in length:

T81010001,1,20153e2,,a906194b

instead of:

T8101000100000001020153e200000000a906194b

or:

E0gQEAAQAAAAECAVPiAAAAAKkGGUs=

(The letter on the front represents the encoding scheme; in the base64 encoding
the second digit indicates the amount of padding).

I don't know how much difference it makes to the backing filesystem's
directory packing - and it may depend on the particular filesystem.

David


2022-01-07 11:25:46

by David Howells

[permalink] [raw]
Subject: Re: [PATCH v4 46/68] cachefiles: Mark a backing file in use with an inode flag

Jeff Layton <[email protected]> wrote:

> Probably, this patch should be merged with #38. The commit logs are the
> same, and they are at least somewhat related.

That's not so simple, unfortunately. Patch 46 requires bits of patches 43 and
45 in addition to patch 38 and patch 39 depends on patch 38.

David


2022-01-07 21:53:02

by David Howells

[permalink] [raw]
Subject: Re: [PATCH v4 63/68] cifs: Support fscache indexing rewrite (untested)

This patch isn't quite right and needs a fix. The attached patch fixes it.
I'll fold that in and post a v5 of this patch.

David
---
cifs: Fix the fscache cookie management

Fix the fscache cookie management in cifs in the following ways:

(1) The cookie should be released in cifs_evict_inode() after it has been
unused from being pinned by cifs_set_page_dirty().

(2) The cookie shouldn't be released in cifsFileInfo_put_final(). That's
dealing with closing a file, not getting rid of an inode. The cookie
needs to persist beyond the last file close because writepages may
happen after closure.

(3) The cookie needs to be used in cifs_atomic_open() to match
cifs_open(). As yet unknown files being opened for writing seem to go
by the former route rather than the latter.

This can be triggered by something like:

# systemctl start cachefilesd
# mount //carina/test /xfstest.test -o user=shares,pass=xxxx.fsc
# bash 5</xfstest.test/bar
# echo hello >/xfstest.test/bar

The key is to open the file R/O and then open it R/W and write to it whilst
it's still open for R/O. A cookie isn't acquired if it's just opened R/W
as it goes through the atomic_open method rather than the open method.

Signed-off-by: David Howells <[email protected]>
---
fs/cifs/cifsfs.c | 8 ++++++++
fs/cifs/dir.c | 4 ++++
fs/cifs/file.c | 2 --
3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index d3f3acf340f1..26cf2193c9a2 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -398,6 +398,7 @@ cifs_evict_inode(struct inode *inode)
truncate_inode_pages_final(&inode->i_data);
if (inode->i_state & I_PINNING_FSCACHE_WB)
cifs_fscache_unuse_inode_cookie(inode, true);
+ cifs_fscache_release_inode_cookie(inode);
clear_inode(inode);
}

@@ -722,6 +723,12 @@ static int cifs_show_stats(struct seq_file *s, struct dentry *root)
}
#endif

+static int cifs_write_inode(struct inode *inode, struct writeback_control *wbc)
+{
+ fscache_unpin_writeback(wbc, cifs_inode_cookie(inode));
+ return 0;
+}
+
static int cifs_drop_inode(struct inode *inode)
{
struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
@@ -734,6 +741,7 @@ static int cifs_drop_inode(struct inode *inode)
static const struct super_operations cifs_super_ops = {
.statfs = cifs_statfs,
.alloc_inode = cifs_alloc_inode,
+ .write_inode = cifs_write_inode,
.free_inode = cifs_free_inode,
.drop_inode = cifs_drop_inode,
.evict_inode = cifs_evict_inode,
diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
index 6e8e7cc26ae2..6186824b366e 100644
--- a/fs/cifs/dir.c
+++ b/fs/cifs/dir.c
@@ -22,6 +22,7 @@
#include "cifs_unicode.h"
#include "fs_context.h"
#include "cifs_ioctl.h"
+#include "fscache.h"

static void
renew_parental_timestamps(struct dentry *direntry)
@@ -509,6 +510,9 @@ cifs_atomic_open(struct inode *inode, struct dentry *direntry,
rc = -ENOMEM;
}

+ fscache_use_cookie(cifs_inode_cookie(file_inode(file)),
+ file->f_mode & FMODE_WRITE);
+
out:
cifs_put_tlink(tlink);
out_free_xid:
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index d872f6fe8e7d..44da7646f789 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -376,8 +376,6 @@ static void cifsFileInfo_put_final(struct cifsFileInfo *cifs_file)
struct cifsLockInfo *li, *tmp;
struct super_block *sb = inode->i_sb;

- cifs_fscache_release_inode_cookie(inode);
-
/*
* Delete any outstanding lock records. We'll lose them when the file
* is closed anyway.


2022-01-07 22:16:43

by David Howells

[permalink] [raw]
Subject: [PATCH v5 63/68] cifs: Support fscache indexing rewrite

Change the cifs filesystem to take account of the changes to fscache's
indexing rewrite and reenable caching in cifs.

The following changes have been made:

(1) The fscache_netfs struct is no more, and there's no need to register
the filesystem as a whole.

(2) The session cookie is now an fscache_volume cookie, allocated with
fscache_acquire_volume(). That takes three parameters: a string
representing the "volume" in the index, a string naming the cache to
use (or NULL) and a u64 that conveys coherency metadata for the
volume.

For cifs, I've made it render the volume name string as:

"cifs,<ipaddress>,<sharename>"

where the sharename has '/' characters replaced with ';'.

This probably needs rethinking a bit as the total name could exceed
the maximum filename component length.

Further, the coherency data is currently just set to 0. It needs
something else doing with it - I wonder if it would suffice simply to
sum the resource_id, vol_create_time and vol_serial_number or maybe
hash them.

(3) The fscache_cookie_def is no more and needed information is passed
directly to fscache_acquire_cookie(). The cache no longer calls back
into the filesystem, but rather metadata changes are indicated at
other times.

fscache_acquire_cookie() is passed the same keying and coherency
information as before.

(4) The functions to set/reset cookies are removed and
fscache_use_cookie() and fscache_unuse_cookie() are used instead.

fscache_use_cookie() is passed a flag to indicate if the cookie is
opened for writing. fscache_unuse_cookie() is passed updates for the
metadata if we changed it (ie. if the file was opened for writing).

These are called when the file is opened or closed.

(5) cifs_setattr_*() are made to call fscache_resize() to change the size
of the cache object.

(6) The functions to read and write data are stubbed out pending a
conversion to use netfslib.

Changes
=======
ver #5:
- Fixed a couple of bits of cookie handling[2]:
- The cookie should be released in cifs_evict_inode(), not
cifsFileInfo_put_final(). The cookie needs to persist beyond file
closure so that writepages will be able to write to it.
- fscache_use_cookie() needs to be called in cifs_atomic_open() as it is
for cifs_open().

ver #4:
- Fixed the use of sizeof with memset.
- tcon->vol_create_time is __le64 so doesn't need cpu_to_le64().

ver #3:
- Canonicalise the cifs coherency data to make the cache portable.
- Set volume coherency data.

ver #2:
- Use gfpflags_allow_blocking() rather than using flag directly.
- Upgraded to -rc4 to allow for upstream changes[1].
- fscache_acquire_volume() now returns errors.

Signed-off-by: David Howells <[email protected]>
Acked-by: Jeff Layton <[email protected]>
cc: Steve French <[email protected]>
cc: Shyam Prasad N <[email protected]>
cc: [email protected]
cc: [email protected]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23b55d673d7527b093cd97b7c217c82e70cd1af0 [1]
Link: https://lore.kernel.org/r/[email protected]/ [2]
Link: https://lore.kernel.org/r/163819671009.215744.11230627184193298714.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906982979.143852.10672081929614953210.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967187187.1823006.247415138444991444.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021579335.640689.2681324337038770579.stgit@warthog.procyon.org.uk/ # v4
---
fs/cifs/Kconfig | 2
fs/cifs/Makefile | 2
fs/cifs/cache.c | 105 ----------------
fs/cifs/cifsfs.c | 19 +--
fs/cifs/cifsglob.h | 5
fs/cifs/connect.c | 12 -
fs/cifs/dir.c | 4
fs/cifs/file.c | 66 ++++++----
fs/cifs/fscache.c | 333 ++++++++++++-----------------------------------------
fs/cifs/fscache.h | 126 ++++++--------------
fs/cifs/inode.c | 36 +++--
11 files changed, 203 insertions(+), 507 deletions(-)

diff --git a/fs/cifs/Kconfig b/fs/cifs/Kconfig
index 346ae8716deb..3b7e3b9e4fd2 100644
--- a/fs/cifs/Kconfig
+++ b/fs/cifs/Kconfig
@@ -188,7 +188,7 @@ config CIFS_SMB_DIRECT

config CIFS_FSCACHE
bool "Provide CIFS client caching support"
- depends on CIFS=m && FSCACHE_OLD_API || CIFS=y && FSCACHE_OLD_API=y
+ depends on CIFS=m && FSCACHE || CIFS=y && FSCACHE=y
help
Makes CIFS FS-Cache capable. Say Y here if you want your CIFS data
to be cached locally on disk through the general filesystem cache
diff --git a/fs/cifs/Makefile b/fs/cifs/Makefile
index 87fcacdf3de7..cc8fdcb35b71 100644
--- a/fs/cifs/Makefile
+++ b/fs/cifs/Makefile
@@ -25,7 +25,7 @@ cifs-$(CONFIG_CIFS_DFS_UPCALL) += cifs_dfs_ref.o dfs_cache.o

cifs-$(CONFIG_CIFS_SWN_UPCALL) += netlink.o cifs_swn.o

-cifs-$(CONFIG_CIFS_FSCACHE) += fscache.o cache.o
+cifs-$(CONFIG_CIFS_FSCACHE) += fscache.o

cifs-$(CONFIG_CIFS_SMB_DIRECT) += smbdirect.o

diff --git a/fs/cifs/cache.c b/fs/cifs/cache.c
deleted file mode 100644
index 8be57aaedab6..000000000000
--- a/fs/cifs/cache.c
+++ /dev/null
@@ -1,105 +0,0 @@
-// SPDX-License-Identifier: LGPL-2.1
-/*
- * CIFS filesystem cache index structure definitions
- *
- * Copyright (c) 2010 Novell, Inc.
- * Authors(s): Suresh Jayaraman ([email protected]>
- *
- */
-#include "fscache.h"
-#include "cifs_debug.h"
-
-/*
- * CIFS filesystem definition for FS-Cache
- */
-struct fscache_netfs cifs_fscache_netfs = {
- .name = "cifs",
- .version = 0,
-};
-
-/*
- * Register CIFS for caching with FS-Cache
- */
-int cifs_fscache_register(void)
-{
- return fscache_register_netfs(&cifs_fscache_netfs);
-}
-
-/*
- * Unregister CIFS for caching
- */
-void cifs_fscache_unregister(void)
-{
- fscache_unregister_netfs(&cifs_fscache_netfs);
-}
-
-/*
- * Server object for FS-Cache
- */
-const struct fscache_cookie_def cifs_fscache_server_index_def = {
- .name = "CIFS.server",
- .type = FSCACHE_COOKIE_TYPE_INDEX,
-};
-
-static enum
-fscache_checkaux cifs_fscache_super_check_aux(void *cookie_netfs_data,
- const void *data,
- uint16_t datalen,
- loff_t object_size)
-{
- struct cifs_fscache_super_auxdata auxdata;
- const struct cifs_tcon *tcon = cookie_netfs_data;
-
- if (datalen != sizeof(auxdata))
- return FSCACHE_CHECKAUX_OBSOLETE;
-
- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.resource_id = tcon->resource_id;
- auxdata.vol_create_time = tcon->vol_create_time;
- auxdata.vol_serial_number = tcon->vol_serial_number;
-
- if (memcmp(data, &auxdata, datalen) != 0)
- return FSCACHE_CHECKAUX_OBSOLETE;
-
- return FSCACHE_CHECKAUX_OKAY;
-}
-
-/*
- * Superblock object for FS-Cache
- */
-const struct fscache_cookie_def cifs_fscache_super_index_def = {
- .name = "CIFS.super",
- .type = FSCACHE_COOKIE_TYPE_INDEX,
- .check_aux = cifs_fscache_super_check_aux,
-};
-
-static enum
-fscache_checkaux cifs_fscache_inode_check_aux(void *cookie_netfs_data,
- const void *data,
- uint16_t datalen,
- loff_t object_size)
-{
- struct cifs_fscache_inode_auxdata auxdata;
- struct cifsInodeInfo *cifsi = cookie_netfs_data;
-
- if (datalen != sizeof(auxdata))
- return FSCACHE_CHECKAUX_OBSOLETE;
-
- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.eof = cifsi->server_eof;
- auxdata.last_write_time_sec = cifsi->vfs_inode.i_mtime.tv_sec;
- auxdata.last_change_time_sec = cifsi->vfs_inode.i_ctime.tv_sec;
- auxdata.last_write_time_nsec = cifsi->vfs_inode.i_mtime.tv_nsec;
- auxdata.last_change_time_nsec = cifsi->vfs_inode.i_ctime.tv_nsec;
-
- if (memcmp(data, &auxdata, datalen) != 0)
- return FSCACHE_CHECKAUX_OBSOLETE;
-
- return FSCACHE_CHECKAUX_OKAY;
-}
-
-const struct fscache_cookie_def cifs_fscache_inode_object_def = {
- .name = "CIFS.uniqueid",
- .type = FSCACHE_COOKIE_TYPE_DATAFILE,
- .check_aux = cifs_fscache_inode_check_aux,
-};
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index dca42aa87d30..26cf2193c9a2 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -396,6 +396,9 @@ static void
cifs_evict_inode(struct inode *inode)
{
truncate_inode_pages_final(&inode->i_data);
+ if (inode->i_state & I_PINNING_FSCACHE_WB)
+ cifs_fscache_unuse_inode_cookie(inode, true);
+ cifs_fscache_release_inode_cookie(inode);
clear_inode(inode);
}

@@ -720,6 +723,12 @@ static int cifs_show_stats(struct seq_file *s, struct dentry *root)
}
#endif

+static int cifs_write_inode(struct inode *inode, struct writeback_control *wbc)
+{
+ fscache_unpin_writeback(wbc, cifs_inode_cookie(inode));
+ return 0;
+}
+
static int cifs_drop_inode(struct inode *inode)
{
struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
@@ -732,6 +741,7 @@ static int cifs_drop_inode(struct inode *inode)
static const struct super_operations cifs_super_ops = {
.statfs = cifs_statfs,
.alloc_inode = cifs_alloc_inode,
+ .write_inode = cifs_write_inode,
.free_inode = cifs_free_inode,
.drop_inode = cifs_drop_inode,
.evict_inode = cifs_evict_inode,
@@ -1624,13 +1634,9 @@ init_cifs(void)
goto out_destroy_cifsoplockd_wq;
}

- rc = cifs_fscache_register();
- if (rc)
- goto out_destroy_deferredclose_wq;
-
rc = cifs_init_inodecache();
if (rc)
- goto out_unreg_fscache;
+ goto out_destroy_deferredclose_wq;

rc = cifs_init_mids();
if (rc)
@@ -1692,8 +1698,6 @@ init_cifs(void)
cifs_destroy_mids();
out_destroy_inodecache:
cifs_destroy_inodecache();
-out_unreg_fscache:
- cifs_fscache_unregister();
out_destroy_deferredclose_wq:
destroy_workqueue(deferredclose_wq);
out_destroy_cifsoplockd_wq:
@@ -1729,7 +1733,6 @@ exit_cifs(void)
cifs_destroy_request_bufs();
cifs_destroy_mids();
cifs_destroy_inodecache();
- cifs_fscache_unregister();
destroy_workqueue(deferredclose_wq);
destroy_workqueue(cifsoplockd_wq);
destroy_workqueue(decrypt_wq);
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h
index be74606724c7..ba6fbb1ad8f3 100644
--- a/fs/cifs/cifsglob.h
+++ b/fs/cifs/cifsglob.h
@@ -659,9 +659,6 @@ struct TCP_Server_Info {
unsigned int total_read; /* total amount of data read in this pass */
atomic_t in_send; /* requests trying to send */
atomic_t num_waiters; /* blocked waiting to get in sendrecv */
-#ifdef CONFIG_CIFS_FSCACHE
- struct fscache_cookie *fscache; /* client index cache cookie */
-#endif
#ifdef CONFIG_CIFS_STATS2
atomic_t num_cmds[NUMBER_OF_SMB2_COMMANDS]; /* total requests by cmd */
atomic_t smb2slowcmd[NUMBER_OF_SMB2_COMMANDS]; /* count resps > 1 sec */
@@ -1117,7 +1114,7 @@ struct cifs_tcon {
__u32 max_bytes_copy;
#ifdef CONFIG_CIFS_FSCACHE
u64 resource_id; /* server resource id */
- struct fscache_cookie *fscache; /* cookie for share */
+ struct fscache_volume *fscache; /* cookie for share */
#endif
struct list_head pending_opens; /* list of incomplete opens */
struct cached_fid crfid; /* Cached root fid */
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 18448dbd762a..a52fd3a30c88 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -1396,10 +1396,6 @@ cifs_put_tcp_session(struct TCP_Server_Info *server, int from_reconnect)

cifs_crypto_secmech_release(server);

- /* fscache server cookies are based on primary channel only */
- if (!CIFS_SERVER_IS_CHAN(server))
- cifs_fscache_release_client_cookie(server);
-
kfree(server->session_key.response);
server->session_key.response = NULL;
server->session_key.len = 0;
@@ -1559,14 +1555,6 @@ cifs_get_tcp_session(struct smb3_fs_context *ctx,
list_add(&tcp_ses->tcp_ses_list, &cifs_tcp_ses_list);
spin_unlock(&cifs_tcp_ses_lock);

- /* fscache server cookies are based on primary channel only */
- if (!CIFS_SERVER_IS_CHAN(tcp_ses))
- cifs_fscache_get_client_cookie(tcp_ses);
-#ifdef CONFIG_CIFS_FSCACHE
- else
- tcp_ses->fscache = tcp_ses->primary_server->fscache;
-#endif /* CONFIG_CIFS_FSCACHE */
-
/* queue echo request delayed work */
queue_delayed_work(cifsiod_wq, &tcp_ses->echo, tcp_ses->echo_interval);

diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
index 6e8e7cc26ae2..6186824b366e 100644
--- a/fs/cifs/dir.c
+++ b/fs/cifs/dir.c
@@ -22,6 +22,7 @@
#include "cifs_unicode.h"
#include "fs_context.h"
#include "cifs_ioctl.h"
+#include "fscache.h"

static void
renew_parental_timestamps(struct dentry *direntry)
@@ -509,6 +510,9 @@ cifs_atomic_open(struct inode *inode, struct dentry *direntry,
rc = -ENOMEM;
}

+ fscache_use_cookie(cifs_inode_cookie(file_inode(file)),
+ file->f_mode & FMODE_WRITE);
+
out:
cifs_put_tlink(tlink);
out_free_xid:
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 9fee3af83a73..c13807e75698 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -376,8 +376,6 @@ static void cifsFileInfo_put_final(struct cifsFileInfo *cifs_file)
struct cifsLockInfo *li, *tmp;
struct super_block *sb = inode->i_sb;

- cifs_fscache_release_inode_cookie(inode);
-
/*
* Delete any outstanding lock records. We'll lose them when the file
* is closed anyway.
@@ -632,7 +630,18 @@ int cifs_open(struct inode *inode, struct file *file)
goto out;
}

- cifs_fscache_set_inode_cookie(inode, file);
+
+ fscache_use_cookie(cifs_inode_cookie(file_inode(file)),
+ file->f_mode & FMODE_WRITE);
+ if (file->f_flags & O_DIRECT &&
+ (!((file->f_flags & O_ACCMODE) != O_RDONLY) ||
+ file->f_flags & O_APPEND)) {
+ struct cifs_fscache_inode_coherency_data cd;
+ cifs_fscache_fill_coherency(file_inode(file), &cd);
+ fscache_invalidate(cifs_inode_cookie(file_inode(file)),
+ &cd, i_size_read(file_inode(file)),
+ FSCACHE_INVAL_DIO_WRITE);
+ }

if ((oplock & CIFS_CREATE_ACTION) && !posix_open_ok && tcon->unix_ext) {
/*
@@ -876,6 +885,8 @@ int cifs_close(struct inode *inode, struct file *file)
struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
struct cifs_deferred_close *dclose;

+ cifs_fscache_unuse_inode_cookie(inode, file->f_mode & FMODE_WRITE);
+
if (file->private_data != NULL) {
cfile = file->private_data;
file->private_data = NULL;
@@ -886,7 +897,6 @@ int cifs_close(struct inode *inode, struct file *file)
dclose) {
if (test_and_clear_bit(CIFS_INO_MODIFIED_ATTR, &cinode->flags)) {
inode->i_ctime = inode->i_mtime = current_time(inode);
- cifs_fscache_update_inode_cookie(inode);
}
spin_lock(&cinode->deferred_lock);
cifs_add_deferred_close(cfile, dclose);
@@ -4198,10 +4208,12 @@ static vm_fault_t
cifs_page_mkwrite(struct vm_fault *vmf)
{
struct page *page = vmf->page;
- struct file *file = vmf->vma->vm_file;
- struct inode *inode = file_inode(file);

- cifs_fscache_wait_on_page_write(inode, page);
+#ifdef CONFIG_CIFS_FSCACHE
+ if (PageFsCache(page) &&
+ wait_on_page_fscache_killable(page) < 0)
+ return VM_FAULT_RETRY;
+#endif

lock_page(page);
return VM_FAULT_LOCKED;
@@ -4275,8 +4287,6 @@ cifs_readv_complete(struct work_struct *work)
if (rdata->result == 0 ||
(rdata->result == -EAGAIN && got_bytes))
cifs_readpage_to_fscache(rdata->mapping->host, page);
- else
- cifs_fscache_uncache_page(rdata->mapping->host, page);

got_bytes -= min_t(unsigned int, PAGE_SIZE, got_bytes);

@@ -4593,11 +4603,6 @@ static int cifs_readpages(struct file *file, struct address_space *mapping,
kref_put(&rdata->refcount, cifs_readdata_release);
}

- /* Any pages that have been shown to fscache but didn't get added to
- * the pagecache must be uncached before they get returned to the
- * allocator.
- */
- cifs_fscache_readpages_cancel(mapping->host, page_list);
free_xid(xid);
return rc;
}
@@ -4801,17 +4806,19 @@ static int cifs_release_page(struct page *page, gfp_t gfp)
{
if (PagePrivate(page))
return 0;
-
- return cifs_fscache_release_page(page, gfp);
+ if (PageFsCache(page)) {
+ if (!gfpflags_allow_blocking(gfp) || !(gfp & __GFP_FS))
+ return false;
+ wait_on_page_fscache(page);
+ }
+ fscache_note_page_release(cifs_inode_cookie(page->mapping->host));
+ return true;
}

static void cifs_invalidate_page(struct page *page, unsigned int offset,
unsigned int length)
{
- struct cifsInodeInfo *cifsi = CIFS_I(page->mapping->host);
-
- if (offset == 0 && length == PAGE_SIZE)
- cifs_fscache_invalidate_page(page, &cifsi->vfs_inode);
+ wait_on_page_fscache(page);
}

static int cifs_launder_page(struct page *page)
@@ -4831,7 +4838,7 @@ static int cifs_launder_page(struct page *page)
if (clear_page_dirty_for_io(page))
rc = cifs_writepage_locked(page, &wbc);

- cifs_fscache_invalidate_page(page, page->mapping->host);
+ wait_on_page_fscache(page);
return rc;
}

@@ -4988,6 +4995,19 @@ static void cifs_swap_deactivate(struct file *file)
/* do we need to unpin (or unlock) the file */
}

+/*
+ * Mark a page as having been made dirty and thus needing writeback. We also
+ * need to pin the cache object to write back to.
+ */
+#ifdef CONFIG_CIFS_FSCACHE
+static int cifs_set_page_dirty(struct page *page)
+{
+ return fscache_set_page_dirty(page, cifs_inode_cookie(page->mapping->host));
+}
+#else
+#define cifs_set_page_dirty __set_page_dirty_nobuffers
+#endif
+
const struct address_space_operations cifs_addr_ops = {
.readpage = cifs_readpage,
.readpages = cifs_readpages,
@@ -4995,7 +5015,7 @@ const struct address_space_operations cifs_addr_ops = {
.writepages = cifs_writepages,
.write_begin = cifs_write_begin,
.write_end = cifs_write_end,
- .set_page_dirty = __set_page_dirty_nobuffers,
+ .set_page_dirty = cifs_set_page_dirty,
.releasepage = cifs_release_page,
.direct_IO = cifs_direct_io,
.invalidatepage = cifs_invalidate_page,
@@ -5020,7 +5040,7 @@ const struct address_space_operations cifs_addr_ops_smallbuf = {
.writepages = cifs_writepages,
.write_begin = cifs_write_begin,
.write_end = cifs_write_end,
- .set_page_dirty = __set_page_dirty_nobuffers,
+ .set_page_dirty = cifs_set_page_dirty,
.releasepage = cifs_release_page,
.invalidatepage = cifs_invalidate_page,
.launder_page = cifs_launder_page,
diff --git a/fs/cifs/fscache.c b/fs/cifs/fscache.c
index 003c5f1f4dfb..efaac4d5ff55 100644
--- a/fs/cifs/fscache.c
+++ b/fs/cifs/fscache.c
@@ -12,250 +12,136 @@
#include "cifs_fs_sb.h"
#include "cifsproto.h"

-/*
- * Key layout of CIFS server cache index object
- */
-struct cifs_server_key {
- __u64 conn_id;
-} __packed;
-
-/*
- * Get a cookie for a server object keyed by {IPaddress,port,family} tuple
- */
-void cifs_fscache_get_client_cookie(struct TCP_Server_Info *server)
-{
- struct cifs_server_key key;
-
- /*
- * Check if cookie was already initialized so don't reinitialize it.
- * In the future, as we integrate with newer fscache features,
- * we may want to instead add a check if cookie has changed
- */
- if (server->fscache)
- return;
-
- memset(&key, 0, sizeof(key));
- key.conn_id = server->conn_id;
-
- server->fscache =
- fscache_acquire_cookie(cifs_fscache_netfs.primary_index,
- &cifs_fscache_server_index_def,
- &key, sizeof(key),
- NULL, 0,
- server, 0, true);
- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n",
- __func__, server, server->fscache);
-}
-
-void cifs_fscache_release_client_cookie(struct TCP_Server_Info *server)
+static void cifs_fscache_fill_volume_coherency(
+ struct cifs_tcon *tcon,
+ struct cifs_fscache_volume_coherency_data *cd)
{
- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n",
- __func__, server, server->fscache);
- fscache_relinquish_cookie(server->fscache, NULL, false);
- server->fscache = NULL;
+ memset(cd, 0, sizeof(*cd));
+ cd->resource_id = cpu_to_le64(tcon->resource_id);
+ cd->vol_create_time = tcon->vol_create_time;
+ cd->vol_serial_number = cpu_to_le32(tcon->vol_serial_number);
}

-void cifs_fscache_get_super_cookie(struct cifs_tcon *tcon)
+int cifs_fscache_get_super_cookie(struct cifs_tcon *tcon)
{
+ struct cifs_fscache_volume_coherency_data cd;
struct TCP_Server_Info *server = tcon->ses->server;
+ struct fscache_volume *vcookie;
+ const struct sockaddr *sa = (struct sockaddr *)&server->dstaddr;
+ size_t slen, i;
char *sharename;
- struct cifs_fscache_super_auxdata auxdata;
+ char *key;
+ int ret = -ENOMEM;

- /*
- * Check if cookie was already initialized so don't reinitialize it.
- * In the future, as we integrate with newer fscache features,
- * we may want to instead add a check if cookie has changed
- */
- if (tcon->fscache)
- return;
+ tcon->fscache = NULL;
+ switch (sa->sa_family) {
+ case AF_INET:
+ case AF_INET6:
+ break;
+ default:
+ cifs_dbg(VFS, "Unknown network family '%d'\n", sa->sa_family);
+ return -EINVAL;
+ }
+
+ memset(&key, 0, sizeof(key));

sharename = extract_sharename(tcon->treeName);
if (IS_ERR(sharename)) {
cifs_dbg(FYI, "%s: couldn't extract sharename\n", __func__);
- tcon->fscache = NULL;
- return;
+ return -EINVAL;
}

- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.resource_id = tcon->resource_id;
- auxdata.vol_create_time = tcon->vol_create_time;
- auxdata.vol_serial_number = tcon->vol_serial_number;
+ slen = strlen(sharename);
+ for (i = 0; i < slen; i++)
+ if (sharename[i] == '/')
+ sharename[i] = ';';
+
+ key = kasprintf(GFP_KERNEL, "cifs,%pISpc,%s", sa, sharename);
+ if (!key)
+ goto out;
+
+ cifs_fscache_fill_volume_coherency(tcon, &cd);
+ vcookie = fscache_acquire_volume(key,
+ NULL, /* preferred_cache */
+ &cd, sizeof(cd));
+ cifs_dbg(FYI, "%s: (%s/0x%p)\n", __func__, key, vcookie);
+ if (IS_ERR(vcookie)) {
+ if (vcookie != ERR_PTR(-EBUSY)) {
+ ret = PTR_ERR(vcookie);
+ goto out_2;
+ }
+ pr_err("Cache volume key already in use (%s)\n", key);
+ vcookie = NULL;
+ }

- tcon->fscache =
- fscache_acquire_cookie(server->fscache,
- &cifs_fscache_super_index_def,
- sharename, strlen(sharename),
- &auxdata, sizeof(auxdata),
- tcon, 0, true);
+ tcon->fscache = vcookie;
+ ret = 0;
+out_2:
+ kfree(key);
+out:
kfree(sharename);
- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n",
- __func__, server->fscache, tcon->fscache);
+ return ret;
}

void cifs_fscache_release_super_cookie(struct cifs_tcon *tcon)
{
- struct cifs_fscache_super_auxdata auxdata;
-
- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.resource_id = tcon->resource_id;
- auxdata.vol_create_time = tcon->vol_create_time;
- auxdata.vol_serial_number = tcon->vol_serial_number;
+ struct cifs_fscache_volume_coherency_data cd;

cifs_dbg(FYI, "%s: (0x%p)\n", __func__, tcon->fscache);
- fscache_relinquish_cookie(tcon->fscache, &auxdata, false);
- tcon->fscache = NULL;
-}
-
-static void cifs_fscache_acquire_inode_cookie(struct cifsInodeInfo *cifsi,
- struct cifs_tcon *tcon)
-{
- struct cifs_fscache_inode_auxdata auxdata;

- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.eof = cifsi->server_eof;
- auxdata.last_write_time_sec = cifsi->vfs_inode.i_mtime.tv_sec;
- auxdata.last_change_time_sec = cifsi->vfs_inode.i_ctime.tv_sec;
- auxdata.last_write_time_nsec = cifsi->vfs_inode.i_mtime.tv_nsec;
- auxdata.last_change_time_nsec = cifsi->vfs_inode.i_ctime.tv_nsec;
-
- cifsi->fscache =
- fscache_acquire_cookie(tcon->fscache,
- &cifs_fscache_inode_object_def,
- &cifsi->uniqueid, sizeof(cifsi->uniqueid),
- &auxdata, sizeof(auxdata),
- cifsi, cifsi->vfs_inode.i_size, true);
+ cifs_fscache_fill_volume_coherency(tcon, &cd);
+ fscache_relinquish_volume(tcon->fscache, &cd, false);
+ tcon->fscache = NULL;
}

-static void cifs_fscache_enable_inode_cookie(struct inode *inode)
+void cifs_fscache_get_inode_cookie(struct inode *inode)
{
+ struct cifs_fscache_inode_coherency_data cd;
struct cifsInodeInfo *cifsi = CIFS_I(inode);
struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);

- if (cifsi->fscache)
- return;
-
- if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_FSCACHE))
- return;
-
- cifs_fscache_acquire_inode_cookie(cifsi, tcon);
+ cifs_fscache_fill_coherency(&cifsi->vfs_inode, &cd);

- cifs_dbg(FYI, "%s: got FH cookie (0x%p/0x%p)\n",
- __func__, tcon->fscache, cifsi->fscache);
+ cifsi->fscache =
+ fscache_acquire_cookie(tcon->fscache, 0,
+ &cifsi->uniqueid, sizeof(cifsi->uniqueid),
+ &cd, sizeof(cd),
+ i_size_read(&cifsi->vfs_inode));
}

-void cifs_fscache_release_inode_cookie(struct inode *inode)
+void cifs_fscache_unuse_inode_cookie(struct inode *inode, bool update)
{
- struct cifs_fscache_inode_auxdata auxdata;
- struct cifsInodeInfo *cifsi = CIFS_I(inode);
-
- if (cifsi->fscache) {
- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.eof = cifsi->server_eof;
- auxdata.last_write_time_sec = cifsi->vfs_inode.i_mtime.tv_sec;
- auxdata.last_change_time_sec = cifsi->vfs_inode.i_ctime.tv_sec;
- auxdata.last_write_time_nsec = cifsi->vfs_inode.i_mtime.tv_nsec;
- auxdata.last_change_time_nsec = cifsi->vfs_inode.i_ctime.tv_nsec;
+ if (update) {
+ struct cifs_fscache_inode_coherency_data cd;
+ loff_t i_size = i_size_read(inode);

- cifs_dbg(FYI, "%s: (0x%p)\n", __func__, cifsi->fscache);
- /* fscache_relinquish_cookie does not seem to update auxdata */
- fscache_update_cookie(cifsi->fscache, &auxdata);
- fscache_relinquish_cookie(cifsi->fscache, &auxdata, false);
- cifsi->fscache = NULL;
+ cifs_fscache_fill_coherency(inode, &cd);
+ fscache_unuse_cookie(cifs_inode_cookie(inode), &cd, &i_size);
+ } else {
+ fscache_unuse_cookie(cifs_inode_cookie(inode), NULL, NULL);
}
}

-void cifs_fscache_update_inode_cookie(struct inode *inode)
+void cifs_fscache_release_inode_cookie(struct inode *inode)
{
- struct cifs_fscache_inode_auxdata auxdata;
struct cifsInodeInfo *cifsi = CIFS_I(inode);

if (cifsi->fscache) {
- memset(&auxdata, 0, sizeof(auxdata));
- auxdata.eof = cifsi->server_eof;
- auxdata.last_write_time_sec = cifsi->vfs_inode.i_mtime.tv_sec;
- auxdata.last_change_time_sec = cifsi->vfs_inode.i_ctime.tv_sec;
- auxdata.last_write_time_nsec = cifsi->vfs_inode.i_mtime.tv_nsec;
- auxdata.last_change_time_nsec = cifsi->vfs_inode.i_ctime.tv_nsec;
-
cifs_dbg(FYI, "%s: (0x%p)\n", __func__, cifsi->fscache);
- fscache_update_cookie(cifsi->fscache, &auxdata);
- }
-}
-
-void cifs_fscache_set_inode_cookie(struct inode *inode, struct file *filp)
-{
- cifs_fscache_enable_inode_cookie(inode);
-}
-
-void cifs_fscache_reset_inode_cookie(struct inode *inode)
-{
- struct cifsInodeInfo *cifsi = CIFS_I(inode);
- struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
- struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb);
- struct fscache_cookie *old = cifsi->fscache;
-
- if (cifsi->fscache) {
- /* retire the current fscache cache and get a new one */
- fscache_relinquish_cookie(cifsi->fscache, NULL, true);
-
- cifs_fscache_acquire_inode_cookie(cifsi, tcon);
- cifs_dbg(FYI, "%s: new cookie 0x%p oldcookie 0x%p\n",
- __func__, cifsi->fscache, old);
+ fscache_relinquish_cookie(cifsi->fscache, false);
+ cifsi->fscache = NULL;
}
}

-int cifs_fscache_release_page(struct page *page, gfp_t gfp)
-{
- if (PageFsCache(page)) {
- struct inode *inode = page->mapping->host;
- struct cifsInodeInfo *cifsi = CIFS_I(inode);
-
- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n",
- __func__, page, cifsi->fscache);
- if (!fscache_maybe_release_page(cifsi->fscache, page, gfp))
- return 0;
- }
-
- return 1;
-}
-
-static void cifs_readpage_from_fscache_complete(struct page *page, void *ctx,
- int error)
-{
- cifs_dbg(FYI, "%s: (0x%p/%d)\n", __func__, page, error);
- if (!error)
- SetPageUptodate(page);
- unlock_page(page);
-}
-
/*
* Retrieve a page from FS-Cache
*/
int __cifs_readpage_from_fscache(struct inode *inode, struct page *page)
{
- int ret;
-
cifs_dbg(FYI, "%s: (fsc:%p, p:%p, i:0x%p\n",
__func__, CIFS_I(inode)->fscache, page, inode);
- ret = fscache_read_or_alloc_page(CIFS_I(inode)->fscache, page,
- cifs_readpage_from_fscache_complete,
- NULL,
- GFP_KERNEL);
- switch (ret) {
-
- case 0: /* page found in fscache, read submitted */
- cifs_dbg(FYI, "%s: submitted\n", __func__);
- return ret;
- case -ENOBUFS: /* page won't be cached */
- case -ENODATA: /* page not in cache */
- cifs_dbg(FYI, "%s: %d\n", __func__, ret);
- return 1;
-
- default:
- cifs_dbg(VFS, "unknown error ret = %d\n", ret);
- }
- return ret;
+ return -ENOBUFS; // Needs conversion to using netfslib
}

/*
@@ -266,78 +152,19 @@ int __cifs_readpages_from_fscache(struct inode *inode,
struct list_head *pages,
unsigned *nr_pages)
{
- int ret;
-
cifs_dbg(FYI, "%s: (0x%p/%u/0x%p)\n",
__func__, CIFS_I(inode)->fscache, *nr_pages, inode);
- ret = fscache_read_or_alloc_pages(CIFS_I(inode)->fscache, mapping,
- pages, nr_pages,
- cifs_readpage_from_fscache_complete,
- NULL,
- mapping_gfp_mask(mapping));
- switch (ret) {
- case 0: /* read submitted to the cache for all pages */
- cifs_dbg(FYI, "%s: submitted\n", __func__);
- return ret;
-
- case -ENOBUFS: /* some pages are not cached and can't be */
- case -ENODATA: /* some pages are not cached */
- cifs_dbg(FYI, "%s: no page\n", __func__);
- return 1;
-
- default:
- cifs_dbg(FYI, "unknown error ret = %d\n", ret);
- }
-
- return ret;
+ return -ENOBUFS; // Needs conversion to using netfslib
}

void __cifs_readpage_to_fscache(struct inode *inode, struct page *page)
{
struct cifsInodeInfo *cifsi = CIFS_I(inode);
- int ret;

WARN_ON(!cifsi->fscache);

cifs_dbg(FYI, "%s: (fsc: %p, p: %p, i: %p)\n",
__func__, cifsi->fscache, page, inode);
- ret = fscache_write_page(cifsi->fscache, page,
- cifsi->vfs_inode.i_size, GFP_KERNEL);
- if (ret != 0)
- fscache_uncache_page(cifsi->fscache, page);
-}
-
-void __cifs_fscache_readpages_cancel(struct inode *inode, struct list_head *pages)
-{
- cifs_dbg(FYI, "%s: (fsc: %p, i: %p)\n",
- __func__, CIFS_I(inode)->fscache, inode);
- fscache_readpages_cancel(CIFS_I(inode)->fscache, pages);
-}
-
-void __cifs_fscache_invalidate_page(struct page *page, struct inode *inode)
-{
- struct cifsInodeInfo *cifsi = CIFS_I(inode);
- struct fscache_cookie *cookie = cifsi->fscache;
-
- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n", __func__, page, cookie);
- fscache_wait_on_page_write(cookie, page);
- fscache_uncache_page(cookie, page);
-}
-
-void __cifs_fscache_wait_on_page_write(struct inode *inode, struct page *page)
-{
- struct cifsInodeInfo *cifsi = CIFS_I(inode);
- struct fscache_cookie *cookie = cifsi->fscache;
-
- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n", __func__, page, cookie);
- fscache_wait_on_page_write(cookie, page);
-}
-
-void __cifs_fscache_uncache_page(struct inode *inode, struct page *page)
-{
- struct cifsInodeInfo *cifsi = CIFS_I(inode);
- struct fscache_cookie *cookie = cifsi->fscache;

- cifs_dbg(FYI, "%s: (0x%p/0x%p)\n", __func__, page, cookie);
- fscache_uncache_page(cookie, page);
+ // Needs conversion to using netfslib
}
diff --git a/fs/cifs/fscache.h b/fs/cifs/fscache.h
index 9baa1d0f22bd..0fc3f9252c84 100644
--- a/fs/cifs/fscache.h
+++ b/fs/cifs/fscache.h
@@ -13,84 +13,62 @@

#include "cifsglob.h"

-#ifdef CONFIG_CIFS_FSCACHE
-
/*
- * Auxiliary data attached to CIFS superblock within the cache
+ * Coherency data attached to CIFS volume within the cache
*/
-struct cifs_fscache_super_auxdata {
- u64 resource_id; /* unique server resource id */
+struct cifs_fscache_volume_coherency_data {
+ __le64 resource_id; /* unique server resource id */
__le64 vol_create_time;
- u32 vol_serial_number;
+ __le32 vol_serial_number;
} __packed;

/*
- * Auxiliary data attached to CIFS inode within the cache
+ * Coherency data attached to CIFS inode within the cache.
*/
-struct cifs_fscache_inode_auxdata {
- u64 last_write_time_sec;
- u64 last_change_time_sec;
- u32 last_write_time_nsec;
- u32 last_change_time_nsec;
- u64 eof;
+struct cifs_fscache_inode_coherency_data {
+ __le64 last_write_time_sec;
+ __le64 last_change_time_sec;
+ __le32 last_write_time_nsec;
+ __le32 last_change_time_nsec;
};

-/*
- * cache.c
- */
-extern struct fscache_netfs cifs_fscache_netfs;
-extern const struct fscache_cookie_def cifs_fscache_server_index_def;
-extern const struct fscache_cookie_def cifs_fscache_super_index_def;
-extern const struct fscache_cookie_def cifs_fscache_inode_object_def;
-
-extern int cifs_fscache_register(void);
-extern void cifs_fscache_unregister(void);
+#ifdef CONFIG_CIFS_FSCACHE

/*
* fscache.c
*/
-extern void cifs_fscache_get_client_cookie(struct TCP_Server_Info *);
-extern void cifs_fscache_release_client_cookie(struct TCP_Server_Info *);
-extern void cifs_fscache_get_super_cookie(struct cifs_tcon *);
+extern int cifs_fscache_get_super_cookie(struct cifs_tcon *);
extern void cifs_fscache_release_super_cookie(struct cifs_tcon *);

+extern void cifs_fscache_get_inode_cookie(struct inode *);
extern void cifs_fscache_release_inode_cookie(struct inode *);
-extern void cifs_fscache_update_inode_cookie(struct inode *inode);
-extern void cifs_fscache_set_inode_cookie(struct inode *, struct file *);
-extern void cifs_fscache_reset_inode_cookie(struct inode *);
+extern void cifs_fscache_unuse_inode_cookie(struct inode *, bool);
+
+static inline
+void cifs_fscache_fill_coherency(struct inode *inode,
+ struct cifs_fscache_inode_coherency_data *cd)
+{
+ struct cifsInodeInfo *cifsi = CIFS_I(inode);
+
+ memset(cd, 0, sizeof(*cd));
+ cd->last_write_time_sec = cpu_to_le64(cifsi->vfs_inode.i_mtime.tv_sec);
+ cd->last_write_time_nsec = cpu_to_le32(cifsi->vfs_inode.i_mtime.tv_nsec);
+ cd->last_change_time_sec = cpu_to_le64(cifsi->vfs_inode.i_ctime.tv_sec);
+ cd->last_change_time_nsec = cpu_to_le32(cifsi->vfs_inode.i_ctime.tv_nsec);
+}
+

-extern void __cifs_fscache_invalidate_page(struct page *, struct inode *);
-extern void __cifs_fscache_wait_on_page_write(struct inode *inode, struct page *page);
-extern void __cifs_fscache_uncache_page(struct inode *inode, struct page *page);
extern int cifs_fscache_release_page(struct page *page, gfp_t gfp);
extern int __cifs_readpage_from_fscache(struct inode *, struct page *);
extern int __cifs_readpages_from_fscache(struct inode *,
struct address_space *,
struct list_head *,
unsigned *);
-extern void __cifs_fscache_readpages_cancel(struct inode *, struct list_head *);
-
extern void __cifs_readpage_to_fscache(struct inode *, struct page *);

-static inline void cifs_fscache_invalidate_page(struct page *page,
- struct inode *inode)
+static inline struct fscache_cookie *cifs_inode_cookie(struct inode *inode)
{
- if (PageFsCache(page))
- __cifs_fscache_invalidate_page(page, inode);
-}
-
-static inline void cifs_fscache_wait_on_page_write(struct inode *inode,
- struct page *page)
-{
- if (PageFsCache(page))
- __cifs_fscache_wait_on_page_write(inode, page);
-}
-
-static inline void cifs_fscache_uncache_page(struct inode *inode,
- struct page *page)
-{
- if (PageFsCache(page))
- __cifs_fscache_uncache_page(inode, page);
+ return CIFS_I(inode)->fscache;
}

static inline int cifs_readpage_from_fscache(struct inode *inode,
@@ -120,41 +98,20 @@ static inline void cifs_readpage_to_fscache(struct inode *inode,
__cifs_readpage_to_fscache(inode, page);
}

-static inline void cifs_fscache_readpages_cancel(struct inode *inode,
- struct list_head *pages)
+#else /* CONFIG_CIFS_FSCACHE */
+static inline
+void cifs_fscache_fill_coherency(struct inode *inode,
+ struct cifs_fscache_inode_coherency_data *cd)
{
- if (CIFS_I(inode)->fscache)
- return __cifs_fscache_readpages_cancel(inode, pages);
}

-#else /* CONFIG_CIFS_FSCACHE */
-static inline int cifs_fscache_register(void) { return 0; }
-static inline void cifs_fscache_unregister(void) {}
-
-static inline void
-cifs_fscache_get_client_cookie(struct TCP_Server_Info *server) {}
-static inline void
-cifs_fscache_release_client_cookie(struct TCP_Server_Info *server) {}
-static inline void cifs_fscache_get_super_cookie(struct cifs_tcon *tcon) {}
-static inline void
-cifs_fscache_release_super_cookie(struct cifs_tcon *tcon) {}
+static inline int cifs_fscache_get_super_cookie(struct cifs_tcon *tcon) { return 0; }
+static inline void cifs_fscache_release_super_cookie(struct cifs_tcon *tcon) {}

+static inline void cifs_fscache_get_inode_cookie(struct inode *inode) {}
static inline void cifs_fscache_release_inode_cookie(struct inode *inode) {}
-static inline void cifs_fscache_update_inode_cookie(struct inode *inode) {}
-static inline void cifs_fscache_set_inode_cookie(struct inode *inode,
- struct file *filp) {}
-static inline void cifs_fscache_reset_inode_cookie(struct inode *inode) {}
-static inline int cifs_fscache_release_page(struct page *page, gfp_t gfp)
-{
- return 1; /* May release page */
-}
-
-static inline void cifs_fscache_invalidate_page(struct page *page,
- struct inode *inode) {}
-static inline void cifs_fscache_wait_on_page_write(struct inode *inode,
- struct page *page) {}
-static inline void cifs_fscache_uncache_page(struct inode *inode,
- struct page *page) {}
+static inline void cifs_fscache_unuse_inode_cookie(struct inode *inode, bool update) {}
+static inline struct fscache_cookie *cifs_inode_cookie(struct inode *inode) { return NULL; }

static inline int
cifs_readpage_from_fscache(struct inode *inode, struct page *page)
@@ -173,11 +130,6 @@ static inline int cifs_readpages_from_fscache(struct inode *inode,
static inline void cifs_readpage_to_fscache(struct inode *inode,
struct page *page) {}

-static inline void cifs_fscache_readpages_cancel(struct inode *inode,
- struct list_head *pages)
-{
-}
-
#endif /* CONFIG_CIFS_FSCACHE */

#endif /* _CIFS_FSCACHE_H */
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index 96d083db1737..ad0a30edfcd8 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -1298,10 +1298,7 @@ cifs_iget(struct super_block *sb, struct cifs_fattr *fattr)
inode->i_flags |= S_NOATIME | S_NOCMTIME;
if (inode->i_state & I_NEW) {
inode->i_ino = hash;
-#ifdef CONFIG_CIFS_FSCACHE
- /* initialize per-inode cache cookie pointer */
- CIFS_I(inode)->fscache = NULL;
-#endif
+ cifs_fscache_get_inode_cookie(inode);
unlock_new_inode(inode);
}
}
@@ -1376,12 +1373,18 @@ struct inode *cifs_root_iget(struct super_block *sb)
inode = ERR_PTR(rc);
}

- /*
- * The cookie is initialized from volume info returned above.
- * Inside cifs_fscache_get_super_cookie it checks
- * that we do not get super cookie twice.
- */
- cifs_fscache_get_super_cookie(tcon);
+ if (!rc) {
+ /*
+ * The cookie is initialized from volume info returned above.
+ * Inside cifs_fscache_get_super_cookie it checks
+ * that we do not get super cookie twice.
+ */
+ rc = cifs_fscache_get_super_cookie(tcon);
+ if (rc < 0) {
+ iget_failed(inode);
+ inode = ERR_PTR(rc);
+ }
+ }

out:
kfree(path);
@@ -2270,6 +2273,8 @@ cifs_dentry_needs_reval(struct dentry *dentry)
int
cifs_invalidate_mapping(struct inode *inode)
{
+ struct cifs_fscache_inode_coherency_data cd;
+ struct cifsInodeInfo *cifsi = CIFS_I(inode);
int rc = 0;

if (inode->i_mapping && inode->i_mapping->nrpages != 0) {
@@ -2279,7 +2284,8 @@ cifs_invalidate_mapping(struct inode *inode)
__func__, inode);
}

- cifs_fscache_reset_inode_cookie(inode);
+ cifs_fscache_fill_coherency(&cifsi->vfs_inode, &cd);
+ fscache_invalidate(cifs_inode_cookie(inode), &cd, i_size_read(inode), 0);
return rc;
}

@@ -2784,8 +2790,10 @@ cifs_setattr_unix(struct dentry *direntry, struct iattr *attrs)
goto out;

if ((attrs->ia_valid & ATTR_SIZE) &&
- attrs->ia_size != i_size_read(inode))
+ attrs->ia_size != i_size_read(inode)) {
truncate_setsize(inode, attrs->ia_size);
+ fscache_resize_cookie(cifs_inode_cookie(inode), attrs->ia_size);
+ }

setattr_copy(&init_user_ns, inode, attrs);
mark_inode_dirty(inode);
@@ -2980,8 +2988,10 @@ cifs_setattr_nounix(struct dentry *direntry, struct iattr *attrs)
goto cifs_setattr_exit;

if ((attrs->ia_valid & ATTR_SIZE) &&
- attrs->ia_size != i_size_read(inode))
+ attrs->ia_size != i_size_read(inode)) {
truncate_setsize(inode, attrs->ia_size);
+ fscache_resize_cookie(cifs_inode_cookie(inode), attrs->ia_size);
+ }

setattr_copy(&init_user_ns, inode, attrs);
mark_inode_dirty(inode);


2022-01-08 07:09:12

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH v4 38/68] vfs, cachefiles: Mark a backing file in use with an inode flag

On Sat, Dec 25, 2021 at 1:32 AM David Howells <[email protected]> wrote:
>
> Use an inode flag, S_KERNEL_FILE, to mark that a backing file is in use by
> the kernel to prevent cachefiles or other kernel services from interfering
> with that file.
>
> Alter rmdir to reject attempts to remove a directory marked with this flag.
> This is used by cachefiles to prevent cachefilesd from removing them.
>
> Using S_SWAPFILE instead isn't really viable as that has other effects in
> the I/O paths.
>
> Changes
> =======
> ver #3:
> - Check for the object pointer being NULL in the tracepoints rather than
> the caller.
>
> Signed-off-by: David Howells <[email protected]>
> cc: [email protected]
> Link: https://lore.kernel.org/r/163819630256.215744.4815885535039369574.stgit@warthog.procyon.org.uk/ # v1
> Link: https://lore.kernel.org/r/163906931596.143852.8642051223094013028.stgit@warthog.procyon.org.uk/ # v2
> Link: https://lore.kernel.org/r/163967141000.1823006.12920680657559677789.stgit@warthog.procyon.org.uk/ # v3
> ---
>
> fs/cachefiles/Makefile | 1 +
> fs/cachefiles/namei.c | 43 +++++++++++++++++++++++++++++++++++++
> fs/namei.c | 3 ++-
> include/linux/fs.h | 1 +
> include/trace/events/cachefiles.h | 42 ++++++++++++++++++++++++++++++++++++
> 5 files changed, 89 insertions(+), 1 deletion(-)
> create mode 100644 fs/cachefiles/namei.c
>
> diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
> index 463e3d608b75..e0b092ca077f 100644
> --- a/fs/cachefiles/Makefile
> +++ b/fs/cachefiles/Makefile
> @@ -7,6 +7,7 @@ cachefiles-y := \
> cache.o \
> daemon.o \
> main.o \
> + namei.o \
> security.o
>
> cachefiles-$(CONFIG_CACHEFILES_ERROR_INJECTION) += error_inject.o
> diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> new file mode 100644
> index 000000000000..913f83f1c900
> --- /dev/null
> +++ b/fs/cachefiles/namei.c
> @@ -0,0 +1,43 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/* CacheFiles path walking and related routines
> + *
> + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
> + * Written by David Howells ([email protected])
> + */
> +
> +#include <linux/fs.h>
> +#include "internal.h"
> +
> +/*
> + * Mark the backing file as being a cache file if it's not already in use. The
> + * mark tells the culling request command that it's not allowed to cull the
> + * file or directory. The caller must hold the inode lock.
> + */
> +static bool __cachefiles_mark_inode_in_use(struct cachefiles_object *object,
> + struct dentry *dentry)
> +{
> + struct inode *inode = d_backing_inode(dentry);
> + bool can_use = false;
> +
> + if (!(inode->i_flags & S_KERNEL_FILE)) {
> + inode->i_flags |= S_KERNEL_FILE;
> + trace_cachefiles_mark_active(object, inode);
> + can_use = true;
> + } else {
> + pr_notice("cachefiles: Inode already in use: %pd\n", dentry);
> + }
> +
> + return can_use;
> +}
> +
> +/*
> + * Unmark a backing inode. The caller must hold the inode lock.
> + */
> +static void __cachefiles_unmark_inode_in_use(struct cachefiles_object *object,
> + struct dentry *dentry)
> +{
> + struct inode *inode = d_backing_inode(dentry);
> +
> + inode->i_flags &= ~S_KERNEL_FILE;
> + trace_cachefiles_mark_inactive(object, inode);
> +}
> diff --git a/fs/namei.c b/fs/namei.c
> index 1f9d2187c765..d81f04f8d818 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3958,7 +3958,8 @@ int vfs_rmdir(struct user_namespace *mnt_userns, struct inode *dir,
> inode_lock(dentry->d_inode);
>
> error = -EBUSY;
> - if (is_local_mountpoint(dentry))
> + if (is_local_mountpoint(dentry) ||
> + (dentry->d_inode->i_flags & S_KERNEL_FILE))

Better as this check to the many other checks in may_delete()

> goto out;
>
> error = security_inode_rmdir(dir, dentry);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 2c0b8e77d9ab..bcf1ca430139 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2249,6 +2249,7 @@ struct super_operations {
> #define S_ENCRYPTED (1 << 14) /* Encrypted file (using fs/crypto/) */
> #define S_CASEFOLD (1 << 15) /* Casefolded file */
> #define S_VERITY (1 << 16) /* Verity file (using fs/verity/) */
> +#define S_KERNEL_FILE (1 << 17) /* File is in use by the kernel (eg. fs/cachefiles) */
>

Trying to brand this flag as a generic "in use by kernel" is misleading.
Modules other than cachefiles cannot set/clear this flag, because then
cachefiles won't know that it is allowed to set/clear the flag.

So I think it would be better to call it for what it really is - an inode flag
that is controlled by cachefiles.
Also, the name KERNEL_FILE for a directory is a bit confusing IMO.
Perhaps S_CACHEFILES?

Thanks,
Amir.

2022-01-08 07:18:02

by Matthew Wilcox (Oracle)

[permalink] [raw]
Subject: Re: [PATCH v4 38/68] vfs, cachefiles: Mark a backing file in use with an inode flag

On Sat, Jan 08, 2022 at 09:08:57AM +0200, Amir Goldstein wrote:
> > +#define S_KERNEL_FILE (1 << 17) /* File is in use by the kernel (eg. fs/cachefiles) */
>
> Trying to brand this flag as a generic "in use by kernel" is misleading.
> Modules other than cachefiles cannot set/clear this flag, because then
> cachefiles won't know that it is allowed to set/clear the flag.

Huh? If some other kernel module sets it, cachefiles will try to set it,
see it's already set, and fail. Presumably cachefiles does not go round
randomly "unusing" files that it has not previously started using.

I mean, yes, obviously, it's a kernel module, it can set and clear
whatever flags it likes on any inode in the system, but conceptually,
it's an advisory whole-file lock.

2022-01-08 08:41:41

by David Howells

[permalink] [raw]
Subject: Re: [PATCH v4 38/68] vfs, cachefiles: Mark a backing file in use with an inode flag

Amir Goldstein <[email protected]> wrote:

> > - if (is_local_mountpoint(dentry))
> > + if (is_local_mountpoint(dentry) ||
> > + (dentry->d_inode->i_flags & S_KERNEL_FILE))
>
> Better as this check to the many other checks in may_delete()

Okay. It will make things a bit more complicated, so I'll do it in a follow
up patch. The problem is that it will prevent the cachefiles driver in the
kernel from renaming directories and unlinking files as it's currently
removing the mark *after* moving/deleting them.

> > +#define S_KERNEL_FILE (1 << 17) /* File is in use by the kernel (eg. fs/cachefiles) */
> >
>
> Trying to brand this flag as a generic "in use by kernel" is misleading.
> Modules other than cachefiles cannot set/clear this flag, because then
> cachefiles won't know that it is allowed to set/clear the flag.

If the flag is set, then cachefiles thinks some other kernel driver is using
the file and it shouldn't try to use it. It doesn't matter who has it open.

It should never happen as other kernel drivers shouldn't be poking around
inside cachefiles's cache, but possibly someone could misconfigure something.

David


2022-01-08 08:43:27

by David Howells

[permalink] [raw]
Subject: Re: [PATCH v4 38/68] vfs, cachefiles: Mark a backing file in use with an inode flag

Matthew Wilcox <[email protected]> wrote:

>
> Huh? If some other kernel module sets it, cachefiles will try to set it,
> see it's already set, and fail. Presumably cachefiles does not go round
> randomly "unusing" files that it has not previously started using.

Correct. It only unuses a file that it marked in-use in the first place.

David


2022-01-09 15:27:38

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v4 63/68] cifs: Support fscache indexing rewrite (untested)

On Fri, 2022-01-07 at 21:52 +0000, David Howells wrote:
> This patch isn't quite right and needs a fix. The attached patch fixes it.
> I'll fold that in and post a v5 of this patch.
>
> David
> ---
> cifs: Fix the fscache cookie management
>
> Fix the fscache cookie management in cifs in the following ways:
>
> (1) The cookie should be released in cifs_evict_inode() after it has been
> unused from being pinned by cifs_set_page_dirty().
>
> (2) The cookie shouldn't be released in cifsFileInfo_put_final(). That's
> dealing with closing a file, not getting rid of an inode. The cookie
> needs to persist beyond the last file close because writepages may
> happen after closure.
>
> (3) The cookie needs to be used in cifs_atomic_open() to match
> cifs_open(). As yet unknown files being opened for writing seem to go
> by the former route rather than the latter.
>
> This can be triggered by something like:
>
> # systemctl start cachefilesd
> # mount //carina/test /xfstest.test -o user=shares,pass=xxxx.fsc
> # bash 5</xfstest.test/bar
> # echo hello >/xfstest.test/bar
>
> The key is to open the file R/O and then open it R/W and write to it whilst
> it's still open for R/O. A cookie isn't acquired if it's just opened R/W
> as it goes through the atomic_open method rather than the open method.
>
> Signed-off-by: David Howells <[email protected]>
> ---
> fs/cifs/cifsfs.c | 8 ++++++++
> fs/cifs/dir.c | 4 ++++
> fs/cifs/file.c | 2 --
> 3 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
> index d3f3acf340f1..26cf2193c9a2 100644
> --- a/fs/cifs/cifsfs.c
> +++ b/fs/cifs/cifsfs.c
> @@ -398,6 +398,7 @@ cifs_evict_inode(struct inode *inode)
> truncate_inode_pages_final(&inode->i_data);
> if (inode->i_state & I_PINNING_FSCACHE_WB)
> cifs_fscache_unuse_inode_cookie(inode, true);
> + cifs_fscache_release_inode_cookie(inode);
> clear_inode(inode);
> }
>
> @@ -722,6 +723,12 @@ static int cifs_show_stats(struct seq_file *s, struct dentry *root)
> }
> #endif
>
> +static int cifs_write_inode(struct inode *inode, struct writeback_control *wbc)
> +{
> + fscache_unpin_writeback(wbc, cifs_inode_cookie(inode));
> + return 0;
> +}
> +
> static int cifs_drop_inode(struct inode *inode)
> {
> struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
> @@ -734,6 +741,7 @@ static int cifs_drop_inode(struct inode *inode)
> static const struct super_operations cifs_super_ops = {
> .statfs = cifs_statfs,
> .alloc_inode = cifs_alloc_inode,
> + .write_inode = cifs_write_inode,
> .free_inode = cifs_free_inode,
> .drop_inode = cifs_drop_inode,
> .evict_inode = cifs_evict_inode,
> diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
> index 6e8e7cc26ae2..6186824b366e 100644
> --- a/fs/cifs/dir.c
> +++ b/fs/cifs/dir.c
> @@ -22,6 +22,7 @@
> #include "cifs_unicode.h"
> #include "fs_context.h"
> #include "cifs_ioctl.h"
> +#include "fscache.h"
>
> static void
> renew_parental_timestamps(struct dentry *direntry)
> @@ -509,6 +510,9 @@ cifs_atomic_open(struct inode *inode, struct dentry *direntry,
> rc = -ENOMEM;
> }
>
> + fscache_use_cookie(cifs_inode_cookie(file_inode(file)),
> + file->f_mode & FMODE_WRITE);
> +
> out:
> cifs_put_tlink(tlink);
> out_free_xid:
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index d872f6fe8e7d..44da7646f789 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -376,8 +376,6 @@ static void cifsFileInfo_put_final(struct cifsFileInfo *cifs_file)
> struct cifsLockInfo *li, *tmp;
> struct super_block *sb = inode->i_sb;
>
> - cifs_fscache_release_inode_cookie(inode);
> -
> /*
> * Delete any outstanding lock records. We'll lose them when the file
> * is closed anyway.
>

Looks good.

Acked-by: Jeff Layton <[email protected]>

2022-01-10 08:00:28

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v4 38/68] vfs, cachefiles: Mark a backing file in use with an inode flag

On Sat, Jan 08, 2022 at 07:17:33AM +0000, Matthew Wilcox wrote:
> On Sat, Jan 08, 2022 at 09:08:57AM +0200, Amir Goldstein wrote:
> > > +#define S_KERNEL_FILE (1 << 17) /* File is in use by the kernel (eg. fs/cachefiles) */
> >
> > Trying to brand this flag as a generic "in use by kernel" is misleading.
> > Modules other than cachefiles cannot set/clear this flag, because then
> > cachefiles won't know that it is allowed to set/clear the flag.
>
> Huh? If some other kernel module sets it, cachefiles will try to set it,
> see it's already set, and fail. Presumably cachefiles does not go round
> randomly "unusing" files that it has not previously started using.
>
> I mean, yes, obviously, it's a kernel module, it can set and clear
> whatever flags it likes on any inode in the system, but conceptually,
> it's an advisory whole-file lock.

So let's name it that way. We have plenty of files in kernel use using
filp_open and this flag very obviously means something else.

2022-01-10 11:22:09

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH v4 00/68] fscache, cachefiles: Rewrite

David Howells wrote on Wed, Dec 22, 2021 at 11:13:11PM +0000:
> These patches can be found also on:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-rewrite

Tested today's tip of the branch for 9p, with the patch I just sent it
appears to work ok with the note that files aren't read from cachefilesd
properly.

They were in v5.15 and not 5.16 so it's another regression from the
previous PR, and not a problem with these patches though, so can add my
tested-by to the three 9p patches:

Tested-by: Dominique Martinet <[email protected]>

--
Dominique

2022-01-10 11:32:36

by David Howells

[permalink] [raw]
Subject: Re: [PATCH v4 38/68] vfs, cachefiles: Mark a backing file in use with an inode flag

Christoph Hellwig <[email protected]> wrote:

> So let's name it that way. We have plenty of files in kernel use using
> filp_open and this flag very obviously means something else.

S_KERNEL_LOCK?

David


2022-01-10 11:53:56

by Daire Byrne

[permalink] [raw]
Subject: Re: [PATCH v4 00/68] fscache, cachefiles: Rewrite

On Wed, 22 Dec 2021 at 23:13, David Howells <[email protected]> wrote:
> These patches can be found also on:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-rewrite

I have run this through our production workloads without issue. There
were no recorded performance or stability differences between this and
the old fscache/cachefiles.

Our workload comprises mounting ~20 remote servers with "-o fsc" over
the WAN and then re-exporting those to 500 local client instances.
This production workload churns the fscache backing filesystem (EXT4)
pretty well (hundreds of MB/s) across all of the mount points
simultaneously.

I tested with both NFSv4.2 and NFSv3 mounts. Previously written cache
data was correctly reused between reboots and remounts.

Tested-by: Daire Byrne <[email protected]>

Cheers,

Daire