2021-01-21 01:50:58

by David Howells

Subject: [RFC][PATCH 00/25] Network fs helper library & fscache kiocb API


Here's a set of patches to do two things:

(1) Add a helper library to handle the new VM readahead interface. This
is intended to be used unconditionally by the filesystem (whether or
not caching is enabled) and provides a common framework for doing
caching, transparent huge pages and, in the future, possibly fscrypt
and read bandwidth maximisation. It also allows the netfs and the
cache to align, expand and slice up a read request from the VM in
various ways; the netfs need only provide a function to read a stretch
of data to the pagecache and the helper takes care of the rest (see
the sketch after this list).

(2) Add an alternative fscache/cachefiles I/O API that uses the kiocb
facility to do async DIO to transfer data to/from the netfs's pages,
rather than using readpage with wait queue snooping on one side and
vfs_write() on the other. It also uses less memory, since it doesn't
do buffered I/O on the backing file.

Note that this uses SEEK_HOLE/SEEK_DATA to locate the data available
to be read from the cache. Whilst this is an improvement over the
bmap interface, it still has a problem with regard to a modern
extent-based filesystem inserting or removing bridging blocks of
zeros. Fixing that requires a much greater overhaul.
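
As a rough illustration of (1), a netfs might wire the helpers up along
the lines of the sketch below, using the interfaces added later in this
series; the myfs_* names and the server fetch are hypothetical, not a
definitive implementation:

	static void myfs_issue_op(struct netfs_read_subrequest *subreq)
	{
		/* Fetch subreq->len bytes at subreq->start from the server
		 * into the pagecache, then report how it went.
		 */
		ssize_t ret = myfs_fetch_from_server(subreq);

		netfs_subreq_terminated(subreq, ret);
	}

	static const struct netfs_read_request_ops myfs_netfs_read_ops = {
		.issue_op	= myfs_issue_op,	/* The only mandatory op */
	};

	static void myfs_readahead(struct readahead_control *ractl)
	{
		netfs_readahead(ractl, &myfs_netfs_read_ops, NULL);
	}

	static int myfs_readpage(struct file *file, struct page *page)
	{
		return netfs_readpage(file, page, &myfs_netfs_read_ops, NULL);
	}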

This is a step towards overhauling the fscache API. The change is opt-in
on the part of the network filesystem. A netfs should not try to mix the
old and the new APIs because they handle pages and the PG_fscache page
flag in conflicting ways, and because doing so would mix DIO with
buffered I/O.
Further, the helper library can't be used with the old API.

This does not change any of the fscache cookie handling APIs or the way
invalidation is done.

In the near term, I intend to deprecate and remove the old I/O API
(fscache_allocate_page{,s}(), fscache_read_or_alloc_page{,s}(),
fscache_write_page() and fscache_uncache_page()) and eventually replace
most of fscache/cachefiles with something simpler and easier to follow.

The patchset contains four parts:

(1) Some helper patches, including provision of the ITER_XARRAY iov_iter
type and a function to do readahead expansion.

(2) Patches to add the netfs helper library.

(3) A patch to add the fscache/cachefiles kiocb API.

(4) Patches to add support in AFS for this.

With this, AFS without a cache passes all expected xfstests; with a cache,
there's an extra failure, but that's also there before these patches.
Fixing that probably requires a greater overhaul.

These patches can also be found on:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-netfs-lib

David
---
David Howells (24):
iov_iter: Add ITER_XARRAY
vm: Add wait/unlock functions for PG_fscache
mm: Implement readahead_control pageset expansion
vfs: Export rw_verify_area() for use by cachefiles
netfs: Make a netfs helper module
netfs: Provide readahead and readpage netfs helpers
netfs: Add tracepoints
netfs: Gather stats
netfs: Add write_begin helper
netfs: Define an interface to talk to a cache
fscache, cachefiles: Add alternate API to use kiocb for read/write to cache
afs: Disable use of the fscache I/O routines
afs: Pass page into dirty region helpers to provide THP size
afs: Print the operation debug_id when logging an unexpected data version
afs: Move key to afs_read struct
afs: Don't truncate iter during data fetch
afs: Log remote unmarshalling errors
afs: Set up the iov_iter before calling afs_extract_data()
afs: Use ITER_XARRAY for writing
afs: Wait on PG_fscache before modifying/releasing a page
afs: Extract writeback extension into its own function
afs: Prepare for use of THPs
afs: Use the fs operation ops to handle FetchData completion
afs: Use new fscache read helper API

Takashi Iwai (1):
cachefiles: Drop superfluous readpages aops NULL check


fs/Kconfig | 1 +
fs/Makefile | 1 +
fs/afs/Kconfig | 1 +
fs/afs/dir.c | 225 ++++---
fs/afs/file.c | 472 ++++----------
fs/afs/fs_operation.c | 4 +-
fs/afs/fsclient.c | 108 ++--
fs/afs/inode.c | 7 +-
fs/afs/internal.h | 57 +-
fs/afs/rxrpc.c | 150 ++---
fs/afs/write.c | 610 ++++++++++--------
fs/afs/yfsclient.c | 82 +--
fs/cachefiles/Makefile | 1 +
fs/cachefiles/interface.c | 5 +-
fs/cachefiles/internal.h | 9 +
fs/cachefiles/rdwr.c | 2 -
fs/cachefiles/rdwr2.c | 406 ++++++++++++
fs/fscache/Makefile | 3 +-
fs/fscache/internal.h | 3 +
fs/fscache/page.c | 2 +-
fs/fscache/page2.c | 116 ++++
fs/fscache/stats.c | 1 +
fs/internal.h | 5 -
fs/netfs/Kconfig | 23 +
fs/netfs/Makefile | 5 +
fs/netfs/internal.h | 97 +++
fs/netfs/read_helper.c | 1142 +++++++++++++++++++++++++++++++++
fs/netfs/stats.c | 57 ++
fs/read_write.c | 1 +
include/linux/fs.h | 1 +
include/linux/fscache-cache.h | 4 +
include/linux/fscache.h | 28 +-
include/linux/netfs.h | 167 +++++
include/linux/pagemap.h | 16 +
include/net/af_rxrpc.h | 2 +-
include/trace/events/afs.h | 74 +--
include/trace/events/netfs.h | 201 ++++++
mm/filemap.c | 18 +
mm/readahead.c | 70 ++
net/rxrpc/recvmsg.c | 9 +-
40 files changed, 3171 insertions(+), 1015 deletions(-)
create mode 100644 fs/cachefiles/rdwr2.c
create mode 100644 fs/fscache/page2.c
create mode 100644 fs/netfs/Kconfig
create mode 100644 fs/netfs/Makefile
create mode 100644 fs/netfs/internal.h
create mode 100644 fs/netfs/read_helper.c
create mode 100644 fs/netfs/stats.c
create mode 100644 include/linux/netfs.h
create mode 100644 include/trace/events/netfs.h



2021-01-21 01:57:38

by David Howells

Subject: [PATCH 18/25] afs: Log remote unmarshalling errors

Log unmarshalling errors reported by the peer (ie. it can't parse what we
sent it). Limit the maximum number of messages to 3.
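
As an example, if the server aborts a store because it couldn't unmarshal
what we sent (RXGEN_SS_UNMARSHAL), the log would gain a line of the form
below, up to three times in total; the operation name and address here
are illustrative:

	kAFS: Peer reported server unmarshalling failure on FS.StoreData [203.0.113.1:7001]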

Signed-off-by: David Howells <[email protected]>
---

fs/afs/rxrpc.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)

diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 0ec38b758f29..ae68576f822f 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -500,6 +500,39 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp)
_leave(" = %d", ret);
}

+/*
+ * Log remote abort codes that indicate that we have a protocol disagreement
+ * with the server.
+ */
+static void afs_log_error(struct afs_call *call, s32 remote_abort)
+{
+ static int max = 0;
+ const char *msg;
+ int m;
+
+ switch (remote_abort) {
+ case RX_EOF: msg = "unexpected EOF"; break;
+ case RXGEN_CC_MARSHAL: msg = "client marshalling"; break;
+ case RXGEN_CC_UNMARSHAL: msg = "client unmarshalling"; break;
+ case RXGEN_SS_MARSHAL: msg = "server marshalling"; break;
+ case RXGEN_SS_UNMARSHAL: msg = "server unmarshalling"; break;
+ case RXGEN_DECODE: msg = "opcode decode"; break;
+ case RXGEN_SS_XDRFREE: msg = "server XDR cleanup"; break;
+ case RXGEN_CC_XDRFREE: msg = "client XDR cleanup"; break;
+ case -32: msg = "insufficient data"; break;
+ default:
+ return;
+ }
+
+ m = max;
+ if (m < 3) {
+ max = m + 1;
+ pr_notice("kAFS: Peer reported %s failure on %s [%pISp]\n",
+ msg, call->type->name,
+ &call->alist->addrs[call->addr_ix].transport);
+ }
+}
+
/*
* deliver messages to a call
*/
@@ -563,6 +596,7 @@ static void afs_deliver_to_call(struct afs_call *call)
goto out;
case -ECONNABORTED:
ASSERTCMP(state, ==, AFS_CALL_COMPLETE);
+ afs_log_error(call, call->abort_code);
goto done;
case -ENOTSUPP:
abort_code = RXGEN_OPCODE;


2021-01-21 01:58:41

by David Howells

Subject: [PATCH 16/25] afs: Move key to afs_read struct

Stash the key used to authenticate read operations in the afs_read struct.
This will be necessary to reissue the operation against the server if a
read from the cache fails in upcoming cache changes.
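
The resulting ownership rule is that the afs_read holds its own reference
on the key from allocation until afs_put_read(). In sketch form:

	refcount_set(&req->usage, 1);
	req->key = key_get(key);	/* req now owns a ref on the key */
	...
	ret = afs_fetch_data(vnode, req); /* authenticates with req->key */
	afs_put_read(req);		/* drops the key ref via key_put() */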

Signed-off-by: David Howells <[email protected]>
---

fs/afs/dir.c | 3 ++-
fs/afs/file.c | 16 +++++++++-------
fs/afs/internal.h | 3 ++-
fs/afs/write.c | 12 ++++++------
4 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 7bd659ad959e..96e9e2e60d97 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -241,6 +241,7 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key)
return ERR_PTR(-ENOMEM);

refcount_set(&req->usage, 1);
+ req->key = key_get(key);
req->nr_pages = nr_pages;
req->actual_len = i_size; /* May change */
req->len = nr_pages * PAGE_SIZE; /* We can ask for more than there is */
@@ -305,7 +306,7 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key)

if (!test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) {
trace_afs_reload_dir(dvnode);
- ret = afs_fetch_data(dvnode, key, req);
+ ret = afs_fetch_data(dvnode, req);
if (ret < 0)
goto error_unlock;

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 21868bfc3a44..d23192b3b933 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -199,6 +199,7 @@ void afs_put_read(struct afs_read *req)
if (req->pages != req->array)
kfree(req->pages);
}
+ key_put(req->key);
kfree(req);
}
}
@@ -229,7 +230,7 @@ static const struct afs_operation_ops afs_fetch_data_operation = {
/*
* Fetch file data from the volume.
*/
-int afs_fetch_data(struct afs_vnode *vnode, struct key *key, struct afs_read *req)
+int afs_fetch_data(struct afs_vnode *vnode, struct afs_read *req)
{
struct afs_operation *op;

@@ -238,9 +239,9 @@ int afs_fetch_data(struct afs_vnode *vnode, struct key *key, struct afs_read *re
vnode->fid.vid,
vnode->fid.vnode,
vnode->fid.unique,
- key_serial(key));
+ key_serial(req->key));

- op = afs_alloc_operation(key, vnode->volume);
+ op = afs_alloc_operation(req->key, vnode->volume);
if (IS_ERR(op))
return PTR_ERR(op);

@@ -279,6 +280,7 @@ int afs_page_filler(void *data, struct page *page)
* unmarshalling code will clear the unfilled space.
*/
refcount_set(&req->usage, 1);
+ req->key = key_get(key);
req->pos = (loff_t)page->index << PAGE_SHIFT;
req->len = PAGE_SIZE;
req->nr_pages = 1;
@@ -288,7 +290,7 @@ int afs_page_filler(void *data, struct page *page)

/* read the contents of the file from the server into the
* page */
- ret = afs_fetch_data(vnode, key, req);
+ ret = afs_fetch_data(vnode, req);
afs_put_read(req);

if (ret < 0) {
@@ -373,7 +375,6 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping,
struct afs_read *req;
struct list_head *p;
struct page *first, *page;
- struct key *key = afs_file_key(file);
pgoff_t index;
int ret, n, i;

@@ -397,6 +398,7 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping,

refcount_set(&req->usage, 1);
req->vnode = vnode;
+ req->key = key_get(afs_file_key(file));
req->page_done = afs_readpages_page_done;
req->pos = first->index;
req->pos <<= PAGE_SHIFT;
@@ -426,11 +428,11 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping,
} while (req->nr_pages < n);

if (req->nr_pages == 0) {
- kfree(req);
+ afs_put_read(req);
return 0;
}

- ret = afs_fetch_data(vnode, key, req);
+ ret = afs_fetch_data(vnode, req);
if (ret < 0)
goto error;

diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index cd545e7dbfb8..4b255d10f726 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -204,6 +204,7 @@ struct afs_read {
loff_t actual_len; /* How much we're actually getting */
loff_t remain; /* Amount remaining */
loff_t file_size; /* File size returned by server */
+ struct key *key; /* The key to use to reissue the read */
afs_dataversion_t data_version; /* Version number returned by server */
refcount_t usage;
unsigned int index; /* Which page we're reading into */
@@ -1045,7 +1046,7 @@ extern int afs_cache_wb_key(struct afs_vnode *, struct afs_file *);
extern void afs_put_wb_key(struct afs_wb_key *);
extern int afs_open(struct inode *, struct file *);
extern int afs_release(struct inode *, struct file *);
-extern int afs_fetch_data(struct afs_vnode *, struct key *, struct afs_read *);
+extern int afs_fetch_data(struct afs_vnode *, struct afs_read *);
extern int afs_page_filler(void *, struct page *);
extern void afs_put_read(struct afs_read *);

diff --git a/fs/afs/write.c b/fs/afs/write.c
index 9d0cef35ecba..7eba0d3201ba 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -25,9 +25,10 @@ int afs_set_page_dirty(struct page *page)
/*
* partly or wholly fill a page that's under preparation for writing
*/
-static int afs_fill_page(struct afs_vnode *vnode, struct key *key,
+static int afs_fill_page(struct file *file,
loff_t pos, unsigned int len, struct page *page)
{
+ struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
struct afs_read *req;
size_t p;
void *data;
@@ -49,6 +50,7 @@ static int afs_fill_page(struct afs_vnode *vnode, struct key *key,
return -ENOMEM;

refcount_set(&req->usage, 1);
+ req->key = key_get(afs_file_key(file));
req->pos = pos;
req->len = len;
req->nr_pages = 1;
@@ -56,7 +58,7 @@ static int afs_fill_page(struct afs_vnode *vnode, struct key *key,
req->pages[0] = page;
get_page(page);

- ret = afs_fetch_data(vnode, key, req);
+ ret = afs_fetch_data(vnode, req);
afs_put_read(req);
if (ret < 0) {
if (ret == -ENOENT) {
@@ -80,7 +82,6 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
{
struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
struct page *page;
- struct key *key = afs_file_key(file);
unsigned long priv;
unsigned f, from = pos & (PAGE_SIZE - 1);
unsigned t, to = from + len;
@@ -95,7 +96,7 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
return -ENOMEM;

if (!PageUptodate(page) && len != PAGE_SIZE) {
- ret = afs_fill_page(vnode, key, pos & PAGE_MASK, PAGE_SIZE, page);
+ ret = afs_fill_page(file, pos & PAGE_MASK, PAGE_SIZE, page);
if (ret < 0) {
unlock_page(page);
put_page(page);
@@ -163,7 +164,6 @@ int afs_write_end(struct file *file, struct address_space *mapping,
struct page *page, void *fsdata)
{
struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
- struct key *key = afs_file_key(file);
unsigned long priv;
unsigned int f, from = pos & (PAGE_SIZE - 1);
unsigned int t, to = from + copied;
@@ -193,7 +193,7 @@ int afs_write_end(struct file *file, struct address_space *mapping,
* unmarshalling routine will take care of clearing any
* bits that are beyond the EOF.
*/
- ret = afs_fill_page(vnode, key, pos + copied,
+ ret = afs_fill_page(file, pos + copied,
len - copied, page);
if (ret < 0)
goto out;


2021-01-21 02:10:49

by David Howells

Subject: [PATCH 13/25] afs: Disable use of the fscache I/O routines

Disable use of the fscache I/O routines by the AFS filesystem. It's about
to transition to passing iov_iters down, and fscache is about to have its
I/O path converted to use iov_iter, so all of that needs to change.

Signed-off-by: David Howells <[email protected]>
cc: [email protected]
---

fs/afs/file.c | 199 ++++++++++----------------------------------------------
fs/afs/inode.c | 2 -
fs/afs/write.c | 10 ---
3 files changed, 36 insertions(+), 175 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 85f5adf21aa0..6d43713fde01 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -203,24 +203,6 @@ void afs_put_read(struct afs_read *req)
}
}

-#ifdef CONFIG_AFS_FSCACHE
-/*
- * deal with notification that a page was read from the cache
- */
-static void afs_file_readpage_read_complete(struct page *page,
- void *data,
- int error)
-{
- _enter("%p,%p,%d", page, data, error);
-
- /* if the read completes with an error, we just unlock the page and let
- * the VM reissue the readpage */
- if (!error)
- SetPageUptodate(page);
- unlock_page(page);
-}
-#endif
-
static void afs_fetch_data_success(struct afs_operation *op)
{
struct afs_vnode *vnode = op->file[0].vnode;
@@ -288,89 +270,46 @@ int afs_page_filler(void *data, struct page *page)
if (test_bit(AFS_VNODE_DELETED, &vnode->flags))
goto error;

- /* is it cached? */
-#ifdef CONFIG_AFS_FSCACHE
- ret = fscache_read_or_alloc_page(vnode->cache,
- page,
- afs_file_readpage_read_complete,
- NULL,
- GFP_KERNEL);
-#else
- ret = -ENOBUFS;
-#endif
- switch (ret) {
- /* read BIO submitted (page in cache) */
- case 0:
- break;
-
- /* page not yet cached */
- case -ENODATA:
- _debug("cache said ENODATA");
- goto go_on;
-
- /* page will not be cached */
- case -ENOBUFS:
- _debug("cache said ENOBUFS");
-
- fallthrough;
- default:
- go_on:
- req = kzalloc(struct_size(req, array, 1), GFP_KERNEL);
- if (!req)
- goto enomem;
-
- /* We request a full page. If the page is a partial one at the
- * end of the file, the server will return a short read and the
- * unmarshalling code will clear the unfilled space.
- */
- refcount_set(&req->usage, 1);
- req->pos = (loff_t)page->index << PAGE_SHIFT;
- req->len = PAGE_SIZE;
- req->nr_pages = 1;
- req->pages = req->array;
- req->pages[0] = page;
- get_page(page);
-
- /* read the contents of the file from the server into the
- * page */
- ret = afs_fetch_data(vnode, key, req);
- afs_put_read(req);
-
- if (ret < 0) {
- if (ret == -ENOENT) {
- _debug("got NOENT from server"
- " - marking file deleted and stale");
- set_bit(AFS_VNODE_DELETED, &vnode->flags);
- ret = -ESTALE;
- }
-
-#ifdef CONFIG_AFS_FSCACHE
- fscache_uncache_page(vnode->cache, page);
-#endif
- BUG_ON(PageFsCache(page));
-
- if (ret == -EINTR ||
- ret == -ENOMEM ||
- ret == -ERESTARTSYS ||
- ret == -EAGAIN)
- goto error;
- goto io_error;
- }
+ req = kzalloc(struct_size(req, array, 1), GFP_KERNEL);
+ if (!req)
+ goto enomem;

- SetPageUptodate(page);
+ /* We request a full page. If the page is a partial one at the
+ * end of the file, the server will return a short read and the
+ * unmarshalling code will clear the unfilled space.
+ */
+ refcount_set(&req->usage, 1);
+ req->pos = (loff_t)page->index << PAGE_SHIFT;
+ req->len = PAGE_SIZE;
+ req->nr_pages = 1;
+ req->pages = req->array;
+ req->pages[0] = page;
+ get_page(page);
+
+ /* read the contents of the file from the server into the
+ * page */
+ ret = afs_fetch_data(vnode, key, req);
+ afs_put_read(req);

- /* send the page to the cache */
-#ifdef CONFIG_AFS_FSCACHE
- if (PageFsCache(page) &&
- fscache_write_page(vnode->cache, page, vnode->status.size,
- GFP_KERNEL) != 0) {
- fscache_uncache_page(vnode->cache, page);
- BUG_ON(PageFsCache(page));
+ if (ret < 0) {
+ if (ret == -ENOENT) {
+ _debug("got NOENT from server"
+ " - marking file deleted and stale");
+ set_bit(AFS_VNODE_DELETED, &vnode->flags);
+ ret = -ESTALE;
}
-#endif
- unlock_page(page);
+
+ if (ret == -EINTR ||
+ ret == -ENOMEM ||
+ ret == -ERESTARTSYS ||
+ ret == -EAGAIN)
+ goto error;
+ goto io_error;
}

+ SetPageUptodate(page);
+ unlock_page(page);
+
_leave(" = 0");
return 0;

@@ -416,23 +355,10 @@ static int afs_readpage(struct file *file, struct page *page)
*/
static void afs_readpages_page_done(struct afs_read *req)
{
-#ifdef CONFIG_AFS_FSCACHE
- struct afs_vnode *vnode = req->vnode;
-#endif
struct page *page = req->pages[req->index];

req->pages[req->index] = NULL;
SetPageUptodate(page);
-
- /* send the page to the cache */
-#ifdef CONFIG_AFS_FSCACHE
- if (PageFsCache(page) &&
- fscache_write_page(vnode->cache, page, vnode->status.size,
- GFP_KERNEL) != 0) {
- fscache_uncache_page(vnode->cache, page);
- BUG_ON(PageFsCache(page));
- }
-#endif
unlock_page(page);
put_page(page);
}
@@ -491,9 +417,6 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping,
index = page->index;
if (add_to_page_cache_lru(page, mapping, index,
readahead_gfp_mask(mapping))) {
-#ifdef CONFIG_AFS_FSCACHE
- fscache_uncache_page(vnode->cache, page);
-#endif
put_page(page);
break;
}
@@ -526,9 +449,6 @@ static int afs_readpages_one(struct file *file, struct address_space *mapping,
for (i = 0; i < req->nr_pages; i++) {
page = req->pages[i];
if (page) {
-#ifdef CONFIG_AFS_FSCACHE
- fscache_uncache_page(vnode->cache, page);
-#endif
SetPageError(page);
unlock_page(page);
}
@@ -560,37 +480,6 @@ static int afs_readpages(struct file *file, struct address_space *mapping,
}

/* attempt to read as many of the pages as possible */
-#ifdef CONFIG_AFS_FSCACHE
- ret = fscache_read_or_alloc_pages(vnode->cache,
- mapping,
- pages,
- &nr_pages,
- afs_file_readpage_read_complete,
- NULL,
- mapping_gfp_mask(mapping));
-#else
- ret = -ENOBUFS;
-#endif
-
- switch (ret) {
- /* all pages are being read from the cache */
- case 0:
- BUG_ON(!list_empty(pages));
- BUG_ON(nr_pages != 0);
- _leave(" = 0 [reading all]");
- return 0;
-
- /* there were pages that couldn't be read from the cache */
- case -ENODATA:
- case -ENOBUFS:
- break;
-
- /* other error */
- default:
- _leave(" = %d", ret);
- return ret;
- }
-
while (!list_empty(pages)) {
ret = afs_readpages_one(file, mapping, pages);
if (ret < 0)
@@ -670,17 +559,6 @@ static void afs_invalidatepage(struct page *page, unsigned int offset,

BUG_ON(!PageLocked(page));

-#ifdef CONFIG_AFS_FSCACHE
- /* we clean up only if the entire page is being invalidated */
- if (offset == 0 && length == PAGE_SIZE) {
- if (PageFsCache(page)) {
- struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
- fscache_wait_on_page_write(vnode->cache, page);
- fscache_uncache_page(vnode->cache, page);
- }
- }
-#endif
-
if (PagePrivate(page))
afs_invalidate_dirty(page, offset, length);

@@ -702,13 +580,6 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags)

/* deny if page is being written to the cache and the caller hasn't
* elected to wait */
-#ifdef CONFIG_AFS_FSCACHE
- if (!fscache_maybe_release_page(vnode->cache, page, gfp_flags)) {
- _leave(" = F [cache busy]");
- return 0;
- }
-#endif
-
if (PagePrivate(page)) {
priv = (unsigned long)detach_page_private(page);
trace_afs_page_dirty(vnode, tracepoint_string("rel"),
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index b0d7b892090d..48edd8d724d2 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -428,7 +428,7 @@ static void afs_get_inode_cache(struct afs_vnode *vnode)
} __packed key;
struct afs_vnode_cache_aux aux;

- if (vnode->status.type == AFS_FTYPE_DIR) {
+ if (vnode->status.type != AFS_FTYPE_FILE) {
vnode->cache = NULL;
return;
}
diff --git a/fs/afs/write.c b/fs/afs/write.c
index c9195fc67fd8..92eaa88000d7 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -847,9 +847,6 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
/* Wait for the page to be written to the cache before we allow it to
* be modified. We then assume the entire page will need writing back.
*/
-#ifdef CONFIG_AFS_FSCACHE
- fscache_wait_on_page_write(vnode->cache, vmf->page);
-#endif

if (PageWriteback(vmf->page) &&
wait_on_page_bit_killable(vmf->page, PG_writeback) < 0)
@@ -936,12 +933,5 @@ int afs_launder_page(struct page *page)
priv = (unsigned long)detach_page_private(page);
trace_afs_page_dirty(vnode, tracepoint_string("laundered"),
page->index, priv);
-
-#ifdef CONFIG_AFS_FSCACHE
- if (PageFsCache(page)) {
- fscache_wait_on_page_write(vnode->cache, page);
- fscache_uncache_page(vnode->cache, page);
- }
-#endif
return ret;
}


2021-01-21 02:10:49

by David Howells

Subject: [PATCH 14/25] afs: Pass page into dirty region helpers to provide THP size

Pass a pointer to the page being accessed into the dirty region helpers so
that the size of the page can be determined in case it's a transparent huge
page.

This also requires the page to be passed into the afs_page_dirty
tracepoint - so there's no need to pass in the index or private data
separately as these can be retrieved directly from the page struct.
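
As a worked example of the effect on the bounds encoding, assuming the
32-bit definitions (__AFS_PAGE_PRIV_SHIFT == 16, so each bound gets 15
bits of page->private) and PAGE_SHIFT == 12:

	resolution = thp_order(page) + PAGE_SHIFT - (__AFS_PAGE_PRIV_SHIFT - 1)

	order-0 page: 0 + 12 - 15 = -3, clamped to 0 (byte-accurate bounds)
	order-9 THP:  9 + 12 - 15 =  6 (bounds rounded to 64-byte granules)

so the from/to pair still packs into the two halves of page->private
whatever the page size.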

Signed-off-by: David Howells <[email protected]>
---

fs/afs/file.c | 20 +++++++--------
fs/afs/internal.h | 16 ++++++------
fs/afs/write.c | 60 ++++++++++++++++++--------------------------
include/trace/events/afs.h | 23 ++++++++++-------
4 files changed, 55 insertions(+), 64 deletions(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 6d43713fde01..21868bfc3a44 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -515,8 +515,8 @@ static void afs_invalidate_dirty(struct page *page, unsigned int offset,
return;

/* We may need to shorten the dirty region */
- f = afs_page_dirty_from(priv);
- t = afs_page_dirty_to(priv);
+ f = afs_page_dirty_from(page, priv);
+ t = afs_page_dirty_to(page, priv);

if (t <= offset || f >= end)
return; /* Doesn't overlap */
@@ -534,17 +534,17 @@ static void afs_invalidate_dirty(struct page *page, unsigned int offset,
if (f == t)
goto undirty;

- priv = afs_page_dirty(f, t);
+ priv = afs_page_dirty(page, f, t);
set_page_private(page, priv);
- trace_afs_page_dirty(vnode, tracepoint_string("trunc"), page->index, priv);
+ trace_afs_page_dirty(vnode, tracepoint_string("trunc"), page);
return;

undirty:
- trace_afs_page_dirty(vnode, tracepoint_string("undirty"), page->index, priv);
+ trace_afs_page_dirty(vnode, tracepoint_string("undirty"), page);
clear_page_dirty_for_io(page);
full_invalidate:
- priv = (unsigned long)detach_page_private(page);
- trace_afs_page_dirty(vnode, tracepoint_string("inval"), page->index, priv);
+ detach_page_private(page);
+ trace_afs_page_dirty(vnode, tracepoint_string("inval"), page);
}

/*
@@ -572,7 +572,6 @@ static void afs_invalidatepage(struct page *page, unsigned int offset,
static int afs_releasepage(struct page *page, gfp_t gfp_flags)
{
struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
- unsigned long priv;

_enter("{{%llx:%llu}[%lu],%lx},%x",
vnode->fid.vid, vnode->fid.vnode, page->index, page->flags,
@@ -581,9 +580,8 @@ static int afs_releasepage(struct page *page, gfp_t gfp_flags)
/* deny if page is being written to the cache and the caller hasn't
* elected to wait */
if (PagePrivate(page)) {
- priv = (unsigned long)detach_page_private(page);
- trace_afs_page_dirty(vnode, tracepoint_string("rel"),
- page->index, priv);
+ detach_page_private(page);
+ trace_afs_page_dirty(vnode, tracepoint_string("rel"), page);
}

/* indicate that the page can be released */
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 0d150a29e39e..cd545e7dbfb8 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -875,31 +875,31 @@ struct afs_vnode_cache_aux {
#define __AFS_PAGE_PRIV_MMAPPED 0x8000UL
#endif

-static inline unsigned int afs_page_dirty_resolution(void)
+static inline unsigned int afs_page_dirty_resolution(struct page *page)
{
- int shift = PAGE_SHIFT - (__AFS_PAGE_PRIV_SHIFT - 1);
+ int shift = thp_order(page) + PAGE_SHIFT - (__AFS_PAGE_PRIV_SHIFT - 1);
return (shift > 0) ? shift : 0;
}

-static inline size_t afs_page_dirty_from(unsigned long priv)
+static inline size_t afs_page_dirty_from(struct page *page, unsigned long priv)
{
unsigned long x = priv & __AFS_PAGE_PRIV_MASK;

/* The lower bound is inclusive */
- return x << afs_page_dirty_resolution();
+ return x << afs_page_dirty_resolution(page);
}

-static inline size_t afs_page_dirty_to(unsigned long priv)
+static inline size_t afs_page_dirty_to(struct page *page, unsigned long priv)
{
unsigned long x = (priv >> __AFS_PAGE_PRIV_SHIFT) & __AFS_PAGE_PRIV_MASK;

/* The upper bound is immediately beyond the region */
- return (x + 1) << afs_page_dirty_resolution();
+ return (x + 1) << afs_page_dirty_resolution(page);
}

-static inline unsigned long afs_page_dirty(size_t from, size_t to)
+static inline unsigned long afs_page_dirty(struct page *page, size_t from, size_t to)
{
- unsigned int res = afs_page_dirty_resolution();
+ unsigned int res = afs_page_dirty_resolution(page);
from >>= res;
to = (to - 1) >> res;
return (to << __AFS_PAGE_PRIV_SHIFT) | from;
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 92eaa88000d7..9d0cef35ecba 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -112,15 +112,14 @@ int afs_write_begin(struct file *file, struct address_space *mapping,
t = f = 0;
if (PagePrivate(page)) {
priv = page_private(page);
- f = afs_page_dirty_from(priv);
- t = afs_page_dirty_to(priv);
+ f = afs_page_dirty_from(page, priv);
+ t = afs_page_dirty_to(page, priv);
ASSERTCMP(f, <=, t);
}

if (f != t) {
if (PageWriteback(page)) {
- trace_afs_page_dirty(vnode, tracepoint_string("alrdy"),
- page->index, priv);
+ trace_afs_page_dirty(vnode, tracepoint_string("alrdy"), page);
goto flush_conflicting_write;
}
/* If the file is being filled locally, allow inter-write
@@ -204,21 +203,19 @@ int afs_write_end(struct file *file, struct address_space *mapping,

if (PagePrivate(page)) {
priv = page_private(page);
- f = afs_page_dirty_from(priv);
- t = afs_page_dirty_to(priv);
+ f = afs_page_dirty_from(page, priv);
+ t = afs_page_dirty_to(page, priv);
if (from < f)
f = from;
if (to > t)
t = to;
- priv = afs_page_dirty(f, t);
+ priv = afs_page_dirty(page, f, t);
set_page_private(page, priv);
- trace_afs_page_dirty(vnode, tracepoint_string("dirty+"),
- page->index, priv);
+ trace_afs_page_dirty(vnode, tracepoint_string("dirty+"), page);
} else {
- priv = afs_page_dirty(from, to);
+ priv = afs_page_dirty(page, from, to);
attach_page_private(page, (void *)priv);
- trace_afs_page_dirty(vnode, tracepoint_string("dirty"),
- page->index, priv);
+ trace_afs_page_dirty(vnode, tracepoint_string("dirty"), page);
}

set_page_dirty(page);
@@ -321,7 +318,6 @@ static void afs_pages_written_back(struct afs_vnode *vnode,
pgoff_t first, pgoff_t last)
{
struct pagevec pv;
- unsigned long priv;
unsigned count, loop;

_enter("{%llx:%llu},{%lx-%lx}",
@@ -340,9 +336,9 @@ static void afs_pages_written_back(struct afs_vnode *vnode,
ASSERTCMP(pv.nr, ==, count);

for (loop = 0; loop < count; loop++) {
- priv = (unsigned long)detach_page_private(pv.pages[loop]);
+ detach_page_private(pv.pages[loop]);
trace_afs_page_dirty(vnode, tracepoint_string("clear"),
- pv.pages[loop]->index, priv);
+ pv.pages[loop]);
end_page_writeback(pv.pages[loop]);
}
first += count;
@@ -516,15 +512,13 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
*/
start = primary_page->index;
priv = page_private(primary_page);
- offset = afs_page_dirty_from(priv);
- to = afs_page_dirty_to(priv);
- trace_afs_page_dirty(vnode, tracepoint_string("store"),
- primary_page->index, priv);
+ offset = afs_page_dirty_from(primary_page, priv);
+ to = afs_page_dirty_to(primary_page, priv);
+ trace_afs_page_dirty(vnode, tracepoint_string("store"), primary_page);

WARN_ON(offset == to);
if (offset == to)
- trace_afs_page_dirty(vnode, tracepoint_string("WARN"),
- primary_page->index, priv);
+ trace_afs_page_dirty(vnode, tracepoint_string("WARN"), primary_page);

if (start >= final_page ||
(to < PAGE_SIZE && !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags)))
@@ -562,8 +556,8 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
}

priv = page_private(page);
- f = afs_page_dirty_from(priv);
- t = afs_page_dirty_to(priv);
+ f = afs_page_dirty_from(page, priv);
+ t = afs_page_dirty_to(page, priv);
if (f != 0 &&
!test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags)) {
unlock_page(page);
@@ -571,8 +565,7 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
}
to = t;

- trace_afs_page_dirty(vnode, tracepoint_string("store+"),
- page->index, priv);
+ trace_afs_page_dirty(vnode, tracepoint_string("store+"), page);

if (!clear_page_dirty_for_io(page))
BUG();
@@ -861,14 +854,13 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
*/
wait_on_page_writeback(vmf->page);

- priv = afs_page_dirty(0, PAGE_SIZE);
+ priv = afs_page_dirty(vmf->page, 0, PAGE_SIZE);
priv = afs_page_dirty_mmapped(priv);
- trace_afs_page_dirty(vnode, tracepoint_string("mkwrite"),
- vmf->page->index, priv);
if (PagePrivate(vmf->page))
set_page_private(vmf->page, priv);
else
attach_page_private(vmf->page, (void *)priv);
+ trace_afs_page_dirty(vnode, tracepoint_string("mkwrite"), vmf->page);
file_update_time(file);

sb_end_pagefault(inode->i_sb);
@@ -921,17 +913,15 @@ int afs_launder_page(struct page *page)
f = 0;
t = PAGE_SIZE;
if (PagePrivate(page)) {
- f = afs_page_dirty_from(priv);
- t = afs_page_dirty_to(priv);
+ f = afs_page_dirty_from(page, priv);
+ t = afs_page_dirty_to(page, priv);
}

- trace_afs_page_dirty(vnode, tracepoint_string("launder"),
- page->index, priv);
+ trace_afs_page_dirty(vnode, tracepoint_string("launder"), page);
ret = afs_store_data(mapping, page->index, page->index, t, f, true);
}

- priv = (unsigned long)detach_page_private(page);
- trace_afs_page_dirty(vnode, tracepoint_string("laundered"),
- page->index, priv);
+ detach_page_private(page);
+ trace_afs_page_dirty(vnode, tracepoint_string("laundered"), page);
return ret;
}
diff --git a/include/trace/events/afs.h b/include/trace/events/afs.h
index 4a5cc8c64be3..9203cf6a8c53 100644
--- a/include/trace/events/afs.h
+++ b/include/trace/events/afs.h
@@ -969,30 +969,33 @@ TRACE_EVENT(afs_dir_check_failed,
);

TRACE_EVENT(afs_page_dirty,
- TP_PROTO(struct afs_vnode *vnode, const char *where,
- pgoff_t page, unsigned long priv),
+ TP_PROTO(struct afs_vnode *vnode, const char *where, struct page *page),

- TP_ARGS(vnode, where, page, priv),
+ TP_ARGS(vnode, where, page),

TP_STRUCT__entry(
__field(struct afs_vnode *, vnode )
__field(const char *, where )
__field(pgoff_t, page )
- __field(unsigned long, priv )
+ __field(unsigned long, from )
+ __field(unsigned long, to )
),

TP_fast_assign(
__entry->vnode = vnode;
__entry->where = where;
- __entry->page = page;
- __entry->priv = priv;
+ __entry->page = page->index;
+ __entry->from = afs_page_dirty_from(page, page->private);
+ __entry->to = afs_page_dirty_to(page, page->private);
+ __entry->to |= (afs_is_page_dirty_mmapped(page->private) ?
+ (1UL << (BITS_PER_LONG - 1)) : 0);
),

- TP_printk("vn=%p %lx %s %zx-%zx%s",
+ TP_printk("vn=%p %lx %s %lx-%lx%s",
__entry->vnode, __entry->page, __entry->where,
- afs_page_dirty_from(__entry->priv),
- afs_page_dirty_to(__entry->priv),
- afs_is_page_dirty_mmapped(__entry->priv) ? " M" : "")
+ __entry->from,
+ __entry->to & ~(1UL << (BITS_PER_LONG - 1)),
+ __entry->to & (1UL << (BITS_PER_LONG - 1)) ? " M" : "")
);

TRACE_EVENT(afs_call_state,


2021-01-21 02:10:54

by David Howells

Subject: [PATCH 10/25] netfs: Add write_begin helper

Add a helper to do the pre-reading work for the netfs write_begin address
space op.
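
In use, a filesystem's own write_begin op becomes little more than a
wrapper; a hypothetical sketch (the myfs names are stand-ins):

	static int myfs_write_begin(struct file *file,
				    struct address_space *mapping,
				    loff_t pos, unsigned int len,
				    unsigned int flags,
				    struct page **pagep, void **fsdata)
	{
		return netfs_write_begin(file, mapping, pos, len, flags,
					 pagep, fsdata,
					 &myfs_netfs_read_ops, NULL);
	}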

Signed-off-by: David Howells <[email protected]>
---

fs/netfs/internal.h | 2 +
fs/netfs/read_helper.c | 165 ++++++++++++++++++++++++++++++++++++++++++
fs/netfs/stats.c | 10 ++-
include/linux/netfs.h | 8 ++
include/trace/events/netfs.h | 4 +
5 files changed, 185 insertions(+), 4 deletions(-)

diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index d83317b1eb9d..3d7ab3ab5743 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -34,8 +34,10 @@ extern atomic_t netfs_n_rh_read_failed;
extern atomic_t netfs_n_rh_zero;
extern atomic_t netfs_n_rh_short_read;
extern atomic_t netfs_n_rh_write;
+extern atomic_t netfs_n_rh_write_begin;
extern atomic_t netfs_n_rh_write_done;
extern atomic_t netfs_n_rh_write_failed;
+extern atomic_t netfs_n_rh_write_zskip;


static inline void netfs_stat(atomic_t *stat)
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 275ac37e2834..eb34c368617d 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -760,3 +760,168 @@ int netfs_readpage(struct file *file,
return ret;
}
EXPORT_SYMBOL(netfs_readpage);
+
+static void netfs_clear_thp(struct page *page)
+{
+ unsigned int i;
+
+ for (i = 0; i < thp_nr_pages(page); i++)
+ clear_highpage(page + i);
+}
+
+/**
+ * netfs_write_begin - Helper to prepare for writing
+ * @file: The file to read from
+ * @mapping: The mapping to read from
+ * @pos: File position at which the write will begin
+ * @len: The length of the write in this page
+ * @flags: AOP_* flags
+ * @_page: Where to put the resultant page
+ * @_fsdata: Place for the netfs to store a cookie
+ * @ops: The network filesystem's operations for the helper to use
+ * @netfs_priv: Private netfs data to be retained in the request
+ *
+ * Pre-read data for a write-begin request by drawing data from the cache if
+ * possible, or the netfs if not. Space beyond the EOF is zero-filled.
+ * Multiple I/O requests from different sources will get munged together. If
+ * necessary, the readahead window can be expanded in either direction to a
+ * more convenient alignment for RPC efficiency or to make storage in the cache
+ * feasible.
+ *
+ * The calling netfs must provide a table of operations, only one of which,
+ * issue_op, is mandatory.
+ *
+ * The check_write_begin() operation can be provided to check for and flush
+ * conflicting writes once the page is grabbed and locked. It is passed a
+ * pointer to the fsdata cookie that gets returned to the VM to be passed to
+ * write_end. It is permitted to sleep. It should return 0 if the request
+ * should go ahead; unlock the page and return -EAGAIN to cause the page to be
+ * regot; or return an error.
+ *
+ * This is usable whether or not caching is enabled.
+ */
+int netfs_write_begin(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned int len, unsigned int flags,
+ struct page **_page, void **_fsdata,
+ const struct netfs_read_request_ops *ops,
+ void *netfs_priv)
+{
+ struct netfs_read_request *rreq;
+ struct page *page, *xpage;
+ struct inode *inode = file_inode(file);
+ unsigned int debug_index = 0;
+ pgoff_t index = pos >> PAGE_SHIFT;
+ int pos_in_page = pos & ~PAGE_MASK;
+ loff_t size;
+ int ret;
+
+ struct readahead_control ractl = {
+ .file = file,
+ .mapping = mapping,
+ ._index = index,
+ ._nr_pages = 0,
+ };
+
+retry:
+ page = grab_cache_page_write_begin(mapping, index, 0);
+ if (!page)
+ return -ENOMEM;
+
+ if (ops->check_write_begin) {
+ /* Allow the netfs (eg. ceph) to flush conflicts. */
+ ret = ops->check_write_begin(file, pos, len, page, _fsdata);
+ if (ret < 0) {
+ if (ret == -EAGAIN)
+ goto retry;
+ goto error;
+ }
+ }
+
+ if (PageUptodate(page))
+ goto have_page;
+
+ /* If the page is beyond the EOF, we want to clear it - unless it's
+ * within the cache granule containing the EOF, in which case we need
+ * to preload the granule.
+ */
+ size = i_size_read(inode);
+ if (!ops->is_cache_enabled(inode) &&
+ ((pos_in_page == 0 && len == thp_size(page)) ||
+ (pos >= size) ||
+ (pos_in_page == 0 && (pos + len) >= size))) {
+ netfs_clear_thp(page);
+ SetPageUptodate(page);
+ netfs_stat(&netfs_n_rh_write_zskip);
+ goto have_page_no_wait;
+ }
+
+ ret = -ENOMEM;
+ rreq = netfs_alloc_read_request(ops, netfs_priv, file);
+ if (!rreq)
+ goto error;
+ rreq->mapping = page->mapping;
+ rreq->start = page->index * PAGE_SIZE;
+ rreq->len = thp_size(page);
+ rreq->no_unlock_page = page->index;
+ __set_bit(NETFS_RREQ_NO_UNLOCK_PAGE, &rreq->flags);
+ netfs_priv = NULL;
+
+ netfs_stat(&netfs_n_rh_write_begin);
+ trace_netfs_read(rreq, pos, len, netfs_read_trace_write_begin);
+
+ /* Expand the request to meet caching requirements and download
+ * preferences.
+ */
+ ractl._nr_pages = thp_nr_pages(page);
+ netfs_rreq_expand(rreq, &ractl);
+ netfs_get_read_request(rreq);
+
+ /* We hold the page locks, so we can drop the references */
+ while ((xpage = readahead_page(&ractl)))
+ if (xpage != page)
+ put_page(xpage);
+
+ atomic_set(&rreq->nr_rd_ops, 1);
+ do {
+ if (!netfs_rreq_submit_slice(rreq, &debug_index))
+ break;
+
+ } while (rreq->submitted < rreq->len);
+
+ // TODO: If we didn't submit enough readage, we need to clean up
+
+ /* Keep nr_rd_ops incremented so that the ref always belongs to us, and
+ * the service code isn't punted off to a random thread pool to
+ * process.
+ */
+ for (;;) {
+ wait_var_event(&rreq->nr_rd_ops, atomic_read(&rreq->nr_rd_ops) == 1);
+ netfs_rreq_assess(rreq);
+ if (!test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags))
+ break;
+ cond_resched();
+ }
+
+ ret = rreq->error;
+ netfs_put_read_request(rreq);
+ if (ret < 0)
+ goto error;
+
+have_page:
+ wait_on_page_fscache(page);
+have_page_no_wait:
+ if (netfs_priv)
+ ops->cleanup(netfs_priv, mapping);
+ *_page = page;
+ _leave(" = 0");
+ return 0;
+
+error:
+ unlock_page(page);
+ put_page(page);
+ if (netfs_priv)
+ ops->cleanup(netfs_priv, mapping);
+ _leave(" = %d", ret);
+ return ret;
+}
+EXPORT_SYMBOL(netfs_write_begin);
diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c
index 3a7a3c10e1cd..cdd09e30ce75 100644
--- a/fs/netfs/stats.c
+++ b/fs/netfs/stats.c
@@ -23,19 +23,23 @@ atomic_t netfs_n_rh_read_failed;
atomic_t netfs_n_rh_zero;
atomic_t netfs_n_rh_short_read;
atomic_t netfs_n_rh_write;
+atomic_t netfs_n_rh_write_begin;
atomic_t netfs_n_rh_write_done;
atomic_t netfs_n_rh_write_failed;
+atomic_t netfs_n_rh_write_zskip;

void netfs_stats_show(struct seq_file *m)
{
- seq_printf(m, "RdHelp : RA=%u RP=%u rr=%u sr=%u\n",
+ seq_printf(m, "RdHelp : RA=%u RP=%u WB=%u rr=%u sr=%u\n",
atomic_read(&netfs_n_rh_readahead),
atomic_read(&netfs_n_rh_readpage),
+ atomic_read(&netfs_n_rh_write_begin),
atomic_read(&netfs_n_rh_rreq),
atomic_read(&netfs_n_rh_sreq));
- seq_printf(m, "RdHelp : ZR=%u sh=%u\n",
+ seq_printf(m, "RdHelp : ZR=%u sh=%u sk=%u\n",
atomic_read(&netfs_n_rh_zero),
- atomic_read(&netfs_n_rh_short_read));
+ atomic_read(&netfs_n_rh_short_read),
+ atomic_read(&netfs_n_rh_write_zskip));
seq_printf(m, "RdHelp : DL=%u ds=%u df=%u di=%u\n",
atomic_read(&netfs_n_rh_download),
atomic_read(&netfs_n_rh_download_done),
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 9a262eb36b0f..1a81baecb182 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -87,11 +87,14 @@ struct netfs_read_request {
* Operations the network filesystem can/must provide to the helpers.
*/
struct netfs_read_request_ops {
+ bool (*is_cache_enabled)(struct inode *inode);
void (*init_rreq)(struct netfs_read_request *rreq, struct file *file);
void (*expand_readahead)(struct netfs_read_request *rreq);
bool (*clamp_length)(struct netfs_read_subrequest *subreq);
void (*issue_op)(struct netfs_read_subrequest *subreq);
bool (*is_still_valid)(struct netfs_read_request *rreq);
+ int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
+ struct page *page, void **_fsdata);
void (*done)(struct netfs_read_request *rreq);
void (*cleanup)(struct address_space *mapping, void *netfs_priv);
};
@@ -104,6 +107,11 @@ extern int netfs_readpage(struct file *,
struct page *,
const struct netfs_read_request_ops *,
void *);
+extern int netfs_write_begin(struct file *, struct address_space *,
+ loff_t, unsigned int, unsigned int, struct page **,
+ void **,
+ const struct netfs_read_request_ops *,
+ void *);

extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t);
extern void netfs_stats_show(struct seq_file *);
diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h
index 7044c981349d..ed54de2ed14d 100644
--- a/include/trace/events/netfs.h
+++ b/include/trace/events/netfs.h
@@ -22,6 +22,7 @@ enum netfs_read_trace {
netfs_read_trace_expanded,
netfs_read_trace_readahead,
netfs_read_trace_readpage,
+ netfs_read_trace_write_begin,
};

enum netfs_rreq_trace {
@@ -50,7 +51,8 @@ enum netfs_sreq_trace {
#define netfs_read_traces \
EM(netfs_read_trace_expanded, "EXPANDED ") \
EM(netfs_read_trace_readahead, "READAHEAD") \
- E_(netfs_read_trace_readpage, "READPAGE ")
+ EM(netfs_read_trace_readpage, "READPAGE ") \
+ E_(netfs_read_trace_write_begin, "WRITEBEGN")

#define netfs_rreq_traces \
EM(netfs_rreq_trace_assess, "ASSESS") \


2021-01-21 02:10:54

by David Howells

Subject: [PATCH 11/25] netfs: Define an interface to talk to a cache

Add an interface to the netfs helper library for reading data from the
cache instead of downloading it from the server, and support for writing
data that has just been downloaded or cleared out to the cache.

The API passes an iov_iter to the cache read/write routines to indicate the
data/buffer to be used. This is done using the ITER_XARRAY type to provide
direct access to the netfs inode's pagecache.

When the netfs's ->begin_cache_operation() method is called, it must fill
in the cache_resources in the netfs_read_request struct, including the
netfs_cache_ops table that the helper lib will use to talk to the cache.
The helper lib does not access the cache directly.
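
For an fscache-backed netfs, ->begin_cache_operation() can simply hand
over to fscache - a sketch, with myfs_cache_cookie() standing in for
however the netfs stores its cookie:

	static int myfs_begin_cache_operation(struct netfs_read_request *rreq)
	{
		return fscache_begin_read_operation(rreq,
						    myfs_cache_cookie(rreq->inode));
	}

If fscache returns -ENOBUFS (no usable cookie), the read helpers just
carry on without the cache; only -ENOMEM, -EINTR and -ERESTARTSYS abort
the operation.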

Signed-off-by: David Howells <[email protected]>
---

fs/netfs/read_helper.c | 217 +++++++++++++++++++++++++++++++++++++++++
include/linux/fscache-cache.h | 4 +
include/linux/fscache.h | 17 +++
include/linux/netfs.h | 48 +++++++++
4 files changed, 285 insertions(+), 1 deletion(-)

diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index eb34c368617d..ca7593148599 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -86,6 +86,8 @@ static void netfs_free_read_request(struct work_struct *work)
if (rreq->netfs_priv)
rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv);
trace_netfs_rreq(rreq, netfs_rreq_trace_free);
+ if (rreq->cache_resources.ops)
+ rreq->cache_resources.ops->end_operation(&rreq->cache_resources);
kfree(rreq);
netfs_stat_d(&netfs_n_rh_rreq);
}
@@ -148,6 +150,32 @@ static void netfs_clear_unread(struct netfs_read_subrequest *subreq)
iov_iter_zero(iov_iter_count(&iter), &iter);
}

+static void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error)
+{
+ struct netfs_read_subrequest *subreq = priv;
+
+ netfs_subreq_terminated(subreq, transferred_or_error);
+}
+
+/*
+ * Issue a read against the cache.
+ * - Eats the caller's ref on subreq.
+ */
+static void netfs_read_from_cache(struct netfs_read_request *rreq,
+ struct netfs_read_subrequest *subreq,
+ bool seek_data)
+{
+ struct netfs_cache_resources *cres = &rreq->cache_resources;
+ struct iov_iter iter;
+
+ iov_iter_xarray(&iter, READ, &rreq->mapping->i_pages,
+ subreq->start + subreq->transferred,
+ subreq->len - subreq->transferred);
+
+ cres->ops->read(cres, subreq->start, &iter, seek_data,
+ netfs_cache_read_terminated, subreq);
+}
+
/*
* Fill a subrequest region with zeroes.
*/
@@ -192,6 +220,128 @@ static void netfs_rreq_completed(struct netfs_read_request *rreq)
netfs_put_read_request(rreq);
}

+/*
+ * Deal with the completion of writing the data to the cache. We have to clear
+ * the PG_fscache bits on the pages involved and release the caller's ref.
+ *
+ * May be called in softirq mode and we inherit a ref from the caller.
+ */
+static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq)
+{
+ struct netfs_read_subrequest *subreq;
+ struct page *page;
+ pgoff_t unlocked = 0;
+ bool have_unlocked = false;
+
+ rcu_read_lock();
+
+ list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
+ XA_STATE(xas, &rreq->mapping->i_pages, subreq->start / PAGE_SIZE);
+
+ xas_for_each(&xas, page, (subreq->start + subreq->len - 1) / PAGE_SIZE) {
+ /* We might have multiple writes from the same huge
+ * page, but we mustn't unlock a page more than once.
+ */
+ if (have_unlocked && page->index <= unlocked)
+ continue;
+ unlocked = page->index;
+ unlock_page_fscache(page);
+ have_unlocked = true;
+ }
+ }
+
+ rcu_read_unlock();
+ netfs_rreq_completed(rreq);
+}
+
+static void netfs_rreq_copy_terminated(void *priv, ssize_t transferred_or_error)
+{
+ struct netfs_read_subrequest *subreq = priv;
+ struct netfs_read_request *rreq = subreq->rreq;
+
+ if (IS_ERR_VALUE(transferred_or_error))
+ subreq->error = transferred_or_error;
+ else
+ subreq->error = 0;
+
+ trace_netfs_sreq(subreq, netfs_sreq_trace_write_term);
+
+ /* If we decrement nr_wr_ops to 0, the ref belongs to us. */
+ if (atomic_dec_and_test(&rreq->nr_wr_ops))
+ netfs_rreq_unmark_after_write(rreq);
+
+ netfs_put_subrequest(subreq);
+}
+
+/*
+ * Perform any outstanding writes to the cache. We inherit a ref from the
+ * caller.
+ */
+static void netfs_rreq_do_write_to_cache(struct netfs_read_request *rreq)
+{
+ struct netfs_cache_resources *cres = &rreq->cache_resources;
+ struct netfs_read_subrequest *subreq, *next, *p;
+ struct iov_iter iter;
+
+ trace_netfs_rreq(rreq, netfs_rreq_trace_write);
+
+ /* We don't want terminating writes trying to wake us up whilst we're
+ * still going through the list.
+ */
+ atomic_inc(&rreq->nr_wr_ops);
+
+ list_for_each_entry_safe(subreq, p, &rreq->subrequests, rreq_link) {
+ if (!test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) {
+ list_del_init(&subreq->rreq_link);
+ netfs_put_subrequest(subreq);
+ }
+ }
+
+ list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
+ /* Amalgamate adjacent writes */
+ while (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) {
+ next = list_next_entry(subreq, rreq_link);
+ if (next->start > subreq->start + subreq->len)
+ break;
+ subreq->len += next->len;
+ list_del_init(&next->rreq_link);
+ netfs_put_subrequest(next);
+ }
+
+ iov_iter_xarray(&iter, WRITE, &rreq->mapping->i_pages,
+ subreq->start, subreq->len);
+
+ atomic_inc(&rreq->nr_wr_ops);
+ netfs_get_read_subrequest(subreq);
+ trace_netfs_sreq(subreq, netfs_sreq_trace_write);
+ cres->ops->write(cres, subreq->start, &iter,
+ netfs_rreq_copy_terminated, subreq);
+ }
+
+ /* If we decrement nr_wr_ops to 0, the usage ref belongs to us. */
+ if (atomic_dec_and_test(&rreq->nr_wr_ops))
+ netfs_rreq_unmark_after_write(rreq);
+}
+
+static void netfs_rreq_write_to_cache_work(struct work_struct *work)
+{
+ struct netfs_read_request *rreq =
+ container_of(work, struct netfs_read_request, work);
+
+ netfs_rreq_do_write_to_cache(rreq);
+}
+
+static void netfs_rreq_write_to_cache(struct netfs_read_request *rreq)
+{
+ if (in_softirq()) {
+ rreq->work.func = netfs_rreq_write_to_cache_work;
+ if (!queue_work(system_unbound_wq, &rreq->work))
+ BUG();
+ } else {
+ netfs_rreq_do_write_to_cache(rreq);
+ }
+}
+
/*
* Unlock the pages in a read operation. We need to set PG_fscache on any
* pages we're going to write back before we unlock them.
@@ -289,7 +439,10 @@ static void netfs_rreq_short_read(struct netfs_read_request *rreq,

netfs_get_read_subrequest(subreq);
atomic_inc(&rreq->nr_rd_ops);
- netfs_read_from_server(rreq, subreq);
+ if (subreq->source == NETFS_READ_FROM_CACHE)
+ netfs_read_from_cache(rreq, subreq, true);
+ else
+ netfs_read_from_server(rreq, subreq);
}

/*
@@ -334,6 +487,25 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq)
return false;
}

+/*
+ * Check to see if the data read is still valid.
+ */
+static void netfs_rreq_is_still_valid(struct netfs_read_request *rreq)
+{
+ struct netfs_read_subrequest *subreq;
+
+ if (!rreq->netfs_ops->is_still_valid ||
+ rreq->netfs_ops->is_still_valid(rreq))
+ return;
+
+ list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
+ if (subreq->source == NETFS_READ_FROM_CACHE) {
+ subreq->error = -ESTALE;
+ __set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags);
+ }
+ }
+}
+
/*
* Assess the state of a read request and decide what to do next.
*
@@ -345,6 +517,8 @@ static void netfs_rreq_assess(struct netfs_read_request *rreq)
trace_netfs_rreq(rreq, netfs_rreq_trace_assess);

again:
+ netfs_rreq_is_still_valid(rreq);
+
if (!test_bit(NETFS_RREQ_FAILED, &rreq->flags) &&
test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags)) {
if (netfs_rreq_perform_resubmissions(rreq))
@@ -357,6 +531,9 @@ static void netfs_rreq_assess(struct netfs_read_request *rreq)
clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
wake_up_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS);

+ if (test_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags))
+ return netfs_rreq_write_to_cache(rreq);
+
netfs_rreq_completed(rreq);
}

@@ -486,7 +663,10 @@ static enum netfs_read_source netfs_cache_prepare_read(struct netfs_read_subrequ
loff_t i_size)
{
struct netfs_read_request *rreq = subreq->rreq;
+ struct netfs_cache_resources *cres = &rreq->cache_resources;

+ if (cres->ops)
+ return cres->ops->prepare_read(subreq, i_size);
if (subreq->start >= rreq->i_size)
return NETFS_FILL_WITH_ZEROES;
return NETFS_DOWNLOAD_FROM_SERVER;
@@ -578,6 +758,9 @@ static bool netfs_rreq_submit_slice(struct netfs_read_request *rreq,
case NETFS_DOWNLOAD_FROM_SERVER:
netfs_read_from_server(rreq, subreq);
break;
+ case NETFS_READ_FROM_CACHE:
+ netfs_read_from_cache(rreq, subreq, false);
+ break;
default:
BUG();
}
@@ -590,9 +773,23 @@ static bool netfs_rreq_submit_slice(struct netfs_read_request *rreq,
return false;
}

+static void netfs_cache_expand_readahead(struct netfs_read_request *rreq,
+ loff_t *_start, size_t *_len, loff_t i_size)
+{
+ struct netfs_cache_resources *cres = &rreq->cache_resources;
+
+ if (cres->ops && cres->ops->expand_readahead)
+ cres->ops->expand_readahead(cres, _start, _len, i_size);
+}
+
static void netfs_rreq_expand(struct netfs_read_request *rreq,
struct readahead_control *ractl)
{
+ /* Give the cache a chance to change the request parameters. The
+ * resultant request must contain the original region.
+ */
+ netfs_cache_expand_readahead(rreq, &rreq->start, &rreq->len, rreq->i_size);
+
/* Give the netfs a chance to change the request parameters. The
* resultant request must contain the original region.
*/
@@ -644,6 +841,7 @@ void netfs_readahead(struct readahead_control *ractl,
struct netfs_read_request *rreq;
struct page *page;
unsigned int debug_index = 0;
+ int ret;

_enter("%lx,%x", readahead_index(ractl), readahead_count(ractl));

@@ -661,6 +859,11 @@ void netfs_readahead(struct readahead_control *ractl,
trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl),
netfs_read_trace_readahead);

+ if (ops->begin_cache_operation) {
+ ret = ops->begin_cache_operation(rreq);
+ if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS)
+ goto cleanup_free;
+ }
netfs_rreq_expand(rreq, ractl);

atomic_set(&rreq->nr_rd_ops, 1);
@@ -686,6 +889,9 @@ void netfs_readahead(struct readahead_control *ractl,
netfs_rreq_assess(rreq);
return;

+cleanup_free:
+ netfs_put_read_request(rreq);
+ return;
cleanup:
if (netfs_priv)
ops->cleanup(ractl->mapping, netfs_priv);
@@ -735,6 +941,14 @@ int netfs_readpage(struct file *file,
netfs_stat(&netfs_n_rh_readpage);
trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage);

+ if (ops->begin_cache_operation) {
+ ret = ops->begin_cache_operation(rreq);
+ if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) {
+ unlock_page(page);
+ goto out;
+ }
+ }
+
netfs_get_read_request(rreq);

atomic_set(&rreq->nr_rd_ops, 1);
@@ -756,6 +970,7 @@ int netfs_readpage(struct file *file,
} while (test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags));

ret = rreq->error;
+out:
netfs_put_read_request(rreq);
return ret;
}
diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 3f0b19dcfae7..3235ddbdcc09 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -304,6 +304,10 @@ struct fscache_cache_ops {

/* dissociate a cache from all the pages it was backing */
void (*dissociate_pages)(struct fscache_cache *cache);
+
+ /* Begin a read operation for the netfs lib */
+ int (*begin_read_operation)(struct netfs_read_request *rreq,
+ struct fscache_retrieval *op);
};

extern struct fscache_cookie fscache_fsdef_index;
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 1f8dc72369ee..0e4161ce347c 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -37,6 +37,7 @@ struct pagevec;
struct fscache_cache_tag;
struct fscache_cookie;
struct fscache_netfs;
+struct netfs_read_request;

typedef void (*fscache_rw_complete_t)(struct page *page,
void *context,
@@ -217,6 +218,7 @@ extern void __fscache_readpages_cancel(struct fscache_cookie *cookie,
extern void __fscache_disable_cookie(struct fscache_cookie *, const void *, bool);
extern void __fscache_enable_cookie(struct fscache_cookie *, const void *, loff_t,
bool (*)(void *), void *);
+extern int __fscache_begin_read_operation(struct netfs_read_request *, struct fscache_cookie *);

/**
* fscache_register_netfs - Register a filesystem as desiring caching services
@@ -831,4 +833,19 @@ void fscache_enable_cookie(struct fscache_cookie *cookie,
can_enable, data);
}

+/**
+ * fscache_begin_read_operation - Begin a read operation for the netfs lib
+ * @rreq: The read request being undertaken
+ * @cookie: The cookie representing the cache object
+ */
+static inline
+int fscache_begin_read_operation(struct netfs_read_request *rreq,
+ struct fscache_cookie *cookie)
+{
+ if (fscache_cookie_valid(cookie) && fscache_cookie_enabled(cookie))
+ return __fscache_begin_read_operation(rreq, cookie);
+ else
+ return -ENOBUFS;
+}
+
#endif /* _LINUX_FSCACHE_H */
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 1a81baecb182..9ca64f458f95 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -32,6 +32,17 @@ enum netfs_read_source {
NETFS_INVALID_READ,
} __mode(byte);

+typedef void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error);
+
+/*
+ * Resources required to do operations on a cache.
+ */
+struct netfs_cache_resources {
+ const struct netfs_cache_ops *ops;
+ void *cache_priv;
+ void *cache_priv2;
+};
+
/*
* Descriptor for a single component subrequest.
*/
@@ -61,11 +72,13 @@ struct netfs_read_request {
struct work_struct work;
struct inode *inode; /* The file being accessed */
struct address_space *mapping; /* The mapping being accessed */
+ struct netfs_cache_resources cache_resources;
struct list_head subrequests; /* Requests to fetch I/O from disk or net */
void *netfs_priv; /* Private data for the netfs */
unsigned int debug_id;
unsigned int cookie_debug_id;
atomic_t nr_rd_ops; /* Number of read ops in progress */
+ atomic_t nr_wr_ops; /* Number of write ops in progress */
size_t submitted; /* Amount submitted for I/O so far */
size_t len; /* Length of the request */
short error; /* 0 or error that occurred */
@@ -89,6 +102,7 @@ struct netfs_read_request {
struct netfs_read_request_ops {
bool (*is_cache_enabled)(struct inode *inode);
void (*init_rreq)(struct netfs_read_request *rreq, struct file *file);
+ int (*begin_cache_operation)(struct netfs_read_request *rreq);
void (*expand_readahead)(struct netfs_read_request *rreq);
bool (*clamp_length)(struct netfs_read_subrequest *subreq);
void (*issue_op)(struct netfs_read_subrequest *subreq);
@@ -99,6 +113,40 @@ struct netfs_read_request_ops {
void (*cleanup)(struct address_space *mapping, void *netfs_priv);
};

+/*
+ * Table of operations for access to a cache. This is obtained by
+ * rreq->ops->begin_cache_operation().
+ */
+struct netfs_cache_ops {
+ /* End an operation */
+ void (*end_operation)(struct netfs_cache_resources *cres);
+
+ /* Read data from the cache */
+ int (*read)(struct netfs_cache_resources *cres,
+ loff_t start_pos,
+ struct iov_iter *iter,
+ bool seek_data,
+ netfs_io_terminated_t term_func,
+ void *term_func_priv);
+
+ /* Write data to the cache */
+ int (*write)(struct netfs_cache_resources *cres,
+ loff_t start_pos,
+ struct iov_iter *iter,
+ netfs_io_terminated_t term_func,
+ void *term_func_priv);
+
+ /* Expand readahead request */
+ void (*expand_readahead)(struct netfs_cache_resources *cres,
+ loff_t *_start, size_t *_len, loff_t i_size);
+
+ /* Prepare a read operation, shortening it to a cached/uncached
+ * boundary as appropriate.
+ */
+ enum netfs_read_source (*prepare_read)(struct netfs_read_subrequest *subreq,
+ loff_t i_size);
+};
+
struct readahead_control;
extern void netfs_readahead(struct readahead_control *,
const struct netfs_read_request_ops *,


2021-01-21 02:11:48

by David Howells

[permalink] [raw]
Subject: [PATCH 12/25] fscache, cachefiles: Add alternate API to use kiocb for read/write to cache

Add an alternate API by which the cache can be accessed through a kiocb,
doing async DIO, rather than using the current API that tells the cache
where all the pages are.

The new API is intended to be used in conjunction with the netfs helper
library. A filesystem must pick one or the other and not mix them.
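
To sketch roughly how a netfs opts in (this is illustrative, not part of
the patch; example_inode_cookie() is hypothetical and stands in for
however the filesystem maps an inode to its fscache cookie), the
filesystem points the ->begin_cache_operation() hook in its
netfs_read_request_ops at something like:

	static int example_begin_cache_operation(struct netfs_read_request *rreq)
	{
		/* example_inode_cookie() is hypothetical: however this
		 * netfs maps an inode to its fscache cookie.
		 */
		struct fscache_cookie *cookie = example_inode_cookie(rreq->inode);

		/* Returns -ENOBUFS if the cookie isn't valid and enabled,
		 * in which case the read helpers just go to the server.
		 */
		return fscache_begin_read_operation(rreq, cookie);
	}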

Signed-off-by: David Howells <[email protected]>
---

fs/cachefiles/Makefile | 1
fs/cachefiles/interface.c | 5 -
fs/cachefiles/internal.h | 9 +
fs/cachefiles/rdwr2.c | 406 +++++++++++++++++++++++++++++++++++++++++++++
fs/fscache/Makefile | 3
fs/fscache/internal.h | 3
fs/fscache/page.c | 2
fs/fscache/page2.c | 116 +++++++++++++
fs/fscache/stats.c | 1
9 files changed, 542 insertions(+), 4 deletions(-)
create mode 100644 fs/cachefiles/rdwr2.c
create mode 100644 fs/fscache/page2.c

diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index 891dedda5905..ea17b169ea5e 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -11,6 +11,7 @@ cachefiles-y := \
main.o \
namei.o \
rdwr.o \
+ rdwr2.o \
security.o \
xattr.o

diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 4cea5fbf695e..eaede2585d07 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -319,8 +319,8 @@ static void cachefiles_drop_object(struct fscache_object *_object)
/*
* dispose of a reference to an object
*/
-static void cachefiles_put_object(struct fscache_object *_object,
- enum fscache_obj_ref_trace why)
+void cachefiles_put_object(struct fscache_object *_object,
+ enum fscache_obj_ref_trace why)
{
struct cachefiles_object *object;
struct fscache_cache *cache;
@@ -568,4 +568,5 @@ const struct fscache_cache_ops cachefiles_cache_ops = {
.uncache_page = cachefiles_uncache_page,
.dissociate_pages = cachefiles_dissociate_pages,
.check_consistency = cachefiles_check_consistency,
+ .begin_read_operation = cachefiles_begin_read_operation,
};
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index cf9bd6401c2d..896819b80bbc 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -150,6 +150,9 @@ extern int cachefiles_has_space(struct cachefiles_cache *cache,
*/
extern const struct fscache_cache_ops cachefiles_cache_ops;

+extern void cachefiles_put_object(struct fscache_object *_object,
+ enum fscache_obj_ref_trace why);
+
/*
* key.c
*/
@@ -217,6 +220,12 @@ extern int cachefiles_allocate_pages(struct fscache_retrieval *,
extern int cachefiles_write_page(struct fscache_storage *, struct page *);
extern void cachefiles_uncache_page(struct fscache_object *, struct page *);

+/*
+ * rdwr2.c
+ */
+extern int cachefiles_begin_read_operation(struct netfs_read_request *,
+ struct fscache_retrieval *);
+
/*
* security.c
*/
diff --git a/fs/cachefiles/rdwr2.c b/fs/cachefiles/rdwr2.c
new file mode 100644
index 000000000000..841dfabf00ea
--- /dev/null
+++ b/fs/cachefiles/rdwr2.c
@@ -0,0 +1,406 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* kiocb-using read/write
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include <linux/mount.h>
+#include <linux/slab.h>
+#include <linux/file.h>
+#include <linux/uio.h>
+#include <linux/sched/mm.h>
+#include <linux/netfs.h>
+#include "internal.h"
+
+struct cachefiles_kiocb {
+ struct kiocb iocb;
+ refcount_t ki_refcnt;
+ loff_t start;
+ union {
+ size_t skipped;
+ size_t len;
+ };
+ netfs_io_terminated_t term_func;
+ void *term_func_priv;
+};
+
+static inline void cachefiles_put_kiocb(struct cachefiles_kiocb *ki)
+{
+ if (refcount_dec_and_test(&ki->ki_refcnt)) {
+ fput(ki->iocb.ki_filp);
+ kfree(ki);
+ }
+}
+
+/*
+ * Handle completion of a read from the cache.
+ */
+static void cachefiles_read_complete(struct kiocb *iocb, long ret, long ret2)
+{
+ struct cachefiles_kiocb *ki = container_of(iocb, struct cachefiles_kiocb, iocb);
+
+ _enter("%ld,%ld", ret, ret2);
+
+ if (ki->term_func) {
+ if (ret < 0)
+ ki->term_func(ki->term_func_priv, ret);
+ else
+ ki->term_func(ki->term_func_priv, ki->skipped + ret);
+ }
+
+ cachefiles_put_kiocb(ki);
+}
+
+/*
+ * Initiate a read from the cache.
+ */
+static int cachefiles_read(struct netfs_cache_resources *cres,
+ loff_t start_pos,
+ struct iov_iter *iter,
+ bool seek_data,
+ netfs_io_terminated_t term_func,
+ void *term_func_priv)
+{
+ struct cachefiles_kiocb *ki;
+ struct file *file = cres->cache_priv2;
+ unsigned int old_nofs;
+ ssize_t ret = -ENOBUFS;
+ size_t len = iov_iter_count(iter), skipped = 0;
+
+ _enter("%pD,%li,%llx,%zx/%llx",
+ file, file_inode(file)->i_ino, start_pos, len,
+ i_size_read(file->f_inode));
+
+ /* If the caller asked us to seek for data before doing the read, then
+ * we should do that now. If we find a gap, we fill it with zeros.
+ */
+ if (seek_data) {
+ loff_t off = start_pos, off2;
+
+ off2 = vfs_llseek(file, off, SEEK_DATA);
+ if (off2 < 0 && off2 >= (loff_t)-MAX_ERRNO && off2 != -ENXIO) {
+ skipped = 0;
+ ret = off2;
+ goto presubmission_error;
+ }
+
+ if (off2 == -ENXIO || off2 >= start_pos + len) {
+ /* The region is beyond the EOF or there's no more data
+ * in the region, so clear the rest of the buffer and
+ * return success.
+ */
+ iov_iter_zero(len, iter);
+ skipped = len;
+ ret = 0;
+ goto presubmission_error;
+ }
+
+ skipped = off2 - off;
+ iov_iter_zero(skipped, iter);
+ }
+
+ ret = -ENOMEM;
+ ki = kzalloc(sizeof(struct cachefiles_kiocb), GFP_KERNEL);
+ if (!ki)
+ goto presubmission_error;
+
+ refcount_set(&ki->ki_refcnt, 2);
+ ki->iocb.ki_filp = file;
+ ki->iocb.ki_pos = start_pos + skipped;
+ ki->iocb.ki_flags = IOCB_DIRECT;
+ ki->iocb.ki_hint = ki_hint_validate(file_write_hint(file));
+ ki->iocb.ki_ioprio = get_current_ioprio();
+ ki->skipped = skipped;
+ ki->term_func = term_func;
+ ki->term_func_priv = term_func_priv;
+
+ if (ki->term_func)
+ ki->iocb.ki_complete = cachefiles_read_complete;
+
+ ret = rw_verify_area(READ, file, &ki->iocb.ki_pos, len - skipped);
+ if (ret < 0)
+ goto presubmission_error_free;
+
+ get_file(ki->iocb.ki_filp);
+
+ old_nofs = memalloc_nofs_save();
+ ret = call_read_iter(file, &ki->iocb, iter);
+ memalloc_nofs_restore(old_nofs);
+ switch (ret) {
+ case -EIOCBQUEUED:
+ goto in_progress;
+
+ case -ERESTARTSYS:
+ case -ERESTARTNOINTR:
+ case -ERESTARTNOHAND:
+ case -ERESTART_RESTARTBLOCK:
+ /* There's no easy way to restart the syscall since other AIOs
+ * may already be running. Just fail this IO with EINTR.
+ */
+ ret = -EINTR;
+ fallthrough;
+ default:
+ cachefiles_read_complete(&ki->iocb, ret, 0);
+ if (ret > 0)
+ ret = 0;
+ break;
+ }
+
+in_progress:
+ cachefiles_put_kiocb(ki);
+ _leave(" = %zd", ret);
+ return ret;
+
+presubmission_error_free:
+ kfree(ki);
+presubmission_error:
+ if (term_func)
+ term_func(term_func_priv, ret < 0 ? ret : skipped);
+ return ret;
+}
+
+/*
+ * Handle completion of a write to the cache.
+ */
+static void cachefiles_write_complete(struct kiocb *iocb, long ret, long ret2)
+{
+ struct cachefiles_kiocb *ki = container_of(iocb, struct cachefiles_kiocb, iocb);
+ struct inode *inode = file_inode(ki->iocb.ki_filp);
+
+ _enter("%ld,%ld", ret, ret2);
+
+ /* Tell lockdep we inherited freeze protection from submission thread */
+ __sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
+ __sb_end_write(inode->i_sb, SB_FREEZE_WRITE);
+
+ if (ki->term_func)
+ ki->term_func(ki->term_func_priv, ret);
+
+ cachefiles_put_kiocb(ki);
+}
+
+/*
+ * Initiate a write to the cache.
+ */
+static int cachefiles_write(struct netfs_cache_resources *cres,
+ loff_t start_pos,
+ struct iov_iter *iter,
+ netfs_io_terminated_t term_func,
+ void *term_func_priv)
+{
+ struct cachefiles_kiocb *ki;
+ struct inode *inode;
+ struct file *file = cres->cache_priv2;
+ unsigned int old_nofs;
+ ssize_t ret = -ENOBUFS;
+ size_t len = iov_iter_count(iter);
+
+ _enter("%pD,%li,%llx,%zx/%llx",
+ file, file_inode(file)->i_ino, start_pos, len,
+ i_size_read(file->f_inode));
+
+ ret = -ENOMEM;
+ ki = kzalloc(sizeof(struct cachefiles_kiocb), GFP_KERNEL);
+ if (!ki)
+ goto presubmission_error;
+
+ refcount_set(&ki->ki_refcnt, 2);
+ ki->iocb.ki_filp = file;
+ ki->iocb.ki_pos = start_pos;
+ ki->iocb.ki_flags = IOCB_DIRECT | IOCB_WRITE;
+ ki->iocb.ki_hint = ki_hint_validate(file_write_hint(file));
+ ki->iocb.ki_ioprio = get_current_ioprio();
+ ki->start = start_pos;
+ ki->len = len;
+ ki->term_func = term_func;
+ ki->term_func_priv = term_func_priv;
+
+ if (ki->term_func)
+ ki->iocb.ki_complete = cachefiles_write_complete;
+
+ ret = rw_verify_area(WRITE, file, &ki->iocb.ki_pos, iov_iter_count(iter));
+ if (ret < 0)
+ goto presubmission_error_free;
+
+ /* Open-code file_start_write here to grab freeze protection, which
+ * will be released by another thread in aio_complete_rw(). Fool
+ * lockdep by telling it the lock got released so that it doesn't
+ * complain about the held lock when we return to userspace.
+ */
+ inode = file_inode(file);
+ __sb_start_write(inode->i_sb, SB_FREEZE_WRITE);
+ __sb_writers_release(inode->i_sb, SB_FREEZE_WRITE);
+
+ get_file(ki->iocb.ki_filp);
+
+ old_nofs = memalloc_nofs_save();
+ ret = call_write_iter(file, &ki->iocb, iter);
+ memalloc_nofs_restore(old_nofs);
+ switch (ret) {
+ case -EIOCBQUEUED:
+ goto in_progress;
+
+ case -ERESTARTSYS:
+ case -ERESTARTNOINTR:
+ case -ERESTARTNOHAND:
+ case -ERESTART_RESTARTBLOCK:
+ /* There's no easy way to restart the syscall since other AIOs
+ * may already be running. Just fail this IO with EINTR.
+ */
+ ret = -EINTR;
+ fallthrough;
+ default:
+ cachefiles_write_complete(&ki->iocb, ret, 0);
+ if (ret > 0)
+ ret = 0;
+ break;
+ }
+
+in_progress:
+ cachefiles_put_kiocb(ki);
+ _leave(" = %zd", ret);
+ return ret;
+
+presubmission_error_free:
+ kfree(ki);
+presubmission_error:
+ if (term_func)
+ term_func(term_func_priv, ret < 0 ? ret : 0);
+ return ret;
+}
+
+/*
+ * Prepare a read operation, shortening it to a cached/uncached
+ * boundary as appropriate.
+ */
+static enum netfs_read_source cachefiles_prepare_read(struct netfs_read_subrequest *subreq,
+ loff_t i_size)
+{
+ struct fscache_retrieval *op = subreq->rreq->cache_resources.cache_priv;
+ struct cachefiles_object *object;
+ struct cachefiles_cache *cache;
+ const struct cred *saved_cred;
+ struct file *file = subreq->rreq->cache_resources.cache_priv2;
+ loff_t off;
+
+ _enter("%zx @%llx/%llx", subreq->len, subreq->start, i_size);
+
+ object = container_of(op->op.object,
+ struct cachefiles_object, fscache);
+ cache = container_of(object->fscache.cache,
+ struct cachefiles_cache, cache);
+
+ if (!file)
+ goto cache_fail_nosec;
+
+ if (subreq->start >= i_size)
+ return NETFS_FILL_WITH_ZEROES;
+
+ cachefiles_begin_secure(cache, &saved_cred);
+
+ off = vfs_llseek(file, subreq->start, SEEK_DATA);
+ if (off < 0 && off >= (loff_t)-MAX_ERRNO) {
+ if (off == (loff_t)-ENXIO)
+ goto download_and_store;
+ goto cache_fail;
+ }
+
+ if (off >= subreq->start + subreq->len)
+ goto download_and_store;
+
+ if (off > subreq->start) {
+ subreq->len = off - subreq->start;
+ goto download_and_store;
+ }
+
+ off = vfs_llseek(file, subreq->start, SEEK_HOLE);
+ if (off < 0 && off >= (loff_t)-MAX_ERRNO)
+ goto cache_fail;
+
+ if (off < subreq->start + subreq->len)
+ subreq->len = off - subreq->start;
+
+ cachefiles_end_secure(cache, saved_cred);
+ return NETFS_READ_FROM_CACHE;
+
+download_and_store:
+ if (cachefiles_has_space(cache, 0, (subreq->len + PAGE_SIZE - 1) / PAGE_SIZE) == 0)
+ __set_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags);
+cache_fail:
+ cachefiles_end_secure(cache, saved_cred);
+cache_fail_nosec:
+ return NETFS_DOWNLOAD_FROM_SERVER;
+}
+
+/*
+ * Clean up an operation.
+ */
+static void cachefiles_end_operation(struct netfs_cache_resources *cres)
+{
+ struct fscache_retrieval *op = cres->cache_priv;
+ struct file *file = cres->cache_priv2;
+
+ _enter("");
+
+ if (file)
+ fput(file);
+ if (op) {
+ fscache_op_complete(&op->op, false);
+ fscache_put_retrieval(op);
+ }
+
+ _leave("");
+}
+
+static const struct netfs_cache_ops cachefiles_netfs_cache_ops = {
+ .end_operation = cachefiles_end_operation,
+ .read = cachefiles_read,
+ .write = cachefiles_write,
+ .expand_readahead = NULL,
+ .prepare_read = cachefiles_prepare_read,
+};
+
+/*
+ * Open the cache file when beginning a cache operation.
+ */
+int cachefiles_begin_read_operation(struct netfs_read_request *rreq,
+ struct fscache_retrieval *op)
+{
+ struct cachefiles_object *object;
+ struct cachefiles_cache *cache;
+ struct path path;
+ struct file *file;
+
+ _enter("");
+
+ object = container_of(op->op.object,
+ struct cachefiles_object, fscache);
+ cache = container_of(object->fscache.cache,
+ struct cachefiles_cache, cache);
+
+ path.mnt = cache->mnt;
+ path.dentry = object->backer;
+ file = open_with_fake_path(&path, O_RDWR | O_LARGEFILE | O_DIRECT,
+ d_inode(object->backer), cache->cache_cred);
+ if (IS_ERR(file))
+ return PTR_ERR(file);
+ if (!S_ISREG(file_inode(file)->i_mode))
+ goto error_file;
+ if (unlikely(!file->f_op->read_iter) ||
+ unlikely(!file->f_op->write_iter)) {
+ pr_notice("Cache does not support read_iter and write_iter\n");
+ goto error_file;
+ }
+
+ fscache_get_retrieval(op);
+ rreq->cache_resources.cache_priv = op;
+ rreq->cache_resources.cache_priv2 = file;
+ rreq->cache_resources.ops = &cachefiles_netfs_cache_ops;
+ rreq->cookie_debug_id = object->fscache.debug_id;
+ _leave("");
+ return 0;
+
+error_file:
+ fput(file);
+ return -EIO;
+}
diff --git a/fs/fscache/Makefile b/fs/fscache/Makefile
index 79e08e05ef84..38f28fec2aa3 100644
--- a/fs/fscache/Makefile
+++ b/fs/fscache/Makefile
@@ -11,7 +11,8 @@ fscache-y := \
netfs.o \
object.o \
operation.o \
- page.o
+ page.o \
+ page2.o

fscache-$(CONFIG_PROC_FS) += proc.o
fscache-$(CONFIG_FSCACHE_STATS) += stats.o
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 08e91efbce53..c42672cadf2d 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -142,6 +142,9 @@ extern int fscache_wait_for_operation_activation(struct fscache_object *,
atomic_t *,
atomic_t *);
extern void fscache_invalidate_writes(struct fscache_cookie *);
+extern struct fscache_retrieval *fscache_alloc_retrieval(struct fscache_cookie *,
+ struct address_space *,
+ fscache_rw_complete_t, void *);

/*
* proc.c
diff --git a/fs/fscache/page.c b/fs/fscache/page.c
index 26af6fdf1538..991b0a871744 100644
--- a/fs/fscache/page.c
+++ b/fs/fscache/page.c
@@ -299,7 +299,7 @@ static void fscache_release_retrieval_op(struct fscache_operation *_op)
/*
* allocate a retrieval op
*/
-static struct fscache_retrieval *fscache_alloc_retrieval(
+struct fscache_retrieval *fscache_alloc_retrieval(
struct fscache_cookie *cookie,
struct address_space *mapping,
fscache_rw_complete_t end_io_func,
diff --git a/fs/fscache/page2.c b/fs/fscache/page2.c
new file mode 100644
index 000000000000..e3bc7178d99f
--- /dev/null
+++ b/fs/fscache/page2.c
@@ -0,0 +1,116 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Cache data I/O routines
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#define FSCACHE_DEBUG_LEVEL PAGE
+#include <linux/module.h>
+#include <linux/fscache-cache.h>
+#include <linux/slab.h>
+#include <linux/netfs.h>
+#include "internal.h"
+
+/*
+ * Start a cache read operation.
+ * - we return:
+ * -ENOMEM - out of memory, the operation was not started
+ * -ERESTARTSYS - interrupted, the operation was not started
+ * -ENOBUFS - no backing object available in which to cache the data
+ * 0 - the operation was begun and the cache ops table installed
+ * - other negative errors may be returned by the backing cache
+ */
+int __fscache_begin_read_operation(struct netfs_read_request *rreq,
+ struct fscache_cookie *cookie)
+{
+ struct fscache_retrieval *op;
+ struct fscache_object *object;
+ bool wake_cookie = false;
+ int ret;
+
+ _enter("rr=%08x", rreq->debug_id);
+
+ fscache_stat(&fscache_n_retrievals);
+
+ if (hlist_empty(&cookie->backing_objects))
+ goto nobufs;
+
+ if (test_bit(FSCACHE_COOKIE_INVALIDATING, &cookie->flags)) {
+ _leave(" = -ENOBUFS [invalidating]");
+ return -ENOBUFS;
+ }
+
+ ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
+
+ if (fscache_wait_for_deferred_lookup(cookie) < 0)
+ return -ERESTARTSYS;
+
+ op = fscache_alloc_retrieval(cookie, NULL, NULL, NULL);
+ if (!op)
+ return -ENOMEM;
+ //atomic_set(&op->n_pages, 1);
+ trace_fscache_page_op(cookie, NULL, &op->op, fscache_page_op_retr_multi);
+
+ spin_lock(&cookie->lock);
+
+ if (!fscache_cookie_enabled(cookie) ||
+ hlist_empty(&cookie->backing_objects))
+ goto nobufs_unlock;
+ object = hlist_entry(cookie->backing_objects.first,
+ struct fscache_object, cookie_link);
+
+ __fscache_use_cookie(cookie);
+ atomic_inc(&object->n_reads);
+ __set_bit(FSCACHE_OP_DEC_READ_CNT, &op->op.flags);
+
+ if (fscache_submit_op(object, &op->op) < 0)
+ goto nobufs_unlock_dec;
+ spin_unlock(&cookie->lock);
+
+ fscache_stat(&fscache_n_retrieval_ops);
+
+ /* we wait for the operation to become active, and then process it
+ * *here*, in this thread, and not in the thread pool */
+ ret = fscache_wait_for_operation_activation(
+ object, &op->op,
+ __fscache_stat(&fscache_n_retrieval_op_waits),
+ __fscache_stat(&fscache_n_retrievals_object_dead));
+ if (ret < 0)
+ goto error;
+
+ /* ask the cache to honour the operation */
+ ret = object->cache->ops->begin_read_operation(rreq, op);
+
+error:
+ if (ret == -ENOMEM)
+ fscache_stat(&fscache_n_retrievals_nomem);
+ else if (ret == -ERESTARTSYS)
+ fscache_stat(&fscache_n_retrievals_intr);
+ else if (ret == -ENODATA)
+ fscache_stat(&fscache_n_retrievals_nodata);
+ else if (ret < 0)
+ fscache_stat(&fscache_n_retrievals_nobufs);
+ else
+ fscache_stat(&fscache_n_retrievals_ok);
+
+ fscache_put_retrieval(op);
+ _leave(" = %d", ret);
+ return ret;
+
+nobufs_unlock_dec:
+ atomic_dec(&object->n_reads);
+ wake_cookie = __fscache_unuse_cookie(cookie);
+nobufs_unlock:
+ spin_unlock(&cookie->lock);
+ fscache_put_retrieval(op);
+ if (wake_cookie)
+ __fscache_wake_unused_cookie(cookie);
+nobufs:
+ fscache_stat(&fscache_n_retrievals_nobufs);
+ _leave(" = -ENOBUFS");
+ return -ENOBUFS;
+}
+EXPORT_SYMBOL(__fscache_begin_read_operation);
diff --git a/fs/fscache/stats.c b/fs/fscache/stats.c
index a5aa93ece8c5..a7c3ed89a3e0 100644
--- a/fs/fscache/stats.c
+++ b/fs/fscache/stats.c
@@ -278,5 +278,6 @@ int fscache_stats_show(struct seq_file *m, void *v)
atomic_read(&fscache_n_cache_stale_objects),
atomic_read(&fscache_n_cache_retired_objects),
atomic_read(&fscache_n_cache_culled_objects));
+ netfs_stats_show(m);
return 0;
}


2021-01-21 02:11:53

by David Howells

[permalink] [raw]
Subject: [PATCH 22/25] afs: Extract writeback extension into its own function

Extract writeback extension into its own function to break up the writeback
function a bit.

Signed-off-by: David Howells <[email protected]>
---

fs/afs/write.c | 109 ++++++++++++++++++++++++++++++++++----------------------
1 file changed, 67 insertions(+), 42 deletions(-)

diff --git a/fs/afs/write.c b/fs/afs/write.c
index e1791de90478..89c804bfe253 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -490,47 +490,25 @@ static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter,
}

/*
- * Synchronously write back the locked page and any subsequent non-locked dirty
- * pages.
+ * Extend the region to be written back to include subsequent contiguously
+ * dirty pages if possible, but don't sleep while doing so.
+ *
+ * If this page holds new content, then we can include filler zeros in the
+ * writeback.
*/
-static int afs_write_back_from_locked_page(struct address_space *mapping,
- struct writeback_control *wbc,
- struct page *primary_page,
- pgoff_t final_page)
+static void afs_extend_writeback(struct address_space *mapping,
+ struct afs_vnode *vnode,
+ long *_count,
+ pgoff_t start,
+ pgoff_t final_page,
+ unsigned *_offset,
+ unsigned *_to,
+ bool new_content)
{
- struct afs_vnode *vnode = AFS_FS_I(mapping->host);
- struct iov_iter iter;
struct page *pages[8], *page;
- unsigned long count, priv;
- unsigned n, offset, to, f, t;
- pgoff_t start, first, last;
- loff_t i_size, pos, end;
- int loop, ret;
-
- _enter(",%lx", primary_page->index);
-
- count = 1;
- if (test_set_page_writeback(primary_page))
- BUG();
-
- /* Find all consecutive lockable dirty pages that have contiguous
- * written regions, stopping when we find a page that is not
- * immediately lockable, is not dirty or is missing, or we reach the
- * end of the range.
- */
- start = primary_page->index;
- priv = page_private(primary_page);
- offset = afs_page_dirty_from(primary_page, priv);
- to = afs_page_dirty_to(primary_page, priv);
- trace_afs_page_dirty(vnode, tracepoint_string("store"), primary_page);
-
- WARN_ON(offset == to);
- if (offset == to)
- trace_afs_page_dirty(vnode, tracepoint_string("WARN"), primary_page);
-
- if (start >= final_page ||
- (to < PAGE_SIZE && !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags)))
- goto no_more;
+ unsigned long count = *_count, priv;
+ unsigned offset = *_offset, to = *_to, n, f, t;
+ int loop;

start++;
do {
@@ -551,8 +529,7 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,

for (loop = 0; loop < n; loop++) {
page = pages[loop];
- if (to != PAGE_SIZE &&
- !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags))
+ if (to != PAGE_SIZE && !new_content)
break;
if (page->index > final_page)
break;
@@ -566,8 +543,7 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
priv = page_private(page);
f = afs_page_dirty_from(page, priv);
t = afs_page_dirty_to(page, priv);
- if (f != 0 &&
- !test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags)) {
+ if (f != 0 && !new_content) {
unlock_page(page);
break;
}
@@ -593,6 +569,55 @@ static int afs_write_back_from_locked_page(struct address_space *mapping,
} while (start <= final_page && count < 65536);

no_more:
+ *_count = count;
+ *_offset = offset;
+ *_to = to;
+}
+
+/*
+ * Synchronously write back the locked page and any subsequent non-locked dirty
+ * pages.
+ */
+static int afs_write_back_from_locked_page(struct address_space *mapping,
+ struct writeback_control *wbc,
+ struct page *primary_page,
+ pgoff_t final_page)
+{
+ struct afs_vnode *vnode = AFS_FS_I(mapping->host);
+ struct iov_iter iter;
+ unsigned long count, priv;
+ unsigned offset, to;
+ pgoff_t start, first, last;
+ loff_t i_size, pos, end;
+ bool new_content = test_bit(AFS_VNODE_NEW_CONTENT, &vnode->flags);
+ int ret;
+
+ _enter(",%lx", primary_page->index);
+
+ count = 1;
+ if (test_set_page_writeback(primary_page))
+ BUG();
+
+ /* Find all consecutive lockable dirty pages that have contiguous
+ * written regions, stopping when we find a page that is not
+ * immediately lockable, is not dirty or is missing, or we reach the
+ * end of the range.
+ */
+ start = primary_page->index;
+ priv = page_private(primary_page);
+ offset = afs_page_dirty_from(primary_page, priv);
+ to = afs_page_dirty_to(primary_page, priv);
+ trace_afs_page_dirty(vnode, tracepoint_string("store"), primary_page);
+
+ WARN_ON(offset == to);
+ if (offset == to)
+ trace_afs_page_dirty(vnode, tracepoint_string("WARN"), primary_page);
+
+ if (start < final_page &&
+ (to == PAGE_SIZE || new_content))
+ afs_extend_writeback(mapping, vnode, &count, start, final_page,
+ &offset, &to, new_content);
+
/* We now have a contiguous set of dirty pages, each with writeback
* set; the first page is still locked at this point, but all the rest
* have been unlocked.


2021-01-21 02:46:40

by David Howells

[permalink] [raw]
Subject: [PATCH 04/25] vfs: Export rw_verify_area() for use by cachefiles

Export rw_verify_area() so that cachefiles can use it before issuing
call_read_iter() and call_write_iter() to effect async DIO operations
against the cache.
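
For reference, the pattern this enables (condensed from cachefiles_read()
in the kiocb patch elsewhere in this series) is:

	ret = rw_verify_area(READ, file, &ki->iocb.ki_pos, len);
	if (ret < 0)
		goto presubmission_error_free;

	ret = call_read_iter(file, &ki->iocb, iter);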

Signed-off-by: David Howells <[email protected]>
---

fs/internal.h | 5 -----
fs/read_write.c | 1 +
include/linux/fs.h | 1 +
3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 77c50befbfbe..92e686249c40 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -164,11 +164,6 @@ extern char *simple_dname(struct dentry *, char *, int);
extern void dput_to_list(struct dentry *, struct list_head *);
extern void shrink_dentry_list(struct list_head *);

-/*
- * read_write.c
- */
-extern int rw_verify_area(int, struct file *, const loff_t *, size_t);
-
/*
* pipe.c
*/
diff --git a/fs/read_write.c b/fs/read_write.c
index 75f764b43418..fe84e11245bd 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -400,6 +400,7 @@ int rw_verify_area(int read_write, struct file *file, const loff_t *ppos, size_t
return security_file_permission(file,
read_write == READ ? MAY_READ : MAY_WRITE);
}
+EXPORT_SYMBOL(rw_verify_area);

static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
{
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fd47deea7c17..493804856ab3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2760,6 +2760,7 @@ extern int notify_change(struct dentry *, struct iattr *, struct inode **);
extern int inode_permission(struct inode *, int);
extern int generic_permission(struct inode *, int);
extern int __check_sticky(struct inode *dir, struct inode *inode);
+extern int rw_verify_area(int, struct file *, const loff_t *, size_t);

static inline bool execute_ok(struct inode *inode)
{


2021-01-21 03:02:29

by David Howells

[permalink] [raw]
Subject: [PATCH 09/25] netfs: Gather stats

Gather statistics from the netfs interface that can be exported through a
seqfile. The stats-show routine is intended to be hooked up by a later
patch so that the figures appear when viewing /proc/fs/fscache/stats.
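
For illustration, the lines this adds to that file look like the
following (the counts here are invented for the example; see
netfs_stats_show() below for the actual format strings):

	RdHelp : RA=960 RP=12 rr=4 sr=27
	RdHelp : ZR=0 sh=1
	RdHelp : DL=812 ds=812 df=0 di=0
	RdHelp : RD=160 rs=160 rf=0
	RdHelp : WR=160 ws=160 wf=0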

Signed-off-by: David Howells <[email protected]>
---

fs/netfs/Kconfig | 15 ++++++++++++++
fs/netfs/Makefile | 3 +--
fs/netfs/internal.h | 34 +++++++++++++++++++++++++++++++
fs/netfs/read_helper.c | 23 +++++++++++++++++++++
fs/netfs/stats.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/netfs.h | 1 +
6 files changed, 127 insertions(+), 2 deletions(-)
create mode 100644 fs/netfs/stats.c

diff --git a/fs/netfs/Kconfig b/fs/netfs/Kconfig
index 2ebf90e6ca95..578112713703 100644
--- a/fs/netfs/Kconfig
+++ b/fs/netfs/Kconfig
@@ -6,3 +6,18 @@ config NETFS_SUPPORT
This option enables support for network filesystems, including
helpers for high-level buffered I/O, abstracting out read
segmentation, local caching and transparent huge page support.
+
+config NETFS_STATS
+ bool "Gather statistical information on local caching"
+ depends on NETFS_SUPPORT && PROC_FS
+ help
+ This option causes statistical information to be gathered on local
+ caching and exported through file:
+
+ /proc/fs/fscache/stats
+
+ The gathering of statistics adds a certain amount of overhead to
+ execution as there are quite a few stats gathered, and on a
+ multi-CPU system these may be on cachelines that keep bouncing
+ between CPUs. On the other hand, the stats are very useful for
+ debugging purposes. Saying 'Y' here is recommended.
diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile
index 4b4eff2ba369..c15bfc966d96 100644
--- a/fs/netfs/Makefile
+++ b/fs/netfs/Makefile
@@ -1,6 +1,5 @@
# SPDX-License-Identifier: GPL-2.0

-netfs-y := \
- read_helper.o
+netfs-y := read_helper.o stats.o

obj-$(CONFIG_NETFS_SUPPORT) := netfs.o
diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h
index b07f43f9130e..d83317b1eb9d 100644
--- a/fs/netfs/internal.h
+++ b/fs/netfs/internal.h
@@ -16,8 +16,42 @@
*/
extern unsigned int netfs_debug;

+/*
+ * stats.c
+ */
+#ifdef CONFIG_NETFS_STATS
+extern atomic_t netfs_n_rh_readahead;
+extern atomic_t netfs_n_rh_readpage;
+extern atomic_t netfs_n_rh_rreq;
+extern atomic_t netfs_n_rh_sreq;
+extern atomic_t netfs_n_rh_download;
+extern atomic_t netfs_n_rh_download_done;
+extern atomic_t netfs_n_rh_download_failed;
+extern atomic_t netfs_n_rh_download_instead;
+extern atomic_t netfs_n_rh_read;
+extern atomic_t netfs_n_rh_read_done;
+extern atomic_t netfs_n_rh_read_failed;
+extern atomic_t netfs_n_rh_zero;
+extern atomic_t netfs_n_rh_short_read;
+extern atomic_t netfs_n_rh_write;
+extern atomic_t netfs_n_rh_write_done;
+extern atomic_t netfs_n_rh_write_failed;
+
+
+static inline void netfs_stat(atomic_t *stat)
+{
+ atomic_inc(stat);
+}
+
+static inline void netfs_stat_d(atomic_t *stat)
+{
+ atomic_dec(stat);
+}
+
+#else
#define netfs_stat(x) do {} while(0)
#define netfs_stat_d(x) do {} while(0)
+#endif

/*****************************************************************************/
/*
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 2a90314ea36f..275ac37e2834 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -55,6 +55,7 @@ static struct netfs_read_request *netfs_alloc_read_request(
refcount_set(&rreq->usage, 1);
__set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
ops->init_rreq(rreq, file);
+ netfs_stat(&netfs_n_rh_rreq);
}

return rreq;
@@ -86,6 +87,7 @@ static void netfs_free_read_request(struct work_struct *work)
rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv);
trace_netfs_rreq(rreq, netfs_rreq_trace_free);
kfree(rreq);
+ netfs_stat_d(&netfs_n_rh_rreq);
}

static void netfs_put_read_request(struct netfs_read_request *rreq)
@@ -114,6 +116,7 @@ static struct netfs_read_subrequest *netfs_alloc_subrequest(
INIT_LIST_HEAD(&subreq->rreq_link);
refcount_set(&subreq->usage, 2);
subreq->rreq = rreq;
+ netfs_stat(&netfs_n_rh_sreq);
}

return subreq;
@@ -129,6 +132,7 @@ static void __netfs_put_subrequest(struct netfs_read_subrequest *subreq)
trace_netfs_sreq(subreq, netfs_sreq_trace_free);
netfs_put_read_request(subreq->rreq);
kfree(subreq);
+ netfs_stat_d(&netfs_n_rh_sreq);
}

/*
@@ -150,6 +154,7 @@ static void netfs_clear_unread(struct netfs_read_subrequest *subreq)
static void netfs_fill_with_zeroes(struct netfs_read_request *rreq,
struct netfs_read_subrequest *subreq)
{
+ netfs_stat(&netfs_n_rh_zero);
__set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
netfs_subreq_terminated(subreq, 0);
}
@@ -173,6 +178,7 @@ static void netfs_fill_with_zeroes(struct netfs_read_request *rreq,
static void netfs_read_from_server(struct netfs_read_request *rreq,
struct netfs_read_subrequest *subreq)
{
+ netfs_stat(&netfs_n_rh_download);
rreq->netfs_ops->issue_op(subreq);
}

@@ -278,6 +284,7 @@ static void netfs_rreq_short_read(struct netfs_read_request *rreq,
__clear_bit(NETFS_SREQ_SHORT_READ, &subreq->flags);
__set_bit(NETFS_SREQ_SEEK_DATA_READ, &subreq->flags);

+ netfs_stat(&netfs_n_rh_short_read);
trace_netfs_sreq(subreq, netfs_sreq_trace_resubmit_short);

netfs_get_read_subrequest(subreq);
@@ -309,6 +316,7 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq)
break;
subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
subreq->error = 0;
+ netfs_stat(&netfs_n_rh_download_instead);
trace_netfs_sreq(subreq, netfs_sreq_trace_download_instead);
netfs_get_read_subrequest(subreq);
atomic_inc(&rreq->nr_rd_ops);
@@ -399,6 +407,17 @@ void netfs_subreq_terminated(struct netfs_read_subrequest *subreq,
subreq->debug_index, subreq->start, subreq->flags,
transferred_or_error);

+ switch (subreq->source) {
+ case NETFS_READ_FROM_CACHE:
+ netfs_stat(&netfs_n_rh_read_done);
+ break;
+ case NETFS_DOWNLOAD_FROM_SERVER:
+ netfs_stat(&netfs_n_rh_download_done);
+ break;
+ default:
+ break;
+ }
+
if (IS_ERR_VALUE(transferred_or_error)) {
subreq->error = transferred_or_error;
goto failed;
@@ -452,8 +471,10 @@ void netfs_subreq_terminated(struct netfs_read_subrequest *subreq,

failed:
if (subreq->source == NETFS_READ_FROM_CACHE) {
+ netfs_stat(&netfs_n_rh_read_failed);
set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags);
} else {
+ netfs_stat(&netfs_n_rh_download_failed);
set_bit(NETFS_RREQ_FAILED, &rreq->flags);
rreq->error = subreq->error;
}
@@ -636,6 +657,7 @@ void netfs_readahead(struct readahead_control *ractl,
rreq->start = readahead_pos(ractl);
rreq->len = readahead_length(ractl);

+ netfs_stat(&netfs_n_rh_readahead);
trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl),
netfs_read_trace_readahead);

@@ -710,6 +732,7 @@ int netfs_readpage(struct file *file,
rreq->start = page->index * PAGE_SIZE;
rreq->len = thp_size(page);

+ netfs_stat(&netfs_n_rh_readpage);
trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage);

netfs_get_read_request(rreq);
diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c
new file mode 100644
index 000000000000..3a7a3c10e1cd
--- /dev/null
+++ b/fs/netfs/stats.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Netfs support statistics
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([email protected])
+ */
+
+#include <linux/seq_file.h>
+#include <linux/netfs.h>
+#include "internal.h"
+
+atomic_t netfs_n_rh_readahead;
+atomic_t netfs_n_rh_readpage;
+atomic_t netfs_n_rh_rreq;
+atomic_t netfs_n_rh_sreq;
+atomic_t netfs_n_rh_download;
+atomic_t netfs_n_rh_download_done;
+atomic_t netfs_n_rh_download_failed;
+atomic_t netfs_n_rh_download_instead;
+atomic_t netfs_n_rh_read;
+atomic_t netfs_n_rh_read_done;
+atomic_t netfs_n_rh_read_failed;
+atomic_t netfs_n_rh_zero;
+atomic_t netfs_n_rh_short_read;
+atomic_t netfs_n_rh_write;
+atomic_t netfs_n_rh_write_done;
+atomic_t netfs_n_rh_write_failed;
+
+void netfs_stats_show(struct seq_file *m)
+{
+ seq_printf(m, "RdHelp : RA=%u RP=%u rr=%u sr=%u\n",
+ atomic_read(&netfs_n_rh_readahead),
+ atomic_read(&netfs_n_rh_readpage),
+ atomic_read(&netfs_n_rh_rreq),
+ atomic_read(&netfs_n_rh_sreq));
+ seq_printf(m, "RdHelp : ZR=%u sh=%u\n",
+ atomic_read(&netfs_n_rh_zero),
+ atomic_read(&netfs_n_rh_short_read));
+ seq_printf(m, "RdHelp : DL=%u ds=%u df=%u di=%u\n",
+ atomic_read(&netfs_n_rh_download),
+ atomic_read(&netfs_n_rh_download_done),
+ atomic_read(&netfs_n_rh_download_failed),
+ atomic_read(&netfs_n_rh_download_instead));
+ seq_printf(m, "RdHelp : RD=%u rs=%u rf=%u\n",
+ atomic_read(&netfs_n_rh_read),
+ atomic_read(&netfs_n_rh_read_done),
+ atomic_read(&netfs_n_rh_read_failed));
+ seq_printf(m, "RdHelp : WR=%u ws=%u wf=%u\n",
+ atomic_read(&netfs_n_rh_write),
+ atomic_read(&netfs_n_rh_write_done),
+ atomic_read(&netfs_n_rh_write_failed));
+}
+EXPORT_SYMBOL(netfs_stats_show);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 606fd6512b4a..9a262eb36b0f 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -106,5 +106,6 @@ extern int netfs_readpage(struct file *,
void *);

extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t);
+extern void netfs_stats_show(struct seq_file *);

#endif /* _LINUX_NETFS_H */


2021-01-21 17:13:29

by David Howells

[permalink] [raw]
Subject: Re: [RFC][PATCH 00/25] Network fs helper library & fscache kiocb API

J. Bruce Fields <[email protected]> wrote:

> On Wed, Jan 20, 2021 at 10:21:24PM +0000, David Howells wrote:
> > Note that this uses SEEK_HOLE/SEEK_DATA to locate the data available
> > to be read from the cache. Whilst this is an improvement from the
> > bmap interface, it still has a problem with regard to a modern
> > extent-based filesystem inserting or removing bridging blocks of
> > zeros.
>
> What are the consequences from the point of view of a user?

The cache can get both false positive and false negative results on checks for
the presence of data because an extent-based filesystem can, at will, insert
or remove blocks of contiguous zeros to make the extents easier to encode
(ie. bridge them or split them).

A false-positive means that you get a block of zeros in the middle of your
file that very probably shouldn't be there (ie. file corruption); a
false-negative means that we go and reload the missing chunk from the server.

The problem exists in cachefiles whether we use bmap or we use
SEEK_HOLE/SEEK_DATA. The only way round it is to keep track of what data is
present independently of backing filesystem's metadata.
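
To make the failure mode concrete, suppose the netfs stored 4KiB at offset
0 and 4KiB at offset 8192 in the backing file, leaving a hole at 4096-8191.
A presence check of the sort cachefiles does (sketched here as userspace
code for illustration; the kernel side uses vfs_llseek() the same way)
treats any data it finds as data the netfs wrote:

	#define _GNU_SOURCE
	#include <unistd.h>
	#include <errno.h>

	/* Does the backing file have data at @pos? (illustrative only) */
	static int cache_has_data_at(int fd, off_t pos)
	{
		off_t off = lseek(fd, pos, SEEK_DATA);

		if (off == (off_t)-1)
			return errno == ENXIO ? 0 : -1; /* hole to EOF, or error */
		return off == pos;	/* 1 if pos lies within a data extent */
	}

If the backing filesystem later bridges the 4096-8191 hole with allocated
zeros to simplify its extent map, cache_has_data_at(fd, 4096) now returns 1
and the cache hands out zeros the server never supplied (the false
positive); if the filesystem splits out a zeroed block that the netfs
really did write, the check misses data that is actually valid (the false
negative).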

To this end, the new code shouldn't (mis)behave any differently from the
code that's already there. The one improvement is that it better handles
the case where the backing filesystem's blocksize != PAGE_SIZE (which may
not be relevant on an extent-based filesystem anyway if it packs parts of
different files together in a single block): the current implementation
only bmaps the first block in each page and doesn't probe for the rest.

Fixing this requires a much bigger overhaul of cachefiles than this patchset
performs.

Also, it works towards getting rid of this use of bmap, but that's not user
visible.

David

2021-01-21 19:15:39

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC][PATCH 00/25] Network fs helper library & fscache kiocb API

On Thu, Jan 21, 2021 at 06:55:13PM +0000, David Howells wrote:
> J. Bruce Fields <[email protected]> wrote:
>
> > > Fixing this requires a much bigger overhaul of cachefiles than this patchset
> > > performs.
> >
> > That sounds like "sometimes you may get file corruption and there's
> > nothing you can do about it". But I know people actually use fscache,
> > so it must be reliable at least for some use cases.
>
> Yes. That's true for the upstream code because that uses bmap.

Sorry, when you say "that's true", what part are you referring to?

> I'm switching
> to use SEEK_HOLE/SEEK_DATA to get rid of the bmap usage, but it doesn't change
> the issue.
>
> > Is it that those "bridging" blocks only show up in certain corner cases
> > that users can arrange to avoid? Or that it's OK as long as you use
> > certain specific file systems whose behavior goes beyond what's
> technically required by the bmap or seek interfaces?
>
> That's a question for the xfs, ext4 and btrfs maintainers, and may vary
> between kernel versions and fsck or filesystem packing utility versions.

So, I'm still confused: there must be some case where we know fscache
actually works reliably and doesn't corrupt your data, right?

--b.

2021-01-21 19:27:42

by David Howells

[permalink] [raw]
Subject: Re: [RFC][PATCH 00/25] Network fs helper library & fscache kiocb API

J. Bruce Fields <[email protected]> wrote:

> > Fixing this requires a much bigger overhaul of cachefiles than this patchset
> > performs.
>
> That sounds like "sometimes you may get file corruption and there's
> nothing you can do about it". But I know people actually use fscache,
> so it must be reliable at least for some use cases.

Yes. That's true for the upstream code because that uses bmap. I'm switching
to use SEEK_HOLE/SEEK_DATA to get rid of the bmap usage, but it doesn't change
the issue.

> Is it that those "bridging" blocks only show up in certain corner cases
> that users can arrange to avoid? Or that it's OK as long as you use
> certain specific file systems whose behavior goes beyond what's
> technically required by the bmap or seek interfaces?

That's a question for the xfs, ext4 and btrfs maintainers, and may vary
between kernel versions and fsck or filesystem packing utility versions.

David

2021-01-22 16:11:58

by David Howells

[permalink] [raw]
Subject: Re: [RFC][PATCH 00/25] Network fs helper library & fscache kiocb API

J. Bruce Fields <[email protected]> wrote:

> > J. Bruce Fields <[email protected]> wrote:
> > > So, I'm still confused: there must be some case where we know fscache
> > > actually works reliably and doesn't corrupt your data, right?
> >
> > Using ext2/3, for example. I don't know under what circumstances xfs, ext4
> > and btrfs might insert/remove blocks of zeros, but I'm told it can happen.
>
> Do ext2/3 work well for fscache in other ways?

Ext3 shouldn't be a problem. That's what I used when developing it. I'm not
sure if ext2 supports xattrs, though.

David