changes since v6:
- rebased to latest upstream branch
- cachefiles: move calling of cachefiles_ondemand_clean_object() into
  cachefiles_withdraw_cookie() (it was previously called in
  cachefiles_clean_up_object()), so that the previously installed
  anon_fd can be withdrawn if the lookup phase failed (object->file
  is still NULL although a previous OPEN request has been sent to the
  user daemon) (patch 3)
- cachefiles: rename "cinit" (response of OPEN request) to "copen"
(patch 2)
- cachefiles: "fscache: export fscache_end_operation()" is still needed.
Since it has been pull requested for 5.18 (not merged yet), it's not
included in this patchset.
- erofs: register fscache volume for each erofs filesystem (patch 9)
- erofs: rename erofs_is_nodev_mode() to erofs_is_fscache_mode() (patch 8)
- erofs: remove erofs_fscache_read_folio() helper (patch 12)
- erofs: remove erofs_fscache_get_map() helper, and no new field is
needed in "struct erofs_map_blocks" now (patch 16)
- erofs: fold several helper functions, and other miscellaneous fixes
  addressing comments from Gao Xiang
Kernel Patchset
---------------
Git tree:
https://github.com/lostjeffle/linux.git jingbo/dev-erofs-fscache-v7
Gitweb:
https://github.com/lostjeffle/linux/commits/jingbo/dev-erofs-fscache-v7
User Daemon for Quick Test
--------------------------
Git tree:
https://github.com/lostjeffle/demand-read-cachefilesd.git main
Gitweb:
https://github.com/lostjeffle/demand-read-cachefilesd
RFC: https://lore.kernel.org/all/[email protected]/t/
v1: https://lore.kernel.org/lkml/[email protected]/T/
v2: https://lore.kernel.org/all/[email protected]/t/
v3: https://lore.kernel.org/lkml/[email protected]/T/
v4: https://lore.kernel.org/lkml/[email protected]/T/#t
v5: https://lore.kernel.org/lkml/[email protected]/T/
v6: https://lore.kernel.org/lkml/[email protected]/T/
[Background]
============
Nydus [1] is an image distribution service optimized for distribution
over the network. Nydus is an excellent container image acceleration
solution, since it only pulls data from the remote end when needed,
a.k.a. on-demand reading, and it also supports chunk-based
deduplication, compression, etc.
erofs (Enhanced Read-Only File System) is a filesystem designed for
read-only scenarios. (Documentation/filesystems/erofs.rst)
Over the past months we've been focusing on supporting the Nydus image
service with the in-kernel erofs format[2]. In that case, each container
image is organized as one bootstrap (metadata) and, optionally, multiple
data blobs in erofs format. Massive numbers of container images will be
stored on one machine.
To accelerate container startup (fetching container images from the
remote end and then starting the container), we hope that the bootstrap
and blob files can support on-demand read. That is, erofs can be mounted
and accessed even before the bootstrap/data blob files have been fully
downloaded, with native performance once the data is available locally.
That means we have to manage the cache state of the bootstrap/data blob
files (on a cache hit, read directly from the local cache; on a cache
miss, fetch the data somehow). It would be painful, and arguably unwise,
for erofs to implement the cache management itself. Thus we prefer that
fscache/cachefiles do the cache management instead.
The fscache on-demand read feature is designed to be generic, so that
implementing it in the fscache subsystem can benefit other use cases
and/or filesystems as well.
[1] https://nydus.dev
[2] https://sched.co/pcdL
[Overall Design]
================
Please refer to patch 7 ("cachefiles: document on-demand read mode") for
more details.
When working in the original mode, cachefiles mainly serves as a local
cache for a remote network filesystem, while in on-demand read mode,
cachefiles can work in scenarios where on-demand read semantics are
needed, e.g. container image distribution.
The essential difference between these two modes is that, in the
original mode, on a cache miss the netfs itself fetches data from the
remote end and then writes the fetched data into the cache file, while
in on-demand read mode a user daemon is responsible for fetching the
data and feeding it to the kernel fscache side.
The on-demand read mode relies on a simple protocol used for communication
between kernel and user daemon.
The proposed implementation relies on the anonymous fd mechanism to
avoid depending on the on-disk format of the cache file. When an fscache
cache file is opened for the first time, an anon_fd associated with the
cache file is sent to the user daemon. With the given anon_fd, the user
daemon can fetch and write data into the cache file in the background,
even before the kernel has triggered a cache miss. Besides, a write() to
the anon_fd ends up in the cachefiles kernel module, which writes the
data to the cache file in the latest cache file format.
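
As a rough illustration, the daemon side of this handshake could look
like the minimal sketch below. It assumes the UAPI definitions
introduced later in this series (<linux/cachefiles.h>); handle_open()
and the hard-coded blob size are made up for illustration, and error
handling is omitted:

  #include <poll.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <linux/cachefiles.h>

  static void handle_open(int devfd, struct cachefiles_msg *msg)
  {
          struct cachefiles_open *load = (void *)msg->data;
          char rsp[64];

          /*
           * volume_key and cookie_key follow in load->data, and load->fd
           * is the anon_fd. A real daemon records this triple and derives
           * the blob size from its own metadata; 1048576 below is just a
           * placeholder.
           */
          snprintf(rsp, sizeof(rsp), "copen %u,%lld", msg->id, 1048576LL);
          write(devfd, rsp, strlen(rsp));         /* reply on the devnode */
  }

  static void daemon_loop(int devfd)
  {
          char buf[CACHEFILES_MSG_MAX_SIZE];
          struct cachefiles_msg *msg = (void *)buf;
          struct pollfd pfd = { .fd = devfd, .events = POLLIN };

          for (;;) {
                  if (poll(&pfd, 1, -1) <= 0 || !(pfd.revents & POLLIN))
                          continue;
                  /* each read fetches exactly one request */
                  if (read(devfd, buf, sizeof(buf)) <= 0)
                          continue;
                  if (msg->opcode == CACHEFILES_OP_OPEN)
                          handle_open(devfd, msg);
          }
  }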
1. cache miss
On a cache miss, the cachefiles kernel module will notify the user
daemon with the anon_fd, along with the requested file range. Once
notified, the user daemon needs to fetch the data for the requested
file range, and then write the fetched data into the cache file through
the given anonymous fd. When finished processing the request, the user
daemon needs to notify the kernel.
After notifying the user daemon, the kernel read routine will hang
there until the request is handled by the user daemon. When it is
awakened by the notification from the user daemon, i.e. the
corresponding hole has been filled, it will retry the read on the same
file range.
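
For illustration only, a daemon-side handler for such a READ request
could look like the following minimal sketch. It assumes the UAPI
structs and the CACHEFILES_IOC_CREAD ioctl introduced later in this
series; fetch_range() is a hypothetical stand-in for whatever backend
actually retrieves the data, and error handling is omitted:

  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/cachefiles.h>

  /* hypothetical helper standing in for the real data source */
  extern void fetch_range(char *dst, __u64 off, __u64 len);

  static void handle_read(struct cachefiles_msg *msg)
  {
          struct cachefiles_read *load = (void *)msg->data;
          char *data = malloc(load->len);

          /* fetch the requested range and fill the hole via the anon_fd */
          fetch_range(data, load->off, load->len);
          pwrite(load->fd, data, load->len, load->off);

          /* tell the kernel the request is done so the read can be retried */
          ioctl(load->fd, CACHEFILES_IOC_CREAD, msg->id);
          free(data);
  }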
2. cache hit
Once the data is ready in the cache file, the netfs will read from the
cache file directly.
[Advantages of fscache-based on-demand read]
============================================
1. Asynchronous Prefetch
In the current mechanism, fscache is responsible for cache state
management, while the data plane (fetching data from local/remote on a
cache miss) is done on the user daemon side.
If the data is already available in the backing file, the netfs (e.g.
erofs) will read from the backing file directly and won't trap into
user space anymore. Thus the user daemon can fetch data (from the
remote end) asynchronously in the background, accelerating access to
the backing file to some degree.
2. Support for massive blob files
Besides, this mechanism supports a large number of backing files, and
thus can benefit densely deployed scenarios.
In our use case, one container image can correspond to one bootstrap
file (required) and multiple data blob files (optional). For example,
one container image for node.js corresponds to ~20 files in total. In a
densely deployed environment, there could be as many as hundreds of
containers and thus thousands of backing files on one machine.
Jeffle Xu (19):
cachefiles: extract write routine
cachefiles: notify user daemon with anon_fd when looking up cookie
cachefiles: notify user daemon when withdrawing cookie
cachefiles: implement on-demand read
cachefiles: enable on-demand read mode
cachefiles: document on-demand read mode
erofs: make erofs_map_blocks() generally available
erofs: add mode checking helper
erofs: register fscache volume
erofs: add fscache context helper functions
erofs: add anonymous inode managing page cache for data blob
erofs: add erofs_fscache_read_folios() helper
erofs: register fscache context for primary data blob
erofs: register fscache context for extra data blobs
erofs: implement fscache-based metadata read
erofs: implement fscache-based data read for non-inline layout
erofs: implement fscache-based data read for inline layout
erofs: implement fscache-based data readahead
erofs: add 'fsid' mount option
.../filesystems/caching/cachefiles.rst | 166 ++++++
fs/cachefiles/Kconfig | 11 +
fs/cachefiles/Makefile | 1 +
fs/cachefiles/daemon.c | 90 +++-
fs/cachefiles/interface.c | 2 +
fs/cachefiles/internal.h | 67 +++
fs/cachefiles/io.c | 72 ++-
fs/cachefiles/namei.c | 16 +-
fs/cachefiles/ondemand.c | 473 ++++++++++++++++++
fs/erofs/Kconfig | 10 +
fs/erofs/Makefile | 1 +
fs/erofs/data.c | 27 +-
fs/erofs/fscache.c | 369 ++++++++++++++
fs/erofs/inode.c | 5 +
fs/erofs/internal.h | 55 ++
fs/erofs/super.c | 99 +++-
include/linux/fscache.h | 1 +
include/linux/netfs.h | 1 +
include/trace/events/cachefiles.h | 2 +
include/uapi/linux/cachefiles.h | 57 +++
20 files changed, 1457 insertions(+), 68 deletions(-)
create mode 100644 fs/cachefiles/ondemand.c
create mode 100644 fs/erofs/fscache.c
create mode 100644 include/uapi/linux/cachefiles.h
--
2.27.0
Implement the data plane of reading data from data blobs over fscache
for the inline layout.
For the leading non-inline part, the data plane for the non-inline
layout is reused; only the tail-packing part needs special handling.
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/fscache.c | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 65de1c754e80..d32cb5840c6d 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -60,6 +60,40 @@ static int erofs_fscache_meta_readpage(struct file *data, struct page *page)
return ret;
}
+static int erofs_fscache_readpage_inline(struct folio *folio,
+ struct erofs_map_blocks *map)
+{
+ struct inode *inode = folio_file_mapping(folio)->host;
+ struct super_block *sb = inode->i_sb;
+ struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
+ erofs_blk_t blknr;
+ size_t offset, len;
+ void *src, *dst;
+
+ /*
+ * For the inline (tail packing) layout, the offset may be non-zero; it
+ * can be calculated directly from the corresponding physical address.
+ */
+ offset = erofs_blkoff(map->m_pa);
+ blknr = erofs_blknr(map->m_pa);
+ len = map->m_llen;
+
+ src = erofs_read_metabuf(&buf, sb, blknr, EROFS_KMAP);
+ if (IS_ERR(src))
+ return PTR_ERR(src);
+
+ DBG_BUGON(folio_size(folio) != PAGE_SIZE);
+
+ dst = kmap(folio_page(folio, 0));
+ memcpy(dst, src + offset, len);
+ memset(dst + len, 0, PAGE_SIZE - len);
+ kunmap(folio_page(folio, 0));
+
+ erofs_put_metabuf(&buf);
+
+ return 0;
+}
+
static int erofs_fscache_readpage(struct file *file, struct page *page)
{
struct folio *folio = page_folio(page);
@@ -85,6 +119,12 @@ static int erofs_fscache_readpage(struct file *file, struct page *page)
goto out_uptodate;
}
+ /* inline readpage */
+ if (map.m_flags & EROFS_MAP_META) {
+ ret = erofs_fscache_readpage_inline(folio, &map);
+ goto out_uptodate;
+ }
+
/* no-inline readpage */
mdev = (struct erofs_map_dev) {
.m_deviceid = map.m_deviceid,
--
2.27.0
Document new user interface introduced by on-demand read mode.
Signed-off-by: Jeffle Xu <[email protected]>
---
.../filesystems/caching/cachefiles.rst | 166 ++++++++++++++++++
1 file changed, 166 insertions(+)
diff --git a/Documentation/filesystems/caching/cachefiles.rst b/Documentation/filesystems/caching/cachefiles.rst
index 8bf396b76359..3daa08b70382 100644
--- a/Documentation/filesystems/caching/cachefiles.rst
+++ b/Documentation/filesystems/caching/cachefiles.rst
@@ -28,6 +28,8 @@ Cache on Already Mounted Filesystem
(*) Debugging.
+ (*) On-demand Read.
+
Overview
@@ -482,3 +484,167 @@ the control file. For example::
echo $((1|4|8)) >/sys/module/cachefiles/parameters/debug
will turn on all function entry debugging.
+
+
+On-demand Read
+==============
+
+When working in the original mode, cachefiles mainly serves as a local cache
+for a remote network filesystem, while in on-demand read mode, cachefiles can
+boost scenarios where on-demand read semantics are needed, e.g. container
+image distribution.
+
+The essential difference between these two modes is that, in the original
+mode, on a cache miss the netfs itself fetches data from the remote end and
+writes the fetched data into the cache file, while in on-demand read mode a
+user daemon is responsible for fetching data and then writing to the cache file.
+
+``CONFIG_CACHEFILES_ONDEMAND`` shall be enabled to support on-demand read mode.
+
+
+Protocol Communication
+----------------------
+
+The on-demand read mode relies on a simple protocol used for communication
+between kernel and user daemon. The model is like::
+
+ kernel --[request]--> user daemon --[reply]--> kernel
+
+The cachefiles kernel module will send requests to user daemon when needed.
+The user daemon needs to poll on the devnode ('/dev/cachefiles') to check if
+there are pending requests to be processed. A POLLIN event will be returned
+when there is a pending request.
+
+Then the user daemon needs to read the devnode to fetch one request and
+process it accordingly. It is worth noting that each read only gets one
+request. When finished processing the request, the user daemon needs to write
+the reply to the devnode.
+
+Each request starts with a message header like::
+
+ struct cachefiles_msg {
+ __u32 id;
+ __u32 opcode;
+ __u32 len;
+ __u8 data[];
+ };
+
+ * ``id`` identifies the position of this request in an internal xarray
+ managing all pending requests.
+
+ * ``opcode`` identifies the type of this request.
+
+ * ``data`` identifies the payload of this request.
+
+ * ``len`` identifies the whole length of this request, including the
+ header and following type specific payload.
+
+
+Turn on On-demand Mode
+----------------------
+
+An optional parameter is added to "bind" command::
+
+ bind [ondemand]
+
+When "bind" command takes without argument, it defaults to the original mode.
+When "bind" command takes with "ondemand" argument, i.e. "bind ondemand",
+on-demand read mode will be enabled.
+
+
+OPEN Request
+------------
+
+When the netfs opens a cache file for the first time, a request with the
+CACHEFILES_OP_OPEN opcode, a.k.a. an OPEN request, will be sent to the user
+daemon. The payload format is like::
+
+ struct cachefiles_open {
+ __u32 volume_key_len;
+ __u32 cookie_key_len;
+ __u32 fd;
+ __u32 flags;
+ __u8 data[];
+ };
+
+ * ``data`` contains volume_key and cookie_key in sequence.
+
+ * ``volume_key_len`` identifies the length of the volume key of the
+ cache file, in bytes. volume_key is of string format, with a suffix
+ '\0'.
+
+ * ``cookie_key_len`` identifies the length of the cookie key of the
+ cache file, in bytes. The format of cookie_key is netfs specific. It
+ can be of binary format.
+
+ * ``fd`` identifies the anonymous fd of the cache file, with which user
+ daemon can perform write/llseek file operations on the cache file.
+
+
+The OPEN request contains a (volume_key, cookie_key, anon_fd) triple for the
+corresponding cache file. With this triple, the user daemon can fetch and
+write data into the cache file in the background, even before the kernel has
+triggered a cache miss. The user daemon can identify the requested cache file
+by the given (volume_key, cookie_key), and write the fetched data into the
+cache file through the given anon_fd.
+
+After recording the (volume_key, cookie_key, anon_fd) triple, user daemon shall
+reply with "copen" (complete open) command::
+
+ copen <id>,<cache_size>
+
+ * ``id`` is exactly the id field of the previous OPEN request.
+
+ * When >= 0, ``cache_size`` identifies the size of the cache file;
+ when < 0, ``cache_size`` identifies the error code encountered by the
+ user daemon.
+
+
+CLOSE Request
+-------------
+When a cookie is withdrawn, a request with the CACHEFILES_OP_CLOSE opcode,
+a.k.a. a CLOSE request, will be sent to the user daemon. It notifies the user
+daemon to close the attached anon_fd. The payload format is like::
+
+ struct cachefiles_close {
+ __u32 fd;
+ };
+
+ * ``fd`` identifies the anon_fd to be closed, which is exactly the same
+ as that in the OPEN request.
+
+
+READ Request
+------------
+
+When on-demand read mode is turned on and a cache miss is encountered, the
+kernel will send a request with the CACHEFILES_OP_READ opcode, a.k.a. a READ
+request, to the user daemon, asking it to fetch data in the requested file range.
+The payload format is like::
+
+ struct cachefiles_read {
+ __u64 off;
+ __u64 len;
+ __u32 fd;
+ };
+
+ * ``off`` identifies the starting offset of the requested file range.
+
+ * ``len`` identifies the length of the requested file range.
+
+ * ``fd`` identifies the anonymous fd of the requested cache file. It is
+ guaranteed to be the same as the fd field in the previous OPEN
+ request.
+
+When receiving a READ request, the user daemon needs to fetch the data for
+the requested file range, and then write the fetched data into the cache file
+through the given anonymous fd.
+
+When finished processing the READ request, the user daemon needs to reply with
+CACHEFILES_IOC_CREAD ioctl on the corresponding anon_fd::
+
+ ioctl(fd, CACHEFILES_IOC_CREAD, id);
+
+ * ``fd`` is exactly the fd field of the previous READ request.
+
+ * ``id`` is exactly the id field of the previous READ request.
--
2.27.0
Fscache/cachefiles used to serve as a local cache for a remote
filesystem. This patch, along with the following patches, introduces a
new on-demand read mode for cachefiles, which can boost scenarios where
on-demand read semantics are needed, e.g. container image distribution.
The essential difference between the original mode and the on-demand
read mode is that, in the original mode, on a cache miss the netfs
itself will fetch data from the remote end and then write the fetched
data into the cache file, while in on-demand read mode, a user daemon
is responsible for fetching data and then writing to the cache file.
As the first step, notify the user daemon with an anon_fd when looking
up the cookie.
Send the anonymous fd to the user daemon when looking up the cookie, no
matter whether the cache file already exists or not. With the given
anonymous fd, the user daemon can fetch and then write data into the
cache file in advance, even before a cache miss actually happens.
Also add an advisory flag (FSCACHE_ADV_WANT_CACHE_SIZE) suggesting that
the cache file size shall be retrieved at runtime. This helps the
scenario where one cache file can contain multiple netfs files, e.g.
for the purpose of deduplication. In this case, the netfs itself has no
idea of the cache file size, while the user daemon needs to offer the
hint on the cache file size.
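
A netfs opts in by passing this flag when acquiring the cookie, as the
erofs patches later in this series do, e.g.:

  cookie = fscache_acquire_cookie(volume, FSCACHE_ADV_WANT_CACHE_SIZE,
                                  name, strlen(name), NULL, 0, 0);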
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/cachefiles/Kconfig | 11 +
fs/cachefiles/Makefile | 1 +
fs/cachefiles/daemon.c | 77 +++++--
fs/cachefiles/internal.h | 43 ++++
fs/cachefiles/namei.c | 16 +-
fs/cachefiles/ondemand.c | 358 ++++++++++++++++++++++++++++++
include/linux/fscache.h | 1 +
include/trace/events/cachefiles.h | 2 +
include/uapi/linux/cachefiles.h | 39 ++++
9 files changed, 533 insertions(+), 15 deletions(-)
create mode 100644 fs/cachefiles/ondemand.c
create mode 100644 include/uapi/linux/cachefiles.h
diff --git a/fs/cachefiles/Kconfig b/fs/cachefiles/Kconfig
index 719faeeda168..58aad1fb4c5c 100644
--- a/fs/cachefiles/Kconfig
+++ b/fs/cachefiles/Kconfig
@@ -26,3 +26,14 @@ config CACHEFILES_ERROR_INJECTION
help
This permits error injection to be enabled in cachefiles whilst a
cache is in service.
+
+config CACHEFILES_ONDEMAND
+ bool "Support for on-demand read"
+ depends on CACHEFILES
+ default n
+ help
+ This permits on-demand read mode of cachefiles. In this mode, on a
+ cache miss, the cachefiles backend, instead of the netfs, is
+ responsible for fetching data, e.g. through a user daemon.
+
+ If unsure, say N.
diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index 16d811f1a2fa..c37a7a9af10b 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -16,5 +16,6 @@ cachefiles-y := \
xattr.o
cachefiles-$(CONFIG_CACHEFILES_ERROR_INJECTION) += error_inject.o
+cachefiles-$(CONFIG_CACHEFILES_ONDEMAND) += ondemand.o
obj-$(CONFIG_CACHEFILES) := cachefiles.o
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 7ac04ee2c0a0..d155a6da90d3 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -75,6 +75,9 @@ static const struct cachefiles_daemon_cmd cachefiles_daemon_cmds[] = {
{ "inuse", cachefiles_daemon_inuse },
{ "secctx", cachefiles_daemon_secctx },
{ "tag", cachefiles_daemon_tag },
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+ { "copen", cachefiles_ondemand_copen },
+#endif
{ "", NULL }
};
@@ -108,6 +111,9 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file)
INIT_LIST_HEAD(&cache->volumes);
INIT_LIST_HEAD(&cache->object_list);
spin_lock_init(&cache->object_list_lock);
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+ xa_init_flags(&cache->reqs, XA_FLAGS_ALLOC);
+#endif
/* set default caching limits
* - limit at 1% free space and/or free files
@@ -126,6 +132,27 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file)
return 0;
}
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+static inline void cachefiles_flush_reqs(struct cachefiles_cache *cache)
+{
+ struct xarray *xa = &cache->reqs;
+ struct cachefiles_req *req;
+ unsigned long index;
+
+ /*
+ * 1) The cache is marked as dead, and then 2) all pending requests in
+ * the @reqs xarray are flushed. The barrier inside set_bit() ensures
+ * that these two ops won't be reordered.
+ */
+ xa_lock(xa);
+ xa_for_each(xa, index, req) {
+ req->error = -EIO;
+ complete(&req->done);
+ }
+ xa_unlock(xa);
+}
+#endif
+
/*
* Release a cache.
*/
@@ -139,6 +166,11 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file)
set_bit(CACHEFILES_DEAD, &cache->flags);
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+ cachefiles_flush_reqs(cache);
+ xa_destroy(&cache->reqs);
+#endif
+
cachefiles_daemon_unbind(cache);
/* clean up the control file interface */
@@ -152,23 +184,14 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file)
return 0;
}
-/*
- * Read the cache state.
- */
-static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
- size_t buflen, loff_t *pos)
+static ssize_t cachefiles_do_daemon_read(struct cachefiles_cache *cache,
+ char __user *_buffer, size_t buflen)
{
- struct cachefiles_cache *cache = file->private_data;
unsigned long long b_released;
unsigned f_released;
char buffer[256];
int n;
- //_enter(",,%zu,", buflen);
-
- if (!test_bit(CACHEFILES_READY, &cache->flags))
- return 0;
-
/* check how much space the cache has */
cachefiles_has_space(cache, 0, 0, cachefiles_has_space_check);
@@ -206,6 +229,26 @@ static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
return n;
}
+/*
+ * Read the cache state.
+ */
+static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
+ size_t buflen, loff_t *pos)
+{
+ struct cachefiles_cache *cache = file->private_data;
+
+ //_enter(",,%zu,", buflen);
+
+ if (!test_bit(CACHEFILES_READY, &cache->flags))
+ return 0;
+
+ if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) &&
+ test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
+ return cachefiles_ondemand_daemon_read(cache, _buffer, buflen);
+ else
+ return cachefiles_do_daemon_read(cache, _buffer, buflen);
+}
+
/*
* Take a command from cachefilesd, parse it and act on it.
*/
@@ -297,8 +340,16 @@ static __poll_t cachefiles_daemon_poll(struct file *file,
poll_wait(file, &cache->daemon_pollwq, poll);
mask = 0;
- if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags))
- mask |= EPOLLIN;
+ if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) &&
+ test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags)) {
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+ if (!xa_empty(&cache->reqs))
+ mask |= EPOLLIN;
+#endif
+ } else {
+ if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags))
+ mask |= EPOLLIN;
+ }
if (test_bit(CACHEFILES_CULLING, &cache->flags))
mask |= EPOLLOUT;
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index e80673d0ab97..7d5c7d391fdb 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -15,6 +15,8 @@
#include <linux/fscache-cache.h>
#include <linux/cred.h>
#include <linux/security.h>
+#include <linux/xarray.h>
+#include <linux/cachefiles.h>
#define CACHEFILES_DIO_BLOCK_SIZE 4096
@@ -58,6 +60,9 @@ struct cachefiles_object {
enum cachefiles_content content_info:8; /* Info about content presence */
unsigned long flags;
#define CACHEFILES_OBJECT_USING_TMPFILE 0 /* Have an unlinked tmpfile */
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+ int fd; /* anonymous fd */
+#endif
};
/*
@@ -98,11 +103,24 @@ struct cachefiles_cache {
#define CACHEFILES_DEAD 1 /* T if cache dead */
#define CACHEFILES_CULLING 2 /* T if cull engaged */
#define CACHEFILES_STATE_CHANGED 3 /* T if state changed (poll trigger) */
+#define CACHEFILES_ONDEMAND_MODE 4 /* T if in on-demand read mode */
char *rootdirname; /* name of cache root directory */
char *secctx; /* LSM security context */
char *tag; /* cache binding tag */
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+ struct xarray reqs; /* xarray of pending on-demand requests */
+#endif
};
+struct cachefiles_req {
+ struct cachefiles_object *object;
+ struct completion done;
+ int error;
+ struct cachefiles_msg msg;
+};
+
+#define CACHEFILES_REQ_NEW XA_MARK_1
+
#include <trace/events/cachefiles.h>
static inline
@@ -250,6 +268,31 @@ extern struct file *cachefiles_create_tmpfile(struct cachefiles_object *object);
extern bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
struct cachefiles_object *object);
+/*
+ * ondemand.c
+ */
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+extern ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
+ char __user *_buffer, size_t buflen);
+
+extern int cachefiles_ondemand_copen(struct cachefiles_cache *cache,
+ char *args);
+
+extern int cachefiles_ondemand_init_object(struct cachefiles_object *object);
+
+#else
+static inline ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
+ char __user *_buffer, size_t buflen)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int cachefiles_ondemand_init_object(struct cachefiles_object *object)
+{
+ return 0;
+}
+#endif
+
/*
* security.c
*/
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index fe1bab0f36d4..68213304e96b 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -452,10 +452,9 @@ struct file *cachefiles_create_tmpfile(struct cachefiles_object *object)
struct dentry *fan = volume->fanout[(u8)object->cookie->key_hash];
struct file *file;
struct path path;
- uint64_t ni_size = object->cookie->object_size;
+ uint64_t ni_size;
long ret;
- ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE);
cachefiles_begin_secure(cache, &saved_cred);
@@ -481,6 +480,15 @@ struct file *cachefiles_create_tmpfile(struct cachefiles_object *object)
goto out_dput;
}
+ ret = cachefiles_ondemand_init_object(object);
+ if (ret < 0) {
+ file = ERR_PTR(ret);
+ goto out_unuse;
+ }
+
+ ni_size = object->cookie->object_size;
+ ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE);
+
if (ni_size > 0) {
trace_cachefiles_trunc(object, d_backing_inode(path.dentry), 0, ni_size,
cachefiles_trunc_expand_tmpfile);
@@ -586,6 +594,10 @@ static bool cachefiles_open_file(struct cachefiles_object *object,
}
_debug("file -> %pd positive", dentry);
+ ret = cachefiles_ondemand_init_object(object);
+ if (ret < 0)
+ goto error_fput;
+
ret = cachefiles_check_auxdata(object, file);
if (ret < 0)
goto check_failed;
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
new file mode 100644
index 000000000000..f0e6e422263b
--- /dev/null
+++ b/fs/cachefiles/ondemand.c
@@ -0,0 +1,358 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2022, Alibaba Cloud
+ */
+#include <linux/fdtable.h>
+#include <linux/anon_inodes.h>
+#include <linux/uio.h>
+#include "internal.h"
+
+static int cachefiles_ondemand_fd_release(struct inode *inode,
+ struct file *file)
+{
+ struct cachefiles_object *object = file->private_data;
+
+ /*
+ * Uninstall the anon_fd from the cachefiles object, so that no further
+ * associated requests will get enqueued.
+ */
+ object->fd = -1;
+
+ cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd);
+ return 0;
+}
+
+static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb,
+ struct iov_iter *iter)
+{
+ struct cachefiles_object *object = kiocb->ki_filp->private_data;
+ struct cachefiles_cache *cache = object->volume->cache;
+ struct file *file = object->file;
+ size_t len = iter->count;
+ loff_t pos = kiocb->ki_pos;
+ const struct cred *saved_cred;
+ int ret;
+
+ if (!file)
+ return -ENOBUFS;
+
+ cachefiles_begin_secure(cache, &saved_cred);
+ ret = __cachefiles_prepare_write(object, file, &pos, &len, true);
+ cachefiles_end_secure(cache, saved_cred);
+ if (ret < 0)
+ return ret;
+
+ ret = __cachefiles_write(object, file, pos, iter, NULL, NULL);
+ if (!ret)
+ ret = len;
+
+ return ret;
+}
+
+static loff_t cachefiles_ondemand_fd_llseek(struct file *filp, loff_t pos,
+ int whence)
+{
+ struct cachefiles_object *object = filp->private_data;
+ struct file *file = object->file;
+
+ if (!file)
+ return -ENOBUFS;
+
+ return vfs_llseek(file, pos, whence);
+}
+
+static const struct file_operations cachefiles_ondemand_fd_fops = {
+ .owner = THIS_MODULE,
+ .release = cachefiles_ondemand_fd_release,
+ .write_iter = cachefiles_ondemand_fd_write_iter,
+ .llseek = cachefiles_ondemand_fd_llseek,
+};
+
+/*
+ * OPEN request Completion (copen)
+ * - command: "copen <id>,<cache_size>"
+ * <cache_size> represents the object size if >=0, error code if it's negative
+ */
+int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args)
+{
+ struct cachefiles_req *req;
+ struct fscache_cookie *cookie;
+ char *pid, *psize;
+ unsigned long id;
+ long size;
+ int ret;
+
+ if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
+ return -EOPNOTSUPP;
+
+ if (!*args) {
+ pr_err("Empty id specified\n");
+ return -EINVAL;
+ }
+
+ pid = args;
+ psize = strchr(args, ',');
+ if (!psize) {
+ pr_err("Cache size is not specified\n");
+ return -EINVAL;
+ }
+
+ *psize = 0;
+ psize++;
+
+ ret = kstrtoul(pid, 0, &id);
+ if (ret)
+ return ret;
+
+ req = xa_erase(&cache->reqs, id);
+ if (!req)
+ return -EINVAL;
+
+ /* fail OPEN request if copen format is invalid */
+ ret = kstrtol(psize, 0, &size);
+ if (ret) {
+ req->error = ret;
+ goto out;
+ }
+
+ /* fail OPEN request if daemon reports an error */
+ if (size < 0) {
+ req->error = size;
+ goto out;
+ }
+
+ cookie = req->object->cookie;
+ cookie->object_size = size;
+ if (size)
+ clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
+ else
+ set_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
+
+out:
+ complete(&req->done);
+ return ret;
+}
+
+static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
+{
+ struct cachefiles_object *object;
+ struct cachefiles_open *load;
+ struct fd f;
+ int ret;
+
+ object = cachefiles_grab_object(req->object,
+ cachefiles_obj_get_ondemand_fd);
+
+ ret = anon_inode_getfd("[cachefiles]", &cachefiles_ondemand_fd_fops,
+ object, O_WRONLY);
+ if (ret < 0) {
+ cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd);
+ return ret;
+ }
+
+ f = fdget_pos(ret);
+ if (WARN_ON_ONCE(!f.file))
+ return -EBADFD;
+
+ f.file->f_mode |= FMODE_PWRITE | FMODE_LSEEK;
+ fdput_pos(f);
+
+ load = (void *)req->msg.data;
+ load->fd = ret;
+ object->fd = ret;
+
+ return 0;
+}
+
+ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
+ char __user *_buffer, size_t buflen)
+{
+ struct cachefiles_req *req;
+ struct cachefiles_msg *msg;
+ unsigned long id = 0;
+ size_t n;
+ int ret = 0;
+ XA_STATE(xas, &cache->reqs, 0);
+
+ /*
+ * Search for a request that has not been processed yet, to prevent
+ * requests from being sent to the user daemon repeatedly.
+ */
+ xa_lock(&cache->reqs);
+ req = xas_find_marked(&xas, UINT_MAX, CACHEFILES_REQ_NEW);
+ if (!req) {
+ xa_unlock(&cache->reqs);
+ return 0;
+ }
+
+ msg = &req->msg;
+ n = msg->len;
+
+ if (n > buflen) {
+ xa_unlock(&cache->reqs);
+ return -EMSGSIZE;
+ }
+
+ xas_clear_mark(&xas, CACHEFILES_REQ_NEW);
+ xa_unlock(&cache->reqs);
+
+ msg->id = id = xas.xa_index;
+
+ if (msg->opcode == CACHEFILES_OP_OPEN) {
+ ret = cachefiles_ondemand_get_fd(req);
+ if (ret)
+ goto error;
+ }
+
+ if (copy_to_user(_buffer, msg, n) != 0) {
+ ret = -EFAULT;
+ goto err_put_fd;
+ }
+
+ return n;
+
+err_put_fd:
+ if (msg->opcode == CACHEFILES_OP_OPEN)
+ close_fd(req->object->fd);
+error:
+ xa_erase(&cache->reqs, id);
+ req->error = ret;
+ complete(&req->done);
+ return ret;
+}
+
+typedef int (*init_req_fn)(struct cachefiles_req *req, void *private);
+
+static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
+ enum cachefiles_opcode opcode,
+ size_t data_len,
+ init_req_fn init_req,
+ void *private)
+{
+ struct cachefiles_cache *cache = object->volume->cache;
+ struct cachefiles_req *req;
+ XA_STATE(xas, &cache->reqs, 0);
+ int ret;
+
+ if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
+ return 0;
+
+ if (test_bit(CACHEFILES_DEAD, &cache->flags))
+ return -EIO;
+
+ req = kzalloc(sizeof(*req) + data_len, GFP_KERNEL);
+ if (!req)
+ return -ENOMEM;
+
+ req->object = object;
+ init_completion(&req->done);
+ req->msg.opcode = opcode;
+ req->msg.len = sizeof(struct cachefiles_msg) + data_len;
+
+ ret = init_req(req, private);
+ if (ret)
+ goto out;
+
+ do {
+ /*
+ * Stop enqueuing requests when the daemon is dying. So we need
+ * to 1) check the cache state, and 2) enqueue the request if the
+ * cache is alive.
+ *
+ * These two ops need to be atomic as a whole. Otherwise a request
+ * may be enqueued after the xarray has been flushed, in which case
+ * the orphan request will never be completed and thus the netfs
+ * will hang there forever.
+ */
+ xas_lock(&xas);
+
+ /* recheck dead state with lock held */
+ if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
+ xas_unlock(&xas);
+ ret = -EIO;
+ goto out;
+ }
+
+ xas.xa_index = 0;
+ xas_find_marked(&xas, UINT_MAX, XA_FREE_MARK);
+ if (xas.xa_node == XAS_RESTART)
+ xas_set_err(&xas, -EBUSY);
+ xas_store(&xas, req);
+ xas_clear_mark(&xas, XA_FREE_MARK);
+ xas_set_mark(&xas, CACHEFILES_REQ_NEW);
+ xas_unlock(&xas);
+ } while (xas_nomem(&xas, GFP_KERNEL));
+
+ ret = xas_error(&xas);
+ if (ret)
+ goto out;
+
+ wake_up_all(&cache->daemon_pollwq);
+ wait_for_completion(&req->done);
+ ret = req->error;
+out:
+ kfree(req);
+ return ret;
+}
+
+static int init_open_req(struct cachefiles_req *req, void *private)
+{
+ struct cachefiles_object *object = req->object;
+ struct fscache_cookie *cookie = object->cookie;
+ struct fscache_volume *volume = object->volume->vcookie;
+ struct cachefiles_open *load = (void *)req->msg.data;
+ size_t volume_key_len, cookie_key_len;
+ void *volume_key, *cookie_key;
+
+ /*
+ * Volume key is of string format.
+ * key[0] stores strlen() of the string, while the remaining part stores
+ * the content of the string (excluding the suffix '\0'). Append the
+ * suffix '\0' to the output volume_key, so that it's a valid string.
+ */
+ volume_key_len = volume->key[0] + 1;
+ volume_key = volume->key + 1;
+
+ /*
+ * Cookie key is of binary format, which is netfs specific.
+ */
+ cookie_key_len = cookie->key_len;
+ cookie_key = fscache_get_key(cookie);
+
+ if (!(object->cookie->advice & FSCACHE_ADV_WANT_CACHE_SIZE)) {
+ pr_err("WANT_CACHE_SIZE is needed for on-demand mode\n");
+ return -EINVAL;
+ }
+
+ load->volume_key_len = volume_key_len;
+ load->cookie_key_len = cookie_key_len;
+ memcpy(load->data, volume_key, volume_key_len);
+ memcpy(load->data + volume_key_len, cookie_key, cookie_key_len);
+
+ return 0;
+}
+
+int cachefiles_ondemand_init_object(struct cachefiles_object *object)
+{
+ struct fscache_cookie *cookie = object->cookie;
+ struct fscache_volume *volume = object->volume->vcookie;
+ size_t volume_key_len, cookie_key_len, data_len;
+
+ /*
+ * Cachefiles will first check the cache file under the root cache
+ * directory. If the coherency check fails, it will fall back to
+ * creating a new tmpfile as the cache file. Reuse the previously
+ * created anon_fd, if any.
+ */
+ if (object->fd > 0)
+ return 0;
+
+ volume_key_len = volume->key[0] + 1;
+ cookie_key_len = cookie->key_len;
+ data_len = sizeof(struct cachefiles_open) +
+ volume_key_len + cookie_key_len;
+
+ return cachefiles_ondemand_send_req(object,
+ CACHEFILES_OP_OPEN, data_len,
+ init_open_req, NULL);
+}
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 6727fb0db619..663ab6f2ede6 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -39,6 +39,7 @@ struct fscache_cookie;
#define FSCACHE_ADV_SINGLE_CHUNK 0x01 /* The object is a single chunk of data */
#define FSCACHE_ADV_WRITE_CACHE 0x00 /* Do cache if written to locally */
#define FSCACHE_ADV_WRITE_NOCACHE 0x02 /* Don't cache if written to locally */
+#define FSCACHE_ADV_WANT_CACHE_SIZE 0x04 /* Retrieve cache size at runtime */
#define FSCACHE_INVAL_DIO_WRITE 0x01 /* Invalidate due to DIO write */
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index 2c530637e10a..68e6d4eea9bd 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -31,6 +31,8 @@ enum cachefiles_obj_ref_trace {
cachefiles_obj_see_lookup_failed,
cachefiles_obj_see_withdraw_cookie,
cachefiles_obj_see_withdrawal,
+ cachefiles_obj_get_ondemand_fd,
+ cachefiles_obj_put_ondemand_fd,
};
enum fscache_why_object_killed {
diff --git a/include/uapi/linux/cachefiles.h b/include/uapi/linux/cachefiles.h
new file mode 100644
index 000000000000..49cb7adb439a
--- /dev/null
+++ b/include/uapi/linux/cachefiles.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _LINUX_CACHEFILES_H
+#define _LINUX_CACHEFILES_H
+
+#include <linux/types.h>
+
+/*
+ * Fscache ensures that the maximum length of cookie key is 255. The volume key
+ * is controlled by netfs, and generally no bigger than 255.
+ */
+#define CACHEFILES_MSG_MAX_SIZE 1024
+
+enum cachefiles_opcode {
+ CACHEFILES_OP_OPEN,
+};
+
+/*
+ * @id identifying position of this message in the radix tree
+ * @opcode message type, CACHEFILE_OP_*
+ * @len message length, including message header and following data
+ * @data message type specific payload
+ */
+struct cachefiles_msg {
+ __u32 id;
+ __u32 opcode;
+ __u32 len;
+ __u8 data[];
+};
+
+struct cachefiles_open {
+ __u32 volume_key_len;
+ __u32 cookie_key_len;
+ __u32 fd;
+ __u32 flags;
+ /* following data contains volume_key and cookie_key in sequence */
+ __u8 data[];
+};
+
+#endif
--
2.27.0
Notify the user daemon that the cookie is going to be withdrawn,
providing a hint that the associated anon_fd can be closed. The anon_fd
attached in the CLOSE request shall be the same as that in the previous
OPEN request.
Note that this is only a hint. The user daemon can close the anon_fd
when receiving the CLOSE request; it will then receive another anon_fd
the next time the cookie gets looked up. Or it can ignore the CLOSE
request and keep writing data through the anon_fd; however, the next
time the cookie gets looked up, the user daemon will still receive
another anon_fd.
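
As a minimal daemon-side sketch (assuming the UAPI structs from this
series; error handling omitted), acting on the hint boils down to:

  #include <unistd.h>
  #include <linux/cachefiles.h>

  static void handle_close(struct cachefiles_msg *msg)
  {
          struct cachefiles_close *load = (void *)msg->data;

          /* only a hint: the daemon may also keep the fd and ignore this */
          close(load->fd);
  }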
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/cachefiles/interface.c | 2 ++
fs/cachefiles/internal.h | 5 +++++
fs/cachefiles/ondemand.c | 34 +++++++++++++++++++++++++++++++++
include/uapi/linux/cachefiles.h | 5 +++++
4 files changed, 46 insertions(+)
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index ae93cee9d25d..a69073a1d3f0 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -362,6 +362,8 @@ static void cachefiles_withdraw_cookie(struct fscache_cookie *cookie)
spin_unlock(&cache->object_list_lock);
}
+ cachefiles_ondemand_clean_object(object);
+
if (object->file) {
cachefiles_begin_secure(cache, &saved_cred);
cachefiles_clean_up_object(object, cache);
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 7d5c7d391fdb..8a397d4da560 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -279,6 +279,7 @@ extern int cachefiles_ondemand_copen(struct cachefiles_cache *cache,
char *args);
extern int cachefiles_ondemand_init_object(struct cachefiles_object *object);
+extern void cachefiles_ondemand_clean_object(struct cachefiles_object *object);
#else
static inline ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
@@ -291,6 +292,10 @@ static inline int cachefiles_ondemand_init_object(struct cachefiles_object *obje
{
return 0;
}
+
+static inline void cachefiles_ondemand_clean_object(struct cachefiles_object *object)
+{
+}
#endif
/*
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index f0e6e422263b..8cbfba057616 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -209,6 +209,12 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
goto err_put_fd;
}
+ /* CLOSE request has no reply */
+ if (msg->opcode == CACHEFILES_OP_CLOSE) {
+ xa_erase(&cache->reqs, id);
+ complete(&req->done);
+ }
+
return n;
err_put_fd:
@@ -332,6 +338,26 @@ static int init_open_req(struct cachefiles_req *req, void *private)
return 0;
}
+static int init_close_req(struct cachefiles_req *req, void *private)
+{
+ struct cachefiles_object *object = req->object;
+ struct cachefiles_close *load = (void *)req->msg.data;
+ int fd = object->fd;
+
+ if (WARN_ON_ONCE(fd == -1))
+ return -EIO;
+
+ /*
+ * This is possible if the cookie lookup phase failed before a READ
+ * request was ever sent.
+ */
+ if (fd == 0)
+ return -ENOENT;
+
+ load->fd = fd;
+ return 0;
+}
+
int cachefiles_ondemand_init_object(struct cachefiles_object *object)
{
struct fscache_cookie *cookie = object->cookie;
@@ -356,3 +382,11 @@ int cachefiles_ondemand_init_object(struct cachefiles_object *object)
CACHEFILES_OP_OPEN, data_len,
init_open_req, NULL);
}
+
+void cachefiles_ondemand_clean_object(struct cachefiles_object *object)
+{
+ cachefiles_ondemand_send_req(object,
+ CACHEFILES_OP_CLOSE,
+ sizeof(struct cachefiles_close),
+ init_close_req, NULL);
+}
diff --git a/include/uapi/linux/cachefiles.h b/include/uapi/linux/cachefiles.h
index 49cb7adb439a..dba97d284899 100644
--- a/include/uapi/linux/cachefiles.h
+++ b/include/uapi/linux/cachefiles.h
@@ -12,6 +12,7 @@
enum cachefiles_opcode {
CACHEFILES_OP_OPEN,
+ CACHEFILES_OP_CLOSE,
};
/*
@@ -36,4 +37,8 @@ struct cachefiles_open {
__u8 data[];
};
+struct cachefiles_close {
+ __u32 fd;
+};
+
#endif
--
2.27.0
On 3/31/22 7:57 PM, Jeffle Xu wrote:
> changes since v6:
> - cachefiles: "fscache: export fscache_end_operation()" is still needed.
> Since it has been pull requested for 5.18 (not merged yet), it's not
> included in this patchset.
>
>
> Kernel Patchset
> ---------------
> Git tree:
>
> https://github.com/lostjeffle/linux.git jingbo/dev-erofs-fscache-v7
>
> Gitweb:
>
> https://github.com/lostjeffle/linux/commits/jingbo/dev-erofs-fscache-v7
>
>
I have rebased it to the latest upstream, which contains David's PR on
fscache/cachefiles. There's no git conflict when rebasing.
--
Thanks,
Jeffle
Implement the data plane of the on-demand read mode.
A new NETFS_READ_HOLE_ONDEMAND flag is introduced to indicate that
on-demand read should be done when a cache miss is encountered. In this
case, the read routine will send a READ request to the user daemon,
along with the anonymous fd and the file range that shall be read. The
user daemon is then responsible for fetching data in the given file
range, and then writing the fetched data into the cache file through
the given anonymous fd.
After sending the READ request, the read routine will hang there until
the READ request is handled by the user daemon. Then it will retry the
read on the same file range. If a cache miss is encountered again on
the same file range, the read routine will fail.
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/cachefiles/internal.h | 9 ++++
fs/cachefiles/io.c | 11 +++++
fs/cachefiles/ondemand.c | 81 +++++++++++++++++++++++++++++++++
include/linux/netfs.h | 1 +
include/uapi/linux/cachefiles.h | 13 ++++++
5 files changed, 115 insertions(+)
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 8a397d4da560..b4a834671b6b 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -281,6 +281,9 @@ extern int cachefiles_ondemand_copen(struct cachefiles_cache *cache,
extern int cachefiles_ondemand_init_object(struct cachefiles_object *object);
extern void cachefiles_ondemand_clean_object(struct cachefiles_object *object);
+extern int cachefiles_ondemand_read(struct cachefiles_object *object,
+ loff_t pos, size_t len);
+
#else
static inline ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
char __user *_buffer, size_t buflen)
@@ -296,6 +299,12 @@ static inline int cachefiles_ondemand_init_object(struct cachefiles_object *obje
static inline void cachefiles_ondemand_clean_object(struct cachefiles_object *object)
{
}
+
+static inline int cachefiles_ondemand_read(struct cachefiles_object *object,
+ loff_t pos, size_t len)
+{
+ return -EOPNOTSUPP;
+}
#endif
/*
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 7c3037afae63..c8f43c51f9d6 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -95,6 +95,7 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
file, file_inode(file)->i_ino, start_pos, len,
i_size_read(file_inode(file)));
+retry:
/* If the caller asked us to seek for data before doing the read, then
* we should do that now. If we find a gap, we fill it with zeros.
*/
@@ -119,6 +120,16 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
if (read_hole == NETFS_READ_HOLE_FAIL)
goto presubmission_error;
+ if (read_hole == NETFS_READ_HOLE_ONDEMAND) {
+ ret = cachefiles_ondemand_read(object, off, len);
+ if (ret)
+ goto presubmission_error;
+
+ /* fail the read if no progress achieved */
+ read_hole = NETFS_READ_HOLE_FAIL;
+ goto retry;
+ }
+
iov_iter_zero(len, iter);
skipped = len;
ret = 0;
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index 8cbfba057616..4f07a7584f7c 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -11,13 +11,30 @@ static int cachefiles_ondemand_fd_release(struct inode *inode,
struct file *file)
{
struct cachefiles_object *object = file->private_data;
+ struct cachefiles_cache *cache = object->volume->cache;
+ struct xarray *xa = &cache->reqs;
+ struct cachefiles_req *req;
+ unsigned long index;
+ xa_lock(xa);
/*
* Uninstall anon_fd to the cachefiles object, so that no further
* associated requests will get enqueued.
*/
object->fd = -1;
+ /*
+ * Flush all pending READ requests since their completion depends on
+ * anon_fd.
+ */
+ xa_for_each(xa, index, req) {
+ if (req->msg.opcode == CACHEFILES_OP_READ) {
+ req->error = -EIO;
+ complete(&req->done);
+ }
+ }
+ xa_unlock(xa);
+
cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd);
return 0;
}
@@ -61,11 +78,35 @@ static loff_t cachefiles_ondemand_fd_llseek(struct file *filp, loff_t pos,
return vfs_llseek(file, pos, whence);
}
+static long cachefiles_ondemand_fd_ioctl(struct file *filp, unsigned int ioctl,
+ unsigned long arg)
+{
+ struct cachefiles_object *object = filp->private_data;
+ struct cachefiles_cache *cache = object->volume->cache;
+ struct cachefiles_req *req;
+ unsigned long id;
+
+ if (ioctl != CACHEFILES_IOC_CREAD)
+ return -EINVAL;
+
+ if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
+ return -EOPNOTSUPP;
+
+ id = arg;
+ req = xa_erase(&cache->reqs, id);
+ if (!req)
+ return -EINVAL;
+
+ complete(&req->done);
+ return 0;
+}
+
static const struct file_operations cachefiles_ondemand_fd_fops = {
.owner = THIS_MODULE,
.release = cachefiles_ondemand_fd_release,
.write_iter = cachefiles_ondemand_fd_write_iter,
.llseek = cachefiles_ondemand_fd_llseek,
+ .unlocked_ioctl = cachefiles_ondemand_fd_ioctl,
};
/*
@@ -279,6 +320,13 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
goto out;
}
+ /* recheck anon_fd for READ request with lock held */
+ if (opcode == CACHEFILES_OP_READ && object->fd == -1) {
+ xas_unlock(&xas);
+ ret = -EIO;
+ goto out;
+ }
+
xas.xa_index = 0;
xas_find_marked(&xas, UINT_MAX, XA_FREE_MARK);
if (xas.xa_node == XAS_RESTART)
@@ -358,6 +406,28 @@ static int init_close_req(struct cachefiles_req *req, void *private)
return 0;
}
+struct cachefiles_read_ctx {
+ loff_t off;
+ size_t len;
+};
+
+static int init_read_req(struct cachefiles_req *req, void *private)
+{
+ struct cachefiles_object *object = req->object;
+ struct cachefiles_read *load = (void *)&req->msg.data;
+ struct cachefiles_read_ctx *read_ctx = private;
+ int fd = object->fd;
+
+ /* Stop enqueuing request when daemon closes anon_fd prematurely. */
+ if (WARN_ON_ONCE(fd == -1))
+ return -EIO;
+
+ load->off = read_ctx->off;
+ load->len = read_ctx->len;
+ load->fd = fd;
+ return 0;
+}
+
int cachefiles_ondemand_init_object(struct cachefiles_object *object)
{
struct fscache_cookie *cookie = object->cookie;
@@ -390,3 +460,14 @@ void cachefiles_ondemand_clean_object(struct cachefiles_object *object)
sizeof(struct cachefiles_close),
init_close_req, NULL);
}
+
+int cachefiles_ondemand_read(struct cachefiles_object *object,
+ loff_t pos, size_t len)
+{
+ struct cachefiles_read_ctx read_ctx = {pos, len};
+
+ return cachefiles_ondemand_send_req(object,
+ CACHEFILES_OP_READ,
+ sizeof(struct cachefiles_read),
+ init_read_req, &read_ctx);
+}
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 614f22213e21..2a9c50d3a928 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -203,6 +203,7 @@ enum netfs_read_from_hole {
NETFS_READ_HOLE_IGNORE,
NETFS_READ_HOLE_CLEAR,
NETFS_READ_HOLE_FAIL,
+ NETFS_READ_HOLE_ONDEMAND,
};
/*
diff --git a/include/uapi/linux/cachefiles.h b/include/uapi/linux/cachefiles.h
index dba97d284899..efd18ef6453c 100644
--- a/include/uapi/linux/cachefiles.h
+++ b/include/uapi/linux/cachefiles.h
@@ -3,6 +3,7 @@
#define _LINUX_CACHEFILES_H
#include <linux/types.h>
+#include <linux/ioctl.h>
/*
* Fscache ensures that the maximum length of cookie key is 255. The volume key
@@ -13,6 +14,7 @@
enum cachefiles_opcode {
CACHEFILES_OP_OPEN,
CACHEFILES_OP_CLOSE,
+ CACHEFILES_OP_READ,
};
/*
@@ -41,4 +43,15 @@ struct cachefiles_close {
__u32 fd;
};
+struct cachefiles_read {
+ __u64 off;
+ __u64 len;
+ __u32 fd;
+};
+
+/*
+ * For CACHEFILES_IOC_CREAD, arg is the @id field of corresponding READ request.
+ */
+#define CACHEFILES_IOC_CREAD _IOW(0x98, 1, int)
+
#endif
--
2.27.0
Introduce a context structure for managing data blobs, and helper
functions for initializing and cleaning up this context structure.
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/fscache.c | 46 +++++++++++++++++++++++++++++++++++++++++++++
fs/erofs/internal.h | 19 +++++++++++++++++++
2 files changed, 65 insertions(+)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 7a6d0239ebb1..67a3c4935245 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -5,6 +5,52 @@
#include <linux/fscache.h>
#include "internal.h"
+/*
+ * Create an fscache context for data blob.
+ * Return: 0 on success and allocated fscache context is assigned to @fscache,
+ * negative error number on failure.
+ */
+int erofs_fscache_register_cookie(struct super_block *sb,
+ struct erofs_fscache **fscache, char *name)
+{
+ struct fscache_volume *volume = EROFS_SB(sb)->volume;
+ struct erofs_fscache *ctx;
+ struct fscache_cookie *cookie;
+
+ ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+ if (!ctx)
+ return -ENOMEM;
+
+ cookie = fscache_acquire_cookie(volume, FSCACHE_ADV_WANT_CACHE_SIZE,
+ name, strlen(name), NULL, 0, 0);
+ if (!cookie) {
+ erofs_err(sb, "failed to get cookie for %s", name);
+ kfree(name);
+ return -EINVAL;
+ }
+
+ fscache_use_cookie(cookie, false);
+ ctx->cookie = cookie;
+
+ *fscache = ctx;
+ return 0;
+}
+
+void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
+{
+ struct erofs_fscache *ctx = *fscache;
+
+ if (!ctx)
+ return;
+
+ fscache_unuse_cookie(ctx->cookie, NULL, NULL);
+ fscache_relinquish_cookie(ctx->cookie, false);
+ ctx->cookie = NULL;
+
+ kfree(ctx);
+ *fscache = NULL;
+}
+
int erofs_fscache_register_fs(struct super_block *sb)
{
struct erofs_sb_info *sbi = EROFS_SB(sb);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 952a2f483f94..c6a3351a4d7d 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -97,6 +97,10 @@ struct erofs_sb_lz4_info {
u16 max_pclusterblks;
};
+struct erofs_fscache {
+ struct fscache_cookie *cookie;
+};
+
struct erofs_sb_info {
struct erofs_mount_opts opt; /* options */
#ifdef CONFIG_EROFS_FS_ZIP
@@ -626,9 +630,24 @@ static inline int z_erofs_load_lzma_config(struct super_block *sb,
#ifdef CONFIG_EROFS_FS_ONDEMAND
int erofs_fscache_register_fs(struct super_block *sb);
void erofs_fscache_unregister_fs(struct super_block *sb);
+
+int erofs_fscache_register_cookie(struct super_block *sb,
+ struct erofs_fscache **fscache, char *name);
+void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache);
#else
static inline int erofs_fscache_register_fs(struct super_block *sb) { return 0; }
static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
+
+static inline int erofs_fscache_register_cookie(struct super_block *sb,
+ struct erofs_fscache **fscache,
+ char *name)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
+{
+}
#endif
#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
--
2.27.0
Introduce an anonymous inode for managing the page cache of a data
blob. Then erofs can read directly from the address space of the
anonymous inode on a cache hit.
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/fscache.c | 39 ++++++++++++++++++++++++++++++++++++---
fs/erofs/internal.h | 6 ++++--
2 files changed, 40 insertions(+), 5 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 67a3c4935245..1c88614203d2 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -5,17 +5,22 @@
#include <linux/fscache.h>
#include "internal.h"
+static const struct address_space_operations erofs_fscache_meta_aops = {
+};
+
/*
* Create an fscache context for data blob.
* Return: 0 on success and allocated fscache context is assigned to @fscache,
* negative error number on failure.
*/
int erofs_fscache_register_cookie(struct super_block *sb,
- struct erofs_fscache **fscache, char *name)
+ struct erofs_fscache **fscache,
+ char *name, bool need_inode)
{
struct fscache_volume *volume = EROFS_SB(sb)->volume;
struct erofs_fscache *ctx;
struct fscache_cookie *cookie;
+ int ret;
ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
if (!ctx)
@@ -25,15 +30,40 @@ int erofs_fscache_register_cookie(struct super_block *sb,
name, strlen(name), NULL, 0, 0);
if (!cookie) {
erofs_err(sb, "failed to get cookie for %s", name);
- kfree(name);
- return -EINVAL;
+ ret = -EINVAL;
+ goto err;
}
fscache_use_cookie(cookie, false);
ctx->cookie = cookie;
+ if (need_inode) {
+ struct inode *const inode = new_inode(sb);
+
+ if (!inode) {
+ erofs_err(sb, "failed to get anon inode for %s", name);
+ ret = -ENOMEM;
+ goto err_cookie;
+ }
+
+ set_nlink(inode, 1);
+ inode->i_size = OFFSET_MAX;
+ inode->i_mapping->a_ops = &erofs_fscache_meta_aops;
+ mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
+
+ ctx->inode = inode;
+ }
+
*fscache = ctx;
return 0;
+
+err_cookie:
+ fscache_unuse_cookie(ctx->cookie, NULL, NULL);
+ fscache_relinquish_cookie(ctx->cookie, false);
+ ctx->cookie = NULL;
+err:
+ kfree(ctx);
+ return ret;
}
void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
@@ -47,6 +77,9 @@ void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
fscache_relinquish_cookie(ctx->cookie, false);
ctx->cookie = NULL;
+ iput(ctx->inode);
+ ctx->inode = NULL;
+
kfree(ctx);
*fscache = NULL;
}
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index c6a3351a4d7d..3a4a344cfed3 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -99,6 +99,7 @@ struct erofs_sb_lz4_info {
struct erofs_fscache {
struct fscache_cookie *cookie;
+ struct inode *inode;
};
struct erofs_sb_info {
@@ -632,7 +633,8 @@ int erofs_fscache_register_fs(struct super_block *sb);
void erofs_fscache_unregister_fs(struct super_block *sb);
int erofs_fscache_register_cookie(struct super_block *sb,
- struct erofs_fscache **fscache, char *name);
+ struct erofs_fscache **fscache,
+ char *name, bool need_inode);
void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache);
#else
static inline int erofs_fscache_register_fs(struct super_block *sb) { return 0; }
@@ -640,7 +642,7 @@ static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
static inline int erofs_fscache_register_cookie(struct super_block *sb,
struct erofs_fscache **fscache,
- char *name)
+ char *name, bool need_inode)
{
return -EOPNOTSUPP;
}
--
2.27.0
On 3/31/22 7:57 PM, Jeffle Xu wrote:
>
> + ret = cachefiles_ondemand_init_object(object);
> + if (ret < 0) {
> + file = ERR_PTR(ret);
> + goto out_unuse;
> + }
> +
Sorry, this patch really depends on [1], which introduces the "out_unuse" label.
[1] https://www.spinics.net/lists/linux-cachefs/msg05972.html
My git branch already contains this commit [2].
[2]
https://github.com/lostjeffle/linux/commit/3c71705e777fa75d37f784640a232db14ce62c31
--
Thanks,
Jeffle
Similar to the multi-device mode, erofs can be mounted from one primary
data blob (mandatory) and multiple extra data blobs (optional).
Register an fscache context for each extra data blob.
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/data.c | 3 +++
fs/erofs/internal.h | 2 ++
fs/erofs/super.c | 25 +++++++++++++++++--------
3 files changed, 22 insertions(+), 8 deletions(-)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index bc22642358ec..14b64d960541 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -199,6 +199,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
map->m_bdev = sb->s_bdev;
map->m_daxdev = EROFS_SB(sb)->dax_dev;
map->m_dax_part_off = EROFS_SB(sb)->dax_part_off;
+ map->m_fscache = EROFS_SB(sb)->s_fscache;
if (map->m_deviceid) {
down_read(&devs->rwsem);
@@ -210,6 +211,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
map->m_bdev = dif->bdev;
map->m_daxdev = dif->dax_dev;
map->m_dax_part_off = dif->dax_part_off;
+ map->m_fscache = dif->fscache;
up_read(&devs->rwsem);
} else if (devs->extra_devices) {
down_read(&devs->rwsem);
@@ -227,6 +229,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
map->m_bdev = dif->bdev;
map->m_daxdev = dif->dax_dev;
map->m_dax_part_off = dif->dax_part_off;
+ map->m_fscache = dif->fscache;
break;
}
}
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index eb37b33bce37..90f7d6286a4f 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -49,6 +49,7 @@ typedef u32 erofs_blk_t;
struct erofs_device_info {
char *path;
+ struct erofs_fscache *fscache;
struct block_device *bdev;
struct dax_device *dax_dev;
u64 dax_part_off;
@@ -482,6 +483,7 @@ static inline int z_erofs_map_blocks_iter(struct inode *inode,
#endif /* !CONFIG_EROFS_FS_ZIP */
struct erofs_map_dev {
+ struct erofs_fscache *m_fscache;
struct block_device *m_bdev;
struct dax_device *m_daxdev;
u64 m_dax_part_off;
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 9498b899b73b..8c7181cd37e6 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -259,15 +259,23 @@ static int erofs_init_devices(struct super_block *sb,
}
dis = ptr + erofs_blkoff(pos);
- bdev = blkdev_get_by_path(dif->path,
- FMODE_READ | FMODE_EXCL,
- sb->s_type);
- if (IS_ERR(bdev)) {
- err = PTR_ERR(bdev);
- break;
+ if (erofs_is_fscache_mode(sb)) {
+ err = erofs_fscache_register_cookie(sb, &dif->fscache,
+ dif->path, false);
+ if (err)
+ break;
+ } else {
+ bdev = blkdev_get_by_path(dif->path,
+ FMODE_READ | FMODE_EXCL,
+ sb->s_type);
+ if (IS_ERR(bdev)) {
+ err = PTR_ERR(bdev);
+ break;
+ }
+ dif->bdev = bdev;
+ dif->dax_dev = fs_dax_get_by_bdev(bdev, &dif->dax_part_off);
}
- dif->bdev = bdev;
- dif->dax_dev = fs_dax_get_by_bdev(bdev, &dif->dax_part_off);
+
dif->blocks = le32_to_cpu(dis->blocks);
dif->mapped_blkaddr = le32_to_cpu(dis->mapped_blkaddr);
sbi->total_blocks += dif->blocks;
@@ -701,6 +709,7 @@ static int erofs_release_device_info(int id, void *ptr, void *data)
fs_put_dax(dif->dax_dev);
if (dif->bdev)
blkdev_put(dif->bdev, FMODE_READ | FMODE_EXCL);
+ erofs_fscache_unregister_cookie(&dif->fscache);
kfree(dif->path);
kfree(dif);
return 0;
--
2.27.0