2022-04-06 14:32:29

by Jingbo Xu

Subject: [PATCH v8 00/20] fscache,erofs: fscache-based on-demand read semantics

changes since v7:
- rebased to 5.18-rc1
- include "cachefiles: unmark inode in use in error path" patch into
this patchset to avoid warning from test robot (patch 1)
- cachefiles: rename [cookie|volume]_key_len field of struct
cachefiles_open to [cookie|volume]_key_size to avoid potential
misunderstanding. Also add more documentation to
include/uapi/linux/cachefiles.h. (patch 3)
- cachefiles: validate the error code returned from user daemon
(patch 3)
- cachefiles: change WARN_ON_ONCE() to pr_info_once() when user daemon
closes anon_fd prematurely (patch 4/5)
- ready for complete review


Kernel Patchset
---------------
Git tree:

https://github.com/lostjeffle/linux.git jingbo/dev-erofs-fscache-v8

Gitweb:

https://github.com/lostjeffle/linux/commits/jingbo/dev-erofs-fscache-v8


User Daemon for Quick Test
--------------------------
Git tree:

https://github.com/lostjeffle/demand-read-cachefilesd.git main

Gitweb:

https://github.com/lostjeffle/demand-read-cachefilesd


RFC: https://lore.kernel.org/all/[email protected]/t/
v1: https://lore.kernel.org/lkml/[email protected]/T/
v2: https://lore.kernel.org/all/[email protected]/t/
v3: https://lore.kernel.org/lkml/[email protected]/T/
v4: https://lore.kernel.org/lkml/[email protected]/T/#t
v5: https://lore.kernel.org/lkml/[email protected]/T/
v6: https://lore.kernel.org/lkml/[email protected]/T/
v7: https://www.spinics.net/lists/linux-fsdevel/msg215066.html


[Background]
============
Nydus [1] is an image distribution service optimized for distribution over
the network. It accelerates container image delivery by pulling data from
the remote only when needed (a.k.a. on-demand reading), and it also supports
chunk-based deduplication, compression, etc.

erofs (Enhanced Read-Only File System) is a filesystem designed for
read-only scenarios. (Documentation/filesystems/erofs.rst)

Over the past months we've been focusing on supporting the Nydus image
service with the in-kernel erofs format [2]. In that case, each container
image is organized as one bootstrap (metadata) and (optionally) multiple data
blobs in erofs format. A large number of container images will be stored on
one machine.

To accelerate container startup (fetching container images from the remote
and then starting the container), we hope that the bootstrap and blob files
can support on-demand read. That is, erofs can be mounted and accessed even
when the bootstrap/data blob files have not been fully downloaded. It then
reaches native performance once the data is available locally.

That means we have to manage the cache state of the bootstrap/data blob
files (if cache hit, read directly from the local cache; if cache miss,
fetch the data somehow). It would be painful and may be dumb for erofs to
implement the cache management itself. Thus we prefer fscache/cachefiles
to do the cache management instead.

The on-demand read feature is implemented in the fscache subsystem in a
generic way, so that it can also benefit other use cases and/or filesystems.

[1] https://nydus.dev
[2] https://sched.co/pcdL


[Overall Design]
================
Please refer to patch 7 ("cachefiles: document on-demand read mode") for
more details.

When working in the original mode, cachefiles mainly serves as a local cache
for a remote networking fs, while in on-demand read mode, cachefiles can work
in scenarios where on-demand read semantics are needed, e.g. container image
distribution.

The essential difference between these two modes is that, in the original
mode, on a cache miss netfs itself fetches the data from the remote and then
writes the fetched data into the cache file. In on-demand read mode, a user
daemon is responsible for fetching the data and feeding it to the kernel
fscache side.

The on-demand read mode relies on a simple protocol used for communication
between kernel and user daemon.

The proposed implementation relies on the anonymous fd mechanism to avoid
depending on the format of the cache file. When an fscache cache file is opened
for the first time, an anon_fd associated with the cache file is sent to the
user daemon. With the given anon_fd, the user daemon can fetch and write data
into the cache file in the background, even before the kernel has triggered a
cache miss. Besides, a write() syscall on the anon_fd ends up in the cachefiles
kernel module, which writes the data to the cache file in the latest cache file
format.

1. cache miss
On a cache miss, the cachefiles kernel module notifies the user daemon with
the anon_fd, along with the requested file range. When notified, the user
daemon needs to fetch the data of the requested file range, and then write the
fetched data into the cache file through the given anonymous fd. When it has
finished processing the request, the user daemon needs to notify the kernel.

After notifying the user daemon, the kernel read routine blocks until the
request has been handled by the user daemon. Once it is awakened by the
notification from the user daemon, i.e. the corresponding hole has been filled
by the user daemon, it retries the read on the same file range.

2. cache hit
Once the data is ready in the cache file, netfs reads from the cache file
directly.
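
The cache-miss flow above can be sketched from the user daemon's side. This
is a minimal illustration, not the reference daemon: the structure layouts
and the CACHEFILES_IOC_CREAD number are copied from patches 5 and 7 of this
series, while handle_read_request(), fill_range(), fetch_fn and
fetch_pattern() are hypothetical names introduced only for this example. It
uses lseek() + write() on the anon_fd because those are the file operations
the series documents for it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Local copies of the layouts proposed in include/uapi/linux/cachefiles.h,
 * so the sketch builds on systems whose headers predate this series. */
struct cachefiles_msg {
	uint32_t id;
	uint32_t opcode;
	uint32_t len;
	uint8_t data[];
};

struct cachefiles_read {
	uint64_t off;
	uint64_t len;
	uint32_t fd;
};

enum { CACHEFILES_OP_OPEN, CACHEFILES_OP_CLOSE, CACHEFILES_OP_READ };
#define CACHEFILES_IOC_CREAD _IOW(0x98, 1, int)

/* Hypothetical callback: fetch up to @len bytes at @off from the remote. */
typedef ssize_t (*fetch_fn)(uint64_t off, void *dst, size_t len);

/* Placeholder fetcher for experiments: byte i of the blob is (i & 0xff). */
ssize_t fetch_pattern(uint64_t off, void *dst, size_t len)
{
	for (size_t i = 0; i < len; i++)
		((uint8_t *)dst)[i] = (uint8_t)((off + i) & 0xff);
	return (ssize_t)len;
}

/* Fill [off, off + len) of the cache file through the anonymous fd. */
static int fill_range(int fd, uint64_t off, uint64_t len, fetch_fn fetch)
{
	char chunk[4096];

	if (lseek(fd, (off_t)off, SEEK_SET) < 0)
		return -errno;

	while (len > 0) {
		size_t want = len < sizeof(chunk) ? (size_t)len : sizeof(chunk);
		ssize_t got = fetch(off, chunk, want);

		if (got <= 0 || (size_t)got > want)
			return -EIO;
		if (write(fd, chunk, (size_t)got) != got)
			return -EIO;
		off += (uint64_t)got;
		len -= (uint64_t)got;
	}
	return 0;
}

/*
 * Handle one message that was read from the devnode. Returns 0 or a negative
 * errno. @complete selects whether the CREAD completion is sent; that ioctl
 * is only meaningful on a real anon_fd.
 */
int handle_read_request(const void *buf, size_t n, fetch_fn fetch, int complete)
{
	struct cachefiles_msg hdr;
	struct cachefiles_read rd;
	int ret;

	assert(buf && fetch);
	if (n < sizeof(hdr) + sizeof(rd))
		return -EINVAL;

	memcpy(&hdr, buf, sizeof(hdr));
	if (hdr.opcode != CACHEFILES_OP_READ || hdr.len > n)
		return -EINVAL;
	memcpy(&rd, (const char *)buf + sizeof(hdr), sizeof(rd));

	ret = fill_range((int)rd.fd, rd.off, rd.len, fetch);
	if (ret) {
		fprintf(stderr, "READ id %u: fill failed (%d)\n",
			(unsigned)hdr.id, ret);
		return ret;
	}

	/* Wake up the kernel read routine waiting on this request. */
	if (complete &&
	    ioctl((int)rd.fd, CACHEFILES_IOC_CREAD, (unsigned long)hdr.id) < 0)
		return -errno;
	return 0;
}
```

In a real daemon the buffer would come from a read() on /dev/cachefiles after
poll() reports POLLIN, and complete would be non-zero. If the fill fails, the
sketch leaves the request pending; per patch 5, closing the anon_fd flushes
pending READ requests with -EIO.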


[Advantages of fscache-based on-demand read]
============================================
1. Asynchronous Prefetch
In the current mechanism, fscache is responsible for cache state management,
while the data plane (fetching data from local/remote on a cache miss) is
handled on the user daemon side.

If the data is already present in the backing file, netfs (e.g. erofs)
reads from the backing file directly and is not trapped to user space
anymore. Thus the user daemon can fetch data (from the remote)
asynchronously in the background, which accelerates access to the backing
file to some degree.

2. Support massive blob files
Besides, this mechanism supports a large number of backing files, and
thus can benefit densely deployed scenarios.

In our use case, one container image corresponds to one bootstrap file
(required) and multiple data blob files (optional). For example, one
container image for node.js corresponds to ~20 files in total. In a
densely deployed environment, there could be as many as hundreds of
containers and thus thousands of backing files on one machine.


Jeffle Xu (20):
cachefiles: unmark inode in use in error path
cachefiles: extract write routine
cachefiles: notify user daemon with anon_fd when looking up cookie
cachefiles: notify user daemon when withdrawing cookie
cachefiles: implement on-demand read
cachefiles: enable on-demand read mode
cachefiles: document on-demand read mode
erofs: make erofs_map_blocks() generally available
erofs: add mode checking helper
erofs: register fscache volume
erofs: add fscache context helper functions
erofs: add anonymous inode managing page cache for data blob
erofs: add erofs_fscache_read_folios() helper
erofs: register fscache context for primary data blob
erofs: register fscache context for extra data blobs
erofs: implement fscache-based metadata read
erofs: implement fscache-based data read for non-inline layout
erofs: implement fscache-based data read for inline layout
erofs: implement fscache-based data readahead
erofs: add 'fsid' mount option

.../filesystems/caching/cachefiles.rst | 165 ++++++
fs/cachefiles/Kconfig | 11 +
fs/cachefiles/Makefile | 1 +
fs/cachefiles/daemon.c | 90 +++-
fs/cachefiles/interface.c | 2 +
fs/cachefiles/internal.h | 67 +++
fs/cachefiles/io.c | 72 ++-
fs/cachefiles/namei.c | 49 +-
fs/cachefiles/ondemand.c | 479 ++++++++++++++++++
fs/erofs/Kconfig | 10 +
fs/erofs/Makefile | 1 +
fs/erofs/data.c | 27 +-
fs/erofs/fscache.c | 369 ++++++++++++++
fs/erofs/inode.c | 5 +
fs/erofs/internal.h | 55 ++
fs/erofs/super.c | 99 +++-
include/linux/fscache.h | 1 +
include/linux/netfs.h | 1 +
include/trace/events/cachefiles.h | 2 +
include/uapi/linux/cachefiles.h | 72 +++
20 files changed, 1501 insertions(+), 77 deletions(-)
create mode 100644 fs/cachefiles/ondemand.c
create mode 100644 fs/erofs/fscache.c
create mode 100644 include/uapi/linux/cachefiles.h

--
2.27.0


2022-04-06 14:32:35

by Jingbo Xu

Subject: [PATCH v8 08/20] erofs: make erofs_map_blocks() generally available

... so that it can be used by the fscache mode introduced in the following
patches.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/data.c | 4 ++--
fs/erofs/internal.h | 2 ++
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 780db1e5f4b7..bc22642358ec 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -110,8 +110,8 @@ static int erofs_map_blocks_flatmode(struct inode *inode,
return 0;
}

-static int erofs_map_blocks(struct inode *inode,
- struct erofs_map_blocks *map, int flags)
+int erofs_map_blocks(struct inode *inode,
+ struct erofs_map_blocks *map, int flags)
{
struct super_block *sb = inode->i_sb;
struct erofs_inode *vi = EROFS_I(inode);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 5298c4ee277d..fe9564e5091e 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -486,6 +486,8 @@ void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *dev);
int erofs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
u64 start, u64 len);
+int erofs_map_blocks(struct inode *inode,
+ struct erofs_map_blocks *map, int flags);

/* inode.c */
static inline unsigned long erofs_inode_hash(erofs_nid_t nid)
--
2.27.0

2022-04-06 14:33:39

by Jingbo Xu

Subject: [PATCH v8 12/20] erofs: add anonymous inode managing page cache for data blob

Introduce an anonymous inode that manages the page cache for a data blob.
erofs can then read directly from the address space of the anonymous inode
on a cache hit.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/fscache.c | 39 ++++++++++++++++++++++++++++++++++++---
fs/erofs/internal.h | 6 ++++--
2 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 67a3c4935245..1c88614203d2 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -5,17 +5,22 @@
#include <linux/fscache.h>
#include "internal.h"

+static const struct address_space_operations erofs_fscache_meta_aops = {
+};
+
/*
* Create an fscache context for data blob.
* Return: 0 on success and allocated fscache context is assigned to @fscache,
* negative error number on failure.
*/
int erofs_fscache_register_cookie(struct super_block *sb,
- struct erofs_fscache **fscache, char *name)
+ struct erofs_fscache **fscache,
+ char *name, bool need_inode)
{
struct fscache_volume *volume = EROFS_SB(sb)->volume;
struct erofs_fscache *ctx;
struct fscache_cookie *cookie;
+ int ret;

ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
if (!ctx)
@@ -25,15 +30,40 @@ int erofs_fscache_register_cookie(struct super_block *sb,
name, strlen(name), NULL, 0, 0);
if (!cookie) {
erofs_err(sb, "failed to get cookie for %s", name);
- kfree(name);
- return -EINVAL;
+ ret = -EINVAL;
+ goto err;
}

fscache_use_cookie(cookie, false);
ctx->cookie = cookie;

+ if (need_inode) {
+ struct inode *const inode = new_inode(sb);
+
+ if (!inode) {
+ erofs_err(sb, "failed to get anon inode for %s", name);
+ ret = -ENOMEM;
+ goto err_cookie;
+ }
+
+ set_nlink(inode, 1);
+ inode->i_size = OFFSET_MAX;
+ inode->i_mapping->a_ops = &erofs_fscache_meta_aops;
+ mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
+
+ ctx->inode = inode;
+ }
+
*fscache = ctx;
return 0;
+
+err_cookie:
+ fscache_unuse_cookie(ctx->cookie, NULL, NULL);
+ fscache_relinquish_cookie(ctx->cookie, false);
+ ctx->cookie = NULL;
+err:
+ kfree(ctx);
+ return ret;
}

void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
@@ -47,6 +77,9 @@ void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
fscache_relinquish_cookie(ctx->cookie, false);
ctx->cookie = NULL;

+ iput(ctx->inode);
+ ctx->inode = NULL;
+
kfree(ctx);
*fscache = NULL;
}
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index c6a3351a4d7d..3a4a344cfed3 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -99,6 +99,7 @@ struct erofs_sb_lz4_info {

struct erofs_fscache {
struct fscache_cookie *cookie;
+ struct inode *inode;
};

struct erofs_sb_info {
@@ -632,7 +633,8 @@ int erofs_fscache_register_fs(struct super_block *sb);
void erofs_fscache_unregister_fs(struct super_block *sb);

int erofs_fscache_register_cookie(struct super_block *sb,
- struct erofs_fscache **fscache, char *name);
+ struct erofs_fscache **fscache,
+ char *name, bool need_inode);
void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache);
#else
static inline int erofs_fscache_register_fs(struct super_block *sb) { return 0; }
@@ -640,7 +642,7 @@ static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}

static inline int erofs_fscache_register_cookie(struct super_block *sb,
struct erofs_fscache **fscache,
- char *name)
+ char *name, bool need_inode)
{
return -EOPNOTSUPP;
}
--
2.27.0

2022-04-06 14:33:41

by Jingbo Xu

Subject: [PATCH v8 18/20] erofs: implement fscache-based data read for inline layout

Implement the data plane of reading data from data blobs over fscache
for inline layout.

For the leading non-inline part, the data plane for non-inline layout is
reused; only the tail packing part needs special handling.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/fscache.c | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 65de1c754e80..d32cb5840c6d 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -60,6 +60,40 @@ static int erofs_fscache_meta_readpage(struct file *data, struct page *page)
return ret;
}

+static int erofs_fscache_readpage_inline(struct folio *folio,
+ struct erofs_map_blocks *map)
+{
+ struct inode *inode = folio_file_mapping(folio)->host;
+ struct super_block *sb = inode->i_sb;
+ struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
+ erofs_blk_t blknr;
+ size_t offset, len;
+ void *src, *dst;
+
+ /*
+ * For inline (tail packing) layout, the offset may be non-zero, which
+ * can be calculated from corresponding physical address directly.
+ */
+ offset = erofs_blkoff(map->m_pa);
+ blknr = erofs_blknr(map->m_pa);
+ len = map->m_llen;
+
+ src = erofs_read_metabuf(&buf, sb, blknr, EROFS_KMAP);
+ if (IS_ERR(src))
+ return PTR_ERR(src);
+
+ DBG_BUGON(folio_size(folio) != PAGE_SIZE);
+
+ dst = kmap(folio_page(folio, 0));
+ memcpy(dst, src + offset, len);
+ memset(dst + len, 0, PAGE_SIZE - len);
+ kunmap(folio_page(folio, 0));
+
+ erofs_put_metabuf(&buf);
+
+ return 0;
+}
+
static int erofs_fscache_readpage(struct file *file, struct page *page)
{
struct folio *folio = page_folio(page);
@@ -85,6 +119,12 @@ static int erofs_fscache_readpage(struct file *file, struct page *page)
goto out_uptodate;
}

+ /* inline readpage */
+ if (map.m_flags & EROFS_MAP_META) {
+ ret = erofs_fscache_readpage_inline(folio, &map);
+ goto out_uptodate;
+ }
+
/* no-inline readpage */
mdev = (struct erofs_map_dev) {
.m_deviceid = map.m_deviceid,
--
2.27.0
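
The tail-packing copy in the patch above can be modelled in user space. This
is only a sketch under stated assumptions: the block size is taken to be 4096
bytes, the metadata blob is a flat in-memory buffer, and copy_inline_tail()
is a hypothetical name. The bounds check is an addition for the example and
is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed block (and page) size for this illustration. */
#define BLKSIZ 4096u

/*
 * Model of the copy done by erofs_fscache_readpage_inline().
 * @page: BLKSIZ-byte destination (stands in for the folio)
 * @meta: start of the metadata blob
 * @pa:   physical address of the inline tail (map->m_pa)
 * @llen: logical length of the inline tail (map->m_llen)
 * Returns 0, or -1 if the tail would cross the end of its block.
 */
int copy_inline_tail(uint8_t *page, const uint8_t *meta, uint64_t pa,
		     size_t llen)
{
	uint64_t blknr = pa / BLKSIZ;		/* erofs_blknr(pa) */
	size_t offset = (size_t)(pa % BLKSIZ);	/* erofs_blkoff(pa) */
	const uint8_t *src = meta + blknr * BLKSIZ;

	if (llen > BLKSIZ - offset)
		return -1;

	memcpy(page, src + offset, llen);	/* inline data */
	memset(page + llen, 0, BLKSIZ - llen);	/* zero the rest of the page */
	return 0;
}
```

For example, pa = 3 * 4096 + 100 gives block number 3 and in-block offset
100, so the first llen bytes of the page come from byte 12388 of the blob and
the remainder of the page is zeroed.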

2022-04-06 14:33:42

by Jingbo Xu

Subject: [PATCH v8 11/20] erofs: add fscache context helper functions

Introduce a context structure for managing data blobs, and helper
functions for initializing and cleaning up this context structure.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/fscache.c | 46 +++++++++++++++++++++++++++++++++++++++++++++
fs/erofs/internal.h | 19 +++++++++++++++++++
2 files changed, 65 insertions(+)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 7a6d0239ebb1..67a3c4935245 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -5,6 +5,52 @@
#include <linux/fscache.h>
#include "internal.h"

+/*
+ * Create an fscache context for data blob.
+ * Return: 0 on success and allocated fscache context is assigned to @fscache,
+ * negative error number on failure.
+ */
+int erofs_fscache_register_cookie(struct super_block *sb,
+ struct erofs_fscache **fscache, char *name)
+{
+ struct fscache_volume *volume = EROFS_SB(sb)->volume;
+ struct erofs_fscache *ctx;
+ struct fscache_cookie *cookie;
+
+ ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+ if (!ctx)
+ return -ENOMEM;
+
+ cookie = fscache_acquire_cookie(volume, FSCACHE_ADV_WANT_CACHE_SIZE,
+ name, strlen(name), NULL, 0, 0);
+ if (!cookie) {
+ erofs_err(sb, "failed to get cookie for %s", name);
+ kfree(name);
+ return -EINVAL;
+ }
+
+ fscache_use_cookie(cookie, false);
+ ctx->cookie = cookie;
+
+ *fscache = ctx;
+ return 0;
+}
+
+void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
+{
+ struct erofs_fscache *ctx = *fscache;
+
+ if (!ctx)
+ return;
+
+ fscache_unuse_cookie(ctx->cookie, NULL, NULL);
+ fscache_relinquish_cookie(ctx->cookie, false);
+ ctx->cookie = NULL;
+
+ kfree(ctx);
+ *fscache = NULL;
+}
+
int erofs_fscache_register_fs(struct super_block *sb)
{
struct erofs_sb_info *sbi = EROFS_SB(sb);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 952a2f483f94..c6a3351a4d7d 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -97,6 +97,10 @@ struct erofs_sb_lz4_info {
u16 max_pclusterblks;
};

+struct erofs_fscache {
+ struct fscache_cookie *cookie;
+};
+
struct erofs_sb_info {
struct erofs_mount_opts opt; /* options */
#ifdef CONFIG_EROFS_FS_ZIP
@@ -626,9 +630,24 @@ static inline int z_erofs_load_lzma_config(struct super_block *sb,
#ifdef CONFIG_EROFS_FS_ONDEMAND
int erofs_fscache_register_fs(struct super_block *sb);
void erofs_fscache_unregister_fs(struct super_block *sb);
+
+int erofs_fscache_register_cookie(struct super_block *sb,
+ struct erofs_fscache **fscache, char *name);
+void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache);
#else
static inline int erofs_fscache_register_fs(struct super_block *sb) { return 0; }
static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
+
+static inline int erofs_fscache_register_cookie(struct super_block *sb,
+ struct erofs_fscache **fscache,
+ char *name)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
+{
+}
#endif

#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
--
2.27.0

2022-04-06 14:33:43

by Jingbo Xu

Subject: [PATCH v8 05/20] cachefiles: implement on-demand read

Implement the data plane of on-demand read mode.

A new NETFS_READ_HOLE_ONDEMAND flag is introduced to indicate that
on-demand read should be done when a cache miss is encountered. In this
case, the read routine will send a READ request to user daemon, along
with the anonymous fd and the file range that shall be read. The user
daemon is then responsible for fetching data in the given file range,
and then writing the fetched data into the cache file with the given
anonymous fd.

After sending the READ request, the read routine will block until the
READ request is handled by user daemon. Then it will retry the read on
the same file range. If a cache miss is encountered again on the same
file range, the read routine will fail.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/cachefiles/internal.h | 9 ++++
fs/cachefiles/io.c | 11 +++++
fs/cachefiles/ondemand.c | 83 +++++++++++++++++++++++++++++++++
include/linux/netfs.h | 1 +
include/uapi/linux/cachefiles.h | 18 +++++++
5 files changed, 122 insertions(+)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 8a397d4da560..b4a834671b6b 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -281,6 +281,9 @@ extern int cachefiles_ondemand_copen(struct cachefiles_cache *cache,
extern int cachefiles_ondemand_init_object(struct cachefiles_object *object);
extern void cachefiles_ondemand_clean_object(struct cachefiles_object *object);

+extern int cachefiles_ondemand_read(struct cachefiles_object *object,
+ loff_t pos, size_t len);
+
#else
static inline ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
char __user *_buffer, size_t buflen)
@@ -296,6 +299,12 @@ static inline int cachefiles_ondemand_init_object(struct cachefiles_object *obje
static inline void cachefiles_ondemand_clean_object(struct cachefiles_object *object)
{
}
+
+static inline int cachefiles_ondemand_read(struct cachefiles_object *object,
+ loff_t pos, size_t len)
+{
+ return -EOPNOTSUPP;
+}
#endif

/*
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 50a14e8f0aac..6f2e20cd41f4 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -95,6 +95,7 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
file, file_inode(file)->i_ino, start_pos, len,
i_size_read(file_inode(file)));

+retry:
/* If the caller asked us to seek for data before doing the read, then
* we should do that now. If we find a gap, we fill it with zeros.
*/
@@ -119,6 +120,16 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
if (read_hole == NETFS_READ_HOLE_FAIL)
goto presubmission_error;

+ if (read_hole == NETFS_READ_HOLE_ONDEMAND) {
+ ret = cachefiles_ondemand_read(object, off, len);
+ if (ret)
+ goto presubmission_error;
+
+ /* fail the read if no progress achieved */
+ read_hole = NETFS_READ_HOLE_FAIL;
+ goto retry;
+ }
+
iov_iter_zero(len, iter);
skipped = len;
ret = 0;
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index defd65124052..149ae1923955 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -11,13 +11,30 @@ static int cachefiles_ondemand_fd_release(struct inode *inode,
struct file *file)
{
struct cachefiles_object *object = file->private_data;
+ struct cachefiles_cache *cache = object->volume->cache;
+ struct xarray *xa = &cache->reqs;
+ struct cachefiles_req *req;
+ unsigned long index;

+ xa_lock(xa);
/*
* Uninstall anon_fd to the cachefiles object, so that no further
* associated requests will get enqueued.
*/
object->fd = -1;

+ /*
+ * Flush all pending READ requests since their completion depends on
+ * anon_fd.
+ */
+ xa_for_each(xa, index, req) {
+ if (req->msg.opcode == CACHEFILES_OP_READ) {
+ req->error = -EIO;
+ complete(&req->done);
+ }
+ }
+ xa_unlock(xa);
+
cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd);
return 0;
}
@@ -61,11 +78,35 @@ static loff_t cachefiles_ondemand_fd_llseek(struct file *filp, loff_t pos,
return vfs_llseek(file, pos, whence);
}

+static long cachefiles_ondemand_fd_ioctl(struct file *filp, unsigned int ioctl,
+ unsigned long arg)
+{
+ struct cachefiles_object *object = filp->private_data;
+ struct cachefiles_cache *cache = object->volume->cache;
+ struct cachefiles_req *req;
+ unsigned long id;
+
+ if (ioctl != CACHEFILES_IOC_CREAD)
+ return -EINVAL;
+
+ if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
+ return -EOPNOTSUPP;
+
+ id = arg;
+ req = xa_erase(&cache->reqs, id);
+ if (!req)
+ return -EINVAL;
+
+ complete(&req->done);
+ return 0;
+}
+
static const struct file_operations cachefiles_ondemand_fd_fops = {
.owner = THIS_MODULE,
.release = cachefiles_ondemand_fd_release,
.write_iter = cachefiles_ondemand_fd_write_iter,
.llseek = cachefiles_ondemand_fd_llseek,
+ .unlocked_ioctl = cachefiles_ondemand_fd_ioctl,
};

/*
@@ -283,6 +324,13 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
goto out;
}

+ /* recheck anon_fd for READ request with lock held */
+ if (opcode == CACHEFILES_OP_READ && object->fd == -1) {
+ xas_unlock(&xas);
+ ret = -EIO;
+ goto out;
+ }
+
xas.xa_index = 0;
xas_find_marked(&xas, UINT_MAX, XA_FREE_MARK);
if (xas.xa_node == XAS_RESTART)
@@ -362,6 +410,30 @@ static int init_close_req(struct cachefiles_req *req, void *private)
return 0;
}

+struct cachefiles_read_ctx {
+ loff_t off;
+ size_t len;
+};
+
+static int init_read_req(struct cachefiles_req *req, void *private)
+{
+ struct cachefiles_object *object = req->object;
+ struct cachefiles_read *load = (void *)&req->msg.data;
+ struct cachefiles_read_ctx *read_ctx = private;
+ int fd = object->fd;
+
+ /* Stop enqueuing request when daemon closes anon_fd prematurely. */
+ if (fd == -1) {
+ pr_info_once("READ: anonymous fd closed prematurely.\n");
+ return -EIO;
+ }
+
+ load->off = read_ctx->off;
+ load->len = read_ctx->len;
+ load->fd = fd;
+ return 0;
+}
+
int cachefiles_ondemand_init_object(struct cachefiles_object *object)
{
struct fscache_cookie *cookie = object->cookie;
@@ -394,3 +466,14 @@ void cachefiles_ondemand_clean_object(struct cachefiles_object *object)
sizeof(struct cachefiles_close),
init_close_req, NULL);
}
+
+int cachefiles_ondemand_read(struct cachefiles_object *object,
+ loff_t pos, size_t len)
+{
+ struct cachefiles_read_ctx read_ctx = {pos, len};
+
+ return cachefiles_ondemand_send_req(object,
+ CACHEFILES_OP_READ,
+ sizeof(struct cachefiles_read),
+ init_read_req, &read_ctx);
+}
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index c7bf1eaf51d5..c1854e92333e 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -222,6 +222,7 @@ enum netfs_read_from_hole {
NETFS_READ_HOLE_IGNORE,
NETFS_READ_HOLE_CLEAR,
NETFS_READ_HOLE_FAIL,
+ NETFS_READ_HOLE_ONDEMAND,
};

/*
diff --git a/include/uapi/linux/cachefiles.h b/include/uapi/linux/cachefiles.h
index 73397e142ab3..9506b1697e14 100644
--- a/include/uapi/linux/cachefiles.h
+++ b/include/uapi/linux/cachefiles.h
@@ -3,6 +3,7 @@
#define _LINUX_CACHEFILES_H

#include <linux/types.h>
+#include <linux/ioctl.h>

/*
* Fscache ensures that the maximum length of cookie key is 255. The volume key
@@ -13,6 +14,7 @@
enum cachefiles_opcode {
CACHEFILES_OP_OPEN,
CACHEFILES_OP_CLOSE,
+ CACHEFILES_OP_READ,
};

/*
@@ -51,4 +53,20 @@ struct cachefiles_close {
__u32 fd;
};

+/*
+ * @off identifies the starting offset of the requested file range.
+ * @len identifies the length of the requested file range.
+ */
+struct cachefiles_read {
+ __u64 off;
+ __u64 len;
+ __u32 fd;
+};
+
+/*
+ * Reply for READ request (Completion for READ)
+ * arg for CACHEFILES_IOC_CREAD ioctl is the @id field of READ request.
+ */
+#define CACHEFILES_IOC_CREAD _IOW(0x98, 1, int)
+
#endif
--
2.27.0
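
The retry-once behaviour of the io.c hunk above can be modelled in user
space. This is a rough sketch, not the kernel code: read_hole_check(),
range_has_data(), the hole_policy enum and the two toy daemons are
hypothetical names, a regular file stands in for the cache file, and the
callback stands in for cachefiles_ondemand_read(). It assumes Linux, where
lseek() supports SEEK_DATA.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3	/* Linux value, used if the libc headers hide it */
#endif

enum hole_policy { HOLE_FAIL, HOLE_ONDEMAND };

/* Stand-in for cachefiles_ondemand_read(): 0 once the daemon has replied. */
typedef int (*ondemand_fn)(int fd, off_t off, size_t len);

/* Mirror of the SEEK_DATA probe: is there any data in [off, off + len)? */
static int range_has_data(int fd, off_t off, size_t len)
{
	off_t data = lseek(fd, off, SEEK_DATA);

	return data >= 0 && data < off + (off_t)len;
}

/* Returns 0 if the range can be read from the cache, negative errno if not. */
int read_hole_check(int fd, off_t off, size_t len, enum hole_policy policy,
		    ondemand_fn ask_daemon)
{
	for (;;) {
		int ret;

		if (range_has_data(fd, off, len))
			return 0;
		if (policy != HOLE_ONDEMAND)
			return -ENODATA;

		ret = ask_daemon(fd, off, len);
		if (ret)
			return ret;

		/* fail the read if the retry makes no progress */
		policy = HOLE_FAIL;
	}
}

/* Toy daemon that really fills the requested range. */
int daemon_fills(int fd, off_t off, size_t len)
{
	char chunk[512];

	memset(chunk, 'A', sizeof(chunk));
	while (len > 0) {
		size_t n = len < sizeof(chunk) ? len : sizeof(chunk);
		ssize_t put = pwrite(fd, chunk, n, off);

		if (put <= 0) {
			fprintf(stderr, "toy daemon: write failed\n");
			return -EIO;
		}
		off += put;
		len -= (size_t)put;
	}
	return 0;
}

/* Toy daemon that replies without writing anything. */
int daemon_idle(int fd, off_t off, size_t len)
{
	(void)fd; (void)off; (void)len;
	return 0;
}
```

With daemon_idle the second probe still finds no data, so the loop ends with
-ENODATA instead of asking the daemon again; this models the "fail the read
if no progress achieved" comment in the patch.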

2022-04-06 14:33:44

by Jingbo Xu

Subject: [PATCH v8 07/20] cachefiles: document on-demand read mode

Document new user interface introduced by on-demand read mode.

Signed-off-by: Jeffle Xu <[email protected]>
---
.../filesystems/caching/cachefiles.rst | 165 ++++++++++++++++++
1 file changed, 165 insertions(+)

diff --git a/Documentation/filesystems/caching/cachefiles.rst b/Documentation/filesystems/caching/cachefiles.rst
index 8bf396b76359..386801135027 100644
--- a/Documentation/filesystems/caching/cachefiles.rst
+++ b/Documentation/filesystems/caching/cachefiles.rst
@@ -28,6 +28,8 @@ Cache on Already Mounted Filesystem

(*) Debugging.

+ (*) On-demand Read.
+


Overview
@@ -482,3 +484,166 @@ the control file. For example::
echo $((1|4|8)) >/sys/module/cachefiles/parameters/debug

will turn on all function entry debugging.
+
+
+On-demand Read
+==============
+
+When working in original mode, cachefiles mainly serves as a local cache for
+remote networking fs, while in on-demand read mode, cachefiles can boost the
+scenario where on-demand read semantics is needed, e.g. container image
+distribution.
+
+The essential difference between these two modes is that, in original mode,
+on a cache miss, netfs itself will fetch data from remote, and then write the
+fetched data into cache file. While in on-demand read mode, a user daemon is
+responsible for fetching data and then writing to the cache file.
+
+``CONFIG_CACHEFILES_ONDEMAND`` shall be enabled to support on-demand read mode.
+
+
+Protocol Communication
+----------------------
+
+The on-demand read mode relies on a simple protocol used for communication
+between kernel and user daemon. The model is like::
+
+ kernel --[request]--> user daemon --[reply]--> kernel
+
+The cachefiles kernel module will send requests to user daemon when needed.
+User daemon needs to poll on the devnode ('/dev/cachefiles') to check if
+there's a pending request to be processed. A POLLIN event will be returned
+when there's a pending request.
+
+Then user daemon needs to read the devnode to fetch one request and process it
+accordingly. It is worth noting that each read only gets one request. When
+finished processing the request, user daemon needs to write the reply to the
+devnode.
+
+Each request is started with a message header like::
+
+ struct cachefiles_msg {
+ __u32 id;
+ __u32 opcode;
+ __u32 len;
+ __u8 data[];
+ };
+
+ * ``id`` is a unique ID identifying this request among all pending
+ requests.
+
+ * ``opcode`` identifies the type of this request.
+
+ * ``data`` identifies the payload of this request.
+
+ * ``len`` identifies the whole length of this request, including the
+ header and following type specific payload.
+
+
+Turn on On-demand Mode
+----------------------
+
+An optional parameter is added to "bind" command::
+
+ bind [ondemand]
+
+When the "bind" command is given no argument, it defaults to the original mode.
+When the "bind" command is given the "ondemand" argument, i.e. "bind ondemand",
+on-demand read mode will be enabled.
+
+
+OPEN Request
+------------
+
+When netfs opens a cache file for the first time, a request with
+CACHEFILES_OP_OPEN opcode, a.k.a. OPEN request, will be sent to user daemon.
+The payload format is like::
+
+ struct cachefiles_open {
+ __u32 volume_key_size;
+ __u32 cookie_key_size;
+ __u32 fd;
+ __u32 flags;
+ __u8 data[];
+ };
+
+ * ``data`` contains volume_key and cookie_key in sequence.
+
+ * ``volume_key_size`` identifies the size of volume key of the cache
+ file, in bytes. volume_key is of string format, with a suffix '\0'.
+
+ * ``cookie_key_size`` identifies the size of cookie key of the cache
+ file, in bytes. cookie_key is of binary format, which is netfs
+ specific.
+
+ * ``fd`` identifies the anonymous fd of the cache file, with which user
+ daemon can perform write/llseek file operations on the cache file.
+
+
+OPEN request contains (volume_key, cookie_key, anon_fd) triple for corresponding
+cache file. With this triple, user daemon could fetch and write data into the
+cache file in the background, even when kernel has not triggered the cache miss
+yet. User daemon is able to distinguish the requested cache file with the given
+(volume_key, cookie_key), and write the fetched data into cache file with the
+given anon_fd.
+
+After recording the (volume_key, cookie_key, anon_fd) triple, user daemon shall
+reply with "copen" (complete open) command::
+
+ copen <id>,<cache_size>
+
+ * ``id`` is exactly the id field of the previous OPEN request.
+
+ * When >= 0, ``cache_size`` identifies the size of the cache file;
+ when < 0, ``cache_size`` identifies the error code encountered by the
+ user daemon.
+
+
+CLOSE Request
+-------------
+When a cookie is withdrawn, a request with CACHEFILES_OP_CLOSE opcode, a.k.a.
+CLOSE request, will be sent to user daemon. It will notify user daemon to close
+the attached anon_fd. The payload format is like::
+
+ struct cachefiles_close {
+ __u32 fd;
+ };
+
+ * ``fd`` identifies the anon_fd to be closed, which is exactly the same
+ as that in OPEN request.
+
+
+READ Request
+------------
+
+When on-demand read mode is turned on and a cache miss is encountered, kernel
+will send a request with CACHEFILES_OP_READ opcode, a.k.a. READ request, to
+user daemon. It will notify user daemon to fetch data in the requested file
+range. The payload format is like::
+
+ struct cachefiles_read {
+ __u64 off;
+ __u64 len;
+ __u32 fd;
+ };
+
+ * ``off`` identifies the starting offset of the requested file range.
+
+ * ``len`` identifies the length of the requested file range.
+
+ * ``fd`` identifies the anonymous fd of the requested cache file. It is
+ guaranteed that it shall be the same as the fd field in the previous
+ OPEN request.
+
+When receiving one READ request, user daemon needs to fetch data of the
+requested file range, and then write the fetched data into cache file with the
+given anonymous fd.
+
+When finished processing the READ request, user daemon needs to reply with
+CACHEFILES_IOC_CREAD ioctl on the corresponding anon_fd::
+
+ ioctl(fd, CACHEFILES_IOC_CREAD, id);
+
+ * ``fd`` is exactly the fd field of the previous READ request.
+
+ * ``id`` is exactly the id field of the previous READ request.
--
2.27.0
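
For readers following the protocol description in the documentation hunk above, the "copen" reply can be sketched in user space roughly as follows. This is only an illustration: format_copen() is a hypothetical helper, not part of the patch or of any daemon, and the id is assumed to fit in an unsigned int.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format the "copen <id>,<cache_size>" reply described in the
 * documentation hunk. Hypothetical helper, for illustration only.
 *
 * cache_size >= 0 is the size of the cache file; cache_size < 0 is the
 * error code encountered by the user daemon.
 *
 * Returns the number of characters written (excluding the trailing NUL),
 * or -1 if the buffer is too small.
 */
static int format_copen(char *buf, size_t size, unsigned int id,
			long long cache_size)
{
	int n;

	assert(buf && size > 0);

	n = snprintf(buf, size, "copen %u,%lld", id, cache_size);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return n;
}
```

A daemon would presumably pass the resulting string to the kernel the same way it issues its other cachefiles commands; that part is not shown here, since it depends on how the daemon opened the device.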

2022-04-06 14:33:51

by Jingbo Xu

Subject: [PATCH v8 17/20] erofs: implement fscache-based data read for non-inline layout

Implement the data plane of reading data from data blobs over fscache
for non-inline layout.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/fscache.c | 52 +++++++++++++++++++++++++++++++++++++++++++++
fs/erofs/inode.c | 5 +++++
fs/erofs/internal.h | 2 ++
3 files changed, 59 insertions(+)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 158cc273f8fb..65de1c754e80 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -60,10 +60,62 @@ static int erofs_fscache_meta_readpage(struct file *data, struct page *page)
return ret;
}

+static int erofs_fscache_readpage(struct file *file, struct page *page)
+{
+ struct folio *folio = page_folio(page);
+ struct inode *inode = folio_file_mapping(folio)->host;
+ struct super_block *sb = inode->i_sb;
+ struct erofs_map_blocks map;
+ struct erofs_map_dev mdev;
+ erofs_off_t pos;
+ loff_t pstart;
+ int ret = 0;
+
+ DBG_BUGON(folio_size(folio) != EROFS_BLKSIZ);
+
+ pos = folio_pos(folio);
+ map.m_la = pos;
+
+ ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
+ if (ret)
+ goto out_unlock;
+
+ if (!(map.m_flags & EROFS_MAP_MAPPED)) {
+ folio_zero_range(folio, 0, folio_size(folio));
+ goto out_uptodate;
+ }
+
+ /* no-inline readpage */
+ mdev = (struct erofs_map_dev) {
+ .m_deviceid = map.m_deviceid,
+ .m_pa = map.m_pa,
+ };
+
+ ret = erofs_map_dev(sb, &mdev);
+ if (ret)
+ goto out_unlock;
+
+ pstart = mdev.m_pa + (pos - map.m_la);
+ ret = erofs_fscache_read_folios(mdev.m_fscache->cookie,
+ folio_file_mapping(folio), folio_pos(folio),
+ folio_size(folio), pstart);
+
+out_uptodate:
+ if (!ret)
+ folio_mark_uptodate(folio);
+out_unlock:
+ folio_unlock(folio);
+ return ret;
+}
+
static const struct address_space_operations erofs_fscache_meta_aops = {
.readpage = erofs_fscache_meta_readpage,
};

+const struct address_space_operations erofs_fscache_access_aops = {
+ .readpage = erofs_fscache_readpage,
+};
+
/*
* Get the page cache of data blob at the index offset.
* Return: up to date page on success, ERR_PTR() on failure.
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index e8b37ba5e9ad..88b51b5fb53f 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -296,7 +296,12 @@ static int erofs_fill_inode(struct inode *inode, int isdir)
err = z_erofs_fill_inode(inode);
goto out_unlock;
}
+
inode->i_mapping->a_ops = &erofs_raw_access_aops;
+#ifdef CONFIG_EROFS_FS_ONDEMAND
+ if (erofs_is_fscache_mode(inode->i_sb))
+ inode->i_mapping->a_ops = &erofs_fscache_access_aops;
+#endif

out_unlock:
erofs_put_metabuf(&buf);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index e186051f0640..336d19647c96 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -642,6 +642,8 @@ int erofs_fscache_register_cookie(struct super_block *sb,
void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache);

struct folio *erofs_fscache_get_folio(struct super_block *sb, pgoff_t index);
+
+extern const struct address_space_operations erofs_fscache_access_aops;
#else
static inline int erofs_fscache_register_fs(struct super_block *sb) { return 0; }
static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
--
2.27.0
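
The offset arithmetic in erofs_fscache_readpage() above (pstart = mdev.m_pa + (pos - map.m_la)) can be modelled in isolation as below. This is a user-space sketch with a simplified stand-in struct; map_model and blob_offset() are hypothetical names, and m_pa is assumed to be the physical address after device translation.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the mapping fields used by the calculation. */
struct map_model {
	uint64_t m_la;	/* logical start of the mapped extent */
	uint64_t m_pa;	/* physical start of the extent inside the data blob */
};

/*
 * Translate a logical file position into an offset inside the data blob:
 * the distance of pos from the start of the extent is added to the
 * physical start of that extent.
 */
static uint64_t blob_offset(const struct map_model *map, uint64_t pos)
{
	assert(pos >= map->m_la);
	return map->m_pa + (pos - map->m_la);
}
```

For example, with an extent at logical 8192 mapped to physical 1048576, position 12288 lands 4096 bytes into the extent, i.e. at blob offset 1052672.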

2022-04-06 14:34:02

by Jingbo Xu

Subject: [PATCH v8 20/20] erofs: add 'fsid' mount option

Introduce 'fsid' mount option to enable on-demand read semantics, in
which case, erofs will be mounted from data blobs. Users could specify
the name of primary data blob by this mount option.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/super.c | 48 ++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 42 insertions(+), 6 deletions(-)

diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index a5e4de60a0d8..292b4a70ce19 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -398,6 +398,7 @@ enum {
Opt_dax,
Opt_dax_enum,
Opt_device,
+ Opt_fsid,
Opt_err
};

@@ -422,6 +423,7 @@ static const struct fs_parameter_spec erofs_fs_parameters[] = {
fsparam_flag("dax", Opt_dax),
fsparam_enum("dax", Opt_dax_enum, erofs_dax_param_enums),
fsparam_string("device", Opt_device),
+ fsparam_string("fsid", Opt_fsid),
{}
};

@@ -517,6 +519,16 @@ static int erofs_fc_parse_param(struct fs_context *fc,
}
++ctx->devs->extra_devices;
break;
+ case Opt_fsid:
+#ifdef CONFIG_EROFS_FS_ONDEMAND
+ kfree(ctx->opt.fsid);
+ ctx->opt.fsid = kstrdup(param->string, GFP_KERNEL);
+ if (!ctx->opt.fsid)
+ return -ENOMEM;
+#else
+ errorfc(fc, "fsid option not supported");
+#endif
+ break;
default:
return -ENOPARAM;
}
@@ -597,9 +609,14 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
sb->s_maxbytes = MAX_LFS_FILESIZE;
sb->s_op = &erofs_sops;

- if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
- erofs_err(sb, "failed to set erofs blksize");
- return -EINVAL;
+ if (erofs_is_fscache_mode(sb)) {
+ sb->s_blocksize = EROFS_BLKSIZ;
+ sb->s_blocksize_bits = LOG_BLOCK_SIZE;
+ } else {
+ if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
+ erofs_err(sb, "failed to set erofs blksize");
+ return -EINVAL;
+ }
}

sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
@@ -608,7 +625,7 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)

sb->s_fs_info = sbi;
sbi->opt = ctx->opt;
- sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev, &sbi->dax_part_off);
+ ctx->opt.fsid = NULL;
sbi->devs = ctx->devs;
ctx->devs = NULL;

@@ -625,6 +642,8 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
err = super_setup_bdi(sb);
if (err)
return err;
+ } else {
+ sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev, &sbi->dax_part_off);
}

err = erofs_read_superblock(sb);
@@ -684,6 +703,11 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)

static int erofs_fc_get_tree(struct fs_context *fc)
{
+ struct erofs_fs_context *ctx = fc->fs_private;
+
+ if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && ctx->opt.fsid)
+ return get_tree_nodev(fc, erofs_fc_fill_super);
+
return get_tree_bdev(fc, erofs_fc_fill_super);
}

@@ -733,6 +757,7 @@ static void erofs_fc_free(struct fs_context *fc)
struct erofs_fs_context *ctx = fc->fs_private;

erofs_free_dev_context(ctx->devs);
+ kfree(ctx->opt.fsid);
kfree(ctx);
}

@@ -773,7 +798,10 @@ static void erofs_kill_sb(struct super_block *sb)

WARN_ON(sb->s_magic != EROFS_SUPER_MAGIC);

- kill_block_super(sb);
+ if (erofs_is_fscache_mode(sb))
+ generic_shutdown_super(sb);
+ else
+ kill_block_super(sb);

sbi = EROFS_SB(sb);
if (!sbi)
@@ -783,6 +811,7 @@ static void erofs_kill_sb(struct super_block *sb)
fs_put_dax(sbi->dax_dev);
erofs_fscache_unregister_cookie(&sbi->s_fscache);
erofs_fscache_unregister_fs(sb);
+ kfree(sbi->opt.fsid);
kfree(sbi);
sb->s_fs_info = NULL;
}
@@ -884,7 +913,10 @@ static int erofs_statfs(struct dentry *dentry, struct kstatfs *buf)
{
struct super_block *sb = dentry->d_sb;
struct erofs_sb_info *sbi = EROFS_SB(sb);
- u64 id = huge_encode_dev(sb->s_bdev->bd_dev);
+ u64 id = 0;
+
+ if (!erofs_is_fscache_mode(sb))
+ id = huge_encode_dev(sb->s_bdev->bd_dev);

buf->f_type = sb->s_magic;
buf->f_bsize = EROFS_BLKSIZ;
@@ -929,6 +961,10 @@ static int erofs_show_options(struct seq_file *seq, struct dentry *root)
seq_puts(seq, ",dax=always");
if (test_opt(opt, DAX_NEVER))
seq_puts(seq, ",dax=never");
+#ifdef CONFIG_EROFS_FS_ONDEMAND
+ if (opt->fsid)
+ seq_printf(seq, ",fsid=%s", opt->fsid);
+#endif
return 0;
}

--
2.27.0
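
As a rough illustration of how the new option might be exercised from user space, the sketch below builds the "fsid=<name>" option string and passes it to mount(2). Both helper names are hypothetical, and the "none" source string is an assumption: with get_tree_nodev() no block device is involved, so the source is not expected to matter.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/mount.h>

/*
 * Build the "fsid=<name>" mount option string.
 * Returns the string length, or -1 if the buffer is too small.
 */
static int build_fsid_opt(char *buf, size_t size, const char *fsid)
{
	int n;

	assert(buf && fsid);

	n = snprintf(buf, size, "fsid=%s", fsid);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return n;
}

/*
 * Mount erofs in fscache mode at @target, using @fsid as the name of the
 * primary data blob. Needs CAP_SYS_ADMIN and a kernel built with
 * CONFIG_EROFS_FS_ONDEMAND; returns the mount(2) result, or -1 if the
 * option string does not fit.
 */
int mount_erofs_ondemand(const char *target, const char *fsid)
{
	char opts[256];

	if (build_fsid_opt(opts, sizeof(opts), fsid) < 0)
		return -1;
	return mount("none", target, "erofs", MS_RDONLY, opts);
}
```

A user daemon serving the data blob has to be running for such a mount to succeed; that side is outside the scope of this sketch.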

2022-04-06 14:34:04

by Jingbo Xu

Subject: [PATCH v8 19/20] erofs: implement fscache-based data readahead

Implement fscache-based data readahead. Also register an individual
bdi for each erofs instance to enable readahead.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/fscache.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++
fs/erofs/super.c | 4 ++
2 files changed, 98 insertions(+)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index d32cb5840c6d..620d44210809 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -148,12 +148,106 @@ static int erofs_fscache_readpage(struct file *file, struct page *page)
return ret;
}

+static inline void erofs_fscache_unlock_folios(struct readahead_control *rac,
+ size_t len)
+{
+ while (len) {
+ struct folio *folio = readahead_folio(rac);
+
+ len -= folio_size(folio);
+ folio_mark_uptodate(folio);
+ folio_unlock(folio);
+ }
+}
+
+static void erofs_fscache_readahead(struct readahead_control *rac)
+{
+ struct inode *inode = rac->mapping->host;
+ struct super_block *sb = inode->i_sb;
+ size_t len, count, done = 0;
+ erofs_off_t pos;
+ loff_t start, offset;
+ int ret;
+
+ if (!readahead_count(rac))
+ return;
+
+ start = readahead_pos(rac);
+ len = readahead_length(rac);
+
+ do {
+ struct erofs_map_blocks map;
+ struct erofs_map_dev mdev;
+
+ pos = start + done;
+ map.m_la = pos;
+
+ ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
+ if (ret)
+ return;
+
+ /*
+ * 1) For CHUNK_BASED layout, the output m_la is rounded down to
+ * the nearest chunk boundary, and the output m_llen actually
+ * starts from the start of the containing chunk.
+ * 2) For other cases, the output m_la is equal to the input m_la.
+ */
+ offset = start + done;
+ count = min_t(size_t, map.m_llen - (pos - map.m_la), len - done);
+
+ /* Read-ahead Hole */
+ if (!(map.m_flags & EROFS_MAP_MAPPED)) {
+ struct iov_iter iter;
+
+ iov_iter_xarray(&iter, READ, &rac->mapping->i_pages,
+ offset, count);
+ iov_iter_zero(count, &iter);
+
+ erofs_fscache_unlock_folios(rac, count);
+ ret = count;
+ continue;
+ }
+
+ /* Read-ahead Inline */
+ if (map.m_flags & EROFS_MAP_META) {
+ struct folio *folio = readahead_folio(rac);
+
+ ret = erofs_fscache_readpage_inline(folio, &map);
+ if (!ret) {
+ folio_mark_uptodate(folio);
+ ret = folio_size(folio);
+ }
+
+ folio_unlock(folio);
+ continue;
+ }
+
+ /* Read-ahead No-inline */
+ mdev = (struct erofs_map_dev) {
+ .m_deviceid = map.m_deviceid,
+ .m_pa = map.m_pa,
+ };
+ ret = erofs_map_dev(sb, &mdev);
+ if (ret)
+ return;
+
+ ret = erofs_fscache_read_folios(mdev.m_fscache->cookie,
+ rac->mapping, offset, count,
+ mdev.m_pa + (pos - map.m_la));
+ if (!ret) {
+ erofs_fscache_unlock_folios(rac, count);
+ ret = count;
+ }
+ } while (ret > 0 && ((done += ret) < len));
+}
+
static const struct address_space_operations erofs_fscache_meta_aops = {
.readpage = erofs_fscache_meta_readpage,
};

const struct address_space_operations erofs_fscache_access_aops = {
.readpage = erofs_fscache_readpage,
+ .readahead = erofs_fscache_readahead,
};

/*
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 8c7181cd37e6..a5e4de60a0d8 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -621,6 +621,10 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
sbi->opt.fsid, true);
if (err)
return err;
+
+ err = super_setup_bdi(sb);
+ if (err)
+ return err;
}

err = erofs_read_superblock(sb);
--
2.27.0
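
The per-iteration length computed in erofs_fscache_readahead() above, min(map.m_llen - (pos - map.m_la), len - done), can be checked in isolation with a small user-space model. span_in_extent() is a hypothetical name; the sketch assumes pos falls inside the extent returned by the mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of bytes that one loop iteration may handle: limited both by
 * what is left of the current extent (counted from pos, not from the
 * extent start, since m_la may be rounded down to a chunk boundary) and
 * by what is left of the readahead window.
 */
static size_t span_in_extent(uint64_t m_la, uint64_t m_llen,
			     uint64_t pos, size_t remaining)
{
	uint64_t left;

	assert(pos >= m_la && pos - m_la < m_llen);

	left = m_llen - (pos - m_la);
	return left < remaining ? (size_t)left : remaining;
}
```

With a 1 MiB chunk starting at logical 0, a 16 KiB readahead window starting 4 KiB before the chunk end is clipped to 4 KiB, and the next iteration would map the following chunk.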

2022-04-06 14:34:27

by Jingbo Xu

Subject: [PATCH v8 09/20] erofs: add mode checking helper

Until now, erofs has been exclusively a blockdev-based filesystem.

A new fscache-based mode is going to be introduced for erofs to support
scenarios where on-demand read semantics is needed, e.g. container
image distribution. In this case, erofs could be mounted from data blobs
through fscache.

Add a helper checking which mode erofs works in.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/internal.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index fe9564e5091e..05a97533b1e9 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -161,6 +161,11 @@ struct erofs_sb_info {
#define set_opt(opt, option) ((opt)->mount_opt |= EROFS_MOUNT_##option)
#define test_opt(opt, option) ((opt)->mount_opt & EROFS_MOUNT_##option)

+static inline bool erofs_is_fscache_mode(struct super_block *sb)
+{
+ return IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && !sb->s_bdev;
+}
+
enum {
EROFS_ZIP_CACHE_DISABLED,
EROFS_ZIP_CACHE_READAHEAD,
--
2.27.0

2022-04-06 14:34:31

by Jingbo Xu

Subject: [PATCH v8 10/20] erofs: register fscache volume

A new fscache based mode is going to be introduced for erofs, in which
case on-demand read semantics is implemented through fscache.

As the first step, register fscache volume for each erofs filesystem.
That means, data blobs can not be shared among erofs filesystems. In the
following iteration, we are going to introduce the domain semantics, in
which case several erofs filesystems can belong to one domain, and data
blobs can be shared among these erofs filesystems of one domain.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/Kconfig | 10 ++++++++++
fs/erofs/Makefile | 1 +
fs/erofs/fscache.c | 37 +++++++++++++++++++++++++++++++++++++
fs/erofs/internal.h | 13 +++++++++++++
fs/erofs/super.c | 7 +++++++
5 files changed, 68 insertions(+)
create mode 100644 fs/erofs/fscache.c

diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig
index f57255ab88ed..3d05265e3e8e 100644
--- a/fs/erofs/Kconfig
+++ b/fs/erofs/Kconfig
@@ -98,3 +98,13 @@ config EROFS_FS_ZIP_LZMA
systems will be readable without selecting this option.

If unsure, say N.
+
+config EROFS_FS_ONDEMAND
+ bool "EROFS fscache-based ondemand-read"
+ depends on CACHEFILES_ONDEMAND && (EROFS_FS=m && FSCACHE || EROFS_FS=y && FSCACHE=y)
+ default n
+ help
+ EROFS is mounted from data blobs and on-demand read semantics is
+ implemented through fscache.
+
+ If unsure, say N.
diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
index 8a3317e38e5a..99bbc597a3e9 100644
--- a/fs/erofs/Makefile
+++ b/fs/erofs/Makefile
@@ -5,3 +5,4 @@ erofs-objs := super.o inode.o data.o namei.o dir.o utils.o pcpubuf.o sysfs.o
erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o
erofs-$(CONFIG_EROFS_FS_ZIP) += decompressor.o zmap.o zdata.o
erofs-$(CONFIG_EROFS_FS_ZIP_LZMA) += decompressor_lzma.o
+erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
new file mode 100644
index 000000000000..7a6d0239ebb1
--- /dev/null
+++ b/fs/erofs/fscache.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2022, Alibaba Cloud
+ */
+#include <linux/fscache.h>
+#include "internal.h"
+
+int erofs_fscache_register_fs(struct super_block *sb)
+{
+ struct erofs_sb_info *sbi = EROFS_SB(sb);
+ struct fscache_volume *volume;
+ char *name;
+ int ret = 0;
+
+ name = kasprintf(GFP_KERNEL, "erofs,%s", sbi->opt.fsid);
+ if (!name)
+ return -ENOMEM;
+
+ volume = fscache_acquire_volume(name, NULL, NULL, 0);
+ if (IS_ERR_OR_NULL(volume)) {
+ erofs_err(sb, "failed to register volume for %s", name);
+ ret = volume ? PTR_ERR(volume) : -EOPNOTSUPP;
+ volume = NULL;
+ }
+
+ sbi->volume = volume;
+ kfree(name);
+ return ret;
+}
+
+void erofs_fscache_unregister_fs(struct super_block *sb)
+{
+ struct erofs_sb_info *sbi = EROFS_SB(sb);
+
+ fscache_relinquish_volume(sbi->volume, NULL, false);
+ sbi->volume = NULL;
+}
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 05a97533b1e9..952a2f483f94 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -74,6 +74,7 @@ struct erofs_mount_opts {
unsigned int max_sync_decompress_pages;
#endif
unsigned int mount_opt;
+ char *fsid;
};

struct erofs_dev_context {
@@ -146,6 +147,9 @@ struct erofs_sb_info {
/* sysfs support */
struct kobject s_kobj; /* /sys/fs/erofs/<devname> */
struct completion s_kobj_unregister;
+
+ /* fscache support */
+ struct fscache_volume *volume;
};

#define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info)
@@ -618,6 +622,15 @@ static inline int z_erofs_load_lzma_config(struct super_block *sb,
}
#endif /* !CONFIG_EROFS_FS_ZIP */

+/* fscache.c */
+#ifdef CONFIG_EROFS_FS_ONDEMAND
+int erofs_fscache_register_fs(struct super_block *sb);
+void erofs_fscache_unregister_fs(struct super_block *sb);
+#else
+static inline int erofs_fscache_register_fs(struct super_block *sb) { return 0; }
+static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
+#endif
+
#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */

#endif /* __EROFS_INTERNAL_H */
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 0c4b41130c2f..6590ed1b7d3b 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -601,6 +601,12 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
sbi->devs = ctx->devs;
ctx->devs = NULL;

+ if (erofs_is_fscache_mode(sb)) {
+ err = erofs_fscache_register_fs(sb);
+ if (err)
+ return err;
+ }
+
err = erofs_read_superblock(sb);
if (err)
return err;
@@ -757,6 +763,7 @@ static void erofs_kill_sb(struct super_block *sb)

erofs_free_dev_context(sbi->devs);
fs_put_dax(sbi->dax_dev);
+ erofs_fscache_unregister_fs(sb);
kfree(sbi);
sb->s_fs_info = NULL;
}
--
2.27.0
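
The error handling in erofs_fscache_register_fs() above maps a NULL volume to -EOPNOTSUPP and an error pointer to its encoded errno. The user-space model below mimics that mapping; err_ptr(), is_err() and volume_result_to_errno() are stand-ins written for this sketch, modelled on the kernel's ERR_PTR()/IS_ERR() convention, not the kernel helpers themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in a pointer value (model of ERR_PTR()). */
static void *err_ptr(long err)
{
	assert(err < 0 && err >= -MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* True if the pointer lies in the top MAX_ERRNO addresses (model of IS_ERR()). */
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Mirror of the mapping used in the patch:
 *   NULL          -> -EOPNOTSUPP
 *   error pointer -> the encoded errno
 *   valid pointer -> 0
 */
static int volume_result_to_errno(const void *volume)
{
	if (!volume)
		return -EOPNOTSUPP;
	if (is_err(volume))
		return (int)(intptr_t)volume;
	return 0;
}
```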

2022-04-06 15:04:00

by Jingbo Xu

Subject: [PATCH v8 14/20] erofs: register fscache context for primary data blob

Register the fscache context for the primary data blob. Also move the
initialization of s_op and related fields forward, since anonymous
inode will be allocated under the super block when registering the
fscache context.

Something worth mentioning about the cleanup routine.

1. The fscache context will instantiate anonymous inodes under the super
block. Release these anonymous inodes when .put_super() is called, or
we'll get "VFS: Busy inodes after unmount." warning.

2. The fscache context is initialized prior to the root inode. If the
mount fails before the root inode has been initialized, .kill_sb() is
called but .put_super() is not. Thus .kill_sb() shall also contain the
cleanup routine.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/internal.h | 1 +
fs/erofs/super.c | 15 +++++++++++----
2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 3a4a344cfed3..eb37b33bce37 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -155,6 +155,7 @@ struct erofs_sb_info {

/* fscache support */
struct fscache_volume *volume;
+ struct erofs_fscache *s_fscache;
};

#define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 6590ed1b7d3b..9498b899b73b 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -585,6 +585,9 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
int err;

sb->s_magic = EROFS_SUPER_MAGIC;
+ sb->s_flags |= SB_RDONLY | SB_NOATIME;
+ sb->s_maxbytes = MAX_LFS_FILESIZE;
+ sb->s_op = &erofs_sops;

if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
erofs_err(sb, "failed to set erofs blksize");
@@ -605,6 +608,11 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
err = erofs_fscache_register_fs(sb);
if (err)
return err;
+
+ err = erofs_fscache_register_cookie(sb, &sbi->s_fscache,
+ sbi->opt.fsid, true);
+ if (err)
+ return err;
}

err = erofs_read_superblock(sb);
@@ -619,11 +627,8 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
clear_opt(&sbi->opt, DAX_ALWAYS);
}
}
- sb->s_flags |= SB_RDONLY | SB_NOATIME;
- sb->s_maxbytes = MAX_LFS_FILESIZE;
- sb->s_time_gran = 1;

- sb->s_op = &erofs_sops;
+ sb->s_time_gran = 1;
sb->s_xattr = erofs_xattr_handlers;

if (test_opt(&sbi->opt, POSIX_ACL))
@@ -763,6 +768,7 @@ static void erofs_kill_sb(struct super_block *sb)

erofs_free_dev_context(sbi->devs);
fs_put_dax(sbi->dax_dev);
+ erofs_fscache_unregister_cookie(&sbi->s_fscache);
erofs_fscache_unregister_fs(sb);
kfree(sbi);
sb->s_fs_info = NULL;
@@ -781,6 +787,7 @@ static void erofs_put_super(struct super_block *sb)
iput(sbi->managed_cache);
sbi->managed_cache = NULL;
#endif
+ erofs_fscache_unregister_cookie(&sbi->s_fscache);
}

static struct file_system_type erofs_fs_type = {
--
2.27.0
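
Since the patch above calls erofs_fscache_unregister_cookie() from both .put_super() and .kill_sb(), the release has to tolerate being called twice on the same slot. The sketch below models that pattern in user space. It assumes the real helper returns early when the slot is already NULL, which the hunks shown in this thread do not confirm; ctx_model and unregister_ctx() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for struct erofs_fscache. */
struct ctx_model {
	int unused;
};

/* Counts how many times a context was actually released. */
static int release_count;

/*
 * Release the context stored in *slot and clear the slot, so that a
 * second call on the same slot becomes a no-op.
 */
static void unregister_ctx(struct ctx_model **slot)
{
	struct ctx_model *ctx;

	assert(slot);

	ctx = *slot;
	if (!ctx)
		return;

	free(ctx);
	*slot = NULL;
	release_count++;
}
```

Taking a pointer to the slot rather than the context itself is what lets the first caller invalidate the reference for the second one.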

2022-04-07 09:28:16

by Gao Xiang

Subject: Re: [PATCH v8 08/20] erofs: make erofs_map_blocks() generally available

On Wed, Apr 06, 2022 at 03:56:00PM +0800, Jeffle Xu wrote:
> ... so that it can be used in the following introduced fscache mode.
>
> Signed-off-by: Jeffle Xu <[email protected]>

Reviewed-by: Gao Xiang <[email protected]>

Thanks,
Gao Xiang

> ---
> fs/erofs/data.c | 4 ++--
> fs/erofs/internal.h | 2 ++
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> index 780db1e5f4b7..bc22642358ec 100644
> --- a/fs/erofs/data.c
> +++ b/fs/erofs/data.c
> @@ -110,8 +110,8 @@ static int erofs_map_blocks_flatmode(struct inode *inode,
> return 0;
> }
>
> -static int erofs_map_blocks(struct inode *inode,
> - struct erofs_map_blocks *map, int flags)
> +int erofs_map_blocks(struct inode *inode,
> + struct erofs_map_blocks *map, int flags)
> {
> struct super_block *sb = inode->i_sb;
> struct erofs_inode *vi = EROFS_I(inode);
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 5298c4ee277d..fe9564e5091e 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -486,6 +486,8 @@ void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
> int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *dev);
> int erofs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> u64 start, u64 len);
> +int erofs_map_blocks(struct inode *inode,
> + struct erofs_map_blocks *map, int flags);
>
> /* inode.c */
> static inline unsigned long erofs_inode_hash(erofs_nid_t nid)
> --
> 2.27.0

2022-04-07 13:57:08

by Gao Xiang

Subject: Re: [PATCH v8 10/20] erofs: register fscache volume

On Wed, Apr 06, 2022 at 03:56:02PM +0800, Jeffle Xu wrote:
> A new fscache based mode is going to be introduced for erofs, in which
> case on-demand read semantics is implemented through fscache.
>
> As the first step, register fscache volume for each erofs filesystem.
> That means, data blobs can not be shared among erofs filesystems. In the
> following iteration, we are going to introduce the domain semantics, in
> which case several erofs filesystems can belong to one domain, and data
> blobs can be shared among these erofs filesystems of one domain.
>
> Signed-off-by: Jeffle Xu <[email protected]>

Reviewed-by: Gao Xiang <[email protected]>

Thanks,
Gao Xiang

> ---
> fs/erofs/Kconfig | 10 ++++++++++
> fs/erofs/Makefile | 1 +
> fs/erofs/fscache.c | 37 +++++++++++++++++++++++++++++++++++++
> fs/erofs/internal.h | 13 +++++++++++++
> fs/erofs/super.c | 7 +++++++
> 5 files changed, 68 insertions(+)
> create mode 100644 fs/erofs/fscache.c
>
> diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig
> index f57255ab88ed..3d05265e3e8e 100644
> --- a/fs/erofs/Kconfig
> +++ b/fs/erofs/Kconfig
> @@ -98,3 +98,13 @@ config EROFS_FS_ZIP_LZMA
> systems will be readable without selecting this option.
>
> If unsure, say N.
> +
> +config EROFS_FS_ONDEMAND
> + bool "EROFS fscache-based ondemand-read"
> + depends on CACHEFILES_ONDEMAND && (EROFS_FS=m && FSCACHE || EROFS_FS=y && FSCACHE=y)
> + default n
> + help
> + EROFS is mounted from data blobs and on-demand read semantics is
> + implemented through fscache.
> +
> + If unsure, say N.
> diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
> index 8a3317e38e5a..99bbc597a3e9 100644
> --- a/fs/erofs/Makefile
> +++ b/fs/erofs/Makefile
> @@ -5,3 +5,4 @@ erofs-objs := super.o inode.o data.o namei.o dir.o utils.o pcpubuf.o sysfs.o
> erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o
> erofs-$(CONFIG_EROFS_FS_ZIP) += decompressor.o zmap.o zdata.o
> erofs-$(CONFIG_EROFS_FS_ZIP_LZMA) += decompressor_lzma.o
> +erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o
> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
> new file mode 100644
> index 000000000000..7a6d0239ebb1
> --- /dev/null
> +++ b/fs/erofs/fscache.c
> @@ -0,0 +1,37 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) 2022, Alibaba Cloud
> + */
> +#include <linux/fscache.h>
> +#include "internal.h"
> +
> +int erofs_fscache_register_fs(struct super_block *sb)
> +{
> + struct erofs_sb_info *sbi = EROFS_SB(sb);
> + struct fscache_volume *volume;
> + char *name;
> + int ret = 0;
> +
> + name = kasprintf(GFP_KERNEL, "erofs,%s", sbi->opt.fsid);
> + if (!name)
> + return -ENOMEM;
> +
> + volume = fscache_acquire_volume(name, NULL, NULL, 0);
> + if (IS_ERR_OR_NULL(volume)) {
> + erofs_err(sb, "failed to register volume for %s", name);
> + ret = volume ? PTR_ERR(volume) : -EOPNOTSUPP;
> + volume = NULL;
> + }
> +
> + sbi->volume = volume;
> + kfree(name);
> + return ret;
> +}
> +
> +void erofs_fscache_unregister_fs(struct super_block *sb)
> +{
> + struct erofs_sb_info *sbi = EROFS_SB(sb);
> +
> + fscache_relinquish_volume(sbi->volume, NULL, false);
> + sbi->volume = NULL;
> +}
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 05a97533b1e9..952a2f483f94 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -74,6 +74,7 @@ struct erofs_mount_opts {
> unsigned int max_sync_decompress_pages;
> #endif
> unsigned int mount_opt;
> + char *fsid;
> };
>
> struct erofs_dev_context {
> @@ -146,6 +147,9 @@ struct erofs_sb_info {
> /* sysfs support */
> struct kobject s_kobj; /* /sys/fs/erofs/<devname> */
> struct completion s_kobj_unregister;
> +
> + /* fscache support */
> + struct fscache_volume *volume;
> };
>
> #define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info)
> @@ -618,6 +622,15 @@ static inline int z_erofs_load_lzma_config(struct super_block *sb,
> }
> #endif /* !CONFIG_EROFS_FS_ZIP */
>
> +/* fscache.c */
> +#ifdef CONFIG_EROFS_FS_ONDEMAND
> +int erofs_fscache_register_fs(struct super_block *sb);
> +void erofs_fscache_unregister_fs(struct super_block *sb);
> +#else
> +static inline int erofs_fscache_register_fs(struct super_block *sb) { return 0; }
> +static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
> +#endif
> +
> #define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
>
> #endif /* __EROFS_INTERNAL_H */
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index 0c4b41130c2f..6590ed1b7d3b 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -601,6 +601,12 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
> sbi->devs = ctx->devs;
> ctx->devs = NULL;
>
> + if (erofs_is_fscache_mode(sb)) {
> + err = erofs_fscache_register_fs(sb);
> + if (err)
> + return err;
> + }
> +
> err = erofs_read_superblock(sb);
> if (err)
> return err;
> @@ -757,6 +763,7 @@ static void erofs_kill_sb(struct super_block *sb)
>
> erofs_free_dev_context(sbi->devs);
> fs_put_dax(sbi->dax_dev);
> + erofs_fscache_unregister_fs(sb);
> kfree(sbi);
> sb->s_fs_info = NULL;
> }
> --
> 2.27.0

2022-04-07 15:40:37

by Gao Xiang

Subject: Re: [PATCH v8 12/20] erofs: add anonymous inode managing page cache for data blob

On Wed, Apr 06, 2022 at 03:56:04PM +0800, Jeffle Xu wrote:
> Introduce one anonymous inode managing page cache for data blob. Then
> erofs could read directly from the address space of the anonymous inode
> when cache hit.

Introduce one anonymous inode for data blobs so that erofs
can cache metadata directly within such anonymous inode.

>
> Signed-off-by: Jeffle Xu <[email protected]>

Yeah, I think currently we can live with that:

Reviewed-by: Gao Xiang <[email protected]>

Thanks,
Gao Xiang


> ---
> fs/erofs/fscache.c | 39 ++++++++++++++++++++++++++++++++++++---
> fs/erofs/internal.h | 6 ++++--
> 2 files changed, 40 insertions(+), 5 deletions(-)
>
> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
> index 67a3c4935245..1c88614203d2 100644
> --- a/fs/erofs/fscache.c
> +++ b/fs/erofs/fscache.c
> @@ -5,17 +5,22 @@
> #include <linux/fscache.h>
> #include "internal.h"
>
> +static const struct address_space_operations erofs_fscache_meta_aops = {
> +};
> +
> /*
> * Create an fscache context for data blob.
> * Return: 0 on success and allocated fscache context is assigned to @fscache,
> * negative error number on failure.
> */
> int erofs_fscache_register_cookie(struct super_block *sb,
> - struct erofs_fscache **fscache, char *name)
> + struct erofs_fscache **fscache,
> + char *name, bool need_inode)
> {
> struct fscache_volume *volume = EROFS_SB(sb)->volume;
> struct erofs_fscache *ctx;
> struct fscache_cookie *cookie;
> + int ret;
>
> ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> if (!ctx)
> @@ -25,15 +30,40 @@ int erofs_fscache_register_cookie(struct super_block *sb,
> name, strlen(name), NULL, 0, 0);
> if (!cookie) {
> erofs_err(sb, "failed to get cookie for %s", name);
> - kfree(name);
> - return -EINVAL;
> + ret = -EINVAL;
> + goto err;
> }
>
> fscache_use_cookie(cookie, false);
> ctx->cookie = cookie;
>
> + if (need_inode) {
> + struct inode *const inode = new_inode(sb);
> +
> + if (!inode) {
> + erofs_err(sb, "failed to get anon inode for %s", name);
> + ret = -ENOMEM;
> + goto err_cookie;
> + }
> +
> + set_nlink(inode, 1);
> + inode->i_size = OFFSET_MAX;
> + inode->i_mapping->a_ops = &erofs_fscache_meta_aops;
> + mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
> +
> + ctx->inode = inode;
> + }
> +
> *fscache = ctx;
> return 0;
> +
> +err_cookie:
> + fscache_unuse_cookie(ctx->cookie, NULL, NULL);
> + fscache_relinquish_cookie(ctx->cookie, false);
> + ctx->cookie = NULL;
> +err:
> + kfree(ctx);
> + return ret;
> }
>
> void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
> @@ -47,6 +77,9 @@ void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
> fscache_relinquish_cookie(ctx->cookie, false);
> ctx->cookie = NULL;
>
> + iput(ctx->inode);
> + ctx->inode = NULL;
> +
> kfree(ctx);
> *fscache = NULL;
> }
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index c6a3351a4d7d..3a4a344cfed3 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -99,6 +99,7 @@ struct erofs_sb_lz4_info {
>
> struct erofs_fscache {
> struct fscache_cookie *cookie;
> + struct inode *inode;
> };
>
> struct erofs_sb_info {
> @@ -632,7 +633,8 @@ int erofs_fscache_register_fs(struct super_block *sb);
> void erofs_fscache_unregister_fs(struct super_block *sb);
>
> int erofs_fscache_register_cookie(struct super_block *sb,
> - struct erofs_fscache **fscache, char *name);
> + struct erofs_fscache **fscache,
> + char *name, bool need_inode);
> void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache);
> #else
> static inline int erofs_fscache_register_fs(struct super_block *sb) { return 0; }
> @@ -640,7 +642,7 @@ static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
>
> static inline int erofs_fscache_register_cookie(struct super_block *sb,
> struct erofs_fscache **fscache,
> - char *name)
> + char *name, bool need_inode)
> {
> return -EOPNOTSUPP;
> }
> --
> 2.27.0

2022-04-07 16:45:27

by Gao Xiang

Subject: Re: [PATCH v8 20/20] erofs: add 'fsid' mount option

On Wed, Apr 06, 2022 at 03:56:12PM +0800, Jeffle Xu wrote:
> Introduce 'fsid' mount option to enable on-demand read sementics, in
> which case, erofs will be mounted from data blobs. Users could specify
> the name of primary data blob by this mount option.
>
> Signed-off-by: Jeffle Xu <[email protected]>
> ---
> fs/erofs/super.c | 48 ++++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 42 insertions(+), 6 deletions(-)
>
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index a5e4de60a0d8..292b4a70ce19 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -398,6 +398,7 @@ enum {
> Opt_dax,
> Opt_dax_enum,
> Opt_device,
> + Opt_fsid,
> Opt_err
> };
>
> @@ -422,6 +423,7 @@ static const struct fs_parameter_spec erofs_fs_parameters[] = {
> fsparam_flag("dax", Opt_dax),
> fsparam_enum("dax", Opt_dax_enum, erofs_dax_param_enums),
> fsparam_string("device", Opt_device),
> + fsparam_string("fsid", Opt_fsid),
> {}
> };
>
> @@ -517,6 +519,16 @@ static int erofs_fc_parse_param(struct fs_context *fc,
> }
> ++ctx->devs->extra_devices;
> break;
> + case Opt_fsid:
> +#ifdef CONFIG_EROFS_FS_ONDEMAND
> + kfree(ctx->opt.fsid);
> + ctx->opt.fsid = kstrdup(param->string, GFP_KERNEL);
> + if (!ctx->opt.fsid)
> + return -ENOMEM;
> +#else
> + errorfc(fc, "fsid option not supported");
> +#endif
> + break;
> default:
> return -ENOPARAM;
> }
> @@ -597,9 +609,14 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
> sb->s_maxbytes = MAX_LFS_FILESIZE;
> sb->s_op = &erofs_sops;
>
> - if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
> - erofs_err(sb, "failed to set erofs blksize");
> - return -EINVAL;
> + if (erofs_is_fscache_mode(sb)) {
> + sb->s_blocksize = EROFS_BLKSIZ;
> + sb->s_blocksize_bits = LOG_BLOCK_SIZE;
> + } else {
> + if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
> + erofs_err(sb, "failed to set erofs blksize");
> + return -EINVAL;
> + }
> }
>
> sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
> @@ -608,7 +625,7 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
>
> sb->s_fs_info = sbi;
> sbi->opt = ctx->opt;
> - sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev, &sbi->dax_part_off);
> + ctx->opt.fsid = NULL;
> sbi->devs = ctx->devs;
> ctx->devs = NULL;
>
> @@ -625,6 +642,8 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
> err = super_setup_bdi(sb);
> if (err)
> return err;
> + } else {
> + sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev, &sbi->dax_part_off);

Shouldn't this go with the previous patch? Also, this line is over-long.

Thanks,
Gao Xiang

> }
>
> err = erofs_read_superblock(sb);
> @@ -684,6 +703,11 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
>
> static int erofs_fc_get_tree(struct fs_context *fc)
> {
> + struct erofs_fs_context *ctx = fc->fs_private;
> +
> + if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && ctx->opt.fsid)
> + return get_tree_nodev(fc, erofs_fc_fill_super);
> +
> return get_tree_bdev(fc, erofs_fc_fill_super);
> }
>
> @@ -733,6 +757,7 @@ static void erofs_fc_free(struct fs_context *fc)
> struct erofs_fs_context *ctx = fc->fs_private;
>
> erofs_free_dev_context(ctx->devs);
> + kfree(ctx->opt.fsid);
> kfree(ctx);
> }
>
> @@ -773,7 +798,10 @@ static void erofs_kill_sb(struct super_block *sb)
>
> WARN_ON(sb->s_magic != EROFS_SUPER_MAGIC);
>
> - kill_block_super(sb);
> + if (erofs_is_fscache_mode(sb))
> + generic_shutdown_super(sb);
> + else
> + kill_block_super(sb);
>
> sbi = EROFS_SB(sb);
> if (!sbi)
> @@ -783,6 +811,7 @@ static void erofs_kill_sb(struct super_block *sb)
> fs_put_dax(sbi->dax_dev);
> erofs_fscache_unregister_cookie(&sbi->s_fscache);
> erofs_fscache_unregister_fs(sb);
> + kfree(sbi->opt.fsid);
> kfree(sbi);
> sb->s_fs_info = NULL;
> }
> @@ -884,7 +913,10 @@ static int erofs_statfs(struct dentry *dentry, struct kstatfs *buf)
> {
> struct super_block *sb = dentry->d_sb;
> struct erofs_sb_info *sbi = EROFS_SB(sb);
> - u64 id = huge_encode_dev(sb->s_bdev->bd_dev);
> + u64 id = 0;
> +
> + if (!erofs_is_fscache_mode(sb))
> + id = huge_encode_dev(sb->s_bdev->bd_dev);
>
> buf->f_type = sb->s_magic;
> buf->f_bsize = EROFS_BLKSIZ;
> @@ -929,6 +961,10 @@ static int erofs_show_options(struct seq_file *seq, struct dentry *root)
> seq_puts(seq, ",dax=always");
> if (test_opt(opt, DAX_NEVER))
> seq_puts(seq, ",dax=never");
> +#ifdef CONFIG_EROFS_FS_ONDEMAND
> + if (opt->fsid)
> + seq_printf(seq, ",fsid=%s", opt->fsid);
> +#endif
> return 0;
> }
>
> --
> 2.27.0
>

2022-04-07 18:50:32

by Gao Xiang

Subject: Re: [PATCH v8 14/20] erofs: register fscache context for primary data blob

On Wed, Apr 06, 2022 at 03:56:06PM +0800, Jeffle Xu wrote:
> Registers fscache context for primary data blob. Also move the
> initialization of s_op and related fields forward, since anonymous
> inode will be allocated under the super block when registering the
> fscache context.
>
> Something worth mentioning about the cleanup routine.
>
> 1. The fscache context will instantiate anonymous inodes under the super
> block. Release these anonymous inodes when .put_super() is called, or
> we'll get "VFS: Busy inodes after unmount." warning.
>
> 2. The fscache context is initialized prior to the root inode. If
> .kill_sb() is called when mount failed, .put_super() won't be called
> when root inode has not been initialized yet. Thus .kill_sb() shall
> also contain the cleanup routine.
>
> Signed-off-by: Jeffle Xu <[email protected]>

Reviewed-by: Gao Xiang <[email protected]>

Thanks,
Gao Xiang
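
The two cleanup points in the commit message (release in .put_super() and again in .kill_sb()) only work if the release helper tolerates being called twice. A minimal userspace sketch of that pattern, assuming a stand-in `struct ctx` and a hypothetical `ctx_unregister()` (neither is from the patch; the patch's helper is erofs_fscache_unregister_cookie()):

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for struct erofs_fscache; purely illustrative. */
struct ctx {
	int cookie;
};

/* Counts how many times a context was really released. */
static int nr_released;

/*
 * Release the context behind *pctx and clear the caller's pointer, so
 * that a second call (e.g. from the other teardown path) is a no-op.
 */
static void ctx_unregister(struct ctx **pctx)
{
	struct ctx *c = *pctx;

	if (!c)
		return;		/* already released by the other path */

	nr_released++;
	free(c);
	*pctx = NULL;
}
```

Taking a pointer to the pointer is what lets both teardown paths call the helper unconditionally.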

> ---
> fs/erofs/internal.h | 1 +
> fs/erofs/super.c | 15 +++++++++++----
> 2 files changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 3a4a344cfed3..eb37b33bce37 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -155,6 +155,7 @@ struct erofs_sb_info {
>
> /* fscache support */
> struct fscache_volume *volume;
> + struct erofs_fscache *s_fscache;
> };
>
> #define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info)
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index 6590ed1b7d3b..9498b899b73b 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -585,6 +585,9 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
> int err;
>
> sb->s_magic = EROFS_SUPER_MAGIC;
> + sb->s_flags |= SB_RDONLY | SB_NOATIME;
> + sb->s_maxbytes = MAX_LFS_FILESIZE;
> + sb->s_op = &erofs_sops;
>
> if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
> erofs_err(sb, "failed to set erofs blksize");
> @@ -605,6 +608,11 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
> err = erofs_fscache_register_fs(sb);
> if (err)
> return err;
> +
> + err = erofs_fscache_register_cookie(sb, &sbi->s_fscache,
> + sbi->opt.fsid, true);
> + if (err)
> + return err;
> }
>
> err = erofs_read_superblock(sb);
> @@ -619,11 +627,8 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
> clear_opt(&sbi->opt, DAX_ALWAYS);
> }
> }
> - sb->s_flags |= SB_RDONLY | SB_NOATIME;
> - sb->s_maxbytes = MAX_LFS_FILESIZE;
> - sb->s_time_gran = 1;
>
> - sb->s_op = &erofs_sops;
> + sb->s_time_gran = 1;
> sb->s_xattr = erofs_xattr_handlers;
>
> if (test_opt(&sbi->opt, POSIX_ACL))
> @@ -763,6 +768,7 @@ static void erofs_kill_sb(struct super_block *sb)
>
> erofs_free_dev_context(sbi->devs);
> fs_put_dax(sbi->dax_dev);
> + erofs_fscache_unregister_cookie(&sbi->s_fscache);
> erofs_fscache_unregister_fs(sb);
> kfree(sbi);
> sb->s_fs_info = NULL;
> @@ -781,6 +787,7 @@ static void erofs_put_super(struct super_block *sb)
> iput(sbi->managed_cache);
> sbi->managed_cache = NULL;
> #endif
> + erofs_fscache_unregister_cookie(&sbi->s_fscache);
> }
>
> static struct file_system_type erofs_fs_type = {
> --
> 2.27.0
>

2022-04-07 19:41:21

by Gao Xiang

Subject: Re: [PATCH v8 09/20] erofs: add mode checking helper

On Wed, Apr 06, 2022 at 03:56:01PM +0800, Jeffle Xu wrote:
> Until now, erofs has been a purely blockdev-based filesystem.
>
> A new fscache-based mode is going to be introduced for erofs to support
> scenarios where on-demand read semantics is needed, e.g. container
> image distribution. In this case, erofs could be mounted from data blobs
> through fscache.
>
> Add a helper checking which mode erofs works in.
>
> Signed-off-by: Jeffle Xu <[email protected]>

Reviewed-by: Gao Xiang <[email protected]>

Thanks,
Gao Xiang

> ---
> fs/erofs/internal.h | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index fe9564e5091e..05a97533b1e9 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -161,6 +161,11 @@ struct erofs_sb_info {
> #define set_opt(opt, option) ((opt)->mount_opt |= EROFS_MOUNT_##option)
> #define test_opt(opt, option) ((opt)->mount_opt & EROFS_MOUNT_##option)
>
> +static inline bool erofs_is_fscache_mode(struct super_block *sb)
> +{
> + return IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && !sb->s_bdev;
> +}
> +
> enum {
> EROFS_ZIP_CACHE_DISABLED,
> EROFS_ZIP_CACHE_READAHEAD,
> --
> 2.27.0

2022-04-07 20:19:06

by Gao Xiang

Subject: Re: [PATCH v8 19/20] erofs: implement fscache-based data readahead

On Wed, Apr 06, 2022 at 03:56:11PM +0800, Jeffle Xu wrote:
> Implement fscache-based data readahead. Also registers an individual
> bdi for each erofs instance to enable readahead.
>
> Signed-off-by: Jeffle Xu <[email protected]>
> ---
> fs/erofs/fscache.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++
> fs/erofs/super.c | 4 ++
> 2 files changed, 98 insertions(+)
>
> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
> index d32cb5840c6d..620d44210809 100644
> --- a/fs/erofs/fscache.c
> +++ b/fs/erofs/fscache.c
> @@ -148,12 +148,106 @@ static int erofs_fscache_readpage(struct file *file, struct page *page)
> return ret;
> }
>
> +static inline void erofs_fscache_unlock_folios(struct readahead_control *rac,
> + size_t len)
> +{
> + while (len) {
> + struct folio *folio = readahead_folio(rac);
> +
> + len -= folio_size(folio);
> + folio_mark_uptodate(folio);
> + folio_unlock(folio);
> + }
> +}
> +
> +static void erofs_fscache_readahead(struct readahead_control *rac)
> +{
> + struct inode *inode = rac->mapping->host;
> + struct super_block *sb = inode->i_sb;
> + size_t len, count, done = 0;
> + erofs_off_t pos;
> + loff_t start, offset;
> + int ret;
> +
> + if (!readahead_count(rac))
> + return;
> +
> + start = readahead_pos(rac);
> + len = readahead_length(rac);
> +
> + do {
> + struct erofs_map_blocks map;
> + struct erofs_map_dev mdev;
> +
> + pos = start + done;
> + map.m_la = pos;
> +
> + ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
> + if (ret)
> + return;
> +
> + /*
> + * 1) For CHUNK_BASED layout, the output m_la is rounded down to
> + * the nearest chunk boundary, and the output m_llen actually
> + * starts from the start of the containing chunk.
> + * 2) For other cases, the output m_la is equal to o_la.
> + */

I think such a comment is really unneeded; we should always calculate as
below. Also, I can't find o_la here anymore.
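
For reference, the calculation in question can be modelled in userspace roughly as follows. This is only a sketch: the helper name `extent_read_count` and the plain integer types are assumptions made for illustration, while `m_la`, `m_llen`, `pos`, `len` and `done` follow the quoted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper modelling the patch's
 *   count = min_t(size_t, map.m_llen - (pos - map.m_la), len - done);
 *
 * m_la, m_llen: logical start and length of the mapped extent (m_la <= pos)
 * pos:          current logical position (start + done)
 * remaining:    bytes left in the readahead window (len - done)
 */
static size_t extent_read_count(uint64_t m_la, uint64_t m_llen,
				uint64_t pos, size_t remaining)
{
	/* Bytes between pos and the end of the extent containing it. */
	uint64_t left_in_extent = m_llen - (pos - m_la);

	return left_in_extent < remaining ? (size_t)left_in_extent : remaining;
}
```

Since `pos - m_la` is zero whenever the extent starts exactly at `pos`, the same expression covers both the chunk-based and the non-chunk-based case.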

> + offset = start + done;
> + count = min_t(size_t, map.m_llen - (pos - map.m_la), len - done);
> +
> + /* Read-ahead Hole */
> + if (!(map.m_flags & EROFS_MAP_MAPPED)) {
> + struct iov_iter iter;
> +
> + iov_iter_xarray(&iter, READ, &rac->mapping->i_pages,
> + offset, count);
> + iov_iter_zero(count, &iter);
> +
> + erofs_fscache_unlock_folios(rac, count);
> + ret = count;
> + continue;
> + }
> +
> + /* Read-ahead Inline */

Unnecessary comment.

> + if (map.m_flags & EROFS_MAP_META) {
> + struct folio *folio = readahead_folio(rac);
> +
> + ret = erofs_fscache_readpage_inline(folio, &map);
> + if (!ret) {
> + folio_mark_uptodate(folio);
> + ret = folio_size(folio);
> + }
> +
> + folio_unlock(folio);
> + continue;
> + }
> +
> + /* Read-ahead No-inline */

Same here.

Thanks,
Gao Xiang

> + mdev = (struct erofs_map_dev) {
> + .m_deviceid = map.m_deviceid,
> + .m_pa = map.m_pa,
> + };
> + ret = erofs_map_dev(sb, &mdev);
> + if (ret)
> + return;
> +
> + ret = erofs_fscache_read_folios(mdev.m_fscache->cookie,
> + rac->mapping, offset, count,
> + mdev.m_pa + (pos - map.m_la));
> + if (!ret) {
> + erofs_fscache_unlock_folios(rac, count);
> + ret = count;
> + }
> + } while (ret > 0 && ((done += ret) < len));
> +}
> +
> static const struct address_space_operations erofs_fscache_meta_aops = {
> .readpage = erofs_fscache_meta_readpage,
> };
>
> const struct address_space_operations erofs_fscache_access_aops = {
> .readpage = erofs_fscache_readpage,
> + .readahead = erofs_fscache_readahead,
> };
>
> /*
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index 8c7181cd37e6..a5e4de60a0d8 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -621,6 +621,10 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
> sbi->opt.fsid, true);
> if (err)
> return err;
> +
> + err = super_setup_bdi(sb);
> + if (err)
> + return err;
> }
>
> err = erofs_read_superblock(sb);
> --
> 2.27.0
>

2022-04-07 20:58:30

by Gao Xiang

Subject: Re: [PATCH v8 17/20] erofs: implement fscache-based data read for non-inline layout

On Wed, Apr 06, 2022 at 03:56:09PM +0800, Jeffle Xu wrote:
> Implement the data plane of reading data from data blobs over fscache
> for non-inline layout.
>
> Signed-off-by: Jeffle Xu <[email protected]>
> ---
> fs/erofs/fscache.c | 52 +++++++++++++++++++++++++++++++++++++++++++++
> fs/erofs/inode.c | 5 +++++
> fs/erofs/internal.h | 2 ++
> 3 files changed, 59 insertions(+)
>
> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
> index 158cc273f8fb..65de1c754e80 100644
> --- a/fs/erofs/fscache.c
> +++ b/fs/erofs/fscache.c
> @@ -60,10 +60,62 @@ static int erofs_fscache_meta_readpage(struct file *data, struct page *page)
> return ret;
> }
>
> +static int erofs_fscache_readpage(struct file *file, struct page *page)
> +{
> + struct folio *folio = page_folio(page);
> + struct inode *inode = folio_file_mapping(folio)->host;
> + struct super_block *sb = inode->i_sb;
> + struct erofs_map_blocks map;
> + struct erofs_map_dev mdev;
> + erofs_off_t pos;
> + loff_t pstart;
> + int ret = 0;
> +
> + DBG_BUGON(folio_size(folio) != EROFS_BLKSIZ);
> +
> + pos = folio_pos(folio);
> + map.m_la = pos;
> +
> + ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
> + if (ret)
> + goto out_unlock;
> +
> + if (!(map.m_flags & EROFS_MAP_MAPPED)) {
> + folio_zero_range(folio, 0, folio_size(folio));
> + goto out_uptodate;
> + }
> +
> + /* no-inline readpage */
> + mdev = (struct erofs_map_dev) {
> + .m_deviceid = map.m_deviceid,
> + .m_pa = map.m_pa,
> + };
> +
> + ret = erofs_map_dev(sb, &mdev);
> + if (ret)
> + goto out_unlock;
> +
> + pstart = mdev.m_pa + (pos - map.m_la);
> + ret = erofs_fscache_read_folios(mdev.m_fscache->cookie,
> + folio_file_mapping(folio), folio_pos(folio),
> + folio_size(folio), pstart);
> +
> +out_uptodate:
> + if (!ret)
> + folio_mark_uptodate(folio);
> +out_unlock:
> + folio_unlock(folio);
> + return ret;
> +}
> +
> static const struct address_space_operations erofs_fscache_meta_aops = {
> .readpage = erofs_fscache_meta_readpage,
> };
>
> +const struct address_space_operations erofs_fscache_access_aops = {
> + .readpage = erofs_fscache_readpage,
> +};
> +
> /*
> * Get the page cache of data blob at the index offset.
> * Return: up to date page on success, ERR_PTR() on failure.
> diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
> index e8b37ba5e9ad..88b51b5fb53f 100644
> --- a/fs/erofs/inode.c
> +++ b/fs/erofs/inode.c
> @@ -296,7 +296,12 @@ static int erofs_fill_inode(struct inode *inode, int isdir)
> err = z_erofs_fill_inode(inode);
> goto out_unlock;
> }
> +

unnecessary modification.

Otherwise looks good:
Reviewed-by: Gao Xiang <[email protected]>

Thanks,
Gao Xiang

> inode->i_mapping->a_ops = &erofs_raw_access_aops;
> +#ifdef CONFIG_EROFS_FS_ONDEMAND
> + if (erofs_is_fscache_mode(inode->i_sb))
> + inode->i_mapping->a_ops = &erofs_fscache_access_aops;
> +#endif
>
> out_unlock:
> erofs_put_metabuf(&buf);
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index e186051f0640..336d19647c96 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -642,6 +642,8 @@ int erofs_fscache_register_cookie(struct super_block *sb,
> void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache);
>
> struct folio *erofs_fscache_get_folio(struct super_block *sb, pgoff_t index);
> +
> +extern const struct address_space_operations erofs_fscache_access_aops;
> #else
> static inline int erofs_fscache_register_fs(struct super_block *sb) { return 0; }
> static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
> --
> 2.27.0
>

2022-04-07 21:08:38

by Gao Xiang

Subject: Re: [PATCH v8 11/20] erofs: add fscache context helper functions

On Wed, Apr 06, 2022 at 03:56:03PM +0800, Jeffle Xu wrote:
> Introduce a context structure for managing data blobs, and helper
> functions for initializing and cleaning up this context structure.
>
> Signed-off-by: Jeffle Xu <[email protected]>

Reviewed-by: Gao Xiang <[email protected]>

Thanks,
Gao Xiang

> ---
> fs/erofs/fscache.c | 46 +++++++++++++++++++++++++++++++++++++++++++++
> fs/erofs/internal.h | 19 +++++++++++++++++++
> 2 files changed, 65 insertions(+)
>
> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
> index 7a6d0239ebb1..67a3c4935245 100644
> --- a/fs/erofs/fscache.c
> +++ b/fs/erofs/fscache.c
> @@ -5,6 +5,52 @@
> #include <linux/fscache.h>
> #include "internal.h"
>
> +/*
> + * Create an fscache context for data blob.
> + * Return: 0 on success and allocated fscache context is assigned to @fscache,
> + * negative error number on failure.
> + */
> +int erofs_fscache_register_cookie(struct super_block *sb,
> + struct erofs_fscache **fscache, char *name)
> +{
> + struct fscache_volume *volume = EROFS_SB(sb)->volume;
> + struct erofs_fscache *ctx;
> + struct fscache_cookie *cookie;
> +
> + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> + if (!ctx)
> + return -ENOMEM;
> +
> + cookie = fscache_acquire_cookie(volume, FSCACHE_ADV_WANT_CACHE_SIZE,
> + name, strlen(name), NULL, 0, 0);
> + if (!cookie) {
> + erofs_err(sb, "failed to get cookie for %s", name);
> + kfree(ctx);
> + return -EINVAL;
> + }
> +
> + fscache_use_cookie(cookie, false);
> + ctx->cookie = cookie;
> +
> + *fscache = ctx;
> + return 0;
> +}
> +
> +void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
> +{
> + struct erofs_fscache *ctx = *fscache;
> +
> + if (!ctx)
> + return;
> +
> + fscache_unuse_cookie(ctx->cookie, NULL, NULL);
> + fscache_relinquish_cookie(ctx->cookie, false);
> + ctx->cookie = NULL;
> +
> + kfree(ctx);
> + *fscache = NULL;
> +}
> +
> int erofs_fscache_register_fs(struct super_block *sb)
> {
> struct erofs_sb_info *sbi = EROFS_SB(sb);
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 952a2f483f94..c6a3351a4d7d 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -97,6 +97,10 @@ struct erofs_sb_lz4_info {
> u16 max_pclusterblks;
> };
>
> +struct erofs_fscache {
> + struct fscache_cookie *cookie;
> +};
> +
> struct erofs_sb_info {
> struct erofs_mount_opts opt; /* options */
> #ifdef CONFIG_EROFS_FS_ZIP
> @@ -626,9 +630,24 @@ static inline int z_erofs_load_lzma_config(struct super_block *sb,
> #ifdef CONFIG_EROFS_FS_ONDEMAND
> int erofs_fscache_register_fs(struct super_block *sb);
> void erofs_fscache_unregister_fs(struct super_block *sb);
> +
> +int erofs_fscache_register_cookie(struct super_block *sb,
> + struct erofs_fscache **fscache, char *name);
> +void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache);
> #else
> static inline int erofs_fscache_register_fs(struct super_block *sb) { return 0; }
> static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
> +
> +static inline int erofs_fscache_register_cookie(struct super_block *sb,
> + struct erofs_fscache **fscache,
> + char *name)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> +static inline void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
> +{
> +}
> #endif
>
> #define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
> --
> 2.27.0

2022-04-07 21:19:25

by Gao Xiang

Subject: Re: [PATCH v8 18/20] erofs: implement fscache-based data read for inline layout

On Wed, Apr 06, 2022 at 03:56:10PM +0800, Jeffle Xu wrote:
> Implement the data plane of reading data from data blobs over fscache
> for inline layout.
>
> For the heading non-inline part, the data plane for non-inline layout is
> reused, while only the tail packing part needs special handling.
>
> Signed-off-by: Jeffle Xu <[email protected]>
> ---
> fs/erofs/fscache.c | 40 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 40 insertions(+)
>
> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
> index 65de1c754e80..d32cb5840c6d 100644
> --- a/fs/erofs/fscache.c
> +++ b/fs/erofs/fscache.c
> @@ -60,6 +60,40 @@ static int erofs_fscache_meta_readpage(struct file *data, struct page *page)
> return ret;
> }
>
> +static int erofs_fscache_readpage_inline(struct folio *folio,
> + struct erofs_map_blocks *map)
> +{
> + struct inode *inode = folio_file_mapping(folio)->host;
> + struct super_block *sb = inode->i_sb;
> + struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
> + erofs_blk_t blknr;
> + size_t offset, len;
> + void *src, *dst;
> +
> + /*
> + * For inline (tail packing) layout, the offset may be non-zero, which
> + * can be calculated from corresponding physical address directly.
> + */
> + offset = erofs_blkoff(map->m_pa);
> + blknr = erofs_blknr(map->m_pa);
> + len = map->m_llen;
> +
> + src = erofs_read_metabuf(&buf, sb, blknr, EROFS_KMAP);
> + if (IS_ERR(src))
> + return PTR_ERR(src);
> +
> + DBG_BUGON(folio_size(folio) != PAGE_SIZE);
> +
> + dst = kmap(folio_page(folio, 0));

kmap_local_folio?

> + memcpy(dst, src + offset, len);
> + memset(dst + len, 0, PAGE_SIZE - len);
> + kunmap(folio_page(folio, 0));
> +
> + erofs_put_metabuf(&buf);
> +
> + return 0;
> +}
> +
> static int erofs_fscache_readpage(struct file *file, struct page *page)
> {
> struct folio *folio = page_folio(page);
> @@ -85,6 +119,12 @@ static int erofs_fscache_readpage(struct file *file, struct page *page)
> goto out_uptodate;
> }
>
> + /* inline readpage */

I think the code below is self-explanatory.

> + if (map.m_flags & EROFS_MAP_META) {
> + ret = erofs_fscache_readpage_inline(folio, &map);
> + goto out_uptodate;
> + }
> +
> /* no-inline readpage */

Same here.

Thanks,
Gao Xiang

> mdev = (struct erofs_map_dev) {
> .m_deviceid = map.m_deviceid,
> --
> 2.27.0
>

2022-04-08 02:42:59

by Jingbo Xu

Subject: Re: [PATCH v8 12/20] erofs: add anonymous inode managing page cache for data blob



On 4/7/22 1:31 PM, Gao Xiang wrote:
> On Wed, Apr 06, 2022 at 03:56:04PM +0800, Jeffle Xu wrote:
>> Introduce one anonymous inode managing page cache for data blob. Then
>> erofs could read directly from the address space of the anonymous inode
>> when cache hit.
>
> Introduce one anonymous inode for data blobs so that erofs
> can cache metadata directly within such anonymous inode.
>

Thanks. Commit message will be updated in the next version.

--
Thanks,
Jeffle

2022-04-11 15:32:30

by David Howells

Subject: Re: [PATCH v8 05/20] cachefiles: implement on-demand read

Jeffle Xu <[email protected]> wrote:

> /*
> * Uninstall anon_fd to the cachefiles object, so that no further
> * associated requests will get enqueued.
> */

"Uninstall anon_fd from..."?

> +static int init_read_req(struct cachefiles_req *req, void *private)

Prefix with "cachefiles_" please (or "cachefiles_ondemand_").

David

2022-04-11 20:06:34

by Gao Xiang

Subject: Re: [PATCH v8 00/20] fscache,erofs: fscache-based on-demand read semantics

On Wed, Apr 06, 2022 at 03:55:52PM +0800, Jeffle Xu wrote:
> changes since v7:
> - rebased to 5.18-rc1
> - include "cachefiles: unmark inode in use in error path" patch into
> this patchset to avoid warning from test robot (patch 1)
> - cachefiles: rename [cookie|volume]_key_len field of struct
> cachefiles_open to [cookie|volume]_key_size to avoid potential
> misunderstanding. Also add more documentation to
> include/uapi/linux/cachefiles.h. (patch 3)
> - cachefiles: valid check for error code returned from user daemon
> (patch 3)
> - cachefiles: change WARN_ON_ONCE() to pr_info_once() when user daemon
> closes anon_fd prematurely (patch 4/5)
> - ready for complete review
>
>
> Kernel Patchset
> ---------------
> Git tree:
>
> https://github.com/lostjeffle/linux.git jingbo/dev-erofs-fscache-v8
>
> Gitweb:
>
> https://github.com/lostjeffle/linux/commits/jingbo/dev-erofs-fscache-v8
>
>
> User Daemon for Quick Test
> --------------------------
> Git tree:
>
> https://github.com/lostjeffle/demand-read-cachefilesd.git main
>
> Gitweb:
>
> https://github.com/lostjeffle/demand-read-cachefilesd
>

Btw, we've also finished a preliminary end-to-end on-demand download
daemon in order to test the fscache on-demand kernel code as a real
end-to-end workload for container use cases:

User guide: https://github.com/dragonflyoss/image-service/blob/fscache/docs/nydus-fscache.md
Video: https://youtu.be/F4IF2_DENXo

Thanks,
Gao Xiang

2022-04-12 12:29:46

by David Howells

Subject: Re: [PATCH v8 00/20] fscache,erofs: fscache-based on-demand read semantics

Btw, do you want to add a tracepoint or two to cachefiles to log requests?

David

2022-04-12 21:13:49

by Jingbo Xu

Subject: Re: [PATCH v8 05/20] cachefiles: implement on-demand read



On 4/11/22 8:44 PM, David Howells wrote:
> Jeffle Xu <[email protected]> wrote:
>
>> /*
>> * Uninstall anon_fd to the cachefiles object, so that no further
>> * associated requests will get enqueued.
>> */
>
> "Uninstall anon_fd from..."?

Okay, will be fixed.

>
>> +static int init_read_req(struct cachefiles_req *req, void *private)
>
> Prefix with "cachefiles_" please (or "cachefiles_ondemand_").

Alright.

Thanks for reviewing.

--
Thanks,
Jeffle

2022-04-12 21:23:49

by Jingbo Xu

Subject: Re: [PATCH v8 07/20] cachefiles: document on-demand read mode

Hi, thanks for such thorough and detailed reviewing and all these
corrections. I will fix them in the next version.


On 4/11/22 9:38 PM, David Howells wrote:
> Jeffle Xu <[email protected]> wrote:
>
>> + (*) On-demand Read.
>> +
>
> Unnecessary extra blank line.
>
> Jeffle Xu <[email protected]> wrote:
>
> What's the scope of the uniqueness of "id"? Is it just unique to a particular
> cachefiles cache?

Yes. Currently each cache, I mean, each "struct cachefiles_cache",
maintains an xarray. The id is unique in the scope of the cache.


>
>> +
>> + struct cachefiles_close {
>> + __u32 fd;
>> + };
>> +
>
> "where:"
>
>> + * ``fd`` identifies the anon_fd to be closed, which is exactly the same
>
> "... which should be the same as that provided to the OPEN request".
>
> Is it possible for userspace to move the fd around with dup() or whatever?

Currently, no. The anon_fd is stored in

```
struct cachefiles_object {
int fd;
...
}
```

When sending a READ/CLOSE request, the associated anon_fd is always fetched
from the @fd field of struct cachefiles_object. dup() won't update the @fd
field of struct cachefiles_object.

Thus after dup(), let's say there are fd A (the original) and fd B
(duplicated from fd A) associated with the cachefiles_object. The @fd
field of subsequent READ/CLOSE requests is always fd A, since the @fd field
of struct cachefiles_object is not updated. However, the CREAD (reply to
a READ request) ioctl can indeed be done on either fd A or fd B.

Then, when fd A is closed while fd B is still alive, the @fd field of
subsequent READ/CLOSE requests is still fd A, which is buggy since the
fd A number can be reused by then.
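
The reuse hazard can be reproduced in plain userspace. The following is only a sketch, with /dev/null standing in for the anon_fd and a hypothetical helper name `fd_number_is_reused`. It relies on the POSIX rule that open() returns the lowest unused descriptor number and assumes a single-threaded process.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Returns 1 if the number of a closed descriptor is handed out again by
 * the next open(), 0 if not, -1 on error.
 */
static int fd_number_is_reused(void)
{
	int fd_a, fd_b, fd_c, reused;

	fd_a = open("/dev/null", O_RDONLY);	/* stands in for the anon_fd */
	if (fd_a < 0)
		return -1;

	fd_b = dup(fd_a);			/* the daemon duplicates it */
	if (fd_b < 0) {
		close(fd_a);
		return -1;
	}

	close(fd_a);				/* fd A goes away, fd B lives on */

	/* An unrelated open() now gets the lowest free number, i.e. fd A. */
	fd_c = open("/dev/null", O_RDONLY);
	if (fd_c < 0) {
		close(fd_b);
		return -1;
	}
	reused = (fd_c == fd_a);

	close(fd_b);
	close(fd_c);
	return reused;
}
```

A request that still carries the number of fd A would therefore refer to whatever unrelated file the daemon opened next.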

To fix this, I plan to replace @fd field of READ/CLOSE requests with
@object_id field.

```
struct cachefiles_close {
__u32 object_id;
};


struct cachefiles_read {
__u32 object_id;
__u64 off;
__u64 len;
};
```

Then each cachefiles_object has a unique object_id (in the scope of the
cachefiles_cache). Each object_id can be mapped to multiple fds (1:N
mapping), while the kernel only sends the initial fd of this object_id
through the OPEN request.

```
struct cachefiles_open {
__u32 object_id;
__u32 fd;
__u32 volume_key_size;
__u32 cookie_key_size;
__u32 flags;
__u8 data[];
};
```

The user daemon can modify the mapping through dup(), but it's
responsible for maintaining and updating this mapping. That is, the
mapping between object_id and all its associated fds should be
maintained in the user space.
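
A rough sketch of what such a daemon-side mapping could look like is given below. Everything in it (the fixed-size table, the limits, and the `obj_*` helper names) is an assumption made for illustration; none of it comes from the patchset or the test daemon.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OBJECTS	16	/* illustrative limits only */
#define MAX_FDS_PER_OBJ	4

struct obj_entry {
	int	 in_use;
	uint32_t object_id;
	int	 fds[MAX_FDS_PER_OBJ];	/* all live fds of this object */
	size_t	 nr_fds;
};

static struct obj_entry obj_table[MAX_OBJECTS];

static struct obj_entry *obj_find(uint32_t object_id)
{
	for (size_t i = 0; i < MAX_OBJECTS; i++)
		if (obj_table[i].in_use && obj_table[i].object_id == object_id)
			return &obj_table[i];
	return NULL;
}

/* Record an fd: the initial one from OPEN, or one created by dup(). */
static int obj_add_fd(uint32_t object_id, int fd)
{
	struct obj_entry *e = obj_find(object_id);

	if (!e) {
		for (size_t i = 0; i < MAX_OBJECTS && !e; i++)
			if (!obj_table[i].in_use)
				e = &obj_table[i];
		if (!e)
			return -1;	/* table full */
		e->in_use = 1;
		e->object_id = object_id;
		e->nr_fds = 0;
	}
	if (e->nr_fds == MAX_FDS_PER_OBJ)
		return -1;
	e->fds[e->nr_fds++] = fd;
	return 0;
}

/* Forget an fd once the daemon has close()d it. */
static int obj_del_fd(uint32_t object_id, int fd)
{
	struct obj_entry *e = obj_find(object_id);

	if (!e)
		return -1;
	for (size_t i = 0; i < e->nr_fds; i++) {
		if (e->fds[i] == fd) {
			/* Move the last fd into the freed slot. */
			e->fds[i] = e->fds[--e->nr_fds];
			if (e->nr_fds == 0)
				e->in_use = 0;
			return 0;
		}
	}
	return -1;
}

/* Pick any live fd for a request carrying object_id; -1 if none. */
static int obj_any_fd(uint32_t object_id)
{
	struct obj_entry *e = obj_find(object_id);

	return e ? e->fds[0] : -1;
}
```

With this, a READ or CLOSE request is resolved by object_id alone, and closing one of several duplicated fds does not invalidate the others.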


>> +
>> + struct cachefiles_read {
>> + __u64 off;
>> + __u64 len;
>> + __u32 fd;
>> + };
>> +
>> + * ``off`` identifies the starting offset of the requested file range.
>
> identifies -> indicates
>
>> +
>> + * ``len`` identifies the length of the requested file range.
>> +
>
> identifies -> indicates (you could alternatively say "specified")
>
>> + * ``fd`` identifies the anonymous fd of the requested cache file. It is
>> + guaranteed that it shall be the same with
>
> "same with" -> "same as"
>
> Since the kernel cannot make such a guarantee, I think you may need to restate
> this as something like "Userspace must present the same fd as was given in the
> previous OPEN request".

Yes, whether the @fd field of a READ request is the same as that of the OPEN
request is actually implementation dependent. However, as described above,
I'm going to change the @fd field into an @object_id field. After that
refactoring, the @object_id field of a READ/CLOSE request should be the
same as the @object_id field of the OPEN request.



>> +CACHEFILES_IOC_CREAD ioctl on the corresponding anon_fd::
>> +
>> + ioctl(fd, CACHEFILES_IOC_CREAD, id);
>> +
>> + * ``fd`` is exactly the fd field of the previous READ request.
>
> Does that have to be true? What if userspace moves it somewhere else?
>

As described above, I'm going to change the @fd field into an @object_id field.
Then there is an @object_id field in the READ request. When replying to the
READ request, the user daemon itself needs to get the corresponding
anon_fd of the given @object_id through the self-maintained mapping.


--
Thanks,
Jeffle

2022-04-12 22:27:56

by David Howells

Subject: Re: [PATCH v8 07/20] cachefiles: document on-demand read mode

Jeffle Xu <[email protected]> wrote:

> + (*) On-demand Read.
> +

Unnecessary extra blank line.

Jeffle Xu <[email protected]> wrote:

> +
> +
> +On-demand Read
> +==============
> +
> +When working in original mode, cachefiles mainly serves as a local cache for
> +remote networking fs, while in on-demand read mode, cachefiles can boost the
> +scenario where on-demand read semantics is needed, e.g. container image
> +distribution.
> +
> +The essential difference between these two modes is that, in original mode,
> +when cache miss, netfs itself will fetch data from remote, and then write the
> +fetched data into cache file. While in on-demand read mode, a user daemon is
> +responsible for fetching data and then writing to the cache file.
> +
> +``CONFIG_CACHEFILES_ONDEMAND`` shall be enabled to support on-demand read mode.

You're missing a few articles there. How about:

"""
When working in its original mode, cachefiles mainly serves as a local cache
for a remote networking fs - while in on-demand read mode, cachefiles can boost
the scenario where on-demand read semantics are needed, e.g. container image
distribution.

The essential difference between these two modes is that, in original mode,
when a cache miss occurs, the netfs will fetch the data from the remote server
and then write it to the cache file. With on-demand read mode, however,
fetching the data and writing it into the cache is delegated to a user
daemon.

``CONFIG_CACHEFILES_ONDEMAND`` shall be enabled to support on-demand read mode.
"""

"should be enabled".

Also, two spaces after a full stop please (but not after the dot in a
contraction, e.g. "e.g.").

> +The on-demand read mode relies on a simple protocol used for communication
> +between kernel and user daemon. The model is like::

"The protocol can be modelled as"?

> +The cachefiles kernel module will send requests to

the

> user daemon when needed.
> +

the

> User daemon needs to poll on the devnode ('/dev/cachefiles') to check if
> +there's

a

> pending request to be processed. A POLLIN event will be returned
> +when there's

a

> pending request.

> +Then user daemon needs to read

"The user daemon [then] reads "

> the devnode to fetch one

one -> a

> request and process it
> +accordingly. It is worth nothing

nothing -> noting

> that each read only gets one request. When
> +finished processing the request,

the

> user daemon needs to write the reply to the
> +devnode.

> +Each request is started with a message header like::

"is started with" -> "starts with".
"like" -> "of the form".

> + * ``id`` is a unique ID identifying this request among all pending
> + requests.

What's the scope of the uniqueness of "id"? Is it just unique to a particular
cachefiles cache?

> + * ``len`` identifies the whole length of this request, including the
> + header and following type specific payload.

type-specific.

> +An optional parameter is added to "bind" command::

to the "bind" command.

> +When

the

> "bind" command takes

takes -> is given

> without argument, it defaults to the original mode.
> +When

the

> "bind" command takes

is given

> with

the

> "ondemand" argument, i.e. "bind ondemand",
> +on-demand read mode will be enabled.

> +OPEN Request

The

> +------------
> +
> +When

the

> netfs opens a cache file for the first time, a request with

the

> +CACHEFILES_OP_OPEN opcode, a.k.a

an

> OPEN request will be sent to

the

> user daemon. The
> +payload format is like::

format is like -> of the form

> +
> + struct cachefiles_open {
> + __u32 volume_key_size;
> + __u32 cookie_key_size;
> + __u32 fd;
> + __u32 flags;
> + __u8 data[];
> + };
> +

"where:"

> + * ``data`` contains

the

> volume_key and cookie_key in sequence.

Might be better to say "contains the volume_key followed directly by the
cookie_key. The volume key is a NUL-terminated string; cookie_key is binary
data.".

> +
> + * ``volume_key_size`` identifies

identifies -> indicates/supplies

> the size of

the

> volume key of the cache
> + file, in bytes. volume_key is of string format, with a suffix '\0'.
> +
> + * ``cookie_key_size`` identifies the size of cookie key of the cache
> + file, in bytes. cookie_key is of binary format, which is netfs
> + specific.

"... indicates the size of the cookie key in bytes."

> +
> + * ``fd`` identifies the

the -> an

> anonymous fd of

of -> referring to

> the cache file, with

with -> through

> which user
> + daemon can perform write/llseek file operations on the cache file.
> +
> +
> +
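
To make the layout of ``data`` concrete, a daemon might split the two keys along these lines. Treat it as a sketch: struct open_payload is a local mirror of the struct quoted above (real code would use the uapi header), split_open_keys() and struct open_keys are hypothetical names, and I'm assuming that volume_key_size counts the trailing NUL.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct cachefiles_open as quoted in the patch. */
struct open_payload {
    uint32_t volume_key_size;
    uint32_t cookie_key_size;
    uint32_t fd;
    uint32_t flags;
    uint8_t  data[];            /* volume_key, then cookie_key */
};

/* Hypothetical result type: pointers into the payload, nothing is copied. */
struct open_keys {
    const char    *volume_key;      /* NUL-terminated string */
    const uint8_t *cookie_key;      /* opaque, netfs-specific binary data */
    size_t         cookie_key_size;
};

/*
 * Validate the two sizes against the payload length and locate both keys.
 * Assumes volume_key_size includes the terminating NUL.
 * Returns 0 on success, -1 if the payload is malformed.
 */
int split_open_keys(const struct open_payload *p, size_t payload_len,
                    struct open_keys *out)
{
    size_t fixed = offsetof(struct open_payload, data);
    size_t avail;

    if (payload_len < fixed)
        return -1;
    avail = payload_len - fixed;

    if (p->volume_key_size == 0 || p->volume_key_size > avail)
        return -1;
    if (p->cookie_key_size > avail - p->volume_key_size)
        return -1;
    if (p->data[p->volume_key_size - 1] != '\0')
        return -1;              /* volume key is not terminated */

    out->volume_key = (const char *)p->data;
    out->cookie_key = p->data + p->volume_key_size;
    out->cookie_key_size = p->cookie_key_size;
    return 0;
}
```

Checking both sizes before touching ``data`` keeps a malformed request from sending the daemon past the end of its read buffer.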

The

> OPEN request contains

a

> (volume_key, cookie_key, anon_fd) triple for

triplet for the

I would probably also use {...} rather than (...).

> corresponding
> +cache file. With this triple,

triplet, the

> user daemon could

could -> can

> fetch and write data into the
> +cache file in the background, even when kernel has not triggered the

the -> a

> cache miss
> +yet.

The

> User daemon is able to distinguish the requested cache file with the given
> +(volume_key, cookie_key), and write the fetched data into

the

> cache file with

with -> using

> the
> +given anon_fd.
> +
> +After recording the (volume_key, cookie_key, anon_fd) triple,

triplet, the

> user daemon shall

shall -> should

> +reply with

reply with -> complete the request by issuing a

> "copen" (complete open) command::
> +
> + copen <id>,<cache_size>
> +
> + * ``id`` is exactly the id field of the previous OPEN request.
> +
> + * When >= 0, ``cache_size`` identifies the size of the cache file;
> + when < 0, ``cache_size`` identifies the error code ecountered by the
> + user daemon.

identifies -> indicates
ecountered -> encountered
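
As an illustration of the reply format, building the command string could look something like the following. format_copen() is a hypothetical helper, not something from the patch; the daemon would then write the resulting string to the devnode.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: build "copen <id>,<cache_size>" in buf.
 *
 * cache_size >= 0 reports the size of the cache file; a negative value
 * carries the error code the daemon ran into.
 * Returns the string length, or -1 if buf is too small.
 */
int format_copen(char *buf, size_t cap, uint32_t id, long long cache_size)
{
    int n = snprintf(buf, cap, "copen %" PRIu32 ",%lld", id, cache_size);

    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

For example, id 7 with a 4096-byte cache file yields "copen 7,4096".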

> +CLOSE Request

The

> +-------------
> +When

a

> cookie withdrawed,

withdrawed -> withdrawn

> a request with

a

> CACHEFILES_OP_CLOSE opcode, a.k.a CLOSE
> +request,

Maybe phrase as "... a CLOSE request (opcode CACHEFILES_OP_CLOSE), ..."

> will be sent to user daemon. It will notify

the

> user daemon to close the
> +attached anon_fd. The payload format is like::

like -> of the form

> +
> + struct cachefiles_close {
> + __u32 fd;
> + };
> +

"where:"

> + * ``fd`` identifies the anon_fd to be closed, which is exactly the same

"... which should be the same as that provided to the OPEN request".

Is it possible for userspace to move the fd around with dup() or whatever?

> + with that in OPEN request.
> +
> +
> +READ Request

The

> +------------
> +
> +When on-demand read mode is turned on, and

a

> cache miss encountered,

the

> kernel will
> +send a request with CACHEFILES_OP_READ opcode, a.k.a READ request,

"send a READ request (opcode CACHEFILES_OP_READ)"

> to

the

> user
> +daemon. It will notify

It will notify -> This will ask/tell

> user daemon to fetch data in the requested file range.
> +The payload format is like::

format is like -> is of the form

> +
> + struct cachefiles_read {
> + __u64 off;
> + __u64 len;
> + __u32 fd;
> + };
> +
> + * ``off`` identifies the starting offset of the requested file range.

identifies -> indicates

> +
> + * ``len`` identifies the length of the requested file range.
> +

identifies -> indicates (you could alternatively say "specifies")

> + * ``fd`` identifies the anonymous fd of the requested cache file. It is
> + guaranteed that it shall be the same with

"same with" -> "same as"

Since the kernel cannot make such a guarantee, I think you may need to restate
this as something like "Userspace must present the same fd as was given in the
previous OPEN request".

> the fd field in the previous
> + OPEN request.
> +
> +When receiving one

one -> a

> READ request,

the

> user daemon needs to fetch

the

> data of the
> +requested file range, and then write the fetched data

, and then write the fetched data -> and write it

> into cache file

cache file -> cache

> with

using

> the
> +given anonymous fd.

+ to indicate the destination.
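
The write-back step might be sketched as below, using only the llseek and write operations the OPEN section says the anonymous fd supports. write_range() is a hypothetical name, and the data is assumed to have been fetched from the remote already; completing the request with the CREAD ioctl is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: store len bytes of fetched data at offset off of
 * the cache file behind fd (the anonymous fd from the READ request).
 *
 * Returns 0 once everything is written, -1 on error.
 */
int write_range(int fd, off_t off, const void *data, size_t len)
{
    const char *p = data;

    /* Position the fd at the start of the requested range. */
    if (lseek(fd, off, SEEK_SET) == (off_t)-1)
        return -1;

    /* write() may be short or interrupted, so loop until done. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

The ``off`` and ``len`` arguments would come straight from the READ payload.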

> +
> +When finished

When finished -> To finish

> processing the READ request,

the

> user daemon needs to reply with

the

> +CACHEFILES_IOC_CREAD ioctl on the corresponding anon_fd::
> +
> + ioctl(fd, CACHEFILES_IOC_CREAD, id);
> +
> + * ``fd`` is exactly the fd field of the previous READ request.

Does that have to be true? What if userspace moves it somewhere else?

> +
> + * ``id`` is exactly the id field of the previous READ request.

is exactly the -> must match the

David

2022-04-12 23:49:54

by Jingbo Xu

Subject: Re: [PATCH v8 00/20] fscache,erofs: fscache-based on-demand read semantics



On 4/11/22 9:43 PM, David Howells wrote:
> Btw, do you want to add a tracepoint or two to cachefiles to log requests?
>

Good idea. Tracepoints will help a lot when debugging.


--
Thanks,
Jeffle

2022-04-13 23:14:10

by 田子晨

Subject: Re: [PATCH v8 00/20] fscache,erofs: fscache-based on-demand read semantics



> On Apr 10, 2022, at 8:51 PM, Gao Xiang <[email protected]> wrote:
>
> On Wed, Apr 06, 2022 at 03:55:52PM +0800, Jeffle Xu wrote:
>> changes since v7:
>> - rebased to 5.18-rc1
>> - include "cachefiles: unmark inode in use in error path" patch into
>> this patchset to avoid warning from test robot (patch 1)
>> - cachefiles: rename [cookie|volume]_key_len field of struct
>> cachefiles_open to [cookie|volume]_key_size to avoid potential
>> misunderstanding. Also add more documentation to
>> include/uapi/linux/cachefiles.h. (patch 3)
>> - cachefiles: valid check for error code returned from user daemon
>> (patch 3)
>> - cachefiles: change WARN_ON_ONCE() to pr_info_once() when user daemon
>> closes anon_fd prematurely (patch 4/5)
>> - ready for complete review
>>
>>
>> Kernel Patchset
>> ---------------
>> Git tree:
>>
>> https://github.com/lostjeffle/linux.git jingbo/dev-erofs-fscache-v8
>>
>> Gitweb:
>>
>> https://github.com/lostjeffle/linux/commits/jingbo/dev-erofs-fscache-v8
>>
>>
>> User Daemon for Quick Test
>> --------------------------
>> Git tree:
>>
>> https://github.com/lostjeffle/demand-read-cachefilesd.git main
>>
>> Gitweb:
>>
>> https://github.com/lostjeffle/demand-read-cachefilesd
>>
>
> Btw, we've also finished a preliminary end-to-end on-demand download
> daemon in order to test the fscache on-demand kernel code as a real
> end-to-end workload for container use cases:
>
> User guide: https://github.com/dragonflyoss/image-service/blob/fscache/docs/nydus-fscache.md
> Video: https://youtu.be/F4IF2_DENXo

Tested-by: Zichen Tian <[email protected]>

> Thanks,
> Gao Xiang
>

2022-04-14 13:33:13

by Jiachen Zhang

Subject: Re: Re: [PATCH v8 00/20] fscache,erofs: fscache-based on-demand read semantics

On Sun, Apr 10, 2022 at 8:52 PM Gao Xiang <[email protected]> wrote:
>
> On Wed, Apr 06, 2022 at 03:55:52PM +0800, Jeffle Xu wrote:
> > changes since v7:
> > - rebased to 5.18-rc1
> > - include "cachefiles: unmark inode in use in error path" patch into
> > this patchset to avoid warning from test robot (patch 1)
> > - cachefiles: rename [cookie|volume]_key_len field of struct
> > cachefiles_open to [cookie|volume]_key_size to avoid potential
> > misunderstanding. Also add more documentation to
> > include/uapi/linux/cachefiles.h. (patch 3)
> > - cachefiles: valid check for error code returned from user daemon
> > (patch 3)
> > - cachefiles: change WARN_ON_ONCE() to pr_info_once() when user daemon
> > closes anon_fd prematurely (patch 4/5)
> > - ready for complete review
> >
> >
> > Kernel Patchset
> > ---------------
> > Git tree:
> >
> > https://github.com/lostjeffle/linux.git jingbo/dev-erofs-fscache-v8
> >
> > Gitweb:
> >
> > https://github.com/lostjeffle/linux/commits/jingbo/dev-erofs-fscache-v8
> >
> >
> > User Daemon for Quick Test
> > --------------------------
> > Git tree:
> >
> > https://github.com/lostjeffle/demand-read-cachefilesd.git main
> >
> > Gitweb:
> >
> > https://github.com/lostjeffle/demand-read-cachefilesd
> >
>
> Btw, we've also finished a preliminary end-to-end on-demand download
> daemon in order to test the fscache on-demand kernel code as a real
> end-to-end workload for container use cases:
>
> User guide: https://github.com/dragonflyoss/image-service/blob/fscache/docs/nydus-fscache.md
> Video: https://youtu.be/F4IF2_DENXo
>
> Thanks,
> Gao Xiang

Hi Xiang,

I think this feature is interesting and promising. So I have performed
some tests according to the user guide. Hope it can be an upstream
feature.

Thanks,
Jiachen

2022-04-14 15:34:14

by Gao Xiang

Subject: Re: Re: [PATCH v8 00/20] fscache,erofs: fscache-based on-demand read semantics

Hi Jiachen,

On Thu, Apr 14, 2022 at 04:10:10PM +0800, Jiachen Zhang wrote:
> On Sun, Apr 10, 2022 at 8:52 PM Gao Xiang <[email protected]> wrote:
> >
> > On Wed, Apr 06, 2022 at 03:55:52PM +0800, Jeffle Xu wrote:
> > > changes since v7:
> > > - rebased to 5.18-rc1
> > > - include "cachefiles: unmark inode in use in error path" patch into
> > > this patchset to avoid warning from test robot (patch 1)
> > > - cachefiles: rename [cookie|volume]_key_len field of struct
> > > cachefiles_open to [cookie|volume]_key_size to avoid potential
> > > misunderstanding. Also add more documentation to
> > > include/uapi/linux/cachefiles.h. (patch 3)
> > > - cachefiles: valid check for error code returned from user daemon
> > > (patch 3)
> > > - cachefiles: change WARN_ON_ONCE() to pr_info_once() when user daemon
> > > closes anon_fd prematurely (patch 4/5)
> > > - ready for complete review
> > >
> > >
> > > Kernel Patchset
> > > ---------------
> > > Git tree:
> > >
> > > https://github.com/lostjeffle/linux.git jingbo/dev-erofs-fscache-v8
> > >
> > > Gitweb:
> > >
> > > https://github.com/lostjeffle/linux/commits/jingbo/dev-erofs-fscache-v8
> > >
> > >
> > > User Daemon for Quick Test
> > > --------------------------
> > > Git tree:
> > >
> > > https://github.com/lostjeffle/demand-read-cachefilesd.git main
> > >
> > > Gitweb:
> > >
> > > https://github.com/lostjeffle/demand-read-cachefilesd
> > >
> >
> > Btw, we've also finished a preliminary end-to-end on-demand download
> > daemon in order to test the fscache on-demand kernel code as a real
> > end-to-end workload for container use cases:
> >
> > User guide: https://github.com/dragonflyoss/image-service/blob/fscache/docs/nydus-fscache.md
> > Video: https://youtu.be/F4IF2_DENXo
> >
> > Thanks,
> > Gao Xiang
>
> Hi Xiang,
>
> I think this feature is interesting and promising. So I have performed
> some tests according to the user guide. Hope it can be an upstream
> feature.

Many thanks for the feedback. We're doing our best to form/stabilize it
now. We're still struggling with some specific cases.

Thanks,
Gao Xiang


>
> Thanks,
> Jiachen