2022-04-16 01:58:05

by Jingbo Xu

Subject: [PATCH v9 00/21] fscache,erofs: fscache-based on-demand read semantics

changes since v8:
- rebase to 5.18-rc2
- cachefiles: use object_id rather than anon_fd to uniquely identify a
cachefile object to avoid potential issues when the user moves the
anonymous fd around, e.g. through dup() (refer to commit message and
cachefiles_ondemand_get_fd() of patch 2 for more details)
(David Howells)
- cachefiles: add @unbind_pincount refcount to avoid the potential deadlock
(refer to commit message of patch 3 for more details)
- cachefiles: move the calling site of cachefiles_ondemand_read() from
cachefiles_read() to cachefiles_prep_read() (refer to commit message
of patch 5 for more details)
- cachefiles: add tracepoints (patch 7) (David Howells)
- cachefiles: update documentation (patch 8) (David Howells)
- erofs: update Reviewed-by tag from Gao Xiang
- erofs: move the logic of initializing bdev/dax_dev in fscache mode out
from patch 15/20. Instead move it into patch 9, so that patch 20 can
focus on the mount option handling
- erofs: update the subject line and commit message of patch 12 (Gao
Xiang)
- erofs: remove and fold erofs_fscache_get_folio() helper (patch 16)
(Gao Xiang)
- erofs: change kmap() to kmap_local_folio(), and comment cleanup (patch
18) (Gao Xiang)
- update "advantage of fscache-based on-demand read" section of the
cover letter
- we've finished a preliminary end-to-end on-demand download daemon in
order to test the fscache on-demand kernel code as a real end-to-end
workload for container use cases. The test user guide is added in the
cover letter.
- Thanks Zichen Tian for testing
Tested-by: Zichen Tian <[email protected]>


Kernel Patchset
---------------
Git tree:

https://github.com/lostjeffle/linux.git jingbo/dev-erofs-fscache-v9

Gitweb:

https://github.com/lostjeffle/linux/commits/jingbo/dev-erofs-fscache-v9


User Guide for E2E Container Use Case
-------------------------------------
User guide:

https://github.com/dragonflyoss/image-service/blob/fscache/docs/nydus-fscache.md

Video:

https://youtu.be/F4IF2_DENXo


User Daemon for Quick Test
--------------------------
Git tree:

https://github.com/lostjeffle/demand-read-cachefilesd.git main

Gitweb:

https://github.com/lostjeffle/demand-read-cachefilesd


RFC: https://lore.kernel.org/all/[email protected]/t/
v1: https://lore.kernel.org/lkml/[email protected]/T/
v2: https://lore.kernel.org/all/[email protected]/t/
v3: https://lore.kernel.org/lkml/[email protected]/T/
v4: https://lore.kernel.org/lkml/[email protected]/T/#t
v5: https://lore.kernel.org/lkml/[email protected]/T/
v6: https://lore.kernel.org/lkml/[email protected]/T/
v7: https://lore.kernel.org/lkml/[email protected]/T/
v8: https://lore.kernel.org/all/[email protected]/T/


[Background]
============
Nydus [1] is an image distribution service especially optimized for
distribution over network. Nydus is an excellent container image
acceleration solution, since it only pulls data from remote when needed,
a.k.a. on-demand reading and it also supports chunk-based deduplication,
compression, etc.

erofs (Enhanced Read-Only File System) is a filesystem designed for
read-only scenarios. (Documentation/filesystems/erofs.rst)

Over the past months we've been focusing on supporting Nydus image service
with in-kernel erofs format[2]. In that case, each container image will be
organized in one bootstrap (metadata) and (optional) multiple data blobs in
erofs format. Massive container images will be stored on one machine.

To accelerate the container startup (fetching container images from remote
and then starting the container), we do hope that the bootstrap & blob files
could support on-demand read. That is, erofs can be mounted and accessed
even when the bootstrap/data blob files have not been fully downloaded.
Then it'll have native performance after data is available locally.

That means we have to manage the cache state of the bootstrap/data blob
files (if cache hit, read directly from the local cache; if cache miss,
fetch the data somehow). It would be painful and may be dumb for erofs to
implement the cache management itself. Thus we prefer fscache/cachefiles
to do the cache management instead.

The fscache on-demand read feature aims to be implemented in a generic way
so that it can benefit other use cases and/or filesystems if it's
implemented in the fscache subsystem.

[1] https://nydus.dev
[2] https://sched.co/pcdL


[Overall Design]
================
Please refer to patch 8 ("cachefiles: document on-demand read mode") for
more details.

When working in the original mode, cachefiles mainly serves as a local cache
for remote networking fs, while in on-demand read mode, cachefiles can work
in the scenario where on-demand read semantics is needed, e.g. container image
distribution.

The essential difference between these two modes is that, in original mode,
on a cache miss, netfs itself will fetch data from remote and then write the
fetched data into the cache file, while in on-demand read mode, a user daemon
is responsible for fetching data and feeding it to the kernel fscache side.

The on-demand read mode relies on a simple protocol used for communication
between kernel and user daemon.

The proposed implementation relies on the anonymous fd mechanism to avoid
the dependence on the format of cache file. When a fscache cachefile is opened
for the first time, an anon_fd associated with the cache file is sent to the
user daemon. With the given anon_fd, user daemon could fetch and write data
into the cache file in the background, even when kernel has not triggered the
cache miss. Besides, the write() syscall to the anon_fd will finally call
cachefiles kernel module, which will write data to cache file in the latest
format of cache file.

1. cache miss
On a cache miss, the cachefiles kernel module will notify the user daemon with
the anon_fd, along with the requested file range. When notified, the user daemon
needs to fetch data of the requested file range, and then write the fetched
data into the cache file with the given anonymous fd. When finished processing
the request, the user daemon needs to notify the kernel.

After notifying the user daemon, the kernel read routine will hang there
until the request is handled by the user daemon. When it's woken up by the
notification from the user daemon, i.e. the corresponding hole has been filled
by the user daemon, it will retry the read from the same file range.

2. cache hit
Once the data is available in the cache file, netfs will read from the cache
file directly.
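
The miss/hit flow above can be illustrated with a small userspace model. This
is only a sketch under stated assumptions: struct cache_file, struct read_req,
daemon_handle_read() and ondemand_read() are hypothetical names invented for
illustration, not the cachefiles kernel interface, and the "remote" is just an
in-memory buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLOB_SIZE 64

/* Hypothetical stand-in for a cache file: data plus a per-byte "present" map. */
struct cache_file {
	unsigned char data[BLOB_SIZE];
	bool present[BLOB_SIZE];
};

/* Hypothetical stand-in for the read request handed to the user daemon. */
struct read_req {
	size_t off;
	size_t len;
};

/* "Kernel" side: is the whole range already present in the cache file? */
static bool cache_range_present(const struct cache_file *cf,
				size_t off, size_t len)
{
	assert(off + len <= BLOB_SIZE);
	for (size_t i = off; i < off + len; i++)
		if (!cf->present[i])
			return false;
	return true;
}

/* "Daemon" side: fetch the range from remote and write it into the cache file. */
static void daemon_handle_read(struct cache_file *cf,
			       const unsigned char *remote,
			       const struct read_req *req)
{
	assert(req->off + req->len <= BLOB_SIZE);
	memcpy(cf->data + req->off, remote + req->off, req->len);
	for (size_t i = req->off; i < req->off + req->len; i++)
		cf->present[i] = true;
}

/*
 * "Kernel" side read path. On a miss the range is handed to the daemon and
 * the read is retried once. Returns how many times the daemon was invoked
 * (0 on a cache hit), or -1 if the hole is still there after the retry.
 */
static int ondemand_read(struct cache_file *cf, const unsigned char *remote,
			 size_t off, size_t len, unsigned char *out)
{
	int trips = 0;

	if (!cache_range_present(cf, off, len)) {
		struct read_req req = { .off = off, .len = len };

		/*
		 * In the described design the read routine sleeps here until
		 * the daemon reports completion; this model simply calls the
		 * daemon inline.
		 */
		daemon_handle_read(cf, remote, &req);
		trips++;
		if (!cache_range_present(cf, off, len))
			return -1;	/* no progress, fail the read */
	}
	memcpy(out, cf->data + off, len);
	return trips;
}
```

In this model a second read of the same range never reaches the daemon, which
mirrors the cache hit case.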


[Advantage of fscache-based on-demand read]
===========================================
1. Asynchronous prefetch
In the current mechanism, fscache is responsible for cache state management,
while the data plane (fetching data from local/remote on cache miss) is
handled on the user daemon side, even when not driven by any file system
request. In addition, if cached data is already available locally, fscache
will use it without trapping to user space.

Therefore, different from event-driven approaches, the fscache on-demand
user daemon could also fetch data (from remote) asynchronously in the
background just like most multi-threaded HTTP downloaders.

2. Flexible request amplification
Since the data plane can be independently controlled by the user daemon,
the user daemon can also fetch more data from remote than the file
system actually requests for small I/O sizes. The data fetched in bulk
will then be available at once, and fscache won't be trapped into the user
daemon again.

3. Support massive blobs
This mechanism can naturally support a large amount of backing files,
and thus can benefit densely deployed scenarios. In our use cases,
one container image can be formed of one bootstrap (required) and
multiple chunk-deduplicated data blobs (optional).

For example, one container image for node.js will correspond to ~20
files in total. In a densely deployed environment, there could be hundreds
of containers and thus thousands of backing files on one machine.
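
As an illustration of the request amplification in item 2 above, a user daemon
could widen each requested range to chunk boundaries before fetching. The
sketch below is hypothetical: amplify_request(), struct range and the chunk
size are assumptions made for the example, not part of this patchset or of any
existing daemon.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte range inside one blob. */
struct range {
	uint64_t off;
	uint64_t len;
};

/*
 * Widen [off, off + len) to chunk-aligned boundaries, clamped to the blob
 * size, so that one remote fetch also covers neighbouring small reads.
 */
static struct range amplify_request(uint64_t off, uint64_t len,
				    uint64_t chunk, uint64_t blob_size)
{
	struct range r;
	uint64_t end;

	assert(chunk > 0 && len > 0);
	assert(off < blob_size && len <= blob_size - off);

	r.off = off - off % chunk;		/* round start down */
	end = off + len;
	if (end % chunk)
		end += chunk - end % chunk;	/* round end up */
	if (end > blob_size)
		end = blob_size;		/* never fetch past the blob */
	r.len = end - r.off;
	return r;
}
```

With a 4096-byte chunk, a 100-byte request at offset 5000 becomes a fetch of
the whole chunk [4096, 8192). The right chunk size is a policy choice of the
daemon.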




Jeffle Xu (21):
cachefiles: extract write routine
cachefiles: notify user daemon when looking up cookie
cachefiles: unbind cachefiles gracefully in on-demand mode
cachefiles: notify user daemon when withdrawing cookie
cachefiles: implement on-demand read
cachefiles: enable on-demand read mode
cachefiles: add tracepoints for on-demand read mode
cachefiles: document on-demand read mode
erofs: make erofs_map_blocks() generally available
erofs: add fscache mode check helper
erofs: register fscache volume
erofs: add fscache context helper functions
erofs: add anonymous inode caching metadata for data blobs
erofs: add erofs_fscache_read_folios() helper
erofs: register fscache context for primary data blob
erofs: register fscache context for extra data blobs
erofs: implement fscache-based metadata read
erofs: implement fscache-based data read for non-inline layout
erofs: implement fscache-based data read for inline layout
erofs: implement fscache-based data readahead
erofs: add 'fsid' mount option

.../filesystems/caching/cachefiles.rst | 170 ++++++
fs/cachefiles/Kconfig | 11 +
fs/cachefiles/Makefile | 1 +
fs/cachefiles/daemon.c | 116 +++-
fs/cachefiles/interface.c | 2 +
fs/cachefiles/internal.h | 74 +++
fs/cachefiles/io.c | 76 ++-
fs/cachefiles/namei.c | 16 +-
fs/cachefiles/ondemand.c | 496 ++++++++++++++++++
fs/erofs/Kconfig | 10 +
fs/erofs/Makefile | 1 +
fs/erofs/data.c | 26 +-
fs/erofs/fscache.c | 365 +++++++++++++
fs/erofs/inode.c | 4 +
fs/erofs/internal.h | 49 ++
fs/erofs/super.c | 105 +++-
fs/erofs/sysfs.c | 4 +-
include/linux/fscache.h | 1 +
include/linux/netfs.h | 2 +
include/trace/events/cachefiles.h | 176 +++++++
include/uapi/linux/cachefiles.h | 68 +++
21 files changed, 1694 insertions(+), 79 deletions(-)
create mode 100644 fs/cachefiles/ondemand.c
create mode 100644 fs/erofs/fscache.c
create mode 100644 include/uapi/linux/cachefiles.h

--
2.27.0


2022-04-16 02:02:08

by Jingbo Xu

Subject: [PATCH v9 16/21] erofs: register fscache context for extra data blobs

Similar to the multi device mode, erofs could be mounted from one
primary data blob (mandatory) and multiple extra data blobs (optional).

Register fscache context for each extra data blob.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/data.c | 3 +++
fs/erofs/internal.h | 2 ++
fs/erofs/super.c | 8 +++++++-
3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index bc22642358ec..14b64d960541 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -199,6 +199,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
map->m_bdev = sb->s_bdev;
map->m_daxdev = EROFS_SB(sb)->dax_dev;
map->m_dax_part_off = EROFS_SB(sb)->dax_part_off;
+ map->m_fscache = EROFS_SB(sb)->s_fscache;

if (map->m_deviceid) {
down_read(&devs->rwsem);
@@ -210,6 +211,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
map->m_bdev = dif->bdev;
map->m_daxdev = dif->dax_dev;
map->m_dax_part_off = dif->dax_part_off;
+ map->m_fscache = dif->fscache;
up_read(&devs->rwsem);
} else if (devs->extra_devices) {
down_read(&devs->rwsem);
@@ -227,6 +229,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
map->m_bdev = dif->bdev;
map->m_daxdev = dif->dax_dev;
map->m_dax_part_off = dif->dax_part_off;
+ map->m_fscache = dif->fscache;
break;
}
}
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 386658416159..fa488af8dfcf 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -49,6 +49,7 @@ typedef u32 erofs_blk_t;

struct erofs_device_info {
char *path;
+ struct erofs_fscache *fscache;
struct block_device *bdev;
struct dax_device *dax_dev;
u64 dax_part_off;
@@ -482,6 +483,7 @@ static inline int z_erofs_map_blocks_iter(struct inode *inode,
#endif /* !CONFIG_EROFS_FS_ZIP */

struct erofs_map_dev {
+ struct erofs_fscache *m_fscache;
struct block_device *m_bdev;
struct dax_device *m_daxdev;
u64 m_dax_part_off;
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 61dc900295f9..c6755bcae4a6 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -259,7 +259,12 @@ static int erofs_init_devices(struct super_block *sb,
}
dis = ptr + erofs_blkoff(pos);

- if (!erofs_is_fscache_mode(sb)) {
+ if (erofs_is_fscache_mode(sb)) {
+ err = erofs_fscache_register_cookie(sb, &dif->fscache,
+ dif->path, false);
+ if (err)
+ break;
+ } else {
bdev = blkdev_get_by_path(dif->path,
FMODE_READ | FMODE_EXCL,
sb->s_type);
@@ -710,6 +715,7 @@ static int erofs_release_device_info(int id, void *ptr, void *data)
fs_put_dax(dif->dax_dev);
if (dif->bdev)
blkdev_put(dif->bdev, FMODE_READ | FMODE_EXCL);
+ erofs_fscache_unregister_cookie(&dif->fscache);
kfree(dif->path);
kfree(dif);
return 0;
--
2.27.0

2022-04-16 02:06:47

by Jingbo Xu

Subject: [PATCH v9 13/21] erofs: add anonymous inode caching metadata for data blobs

Introduce one anonymous inode for data blobs so that erofs can cache
metadata directly within such anonymous inode.

Signed-off-by: Jeffle Xu <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
---
fs/erofs/fscache.c | 39 ++++++++++++++++++++++++++++++++++++---
fs/erofs/internal.h | 6 ++++--
2 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 67a3c4935245..1c88614203d2 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -5,17 +5,22 @@
#include <linux/fscache.h>
#include "internal.h"

+static const struct address_space_operations erofs_fscache_meta_aops = {
+};
+
/*
* Create an fscache context for data blob.
* Return: 0 on success and allocated fscache context is assigned to @fscache,
* negative error number on failure.
*/
int erofs_fscache_register_cookie(struct super_block *sb,
- struct erofs_fscache **fscache, char *name)
+ struct erofs_fscache **fscache,
+ char *name, bool need_inode)
{
struct fscache_volume *volume = EROFS_SB(sb)->volume;
struct erofs_fscache *ctx;
struct fscache_cookie *cookie;
+ int ret;

ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
if (!ctx)
@@ -25,15 +30,40 @@ int erofs_fscache_register_cookie(struct super_block *sb,
name, strlen(name), NULL, 0, 0);
if (!cookie) {
erofs_err(sb, "failed to get cookie for %s", name);
- kfree(name);
- return -EINVAL;
+ ret = -EINVAL;
+ goto err;
}

fscache_use_cookie(cookie, false);
ctx->cookie = cookie;

+ if (need_inode) {
+ struct inode *const inode = new_inode(sb);
+
+ if (!inode) {
+ erofs_err(sb, "failed to get anon inode for %s", name);
+ ret = -ENOMEM;
+ goto err_cookie;
+ }
+
+ set_nlink(inode, 1);
+ inode->i_size = OFFSET_MAX;
+ inode->i_mapping->a_ops = &erofs_fscache_meta_aops;
+ mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
+
+ ctx->inode = inode;
+ }
+
*fscache = ctx;
return 0;
+
+err_cookie:
+ fscache_unuse_cookie(ctx->cookie, NULL, NULL);
+ fscache_relinquish_cookie(ctx->cookie, false);
+ ctx->cookie = NULL;
+err:
+ kfree(ctx);
+ return ret;
}

void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
@@ -47,6 +77,9 @@ void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
fscache_relinquish_cookie(ctx->cookie, false);
ctx->cookie = NULL;

+ iput(ctx->inode);
+ ctx->inode = NULL;
+
kfree(ctx);
*fscache = NULL;
}
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index b1f19f058503..5867cb63fd74 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -99,6 +99,7 @@ struct erofs_sb_lz4_info {

struct erofs_fscache {
struct fscache_cookie *cookie;
+ struct inode *inode;
};

struct erofs_sb_info {
@@ -632,7 +633,8 @@ int erofs_fscache_register_fs(struct super_block *sb);
void erofs_fscache_unregister_fs(struct super_block *sb);

int erofs_fscache_register_cookie(struct super_block *sb,
- struct erofs_fscache **fscache, char *name);
+ struct erofs_fscache **fscache,
+ char *name, bool need_inode);
void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache);
#else
static inline int erofs_fscache_register_fs(struct super_block *sb)
@@ -643,7 +645,7 @@ static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}

static inline int erofs_fscache_register_cookie(struct super_block *sb,
struct erofs_fscache **fscache,
- char *name)
+ char *name, bool need_inode)
{
return -EOPNOTSUPP;
}
--
2.27.0

2022-04-16 02:07:00

by Jingbo Xu

Subject: [PATCH v9 21/21] erofs: add 'fsid' mount option

Introduce 'fsid' mount option to enable on-demand read semantics, in
which case, erofs will be mounted from data blobs. Users could specify
the name of primary data blob by this mount option.
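
For illustration only, a mount helper could compose the option string as
sketched below. erofs_ondemand_opts() is a hypothetical userspace helper, not
something added by this patch; the only thing taken from the patch is the
"fsid=<name>" option syntax.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the option string for an fscache-mode erofs mount.
 * Returns 0 on success, -1 if fsid is empty or the buffer is too small.
 */
static int erofs_ondemand_opts(char *buf, size_t size, const char *fsid)
{
	int n;

	assert(buf && fsid);
	if (!*fsid)
		return -1;
	n = snprintf(buf, size, "fsid=%s", fsid);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The resulting string would be passed as the mount options, presumably
something like "mount -t erofs none -o fsid=<blob name> /mnt". No block device
is named, because this mode goes through get_tree_nodev().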

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/super.c | 31 ++++++++++++++++++++++++++++++-
fs/erofs/sysfs.c | 4 ++--
2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index f68ba929100d..4a623630e1c4 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -371,6 +371,8 @@ static int erofs_read_superblock(struct super_block *sb)

if (erofs_sb_has_ztailpacking(sbi))
erofs_info(sb, "EXPERIMENTAL compressed inline data feature in use. Use at your own risk!");
+ if (erofs_is_fscache_mode(sb))
+ erofs_info(sb, "EXPERIMENTAL fscache-based on-demand read feature in use. Use at your own risk!");
out:
erofs_put_metabuf(&buf);
return ret;
@@ -399,6 +401,7 @@ enum {
Opt_dax,
Opt_dax_enum,
Opt_device,
+ Opt_fsid,
Opt_err
};

@@ -423,6 +426,7 @@ static const struct fs_parameter_spec erofs_fs_parameters[] = {
fsparam_flag("dax", Opt_dax),
fsparam_enum("dax", Opt_dax_enum, erofs_dax_param_enums),
fsparam_string("device", Opt_device),
+ fsparam_string("fsid", Opt_fsid),
{}
};

@@ -518,6 +522,16 @@ static int erofs_fc_parse_param(struct fs_context *fc,
}
++ctx->devs->extra_devices;
break;
+ case Opt_fsid:
+#ifdef CONFIG_EROFS_FS_ONDEMAND
+ kfree(ctx->opt.fsid);
+ ctx->opt.fsid = kstrdup(param->string, GFP_KERNEL);
+ if (!ctx->opt.fsid)
+ return -ENOMEM;
+#else
+ errorfc(fc, "fsid option not supported");
+#endif
+ break;
default:
return -ENOPARAM;
}
@@ -604,6 +618,7 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)

sb->s_fs_info = sbi;
sbi->opt = ctx->opt;
+ ctx->opt.fsid = NULL;
sbi->devs = ctx->devs;
ctx->devs = NULL;

@@ -690,6 +705,11 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)

static int erofs_fc_get_tree(struct fs_context *fc)
{
+ struct erofs_fs_context *ctx = fc->fs_private;
+
+ if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && ctx->opt.fsid)
+ return get_tree_nodev(fc, erofs_fc_fill_super);
+
return get_tree_bdev(fc, erofs_fc_fill_super);
}

@@ -739,6 +759,7 @@ static void erofs_fc_free(struct fs_context *fc)
struct erofs_fs_context *ctx = fc->fs_private;

erofs_free_dev_context(ctx->devs);
+ kfree(ctx->opt.fsid);
kfree(ctx);
}

@@ -779,7 +800,10 @@ static void erofs_kill_sb(struct super_block *sb)

WARN_ON(sb->s_magic != EROFS_SUPER_MAGIC);

- kill_block_super(sb);
+ if (erofs_is_fscache_mode(sb))
+ generic_shutdown_super(sb);
+ else
+ kill_block_super(sb);

sbi = EROFS_SB(sb);
if (!sbi)
@@ -789,6 +813,7 @@ static void erofs_kill_sb(struct super_block *sb)
fs_put_dax(sbi->dax_dev);
erofs_fscache_unregister_cookie(&sbi->s_fscache);
erofs_fscache_unregister_fs(sb);
+ kfree(sbi->opt.fsid);
kfree(sbi);
sb->s_fs_info = NULL;
}
@@ -938,6 +963,10 @@ static int erofs_show_options(struct seq_file *seq, struct dentry *root)
seq_puts(seq, ",dax=always");
if (test_opt(opt, DAX_NEVER))
seq_puts(seq, ",dax=never");
+#ifdef CONFIG_EROFS_FS_ONDEMAND
+ if (opt->fsid)
+ seq_printf(seq, ",fsid=%s", opt->fsid);
+#endif
return 0;
}

diff --git a/fs/erofs/sysfs.c b/fs/erofs/sysfs.c
index f3babf1e6608..c1383e508bbe 100644
--- a/fs/erofs/sysfs.c
+++ b/fs/erofs/sysfs.c
@@ -205,8 +205,8 @@ int erofs_register_sysfs(struct super_block *sb)

sbi->s_kobj.kset = &erofs_root;
init_completion(&sbi->s_kobj_unregister);
- err = kobject_init_and_add(&sbi->s_kobj, &erofs_sb_ktype, NULL,
- "%s", sb->s_id);
+ err = kobject_init_and_add(&sbi->s_kobj, &erofs_sb_ktype, NULL, "%s",
+ erofs_is_fscache_mode(sb) ? sbi->opt.fsid : sb->s_id);
if (err)
goto put_sb_kobj;
return 0;
--
2.27.0

2022-04-16 02:09:31

by Jingbo Xu

Subject: [PATCH v9 06/21] cachefiles: enable on-demand read mode

Enable on-demand read mode by adding an optional parameter to the "bind"
command.

On-demand mode will be turned on when this parameter is "ondemand", i.e.
"bind ondemand". Otherwise cachefiles will work in the original mode.
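
The argument handling described above can be modelled in userspace as follows.
This is a sketch mirroring the logic of the hunk below, not the kernel code
itself: parse_bind_args(), enum cache_mode and the ondemand_built_in flag
(standing in for IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND)) are names made up for
the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

enum cache_mode { MODE_ORIGINAL, MODE_ONDEMAND };

/*
 * Model of the "bind" argument check: "ondemand" selects on-demand mode when
 * the feature is built in, an empty argument keeps the original mode, and
 * anything else is rejected with -EINVAL.
 */
static int parse_bind_args(const char *args, bool ondemand_built_in,
			   enum cache_mode *mode)
{
	assert(args && mode);

	if (ondemand_built_in && !strcmp(args, "ondemand")) {
		*mode = MODE_ONDEMAND;
		return 0;
	}
	if (*args)
		return -EINVAL;
	*mode = MODE_ORIGINAL;
	return 0;
}
```

Note that with the feature compiled out, "bind ondemand" is rejected in the
same way as any other unexpected argument.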

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/cachefiles/daemon.c | 13 ++++++++-----
fs/cachefiles/io.c | 11 -----------
2 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 2e946e4eb65a..c8bde21ace6a 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -758,11 +758,6 @@ static int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)
cache->brun_percent >= 100)
return -ERANGE;

- if (*args) {
- pr_err("'bind' command doesn't take an argument\n");
- return -EINVAL;
- }
-
if (!cache->rootdirname) {
pr_err("No cache directory specified\n");
return -EINVAL;
@@ -774,6 +769,14 @@ static int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)
return -EBUSY;
}

+ if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) &&
+ !strcmp(args, "ondemand")) {
+ set_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags);
+ } else if (*args) {
+ pr_err("'bind' command doesn't take an argument\n");
+ return -EINVAL;
+ }
+
/* Make sure we have copies of the tag string */
if (!cache->tag) {
/*
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index ccf77a969653..000a28f46e59 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -95,7 +95,6 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
file, file_inode(file)->i_ino, start_pos, len,
i_size_read(file_inode(file)));

-retry:
/* If the caller asked us to seek for data before doing the read, then
* we should do that now. If we find a gap, we fill it with zeros.
*/
@@ -120,16 +119,6 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
if (read_hole == NETFS_READ_HOLE_FAIL)
goto presubmission_error;

- if (read_hole == NETFS_READ_HOLE_ONDEMAND) {
- ret = cachefiles_ondemand_read(object, off, len);
- if (ret)
- goto presubmission_error;
-
- /* fail the read if no progress achieved */
- read_hole = NETFS_READ_HOLE_FAIL;
- goto retry;
- }
-
iov_iter_zero(len, iter);
skipped = len;
ret = 0;
--
2.27.0

2022-04-16 02:10:47

by Jingbo Xu

Subject: [PATCH v9 03/21] cachefiles: unbind cachefiles gracefully in on-demand mode

Add a refcount to avoid the deadlock in on-demand read mode. The
on-demand read mode will pin the corresponding cachefiles object for
each anonymous fd. The cachefiles object is unpinned when the anonymous
fd gets closed. When the user daemon exits and the fd of the
"/dev/cachefiles" device node gets closed, it will wait for all
cachefiles objects to be withdrawn. Then if there's any anonymous fd
getting closed after the fd of the device node, the user daemon will
hang forever, waiting for all objects to be withdrawn.

To fix this, add a refcount indicating if there's any object pinned by
anonymous fds. The cachefiles cache gets unbound and withdrawn when the
refcount drops to 0. This doesn't change the behaviour of the original
mode, in which the cachefiles cache gets unbound and withdrawn as
soon as the fd of the device node gets closed. Besides, kref_get() is
adequate whilst kref_get_unless_zero() is not needed here, since no more
anonymous fd will be created when the .release() callback of the device
node fd has already been called.
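
The lifetime rule described above can be sketched with a single-threaded
userspace model. It is only an illustration: struct cache_model and the
cache_open()/cache_get()/cache_put() helpers are hypothetical, and a plain int
stands in for the atomic kref used by the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the cache with its unbind pin count. */
struct cache_model {
	int unbind_pincount;
	bool unbound;
};

/* Opening the device node takes the initial reference (cf. kref_init()). */
static void cache_open(struct cache_model *c)
{
	c->unbind_pincount = 1;
	c->unbound = false;
}

/* Each anonymous fd handed to the daemon pins the cache. */
static void cache_get(struct cache_model *c)
{
	assert(c->unbind_pincount > 0);
	c->unbind_pincount++;
}

/* Closing the device node fd or an anonymous fd drops one reference. */
static void cache_put(struct cache_model *c)
{
	assert(c->unbind_pincount > 0);
	if (--c->unbind_pincount == 0)
		c->unbound = true;	/* last reference: unbind and free */
}
```

Whichever fd is closed last triggers the unbind, so closing the device node
first no longer has to wait for the anonymous fds.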

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/cachefiles/daemon.c | 24 +++++++++++++++++++++---
fs/cachefiles/internal.h | 3 +++
fs/cachefiles/ondemand.c | 3 +++
3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 69ca22aa6abf..2e946e4eb65a 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -111,6 +111,7 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file)
INIT_LIST_HEAD(&cache->volumes);
INIT_LIST_HEAD(&cache->object_list);
spin_lock_init(&cache->object_list_lock);
+ kref_init(&cache->unbind_pincount);
#ifdef CONFIG_CACHEFILES_ONDEMAND
xa_init_flags(&cache->reqs, XA_FLAGS_ALLOC);
xa_init_flags(&cache->ondemand_ids, XA_FLAGS_ALLOC1);
@@ -157,6 +158,25 @@ static void cachefiles_flush_reqs(struct cachefiles_cache *cache)
}
#endif

+static void cachefiles_release_cache(struct kref *kref)
+{
+ struct cachefiles_cache *cache;
+
+ cache = container_of(kref, struct cachefiles_cache, unbind_pincount);
+ cachefiles_daemon_unbind(cache);
+ kfree(cache);
+}
+
+void cachefiles_put_unbind_pincount(struct cachefiles_cache *cache)
+{
+ kref_put(&cache->unbind_pincount, cachefiles_release_cache);
+}
+
+void cachefiles_get_unbind_pincount(struct cachefiles_cache *cache)
+{
+ kref_get(&cache->unbind_pincount);
+}
+
/*
* Release a cache.
*/
@@ -173,14 +193,12 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file)
#ifdef CONFIG_CACHEFILES_ONDEMAND
cachefiles_flush_reqs(cache);
#endif
- cachefiles_daemon_unbind(cache);
-
/* clean up the control file interface */
cache->cachefilesd = NULL;
file->private_data = NULL;
cachefiles_open = 0;

- kfree(cache);
+ cachefiles_put_unbind_pincount(cache);

_leave("");
return 0;
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 8ebe238af20b..9b83d8c82709 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -109,6 +109,7 @@ struct cachefiles_cache {
char *rootdirname; /* name of cache root directory */
char *secctx; /* LSM security context */
char *tag; /* cache binding tag */
+ struct kref unbind_pincount;/* refcount to do daemon unbind */
#ifdef CONFIG_CACHEFILES_ONDEMAND
struct xarray reqs; /* xarray of pending on-demand requests */
struct xarray ondemand_ids; /* xarray for ondemand_id allocation */
@@ -167,6 +168,8 @@ extern int cachefiles_has_space(struct cachefiles_cache *cache,
* daemon.c
*/
extern const struct file_operations cachefiles_daemon_fops;
+extern void cachefiles_get_unbind_pincount(struct cachefiles_cache *cache);
+extern void cachefiles_put_unbind_pincount(struct cachefiles_cache *cache);

/*
* error_inject.c
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index 890cd3ecc2f0..eec883640efa 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -14,6 +14,7 @@ static int cachefiles_ondemand_fd_release(struct inode *inode,
object->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED;
xa_erase(&cache->ondemand_ids, object_id);
cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd);
+ cachefiles_put_unbind_pincount(cache);
return 0;
}

@@ -169,6 +170,8 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
load->fd = fd;
req->msg.object_id = object_id;
object->ondemand_id = object_id;
+
+ cachefiles_get_unbind_pincount(cache);
return 0;

err_put_fd:
--
2.27.0

2022-04-16 02:12:00

by Jingbo Xu

Subject: [PATCH v9 09/21] erofs: make erofs_map_blocks() generally available

... so that it can be used in the fscache mode introduced by the following
patches.

Signed-off-by: Jeffle Xu <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
---
fs/erofs/data.c | 4 ++--
fs/erofs/internal.h | 2 ++
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 780db1e5f4b7..bc22642358ec 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -110,8 +110,8 @@ static int erofs_map_blocks_flatmode(struct inode *inode,
return 0;
}

-static int erofs_map_blocks(struct inode *inode,
- struct erofs_map_blocks *map, int flags)
+int erofs_map_blocks(struct inode *inode,
+ struct erofs_map_blocks *map, int flags)
{
struct super_block *sb = inode->i_sb;
struct erofs_inode *vi = EROFS_I(inode);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 5298c4ee277d..fe9564e5091e 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -486,6 +486,8 @@ void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *dev);
int erofs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
u64 start, u64 len);
+int erofs_map_blocks(struct inode *inode,
+ struct erofs_map_blocks *map, int flags);

/* inode.c */
static inline unsigned long erofs_inode_hash(erofs_nid_t nid)
--
2.27.0

2022-04-16 02:19:30

by Jingbo Xu

Subject: [PATCH v9 17/21] erofs: implement fscache-based metadata read

Implement the data plane of reading metadata from primary data blob
over fscache.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/data.c | 19 +++++++++++++++----
fs/erofs/fscache.c | 27 +++++++++++++++++++++++++++
2 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 14b64d960541..bb9c1fd48c19 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -6,6 +6,7 @@
*/
#include "internal.h"
#include <linux/prefetch.h>
+#include <linux/sched/mm.h>
#include <linux/dax.h>
#include <trace/events/erofs.h>

@@ -35,14 +36,20 @@ void *erofs_bread(struct erofs_buf *buf, struct inode *inode,
erofs_off_t offset = blknr_to_addr(blkaddr);
pgoff_t index = offset >> PAGE_SHIFT;
struct page *page = buf->page;
+ struct folio *folio;
+ unsigned int nofs_flag;

if (!page || page->index != index) {
erofs_put_metabuf(buf);
- page = read_cache_page_gfp(mapping, index,
- mapping_gfp_constraint(mapping, ~__GFP_FS));
- if (IS_ERR(page))
- return page;
+
+ nofs_flag = memalloc_nofs_save();
+ folio = read_cache_folio(mapping, index, NULL, NULL);
+ memalloc_nofs_restore(nofs_flag);
+ if (IS_ERR(folio))
+ return folio;
+
/* should already be PageUptodate, no need to lock page */
+ page = folio_file_page(folio, index);
buf->page = page;
}
if (buf->kmap_type == EROFS_NO_KMAP) {
@@ -63,6 +70,10 @@ void *erofs_bread(struct erofs_buf *buf, struct inode *inode,
void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
erofs_blk_t blkaddr, enum erofs_kmap_type type)
{
+ if (erofs_is_fscache_mode(sb))
+ return erofs_bread(buf, EROFS_SB(sb)->s_fscache->inode,
+ blkaddr, type);
+
return erofs_bread(buf, sb->s_bdev->bd_inode, blkaddr, type);
}

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 066f68c062e2..3f00eb34ac35 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -58,7 +58,34 @@ static int erofs_fscache_read_folios(struct fscache_cookie *cookie,
return ret;
}

+static int erofs_fscache_meta_readpage(struct file *data, struct page *page)
+{
+ int ret;
+ struct folio *folio = page_folio(page);
+ struct super_block *sb = folio_mapping(folio)->host->i_sb;
+ struct erofs_map_dev mdev = {
+ .m_deviceid = 0,
+ .m_pa = folio_pos(folio),
+ };
+
+ ret = erofs_map_dev(sb, &mdev);
+ if (ret)
+ goto out;
+
+ ret = erofs_fscache_read_folios(mdev.m_fscache->cookie,
+ folio_mapping(folio), folio_pos(folio),
+ folio_size(folio), mdev.m_pa);
+ if (ret)
+ goto out;
+
+ folio_mark_uptodate(folio);
+out:
+ folio_unlock(folio);
+ return ret;
+}
+
static const struct address_space_operations erofs_fscache_meta_aops = {
+ .readpage = erofs_fscache_meta_readpage,
};

/*
--
2.27.0

2022-04-16 02:30:37

by Jingbo Xu

Subject: [PATCH v9 15/21] erofs: register fscache context for primary data blob

Register fscache context for primary data blob. Also move the
initialization of s_op and related fields forward, since anonymous
inode will be allocated under the super block when registering the
fscache context.

Something worth mentioning about the cleanup routine.

1. The fscache context will instantiate anonymous inodes under the super
block. Release these anonymous inodes when .put_super() is called, or
we'll get "VFS: Busy inodes after unmount." warning.

2. The fscache context is initialized prior to the root inode. If the
mount fails before the root inode has been initialized, .kill_sb() is
called but .put_super() is not. Thus .kill_sb() shall also contain the
cleanup routine.

Signed-off-by: Jeffle Xu <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
---
fs/erofs/internal.h | 1 +
fs/erofs/super.c | 15 +++++++++++----
2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 5867cb63fd74..386658416159 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -155,6 +155,7 @@ struct erofs_sb_info {

/* fscache support */
struct fscache_volume *volume;
+ struct erofs_fscache *s_fscache;
};

#define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index fd8daa447237..61dc900295f9 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -589,6 +589,9 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
int err;

sb->s_magic = EROFS_SUPER_MAGIC;
+ sb->s_flags |= SB_RDONLY | SB_NOATIME;
+ sb->s_maxbytes = MAX_LFS_FILESIZE;
+ sb->s_op = &erofs_sops;

sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
if (!sbi)
@@ -606,6 +609,11 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
err = erofs_fscache_register_fs(sb);
if (err)
return err;
+
+ err = erofs_fscache_register_cookie(sb, &sbi->s_fscache,
+ sbi->opt.fsid, true);
+ if (err)
+ return err;
} else {
if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
erofs_err(sb, "failed to set erofs blksize");
@@ -628,11 +636,8 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
clear_opt(&sbi->opt, DAX_ALWAYS);
}
}
- sb->s_flags |= SB_RDONLY | SB_NOATIME;
- sb->s_maxbytes = MAX_LFS_FILESIZE;
- sb->s_time_gran = 1;

- sb->s_op = &erofs_sops;
+ sb->s_time_gran = 1;
sb->s_xattr = erofs_xattr_handlers;

if (test_opt(&sbi->opt, POSIX_ACL))
@@ -772,6 +777,7 @@ static void erofs_kill_sb(struct super_block *sb)

erofs_free_dev_context(sbi->devs);
fs_put_dax(sbi->dax_dev);
+ erofs_fscache_unregister_cookie(&sbi->s_fscache);
erofs_fscache_unregister_fs(sb);
kfree(sbi);
sb->s_fs_info = NULL;
@@ -790,6 +796,7 @@ static void erofs_put_super(struct super_block *sb)
iput(sbi->managed_cache);
sbi->managed_cache = NULL;
#endif
+ erofs_fscache_unregister_cookie(&sbi->s_fscache);
}

static struct file_system_type erofs_fs_type = {
--
2.27.0
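
The cleanup rule described in this commit message (release from both the .put_super() path and the .kill_sb() path) only works if the release helper can safely run twice. The following is a minimal user-space sketch of that pattern, not the kernel code: `struct demo_ctx`, `demo_register()`, `demo_unregister()` and `demo_released` are hypothetical names, and it assumes that the real helper takes a pointer to the context pointer so it can clear it after the first release, as the `&sbi->s_fscache` argument in the diff suggests.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the fscache context; not the kernel type. */
struct demo_ctx {
	int inode_refs;	/* models the anonymous inode pinned under the sb */
};

/* Counts how many contexts were really torn down (for illustration). */
static int demo_released;

/* Register: allocate the context and hand it back through @pctx. */
static int demo_register(struct demo_ctx **pctx)
{
	struct demo_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return -1;
	ctx->inode_refs = 1;
	*pctx = ctx;
	return 0;
}

/*
 * Unregister: release once, then clear the caller's pointer so that a
 * second call (for example from the kill_sb-like path after the
 * put_super-like path already ran) is a harmless no-op.
 */
static void demo_unregister(struct demo_ctx **pctx)
{
	struct demo_ctx *ctx = *pctx;

	if (!ctx)
		return;
	assert(ctx->inode_refs == 1);
	free(ctx);
	*pctx = NULL;
	demo_released++;
}
```

Calling `demo_unregister()` on a pointer that was never registered covers the failed-mount case, and calling it twice covers a normal unmount where both paths run.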

2022-04-16 02:33:05

by Jingbo Xu

Subject: [PATCH v9 20/21] erofs: implement fscache-based data readahead

Implement fscache-based data readahead. Also register an individual
bdi for each erofs instance to enable readahead.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/fscache.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++
fs/erofs/super.c | 4 +++
2 files changed, 90 insertions(+)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 08849c15500f..eaa50692ddba 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -163,12 +163,98 @@ static int erofs_fscache_readpage(struct file *file, struct page *page)
return ret;
}

+static void erofs_fscache_unlock_folios(struct readahead_control *rac,
+ size_t len)
+{
+ while (len) {
+ struct folio *folio = readahead_folio(rac);
+
+ len -= folio_size(folio);
+ folio_mark_uptodate(folio);
+ folio_unlock(folio);
+ }
+}
+
+static void erofs_fscache_readahead(struct readahead_control *rac)
+{
+ struct inode *inode = rac->mapping->host;
+ struct super_block *sb = inode->i_sb;
+ size_t len, count, done = 0;
+ erofs_off_t pos;
+ loff_t start, offset;
+ int ret;
+
+ if (!readahead_count(rac))
+ return;
+
+ start = readahead_pos(rac);
+ len = readahead_length(rac);
+
+ do {
+ struct erofs_map_blocks map;
+ struct erofs_map_dev mdev;
+
+ pos = start + done;
+ map.m_la = pos;
+
+ ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
+ if (ret)
+ return;
+
+ offset = start + done;
+ count = min_t(size_t, map.m_llen - (pos - map.m_la),
+ len - done);
+
+ if (!(map.m_flags & EROFS_MAP_MAPPED)) {
+ struct iov_iter iter;
+
+ iov_iter_xarray(&iter, READ, &rac->mapping->i_pages,
+ offset, count);
+ iov_iter_zero(count, &iter);
+
+ erofs_fscache_unlock_folios(rac, count);
+ ret = count;
+ continue;
+ }
+
+ if (map.m_flags & EROFS_MAP_META) {
+ struct folio *folio = readahead_folio(rac);
+
+ ret = erofs_fscache_readpage_inline(folio, &map);
+ if (!ret) {
+ folio_mark_uptodate(folio);
+ ret = folio_size(folio);
+ }
+
+ folio_unlock(folio);
+ continue;
+ }
+
+ mdev = (struct erofs_map_dev) {
+ .m_deviceid = map.m_deviceid,
+ .m_pa = map.m_pa,
+ };
+ ret = erofs_map_dev(sb, &mdev);
+ if (ret)
+ return;
+
+ ret = erofs_fscache_read_folios(mdev.m_fscache->cookie,
+ rac->mapping, offset, count,
+ mdev.m_pa + (pos - map.m_la));
+ if (!ret) {
+ erofs_fscache_unlock_folios(rac, count);
+ ret = count;
+ }
+ } while (ret > 0 && ((done += ret) < len));
+}
+
static const struct address_space_operations erofs_fscache_meta_aops = {
.readpage = erofs_fscache_meta_readpage,
};

const struct address_space_operations erofs_fscache_access_aops = {
.readpage = erofs_fscache_readpage,
+ .readahead = erofs_fscache_readahead,
};

/*
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index c6755bcae4a6..f68ba929100d 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -619,6 +619,10 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
sbi->opt.fsid, true);
if (err)
return err;
+
+ err = super_setup_bdi(sb);
+ if (err)
+ return err;
} else {
if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
erofs_err(sb, "failed to set erofs blksize");
--
2.27.0
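
The readahead loop in this patch walks the requested range one extent at a time, and each step handles the smaller of "bytes left in the current extent" and "bytes left in the request". The sketch below models only that accounting in user space. `struct demo_extent`, `demo_lookup()` and `demo_readahead()` are hypothetical names for illustration, and the lookup is a plain linear search rather than erofs_map_blocks().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical extent: logical start and logical length. */
struct demo_extent {
	size_t la;	/* logical start of the extent */
	size_t llen;	/* logical length of the extent */
};

/* Find the extent containing @pos; returns NULL if none does. */
static const struct demo_extent *demo_lookup(const struct demo_extent *ext,
					     size_t n, size_t pos)
{
	for (size_t i = 0; i < n; i++)
		if (pos >= ext[i].la && pos - ext[i].la < ext[i].llen)
			return &ext[i];
	return NULL;
}

/*
 * Walk [start, start + len) extent by extent. Each step handles
 * min(bytes left in the extent, bytes left in the request), which
 * mirrors the count computation in the patch. Returns the number of
 * bytes completed and stores the number of steps in @steps.
 */
static size_t demo_readahead(const struct demo_extent *ext, size_t n,
			     size_t start, size_t len, int *steps)
{
	size_t done = 0;

	*steps = 0;
	while (done < len) {
		size_t pos = start + done;
		const struct demo_extent *e = demo_lookup(ext, n, pos);
		size_t in_extent, count;

		if (!e)
			break;	/* mapping failed: stop early */
		in_extent = e->llen - (pos - e->la);
		count = in_extent < len - done ? in_extent : len - done;
		assert(count > 0);
		/* A real implementation would zero-fill or read here. */
		done += count;
		(*steps)++;
	}
	return done;
}
```

With extents covering [0, 8192), [8192, 12288) and [12288, 28672), a request for 16384 bytes at offset 4096 is split into three steps of 4096, 4096 and 8192 bytes.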

2022-04-16 02:38:23

by Jingbo Xu

Subject: [PATCH v9 14/21] erofs: add erofs_fscache_read_folios() helper

Add the erofs_fscache_read_folios() helper, which reads from fscache. It
supports on-demand read semantics: on a cache miss it makes the backend
prepare the data, and once the data is ready it reads from the cache.

This helper can then be used to implement .readpage()/.readahead() with
on-demand read semantics.

Signed-off-by: Jeffle Xu <[email protected]>
Reviewed-by: Gao Xiang <[email protected]>
---
fs/erofs/fscache.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 53 insertions(+)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 1c88614203d2..066f68c062e2 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -5,6 +5,59 @@
#include <linux/fscache.h>
#include "internal.h"

+/*
+ * Read data from fscache and fill the read data into page cache described by
+ * @start/len, which shall be both aligned with PAGE_SIZE. @pstart describes
+ * the start physical address in the cache file.
+ */
+static int erofs_fscache_read_folios(struct fscache_cookie *cookie,
+ struct address_space *mapping,
+ loff_t start, size_t len,
+ loff_t pstart)
+{
+ enum netfs_io_source source;
+ struct netfs_io_subrequest subreq;
+ struct netfs_io_request rreq;
+ struct netfs_cache_resources *cres = &rreq.cache_resources;
+ struct iov_iter iter;
+ size_t done = 0;
+ int ret;
+
+ memset(&rreq, 0, sizeof(rreq));
+ memset(&subreq, 0, sizeof(subreq));
+ subreq.rreq = &rreq;
+
+ ret = fscache_begin_read_operation(cres, cookie);
+ if (ret)
+ return ret;
+
+ while (done < len) {
+ subreq.start = pstart + done;
+ subreq.len = len - done;
+ subreq.flags = 1 << NETFS_SREQ_ONDEMAND;
+
+ source = cres->ops->prepare_read(&subreq, LLONG_MAX);
+ if (WARN_ON(subreq.len == 0))
+ source = NETFS_INVALID_READ;
+ if (source != NETFS_READ_FROM_CACHE) {
+ ret = -EIO;
+ goto out;
+ }
+
+ iov_iter_xarray(&iter, READ, &mapping->i_pages,
+ start + done, subreq.len);
+ ret = fscache_read(cres, subreq.start, &iter,
+ NETFS_READ_HOLE_FAIL, NULL, NULL);
+ if (ret)
+ goto out;
+
+ done += subreq.len;
+ }
+out:
+ fscache_end_operation(cres);
+ return ret;
+}
+
static const struct address_space_operations erofs_fscache_meta_aops = {
};

--
2.27.0
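
The helper in this patch reads the range in sub-requests: the prepare step may shorten each sub-request, and the loop continues until the whole length is covered or a sub-request cannot be served from the cache. The following user-space sketch models that loop only. `demo_prepare()`, `demo_read()` and `DEMO_GRANULE` are hypothetical, the 4096-byte cap is an arbitrary assumption, and -5 merely stands in for -EIO.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_GRANULE 4096u	/* hypothetical cap per sub-request */

/*
 * Hypothetical stand-in for the prepare step: it may shrink the
 * sub-request, and reports failure by returning 0.
 */
static size_t demo_prepare(size_t pstart, size_t len, size_t cache_size)
{
	if (pstart >= cache_size)
		return 0;			/* nothing readable here */
	if (len > cache_size - pstart)
		len = cache_size - pstart;	/* clamp to the cache file */
	return len < DEMO_GRANULE ? len : DEMO_GRANULE;
}

/*
 * Read @len bytes starting at @pstart in sub-requests. Returns 0 on
 * success or -5 (standing in for -EIO) if a sub-request cannot be
 * prepared; @nr_subreq reports how many sub-requests were issued.
 */
static int demo_read(size_t pstart, size_t len, size_t cache_size,
		     int *nr_subreq)
{
	size_t done = 0;

	*nr_subreq = 0;
	while (done < len) {
		size_t sub = demo_prepare(pstart + done, len - done,
					  cache_size);

		if (sub == 0)
			return -5;
		assert(sub <= len - done);
		/* A real implementation would issue the cache read here. */
		done += sub;
		(*nr_subreq)++;
	}
	return 0;
}
```

The key point is that progress is always measured by the length the prepare step actually granted, never by the length that was asked for.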

2022-04-21 07:22:07

by Jia Zhu

Subject: Re: [PATCH v9 00/21] fscache, erofs: fscache-based on-demand read semantics



On 4/15/22 8:35 PM, Jeffle Xu wrote:
> changes since v8:
> - rebase to 5.18-rc2
> - cachefiles: use object_id rather than anon_fd to uniquely identify a
> cachefile object to avoid potential issues when the user moves the
> anonymous fd around, e.g. through dup() (refer to commit message and
> cachefiles_ondemand_get_fd() of patch 2 for more details)
> (David Howells)
> - cachefiles: add @unbind_pincount refcount to avoid the potential deadlock
> (refer to commit message of patch3 for more details)
> - cachefiles: move the calling site of cachefiles_ondemand_read() from
> cachefiles_read() to cachefiles_prep_read() (refer to commit message
> of patch 5 for more details)
> - cachefiles: add tracepoints (patch 7) (David Howells)
> - cachefiles: update documentation (patch 8) (David Howells)
> - erofs: update Reviewed-by tag from Gao Xiang
> - erofs: move the logic of initializing bdev/dax_dev in fscache mode out
> from patch 15/20. Instead move it into patch 9, so that patch 20 can
> focus on the mount option handling
> - erofs: update the subject line and commit message of patch 12 (Gao
> Xiang)
> - erofs: remove and fold erofs_fscache_get_folio() helper (patch 16)
> (Gao Xiang)
> - erofs: change kmap() to kmap_local_folio(), and comment cleanup (patch
> 18) (Gao Xiang)
> - update "advantage of fscache-based on-demand read" section of the
> cover letter
> - we've finished a preliminary end-to-end on-demand download daemon in
> order to test the fscache on-demand kernel code as a real end-to-end
> workload for container use cases. The test user guide is added in the
> cover letter.
> - Thanks Zichen Tian for testing
> Tested-by: Zichen Tian <[email protected]>
>
>
> Kernel Patchset
> ---------------
> Git tree:
>
> https://github.com/lostjeffle/linux.git jingbo/dev-erofs-fscache-v9
>
> Gitweb:
>
> https://github.com/lostjeffle/linux/commits/jingbo/dev-erofs-fscache-v9
>
>
> User Guide for E2E Container Use Case
> -------------------------------------
> User guide:
>
> https://github.com/dragonflyoss/image-service/blob/fscache/docs/nydus-fscache.md
>
> Video:
>
> https://youtu.be/F4IF2_DENXo
>
>
> User Daemon for Quick Test
> --------------------------
> Git tree:
>
> https://github.com/lostjeffle/demand-read-cachefilesd.git main
>
> Gitweb:
>
> https://github.com/lostjeffle/demand-read-cachefilesd
>
>
> RFC: https://lore.kernel.org/all/[email protected]/t/
> v1: https://lore.kernel.org/lkml/[email protected]/T/
> v2: https://lore.kernel.org/all/[email protected]/t/
> v3: https://lore.kernel.org/lkml/[email protected]/T/
> v4: https://lore.kernel.org/lkml/[email protected]/T/#t
> v5: https://lore.kernel.org/lkml/[email protected]/T/
> v6: https://lore.kernel.org/lkml/[email protected]/T/
> v7: https://lore.kernel.org/lkml/[email protected]/T/
> v8: https://lore.kernel.org/all/[email protected]/T/
>
>
> [Background]
> ============
> Nydus [1] is an image distribution service especially optimized for
> distribution over network. Nydus is an excellent container image
> acceleration solution, since it only pulls data from remote when needed,
> a.k.a. on-demand reading and it also supports chunk-based deduplication,
> compression, etc.
>
> erofs (Enhanced Read-Only File System) is a filesystem designed for
> read-only scenarios. (Documentation/filesystems/erofs.rst)
>
> Over the past months we've been focusing on supporting Nydus image service
> with in-kernel erofs format[2]. In that case, each container image will be
> organized in one bootstrap (metadata) and (optional) multiple data blobs in
> erofs format. Massive container images will be stored on one machine.
>
> To accelerate the container startup (fetching container images from remote
> and then start the container), we do hope that the bootstrap & blob files
> could support on-demand read. That is, erofs can be mounted and accessed
> even when the bootstrap/data blob files have not been fully downloaded.
> Then it'll have native performance after data is available locally.
>
> That means we have to manage the cache state of the bootstrap/data blob
> files (if cache hit, read directly from the local cache; if cache miss,
> fetch the data somehow). It would be painful and may be dumb for erofs to
> implement the cache management itself. Thus we prefer fscache/cachefiles
> to do the cache management instead.
>
> The fscache on-demand read feature aims to be implemented in a generic way
> so that it can benefit other use cases and/or filesystems if it's
> implemented in the fscache subsystem.
>
> [1] https://nydus.dev
> [2] https://sched.co/pcdL
>
>
> [Overall Design]
> ================
> Please refer to patch 8 ("cachefiles: document on-demand read mode") for
> more details.
>
> When working in the original mode, cachefiles mainly serves as a local cache
> for remote networking fs, while in on-demand read mode, cachefiles can work
> in the scenario where on-demand read semantics is needed, e.g. container image
> distribution.
>
> The essential difference between these two modes is that, in the original
> mode, on a cache miss netfs itself fetches the data from remote and then
> writes the fetched data into the cache file, while in on-demand read mode a
> user daemon is responsible for fetching the data and feeding it to the kernel
> fscache side.
>
> The on-demand read mode relies on a simple protocol used for communication
> between kernel and user daemon.
>
> The proposed implementation relies on the anonymous fd mechanism to avoid
> the dependence on the format of cache file. When a fscache cachefile is opened
> for the first time, an anon_fd associated with the cache file is sent to the
> user daemon. With the given anon_fd, user daemon could fetch and write data
> into the cache file in the background, even when kernel has not triggered the
> cache miss. Besides, the write() syscall to the anon_fd will finally call
> cachefiles kernel module, which will write data to cache file in the latest
> format of cache file.
>
> 1. cache miss
> On a cache miss, the cachefiles kernel module notifies the user daemon with
> the anon_fd, along with the requested file range. When notified, the user
> daemon needs to fetch the data of the requested file range, and then write
> the fetched data into the cache file through the given anonymous fd. When it
> has finished processing the request, the user daemon needs to notify the
> kernel.
>
> After notifying the user daemon, the kernel read routine blocks until the
> request has been handled by the user daemon. When it is woken up by the
> notification from the user daemon, i.e. the corresponding hole has been
> filled by the user daemon, it retries the read from the same file range.
>
> 2. cache hit
> Once the data is ready in the cache file, netfs reads from the cache file
> directly.
>
>
> [Advantage of fscache-based on-demand read]
> ========================================
> 1. Asynchronous prefetch
> In the current mechanism, fscache is responsible for cache state management,
> while the data plane (fetching data from local/remote on a cache miss) is
> handled on the user daemon side, which can run even when no file system
> request is driving it. In addition, if cached data is already available
> locally, fscache uses it instead of trapping to user space.
>
> Therefore, different from event-driven approaches, the fscache on-demand
> user daemon could also fetch data (from remote) asynchronously in the
> background just like most multi-threaded HTTP downloaders.
>
> 2. Flexible request amplification
> Since the data plane can be independently controlled by the user daemon,
> the user daemon can also fetch more data from remote than that the file
> system actually requests for small I/O sizes. Then, fetched data in bulk
> will be available at once and fscache won't be trapped into the user
> daemon again.
>
> 3. Support massive blobs
> This mechanism can naturally support a large number of backing files,
> and thus can benefit densely deployed scenarios. In our use cases,
> one container image can be formed of one bootstrap (required) and
> multiple chunk-deduplicated data blobs (optional).
>
> For example, one container image for node.js will correspond to ~20
> files in total. In a densely deployed environment, there could be hundreds
> of containers and thus thousands of backing files on one machine.
>
>
>
>
> Jeffle Xu (21):
> cachefiles: extract write routine
> cachefiles: notify user daemon when looking up cookie
> cachefiles: unbind cachefiles gracefully in on-demand mode
> cachefiles: notify user daemon when withdrawing cookie
> cachefiles: implement on-demand read
> cachefiles: enable on-demand read mode
> cachefiles: add tracepoints for on-demand read mode
> cachefiles: document on-demand read mode
> erofs: make erofs_map_blocks() generally available
> erofs: add fscache mode check helper
> erofs: register fscache volume
> erofs: add fscache context helper functions
> erofs: add anonymous inode caching metadata for data blobs
> erofs: add erofs_fscache_read_folios() helper
> erofs: register fscache context for primary data blob
> erofs: register fscache context for extra data blobs
> erofs: implement fscache-based metadata read
> erofs: implement fscache-based data read for non-inline layout
> erofs: implement fscache-based data read for inline layout
> erofs: implement fscache-based data readahead
> erofs: add 'fsid' mount option
>
> .../filesystems/caching/cachefiles.rst | 170 ++++++
> fs/cachefiles/Kconfig | 11 +
> fs/cachefiles/Makefile | 1 +
> fs/cachefiles/daemon.c | 116 +++-
> fs/cachefiles/interface.c | 2 +
> fs/cachefiles/internal.h | 74 +++
> fs/cachefiles/io.c | 76 ++-
> fs/cachefiles/namei.c | 16 +-
> fs/cachefiles/ondemand.c | 496 ++++++++++++++++++
> fs/erofs/Kconfig | 10 +
> fs/erofs/Makefile | 1 +
> fs/erofs/data.c | 26 +-
> fs/erofs/fscache.c | 365 +++++++++++++
> fs/erofs/inode.c | 4 +
> fs/erofs/internal.h | 49 ++
> fs/erofs/super.c | 105 +++-
> fs/erofs/sysfs.c | 4 +-
> include/linux/fscache.h | 1 +
> include/linux/netfs.h | 2 +
> include/trace/events/cachefiles.h | 176 +++++++
> include/uapi/linux/cachefiles.h | 68 +++
> 21 files changed, 1694 insertions(+), 79 deletions(-)
> create mode 100644 fs/cachefiles/ondemand.c
> create mode 100644 fs/erofs/fscache.c
> create mode 100644 include/uapi/linux/cachefiles.h
>
Hi Jeffle & Xiang,

Thanks for coming up with such an innovative solution. We are interested
in this and want to deploy it in our system, so we ran the tests from the
user guide and did some error injection tests using the user daemon demo
offered by Jeffle. We hope it can become an upstream feature.

Thanks,
Jia

Tested-by: Jia Zhu <[email protected]>

2022-04-21 19:02:39

by Gao Xiang

Subject: Re: [PATCH v9 20/21] erofs: implement fscache-based data readahead

On Fri, Apr 15, 2022 at 08:36:13PM +0800, Jeffle Xu wrote:
> Implement fscache-based data readahead. Also register an individual
> bdi for each erofs instance to enable readahead.
>
> Signed-off-by: Jeffle Xu <[email protected]>
> ---
> fs/erofs/fscache.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++
> fs/erofs/super.c | 4 +++
> 2 files changed, 90 insertions(+)
>
> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
> index 08849c15500f..eaa50692ddba 100644
> --- a/fs/erofs/fscache.c
> +++ b/fs/erofs/fscache.c
> @@ -163,12 +163,98 @@ static int erofs_fscache_readpage(struct file *file, struct page *page)
> return ret;
> }
>
> +static void erofs_fscache_unlock_folios(struct readahead_control *rac,
> + size_t len)
> +{
> + while (len) {
> + struct folio *folio = readahead_folio(rac);
> +
> + len -= folio_size(folio);
> + folio_mark_uptodate(folio);
> + folio_unlock(folio);
> + }
> +}
> +
> +static void erofs_fscache_readahead(struct readahead_control *rac)
> +{
> + struct inode *inode = rac->mapping->host;
> + struct super_block *sb = inode->i_sb;
> + size_t len, count, done = 0;
> + erofs_off_t pos;
> + loff_t start, offset;
> + int ret;
> +
> + if (!readahead_count(rac))
> + return;
> +
> + start = readahead_pos(rac);
> + len = readahead_length(rac);
> +
> + do {
> + struct erofs_map_blocks map;
> + struct erofs_map_dev mdev;
> +
> + pos = start + done;
> + map.m_la = pos;
> +
> + ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
> + if (ret)
> + return;
> +
> + offset = start + done;
> + count = min_t(size_t, map.m_llen - (pos - map.m_la),
> + len - done);
> +
> + if (!(map.m_flags & EROFS_MAP_MAPPED)) {
> + struct iov_iter iter;
> +
> + iov_iter_xarray(&iter, READ, &rac->mapping->i_pages,
> + offset, count);
> + iov_iter_zero(count, &iter);
> +
> + erofs_fscache_unlock_folios(rac, count);
> + ret = count;
> + continue;
> + }
> +
> + if (map.m_flags & EROFS_MAP_META) {
> + struct folio *folio = readahead_folio(rac);
> +
> + ret = erofs_fscache_readpage_inline(folio, &map);
> + if (!ret) {
> + folio_mark_uptodate(folio);
> + ret = folio_size(folio);
> + }
> +
> + folio_unlock(folio);
> + continue;
> + }
> +
> + mdev = (struct erofs_map_dev) {
> + .m_deviceid = map.m_deviceid,
> + .m_pa = map.m_pa,
> + };
> + ret = erofs_map_dev(sb, &mdev);
> + if (ret)
> + return;
> +
> + ret = erofs_fscache_read_folios(mdev.m_fscache->cookie,
> + rac->mapping, offset, count,
> + mdev.m_pa + (pos - map.m_la));
> + if (!ret) {
> + erofs_fscache_unlock_folios(rac, count);
> + ret = count;
> + }

I think this really needs a comment explaining why we don't need to
unlock the folios in the error cases.

Thanks,
Gao Xiang

> + } while (ret > 0 && ((done += ret) < len));
> +}
> +

2022-04-21 19:32:21

by Gao Xiang

Subject: Re: [PATCH v9 17/21] erofs: implement fscache-based metadata read

On Fri, Apr 15, 2022 at 08:36:10PM +0800, Jeffle Xu wrote:
> Implement the data plane of reading metadata from primary data blob
> over fscache.
>
> Signed-off-by: Jeffle Xu <[email protected]>
> ---
> fs/erofs/data.c | 19 +++++++++++++++----
> fs/erofs/fscache.c | 27 +++++++++++++++++++++++++++
> 2 files changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> index 14b64d960541..bb9c1fd48c19 100644
> --- a/fs/erofs/data.c
> +++ b/fs/erofs/data.c
> @@ -6,6 +6,7 @@
> */
> #include "internal.h"
> #include <linux/prefetch.h>
> +#include <linux/sched/mm.h>
> #include <linux/dax.h>
> #include <trace/events/erofs.h>
>
> @@ -35,14 +36,20 @@ void *erofs_bread(struct erofs_buf *buf, struct inode *inode,
> erofs_off_t offset = blknr_to_addr(blkaddr);
> pgoff_t index = offset >> PAGE_SHIFT;
> struct page *page = buf->page;
> + struct folio *folio;
> + unsigned int nofs_flag;
>
> if (!page || page->index != index) {
> erofs_put_metabuf(buf);
> - page = read_cache_page_gfp(mapping, index,
> - mapping_gfp_constraint(mapping, ~__GFP_FS));
> - if (IS_ERR(page))
> - return page;
> +
> + nofs_flag = memalloc_nofs_save();
> + folio = read_cache_folio(mapping, index, NULL, NULL);
> + memalloc_nofs_restore(nofs_flag);
> + if (IS_ERR(folio))
> + return folio;
> +
> /* should already be PageUptodate, no need to lock page */
> + page = folio_file_page(folio, index);
> buf->page = page;
> }
> if (buf->kmap_type == EROFS_NO_KMAP) {
> @@ -63,6 +70,10 @@ void *erofs_bread(struct erofs_buf *buf, struct inode *inode,
> void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
> erofs_blk_t blkaddr, enum erofs_kmap_type type)
> {
> + if (erofs_is_fscache_mode(sb))
> + return erofs_bread(buf, EROFS_SB(sb)->s_fscache->inode,
> + blkaddr, type);
> +
> return erofs_bread(buf, sb->s_bdev->bd_inode, blkaddr, type);
> }
>
> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
> index 066f68c062e2..3f00eb34ac35 100644
> --- a/fs/erofs/fscache.c
> +++ b/fs/erofs/fscache.c
> @@ -58,7 +58,34 @@ static int erofs_fscache_read_folios(struct fscache_cookie *cookie,
> return ret;
> }
>
> +static int erofs_fscache_meta_readpage(struct file *data, struct page *page)
> +{
> + int ret;
> + struct folio *folio = page_folio(page);
> + struct super_block *sb = folio_mapping(folio)->host->i_sb;
> + struct erofs_map_dev mdev = {
> + .m_deviceid = 0,
> + .m_pa = folio_pos(folio),
> + };
> +
> + ret = erofs_map_dev(sb, &mdev);
> + if (ret)
> + goto out;
> +
> + ret = erofs_fscache_read_folios(mdev.m_fscache->cookie,
> + folio_mapping(folio), folio_pos(folio),
> + folio_size(folio), mdev.m_pa);
> + if (ret)
> + goto out;
> +
> + folio_mark_uptodate(folio);

if (!ret)
folio_mark_uptodate(folio);

Thanks,
Gao Xiang

> +out:
> + folio_unlock(folio);
> + return ret;
> +}
> +
> static const struct address_space_operations erofs_fscache_meta_aops = {
> + .readpage = erofs_fscache_meta_readpage,
> };
>
> /*
> --
> 2.27.0

2022-04-22 02:48:56

by Jingbo Xu

Subject: Re: [PATCH v9 03/21] cachefiles: unbind cachefiles gracefully in on-demand mode



On 4/21/22 10:02 PM, David Howells wrote:
> Jeffle Xu <[email protected]> wrote:
>
>> + struct kref unbind_pincount;/* refcount to do daemon unbind */
>
> Please use refcount_t or atomic_t, especially as this isn't the refcount for
> the structure.

Okay, will be done in the next version.

>
>> - cachefiles_daemon_unbind(cache);
>> -
>> /* clean up the control file interface */
>> cache->cachefilesd = NULL;
>> file->private_data = NULL;
>> cachefiles_open = 0;
>
> Please call cachefiles_daemon_unbind() before the cleanup.

Since the cachefiles_cache structure will be freed once the pincount
drops to 0, "cache->cachefilesd = NULL;" needs to be done before
decreasing the pincount. BTW, "cachefiles_open = 0;" indeed should be
done only when the pincount has dropped to 0.


--
Thanks,
Jeffle
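
The ordering argument above can be sketched with a plain counter: the release path clears the back-pointer first and only then drops its pin, and whoever drops the last pin performs the unbind. This is a user-space model using C11 atomics, not the kernel's refcount_t API. `struct demo_cache` and the `demo_cache_*()` helpers are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the cache object; not the kernel structure. */
struct demo_cache {
	atomic_int unbind_pincount;	/* pins that delay the unbind */
	void *daemon_file;		/* models cache->cachefilesd */
	bool unbound;			/* set once the unbind has run */
};

static void demo_cache_init(struct demo_cache *c, void *file)
{
	atomic_init(&c->unbind_pincount, 1);	/* pin held by the daemon fd */
	c->daemon_file = file;
	c->unbound = false;
}

/* Take an extra pin; only valid while at least one pin is still held. */
static void demo_cache_get(struct demo_cache *c)
{
	assert(atomic_load(&c->unbind_pincount) > 0);
	atomic_fetch_add(&c->unbind_pincount, 1);
}

/* Drop one pin; whoever drops the last one performs the unbind. */
static bool demo_cache_put(struct demo_cache *c)
{
	if (atomic_fetch_sub(&c->unbind_pincount, 1) != 1)
		return false;
	c->unbound = true;	/* the kernel would unbind and free here */
	return true;
}

/*
 * Release of the control file: clear the back-pointer first, because
 * the object may be gone as soon as the put below returns true.
 */
static bool demo_daemon_release(struct demo_cache *c)
{
	c->daemon_file = NULL;
	return demo_cache_put(c);
}
```

In this model, state that must only be reset after the final unbind (the role "cachefiles_open = 0;" plays in the discussion) would belong in the branch where `demo_cache_put()` returns true.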

2022-04-22 18:12:25

by David Howells

Subject: Re: [PATCH v9 06/21] cachefiles: enable on-demand read mode

Jeffle Xu <[email protected]> wrote:

> + if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) &&
> + !strcmp(args, "ondemand")) {
> + set_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags);
> + } else if (*args) {
> + pr_err("'bind' command doesn't take an argument\n");

The error message isn't true if CONFIG_CACHEFILES_ONDEMAND=y. It would be
better to say "Invalid argument to the 'bind' command".

> -retry:
> /* If the caller asked us to seek for data before doing the read, then
> * we should do that now. If we find a gap, we fill it with zeros.
> */
> @@ -120,16 +119,6 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
> if (read_hole == NETFS_READ_HOLE_FAIL)
> goto presubmission_error;
>
> - if (read_hole == NETFS_READ_HOLE_ONDEMAND) {
> - ret = cachefiles_ondemand_read(object, off, len);
> - if (ret)
> - goto presubmission_error;
> -
> - /* fail the read if no progress achieved */
> - read_hole = NETFS_READ_HOLE_FAIL;
> - goto retry;
> - }
> -

Unexplained deletion of newly added code.

David
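
The first point above, an error message that stays accurate whether or not the on-demand option is built in, can be sketched as follows. This is a user-space model rather than the cachefiles code: `demo_parse_bind()` and `DEMO_ONDEMAND_MODE` are hypothetical, the boolean parameter stands in for the build-time option, and -22 stands in for -EINVAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_ONDEMAND_MODE 0x1u	/* hypothetical flag bit */

/*
 * Parse the argument of a 'bind'-like command. @ondemand_enabled models
 * the build-time option. Returns 0 on success or -22 (standing in for
 * -EINVAL); @errmsg is set to the message that would be logged.
 */
static int demo_parse_bind(const char *args, bool ondemand_enabled,
			   unsigned int *flags, const char **errmsg)
{
	*errmsg = NULL;
	if (ondemand_enabled && !strcmp(args, "ondemand")) {
		*flags |= DEMO_ONDEMAND_MODE;
		return 0;
	}
	if (*args) {
		/* Accurate in both configurations, unlike "takes no argument". */
		*errmsg = "Invalid argument to the 'bind' command";
		return -22;
	}
	return 0;
}
```

An empty argument still selects the original mode, "ondemand" is accepted only when the option is enabled, and anything else is rejected with the same message in both configurations.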

2022-04-22 19:21:33

by Gao Xiang

Subject: Re: [PATCH v9 16/21] erofs: register fscache context for extra data blobs

On Fri, Apr 15, 2022 at 08:36:09PM +0800, Jeffle Xu wrote:
> Similar to the multi device mode, erofs could be mounted from one
> primary data blob (mandatory) and multiple extra data blobs (optional).
>
> Register fscache context for each extra data blob.
>
> Signed-off-by: Jeffle Xu <[email protected]>

Reviewed-by: Gao Xiang <[email protected]>

Thanks,
Gao Xiang

> ---
> fs/erofs/data.c | 3 +++
> fs/erofs/internal.h | 2 ++
> fs/erofs/super.c | 8 +++++++-
> 3 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> index bc22642358ec..14b64d960541 100644
> --- a/fs/erofs/data.c
> +++ b/fs/erofs/data.c
> @@ -199,6 +199,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
> map->m_bdev = sb->s_bdev;
> map->m_daxdev = EROFS_SB(sb)->dax_dev;
> map->m_dax_part_off = EROFS_SB(sb)->dax_part_off;
> + map->m_fscache = EROFS_SB(sb)->s_fscache;
>
> if (map->m_deviceid) {
> down_read(&devs->rwsem);
> @@ -210,6 +211,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
> map->m_bdev = dif->bdev;
> map->m_daxdev = dif->dax_dev;
> map->m_dax_part_off = dif->dax_part_off;
> + map->m_fscache = dif->fscache;
> up_read(&devs->rwsem);
> } else if (devs->extra_devices) {
> down_read(&devs->rwsem);
> @@ -227,6 +229,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
> map->m_bdev = dif->bdev;
> map->m_daxdev = dif->dax_dev;
> map->m_dax_part_off = dif->dax_part_off;
> + map->m_fscache = dif->fscache;
> break;
> }
> }
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 386658416159..fa488af8dfcf 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -49,6 +49,7 @@ typedef u32 erofs_blk_t;
>
> struct erofs_device_info {
> char *path;
> + struct erofs_fscache *fscache;
> struct block_device *bdev;
> struct dax_device *dax_dev;
> u64 dax_part_off;
> @@ -482,6 +483,7 @@ static inline int z_erofs_map_blocks_iter(struct inode *inode,
> #endif /* !CONFIG_EROFS_FS_ZIP */
>
> struct erofs_map_dev {
> + struct erofs_fscache *m_fscache;
> struct block_device *m_bdev;
> struct dax_device *m_daxdev;
> u64 m_dax_part_off;
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index 61dc900295f9..c6755bcae4a6 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -259,7 +259,12 @@ static int erofs_init_devices(struct super_block *sb,
> }
> dis = ptr + erofs_blkoff(pos);
>
> - if (!erofs_is_fscache_mode(sb)) {
> + if (erofs_is_fscache_mode(sb)) {
> + err = erofs_fscache_register_cookie(sb, &dif->fscache,
> + dif->path, false);
> + if (err)
> + break;
> + } else {
> bdev = blkdev_get_by_path(dif->path,
> FMODE_READ | FMODE_EXCL,
> sb->s_type);
> @@ -710,6 +715,7 @@ static int erofs_release_device_info(int id, void *ptr, void *data)
> fs_put_dax(dif->dax_dev);
> if (dif->bdev)
> blkdev_put(dif->bdev, FMODE_READ | FMODE_EXCL);
> + erofs_fscache_unregister_cookie(&dif->fscache);
> kfree(dif->path);
> kfree(dif);
> return 0;
> --
> 2.27.0

2022-04-22 20:31:50

by David Howells

Subject: Re: [PATCH v9 03/21] cachefiles: unbind cachefiles gracefully in on-demand mode

Jeffle Xu <[email protected]> wrote:

> + struct kref unbind_pincount;/* refcount to do daemon unbind */

Please use refcount_t or atomic_t, especially as this isn't the refcount for
the structure.

> - cachefiles_daemon_unbind(cache);
> -
> /* clean up the control file interface */
> cache->cachefilesd = NULL;
> file->private_data = NULL;
> cachefiles_open = 0;

Please call cachefiles_daemon_unbind() before the cleanup.

David

2022-04-22 22:11:08

by Gao Xiang

Subject: Re: [PATCH v9 21/21] erofs: add 'fsid' mount option

On Fri, Apr 15, 2022 at 08:36:14PM +0800, Jeffle Xu wrote:
> Introduce 'fsid' mount option to enable on-demand read semantics, in
> which case, erofs will be mounted from data blobs. Users could specify
> the name of primary data blob by this mount option.
>
> Signed-off-by: Jeffle Xu <[email protected]>

Reviewed-by: Gao Xiang <[email protected]>

Thanks,
Gao Xiang

> ---
> fs/erofs/super.c | 31 ++++++++++++++++++++++++++++++-
> fs/erofs/sysfs.c | 4 ++--
> 2 files changed, 32 insertions(+), 3 deletions(-)
>
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index f68ba929100d..4a623630e1c4 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -371,6 +371,8 @@ static int erofs_read_superblock(struct super_block *sb)
>
> if (erofs_sb_has_ztailpacking(sbi))
> erofs_info(sb, "EXPERIMENTAL compressed inline data feature in use. Use at your own risk!");
> + if (erofs_is_fscache_mode(sb))
> + erofs_info(sb, "EXPERIMENTAL fscache-based on-demand read feature in use. Use at your own risk!");
> out:
> erofs_put_metabuf(&buf);
> return ret;
> @@ -399,6 +401,7 @@ enum {
> Opt_dax,
> Opt_dax_enum,
> Opt_device,
> + Opt_fsid,
> Opt_err
> };
>
> @@ -423,6 +426,7 @@ static const struct fs_parameter_spec erofs_fs_parameters[] = {
> fsparam_flag("dax", Opt_dax),
> fsparam_enum("dax", Opt_dax_enum, erofs_dax_param_enums),
> fsparam_string("device", Opt_device),
> + fsparam_string("fsid", Opt_fsid),
> {}
> };
>
> @@ -518,6 +522,16 @@ static int erofs_fc_parse_param(struct fs_context *fc,
> }
> ++ctx->devs->extra_devices;
> break;
> + case Opt_fsid:
> +#ifdef CONFIG_EROFS_FS_ONDEMAND
> + kfree(ctx->opt.fsid);
> + ctx->opt.fsid = kstrdup(param->string, GFP_KERNEL);
> + if (!ctx->opt.fsid)
> + return -ENOMEM;
> +#else
> + errorfc(fc, "fsid option not supported");
> +#endif
> + break;
> default:
> return -ENOPARAM;
> }
> @@ -604,6 +618,7 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
>
> sb->s_fs_info = sbi;
> sbi->opt = ctx->opt;
> + ctx->opt.fsid = NULL;
> sbi->devs = ctx->devs;
> ctx->devs = NULL;
>
> @@ -690,6 +705,11 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
>
> static int erofs_fc_get_tree(struct fs_context *fc)
> {
> + struct erofs_fs_context *ctx = fc->fs_private;
> +
> + if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && ctx->opt.fsid)
> + return get_tree_nodev(fc, erofs_fc_fill_super);
> +
> return get_tree_bdev(fc, erofs_fc_fill_super);
> }
>
> @@ -739,6 +759,7 @@ static void erofs_fc_free(struct fs_context *fc)
> struct erofs_fs_context *ctx = fc->fs_private;
>
> erofs_free_dev_context(ctx->devs);
> + kfree(ctx->opt.fsid);
> kfree(ctx);
> }
>
> @@ -779,7 +800,10 @@ static void erofs_kill_sb(struct super_block *sb)
>
> WARN_ON(sb->s_magic != EROFS_SUPER_MAGIC);
>
> - kill_block_super(sb);
> + if (erofs_is_fscache_mode(sb))
> + generic_shutdown_super(sb);
> + else
> + kill_block_super(sb);
>
> sbi = EROFS_SB(sb);
> if (!sbi)
> @@ -789,6 +813,7 @@ static void erofs_kill_sb(struct super_block *sb)
> fs_put_dax(sbi->dax_dev);
> erofs_fscache_unregister_cookie(&sbi->s_fscache);
> erofs_fscache_unregister_fs(sb);
> + kfree(sbi->opt.fsid);
> kfree(sbi);
> sb->s_fs_info = NULL;
> }
> @@ -938,6 +963,10 @@ static int erofs_show_options(struct seq_file *seq, struct dentry *root)
> seq_puts(seq, ",dax=always");
> if (test_opt(opt, DAX_NEVER))
> seq_puts(seq, ",dax=never");
> +#ifdef CONFIG_EROFS_FS_ONDEMAND
> + if (opt->fsid)
> + seq_printf(seq, ",fsid=%s", opt->fsid);
> +#endif
> return 0;
> }
>
> diff --git a/fs/erofs/sysfs.c b/fs/erofs/sysfs.c
> index f3babf1e6608..c1383e508bbe 100644
> --- a/fs/erofs/sysfs.c
> +++ b/fs/erofs/sysfs.c
> @@ -205,8 +205,8 @@ int erofs_register_sysfs(struct super_block *sb)
>
> sbi->s_kobj.kset = &erofs_root;
> init_completion(&sbi->s_kobj_unregister);
> - err = kobject_init_and_add(&sbi->s_kobj, &erofs_sb_ktype, NULL,
> - "%s", sb->s_id);
> + err = kobject_init_and_add(&sbi->s_kobj, &erofs_sb_ktype, NULL, "%s",
> + erofs_is_fscache_mode(sb) ? sbi->opt.fsid : sb->s_id);
> if (err)
> goto put_sb_kobj;
> return 0;
> --
> 2.27.0

2022-04-22 22:42:52

by Jingbo Xu

Subject: Re: [PATCH v9 06/21] cachefiles: enable on-demand read mode



On 4/21/22 10:17 PM, David Howells wrote:
> Jeffle Xu <[email protected]> wrote:
>
>> + if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) &&
>> + !strcmp(args, "ondemand")) {
>> + set_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags);
>> + } else if (*args) {
>> + pr_err("'bind' command doesn't take an argument\n");
>
> The error message isn't true if CONFIG_CACHEFILES_ONDEMAND=y. It would be
> better to say "Invalid argument to the 'bind' command".

Right, otherwise users may get confused. Will be fixed in the next version.
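
For illustration only, here is a minimal userspace sketch of the argument
check with the suggested message. sketch_parse_bind(), SKETCH_ONDEMAND_MODE
and the ondemand_enabled parameter are hypothetical stand-ins for the bind
handler, the CACHEFILES_ONDEMAND_MODE bit and
IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND). The -EINVAL return value is an
assumption, and this is not the actual patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the CACHEFILES_ONDEMAND_MODE bit in cache->flags. */
#define SKETCH_ONDEMAND_MODE (1UL << 0)

/*
 * Parse the argument of a 'bind' command.
 * ondemand_enabled models IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND).
 * Returns 0 on success, or -EINVAL (assumed) for an unrecognised argument.
 */
int sketch_parse_bind(const char *args, bool ondemand_enabled,
                      unsigned long *flags)
{
    if (ondemand_enabled && strcmp(args, "ondemand") == 0) {
        *flags |= SKETCH_ONDEMAND_MODE;
        return 0;
    }
    if (*args) {
        /* The message stays accurate whether or not on-demand mode is built in. */
        fprintf(stderr, "Invalid argument to the 'bind' command\n");
        return -EINVAL;
    }
    /* An empty argument keeps the default (non-on-demand) mode. */
    return 0;
}
```

With this shape, "bind ondemand" sets the mode bit when the feature is
enabled, a bare "bind" leaves the flags untouched, and any other argument is
rejected with a message that holds in both configurations.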

>
>> -retry:
>> /* If the caller asked us to seek for data before doing the read, then
>> * we should do that now. If we find a gap, we fill it with zeros.
>> */
>> @@ -120,16 +119,6 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
>> if (read_hole == NETFS_READ_HOLE_FAIL)
>> goto presubmission_error;
>>
>> - if (read_hole == NETFS_READ_HOLE_ONDEMAND) {
>> - ret = cachefiles_ondemand_read(object, off, len);
>> - if (ret)
>> - goto presubmission_error;
>> -
>> - /* fail the read if no progress achieved */
>> - read_hole = NETFS_READ_HOLE_FAIL;
>> - goto retry;
>> - }
>> -
>

Sorry, this is my mistake from doing "git rebase". The previous version
(v8) actually called cachefiles_ondemand_read() in cachefiles_read().
However, as explained in the commit message of patch 5 ("cachefiles:
implement on-demand read"), fscache_read() can only detect a full cache
miss of the requested file range; it can't detect a partial cache miss,
i.e. a hole inside the requested file range.

Thus in this patchset (v9), we move the call site of
cachefiles_ondemand_read() from cachefiles_read() to
cachefiles_prepare_read(). The above "deletion of newly added code"
actually reverts the previous change to cachefiles_read(). It was
mistakenly folded into this patch when I was doing "git rebase". It
should have been folded into patch 5 ("cachefiles: implement on-demand
read"), which initially introduced the change to cachefiles_read().
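
To make the full-miss vs. partial-miss distinction concrete, here is a toy
userspace model. It is not the kernel code: the per-block bool array and all
sketch_* names are hypothetical, and the real paths work on file offsets
rather than block indices.

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_miss {
    SKETCH_FULL_HIT,     /* every block in the range is cached */
    SKETCH_PARTIAL_MISS, /* some blocks cached, at least one hole */
    SKETCH_FULL_MISS     /* no block in the range is cached */
};

/*
 * Classify the range [start, start + len) of a toy cache, where
 * cached[i] is true when block i is present. The caller must keep
 * the range inside the array and pass len > 0.
 */
enum sketch_miss sketch_classify(const bool *cached, size_t start, size_t len)
{
    size_t present = 0;

    for (size_t i = start; i < start + len; i++)
        if (cached[i])
            present++;

    if (present == len)
        return SKETCH_FULL_HIT;
    if (present == 0)
        return SKETCH_FULL_MISS;
    return SKETCH_PARTIAL_MISS;
}

/*
 * Models a check that only notices "no data at all in the range",
 * i.e. the limitation described above: a partial miss goes unseen.
 */
bool sketch_coarse_check_sees_miss(const bool *cached, size_t start, size_t len)
{
    return sketch_classify(cached, start, len) == SKETCH_FULL_MISS;
}
```

For a cache laid out as { hit, hit, hole, hit, hole, hole }, the range
covering blocks 0..3 is a partial miss that the coarse check does not report,
while the range covering blocks 4..5 is a full miss that it does report.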

Apologies for the careless mistake.


--
Thanks,
Jeffle