Changes since V4:
0. Relinquish the volume after mutex unlock in erofs_fscache_domain_put().
1. Use kill_anon_super() instead of kill_litter_super() to unmount the pseudo mnt.
2. Extract erofs_fscache_relinquish_cookie() to reduce duplicated code.
3. Add code comments.
4. Remove unnecessary local variable initialization.
5. Add "Fixes" line to patch 1.
6. Add Reviewed-by lines from Jingbo Xu.
[Kernel Patchset]
===============
Git tree:
https://github.com/userzj/linux.git zhujia/shared-domain-v5
Git web:
https://github.com/userzj/linux/tree/zhujia/shared-domain-v5
[User Daemon for Quick Test]
============================
Git web:
https://github.com/userzj/demand-read-cachefilesd/tree/shared-domain
More test cases will be added to:
https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/log/?h=experimental-tests-fscache
[E2E Container Demo for Quick Test]
===================================
[Issue]
https://github.com/containerd/nydus-snapshotter/issues/161
[PR]
https://github.com/containerd/nydus-snapshotter/pull/162
[Background]
============
In ondemand read mode, each erofs mountpoint is represented by an
individual fscache volume, and its bootstrap and data blobs are
represented by cookies within that volume.
Since cookies can't be shared between fscache volumes, data blobs
can't be shared between mountpoints even if they are exactly the same.
[Introduction]
==============
Here we introduce the erofs shared domain to resolve the case above.
Several erofs filesystems can belong to one domain, and data blobs can
be shared among the erofs filesystems of the same domain.
[Usage]
Users can specify the 'domain_id' mount option to create or join a
domain, within which the same cookies (blobs) are reused.
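As a hedged illustration of the usage above, the C sketch below composes the option string a caller might pass as the data argument of mount(2) for an erofs mount in fscache ondemand mode. The helper erofs_build_mount_opts() is hypothetical and not part of this series. It treats fsid as required because fscache mode is selected by the fsid option, so domain_id only matters alongside it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of this patchset): build the mount data
 * string for an erofs mount in fscache ondemand mode. fsid is required;
 * domain_id is optional and, when present, creates or joins a domain.
 * Returns the string length, or -1 on invalid input or truncation.
 */
static int erofs_build_mount_opts(char *buf, size_t len,
				  const char *fsid, const char *domain_id)
{
	int n;

	assert(buf != NULL && len > 0);
	if (!fsid || !*fsid)
		return -1;
	/* a ',' inside a value would be parsed as a separate option */
	if (strchr(fsid, ',') || (domain_id && strchr(domain_id, ',')))
		return -1;
	if (domain_id && *domain_id)
		n = snprintf(buf, len, "fsid=%s,domain_id=%s", fsid, domain_id);
	else
		n = snprintf(buf, len, "fsid=%s", fsid);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* output would have been truncated */
	return n;
}
```

Two mounts whose option strings carry the same domain_id (with different fsid values) would join the same domain and could then share cookies for identical data blobs.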
[Design]
========
1. Use a pseudo mnt to manage the domain's lifecycle.
2. Use a linked list to maintain and traverse domains.
3. Use the pseudo sb to create anonymous inodes that record each
cookie's info and manage the cookies' lifecycle.
[Flow Path]
===========
1. The user specifies a new 'domain_id' in the mount options.
1.1 Traverse the domain list and compare domain_id with each existing
domain. [Miss]
1.2 Create a new domain (volume) and add it to the domain list.
1.3 Traverse the pseudo sb's inode list and compare the cookie name
with existing cookies. [Miss]
1.4 Allocate new anonymous inodes and cookies.
2. The user specifies an existing 'domain_id' in the mount options, and
the data blob already exists in the domain.
2.1 Traverse the domain list and compare domain_id with each existing
domain. [Hit]
2.2 Reuse the domain and increase its refcnt.
2.3 Traverse the pseudo sb's inode list and compare the cookie name
with existing cookies. [Hit]
2.4 Reuse the cookie and increase its refcnt.
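The domain-level half of this flow (steps 1.1/1.2 and 2.1/2.2) can be modelled roughly as below. This is a hypothetical single-threaded userspace sketch, not the kernel code: struct domain, domain_get(), domain_put() and copy_str() are illustrative names, the fscache volume each domain owns is omitted, and the locking that erofs_domain_list_lock provides in the patch is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a domain entry on erofs_domain_list. */
struct domain {
	char *domain_id;
	int ref;		/* models refcount_t domain->ref */
	struct domain *next;
};

static struct domain *domain_list;	/* models erofs_domain_list */

static char *copy_str(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

/* Steps x.1/x.2: look up domain_id, reuse on hit, create on miss. */
static struct domain *domain_get(const char *domain_id)
{
	struct domain *d;

	for (d = domain_list; d; d = d->next) {
		if (strcmp(d->domain_id, domain_id) == 0) {
			d->ref++;		/* [Hit]: reuse, bump refcnt */
			return d;
		}
	}
	d = malloc(sizeof(*d));			/* [Miss]: create and link */
	if (!d)
		return NULL;
	d->domain_id = copy_str(domain_id);
	if (!d->domain_id) {
		free(d);
		return NULL;
	}
	d->ref = 1;
	d->next = domain_list;
	domain_list = d;
	return d;
}

/* Drop one reference; unlink and free the domain on the last one. */
static void domain_put(struct domain *d)
{
	struct domain **pp;

	assert(d->ref > 0);
	if (--d->ref)
		return;
	for (pp = &domain_list; *pp != d; pp = &(*pp)->next)
		;
	*pp = d->next;				/* like list_del() */
	free(d->domain_id);
	free(d);
}
```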
RFC: https://lore.kernel.org/all/YxAlO%[email protected]/
V1: https://lore.kernel.org/all/[email protected]/
V2: https://lore.kernel.org/all/[email protected]/
V3: https://lore.kernel.org/all/[email protected]/
V4: https://lore.kernel.org/all/[email protected]/
Jia Zhu (6):
erofs: use kill_anon_super() to kill super in fscache mode
erofs: code clean up for fscache
erofs: introduce fscache-based domain
erofs: introduce a pseudo mnt to manage shared cookies
erofs: Support sharing cookies in the same domain
erofs: introduce 'domain_id' mount option
fs/erofs/fscache.c | 264 ++++++++++++++++++++++++++++++++++++++------
fs/erofs/internal.h | 32 ++++--
fs/erofs/super.c | 73 +++++++++---
fs/erofs/sysfs.c | 19 +++-
4 files changed, 325 insertions(+), 63 deletions(-)
--
2.20.1
Use kill_anon_super() instead of generic_shutdown_super() since the
mount() in erofs fscache mode uses get_tree_nodev(), and the associated
anon bdev needs to be freed. kill_anon_super() calls
generic_shutdown_super() and then frees the anon bdev.
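To show why the pairing matters, the sketch below models anonymous device numbers as a tiny fixed pool in userspace. All model_* names and the pool size are hypothetical. It illustrates that kill_anon_super() runs generic_shutdown_super() and then releases the anonymous device number, whereas calling generic_shutdown_super() alone leaks one number per unmount.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pool of anonymous device numbers (tiny, for illustration). */
#define MODEL_NR_ANON_DEVS 4
static bool model_anon_dev_used[MODEL_NR_ANON_DEVS];

struct model_sb {
	int s_dev;
};

/* Models get_tree_nodev() reserving a free anonymous device number. */
static int model_get_tree_nodev(struct model_sb *sb)
{
	for (int i = 0; i < MODEL_NR_ANON_DEVS; i++) {
		if (!model_anon_dev_used[i]) {
			model_anon_dev_used[i] = true;
			sb->s_dev = i;
			return 0;
		}
	}
	return -1;	/* pool exhausted: the allocation would fail */
}

/* Models generic_shutdown_super(): tears down the sb, keeps s_dev reserved. */
static void model_generic_shutdown_super(struct model_sb *sb)
{
	(void)sb;
}

/* Models kill_anon_super(): shut down, then free the anon device number. */
static void model_kill_anon_super(struct model_sb *sb)
{
	int dev = sb->s_dev;

	model_generic_shutdown_super(sb);
	assert(model_anon_dev_used[dev]);
	model_anon_dev_used[dev] = false;
}
```

Repeated mount/kill cycles using model_kill_anon_super() never exhaust the pool, while using only model_generic_shutdown_super() runs out after MODEL_NR_ANON_DEVS mounts.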
Fixes: 9c0cc9c729657 ("erofs: add 'fsid' mount option")
Suggested-by: Jingbo Xu <[email protected]>
Signed-off-by: Jia Zhu <[email protected]>
Reviewed-by: Jingbo Xu <[email protected]>
---
fs/erofs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 3173debeaa5a..9716d355a63e 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -879,7 +879,7 @@ static void erofs_kill_sb(struct super_block *sb)
WARN_ON(sb->s_magic != EROFS_SUPER_MAGIC);
if (erofs_is_fscache_mode(sb))
- generic_shutdown_super(sb);
+ kill_anon_super(sb);
else
kill_block_super(sb);
--
2.20.1
Introduce the 'domain_id' mount option to enable shared domain semantics.
With this option, the related cookie is shared if two mountpoints in the
same domain have the same data blob. Users can specify the name of the
domain with this mount option.
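The sysfs hunk of this patch names the per-sb kobject after the mount options. A userspace sketch of that naming rule follows; erofs_sysfs_name() is a hypothetical stand-in for the logic in erofs_register_sysfs(). Prefixing the domain presumably keeps sysfs entries distinct when mounts in different domains use the same fsid.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace model of the kobject naming in
 * erofs_register_sysfs():
 *   fscache mode with domain_id  -> "<domain_id>,<fsid>"
 *   fscache mode, no domain_id   -> "<fsid>"
 *   block-device mode            -> s_id (the device name)
 * Returns a malloc()ed string that the caller frees, or NULL.
 */
static char *erofs_sysfs_name(int fscache_mode, const char *domain_id,
			      const char *fsid, const char *s_id)
{
	const char *first;
	const char *second = NULL;
	size_t len;
	char *name;

	if (fscache_mode && domain_id) {
		first = domain_id;
		second = fsid;
		assert(second != NULL);
	} else if (fscache_mode) {
		first = fsid;
	} else {
		first = s_id;
	}
	assert(first != NULL);

	len = strlen(first) + 1;		/* string plus NUL */
	if (second)
		len += strlen(second) + 1;	/* plus ',' separator */
	name = malloc(len);
	if (!name)
		return NULL;
	if (second)
		snprintf(name, len, "%s,%s", first, second);
	else
		snprintf(name, len, "%s", first);
	return name;
}
```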
Signed-off-by: Jia Zhu <[email protected]>
Reviewed-by: Jingbo Xu <[email protected]>
---
fs/erofs/super.c | 17 +++++++++++++++++
fs/erofs/sysfs.c | 19 +++++++++++++++++--
2 files changed, 34 insertions(+), 2 deletions(-)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index ab746181ae08..9f7fe6c04e65 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -440,6 +440,7 @@ enum {
Opt_dax_enum,
Opt_device,
Opt_fsid,
+ Opt_domain_id,
Opt_err
};
@@ -465,6 +466,7 @@ static const struct fs_parameter_spec erofs_fs_parameters[] = {
fsparam_enum("dax", Opt_dax_enum, erofs_dax_param_enums),
fsparam_string("device", Opt_device),
fsparam_string("fsid", Opt_fsid),
+ fsparam_string("domain_id", Opt_domain_id),
{}
};
@@ -568,6 +570,16 @@ static int erofs_fc_parse_param(struct fs_context *fc,
return -ENOMEM;
#else
errorfc(fc, "fsid option not supported");
+#endif
+ break;
+ case Opt_domain_id:
+#ifdef CONFIG_EROFS_FS_ONDEMAND
+ kfree(ctx->opt.domain_id);
+ ctx->opt.domain_id = kstrdup(param->string, GFP_KERNEL);
+ if (!ctx->opt.domain_id)
+ return -ENOMEM;
+#else
+ errorfc(fc, "domain_id option not supported");
#endif
break;
default:
@@ -702,6 +714,7 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
sb->s_fs_info = sbi;
sbi->opt = ctx->opt;
ctx->opt.fsid = NULL;
+ ctx->opt.domain_id = NULL;
sbi->devs = ctx->devs;
ctx->devs = NULL;
@@ -846,6 +859,7 @@ static void erofs_fc_free(struct fs_context *fc)
erofs_free_dev_context(ctx->devs);
kfree(ctx->opt.fsid);
+ kfree(ctx->opt.domain_id);
kfree(ctx);
}
@@ -916,6 +930,7 @@ static void erofs_kill_sb(struct super_block *sb)
fs_put_dax(sbi->dax_dev, NULL);
erofs_fscache_unregister_fs(sb);
kfree(sbi->opt.fsid);
+ kfree(sbi->opt.domain_id);
kfree(sbi);
sb->s_fs_info = NULL;
}
@@ -1068,6 +1083,8 @@ static int erofs_show_options(struct seq_file *seq, struct dentry *root)
#ifdef CONFIG_EROFS_FS_ONDEMAND
if (opt->fsid)
seq_printf(seq, ",fsid=%s", opt->fsid);
+ if (opt->domain_id)
+ seq_printf(seq, ",domain_id=%s", opt->domain_id);
#endif
return 0;
}
diff --git a/fs/erofs/sysfs.c b/fs/erofs/sysfs.c
index c1383e508bbe..341fb43ad587 100644
--- a/fs/erofs/sysfs.c
+++ b/fs/erofs/sysfs.c
@@ -201,12 +201,27 @@ static struct kobject erofs_feat = {
int erofs_register_sysfs(struct super_block *sb)
{
struct erofs_sb_info *sbi = EROFS_SB(sb);
+ char *name;
+ char *str = NULL;
int err;
+ if (erofs_is_fscache_mode(sb)) {
+ if (sbi->opt.domain_id) {
+ str = kasprintf(GFP_KERNEL, "%s,%s", sbi->opt.domain_id,
+ sbi->opt.fsid);
+ if (!str)
+ return -ENOMEM;
+ name = str;
+ } else {
+ name = sbi->opt.fsid;
+ }
+ } else {
+ name = sb->s_id;
+ }
sbi->s_kobj.kset = &erofs_root;
init_completion(&sbi->s_kobj_unregister);
- err = kobject_init_and_add(&sbi->s_kobj, &erofs_sb_ktype, NULL, "%s",
- erofs_is_fscache_mode(sb) ? sbi->opt.fsid : sb->s_id);
+ err = kobject_init_and_add(&sbi->s_kobj, &erofs_sb_ktype, NULL, "%s", name);
+ kfree(str);
if (err)
goto put_sb_kobj;
return 0;
--
2.20.1
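Use a pseudo mnt to manage shared cookies. The pseudo mnt is mounted
when the first domain is initialized and unmounted once the last domain
is removed from the domain list.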
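The pseudo mnt lifecycle in this patch can be sketched as a single-threaded userspace model. Counters replace kern_mount()/kern_unmount(), and all model_* names are hypothetical. In the patch, the teardown side runs while holding erofs_domain_list_lock (see erofs_fscache_domain_put()).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-threaded model of the pseudo mnt lifecycle:
 * the first domain to be initialized mounts it, and removing the
 * last domain from the domain list unmounts it.
 */
static bool model_pseudo_mounted;	/* models erofs_pseudo_mnt != NULL */
static int model_nr_domains;		/* models the domain list length */
static int model_nr_mounts, model_nr_unmounts;	/* for observation only */

/* Models the new check in erofs_fscache_init_domain(). */
static void model_init_domain(void)
{
	if (!model_pseudo_mounted) {	/* if (!erofs_pseudo_mnt) kern_mount() */
		model_pseudo_mounted = true;
		model_nr_mounts++;
	}
	model_nr_domains++;		/* list_add() */
}

/* Models erofs_fscache_domain_put() once a domain's refcount hits zero. */
static void model_release_domain(void)
{
	assert(model_nr_domains > 0);
	model_nr_domains--;		/* list_del() */
	if (model_nr_domains == 0) {	/* list_empty(): kern_unmount() */
		model_pseudo_mounted = false;
		model_nr_unmounts++;
	}
}
```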
Signed-off-by: Jia Zhu <[email protected]>
Reviewed-by: Jingbo Xu <[email protected]>
---
fs/erofs/fscache.c | 13 +++++++++++++
fs/erofs/internal.h | 1 +
fs/erofs/super.c | 33 +++++++++++++++++++++++++++++++--
3 files changed, 45 insertions(+), 2 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 9c82284e66ee..4a7346b9fa73 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -8,6 +8,7 @@
static DEFINE_MUTEX(erofs_domain_list_lock);
static LIST_HEAD(erofs_domain_list);
+static struct vfsmount *erofs_pseudo_mnt;
static struct netfs_io_request *erofs_fscache_alloc_request(struct address_space *mapping,
loff_t start, size_t len)
@@ -428,6 +429,10 @@ static void erofs_fscache_domain_put(struct erofs_domain *domain)
mutex_lock(&erofs_domain_list_lock);
if (refcount_dec_and_test(&domain->ref)) {
list_del(&domain->list);
+ if (list_empty(&erofs_domain_list)) {
+ kern_unmount(erofs_pseudo_mnt);
+ erofs_pseudo_mnt = NULL;
+ }
mutex_unlock(&erofs_domain_list_lock);
fscache_relinquish_volume(domain->volume, NULL, false);
kfree(domain->domain_id);
@@ -482,6 +487,14 @@ static int erofs_fscache_init_domain(struct super_block *sb)
if (err)
goto out;
+ if (!erofs_pseudo_mnt) {
+ erofs_pseudo_mnt = kern_mount(&erofs_fs_type);
+ if (IS_ERR(erofs_pseudo_mnt)) {
+ err = PTR_ERR(erofs_pseudo_mnt);
+ goto out;
+ }
+ }
+
domain->volume = sbi->volume;
refcount_set(&domain->ref, 1);
list_add(&domain->list, &erofs_domain_list);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 4c11313a072f..273fb35170e2 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -402,6 +402,7 @@ struct page *erofs_grab_cache_page_nowait(struct address_space *mapping,
}
extern const struct super_operations erofs_sops;
+extern struct file_system_type erofs_fs_type;
extern const struct address_space_operations erofs_raw_access_aops;
extern const struct address_space_operations z_erofs_aops;
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 884e7ed3d760..ab746181ae08 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -676,6 +676,13 @@ static const struct export_operations erofs_export_ops = {
.get_parent = erofs_get_parent,
};
+static int erofs_fc_fill_pseudo_super(struct super_block *sb, struct fs_context *fc)
+{
+ static const struct tree_descr empty_descr = {""};
+
+ return simple_fill_super(sb, EROFS_SUPER_MAGIC, &empty_descr);
+}
+
static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
{
struct inode *inode;
@@ -776,6 +783,11 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
return 0;
}
+static int erofs_fc_anon_get_tree(struct fs_context *fc)
+{
+ return get_tree_nodev(fc, erofs_fc_fill_pseudo_super);
+}
+
static int erofs_fc_get_tree(struct fs_context *fc)
{
struct erofs_fs_context *ctx = fc->fs_private;
@@ -844,10 +856,21 @@ static const struct fs_context_operations erofs_context_ops = {
.free = erofs_fc_free,
};
+static const struct fs_context_operations erofs_anon_context_ops = {
+ .get_tree = erofs_fc_anon_get_tree,
+};
+
static int erofs_init_fs_context(struct fs_context *fc)
{
- struct erofs_fs_context *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+ struct erofs_fs_context *ctx;
+
+ /* pseudo mount for anon inodes */
+ if (fc->sb_flags & SB_KERNMOUNT) {
+ fc->ops = &erofs_anon_context_ops;
+ return 0;
+ }
+ ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
if (!ctx)
return -ENOMEM;
ctx->devs = kzalloc(sizeof(struct erofs_dev_context), GFP_KERNEL);
@@ -874,6 +897,12 @@ static void erofs_kill_sb(struct super_block *sb)
WARN_ON(sb->s_magic != EROFS_SUPER_MAGIC);
+ /* pseudo mount for anon inodes */
+ if (sb->s_flags & SB_KERNMOUNT) {
+ kill_anon_super(sb);
+ return;
+ }
+
if (erofs_is_fscache_mode(sb))
kill_anon_super(sb);
else
@@ -907,7 +936,7 @@ static void erofs_put_super(struct super_block *sb)
erofs_fscache_unregister_fs(sb);
}
-static struct file_system_type erofs_fs_type = {
+struct file_system_type erofs_fs_type = {
.owner = THIS_MODULE,
.name = "erofs",
.init_fs_context = erofs_init_fs_context,
--
2.20.1
Several erofs filesystems can belong to one domain, and data blobs can
be shared among the erofs filesystems of the same domain.
Users can specify the domain_id mount option to create or join a
domain.
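The cookie-level half of the flow, together with the release rule in erofs_fscache_unregister_cookie(), can be modelled as below. This is a hypothetical single-threaded userspace sketch: count stands in for the anonymous inode's i_count, the list stands in for the pseudo sb's inode list, and the places where the patch holds erofs_domain_cookies_lock are marked in comments. The model keeps a pointer to the caller's name string, whereas the patch duplicates it with kstrdup().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: only the domain's refcount matters here. */
struct domain {
	int ref;
};

struct shared_cookie {
	const char *name;		/* kstrdup()ed in the patch */
	int count;			/* models anon_inode->i_count */
	struct domain *domain;
	struct shared_cookie *next;
};

static struct shared_cookie *pseudo_sb_inodes;	/* models psb->s_inodes */

/* Lookup-or-create; the patch runs this under erofs_domain_cookies_lock. */
static struct shared_cookie *register_cookie(struct domain *d, const char *name)
{
	struct shared_cookie *c;

	for (c = pseudo_sb_inodes; c; c = c->next) {
		if (c->domain == d && strcmp(c->name, name) == 0) {
			c->count++;		/* [Hit]: igrab() */
			return c;
		}
	}
	c = calloc(1, sizeof(*c));		/* [Miss]: new anon inode */
	if (!c)
		return NULL;
	c->name = name;
	c->count = 1;
	c->domain = d;
	d->ref++;				/* each cookie pins its domain */
	c->next = pseudo_sb_inodes;
	pseudo_sb_inodes = c;
	return c;
}

/* Returns true if this call released the last reference. */
static bool unregister_cookie(struct shared_cookie *c)
{
	struct shared_cookie **pp;
	bool drop;

	/* lock: decide and drop the reference before any new lookup */
	drop = (c->count == 1);
	c->count--;				/* iput() */
	if (drop) {	/* the final iput() evicts the inode from the list */
		for (pp = &pseudo_sb_inodes; *pp != c; pp = &(*pp)->next)
			;
		*pp = c->next;
	}
	/* unlock */
	if (!drop)
		return false;
	assert(c->domain->ref > 0);
	c->domain->ref--;			/* erofs_fscache_domain_put() */
	free(c);				/* relinquish the cookie */
	return true;
}
```

Deciding whether to drop, and dropping the reference, under the same lock that guards the lookup appears to be what keeps a concurrent mount from finding a cookie that is about to be relinquished.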
Signed-off-by: Jia Zhu <[email protected]>
---
fs/erofs/fscache.c | 99 ++++++++++++++++++++++++++++++++++++++++++---
fs/erofs/internal.h | 3 ++
2 files changed, 96 insertions(+), 6 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 4a7346b9fa73..d52d3d0ce9af 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -7,6 +7,7 @@
#include "internal.h"
static DEFINE_MUTEX(erofs_domain_list_lock);
+static DEFINE_MUTEX(erofs_domain_cookies_lock);
static LIST_HEAD(erofs_domain_list);
static struct vfsmount *erofs_pseudo_mnt;
@@ -527,8 +528,8 @@ static int erofs_fscache_register_domain(struct super_block *sb)
return err;
}
-struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb,
- char *name, bool need_inode)
+struct erofs_fscache *erofs_fscache_acquire_cookie(struct super_block *sb,
+ char *name, bool need_inode)
{
struct fscache_volume *volume = EROFS_SB(sb)->volume;
struct erofs_fscache *ctx;
@@ -577,17 +578,103 @@ struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb,
return ERR_PTR(ret);
}
-void erofs_fscache_unregister_cookie(struct erofs_fscache *ctx)
+static void erofs_fscache_relinquish_cookie(struct erofs_fscache *ctx)
{
- if (!ctx)
- return;
-
fscache_unuse_cookie(ctx->cookie, NULL, NULL);
fscache_relinquish_cookie(ctx->cookie, false);
iput(ctx->inode);
+ kfree(ctx->name);
kfree(ctx);
}
+static
+struct erofs_fscache *erofs_fscache_domain_init_cookie(struct super_block *sb,
+ char *name, bool need_inode)
+{
+ int err;
+ struct inode *inode;
+ struct erofs_fscache *ctx;
+ struct erofs_domain *domain = EROFS_SB(sb)->domain;
+
+ ctx = erofs_fscache_acquire_cookie(sb, name, need_inode);
+ if (IS_ERR(ctx))
+ return ctx;
+
+ ctx->name = kstrdup(name, GFP_KERNEL);
+ if (!ctx->name) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ inode = new_inode(erofs_pseudo_mnt->mnt_sb);
+ if (!inode) {
+ kfree(ctx->name);
+ err = -ENOMEM;
+ goto out;
+ }
+
+ ctx->domain = domain;
+ ctx->anon_inode = inode;
+ inode->i_private = ctx;
+ refcount_inc(&domain->ref);
+ return ctx;
+out:
+ erofs_fscache_relinquish_cookie(ctx);
+ return ERR_PTR(err);
+}
+
+static
+struct erofs_fscache *erofs_domain_register_cookie(struct super_block *sb,
+ char *name, bool need_inode)
+{
+ struct inode *inode;
+ struct erofs_fscache *ctx;
+ struct erofs_domain *domain = EROFS_SB(sb)->domain;
+ struct super_block *psb = erofs_pseudo_mnt->mnt_sb;
+
+ mutex_lock(&erofs_domain_cookies_lock);
+ list_for_each_entry(inode, &psb->s_inodes, i_sb_list) {
+ ctx = inode->i_private;
+ if (!ctx || ctx->domain != domain || strcmp(ctx->name, name))
+ continue;
+ igrab(inode);
+ mutex_unlock(&erofs_domain_cookies_lock);
+ return ctx;
+ }
+ ctx = erofs_fscache_domain_init_cookie(sb, name, need_inode);
+ mutex_unlock(&erofs_domain_cookies_lock);
+ return ctx;
+}
+
+struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb,
+ char *name, bool need_inode)
+{
+ if (EROFS_SB(sb)->opt.domain_id)
+ return erofs_domain_register_cookie(sb, name, need_inode);
+ return erofs_fscache_acquire_cookie(sb, name, need_inode);
+}
+
+void erofs_fscache_unregister_cookie(struct erofs_fscache *ctx)
+{
+ bool drop;
+ struct erofs_domain *domain;
+
+ if (!ctx)
+ return;
+ domain = ctx->domain;
+ if (domain) {
+ mutex_lock(&erofs_domain_cookies_lock);
+ drop = atomic_read(&ctx->anon_inode->i_count) == 1;
+ iput(ctx->anon_inode);
+ mutex_unlock(&erofs_domain_cookies_lock);
+ if (!drop)
+ return;
+ }
+
+ erofs_fscache_relinquish_cookie(ctx);
+ erofs_fscache_domain_put(domain);
+}
+
int erofs_fscache_register_fs(struct super_block *sb)
{
int ret;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 273fb35170e2..0f63830c9056 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -109,6 +109,9 @@ struct erofs_domain {
struct erofs_fscache {
struct fscache_cookie *cookie;
struct inode *inode;
+ struct inode *anon_inode;
+ struct erofs_domain *domain;
+ char *name;
};
struct erofs_sb_info {
--
2.20.1
On 9/16/22 4:59 PM, Jia Zhu wrote:
> +static
> +struct erofs_fscache *erofs_fscache_domain_init_cookie(struct super_block *sb,
> + char *name, bool need_inode)
> +{
> + int err;
> + struct inode *inode;
> + struct erofs_fscache *ctx;
> + struct erofs_domain *domain = EROFS_SB(sb)->domain;
> +
> + ctx = erofs_fscache_acquire_cookie(sb, name, need_inode);
> + if (IS_ERR(ctx))
> + return ctx;
> +
> + ctx->name = kstrdup(name, GFP_KERNEL);
> + if (!ctx->name) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + inode = new_inode(erofs_pseudo_mnt->mnt_sb);
> + if (!inode) {
> + kfree(ctx->name);
^
This line can be omitted since erofs_fscache_relinquish_cookie(), which
is called on the error path, already frees ctx->name; keeping it would
free ctx->name twice.
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + ctx->domain = domain;
> + ctx->anon_inode = inode;
> + inode->i_private = ctx;
> + refcount_inc(&domain->ref);
> + return ctx;
> +out:
> + erofs_fscache_relinquish_cookie(ctx);
> + return ERR_PTR(err);
> +}
Otherwise LGTM.
Reviewed-by: Jingbo Xu <[email protected]>
--
Thanks,
Jingbo