Changes since V1:
0. Only initialize one pseudo fs to manage anonymous inodes(cookies).
1. Remove ctx's 'ref' field and replace it with inode's i_count.
2. Add lock in erofs_fscache_unregister_cookie() to avoid race condition
between cookie's get/put/search.
3. Remove useless blank lines.
[Kernel Patchset]
===============
Git tree:
https://github.com/userzj/linux.git zhujia/shared-domain-v2
Git web:
https://github.com/userzj/linux/tree/zhujia/shared-domain-v2
[Background]
============
In ondemand read mode, we use individual volume to present an erofs
mountpoint, cookies to present bootstrap and data blobs.
In which case, since cookies can't be shared between fscache volumes,
even if the data blobs between different mountpoints are exactly same,
they can't be shared.
[Introduction]
==============
Here we introduce erofs shared domain to resolve above mentioned case.
Several erofs filesystems can belong to one domain, and data blobs can
be shared among these erofs filesystems of same domain.
[Usage]
Users could specify 'domain_id' mount option to create or join into a
domain which reuses the same cookies(blobs).
[Design]
========
1. Use pseudo mnt to manage domain's lifecycle.
2. Use a linked list to maintain & traverse domains.
3. Use pseudo sb to create anonymous inode for recording cookie's info
and manage cookies lifecycle.
[Flow Path]
===========
1. User specify a new 'domain_id' in mount option.
1.1 Traverse domain list, compare domain_id with existing domain.[Miss]
1.2 Create a new domain(volume), add it to domain list.
1.3 Traverse pseudo sb's inode list, compare cookie name with
existing cookies.[Miss]
1.4 Alloc new anonymous inodes and cookies.
2. User specify an existing 'domain_id' in mount option and the data
blob is existed in domain.
2.1 Traverse domain list, compare domain_id with existing domain.[Hit]
2.2 Reuse the domain and increase its refcnt.
2.3 Traverse pseudo sb's inode list, compare cookie name with
existing cookies.[Hit]
2.4 Reuse the cookie and increase its refcnt.
[Test]
======
Git web:
https://github.com/userzj/demand-read-cachefilesd/tree/shared-domain
More test cases will be added to:
https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/log/?h=experimental-tests-fscache
RFC: https://lore.kernel.org/all/YxAlO%[email protected]/
V1: https://lore.kernel.org/all/[email protected]/
Jia Zhu (5):
erofs: add 'domain_id' mount option for on-demand read sementics
erofs: introduce fscache-based domain
erofs: add 'domain_id' prefix when register sysfs
erofs: remove duplicated unregister_cookie
erofs: support fscache based shared domain
fs/erofs/fscache.c | 168 +++++++++++++++++++++++++++++++++++++++++++-
fs/erofs/internal.h | 31 +++++++-
fs/erofs/super.c | 94 +++++++++++++++++++------
fs/erofs/sysfs.c | 11 ++-
4 files changed, 278 insertions(+), 26 deletions(-)
--
2.20.1
In shared domain mount procedure, add 'domain_id' prefix to register
sysfs entry. Thus we could distinguish mounts that don't use shared
domain.
Signed-off-by: Jia Zhu <[email protected]>
---
fs/erofs/sysfs.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/erofs/sysfs.c b/fs/erofs/sysfs.c
index c1383e508bbe..c0031d7bd817 100644
--- a/fs/erofs/sysfs.c
+++ b/fs/erofs/sysfs.c
@@ -201,12 +201,21 @@ static struct kobject erofs_feat = {
int erofs_register_sysfs(struct super_block *sb)
{
struct erofs_sb_info *sbi = EROFS_SB(sb);
+ char *name = NULL;
int err;
+ if (erofs_is_fscache_mode(sb)) {
+ name = kasprintf(GFP_KERNEL, "%s%s%s", sbi->opt.domain_id ?
+ sbi->opt.domain_id : "", sbi->opt.domain_id ? "," : "",
+ sbi->opt.fsid);
+ if (!name)
+ return -ENOMEM;
+ }
sbi->s_kobj.kset = &erofs_root;
init_completion(&sbi->s_kobj_unregister);
err = kobject_init_and_add(&sbi->s_kobj, &erofs_sb_ktype, NULL, "%s",
- erofs_is_fscache_mode(sb) ? sbi->opt.fsid : sb->s_id);
+ name ? name : sb->s_id);
+ kfree(name);
if (err)
goto put_sb_kobj;
return 0;
--
2.20.1
Several erofs filesystems can belong to one domain, and data blobs can
be shared among these erofs filesystems of same domain.
Users could specify domain_id mount option to create or join into a domain.
Signed-off-by: Jia Zhu <[email protected]>
---
fs/erofs/fscache.c | 73 +++++++++++++++++++++++++++++++++++++++++++++
fs/erofs/internal.h | 12 ++++++++
fs/erofs/super.c | 10 +++++--
3 files changed, 93 insertions(+), 2 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 439dd3cc096a..c01845808ede 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -559,12 +559,27 @@ int erofs_fscache_register_cookie(struct super_block *sb,
void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
{
struct erofs_fscache *ctx = *fscache;
+ struct erofs_domain *domain;
if (!ctx)
return;
+ domain = ctx->domain;
+ if (domain) {
+ mutex_lock(&domain->mutex);
+ /* Cookie is still in use */
+ if (atomic_read(&ctx->anon_inode->i_count) > 1) {
+ iput(ctx->anon_inode);
+ mutex_unlock(&domain->mutex);
+ return;
+ }
+ iput(ctx->anon_inode);
+ kfree(ctx->name);
+ mutex_unlock(&domain->mutex);
+ }
fscache_unuse_cookie(ctx->cookie, NULL, NULL);
fscache_relinquish_cookie(ctx->cookie, false);
+ erofs_fscache_domain_put(domain);
ctx->cookie = NULL;
iput(ctx->inode);
@@ -609,3 +624,61 @@ void erofs_fscache_unregister_fs(struct super_block *sb)
sbi->volume = NULL;
sbi->domain = NULL;
}
+
+static int erofs_fscache_domain_init_cookie(struct super_block *sb,
+ struct erofs_fscache **fscache, char *name, bool need_inode)
+{
+ int ret;
+ struct inode *inode;
+ struct erofs_fscache *ctx;
+ struct erofs_sb_info *sbi = EROFS_SB(sb);
+ struct erofs_domain *domain = sbi->domain;
+
+ ret = erofs_fscache_register_cookie(sb, &ctx, name, need_inode);
+ if (ret)
+ return ret;
+
+ ctx->name = kstrdup(name, GFP_KERNEL);
+ if (!ctx->name)
+ return -ENOMEM;
+
+ inode = new_inode(erofs_pseudo_mnt->mnt_sb);
+ if (!inode) {
+ kfree(ctx->name);
+ return -ENOMEM;
+ }
+
+ ctx->domain = domain;
+ ctx->anon_inode = inode;
+ inode->i_private = ctx;
+ erofs_fscache_domain_get(domain);
+ *fscache = ctx;
+ return 0;
+}
+
+int erofs_domain_register_cookie(struct super_block *sb,
+ struct erofs_fscache **fscache, char *name, bool need_inode)
+{
+ int err;
+ struct inode *inode;
+ struct erofs_fscache *ctx;
+ struct erofs_sb_info *sbi = EROFS_SB(sb);
+ struct erofs_domain *domain = sbi->domain;
+ struct super_block *psb = erofs_pseudo_mnt->mnt_sb;
+
+ mutex_lock(&domain->mutex);
+ list_for_each_entry(inode, &psb->s_inodes, i_sb_list) {
+ ctx = inode->i_private;
+ if (!ctx)
+ continue;
+ if (!strcmp(ctx->name, name)) {
+ *fscache = ctx;
+ igrab(inode);
+ mutex_unlock(&domain->mutex);
+ return 0;
+ }
+ }
+ err = erofs_fscache_domain_init_cookie(sb, fscache, name, need_inode);
+ mutex_unlock(&domain->mutex);
+ return err;
+}
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 2790c93ffb83..efa4f4ad77cc 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -110,6 +110,9 @@ struct erofs_domain {
struct erofs_fscache {
struct fscache_cookie *cookie;
struct inode *inode;
+ struct inode *anon_inode;
+ struct erofs_domain *domain;
+ char *name;
};
struct erofs_sb_info {
@@ -625,6 +628,9 @@ int erofs_fscache_register_domain(struct super_block *sb);
int erofs_fscache_register_cookie(struct super_block *sb,
struct erofs_fscache **fscache,
char *name, bool need_inode);
+int erofs_domain_register_cookie(struct super_block *sb,
+ struct erofs_fscache **fscache,
+ char *name, bool need_inode);
void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache);
extern const struct address_space_operations erofs_fscache_access_aops;
@@ -646,6 +652,12 @@ static inline int erofs_fscache_register_cookie(struct super_block *sb,
{
return -EOPNOTSUPP;
}
+static inline int erofs_domain_register_cookie(struct super_block *sb,
+ struct erofs_fscache **fscache,
+ char *name, bool need_inode)
+{
+ return -EOPNOTSUPP;
+}
static inline void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
{
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 667a78f0ee70..11c5ba84567c 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -245,8 +245,12 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
}
if (erofs_is_fscache_mode(sb)) {
- ret = erofs_fscache_register_cookie(sb, &dif->fscache,
- dif->path, false);
+ if (sbi->opt.domain_id)
+ ret = erofs_domain_register_cookie(sb, &dif->fscache, dif->path,
+ false);
+ else
+ ret = erofs_fscache_register_cookie(sb, &dif->fscache, dif->path,
+ false);
if (ret)
return ret;
} else {
@@ -726,6 +730,8 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
err = erofs_fscache_register_domain(sb);
if (err)
return err;
+ err = erofs_domain_register_cookie(sb, &sbi->s_fscache,
+ sbi->opt.fsid, true);
} else {
err = erofs_fscache_register_fs(sb);
if (err)
--
2.20.1
Introduce 'domain_id' mount option to enable shared domain sementics.
In which case, the related cookie is shared if two mountpoints in the
same domain have the same data blob. Users could specify the name of
domain by this mount option.
Signed-off-by: Jia Zhu <[email protected]>
---
fs/erofs/internal.h | 1 +
fs/erofs/super.c | 17 +++++++++++++++++
2 files changed, 18 insertions(+)
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index cfee49d33b95..fe435d077f1a 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -76,6 +76,7 @@ struct erofs_mount_opts {
#endif
unsigned int mount_opt;
char *fsid;
+ char *domain_id;
};
struct erofs_dev_context {
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 3173debeaa5a..d01109069c6b 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -440,6 +440,7 @@ enum {
Opt_dax_enum,
Opt_device,
Opt_fsid,
+ Opt_domain_id,
Opt_err
};
@@ -465,6 +466,7 @@ static const struct fs_parameter_spec erofs_fs_parameters[] = {
fsparam_enum("dax", Opt_dax_enum, erofs_dax_param_enums),
fsparam_string("device", Opt_device),
fsparam_string("fsid", Opt_fsid),
+ fsparam_string("domain_id", Opt_domain_id),
{}
};
@@ -568,6 +570,16 @@ static int erofs_fc_parse_param(struct fs_context *fc,
return -ENOMEM;
#else
errorfc(fc, "fsid option not supported");
+#endif
+ break;
+ case Opt_domain_id:
+#ifdef CONFIG_EROFS_FS_ONDEMAND
+ kfree(ctx->opt.domain_id);
+ ctx->opt.domain_id = kstrdup(param->string, GFP_KERNEL);
+ if (!ctx->opt.domain_id)
+ return -ENOMEM;
+#else
+ errorfc(fc, "domain_id option not supported");
#endif
break;
default:
@@ -695,6 +707,7 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
sb->s_fs_info = sbi;
sbi->opt = ctx->opt;
ctx->opt.fsid = NULL;
+ ctx->opt.domain_id = NULL;
sbi->devs = ctx->devs;
ctx->devs = NULL;
@@ -838,6 +851,7 @@ static void erofs_fc_free(struct fs_context *fc)
erofs_free_dev_context(ctx->devs);
kfree(ctx->opt.fsid);
+ kfree(ctx->opt.domain_id);
kfree(ctx);
}
@@ -892,6 +906,7 @@ static void erofs_kill_sb(struct super_block *sb)
erofs_fscache_unregister_cookie(&sbi->s_fscache);
erofs_fscache_unregister_fs(sb);
kfree(sbi->opt.fsid);
+ kfree(sbi->opt.domain_id);
kfree(sbi);
sb->s_fs_info = NULL;
}
@@ -1044,6 +1059,8 @@ static int erofs_show_options(struct seq_file *seq, struct dentry *root)
#ifdef CONFIG_EROFS_FS_ONDEMAND
if (opt->fsid)
seq_printf(seq, ",fsid=%s", opt->fsid);
+ if (opt->domain_id)
+ seq_printf(seq, ",domain_id=%s", opt->domain_id);
#endif
return 0;
}
--
2.20.1
In erofs umount scenario, erofs_fscache_unregister_cookie() is called
twice in kill_sb() and put_super().
It works for original semantics, cause 'ctx' will be set to NULL in
put_super() and will not be unregister again in kill_sb().
However, in shared domain scenario, we use refcount to maintain the
lifecycle of cookie. Unregister the cookie twice will cause it to be
released early.
For the above reasons, this patch removes duplicate unregister_cookie
and move fscache_unregister_* before shotdown_super() to prevent busy
inode(ctx->inode) when umount.
Signed-off-by: Jia Zhu <[email protected]>
---
fs/erofs/super.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 69de1731f454..667a78f0ee70 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -919,19 +919,20 @@ static void erofs_kill_sb(struct super_block *sb)
kill_litter_super(sb);
return;
}
- if (erofs_is_fscache_mode(sb))
- generic_shutdown_super(sb);
- else
- kill_block_super(sb);
-
sbi = EROFS_SB(sb);
if (!sbi)
return;
+ if (erofs_is_fscache_mode(sb)) {
+ erofs_fscache_unregister_cookie(&sbi->s_fscache);
+ erofs_fscache_unregister_fs(sb);
+ generic_shutdown_super(sb);
+ } else {
+ kill_block_super(sb);
+ }
+
erofs_free_dev_context(sbi->devs);
fs_put_dax(sbi->dax_dev, NULL);
- erofs_fscache_unregister_cookie(&sbi->s_fscache);
- erofs_fscache_unregister_fs(sb);
kfree(sbi->opt.fsid);
kfree(sbi->opt.domain_id);
kfree(sbi);
@@ -951,7 +952,6 @@ static void erofs_put_super(struct super_block *sb)
iput(sbi->managed_cache);
sbi->managed_cache = NULL;
#endif
- erofs_fscache_unregister_cookie(&sbi->s_fscache);
}
struct file_system_type erofs_fs_type = {
--
2.20.1
A new fscache-based shared domain mode is going to be introduced for
erofs. In which case, same data blobs in same domain will be shared
and reused to reduce on-disk space usage.
As the first step, we use pseudo mnt to manage and maintain domain's
lifecycle.
The implementation of sharing blobs will be introduced in subsequent
patches.
Signed-off-by: Jia Zhu <[email protected]>
---
fs/erofs/fscache.c | 95 ++++++++++++++++++++++++++++++++++++++++++++-
fs/erofs/internal.h | 18 ++++++++-
fs/erofs/super.c | 51 ++++++++++++++++++------
3 files changed, 149 insertions(+), 15 deletions(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 8e01d89c3319..439dd3cc096a 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -1,10 +1,15 @@
// SPDX-License-Identifier: GPL-2.0-or-later
/*
* Copyright (C) 2022, Alibaba Cloud
+ * Copyright (C) 2022, Bytedance Inc. All rights reserved.
*/
#include <linux/fscache.h>
#include "internal.h"
+static DEFINE_MUTEX(erofs_domain_list_lock);
+static LIST_HEAD(erofs_domain_list);
+static struct vfsmount *erofs_pseudo_mnt;
+
static struct netfs_io_request *erofs_fscache_alloc_request(struct address_space *mapping,
loff_t start, size_t len)
{
@@ -417,6 +422,87 @@ const struct address_space_operations erofs_fscache_access_aops = {
.readahead = erofs_fscache_readahead,
};
+static void erofs_fscache_domain_get(struct erofs_domain *domain)
+{
+ if (!domain)
+ return;
+ refcount_inc(&domain->ref);
+}
+
+static void erofs_fscache_domain_put(struct erofs_domain *domain)
+{
+ if (!domain)
+ return;
+ if (refcount_dec_and_test(&domain->ref)) {
+ fscache_relinquish_volume(domain->volume, NULL, false);
+ mutex_lock(&erofs_domain_list_lock);
+ list_del(&domain->list);
+ mutex_unlock(&erofs_domain_list_lock);
+ kfree(domain->domain_id);
+ kfree(domain);
+ }
+}
+
+static int erofs_fscache_init_domain(struct super_block *sb)
+{
+ int err;
+ struct erofs_domain *domain;
+ struct erofs_sb_info *sbi = EROFS_SB(sb);
+
+ domain = kzalloc(sizeof(struct erofs_domain), GFP_KERNEL);
+ if (!domain)
+ return -ENOMEM;
+
+ domain->domain_id = kstrdup(sbi->opt.domain_id, GFP_KERNEL);
+ if (!domain->domain_id) {
+ kfree(domain);
+ return -ENOMEM;
+ }
+ sbi->domain = domain;
+ if (!erofs_pseudo_mnt) {
+ erofs_pseudo_mnt = kern_mount(&erofs_fs_type);
+ if (IS_ERR(erofs_pseudo_mnt)) {
+ err = PTR_ERR(erofs_pseudo_mnt);
+ goto out;
+ }
+ }
+ err = erofs_fscache_register_fs(sb);
+ if (err)
+ goto out;
+
+ domain->volume = sbi->volume;
+ refcount_set(&domain->ref, 1);
+ mutex_init(&domain->mutex);
+ list_add(&domain->list, &erofs_domain_list);
+ return 0;
+out:
+ kfree(domain->domain_id);
+ kfree(domain);
+ sbi->domain = NULL;
+ return err;
+}
+
+int erofs_fscache_register_domain(struct super_block *sb)
+{
+ int err;
+ struct erofs_domain *domain;
+ struct erofs_sb_info *sbi = EROFS_SB(sb);
+
+ mutex_lock(&erofs_domain_list_lock);
+ list_for_each_entry(domain, &erofs_domain_list, list) {
+ if (!strcmp(domain->domain_id, sbi->opt.domain_id)) {
+ erofs_fscache_domain_get(domain);
+ sbi->domain = domain;
+ sbi->volume = domain->volume;
+ mutex_unlock(&erofs_domain_list_lock);
+ return 0;
+ }
+ }
+ err = erofs_fscache_init_domain(sb);
+ mutex_unlock(&erofs_domain_list_lock);
+ return err;
+}
+
int erofs_fscache_register_cookie(struct super_block *sb,
struct erofs_fscache **fscache,
char *name, bool need_inode)
@@ -495,7 +581,8 @@ int erofs_fscache_register_fs(struct super_block *sb)
char *name;
int ret = 0;
- name = kasprintf(GFP_KERNEL, "erofs,%s", sbi->opt.fsid);
+ name = kasprintf(GFP_KERNEL, "erofs,%s",
+ sbi->domain ? sbi->domain->domain_id : sbi->opt.fsid);
if (!name)
return -ENOMEM;
@@ -515,6 +602,10 @@ void erofs_fscache_unregister_fs(struct super_block *sb)
{
struct erofs_sb_info *sbi = EROFS_SB(sb);
- fscache_relinquish_volume(sbi->volume, NULL, false);
+ if (sbi->domain)
+ erofs_fscache_domain_put(sbi->domain);
+ else
+ fscache_relinquish_volume(sbi->volume, NULL, false);
sbi->volume = NULL;
+ sbi->domain = NULL;
}
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index fe435d077f1a..2790c93ffb83 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -99,6 +99,14 @@ struct erofs_sb_lz4_info {
u16 max_pclusterblks;
};
+struct erofs_domain {
+ refcount_t ref;
+ struct mutex mutex;
+ struct list_head list;
+ struct fscache_volume *volume;
+ char *domain_id;
+};
+
struct erofs_fscache {
struct fscache_cookie *cookie;
struct inode *inode;
@@ -158,6 +166,7 @@ struct erofs_sb_info {
/* fscache support */
struct fscache_volume *volume;
struct erofs_fscache *s_fscache;
+ struct erofs_domain *domain;
};
#define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info)
@@ -394,6 +403,7 @@ struct page *erofs_grab_cache_page_nowait(struct address_space *mapping,
}
extern const struct super_operations erofs_sops;
+extern struct file_system_type erofs_fs_type;
extern const struct address_space_operations erofs_raw_access_aops;
extern const struct address_space_operations z_erofs_aops;
@@ -610,6 +620,7 @@ static inline int z_erofs_load_lzma_config(struct super_block *sb,
#ifdef CONFIG_EROFS_FS_ONDEMAND
int erofs_fscache_register_fs(struct super_block *sb);
void erofs_fscache_unregister_fs(struct super_block *sb);
+int erofs_fscache_register_domain(struct super_block *sb);
int erofs_fscache_register_cookie(struct super_block *sb,
struct erofs_fscache **fscache,
@@ -620,10 +631,15 @@ extern const struct address_space_operations erofs_fscache_access_aops;
#else
static inline int erofs_fscache_register_fs(struct super_block *sb)
{
- return 0;
+ return -EOPNOTSUPP;
}
static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
+static inline int erofs_fscache_register_domain(const struct super_block *sb)
+{
+ return -EOPNOTSUPP;
+}
+
static inline int erofs_fscache_register_cookie(struct super_block *sb,
struct erofs_fscache **fscache,
char *name, bool need_inode)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index d01109069c6b..69de1731f454 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -688,6 +688,13 @@ static const struct export_operations erofs_export_ops = {
.get_parent = erofs_get_parent,
};
+static int erofs_fc_fill_pseudo_super(struct super_block *sb, struct fs_context *fc)
+{
+ static const struct tree_descr empty_descr = {""};
+
+ return simple_fill_super(sb, EROFS_SUPER_MAGIC, &empty_descr);
+}
+
static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
{
struct inode *inode;
@@ -715,12 +722,17 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
sb->s_blocksize = EROFS_BLKSIZ;
sb->s_blocksize_bits = LOG_BLOCK_SIZE;
- err = erofs_fscache_register_fs(sb);
- if (err)
- return err;
-
- err = erofs_fscache_register_cookie(sb, &sbi->s_fscache,
- sbi->opt.fsid, true);
+ if (sbi->opt.domain_id) {
+ err = erofs_fscache_register_domain(sb);
+ if (err)
+ return err;
+ } else {
+ err = erofs_fscache_register_fs(sb);
+ if (err)
+ return err;
+ err = erofs_fscache_register_cookie(sb, &sbi->s_fscache,
+ sbi->opt.fsid, true);
+ }
if (err)
return err;
@@ -798,8 +810,12 @@ static int erofs_fc_get_tree(struct fs_context *fc)
{
struct erofs_fs_context *ctx = fc->fs_private;
- if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && ctx->opt.fsid)
- return get_tree_nodev(fc, erofs_fc_fill_super);
+ if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND)) {
+ if (!ctx && fc->sb_flags & SB_KERNMOUNT)
+ return get_tree_nodev(fc, erofs_fc_fill_pseudo_super);
+ if (ctx->opt.fsid)
+ return get_tree_nodev(fc, erofs_fc_fill_super);
+ }
return get_tree_bdev(fc, erofs_fc_fill_super);
}
@@ -849,6 +865,8 @@ static void erofs_fc_free(struct fs_context *fc)
{
struct erofs_fs_context *ctx = fc->fs_private;
+ if (!ctx)
+ return;
erofs_free_dev_context(ctx->devs);
kfree(ctx->opt.fsid);
kfree(ctx->opt.domain_id);
@@ -864,8 +882,12 @@ static const struct fs_context_operations erofs_context_ops = {
static int erofs_init_fs_context(struct fs_context *fc)
{
- struct erofs_fs_context *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+ struct erofs_fs_context *ctx;
+ if (fc->sb_flags & SB_KERNMOUNT)
+ goto out;
+
+ ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
if (!ctx)
return -ENOMEM;
ctx->devs = kzalloc(sizeof(struct erofs_dev_context), GFP_KERNEL);
@@ -878,6 +900,7 @@ static int erofs_init_fs_context(struct fs_context *fc)
idr_init(&ctx->devs->tree);
init_rwsem(&ctx->devs->rwsem);
erofs_default_options(ctx);
+out:
fc->ops = &erofs_context_ops;
return 0;
}
@@ -892,6 +915,10 @@ static void erofs_kill_sb(struct super_block *sb)
WARN_ON(sb->s_magic != EROFS_SUPER_MAGIC);
+ if (sb->s_flags & SB_KERNMOUNT) {
+ kill_litter_super(sb);
+ return;
+ }
if (erofs_is_fscache_mode(sb))
generic_shutdown_super(sb);
else
@@ -916,8 +943,8 @@ static void erofs_put_super(struct super_block *sb)
{
struct erofs_sb_info *const sbi = EROFS_SB(sb);
- DBG_BUGON(!sbi);
-
+ if (!sbi)
+ return;
erofs_unregister_sysfs(sb);
erofs_shrinker_unregister(sb);
#ifdef CONFIG_EROFS_FS_ZIP
@@ -927,7 +954,7 @@ static void erofs_put_super(struct super_block *sb)
erofs_fscache_unregister_cookie(&sbi->s_fscache);
}
-static struct file_system_type erofs_fs_type = {
+struct file_system_type erofs_fs_type = {
.owner = THIS_MODULE,
.name = "erofs",
.init_fs_context = erofs_init_fs_context,
--
2.20.1
[E2E Container Demo for Quick Test]
===================================
[Issue]
https://github.com/containerd/nydus-snapshotter/issues/161
[Pull Request]
https://github.com/containerd/nydus-snapshotter/pull/162
在 2022/9/2 18:53, Jia Zhu 写道:
> Changes since V1:
> 0. Only initialize one pseudo fs to manage anonymous inodes(cookies).
> 1. Remove ctx's 'ref' field and replace it with inode's i_count.
> 2. Add lock in erofs_fscache_unregister_cookie() to avoid race condition
> between cookie's get/put/search.
> 3. Remove useless blank lines.
>
> [Kernel Patchset]
> ===============
> Git tree:
> https://github.com/userzj/linux.git zhujia/shared-domain-v2
> Git web:
> https://github.com/userzj/linux/tree/zhujia/shared-domain-v2
>
> [Background]
> ============
> In ondemand read mode, we use individual volume to present an erofs
> mountpoint, cookies to present bootstrap and data blobs.
>
> In which case, since cookies can't be shared between fscache volumes,
> even if the data blobs between different mountpoints are exactly same,
> they can't be shared.
>
> [Introduction]
> ==============
> Here we introduce erofs shared domain to resolve above mentioned case.
> Several erofs filesystems can belong to one domain, and data blobs can
> be shared among these erofs filesystems of same domain.
>
> [Usage]
> Users could specify 'domain_id' mount option to create or join into a
> domain which reuses the same cookies(blobs).
>
> [Design]
> ========
> 1. Use pseudo mnt to manage domain's lifecycle.
> 2. Use a linked list to maintain & traverse domains.
> 3. Use pseudo sb to create anonymous inode for recording cookie's info
> and manage cookies lifecycle.
>
> [Flow Path]
> ===========
> 1. User specify a new 'domain_id' in mount option.
> 1.1 Traverse domain list, compare domain_id with existing domain.[Miss]
> 1.2 Create a new domain(volume), add it to domain list.
> 1.3 Traverse pseudo sb's inode list, compare cookie name with
> existing cookies.[Miss]
> 1.4 Alloc new anonymous inodes and cookies.
>
> 2. User specify an existing 'domain_id' in mount option and the data
> blob is existed in domain.
> 2.1 Traverse domain list, compare domain_id with existing domain.[Hit]
> 2.2 Reuse the domain and increase its refcnt.
> 2.3 Traverse pseudo sb's inode list, compare cookie name with
> existing cookies.[Hit]
> 2.4 Reuse the cookie and increase its refcnt.
> [Test]
> ======
> Git web:
> https://github.com/userzj/demand-read-cachefilesd/tree/shared-domain
> More test cases will be added to:
> https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/log/?h=experimental-tests-fscache
>
> RFC: https://lore.kernel.org/all/YxAlO%[email protected]/
> V1: https://lore.kernel.org/all/[email protected]/
>
>
> Jia Zhu (5):
> erofs: add 'domain_id' mount option for on-demand read sementics
> erofs: introduce fscache-based domain
> erofs: add 'domain_id' prefix when register sysfs
> erofs: remove duplicated unregister_cookie
> erofs: support fscache based shared domain
>
> fs/erofs/fscache.c | 168 +++++++++++++++++++++++++++++++++++++++++++-
> fs/erofs/internal.h | 31 +++++++-
> fs/erofs/super.c | 94 +++++++++++++++++++------
> fs/erofs/sysfs.c | 11 ++-
> 4 files changed, 278 insertions(+), 26 deletions(-)
>
On 9/2/22 6:53 PM, Jia Zhu wrote:
> A new fscache-based shared domain mode is going to be introduced for
> erofs. In which case, same data blobs in same domain will be shared
> and reused to reduce on-disk space usage.
>
> As the first step, we use pseudo mnt to manage and maintain domain's
> lifecycle.
>
> The implementation of sharing blobs will be introduced in subsequent
> patches.
>
> Signed-off-by: Jia Zhu <[email protected]>
> ---
> fs/erofs/fscache.c | 95 ++++++++++++++++++++++++++++++++++++++++++++-
> fs/erofs/internal.h | 18 ++++++++-
> fs/erofs/super.c | 51 ++++++++++++++++++------
> 3 files changed, 149 insertions(+), 15 deletions(-)
>
> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
> index 8e01d89c3319..439dd3cc096a 100644
> --- a/fs/erofs/fscache.c
> +++ b/fs/erofs/fscache.c
> @@ -1,10 +1,15 @@
> // SPDX-License-Identifier: GPL-2.0-or-later
> /*
> * Copyright (C) 2022, Alibaba Cloud
> + * Copyright (C) 2022, Bytedance Inc. All rights reserved.
> */
> #include <linux/fscache.h>
> #include "internal.h"
>
> +static DEFINE_MUTEX(erofs_domain_list_lock);
> +static LIST_HEAD(erofs_domain_list);
> +static struct vfsmount *erofs_pseudo_mnt;
> +
> static struct netfs_io_request *erofs_fscache_alloc_request(struct address_space *mapping,
> loff_t start, size_t len)
> {
> @@ -417,6 +422,87 @@ const struct address_space_operations erofs_fscache_access_aops = {
> .readahead = erofs_fscache_readahead,
> };
>
> +static void erofs_fscache_domain_get(struct erofs_domain *domain)
> +{
> + if (!domain)
> + return;
> + refcount_inc(&domain->ref);
> +}
It seems that the input @domain can not be NULL, and thus the NULL check
is not needed.
Besides how about:
struct erofs_domain *domain erofs_fscache_domain_get(struct erofs_domain
*domain)
{
refcount_inc(&domain->ref);
return domain;
}
> +
> +static void erofs_fscache_domain_put(struct erofs_domain *domain)
> +{
> + if (!domain)
> + return;
> + if (refcount_dec_and_test(&domain->ref)) {
> + fscache_relinquish_volume(domain->volume, NULL, false);
> + mutex_lock(&erofs_domain_list_lock);
> + list_del(&domain->list);
> + mutex_unlock(&erofs_domain_list_lock);
> + kfree(domain->domain_id);
> + kfree(domain);
> + }
> +}
> +
> +static int erofs_fscache_init_domain(struct super_block *sb)
> +{
> + int err;
> + struct erofs_domain *domain;
> + struct erofs_sb_info *sbi = EROFS_SB(sb);
> +
> + domain = kzalloc(sizeof(struct erofs_domain), GFP_KERNEL);
> + if (!domain)
> + return -ENOMEM;
> +
> + domain->domain_id = kstrdup(sbi->opt.domain_id, GFP_KERNEL);
> + if (!domain->domain_id) {
> + kfree(domain);
> + return -ENOMEM;
> + }
> + sbi->domain = domain;
> + if (!erofs_pseudo_mnt) {
> + erofs_pseudo_mnt = kern_mount(&erofs_fs_type);
> + if (IS_ERR(erofs_pseudo_mnt)) {
> + err = PTR_ERR(erofs_pseudo_mnt);
> + goto out;
> + }
> + }
> + err = erofs_fscache_register_fs(sb);
> + if (err)
> + goto out;
> +
> + domain->volume = sbi->volume;
> + refcount_set(&domain->ref, 1);
> + mutex_init(&domain->mutex);
> + list_add(&domain->list, &erofs_domain_list);
> + return 0;
> +out:
> + kfree(domain->domain_id);
> + kfree(domain);
> + sbi->domain = NULL;
> + return err;
> +}
> +
> +int erofs_fscache_register_domain(struct super_block *sb)
> +{
> + int err;
> + struct erofs_domain *domain;
> + struct erofs_sb_info *sbi = EROFS_SB(sb);
> +
> + mutex_lock(&erofs_domain_list_lock);
> + list_for_each_entry(domain, &erofs_domain_list, list) {
> + if (!strcmp(domain->domain_id, sbi->opt.domain_id)) {
> + erofs_fscache_domain_get(domain);
> + sbi->domain = domain;
sbi->domain = erofs_fscache_domain_get(domain);
> + sbi->volume = domain->volume;
> + mutex_unlock(&erofs_domain_list_lock);
> + return 0;
> + }
> + }
> + err = erofs_fscache_init_domain(sb);
> + mutex_unlock(&erofs_domain_list_lock);
> + return err;
> +}
> +
> int erofs_fscache_register_cookie(struct super_block *sb,
> struct erofs_fscache **fscache,
> char *name, bool need_inode)
> @@ -495,7 +581,8 @@ int erofs_fscache_register_fs(struct super_block *sb)
> char *name;
> int ret = 0;
>
> - name = kasprintf(GFP_KERNEL, "erofs,%s", sbi->opt.fsid);
> + name = kasprintf(GFP_KERNEL, "erofs,%s",
> + sbi->domain ? sbi->domain->domain_id : sbi->opt.fsid);
Do we also need to encode the cookie name in the "<domain_id>,<fsid>"
format? This will affect the path of the cache files.
> if (!name)
> return -ENOMEM;
>
> @@ -515,6 +602,10 @@ void erofs_fscache_unregister_fs(struct super_block *sb)
> {
> struct erofs_sb_info *sbi = EROFS_SB(sb);
>
> - fscache_relinquish_volume(sbi->volume, NULL, false);
> + if (sbi->domain)
> + erofs_fscache_domain_put(sbi->domain);
> + else
> + fscache_relinquish_volume(sbi->volume, NULL, false);
> sbi->volume = NULL;
> + sbi->domain = NULL;
> }
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index fe435d077f1a..2790c93ffb83 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -99,6 +99,14 @@ struct erofs_sb_lz4_info {
> u16 max_pclusterblks;
> };
>
> +struct erofs_domain {
> + refcount_t ref;
> + struct mutex mutex;
> + struct list_head list;
> + struct fscache_volume *volume;
> + char *domain_id;
> +};
> +
> struct erofs_fscache {
> struct fscache_cookie *cookie;
> struct inode *inode;
> @@ -158,6 +166,7 @@ struct erofs_sb_info {
> /* fscache support */
> struct fscache_volume *volume;
> struct erofs_fscache *s_fscache;
> + struct erofs_domain *domain;
> };
>
> #define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info)
> @@ -394,6 +403,7 @@ struct page *erofs_grab_cache_page_nowait(struct address_space *mapping,
> }
>
> extern const struct super_operations erofs_sops;
> +extern struct file_system_type erofs_fs_type;
>
> extern const struct address_space_operations erofs_raw_access_aops;
> extern const struct address_space_operations z_erofs_aops;
> @@ -610,6 +620,7 @@ static inline int z_erofs_load_lzma_config(struct super_block *sb,
> #ifdef CONFIG_EROFS_FS_ONDEMAND
> int erofs_fscache_register_fs(struct super_block *sb);
> void erofs_fscache_unregister_fs(struct super_block *sb);
> +int erofs_fscache_register_domain(struct super_block *sb);
>
> int erofs_fscache_register_cookie(struct super_block *sb,
> struct erofs_fscache **fscache,
> @@ -620,10 +631,15 @@ extern const struct address_space_operations erofs_fscache_access_aops;
> #else
> static inline int erofs_fscache_register_fs(struct super_block *sb)
> {
> - return 0;
> + return -EOPNOTSUPP;
> }
> static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
>
> +static inline int erofs_fscache_register_domain(const struct super_block *sb)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> static inline int erofs_fscache_register_cookie(struct super_block *sb,
> struct erofs_fscache **fscache,
> char *name, bool need_inode)
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index d01109069c6b..69de1731f454 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -688,6 +688,13 @@ static const struct export_operations erofs_export_ops = {
> .get_parent = erofs_get_parent,
> };
>
> +static int erofs_fc_fill_pseudo_super(struct super_block *sb, struct fs_context *fc)
> +{
> + static const struct tree_descr empty_descr = {""};
> +
> + return simple_fill_super(sb, EROFS_SUPER_MAGIC, &empty_descr);
> +}
> +
> static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
> {
> struct inode *inode;
> @@ -715,12 +722,17 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
> sb->s_blocksize = EROFS_BLKSIZ;
> sb->s_blocksize_bits = LOG_BLOCK_SIZE;
>
> - err = erofs_fscache_register_fs(sb);
> - if (err)
> - return err;
> -
> - err = erofs_fscache_register_cookie(sb, &sbi->s_fscache,
> - sbi->opt.fsid, true);
> + if (sbi->opt.domain_id) {
> + err = erofs_fscache_register_domain(sb);
> + if (err)
> + return err;
> + } else {
> + err = erofs_fscache_register_fs(sb);
> + if (err)
> + return err;
> + err = erofs_fscache_register_cookie(sb, &sbi->s_fscache,
> + sbi->opt.fsid, true);
We'd better keep only one entry to the fscache related codes. How about
moving erofs_fscache_register_cookie(), i.e. registering cookie for
bootstrap, into erofs_fscache_register_fs()? Similarly, check the
domain_id and call erofs_fscache_register_domain() inside
erofs_fscache_register_fs().
Similarly, check domain_id and call erofs_domain_register_cookie()
inside erofs_fscache_register_cookie().
> + }
> if (err)
> return err;
>
> @@ -798,8 +810,12 @@ static int erofs_fc_get_tree(struct fs_context *fc)
> {
> struct erofs_fs_context *ctx = fc->fs_private;
>
> - if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && ctx->opt.fsid)
> - return get_tree_nodev(fc, erofs_fc_fill_super);
> + if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND)) {
> + if (!ctx && fc->sb_flags & SB_KERNMOUNT)
> + return get_tree_nodev(fc, erofs_fc_fill_pseudo_super);
> + if (ctx->opt.fsid)
> + return get_tree_nodev(fc, erofs_fc_fill_super);
> + }
>
> return get_tree_bdev(fc, erofs_fc_fill_super);
> }
> @@ -849,6 +865,8 @@ static void erofs_fc_free(struct fs_context *fc)
> {
> struct erofs_fs_context *ctx = fc->fs_private;
>
> + if (!ctx)
> + return;
> erofs_free_dev_context(ctx->devs);
> kfree(ctx->opt.fsid);
> kfree(ctx->opt.domain_id);
> @@ -864,8 +882,12 @@ static const struct fs_context_operations erofs_context_ops = {
>
> static int erofs_init_fs_context(struct fs_context *fc)
> {
> - struct erofs_fs_context *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> + struct erofs_fs_context *ctx;
>
> + if (fc->sb_flags & SB_KERNMOUNT)
> + goto out;
> +
> + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> if (!ctx)
> return -ENOMEM;
> ctx->devs = kzalloc(sizeof(struct erofs_dev_context), GFP_KERNEL);
> @@ -878,6 +900,7 @@ static int erofs_init_fs_context(struct fs_context *fc)
> idr_init(&ctx->devs->tree);
> init_rwsem(&ctx->devs->rwsem);
> erofs_default_options(ctx);
> +out:
> fc->ops = &erofs_context_ops;
> return 0;
> }
> @@ -892,6 +915,10 @@ static void erofs_kill_sb(struct super_block *sb)
>
> WARN_ON(sb->s_magic != EROFS_SUPER_MAGIC);
>
> + if (sb->s_flags & SB_KERNMOUNT) {
> + kill_litter_super(sb);
> + return;
> + }
> if (erofs_is_fscache_mode(sb))
> generic_shutdown_super(sb);
> else
> @@ -916,8 +943,8 @@ static void erofs_put_super(struct super_block *sb)
> {
> struct erofs_sb_info *const sbi = EROFS_SB(sb);
>
> - DBG_BUGON(!sbi);
> -
> + if (!sbi)
> + return;
> erofs_unregister_sysfs(sb);
> erofs_shrinker_unregister(sb);
> #ifdef CONFIG_EROFS_FS_ZIP
> @@ -927,7 +954,7 @@ static void erofs_put_super(struct super_block *sb)
> erofs_fscache_unregister_cookie(&sbi->s_fscache);
> }
>
> -static struct file_system_type erofs_fs_type = {
> +struct file_system_type erofs_fs_type = {
> .owner = THIS_MODULE,
> .name = "erofs",
> .init_fs_context = erofs_init_fs_context,
--
Thanks,
Jingbo
On 9/2/22 6:53 PM, Jia Zhu wrote:
> In shared domain mount procedure, add 'domain_id' prefix to register
> sysfs entry. Thus we could distinguish mounts that don't use shared
> domain.
>
> Signed-off-by: Jia Zhu <[email protected]>
> ---
> fs/erofs/sysfs.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/fs/erofs/sysfs.c b/fs/erofs/sysfs.c
> index c1383e508bbe..c0031d7bd817 100644
> --- a/fs/erofs/sysfs.c
> +++ b/fs/erofs/sysfs.c
> @@ -201,12 +201,21 @@ static struct kobject erofs_feat = {
> int erofs_register_sysfs(struct super_block *sb)
> {
> struct erofs_sb_info *sbi = EROFS_SB(sb);
> + char *name = NULL;
> int err;
>
> + if (erofs_is_fscache_mode(sb)) {
> + name = kasprintf(GFP_KERNEL, "%s%s%s", sbi->opt.domain_id ?
> + sbi->opt.domain_id : "", sbi->opt.domain_id ? "," : "",
> + sbi->opt.fsid);
> + if (!name)
> + return -ENOMEM;
> + }
How about:
name = erofs_is_fscache_mode(sb) ? sbi->opt.fsid : sb->s_id;
if (sbi->opt.domain_id) {
str = kasprintf(GFP_KERNEL, "%s,%s", sbi->opt.domain_id, sbi->opt.fsid);
name = str;
}
> sbi->s_kobj.kset = &erofs_root;
> init_completion(&sbi->s_kobj_unregister);
> err = kobject_init_and_add(&sbi->s_kobj, &erofs_sb_ktype, NULL, "%s",
> - erofs_is_fscache_mode(sb) ? sbi->opt.fsid : sb->s_id);
> + name ? name : sb->s_id);
kobject_init_and_add(..., "%s", name);
kfree(str);
though it's still not such straightforward...
Any better idea?
> + kfree(name);
> if (err)
> goto put_sb_kobj;
> return 0;
--
Thanks,
Jingbo
On 9/2/22 6:53 PM, Jia Zhu wrote:
> In erofs umount scenario, erofs_fscache_unregister_cookie() is called
> twice in kill_sb() and put_super().
>
> It works for original semantics, cause 'ctx' will be set to NULL in
> put_super() and will not be unregister again in kill_sb().
> However, in shared domain scenario, we use refcount to maintain the
> lifecycle of cookie. Unregister the cookie twice will cause it to be
> released early.
>
> For the above reasons, this patch removes duplicate unregister_cookie
> and move fscache_unregister_* before shotdown_super() to prevent busy
> inode(ctx->inode) when umount.
>
> Signed-off-by: Jia Zhu <[email protected]>
> ---
> fs/erofs/super.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index 69de1731f454..667a78f0ee70 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -919,19 +919,20 @@ static void erofs_kill_sb(struct super_block *sb)
> kill_litter_super(sb);
> return;
> }
> - if (erofs_is_fscache_mode(sb))
> - generic_shutdown_super(sb);
> - else
> - kill_block_super(sb);
> -
> sbi = EROFS_SB(sb);
> if (!sbi)
> return;
>
> + if (erofs_is_fscache_mode(sb)) {
> + erofs_fscache_unregister_cookie(&sbi->s_fscache);
> + erofs_fscache_unregister_fs(sb);
> + generic_shutdown_super(sb);
Generally we can't do clean ups before generic_shutdown_super(), since
generic_shutdown_super() may trigger IO, e.g. in sync_filesystem(),
though it's not the case for erofs (read-only).
How about embedding erofs_fscache_unregister_cookie() into
erofs_fscache_unregister_fs(), and thus we can check domain_id in
erofs_fscache_unregister_fs()?
> + } else {
> + kill_block_super(sb);
> + }
> +
> erofs_free_dev_context(sbi->devs);
> fs_put_dax(sbi->dax_dev, NULL);
> - erofs_fscache_unregister_cookie(&sbi->s_fscache);
> - erofs_fscache_unregister_fs(sb);
> kfree(sbi->opt.fsid);
> kfree(sbi->opt.domain_id);
> kfree(sbi);
> @@ -951,7 +952,6 @@ static void erofs_put_super(struct super_block *sb)
> iput(sbi->managed_cache);
> sbi->managed_cache = NULL;
> #endif
> - erofs_fscache_unregister_cookie(&sbi->s_fscache);
> }
>
> struct file_system_type erofs_fs_type = {
--
Thanks,
Jingbo
On 9/9/22 5:23 PM, JeffleXu wrote:
>
>
> On 9/2/22 6:53 PM, Jia Zhu wrote:
>> In shared domain mount procedure, add 'domain_id' prefix to register
>> sysfs entry. Thus we could distinguish mounts that don't use shared
>> domain.
>>
>> Signed-off-by: Jia Zhu <[email protected]>
>> ---
>> fs/erofs/sysfs.c | 11 ++++++++++-
>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/erofs/sysfs.c b/fs/erofs/sysfs.c
>> index c1383e508bbe..c0031d7bd817 100644
>> --- a/fs/erofs/sysfs.c
>> +++ b/fs/erofs/sysfs.c
>> @@ -201,12 +201,21 @@ static struct kobject erofs_feat = {
>> int erofs_register_sysfs(struct super_block *sb)
>> {
>> struct erofs_sb_info *sbi = EROFS_SB(sb);
>> + char *name = NULL;
>> int err;
>>
>> + if (erofs_is_fscache_mode(sb)) {
>> + name = kasprintf(GFP_KERNEL, "%s%s%s", sbi->opt.domain_id ?
>> + sbi->opt.domain_id : "", sbi->opt.domain_id ? "," : "",
>> + sbi->opt.fsid);
>> + if (!name)
>> + return -ENOMEM;
>> + }
>
>
> How about:
>
> name = erofs_is_fscache_mode(sb) ? sbi->opt.fsid : sb->s_id;
> if (sbi->opt.domain_id) {
> str = kasprintf(GFP_KERNEL, "%s,%s", sbi->opt.domain_id, sbi->opt.fsid);
> name = str;
> }
Another choice:
if (erofs_is_fscache_mode(sb)) {
if (sbi->opt.domain_id) {
str = kasprintf(GFP_KERNEL, "%s,%s", sbi->opt.domain_id, sbi->opt.fsid);
name = str;
} else {
name = sbi->opt.fsid;
}
} else {
name = sb->s_id;
}
>
>
>> sbi->s_kobj.kset = &erofs_root;
>> init_completion(&sbi->s_kobj_unregister);
>> err = kobject_init_and_add(&sbi->s_kobj, &erofs_sb_ktype, NULL, "%s",
>> - erofs_is_fscache_mode(sb) ? sbi->opt.fsid : sb->s_id);
>> + name ? name : sb->s_id);
>
> kobject_init_and_add(..., "%s", name);
> kfree(str);
>
> though it's still not such straightforward...
>
> Any better idea?
>
>
>> + kfree(name);
>> if (err)
>> goto put_sb_kobj;
>> return 0;
>
--
Thanks,
Jingbo
On 9/9/22 5:55 PM, JeffleXu wrote:
>
>
> On 9/2/22 6:53 PM, Jia Zhu wrote:
>> In erofs umount scenario, erofs_fscache_unregister_cookie() is called
>> twice in kill_sb() and put_super().
>>
>> It works for original semantics, cause 'ctx' will be set to NULL in
>> put_super() and will not be unregister again in kill_sb().
>> However, in shared domain scenario, we use refcount to maintain the
>> lifecycle of cookie. Unregister the cookie twice will cause it to be
>> released early.
Sorry, why can't we also set sbi->s_fscache to NULL after decrementing
the refcount in shared domain mode, to avoid the refcount being
decremented twice?
>>
>> For the above reasons, this patch removes duplicate unregister_cookie
>> and move fscache_unregister_* before shotdown_super() to prevent busy
>> inode(ctx->inode) when umount.
>>
>> Signed-off-by: Jia Zhu <[email protected]>
>> ---
>> fs/erofs/super.c | 16 ++++++++--------
>> 1 file changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
>> index 69de1731f454..667a78f0ee70 100644
>> --- a/fs/erofs/super.c
>> +++ b/fs/erofs/super.c
>> @@ -919,19 +919,20 @@ static void erofs_kill_sb(struct super_block *sb)
>> kill_litter_super(sb);
>> return;
>> }
>> - if (erofs_is_fscache_mode(sb))
>> - generic_shutdown_super(sb);
>> - else
>> - kill_block_super(sb);
>> -
>> sbi = EROFS_SB(sb);
>> if (!sbi)
>> return;
>>
>> + if (erofs_is_fscache_mode(sb)) {
>> + erofs_fscache_unregister_cookie(&sbi->s_fscache);
>> + erofs_fscache_unregister_fs(sb);
>> + generic_shutdown_super(sb);
>
> Generally we can't do clean ups before generic_shutdown_super(), since
> generic_shutdown_super() may trigger IO, e.g. in sync_filesystem(),
> though it's not the case for erofs (read-only).
>
> How about embedding erofs_fscache_unregister_cookie() into
> erofs_fscache_unregister_fs(), and thus we can check domain_id in
> erofs_fscache_unregister_fs()?
>
>> + } else {
>> + kill_block_super(sb);
>> + }
>> +
>> erofs_free_dev_context(sbi->devs);
>> fs_put_dax(sbi->dax_dev, NULL);
>> - erofs_fscache_unregister_cookie(&sbi->s_fscache);
>> - erofs_fscache_unregister_fs(sb);
>> kfree(sbi->opt.fsid);
>> kfree(sbi->opt.domain_id);
>> kfree(sbi);
>> @@ -951,7 +952,6 @@ static void erofs_put_super(struct super_block *sb)
>> iput(sbi->managed_cache);
>> sbi->managed_cache = NULL;
>> #endif
>> - erofs_fscache_unregister_cookie(&sbi->s_fscache);
>> }
>>
>> struct file_system_type erofs_fs_type = {
>
--
Thanks,
Jingbo
在 2022/9/9 17:26, JeffleXu 写道:
>
>
> On 9/9/22 5:23 PM, JeffleXu wrote:
>>
>>
>> On 9/2/22 6:53 PM, Jia Zhu wrote:
>>> In shared domain mount procedure, add 'domain_id' prefix to register
>>> sysfs entry. Thus we could distinguish mounts that don't use shared
>>> domain.
>>>
>>> Signed-off-by: Jia Zhu <[email protected]>
>>> ---
>>> fs/erofs/sysfs.c | 11 ++++++++++-
>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/erofs/sysfs.c b/fs/erofs/sysfs.c
>>> index c1383e508bbe..c0031d7bd817 100644
>>> --- a/fs/erofs/sysfs.c
>>> +++ b/fs/erofs/sysfs.c
>>> @@ -201,12 +201,21 @@ static struct kobject erofs_feat = {
>>> int erofs_register_sysfs(struct super_block *sb)
>>> {
>>> struct erofs_sb_info *sbi = EROFS_SB(sb);
>>> + char *name = NULL;
>>> int err;
>>>
>>> + if (erofs_is_fscache_mode(sb)) {
>>> + name = kasprintf(GFP_KERNEL, "%s%s%s", sbi->opt.domain_id ?
>>> + sbi->opt.domain_id : "", sbi->opt.domain_id ? "," : "",
>>> + sbi->opt.fsid);
>>> + if (!name)
>>> + return -ENOMEM;
>>> + }
>>
>>
>> How about:
>>
>> name = erofs_is_fscache_mode(sb) ? sbi->opt.fsid : sb->s_id;
>> if (sbi->opt.domain_id) {
>> str = kasprintf(GFP_KERNEL, "%s,%s", sbi->opt.domain_id, sbi->opt.fsid);
>> name = str;
>> }
>
> Another choice:
>
> if (erofs_is_fscache_mode(sb)) {
> if (sbi->opt.domain_id) {
> str = kasprintf(GFP_KERNEL, "%s,%s", sbi->opt.domain_id, sbi->opt.fsid);
> name = str;
> } else {
> name = sbi->opt.fsid;
> }
> } else {
> name = sb->s_id;
> }
>
>
Thanks for your advise, this version looks more intuitive. I'll apply it
in next version.
>
>
>>
>>
>>> sbi->s_kobj.kset = &erofs_root;
>>> init_completion(&sbi->s_kobj_unregister);
>>> err = kobject_init_and_add(&sbi->s_kobj, &erofs_sb_ktype, NULL, "%s",
>>> - erofs_is_fscache_mode(sb) ? sbi->opt.fsid : sb->s_id);
>>> + name ? name : sb->s_id);
>>
>> kobject_init_and_add(..., "%s", name);
>> kfree(str);
>>
>> though it's still not such straightforward...
>>
>> Any better idea?
>>
>>
>>> + kfree(name);
>>> if (err)
>>> goto put_sb_kobj;
>>> return 0;
>>
>
在 2022/9/9 17:55, JeffleXu 写道:
>
>
> On 9/2/22 6:53 PM, Jia Zhu wrote:
>> In erofs umount scenario, erofs_fscache_unregister_cookie() is called
>> twice in kill_sb() and put_super().
>>
>> It works for original semantics, cause 'ctx' will be set to NULL in
>> put_super() and will not be unregister again in kill_sb().
>> However, in shared domain scenario, we use refcount to maintain the
>> lifecycle of cookie. Unregister the cookie twice will cause it to be
>> released early.
>>
>> For the above reasons, this patch removes duplicate unregister_cookie
>> and move fscache_unregister_* before shotdown_super() to prevent busy
>> inode(ctx->inode) when umount.
>>
>> Signed-off-by: Jia Zhu <[email protected]>
>> ---
>> fs/erofs/super.c | 16 ++++++++--------
>> 1 file changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
>> index 69de1731f454..667a78f0ee70 100644
>> --- a/fs/erofs/super.c
>> +++ b/fs/erofs/super.c
>> @@ -919,19 +919,20 @@ static void erofs_kill_sb(struct super_block *sb)
>> kill_litter_super(sb);
>> return;
>> }
>> - if (erofs_is_fscache_mode(sb))
>> - generic_shutdown_super(sb);
>> - else
>> - kill_block_super(sb);
>> -
>> sbi = EROFS_SB(sb);
>> if (!sbi)
>> return;
>>
>> + if (erofs_is_fscache_mode(sb)) {
>> + erofs_fscache_unregister_cookie(&sbi->s_fscache);
>> + erofs_fscache_unregister_fs(sb);
>> + generic_shutdown_super(sb);
>
> Generally we can't do clean ups before generic_shutdown_super(), since
> generic_shutdown_super() may trigger IO, e.g. in sync_filesystem(),
> though it's not the case for erofs (read-only).
>
> How about embedding erofs_fscache_unregister_cookie() into
> erofs_fscache_unregister_fs(), and thus we can check domain_id in
> erofs_fscache_unregister_fs()?
>
Thanks.
>> + } else {
>> + kill_block_super(sb);
>> + }
>> +
>> erofs_free_dev_context(sbi->devs);
>> fs_put_dax(sbi->dax_dev, NULL);
>> - erofs_fscache_unregister_cookie(&sbi->s_fscache);
>> - erofs_fscache_unregister_fs(sb);
>> kfree(sbi->opt.fsid);
>> kfree(sbi->opt.domain_id);
>> kfree(sbi);
>> @@ -951,7 +952,6 @@ static void erofs_put_super(struct super_block *sb)
>> iput(sbi->managed_cache);
>> sbi->managed_cache = NULL;
>> #endif
>> - erofs_fscache_unregister_cookie(&sbi->s_fscache);
>> }
>>
>> struct file_system_type erofs_fs_type = {
>
在 2022/9/9 16:42, JeffleXu 写道:
>
>
> On 9/2/22 6:53 PM, Jia Zhu wrote:
>> A new fscache-based shared domain mode is going to be introduced for
>> erofs. In which case, same data blobs in same domain will be shared
>> and reused to reduce on-disk space usage.
>>
>> As the first step, we use pseudo mnt to manage and maintain domain's
>> lifecycle.
>>
>> The implementation of sharing blobs will be introduced in subsequent
>> patches.
>>
>> Signed-off-by: Jia Zhu <[email protected]>
>> ---
>> fs/erofs/fscache.c | 95 ++++++++++++++++++++++++++++++++++++++++++++-
>> fs/erofs/internal.h | 18 ++++++++-
>> fs/erofs/super.c | 51 ++++++++++++++++++------
>> 3 files changed, 149 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
>> index 8e01d89c3319..439dd3cc096a 100644
>> --- a/fs/erofs/fscache.c
>> +++ b/fs/erofs/fscache.c
>> @@ -1,10 +1,15 @@
>> // SPDX-License-Identifier: GPL-2.0-or-later
>> /*
>> * Copyright (C) 2022, Alibaba Cloud
>> + * Copyright (C) 2022, Bytedance Inc. All rights reserved.
>> */
>> #include <linux/fscache.h>
>> #include "internal.h"
>>
>> +static DEFINE_MUTEX(erofs_domain_list_lock);
>> +static LIST_HEAD(erofs_domain_list);
>> +static struct vfsmount *erofs_pseudo_mnt;
>> +
>> static struct netfs_io_request *erofs_fscache_alloc_request(struct address_space *mapping,
>> loff_t start, size_t len)
>> {
>> @@ -417,6 +422,87 @@ const struct address_space_operations erofs_fscache_access_aops = {
>> .readahead = erofs_fscache_readahead,
>> };
>>
>> +static void erofs_fscache_domain_get(struct erofs_domain *domain)
>> +{
>> + if (!domain)
>> + return;
>> + refcount_inc(&domain->ref);
>> +}
>
Hi Jingbo,
Thanks for your careful review.
> It seems that the input @domain can not be NULL, and thus the NULL check
> is not needed.
>
I will remove it in next version.
> Besides how about:
>
> struct erofs_domain *domain erofs_fscache_domain_get(struct erofs_domain
> *domain)
> {
> refcount_inc(&domain->ref);
> return domain;
> }
>
Thanks for the suggestion, I will apply it.
>> +
>> +static void erofs_fscache_domain_put(struct erofs_domain *domain)
>> +{
>> + if (!domain)
>> + return;
>> + if (refcount_dec_and_test(&domain->ref)) {
>> + fscache_relinquish_volume(domain->volume, NULL, false);
>> + mutex_lock(&erofs_domain_list_lock);
>> + list_del(&domain->list);
>> + mutex_unlock(&erofs_domain_list_lock);
>> + kfree(domain->domain_id);
>> + kfree(domain);
>> + }
>> +}
>> +
>> +static int erofs_fscache_init_domain(struct super_block *sb)
>> +{
>> + int err;
>> + struct erofs_domain *domain;
>> + struct erofs_sb_info *sbi = EROFS_SB(sb);
>> +
>> + domain = kzalloc(sizeof(struct erofs_domain), GFP_KERNEL);
>> + if (!domain)
>> + return -ENOMEM;
>> +
>> + domain->domain_id = kstrdup(sbi->opt.domain_id, GFP_KERNEL);
>> + if (!domain->domain_id) {
>> + kfree(domain);
>> + return -ENOMEM;
>> + }
>> + sbi->domain = domain;
>> + if (!erofs_pseudo_mnt) {
>> + erofs_pseudo_mnt = kern_mount(&erofs_fs_type);
>> + if (IS_ERR(erofs_pseudo_mnt)) {
>> + err = PTR_ERR(erofs_pseudo_mnt);
>> + goto out;
>> + }
>> + }
>> + err = erofs_fscache_register_fs(sb);
>> + if (err)
>> + goto out;
>> +
>> + domain->volume = sbi->volume;
>> + refcount_set(&domain->ref, 1);
>> + mutex_init(&domain->mutex);
>> + list_add(&domain->list, &erofs_domain_list);
>> + return 0;
>> +out:
>> + kfree(domain->domain_id);
>> + kfree(domain);
>> + sbi->domain = NULL;
>> + return err;
>> +}
>> +
>> +int erofs_fscache_register_domain(struct super_block *sb)
>> +{
>> + int err;
>> + struct erofs_domain *domain;
>> + struct erofs_sb_info *sbi = EROFS_SB(sb);
>> +
>> + mutex_lock(&erofs_domain_list_lock);
>> + list_for_each_entry(domain, &erofs_domain_list, list) {
>> + if (!strcmp(domain->domain_id, sbi->opt.domain_id)) {
>> + erofs_fscache_domain_get(domain);
>> + sbi->domain = domain;
>
> sbi->domain = erofs_fscache_domain_get(domain);
>
>> + sbi->volume = domain->volume;
>> + mutex_unlock(&erofs_domain_list_lock);
>> + return 0;
>> + }
>> + }
>> + err = erofs_fscache_init_domain(sb);
>> + mutex_unlock(&erofs_domain_list_lock);
>> + return err;
>> +}
>> +
>> int erofs_fscache_register_cookie(struct super_block *sb,
>> struct erofs_fscache **fscache,
>> char *name, bool need_inode)
>> @@ -495,7 +581,8 @@ int erofs_fscache_register_fs(struct super_block *sb)
>> char *name;
>> int ret = 0;
>>
>> - name = kasprintf(GFP_KERNEL, "erofs,%s", sbi->opt.fsid);
>> + name = kasprintf(GFP_KERNEL, "erofs,%s",
>> + sbi->domain ? sbi->domain->domain_id : sbi->opt.fsid);
>
> Do we also need to encode the cookie name in the "<domain_id>,<fsid>"
> format? This will affect the path of the cache files.
>
I think even though the cookies have the same name, they belong to
different volumes(path). Cookies do not affect each other.
Are there other benefits to doing so?
>> if (!name)
>> return -ENOMEM;
>>
>> @@ -515,6 +602,10 @@ void erofs_fscache_unregister_fs(struct super_block *sb)
>> {
>> struct erofs_sb_info *sbi = EROFS_SB(sb);
>>
>> - fscache_relinquish_volume(sbi->volume, NULL, false);
>> + if (sbi->domain)
>> + erofs_fscache_domain_put(sbi->domain);
>> + else
>> + fscache_relinquish_volume(sbi->volume, NULL, false);
>> sbi->volume = NULL;
>> + sbi->domain = NULL;
>> }
>> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
>> index fe435d077f1a..2790c93ffb83 100644
>> --- a/fs/erofs/internal.h
>> +++ b/fs/erofs/internal.h
>> @@ -99,6 +99,14 @@ struct erofs_sb_lz4_info {
>> u16 max_pclusterblks;
>> };
>>
>> +struct erofs_domain {
>> + refcount_t ref;
>> + struct mutex mutex;
>> + struct list_head list;
>> + struct fscache_volume *volume;
>> + char *domain_id;
>> +};
>> +
>> struct erofs_fscache {
>> struct fscache_cookie *cookie;
>> struct inode *inode;
>> @@ -158,6 +166,7 @@ struct erofs_sb_info {
>> /* fscache support */
>> struct fscache_volume *volume;
>> struct erofs_fscache *s_fscache;
>> + struct erofs_domain *domain;
>> };
>>
>> #define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info)
>> @@ -394,6 +403,7 @@ struct page *erofs_grab_cache_page_nowait(struct address_space *mapping,
>> }
>>
>> extern const struct super_operations erofs_sops;
>> +extern struct file_system_type erofs_fs_type;
>>
>> extern const struct address_space_operations erofs_raw_access_aops;
>> extern const struct address_space_operations z_erofs_aops;
>> @@ -610,6 +620,7 @@ static inline int z_erofs_load_lzma_config(struct super_block *sb,
>> #ifdef CONFIG_EROFS_FS_ONDEMAND
>> int erofs_fscache_register_fs(struct super_block *sb);
>> void erofs_fscache_unregister_fs(struct super_block *sb);
>> +int erofs_fscache_register_domain(struct super_block *sb);
>>
>> int erofs_fscache_register_cookie(struct super_block *sb,
>> struct erofs_fscache **fscache,
>> @@ -620,10 +631,15 @@ extern const struct address_space_operations erofs_fscache_access_aops;
>> #else
>> static inline int erofs_fscache_register_fs(struct super_block *sb)
>> {
>> - return 0;
>> + return -EOPNOTSUPP;
>> }
>> static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
>>
>> +static inline int erofs_fscache_register_domain(const struct super_block *sb)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> +
>> static inline int erofs_fscache_register_cookie(struct super_block *sb,
>> struct erofs_fscache **fscache,
>> char *name, bool need_inode)
>> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
>> index d01109069c6b..69de1731f454 100644
>> --- a/fs/erofs/super.c
>> +++ b/fs/erofs/super.c
>> @@ -688,6 +688,13 @@ static const struct export_operations erofs_export_ops = {
>> .get_parent = erofs_get_parent,
>> };
>>
>> +static int erofs_fc_fill_pseudo_super(struct super_block *sb, struct fs_context *fc)
>> +{
>> + static const struct tree_descr empty_descr = {""};
>> +
>> + return simple_fill_super(sb, EROFS_SUPER_MAGIC, &empty_descr);
>> +}
>> +
>> static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
>> {
>> struct inode *inode;
>> @@ -715,12 +722,17 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
>> sb->s_blocksize = EROFS_BLKSIZ;
>> sb->s_blocksize_bits = LOG_BLOCK_SIZE;
>>
>> - err = erofs_fscache_register_fs(sb);
>> - if (err)
>> - return err;
>> -
>> - err = erofs_fscache_register_cookie(sb, &sbi->s_fscache,
>> - sbi->opt.fsid, true);
>> + if (sbi->opt.domain_id) {
>> + err = erofs_fscache_register_domain(sb);
>> + if (err)
>> + return err;
>> + } else {
>> + err = erofs_fscache_register_fs(sb);
>> + if (err)
>> + return err;
>> + err = erofs_fscache_register_cookie(sb, &sbi->s_fscache,
>> + sbi->opt.fsid, true);
>
> We'd better keep only one entry to the fscache related codes. How about
> moving erofs_fscache_register_cookie(), i.e. registering cookie for
> bootstrap, into erofs_fscache_register_fs()? Similarly, check the
> domain_id and call erofs_fscache_register_domain() inside
> erofs_fscache_register_fs().
>
> Similarly, check domain_id and call erofs_domain_register_cookie()
> inside erofs_fscache_register_cookie().
>
Thanks, that looks great, I will revise it in next version.
>
>
>> + }
>> if (err)
>> return err;
>>
>> @@ -798,8 +810,12 @@ static int erofs_fc_get_tree(struct fs_context *fc)
>> {
>> struct erofs_fs_context *ctx = fc->fs_private;
>>
>> - if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && ctx->opt.fsid)
>> - return get_tree_nodev(fc, erofs_fc_fill_super);
>> + if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND)) {
>> + if (!ctx && fc->sb_flags & SB_KERNMOUNT)
>> + return get_tree_nodev(fc, erofs_fc_fill_pseudo_super);
>> + if (ctx->opt.fsid)
>> + return get_tree_nodev(fc, erofs_fc_fill_super);
>> + }
>>
>> return get_tree_bdev(fc, erofs_fc_fill_super);
>> }
>> @@ -849,6 +865,8 @@ static void erofs_fc_free(struct fs_context *fc)
>> {
>> struct erofs_fs_context *ctx = fc->fs_private;
>>
>> + if (!ctx)
>> + return;
>> erofs_free_dev_context(ctx->devs);
>> kfree(ctx->opt.fsid);
>> kfree(ctx->opt.domain_id);
>> @@ -864,8 +882,12 @@ static const struct fs_context_operations erofs_context_ops = {
>>
>> static int erofs_init_fs_context(struct fs_context *fc)
>> {
>> - struct erofs_fs_context *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
>> + struct erofs_fs_context *ctx;
>>
>> + if (fc->sb_flags & SB_KERNMOUNT)
>> + goto out;
>> +
>> + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
>> if (!ctx)
>> return -ENOMEM;
>> ctx->devs = kzalloc(sizeof(struct erofs_dev_context), GFP_KERNEL);
>> @@ -878,6 +900,7 @@ static int erofs_init_fs_context(struct fs_context *fc)
>> idr_init(&ctx->devs->tree);
>> init_rwsem(&ctx->devs->rwsem);
>> erofs_default_options(ctx);
>> +out:
>> fc->ops = &erofs_context_ops;
>> return 0;
>> }
>> @@ -892,6 +915,10 @@ static void erofs_kill_sb(struct super_block *sb)
>>
>> WARN_ON(sb->s_magic != EROFS_SUPER_MAGIC);
>>
>> + if (sb->s_flags & SB_KERNMOUNT) {
>> + kill_litter_super(sb);
>> + return;
>> + }
>> if (erofs_is_fscache_mode(sb))
>> generic_shutdown_super(sb);
>> else
>> @@ -916,8 +943,8 @@ static void erofs_put_super(struct super_block *sb)
>> {
>> struct erofs_sb_info *const sbi = EROFS_SB(sb);
>>
>> - DBG_BUGON(!sbi);
>> -
>> + if (!sbi)
>> + return;
>> erofs_unregister_sysfs(sb);
>> erofs_shrinker_unregister(sb);
>> #ifdef CONFIG_EROFS_FS_ZIP
>> @@ -927,7 +954,7 @@ static void erofs_put_super(struct super_block *sb)
>> erofs_fscache_unregister_cookie(&sbi->s_fscache);
>> }
>>
>> -static struct file_system_type erofs_fs_type = {
>> +struct file_system_type erofs_fs_type = {
>> .owner = THIS_MODULE,
>> .name = "erofs",
>> .init_fs_context = erofs_init_fs_context,
>
在 2022/9/9 18:28, JeffleXu 写道:
>
>
> On 9/9/22 5:55 PM, JeffleXu wrote:
>>
>>
>> On 9/2/22 6:53 PM, Jia Zhu wrote:
>>> In erofs umount scenario, erofs_fscache_unregister_cookie() is called
>>> twice in kill_sb() and put_super().
>>>
>>> It works for original semantics, cause 'ctx' will be set to NULL in
>>> put_super() and will not be unregister again in kill_sb().
>>> However, in shared domain scenario, we use refcount to maintain the
>>> lifecycle of cookie. Unregister the cookie twice will cause it to be
>>> released early.
>
> Sorry, why can't we also set sbi->s_fscache to NULL after decrementing
> the refcount in shared domain mode, to avoid the refcount being
> decremented twice?
>
In below case:
1. mount -t erofs none -o fsid=bootstrap.img -o device=blob.img -o
domain_id=1 mnt
2. mount -t erofs none -o fsid=bootstrap.img -o device=blob.img -o
domain_id=1 mnt2
3. Procedure 2 will fail since we can't mount erofs with the same sysfs
entry.
4. Mount()'s error handling path will step into put_super().
5. Domain's refcnt will be decremented even if it has not yet
been incremented.
>>>
>>> For the above reasons, this patch removes duplicate unregister_cookie
>>> and move fscache_unregister_* before shotdown_super() to prevent busy
>>> inode(ctx->inode) when umount.
>>>
>>> Signed-off-by: Jia Zhu <[email protected]>
>>> ---
>>> fs/erofs/super.c | 16 ++++++++--------
>>> 1 file changed, 8 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
>>> index 69de1731f454..667a78f0ee70 100644
>>> --- a/fs/erofs/super.c
>>> +++ b/fs/erofs/super.c
>>> @@ -919,19 +919,20 @@ static void erofs_kill_sb(struct super_block *sb)
>>> kill_litter_super(sb);
>>> return;
>>> }
>>> - if (erofs_is_fscache_mode(sb))
>>> - generic_shutdown_super(sb);
>>> - else
>>> - kill_block_super(sb);
>>> -
>>> sbi = EROFS_SB(sb);
>>> if (!sbi)
>>> return;
>>>
>>> + if (erofs_is_fscache_mode(sb)) {
>>> + erofs_fscache_unregister_cookie(&sbi->s_fscache);
>>> + erofs_fscache_unregister_fs(sb);
>>> + generic_shutdown_super(sb);
>>
>> Generally we can't do clean ups before generic_shutdown_super(), since
>> generic_shutdown_super() may trigger IO, e.g. in sync_filesystem(),
>> though it's not the case for erofs (read-only).
>>
>> How about embedding erofs_fscache_unregister_cookie() into
>> erofs_fscache_unregister_fs(), and thus we can check domain_id in
>> erofs_fscache_unregister_fs()?
>>
>>> + } else {
>>> + kill_block_super(sb);
>>> + }
>>> +
>>> erofs_free_dev_context(sbi->devs);
>>> fs_put_dax(sbi->dax_dev, NULL);
>>> - erofs_fscache_unregister_cookie(&sbi->s_fscache);
>>> - erofs_fscache_unregister_fs(sb);
>>> kfree(sbi->opt.fsid);
>>> kfree(sbi->opt.domain_id);
>>> kfree(sbi);
>>> @@ -951,7 +952,6 @@ static void erofs_put_super(struct super_block *sb)
>>> iput(sbi->managed_cache);
>>> sbi->managed_cache = NULL;
>>> #endif
>>> - erofs_fscache_unregister_cookie(&sbi->s_fscache);
>>> }
>>>
>>> struct file_system_type erofs_fs_type = {
>>
>
On 9/13/22 12:31 PM, Jia Zhu wrote:
>
>
> 在 2022/9/9 16:42, JeffleXu 写道:
>>> int erofs_fscache_register_cookie(struct super_block *sb,
>>> struct erofs_fscache **fscache,
>>> char *name, bool need_inode)
>>> @@ -495,7 +581,8 @@ int erofs_fscache_register_fs(struct super_block
>>> *sb)
>>> char *name;
>>> int ret = 0;
>>> - name = kasprintf(GFP_KERNEL, "erofs,%s", sbi->opt.fsid);
>>> + name = kasprintf(GFP_KERNEL, "erofs,%s",
>>> + sbi->domain ? sbi->domain->domain_id : sbi->opt.fsid);
>>
>> Do we also need to encode the cookie name in the "<domain_id>,<fsid>"
>> format? This will affect the path of the cache files.
>>
> I think even though the cookies have the same name, they belong to
> different volumes(path). Cookies do not affect each other.
> Are there other benefits to doing so?
Okay. The current implementation is correct. Please ignore the noise.
--
Thanks,
Jingbo
On 9/2/22 6:53 PM, Jia Zhu wrote:
> Several erofs filesystems can belong to one domain, and data blobs can
> be shared among these erofs filesystems of same domain.
>
> Users could specify domain_id mount option to create or join into a domain.
>
> Signed-off-by: Jia Zhu <[email protected]>
> ---
> fs/erofs/fscache.c | 73 +++++++++++++++++++++++++++++++++++++++++++++
> fs/erofs/internal.h | 12 ++++++++
> fs/erofs/super.c | 10 +++++--
> 3 files changed, 93 insertions(+), 2 deletions(-)
>
> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
> index 439dd3cc096a..c01845808ede 100644
> --- a/fs/erofs/fscache.c
> +++ b/fs/erofs/fscache.c
> @@ -559,12 +559,27 @@ int erofs_fscache_register_cookie(struct super_block *sb,
> void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
> {
> struct erofs_fscache *ctx = *fscache;
> + struct erofs_domain *domain;
>
> if (!ctx)
> return;
> + domain = ctx->domain;
> + if (domain) {
> + mutex_lock(&domain->mutex);
> + /* Cookie is still in use */
> + if (atomic_read(&ctx->anon_inode->i_count) > 1) {
> + iput(ctx->anon_inode);
> + mutex_unlock(&domain->mutex);
> + return;
> + }
> + iput(ctx->anon_inode);
> + kfree(ctx->name);
> + mutex_unlock(&domain->mutex);
> + }
>
> fscache_unuse_cookie(ctx->cookie, NULL, NULL);
> fscache_relinquish_cookie(ctx->cookie, false);
> + erofs_fscache_domain_put(domain);
> ctx->cookie = NULL;
>
> iput(ctx->inode);
> @@ -609,3 +624,61 @@ void erofs_fscache_unregister_fs(struct super_block *sb)
> sbi->volume = NULL;
> sbi->domain = NULL;
> }
> +
> +static int erofs_fscache_domain_init_cookie(struct super_block *sb,
> + struct erofs_fscache **fscache, char *name, bool need_inode)
> +{
> + int ret;
> + struct inode *inode;
> + struct erofs_fscache *ctx;
> + struct erofs_sb_info *sbi = EROFS_SB(sb);
> + struct erofs_domain *domain = sbi->domain;
> +
> + ret = erofs_fscache_register_cookie(sb, &ctx, name, need_inode);
> + if (ret)
> + return ret;
> +
> + ctx->name = kstrdup(name, GFP_KERNEL);
> + if (!ctx->name)
> + return -ENOMEM;
Shall we clean up the above registered cookie in the error path?
> +
> + inode = new_inode(erofs_pseudo_mnt->mnt_sb);
> + if (!inode) {
> + kfree(ctx->name);
> + return -ENOMEM;
> + }
Ditto.
> +
> + ctx->domain = domain;
> + ctx->anon_inode = inode;
> + inode->i_private = ctx;
> + erofs_fscache_domain_get(domain);
> + *fscache = ctx;
> + return 0;
> +}
> +
> +int erofs_domain_register_cookie(struct super_block *sb,
> + struct erofs_fscache **fscache, char *name, bool need_inode)
> +{
> + int err;
> + struct inode *inode;
> + struct erofs_fscache *ctx;
> + struct erofs_sb_info *sbi = EROFS_SB(sb);
> + struct erofs_domain *domain = sbi->domain;
> + struct super_block *psb = erofs_pseudo_mnt->mnt_sb;
> +
> + mutex_lock(&domain->mutex);
What is domain->mutex used for?
> + list_for_each_entry(inode, &psb->s_inodes, i_sb_list) {
> + ctx = inode->i_private;
> + if (!ctx)
> + continue;
> + if (!strcmp(ctx->name, name)) {
> + *fscache = ctx;
> + igrab(inode);
> + mutex_unlock(&domain->mutex);
> + return 0;
> + }
> + }
> + err = erofs_fscache_domain_init_cookie(sb, fscache, name, need_inode);
> + mutex_unlock(&domain->mutex);
> + return err;
> +}
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 2790c93ffb83..efa4f4ad77cc 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -110,6 +110,9 @@ struct erofs_domain {
> struct erofs_fscache {
> struct fscache_cookie *cookie;
> struct inode *inode;
> + struct inode *anon_inode;
Why can't we reuse @inode for anon_inode?
> + struct erofs_domain *domain;
> + char *name;
> };
>
> struct erofs_sb_info {
> @@ -625,6 +628,9 @@ int erofs_fscache_register_domain(struct super_block *sb);
> int erofs_fscache_register_cookie(struct super_block *sb,
> struct erofs_fscache **fscache,
> char *name, bool need_inode);
> +int erofs_domain_register_cookie(struct super_block *sb,
> + struct erofs_fscache **fscache,
> + char *name, bool need_inode);
> void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache);
>
> extern const struct address_space_operations erofs_fscache_access_aops;
> @@ -646,6 +652,12 @@ static inline int erofs_fscache_register_cookie(struct super_block *sb,
> {
> return -EOPNOTSUPP;
> }
> +static inline int erofs_domain_register_cookie(struct super_block *sb,
> + struct erofs_fscache **fscache,
> + char *name, bool need_inode)
> +{
> + return -EOPNOTSUPP;
> +}
>
> static inline void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
> {
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index 667a78f0ee70..11c5ba84567c 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -245,8 +245,12 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
> }
>
> if (erofs_is_fscache_mode(sb)) {
> - ret = erofs_fscache_register_cookie(sb, &dif->fscache,
> - dif->path, false);
> + if (sbi->opt.domain_id)
> + ret = erofs_domain_register_cookie(sb, &dif->fscache, dif->path,
> + false);
> + else
> + ret = erofs_fscache_register_cookie(sb, &dif->fscache, dif->path,
> + false);
> if (ret)
> return ret;
> } else {
> @@ -726,6 +730,8 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
> err = erofs_fscache_register_domain(sb);
> if (err)
> return err;
> + err = erofs_domain_register_cookie(sb, &sbi->s_fscache,
> + sbi->opt.fsid, true);
> } else {
> err = erofs_fscache_register_fs(sb);
> if (err)
--
Thanks,
Jingbo
在 2022/9/13 14:27, JeffleXu 写道:
>
>
> On 9/2/22 6:53 PM, Jia Zhu wrote:
>> Several erofs filesystems can belong to one domain, and data blobs can
>> be shared among these erofs filesystems of same domain.
>>
>> Users could specify domain_id mount option to create or join into a domain.
>>
>> Signed-off-by: Jia Zhu <[email protected]>
>> ---
>> fs/erofs/fscache.c | 73 +++++++++++++++++++++++++++++++++++++++++++++
>> fs/erofs/internal.h | 12 ++++++++
>> fs/erofs/super.c | 10 +++++--
>> 3 files changed, 93 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
>> index 439dd3cc096a..c01845808ede 100644
>> --- a/fs/erofs/fscache.c
>> +++ b/fs/erofs/fscache.c
>> @@ -559,12 +559,27 @@ int erofs_fscache_register_cookie(struct super_block *sb,
>> void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
>> {
>> struct erofs_fscache *ctx = *fscache;
>> + struct erofs_domain *domain;
>>
>> if (!ctx)
>> return;
>> + domain = ctx->domain;
>> + if (domain) {
>> + mutex_lock(&domain->mutex);
>> + /* Cookie is still in use */
>> + if (atomic_read(&ctx->anon_inode->i_count) > 1) {
>> + iput(ctx->anon_inode);
>> + mutex_unlock(&domain->mutex);
>> + return;
>> + }
>> + iput(ctx->anon_inode);
>> + kfree(ctx->name);
>> + mutex_unlock(&domain->mutex);
>> + }
>>
>> fscache_unuse_cookie(ctx->cookie, NULL, NULL);
>> fscache_relinquish_cookie(ctx->cookie, false);
>> + erofs_fscache_domain_put(domain);
>> ctx->cookie = NULL;
>>
>> iput(ctx->inode);
>> @@ -609,3 +624,61 @@ void erofs_fscache_unregister_fs(struct super_block *sb)
>> sbi->volume = NULL;
>> sbi->domain = NULL;
>> }
>> +
>> +static int erofs_fscache_domain_init_cookie(struct super_block *sb,
>> + struct erofs_fscache **fscache, char *name, bool need_inode)
>> +{
>> + int ret;
>> + struct inode *inode;
>> + struct erofs_fscache *ctx;
>> + struct erofs_sb_info *sbi = EROFS_SB(sb);
>> + struct erofs_domain *domain = sbi->domain;
>> +
>> + ret = erofs_fscache_register_cookie(sb, &ctx, name, need_inode);
>> + if (ret)
>> + return ret;
>> +
>> + ctx->name = kstrdup(name, GFP_KERNEL);
>> + if (!ctx->name)
>> + return -ENOMEM;
>
> Shall we clean up the above registered cookie in the error path?
>
If this step fails, error will be transmitted to vfs_get_tree() and
erofs_kill_sb() will relinquish the cookie.
>> +
>> + inode = new_inode(erofs_pseudo_mnt->mnt_sb);
>> + if (!inode) {
>> + kfree(ctx->name);
>> + return -ENOMEM;
>> + }
>
> Ditto.
>
>> +
>> + ctx->domain = domain;
>> + ctx->anon_inode = inode;
>> + inode->i_private = ctx;
>> + erofs_fscache_domain_get(domain);
>> + *fscache = ctx;
>> + return 0;
>> +}
>> +
>> +int erofs_domain_register_cookie(struct super_block *sb,
>> + struct erofs_fscache **fscache, char *name, bool need_inode)
>> +{
>> + int err;
>> + struct inode *inode;
>> + struct erofs_fscache *ctx;
>> + struct erofs_sb_info *sbi = EROFS_SB(sb);
>> + struct erofs_domain *domain = sbi->domain;
>> + struct super_block *psb = erofs_pseudo_mnt->mnt_sb;
>> +
>> + mutex_lock(&domain->mutex);
>
> What is domain->mutex used for?
>
This lock is used to avoid race conditions between cookie's
traverse/del/insert in the inode list.
It seems to be possible to increase the granularity of the lock
after v2's change "Only initialize one pseudo fs to manage anonymous
inodes(cookies).".
>
>> + list_for_each_entry(inode, &psb->s_inodes, i_sb_list) {
>> + ctx = inode->i_private;
>> + if (!ctx)
>> + continue;
>> + if (!strcmp(ctx->name, name)) {
>> + *fscache = ctx;
>> + igrab(inode);
>> + mutex_unlock(&domain->mutex);
>> + return 0;
>> + }
>> + }
>> + err = erofs_fscache_domain_init_cookie(sb, fscache, name, need_inode);
>> + mutex_unlock(&domain->mutex);
>> + return err;
>> +}
>> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
>> index 2790c93ffb83..efa4f4ad77cc 100644
>> --- a/fs/erofs/internal.h
>> +++ b/fs/erofs/internal.h
>> @@ -110,6 +110,9 @@ struct erofs_domain {
>> struct erofs_fscache {
>> struct fscache_cookie *cookie;
>> struct inode *inode;
>> + struct inode *anon_inode;
>
> Why can't we reuse @inode for anon_inode?
>
We use pseudo sb's anonymous inodes list to manage the cookie.
Wouldn't it be a bit of messy if reuses erofs meta inode?
>
>> + struct erofs_domain *domain;
>> + char *name;
>> };
>>
>> struct erofs_sb_info {
>> @@ -625,6 +628,9 @@ int erofs_fscache_register_domain(struct super_block *sb);
>> int erofs_fscache_register_cookie(struct super_block *sb,
>> struct erofs_fscache **fscache,
>> char *name, bool need_inode);
>> +int erofs_domain_register_cookie(struct super_block *sb,
>> + struct erofs_fscache **fscache,
>> + char *name, bool need_inode);
>> void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache);
>>
>> extern const struct address_space_operations erofs_fscache_access_aops;
>> @@ -646,6 +652,12 @@ static inline int erofs_fscache_register_cookie(struct super_block *sb,
>> {
>> return -EOPNOTSUPP;
>> }
>> +static inline int erofs_domain_register_cookie(struct super_block *sb,
>> + struct erofs_fscache **fscache,
>> + char *name, bool need_inode)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>>
>> static inline void erofs_fscache_unregister_cookie(struct erofs_fscache **fscache)
>> {
>> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
>> index 667a78f0ee70..11c5ba84567c 100644
>> --- a/fs/erofs/super.c
>> +++ b/fs/erofs/super.c
>> @@ -245,8 +245,12 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
>> }
>>
>> if (erofs_is_fscache_mode(sb)) {
>> - ret = erofs_fscache_register_cookie(sb, &dif->fscache,
>> - dif->path, false);
>> + if (sbi->opt.domain_id)
>> + ret = erofs_domain_register_cookie(sb, &dif->fscache, dif->path,
>> + false);
>> + else
>> + ret = erofs_fscache_register_cookie(sb, &dif->fscache, dif->path,
>> + false);
>> if (ret)
>> return ret;
>> } else {
>> @@ -726,6 +730,8 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
>> err = erofs_fscache_register_domain(sb);
>> if (err)
>> return err;
>> + err = erofs_domain_register_cookie(sb, &sbi->s_fscache,
>> + sbi->opt.fsid, true);
>> } else {
>> err = erofs_fscache_register_fs(sb);
>> if (err)
>
On 9/2/22 6:53 PM, Jia Zhu wrote:
> int erofs_fscache_register_cookie(struct super_block *sb,
> struct erofs_fscache **fscache,
> char *name, bool need_inode)
> @@ -495,7 +581,8 @@ int erofs_fscache_register_fs(struct super_block *sb)
> char *name;
> int ret = 0;
>
> - name = kasprintf(GFP_KERNEL, "erofs,%s", sbi->opt.fsid);
> + name = kasprintf(GFP_KERNEL, "erofs,%s",
> + sbi->domain ? sbi->domain->domain_id : sbi->opt.fsid);
> if (!name)
> return -ENOMEM;
>
What if domain_id and fsid has the same value?
How about the format "erofs,<domain_id>,<fsid>"? While in the
non-share-domain mode, is the format like "erofs,,<fsid>" or the default
"erofs,<fsid>"?
--
Thanks,
Jingbo
在 2022/9/14 11:02, JeffleXu 写道:
>
>
> On 9/2/22 6:53 PM, Jia Zhu wrote:
>
>> int erofs_fscache_register_cookie(struct super_block *sb,
>> struct erofs_fscache **fscache,
>> char *name, bool need_inode)
>> @@ -495,7 +581,8 @@ int erofs_fscache_register_fs(struct super_block *sb)
>> char *name;
>> int ret = 0;
>>
>> - name = kasprintf(GFP_KERNEL, "erofs,%s", sbi->opt.fsid);
>> + name = kasprintf(GFP_KERNEL, "erofs,%s",
>> + sbi->domain ? sbi->domain->domain_id : sbi->opt.fsid);
>> if (!name)
>> return -ENOMEM;
>>
>
> What if domain_id and fsid has the same value?
>
> How about the format "erofs,<domain_id>,<fsid>"? While in the
> non-share-domain mode, is the format like "erofs,,<fsid>" or the default
> "erofs,<fsid>"?
>
Thanks for pointing this out. I'll revise it.
>