2022-03-25 18:55:30

by Jingbo Xu

Subject: [PATCH v6 00/22] fscache,erofs: fscache-based on-demand read semantics

changes since v5:
- cachefiles: Move the enabling of on-demand read mode to the end of the
cachefiles subset of the patchset (David Howells) (patch 6)
- cachefiles: avoid the duplicate kstrdup() when handling cinit command.
Also polish the commit message with the suggestion from David
Howells. (David Howells) (patch 3)
- cachefiles: reuse the spinlock inside xarray to prevent the race
condition, which also fixes GFP_KERNEL allocation while holding
rw_lock (Matthew Wilcox) (patch 3)
- cachefiles: completion of READ request is done through
CACHEFILES_IOC_CREAD ioctl on anon_fd (David Howells) (patch 5)
- erofs: rename erofs_bdev_mode() to erofs_is_nodev_mode() (Gao Xiang)
(patch 10)
- erofs: expand the existing "struct erofs_map_blocks" rather than create
a new "struct erofs_fscache_map" (Gao Xiang) (patch 17)
- erofs: fold functions handling readahead for inline/non-inline/hole
into one function, which also omits use of "struct
erofs_fscache_ra_ctx" (Gao Xiang) (patch 21)
- erofs: use folio APIs, though there's an assumption that folio size
equals PAGE_SIZE (Gao Xiang)
- erofs: rename "-o uuid=" mount option to "-o tag=" (Gao Xiang) (patch
22)


Kernel Patchset
---------------
Git tree:

[email protected]:lostjeffle/linux.git jingbo/dev-erofs-fscache-v6

Gitweb:

https://github.com/lostjeffle/linux/commits/jingbo/dev-erofs-fscache-v6


User Daemon for Quick Test
--------------------------
Git tree:

[email protected]:lostjeffle/demand-read-cachefilesd.git main

Gitweb:

https://github.com/lostjeffle/demand-read-cachefilesd


RFC: https://lore.kernel.org/all/[email protected]/t/
v1: https://lore.kernel.org/lkml/[email protected]/T/
v2: https://lore.kernel.org/all/[email protected]/t/
v3: https://lore.kernel.org/lkml/[email protected]/T/
v4: https://lore.kernel.org/lkml/[email protected]/T/#t
v5: https://lore.kernel.org/lkml/[email protected]/T/


[Background]
============
Nydus [1] is an image distribution service especially optimized for
distribution over the network. Nydus is an excellent container image
acceleration solution: it only pulls data from remote when needed,
a.k.a. on-demand reading, and it also supports chunk-based deduplication,
compression, etc.

erofs (Enhanced Read-Only File System) is a filesystem designed for
read-only scenarios. (Documentation/filesystems/erofs.rst)

Over the past months we've been focusing on supporting the Nydus image
service with the in-kernel erofs format [2]. In that case, each container
image is organized into one bootstrap (metadata) file and optionally
multiple data blob files in erofs format. Massive numbers of container
images will be stored on one machine.

To accelerate container startup (fetching container images from remote
and then starting the container), we hope that the bootstrap & blob files
can support on-demand read. That is, erofs can be mounted and accessed
even when the bootstrap/data blob files have not been fully downloaded,
and will then deliver native performance once the data is available
locally.

That means we have to manage the cache state of the bootstrap/data blob
files (on cache hit, read directly from the local cache; on cache miss,
fetch the data somehow). It would be painful, and likely redundant, for
erofs to implement this cache management itself. Thus we prefer to let
fscache/cachefiles do the cache management instead.

The on-demand read feature aims to be implemented in a generic way
inside the fscache subsystem, so that it can also benefit other use
cases and/or filesystems.

[1] https://nydus.dev
[2] https://sched.co/pcdL


[Overall Design]
================
Please refer to patch 7 ("cachefiles: document on-demand read mode") for
more details.

When working in the original mode, cachefiles mainly serves as a local
cache for a remote networking fs, while in on-demand read mode, cachefiles
can work in scenarios where on-demand read semantics are needed, e.g.
container image distribution.

The essential difference between these two modes is that, in the original
mode, on a cache miss the netfs itself fetches data from remote and then
writes the fetched data into the cache file, while in on-demand read mode
a user daemon is responsible for fetching the data and feeding it to the
kernel fscache side.

The on-demand read mode relies on a simple protocol used for communication
between kernel and user daemon.

The proposed implementation relies on the anonymous fd mechanism to avoid
depending on the format of the cache file. When a fscache cache file is
opened for the first time, an anon_fd associated with the cache file is
sent to the user daemon. With the given anon_fd, the user daemon can
fetch and write data into the cache file in the background, even before
the kernel has triggered a cache miss. Besides, a write() syscall on the
anon_fd ends up in the cachefiles kernel module, which writes the data to
the cache file in the latest cache-file format.

1. cache miss
On a cache miss, the cachefiles kernel module notifies the user daemon
with the anon_fd, along with the requested file range. When notified,
the user daemon fetches the data for the requested file range, and then
writes the fetched data into the cache file through the given anonymous
fd. When it has finished processing the request, the user daemon
notifies the kernel.

After notifying the user daemon, the kernel read routine will sleep
until the request has been handled by the user daemon. Once woken by the
notification from the user daemon, i.e. once the corresponding hole has
been filled by the user daemon, it will retry the read from the same
file range.
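To make the flow above concrete, here is a rough user-space sketch of the
daemon side. The request struct layout, field names, and the ioctl argument
convention are hypothetical placeholders (the real UAPI is defined in
include/uapi/linux/cachefiles.h by patch 5); only the overall sequence —
receive a missing range, write the fetched data through the anon_fd, then
complete the request via the CACHEFILES_IOC_CREAD ioctl — mirrors the
protocol described above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical READ request as delivered by the kernel; the actual
 * layout is defined by the cachefiles UAPI, not by this sketch. */
struct demo_read_req {
	uint64_t id;      /* request id, echoed back on completion */
	uint64_t off;     /* start of the missing file range */
	uint64_t len;     /* length of the missing file range */
	int      anon_fd; /* anon_fd of the cache file */
};

/* Step 1: store the fetched data into the cache file.  The write lands
 * in the cachefiles module, which takes care of the on-disk cache-file
 * format, so the daemon never needs to know it. */
static ssize_t demo_write_range(int anon_fd, const void *src,
				uint64_t off, uint64_t len)
{
	return pwrite(anon_fd, src, len, off);
}

/* Step 2: wake up the kernel read routine blocked on this range
 * (CACHEFILES_IOC_CREAD per the changelog; the argument convention
 * here is assumed). */
static int demo_complete_read(const struct demo_read_req *req,
			      unsigned long ioc_cread)
{
	return ioctl(req->anon_fd, ioc_cread, req->id);
}
```

The write step behaves like a plain pwrite() from the daemon's point of
view, which is exactly what makes the cache-file format opaque to user
space.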

2. cache hit
Once the data is ready in the cache file, the netfs will read from the
cache file directly.


[Advantages of fscache-based on-demand read]
============================================
1. Asynchronous Prefetch
In the current mechanism, fscache is responsible for cache state
management, while the data plane (fetching data from local/remote on
cache miss) is done on the user daemon side.

If the data is already ready in the backing file, the netfs (e.g. erofs)
will read from the backing file directly and no longer trap into user
space. Thus the user daemon can fetch data (from remote) asynchronously
in the background, accelerating access to the backing file to some
degree.

2. Support for massive blob files
Besides, this mechanism supports a large number of backing files, and
can thus benefit densely deployed scenarios.

In our use case, one container image can correspond to one bootstrap
file (required) and multiple data blob files (optional). For example,
one container image for node.js corresponds to ~20 files in total. In a
densely deployed environment, there can be as many as hundreds of
containers, and thus thousands of backing files, on one machine.



Jeffle Xu (22):
fscache: export fscache_end_operation()
cachefiles: extract write routine
cachefiles: notify user daemon with anon_fd when looking up cookie
cachefiles: notify user daemon when withdrawing cookie
cachefiles: implement on-demand read
cachefiles: enable on-demand read mode
cachefiles: document on-demand read mode
erofs: use meta buffers for erofs_read_superblock()
erofs: make erofs_map_blocks() generally available
erofs: add mode checking helper
erofs: register global fscache volume
erofs: add cookie context helper functions
erofs: add anonymous inode managing page cache of blob file
erofs: add erofs_fscache_read_folios() helper
erofs: register cookie context for bootstrap blob
erofs: implement fscache-based metadata read
erofs: implement fscache-based data read for non-inline layout
erofs: implement fscache-based data read for inline layout
erofs: register cookie context for data blobs
erofs: implement fscache-based data read for data blobs
erofs: implement fscache-based data readahead
erofs: add 'tag' mount option

.../filesystems/caching/cachefiles.rst | 178 +++++++
fs/cachefiles/Kconfig | 11 +
fs/cachefiles/Makefile | 1 +
fs/cachefiles/daemon.c | 89 +++-
fs/cachefiles/interface.c | 2 +
fs/cachefiles/internal.h | 64 +++
fs/cachefiles/io.c | 72 ++-
fs/cachefiles/namei.c | 16 +-
fs/cachefiles/ondemand.c | 456 ++++++++++++++++++
fs/erofs/Kconfig | 10 +
fs/erofs/Makefile | 1 +
fs/erofs/data.c | 24 +-
fs/erofs/fscache.c | 444 +++++++++++++++++
fs/erofs/inode.c | 8 +-
fs/erofs/internal.h | 51 ++
fs/erofs/super.c | 115 ++++-
fs/fscache/internal.h | 11 -
fs/nfs/fscache.c | 8 -
include/linux/fscache.h | 15 +
include/linux/netfs.h | 1 +
include/trace/events/cachefiles.h | 2 +
include/uapi/linux/cachefiles.h | 55 +++
22 files changed, 1544 insertions(+), 90 deletions(-)
create mode 100644 fs/cachefiles/ondemand.c
create mode 100644 fs/erofs/fscache.c
create mode 100644 include/uapi/linux/cachefiles.h

--
2.27.0


2022-03-25 18:57:38

by Jingbo Xu

Subject: [PATCH v6 16/22] erofs: implement fscache-based metadata read

Implements the data plane of reading metadata from the bootstrap blob
file over fscache.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/data.c | 17 +++++++++++++++--
fs/erofs/fscache.c | 34 ++++++++++++++++++++++++++++++++++
fs/erofs/internal.h | 8 ++++++++
3 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 6e2a28242453..b4571bea93d5 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -31,15 +31,28 @@ void erofs_put_metabuf(struct erofs_buf *buf)
void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
erofs_blk_t blkaddr, enum erofs_kmap_type type)
{
- struct address_space *const mapping = sb->s_bdev->bd_inode->i_mapping;
+ struct address_space *mapping;
+ struct erofs_sb_info *sbi = EROFS_SB(sb);
erofs_off_t offset = blknr_to_addr(blkaddr);
pgoff_t index = offset >> PAGE_SHIFT;
struct page *page = buf->page;

if (!page || page->index != index) {
erofs_put_metabuf(buf);
- page = read_cache_page_gfp(mapping, index,
+ if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) &&
+ erofs_is_nodev_mode(sb)) {
+ struct folio *folio;
+
+ folio = erofs_fscache_get_folio(sbi->bootstrap, index);
+ if (IS_ERR(folio))
+ page = (struct page *)folio;
+ else
+ page = folio_page(folio, 0);
+ } else {
+ mapping = sb->s_bdev->bd_inode->i_mapping;
+ page = read_cache_page_gfp(mapping, index,
mapping_gfp_constraint(mapping, ~__GFP_FS));
+ }
if (IS_ERR(page))
return page;
/* should already be PageUptodate, no need to lock page */
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 6a55f7b5f883..91377939b4f7 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -46,9 +46,43 @@ static inline int erofs_fscache_read_folio(struct fscache_cookie *cookie,
pstart);
}

+static int erofs_fscache_readpage_blob(struct file *data, struct page *page)
+{
+ int ret;
+ struct erofs_fscache *ctx = (struct erofs_fscache *)data;
+ struct folio *folio = page_folio(page);
+
+ ret = erofs_fscache_read_folio(ctx->cookie, folio, folio_pos(folio));
+ if (!ret)
+ folio_mark_uptodate(folio);
+
+ folio_unlock(folio);
+ return ret;
+}
+
static const struct address_space_operations erofs_fscache_blob_aops = {
+ .readpage = erofs_fscache_readpage_blob,
};

+/*
+ * erofs_fscache_get_folio - find and read page cache of blob file
+ * @ctx: the context of the blob file
+ * @index: the page index
+ *
+ * Get the page cache of the blob file at the index offset. It will find the
+ * page through the address space of the anonymous inode. This function is only
+ * used to read page cache of bootstrap blob file (metadata), since currently
+ * only bootstrap blob file manages an anonymous inode inside the fscache
+ * context.
+ *
+ * Return: up-to-date folio on success, ERR_PTR() on failure.
+ */
+struct folio *erofs_fscache_get_folio(struct erofs_fscache *ctx, pgoff_t index)
+{
+ DBG_BUGON(!ctx->inode);
+ return read_mapping_folio(ctx->inode->i_mapping, index, ctx);
+}
+
static int erofs_fscache_init_cookie(struct erofs_fscache *ctx, char *path)
{
struct fscache_cookie *cookie;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index d8c886a7491e..fa89a1e3012f 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -632,6 +632,8 @@ void erofs_exit_fscache(void);
struct erofs_fscache *erofs_fscache_get(struct super_block *sb, char *path,
bool need_inode);
void erofs_fscache_put(struct erofs_fscache *ctx);
+
+struct folio *erofs_fscache_get_folio(struct erofs_fscache *ctx, pgoff_t index);
#else
static inline int erofs_init_fscache(void) { return 0; }
static inline void erofs_exit_fscache(void) {}
@@ -643,6 +645,12 @@ static inline struct erofs_fscache *erofs_fscache_get(struct super_block *sb,
return ERR_PTR(-EOPNOTSUPP);
}
static inline void erofs_fscache_put(struct erofs_fscache *ctx) {}
+
+static inline struct folio *erofs_fscache_get_folio(struct erofs_fscache *ctx,
+ pgoff_t index)
+{
+ return ERR_PTR(-EOPNOTSUPP);
+}
#endif

#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
--
2.27.0

2022-03-25 18:59:15

by Jingbo Xu

Subject: [PATCH v6 06/22] cachefiles: enable on-demand read mode

Enable on-demand read mode by adding an optional parameter to the "bind"
command.

On-demand mode will be turned on when this parameter is "ondemand", i.e.
"bind ondemand". Otherwise cachefiles will work in the original mode.
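As a usage illustration (not part of this patch), a daemon could bring the
cache up in on-demand mode roughly as follows. The "dir" and "bind"
command strings follow the existing cachefiles daemon protocol; the helper
names and the minimal error handling are this sketch's own assumptions,
and the fd to /dev/cachefiles must be kept open for as long as the cache
stays bound.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build one cachefiles daemon command line, e.g. "bind ondemand". */
static int demo_format_cmd(char *buf, size_t n,
			   const char *cmd, const char *arg)
{
	if (arg && *arg)
		return snprintf(buf, n, "%s %s", cmd, arg);
	return snprintf(buf, n, "%s", cmd);
}

/* Point the cache at a directory, then bind it in on-demand mode.
 * Returns the (still open) control fd, or -1 on error. */
static int demo_bind_ondemand(const char *cachedir)
{
	char cmd[256];
	int len, fd = open("/dev/cachefiles", O_RDWR);

	if (fd < 0)
		return -1;
	len = demo_format_cmd(cmd, sizeof(cmd), "dir", cachedir);
	if (write(fd, cmd, len) != len)
		goto err;
	len = demo_format_cmd(cmd, sizeof(cmd), "bind", "ondemand");
	if (write(fd, cmd, len) != len)
		goto err;
	return fd;
err:
	close(fd);
	return -1;
}
```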

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/cachefiles/daemon.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 91b88a499737..2c38c5361bda 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -741,11 +741,6 @@ static int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)
cache->brun_percent >= 100)
return -ERANGE;

- if (*args) {
- pr_err("'bind' command doesn't take an argument\n");
- return -EINVAL;
- }
-
if (!cache->rootdirname) {
pr_err("No cache directory specified\n");
return -EINVAL;
@@ -757,6 +752,14 @@ static int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)
return -EBUSY;
}

+ if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) &&
+ !strcmp(args, "ondemand")) {
+ set_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags);
+ } else if (*args) {
+ pr_err("'bind' command doesn't take an argument\n");
+ return -EINVAL;
+ }
+
/* Make sure we have copies of the tag string */
if (!cache->tag) {
/*
--
2.27.0

2022-03-25 19:01:07

by Jingbo Xu

Subject: [PATCH v6 18/22] erofs: implement fscache-based data read for inline layout

This patch implements the data plane of reading data from the bootstrap
blob file over fscache for the inline layout.

For the leading non-inline part, the data plane for the non-inline
layout is reused; only the tail-packing part needs special handling.
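The special handling boils down to a bounded copy plus zero padding: map
the metadata block, copy map.m_llen bytes starting at the in-block offset
into the page, and clear the remainder. A minimal user-space sketch of
that copy step (buffer sizes are illustrative and the names are this
sketch's own, not kernel symbols):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u

/* Fill one page from an inline (tail-packed) extent: copy `len` bytes
 * found at `offset` inside the metadata block, then zero-pad the rest
 * of the page, mirroring what erofs_fscache_readpage_inline() does. */
static void demo_read_inline(uint8_t *page, const uint8_t *meta_blk,
			     size_t offset, size_t len)
{
	memcpy(page, meta_blk + offset, len);
	memset(page + len, 0, DEMO_PAGE_SIZE - len);
}
```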

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/fscache.c | 45 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 4a9a4e60c15d..d75958470645 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -74,8 +74,9 @@ static int erofs_fscache_readpage_noinline(struct folio *folio,
{
struct fscache_cookie *cookie = map->m_fscache->cookie;
/*
- * 1) For FLAT_PLAIN layout, the output map.m_la shall be equal to o_la,
- * and the output map.m_pa is exactly the physical address of o_la.
+ * 1) For FLAT_PLAIN and FLAT_INLINE (the heading non tail packing part)
+ * layout, the output map.m_la shall be equal to o_la, and the output
+ * map.m_pa is exactly the physical address of o_la.
* 2) For CHUNK_BASED layout, the output map.m_la is rounded down to the
* nearest chunk boundary, and the output map.m_pa is actually the
* physical address of this chunk boundary. So we need to recalculate
@@ -86,6 +87,42 @@ static int erofs_fscache_readpage_noinline(struct folio *folio,
return erofs_fscache_read_folio(cookie, folio, start);
}

+static int erofs_fscache_readpage_inline(struct folio *folio,
+ struct erofs_map_blocks *map)
+{
+ struct inode *inode = folio_file_mapping(folio)->host;
+ struct super_block *sb = inode->i_sb;
+ struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
+ erofs_blk_t blknr;
+ size_t offset, len;
+ void *src, *dst;
+
+ /*
+ * For inline (tail packing) layout, the offset may be non-zero, which
+ * can be calculated from corresponding physical address directly.
+ * Currently only flat layout supports inline (FLAT_INLINE), and the
+ * output map.m_pa is exactly the physical address of o_la in this case.
+ */
+ offset = erofs_blkoff(map->m_pa);
+ blknr = erofs_blknr(map->m_pa);
+ len = map->m_llen;
+
+ src = erofs_read_metabuf(&buf, sb, blknr, EROFS_KMAP);
+ if (IS_ERR(src))
+ return PTR_ERR(src);
+
+ DBG_BUGON(folio_size(folio) != PAGE_SIZE);
+
+ dst = kmap(folio_page(folio, 0));
+ memcpy(dst, src + offset, len);
+ memset(dst + len, 0, PAGE_SIZE - len);
+ kunmap(folio_page(folio, 0));
+
+ erofs_put_metabuf(&buf);
+
+ return 0;
+}
+
static int erofs_fscache_do_readpage(struct folio *folio)
{
struct inode *inode = folio_file_mapping(folio)->host;
@@ -116,8 +153,12 @@ static int erofs_fscache_do_readpage(struct folio *folio)
if (ret)
return ret;

+ if (map.m_flags & EROFS_MAP_META)
+ return erofs_fscache_readpage_inline(folio, &map);
+
switch (vi->datalayout) {
case EROFS_INODE_FLAT_PLAIN:
+ case EROFS_INODE_FLAT_INLINE:
case EROFS_INODE_CHUNK_BASED:
return erofs_fscache_readpage_noinline(folio, &map);
default:
--
2.27.0

2022-03-25 19:04:27

by Jingbo Xu

Subject: [PATCH v6 15/22] erofs: register cookie context for bootstrap blob

Registers fscache_cookie for the bootstrap blob file. The bootstrap blob
file can be specified by a new mount option, which is going to be
introduced by a following patch.

A few things worth mentioning about the cleanup routine:

1. The init routine runs before the root inode gets initialized, and
thus the corresponding cleanup routine shall be placed inside the
.kill_sb() callback.

2. The init routine instantiates anonymous inodes under the super_block,
and thus the .put_super() callback shall also contain the cleanup
routine; otherwise we'll get a "VFS: Busy inodes after unmount." warning.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/internal.h | 3 +++
fs/erofs/super.c | 17 +++++++++++++++++
2 files changed, 20 insertions(+)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 459f31803c3b..d8c886a7491e 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -73,6 +73,7 @@ struct erofs_mount_opts {
/* threshold for decompression synchronously */
unsigned int max_sync_decompress_pages;
#endif
+ char *tag;
unsigned int mount_opt;
};

@@ -151,6 +152,8 @@ struct erofs_sb_info {
/* sysfs support */
struct kobject s_kobj; /* /sys/fs/erofs/<devname> */
struct completion s_kobj_unregister;
+
+ struct erofs_fscache *bootstrap;
};

#define EROFS_SB(sb) ((struct erofs_sb_info *)(sb)->s_fs_info)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 798f0c379e35..de5aeda4aea0 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -598,6 +598,16 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
sbi->devs = ctx->devs;
ctx->devs = NULL;

+ if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && erofs_is_nodev_mode(sb)) {
+ struct erofs_fscache *bootstrap;
+
+ bootstrap = erofs_fscache_get(sb, ctx->opt.tag, true);
+ if (IS_ERR(bootstrap))
+ return PTR_ERR(bootstrap);
+
+ sbi->bootstrap = bootstrap;
+ }
+
err = erofs_read_superblock(sb);
if (err)
return err;
@@ -753,6 +763,7 @@ static void erofs_kill_sb(struct super_block *sb)
return;

erofs_free_dev_context(sbi->devs);
+ erofs_fscache_put(sbi->bootstrap);
fs_put_dax(sbi->dax_dev);
kfree(sbi);
sb->s_fs_info = NULL;
@@ -771,6 +782,12 @@ static void erofs_put_super(struct super_block *sb)
iput(sbi->managed_cache);
sbi->managed_cache = NULL;
#endif
+ erofs_fscache_put(sbi->bootstrap);
+ /*
+ * Set sbi->bootstrap to NULL, so that the following cleanup routine
+ * inside .kill_sb() could be skipped then.
+ */
+ sbi->bootstrap = NULL;
}

static struct file_system_type erofs_fs_type = {
--
2.27.0

2022-03-25 19:16:05

by Jingbo Xu

Subject: [PATCH v6 10/22] erofs: add mode checking helper

Until now, erofs has been an exclusively blockdev-based filesystem. In
other usage scenarios (e.g. container images), erofs needs to run upon
files.

This patch set introduces a new nodev mode, in which erofs can be
mounted from a bootstrap blob file containing a complete erofs image.

Add a helper checking which mode erofs is working in.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/internal.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index e424293f47a2..1486e2573667 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -161,6 +161,11 @@ struct erofs_sb_info {
#define set_opt(opt, option) ((opt)->mount_opt |= EROFS_MOUNT_##option)
#define test_opt(opt, option) ((opt)->mount_opt & EROFS_MOUNT_##option)

+static inline bool erofs_is_nodev_mode(struct super_block *sb)
+{
+ return !sb->s_bdev;
+}
+
enum {
EROFS_ZIP_CACHE_DISABLED,
EROFS_ZIP_CACHE_READAHEAD,
--
2.27.0

2022-03-25 19:16:32

by Jingbo Xu

Subject: [PATCH v6 20/22] erofs: implement fscache-based data read for data blobs

Implements the data plane of reading data from data blob files over
fscache.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/data.c | 3 +++
fs/erofs/fscache.c | 15 +++++++++++++--
fs/erofs/internal.h | 1 +
3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index b4571bea93d5..b9a05de3c3b2 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -206,6 +206,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
map->m_bdev = sb->s_bdev;
map->m_daxdev = EROFS_SB(sb)->dax_dev;
map->m_dax_part_off = EROFS_SB(sb)->dax_part_off;
+ map->m_fscache = EROFS_SB(sb)->bootstrap;

if (map->m_deviceid) {
down_read(&devs->rwsem);
@@ -217,6 +218,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
map->m_bdev = dif->bdev;
map->m_daxdev = dif->dax_dev;
map->m_dax_part_off = dif->dax_part_off;
+ map->m_fscache = dif->blob;
up_read(&devs->rwsem);
} else if (devs->extra_devices) {
down_read(&devs->rwsem);
@@ -234,6 +236,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
map->m_bdev = dif->bdev;
map->m_daxdev = dif->dax_dev;
map->m_dax_part_off = dif->dax_part_off;
+ map->m_fscache = dif->blob;
break;
}
}
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index d75958470645..cbb39657615e 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -63,9 +63,20 @@ static int erofs_fscache_readpage_blob(struct file *data, struct page *page)
static inline int erofs_fscache_get_map(struct erofs_map_blocks *map,
struct super_block *sb)
{
- struct erofs_sb_info *sbi = EROFS_SB(sb);
+ struct erofs_map_dev mdev;
+ int ret;
+
+ mdev = (struct erofs_map_dev) {
+ .m_deviceid = map->m_deviceid,
+ .m_pa = map->m_pa,
+ };
+
+ ret = erofs_map_dev(sb, &mdev);
+ if (ret)
+ return ret;

- map->m_fscache = sbi->bootstrap;
+ map->m_fscache = mdev.m_fscache;
+ map->m_pa = mdev.m_pa;
return 0;
}

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 94a118caf580..cea08f12a2c3 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -487,6 +487,7 @@ struct erofs_map_dev {
struct block_device *m_bdev;
struct dax_device *m_daxdev;
u64 m_dax_part_off;
+ struct erofs_fscache *m_fscache;

erofs_off_t m_pa;
unsigned int m_deviceid;
--
2.27.0

2022-03-25 19:27:48

by Jingbo Xu

Subject: [PATCH v6 22/22] erofs: add 'tag' mount option

Introduce the 'tag' mount option to enable on-demand read semantics. In
this case, erofs can be mounted from blob files instead of a blkdev, and
users can specify the name of the bootstrap blob file containing the
complete erofs image.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/super.c | 44 ++++++++++++++++++++++++++++++++++++++------
1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 8ac400581784..6ea83f36842c 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -403,6 +403,7 @@ enum {
Opt_dax,
Opt_dax_enum,
Opt_device,
+ Opt_tag,
Opt_err
};

@@ -427,6 +428,7 @@ static const struct fs_parameter_spec erofs_fs_parameters[] = {
fsparam_flag("dax", Opt_dax),
fsparam_enum("dax", Opt_dax_enum, erofs_dax_param_enums),
fsparam_string("device", Opt_device),
+ fsparam_string("tag", Opt_tag),
{}
};

@@ -522,6 +524,16 @@ static int erofs_fc_parse_param(struct fs_context *fc,
}
++ctx->devs->extra_devices;
break;
+ case Opt_tag:
+#ifdef CONFIG_EROFS_FS_ONDEMAND
+ kfree(ctx->opt.tag);
+ ctx->opt.tag = kstrdup(param->string, GFP_KERNEL);
+ if (!ctx->opt.tag)
+ return -ENOMEM;
+#else
+ errorfc(fc, "tag option not supported");
+#endif
+ break;
default:
return -ENOPARAM;
}
@@ -596,9 +608,14 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)

sb->s_magic = EROFS_SUPER_MAGIC;

- if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
- erofs_err(sb, "failed to set erofs blksize");
- return -EINVAL;
+ if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && erofs_is_nodev_mode(sb)) {
+ sb->s_blocksize = EROFS_BLKSIZ;
+ sb->s_blocksize_bits = LOG_BLOCK_SIZE;
+ } else {
+ if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
+ erofs_err(sb, "failed to set erofs blksize");
+ return -EINVAL;
+ }
}

sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
@@ -607,7 +624,6 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)

sb->s_fs_info = sbi;
sbi->opt = ctx->opt;
- sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev, &sbi->dax_part_off);
sbi->devs = ctx->devs;
ctx->devs = NULL;

@@ -623,6 +639,10 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
err = super_setup_bdi(sb);
if (err)
return err;
+
+ sbi->dax_dev = NULL;
+ } else {
+ sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev, &sbi->dax_part_off);
}

err = erofs_read_superblock(sb);
@@ -685,6 +705,11 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)

static int erofs_fc_get_tree(struct fs_context *fc)
{
+ struct erofs_fs_context *ctx = fc->fs_private;
+
+ if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && ctx->opt.tag)
+ return get_tree_nodev(fc, erofs_fc_fill_super);
+
return get_tree_bdev(fc, erofs_fc_fill_super);
}

@@ -734,6 +759,7 @@ static void erofs_fc_free(struct fs_context *fc)
struct erofs_fs_context *ctx = fc->fs_private;

erofs_free_dev_context(ctx->devs);
+ kfree(ctx->opt.tag);
kfree(ctx);
}

@@ -774,7 +800,10 @@ static void erofs_kill_sb(struct super_block *sb)

WARN_ON(sb->s_magic != EROFS_SUPER_MAGIC);

- kill_block_super(sb);
+ if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) && erofs_is_nodev_mode(sb))
+ generic_shutdown_super(sb);
+ else
+ kill_block_super(sb);

sbi = EROFS_SB(sb);
if (!sbi)
@@ -896,7 +925,10 @@ static int erofs_statfs(struct dentry *dentry, struct kstatfs *buf)
{
struct super_block *sb = dentry->d_sb;
struct erofs_sb_info *sbi = EROFS_SB(sb);
- u64 id = huge_encode_dev(sb->s_bdev->bd_dev);
+ u64 id = 0;
+
+ if (!IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) || !erofs_is_nodev_mode(sb))
+ id = huge_encode_dev(sb->s_bdev->bd_dev);

buf->f_type = sb->s_magic;
buf->f_bsize = EROFS_BLKSIZ;
--
2.27.0

2022-03-25 19:30:40

by Jingbo Xu

Subject: [PATCH v6 02/22] cachefiles: extract write routine

Extract the generic routine of writing data to cache files, and make it
generally available.

This will be used by the following patch implementing on-demand read
mode. Since it's called inside cachefiles module in this case, make the
interface generic and unrelated to netfs_cache_resources.

It is worth noting that ki->inval_counter is not initialized after this
cleanup. This shall not make any visible difference, since inval_counter
is no longer used in the write completion routine, i.e.
cachefiles_write_complete().

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/cachefiles/internal.h | 10 +++++++
fs/cachefiles/io.c | 61 +++++++++++++++++++++++-----------------
2 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index c793d33b0224..e80673d0ab97 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -201,6 +201,16 @@ extern void cachefiles_put_object(struct cachefiles_object *object,
*/
extern bool cachefiles_begin_operation(struct netfs_cache_resources *cres,
enum fscache_want_state want_state);
+extern int __cachefiles_prepare_write(struct cachefiles_object *object,
+ struct file *file,
+ loff_t *_start, size_t *_len,
+ bool no_space_allocated_yet);
+extern int __cachefiles_write(struct cachefiles_object *object,
+ struct file *file,
+ loff_t start_pos,
+ struct iov_iter *iter,
+ netfs_io_terminated_t term_func,
+ void *term_func_priv);

/*
* key.c
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 753986ea1583..8dbc1eb254a3 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -278,36 +278,33 @@ static void cachefiles_write_complete(struct kiocb *iocb, long ret)
/*
* Initiate a write to the cache.
*/
-static int cachefiles_write(struct netfs_cache_resources *cres,
- loff_t start_pos,
- struct iov_iter *iter,
- netfs_io_terminated_t term_func,
- void *term_func_priv)
+int __cachefiles_write(struct cachefiles_object *object,
+ struct file *file,
+ loff_t start_pos,
+ struct iov_iter *iter,
+ netfs_io_terminated_t term_func,
+ void *term_func_priv)
{
- struct cachefiles_object *object;
struct cachefiles_cache *cache;
struct cachefiles_kiocb *ki;
struct inode *inode;
- struct file *file;
unsigned int old_nofs;
- ssize_t ret = -ENOBUFS;
+ ssize_t ret;
size_t len = iov_iter_count(iter);

- if (!fscache_wait_for_operation(cres, FSCACHE_WANT_WRITE))
- goto presubmission_error;
fscache_count_write();
- object = cachefiles_cres_object(cres);
cache = object->volume->cache;
- file = cachefiles_cres_file(cres);

_enter("%pD,%li,%llx,%zx/%llx",
file, file_inode(file)->i_ino, start_pos, len,
i_size_read(file_inode(file)));

- ret = -ENOMEM;
ki = kzalloc(sizeof(struct cachefiles_kiocb), GFP_KERNEL);
- if (!ki)
- goto presubmission_error;
+ if (!ki) {
+ if (term_func)
+ term_func(term_func_priv, -ENOMEM, false);
+ return -ENOMEM;
+ }

refcount_set(&ki->ki_refcnt, 2);
ki->iocb.ki_filp = file;
@@ -316,7 +313,6 @@ static int cachefiles_write(struct netfs_cache_resources *cres,
ki->iocb.ki_hint = ki_hint_validate(file_write_hint(file));
ki->iocb.ki_ioprio = get_current_ioprio();
ki->object = object;
- ki->inval_counter = cres->inval_counter;
ki->start = start_pos;
ki->len = len;
ki->term_func = term_func;
@@ -371,11 +367,24 @@ static int cachefiles_write(struct netfs_cache_resources *cres,
cachefiles_put_kiocb(ki);
_leave(" = %zd", ret);
return ret;
+}

-presubmission_error:
- if (term_func)
- term_func(term_func_priv, ret, false);
- return ret;
+static int cachefiles_write(struct netfs_cache_resources *cres,
+ loff_t start_pos,
+ struct iov_iter *iter,
+ netfs_io_terminated_t term_func,
+ void *term_func_priv)
+{
+ if (!fscache_wait_for_operation(cres, FSCACHE_WANT_WRITE)) {
+ if (term_func)
+ term_func(term_func_priv, -ENOBUFS, false);
+ return -ENOBUFS;
+ }
+
+ return __cachefiles_write(cachefiles_cres_object(cres),
+ cachefiles_cres_file(cres),
+ start_pos, iter,
+ term_func, term_func_priv);
}

/*
@@ -486,13 +495,12 @@ static enum netfs_read_source cachefiles_prepare_read(struct netfs_read_subreque
/*
* Prepare for a write to occur.
*/
-static int __cachefiles_prepare_write(struct netfs_cache_resources *cres,
- loff_t *_start, size_t *_len, loff_t i_size,
- bool no_space_allocated_yet)
+int __cachefiles_prepare_write(struct cachefiles_object *object,
+ struct file *file,
+ loff_t *_start, size_t *_len,
+ bool no_space_allocated_yet)
{
- struct cachefiles_object *object = cachefiles_cres_object(cres);
struct cachefiles_cache *cache = object->volume->cache;
- struct file *file = cachefiles_cres_file(cres);
loff_t start = *_start, pos;
size_t len = *_len, down;
int ret;
@@ -579,7 +587,8 @@ static int cachefiles_prepare_write(struct netfs_cache_resources *cres,
}

cachefiles_begin_secure(cache, &saved_cred);
- ret = __cachefiles_prepare_write(cres, _start, _len, i_size,
+ ret = __cachefiles_prepare_write(object, cachefiles_cres_file(cres),
+ _start, _len,
no_space_allocated_yet);
cachefiles_end_secure(cache, saved_cred);
return ret;
--
2.27.0

2022-03-25 19:36:32

by Jingbo Xu

Subject: [PATCH v6 01/22] fscache: export fscache_end_operation()

Export fscache_end_operation() to avoid code duplication.

Moreover, since the paired fscache_begin_read_operation() is already
exported, it makes sense to export fscache_end_operation() as well.

Signed-off-by: Jeffle Xu <[email protected]>
Reviewed-by: Liu Bo <[email protected]>
---
fs/fscache/internal.h | 11 -----------
fs/nfs/fscache.c | 8 --------
include/linux/fscache.h | 14 ++++++++++++++
3 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index f121c21590dc..ed1c9ed737f2 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -70,17 +70,6 @@ static inline void fscache_see_cookie(struct fscache_cookie *cookie,
where);
}

-/*
- * io.c
- */
-static inline void fscache_end_operation(struct netfs_cache_resources *cres)
-{
- const struct netfs_cache_ops *ops = fscache_operation_valid(cres);
-
- if (ops)
- ops->end_operation(cres);
-}
-
/*
* main.c
*/
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index cfe901650ab0..39654ca72d3d 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -249,14 +249,6 @@ void nfs_fscache_release_file(struct inode *inode, struct file *filp)
}
}

-static inline void fscache_end_operation(struct netfs_cache_resources *cres)
-{
- const struct netfs_cache_ops *ops = fscache_operation_valid(cres);
-
- if (ops)
- ops->end_operation(cres);
-}
-
/*
* Fallback page reading interface.
*/
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index 296c5f1d9f35..d2430da8aa67 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -456,6 +456,20 @@ int fscache_begin_read_operation(struct netfs_cache_resources *cres,
return -ENOBUFS;
}

+/**
+ * fscache_end_operation - End the read operation for the netfs lib
+ * @cres: The cache resources for the read operation
+ *
+ * Clean up the resources at the end of the read request.
+ */
+static inline void fscache_end_operation(struct netfs_cache_resources *cres)
+{
+ const struct netfs_cache_ops *ops = fscache_operation_valid(cres);
+
+ if (ops)
+ ops->end_operation(cres);
+}
+
/**
* fscache_read - Start a read from the cache.
* @cres: The cache resources to use
--
2.27.0

2022-03-25 19:43:15

by Jingbo Xu

Subject: [PATCH v6 13/22] erofs: add anonymous inode managing page cache of blob file

Introduce an anonymous inode for managing the page cache of the
corresponding blob file. erofs can then read directly from the address
space of the anonymous inode on a cache hit.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/fscache.c | 41 ++++++++++++++++++++++++++++++++++++++++-
fs/erofs/internal.h | 7 +++++--
2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 73235fd43bf6..30383d9adb62 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -7,6 +7,9 @@

static struct fscache_volume *volume;

+static const struct address_space_operations erofs_fscache_blob_aops = {
+};
+
static int erofs_fscache_init_cookie(struct erofs_fscache *ctx, char *path)
{
struct fscache_cookie *cookie;
@@ -31,6 +34,29 @@ static inline void erofs_fscache_cleanup_cookie(struct erofs_fscache *ctx)
ctx->cookie = NULL;
}

+static int erofs_fscache_get_inode(struct erofs_fscache *ctx,
+ struct super_block *sb)
+{
+ struct inode *const inode = new_inode(sb);
+
+ if (!inode)
+ return -ENOMEM;
+
+ set_nlink(inode, 1);
+ inode->i_size = OFFSET_MAX;
+ inode->i_mapping->a_ops = &erofs_fscache_blob_aops;
+ mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
+
+ ctx->inode = inode;
+ return 0;
+}
+
+static inline void erofs_fscache_put_inode(struct erofs_fscache *ctx)
+{
+ iput(ctx->inode);
+ ctx->inode = NULL;
+}
+
/*
* erofs_fscache_get - create an fscache context for blob file
* @sb: superblock
@@ -38,7 +64,8 @@ static inline void erofs_fscache_cleanup_cookie(struct erofs_fscache *ctx)
*
* Return: fscache context on success, ERR_PTR() on failure.
*/
-struct erofs_fscache *erofs_fscache_get(struct super_block *sb, char *path)
+struct erofs_fscache *erofs_fscache_get(struct super_block *sb, char *path,
+ bool need_inode)
{
struct erofs_fscache *ctx;
int ret;
@@ -53,7 +80,18 @@ struct erofs_fscache *erofs_fscache_get(struct super_block *sb, char *path)
goto err;
}

+ if (need_inode) {
+ ret = erofs_fscache_get_inode(ctx, sb);
+ if (ret) {
+ erofs_err(sb, "failed to get anonymous inode");
+ goto err_cookie;
+ }
+ }
+
return ctx;
+
+err_cookie:
+ erofs_fscache_cleanup_cookie(ctx);
err:
kfree(ctx);
return ERR_PTR(ret);
@@ -65,6 +103,7 @@ void erofs_fscache_put(struct erofs_fscache *ctx)
return;

erofs_fscache_cleanup_cookie(ctx);
+ erofs_fscache_put_inode(ctx);
kfree(ctx);
}

diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index d4f2b43cedae..459f31803c3b 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -98,6 +98,7 @@ struct erofs_sb_lz4_info {

struct erofs_fscache {
struct fscache_cookie *cookie;
+ struct inode *inode;
};

struct erofs_sb_info {
@@ -625,14 +626,16 @@ static inline int z_erofs_load_lzma_config(struct super_block *sb,
int erofs_init_fscache(void);
void erofs_exit_fscache(void);

-struct erofs_fscache *erofs_fscache_get(struct super_block *sb, char *path);
+struct erofs_fscache *erofs_fscache_get(struct super_block *sb, char *path,
+ bool need_inode);
void erofs_fscache_put(struct erofs_fscache *ctx);
#else
static inline int erofs_init_fscache(void) { return 0; }
static inline void erofs_exit_fscache(void) {}

static inline struct erofs_fscache *erofs_fscache_get(struct super_block *sb,
- char *path)
+ char *path,
+ bool need_inode)
{
return ERR_PTR(-EOPNOTSUPP);
}
--
2.27.0

2022-03-25 19:43:28

by Jingbo Xu

Subject: [PATCH v6 17/22] erofs: implement fscache-based data read for non-inline layout

Implement the data plane of reading data from the bootstrap blob file
over fscache for the non-inline layout.

Note that the compressed layout is not supported yet.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/fscache.c | 83 +++++++++++++++++++++++++++++++++++++++++++++
fs/erofs/inode.c | 8 ++++-
fs/erofs/internal.h | 5 +++
3 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 91377939b4f7..4a9a4e60c15d 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -60,10 +60,93 @@ static int erofs_fscache_readpage_blob(struct file *data, struct page *page)
return ret;
}

+static inline int erofs_fscache_get_map(struct erofs_map_blocks *map,
+ struct super_block *sb)
+{
+ struct erofs_sb_info *sbi = EROFS_SB(sb);
+
+ map->m_fscache = sbi->bootstrap;
+ return 0;
+}
+
+static int erofs_fscache_readpage_noinline(struct folio *folio,
+ struct erofs_map_blocks *map)
+{
+ struct fscache_cookie *cookie = map->m_fscache->cookie;
+ /*
+ * 1) For FLAT_PLAIN layout, the output map.m_la shall be equal to o_la,
+ * and the output map.m_pa is exactly the physical address of o_la.
+ * 2) For CHUNK_BASED layout, the output map.m_la is rounded down to the
+ * nearest chunk boundary, and the output map.m_pa is actually the
+ * physical address of this chunk boundary. So we need to recalculate
+ * the actual physical address of o_la.
+ */
+ loff_t start = map->m_pa + (map->o_la - map->m_la);
+
+ return erofs_fscache_read_folio(cookie, folio, start);
+}
+
+static int erofs_fscache_do_readpage(struct folio *folio)
+{
+ struct inode *inode = folio_file_mapping(folio)->host;
+ struct erofs_inode *vi = EROFS_I(inode);
+ struct super_block *sb = inode->i_sb;
+ struct erofs_map_blocks map;
+ int ret;
+
+ if (erofs_inode_is_data_compressed(vi->datalayout)) {
+ erofs_info(sb, "compressed layout not supported yet");
+ return -EOPNOTSUPP;
+ }
+
+ DBG_BUGON(folio_size(folio) != EROFS_BLKSIZ);
+
+ map.m_la = map.o_la = folio_pos(folio);
+
+ ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
+ if (ret)
+ return ret;
+
+ if (!(map.m_flags & EROFS_MAP_MAPPED)) {
+ folio_zero_range(folio, 0, folio_size(folio));
+ return 0;
+ }
+
+ ret = erofs_fscache_get_map(&map, sb);
+ if (ret)
+ return ret;
+
+ switch (vi->datalayout) {
+ case EROFS_INODE_FLAT_PLAIN:
+ case EROFS_INODE_CHUNK_BASED:
+ return erofs_fscache_readpage_noinline(folio, &map);
+ default:
+ DBG_BUGON(1);
+ return -EOPNOTSUPP;
+ }
+}
+
+static int erofs_fscache_readpage(struct file *file, struct page *page)
+{
+ struct folio *folio = page_folio(page);
+ int ret;
+
+ ret = erofs_fscache_do_readpage(folio);
+ if (!ret)
+ folio_mark_uptodate(folio);
+
+ folio_unlock(folio);
+ return ret;
+}
+
static const struct address_space_operations erofs_fscache_blob_aops = {
.readpage = erofs_fscache_readpage_blob,
};

+const struct address_space_operations erofs_fscache_access_aops = {
+ .readpage = erofs_fscache_readpage,
+};
+
/*
* erofs_fscache_get_folio - find and read page cache of blob file
* @ctx: the context of the blob file
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index ff62f84f47d3..744faf3ef9f4 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -296,7 +296,13 @@ static int erofs_fill_inode(struct inode *inode, int isdir)
err = z_erofs_fill_inode(inode);
goto out_unlock;
}
- inode->i_mapping->a_ops = &erofs_raw_access_aops;
+
+#ifdef CONFIG_EROFS_FS_ONDEMAND
+ if (erofs_is_nodev_mode(inode->i_sb))
+ inode->i_mapping->a_ops = &erofs_fscache_access_aops;
+#endif
+ if (!erofs_is_nodev_mode(inode->i_sb))
+ inode->i_mapping->a_ops = &erofs_raw_access_aops;

out_unlock:
erofs_put_metabuf(&buf);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index fa89a1e3012f..6537ededed51 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -442,6 +442,9 @@ struct erofs_map_blocks {
unsigned short m_deviceid;
char m_algorithmformat;
unsigned int m_flags;
+
+ struct erofs_fscache *m_fscache;
+ erofs_off_t o_la;
};

/* Flags used by erofs_map_blocks_flatmode() */
@@ -634,6 +637,8 @@ struct erofs_fscache *erofs_fscache_get(struct super_block *sb, char *path,
void erofs_fscache_put(struct erofs_fscache *ctx);

struct folio *erofs_fscache_get_folio(struct erofs_fscache *ctx, pgoff_t index);
+
+extern const struct address_space_operations erofs_fscache_access_aops;
#else
static inline int erofs_init_fscache(void) { return 0; }
static inline void erofs_exit_fscache(void) {}
--
2.27.0

2022-03-25 19:50:44

by Jingbo Xu

Subject: [PATCH v6 09/22] erofs: make erofs_map_blocks() generally available

... so that it can be used by fs/erofs/fscache.c, which is introduced in
a following patch.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/data.c | 4 ++--
fs/erofs/internal.h | 2 ++
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 226a57c57ee6..6e2a28242453 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -104,8 +104,8 @@ static int erofs_map_blocks_flatmode(struct inode *inode,
return 0;
}

-static int erofs_map_blocks(struct inode *inode,
- struct erofs_map_blocks *map, int flags)
+int erofs_map_blocks(struct inode *inode,
+ struct erofs_map_blocks *map, int flags)
{
struct super_block *sb = inode->i_sb;
struct erofs_inode *vi = EROFS_I(inode);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 5aa2cf2c2f80..e424293f47a2 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -484,6 +484,8 @@ void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *dev);
int erofs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
u64 start, u64 len);
+int erofs_map_blocks(struct inode *inode,
+ struct erofs_map_blocks *map, int flags);

/* inode.c */
static inline unsigned long erofs_inode_hash(erofs_nid_t nid)
--
2.27.0

2022-03-25 20:01:36

by Jingbo Xu

Subject: [PATCH v6 05/22] cachefiles: implement on-demand read

Implement the data plane of on-demand read mode.

A new NETFS_READ_HOLE_ONDEMAND flag is introduced to indicate that an
on-demand read should be done when a cache miss is encountered. In this
case, the read routine sends a READ request to the user daemon, along
with the anonymous fd and the file range to be read. The user daemon is
then responsible for fetching the data in the given file range and
writing it into the cache file through the given anonymous fd.

After sending the READ request, the read routine waits until the request
has been handled by the user daemon, and then retries reading from the
same file range. If a cache miss is encountered again on the same file
range, the read routine fails.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/cachefiles/internal.h | 7 +++
fs/cachefiles/io.c | 11 +++++
fs/cachefiles/ondemand.c | 81 +++++++++++++++++++++++++++++++++
include/linux/netfs.h | 1 +
include/uapi/linux/cachefiles.h | 13 ++++++
5 files changed, 113 insertions(+)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index c80b519a887b..686f25097681 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -281,6 +281,8 @@ extern int cachefiles_ondemand_cinit(struct cachefiles_cache *cache,

extern int cachefiles_ondemand_init_object(struct cachefiles_object *object);
extern void cachefiles_ondemand_cleanup_object(struct cachefiles_object *object);
+extern int cachefiles_ondemand_read(struct cachefiles_object *object,
+ loff_t pos, size_t len);

#else
ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
@@ -295,6 +297,11 @@ static inline int cachefiles_ondemand_init_object(struct cachefiles_object *obje
}

static inline void cachefiles_ondemand_cleanup_object(struct cachefiles_object *object) {}
+static inline int cachefiles_ondemand_read(struct cachefiles_object *object,
+ loff_t pos, size_t len)
+{
+ return -EOPNOTSUPP;
+}
#endif

/*
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 8dbc1eb254a3..ee1283ba7a2c 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -95,6 +95,7 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
file, file_inode(file)->i_ino, start_pos, len,
i_size_read(file_inode(file)));

+retry:
/* If the caller asked us to seek for data before doing the read, then
* we should do that now. If we find a gap, we fill it with zeros.
*/
@@ -119,6 +120,16 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
if (read_hole == NETFS_READ_HOLE_FAIL)
goto presubmission_error;

+ if (read_hole == NETFS_READ_HOLE_ONDEMAND) {
+ if (!cachefiles_ondemand_read(object, off, len)) {
+ /* fail the read if no progress achieved */
+ read_hole = NETFS_READ_HOLE_FAIL;
+ goto retry;
+ }
+
+ goto presubmission_error;
+ }
+
iov_iter_zero(len, iter);
skipped = len;
ret = 0;
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index 7fd518e01e5a..965fb7bd97c0 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -11,13 +11,30 @@ static int cachefiles_ondemand_fd_release(struct inode *inode,
struct file *file)
{
struct cachefiles_object *object = file->private_data;
+ struct cachefiles_cache *cache = object->volume->cache;
+ struct xarray *xa = &cache->reqs;
+ struct cachefiles_req *req;
+ unsigned long index;

+ xa_lock(xa);
/*
* Uninstall anon_fd to the cachefiles object, so that no further
* associated requests will get enqueued.
*/
object->fd = -1;

+ /*
+ * Flush all pending READ requests since their completion depends on
+ * anon_fd.
+ */
+ xa_for_each(xa, index, req) {
+ if (req->msg.opcode == CACHEFILES_OP_READ) {
+ req->error = -EIO;
+ complete(&req->done);
+ }
+ }
+ xa_unlock(xa);
+
cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd);
return 0;
}
@@ -60,11 +77,35 @@ static loff_t cachefiles_ondemand_fd_llseek(struct file *filp, loff_t pos, int w
return vfs_llseek(file, pos, whence);
}

+static long cachefiles_ondemand_fd_ioctl(struct file *filp, unsigned int ioctl,
+ unsigned long arg)
+{
+ struct cachefiles_object *object = filp->private_data;
+ struct cachefiles_cache *cache = object->volume->cache;
+ struct cachefiles_req *req;
+ unsigned long id;
+
+ if (ioctl != CACHEFILES_IOC_CREAD)
+ return -EINVAL;
+
+ if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
+ return -EOPNOTSUPP;
+
+ id = arg;
+ req = xa_erase(&cache->reqs, id);
+ if (!req)
+ return -EINVAL;
+
+ complete(&req->done);
+ return 0;
+}
+
static const struct file_operations cachefiles_ondemand_fd_fops = {
.owner = THIS_MODULE,
.release = cachefiles_ondemand_fd_release,
.write_iter = cachefiles_ondemand_fd_write_iter,
.llseek = cachefiles_ondemand_fd_llseek,
+ .unlocked_ioctl = cachefiles_ondemand_fd_ioctl,
};

/*
@@ -269,6 +310,13 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
goto out;
}

+ /* recheck anon_fd for READ request with lock held */
+ if (opcode == CACHEFILES_OP_READ && object->fd == -1) {
+ xas_unlock(&xas);
+ ret = -EIO;
+ goto out;
+ }
+
xas.xa_index = 0;
xas_find_marked(&xas, UINT_MAX, XA_FREE_MARK);
if (xas.xa_node == XAS_RESTART)
@@ -341,6 +389,28 @@ static int init_close_req(struct cachefiles_req *req, void *private)
return 0;
}

+struct cachefiles_read_ctx {
+ loff_t off;
+ size_t len;
+};
+
+static int init_read_req(struct cachefiles_req *req, void *private)
+{
+ struct cachefiles_object *object = req->object;
+ struct cachefiles_read *load = (void *)&req->msg.data;
+ struct cachefiles_read_ctx *read_ctx = private;
+ int fd = object->fd;
+
+ /* Stop enqueuing request when daemon closes anon_fd prematurely. */
+ if (WARN_ON_ONCE(fd == -1))
+ return -EIO;
+
+ load->off = read_ctx->off;
+ load->len = read_ctx->len;
+ load->fd = fd;
+ return 0;
+}
+
int cachefiles_ondemand_init_object(struct cachefiles_object *object)
{
struct fscache_cookie *cookie = object->cookie;
@@ -373,3 +443,14 @@ void cachefiles_ondemand_cleanup_object(struct cachefiles_object *object)
sizeof(struct cachefiles_close),
init_close_req, NULL);
}
+
+int cachefiles_ondemand_read(struct cachefiles_object *object,
+ loff_t pos, size_t len)
+{
+ struct cachefiles_read_ctx read_ctx = {pos, len};
+
+ return cachefiles_ondemand_send_req(object,
+ CACHEFILES_OP_READ,
+ sizeof(struct cachefiles_read),
+ init_read_req, &read_ctx);
+}
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 614f22213e21..2a9c50d3a928 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -203,6 +203,7 @@ enum netfs_read_from_hole {
NETFS_READ_HOLE_IGNORE,
NETFS_READ_HOLE_CLEAR,
NETFS_READ_HOLE_FAIL,
+ NETFS_READ_HOLE_ONDEMAND,
};

/*
diff --git a/include/uapi/linux/cachefiles.h b/include/uapi/linux/cachefiles.h
index 03047e4b7df2..004335d44e16 100644
--- a/include/uapi/linux/cachefiles.h
+++ b/include/uapi/linux/cachefiles.h
@@ -3,6 +3,7 @@
#define _LINUX_CACHEFILES_H

#include <linux/types.h>
+#include <linux/ioctl.h>

/*
* Fscache ensures that the maximum length of cookie key is 255. The volume key
@@ -13,6 +14,7 @@
enum cachefiles_opcode {
CACHEFILES_OP_OPEN,
CACHEFILES_OP_CLOSE,
+ CACHEFILES_OP_READ,
};

/*
@@ -45,4 +47,15 @@ struct cachefiles_close {
__u32 fd;
};

+struct cachefiles_read {
+ __u64 off;
+ __u64 len;
+ __u32 fd;
+};
+
+/*
+ * For CACHEFILES_IOC_CREAD, arg is the @id field of corresponding READ request.
+ */
+#define CACHEFILES_IOC_CREAD _IOW(0x98, 1, long)
+
#endif
--
2.27.0

2022-03-25 20:18:43

by Jingbo Xu

Subject: [PATCH v6 03/22] cachefiles: notify user daemon with anon_fd when looking up cookie

Fscache/cachefiles used to serve as a local cache for remote
filesystems. This patch, along with the following ones, introduces a new
on-demand read mode for cachefiles, which can benefit scenarios where
on-demand read semantics are needed, e.g. container image distribution.

The essential difference between the original mode and the on-demand
read mode is that, in the original mode, on a cache miss the netfs
itself fetches the data from the remote and then writes the fetched data
into the cache file, while in on-demand read mode a user daemon is
responsible for fetching the data and writing it to the cache file.

As the first step, notify user daemon with anon_fd when looking up
cookie.

Send the anonymous fd to the user daemon when looking up a cookie, no
matter whether the cache file exists there or not. With the given
anonymous fd, the user daemon can fetch data and write it into the cache
file in advance, even before a cache miss has actually happened.

Also add an advisory flag (FSCACHE_ADV_WANT_CACHE_SIZE) suggesting that
the cache file size shall be retrieved at runtime. This helps the
scenario where one cache file contains multiple netfs files, e.g. for
the purpose of deduplication. In this case, the netfs itself has no idea
of the cache file size, while the user daemon needs to offer a hint on
it.

Signed-off-by: Jeffle Xu <[email protected]>
---
fs/cachefiles/Kconfig | 11 +
fs/cachefiles/Makefile | 1 +
fs/cachefiles/daemon.c | 76 ++++++-
fs/cachefiles/internal.h | 44 ++++
fs/cachefiles/namei.c | 16 +-
fs/cachefiles/ondemand.c | 348 ++++++++++++++++++++++++++++++
include/linux/fscache.h | 1 +
include/trace/events/cachefiles.h | 2 +
include/uapi/linux/cachefiles.h | 43 ++++
9 files changed, 529 insertions(+), 13 deletions(-)
create mode 100644 fs/cachefiles/ondemand.c
create mode 100644 include/uapi/linux/cachefiles.h

diff --git a/fs/cachefiles/Kconfig b/fs/cachefiles/Kconfig
index 719faeeda168..58aad1fb4c5c 100644
--- a/fs/cachefiles/Kconfig
+++ b/fs/cachefiles/Kconfig
@@ -26,3 +26,14 @@ config CACHEFILES_ERROR_INJECTION
help
This permits error injection to be enabled in cachefiles whilst a
cache is in service.
+
+config CACHEFILES_ONDEMAND
+ bool "Support for on-demand read"
+ depends on CACHEFILES
+ default n
+ help
+ This permits on-demand read mode of cachefiles. In this mode, when
+ cache miss, the cachefiles backend instead of netfs, is responsible
+ for fetching data, e.g. through user daemon.
+
+ If unsure, say N.
diff --git a/fs/cachefiles/Makefile b/fs/cachefiles/Makefile
index 16d811f1a2fa..c37a7a9af10b 100644
--- a/fs/cachefiles/Makefile
+++ b/fs/cachefiles/Makefile
@@ -16,5 +16,6 @@ cachefiles-y := \
xattr.o

cachefiles-$(CONFIG_CACHEFILES_ERROR_INJECTION) += error_inject.o
+cachefiles-$(CONFIG_CACHEFILES_ONDEMAND) += ondemand.o

obj-$(CONFIG_CACHEFILES) := cachefiles.o
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 7ac04ee2c0a0..91b88a499737 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -75,6 +75,9 @@ static const struct cachefiles_daemon_cmd cachefiles_daemon_cmds[] = {
{ "inuse", cachefiles_daemon_inuse },
{ "secctx", cachefiles_daemon_secctx },
{ "tag", cachefiles_daemon_tag },
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+ { "cinit", cachefiles_ondemand_cinit },
+#endif
{ "", NULL }
};

@@ -108,6 +111,9 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file)
INIT_LIST_HEAD(&cache->volumes);
INIT_LIST_HEAD(&cache->object_list);
spin_lock_init(&cache->object_list_lock);
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+ xa_init_flags(&cache->reqs, XA_FLAGS_ALLOC);
+#endif

/* set default caching limits
* - limit at 1% free space and/or free files
@@ -126,6 +132,27 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file)
return 0;
}

+#ifdef CONFIG_CACHEFILES_ONDEMAND
+static inline void cachefiles_flush_reqs(struct cachefiles_cache *cache)
+{
+ struct xarray *xa = &cache->reqs;
+ struct cachefiles_req *req;
+ unsigned long index;
+
+ /*
+ * 1) Cache has been marked as dead state, and then 2) flush all
+ * pending requests in @reqs xarray. The barrier inside set_bit()
+ * will ensure that above two ops won't be reordered.
+ */
+ xa_lock(xa);
+ xa_for_each(xa, index, req) {
+ req->error = -EIO;
+ complete(&req->done);
+ }
+ xa_unlock(xa);
+}
+#endif
+
/*
* Release a cache.
*/
@@ -139,6 +166,11 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file)

set_bit(CACHEFILES_DEAD, &cache->flags);

+#ifdef CONFIG_CACHEFILES_ONDEMAND
+ cachefiles_flush_reqs(cache);
+ xa_destroy(&cache->reqs);
+#endif
+
cachefiles_daemon_unbind(cache);

/* clean up the control file interface */
@@ -152,23 +184,15 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file)
return 0;
}

-/*
- * Read the cache state.
- */
-static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
- size_t buflen, loff_t *pos)
+static ssize_t cachefiles_do_daemon_read(struct cachefiles_cache *cache,
+ char __user *_buffer,
+ size_t buflen)
{
- struct cachefiles_cache *cache = file->private_data;
unsigned long long b_released;
unsigned f_released;
char buffer[256];
int n;

- //_enter(",,%zu,", buflen);
-
- if (!test_bit(CACHEFILES_READY, &cache->flags))
- return 0;
-
/* check how much space the cache has */
cachefiles_has_space(cache, 0, 0, cachefiles_has_space_check);

@@ -206,6 +230,26 @@ static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
return n;
}

+/*
+ * Read the cache state.
+ */
+static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
+ size_t buflen, loff_t *pos)
+{
+ struct cachefiles_cache *cache = file->private_data;
+
+ //_enter(",,%zu,", buflen);
+
+ if (!test_bit(CACHEFILES_READY, &cache->flags))
+ return 0;
+
+ if (IS_ENABLED(CONFIG_CACHEFILES_ONDEMAND) &&
+ test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
+ return cachefiles_ondemand_daemon_read(cache, _buffer, buflen);
+ else
+ return cachefiles_do_daemon_read(cache, _buffer, buflen);
+}
+
/*
* Take a command from cachefilesd, parse it and act on it.
*/
@@ -297,8 +341,18 @@ static __poll_t cachefiles_daemon_poll(struct file *file,
poll_wait(file, &cache->daemon_pollwq, poll);
mask = 0;

+#ifdef CONFIG_CACHEFILES_ONDEMAND
+ if (test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags)) {
+ if (!xa_empty(&cache->reqs))
+ mask |= EPOLLIN;
+ } else {
+ if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags))
+ mask |= EPOLLIN;
+ }
+#else
if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags))
mask |= EPOLLIN;
+#endif

if (test_bit(CACHEFILES_CULLING, &cache->flags))
mask |= EPOLLOUT;
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index e80673d0ab97..8a0f1b691aca 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -15,6 +15,8 @@
#include <linux/fscache-cache.h>
#include <linux/cred.h>
#include <linux/security.h>
+#include <linux/xarray.h>
+#include <linux/cachefiles.h>

#define CACHEFILES_DIO_BLOCK_SIZE 4096

@@ -58,6 +60,9 @@ struct cachefiles_object {
enum cachefiles_content content_info:8; /* Info about content presence */
unsigned long flags;
#define CACHEFILES_OBJECT_USING_TMPFILE 0 /* Have an unlinked tmpfile */
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+ int fd; /* anonymous fd */
+#endif
};

/*
@@ -98,11 +103,24 @@ struct cachefiles_cache {
#define CACHEFILES_DEAD 1 /* T if cache dead */
#define CACHEFILES_CULLING 2 /* T if cull engaged */
#define CACHEFILES_STATE_CHANGED 3 /* T if state changed (poll trigger) */
+#define CACHEFILES_ONDEMAND_MODE 4 /* T if in on-demand read mode */
char *rootdirname; /* name of cache root directory */
char *secctx; /* LSM security context */
char *tag; /* cache binding tag */
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+ struct xarray reqs; /* xarray of pending on-demand requests */
+#endif
};

+struct cachefiles_req {
+ struct cachefiles_object *object;
+ struct completion done;
+ int error;
+ struct cachefiles_msg msg;
+};
+
+#define CACHEFILES_REQ_NEW XA_MARK_1
+
#include <trace/events/cachefiles.h>

static inline
@@ -250,6 +268,32 @@ extern struct file *cachefiles_create_tmpfile(struct cachefiles_object *object);
extern bool cachefiles_commit_tmpfile(struct cachefiles_cache *cache,
struct cachefiles_object *object);

+/*
+ * ondemand.c
+ */
+#ifdef CONFIG_CACHEFILES_ONDEMAND
+extern ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
+ char __user *_buffer,
+ size_t buflen);
+
+extern int cachefiles_ondemand_cinit(struct cachefiles_cache *cache,
+ char *args);
+
+extern int cachefiles_ondemand_init_object(struct cachefiles_object *object);
+
+#else
+ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
+ char __user *_buffer, size_t buflen)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int cachefiles_ondemand_init_object(struct cachefiles_object *object)
+{
+ return 0;
+}
+#endif
+
/*
* security.c
*/
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index f256c8aff7bb..22aba4c6a762 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -444,10 +444,9 @@ struct file *cachefiles_create_tmpfile(struct cachefiles_object *object)
struct dentry *fan = volume->fanout[(u8)object->cookie->key_hash];
struct file *file;
struct path path;
- uint64_t ni_size = object->cookie->object_size;
+ uint64_t ni_size;
long ret;

- ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE);

cachefiles_begin_secure(cache, &saved_cred);

@@ -473,6 +472,15 @@ struct file *cachefiles_create_tmpfile(struct cachefiles_object *object)
goto out_dput;
}

+ ret = cachefiles_ondemand_init_object(object);
+ if (ret < 0) {
+ file = ERR_PTR(ret);
+ goto out_dput;
+ }
+
+ ni_size = object->cookie->object_size;
+ ni_size = round_up(ni_size, CACHEFILES_DIO_BLOCK_SIZE);
+
if (ni_size > 0) {
trace_cachefiles_trunc(object, d_backing_inode(path.dentry), 0, ni_size,
cachefiles_trunc_expand_tmpfile);
@@ -573,6 +581,10 @@ static bool cachefiles_open_file(struct cachefiles_object *object,
}
_debug("file -> %pd positive", dentry);

+ ret = cachefiles_ondemand_init_object(object);
+ if (ret < 0)
+ goto error_fput;
+
ret = cachefiles_check_auxdata(object, file);
if (ret < 0)
goto check_failed;
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
new file mode 100644
index 000000000000..0742c4a7797a
--- /dev/null
+++ b/fs/cachefiles/ondemand.c
@@ -0,0 +1,348 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2022, Alibaba Cloud
+ */
+#include <linux/fdtable.h>
+#include <linux/anon_inodes.h>
+#include <linux/uio.h>
+#include "internal.h"
+
+static int cachefiles_ondemand_fd_release(struct inode *inode,
+ struct file *file)
+{
+ struct cachefiles_object *object = file->private_data;
+
+ /*
+ * Uninstall anon_fd to the cachefiles object, so that no further
+ * associated requests will get enqueued.
+ */
+ object->fd = -1;
+
+ cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd);
+ return 0;
+}
+
+static ssize_t cachefiles_ondemand_fd_write_iter(struct kiocb *kiocb,
+ struct iov_iter *iter)
+{
+ struct cachefiles_object *object = kiocb->ki_filp->private_data;
+ struct cachefiles_cache *cache = object->volume->cache;
+ struct file *file = object->file;
+ size_t len = iter->count;
+ loff_t pos = kiocb->ki_pos;
+ const struct cred *saved_cred;
+ int ret;
+
+ if (!file)
+ return -ENOBUFS;
+
+ cachefiles_begin_secure(cache, &saved_cred);
+ ret = __cachefiles_prepare_write(object, file, &pos, &len, true);
+ cachefiles_end_secure(cache, saved_cred);
+ if (ret < 0)
+ return ret;
+
+ ret = __cachefiles_write(object, file, pos, iter, NULL, NULL);
+ if (!ret)
+ ret = len;
+
+ return ret;
+}
+
+static loff_t cachefiles_ondemand_fd_llseek(struct file *filp, loff_t pos, int whence)
+{
+ struct cachefiles_object *object = filp->private_data;
+ struct file *file = object->file;
+
+ if (!file)
+ return -ENOBUFS;
+
+ return vfs_llseek(file, pos, whence);
+}
+
+static const struct file_operations cachefiles_ondemand_fd_fops = {
+ .owner = THIS_MODULE,
+ .release = cachefiles_ondemand_fd_release,
+ .write_iter = cachefiles_ondemand_fd_write_iter,
+ .llseek = cachefiles_ondemand_fd_llseek,
+};
+
+/*
+ * Init request completion
+ * - command: "cinit <id>[,<cache_size>]"
+ */
+int cachefiles_ondemand_cinit(struct cachefiles_cache *cache, char *args)
+{
+ struct cachefiles_req *req;
+ struct cachefiles_open *load;
+ struct fscache_cookie *cookie;
+ char *pid, *psize;
+ unsigned long id, flags, size = 0;
+ int ret;
+
+ if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
+ return -EOPNOTSUPP;
+
+ if (!*args) {
+ pr_err("Empty id specified\n");
+ return -EINVAL;
+ }
+
+ pid = args;
+ psize = strchr(args, ',');
+ if (psize) {
+ *psize = 0;
+ psize++;
+
+ ret = kstrtoul(psize, 0, &size);
+ if (ret)
+ return ret;
+ }
+
+ ret = kstrtoul(pid, 0, &id);
+ if (ret)
+ return ret;
+
+ req = xa_erase(&cache->reqs, id);
+ if (!req)
+ return -EINVAL;
+
+ load = (void *)req->msg.data;
+ flags = load->flags;
+
+ if (test_bit(CACHEFILES_OPEN_WANT_CACHE_SIZE, &flags)) {
+ if (size) {
+ cookie = req->object->cookie;
+ cookie->object_size = size;
+ clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
+ } else {
+ req->error = -EINVAL;
+ }
+ }
+
+ complete(&req->done);
+ return 0;
+}
+
+static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
+{
+ struct cachefiles_object *object;
+ struct cachefiles_open *load;
+ struct fd f;
+ int ret;
+
+ object = cachefiles_grab_object(req->object,
+ cachefiles_obj_get_ondemand_fd);
+
+ ret = anon_inode_getfd("[cachefiles]", &cachefiles_ondemand_fd_fops,
+ object, O_WRONLY);
+ if (ret < 0) {
+ cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd);
+ return ret;
+ }
+
+ f = fdget_pos(ret);
+ if (WARN_ON_ONCE(!f.file))
+ return -EBADFD;
+
+ f.file->f_mode |= FMODE_PWRITE | FMODE_LSEEK;
+ fdput_pos(f);
+
+ load = (void *)req->msg.data;
+ load->fd = object->fd = ret;
+
+ return 0;
+}
+
+ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
+ char __user *_buffer, size_t buflen)
+{
+ struct cachefiles_req *req;
+ struct cachefiles_msg *msg;
+ unsigned long id = 0;
+ size_t n;
+ int ret = 0;
+ XA_STATE(xas, &cache->reqs, 0);
+
+ /*
+ * Search for a request that has not yet been processed, so that
+ * requests are not sent to the user daemon repeatedly.
+ */
+ xa_lock(&cache->reqs);
+ req = xas_find_marked(&xas, UINT_MAX, CACHEFILES_REQ_NEW);
+ if (!req) {
+ xa_unlock(&cache->reqs);
+ return 0;
+ }
+
+ msg = &req->msg;
+ n = msg->len;
+
+ if (n > buflen) {
+ xa_unlock(&cache->reqs);
+ return -EMSGSIZE;
+ }
+
+ xas_clear_mark(&xas, CACHEFILES_REQ_NEW);
+ xa_unlock(&cache->reqs);
+
+ msg->id = id = xas.xa_index;
+
+ if (msg->opcode == CACHEFILES_OP_OPEN) {
+ ret = cachefiles_ondemand_get_fd(req);
+ if (ret)
+ goto error;
+ }
+
+ if (copy_to_user(_buffer, msg, n) != 0) {
+ ret = -EFAULT;
+ goto err_put_fd;
+ }
+
+ return n;
+
+err_put_fd:
+ if (msg->opcode == CACHEFILES_OP_OPEN)
+ close_fd(req->object->fd);
+error:
+ xa_erase(&cache->reqs, id);
+ req->error = ret;
+ complete(&req->done);
+ return ret;
+}
+
+typedef int (*init_req_fn)(struct cachefiles_req *req, void *private);
+
+static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
+ enum cachefiles_opcode opcode,
+ size_t data_len,
+ init_req_fn init_req,
+ void *private)
+{
+ struct cachefiles_cache *cache = object->volume->cache;
+ struct cachefiles_req *req;
+ XA_STATE(xas, &cache->reqs, 0);
+ int ret;
+
+ if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
+ return -EOPNOTSUPP;
+
+ if (test_bit(CACHEFILES_DEAD, &cache->flags))
+ return -EIO;
+
+ req = kzalloc(sizeof(*req) + data_len, GFP_KERNEL);
+ if (!req)
+ return -ENOMEM;
+
+ req->object = object;
+ init_completion(&req->done);
+ req->msg.opcode = opcode;
+ req->msg.len = sizeof(struct cachefiles_msg) + data_len;
+
+ ret = init_req(req, private);
+ if (ret)
+ goto out;
+
+ do {
+ /*
+ * Stop enqueuing requests when the daemon is dying. Thus we need
+ * to 1) check the cache state, and 2) enqueue the request only if
+ * the cache is still alive.
+ *
+ * These two steps need to be atomic as a whole; otherwise a request
+ * may be enqueued after the xarray has been flushed, in which case
+ * the orphan request will never be completed and thus netfs
+ * will hang there forever.
+ */
+ xas_lock(&xas);
+
+ /* recheck dead state with lock held */
+ if (test_bit(CACHEFILES_DEAD, &cache->flags)) {
+ xas_unlock(&xas);
+ ret = -EIO;
+ goto out;
+ }
+
+ xas.xa_index = 0;
+ xas_find_marked(&xas, UINT_MAX, XA_FREE_MARK);
+ if (xas.xa_node == XAS_RESTART)
+ xas_set_err(&xas, -EBUSY);
+ xas_store(&xas, req);
+ xas_clear_mark(&xas, XA_FREE_MARK);
+ xas_set_mark(&xas, CACHEFILES_REQ_NEW);
+ xas_unlock(&xas);
+ } while (xas_nomem(&xas, GFP_KERNEL));
+
+ ret = xas_error(&xas);
+ if (ret)
+ goto out;
+
+ wake_up_all(&cache->daemon_pollwq);
+ wait_for_completion(&req->done);
+ ret = req->error;
+out:
+ kfree(req);
+ return ret;
+}
+
+static int init_open_req(struct cachefiles_req *req, void *private)
+{
+ struct cachefiles_object *object = req->object;
+ struct fscache_cookie *cookie = object->cookie;
+ struct fscache_volume *volume = object->volume->vcookie;
+ struct cachefiles_open *load = (void *)req->msg.data;
+ size_t volume_key_len, cookie_key_len;
+ void *volume_key, *cookie_key;
+ unsigned long flags = 0;
+
+ /*
+ * The volume key is in string form.
+ * key[0] stores the strlen() of the string, while the remaining part
+ * stores the content of the string (excluding the trailing '\0'). Append
+ * a trailing '\0' to the output volume_key, so that it is a valid string.
+ */
+ volume_key_len = volume->key[0] + 1;
+ volume_key = volume->key + 1;
+
+ /*
+ * The cookie key is in binary form, and its layout is netfs specific.
+ */
+ cookie_key_len = cookie->key_len;
+ cookie_key = fscache_get_key(cookie);
+
+ if (object->cookie->advice & FSCACHE_ADV_WANT_CACHE_SIZE)
+ __set_bit(CACHEFILES_OPEN_WANT_CACHE_SIZE, &flags);
+
+ load->flags = flags;
+ load->volume_key_len = volume_key_len;
+ load->cookie_key_len = cookie_key_len;
+ memcpy(load->data, volume_key, volume_key_len);
+ memcpy(load->data + volume_key_len, cookie_key, cookie_key_len);
+
+ return 0;
+}
+
+int cachefiles_ondemand_init_object(struct cachefiles_object *object)
+{
+ struct fscache_cookie *cookie = object->cookie;
+ struct fscache_volume *volume = object->volume->vcookie;
+ size_t volume_key_len, cookie_key_len, data_len;
+
+ /*
+ * Cachefiles will first check the cache file under the root cache
+ * directory. If the coherency check fails, it will fall back to creating
+ * a new tmpfile as the cache file. Reuse the previously created anon_fd,
+ * if any.
+ */
+ if (object->fd > 0)
+ return 0;
+
+ volume_key_len = volume->key[0] + 1;
+ cookie_key_len = cookie->key_len;
+ data_len = sizeof(struct cachefiles_open) +
+ volume_key_len + cookie_key_len;
+
+ return cachefiles_ondemand_send_req(object,
+ CACHEFILES_OP_OPEN, data_len,
+ init_open_req, NULL);
+}
diff --git a/include/linux/fscache.h b/include/linux/fscache.h
index d2430da8aa67..a330354f33ca 100644
--- a/include/linux/fscache.h
+++ b/include/linux/fscache.h
@@ -39,6 +39,7 @@ struct fscache_cookie;
#define FSCACHE_ADV_SINGLE_CHUNK 0x01 /* The object is a single chunk of data */
#define FSCACHE_ADV_WRITE_CACHE 0x00 /* Do cache if written to locally */
#define FSCACHE_ADV_WRITE_NOCACHE 0x02 /* Don't cache if written to locally */
+#define FSCACHE_ADV_WANT_CACHE_SIZE 0x04 /* Retrieve cache size at runtime */

#define FSCACHE_INVAL_DIO_WRITE 0x01 /* Invalidate due to DIO write */

diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index c6f5aa74db89..371e5816e98c 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -31,6 +31,8 @@ enum cachefiles_obj_ref_trace {
cachefiles_obj_see_lookup_failed,
cachefiles_obj_see_withdraw_cookie,
cachefiles_obj_see_withdrawal,
+ cachefiles_obj_get_ondemand_fd,
+ cachefiles_obj_put_ondemand_fd,
};

enum fscache_why_object_killed {
diff --git a/include/uapi/linux/cachefiles.h b/include/uapi/linux/cachefiles.h
new file mode 100644
index 000000000000..0c44d68be6bd
--- /dev/null
+++ b/include/uapi/linux/cachefiles.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _LINUX_CACHEFILES_H
+#define _LINUX_CACHEFILES_H
+
+#include <linux/types.h>
+
+/*
+ * Fscache ensures that the maximum length of the cookie key is 255. The
+ * volume key is controlled by the netfs, and is generally no bigger than 255.
+ */
+#define CACHEFILES_MSG_MAX_SIZE 1024
+
+enum cachefiles_opcode {
+ CACHEFILES_OP_OPEN,
+};
+
+/*
+ * @id identifies the position of this message in the requests xarray
+ * @opcode message type, CACHEFILES_OP_*
+ * @len message length, including message header and following data
+ * @data message type specific payload
+ */
+struct cachefiles_msg {
+ __u32 id;
+ __u32 opcode;
+ __u32 len;
+ __u8 data[];
+};
+
+struct cachefiles_open {
+ __u32 volume_key_len;
+ __u32 cookie_key_len;
+ __u32 fd;
+ __u32 flags;
+ /* following data contains volume_key and cookie_key in sequence */
+ __u8 data[];
+};
+
+enum cachefiles_open_flags {
+ CACHEFILES_OPEN_WANT_CACHE_SIZE,
+};
+
+#endif
--
2.27.0

2022-03-25 21:20:24

by kernel test robot

Subject: Re: [PATCH v6 03/22] cachefiles: notify user daemon with anon_fd when looking up cookie

Hi Jeffle,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on trondmy-nfs/linux-next]
[also build test ERROR on rostedt-trace/for-next linus/master v5.17]
[cannot apply to xiang-erofs/dev-test dhowells-fs/fscache-next next-20220325]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Jeffle-Xu/fscache-erofs-fscache-based-on-demand-read-semantics/20220325-203555
base: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
config: csky-defconfig (https://download.01.org/0day-ci/archive/20220326/[email protected]/config)
compiler: csky-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/ec8aa2f84eb47244377e4b822dd77d82ee54714a
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Jeffle-Xu/fscache-erofs-fscache-based-on-demand-read-semantics/20220325-203555
git checkout ec8aa2f84eb47244377e4b822dd77d82ee54714a
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=csky SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

csky-linux-ld: fs/cachefiles/daemon.o: in function `cachefiles_ondemand_daemon_read':
>> daemon.c:(.text+0x97c): multiple definition of `cachefiles_ondemand_daemon_read'; fs/cachefiles/cache.o:cache.c:(.text+0x18): first defined here
csky-linux-ld: fs/cachefiles/interface.o: in function `cachefiles_ondemand_daemon_read':
interface.c:(.text+0x1ec): multiple definition of `cachefiles_ondemand_daemon_read'; fs/cachefiles/cache.o:cache.c:(.text+0x18): first defined here
csky-linux-ld: fs/cachefiles/io.o: in function `cachefiles_ondemand_daemon_read':
io.c:(.text+0x720): multiple definition of `cachefiles_ondemand_daemon_read'; fs/cachefiles/cache.o:cache.c:(.text+0x18): first defined here
csky-linux-ld: fs/cachefiles/key.o: in function `cachefiles_ondemand_daemon_read':
key.c:(.text+0x0): multiple definition of `cachefiles_ondemand_daemon_read'; fs/cachefiles/cache.o:cache.c:(.text+0x18): first defined here
csky-linux-ld: fs/cachefiles/main.o: in function `cachefiles_ondemand_daemon_read':
main.c:(.text+0x0): multiple definition of `cachefiles_ondemand_daemon_read'; fs/cachefiles/cache.o:cache.c:(.text+0x18): first defined here
csky-linux-ld: fs/cachefiles/namei.o: in function `cachefiles_ondemand_daemon_read':
namei.c:(.text+0xf8): multiple definition of `cachefiles_ondemand_daemon_read'; fs/cachefiles/cache.o:cache.c:(.text+0x18): first defined here
csky-linux-ld: fs/cachefiles/security.o: in function `cachefiles_ondemand_daemon_read':
security.c:(.text+0x24): multiple definition of `cachefiles_ondemand_daemon_read'; fs/cachefiles/cache.o:cache.c:(.text+0x18): first defined here
csky-linux-ld: fs/cachefiles/volume.o: in function `cachefiles_ondemand_daemon_read':
volume.c:(.text+0x0): multiple definition of `cachefiles_ondemand_daemon_read'; fs/cachefiles/cache.o:cache.c:(.text+0x18): first defined here
csky-linux-ld: fs/cachefiles/xattr.o: in function `cachefiles_ondemand_daemon_read':
xattr.c:(.text+0x0): multiple definition of `cachefiles_ondemand_daemon_read'; fs/cachefiles/cache.o:cache.c:(.text+0x18): first defined here

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-03-25 23:53:20

by kernel test robot

Subject: Re: [PATCH v6 03/22] cachefiles: notify user daemon with anon_fd when looking up cookie

Hi Jeffle,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on trondmy-nfs/linux-next]
[also build test ERROR on rostedt-trace/for-next linus/master v5.17]
[cannot apply to xiang-erofs/dev-test dhowells-fs/fscache-next next-20220325]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Jeffle-Xu/fscache-erofs-fscache-based-on-demand-read-semantics/20220325-203555
base: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
config: i386-randconfig-a002 (https://download.01.org/0day-ci/archive/20220326/[email protected]/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 0f6d9501cf49ce02937099350d08f20c4af86f3d)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/ec8aa2f84eb47244377e4b822dd77d82ee54714a
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Jeffle-Xu/fscache-erofs-fscache-based-on-demand-read-semantics/20220325-203555
git checkout ec8aa2f84eb47244377e4b822dd77d82ee54714a
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash fs/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All error/warnings (new ones prefixed by >>):

In file included from fs/cachefiles/cache.c:11:
>> fs/cachefiles/internal.h:285:9: warning: no previous prototype for function 'cachefiles_ondemand_daemon_read' [-Wmissing-prototypes]
ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
^
fs/cachefiles/internal.h:285:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
^
static
1 warning generated.
--
>> ld.lld: error: duplicate symbol: cachefiles_ondemand_daemon_read
>>> defined at internal.h:287 (fs/cachefiles/internal.h:287)
>>> fs/cachefiles/cache.o:(cachefiles_ondemand_daemon_read)
>>> defined at internal.h:287 (fs/cachefiles/internal.h:287)
>>> fs/cachefiles/daemon.o:(.text+0x0)
--
>> ld.lld: error: duplicate symbol: cachefiles_ondemand_daemon_read
>>> defined at internal.h:287 (fs/cachefiles/internal.h:287)
>>> fs/cachefiles/cache.o:(cachefiles_ondemand_daemon_read)
>>> defined at internal.h:287 (fs/cachefiles/internal.h:287)
>>> fs/cachefiles/interface.o:(.text+0x0)
--
>> ld.lld: error: duplicate symbol: cachefiles_ondemand_daemon_read
>>> defined at internal.h:287 (fs/cachefiles/internal.h:287)
>>> fs/cachefiles/cache.o:(cachefiles_ondemand_daemon_read)
>>> defined at internal.h:287 (fs/cachefiles/internal.h:287)
>>> fs/cachefiles/io.o:(.text+0x0)
--
>> ld.lld: error: duplicate symbol: cachefiles_ondemand_daemon_read
>>> defined at internal.h:287 (fs/cachefiles/internal.h:287)
>>> fs/cachefiles/cache.o:(cachefiles_ondemand_daemon_read)
>>> defined at internal.h:287 (fs/cachefiles/internal.h:287)
>>> fs/cachefiles/key.o:(.text+0x0)
--
>> ld.lld: error: duplicate symbol: cachefiles_ondemand_daemon_read
>>> defined at internal.h:287 (fs/cachefiles/internal.h:287)
>>> fs/cachefiles/cache.o:(cachefiles_ondemand_daemon_read)
>>> defined at internal.h:287 (fs/cachefiles/internal.h:287)
>>> fs/cachefiles/main.o:(.text+0x38C0)
--
>> ld.lld: error: duplicate symbol: cachefiles_ondemand_daemon_read
>>> defined at internal.h:287 (fs/cachefiles/internal.h:287)
>>> fs/cachefiles/cache.o:(cachefiles_ondemand_daemon_read)
>>> defined at internal.h:287 (fs/cachefiles/internal.h:287)
>>> fs/cachefiles/namei.o:(.text+0x0)
--
>> ld.lld: error: duplicate symbol: cachefiles_ondemand_daemon_read
>>> defined at internal.h:287 (fs/cachefiles/internal.h:287)
>>> fs/cachefiles/cache.o:(cachefiles_ondemand_daemon_read)
>>> defined at internal.h:287 (fs/cachefiles/internal.h:287)
>>> fs/cachefiles/security.o:(.text+0x0)
--
>> ld.lld: error: duplicate symbol: cachefiles_ondemand_daemon_read
>>> defined at internal.h:287 (fs/cachefiles/internal.h:287)
>>> fs/cachefiles/cache.o:(cachefiles_ondemand_daemon_read)
>>> defined at internal.h:287 (fs/cachefiles/internal.h:287)
>>> fs/cachefiles/volume.o:(.text+0x0)
--
>> ld.lld: error: duplicate symbol: cachefiles_ondemand_daemon_read
>>> defined at internal.h:287 (fs/cachefiles/internal.h:287)
>>> fs/cachefiles/cache.o:(cachefiles_ondemand_daemon_read)
>>> defined at internal.h:287 (fs/cachefiles/internal.h:287)
>>> fs/cachefiles/xattr.o:(.text+0x0)

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-03-28 11:28:20

by Jingbo Xu

Subject: Re: [PATCH v6 10/22] erofs: add mode checking helper



On 3/28/22 10:42 AM, Gao Xiang wrote:
> On Fri, Mar 25, 2022 at 08:22:11PM +0800, Jeffle Xu wrote:
>> Until now, erofs has been strictly a blockdev-based filesystem. In other
>> usage scenarios (e.g. container images), erofs needs to run upon files.
>>
>> This patch set introduces a new nodev mode, in which erofs
>> can be mounted from a bootstrap blob file containing a complete erofs
>> image.
>>
>> Add a helper checking which mode erofs works in.
>>
>> Signed-off-by: Jeffle Xu <[email protected]>
>> ---
>> fs/erofs/internal.h | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
>> index e424293f47a2..1486e2573667 100644
>> --- a/fs/erofs/internal.h
>> +++ b/fs/erofs/internal.h
>> @@ -161,6 +161,11 @@ struct erofs_sb_info {
>> #define set_opt(opt, option) ((opt)->mount_opt |= EROFS_MOUNT_##option)
>> #define test_opt(opt, option) ((opt)->mount_opt & EROFS_MOUNT_##option)
>>
>> +static inline bool erofs_is_nodev_mode(struct super_block *sb)
>
> I've seen a lot of such
>
> + if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) &&
> + erofs_is_nodev_mode(sb)) {
>
> usages in the followup patches, which makes me wonder if the configuration
> can be checked in the helper as well. Also maybe rename it as
> erofs_is_fscache_mode()?
>

Sure. Will be done in the next version.

--
Thanks,
Jeffle

2022-03-28 11:29:58

by Gao Xiang

Subject: Re: [PATCH v6 15/22] erofs: register cookie context for bootstrap blob

On Fri, Mar 25, 2022 at 08:22:16PM +0800, Jeffle Xu wrote:
> Register an fscache_cookie for the bootstrap blob file. The bootstrap blob
> file can be specified by a new mount option, which is going to be
> introduced by a following patch.
>
> Something worth mentioning about the cleanup routine.
>
> 1. The init routine runs before the root inode gets initialized, and
> thus the corresponding cleanup routine shall be placed inside the
> .kill_sb() callback.
>
> 2. The init routine will instantiate anonymous inodes under the
> super_block, and thus the .put_super() callback shall also contain the
> cleanup routine. Otherwise we'll get a "VFS: Busy inodes after unmount."
> warning.
>
> Signed-off-by: Jeffle Xu <[email protected]>
> ---
> fs/erofs/internal.h | 3 +++
> fs/erofs/super.c | 17 +++++++++++++++++
> 2 files changed, 20 insertions(+)
>
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 459f31803c3b..d8c886a7491e 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -73,6 +73,7 @@ struct erofs_mount_opts {
> /* threshold for decompression synchronously */
> unsigned int max_sync_decompress_pages;
> #endif
> + char *tag;
> unsigned int mount_opt;
> };
>
> @@ -151,6 +152,8 @@ struct erofs_sb_info {
> /* sysfs support */
> struct kobject s_kobj; /* /sys/fs/erofs/<devname> */
> struct completion s_kobj_unregister;
> +
> + struct erofs_fscache *bootstrap;

The concept of a bootstrap is Nydus-specific. What we actually need here
is an fscache context for the primary device.

So I prefer struct erofs_fscache *s_fscache;

Also, please help revise the subject and commit message to avoid the
bootstrap terminology.

Thanks,
Gao Xiang

2022-03-28 11:58:04

by Gao Xiang

Subject: Re: [PATCH v6 17/22] erofs: implement fscache-based data read for non-inline layout

On Fri, Mar 25, 2022 at 08:22:18PM +0800, Jeffle Xu wrote:
> Implement the data plane of reading data from the bootstrap blob file
> over fscache for the non-inline layout.
>
> Note that the compressed layout is not supported yet.
>
> Signed-off-by: Jeffle Xu <[email protected]>
> ---
> fs/erofs/fscache.c | 83 +++++++++++++++++++++++++++++++++++++++++++++
> fs/erofs/inode.c | 8 ++++-
> fs/erofs/internal.h | 5 +++
> 3 files changed, 95 insertions(+), 1 deletion(-)
>
> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
> index 91377939b4f7..4a9a4e60c15d 100644
> --- a/fs/erofs/fscache.c
> +++ b/fs/erofs/fscache.c
> @@ -60,10 +60,93 @@ static int erofs_fscache_readpage_blob(struct file *data, struct page *page)
> return ret;
> }
>
> +static inline int erofs_fscache_get_map(struct erofs_map_blocks *map,
> + struct super_block *sb)

I wonder if m_fscache should be placed in struct erofs_map_dev.

And such a helper could be merged into erofs_map_dev() as well.

> +{
> + struct erofs_sb_info *sbi = EROFS_SB(sb);
> +
> + map->m_fscache = sbi->bootstrap;
> + return 0;
> +}
> +
> +static int erofs_fscache_readpage_noinline(struct folio *folio,
> + struct erofs_map_blocks *map)
> +{
> + struct fscache_cookie *cookie = map->m_fscache->cookie;
> + /*
> + * 1) For FLAT_PLAIN layout, the output map.m_la shall be equal to o_la,
> + * and the output map.m_pa is exactly the physical address of o_la.
> + * 2) For CHUNK_BASED layout, the output map.m_la is rounded down to the
> + * nearest chunk boundary, and the output map.m_pa is actually the
> + * physical address of this chunk boundary. So we need to recalculate
> + * the actual physical address of o_la.
> + */
> + loff_t start = map->m_pa + (map->o_la - map->m_la);

I think o_la can be directly replaced with "folio_pos(folio)".

Also, such a helper might be unneeded...

> +
> + return erofs_fscache_read_folio(cookie, folio, start);
> +}
> +
> +static int erofs_fscache_do_readpage(struct folio *folio)

Can it be folded into erofs_fscache_readpage()?
Another unneeded helper...

> +{
> + struct inode *inode = folio_file_mapping(folio)->host;
> + struct erofs_inode *vi = EROFS_I(inode);
> + struct super_block *sb = inode->i_sb;
> + struct erofs_map_blocks map;
> + int ret;
> +
> + if (erofs_inode_is_data_compressed(vi->datalayout)) {

It's impossible for now. So the check above is redundant.

Thanks,
Gao Xiang

> + erofs_info(sb, "compressed layout not supported yet");
> + return -EOPNOTSUPP;
> + }
> +
> + DBG_BUGON(folio_size(folio) != EROFS_BLKSIZ);
> +
> + map.m_la = map.o_la = folio_pos(folio);
> +
> + ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
> + if (ret)
> + return ret;
> +
> + if (!(map.m_flags & EROFS_MAP_MAPPED)) {
> + folio_zero_range(folio, 0, folio_size(folio));
> + return 0;
> + }
> +
> + ret = erofs_fscache_get_map(&map, sb);
> + if (ret)
> + return ret;
> +
> + switch (vi->datalayout) {
> + case EROFS_INODE_FLAT_PLAIN:
> + case EROFS_INODE_CHUNK_BASED:
> + return erofs_fscache_readpage_noinline(folio, &map);
> + default:
> + DBG_BUGON(1);
> + return -EOPNOTSUPP;
> + }
> +}
> +
> +static int erofs_fscache_readpage(struct file *file, struct page *page)
> +{
> + struct folio *folio = page_folio(page);
> + int ret;
> +
> + ret = erofs_fscache_do_readpage(folio);
> + if (!ret)
> + folio_mark_uptodate(folio);
> +
> + folio_unlock(folio);
> + return ret;
> +}
> +
> static const struct address_space_operations erofs_fscache_blob_aops = {
> .readpage = erofs_fscache_readpage_blob,
> };
>
> +const struct address_space_operations erofs_fscache_access_aops = {
> + .readpage = erofs_fscache_readpage,
> +};
> +
> /*
> * erofs_fscache_get_folio - find and read page cache of blob file
> * @ctx: the context of the blob file
> diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
> index ff62f84f47d3..744faf3ef9f4 100644
> --- a/fs/erofs/inode.c
> +++ b/fs/erofs/inode.c
> @@ -296,7 +296,13 @@ static int erofs_fill_inode(struct inode *inode, int isdir)
> err = z_erofs_fill_inode(inode);
> goto out_unlock;
> }
> - inode->i_mapping->a_ops = &erofs_raw_access_aops;
> +
> +#ifdef CONFIG_EROFS_FS_ONDEMAND
> + if (erofs_is_nodev_mode(inode->i_sb))
> + inode->i_mapping->a_ops = &erofs_fscache_access_aops;
> +#endif
> + if (!erofs_is_nodev_mode(inode->i_sb))
> + inode->i_mapping->a_ops = &erofs_raw_access_aops;
>
> out_unlock:
> erofs_put_metabuf(&buf);
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index fa89a1e3012f..6537ededed51 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -442,6 +442,9 @@ struct erofs_map_blocks {
> unsigned short m_deviceid;
> char m_algorithmformat;
> unsigned int m_flags;
> +
> + struct erofs_fscache *m_fscache;
> + erofs_off_t o_la;
> };
>
> /* Flags used by erofs_map_blocks_flatmode() */
> @@ -634,6 +637,8 @@ struct erofs_fscache *erofs_fscache_get(struct super_block *sb, char *path,
> void erofs_fscache_put(struct erofs_fscache *ctx);
>
> struct folio *erofs_fscache_get_folio(struct erofs_fscache *ctx, pgoff_t index);
> +
> +extern const struct address_space_operations erofs_fscache_access_aops;
> #else
> static inline int erofs_init_fscache(void) { return 0; }
> static inline void erofs_exit_fscache(void) {}
> --
> 2.27.0

2022-03-28 14:51:07

by Gao Xiang

Subject: Re: [PATCH v6 10/22] erofs: add mode checking helper

On Fri, Mar 25, 2022 at 08:22:11PM +0800, Jeffle Xu wrote:
> Until now, erofs has been strictly a blockdev-based filesystem. In other
> usage scenarios (e.g. container images), erofs needs to run upon files.
>
> This patch set introduces a new nodev mode, in which erofs
> can be mounted from a bootstrap blob file containing a complete erofs
> image.
>
> Add a helper checking which mode erofs works in.
>
> Signed-off-by: Jeffle Xu <[email protected]>
> ---
> fs/erofs/internal.h | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index e424293f47a2..1486e2573667 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -161,6 +161,11 @@ struct erofs_sb_info {
> #define set_opt(opt, option) ((opt)->mount_opt |= EROFS_MOUNT_##option)
> #define test_opt(opt, option) ((opt)->mount_opt & EROFS_MOUNT_##option)
>
> +static inline bool erofs_is_nodev_mode(struct super_block *sb)

I've seen a lot of such

+ if (IS_ENABLED(CONFIG_EROFS_FS_ONDEMAND) &&
+ erofs_is_nodev_mode(sb)) {

usages in the followup patches, which makes me wonder if the configuration
can be checked in the helper as well. Also maybe rename it as
erofs_is_fscache_mode()?

Thanks,
Gao Xiang

> +{
> + return !sb->s_bdev;
> +}
> +
> enum {
> EROFS_ZIP_CACHE_DISABLED,
> EROFS_ZIP_CACHE_READAHEAD,
> --
> 2.27.0

2022-03-28 17:52:55

by Jingbo Xu

Subject: Re: [PATCH v6 15/22] erofs: register cookie context for bootstrap blob



On 3/28/22 11:04 AM, Gao Xiang wrote:
> On Fri, Mar 25, 2022 at 08:22:16PM +0800, Jeffle Xu wrote:
>> Register an fscache_cookie for the bootstrap blob file. The bootstrap blob
>> file can be specified by a new mount option, which is going to be
>> introduced by a following patch.
>>
>> Something worth mentioning about the cleanup routine.
>>
>> 1. The init routine runs before the root inode gets initialized, and
>> thus the corresponding cleanup routine shall be placed inside the
>> .kill_sb() callback.
>>
>> 2. The init routine will instantiate anonymous inodes under the
>> super_block, and thus the .put_super() callback shall also contain the
>> cleanup routine. Otherwise we'll get a "VFS: Busy inodes after unmount."
>> warning.
>>
>> Signed-off-by: Jeffle Xu <[email protected]>
>> ---
>> fs/erofs/internal.h | 3 +++
>> fs/erofs/super.c | 17 +++++++++++++++++
>> 2 files changed, 20 insertions(+)
>>
>> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
>> index 459f31803c3b..d8c886a7491e 100644
>> --- a/fs/erofs/internal.h
>> +++ b/fs/erofs/internal.h
>> @@ -73,6 +73,7 @@ struct erofs_mount_opts {
>> /* threshold for decompression synchronously */
>> unsigned int max_sync_decompress_pages;
>> #endif
>> + char *tag;
>> unsigned int mount_opt;
>> };
>>
>> @@ -151,6 +152,8 @@ struct erofs_sb_info {
>> /* sysfs support */
>> struct kobject s_kobj; /* /sys/fs/erofs/<devname> */
>> struct completion s_kobj_unregister;
>> +
>> + struct erofs_fscache *bootstrap;
>
> The concept of a bootstrap is Nydus-specific. What we actually need here
> is an fscache context for the primary device.
>
> So I prefer struct erofs_fscache *s_fscache;
>
> Also, please help revise the subject and commit message to avoid the
> bootstrap terminology.
>

OK, will be done in the next version.


--
Thanks,
Jeffle

2022-03-28 20:40:01

by Gao Xiang

Subject: Re: [PATCH v6 20/22] erofs: implement fscache-based data read for data blobs

On Fri, Mar 25, 2022 at 08:22:21PM +0800, Jeffle Xu wrote:
> Implement the data plane of reading data from data blob files over
> fscache.
>
> Signed-off-by: Jeffle Xu <[email protected]>
> ---
> fs/erofs/data.c | 3 +++
> fs/erofs/fscache.c | 15 +++++++++++++--
> fs/erofs/internal.h | 1 +
> 3 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> index b4571bea93d5..b9a05de3c3b2 100644
> --- a/fs/erofs/data.c
> +++ b/fs/erofs/data.c
> @@ -206,6 +206,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
> map->m_bdev = sb->s_bdev;
> map->m_daxdev = EROFS_SB(sb)->dax_dev;
> map->m_dax_part_off = EROFS_SB(sb)->dax_part_off;
> + map->m_fscache = EROFS_SB(sb)->bootstrap;
>
> if (map->m_deviceid) {
> down_read(&devs->rwsem);
> @@ -217,6 +218,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
> map->m_bdev = dif->bdev;
> map->m_daxdev = dif->dax_dev;
> map->m_dax_part_off = dif->dax_part_off;
> + map->m_fscache = dif->blob;
> up_read(&devs->rwsem);
> } else if (devs->extra_devices) {
> down_read(&devs->rwsem);
> @@ -234,6 +236,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
> map->m_bdev = dif->bdev;
> map->m_daxdev = dif->dax_dev;
> map->m_dax_part_off = dif->dax_part_off;
> + map->m_fscache = dif->blob;
> break;
> }
> }
> diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
> index d75958470645..cbb39657615e 100644
> --- a/fs/erofs/fscache.c
> +++ b/fs/erofs/fscache.c
> @@ -63,9 +63,20 @@ static int erofs_fscache_readpage_blob(struct file *data, struct page *page)
> static inline int erofs_fscache_get_map(struct erofs_map_blocks *map,
> struct super_block *sb)
> {

So erofs_fscache_get_map() seems really unneeded...
erofs_map_dev() can be used directly, so we can avoid this patch.

Thanks,
Gao Xiang

> - struct erofs_sb_info *sbi = EROFS_SB(sb);
> + struct erofs_map_dev mdev;
> + int ret;
> +
> + mdev = (struct erofs_map_dev) {
> + .m_deviceid = map->m_deviceid,
> + .m_pa = map->m_pa,
> + };
> +
> + ret = erofs_map_dev(sb, &mdev);
> + if (ret)
> + return ret;
>
> - map->m_fscache = sbi->bootstrap;
> + map->m_fscache = mdev.m_fscache;
> + map->m_pa = mdev.m_pa;
> return 0;
> }
>
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 94a118caf580..cea08f12a2c3 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -487,6 +487,7 @@ struct erofs_map_dev {
> struct block_device *m_bdev;
> struct dax_device *m_daxdev;
> u64 m_dax_part_off;
> + struct erofs_fscache *m_fscache;
>
> erofs_off_t m_pa;
> unsigned int m_deviceid;
> --
> 2.27.0

2022-03-29 09:09:12

by Jingbo Xu

Subject: Re: [Linux-cachefs] [PATCH v6 03/22] cachefiles: notify user daemon with anon_fd when looking up cookie



On 3/25/22 8:22 PM, Jeffle Xu wrote:

> diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
> index e80673d0ab97..8a0f1b691aca 100644
> --- a/fs/cachefiles/internal.h
> +++ b/fs/cachefiles/internal.h
> @@ -15,6 +15,8 @@
>
> +/*
> + * ondemand.c
> + */
> +#ifdef CONFIG_CACHEFILES_ONDEMAND
> +extern ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
> + char __user *_buffer,
> + size_t buflen);
> +
> +extern int cachefiles_ondemand_cinit(struct cachefiles_cache *cache,
> + char *args);
> +
> +extern int cachefiles_ondemand_init_object(struct cachefiles_object *object);
> +
> +#else

> +ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
> + char __user *_buffer, size_t buflen)

Needs to be declared as static inline ...

> +{
> + return -EOPNOTSUPP;
> +}
> +
> +static inline int cachefiles_ondemand_init_object(struct cachefiles_object *object)
> +{
> + return 0;
> +}
> +#endif


--
Thanks,
Jeffle