LinuxLists.cc - [PATCH v8 00/20] fscache,erofs: fscache-based on-demand read semantics

2022-04-06 14:32:29

Subject: [PATCH v8 00/20] fscache,erofs: fscache-based on-demand read semantics

changes since v7:
- rebased to 5.18-rc1
- include the "cachefiles: unmark inode in use in error path" patch in
  this patchset to avoid a warning from the test robot (patch 1)
- cachefiles: rename the [cookie|volume]_key_len fields of struct
  cachefiles_open to [cookie|volume]_key_size to avoid potential
  misunderstanding. Also add more documentation to
  include/uapi/linux/cachefiles.h. (patch 3)
- cachefiles: validate the error code returned from the user daemon
  (patch 3)
- cachefiles: change WARN_ON_ONCE() to pr_info_once() when the user daemon
  closes anon_fd prematurely (patch 4/5)
- ready for complete review

Kernel Patchset
---------------
Git tree:

https://github.com/lostjeffle/linux.git jingbo/dev-erofs-fscache-v8

Gitweb:

https://github.com/lostjeffle/linux/commits/jingbo/dev-erofs-fscache-v8

User Daemon for Quick Test
--------------------------
Git tree:

https://github.com/lostjeffle/demand-read-cachefilesd.git main

Gitweb:

https://github.com/lostjeffle/demand-read-cachefilesd

RFC: https://lore.kernel.org/all/[email protected]/t/
v1: https://lore.kernel.org/lkml/[email protected]/T/
v2: https://lore.kernel.org/all/[email protected]/t/
v3: https://lore.kernel.org/lkml/[email protected]/T/
v4: https://lore.kernel.org/lkml/[email protected]/T/#t
v5: https://lore.kernel.org/lkml/[email protected]/T/
v6: https://lore.kernel.org/lkml/[email protected]/T/
v7: https://www.spinics.net/lists/linux-fsdevel/msg215066.html

[Background]
============
Nydus [1] is an image distribution service optimized for distribution over
the network. It accelerates container image delivery by pulling data from
the remote only when needed (a.k.a. on-demand reading), and it also
supports chunk-based deduplication, compression, etc.

erofs (Enhanced Read-Only File System) is a filesystem designed for
read-only scenarios. (Documentation/filesystems/erofs.rst)

Over the past months we've been focusing on supporting the Nydus image
service with the in-kernel erofs format [2]. In that case, each container
image is organized as one bootstrap (metadata) and, optionally, multiple
data blobs in erofs format. A large number of container images may be
stored on one machine.

To accelerate container startup (fetching the container image from the
remote and then starting the container), we'd like the bootstrap and blob
files to support on-demand read. That is, erofs can be mounted and accessed
even when the bootstrap/data blob files have not been fully downloaded.
Once the data is available locally, access has native performance.

That means we have to manage the cache state of the bootstrap/data blob
files (on a cache hit, read directly from the local cache; on a cache miss,
fetch the data somehow). It would be painful, and arguably misguided, for
erofs to implement the cache management itself. Thus we prefer to let
fscache/cachefiles do the cache management instead.

The on-demand read feature is implemented in a generic way in the fscache
subsystem, so that it can benefit other use cases and/or filesystems
besides erofs.

[1] https://nydus.dev
[2] https://sched.co/pcdL

[Overall Design]
================
Please refer to patch 7 ("cachefiles: document on-demand read mode") for
more details.

When working in the original mode, cachefiles mainly serves as a local cache
for a remote networking fs, while in on-demand read mode, cachefiles can be
used in scenarios where on-demand read semantics are needed, e.g. container
image distribution.

The essential difference between these two modes is that, in original mode,
on a cache miss, netfs itself fetches data from the remote and then writes
the fetched data into the cache file. In on-demand read mode, a user daemon
is responsible for fetching the data and feeding it to the kernel fscache
side.

The on-demand read mode relies on a simple protocol used for communication
between the kernel and the user daemon.

The proposed implementation relies on the anonymous fd mechanism to avoid
depending on the format of the cache file. When a fscache cache file is
opened for the first time, an anon_fd associated with the cache file is sent
to the user daemon. With the given anon_fd, the user daemon can fetch and
write data into the cache file in the background, even before the kernel has
triggered a cache miss. Besides, a write() syscall on the anon_fd ends up in
the cachefiles kernel module, which writes the data to the cache file in the
latest cache file format.
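
The background write path described above can be sketched in user space as
follows. This is a minimal sketch, not code from the patchset: fill_range()
and demo_roundtrip() are hypothetical names, an ordinary temporary file
stands in for the anon_fd, and it assumes that positional writes with
pwrite() are acceptable on such a descriptor.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write exactly len bytes from buf to fd, starting at file offset off.
 * Returns 0 on success, -1 on error (errno is set).
 * fd is a stand-in for the anon_fd handed to the daemon (assumption).
 */
static int fill_range(int fd, const void *buf, size_t len, off_t off)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = pwrite(fd, p, len, off);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		if (n == 0) {
			errno = EIO;	/* no progress, give up */
			return -1;
		}
		p += n;
		off += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Self-check: fill a range in a temporary file and read it back. */
static int demo_roundtrip(void)
{
	char path[] = "/tmp/anonfd-demoXXXXXX";
	char out[5];
	int fd = mkstemp(path);
	int ret = -1;

	if (fd < 0)
		return -1;
	unlink(path);	/* the file lives until fd is closed */
	if (fill_range(fd, "hello", 5, 4096) == 0 &&
	    pread(fd, out, sizeof(out), 4096) == (ssize_t)sizeof(out) &&
	    memcmp(out, "hello", sizeof(out)) == 0)
		ret = 0;
	close(fd);
	return ret;
}
```

Writing at an explicit offset lets the daemon fill any hole, in any order,
without tracking a shared file position.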

1. cache miss
On a cache miss, the cachefiles kernel module notifies the user daemon with
the anon_fd, along with the requested file range. When notified, the user
daemon needs to fetch the data of the requested file range and then write
the fetched data into the cache file through the given anon_fd. When it has
finished processing the request, the user daemon needs to notify the kernel.

After notifying the user daemon, the kernel read routine blocks until the
request has been handled by the user daemon. When it is woken up by the
notification from the user daemon, i.e. the corresponding hole has been
filled by the user daemon, it retries the read from the same file range.

2. cache hit
Once the data is ready in the cache file, netfs reads from the cache file
directly.
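
The miss/hit flow above can be modelled with a small single-threaded
simulation. This is only a sketch under stated assumptions: struct
cache_file, daemon_handle_miss() and cache_read_chunk() are hypothetical
names that do not appear in the patchset, the "daemon" is an ordinary
function call rather than a separate process, and the blocking wait is
collapsed into that call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 4096u
#define NR_CHUNKS  8u

/* Hypothetical model of one cache file: data plus a per-chunk flag. */
struct cache_file {
	unsigned char data[NR_CHUNKS][CHUNK_SIZE];
	bool present[NR_CHUNKS];	/* false means "hole" */
	unsigned int daemon_requests;	/* how often the daemon was asked */
};

/* Stand-in for the user daemon: "fetch" chunk idx and fill the hole. */
static void daemon_handle_miss(struct cache_file *cf, size_t idx)
{
	memset(cf->data[idx], (int)('A' + idx), CHUNK_SIZE);
	cf->present[idx] = true;
	cf->daemon_requests++;
}

/*
 * Stand-in for the kernel read path. On a miss, ask the daemon to fill
 * the hole, then retry the same range; on a hit, read directly.
 * Returns 0 on success, -1 on error.
 */
static int cache_read_chunk(struct cache_file *cf, size_t idx,
			    unsigned char *out)
{
	if (idx >= NR_CHUNKS)
		return -1;

	if (!cf->present[idx]) {
		/* In the described design the reader would sleep here. */
		daemon_handle_miss(cf, idx);
		if (!cf->present[idx])
			return -1;	/* daemon failed to fill the hole */
	}

	memcpy(out, cf->data[idx], CHUNK_SIZE);
	return 0;
}
```

Reading the same chunk twice reaches the daemon only once in this model:
the second read is a hit and is served from the cached data.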

[Advantages of fscache-based on-demand read]
============================================
1. Asynchronous Prefetch
In the current mechanism, fscache is responsible for cache state management,
while the data plane (fetching data from local/remote on a cache miss) is
handled on the user daemon side.

If the data is already present in the backing file, netfs (e.g. erofs)
reads from the backing file directly and is not trapped to user space
anymore. Thus the user daemon can fetch data (from the remote)
asynchronously in the background, and thereby speed up access to the
backing file to some degree.

2. Support massive blob files
This mechanism also supports a large number of backing files, and thus can
benefit densely deployed scenarios.

In our usage scenario, one container image corresponds to one bootstrap
file (required) and multiple data blob files (optional). For example, one
container image for node.js corresponds to ~20 files in total. In a densely
deployed environment, there could be as many as hundreds of containers and
thus thousands of backing files on one machine.

Jeffle Xu (20):
cachefiles: unmark inode in use in error path
cachefiles: extract write routine
cachefiles: notify user daemon with anon_fd when looking up cookie
cachefiles: notify user daemon when withdrawing cookie
cachefiles: implement on-demand read
cachefiles: enable on-demand read mode
cachefiles: document on-demand read mode
erofs: make erofs_map_blocks() generally available
erofs: add mode checking helper
erofs: register fscache volume
erofs: add fscache context helper functions
erofs: add anonymous inode managing page cache for data blob
erofs: add erofs_fscache_read_folios() helper
erofs: register fscache context for primary data blob
erofs: register fscache context for extra data blobs
erofs: implement fscache-based metadata read
erofs: implement fscache-based data read for non-inline layout
erofs: implement fscache-based data read for inline layout
erofs: implement fscache-based data readahead
erofs: add 'fsid' mount option

 .../filesystems/caching/cachefiles.rst | 165 ++++++
 fs/cachefiles/Kconfig                  |  11 +
 fs/cachefiles/Makefile                 |   1 +
 fs/cachefiles/daemon.c                 |  90 +++-
 fs/cachefiles/interface.c              |   2 +
 fs/cachefiles/internal.h               |  67 +++
 fs/cachefiles/io.c                     |  72 ++-
 fs/cachefiles/namei.c                  |  49 +-
 fs/cachefiles/ondemand.c               | 479 ++++++++++++++++++
 fs/erofs/Kconfig                       |  10 +
 fs/erofs/Makefile                      |   1 +
 fs/erofs/data.c                        |  27 +-
 fs/erofs/fscache.c                     | 369 ++++++++++++++
 fs/erofs/inode.c                       |   5 +
 fs/erofs/internal.h                    |  55 ++
 fs/erofs/super.c                       |  99 +++-
 include/linux/fscache.h                |   1 +
 include/linux/netfs.h                  |   1 +
 include/trace/events/cachefiles.h      |   2 +
 include/uapi/linux/cachefiles.h        |  72 +++
 20 files changed, 1501 insertions(+), 77 deletions(-)
 create mode 100644 fs/cachefiles/ondemand.c
 create mode 100644 fs/erofs/fscache.c
 create mode 100644 include/uapi/linux/cachefiles.h

--
2.27.0
