[Background]
============
erofs (Enhanced Read-Only File System) is a filesystem specially
optimised for read-only scenarios. (Documentation/filesystems/erofs.rst)
Recently we have been focusing on erofs for the container image
distribution scenario (https://sched.co/pcdL). In this case, erofs can
be mounted from one bootstrap file (metadata) together with (optionally)
multiple data blob files (data) stored on another local filesystem. (All
these files are actually image files in the erofs disk format.)
To accelerate container startup (fetching the container image from
remote and then starting the container), we hope that the bootstrap/data
blob files can support demand read. That is, erofs can be mounted and
accessed even when the bootstrap/data blob files have not been fully
downloaded. This means we have to manage the cache state of the
bootstrap/data blob files (on cache hit, read directly from the local
cache; on cache miss, fetch the data somehow). It would be painful, and
arguably unwise, for erofs to implement the cache management itself;
thus we prefer fscache/cachefiles to do the cache management. Besides,
the demand-read feature is general, and it can benefit other use
scenarios if implemented at the fscache level.
[Overall Design]
================
The upper fs uses a backing file on the local fs as the local cache
(exactly the "cachefiles" way), and relies on fscache to detect whether
the data is ready or not (cache hit/miss). Since fscache currently
detects cache hit/miss by detecting holes in the backing files, our
demand-read mechanism also relies on hole detection.
1. initial phase
At the very beginning, the user daemon creates the backing files
(bootstrap/data blob files) under the corresponding directory (under
<root>/cache/<volume>) in advance. These backing files are completely
sparse (with zero disk usage). Since the backing files are all read-only
and their sizes are fixed, the user daemon sets the corresponding file
size when creating each sparse backing file, as sketched below.
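For illustration, creating one such sparse backing file from the user
daemon could look roughly like this (a minimal sketch; the helper name
and how the daemon learns the blob size and path are assumptions, not
part of this patchset):

#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a fully sparse, fixed-size backing file in advance. */
static int create_backing_file(const char *path, off_t blob_size)
{
	int fd = open(path, O_CREAT | O_RDWR, 0600);

	if (fd < 0)
		return -1;
	/* Set the final file size without allocating any blocks. */
	if (ftruncate(fd, blob_size) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}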
2. cache miss
When a file range (of a bootstrap/data blob file) is accessed for the
first time, a cache miss is triggered and then .issue_op() is called to
fetch the data somehow.
In the demand-read case, we rely on a user daemon to fetch the data from
local/remote. Here .issue_op() just packages the file range into a
message and informs the user daemon, which is polling and waiting on
/dev/cachefiles. Once awakened, the user daemon reads /dev/cachefiles to
get the file range information, and then fetches the data corresponding
to that range. Once the data is ready, the user daemon writes the
fetched data into the backing file and then notifies the pending
.issue_op() by writing to /dev/cachefiles. The .issue_op() call blocks
until the user daemon signals that the data is ready. By then the data
is present in the backing file, and the netfs API re-initiates a read
request from the backing file. A sketch of the daemon side follows.
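For reference, a minimal sketch of the daemon side of this protocol,
based on the request format and the "done" command introduced in
patches 18/19 (fetch_range() is a placeholder and error handling is
omitted):

#include <limits.h>
#include <poll.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Mirrors struct cachefiles_req_in introduced by this patchset. */
struct cachefiles_req_in {
	uint64_t id;
	uint64_t off;
	uint64_t len;
	char path[NAME_MAX];
};

/* Placeholder: fetch [off, off + len) and write it into the backing file. */
extern int fetch_range(const char *path, uint64_t off, uint64_t len);

static void handle_requests(int devfd)
{
	struct cachefiles_req_in req;
	struct pollfd pfd = { .fd = devfd, .events = POLLIN };
	char cmd[32];

	for (;;) {
		poll(&pfd, 1, -1);
		if (read(devfd, &req, sizeof(req)) != sizeof(req))
			continue;

		fetch_range(req.path, req.off, req.len);

		/* Unblock the .issue_op() waiting in the kernel. */
		snprintf(cmd, sizeof(cmd), "done %llu",
			 (unsigned long long)req.id);
		write(devfd, cmd, strlen(cmd));
	}
}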
3. cache hit
If the data is already present in the backing file, the netfs API reads
from the backing file directly.
[Advantage of fscache-based demand-read]
========================================
1. Asynchronous Prefetch
In the current mechanism, fscache is responsible for the cache state
management, while the data plane (fetching data from local/remote on
cache miss) is done on the user daemon side.
If the data is already present in the backing file, the netfs API reads
from the backing file directly and no longer traps into user space.
Thus the user daemon can fetch data (from remote) asynchronously in the
background, accelerating backing file access to some degree, as sketched
below.
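For illustration, a minimal sketch of such a background prefetch on the
daemon side (prefetch_range() is a hypothetical helper, and @buf is
assumed to already hold the fetched data):

#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Fill one range of the sparse backing file ahead of any access. Once
 * the hole is filled, later reads of this range are cache hits and
 * never trap into the daemon.
 */
static int prefetch_range(const char *path, const void *buf, size_t len,
			  off_t off)
{
	int fd = open(path, O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = pwrite(fd, buf, len, off);
	close(fd);
	return n == (ssize_t)len ? 0 : -1;
}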
2. Support massive blob files
Besides, this mechanism supports a large number of backing files, and
thus can benefit densely deployed scenarios.
In our use scenario, one container image can correspond to one bootstrap
file (required) and multiple data blob files (optional). For example,
one container image for node.js corresponds to ~20 files in total. In a
densely deployed environment, there can be as many as hundreds of
containers, and thus thousands of backing files, on one machine.
[Invalidation Strategy]
=======================
Currently I have no clear idea of the invalidation (culling) strategy
yet... It needs further discussion and will be implemented later.
[Patchset Organization]
=======================
- patches 1-16 implement the data plane over fscache. With these alone,
erofs can access the bootstrap blob file (backing file) through
fscache, though the backing file needs to be ready (fully downloaded).
- patches 17-19 implement the demand-read semantics. With these, erofs
relies on the user daemon to fetch the data whenever the backing file
is not ready (cache miss).
[Interaction with fscache/cachefiles/netfs]
===========================================
fscache/cachefiles/netfs were initially designed to serve as a local
cache for remote filesystems. As they are reused to implement the
demand-read semantics, their logic may need to be adapted somewhat.
This RFC patchset is still quite coarse and is only meant to show the
skeleton of the whole mechanism. To get a workable model as soon as
possible, the refactoring of fscache/cachefiles/netfs in this patchset
is quite rough. (sorry for that...) Further discussion and clarification
is obviously needed.
1. The path of the backing file
In cachefiles, each backing file is stored under one fan sub-directory
chosen by a hashing algorithm, while in our use scenario the user daemon
needs to place the bootstrap/data blob files under the correct directory
in advance.
In this RFC patchset, I simply bypass the placing algorithm (patch 2)
for ease of debugging. In a later version, we can build the hashing
algorithm used by cachefiles into the user daemon, and let the user
daemon compute the corresponding hash value and place the bootstrap/data
blob files under the right directory. The goal is to keep cachefiles'
placing logic as untouched as possible.
2. Upper fs doesn't know file size in advance
The @object_size parameter of fscache_acquire_cookie() represents the
file size of the netfs file, and serves in several places:
- the size of the backing file is set to @object_size during the cookie
lookup phase.
- @object_size is used for the coherency check (compared with the file
size stored in the "CacheFiles.cache" xattr) during the cookie lookup
phase.
- the netfs API checks whether the currently read file range hits EOF
according to the file size.
In the demand-read case, however, the upper fs has no idea of the file
size of the blob file. Besides, since these files are all read-only, the
file size is fixed, and (as described in 'initial phase') the user
daemon has already set the corresponding file size of the sparse files.
Thus fscache could query the file size of the backing file directly,
e.g. through fstat on the backing file, instead of relying on the upper
fs to offer a credible file size; see the sketch below.
For now, patches 3/10/11 in this RFC just skip the related checks and
logic in the demand-read case.
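For example, such a query could look roughly like the following inside
cachefiles (a hypothetical helper, not part of this patchset):

/*
 * Hypothetical helper: trust the size the user daemon has already set
 * on the (sparse) backing file, instead of the @object_size supplied
 * by the upper fs.
 */
static loff_t cachefiles_backing_size(struct cachefiles_object *object)
{
	return i_size_read(file_inode(object->file));
}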
3. Refactor the address_space based netfs_readpage() API
The @folio parameter of netfs_readpage() indicates a page cache page in
the address_space of the netfs file, and the following netfs logic
copies data directly into the page cache of that address_space, leaving
the @folio parameter aside.
In the demand-read case, the input @folio is no longer a page cache page
in the address_space of one file. Instead, it may be just a temporary
page used to hold the data. Thus the netfs API needs to be refactored
somewhat to adapt to this change.
Patches 8/9 in this RFC serve this purpose.
4. Maybe need another device node
In the demand-read case, we rely on the user daemon to fetch data from
local/remote. Currently we reuse "/dev/cachefiles" for the communication
between the cachefiles kernel module and the user daemon. That is
obviously not acceptable, since "/dev/cachefiles" is only for culling.
Later we could create another device node for this purpose.
[Test]
======
1. create erofs image (bootstrap)
mkfs.erofs test.img tmp/
2. move the bootstrap to the corresponding place under the root
directory of fscache (directly under <root>/cache/<volume>/, since
patch 2 places backing files directly under the volume directory)
3. run user daemon
(https://github.com/lostjeffle/demand-read-cachefilesd/blob/main/cachefilesd2.c)
./cachefilesd2
4. mount erofs from bootstrap
mount -t erofs none -o bootstrap_path=test.img /mnt/
Jeffle Xu (19):
cachefiles: add mode command
cachefiles: implement key scheme for demand-read mode
cachefiles: refactor cachefiles_adjust_size()
netfs: make ops->init_rreq() optional
netfs: refactor netfs_alloc_read_request
netfs: add type field to struct netfs_read_request
netfs: add netfs_readpage_demand()
netfs: refactor netfs_clear_unread()
netfs: refactor netfs_rreq_unlock()
netfs: refactor netfs_rreq_prepare_read
cachefiles: refactor cachefiles_prepare_read
erofs: export erofs_map_blocks
erofs: add bootstrap_path mount option
erofs: introduce fscache support
erofs: implement fscache-based metadata read
erofs: implement fscache-based data read
netfs: support on demand read
cachefiles: support on demand read
erofs: support on demand read
fs/cachefiles/daemon.c | 183 ++++++++++++++++++++++++++++----------
fs/cachefiles/interface.c | 4 +
fs/cachefiles/internal.h | 22 +++++
fs/cachefiles/io.c | 59 +++++++++++-
fs/cachefiles/namei.c | 8 +-
fs/cachefiles/xattr.c | 5 ++
fs/ceph/addr.c | 5 --
fs/erofs/Makefile | 2 +-
fs/erofs/data.c | 18 ++--
fs/erofs/fscache.c | 161 +++++++++++++++++++++++++++++++++
fs/erofs/inode.c | 6 +-
fs/erofs/internal.h | 15 ++++
fs/erofs/super.c | 55 ++++++++++--
fs/netfs/read_helper.c | 179 +++++++++++++++++++++++++++++++++----
include/linux/netfs.h | 16 ++++
15 files changed, 652 insertions(+), 86 deletions(-)
create mode 100644 fs/erofs/fscache.c
--
2.27.0
fscache/cachefiles used to serve as a local cache for remote fs.
This patch set introduces a new use case, in which a local read-only fs
can implement demand read with fscache.
Thus a 'mode' field is used to distinguish which mode cachefiles works
in. The user daemon can set the specified mode with the 'mode' command;
see the example below. If the user daemon doesn't explicitly set the
mode, cachefiles serves as the local cache for remote fs by default.
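For example, the command sequence written to /dev/cachefiles by the
daemon could look like the following ("dir"/"tag"/"bind" are
pre-existing commands, the path and tag are examples only, and "mode"
must be issued before "bind" since it is rejected once the cache has
started):

	dir /var/cache/fscache
	tag mycache
	mode demand
	bind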
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/cachefiles/daemon.c | 35 +++++++++++++++++++++++++++++++++++
fs/cachefiles/internal.h | 6 ++++++
2 files changed, 41 insertions(+)
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 40a792421fc1..951963e72b44 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -41,6 +41,7 @@ static int cachefiles_daemon_dir(struct cachefiles_cache *, char *);
static int cachefiles_daemon_inuse(struct cachefiles_cache *, char *);
static int cachefiles_daemon_secctx(struct cachefiles_cache *, char *);
static int cachefiles_daemon_tag(struct cachefiles_cache *, char *);
+static int cachefiles_daemon_mode(struct cachefiles_cache *, char *);
static int cachefiles_daemon_bind(struct cachefiles_cache *, char *);
static void cachefiles_daemon_unbind(struct cachefiles_cache *);
@@ -75,6 +76,7 @@ static const struct cachefiles_daemon_cmd cachefiles_daemon_cmds[] = {
{ "inuse", cachefiles_daemon_inuse },
{ "secctx", cachefiles_daemon_secctx },
{ "tag", cachefiles_daemon_tag },
+ { "mode", cachefiles_daemon_mode },
{ "", NULL }
};
@@ -663,6 +665,39 @@ static int cachefiles_daemon_inuse(struct cachefiles_cache *cache, char *args)
return -EINVAL;
}
+/*
+ * Set the cache mode
+ * - command: "mode cache|demand"
+ */
+static int cachefiles_daemon_mode(struct cachefiles_cache *cache, char *args)
+{
+ enum cachefiles_mode mode;
+
+ _enter(",%s", args);
+
+ if (test_bit(CACHEFILES_READY, &cache->flags)) {
+ pr_err("Cache already started\n");
+ return -EINVAL;
+ }
+
+ if (!*args) {
+ pr_err("Empty mode specified\n");
+ return -EINVAL;
+ }
+
+ if (!strncmp(args, "cache", strlen("cache"))) {
+ mode = CACHEFILES_MODE_CACHE;
+ } else if (!strncmp(args, "demand", strlen("demand"))) {
+ mode = CACHEFILES_MODE_DEMAND;
+ } else {
+ pr_err("Invalid mode specified\n");
+ return -EINVAL;
+ }
+
+ cache->mode = mode;
+ return 0;
+}
+
/*
* Bind a directory as a cache
*/
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index f42021b3a0be..1366e4319b4e 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -60,6 +60,11 @@ struct cachefiles_object {
#define CACHEFILES_OBJECT_USING_TMPFILE 0 /* Have an unlinked tmpfile */
};
+enum cachefiles_mode {
+ CACHEFILES_MODE_CACHE, /* local cache for netfs (Default) */
+ CACHEFILES_MODE_DEMAND, /* demand read for read-only fs */
+};
+
/*
* Cache files cache definition
*/
@@ -93,6 +98,7 @@ struct cachefiles_cache {
sector_t brun; /* when to stop culling */
sector_t bcull; /* when to start culling */
sector_t bstop; /* when to stop allocating */
+ enum cachefiles_mode mode;
unsigned long flags;
#define CACHEFILES_READY 0 /* T if cache prepared */
#define CACHEFILES_DEAD 1 /* T if cache dead */
--
2.27.0
In demand-read mode, the fs using fscache for demand read doesn't know
the exact file size of the data blob file, so the @object_size parameter
passed to fscache_acquire_cookie() may be fake in this case.
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/cachefiles/interface.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 51c968cd00a6..b85051250cb7 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -110,6 +110,7 @@ static int cachefiles_adjust_size(struct cachefiles_object *object)
{
struct iattr newattrs;
struct file *file = object->file;
+ struct cachefiles_cache *cache = object->volume->cache;
uint64_t ni_size;
loff_t oi_size;
int ret;
@@ -123,6 +124,9 @@ static int cachefiles_adjust_size(struct cachefiles_object *object)
if (!file)
return -ENOBUFS;
+ if (cache->mode == CACHEFILES_MODE_DEMAND)
+ return 0;
+
oi_size = i_size_read(file_inode(file));
if (oi_size == ni_size)
return 0;
--
2.27.0
In demand-read mode, the user daemon may prepare data blob files in
advance, before they are looked up.
Thus simplify the logic of placing backing files, so that backing files
live directly under the "cache/<volume>/" directory.
Also skip the coherency check for now to ease development and debugging.
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/cachefiles/namei.c | 8 +++++++-
fs/cachefiles/xattr.c | 5 +++++
2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 61d412580353..981e6e80690b 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -603,11 +603,17 @@ static bool cachefiles_open_file(struct cachefiles_object *object,
bool cachefiles_look_up_object(struct cachefiles_object *object)
{
struct cachefiles_volume *volume = object->volume;
- struct dentry *dentry, *fan = volume->fanout[(u8)object->cookie->key_hash];
+ struct cachefiles_cache *cache = volume->cache;
+ struct dentry *dentry, *fan;
int ret;
_enter("OBJ%x,%s,", object->debug_id, object->d_name);
+ if (cache->mode == CACHEFILES_MODE_CACHE)
+ fan = volume->fanout[(u8)object->cookie->key_hash];
+ else
+ fan = volume->dentry;
+
/* Look up path "cache/vol/fanout/file". */
ret = cachefiles_inject_read_error();
if (ret == 0)
diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c
index 0601c46a22ef..f562dd0d4bdd 100644
--- a/fs/cachefiles/xattr.c
+++ b/fs/cachefiles/xattr.c
@@ -88,6 +88,7 @@ int cachefiles_set_object_xattr(struct cachefiles_object *object)
*/
int cachefiles_check_auxdata(struct cachefiles_object *object, struct file *file)
{
+ struct cachefiles_cache *cache = object->volume->cache;
struct cachefiles_xattr *buf;
struct dentry *dentry = file->f_path.dentry;
unsigned int len = object->cookie->aux_len, tlen;
@@ -96,6 +97,10 @@ int cachefiles_check_auxdata(struct cachefiles_object *object, struct file *file
ssize_t xlen;
int ret = -ESTALE;
+ /* TODO: coherency check */
+ if (cache->mode == CACHEFILES_MODE_DEMAND)
+ return 0;
+
tlen = sizeof(struct cachefiles_xattr) + len;
buf = kmalloc(tlen, GFP_KERNEL);
if (!buf)
--
2.27.0
netfs_readpage_demand() is the demand-read version of
netfs_readpage().
When the netfs API works in demand-read mode, the fs using fscache
shall call netfs_readpage_demand() instead.
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/netfs/read_helper.c | 63 ++++++++++++++++++++++++++++++++++++++++++
include/linux/netfs.h | 3 ++
2 files changed, 66 insertions(+)
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 9240b85548e4..26fa688f6300 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -1022,6 +1022,69 @@ int netfs_readpage(struct file *file,
}
EXPORT_SYMBOL(netfs_readpage);
+int netfs_readpage_demand(struct folio *folio,
+ const struct netfs_read_request_ops *ops,
+ void *netfs_priv)
+{
+ struct netfs_read_request *rreq;
+ unsigned int debug_index = 0;
+ int ret;
+
+ _enter("%lx", folio_index(folio));
+
+ rreq = __netfs_alloc_read_request(ops, netfs_priv, NULL);
+ if (!rreq) {
+ if (netfs_priv)
+ ops->cleanup(netfs_priv, folio_file_mapping(folio));
+ folio_unlock(folio);
+ return -ENOMEM;
+ }
+ rreq->type = NETFS_TYPE_DEMAND;
+ rreq->folio = folio;
+ rreq->start = folio_file_pos(folio);
+ rreq->len = folio_size(folio);
+ __set_bit(NETFS_RREQ_DONT_UNLOCK_FOLIOS, &rreq->flags);
+
+ if (ops->begin_cache_operation) {
+ ret = ops->begin_cache_operation(rreq);
+ if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) {
+ folio_unlock(folio);
+ goto out;
+ }
+ }
+
+ netfs_stat(&netfs_n_rh_readpage);
+ trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage);
+
+ netfs_get_read_request(rreq);
+
+ atomic_set(&rreq->nr_rd_ops, 1);
+ do {
+ if (!netfs_rreq_submit_slice(rreq, &debug_index))
+ break;
+
+ } while (rreq->submitted < rreq->len);
+
+ /* Keep nr_rd_ops incremented so that the ref always belongs to us, and
+ * the service code isn't punted off to a random thread pool to
+ * process.
+ */
+ do {
+ wait_var_event(&rreq->nr_rd_ops, atomic_read(&rreq->nr_rd_ops) == 1);
+ netfs_rreq_assess(rreq, false);
+ } while (test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags));
+
+ ret = rreq->error;
+ if (ret == 0 && rreq->submitted < rreq->len) {
+ trace_netfs_failure(rreq, NULL, ret, netfs_fail_short_readpage);
+ ret = -EIO;
+ }
+out:
+ netfs_put_read_request(rreq, false);
+ return ret;
+}
+EXPORT_SYMBOL(netfs_readpage_demand);
+
/*
* Prepare a folio for writing without reading first
* @folio: The folio being prepared
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 638ea5d63869..de6948bcc80a 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -261,6 +261,9 @@ extern int netfs_readpage(struct file *,
struct folio *,
const struct netfs_read_request_ops *,
void *);
+extern int netfs_readpage_demand(struct folio *,
+ const struct netfs_read_request_ops *,
+ void *);
extern int netfs_write_begin(struct file *, struct address_space *,
loff_t, unsigned int, unsigned int, struct folio **,
void **,
--
2.27.0
In the demand-read case, the input folio of the netfs API may not be a
page cache page inside the address space of the netfs file. Instead, it
may be just a temporary page used to hold the data.
In this case, use a bvec based iov_iter.
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/netfs/read_helper.c | 33 +++++++++++++++++++++++++++------
include/linux/netfs.h | 2 ++
2 files changed, 29 insertions(+), 6 deletions(-)
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 26fa688f6300..04d0cc2fca83 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -157,6 +157,18 @@ static void __netfs_put_subrequest(struct netfs_read_subrequest *subreq,
netfs_put_read_request(rreq, was_async);
}
+static void netfs_init_iov_iter_bvec(struct netfs_read_subrequest *subreq,
+ struct iov_iter *iter)
+{
+ struct bio_vec *bvec = &subreq->bvec;
+
+ bvec->bv_page = folio_page(subreq->rreq->folio, 0);
+ bvec->bv_offset = subreq->start + subreq->transferred;
+ bvec->bv_len = subreq->len - subreq->transferred;
+
+ iov_iter_bvec(iter, READ, bvec, 1, bvec->bv_len);
+}
+
/*
* Clear the unread part of an I/O request.
*/
@@ -164,9 +176,14 @@ static void netfs_clear_unread(struct netfs_read_subrequest *subreq)
{
struct iov_iter iter;
- iov_iter_xarray(&iter, READ, &subreq->rreq->mapping->i_pages,
- subreq->start + subreq->transferred,
- subreq->len - subreq->transferred);
+ if (subreq->rreq->type == NETFS_TYPE_CACHE) {
+ iov_iter_xarray(&iter, READ, &subreq->rreq->mapping->i_pages,
+ subreq->start + subreq->transferred,
+ subreq->len - subreq->transferred);
+ } else { /* type == NETFS_TYPE_DEMAND */
+ netfs_init_iov_iter_bvec(subreq, &iter);
+ }
+
iov_iter_zero(iov_iter_count(&iter), &iter);
}
@@ -190,9 +207,13 @@ static void netfs_read_from_cache(struct netfs_read_request *rreq,
struct iov_iter iter;
netfs_stat(&netfs_n_rh_read);
- iov_iter_xarray(&iter, READ, &rreq->mapping->i_pages,
- subreq->start + subreq->transferred,
- subreq->len - subreq->transferred);
+ if (subreq->rreq->type == NETFS_TYPE_CACHE) {
+ iov_iter_xarray(&iter, READ, &subreq->rreq->mapping->i_pages,
+ subreq->start + subreq->transferred,
+ subreq->len - subreq->transferred);
+ } else { /* type == NETFS_TYPE_DEMAND */
+ netfs_init_iov_iter_bvec(subreq, &iter);
+ }
cres->ops->read(cres, subreq->start, &iter, read_hole,
netfs_cache_read_terminated, subreq);
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index de6948bcc80a..5f45eb31defd 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -17,6 +17,7 @@
#include <linux/workqueue.h>
#include <linux/fs.h>
#include <linux/pagemap.h>
+#include <linux/bvec.h>
/*
* Overload PG_private_2 to give us PG_fscache - this is used to indicate that
@@ -146,6 +147,7 @@ struct netfs_read_subrequest {
#define NETFS_SREQ_SHORT_READ 2 /* Set if there was a short read from the cache */
#define NETFS_SREQ_SEEK_DATA_READ 3 /* Set if ->read() should SEEK_DATA first */
#define NETFS_SREQ_NO_PROGRESS 4 /* Set if we didn't manage to read any data */
+ struct bio_vec bvec;
};
enum netfs_read_request_type {
--
2.27.0
In the demand-read case, the input folio of the netfs API may not be a
page cache page inside the address space of the netfs file. Instead, it
may be just a temporary page used to hold the data.
Thus iterate the corresponding pages through rreq->folio directly.
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/netfs/read_helper.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 04d0cc2fca83..af12a7996672 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -414,6 +414,22 @@ static void netfs_rreq_unlock(struct netfs_read_request *rreq)
pgoff_t last_page = ((rreq->start + rreq->len) / PAGE_SIZE) - 1;
bool subreq_failed = false;
+ if (rreq->type == NETFS_TYPE_DEMAND) {
+ folio = rreq->folio;
+
+ list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
+ subreq_failed = (subreq->error < 0);
+ if (subreq_failed)
+ break;
+ }
+
+ if (!subreq_failed)
+ folio_mark_uptodate(folio);
+
+ if (!test_bit(NETFS_RREQ_DONT_UNLOCK_FOLIOS, &rreq->flags))
+ folio_unlock(folio);
+ } else {
+
XA_STATE(xas, &rreq->mapping->i_pages, start_page);
if (test_bit(NETFS_RREQ_FAILED, &rreq->flags)) {
@@ -480,6 +496,7 @@ static void netfs_rreq_unlock(struct netfs_read_request *rreq)
}
}
rcu_read_unlock();
+ }
task_io_account_read(account);
if (rreq->netfs_ops->done)
--
2.27.0
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/netfs/read_helper.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 4247916c7100..9240b85548e4 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -37,7 +37,7 @@ static void netfs_put_subrequest(struct netfs_read_subrequest *subreq,
__netfs_put_subrequest(subreq, was_async);
}
-static struct netfs_read_request *netfs_alloc_read_request(
+static struct netfs_read_request *__netfs_alloc_read_request(
const struct netfs_read_request_ops *ops, void *netfs_priv,
struct file *file)
{
@@ -48,8 +48,6 @@ static struct netfs_read_request *netfs_alloc_read_request(
if (rreq) {
rreq->netfs_ops = ops;
rreq->netfs_priv = netfs_priv;
- rreq->inode = file_inode(file);
- rreq->i_size = i_size_read(rreq->inode);
rreq->debug_id = atomic_inc_return(&debug_ids);
INIT_LIST_HEAD(&rreq->subrequests);
INIT_WORK(&rreq->work, netfs_rreq_work);
@@ -63,6 +61,21 @@ static struct netfs_read_request *netfs_alloc_read_request(
return rreq;
}
+static struct netfs_read_request *netfs_alloc_read_request(
+ const struct netfs_read_request_ops *ops, void *netfs_priv,
+ struct file *file)
+{
+ struct netfs_read_request *rreq;
+
+ rreq = __netfs_alloc_read_request(ops, netfs_priv, file);
+ if (rreq) {
+ rreq->inode = file_inode(file);
+ rreq->i_size = i_size_read(rreq->inode);
+ }
+
+ return rreq;
+}
+
static void netfs_get_read_request(struct netfs_read_request *rreq)
{
refcount_inc(&rreq->usage);
--
2.27.0
In demand-read mode, the fs using fscache for demand read doesn't know
or care about the exact file size of the data blob file, so skip the
file size check in this case.
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/netfs/read_helper.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index af12a7996672..ade1523fc180 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -776,14 +776,16 @@ netfs_rreq_prepare_read(struct netfs_read_request *rreq,
goto out;
if (source == NETFS_DOWNLOAD_FROM_SERVER) {
- /* Call out to the netfs to let it shrink the request to fit
- * its own I/O sizes and boundaries. If it shinks it here, it
- * will be called again to make simultaneous calls; if it wants
- * to make serial calls, it can indicate a short read and then
- * we will call it again.
- */
- if (subreq->len > rreq->i_size - subreq->start)
- subreq->len = rreq->i_size - subreq->start;
+ if (rreq->type == NETFS_TYPE_CACHE) {
+ /* Call out to the netfs to let it shrink the request to fit
+ * its own I/O sizes and boundaries. If it shrinks it here, it
+ * will be called again to make simultaneous calls; if it wants
+ * to make serial calls, it can indicate a short read and then
+ * we will call it again.
+ */
+ if (subreq->len > rreq->i_size - subreq->start)
+ subreq->len = rreq->i_size - subreq->start;
+ }
if (rreq->netfs_ops->clamp_length &&
!rreq->netfs_ops->clamp_length(subreq)) {
--
2.27.0
In demand-read mode, the fs using fscache for demand read doesn't know
or care about the exact file size of the data blob file, so skip the
file size check in this case.
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/cachefiles/io.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 60b1eac2ce78..95e9107dc3bb 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -341,7 +341,8 @@ static enum netfs_read_source cachefiles_prepare_read(struct netfs_read_subreque
_enter("%zx @%llx/%llx", subreq->len, subreq->start, i_size);
- if (subreq->start >= i_size) {
+ if (subreq->rreq->type == NETFS_TYPE_CACHE &&
+ subreq->start >= i_size) {
ret = NETFS_FILL_WITH_ZEROES;
why = cachefiles_trace_read_after_eof;
goto out_no_object;
--
2.27.0
Until now, erofs has been strictly a blockdev based filesystem. In other
use scenarios (e.g. container images), erofs needs to run upon files.
This patch introduces a new mount option "bootstrap_path", which is used
to specify the bootstrap blob file containing the complete erofs image.
Then erofs can be mounted from the bootstrap blob file.
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/data.c | 13 ++++++++++---
fs/erofs/internal.h | 1 +
fs/erofs/super.c | 40 +++++++++++++++++++++++++++++++++++-----
3 files changed, 46 insertions(+), 8 deletions(-)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 477aaff0c832..cf71082bd52f 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -11,11 +11,18 @@
struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr)
{
- struct address_space *const mapping = sb->s_bdev->bd_inode->i_mapping;
+ struct address_space * mapping;
struct page *page;
- page = read_cache_page_gfp(mapping, blkaddr,
- mapping_gfp_constraint(mapping, ~__GFP_FS));
+ if (sb->s_bdev) {
+ mapping = sb->s_bdev->bd_inode->i_mapping;
+ page = read_cache_page_gfp(mapping, blkaddr,
+ mapping_gfp_constraint(mapping, ~__GFP_FS));
+ } else {
+ /* TODO: fscache based data path */
+ page = ERR_PTR(-EINVAL);
+ }
+
/* should already be PageUptodate */
if (!IS_ERR(page))
lock_page(page);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 45fb6f5d11b5..cf69d9c9cbed 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -67,6 +67,7 @@ struct erofs_mount_opts {
unsigned int max_sync_decompress_pages;
#endif
unsigned int mount_opt;
+ char *bootstrap_path;
};
struct erofs_dev_context {
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 6a969b1e0ee6..51695f6d4449 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -440,6 +440,7 @@ enum {
Opt_dax,
Opt_dax_enum,
Opt_device,
+ Opt_bootstrap_path,
Opt_err
};
@@ -464,6 +465,7 @@ static const struct fs_parameter_spec erofs_fs_parameters[] = {
fsparam_flag("dax", Opt_dax),
fsparam_enum("dax", Opt_dax_enum, erofs_dax_param_enums),
fsparam_string("device", Opt_device),
+ fsparam_string("bootstrap_path",Opt_bootstrap_path),
{}
};
@@ -559,6 +561,14 @@ static int erofs_fc_parse_param(struct fs_context *fc,
}
++ctx->devs->extra_devices;
break;
+ case Opt_bootstrap_path:
+ kfree(ctx->opt.bootstrap_path);
+ ctx->opt.bootstrap_path = kstrdup(param->string, GFP_KERNEL);
+ if (!ctx->opt.bootstrap_path)
+ return -ENOMEM;
+ infofc(fc, "RAFS bootstrap_path %s", ctx->opt.bootstrap_path);
+ break;
+
default:
return -ENOPARAM;
}
@@ -633,9 +643,12 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
sb->s_magic = EROFS_SUPER_MAGIC;
- if (!sb_set_blocksize(sb, EROFS_BLKSIZ)) {
+ if (sb->s_bdev && !sb_set_blocksize(sb, EROFS_BLKSIZ)) {
erofs_err(sb, "failed to set erofs blksize");
return -EINVAL;
+ } else {
+ sb->s_blocksize = EROFS_BLKSIZ;
+ sb->s_blocksize_bits = LOG_BLOCK_SIZE;
}
sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
@@ -644,16 +657,21 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
sb->s_fs_info = sbi;
sbi->opt = ctx->opt;
- sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev);
sbi->devs = ctx->devs;
ctx->devs = NULL;
+ if (sb->s_bdev)
+ sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev);
+ else
+ sbi->dax_dev = NULL;
+
err = erofs_read_superblock(sb);
if (err)
return err;
if (test_opt(&sbi->opt, DAX_ALWAYS) &&
- !dax_supported(sbi->dax_dev, sb->s_bdev, EROFS_BLKSIZ, 0, bdev_nr_sectors(sb->s_bdev))) {
+ (!sbi->dax_dev ||
+ !dax_supported(sbi->dax_dev, sb->s_bdev, EROFS_BLKSIZ, 0, bdev_nr_sectors(sb->s_bdev)))) {
errorfc(fc, "DAX unsupported by block device. Turning off DAX.");
clear_opt(&sbi->opt, DAX_ALWAYS);
}
@@ -701,6 +719,10 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
static int erofs_fc_get_tree(struct fs_context *fc)
{
+ struct erofs_fs_context *ctx = fc->fs_private;
+
+ if (ctx->opt.bootstrap_path)
+ return get_tree_nodev(fc, erofs_fc_fill_super);
return get_tree_bdev(fc, erofs_fc_fill_super);
}
@@ -749,6 +771,7 @@ static void erofs_fc_free(struct fs_context *fc)
struct erofs_fs_context *ctx = fc->fs_private;
erofs_free_dev_context(ctx->devs);
+ kfree(ctx->opt.bootstrap_path);
kfree(ctx);
}
@@ -789,7 +812,10 @@ static void erofs_kill_sb(struct super_block *sb)
WARN_ON(sb->s_magic != EROFS_SUPER_MAGIC);
- kill_block_super(sb);
+ if (sb->s_bdev)
+ kill_block_super(sb);
+ else
+ generic_shutdown_super(sb);
sbi = EROFS_SB(sb);
if (!sbi)
@@ -889,7 +915,11 @@ static int erofs_statfs(struct dentry *dentry, struct kstatfs *buf)
{
struct super_block *sb = dentry->d_sb;
struct erofs_sb_info *sbi = EROFS_SB(sb);
- u64 id = huge_encode_dev(sb->s_bdev->bd_dev);
+ u64 id = 0;
+
+ /* TODO: fsid in nodev mode */
+ if (sb->s_bdev)
+ id = huge_encode_dev(sb->s_bdev->bd_dev);
buf->f_type = sb->s_magic;
buf->f_bsize = EROFS_BLKSIZ;
--
2.27.0
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/data.c | 4 ++--
fs/erofs/internal.h | 2 ++
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 0e35ef3f9f3d..477aaff0c832 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -77,8 +77,8 @@ static int erofs_map_blocks_flatmode(struct inode *inode,
return err;
}
-static int erofs_map_blocks(struct inode *inode,
- struct erofs_map_blocks *map, int flags)
+int erofs_map_blocks(struct inode *inode,
+ struct erofs_map_blocks *map, int flags)
{
struct super_block *sb = inode->i_sb;
struct erofs_inode *vi = EROFS_I(inode);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 3265688af7f9..45fb6f5d11b5 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -447,6 +447,8 @@ struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr);
int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *dev);
int erofs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
u64 start, u64 len);
+int erofs_map_blocks(struct inode *inode,
+ struct erofs_map_blocks *map, int flags);
/* inode.c */
static inline unsigned long erofs_inode_hash(erofs_nid_t nid)
--
2.27.0
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/ceph/addr.c | 5 -----
fs/netfs/read_helper.c | 3 ++-
2 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index e53c8541f5b2..c3537dfd8c04 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -291,10 +291,6 @@ static void ceph_netfs_issue_op(struct netfs_read_subrequest *subreq)
dout("%s: result %d\n", __func__, err);
}
-static void ceph_init_rreq(struct netfs_read_request *rreq, struct file *file)
-{
-}
-
static void ceph_readahead_cleanup(struct address_space *mapping, void *priv)
{
struct inode *inode = mapping->host;
@@ -306,7 +302,6 @@ static void ceph_readahead_cleanup(struct address_space *mapping, void *priv)
}
static const struct netfs_read_request_ops ceph_netfs_read_ops = {
- .init_rreq = ceph_init_rreq,
.is_cache_enabled = ceph_is_cache_enabled,
.begin_cache_operation = ceph_begin_cache_operation,
.issue_op = ceph_netfs_issue_op,
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index 0807d2bb450d..4247916c7100 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -55,7 +55,8 @@ static struct netfs_read_request *netfs_alloc_read_request(
INIT_WORK(&rreq->work, netfs_rreq_work);
refcount_set(&rreq->usage, 1);
__set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
- ops->init_rreq(rreq, file);
+ if (ops->init_rreq)
+ ops->init_rreq(rreq, file);
netfs_stat(&netfs_n_rh_rreq);
}
--
2.27.0
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/data.c | 5 +++--
fs/erofs/fscache.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
fs/erofs/internal.h | 3 +++
fs/erofs/super.c | 10 ++++++----
4 files changed, 57 insertions(+), 6 deletions(-)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index cf71082bd52f..47bd3d0ae94c 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -12,6 +12,7 @@
struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr)
{
struct address_space * mapping;
+ struct erofs_sb_info *sbi;
struct page *page;
if (sb->s_bdev) {
@@ -19,8 +20,8 @@ struct page *erofs_get_meta_page(struct super_block *sb, erofs_blk_t blkaddr)
page = read_cache_page_gfp(mapping, blkaddr,
mapping_gfp_constraint(mapping, ~__GFP_FS));
} else {
- /* TODO: fscache based data path */
- page = ERR_PTR(-EINVAL);
+ sbi = EROFS_SB(sb);
+ page = erofs_readpage_from_fscache(sbi->bootstrap, blkaddr);
}
/* should already be PageUptodate */
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index cf550fdedd1e..6fe31d410cbd 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -1,5 +1,50 @@
#include "internal.h"
+static int erofs_begin_cache_operation(struct netfs_read_request *rreq)
+{
+ return fscache_begin_read_operation(&rreq->cache_resources,
+ rreq->netfs_priv);
+}
+
+static void erofs_priv_cleanup(struct address_space *mapping, void *netfs_priv)
+{
+}
+
+const struct netfs_read_request_ops erofs_req_ops = {
+ .begin_cache_operation = erofs_begin_cache_operation,
+ .cleanup = erofs_priv_cleanup,
+};
+
+struct page *erofs_readpage_from_fscache(struct fscache_cookie *cookie,
+ pgoff_t index)
+{
+ int ret = -ENOMEM;
+ struct folio *folio;
+ struct page *page;
+
+ page = alloc_page(GFP_KERNEL);
+ if (unlikely(!page)) {
+ printk("failed to allocate page\n");
+ goto err;
+ }
+
+ page->index = index;
+ folio = page_folio(page);
+
+ ret = netfs_readpage_demand(folio, &erofs_req_ops, cookie);
+ if (unlikely(ret || !PageUptodate(page))) {
+ printk("failed to read from fscache\n");
+ goto err_page;
+ }
+
+ return page;
+
+err_page:
+ __free_page(page);
+err:
+ return ERR_PTR(ret);
+}
+
int erofs_fscache_init(struct erofs_sb_info *sbi, char *bootstrap_path)
{
sbi->volume = fscache_acquire_volume("erofs", NULL, 0);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 8136ec63a9de..d60d9ffaef2a 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -577,6 +577,9 @@ static inline int z_erofs_load_lzma_config(struct super_block *sb,
int erofs_fscache_init(struct erofs_sb_info *sbi, char *bootstrap_path);
void erofs_fscache_cleanup(struct erofs_sb_info *sbi);
+struct page *erofs_readpage_from_fscache(struct fscache_cookie *cookie,
+ pgoff_t index);
+
#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
#endif /* __EROFS_INTERNAL_H */
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index f2a5f4cd53fd..bb68bc81a1a7 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -330,21 +330,23 @@ static int erofs_init_devices(struct super_block *sb,
static int erofs_read_superblock(struct super_block *sb)
{
- struct erofs_sb_info *sbi;
+ struct erofs_sb_info *sbi = EROFS_SB(sb);
struct page *page;
struct erofs_super_block *dsb;
unsigned int blkszbits;
void *data;
int ret;
- page = read_mapping_page(sb->s_bdev->bd_inode->i_mapping, 0, NULL);
+ if (sb->s_bdev)
+ page = read_mapping_page(sb->s_bdev->bd_inode->i_mapping, 0, NULL);
+ else
+ page = erofs_readpage_from_fscache(sbi->bootstrap, 0);
+
if (IS_ERR(page)) {
erofs_err(sb, "cannot read erofs superblock");
return PTR_ERR(page);
}
- sbi = EROFS_SB(sb);
-
data = kmap(page);
dsb = (struct erofs_super_block *)(data + EROFS_SUPER_OFFSET);
--
2.27.0
This patch only handles the volume cookie and the data file cookie for
the bootstrap. The corresponding IO path remains to be implemented in
the following patches.
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/Makefile | 2 +-
fs/erofs/fscache.c | 37 +++++++++++++++++++++++++++++++++++++
fs/erofs/internal.h | 8 ++++++++
fs/erofs/super.c | 5 +++++
4 files changed, 51 insertions(+), 1 deletion(-)
create mode 100644 fs/erofs/fscache.c
diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
index 756fe2d65272..f9a3609625aa 100644
--- a/fs/erofs/Makefile
+++ b/fs/erofs/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-$(CONFIG_EROFS_FS) += erofs.o
-erofs-objs := super.o inode.o data.o namei.o dir.o utils.o pcpubuf.o
+erofs-objs := super.o inode.o data.o namei.o dir.o utils.o pcpubuf.o fscache.o
erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o
erofs-$(CONFIG_EROFS_FS_ZIP) += decompressor.o zmap.o zdata.o
erofs-$(CONFIG_EROFS_FS_ZIP_LZMA) += decompressor_lzma.o
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
new file mode 100644
index 000000000000..cf550fdedd1e
--- /dev/null
+++ b/fs/erofs/fscache.c
@@ -0,0 +1,37 @@
+#include "internal.h"
+
+int erofs_fscache_init(struct erofs_sb_info *sbi, char *bootstrap_path)
+{
+ sbi->volume = fscache_acquire_volume("erofs", NULL, 0);
+ if (!sbi->volume) {
+ printk("fscache_acquire_volume() failed\n");
+ return -EINVAL;
+ }
+
+ /*
+ * TODO: @object_size is 0 since erofs can not get size of bootstrap
+ * file.
+ */
+ sbi->bootstrap = fscache_acquire_cookie(sbi->volume, 0,
+ bootstrap_path, strlen(bootstrap_path),
+ NULL, 0,
+ 1 /*TODO: we don't want FSCACHE_COOKIE_NO_DATA_TO_READ set */
+ );
+
+ if (!sbi->bootstrap) {
+ printk("fscache_acquire_cookie() for bootstrap failed\n");
+ /* cleanup for sbi->volume is delayed to erofs_fscache_cleanup() */
+ return -EINVAL;
+ }
+
+ fscache_use_cookie(sbi->bootstrap, false);
+
+ return 0;
+}
+
+void erofs_fscache_cleanup(struct erofs_sb_info *sbi)
+{
+ fscache_unuse_cookie(sbi->bootstrap, NULL, NULL);
+ fscache_relinquish_cookie(sbi->bootstrap, false);
+ fscache_relinquish_volume(sbi->volume, 0, false);
+}
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index cf69d9c9cbed..8136ec63a9de 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -17,6 +17,7 @@
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/iomap.h>
+#include <linux/fscache.h>
#include "erofs_fs.h"
/* redefine pr_fmt "erofs: " */
@@ -106,6 +107,9 @@ struct erofs_sb_info {
/* pseudo inode to manage cached pages */
struct inode *managed_cache;
+ struct fscache_volume *volume;
+ struct fscache_cookie *bootstrap;
+
struct erofs_sb_lz4_info lz4;
#endif /* CONFIG_EROFS_FS_ZIP */
struct erofs_dev_context *devs;
@@ -569,6 +573,10 @@ static inline int z_erofs_load_lzma_config(struct super_block *sb,
}
#endif /* !CONFIG_EROFS_FS_ZIP */
+/* fscache.c */
+int erofs_fscache_init(struct erofs_sb_info *sbi, char *bootstrap_path);
+void erofs_fscache_cleanup(struct erofs_sb_info *sbi);
+
#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
#endif /* __EROFS_INTERNAL_H */
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 51695f6d4449..f2a5f4cd53fd 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -665,6 +665,10 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
else
sbi->dax_dev = NULL;
+ err = erofs_fscache_init(sbi, ctx->opt.bootstrap_path);
+ if (err)
+ return err;
+
err = erofs_read_superblock(sb);
if (err)
return err;
@@ -823,6 +827,7 @@ static void erofs_kill_sb(struct super_block *sb)
erofs_free_dev_context(sbi->devs);
fs_put_dax(sbi->dax_dev);
+ erofs_fscache_cleanup(sbi);
kfree(sbi);
sb->s_fs_info = NULL;
}
--
2.27.0
fscache/cachefiles used to serve as a local cache for remote fs.
This patch set introduces a new use case, in which a local read-only fs
can implement demand read with fscache.
Thus a 'type' field is used to distinguish which mode the netfs API
works in.
Besides, in the demand-read case, the local fs using fscache for demand
read can't offer, and also doesn't need, the file handle of the netfs
file. The input folio of netfs_readpage() may not be a page cache page
inside the address space of the netfs file; it may be just a temporary
page containing the data. All the netfs API needs to do is move data
from the backing file to the input folio. Thus buffer the folio in
'struct netfs_read_request'.
Signed-off-by: Jeffle Xu <[email protected]>
---
include/linux/netfs.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index b46c39d98bbd..638ea5d63869 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -148,6 +148,11 @@ struct netfs_read_subrequest {
#define NETFS_SREQ_NO_PROGRESS 4 /* Set if we didn't manage to read any data */
};
+enum netfs_read_request_type {
+ NETFS_TYPE_CACHE,
+ NETFS_TYPE_DEMAND,
+};
+
/*
* Descriptor for a read helper request. This is used to make multiple I/O
* requests on a variety of sources and then stitch the result together.
@@ -156,6 +161,7 @@ struct netfs_read_request {
struct work_struct work;
struct inode *inode; /* The file being accessed */
struct address_space *mapping; /* The mapping being accessed */
+ struct folio *folio;
struct netfs_cache_resources cache_resources;
struct list_head subrequests; /* Requests to fetch I/O from disk or net */
void *netfs_priv; /* Private data for the netfs */
@@ -177,6 +183,7 @@ struct netfs_read_request {
#define NETFS_RREQ_FAILED 4 /* The request failed */
#define NETFS_RREQ_IN_PROGRESS 5 /* Unlocked when the request completes */
const struct netfs_read_request_ops *netfs_ops;
+ enum netfs_read_request_type type;
};
/*
--
2.27.0
Add a demand_read() callback to netfs_cache_ops to implement demand
read.
To implement the demand-read semantics, the data blob file is a sparse
file at the very beginning. When the fs starts to access the data blob
file, it will "cache miss" (hit the hole), and then the .issue_op()
callback is called to prepare the data.
The .issue_op() callback can call the netfs_demand_read() helper
function introduced in this patch to prepare the data, which in turn
calls the .demand_read() callback of the fscache backend.
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/netfs/read_helper.c | 26 ++++++++++++++++++++++++++
include/linux/netfs.h | 4 ++++
2 files changed, 30 insertions(+)
diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c
index ade1523fc180..3460cfd7a570 100644
--- a/fs/netfs/read_helper.c
+++ b/fs/netfs/read_helper.c
@@ -1125,6 +1125,32 @@ int netfs_readpage_demand(struct folio *folio,
}
EXPORT_SYMBOL(netfs_readpage_demand);
+void netfs_demand_read(struct netfs_read_subrequest *subreq)
+{
+ struct netfs_read_request *rreq = subreq->rreq;
+ struct netfs_cache_resources *cres = &rreq->cache_resources;
+ loff_t start_pos;
+ size_t len;
+ int ret;
+
+ start_pos = subreq->start + subreq->transferred;
+ len = subreq->len - subreq->transferred;
+
+ /*
+ * In the success case (ret == 0), the user daemon has downloaded the
+ * data for us, so transform to the NETFS_READ_FROM_CACHE state and
+ * advertise that 0 bytes were read, so that the request enters the
+ * INCOMPLETE state and re-reads from the backing file.
+ */
+ ret = cres->ops->demand_read(cres, start_pos, len);
+ if (!ret) {
+ subreq->source = NETFS_READ_FROM_CACHE;
+ __clear_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags);
+ }
+
+ netfs_subreq_terminated(subreq, ret, false);
+}
+
/*
* Prepare a folio for writing without reading first
* @folio: The folio being prepared
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 5f45eb31defd..8a9dae361f07 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -253,6 +253,9 @@ struct netfs_cache_ops {
int (*prepare_write)(struct netfs_cache_resources *cres,
loff_t *_start, size_t *_len, loff_t i_size,
bool no_space_allocated_yet);
+
+ int (*demand_read)(struct netfs_cache_resources *cres,
+ loff_t start_pos, size_t len);
};
struct readahead_control;
@@ -271,6 +274,7 @@ extern int netfs_write_begin(struct file *, struct address_space *,
void **,
const struct netfs_read_request_ops *,
void *);
+extern void netfs_demand_read(struct netfs_read_subrequest *);
extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t, bool);
extern void netfs_stats_show(struct seq_file *);
--
2.27.0
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/fscache.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index c849d3a89520..3d254cf7a0e3 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -83,9 +83,15 @@ static void erofs_priv_cleanup(struct address_space *mapping, void *netfs_priv)
{
}
+static void erofs_issue_op(struct netfs_read_subrequest *subreq)
+{
+ netfs_demand_read(subreq);
+}
+
const struct netfs_read_request_ops erofs_req_ops = {
.begin_cache_operation = erofs_begin_cache_operation,
.cleanup = erofs_priv_cleanup,
+ .issue_op = erofs_issue_op,
};
struct page *erofs_readpage_from_fscache(struct fscache_cookie *cookie,
--
2.27.0
The fs can call the cachefiles_demand_read() helper function to enqueue
a read request for demand reading.
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/cachefiles/daemon.c | 148 +++++++++++++++++++++++++++------------
fs/cachefiles/internal.h | 16 +++++
fs/cachefiles/io.c | 56 +++++++++++++++
3 files changed, 175 insertions(+), 45 deletions(-)
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 951963e72b44..7d174a39cd1c 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -44,6 +44,7 @@ static int cachefiles_daemon_tag(struct cachefiles_cache *, char *);
static int cachefiles_daemon_mode(struct cachefiles_cache *, char *);
static int cachefiles_daemon_bind(struct cachefiles_cache *, char *);
static void cachefiles_daemon_unbind(struct cachefiles_cache *);
+static int cachefiles_daemon_done(struct cachefiles_cache *, char *);
static unsigned long cachefiles_open;
@@ -77,6 +78,7 @@ static const struct cachefiles_daemon_cmd cachefiles_daemon_cmds[] = {
{ "secctx", cachefiles_daemon_secctx },
{ "tag", cachefiles_daemon_tag },
{ "mode", cachefiles_daemon_mode },
+ { "done", cachefiles_daemon_done },
{ "", NULL }
};
@@ -110,6 +112,8 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file)
INIT_LIST_HEAD(&cache->volumes);
INIT_LIST_HEAD(&cache->object_list);
spin_lock_init(&cache->object_list_lock);
+ idr_init(&cache->reqs);
+ spin_lock_init(&cache->reqs_lock);
/* set default caching limits
* - limit at 1% free space and/or free files
@@ -144,6 +148,7 @@ static int cachefiles_daemon_release(struct inode *inode, struct file *file)
cachefiles_daemon_unbind(cache);
/* clean up the control file interface */
+ idr_destroy(&cache->reqs);
cache->cachefilesd = NULL;
file->private_data = NULL;
cachefiles_open = 0;
@@ -164,6 +169,7 @@ static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
unsigned long long b_released;
unsigned f_released;
char buffer[256];
+ void *buf;
int n;
//_enter(",,%zu,", buflen);
@@ -171,38 +177,53 @@ static ssize_t cachefiles_daemon_read(struct file *file, char __user *_buffer,
if (!test_bit(CACHEFILES_READY, &cache->flags))
return 0;
- /* check how much space the cache has */
- cachefiles_has_space(cache, 0, 0, cachefiles_has_space_check);
-
- /* summarise */
- f_released = atomic_xchg(&cache->f_released, 0);
- b_released = atomic_long_xchg(&cache->b_released, 0);
- clear_bit(CACHEFILES_STATE_CHANGED, &cache->flags);
-
- n = snprintf(buffer, sizeof(buffer),
- "cull=%c"
- " frun=%llx"
- " fcull=%llx"
- " fstop=%llx"
- " brun=%llx"
- " bcull=%llx"
- " bstop=%llx"
- " freleased=%x"
- " breleased=%llx",
- test_bit(CACHEFILES_CULLING, &cache->flags) ? '1' : '0',
- (unsigned long long) cache->frun,
- (unsigned long long) cache->fcull,
- (unsigned long long) cache->fstop,
- (unsigned long long) cache->brun,
- (unsigned long long) cache->bcull,
- (unsigned long long) cache->bstop,
- f_released,
- b_released);
+ if (cache->mode == CACHEFILES_MODE_CACHE) {
+ /* check how much space the cache has */
+ cachefiles_has_space(cache, 0, 0, cachefiles_has_space_check);
+
+ /* summarise */
+ f_released = atomic_xchg(&cache->f_released, 0);
+ b_released = atomic_long_xchg(&cache->b_released, 0);
+ clear_bit(CACHEFILES_STATE_CHANGED, &cache->flags);
+
+ n = snprintf(buffer, sizeof(buffer),
+ "cull=%c"
+ " frun=%llx"
+ " fcull=%llx"
+ " fstop=%llx"
+ " brun=%llx"
+ " bcull=%llx"
+ " bstop=%llx"
+ " freleased=%x"
+ " breleased=%llx",
+ test_bit(CACHEFILES_CULLING, &cache->flags) ? '1' : '0',
+ (unsigned long long) cache->frun,
+ (unsigned long long) cache->fcull,
+ (unsigned long long) cache->fstop,
+ (unsigned long long) cache->brun,
+ (unsigned long long) cache->bcull,
+ (unsigned long long) cache->bstop,
+ f_released,
+ b_released);
+ buf = buffer;
+ } else {
+ struct cachefiles_req *req;
+ int id = 0;
+
+ spin_lock(&cache->reqs_lock);
+ req = idr_get_next(&cache->reqs, &id);
+ spin_unlock(&cache->reqs_lock);
+ if (!req)
+ return 0;
+
+ buf = &req->req_in;
+ n = sizeof(req->req_in);
+ }
if (n > buflen)
return -EMSGSIZE;
- if (copy_to_user(_buffer, buffer, n) != 0)
+ if (copy_to_user(_buffer, buf, n) != 0)
return -EFAULT;
return n;
@@ -291,7 +312,7 @@ static ssize_t cachefiles_daemon_write(struct file *file,
* - use EPOLLOUT to indicate culling state
*/
static __poll_t cachefiles_daemon_poll(struct file *file,
- struct poll_table_struct *poll)
+ struct poll_table_struct *poll)
{
struct cachefiles_cache *cache = file->private_data;
__poll_t mask;
@@ -299,11 +320,16 @@ static __poll_t cachefiles_daemon_poll(struct file *file,
poll_wait(file, &cache->daemon_pollwq, poll);
mask = 0;
- if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags))
- mask |= EPOLLIN;
+ if (cache->mode == CACHEFILES_MODE_CACHE) {
+ if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags))
+ mask |= EPOLLIN;
- if (test_bit(CACHEFILES_CULLING, &cache->flags))
- mask |= EPOLLOUT;
+ if (test_bit(CACHEFILES_CULLING, &cache->flags))
+ mask |= EPOLLOUT;
+ } else {
+ if (!idr_is_empty(&cache->reqs))
+ mask |= EPOLLIN;
+ }
return mask;
}
@@ -313,7 +339,7 @@ static __poll_t cachefiles_daemon_poll(struct file *file,
* - can be tail-called
*/
static int cachefiles_daemon_range_error(struct cachefiles_cache *cache,
- char *args)
+ char *args)
{
pr_err("Free space limits must be in range 0%%<=stop<cull<run<100%%\n");
@@ -546,6 +572,38 @@ static int cachefiles_daemon_tag(struct cachefiles_cache *cache, char *args)
return 0;
}
+/*
+ * Request completion
+ * - command: "done <id>"
+ */
+static int cachefiles_daemon_done(struct cachefiles_cache *cache, char *args)
+{
+ unsigned long id;
+ int ret;
+ struct cachefiles_req *req;
+
+ _enter(",%s", args);
+
+ if (!*args) {
+ pr_err("Empty id specified\n");
+ return -EINVAL;
+ }
+
+ ret = kstrtoul(args, 0, &id);
+ if (ret)
+ return ret;
+
+ spin_lock(&cache->reqs_lock);
+ req = idr_remove(&cache->reqs, id);
+ spin_unlock(&cache->reqs_lock);
+ if (!req)
+ return -EINVAL;
+
+ complete(&req->done);
+
+ return 0;
+}
+
/*
* Request a node in the cache be culled from the current working directory
* - command: "cull <name>"
@@ -704,22 +762,22 @@ static int cachefiles_daemon_mode(struct cachefiles_cache *cache, char *args)
static int cachefiles_daemon_bind(struct cachefiles_cache *cache, char *args)
{
_enter("{%u,%u,%u,%u,%u,%u},%s",
- cache->frun_percent,
- cache->fcull_percent,
- cache->fstop_percent,
- cache->brun_percent,
- cache->bcull_percent,
- cache->bstop_percent,
- args);
+ cache->frun_percent,
+ cache->fcull_percent,
+ cache->fstop_percent,
+ cache->brun_percent,
+ cache->bcull_percent,
+ cache->bstop_percent,
+ args);
if (cache->fstop_percent >= cache->fcull_percent ||
- cache->fcull_percent >= cache->frun_percent ||
- cache->frun_percent >= 100)
+ cache->fcull_percent >= cache->frun_percent ||
+ cache->frun_percent >= 100)
return -ERANGE;
if (cache->bstop_percent >= cache->bcull_percent ||
- cache->bcull_percent >= cache->brun_percent ||
- cache->brun_percent >= 100)
+ cache->bcull_percent >= cache->brun_percent ||
+ cache->brun_percent >= 100)
return -ERANGE;
if (*args) {
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 1366e4319b4e..72e6e8744788 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -15,6 +15,7 @@
#include <linux/fscache-cache.h>
#include <linux/cred.h>
#include <linux/security.h>
+#include <linux/idr.h>
#define CACHEFILES_DIO_BLOCK_SIZE 4096
@@ -65,6 +66,18 @@ enum cachefiles_mode {
CACHEFILES_MODE_DEMAND, /* demand read for read-only fs */
};
+struct cachefiles_req_in {
+ uint64_t id;
+ uint64_t off;
+ uint64_t len;
+ char path[NAME_MAX];
+};
+
+struct cachefiles_req {
+ struct completion done;
+ struct cachefiles_req_in req_in;
+};
+
/*
* Cache files cache definition
*/
@@ -107,6 +120,9 @@ struct cachefiles_cache {
char *rootdirname; /* name of cache root directory */
char *secctx; /* LSM security context */
char *tag; /* cache binding tag */
+
+ struct idr reqs;
+ spinlock_t reqs_lock;
};
#include <trace/events/cachefiles.h>
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 95e9107dc3bb..b4f6187f4022 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -540,12 +540,68 @@ static void cachefiles_end_operation(struct netfs_cache_resources *cres)
fscache_end_cookie_access(fscache_cres_cookie(cres), fscache_access_io_end);
}
+static struct cachefiles_req *cachefiles_alloc_req(struct cachefiles_object *object,
+ loff_t start_pos,
+ size_t len)
+{
+ struct cachefiles_req *req;
+ struct cachefiles_req_in *req_in;
+
+ req = kzalloc(sizeof(*req), GFP_KERNEL);
+ if (!req)
+ return NULL;
+
+ req_in = &req->req_in;
+
+ req_in->off = start_pos;
+ req_in->len = len;
+ strncpy(req_in->path, object->d_name, sizeof(req_in->path));
+
+ init_completion(&req->done);
+
+ return req;
+}
+
+int cachefiles_demand_read(struct netfs_cache_resources *cres,
+ loff_t start_pos, size_t len)
+{
+ struct cachefiles_object *object;
+ struct cachefiles_cache *cache;
+ struct cachefiles_req *req;
+ int ret;
+
+ object = cachefiles_cres_object(cres);
+ cache = object->volume->cache;
+
+ req = cachefiles_alloc_req(object, start_pos, len);
+ if (!req)
+ return -ENOMEM;
+
+ spin_lock(&cache->reqs_lock);
+ ret = idr_alloc(&cache->reqs, req, 0, 0, GFP_KERNEL);
+ if (ret >= 0)
+ req->req_in.id = ret;
+ spin_unlock(&cache->reqs_lock);
+ if (ret < 0) {
+ kfree(req);
+ return -ENOMEM;
+ }
+
+ wake_up_all(&cache->daemon_pollwq);
+
+ wait_for_completion(&req->done);
+ kfree(req);
+
+ return 0;
+}
+
static const struct netfs_cache_ops cachefiles_netfs_cache_ops = {
.end_operation = cachefiles_end_operation,
.read = cachefiles_read,
.write = cachefiles_write,
.prepare_read = cachefiles_prepare_read,
.prepare_write = cachefiles_prepare_write,
+ .demand_read = cachefiles_demand_read,
};
/*
--
2.27.0
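For illustration, the user-daemon side of the protocol implemented above
could look roughly like the following sketch. It assumes the request read
from /dev/cachefiles carries the raw layout of struct cachefiles_req_in,
and that completion is signalled by writing a "done <id>" command back to
the device; both details (and the fetch_range() helper) are assumptions
for this sketch, not the final interface.

/*
 * Hypothetical user daemon loop for the demand-read protocol. The
 * message layout mirrors struct cachefiles_req_in; the "done %llu"
 * completion command and fetch_range() are illustrative only.
 */
#include <fcntl.h>
#include <limits.h>
#include <poll.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

struct cachefiles_req_in {
	uint64_t id;
	uint64_t off;
	uint64_t len;
	char path[NAME_MAX];
};

/* Fetch [off, off + len) of the named blob from local/remote storage. */
extern ssize_t fetch_range(const char *path, uint64_t off, uint64_t len,
			   char *buf);

static char buf[1 << 20];	/* bounce buffer; assumes req.len <= sizeof(buf) */

int daemon_loop(int devfd, const char *cache_root)
{
	struct pollfd pfd = { .fd = devfd, .events = POLLIN };
	struct cachefiles_req_in req;
	char backing[PATH_MAX], cmd[64];
	int bfd;

	for (;;) {
		/* Wait until the kernel posts a cache-miss request. */
		if (poll(&pfd, 1, -1) <= 0)
			continue;
		if (read(devfd, &req, sizeof(req)) != sizeof(req))
			continue;

		/* Fetch the missing range and fill the backing file. */
		if (fetch_range(req.path, req.off, req.len, buf) < 0)
			continue;
		snprintf(backing, sizeof(backing), "%s/%s",
			 cache_root, req.path);
		bfd = open(backing, O_WRONLY);
		if (bfd >= 0) {
			pwrite(bfd, buf, req.len, req.off);
			close(bfd);
		}

		/* Unblock the .issue_op() waiting in the kernel. */
		snprintf(cmd, sizeof(cmd), "done %llu",
			 (unsigned long long)req.id);
		write(devfd, cmd, strlen(cmd));
	}
}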
Signed-off-by: Jeffle Xu <[email protected]>
---
fs/erofs/fscache.c | 73 +++++++++++++++++++++++++++++++++++++++++++++
fs/erofs/inode.c | 6 +++-
fs/erofs/internal.h | 1 +
3 files changed, 79 insertions(+), 1 deletion(-)
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 6fe31d410cbd..c849d3a89520 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -1,5 +1,78 @@
#include "internal.h"
+/*
+ * erofs_fscache_readpage
+ *
+ * Copy data from the backing pages (bootstrap) into the page cache pages of erofs files.
+ */
+static int erofs_fscache_readpage(struct file *file, struct page *page)
+{
+ struct inode *inode = page->mapping->host;
+ struct super_block *sb = inode->i_sb;
+ erofs_off_t pos = (erofs_off_t)page->index << PAGE_SHIFT;
+ struct erofs_map_blocks map = { .m_la = pos };
+ erofs_blk_t blkaddr;
+ struct page *backpage;
+ u64 total, batch, copied = 0;
+ char *vsrc, *vdst; /* virtual address of mapped src/dst page */
+ char *psrc, *pdst; /* cursor inside src/dst page */
+ u64 osrc; /* offset inside src page */
+ int err;
+
+ err = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
+ if (err)
+ goto out;
+
+ total = min_t(u64, PAGE_SIZE, map.m_plen - (pos - map.m_la));
+ blkaddr = map.m_pa >> PAGE_SHIFT;
+ osrc = map.m_pa & (PAGE_SIZE - 1);
+
+ while (total) {
+ backpage = erofs_get_meta_page(sb, blkaddr);
+ if (IS_ERR(backpage)) {
+ err = PTR_ERR(backpage);
+ goto out;
+ }
+
+ vsrc = psrc = kmap_atomic(backpage);
+ vdst = pdst = kmap_atomic(page);
+
+ psrc += osrc;
+ pdst += copied;
+ batch = min_t(u64, PAGE_SIZE - osrc, total);
+
+ memcpy(pdst, psrc, batch);
+
+ copied += batch;
+ total -= batch;
+ blkaddr++;
+ osrc = 0; /* copy from the beginning of the next backpage */
+
+ /*
+ * Avoid a 'scheduling while atomic' error: unmap before going
+ * into the next iteration, since we may schedule inside
+ * erofs_get_meta_page(). kmap_atomic() mappings are stack-like,
+ * so unmap in the reverse order of mapping.
+ */
+ kunmap_atomic(vdst);
+ kunmap_atomic(vsrc);
+
+ unlock_page(backpage);
+ put_page(backpage);
+ }
+
+out:
+ if (err)
+ SetPageError(page);
+ else
+ SetPageUptodate(page);
+ unlock_page(page);
+ return err;
+}
+
+const struct address_space_operations erofs_fscache_access_aops = {
+ .readpage = erofs_fscache_readpage,
+};
+
static int erofs_begin_cache_operation(struct netfs_read_request *rreq)
{
return fscache_begin_read_operation(&rreq->cache_resources,
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index 2345f1de438e..452d147277c4 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -299,7 +299,11 @@ static int erofs_fill_inode(struct inode *inode, int isdir)
err = z_erofs_fill_inode(inode);
goto out_unlock;
}
- inode->i_mapping->a_ops = &erofs_raw_access_aops;
+
+ if (inode->i_sb->s_bdev)
+ inode->i_mapping->a_ops = &erofs_raw_access_aops;
+ else
+ inode->i_mapping->a_ops = &erofs_fscache_access_aops;
out_unlock:
unlock_page(page);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index d60d9ffaef2a..dd3f2edae603 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -353,6 +353,7 @@ struct page *erofs_grab_cache_page_nowait(struct address_space *mapping,
extern const struct super_operations erofs_sops;
extern const struct address_space_operations erofs_raw_access_aops;
+extern const struct address_space_operations erofs_fscache_access_aops;
extern const struct address_space_operations z_erofs_aops;
/*
--
2.27.0
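To make the arithmetic of the copy loop in erofs_fscache_readpage() above
easier to follow, here is a small userspace model of it, with plain
buffers standing in for the meta pages:

/*
 * Userspace model of the copy loop in erofs_fscache_readpage(). A read
 * at logical position pos maps to physical address m_pa in the backing
 * blob; since m_pa need not be page-aligned, up to two backing pages
 * are touched per destination page.
 */
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u

static void model_readpage(const uint8_t *blob, uint8_t *page,
			   uint64_t m_pa, uint64_t total)
{
	uint64_t blkaddr = m_pa / MODEL_PAGE_SIZE;	/* backing page index */
	uint64_t osrc = m_pa % MODEL_PAGE_SIZE;		/* offset in backing page */
	uint64_t copied = 0, batch;

	while (total) {
		const uint8_t *backpage = blob + blkaddr * MODEL_PAGE_SIZE;

		/* Copy at most up to the end of the current backing page. */
		batch = MODEL_PAGE_SIZE - osrc;
		if (batch > total)
			batch = total;
		memcpy(page + copied, backpage + osrc, batch);

		copied += batch;
		total -= batch;
		blkaddr++;
		osrc = 0;	/* later pages are read from offset 0 */
	}
}

For example, with m_pa = 0x1800 and total = 4096, the loop copies 2048
bytes from the tail of backing page 1 and then 2048 bytes from the head
of backing page 2.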
Jeffle Xu <[email protected]> wrote:
> Thus simplify the logic of placing backing files, in which backing files
> are under "cache/<volume>/" directory directly.
You then have a scalability issue on the directory inode lock - and there may
also be limits on the capacity of a directory. The hash function is meant to
work the same, no matter the cpu arch, so you should be able to copy that to
userspace and derive the hash yourself.
> Also skip coherency checking currently to ease the development and debug.
Better if you can do that in erofs rather than cachefiles. Just set your
coherency data to all zeros or something.
David
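Deriving the placement in userspace might look like the sketch below.
key_hash() is a placeholder: a real daemon would copy cachefiles'
arch-independent key hash verbatim from the kernel source, and the
"@xx" fan-directory naming is likewise an assumption here.

/*
 * Sketch of computing a backing file's location in userspace, per the
 * suggestion above. NOT the real algorithm; illustrative only.
 */
#include <stdint.h>
#include <stdio.h>

static uint32_t key_hash(const unsigned char *key, size_t len)
{
	uint32_t h = 0;		/* placeholder, not cachefiles' hash */

	while (len--)
		h = h * 31 + *key++;
	return h;
}

/* Compute "<root>/cache/<volume>/@<fan>/<name>" for a given object key. */
static void backing_path(char *out, size_t outlen, const char *root,
			 const char *volume, const unsigned char *key,
			 size_t keylen, const char *name)
{
	unsigned int fan = key_hash(key, keylen) & 0xff;

	snprintf(out, outlen, "%s/cache/%s/@%02x/%s", root, volume, fan, name);
}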
Jeffle Xu <[email protected]> wrote:
> +enum cachefiles_mode {
> + CACHEFILES_MODE_CACHE, /* local cache for netfs (Default) */
> + CACHEFILES_MODE_DEMAND, /* demand read for read-only fs */
> +};
> +
I would suggest just adding a flag for the moment.
David
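A flag-based variant might look like the sketch below, assuming the
existing cache->flags bitmask; the bit number and name are illustrative
only and would have to be chosen next to the existing CACHEFILES_* bits.

/* Illustrative flag-based alternative to the mode enum. */
#define CACHEFILES_ONDEMAND	7	/* demand read for read-only fs */

static inline bool cachefiles_in_ondemand_mode(struct cachefiles_cache *cache)
{
	return test_bit(CACHEFILES_ONDEMAND, &cache->flags);
}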
Jeffle Xu <[email protected]> wrote:
> In demand-read case, the input folio of netfs API is may not the page
"is may not the page"? I think you're missing a verb (and you have too many
auxiliary verbs;-)
David
On 12/10/21 11:41 PM, David Howells wrote:
> Jeffle Xu <[email protected]> wrote:
>
>> In demand-read case, the input folio of netfs API is may not the page
>
> "is may not the page"? I think you're missing a verb (and you have too many
> auxiliary verbs;-)
>
Sorry for my poor English... What I want to express is that
"In demand-read case, the input folio of netfs API may not be the page
cache inside the address space of the netfs file."
--
Thanks,
Jeffle
On 12/10/21 7:05 PM, David Howells wrote:
> Jeffle Xu <[email protected]> wrote:
>
>> +enum cachefiles_mode {
>> + CACHEFILES_MODE_CACHE, /* local cache for netfs (Default) */
>> + CACHEFILES_MODE_DEMAND, /* demand read for read-only fs */
>> +};
>> +
>
> I would suggest just adding a flag for the moment.
>
Makes sense. Thanks.
--
Thanks,
Jeffle
On 12/10/21 7:04 PM, David Howells wrote:
> Jeffle Xu <[email protected]> wrote:
>
>> Thus simplify the logic of placing backing files, in which backing files
>> are under "cache/<volume>/" directory directly.
>
> You then have a scalability issue on the directory inode lock - and there may
> also be limits on the capacity of a directory. The hash function is meant to
> work the same, no matter the cpu arch, so you should be able to copy that to
> userspace and derive the hash yourself.
Yes, as described in the cover letter, I plan to build the hashing
algorithm used by cachefiles into our user daemon, so that the user
daemon can place the blob files in the right location. That way the
core logic of cachefiles is touched as little as possible.
>
>> Also skip coherency checking currently to ease the development and debug.
>
> Better if you can do that in erofs rather than cachefiles. Just set your
> coherency data to all zeros or something.
>
Yes, it is preferable to keep the general part of cachefiles untouched.
Later we can set the "CacheFiles.cache" xattr on blob files in advance
to pass this check.
--
Thanks,
Jeffle
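Pre-setting that xattr from the user daemon could look like the sketch
below; the "user.CacheFiles.cache" name and the size of the zeroed
coherency blob are assumptions that would have to match what the kernel
side actually expects.

/*
 * Sketch: pre-set the coherency xattr on a blob file so cachefiles'
 * coherency check passes. Illustrative only.
 */
#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>

static int preset_coherency(const char *backing_path)
{
	char zeros[16];	/* assumed size of the coherency data */

	memset(zeros, 0, sizeof(zeros));
	if (setxattr(backing_path, "user.CacheFiles.cache",
		     zeros, sizeof(zeros), 0) < 0) {
		perror("setxattr");
		return -1;
	}
	return 0;
}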
On 12/11/21 1:23 PM, JeffleXu wrote:
>
>
> On 12/10/21 11:41 PM, David Howells wrote:
>> Jeffle Xu <[email protected]> wrote:
>>
>>> In demand-read case, the input folio of netfs API is may not the page
>>
>> "is may not the page"? I think you're missing a verb (and you have too many
>> auxiliary verbs;-)
>>
>
> Sorry for my poor English... What I want to express is that
>
> "In demand-read case, the input folio of netfs API may not be the page
> cache inside the address space of the netfs file."
>
By the way, could we change the current address_space based netfs API
to a folio-based one, which would be more general? That is, the current
implementation of the netfs API uses an (address_space, page_offset,
len) tuple to describe the destination the read data shall be stored
into, while in the demand-read case the input folio may not be a page
cache page, and thus has no address_space attached to it.
--
Thanks,
Jeffle
Hi Jeffle,
On Sat, Dec 11, 2021 at 01:44:47PM +0800, JeffleXu wrote:
>
>
> On 12/11/21 1:23 PM, JeffleXu wrote:
> >
> >
> > On 12/10/21 11:41 PM, David Howells wrote:
> >> Jeffle Xu <[email protected]> wrote:
> >>
> >>> In demand-read case, the input folio of netfs API is may not the page
> >>
> >> "is may not the page"? I think you're missing a verb (and you have too many
> >> auxiliary verbs;-)
> >>
> >
> > Sorry for my poor English... What I want to express is that
> >
> > "In demand-read case, the input folio of netfs API may not be the page
> > cache inside the address space of the netfs file."
> >
>
> By the way, could we change the current address_space based netfs API
> to a folio-based one, which would be more general? That is, the current
> implementation of the netfs API uses an (address_space, page_offset,
> len) tuple to describe the destination the read data shall be stored
> into, while in the demand-read case the input folio may not be a page
> cache page, and thus has no address_space attached to it.
Thanks for your hard work on this!
Just from a rough look: could we use a pseudo inode (actually the
current managed inode could serve this role) to retain metadata in the
fscache scenario? It's better to cache all the metadata rather than
dropping it immediately, and the alloc_page()/free_page() cycle also
takes more time.
Also, if my limited understanding is correct, you could use the file
inode pages directly with netfs_readpage_demand() rather than calling
erofs_get_meta_page() and then memcpying into the file inode pages.
Thanks,
Gao Xiang
>
> --
> Thanks,
> Jeffle
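For reference, a rough sketch of the pseudo-inode idea mentioned above;
the managed_inode field and the erofs_meta_read_filler() callback are
hypothetical names for this sketch, not existing erofs interfaces.

/*
 * Keep bootstrap metadata pages in the page cache of a
 * filesystem-internal inode instead of cycling through
 * alloc_page()/free_page().
 */

/* hypothetical filler that reads one metadata page from the blob */
extern int erofs_meta_read_filler(void *data, struct page *page);

static struct page *erofs_get_cached_meta_page(struct super_block *sb,
					       pgoff_t index)
{
	struct inode *managed = EROFS_SB(sb)->managed_inode; /* hypothetical */

	/*
	 * read_cache_page() inserts the page into the pseudo inode's
	 * address_space, so repeated metadata reads hit memory instead
	 * of re-reading the backing file.
	 */
	return read_cache_page(managed->i_mapping, index,
			       erofs_meta_read_filler, NULL);
}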