2023-01-11 06:20:14

by Jia Zhu

[permalink] [raw]
Subject: [PATCH V4 0/5] Introduce daemon failover mechanism to recover from crashing

Changes since v3:
1. Add xa_lock for traverse xarray in cachefiles_daemon_poll().
2. Use macro to simplify the code in cachefiles_ondemand_select_req().

[Background]
============
In ondemand read mode, if user daemon closes anonymous fd(e.g. daemon
crashes), subsequent read and inflight requests based on these fd will
return -EIO.
Even if above mentioned case is tolerable for some individual users, but
when it happenens in real cloud service production environment, such IO
errors will be passed to cloud service users and impact its working jobs.
It's terrible for cloud service stability.

[Design]
========
The main idea of daemon failover is reopen the inflight req related object,
thus the newly started daemon could process the req as usual.
To implement that, we need to support:
1. Store inflight requests during daemon crash.
2. Hold the handle of /dev/cachefiles(by container snapshotter/systemd).
BTW, if user chooses not to keep /dev/cachefiles fd, failover is not enabled.
Inflight requests return error and passed it to container.(same behavior as now).

[Flow Path]
===========
This patchset introduce three states for ondemand object:
CLOSE: Object which just be allocated or closed by user daemon.
OPEN: Object which related OPEN request has been processed correctly.
REOPENING: Object which has been closed, and is drived to open by a read
request.

1. Daemon use UDS send/receive fd to keep and pass the fd reference of
"/dev/cachefiles".
2. User daemon crashes -> restart and recover dev fd's reference.
3. User daemon write "restore" to device.
2.1 Reset the object's state from CLOSE to REOPENING.
2.2 Init a work which reinit the object and add it to wq. (daemon can
get rid of kernel space and handle that open request).
4. The user of upper filesystem won't notice that the daemon ever crashed
since the inflight IO is restored and handled correctly.

[Test]
======
There is a testcase for above mentioned scenario.
A user process read the file by fscache ondemand reading.
At the same time, we kill the daemon constantly.
The expected result is that the file read by user is consistent with
original, and the user doesn't notice that daemon has ever been killed.

https://github.com/userzj/demand-read-cachefilesd/commits/failover-test

[GitWeb]
========
https://github.com/userzj/linux/tree/fscache-failover-v5

RFC: https://lore.kernel.org/all/[email protected]/
V1: https://lore.kernel.org/all/[email protected]/
V2: https://lore.kernel.org/all/[email protected]/
V3: https://lore.kernel.org/all/[email protected]/

Jia Zhu (5):
cachefiles: introduce object ondemand state
cachefiles: extract ondemand info field from cachefiles_object
cachefiles: resend an open request if the read request's object is
closed
cachefiles: narrow the scope of triggering EPOLLIN events in ondemand
mode
cachefiles: add restore command to recover inflight ondemand read
requests

fs/cachefiles/daemon.c | 16 +++-
fs/cachefiles/interface.c | 6 ++
fs/cachefiles/internal.h | 57 +++++++++++++-
fs/cachefiles/ondemand.c | 160 ++++++++++++++++++++++++++++----------
4 files changed, 192 insertions(+), 47 deletions(-)

--
2.20.1


2023-01-11 06:20:14

by Jia Zhu

[permalink] [raw]
Subject: [PATCH V4 4/5] cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode

Don't trigger EPOLLIN when there are only reopening read requests in
xarray.

Suggested-by: Xin Yin <[email protected]>
Signed-off-by: Jia Zhu <[email protected]>
Reviewed-by: Jingbo Xu <[email protected]>
---
fs/cachefiles/daemon.c | 15 +++++++++++++--
fs/cachefiles/internal.h | 12 ++++++++++++
2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index aa4efcabb5e3..b8d8f280fb7a 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -355,14 +355,25 @@ static __poll_t cachefiles_daemon_poll(struct file *file,
struct poll_table_struct *poll)
{
struct cachefiles_cache *cache = file->private_data;
+ struct xarray *xa = &cache->reqs;
+ struct cachefiles_req *req;
+ unsigned long index;
__poll_t mask;

poll_wait(file, &cache->daemon_pollwq, poll);
mask = 0;

if (cachefiles_in_ondemand_mode(cache)) {
- if (!xa_empty(&cache->reqs))
- mask |= EPOLLIN;
+ if (!xa_empty(xa)) {
+ xa_lock(xa);
+ xa_for_each_marked(xa, index, req, CACHEFILES_REQ_NEW) {
+ if (!cachefiles_ondemand_is_reopening_read(req)) {
+ mask |= EPOLLIN;
+ break;
+ }
+ }
+ xa_unlock(xa);
+ }
} else {
if (test_bit(CACHEFILES_STATE_CHANGED, &cache->flags))
mask |= EPOLLIN;
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 2ed836d4169e..3d94990a8b38 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -326,6 +326,13 @@ cachefiles_ondemand_set_object_##_state(struct cachefiles_object *object) \
CACHEFILES_OBJECT_STATE_FUNCS(open);
CACHEFILES_OBJECT_STATE_FUNCS(close);
CACHEFILES_OBJECT_STATE_FUNCS(reopening);
+
+static inline bool cachefiles_ondemand_is_reopening_read(struct cachefiles_req *req)
+{
+ return cachefiles_ondemand_object_is_reopening(req->object) &&
+ req->msg.opcode == CACHEFILES_OP_READ;
+}
+
#else
static inline ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
char __user *_buffer, size_t buflen)
@@ -353,6 +360,11 @@ static inline int cachefiles_ondemand_init_obj_info(struct cachefiles_object *ob
{
return 0;
}
+
+static inline bool cachefiles_ondemand_is_reopening_read(struct cachefiles_req *req)
+{
+ return false;
+}
#endif

/*
--
2.20.1

2023-03-28 14:21:36

by David Howells

[permalink] [raw]
Subject: Re: [PATCH V4 4/5] cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode

Jia Zhu <[email protected]> wrote:

> + if (!xa_empty(xa)) {
> + xa_lock(xa);
> + xa_for_each_marked(xa, index, req, CACHEFILES_REQ_NEW) {
> + if (!cachefiles_ondemand_is_reopening_read(req)) {
> + mask |= EPOLLIN;
> + break;
> + }
> + }
> + xa_unlock(xa);
> + }

I wonder if there's a more efficient way to do this. I guess it depends on
how many reqs you expect to get in a queue. It might be worth taking the
rcu_read_lock before calling xa_lock() and holding it over the whole loop.

David

2023-03-29 08:18:02

by Jia Zhu

[permalink] [raw]
Subject: Re: [External] Re: [PATCH V4 4/5] cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode



在 2023/3/28 22:19, David Howells 写道:
> Jia Zhu <[email protected]> wrote:
>
>> + if (!xa_empty(xa)) {
>> + xa_lock(xa);
>> + xa_for_each_marked(xa, index, req, CACHEFILES_REQ_NEW) {
>> + if (!cachefiles_ondemand_is_reopening_read(req)) {
>> + mask |= EPOLLIN;
>> + break;
>> + }
>> + }
>> + xa_unlock(xa);
>> + }
>
> I wonder if there's a more efficient way to do this. I guess it depends on
> how many reqs you expect to get in a queue. It might be worth taking the
> rcu_read_lock before calling xa_lock() and holding it over the whole loop.
>
Thanks for the advice, will use rcu_read_lock(unlock) to replace it.
> David
>