2022-09-22 06:26:09

by Ziyang Zhang

[permalink] [raw]
Subject: [PATCH V5 0/7] ublk_drv: add USER_RECOVERY support

ublk_drv is a driver simply passes all blk-mq rqs to userspace
target(such as ublksrv[1]). For each ublk queue, there is one
ubq_daemon(pthread). All ubq_daemons share the same process
which opens /dev/ublkcX. The ubq_daemon code infinitely loops on
io_uring_enter() to send/receive io_uring cmds which pass
information of blk-mq rqs.

Since the real IO handler(the process/thread opening /dev/ublkcX) is
in userspace, it could crash if:
(1) the user kills -9 it because of IO hang on backend, system
reboot, etc...
(2) the process/thread catches a exception(segfault, divisor error,
oom...) Therefore, the kernel driver has to deal with a dying
ubq_daemon or the process.

Now, if one ubq_daemon(pthread) or the process crashes, ublk_drv
must abort the dying ubq, stop the device and release everything.
This is not a good choice in practice because users do not expect
aborted requests, I/O errors and a released device. They may want
a recovery machenism so that no requests are aborted and no I/O
error occurs. Anyway, users just want everything works as usual.

This patchset implements USER_RECOVERY support. If the process
or any ubq_daemon(pthread) crashes(exits accidentally), we allow
user to provide new process and ubq_daemons.

Note: The responsibility of recovery belongs to the user who opens
/dev/ublkcX. After a crash, the kernel driver only switch the
device's state to be ready for recovery(START_USER_RECOVERY) or
termination(STOP_DEV). The state is defined as UBLK_S_DEV_QUIESCED.
This patchset does not provide how to detect such a crash in userspace.
The user has may ways to do so. For example, user may:
(1) send GET_DEV_INFO on specific dev_id and check if its state is
UBLK_S_DEV_QUIESCED.
(2) 'ps' on ublksrv_pid.

Recovery feature is quite useful for real products. In detail,
we support this scenario:
(1) The /dev/ublkc0 is opened by process 0.
(2) Fio is running on /dev/ublkb0 exposed by ublk_drv and all
rqs are handled by process 0.
(3) Process 0 suddenly crashes(e.g. segfault);
(4) Fio is still running and submit IOs(but these IOs cannot
be dispatched now)
(5) User starts process 1 and attach it to /dev/ublkc0
(6) All rqs are handled by process 1 now and IOs can be
completed now.

Note: The backend must tolerate double-write because we re-issue
a rq sent to the old process 0 before.

We provide a sample script here to simulate the above steps:

***************************script***************************
LOOPS=10

__ublk_get_pid() {
pid=`./ublk list -n 0 | grep "pid" | awk '{print $7}'`
echo $pid
}

ublk_recover_kill()
{
for CNT in `seq $LOOPS`; do
dmesg -C
pid=`__ublk_get_pid`
echo -e "*** kill $pid now ***"
kill -9 $pid
sleep 6
echo -e "*** recover now ***"
./ublk recover -n 0
sleep 6
done
}

ublk_test()
{
echo -e "*** add ublk device ***"
./ublk add -t null -d 4 -i 1
sleep 2
echo -e "*** start fio ***"
fio --bs=4k \
--filename=/dev/ublkb0 \
--runtime=140s \
--rw=read &
sleep 4
ublk_recover_kill
wait
echo -e "*** delete ublk device ***"
./ublk del -n 0
}

for CNT in `seq 4`; do
modprobe -rv ublk_drv
modprobe ublk_drv
echo -e "************ round $CNT ************"
ublk_test
sleep 5
done
***************************script***************************

You may run it with our modified ublksrv[2] which supports
recovery feature. No I/O error occurs and you can verify it
by typing
$ perf-tools/bin/tpoint block:block_rq_error

The basic idea of USER_RECOVERY is quite straightfoward:
(1) quiesce ublk queues and requeue/abort rqs.
(2) release/free everything belongs to the dying process.
Note: Since ublk_drv does save information about user process,
this work is important because we don't expect any resource
lekage. Particularly, ioucmds from the dying ubq_daemons
need to be completed(freed).
(3) allow new ubq_daemons issue FETCH_REQ.
Note: ublk_ch_uring_cmd() checks some states and flags. We
have to set them to a correct value.

Here is steps to reocver:
(0) requests dispatched after the corresponding ubq_daemon is dying
are requeued.
(1) monitor_work finds one dying ubq_daemon, and it should
schedule quiesce_work and requeue/abort requests issued to
userspace before the ubq_daemon is dying.
(2) quiesce_work must (a)quiesce request queue to ban any incoming
ublk_queue_rq(), (b)wait unitl all rqs are IDLE, (c)complete old
ioucmds. Then the ublk device is ready for recovery or stop.
(3) Since io_uring resources are released, ublk_ch_release() is called
and all ublk_ios are reset to be ready for a new process.
(4) Then, user should start a new process and ubq_daemons(pthreads) and
send FETCH_REQ by io_uring_enter() to make all ubqs be ready. The
user must correctly setup queues, flags and so on(how to persist
user's information is not related to this patchset).
(6) The user sends RECOVER_DEV ctrl-cmd to /dev/ublk-control with a
dev_id X.
(7) After receiving RECOVER_DEV, ublk_drv waits for all ubq_daemons
getting ready. Then it unquiesces request queue and new rqs are
allowed.

You should use ublksrv[2] and tests[3] provided by us. We add 3 additional
tests to verify that recovery feature works. Our code will be PR-ed to
Ming's repo soon.

[1] https://github.com/ming1/ubdsrv
[2] https://github.com/old-memories/ubdsrv/tree/recovery-v1
[3] https://github.com/old-memories/ubdsrv/tree/recovery-v1/tests/generic

Since V4:
(1) remove WARN_ON_ONCE() checks on ubq->nr_io_ready
(2) cancel monitor_work after ublk_wait_tagset_rqs_idle() returns.
(3) refactor ublk_reinit code as generic code for both non-recovery
and recovery
(4) rename RESTART_DEV as RECOVER_DEV
(5) avoid UAF on ubq_daemon in ublk_ch_release()

Since V3:
(1) do not kick requeue list in ublk_queue_rq() or io_uring fallback wq
with a dying ubq_daemon but kicking the list once while unquiescing dev
(2) add comment on requeing rqs in ublk_queue_rq(), or io_uring fallback wq
with a dying ubq_daemon
(3) split support for UBLK_F_USER_RECOVERY_REISSUE into a single patch
(4) let monitor_work abort/requeue rqs issued to userspace instead of
quiesce_work with recovery enabled
(5) alway wait until no INFLIGHT rq exists in ublk_quiesce_dev()
(6) move ublk re-init stuff into ublk_ch_release()
(7) let ublk_quiesce_dev() go on as long as one ubq_daemon is dying
(8) add only one ctrl-cmd and rename it as RESTART_DEV
(9) check ub.dev_info->flags instead of iterating on all ubqs
(10) do not disable recoevry feature, but always qiuesce dev in
ublk_stop_dev() and then unquiesce it
(11) add doc on USER_RECOVERY feature

Since V2:
(1) run ublk_quiesce_dev() in a standalone work.
(2) do not run monitor_work after START_USER_RECOVERY is handled.
(3) refactor recovery feature code so that it does not affect current code.

Since V1:
(1) refactor cover letter. Add intruduction on "how to detect a crash" and
"why we need recovery feature".
(2) do not refactor task_work and ublk_queue_rq().
(3) allow users freely stop/recover the device.
(4) add comment on ublk_cancel_queue().
(5) refactor monitor_work and aborting machenism since we add recovery
machenism in monitor_work.

ZiyangZhang (7):
ublk_drv: check 'current' instead of 'ubq_daemon'
ublk_drv: define macros for recovery feature and check them
ublk_drv: requeue rqs with recovery feature enabled
ublk_drv: consider recovery feature in aborting mechanism
ublk_drv: support UBLK_F_USER_RECOVERY_REISSUE
ublk_drv: allow new process to open ublk chardev with recovery feature
enabled
Documentation: document ublk user recovery feature

Documentation/block/ublk.rst | 25 ++++
drivers/block/ublk_drv.c | 259 ++++++++++++++++++++++++++++++++--
include/uapi/linux/ublk_cmd.h | 6 +
3 files changed, 276 insertions(+), 14 deletions(-)

--
2.27.0


2022-09-22 06:26:31

by Ziyang Zhang

[permalink] [raw]
Subject: [PATCH V5 7/7] Documentation: document ublk user recovery feature

Add documentation for user recovery feature of ublk subsystem.

Signed-off-by: ZiyangZhang <[email protected]>
---
Documentation/block/ublk.rst | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
index 2122d1a4a541..3f1bfd51898b 100644
--- a/Documentation/block/ublk.rst
+++ b/Documentation/block/ublk.rst
@@ -144,6 +144,31 @@ managing and controlling ublk devices with help of several control commands:
For retrieving device info via ``ublksrv_ctrl_dev_info``. It is the server's
responsibility to save IO target specific info in userspace.

+- ``UBLK_CMD_RESTART_DEV``
+
+ This command is valid if ``UBLK_F_USER_RECOVERY`` feature is enabled. The old
+ process has exited and ublk device is quiesced. Then, user should start a new
+ process which opens ``/dev/ublkc*`` and gets all ublk queues be ready. Finally
+ user should send this command. When this command returns, ublk device is
+ unquiesced and new I/O requests are passed to the new process.
+
+- user recovery feature description
+
+ Two new features are added for user recovery: ``UBLK_F_USER_RECOVERY`` and
+ ``UBLK_F_USER_RECOVERY_REISSUE``.
+
+ With ``UBLK_F_USER_RECOVERY`` set, after one ubq_daemon(ublksrv io handler) is
+ dying, ublk does not release ``/dev/ublkc*`` or ``/dev/ublkb*`` but requeues all
+ inflight requests which have not been issued to userspace. Requests which have
+ been issued to userspace are aborted.
+
+ With ``UBLK_F_USER_RECOVERY_REISSUE`` set, after one ubq_daemon(ublksrv io
+ handler) is dying, contrary to ``UBLK_F_USER_RECOVERY``, requests which have been
+ issued to userspace are requeued and will be re-issued to the new process after
+ handling ``UBLK_CMD_RESTART_DEV``. ``UBLK_F_USER_RECOVERY_REISSUE`` is designed
+ for backends who tolerate double-write since the driver may issue the same
+ I/O request twice. It might be useful to a read-only FS or a VM backend.
+
Data plane
----------

--
2.27.0

2022-09-22 06:26:46

by Ziyang Zhang

[permalink] [raw]
Subject: [PATCH V5 6/7] ublk_drv: allow new process to open ublk chardev with recovery feature enabled

With recovery feature enabled, if ublk chardev is ready to be released
and quiesce_work has been scheduled, we:
(1) reinit all ubqs, including:
(a) put the task_struct and reset ->ubq_daemon to NULL.
(b) reset all ublk_io.
(2) reset ub->mm to NULL.
Then ublk chardev is released and new process can open it.

RECOVER_DEV is introduced as a new ctrl-cmd for recovery feature.
After the chardev is opened and all ubqs are ready, user should send
RECOVER_DEV to:
(1) wait until all new ubq_daemons getting ready.
(2) update ublksrv_pid
(3) unquiesce the request queue and expect incoming ublk_queue_rq()
(4) convert ub's state to UBLK_S_DEV_LIVE
(5) reschedule monitor_work

Signed-off-by: ZiyangZhang <[email protected]>
---
drivers/block/ublk_drv.c | 67 +++++++++++++++++++++++++++++++++++
include/uapi/linux/ublk_cmd.h | 1 +
2 files changed, 68 insertions(+)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 3977869d2bc4..ac8bf497567f 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -898,10 +898,40 @@ static int ublk_ch_open(struct inode *inode, struct file *filp)
return 0;
}

+static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq)
+{
+ int i;
+
+ /* old daemon is PF_EXITING, put it now */
+ if (ubq->ubq_daemon) {
+ put_task_struct(ubq->ubq_daemon);
+ /* We have to reset it to NULL, otherwise ub won't accept new FETCH_REQ */
+ ubq->ubq_daemon = NULL;
+ }
+
+ for (i = 0; i < ubq->q_depth; i++) {
+ struct ublk_io *io = &ubq->ios[i];
+
+ /* forget everything now and be ready for new FETCH_REQ */
+ io->flags = 0;
+ io->cmd = NULL;
+ io->addr = 0;
+ }
+ ubq->nr_io_ready = 0;
+}
+
static int ublk_ch_release(struct inode *inode, struct file *filp)
{
struct ublk_device *ub = filp->private_data;
+ int i;

+ pr_devel("%s: reinit queues for dev id %d.\n", __func__, ub->dev_info.dev_id);
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
+ ublk_queue_reinit(ub, ublk_get_queue(ub, i));
+ /* set to NULL, otherwise new ubq_daemon cannot mmap the io_cmd_buf */
+ ub->mm = NULL;
+ ub->nr_queues_ready = 0;
+ init_completion(&ub->completion);
clear_bit(UB_STATE_OPEN, &ub->state);
return 0;
}
@@ -1873,6 +1903,40 @@ static int ublk_ctrl_set_params(struct io_uring_cmd *cmd)
return ret;
}

+static int ublk_ctrl_recover_dev(struct io_uring_cmd *cmd)
+{
+ struct ublksrv_ctrl_cmd *header = (struct ublksrv_ctrl_cmd *)cmd->cmd;
+ int ublksrv_pid = (int)header->data[0];
+ struct ublk_device *ub;
+ int ret = -EINVAL;
+
+ ub = ublk_get_device_from_id(header->dev_id);
+ if (!ub)
+ return ret;
+
+ /* wait until new ubq_daemon sending all FETCH_REQ */
+ wait_for_completion_interruptible(&ub->completion);
+
+ mutex_lock(&ub->mutex);
+ if (!ublk_can_use_recovery(ub))
+ goto out_unlock;
+
+ if (ub->dev_info.state != UBLK_S_DEV_QUIESCED) {
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+ ub->dev_info.ublksrv_pid = ublksrv_pid;
+ blk_mq_unquiesce_queue(ub->ub_disk->queue);
+ blk_mq_kick_requeue_list(ub->ub_disk->queue);
+ ub->dev_info.state = UBLK_S_DEV_LIVE;
+ schedule_delayed_work(&ub->monitor_work, UBLK_DAEMON_MONITOR_PERIOD);
+ ret = 0;
+ out_unlock:
+ mutex_unlock(&ub->mutex);
+ ublk_put_device(ub);
+ return ret;
+}
+
static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
unsigned int issue_flags)
{
@@ -1914,6 +1978,9 @@ static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
case UBLK_CMD_SET_PARAMS:
ret = ublk_ctrl_set_params(cmd);
break;
+ case UBLK_CMD_RECOVER_DEV:
+ ret = ublk_ctrl_recover_dev(cmd);
+ break;
default:
break;
}
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index 332370628757..deb674f1fbc9 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -17,6 +17,7 @@
#define UBLK_CMD_STOP_DEV 0x07
#define UBLK_CMD_SET_PARAMS 0x08
#define UBLK_CMD_GET_PARAMS 0x09
+#define UBLK_CMD_RECOVER_DEV 0x10

/*
* IO commands, issued by ublk server, and handled by ublk driver.
--
2.27.0

2022-09-22 06:27:56

by Ziyang Zhang

[permalink] [raw]
Subject: [PATCH V5 3/7] ublk_drv: requeue rqs with recovery feature enabled

With recovery feature enabled, in ublk_queue_rq or task work
(in exit_task_work or fallback wq), we requeue rqs instead of
ending(aborting) them. Besides, No matter recovery feature is enabled
or disabled, we schedule monitor_work immediately.

Signed-off-by: ZiyangZhang <[email protected]>
---
drivers/block/ublk_drv.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 05bfbaa49696..a1cbcc5e9285 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -655,6 +655,16 @@ static void ubq_complete_io_cmd(struct ublk_io *io, int res)

#define UBLK_REQUEUE_DELAY_MS 3

+static inline void __ublk_abort_rq_in_task_work(struct ublk_queue *ubq,
+ struct request *rq)
+{
+ /* We cannot process this rq so just requeue it. */
+ if (ublk_queue_can_use_recovery(ubq))
+ blk_mq_requeue_request(rq, false);
+ else
+ blk_mq_end_request(rq, BLK_STS_IOERR);
+}
+
static inline void __ublk_rq_task_work(struct request *req)
{
struct ublk_queue *ubq = req->mq_hctx->driver_data;
@@ -677,7 +687,7 @@ static inline void __ublk_rq_task_work(struct request *req)
* (2) current->flags & PF_EXITING.
*/
if (unlikely(current != ubq->ubq_daemon || current->flags & PF_EXITING)) {
- blk_mq_end_request(req, BLK_STS_IOERR);
+ __ublk_abort_rq_in_task_work(ubq, req);
mod_delayed_work(system_wq, &ub->monitor_work, 0);
return;
}
@@ -752,6 +762,17 @@ static void ublk_rq_task_work_fn(struct callback_head *work)
__ublk_rq_task_work(req);
}

+static inline blk_status_t __ublk_abort_rq(struct ublk_queue *ubq,
+ struct request *rq)
+{
+ /* We cannot process this rq so just requeue it. */
+ if (ublk_queue_can_use_recovery(ubq)) {
+ blk_mq_requeue_request(rq, false);
+ return BLK_STS_OK;
+ }
+ return BLK_STS_IOERR;
+}
+
static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
const struct blk_mq_queue_data *bd)
{
@@ -769,7 +790,7 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
if (unlikely(ubq_daemon_is_dying(ubq))) {
fail:
mod_delayed_work(system_wq, &ubq->dev->monitor_work, 0);
- return BLK_STS_IOERR;
+ return __ublk_abort_rq(ubq, rq);
}

if (ublk_can_use_task_work(ubq)) {
--
2.27.0

2022-09-22 06:30:35

by Ziyang Zhang

[permalink] [raw]
Subject: [PATCH V5 5/7] ublk_drv: support UBLK_F_USER_RECOVERY_REISSUE

UBLK_F_USER_RECOVERY_REISSUE implies that:
With a dying ubq_daemon, ublk_drv let monitor_work requeues rq issued to
userspace(ublksrv) before the ubq_daemon is dying.

UBLK_F_USER_RECOVERY_REISSUE is designed for backends which:
(1) tolerate double-write since ublk_drv may issue the same rq
twice.
(2) does not let frontend users get I/O error, such as read-only FS
and VM backend.

Signed-off-by: ZiyangZhang <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
---
drivers/block/ublk_drv.c | 22 ++++++++++++++++++----
include/uapi/linux/ublk_cmd.h | 2 ++
2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 888ad16c5dd8..3977869d2bc4 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -50,7 +50,8 @@
#define UBLK_F_ALL (UBLK_F_SUPPORT_ZERO_COPY \
| UBLK_F_URING_CMD_COMP_IN_TASK \
| UBLK_F_NEED_GET_DATA \
- | UBLK_F_USER_RECOVERY)
+ | UBLK_F_USER_RECOVERY \
+ | UBLK_F_USER_RECOVERY_REISSUE)

/* All UBLK_PARAM_TYPE_* should be included here */
#define UBLK_PARAM_TYPE_ALL (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD)
@@ -325,6 +326,15 @@ static inline int ublk_queue_cmd_buf_size(struct ublk_device *ub, int q_id)
PAGE_SIZE);
}

+static inline bool ublk_queue_can_use_recovery_reissue(
+ struct ublk_queue *ubq)
+{
+ if ((ubq->flags & UBLK_F_USER_RECOVERY) &&
+ (ubq->flags & UBLK_F_USER_RECOVERY_REISSUE))
+ return true;
+ return false;
+}
+
static inline bool ublk_queue_can_use_recovery(
struct ublk_queue *ubq)
{
@@ -629,13 +639,17 @@ static void ublk_complete_rq(struct request *req)
* Also aborting may not be started yet, keep in mind that one failed
* request may be issued by block layer again.
*/
-static void __ublk_fail_req(struct ublk_io *io, struct request *req)
+static void __ublk_fail_req(struct ublk_queue *ubq, struct ublk_io *io,
+ struct request *req)
{
WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);

if (!(io->flags & UBLK_IO_FLAG_ABORTED)) {
io->flags |= UBLK_IO_FLAG_ABORTED;
- blk_mq_end_request(req, BLK_STS_IOERR);
+ if (ublk_queue_can_use_recovery_reissue(ubq))
+ blk_mq_requeue_request(req, false);
+ else
+ blk_mq_end_request(req, BLK_STS_IOERR);
}
}

@@ -973,7 +987,7 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
*/
rq = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], i);
if (rq)
- __ublk_fail_req(io, rq);
+ __ublk_fail_req(ubq, io, rq);
}
}
ublk_put_device(ub);
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index 340ff14bde49..332370628757 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -76,6 +76,8 @@

#define UBLK_F_USER_RECOVERY (1UL << 3)

+#define UBLK_F_USER_RECOVERY_REISSUE (1UL << 4)
+
/* device state */
#define UBLK_S_DEV_DEAD 0
#define UBLK_S_DEV_LIVE 1
--
2.27.0

2022-09-22 06:39:36

by Ziyang Zhang

[permalink] [raw]
Subject: [PATCH V5 2/7] ublk_drv: define macros for recovery feature and check them

Define some macros for recovery feature.

UBLK_S_DEV_QUIESCED implies that ublk_device is quiesced
and is ready for recovery. This state can be observed by userspace.

UBLK_F_USER_RECOVERY implies that:
(1) ublk_drv enables recovery feature. It won't let monitor_work to
automatically abort rqs and release the device.
(2) With a dying ubq_daemon, ublk_drv ends(aborts) rqs issued to
userspace(ublksrv) before crash.
(3) With a dying ubq_daemon, in task work and ublk_queue_rq(),
ublk_drv requeues rqs.

Signed-off-by: ZiyangZhang <[email protected]>
---
drivers/block/ublk_drv.c | 18 +++++++++++++++++-
include/uapi/linux/ublk_cmd.h | 3 +++
2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index c39b67d7133d..05bfbaa49696 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -49,7 +49,8 @@
/* All UBLK_F_* have to be included into UBLK_F_ALL */
#define UBLK_F_ALL (UBLK_F_SUPPORT_ZERO_COPY \
| UBLK_F_URING_CMD_COMP_IN_TASK \
- | UBLK_F_NEED_GET_DATA)
+ | UBLK_F_NEED_GET_DATA \
+ | UBLK_F_USER_RECOVERY)

/* All UBLK_PARAM_TYPE_* should be included here */
#define UBLK_PARAM_TYPE_ALL (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD)
@@ -323,6 +324,21 @@ static inline int ublk_queue_cmd_buf_size(struct ublk_device *ub, int q_id)
PAGE_SIZE);
}

+static inline bool ublk_queue_can_use_recovery(
+ struct ublk_queue *ubq)
+{
+ if (ubq->flags & UBLK_F_USER_RECOVERY)
+ return true;
+ return false;
+}
+
+static inline bool ublk_can_use_recovery(struct ublk_device *ub)
+{
+ if (ub->dev_info.flags & UBLK_F_USER_RECOVERY)
+ return true;
+ return false;
+}
+
static void ublk_free_disk(struct gendisk *disk)
{
struct ublk_device *ub = disk->private_data;
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index 677edaab2b66..340ff14bde49 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -74,9 +74,12 @@
*/
#define UBLK_F_NEED_GET_DATA (1UL << 2)

+#define UBLK_F_USER_RECOVERY (1UL << 3)
+
/* device state */
#define UBLK_S_DEV_DEAD 0
#define UBLK_S_DEV_LIVE 1
+#define UBLK_S_DEV_QUIESCED 2

/* shipped via sqe->cmd of io_uring command */
struct ublksrv_ctrl_cmd {
--
2.27.0

2022-09-23 02:41:02

by Ziyang Zhang

[permalink] [raw]
Subject: Re: [PATCH V5 6/7] ublk_drv: allow new process to open ublk chardev with recovery feature enabled

On 2022/9/22 14:17, ZiyangZhang wrote:
> With recovery feature enabled, if ublk chardev is ready to be released
> and quiesce_work has been scheduled, we:
> (1) reinit all ubqs, including:
> (a) put the task_struct and reset ->ubq_daemon to NULL.
> (b) reset all ublk_io.
> (2) reset ub->mm to NULL.
> Then ublk chardev is released and new process can open it.
>
> RECOVER_DEV is introduced as a new ctrl-cmd for recovery feature.
> After the chardev is opened and all ubqs are ready, user should send
> RECOVER_DEV to:
> (1) wait until all new ubq_daemons getting ready.
> (2) update ublksrv_pid
> (3) unquiesce the request queue and expect incoming ublk_queue_rq()
> (4) convert ub's state to UBLK_S_DEV_LIVE
> (5) reschedule monitor_work
>
> Signed-off-by: ZiyangZhang <[email protected]>
> ---
> drivers/block/ublk_drv.c | 67 +++++++++++++++++++++++++++++++++++
> include/uapi/linux/ublk_cmd.h | 1 +
> 2 files changed, 68 insertions(+)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 3977869d2bc4..ac8bf497567f 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -898,10 +898,40 @@ static int ublk_ch_open(struct inode *inode, struct file *filp)
> return 0;
> }
>
> +static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq)
> +{
> + int i;
> +
> + /* old daemon is PF_EXITING, put it now */
> + if (ubq->ubq_daemon) {
> + put_task_struct(ubq->ubq_daemon);
> + /* We have to reset it to NULL, otherwise ub won't accept new FETCH_REQ */
> + ubq->ubq_daemon = NULL;
> + }
> +
> + for (i = 0; i < ubq->q_depth; i++) {
> + struct ublk_io *io = &ubq->ios[i];
> +
> + /* forget everything now and be ready for new FETCH_REQ */
> + io->flags = 0;
> + io->cmd = NULL;
> + io->addr = 0;
> + }
> + ubq->nr_io_ready = 0;
> +}
> +
> static int ublk_ch_release(struct inode *inode, struct file *filp)
> {
> struct ublk_device *ub = filp->private_data;
> + int i;
>
> + pr_devel("%s: reinit queues for dev id %d.\n", __func__, ub->dev_info.dev_id);
> + for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
> + ublk_queue_reinit(ub, ublk_get_queue(ub, i));
> + /* set to NULL, otherwise new ubq_daemon cannot mmap the io_cmd_buf */
> + ub->mm = NULL;
> + ub->nr_queues_ready = 0;
> + init_completion(&ub->completion);
> clear_bit(UB_STATE_OPEN, &ub->state);
> return 0;
> }

This one is not correct. I will figure out how to correctly
put_task_struct(ubq->ubq_daemon) with recovery feature enabled.

2022-09-23 03:20:08

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH V5 2/7] ublk_drv: define macros for recovery feature and check them

On Thu, Sep 22, 2022 at 02:17:29PM +0800, ZiyangZhang wrote:
> Define some macros for recovery feature.
>
> UBLK_S_DEV_QUIESCED implies that ublk_device is quiesced
> and is ready for recovery. This state can be observed by userspace.
>
> UBLK_F_USER_RECOVERY implies that:
> (1) ublk_drv enables recovery feature. It won't let monitor_work to
> automatically abort rqs and release the device.
> (2) With a dying ubq_daemon, ublk_drv ends(aborts) rqs issued to
> userspace(ublksrv) before crash.
> (3) With a dying ubq_daemon, in task work and ublk_queue_rq(),
> ublk_drv requeues rqs.
>
> Signed-off-by: ZiyangZhang <[email protected]>

Reviewed-by: Ming Lei <[email protected]>


Thanks,
Ming

2022-09-23 04:08:27

by Ming Lei

[permalink] [raw]
Subject: Re: [PATCH V5 3/7] ublk_drv: requeue rqs with recovery feature enabled

On Thu, Sep 22, 2022 at 02:17:30PM +0800, ZiyangZhang wrote:
> With recovery feature enabled, in ublk_queue_rq or task work
> (in exit_task_work or fallback wq), we requeue rqs instead of
> ending(aborting) them. Besides, No matter recovery feature is enabled
> or disabled, we schedule monitor_work immediately.
>
> Signed-off-by: ZiyangZhang <[email protected]>
> ---
> drivers/block/ublk_drv.c | 25 +++++++++++++++++++++++--
> 1 file changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 05bfbaa49696..a1cbcc5e9285 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -655,6 +655,16 @@ static void ubq_complete_io_cmd(struct ublk_io *io, int res)
>
> #define UBLK_REQUEUE_DELAY_MS 3
>
> +static inline void __ublk_abort_rq_in_task_work(struct ublk_queue *ubq,
> + struct request *rq)
> +{
> + /* We cannot process this rq so just requeue it. */
> + if (ublk_queue_can_use_recovery(ubq))
> + blk_mq_requeue_request(rq, false);
> + else
> + blk_mq_end_request(rq, BLK_STS_IOERR);
> +}

__ublk_abort_rq_in_task_work() can be renamed as __ublk_abort_rq(), then
be reused for the user of ublk_queue_rq().

> +
> static inline void __ublk_rq_task_work(struct request *req)
> {
> struct ublk_queue *ubq = req->mq_hctx->driver_data;
> @@ -677,7 +687,7 @@ static inline void __ublk_rq_task_work(struct request *req)
> * (2) current->flags & PF_EXITING.
> */
> if (unlikely(current != ubq->ubq_daemon || current->flags & PF_EXITING)) {
> - blk_mq_end_request(req, BLK_STS_IOERR);
> + __ublk_abort_rq_in_task_work(ubq, req);
> mod_delayed_work(system_wq, &ub->monitor_work, 0);
> return;
> }
> @@ -752,6 +762,17 @@ static void ublk_rq_task_work_fn(struct callback_head *work)
> __ublk_rq_task_work(req);
> }
>
> +static inline blk_status_t __ublk_abort_rq(struct ublk_queue *ubq,
> + struct request *rq)
> +{
> + /* We cannot process this rq so just requeue it. */
> + if (ublk_queue_can_use_recovery(ubq)) {
> + blk_mq_requeue_request(rq, false);
> + return BLK_STS_OK;
> + }
> + return BLK_STS_IOERR;
> +}

The above helper isn't needed.

> +
> static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> const struct blk_mq_queue_data *bd)
> {
> @@ -769,7 +790,7 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> if (unlikely(ubq_daemon_is_dying(ubq))) {
> fail:
> mod_delayed_work(system_wq, &ubq->dev->monitor_work, 0);
> - return BLK_STS_IOERR;
> + return __ublk_abort_rq(ubq, rq);

Then you can call __ublk_abort_rq(), and return BLK_STS_OK.


thanks.
Ming