2020-06-07 15:36:16

by Pavel Begunkov

[permalink] [raw]
Subject: [PATCH 0/4] cancel all reqs of an exiting task

io_uring_flush() {
...
if (fatal_signal_pending(current) || (current->flags & PF_EXITING))
io_wq_cancel_pid(ctx->io_wq, task_pid_vnr(current));
}

This cancels only the first matched request. The pathset is mainly
about fixing that. [1,2] are preps, [3/4] is the fix.

The [4/4] tries to improve the worst case for io_uring_cancel_files(),
that's when they are a lot of inflights with ->files. Instead of doing
{kill(); wait();} one by one, it cancels all of them at once.

Pavel Begunkov (4):
io-wq: reorder cancellation pending -> running
io-wq: add an option to cancel all matched reqs
io_uring: cancel all task's requests on exit
io_uring: batch cancel in io_uring_cancel_files()

fs/io-wq.c | 108 ++++++++++++++++++++++++++------------------------
fs/io-wq.h | 3 +-
fs/io_uring.c | 29 ++++++++++++--
3 files changed, 83 insertions(+), 57 deletions(-)

--
2.24.0


2020-06-07 15:36:20

by Pavel Begunkov

[permalink] [raw]
Subject: [liburing PATCH] flush/test: test flush of dying process

Make sure all requests of a going away process are cancelled.

Signed-off-by: Pavel Begunkov <[email protected]>
---
test/io-cancel.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 102 insertions(+)

diff --git a/test/io-cancel.c b/test/io-cancel.c
index 75fbf43..d8b04d9 100644
--- a/test/io-cancel.c
+++ b/test/io-cancel.c
@@ -10,6 +10,8 @@
#include <fcntl.h>
#include <sys/types.h>
#include <sys/time.h>
+#include <sys/mman.h>
+#include <sys/wait.h>

#include "liburing.h"

@@ -17,6 +19,8 @@
#define BS 4096
#define BUFFERS (FILE_SIZE / BS)

+#define NR_REQS 10
+
static struct iovec *vecs;

static int create_buffers(void)
@@ -228,10 +232,108 @@ err:
return 1;
}

+static void submit_child(struct io_uring *ring, int fds[2])
+{
+ struct io_uring_sqe *sqe;
+ int ret, i;
+ char buffer[128];
+
+ for (i = 0; i < NR_REQS; ++i) {
+ sqe = io_uring_get_sqe(ring);
+ if (!sqe) {
+ fprintf(stderr, "get sqe failed\n");
+ goto err;
+ }
+ io_uring_prep_read(sqe, fds[0], buffer, sizeof(buffer), 0);
+ sqe->flags |= IOSQE_ASYNC;
+ }
+
+ ret = io_uring_submit(ring);
+ if (ret != NR_REQS) {
+ printf("sqe submit failed: %d\n", ret);
+ goto err;
+ }
+
+ exit(0);
+err:
+ exit(-1);
+}
+
+/* make sure requests of a going away task are cancelled */
+static int test_cancel_exiting_task(void)
+{
+ struct io_uring *ring;
+ int ret, i;
+ pid_t p;
+ int fds[2];
+
+ ring = mmap(0, sizeof(*ring), PROT_READ|PROT_WRITE,
+ MAP_SHARED | MAP_ANONYMOUS, 0, 0);
+ if (!ring) {
+ fprintf(stderr, "mmap failed\n");
+ return 1;
+ }
+
+ ret = io_uring_queue_init(NR_REQS * 2, ring, 0);
+ if (ret < 0) {
+ fprintf(stderr, "queue init failed\n");
+ return 1;
+ }
+
+ if (pipe(fds)) {
+ perror("pipe() failed");
+ exit(1);
+ }
+
+ p = fork();
+ if (p < 0) {
+ printf("fork() failed\n");
+ return 1;
+ }
+
+ if (p == 0) {
+ /* child */
+ submit_child(ring, fds);
+ } else {
+ int wstatus;
+ struct io_uring_cqe *cqe;
+
+ waitpid(p, &wstatus, 0);
+ if (!WIFEXITED(wstatus) || WEXITSTATUS(wstatus) != 0) {
+ fprintf(stderr, "child failed\n");
+ return 1;
+ }
+
+ for (i = 0; i < NR_REQS; ++i) {
+ ret = io_uring_wait_cqe(ring, &cqe);
+ if (ret < 0) {
+ printf("wait_cqes: wait completion %d\n", ret);
+ return 1;
+ }
+ if (cqe->res != -ECANCELED && cqe->res != -EINTR) {
+ fprintf(stderr, "invalid CQE: %i\n", cqe->res);
+ return 1;
+ }
+ io_uring_cqe_seen(ring, cqe);
+ }
+ }
+
+ close(fds[0]);
+ close(fds[1]);
+ io_uring_queue_exit(ring);
+ munmap(ring, sizeof(*ring));
+ return 0;
+}
+
int main(int argc, char *argv[])
{
int i, ret;

+ if (test_cancel_exiting_task()) {
+ fprintf(stderr, "test_cancel_exiting_task failed\n");
+ return 1;
+ }
+
if (create_file(".basic-rw")) {
fprintf(stderr, "file creation failed\n");
goto err;
--
2.24.0

2020-06-07 15:36:22

by Pavel Begunkov

[permalink] [raw]
Subject: [PATCH 4/4] io_uring: batch cancel in io_uring_cancel_files()

Instead of waiting for each request one by one, first try to cancel all
of them in a batched manner, and then go over inflight_list/etc to reap
leftovers.

Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index bcfb3b14b888..3aebbf96c123 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7519,9 +7519,22 @@ static int io_uring_release(struct inode *inode, struct file *file)
return 0;
}

+static bool io_wq_files_match(struct io_wq_work *work, void *data)
+{
+ struct files_struct *files = data;
+
+ return work->files == files;
+}
+
static void io_uring_cancel_files(struct io_ring_ctx *ctx,
struct files_struct *files)
{
+ if (list_empty_careful(&ctx->inflight_list))
+ return;
+
+ /* cancel all at once, should be faster than doing it one by one*/
+ io_wq_cancel_cb(ctx->io_wq, io_wq_files_match, files, true);
+
while (!list_empty_careful(&ctx->inflight_list)) {
struct io_kiocb *cancel_req = NULL, *req;
DEFINE_WAIT(wait);
--
2.24.0

2020-06-07 15:36:33

by Pavel Begunkov

[permalink] [raw]
Subject: [PATCH 3/4] io_uring: cancel all task's requests on exit

If a process is going away, io_uring_flush() will cancel only 1
request with a matching pid. Cancel all of them

Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io-wq.c | 14 --------------
fs/io-wq.h | 1 -
fs/io_uring.c | 14 ++++++++++++--
3 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index 6d2e8ccc229e..2bfa9117bc28 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -1022,20 +1022,6 @@ enum io_wq_cancel io_wq_cancel_work(struct io_wq *wq, struct io_wq_work *cwork)
return io_wq_cancel_cb(wq, io_wq_io_cb_cancel_data, (void *)cwork, false);
}

-static bool io_wq_pid_match(struct io_wq_work *work, void *data)
-{
- pid_t pid = (pid_t) (unsigned long) data;
-
- return work->task_pid == pid;
-}
-
-enum io_wq_cancel io_wq_cancel_pid(struct io_wq *wq, pid_t pid)
-{
- void *data = (void *) (unsigned long) pid;
-
- return io_wq_cancel_cb(wq, io_wq_pid_match, data, false);
-}
-
struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
{
int ret = -ENOMEM, node;
diff --git a/fs/io-wq.h b/fs/io-wq.h
index 8902903831f2..df8a4cd3236d 100644
--- a/fs/io-wq.h
+++ b/fs/io-wq.h
@@ -129,7 +129,6 @@ static inline bool io_wq_is_hashed(struct io_wq_work *work)

void io_wq_cancel_all(struct io_wq *wq);
enum io_wq_cancel io_wq_cancel_work(struct io_wq *wq, struct io_wq_work *cwork);
-enum io_wq_cancel io_wq_cancel_pid(struct io_wq *wq, pid_t pid);

typedef bool (work_cancel_fn)(struct io_wq_work *, void *);

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8b0c9a5bcec1..bcfb3b14b888 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7577,6 +7577,13 @@ static void io_uring_cancel_files(struct io_ring_ctx *ctx,
}
}

+static bool io_cancel_pid_cb(struct io_wq_work *work, void *data)
+{
+ pid_t pid = (pid_t) (unsigned long) data;
+
+ return work->task_pid == pid;
+}
+
static int io_uring_flush(struct file *file, void *data)
{
struct io_ring_ctx *ctx = file->private_data;
@@ -7586,8 +7593,11 @@ static int io_uring_flush(struct file *file, void *data)
/*
* If the task is going away, cancel work it may have pending
*/
- if (fatal_signal_pending(current) || (current->flags & PF_EXITING))
- io_wq_cancel_pid(ctx->io_wq, task_pid_vnr(current));
+ if (fatal_signal_pending(current) || (current->flags & PF_EXITING)) {
+ void *data = (void *) (unsigned long)task_pid_vnr(current);
+
+ io_wq_cancel_cb(ctx->io_wq, io_cancel_pid_cb, data, true);
+ }

return 0;
}
--
2.24.0

2020-06-07 15:37:36

by Pavel Begunkov

[permalink] [raw]
Subject: [PATCH 1/4] io-wq: reorder cancellation pending -> running

Go all over all pending lists and cancel works there, and only then
try to match running requests. No functional changes here, just a
preparation for bulk cancellation.

Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io-wq.c | 54 ++++++++++++++++++++++++++++++++----------------------
1 file changed, 32 insertions(+), 22 deletions(-)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index 4023c9846860..3283f8c5b5a1 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -931,19 +931,14 @@ static bool io_wq_worker_cancel(struct io_worker *worker, void *data)
return ret;
}

-static enum io_wq_cancel io_wqe_cancel_work(struct io_wqe *wqe,
- struct io_cb_cancel_data *match)
+static bool io_wqe_cancel_pending_work(struct io_wqe *wqe,
+ struct io_cb_cancel_data *match)
{
struct io_wq_work_node *node, *prev;
struct io_wq_work *work;
unsigned long flags;
bool found = false;

- /*
- * First check pending list, if we're lucky we can just remove it
- * from there. CANCEL_OK means that the work is returned as-new,
- * no completion will be posted for it.
- */
spin_lock_irqsave(&wqe->lock, flags);
wq_list_for_each(node, prev, &wqe->work_list) {
work = container_of(node, struct io_wq_work, list);
@@ -956,21 +951,20 @@ static enum io_wq_cancel io_wqe_cancel_work(struct io_wqe *wqe,
}
spin_unlock_irqrestore(&wqe->lock, flags);

- if (found) {
+ if (found)
io_run_cancel(work, wqe);
- return IO_WQ_CANCEL_OK;
- }
+ return found;
+}
+
+static bool io_wqe_cancel_running_work(struct io_wqe *wqe,
+ struct io_cb_cancel_data *match)
+{
+ bool found;

- /*
- * Now check if a free (going busy) or busy worker has the work
- * currently running. If we find it there, we'll return CANCEL_RUNNING
- * as an indication that we attempt to signal cancellation. The
- * completion will run normally in this case.
- */
rcu_read_lock();
found = io_wq_for_each_worker(wqe, io_wq_worker_cancel, match);
rcu_read_unlock();
- return found ? IO_WQ_CANCEL_RUNNING : IO_WQ_CANCEL_NOTFOUND;
+ return found;
}

enum io_wq_cancel io_wq_cancel_cb(struct io_wq *wq, work_cancel_fn *cancel,
@@ -980,18 +974,34 @@ enum io_wq_cancel io_wq_cancel_cb(struct io_wq *wq, work_cancel_fn *cancel,
.fn = cancel,
.data = data,
};
- enum io_wq_cancel ret = IO_WQ_CANCEL_NOTFOUND;
int node;

+ /*
+ * First check pending list, if we're lucky we can just remove it
+ * from there. CANCEL_OK means that the work is returned as-new,
+ * no completion will be posted for it.
+ */
for_each_node(node) {
struct io_wqe *wqe = wq->wqes[node];

- ret = io_wqe_cancel_work(wqe, &match);
- if (ret != IO_WQ_CANCEL_NOTFOUND)
- break;
+ if (io_wqe_cancel_pending_work(wqe, &match))
+ return IO_WQ_CANCEL_OK;
}

- return ret;
+ /*
+ * Now check if a free (going busy) or busy worker has the work
+ * currently running. If we find it there, we'll return CANCEL_RUNNING
+ * as an indication that we attempt to signal cancellation. The
+ * completion will run normally in this case.
+ */
+ for_each_node(node) {
+ struct io_wqe *wqe = wq->wqes[node];
+
+ if (io_wqe_cancel_running_work(wqe, &match))
+ return IO_WQ_CANCEL_RUNNING;
+ }
+
+ return IO_WQ_CANCEL_NOTFOUND;
}

static bool io_wq_io_cb_cancel_data(struct io_wq_work *work, void *data)
--
2.24.0

2020-06-07 15:40:27

by Pavel Begunkov

[permalink] [raw]
Subject: [PATCH 2/4] io-wq: add an option to cancel all matched reqs

This adds support for cancelling all io-wq works matching a predicate.
It isn't used yet, so no change in observable behaviour.

Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io-wq.c | 60 +++++++++++++++++++++++++++++----------------------
fs/io-wq.h | 2 +-
fs/io_uring.c | 2 +-
3 files changed, 36 insertions(+), 28 deletions(-)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index 3283f8c5b5a1..6d2e8ccc229e 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -907,13 +907,15 @@ void io_wq_cancel_all(struct io_wq *wq)
struct io_cb_cancel_data {
work_cancel_fn *fn;
void *data;
+ int nr_running;
+ int nr_pending;
+ bool cancel_all;
};

static bool io_wq_worker_cancel(struct io_worker *worker, void *data)
{
struct io_cb_cancel_data *match = data;
unsigned long flags;
- bool ret = false;

/*
* Hold the lock to avoid ->cur_work going out of scope, caller
@@ -924,55 +926,55 @@ static bool io_wq_worker_cancel(struct io_worker *worker, void *data)
!(worker->cur_work->flags & IO_WQ_WORK_NO_CANCEL) &&
match->fn(worker->cur_work, match->data)) {
send_sig(SIGINT, worker->task, 1);
- ret = true;
+ match->nr_running++;
}
spin_unlock_irqrestore(&worker->lock, flags);

- return ret;
+ return match->nr_running && !match->cancel_all;
}

-static bool io_wqe_cancel_pending_work(struct io_wqe *wqe,
+static void io_wqe_cancel_pending_work(struct io_wqe *wqe,
struct io_cb_cancel_data *match)
{
struct io_wq_work_node *node, *prev;
struct io_wq_work *work;
unsigned long flags;
- bool found = false;

+retry:
spin_lock_irqsave(&wqe->lock, flags);
wq_list_for_each(node, prev, &wqe->work_list) {
work = container_of(node, struct io_wq_work, list);
+ if (!match->fn(work, match->data))
+ continue;

- if (match->fn(work, match->data)) {
- wq_list_del(&wqe->work_list, node, prev);
- found = true;
- break;
- }
+ wq_list_del(&wqe->work_list, node, prev);
+ spin_unlock_irqrestore(&wqe->lock, flags);
+ io_run_cancel(work, wqe);
+ match->nr_pending++;
+ if (!match->cancel_all)
+ return;
+
+ /* not safe to continue after unlock */
+ goto retry;
}
spin_unlock_irqrestore(&wqe->lock, flags);
-
- if (found)
- io_run_cancel(work, wqe);
- return found;
}

-static bool io_wqe_cancel_running_work(struct io_wqe *wqe,
+static void io_wqe_cancel_running_work(struct io_wqe *wqe,
struct io_cb_cancel_data *match)
{
- bool found;
-
rcu_read_lock();
- found = io_wq_for_each_worker(wqe, io_wq_worker_cancel, match);
+ io_wq_for_each_worker(wqe, io_wq_worker_cancel, match);
rcu_read_unlock();
- return found;
}

enum io_wq_cancel io_wq_cancel_cb(struct io_wq *wq, work_cancel_fn *cancel,
- void *data)
+ void *data, bool cancel_all)
{
struct io_cb_cancel_data match = {
- .fn = cancel,
- .data = data,
+ .fn = cancel,
+ .data = data,
+ .cancel_all = cancel_all,
};
int node;

@@ -984,7 +986,8 @@ enum io_wq_cancel io_wq_cancel_cb(struct io_wq *wq, work_cancel_fn *cancel,
for_each_node(node) {
struct io_wqe *wqe = wq->wqes[node];

- if (io_wqe_cancel_pending_work(wqe, &match))
+ io_wqe_cancel_pending_work(wqe, &match);
+ if (match.nr_pending && !match.cancel_all)
return IO_WQ_CANCEL_OK;
}

@@ -997,10 +1000,15 @@ enum io_wq_cancel io_wq_cancel_cb(struct io_wq *wq, work_cancel_fn *cancel,
for_each_node(node) {
struct io_wqe *wqe = wq->wqes[node];

- if (io_wqe_cancel_running_work(wqe, &match))
+ io_wqe_cancel_running_work(wqe, &match);
+ if (match.nr_running && !match.cancel_all)
return IO_WQ_CANCEL_RUNNING;
}

+ if (match.nr_running)
+ return IO_WQ_CANCEL_RUNNING;
+ if (match.nr_pending)
+ return IO_WQ_CANCEL_OK;
return IO_WQ_CANCEL_NOTFOUND;
}

@@ -1011,7 +1019,7 @@ static bool io_wq_io_cb_cancel_data(struct io_wq_work *work, void *data)

enum io_wq_cancel io_wq_cancel_work(struct io_wq *wq, struct io_wq_work *cwork)
{
- return io_wq_cancel_cb(wq, io_wq_io_cb_cancel_data, (void *)cwork);
+ return io_wq_cancel_cb(wq, io_wq_io_cb_cancel_data, (void *)cwork, false);
}

static bool io_wq_pid_match(struct io_wq_work *work, void *data)
@@ -1025,7 +1033,7 @@ enum io_wq_cancel io_wq_cancel_pid(struct io_wq *wq, pid_t pid)
{
void *data = (void *) (unsigned long) pid;

- return io_wq_cancel_cb(wq, io_wq_pid_match, data);
+ return io_wq_cancel_cb(wq, io_wq_pid_match, data, false);
}

struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
diff --git a/fs/io-wq.h b/fs/io-wq.h
index 5ba12de7572f..8902903831f2 100644
--- a/fs/io-wq.h
+++ b/fs/io-wq.h
@@ -134,7 +134,7 @@ enum io_wq_cancel io_wq_cancel_pid(struct io_wq *wq, pid_t pid);
typedef bool (work_cancel_fn)(struct io_wq_work *, void *);

enum io_wq_cancel io_wq_cancel_cb(struct io_wq *wq, work_cancel_fn *cancel,
- void *data);
+ void *data, bool cancel_all);

struct task_struct *io_wq_get_task(struct io_wq *wq);

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6391a00ff8b7..8b0c9a5bcec1 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4884,7 +4884,7 @@ static int io_async_cancel_one(struct io_ring_ctx *ctx, void *sqe_addr)
enum io_wq_cancel cancel_ret;
int ret = 0;

- cancel_ret = io_wq_cancel_cb(ctx->io_wq, io_cancel_cb, sqe_addr);
+ cancel_ret = io_wq_cancel_cb(ctx->io_wq, io_cancel_cb, sqe_addr, false);
switch (cancel_ret) {
case IO_WQ_CANCEL_OK:
ret = 0;
--
2.24.0

2020-06-08 00:15:14

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 0/4] cancel all reqs of an exiting task

On 6/7/20 9:32 AM, Pavel Begunkov wrote:
> io_uring_flush() {
> ...
> if (fatal_signal_pending(current) || (current->flags & PF_EXITING))
> io_wq_cancel_pid(ctx->io_wq, task_pid_vnr(current));
> }
>
> This cancels only the first matched request. The pathset is mainly
> about fixing that. [1,2] are preps, [3/4] is the fix.
>
> The [4/4] tries to improve the worst case for io_uring_cancel_files(),
> that's when they are a lot of inflights with ->files. Instead of doing
> {kill(); wait();} one by one, it cancels all of them at once.
>
> Pavel Begunkov (4):
> io-wq: reorder cancellation pending -> running
> io-wq: add an option to cancel all matched reqs
> io_uring: cancel all task's requests on exit
> io_uring: batch cancel in io_uring_cancel_files()
>
> fs/io-wq.c | 108 ++++++++++++++++++++++++++------------------------
> fs/io-wq.h | 3 +-
> fs/io_uring.c | 29 ++++++++++++--
> 3 files changed, 83 insertions(+), 57 deletions(-)

Can you rebase this to include the changing of using ->task_pid to
->task instead? See:

https://lore.kernel.org/io-uring/[email protected]/T/#u

Might as well do it at the same time, imho, since the cancel-by-task is
being reworked anyway.

--
Jens Axboe

2020-06-08 07:39:15

by Pavel Begunkov

[permalink] [raw]
Subject: Re: [PATCH 0/4] cancel all reqs of an exiting task

On 08/06/2020 03:12, Jens Axboe wrote:
> On 6/7/20 9:32 AM, Pavel Begunkov wrote:
>> io_uring_flush() {
>> ...
>> if (fatal_signal_pending(current) || (current->flags & PF_EXITING))
>> io_wq_cancel_pid(ctx->io_wq, task_pid_vnr(current));
>> }
>>
>> This cancels only the first matched request. The pathset is mainly
>> about fixing that. [1,2] are preps, [3/4] is the fix.
>>
>> The [4/4] tries to improve the worst case for io_uring_cancel_files(),
>> that's when they are a lot of inflights with ->files. Instead of doing
>> {kill(); wait();} one by one, it cancels all of them at once.
>>
>> Pavel Begunkov (4):
>> io-wq: reorder cancellation pending -> running
>> io-wq: add an option to cancel all matched reqs
>> io_uring: cancel all task's requests on exit
>> io_uring: batch cancel in io_uring_cancel_files()
>>
>> fs/io-wq.c | 108 ++++++++++++++++++++++++++------------------------
>> fs/io-wq.h | 3 +-
>> fs/io_uring.c | 29 ++++++++++++--
>> 3 files changed, 83 insertions(+), 57 deletions(-)
>
> Can you rebase this to include the changing of using ->task_pid to
> ->task instead? See:
>
> https://lore.kernel.org/io-uring/[email protected]/T/#u
>
> Might as well do it at the same time, imho, since the cancel-by-task is
> being reworked anyway.

Ok, I was thinking to look there after anyway


--
Pavel Begunkov