I have been investigating a deadlock situation in my io_uring usage where
the SQPOLL thread was going to sleep and my user threads were waiting
inside io_uring_enter() for completions.
https://github.com/axboe/liburing/issues/367
This patch series is the result of my investigation.
Olivier Langlois (2):
io_uring: Fix race condition when sqp thread goes to sleep
io_uring: Create define to modify a SQPOLL parameter
fs/io_uring.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--
2.32.0
If an asynchronous completion happens before the task prepares
itself to wait and sets its state to TASK_INTERRUPTIBLE, the completion
will not wake up the sqp thread.
Signed-off-by: Olivier Langlois <[email protected]>
---
fs/io_uring.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index fc8637f591a6..7c545fa66f31 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6902,7 +6902,7 @@ static int io_sq_thread(void *data)
 		}
 
 		prepare_to_wait(&sqd->wait, &wait, TASK_INTERRUPTIBLE);
-		if (!io_sqd_events_pending(sqd)) {
+		if (!io_sqd_events_pending(sqd) && !io_run_task_work()) {
 			needs_sched = true;
 			list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
 				io_ring_set_wakeup_flag(ctx);
--
2.32.0
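For readers following the thread, the ordering problem fixed by patch 1/2 can be illustrated with a small single-threaded model in plain C. This is only a sketch under stated assumptions, not kernel code: `struct sqd_model`, `model_run_task_work()`, `would_sleep_old()` and `would_sleep_new()` are hypothetical names made up for this example, and the helper assumes that io_run_task_work() returns true when it found work to run.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct sqd_model {
	bool events_pending;	/* stands in for io_sqd_events_pending(sqd) */
	int queued_task_work;	/* completions queued before the state change */
	int completions_run;	/* completions actually processed */
};

/* Models the assumed contract of io_run_task_work(): true if work was run. */
static bool model_run_task_work(struct sqd_model *m)
{
	if (m->queued_task_work == 0)
		return false;
	m->completions_run += m->queued_task_work;
	m->queued_task_work = 0;
	return true;
}

/* Old check: only looks at pending events, so queued work is missed. */
static bool would_sleep_old(struct sqd_model *m)
{
	return !m->events_pending;
}

/* New check: also drains queued task work before deciding to sleep. */
static bool would_sleep_new(struct sqd_model *m)
{
	return !m->events_pending && !model_run_task_work(m);
}
```

With one completion queued and no pending events, would_sleep_old() reports that the thread would go to sleep with the completion still unprocessed, while would_sleep_new() runs it first and stays awake for another loop iteration.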
The magic number used to cap the number of entries extracted from an
io_uring instance SQ before moving to the other instances is an
interesting parameter to experiment with.
A define has been created to make it easy to change its value from a
single location.
Signed-off-by: Olivier Langlois <[email protected]>
---
fs/io_uring.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7c545fa66f31..e7997f9bf879 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -89,6 +89,7 @@
 
 #define IORING_MAX_ENTRIES	32768
 #define IORING_MAX_CQ_ENTRIES	(2 * IORING_MAX_ENTRIES)
+#define IORING_SQPOLL_CAP_ENTRIES_VALUE 8
 
 /*
  * Shift of 9 is 512 entries, or exactly one page on 64-bit archs
@@ -6797,8 +6798,8 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
 
 	to_submit = io_sqring_entries(ctx);
 	/* if we're handling multiple rings, cap submit size for fairness */
-	if (cap_entries && to_submit > 8)
-		to_submit = 8;
+	if (cap_entries && to_submit > IORING_SQPOLL_CAP_ENTRIES_VALUE)
+		to_submit = IORING_SQPOLL_CAP_ENTRIES_VALUE;
 
 	if (!list_empty(&ctx->iopoll_list) || to_submit) {
 		unsigned nr_events = 0;
--
2.32.0
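The capping step touched by patch 2/2 can be tried in isolation with the user-space sketch below. The define and the condition are copied from the patch; `cap_to_submit()` is a hypothetical helper introduced only for this illustration and does not exist in fs/io_uring.c.

```c
#include <stdbool.h>

/* Same name and value as the define added by the patch. */
#define IORING_SQPOLL_CAP_ENTRIES_VALUE 8

/*
 * Hypothetical helper mirroring the capping step in __io_sq_thread():
 * when several rings share the SQPOLL thread (cap_entries is true),
 * limit how many SQ entries are taken from one ring per pass.
 */
static unsigned int cap_to_submit(unsigned int to_submit, bool cap_entries)
{
	if (cap_entries && to_submit > IORING_SQPOLL_CAP_ENTRIES_VALUE)
		to_submit = IORING_SQPOLL_CAP_ENTRIES_VALUE;
	return to_submit;
}
```

Changing the define and rebuilding is enough to experiment with a different fairness cap; a ring with fewer pending entries than the cap, or a thread serving a single ring, is unaffected.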
On 6/23/21 7:50 PM, Olivier Langlois wrote:
> If an asynchronous completion happens before the task prepares
> itself to wait and sets its state to TASK_INTERRUPTIBLE, the completion
> will not wake up the sqp thread.
Looks good, the bug should be pretty old.
Cc: [email protected]
Reviewed-by: Pavel Begunkov <[email protected]>
> Signed-off-by: Olivier Langlois <[email protected]>
> ---
> fs/io_uring.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index fc8637f591a6..7c545fa66f31 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -6902,7 +6902,7 @@ static int io_sq_thread(void *data)
> }
>
> prepare_to_wait(&sqd->wait, &wait, TASK_INTERRUPTIBLE);
> - if (!io_sqd_events_pending(sqd)) {
> + if (!io_sqd_events_pending(sqd) && !io_run_task_work()) {
> needs_sched = true;
> list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
> io_ring_set_wakeup_flag(ctx);
>
--
Pavel Begunkov
On 6/23/21 7:50 PM, Olivier Langlois wrote:
> The magic number used to cap the number of entries extracted from an
> io_uring instance SQ before moving to the other instances is an
> interesting parameter to experiment with.
>
> A define has been created to make it easy to change its value from a
> single location.
It's better to send fixes separately from other improvements,
because the process is a bit different for them: they go into
different branches and so on.
Jens, any chance you can pick this up as is (at least 1/2)?
Reviewed-by: Pavel Begunkov <[email protected]>
> Signed-off-by: Olivier Langlois <[email protected]>
> ---
> fs/io_uring.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 7c545fa66f31..e7997f9bf879 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -89,6 +89,7 @@
>
> #define IORING_MAX_ENTRIES 32768
> #define IORING_MAX_CQ_ENTRIES (2 * IORING_MAX_ENTRIES)
> +#define IORING_SQPOLL_CAP_ENTRIES_VALUE 8
>
> /*
> * Shift of 9 is 512 entries, or exactly one page on 64-bit archs
> @@ -6797,8 +6798,8 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
>
> to_submit = io_sqring_entries(ctx);
> /* if we're handling multiple rings, cap submit size for fairness */
> - if (cap_entries && to_submit > 8)
> - to_submit = 8;
> + if (cap_entries && to_submit > IORING_SQPOLL_CAP_ENTRIES_VALUE)
> + to_submit = IORING_SQPOLL_CAP_ENTRIES_VALUE;
>
> if (!list_empty(&ctx->iopoll_list) || to_submit) {
> unsigned nr_events = 0;
>
--
Pavel Begunkov
On 6/23/21 4:14 PM, Pavel Begunkov wrote:
> On 6/23/21 7:50 PM, Olivier Langlois wrote:
>> The magic number used to cap the number of entries extracted from an
>> io_uring instance SQ before moving to the other instances is an
>> interesting parameter to experiment with.
>>
>> A define has been created to make it easy to change its value from a
>> single location.
>
> It's better to send fixes separately from other improvements,
> because the process is a bit different for them: they go into
> different branches and so on.
It's not a huge problem even if they go to different branches,
for these I'd be more comfortable doing 5.14 anyway and that
makes it even less of a concern.
--
Jens Axboe
On 6/23/21 12:50 PM, Olivier Langlois wrote:
> I have been investigating a deadlock situation in my io_uring usage where
> the SQPOLL thread was going to sleep and my user threads were waiting
> inside io_uring_enter() for completions.
>
> https://github.com/axboe/liburing/issues/367
Applied, thanks.
--
Jens Axboe
On 6/23/21 11:24 PM, Jens Axboe wrote:
> On 6/23/21 4:14 PM, Pavel Begunkov wrote:
>> On 6/23/21 7:50 PM, Olivier Langlois wrote:
>>> The magic number used to cap the number of entries extracted from an
>>> io_uring instance SQ before moving to the other instances is an
>>> interesting parameter to experiment with.
>>>
>>> A define has been created to make it easy to change its value from a
>>> single location.
>>
>> It's better to send fixes separately from other improvements,
>> because the process is a bit different for them: they go into
>> different branches and so on.
>
> It's not a huge problem even if they go to different branches,
> for these I'd be more comfortable doing 5.14 anyway and that
> makes it even less of a concern.
Ok, good to know. I have found splitting more convenient
as a default option: it is easier with b4, gives more confidence
that the patches apply to the right branch, and so on.
--
Pavel Begunkov