LinuxLists.cc - [PATCH 1/1] io_uring: don't wait when under-submitting

2019-12-13 07:52:34

Subject: [PATCH 1/1] io_uring: don't wait when under-submitting

There is no reliable way to submit and wait in a single syscall, as
io_submit_sqes() may under-consume sqes (in case of an early error).
Then it will wait for not-yet-submitted requests, deadlocking the user
in most cases.

Don't wait/poll if can't submit all sqes, and return -EAGAIN

Signed-off-by: Pavel Begunkov <[email protected]>
---

I wonder, why it doesn't return 2 error codes for submission and waiting
separately? It's a bit puzzling figuring out what to return. I guess the
same with the userspace side.

fs/io_uring.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 42de210be631..82152ea13fe2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4877,6 +4877,11 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
submitted = io_submit_sqes(ctx, to_submit, f.file, fd,
&cur_mm, false);
mutex_unlock(&ctx->uring_lock);
+
+ if (submitted != to_submit) {
+ submitted = -EAGAIN;
+ goto out;
+ }
}
if (flags & IORING_ENTER_GETEVENTS) {
unsigned nr_events = 0;
@@ -4890,6 +4895,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
}
}

+out:
percpu_ref_put(&ctx->refs);
out_fput:
fdput(f);
--
2.24.0

2019-12-13 18:24:06

by Jens Axboe

[permalink] [raw]

Subject: Re: [PATCH 1/1] io_uring: don't wait when under-submitting

On 12/13/19 12:51 AM, Pavel Begunkov wrote:
> There is no reliable way to submit and wait in a single syscall, as
> io_submit_sqes() may under-consume sqes (in case of an early error).
> Then it will wait for not-yet-submitted requests, deadlocking the user
> in most cases.

Why not just cap the wait_nr? If someone does to_submit = 8, wait_nr = 8,
and we only submit 4, just wait for 4? Ala:

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 81219a631a6d..4a76ccbb7856 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5272,6 +5272,10 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
submitted = io_submit_sqes(ctx, to_submit, f.file, fd,
&cur_mm, false);
mutex_unlock(&ctx->uring_lock);
+ if (submitted <= 0)
+ goto done;
+ if (submitted != to_submit && min_complete > submitted)
+ min_complete = submitted;
}
if (flags & IORING_ENTER_GETEVENTS) {
unsigned nr_events = 0;
@@ -5284,7 +5288,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
ret = io_cqring_wait(ctx, min_complete, sig, sigsz);
}
}
-
+done:
percpu_ref_put(&ctx->refs);
out_fput:
fdput(f);

--
Jens Axboe

2019-12-13 18:33:39

by Jens Axboe

[permalink] [raw]

Subject: Re: [PATCH 1/1] io_uring: don't wait when under-submitting

On 12/13/19 11:22 AM, Jens Axboe wrote:
> On 12/13/19 12:51 AM, Pavel Begunkov wrote:
>> There is no reliable way to submit and wait in a single syscall, as
>> io_submit_sqes() may under-consume sqes (in case of an early error).
>> Then it will wait for not-yet-submitted requests, deadlocking the user
>> in most cases.
>
> Why not just cap the wait_nr? If someone does to_submit = 8, wait_nr = 8,
> and we only submit 4, just wait for 4? Ala:
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 81219a631a6d..4a76ccbb7856 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5272,6 +5272,10 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
> submitted = io_submit_sqes(ctx, to_submit, f.file, fd,
> &cur_mm, false);
> mutex_unlock(&ctx->uring_lock);
> + if (submitted <= 0)
> + goto done;
> + if (submitted != to_submit && min_complete > submitted)
> + min_complete = submitted;
> }
> if (flags & IORING_ENTER_GETEVENTS) {
> unsigned nr_events = 0;
> @@ -5284,7 +5288,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
> ret = io_cqring_wait(ctx, min_complete, sig, sigsz);
> }
> }
> -
> +done:
> percpu_ref_put(&ctx->refs);
> out_fput:
> fdput(f);
>

This is probably a bit cleaner, since it only adjusts if we're going to
wait.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 81219a631a6d..e262549a2601 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5272,11 +5272,15 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
submitted = io_submit_sqes(ctx, to_submit, f.file, fd,
&cur_mm, false);
mutex_unlock(&ctx->uring_lock);
+ if (submitted <= 0)
+ goto done;
}
if (flags & IORING_ENTER_GETEVENTS) {
unsigned nr_events = 0;

min_complete = min(min_complete, ctx->cq_entries);
+ if (submitted != to_submit && min_complete > submitted)
+ min_complete = submitted;

if (ctx->flags & IORING_SETUP_IOPOLL) {
ret = io_iopoll_check(ctx, &nr_events, min_complete);
@@ -5284,7 +5288,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
ret = io_cqring_wait(ctx, min_complete, sig, sigsz);
}
}
-
+done:
percpu_ref_put(&ctx->refs);
out_fput:
fdput(f);

--
Jens Axboe

2019-12-13 21:34:15

by Pavel Begunkov

[permalink] [raw]

Subject: Re: [PATCH 1/1] io_uring: don't wait when under-submitting

On 13/12/2019 21:32, Jens Axboe wrote:
> On 12/13/19 11:22 AM, Jens Axboe wrote:
>> On 12/13/19 12:51 AM, Pavel Begunkov wrote:
>>> There is no reliable way to submit and wait in a single syscall, as
>>> io_submit_sqes() may under-consume sqes (in case of an early error).
>>> Then it will wait for not-yet-submitted requests, deadlocking the user
>>> in most cases.
>>
>> Why not just cap the wait_nr? If someone does to_submit = 8, wait_nr = 8,
>> and we only submit 4, just wait for 4? Ala:
>>

Is it worth entangling the code? I don't expect anyone trying to recover,
maybe except full reset/restart. So, failing ASAP seemed to me as the
right thing to do. It may also mean nothing to the user if e.g.
submit(1), submit(1), ..., submit_and_wait(1, n)

Anyway, this shouldn't even happen in a not buggy code, so I'm fine with
any version as long as it doesn't lock up. I'll resend if you still prefer
to cap it.

>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 81219a631a6d..4a76ccbb7856 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -5272,6 +5272,10 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>> submitted = io_submit_sqes(ctx, to_submit, f.file, fd,
>> &cur_mm, false);
>> mutex_unlock(&ctx->uring_lock);
>> + if (submitted <= 0)
>> + goto done;
>> + if (submitted != to_submit && min_complete > submitted)
>> + min_complete = submitted;
>> }
>> if (flags & IORING_ENTER_GETEVENTS) {
>> unsigned nr_events = 0;
>> @@ -5284,7 +5288,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>> ret = io_cqring_wait(ctx, min_complete, sig, sigsz);
>> }
>> }
>> -
>> +done:
>> percpu_ref_put(&ctx->refs);
>> out_fput:
>> fdput(f);
>>
>
> This is probably a bit cleaner, since it only adjusts if we're going to
> wait.
>
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 81219a631a6d..e262549a2601 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5272,11 +5272,15 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
> submitted = io_submit_sqes(ctx, to_submit, f.file, fd,
> &cur_mm, false);
> mutex_unlock(&ctx->uring_lock);
> + if (submitted <= 0)
> + goto done;
> }
> if (flags & IORING_ENTER_GETEVENTS) {
> unsigned nr_events = 0;
>
> min_complete = min(min_complete, ctx->cq_entries);
> + if (submitted != to_submit && min_complete > submitted)
> + min_complete = submitted;
>
> if (ctx->flags & IORING_SETUP_IOPOLL) {
> ret = io_iopoll_check(ctx, &nr_events, min_complete);
> @@ -5284,7 +5288,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
> ret = io_cqring_wait(ctx, min_complete, sig, sigsz);
> }
> }
> -
> +done:
> percpu_ref_put(&ctx->refs);
> out_fput:
> fdput(f);
>

--
Pavel Begunkov

2019-12-13 21:49:33

by Jens Axboe

[permalink] [raw]

Subject: Re: [PATCH 1/1] io_uring: don't wait when under-submitting

> On Dec 13, 2019, at 2:32 PM, Pavel Begunkov <[email protected]> wrote:
>
> On 13/12/2019 21:32, Jens Axboe wrote:
>>> On 12/13/19 11:22 AM, Jens Axboe wrote:
>>> On 12/13/19 12:51 AM, Pavel Begunkov wrote:
>>>> There is no reliable way to submit and wait in a single syscall, as
>>>> io_submit_sqes() may under-consume sqes (in case of an early error).
>>>> Then it will wait for not-yet-submitted requests, deadlocking the user
>>>> in most cases.
>>>
>>> Why not just cap the wait_nr? If someone does to_submit = 8, wait_nr = 8,
>>> and we only submit 4, just wait for 4? Ala:
>>>
>
> Is it worth entangling the code? I don't expect anyone trying to recover,
> maybe except full reset/restart. So, failing ASAP seemed to me as the
> right thing to do. It may also mean nothing to the user if e.g.
> submit(1), submit(1), ..., submit_and_wait(1, n)
>
> Anyway, this shouldn't even happen in a not buggy code, so I'm fine with
> any version as long as it doesn't lock up. I'll resend if you still prefer
> to cap it.

I like the cap version a lot better, and in fact we did used to have that but lost it early. I like it behaviorally a lot better, too.

Can you resend? Thanks!

2019-12-14 14:33:19

by Pavel Begunkov

[permalink] [raw]

Subject: [PATCH v2] io_uring: don't wait when under-submitting

There is no reliable way to submit and wait in a single syscall, as
io_submit_sqes() may under-consume sqes (in case of an early error).
Then it will wait for not-yet-submitted requests, deadlocking the user
in most cases.

Don't wait/poll if can't submit all sqes, and return -EAGAIN

Signed-off-by: Pavel Begunkov <[email protected]>
---

v2: cap min_complete if submitted partially (Jens Axboe)

fs/io_uring.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index c2f66e30a812..4c281f382bec 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3526,11 +3526,8 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr,
unsigned int sqe_flags;

req = io_get_req(ctx, statep);
- if (unlikely(!req)) {
- if (!submitted)
- submitted = -EAGAIN;
+ if (unlikely(!req))
break;
- }
if (!io_get_sqring(ctx, req)) {
__io_free_req(req);
break;
@@ -4910,6 +4907,14 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
submitted = io_submit_sqes(ctx, to_submit, f.file, fd,
&cur_mm, false);
mutex_unlock(&ctx->uring_lock);
+
+ if (submitted != to_submit) {
+ if (!submitted) {
+ submitted = -EAGAIN;
+ goto done;
+ }
+ min_complete = min(min_complete, (u32)submitted);
+ }
}
if (flags & IORING_ENTER_GETEVENTS) {
unsigned nr_events = 0;
@@ -4922,7 +4927,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
ret = io_cqring_wait(ctx, min_complete, sig, sigsz);
}
}
-
+done:
percpu_ref_put(&ctx->refs);
out_fput:
fdput(f);
--
2.24.0

2019-12-14 14:42:06

by Jens Axboe

[permalink] [raw]

Subject: Re: [PATCH v2] io_uring: don't wait when under-submitting

On 12/14/19 7:31 AM, Pavel Begunkov wrote:
> There is no reliable way to submit and wait in a single syscall, as
> io_submit_sqes() may under-consume sqes (in case of an early error).
> Then it will wait for not-yet-submitted requests, deadlocking the user
> in most cases.
>
> Don't wait/poll if can't submit all sqes, and return -EAGAIN
>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
>
> v2: cap min_complete if submitted partially (Jens Axboe)

Can you update the commit message as well, doesn't really reflect the
current state of it... Apart from that, looks good to me!

--
Jens Axboe

2019-12-14 14:47:02

by Pavel Begunkov

[permalink] [raw]

Subject: Re: [PATCH v2] io_uring: don't wait when under-submitting

On 14/12/2019 17:39, Jens Axboe wrote:
> On 12/14/19 7:31 AM, Pavel Begunkov wrote:
>> There is no reliable way to submit and wait in a single syscall, as
>> io_submit_sqes() may under-consume sqes (in case of an early error).
>> Then it will wait for not-yet-submitted requests, deadlocking the user
>> in most cases.
>>
>> Don't wait/poll if can't submit all sqes, and return -EAGAIN
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>>
>> v2: cap min_complete if submitted partially (Jens Axboe)
>
> Can you update the commit message as well, doesn't really reflect the
> current state of it... Apart from that, looks good to me!
>
Right, thanks for noticing!

--
Pavel Begunkov

Attachments:

signature.asc (849.00 B)
OpenPGP digital signature

2019-12-14 14:54:43

by Pavel Begunkov

[permalink] [raw]

Subject: [PATCH v3] io_uring: don't wait when under-submitting

There is no reliable way to submit and wait in a single syscall, as
io_submit_sqes() may under-consume sqes (in case of an early error).
Then it will wait for not-yet-submitted requests, deadlocking the user
in most cases.

In such cases adjust min_complete, so it won't wait for more than
what have been submitted in the current call to io_uring_enter(). It
may be less than totally in-flight including previous submissions,
but this shouldn't do harm and up to a user.

Signed-off-by: Pavel Begunkov <[email protected]>
---

v2: cap min_complete if submitted partially (Jens Axboe)
v3: update commit message (Jens Axboe)

fs/io_uring.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 81219a631a6d..5dfc805ec31c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3763,11 +3763,8 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr,
unsigned int sqe_flags;

req = io_get_req(ctx, statep);
- if (unlikely(!req)) {
- if (!submitted)
- submitted = -EAGAIN;
+ if (unlikely(!req))
break;
- }
if (!io_get_sqring(ctx, req)) {
__io_free_req(req);
break;
@@ -5272,6 +5269,14 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
submitted = io_submit_sqes(ctx, to_submit, f.file, fd,
&cur_mm, false);
mutex_unlock(&ctx->uring_lock);
+
+ if (submitted != to_submit) {
+ if (!submitted) {
+ submitted = -EAGAIN;
+ goto done;
+ }
+ min_complete = min(min_complete, (u32)submitted);
+ }
}
if (flags & IORING_ENTER_GETEVENTS) {
unsigned nr_events = 0;
@@ -5284,7 +5289,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
ret = io_cqring_wait(ctx, min_complete, sig, sigsz);
}
}
-
+done:
percpu_ref_put(&ctx->refs);
out_fput:
fdput(f);
--
2.24.0

2019-12-14 18:46:01

by Jens Axboe

[permalink] [raw]

Subject: Re: [PATCH v3] io_uring: don't wait when under-submitting

On 12/14/19 7:53 AM, Pavel Begunkov wrote:
> There is no reliable way to submit and wait in a single syscall, as
> io_submit_sqes() may under-consume sqes (in case of an early error).
> Then it will wait for not-yet-submitted requests, deadlocking the user
> in most cases.
>
> In such cases adjust min_complete, so it won't wait for more than
> what have been submitted in the current call to io_uring_enter(). It
> may be less than totally in-flight including previous submissions,
> but this shouldn't do harm and up to a user.

Thanks, applied.

--
Jens Axboe

2019-12-15 05:44:41

by Jens Axboe

[permalink] [raw]

Subject: Re: [PATCH v3] io_uring: don't wait when under-submitting

On 12/14/19 11:43 AM, Jens Axboe wrote:
> On 12/14/19 7:53 AM, Pavel Begunkov wrote:
>> There is no reliable way to submit and wait in a single syscall, as
>> io_submit_sqes() may under-consume sqes (in case of an early error).
>> Then it will wait for not-yet-submitted requests, deadlocking the user
>> in most cases.
>>
>> In such cases adjust min_complete, so it won't wait for more than
>> what have been submitted in the current call to io_uring_enter(). It
>> may be less than totally in-flight including previous submissions,
>> but this shouldn't do harm and up to a user.
>
> Thanks, applied.

This causes a behavioral change where if you ask to submit 1 but
there's nothing in the SQ ring, then you would get 0 before. Now
you get -EAGAIN. This doesn't make a lot of sense, since there's no
point in retrying as that won't change anything.

Can we please just do something like the one I sent, instead of trying
to over-complicate it?

--
Jens Axboe

2019-12-15 15:49:40

by Pavel Begunkov

[permalink] [raw]

Subject: Re: [PATCH v3] io_uring: don't wait when under-submitting

On 15/12/2019 08:42, Jens Axboe wrote:
> On 12/14/19 11:43 AM, Jens Axboe wrote:
>> On 12/14/19 7:53 AM, Pavel Begunkov wrote:
>>> There is no reliable way to submit and wait in a single syscall, as
>>> io_submit_sqes() may under-consume sqes (in case of an early error).
>>> Then it will wait for not-yet-submitted requests, deadlocking the user
>>> in most cases.
>>>
>>> In such cases adjust min_complete, so it won't wait for more than
>>> what have been submitted in the current call to io_uring_enter(). It
>>> may be less than totally in-flight including previous submissions,
>>> but this shouldn't do harm and up to a user.
>>
>> Thanks, applied.
>
> This causes a behavioral change where if you ask to submit 1 but
> there's nothing in the SQ ring, then you would get 0 before. Now
> you get -EAGAIN. This doesn't make a lot of sense, since there's no
> point in retrying as that won't change anything.
>
> Can we please just do something like the one I sent, instead of trying
> to over-complicate it?
>

Ok, when I get to a compiler.

--
Pavel Begunkov

Attachments:

signature.asc (849.00 B)
OpenPGP digital signature

2019-12-15 21:34:35

by Jens Axboe

[permalink] [raw]

Subject: Re: [PATCH v3] io_uring: don't wait when under-submitting

On 12/15/19 8:48 AM, Pavel Begunkov wrote:
> On 15/12/2019 08:42, Jens Axboe wrote:
>> On 12/14/19 11:43 AM, Jens Axboe wrote:
>>> On 12/14/19 7:53 AM, Pavel Begunkov wrote:
>>>> There is no reliable way to submit and wait in a single syscall, as
>>>> io_submit_sqes() may under-consume sqes (in case of an early error).
>>>> Then it will wait for not-yet-submitted requests, deadlocking the user
>>>> in most cases.
>>>>
>>>> In such cases adjust min_complete, so it won't wait for more than
>>>> what have been submitted in the current call to io_uring_enter(). It
>>>> may be less than totally in-flight including previous submissions,
>>>> but this shouldn't do harm and up to a user.
>>>
>>> Thanks, applied.
>>
>> This causes a behavioral change where if you ask to submit 1 but
>> there's nothing in the SQ ring, then you would get 0 before. Now
>> you get -EAGAIN. This doesn't make a lot of sense, since there's no
>> point in retrying as that won't change anything.
>>
>> Can we please just do something like the one I sent, instead of trying
>> to over-complicate it?
>>
>
> Ok, when I get to a compiler.

Great, thanks. BTW, I noticed when a regression test failed.

--
Jens Axboe