2021-06-15 10:03:19

by Pavel Begunkov

[permalink] [raw]
Subject: Re: [PATCH] io_uring: store back buffer in case of failure

On 6/11/21 8:46 PM, Olivier Langlois wrote:
> the man page says the following:
>
> If succesful, the resulting CQE will have IORING_CQE_F_BUFFER set
> in the flags part of the struct, and the upper IORING_CQE_BUFFER_SHIFT
> bits will contain the ID of the selected buffers.
>
> in order to respect this contract, the buffer is stored back in case of
> an error. There are several reasons to do that:
>
> 1. doing otherwise is counter-intuitive and error-prone (I cannot think
> of a single example of a syscall failing and still require the user to
> free the allocated resources). Especially when the documention
> explicitly mention that this is the behavior to expect.
>
> 2. it is inefficient because the buffer is unneeded since there is no
> data to transfer back to the user and the buffer will need to be
> returned back to io_uring to avoid a leak.
>
> Signed-off-by: Olivier Langlois <[email protected]>
> ---
> fs/io_uring.c | 97 +++++++++++++++++++++++++++++++++------------------
> 1 file changed, 64 insertions(+), 33 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 42380ed563c4..502d7cd81a8c 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
[...]
> +static unsigned int io_put_kbuf(struct io_kiocb *req, struct io_buffer *kbuf,
> + u16 bgid, long res, unsigned int issue_flags)
> {
> unsigned int cflags;
> + struct io_ring_ctx *ctx = req->ctx;
> + struct io_buffer *head;
> + bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
>
> cflags = kbuf->bid << IORING_CQE_BUFFER_SHIFT;
> cflags |= IORING_CQE_F_BUFFER;
> req->flags &= ~REQ_F_BUFFER_SELECTED;
> - kfree(kbuf);
> +
> + /*
> + * Theoritically, res == 0 could be included as well but that would
> + * break the contract established in the man page saying that
> + * a buffer is returned if the operation is successful.
> + */
> + if (unlikely(res < 0)) {
> + io_ring_submit_lock(ctx, !force_nonblock);

io_complete_rw() is called from an IRQ context, so it can't sleep/wait.

> +
> + lockdep_assert_held(&ctx->uring_lock);
> +
> + head = xa_load(&ctx->io_buffers, bgid);
> + if (head) {
> + list_add_tail(&kbuf->list, &head->list);
> + cflags = 0;
> + } else {
> + INIT_LIST_HEAD(&kbuf->list);
> + if (!xa_insert(&ctx->io_buffers, bgid, kbuf, GFP_KERNEL))
> + cflags = 0;
> + }
> + io_ring_submit_unlock(ctx, !force_nonblock);
> + }
> + if (cflags)
> + kfree(kbuf);
> return cflags;
> }
>
[...]
> -static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)
> +static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret,
> + unsigned int issue_flags)
> {
> switch (ret) {
> case -EIOCBQUEUED:
> @@ -2728,7 +2775,7 @@ static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)
> ret = -EINTR;
> fallthrough;
> default:
> - kiocb->ki_complete(kiocb, ret, 0);
> + kiocb->ki_complete(kiocb, ret, issue_flags);

Don't remember what the second argument of .ki_complete is for,
but it definitely should not be used for issue_flags. E.g. because
we get two different meanings for it depending on a context --
either a result from the block layer or issue_flags.

--
Pavel Begunkov


2021-06-15 21:54:27

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] io_uring: store back buffer in case of failure

Ditto for this one, don't see it in my email nor on the list.

--
Jens Axboe

2021-06-16 19:51:26

by Olivier Langlois

[permalink] [raw]
Subject: Re: [PATCH] io_uring: store back buffer in case of failure

On Tue, 2021-06-15 at 15:51 -0600, Jens Axboe wrote:
> Ditto for this one, don't see it in my email nor on the list.
>
I can resend you a private copy of this one but as Pavel pointed out,
it contains fatal flaws.

So unless someone can tell me that the idea is interesting and has
potential and can give me some a hint or 2 about how to address the
challenges to fix the current flaws, it is pretty much a show stopper
to me and I think that I am going to let it go...


2021-06-16 19:54:18

by Pavel Begunkov

[permalink] [raw]
Subject: Re: [PATCH] io_uring: store back buffer in case of failure

On 6/16/21 2:42 PM, Olivier Langlois wrote:
> On Tue, 2021-06-15 at 15:51 -0600, Jens Axboe wrote:
>> Ditto for this one, don't see it in my email nor on the list.
>>
> I can resend you a private copy of this one but as Pavel pointed out,
> it contains fatal flaws.
>
> So unless someone can tell me that the idea is interesting and has
> potential and can give me some a hint or 2 about how to address the
> challenges to fix the current flaws, it is pretty much a show stopper
> to me and I think that I am going to let it go...

It'd need to go through some other context, e.g. task context.
task_work_add() + custom handler would work, either buf-select
synchronisation can be reworked, but both would rather be
bulky and not great.

I'm more curious if we ever hit REQ_F_BUFFER_SELECTED branch in
io_clean_op(), because it would just drop the id on the floor...
It might use a WARN_ONCE.

--
Pavel Begunkov