2022-02-22 05:09:45

by Pavel Begunkov

Subject: Re: [PATCH v2 4/4] io_uring: pre-increment f_pos on rw

On 2/21/22 14:16, Dylan Yudaken wrote:
> In read/write ops, preincrement f_pos when no offset is specified, and
> then attempt to fix up the position after IO completes if it completed less
> than expected. This fixes the problem where multiple queued up IO will all
> obtain the same f_pos, and so perform the same read/write.
>
> This is still not as consistent as sync r/w, as it is able to advance the
> file offset past the end of the file. It seems it would be quite a
> performance hit to work around this limitation - such as by keeping track
> of concurrent operations - and the downside does not seem to be too
> problematic.
>
> The attempt to fix up the f_pos after will at least mean that in situations
> where a single operation is run, then the position will be consistent.
>
> Co-developed-by: Jens Axboe <[email protected]>
> Signed-off-by: Jens Axboe <[email protected]>
> Signed-off-by: Dylan Yudaken <[email protected]>
> ---
> fs/io_uring.c | 81 ++++++++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 68 insertions(+), 13 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index abd8c739988e..a951d0754899 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -3066,21 +3066,71 @@ static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)

[...]

> +			return false;
>  		}
>  	}
> -	return is_stream ? NULL : &kiocb->ki_pos;
> +	*ppos = is_stream ? NULL : &kiocb->ki_pos;
> +	return false;
> +}
> +
> +static inline void
> +io_kiocb_done_pos(struct io_kiocb *req, struct kiocb *kiocb, u64 actual)

That's a lot of inlining, I wouldn't be surprised if the compiler
will even refuse to do that.

__io_kiocb_done_pos() {
	// rest of it
}

inline io_kiocb_done_pos() {
	if (!(flags & CUR_POS))
		return;
	__io_kiocb_done_pos();
}

io_kiocb_update_pos() is huge as well
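
The split suggested above can be sketched in plain userspace C. This is only
an illustration under stated assumptions, not the kernel code: struct
req_model, done_pos(), done_pos_slow() and the slow_calls counter are
hypothetical stand-ins, and locking is left out so that only the shape of the
fast-path wrapper is visible.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real io_uring structs. */
#define REQ_F_CUR_POS (1u << 0)

struct req_model {
	unsigned int flags;
	uint64_t len;            /* bytes requested */
	int64_t ki_pos;          /* position after the I/O finished */
	int64_t *f_pos;          /* shared file position, already pre-incremented */
	unsigned int slow_calls; /* counts slow-path entries, for illustration only */
};

/* Slow path: a normal out-of-line function holding the bulk of the logic. */
static void done_pos_slow(struct req_model *req, uint64_t actual)
{
	req->slow_calls++;

	if (actual >= req->len)
		return;

	/* Locking is omitted in this sketch; only the rewind itself is shown. */
	if (*req->f_pos == req->ki_pos + (int64_t)(req->len - actual))
		*req->f_pos = req->ki_pos;
}

/* Fast path: small enough to inline, one flag test for the common case. */
static inline void done_pos(struct req_model *req, uint64_t actual)
{
	if (!(req->flags & REQ_F_CUR_POS))
		return;
	done_pos_slow(req, actual);
}
```

A request without the flag returns after the single test and never enters
done_pos_slow(); a request with the flag set takes the out-of-line path.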

> +{
> +	u64 expected;
> +
> +	if (likely(!(req->flags & REQ_F_CUR_POS)))
> +		return;
> +
> +	expected = req->rw.len;
> +	if (actual >= expected)
> +		return;
> +
> +	/*
> +	 * It's not definitely safe to lock here, and the assumption is,
> +	 * that if we cannot lock the position that it will be changing,
> +	 * and if it will be changing - then we can't update it anyway
> +	 */
> +	if (req->file->f_mode & FMODE_ATOMIC_POS
> +		&& !mutex_trylock(&req->file->f_pos_lock))
> +		return;
> +
> +	/*
> +	 * now we want to move the pointer, but only if everything is consistent
> +	 * with how we left it originally
> +	 */
> +	if (req->file->f_pos == kiocb->ki_pos + (expected - actual))
> +		req->file->f_pos = kiocb->ki_pos;

I wonder, is it good enough / safe to just assign it considering that
the request was executed outside of locks? vfs_seek()?

> +
> +	/* else something else messed with f_pos and we can't do anything */
> +
> +	if (req->file->f_mode & FMODE_ATOMIC_POS)
> +		mutex_unlock(&req->file->f_pos_lock);
>  }

Do we even care about races while reading it? E.g.
pos = READ_ONCE();

>
> -	ppos = io_kiocb_update_pos(req, kiocb);
> -
>  	ret = rw_verify_area(READ, req->file, ppos, req->result);
>  	if (unlikely(ret)) {
>  		kfree(iovec);
> +		io_kiocb_done_pos(req, kiocb, 0);

Why do we update it on failure?

[...]

> -	ppos = io_kiocb_update_pos(req, kiocb);
> -
>  	ret = rw_verify_area(WRITE, req->file, ppos, req->result);
>  	if (unlikely(ret))
>  		goto out_free;
> @@ -3858,6 +3912,7 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
>  		return ret ?: -EAGAIN;
>  	}
>  out_free:
> +	io_kiocb_done_pos(req, kiocb, 0);

Looks weird. It appears we don't need it on failure and
successes are covered by kiocb_done() / ->ki_complete

>  	/* it's reportedly faster than delegating the null check to kfree() */
>  	if (iovec)
>  		kfree(iovec);

--
Pavel Begunkov


2022-02-22 07:21:01

by Hao Xu

Subject: Re: [PATCH v2 4/4] io_uring: pre-increment f_pos on rw


On 2/22/22 02:00, Pavel Begunkov wrote:
> On 2/21/22 14:16, Dylan Yudaken wrote:
>> In read/write ops, preincrement f_pos when no offset is specified, and
>> then attempt to fix up the position after IO completes if it completed less
>> than expected. This fixes the problem where multiple queued up IO will all
>> obtain the same f_pos, and so perform the same read/write.
>>
>> This is still not as consistent as sync r/w, as it is able to advance the
>> file offset past the end of the file. It seems it would be quite a
>> performance hit to work around this limitation - such as by keeping track
>> of concurrent operations - and the downside does not seem to be too
>> problematic.
>>
>> The attempt to fix up the f_pos after will at least mean that in situations
>> where a single operation is run, then the position will be consistent.
>>
>> Co-developed-by: Jens Axboe <[email protected]>
>> Signed-off-by: Jens Axboe <[email protected]>
>> Signed-off-by: Dylan Yudaken <[email protected]>
>> ---
>>   fs/io_uring.c | 81 ++++++++++++++++++++++++++++++++++++++++++---------
>>   1 file changed, 68 insertions(+), 13 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index abd8c739988e..a951d0754899 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -3066,21 +3066,71 @@ static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)
>
> [...]
>
>> +            return false;
>>           }
>>       }
>> -    return is_stream ? NULL : &kiocb->ki_pos;
>> +    *ppos = is_stream ? NULL : &kiocb->ki_pos;
>> +    return false;
>> +}
>> +
>> +static inline void
>> +io_kiocb_done_pos(struct io_kiocb *req, struct kiocb *kiocb, u64 actual)
>
> That's a lot of inlining, I wouldn't be surprised if the compiler
> will even refuse to do that.
>
> __io_kiocb_done_pos() {
>     // rest of it
> }
>
> inline io_kiocb_done_pos() {
>     if (!(flags & CUR_POS))
>         return;
>     __io_kiocb_done_pos();
> }
>
> io_kiocb_update_pos() is huge as well
>
>> +{
>> +    u64 expected;
>> +
>> +    if (likely(!(req->flags & REQ_F_CUR_POS)))
>> +        return;
>> +
>> +    expected = req->rw.len;
>> +    if (actual >= expected)
>> +        return;
>> +
>> +    /*
>> +     * It's not definitely safe to lock here, and the assumption is,
>> +     * that if we cannot lock the position that it will be changing,
>> +     * and if it will be changing - then we can't update it anyway
>> +     */
>> +    if (req->file->f_mode & FMODE_ATOMIC_POS
>> +        && !mutex_trylock(&req->file->f_pos_lock))
>> +        return;
>> +
>> +    /*
>> +     * now we want to move the pointer, but only if everything is consistent
>> +     * with how we left it originally
>> +     */
>> +    if (req->file->f_pos == kiocb->ki_pos + (expected - actual))
>> +        req->file->f_pos = kiocb->ki_pos;
>
> I wonder, is it good enough / safe to just assign it considering that
> the request was executed outside of locks? vfs_seek()?
>
>> +
>> +    /* else something else messed with f_pos and we can't do anything */
>> +
>> +    if (req->file->f_mode & FMODE_ATOMIC_POS)
>> +        mutex_unlock(&req->file->f_pos_lock);
>>   }
>
> Do we even care about races while reading it? E.g.
> pos = READ_ONCE();
>
>> -    ppos = io_kiocb_update_pos(req, kiocb);
>> -
>>       ret = rw_verify_area(READ, req->file, ppos, req->result);
>>       if (unlikely(ret)) {
>>           kfree(iovec);
>> +        io_kiocb_done_pos(req, kiocb, 0);
>
> Why do we update it on failure?

It looks like a fallback: if nothing else changed the position in the
meantime, file->f_pos is restored to its original value.

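The failure case can be modelled in a few lines of userspace C. This is a
sketch under assumptions, not the patch itself: struct file_model,
reserve_pos(), done_pos() and submit_model() are hypothetical names, and
locking is ignored. It only shows that passing actual == 0 undoes the whole
pre-increment when nothing else has moved the position.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the shared file position; not a kernel struct. */
struct file_model {
	int64_t f_pos;
};

/* Pre-increment: claim [f_pos, f_pos + len) and return the start offset. */
static int64_t reserve_pos(struct file_model *f, uint64_t len)
{
	int64_t start = f->f_pos;

	f->f_pos += (int64_t)len;
	return start;
}

/* Give back the unused part of a claim; actual == 0 models a failed request. */
static void done_pos(struct file_model *f, int64_t start,
		     uint64_t len, uint64_t actual)
{
	int64_t ki_pos = start + (int64_t)actual;

	if (actual >= len)
		return;
	/* Only rewind if f_pos is still where this request left it. */
	if (f->f_pos == ki_pos + (int64_t)(len - actual))
		f->f_pos = ki_pos;
}

/* Hypothetical submit path: a non-zero verify_ret fails before any I/O. */
static int submit_model(struct file_model *f, uint64_t len, int verify_ret)
{
	int64_t start = reserve_pos(f, len);

	if (verify_ret) {
		done_pos(f, start, len, 0);
		return verify_ret;
	}
	/* Pretend the full length was transferred. */
	done_pos(f, start, len, len);
	return 0;
}
```

With f_pos starting at 4096, a failing submit_model() leaves it at 4096,
while a successful 512-byte request leaves it at 4608.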
>
> [...]
>
>> -    ppos = io_kiocb_update_pos(req, kiocb);
>> -
>>       ret = rw_verify_area(WRITE, req->file, ppos, req->result);
>>       if (unlikely(ret))
>>           goto out_free;
>> @@ -3858,6 +3912,7 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
>>           return ret ?: -EAGAIN;
>>       }
>>   out_free:
>> +    io_kiocb_done_pos(req, kiocb, 0);
>
> Looks weird. It appears we don't need it on failure and
> successes are covered by kiocb_done() / ->ki_complete
>
>>       /* it's reportedly faster than delegating the null check to kfree() */
>>       if (iovec)
>>           kfree(iovec);
>

2022-02-22 08:39:03

by Dylan Yudaken

Subject: Re: [PATCH v2 4/4] io_uring: pre-increment f_pos on rw

On Mon, 2022-02-21 at 18:00 +0000, Pavel Begunkov wrote:
> On 2/21/22 14:16, Dylan Yudaken wrote:
> > In read/write ops, preincrement f_pos when no offset is specified, and
> > then attempt to fix up the position after IO completes if it completed less
> > than expected. This fixes the problem where multiple queued up IO will all
> > obtain the same f_pos, and so perform the same read/write.
> >
> > This is still not as consistent as sync r/w, as it is able to advance the
> > file offset past the end of the file. It seems it would be quite a
> > performance hit to work around this limitation - such as by keeping track
> > of concurrent operations - and the downside does not seem to be too
> > problematic.
> >
> > The attempt to fix up the f_pos after will at least mean that in situations
> > where a single operation is run, then the position will be consistent.
> >
> > Co-developed-by: Jens Axboe <[email protected]>
> > Signed-off-by: Jens Axboe <[email protected]>
> > Signed-off-by: Dylan Yudaken <[email protected]>
> > ---
> >   fs/io_uring.c | 81 ++++++++++++++++++++++++++++++++++++++++++---------
> >   1 file changed, 68 insertions(+), 13 deletions(-)
> >
> > diff --git a/fs/io_uring.c b/fs/io_uring.c
> > index abd8c739988e..a951d0754899 100644
> > --- a/fs/io_uring.c
> > +++ b/fs/io_uring.c
> > @@ -3066,21 +3066,71 @@ static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)
>
> [...]
>
> > +                       return false;
> >                 }
> >         }
> > -       return is_stream ? NULL : &kiocb->ki_pos;
> > +       *ppos = is_stream ? NULL : &kiocb->ki_pos;
> > +       return false;
> > +}
> > +
> > +static inline void
> > +io_kiocb_done_pos(struct io_kiocb *req, struct kiocb *kiocb, u64 actual)
>
> That's a lot of inlining, I wouldn't be surprised if the compiler
> will even refuse to do that.
>
> __io_kiocb_done_pos() {
>         // rest of it
> }
>
> inline io_kiocb_done_pos() {
>         if (!(flags & CUR_POS))
>                 return;
>         __io_kiocb_done_pos();
> }
>
> io_kiocb_update_pos() is huge as well

Good idea, will split the slower paths out.

>
> > +{
> > +       u64 expected;
> > +
> > +       if (likely(!(req->flags & REQ_F_CUR_POS)))
> > +               return;
> > +
> > +       expected = req->rw.len;
> > +       if (actual >= expected)
> > +               return;
> > +
> > +       /*
> > +        * It's not definitely safe to lock here, and the assumption is,
> > +        * that if we cannot lock the position that it will be changing,
> > +        * and if it will be changing - then we can't update it anyway
> > +        */
> > +       if (req->file->f_mode & FMODE_ATOMIC_POS
> > +               && !mutex_trylock(&req->file->f_pos_lock))
> > +               return;
> > +
> > +       /*
> > +        * now we want to move the pointer, but only if everything is consistent
> > +        * with how we left it originally
> > +        */
> > +       if (req->file->f_pos == kiocb->ki_pos + (expected - actual))
> > +               req->file->f_pos = kiocb->ki_pos;
>
> I wonder, is it good enough / safe to just assign it considering that
> the request was executed outside of locks? vfs_seek()?

No I do not think so - in the case of multiple r/w the same thing will
happen, even with no vfs_seek().
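
One reading of this point, sketched in userspace C under stated assumptions:
with two requests in flight, the second one's pre-increment already moves
f_pos, so the first request's consistency check fails and a plain assignment
would clobber the second reservation. Everything here is a hypothetical
model: struct file_model, struct rw_model, issue() and complete() are not
kernel names, and a C11 atomic_flag stands in for f_pos_lock with
test-and-set playing the role of mutex_trylock().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct file_model {
	int64_t f_pos;
	atomic_flag pos_lock; /* stand-in for f_pos_lock */
};

struct rw_model {
	int64_t ki_pos; /* start offset, later advanced by bytes transferred */
	uint64_t len;   /* bytes requested */
};

/* Issue: take the current position and pre-increment it by the full length. */
static void issue(struct file_model *f, struct rw_model *rw, uint64_t len)
{
	rw->ki_pos = f->f_pos;
	rw->len = len;
	f->f_pos += (int64_t)len;
}

/* Complete: returns true only if f_pos was rewound to the real end offset. */
static bool complete(struct file_model *f, struct rw_model *rw, uint64_t actual)
{
	bool fixed = false;

	rw->ki_pos += (int64_t)actual;
	if (actual >= rw->len)
		return false;

	/* Trylock: if the lock is held, assume f_pos is changing anyway. */
	if (atomic_flag_test_and_set(&f->pos_lock))
		return false;

	/* Rewind only if f_pos is exactly where this request left it. */
	if (f->f_pos == rw->ki_pos + (int64_t)(rw->len - actual)) {
		f->f_pos = rw->ki_pos;
		fixed = true;
	}

	atomic_flag_clear(&f->pos_lock);
	return fixed;
}
```

A single short request is fixed up; once a second request has been issued on
top of it, the first request's check no longer matches and f_pos is left
alone.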

>
> > +
> > +       /* else something else messed with f_pos and we can't do anything */
> > +
> > +       if (req->file->f_mode & FMODE_ATOMIC_POS)
> > +               mutex_unlock(&req->file->f_pos_lock);
> >   }
>
> Do we even care about races while reading it? E.g.
> pos = READ_ONCE();

I think so - if I remove all the locks the test cases fail.

>
> >  
> > -       ppos = io_kiocb_update_pos(req, kiocb);
> > -
> >         ret = rw_verify_area(READ, req->file, ppos, req->result);
> >         if (unlikely(ret)) {
> >                 kfree(iovec);
> > +               io_kiocb_done_pos(req, kiocb, 0);
>
> Why do we update it on failure?
>
> [...]
>
> > -       ppos = io_kiocb_update_pos(req, kiocb);
> > -
> >         ret = rw_verify_area(WRITE, req->file, ppos, req->result);
> >         if (unlikely(ret))
> >                 goto out_free;
> > @@ -3858,6 +3912,7 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
> >                 return ret ?: -EAGAIN;
> >         }
> >   out_free:
> > +       io_kiocb_done_pos(req, kiocb, 0);
>
> Looks weird. It appears we don't need it on failure and
> successes are covered by kiocb_done() / ->ki_complete
>
> >         /* it's reportedly faster than delegating the null check to kfree() */
> >         if (iovec)
> >                 kfree(iovec);
>