2020-03-13 17:52:05

by Jens Axboe

Subject: [GIT PULL] io_uring fixes for 5.6-rc

Hi Linus,

Just a single fix here, improving the RCU callback ordering from last
week. After a bit more perusing by Paul, he poked a hole in the
original.

Please pull!


git://git.kernel.dk/linux-block.git tags/io_uring-5.6-2020-03-13


----------------------------------------------------------------
Jens Axboe (1):
      io_uring: ensure RCU callback ordering with rcu_barrier()

 fs/io_uring.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

--
Jens Axboe


2020-03-13 20:22:37

by Linus Torvalds

Subject: Re: [GIT PULL] io_uring fixes for 5.6-rc

On Fri, Mar 13, 2020 at 10:50 AM Jens Axboe wrote:
>
> Just a single fix here, improving the RCU callback ordering from last
> week. After a bit more perusing by Paul, he poked a hole in the
> original.

Ouch.

If I read this patch correctly, you're now adding a rcu_barrier() onto
the system workqueue for each io_uring context freeing op.

This makes me worry:

- I think system_wq is unordered, so does it even guarantee that the
rcu_barrier happens after whatever work you're expecting it to be
after?

Or is it using a workqueue not because it wants to serialize with any
other work, but because it needs to use rcu_barrier in a context where
it can't sleep?

But the commit message does seem to imply that ordering is important..

- doesn't this have the potential to flood the system_wq with
flushing things that could all take a while..

I've pulled it, and it may all be correct, just chalk this message up
to "Linus got nervous looking at it".

Added Paul and Tejun to the participants explicitly.

Linus

2020-03-13 20:28:29

by pr-tracker-bot

Subject: Re: [GIT PULL] io_uring fixes for 5.6-rc

The pull request you sent on Fri, 13 Mar 2020 11:50:42 -0600:

> git://git.kernel.dk/linux-block.git tags/io_uring-5.6-2020-03-13

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/5007928eaeb7681501e94ac7516f6c6200f993fa

Thank you!

--
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

2020-03-13 20:36:37

by Paul E. McKenney

Subject: Re: [GIT PULL] io_uring fixes for 5.6-rc

On Fri, Mar 13, 2020 at 01:18:30PM -0700, Linus Torvalds wrote:
> On Fri, Mar 13, 2020 at 10:50 AM Jens Axboe wrote:
> >
> > Just a single fix here, improving the RCU callback ordering from last
> > week. After a bit more perusing by Paul, he poked a hole in the
> > original.
>
> Ouch.
>
> If I read this patch correctly, you're now adding a rcu_barrier() onto
> the system workqueue for each io_uring context freeing op.
>
> This makes me worry:
>
> - I think system_wq is unordered, so does it even guarantee that the
> rcu_barrier happens after whatever work you're expecting it to be
> after?
>
> Or is it using a workqueue not because it wants to serialize with any
> other work, but because it needs to use rcu_barrier in a context where
> it can't sleep?
>
> But the commit message does seem to imply that ordering is important..
>
> - doesn't this have the potential to flood the system_wq with
> flushing things that could all take a while..
>
> I've pulled it, and it may all be correct, just chalk this message up
> to "Linus got nervous looking at it".
>
> Added Paul and Tejun to the participants explicitly.

The idea is that rcu_barrier() waits for callbacks from all earlier
call_rcu()s to be invoked. So as long as you know that the call_rcu()
happened earlier than the rcu_barrier(), the rcu_barrier() is guaranteed
to wait for that call_rcu()'s callback.

In this case (and Jens will correct me in the sadly likely event that
I get the story confused), we have a call_rcu() followed by scheduling
work on that same task. The work has to start executing after it was
scheduled, so if that work does an rcu_barrier(), then that rcu_barrier()
will wait on the call_rcu()'s callback to be invoked.
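
The ordering argument above can be illustrated with a small userspace
toy model. This is only a sketch under stated assumptions: it is
single-threaded, it is not kernel code, and model_call_rcu(),
model_rcu_barrier(), struct ctx and teardown_is_safe() are hypothetical
stand-ins invented for illustration, not the io_uring implementation.
The model only captures the one property relied on here: a barrier
issued after a callback was queued does not return until that callback
has run.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CBS 16

typedef void (*cb_fn)(void *arg);

struct pending_cb {
	cb_fn fn;
	void *arg;
};

/* Callbacks that have been queued but not yet invoked. */
static struct pending_cb pending[MAX_CBS];
static size_t npending;

/* Hypothetical stand-in for call_rcu(): queue fn for later invocation. */
static void model_call_rcu(cb_fn fn, void *arg)
{
	assert(npending < MAX_CBS);
	pending[npending].fn = fn;
	pending[npending].arg = arg;
	npending++;
}

/*
 * Hypothetical stand-in for rcu_barrier(): return only after every
 * callback queued so far has been invoked.
 */
static void model_rcu_barrier(void)
{
	for (size_t i = 0; i < npending; i++)
		pending[i].fn(pending[i].arg);
	npending = 0;
}

/* Toy context: "freed" is just a flag, nothing is really released. */
struct ctx {
	int refs_released;
	int freed;
	int freed_too_early;
};

/* The callback that must run before the context goes away. */
static void release_refs_cb(void *arg)
{
	struct ctx *c = arg;

	if (c->freed)
		c->freed_too_early = 1;	/* callback touched a freed context */
	c->refs_released = 1;
}

/* The deferred work item: optionally wait for callbacks, then free. */
static void free_work(struct ctx *c, int use_barrier)
{
	if (use_barrier)
		model_rcu_barrier();
	if (!c->refs_released)
		c->freed_too_early = 1;
	c->freed = 1;
}

/* Returns 1 if the callback ran before the context was marked freed. */
int teardown_is_safe(int use_barrier)
{
	struct ctx c = { 0, 0, 0 };

	model_call_rcu(release_refs_cb, &c);	/* queued first */
	free_work(&c, use_barrier);		/* work runs afterwards */
	model_rcu_barrier();			/* drain leftovers for reuse */
	return !c.freed_too_early;
}
```

In this model teardown_is_safe(1) reports a safe teardown and
teardown_is_safe(0) does not, because without the barrier the work item
marks the context freed while the callback is still pending.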

Jens could invoke the rcu_barrier() just before scheduling the work,
but the synchronous delay from the rcu_barrier() is a problem.

Jens, what did I mess up in the above story? ;-)

I defer to Jens and Tejun on the possibility of ending up with all
workqueue kthreads waiting on rcu_barrier(). If that is a problem,
there are some ways of dealing with it, though none that I can think of
that come for free.

Thanx, Paul

2020-03-13 21:35:18

by Jens Axboe

Subject: Re: [GIT PULL] io_uring fixes for 5.6-rc

On 3/13/20 2:18 PM, Linus Torvalds wrote:
> On Fri, Mar 13, 2020 at 10:50 AM Jens Axboe wrote:
>>
>> Just a single fix here, improving the RCU callback ordering from last
>> week. After a bit more perusing by Paul, he poked a hole in the
>> original.
>
> Ouch.
>
> If I read this patch correctly, you're now adding a rcu_barrier() onto
> the system workqueue for each io_uring context freeing op.

It's actually not quite that bad, it's only for every context that has
used registered files. Those will generally be long term use cases, like
server backend kind of stuff, not short lived or "normal" use cases.

> This makes me worry:
>
> - I think system_wq is unordered, so does it even guarantee that the
> rcu_barrier happens after whatever work you're expecting it to be
> after?

The ordering is wrt an rcu callback that's already queued. So we don't
care about ordering of other work at all, we just care about issuing
that rcu_barrier() before we exit + free, so we know that the existing
(if any) rcu callback has run.

> Or is it using a workqueue not because it wants to serialize with any
> other work, but because it needs to use rcu_barrier in a context where
> it can't sleep?

Really just using a workqueue because we already have one for this
particular item, and that takes the latency of needing the rcu barrier
out of the fast path for the application.

> But the commit message does seem to imply that ordering is important..

Only for a previous rcu callback, not for work items!

> - doesn't this have the potential to flood the system_wq with
> flushing things that could all take a while..
>
> I've pulled it, and it may all be correct, just chalk this message up
> to "Linus got nervous looking at it".

All good, always appreciate extra eyes on it! We could do the
rcu_barrier() inline and just take the hit there, and there's also room
to be a bit smarter and only do the barrier if we know we have to. But
since this is 5.6 material, I didn't want to complicate things further.
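
One possible reading of "only do the barrier if we know we have to" is
to track whether a callback was ever queued for the context and skip
the barrier otherwise. The sketch below is a userspace toy model of
that idea, not what the patch does: struct model_ctx,
model_queue_callback(), model_barrier_for() and model_ctx_free() are
hypothetical names, and the flag handling ignores the memory ordering
and locking that real kernel code would need.

```c
#include <stdbool.h>

/* Toy model only: every name in this block is hypothetical. */
struct model_ctx {
	bool cb_queued;		/* a callback may still be pending */
	int barriers_done;	/* how many times we paid for a barrier */
};

/* Stand-in for the point where the context would call call_rcu(). */
void model_queue_callback(struct model_ctx *c)
{
	c->cb_queued = true;
}

/* Stand-in for rcu_barrier(): here it only records that it ran. */
static void model_barrier_for(struct model_ctx *c)
{
	c->barriers_done++;
	c->cb_queued = false;
}

/* Teardown: pay for the barrier only if a callback may be pending. */
void model_ctx_free(struct model_ctx *c)
{
	if (c->cb_queued)
		model_barrier_for(c);
	/* ... the context memory would be released here ... */
}

/* Returns the number of barriers issued while tearing down one context. */
int barriers_for_teardown(bool queued_callback)
{
	struct model_ctx c = { false, 0 };

	if (queued_callback)
		model_queue_callback(&c);
	model_ctx_free(&c);
	return c.barriers_done;
}
```

A context that never queued a callback tears down with zero barriers in
this model, while one that did pays for exactly one.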

--
Jens Axboe