2021-06-22 20:46:37

by Pavel Begunkov

Subject: Re: [PATCH 1/2 v2] io_uring: Fix race condition when sqp thread goes to sleep

On 6/22/21 7:55 PM, Olivier Langlois wrote:
> If an asynchronous completion happens before the task is preparing
> itself to wait and set its state to TASK_INTERRUPTIBLE, the completion
> will not wake up the sqp thread.
>
> Signed-off-by: Olivier Langlois <[email protected]>
> ---
> fs/io_uring.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index fc8637f591a6..02f789e07d4c 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -6902,7 +6902,7 @@ static int io_sq_thread(void *data)
> }
>
> prepare_to_wait(&sqd->wait, &wait, TASK_INTERRUPTIBLE);
> - if (!io_sqd_events_pending(sqd)) {
> + if (!io_sqd_events_pending(sqd) && !current->task_works) {

Agree that it should be here, but we also lack a good enough
task_work_run() around, and that may make the task burn CPU
for a while in some cases. Let's do

if (!io_sqd_events_pending(sqd) && !io_run_task_work())
...

fwiw, no need to worry about TASK_INTERRUPTIBLE as
io_run_task_work() sets it to TASK_RUNNING.

> needs_sched = true;
> list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
> io_ring_set_wakeup_flag(ctx);
>

--
Pavel Begunkov
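
The condition suggested above can be modelled in userspace as a minimal sketch. Everything prefixed with model_ is a hypothetical stand-in, not kernel code: model_run_task_work() imitates the behaviour described for io_run_task_work() (run pending work, put the task back into a running state, report whether anything ran), and model_should_sleep() imitates the check made right after prepare_to_wait().

```c
#include <stdbool.h>

/* Hypothetical userspace model of the sleep decision; not kernel code. */
struct model_task {
    bool interruptible; /* stands in for TASK_INTERRUPTIBLE */
    int pending_work;   /* stands in for queued task_work items */
    int work_done;      /* counts items that have been run */
};

/*
 * Imitates the described io_run_task_work() behaviour: if work is
 * queued, switch back to "running", run it, and return true.
 */
static bool model_run_task_work(struct model_task *t)
{
    if (t->pending_work == 0)
        return false;
    t->interruptible = false; /* back to the TASK_RUNNING equivalent */
    t->work_done += t->pending_work;
    t->pending_work = 0;
    return true;
}

/*
 * Imitates the suggested check: after "prepare_to_wait()", sleep only
 * if no events are pending and no task work had to be run.
 */
static bool model_should_sleep(struct model_task *t, bool events_pending)
{
    t->interruptible = true; /* the prepare_to_wait() equivalent */
    return !events_pending && !model_run_task_work(t);
}
```

In this model, work queued before the check is drained and the task does not sleep; only when nothing is pending does it stay interruptible and go to sleep.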


2021-06-22 22:38:03

by Olivier Langlois

Subject: Re: [PATCH 1/2 v2] io_uring: Fix race condition when sqp thread goes to sleep

On Tue, 2021-06-22 at 21:45 +0100, Pavel Begunkov wrote:
> On 6/22/21 7:55 PM, Olivier Langlois wrote:
> > If an asynchronous completion happens before the task is preparing
> > itself to wait and set its state to TASK_INTERRUPTIBLE, the
> > completion
> > will not wake up the sqp thread.
> >
> > Signed-off-by: Olivier Langlois <[email protected]>
> > ---
> >  fs/io_uring.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/io_uring.c b/fs/io_uring.c
> > index fc8637f591a6..02f789e07d4c 100644
> > --- a/fs/io_uring.c
> > +++ b/fs/io_uring.c
> > @@ -6902,7 +6902,7 @@ static int io_sq_thread(void *data)
> >                 }
> >
> >                 prepare_to_wait(&sqd->wait, &wait, TASK_INTERRUPTIBLE);
> > -               if (!io_sqd_events_pending(sqd)) {
> > +               if (!io_sqd_events_pending(sqd) && !current->task_works) {
>
> Agree that it should be here, but we also lack a good enough
> task_work_run() around, and that may make the task burn CPU
> for a while in some cases. Let's do
>
> if (!io_sqd_events_pending(sqd) && !io_run_task_work())
>    ...

I can do that if you want but considering that the function is inline
and the race condition is a relatively rare occurrence, is the cost
coming with inline expansion really worth it in this case?
>
> fwiw, no need to worry about TASK_INTERRUPTIBLE as
> io_run_task_work() sets it to TASK_RUNNING.

I wasn't worried about that as I believe that finish_wait() is taking
care of the state as well.

What I wasn't sure about was if the patch was sufficient to totally
eliminate the race condition.

I had to educate myself about how schedule() works to appreciate its
design and convince myself that the patch was good.
>
> >                         needs_sched = true;
> >                         list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
> >                                 io_ring_set_wakeup_flag(ctx);
> >
>


2021-06-22 22:44:07

by Olivier Langlois

Subject: Re: [PATCH 1/2 v2] io_uring: Fix race condition when sqp thread goes to sleep

On Tue, 2021-06-22 at 18:37 -0400, Olivier Langlois wrote:
> On Tue, 2021-06-22 at 21:45 +0100, Pavel Begunkov wrote:
> > On 6/22/21 7:55 PM, Olivier Langlois wrote:
> > > If an asynchronous completion happens before the task is
> > > preparing
> > > itself to wait and set its state to TASK_INTERRUPTIBLE, the
> > > completion
> > > will not wake up the sqp thread.
> > >
> > > Signed-off-by: Olivier Langlois <[email protected]>
> > > ---
> > >  fs/io_uring.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/fs/io_uring.c b/fs/io_uring.c
> > > index fc8637f591a6..02f789e07d4c 100644
> > > --- a/fs/io_uring.c
> > > +++ b/fs/io_uring.c
> > > @@ -6902,7 +6902,7 @@ static int io_sq_thread(void *data)
> > >                 }
> > >
> > >                 prepare_to_wait(&sqd->wait, &wait, TASK_INTERRUPTIBLE);
> > > -               if (!io_sqd_events_pending(sqd)) {
> > > +               if (!io_sqd_events_pending(sqd) && !current->task_works) {
> >
> > Agree that it should be here, but we also lack a good enough
> > task_work_run() around, and that may make the task burn CPU
> > for a while in some cases. Let's do
> >
> > if (!io_sqd_events_pending(sqd) && !io_run_task_work())
> >    ...
>
> I can do that if you want but considering that the function is inline
> and the race condition is a relatively rare occurrence, is the cost
> coming with inline expansion really worth it in this case?
> >
On one hand, there is the inline expansion concern.

OTOH, the benefit of going with your suggestion is that completions
generally precede new submissions so yes, it might be better that way.

I'm really unsure about this. I'm just raising the concern and I'll let
you make the final decision...


2021-06-22 23:04:16

by Pavel Begunkov

Subject: Re: [PATCH 1/2 v2] io_uring: Fix race condition when sqp thread goes to sleep

On 6/22/21 11:42 PM, Olivier Langlois wrote:
> On Tue, 2021-06-22 at 18:37 -0400, Olivier Langlois wrote:
>> On Tue, 2021-06-22 at 21:45 +0100, Pavel Begunkov wrote:
>>> On 6/22/21 7:55 PM, Olivier Langlois wrote:
>>>> If an asynchronous completion happens before the task is
>>>> preparing
>>>> itself to wait and set its state to TASK_INTERRUPTIBLE, the
>>>> completion
>>>> will not wake up the sqp thread.
>>>>
>>>> Signed-off-by: Olivier Langlois <[email protected]>
>>>> ---
>>>>  fs/io_uring.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index fc8637f591a6..02f789e07d4c 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -6902,7 +6902,7 @@ static int io_sq_thread(void *data)
>>>>                 }
>>>>  
>>>>                 prepare_to_wait(&sqd->wait, &wait, TASK_INTERRUPTIBLE);
>>>> -               if (!io_sqd_events_pending(sqd)) {
>>>> +               if (!io_sqd_events_pending(sqd) && !current->task_works) {
>>>
>>> Agree that it should be here, but we also lack a good enough
>>> task_work_run() around, and that may make the task burn CPU
>>> for a while in some cases. Let's do
>>>
>>> if (!io_sqd_events_pending(sqd) && !io_run_task_work())
>>>    ...
>>
>> I can do that if you want but considering that the function is inline
>> and the race condition is a relatively rare occurrence, is the cost
>> coming with inline expansion really worth it in this case?
>>>
> On one hand, there is the inline expansion concern.
>
> OTOH, the benefit of going with your suggestion is that completions
> generally precede new submissions so yes, it might be better that way.
>
> I'm really unsure about this. I'm just raising the concern and I'll let
> you make the final decision...

It seems it may actually loop infinitely until it gets a signal,
so yes. And even if not, rare stalls are nasty: they will ruin
some 9s of latency and are hard to catch.

That part is quite cold anyway; it would generate some extra cold
instructions, meh

--
Pavel Begunkov

2021-06-23 13:54:14

by Olivier Langlois

Subject: Re: [PATCH 1/2 v2] io_uring: Fix race condition when sqp thread goes to sleep

On Wed, 2021-06-23 at 00:03 +0100, Pavel Begunkov wrote:
> On 6/22/21 11:42 PM, Olivier Langlois wrote:
> > On Tue, 2021-06-22 at 18:37 -0400, Olivier Langlois wrote:
> > > On Tue, 2021-06-22 at 21:45 +0100, Pavel Begunkov wrote:
> > >
> > >
> > > I can do that if you want but considering that the function is
> > > inline
> > > and the race condition is a relatively rare occurrence, is the
> > > cost
> > > coming with inline expansion really worth it in this case?
> > > >
> > On one hand, there is the inline expansion concern.
> >
> > OTOH, the benefit of going with your suggestion is that completions
> > generally precede new submissions so yes, it might be better that
> > way.
> >
> > I'm really unsure about this. I'm just raising the concern and I'll
> > let
> > you make the final decision...
>
> It seems it may actually loop infinitely until it gets a signal,
> so yes. And even if not, rare stalls are nasty: they will ruin
> some 9s of latency and are hard to catch.
>
> That part is quite cold anyway; it would generate some extra cold
> instructions, meh
>
I'm not 100% sure I see the infinite loop possibility, but I guess that
with some badly placed preemptions it could take a few iterations before
entering the block:

if (sqt_spin || !time_after(jiffies, timeout)) {

So I will go ahead with your suggestion.

I'll retest the new patch version (it should be a formality) and I'll
resend an update once done.

Greetings,


2021-06-23 16:04:04

by Pavel Begunkov

Subject: Re: [PATCH 1/2 v2] io_uring: Fix race condition when sqp thread goes to sleep

On 6/23/21 2:52 PM, Olivier Langlois wrote:
> On Wed, 2021-06-23 at 00:03 +0100, Pavel Begunkov wrote:
>> On 6/22/21 11:42 PM, Olivier Langlois wrote:
>>> On Tue, 2021-06-22 at 18:37 -0400, Olivier Langlois wrote:
>>>> On Tue, 2021-06-22 at 21:45 +0100, Pavel Begunkov wrote:
>>>>
>>>>
>>>> I can do that if you want but considering that the function is
>>>> inline
>>>> and the race condition is a relatively rare occurrence, is the
>>>> cost
>>>> coming with inline expansion really worth it in this case?
>>>>>
>>> On one hand, there is the inline expansion concern.
>>>
>>> OTOH, the benefit of going with your suggestion is that completions
>>> generally precede new submissions so yes, it might be better that
>>> way.
>>>
>>> I'm really unsure about this. I'm just raising the concern and I'll
>>> let
>>> you make the final decision...
>>
>> It seems it may actually loop infinitely until it gets a signal,
>> so yes. And even if not, rare stalls are nasty: they will ruin
>> some 9s of latency and are hard to catch.
>>
>> That part is quite cold anyway; it would generate some extra cold
>> instructions, meh
>>
> I'm not 100% sure I see the infinite loop possibility, but I guess that
> with some badly placed preemptions it could take a few iterations before
> entering the block:
>
> if (sqt_spin || !time_after(jiffies, timeout)) {

Had a case in mind, but looking through the branches it can't
really happen. Agreed that it won't be infinite in real life, at least
until we start using finer-grained timeouts (there was an RFC).

In any case, for several reasons, I think it's the right thing to do.

> So I will go ahead with your suggestion.
>
> I'll retest the new patch version (it should be a formality) and I'll
> resend an update once done.

Perfect

--
Pavel Begunkov
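
The block discussed in this thread is gated on `sqt_spin || !time_after(jiffies, timeout)`. As a minimal userspace sketch of that gate, model_time_after() below re-creates the wraparound-safe comparison behind the kernel's time_after() macro as a plain function. The model_ names, the typedef, and passing the current tick count as a parameter are assumptions made for illustration, and the sketch says nothing about what the block itself does.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's jiffies counter type. */
typedef unsigned long model_jiffies_t;

/*
 * True when tick value a is later than b. The signed difference keeps
 * the comparison correct across counter wraparound, provided the two
 * values are less than half the counter range apart.
 */
static bool model_time_after(model_jiffies_t a, model_jiffies_t b)
{
    return (long)(b - a) < 0;
}

/*
 * Models the quoted condition: the block is entered while the last
 * pass did work (sqt_spin) or the timeout has not yet passed.
 */
static bool model_enters_block(bool sqt_spin, model_jiffies_t now,
                               model_jiffies_t timeout)
{
    return sqt_spin || !model_time_after(now, timeout);
}
```

Under this model, a thread with nothing to do stops entering the block only once the current tick count has moved past the timeout, which matches the bounded number of iterations discussed above.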