2019-09-22 19:30:29

by Jens Axboe

Subject: Re: [PATCH v2 0/2] Optimise io_uring completion waiting

On 9/22/19 2:08 AM, Pavel Begunkov (Silence) wrote:
> From: Pavel Begunkov <[email protected]>
>
> There could be a lot of overhead within generic wait_event_*() used for
> waiting for large number of completions. The patchset removes much of
> it by using custom wait event (wait_threshold).
>
> Synthetic test showed ~40% performance boost. (see patch 2)

I'm fine with the io_uring side of things, but to queue this up we
really need Peter or Ingo to sign off on the core wakeup bits...

Peter?

--
Jens Axboe


2019-09-24 16:54:22

by Ingo Molnar

Subject: Re: [PATCH v2 0/2] Optimise io_uring completion waiting


* Jens Axboe <[email protected]> wrote:

> On 9/22/19 2:08 AM, Pavel Begunkov (Silence) wrote:
> > From: Pavel Begunkov <[email protected]>
> >
> > There could be a lot of overhead within generic wait_event_*() used for
> > waiting for large number of completions. The patchset removes much of
> > it by using custom wait event (wait_threshold).
> >
> > Synthetic test showed ~40% performance boost. (see patch 2)
>
> I'm fine with the io_uring side of things, but to queue this up we
> really need Peter or Ingo to sign off on the core wakeup bits...
>
> Peter?

I'm not sure an extension is needed for such a special interface, why not
just put a ->threshold value next to the ctx->wait field and use either
the regular wait_event() APIs with the proper condition, or
wait_event_cmd() style APIs if you absolutely need something more complex
to happen inside?

Should result in a much lower linecount and no scheduler changes. :-)

Thanks,

Ingo

2019-09-25 20:50:24

by Pavel Begunkov

Subject: Re: [PATCH v2 0/2] Optimise io_uring completion waiting

Hi, and thanks for the feedback.

It could indeed be done with @cond; that's how it works for now.
However, this patchset addresses performance issues only.

The problem with wait_event_*() is that, if we have a counter and try to
wake up tasks after each increment, each waiting task gets scheduled
O(threshold) times just to spuriously check @cond and go back to sleep.
All that overhead (memory barriers, register save/load, accounting, etc.)
turned out to be enough to slow down the system for some workloads.
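
To put rough numbers on that, here is a toy user-space model in Python (not
kernel code) that counts how often each waiter gets scheduled under the two
schemes. The function names count_wakeups_generic() and
count_wakeups_threshold() are made up for this illustration, and the model
assumes one wake-up call per counter increment and ignores all real scheduler
costs.

```python
def count_wakeups_generic(thresholds, total):
    """Model of wait_event_*(): every increment wakes every sleeping waiter."""
    wakeups = [0] * len(thresholds)
    done = [False] * len(thresholds)
    for n in range(1, total + 1):          # counter goes 1, 2, ..., total
        for i, threshold in enumerate(thresholds):
            if done[i]:
                continue
            wakeups[i] += 1                # scheduled just to re-check @cond
            if n >= threshold:
                done[i] = True             # condition holds, waiter is finished
    return wakeups


def count_wakeups_threshold(thresholds, total):
    """Model of the threshold scheme: the waker checks before waking."""
    wakeups = [0] * len(thresholds)
    done = [False] * len(thresholds)
    for n in range(1, total + 1):
        for i, threshold in enumerate(thresholds):
            if not done[i] and n >= threshold:
                wakeups[i] += 1            # woken exactly once, when it matters
                done[i] = True
    return wakeups


if __name__ == "__main__":
    print(count_wakeups_generic([1, 8, 32], 32))
    print(count_wakeups_threshold([1, 8, 32], 32))
```

In this model a waiter with threshold T is scheduled T times by the generic
scheme and at most once by the threshold scheme; the list walk on the waker
side is the same in both.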

With this specialisation it still traverses the wait list and makes
indirect calls to the checker callback, but the list is presumably
fairly small, so performance there shouldn't be a problem, at least for
now.

Regarding semantics: it should wake a task when the value passed to
wake_up_threshold() is greater than or equal to the task's threshold,
which is specified individually for each task in wait_threshold_*().

In pseudo code:
```
def wake_up_threshold(n, wait_queue):
    # wake only those waiters whose own threshold has been reached
    for waiter in wait_queue:
        waiter.wake_up_if(n >= waiter.threshold)
```
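
For anyone who wants to play with these semantics outside the kernel, a
minimal runnable Python sketch follows. The Waiter class and its fields are
hypothetical, and removing a woken waiter from the queue is an assumption of
this sketch (similar in spirit to an autoremove wake function), not something
taken from the patch itself.

```python
from dataclasses import dataclass


@dataclass
class Waiter:
    # Hypothetical stand-in for a wait queue entry with a per-task threshold.
    name: str
    threshold: int
    woken: bool = False


def wake_up_threshold(n, wait_queue):
    """Wake every waiter whose threshold is <= n and return their names.

    Woken waiters are dropped from wait_queue (assumption of this sketch).
    """
    woken = []
    still_waiting = []
    for waiter in wait_queue:
        if n >= waiter.threshold:
            waiter.woken = True
            woken.append(waiter.name)
        else:
            still_waiting.append(waiter)
    wait_queue[:] = still_waiting          # update the caller's list in place
    return woken


if __name__ == "__main__":
    queue = [Waiter("a", 2), Waiter("b", 5)]
    for completions in (1, 2, 7):
        print(completions, wake_up_threshold(completions, queue))
```

Waiters below their threshold are never touched, which is where the saving
over re-checking @cond in every task comes from.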

Any thoughts on how to do it better? Ideas are very welcome.

BTW, this monster is mostly a copy-paste from wait_event_*() and
wait_bit_*(). We could try to extract some common parts from these
three, but that's another topic.


On 23/09/2019 11:35, Ingo Molnar wrote:
>
> * Jens Axboe <[email protected]> wrote:
>
>> On 9/22/19 2:08 AM, Pavel Begunkov (Silence) wrote:
>>> From: Pavel Begunkov <[email protected]>
>>>
>>> There could be a lot of overhead within generic wait_event_*() used for
>>> waiting for large number of completions. The patchset removes much of
>>> it by using custom wait event (wait_threshold).
>>>
>>> Synthetic test showed ~40% performance boost. (see patch 2)
>>
>> I'm fine with the io_uring side of things, but to queue this up we
>> really need Peter or Ingo to sign off on the core wakeup bits...
>>
>> Peter?
>
> I'm not sure an extension is needed for such a special interface, why not
> just put a ->threshold value next to the ctx->wait field and use either
> the regular wait_event() APIs with the proper condition, or
> wait_event_cmd() style APIs if you absolutely need something more complex
> to happen inside?
>
> Should result in a much lower linecount and no scheduler changes. :-)
>
> Thanks,
>
> Ingo
>

--
Yours sincerely,
Pavel Begunkov

