LinuxLists.cc - [PATCH] smp, ipi: Speed up IPI handling by invoking the callbacks in reverse order

2014-06-04 19:41:00

Subject: [PATCH] smp, ipi: Speed up IPI handling by invoking the callbacks in reverse order

The current implementation of lockless list (llist) has a drawback: if we
want to traverse the list in FIFO order (oldest to newest), we need to
reverse the list first (and this can be expensive if the list is large,
since this is an O(n) operation).

However, for callbacks that are queued using smp-call-function IPIs, the
requirement is that:
a. we invoke all of them, without missing any.
b. we invoke them as soon as possible.

In other words, we don't actually (need to) guarantee that the callbacks
will be invoked in FIFO order. So don't bother reversing the list; just
invoke the callbacks as they are (i.e., in reverse order). This would
probably speed up the smp-call-function interrupt handler a tiny bit, when
flushing multiple pending callbacks upon receiving a single IPI.

But for debugging purposes, reverse the list and print it in the original
(FIFO) order in the WARN_ON case.

Signed-off-by: Srivatsa S. Bhat <[email protected]>
---

kernel/smp.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 5295388..be55094 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -229,7 +229,6 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)

 	head = &__get_cpu_var(call_single_queue);
 	entry = llist_del_all(head);
-	entry = llist_reverse_order(entry);

 	/* There shouldn't be any pending callbacks on an offline CPU. */
 	if (unlikely(warn_cpu_offline && !cpu_online(smp_processor_id()) &&
@@ -237,6 +236,8 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
 		warned = true;
 		WARN(1, "IPI on offline CPU %d\n", smp_processor_id());

+		entry = llist_reverse_order(entry);
+
 		/*
 		 * We don't have to use the _safe() variant here
 		 * because we are not invoking the IPI handlers yet.
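
As a rough userspace illustration of the point the patch relies on: pushing
at the head of a lockless list and then detaching the whole chain yields the
entries newest-first, and recovering FIFO order costs one extra O(n) pass.
This is only a sketch, not the kernel's llist code. The names lq_node,
lq_head, lq_push, lq_del_all, lq_reverse and drain_order are hypothetical,
and C11 atomics stand in for the kernel's primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel's llist types. */
struct lq_node {
	struct lq_node *next;
	int id;			/* enqueue sequence number, for illustration */
};

struct lq_head {
	_Atomic(struct lq_node *) first;
};

/* Push at the head with a compare-and-swap loop (newest entry first). */
static void lq_push(struct lq_node *n, struct lq_head *h)
{
	assert(n != NULL && h != NULL);
	struct lq_node *old = atomic_load(&h->first);
	do {
		n->next = old;	/* 'old' is refreshed when the CAS fails */
	} while (!atomic_compare_exchange_weak(&h->first, &old, n));
}

/* Detach the whole chain with a single atomic exchange. */
static struct lq_node *lq_del_all(struct lq_head *h)
{
	return atomic_exchange(&h->first, (struct lq_node *)NULL);
}

/* O(n) pointer reversal: turns newest-first into oldest-first. */
static struct lq_node *lq_reverse(struct lq_node *head)
{
	struct lq_node *prev = NULL;

	while (head) {
		struct lq_node *next = head->next;
		head->next = prev;
		prev = head;
		head = next;
	}
	return prev;
}

/*
 * Push ids 0..n-1, detach the chain, optionally reverse it, and record
 * the order in which the entries would be visited into out[].
 */
static void drain_order(int n, int reverse, struct lq_node *storage, int *out)
{
	struct lq_head head;
	struct lq_node *entry;
	int k = 0;

	atomic_init(&head.first, (struct lq_node *)NULL);
	for (int i = 0; i < n; i++) {
		storage[i].id = i;
		lq_push(&storage[i], &head);
	}

	entry = lq_del_all(&head);
	if (reverse)
		entry = lq_reverse(entry);

	for (; entry; entry = entry->next)
		out[k++] = entry->id;
}
```

With three entries pushed as ids 0, 1, 2, drain_order(3, 0, ...) records
2, 1, 0 (the order the patched handler would use), while
drain_order(3, 1, ...) records 0, 1, 2 (the order the existing code uses).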

2014-06-04 19:47:37

by Peter Zijlstra

Subject: Re: [PATCH] smp, ipi: Speed up IPI handling by invoking the callbacks in reverse order

On Thu, Jun 05, 2014 at 01:09:35AM +0530, Srivatsa S. Bhat wrote:
> The current implementation of lockless list (llist) has a drawback: if we
> want to traverse the list in FIFO order (oldest to newest), we need to
> reverse the list first (and this can be expensive if the list is large,
> since this is an O(n) operation).

Have you actually looked at the queue depth of this thing? Large queues
are a problem for interrupt latency.

2014-06-04 20:08:52

by Srivatsa S. Bhat

Subject: Re: [PATCH] smp, ipi: Speed up IPI handling by invoking the callbacks in reverse order

On 06/05/2014 01:17 AM, Peter Zijlstra wrote:
> On Thu, Jun 05, 2014 at 01:09:35AM +0530, Srivatsa S. Bhat wrote:
>> The current implementation of lockless list (llist) has a drawback: if we
>> want to traverse the list in FIFO order (oldest to newest), we need to
>> reverse the list first (and this can be expensive if the list is large,
>> since this is an O(n) operation).
>
> Have you actually looked at the queue depth of this thing? Large queues
> are a problem for interrupt latency.
>

Actually, I wrote this patch just by looking at the code and realizing
that we don't need to reverse the list. In practice, I haven't actually
seen any noticeable interrupt latencies or large queues so far. So I think
this patch is just a very tiny optimization, that's all.

Regards,
Srivatsa S. Bhat

2014-06-05 07:26:42

by Peter Zijlstra

Subject: Re: [PATCH] smp, ipi: Speed up IPI handling by invoking the callbacks in reverse order

On Thu, Jun 05, 2014 at 01:37:25AM +0530, Srivatsa S. Bhat wrote:
> On 06/05/2014 01:17 AM, Peter Zijlstra wrote:
> > On Thu, Jun 05, 2014 at 01:09:35AM +0530, Srivatsa S. Bhat wrote:
> >> The current implementation of lockless list (llist) has a drawback: if we
> >> want to traverse the list in FIFO order (oldest to newest), we need to
> >> reverse the list first (and this can be expensive if the list is large,
> >> since this is an O(n) operation).
> >
> > Have you actually looked at the queue depth of this thing? Large queues
> > are a problem for interrupt latency.
> >
>
> Actually, I wrote this patch just by looking at the code and realizing
> that we don't need to reverse the list. In practice, I haven't actually
> seen any noticeable interrupt latencies or large queues so far. So I think
> this patch is just a very tiny optimization, that's all.

So conceptually it makes sense to service in FIFO because the first
entry is waiting longest, by servicing them in LIFO order you get far
more variance in latency.

And if the list is small, the cost isn't high.

Then again, we don't have any good numbers one way or the other.
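
In the absence of measurements, a toy model can at least make the variance
argument concrete. The sketch below is an assumption-laden illustration, not
data from a real system: entry i is enqueued at time i * gap, the handler
starts when the last entry arrives, every callback takes the same cost, and
wait_spread (a hypothetical helper) returns the difference between the
longest and shortest time an entry waits before its callback starts.

```c
#include <assert.h>

/*
 * Toy model (an assumption, not a measurement): entry i is enqueued at
 * time i * gap, the handler starts when the last entry arrives, and each
 * callback takes 'cost' time units. Returns max wait minus min wait.
 */
static long wait_spread(int n, long gap, long cost, int lifo)
{
	assert(n > 0 && gap >= 0 && cost >= 0);

	long start = (long)(n - 1) * gap;	/* handler begins here */
	long min = 0, max = 0;

	for (int i = 0; i < n; i++) {
		/* Service position of entry i under the chosen order. */
		int pos = lifo ? (n - 1 - i) : i;
		long wait = start + pos * cost - i * gap;

		if (i == 0 || wait < min)
			min = wait;
		if (i == 0 || wait > max)
			max = wait;
	}
	return max - min;
}
```

In this model the FIFO spread works out to (n - 1) * |cost - gap| and the
LIFO spread to (n - 1) * (cost + gap), so LIFO is never tighter. For
example, wait_spread(4, 1, 1, 0) is 0 while wait_spread(4, 1, 1, 1) is 6.
Whether the difference matters at realistic queue depths is exactly the
open question above.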

2014-06-06 07:38:41

by Srivatsa S. Bhat

Subject: Re: [PATCH] smp, ipi: Speed up IPI handling by invoking the callbacks in reverse order

On 06/05/2014 12:56 PM, Peter Zijlstra wrote:
> On Thu, Jun 05, 2014 at 01:37:25AM +0530, Srivatsa S. Bhat wrote:
>> On 06/05/2014 01:17 AM, Peter Zijlstra wrote:
>>> On Thu, Jun 05, 2014 at 01:09:35AM +0530, Srivatsa S. Bhat wrote:
>>>> The current implementation of lockless list (llist) has a drawback: if we
>>>> want to traverse the list in FIFO order (oldest to newest), we need to
>>>> reverse the list first (and this can be expensive if the list is large,
>>>> since this is an O(n) operation).
>>>
>>> Have you actually looked at the queue depth of this thing? Large queues
>>> are a problem for interrupt latency.
>>>
>>
>> Actually, I wrote this patch just by looking at the code and realizing
>> that we don't need to reverse the list. In practice, I haven't actually
>> seen any noticeable interrupt latencies or large queues so far. So I think
>> this patch is just a very tiny optimization, that's all.
>
> So conceptually it makes sense to service in FIFO because the first
> entry is waiting longest, by servicing them in LIFO order you get far
> more variance in latency.
>
> And if the list is small, the cost isn't high.
>
> Then again, we don't have any good numbers one way or the other.
>

Hmm, right. I thought hard to see if there is a clever way to maintain
the llist in the FIFO order itself, while still preserving the atomicity
guarantees, but I couldn't think of anything sane :-(

Regards,
Srivatsa S. Bhat