From: Andy Lutomirski
Date: Tue, 28 Apr 2015 13:42:10 -0700
Subject: Re: [GIT PULL] kdbus for 4.1-rc1
To: David Lang
Cc: Havoc Pennington, Linus Torvalds, Lukasz Skalski, Greg Kroah-Hartman,
    Andrew Morton, Arnd Bergmann, "Eric W. Biederman", One Thousand Gnomes,
    Tom Gundersen, Jiri Kosina, "linux-kernel@vger.kernel.org", Daniel Mack,
    David Herrmann, Djalal Harouni
References: <20150413190350.GA9485@kroah.com> <20150423130548.GA4253@kroah.com>
    <20150423163616.GA10874@kroah.com> <20150423171640.GA11227@kroah.com>
    <553A4A2F.5090406@samsung.com>

On Tue, Apr 28, 2015 at 1:34 PM, David Lang wrote:
> On Tue, 28 Apr 2015, Havoc Pennington wrote:
>
>> On Tue, Apr 28, 2015 at 1:19 PM, David Lang wrote:
>>>
>>> If the examples that are being used to show the performance advantage of
>>> kdbus vs normal dbus are doing the wrong thing, then we need to get some
>>> other examples available to people who don't live and breathe dbus that
>>> 'do things right', so that the kernel developers can see what you think
>>> is the real problem and how kdbus addresses it.
>>>
>>> So far, this 'wrong' example is the only thing that's been posted to show
>>> the performance advantage of kdbus.
>>
>> I'm hopeful someone will do that.
>>
>> fwiw, I would be suspicious of a broken benchmark if it didn't show:
>>
>> * the bus daemon means an extra read/parse and marshal/write per
>>   message, so 4 vs. 2
>> * the existence of the bus daemon therefore makes a message
>>   send/receive take roughly twice as long
>>
>> https://lwn.net/Articles/580194/ has a bit more elaboration about the
>> number of copies, validations, and context switches in each case.
>>
>> From what I can tell, the core performance claim for kdbus is that for
>> a userspace daemon to be a routing intermediary, it has to receive and
>> re-send messages. If the baseline performance of IPC is the cost to
>> send once and receive once, adding the daemon means there's twice as
>> much to do (1 more receive, 1 more send). However fast you make
>> send/receive, the daemon always means there are twice as many
>> send/receives as there would be with no daemon.
>
> There are twice as many context switches, nobody disputes that; the
> question is whether it matters.
>
> It doesn't matter whether the message router is in kernel space or user
> space, it still needs to read/parse and marshal/write the data, so you
> aren't saving that time by being in the kernel.
>
>> If that isn't what a benchmark shows, then there's a mystery to
>> explain... (one disruption to the ratio of course could be if the
>> clients use a much faster or slower dbus lib than the daemon)
>>
>> As noted many times, of course this 2x penalty for the daemon was a
>> conscious tradeoff - kdbus is trying to escape the tradeoff in order
>> to extend usage of dbus to more use cases.
>> Given the tradeoff, _existing_ uses of dbus seem to prefer the
>> performance hit to the loss of useful semantics, but potential new
>> users would like to, or need to, have both.
>
> If there is a 2x performance improvement from being in the kernel, but a
> 100x performance improvement from fixing the userspace code, the effort
> should be spent on the userspace code, not on moving things into kernel
> space.

I would guess that, if we compared a highly optimized userspace
implementation to a kernel implementation, we'd see less than a 2x
difference. After all, a userspace daemon doesn't really need to
unmarshal and re-marshal anything except the headers. For large
messages, we could use splice and avoid a couple of copies, too.

If the scheduler became a bottleneck, it could be interesting to add
something like a send-and-poll primitive. I suspect that some workloads
currently do unnecessary context switches with only standard POSIX
primitives: if A sends a message to B, there's a brief window in which
both A and B are runnable. Ideally we wouldn't context switch until A
calls poll or epoll_wait, but I don't know how well that works in
practice.

There's more room for generic improvements than just that. At LSF/MM we
were talking about more scalable epoll variants that would allow a
multithreaded daemon to be woken up on the core that received the
incoming data. That would allow an efficient multi-queue dbus with fewer
migrations and IPIs.

At some point, I'd like to implement PCID on x86 (if no one beats me to
it, and this is a low priority for me), which would allow us to skip
expensive TLB flushes when context switching. I have no idea whether ARM
can do something similar.

--Andy
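
To make the "only the headers" point above concrete, here is a minimal
sketch of what the hot path of a userspace router could look like: it
reads the 16-byte fixed D-Bus header (endianness, type, flags, version,
body length, serial, header-field array length), computes the total
message size, and forwards the bytes verbatim without unmarshalling the
body. The dst_for() routing lookup is hypothetical, little-endian
messages are assumed, and oversized messages are simply rejected where a
real router would loop or use splice().

#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Read exactly len bytes or fail. */
static int read_full(int fd, void *p, size_t len)
{
	size_t off = 0;

	while (off < len) {
		ssize_t n = read(fd, (char *)p + off, len - off);
		if (n <= 0)
			return -1;
		off += n;
	}
	return 0;
}

/* Forward one message from src_fd to whatever socket dst_for() picks.
 * dst_for() is a hypothetical lookup that inspects only the header. */
static int forward_one(int src_fd, int (*dst_for)(const uint8_t *hdr))
{
	uint8_t buf[64 * 1024];
	uint32_t body_len, fields_len;
	size_t total;

	/* Parse only the 16-byte fixed header; the body stays opaque. */
	if (read_full(src_fd, buf, 16))
		return -1;

	memcpy(&body_len, buf + 4, 4);		/* body length, offset 4 */
	memcpy(&fields_len, buf + 12, 4);	/* header-field array length */

	/* The header fields are padded to an 8-byte boundary before the body. */
	total = 16 + ((fields_len + 7) & ~(size_t)7) + body_len;
	if (total > sizeof(buf))
		return -1;	/* a real router would loop or splice() here */

	if (read_full(src_fd, buf + 16, total - 16))
		return -1;

	/* One read and one write per hop; nothing is re-marshalled. */
	return write(dst_for(buf), buf, total) == (ssize_t)total ? 0 : -1;
}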
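
And a minimal sketch of the send-then-wait pattern discussed above,
assuming bus_fd is already registered with epfd for EPOLLIN; the
combined send-and-poll primitive mentioned in the mail is hypothetical
and not shown here.

#include <sys/epoll.h>
#include <unistd.h>

/* Write a request, then block for the reply.  Between write() and
 * epoll_wait() both the client and the daemon are runnable; ideally the
 * scheduler keeps running the client until epoll_wait(), so the switch
 * to the daemon happens exactly once per request. */
static int send_and_wait(int epfd, int bus_fd, const void *req, size_t len)
{
	struct epoll_event ev;

	if (write(bus_fd, req, len) != (ssize_t)len)
		return -1;

	return epoll_wait(epfd, &ev, 1, -1);
}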