MIME-Version: 1.0
From: Andy Lutomirski <luto@amacapital.net>
Date: Mon, 22 Jun 2015 23:06:09 -0700
Message-ID: <CALCETrW5sdcwM4zoX7BQdf+a2fWgQhd6rPDg8br0FU7Xy-2ZhQ@mail.gmail.com>
Subject: kdbus: to merge or not to merge?
To: Linus Torvalds <torvalds@linux-foundation.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        David Herrmann <dh.herrmann@gmail.com>,
        Djalal Harouni <tixxdz@opendz.org>,
        Greg KH <gregkh@linuxfoundation.org>,
        Havoc Pennington <havoc.pennington@gmail.com>,
        "Eric W. Biederman" <ebiederm@xmission.com>,
        One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>,
        Tom Gundersen <teg@jklm.no>, Daniel Mack <daniel@zonque.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5339
Lines: 98

Hi Linus,

Can you opine as to whether you think that kdbus should be merged?  I
don't mean whether you'd accept a pull request that Greg may or may
not send during this merge window -- I mean whether you think that
kdbus should be merged if it had appropriate review and people were
okay with the implementation.

The current state of uncertainty is problematic, I think.  The kdbus
team is spending a lot of time making things compatible with kdbus,
and the latest systemd release makes kdbus userspace support
mandatory.  The kernel people who would review it (myself included)
probably don't want to review new versions at a line-by-line level,
because we either don't know whether there's any point or don't think
that it should be merged *even if the implementation were flawless*.

For my part, here's my argument why the answer should be "no, kdbus
shouldn't be merged":

1. It's not necessary.  kdbus is a giant API surface.  The problems it
purports to solve are (very roughly) performance, ability to collect
metadata in a manner that doesn't suck, sandbox support, better
logging/monitoring, and availability very early in userspace startup.
I think that the performance issues should be solved in userspace --
gdbus performance is atrocious for reasons that have nothing to do
with the kernel or context switches [1].  The metadata problem, to the
extent that it's a real problem, can and should be solved by improving
AF_UNIX.  The logging, monitoring, and early userspace problems can
and should be solved in userspace.  See #3 below for my thoughts on
the sandbox.  Right now, kdbus sounds awfully like Tux.
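
For a sense of what AF_UNIX already offers on the metadata front, here
is a minimal Python sketch that reads peer credentials with SO_PEERCRED
on Linux.  The helper name peer_credentials is made up for illustration,
and this only shows the existing pid/uid/gid query, not whatever
improvements to AF_UNIX would be needed to cover the rest of the
metadata.

```python
import os
import socket
import struct

# Layout of Linux's struct ucred: pid_t (int), uid_t and gid_t (unsigned int).
UCRED_FORMAT = "iII"


def peer_credentials(sock):
    """Return (pid, uid, gid) of the peer of a connected AF_UNIX socket.

    Hypothetical helper for illustration; Linux-only (uses SO_PEERCRED).
    """
    raw = sock.getsockopt(
        socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(UCRED_FORMAT)
    )
    return struct.unpack(UCRED_FORMAT, raw)


# A socketpair stands in for a client/server connection; both ends belong
# to this process, so the reported credentials are our own.
left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pid, uid, gid = peer_credentials(left)
left.close()
right.close()

print("peer pid=%d uid=%d gid=%d" % (pid, uid, gid))
```

The kernel records these credentials when the connection is set up, so
the receiver does not have to trust anything the peer says about itself.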

2. Kdbus introduces a novel buffering model.  Receivers allocate a big
chunk of what's essentially tmpfs space.  Assuming that space is
available (in a virtual memory sense), senders synchronously write to
the receivers' tmpfs space.  Broadcast senders synchronously write to
*all* receivers' tmpfs space.  I think that, regardless of
implementation, this is problematic if the sender and the receiver are
in different memcgs.  Suppose that the message is to be written to a
page in the receiver's tmpfs space that is not currently resident.  If
the write happens in the sender's memcg context, then a receiver can
effectively allocate an unlimited number of pages in the sender's
memcg, which will, in practice, be the init memcg if the sender is
systemd.  This breaks the memcg model.  If, on the other hand, the
sender writes to the receiver's tmpfs space in the receiver's memcg
context, then the sender will block (or fail?  presumably
unpredictable failures are a bad thing) if the receiver's memcg is at
capacity.
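
The accounting dilemma can be illustrated with a toy model.  This is
not kernel code and does not touch any real memcg interface; the Memcg
class, the page counts, and the charge_to switch are all invented for
illustration.

```python
class Memcg:
    """Toy stand-in for a memory cgroup with a hard page limit (hypothetical)."""

    def __init__(self, name, limit_pages):
        self.name = name
        self.limit_pages = limit_pages
        self.charged_pages = 0

    def charge(self, pages):
        """Charge pages to this group; return False if the limit would be exceeded."""
        if self.charged_pages + pages > self.limit_pages:
            return False
        self.charged_pages += pages
        return True


def send(sender_cg, receiver_cg, pages, charge_to):
    """Model a send that must fault in `pages` of the receiver's pool.

    charge_to selects which group pays for the newly resident pages.
    """
    payer = sender_cg if charge_to == "sender" else receiver_cg
    return payer.charge(pages)


# Case A: pages are charged to the sender.  The receiver's own limit is
# never consulted, so the receiver's pool grows at the sender's expense.
init_a = Memcg("init", limit_pages=100)
app_a = Memcg("app", limit_pages=10)
results_a = [send(init_a, app_a, 4, charge_to="sender") for _ in range(20)]

# Case B: pages are charged to the receiver.  Once the receiver is at
# capacity the sender's operation fails (or, in a real system, blocks).
init_b = Memcg("init", limit_pages=100)
app_b = Memcg("app", limit_pages=10)
results_b = [send(init_b, app_b, 4, charge_to="receiver") for _ in range(3)]

print("A: init charged", init_a.charged_pages, "app charged", app_a.charged_pages)
print("B: send results", results_b)
```

In case A the receiver with a 10-page limit ends up with 80 pages
resident, all billed to the sender.  In case B the third send fails
through no fault of the sender.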

3. The sandbox model is, in my opinion, an experiment that isn't going
to succeed.  It's a poor model: a "restricted endpoint" (i.e. a
sandboxed kdbus client) sees a view of the world defined by a limited
policy language implemented by the kernel.  This completely fails to
express what I think should be common use cases.  If a sandboxed app
is given permission to access, say,
/org/gnome/evolution/dataserver/CalendarView/3125/12, then it knows
that it's looking at CalendarView/3125/12 (whatever that means) and
there's no way to hide the name.  If someone subsequently deletes that
CalendarView and creates a new one with that name, racelessly blocking
access to the new one for the app may be complicated.  If a sandbox
wants to prompt the user before allowing access to some resource, it
has a problem: the policy language doesn't seem to be able to express
request interception.

The sandbox model is also already starting to accumulate kludges.
Apparently it was recently discovered that the kdbus connection
lifetime model was incompatible with sandbox policy, so as of a recent
change [2] connection lifetime messages completely bypass sandbox
policy.  Maybe this isn't obviously insecure, but it seems like a bad
sign that "it's probably okay to poke this hole" is already happening
before the thing is even merged.

I'll point out that a pure userspace implementation of sandboxed dbus
connections would be straightforward to implement today, would have
none of these problems, and would allow arbitrarily complex policy and
the flexibility to redesign it in the future if the initial design
turned out to be inappropriate for the sandbox being written.  (You
could even have two different implementations to go with two different
sandboxes.  Let a thousand sandboxes bloom, which is easy in userspace
but not so great in the kernel.)
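
As a rough sketch of what such a userspace policy layer could look
like, the toy proxy below hands the sandboxed app opaque handles
instead of real object paths, drops handles on revocation, and runs
every request past a prompt callback.  All names here (SandboxProxy,
grant, revoke, forward, the method strings) are hypothetical, and no
real dbus library is involved.

```python
import secrets


class SandboxProxy:
    """Toy userspace policy proxy between a sandboxed app and the real bus."""

    def __init__(self, ask_user):
        self._ask_user = ask_user  # callback: (real_path, method) -> bool
        self._handles = {}         # opaque handle -> real object path

    def grant(self, real_path):
        """Give the app an opaque handle; the real path is never exposed."""
        handle = "/sandbox/obj/" + secrets.token_hex(8)
        self._handles[handle] = real_path
        return handle

    def revoke(self, real_path):
        """Invalidate every handle that refers to real_path."""
        stale = [h for h, p in self._handles.items() if p == real_path]
        for handle in stale:
            del self._handles[handle]

    def forward(self, handle, method):
        """Return the (path, method) to send on the real bus, or raise."""
        real_path = self._handles.get(handle)
        if real_path is None:
            raise PermissionError("unknown or revoked handle")
        if not self._ask_user(real_path, method):
            raise PermissionError("request denied by user")
        return (real_path, method)


prompts = []


def ask_user(real_path, method):
    # Stand-in for an interactive prompt: record the request, deny deletes.
    prompts.append((real_path, method))
    return method != "Delete"


REAL_PATH = "/org/gnome/evolution/dataserver/CalendarView/3125/12"

proxy = SandboxProxy(ask_user)
handle = proxy.grant(REAL_PATH)
routed = proxy.forward(handle, "GetEvents")

# The object is deleted and a new one reuses the name: the old handle is
# dropped first, so it can never reach the new object.
proxy.revoke(REAL_PATH)
new_handle = proxy.grant(REAL_PATH)
```

Because the mapping lives in the proxy, the name is hidden from the
app, revocation is a dictionary delete, and interception is just a
callback, none of which needs a kernel policy language.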

In summary, I think that a very high quality implementation of the
kdbus concept and API would be a bit faster than a very high quality
userspace implementation of dbus.  Other than that, I think it would
actually be worse.  The kdbus proponents seem to be comparing the
current kdbus implementation to the current userspace implementation,
and a favorable comparison there is not a good reason to merge it.

--Andy

[1] I spent a while today trying to benchmark sd-bus.  I gave up,
because I couldn't get test code to build.  I don't have the patience
to try harder.

[2] https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/commit/?h=kdbus&id=d27c8057699d164648b7d8c1559fa6529998f89d