2015-04-13 19:03:57

by Greg Kroah-Hartman

Subject: [GIT PULL] kdbus for 4.1-rc1

The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:

Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1

for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:

kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)

----------------------------------------------------------------
kdbus for 4.1-rc1

Here's the kdbus pull request for 4.1-rc1.

It has been under development for many years now, has been in linux-next
for many months, and has undergone loads of testing and review, and even a
few good arguments. It comes with full documentation and tests.

There have been a few complaints about the code, notably from people who
don't like the use of metadata in the bus messages. That is actually
one of the main features here, as we can get this data in a secure and
reliable way, and it's something that userspace requires today. So
while it does look "odd" to people who are not familiar with dbus, this
is something that finally fixes a number of almost unfixable races in
the current dbus implementations.

The rest of this pull request message comes from the kdbus patch posting
messages as sent to lkml previously:

Reasons kdbus should be in the kernel, instead of in userspace as is
done today, include the following:

* Performance: Fewer process context switches, fewer copies, fewer
syscalls, larger memory chunks via memfd. This is really important
for a whole class of userspace programs, ported from other operating
systems, that run on tiny ARM systems and rely on hundreds of
thousands of messages passed at boot time and at "critical" times in
their user-interaction loops. DBus is not used for
performance-sensitive applications because DBus is slow.
We want to make it fast so we can finally use it for low-latency,
high-throughput applications. A simple DBus method-call+reply takes
200us on an up-to-date test machine, with kdbus it takes 8us (with
UDS about 2us). If the packet size is increased from 8k to 128k,
kdbus even beats UDS due to single-copy transfers.

* Security: The peers which communicate do not have to trust each
other, as the only trustworthy component in the game is the kernel
which adds metadata and ensures that all data passed as payload is
either copied or sealed, so that the receiver can parse the data
without having to protect against changing memory while parsing
buffers. Also, all the data transfer is controlled by the kernel,
so that LSMs can track and control what is going on, without
involving userspace. Because of the LSM issue, security people are
much happier with this model than the current scheme of having to
hook into dbus to mediate things.

* More types of metadata can be attached to messages than is possible
in userspace.

* Semantics for apps with heavy data payloads (media apps, for
instance) with optional priority message dequeuing, and global
message ordering. Some "crazy" people are playing with using kdbus
for audio data in the system. I'm not saying that this is the best
model for this, but until now, there wasn't any other way to do this
without having to create custom "buses", one for each application
library.

* Being in the kernel closes a lot of races which can't be fixed with
the current userspace solutions. For example, with kdbus a client can
disconnect from a bus only if no further messages are present in its
queue, which is crucial for implementing race-free "exit-on-idle"
services.

* Eavesdropping at the kernel level, so privileged users can hook into
the message stream without hacking support for that into their
userspace processes.

* A number of smaller benefits: for example kdbus learned a way to peek
full messages without dequeuing them, which is really useful for
logging metadata when handling bus-activation requests.

* dbus-daemon is not available during early-boot or shutdown.

DBus marshaling is the de-facto standard in all major(!) Linux desktop
systems. It is well established and accepted by many DEs. It also
solves many other problems, including: policy, authentication /
authorization, well-known name registry, efficient broadcasts /
multicasts, peer discovery, bus discovery, metadata transmission, and
more.

It is a shame that we cannot use this well-established protocol for
low-latency applications. We, effectively, have to duplicate all this
code on custom UDS and other transports just because DBus is too slow.
kdbus tries to unify those efforts, so that we don't need multiple
policy implementations, name registries and peer discovery mechanisms.
Furthermore, kdbus implements comprehensive, yet optional, metadata
transmission that allows peers to be identified and authenticated in a
race-free manner (which is *not* possible with UDS).

Also, kdbus provides a single transport bus with sequential message
numbering. If you use multiple channels, you cannot give any ordering
guarantees across peers (for instance, regarding parallel name-registry
changes).

Of course, some of the bits above could be implemented in userspace
alone, for example with more sophisticated memory management APIs, but
this is usually done by losing out on the other details. For example,
for many of the memory management APIs, it's hard to avoid requiring the
communicating peers to fully trust each other. And we _really_ don't
want peers to have to trust each other.

Another benefit of having this in the kernel, rather than as a userspace
daemon, is that you can now easily use the bus from the initrd, or up to
the very end when the system shuts down. On current userspace D-Bus,
this is not really possible, as this requires passing the bus instance
around between initrd and the "real" system. Such a transition of all
fds also requires keeping full state of what has already been read from
the connection fds. kdbus makes this much simpler, as we can change the
ownership of the bus, just by passing one fd over from one part to the
other.

Given the theoretical advantages above, here are some real-world
examples:

* The Tizen developers have been complaining about the high latency
of DBus for polkit'ish policy queries. That's why their
authentication framework uses custom UDS sockets (called 'Cynara').
If a UI-interaction needs multiple authentication-queries, you don't
want it to take multiple milliseconds, given that you usually want
to render the result in the same frame.

* PulseAudio doesn't use DBus for data transmission. They had to
implement their own marshaling code, transport layer and so on, just
because DBus1 latency is horrible. With kdbus, we can basically drop
this code duplication and unify the IPC layer. The same is true for
Wayland, btw.

* By moving broadcast transmission into the kernel, we can use the
time-slices of the sender to perform heavy operations. This is also
true for policy decisions, etc. With a userspace daemon, we cannot
perform operations in a time-slice of the caller. Charging that work
to the sender makes DoS attacks much harder.

* With priority-inheritance, we can do synchronous calls into trusted
peers and let them optionally use our time-slice to perform the
action. This allows syscall-like/binder-like method-calls into other
processes. Without priority-inheritance, this is not possible in a
secure manner (see 'priority-inheritance').

* Logging-daemons often want to attach metadata to log-messages so
debugging/filtering gets easier. If short-lived programs send
log-messages, the destination peer might not be able to read such
metadata from /proc, as the process might no longer be available at
that time. The same is true for policy decisions such as those polkit
makes. You cannot send off method-calls and exit. You have to wait for
a reply, even though you might not even care about it. If you don't
wait, the other side might not be able to verify your identity and may
therefore reject the request.

* Even though the dbus traffic on idle-systems might be low, this
doesn't mean it's not significant at boot-times or under high-load.
If you run a dbus-monitor of your choice, you will see there is a
significant number of messages exchanged during VT-switches, startup,
shutdown, suspend, wakeup, hotplugging and similar situations where
lots of control-messages are exchanged. We don't want to spend
hundreds of ms just to transmit those messages.

Signed-off-by: Greg Kroah-Hartman <[email protected]>

----------------------------------------------------------------
Arnd Bergmann (1):
kdbus: avoid the use of struct timespec

Daniel Mack (18):
kdbus: add documentation
kdbus: add uapi header file
kdbus: add driver skeleton, ioctl entry points and utility functions
kdbus: add connection pool implementation
kdbus: add connection, queue handling and message validation code
kdbus: add node and filesystem implementation
kdbus: add code to gather metadata
kdbus: add code for notifications and matches
kdbus: add code for buses, domains and endpoints
kdbus: add name registry implementation
kdbus: add policy database implementation
kdbus: add Makefile, Kconfig and MAINTAINERS entry
kdbus: add walk-through user space example
kdbus: add selftests
Documentation: kdbus: fix location for generated files
kdbus: connection: fix handling of failed fget()
kdbus: Fix CONFIG_KDBUS help text
samples: kdbus: build kdbus-workers conditionally

David Herrmann (5):
kdbus: samples/kdbus: add -lrt
samples/kdbus: drop wrong include
Documentation/kdbus: fix out-of-tree builds
Documentation/kdbus: support quiet builds
selftests/kdbus: fix gitignore

Lucas De Marchi (1):
kdbus: fix header guard name

Lukasz Skalski (1):
Documentation/kdbus: replace 'reply_cookie' with 'cookie_reply'

Nicolas Iooss (1):
kdbus: fix minor typo in the walk-through example

Sergei Zviagintsev (5):
kdbus: uapi: Fix kernel-doc for enum kdbus_send_flags
Documentation: kdbus: Fix list of KDBUS_CMD_ENDPOINT_UPDATE errors
Documentation: kdbus: Update list of ioctls which cause writing to receiver's pool
Documentation: kdbus: Fix description of KDBUS_SEND_SYNC_REPLY flag
Documentation: kdbus: Fix typos

Tyler Baker (1):
selftest/kdbus: enable cross compilation

Documentation/Makefile | 2 +-
Documentation/ioctl/ioctl-number.txt | 1 +
Documentation/kdbus/.gitignore | 2 +
Documentation/kdbus/Makefile | 40 +
Documentation/kdbus/kdbus.bus.xml | 359 ++++
Documentation/kdbus/kdbus.connection.xml | 1250 ++++++++++++
Documentation/kdbus/kdbus.endpoint.xml | 429 ++++
Documentation/kdbus/kdbus.fs.xml | 124 ++
Documentation/kdbus/kdbus.item.xml | 839 ++++++++
Documentation/kdbus/kdbus.match.xml | 555 ++++++
Documentation/kdbus/kdbus.message.xml | 1276 ++++++++++++
Documentation/kdbus/kdbus.name.xml | 711 +++++++
Documentation/kdbus/kdbus.policy.xml | 406 ++++
Documentation/kdbus/kdbus.pool.xml | 326 +++
Documentation/kdbus/kdbus.xml | 1012 ++++++++++
Documentation/kdbus/stylesheet.xsl | 16 +
MAINTAINERS | 13 +
Makefile | 1 +
include/uapi/linux/Kbuild | 1 +
include/uapi/linux/kdbus.h | 979 +++++++++
include/uapi/linux/magic.h | 2 +
init/Kconfig | 13 +
ipc/Makefile | 2 +-
ipc/kdbus/Makefile | 22 +
ipc/kdbus/bus.c | 560 ++++++
ipc/kdbus/bus.h | 101 +
ipc/kdbus/connection.c | 2214 +++++++++++++++++++++
ipc/kdbus/connection.h | 257 +++
ipc/kdbus/domain.c | 296 +++
ipc/kdbus/domain.h | 77 +
ipc/kdbus/endpoint.c | 275 +++
ipc/kdbus/endpoint.h | 67 +
ipc/kdbus/fs.c | 510 +++++
ipc/kdbus/fs.h | 28 +
ipc/kdbus/handle.c | 617 ++++++
ipc/kdbus/handle.h | 85 +
ipc/kdbus/item.c | 339 ++++
ipc/kdbus/item.h | 64 +
ipc/kdbus/limits.h | 64 +
ipc/kdbus/main.c | 125 ++
ipc/kdbus/match.c | 559 ++++++
ipc/kdbus/match.h | 35 +
ipc/kdbus/message.c | 616 ++++++
ipc/kdbus/message.h | 133 ++
ipc/kdbus/metadata.c | 1159 +++++++++++
ipc/kdbus/metadata.h | 57 +
ipc/kdbus/names.c | 772 +++++++
ipc/kdbus/names.h | 74 +
ipc/kdbus/node.c | 910 +++++++++
ipc/kdbus/node.h | 84 +
ipc/kdbus/notify.c | 248 +++
ipc/kdbus/notify.h | 30 +
ipc/kdbus/policy.c | 489 +++++
ipc/kdbus/policy.h | 51 +
ipc/kdbus/pool.c | 728 +++++++
ipc/kdbus/pool.h | 46 +
ipc/kdbus/queue.c | 678 +++++++
ipc/kdbus/queue.h | 92 +
ipc/kdbus/reply.c | 257 +++
ipc/kdbus/reply.h | 68 +
ipc/kdbus/util.c | 201 ++
ipc/kdbus/util.h | 74 +
samples/Kconfig | 7 +
samples/Makefile | 3 +-
samples/kdbus/.gitignore | 1 +
samples/kdbus/Makefile | 9 +
samples/kdbus/kdbus-api.h | 114 ++
samples/kdbus/kdbus-workers.c | 1326 ++++++++++++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/kdbus/.gitignore | 1 +
tools/testing/selftests/kdbus/Makefile | 48 +
tools/testing/selftests/kdbus/kdbus-enum.c | 94 +
tools/testing/selftests/kdbus/kdbus-enum.h | 14 +
tools/testing/selftests/kdbus/kdbus-test.c | 923 +++++++++
tools/testing/selftests/kdbus/kdbus-test.h | 85 +
tools/testing/selftests/kdbus/kdbus-util.c | 1615 +++++++++++++++
tools/testing/selftests/kdbus/kdbus-util.h | 222 +++
tools/testing/selftests/kdbus/test-activator.c | 318 +++
tools/testing/selftests/kdbus/test-attach-flags.c | 750 +++++++
tools/testing/selftests/kdbus/test-benchmark.c | 451 +++++
tools/testing/selftests/kdbus/test-bus.c | 175 ++
tools/testing/selftests/kdbus/test-chat.c | 122 ++
tools/testing/selftests/kdbus/test-connection.c | 616 ++++++
tools/testing/selftests/kdbus/test-daemon.c | 65 +
tools/testing/selftests/kdbus/test-endpoint.c | 341 ++++
tools/testing/selftests/kdbus/test-fd.c | 789 ++++++++
tools/testing/selftests/kdbus/test-free.c | 64 +
tools/testing/selftests/kdbus/test-match.c | 441 ++++
tools/testing/selftests/kdbus/test-message.c | 731 +++++++
tools/testing/selftests/kdbus/test-metadata-ns.c | 506 +++++
tools/testing/selftests/kdbus/test-monitor.c | 176 ++
tools/testing/selftests/kdbus/test-names.c | 194 ++
tools/testing/selftests/kdbus/test-policy-ns.c | 632 ++++++
tools/testing/selftests/kdbus/test-policy-priv.c | 1269 ++++++++++++
tools/testing/selftests/kdbus/test-policy.c | 80 +
tools/testing/selftests/kdbus/test-sync.c | 369 ++++
tools/testing/selftests/kdbus/test-timeout.c | 99 +
97 files changed, 34069 insertions(+), 3 deletions(-)
create mode 100644 Documentation/kdbus/.gitignore
create mode 100644 Documentation/kdbus/Makefile
create mode 100644 Documentation/kdbus/kdbus.bus.xml
create mode 100644 Documentation/kdbus/kdbus.connection.xml
create mode 100644 Documentation/kdbus/kdbus.endpoint.xml
create mode 100644 Documentation/kdbus/kdbus.fs.xml
create mode 100644 Documentation/kdbus/kdbus.item.xml
create mode 100644 Documentation/kdbus/kdbus.match.xml
create mode 100644 Documentation/kdbus/kdbus.message.xml
create mode 100644 Documentation/kdbus/kdbus.name.xml
create mode 100644 Documentation/kdbus/kdbus.policy.xml
create mode 100644 Documentation/kdbus/kdbus.pool.xml
create mode 100644 Documentation/kdbus/kdbus.xml
create mode 100644 Documentation/kdbus/stylesheet.xsl
create mode 100644 include/uapi/linux/kdbus.h
create mode 100644 ipc/kdbus/Makefile
create mode 100644 ipc/kdbus/bus.c
create mode 100644 ipc/kdbus/bus.h
create mode 100644 ipc/kdbus/connection.c
create mode 100644 ipc/kdbus/connection.h
create mode 100644 ipc/kdbus/domain.c
create mode 100644 ipc/kdbus/domain.h
create mode 100644 ipc/kdbus/endpoint.c
create mode 100644 ipc/kdbus/endpoint.h
create mode 100644 ipc/kdbus/fs.c
create mode 100644 ipc/kdbus/fs.h
create mode 100644 ipc/kdbus/handle.c
create mode 100644 ipc/kdbus/handle.h
create mode 100644 ipc/kdbus/item.c
create mode 100644 ipc/kdbus/item.h
create mode 100644 ipc/kdbus/limits.h
create mode 100644 ipc/kdbus/main.c
create mode 100644 ipc/kdbus/match.c
create mode 100644 ipc/kdbus/match.h
create mode 100644 ipc/kdbus/message.c
create mode 100644 ipc/kdbus/message.h
create mode 100644 ipc/kdbus/metadata.c
create mode 100644 ipc/kdbus/metadata.h
create mode 100644 ipc/kdbus/names.c
create mode 100644 ipc/kdbus/names.h
create mode 100644 ipc/kdbus/node.c
create mode 100644 ipc/kdbus/node.h
create mode 100644 ipc/kdbus/notify.c
create mode 100644 ipc/kdbus/notify.h
create mode 100644 ipc/kdbus/policy.c
create mode 100644 ipc/kdbus/policy.h
create mode 100644 ipc/kdbus/pool.c
create mode 100644 ipc/kdbus/pool.h
create mode 100644 ipc/kdbus/queue.c
create mode 100644 ipc/kdbus/queue.h
create mode 100644 ipc/kdbus/reply.c
create mode 100644 ipc/kdbus/reply.h
create mode 100644 ipc/kdbus/util.c
create mode 100644 ipc/kdbus/util.h
create mode 100644 samples/kdbus/.gitignore
create mode 100644 samples/kdbus/Makefile
create mode 100644 samples/kdbus/kdbus-api.h
create mode 100644 samples/kdbus/kdbus-workers.c
create mode 100644 tools/testing/selftests/kdbus/.gitignore
create mode 100644 tools/testing/selftests/kdbus/Makefile
create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.c
create mode 100644 tools/testing/selftests/kdbus/kdbus-enum.h
create mode 100644 tools/testing/selftests/kdbus/kdbus-test.c
create mode 100644 tools/testing/selftests/kdbus/kdbus-test.h
create mode 100644 tools/testing/selftests/kdbus/kdbus-util.c
create mode 100644 tools/testing/selftests/kdbus/kdbus-util.h
create mode 100644 tools/testing/selftests/kdbus/test-activator.c
create mode 100644 tools/testing/selftests/kdbus/test-attach-flags.c
create mode 100644 tools/testing/selftests/kdbus/test-benchmark.c
create mode 100644 tools/testing/selftests/kdbus/test-bus.c
create mode 100644 tools/testing/selftests/kdbus/test-chat.c
create mode 100644 tools/testing/selftests/kdbus/test-connection.c
create mode 100644 tools/testing/selftests/kdbus/test-daemon.c
create mode 100644 tools/testing/selftests/kdbus/test-endpoint.c
create mode 100644 tools/testing/selftests/kdbus/test-fd.c
create mode 100644 tools/testing/selftests/kdbus/test-free.c
create mode 100644 tools/testing/selftests/kdbus/test-match.c
create mode 100644 tools/testing/selftests/kdbus/test-message.c
create mode 100644 tools/testing/selftests/kdbus/test-metadata-ns.c
create mode 100644 tools/testing/selftests/kdbus/test-monitor.c
create mode 100644 tools/testing/selftests/kdbus/test-names.c
create mode 100644 tools/testing/selftests/kdbus/test-policy-ns.c
create mode 100644 tools/testing/selftests/kdbus/test-policy-priv.c
create mode 100644 tools/testing/selftests/kdbus/test-policy.c
create mode 100644 tools/testing/selftests/kdbus/test-sync.c
create mode 100644 tools/testing/selftests/kdbus/test-timeout.c


2015-04-13 19:33:50

by Eric W. Biederman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Greg Kroah-Hartman <[email protected]> writes:

> The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
>
> Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
>
> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
>
> for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
>
> kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
>
> ----------------------------------------------------------------
> kdbus for 4.1-rc1
>
> Here's the kdbus pull request for 4.1-rc1.
>
> It's been under development for many years now, and been in linux-next
> for many months, and has undergone loads of testing and review and even a few
> good arguments. It comes with full documentation and tests.

> There have been a few complaints about the code, notably from people who
> don't like the use of metadata in the bus messages. That is actually
> one of the main features here, as we can get this data in a secure and
> reliable way, and it's something that userspace requires today. So
> while it does look "odd" to people who are not familiar with dbus, this
> is something that finally fixes a number of almost unfixable races in
> the current dbus implementations.

And the code that transfers the meta-data is wrong.

It is generally not something that userspace requires today; certainly
userspace is not using it.

You are exporting a weird set of information in a unique way that makes
it race free enough to make ``security'' decisions upon but the data
in general is not appropriate to make those decisions.

I remain opposed to this half thought out trash of an ABI for the
meta-data.

Just because something happens to be exported in a DEBUG api today does
not make it appropriate for userspace to run around making security
decisions with that information.

Nacked-by: "Eric W. Biederman" <[email protected]>

I think it is premature to be merging kdbus. You have fundamental
issues that cannot be fixed once the ABI is frozen.

The semantics of the meta-data you export are extremely poorly defined.

> The rest of this pull request message comes from the kdbus patch posting
> messages as sent to lkml previously:
>
> Reasons kdbus should be in the kernel, instead of userspace as it is
> currently done today includes the following:
>
> * Performance: Fewer process context switches, fewer copies, fewer
> syscalls, larger memory chunks via memfd. This is really important
> for a whole class of userspace programs that are ported from other
> operating systems that are run on tiny ARM systems that rely on
> hundreds of thousands of messages passed at boot time, and at
> "critical" times in their user interaction loops. DBus is not used
> for performance sensitive applications because DBus is slow.
> We want to make it fast so we can finally use it for low-latency,
> high-throughput applications. A simple DBus method-call+reply takes
> 200us on an up-to-date test machine, with kdbus it takes 8us (with
> UDS about 2us). If the packet size is increased from 8k to 128k,
> kdbus even beats UDS due to single-copy transfers.

And with a good design kdbus could be faster.

> * Security: The peers which communicate do not have to trust each
> other, as the only trustworthy component in the game is the kernel
> which adds metadata and ensures that all data passed as payload is
> either copied or sealed, so that the receiver can parse the data
> without having to protect against changing memory while parsing
> buffers. Also, all the data transfer is controlled by the kernel,
> so that LSMs can track and control what is going on, without
> involving userspace. Because of the LSM issue, security people are
> much happier with this model than the current scheme of having to
> hook into dbus to mediate things.
> * More types of metadata can be attached to messages than in
> userspace

The meta-data is poorly thought out and much of it is not appropriate
for making security decisions anywhere except in the kernel.

All I have seen with the meta-data discussion is sticking heads in the
sand and resubmitting and hoping your reviewers go away.

If you won't do a good, responsible job on this before the code is merged,
how can we possibly expect you to do a good job later? Or is this going
to be another API where userspace will be broken at arbitrary moments by
arbitrary users?

How are you going to fix the security issues your poor API comes with
when they are eventually spelled out clearly, and fixing them means
breaking everyone's desktop environment?

Eric

2015-04-13 19:42:26

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 13, 2015 at 02:29:35PM -0500, Eric W. Biederman wrote:
> Greg Kroah-Hartman <[email protected]> writes:
>
> > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
> >
> > Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
> >
> > are available in the git repository at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
> >
> > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
> >
> > kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
> >
> > ----------------------------------------------------------------
> > kdbus for 4.1-rc1
> >
> > Here's the kdbus pull request for 4.1-rc1.
> >
> > It's been under development for many years now, and been in linux-next
> > for many months, and has undergone loads of testing and review and even a few
> > good arguments. It comes with full documentation and tests.
>
> > There have been a few complaints about the code, notably from people who
> > don't like the use of metadata in the bus messages. That is actually
> > one of the main features here, as we can get this data in a secure and
> > reliable way, and it's something that userspace requires today. So
> > while it does look "odd" to people who are not familiar with dbus, this
> > is something that finally fixes a number of almost unfixable races in
> > the current dbus implementations.
>
> And the code that transfers the meta-data is wrong.
>
> It is generally not something that userspace requires today, certainly
> userspace is not using it.
>
> You are exporting a weird set of information in a unique way that makes
> it race free enough to make ``security'' decisions upon but the data
> in general is not appropriate to make those decisions.

I asked this before but you didn't answer as to why you thought these
decisions were not valid. It's what userspace does today already.

> I remain opposed to this half thought out trash of an ABI for the
> meta-data.

You don't have to enable the metadata if you don't want to use it, it's
an option :)

> Just because something happens to be exported in a DEBUG api today does
> not make it appropriate for userspace to run around making security
> decisions with that information.

What is exported in a debug api today that is being used here? I asked
this before but never saw a response.

> > * Performance: Fewer process context switches, fewer copies, fewer
> > syscalls, larger memory chunks via memfd. This is really important
> > for a whole class of userspace programs that are ported from other
> > operating systems that are run on tiny ARM systems that rely on
> > hundreds of thousands of messages passed at boot time, and at
> > "critical" times in their user interaction loops. DBus is not used
> > for performance sensitive applications because DBus is slow.
> > We want to make it fast so we can finally use it for low-latency,
> > high-throughput applications. A simple DBus method-call+reply takes
> > 200us on an up-to-date test machine, with kdbus it takes 8us (with
> > UDS about 2us). If the packet size is increased from 8k to 128k,
> > kdbus even beats UDS due to single-copy transfers.
>
> And with a good design kdbus could be faster.

Faster than today, sure; we've already found some areas that can be
optimized, but those are all internal changes, to be done later, and
nothing affecting the userspace API at all.

Even then, today it's very fast.

> > * Security: The peers which communicate do not have to trust each
> > other, as the only trustworthy component in the game is the kernel
> > which adds metadata and ensures that all data passed as payload is
> > either copied or sealed, so that the receiver can parse the data
> > without having to protect against changing memory while parsing
> > buffers. Also, all the data transfer is controlled by the kernel,
> > so that LSMs can track and control what is going on, without
> > involving userspace. Because of the LSM issue, security people are
> > much happier with this model than the current scheme of having to
> > hook into dbus to mediate things.
> > * More types of metadata can be attached to messages than in
> > userspace
>
> The meta-data is poorly thought out and much of it is not appropriate
> for making security decisions anywhere except in the kernel.
>
> All I have seen with the meta-data discussion is sticking heads in the
> sand and resubmitting and hoping your reviewers go away.

No, we have asked for specifics but have gotten none, other than random
complaints like this. Please be specific as to what is being used
incorrectly.

> If you won't do a good responsible job on this before the code is merged
> how can we possibly expect you to do a good job later. Or is this going
> to be another API where userspace will be broken at arbitrary moments by
> arbitrary users?
>
> How are you going to fix the security issues your poor API comes with
> when they are eventually spelled out clearly, and fixing them means
> breaking everyone's desktop environment?

What security issues? There are none that I know of; please be specific
and don't just make vague accusations.

thanks,

greg k-h

2015-04-13 19:49:31

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 13, 2015 at 9:42 PM, Greg Kroah-Hartman
<[email protected]> wrote:
>> I remain opposed to this half thought out trash of an ABI for the
>> meta-data.
>
> You don't have to enable the metadata if you don't want to use it, it's
> an option :)

Wasn't this also an argument for CONFIG_CGROUPS?
Now we're forced to enable it by default to boot a recent distro
and CONFIG_CGROUPS is still not fixed.

--
Thanks,
//richard

2015-04-13 19:55:08

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 13, 2015 at 09:49:27PM +0200, Richard Weinberger wrote:
> On Mon, Apr 13, 2015 at 9:42 PM, Greg Kroah-Hartman
> <[email protected]> wrote:
> >> I remain opposed to this half thought out trash of an ABI for the
> >> meta-data.
> >
> > You don't have to enable the metadata if you don't want to use it, it's
> > an option :)
>
> Wasn't this also an argument for CONFIG_CGROUPS?
> Now we're forced to enable it by default to boot a recent distro
> and CONFIG_CGROUPS is still not fixed.

CONFIG_CGROUPS is "not fixed"? I think Tejun would like to have some
words with you :)

Anyway, yes, it's an option, but given that people are using this
metadata today in userspace just fine, I fail to see how having the
kernel be a transport for this same data is an issue. When the kernel
is the transport, it can do so in a race-free way, and you can properly
do security tests/logic based on it.

thanks,

greg k-h

2015-04-13 19:57:32

by Richard Weinberger

[permalink] [raw]
Subject: Re: [GIT PULL] kdbus for 4.1-rc1


Am 13.04.2015 um 21:54 schrieb Greg Kroah-Hartman:
> On Mon, Apr 13, 2015 at 09:49:27PM +0200, Richard Weinberger wrote:
>> On Mon, Apr 13, 2015 at 9:42 PM, Greg Kroah-Hartman
>> <[email protected]> wrote:
>>>> I remain opposed to this half thought out trash of an ABI for the
>>>> meta-data.
>>>
>>> You don't have to enable the metadata if you don't want to use it, it's
>>> an option :)
>>
>> Wasn't this also an argument for CONFIG_CGROUPS?
>> Now we're forced to enable it by default to boot a recent distro
>> and CONFIG_CGROUPS is still not fixed.
>
> CONFIG_CGROUPS is "not fixed"? I think Tejun would like to have some
> words with you :)

Tejun is working on it and does a *very* good job. But as long as the unified
hierarchy is not complete/stable, we're facing issues.
Ever tried to run systemd in a Linux container? ;)

Thanks,
//richard

2015-04-13 20:03:13

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 13, 2015 at 09:57:24PM +0200, Richard Weinberger wrote:
>
> Am 13.04.2015 um 21:54 schrieb Greg Kroah-Hartman:
> > On Mon, Apr 13, 2015 at 09:49:27PM +0200, Richard Weinberger wrote:
> >> On Mon, Apr 13, 2015 at 9:42 PM, Greg Kroah-Hartman
> >> <[email protected]> wrote:
> >>>> I remain opposed to this half thought out trash of an ABI for the
> >>>> meta-data.
> >>>
> >>> You don't have to enable the metadata if you don't want to use it, it's
> >>> an option :)
> >>
> >> Wasn't this also an argument for CONFIG_CGROUPS?
> >> Now we're forced to enable it by default to boot a recent distro
> >> and CONFIG_CGROUPS is still not fixed.
> >
> > CONFIG_CGROUPS is "not fixed"? I think Tejun would like to have some
> > words with you :)
>
> Tejun is working on it and does a *very* good job. But as long as the unified
> hierarchy is not complete/stable, we're facing issues.
> Ever tried to run systemd in a Linux container? ;)

Works just fine for me; I do it daily. Here's how I spin up a Debian
image on my local filesystem, running systemd within it just swimmingly:
sudo systemd-nspawn -D debian/ /sbin/init

Also works just fine with Gentoo and Arch images, both of which I use on
a weekly basis in this manner.

Perhaps you are doing something odd that prevents this from working for
you?

thanks,

greg k-h

2015-04-13 20:09:01

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 13.04.2015 at 22:03, Greg Kroah-Hartman wrote:
> On Mon, Apr 13, 2015 at 09:57:24PM +0200, Richard Weinberger wrote:
>>
>> On 13.04.2015 at 21:54, Greg Kroah-Hartman wrote:
>>> On Mon, Apr 13, 2015 at 09:49:27PM +0200, Richard Weinberger wrote:
>>>> On Mon, Apr 13, 2015 at 9:42 PM, Greg Kroah-Hartman
>>>> <[email protected]> wrote:
>>>>>> I remain opposed to this half thought out trash of an ABI for the
>>>>>> meta-data.
>>>>>
>>>>> You don't have to enable the metadata if you don't want to use it, it's
>>>>> an option :)
>>>>
>>>> Wasn't this also an argument for CONFIG_CGROUPS?
>>>> Now we're forced to enable it by default to boot a recent distro
>>>> and CONFIG_CGROUPS is still not fixed.
>>>
>>> CONFIG_CGROUPS is "not fixed"? I think Tejun would like to have some
>>> words with you :)
>>
>> Tejun is working on it and does a *very* good job. But as long as the unified
>> hierarchy is not complete/stable we're facing issues.
>> Ever tried to run systemd in a linux container? ;)
>
> Works just fine for me, I do it daily. Here's how I spin up a debian
> image on my local filesystem, running systemd within it just swimmingly:
> sudo systemd-nspawn -D debian/ /sbin/init
>
> Also works just fine with gentoo and arch images, both of which I use on
> a weekly basis in this manner.
>
> Perhaps you are doing something odd that prevents this from working for
> you?

systemd-nspawn does not support user namespaces.

But the real issue is that cgroup notification does not work within namespaces.
I.e. systemd within the namespace does not get notified when all processes within a cgroup
are gone.
You'll notice that when running a container for a long time: systemd will get slower and slower
as a lot of sessions (mostly crond) will stay around.
This is known to the systemd folks, and I have been told that they need the new unified cgroup
hierarchy to deal with that.

I consult a lot in the linux container hosting area and had a lot of "fun" with issues like
that...
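For context on the notification problem described above: in the unified (v2) cgroup hierarchy as it was eventually merged, every non-root cgroup has a read-only cgroup.events file. Its "populated" key is 0 once no live processes remain in that cgroup or its descendants, and changes to the file generate file-modified events. The C sketch below only parses that file. The helper names and the path argument (for example a hypothetical /sys/fs/cgroup/<container>/cgroup.events) are illustrative assumptions, and the interface available in the 2015 development kernels discussed here may have differed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse the "populated" key from the contents of a cgroup v2
 * cgroup.events file (lines of the form "<key> <value>").
 * Returns 1 if live processes remain, 0 if the cgroup is empty,
 * and -1 if the key is missing. */
static int cgroup_events_populated(const char *buf)
{
	const char *line = buf;

	assert(buf != NULL);
	while (line && *line) {
		int val;

		if (sscanf(line, "populated %d", &val) == 1)
			return val ? 1 : 0;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Read a cgroup.events file (path is caller-supplied) and report
 * whether the cgroup is still populated; -1 on any error. */
static int read_cgroup_populated(const char *path)
{
	char buf[256];
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return cgroup_events_populated(buf);
}
```

A supervisor could watch the file for modification events (the cgroup v2 documentation mentions poll and inotify) and call read_cgroup_populated() each time one fires.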

Thanks,
//richard

2015-04-13 20:13:51

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman
<[email protected]> wrote:
> The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
>
> Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
>
> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
>
> for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
>
> kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
>
> ----------------------------------------------------------------
> kdbus for 4.1-rc1
>
> Here's the kdbus pull request for 4.1-rc1.
>
> It's been under development for many years now, and been in linux-next
> for many months, and has undergone loads of testing and review and even a few
> good arguments. It comes with full documentation and tests.
>
> There have been a few complaints about the code, notably from people who
> don't like the use of metadata in the bus messages. That is actually
> one of the main features here, as we can get this data in a secure and
> reliable way, and it's something that userspace requires today. So
> while it does look "odd" to people who are not familiar with dbus, this
> is something that finally fixes a number of almost unfixable races in
> the current dbus implementations.

While I generally like the concept of having a better in-kernel IPC
mechanism, after some consideration I don't think this belongs in the
kernel in its current form. Here's why.

First, the naming is counterintuitive. There are "endpoints", but you
don't send messages to endpoints. In fact, a basic kdbus setup will
have exactly one endpoint AFAICT. Wtf? This makes talking about it
awkward.

A lot of the design seems to be to violate the concept of "mechanism,
not policy". Kdbus is very much a port of userspace dbus to the
kernel, and it appears to be a port designed to preserve some
questionable design decisions instead of learning from them.

For example, kdbus sticks a whole policy database in the kernel, but
that policy database (AFAICT -- holy crap it's overcomplicated) is
*not* a simple set of rules like "if A then allow B". Instead it has
really weird dependencies not on what name you're sending to but on
what *other* names the thing you're sending to has. Sorry, but this
way lies (a) the inability for a large set of developers to understand
what's going on and (b) security bugs. Also, the result probably
can't be reused as part of a non-legacy-filled sensible design.

Kdbus claims to be very fast. Unfortunately, requests for a broad set
of benchmarks have mostly been ignored, my attempts to benchmark it
(admittedly I didn't try that hard) were several times worse than
published figures, and, most tellingly, *no one* has claimed that
kdbus is faster than AF_UNIX. In fact, everyone seems to acknowledge
that kdbus is several times slower than AF_UNIX.

The metadata thing is problematic. It seems to be intended to serve
two purposes: data gathering for logging and authentication.
Unfortunately, it has issues. There are no fewer than *three*
metadata capture points: creation of a bus, connection to a bus, and
sending of a message. The kdbus authors like to point out that these
are all optional, but IMO that's bunk. Someone will write a userspace
library that rejects messages from people who don't enable all of
them, and then we're screwed.

Why are we screwed? Because any kdbus client *won't know which
metadata matters*. That means that we automatically have the worst of
all worlds, not the best. Also, the bus creation metadata is
completely worthless for anything other than logging, but someone will
use it for something other than logging, at which point it's
vulnerable to a DoS. No one has explained to my satisfaction why this
isn't a problem.

Also, the metadata code captures things that are, in my book
completely unacceptable, such as cmdline and (!) capabilities. I bet
that the cmdline capture is extra special fscked up when cgroups and
such are in play because *it reads from the sender's VM*. IOW it's
insecure and pointless. (OK, it has a point: logging. But I really
don't think that belongs in the kernel.)
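The point about cmdline can be checked directly. On Linux, /proc/<pid>/cmdline is read out of the process's own memory, so the process can change what it reports without any privilege. The C sketch below assumes glibc or musl, where program_invocation_name points at the argv[0] string. It does not exercise kdbus itself. It only shows that a value read from the sender's memory is self-reported by the sender.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Exported by glibc and musl as the argv[0] pointer; declared here
 * directly so no feature-test macro is needed. */
extern char *program_invocation_name;

/* First byte of /proc/self/cmdline as the kernel reports it, or -1. */
static int cmdline_first_byte(void)
{
	char c;
	ssize_t n;
	int fd = open("/proc/self/cmdline", O_RDONLY);

	if (fd < 0)
		return -1;
	n = read(fd, &c, 1);
	close(fd);
	return n == 1 ? (unsigned char)c : -1;
}

/* Overwrite the first character of this process's own argv[0] in
 * place; it is ordinary, writable user memory. */
static void rewrite_own_argv0(char replacement)
{
	assert(program_invocation_name != NULL &&
	       program_invocation_name[0] != '\0');
	program_invocation_name[0] = replacement;
}
```

After rewrite_own_argv0('Z'), cmdline_first_byte() reports 'Z'. Whatever the kernel later captures from this memory reflects the rewritten value, not the one the program was started with.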

In summary, the general idea is good, but the implementation isn't
general enough, the policy stuff is too specialized and enshrines bad
design, the performance isn't good enough to justify it, and the
metadata is nasty.

So, for what it's worth, NACK in its present form. Sorry.

--Andy

2015-04-13 20:22:58

by Al Viro

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote:
> > I remain opposed to this half thought out trash of an ABI for the
> > meta-data.
>
> You don't have to enable the metadata if you don't want to use it, it's
> an option :)

OK, _that_ argument needs to be stomped out. It had been used before,
and it was a deliberate scam. There is no such thing as optional kernel
interface, especially when udev/dbus/systemd crowd is nearby. We'd been
through that excuse before; remember how devtmpfs was pushed in as "optional"?

This is a huge red flag. On the level of "I need your account information
to transfer $200M you might have inherited from my deceased client".

Just to recap how it went the last time around: Kay kept pushing his piece of
code into the tree, claiming that it was optional, that nobody who doesn't
like it has to enable it, so what's the problem? OK, in it went. And pretty
soon udev (maintained by the same... meticulously honorable person) had
stopped working on the kernels that didn't have that enabled.

We had been there before. To paraphrase another... meticulously honorable
person, "if you didn't want something relied upon, why have you put it into the
kernel?" Said person is on the record as having no problem whatsoever with
adding dependencies to the bottom of the userland stack.

IMO either it's OK without "if you don't like it, don't enable it", or it
should not be merged at all.

2015-04-13 20:37:09

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 13, 2015 at 09:22:33PM +0100, Al Viro wrote:
> On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote:
> > > I remain opposed to this half thought out trash of an ABI for the
> > > meta-data.
> >
> > You don't have to enable the metadata if you don't want to use it, it's
> > an option :)
>
> OK, _that_ argument needs to be stomped out. It had been used before,
> and it was a deliberate scam. There is no such thing as optional kernel
> interface, especially when udev/dbus/systemd crowd is nearby. We'd been
> through that excuse before; remember how devtmpfs was pushed in as "optional"?
>
> This is a huge red flag. On the level of "I need your account information
> to transfer $200M you might have inherited from my deceased client".
>
> Just to recap how it went the last time around: Kay kept pushing his piece of
> code into the tree, claiming that it was optional, that nobody who doesn't
> like it has to enable it, so what's the problem? OK, in it went. And pretty
> soon udev (maintained by the same... meticulously honorable person) had
> stopped working on the kernels that didn't have that enabled.
>
> We had been there before. To paraphrase another... meticulously honorable
> person, "if you didn't want something relied upon, why have you put it into the
> kernel?" Said person is on the record as having no problem whatsoever with
> adding dependencies to the bottom of the userland stack.
>
> IMO either it's OK without "if you don't like it, don't enable it", or it
> should not be merged at all.

We want it. I want it. Andy asked for the ability to disable it, as he
didn't want it, so it was made that way. I'll gladly make it mandatory again,
as I don't know of any problems with it, other than Eric's vague rants
about the issue.

thanks,

greg k-h

2015-04-13 20:45:55

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 13, 2015 at 01:13:26PM -0700, Andy Lutomirski wrote:
> On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman
> <[email protected]> wrote:
> > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
> >
> > Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
> >
> > are available in the git repository at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
> >
> > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
> >
> > kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
> >
> > ----------------------------------------------------------------
> > kdbus for 4.1-rc1
> >
> > Here's the kdbus pull request for 4.1-rc1.
> >
> > It's been under development for many years now, and been in linux-next
> > for many months, and has undergone loads of testing and review and even a few
> > good arguments. It comes with full documentation and tests.
> >
> > There have been a few complaints about the code, notably from people who
> > don't like the use of metadata in the bus messages. That is actually
> > one of the main features here, as we can get this data in a secure and
> > reliable way, and it's something that userspace requires today. So
> > while it does look "odd" to people who are not familiar with dbus, this
> > is something that finally fixes a number of almost unfixable races in
> > the current dbus implementations.
>
> While I generally like the concept of having a better in-kernel IPC
> mechanism, after some consideration I don't think this belongs in the
> kernel in its current form. Here's why.
>
> First, the naming is counterintuitive. There are "endpoints", but you
> don't send messages to endpoints. In fact, a basic kdbus setup will
> have exactly one endpoint AFAICT. Wtf? This makes talking about it
> awkward.

Did you read the documentation? We've been over this before, and it
should all be addressed in the documentation, which was updated because this came up.

> A lot of the design seems to be to violate the concept of "mechanism,
> not policy". Kdbus is very much a port of userspace dbus to the
> kernel, and it appears to be a port designed to preserve some
> questionable design decisions instead of learning from them.
>
> For example, kdbus sticks a whole policy database in the kernel, but
> that policy database (AFAICT -- holy crap it's overcomplicated) is
> *not* a simple set of rules like "if A then allow B". Instead it has
> really weird dependencies not on what name you're sending to but on
> what *other* names the thing you're sending to has. Sorry, but this
> way lies (a) the inability for a large set of developers to understand
> what's going on and (b) security bugs. Also, the result probably
> can't be reused as part of a non-legacy-filled sensible design.

What policy database? Matching messages to subscribers? That's the
same type of "database" that other ipc subsystems need/want, there's
nothing radical here.

And lots of things have changed from userspace, based on a decade of
knowledge of how dbus works, and how dbus itself was implemented. The
design, and code, has been reviewed by those developers. Where issues
were raised, they were fixed.

Yes, dbus is "odd", but it serves a real need, and does so quite well,
and now kdbus is the next evolution of that system, fixing and
addressing the issues learned from implementing and designing dbus and
previous versions of this type of ipc (corba, dcom, com, etc.)

> Kdbus claims to be very fast. Unfortunately, requests for a broad set
> of benchmarks have mostly been ignored, my attempts to benchmark it
> (admittedly I didn't try that hard) were several times worse than
> published figures, and, most tellingly, *no one* has claimed that
> kdbus is faster than AF_UNIX. In fact, everyone seems to acknowledge
> that kdbus is several times slower than AF_UNIX.

It does more than AF_UNIX, so of course it's going to be slower. But
you can't do all the things you need to do with dbus with just AF_UNIX,
it's a different model. Again, the documentation should explain this.

And the benchmarks and source were posted by David previously, with full
details, this is the first time I've heard you could not reproduce them
using that code.

> The metadata thing is problematic. It seems to be intended to serve
> two purposes: data gathering for logging and authentication.
> Unfortunately, it has issues. There are no fewer than *three*
> metadata capture points: creation of a bus, connection to a bus, and
> sending of a message. The kdbus authors like to point out that these
> are all optional, but IMO that's bunk. Someone will write a userspace
> library that rejects messages from people who don't enable all of
> them, and then we're screwed.

Remember, you asked for it to be optional, it wasn't in the beginning :)

So let's make it not optional, great. And the capture points are in
different places because they collect different data at different entry points.

> Why are we screwed? Because any kdbus client *won't know which
> metadata matters*. That means that we automatically have the worst of
> all worlds, not the best. Also, the bus creation metadata is
> completely worthless for anything other than logging, but someone will
> use it for something other than logging, at which point it's
> vulnerable to a DoS. No one has explained to my satisfaction why this
> isn't a problem.

I don't think the creation data is worthless, I'm pretty sure the
SELinux people are using it to validate things, but I could be wrong.
Others on the cc: know more about that than I do and can provide
details.

> Also, the metadata code captures things that are, in my book
> completely unacceptable, such as cmdline and (!) capabilities. I bet
> that the cmdline capture is extra special fscked up when cgroups and
> such are in play because *it reads from the sender's VM*. IOW it's
> insecure and pointless. (OK, it has a point: logging. But I really
> don't think that belongs in the kernel.)

The sender's vm is what is wanted here. And cmdline is something that
userspace gets today, and does things with, as does SELinux, and
auditing. Same for capabilities, it's not insecure and pointless, it's
the same thing that is provided to userspace, and userspace makes
decisions on today, independent of kdbus/dbus.

thanks,

greg k-h

2015-04-13 21:02:01

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 13, 2015 at 1:45 PM, Greg Kroah-Hartman
<[email protected]> wrote:
> On Mon, Apr 13, 2015 at 01:13:26PM -0700, Andy Lutomirski wrote:
>> On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman
>> <[email protected]> wrote:
>> > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
>> >
>> > Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
>> >
>> > are available in the git repository at:
>> >
>> > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
>> >
>> > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
>> >
>> > kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
>> >
>> > ----------------------------------------------------------------
>> > kdbus for 4.1-rc1
>> >
>> > Here's the kdbus pull request for 4.1-rc1.
>> >
>> > It's been under development for many years now, and been in linux-next
>> > for many months, and has undergone loads of testing and review and even a few
>> > good arguments. It comes with full documentation and tests.
>> >
>> > There have been a few complaints about the code, notably from people who
>> > don't like the use of metadata in the bus messages. That is actually
>> > one of the main features here, as we can get this data in a secure and
>> > reliable way, and it's something that userspace requires today. So
>> > while it does look "odd" to people who are not familiar with dbus, this
>> > is something that finally fixes a number of almost unfixable races in
>> > the current dbus implementations.
>>
>> While I generally like the concept of having a better in-kernel IPC
>> mechanism, after some consideration I don't think this belongs in the
>> kernel in its current form. Here's why.
>>
>> First, the naming is counterintuitive. There are "endpoints", but you
>> don't send messages to endpoints. In fact, a basic kdbus setup will
>> have exactly one endpoint AFAICT. Wtf? This makes talking about it
>> awkward.
>
> Did you read the documentation? We've been over this before, and it
> should all be addressed in the documentation, which was updated because this came up.
>
>> A lot of the design seems to be to violate the concept of "mechanism,
>> not policy". Kdbus is very much a port of userspace dbus to the
>> kernel, and it appears to be a port designed to preserve some
>> questionable design decisions instead of learning from them.
>>
>> For example, kdbus sticks a whole policy database in the kernel, but
>> that policy database (AFAICT -- holy crap it's overcomplicated) is
>> *not* a simple set of rules like "if A then allow B". Instead it has
>> really weird dependencies not on what name you're sending to but on
>> what *other* names the thing you're sending to has. Sorry, but this
>> way lies (a) the inability for a large set of developers to understand
>> what's going on and (b) security bugs. Also, the result probably
>> can't be reused as part of a non-legacy-filled sensible design.
>
> What policy database? Matching messages to subscribers? That's the
> same type of "database" that other ipc subsystems need/want, there's
> nothing radical here.

Let me quote from the latest version of the kdbus docs:

Note that TALK access is checked against all names of a connection. For
example, if a connection owns both <constant>'org.foo.bar'</constant> and
<constant>'org.blah.baz'</constant>, and the policy database allows
<constant>'org.blah.baz'</constant> to be talked to by WORLD, then this
permission is also granted to <constant>'org.foo.bar'</constant>. That
might sound illogical, but after all, we allow messages to be directed to
either the ID or a well-known name, and policy is applied to the
connection, not the name. In other words, the effective TALK policy for a
connection is the most permissive of all names the connection owns.

In my humble opinion, this paragraph speaks for itself. The design is
bad, full stop.
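The quoted documentation paragraph describes a concrete evaluation rule. The C sketch below is a hypothetical model of that rule, not kdbus code. The struct layouts, the MAX_NAMES limit, and the default-deny behavior for names without a rule are all assumptions. Policy is evaluated per connection rather than per addressed name, so WORLD may talk to a connection if any name it owns allows it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 8	/* hypothetical per-connection limit */

/* Hypothetical model of a connection owning several well-known names. */
struct connection {
	const char *names[MAX_NAMES];
	size_t n_names;
};

/* Hypothetical per-name rule: may WORLD (any peer) TALK to this name? */
struct talk_rule {
	const char *name;
	bool world_may_talk;
};

static bool name_allows_world(const struct talk_rule *rules, size_t n_rules,
			      const char *name)
{
	for (size_t i = 0; i < n_rules; i++)
		if (strcmp(rules[i].name, name) == 0)
			return rules[i].world_may_talk;
	return false;	/* assumed default deny for names without a rule */
}

/* Policy applies to the connection, not the addressed name: access is
 * granted if ANY owned name permits it, i.e. the effective policy is
 * the most permissive of all the connection's names. */
static bool world_may_talk_to(const struct connection *dst,
			      const struct talk_rule *rules, size_t n_rules)
{
	assert(dst->n_names <= MAX_NAMES);
	for (size_t i = 0; i < dst->n_names; i++)
		if (name_allows_world(rules, n_rules, dst->names[i]))
			return true;
	return false;
}
```

Suppose the rules allow WORLD to talk only to org.blah.baz. In that case world_may_talk_to() returns true for a connection owning both org.foo.bar and org.blah.baz, and false for one owning only org.foo.bar. That is the "most permissive of all names" behavior the paragraph describes.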

[...]

> And the benchmarks and source were posted by David previously, with full
> details, this is the first time I've heard you could not reproduce them
> using that code.

No, it's not. But I got bored and didn't try again.

>
>> The metadata thing is problematic. It seems to be intended to serve
>> two purposes: data gathering for logging and authentication.
>> Unfortunately, it has issues. There are no fewer than *three*
>> metadata capture points: creation of a bus, connection to a bus, and
>> sending of a message. The kdbus authors like to point out that these
>> are all optional, but IMO that's bunk. Someone will write a userspace
>> library that rejects messages from people who don't enable all of
>> them, and then we're screwed.
>
> Remember, you asked for it to be optional, it wasn't in the beginning :)
>
> So let's make it not optional, great. And the capture points are in
> different places because they collect different data at different entry points.

Then I'll have to find a way to embolden my NACK further. My point is
that capturing garbage like cmdline and capabilities (again, that
latter part is completely unacceptable under any circumstances
whatsoever) on behalf of *all* senders is a disaster. If it's
optional, then I can at least hope that userspace will honor the
optionality and let everything turn it off. If it's mandatory, then
kdbus is just unsafe to use to send messages to untrusted parties.

>
>> Why are we screwed? Because any kdbus client *won't know which
>> metadata matters*. That means that we automatically have the worst of
>> all worlds, not the best. Also, the bus creation metadata is
>> completely worthless for anything other than logging, but someone will
>> use it for something other than logging, at which point it's
>> vulnerable to a DoS. No one has explained to my satisfaction why this
>> isn't a problem.
>
> I don't think the creation data is worthless, I'm pretty sure the
> SELinux people are using it to validate things, but I could be wrong.
> Others on the cc: know more about that than I do and can provide
> details.

Does that code even exist in public form yet?

>
>> Also, the metadata code captures things that are, in my book
>> completely unacceptable, such as cmdline and (!) capabilities. I bet
>> that the cmdline capture is extra special fscked up when cgroups and
>> such are in play because *it reads from the sender's VM*. IOW it's
>> insecure and pointless. (OK, it has a point: logging. But I really
>> don't think that belongs in the kernel.)
>
> The sender's vm is what is wanted here. And cmdline is something that
> userspace gets today, and does things with, as does SELinux, and
> auditing. Same for capabilities, it's not insecure and pointless, it's
> the same thing that is provided to userspace, and userspace makes
> decisions on today, independent of kdbus/dbus.

Is there anything that userspace makes decisions on based on
capabilities? If so, please tell me and I'll entertain myself by
writing exploits for them.

The fact that some existing userspace does awful things does *not*
justify adding new kernel mechanisms with which to repeat those
mistakes.

--Andy

2015-04-14 00:24:01

by Eric W. Biederman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

[email protected] (Eric W. Biederman) writes:

> Greg Kroah-Hartman <[email protected]> writes:
>
>> The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
>>
>> Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
>>
>> are available in the git repository at:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
>>
>> for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
>>
>> kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
>>
>> ----------------------------------------------------------------
>> kdbus for 4.1-rc1
>>
>> Here's the kdbus pull request for 4.1-rc1.
>>
>> It's been under development for many years now, and been in linux-next
>> for many months, and has undergone loads of testing and review and even a few
>> good arguments. It comes with full documentation and tests.
>
>> There have been a few complaints about the code, notably from people who
>> don't like the use of metadata in the bus messages. That is actually
>> one of the main features here, as we can get this data in a secure and
>> reliable way, and it's something that userspace requires today. So
>> while it does look "odd" to people who are not familiar with dbus, this
>> is something that finally fixes a number of almost unfixable races in
>> the current dbus implementations.
>
> And the code that transfers the meta-data is wrong.

In fact it is worse than I thought.

With userspace applications able to give meaning to any of the bits of
meta-data that are passed (capabilities, cgroup, security labels, etc.),
in the fullness of time dropping some of them will grant you more
permissions somewhere.

Which means that it becomes impossible to change anything. Impossible
to jail anything. It in fact becomes impossible to do anything right.

Which means the ultimate result of the direction kdbus is going is a
world where nothing can be done without introducing a security issue or
breaking userspace.

So as far as I can tell kdbus has a fundamental design flaw.

My apologies for being the bearer of bad news.

Eric

2015-04-14 00:35:04

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 13, 2015 at 5:19 PM, Eric W. Biederman
<[email protected]> wrote:
> [email protected] (Eric W. Biederman) writes:
>
>> Greg Kroah-Hartman <[email protected]> writes:
>>
>>> The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
>>>
>>> Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
>>>
>>> are available in the git repository at:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
>>>
>>> for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
>>>
>>> kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
>>>
>>> ----------------------------------------------------------------
>>> kdbus for 4.1-rc1
>>>
>>> Here's the kdbus pull request for 4.1-rc1.
>>>
>>> It's been under development for many years now, and been in linux-next
>>> for many months, and has undergone loads of testing and review and even a few
>>> good arguments. It comes with full documentation and tests.
>>
>>> There have been a few complaints about the code, notably from people who
>>> don't like the use of metadata in the bus messages. That is actually
>>> one of the main features here, as we can get this data in a secure and
>>> reliable way, and it's something that userspace requires today. So
>>> while it does look "odd" to people who are not familiar with dbus, this
>>> is something that finally fixes a number of almost unfixable races in
>>> the current dbus implementations.
>>
>> And the code that transfers the meta-data is wrong.
>
> In fact it is worse than I thought.
>
> With userspace applications able to give meaning to any of the bits of
> meta-data that are passed (capabilities, cgroup, security labels, etc.),
> in the fullness of time dropping some of them will grant you more
> permissions somewhere.
>
> Which means that it becomes impossible to change anything. Impossible
> to jail anything. It in fact becomes impossible to do anything right.
>
> Which means the ultimate result of the direction kdbus is going is a
> world where nothing can be done without introducing a security issue or
> breaking userspace.
>
> So as far as I can tell kdbus has a fundamental design flaw.
>
> My apologies for being the bearer of bad news.
>

I agree here. I cannot overstate the degree to which passing caps
around through metadata is a bad idea.

LSM labels are probably nearly as bad. Having LSM hooks in kdbus is
one thing, but passing the *raw labels* around and letting userspace
muck with them will cause the policy situation to be incomprehensible.

User code should get simple yes/no answers from LSM policy, not raw data.

--Andy

2015-04-14 17:50:33

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 13, 2015 at 02:01:21PM -0700, Andy Lutomirski wrote:
> On Mon, Apr 13, 2015 at 1:45 PM, Greg Kroah-Hartman
> <[email protected]> wrote:
> > On Mon, Apr 13, 2015 at 01:13:26PM -0700, Andy Lutomirski wrote:
> >> On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman
> >> <[email protected]> wrote:
> >> > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
> >> >
> >> > Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
> >> >
> >> > are available in the git repository at:
> >> >
> >> > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
> >> >
> >> > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
> >> >
> >> > kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
> >> >
> >> > ----------------------------------------------------------------
> >> > kdbus for 4.1-rc1
> >> >
> >> > Here's the kdbus pull request for 4.1-rc1.
> >> >
> >> > It's been under development for many years now, and been in linux-next
> >> > for many months, and has undergone loads of testing and review and even a few
> >> > good arguments. It comes with full documentation and tests.
> >> >
> >> > There have been a few complaints about the code, notably from people who
> >> > don't like the use of metadata in the bus messages. That is actually
> >> > one of the main features here, as we can get this data in a secure and
> >> > reliable way, and it's something that userspace requires today. So
> >> > while it does look "odd" to people who are not familiar with dbus, this
> >> > is something that finally fixes a number of almost unfixable races in
> >> > the current dbus implementations.
> >>
> >> While I generally like the concept of having a better in-kernel IPC
> >> mechanism, after some consideration I don't think this belongs in the
> >> kernel in its current form. Here's why.
> >>
> >> First, the naming is counterintuitive. There are "endpoints", but you
> >> don't send messages to endpoints. In fact, a basic kdbus setup will
> >> have exactly one endpoint AFAICT. Wtf? This makes talking about it
> >> awkward.
> >
> > Did you read the documentation? We've been over this before, and it
> > should all be addressed in the documentation, which was updated because this came up.
> >
> >> A lot of the design seems to be to violate the concept of "mechanism,
> >> not policy". Kdbus is very much a port of userspace dbus to the
> >> kernel, and it appears to be a port designed to preserve some
> >> questionable design decisions instead of learning from them.
> >>
> >> For example, kdbus sticks a whole policy database in the kernel, but
> >> that policy database (AFAICT -- holy crap it's overcomplicated) is
> >> *not* a simple set of rules like "if A then allow B". Instead it has
> >> really weird dependencies not on what name you're sending to but on
> >> what *other* names the thing you're sending to has. Sorry, but this
> >> way lies (a) the inability for a large set of developers to understand
> >> what's going on and (b) security bugs. Also, the result probably
> >> can't be reused as part of a non-legacy-filled sensible design.
> >
> > What policy database? Matching messages to subscribers? That's the
> > same type of "database" that other ipc subsystems need/want, there's
> > nothing radical here.
>
> Let me quote from the latest version of the kdbus docs:
>
> Note that TALK access is checked against all names of a connection. For
> example, if a connection owns both <constant>'org.foo.bar'</constant> and
> <constant>'org.blah.baz'</constant>, and the policy database allows
> <constant>'org.blah.baz'</constant> to be talked to by WORLD, then this
> permission is also granted to <constant>'org.foo.bar'</constant>. That
> might sound illogical, but after all, we allow messages to be directed to
> either the ID or a well-known name, and policy is applied to the
> connection, not the name. In other words, the effective TALK policy for a
> connection is the most permissive of all names the connection owns.
>
> In my humble opinion, this paragraph speaks for itself. The design is
> bad, full stop.

First off, thanks for reading the docs, I appreciate that. But realize
also that this is straight from the D-Bus spec. We aren't doing
anything "radical" here; this is what the desktop you are typing your
email on uses.

Yes, it's an unfortunate design, but one that we are all stuck with
(think of it as having to implement code for horrid hardware that you
have to get to work properly.) There are many applications out there
which don't address messages to their well-known name destination but
to the ID which they looked up earlier and cached. In fact, that
behavior is the default in the gdbus library implementation.

If a connection owns two names, and one is more permissive than the
other one, an attacker could as well choose the more openly configured
name to get a message delivered. That's nothing we can protect from
really. So ideally you never do that, just like you shouldn't do that
in a network configuration with DNS, if you want to manage access
properly.

The logic here is comparable to IP vs. DNS:
- A host may have multiple DNS names assigned, just like a service may
be the owner of multiple well-known names
- Clients can talk to a service using its unique ID (uint64_t) or its
well-known name.
- Clients can also look up the ID of a well-known name and address
messages to it directly
- Hence, we cannot make decisions based on the well-known name that has
been used to send the message
- Instead, we have to fall back to the logic described in the docs
- Firewall rules are applied to IPs, _not_ DNS names!
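
The TALK rule quoted from the kdbus documentation earlier in this thread can be sketched in C as below. This is a hypothetical illustration, not the kdbus implementation: the `enum access` levels, `struct policy_entry`, `policy_lookup()` and `effective_talk()` are all invented for this example. It only shows why, once policy is applied to the connection rather than to the name used to reach it, the effective TALK level ends up as the most permissive level of any name the connection owns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical TALK levels; a larger value is more permissive. */
enum access { ACCESS_NONE = 0, ACCESS_USER = 1, ACCESS_WORLD = 2 };

/* Hypothetical policy-database entry: one well-known name, one TALK level. */
struct policy_entry {
	const char *name;
	enum access talk;
};

/* TALK level configured for a single well-known name (none if unlisted). */
enum access policy_lookup(const struct policy_entry *db, size_t n,
			  const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(db[i].name, name) == 0)
			return db[i].talk;
	return ACCESS_NONE;
}

/*
 * Effective TALK level of a connection. A sender may address the
 * connection by its unique ID, so the check cannot depend on which
 * name was used; it takes the maximum over every name owned.
 */
enum access effective_talk(const struct policy_entry *db, size_t n,
			   const char *const *owned, size_t n_owned)
{
	enum access best = ACCESS_NONE;

	for (size_t i = 0; i < n_owned; i++) {
		enum access a = policy_lookup(db, n, owned[i]);

		if (a > best)
			best = a;
	}
	return best;
}
```

With a database that grants WORLD only to 'org.blah.baz', a connection owning both 'org.foo.bar' and 'org.blah.baz' becomes WORLD-talkable under this model, which matches the example in the documentation.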

D-Bus is a specification that has been out there for over a decade, and
we are not designing anything new here, but rather implementing it as
designed. We have to be compatible with the existing users of the DBus
system, and don't have the luxury of being able to change core things
like this and expect the world to be able to change just because the
design is not as clean as it should/could be.

Again, just like getting horrid hardware to work properly, sometimes we
have to write odd code. Or having to implement a network protocol that
doesn't seem to be designed "perfectly", yet is used by a few hundred
million systems so we have to remain compatible. This is all that we
are doing here for stuff like this.

Remember, this is called kDBUS, not kGENERICIPC, no matter how much we
would have liked that to happen from a kernel standpoint. :)

> > And the benchmarks and source were posted by David previously, with full
> > details, this is the first time I've heard you could not reproduce them
> > using that code.
>
> No it's not. But I got bored and didn't try again.

Sorry, I was not aware of that.

> >> The metadata thing is problematic. It seems to be intended to serve
> >> two purposes: data gathering for logging and authentication.

You forgot about introspection, more on that below.

> >> Unfortunately, it has issues. There are no fewer than *three*
> >> metadata capture points: creation of a bus, connection to a bus, and
> >> sending of a message. The kdbus authors like to point out that these
> >> are all optional, but IMO that's bunk. Someone will write a userspace
> >> library that rejects messages from people who don't enable all of
> >> them, and then we're screwed.
> >
> > Remember, you asked for it to be optional, it wasn't in the beginning :)
> >
> > So let's make it not optional, great. And the capture points are in
> > different places as it is different data and entry points.
>
> Then I'll have to find a way to embolden my NACK further. My point is
> that capturing garbage like cmdline and capabilities (again, that
> latter part is completely unacceptable under any circumstances
> whatsoever) on behalf of *all* senders is a disaster. If it's
> optional, then I can at least hope that userspace will honor the
> optionality and let everything turn it off. If it's mandatory, then
> kdbus is just unsafe to use to send messages to untrusted parties.

It's opted in by the receiving peer if the task implementing a service
wants to access these pieces of information. It is optional, and the
documentation clearly states that userspace should cope with this, and
also, when they are available we make sure to provide the correct
race-free information.

As said many times before, an application can do so already today with
information from other API file systems, so why is this suddenly a
problem when kdbus optionally offers the exact same information along
with each transmitted message? Yes, we all "hate" capabilities, but
userspace uses them, and gets access to them all the time through the
POSIX apis (capget(), cap_get_pid(), capgetp(), etc.) and through
/proc/pid/status. They are something that we have to support and handle
properly.

In the very first submission of kdbus, we stated that we want to allow
userspace methods to access these same bits to be able to make decisions
about permissions. And to do so in a race-free manner, which is very
hard, if not impossible, to do from userspace alone.

For instance, if a task has CAP_NET_ADMIN set, we can use that
information in order to allow or disallow certain actions to be taken by
a privileged process. Or, if a client that has the capability to call
reboot (i.e. have CAP_SYS_BOOT) makes the D-Bus call to reboot the
system, the system daemon listening for that message knows that yes, at
the time that the client made that call, it really did have that
capability so it is ok to actually reboot the system.

Instead of trying to use SCM_CREDENTIALS to get the pid and another
round of cap_get_pid() and the like, all of which are susceptible to
racing and all sorts of other horrors that make them insecure, we can provide
this information in an atomic, and secure way.
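
For comparison, the racy userspace pattern looks roughly like the sketch below. It is illustrative only: `parse_capeff()` and `pid_has_cap()` are hypothetical helpers, and the pid is assumed to have been obtained earlier, for example from an SCM_CREDENTIALS control message. Whatever the source of the pid, it has to be resolved through /proc in a separate step, and in between the peer may exit and have its pid reused, or exec a different binary, so the capability mask that gets read need not belong to the process that sent the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Extract the hex CapEff mask from the text of a /proc/<pid>/status file. */
bool parse_capeff(const char *status, unsigned long long *out)
{
	const char *p = strstr(status, "CapEff:");

	if (!p)
		return false;
	return sscanf(p + strlen("CapEff:"), "%llx", out) == 1;
}

/*
 * Racy check of capability number `cap` for a peer whose pid was
 * learned earlier (e.g. via SCM_CREDENTIALS).
 */
bool pid_has_cap(pid_t pid, int cap)
{
	char path[64], buf[8192];
	unsigned long long capeff;
	size_t n;
	FILE *f;

	/*
	 * RACE WINDOW: by the time /proc is read, `pid` may name a
	 * different process (pid reuse) or the same process after exec.
	 */
	snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return false;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';

	/* Test the requested bit in a mask that may already be stale. */
	return parse_capeff(buf, &capeff) && ((capeff >> cap) & 1ULL);
}
```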

The kernel today, and userspace, relies on capabilities all the time
(i.e. almost every syscall), how are they something that is somehow not
valid to use and support?


And of course, as Eric will point out, capabilities are not
translatable across user namespaces, which is a problem. Because of
this, we dispose of that piece of metadata information when a message
crosses a user namespace boundary. This is the right thing to do,
unlike almost all other kernel APIs, which report bogus capabilities
when user namespaces are crossed.

So we implemented this correctly, and somehow that is a feature so bad
that both you and Eric think the whole baby should be thrown out? How
else should this be implemented?

As for the command line information, yes, it is "unsafe", and we clearly
state that in the documentation. However, it is still a very valid
piece of information. For example, when a service is activated by a
method, getting to know which binary caused that to happen is very
useful when debugging. It's also very useful when debugging multi-call
binaries because the command line actually tells you argv[0] correctly.

That's why lots of userspace tools use the command line
information today. Providing that information is a help to them, so
why wouldn't we provide them that help when we have access to it?

Metadata attachment has always been optional, based on the setting of
the receiving peer, but we have added, at your request, the ability to
globally limit what kdbus is able to transport for that metadata,
regardless of the settings on both sides. It sounds like this option
isn't liked, and I'll be glad to revert it as I do think the metadata is
useful and wanted.
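
The opt-in model described above can be approximated as a bitmask intersection. This is a sketch under stated assumptions: the `META_*` flags and `attach_metadata()` are hypothetical and do not mirror the kdbus uapi, and the sender-side mask is an assumption drawn from the mention of settings on both sides. The user-namespace rule follows the earlier paragraph: capability data is dropped when a message crosses a user namespace boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical metadata item flags (not the kdbus uapi values). */
#define META_CREDS   (UINT64_C(1) << 0)
#define META_CMDLINE (UINT64_C(1) << 1)
#define META_CAPS    (UINT64_C(1) << 2)

/*
 * Items actually attached to one message: only what the receiver asked
 * for, what the sender permits (assumed), and what the global limit
 * allows. Capabilities are removed when the message crosses a user
 * namespace boundary, because they cannot be translated.
 */
uint64_t attach_metadata(uint64_t recv_wants, uint64_t send_allows,
			 uint64_t global_mask, bool same_userns)
{
	uint64_t items = recv_wants & send_allows & global_mask;

	if (!same_userns)
		items &= ~META_CAPS;
	return items;
}
```

A receiver that requests nothing gets nothing, and a global mask that excludes an item overrides both peers, which is the behaviour the paragraph above describes.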

> >> Why are we screwed? Because any kdbus client *won't know which
> >> metadata matters*. That means that we automatically have the worst of
> >> all worlds, not the best. Also, the bus creation metadata is
> >> completely worthless for anything other than logging, but someone will
> >> use it for something other than logging, at which point it's
> >> vulnerable to a DoS. No one has explained to my satisfaction why this
> >> isn't a problem.


Metadata is gathered for logging, authentication and introspection. Bus
creator metadata is not used for logging or authentication, but for
introspection only. It could be really useful for a service that has a
bus handle to actually know which bus it is connected to, but it's not
supposed to be used as an authentication measure. So I was wrong to think
that the SELinux people use it, sorry about that.

Remember there are three different places that metadata is collected,
for three different things. Yeah, we call them all "metadata", which is
probably the source of the confusion here, but these are all different "things"
entirely, and the documentation does describe this really well. If not,
please let me know and I will work on it to make it more clear.

The important point here is that you cannot look at this concept without
keeping the dbus spec in scope. Nobody is supposed to write native kdbus
clients directly. You can, of course, but the entire concept of how
services are implemented follows a higher-level logic which is supposed
to be implemented by high-level libraries.

Yes, this isn't the best argument for why you might feel more
comfortable about merging this code, as we kernel developers are used
to stand-alone APIs that we can use without library helpers, but it is
common and needed. But really, when was the last time you wrote an ALSA
library from scratch? :)

Again, remember the compatibility requirements for your userspace D-Bus
clients today, we have to ensure this, or this code is pointless.

A word about introspection. In talking about this with Daniel on IRC
today, he came up with this good example to explain it better to me, as
I didn't quite understand it well. I'll paraphrase it here, keeping
with the "bus" metaphor that D-Bus requires:

Imagine we're all taking a little tour out into nature, to a
lake or something. We're taking a bus to get there. The bus can
accommodate a large number of people, and we don't know yet who
will join. Everybody who enters the bus has to show their
passport to the conductor (refrain from calling it driver,
because hell no, it clearly isn't a driver!! ;)).

The conductor makes a copy of the passport of each person entering the
bus, because it wants to know who's on the bus. One property of
that strange bunch of programmers on the bus is that they don't
necessarily respond to anything, but whenever anyone in the bus
talks to another person, they show their passport in order to
identify themselves, because you know you can't trust anyone.

Next, the police stops the bus and wants to know who's on it.
As the programmers usually don't respond when being spoken to,
especially if it is the police, the bus conductor hands out a
list of all the passport copies he gathered. That is called
introspection that is not backed by cooperative bus members.
The conductor makes a copy of each OF THE PASSPORT of the people
entering the bus, to help the police (i.e. debuggers) determine
who is on the bus.

It is a property of the bus itself which describes which
personal data you have to give to the conductor in order to be
allowed in. If you're not willing to give out all the bits the
bus requires you to, you have to stay out. That's not a problem
of the system, but rather something to discuss with the owner of
the bus. This way, it is totally possible to have a bus that
does not require anything from its passengers, and passengers
that do not allow any personal information to be revealed, but
then the police can't do much of course when it stops the bus in
order to introspect it.

Then there is a set of global laws in that world in which all
the busses live. These laws define which data is allowed to be
passed around at all in general. When a bus requires its
passengers to reveal their hair color, for instance, but passing
that information around is forbidden by global law, then that
requirement is ignored when buses are created or anyone enters
any of those buses.

And to complete the story and outline the differences of the
passports that were used to make a copy from and the one that is
used during communication, we'd have to add a story about people
changing their hair color constantly in the washroom on the back
of the bus, out of sight of the conductor, but this metaphor is
getting quite long enough already...
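
Read as code, the admission rule in the metaphor might look like the sketch below. All names are hypothetical and nothing here is taken from kdbus: a passenger boards only if it is willing to reveal every item the bus requires, after requirements forbidden by the global law have been discarded, and the conductor keeps a copy of what was shown so it can be listed during introspection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record the conductor keeps for later introspection. */
struct passport_copy {
	uint64_t shown; /* items revealed when boarding */
};

/*
 * Admission: items required by the bus but forbidden by global law are
 * ignored; every remaining requirement must be something the passenger
 * is willing to reveal. On success the copy records what was shown.
 */
bool board_bus(uint64_t bus_requires, uint64_t global_law,
	       uint64_t willing, struct passport_copy *copy)
{
	uint64_t required = bus_requires & global_law;

	if ((required & ~willing) != 0)
		return false; /* unwilling to show a required item: stay out */
	copy->shown = required;
	return true;
}
```

A bus that requires nothing admits everyone and records nothing, which is the case where the police cannot learn much when they stop it.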

Does that help explain introspection and the need for it here?


> >> Also, the metadata code captures things that are, in my book
> >> completely unacceptable, such as cmdline and (!) capabilities. I bet
> >> that the cmdline capture is extra special fscked up when cgroups and
> >> such are in play because *it reads from the sender's VM*. IOW it's
> >> insecure and pointless. (OK, it has a point: logging. But I really
> >> don't think that belongs in the kernel.)
> >
> > The sender's vm is what is wanted here. And cmdline is something that
> > userspace gets today, and does things with, as does SELinux, and
> > auditing. Same for capabilities, it's not insecure and pointless, it's
> > the same thing that is provided to userspace, and userspace makes
> > decisions on today, independent of kdbus/dbus.
>
> Is there anything that userspace makes decisions on based on
> capabilities? If so, please tell me and I'll entertain myself by
> writing exploits for them.
>
> The fact that some existing userspace does awful things does *not*
> justify adding new kernel mechanisms with which to repeat those
> mistakes.

polkit used to do something like this, but the obvious race conditions
that you know about prevented it from working properly, so other odd
work-arounds had to be created. However, if we can provide this in a
race free manner, those work-arounds are no longer needed.

As documented in the original email on this thread, Tizen wants to use
this, as it solves a real need that they have. Their workarounds
involve using custom UDS sockets, but the latency involved is horrid and
unacceptable. Using a kdbus message solves this issue for them,
allowing UI rendering to work properly/quickly.

Again, capabilities are something we all require and rely on today,
passing the current capability on to a recipient isn't a way to raise
privileges at all, but rather, properly determine if they are present
at sending time, if wanted. How does that create an insecure system?
What am I missing that is so bad here with the design we have?

thanks,

greg k-h

2015-04-14 17:55:50

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 13, 2015 at 07:19:49PM -0500, Eric W. Biederman wrote:
> [email protected] (Eric W. Biederman) writes:
>
> > Greg Kroah-Hartman <[email protected]> writes:
> >
> >> [snip]
> >
> >> There have been a few complaints about the code, notably from people who
> >> don't like the use of metadata in the bus messages. That is actually
> >> one of the main features here, as we can get this data in a secure and
> >> reliable way, and it's something that userspace requires today. So
> >> while it does look "odd" to people who are not familiar with dbus, this
> >> is something that finally fixes a number of almost unfixable races in
> >> the current dbus implementations.
> >
> > And the code that transfers the meta-data is wrong.
>
> In fact it is worse than I thought.

Please see the email response I just wrote to Andy about this, it should
address these misconceptions.

thanks,

greg k-h

2015-04-14 18:57:51

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 14, 2015 at 10:50 AM, Greg Kroah-Hartman
<[email protected]> wrote:
> On Mon, Apr 13, 2015 at 02:01:21PM -0700, Andy Lutomirski wrote:
>> On Mon, Apr 13, 2015 at 1:45 PM, Greg Kroah-Hartman
>> <[email protected]> wrote:
>> > On Mon, Apr 13, 2015 at 01:13:26PM -0700, Andy Lutomirski wrote:
>> >> On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman
>> >> <[email protected]> wrote:
>> >> > [snip]
>> >>
>> >> While I generally like the concept of having a better in-kernel IPC
>> >> mechanism, after some consideration I don't think this belongs in the
>> >> kernel in its current form. Here's why.
>> >>
>> >> First, the naming is counterintuitive. There are "endpoints", but you
>> >> don't send messages to endpoints. In fact, a basic kdbus setup will
>> >> have exactly one endpoint AFAICT. Wtf? This makes talking about it
>> >> awkward.
>> >
>> > Did you read the documentation? We've been over this before, and it
>> > should all be addressed in the documentation based on this coming up.
>> >
>> >> A lot of the design seems to violate the concept of "mechanism,
>> >> not policy". Kdbus is very much a port of userspace dbus to the
>> >> kernel, and it appears to be a port designed to preserve some
>> >> questionable design decisions instead of learning from them.
>> >>
>> >> For example, kdbus sticks a whole policy database in the kernel, but
>> >> that policy database (AFAICT -- holy crap it's overcomplicated) is
>> >> *not* a simple set of rules like "if A then allow B". Instead it has
>> >> really weird dependencies not on what name you're sending to but on
>> >> what *other* names the thing you're sending to has. Sorry, but this
>> >> way lies (a) the inability for a large set of developers to understand
>> >> what's going on and (b) security bugs. Also, the result probably
>> >> can't be reused as part of a non-legacy-filled sensible design.
>> >
>> > What policy database? Matching messages to subscribers? That's the
>> > same type of "database" that other ipc subsystems need/want, there's
>> > nothing radical here.
>>
>> Let me quote from the latest version of the kdbus docs:
>>
>> Note that TALK access is checked against all names of a connection. For
>> example, if a connection owns both <constant>'org.foo.bar'</constant> and
>> <constant>'org.blah.baz'</constant>, and the policy database allows
>> <constant>'org.blah.baz'</constant> to be talked to by WORLD, then this
>> permission is also granted to <constant>'org.foo.bar'</constant>. That
>> might sound illogical, but after all, we allow messages to be directed to
>> either the ID or a well-known name, and policy is applied to the
>> connection, not the name. In other words, the effective TALK policy for a
>> connection is the most permissive of all names the connection owns.
>>
>> In my humble opinion, this paragraph speaks for itself. The design is
>> bad, full stop.
>
> First off, thanks for reading the docs, I appreciate that. But realize
> also that this is straight from the D-Bus spec. We aren't doing
> anything "radical" here; this is what the desktop you are typing your
> email on uses.
>
> Yes, it's an unfortunate design, but one that we are all stuck with
> (think of it as having to implement code for horrid hardware that you
> have to get to work properly.)

I agree. You've sent a pull request for an unfortunate design. I
don't think that unfortunate design belongs in the kernel. If it stays
in userspace, then user programmers could potentially fix it some day.

> There are many applications out there
> which don't address messages to their well-known name destination but
> to the ID which they looked up earlier and cached. In fact, that
> behavior is the default in the gdbus library implementation.
>
> If a connection owns two names, and one is more permissive than the
> other one, an attacker could as well choose the more openly configured
> name to get a message delivered. That's nothing we can protect from
> really. So ideally you never do that, just like you shouldn't do that
> in a network configuration with DNS, if you want to manage access
> properly.
>
> The logic here is comparable to IP vs. DNS

[snip some]

It's comparable to someone trying to write a firewall that filters on
DNS names. There's a good reason that people don't do that.

[snip]


>>
>> Then I'll have to find a way to embolden my NACK further. My point is
>> that capturing garbage like cmdline and capabilities (again, that
>> latter part is completely unacceptable under any circumstances
>> whatsoever) on behalf of *all* senders is a disaster. If it's
>> optional, then I can at least hope that userspace will honor the
>> optionality and let everything turn it off. If it's mandatory, then
>> kdbus is just unsafe to use to send messages to untrusted parties.
>
> It's opted in by the receiving peer if the task implementing a service
> wants to access these pieces of information. It is optional, and the
> documentation clearly states that userspace should cope with this, and
> also, when they are available we make sure to provide the correct
> race-free information.
>
> As said many times before, an application can do so already today with
> information from other API file systems, so why is this suddenly a
> problem when kdbus optionally offers the exact same information along
> with each transmitted message? Yes, we all "hate" capabilities, but
> userspace uses them, and gets access to them all the time through the
> POSIX apis (capget(), cap_get_pid(), capgetp(), etc.) and through
> /proc/pid/status. They are something that we have to support and handle
> properly.
>
> In the very first submission of kdbus, we stated that we want to allow
> userspace methods to access these same bits to be able to make decisions
> about permissions. And to do so in a race-free manner, which is very
> hard, if not almost impossible, to do so from userspace alone.
>
> For instance, if a task has CAP_NET_ADMIN set, we can use that
> information in order to allow or disallow certain actions to be taken by
> a privileged process. Or, if a client that has the capability to call
> reboot (i.e. have CAP_SYS_BOOT) makes the D-Bus call to reboot the
> system, the system daemon listening for that message knows that yes, at
> the time that the client made that call, it really did have that
> capability so it is ok to actually reboot the system.
>
> Instead of trying to use SCM_CREDENTIALS to get the pid and another
> round of cap_get_pid() and the like, all of which are susceptible to
> racing and all sorts of other horrors that make them insecure, we can provide
> this information in an atomic, and secure way.

/me suppresses a long string of expletives.

Please point me at the code that does this with caps. It's WRONG in
userspace and it's WRONG in the kernel. I want to know what code that
runs on my system does this so I can send the appropriate bug reports
and get it fixed. I think the RHEL crowd at least will take it
seriously when I tell them that this is a security hole.

>
> The kernel today, and userspace, relies on capabilities all the time
> (i.e. almost every syscall), how are they something that is somehow not
> valid to use and support?

No. The *kernel* relies on caps. Userspace should not.

>
>
> And of course, as Eric will point out, capabilities are not
> translatable across user namespaces, which is a problem. Because of
> this, we dispose of that piece of metadata information when a message
> crosses a user namespace boundary. This is the right thing to do,
> unlike almost all other kernel APIs, which report bogus capabilities
> when user namespaces are crossed.

The right thing to do is to not use capabilities for userspace stuff.

>
> So we implemented this correctly, and somehow that is a feature so bad
> that both you and Eric think the whole baby should be thrown out? How
> else should this be implemented?

It shouldn't be implemented.

>
> As documented in the original email on this thread, Tizen wants to use
> this, as it solves a real need that they have. Their workarounds
> involve using custom UDS sockets, but the latency involved is horrid and
> unacceptable. Using a kdbus message solves this issue for them,
> allowing UI rendering to work properly/quickly.
>
> Again, capabilities are something we all require and rely on today,
> passing the current capability on to a recipient isn't a way to raise
> privileges at all, but rather, properly determine if they are present
> at sending time, if wanted. How does that create an insecure system?
> What am I missing that is so bad here with the design we have?

That, even if the implementation could be made to be useful and
correct, capabilities refer to privileges wrt the kernel, not
userspace. They're not the right bit of policy to look at here.

For example, the thing that should make it possible to run 'systemctl
reboot' or whatever is not CAP_SYS_BOOT, because CAP_SYS_BOOT is the
permission to hard reboot the system immediately, and that's not what
'systemctl reboot' is for.

I find myself comparing kdbus to win32k, and that's not a good sign...


--Andy

2015-04-14 19:24:06

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 14, 2015 at 11:57:22AM -0700, Andy Lutomirski wrote:
> On Tue, Apr 14, 2015 at 10:50 AM, Greg Kroah-Hartman
> <[email protected]> wrote:
> > [snip]
> >
> > First off, thanks for reading the docs, I appreciate that. But realize
> > also, that this is straight from the D-Bus spec. We aren't doing
> > anything "radical" here, this is what your desktop uses that you are
> > typing your email from.
> >
> > Yes, it's an unfortunate design, but one that we are all stuck with
> > (think of it as having to implement code for horrid hardware that you
> > have to get to work properly.)
>
> I agree. You've sent a pull request for an unfortunate design. I
> don't think that unfortunate design belongs in the kernel. If it stays
> in userspace, then user programmers could potentially fix it some day.

You might not like the design, but it is a valid design. Again, we
don't refuse to support hardware that is designed badly. Or support
protocols we don't necessarily like, that's not the job of a kernel or
operating system.

And here's Havoc's response as to why actually, this is a good design:
http://lists.freedesktop.org/archives/dbus/2015-April/016651.html

so while we might not think it's nice, maybe we are just not that
knowledgeable in this design space, and need to trust those that are.

I know I do.

I'll respond to the rest after I get some dinner...

thanks,

greg k-h

2015-04-14 19:26:55

by Borislav Petkov

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:
> You might not like the design, but it is a valid design. Again, we
> don't refuse to support hardware that is designed badly.

Yeah except the small difference that unlike this, we can't change
hardware.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-14 19:32:42

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 14, 2015 at 09:24:29PM +0200, Borislav Petkov wrote:
> On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:
> > You might not like the design, but it is a valid design. Again, we
> > don't refuse to support hardware that is designed badly.
>
> Yeah except the small difference that unlike this, we can't change
> hardware.

And we can't change the design/implementation of many things, again,
it's not the kernel's job to prevent something, just because we don't
like the RFC, from being accepted.

Go read Havoc's email about why the design is the way it is that I just
posted. Maybe we are the ones that really don't know the issues
involved enough to say that the current design is somehow "wrong".

thanks,

greg k-h

2015-04-14 19:36:05

by Al Viro

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:

> > I agree. You've sent a pull request for an unfortunate design. I
> > don't think that unfortunate design belongs in the kernel. If it stays
> > in userspace, then user programmers could potentially fix it some day.
>
> You might not like the design, but it is a valid design. Again, we
> don't refuse to support hardware that is designed badly. Or support
> protocols we don't necessarily like, that's not the job of a kernel or
> operating system.

Bullshit. The problem you seem to deliberately ignore is that once it's
in the kernel, it's impossible to eradicate. It's not just a crap design,
it's a crap design you are taking in as-is.

And no, "the sole consumer of that API knows better, so bend over" is not
a good idea. We have shitloads of examples when single-consumer APIs
turned into screaming horrors; taking that in over the objections to API
design, merely on "they do it that way, who the hell we are to say they
are wrong?" is insane.

2015-04-14 19:40:36

by Al Viro

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 14, 2015 at 09:32:29PM +0200, Greg Kroah-Hartman wrote:
> On Tue, Apr 14, 2015 at 09:24:29PM +0200, Borislav Petkov wrote:
> > On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:
> > > You might not like the design, but it is a valid design. Again, we
> > > don't refuse to support hardware that is designed badly.
> >
> > Yeah except the small difference that unlike this, we can't change
> > hardware.
>
> And we can't change the design/implementation of many things, again,
> it's not the kernel's job to prevent something, just because we don't
> like the RFC, from being accepted.

Translate, please. What exactly will be prevented by NAK on your Fine
Piece Of Software? Not dbus working as it does, surely?

2015-04-14 19:44:00

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 14, 2015 at 08:35:33PM +0100, Al Viro wrote:
> On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:
>
> > > I agree. You've sent a pull request for an unfortunate design. I
> > > don't think that unfortunate design belongs in the kernel. If it stays
> > > in userspace, then user programmers could potentially fix it some day.
> >
> > You might not like the design, but it is a valid design. Again, we
> > don't refuse to support hardware that is designed badly. Or support
> > protocols we don't necessarily like, that's not the job of a kernel or
> > operating system.
>
> Bullshit. The problem you seem to deliberately ignore is that once it's
> in the kernel, it's impossible to eradicate. It's not just a crap design,
> it's a crap design you are taking in as-is.

It is not a crap design. Go read the link I provided. Havoc points out
exactly why the design is the way it is, for very valid reasons. It's
actually much like X11 is as well, but not like "normal" IP connections
at all.

> And no, "the sole consumer of that API knows better, so bend over" is not
> a good idea. We have shitloads of examples when single-consumer APIs
> turned into screaming horrors; taking that in over the objections to API
> design, merely on "they do it that way, who the hell we are to say they
> are wrong?" is insane.

Again, in this domain, the design is sound. So much so that everyone
who works in that area moved toward it (KDE, Qt, Go, etc.) We might not
think it makes sense, and it did take me a while to wrap my head around
it, but to call it "crap" is unfair, sorry.

greg k-h

2015-04-14 19:48:14

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 14, 2015 at 08:40:04PM +0100, Al Viro wrote:
> On Tue, Apr 14, 2015 at 09:32:29PM +0200, Greg Kroah-Hartman wrote:
> > On Tue, Apr 14, 2015 at 09:24:29PM +0200, Borislav Petkov wrote:
> > > On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:
> > > > You might not like the design, but it is a valid design. Again, we
> > > > don't refuse to support hardware that is designed badly.
> > >
> > > Yeah except the small difference that unlike this, we can't change
> > > hardware.
> >
> > And we can't change the design/implementation of many things, again,
> > it's not the kernel's job to prevent something, just because we don't
> > like the RFC, from being accepted.
>
> Translate, please. What exactly will be prevented by NAK on your Fine
> Piece Of Software? Not dbus working as it does, surely?

I don't understand. You can not like the D-Bus model (and accordingly
the X11 model), but to prevent users from wanting to use it in a more
secure, and faster way by implementing it like we have seems very odd to
me.

It's not going to stop anything from working, it's just going to stop
some programs from being able to do things they really want to do (see
the first email for examples.)

Yes, we could make this live outside the kernel tree, but that's not the
way we work anymore. We merge things that are useful, that match our
security and coding requirements, and are going to be maintained by
people we trust. To have the only major objection be "we don't like the
way the protocol is designed because we know better, sorry", isn't ok at
all.

greg k-h

2015-04-14 19:56:07

by Borislav Petkov

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 14, 2015 at 09:48:04PM +0200, Greg Kroah-Hartman wrote:
> It's not going to stop anything from working, it's just going to stop
> some programs from being able to do things they really want to do (see
> the first email for examples.)

Until it is made "mandatory" as Al said earlier.

> Yes, we could make this live outside the kernel tree, but that's not the
> way we work anymore.

> We merge things that are useful, that match our
> security and coding requirements, and are going to be maintained by
> people we trust.

We trust? I'm not going to even comment on that.

And frankly, merging a useful piece of code sounds completely different
to me than this serious backlash I'm reading from the sidelines.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-14 20:12:01

by Martin Steigerwald

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Am Dienstag, 14. April 2015, 21:48:04 schrieb Greg Kroah-Hartman:
> On Tue, Apr 14, 2015 at 08:40:04PM +0100, Al Viro wrote:
> > On Tue, Apr 14, 2015 at 09:32:29PM +0200, Greg Kroah-Hartman wrote:
> > > On Tue, Apr 14, 2015 at 09:24:29PM +0200, Borislav Petkov wrote:
> > > > On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:
> > > > > You might not like the design, but it is a valid design. Again, we
> > > > > don't refuse to support hardware that is designed badly.
> > > >
> > > > Yeah except the small difference that unlike this, we can't change
> > > > hardware.
> > >
> > > And we can't change the design/implementation of many things, again,
> > > it's not the kernel's job to prevent something, just because we don't
> > > like the RFC, from being accepted.
> >
> > Translate, please. What exactly will be prevented by NAK on your Fine
> > Piece Of Software? Not dbus working as it does, surely?
>
> I don't understand. You can not like the D-Bus model (and accordingly
> the X11 model), but to prevent users from wanting to use it in a more
> secure, and faster way by implementing it like we have seems very odd to
> me.
>
> It's not going to stop anything from working, it's just going to stop
> some programs from being able to do things they really want to do (see
> the first email for examples.)
>
> Yes, we could make this live outside the kernel tree, but that's not the
> way we work anymore. We merge things that are useful, that match our
> security and coding requirements, and are going to be maintained by
> people we trust. To have the only major objection be "we don't like
> the way the protocol is designed because we know better, sorry", isn't
> ok at all.

Greg, I think I understood Al here.

dbus as it is used in KDE, GNOME, network-manager, systemd, you name it,
does work. Not merging kdbus will not break it.

So the ones who want to see kdbus in the kernel want to do something better
than, or differently from, the way it is currently done in dbus. And yes, I
have seen the presentations about the benefits of having dbus in the kernel.

But if that's the case, I think what Al asks for in a *new* kernel component
is a sound design that does not repeat any flaws from the original design,
as the original design is not hardware that can no longer be changed after
production.

And as to whether the design of kdbus is sound, there seem to be strongly
differing opinions. I think it is important to accept that and go from
there.

On the other hand, if you do things differently enough from the way
userspace dbus is doing it in order to have such a sound design, it may be
necessary to adapt all applications to it. But since kdbus is not yet
officially in the kernel, this would not violate the "we never ever break
userspace" rule, because the kernel obviously doesn't guarantee the
stability of the current userspace dbus API; it doesn't have such an API
at all yet. But if kdbus goes in, it has one, and then it needs to
guarantee it until this "never break userspace" rule is changed, *if*
ever.

And also: Even if the kernel API is different in order to be sound, it may
be possible to adapt userspace dbus to use it to improve upon some of its
current flaws so that applications using it do not need to be changed at
all.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

2015-04-14 20:15:14

by John Stoffel

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

>>>>> "Greg" == Greg Kroah-Hartman <[email protected]> writes:

Greg> On Tue, Apr 14, 2015 at 11:57:22AM -0700, Andy Lutomirski wrote:
>> On Tue, Apr 14, 2015 at 10:50 AM, Greg Kroah-Hartman
>> <[email protected]> wrote:
>> > On Mon, Apr 13, 2015 at 02:01:21PM -0700, Andy Lutomirski wrote:
>> >> On Mon, Apr 13, 2015 at 1:45 PM, Greg Kroah-Hartman
>> >> <[email protected]> wrote:
>> >> > On Mon, Apr 13, 2015 at 01:13:26PM -0700, Andy Lutomirski wrote:
>> >> >> On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman
>> >> >> <[email protected]> wrote:
>> >> >> > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
>> >> >> >
>> >> >> > Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
>> >> >> >
>> >> >> > are available in the git repository at:
>> >> >> >
>> >> >> > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
>> >> >> >
>> >> >> > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
>> >> >> >
>> >> >> > kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
>> >> >> >
>> >> >> > ----------------------------------------------------------------
>> >> >> > kdbus for 4.1-rc1
>> >> >> >
>> >> >> > Here's the kdbus pull request for 4.1-rc1.
>> >> >> >
>> >> >> > It's been under development for many years now, and been in linux-next
>> >> >> > for many months, and has undergone loads of testing a review and even a few
>> >> >> > good arguments. It comes with full documentation and tests.
>> >> >> >
>> >> >> > There has been a few complaints about the code, notably from people who
>> >> >> > don't like the use of metadata in the bus messages. That is actually
>> >> >> > one of the main features here, as we can get this data in a secure and
>> >> >> > reliable way, and it's something that userspace requires today. So
>> >> >> > while it does look "odd" to people who are not familiar with dbus, this
>> >> >> > is something that finally fixes a number of almost unfixable races in
>> >> >> > the current dbus implementations.
>> >> >>
>> >> >> While I generally like the concept of having a better in-kernel IPC
>> >> >> mechanism, after some consideration I don't think this belongs in the
>> >> >> kernel in its current form. Here's why.
>> >> >>
>> >> >> First, the naming is counterintuitive. There are "endpoints", but you
>> >> >> don't send messages to endpoints. In fact, a basic kdbus setup will
>> >> >> have exactly one endpoint AFAICT. Wtf? This makes talking about it
>> >> >> awkward.
>> >> >
>> >> > Did you read the documentation? We've been over this before, and it
>> >> > should all be addressed in the documentation based on this coming up.
>> >> >
>> >> >> A lot of the design seems to be to violate the concept of "mechanism,
>> >> >> not policy". Kdbus is very much a port of userspace dbus to the
>> >> >> kernel, and it appears to be a port designed to preserve some
>> >> >> questionable design decisions instead of learning from them.
>> >> >>
>> >> >> For example, kdbus sticks a whole policy database in the kernel, but
>> >> >> that policy database (AFAICT -- holy crap it's overcomplicated) is
>> >> >> *not* a simple set of rules like "if A then allow B". Instead it has
>> >> >> really weird dependencies not on what name you're sending to but on
>> >> >> what *other* names the thing you're sending to has. Sorry, but this
>> >> >> way lies (a) the inability for a large set of developers to understand
>> >> >> what's going on and (b) security bugs. Also, the result probably
>> >> >> can't be reused as part of a non-legacy-filled sensible design
>> >> >
>> >> > What policy database? Matching messages to subscribers? That's the
>> >> > same type of "database" that other ipc subsystems need/want, there's
>> >> > nothing radical here.
>> >>
>> >> Let me quote from the latest version of the kdbus docs:
>> >>
>> >> Note that TALK access is checked against all names of a connection. For
>> >> example, if a connection owns both <constant>'org.foo.bar'</constant> and
>> >> <constant>'org.blah.baz'</constant>, and the policy database allows
>> >> <constant>'org.blah.baz'</constant> to be talked to by WORLD, then this
>> >> permission is also granted to <constant>'org.foo.bar'</constant>. That
>> >> might sound illogical, but after all, we allow messages to be directed to
>> >> either the ID or a well-known name, and policy is applied to the
>> >> connection, not the name. In other words, the effective TALK policy for a
>> >> connection is the most permissive of all names the connection owns.
>> >>
>> >> In my humble opinion, this paragraph speaks for itself. The design is
>> >> bad, full stop.
>> >
>> > First off, thanks for reading the docs, I appreciate that. But realize
>> > also, that this is straight from the D-Bus spec. We aren't doing
>> > anything "radical" here, this is what your desktop uses that you are
>> > typing your email from.
>> >
>> > Yes, it's an unfortunate design, but one that we are all stuck with
>> > (think of it as having to implement code for horrid hardware that you
>> > have to get to work properly.)
>>
>> I agree. You've sent a pull request for an unfortunate design. I
>> don't think that unfortunate design belongs in the kernel. If it stays
>> in userspace, then user programmers could potentially fix it some day.

Greg> You might not like the design, but it is a valid design. Again, we
Greg> don't refuse to support hardware that is designed badly. Or support
Greg> protocols we don't necessarily like, that's not the job of a kernel or
Greg> operating system.

Greg> And here's Havoc's response as to why actually, this is a good design:
Greg> http://lists.freedesktop.org/archives/dbus/2015-April/016651.html

This is an interesting discussion, and one thing that sticks out to me
is the comments in the URL above talking about how clients are
supposed to use a generic name to bind to a resource, but actually do
a lookup to get the specific name, and then bind to THAT.

So the security concerns raised by Andy do seem to make sense, in that
security needs to be the same across all names of a service, so that
you don't have problems with varying levels once people have
connected. In terms of the X11 analogy, if I have someone connect,
and then I do 'xhost -', it removes all access. It's not dependent on
whether I'm bound to a specific or general service.

So the security aspect really needs to be that the most restrictive
takes precedence, not the other way around.

And after having read a bunch of the docs, looked at the FAQ, etc.,
it's still no clearer to me what DBUS and KDBUS provide that's all so
important or critical. Sure, it might be nice to have, but that's ok.

So I think that's the step people need to take: give concrete examples
of how DBUS is better than anything else out there and won't cause
more problems down the line.

John

2015-04-14 21:54:48

by Steven Rostedt

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 14, 2015 at 04:14:34PM -0400, John Stoffel wrote:
>
> So I think that's the step people need to take: give concrete examples
> of how DBUS is better than anything else out there and won't cause
> more problems down the line.

I believe that Linux Plumbers is still accepting MicroConferences. I wonder
if this would be a good one to have. Try to get everyone face to face and
talk about how exactly kdbus should be implemented in the kernel.

This doesn't look to me like it is going to be solved via electronic
communication. The old approach of free beer at a convention, where
everyone can give their drunken arguments, may be quite productive.

Greg, you told me you'll be there. What about everyone else? Want to
write up a MicroConf:

http://wiki.linuxplumbersconf.org/2015:topics

It's not that far off. Kdbus has waited this long, I'm sure it can wait
till August as well.

I'd really love to see this happen. I'll even supply the popcorn ;-)

-- Steve

2015-04-14 22:05:21

by Jiri Kosina

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, 14 Apr 2015, Steven Rostedt wrote:

> I believe that Linux Plumbers is still accepting MicroConferences. I
> wonder if this would be a good one to have. Try to get everyone face to
> face and talk about how exactly kdbus should be implemented in the
> kernel.

I personally would even put more emphasis on a session that would first
focus on "why", before we look at "how".

I have already asked about this during the earlier RFC submissions, but
the only "take-home message" I took from that discussion was "because it's
faster than what we currently have". I don't find that a sufficient
justification by itself for something so complex (with potential
implications all over the place for the whole Linux ecosystem), especially
given the fact that we already have sealed memfds, zero-copy, etc. (and I am not
even talking about the "infinite set-in-stone userspace API" implications
this has).

So definitely +1 from me for this discussion to happen, be it either
LPC (which I will unfortunately probably have to miss due to personal
reasons this year) or KS. It might help people like me, who have trouble
understanding why we need it, and for whom LKML discussions don't provide
enough answers.

Thanks,

--
Jiri Kosina
SUSE Labs

2015-04-14 22:33:40

by Jiri Kosina

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, 14 Apr 2015, Greg Kroah-Hartman wrote:

> Yes, it's an unfortunate design, but one that we are all stuck with
> (think of it as having to implement code for horrid hardware that you
> have to get to work properly.)

Greg, I personally consider this a rather defunct analogy. Broken hardware
comes from "outer space"; we just have to live with it somehow, and
eventually try to gradually improve things by working with vendors (and you
yourself have of course made huge improvements in this very area).

Linux userspace is coming, well, from Linux developers. The sole fact that
someone wrote a daemon that runs on Linux seems like a very poor
justification for sucking the daemon into kernel "because we have to live
with it".
Userspace has to live with it somehow (and eventually fix itself if
necessary), yes. Why should the kernel just contribute to this "unfortunate
design" if it really isn't, in any way, obliged or forced to do so?

Thanks,

--
Jiri Kosina
SUSE Labs

2015-04-14 22:39:33

by Jiri Kosina

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, 14 Apr 2015, Greg Kroah-Hartman wrote:

> I don't understand. You can not like the D-Bus model (and accordingly
> the X11 model),

I thought that the general hatred level of the X11 "model" and the
protocol led to all the efforts to reimplement this properly ... in
userspace (for example Wayland, right?).

I don't think anyone was ever seriously suggesting "X11 model is broken,
so let's push it to kernel" ... ?

Thanks,

--
Jiri Kosina
SUSE Labs

2015-04-15 01:36:56

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 13, 2015 at 1:22 PM, Al Viro <[email protected]> wrote:
> On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote:
>> > I remain opposed to this half thought out trash of an ABI for the
>> > meta-data.
>>
>> You don't have to enable the metadata if you don't want to use it, it's
>> an option :)
>
> OK, _that_ argument needs to be stomped out. It had been used before,
> and it was a deliberate scam. There is no such thing as optional kernel
> interface, especially when udev/dbus/systemd crowd is nearby. We'd been
> through that excuse before; remember how devtmpfs was pushed in as "optional"?
>
> This is a huge red flag. On the level of "I need your account information
> to transfer $200M you might have inherited from my deceased client".
>
> Just to recap how it went the last time around: Kay kept pushing his piece of
> code into the tree, claiming that it was optional, that nobody who doesn't
> like it has to enable it, so what's the problem? OK, in it went. And pretty
> soon udev (maintained by the same... meticulously honorable person) had
> stopped working on the kernels that didn't have that enabled.
>
> We had been there before. To paraphrase another... meticulously honorable
> person, "if you didn't want something relied upon, why have you put it into the
> kernel?" Said person is on the record as having no problem whatsoever with
> adding dependencies to the bottom of userland stack.

It appears that, if kdbus is merged, upstream udev may end up requiring it:

http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html

Grumble.

--Andy

2015-04-15 06:54:20

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 3:36 AM, Andy Lutomirski <[email protected]> wrote:
>> We had been there before. To paraphrase another... meticulously honorable
>> person, "if you didn't want something relied upon, why have you put it into the
>> kernel?" Said person is on the record as having no problem whatsoever with
>> adding dependencies to the bottom of userland stack.
>
> It appears that, if kdbus is merged, upstream udev may end up requiring it:
>
> http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html

Why so surprised?
kdbus will be a major hard-dependency for every non-trivial userland.
Like cgroups...

--
Thanks,
//richard

2015-04-15 06:59:04

by Borislav Petkov

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 12:05:01AM +0200, Jiri Kosina wrote:
> So definitely +1 from me for this discussion to happen, be it
> either LPC (which I will unfortunately probably have to miss due to
> personal reasons this year) or KS. It might help people like me, who
> have trouble understanding why we need it, and for whom LKML discussions
> don't provide enough answers.

Oh, and then please do a writeup so that people like me can read about
it and find out the answer to that same question.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-15 07:31:54

by Mike Galbraith

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 2015-04-15 at 08:54 +0200, Richard Weinberger wrote:
> On Wed, Apr 15, 2015 at 3:36 AM, Andy Lutomirski <[email protected]> wrote:
> > > We had been there before. To paraphrase another... meticulously honorable
> > > person, "if you didn't want something relied upon, why have you put it into the
> > > kernel?" Said person is on the record as having no problem whatsoever with
> > > adding dependencies to the bottom of userland stack.
> >
> > It appears that, if kdbus is merged, upstream udev may end up requiring it:
> >
> > http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html
>
> Why so surprised?
> kdbus will be a major hard-dependency for every non-trivial userland.
> Like cgroups...

Heh, makes one wonder how we ever survived.

My openSUSE box is thoroughly infested with latest system-disease, and
it seems the thing has now mandated group scheduling. Whether you
need/want it and its size large overhead or not is immaterial. I'm
not seeing an on/off switch anyway.

(shrug, axe should work as substitute, say "byebye tentacle").

-Mike

2015-04-15 08:19:01

by Martin Steigerwald

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Am Dienstag, 14. April 2015, 18:36:28 schrieb Andy Lutomirski:
> On Mon, Apr 13, 2015 at 1:22 PM, Al Viro <[email protected]> wrote:
> > On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote:
> >> > I remain opposed to this half thought out trash of an ABI for the
> >> > meta-data.
> >>
> >> You don't have to enable the metadata if you don't want to use it, it's
> >> an option :)
> >
> > OK, _that_ argument needs to be stomped out. It had been used before,
> > and it was a deliberate scam. There is no such thing as optional
> > kernel interface, especially when udev/dbus/systemd crowd is nearby.
> > We'd been through that excuse before; remember how devtmpfs was
> > pushed in as "optional"?
> >
> > This is a huge red flag. On the level of "I need your account
> > information to transfer $200M you might have inherited from my
> > deceased client".
> >
> > Just to recap how it went the last time around: Kay kept pushing his
> > piece of code into the tree, claiming that it was optional, that
> > nobody who doesn't like it has to enable it, so what's the problem?
> > OK, in it went. And pretty soon udev (maintained by the same...
> > meticulously honorable person) had stopped working on the kernels
> > that didn't have that enabled.
> >
> > We had been there before. To paraphrase another... meticulously
> > honorable person, "if you didn't want something relied upon, why have
> > you put it into the kernel?" Said person is on the record as having
> > no problem whatsoever with adding dependencies to the bottom of
> > userland stack.
>
> It appears that, if kdbus is merged, upstream udev may end up requiring it:
>
> http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html
>
> Grumble.

Honestly, I think that tightly coupling systemd and udev to certain kernel
versions in lockstep is crap.

Requiring some minimum version after a reasonable time, sure. But in
lockstep? Seriously.

I certainly do not want a broken system just because I have to load an
older kernel version for some reason.

And yes, I think it's good not to force just about any userspace idea into
the kernel.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

2015-04-15 08:30:00

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 14, 2015 at 06:36:28PM -0700, Andy Lutomirski wrote:
> On Mon, Apr 13, 2015 at 1:22 PM, Al Viro <[email protected]> wrote:
> > On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote:
> >> > I remain opposed to this half thought out trash of an ABI for the
> >> > meta-data.
> >>
> >> You don't have to enable the metadata if you don't want to use it, it's
> >> an option :)
> >
> > OK, _that_ argument needs to be stomped out. It had been used before,
> > and it was a deliberate scam. There is no such thing as optional kernel
> > interface, especially when udev/dbus/systemd crowd is nearby. We'd been
> > through that excuse before; remember how devtmpfs was pushed in as "optional"?
> >
> > This is a huge red flag. On the level of "I need your account information
> > to transfer $200M you might have inherited from my deceased client".
> >
> > Just to recap how it went the last time around: Kay kept pushing his piece of
> > code into the tree, claiming that it was optional, that nobody who doesn't
> > like it has to enable it, so what's the problem? OK, in it went. And pretty
> > soon udev (maintained by the same... meticulously honorable person) had
> > stopped working on the kernels that didn't have that enabled.
> >
> > We had been there before. To paraphrase another... meticulously honorable
> > person, "if you didn't want something relied upon, why have you put it into the
> > kernel?" Said person is on the record as having no problem whatsoever with
> > adding dependencies to the bottom of userland stack.
>
> It appears that, if kdbus is merged, upstream udev may end up requiring it:
>
> http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html

Why would anyone propose a kernel API if they didn't actually plan to
use it? Look at the first email in this thread; it shows the
people/projects that want to use this. This is a crazy argument: telling
people to "stop using the feature that the kernel provides you!"

greg k-h

2015-04-15 08:32:35

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 10:18:46AM +0200, Martin Steigerwald wrote:
> Am Dienstag, 14. April 2015, 18:36:28 schrieb Andy Lutomirski:
> > On Mon, Apr 13, 2015 at 1:22 PM, Al Viro <[email protected]> wrote:
> > > On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote:
> > >> > I remain opposed to this half thought out trash of an ABI for the
> > >> > meta-data.
> > >>
> > >> You don't have to enable the metadata if you don't want to use it,
> > >> it's an option :)
> > >
> > > OK, _that_ argument needs to be stomped out. It had been used before,
> > > and it was a deliberate scam. There is no such thing as optional
> > > kernel interface, especially when udev/dbus/systemd crowd is nearby.
> > > We'd been through that excuse before; remember how devtmpfs was
> > > pushed in as "optional"?
> > >
> > > This is a huge red flag. On the level of "I need your account
> > > information to transfer $200M you might have inherited from my
> > > deceased client".
> > >
> > > Just to recap how it went the last time around: Kay kept pushing his
> > > piece of code into the tree, claiming that it was optional, that
> > > nobody who doesn't like it has to enable it, so what's the problem?
> > > OK, in it went. And pretty soon udev (maintained by the same...
> > > meticulously honorable person) had stopped working on the kernels
> > > that didn't have that enabled.
> > >
> > > We had been there before. To paraphrase another... meticulously
> > > honorable person, "if you didn't want something relied upon, why have
> > > you put it into the kernel?" Said person is on the record as having
> > > no problem whatsoever with adding dependencies to the bottom of
> > > userland stack.
> >
> > It appears that, if kdbus is merged, upstream udev may end up requiring
> > it:
> >
> > http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html
> >
> > Grumble.
>
> Honestly, I think that tightly coupling systemd and udev to certain kernel
> versions in lock step is crap.

Where do you see that happening?

> That you require some minimum version after some reasonable time, sure.
> But in lockstep? Seriously.

Has that happened in the past? Look at the minimum requirements of
systemd/udev today, something like the 3.7 kernel release, many years
old.

> I certainly do not want a broken system just because I have to load an older
> kernel version for some reason.

No one does. But work with your distribution if you end up with
something like this. Remember, the goal is that you can always run
newer kernels on older userspace, as that is something that we kernel
developers can enforce. Userspace programs have other requirements /
communities; it's up to them to decide the oldest kernel version
they wish to support. Hint: even glibc makes these kinds of
requirements. It's nothing new at all, so why is this even an
issue?

> And yes, I think it's good not to force just about any userspace idea into
> the kernel.

Do you have any technical objections to the patch as proposed?

thanks,

greg k-h

2015-04-15 08:36:03

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 14, 2015 at 04:14:34PM -0400, John Stoffel wrote:
> >>>>> "Greg" == Greg Kroah-Hartman <[email protected]> writes:
>
> Greg> On Tue, Apr 14, 2015 at 11:57:22AM -0700, Andy Lutomirski wrote:
> >> On Tue, Apr 14, 2015 at 10:50 AM, Greg Kroah-Hartman
> >> <[email protected]> wrote:
> >> > On Mon, Apr 13, 2015 at 02:01:21PM -0700, Andy Lutomirski wrote:
> >> >> On Mon, Apr 13, 2015 at 1:45 PM, Greg Kroah-Hartman
> >> >> <[email protected]> wrote:
> >> >> > On Mon, Apr 13, 2015 at 01:13:26PM -0700, Andy Lutomirski wrote:
> >> >> >> On Mon, Apr 13, 2015 at 12:03 PM, Greg Kroah-Hartman
> >> >> >> <[email protected]> wrote:
> >> >> >> > The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
> >> >> >> >
> >> >> >> > Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
> >> >> >> >
> >> >> >> > are available in the git repository at:
> >> >> >> >
> >> >> >> > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
> >> >> >> >
> >> >> >> > for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
> >> >> >> >
> >> >> >> > kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
> >> >> >> >
> >> >> >> > ----------------------------------------------------------------
> >> >> >> > kdbus for 4.1-rc1
> >> >> >> >
> >> >> >> > Here's the kdbus pull request for 4.1-rc1.
> >> >> >> >
> >> >> >> > It's been under development for many years now, and been in linux-next
> >> >> >> > for many months, and has undergone loads of testing and review and even a few
> >> >> >> > good arguments. It comes with full documentation and tests.
> >> >> >> >
> >> >> >> > There have been a few complaints about the code, notably from people who
> >> >> >> > don't like the use of metadata in the bus messages. That is actually
> >> >> >> > one of the main features here, as we can get this data in a secure and
> >> >> >> > reliable way, and it's something that userspace requires today. So
> >> >> >> > while it does look "odd" to people who are not familiar with dbus, this
> >> >> >> > is something that finally fixes a number of almost unfixable races in
> >> >> >> > the current dbus implementations.
> >> >> >>
> >> >> >> While I generally like the concept of having a better in-kernel IPC
> >> >> >> mechanism, after some consideration I don't think this belongs in the
> >> >> >> kernel in its current form. Here's why.
> >> >> >>
> >> >> >> First, the naming is counterintuitive. There are "endpoints", but you
> >> >> >> don't send messages to endpoints. In fact, a basic kdbus setup will
> >> >> >> have exactly one endpoint AFAICT. Wtf? This makes talking about it
> >> >> >> awkward.
> >> >> >
> >> >> > Did you read the documentation? We've been over this before, and it
> >> >> > should all be addressed in the documentation based on this coming up.
> >> >> >
> >> >> >> A lot of the design seems to be to violate the concept of "mechanism,
> >> >> >> not policy". Kdbus is very much a port of userspace dbus to the
> >> >> >> kernel, and it appears to be a port designed to preserve some
> >> >> >> questionable design decisions instead of learning from them.
> >> >> >>
> >> >> >> For example, kdbus sticks a whole policy database in the kernel, but
> >> >> >> that policy database (AFAICT -- holy crap it's overcomplicated) is
> >> >> >> *not* a simple set of rules like "if A then allow B". Instead it has
> >> >> >> really weird dependencies not on what name you're sending to but on
> >> >> >> what *other* names the thing you're sending to has. Sorry, but this
> >> >> >> way lies (a) the inability for a large set of developers to understand
> >> >> >> what's going on and (b) security bugs. Also, the result probably
> >> >> >> can't be reused as part of a non-legacy-filled sensible design
> >> >> >
> >> >> > What policy database? Matching messages to subscribers? That's the
> >> >> > same type of "database" that other ipc subsystems need/want, there's
> >> >> > nothing radical here.
> >> >>
> >> >> Let me quote from the latest version of the kdbus docs:
> >> >>
> >> >> Note that TALK access is checked against all names of a connection. For
> >> >> example, if a connection owns both <constant>'org.foo.bar'</constant> and
> >> >> <constant>'org.blah.baz'</constant>, and the policy database allows
> >> >> <constant>'org.blah.baz'</constant> to be talked to by WORLD, then this
> >> >> permission is also granted to <constant>'org.foo.bar'</constant>. That
> >> >> might sound illogical, but after all, we allow messages to be directed to
> >> >> either the ID or a well-known name, and policy is applied to the
> >> >> connection, not the name. In other words, the effective TALK policy for a
> >> >> connection is the most permissive of all names the connection owns.
> >> >>
> >> >> In my humble opinion, this paragraph speaks for itself. The design is
> >> >> bad, full stop.
> >> >
> >> > First off, thanks for reading the docs, I appreciate that. But realize
> >> > also, that this is straight from the D-Bus spec. We aren't doing
> >> > anything "radical" here, this is what your desktop uses that you are
> >> > typing your email from.
> >> >
> >> > Yes, it's an unfortunate design, but one that we are all stuck with
> >> > (think of it as having to implement code for horrid hardware that you
> >> > have to get to work properly.)
> >>
> >> I agree. You've sent a pull request for an unfortunate design. I
> >> don't think that unfortunate design belongs in the kernel. If it stays
> >> in userspace, then userspace programmers could potentially fix it some day.
>
> Greg> You might not like the design, but it is a valid design. Again, we
> Greg> don't refuse to support hardware that is designed badly. Or support
> Greg> protocols we don't necessarily like, that's not the job of a kernel or
> Greg> operating system.
>
> Greg> And here's Havoc's response as to why actually, this is a good design:
> Greg> http://lists.freedesktop.org/archives/dbus/2015-April/016651.html
>
> This is an interesting discussion, and one thing that sticks out to me
> is the comments in the URL above talking about how clients are
> supposed to use a generic name to bind to a resource, but actually do
> a lookup to get the specific name, and then bind to THAT.
>
> So the security concerns raised by Andy do seem to make sense, in that
> security needs to be the same across all names of a service, so
> that you don't have problems with varying levels once people have
> connected. In terms of the X11 analogy, if I have someone connect,
> and then I do 'xhost -', it removes all access. It's not dependent on
> whether I'm bound to a specific or general service.
>
> So the security aspect really needs to be that the most restrictive
> takes precedence, not the other way around.

But look at how dbus handles this, isn't this done in the correct way?

> And after having read a bunch of the docs, looked at the FAQ, etc;
> it's still no clearer to me what DBUS and KDBUS provide that's all so
> important or critical. Sure, it might be nice to have, but that's ok.

The first email I wrote here explains all of this, are those not valid
uses for such a service that the kernel can provide?

> So I think those are the steps people need to take: give concrete examples
> of how DBUS is better than anything else out there and won't cause
> more problems down the line.

D-Bus has been around for over 10 years now, and was the result of many
failed attempts to do something much like this (COM, DCOM, CORBA, and a
few others). The developers involved had lots of experience in this
area, and created a solution that ended up working very well for the
problem domain. So well that all other competing technologies in that
area were obsoleted and abandoned, and everyone has moved to D-Bus as it
solves the problems they have in a correct manner.

The reason nothing else has come along might just be because nothing
else _needs_ to come along: D-Bus solves the need.

So unless you see a technical reason why the proposed code is somehow
not correct, I don't understand your complaint.

thanks,

greg k-h
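
The TALK rule quoted from the kdbus documentation above and the "most restrictive takes precedence" alternative John Stoffel argues for differ only in how the per-name verdicts of one connection are combined. The following is a minimal C sketch under stated assumptions: the policy table holds a single "WORLD may TALK" flag per well-known name, and names missing from the table are treated as denied. The `policy_entry` type and all function names are hypothetical illustrations, not the kdbus uapi or its actual implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy entry: whether WORLD (any peer) may TALK to the
 * owner of one well-known name. Illustrative only, not kdbus uapi. */
struct policy_entry {
	const char *name;
	bool world_talk;
};

/* Verdict for a single name; names absent from the table are assumed
 * to be denied (an assumption of this sketch). */
static bool name_allows_talk(const struct policy_entry *policy,
			     size_t npolicy, const char *name)
{
	for (size_t i = 0; i < npolicy; i++)
		if (strcmp(policy[i].name, name) == 0)
			return policy[i].world_talk;
	return false;
}

/* Rule as described in the quoted documentation: policy applies to the
 * connection, so TALK is granted if ANY owned name permits it. */
bool talk_allowed_most_permissive(const struct policy_entry *policy,
				  size_t npolicy,
				  const char *const *owned, size_t nowned)
{
	for (size_t i = 0; i < nowned; i++)
		if (name_allows_talk(policy, npolicy, owned[i]))
			return true;
	return false;
}

/* Alternative proposed in the thread: the most restrictive verdict wins,
 * so TALK is granted only if EVERY owned name permits it. A connection
 * owning no names is denied here (another assumption). */
bool talk_allowed_most_restrictive(const struct policy_entry *policy,
				   size_t npolicy,
				   const char *const *owned, size_t nowned)
{
	if (nowned == 0)
		return false;
	for (size_t i = 0; i < nowned; i++)
		if (!name_allows_talk(policy, npolicy, owned[i]))
			return false;
	return true;
}
```

With a policy that lets WORLD talk only to `org.blah.baz`, a connection owning both `org.foo.bar` and `org.blah.baz` is reachable under the permissive rule but not under the restrictive one, while a connection owning only `org.blah.baz` is reachable under both.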

2015-04-15 08:37:25

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 12:05:01AM +0200, Jiri Kosina wrote:
> On Tue, 14 Apr 2015, Steven Rostedt wrote:
>
> > I believe that Linux Plumbers is still accepting MicroConferences. I
> > wonder if this would be a good one to have. Try to get everyone face to
> > face and talk about how exactly kdbus should be implemented in the
> > kernel.
>
> I personally would even put more emphasis on a session that would first
> focus on "why", before we look at "how".
>
> I have already asked about this during the earlier RFC submissions, but
> the only "take-home message" I took from that discussion was "because it's
> faster than what we currently have". I don't find that a sufficient
> justification by itself for something so complex (with potential
> implications all over the place for the whole Linux ecosystem), especially
> given the fact we already have sealed memfds, zero-copy, etc. (and I am not
> even talking about the "infinite set-in-stone userspace API" implications
> this has).

I wrote many many lines of "why" in the patch submissions, and in the
first email in this thread. Are any of those specific solutions and
"why" reasons not correct in your opinion? If so, great, please let me
know.

But to say that no one is focusing on "why" is a slight to those of us
who have been providing just that.

thanks,

greg k-h

2015-04-15 08:38:32

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 12:39:22AM +0200, Jiri Kosina wrote:
> On Tue, 14 Apr 2015, Greg Kroah-Hartman wrote:
>
> > I don't understand. You can not like the D-Bus model (and accordingly
> > the X11 model),
>
> I thought that the general hatred level of the X11 "model" and the
> protocol led to all the efforts to reimplement this properly ... in
> userspace (for example Wayland, right?).
>
> I don't think anyone was ever seriously suggesting "X11 model is broken,
> so let's push it to kernel" ... ?

Ok, fine, it's a broken metaphor; see Havoc's email for why I brought
that up here. The main point I'm trying to get across is that
applications require a stateful bus.

thanks,

greg k-h

2015-04-15 08:44:53

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 14, 2015 at 09:53:36PM +0200, Borislav Petkov wrote:
> On Tue, Apr 14, 2015 at 09:48:04PM +0200, Greg Kroah-Hartman wrote:
> > It's not going to stop anything from working, it's just going to stop
> > some programs from being able to do things they really want to do (see
> > the first email for examples.)
>
> Until it is made "mandatory" as Al said earlier.

If you really don't like userspace using features the kernel provides
you, well, there's nothing I can say that will change that odd feeling,
sorry.

If we don't want to make the metadata thing optional because everyone
will end up always using it, great, we will go make that change, that's
not an issue at all. It will then end up looking like the first
proposal that was made many months ago :)

> > Yes, we could make this live outside the kernel tree, but that's not the
> > way we work anymore.
>
> > We merge things that are useful, that match our
> > security and coding requirements, and are going to be maintained by
> > people we trust.
>
> We trust? I'm not going to even comment on that.

Really? Who in that MAINTAINERS file entry do you not trust?
Seriously, if that's the issue here, please let me know. Do you not
trust me? Daniel? David? Djalal? All of us have been long-time
kernel developers and maintainers of other portions of the kernel stack
that you rely on every day. If you have objections to any of us
maintaining this code, let me know. Otherwise, stop making foolish
statements.

> And frankly, merging a useful piece of code sounds completely different
> to me than this serious backlash I'm reading from the sidelines.

I don't understand what this means. If you have a technical reason for
why this code shouldn't be merged, great, please let me know and we can
work to address that. Andy and Al have spent time reviewing and giving
us comments, and that's wonderful and valuable and is why I treat their
comments seriously. If you are interested in the code, please review
it, otherwise I don't see what this adds to the conversation at all, do
you?

thanks,

greg k-h

2015-04-15 08:48:21

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 08:54:07AM +0200, Richard Weinberger wrote:
> On Wed, Apr 15, 2015 at 3:36 AM, Andy Lutomirski <[email protected]> wrote:
> >> We had been there before. To paraphrase another... meticulously honorable
> >> person, "if you didn't want something relied upon, why have you put it into the
> >> kernel?" Said person is on the record as having no problem whatsoever with
> >> adding dependencies to the bottom of userland stack.
> >
> > It appears that, if kdbus is merged, upstream udev may end up requiring it:
> >
> > http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html
>
> Why so surprised?
> kdbus will be a major hard-dependency for every non-trivial userland.
> Like cgroups...

Maybe because things like cgroups, and kdbus in the future, solve a
need that the developers in that area have to solve problems and
provide functionality that their users require?

Look, we kernel developers only work on one huge, multithreaded, global
state binary. Our experience in multi-application interactions with
shared state and permission requirements is usually quite limited. If
you don't trust the developers of those programs outside the kernel,
don't use them, there are still distros out there that don't require
them.

But if you do trust them, then don't make snide comments about how they
don't know what they are doing, because that's just flat out rude.

greg k-h

2015-04-15 08:53:00

by Martin Steigerwald

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Am Mittwoch, 15. April 2015, 10:32:19 schrieb Greg Kroah-Hartman:
> On Wed, Apr 15, 2015 at 10:18:46AM +0200, Martin Steigerwald wrote:
> > Am Dienstag, 14. April 2015, 18:36:28 schrieb Andy Lutomirski:
> > > On Mon, Apr 13, 2015 at 1:22 PM, Al Viro <[email protected]> wrote:
> > > > On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman wrote:
> > > >> > I remain opposed to this half thought out trash of an ABI for the
> > > >> > meta-data.
> > > >>
> > > >> You don't have to enable the metadata if you don't want to use it,
> > > >> it's an option :)
> > > >
> > > > OK, _that_ argument needs to be stomped out. It had been used before,
> > > > and it was a deliberate scam. There is no such thing as optional
> > > > kernel interface, especially when udev/dbus/systemd crowd is nearby.
> > > > We'd been through that excuse before; remember how devtmpfs was
> > > > pushed in as "optional"?
> > > >
> > > > This is a huge red flag. On the level of "I need your account
> > > > information to transfer $200M you might have inherited from my
> > > > deceased client".
> > > >
> > > > Just to recap how it went the last time around: Kay kept pushing his
> > > > piece of code into the tree, claiming that it was optional, that
> > > > nobody who doesn't like it has to enable it, so what's the problem?
> > > > OK, in it went. And pretty soon udev (maintained by the same...
> > > > meticulously honorable person) had stopped working on the kernels
> > > > that didn't have that enabled.
> > > >
> > > > We had been there before. To paraphrase another... meticulously
> > > > honorable person, "if you didn't want something relied upon, why have
> > > > you put it into the kernel?" Said person is on the record as having
> > > > no problem whatsoever with adding dependencies to the bottom of
> > > > userland stack.
> > >
> > > It appears that, if kdbus is merged, upstream udev may end up requiring
> > > it:
> > >
> > > http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html
> > >
> > > Grumble.
> >
> > Honestly, I think that tightly coupling systemd and udev to certain
> > kernel versions in lock step is crap.
>
> Where do you see that happening?
>
> > That you require some minimum version after some reasonable time, sure.
> > But in lockstep? Seriously.
>
> Has that happened in the past? Look at the minimum requirements of
> systemd/udev today, something like the 3.7 kernel release, many years
> old.

I am referring to the linked mailing list post from Lennart, which I quote here:

> To make this clear, we expect that systemd and kernels are updated in
> lockstep. We explicitly do not support really old kernels with really
> (which means 3.4 right now), but even that should be taken with a grain
> of salt, as we already made clear that soon after kdbus is merged into
> the kernel we'll probably make a hard requirement on it from the systemd
> side.

That's plenty clear, isn't it? As soon as kdbus is merged into the kernel,
systemd will depend on it, and then… if I need to go back to an older
kernel, I have to downgrade systemd as well?

> > I certainly do not want a broken system just cause I have to load an
> > older kernel version for some reason.
>
> No one does. But, work with your distribution if you end up with
> something like this. Remember, the goal is that you can always run
> newer kernels on older userspace, as that is something that we kernel
> developers can enforce. Userspace programs have other requirements /
> communities, it's up to them to decide what their oldest kernel version
> they wish to support. Hint, even glibc makes these kinds of
> requirements, it's nothing new at all here, so why is this even an
> issue?

It's no issue for me that systemd requires kernel 3.7. But… what Lennart
announces above regarding kdbus reads quite differently.

> > And yes, I think it's good not to force just about any userspace idea
> > into the kernel.
>
> Do you have any technical objections to the patch as proposed?

If I had any, I would have written them. I explained already that I see
that kernel developers have strong technical objections to kdbus, and
that I think it is important to acknowledge them, instead of telling them
that the API is required by userspace, that userspace people know what
they are doing, and that they should just go away with their concerns.

That's at least how I perceived quite a few of your responses.

Well, I did raise an eyebrow at the busname matching rules and the
capability stuff. Yet I didn't comment on them, because I didn't look at
them in depth. I just ask you to take seriously those who did.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

2015-04-15 08:54:50

by Jiri Kosina

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:

> If you have a technical reason for why this code shouldn't be merged,
> great, please let me know and we can work to address that. Andy and Al
> have spent time reviewing and giving us comments, and that's wonderful
> and valuable and is why I treat their comments seriously. If you are
> interested in the code, please review it, otherwise I don't see what
> this adds to the conversation at all, do you?

You've actually touched on another issue I see here, and that is that the
code is insanely complex.

I've spent a big part of the past two days trying to get my head around it,
but I am still far away from getting even a 1000-mile overview of how
exactly the message passing is designed.

I understand that the primary reason for this complexity is probably the
dbus protocol specification itself.

But the problem really is that I don't think you've received even a single
Reviewed-by: from someone who hasn't been directly involved in developing
the code, right?

For something that's potentially such a core mechanism as a completely
new, massively-adopted IPC, this does send a warning signal.

Thanks,

--
Jiri Kosina
SUSE Labs

2015-04-15 08:56:57

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 12:33:30AM +0200, Jiri Kosina wrote:
> On Tue, 14 Apr 2015, Greg Kroah-Hartman wrote:
>
> > Yes, it's an unfortunate design, but one that we are all stuck with
> > (think of it as having to implement code for horrid hardware that you
> > have to get to work properly.)
>
> Greg, I personally consider this a rather defunct analogy. Broken hardware
> comes from "outer space"; we just have to live with it somehow, and
> eventually try to gradually improve it by working with vendors (and you
> yourself have of course made huge improvements in this very area).
>
> Linux userspace is coming, well, from Linux developers. The sole fact that
> someone wrote a daemon that runs on Linux seems like a very poor
> justification for sucking the daemon into the kernel "because we have to
> live with it".
> Userspace has to live with it somehow (and eventually fix itself if
> necessary), yes. Why should kernel just contribute to this "unfortunate
> design" if it really isn't, in any way, obliged or forced to do so?

I retract my "unfortunate design" statement, as Havoc pointed out
exactly why that design is the way it is, and it makes sense to me.

To quote the email that he wrote:
    The reason is that dbus views the world in a stateful way
    assuming that connections, and name ownership, can be tracked
    reliably. This is different from say http, and it's one reason
    that people used to Internet-oriented protocols find dbus
    strange.

I'm one of those "people used to internet-oriented protocols", and I bet
that almost all of us kernel developers also fall into that category, as
the kernel, for the most part, is one big tool to help implement those
Internet-oriented protocols :)

The very history of D-Bus, where it came from, who is now using it, what
happened to all of the other proposed solutions in this area, is worth
examining if you are interested in it. This type of protocol solves a
real problem in this area, one that everyone has congregated on as the
best-known solution for that issue. It's used everywhere, on servers,
embedded systems, desktops, you name it. All languages have bindings
for it, and it's the underpinning of a modern Linux stack. For us to
somehow say that it's a "horrible protocol" is terribly unfair, and
unkind, to all of the people who have worked to make it the best
possible solution for this problem space.

And honestly, I don't have a better proposal. And I seriously doubt
that anyone here does either. In the many years I've spent working on
this, dbus has seemed odd and strange compared to the way the kernel
has normally worked, because it is. And that's not a bad thing, it's
just different, and supporting the real needs and requirements of our
users is the job of the Linux kernel.

Now if there are technical problems or insecurities in the proposed code
submission, wonderful, please let me know and I'll be glad to work to
address them. But let's just drop the whole "oooh, look, D-Bus is
horrible looking, we can't support that!" argument; it is not a valid
justification.

And I'll refer back to the old AF_DBUS proposal, which the network
developers looked at from a technical point of view. They said that
they didn't think that putting the D-Bus model into a network stack made
any sense technically, and outlined their objections. And they were
right, hence this different proposal many years later based on their
insight and suggestions.

If you have objections like that, great, please let me know.

thanks,

greg k-h

2015-04-15 09:01:06

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Am 15.04.2015 um 10:48 schrieb Greg Kroah-Hartman:
> On Wed, Apr 15, 2015 at 08:54:07AM +0200, Richard Weinberger wrote:
>> On Wed, Apr 15, 2015 at 3:36 AM, Andy Lutomirski <[email protected]> wrote:
>>>> We had been there before. To paraphrase another... meticulously honorable
>>>> person, "if you didn't want something relied upon, why have you put it into the
>>>> kernel?" Said person is on the record as having no problem whatsoever with
>>>> adding dependencies to the bottom of userland stack.
>>>
>>> It appears that, if kdbus is merged, upstream udev may end up requiring it:
>>>
>>> http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html
>>
>> Why so surprised?
>> kdbus will be a major hard-dependency for every non-trivial userland.
>> Like cgroups...
>
> Maybe because things like cgroups, and kdbus in the future, solve a
> need that the developers in that area have to solve problems and
> provide functionality that their users require?

I agree that a high-level bus is needed and dbus is not perfect.
But this does not mean that we need an in-kernel dbus in any case.

> Look, we kernel developers only work on one huge, multithreaded, global
> state binary. Our experience in multi-application interactions with
> shared state and permission requirements is usually quite limited. If
> you don't trust the developers of those programs outside the kernel,
> don't use them, there are still distros out there that don't require
> them.

We're all forced to use cgroups, systemd, udev unless we want to have busybox
as userland. That's a fact.
systemd and its dependencies are not a bad thing per se.
But we have to be very sure that new hard dependencies are
in good shape before we push them into the kernel.
IMHO this is also Andy and Eris's point.

Thanks,
//richard

2015-04-15 09:02:25

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 10:52:37AM +0200, Martin Steigerwald wrote:
> Am Mittwoch, 15. April 2015, 10:32:19 schrieb Greg Kroah-Hartman:
> > On Wed, Apr 15, 2015 at 10:18:46AM +0200, Martin Steigerwald wrote:
> > > Am Dienstag, 14. April 2015, 18:36:28 schrieb Andy Lutomirski:
> > > > On Mon, Apr 13, 2015 at 1:22 PM, Al Viro <[email protected]>
> > >
> > > wrote:
> > > > > On Mon, Apr 13, 2015 at 09:42:17PM +0200, Greg Kroah-Hartman
> wrote:
> > > > >> > I remain opposed to this half thought out trash of an ABI for
> > > > >> > the
> > > > >> > meta-data.
> > > > >>
> > > > >> You don't have to enable the metadata if you don't want to use
> > > > >> it,
> > > > >> it's
> > > > >> an option :)
> > > > >
> > > > > OK, _that_ argument needs to be stomped out. It had been used
> > > > > before,
> > > > > and it was a deliberate scam. There is no such thing as optional
> > > > > kernel interface, especially when udev/dbus/systemd crowd is
> > > > > nearby.
> > > > > We'd been through that excuse before; remember how devtmpfs was
> > > > > pushed in as "optional"?
> > > > >
> > > > > This is a huge red flag. On the level of "I need your account
> > > > > information to transfer $200M you might have inherited from my
> > > > > deceased client".
> > > > >
> > > > > Just to recap how it went the last time around: Kay kept pushing
> > > > > his
> > > > > piece of code into the tree, claiming that it was optional, that
> > > > > nobody who doesn't like it has to enable it, so what's the
> > > > > problem?
> > > > > OK, in it went. And pretty soon udev (maintained by the same...
> > > > > meticulously honorable person) had stopped working on the kernels
> > > > > that didn't have that enabled.
> > > > >
> > > > > We had been there before. To paraphrase another... meticulously
> > > > > honorable person, "if you didn't want something relied upon, why
> > > > > have
> > > > > you put it into the kernel?" Said person is on the record as
> > > > > having
> > > > > no problem whatsoever with adding dependencies to the bottom of
> > > > > userland stack.
> > > >
> > > > It appears that, if kdbus is merged, upstream udev may end up
> > > > requiring
> > > > it:
> > > >
> > > > http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.
> > > > html
> > > >
> > > > Grumble.
> > >
> > > Honestly, I think that tightly coupling systemd and udev to certain
> > > kernel versions in lock step is crap.
> >
> > Where do you see that happening?
> >
> > > That you require some minimum version after some reasonable time, sure.
> > > But in lockstep? Seriously.
> >
> > Has that happened in the past? Look at the minimum requirements of
> > systemd/udev today, something like the 3.7 kernel release, many years
> > old.
>
> I refer to the linked mailing list post from Lennart as I quote here:
>
> > To make this clear, we expect that systemd and kernels are updated in
> > lockstep. We explicitly do not support really old kernels with really
> > (which means 3.4 right now), but even that should be taken with a grain
> > of salt, as we already made clear that soon after kdbus is merged into
> > the kernel we'll probably make a hard requirement on it from the systemd
> > side.
>
> That's plenty clear, isn't it? As soon as kdbus is merged into the kernel,
> systemd will depend on it, and then… if I need to go back to an older kernel,
> I have to downgrade systemd as well?
>
> > > I certainly do not want a broken system just cause I have to load an
> > > older kernel version for some reason.
> >
> > No one does. But, work with your distribution if you end up with
> > something like this. Remember, the goal is that you can always run
> > newer kernels on older userspace, as that is something that we kernel
> > developers can enforce. Userspace programs have other requirements /
> > communities, it's up to them to decide the oldest kernel version
> > they wish to support. Hint, even glibc makes these kinds of
> > requirements, it's nothing new at all here, so why is this even an
> > issue?
>
> It's no issue for me that systemd required kernel 3.7. But… what Lennart
> announces above regarding kdbus reads quite differently.

Adding features to the systemd repo, and then having those releases make
it out to your distro is a multi-year timeframe normally, and
multi-month at the least. If a distro made such a decision to not
support old kernels by accepting such a userspace requirement, take it
up with them.

And there are forks of systemd that keep around older kernel support,
and distros use them for this very reason. Because they want to use old
kernel versions, and that's great.

It's the same for any kernel feature, programs are free to use them if
they want to. If glibc were to make the requirement tomorrow that they
are going to use memfd for their internal use and require that everyone
update their kernels for their new release, we would all laugh that that
is pretty funny and their user base would suffer.

But again, that's nothing that the kernel has any control over, take it
up with that project if you object to that.

Personally, I want people to use the new code/features I provide them in
the kernel, and get upset when people don't. Otherwise, why would I
have spent so much time creating them and supporting them in the first
place?

> > > And yes, I think it's good not to force just about any userspace idea
> > > into the kernel.
> >
> > Do you have any technical objections to the patch as proposed?
>
> If I had, I would have written it. I explained already that I see that
> kernel developers have strong technical objections to kdbus. And that I
> think it is important to acknowledge it, instead of telling them that the
> API is required from userspace, userspace people know what they do, and
> they should just go away with their concerns.
>
> That's at least how I received quite a few of your responses.
>
> Well, and I raised an eyebrow at the busname matching rules and the
> capability stuff. Yet, I didn't comment on it, because I didn't look at it
> in depth. I just ask you to take those seriously who did.

I take technical comments very seriously, where have I not? If you have
technical reasons why the current implementation has problems, please
let me know, and I will be glad to address them.

thanks,

greg k-h

2015-04-15 09:10:01

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 10:54:41AM +0200, Jiri Kosina wrote:
> On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:
>
> > If you have a technical reason for why this code shouldn't be merged,
> > great, please let me know and we can work to address that. Andy and Al
> > have spent time reviewing and giving us comments, and that's wonderful
> > and valuable and is why I treat their comments seriously. If you are
> > interested in the code, please review it, otherwise I don't see what
> > this adds to the conversation at all, do you?
>
> You've actually touched another issue I see here, and that is -- the code
> is complex like crazy.
>
> I've spent big part of past two days trying to get my head around it, but
> I am still far away from getting at least the 1000 miles overview of how
> exactly the message passing is designed.
>
> I understand that the primary reason for this complexity is probably the
> dbus protocol specification itself.

Yes it is.

> But the problem really is that I don't think you've received even a single
> Reviewed-by: from someone who hasn't been directly involved in developing
> the code, right?

I've asked for it, but finding people to review code is hard, as you
know. It's only 13k lines long, smaller than a serial port driver (my
unit of code review), so it's not all that big.

It's smaller than the USB3 host controller driver as well, and very few
people ever reviewed that beast :)

> For something that's potentially such a core mechanism as a completely
> new, massively-adopted IPC, this does send a warning signal.

If you know of a way to force others to review code, please let me know.

thanks,

greg k-h

2015-04-15 09:20:47

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 11:00:50AM +0200, Richard Weinberger wrote:
> On 15.04.2015 at 10:48, Greg Kroah-Hartman wrote:
> > On Wed, Apr 15, 2015 at 08:54:07AM +0200, Richard Weinberger wrote:
> >> On Wed, Apr 15, 2015 at 3:36 AM, Andy Lutomirski <[email protected]> wrote:
> >>>> We had been there before. To paraphrase another... meticulously honorable
> >>>> person, "if you didn't want something relied upon, why have you put it into the
> >>>> kernel?" Said person is on the record as having no problem whatsoever with
> >>>> adding dependencies to the bottom of userland stack.
> >>>
> >>> It appears that, if kdbus is merged, upstream udev may end up requiring it:
> >>>
> >>> http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html
> >>
> >> Why so surprised?
> >> kdbus will be a major hard-dependency for every non-trivial userland.
> >> Like cgroups...
> >
> > Maybe because things like cgroups, and kdbus in the future, solves a
> > need that the developers in that area have to solve problems and
> > provide functionality that their users require?
>
> I agree that a high level bus is needed and dbus is not perfect.
> But this does not mean that we need an in-kernel dbus in any case.

So what do you propose to solve the issues presented in my original
email about the usecases that this code addresses?

> > Look, us kernel developers only work on one huge, multithreaded, global
> > state binary. Our experience in multi-application interactions with
> > shared state and permission requirements is usually quite limited. If
> > you don't trust the developers of those programs outside the kernel,
> > don't use them, there are still distros out there that don't require
> > them.
>
> We're all forced to use cgroups, systemd, udev unless we want to have busybox
> as userland. That's a fact.

Is that a problem?

> systemd and its dependencies are not a bad thing per se.
> But we have to be very sure that new hard-dependencies are
> in good shape before we push them into the kernel.

That's fine, and normal, and I expect it. But please provide technical
reasons why the proposal is not acceptable, like Andy has done in this
thread.

thanks,

greg k-h

2015-04-15 09:24:20

by Borislav Petkov

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote:
> > We're all forced to use cgroups, systemd, udev unless we want to have busybox
> > as userland. That's a fact.
>
> Is that a problem?

I'm amazed that you're really actually asking that question :-(

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-15 09:27:18

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote:
> On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote:
> > > We're all forced to use cgroups, systemd, udev unless we want to have busybox
> > > as userland. That's a fact.
> >
> > Is that a problem?
>
> I'm amazed that you're really actually asking that question :-(

Really? Why can't userspace rely on the features that the kernel
provides them? If not, why would the feature be created and supported
by us kernel developers in the first place?

That makes no sense at all, please explain.

greg k-h

2015-04-15 09:28:36

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 15.04.2015 at 11:20, Greg Kroah-Hartman wrote:
> On Wed, Apr 15, 2015 at 11:00:50AM +0200, Richard Weinberger wrote:
>> On 15.04.2015 at 10:48, Greg Kroah-Hartman wrote:
>>> On Wed, Apr 15, 2015 at 08:54:07AM +0200, Richard Weinberger wrote:
>>>> On Wed, Apr 15, 2015 at 3:36 AM, Andy Lutomirski <[email protected]> wrote:
>>>>>> We had been there before. To paraphrase another... meticulously honorable
>>>>>> person, "if you didn't want something relied upon, why have you put it into the
>>>>>> kernel?" Said person is on the record as having no problem whatsoever with
>>>>>> adding dependencies to the bottom of userland stack.
>>>>>
>>>>> It appears that, if kdbus is merged, upstream udev may end up requiring it:
>>>>>
>>>>> http://lists.freedesktop.org/archives/systemd-devel/2014-May/019657.html
>>>>
>>>> Why so surprised?
>>>> kdbus will be a major hard-dependency for every non-trivial userland.
>>>> Like cgroups...
>>>
>>> Maybe because things like cgroups, and kdbus in the future, solves a
>>> need that the developers in that area have to solve problems and
>>> provide functionality that their users require?
>>
>> I agree that a high level bus is needed and dbus is not perfect.
>> But this does not mean that we need an in-kernel dbus in any case.
>
> So what do you propose to solve the issues presented in my original
> email about the usecases that this code addresses?
>
>>> Look, us kernel developers only work on one huge, multithreaded, global
>>> state binary. Our experience in multi-application interactions with
>>> shared state and permission requirements is usually quite limited. If
>>> you don't trust the developers of those programs outside the kernel,
>>> don't use them, there are still distros out there that don't require
>>> them.
>>
>> We're all forced to use cgroups, systemd, udev unless we want to have busybox
>> as userland. That's a fact.
>
> Is that a problem?
>
>> systemd and its dependencies are not a bad thing per se.
>> But we have to be very sure that new hard-dependencies are
>> in good shape before we push them into the kernel.
>
> That's fine, and normal, and I expect it. But please provide technical
> reasons why the proposal is not acceptable, like Andy has done in this
> thread.

I did not state that the proposal is not acceptable.
My statement was that we have to be well aware of the fact that
we will be forced to use kdbus in future as it will become a dependency.

Some developers on IRC said they don't care about kdbus at all as long as they
can disable it. This is wrong, we have to use it. And that is fine.
But we all have to be aware of the implications. kdbus will be ABI.

Thanks,
//richard

2015-04-15 09:28:48

by Martin Steigerwald

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wednesday, 15 April 2015, 11:02:12, Greg Kroah-Hartman wrote:
> > > > And yes, I think it's good not to force just about any userspace
> > > > idea into the kernel.
> > >
> > > Do you have any technical objections to the patch as proposed?
> >
> > If I had, I would have written it. I explained already that I see
> > that kernel developers have strong technical objections to kdbus. And
> > that I think it is important to acknowledge it, instead of telling
> > them that the API is required from userspace, userspace people know
> > what they do, and they should just go away with their concerns.
> >
> > That's at least how I received quite a few of your responses.
> >
> > Well, and I raised an eyebrow at the busname matching rules and the
> > capability stuff. Yet, I didn't comment on it, because I didn't look at
> > it in depth. I just ask you to take those seriously who did.
>
> I take technical comments very seriously, where have I not? If you have
> technical reasons why the current implementation has problems, please
> let me know, and I will be glad to address them.

From what I read, you basically answered all technical comments like this:

The dbus API is like it is for a very good reason, everyone is using it
and everyone agrees. Capabilities are used in userspace for good reason
and so on.

But I see, here, not everyone does.

Most of your answers didn't seem to address the concerns raised about having
this in the *kernel*. Especially the security concerns.

That's what I meant by "And yes, I think it's good not to force just about
any userspace into the kernel". I think arguing with the "this is how
userspace does it" pattern, even if it truly is for a very good reason, is
not sufficient as an argument for having it in the kernel.

I am just looking at the argumentative pattern here. If other kernel
developers complain about how hard it is to review and wrap their mind
around the kdbus patches… I am scared at just trying to understand the
patches. So no technical complaints from me. I did not nack it nor do I
see myself in the position to nack it.

So feel free to do with my argument what you like. I just tried to
understand why the communication in here goes in circles as it does, and I
think it will continue to do so as long as the only arguments are "userspace
does it that way" or "this is optional". For the discussion to go anywhere,
it's important to acknowledge each other.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

2015-04-15 09:31:03

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 15.04.2015 at 11:27, Greg Kroah-Hartman wrote:
> On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote:
>> On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote:
>>>> We're all forced to use cgroups, systemd, udev unless we want to have busybox
>>>> as userland. That's a fact.
>>>
>>> Is that a problem?
>>
>> I'm amazed that you're really actually asking that question :-(
>
> Really? Why can't userspace rely on the features that the kernel
> provides them? If not, why would the feature be created and supported
> by us kernel developers in the first place?

This is IMHO not the problem.
But if we add a new component to the kernel which *will* be used
by almost every userland out there (systemd won the "init wars")
we have to make sure that we're all fine with it.
Andy and Eric have some very valid concerns.

Thanks,
//richard

2015-04-15 09:37:38

by Borislav Petkov

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 10:44:40AM +0200, Greg Kroah-Hartman wrote:
> If you really don't like userspace using features the kernel provides
> you, well, there's nothing I can say that will change that odd feeling,
> sorry.

Are you even reading what people are saying?

I don't like the mandatory(!) aspect of this, which it will eventually
become. There is this thing called "choice", remember?

> Really? Who in that MAINTAINERS file entry do you not trust?

The fact that you're still pushing for this current design *in the face*
of people pointing out serious design flaws with this makes me not
really trust you.

> I don't understand what this means. If you have a technical reason
> for why this code shouldn't be merged, great, please let me know and
> we can work to address that. Andy and Al have spent time reviewing
> and giving us comments, and that's wonderful and valuable and is
> why I treat their comments seriously. If you are interested in the
> code, please review it,

Yeah, I took a brief look at the code. It is overcomplicated.

If I were to review it properly, I'd ask you to split it in small
patchsets. Hell, I'm pretty sure you would do the same for code you
don't know if you were in my shoes.

Also, considering the complexity of this patchset, it doesn't have
a single Reviewed-by by an external party. If this were any other
submission, it would've been kicked to the curb a long time ago.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-15 09:46:41

by Borislav Petkov

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 11:27:13AM +0200, Greg Kroah-Hartman wrote:
> On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote:
> > On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote:
> > > > We're all forced to use cgroups, systemd, udev unless we want to have busybox
> > > > as userland. That's a fact.
> > >
> > > Is that a problem?
> >
> > I'm amazed that you're really actually asking that question :-(
>
> Really? Why can't userspace rely on the features that the kernel
> provides them?

Userspace can do whatever it wants. As long as I'm not being *forced* to
do what userspace thinks is the right thing.

It seems to me that since that whole systemd* debacle started, we're
forgetting the choice aspect.

And dammit, I want my choice. I want to be able to choose what I'm
running. Not run what someone else thought what would be good for me to
run. If I wanted that, I'd long switched to windoze or äbble.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-15 09:49:41

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 11:30:52AM +0200, Richard Weinberger wrote:
> On 15.04.2015 at 11:27, Greg Kroah-Hartman wrote:
> > On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote:
> >> On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote:
> >>>> We're all forced to use cgroups, systemd, udev unless we want to have busybox
> >>>> as userland. That's a fact.
> >>>
> >>> Is that a problem?
> >>
> >> I'm amazed that you're really actually asking that question :-(
> >
> > Really? Why can't userspace rely on the features that the kernel
> > provides them? If not, why would the feature be created and supported
> > by us kernel developers in the first place?
>
> This is IMHO not the problem.
> But if we add a new component to the kernel which *will* be used
> by almost every userland out there (systemd won the "init wars")
> we have to make sure that we're all fine with it.

Sure, but why would this be different from any other kernel feature that
we add? We have to be sure we are fine with everything we merge, as we
are saying we are going to maintain this stuff for forever.

> Andy and Eric have some very valid concerns.

I've tried to address Andy's concerns, Eric is not being very specific,
so there's nothing I can do there :)

thanks,

greg k-h

2015-04-15 09:53:49

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 15.04.2015 at 11:49, Greg Kroah-Hartman wrote:
> On Wed, Apr 15, 2015 at 11:30:52AM +0200, Richard Weinberger wrote:
>> On 15.04.2015 at 11:27, Greg Kroah-Hartman wrote:
>>> On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote:
>>>> On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote:
>>>>>> We're all forced to use cgroups, systemd, udev unless we want to have busybox
>>>>>> as userland. That's a fact.
>>>>>
>>>>> Is that a problem?
>>>>
>>>> I'm amazed that you're really actually asking that question :-(
>>>
>>> Really? Why can't userspace rely on the features that the kernel
>>> provides them? If not, why would the feature be created and supported
>>> by us kernel developers in the first place?
>>
>> This is IMHO not the problem.
>> But if we add a new component to the kernel which *will* be used
>> by almost every userland out there (systemd won the "init wars")
>> we have to make sure that we're all fine with it.
>
> Sure, but why would this be different from any other kernel feature that
> we add? We have to be sure we are fine with everything we merge, as we
> are saying we are going to maintain this stuff for forever.

There is nothing different. The series currently has two NACKs,
0 ACKs and 0 Reviews.
I don't think that any other series would get merged in such a state.

>> Andy and Eric have some very valid concerns.
>
> I've tried to address Andy's concerns, Eric is not being very specific,
> so there's nothing I can do there :)

What about Steven's proposal to talk at Plumbers?
I fear the discussion is at a dead end and needs a face to face
resolution.

Thanks,
//richard

2015-04-15 10:38:26

by Alan Cox

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 15 Apr 2015 00:39:22 +0200 (CEST)
Jiri Kosina <[email protected]> wrote:

> On Tue, 14 Apr 2015, Greg Kroah-Hartman wrote:
>
> > I don't understand. You can not like the D-Bus model (and accordingly
> > the X11 model),
>
> I thought that the general hatred level of the X11 "model" and the
> protocol led to all the efforts to reimplement this properly ... in
> userspace (for example Wayland, right?).
>
> I don't think anyone was ever seriously suggesting "X11 model is broken,
> so let's push it to kernel" ... ?

The X11 model is *nothing* to do with the dbus/kdbus model. X11 does
properties by attaching them to windows. Those properties can be
monitored for changes and they can be queried. Setting them is
asynchronous, querying them is sync or with the newer event based
libraries can be async. X11 properties are network safe, handled through
the same X11 authority as everything else. Two apps can happily run on
different systems sharing a display over the network and sharing and
responding to changes in X11 properties - and it just works.

The Gnome people tried to re-invent X11 properties and embedding badly
with CORBA, then with dbus, despite the fact the Andrew system could
already do it really fast and cleanly even before Gnome was thought of.

There is no comparison between the elegance of X11 property setting and a
chunk of proposed kernel code that is half the size of a tiny X server!

The dbus model is also flawed in a load of other ways in user space
because message handling in the hands of people with no concept of
systemic performance analysis just leads to disaster. One of the big
reasons dbus is so "slow" isn't that dbus is "slow", it's that the
crapware on top of it makes *thousands* of dbus queries.

If you must do it in kernel why not use the Android binder - it's awful,
broken, and dubiously secure, but at least we'd still only have one awful,
broken dubiously secure rpc/property layer in kernel.

"It's the issue that a stateful bus is required for
applications that is the main point I'm trying to get across."

That would be the "if dbus crashes I have to reboot" design flaw of
Gnome and friends. The only state you need is beyond the endpoints. It's a
message passing system. If you think message passing needs state then I'd
take a look at the internet. State belongs in the end points.

It's telling that I can lose and recover my internet connection without
rebooting but not my desktop's internal messaging.

Alan

2015-04-15 11:06:51

by Alan Cox

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

> To quote the email that he wrote:
> The reason is that dbus views the world in a stateful way
> assuming that connections, and name ownership, can be tracked
> reliably. This is different from say http, and it's one reason
> that people used to Internet-oriented protocols find dbus
> strange.
>
> I'm one of those "people used to internet-oriented protocols", and I bet
> that almost all of us kernel developers also fall into that category, as
> the kernel for the most part, is one big tool to help implement those
> Internet-oriented protocols :)

I worked on protocols with state. I suffered X.25, X.29, coloured book,
ISDN. It's a completely *crap* model. It has unfixable reliability
problems. It has unfixable flow control problems. The only thing it buys
you is the ability to have more traffic in flight between end points than
you have transient memory for at the endpoints.

You don't need a grand unified state to track service locations and access
(ie names), which is fortunate or we'd be rebooting the internet and
all attached computers all the time.

> The very history of D-Bus, where it came from, who is now using it, what
> happened to all of the other proposed solutions in this area, is worth
> examining if you are interested in it. This type of protocol solves a

History is why you got where you did. The history of Windows 98 explains
how they got there. It doesn't mean that continuing the same mistake is a
good idea.

> embedded systems, desktops, you name it. All languages have bindings
> for it, and it's the underpinning of a modern Linux stack. For us to

Everything used to have just a choice of COBOL or FORTRAN bindings. That
was not a good reason to continue to program the world in either of them.

> that anyone here does either. In the many years I've spent working on
> this, dbus has seemed to be odd, and strange, to the way that the kernel
> has normally worked, because it is. And that's not a bad thing, it's
> just different, and for us to support real needs and requirements of our
> users, is the requirement of the Linux kernel.

There are, I think, a set of intertwined problems here:

- An efficient delivery system for multicast messages delivered locally
(be that MPI, dbus whatever - it's not "dbus or nothing")

- A kernel side dynamic namespace to describe what goes where

- A kernel side security model to describe who may receive what, and
which additional information/tags/cred info

- Something that provides state to stuff that needs it (and probably
belongs in userspace - dbus name service etc)

- Something that maps dbus and other models onto the kernel security
model (and we have tools like EBPF which are very powerful)

- Something that maps the kernel layer onto models like MPI-3

> Now if there are technical problems or insecurities in the proposed code
> submission, wonderful, please let me know and I'll be glad to work to
> address them. But let's just drop the whole "oooh, look, D-Bus is
> horrible looking, we can't support that!", is not a valid justification.

We can however leave it in userspace until we understand the right small
clean way to support it and other needs. At the moment for example
cluster people can't really use this stuff because it's not network aware,
and HPC people can't use it because it's got dbus hardwired into it so
can't speak MPI-3 and the like even though MPI-3 has similar concepts
around DPM, as well as having proper models for parallelism and
collective operations that are lacking in dbus.

If the userspace folks choose to continue to implement dbust over it but
the kernel layer is clean and generic then all is good, because someone
can replace dbust with something better. If it's got dbust hard-wired into
it then it's a complete mess.

Alan

2015-04-15 11:26:35

by Alan Cox

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

> Look, us kernel developers only work on one huge, multithreaded, global
> state binary. Our experience in multi-application interactions with
> shared state and permission requirements is usually quite limited. If
> you don't trust the developers of those programs outside the kernel,
> don't use them, there are still distros out there that don't require
> them.

Speak for yourself. There are a lot of us here who work and have worked
on low level messaging, on networking, on clusters and on things like
distributed shared memory, infiniband etc. I've worked on networks,
including broken stateful protocols, I've maintained and developed
internet and ISDN router code, I've worked with message passing realtime
systems.

Equally the folks who wrote dbus generally also know sweet fa about
writing a kernel and maintaining it for 25 years. Gtk is on its 3rd
completely incompatible instance (and has incompatibilities even within
major versions), Gnome is on its third major incompatible release -
closer would be to say at least the "second project with the same name",
and neither are as old as the kernel.

dbus is not an appropriate design for a kernel messaging layer for a
variety of reasons. That's not to say dbus shouldn't be able to use a
fast kernel messaging layer, or that one shouldn't exist.

dbus is basically a very large very specialized and somewhat flawed
policy engine on top of what should be simple messaging. The two need
splitting apart.

Abstract low level messaging layers are not a new concept. V7 unix had
one experimentally. It's about getting the separation right.

IMHO that probably involves getting the right people in the right place
together - dbus designers, MPI and realtime people, kernel folks and
possibly also some of the hardware messaging folk.

In filesystem terms

- stop writing a dbus only file system
- figure out what a messaging "vfs" looks like
- figure out what a clean low-level kernel model looks like
- figure out what has to be where to put the policy in userspace

What might also be worth review is how much dbus traffic actually ought to
be an object store implemented say with tmpfs and inotify type
functionality (or extensions of that) so that you can
set/read/enumerate/get change notifications on properties.

Alan

2015-04-15 11:40:48

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 11:44:11AM +0200, Borislav Petkov wrote:
> On Wed, Apr 15, 2015 at 11:27:13AM +0200, Greg Kroah-Hartman wrote:
> > On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote:
> > > On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote:
> > > > > We're all forced to use cgroups, systemd, udev unless we want to have busybox
> > > > > as userland. That's a fact.
> > > >
> > > > Is that a problem?
> > >
> > > I'm amazed that you're really actually asking that question :-(
> >
> > Really? Why can't userspace rely on the features that the kernel
> > provides them?
>
> Userspace can do whatever it wants. As long as I'm not being *forced* to
> do what userspace thinks is the right thing.
>
> It seems to me that since that whole systemd* debacle started, we're
> forgetting the choice aspect.

What "choice" aspect? Surely you aren't going to make the "Linux is
about choice" argument are you?

> And dammit, I want my choice. I want to be able to choose what I'm
> running. Not run what someone else thought what would be good for me to
> run. If I wanted that, I'd long switched to windoze or äbble.

Oh crap, you went there :)

Take a look at http://www.islinuxaboutchoice.com/ please.

And yes, you can take Linux (the kernel) and do whatever you want with
it (look at Android for an example of no existing userspace code, just
the kernel and everything else new for a "choice".)

You have to trust someone to help make your system work together in a
unified way. If you can't trust your distro's engineers, then either
start your own distro, or only run busybox on top of a kernel. You
really don't have much other "choice" than that :)

So stop making this discussion be about "oh those horrid systemd
developers, I don't want their code as my init system" as that's not
what any of this is about at all. It's about the patches being
proposed, and the API involved in it. Please stick to that.

greg k-h

2015-04-15 11:45:15

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 11:35:07AM +0200, Borislav Petkov wrote:
> On Wed, Apr 15, 2015 at 10:44:40AM +0200, Greg Kroah-Hartman wrote:
> > If you really don't like userspace using features the kernel provides
> > you, well, there's nothing I can say that will change that odd feeling,
> > sorry.
>
> Are you even reading what people are saying?

You aren't reading the patches :)

> I don't like the mandatory(!) aspect of this, which it will eventually
> become. There is this thing called "choice", remember?

See my other response about that.

> > Really? Who in that MAINTAINERS file entry do you not trust?
>
> The fact that you're still pushing for this current design *in the face*
> of people pointing out serious design flaws with this makes me not
> really trust you.

Please discuss these "serious design flaws". I have responded to all of
the ones that I have seen so far in this thread. And in all of the
other threads since this patch series was first posted months ago. I
would love to discuss the code, so please, let's do that.

> > I don't understand what this means. If you have a technical reason
> > for why this code shouldn't be merged, great, please let me know and
> > we can work to address that. Andy and Al have spent time reviewing
> > and giving us comments, and that's wonderful and valuable and is
> > why I treat their comments seriously. If you are interested in the
> > code, please review it,
>
> Yeah, I took a brief look at the code. It is overcomplicated.
>
> If I were to review it properly, I'd ask you to split it in small
> patchsets. Hell, I'm pretty sure you would do the same for code you
> don't know if you were in my shoes.

It has been split into small patchsets, see the original postings.

And really, 13k lines of code is not all that big. We review driver
submissions larger than that all the time. Remember, your USB host
controller driver is bigger than that.

> Also, considering the complexity of this patchset, it doesn't have
> a single Reviewed-by by an external party. If this were any other
> submission, it would've been kicked to the curb a long time ago.

Please, review it, I would love for others to do so, and have been
asking for that since the beginning of this whole process months ago.

And I'd like to thank Andy and others for doing that. Based on their
review comments we have changed the api, redone the infrastructure, and
modified lots of different things. The code has massively changed for
the better because of this process. I'm not asking for it to stop, I'm
asking for it to be merged now as no one seems to have any more
comments on the code, other than Andy's specific comments, and
everyone else's vague rants.

I'm addressing Andy's comments, and I would love to address yours, if
you actually made any technical ones here.

thanks,

greg k-h

2015-04-15 11:49:46

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 11:37:27AM +0100, One Thousand Gnomes wrote:
> On Wed, 15 Apr 2015 00:39:22 +0200 (CEST)
> Jiri Kosina <[email protected]> wrote:
>
> > On Tue, 14 Apr 2015, Greg Kroah-Hartman wrote:
> >
> > > I don't understand. You can not like the D-Bus model (and accordingly
> > > the X11 model),
> >
> > I thought that the general hatred level of the X11 "model" and the
> > protocol led to all the efforts to reimplement this properly ... in
> > userspace (for example Wayland, right?).
> >
> > I don't think anyone was ever seriously suggesting "X11 model is broken,
> > so let's push it to kernel" ... ?
>
> The X11 model is *nothing* to do with the dbus/kdbus model. X11 does
> properties by attaching them to windows. Those properties can be
> monitored for changes and they can be queried. Setting them is
> asynchronous, querying them is sync or with the newer event based
> libraries can be async. X11 properties are network safe, handled through
> the same X11 authority as everything else. Two apps can happily run on
> different systems sharing a display over the network and sharing and
> responding to changes in X11 properties - and it just works.
>
> The Gnome people tried to re-invent X11 properties and embedding badly
> with CORBA, then with dbus, despite the fact the Andrew system could
> already do it really fast and cleanly even before Gnome was thought of.
>
> There is no comparison between the elegance of X11 property setting and a
> chunk of proposed kernel code that is half the size of a tiny X server!

Hey, take that up with Havoc, he made the comparison :)

> The dbus model is also flawed in a load of other ways in user space
> because message handling in the hands of people with no concept of
> systemic performance analysis just leads to disaster. One of the big
> reasons dbus is so "slow" isn't that dbus is "slow", it's that the
> crapware on top of it makes *thousands* of dbus queries.

There's the issue of thousands of dbus queries, and then there's the
issue that making those queries takes a measurable amount of time. We
can fix the latter one; the first one, well, not so much, but we can
provide the resources for them to make a faster system if they want to.

> If you must do it in kernel why not use the Android binder - it's awful,
> broken, and dubiously secure, but at least we'd still only have one awful,
> broken dubiously secure rpc/property layer in kernel.

Binder does not match up to the dbus model at all, I've written about
this in the past, and can dig it up again if you want. And, there is
active research in moving the binder userspace library onto the kdbus
code base, allowing the binder kernel driver to be removed one day.
That would be a good thing to have happen, but I'm not holding my
breath for it. Using it the other way around isn't going to work.

> "It's the issue that a stateful bus is required for
> applications that is the main point I'm trying to get across."
>
> That would be the "if dbus crashes I have to reboot" design flaw of
> Gnome and friends. The only state you need is beyond the endpoints. It's a
> message passing system. If you think message passing needs state then I'd
> take a look at the internet. State belongs in the end points.

The internet model with state in the endpoints doesn't always transfer
properly to local applications, see Havoc's email for the details about
that.

> It's telling that I can lose and recover my internet connection without
> rebooting but not my desktop's internal messaging.

Yes, as those are totally different things, let's not mix the issue up
here please.

thanks,

greg k-h

2015-04-15 11:53:17

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 11:28:36AM +0200, Martin Steigerwald wrote:
> On Wednesday, 15 April 2015, 11:02:12, Greg Kroah-Hartman wrote:
> > > > > And yes, I think it's good not to force just about any userspace
> > > > > idea into the kernel.
> > > >
> > > > Do you have any technical objections to the patch as proposed?
> > >
> > > If I had, I would have written it. I explained already that I see that
> > > kernel developers have strong technical objections to kdbus. And
> > > that I think it is important to acknowledge it, instead of telling
> > > them that the API is required from userspace, userspace people know
> > > what they do, and they should just go away with their concerns.
> > >
> > > That's at least how I received quite some of your responses.
> > >
> > > Well and I raised an eyebrow on the busname matching rules and the
> > > capability stuff. Yet, I didn't comment on it, cause I didn't look at
> > > it in-depth. I just ask you to take those seriously who did.
> >
> > I take technical comments very seriously, where have I not? If you have
> > technical reasons why the current implementation has problems, please
> > let me know, and I will be glad to address them.
>
> From what I read, you basically answered all technical comments like:
>
> The dbus API is like it is for a very good reason, everyone is using it
> and everyone agrees. Capabilities are used in userspace for good reason
> and so on.
>
> But I see, here, not everyone does.
>
> Most of your answers didn't seem to address the concerns raised of having
> this in the *kernel*. Especially the security concerns.

I have responded to the security concerns, please don't say that I did not.

> That's what I meant with "And yes, I think it's good not to force just about
> any userspace idea into the kernel". I think arguing with the "this is how
> userspace does it" pattern, even if it truly is for a very good reason, is
> not sufficient as an argument for having it in the kernel.
>
> I am just looking at the argumentative pattern here. If other kernel
> developers complain about how hard it is to review and wrap their mind
> around the kdbus patches… I am scared at just trying to understand the
> patches. So no technical complaints from me. I did not nack it nor do I
> see myself in the position to nack it.

Please take the time to read it, 13k lines isn't much. To not read the
code and yet complain about the code is total nonsense.

greg k-h

2015-04-15 12:00:54

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

[Back to the capability discussion]

On Tue, Apr 14, 2015 at 11:57:22AM -0700, Andy Lutomirski wrote:
> >> Then I'll have to find a way to embolden my NACK further. My point is
> >> that capturing garbage like cmdline and capabilities (again, that
> >> latter part is completely unacceptable under any circumstances
> >> whatsoever) on behalf of *all* senders is a disaster. If it's
> >> optional, then I can at least hope that userspace will honor the
> >> optionality and let everything turn it off. If it's mandatory, then
> >> kdbus is just unsafe to use to send messages to untrusted parties.
> >
> > It's opted in by the receiving peer if the task implementing a service
> > wants to access these pieces of information. It is optional, and the
> > documentation clearly states that userspace should cope with this, and
> > also, when they are available we make sure to provide the correct
> > race-free information.
> >
> > As said many times before, an application can do so already today with
> > information from other API file systems, so why is this suddenly a
> > problem when kdbus optionally offers the exact same information along
> > with each transmitted message? Yes, we all "hate" capabilities, but
> > userspace uses them, and gets access to them all the time through the
> > POSIX apis (capget(), cap_get_pid(), capgetp(), etc.) and through
> > /proc/pid/status. They are something that we have to support and handle
> > properly.
> >
> > In the very first submission of kdbus, we stated that we want to allow
> > userspace methods to access these same bits to be able to make decisions
> > about permissions. And to do so in a race-free manner, which is very
> > hard, if not impossible, to do from userspace alone.
> >
> > For instance, if a task has CAP_NET_ADMIN set, we can use that
> > information in order to allow or disallow certain actions to be taken by
> > a privileged process. Or, if a client that has the capability to call
> > reboot (i.e. has CAP_SYS_BOOT) makes the D-Bus call to reboot the
> > system, the system daemon listening for that message knows that yes, at
> > the time that the client made that call, it really did have that
> > capability so it is ok to actually reboot the system.
> >
> > Instead of trying to use SCM_CREDENTIALS to get the pid and another
> > round of cap_get_pid() and the like, all of which are susceptible to
> > races and all sorts of other horrors that are insecure, we can provide
> > this information in an atomic, and secure way.
>
> /me suppresses a long string of expletives.
>
> Please point me at the code that does this with caps. It's WRONG in
> userspace and it's WRONG in the kernel. I want to know what code that
> runs on my system does this so I can send the appropriate bug reports
> and get it fixed. I think the RHEL crowd at least will take it
> seriously when I tell them that this is a security hole.

Look at how polkit and login manager work. Or anything that uses
SCM_CREDENTIALS. Also I think PAM does odd things with credentials, but
it's been a long time since I looked at any PAM code, I could be wrong.
Also look at users of SO_PEERCRED, as those are used in places as well,
but you know all about those.
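
For reference, a minimal sketch of the SO_PEERCRED lookup mentioned above. The kernel records the peer's pid, uid and gid when the connection (or socketpair()) is set up; anything beyond that, such as capabilities, has to be fetched in a second step by pid. That second step is the window described earlier as racy, since the pid may exit and be reused in between. peer_creds() is a hypothetical helper name; getsockopt(SO_PEERCRED) and struct ucred are the real Linux interface.

```c
#define _GNU_SOURCE /* for struct ucred */
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Credentials the kernel captured for the peer of a connected AF_UNIX
 * socket. They describe the peer at connect()/socketpair() time, not now. */
static int peer_creds(int sock, struct ucred *out)
{
    socklen_t len = sizeof(*out);
    if (getsockopt(sock, SOL_SOCKET, SO_PEERCRED, out, &len) < 0)
        return -1;
    return len == sizeof(*out) ? 0 : -1;
}
```

With a socketpair() both ends belong to the calling process, so the recorded pid is our own. Across processes, anything later looked up by that pid is not atomic with the original credential capture.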

Also look at programs that make those capability calls, they are
obviously using them for some reason, right? Nothing we can do about
them, and it's not the main issue here at all, sorry for the
side-discussion.

> > The kernel today, and userspace, relies on capabilities all the time
> > (i.e. almost every syscall), how are they something that is somehow not
> > valid to use and support?
>
> No. The *kernel* relies on caps. Userspace should not.

Userspace uses caps to have the kernel do things. Or not do things. If
not, why do we have things like SCM_CREDENTIALS in the first place?

> > And of course, as Eric will point out, capabilities are not
> > translatable across user namespaces, which is a problem. Because of
> > this, we dispose of that piece of metadata information when a message
> > crosses a user namespace boundary. This is the right thing to do, which
> > is not the case for almost all other kernel apis which report bogus
> > capabilities when user namespaces are crossed.
>
> The right thing to do is to not use capabilities for userspace stuff.

Again, userspace needs them in order to have the kernel do things for
userspace as needed. Look at the Tizen example in the first email,
where they had to use SCM_CREDENTIALS, and all of the speed/latency
issues that this resulted in.

> > So we implemented this correctly, and somehow that is a feature so bad
> > that both you and Eric think the whole baby should be thrown out? How
> > else should this be implemented?
>
> It shouldn't be implemented.

Great, so can we also drop those POSIX functions and the /proc/
information as well? I didn't think so :)

> > As documented in the original email on this thread, Tizen wants to use
> > this, as it solves a real need that they have. Their workarounds
> > involve using custom UDS sockets, but the latency involved is horrid and
> > unacceptable. Using a kdbus message solves this issue for them,
> > allowing UI rendering to work properly/quickly.
> >
> > Again, capabilities are something we all require and rely on today,
> > passing the current capability on to a recipient isn't a way to raise
> > privileges at all, but rather, properly determine if they are present
> > at sending time, if wanted. How does that create an insecure system?
> > What am I missing that is so bad here with the design we have?
>
> That, even if the implementation could be made to be useful and
> correct, capabilities refer to privileges wrt the kernel, not
> userspace. They're not the right bit of policy to look at here.

So what is the right bit of policy to look at then?

> For example, the thing that should make it possible to run 'systemctl
> reboot' or whatever is not CAP_SYS_BOOT, because CAP_SYS_BOOT is the
> permission to hard reboot the system immediately, and that's not what
> 'systemctl reboot' is for.

'systemctl reboot' calls a bunch of other things to determine if you
have local access to the machine, or permissions to reboot the machine
(i.e. CAP_SYS_BOOT), and other things that polkit might allow you to do,
and then, it decides to reboot or not. That happens today, right? I
don't understand the argument here.
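
To make the two-step userspace pattern concrete, here is a sketch of what a daemon might do after learning a caller's pid: read CapEff from /proc/<pid>/status and test one bit. The helper names are hypothetical; bit 22 is CAP_SYS_BOOT in <linux/capability.h>. Nothing here stops the pid from being recycled between the lookup and the read, and, per Andy's objection, the bit only says what the kernel would let the caller do, not what a userspace policy should allow.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit number of CAP_SYS_BOOT in <linux/capability.h>. */
#define EXAMPLE_CAP_SYS_BOOT 22

/* Parse the hex mask on the "CapEff:" line of /proc/<pid>/status text. */
static int parse_capeff(const char *status, uint64_t *mask)
{
    static const char key[] = "CapEff:";
    for (const char *p = status; (p = strstr(p, key)) != NULL; p++) {
        if (p != status && p[-1] != '\n')
            continue; /* not at the start of a line */
        const char *start = p + sizeof(key) - 1;
        char *end;
        errno = 0;
        unsigned long long v = strtoull(start, &end, 16);
        if (errno != 0 || end == start)
            return -1;
        *mask = (uint64_t)v;
        return 0;
    }
    return -1;
}

/* Snapshot of pid's effective set. The pid may have exited and been
 * reused by the time this runs; nothing here detects that. */
static int read_capeff(pid_t pid, uint64_t *mask)
{
    char path[64], buf[8192];
    snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_capeff(buf, mask);
}

/* Test a single capability bit in a mask. */
static bool has_cap(uint64_t mask, int cap)
{
    return cap >= 0 && cap < 64 && ((mask >> cap) & 1u) != 0;
}
```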

confused,

greg k-h

2015-04-15 12:05:04

by Alan Cox

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

> > There is no comparison between the elegance of X11 property setting and a
> > chunk of proposed kernel code that is half the size of a tiny X server!
>
> Hey, take that up with Havoc, he made the comparison :)

And it concerns me you blindly repeat it without realising it's wrong.

> > The dbus model is also flawed in a load of other ways in user space
> > because message handling in the hands of people with no concept of
> > systemic performance analysis just leads to disaster. One of the big
> > reasons dbus is so "slow" isn't that dbus is "slow", it's that the
> > crapware on top of it makes *thousands* of dbus queries.
>
> There's the issue of thousands of dbus queries, and then there's the
> issue that making those queries takes a measurable amount of time. We
> can fix the latter one; the first one, well, not so much, but we can
> provide the resources for them to make a faster system if they want to.

If you fix the thousands of queries problem, do you need kernel help at
all?

> The internet model with state in the endpoints doesn't always transfer
> properly to local applications, see Havoc's email for the details about
> that.

URL ?

(note how beautifully btw the stateless network and the URL string will
become a reference to state)

> > It's telling that I can lose and recover my internet connection without
> > rebooting but not my desktop's internal messaging.
>
> Yes, as those are totally different things, let's not mix the issue up
> here please.

They are *NOT* different things. They are fundamental properties of the
underlying architecture. I worked on stateful networks and still have
the scars. It is a fundamental property of stateful networks that every
time any key component goes castors up you lose the lot. It is a fairly
fundamental property of stateless networks that equipment going castors
up has no material impact on the network.

The internet is built upon three fundamental breakthroughs in technology:

- That stateless networks scale and can be reliable while stateful ones
cannot scale and cannot be fixed to do so

- That flow control is possible over a stateless network

- That efficient data routing is possible over a stateless network

Those are absolutely critical parts of any network or messaging
implementation.

Alan

2015-04-15 12:09:38

by Jiri Kosina

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:

> 'systemctl reboot' calls a bunch of other things to determine if you
> have local access to the machine, or permissions to reboot the machine
> (i.e. CAP_SYS_BOOT), and other things that polkit might allow you to do,
> and then, it decides to reboot or not. That happens today, right? I
> don't understand the argument here.

And what exactly is the argument that this is the way it should be
implemented?

Why can't it just rely on the kernel to provide final answer to "to reboot
or not to reboot, that is the question"?

At the end of the day, it's the kernel that decides whether it will really
ultimately ask the platform to reboot.

If, for whatever reason (which might be completely invisible to userspace)
kernel decides not to do so, userspace has to be able to recover from such
failure in any case.

--
Jiri Kosina
SUSE Labs

2015-04-15 12:19:10

by Alan Cox

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 15 Apr 2015 14:09:24 +0200 (CEST)
Jiri Kosina <[email protected]> wrote:

> On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:
>
> > 'systemctl reboot' calls a bunch of other things to determine if you
> > have local access to the machine, or permissions to reboot the machine
> > (i.e. CAP_SYS_BOOT), and other things that polkit might allow you to do,
> > and then, it decides to reboot or not. That happens today, right? I
> > don't understand the argument here.

The first problem with that is that if you run the capability model in
the kernel combined with our distributions through any kind of formal
analysis it'll come out with more holes than a roll of wire netting.

There are lots of capability handling bugs that allow you to get one
capability from another where it should not be possible. Linux
capabilities were a little ad-hoc and a "neat idea" in their day.

It's not how anyone would do them now. At best they are ok for little
things like network raw access in ping/traceroute.

That's an implementation detail. If we were to adopt something like
capsicum the stuff you pass would look way different and the model would
potentially work.

> And what exactly is the argument that this is the way it should be
> implemented?

For me the fact that capabilities are known legacy and broken, and the
model will change. Better would be to just pass some "cookie" that can be
used to ask "is the sender allowed to X" via the LSM modules.

That futureproofs the portability I think - and is also actually more
powerful anyway.
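
A sketch of the "cookie" shape Alan suggests, with every name hypothetical: the receiver holds an opaque token for the sender's context at send time and asks a policy function "may this sender do X?" instead of decoding capability bits itself. example_policy() stands in for what would, in his proposal, be an LSM query; no such kernel interface exists in this form.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical opaque token for "the sender's context at send time".
 * Receivers pass it around but never look inside. */
typedef struct { uint64_t id; } sender_cookie;

/* Hypothetical policy hook; in Alan's proposal this would be an LSM query. */
typedef bool (*policy_fn)(uid_t uid, const char *action);

#define MAX_COOKIES 16

/* Illustrative store of per-cookie credential snapshots. */
struct cookie_table {
    uid_t uid[MAX_COOKIES];
    size_t used;
};

/* Record the sender's context when a message is sent; id 0 means "none". */
static sender_cookie cookie_mint(struct cookie_table *t, uid_t sender_uid)
{
    sender_cookie c = { 0 };
    if (t->used < MAX_COOKIES) {
        t->uid[t->used] = sender_uid;
        c.id = ++t->used; /* ids start at 1 */
    }
    return c;
}

/* Receiver side: "is the sender allowed to do action?" */
static bool sender_allowed(const struct cookie_table *t, sender_cookie c,
                           const char *action, policy_fn policy)
{
    if (c.id == 0 || c.id > t->used)
        return false; /* unknown cookie: deny */
    return policy(t->uid[c.id - 1], action);
}

/* Example policy: root may do anything, others may only read properties. */
static bool example_policy(uid_t uid, const char *action)
{
    return uid == 0 || strcmp(action, "read-properties") == 0;
}
```

The point of the indirection is that receivers never hard-code the capability model; replacing the policy behind sender_allowed() would not change their code.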

> Why can't it just rely on the kernel to provide final answer to "to reboot
> or not to reboot, that is the question"?

It can, however you may want userspace to assert privileges and reboot
even though the user doesn't have the right powers directly (think about
mundane things like ctrl-alt-del or the reboot button on a desktop).

Alan

2015-04-15 12:27:48

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 02:09:24PM +0200, Jiri Kosina wrote:
> On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:
>
> > 'systemctl reboot' calls a bunch of other things to determine if you
> > have local access to the machine, or permissions to reboot the machine
> > (i.e. CAP_SYS_BOOT), and other things that polkit might allow you to do,
> > and then, it decides to reboot or not. That happens today, right? I
> > don't understand the argument here.
>
> And what exactly is the argument that this is the way it should be
> implemented?

I can't answer that, discuss it with the developers of that userspace
code please.

> Why can't it just rely on the kernel to provide final answer to "to reboot
> or not to reboot, that is the question"?

Usually you want to do a few things before telling the kernel to reboot,
like unmount all filesystems and the like :)

Anyway, we are getting away from the code at hand, please, let's discuss
that.

greg k-h

2015-04-15 12:30:41

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 01:18:28PM +0100, One Thousand Gnomes wrote:
> On Wed, 15 Apr 2015 14:09:24 +0200 (CEST)
> Jiri Kosina <[email protected]> wrote:
>
> > On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:
> >
> > > 'systemctl reboot' calls a bunch of other things to determine if you
> > > have local access to the machine, or permissions to reboot the machine
> > > (i.e. CAP_SYS_BOOT), and other things that polkit might allow you to do,
> > > and then, it decides to reboot or not. That happens today, right? I
> > > don't understand the argument here.
>
> The first problem with that is that if you run the capability model in
> the kernel combined with our distributions through any kind of formal
> analysis it'll come out with more holes than a roll of wire netting.
>
> There are lots of capability handling bugs that allow you to get one
> capability from another where it should not be possible. Linux
> capabilities were a little ad-hoc and a "neat idea" in their day.

"formal analysis"? Heh, yeah, I know all about that, and really, that's
not anything we can do about here.

> It's not how anyone would do them now. At best they are ok for little
> things like network raw access in ping/traceroute.
>
> That's an implementation detail. If we were to adopt something like
> capsicum the stuff you pass would look way different and the model would
> potentially work.

True, the capsicum developers seem to have gone quiet on us :(

> > And what exactly is the argument that this is the way it should be
> > implemented?
>
> For me the fact that capabilities are known legacy and broken, and the
> model will change. Better would be to just pass some "cookie" that can be
> used to ask "is the sender allowed to X" via the LSM modules.
>
> That futureproofs the portability I think - and is also actually more
> powerful anyway.

Yes, that would work, but that kind of sounds like the same thing we
have today, just with a different name :)

thanks,

greg k-h

2015-04-15 12:37:30

by Al Viro

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 11:09:48AM +0200, Greg Kroah-Hartman wrote:

> I've asked for it, but finding people to review code is hard, as you
> know. It's only 13k lines long, smaller than a serial port driver (my
> unit of code review), so it's not all that big.
>
> It's smaller than the USB3 host controller driver as well, and very few
> people ever reviewed that beast :)
>
> > For something that's potentially such a core mechanism as a completely
> > new, massively-adopted IPC, this does send a warning signal.
>
> If you know of a way to force others to review code, please let me know.

Have it in a less nasty state, perhaps? Random question:

al@duke:~/linux/trees/vfs$ git grep -n -w kdbus_node_idr_lock
ipc/kdbus/node.c:237:static DECLARE_RWSEM(kdbus_node_idr_lock);
ipc/kdbus/node.c:340: down_write(&kdbus_node_idr_lock);
ipc/kdbus/node.c:344: up_write(&kdbus_node_idr_lock);
ipc/kdbus/node.c:444: down_write(&kdbus_node_idr_lock);
ipc/kdbus/node.c:452: up_write(&kdbus_node_idr_lock);

Do you see anything wrong with that? Or with things like that:
	mutex_lock(&pos->lock);
	v_pre = atomic_read(&pos->active);
	if (v_pre >= 0)
		atomic_add_return(KDBUS_NODE_BIAS, &pos->active);
	else if (v_pre == KDBUS_NODE_NEW)
		atomic_set(&pos->active, KDBUS_NODE_RELEASE_DIRECT);
	mutex_unlock(&pos->lock);
What are the locking rules for ->active/->waitq/->lock? Are those the
outermost thing in the hierarchy? Or is that dependent on the node location?
It sure as hell is outside of (at least) ->mmap_sem (by way of
kdbus_conn_connect() establishing that ->active/->waitq is outside of
->conn_rwlock, which due to kdbus_bus_broadcast() nests outside of anything
taken by kdbus_meta_proc_collect(), which includes ->mmap_sem) and that alone
brings in a lot...
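
For readers unfamiliar with the ->active idiom in the snippet above: a counter that is >= 0 counts active users, and deactivation adds a large negative bias so new acquisitions fail while existing holders drain. A minimal userspace sketch with C11 atomics follows. NODE_BIAS and all names are hypothetical and do not reproduce kdbus's actual constants; the real code also wakes waiters on ->waitq, which this polling version omits, and it says nothing about where ->lock sits in the hierarchy, which is Al's actual question.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bias: any count this negative means "deactivated".
 * Not the value kdbus uses. */
#define NODE_BIAS (INT_MIN / 2)

struct node {
    atomic_int active; /* >= 0: active users; < 0: deactivated */
};

static void node_init(struct node *n)
{
    atomic_init(&n->active, 0);
}

/* Take an active reference; fails once the node is deactivated. */
static bool node_acquire(struct node *n)
{
    int v = atomic_load(&n->active);
    while (v >= 0) {
        if (atomic_compare_exchange_weak(&n->active, &v, v + 1))
            return true;
    }
    return false;
}

static void node_release(struct node *n)
{
    atomic_fetch_sub(&n->active, 1);
}

/* Apply the bias once; existing holders keep their references. */
static void node_deactivate(struct node *n)
{
    int v = atomic_load(&n->active);
    while (v >= 0 &&
           !atomic_compare_exchange_weak(&n->active, &v, v + NODE_BIAS))
        ;
}

/* True once deactivated and every holder has dropped its reference. */
static bool node_drained(struct node *n)
{
    return atomic_load(&n->active) == NODE_BIAS;
}
```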

Document your goddamn locking, would you? It *IS* new code, and you, as you
say, had very few people working on it, so you don't have the excuses for
the mess existing in older parts of the tree.

Locking complexity in there is easily as bad as that of VFS sans the RCU fun;
sure, I can spend a week and (hopefully) document it for you, but I would
really prefer if you guys had done that. And I *do* appreciate the comments
in node.c, but they are nowhere near enough.

Tracking the call chains in there and trying to derive the locking ordering
from those is quite a bit of work; _verifying_ that it matches the claimed
one would be expected from reviewers, but as it is you are asking to spend
a lot of efforts to close the gaps in your documentation. Sheesh...
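
One way to make a documented lock order checkable, sketched in userspace: give each lock a rank and refuse acquisitions that go from a higher rank to a lower one. In the kernel, lockdep already does this (and much more) dynamically; this toy only illustrates the idea. The ranks and names are hypothetical, and the check assumes locks are released in LIFO order.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* A spinlock tagged with its documented rank: lower rank = outer lock. */
struct ranked_lock {
    atomic_flag flag;
    int rank;
};

#define RANKED_LOCK_INIT(r) { ATOMIC_FLAG_INIT, (r) }
#define MAX_HELD 8

/* Ranks currently held by this thread, innermost last. */
static _Thread_local int held_ranks[MAX_HELD];
static _Thread_local int held_count;

/* Refuse (without locking) any acquisition that would not nest strictly
 * inside every lock this thread already holds. */
static bool ranked_lock_acquire(struct ranked_lock *l)
{
    if (held_count == MAX_HELD)
        return false;
    if (held_count > 0 && l->rank <= held_ranks[held_count - 1])
        return false; /* documented order violated */
    while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
        ; /* spin until free */
    held_ranks[held_count++] = l->rank;
    return true;
}

/* Assumes LIFO release, which keeps held_ranks strictly increasing. */
static void ranked_lock_release(struct ranked_lock *l)
{
    held_count--;
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}
```

Once the order is written down as ranks, an inverted acquisition is refused on the first attempt rather than surfacing later as a rare deadlock.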

2015-04-15 12:41:47

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 01:03:54PM +0100, One Thousand Gnomes wrote:
> > > There is no comparison between the elegance of X11 property setting and a
> > > chunk of proposed kernel code that is half the size of a tiny X server!
> >
> > Hey, take that up with Havoc, he made the comparison :)
>
> And it concerns me you blindly repeat it without realising it's wrong.

It's a metaphor that makes sense to me given my limited knowledge of the
x11 protocol. If it's wrong, ok, I'm willing to learn, but I think it's
still relevant here.

> > > The dbus model is also flawed in a load of other ways in user space
> > > because message handling in the hands of people with no concept of
> > > systemic performance analysis just leads to disaster. One of the big
> > > reasons dbus is so "slow" isn't that dbus is "slow", it's that the
> > > crapware on top of it makes *thousands* of dbus queries.
> >
> > There's the issue of thousands of dbus queries, and then there's the
> > issue that making those queries takes a measurable amount of time. We
> > can fix the latter one; the first one, well, not so much, but we can
> > provide the resources for them to make a faster system if they want to.
>
> If you fix the thousands of queries problem, do you need kernel help at
> all?

I've worked with developers of such systems, and no, they can't fix that
problem. They are using "legacy" applications that they have to run on
some type of operating system, and really don't want to use legacy
operating systems anymore. Those "legacy" oses provide a system bus
that allows them to send thousands of queries just fine, but when moving
to Linux, we don't have anything other than D-Bus, so their library is
ported to use it, and they have to handle their old applications that
need/want the zillions of messages.

Then they throw the thing on a very underpowered ARM processor and
complain about boot time being so slow, but that's a different issue...

> > The internet model with state in the endpoints doesn't always transfer
> > properly to local applications, see Havoc's email for the details about
> > that.
>
> URL ?
>
> (note how beautifully btw the stateless network and the URL string will
> become a reference to state)

Heh, yes, but there's very little state here:
http://lists.freedesktop.org/archives/dbus/2015-April/016651.html

There's also a follow-on message from the current D-Bus maintainer:
http://lists.freedesktop.org/archives/dbus/2015-April/016653.html

> > > It's telling that I can lose and recover my internet connection without
> > > rebooting but not my desktop's internal messaging.
> >
> > Yes, as those are totally different things, let's not mix the issue up
> > here please.
>
> They are *NOT* different things. They are fundamental properties of the
> underlying architecture. I worked on stateful networks and still have
> the scars. It is a fundamental property of stateful networks that every
> time any key component goes castors up you lose the lot. It is a fairly
> fundamental property of stateless networks that equipment going castors
> up has no material impact on the network.
>
> The internet is built upon three fundamental breakthroughs in technology:
>
> - That stateless networks scale and can be reliable while stateful ones
> cannot scale and cannot be fixed to do so
>
> - That flow control is possible over a stateless network
>
> - That efficient data routing is possible over a stateless network
>
> Those are absolutely critical parts of any network or messaging
> implementation.

People take those stateless models and build stateful ones on top of
them, yes, it's great. But you still need a stateful model somewhere in
order to be able to achieve many things (think a shopping cart
application).

Anyway, this is getting off-topic, there is very little "state" in the
kdbus kernel code here, other than a naming database that Havoc and
Simon explain the need for, and the normal lifecycle of kdbus "nodes"
(new, linked, active, inactive, drained, freed).
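
The node lifecycle listed above can be read as a small state machine. The sketch below encodes the six states in the order given and assumes, for illustration only, that a node advances exactly one step at a time; the real kdbus transition rules may differ, and all names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* The six lifecycle states, in the order listed in the mail. */
enum lc_state { LC_NEW, LC_LINKED, LC_ACTIVE, LC_INACTIVE, LC_DRAINED, LC_FREED };

static const char *const lc_state_names[] = {
    "new", "linked", "active", "inactive", "drained", "freed",
};

/* Assumption for illustration: a node only ever advances one step, and
 * nothing leaves the freed state. */
static bool lc_advance(enum lc_state *s, enum lc_state next)
{
    if (*s == LC_FREED || (int)next != (int)*s + 1)
        return false;
    *s = next;
    return true;
}
```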

thanks,

greg k-h

2015-04-15 12:55:40

by Al Viro

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 01:49:36PM +0200, Greg Kroah-Hartman wrote:
> > There is no comparison between the elegance of X11 property setting and a
> > chunk of proposed kernel code that is half the size of a tiny X server!
>
> Hey, take that up with Havoc, he made the comparison :)

Let me get it straight - you swing the reference to his posting as damn nearly
the main argument, and yet you make _this_ reply when it gets questioned?
Seriously?

There's nothing wrong with "go read $PAPER, a lot of your questions are
addressed there", but only if you are ready to answer the questions and
objections from those who have read it. "Hey, take that up with
$AUTHOR" doesn't cut it; try anything even remotely similar with e.g.
reviewers of an academic paper and see where it ends up.

Havoc isn't submitting that thing; you are. If you are not qualified to
defend your design and he is, try to talk him into doing that.

Frankly, the longer it goes, the less I like the picture. It will be up to
Linus, of course, but IMO the whole situation seriously stinks. ;-/

2015-04-15 13:06:24

by Borislav Petkov

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 01:40:36PM +0200, Greg Kroah-Hartman wrote:
> So stop making this discussion be about "oh those horrid systemd
> developers, I don't want their code as my init system" as that's not
> what any of this is about at all. It's about the patches being
> proposed, and the API involved in it. Please stick to that.

Well, you went there by saying that I should simply accept systemd and
whatever other crap people are producing just because Linux is not about
choice. And I'm still amazed that you really and seriously think that -
you must've been drinking the systemd Kool-Aid for too long.

So to get back to kdbust: the design of this thing is flawed, it clearly
needs a lot more discussing and changes and it *absolutely* has no place
upstream in its current form as *no* *one* has reviewed that pile except
Andy and Eric to a certain degree. Oh and I haven't seen them lift their
NAKs yet...

You and I know that's not how stuff is upstreamed.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-15 13:13:20

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 01:36:33PM +0100, Al Viro wrote:
> al@duke:~/linux/trees/vfs$ git grep -n -w kdbus_node_idr_lock
> ipc/kdbus/node.c:237:static DECLARE_RWSEM(kdbus_node_idr_lock);
> ipc/kdbus/node.c:340: down_write(&kdbus_node_idr_lock);
> ipc/kdbus/node.c:344: up_write(&kdbus_node_idr_lock);
> ipc/kdbus/node.c:444: down_write(&kdbus_node_idr_lock);
> ipc/kdbus/node.c:452: up_write(&kdbus_node_idr_lock);

Heh, that's a leftover from an older version, I'll go fix that up to be
a simple mutex, which is all that this is doing here anyway.

> Do you see anything wrong with that? Or with things like that:
> mutex_lock(&pos->lock);
> v_pre = atomic_read(&pos->active);
> if (v_pre >= 0)
> atomic_add_return(KDBUS_NODE_BIAS, &pos->active);
> else if (v_pre == KDBUS_NODE_NEW)
> atomic_set(&pos->active, KDBUS_NODE_RELEASE_DIRECT);
> mutex_unlock(&pos->lock);
> What are the locking rules for ->active/->waitq/->lock? Are those the
> outermost thing in the hierarchy? Or is that dependent on the node location?
> It sure as hell is outside of (at least) ->mmap_sem (by way of
> kdbus_conn_connect() establishing that ->active/->waitq is outside of
> ->conn_rwlock, which due to kdbus_bus_broadcast() nests outside of anything
> taken by kdbus_meta_proc_collect(), which includes ->mmap_sem) and that alone
> brings in a lot...
>
> Document your goddamn locking, would you? It *IS* new code, and you, as you
> say, had very few people working on it, so you don't have the excuses for
> the mess existing in older parts of the tree.

Fair enough, documenting the locking is a good thing, that will make
reviewing this easier, I'll go work on that.

> Locking complexity in there is easily as bad as that of VFS sans the RCU fun;
> sure, I can spend a week and (hopefully) document it for you, but I would
> really prefer if you guys had done that. And I *do* appreciate the comments
> in node.c, but they are nowhere near enough.

Thanks, it's hard to balance the comment/code level at times. And yes,
it is complex and should be explained better, will work on that.

greg k-h

2015-04-15 13:22:51

by Borislav Petkov

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 12:25:55PM +0100, One Thousand Gnomes wrote:
> dbus is not an appropriate design for a kernel messaging layer for a
> variety of reasons. That's not to say dbus shouldn't be able to use a
> fast kernel messaging layer, or that one shouldn't exist.
>
> dbus is basically a very large very specialized and somewhat flawed
> policy engine on top of what should be simple messaging. The two need
> splitting apart.
>
> Abstract low level messaging layers are not a new concept. V7 unix had
> one experimentally. It's about getting the separation right.
>
> IMHO that probably involves getting the right people in the right place
> together - dbus designers, MPI and realtime people, kernel folks and
> possibly also some of the hardware messaging folk.
>
> In filesystem terms
>
> - stop writing a dbus only file system
> - figure out what a messaging "vfs" looks like
> - figure out what a clean low level kernel model looks like
> - figure out what has to be where to put the policy in userspace
>
> What might also be worth review is how much dbus traffic actually ought to
> be an object store implemented say with tmpfs and inotify type
> functionality (or extensions of that) so that you can
> set/read/enumerate/get change notifications on properties.

FWIW, this sounds really sane and makes a lot of sense to me. I'd be
willing to give it some review cycles, as far as I can, when done this
way.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-15 14:07:45

by Alan Cox

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

> operating systems anymore. Those "legacy" oses provide a system bus
> that allows them to send thousands of queries just fine, but when moving
> to Linux, we don't have anything other than D-Bus, so their library is
> ported to use it, and they have to handle their old applications that
> need/want the zillions of messages.

And if you look at those systems, btw, many of them have a very compact,
very clean, very simple message passing interface, often in the hundreds,
not tens of thousands, of lines of code.

> People take those stateless models and build stateful ones on top of
> them, yes, it's great. But you still need a stateful model somewhere in
> order to be able to achieve many things (think a shopping cart
> application).

We put the IP stack in the kernel not the shopping cart. A good shopping
cart of course only has state on the client.

> Anyway, this is getting off-topic, there is very little "state" in the
> kdbus kernel code here, other than a naming database that Havoc and
> Simon explain the need for, and the normal lifecycle of kdbus "nodes"
> (new, linked, active, inactive, drained, freed).

I'm not convinced the naming data belongs in the kernel beyond the simplest
of "node 147". I'd offer a sort of proof by armwaving of this: if you
have

/dev/dbus/014 /dev/dbus/027 etc

you can add a symlink to /dev/dbus/014 of

/dev/dbus-by-name/gnome-wombat-grooming-daemon

or whatever

and we do that today for every other naming database and static
allocation we've spent the past 15 years evicting from the kernel.

That state isn't then held in a daemon that can crash nor is it invisible
to debuggers, user tools and admins.

Alan

2015-04-15 14:49:14

by Michal Schmidt

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 04/15/2015 09:31 AM, Mike Galbraith wrote:
> it seems [systemd] has now mandated group scheduling.

What makes you think so? Was it the fact that by default you have a
populated /sys/fs/cgroup/cpu/ hierarchy? This is either because some
unit requests the use of the cpu controller using one of the CPU*=
directives from systemd.resource-control(5), or (perhaps more likely)
because there is a privileged unit with Delegate=yes. The most likely
candidate is [email protected], and so you could try preventing it from
starting:
systemctl mask [email protected]

Note that systemd still works without group scheduling or any cgroup
subsystems enabled in the kernel:

$ grep GROUP .config
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_CGROUP_HUGETLB is not set
# CONFIG_CGROUP_PERF is not set
# CONFIG_CGROUP_SCHED is not set
# CONFIG_BLK_CGROUP is not set
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_NETFILTER_XT_MATCH_CGROUP is not set
# CONFIG_NETFILTER_XT_MATCH_DEVGROUP is not set
# CONFIG_NET_CLS_CGROUP is not set
# CONFIG_CGROUP_NET_PRIO is not set
# CONFIG_CGROUP_NET_CLASSID is not set

Michal

2015-04-15 15:35:01

by Mike Galbraith

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 2015-04-15 at 16:48 +0200, Michal Schmidt wrote:
> On 04/15/2015 09:31 AM, Mike Galbraith wrote:
> > it seems [systemd] has now mandated group scheduling.
>
> What makes you think so?

If group sched is available, systemd decides on its own to use it,
thus making the decision to eat that overhead for me should I happen
to boot say an enterprise kernel to do some performance measurements.

Perhaps there is a way to beg it to please not do that, but if so, I
didn't find it in time. The service that started group scheduling was
explicitly disabled by me, but systemd started it at boot despite
that. Perhaps I didn't express my wishes clearly enough, or I need to
burn a virgin or something to become worthy of its attention, dunno.

Applying my axe to its tentacles fixed the communication issue.

-Mike

2015-04-15 15:42:12

by Steven Rostedt

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 01:40:36PM +0200, Greg Kroah-Hartman wrote:
>
> You have to trust someone to help make your system work together in a
> unified way. If you can't trust your distro's engineers, then either
> start your own distro, or only run busybox on top of a kernel. You
> really don't have much other "choice" than that :)

And obviously there is a lack of trust. And once kdbus is in, we must use
it, or support our own distro where we just do not have the time.

Personally, I'm fine with getting something in that will help userspace
tools work better. The issue I see, mostly from the side lines as I haven't
totally submerged myself into the dbus protocol (I think I should spend
some time to do just that), this is going too fast. Once it is in the kernel,
whatever ABI we expose is locked in stone. There's no changing it. We need
to make sure that this is well thought out. People seem to be of the impression
that the current dbus design has flaws, but because everything relies on it
we must still push it into the kernel because it mimics what is out there
in user space. I disagree.

As others have said, we do not need to follow the dbus design. If we can supply
a better transport layer than what the kernel supplies today, then tools will
eventually merge to it away from dbus. Perhaps the kernel can supply just enough
to have dbus improve its speed, but not with the entire complex solution that
kdbus is presenting today.

This isn't a case of Republicans vs Democrats pushing a health care system within
a window that was rushed. Now the US has a health care system that somewhat works
but due to politics it's not being fixed (the ABI is solidified). I don't want
to have the same thing with kdbus. We are technical people here; let's solve it
with a technical solution, and not rush into things. dbus works today, so what's
the rush to put something into the kernel that must be supported forever? Let's
make sure we do it right.

I'm serious about my Linux Plumbers proposal. If you can make it, and get the dbus
authors there too, and hopefully, Andy, Al and Eric can make it too. We should
really sit down and talk about it. Any other kernel developer that wants to
participate should, as a prerequisite, sit down and write a dbus interface, such
that they have an idea of how it works. I plan to. And I hope that I can learn
more about the interface and productively join in this discussion.

I'm willing to moderate the kdbus microconference. I think I'll add it now.

Thoughts?

-- Steve

2015-04-15 15:46:09

by Steven Rostedt

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 12:25:55PM +0100, One Thousand Gnomes wrote:
>
> IMHO that probably involves getting the right people in the right place
> together - dbus designers, MPI and realtime people, kernel folks and
> possibly also some of the hardware messaging folk.

/me continues on as a broken record

I suggest that we can do this at Linux Plumbers, and then follow up at
Kernel Summit, for those that can (or won't) attend plumbers.

-- Steve

2015-04-15 15:47:37

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 8:45 AM, Steven Rostedt <[email protected]> wrote:
> On Wed, Apr 15, 2015 at 12:25:55PM +0100, One Thousand Gnomes wrote:
>>
>> IMHO that probably involves getting the right people in the right place
>> together - dbus designers, MPI and realtime people, kernel folks and
>> possibly also some of the hardware messaging folk.
>
> /me continues on as a broken record
>
> I suggest that we can do this at Linux Plumbers, and then follow up at
> Kernel Summit, for those that can (or won't) attend plumbers.

I'm definitely available for KS. I'm not sure about Plumbers.

--Andy

>
> -- Steve
>



--
Andy Lutomirski
AMA Capital Management, LLC

2015-04-15 16:01:40

by Rik van Riel

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 04/15/2015 07:06 AM, One Thousand Gnomes wrote:

>> that anyone here does either. In the many years I've spent working on
>> this, dbus has seemed to be odd, and strange, to the way that the kernel
>> has normally worked, because it is. And that's not a bad thing, it's
>> just different, and for us to support real needs and requirements of our
>> users, is the requirement of the Linux kernel.
>
> There are I think a set of intertwined problems here
>
> - An efficient delivery system for multicast messages delivered locally
> (be that MPI, dbus whatever - it's not "dbus or nothing")
>
> - A kernel side dynamic namespace to describe what goes where
>
> - A kernel side security model to describe who may receive what, and
> which additional information/tags/cred info
>
> - Something that provides state to stuff that needs it (and probably
> belongs in userspace - dbus name service etc)
>
> - Something that maps dbus and other models onto the kernel security
> model (and we have tools like EBPF which are very powerful)
>
> - Something that maps the kernel layer onto models like MPI-3

It is not clear to me why user space applications would
have to change if the kernel bus used for dbus behaves
differently from the userspace dbus daemon.

Can't libdbus take care of the differences, and remove
some of the problems highlighted by Alan (eg. the possibility
of the protocol requiring the kernel to keep more messages
in flight than we have memory for) ?

2015-04-15 16:27:53

by Havoc Pennington

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi,

I'm temporarily joining the list if anyone has questions about why
dbus was originally the way it is. If you would like answers about its
latest usage, systemd, or the kernel implementation, those are best
answered by others.

I "led" the original design but I was hardly the only person involved.
I was sort of synthesizing previous efforts, lots of ideas from other
people, and mediating the politics of the time.

What I'd like to see in this conversation is: understanding what
exists, and why it exists.

If people understand that then I think they can make good decisions,
using whatever process or timeline you like; I don't pretend to know
much about kdbus, but I see a lot of confusion here about the use-case
and design of dbus itself.

No one should take the design on faith. To improve and maintain
something it must be understood.

Why should you bother to understand dbus as it exists? It's pretty
successful, and I think for a reason. Hundreds of programs are using
dbus, it's become (over a decade) foundational to the most-used Linux
userspaces, there are many different implementations of it, and it's
been quite a stable design over that time without any major changes. I
don't think that's because it's perfect; I do think it's because some
things are right, in ways that previous designs were not. The Linux
userspace community went through a lot of alternatives before dbus,
and dbus was the one that lasted.

The worst-case scenario in my mind would be for the kernel to merge
something dbus-like, but with ill-informed changes that render it
worse. Then you would have a new ABI that nobody wants to use. We have
a design in the wild that's been very successful. People using it for
its intended use-case seem to like it. Step 1 is to try to understand
why that is.

I will try to give my take on some of the reasons.

I can't emphasize enough that the success of dbus was *because of*
many "obvious" criticisms people may have. Why? Tradeoffs. Given
infinite time and resources, many of those tradeoffs can be mitigated
or avoided - and I see kdbus as part of an effort to do so.

The first and most important tradeoff: the central daemon (the hub in
the wheel). A central daemon has several disadvantages. The success of
dbus happened because those disadvantages, in this context, are not as
important as the advantages.

The advantages include:

* ability to send a broadcast message to all interested processes
* tracking/discovering well-known and unique names
* crossing security domains (system-daemon-to-per-user-UIs, in
particular) in an orderly fashion
* reducing the number of file descriptors needed for N apps to all
talk to each other
* relatively simple model for application developers to get right

The disadvantages include:

* performance (extra context switches, copies, and validations)
* it's difficult to handle killing/restarting the central daemon;
dbus actually gives clients all the tools to do this, but in practice
if you restart the daemon you are gambling that a hundred clients
connected to it have implemented bug-free restart handling.
* not a distributed cluster (it's a single bottleneck and point of
failure running on a single machine - the daemon is a source of truth,
which is also its virtue of course)

For dbus to be as useful as it has been, these disadvantages, while
not desirable, were acceptable tradeoffs. So it would be a mistake to
solve any of these disadvantages by breaking the advantages.

Message passing or IPC isn't really the most important part of dbus.
Process lifecycle tracking and discovery are more important. However,
by integrating the IPC system with the lifecycle tracking you can
simplify the overall system and avoid race conditions. For example,
you can have processes that auto-launch race-free when you send them a
message, or more generally you can have an ordering between lifecycle
events and other messages. For example if I send out a broadcast
message and then disconnect, other clients will see first the
broadcast and then the disconnect and won't have to handle the
out-of-order case.

dbus has a lot of semantic guarantees, such as message ordering, that
reduce application complexity and therefore reduce code and reduce
bugs.

When implementing a Linux workstation userspace, ideally you have lots
of little processes that do one thing each; but the tradeoff is that
multi-process adds complexity. If your model for a multi-process
program is that it has to solve a lot of hard distributed system
problems, then it adds a LOT of complexity. But when everyone's on a
single machine, it is not necessary to solve (all of) those problems,
and in fact trying to solve non-problems creates bugs by adding
tricky, rarely-touched codepaths. It is overengineering to treat "tray
icon talking to NetworkManager" the same way you would treat IPC and
shared state within a distributed cluster.

Multi-process is valuable though; an alternative userspace design
could be like Eclipse or Emacs, i.e. one enormous process with
plugins, which would be a mess.

There was some debate over my X11 analogy. One of the "thought
experiments" while figuring out dbus was "why does CORBA seem to be at
the root of endless bug reports, while X11 isn't?"

Here are some things I think dbus has in common with X11:

* it's a hub-and-spoke design (a central server that all apps connect
to) rather than a design where every process talks directly to every
other process
* dbus names are directly modeled on X selections (see ICCCM)
* designed to allow race-free asynchronous usage and minimize the
need for round trips (though apps can certainly design bad APIs, see
http://dbus.freedesktop.org/doc/dbus-api-design.html for advice on
avoiding that)
* binary protocol rather than text
* generally assumes a reliable network - assumes all messages will
arrive, as long as the connection is live
* similar model for discovering and authenticating to the server
* allows clients to track each other's lifecycle
* it is stateful; clients connect, fetch the current state, then
track changes to the state via events.

Some differences from X11 of course:

* X11 is a domain-specific server (about sharing the graphics and
input hardware among multiple clients), while with dbus the
domain-specific API will be in some client and the bus is only an
intermediary.
* X11 therefore has a bunch more server state than dbus; dbus only
has to track clients, not track the state of the window system.
* IPC on X11 is sort of bolted on in an ugly way (client messages)
while dbus cleanly maps to the OO model people are used to in the rest
of their code.

Havoc

2015-04-15 16:35:49

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 11:45:52AM -0400, Steven Rostedt wrote:
> On Wed, Apr 15, 2015 at 12:25:55PM +0100, One Thousand Gnomes wrote:
> >
> > IMHO that probably involves getting the right people in the right place
> > together - dbus designers, MPI and realtime people, kernel folks and
> > possibly also some of the hardware messaging folk.
>
> /me continues on as a broken record
>
> I suggest that we can do this at Linux Plumbers, and then follow up at
> Kernel Summit, for those that can (or won't) attend plumbers.

I really doubt this will work for Plumbers, sorry. And technical things
don't work well, if at all, at Kernel Summit.

We have had meetings about this at the past two Plumbers conferences,
where none of these things came up (i.e. dislike of the D-Bus model).

I'll be glad to discuss this at both places, but let's try to work
through the technical things through email, as really, that's the best
place for it.

Al just proved this by pointing out some issues to be resolved (RW lock
only used as a W lock, odd atomic values and locking without documenting
the lifecycles, etc.) And that's the way this is supposed to work,
nothing new/different here that I can see.

thanks,

greg k-h

2015-04-15 16:40:44

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 11:41:53AM -0400, Steven Rostedt wrote:
>
> And obviously there is a lack of trust. And once kdbus is in, we must use
> it, or support our own distro where we just do not have the time.

Just like cgroups, and ftrace :)

> Personally, I'm fine with getting something in that will help userspace
> tools work better. The issue I see, mostly from the side lines as I haven't
> totally submerged myself into the dbus protocol (I think I should spend
> some time to do just that), this is going too fast. Once it is in the kernel,
> whatever ABI we expose is locked in stone. There's no changing it. We need
> to make sure that this is well thought out. People seem to be of the impression
> that the current dbus design has flaws, but because everything relies on it
> we must still push it into the kernel because it mimics what is out there
> in user space. I disagree.

"fast"? Are you kidding me? This stuff has been under active, public,
development for over two years. We have been posting public patches,
asking for review and comments for _months_ now. Given that there were
no more specific review comments on the patch set, and its success in
linux-next for almost the entire 4.0 development cycle, I asked for it to be
merged.

I don't know too many other kernel features/drivers that have taken this
long, or done this "slowly", do you?

> As others have said, we do not need to follow the dbus design. If we can supply
> a better transport layer than what the kernel supplies today, then tools will
> eventually merge to it away from dbus. Perhaps the kernel can supply just enough
> to have dbus improve its speed, but not with the entire complex solution that
> kdbus is presenting today.

I originally thought this would work too. Eight months of work later, I
was proven wrong: either it will not work, or it imposes too much
additional work on userspace, which really makes no sense at all. The
in-kernel code isn't a lot (again, 13k lines, smaller than almost all of
the drivers you are using today on an individual basis). It's also
really fast, though with benchmarks David and Andy have found some minor
bottlenecks whose removal can make things faster still.

Yes it seems complex, but read the documentation to get an idea of what
is happening here. I think you will get a better appreciation of what
is going on.

thanks,

greg k-h

2015-04-15 16:42:31

by Mike Galbraith

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 2015-04-15 at 16:48 +0200, Michal Schmidt wrote:

> systemctl mask [email protected]

That off switch may work better, I'll try it when I have time to
squabble with the thing again, thanks.

user@0 was disabled in YaST (the SUSE admin tool) by me, yet found to be
in state disabled+active upon every boot. Just as YaST did, systemctl
status reported it as being both disabled and active, which led me to
the conclusion that someone other than me controls this service.

-Mike

2015-04-15 16:45:48

by Havoc Pennington

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 12:00 PM, Rik van Riel <[email protected]> wrote:
> On 04/15/2015 07:06 AM, One Thousand Gnomes wrote:
>
>>> that anyone here does either. In the many years I've spent working on
>>> this, dbus has seemed to be odd, and strange, to the way that the kernel
>>> has normally worked, because it is. And that's not a bad thing, it's
>>> just different, and for us to support real needs and requirements of our
>>> users, is the requirement of the Linux kernel.
>>
>> There are I think a set of intertwined problems here
>>
>> - An efficient delivery system for multicast messages delivered locally
>> (be that MPI, dbus whatever - it's not "dbus or nothing")
>>
>> - A kernel side dynamic namespace to describe what goes where
>>
>> - A kernel side security model to describe who may receive what, and
>> which additional information/tags/cred info
>>
>> - Something that provides state to stuff that needs it (and probably
>> belongs in userspace - dbus name service etc)
>>
>> - Something that maps dbus and other models onto the kernel security
>> model (and we have tools like EBPF which are very powerful)
>>
>> - Something that maps the kernel layer onto models like MPI-3

When trying to split apart problems, for dbus it's important to keep
ordering guarantees.

That is, with dbus if I send a broadcast message, then send a unicast
request to another client, then drop the connection causing the bus to
broadcast that I've dropped; then the other client will see those
things in that order - the broadcast, then the request, and then that
I've dropped the connection.

If you have separate facilities for these things, it could get hard to
keep them in order. dbus uses the simple model that they stay in order
because the bus conceptually has a single dispatch queue.

By pushing everything through one queue, dbus is trying to reduce the
number of codepaths in applications. Apps have a lot of new problems
to solve if messages get their order scrambled.

(dbus does NOT guarantee order across multiple clients, of course -
there's no guarantee that all clients get the broadcast, before anyone
gets the next message - each client has its own buffer on both read
and write. The ordering is only with respect to each client's message
stream.)

Ordering is vital for tracking state, because if you're sending out
events to describe changes in state, the order of those changes is
important.

Of course there are more complex ways to handle this over in
distributed-systems-world.

Havoc

2015-04-15 16:47:54

by Steven Rostedt

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 11:09:48AM +0200, Greg Kroah-Hartman wrote:
>
> > But the problem really is that I don't think you've received even a single
> > Reviewed-by: from someone who hasn't been directly involved in developing
> > the code, right?
>
> I've asked for it, but finding people to review code is hard, as you

Perhaps try harder. You know more kernel developers than I do. You don't have
anyone you can say "hey, I need this code reviewed, can you spend some time
to review it for me"? I have a few developers that are willing to do that
for me, and I won't push code (if it is complex) until they give their
Reviewed-by for it. I did that with the latest TRACE_DEFINE_ENUM() code, as
well as my ftrace trampoline code and the multi buffer code. None of that
went in until I had their reviewed-by tags.

> know. It's only 13k lines long, smaller than a serial port driver (my
> unit of code review), so it's not all that big.

Length of code does not determine the complexity of it.

>
> It's smaller than the USB3 host controller driver as well, and very few
> people ever reviewed that beast :)
>
> > For something that's potentially such a core mechanism as a completely
> new, massively-adopted IPC, this does send a warning signal.
>
> If you know of a way to force others to review code, please let me know.

Keep asking, that's the best way. That's what I do. Also, I really like Alan's
approach to this. Let me requote it here:

- stop writing a dbus only file system
- figure out what a messaging "vfs" looks like
- figure out what a clean low level kernel model looks like
- figure out what has to be where to put the policy in userspace

-- Steve

2015-04-15 16:48:57

by Jiri Kosina

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:

> The in-kernel code isn't a lot (again, 13k lines, smaller than almost
> all of the drivers you are using today on an individual basis). It's

I originally didn't want to comment on this, but now that you are making
this argument for the 3rd or 4th time, I can't really resist. What exactly are
you trying to "prove" by the 13k-lines argument?

mm/vmscan.c is less than 4k lines. Does that sole fact mean that the whole
memory reclaim is trivial to review?

--
Jiri Kosina
SUSE Labs

2015-04-15 17:07:04

by Steven Rostedt

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 15 Apr 2015 18:35:20 +0200
Greg Kroah-Hartman <[email protected]> wrote:


> > I suggest that we can do this at Linux Plumbers, and then follow up at
> > Kernel Summit, for those that can (or won't) attend plumbers.
>
> I really doubt this will work for Plumbers, sorry. And technical things
> don't work well, if at all, at Kernel Summit.
>
> We have had meetings about this at the past two Plumbers conferences,
> where none of these things came up (i.e. dislike of the D-Bus model).

But were the people who dislike it at those conference sessions?


>
> I'll be glad to discuss this at both places, but let's try to work
> through the technical things through email, as really, that's the best
> place for it.
>
> Al just proved this by pointing out some issues to be resolved (RW lock
> only used as a W lock, odd atomic values and locking without documenting
> the lifecycles, etc.) And that's the way this is supposed to work,
> nothing new/different here that I can see.

But you are missing one of the complaints that I'm reading from
people. The proposed ABI is too complex. Do we really want to jump into
having to support another tty layer?

One thing that I think may be really worth doing is that everyone on
this thread that has not yet done so, write a simple dbus application
to try to understand its design. Break it down to the requirements that
are needed, and discuss that.

Is there a reason that this patch must go in this merge window? Having
something this controversial take place during the merge window
suggests it's a bit premature to push in now. Especially since it
creates a new user space interface. I think we need to really think
hard and long before we add something that can not be modified at a
later date.

I personally think face to face may help, even if it's just hallway
tracks. But at a minimum, I think more kernel developers need to play
with dbus to understand this more. And then be able to give a better
feedback. I'm also thinking that the bare minimum for a transport layer
should go in. Find out the exact requirements (as Alan suggested) and
implement that, instead of just implementing the full layer that is
happening in userspace today.

-- Steve

2015-04-15 17:20:49

by Steven Rostedt

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 15 Apr 2015 18:40:33 +0200
Greg Kroah-Hartman <[email protected]> wrote:

> On Wed, Apr 15, 2015 at 11:41:53AM -0400, Steven Rostedt wrote:
> >
> > And obviously there is a lack of trust. And once kdbus is in, we must use
> > it, or support our own distro where we just do not have the time.
>
> Just like cgroups, and ftrace :)

Exactly.

>
> > Personally, I'm fine with getting something in that will help userspace
> > tools work better. The issue I see, mostly from the side lines as I haven't
> > totally submerged myself into the dbus protocol (I think I should spend
> > some time to do just that), this is going too fast. Once it is in the kernel,
> > whatever ABI we expose is locked in stone. There's no changing it. We need
> > to make sure that this is well thought out. People seem to be of the impression
> > that the current dbus design has flaws, but because everything relies on it
> > we must still push it into the kernel because it mimics what is out there
> > in user space. I disagree.
>
> "fast"? Are you kidding me? This stuff has been under active, public,
> development for over two years. We have been posting public patches,
> asking for review and comments for _months_ now. Given that there were
> no more specific review comments on the patch set, and its success in
> linux-next for almost the entire 4.0 development cycle, I asked it to be
> merged.
>
> I don't know too many other kernel features/drivers that have taken this
> long, or done this "slowly", do you?

What other features/drivers that you know introduce a major new IPC
user space interface that will be a core component of the system?

>
> > As others have said. We do not need to follow the dbus design. If we can supply
> > a better transport layer than what the kernel supplies today, then tools will
> > eventually merge to it away from dbus. Perhaps the kernel can supply just enough
> > to have dbus improve its speed, but not with the entire complex solution that
> > kdbus is presenting today.
>
> I originally thought this would work too. 8 months of work later, I was
> proven wrong, that will not work. Or it imposes too much additional
> work on userspace that really makes no sense at all. The in-kernel code
> isn't a lot (again, 13k lines, smaller than almost all of the drivers
> you are using today on an individual basis) It's also really fast, but
> with benchmarks, David and Andy have found some minor bottlenecks that
> can make things faster.
>
> Yes it seems complex, but read the documentation to get an idea of what
> is happening here. I think you will get a better appreciation of what
> is going on.

I read a bit of the documentation, but not enough. I really need to sit
down and play with code. That's the way I learn and understand.

-- Steve

2015-04-15 17:31:57

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 01:06:49PM -0400, Steven Rostedt wrote:
> On Wed, 15 Apr 2015 18:35:20 +0200
> Greg Kroah-Hartman <[email protected]> wrote:
>
>
> > > I suggest that we can do this at Linux Plumbers, and then follow up at
> > > Kernel Summit, for those that can (or won't) attend plumbers.
> >
> > I really doubt this will work for Plumbers, sorry. And technical things
> > don't work well, if at all, at Kernel Summit.
> >
> > We have had meetings about this at the past two Plumbers conferences,
> > where none of these things came up (i.e. dislike of the D-Bus model).
>
> But were the people that are not liking it at those conference sessions?

People who don't like a topic usually don't go to a session about it, why
would they? :)

> > I'll be glad to discuss this at both places, but let's try to work
> > through the technical things through email, as really, that's the best
> > place for it.
> >
> > Al just proved this by pointing out some issues to be resolved (RW lock
> > only used as a W lock, odd atomic values and locking without documenting
> > the lifecycles, etc.) And that's the way this is supposed to work,
> > nothing new/different here that I can see.
>
> But you are missing one of the complaints that I'm reading from
> people. The proposed ABI is too complex. Do we really want to jump into
> having to support another tty layer?

Don't make idle comments, the tty layer is far more complex and larger
than the kdbus code, with much nastier issues and problems. And we
handle that just fine :)

As far as the "support" issue, we have 4 people who are all experienced,
senior kernel developers who are signed up to maintain this. There's
more experience here for this one MAINTAINERS entry per line of code
than I have seen in quite some time.

Are people somehow worried that all 4 of us are going to run away? Do
people not trust the 4 of us to stick around and maintain this and deal
with any issues found for the next few decades? If so, please let us
know, as it seems like people feel we are dumping this code on them to
maintain, which is anything but true.

> One thing that I think may be really worth doing is that everyone on
> this thread that has not yet done so, write a simple dbus application
> to try to understand its design. Break it down to the requirements that
> are needed, and discuss that.

I've done that, it's hard, use the gdbus interface instead, it makes
your life much easier.

I'll again refer to ALSA here, no one writes a "raw" ALSA program, they
all use the library to interact with the kernel. Do that here, there
are wonderful dbus libraries out there, for all languages. Use them
instead.

> Is there a reason that this patch must go in this merge window?

What makes this merge window any different from any other? Again, I
explained why I asked it to be merged at this point in time. If people
have technical issues with it, I'll be more than glad to work them out
and merge it later, there's no "hard and fast deadline" anyone is asking
for here.

> Having something this controversial take place during the merge window
> suggests it's a bit premature to push in now.

"take place"? Have you been ignoring these patches posted numerous
times for many months? This is the point in time to ask for code to be
merged, just like any other code, nothing is special here.

thanks,

greg k-h

2015-04-15 17:33:39

by Steven Rostedt

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 01:49:36PM +0200, Greg Kroah-Hartman wrote:
>
> There's the issue of thousands of dbus queries, and then there's the
> issue that making those queries takes a measurable amount of time. We
> can fix the latter one, the first one, well, not so much, but we can
> provide the resources for them to make a faster system if they want to.

I'll argue that you can't fix the latter one. One thing that I've observed over
the years of having faster computers is, as soon as you make it faster, people
will write slower software.

Currently the issue is that we have thousands of dbus queries, you make dbus
10x faster, I guarantee that people will write software with 10 thousand dbus
queries and we are no better off than we are today.
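The argument is simple arithmetic: total time is the number of queries multiplied by the cost of one query. With illustrative round numbers (not measurements), and t the cost of one query today:

```latex
T = N \cdot t, \qquad
T_{\text{today}} = 1000 \cdot t, \qquad
T_{\text{10x faster}} = 10\,000 \cdot \frac{t}{10} = 1000 \cdot t = T_{\text{today}}
```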

-- Steve

2015-04-15 17:34:13

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 06:48:46PM +0200, Jiri Kosina wrote:
> On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:
>
> > The in-kernel code isn't a lot (again, 13k lines, smaller than almost
> > all of the drivers you are using today on an individual basis) It's
>
> I originally didn't want to comment on this, but now that you are making
> this argument for 3rd or 4th time, I can't really resist. What exactly are
> you trying to "prove" by the 13k-lines argument?
>
> mm/vmscan.c is less than 4k lines. Does that sole fact mean that the whole
> memory reclaim is trivial to review?

I'm trying to say that it's not a ton of code. Lines of code are of
course not a valid way to judge complexity, and I'm not trying to say
that. I am trying to point out that it isn't "huge" by comparing it to
other chunks of code that we all know and love.

We merge subsystems with new userspace apis that are larger than this all
the time. I'm trying to say this isn't something "unusual" at all.

thanks,

greg k-h

2015-04-15 17:42:21

by Havoc Pennington

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 1:20 PM, Steven Rostedt <[email protected]> wrote:
> I read a bit of the documentation, but not enough. I really need to sit
> down and play with code. That's the way I learn and understand.
>

It might be useful for some of the current devs to post about the best
APIs to play with these days - my old libdbus is pretty painful,
compared to some of the newer stuff.

gdbus nicely shows a callback-based way to handle owning a service,
using a function like g_bus_own_name:

https://developer.gnome.org/gio/stable/gio-Owning-Bus-Names.html#g-bus-own-name

The callback-based approach means the library can handle
reconnection/restart on behalf of the app.

The flip side (the way you use rather than provide a service) looks similar:
https://developer.gnome.org/gio/stable/gio-Watching-Bus-Names.html#g-bus-watch-name

Here the library can deal with complexities of a service being
restarted, the app only has to write the callbacks so they can be
called more than once (with alternating appeared/vanished handlers).
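The alternating appeared/vanished contract is easy to get wrong by hand, so here is a minimal Python sketch of what such a library does on the app's behalf. It is not the GIO API: `NameWatcher`, `update()` and the handler signatures are hypothetical, modelled only loosely on the behaviour described above for g_bus_watch_name(). The watcher remembers the current owner and calls a handler only on a transition, so the app never sees two "appeared" calls in a row, even when ownership passes straight from one process to another.

```python
from typing import Callable, List, Optional


class NameWatcher:
    """Hypothetical library-side watcher for one bus name.

    Remembers the current owner and only calls the app's handlers on a
    transition, so appeared/vanished always alternate.
    """

    def __init__(self, name: str,
                 on_appeared: Callable[[str, str], None],
                 on_vanished: Callable[[str], None]) -> None:
        self.name = name
        self.owner: Optional[str] = None
        self.on_appeared = on_appeared
        self.on_vanished = on_vanished

    def update(self, new_owner: Optional[str]) -> None:
        """Feed in the latest known owner (None = nobody owns the name)."""
        if new_owner == self.owner:
            return  # duplicate notification: stay quiet
        if self.owner is not None:
            self.on_vanished(self.name)  # the old owner went away
        self.owner = new_owner
        if new_owner is not None:
            self.on_appeared(self.name, new_owner)


# App side: handlers written so they can safely run many times.
events: List[str] = []
watcher = NameWatcher(
    "org.example.Service",
    on_appeared=lambda name, owner: events.append(f"appeared {owner}"),
    on_vanished=lambda name: events.append("vanished"),
)

watcher.update(":1.5")   # service starts
watcher.update(":1.5")   # duplicate notification, ignored
watcher.update(None)     # service crashes
watcher.update(":1.9")   # service is restarted
watcher.update(":1.12")  # ownership handed over directly
print(events)
```

The design choice is that de-duplication lives in the library, so the app's handlers only need to be safe to run repeatedly.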

You can see in those API docs more of the ordering guarantees, in this
case on callback invocation - less for apps to screw up.

Havoc

2015-04-15 17:55:24

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 01:20:37PM -0400, Steven Rostedt wrote:
> > I don't know too many other kernel features/drivers that have taken this
> > long, or done this "slowly", do you?
>
> What other features/drivers that you know introduce a major new IPC
> user space interface that will be a core component of the system?

We've been merging these about one every other kernel release for a
while now. Look at drivers/misc/mic/ for one such example, there
are many others like this that are dealing with distributed systems and
having the kernel communicate between them through some custom userspace
api. Usually ioctls :)

We merge a lot of stuff, and unfortunately it's hard to get a view of
everything that happens all the time. I suggest reading at least the
shortlog summary of every commit if people are curious, I know I do.

thanks,

greg k-h

2015-04-15 17:59:42

by Austin S Hemmelgarn

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 2015-04-14 15:43, Greg Kroah-Hartman wrote:
> On Tue, Apr 14, 2015 at 08:35:33PM +0100, Al Viro wrote:
>> On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:
>>
>>>> I agree. You've sent a pull request for an unfortunate design. I
>>>> don't think that unfortunate design belongs in the kernel. If it stays
>>>> in userspace, then user programmers could potentially fix it some day.
>>>
>>> You might not like the design, but it is a valid design. Again, we
>>> don't refuse to support hardware that is designed badly. Or support
>>> protocols we don't necessarily like, that's not the job of a kernel or
>>> operating system.
>>
>> And no, "the sole consumer of that API knows better, so bend over" is not
>> a good idea. We have shitloads of examples when single-consumer APIs
>> turned into screaming horrors; taking that in over the objections to API
>> design, merely on "they do it that way, who the hell we are to say they
>> are wrong?" is insane.
>
> Again, in this domain, the design is sound. So much so that everyone
> who works in that area moved toward it (KDE, Qt, Go, etc.) We might not
> think it makes sense, and it did take me a while to wrap my head around
> it, but to call it "crap" is unfair, sorry.
>

The reason that 'everyone who works in this area' adopted it is not as much
that the design is sound (I'm not arguing whether it is or isn't in this
case) as it is that none of them could come up with anything better.

2015-04-15 18:04:56

by Rik van Riel

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 04/15/2015 01:59 PM, Austin S Hemmelgarn wrote:
> On 2015-04-14 15:43, Greg Kroah-Hartman wrote:
>> On Tue, Apr 14, 2015 at 08:35:33PM +0100, Al Viro wrote:
>>> On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:
>>>
>>>>> I agree. You've sent a pull request for an unfortunate design. I
>>>>> don't think that unfortunate design belongs in the kernel. If it stays
>>>>> in userspace, then user programmers could potentially fix it some day.
>>>>
>>>> You might not like the design, but it is a valid design. Again, we
>>>> don't refuse to support hardware that is designed badly. Or support
>>>> protocols we don't necessarily like, that's not the job of a kernel or
>>>> operating system.
>>>
>>> And no, "the sole consumer of that API knows better, so bend over" is not
>>> a good idea. We have shitloads of examples when single-consumer APIs
>>> turned into screaming horrors; taking that in over the objections to API
>>> design, merely on "they do it that way, who the hell we are to say they
>>> are wrong?" is insane.
>>
>> Again, in this domain, the design is sound. So much so that everyone
>> who works in that area moved toward it (KDE, Qt, Go, etc.) We might not
>> think it makes sense, and it did take me a while to wrap my head around
>> it, but to call it "crap" is unfair, sorry.
>>
>
> The reason that 'everyone who works in this area' adopted it is not as much
> that the design is sound (I'm not arguing whether it is or isn't in this
> case) as it is that none of them could come up with anything better.

They are smart people, and I would not underestimate
the usefulness of the user space API (above the
dbus library) that they came up with.

That does not mean the actual in-kernel implementation
needs to follow the same design criteria. It may make
sense to have part of the implementation in kernel space,
part in user space, and allow the userspace part to be
switched out to accommodate other protocols over the same
in-kernel bus...

Moving some of the policy bits into a user space daemon
may make sense. Storing messages that cannot be delivered
right now in user space could make sense, too.

2015-04-15 18:04:59

by Steven Rostedt

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 15 Apr 2015 19:31:45 +0200
Greg Kroah-Hartman <[email protected]> wrote:

> > But were the people that are not liking it at those conference sessions?
>
> People who don't like a topic usually don't go to a session about it, why
> would they? :)

Exactly, but if you invite those people, and say "hey, here's your
chance to set us straight" maybe they'll come. I would.

But give them a few weeks notice, so that they can study what's out
there.


> > But you are missing one of the complaints that I'm reading from
> > people. The proposed ABI is too complex. Do we really want to jump into
> > having to support another tty layer?
>
> Don't make idle comments, the tty layer is far more complex and larger

We are all making our own little exaggerated metaphors. ;-)

> than the kdbus code, with much nastier issues and problems. And we
> handle that just fine :)
>
> As far as the "support" issue, we have 4 people who are all experienced,
> senior kernel developers who are signed up to maintain this. There's
> more experience here for this one MAINTAINERS entry per line of code
> than I have seen in quite some time.

No, but people seem to be worried about the complexity. If everyone
understands that there's no other choice but to have it complex (like
RCU is), then everyone will be fine with it. But right now, people are
questioning why it needs to be complex. But we need more people to
spend time on it to make sure it does.


> > One thing that I think may be really worth doing is that everyone on
> > this thread that has not yet done so, write a simple dbus application
> > to try to understand its design. Break it down to the requirements that
> > are needed, and discuss that.
>
> I've done that, it's hard, use the gdbus interface instead, it makes
> your life much easier.

I still need to play with the code and see exactly what it does. What
goes into the kernel needs to be the raw interface only. Everything
else should be in a library that takes care of the details. Is that
what is here?


>
> I'll again refer to ALSA here, no one writes a "raw" ALSA program, they
> all use the library to interact with the kernel. Do that here, there
> are wonderful dbus libraries out there, for all languages. Use them
> instead.

Is this what is being proposed (again, I need to go back and read the
original change log. I did it once, but mostly forgot what was in it).

>
> > Is there a reason that this patch must go in this merge window?
>
> What makes this merge window any different from any other? Again, I
> explained why I asked it to be merged at this point in time. If people
> have technical issues with it, I'll be more than glad to work them out
> and merge it later, there's no "hard and fast deadline" anyone is asking
> for here.

Well, there's been a few minor things that have been pointed out (the
locking), and having something as small as that take place during a
merge window, to me, would be cause to wait another merge window.

>
> > Having something this controversial take place during the merge window
> > suggests its a bit premature to push in now.
>
> "take place"? Have you been ignoring these patches posted numerous
> times for many months? This is the point in time to ask for code to be
> merged, just like any other code, nothing is special here.

But there are still complaints about it. Perhaps people are just
noticing. We are all busy, and nobody (but perhaps Andrew Morton and
Jon Corbet) reads every LKML message. It's now getting more eyes.
That's a good thing.

I'd like more time to play with it so that I can understand why exactly
it needs to go in as you say it does.

-- Steve

2015-04-15 18:07:01

by Steven Rostedt

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 15 Apr 2015 19:33:57 +0200
Greg Kroah-Hartman <[email protected]> wrote:

> We merge subsystems with new userspace apis that are larger than this all
> the time. I'm trying to say this isn't something "unusual" at all.

I believe the difference is that those subsystems are not part of the
core system infrastructure. If it is, can you please tell me what they
are.

People don't use perf and tracing to run their desktops. People will be
using kdbus though.

-- Steve

2015-04-15 18:11:22

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 01:33:27PM -0400, Steven Rostedt wrote:
> On Wed, Apr 15, 2015 at 01:49:36PM +0200, Greg Kroah-Hartman wrote:
> >
> > There's the issue of thousands of dbus queries, and then there's the
> > issue that making those queries takes a measurable amount of time. We
> > can fix the latter one, the first one, well, not so much, but we can
> > provide the resources for them to make a faster system if they want to.
>
> I'll argue that you can't fix the latter one. One thing that I've observed over
> the years of having faster computers is, as soon as you make it faster, people
> will write slower software.
>
> Currently the issue is that we have thousands of dbus queries, you make dbus
> 10x faster, I guarantee that people will write software with 10 thousand dbus
> queries and we are no better off than we are today.

Then they get to buy a faster machine :)

2015-04-15 18:12:24

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 01:20:37PM -0400, Steven Rostedt wrote:
> > Yes it seems complex, but read the documentation to get an idea of what
> > is happening here. I think you will get a better appreciation of what
> > is going on.
>
> I read a bit of the documentation, but not enough. I really need to sit
> down and play with code. That's the way I learn and understand.

Here's a good mapping for C developers that Lennart wrote last year:
https://lwn.net/Articles/619250/
that should give you a good starting point.

greg k-h

2015-04-15 18:12:38

by James Bottomley

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 2015-04-15 at 10:37 +0200, Greg Kroah-Hartman wrote:
> On Wed, Apr 15, 2015 at 12:05:01AM +0200, Jiri Kosina wrote:
> > On Tue, 14 Apr 2015, Steven Rostedt wrote:
> >
> > > I believe that Linux Plumbers is still accepting MicroConferences. I
> > > wonder if this would be a good one to have. Try to get everyone face to
> > > face and talk about how exactly kdbus should be implemented in the
> > > kernel.
> >
> > I personally would even put more emphasis on a session that would first
> > focus on "why", before we look at "how".
> >
> > I have already asked about this during the earlier RFC submissions, but
> > the only "take-home message" I took from that discussion was "because it's
> > faster than what we currently have". I don't find that a sufficient
> > justification by itself for something so complex (with potential
> > implications all over the place for the whole Linux ecosystem), especially
> > given the fact we already have sealed memfds zerocopy etc (and I am not
> > even talking about the "infinite set-in-stone userspace API" implications
> > this has).
>
> I wrote many many lines of "why" in the patch submissions, and in the
> first email in this thread. Are any of those specific solutions and
> "why" reasons not correct in your opinion? If so, great, please let me
> know.
>
> But to say that no one is focusing on "why" is a slight to those of us
> who have been providing just that.

Please stop. A debate that degenerates into a disagreement about whether
specific questions have or have not been answered is no debate at all:
it's an ideological showcase. If both sides are going to do the same
at plumbers (or elsewhere) it will be a waste of time (well, except as a
spectator sport).

To make this work, you need (as the plumbers MC templates tell you) a
list of key attendees from all sides of the debate who'll commit to
coming (mostly what I've heard so far is people committing not to
coming) and a list of guiding topics which people will commit to
discussing honestly.

For me the biggest issue is the container problem: it's really hard to
containerise kdbus because of the stateful nature of the protocol and
the fact that it has a well known system bus. Separation into domains
works for OS containers, but application containers need more fluidity.
It's not unlike the same problem on windows: Windows application
containers are very difficult to do because the global registry means
that OLE handlers all have to run inside your container as well
(effectively making it an OS container). I'm sure, since we already
have a lot of containers people going to plumbers, that we can get them
to turn up for the discussion.

James

2015-04-15 18:16:12

by Steven Rostedt

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 12:44:44PM -0400, Havoc Pennington wrote:
>
> By pushing everything through one queue, dbus is trying to reduce the
> number of codepaths in applications. Apps have a lot of new problems
> to solve if messages get their order scrambled.

But can't a dbus library handle this for the apps? Like implementing TCP on
top of UDP. I really doubt the entire dbus protocol needs to be pushed into
the kernel.

I'm going to try to spend some time reading about dbus and playing with the
code (thanks for the links BTW!). Then I can see if I can come up with
something too. Or at least be able to ask the right questions.

-- Steve

2015-04-15 18:18:49

by Linus Torvalds

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 11:11 AM, Greg Kroah-Hartman
<[email protected]> wrote:
> On Wed, Apr 15, 2015 at 01:33:27PM -0400, Steven Rostedt wrote:
>>
>> I'll argue that you can't fix the later one. One thing that I've observed over
>> the years of having faster computers is, as soon as you make it faster, people
>> will write slower software.
>>
>> Currently the issue is that we have thousands of dbus queries, you make dbus
>> 10x faster, I guarantee that people will write software with 10 thousand dbus
>> queries and we are no better off than we are today.
>
> Then they get to buy a faster machine :)

Is there actually a performance issue?

I've seen this claimed, but I have never seen any actual numbers. What
speeds up? By how much? Is it actually measurable?

Maybe they've marched past me in this thread-from-hell. But I can't
recall having seen any (not now, not before).

That said, I think the more serious issue is that if Luto complains
about the capability-capturing code being completely broken, then
people need to take that *seriously*.

Linus

2015-04-15 18:29:04

by Linus Torvalds

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 11:18 AM, Linus Torvalds
<[email protected]> wrote:
>
> I've seen this claimed, but I have never seen any actual numbers. What
> speeds up? By how much? Is it actually measurable?

And just to clarify: by "what speeds up, and by how much", I do _not_
mean "sending a dbus message speeds up by 10x and avoids context
switches". I've seen _those_ numbers. But does it actually matter?

It was more of a "there are thousands of dbus messages during boot,
but can you actually measure the speedup?" question. That's the kind
of numbers I've not seen.

Linus

2015-04-15 18:37:21

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 11:28:58AM -0700, Linus Torvalds wrote:
> On Wed, Apr 15, 2015 at 11:18 AM, Linus Torvalds
> <[email protected]> wrote:
> >
> > I've seen this claimed, but I have never seen any actual numbers. What
> > speeds up? By how much? Is it actually measurable?
>
> And just to clarify: by "what speeds up, and by how much", I do _not_
> mean "sending a dbus message speeds up by 10x and avoids context
> switches". I've seen _those_ numbers. But does it actually matter?
>
> It was more of a "there are thousands of dbus messages during boot,
> but can you actually measure the speedup?" question. That's the kind
> of numbers I've not seen.

Someone from BMW did the testing on one of their car systems a while ago
and posted some numbers, it was a factor of 10 faster. I'll try to dig
it up, but it was buried in a PowerPoint presentation, so it might be
hard to find, give me a day.

thanks,

greg k-h

2015-04-15 18:38:05

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 11:18:45AM -0700, Linus Torvalds wrote:
> That said, I think the more serious issue is that if Luto complains
> about the capability-capturing code being completely broken, then
> people need to take that *seriously*.

I am taking that seriously, it's been a long day, will respond to it
tomorrow.

thanks,

greg k-h

2015-04-15 18:41:18

by Havoc Pennington

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 2:16 PM, Steven Rostedt <[email protected]> wrote:
> But can't a dbus library handle this for the apps? Like implementing TCP on
> top of UDP. I really doubt the entire dbus protocol needs to be pushed into
> the kernel.

You could probably do something like assign sequence numbers,
temporarily relax ordering, and then reconstruct the order where
needed, but somebody still has to assign the sequence numbers in
order, and the bus has to process requests in order (it can't flip a
subscribe and an unsubscribe, for example). So I don't know whether
you could get anywhere with it or not.
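The "assign sequence numbers and reconstruct the order" idea is essentially a reorder buffer, much like TCP reassembly. A minimal Python sketch, assuming a single sender that stamps consecutive integers starting at 0 and never repeats one (all names are hypothetical; nothing here is part of dbus or kdbus):

```python
import heapq
from typing import Iterable, List, Tuple


class ReorderBuffer:
    """Hypothetical receiver-side buffer that restores sender order.

    Messages may arrive in any order, each carrying the sequence number
    the sender assigned. They are released strictly in sequence; anything
    that arrives ahead of a gap is held back until the gap is filled.
    """

    def __init__(self) -> None:
        self.next_seq = 0
        self.pending: List[Tuple[int, str]] = []  # min-heap keyed by seq

    def push(self, seq: int, payload: str) -> List[str]:
        """Accept one message and return whatever is now deliverable."""
        heapq.heappush(self.pending, (seq, payload))
        ready = []
        while self.pending and self.pending[0][0] == self.next_seq:
            ready.append(heapq.heappop(self.pending)[1])
            self.next_seq += 1
        return ready


def deliver(arrivals: Iterable[Tuple[int, str]]) -> List[str]:
    """Run a whole arrival sequence through a fresh buffer."""
    buf = ReorderBuffer()
    out: List[str] = []
    for seq, payload in arrivals:
        out.extend(buf.push(seq, payload))
    return out


# The unsubscribe (seq 2) overtakes the subscribe (seq 0) in transit,
# but is still applied after it.
print(deliver([(2, "unsubscribe"), (0, "subscribe"), (1, "signal")]))
```

Note that this only moves the problem: something still has to hand out the sequence numbers in order, and the bus still has to apply a subscribe before a later unsubscribe.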

The current model (userspace dbus daemon, don't know about kdbus) is like this:

- you have a pool of ordered incoming queues from each client, where
  each incoming queue conceptually ends with EOF of course

- the main bus loop does:
  - pick the head message or EOF from any nonempty incoming queue for dispatch
  - route it according to destination address or subscribers
  - if the destination includes the bus itself (e.g. someone wanting
    to subscribe or own a name or whatever) then process the request...
    note that this will potentially affect how the next message gets
    routed
  - for each destination client, write the message to the ordered
    outgoing queue for that client
  - if the incoming queue has EOF then send out notifications about
    that to interested clients, clean up bus names, etc.

Conceptually, filling and draining the queues could easily be in
separate threads, though the userspace daemon doesn't do that.
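The loop above can be modelled in a few lines. This is a toy, single-threaded Python sketch under stated assumptions: the `Bus` and `Message` names, the round-robin choice of which queue to service, and the "subscribe"/"unsubscribe" requests standing in for real match rules are all hypothetical, not dbus-daemon's actual data structures. It shows the property described above: a request to the bus changes how later messages are routed, so requests must be processed in order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Set

EOF = object()  # sentinel marking the end of a client's incoming queue


@dataclass
class Message:
    destination: Optional[str]  # a client name, "bus", or None (broadcast)
    member: str


class Bus:
    """Toy, single-threaded model of the dispatch loop (hypothetical)."""

    def __init__(self, clients: List[str]) -> None:
        self.incoming: Dict[str, Deque[object]] = {c: deque() for c in clients}
        self.outgoing: Dict[str, List[str]] = {c: [] for c in clients}
        self.subscribers: Set[str] = set()  # clients receiving broadcasts

    def run(self) -> None:
        # Service the queues round-robin, but only ever take the head of a
        # queue, so each client's messages are handled in the order sent.
        while any(self.incoming.values()):
            for client, queue in self.incoming.items():
                if queue:
                    self.dispatch(client, queue.popleft())

    def dispatch(self, client: str, item: object) -> None:
        if item is EOF:
            # Client went away: notify interested clients, clean up state.
            self.subscribers.discard(client)
            for peer in sorted(self.subscribers):
                self.outgoing[peer].append(f"{client} disconnected")
            return
        assert isinstance(item, Message)
        if item.destination == "bus":
            # A request to the bus itself changes how LATER messages route.
            if item.member == "subscribe":
                self.subscribers.add(client)
            elif item.member == "unsubscribe":
                self.subscribers.discard(client)
        elif item.destination is None:
            for peer in sorted(self.subscribers):
                self.outgoing[peer].append(f"{client}: {item.member}")
        else:
            self.outgoing[item.destination].append(f"{client}: {item.member}")


bus = Bus(["a", "b", "c"])
bus.incoming["a"].extend([Message("bus", "subscribe"),
                          Message("bus", "unsubscribe")])
bus.incoming["b"].extend([Message(None, "Changed"), Message("c", "Ping"), EOF])
bus.incoming["c"].append(Message("bus", "subscribe"))
bus.run()
print(bus.outgoing)
```

In this run, a subscribes before b's broadcast and so receives it, while c subscribes too late for the broadcast and only sees b's direct message and b's disconnect notification.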

> I'm going to try to spend some time reading about dbus and playing with the
> code (thanks for the links BTW!). Then I can see if I can come up with
> something too. Or at least be able to ask the right questions.
>

Greg may be right to point people to Lennart's C binding which has a
lot less "baggage" than the GLib stuff, which assumes knowledge of the
"glib way".

Havoc

2015-04-15 19:04:34

by Martin Steigerwald

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Am Mittwoch, 15. April 2015, 13:40:36 schrieb Greg Kroah-Hartman:
> On Wed, Apr 15, 2015 at 11:44:11AM +0200, Borislav Petkov wrote:
> > On Wed, Apr 15, 2015 at 11:27:13AM +0200, Greg Kroah-Hartman wrote:
> > > On Wed, Apr 15, 2015 at 11:21:49AM +0200, Borislav Petkov wrote:
> > > > On Wed, Apr 15, 2015 at 11:20:34AM +0200, Greg Kroah-Hartman wrote:
> > > > > > We're all forced to use cgroups, systemd, udev unless we want
> > > > > > to have busybox as userland. That's a fact.
> > > > >
> > > > > Is that a problem?
> > > >
> > > > I'm amazed that you're really actually asking that question :-(
> > >
> > > Really? Why can't userspace rely on the features that the kernel
> > > provides them?
> >
> > Userspace can do whatever it wants. As long as I'm not being *forced*
> > to do what userspace thinks is the right thing.
> >
> > It seems to me that since that whole systemd* debacle started, we're
> > forgetting the choice aspect.
>
> What "choice" aspect? Surely you aren't going to make the "Linux is
> about choice" argument are you?
>
> > And dammit, I want my choice. I want to be able to choose what I'm
> > running. Not run what someone else thought what would be good for me
> > to run. If I wanted that, I'd long switched to windoze or ?bble.
>
> Oh crap, you went there :)
>
> Take a look at http://www.islinuxaboutchoice.com/ please.

Just one question:

In what way is the post of a single kernel developer authoritative for the
whole community?

Even if I would make a poster of 200x100 meters or so and stick it onto a
building, it wouldn't be.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

2015-04-15 20:22:42

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 9:44 AM, Havoc Pennington <[email protected]> wrote:
> On Wed, Apr 15, 2015 at 12:00 PM, Rik van Riel <[email protected]> wrote:
>> On 04/15/2015 07:06 AM, One Thousand Gnomes wrote:
>>
>>>> that anyone here does either. In the many years I've spent working on
>>>> this, dbus has seemed to be odd, and strange, to the way that the kernel
>>>> has normally worked, because it is. And that's not a bad thing, it's
>>>> just different, and for us to support real needs and requirements of our
>>>> users, is the requirement of the Linux kernel.
>>>
>>> There are I think a set of intertwined problems here
>>>
>>> - An efficient delivery system for multicast messages delivered locally
>>> (be that MPI, dbus whatever - it's not "dbus or nothing")
>>>
>>> - A kernel side dynamic namespace to describe what goes where
>>>
>>> - A kernel side security model to describe who may receive what, and
>>> which additional information/tags/cred info
>>>
>>> - Something that provides state to stuff that needs it (and probably
>>> belongs in userspace - dbus name service etc)
>>>
>>> - Something that maps dbus and other models onto the kernel security
>>> model (and we have tools like EBPF which are very powerful)
>>>
>>> - Something that maps the kernel layer onto models like MPI-3
>
> When trying to split apart problems, for dbus it's important to keep
> ordering guarantees.
>
> That is, with dbus if I send a broadcast message, then send a unicast
> request to another client, then drop the connection causing the bus to
> broadcast that I've dropped; then the other client will see those
> things in that order - the broadcast, then the request, and then that
> I've dropped the connection.

This leads me to a potentially interesting question: where's the
buffering? If there's a bus with lots of untrusted clients and one of
them broadcasts data faster than all receivers can process it, where
does it go?

At least with a userspace solution, it's clear what the OOM killer
should kill when this happens. Unless it's PID 1. Sigh.

--Andy

2015-04-15 20:42:10

by Al Viro

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 01:22:12PM -0700, Andy Lutomirski wrote:

> This leads me to a potentially interesting question: where's the
> buffering? If there's a bus with lots of untrusted clients and one of
> them broadcasts data faster than all receivers can process it, where
> does it go?
>
> At least with a userspace solution, it's clear what the OOM killer
> should kill when this happens. Unless it's PID 1. Sigh.

... and there is a PID 1 specimen that really likes to spew over dbus.
A lot. I had never been able to find out _why_ does systemd feel like
broadcasting all kinds of stuff from PID 1 - maybe somebody in this
thread can answer that. For example, what's the point of broadcasting
mount table updates, when
* it can't hope to catch all individual changes - they _can_ get
lumped together, no matter what it tries.
* any process can just as easily keep track of that data on its
own as it could by watching those broadcasts; parsing /proc/self/mountinfo
isn't harder than parsing notifications.
* you need to start with obtaining the original state somehow, or
what would you apply those updates to?
* if one insists on having a daemon doing such broadcasts, what
the hell is the point of having PID 1 do that? Exact same logics would
do just fine. Moreover, you could have one running in a namespace of
your session, which is something PID 1 won't see.
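Al's second point, that any process can track the mount table on its own, can be made concrete. The following is a minimal sketch, assuming the field layout documented in proc(5); `struct mount_entry`, `parse_mountinfo_line()`, `snapshot_mounts()` and `wait_for_mount_change()` are hypothetical helper names. It relies on the proc(5) behaviour that poll(2) on an open mounts/mountinfo file reports POLLPRI after a mount or unmount. The caller then rereads the whole file, which sidesteps both the "changes get lumped together" problem and the question of where the initial state comes from.

```c
#include <poll.h>
#include <stdio.h>
#include <string.h>

/* Fields of interest from one /proc/self/mountinfo line (layout per proc(5)). */
struct mount_entry {
    int  mount_id;
    int  parent_id;
    char mount_point[256];        /* octal escapes such as "\040" left as-is */
    char fstype[64];
};

/*
 * Parse a line such as
 *   36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw
 * The first six fields are fixed (IDs, major:minor, root, mount point,
 * options); zero or more optional fields follow, then " - " and the
 * filesystem type. Returns 0 on success, -1 on a malformed line.
 */
static int parse_mountinfo_line(const char *line, struct mount_entry *e)
{
    char root[256];
    const char *sep;

    if (sscanf(line, "%d %d %*s %255s %255s", &e->mount_id, &e->parent_id,
               root, e->mount_point) != 4)
        return -1;
    sep = strstr(line, " - ");
    if (!sep || sscanf(sep + 3, "%63s", e->fstype) != 1)
        return -1;
    return 0;
}

/* Read the whole table into out[]; returns the number of entries or -1. */
static int snapshot_mounts(struct mount_entry *out, int max)
{
    char line[1024];
    int n = 0;
    FILE *f = fopen("/proc/self/mountinfo", "r");

    if (!f)
        return -1;
    while (n < max && fgets(line, sizeof line, f))
        if (parse_mountinfo_line(line, &out[n]) == 0)
            n++;
    fclose(f);
    return n;
}

/* Wait on an fd opened on /proc/self/mountinfo. A mount or unmount is
 * reported as POLLPRI (with POLLERR); returns poll()'s result. After a
 * change, take a fresh snapshot instead of applying deltas. */
static int wait_for_mount_change(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLPRI };
    return poll(&pfd, 1, timeout_ms);
}
```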

Sure, I understand why it wants to be aware of what's mounted and where it's
mounted. Just as it wants to know what time it is. Should it broadcast
a dbus message every second, just to tell everyone what had it found about
the time?

I'm somewhat tempted to propose AF_TWITTER - would match the style... ;-/
And frankly, this really looks like a social media braindamage - complete
with status update broadcast every time a plane flies by...

2015-04-15 21:08:08

by Rik van Riel

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 04/15/2015 04:22 PM, Andy Lutomirski wrote:
> On Wed, Apr 15, 2015 at 9:44 AM, Havoc Pennington <[email protected]> wrote:
>> On Wed, Apr 15, 2015 at 12:00 PM, Rik van Riel <[email protected]> wrote:
>>> On 04/15/2015 07:06 AM, One Thousand Gnomes wrote:
>>>
>>>>> that anyone here does either. In the many years I've spent working on
>>>>> this, dbus has seemed to be odd, and strange, to the way that the kernel
>>>>> has normally worked, because it is. And that's not a bad thing, it's
>>>>> just different, and for us to support real needs and requirements of our
>>>>> users, is the requirement of the Linux kernel.
>>>>
>>>> There are I think a set of intertwined problems here
>>>>
>>>> - An efficient delivery system for multicast messages delivered locally
>>>> (be that MPI, dbus whatever - it's not "dbus or nothing")
>>>>
>>>> - A kernel side dynamic namespace to describe what goes where
>>>>
>>>> - A kernel side security model to describe who may receive what, and
>>>> which additional information/tags/cred info
>>>>
>>>> - Something that provides state to stuff that needs it (and probably
>>>> belongs in userspace - dbus name service etc)
>>>>
>>>> - Something that maps dbus and other models onto the kernel security
>>>> model (and we have tools like EBPF which are very powerful)
>>>>
>>>> - Something that maps the kernel layer onto models like MPI-3
>>
>> When trying to split apart problems, for dbus it's important to keep
>> ordering guarantees.
>>
>> That is, with dbus if I send a broadcast message, then send a unicast
>> request to another client, then drop the connection causing the bus to
>> broadcast that I've dropped; then the other client will see those
>> things in that order - the broadcast, then the request, and then that
>> I've dropped the connection.
>
> This leads me to a potentially interesting question: where's the
> buffering? If there's a bus with lots of untrusted clients and one of
> them broadcasts data faster than all receivers can process it, where
> does it go?
>
> At least with a userspace solution, it's clear what the OOM killer
> should kill when this happens. Unless it's PID 1. Sigh.

It may be useful to do the buffering (and general interception
of any message that cannot be delivered) in a userspace program.

Not only to get the buffers out of the kernel and into swappable
memory, but also so people could re-use the same infrastructure
for things like cluster communication (or communication between
different containers) - the userspace daemons could take care of
routing messages to and from the outside.

They could also be useful to keep some of the policy stuff
outside of the kernel, if only to ensure that the kernel side
policy is not set in stone, and people can do things differently
in the future if they want to.

2015-04-15 21:56:46

by Alan Cox

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 15 Apr 2015 19:55:15 +0200
Greg Kroah-Hartman <[email protected]> wrote:

> On Wed, Apr 15, 2015 at 01:20:37PM -0400, Steven Rostedt wrote:
> > > I don't know too many other kernel features/drivers that have taken this
> > > long, or done this "slowly", do you?
> >
> > What other features/drivers that you know introduce a major new IPC
> > user space interface that will be a core component of the system?
>
> We've been merging these about one every other kernel release for a
> while now. Look at the drivers/misc/mic/

For a single specific piece of hardware, not a general API that by your
own admission will effectively be mandatory.

Alan

2015-04-15 21:56:53

by Alan Cox

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 15 Apr 2015 19:31:45 +0200
> Don't make idle comments, the tty layer is far more complex and larger
> than the kdbus code, with much nastier issues and problems. And we
> handle that just fine :)

The tty layer is the way it is because of design decisions dating back 20
years that were (with hindsight) wrong coupled with the fact that POSIX
took a lot of the behavioural guarantees from an armwaving claim about
what Unix(tm) implemented without thinking about how to implement them
(as far as I can tell - given many of the guarantees are broken in Unix!)

> I'll again refer to ALSA here, no one writes a "raw" ALSA program, they
> all use the library to interact with the kernel. Do that here, there
> are wonderful dbus libraries out there, for all languages. Use them
> instead.

Agreed entirely - I don't disagree that we need a fast messaging layer.
The question is what bits belong in kernel. Go wants one, JMS wants one,
porting from stuff like QNX wants one (although they use the POSIX API
on QNX), MPI wants one (but with some useful and subtly different
semantics), various embedded things from tiny uKernels want one.

The question is what the kernel bit should actually look like, and how
many we need.

My guess is that we actually have three of the big use cases covered

- futexes and shared memory cover the tiny uKernel emulation bits (and on
a lawnmower engine sized ARM that's probably the only way to get the
speed approaching that of a tiny rtos)
- posix queues cover things like QNX porting
- publish/subscribe - via tmpfs

but we don't cover

- multicasting
- some types of credential and authority passing
- scatter/gather without excessive userspace wakes

> > Is there a reason that this patch must go in this merge window?
>
> What makes this merge window any different from any other? Again, I
> explained why I asked it to be merged at this point in time. If people
> have technical issues with it, I'll be more than glad to work them out
> and merge it later, there's no "hard and fast deadline" anyone is asking
> for here.

The problem I have is that every time someone points out a fundamental
design issue you simply say "Why haven't you reviewed 13,000 lines of
code".

I haven't given it an in depth review for the same reason as if someone
posted 13,000 lines of "I've got an awesome new file system which uses a
FAT and 8.3 file names". There are some more pressing concerns to sort
first. The fact it's complex and hard to follow also doesn't encourage
review.

And the fact Al tried to read it and is asking for help really worries
me 8)

>
> > Having something this controversial take place during the merge window
> > suggests it's a bit premature to push in now.
>
> "take place"? Have you been ignoring these patches posted numerous
> times for many months? This is the point in time to ask for code to be
> merged, just like any other code, nothing is special here.

Well - you've asked. I see two NACKs from people with great taste. So I
think the next step is to defer trying to submit it and work through the
fact that Al can't follow the locking, and other people don't believe the
security model is maintainable.

Alan

2015-04-15 21:58:43

by Havoc Pennington

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 4:22 PM, Andy Lutomirski <[email protected]> wrote:
>
> This leads me to a potentially interesting question: where's the
> buffering? If there's a bus with lots of untrusted clients and one of
> them broadcasts data faster than all receivers can process it, where
> does it go?
>
> At least with a userspace solution, it's clear what the OOM killer
> should kill when this happens. Unless it's PID 1. Sigh.
>

There's the history and there's the probably-should-happen. I'm sure
this can be improved.

What I think should probably happen is:

- if a client is trying to send a message and the bus's incoming
buffer from that client is full, the bus should stop reading (forcing
the client to do its own buffering).
- if a client is not consuming messages fast enough and the bus's
outgoing buffer to that client fills up, the client should be
disconnected.

This would essentially copy how the X server works (again).
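The two rules proposed above can be written down as per-client accounting in a bus daemon. This is a toy model under stated assumptions, not dbus-daemon code: `struct client` and the three functions are hypothetical, byte counts stand in for real buffers, and the actual socket I/O is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-client accounting in a userspace bus daemon. */
struct client {
    size_t in_bytes, in_limit;    /* read from client, not yet routed */
    size_t out_bytes, out_limit;  /* queued for client, not yet written */
    bool   reading_paused;        /* bus stopped reading this client */
    bool   connected;
};

/* Rule 1: a full incoming buffer means the bus stops reading; the kernel
 * socket buffer then fills and the sender has to do its own buffering. */
static void account_incoming(struct client *c, size_t len)
{
    c->in_bytes += len;
    if (c->in_bytes >= c->in_limit)
        c->reading_paused = true;
}

/* The bus has routed len bytes of c's input; resume reading if below limit. */
static void incoming_drained(struct client *c, size_t len)
{
    c->in_bytes -= len < c->in_bytes ? len : c->in_bytes;
    if (c->in_bytes < c->in_limit)
        c->reading_paused = false;
}

/* Rule 2: a receiver whose outgoing buffer would overflow is disconnected
 * instead of silently missing the message, so it either saw every
 * broadcast or knows it was dropped. Returns true if the message was queued. */
static bool queue_outgoing(struct client *c, size_t len)
{
    if (!c->connected)
        return false;
    if (c->out_bytes + len > c->out_limit) {
        c->connected = false;     /* buffer freed; client may reconnect */
        c->out_bytes = 0;
        return false;
    }
    c->out_bytes += len;
    return true;
}
```

Note the asymmetry: a fast sender only experiences backpressure, while a slow receiver is cut off.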

The original userspace implementation has configurable buffer size
limits and also limits on resources (such as number of connections and
match rules) used by a single user, but I don't think it does the
right things when limits are reached.

When the incoming queue is full for a client, I'm not sure whether it
stops reading from that client or sends the client errors, I don't
remember.

When the outgoing-from-the-daemon queue is full (a client isn't
reading messages fast enough), if I remember right messages to that
client are dropped with an error reply to the sender - this error
probably gets ignored much of the time in practice, but in theory the
sender could retry.

A full outgoing queue for one client doesn't affect other clients, who
are still able to receive messages. For broadcast messages, a full
queue means a client will miss those broadcasts.

Disconnecting might be better than this drop-the-message behavior,
because clients could then assume that *either* they got all messages
that were broadcast, *or* they got disconnected - they won't ever
silently miss broadcasts and end up in a weird confused state.

Xserver does this - if I'm reading the code correctly just now
(xserver/os/io.c, FlushClient()), it buffers outgoing messages until
realloc fails, and then it disconnects the client.

If X didn't do this, then clients could miss events and become
confused about the state of the server. The same will often apply in
dbus scenarios.

In practice right now APIs are designed and limits are configured to
try to avoid ever hitting the limits (unless something is malicious or
badly broken), because if you hit them things go to hell - much like
running out of memory, or hitting file descriptor limits.

Disconnecting slow-reading clients would probably improve this; the
full buffer would be instantly freed, and the client could reconnect
and re-establish all state it cares about, if it wants to. So it might
gracefully recover sometimes, if the problem was transient.

Havoc

2015-04-15 22:09:01

by Alan Cox

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

> When trying to split apart problems, for dbus it's important to keep
> ordering guarantees.

Yes I assumed that - minus disconnection/reconnect and running out of
queue space. Some users also want priority queueing (with or without the
guarantee for the same priority). Many of the other systems that can use
a fast multicast messaging system have priority queues - which is one
reason the existing POSIX messaging has priority.

> That is, with dbus if I send a broadcast message, then send a unicast
> request to another client, then drop the connection causing the bus to
> broadcast that I've dropped; then the other client will see those
> things in that order - the broadcast, then the request, and then that
> I've dropped the connection.

That's a simple matter of refcounting the buffers 8). I'm not really
concerned about the low level queue side of things. The proposed
implementation looks horribly convoluted for what the sk_buff layer can
already do standing on one leg. We know how to implement that part
cleanly, and it's probably not hard to nail onto AF_UNIX or to expand
posix message queues to provide that service (and maybe then even
convince POSIX about it)
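The "refcounting the buffers" remark can be illustrated in a few lines. A multicast message is stored once, each receiver's FIFO holds a reference, and the buffer is freed when the last receiver consumes it. Because a sender's messages are appended to every queue in send order, a receiver sees the broadcast, the unicast request and the disconnect notice in that order. This is a minimal userspace sketch; `struct msg`, `struct rxqueue` and the helpers are hypothetical and are not sk_buff or kdbus code. It does no locking and silently skips full queues, which is exactly where a real bus needs an overflow policy.

```c
#include <stdlib.h>
#include <string.h>

/* One message body, shared by every receiver it was multicast to. */
struct msg {
    int  refcount;
    char body[64];
};

/* Fixed-size per-receiver FIFO of pointers to shared messages. */
#define QLEN 16
struct rxqueue {
    struct msg *slot[QLEN];
    unsigned head, tail;          /* head: next to read, tail: next to write */
};

static struct msg *msg_new(const char *text)
{
    struct msg *m = calloc(1, sizeof *m);
    if (m)
        strncpy(m->body, text, sizeof m->body - 1);
    return m;
}

static void msg_put(struct msg *m)
{
    if (--m->refcount == 0)
        free(m);
}

/* Append m to q; the queue takes one reference. Returns 0, or -1 if full. */
static int rx_push(struct rxqueue *q, struct msg *m)
{
    if (q->tail - q->head == QLEN)
        return -1;
    q->slot[q->tail++ % QLEN] = m;
    m->refcount++;
    return 0;
}

/* Take the oldest message; the caller owns that reference. */
static struct msg *rx_pop(struct rxqueue *q)
{
    return q->head == q->tail ? NULL : q->slot[q->head++ % QLEN];
}

/* One copy of the data, one reference per receiver queue. */
static void multicast(struct rxqueue **qs, int n, struct msg *m)
{
    if (!m)
        return;
    m->refcount++;                /* hold a reference while fanning out */
    for (int i = 0; i < n; i++)
        rx_push(qs[i], m);        /* full queue: message skipped here */
    msg_put(m);                   /* drop the fan-out reference */
}
```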

If it was just "here's a general purpose multicast message service" in a
small clean chunk of code I'd be cheering it into the tree. Even if you
need complicated filter rules because we can use EBPF to allow the client
library to do really sophisticated filtering and avoid wakeups for noise.

It's the complexity, the attachment to a lot of state in kernel and the
fact it doesn't appear to solve the general purpose problems that bothers
me.

> By pushing everything through one queue, dbus is trying to reduce the
> number of codepaths in applications. Apps have a lot of new problems
> to solve if messages get their order scrambled.

And I assume any user space solution for that purpose would end up
re-ordering messages if they could get shuffled so its

> (dbus does NOT guarantee order across multiple clients, of course -
> there's no guarantee that all clients get the broadcast, before anyone
> gets the next message - each client has its own buffer on both read
> and write. The ordering is only with respect to each client's message
> stream.)
>
> Ordering is vital for tracking state, because if you're sending out
> events to describe changes in state, the order of those changes is
> important.

Most of the time IMHO you don't want to listen to changes in state, you
want to notice that the state wasn't the value it was before and adapt.

> Of course there are more complex ways to handle this over in
> distributed-systems-world.

And publish/subscribe models - which for certain uses scale better, are
easier to make reliable and avoid a lot of the mess.

Alan

2015-04-15 22:11:47

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 2:56 PM, One Thousand Gnomes
<[email protected]> wrote:
> On Wed, 15 Apr 2015 19:31:45 +0200
>> Don't make idle comments, the tty layer is far more complex and larger
>> than the kdbus code, with much nastier issues and problems. And we
>> handle that just fine :)
>
> The tty layer is the way it is because of design decisions dating back 20
> years that were (with hindsight) wrong coupled with the fact that POSIX
> took a lot of the behavioural guarantees from an armwaving claim about
> what Unix(tm) implemented without thinking about how to implement them
> (as far as I can tell - given many of the guarantees are broken in Unix!)
>
>> I'll again refer to ALSA here, no one writes a "raw" ALSA program, they
>> all use the library to interact with the kernel. Do that here, there
>> are wonderful dbus libraries out there, for all languages. Use them
>> instead.
>
> Agreed entirely - I don't disagree that we need a fast messaging layer.
> The question is what bits belong in kernel. Go wants one, JMS wants one,
> porting from stuff like QNX wants one (although they use the POSIX API
> on QNX), MPI wants one (but with some useful and subtly different
> semantics), various embedded things from tiny uKernels want one.
>
> The question is what the kernel bit should actually look like, and how
> many we need.
>
> My guess is that we actually have three of the big use cases covered
>
> - futexes and shared memory cover the tiny uKernel emulation bits (and on
> a lawnmower engine sized ARM that's probably the only way to get the
> speed approaching that of a tiny rtos)
> - posix queues cover things like QNX porting
> - publish/subscribe - via tmpfs
>
> but we don't cover
>
> - multicasting
> - some types of credential and authority passing
> - scatter/gather without excessive userspace wakes

I would really like to see a very lightweight capability-based
messaging system. By "capability-based" I don't mean Linux
capabilities. I mean that a user program could give some very
lightweight token to a peer authorizing that peer to use some service
(by reference to the same token), and the peer could pass it on to
other peers as an introduction mechanism. (Search for
"capability-based security".)

This is functionally identical to passing AF_UNIX socket fds over
SCM_RIGHTS, but I want something much lighter weight.
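For reference, this is roughly what the existing mechanism looks like: the "token" is an ordinary file descriptor, sent as SCM_RIGHTS ancillary data over an AF_UNIX socket with sendmsg(2) and received with recvmsg(2) (see unix(7) and cmsg(3)). A minimal sketch; `send_fd()` and `recv_fd()` are illustrative helper names, and error handling is reduced to returning -1.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one descriptor (the "capability") with a 1-byte payload. */
static int send_fd(int sock, int fd)
{
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                       /* correctly aligned control buffer */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof u.buf,
    };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);

    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(); returns the new fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof u.buf,
    };
    struct cmsghdr *c;
    int fd;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    c = CMSG_FIRSTHDR(&msg);
    if (!c || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}
```

A production receiver would also check `msg_flags` for MSG_CTRUNC and close any descriptors it did not expect.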

Also, getting the really high performance stuff right would be nice.
Binder has one thing going for it (IIRC -- I've talked about it to
some of the authors, but I've never so much as glanced at the code):
it has a primitive to send and wait for a reply. This reduces the
load on the scheduler.

I wish kdbus were blazingly fast, but I don't think it is :( I think
the bar should be either similar performance to (peer-to-peer) AF_UNIX
or something possibly more complex but considerably faster.

--Andy

2015-04-15 22:17:21

by Alan Cox

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 15 Apr 2015 11:28:58 -0700
Linus Torvalds <[email protected]> wrote:

> On Wed, Apr 15, 2015 at 11:18 AM, Linus Torvalds
> <[email protected]> wrote:
> >
> > I've seen this claimed, but I have never seen any actual numbers. What
> > speeds up? By how much? is it actually measurable?
>
> And just to clarify: by "what speeds up, and by how much", I do _not_
> mean "sending a dbus message speeds up by 10x and avoids context
> switches". I've seen _those_ numbers. But does it actually matter?

In the desktop case some of the desktop folks manage to get themselves to
the point they send so many messages that it does. That's rather a
reflection on people programming performance critical code armed with
tools that are too easy to use combined with the fact that messaging is
hard to understand and model latency-wise.

There is a better way to fix those.

For MPI and some of the 'we used to run on an RT nano-kernel' people then
kdbus as proposed won't help - but they do have problems where
(particularly on very slow processors) Linux is naturally enough not that
comparable with a minimally memory protecting rtos doing atomic swaps on
pointers in a shared memory.

Alan

2015-04-15 22:18:25

by Al Viro

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 03:11:17PM -0700, Andy Lutomirski wrote:

> This is functionally identical to passing AF_UNIX socket fds over
> SCM_RIGHTS, but I want something much lighter weight.

Most of the weight in SCM_RIGHTS comes from the fact that you can
pass AF_UNIX sockets over it, which requires a garbage collector.
Exclude that and suddenly it becomes very cheap...
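The reason a garbage collector is needed can be shown from userspace. The sketch below (helper names hypothetical) sends one end of an AF_UNIX socketpair through the pair into its own receive queue, then closes both descriptors. At that point the socket is kept alive only by a message nobody can ever receive, a cycle that reference counting alone cannot free; the kernel's AF_UNIX garbage collector (net/unix/garbage.c) exists to find such cycles. Nothing observable happens in userspace; the program only shows that the kernel accepts the construction.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send descriptor a over socket b. If (a, b) is a socketpair, the message
 * lands in a's own receive queue. Returns 0 on success. */
static int send_self(int a, int b)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf, .msg_controllen = sizeof u.buf };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);

    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &a, sizeof(int));
    return sendmsg(b, &msg, 0) == 1 ? 0 : -1;
}

/* After both closes, sv[0] is referenced only by the in-flight message in
 * its own queue: unreachable from any process, freed only by the GC. */
static int make_inflight_cycle(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    int rc = send_self(sv[0], sv[1]);
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```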

2015-04-15 22:22:49

by Alan Cox

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

> The reason that 'everyone who works in this area' adopted is not as much
> that the design is sound (I'm not arguing whether it is or isn't in this
> case) as it is that none of them could come up with anything better.

Actually most message passing code uses things like JMS and the various
MQ libraries. Most IoT uses things other than dbus, small deep embedded
never uses dbus.

In the desktop space dbus wins because it's very very easy to use and by
network effects. Everything else related already talks via dbus, so you
are going to have to talk dbus anyway to get anything done.

Alan

2015-04-15 22:26:46

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 11:18 AM, Linus Torvalds
<[email protected]> wrote:
> On Wed, Apr 15, 2015 at 11:11 AM, Greg Kroah-Hartman
> <[email protected]> wrote:
>> On Wed, Apr 15, 2015 at 01:33:27PM -0400, Steven Rostedt wrote:
>>>
>>> I'll argue that you can't fix the later one. One thing that I've observed over
>>> the years of having faster computers is, as soon as you make it faster, people
>>> will write slower software.
>>>
>>> Currently the issue is that we have thousands of dbus queries, you make dbus
>>> 10x faster, I guarantee that people will write software with 10 thousand dbus
>>> queries and we are no better off than we are today.
>>
>> Then they get to buy a faster machine :)
>
> Is there actually a performance issue?
>
> I've seen this claimed, but I have never seen any actual numbers. What
> speeds up? By how much? is it actually measurable?
>
> Maybe they've marched past me in this thread-from-hell. But I can't
> recall having seen any (not now, not before).
>
> That said, I think the more serious issue is that if Luto complains
> about the capability-capturing code being completely broken, then
> people need to take that *seriously*.

To be fair: the userspace version in systemd is completely broken, and
v1 of kdbus's was completely broken. v2's is, as far as I know, just
conceptually wrong and highly unlikely to be useful in any legitimate
fashion, but it's no longer obvious to me that it's exploitable.

(That being said, Eric doesn't like it, and I haven't re-read it
recently. So it could still be completely broken.)

--Andy

2015-04-15 22:29:26

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 3:18 PM, Al Viro <[email protected]> wrote:
> On Wed, Apr 15, 2015 at 03:11:17PM -0700, Andy Lutomirski wrote:
>
>> This is functionally identical to passing AF_UNIX socket fds over
>> SCM_RIGHTS, but I want something much lighter weight.
>
> Most of the weight in SCM_RIGHTS comes from the fact that you can
> pass AF_UNIX sockets over it, which requires a garbage collector.
> Exclude that and suddenly it becomes very cheap...

I should have been more specific. I don't mean the performance of
SCM_RIGHTS itself; I mean the memory overhead of keeping tons of fds
around, each with their socket data structures and buffers.

I think that dbus could be quite efficiently implemented with a
userspace daemon that just introduces peers to each other, but the fd
explosion could be rather bad for some use cases.

I'll be the first to admit that I don't have a clean API in mind.
There was a lightweight fd proposal way back when, but it never went
anywhere, and it might not be suitable anyway.

--Andy

2015-04-15 22:49:18

by Al Viro

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 03:28:58PM -0700, Andy Lutomirski wrote:
> On Wed, Apr 15, 2015 at 3:18 PM, Al Viro <[email protected]> wrote:
> > On Wed, Apr 15, 2015 at 03:11:17PM -0700, Andy Lutomirski wrote:
> >
> >> This is functionally identical to passing AF_UNIX socket fds over
> >> SCM_RIGHTS, but I want something much lighter weight.
> >
> > Most of the weight in SCM_RIGHTS comes from the fact that you can
> > pass AF_UNIX sockets over it, which requires a garbage collector.
> > Exclude that and suddenly it becomes very cheap...
>
> I should have been more specific. I don't mean the performance of
> SCM_RIGHTS itself; I mean the memory overhead of keeping tons of fds
> around, each with their socket data structures and buffers.
>
> I think that dbus could be quite efficiently implemented with a
> userspace daemon that just introduces peers to each other, but the fd
> explosion could be rather bad for some use cases.
>
> I'll be the first to admit that I don't have a clean API in mind.
> There was a lightweight fd proposal way back when, but it never went
> anywhere, and it might not be suitable anyway.

Wait, are you talking about the overhead of descriptors used for capability
tokens (essentially zero - one system-wide struct file per capability +
one pointer in descriptor table of anyone who holds it + two bits in
bitmaps in the same descriptor tables) or about the overhead of descriptors
used to send/receive those over? The latter don't have to be sockets
at all - they could bloody well be files on some ipcfs, or character device,
or FIFOs, etc.

2015-04-15 22:54:49

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 3:48 PM, Al Viro <[email protected]> wrote:
> On Wed, Apr 15, 2015 at 03:28:58PM -0700, Andy Lutomirski wrote:
>> On Wed, Apr 15, 2015 at 3:18 PM, Al Viro <[email protected]> wrote:
>> > On Wed, Apr 15, 2015 at 03:11:17PM -0700, Andy Lutomirski wrote:
>> >
>> >> This is functionally identical to passing AF_UNIX socket fds over
>> >> SCM_RIGHTS, but I want something much lighter weight.
>> >
>> > Most of the weight in SCM_RIGHTS comes from the fact that you can
>> > pass AF_UNIX sockets over it, which requires a garbage collector.
>> > Exclude that and suddenly it becomes very cheap...
>>
>> I should have been more specific. I don't mean the performance of
>> SCM_RIGHTS itself; I mean the memory overhead of keeping tons of fds
>> around, each with their socket data structures and buffers.
>>
>> I think that dbus could be quite efficiently implemented with a
>> userspace daemon that just introduces peers to each other, but the fd
>> explosion could be rather bad for some use cases.
>>
>> I'll be the first to admit that I don't have a clean API in mind.
>> There was a lightweight fd proposal way back when, but it never went
>> anywhere, and it might not be suitable anyway.
>
> Wait, are you talking about the overhead of descriptors used for capability
> tokens (essentially zero - one system-wide struct file per capability +
> one pointer in descriptor table of anyone who holds it + two bits in
> bitmaps in the same descriptor tables) or about the overhead of descriptors
> used to send/receive those over? The latter don't have to be sockets
> at all - they could bloody well be files on some ipcfs, or character device,
> or FIFOs, etc.

Huh, interesting.

I was imagining that each of a server's peers (capability holders)
would have a fresh struct file, but maybe this wouldn't be needed at
all. You'd still need a way to get replies to your request, but the
API could just as easily be:

int send_to_capability(int dest, int source, const void *data, size_t len, ...);

where dest would be the destination's fd and source would be whatever
receive queue I expect the response on.

So maybe this is feasible. It doesn't solve broadcasts, but dbus
unicast could easily layer over a facility like this and the context
switch problem would go away for unicast.

Heck, I'd use it for my own proprietary stuff, too. It would be way
easier than the absurd tangle of socketpairs I currently use.

--Andy

2015-04-15 22:56:10

by Eric Dumazet

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 2015-04-15 at 23:48 +0100, Al Viro wrote:
> On Wed, Apr 15, 2015 at 03:28:58PM -0700, Andy Lutomirski wrote:
> > On Wed, Apr 15, 2015 at 3:18 PM, Al Viro <[email protected]> wrote:
> > > On Wed, Apr 15, 2015 at 03:11:17PM -0700, Andy Lutomirski wrote:
> > >
> > >> This is functionally identical to passing AF_UNIX socket fds over
> > >> SCM_RIGHTS, but I want something much lighter weight.
> > >
> > > Most of the weight in SCM_RIGHTS comes from the fact that you can
> > > pass AF_UNIX sockets over it, which requires a garbage collector.
> > > Exclude that and suddenly it becomes very cheap...
> >
> > I should have been more specific. I don't mean the performance of
> > SCM_RIGHTS itself; I mean the memory overhead of keeping tons of fds
> > around, each with their socket data structures and buffers.
> >
> > I think that dbus could be quite efficiently implemented with a
> > userspace daemon that just introduces peers to each other, but the fd
> > explosion could be rather bad for some use cases.
> >
> > I'll be the first to admit that I don't have a clean API in mind.
> > There was a lightweight fd proposal way back when, but it never went
> > anywhere, and it might not be suitable anyway.
>
> Wait, are you talking about the overhead of descriptors used for capability
> tokens (essentially zero - one system-wide struct file per capability +
> one pointer in descriptor table of anyone who holds it + two bits in
> bitmaps in the same descriptor tables) or about the overhead of descriptors
> used to send/receive those over? The latter don't have to be sockets
> at all - they could bloody well be files on some ipcfs, or character device,
> or FIFOs, etc.

This kind of reminds me of futex: from an apparently simple idea we got to
the point of having more than 3000 lines of code in kernel/futex.c

It is sad that af_unix was chosen to support fd passing in the first
place. This is a serious DoS vector.

2015-04-15 23:27:42

by Al Viro

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 03:54:10PM -0700, Andy Lutomirski wrote:
> Huh, interesting.
>
> I was imagining that each of a server's peers (capability holders)
> would have a fresh struct file, but maybe this wouldn't be needed at
> all. You'd still need a way to get replies to your request, but the
> API could just as easily be:
>
> int send_to_capability(int dest, int source, const void *data, size_t len, ...);
>
> where dest would be the destination's fd and source would be whatever
> receive queue I expect the response on.
>
> So maybe this is feasible. It doesn't solve broadcasts, but dbus
> unicast could easily layer over a facility like this and the context
> switch problem would go away for unicast.
>
> Heck, I'd use it for my own proprietary stuff, too. It would be way
> easier than the absurd tangle of socketpairs I currently use.

BTW, the main issue with AF_UNIX passing is that the recipient isn't asleep
waiting for descriptors - they are thrown by the sender at whoever's receiving
and sit there until somebody gets around to picking them up.

_IF_ we had
client: I want a descriptor <goes to sleep, interruptibly>
kernel: assign it a sequence number
server: sees request (including sequence number)
server: give this fd to originator of request #N
kernel: check if originator is still there, insert the damn thing into their
descriptor table if they still are and return the obtained number
or
server: tell the originator of request #N to fuck off
kernel: check if originator is still there and gleefully pass the "fuck off" if
they still are

we wouldn't have the in-flight state at all, and there goes the garbage
collection shite. With some elaboration, it could even carry the
authentication traffic - "fuck off" might be "answer this challenge", with
the next "I want a descriptor" carrying reply...
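Al's handshake can be mocked up in userspace to make the "no in-flight state" point concrete. The sketch below is a hypothetical single-threaded simulation, not a kernel interface: `struct fd_request`, `request_fd()`, `server_reply()` and `client_gone()` are invented names, a fixed array stands in for the kernel's table of pending requests, and an "fd" is just an integer. A descriptor is only handed over while its requester is still waiting; if the requester has gone away, the reply is dropped on the spot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of "client sleeps until the server answers request #N".
 * None of these names are real kernel interfaces. */

enum reply_kind { REPLY_PENDING, REPLY_FD, REPLY_REFUSED };

struct fd_request {
	unsigned long seq;      /* sequence number assigned by the "kernel" */
	bool waiting;           /* originator is still asleep on this request */
	enum reply_kind result;
	int fd;                 /* meaningful only when result == REPLY_FD */
};

#define MAX_REQUESTS 16

static struct fd_request requests[MAX_REQUESTS];
static unsigned long next_seq = 1;

/* Client: "I want a descriptor". Returns the assigned sequence number,
 * or 0 if the fixed-size request table is full. */
static unsigned long request_fd(void)
{
	for (size_t i = 0; i < MAX_REQUESTS; i++) {
		if (!requests[i].waiting) {
			requests[i] = (struct fd_request){
				.seq = next_seq++, .waiting = true,
				.result = REPLY_PENDING, .fd = -1,
			};
			return requests[i].seq;
		}
	}
	return 0;
}

static struct fd_request *find_waiting(unsigned long seq)
{
	for (size_t i = 0; i < MAX_REQUESTS; i++)
		if (requests[i].waiting && requests[i].seq == seq)
			return &requests[i];
	return NULL;
}

/* Client gave up (signal, exit): its pending request just disappears. */
static void client_gone(unsigned long seq)
{
	struct fd_request *r = find_waiting(seq);

	if (r)
		r->waiting = false;
}

/* Server: answer request #seq with a descriptor (fd >= 0) or a refusal
 * (fd < 0). On success *woken receives what the woken client sees.
 * Returns false if the originator is gone: nothing is delivered and no
 * in-flight state is left behind. */
static bool server_reply(unsigned long seq, int fd, struct fd_request *woken)
{
	struct fd_request *r = find_waiting(seq);

	if (!r)
		return false;
	r->result = fd >= 0 ? REPLY_FD : REPLY_REFUSED;
	r->fd = fd >= 0 ? fd : -1;
	r->waiting = false;     /* slot is free again; the client wakes up */
	*woken = *r;
	return true;
}
```

Because a reply for a vanished requester fails immediately in `server_reply()`, the server can simply close its own copy of the descriptor; there is no queued reference that a garbage collector would later have to find.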

2015-04-16 00:47:50

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 4:27 PM, Al Viro <[email protected]> wrote:
> On Wed, Apr 15, 2015 at 03:54:10PM -0700, Andy Lutomirski wrote:
>> Huh, interesting.
>>
>> I was imagining that each of a server's peers (capability holders)
>> would have a fresh struct file, but maybe this wouldn't be needed at
>> all. You'd still need a way to get replies to your request, but the
>> API could just as easily be:
>>
>> int send_to_capability(int dest, int source, const void *data, size_t len, ...);
>>
>> where dest would be the destination's fd and source would be whatever
>> receive queue I expect the response on.
>>
>> So maybe this is feasible. It doesn't solve broadcasts, but dbus
>> unicast could easily layer over a facility like this and the context
>> switch problem would go away for unicast.
>>
>> Heck, I'd use it for my own proprietary stuff, too. It would be way
>> easier than the absurd tangle of socketpairs I currently use.
>
> BTW, the main issue with AF_UNIX passing is that the recipient isn't asleep
> waiting for descriptors - they are thrown by the sender at whoever's receiving
> and sit there until somebody gets around to picking them up.
>
> _IF_ we had
> client: I want a descriptor <goes to sleep, interruptibly>
> kernel: assign it a sequence number
> server: sees request (including sequence number)
> server: give this fd to originator of request #N
> kernel: check if originator is still there, insert the damn thing into their
> descriptor table if they still are and return the obtained number
> or
> server: tell the originator of request #N to fuck off
> kernel: check if originator is still there and gleefully pass the "fuck off" if
> they still are
>
> we wouldn't have the in-flight state at all, and there goes the garbage
> collection shite. With some elaboration, it could even carry the
> authentication traffic - "fuck off" might be "answer this challenge", with
> the next "I want a descriptor" carrying reply...

I wonder if we could get away with having the receiver pre-allocate
some placeholder fds and then have the kernel replace a placeholder
with a passed fd immediately when the fd is sent and enqueue *that* in
the cmsg data. If you send an fd to someone who hasn't assigned any
placeholders to the receiving socket, then you get an error.

To keep the accounting sane, a placeholder would be a bona fide fd,
presumably a reference to a global placeholder anon_inode.

--Andy

2015-04-16 01:04:27

by Al Viro

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 05:47:18PM -0700, Andy Lutomirski wrote:

> I wonder if we could get away with having the receiver pre-allocate
> some placeholder fds and then have the kernel replace a placeholder
> with a passed fd immediately when the fd is sent and enqueue *that* in
> the cmsg data. If you send an fd to someone who hasn't assigned any
> placeholders to the receiving socket, then you get an error.

*UGH*

It's a really bad idea. The thing is, a descriptor table that isn't shared
is assumed to be modified only by its owner. So when fdget() looks a file up, it doesn't
have to bump its refcount - the reference in descriptor table itself will
stay. Conversely, fdput() doesn't have to drop it in such case (we encode
whether we need to drop into struct fd returned by fdget() and passed to
fdput()).

That relies on no third-party modifications of the descriptor table and yes,
the effect _is_ noticeable - playing with struct file refcounts does result
in considerable overhead.

If the recipient sits in "gimme a descriptor", we are fine - if the descriptor table
was shared, the other users would be doing the full refcount song and dance and
if it wasn't, the recipient is the sole user _and_ it isn't between fdget() and
fdput() at the moment. With your "replace the dummies when sending" trick
we break all of that - we don't know what the recipient is doing at the moment
and for all we know they might be in the middle of something like e.g.
fstat() on your placeholder. With rather unpleasant effects...
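A userspace model of the fast path Al describes may help. `struct fd` and `FDPUT_FPUT` mirror names used by the kernel's fdget()/fdput() code, but `struct file`, `struct files`, `model_fdget()` and `model_fdput()` are simplified stand-ins invented for this sketch (no locking, no RCU). The important branch is `users == 1`: it returns a borrowed pointer with no reference taken, which is only safe if nobody else can replace that slot before `model_fdput()` runs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace model of the fdget()/fdput() fast path.
 * struct fd and FDPUT_FPUT follow kernel naming; the rest is invented. */

struct file { int refcount; };

#define FDTABLE_SLOTS 8

struct files {
	int users;                          /* tasks sharing this table */
	struct file *table[FDTABLE_SLOTS];
};

#define FDPUT_FPUT 1

struct fd {
	struct file *file;
	unsigned int flags;                 /* FDPUT_FPUT: fdput() drops a ref */
};

static struct fd model_fdget(struct files *files, int n)
{
	struct file *f;

	if (n < 0 || n >= FDTABLE_SLOTS || !(f = files->table[n]))
		return (struct fd){ NULL, 0 };
	if (files->users == 1)
		/* Unshared table: the table's own reference keeps f alive,
		 * assuming no third party can touch slot n. No refcount traffic. */
		return (struct fd){ f, 0 };
	/* Shared table: another thread may close(n), so pin the file. */
	f->refcount++;
	return (struct fd){ f, FDPUT_FPUT };
}

static void model_fdput(struct fd fd)
{
	if (fd.flags & FDPUT_FPUT)
		fd.file->refcount--;
}
```

Swapping a placeholder out of a slot from the sender's side is exactly such a third-party modification: the recipient could be holding the borrowed pointer from the `users == 1` branch at that moment.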

2015-04-16 05:54:20

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Apr 15, 2015 6:04 PM, "Al Viro" <[email protected]> wrote:
>
> On Wed, Apr 15, 2015 at 05:47:18PM -0700, Andy Lutomirski wrote:
>
> > I wonder if we could get away with having the receiver pre-allocate
> > some placeholder fds and then have the kernel replace a placeholder
> > with a passed fd immediately when the fd is sent and enqueue *that* in
> > the cmsg data. If you send an fd to someone who hasn't assigned any
> > placeholders to the receiving socket, then you get an error.
>
> *UGH*
>
> It's a really bad idea. The thing is, a descriptor table that isn't shared
> is assumed to be modified only by its owner. So when fdget() looks a file up, it doesn't
> have to bump its refcount - the reference in descriptor table itself will
> stay. Conversely, fdput() doesn't have to drop it in such case (we encode
> whether we need to drop into struct fd returned by fdget() and passed to
> fdput()).
>
> That relies on no third-party modifications of the descriptor table and yes,
> the effect _is_ noticeable - playing with struct file refcounts does result
> in considerable overhead.
>
> If the recipient sits in "gimme a descriptor", we are fine - if the descriptor table
> was shared, the other users would be doing the full refcount song and dance and
> if it wasn't, the recipient is the sole user _and_ it isn't between fdget() and
> fdput() at the moment. With your "replace the dummies when sending" trick
> we break all of that - we don't know what the recipient is doing at the moment
> and for all we know they might be in the middle of something like e.g.
> fstat() on your placeholder. With rather unpleasant effects...

Hmm.

I don't love the special blocking call either -- it breaks polling loops.

We could have the existence of a placeholderfd count as an extra
reference to the descriptor table, with the associated performance
hit. Or we could allow each placeholderfd to collect one received fd
but not actually switch over. The latter is ugly and still has minor
DoS issues -- we'd have to prevent placeholderfds from being passed
through this mechanism or SCM_RIGHTS.

But wait... what about an evil trick? What if all placeholderfds are
the *same* struct file and that struct file is never deleted? Then
fdget on a placeholderfd is safe, since it's implicitly pinned.
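Andy's combined idea (receiver-reserved placeholders, all pointing at one shared, never-freed file) can be sketched as follows. This is a hypothetical single-threaded model: `struct fdtable`, `install_placeholder()`, `deliver_into_placeholder()` and the choice of `-ENOSPC` as the "no placeholder" error are assumptions for illustration. Being single-threaded, it cannot exhibit the race Al objects to; it only shows the bookkeeping, and that the shared placeholder object is never freed, so a borrowed pointer to it stays valid.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the placeholder-fd proposal; not a kernel API. */
struct file { const char *name; };

/* One shared placeholder "file" with static lifetime: never freed. */
static struct file placeholder_file = { "placeholder" };

#define TABLE_SIZE 8

struct fdtable { struct file *slot[TABLE_SIZE]; };

/* Receiver: reserve a slot holding the placeholder; returns the fd number,
 * or -EMFILE if the table is full. */
static int install_placeholder(struct fdtable *t)
{
	for (int i = 0; i < TABLE_SIZE; i++) {
		if (!t->slot[i]) {
			t->slot[i] = &placeholder_file;
			return i;
		}
	}
	return -EMFILE;
}

/* Sender: replace the first placeholder with the passed file. The returned
 * fd number is what would be enqueued in the cmsg data. Fails if the
 * receiver has not reserved any placeholder (errno value assumed). */
static int deliver_into_placeholder(struct fdtable *t, struct file *f)
{
	for (int i = 0; i < TABLE_SIZE; i++) {
		if (t->slot[i] == &placeholder_file) {
			t->slot[i] = f;
			return i;
		}
	}
	return -ENOSPC;
}
```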

--Andy

2015-04-16 08:44:01

by Jiri Kosina

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:

> > I originally didn't want to comment on this, but now that you are
> > making this argument for 3rd or 4th time, I can't really resist. What
> > exactly are you trying to "prove" by the 13k-lines argument?
> >
> > mm/vmscan.c is less than 4k lines. Does that sole fact mean that the whole
> > memory reclaim is trivial to review?
>
> I'm trying to say that it's not a ton of code. Lines of code are of
> course not a valid way to judge complexity, and I'm not trying to say
> that. I am trying to point out that it isn't "huge" by comparing it to
> other chunks of code that we all know and love.
>
> We merge subsystems with new userspace APIs that are larger than this all
> the time. I'm trying to say this isn't something "unusual" at all.

I agree with you on that point. Merging 13k lines isn't a big deal, we do
that all the time.

But I don't think anyone in this (or the previous) thread brought up the
number of lines of kdbus as the ultimate argument for questioning or even
NACKing it.

So I completely fail to see why this is so relevant that you keep
repeating it.

Thanks,

--
Jiri Kosina
SUSE Labs

2015-04-16 10:32:03

by Daniel Mack

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 04/16/2015 12:11 AM, Andy Lutomirski wrote:
> Also, getting the really high performance stuff right would be nice.
> Binder has one thing going for it (IIRC -- I've talked about it to
> some of the authors, but I've never so much as glanced at the code):
> it has a primitive to send and wait for a reply. This reduces the
> load on the scheduler.

kdbus has the same thing; we call it a synchronous reply. The concept
is explained comprehensively in kdbus.message(7):

By default, all calls to kdbus are considered asynchronous,
non-blocking. However, as there are many use cases that need
to wait for a remote peer to answer a method call, there's a
way to send a message and wait for a reply in a synchronous
fashion. This is what the KDBUS_SEND_SYNC_REPLY flag controls. The
KDBUS_CMD_SEND ioctl will block until the reply has arrived,
the timeout limit is reached, the remote connection is shut
down, or a signal interrupts it before any reply arrives;
see signal(7). When the ioctl returns without error, the offset
of the reply message in the sender's pool is stored in
offset_reply. Hence, there is no need for another KDBUS_CMD_RECV
ioctl or anything else to receive the reply.
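The blocking behaviour described in the man page can be approximated with a mutex and a condition variable. This is a hypothetical userspace model, not the kdbus ioctl: `struct sync_call`, `wait_sync_reply()`, `deliver_reply()`, `peer_shutdown()` and the use of `-ECONNRESET` for a vanished peer are invented for the sketch, and signal interruption is not modelled. The waiter returns when a reply is delivered (reporting its pool offset), when the peer shuts down, or when the deadline passes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of "send and block until the reply arrives". */
struct sync_call {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool replied;
	bool peer_dead;
	long reply_offset;   /* stands in for offset_reply in the sender's pool */
};

#define SYNC_CALL_INIT \
	{ .lock = PTHREAD_MUTEX_INITIALIZER, .cond = PTHREAD_COND_INITIALIZER }

/* Returns 0 and sets *offset_reply on a reply, -ECONNRESET if the peer
 * went away, -ETIMEDOUT once timeout_ms has elapsed. */
static int wait_sync_reply(struct sync_call *c, long timeout_ms, long *offset_reply)
{
	struct timespec deadline;
	int ret = 0, result;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	while (!c->replied && !c->peer_dead && ret == 0)
		ret = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
	if (c->replied) {
		*offset_reply = c->reply_offset;
		result = 0;
	} else if (c->peer_dead) {
		result = -ECONNRESET;
	} else {
		result = -ETIMEDOUT;
	}
	pthread_mutex_unlock(&c->lock);
	return result;
}

/* Replier side: store the reply and wake the blocked sender. */
static void deliver_reply(struct sync_call *c, long offset)
{
	pthread_mutex_lock(&c->lock);
	c->replied = true;
	c->reply_offset = offset;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* The remote connection was shut down before replying. */
static void peer_shutdown(struct sync_call *c)
{
	pthread_mutex_lock(&c->lock);
	c->peer_dead = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}
```

As in the man page text, a successful wait already carries the reply's location, so no separate receive step is needed afterwards.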



Thanks,
Daniel

2015-04-16 12:02:49

by Tom Gundersen

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 2:09 PM, Jiri Kosina <[email protected]> wrote:
> On Wed, 15 Apr 2015, Greg Kroah-Hartman wrote:
>
>> 'systemctl reboot' calls a bunch of other things to determine if you
>> have local access to the machine, or permissions to reboot the machine
>> (i.e. CAP_SYS_BOOT), and other things that polkit might allow you to do,
>> and then, it decides to reboot or not. That happens today, right? I
>> don't understand the argument here.
>
> And what exactly is the argument that this is the way it should be
> implemented?
>
> Why can't it just rely on the kernel to provide final answer to "to reboot
> or not to reboot, that is the question"?
>
> At the end of the day, it's the kernel that decides whether it will really
> ultimately ask the platform to reboot.
>
> If, for whatever reason (which might be completely invisible to userspace)
> kernel decides not to do so, userspace has to be able to recover from such
> failure in any case.

This is not how shutting down a general purpose operating system
works. If a system is shut down, all user sessions are terminated, all
services are stopped in the right order, all remaining processes
killed, all file systems are unmounted, all storage devices
disassembled, and so on. All this is implemented entirely in userspace
and involves a number of complex transitions from the normal init
system, to a shutdown PID 1 process and finally a transition back to
the initial ramdisk so that we can unmount the root file system even.
After all that is done, in the right order, following dependencies,
while enforcing timeouts, the very last step is actually the
reboot() system call that then brings the kernel to a halt, and
possibly turns off power.

Thus I don't see how your suggestion can be applied in any way to how
system shutdown works: the shutdown procedure includes these
non-trivial preparation steps described above, and it is essential
that this preparation is not begun unless the client requesting it
actually has sufficient rights to do so. Or to put this another way:
if the system went all the way down, so that everything is killed,
unmounted, disassembled, to the point even that we transitioned away
from the root file system, then the reboot() system call is really
just the tiniest bit of it. And you should not be able to get there if
you originally didn't even possess the capability to execute that last
step...

Moreover, the daemon performing the shutdown tasks is necessarily
always privileged enough to do so, so calling into the kernel and seeing
what happens is completely the wrong thing to do (it would simply
succeed). What matters is whether the client calling the daemon is
sufficiently privileged. If the client has the capabilities necessary
to call the reboot syscall directly, it makes no sense to prevent
them from doing a clean reboot. It would be like giving someone access
to pull the power plug, but not allowing them to shut down the machine
cleanly.

To conclude, the kernel makes the decision for allowing reboot() to
succeed based on CAP_SYS_BOOT, so when we decide whether or not to
perform the preparation steps, we really must also use CAP_SYS_BOOT.
If we are more restrictive, it does not gain us anything as people
with CAP_SYS_BOOT can just circumvent our logic and "pull the plug" by
calling reboot() directly. If we are less restrictive and for instance
check for uid==0 it would essentially mean that we have added a way to
circumvent the dropping of CAP_SYS_BOOT.
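Tom's rule reduces to "authorise the preparation steps with the same capability the final syscall needs". A minimal sketch, assuming the daemon already has trustworthy credentials for the caller (how it obtains them, for example from bus metadata, is outside the sketch). `struct client_creds` and the helper names are hypothetical, and CAP_SYS_BOOT is defined locally with its value from <linux/capability.h>. The second function is the uid==0 variant Tom warns against.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

#ifndef CAP_SYS_BOOT
#define CAP_SYS_BOOT 22   /* same value as in <linux/capability.h> */
#endif

/* Hypothetical view of the caller's credentials as seen by the daemon. */
struct client_creds {
	uid_t uid;
	uint64_t cap_effective;   /* bit n set => capability n is held */
};

static bool has_cap(const struct client_creds *c, int cap)
{
	return (c->cap_effective & (UINT64_C(1) << cap)) != 0;
}

/* Mirror what reboot() itself checks before starting the clean shutdown. */
static bool may_start_shutdown(const struct client_creds *c)
{
	return has_cap(c, CAP_SYS_BOOT);
}

/* Too permissive: lets uid 0 through even after it dropped CAP_SYS_BOOT. */
static bool uid_zero_check(const struct client_creds *c)
{
	return c->uid == 0;
}
```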

Cheers,

Tom

2015-04-16 12:13:52

by David Herrmann

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi

On Wed, Apr 15, 2015 at 8:12 PM, James Bottomley
<[email protected]> wrote:
> For me the biggest issue is the container problem: it's really hard to
> containerise kdbus because of the stateful nature of the protocol and
> the fact that it has a well known system bus. Separation into domains
> works for OS containers, but application containers need more fluidity.
> It's not unlike the same problem on windows: Windows application
> containers are very difficult to do because the global registry means
> that OLE handlers all have to run inside your container as well
> (effectively making it an OS container). I'm sure, since we already
> have a lot of containers people going to plumbers, that we can get them
> to turn up for the discussion.

kdbus actually works very well in OS containers that mount a new
kdbusfs inside the container. This new instance of kdbus will be
entirely separated from any other on the system. We've designed it
that way especially with OS containers in mind. This is explained in
kdbus.fs(7). It's very similar to devpts' container support, where you
mount a new instance of devpts into each container instance you run.

For Docker-style (i.e. app-focused) containers, it's a more complex
story. kdbus will not solve this for you, but at least one thing
deserves mention: for this kind of sandboxing kdbus certainly
makes things *easier*, compared to dbus1. Why? Because the kernel
gains a notion of individual messages and method call transactions,
something that is completely unavailable if you stick to dbus1 where
all the kernel sees is a raw stream of AF_UNIX/SOCK_STREAM bytes. In
fact, kdbus as it is right now even contains minimal but explicit
support for sandboxing, by allowing creation of multiple bus endpoints
to the same bus that carry additional, more restrictive policy.

Thanks
David

2015-04-16 12:15:36

by Olaf Hering

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 16, Tom Gundersen wrote:

> to a shutdown PID 1 process and finally a transition back to
> the initial ramdisk so that we can unmount the root file system even.

Is that wishful thinking or actually implemented somewhere?

Olaf

2015-04-16 12:43:20

by Harald Hoyer

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Am 16.04.2015 um 14:15 schrieb Olaf Hering:
> On Thu, Apr 16, Tom Gundersen wrote:
>
>> to a shutdown PID 1 process and finally a transition back to
>> the initial ramdisk so that we can unmount the root file system even.
>
> Is that wishful thinking or actually implemented somewhere?

This has been done for a long time now on any system that uses dracut and systemd.

Since SUSE recently switched to dracut, it should now behave the same as
RHEL-7/Fedora.

If /run/initramfs/shutdown exists and is executable,
/usr/lib/systemd/systemd-shutdown switches root to /run/initramfs/ and executes
shutdown.

The shutdown script unmounts the old real root (after unmounting
/oldroot/{proc,sys,run,dev}); then, if the old real root lived on an
assembled device such as mdraid, the device is disassembled and the script
waits for the device to be clean.

See
http://git.kernel.org/cgit/boot/dracut/dracut.git/tree/modules.d/99shutdown/shutdown.sh
and for example for mdraid:
http://git.kernel.org/cgit/boot/dracut/dracut.git/tree/modules.d/90mdraid/md-shutdown.sh

This solved quite a lot of problems for unsynced raids.

2015-04-16 13:13:44

by Tom Gundersen

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 04/15/2015 10:22 PM, Andy Lutomirski wrote:
> On Wed, Apr 15, 2015 at 9:44 AM, Havoc Pennington <[email protected]> wrote:
>> That is, with dbus if I send a broadcast message, then send a unicast
>> request to another client, then drop the connection causing the bus to
>> broadcast that I've dropped; then the other client will see those
>> things in that order - the broadcast, then the request, and then that
>> I've dropped the connection.
>
> This leads me to a potentially interesting question: where's the
> buffering? If there's a bus with lots of untrusted clients and one of
> them broadcasts data faster than all receivers can process it, where
> does it go?

The concepts implemented in kdbus are actually quite different from dbus1:

Every connection to the bus has a memory pool assigned to store
incoming messages and variably sized runtime data returned by kdbus.
The pool memory is swappable, backed by a shmem file which is
associated with the bus connection.

Also, broadcasts are opt-in, so you only receive them if you
subscribed for the specific signal. It is either sent by another
userspace task, or by the kernel itself for things like name owner
changes. In order to receive those, a connection must install a match.
By default, no-one will receive any broadcasts.

All types of messages (unicast and broadcast) are directly stored into
a pool slice of the receiving connection, and this slice is not reused
by the kernel until userspace is finished with it and frees it. Hence,
a client which doesn't process its incoming messages will, at some
point, run out of pool space. If that happens for unicast messages,
the sender will get an EXFULL error. If it happens for a multicast
message, all we can do is drop the message, and tell the receiver how
many messages have been lost when it issues KDBUS_CMD_RECV the next
time. There's more on that in kdbus.message(7).
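The unicast/broadcast asymmetry can be shown with a toy pool. This is a hypothetical model, not kdbus code: the pool is a single byte counter rather than a set of slices, and `struct pool` and the `pool_*()` helpers are invented names. A unicast that does not fit fails back to the sender with -EXFULL; a broadcast that does not fit is dropped and counted, and the count is handed to the receiver on its next receive.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EXFULL
#define EXFULL 54   /* Linux value, in case the libc does not expose it */
#endif

/* Hypothetical receiver-side pool: bytes in use tracked as one counter. */
struct pool {
	size_t size;
	size_t used;
	unsigned int dropped;   /* broadcasts lost since the last receive */
};

/* Unicast: the sender learns immediately that the receiver is full. */
static int pool_unicast(struct pool *p, size_t len)
{
	if (len > p->size - p->used)
		return -EXFULL;
	p->used += len;
	return 0;
}

/* Broadcast: there is no single sender to fail, so drop and count. */
static void pool_broadcast(struct pool *p, size_t len)
{
	if (len > p->size - p->used)
		p->dropped++;
	else
		p->used += len;
}

/* Receiver is finished with a slice and frees it. */
static void pool_free(struct pool *p, size_t len)
{
	assert(len <= p->used);
	p->used -= len;
}

/* On the next receive, report and reset the number of lost broadcasts. */
static unsigned int pool_take_dropped(struct pool *p)
{
	unsigned int n = p->dropped;

	p->dropped = 0;
	return n;
}
```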

Also note that there is quota logic in kdbus which protects against
a single connection conducting a DoS against another one. Together
with the policy code, this logic prevents one peer from flooding the
pool of another peer. Communication with a 3rd party is not affected
by this, due to the fair allocation scheme of the pool logic.
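One way a fairness rule can stop a single sender from exhausting a receiver's pool is sketched below. The rule itself is an assumption chosen for illustration, not the formula kdbus uses: after an allocation, a sender may hold at most half of the space available to it (free space plus what it already holds). `struct quota_pool` and `quota_alloc()` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EXFULL
#define EXFULL 54   /* Linux value, in case the libc does not expose it */
#endif

#define MAX_SENDERS 4

struct quota_pool {
	size_t size;
	size_t used;
	size_t per_sender[MAX_SENDERS];   /* bytes currently charged to each sender */
};

/* Hypothetical fairness rule: after allocating, a sender may hold at most
 * half of (free space + what it already holds). */
static int quota_alloc(struct quota_pool *p, unsigned int sender, size_t len)
{
	size_t free_space, own;

	assert(sender < MAX_SENDERS);
	free_space = p->size - p->used;
	own = p->per_sender[sender];
	if (len > free_space || own + len > (free_space + own) / 2)
		return -EXFULL;
	p->used += len;
	p->per_sender[sender] += len;
	return 0;
}
```

With a 100-byte pool, a first sender can take 40 bytes but is then refused another 20, while a second sender can still place 30 bytes.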

All this is explained in detail in kdbus.pool(7), but please let us
know if anything there is unclear.

> At least with a userspace solution, it's clear what the OOM killer
> should kill when this happens. Unless it's PID 1. Sigh.

No, if the buffering was done in the sender, the OOM killer would
catch the sending peer, which is of course the wrong thing to do,
because one connection could blow up a task simply by not responding
to the messages it sends. This is the reason why the pool concept was
a design principle in kdbus from the very beginning.

Cheers,

Tom

2015-04-16 13:14:40

by Daniel Mack

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 04/16/2015 12:08 AM, One Thousand Gnomes wrote:
>> When trying to split apart problems, for dbus it's important to keep
>> ordering guarantees.
>
> Yes I assumed that - minus disconnection/reconnect and running out of
> queue space. Some users also want priority queueing (with or without the
> guarantee for the same priority). Many of the other systems that can use
> a fast multicast messaging system have priority queues - which is one
> reason the existing POSIX messaging has priority.

And so does kdbus. By default, strict ordering is enforced when messages
are received, but optionally, that action may be constrained to messages
of a minimal priority. This allows for use cases where timing critical
data is interleaved with control data on the same connection. That's
described in kdbus.message(7), and is also covered by test cases.
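A toy queue can illustrate "strict ordering, optionally restricted to messages of a minimum priority". The sketch assumes lower numbers mean more urgent and that, with the filter on, the oldest qualifying message is returned; both choices, and all names, are assumptions rather than the kdbus ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_CAP 16

/* Lower priority value = more urgent (an assumption for this sketch). */
struct msg { int priority; int id; };

struct msg_queue {
	struct msg items[QUEUE_CAP];   /* kept in arrival order */
	size_t len;
};

static bool enqueue(struct msg_queue *q, struct msg m)
{
	if (q->len == QUEUE_CAP)
		return false;
	q->items[q->len++] = m;
	return true;
}

/* Dequeue the oldest message. If use_priority is set, skip messages that
 * are less urgent than max_priority, preserving order among the rest. */
static bool dequeue(struct msg_queue *q, bool use_priority, int max_priority,
		    struct msg *out)
{
	for (size_t i = 0; i < q->len; i++) {
		if (use_priority && q->items[i].priority > max_priority)
			continue;
		*out = q->items[i];
		memmove(&q->items[i], &q->items[i + 1],
			(q->len - i - 1) * sizeof(q->items[0]));
		q->len--;
		return true;
	}
	return false;
}
```

Timing-critical data (priority 0 here) can then be drained ahead of older control messages on the same queue, without reordering either class internally.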


Thanks,
Daniel

2015-04-16 14:35:22

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 16, 2015 at 6:13 AM, Tom Gundersen <[email protected]> wrote:
> On 04/15/2015 10:22 PM, Andy Lutomirski wrote:
>> On Wed, Apr 15, 2015 at 9:44 AM, Havoc Pennington <[email protected]> wrote:
>>> That is, with dbus if I send a broadcast message, then send a unicast
>>> request to another client, then drop the connection causing the bus to
>>> broadcast that I've dropped; then the other client will see those
>>> things in that order - the broadcast, then the request, and then that
>>> I've dropped the connection.
>>
>> This leads me to a potentially interesting question: where's the
>> buffering? If there's a bus with lots of untrusted clients and one of
>> them broadcasts data faster than all receivers can process it, where
>> does it go?
>
> The concepts implemented in kdbus are actually quite different from dbus1:
>
> Every connection to the bus has a memory pool assigned to store
> incoming messages and variably sized runtime data returned by kdbus.
> The pool memory is swappable, backed by a shmem file which is
> associated with the bus connection.
>
> Also, broadcasts are opt-in, so you only receive them if you
> subscribed for the specific signal. It is either sent by another
> userspace task, or by the kernel itself for things like name owner
> changes. In order to receive those, a connection must install a match.
> By default, no-one will receive any broadcasts.
>
> All types of messages (unicast and broadcast) are directly stored into
> a pool slice of the receiving connection, and this slice is not reused
> by the kernel until userspace is finished with it and frees it. Hence,
> a client which doesn't process its incoming messages will, at some
> point, run out of pool space. If that happens for unicast messages,
> the sender will get an EXFULL error. If it happens for a multicast
> message, all we can do is drop the message, and tell the receiver how
> many messages have been lost when it issues KDBUS_CMD_RECV the next
> time. There's more on that in kdbus.message(7).
>
> Also note that there is quota logic in kdbus which protects against
> a single connection conducting a DoS against another one. Together
> with the policy code, this logic prevents one peer from flooding the
> pool of another peer. Communication with a 3rd party is not affected
> by this, due to the fair allocation scheme of the pool logic.
>
> All this is explained in detail in kdbus.pool(7), but please let us
> know if anything there is unclear.
>

This is neat, but it sounds like it will potentially add large amounts
of latency under even mild memory pressure.

Whose memcg does the pool use? If it's the receiver's, and if the
receiver can configure a memcg, then it seems that even a single
receiver could probably cause the sender to block for an unlimited
amount of time.

(And yes, I really hope that some day the cgroupns issues get resolved
and some programs really will be able to create their own cgroups,
even on systemd-using systems using the systemd-blessed
configuration.)

--Andy

2015-04-16 15:01:33

by David Herrmann

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi

On Thu, Apr 16, 2015 at 4:34 PM, Andy Lutomirski <[email protected]> wrote:
> Whose memcg does the pool use?

The pool-owner's (i.e., the receiver's).

> If it's the receiver's, and if the
> receiver can configure a memcg, then it seems that even a single
> receiver could probably cause the sender to block for an unlimited
> amount of time.

How? Which of those calls can block? I don't see how that can happen.

Thanks
David

2015-04-16 16:02:52

by Havoc Pennington

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 6:22 PM, One Thousand Gnomes
<[email protected]> wrote:
> Actually most message passing code uses things like JMS and the various
> MQ libraries. Most IoT uses things other than dbus, small deep embedded
> never uses dbus.

fwiw, to me it's a mistake to think of dbus as being in "the same space" as
something like JMS, or even small deep embedded uses.
The use cases and appropriate tradeoffs are different enough that it's
hard for me to think about them as one thing.

If different uses can share some common kernel mechanisms then great,
but one does have to be careful about
one-size-fits-all-actually-fits-nobody.

> In the desktop space dbus wins because its very very easy to use and by
> network effects. Everything else related already talks via dbus, so you
> are going to have to talk dbus anyway to get anything done.

You may agree with me, but to me "easy to use" is necessary to dbus's
utility - it's not a cosmetic feature. I was on the receiving end of
the Linux desktop bug firehose both pre-dbus and post-dbus, and having
IPC that's easy to use *correctly* means there are fewer bugs in that
firehose. At least, fewer bugs caused by IPC.

Of course, the thing that needs to be easy is the library API; it's OK
if an underlying kernel API is hard, as long as it gives the library
developers what they need to implement the easier API.

It is OK to push complexity onto userspace, but it's a mistake to push
it onto apps (as opposed to libraries that can be gotten right once
for all apps). If you push complexity onto apps you get buggier apps,
because application developers are experts in their app domain but
aren't experts in every underlying platform feature.

Why is dbus relatively easy to use? Some important pieces:

- the semantic guarantees such as ordering that we've already mentioned
- completeness - solves locating and tracking other processes, solves
both unicast and broadcast, etc.
- defines a mapping to objects-with-methods OO model

Can it be even better - for sure.

Havoc

2015-04-16 17:11:56

by Robert Schwebel

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 15, 2015 at 11:22:18PM +0100, One Thousand Gnomes wrote:
> > The reason that 'everyone who works in this area' adopted is not as much
> > that the design is sound (I'm not arguing whether it is or isn't in this
> > case) as it is that none of them could come up with anything better.
>
> Actually most message passing code uses things like JMS and the various
> MQ libraries. Most IoT uses things other than dbus, small deep embedded
> never uses dbus.

For what it's worth: we increasingly use dbus for small deep embedded
systems, IoT, loosely coupled industrial control applications etc.

rsc
--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

2015-04-16 17:04:45

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 16, 2015 at 8:01 AM, David Herrmann <[email protected]> wrote:
> Hi
>
> On Thu, Apr 16, 2015 at 4:34 PM, Andy Lutomirski <[email protected]> wrote:
>> Whose memcg does the pool use?
>
> The pool-owner's (i.e., the receiver's).
>
>> If it's the receiver's, and if the
>> receiver can configure a memcg, then it seems that even a single
>> receiver could probably cause the sender to block for an unlimited
>> amount of time.
>
> How? Which of those calls can block? I don't see how that can happen.

I admit I don't fully understand memcg, but vfs_iter_write is
presumably going to need to get write access to the target pool page,
and that, in turn, will need that page to exist in memory and to be
writable, which may need to page it in and/or allocate a page. If
that uses the receiver's memcg (as it should), then the receiver can
make it block. Even if it doesn't use the receiver's memcg, it can
trigger direct reclaim, I think.

--Andy

2015-04-16 17:16:14

by Alan Cox

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

> And so does kdbus. By default, strict ordering is enforced when messages
> are received, but optionally, that action may be constrained to messages
> of a minimal priority. This allows for use cases where timing critical
> data is interleaved with control data on the same connection. That's
> described in kdbus.message(7), and is also covered by test cases.

More to the point "and so do POSIX message queues". They are also a
standard, a cross OS feature and relatively cleanly implemented in
kernel, ditto some classes of socket behaviour are similar and SYS5 IPC
(of which we shall not speak further I hope 8) ). I'm not saying that
they solve the problem but they might avoid some of the complexities.

Filtering is generalizable in Linux with a few lines of code, so rather
than hardcoding dbus semantics EBPF can express pretty much any
uni/multi/broadcast filtering policy rule for dbus or anything else.

I agree entirely with Havoc that the ease of use wants to be preserved
and semantics at the top of the dbus library shouldn't change. Dbus does
have the problem of being too easy to use badly, but that's hard to fix
technically 8)

Alan

2015-04-16 17:31:31

by David Herrmann

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi

On Wed, Apr 15, 2015 at 2:36 PM, Al Viro <[email protected]> wrote:
> On Wed, Apr 15, 2015 at 11:09:48AM +0200, Greg Kroah-Hartman wrote:
>
>> I've asked for it, but finding people to review code is hard, as you
>> know. It's only 13k lines long, smaller than a serial port driver (my
>> unit of code review), so it's not all that big.
>>
>> It's smaller than the USB3 host controller driver as well, and very few
>> people ever reviewed that beast :)
>>
>> > For something that's potentially such a core mechanism as a completely
> >> > new, massively-adopted IPC, this does send a warning signal.
>>
>> If you know of a way to force others to review code, please let me know.
>
> Have it in a less nasty state, perhaps? Random question:
>
> al@duke:~/linux/trees/vfs$ git grep -n -w kdbus_node_idr_lock
> ipc/kdbus/node.c:237:static DECLARE_RWSEM(kdbus_node_idr_lock);
> ipc/kdbus/node.c:340: down_write(&kdbus_node_idr_lock);
> ipc/kdbus/node.c:344: up_write(&kdbus_node_idr_lock);
> ipc/kdbus/node.c:444: down_write(&kdbus_node_idr_lock);
> ipc/kdbus/node.c:452: up_write(&kdbus_node_idr_lock);

As Greg said, this is a leftover from times we actually needed a
lookup here. Nice catch, I have a local patch to convert the whole IDR
into an IDA and drop the lock entirely (like kernfs does right now,
for kernfs_node->ino).

> Do you see anything wrong with that? Or with things like that:
> mutex_lock(&pos->lock);
> v_pre = atomic_read(&pos->active);
> if (v_pre >= 0)
> atomic_add_return(KDBUS_NODE_BIAS, &pos->active);
> else if (v_pre == KDBUS_NODE_NEW)
> atomic_set(&pos->active, KDBUS_NODE_RELEASE_DIRECT);
> mutex_unlock(&pos->lock);
> What are the locking rules for ->active/->waitq/->lock? Are those the
> outermost thing in the hierarchy? Or is that dependent on the node location?
> It sure as hell is outside of (at least) ->mmap_sem (by way of
> kdbus_conn_connect() establishing that ->active/->waitq is outside of
> ->conn_rwlock, which due to kdbus_bus_broadcast() nests outside of anything
> taken by kdbus_meta_proc_collect(), which includes ->mmap_sem) and that alone
> brings in a lot...

I'm working on patches to add more comments similar to how we did in
node.c. For now, please see my explanations below:

node->lock is the _innermost_ lock. node->active implements revoke
support for nodes. It follows what kernfs->active does and isn't a
lock in particular. We kinda treat it as an rwsem, where down_write() is
the outer-most lock in kdbus and _only_ called without any other lock
held (kdbus_node_deactivate()). Read-side, we never ever block on the
"lock", but only use try-lock. If it fails, the node is dead/revoked.
Therefore, the read-side of 'active' nests almost arbitrarily. We hold
'active'-references almost everywhere, to make sure a node is not
destroyed while we use it. However, we never sleep for an indefinite
time while holding it.
Given that the write-side is the outer-most lock in kdbus, it doesn't
dead-lock against the try-lock readers.
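To make the revoke scheme above concrete, here is a minimal userspace sketch using C11 atomics. It is not the kdbus code: the names (struct node, node_acquire(), node_deactivate(), node_drained(), NODE_BIAS) are hypothetical, the NEW/RELEASE states from node.c are left out, and where the kernel would sleep on a waitqueue until the count drains, the sketch only exposes a node_drained() check. Readers only ever try-acquire, so they never block on the writer.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bias: once added, the counter goes negative and readers fail. */
#define NODE_BIAS (-(1 << 30))

struct node {
	atomic_int active;	/* >= 0: live, value is the number of active refs */
};

/* Read side: never blocks, it is only a try-acquire. */
static bool node_acquire(struct node *n)
{
	int v = atomic_load(&n->active);

	while (v >= 0) {
		/* On failure, v is reloaded with the current counter value. */
		if (atomic_compare_exchange_weak(&n->active, &v, v + 1))
			return true;
	}
	return false;		/* node is dead/revoked */
}

static void node_release(struct node *n)
{
	atomic_fetch_sub(&n->active, 1);
}

/* Write side: mark the node revoked; meant to be called with no other lock held. */
static void node_deactivate(struct node *n)
{
	atomic_fetch_add(&n->active, NODE_BIAS);
}

/* True once every reader that got in before deactivation has released. */
static bool node_drained(struct node *n)
{
	return atomic_load(&n->active) == NODE_BIAS;
}
```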

> Document your goddamn locking, would you? It *IS* new code, and you, as you
> say, had very few people working on it, so you don't have the excuses for
> the mess existing in older parts of the tree.

Locking order (outer-most to inner-most):
1) domain->lock
2) names->rwlock
3) endpoint->lock
4) bus->conn_rwlock
5) policy->entries_rwlock
6) connection->lock
7) metadata->lock

mmap_sem nests below metadata->lock. With the rcu-protected exe_file
patches by Davidlohr Bueso, we can even drop that dependency. They
have kinda stalled, though.

Then we have a bunch of data structure protection, which can be called
from any context:
* bus->notify_lock
* pool->lock
* match->mdb_rwlock
* node->lock

Lastly, there're 2 locks which nest around everything and must not be
taken with any lock held:
* handle->rwlock (taken in ioctl-entry)
* bus->notify_flush_lock (taken in work-queue)

General object stacking is:
domain -> bus -> endpoint -> policy -> connection -> {metadata,pool,match,node}
The conn_rwlock protection of the conn-list locks on kdbus_bus is the
only lock that doesn't follow this ordering.
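One rough way to state the seven-level order above mechanically is to give each lock type a rank and reject any acquisition that is not strictly inner to everything already held. The sketch below is hypothetical (the enum, struct held_locks and order_acquire() are illustrative only). It covers only the numbered list, and like any per-type ranking it cannot express an ordering that depends on which node instance is involved.

```c
#include <stdbool.h>

/* Ranks follow the outer-most to inner-most list above (hypothetical names). */
enum lock_rank {
	RANK_DOMAIN = 1,	/* domain->lock */
	RANK_NAMES,		/* names->rwlock */
	RANK_ENDPOINT,		/* endpoint->lock */
	RANK_BUS_CONN,		/* bus->conn_rwlock */
	RANK_POLICY,		/* policy->entries_rwlock */
	RANK_CONN,		/* connection->lock */
	RANK_META,		/* metadata->lock */
	RANK_MAX
};

/* Per-thread set of currently held lock ranks, one bit per rank. */
struct held_locks {
	unsigned int mask;
};

static int highest_held(const struct held_locks *h)
{
	for (int r = RANK_MAX - 1; r > 0; r--)
		if (h->mask & (1u << r))
			return r;
	return 0;
}

/* Returns false (an ordering violation) unless r is inner to every held lock. */
static bool order_acquire(struct held_locks *h, enum lock_rank r)
{
	if ((int)r <= highest_held(h))
		return false;
	h->mask |= 1u << r;
	return true;
}

static void order_release(struct held_locks *h, enum lock_rank r)
{
	h->mask &= ~(1u << r);
}
```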

Thanks
David

2015-04-16 18:03:21

by Djalal Harouni

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi,

On Wed, Apr 15, 2015 at 05:07:28PM -0400, Rik van Riel wrote:
[...]
> > This leads me to a potentially interesting question: where's the
> > buffering? If there's a bus with lots of untrusted clients and one of
> > them broadcasts data faster than all receivers can process it, where
> > does it go?
> >
> > At least with a userspace solution, it's clear what the OOM killer
> > should kill when this happens. Unless it's PID 1. Sigh
>
> It may be useful to do the buffering (and general interception
> of any message that cannot be delivered) in a userspace program.
>
> Not only to get the buffers out of the kernel and into swappable
> memory, but also so people could re-use the same infrastructure
> for things like cluster communication (or communication between
> different containers) - the userspace daemons could take care of
> routing messages to and from the outside.
>
> They could also be useful to keep some of the policy stuff
> outside of the kernel, if only to ensure that the kernel side
> policy is not set in stone, and people can do things differently
> in the future if they want to.
>
kdbus connections have memory pools, please check kdbus.pool(7). The
pool has its own quota accounting to prevent bad scenarios, and the
memory is attributed to the connection.

Messages that can't be delivered are not stored in the pool, but senders
will get an appropriate error code. For further details on how this
works, please see kdbus.message(7). If you are aware of any corner-cases
we overlooked, please let us know.

Regarding the policy, the implementation is hardly more complex than
traditional UNIX file permissions. Bus names may have multiple
permissions assigned, each of which consists of a bit-mask to denote OWN,
TALK and SEE flags which are applied to UIDs, GIDs or "world". This
policy has to be enforced by the kernel, therefore the information it
acts upon also needs to be stored there. For further details, please see
kdbus.policy(7).
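A sketch of how such a check could be evaluated, under stated assumptions: the flag values, struct pol_entry and policy_allows() are hypothetical, IDs are plain unsigned integers, and grants from all matching UID, GID and world entries are simply ORed together. The real ABI and matching rules are described in kdbus.policy(7).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values, not the kdbus ABI. */
enum { POL_SEE = 1u << 0, POL_TALK = 1u << 1, POL_OWN = 1u << 2 };

enum pol_type { POL_UID, POL_GID, POL_WORLD };

struct pol_entry {
	enum pol_type type;
	unsigned int id;	/* UID or GID; ignored for POL_WORLD */
	unsigned int bits;	/* OR of POL_SEE, POL_TALK, POL_OWN */
};

static bool in_groups(unsigned int gid, const unsigned int *gids, size_t ngids)
{
	for (size_t i = 0; i < ngids; i++)
		if (gids[i] == gid)
			return true;
	return false;
}

/* Assumption: grants from every matching entry are unioned, then checked. */
static bool policy_allows(const struct pol_entry *e, size_t n, unsigned int uid,
			  const unsigned int *gids, size_t ngids, unsigned int want)
{
	unsigned int granted = 0;

	for (size_t i = 0; i < n; i++) {
		bool match = e[i].type == POL_WORLD ||
			     (e[i].type == POL_UID && e[i].id == uid) ||
			     (e[i].type == POL_GID && in_groups(e[i].id, gids, ngids));
		if (match)
			granted |= e[i].bits;
	}
	return (granted & want) == want;
}
```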

The concept of a name policy originates from dbus1 [1]; however, we
simplified it substantially, removing features which we believe
belong in userspace.

[1] http://dbus.freedesktop.org/doc/dbus-daemon.1.html


--
Djalal Harouni
http://opendz.org

2015-04-16 18:20:11

by David Herrmann

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi

On Wed, Apr 15, 2015 at 8:18 PM, Linus Torvalds
<[email protected]> wrote:
> On Wed, Apr 15, 2015 at 11:11 AM, Greg Kroah-Hartman
> <[email protected]> wrote:
>> On Wed, Apr 15, 2015 at 01:33:27PM -0400, Steven Rostedt wrote:
>>>
>>> I'll argue that you can't fix the latter one. One thing that I've observed over
>>> the years of having faster computers is, as soon as you make it faster, people
>>> will write slower software.
>>>
>>> Currently the issue is that we have thousands of dbus queries, you make dbus
>>> 10x faster, I guarantee that people will write software with 10 thousand dbus
>>> queries and we are no better off than we are today.
>>
>> Then they get to buy a faster machine :)
>
> Is there actually a performance issue?
>
> I've seen this claimed, but I have never seen any actual numbers. What
> speeds up? By how much? is it actually measurable?

For us, boot speed-up has not been the primary concern. The boot-time
speedup that kdbus provides is unlikely to be significant on a generic
Linux distro today, given that nowadays the slow parts during bootup
are firmware and hw initialization. The number of dbus messages sent
during bootup of a general purpose Linux distro is relatively small.
OTOH, during start-up of desktop environments, the dbus traffic is
substantial, but until the porting of Qt and glib to kdbus has been
completed and merged the real-world effect of this is minimal.

Our interest in improved raw performance is mainly motivated by making
dbus a viable protocol in situations where its semantics are
appropriate, but where custom protocols have so far had to be used
for performance reasons. Greg gave some examples in the cover letter,
multi-media being the most obvious area where the ten-fold decrease in
latency and the ability to efficiently copy large chunks of memory
from one process to another are relevant.

> Maybe they've marched past me in this thread-from-hell. But I can't
> recall having seen any (not now, not before).
>
> That said, I think the more serious issue is that if Luto complains
> about the capability-capturing code being completely broken, then
> people need to take that *seriously*.

Absolutely. In v3 of the patch set we addressed Andy's concerns, and
now he seems to be convinced that the kdbus code is not vulnerable
[1].

Thanks
David

[1] http://lists.freedesktop.org/archives/systemd-devel/2015-April/030776.html

2015-04-16 19:02:10

by Havoc Pennington

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 16, 2015 at 9:13 AM, Tom Gundersen <[email protected]> wrote:
> All types of messages (unicast and broadcast) are directly stored into
> a pool slice of the receiving connection, and this slice is not reused
> by the kernel until userspace is finished with it and frees it. Hence,
> a client which doesn't process its incoming messages will, at some
> point, run out of pool space. If that happens for unicast messages,
> the sender will get an EXFULL error. If it happens for a multicast
> message, all we can do is drop the message, and tell the receiver how
> many messages have been lost when it issues KDBUS_CMD_RECV the next
> time. There's more on that in kdbus.message(7).
>

Have you guys already grappled with what libraries/apps should do with
this information?

To handle the knowledge that "N messages have been lost," it seems
like the client must answer "are there any messages that, if lost,
would put any code using this connection into a confused state" and
then the client has to recover from said confused state.

A library probably can't do this - it doesn't know what state matters
or how to recover it - so each app would have to... and are
connections ever shared between modules of an app? (for example: could
a library such as GTK+ or pulseaudio be using the connection, and then
application code is also using the connection, so none of those code
modules has the whole picture... at that point, none of the modules
knows what to do about lost messages... to try to handle lost messages
in a module, you'd need a private connection(?)... which might be fine
as long as each app having a number of connections isn't too bloated.)

How to handle a send error depends a lot on what's being sent... but
if I were writing a general-purpose library wrapper, I'd be very
tempted to hide EXFULL behind an unbounded (or very-high-bounded)
userspace send buffer, which of course is what you were trying to
avoid, but I am skeptical that the average app will handle this error
sensibly.
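For illustration, the kind of wrapper described above might look like this in C. It is a hypothetical library-side sketch (send_fn, struct outq, outq_send(), outq_flush() and the tiny QCAP are all assumptions): the kernel send is tried first, -EXFULL parks the message in a userspace FIFO, and a later outq_flush() retries in order. It stores pointers rather than copies, so callers must keep messages alive until they are flushed.

```c
#include <errno.h>
#include <stddef.h>

#ifndef EXFULL
#define EXFULL 54	/* Linux value; assumption for non-Linux libcs */
#endif

#define QCAP 16		/* "very-high-bounded" in spirit, tiny for the sketch */

/* Hypothetical raw send: returns 0, -EXFULL, or another negative errno. */
typedef int (*send_fn)(const void *msg, size_t len, void *ctx);

struct outq {
	const void *msg[QCAP];	/* pointers only: caller keeps buffers alive */
	size_t len[QCAP];
	size_t head, count;
};

/* Try the kernel first; on -EXFULL park the message instead of failing. */
static int outq_send(struct outq *q, send_fn send, void *ctx,
		     const void *msg, size_t len)
{
	if (q->count == 0) {	/* only bypass the queue if it is empty, to keep order */
		int r = send(msg, len, ctx);
		if (r != -EXFULL)
			return r;
	}
	if (q->count == QCAP)
		return -EXFULL;	/* the userspace buffer ran full as well */
	size_t tail = (q->head + q->count) % QCAP;
	q->msg[tail] = msg;
	q->len[tail] = len;
	q->count++;
	return 0;
}

/* Call when the receiver may have freed pool space; preserves send order. */
static int outq_flush(struct outq *q, send_fn send, void *ctx)
{
	while (q->count > 0) {
		int r = send(q->msg[q->head], q->len[q->head], ctx);

		if (r == -EXFULL)
			return 0;	/* still full: retry on the next flush */
		q->head = (q->head + 1) % QCAP;
		q->count--;
		if (r < 0)
			return r;	/* hard error: message dropped, report it */
	}
	return 0;
}
```

Once QCAP is reached the wrapper still has to return -EXFULL, so the bound only moves where the error surfaces; it does not remove it.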

The traditional userspace bus isn't any better than what you've
described here, of course - it's even worse - and it works well
enough. The limits are simply set high enough that they won't be hit
unless someone's broken or evil. Which is also the traditional
approach to say file descriptor limits or swap space: set the limit
high and hope you won't reach it. For the case of the X server, the
limit on message buffers appears to be "until malloc fails," so they
have the limit quite high, higher than userspace dbus does. "set high
limits and don't hit them" is a tried-and-true approach.

With either the existing userspace bus or kdbus, I bet you could come
up with ways to use limit exhaustion to get various services and apps
into confused states as they miss messages they were relying on,
simply because this is too hard for apps to reliably get right. The
lower the limits, the easier it would be to cause trouble by forcing
them to be hit.

In a perfect world we could figure out which client is "at fault" for
filling a buffer - the slow receiver or the overzealous sender - so we
could throttle or disconnect the guilty party instead of throwing
errors that won't be handled well ... but not sure that's practical.

Havoc

2015-04-16 20:56:25

by Al Viro

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 16, 2015 at 07:31:22PM +0200, David Herrmann wrote:

> I'm working on patches to add more comments similar to how we did in
> node.c. For now, please see my explanations below:
>
> node->lock is the _innermost_ lock.
> node->active implements revoke
> support for nodes. It follows what kernfs->active does and isn't a
> lock in particular. We kinda treat it as an rwsem, where down_write() is
> the outer-most lock in kdbus and _only_ called without any other lock
> held (kdbus_node_deactivate()). Read-side, we never ever block on the
> "lock", but only use try-lock. If it fails, the node is dead/revoked.
> Therefore, the read-side of 'active' nests almost arbitrarily. We hold
> 'active'-references almost everywhere, to make sure a node is not
> destroyed while we use it. However, we never sleep for an indefinite
> time while holding it.

Umm... Theoretically, but ->mmap_sem being under it means that it might
involve something like an NFS server timing out, so the latency might
suck very badly.

> Given that the write-side is the outer-most lock in kdbus, it doesn't
> dead-lock against the try-lock readers.

Huh? I see at least this call chain:
kdbus_handle_ioctl_control()
kdbus_node_acquire()
kdbus_cmd_bus_make()
kdbus_node_deactivate()
Granted, it won't be the _same_ node (otherwise you'd deadlock solid
right there and then), but it means that your locking order is sensitive
to something about nodes; it's not entirely determined by the lock type.

> Locking order (outer-most to inner-most):
> 1) domain->lock
> 2) names->rwlock
> 3) endpoint->lock
> 4) bus->conn_rwlock
> 5) policy->entries_rwlock
> 6) connection->lock
> 7) metadata->lock
>
> mmap_sem nests below metadata->lock. With the rcu-protected exe_file
> patches by Davidlohr Bueso, we can even drop that dependency. They
> have kinda stalled, though.
>
> Then we have a bunch of data structure protection, which can be called
> from any context:
> * bus->notify_lock
> * pool->lock
> * match->mdb_rwlock
> * node->lock
>
> Lastly, there're 2 locks which nest around everything and must not be
> taken with any lock held:
> * handle->rwlock (taken in ioctl-entry)

as well as in ->poll(), for completeness' sake. The latter, BTW, isn't
nice - kdbus is far from being the only thing that does it, but having
->poll() block can be somewhat surprising...

> * bus->notify_flush_lock (taken in work-queue)

Hmm... That needs some care - it means that it nests inside anything held
by callers of cancel_delayed_work_sync() on the corresponding work. AFAICS,
there's at least one call chain leading to that from kdbus_node_deactivate()
(via ->release_cb == kdbus_ep_release -> kdbus_conn_disconnect ->
cancel_delayed_work_sync(&conn->work), which waits for kdbus_reply_list_scan_work
-> kdbus_notify_flush, which grabs ->notify_flush_lock). Tracking back further is
harder - not all call sites of kdbus_node_deactivate() can lead to that...

BTW, it's not only done in wq callbacks - there's a direct chain from
kdbus_conn_disconnect() as well (both through kdbus_name_release_all ->
kdbus_notify_flush and directly through kdbus_notify_flush()). And from
ioctl(), by many paths, while we are at it, but that only means that it
nests inside handle->rwlock, and _that_ is really the outermost.

What nests inside that one? It's definitely a part of the hierarchy - it can't
be excluded from deadlock analysis as effectively outermost. As for the
stuff under it... registry->rwlock is obvious, what else?

> General object stacking is:
> domain -> bus -> endpoint -> policy -> connection -> {metadata,pool,match,node}
> The conn_rwlock protection of the conn-list locks on kdbus_bus is the
> only lock that doesn't follow this ordering.

2015-04-17 09:19:59

by Michal Hocko

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu 16-04-15 10:04:17, Andy Lutomirski wrote:
> On Thu, Apr 16, 2015 at 8:01 AM, David Herrmann <[email protected]> wrote:
> > Hi
> >
> > On Thu, Apr 16, 2015 at 4:34 PM, Andy Lutomirski <[email protected]> wrote:
> >> Whose memcg does the pool use?
> >
> > The pool-owner's (i.e., the receiver's).
> >
> >> If it's the receiver's, and if the
> >> receiver can configure a memcg, then it seems that even a single
> >> receiver could probably cause the sender to block for an unlimited
> >> amount of time.
> >
> > How? Which of those calls can block? I don't see how that can happen.
>
> I admit I don't fully understand memcg, but vfs_iter_write is
> presumably going to need to get write access to the target pool page,
> and that, in turn, will need that page to exist in memory and to be
> writable, which may need to page it in and/or allocate a page. If
> that uses the receiver's memcg (as it should), then the receiver can
> make it block. Even if it doesn't use the receiver's memcg, it can
> trigger direct reclaim, I think.

Yes, memcg direct reclaim might trigger but we are no longer waiting for
the OOM victim from non page fault paths so the time is bounded. It
still might take quite some time, though, depending on the amount of work
done in the direct reclaim.
--
Michal Hocko
SUSE Labs

2015-04-17 13:24:06

by Daniel Mack

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi Havoc,

On 04/16/2015 09:01 PM, Havoc Pennington wrote:
> On Thu, Apr 16, 2015 at 9:13 AM, Tom Gundersen <[email protected]> wrote:
>> All types of messages (unicast and broadcast) are directly stored into
>> a pool slice of the receiving connection, and this slice is not reused
>> by the kernel until userspace is finished with it and frees it. Hence,
>> a client which doesn't process its incoming messages will, at some
>> point, run out of pool space. If that happens for unicast messages,
>> the sender will get an EXFULL error. If it happens for a multicast
>> message, all we can do is drop the message, and tell the receiver how
>> many messages have been lost when it issues KDBUS_CMD_RECV the next
>> time. There's more on that in kdbus.message(7).
>
> Have you guys already grappled with what libraries/apps should do with
> this information?
>
> To handle the knowledge that "N messages have been lost," it seems
> like the client must answer "are there any messages that, if lost,
> would put any code using this connection into a confused state" and
> then the client has to recover from said confused state.

This can only happen with user-originated DBus signal messages. For
unicast messages such as method calls, the sender will actually see
-EXFULL, and no part of the message is transmitted, leaving neither side
in a confused state. But yes, for broadcast signal messages, we can't
reject the sender because one single peer is out of buffer space, and we
can't allow boundless allocations on the receiver either, so informing
the other side is the best we can do.
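The unicast/broadcast asymmetry described above can be modeled in a few lines. This is a hypothetical single-receiver model, not the kdbus pool code: struct pool, pool_reserve(), send_unicast(), send_broadcast() and recv_take_dropped() are made up, byte counting stands in for real pool slices, and EXFULL falls back to the Linux value 54 where the C library lacks it.

```c
#include <errno.h>
#include <stddef.h>

#ifndef EXFULL
#define EXFULL 54	/* Linux value; assumption for non-Linux libcs */
#endif

struct pool {
	size_t size;		/* total pool bytes of the receiver */
	size_t used;		/* bytes in slices not yet freed by the receiver */
	unsigned int dropped;	/* broadcasts lost since the last receive */
};

static int pool_reserve(struct pool *p, size_t len)
{
	if (len > p->size - p->used)
		return -EXFULL;
	p->used += len;
	return 0;
}

/* Unicast: the sender sees -EXFULL and nothing is queued at all. */
static int send_unicast(struct pool *dst, size_t len)
{
	return pool_reserve(dst, len);
}

/* Broadcast: one full receiver must not fail the sender, so count the loss. */
static void send_broadcast(struct pool *dst, size_t len)
{
	if (pool_reserve(dst, len) < 0)
		dst->dropped++;
}

/* Receive side: report how many messages were lost, then reset the counter. */
static unsigned int recv_take_dropped(struct pool *p)
{
	unsigned int n = p->dropped;

	p->dropped = 0;
	return n;
}

/* Receiver is done with a slice; its space becomes reusable. */
static void pool_free_slice(struct pool *p, size_t len)
{
	p->used -= len;
}
```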

Note that dbus-daemon just drops such signals silently. So with this
counter we simply add a debug mechanism for now. There hasn't been a
consensus on how to react to such errors on the application level. The
easiest way is obviously to re-sync all your state with the peer (which
could be as easy as calling ObjectManager.GetManagedObjects() or
Properties.GetAll()).

> A library probably can't do this - it doesn't know what state matters
> or how to recover it - so each app would have to... and are
> connections ever shared between modules of an app? (for example: could
> a library such as GTK+ or pulseaudio be using the connection, and then
> application code is also using the connection, so none of those code
> modules has the whole picture... at that point, none of the modules
> knows what to do about lost messages... to try to handle lost messages
> in a module, you'd need a private connection(?)... which might be fine
> as long as each app having a number of connections isn't too bloated.)
>
> How to handle a send error depends a lot on what's being sent... but
> if I were writing a general-purpose library wrapper, I'd be very
> tempted to hide EXFULL behind an unbounded (or very-high-bounded)
> userspace send buffer, which of course is what you were trying to
> avoid, but I am skeptical that the average app will handle this error
> sensibly.

Actually, we see no real difference between constrained outgoing or
incoming buffers. Even with a very-high-bounded send-buffer, you still
need to deal with it running full.

> The traditional userspace bus isn't any better than what you've
> described here, of course - it's even worse - and it works well
> enough. The limits are simply set high enough that they won't be hit
> unless someone's broken or evil. Which is also the traditional
> approach to say file descriptor limits or swap space: set the limit
> high and hope you won't reach it. For the case of the X server, the
> limit on message buffers appears to be "until malloc fails," so they
> have the limit quite high, higher than userspace dbus does. "set high
> limits and don't hit them" is a tried-and-true approach.
>
> With either the existing userspace bus or kdbus, I bet you could come
> up with ways to use limit exhaustion to get various services and apps
> into confused states as they miss messages they were relying on,
> simply because this is too hard for apps to reliably get right. The
> lower the limits, the easier it would be to cause trouble by forcing
> them to be hit.
>
> In a perfect world we could figure out which client is "at fault" for
> filling a buffer - the slow receiver or the overzealous sender - so we
> could throttle or disconnect the guilty party instead of throwing
> errors that won't be handled well ... but not sure that's practical.

Exactly, you need heuristics for that. It's non-trivial to figure out
whether the receiver or sender is to blame.

We've thought about how to address that for a while and came up with a
quota logic that is similar to what dbus-daemon implements in order to
prevent single connections from overflowing the pool of a receiver. The
limits that apply to that are currently hard-coded, and they work well
on our systems. In the future, they can easily be made a bus-wide
property that can be configured at bus creation time.
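One possible shape of such a per-sender quota is sketched below. The half-of-the-pool rule, the fixed sender table and all names (struct quota_pool, quota_charge(), quota_uncharge(), MAX_SENDERS) are assumptions for illustration; the message above does not say what the hard-coded kdbus limits actually are.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SENDERS 8	/* hypothetical fixed sender table for the sketch */

struct quota_pool {
	size_t size;			/* receiver's total pool bytes */
	size_t used;			/* bytes currently charged */
	size_t per_sender[MAX_SENDERS];	/* bytes charged to each sender */
};

/* Hypothetical rule: no single sender may hold more than half of the pool. */
static bool quota_charge(struct quota_pool *p, unsigned int sender, size_t len)
{
	if (sender >= MAX_SENDERS)
		return false;
	if (len > p->size - p->used)
		return false;			/* pool itself is full */
	if (p->per_sender[sender] + len > p->size / 2)
		return false;			/* this sender is over its share */
	p->per_sender[sender] += len;
	p->used += len;
	return true;
}

/* Receiver freed a slice; the caller passes a valid, previously charged sender. */
static void quota_uncharge(struct quota_pool *p, unsigned int sender, size_t len)
{
	p->per_sender[sender] -= len;
	p->used -= len;
}
```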


Thanks,
Daniel

2015-04-17 13:45:38

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 16, 2015 at 06:37:45PM +0200, Robert Schwebel wrote:
> On Wed, Apr 15, 2015 at 11:22:18PM +0100, One Thousand Gnomes wrote:
> > > The reason that 'everyone who works in this area' adopted is not as much
> > > that the design is sound (I'm not arguing whether it is or isn't in this
> > > case) as it is that none of them could come up with anything better.
> >
> > Actually most message passing code uses things like JMS and the various
> > MQ libraries. Most IoT uses things other than dbus, small deep embedded
> > never uses dbus.
>
> For what it's worth: we more and more use dbus for small deep embedded
> systems, IoT, loosely coupled industrial control applications etc.

Thanks for confirming this, I thought I had seen it used in IoT devices
already.

greg k-h

2015-04-17 14:55:37

by Havoc Pennington

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Fri, Apr 17, 2015 at 9:23 AM, Daniel Mack <[email protected]> wrote:
>
> This can only happen with user-originated DBus signal messages. For
> unicast messages such as method calls, the sender will actually see
> -EXFULL, and no part of the message is transmitted, leaving neither side
> in a confused state.

Well - big asterisk, * no confused state IF the sender handles EXFULL
in a reasonable way. Which it probably doesn't most of the time :-)
but as you say it's no worse than it ever was.

> But yes, for broadcast signal messages, we can't
> reject the sender because one single peer is out of buffer space, and we
> can't allow boundless allocations on the receiver either, so informing
> the other side is the best we can do.

If this was ever going to happen (if the limits weren't high), I do
think it would be better to disconnect or throttle/backpressure
somehow, instead of breaking semantics. But the trouble is figuring
out how to do that... I don't know how. So the alternative is to set
high limits.

I think you're fine, it obviously works OK with the current userspace
daemon that punts in a similar way, and unix has a long tradition of
limits like this plus applications sucking at handling the "limit
reached" errors. It'll all work out...

> Note that dbus-daemon just drops such signals silently. So with this
> counter we simply add a debug mechanism for now. There hasn't been a
> consensus on how to react to such errors on the application level. The
> easiest way is obviously to re-sync all your state with the peer (which
> could be as easy as calling ObjectManager.GetManagedObjects() or
> Properties.GetAll()).

It's not realistic to expect the bulk of apps to handle this thing.
Special system services such as pid 1, you probably have the expertise
and time to try to carefully restore all state. Regular old apps will
get confused in practice if limits are hit in practice, but people
will configure the limits such that they're only hit if there's some
pathology going on.

>> How to handle a send error depends a lot on what's being sent... but
>> if I were writing a general-purpose library wrapper, I'd be very
>> tempted to hide EXFULL behind an unbounded (or very-high-bounded)
>> userspace send buffer, which of course is what you were trying to
>> avoid, but I am skeptical that the average app will handle this error
>> sensibly.
>
> Actually, we see no real difference between constrained outgoing or
> incoming buffers. Even with a very-high-bounded send-buffer, you still
> need to deal with it running full.

What I'm saying is that there's a practical difference between limits
low enough to be hit in normal operation, and limits high enough that
somebody has to be evil/broken before you hit them. With the "throw
an error" setup, if you set the limits low enough to be hit in
practice, then userspace will be buggy and break - that's my
prediction at least.

It's not different from say the file descriptor limits. If you crank
down your allowed open descriptors such that a user session actually
hits the limit, pretty much the session isn't usable. That's all I'm
saying.

If you wanted to be able to configure the limits low, where they'd be
hit in practice, then I think you'd want to look at some solution
other than tossing these errors that people will fail to handle
correctly, even if that solution were complex and/or heuristic. If you
set the limits high, it doesn't really matter so you can KISS.

It's sort of an academic point ... tons of kernel features already
have this issue. So carry on, you're good. :-)

Havoc

2015-04-17 16:53:55

by Mike Galbraith

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 2015-04-15 at 16:48 +0200, Michal Schmidt wrote:
> On 04/15/2015 09:31 AM, Mike Galbraith wrote:
> > it seems [systemd] has now mandated group scheduling.
>
> What makes you think so? Was it the fact that by default you have a
> populated /sys/fs/cgroup/cpu/ hierarchy? This is either because some
> unit requests the use of the cpu controller using one of the CPU*=
> directives from systemd.resource-control(5), or (perhaps more likely)
> because there is a privileged unit with Delegate=yes. The most likely
> candidate is [email protected], and so you could try preventing it from
> starting:
> systemctl mask [email protected]

BTW, asking it to symlink its disabled service to /dev/null did
indeed convince it to stop running said disabled service.

> Note that systemd still works without group scheduling or any cgroup
> subsystems enabled in the kernel:
>
> $ grep GROUP .config
> CONFIG_CGROUPS=y

Yup. CONFIG_CGROUPS=y all by itself isn't useless either, as that
allows the user to use his box for something other than a doorstop.

Hohum, 'nuff of that ;-)

Thanks for the hint, it seems a tad dainbramaged, but it works.

-Mike

2015-04-17 18:55:08

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Fri, Apr 17, 2015 at 2:19 AM, Michal Hocko <[email protected]> wrote:
> On Thu 16-04-15 10:04:17, Andy Lutomirski wrote:
>> On Thu, Apr 16, 2015 at 8:01 AM, David Herrmann <[email protected]> wrote:
>> > Hi
>> >
>> > On Thu, Apr 16, 2015 at 4:34 PM, Andy Lutomirski <[email protected]> wrote:
>> >> Whose memcg does the pool use?
>> >
>> > The pool-owner's (i.e., the receiver's).
>> >
>> >> If it's the receiver's, and if the
>> >> receiver can configure a memcg, then it seems that even a single
>> >> receiver could probably cause the sender to block for an unlimited
>> >> amount of time.
>> >
>> > How? Which of those calls can block? I don't see how that can happen.
>>
>> I admit I don't fully understand memcg, but vfs_iter_write is
>> presumably going to need to get write access to the target pool page,
>> and that, in turn, will need that page to exist in memory and to be
>> writable, which may need to page it in and/or allocate a page. If
>> that uses the receiver's memcg (as it should), then the receiver can
>> make it block. Even if it doesn't use the receiver's memcg, it can
>> trigger direct reclaim, I think.
>
> Yes, memcg direct reclaim might trigger but we are no longer waiting for
> the OOM victim from non page fault paths so the time is bounded. It
> still might take quite some time, though, depending on the amount of work
> done in the direct reclaim.

Is that still true if OOM notifiers are involved? I've lost track of
what changed there.

In any event, I'm not entirely convinced that having a broadcast send
cause, say, PID 1 to block until an unbounded number of pages in a
potentially unbounded number of memcgs are reclaimed is a good idea.

In the kdbus model's favor, I think that allowing pages of data in the
receive queue to be swapped out is potentially quite nice, but I'm
less convinced about non-full pages in the receive queue. There's a
resource management tradeoff here, and one nice thing about AF_UNIX is
that sends are genuinely non-blocking.

--Andy

2015-04-17 19:27:43

by James Bottomley

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, 2015-04-16 at 14:13 +0200, David Herrmann wrote:
> Hi
>
> On Wed, Apr 15, 2015 at 8:12 PM, James Bottomley
> <[email protected]> wrote:
> > For me the biggest issue is the container problem: it's really hard to
> > containerise kdbus because of the stateful nature of the protocol and
> > the fact that it has a well known system bus. Separation into domains
> > works for OS containers, but application containers need more fluidity.
> > It's not unlike the same problem on windows: Windows application
> > containers are very difficult to do because the global registry means
> > that OLE handlers all have to run inside your container as well
> > (effectively making it an OS container). I'm sure, since we already
> > have a lot of containers people going to plumbers, that we can get them
> > to turn up for the discussion.
>
> kdbus actually works very well in OS containers that mount a new
> kdbusfs inside the container. This new instance of kdbus will be
> entirely separated from any other on the system. We've designed it
> that way especially with OS containers in mind. This is explained in
> kdbus.fs(7). It's very similar to devpts' container support, where you
> mount a new instance of devpts into each container instance you run.
>
> For Docker-style (i.e. app-focused) containers, it's a more complex
> story.

Well, no, docker-style is just one flavour of application containers.
I'm actually much more interested in something very different:
applications that use container features (like docker, rocket and
systemd). Facilitating them is an interesting exercise.

Also, applications inside containers were around long before docker in
the PaaS space at least.

> kdbus will not solve this for you, but at least one thing
> deserves being mentioned: for this kind of sandboxing kdbus certainly
> makes things *easier*, compared to dbus1.

So slightly better than really difficult isn't terribly useful.

> Why? because the kernel
> gains a notion of individual messages and method call transactions,
> something that is completely unavailable if you stick to dbus1 where
> all the kernel sees is a raw stream of AF_UNIX/SOCK_STREAM bytes. In
> fact, kdbus as it is right now even contains minimal but explicit
> support for sandboxing, by allowing creation of multiple bus endpoints
> to the same bus that carry additional, more restrictive policy.

Sandboxing is a minor (albeit very useful) use of containers.

You nicely ignored the actual problem I listed, which is the system bus.
And the specific example of what happens. Let me try again. Just to
provide the context, Virtuozzo has long supported containers on both
Windows and Linux. We have been doing application containers on Linux
for a long time, but we've been having issues doing the same thing on
Windows (in spite of the fact that our Windows container system is very
similar to the Linux one).

In Windows, OLE + the global registry is dbus on steroids. The idea
seems simple and elegant: remote system elements are provided to you via
an IPC interaction instead of being directly dynamically linked into
your virtual address space. It allows Windows applications to deal with
arbitrary objects of unknown type because the type handlers are provided
by the system via OLE. It's really elegant in a single user desktop
environment because the system's job is to serve and protect only that
user. In a multi user environment (as MS found with VDI) it's a lot
more problematic because now either the type handlers are global
(meaning local users can't modify them unlike in the single user case)
or they're all local, meaning we're back to OS containers again. If you
think abstractly of containers as a way to bring multi-user features to
single user environments (essentially that's what OS virtualization is)
you can see immediately why we're having such issues with non-OS
containers on Windows because the single bus/global namespace idea
doesn't play well with multi-user.

This is why I think kdbus is a bad idea: it solidifies as a linux kernel
API something which runs counter to granular OS virtualization (and
something which caused Windows to fall behind Linux in the container
space). Splitting out the acceleration problem and leaving the rest to
user space currently looks fine because the ideas Al and Andy are
kicking around don't cause problems with OS virtualization.

James

2015-04-17 20:28:00

by Havoc Pennington

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi,

On Fri, Apr 17, 2015 at 3:27 PM, James Bottomley
<[email protected]> wrote:
>
> This is why I think kdbus is a bad idea: it solidifies as a linux kernel
> API something which runs counter to granular OS virtualization (and
> something which caused Windows to fall behind Linux in the container
> space). Splitting out the acceleration problem and leaving the rest to
> user space currently looks fine because the ideas Al and Andy are
> kicking around don't cause problems with OS virtualization.
>

I'm interested in understanding this problem (if only for my own
curiosity) but I'm not confident I understand what you're saying
correctly.

Can I try to explain back / ask questions and see what I have right?

I think you are saying that if an application relies on a system
service (= any other process that runs on the system bus) then to
virtualize that app by itself in a dedicated container, the system bus
and the system service need to also be in the container. So the
container ends up with a bunch of stuff in it beyond only the
application. Right / wrong / confused?

I also think you're saying that userspace dbus has the same issue
(this isn't a userspace vs. kernel thing per se), the objection to
kdbus is that it makes this issue more solidified / harder to fix?

Do you have ideas on how to go about fixing it, whether in userspace
or kernel dbus?

Havoc

2015-04-17 21:45:15

by Alex Elsayed

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Havoc Pennington wrote:

> Hi,
>
> On Fri, Apr 17, 2015 at 3:27 PM, James Bottomley
> <[email protected]> wrote:
>>
>> This is why I think kdbus is a bad idea: it solidifies as a linux kernel
>> API something which runs counter to granular OS virtualization (and
>> something which caused Windows to fall behind Linux in the container
>> space). Splitting out the acceleration problem and leaving the rest to
>> user space currently looks fine because the ideas Al and Andy are
>> kicking around don't cause problems with OS virtualization.
>>
>
> I'm interested in understanding this problem (if only for my own
> curiosity) but I'm not confident I understand what you're saying
> correctly.
>
> Can I try to explain back / ask questions and see what I have right?
>
> I think you are saying that if an application relies on a system
> service (= any other process that runs on the system bus) then to
> virtualize that app by itself in a dedicated container, the system bus
> and the system service need to also be in the container. So the
> container ends up with a bunch of stuff in it beyond only the
> application. Right / wrong / confused?
>
> I also think you're saying that userspace dbus has the same issue
> (this isn't a userspace vs. kernel thing per se), the objection to
> kdbus is that it makes this issue more solidified / harder to fix?
>
> Do you have ideas on how to go about fixing it, whether in userspace
> or kernel dbus?
>
> Havoc

So far as I understand (and this may be wrong), this is the use case of
kdbus "endpoints" - you'd create a (constrained) kdbus endpoint on the host,
and then expose it to the application, such that the application uses it as
if it were the system bus.

2015-04-18 11:44:07

by David Herrmann

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi

On Thu, Apr 16, 2015 at 10:55 PM, Al Viro <[email protected]> wrote:
> On Thu, Apr 16, 2015 at 07:31:22PM +0200, David Herrmann wrote:
>
>> I'm working on patches to add more comments similar to how we did in
>> node.c. For now, please see my explanations below:
>>
>> node->lock is the _innermost_ lock.
>> node->active implements revoke
>> support for nodes. It follows what kernfs->active does and isn't a
>> lock in particular. We kinda treat it as rwsem, where down_write() is
>> the outer-most lock in kdbus and _only_ called without any other lock
>> held (kdbus_node_deactivate()). Read-side, we never ever block on the
>> "lock", but only use try-lock. If it fails, the node is dead/revoked.
>> Therefore, the read-side of 'active' nests almost arbitrarily. We hold
>> 'active'-references almost everywhere, to make sure a node is not
>> destroyed while we use it. However, we never sleep for an indefinite
>> time while holding it.
>
> Umm... Theoretically, but ->mmap_sem being under it means that it might
> involve something like an NFS server timing out, so the latency might
> suck very badly.

Fixed! [1]
Linus just pulled akpm#3, which includes the rcu-protection for
exe-file. No more direct mmap_sem access in kdbus.

>> Given that the write-side is the outer-most lock in kdbus, it doesn't
>> dead-lock against the try-lock readers.
>
> Huh? I see at least this call chain:
> kdbus_handle_ioctl_control()
> kdbus_node_acquire()
> kdbus_cmd_bus_make()
> kdbus_node_deactivate()
> Granted, it won't be the _same_ node (otherwise you'd deadlock solid
> right there and then), but it means that your locking order is sensitive
> to something about nodes; it's not entirely determined by the lock type.

Indeed. We do allow pinning parent objects when deactivating their
children. I updated my doc-drafts accordingly.

>> Locking order (outer-most to inner-most):
>> 1) domain->lock
>> 2) names->rwlock
>> 3) endpoint->lock
>> 4) bus->conn_rwlock
>> 5) policy->entries_rwlock
>> 6) connection->lock
>> 7) metadata->lock
>>
>> mmap_sem nests below metadata->lock. With the rcu-protected exe_file
>> patches by Davidlohr Bueso, we can even drop that dependency. They
>> have kinda stalled, though.
>>
>> Then we have a bunch of data structure protection, which can be called
>> from any context:
>> * bus->notify_lock
>> * pool->lock
>> * match->mdb_rwlock
>> * node->lock
>>
>> Lastly, there're 2 locks which nest around everything and must not be
>> taken with any lock held:
>> * handle->rwlock (taken in ioctl-entry)
>
> as well as in ->poll(), for completeness' sake. The latter, BTW, isn't
> nice - kdbus is far from being the only thing that does it, but having
> ->poll() block can be somewhat surprising...

I have a patch to fix this [2]. But it's more complex than the rwsem,
and requires some more review. However, it reduces the handle-locking
to a minimum, such that we only lock it during setup and can reduce it
to a mutex.

>> * bus->notify_flush_lock (taken in work-queue)
>
> Hmm... That needs some care - it means that it nests inside anything held
> by callers of cancel_delayed_work_sync() on the corresponding work. AFAICS,
> there's at least one call chain leading to that from kdbus_node_deactivate()
> (via ->release_cb == kdbus_ep_release -> kdbus_conn_disconnect ->
> cancel_delayed_work_sync(&conn->work), which waits for kdbus_reply_list_scan_work
> -> kdbus_notify_flush, which grabs ->notify_flush_lock). Tracking back further is
> harder - not all call sites of kdbus_node_deactivate() can lead to that...
>
> BTW, it's not only done in wq callbacks - there's a direct chain from
> kdbus_conn_disconnect() as well (both through kdbus_name_release_all ->
> kdbus_notify_flush and directly through kdbus_notify_flush()). And from
> ioctl(), by many paths, while we are at it, but that only means that it
> nests inside handle->rwlock, and _that_ is really the outermost.

Sorry, this was a mistake on my side. We do call kdbus_notify_flush()
directly quite often. And it nests underneath the handle, correct. I
noted this down.

I did have patches to actually move the kdbus_notify_flush() call to
the end of kdbus_handle_ioctl() and friends, so that we would flush all
collected notifications on return to user-space, which would make the
locking more obvious. However, it didn't make it much simpler, imo, so
it was never applied.

> What nests inside that one? It's definitely a part of the hierarchy - it can't
> be excluded from deadlock analysis as effectively outermost. As for the
> stuff under it... registry->rwlock is obvious, what else?

(Updated) Data-structure locks:
* bus->notify_lock
* pool->lock
* match->mdb_rwlock
* node->lock

Updated locking order:
1) handle->rwlock
2) bus->notify_flush_lock
3) domain->lock
4) names->rwlock
5) endpoint->lock
6) bus->conn_rwlock
7) policy->entries_rwlock
8) connection->lock
9) metadata->lock
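
For illustration only: the updated order above is the kind of ranking that lockdep checks inside the kernel. Below is a minimal userspace sketch of a rank-based order checker, assuming hypothetical RANK_* constants that mirror the nine-entry list (none of these names come from kdbus). It assumes locks are released in LIFO order and does not model node->active or the data-structure locks, which may be taken from any context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ranks mirroring the updated locking order:
 * a lower rank is outer-most and must be acquired first. */
enum lock_rank {
	RANK_HANDLE_RWLOCK = 1,
	RANK_BUS_NOTIFY_FLUSH,
	RANK_DOMAIN,
	RANK_NAMES_RWLOCK,
	RANK_ENDPOINT,
	RANK_BUS_CONN_RWLOCK,
	RANK_POLICY_ENTRIES_RWLOCK,
	RANK_CONNECTION,
	RANK_METADATA,
};

#define MAX_HELD 16

/* Ranks currently held by one thread, kept as a stack. */
struct held_locks {
	int ranks[MAX_HELD];
	size_t depth;
};

/* Returns true and records @rank if taking it now respects the order,
 * i.e. it is strictly inner to every lock already held. Because the
 * stack is strictly increasing, checking the top entry is enough. */
static bool lock_order_acquire(struct held_locks *h, int rank)
{
	if (h->depth == MAX_HELD)
		return false;
	if (h->depth > 0 && h->ranks[h->depth - 1] >= rank)
		return false; /* would invert the documented order */
	h->ranks[h->depth++] = rank;
	return true;
}

/* Assumes LIFO release: only the most recently taken lock may go. */
static void lock_order_release(struct held_locks *h, int rank)
{
	assert(h->depth > 0 && h->ranks[h->depth - 1] == rank);
	h->depth--;
}
```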

* node->active read-side nests arbitrarily underneath handle->rwlock.

* node->active write-side nests underneath handle->rwlock, and
underneath read-side of any parent-node->active.
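
The node->active scheme described in this thread (a read side that only ever try-acquires, and a one-way deactivation on the write side, similar to what kernfs does) can be sketched with C11 atomics as follows. This is a hypothetical illustration, not kdbus's code: the struct, the NODE_DEACTIVATED_BIAS value and the function names are made up, and a real deactivation would also sleep until the outstanding count drops to zero.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bias: once added, the counter goes negative and
 * every later acquire attempt is refused. */
#define NODE_DEACTIVATED_BIAS (INT_MIN / 2)

struct node_active {
	atomic_int active; /* >= 0: number of active references */
};

/* Read side: never blocks. Fails if the node has been revoked. */
static bool node_acquire(struct node_active *n)
{
	int v = atomic_load(&n->active);

	while (v >= 0) {
		/* On failure, v is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&n->active, &v, v + 1))
			return true;
	}
	return false;
}

static void node_release(struct node_active *n)
{
	atomic_fetch_sub(&n->active, 1);
}

/* Write side: marks the node dead. Returns how many references were
 * outstanding at that moment. */
static int node_deactivate(struct node_active *n)
{
	return atomic_fetch_add(&n->active, NODE_DEACTIVATED_BIAS);
}

/* References still held, whether or not the node was deactivated. */
static int node_outstanding(struct node_active *n)
{
	int v = atomic_load(&n->active);

	return v < 0 ? v - NODE_DEACTIVATED_BIAS : v;
}
```

Because the read side only tries and never waits, it can nest under almost any lock; the ordering constraint lands entirely on the write side, which matches the rules listed above.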

Thanks! Much appreciated!
David

[1] http://cgit.freedesktop.org/~dvdhrm/linux/commit/?h=kdbus&id=f396c12ecfda1717e5f76d6b4ab11e4db232e60d
[2] http://cgit.freedesktop.org/~dvdhrm/linux/commit/?h=kdbus&id=61875e1abd38a965c9f7dfca28068dd0a871961c

2015-04-20 12:43:14

by Michal Hocko

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Fri 17-04-15 11:54:42, Andy Lutomirski wrote:
> On Fri, Apr 17, 2015 at 2:19 AM, Michal Hocko <[email protected]> wrote:
> > On Thu 16-04-15 10:04:17, Andy Lutomirski wrote:
> >> On Thu, Apr 16, 2015 at 8:01 AM, David Herrmann <[email protected]> wrote:
> >> > Hi
> >> >
> >> > On Thu, Apr 16, 2015 at 4:34 PM, Andy Lutomirski <[email protected]> wrote:
> >> >> Whose memcg does the pool use?
> >> >
> >> > The pool-owner's (i.e., the receiver's).
> >> >
> >> >> If it's the receiver's, and if the
> >> >> receiver can configure a memcg, then it seems that even a single
> >> >> receiver could probably cause the sender to block for an unlimited
> >> >> amount of time.
> >> >
> >> > How? Which of those calls can block? I don't see how that can happen.
> >>
> >> I admit I don't fully understand memcg, but vfs_iter_write is
> >> presumably going to need to get write access to the target pool page,
> >> and that, in turn, will need that page to exist in memory and to be
> >> writable, which may need to page it in and/or allocate a page. If
> >> that uses the receiver's memcg (as it should), then the receiver can
> >> make it block. Even if it doesn't use the receiver's memcg, it can
> >> trigger direct reclaim, I think.
> >
> > Yes, memcg direct reclaim might trigger but we are no longer waiting for
> > the OOM victim from non page fault paths so the time is bounded. It
> > still might take quite some time, though, depending on the amount of work
> > done in the direct reclaim.
>
> Is that still true if OOM notifiers are involved? I've lost track of
> what changed there.

memcg OOM is not triggered from get_user_pages. See 519e52473ebe (mm:
memcg: enable memcg OOM killer only for user faults)

> In any event, I'm not entirely convinced that having a broadcast send
> cause, say, PID 1 to block until an unbounded number of pages in a
> potentially unbounded number of memcgs are reclaimed is a good idea.

This deserves a clarification I guess. It is the memcg of the current
task which gets charged during the page fault normally. So if PID1 tries
to fault the memory in it will be its (most probably root) memcg which
gets charged. If the memory was already charged to a different task's
memcg and then it got swapped out, though, the PID1 would indeed wait
for the reclaim in the target memcg to swap the page back in.

In either case this sounds like a potential problem, because tasks
could hide their memory charges from the limit or PID1 context could
be blocked. But maybe I just misunderstood, and uncharged memory
cannot be used for the buffer.

> In the kdbus model's favor, I think that allowing pages of data in the
> receive queue to be swapped out is potentially quite nice, but I'm
> less convinced about non-full pages in the receive queue. There's a
> resource management tradeoff here, and one nice thing about AF_UNIX is
> that sends are genuinely non-blocking.
>
> --Andy

--
Michal Hocko
SUSE Labs

2015-04-20 18:01:48

by James Bottomley

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Fri, 2015-04-17 at 16:27 -0400, Havoc Pennington wrote:
> Hi,
>
> On Fri, Apr 17, 2015 at 3:27 PM, James Bottomley
> <[email protected]> wrote:
> >
> > This is why I think kdbus is a bad idea: it solidifies as a linux kernel
> > API something which runs counter to granular OS virtualization (and
> > something which caused Windows to fall behind Linux in the container
> > space). Splitting out the acceleration problem and leaving the rest to
> > user space currently looks fine because the ideas Al and Andy are
> > kicking around don't cause problems with OS virtualization.
> >
>
> I'm interested in understanding this problem (if only for my own
> curiosity) but I'm not confident I understand what you're saying
> correctly.
>
> Can I try to explain back / ask questions and see what I have right?
>
> I think you are saying that if an application relies on a system
> service (= any other process that runs on the system bus) then to
> virtualize that app by itself in a dedicated container, the system bus
> and the system service need to also be in the container. So the
> container ends up with a bunch of stuff in it beyond only the
> application. Right / wrong / confused?

Right. Consider named (the DNS daemon) as the Unix equivalent. In most application
containers, it's provided from outside. However, any container that
wants it provided inside simply intercepts and overrides the well known
socket. We can do this in UNIX because there's no global bus handling
these queries, it's simply a matter of knowing where the socket is. In
Windows you can't pick and choose the services you consume from outside.
Either you pull the whole OLE namespace into the container, and thus
have to provide everything from within, or try to run with none of it
provided by the container. It's this everything or nothing that's the
problem. Container virtualisation is about being granular and a system
bus (or global OLE namespace) is about being monolithic.

> I also think you're saying that userspace dbus has the same issue
> (this isn't a userspace vs. kernel thing per se), the objection to
> kdbus is that it makes this issue more solidified / harder to fix?

Yes, it does. We have problems containerising Linux desktops as well.
However, most of our server stuff is daemon and socket based, so that
containerises nicely. In Windows, OLE has been absorbed even into the
server model which is why they have a bigger problem.

> Do you have ideas on how to go about fixing it, whether in userspace
> or kernel dbus?

Well, I've always suspected the solution would be for dbus to have a
hierarchical namespace of its own, with the default policy being to pass
messages to the parent namespace. This would allow a container to determine
which services were serviced outside and which inside the container (if
you attach as a provider to the system bus in the container, that
attachment supersedes the parent).

However, this doesn't solve the security problem: just because a
container hasn't attached an interior provider doesn't mean it should be
allowed complete access to all services provided from outside. This is
the nasty problem because it involves some type of filter on busses
which pass through containers.
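
A minimal sketch of the lookup proposed above: a local provider supersedes the parent, and otherwise the request is passed up only if the namespace's filter allows it (the security concern in the second paragraph). Everything here is hypothetical, including struct bus_ns, resolve() and the org.example.* service names; it is not dbus or kdbus API, and a real filter would be policy-driven rather than a static allowlist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bus namespace: local providers, a parent link, and a
 * filter deciding which names may be forwarded to the parent. */
struct bus_ns {
	const char *name;              /* for diagnostics only */
	const struct bus_ns *parent;   /* NULL for the host (root) bus */
	const char *const *providers;  /* names served here, NULL-terminated */
	const char *const *allow_up;   /* names allowed upward, NULL-terminated */
};

static bool in_list(const char *const *list, const char *s)
{
	for (; list && *list; list++)
		if (strcmp(*list, s) == 0)
			return true;
	return false;
}

/* Returns the namespace that will service @svc, or NULL if none may.
 * An interior provider always wins; otherwise the lookup climbs one
 * level at a time, stopping at the first filter that refuses it. */
static const struct bus_ns *resolve(const struct bus_ns *ns, const char *svc)
{
	while (ns) {
		if (in_list(ns->providers, svc))
			return ns;
		if (!in_list(ns->allow_up, svc))
			return NULL;
		ns = ns->parent;
	}
	return NULL;
}
```

The default here is deny, not pass-through: a container that has not attached an interior provider still only reaches the outside services its filter names.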

James

2015-04-20 20:04:02

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 20, 2015 at 5:43 AM, Michal Hocko <[email protected]> wrote:
> On Fri 17-04-15 11:54:42, Andy Lutomirski wrote:
>> On Fri, Apr 17, 2015 at 2:19 AM, Michal Hocko <[email protected]> wrote:
>> > On Thu 16-04-15 10:04:17, Andy Lutomirski wrote:
>> >> On Thu, Apr 16, 2015 at 8:01 AM, David Herrmann <[email protected]> wrote:
>> >> > Hi
>> >> >
>> >> > On Thu, Apr 16, 2015 at 4:34 PM, Andy Lutomirski <[email protected]> wrote:
>> >> >> Whose memcg does the pool use?
>> >> >
>> >> > The pool-owner's (i.e., the receiver's).
>> >> >
>> >> >> If it's the receiver's, and if the
>> >> >> receiver can configure a memcg, then it seems that even a single
>> >> >> receiver could probably cause the sender to block for an unlimited
>> >> >> amount of time.
>> >> >
>> >> > How? Which of those calls can block? I don't see how that can happen.
>> >>
>> >> I admit I don't fully understand memcg, but vfs_iter_write is
>> >> presumably going to need to get write access to the target pool page,
>> >> and that, in turn, will need that page to exist in memory and to be
>> >> writable, which may need to page it in and/or allocate a page. If
>> >> that uses the receiver's memcg (as it should), then the receiver can
>> >> make it block. Even if it doesn't use the receiver's memcg, it can
>> >> trigger direct reclaim, I think.
>> >
>> > Yes, memcg direct reclaim might trigger but we are no longer waiting for
>> > the OOM victim from non page fault paths so the time is bounded. It
>> > still might take quite some time, though, depending on the amount of work
>> > done in the direct reclaim.
>>
>> Is that still true if OOM notifiers are involved? I've lost track of
>> what changed there.
>
> memcg OOM is not triggered from get_user_pages. See 519e52473ebe (mm:
> memcg: enable memcg OOM killer only for user faults)
>
>> In any event, I'm not entirely convinced that having a broadcast send
>> cause, say, PID 1 to block until an unbounded number of pages in a
>> potentially unbounded number of memcgs are reclaimed is a good idea.
>
> This deserves a clarification I guess. It is the memcg of the current
> task which gets charged during the page fault normally. So if PID1 tries
> to fault the memory in it will be its (most probably root) memcg which
> gets charged. If the memory was already charged to a different task's
> memcg and then it got swapped out, though, the PID1 would indeed wait
> for the reclaim in the target memcg to swap the page back in.
>
> In either case this sounds like a potential problem, because tasks
> could hide their memory charges from the limit or PID1 context could
> be blocked. But maybe I just misunderstood, and uncharged memory
> cannot be used for the buffer.
>

Hmm. One of the explicit design goals of kdbus is for sandboxing,
i.e. creating a restricted view ("endpoint") and letting sandboxed
things talk to non-sandboxed things outside through that restricted
view.

Given that, the ability for a broadcast receiver to cause a sender
(PID 1?) to allocate root-memcg pages seems like it could be a
problem.
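
For comparison with the AF_UNIX point made earlier in the thread (that sends there are genuinely non-blocking), a minimal Linux userspace sketch: with MSG_DONTWAIT, a sender writing to a datagram socketpair(2) whose peer never reads gets EAGAIN once the kernel stops accepting data, instead of sleeping on the receiver's behalf. The helper name is hypothetical, and the exact number of datagrams queued depends on the kernel version and socket buffer settings.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sends 256-byte datagrams from one end of a socketpair while the
 * other end never reads. Returns how many were queued before the
 * kernel refused with EAGAIN, or -1 on any other error. The sender
 * never sleeps, because every send uses MSG_DONTWAIT. */
static long fill_until_eagain(void)
{
	int sv[2];
	char msg[256] = { 0 };
	long sent = 0;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
		return -1;

	for (;;) {
		ssize_t n = send(sv[0], msg, sizeof(msg), MSG_DONTWAIT);

		if (n < 0) {
			int err = errno;

			close(sv[0]);
			close(sv[1]);
			return (err == EAGAIN || err == EWOULDBLOCK) ? sent : -1;
		}
		sent++;
	}
}
```

Under the same conditions, a blocking send would sleep instead, but the choice stays with the sender and the cost is bounded by the sender's own socket buffer, not by reclaim in some other task's memcg.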

--Andy

2015-04-20 20:26:31

by George Spelvin

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

> It's used everywhere, on servers,
> embedded systems, desktops, you name it. All languages have bindings
> for it, and it's the underpinning of a modern Linux stack.

Since when? D-bus is some GUI dependency. On my console-only servers, it's
not needed, and not installed:

# dpkg-query -s libdbus-1-3 dbus
dpkg-query: package 'libdbus-1-3' is not installed and no information is available
dpkg-query: package 'dbus' is not installed and no information is available
# dpkg-query -l \*dbus\*
dpkg-query: no packages found matching *dbus*


It's also not needed on a basic GUI system. Firefox complains about saving
preferences if it's not running, but runs just fine:

$ pgrep dbus
$ ps 6570 19644 25779 29487 29492
PID TTY STAT TIME COMMAND
6570 ? Sl 1:48 iceweasel
19644 ? Sl 0:29 /usr/bin/vlc -I qt4
25779 ? Sl 0:16 rhythmbox
29487 ? Sl 0:00 /usr/bin/gnumeric
29492 ? Sl 0:01 /usr/bin/gimp
$

Richard Weinberger wrote:
> kdbus will be a major hard-dependency for every non-trivial userland.
> Like cgroups...

and

> We're all forced to use cgroups, systemd, udev unless we want to have busybox
> as userland. That's a fact.

My daily desktop also has
# CONFIG_CGROUPS is not set

And no systemd. Udev actually does something useful, so I have it on my
desktop, but I have machines with a static /dev instead.

2015-04-20 20:43:26

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

David,

On Thu, Apr 16, 2015 at 8:20 PM, David Herrmann <[email protected]> wrote:
> Hi
>
> On Wed, Apr 15, 2015 at 8:18 PM, Linus Torvalds
> <[email protected]> wrote:
>> On Wed, Apr 15, 2015 at 11:11 AM, Greg Kroah-Hartman
>> <[email protected]> wrote:
>>> On Wed, Apr 15, 2015 at 01:33:27PM -0400, Steven Rostedt wrote:
>>>>
>>>> I'll argue that you can't fix the latter one. One thing that I've observed over
>>>> the years of having faster computers is, as soon as you make it faster, people
>>>> will write slower software.
>>>>
>>>> Currently the issue is that we have thousands of dbus queries, you make dbus
>>>> 10x faster, I guarantee that people will write software with 10 thousand dbus
>>>> queries and we are no better off than we are today.
>>>
>>> Then they get to buy a faster machine :)
>>
>> Is there actually a performance issue?
>>
>> I've seen this claimed, but I have never seen any actual numbers. What
>> speeds up? By how much? is it actually measurable?
>
> For us, boot speed-up has not been the primary concern. The boot-time
> speedup that kdbus provides is unlikely to be significant on a generic
> linux distro today, given that nowadays the slow parts during bootup
> are firmware and hw initialization. The number of dbus messages sent
> during bootup of a general purpose Linux distro is relatively small.
> OTOH, during start-up of desktop environments, the dbus traffic is
> substantial, but until the porting of Qt and glib to kdbus has been
> completed and merged the real-world effect of this is minimal.
>
> Our interest in improved raw performance is mainly motivated by making
> dbus a viable protocol in situations where its semantics are
> appropriate. But for performance reasons one had to use custom
> protocols instead so far. Greg gave some examples in the cover letter,
> multi-media being the most obvious area where the ten-fold decrease in
> latency and the ability to efficiently copy large chunks of memory
> from one process to another is relevant.

In which situation on a common Linux system is the current dbus too slow today?
I've never seen an issue like "Oh my system is slow because dbus is
eating too much CPU cycles".

dbus may have issues which are worth fixing. But moving dbus more or
less in the kernel seems overkill.
So, what exactly are these issues and why can't we add new IPC
primitives to Linux which allow a decent userland dbus?
To me kdbus seems much like an ad-hoc solution which is very dbus centric.
IIRC Alan asked the same question.

--
Thanks,
//richard

2015-04-20 20:56:44

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 20, 2015 at 10:43:19PM +0200, Richard Weinberger wrote:
> David,
>
> On Thu, Apr 16, 2015 at 8:20 PM, David Herrmann <[email protected]> wrote:
> > Hi
> >
> > On Wed, Apr 15, 2015 at 8:18 PM, Linus Torvalds
> > <[email protected]> wrote:
> >> On Wed, Apr 15, 2015 at 11:11 AM, Greg Kroah-Hartman
> >> <[email protected]> wrote:
> >>> On Wed, Apr 15, 2015 at 01:33:27PM -0400, Steven Rostedt wrote:
> >>>>
> >>>> I'll argue that you can't fix the latter one. One thing that I've observed over
> >>>> the years of having faster computers is, as soon as you make it faster, people
> >>>> will write slower software.
> >>>>
> >>>> Currently the issue is that we have thousands of dbus queries, you make dbus
> >>>> 10x faster, I guarantee that people will write software with 10 thousand dbus
> >>>> queries and we are no better off than we are today.
> >>>
> >>> Then they get to buy a faster machine :)
> >>
> >> Is there actually a performance issue?
> >>
> >> I've seen this claimed, but I have never seen any actual numbers. What
> >> speeds up? By how much? is it actually measurable?
> >
> > For us, boot speed-up has not been the primary concern. The boot-time
> > speedup that kdbus provides is unlikely to be significant on a generic
> > linux distro today, given that nowadays the slow parts during bootup
> > are firmware and hw initialization. The number of dbus messages sent
> > during bootup of a general purpose Linux distro is relatively small.
> > OTOH, during start-up of desktop environments, the dbus traffic is
> > substantial, but until the porting of Qt and glib to kdbus has been
> > completed and merged the real-world effect of this is minimal.
> >
> > Our interest in improved raw performance is mainly motivated by making
> > dbus a viable protocol in situations where its semantics are
> > appropriate. But for performance reasons one had to use custom
> > protocols instead so far. Greg gave some examples in the cover letter,
> > multi-media being the most obvious area where the ten-fold decrease in
> > latency and the ability to efficiently copy large chunks of memory
> > from one process to another is relevant.
>
> In which situation on a common Linux system is the current dbus too slow today?
> I've never seen an issue like "Oh my system is slow because dbus is
> eating too much CPU cycles".

See the original email which explained all of the things we can not do
with D-Bus, some of which are due to speed, that can now be done with the
kdbus code.

> dbus may have issues which are worth fixing. But moving dbus more or
> less in the kernel seems overkill.

Why do you think so? How is this code "overkill"?

> So, what exactly are these issues and why can't we add new IPC
> primitives to Linux which allow a decent userland dbus?

That's exactly what this patchset does.

> To me kdbus seems much like an ad-hoc solution which is very dbus centric.

Yes it is, but the "dbus centric" thing is a valid model that is quite
useful and in use by a lot of programs as it solves a real problem.

> IIRC Alan asked the same question.

Yes, you can build everything off of tiny socket calls, but when you do
that, you end up with the D-Bus userspace implementation we have today,
with the issues that it has. By moving portions of that model into the
kernel, as is done here, it solves a number of these issues, and allows
for a lot more flexibility and things to be done that are impossible
with the current model of trying to build on top of tiny ipc functions.

The existing code is much stripped down from what you think of as a
D-Bus daemon today, only the exact needed pieces are implemented here.

Do you see anything wrong with the code as is submitted (aside from the
issues that Al has pointed out that are being resolved already?)

thanks,

greg k-h

2015-04-20 21:16:55

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Greg,

Am 20.04.2015 um 22:56 schrieb Greg Kroah-Hartman:
>> In which situation on a common Linux system is the current dbus too slow today?
>> I've never seen an issue like "Oh my system is slow because dbus is
>> eating too much CPU cycles".
>
> See the original email which explained all of the things we can not do
> with D-Bus, some of which are due to speed, that can now be done with the
> kdbus code.

okay, let's do it together.

1. Performance
You write:
"DBus is not used for performance sensitive applications because DBus is slow.
We want to make it fast so we can finally use it for low-latency,
high-throughput applications."

Which applications exactly?
This reads to me like a solution for a non-existing problem.

2. Security
I don't think that you need a 13k piece of code in the kernel to solve
that issue.

3. Semantics for apps with heavy data payloads

Again, sounds like a solution for a non-existing problem.

4. "Being in the kernel closes a lot of races which can't be fixed with
the current userspace solutions."

You really need an in-kernel dbus with 13k to solve that?

5. Eavesdropping on the kernel level

Same here.

6. dbus-daemon is not available during early-boot or shutdown.

Why do you need dbus in that stage?


Can you now please start answering questions instead of pointing to random mails?

>> dbus may have issues which are worth fixing. But moving dbus more or
>> less in the kernel seems overkill.
>
> Why do you think so? How is this code "overkill"?

Did you try to review it? ;-)
No, really, it is by no means an IPC primitive; it is a super high-level
in-kernel IPC and not trivial.

>> So, what exactly are these issues and why can't we add new IPC
>> primitives to Linux which allow a decent userland dbus?
>
> That's exactly what this patchset does.

No, it moves dbus into the kernel.

>> To me kdbus seems much like an ad-hoc solution which is very dbus centric.
>
> Yes it is, but the "dbus centric" thing is a valid model that is quite
> useful and in use by a lot of programs as it solves a real problem.

What programs?

>> IIRC Alan asked the same question.
>
> Yes, you can build everything off of tiny socket calls, but when you do
> that, you end up with the D-Bus userspace implementation we have today,
> with the issues that it has. By moving portions of that model into the
> kernel, as is done here, it solves a number of these issues, and allows
> for a lot more flexibility and things to be done that are impossible
> with the current model of trying to build on top of tiny ipc functions.
>
> The existing code is much stripped down from what you think of as a
> D-Bus daemon today, only the exact needed pieces are implemented here.
>
> Do you see anything wrong with the code as is submitted (aside from the
> issues that Al has pointed out that are being resolved already?)

The code is fine, the concepts are fishy as pointed out many times in this thread.
My point is that we should try hard to fix dbus instead of moving it into the kernel.

Thanks,
//richard

2015-04-20 21:46:58

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 20, 2015 at 11:16:49PM +0200, Richard Weinberger wrote:
> Greg,
>
> Am 20.04.2015 um 22:56 schrieb Greg Kroah-Hartman:
> >> In which situation on a common Linux system is the current dbus too slow today?
> >> I've never seen an issue like "Oh my system is slow because dbus is
> >> eating too much CPU cycles".
> >
> > See the original email which explained all of the things we can not do
> > with D-Bus, some of which are due to speed, that can now be done with the
> > kdbus code.
>
> okay, let's do it together.
>
> 1. Performance
> You write:
> "DBus is not used for performance sensitive applications because DBus is slow.
> We want to make it fast so we can finally use it for low-latency,
> high-throughput applications."
>
> Which applications exactly?
> This reads to me like a solution for a non-existing problem.

Anything that uses UDS for large buffers today can switch to using kdbus
for its data stream as it is faster. I know the Pulse Audio people
have discussed this, and there are other people as well (Enlightenment
library developers, glib, wayland, etc.) Without the code being in the
kernel, no project is going to spend the time to convert their codebase
to a feature that isn't accepted.

> 2. Security
> I don't think that you need a 13k piece of code in the kernel to solve
> that issue.

Wait, what? How can you blow by that requirement by just saying that
this proposal isn't acceptable? You can't do that, sorry. Please show
how what we have proposed does not provide the security requirements as
is documented.

> 3. Semantics for apps with heavy data payloads
>
> Again, sounds like a solution for a non-existing problem.

No, media apps need to share their data somehow, and kdbus provides a
way to do that. GNOME portals are one such proposed codebase that is
looking to use kdbus for this, and again, so is Pulse Audio and the
other groups listed above.

> 4. "Being in the kernel closes a lot of races which can't be fixed with
> the current userspace solutions."
>
> You really need a in-kernel dbus with 13k to solve that?

Do you know of a smaller amount of code to solve this problem? If so,
wonderful, please show us, but we aren't playing code golf here. We are
proposing something that is well documented and easy to maintain, while
still being fast and correct. If you think this can be done in a
smaller amount of code, please show us where we are doing needless
things in the patches.

> 5. Eavesdropping on the kernel level
>
> Same here.

Same here for what? Again, please point out the wasted code we have,
and we will be glad to remove it. Or change it to be smaller. Odds are
we have done something wrong and it can be reduced, but I don't see it.
Hints are greatly appreciated.

> 6. dbus-daemon is not available during early-boot or shutdown.
>
> Why do you need dbus in that stage?

You need a way to transfer system state from the initrd to the root
disk, and using D-Bus is a great way to do that. systemd goes through a
lot of special gyrations to achieve this that can be greatly simplified
by using kdbus. If the kernel provides the service, you can also ensure
that you have a working D-Bus early on and very late, so that
applications / daemons can properly rely on it for startup/shutdown,
which can not be done today.

> Can you now please start answering questions instead of pointing to random mails?

I've done nothing _but_ answer questions in this email-thread-of-doom.

> >> dbus my have issues which are worth to fix. But moving dbus more or
> >> less in the kernel
> >> seems overkill.
> >
> > Why do you think so? How is this code "overkill"?
>
> Did you try to review it? ;-)
> No really, it is by means no IPC primitive it is a super high level
> in-kernel IPC and not trivial.

Yes, it's not "trivial" as it solves a "non-trivial" problem. I agree.
But to claim that somehow a much smaller solution can be found without
pointing out what we did wrong with our proposal is just rude.

> >> So, what exactly are these issues and why can't we add new IPC
> >> primitives to Linux which
> >> allow a decent userland dbus?
> >
> > That's exactly what this patchset does.
>
> No, it moves dbus into the kernel.

No, it moves part of the dbus model into the kernel. There are still
other things that are outside of the kernel, as they do not belong
in the kernel. You need a library in userspace to properly implement
the D-Bus protocol and interaction that userspace programs are needing.
If we had implemented the "whole thing" in the kernel, then that library
would not be needed.

> >> To me kdbus seems much like an ad-hoc solution which is very dbus centric.
> >
> > Yes it is, but the "dbus centric" thing is a valid model that is quite
> > useful and in use by a lot of programs as it solves a real problem.
>
> What programs?

Really? Come on, look at all of the user stack that relies on D-Bus
today (GNOME, KDE), and all of the programs that take advantage of the
libraries that provide D-Bus bindings (Qt, glib, Go, python, perl, etc.)

Yes, it's possible to run a Linux without using D-Bus, but almost no
distro these days does that anymore. Just look at what is on your
desktop and server today that all major Linux companies support.

As was pointed out, even the tiny IoT devices running Linux are now
using D-Bus, it's everywhere :)

> >> IIRC Alan asked the same question.
> >
> > Yes, you can build everything off of tiny socket calls, but when you do
> > that, you end up with the D-Bus userspace implementation we have today,
> > with the issues that it has. By moving portions of that model into the
> > kernel, as is done here, it solves a number of these issues, and allows
> > for a lot more flexibility and things to be done that are impossible
> > with the current model of trying to build on top of tiny ipc functions.
> >
> > The existing code is much stripped down from what you think of as a
> > D-Bus daemon today, only the exact needed pieces are implemented here.
> >
> > Do you see anything wrong with the code as is submitted (aside from the
> > issues that Al has pointed out that are being resolved already?)
>
> The code is fine, the concepts are fishy as pointed out many times in
> this thread. My point is that we should try hard to fix dbus instead
> of moving it into the kernel.

What needs to be "fixed" in D-Bus that this patchset provides a solution
for?

And no, the concepts are not "fishy" at all. They solve a real problem,
and need, that programs today rely on. The concepts have been used for
many decades, and this specific implementation has lasted for over a
decade as it seems to be the best model that has evolved over time. And
there is no known proposed model out there to succeed it that I know of,
do you know of one?

Because of that, and the thread where the proposed security problems
were agreed not to be a security problem, I don't see a reason anymore
why this code should not be merged.

With the exception of Al's code review, which is being addressed. But
that's a minor thing, not a major design flaw at all.

thanks,

greg k-h

2015-04-20 22:06:37

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 20, 2015 at 2:46 PM, Greg Kroah-Hartman wrote:
> On Mon, Apr 20, 2015 at 11:16:49PM +0200, Richard Weinberger wrote:
>> Greg,
>>
>> Am 20.04.2015 um 22:56 schrieb Greg Kroah-Hartman:
>> >> In which situation on a common Linux system is the current dbus too slow today?
>> >> I've never seen a issue like "Oh my system is slow because dbus is
>> >> eating too much CPU cycles".
>> >
>> > See the original email which explained all of the things we can not do
>> > with D-Bus, some of which are due to speed, that can now be done with the
>> > kdbus code.
>>
>> okay, let's do it together.
>>
>> 1. Performance
>> You write:
>> "DBus is not used for performance sensitive applications because DBus is slow.
>> We want to make it fast so we can finally use it for low-latency,
>> high-throughput applications."
>>
>> Which applications exactly?
>> This reads to me like a solution for a non-existing problem.
>
> Anything that uses UDS for large buffers today can switch to using kdbus
> for it's data stream as it is faster. I know the Pulse Audio people
> have discussed this, and there are other people as well (Enlightenment
> library developers, glib, wayland, etc.) Without the code being in the
> kernel, no project is going to spend the time to convert their codebase
> to a feature that isn't accepted.

Anything that uses UDS for large buffers today can switch to using
memfd over SCM_RIGHTS right now. If SCM_RIGHTS is too slow, then we
can fix it along the lines that Al proposed.
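For reference, a minimal user-space sketch of the memfd-over-SCM_RIGHTS pattern mentioned above, assuming Linux 3.17+ (memfd_create and file sealing) and a glibc that exposes both. The helper names (make_sealed_memfd, send_fd, recv_fd) are hypothetical and chosen for illustration; this is not code from either side of the thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Put `len` bytes into a fresh memfd and seal it, so the receiver can trust
 * that the contents will not change underneath it. Returns the fd or -1. */
int make_sealed_memfd(const void *data, size_t len)
{
    int fd = memfd_create("payload", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len ||
        fcntl(fd, F_ADD_SEALS,
              F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Pass `fd` to the peer of the AF_UNIX socket `sock` as SCM_RIGHTS
 * ancillary data, alongside a single dummy byte of regular data. */
int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char raw[CMSG_SPACE(sizeof(int))]; } ctl;
    memset(&ctl, 0, sizeof(ctl));
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctl.raw,
                          .msg_controllen = sizeof(ctl.raw) };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent with send_fd(). Returns the new fd or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char raw[CMSG_SPACE(sizeof(int))]; } ctl;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctl.raw,
                          .msg_controllen = sizeof(ctl.raw) };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (!c || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}
```

Only the descriptor crosses the socket; the payload stays in the shared memfd, and the seals let the receiver map or read it without defensive copying. The sender keeps its own copy of the fd and should close it when done.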

>
>> 2. Security
>> I don't think that you need a 13k piece of code in the kernel to solve
>> that issue.
>
> Wait, what? How can you blow by that requirement by just saying that
> this proposal isn't acceptable? You can't do that, sorry. Please show
> how what we have proposed does not provide the security requirements as
> is documented.

This is backwards. The way this discussion is going is:

kdbus promoters: here's some code

someone else: the code does such and such in a way that's wrong for xyz reason

kdbus promoters: show us the implementation bug in such and such

This is not how this discussion should work. Richard didn't say there
was a bug in your code; he said that your code was too large.

>
>> 3. Semantics for apps with heavy data payloads
>>
>> Again, sounds like a solution for a non-existing problem.
>
> No, media apps need to share their data somehow, and kdbus provides a
> way to do that. GNOME portals are one such proposed codebase that is
> looking to use kdbus for this, and again, so is Pulse Audio and the
> other groups listed above.

AFAICT you're talking about passing data into and out of a sandbox for
processing or UI purposes. We have two excellent ways to do that
right now: memfd and splice, depending on exactly what you're doing.
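A minimal sketch of the splice(2) half of that suggestion, assuming Linux and glibc: data sitting in a memfd is moved into a pipe without a round trip through a user-space buffer. splice_to_pipe() is a hypothetical helper name; a real consumer would typically splice onward from the pipe into a socket or another file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Move up to `len` bytes, starting at offset 0 of `src_fd`, into the write
 * end of a pipe. Returns the number of bytes moved, or -1 on error.
 * Note: splice() blocks if the pipe (64 KiB by default) fills up before a
 * reader drains it. */
ssize_t splice_to_pipe(int src_fd, int pipe_wr, size_t len)
{
    loff_t off = 0;      /* explicit offset; src_fd's file position is untouched */
    size_t total = 0;
    while (total < len) {
        ssize_t n = splice(src_fd, &off, pipe_wr, NULL, len - total, 0);
        if (n < 0)
            return -1;
        if (n == 0)      /* end of the source file */
            break;
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```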

>
>> 4. "Being in the kernel closes a lot of races which can't be fixed with
>> the current userspace solutions."
>>
>> You really need a in-kernel dbus with 13k to solve that?
>
> Do you know of a smaller amount of code to solve this problem? If so,
> wonderful, please show us, but we aren't playing code golf here. We are
> proposing something that is well documented and easy to maintain, while
> still being fast and correct. If it you think this can be done in a
> smaller amount of code, please show us where we are doing needless
> things in the patches.

I do. Implement something like my old SCM_IDENTITY proposal, which is
kind of like kdbus metadata, opt-in, over UNIX sockets. Except that I
never proposed most of the absurd metadata items that kdbus is
proposing, and I also suggested doing it over plain old UNIX sockets.

If that ends up at more than 500 LOC, then something's wrong. Also,
everyone gets the benefit, not just kdbus.
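SCM_IDENTITY was only ever a proposal, so there is no such API to show. The closest mechanism that already exists is opt-in per-message credentials on UNIX sockets: the receiver sets SO_PASSCRED and the kernel attaches an SCM_CREDENTIALS control message (pid, uid, gid) to each message it delivers. A minimal sketch, assuming Linux and glibc; enable_passcred() and recv_with_creds() are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Opt in: ask the kernel to attach the sender's credentials to every
 * message received on `sock`. */
int enable_passcred(int sock)
{
    int on = 1;
    return setsockopt(sock, SOL_SOCKET, SO_PASSCRED, &on, sizeof(on));
}

/* Receive one message into `buf` and copy the SCM_CREDENTIALS control
 * message into *cred. Returns the number of data bytes received, or -1 on
 * error or if no credentials were attached. */
ssize_t recv_with_creds(int sock, void *buf, size_t len, struct ucred *cred)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    union { struct cmsghdr align; char raw[CMSG_SPACE(sizeof(struct ucred))]; } ctl;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctl.raw,
                          .msg_controllen = sizeof(ctl.raw) };
    ssize_t n = recvmsg(sock, &msg, 0);
    if (n < 0)
        return -1;
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_CREDENTIALS) {
            memcpy(cred, CMSG_DATA(c), sizeof(*cred));
            return n;
        }
    }
    return -1;
}
```

The credentials here are captured when each message is sent, but only pid/uid/gid; anything richer would be the kind of extension being debated above.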

[snip]

> Because of that, and the thread where the proposed security problems
> were agreed not to be a security problem, I don't see a reason anymore
> why this code should not be merged.
>
> With the exception of Al's code review, which is being addressed. But
> that's a minor thing, not a major design flaw at all.

My NACK stands. A security problem was fixed, but the metadata system
has multiple problems, each of which is independently sufficient to
earn my nack.

On top of that, the policy mechanism is iffy and is probably worthy of my nack.

On top of that, I think that someone into resource management needs to
seriously consider whether having a broadcast send do get_user_pages
or the equivalent on pages supplied by untrusted recipients (plural!)
is a good idea.

On top of *that*, I have serious doubts that the whole design makes
sense. That doesn't earn my nack specifically, but it sure seems like
a lot of people share my doubts that the design makes sense, and I
don't hear a whole lot of people saying that they think the design is
a good thing to put in the kernel.

Also, the current thread-of-lesser-doom on the systemd list greatly
decreases my confidence that the issues that have earned my nack will
get resolved. The kdbus designers seem to be unwilling to accept that
code should be merged into the kernel merely because I (me,
personally) don't see a straightforward security exploit that the code
enables.

--Andy

2015-04-21 07:58:04

by Johannes Stezenbach

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 20, 2015 at 03:06:09PM -0700, Andy Lutomirski wrote:
>
> I do. Implement something like my old SCM_IDENTITY proposal, which is
> kind of like kdbus metadata, opt-in, over UNIX sockets. Except that I
> never proposed most of the absurd metadata items that kdbus is
> proposing, and I also suggesting doing it over plain old UNIX sockets.

I'd like to point out that the DBus spec describes a fairly
standard per-connection authentication mechanism.
It seems that userspace DBus doesn't need per-message metadata.
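For comparison, per-connection credentials can be sketched on Linux with SO_PEERCRED: the kernel records the peer's pid/uid/gid once, when the connection is established with connect() or socketpair(), and a server can check them at authentication time. This is a minimal sketch; get_peer_creds() and peer_is_uid() are hypothetical helper names, and it does not implement the D-Bus authentication handshake itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill *cred with the credentials the kernel recorded for the peer when the
 * connection was set up. Returns 0 on success, -1 on failure. */
int get_peer_creds(int sock, struct ucred *cred)
{
    socklen_t len = sizeof(*cred);
    if (getsockopt(sock, SOL_SOCKET, SO_PEERCRED, cred, &len) < 0)
        return -1;
    return len == sizeof(*cred) ? 0 : -1;
}

/* Example per-connection policy: accept the peer only if it runs as `uid`. */
int peer_is_uid(int sock, uid_t uid)
{
    struct ucred cred;
    return get_peer_creds(sock, &cred) == 0 && cred.uid == uid;
}
```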


Johannes

2015-04-21 08:09:43

by Daniel Mack

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi,

On 04/20/2015 08:01 PM, James Bottomley wrote:
> On Fri, 2015-04-17 at 16:27 -0400, Havoc Pennington wrote:

>> Do you have ideas on how to go about fixing it, whether in userspace
>> or kernel dbus?
>
> Well, I've always suspected the solution would be for dbus to have a
> hierarchical namespace of its own with the default policy be pass
> message to parent namespace. This would allow a container to determine
> which services were serviced outside and which inside the container (if
> you attach as a provider to the system bus in the container, that
> attachment supersedes the parent).
>
> However, this doesn't solve the security problem: just because a
> container hasn't attached an interior provider doesn't mean it should be
> allowed complete access to all services provided from outside. This is
> the nasty problem because it involves some type of filter on busses
> which pass through containers.

Fair point, we've been thinking about that as well. What we implemented
for that is something we call 'custom endpoints', which is described in
kdbus.endpoint(7).

In short, an endpoint is an entry point to the bus. Each bus provides a
default endpoint node that enforces the bus-wide policy rules that
define which well-known names a peer may own, see, or talk to. Custom
endpoints can be added to carry additional policy rules for peers
connected through it, and redirecting a task or container to the custom
endpoint instead of the default one is as easy as bind-mounting the
node. systemd units have actually supported this for a while, which
is how we tested this feature. This implementation doesn't even add much
code to kdbus, because we do have the policy code around anyway, so
that's just a matter of which policy database to look at during runtime.

That said, it would actually even be easy to implement a way to allow
overriding names on custom endpoints too, so that services inside a
container can replace names that already exist on the bus. It's just that
so far, we haven't yet seen a use case for this.


Thanks,
Daniel

2015-04-21 08:18:35

by Richard Cochran

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 20, 2015 at 11:46:51PM +0200, Greg Kroah-Hartman wrote:
> As was pointed out, even the tiny IoT devices running Linux are now
> using D-Bus, it's everywhere :)

I would like to take issue with that assertion. Some people are
putting dbus and systemd into embedded devices, but not into "tiny"
ones but rather "beefy" ones. Furthermore, just because some people
use systemd in embedded doesn't mean that that is the optimal solution
or that everyone does it that way.

Thanks,
Richard

2015-04-21 09:08:10

by Johannes Stezenbach

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 20, 2015 at 11:16:49PM +0200, Richard Weinberger wrote:
> Am 20.04.2015 um 22:56 schrieb Greg Kroah-Hartman:
> >> In which situation on a common Linux system is the current dbus too slow today?
> >> I've never seen a issue like "Oh my system is slow because dbus is
> >> eating too much CPU cycles".
> >
> > See the original email which explained all of the things we can not do
> > with D-Bus, some of which are due to speed, that can now be done with the
> > kdbus code.
>
> okay, let's do it together.
>
> 1. Performance
> You write:
> "DBus is not used for performance sensitive applications because DBus is slow.
> We want to make it fast so we can finally use it for low-latency,
> high-throughput applications."
>
> Which applications exactly?
> This reads to me like a solution for a non-existing problem.

This is the line of thinking I was aiming at during a previous
review cycle. Basically, as Havoc outlined in his mail explaining
the design decisions, he traded speed for simplicity and chose
the slowest possible messaging topology (everything goes through
a central broker). That makes sense because, to quote from his mail:

http://article.gmane.org/gmane.linux.kernel/1931720
> Message passing or IPC isn't really the most important part of dbus.
> Process lifecycle tracking and discovery are more important.

I asked for performance numbers and got this reply from David Herrmann:
http://article.gmane.org/gmane.linux.kernel.api/7636

My line of thinking had been to amend DBus with optional direct
client/server communication for the performance critical
cases, since I believe those cases are RPC calls and not other
types of messaging (see also the "Performance" section in the
cover letter of this thread). (My other line of thinking had
been: if you need performance, don't use DBus e.g. in the
case of the tiny ARM systems sending hundreds of thousands of
messages during boot, quoted by Greg.)

Now, after reading Havoc's description of the DBus design
trade-offs, I have doubts that modifying the DBus architecture
in userspace to speed it up is a good thing. OTOH I am
still convinced kdbus is the wrong solution, for the sole
reason called gut feeling. There is nothing about the kdbus API
that makes me go "oh nice, elegant, I want to use it". And
the kdbus authors seem to agree and tell you you should only ever
use it via a library like sd-bus and try to justify it by
comparing it to ALSA :-(

The ideas discussed among Alan, Andy and Al, even if ad-hoc
and as yet immature, immediately seemed to be much more
appealing to me. I hope something real comes out of that.


Johannes

2015-04-21 09:36:11

by Alan Cox

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

> On top of that, I think that someone into resource management needs to
> seriously consider whether having a broadcast send do get_user_pages
> or the equivalent on pages supplied by untrusted recipients (plural!)
> is a good idea.

Oh but it's so much fun if you pass pages belonging to a device driver, or
pass bits of a GEM object thereby keeping entire graphics textures
referenced 8)

The get_user_pages stuff looks like a simple case of premature
optimisation (or I suspect pessimisation). If bulk data goes via
memfd/splice/socket point to point as it should (or via shared memory
objects) then you don't need all the page pinning and that begins to
simplify the code a lot. Copies of small messages are cheap if not free
on most processors. It's a memcpy into a simple refcounted buffer which
gets deleted when the refcount hits zero.
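A user-space sketch of the small-message scheme described above, assuming C11 atomics: the payload is copied once into a refcounted buffer, each recipient of a multicast holds a reference, and the last put frees it. In the kernel this would more likely use refcount_t or struct kref; the msgbuf_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* One copy of a small message, shared by every recipient. */
struct msgbuf {
    atomic_int refcount;
    size_t len;
    unsigned char data[];     /* flexible array member holding the payload */
};

/* Copy the sender's payload once; the caller holds the initial reference. */
struct msgbuf *msgbuf_new(const void *payload, size_t len)
{
    struct msgbuf *m = malloc(sizeof(*m) + len);
    if (!m)
        return NULL;
    atomic_init(&m->refcount, 1);
    m->len = len;
    memcpy(m->data, payload, len);
    return m;
}

/* Each queued recipient takes a reference... */
struct msgbuf *msgbuf_get(struct msgbuf *m)
{
    atomic_fetch_add(&m->refcount, 1);
    return m;
}

/* ...and drops it once the message is consumed. Returns 1 if this was the
 * last reference and the buffer was freed, 0 otherwise. */
int msgbuf_put(struct msgbuf *m)
{
    if (atomic_fetch_sub(&m->refcount, 1) == 1) {
        free(m);
        return 1;
    }
    return 0;
}
```

Nothing here pins recipient-supplied pages: the only memory involved is one sender-side allocation whose lifetime is bounded by the slowest recipient.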

(I'd argue for also looking at stuff like GEM and the dma buffers for some
of the big stuff. As we move towards more and more accelerators again
being able to pass around handles between accelerators becomes more and
more important)

We do need something for the multicast messaging. Whether that's
supporting AF_LOCAL, SOCK_RDP with multicast or something else (POSIX
message queue extensions ?). There's no real IP layer reliable ordered
multicast delivery system that is low latency and lightweight because
once it hits real networks it changes from a hard problem into a
seriously hard problem because of multicast implosions and the like.

Alan

2015-04-21 10:17:51

by David Herrmann

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi

On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes wrote:
>> On top of that, I think that someone into resource management needs to
>> seriously consider whether having a broadcast send do get_user_pages
>> or the equivalent on pages supplied by untrusted recipients (plural!)
>> is a good idea.
>
> Oh but its so much fun if you pass pages belonging to a device driver, or
> pass bits of a GEM object thereby keeping entire graphics textures
> referenced 8)

We do not use GUP, nor do we pass around pinned pages. All we use is
__vfs_read() / __vfs_write() on shmem. Whether generic_file_write() /
copy_from_user() internally relies on GUP or not, is an orthogonal
issue that does not belong here.

Thanks
David

2015-04-21 10:31:34

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 20, 2015 at 03:06:09PM -0700, Andy Lutomirski wrote:
> On Mon, Apr 20, 2015 at 2:46 PM, Greg Kroah-Hartman wrote:
> > On Mon, Apr 20, 2015 at 11:16:49PM +0200, Richard Weinberger wrote:
> >> Greg,
> >>
> >> Am 20.04.2015 um 22:56 schrieb Greg Kroah-Hartman:
> >> >> In which situation on a common Linux system is the current dbus too slow today?
> >> >> I've never seen a issue like "Oh my system is slow because dbus is
> >> >> eating too much CPU cycles".
> >> >
> >> > See the original email which explained all of the things we can not do
> >> > with D-Bus, some of which are due to speed, that can now be done with the
> >> > kdbus code.
> >>
> >> okay, let's do it together.
> >>
> >> 1. Performance
> >> You write:
> >> "DBus is not used for performance sensitive applications because DBus is slow.
> >> We want to make it fast so we can finally use it for low-latency,
> >> high-throughput applications."
> >>
> >> Which applications exactly?
> >> This reads to me like a solution for a non-existing problem.
> >
> > Anything that uses UDS for large buffers today can switch to using kdbus
> > for it's data stream as it is faster. I know the Pulse Audio people
> > have discussed this, and there are other people as well (Enlightenment
> > library developers, glib, wayland, etc.) Without the code being in the
> > kernel, no project is going to spend the time to convert their codebase
> > to a feature that isn't accepted.
>
> Anything that uses UDS for large buffers today can switch to using
> memfd over SCM_RIGHTS right now. If SCM_RIGHTS is too slow, then we
> can fix it along the lines that Al proposed.

But that doesn't solve the latency issues.

As has been said many times in this thread, when using UDS to build a
better IPC for apps, you will probably end up with today's D-Bus
userspace implementation, and not have any of the other things that we
keep talking about kdbus having.

Bringing up SCM_RIGHTS means that this is not going to be a bus system
at all. One principal design goal is to _not_ have peer-to-peer
connections between all communicating parties, but rather one connection
to a central component. If that component is not in the kernel, it has
to be a userspace daemon, which in turn has all of the issues that
dbus-daemon currently has.

> >> 2. Security
> >> I don't think that you need a 13k piece of code in the kernel to solve
> >> that issue.
> >
> > Wait, what? How can you blow by that requirement by just saying that
> > this proposal isn't acceptable? You can't do that, sorry. Please show
> > how what we have proposed does not provide the security requirements as
> > is documented.
>
> This is backwards. The way this discussion is going is:
>
> kdbus promoters: here's some code
>
> someone else: the code does such and such in a way that's wrong for xyz reason
>
> kdbus promoters: show us the implementation bug in such and such
>
> This is not how this discussion should work. Richard didn't say there
> was a bug in your code; he said that your code was too large.

"Your code is too large" does not provide any value to this discussion
at all, sorry. Richard is being a jerk here, please don't perpetuate
that line of discussion, it's not helpful at all.

> >> 3. Semantics for apps with heavy data payloads
> >>
> >> Again, sounds like a solution for a non-existing problem.
> >
> > No, media apps need to share their data somehow, and kdbus provides a
> > way to do that. GNOME portals are one such proposed codebase that is
> > looking to use kdbus for this, and again, so is Pulse Audio and the
> > other groups listed above.
>
> AFAICT you're talking about passing data into and out of a sandbox for
> processing or UI purposes. We have two excellent ways to do that
> right now: memfd and splice, depending on exactly what you're doing.

That does not solve the latency issues, which are crucial for sound and
graphics.

> >> 4. "Being in the kernel closes a lot of races which can't be fixed with
> >> the current userspace solutions."
> >>
> >> You really need a in-kernel dbus with 13k to solve that?
> >
> > Do you know of a smaller amount of code to solve this problem? If so,
> > wonderful, please show us, but we aren't playing code golf here. We are
> > proposing something that is well documented and easy to maintain, while
> > still being fast and correct. If it you think this can be done in a
> > smaller amount of code, please show us where we are doing needless
> > things in the patches.
>
> I do. Implement something like my old SCM_IDENTITY proposal, which is
> kind of like kdbus metadata, opt-in, over UNIX sockets. Except that I
> never proposed most of the absurd metadata items that kdbus is
> proposing, and I also suggesting doing it over plain old UNIX sockets.

We _want_ this metadata. You don't, that's fine. Calling our position
"absurd" does not contribute to the discussion. We are simply exporting
data that is already accessible via /proc and other locations, and do so
in a race-free manner, something the kernel has never been able to
provide in the past.

We do not, in any way, export any additional internal kernel state,
again, we are merely closing a race gap that has been there.
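To illustrate the kind of race meant here, a sketch of the traditional approach (an assumption about how a userspace consumer might do it, not code taken from dbus-daemon): take the peer's PID from SO_PEERCRED, then read further details from /proc. Nothing guarantees that the PID still names the same process by the time /proc is consulted. peer_comm_racy() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Look up the peer's command name the "traditional" way: get its PID from
 * the connection, then consult /proc. Returns 0 on success, -1 on failure. */
int peer_comm_racy(int sock, char *out, size_t outlen)
{
    struct ucred cred;
    socklen_t len = sizeof(cred);
    if (getsockopt(sock, SOL_SOCKET, SO_PEERCRED, &cred, &len) < 0)
        return -1;

    /* RACE WINDOW: the PID was recorded when the connection was set up. The
     * peer may have exited since and its PID been recycled by an unrelated
     * process, so the file opened below can describe someone else. */
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/comm", (int)cred.pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(out, (int)outlen, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    out[strcspn(out, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}
```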

> > Because of that, and the thread where the proposed security problems
> > were agreed not to be a security problem, I don't see a reason anymore
> > why this code should not be merged.
> >
> > With the exception of Al's code review, which is being addressed. But
> > that's a minor thing, not a major design flaw at all.
>
> My NACK stands. A security problem was fixed,

Please note that this issue was addressed in v2, which was posted many
months ago. It is not present in this submission at all.

> but the metadata system
> has multiple problems, each of which is independently sufficient to
> earn my nack.

If you still see a problem, please explain what it is. At least give a
general outline so that we can try to understand where you are coming
from here. On the systemd mailing list you said that your only issue
was that you are not convinced that this is a useful feature. But now
you are saying you have "multiple concerns". What are they?

>
> On top of that, the policy mechanism is iffy and is probably worthy of my nack.

This is a well-established concept that has worked great for many years.
Why should we break with that?

> On top of that, I think that someone into resource management needs to
> seriously consider whether having a broadcast send do get_user_pages
> or the equivalent on pages supplied by untrusted recipients (plural!)
> is a good idea.

Recipients need TALK access to the sender to receive broadcasts.
Furthermore, even on AF_UNIX you need sender-side buffering, which might
trigger reclaim. But sender-side buffering does not make sense for
broadcasts (there is no POLLOUT for broadcasts), which is why we
implemented the kdbus pools. I really doubt that the netlink-way of
making all buffers kernel-owned is the way to go. But it would be
trivial to change pool-memory on the root-memcg or even lock it, which
would be almost equivalent to kernel-owned buffers, if you think that
would solve a problem.

>
> On top of *that*, I have serious doubts that the whole design make
> sense. That doesn't earn my nack specifically, but it sure seems like
> a lot of people share my doubts that the design makes sense, and I
> don't hear a whole lot of people saying that they thing the design is
> a good thing to put in the kernel.
>
> Also, the current thread-of-lesser-doom on the systemd list greatly
> decreases my confidence that the issues that have earned my nack will
> get resolved. The kdbus designers seem to be unwilling to accept that
> code should be merged into the kernel merely because I (me,
> personally) don't see a straightforward security exploit that the code
> enables.

You have claimed that there is a security issue with no arguments
backing it up. No one is expecting an actual exploit, but at least an
explanation of what you have in mind and how it applies to the kdbus
design would be appreciated. Otherwise this is an argument that no one
can ever refute, and isn't fair.

We are trying to find proper solutions for the problems we see, and that
people tell us about. If there is a security issue in any of this,
please let us know, and if these are unfixable we are very open to
change the design. After all, that's why we are discussing it here in
the open.

Your review comments, and Al's, have been invaluable in helping make
this code better, and I greatly appreciate them. The code is much
better today than the v1 submission, and it shows, your insights here
have been wonderful. But now, saying that somehow the existing
design details that we have picked are dangerous, without providing any
details as to _why_ they are dangerous, leaves us with nothing we can
actually change.

So please, give specifics, otherwise there's no way that we can
provide a solution for this problem area.

thanks,

greg k-h

2015-04-21 10:51:28

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 21, 2015 at 10:35:19AM +0100, One Thousand Gnomes wrote:
>
> We do need something for the multicast messaging. Whether that's
> supporting AF_LOCAL, SOCK_RDP with multicast or something else (POSIX
> message queue extensions ?). There's no real IP layer reliable ordered
> multicast delivery system that is low latency and lightweight because
> once it hits real networks it changes from a hard problem into a
> seriously hard problem because of multicast implosions and the like.

This was attempted in the past with AF_DBUS, but the networking
maintainers rightfully pointed out that the model there did not work.

thanks,

greg k-h

2015-04-21 10:53:29

by Borislav Petkov

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 21, 2015 at 12:31:28PM +0200, Greg Kroah-Hartman wrote:
> "Your code is too large" does not provide any value to this discussion
> at all, sorry. Richard is being a jerk here, please don't perpetuate
> that line of discussion, it's not helpful at all.

We're becoming offensive slowly, aren't we?

I'm sorry but Richard's right. A lot of people told you already that
this is a *lot* of code and no other kernel person has given their
Reviewed-by: for this.

Which tells me that no one has reviewed it, maybe because it is a *lot*
of code. Or maybe because people are busy with other stuff and don't
have time to review 13KLOC and a spec on top for something which is
supposed to speed up some userspace pile which reportedly hasn't been
written yet or for some use cases which by no means justify the addition
of 13KLOC accelerator code to the kernel.

Yeah, right.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-21 11:04:05

by Jiri Kosina

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, 21 Apr 2015, Greg Kroah-Hartman wrote:

> > We do need something for the multicast messaging. Whether that's
> > supporting AF_LOCAL, SOCK_RDP with multicast or something else (POSIX
> > message queue extensions ?). There's no real IP layer reliable ordered
> > multicast delivery system that is low latency and lightweight because
> > once it hits real networks it changes from a hard problem into a
> > seriously hard problem because of multicast implosions and the like.
>
> This was attempted in the past with AF_DBUS, but the networking
> maintainers rightfully pointed out that the model there did not work.

BTW, I don't think this has been brought up in this discussion yet ...
please correct me if I am wrong, my memory is very faint here (*), but
wasn't the main objection to AF_BUS that defining what happens when one of
the subscribed receivers disconnects is a policy matter, and as such
belongs to userspace (which wasn't the case with the submitted AF_BUS
implementation)?

Was that considered unfixable and AF_BUS consequently given up because of
this?

I personally think that AF_BUS makes quite a lot of sense -- it builds on
what we already have (AF_UNIX credential passing, memfd sealing, etc), it
basically "just implements a missing socket semantics" (wrt. reliability
and multicasting).

(*) and I would really like to avoid digging out and reading a thread
similar to this one, about AF_BUS, again

Thanks,

--
Jiri Kosina
SUSE Labs

2015-04-21 11:09:47

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 21, 2015 at 12:53:20PM +0200, Borislav Petkov wrote:
> On Tue, Apr 21, 2015 at 12:31:28PM +0200, Greg Kroah-Hartman wrote:
> > "Your code is too large" does not provide any value to this discussion
> > at all, sorry. Richard is being a jerk here, please don't perpetuate
> > that line of discussion, it's not helpful at all.
>
> We're becoming offensive slowly, aren't we?

Vs. being offensive quickly like you started out with? :)

> I'm sorry but Richard's right. A lot of people told you already that
> this is a *lot* of code and no other kernel person has given its
> Reviewed-by: for this.

And I've kept asking for review, and the people who have reviewed it,
their issues have been addressed.

But to somehow say "it's insecure because it is a lot of code" is
unhelpful and flat out mean.

> Which tells me that no one has reviewed it, maybe because it is a *lot*
> of code. Or maybe because people are busy with other stuff and don't
> have time to review 13KLOC and a spec on top for something which is
> supposed to speed up some userspace pile which reportedly hasn't been
> written yet or for some use cases which by no means justify the addition
> of 13KLOC accelerator code to the kernel.

We add chunks of code larger than this to the kernel all the time. For
core infrastructure pieces. Asking for people to review it, and discuss
it is normal, nothing is happening different here.

In the end, not everyone reviews all of the code, that's normal. I see
chunks get merged all the time and I go "what? That's crazy? Who would
want a virtual machine as their networking packet parser?" But given
that the maintainer of the subsystem acked it, and has agreed to
maintain it, I trust them to do the right thing. That's how kernel
development works, on trust.

And that's the core issue here, trust.

You are the only one to bring this up in the thread, but I feel it's the
underlying theme. You act as if you don't trust us, the developers, to
be doing the right thing here. If so, great, please, let's talk about
it.

Trust is about not always getting everything right, but trust that the
people involved will be around to fix it if something is wrong. And
that the people involved are actually working toward something they see
is valuable and needed in an honest way.

If you don't trust me, great, say it. If you don't trust David, Daniel,
and Djalal, great, say so, and we can work on addressing that issue.

If you don't trust the code, wonderful, please let me know that and show
what is untrustworthy about it, as Al and Andy have done. Let us
address it that way, as has already happened thanks to Al and Andy's
review.

But to take drive-by potshots in an email thread, just because you somehow
don't like the color of the bikeshed, without having ever looked inside
the bikeshed to see what it is doing there, is completely unfair,
unreasonable, and totally unproductive.

greg "read the code luke" k-h

2015-04-21 11:41:10

by Borislav Petkov

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 21, 2015 at 01:09:42PM +0200, Greg Kroah-Hartman wrote:
> Vs. being offensive quickly like you started out with? :)

Of course you'll come back with an attack. Polemical tactics or what is
this called?

> And I've kept asking for review, and the people who have reviewed it,
> their issues have been addressed.
>
> But to somehow say "it's insecure because it is a lot of code" is
> unhelpful and flat out mean.

Well, so other people are reviewing it now. And it clearly needs more
review, people suggested even sitting down and talking face to face. So
we can put the mind-boggling upstreaming rush on hold now, right? Until
all the doubts have been cleared and there's general agreement.

And if you tell me that the current state of affairs is general
agreement, then I can surely understand the NAKs.

> We add chunks of code larger than this to the kernel all the time. For
> core infrastructure pieces. Asking for people to review it, and discuss
> it is normal, nothing is happening different here.

13KLOC in one go without a single Reviewed-by? I don't think so.

> In the end, not everyone reviews all of the code, that's normal. I see
> chunks get merged all the time and I go "what? That's crazy? Who would
> want a virtual machine as their networking packet parser?" But given
> that the maintainer of the subsystem acked it, and has agreed to
> maintain it, I trust them to do the right thing. That's how kernel
> development works, on trust.

You got an argument about that already: we're talking core code here,
not some driver or a packet parser which is not used by *everything*.

> And that's the core issue here, trust.
>
> You are the only one to bring this up in the thread, but I feel it's the
> underlying theme. You act as if you don't trust us, the developers, to
> be doing the right thing here. If so, great, please, let's talk about
> it.

Well, maybe I'm the only one to say it...

A lot of people are simply tired of the talk-people-into-submission
tactics and can you blame them?!

> Trust is about not always getting everything right, but trust that the
> people involved will be around to fix it if something is wrong. And
> that the people involved are actually working toward something they see
> is valuable and needed in an honest way.

Yeah, like the time I trusted enough to open a bugzilla about the "debug"
parsing on the command line and systemd hijacking it. After that
debacle, the trust container here is empty. That ship has sailed, sorry.

> If you don't trust me, great, say it. If you don't trust David, Daniel,
> and Djalal, great, say so, and we can work on addressing that issue.
>
> If you don't trust the code, wonderful, please let me know that and show
> what is untrustworthy about it, as Al and Andy have done. Let us
> address it that way, as has already happened thanks to Al and Andy's
> review.
>
> But to take drive-by potshots in an email thread, just because you somehow
> don't like the color of the bikeshed, without having ever looked inside
> the bikeshed to see what it is doing there, is completely unfair,
> unreasonable, and totally unproductive.

Yeah right, like you don't do that. There's another one of your attack
the opponent tactics. Oh well, I *trust* you to do *that*! :-D

Let me give you a similar comeback: "But I'm on CC!" :-P

> greg "read the code luke" k-h

Oh but I'm looking. And I don't really like what I'm seeing. But to get
back to Richard's statement: it is a *lot* of code, so I, and probably
everyone else too, can't just drop everything and jump on 13KLOC of code.
It needs time.

So let's stop the polemics and agree that it is not time to upstream
this thing yet. We should instead get down to business and give it a
good long look until Reviewed-by's start happening.

Agreed?

Boris "Darth Maul".

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-21 12:09:03

by Austin S Hemmelgarn

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 2015-04-20 16:26, George Spelvin wrote:
>> It's used everywhere, on servers,
>> embedded systems, desktops, you name it. All languages have bindings
>> for it, and it's the underpinning of a modern Linux stack.
>
> Since when? D-bus is some GUI dependency. On my console-only servers, it's
> not needed, and not installed:
>
> # dpkg-query -s libdbus-1-3 dbus
> dpkg-query: package 'libdbus-1-3' is not installed and no information is available
> dpkg-query: package 'dbus' is not installed and no information is available
> # dpkg-query -l \*dbus\*
> dpkg-query: no packages found matching *dbus*

Same here, but I use Gentoo, so it's easy to avoid stuff you don't want ;)
[...]
> Richard Weinberger wrote:
> And no systemd. Udev actually does something useful, so I have it on my
> desktop, but I have machines with a static /dev instead.
Likewise, I get by just fine with OpenRC, eudev, and Monit, the
combination of which provides all the functionality of SystemD that I
actually care about.



2015-04-21 12:20:36

by Michal Hocko

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue 21-04-15 12:17:49, David Herrmann wrote:
> Hi
>
> On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes
> <[email protected]> wrote:
> >> On top of that, I think that someone into resource management needs to
> >> seriously consider whether having a broadcast send do get_user_pages
> >> or the equivalent on pages supplied by untrusted recipients (plural!)
> >> is a good idea.
> >
> > Oh but it's so much fun if you pass pages belonging to a device driver, or
> > pass bits of a GEM object thereby keeping entire graphics textures
> > referenced 8)
>
> We do not use GUP, nor do we pass around pinned pages. All we use is
> __vfs_read() / __vfs_write() on shmem. Whether generic_file_write() /
> copy_from_user() internally relies on GUP or not, is an orthogonal
> issue that does not belong here.

It kind of does AFAIU. If nothing else, then for the memcg reasons mentioned in
other email (http://marc.info/?l=linux-kernel&m=142953380508188). If an
untrusted user is allowed to hand over a shmem-backed buffer which hasn't
been charged yet (read: faulted in) and kdbus is then forced to fault it in
a different user's context, then you basically allow hiding memory
allocations from the memcg. That is a clear show stopper.

Or have I misunderstood how shmem buffers are used here?

--
Michal Hocko
SUSE Labs

2015-04-21 12:56:38

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 21, 2015 at 01:03:59PM +0200, Jiri Kosina wrote:
> On Tue, 21 Apr 2015, Greg Kroah-Hartman wrote:
>
> > > We do need something for the multicast messaging. Whether that's
> > > supporting AF_LOCAL, SOCK_RDP with multicast or something else (POSIX
> > > message queue extensions ?). There's no real IP layer reliable ordered
> > > multicast delivery system that is low latency and lightweight because
> > > once it hits real networks it changes from a hard problem into a
> > > seriously hard problem because of multicast implosions and the like.
> >
> > This was attempted in the past with AF_DBUS, but the networking
> > maintainers rightfully pointed out that the model there did not work.
>
> BTW, I don't think this has been brought up in this discussion yet ...
> please correct me if I am wrong, my memory is very faint here (*), but
> wasn't the main objection to AF_BUS that defining what happens when one of
> the subscribed receivers disconnects is a policy matter, and as such
> belongs to userspace (which wasn't the case with the submitted AF_BUS
> implementation)?
>
> Was that considered unfixable and AF_BUS consequently given up because of
> this?

I think it was one of the reasons, I seem to remember many more. At
that time, I had lunch with David Miller and he told me a few specific
reasons along those lines, and that it just wasn't going to work as a
network protocol at all, and to not try that method anymore, but
instead, do it as a specific IPC interface, as has been done here :)

thanks,

greg k-h

2015-04-21 13:18:48

by Olivier Galibert

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 21, 2015 at 12:31 PM, Greg Kroah-Hartman
<[email protected]> wrote:
> Bringing up SCM_RIGHTS means that this is not going to be a bus system
> at all. One principal design goal is to _not_ have peer-to-peer
> connections between all communicating parties, but rather one connection
> to a central component. If that component is not in the kernel, it has
> to be a userspace daemon, which in turn has all of the issues that
> dbus-daemon currently has.

You're not making sense there. If there is no daemon, then you're
peer-to-peer, because there's no central component. If you consider
the kernel the central component, then peer-to-peer is almost
impossible by definition.

It seems that almost everybody here thinks that the plumbing (e.g.
transmitting messages in-order with multicasting) should be separated
from the policy (who communicates with who), possibly leveraging the
packet filtering infrastructure to implement the decided policy. What
is it you reject about that point of view, which seems relatively
normal when you think about building a collection of useful tools?

OG.

2015-04-21 13:38:17

by Havoc Pennington

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi,

On Tue, Apr 21, 2015 at 5:07 AM, Johannes Stezenbach <[email protected]> wrote:
> My line of thinking had been to amend DBus with optional direct
> client/server communication for the performance critical
> cases, since I believe those cases are RPC calls and not other
> types of messaging (see also the "Performance" section in the
> cover letter of this thread). (My other line of thinking had
> been: if you need performance, don't use DBus e.g. in the
> case of the tiny ARM systems sending hundreds of thousands of
> messages during boot, quoted by Greg.)
>

This has long been sort of the 'party line' and I've told many people
this on the dbus mailing list over the years (almost exactly what you
just said - that for performance-critical cases they should open a
direct socket or use something else or whatever). Usually this makes
app developers a little cranky because something that was going to be
easy in their mind just got harder.

I think the pressure to use dbus comes from several places. If you
use a side channel, some example complaints people have are:

* you have to reinvent any dbus solutions for security policy,
containerization, debugging, introspection, etc.
* you're now writing custom socket code instead of using the
high-level dbus API
* the side channel loses message ordering with respect to dbus messages
* your app code is kind of "infected" structurally by a performance
optimization concern
* you have to decide in advance which messages are "too big" or "too
numerous" - which may not be obvious, think of a cut-and-paste API,
where usually it's a paragraph of text but it could in theory be a
giant image
* you can't do big/numerous multicast; a side channel only solves unicast

There's no doubt that it's possible to use a side channel - just as it
was possible to construct an ad hoc IPC system prior to dbus - but the
overall OS (counting both kernel and userspace) perhaps becomes more
complex as a result, compared to having one model that supports more
cases.

One way to frame it: the low performance makes dbus into a relatively
leaky abstraction where there's this surprise lurking for app
developers that they might have to roll their own IPC on the side or
special-case some of their messages.

It's not the end of the world; it's just that it would have a certain
amount of overall simplicity (counting userspace+kernel together) if
one solution covered almost all use-cases in this "process-to-process
comms on local system" scenario, instead of 90% of use-cases but too
slow for the last 10%. The simplicity here isn't only for app
developers, it's also for anyone doing debugging or administration or
system integration, where they can deal with one system _or_ one
system plus various ad-hoc side channels.

Havoc

2015-04-21 13:48:42

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 21, 2015 at 03:18:35PM +0200, Olivier Galibert wrote:
> On Tue, Apr 21, 2015 at 12:31 PM, Greg Kroah-Hartman
> <[email protected]> wrote:
> > Bringing up SCM_RIGHTS means that this is not going to be a bus system
> > at all. One principal design goal is to _not_ have peer-to-peer
> > connections between all communicating parties, but rather one connection
> > to a central component. If that component is not in the kernel, it has
> > to be a userspace daemon, which in turn has all of the issues that
> > dbus-daemon currently has.
>
> You're not making sense there. If there is no daemon, then you're
> peer-to-peer, because there's no central component.

The kernel is the central component, as implemented in the patches.

> If you consider the kernel the central component, then peer-to-peer is
> almost impossible by definition.

Um, no, they go through the kernel for that model as well, same
interface, it just depends on the type of message that you are sending
as to who the recipients are (single or more than one.)

> It seems that almost everybody here thinks that the plumbing (e.g.
> transmitting messages in-order with multicasting) should be separated
> from the policy (who communicates with who), possibly leveraging the
> packet filtering infrastructure to implement the decided policy. What
> is it you reject about that point of view, which seems relatively
> normal when you think about building a collection of useful tools?

The plumbing is "separated" from the policy in that they are different
data structures, but you have to have the policy in order to know who to
connect with whom, otherwise it just doesn't work.

How would packet filtering work here for this type of decision making?
That's a much more complex interface than what we have implemented,
don't you agree?

thanks,

greg k-h

2015-04-21 14:01:06

by David Herrmann

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi

On Tue, Apr 21, 2015 at 2:20 PM, Michal Hocko <[email protected]> wrote:
> On Tue 21-04-15 12:17:49, David Herrmann wrote:
>> Hi
>>
>> On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes
>> <[email protected]> wrote:
>> >> On top of that, I think that someone into resource management needs to
>> >> seriously consider whether having a broadcast send do get_user_pages
>> >> or the equivalent on pages supplied by untrusted recipients (plural!)
>> >> is a good idea.
>> >
>> > Oh but it's so much fun if you pass pages belonging to a device driver, or
>> > pass bits of a GEM object thereby keeping entire graphics textures
>> > referenced 8)
>>
>> We do not use GUP, nor do we pass around pinned pages. All we use is
>> __vfs_read() / __vfs_write() on shmem. Whether generic_file_write() /
>> copy_from_user() internally relies on GUP or not, is an orthogonal
>> issue that does not belong here.
>
> It kind of does AFAIU.

No, it is not. The issue with GUP is that you elevate the page
ref-count and thus prevent lru isolation, sealing, whatsoever. I
cannot see how it is related to kdbus. However, ...

> If nothing else, then for the memcg reasons mentioned in
> other email (http://marc.info/?l=linux-kernel&m=142953380508188). If an
> untrusted user is allowed to hand over a shmem-backed buffer which hasn't
> been charged yet (read: faulted in) and kdbus is then forced to fault it in
> a different user's context, then you basically allow hiding memory
> allocations from the memcg. That is a clear show stopper.
>
> Or have I misunderstood how shmem buffers are used here?

..as you mentioned memcg, let's figure that out here. shmem buffers are
used as receive-buffers by kdbus peers. They are read-only to
user-space. All allocations are done by the kernel on message passing.
There is no buffering on the sender's side, only on the receiver's.

There are 3 possible ways to charge for memory that backs a message. We
can charge the sender, the receiver or the kernel. Right now we charge
the sender, as DBus-method-calls imply a trust relationship on the
target. We could as well charge the receiver, which might be
conceptually superior. Anyhow, both are imo better than the
dbus-daemon model, where we basically charge the root-memcg (more
precisely, the memcg of dbus-daemon).

Note that a message is always charged on either the sender or the
receiver. I don't see how memory is hidden from the memcg. If you pass
a memfd to another peer, you also need to be aware that this memory
was charged on you and stays so until the remote peer drops its
reference. But all messages you receive via kdbus are always
read-only.
Btw., binder avoids this issue by not charging anyone, which I also
don't think is a viable solution.

Could you elaborate on the exact issue you're seeing here?

Thanks
David

2015-04-21 14:29:34

by Michal Hocko

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue 21-04-15 16:01:01, David Herrmann wrote:
> Hi
>
> On Tue, Apr 21, 2015 at 2:20 PM, Michal Hocko <[email protected]> wrote:
> > On Tue 21-04-15 12:17:49, David Herrmann wrote:
> >> Hi
> >>
> >> On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes
> >> <[email protected]> wrote:
> >> >> On top of that, I think that someone into resource management needs to
> >> >> seriously consider whether having a broadcast send do get_user_pages
> >> >> or the equivalent on pages supplied by untrusted recipients (plural!)
> >> >> is a good idea.
> >> >
> >> > Oh but it's so much fun if you pass pages belonging to a device driver, or
> >> > pass bits of a GEM object thereby keeping entire graphics textures
> >> > referenced 8)
> >>
> >> We do not use GUP, nor do we pass around pinned pages. All we use is
> >> __vfs_read() / __vfs_write() on shmem. Whether generic_file_write() /
> >> copy_from_user() internally relies on GUP or not, is an orthogonal
> >> issue that does not belong here.
> >
> > It kind of does AFAIU.
>
> No, it is not. The issue with GUP is that you elevate the page
> ref-count and thus prevent lru isolation, sealing, whatsoever.

The point was that such a memory might be not present yet and need a
page fault with all the side effects - memory reclaim, memcg charge...

> I cannot see how it is related to kdbus. However, ...
>
> > If nothing else, then for the memcg reasons mentioned in
> > other email (http://marc.info/?l=linux-kernel&m=142953380508188). If an
> > untrusted user is allowed to hand over a shmem-backed buffer which hasn't
> > been charged yet (read: faulted in) and kdbus is then forced to fault it in
> > a different user's context, then you basically allow hiding memory
> > allocations from the memcg. That is a clear show stopper.
> >
> > Or have I misunderstood how shmem buffers are used here?
>
> ..as you mentioned memcg, let's figure that out here. shmem buffers are
> used as receive-buffers by kdbus peers. They are read-only to
> user-space. All allocations are done by the kernel on message passing.

OK, so the shmem buffer is allocated on the kernel's behalf and under
its control and no userspace can hand over one to kdbus. Do I get
it right? If yes then the memcg escape I was describing above is
not possible of course. This wasn't clear to me from the previous
discussion. Thanks for the clarification!
--
Michal Hocko
SUSE Labs

2015-04-21 14:47:44

by David Herrmann

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi

On Tue, Apr 21, 2015 at 4:27 PM, Michal Hocko <[email protected]> wrote:
> On Tue 21-04-15 16:01:01, David Herrmann wrote:
>> On Tue, Apr 21, 2015 at 2:20 PM, Michal Hocko <[email protected]> wrote:
>> > If nothing else, then for the memcg reasons mentioned in
>> > other email (http://marc.info/?l=linux-kernel&m=142953380508188). If an
>> > untrusted user is allowed to hand over a shmem-backed buffer which hasn't
>> > been charged yet (read: faulted in) and kdbus is then forced to fault it in
>> > a different user's context, then you basically allow hiding memory
>> > allocations from the memcg. That is a clear show stopper.
>> >
>> > Or have I misunderstood how shmem buffers are used here?
>>
>> ..as you mentioned memcg, let's figure that out here. shmem buffers are
>> used as receive-buffers by kdbus peers. They are read-only to
>> user-space. All allocations are done by the kernel on message passing.
>
> OK, so the shmem buffer is allocated on the kernel's behalf and under
> its control and no userspace can hand over one to kdbus. Do I get
> it right? If yes then the memcg escape I was describing above is
> not possible of course. This wasn't clear to me from the previous
> discussion. Thanks for the clarification!

Exactly. Much appreciated!

Thanks
David

2015-04-21 15:54:49

by Alan Cox

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

> Um, no, they go through the kernel for that model as well, same
> interface, it just depends on the type of message that you are sending
> as to who the recipients are (single or more than one.)

In other words it's bog-standard classic network-layer multicasting. You
don't need much policy for that.

> How would packet filtering work here for this type of decision making?
> That's a much more complex interface than what we have implemented,
> don't you agree?

No - it's about 10 lines of code to invoke EBPF. The socket layer supports
it today. What it doesn't have is the multicast transport bits.

It's a classic networking layer item. I know DaveM didn't like the
original because all the policy was mixed up in it and the blocking
problem was undefined, but every time I look at this I reach the same
conclusion:

- It's a socket-layer problem
- The sk_buff structures are the memory allocator needed
- The socket layer does the resource management
- The socket layer has SCM_RIGHTS already
- The socket layer has EBPF already
- The socket layer has a fantastic debugging environment
- It's the same components needed for fast MQ services and (almost) for
HPC uses [HPC wants scatter/gather too]

It just needs

- RDP multicast AF_UNIX type sockets. Not in itself a huge problem,
although it does need some nice refcounting so that you queue each
message once and the readers share it.
- A clear, fairly policy-free description of how you deal with multicast
to some clients who are "full"

And I think that aspect of things needs to go back via the networking
maintainers to figure out if this is "DaveM doesn't like dbus" or there
are some actually insoluble problems I'm missing here.

Designing alternate turds to go around DaveM may not be the right
approach even if he's stubborn 8)

Alan

2015-04-21 16:41:19

by Eric W. Biederman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Tom Gundersen <[email protected]> writes:

> Moreover, the daemon performing the shutdown tasks is necessarily
> always privileged enough to do so, so calling into the kernel and seeing
> what happens is completely the wrong thing to do (it would simply
> succeed). What matters is if the client calling the daemon is
> sufficiently privileged. If the client has the capabilities necessary
> to call the reboot syscall directly, it makes no sense to disallow
> them from doing a clean reboot. It would be like giving someone access
> to pull the power plug, but not allow them to shutdown the machine
> cleanly.
>
> To conclude, the kernel makes the decision for allowing reboot() to
> succeed based on CAP_SYS_BOOT, so when we decide whether or not to
> perform the preparation steps, we really must also use CAP_SYS_BOOT.
> If we are more restrictive, it does not gain us anything as people
> with CAP_SYS_BOOT can just circumvent our logic and "pull the plug" by
> calling reboot() directly. If we are less restrictive and for instance
> check for uid==0 it would essentially mean that we have added a way to
> circumvent the dropping of CAP_SYS_BOOT.

*Blink* Privilege escalation via CAP_SYS_BOOT *Blink*

*Puts on black hat*

HeHeHe. You mean all I need to do to get around all of the logging servers is
capture CAP_SYS_BOOT? Say like just capture this crazy watchdog program
that doesn't run as root so that it can only reboot the system? HeHeHe
So I can just trigger a clean reboot wait for journald, auditd, and
syslog all to shut down and then do evil things to the machine without
having to worry about erasing forensic evidence?

Bahahaha! This looks like fun I should play with this.

*Takes black hat off*

Seriously, it does not make sense to reuse these bits for purposes
for which they were not designed. A reboot preceded by a clean shutdown
is something different from a reboot that skips all of those steps.

I can understand the concerns about not wanting to allow circumventing
dropping CAP_SYS_BOOT but even with that concern in place I think it is
silly. That isn't what CAP_SYS_BOOT means.

Over the long term userspace doing weird things like this will mean that
we will have to change the kernel to add
CAP_SYS_BOOT_THIS_TIME_I_MEAN_IT. And have that control the reboot
system call and have the existing CAP_SYS_BOOT be some kind of token for
userspace.

Instead of going down that rat hole it would be much better for
userspace to figure out a token of their own. Perhaps a file descriptor
certain privileged processes can pass, perhaps something else.

The bottom line is that I tend to suck at figuring out how to exploit
systems and I saw an exploit possibility with the extended privileges
you granted to CAP_SYS_BOOT nearly instantly.

I can't imagine how kernel capabilities are the right tool for this kind
of job.

Eric

2015-04-21 16:54:58

by Diego Viola

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

I'd like to see D-Bus in the kernel (kdbus), if that's going to make
D-Bus faster.

See this application taking 15 seconds to start just because D-Bus is too slow.

https://bugs.kde.org/show_bug.cgi?id=342682

Hopefully kdbus solves problems such as this one.

Diego

On Wed, Apr 15, 2015 at 2:59 PM, Austin S Hemmelgarn
<[email protected]> wrote:
> On 2015-04-14 15:43, Greg Kroah-Hartman wrote:
>>
>> On Tue, Apr 14, 2015 at 08:35:33PM +0100, Al Viro wrote:
>>>
>>> On Tue, Apr 14, 2015 at 09:23:57PM +0200, Greg Kroah-Hartman wrote:
>>>
>>>>> I agree. You've sent a pull request for an unfortunate design. I
>>>>> don't think that unfortunate design belongs in the kernel. If it stays
>>>>> in userspace, then user programmers could potentially fix it some day.
>>>>
>>>>
>>>> You might not like the design, but it is a valid design. Again, we
>>>> don't refuse to support hardware that is designed badly. Or support
>>>> protocols we don't necessarily like, that's not the job of a kernel or
>>>> operating system.
>>>
>>>
>>> And no, "the sole consumer of that API knows better, so bend over" is not
>>> a good idea. We have shitloads of examples when single-consumer APIs
>>> turned into screaming horrors; taking that in over the objections to API
>>> design, merely on "they do it that way, who the hell we are to say they
>>> are wrong?" is insane.
>>
>>
>> Again, in this domain, the design is sound. So much so that everyone
>> who works in that area moved toward it (KDE, Qt, Go, etc.) We might not
>> think it makes sense, and it did take me a while to wrap my head around
>> it, but to call it "crap" is unfair, sorry.
>>
>
> The reason that 'everyone who works in this area' adopted it is not as much
> that the design is sound (I'm not arguing whether it is or isn't in this
> case) as it is that none of them could come up with anything better.
>

2015-04-21 17:06:34

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 21, 2015 at 01:54:54PM -0300, Diego Viola wrote:
> I'd like to see D-Bus in the kernel (kdbus), if that's going to make
> D-Bus faster.
>
> See this application taking 15 seconds to start just because D-Bus is too slow.
>
> https://bugs.kde.org/show_bug.cgi?id=342682
>
> Hopefully kdbus solves problems such as this one.

That bug really doesn't look like it would be solved by kdbus, I don't
see a ton of messages being sent as the issue, do you? It seems like
something is timing out and then continuing on with the application
startup.

But, you can try it out, grab the kernel patch, enable it in systemd,
and try it for yourself and let us know!

thanks,

greg k-h

2015-04-21 17:25:08

by Diego Viola

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

I'm not exactly sure what the problem is. It might not even be a
problem with D-bus, and it's probably a timeout issue as you said.

I'll give kdbus a try anyway and report back.

Thanks,

Diego

On Tue, Apr 21, 2015 at 2:06 PM, Greg Kroah-Hartman
<[email protected]> wrote:
> On Tue, Apr 21, 2015 at 01:54:54PM -0300, Diego Viola wrote:
>> I'd like to see D-Bus in the kernel (kdbus), if that's going to make
>> D-Bus faster.
>>
>> See this application taking 15 seconds to start just because D-Bus is too slow.
>>
>> https://bugs.kde.org/show_bug.cgi?id=342682
>>
>> Hopefully kdbus solves problems such as this one.
>
> That bug really doesn't look like it would be solved by kdbus, I don't
> see a ton of messages being sent as the issue, do you? It seems like
> something is timing out and then continuing on with the application
> startup.
>
> But, you can try it out, grab the kernel patch, enable it in systemd,
> and try it for yourself and let us know!
>
> thanks,
>
> greg k-h

2015-04-21 18:12:06

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 21, 2015 at 7:27 AM, Michal Hocko <[email protected]> wrote:
> On Tue 21-04-15 16:01:01, David Herrmann wrote:
>> Hi
>>
>> On Tue, Apr 21, 2015 at 2:20 PM, Michal Hocko <[email protected]> wrote:
>> > On Tue 21-04-15 12:17:49, David Herrmann wrote:
>> >> Hi
>> >>
>> >> On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes
>> >> <[email protected]> wrote:
>> >> >> On top of that, I think that someone into resource management needs to
>> >> >> seriously consider whether having a broadcast send do get_user_pages
>> >> >> or the equivalent on pages supplied by untrusted recipients (plural!)
>> >> >> is a good idea.
>> >> >
>> >> >> > Oh but it's so much fun if you pass pages belonging to a device driver, or
>> >> > pass bits of a GEM object thereby keeping entire graphics textures
>> >> > referenced 8)
>> >>
>> >> We do not use GUP, nor do we pass around pinned pages. All we use is
>> >> __vfs_read() / __vfs_write() on shmem. Whether generic_file_write() /
>> >> copy_from_user() internally relies on GUP or not, is an orthogonal
>> >> issue that does not belong here.
>> >
>> > It kind of does AFAIU.
>>
>> No, it is not. The issue with GUP is that you elevate the page
>> ref-count and thus prevent lru isolation, sealing, whatsoever.
>
> The point was that such memory might not be present yet and need a
> page fault with all the side effects - memory reclaim, memcg charge...
>
>> I cannot see how it is related to kdbus. However, ...
>>
>> > If for nothing else then the memcg reasons mentioned in
>> > other email (http://marc.info/?l=linux-kernel&m=142953380508188). If an
>> > untrusted user is allowed to hand over a shmem backed buffer which hasn't
>> > been charged yet (read faulted in) and then kdbus forced to fault it in
>> > a different user's context then you basically allow to hide memory
>> > allocations from the memcg. That is a clear show stopper.
>> >
>> > Or have I misunderstood the way shmem buffers are used here?
>>
>> ...as you mentioned memcg, let's figure that out here. shmem buffers are
>> used as receive-buffers by kdbus peers. They are read-only to
>> user-space. All allocations are done by the kernel on message passing.
>
> OK, so the shmem buffer is allocated on the kernel's behalf and under
> its control and no userspace can hand over one to kdbus. Do I get
> it right? If yes then the memcg escape I was describing above is
> not possible of course. This wasn't clear to me from the previous
> discussion. Thanks for the clarification!

I'm still missing something here, I think. At the time of pool
creation, the kernel calls shmem_file_setup in the context of the
untrusted user. Then, when a privileged daemon broadcasts, the kernel
calls vfs_iter_write or similar, thus allocating the page, right? I
don't see why the page would be allocated early or why vfs_iter_write
and the associated shmem code would care what memcg created the shmem
file -- all of that code seems to use current's memcg on brief
inspection.

Bear in mind that the bad guy gets to use madvise, etc to mess around
with the page cache state.

--Andy

2015-04-21 18:19:00

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 21, 2015 at 3:31 AM, Greg Kroah-Hartman
<[email protected]> wrote:
> On Mon, Apr 20, 2015 at 03:06:09PM -0700, Andy Lutomirski wrote:
>> On Mon, Apr 20, 2015 at 2:46 PM, Greg Kroah-Hartman
>> <[email protected]> wrote:
>> > On Mon, Apr 20, 2015 at 11:16:49PM +0200, Richard Weinberger wrote:
>> >> Greg,
>> >>
>> >> Am 20.04.2015 um 22:56 schrieb Greg Kroah-Hartman:
>> >> >> In which situation on a common Linux system is the current dbus too slow today?
>> >> >> I've never seen an issue like "Oh my system is slow because dbus is
>> >> >> eating too many CPU cycles".
>> >> >
>> >> > See the original email which explained all of the things we can not do
>> >> > with D-Bus, some of which are due to speed, that can now be done with the
>> >> > kdbus code.
>> >>
>> >> okay, let's do it together.
>> >>
>> >> 1. Performance
>> >> You write:
>> >> "DBus is not used for performance sensitive applications because DBus is slow.
>> >> We want to make it fast so we can finally use it for low-latency,
>> >> high-throughput applications."
>> >>
>> >> Which applications exactly?
>> >> This reads to me like a solution for a non-existing problem.
>> >
>> > Anything that uses UDS for large buffers today can switch to using kdbus
>> > for its data stream as it is faster. I know the Pulse Audio people
>> > have discussed this, and there are other people as well (Enlightenment
>> > library developers, glib, wayland, etc.) Without the code being in the
>> > kernel, no project is going to spend the time to convert their codebase
>> > to a feature that isn't accepted.
>>
>> Anything that uses UDS for large buffers today can switch to using
>> memfd over SCM_RIGHTS right now. If SCM_RIGHTS is too slow, then we
>> can fix it along the lines that Al proposed.
>
> But that doesn't solve the latency issues.

I said memfd, not memfd bounced off a userspace daemon. AFAICT
AF_UNIX peer-to-peer is considerably faster than kdbus, and I don't
see why memfd would change this.
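For readers unfamiliar with the mechanism Andy is pointing at, here is a minimal sketch of handing a sealed memfd directly from one peer to another over an AF_UNIX socket with SCM_RIGHTS, with no daemon in the data path. It assumes Linux 3.17+ and glibc 2.27+ (for memfd_create()); a socketpair() stands in for a real connection, and the helper names send_fd, recv_fd and memfd_roundtrip are hypothetical, not code from kdbus or from this thread.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Pass one file descriptor over a connected AF_UNIX socket (SCM_RIGHTS). */
static int send_fd(int sock, int fd)
{
    char byte = 0; /* the control message rides along with one byte of data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } ctl;
    memset(&ctl, 0, sizeof(ctl));
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctl.buf, .msg_controllen = sizeof(ctl.buf) };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor sent with send_fd(); returns -1 on error. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } ctl;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctl.buf, .msg_controllen = sizeof(ctl.buf) };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Hypothetical demo: the sender writes `payload` into a memfd, seals it, and
 * hands the fd straight to its peer; the receiver checks the seals and reads.
 * Returns the number of bytes copied into `out`, or -1 on error. */
static ssize_t memfd_roundtrip(const char *payload, char *out, size_t outlen)
{
    int sv[2];
    ssize_t n = -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    size_t len = strlen(payload) + 1; /* include the NUL terminator */
    int mfd = memfd_create("payload", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (mfd >= 0 && write(mfd, payload, len) == (ssize_t)len &&
        fcntl(mfd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE) == 0 &&
        send_fd(sv[0], mfd) == 0) {
        int rfd = recv_fd(sv[1]);
        /* The receiver only trusts the buffer once it can no longer change. */
        int seals = rfd >= 0 ? fcntl(rfd, F_GET_SEALS) : -1;
        if (seals >= 0 && (seals & F_SEAL_WRITE))
            n = pread(rfd, out, outlen, 0);
        if (rfd >= 0)
            close(rfd);
    }
    if (mfd >= 0)
        close(mfd);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

The seals are what let a receiver use the shared buffer without copying it first: once F_SEAL_WRITE is set, the sender can no longer modify the contents behind the receiver's back.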

>
> As has been said many times in this thread, when using UDS to build a
> better IPC for apps, you will probably end up with today's D-Bus
> userspace implementation, and not have any of the other things that we
> keep talking about kdbus having.
>
> Bringing up SCM_RIGHTS means that this is not going to be a bus system
> at all. One principal design goal is to _not_ have peer-to-peer
> connections between all communicating parties, but rather one connection
> to a central component. If that component is not in the kernel, it has
> to be a userspace daemon, which in turn has all of the issues that
> dbus-daemon currently has.
>

AFAICT userspace dbus-daemon has two major problems:

1. SCM_RIGHTS sucks. That's why I proposed fixing it.

2. Performance. But using an in-kernel bus is far from the only
solution. I much prefer adding something simple and flexible in the
kernel so that a userspace daemon can easily and efficiently introduce
two bus users to each other.

>> >> 3. Semantics for apps with heavy data payloads
>> >>
>> >> Again, sounds like a solution for a non-existing problem.
>> >
>> > No, media apps need to share their data somehow, and kdbus provides a
>> > way to do that. GNOME portals are one such proposed codebase that is
>> > looking to use kdbus for this, and again, so is Pulse Audio and the
>> > other groups listed above.
>>
>> AFAICT you're talking about passing data into and out of a sandbox for
>> processing or UI purposes. We have two excellent ways to do that
>> right now: memfd and splice, depending on exactly what you're doing.
>
> That does not solve the latency issues, which is crucial for sound and
> graphics.

As above, there's only a latency issue right now if you want sound and
graphics to use a *bus*, and even that could be fixed without moving
the bus into the kernel.

>
>> >> 4. "Being in the kernel closes a lot of races which can't be fixed with
>> >> the current userspace solutions."
>> >>
>> >> You really need an in-kernel dbus with 13k to solve that?
>> >
>> > Do you know of a smaller amount of code to solve this problem? If so,
>> > wonderful, please show us, but we aren't playing code golf here. We are
>> > proposing something that is well documented and easy to maintain, while
>> > still being fast and correct. If you think this can be done in a
>> > smaller amount of code, please show us where we are doing needless
>> > things in the patches.
>>
>> I do. Implement something like my old SCM_IDENTITY proposal, which is
>> kind of like kdbus metadata, opt-in, over UNIX sockets. Except that I
>> never proposed most of the absurd metadata items that kdbus is
>> proposing, and I also suggested doing it over plain old UNIX sockets.
>
> We _want_ this metadata. You don't, that's fine. Calling our position
> "absurd" does not contribute to the discussion. We are simply exporting
> data that is already accessible via /proc and other locations, and do so
> in a race-free manner, something the kernel has never been able to
> provide in the past.
>
> We do not, in any way, export any additional internal kernel state,
> again, we are merely closing a race gap that has been there.

This has been covered ad nauseam on the systemd thread, so I'm
not going to respond here.

>
>> > Because of that, and the thread where the proposed security problems
>> > were agreed not to be a security problem, I don't see a reason anymore
>> > why this code should not be merged.
>> >
>> > With the exception of Al's code review, which is being addressed. But
>> > that's a minor thing, not a major design flaw at all.
>>
>> My NACK stands. A security problem was fixed,
>
> Please note that this issue was addressed in v2, which was posted many
> months ago. It is not present in this submission at all.

That's why I said "was fixed".

>
>> but the metadata system
>> has multiple problems, each of which is independently sufficient to
>> earn my nack.
>
> If you still see a problem, please explain what it is. At least give a
> general outline so that we can try to understand where you are coming
> from here. On the systemd mailing list you said that your only issue
> was that you are not convinced that this is a useful feature. But now
> you are saying you have "multiple concerns". What are they?
>

We've only discussed creds on the systemd list. There's still cmdline
and starttime (at least).

I've actually *submitted patches* to fix starttime, but no one seems
to care. I'll resubmit them anyway for 4.2, since I think they're
more generally useful.

[snip]

--Andy

2015-04-21 18:25:53

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 21, 2015 at 1:09 AM, Daniel Mack <[email protected]> wrote:
> Hi,
>
> On 04/20/2015 08:01 PM, James Bottomley wrote:
>> On Fri, 2015-04-17 at 16:27 -0400, Havoc Pennington wrote:
>
>>> Do you have ideas on how to go about fixing it, whether in userspace
>>> or kernel dbus?
>>
>> Well, I've always suspected the solution would be for dbus to have a
>> hierarchical namespace of its own with the default policy be pass
>> message to parent namespace. This would allow a container to determine
>> which services were serviced outside and which inside the container (if
>> you attach as a provider to the system bus in the container, that
>> attachment supersedes the parent).
>>
>> However, this doesn't solve the security problem: just because a
>> container hasn't attached an interior provider doesn't mean it should be
>> allowed complete access to all services provided from outside. This is
>> the nasty problem because it involves some type of filter on busses
>> which pass through containers.
>
> Fair point, we've been thinking about that as well. What we implemented
> for that is something we call 'custom endpoints', which is described in
> kdbus.endpoint(7).
>
> In short, an endpoint is an entry point to the bus. Each bus provides a
> default endpoint node that enforces the bus-wide policy rules that
> define which well-known names a peer may own, see, or talk to. Custom
> endpoints can be added to carry additional policy rules for peers
> connected through it, and redirecting a task or container to the custom
> endpoint instead of the default one is as easy as bind-mounting the
> node. systemd units actually have support for that for a while, which
> is how we tested this feature. This implementation doesn't even add much
> code to kdbus, because we do have the policy code around anyway, so
> that's just a matter of which policy database to look at during runtime.
>
> That said, it would actually even be easy to implement a way to allow
> overriding names on custom endpoints too, so that services inside a
> container can replace ones that already exist on the bus. It's just that
> so far, we haven't yet seen a use case for this.

This is part of why I think that kdbus is the wrong design. All of
this is great, but this is the kind of policy that IMO belongs in
userspace. If nothing else, it means that you can add things like
this in the future without any kernel changes.

dbus-daemon can do all of this (in principle, anyway) already -- just
stick another dbus-daemon-like program in the container that proxies
things as appropriate. I think that a good kernel-accelerated design
could do the same thing without having to put any of this type of
policy in the kernel.

(As an example, capability-based IPC gets all of this for free.)

--Andy

2015-04-21 19:39:18

by Matthew Garrett

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 21, 2015 at 11:36:54AM -0500, Eric W. Biederman wrote:
>
> HeHeHe. You mean all I need to do to get around all of the logging servers is
> capture CAP_SYS_BOOT? Say like just capture this crazy watchdog program
> that doesn't run as root so that it can only reboot the system? HeHeHe
> So I can just trigger a clean reboot, wait for journald, auditd, and
> syslog all to shut down and then do evil things to the machine without
> having to worry about erasing forensic evidence?

CAP_SYS_BOOT gives you kexec, and kexec with init=/bin/sh lets you do
anything. You added that in dc009d92435f99498cbc579ce76bf28e837e2c14 and
now the horse is long gone. Don't give CAP_SYS_BOOT to anything you
don't trust with full privileges.

--
Matthew Garrett | [email protected]

2015-04-21 19:55:27

by Austin S Hemmelgarn

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 2015-04-21 15:38, Matthew Garrett wrote:
> On Tue, Apr 21, 2015 at 11:36:54AM -0500, Eric W. Biederman wrote:
>>
>> HeHeHe. You mean all I need to do to get around all of the logging servers is
>> capture CAP_SYS_BOOT? Say like just capture this crazy watchdog program
>> that doesn't run as root so that it can only reboot the system? HeHeHe
>> So I can just trigger a clean reboot, wait for journald, auditd, and
>> syslog all to shut down and then do evil things to the machine without
>> having to worry about erasing forensic evidence?
>
> CAP_SYS_BOOT gives you kexec, and kexec with init=/bin/sh lets you do
> anything. You added that in dc009d92435f99498cbc579ce76bf28e837e2c14 and
> now the horse is long gone. Don't give CAP_SYS_BOOT to anything you
> don't trust with full privileges.
>
The point is that Eric's suggestion works even on kernels without
kexec(), which matters because a significant number of security-minded
people (myself included) explicitly disable kexec in their kernel
configuration.


2015-04-21 21:10:43

by Eric W. Biederman

Subject: Issues with capability bits and meta-data in kdbus

Greg Kroah-Hartman <[email protected]> writes:

> On Mon, Apr 13, 2015 at 07:19:49PM -0500, Eric W. Biederman wrote:
>> [email protected] (Eric W. Biederman) writes:
>>
>> > And the code that transfers the meta-data is wrong.
>>
>> In fact it is worse than I thought.
>
> Please see the email response I just wrote to Andy about this, it should
> address these misconceptions.

Nothing has changed my analysis of the kdbus code.

Hopefully, now that I have gotten some semblance of rest and
have a little bit of time, I can explain the issues that I see.
be focusing on user namespaces and capability bits as that is my area of
the kernel.

- The userspace interface for capability bits has a version number that
determines the quantity and the meaning of the bits; kdbus does not
pass that number to userspace.

- There is a well-defined translation of capability bits between user
namespaces; kdbus does not perform that translation.

- Access to the capability bits is guarded with PTRACE_MAY_READ;
kdbus does not honor that and thus leaks information.

- Usage of the capability bits by userspace is a layering violation.

- The layering violation results in a privilege escalation (by
definition) whenever userspace uses the presence of the capability bits
to allow anything.

- Another kind of privilege escalation happens when userspace makes a
decision based on the absence of in-kernel capability bits.

The only safe way for userspace to use the kernel capability bits is to
decide which small handful of processes need the bits. Let those
processes be what implements userspace policy and drop the capabilities
from everything else.

With bugs at every layer of the implementation stack from implementation
to design I concluded earlier that the code that transfers these
capabilities is not at all mature or ready to be merged. Which is why I
asked for the code to be left for later until someone would pay
attention to it properly.


I will add that when playing with the unix security mechanism the design
has to be done very carefully, as the system is fragile and certain
kinds of changes modify existing tested designs into insecure code. I
fail to see any of that kind of needed care being applied to the design
of the kdbus mechanisms.


My conclusion is that the design of the meta-data passing code is
fundamentally broken. The issues I have observed with the kernel
capability bits apply to all of the other meta-data except that
meta-data that unix domain sockets already pass. The privilege
escalation issues are fundamentally unfixable, and apply in general.

As this is fundamental to kdbus, kdbus is apparently fundamentally
broken by design.

Eric

2015-04-22 01:30:10

by Linus Torvalds

Subject: Re: Issues with capability bits and meta-data in kdbus

On Tue, Apr 21, 2015 at 2:06 PM, Eric W. Biederman
<[email protected]> wrote:
>
> - The userspace interface for capability bits has a version number that
> determines the quantity and the meaning of the bits; kdbus does not
> pass that number to userspace.

Well, realistically, we'll just have to freeze the version anyway.
There's no sane way not to - if anybody believes that you can change
the version without breaking all kdbus users, they are living in some
drug-induced happy-land.

So we could make the version number available to match other
interfaces, but in practice that will just make people think that the
capabilities are versioned, which in reality they are not.
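The version number under discussion is the one in the capget(2) ABI: the caller puts a version in the header, and that version decides how many 32-bit words of capability data are exchanged. The sketch below is not a kdbus interface; it calls the raw syscall directly (libcap's cap_get_proc() is the usual library route), and the helper names kernel_cap_version and read_effective_caps are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/capability.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Ask the kernel which capability ABI version it prefers. An unrecognized
 * version (0 here) with a NULL data pointer makes capget() write the
 * preferred version back into the header instead of copying any data. */
static uint32_t kernel_cap_version(void)
{
    struct __user_cap_header_struct hdr = { .version = 0, .pid = 0 };
    (void)syscall(SYS_capget, &hdr, NULL);
    return hdr.version;
}

/* Read this process's effective set using the version-3 layout, in which
 * the 64 capability bits are split across two 32-bit data structs. */
static int read_effective_caps(uint64_t *effective)
{
    struct __user_cap_header_struct hdr = { .version = _LINUX_CAPABILITY_VERSION_3, .pid = 0 };
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];
    if (syscall(SYS_capget, &hdr, data) != 0)
        return -1;
    *effective = (uint64_t)data[0].effective | ((uint64_t)data[1].effective << 32);
    return 0;
}
```

A consumer that receives raw capability words without the version has to assume this 64-bit layout, which is in effect the freeze Linus describes.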

> - There is a well-defined translation of capability bits between user
> namespaces; kdbus does not perform that translation.

Now this looks like a big oversight, and serious.

If you have a capability inside a lower namespace, and can fool the
other end of a kdbus connection into thinking that you have that
capability globally, then that sounds like a very obvious and bad
security issue. This needs fixing. That said, it's likely a fairly
simple fix.

> - Access to the capability bits is guarded with PTRACE_MAY_READ;
> kdbus does not honor that and thus leaks information.

Now, this is likely not a real problem.

Yes, when you try to read other processes' capabilities, you need
PTRACE_MAY_READ to see them. HOWEVER, that's not really what a kdbus
message would do - it doesn't "read somebody else's capabilities". When
you do a kdbus write, you export your *own* capabilities. If you don't
want others to know what privileges you have, then you shouldn't be
using kdbus.

It's like saying "you need PTRACE_MAY_READ" to be able to read the
process image of another process. True. But if that other process does
a "write()" system call, then the written data will contain the data
from that process. That's how write works.

> - Usage of the capability bits by userspace is a layering violation.
>
> - The layering violation results in a privilege escalation (by
> definition) whenever userspace uses the presence of the capability bits
> to allow anything.

Well, but that's a user space decision thing. If some system daemon
uses the capabilities of the other end to decide to accept some
message or not, that's a policy decision by that (privileged) system
daemon. The daemon itself obviously still needs to have the proper
capabilities in order to do any action that needs such capabilities.

So it's not a privilege escalation. It's intentional.

> - Another kind of privilege escalation happens when userspace makes a
> decision based on the absence of in-kernel capability bits.

I don't see that this is any different from the above.

So I agree that kdbus clearly *must* translate the capabilities when
passing messages from one namespace to another. The fact that you may
have setuid rights in one container, does *not* necessarily mean that
the recipient in another namespace should think you have that
capability - because in the receiving namespace the originator may not
have that capability at all.
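The translation Linus agrees is required can be modeled on the rule the kernel itself applies when checking a capability against a user namespace (the walk in cap_capable(), security/commoncap.c, as of the 4.x kernels). The sketch below is a simplified toy model of that rule, not kdbus code and not the kernel's real data structures; the types toy_userns and toy_cred and the function caps_in_ns are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for struct user_namespace and struct cred (NOT the kernel
 * definitions); only the fields the rule needs. */
struct toy_userns {
    const struct toy_userns *parent; /* NULL for the initial namespace */
    int level;                       /* 0 for the initial namespace */
    uint32_t owner_uid;              /* global euid of the namespace's creator */
};

struct toy_cred {
    const struct toy_userns *user_ns; /* namespace the sender lives in */
    uint32_t euid;                    /* global (initial-namespace) euid */
    uint64_t cap_effective;           /* caps held *within* user_ns */
};

/* Which of the sender's capabilities mean anything in the receiver's
 * namespace `target`? Walk from target towards the root, as cap_capable() does. */
static uint64_t caps_in_ns(const struct toy_cred *cred, const struct toy_userns *target)
{
    for (const struct toy_userns *ns = target; ns != NULL; ns = ns->parent) {
        if (ns == cred->user_ns)
            return cred->cap_effective;  /* target is the sender's ns or below it */
        if (ns->level <= cred->user_ns->level)
            return 0;                    /* target is not below the sender's ns */
        if (ns->parent == cred->user_ns && ns->owner_uid == cred->euid)
            return ~(uint64_t)0;         /* sender created this ns: full set */
    }
    return 0;
}
```

In this model, CAP_SYS_ADMIN held inside a container maps to nothing when the receiver sits in the initial namespace, which is exactly the case a naive copy of the bits gets wrong.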

So I think that one is a real and serious bug. But the other
complaints seem to be off the mark. It seems quite reasonable to me to
say that a recipient should be able to distinguish between *root*
sending it a dbus message to take down the system, and some random
luser doing the same.

Linus

2015-04-22 01:53:54

by Bernd Petrovitsch

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi all!

On Tue, 2015-04-21 at 09:37 -0400, Havoc Pennington wrote:
[...]
> This has long been sort of the 'party line' and I've told many people
> this on the dbus mailing list over the years (almost exactly what you
> just said - that for performance-critical cases they should open a
> direct socket or use something else or whatever). Usually this makes
> app developers a little cranky because something that was going to be
> easy in their mind just got harder.

Perhaps these developers should rethink the design and protocols of
their apps - or pay the price for a stupid design which relies on heavy
IPC traffic (and usually - sooner or later - heavy network traffic).
Or - at least - deliver a (technical!) proof why this isn't feasible.

The case of "patching the kernel to lie about the kernel's command line
just because some ill-designed user-space daemon misused it" was bad
enough, and the above smells quite similar.

Kind regards,
Bernd
--
"I dislike type abstraction if it has no real reason. And saving
on typing is not a good reason - if your typing speed is the main
issue when you're coding, you're doing something seriously wrong."
- Linus Torvalds

2015-04-22 01:54:38

by Andy Lutomirski

Subject: Re: Issues with capability bits and meta-data in kdbus

On Tue, Apr 21, 2015 at 6:30 PM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Apr 21, 2015 at 2:06 PM, Eric W. Biederman
> <[email protected]> wrote:
>> - Access to the capability bits is guarded with PTRACE_MAY_READ
>> kdbus does not honor that and thus leaks information.
>
> Now, this is likely not a real problem.
>
> Yes, when you try to read other processes' capabilities, you need
> PTRACE_MAY_READ to see them. HOWEVER, that's not really what a kdbus
> message would do - it doesn't "read somebody else's capabilities". When
> you do a kdbus write, you export your *own* capabilities. If you don't
> want others to know what privileges you have, then you shouldn't be
> using kdbus.
>
> It's like saying "you need PTRACE_MAY_READ" to be able to read the
> process image of another process. True. But if that other process does
> a "write()" system call, then the written data will contain the data
> from that process. That's how write works.

There's an interesting philosophical question here.

If kdbus were a general purpose IPC tool and if the libraries would
expose nice knobs like "set this flag if and only if you want to
assert CAP_WHATEVER to the server", then maybe this would be okay.

But I don't believe that for a second. AFAICS sd-bus (maybe the
primary implementation) will always set that flag if for no other
reason than that it *doesn't know* when the client is trying to assert
a capability. So we'd be giving users a gun which is, in practice,
only ever pointed at the users' feet.

It's a little worse than that, since this gun also shoots
cap_permitted, cap_inheritable, and bset bullets, none of which seem
usable for anything other than foot-shooting even in the best case.
And don't even get me started about some of the other metadata items.

All of this completely ignores the fact that selinux can and does
restrict access to /proc and kdbus will never respect those
restrictions. Also, I'd like to see us move closer to a world where
real distros set hidepid=1, and this is a big step in the opposite
direction.

--Andy

2015-04-22 02:32:44

by Linus Torvalds

Subject: Re: Issues with capability bits and meta-data in kdbus

On Tue, Apr 21, 2015 at 6:54 PM, Andy Lutomirski <[email protected]> wrote:
>
> If kdbus were a general purpose IPC tool

.. but it's not ..

> and if the libraries would
> expose nice knobs like "set this flag if and only if you want to
> assert CAP_WHATEVER to the server", then maybe this would be okay.

I really don't agree.

The whole notion that you should be able to be anonymous when
communicating is *wrong*.

Now, when you talk across machines using TCP, there's no identity that
you can trust, so anonymity is kind of enforced. But locally, if you
want to connect with somebody else, I actually think that trying to be
anonymous is just stupid - and more than that - plain wrong.

Yeah, if you do a pipe, that's one thing. You don't "connect" to
somebody else with a pipe, you just create both end points. So there
is little point in having identifying information for pipes. But when
you connect to a service, it just *makes sense* for the other end to
know about you. They should know your user ID, they should know your
identity (pid or whatever), and they should know your capabilities.

If you don't want that, then you use some anonymizing service, and it
becomes *your* problem. But a server that gets connected by different
people should know who it gets connected by.

Unix domain sockets simply got this wrong.

I don't think this is the problem of kdbus. There may be *other*
problems, but I think it's very reasonable to just have as a basic
*requirement* that when you get connected, you get to know who
connects you, and what their rights are. Because it really *is*
something fundamental, and something important to know. Is it some
random nobody, or is it a system service?

If I was a service writer, that would be *the* most basic requirement
I would have. No ifs, buts or maybes about it.
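For comparison, AF_UNIX stream sockets already expose a small, fixed identity of the connecting peer through the SO_PEERCRED socket option: the pid, effective uid, and effective gid recorded when the connection was made. The sketch below reads it; a socketpair() stands in for a real service socket, and the helper names peer_cred and peer_cred_demo are hypothetical. Note that these values are a snapshot from connect() or socketpair() time, not from each send, and that capabilities are not part of them.

```c
#define _GNU_SOURCE
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill *out with the credentials the kernel recorded for the peer of `sock`. */
static int peer_cred(int sock, struct ucred *out)
{
    socklen_t len = sizeof(*out);
    if (getsockopt(sock, SOL_SOCKET, SO_PEERCRED, out, &len) < 0)
        return -1;
    return len == sizeof(*out) ? 0 : -1;
}

/* Hypothetical demo: create a connected pair and ask one end who the other is.
 * With socketpair() both ends were created by this process, so the peer is us. */
static int peer_cred_demo(struct ucred *out)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    int rc = peer_cred(sv[1], out);
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```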

Linus

2015-04-22 03:12:15

by Havoc Pennington

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 21, 2015 at 9:51 PM, Bernd Petrovitsch
<[email protected]> wrote:
> Hi all!
>
> On Tue, 2015-04-21 at 09:37 -0400, Havoc Pennington wrote:
> [...]
>> This has long been sort of the 'party line' and I've told many people
>> this on the dbus mailing list over the years (almost exactly what you
>> just said - that for performance-critical cases they should open a
>> direct socket or use something else or whatever). Usually this makes
>> app developers a little cranky because something that was going to be
>> easy in their mind just got harder.
>
> Perhaps these developers should rethink the design and protocols of
> their apps - or pay the price for a stupid design which relies on heavy
> IPC traffic (and usually - sooner or later - heavy network traffic).
> Or - at least - deliver a (technical!) proof why this isn't feasible.
>
> The case of "patching the kernel to lie about the kernel's command line
> just because some ill-designed user-space daemon misused it" was bad
> enough, and the above smells quite similar.
>

I don't think it's ridiculous that app developers try the clean,
simple solution first (use one IPC for everything) and only optimize
once they discover they need to.

If dbus were faster, many of these designs might not be stupid and
might not be a mis-use. And the app might delete a lot of code, which
is a plus for anyone using that app. More code = more bugs after all.

I grant you that some apps have bad code, but I don't think it's my
job to punish them by making things slow on purpose. I only made
things slow because I didn't know a way to make them fast without
sacrificing a more important goal, but it was never ideal. The kdbus
developers have proposed a way out of the tradeoff.

Havoc

2015-04-22 03:20:13

by Andy Lutomirski

Subject: Re: Issues with capability bits and meta-data in kdbus

On Tue, Apr 21, 2015 at 7:32 PM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Apr 21, 2015 at 6:54 PM, Andy Lutomirski <[email protected]> wrote:
>>
>> If kdbus were a general purpose IPC tool
>
> .. but it's not ..
>
>> and if the libraries would
>> expose nice knobs like "set this flag if and only if you want to
>> assert CAP_WHATEVER to the server", then maybe this would be okay.
>
> I really don't agree.
>
> The whole notion that you should be able to be anonymous when
> communicating is *wrong*.
>
> Now, when you talk across machines using TCP, there's no identity that
> you can trust, so anonymity is kind of enforced. But locally, if you
> want to connect with somebody else, I actually think that trying to be
> anonymous is just stupid - and more than that - plain wrong.
>
> Yeah, if you do a pipe, that's one thing. You don't "connect" to
> somebody else with a pipe, you just create both end points. So there
> is little point in having identifying information for pipes. But when
> you connect to a service, it just *makes sense* for the other end to
> know about you. They should know your user ID, they should know your
> identity (pid or whatever), and they should know your capabilities.
>
> If you don't want that, then you use some anonymizing service, and it
> becomes *your* problem. But a server that gets connected by different
> people should know who it gets connected by.
>
> Unix domain sockets simply got this wrong.
>
> I don't think this is the problem of kdbus. There may be *other*
> problems, but I think it's very reasonable to just have as a basic
> *requirement* that when you get connected, you get to know who
> connects you, and what their rights are. Because it really *is*
> something fundamental, and something important to know. Is it some
> random nobody, or is it a system service?

Where do you draw the line?

In the dbus dream world, if I type "wget
http://www.example.com/foobar", then my DNS resolver knows the whole
URL I'm downloading (cmdline), that I'm wget (or that I'm pretending
to be wget), that I inherited these capabilities from my caller (using
the totally broken capability inheritance model, but whatever), that I
started at time such-and-such (breaking CRIU), that my gid is
such-and-such, etc. That DNS resolver ideally should sandbox itself,
but it now *necessarily* has this information leak into its sandbox.

It gets worse. My little tray notifier widget that shows the progress
bar also learns all this information. Heck, my tray notifier probably
also finds out what capabilities systemd is holding every time I plug
in a USB stick, because of broadcasts.

It also conflates information-for-the-hell-of-it, auditing, and
authorization. When I write a program that tries to drop privileges,
I need to know *what those privileges are*. When every damn attribute
of my process might be seen as a privilege by whatever daft daemon,
system or otherwise, I'm talking to, it's *really hard* to figure out
what's exposed if I get compromised.

>
> If I was a service writer, that would be *the* most basic requirement
> I would have. No ifs, buts or maybes about it.

Sorry, but I don't believe you. Do you really need to know all this
random information about everyone who connects? Why only for local
users? Why is it different over a network? Which of those pieces of
information are you going to use for authentication? Which are
potential information leaks for a reason that didn't even exist when
you wrote your service?

If I were a service writer, I would define what privileges are needed
to do what, and I would require those and no more. I don't want to
know more, because everything else I learn for no good reason is a
potential security problem.

As a concrete example, remember all those lovely euid and caps bugs we
found in write(2) a couple of years ago? The only reason those bugs were
possible is that write(2) is implemented in the kernel, and the
kernel can do whatever the hell it wants, including incorrectly looking
at caps in write(2). Those bugs were IMO especially embarrassing
because all of the code needed for the stupid interaction that caused
them is in the kernel tree. Everyone "knows" that write(2)
*must not honor effective caps*, but a lot of people forgot.

I'd much rather have dbus not tell the peer random stupid things about
me that the peer can fuck up than have dbus tell the peer all this
stuff for the hell of it, because the peer *will* fuck it up, and that
fuckup might only be discoverable when you look at the interaction of
several different programs supplied by several different vendors and
written years apart.

--Andy

2015-04-22 08:58:42

by Borislav Petkov

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 13, 2015 at 02:29:35PM -0500, Eric W. Biederman wrote:
> And the code that transfers the meta-data is wrong.
>
> It is generally not something that userspace requires today, certainly
> userspace is not using it.
>
> You are exporting a weird set of information in a unique way that makes
> it race free enough to make ``security'' decisions upon but the data
> in general is not appropriate to make those decisions.
>
> I remain opposed to this half thought out trash of an ABI for the
> meta-data.
>
> Just because something happens to be exported in a DEBUG api today does
> not make it appropriate for userspace to run around making security
> decisions with that information.
>
> Nacked-by: "Eric W. Biederman" <[email protected]>
>
> I think it is premature to be merging kdbus. You have fundamental
> issues that can not be fixed once the ABI is frozen.
>
> The semantics of the meta-data you export are extremely poorly defined.

Not only that - it looks like a serious amount of work on each sent
packet. So I did some staring, correct me if I missed something:

kdbus_cmd_send - KDBUS_CMD_SEND, ioctl cmd, copy stuff from userspace
|-> kdbus_kmsg_new_from_cmd(), kmalloc+memset + prepare a *lot* of stuff like:
    |-> m->proc_meta = kdbus_meta_proc_new();
        m->conn_meta = kdbus_meta_conn_new();
        ...
|-> kdbus_bus_broadcast(conn->ep->bus, conn, kmsg); let's look at the broadcast mode
    |-> hash_for_each(bus->conn_hash, i, conn_dst, hentry) { iterate over hash buckets, O(256)
        |-> kdbus_meta_proc_collect(kmsg->proc_meta, attach_flags); collect a *lot* of stuff from current etc
        |-> kdbus_meta_conn_collect(kmsg->conn_meta, kmsg, conn_src, attach_flags); collect more stuff

and this happens on *every* send. A *lot* of work.

Now multiply that by the amount of messages this thing is going to send
per second. It piles up. So you have the overhead right then and there
in the design without even being able to fix it. Or at least pretty damn
hard to fix.

So unless I'm missing something, this right there is a design problem.

Why can't this messaging be done with a nifty O(1) scheme like sending
parties issuing auth tokens and whatever and the kernel doing the
arbitration and distribution of those tokens?

That gets you sandboxing, dropping privileges and whatever else fancy
containers people wanna do for free. Token recipient has the token -
that's all that counts.

Again, this is from a short staring only, I might just as well be
missing something but you'll tell me :-)

Thanks.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-22 10:46:42

by Alan Cox

Subject: Re: Issues with capability bits and meta-data in kdbus

> > - Access to the capability bits is guarded with PTRACE_MAY_READ
> > kdbus does not honor that and thus leaks information.
>
> Now, this is likely not a real problem.
>
> Yes, when you try to read other processes capabilities, you need
> PTRACE_MAY_READ to see them. HOWEVER, that's not really what a kdbus
> message would do - it doesn't "read somebody else's capabilities". When
> you do a kdbus write, you export your *own* capabilities. If you don't
> want others to know what privileges you have, then you shouldn't be
> using kdbus.

That's broken but fixable.

It should not share any capability information *unless* you pass a flag
which says "flash my security badges around".

That fails safe (descriptor passed to another process), and gives a
default behaviour which is non surprising, non leaky and useful for
general purposes. This is also mirroring AF_LOCAL/AF_UNIX where you have
to choose to wave your bits in public.

(again it's showing that kdbus really should be done by adding multicast
reliable delivery to AF_LOCAL sockets)
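
For comparison, here is roughly what the AF_UNIX model referred to above hands a local peer today. This is a minimal, Linux-specific sketch, not kdbus code: getsockopt(SO_PEERCRED) returns the pid, effective uid and effective gid recorded when the connection (here a socketpair()) was set up. The helper names get_peer_cred() and peer_is_self() are made up for illustration.

```c
#define _GNU_SOURCE            /* glibc: exposes struct ucred */
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Fetch the credentials the kernel recorded for the other end of `fd`
 * when it was connected (or when the pair was created).
 * Returns 0 on success, -1 on error. */
static int get_peer_cred(int fd, struct ucred *out)
{
    socklen_t len = sizeof(*out);

    if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, out, &len) != 0)
        return -1;
    assert(len == sizeof(*out));
    return 0;
}

/* Hypothetical check: in a socketpair() both ends belong to this process,
 * so the peer credentials should be our own pid and effective uid/gid.
 * Returns 1 if they match, 0 otherwise. */
static int peer_is_self(void)
{
    int sv[2], ok;
    struct ucred cred;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 0;
    ok = get_peer_cred(sv[0], &cred) == 0 &&
         cred.pid == getpid() &&
         cred.uid == geteuid() &&
         cred.gid == getegid();
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

Called from main(), peer_is_self() should return 1 on Linux. Note what struct ucred does not carry: no capability sets, command line or start time, which are the extra items being argued about in this thread.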

> So I think that one is a real and serious bug. But the other
> complaints seem to be off the mark. It seems quite reasonable to me to
> say that a recipient should be able to distinguish between *root*
> sending it a dbus message to take down the system, and some random
> luser doing the same.

Agreed but there are better ways to do this including opening some
kind of capability object and passing it as proof.
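
One familiar Unix form of "passing a capability object as proof" is sending an open file descriptor with SCM_RIGHTS: the receiver learns what it may do with that object, not who the sender is. The sketch below assumes that reading of the suggestion; it is not a kdbus or D-Bus API, send_fd(), recv_fd() and demo_pass_capability() are hypothetical helpers, and a pipe stands in for whatever object a real design would hand out.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control-message buffer sized and aligned for exactly one fd. */
union fd_cmsg {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send `fd` over the connected AF_UNIX socket `sock`. Holding the fd is
 * the proof of authority; no other attribute of the sender is attached.
 * Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';                  /* at least one byte of real data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd(); returns the new fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg != NULL && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS &&
        cmsg->cmsg_len == CMSG_LEN(sizeof(int)))
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Hypothetical demo: hand over the write end of a pipe as the "capability",
 * write through the received copy, and check the data arrives.
 * Returns 1 on success, 0 otherwise. */
static int demo_pass_capability(void)
{
    int sv[2], p[2], w = -1, ok = 0;
    char buf[2] = { 0 };

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 0;
    if (pipe(p) == 0) {
        if (send_fd(sv[0], p[1]) == 0 && (w = recv_fd(sv[1])) >= 0) {
            ok = write(w, "ok", 2) == 2 && read(p[0], buf, 2) == 2 &&
                 memcmp(buf, "ok", 2) == 0;
            close(w);
        }
        close(p[0]);
        close(p[1]);
    }
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

demo_pass_capability() should return 1 on Linux. Nothing about the sender's uid or capabilities travels with the message; the receiver can only exercise the rights of the object it was given.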

Also do I need to be root when I send the message or root when you ask ...


Alan

2015-04-22 11:40:33

by Austin S Hemmelgarn

Subject: Re: Issues with capability bits and meta-data in kdbus

On 2015-04-21 22:32, Linus Torvalds wrote:
> On Tue, Apr 21, 2015 at 6:54 PM, Andy Lutomirski <[email protected]> wrote:
>>
>> If kdbus were a general purpose IPC tool
>
> .. but it's not ..
>

Except, IIRC, that was one of the stated design goals in the original
patch set. I'm pretty sure that I remember a rather verbose exposition
that pretty much could be summarized as "Linux has no general purpose
IPC in the kernel, this fixes that"




2015-04-22 11:41:09

by David Herrmann

Subject: Re: Issues with capability bits and meta-data in kdbus

Hi

On Wed, Apr 22, 2015 at 3:30 AM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Apr 21, 2015 at 2:06 PM, Eric W. Biederman
> <[email protected]> wrote:
>> - There is a well defined translation of capability bits between user
>> namespaces kdbus does not perform that translation.
>
> Now this looks like a big oversight, and serious.
>
> If you have a capability inside a lower namespace, and can fool the
> other end of a kdbus connection into thinking that you have that
> capability globally, then that sounds like a very obvious and bad
> security issue. This needs fixing. That said, it's likely a fairly
> simple fix.

kdbus drops capability items if we cross user-namespaces. This
security issue was fixed in v2.

I think the translation Eric was referring to is what cap_capable()
(security/commoncap.c) does. That is, it translates capabilities of
parent namespaces into its child namespaces (making the parent
privileged in the child namespace). Right now, we don't do this.
Instead, we drop the item, so the receiver knows that the information
could not be gathered (unlike a zeroed capability item, which means
the sender does not have the capabilities).
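
The parent-to-child translation described above can be modelled in a few lines. This is a simplified, self-contained sketch of the walk that cap_capable() performs, not the kernel code: struct userns, struct cred_model and ns_capable_model() are hypothetical stand-ins, and capabilities are a plain 64-bit mask. Starting from the receiver's namespace it walks towards the initial namespace: a task's own capability bits count in its own namespace and its descendants, the creator of a child namespace (acting from the parent) holds every capability in that child, and nothing a task holds inside a child namespace counts in the parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified user namespace. */
struct userns {
    const struct userns *parent;   /* NULL for the initial namespace */
    unsigned owner_euid;           /* euid of the task that created it */
};

/* Hypothetical, simplified credentials of the sending task. */
struct cred_model {
    const struct userns *ns;       /* namespace the task lives in */
    unsigned euid;
    uint64_t cap_effective;        /* bit N set = capability N held */
};

/* Does `cred` hold capability `cap` when judged from namespace `target`
 * (e.g. the receiver's namespace)? Simplified model of the cap_capable() walk. */
static bool ns_capable_model(const struct cred_model *cred,
                             const struct userns *target, unsigned cap)
{
    const struct userns *ns = target;

    assert(cred != NULL && target != NULL && cap < 64);
    for (;;) {
        /* Same namespace: the task's own capability bits decide. */
        if (ns == cred->ns)
            return (cred->cap_effective >> cap) & 1u;
        /* Reached the initial namespace without meeting the task's own. */
        if (ns->parent == NULL)
            return false;
        /* The creator of ns, acting from the parent, holds every capability in ns. */
        if (ns->parent == cred->ns && ns->owner_euid == cred->euid)
            return true;
        /* Capabilities held in an ancestor also apply in its descendants. */
        ns = ns->parent;
    }
}
```

With a root credential living in a child namespace and the initial namespace as the target, the model returns false, which is the case flagged earlier in the thread: root inside a child namespace must not look privileged to a receiver in the parent.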

I have an experimental patch to support translation [1]. However, I
dislike copying the code from security/ into kdbus. So if we introduce
translation later on, I'd like to figure out with LSM developers how
to do this best.

Also note that /proc does not do any translation on its own, which
makes cap_get_pid() tricky to use.

>> - Access to the capability bits is guarded with PTRACE_MAY_READ
>> kdbus does not honor that and thus leaks information.
>
> Now, this is likely not a real problem.
>
> Yes, when you try to read other processes capabilities, you need
> PTRACE_MAY_READ to see them. HOWEVER, that's not really what a kdbus
> message would do - it doesn't "read somebody elses capabilities". When
> you do a kdbus write, you export your *own* capabilities. If you don't
> want others to know what privileges you have, then you shouldn't be
> using kdbus.

I fully agree.

And to clear things up: there is no such PTRACE_MODE_READ protection
for capability bits in /proc.

Thanks!
David

[1] http://cgit.freedesktop.org/~dvdhrm/linux/commit/?h=kdbus&id=894085ff39afc653ed711102fa698d937818ce1f

2015-04-22 13:07:44

by Greg Kroah-Hartman

Subject: Re: Issues with capability bits and meta-data in kdbus

On Wed, Apr 22, 2015 at 07:40:25AM -0400, Austin S Hemmelgarn wrote:
> On 2015-04-21 22:32, Linus Torvalds wrote:
> >On Tue, Apr 21, 2015 at 6:54 PM, Andy Lutomirski <[email protected]> wrote:
> >>
> >>If kdbus were a general purpose IPC tool
> >
> > .. but it's not ..
> >
>
> Except, IIRC, that was one of the stated design goals in the original patch
> set. I'm pretty sure that I remember a rather verbose exposition that
> pretty much could be summarized as "Linux has no general purpose IPC in the
> kernel, this fixes that"

Did I say that somewhere? Here's what the patchset has always started
with every time I have posted it for review, starting back last year in
October:

kdbus is a kernel-level IPC implementation that aims for
resemblance to the protocol layer with the existing
userspace D-Bus daemon while enabling some features that
couldn't be implemented before in userspace.

2+ years ago, I had the dream that maybe we could make kdbus into the
"general purpose IPC layer for the kernel", but in working through all
of the issues, and the requirements of the userspace users and
protocols, it just really didn't work out that way, sorry.

I know some people would like such a "general purpose IPC", but perhaps
because no one has ever done it, maybe it either can't be done, or that
no one really wants such a thing. :)

thanks,

greg k-h

2015-04-22 13:10:33

by Johannes Stezenbach

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 21, 2015 at 09:37:44AM -0400, Havoc Pennington wrote:
>
> I think the pressure to use dbus happens for several reasons, if you
> use a side channel some example complaints people have are:
>
> * you have to reinvent any dbus solutions for security policy,
> containerization, debugging, introspection, etc.
> * you're now writing custom socket code instead of using the
> high-level dbus API
> * the side channel loses message ordering with respect to dbus messages
> * your app code is kind of "infected" structurally by a performance
> optimization concern
> * you have to decide in advance which messages are "too big" or "too
> numerous" - which may not be obvious, think of a cut-and-paste API,
> where usually it's a paragraph of text but it could in theory be a
> giant image
> * you can't do big/numerous multicast, side channel only solves the unicast
>
> There's no doubt that it's possible to use a side channel - just as it
> was possible to construct an ad hoc IPC system prior to dbus - but the
> overall OS (counting both kernel and userspace) perhaps becomes more
> complex as a result, compared to having one model that supports more
> cases.
>
> One way to frame it: the low performance makes dbus into a relatively
> leaky abstraction where there's this surprise lurking for app
> developers that they might have to roll their own IPC on the side or
> special-case some of their messages.
>
> it's not the end of the world, it's just that it would have a certain
> amount of overall simplicity (counting userspace+kernel together) if
> one solution covered almost all use-cases in this "process-to-process
> comms on local system" scenario, instead of 90% of use-cases but too
> slow for the last 10%. The simplicity here isn't only for app
> developers, it's also for anyone doing debugging or administration or
> system integration, where they can deal with one system _or_ one
> system plus various ad-hoc side channels.

Clearly it is not useful to put the burden on the app developers.
However, I do not (yet?) understand why direct links couldn't be added
to the DBus daemon and library and be used fairly transparently
by apps:
- allow peers to announce "I allow direct connect"
(we don't want too many sockets/connections, just e.g.
gconf, polkit, ... where it matters for performance)
- when clients do an RPC call, check if the server allows direct
connect and then do it (via DBus daemon as helper)
- obviously the clients would maintain the connection to the
DBus daemon for the remaining purposes

Of course, that means the DBus daemon cannot enforce the policy
anymore, you could use the same database but the code which uses
it would have to be moved into the dbus library.
I must admit that I do not understand the importance of
message ordering between some RPC call and other messages
via the DBus daemon since the app can do the RPC call at any time.

Wrt big/numerous multicast, you are right that this wouldn't
solve it, but that doesn't seem to be the problem we need to address?
At least I've not seen any performance measurements which
would indicate it.

That all said, I'm not opposed at all to adding kernel
infrastructure for the benefit of DBus. However, I am
quite disappointed both by the monolithic, single-purpose
design of the kdbus API, and especially by the way it
is presented to the kernel community. What I mean by the
latter is that we get an amount of kernel code which you cannot
understand unless you also understand the userspace
DBus *and* the actual usage of DBus in desktop systems,
and this is accompanied with statements along the line
of "many smart people worked on this for two years and
everyone agreed". I.e., we only get the solution
but not the background knowledge to understand and judge
the solution for ourselves.

What I would have appreciated instead:
- performance measurement results which demonstrate
the problem and the actual DBus use in practice for
various message type / use cases
- an account of the attempts that have been made to
fix it and the reasons why they failed, so we can
understand how the current design has evolved

The latter may be asking a lot, but IPC is a core OS feature
which comes right after CPU and memory resource management
and basic I/O. The basic IPC APIs are fairly simple, the
socket API is already quite complex, and kdbus goes to
another level of complexity and cruftiness, and with all
the words which have been written in this thread there is
still not an adequate justification for it.

For example, I do understand the policy database has to be
in the kernel as it is checked for every message, but I
don't see why the name service needs to be in the kernel.
I suspect (lacking performance figures) that name ownership
changes are relatively rare, and lookups, too (ISTR you mentioned
clients cache the result).

For the base messaging and policy filtering I don't see why
this has to be one monolithic API and not split in a
fairly simple, general purpose messaging API, and a completely
separate API for configuring the filters and attaching them
to the bus.


Johannes

2015-04-22 13:28:30

by Havoc Pennington

Subject: Re: Issues with capability bits and meta-data in kdbus

On Wed, Apr 22, 2015 at 7:40 AM, Austin S Hemmelgarn
<[email protected]> wrote:
> Except, IIRC, that was one of the stated design goals in the original patch
> set. I'm pretty sure that I remember a rather verbose exposition that
> pretty much could be summarized as "Linux has no general purpose IPC in the
> kernel, this fixes that"
>

This is probably just debating definitions and technicalities, but
what I'd say is that dbus is pretty universally applicable *within*
the case of connecting apps and services on the local machine. That's
what it's for. Right now it probably works for, I don't know, 85-90% of
that, with kdbus trying to take it closer to 100% by removing
performance concerns and early boot concerns that currently rule out
certain uses.

If we say it isn't "general purpose" we could mean more than one thing -

- it's a complete system / batteries-included, with a defined
protocol, vs. a "make your own protocol" kit
- it isn't especially appropriate as a cross-machine protocol,
whether you mean within a cluster or across the internet
- it isn't portable in a very useful way (it kind of runs on
windows/mac but isn't the native way of doing things there)

On the other hand, it is "general purpose" in the sense that so many
apps and services are using it for so many purposes already (i.e. it
isn't tied to a particular kind of app or service).

I just opened d-feet on my workstation which is default-ish Fedora 21,
I have ~35 well-known names available on the system bus, and ~100 (got
tired of counting) on the session bus. These are all kinds of
different apps and services.

d-feet incidentally is a good way to explore current usage of dbus -
you can list names, list objects within names, list methods/properties
on objects, and even call methods from d-feet. (There are command line
alternatives too like `gdbus`.)

Havoc

2015-04-22 13:47:01

by David Herrmann

Subject: Re: Issues with capability bits and meta-data in kdbus

Hi

On Wed, Apr 22, 2015 at 5:19 AM, Andy Lutomirski <[email protected]> wrote:
> Where do you draw the line?

User-space draws _this_ line.

A bus creator can set the "mandatory metadata mask" of a bus. It
defines a mask all senders (!) have to use as base. The bus creator
can thus mandate a policy for its bus and force everyone who wants to
communicate via this bus to at least agree to transmit the requested
set of information. Using UIDs+GIDs+PIDs+seclabel+names as masks works
just fine.

To be clear, kdbus only transmits metadata that sender and receiver
both agreed on. Both peers have to opt-in for an item to be
transmitted.
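
The opt-in rule described above amounts to bitmask arithmetic. A minimal sketch, assuming the bus's mandatory mask is OR-ed into every sender's mask and the result is then intersected with the mask the receiver requested; the META_* names and values are hypothetical placeholders, not kdbus's actual attach-flag constants, and attached_items() is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical metadata bits (placeholders, not kdbus's real values). */
enum {
    META_UIDS     = 1u << 0,
    META_GIDS     = 1u << 1,
    META_PIDS     = 1u << 2,
    META_SECLABEL = 1u << 3,
    META_NAMES    = 1u << 4,
    META_CAPS     = 1u << 5,
    META_CMDLINE  = 1u << 6,
};

/* Items actually attached to one message: whatever the sender offers,
 * widened by the bus-wide mandatory mask, then narrowed to what the
 * receiver asked to see. */
static uint32_t attached_items(uint32_t bus_mandatory,
                               uint32_t sender_send,
                               uint32_t receiver_recv)
{
    uint32_t offered = sender_send | bus_mandatory;

    return offered & receiver_recv;
}
```

Under this reading a receiver never gets an item it did not ask for, while a sender on a bus that marks an item mandatory cannot withhold it from a receiver that asks for it.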

Thanks
David

2015-04-22 14:06:09

by Austin S Hemmelgarn

Subject: Re: Issues with capability bits and meta-data in kdbus

On 2015-04-22 09:07, Greg Kroah-Hartman wrote:
> On Wed, Apr 22, 2015 at 07:40:25AM -0400, Austin S Hemmelgarn wrote:
>> On 2015-04-21 22:32, Linus Torvalds wrote:
>>> On Tue, Apr 21, 2015 at 6:54 PM, Andy Lutomirski <[email protected]> wrote:
>>>>
>>>> If kdbus were a general purpose IPC tool
>>>
>>> .. but it's not ..
>>>
>>
>> Except, IIRC, that was one of the stated design goals in the original patch
>> set. I'm pretty sure that I remember a rather verbose exposition that
>> pretty much could be summarized as "Linux has no general purpose IPC in the
>> kernel, this fixes that"
>
> Did I say that somewhere? Here's what the patchset has always started
> with every time I have posted it for review, starting back last year in
> October:
>
> kdbus is a kernel-level IPC implementation that aims for
> resemblance to the protocol layer with the existing
> userspace D-Bus daemon while enabling some features that
> couldn't be implemented before in userspace.
>
> 2+ years ago, I had the dream that maybe we could make kdbus into the
> "general purpose IPC layer for the kernel", but in working through all
> of the issues, and the requirements of the userspace users and
> protocols, it just really didn't work out that way, sorry.
>
I think it may have been someone else elaborating on this ideal that I
was remembering. Personally, I could care less whether it is considered
'general purpose', as far as I'm concerned, POSIX semaphores, shm, and
UDS fit all the IPC I ever need. On that note, I have considered trying
to implement SOCK_SEQPACKET support for AF_LOCAL, although I've gotten
by just fine using SCTP over the loop-back interface.



2015-04-22 14:35:24

by Michele Curti

Subject: Re: Issues with capability bits and meta-data in kdbus

Hi Havoc.

On Wed, Apr 22, 2015 at 09:27:56AM -0400, Havoc Pennington wrote:
>
> If we say it isn't "general purpose" we could mean more than one thing -
>
> - it's a complete system / batteries-included, with a defined
> protocol, vs. a "make your own protocol" kit
> - it isn't especially appropriate as a cross-machine protocol,
> whether you mean within a cluster or across the internet
> - it isn't portable in a very useful way (it kind of runs on
> windows/mac but isn't the native way of doing things there)
>
> On the other hand, it is "general purpose" in the sense that so many
> apps and services are using it for so many purposes already (i.e. it
> isn't tied to a particular kind of app or service).
>

Just out of curiosity, would you like to change something in dbus design,
if you didn't have to worry about ABI breaks and the like?

Thanks,
Michele

2015-04-22 14:57:09

by Michal Hocko

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue 21-04-15 11:11:35, Andy Lutomirski wrote:
> On Tue, Apr 21, 2015 at 7:27 AM, Michal Hocko <[email protected]> wrote:
> > On Tue 21-04-15 16:01:01, David Herrmann wrote:
> >> Hi
> >>
> >> On Tue, Apr 21, 2015 at 2:20 PM, Michal Hocko <[email protected]> wrote:
> >> > On Tue 21-04-15 12:17:49, David Herrmann wrote:
> >> >> Hi
> >> >>
> >> >> On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes
> >> >> <[email protected]> wrote:
> >> >> >> On top of that, I think that someone into resource management needs to
> >> >> >> seriously consider whether having a broadcast send do get_user_pages
> >> >> >> or the equivalent on pages supplied by untrusted recipients (plural!)
> >> >> >> is a good idea.
> >> >> >
> >> >> > Oh but it's so much fun if you pass pages belonging to a device driver, or
> >> >> > pass bits of a GEM object thereby keeping entire graphics textures
> >> >> > referenced 8)
> >> >>
> >> >> We do not use GUP, nor do we pass around pinned pages. All we use is
> >> >> __vfs_read() / __vfs_write() on shmem. Whether generic_file_write() /
> >> >> copy_from_user() internally relies on GUP or not, is an orthogonal
> >> >> issue that does not belong here.
> >> >
> >> > It kind of does AFAIU.
> >>
> >> No, it is not. The issue with GUP is that you elevate the page
> >> ref-count and thus prevent lru isolation, sealing, whatsoever.
> >
> > The point was that such a memory might be not present yet and need a
> > page fault with all the side effects - memory reclaim, memcg charge...
> >
> >> I cannot see how it is related to kdbus. However, ...
> >>
> >> > If for nothing else then the memcg reasons mentioned in
> >> > other email (http://marc.info/?l=linux-kernel&m=142953380508188). If an
> >> > untrusted user is allowed to hand over a shmem backed buffer which hasn't
> >> > been charged yet (read faulted in) and then kdbus forced to fault it in
> >> > a different user's context then you basically allow to hide memory
> >> > allocations from the memcg. That is a clear show stopper.
> >> >
> >> > Or have I misunderstood how shmem buffers are used here?
> >>
> >> ..as you mentioned memcg, let's figure that out here. shmem buffers are
> >> used as receive-buffers by kdbus peers. They are read-only to
> >> user-space. All allocations are done by the kernel on message passing.
> >
> > OK, so the shmem buffer is allocated on the kernel's behalf and under
> > its control and no userspace can hand over one to kdbus. Do I get
> > it right? If yes then the memcg escape I was describing above is
> > not possible of course. This wasn't clear to me from the previous
> > discussion. Thanks for the clarification!
>
> I'm still missing something here, I think. At the time of pool
> creation, the kernel calls shmem_file_setup in the context of the
> untrusted user. Then, when a privileged daemon broadcasts, the kernel
> calls vfs_iter_write or similar, thus allocating the page, right? I
> don't see why the page would be allocated early or why vfs_iter_write
> and the associated shmem code would care what memcg created the shmem
> file -- all of that code seems to use current's memcg on brief
> inspection.

Yes it is the current task on the first charge or the original memcg on
the swap in. But my understanding from the above, and I haven't read the
code yet, is that the untrusted userspace is only a reader of the buffer
and isn't allowed to modify the buffer.

> Bear in mind that the bad guy gets to use madvise, etc to mess around
> with the page cache state.

How can an untrusted user play with shmem when it is read-only?
shmem_file_setup should create an unlinked file so no process can access
it via tmpfs AFAIU and potentially fault the memory in before the producer
fills it up (so it gets faulted in within the trusted context). I have no idea
how the receiver gets to the buffer though.

--
Michal Hocko
SUSE Labs

2015-04-22 19:36:37

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Apr 22, 2015 7:57 AM, "Michal Hocko" <[email protected]> wrote:
>
> On Tue 21-04-15 11:11:35, Andy Lutomirski wrote:
> > On Tue, Apr 21, 2015 at 7:27 AM, Michal Hocko <[email protected]> wrote:
> > > On Tue 21-04-15 16:01:01, David Herrmann wrote:
> > >> Hi
> > >>
> > >> On Tue, Apr 21, 2015 at 2:20 PM, Michal Hocko <[email protected]> wrote:
> > >> > On Tue 21-04-15 12:17:49, David Herrmann wrote:
> > >> >> Hi
> > >> >>
> > >> >> On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes
> > >> >> <[email protected]> wrote:
> > >> >> >> On top of that, I think that someone into resource management needs to
> > >> >> >> seriously consider whether having a broadcast send do get_user_pages
> > >> >> >> or the equivalent on pages supplied by untrusted recipients (plural!)
> > >> >> >> is a good idea.
> > >> >> >
> > >> >> > Oh but it's so much fun if you pass pages belonging to a device driver, or
> > >> >> > pass bits of a GEM object thereby keeping entire graphics textures
> > >> >> > referenced 8)
> > >> >>
> > >> >> We do not use GUP, nor do we pass around pinned pages. All we use is
> > >> >> __vfs_read() / __vfs_write() on shmem. Whether generic_file_write() /
> > >> >> copy_from_user() internally relies on GUP or not, is an orthogonal
> > >> >> issue that does not belong here.
> > >> >
> > >> > It kind of does AFAIU.
> > >>
> > >> No, it is not. The issue with GUP is that you elevate the page
> > >> ref-count and thus prevent lru isolation, sealing, whatsoever.
> > >
> > > The point was that such a memory might be not present yet and need a
> > > page fault with all the side effects - memory reclaim, memcg charge...
> > >
> > >> I cannot see how it is related to kdbus. However, ...
> > >>
> > >> > If for nothing else then the memcg reasons mentioned in
> > >> > other email (http://marc.info/?l=linux-kernel&m=142953380508188). If an
> > >> > untrusted user is allowed to hand over a shmem backed buffer which hasn't
> > >> > been charged yet (read faulted in) and then kdbus forced to fault it in
> > >> > a different user's context then you basically allow to hide memory
> > >> > allocations from the memcg. That is a clear show stopper.
> > >> >
> > >> > Or have I misunderstood how shmem buffers are used here?
> > >>
> > >> ..as you mentioned memcg, let's figure that out here. shmem buffers are
> > >> used as receive-buffers by kdbus peers. They are read-only to
> > >> user-space. All allocations are done by the kernel on message passing.
> > >
> > > OK, so the shmem buffer is allocated on the kernel's behalf and under
> > > its control and no userspace can hand over one to kdbus. Do I get
> > > it right? If yes then the memcg escape I was describing above is
> > > not possible of course. This wasn't clear to me from the previous
> > > discussion. Thanks for the clarification!
> >
> > I'm still missing something here, I think. At the time of pool
> > creation, the kernel calls shmem_file_setup in the context of the
> > untrusted user. Then, when a privileged daemon broadcasts, the kernel
> > calls vfs_iter_write or similar, thus allocating the page, right? I
> > don't see why the page would be allocated early or why vfs_iter_write
> > and the associated shmem code would care what memcg created the shmem
> > file -- all of that code seems to use current's memcg on brief
> > inspection.
>
> Yes it is the current task on the first charge or the original memcg on
> the swap in. But my understanding from the above, and I haven't read the
> code yet, is that the untrusted userspace is only a reader of the buffer
> and isn't allowed to modify the buffer.
>
> > Bear in mind that the bad guy gets to use madvise, etc to mess around
> > with the page cache state.
>
> How can an untrusted user play with shmem when it is read-only?
> shmem_file_setup should create an unlinked file so no process can access
> it via tmpfs AFAIU and potentially fault the memory in before the producer
> fills it up (so it gets faulted in within the trusted context). I have no idea
> how the receiver gets to the buffer though.
>

The receiver gets to mmap the buffer. I'm not sure what protection they get.

The thing I'm worried about is that the receiver might deliberately
avoid faulting in a bunch of pages and instead wait for the producer
to touch them, causing pages that logically belong to the receiver to
be charged to the producer instead.
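
The accounting concern can be illustrated with a toy "first fault pays" model in which each page of a receive pool is charged to whichever cgroup touches it first. Everything here (struct pool, touch_page(), pages_charged(), the integer cgroup ids) is hypothetical and ignores swap-in, reclaim and the real memcg API; it only shows why a receiver that never faults in its own pool can shift the charge onto the sender.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_PAGES 8
#define NO_OWNER   (-1)

/* Toy receive pool: which (hypothetical) cgroup id was charged for each
 * page; NO_OWNER means the page has never been faulted in. */
struct pool {
    int charged_to[POOL_PAGES];
};

static void pool_init(struct pool *p)
{
    for (size_t i = 0; i < POOL_PAGES; i++)
        p->charged_to[i] = NO_OWNER;
}

/* First touch allocates the page and charges the toucher's cgroup;
 * later touches change nothing. */
static void touch_page(struct pool *p, size_t page, int cgroup)
{
    assert(page < POOL_PAGES);
    if (p->charged_to[page] == NO_OWNER)
        p->charged_to[page] = cgroup;
}

/* Count the pages charged to a given cgroup. */
static size_t pages_charged(const struct pool *p, int cgroup)
{
    size_t n = 0;

    for (size_t i = 0; i < POOL_PAGES; i++)
        n += p->charged_to[i] == cgroup;
    return n;
}
```

In this model a receiver that pre-touches every page carries the full charge, while one that leaves its pool untouched leaves the sender paying for all of it. Whether kdbus pools actually behave this way is the open question in the exchange above.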

--Andy

> --
> Michal Hocko
> SUSE Labs

2015-04-22 20:03:07

by Havoc Pennington

Subject: Re: Issues with capability bits and meta-data in kdbus

On Wed, Apr 22, 2015 at 10:35 AM, Michele Curti <[email protected]> wrote:
>
> Just out of curiosity, would you like to change something in dbus design,
> if you didn't have to worry about ABI breaks and the like?
>

Good question. I can't remember any big-picture things, I'm sure the
current maintainers and users have a longer list. :-) There are a
variety of small things, some examples I can immediately think
of:

* the ad hoc authentication protocol is sort of ugly
* the byte order marker in every message is silly
* protocol version in every message is useless
* Ryan Lortie's nice fixes in GVariant, which I think kdbus adopts (
https://people.gnome.org/~ryanl/gvariant-serialisation.pdf ), for the
most part these are 'cleanups' but nullable types ("maybe" types for
Haskell fans) are a notable semantic addition
* specify how it works on Windows, the Windows port last I checked
(years ago) didn't do things in a Windows-sensible way
* specify what happens when resource limits are reached
* wouldn't use XML for introspection data these days
http://dbus.freedesktop.org/doc/dbus-specification.html#introspection-format

The implementation has more problems:

* libdbus had a flawed goal (be the underlying implementation used by
higher-level libs), it turns out it's better to implement the protocol
in every lib, libdbus was trying to serve too many masters. libdbus is
slow and has an annoying API, and the protocol is simple enough for
every "stack" (glib, python, etc.) to implement it themselves.
* rethink what happens when hitting resource limits in the bus
daemon, as discussed in an earlier sub-thread
* OOM handling code in the daemon is quite a burden, maybe there's a
better way http://blog.ometer.com/2008/02/04/out-of-memory-handling-d-bus-experience/
* config file format, security policy stuff... work to do here

Havoc

2015-04-22 21:48:40

by Linus Torvalds

Subject: Re: Issues with capability bits and meta-data in kdbus

On Wed, Apr 22, 2015 at 1:02 PM, Havoc Pennington <[email protected]> wrote:
> * the byte order marker in every message is silly

It's worse than that.

Conditional byte order is worse than silly - it's terminally stupid.

This is not a "per connection" thing or an "every message" thing. It's
more fundamental than that. Protocols that have dynamic byte orders
are pure and utter crap.

The only sane model is to specify one fixed byte order. Seriously.
It's equally portable, it generates better code - even on
architectures that then have to unconditionally do byte order swapping
- and it's simpler to add static type checks for etc. It's literally
less code and faster to do a "bswap" instruction than to do a
conditional test of some variable (even if you can then avoid the
bswap dynamically).

In other words, think networking, which statically just decided to use
big-endian. Sure, that was the wrong choice in the end, but even
picking the wrong endianness - but picking it statically - is better
than the horrible mistake of thinking that you should have some
variable byte order.

Linus
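
Linus's argument for one statically chosen byte order can be made concrete with a minimal C sketch. It is not taken from D-Bus or kdbus: it assumes a hypothetical wire format whose 32-bit fields are always little-endian, and the helper names (`get_le32`, `put_le32`, `get_u32_marked`) are made up for illustration. The last helper mimics the D-Bus approach, where the first byte of every message is `'l'` or `'B'` and each decoder has to branch on it.

```c
#include <stdint.h>

/* Fixed wire order: every 32-bit field is little-endian, always.
 * Building the value from bytes with shifts does not depend on the
 * host's own byte order, so there is no per-message flag to test. */
static inline uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

static inline void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Conditional order, for contrast: the decoder must branch on a marker
 * carried in each message ('l' or 'B' in D-Bus) before it can interpret
 * any field at all. */
static inline uint32_t get_u32_marked(const uint8_t *p, char marker)
{
    if (marker == 'l')
        return get_le32(p);
    return (uint32_t)p[0] << 24
         | (uint32_t)p[1] << 16
         | (uint32_t)p[2] << 8
         | (uint32_t)p[3];
}
```

With a fixed order, GCC and Clang generally recognize the shift-and-or pattern in `get_le32()` and emit a plain load on little-endian hosts and a load plus byte swap on big-endian ones, with no data-dependent branch.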

2015-04-23 05:36:05

by Havoc Pennington

Subject: Re: Issues with capability bits and meta-data in kdbus

On Wed, Apr 22, 2015 at 5:48 PM, Linus Torvalds
<[email protected]> wrote:
>
> Conditional byte order is worse than silly - it's terminally stupid.
>

Hey, usually I write a long rant myself, but I was trying to keep it
to one bullet point for once in my life. Way to ruin it, geez.

Havoc

2015-04-23 08:39:40

by Michele Curti

Subject: Re: Issues with capability bits and meta-data in kdbus

On Wed, Apr 22, 2015 at 04:02:34PM -0400, Havoc Pennington wrote:
> On Wed, Apr 22, 2015 at 10:35 AM, Michele Curti <[email protected]> wrote:
> >
> > Just out of curiosity, would you like to change something in dbus design,
> > if you didn't have to worry about ABI breaks and the like?
> >
>
> Good question. I can't remember any big-picture things, I'm sure the
> current maintainers and users have a longer list. :-) There are a
> variety of little small things, some examples I can immediately think
> of:
>
> * the ad hoc authentication protocol is sort of ugly
> * the byte order marker in every message is silly
> * protocol version in every message is useless
> * Ryan Lortie's nice fixes in GVariant, which I think kdbus adopts (
> https://people.gnome.org/~ryanl/gvariant-serialisation.pdf ), for the
> most part these are 'cleanups' but nullable types ("maybe" types for
> Haskell fans) are a notable semantic addition
> * specify how it works on Windows, the Windows port last I checked
> (years ago) didn't do things in a Windows-sensible way
> * specify what happens when resource limits are reached
> * wouldn't use XML for introspection data these days
> http://dbus.freedesktop.org/doc/dbus-specification.html#introspection-format
>

Nice, thanks!

It seems that all of these are userspace related only. Yes, I saw a "gvariant
readme" in the systemd sources; now I understand what it is (I'm not an expert) :D

My only fear was that kdbus was trying to keep something that even dbus itself
doesn't want. But it seems that this is not the case.

Thanks, regards,
Michele

2015-04-23 13:05:54

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 13, 2015 at 09:03:50PM +0200, Greg Kroah-Hartman wrote:
> The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
>
> Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
>
> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
>
> for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
>
> kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
>
> ----------------------------------------------------------------

Given this has been a crazy email thread, let's try to figure out what
the status is here.

Al Viro pointed out some odd locking (r/w lock only used in write mode),
and asked for some more documentation / description of the object model
used here. David provided that, and will send a minor fix for the rw
lock, so I think that issue is now resolved. David has created a few
other minor changes based on Al's review that I will forward on later.

Andy's concerns about the capability stuff has been hashed out in
multiple threads here. The kernel code isn't buggy as-designed or
implemented from what we can all tell, it's just that the new
functionality isn't liked by everyone, which is totally fair, but not a
reason to declare that the function isn't useful.

Alan, and others, want a tiny, generic, multi-cast IPC method that also
works across networks. They feel that this is something that D-Bus
might be able to use in the future in userspace to build on top of.
Lots of people have said they want something like this for years, but
that doesn't address the issue here with kdbus, which is a very specific
solution for a very common and wide-spread usage model that Linux
userspace relies on today. I too would love to see such an IPC be
created, and two years ago thought it would be possible to achieve
here. But over time, and in working with the D-Bus model and
requirements, it just didn't happen here. Given that no one has ever
been able to accomplish such a thing in the past means that it's either
impossible to do, or that no one really wants such a thing bad enough to
actually do the work :)

Did I miss anything else here? Are there any technical reasons I'm
forgetting about for why this can't be pulled in as-is for this merge
window?

As for merging this, due to some changes in the vfs tree, specifically
due to 5d5d56897530 ("make new_sync_{read,write}() static"), after the
kdbus code is merged with your latest tree, it can cause problems, as
reported by Sergei Zviagintsev. I didn't want to rebase anything, and
solving the issue against 3.19 would require us to export __vfs_read(),
as Al already did in your tree, so you can just merge it, and then apply
the patch I'll send in response to this message for it, which resolves
the issue.

thanks,

greg k-h

2015-04-23 13:06:24

by Greg Kroah-Hartman

Subject: [PATCH] kdbus: pool: use __vfs_read()

From: Sergei Zviagintsev <[email protected]>

After commit 5d5d56897530 ("make new_sync_{read,write}() static")
->read() cannot be called directly.

kdbus_pool_slice_copy() leads to an oops, which can be reproduced by
launching tools/testing/selftests/kdbus/kdbus-test -t message-quota:

[ 1167.146793] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1167.147554] IP: [< (null)>] (null)
[ 1167.148670] PGD 3a9dd067 PUD 3a841067 PMD 0
[ 1167.149611] Oops: 0010 [#1] SMP
[ 1167.150088] Modules linked in: nfsv3 nfs kdbus lockd grace sunrpc
[ 1167.150771] CPU: 0 PID: 518 Comm: kdbus-test Not tainted 4.0.0-next-20150420-kdbus #62
[ 1167.150771] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 1167.150771] task: ffff88003daed120 ti: ffff88003a800000 task.ti: ffff88003a800000
[ 1167.150771] RIP: 0010:[<0000000000000000>] [< (null)>] (null)
[ 1167.150771] RSP: 0018:ffff88003a803bc0 EFLAGS: 00010286
[ 1167.150771] RAX: ffff8800377fb000 RBX: 00000000000201e8 RCX: ffff88003a803c00
[ 1167.150771] RDX: 0000000000000b40 RSI: ffff8800377fb4c0 RDI: ffff88003d815700
[ 1167.150771] RBP: ffff88003a803c48 R08: ffffffff8139e380 R09: ffff880039d80490
[ 1167.150771] R10: ffff88003a803a90 R11: 00000000000004c0 R12: 00000000002a24c0
[ 1167.150771] R13: 0000000000000b40 R14: ffff88003d815700 R15: ffffffff8139e460
[ 1167.150771] FS: 00007f41dccd4740(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 1167.150771] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1167.150771] CR2: 0000000000000000 CR3: 000000003ccdf000 CR4: 00000000000007b0
[ 1167.150771] Stack:
[ 1167.150771] ffffffffa0065497 ffff88003a803c10 00007ffffffff000 ffff88003aaa67c0
[ 1167.150771] 00000000000004c0 ffff88003aaa6870 ffff88003ca83300 ffffffffa006537d
[ 1167.150771] 00000000000201e8 ffffea0000ddfec0 ffff88003a803c20 0000000000000018
[ 1167.150771] Call Trace:
[ 1167.150771] [<ffffffffa0065497>] ? kdbus_pool_slice_copy+0x127/0x200 [kdbus]
[ 1167.150771] [<ffffffffa006537d>] ? kdbus_pool_slice_copy+0xd/0x200 [kdbus]
[ 1167.150771] [<ffffffffa006670a>] kdbus_queue_entry_move+0xaa/0x180 [kdbus]
[ 1167.150771] [<ffffffffa0059e64>] kdbus_conn_move_messages+0x1e4/0x2c0 [kdbus]
[ 1167.150771] [<ffffffffa006234e>] kdbus_name_acquire+0x31e/0x390 [kdbus]
[ 1167.150771] [<ffffffffa00625c5>] kdbus_cmd_name_acquire+0x125/0x130 [kdbus]
[ 1167.150771] [<ffffffffa005db5d>] kdbus_handle_ioctl+0x4ed/0x610 [kdbus]
[ 1167.150771] [<ffffffff811040e0>] do_vfs_ioctl+0x2e0/0x4e0
[ 1167.150771] [<ffffffff81389750>] ? preempt_schedule_common+0x1f/0x3f
[ 1167.150771] [<ffffffff8110431c>] SyS_ioctl+0x3c/0x80
[ 1167.150771] [<ffffffff8138c36e>] system_call_fastpath+0x12/0x71
[ 1167.150771] Code: Bad RIP value.
[ 1167.150771] RIP [< (null)>] (null)
[ 1167.150771] RSP <ffff88003a803bc0>
[ 1167.150771] CR2: 0000000000000000
[ 1167.168756] ---[ end trace a676bcfa75db5a96 ]---

Use __vfs_read() instead.

Signed-off-by: Sergei Zviagintsev <[email protected]>
Reviewed-by: David Herrmann <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
ipc/kdbus/pool.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ipc/kdbus/pool.c b/ipc/kdbus/pool.c
index 139bb77056b3..45dcdea505f4 100644
--- a/ipc/kdbus/pool.c
+++ b/ipc/kdbus/pool.c
@@ -675,7 +675,7 @@ int kdbus_pool_slice_copy(const struct kdbus_pool_slice *slice_dst,
}

kaddr = (char __force __user *)kmap(page) + page_off;
- n_read = f_src->f_op->read(f_src, kaddr, copy_len, &off_src);
+ n_read = __vfs_read(f_src, kaddr, copy_len, &off_src);
kunmap(page);
mark_page_accessed(page);
flush_dcache_page(page);
--
2.3.6
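
The patch above replaces a direct `f_op->read` call with `__vfs_read()`. A simplified userspace model of why that matters follows, assuming (as the NULL RIP in the oops suggests) that after 5d5d56897530 the pool's backing file no longer provides `->read` and implements only `->read_iter`. `struct fake_file`, `struct fake_fops` and `fake_vfs_read()` are hypothetical stand-ins; in the kernel the `->read_iter` fallback goes through `new_sync_read()`, which wraps the buffer in an `iov_iter` rather than passing it through directly.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, much-simplified stand-ins for struct file and
 * struct file_operations; only the two read hooks are modeled. */
struct fake_file;

struct fake_fops {
    ssize_t (*read)(struct fake_file *f, char *buf, size_t len);
    ssize_t (*read_iter)(struct fake_file *f, char *buf, size_t len);
};

struct fake_file {
    const struct fake_fops *f_op;
};

/* Before the fix, the pool code called f->f_op->read(...) directly.
 * With ->read == NULL that is a call through a NULL pointer, matching
 * the "RIP: (null)" oops. A __vfs_read()-style helper checks which
 * hook exists first and fails cleanly if neither does. */
static ssize_t fake_vfs_read(struct fake_file *f, char *buf, size_t len)
{
    if (f->f_op->read)
        return f->f_op->read(f, buf, len);
    if (f->f_op->read_iter)
        return f->f_op->read_iter(f, buf, len);
    return -EINVAL;
}

/* Example file type that implements only ->read_iter; it fills the
 * buffer with 'x' in place of real file contents. */
static ssize_t iter_only_read_iter(struct fake_file *f, char *buf, size_t len)
{
    (void)f;
    for (size_t i = 0; i < len; i++)
        buf[i] = 'x';
    return (ssize_t)len;
}

static const struct fake_fops iter_only_fops = {
    .read = NULL,
    .read_iter = iter_only_read_iter,
};
```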

2015-04-23 14:18:29

by Alan Cox

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

> Alan, and others, want a tiny, generic, multi-cast IPC method that also
> works across networks.  They feel that this is something that D-Bus

I never said - across networks. And locally it has been done, even
microcontrollers have done it.

> Lots of people have said they want something like this for years, but
> that doesn't address the issue here with kdbus, which is a very specific
> solution for a very common and wide-spread usage model that Linux

You've missed off a variety of important points that have been raised

- whether it's a dumb model performance-wise compared with using it to set
up a memfd or similar
- cgroup interactions
- the heavyweight nature of going via get_user_pages and __vfs_read rather
than just assuming message sizes are sensibly constrained and could far
better just be allocated and copied to a refcounted kernel buffer
- exposure of capabilities and how you futureproof it

> userspace relies on today.  I too would love to see such an IPC be
> created, and two years ago thought it would be possible to achieve
> here.  But over time, and in working with the D-Bus model and
> requirements, it just didn't happen here.  Given that no one has ever
> been able to accomplish such a thing in the past means that it's either
> impossible to do, or that no one really wants such a thing bad enough to
> actually do the work :)
>
> Did I miss anything else here?  Are there any technical reasons I'm
> forgetting about for why this can't be pulled in as-is for this merge
> window?

Like the outstanding NACKS ?

Greg - you are sounding like you have some kind of special entitlement to
ignore the way this works for everyone else. If you are feeling
frustrated, annoyed and led up several avenues at once then welcome to
the world of every other submitter who doesn't think they have some kind of
magic stage door pass to get their crap in the kernel when there
are core maintainers asking hard and unanswered questions and who have
nacked it.

There's no huge hurry. There are a bunch of things like the interactions
with cgroups, and the privilege and capability model which need careful
examination. Slipping it one release to get that right isn't a big deal -
it's not even as if you can't use hardware without it as with a driver
missing a merge - this is just a performance tweak.

Alan
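
Alan's alternative to the pool design is to bound message sizes and copy each message once into a refcounted kernel buffer that all receivers share. A userspace C11 sketch of that data structure follows. `struct msg_buf`, its helpers and the 4 KiB `MSG_BUF_MAX` cap are hypothetical; an in-kernel version would more likely use `kmalloc()` and `struct kref` than `malloc()` and C11 atomics.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define MSG_BUF_MAX 4096  /* hypothetical per-message size cap */

/* One kernel-side copy of a message, shared by every receiver. */
struct msg_buf {
    atomic_int refs;
    size_t len;
    unsigned char data[];  /* payload (flexible array member) */
};

/* Copy the sender's payload once; the caller owns the first reference.
 * Oversized messages are rejected. */
static struct msg_buf *msg_buf_new(const void *payload, size_t len)
{
    if (len > MSG_BUF_MAX)
        return NULL;
    struct msg_buf *m = malloc(sizeof(*m) + len);
    if (!m)
        return NULL;
    atomic_init(&m->refs, 1);
    m->len = len;
    memcpy(m->data, payload, len);
    return m;
}

/* Each additional receiver takes a reference instead of a copy. */
static struct msg_buf *msg_buf_get(struct msg_buf *m)
{
    atomic_fetch_add(&m->refs, 1);
    return m;
}

/* Drop a reference; the last one frees the buffer. Returns 1 if freed. */
static int msg_buf_put(struct msg_buf *m)
{
    if (atomic_fetch_sub(&m->refs, 1) == 1) {
        free(m);
        return 1;
    }
    return 0;
}
```

For a multicast to N receivers the payload is copied once and each delivery costs one `msg_buf_get()`; whether that beats the pool scheme in practice is the open performance question raised in this thread.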

2015-04-23 16:36:23

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 03:05:48PM +0200, Greg Kroah-Hartman wrote:
>
> Andy's concerns about the capability stuff has been hashed out in
> multiple threads here. The kernel code isn't buggy as-designed or
> implemented from what we can all tell, it's just that the new
> functionality isn't liked by everyone, which is totally fair, but not a
> reason to declare that the function isn't useful.

Andy, did I capture your existing position correctly? If we drop the
caps metadata, I'm guessing that you are ok with the code as you have
reviewed it and tested it out. So should I just add a small patch that
removes this for now? After that, we can discuss the addition of
capabilities to the metadata as an add-on feature with a future patch
and not hold up this larger merge request?

thanks,

greg k-h

2015-04-23 16:46:47

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 9:36 AM, Greg Kroah-Hartman
<[email protected]> wrote:
> On Thu, Apr 23, 2015 at 03:05:48PM +0200, Greg Kroah-Hartman wrote:
>>
>> Andy's concerns about the capability stuff has been hashed out in
>> multiple threads here. The kernel code isn't buggy as-designed or
>> implemented from what we can all tell, it's just that the new
>> functionality isn't liked by everyone, which is totally fair, but not a
>> reason to declare that the function isn't useful.
>
> Andy, did I capture your existing position correctly? If we drop the
> caps metadata, I'm guessing that you are ok with the code as you have
> reviewed it and tested it out. So should I just add a small patch that
> removes this for now? After that, we can discuss the addition of
> capabilities to the metadata as an add-on feature with a future patch
> and not hold up this larger merge request?

No. I can fish out lists I've posted of what I personally dislike.
To repeat from my not-yet-awake memory, briefly:

- starttime, cmdline, and possibly other pieces of metadata are also
problematic. I think starttime is especially bad because it both
breaks CRIU and is IMO completely unnecessary -- I sent out draft
"highpid" patches a while ago to give a much better alternative that
isn't racy and won't break CRIU. But cmdline is also IMO ridiculous.

- There's still an open performance question. Namely: is kdbus performant?

- The policy system still sucks. Now, if we give up on the idea of
anyone ever using it for anything other than dbus as it currently
works, maybe this isn't a real problem.

- Someone should probably convince someone who understands memory
accounting that the pool mechanism accounts memory acceptably. I
don't know much about mm stuff, but I think it's subject to all kinds
of nasty latency and accounting abuses, some of which might even be
exploited by accident.

I haven't reviewed most of it. I've reviewed the metadata code (and
not recently) and the pool *docs*.

Shouldn't the bulk of this code have actual review before it gets
merged? I've only reviewed some of it, and I didn't like what I found
in that small fraction, hence my objections to caps.

--Andy

>
> thanks,
>
> greg k-h



--
Andy Lutomirski
AMA Capital Management, LLC

2015-04-23 17:16:46

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 09:46:22AM -0700, Andy Lutomirski wrote:
> On Thu, Apr 23, 2015 at 9:36 AM, Greg Kroah-Hartman
> <[email protected]> wrote:
> > On Thu, Apr 23, 2015 at 03:05:48PM +0200, Greg Kroah-Hartman wrote:
> >>
> >> Andy's concerns about the capability stuff has been hashed out in
> >> multiple threads here. The kernel code isn't buggy as-designed or
> >> implemented from what we can all tell, it's just that the new
> >> functionality isn't liked by everyone, which is totally fair, but not a
> >> reason to declare that the function isn't useful.
> >
> > Andy, did I capture your existing position correctly? If we drop the
> > caps metadata, I'm guessing that you are ok with the code as you have
> > reviewed it and tested it out. So should I just add a small patch that
> > removes this for now? After that, we can discuss the addition of
> > capabilities to the metadata as an add-on feature with a future patch
> > and not hold up this larger merge request?
>
> No. I can fish out lists I've posted of what I personally dislike.
> To repeat from my not-yet-awake memory, briefly:
>
> - starttime, cmdline, and possibly other pieces of metadata are also
> problematic. I think starttime is especially bad because it both
> breaks CRIU and is IMO completely unnecessary -- I sent out draft
> "highpid" patches a while ago to give a much better alternative that
> isn't racy and won't break CRIU. But cmdline is also IMO ridiculous.

starttime was removed a while ago, are you sure you are looking at the
latest code?

cmdline has been discussed and it really helps with debugging.
Decisions aren't being made based on it.

> - There's still an open performance question. Namely: is kdbus performant?

Yes, I thought that was already answered. Tizen posted some numbers
with a much older version of the code, before David fixed a bunch of
issues that he and you found, and that averaged between 25-50% faster.
Details are in this presentation:
http://download.tizen.org/misc/media/conference2014/slides/tdc2014-kdbus-in-tizen3.pdf

The Tizen and GENIVI developers are off running numbers with the latest
code, or so they told me through emails, but I don't know when/if that
will ever happen, so I can't promise more than what is already here.

> - The policy system still sucks. Now, if we give up on the idea of
> anyone ever using it for anything other than dbus as it currently
> works, maybe this isn't a real problem.

As designed, it's for D-Bus, so there's not much I can suggest here,
this isn't a "generic IPC" :)

The binder developers at Samsung have stated that the implementation we
have here works for their model as well, so I guess that is some kind of
verification it's not entirely tied to D-Bus. They have plans on
dropping the existing binder kernel code and using the kdbus code
instead when it is merged.

> - Someone should probably convince someone who understands memory
> accounting that the pool mechanism accounts memory acceptably. I
> don't know much about mm stuff, but I think it's subject to all kinds
> of nasty latency and accounting abuses, some of which might even be
> exploited by accident.

Michal and David agree that this all works properly. I don't know of
anyone else to ask about it, do you?

> I haven't reviewed most of it. I've reviewed the metadata code (and
> not recently) and the pool *docs*.
>
> Shouldn't the bulk of this code have actual review before it gets
> merged? I've only reviewed some of it, and I didn't like what I found
> in that small fraction, hence my objections to caps.

I'd love more review, and we have been asking for it since last October.
You provided a lot of it a while ago, and that helped immensely.

I can't force anyone to read the code, I can only go on what people
offer to do. We have 3 signed-off-bys on the main kdbus patches, and
numerous other different developers have provided fixes / tweaks that
are in this tree, so it's not like this is unread/unposted code here at
all.

thanks,

greg k-h

2015-04-23 17:34:56

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 10:16 AM, Greg Kroah-Hartman
<[email protected]> wrote:
> On Thu, Apr 23, 2015 at 09:46:22AM -0700, Andy Lutomirski wrote:
>> On Thu, Apr 23, 2015 at 9:36 AM, Greg Kroah-Hartman
>> <[email protected]> wrote:
>> > On Thu, Apr 23, 2015 at 03:05:48PM +0200, Greg Kroah-Hartman wrote:
>> >>
>> >> Andy's concerns about the capability stuff has been hashed out in
>> >> multiple threads here. The kernel code isn't buggy as-designed or
>> >> implemented from what we can all tell, it's just that the new
>> >> functionality isn't liked by everyone, which is totally fair, but not a
>> >> reason to declare that the function isn't useful.
>> >
>> > Andy, did I capture your existing position correctly? If we drop the
>> > caps metadata, I'm guessing that you are ok with the code as you have
>> > reviewed it and tested it out. So should I just add a small patch that
>> > removes this for now? After that, we can discuss the addition of
>> > capabilities to the metadata as an add-on feature with a future patch
>> > and not hold up this larger merge request?
>>
>> No. I can fish out lists I've posted of what I personally dislike.
>> To repeat from my not-yet-awake memory, briefly:
>>
>> - starttime, cmdline, and possibly other pieces of metadata are also
>> problematic. I think starttime is especially bad because it both
>> breaks CRIU and is IMO completely unnecessary -- I sent out draft
>> "highpid" patches a while ago to give a much better alternative that
>> isn't racy and won't break CRIU. But cmdline is also IMO ridiculous.
>
> starttime was removed a while ago, are you sure you are looking at the
> latest code?

No, I'm sure I haven't. I looked at the latest code just long enough
to see that caps were still there. So the latest code is unreviewed
by me or, as far as I can tell, by anyone else who should review it.

>
> cmdline has been discussed and it really helps with debugging.
> Decisions aren't being made based on it.

This might be addressed by the module parameter. Haven't checked
recent versions.

None of this addresses the fact that metadata is captured both at send
and connect time. I still think that this is asking for tons of
security problems down the line.

>
>> - There's still an open performance question. Namely: is kdbus performant?
>
> Yes, I thought that was already answered. Tizen posted some numbers
> with a much older version of the code, before David fixed a bunch of
> issues that he and you found, and that averaged between 25-50% faster.
> Details are in this presentation:
> http://download.tizen.org/misc/media/conference2014/slides/tdc2014-kdbus-in-tizen3.pdf
>

AFAICS no one has ever even tried to address whether the kdbus design
(shmem pools, send-time metadata, plus optional memfd) gives as good
performance as plain ol' sockets. A lot of the complexity of kdbus is
due to its novel buffering scheme, and that scheme AFAICS has only
been seriously benchmarked against userspace dbus, which is a poor
reference. I neither see any compelling a priori reason to think that
the buffering scheme is a performance win, nor do I see good numbers.
Instead, I've seen numbers suggesting that it's much slower than
AF_UNIX peer to peer.

I realize that it looks like I'm comparing apples (peer to peer) to
oranges (bus), but that's just because AF_UNIX really is the best
comparison in the absence of a serious attempt at a socket-like bus
with benchmarks.

> The Tizen and GENIVI developers are off running numbers with the latest
> code, or so they told me through emails, but I don't know when/if that
> will ever happen, so I can't promise more than what is already here.
>
>> - The policy system still sucks. Now, if we give up on the idea of
>> anyone ever using it for anything other than dbus as it currently
>> works, maybe this isn't a real problem.
>
> As designed, it's for D-Bus, so there's not much I can suggest here,
> this isn't a "generic IPC" :)

Move it to userspace with a daemon that answers policy questions and
makes introductions?

>
>> - Someone should probably convince someone who understands memory
>> accounting that the pool mechanism accounts memory acceptably. I
>> don't know much about mm stuff, but I think it's subject to all kinds
>> of nasty latency and accounting abuses, some of which might even be
>> exploited by accident.
>
> Michal and David agree that this all works properly. I don't know of
> anyone else to ask about it, do you?

I thought Michal was a little less convinced. I really don't see
why pages allocated due to sends would be charged to the receiver, nor
do I see why, even if that were fixed, it wouldn't be a serious
performance problem with memcgs and memory pressure in play.

I'm really surprised that GENIVI is okay with this. The latency seems
like it will be highly unpredictable.

>
>> I haven't reviewed most of it. I've reviewed the metadata code (and
>> not recently) and the pool *docs*.
>>
>> Shouldn't the bulk of this code have actual review before it gets
>> merged? I've only reviewed some of it, and I didn't like what I found
>> in that small fraction, hence my objections to caps.
>
> I'd love more review, and we have been asking for it since last October.
> You provided a lot of it a while ago, and that helped immensely.
>
> I can't force anyone to read the code, I can only go on what people
> offer to do. We have 3 signed-off-bys on the main kdbus patches, and
> numerous other different developers have provided fixes / tweaks that
> are in this tree, so it's not like this is unread/unposted code here at
> all.

I think it doesn't help that reviewing the code can be a painful
exercise when threads about a single review point drag on for hundreds
of posts. Also, it's discouraging that, after a single review point
results in hundreds of posts, reviewers get asked whether everything's
okay now.

--Andy
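
Andy's suggested reference point is plain AF_UNIX. As a rough idea of what such a baseline measurement could look like, here is a minimal C sketch of a peer-to-peer ping-pong over an `AF_UNIX`/`SOCK_SEQPACKET` socketpair between a parent and a forked echo child, assuming a Linux/POSIX host. `unix_pingpong()` is a hypothetical helper, not a benchmark anyone in the thread ran, and it measures only one-to-one round trips with no bus routing, metadata or policy, which is exactly the apples-to-oranges caveat Andy raises.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Ping-pong `rounds` messages of `msg_len` bytes between this process
 * and a forked echo child over an AF_UNIX socketpair. Returns the
 * number of completed round trips (or -1 on setup failure) and stores
 * the mean round-trip time in nanoseconds in *ns_per_rt. */
static long unix_pingpong(long rounds, size_t msg_len, double *ns_per_rt)
{
    char buf[256];
    int sv[2];
    long done = 0;
    struct timespec t0, t1;

    if (msg_len == 0 || msg_len > sizeof(buf))
        return -1;
    /* SOCK_SEQPACKET preserves message boundaries, like a bus message. */
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: echo each message until the parent closes its end. */
        ssize_t n;
        close(sv[0]);
        while ((n = read(sv[1], buf, sizeof(buf))) > 0)
            if (write(sv[1], buf, (size_t)n) != n)
                break;
        _exit(0);
    }

    close(sv[1]);
    memset(buf, 'a', msg_len);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (done < rounds) {
        if (write(sv[0], buf, msg_len) != (ssize_t)msg_len)
            break;
        if (read(sv[0], buf, sizeof(buf)) != (ssize_t)msg_len)
            break;
        done++;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(sv[0]);
    waitpid(pid, NULL, 0);

    if (ns_per_rt && done > 0)
        *ns_per_rt = ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
                      (double)(t1.tv_nsec - t0.tv_nsec)) / (double)done;
    return done;
}
```

Calling `unix_pingpong(100000, 64, &ns)` reports the mean round-trip time for 64-byte messages; a fair comparison would push the same message pattern through the bus under test.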

2015-04-23 17:43:36

by Stephen Smalley

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 04/23/2015 01:16 PM, Greg Kroah-Hartman wrote:
> The binder developers at Samsung have stated that the implementation we
> have here works for their model as well, so I guess that is some kind of
> verification it's not entirely tied to D-Bus. They have plans on
> dropping the existing binder kernel code and using the kdbus code
> instead when it is merged.

Where do things stand wrt LSM hooks for kdbus? I don't see any security
hook calls in the kdbus tree except for the purpose of metadata
collection of process security labels. But nothing for enforcing MAC
over kdbus IPC. binder has a set of security hooks for that purpose, so
it would be a regression wrt MAC enforcement to switch from binder to
kdbus without equivalent checking there.

2015-04-23 17:57:42

by Linus Torvalds

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 10:16 AM, Greg Kroah-Hartman
<[email protected]> wrote:
>>
>> - starttime, cmdline, and possibly other pieces of metadata are also
>> problematic. I think starttime is especially bad because it both
>> breaks CRIU and is IMO completely unnecessary -- I sent out draft
>> "highpid" patches a while ago to give a much better alternative that
>> isn't racy and won't break CRIU. But cmdline is also IMO ridiculous.
>
> starttime was removed a while ago, are you sure you are looking at the
> latest code?
>
> cmdline has been discussed and it really helps with debugging.
> Decisions aren't being made based on it.

Quite frankly, I personally find cmdline/comm etc *much* worse than
sending the capabilities.

The whole notion of knowing "the other end is root" (or more
specifically some capability like "the other end can access raw
hardware") I think is a thing that absolutely makes sense in any
communication channel. I really don't even see why it would be
conditional. I mean, it's not exactly a secret anyway, and it just
makes *sense* for any protocol that may end up doing operations _for_
the recipient.

Same goes for uid etc - if you are implementing a service daemon, the
uid of the requester sure as hell makes a ton of difference in what
you might want to expose. Things like "does this user have access
rights to the printer?" are very natural questions to ask.

So I really don't understand why that part is even controversial.
kdbus wasn't meant to be some generic IPC mechanism. It is meant as a
way to talk to system daemons.

So the whole "capabilities and user information" is really to me a
non-issue. It's clearly required information, and if you don't want to
expose it, you damn well have absolutely *zero* business talking to
system daemons.

Really, it's that simple.

But things like "comm" and the cmdline? That makes me nervous. There
are real privacy issues there. Sure, maybe you think it's useful for
debugging, but the very fact that you think it's useful for debugging
makes me suspect you might be logging it (for future debugging). And
quite frankly, I don't think you should be logging things like that.
Yes, yes, if you're a system admin, you can find those things out, but
they should *not* be something that you just end up logging by mistake
or because "it's easy and all the information is right there".

If somebody is printing something, it shouldn't matter if it's "lpr"
or "firefox http://horses.and.trannyporn.my.little.pony.com/" that
does the printing.

And you can go "but we don't log it" all you want. It's still a bad
idea. Sane people should refuse to allow a system service to see those
kinds of things by default, for a very simple reason: it's none of
their business.

So I'd suggest just getting rid of "tid_comm/pid_comm/cmdline". There
is no possible valid excuse for them. They aren't trustworthy anyway
(ie a real attacker can obfuscate them easily), and they *are*
potentially sensitive.

[ Side note: the tid_comm/pid_comm ones depend on TASK_COMM_LEN
anyway, which might change. a 16-byte command name used to be insanely
long in the traditional unix environment, but these days it's actually
regularly a truncated name due to programs called things like
"gnome-shell-extension-prefs" or
"abrt-action-generate-core-backtrace". ]

Linus

2015-04-23 18:04:37

by Linus Torvalds

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 10:57 AM, Linus Torvalds
<[email protected]> wrote:
>
> If somebody is printing something, it shouldn't matter if it's "lpr"
> or "firefox http://horses.and.trannyporn.my.little.pony.com/" that
> does the printing.

And btw, it's not just "this is information that shouldn't be logged".

It's literally "information that should not *ever* be used". I can
easily see some phone manufacturer deciding to do "value add" by
adding a special case where a special vendor system manager program
gets a back door to some service, because it needs to access the
camera for user identification at login time, so there's some magic

	if (!strcmp(client->pid_comm, "vendor-login-pr"))
		return ACCESS_OK;

because "it was the simplest way to do this", and the programmer knew
it was a hack, but he needed to get it working because he had a
deadline yesterday.

And then somebody figures this out, and makes an app that takes
pictures on your phone surreptitiously.

No, we can't protect against vendors doing stupid things, but we very
much also shouldn't make the kernel have interfaces that basically
encourage people to do stupid things because they make irrelevant and
wrongheaded data available.

Linus
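
Linus's hypothetical `strcmp()` on `pid_comm` is unsafe because a process chooses its own comm (for example via `prctl(PR_SET_NAME)`), so any app could call itself `vendor-login-pr`. The alternative he argues for is deciding on identity the kernel vouches for, such as the uid. kdbus's own metadata API is not shown in this thread, so the C sketch below uses Linux's `SO_PEERCRED` on a connected AF_UNIX socket as a stand-in; `peer_is_allowed()` and its `allowed_uid` policy input are hypothetical.

```c
#define _GNU_SOURCE  /* struct ucred and SO_PEERCRED on glibc */
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Decide on the identity the kernel attached to the peer of a connected
 * AF_UNIX socket, never on a name the peer picked for itself.
 * `allowed_uid` is a hypothetical policy input for this sketch. */
static bool peer_is_allowed(int fd, uid_t allowed_uid)
{
    struct ucred cred;
    socklen_t len = sizeof(cred);

    /* Fail closed if the kernel cannot tell us who is on the other end. */
    if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) < 0)
        return false;
    return cred.uid == allowed_uid;
}
```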

2015-04-23 18:33:50

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 3:05 PM, Greg Kroah-Hartman
<[email protected]> wrote:
> Did I miss anything else here? Are there any technical reasons I'm
> forgetting about for why this can't be pulled in as-is for this merge
> window?

Maybe I get again accused of ``being a jerk'' but I still dare to ask about
Boris' unanswered question:
http://marc.info/?l=linux-kernel&m=142969313220781

In fact Boris and I are currently reviewing the code but it is a slow
process as we both have day jobs...
AFAICT Steven is also looking at it.

--
Thanks,
//richard

2015-04-23 18:49:03

by Linus Torvalds

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 10:57 AM, Linus Torvalds
<[email protected]> wrote:
>
> Same goes for uid etc - if you are implementing a service daemon, the
> uid of the requester sure as hell makes a ton of difference in what
> you might want to expose. Things like "does this user have access
> rights to the printer?" are very natural questions to ask.

Hmm. Looking at the code, it strikes me that not only does
kdbus_meta_proc_collect() collect too much, but some of what it
collects it just seems to do *wrong*.

So I agree with collecting user and credential information (obviously
unlike some people ;), but I think the code that does it is just
wrong.

The way to collect user and credential information is very simple: you
look at "file->f_cred".

That's _it_. Nothing more. Maybe you do "get_cred(file->f_cred)" if
you need the credentials to outlive the "struct file". But you
don't copy the fields individually or willy-nilly.

That "struct cred" reference gets you all you need. It gets you the
supplementary groups. It gets you the capabilities. It gets you the
user and group IDs.

And equally importantly, it gets you the namespace so that you can do
conversions to random target namespaces later, when you actually *use*
the information.

There might be some question about whether you should use
"current->cred" or "file->f_cred", but the latter is almost always the
right thing to use when you are doing file operations. The unix
filesystem security model is about permissions at open time, not at
use time.

Linus

2015-04-23 18:56:41

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 11:04:36AM -0700, Linus Torvalds wrote:
> On Thu, Apr 23, 2015 at 10:57 AM, Linus Torvalds
> <[email protected]> wrote:
> >
> > If somebody is printing something, it shouldn't matter if it's "lpr"
> > or "firefox http://horses.and.trannyporn.my.little.pony.com/" that
> > does the printing.
>
> And btw, it's not just "this is information that shouldn't be logged".
>
> It's literally "information that should not *ever* be used". I can
> easily see some phone manufacturer deciding to do "value add" by
> adding a special case where a special vendor system manager program
> gets a back door to some service, because it needs to access the
> camera for user identification at login time, so there's some magic
>
> if (!strcmp(client->pid_comm, "vendor-login-pr"))
> return ACCESS_OK;
>
> because "it was the simplest way to do this", and the programmer knew
> it was a hack, but he needed to get it working because he had a
> deadline yesterday.
>
> And then somebody figures this out, and makes an app that takes
> pictures on your phone surreptitiously.
>
> No, we can't protect against vendors doing stupid things, but we very
> much also shouldn't make the kernel have interfaces that basically
> encourage people to do stupid things because they make irrelevant and
> wrongheaded data available.

Doing access control based on comm and cmdline is horrid, I totally
agree. But right now, any process in the system can read any other
process's comm and cmdline value out of /proc today. So removing it
from the metadata is fine for kdbus, I can live with that, but it really
isn't "preventing" anything that's not already visible to everyone, so
if someone wanting to be "bad" could always still log it or do anything
else they wanted with it.

Doesn't syslog use it today all over the place for logging stuff that
happens in the system?

Or am I missing something here?

thanks,

greg k-h

2015-04-23 19:02:15

by Eric W. Biederman

Subject: Kdbus needs meaningful review (was: Re: [GIT PULL] kdbus for 4.1-rc1)

Greg Kroah-Hartman <[email protected]> writes:

> On Mon, Apr 13, 2015 at 09:03:50PM +0200, Greg Kroah-Hartman wrote:
>> The following changes since commit 9eccca0843205f87c00404b663188b88eb248051:
>>
>> Linux 4.0-rc3 (2015-03-08 16:09:09 -0700)
>>
>> are available in the git repository at:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/kdbus-4.1-rc1
>>
>> for you to fetch changes up to 9fb9cd0f4434a23487b6ef3237e733afae90e336:
>>
>> kdbus: avoid the use of struct timespec (2015-04-10 14:34:53 +0200)
>>
>> ----------------------------------------------------------------
>
> Given this has been a crazy email thread, let's try to figure out what
> the status is here.
>
> Al Viro pointed out some odd locking (r/w lock only used in write mode),
> and asked for some more documentation / description of the object model
> used here.  David provided that, and will send a minor fix for the rw
> lock, so I think that issue is now resolved. David has created a few
> other minor changes based on Al's review that I will forward on later.

As I recall, Al could not even trace through all of the locking, and that
prevented him from looking at any of the bigger issues. A little patch
or two does not qualify as everything being fixed.

Perhaps with the minor patch or two Al could potentially actually review
the code.

> Andy's concerns about the capability stuff have been hashed out in
> multiple threads here.  The kernel code isn't buggy as-designed or
> implemented from what we can all tell, it's just that the new
> functionality isn't liked by everyone, which is totally fair, but not a
> reason to declare that the function isn't useful.

There are in fact implementation bugs, and your assertion that there are
none is the largest reason this code should not be merged. You are
turning a blind eye to problems.

> Alan, and others, want a tiny, generic, multi-cast IPC method that also
> works across networks.  They feel that this is something that D-Bus
> might be able to use in the future in userspace to build on top of. 
> Lots of people have said they want something like this for years, but
> that doesn't address the issue here with kdbus, which is a very specific
> solution for a very common and wide-spread usage model that Linux
> userspace relies on today.  I too would love to see such an IPC be
> created, and two years ago thought it would be possible to achieve
> here.  But over time, and in working with the D-Bus model and
> requirements, it just didn't happen here.  The fact that no one has ever
> been able to accomplish such a thing in the past means that it's either
> impossible to do, or that no one really wants such a thing bad enough to
> actually do the work :)

What is the rush?

> Did I miss anything else here?  Are there any technical reasons I'm
> forgetting about for why this can't be pulled in as-is for this merge
> window?

****The code has not been meaningfully or properly reviewed.****


Greg, you are pushing entirely too hard for this code to get in. When
someone pushes as hard as you are doing, inevitable problems get
through.

Greg, it is my professional opinion that the code smells. There are all
sorts of missteps and oversights that indicate that something important
has almost certainly been overlooked.

I do not believe this code is yet up to the standards we want for core
kernel code.

This code has astonishingly complex interactions with all kinds of other
kernel subsystems and concerns. As a community we should understand
them and accept them before letting them in.


The only way I have seen anything make meaningful progress with those
kinds of interactions is for the pieces to be teased apart, and then for
the code to be added incrementally, with all of the right people
pulled into the discussion.

I suspect removing all of the extensions to the capabilities of dbus-1
would be a good start for getting a piece of code that could be
meaningfully reviewed. Then the controversial bits can be addressed on
their own. As it is, there is too much here to properly address any one
issue.


Greg this process has fundamentally not given people time to understand
the code, the interactions or the complexities. The discussions show
that.


Further, by refusing to tease apart the pieces, by refusing to allow
other people time to understand this code, and by refusing to give an inch
and admit anyone else has a valid point, you ensure that real problems and
real issues cannot be revealed and fixed.

With such a pig-headed direction I do not believe that kdbus is in any
sense ready for merging.


Eric

p.s. One of the smells I have been talking about is that I see in
kdbus patterns of code construction that have caused real-world
performance problems and real-world security issues. And those issues
get ignored when brought up.

p.p.s. Note that this complaint is not in any sense new; you have been
ignoring people who try to bring up meaningful issues for a long time.
The fact that people who bring up uncomfortable points about the kdbus
code routinely get blown off certainly contributes to the lack of
meaningful review, as it is not rewarding to work with someone who does
not listen to criticism. At this point the strongest possible language
and the strongest possible push back are being used because everything
else is routinely swept under the rug.

2015-04-23 19:01:38

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 08:33:47PM +0200, Richard Weinberger wrote:
> On Thu, Apr 23, 2015 at 3:05 PM, Greg Kroah-Hartman
> <[email protected]> wrote:
> > Did I miss anything else here? Are there any technical reasons I'm
> > forgetting about for why this can't be pulled in as-is for this merge
> > window?
>
> Maybe I get again accused of ``being a jerk'' but I still dare to ask about
> Boris' unanswered question:
> http://marc.info/?l=linux-kernel&m=142969313220781

No, I'm not going to say you are being a jerk for asking for a response,
only when you say things like "the code is too big" :)

I'll go reply now; I didn't really understand it the first time around,
and still don't the second time either...

> In fact Boris and I are currently reviewing the code but it is a slow
> process as we both have day jobs...

That's great, thanks for doing it.

greg k-h

2015-04-23 19:14:39

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 22, 2015 at 10:58:28AM +0200, Borislav Petkov wrote:
> On Mon, Apr 13, 2015 at 02:29:35PM -0500, Eric W. Biederman wrote:
> > And the code that transfers the meta-data is wrong.
> >
> > It is generally not something that userspace requires today, certainly
> > userspace is not using it.
> >
> > You are exporting a weird set of information in a unique way that makes
> > it race free enough to make ``security'' decisions upon but the data
> > in general is not appropriate to make those decisions.
> >
> > I remain opposed to this half thought out trash of an ABI for the
> > meta-data.
> >
> > Just because something happens to be exported in a DEBUG api today does
> > not make it appropriate for userspace to run around making security
> > decisions with that information.
> >
> > Nacked-by: "Eric W. Biederman" <[email protected]>
> >
> > I think it is premature to be merging kdbus. You have fundamental
> > issues that can not be fixed once the ABI is frozen.
> >
> > The semantics of the meta-data you export are extremely poorly defined.
>
> Not only that - it looks like a serious amount of work on each sent
> packet. So I did some staring, correct me if I missed something:
>
> kdbus_cmd_send - KDBUS_CMD_SEND, ioctl cmd, copy stuff from userspace
> |-> kdbus_kmsg_new_from_cmd(), kmalloc+memset + prepare a *lot* of stuff like:
> |-> m->proc_meta = kdbus_meta_proc_new();
> m->conn_meta = kdbus_meta_conn_new();
> ...
> |-> kdbus_bus_broadcast(conn->ep->bus, conn, kmsg); let's look at the broadcast mode
> |-> hash_for_each(bus->conn_hash, i, conn_dst, hentry) { iterate over hash buckets, O(256)

I don't know what O(256) means here, O notation usually is used to
show the complexity of a function, so this really is almost always the
same amount of time, based on using the hash function. I've never seen
a number in O() before, but I went to school a long time ago, and
probably forgot something...

Or am I misunderstanding your note here?

> |-> kdbus_meta_proc_collect(kmsg->proc_meta, attach_flags); collect a *lot* of stuff from current etc
> |-> kdbus_meta_conn_collect(kmsg->conn_meta, kmsg, conn_src, attach_flags); collect more stuff
>
> and this happens on *every* send. A *lot* of work.

Yes, this looks like a lot of stuff, but it's still really fast. And we
need it.

> Now multiply that by the amount of messages this thing is going to send
> per second. It piles up. So you have the overhead right then and there
> in the design without even being able to fix it. Or at least pretty damn
> hard to fix.

It's way faster than what we have today, and David has found a few
areas that can go faster, so I don't really understand the objection.
If you can come up with a faster way to do this, that would be great and
most appreciated.

> So unless I'm missing something, this right there is a design problem.
>
> Why can't this messaging be done with a nifty O(1) scheme like sending
> parties issuing auth tokens and whatever and the kernel doing the
> arbitration and distribution of those tokens?

Hm, this seems to me to be O(1), pretty constant; we do the same amount
of work all the time. Then we send the message to the people listening
to it (so that is O(n) depending on the number of listeners, really the
best that I think you can get).

Or am I misunderstanding what you are asking for here?

> That gets you sandboxing, dropping privileges and whatever else fancy
> containers people wanna do for free. Token recipient has the token -
> that's all that counts.

I don't understand what a token provides that is different from what is
happening here, please explain. How can that be faster than what we do
today?

confused,

greg k-h

2015-04-23 19:22:33

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Apr 23, 2015 11:56 AM, "Greg Kroah-Hartman"
<[email protected]> wrote:
>
> On Thu, Apr 23, 2015 at 11:04:36AM -0700, Linus Torvalds wrote:
> > On Thu, Apr 23, 2015 at 10:57 AM, Linus Torvalds
> > <[email protected]> wrote:
> > >
> > > If somebody is printing something, it shouldn't matter if it's "lpr"
> > > or "firefox http://horses.and.trannyporn.my.little.pony.com/" that
> > > does the printing.
> >
> > And btw, it's not just "this is information that shouldn't be logged".
> >
> > It's literally "information that should not *ever* be used". I can
> > easily see some phone manufacturer deciding to do "value add" by
> > adding a special case where a special vendor system manager program
> > gets a back door to some service, because it needs to access the
> > camera for user identification at login time, so there's some magic
> >
> > if (!strcmp(client->pid_comm, "vendor-login-pr"))
> > return ACCESS_OK;
> >
> > because "it was the simplest way to do this", and the programmer knew
> > it was a hack, but he needed to get it working because he had a
> > deadline yesterday.
> >
> > And then somebody figures this out, and makes an app that takes
> > pictures on your phone surreptitiously.
> >
> > No, we can't protect against vendors doing stupid things, but we very
> > much also shouldn't make the kernel have interfaces that basically
> > encourage people to do stupid things because they make irrelevant and
> > wrongheaded data available.
>
> Doing access control based on comm and cmdline is horrid, I totally
> agree. But right now, any process in the system can read any other
> process's comm and cmdline value out of /proc today. So removing it
> from the metadata is fine for kdbus, I can live with that, but it really
> isn't "preventing" anything that's not already visible to everyone, so
> if someone wanting to be "bad" could always still log it or do anything
> else they wanted with it.

I feel like a broken record. This isn't true in general. Selinux can
and, I believe, often does prevent this.

--Andy

2015-04-23 19:30:22

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 01:42:25PM -0400, Stephen Smalley wrote:
> On 04/23/2015 01:16 PM, Greg Kroah-Hartman wrote:
> > The binder developers at Samsung have stated that the implementation we
> > have here works for their model as well, so I guess that is some kind of
> > verification it's not entirely tied to D-Bus. They have plans on
> > dropping the existing binder kernel code and using the kdbus code
> > instead when it is merged.
>
> Where do things stand wrt LSM hooks for kdbus? I don't see any security
> hook calls in the kdbus tree except for the purpose of metadata
> collection of process security labels. But nothing for enforcing MAC
> over kdbus IPC. binder has a set of security hooks for that purpose, so
> it would be a regression wrt MAC enforcement to switch from binder to
> kdbus without equivalent checking there.

There was a set of LSM hooks proposed for kdbus posted by Karol
Lewandowski last October, and it also included SELinux and Smack patches.
They were going to be refreshed based on the latest code changes, but I
haven't seen them posted, or I can't seem to find them in my limited
email archive.

Karol, what's the status of them?

thanks,

greg k-h

2015-04-23 19:34:04

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 12:22:10PM -0700, Andy Lutomirski wrote:
> On Apr 23, 2015 11:56 AM, "Greg Kroah-Hartman"
> <[email protected]> wrote:
> >
> > On Thu, Apr 23, 2015 at 11:04:36AM -0700, Linus Torvalds wrote:
> > > On Thu, Apr 23, 2015 at 10:57 AM, Linus Torvalds
> > > <[email protected]> wrote:
> > > >
> > > > If somebody is printing something, it shouldn't matter if it's "lpr"
> > > > or "firefox http://horses.and.trannyporn.my.little.pony.com/" that
> > > > does the printing.
> > >
> > > And btw, it's not just "this is information that shouldn't be logged".
> > >
> > > It's literally "information that should not *ever* be used". I can
> > > easily see some phone manufacturer deciding to do "value add" by
> > > adding a special case where a special vendor system manager program
> > > gets a back door to some service, because it needs to access the
> > > camera for user identification at login time, so there's some magic
> > >
> > > if (!strcmp(client->pid_comm, "vendor-login-pr"))
> > > return ACCESS_OK;
> > >
> > > because "it was the simplest way to do this", and the programmer knew
> > > it was a hack, but he needed to get it working because he had a
> > > deadline yesterday.
> > >
> > > And then somebody figures this out, and makes an app that takes
> > > pictures on your phone surreptitiously.
> > >
> > > No, we can't protect against vendors doing stupid things, but we very
> > > much also shouldn't make the kernel have interfaces that basically
> > > encourage people to do stupid things because they make irrelevant and
> > > wrongheaded data available.
> >
> > Doing access control based on comm and cmdline is horrid, I totally
> > agree. But right now, any process in the system can read any other
> > process's comm and cmdline value out of /proc today. So removing it
> > from the metadata is fine for kdbus, I can live with that, but it really
> > isn't "preventing" anything that's not already visible to everyone, so
> > if someone wanting to be "bad" could always still log it or do anything
> > else they wanted with it.
>
> I feel like a broken record. This isn't true in general.

Works on my box :)

> Selinux can and, I believe, often does prevent this.

Ok, then the LSM patches for kdbus should be able to also mediate this
as well if needed. I haven't looked at the LSM kdbus patches in a long
time, so I don't remember exactly what they were looking at.

Again, I don't object to dropping this in kdbus, just confused as this
seemed to me to be something that is always available to all processes
anyway, we weren't adding something previously "hidden".

thanks,

greg k-h

2015-04-23 20:51:22

by Linus Torvalds

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 11:56 AM, Greg Kroah-Hartman
<[email protected]> wrote:
>
> Doing access control based on comm and cmdline is horrid, I totally
> agree. But right now, any process in the system can read any other
> process's comm and cmdline value out of /proc today.

You have to work extra hard for it, and it's preventable anyway (ie selinux).

In contrast, with the information in the kdbus message, it's almost
certain that any random "enable debugging for dbus" patch will start
logging it, because "it's just there".

That's a big difference. Most bugs and security issues come because
people make trivial mistakes, not because people
explicitly go out of their way to make them.

> Doesn't syslog use it today all over the place for logging stuff that
> happens in the system?

Hell no.

Sure, if an application explicitly says "log this message", then we
save the application name. But not for random system interactions.

The example Andy gave about doing things like name lookup is a good
one. Doesn't systemd already do a DNS cache module?

Doing a name lookup is some *seriously* different thing than using
"syslog()" to explicitly log messages.

And if kdbus people can't see that difference, I don't see what we can
discuss here. Do you really not see the privacy implications? It turns
privacy violations from "you have to actually work at it" to "they
happen pretty much by mistake".

Linus

2015-04-23 20:53:22

by Linus Torvalds

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 12:33 PM, Greg KH <[email protected]> wrote:
> On Thu, Apr 23, 2015 at 12:22:10PM -0700, Andy Lutomirski wrote:
>
>> Selinux can and, I believe, often does prevent this.
>
> Ok, then the LSM patches for kdbus should be able to also mediate this
> as well if needed.

No Greg.

Just remove the shit. Really. Take out the command line and the task
name. You already admitted that there is no actual valid use for it.

We don't add crap that then has to be disabled with security rules
just because it was a bad interface. Just make the interface not do it
in the first place. It's that simple.

Linus

2015-04-23 20:56:53

by Borislav Petkov

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 09:14:33PM +0200, Greg Kroah-Hartman wrote:
> I don't know what O(256) means here, O notation usually is used to
> show the complexity of a function, so this really is almost always the
> same amount of time, based on using the hash function.

This is iterating over 256 hash buckets. So O(n) complexity. Better?

> Yes, these looks like a lot of stuff but it's still really fast.

"really fast" - that's the right way to quantify things, right? Let me
reply in your terms: "no, it is dumb and slow".

> And we need it.

*Of* *course* you need it, what else. Lemme guess: there's no other
way to do this than the way it was done now, right? And we should stop
asking such stupid questions and accept it... Yeah, of course.

> Hm, this seems to be to be O(1), pretty constant, we do the same amount
> of work all the time.

The same *pile* of unnecessary and needless work. You go and collect
*all* that data on *every* packet send?!

How many packets per second are we talking here? 100, 1000, 10000...?

Let's say you're "really fast" because you've bought a "bigger machine"
and do that information collection per packet in, say, 10 microseconds
(I'm probably too generous here but whatever).

So at peak rates of 10000 packets per second, and 10µs preparation time
per packet, you're wasting 100000 µs == 100 msec, i.e. for 1/10th of
every second you're busy only with sending packets.

Hmm, but then the receiving side needs CPU time too...

Oh yeah, and then those pesky userspace processes need some CPU time
too...

Are you really serious or is this some tactic of deliberately asking
dumb questions? Let me know now so that I can stop wasting my time.

> I don't understand what a token provides that is different from what is
> happening here, please explain. How can that be faster than what we do
> today?

A token-based scheme would give you significantly less traffic. Handing
out tokens gets you sandboxing, containers, etc. for free, and you can
throw the metadata collection in the garbage can:

Example:

* A daemon issues a token, say, a capability to reboot.

* It gives that token (with the kernel as intermediary) to a recipient
which should be allowed to reboot.

* recipient can drop privileges, run in a sandbox, whatever, it still
has that token.

That's exactly one packet sent *without* any information collection.
Recipient has to authenticate itself to the kernel when requesting the
packet.

Clean and simple.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-23 21:22:43

by David Herrmann

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi

On Thu, Apr 23, 2015 at 10:56 PM, Borislav Petkov <[email protected]> wrote:
> On Thu, Apr 23, 2015 at 09:14:33PM +0200, Greg Kroah-Hartman wrote:
>> I don't know what O(256) means here, O notation usually is used to
>> show the complexity of a function, so this really is almost always the
>> same amount of time, based on using the hash function.
>
> This is iterating over 256 hash buckets. So O(n) complexity. Better?

No it's not. O(256) equals O(1).

Thanks
David

2015-04-23 21:33:22

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 11:22 PM, David Herrmann <[email protected]> wrote:
> Hi
>
> On Thu, Apr 23, 2015 at 10:56 PM, Borislav Petkov <[email protected]> wrote:
>> On Thu, Apr 23, 2015 at 09:14:33PM +0200, Greg Kroah-Hartman wrote:
>>> I don't know what O(256) means here, O notation usually is used to
>>> show the complexity of a function, so this really is almost always the
>>> same amount of time, based on using the hash function.
>>
>> This is iterating over 256 hash buckets. So O(n) complexity. Better?
>
> No it's not. O(256) equals O(1).

Yeah, that's absolutely correct.
I think Boris wanted to say that iterating over all hash buckets
can be costly.

--
Thanks,
//richard

2015-04-23 21:41:23

by Borislav Petkov

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 11:22:39PM +0200, David Herrmann wrote:
> No it's not. O(256) equals O(1).

Ok, you're right. Maybe O() was not the right thing to use when trying
to point out that iterating over 256 hash buckets and then following the
chain in each bucket per packet broadcast looks like a lot.

--
Regards/Gruss,
Boris.

2015-04-24 02:08:31

by Karol Lewandowski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 09:30:13PM +0200, Greg Kroah-Hartman wrote:
> On Thu, Apr 23, 2015 at 01:42:25PM -0400, Stephen Smalley wrote:
> > On 04/23/2015 01:16 PM, Greg Kroah-Hartman wrote:
> > > The binder developers at Samsung have stated that the implementation we
> > > have here works for their model as well, so I guess that is some kind of
> > > verification it's not entirely tied to D-Bus. They have plans on
> > > dropping the existing binder kernel code and using the kdbus code
> > > instead when it is merged.
> >
> > Where do things stand wrt LSM hooks for kdbus? I don't see any security
> > hook calls in the kdbus tree except for the purpose of metadata
> > collection of process security labels. But nothing for enforcing MAC
> > over kdbus IPC. binder has a set of security hooks for that purpose, so
> > it would be a regression wrt MAC enforcement to switch from binder to
> > kdbus without equivalent checking there.
>
> There was a set of LSM hooks proposed for kdbus posted by Karol
> Lewandowski last October, and it also included SELinux and Smack patches.
> They were going to be refreshed based on the latest code changes, but I
> haven't seen them posted, or I can't seem to find them in my limited
> email archive.

We have been waiting for right moment with these. :-)

> Karol, what's the status of them?

I have handed the patchset over to Paul Osmialowski, who started reworking it
for v4 relatively recently. I think it shouldn't be that hard to post an updated version...

Paul?

2015-04-24 05:02:56

by Steven Noonan

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 2:41 PM, Borislav Petkov <[email protected]> wrote:
> On Thu, Apr 23, 2015 at 11:22:39PM +0200, David Herrmann wrote:
>> No it's not. O(256) equals O(1).
>
> Ok, you're right. Maybe O() was not the right thing to use when trying
> to point out that iterating over 256 hash buckets and then following the
> chain in each bucket per packet broadcast looks like a lot.
>

Heh. I guess you could call it an "expensive O(1)". While big-O
notation is useful for describing algorithm scalability with respect
to input size, it falls flat on its face when trying to articulate
impact in measurable units.

2015-04-24 06:36:11

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 10:56:40PM +0200, Borislav Petkov wrote:
> > Hm, this seems to be to be O(1), pretty constant, we do the same amount
> > of work all the time.
>
> The same *pile* of unnecessary and needless work. You go and collect
> *all* that data on *every* packet send?!

No, not at all, the metadata is cached, we only collect that for the
first message sent, if we didn't know it already, or we do it on the
"open" of the connection, depending on what we are gathering metadata
for.

The mc->collected test right before collecting the specific metadata is
that "cached or not" test.

greg k-h

2015-04-24 06:45:21

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Fri, Apr 24, 2015 at 08:36:03AM +0200, Greg Kroah-Hartman wrote:
> On Thu, Apr 23, 2015 at 10:56:40PM +0200, Borislav Petkov wrote:
> > > Hm, this seems to be to be O(1), pretty constant, we do the same amount
> > > of work all the time.
> >
> > The same *pile* of unnecessary and needless work. You go and collect
> > *all* that data on *every* packet send?!
>
> No, not at all, the metadata is cached, we only collect that for the
> first message sent, if we didn't know it already, or we do it on the
> "open" of the connection, depending on what we are gathering metadata
> for.
>
> The mc->collected test right before collecting the specific metadata is
> that "cached or not" test.

Oh wait, no, there is some send-time metadata that is collected for
every message, see Linus's email for more details about that. Maybe
this can be changed to cache things even more than we currently do.

It's early; I shouldn't write emails before coffee...

David had some flamegraphs floating around that showed where all the
time on transmit / receive was being spent, and I don't think that the
metadata area was all that relevant, but I can't find them anymore to
say for sure. There are other areas that can be sped up on the send
path, but perf data is the best way to verify this.

greg k-h

2015-04-24 07:27:16

by Martin Steigerwald

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Am Freitag, 24. April 2015, 08:45:15 schrieb Greg Kroah-Hartman:
> On Fri, Apr 24, 2015 at 08:36:03AM +0200, Greg Kroah-Hartman wrote:
> > On Thu, Apr 23, 2015 at 10:56:40PM +0200, Borislav Petkov wrote:
> > > > Hm, this seems to be to be O(1), pretty constant, we do the same
> > > > amount
> > > > of work all the time.
> > >
> > > The same *pile* of unnecessary and needless work. You go and collect
> > > *all* that data on *every* packet send?!
> >
> > No, not at all, the metadata is cached, we only collect that for the
> > first message sent, if we didn't know it already, or we do it on the
> > "open" of the connection, depending on what we are gathering metadata
> > for.
> >
> > The mc->collected test right before collecting the specific metadata
> > is
> > that "cached or not" test.
>
> Oh wait, no, there are some send-time metadata that is collected for
> every message, see Linus's email for more details about that. Maybe
> this can be changed to cache things even more than we currently do.
>
> it's early, shouldn't write emails before coffee...
>
> David had some flamegraphs floating around that showed where all the
> time on transmit / receive was being spent, and I don't think that the
> metadata area was all that relevant, but I can't find them anymore to
> say for sure. There are other areas that can be sped up on the send
> path, but perf data is the best way to verify this.

I think that's exactly the data that others have asked for several times,
so I think it would be good to find it again.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

2015-04-24 08:35:08

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Fri, Apr 24, 2015 at 08:45:15AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Apr 24, 2015 at 08:36:03AM +0200, Greg Kroah-Hartman wrote:
> > On Thu, Apr 23, 2015 at 10:56:40PM +0200, Borislav Petkov wrote:
> > > > Hm, this seems to be to be O(1), pretty constant, we do the same amount
> > > > of work all the time.
> > >
> > > The same *pile* of unnecessary and needless work. You go and collect
> > > *all* that data on *every* packet send?!
> >
> > No, not at all, the metadata is cached, we only collect that for the
> > first message sent, if we didn't know it already, or we do it on the
> > "open" of the connection, depending on what we are gathering metadata
> > for.
> >
> > The mc->collected test right before collecting the specific metadata is
> > that "cached or not" test.
>
> Oh wait, no, there are some send-time metadata that is collected for
> every message, see Linus's email for more details about that. Maybe
> this can be changed to cache things even more than we currently do.
>
> it's early, shouldn't write emails before coffee...
>
> David had some flamegraphs floating around that showed where all the
> time on transmit / receive was being spent, and I don't think that the
> metadata area was all that relevant, but I can't find them anymore to
> say for sure. There are other areas that can be sped up on the send
> path, but perf data is the best way to verify this.

Here are the graphs that he posted during the last code review cycle that
are relevant here:
http://lkml.iu.edu/hypermail/linux/kernel/1503.2/02624.html

greg k-h

2015-04-24 09:04:19

by Borislav Petkov

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 10:02:52PM -0700, Steven Noonan wrote:
> On Thu, Apr 23, 2015 at 2:41 PM, Borislav Petkov <[email protected]> wrote:
> > On Thu, Apr 23, 2015 at 11:22:39PM +0200, David Herrmann wrote:
> >> No it's not. O(256) equals O(1).
> >
> > Ok, you're right. Maybe O() was not the right thing to use when trying
> > to point out that iterating over 256 hash buckets and then following the
> > chain in each bucket per packet broadcast looks like a lot.
> >
>
> Heh. I guess you could call it an "expensive O(1)". While big-O
> notation is useful for describing algorithm scalability with respect
> to input size, it falls flat on its face when trying to articulate
> impact in measurable units.

Right, so in thinking about this more today, on a fresh head, it still
is O(n) because we do broadcast the packet to n recipients - the
hash_for_each() thing iterates over 256 hash buckets and also follows
the linked list chain in each bucket. Its length depends on how
many connections are in the bucket, i.e. recipients. And I'd guess that
number changes dynamically, so probably linear.
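To make the cost argument concrete, here is a minimal sketch, assuming a fixed table of 256 buckets with singly linked chains. The names (`struct conn`, `struct conn_table`, `conn_add()`, `broadcast_visits()`) are hypothetical and this is not the kdbus code. A broadcast has to touch every bucket head and every chained connection, so the work counted here is 256 + n.

```c
#include <stddef.h>

#define NBUCKETS 256 /* the 256 hash buckets discussed in this thread */

/* Hypothetical connection node, chained in one hash bucket. */
struct conn {
	unsigned long id;
	struct conn *next;
};

/* Hypothetical connection table: one chain head per bucket. */
struct conn_table {
	struct conn *bucket[NBUCKETS];
};

static void conn_add(struct conn_table *t, struct conn *c)
{
	size_t b = c->id % NBUCKETS;

	c->next = t->bucket[b];
	t->bucket[b] = c;
}

/*
 * Walk every bucket and every chained connection, as a broadcast must.
 * Returns the number of steps: one per bucket plus one per connection.
 */
static size_t broadcast_visits(const struct conn_table *t)
{
	size_t steps = 0;

	for (size_t b = 0; b < NBUCKETS; b++) {
		steps++; /* inspect the bucket head */
		for (const struct conn *c = t->bucket[b]; c; c = c->next)
			steps++; /* the per-receiver match check would run here */
	}
	return steps;
}
```

With 1000 connections added, broadcast_visits() returns 1256 (256 bucket heads plus 1000 entries), and an empty table still costs 256 steps: constant overhead plus a term that grows with n.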

And then there's the collection of, let's call it metadata of
questionable use, *per* packet which is pretty expensive in my book.
It becomes even more expensive if it is completely useless, as in the
receiving side doesn't need it at all.

Now, one might argue that you have to do O(n) work when broadcasting
to n recipients anyway and you can't get that cheaper but maybe the
design is not optimal. Maybe it could be made to not broadcast at all,
or broadcast to a subset of recipients, only those which are actually
interested in the broadcast.

That's why I was looking at some simple token-based schemes. And that's
why I think Andy has some very cool ideas which we should definitely pay
attention to:

https://lkml.kernel.org/r/CALCETrXXUiYKAhsXsdqH2uZMddDhK5hX6V9%2BrZcHwa1X5WC%[email protected]

before we go and commit this thing and cast it in stone. Because if it goes
in, there's no changing it because we'll then be breaking userspace and
that's a no-no.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-24 10:29:00

by Daniel Mack

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi,

On 04/24/2015 11:04 AM, Borislav Petkov wrote:
> On Thu, Apr 23, 2015 at 10:02:52PM -0700, Steven Noonan wrote:
>> On Thu, Apr 23, 2015 at 2:41 PM, Borislav Petkov <[email protected]> wrote:
>>> On Thu, Apr 23, 2015 at 11:22:39PM +0200, David Herrmann wrote:
>>>> No it's not. O(256) equals O(1).
>>>
>>> Ok, you're right. Maybe O() was not the right thing to use when trying
>>> to point out that iterating over 256 hash buckets and then following the
>>> chain in each bucket per packet broadcast looks like a lot.
>>>
>>
>> Heh. I guess you could call it an "expensive O(1)". While big-O
>> notation is useful for describing algorithm scalability with respect
>> to input size, it falls flat on its face when trying to articulate
>> impact in measurable units.
>
> Right, so in thinking about this more today, on a fresh head, it still
> is O(n) because we do broadcast the packet to n recipients - the
> hash_for_each() thing iterates over 256 hash buckets and also follows
> the linked list chain in each bucket. Its length is depending on how
> many connections are in the bucket, i.e. recipients. And I'd guess that
> number changes dynamically so probably linear.

Sure, for broadcasts, we have to walk the list of peers connected to the
bus and see which one is interested in a particular message. We do that
by looking at the match rules of each of them, which are based on
well-known names, IDs, notification types or bloom filters. The policy
logic limits this further, as receivers of a broadcast must have TALK
access to the sender.

If these rules let a message pass, all the metadata that the receiving
peer asked for (by setting a flag at connect time) is collected, unless
it has been collected already for some other peer for the same message.
In other words, in the worst case, we collect all the metadata items exactly
once per message.

If none of the connections with permissive match/policy rules for a
message is interested in any metadata items, nothing will be collected
at all.
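The collect-once rule described here can be sketched with a per-message bitmask of items already gathered. This is a minimal illustration with hypothetical names (`struct msg_meta`, the `META_*` flags, `gather_item()`, `meta_collect()`), not the kdbus implementation, although its early bail mirrors the `what &= ~mp->collected` check that comes up later in this thread.

```c
#include <stdint.h>

/* Hypothetical metadata items a receiver may opt in to. */
#define META_PID     (1u << 0)
#define META_CREDS   (1u << 1)
#define META_CMDLINE (1u << 2)

/* Hypothetical per-message metadata state. */
struct msg_meta {
	uint32_t collected;   /* items already gathered for this message */
	unsigned int gathers; /* number of item lookups performed */
};

/* Stand-in for the real (expensive) lookup of one item. */
static void gather_item(struct msg_meta *mp, uint32_t item)
{
	(void)item;
	mp->gathers++;
}

/* Collect only the items in @what that no earlier receiver asked for. */
static void meta_collect(struct msg_meta *mp, uint32_t what)
{
	what &= ~mp->collected;
	if (!what)
		return; /* early bail: nothing new to collect */

	for (uint32_t bit = 1; bit; bit <<= 1)
		if (what & bit)
			gather_item(mp, bit);

	mp->collected |= what;
}
```

For example, receivers asking for META_PID|META_CREDS and then META_CREDS|META_CMDLINE trigger three lookups in total, and a later receiver asking only for META_PID triggers none.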

The reason why the peers are organized in a hash table is that we have
to look them up by ID for unicast messages.

> And then there's the collection of, let's call it metadata of
> questionable use, *per* packet which is pretty expensive in my book.
> It becomes even more expensive if it is completely useless as in, the
> receiving side doesn't need it all.

If the receiving side doesn't need it, it shouldn't opt in to that
piece of information.

The metadata logic is really only there so receiving peers are directly
supplied with information that they would otherwise look up themselves
from /proc or something. Also, we collect metadata at send time and for
every message intentionally, so that it reflects the state of the sender
at the time of sending. This way, the information is not subject to
races of asynchronous lookups.

> Now, one might argue that you have to do O(n) work when broadcasting
> to n recipients anyway and you can't get that cheaper but maybe the
> design is not optimal. Maybe it could be made to not broadcast at all,
> or broadcast to a subset of recipients, only those which are actually
> interested in the broadcast.

That's exactly what happens :) There are some more details on this in
kdbus.match(7).


Thanks,
Daniel

2015-04-24 10:51:05

by Borislav Petkov

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi,

On Fri, Apr 24, 2015 at 12:28:54PM +0200, Daniel Mack wrote:
> Sure, for broadcasts, we have to walk the list of peers connected to the
> bus and see which one is interested in a particular message. We do that

And this "... we have to walk the list ..." right there raises the
alarm. Can this walking of elements where you know they wouldn't match
be avoided?

> by looking at the match rules of each of them, which are based on
> well-known names, IDs, notification types or bloom filters. The policy
> logic limits this further, as receivers of a broadcast must have TALK
> access to the sender.

So it sounds to me like there are characteristics which can already
prepare lists of recipients interested in some sort of message. So
would it be possible for recipients to "register" for such messages
and the sending side would simply iterate a list of solely interested
recipients?

This will definitely save you the iteration over all n connections and
would make the metadata collection probably not needed (or at least a
subset of it) because recipients will have to establish eligibility for
receiving a certain message at register time and once they're on the
list, you implicitly know why they're there.
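The registration idea can be sketched as per-topic subscriber lists built when a receiver registers, so a send only walks the receivers registered for that topic. Everything here (`struct topic`, `subscribe()`, `deliver()`, the fixed `MAX_SUBS` capacity) is hypothetical and only illustrates the proposal; it is not existing kdbus behaviour.

```c
#include <stddef.h>

#define MAX_SUBS 64 /* hypothetical fixed capacity per topic */

/* Hypothetical topic holding the IDs of receivers registered for it. */
struct topic {
	unsigned long subs[MAX_SUBS];
	size_t nsubs;
};

/* Register a receiver once, e.g. after its policy/TALK check passed. */
static int subscribe(struct topic *t, unsigned long conn_id)
{
	if (t->nsubs == MAX_SUBS)
		return -1; /* topic full */
	t->subs[t->nsubs++] = conn_id;
	return 0;
}

/*
 * Hand the message to registered receivers only: the cost scales with
 * the number of interested receivers, not with all connections on the
 * bus. Recipient IDs are written to @out; returns how many there were.
 */
static size_t deliver(const struct topic *t, unsigned long *out)
{
	for (size_t i = 0; i < t->nsubs; i++)
		out[i] = t->subs[i]; /* stand-in for queueing the message */
	return t->nsubs;
}
```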

I don't know whether that fits all use cases but it definitely does only
the *necessary* work for message transfer and not more.

> If these rules let a message pass, all the metadata that the receiving
> peer asked for (by setting a flag at connect time) is collected, unless
> it has been collected already for some other peer for the same message.
> In other words, in worst case, we collect all the metadata items exactly
> once per message.

Right.

> If none of the connections with permissive match/policy rules for a
> message is interested in any metadata items, nothing will be collected
> at all.

But we still iterate through there and look at the arg @what and
->collected. And this is useless work which can be avoided IMHO.

> If the receiving side doesn't need it, it shouldn't opt-in for that
> piece of information.
>
> The metadata logic is really only there so receiving peers are directly
> supplied with information that they would otherwise look up themselves
> from /proc or something. Also, we collect metadata at send time and for
> every message intentionally, so that it reflects the state of the sender
> at the time of sending. This way, the information is not subject to
> races of asynchronous lookups.

Ok.

> > Now, one might argue that you have to do O(n) work when broadcasting
> > to n recipients anyway and you can't get that cheaper but maybe the
> > design is not optimal. Maybe it could be made to not broadcast at all,
> > or broadcast to a subset of recipients, only those which are actually
> > interested in the broadcast.
>
> That's exactly what happens :) There are some more details on this in
> kdbus.match(7).

But this is not for KDBUS_DST_ID_BROADCAST types, right? Because there
you have to iterate over *all* recipients in the connection hash.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

2015-04-24 11:26:30

by Daniel Mack

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi,

On 04/24/2015 12:50 PM, Borislav Petkov wrote:
> On Fri, Apr 24, 2015 at 12:28:54PM +0200, Daniel Mack wrote:
>> Sure, for broadcasts, we have to walk the list of peers connected to the
>> bus and see which one is interested in a particular message. We do that
>
> And this "... we have to walk the list ..." right there raises the
> alarm. Can this walking of elements where you know they wouldn't match
> be avoided?

Yes, see below.

>> by looking at the match rules of each of them, which are based on
>> well-known names, IDs, notification types or bloom filters. The policy
>> logic limits this further, as receivers of a broadcast must have TALK
>> access to the sender.
>
> So it sounds to me like there are characteristics which can already
> prepare lists of recipients interested in some sort of message. So
> would it be possible for recipients to "register" for such messages
> and the sending side would simply iterate a list of solely interested
> recipients?
>
> This will definitely save you the iteration over all n connections and
> would make the metadata collection probably not needed (or at least a
> subset of it) because recipients will have to establish eligibility for
> receiving a certain message at register time and once they're on the
> list, you implicitly know why they're there.

David is working on patches that store hashes of the matches in trees so
we can look them up more efficiently. We'd still need to check the bloom
filter for all remaining candidates though.
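For the remaining bloom-filter step, a common formulation is a subset test: a candidate matches when every bit set in its match mask is also set in the message's filter. A minimal sketch, assuming a 64-bit filter, a toy FNV-1a-based two-bit scheme and hypothetical names (`fnv1a()`, `bloom_add()`, `bloom_match()`); the real kdbus filter size and hash functions are not reproduced here. As with any bloom filter, this can give false positives but never false negatives.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy 64-bit FNV-1a string hash. */
static uint64_t fnv1a(const char *s)
{
	uint64_t h = 14695981039346656037ull;

	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 1099511628211ull;
	}
	return h;
}

/* Set two bits derived from @word in a 64-bit bloom filter. */
static uint64_t bloom_add(uint64_t filter, const char *word)
{
	uint64_t h = fnv1a(word);

	filter |= 1ull << (h & 63);
	filter |= 1ull << ((h >> 6) & 63);
	return filter;
}

/* A match rule passes if all of its mask bits are present in the filter. */
static bool bloom_match(uint64_t msg_filter, uint64_t match_mask)
{
	return (msg_filter & match_mask) == match_mask;
}
```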

These are, however, implementation details which potentially make the
code harder to read. We are well aware of certain spots that can be made
more efficient, but we were hoping for more reviews by keeping the
implementation simple for now.

>> If none of the connections with permissive match/policy rules for a
>> message is interested in any metadata items, nothing will be collected
>> at all.
>
> But we still iterate through there and look at the arg @what and
> ->collected. And this is useless work which can be avoided IMHO.

Not sure if it really matters, but we can probably add an early bail
there, yes. Something like:

	what &= ~mp->collected;
	if (!what)
		return;

Noted down, thanks!

>>> Now, one might argue that you have to do O(n) work when broadcasting
>>> to n recipients anyway and you can't get that cheaper but maybe the
>>> design is not optimal. Maybe it could be made to not broadcast at all,
>>> or broadcast to a subset of recipients, only those which are actually
>>> interested in the broadcast.
>>
>> That's exactly what happens :) There are some more details on this in
>> kdbus.match(7).
>
> But this is not for KDBUS_DST_ID_BROADCAST types, right?

Yes it is - all broadcast messages are subject to opt-in filters
installed by the receiving peer.


Thanks,
Daniel



2015-04-24 13:50:45

by Lukasz Skalski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi All,

On 04/23/2015 07:16 PM, Greg Kroah-Hartman wrote:
> On Thu, Apr 23, 2015 at 09:46:22AM -0700, Andy Lutomirski wrote:
>> - There's still an open performance question. Namely: is kdbus performant?
>
> Yes, I thought that was already answered. Tizen posted some numbers
> with a much older version of the code, before David fixed a bunch of
> issues that he and you found, and that averaged between 25-50% faster.
> Details are in this presentation:
> http://download.tizen.org/misc/media/conference2014/slides/tdc2014-kdbus-in-tizen3.pdf
>
> The Tizen and GENIVI developers are off running numbers with the latest
> code, or so they told me through emails, but I don't know when/if that
> will ever happen, so I can't promise more than what is already here.
>

I'm working on kdbus support for GLib ([1],[2]). I saw some questions
about kdbus performance, so I've prepared a simple benchmark. Because
David has already posted some comparison results between kdbus and UDS,
I've decided to use my GLib port with native kdbus support (it should
be noted that this port is not finished yet and there are still some
places for improvement, so please do not treat these test results as
final).

To perform tests I've created two simple apps:

- server: http://fpaste.org/215157/
- client: http://fpaste.org/215156/

The first one (server) registers itself on the bus under a well-known
name ("com.test.app") and waits for calls to its objects and methods.
The second one (client) makes calls and records the time between
preparing a call and receiving the answer. The measurement is made by
performing 20000 calls and computing the sum of the durations of all
calls (for two different message payload sizes: 1000 and 10000 bytes).
The client program returns the total time of the performed calls after
successful execution. All tests have been run on VirtualBox with
ArchLinux and the latest versions of systemd and kdbus.
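The measurement loop described above (time each call from preparation to reply, then sum the durations) can be sketched in C with `clock_gettime(CLOCK_MONOTONIC)`. The `call_fn` callback is a hypothetical stand-in for one blocking method call and reply; it is not the GLib API used in the linked client.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for one blocking call/reply round trip. */
typedef void (*call_fn)(size_t payload_size);

/* Seconds elapsed between two monotonic timestamps. */
static double elapsed_s(const struct timespec *a, const struct timespec *b)
{
	return (double)(b->tv_sec - a->tv_sec) +
	       (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/* Run @ncalls calls and return the sum of per-call durations in seconds. */
static double bench_sum(call_fn call, size_t ncalls, size_t payload_size)
{
	double total = 0.0;

	for (size_t i = 0; i < ncalls; i++) {
		struct timespec start, end;

		clock_gettime(CLOCK_MONOTONIC, &start); /* call prepared */
		call(payload_size);
		clock_gettime(CLOCK_MONOTONIC, &end);   /* reply received */
		total += elapsed_s(&start, &end);
	}
	return total;
}
```

A monotonic clock is used so that wall-clock adjustments during the run cannot distort the sum.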

The test results are as follows:

+--------------+--------------------+--------------------+
|              |    Elapsed time    |    Elapsed time    |
| Message size |  GLIB WITH NATIVE  | GLIB + DBUS-DAEMON |
|   [bytes]    |   KDBUS SUPPORT*   |                    |
+--------------+--------------------+--------------------+
|              |   1) 2.874264 s    |   1) 4.624631 s    |
|     1000     |   2) 2.932835 s    |   2) 4.669730 s    |
|              |   3) 2.899634 s    |   3) 4.747275 s    |
|              |   4) 2.970106 s    |   4) 4.725723 s    |
+--------------+--------------------+--------------------+
|              |   1) 3.182379 s    |   1) 5.469663 s    |
|    10000     |   2) 3.334170 s    |   2) 5.520757 s    |
|              |   3) 3.353305 s    |   3) 5.556374 s    |
|              |   4) 3.367732 s    |   4) 5.597758 s    |
+--------------+--------------------+--------------------+

*All tests were performed without using the memfd mechanism.

I hope it will be useful for someone :)

[1] https://github.com/lukasz-skalski/glib
[2] https://bugzilla.gnome.org/show_bug.cgi?id=721861

Cheers,
--
Lukasz Skalski
Samsung R&D Institute Poland
Samsung Electronics
[email protected]

2015-04-24 14:02:16

by Steven Rostedt

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, Apr 23, 2015 at 11:33:19PM +0200, Richard Weinberger wrote:
> > No it's not. O(256) equals O(1).
>
> Yeah, that's absolutely correct.
> I think Boris wanted to say that iterating over all hash buckets
> can be costly.

You are thinking of 'k' (the constant), where you usually have k*O(1), where k
does matter when comparing two algorithms with the same Big O value. And
sometimes even different O() values if the 'n' is small enough. 100*O(1) vs
1*O(n), the latter is better if n < 100.

Something that runs at O(n) but takes 1ms per n is a much worse algorithm than
something that runs at O(n) and takes 1us per n.

Both have the same O() notation, but which algorithm you use is obvious.
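The 100*O(1) vs 1*O(n) example can be checked with a toy cost model. The two cost functions below are illustrative only, taken directly from that example; `crossover()` finds the first n at which the O(1) routine becomes strictly cheaper.

```c
#include <stddef.h>

/* Toy cost models from the example: 100*O(1) versus 1*O(n). */
static size_t cost_constant(size_t n) { (void)n; return 100; }
static size_t cost_linear(size_t n)   { return n; }

/* Smallest n at which the O(1) routine is strictly cheaper. */
static size_t crossover(void)
{
	size_t n = 0;

	while (cost_linear(n) <= cost_constant(n))
		n++;
	return n;
}
```

The linear routine wins for every n below 100, the two tie at n = 100, and crossover() returns 101.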

But Greg is right, your O notation isn't applicable here.

-- Steve

2015-04-24 14:20:09

by Havoc Pennington

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Fri, Apr 24, 2015 at 9:50 AM, Lukasz Skalski <[email protected]> wrote:
> - client: http://fpaste.org/215156/
>

Cool - it might also be interesting to try this without blocking round
trips, i.e. send requests as quickly as you can, and collect replies
asynchronously. That's how people ideally use dbus. It should
certainly reduce the total benchmark time, but just wondering if this
usage increases or decreases the delta between userspace daemon and
kdbus.

Havoc

2015-04-24 14:32:42

by Olaf Hering

Subject: Re: Issues with capability bits and meta-data in kdbus

On Wed, Apr 22, Linus Torvalds wrote:

> Conditional byte order is worse than silly - it's terminally stupid.

> In other words, think networking, which statically just decided to use
> big-endian. Sure, that was the wrong choice in the end, but even

Why was that wrong? Any pointers to further details?

Olaf

2015-04-24 14:34:40

by Lukasz Skalski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 04/24/2015 04:19 PM, Havoc Pennington wrote:
> On Fri, Apr 24, 2015 at 9:50 AM, Lukasz Skalski <[email protected]> wrote:
>> - client: http://fpaste.org/215156/
>>
>
> Cool - it might also be interesting to try this without blocking round
> trips, i.e. send requests as quickly as you can, and collect replies
> asynchronously. That's how people ideally use dbus. It should
> certainly reduce the total benchmark time, but just wondering if this
> usage increases or decreases the delta between userspace daemon and
> kdbus.

No problem - I'll also prepare an asynchronous version.

>
> Havoc
>

BR,
--
Lukasz Skalski
Samsung R&D Institute Poland
Samsung Electronics
[email protected]

2015-04-24 14:39:19

by Michele Curti

Subject: Re: Issues with capability bits and meta-data in kdbus

On Fri, Apr 24, 2015 at 04:32:12PM +0200, Olaf Hering wrote:
> On Wed, Apr 22, Linus Torvalds wrote:
>
> > Conditional byte order is worse than silly - it's terminally stupid.
>
> > In other words, think networking, which statically just decided to use
> > big-endian. Sure, that was the wrong choice in the end, but even
>
> Why was that wrong? Any pointers to further details?
>

http://www.barrgroup.com/Embedded-Systems/How-To/Big-Endian-Little-Endian

"Serious run-time performance penalties occur when using TCP/IP on a little
endian processor."

The Intel x86 and x86-64 series of processors use the little-endian format.

I think it's for this reason.

Michele

2015-04-24 14:41:11

by Jiri Kosina

Subject: Re: Issues with capability bits and meta-data in kdbus

On Fri, 24 Apr 2015, Olaf Hering wrote:

> > Conditional byte order is worse than silly - it's terminally stupid.
>
> > In other words, think networking, which statically just decided to use
> > big-endian. Sure, that was the wrong choice in the end, but even
>
> Why was that wrong? Any pointers to further details?

Because the architecture that is running on the overwhelming majority of
today's computers is little-endian, and therefore has to convert all
the time.

--
Jiri Kosina
SUSE Labs

2015-04-24 15:02:59

by Olaf Hering

Subject: Re: Issues with capability bits and meta-data in kdbus

On Fri, Apr 24, Michele Curti wrote:

> http://www.barrgroup.com/Embedded-Systems/How-To/Big-Endian-Little-Endian
> "Serious run-time performance penalties occur when using TCP/IP on a little
> endian processor."

This URL lacks the numbers to prove such a claim.

Olaf

2015-04-24 15:05:03

by Olaf Hering

Subject: Re: Issues with capability bits and meta-data in kdbus

On Fri, Apr 24, Jiri Kosina wrote:

> Becase the architecture that is running on overwhelming majority of
> today's world computers is little-endian, and therefore has to convert all
> the time.

There is no swap-on-load/store to compensate for the odd decision?

Olaf

2015-04-24 15:15:12

by Michele Curti

Subject: Re: Issues with capability bits and meta-data in kdbus

On Fri, Apr 24, 2015 at 05:02:32PM +0200, Olaf Hering wrote:
> On Fri, Apr 24, Michele Curti wrote:
>
> > http://www.barrgroup.com/Embedded-Systems/How-To/Big-Endian-Little-Endian
> > "Serious run-time performance penalties occur when using TCP/IP on a little
> > endian processor."
>
> This URL lacks the numbers to proof such claim.
>

?

It may be negligible compared to the operations of the whole stack, but
swapping bytes requires more instructions than not swapping bytes.

Michele

2015-04-24 17:52:26

by Linus Torvalds

Subject: Re: Issues with capability bits and meta-data in kdbus

On Fri, Apr 24, 2015 at 7:32 AM, Olaf Hering <[email protected]> wrote:
> On Wed, Apr 22, Linus Torvalds wrote:
>
>> Conditional byte order is worse than silly - it's terminally stupid.
>
>> In other words, think networking, which statically just decided to use
>> big-endian. Sure, that was the wrong choice in the end, but even
>
> Why was that wrong? Any pointers to further details?

Just because BE is effectively dead these days. Every BE architecture
is either gone, or is slowly (or not so slowly) converting to LE.

But more importantly, even when you pick the byte order that history
then relegates to the losing position, and you end up doing byte
swapping on most machines, that is *still* better than conditionally
*not* doing byte swapping.

So even today, by all means make your protocols or disk images use
big-endian byte formats. But do it unconditionally. Don't make the
mistake of encoding the byte order as part of the data, and then
dynamically switching things (or not) around.

In fact, even today a BE byte order can make sense, if only exactly
because of network byte order - thanks to network byte order, there
are all those nice mostly-portable and often well-optimized "ntohl()"
functions available to you. So rather than introduce your own helper
functions for LE byte order access, you may be better off just using
BE because of existing network infrastructure.
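A minimal sketch of what "unconditional" means in practice, assuming a hypothetical 8-byte wire header with a 32-bit length and 32-bit flags (`struct wire_hdr`, `hdr_encode()` and `hdr_len()` are illustrative names): the format is always big-endian, so the encoder and decoder always call `htonl()`/`ntohl()`, and there is no byte-order field to inspect at runtime.

```c
#include <arpa/inet.h> /* htonl(), ntohl() */
#include <stdint.h>
#include <string.h>

/* Hypothetical on-the-wire header: always big-endian, never negotiated. */
struct wire_hdr {
	uint32_t len_be;
	uint32_t flags_be;
};

static void hdr_encode(unsigned char out[8], uint32_t len, uint32_t flags)
{
	struct wire_hdr h = { htonl(len), htonl(flags) };

	memcpy(out, &h, sizeof(h));
}

static uint32_t hdr_len(const unsigned char in[8])
{
	struct wire_hdr h;

	memcpy(&h, in, sizeof(h));
	return ntohl(h.len_be); /* no "which byte order is this?" branch */
}
```

On a big-endian host the two calls compile to nothing; on a little-endian host they become a byte swap. Either way the code path is the same.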

Linus

2015-04-24 18:00:53

by Linus Torvalds

Subject: Re: Issues with capability bits and meta-data in kdbus

On Fri, Apr 24, 2015 at 10:52 AM, Linus Torvalds
<[email protected]> wrote:
>
> So even today, by all means make your protocols or disk images use
> big-endian byte formats. But do it unconditionally. Don't make the
> mistake of encoding the byte order as part of the data, and then
> dynamically switching things (or not) around.

Side note: even if you pick the "wrong" byte order, an unconditional
byte ordering choice can often avoid the bswap, just because many
operations are byte order independent.

For example, you can still test specific bits in bitfields etc. If you
have a constant mask you want to check, instead of converting the
field to the CPU byte order at runtime, convert that constant _mask_
to the data byte order at compile-time, and use it without any dynamic
byte swapping of the data.

And generally, avoid byte swapping until you really need it. Some
people seem to think that if the data is in one particular byte order
on disk, you should byteswap as you read it in, and as you write it
out. No, it's often best to actually keep it in the original format,
and only byte swap when actually using the value. Then you can mmap
files and not generate extra copies, or - as per above - you may be
able to structure your code to never need the byte swap at all.
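Both suggestions can be sketched together, assuming a hypothetical big-endian flags word kept exactly as it arrived (for example in an mmap()ed file) and a hypothetical `FLAG_SEALED` bit. The flag test converts the constant mask instead of the data, and the value is only byte-swapped where it is actually needed as a number. A compiler typically folds `htonl()` of a constant when optimizing, but the result is correct either way.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

#define FLAG_SEALED 0x00000004u /* hypothetical flag, CPU-order value */

/* Test a flag on the raw big-endian word: swap the mask, not the data. */
static bool is_sealed(uint32_t flags_be)
{
	return (flags_be & htonl(FLAG_SEALED)) != 0;
}

/* Byte-swap only at the point where the numeric value is used. */
static uint32_t flags_value(uint32_t flags_be)
{
	return ntohl(flags_be);
}
```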

So this is why (for example) the kernel byte swapping helper functions
do odd things like this:

#define __swab32(x)				\
	(__builtin_constant_p((__u32)(x)) ?	\
	___constant_swab32(x) :			\
	__fswab32(x))

just because the "___constant_swab32()" is designed to do everything
at compile-time.
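The same dispatch pattern can be reproduced in userspace with GCC or Clang builtins. This sketch uses hypothetical names (`my_constant_swab32`, `MY_SWAB32`) and mirrors the structure of the kernel macro above without being the kernel's code. A constant argument takes the pure-arithmetic path that the compiler can evaluate at build time, and a runtime value goes through `__builtin_bswap32()`.

```c
#include <stdint.h>

/* Pure arithmetic swap: a constant argument folds to a constant. */
#define my_constant_swab32(x) ((uint32_t)(			\
	(((uint32_t)(x) & 0x000000ffu) << 24) |		\
	(((uint32_t)(x) & 0x0000ff00u) <<  8) |		\
	(((uint32_t)(x) & 0x00ff0000u) >>  8) |		\
	(((uint32_t)(x) & 0xff000000u) >> 24)))

/* Use the compile-time form for constants, the builtin otherwise. */
#define MY_SWAB32(x)						\
	(__builtin_constant_p((uint32_t)(x)) ?			\
	 my_constant_swab32(x) :				\
	 __builtin_bswap32((uint32_t)(x)))
```

Only one branch of the conditional is evaluated at runtime, so a non-constant argument is read once, through the builtin.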

Linus

2015-04-24 19:25:59

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Fri, Apr 24, 2015 at 04:34:34PM +0200, Lukasz Skalski wrote:
> On 04/24/2015 04:19 PM, Havoc Pennington wrote:
> > On Fri, Apr 24, 2015 at 9:50 AM, Lukasz Skalski <[email protected]> wrote:
> >> - client: http://fpaste.org/215156/
> >>
> >
> > Cool - it might also be interesting to try this without blocking round
> > trips, i.e. send requests as quickly as you can, and collect replies
> > asynchronously. That's how people ideally use dbus. It should
> > certainly reduce the total benchmark time, but just wondering if this
> > usage increases or decreases the delta between userspace daemon and
> > kdbus.
>
> No problem - I'll prepare also asynchronous version.

That would be great to see as well. Many thanks for doing this work.

greg k-h

2015-04-27 08:57:51

by Lukasz Skalski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 04/24/2015 09:25 PM, Greg Kroah-Hartman wrote:
> On Fri, Apr 24, 2015 at 04:34:34PM +0200, Lukasz Skalski wrote:
>> On 04/24/2015 04:19 PM, Havoc Pennington wrote:
>>> On Fri, Apr 24, 2015 at 9:50 AM, Lukasz Skalski <[email protected]> wrote:
>>>> - client: http://fpaste.org/215156/
>>>>
>>>
>>> Cool - it might also be interesting to try this without blocking round
>>> trips, i.e. send requests as quickly as you can, and collect replies
>>> asynchronously. That's how people ideally use dbus. It should
>>> certainly reduce the total benchmark time, but just wondering if this
>>> usage increases or decreases the delta between userspace daemon and
>>> kdbus.
>>
>> No problem - I'll prepare also asynchronous version.
>
> That would be great to see as well. Many thanks for doing this work.

As proposed by Havoc and Greg, I've created a simple benchmark for
asynchronous calls:

- server: http://fpaste.org/215157/ (the same as in the previous test)
- client: http://fpaste.org/215724/ (asynchronous version)

For the asynchronous version of the client I had to decrease the number
of calls to 128 (the synchronous version made 20000 calls); otherwise
we could exceed the maximum number of pending replies per connection.
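The pending-reply limit suggests a common shape for an asynchronous client: a sliding window that keeps at most a fixed number of requests in flight and waits for one reply whenever the window is full. A minimal sketch, assuming a hypothetical `MAX_PENDING` value and a simulated connection (`struct sim_conn`, `send_request()`, `collect_reply()`, `run_pipelined()`); neither the real per-connection limit nor the GLib async API is modelled here.

```c
#include <stddef.h>

#define MAX_PENDING 128 /* hypothetical per-connection pending-reply limit */

/* Simulated connection: counts requests sent and replies still owed. */
struct sim_conn {
	size_t sent;
	size_t pending;
	size_t max_seen; /* highest number of requests ever in flight */
};

static void send_request(struct sim_conn *c)
{
	c->sent++;
	c->pending++;
	if (c->pending > c->max_seen)
		c->max_seen = c->pending;
}

static void collect_reply(struct sim_conn *c)
{
	c->pending--; /* stand-in for an async reply callback firing */
}

/* Issue @total requests without ever exceeding MAX_PENDING in flight. */
static void run_pipelined(struct sim_conn *c, size_t total)
{
	while (c->sent < total) {
		if (c->pending == MAX_PENDING)
			collect_reply(c); /* window full: wait for one reply */
		send_request(c);
	}
	while (c->pending > 0)
		collect_reply(c); /* drain the remaining replies */
}
```

With this structure, all 20000 calls of the synchronous test could be issued while never having more than MAX_PENDING replies outstanding.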

The test results are as follows:

+--------------+--------------------+--------------------+
|              |    Elapsed time    |    Elapsed time    |
| Message size |  GLIB WITH NATIVE  | GLIB + DBUS-DAEMON |
|   [bytes]    |   KDBUS SUPPORT*   |                    |
+--------------+--------------------+--------------------+
|              |   1) 0.018639 s    |   1) 0.029947 s    |
|     1000     |   2) 0.017045 s    |   2) 0.032812 s    |
|              |   3) 0.017490 s    |   3) 0.029971 s    |
|              |   4) 0.018001 s    |   4) 0.026485 s    |
+--------------+--------------------+--------------------+
|              |   1) 0.019898 s    |   1) 0.040914 s    |
|    10000     |   2) 0.022187 s    |   2) 0.033604 s    |
|              |   3) 0.020854 s    |   3) 0.037616 s    |
|              |   4) 0.020020 s    |   4) 0.033772 s    |
+--------------+--------------------+--------------------+

*All tests were performed without using the memfd mechanism.

And as I wrote in my previous mail, the kdbus transport for GLib is not
finished yet and there are still some places for improvement, so please
do not treat these test results as final.

>
> greg k-h
>

Cheers,
--
Lukasz Skalski
Samsung R&D Institute Poland
Samsung Electronics
[email protected]

2015-04-27 12:47:03

by Michal Hocko

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed 22-04-15 12:36:12, Andy Lutomirski wrote:
> On Apr 22, 2015 7:57 AM, "Michal Hocko" <[email protected]> wrote:
> >
> > On Tue 21-04-15 11:11:35, Andy Lutomirski wrote:
> > > On Tue, Apr 21, 2015 at 7:27 AM, Michal Hocko <[email protected]> wrote:
> > > > On Tue 21-04-15 16:01:01, David Herrmann wrote:
> > > >> Hi
> > > >>
> > > >> On Tue, Apr 21, 2015 at 2:20 PM, Michal Hocko <[email protected]> wrote:
> > > >> > On Tue 21-04-15 12:17:49, David Herrmann wrote:
> > > >> >> Hi
> > > >> >>
> > > >> >> On Tue, Apr 21, 2015 at 11:35 AM, One Thousand Gnomes
> > > >> >> <[email protected]> wrote:
> > > >> >> >> On top of that, I think that someone into resource management needs to
> > > >> >> >> seriously consider whether having a broadcast send do get_user_pages
> > > >> >> >> or the equivalent on pages supplied by untrusted recipients (plural!)
> > > >> >> >> is a good idea.
> > > >> >> >
> > > >> >> > Oh but its so much fun if you pass pages belonging to a device driver, or
> > > >> >> > pass bits of a GEM object thereby keeping entire graphics textures
> > > >> >> > referenced 8)
> > > >> >>
> > > >> >> We do not use GUP, nor do we pass around pinned pages. All we use is
> > > >> >> __vfs_read() / __vfs_write() on shmem. Whether generic_file_write() /
> > > >> >> copy_from_user() internally relies on GUP or not, is an orthogonal
> > > >> >> issue that does not belong here.
> > > >> >
> > > >> > It kind of does AFAIU.
> > > >>
> > > >> No, it is not. The issue with GUP is that you elevate the page
> > > >> ref-count and thus prevent lru isolation, sealing, whatsoever.
> > > >
> > > > The point was that such a memory might be not present yet and need a
> > > > page fault with all the side effects - memory reclaim, memcg charge...
> > > >
> > > >> I cannot see how it is related to kdbus. However, ...
> > > >>
> > > >> > If for nothing else then the memcg reasons mentioned in
> > > >> > other email (http://marc.info/?l=linux-kernel&m=142953380508188). If an
> > > >> > untrusted user is allowed to hand over a shmem backed buffer which hasn't
> > > >> > been charged yet (read faulted in) and then kdbus forced to fault it in
> > > >> > a different user's context then you basically allow to hide memory
> > > >> > allocations from the memcg. That is a clear show stopper.
> > > >> >
> > > >> > Or have I misunderstood the way how shmem buffers are used here?
> > > >>
> > > >> ...as you mentioned memcg, let's figure that out here. shmem buffers are
> > > >> used as receive-buffers by kdbus peers. They are read-only to
> > > >> user-space. All allocations are done by the kernel on message passing.
> > > >
> > > > OK, so the shmem buffer is allocated on the kernel's behalf and under
> > > > its control and no userspace can hand over one to kdbus. Do I get
> > > > it right? If yes then the memcg escape I was describing above is
> > > > not possible of course. This wasn't clear to me from the previous
> > > > discussion. Thanks for the clarification!
> > >
> > > I'm still missing something here, I think. At the time of pool
> > > creation, the kernel calls shmem_file_setup in the context of the
> > > untrusted user. Then, when a privileged daemon broadcasts, the kernel
> > > calls vfs_iter_write or similar, thus allocating the page, right? I
> > > don't see why the page would be allocated early or why vfs_iter_write
> > > and the associated shmem code would care what memcg created the shmem
> > > file -- all of that code seems to use current's memcg on brief
> > > inspection.
> >
> > Yes, it is the current task on the first charge, or the original memcg on
> > swap-in. But my understanding from the above, and I haven't read the
> > code yet, is that the untrusted userspace is only a reader of the buffer
> > and isn't allowed to modify it.
> >
> > > Bear in mind that the bad guy gets to use madvise, etc to mess around
> > > with the page cache state.
> >
> > How can an untrusted user play with shmem when it is read-only?
> > shmem_file_setup should create an unlinked file, so AFAIU no process can
> > access it via tmpfs and potentially fault the memory in before the producer
> > fills it up (so the memory is faulted in in the trusted context). I have
> > no idea how the receiver gets to the buffer, though.
> >
>
> The receiver gets to mmap the buffer. I'm not sure what protection they get.

OK, so I've checked the code. kdbus_pool_new sets up an unlinked shmem
file, so it is not visible externally. The consumer gets it by calling mmap
on the endpoint file, which is handled by kdbus_pool_mmap; that refuses
VM_WRITE and clears VM_MAYWRITE. The receiver doesn't even have direct
access to the shmem file.

It is ugly that kdbus_pool_mmap replaces the original vm_file and makes
it point to the shmem file. I am not sure whether this is always safe,
and it deserves a big fat comment. On the other hand, some drivers seem
to do this already (e.g. dma_buf_mmap).
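To see what clearing VM_MAYWRITE buys, it helps to recall the general kernel rule: a VMA without VM_MAYWRITE can never be upgraded to writable with mprotect(). The same rule is observable from user space, because a MAP_SHARED mapping of a file opened O_RDONLY is created without VM_MAYWRITE. The C sketch below is only an analogy for the receiver's view of the pool, not kdbus code; it assumes Linux and a writable /tmp, and the helper name readonly_mapping_blocks_write is made up for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a shared, read-only mapping of a file
 * opened O_RDONLY refuses to become writable (mprotect fails with EACCES),
 * 0 if it unexpectedly became writable, -1 on setup errors. */
int readonly_mapping_blocks_write(void)
{
    char path[] = "/tmp/pool-demo-XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    int result = -1;

    /* One-page backing file, standing in for the pool's shmem file. */
    int wfd = mkstemp(path);
    if (wfd < 0)
        return -1;
    if (ftruncate(wfd, page) < 0) {
        close(wfd);
        unlink(path);
        return -1;
    }
    close(wfd);

    /* The "receiver" only ever gets a read-only file descriptor. */
    int rfd = open(path, O_RDONLY);
    unlink(path); /* keep the file unlinked, like shmem_file_setup() */
    if (rfd < 0)
        return -1;

    void *p = mmap(NULL, page, PROT_READ, MAP_SHARED, rfd, 0);
    close(rfd); /* the mapping holds its own reference to the file */
    if (p == MAP_FAILED)
        return -1;

    /* No VM_MAYWRITE on this VMA, so upgrading to PROT_WRITE must fail. */
    if (mprotect(p, page, PROT_READ | PROT_WRITE) == 0)
        result = 0;
    else if (errno == EACCES)
        result = 1;

    munmap(p, page);
    return result;
}
```

The kdbus pool gets the same effect inside its mmap handler, by rejecting VM_WRITE and clearing VM_MAYWRITE as described above, rather than by relying on the file's open mode.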

> The thing I'm worried about is that the receiver might deliberately
> avoid faulting in a bunch of pages and instead wait for the producer
> to touch them, causing pages that logically belong to the receiver to
> be charged to the producer instead.

Hmm, now that I am looking into the code, it seems you are right. E.g.
kdbus_cmd_send runs in the context of the sender AFAIU. This gets down
to kdbus_pool_slice_copy_iovec, which does vfs_iter_write, and this
is where the memory gets charged. If I understand the terminology
correctly, all the receivers will share the same shmem file when
mmapping the endpoint.

This, however, doesn't seem to be exploitable to hide memory charges,
because the receiver cannot make the buffer writable. A nasty process
with a small memcg limit could still pre-fault the memory before any
writer sends a message and slow down all the traffic on the endpoint.
But that wouldn't be a completely new thing, because processes might
hammer on memory even without memcg... It is just that this would be
somewhat easier with memcg.
If that is the concern, then the buffer should be pre-charged at the
time it is created.

--
Michal Hocko
SUSE Labs

2015-04-27 17:18:15

by Greg Kroah-Hartman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 27, 2015 at 10:57:45AM +0200, Lukasz Skalski wrote:
> On 04/24/2015 09:25 PM, Greg Kroah-Hartman wrote:
> > On Fri, Apr 24, 2015 at 04:34:34PM +0200, Lukasz Skalski wrote:
> >> On 04/24/2015 04:19 PM, Havoc Pennington wrote:
> >>> On Fri, Apr 24, 2015 at 9:50 AM, Lukasz Skalski <[email protected]> wrote:
> >>>> - client: http://fpaste.org/215156/
> >>>>
> >>>
> >>> Cool - it might also be interesting to try this without blocking round
> >>> trips, i.e. send requests as quickly as you can, and collect replies
> >>> asynchronously. That's how people ideally use dbus. It should
> >>> certainly reduce the total benchmark time, but just wondering if this
> >>> usage increases or decreases the delta between userspace daemon and
> >>> kdbus.
> >>
> >> No problem - I'll also prepare an asynchronous version.
> >
> > That would be great to see as well. Many thanks for doing this work.
>
> As proposed by Havoc and Greg, I've created a simple benchmark for
> asynchronous calls:
>
> - server: http://fpaste.org/215157/ (the same as in the previous test)
> - client: http://fpaste.org/215724/ (asynchronous version)
>
> For the asynchronous version of the client I had to decrease the number
> of calls to 128 (for the synchronous version it was 20000 calls);
> otherwise we could exceed the maximum number of pending replies per
> connection.
>
> The test results are as follows:
>
> +--------------+--------------------+--------------------+
> |              |    Elapsed time    |    Elapsed time    |
> | Message size |  GLIB WITH NATIVE  | GLIB + DBUS-DAEMON |
> |   [bytes]    |   KDBUS SUPPORT*   |                    |
> +--------------+--------------------+--------------------+
> |              |  1) 0.018639 s     |  1) 0.029947 s     |
> |     1000     |  2) 0.017045 s     |  2) 0.032812 s     |
> |              |  3) 0.017490 s     |  3) 0.029971 s     |
> |              |  4) 0.018001 s     |  4) 0.026485 s     |
> +--------------+--------------------+--------------------+
> |              |  1) 0.019898 s     |  1) 0.040914 s     |
> |    10000     |  2) 0.022187 s     |  2) 0.033604 s     |
> |              |  3) 0.020854 s     |  3) 0.037616 s     |
> |              |  4) 0.020020 s     |  4) 0.033772 s     |
> +--------------+--------------------+--------------------+
> *All tests were performed without using the memfd mechanism.
>
> And as I wrote in my previous mail, the kdbus transport for GLib is not
> finished yet and there is still room for improvement, so please
> do not treat these test results as final.
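To put the quoted asynchronous numbers in perspective, the small C helper below averages the four runs per configuration and computes how much longer the dbus-daemon path takes. The sample values are copied from the table above; mean() and slowdown() are just illustrative helpers.

```c
#include <stddef.h>

/* Mean of n elapsed-time samples, in seconds. */
double mean(const double *samples, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return n ? sum / (double)n : 0.0;
}

/* Runs copied from the asynchronous benchmark table (128 calls per run). */
const double kdbus_1000[]   = {0.018639, 0.017045, 0.017490, 0.018001};
const double daemon_1000[]  = {0.029947, 0.032812, 0.029971, 0.026485};
const double kdbus_10000[]  = {0.019898, 0.022187, 0.020854, 0.020020};
const double daemon_10000[] = {0.040914, 0.033604, 0.037616, 0.033772};

/* How many times longer the dbus-daemon runs took than native kdbus. */
double slowdown(const double *daemon, const double *kdbus, size_t n)
{
    return mean(daemon, n) / mean(kdbus, n);
}
```

Averaged this way, the daemon path takes about 1.67x as long at 1000 bytes and about 1.76x at 10000 bytes, i.e. roughly 139 µs versus 233 µs per call at 1000 bytes.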

Very nice, thanks. Any chance you can bump those message sizes up to
over 512k? I think that will show a huge difference. Even just under
512k should be faster, as you have shown, but I have been told that for
messages larger than 512k, the D-Bus daemon has "issues", which has kept
people from wanting to use messages that large before now.
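The table's footnote says the tests did not use the memfd mechanism, which is what would matter for payloads of that size. The idea behind handing over a memfd is that once it is sealed against writes and resizing, nobody (including the sender) can change its contents, so a receiver can use the data without copying it first. The C sketch below shows only the sealing part, using the standard Linux memfd_create()/F_ADD_SEALS interface (glibc 2.27 or later assumed); it does not show how kdbus attaches a memfd to a message, and make_sealed_payload is a hypothetical name.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Creates a memfd holding 'len' bytes of 'data' and seals it so that its
 * size and contents can no longer change. Returns the fd, or -1 on error. */
int make_sealed_payload(const void *data, size_t len)
{
    int fd = memfd_create("payload", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* After these seals, write(), shrinking and growing all fail with
     * EPERM, and F_SEAL_SEAL prevents the seal set from being changed. */
    if (fcntl(fd, F_ADD_SEALS,
              F_SEAL_WRITE | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_SEAL) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A payload sealed this way can be checked with fcntl(fd, F_GET_SEALS); any later write() or ftruncate() on it fails with EPERM.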

thanks again,

greg k-h

2015-04-27 20:11:27

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

[resent without HTML]

On Apr 27, 2015 5:46 AM, "Michal Hocko" <[email protected]> wrote:
>
> OK, so I've checked the code. kdbus_pool_new sets up an unlinked shmem
> file, so it is not visible externally. The consumer gets it by calling mmap
> on the endpoint file, which is handled by kdbus_pool_mmap; that refuses
> VM_WRITE and clears VM_MAYWRITE. The receiver doesn't even have direct
> access to the shmem file.
>
> It is ugly that kdbus_pool_mmap replaces the original vm_file and makes
> it point to the shmem file. I am not sure whether this is always safe,
> and it deserves a big fat comment. On the other hand, some drivers seem
> to do this already (e.g. dma_buf_mmap).

What happens to map_files in proc? It seems unlikely that CRIU would
ever work on dma_buf, but this could be a problem for CRIU with kdbus.

>
> > The thing I'm worried about is that the receiver might deliberately
> > avoid faulting in a bunch of pages and instead wait for the producer
> > to touch them, causing pages that logically belong to the receiver to
> > be charged to the producer instead.
>
> Hmm, now that I am looking into the code, it seems you are right. E.g.
> kdbus_cmd_send runs in the context of the sender AFAIU. This gets down
> to kdbus_pool_slice_copy_iovec, which does vfs_iter_write, and this
> is where the memory gets charged. If I understand the terminology
> correctly, all the receivers will share the same shmem file when
> mmapping the endpoint.
>
> This, however, doesn't seem to be exploitable to hide memory charges,
> because the receiver cannot make the buffer writable. A nasty process
> with a small memcg limit could still pre-fault the memory before any
> writer sends a message and slow down all the traffic on the endpoint.
> But that wouldn't be a completely new thing, because processes might
> hammer on memory even without memcg... It is just that this would be
> somewhat easier with memcg.
> If that is the concern, then the buffer should be pre-charged at the
> time it is created.

The attack I had in mind is that a nasty process with a small memcg
creates one or many of these pools and doesn't pre-fault them. Then a
sender (systemd?) sends messages and they get charged, possibly once for
each copy sent, to the root memcg. So kdbus should probably pre-charge
the creator of the pool.
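The charging asymmetry Andy describes can be illustrated with a toy model in C. It assumes the simplified rule from earlier in the thread (a shmem page is charged to whichever task faults it in first) and ignores reclaim, swap and limits entirely; struct memcg, struct pool, send_message() and precharge_pool() are hypothetical names, not kernel or kdbus APIs.

```c
#include <stddef.h>

/* Toy "memcg": only counts the pages charged to it. */
struct memcg {
    const char *name;
    long charged_pages;
};

/* Toy receive pool: each page remembers which memcg paid for it
 * (NULL means the page has not been faulted in yet). */
#define POOL_PAGES 8
struct pool {
    struct memcg *owner;                  /* memcg of the pool creator */
    struct memcg *charged_to[POOL_PAGES];
};

/* Charge a page to 'cg' on first touch only (the simplified rule). */
static void touch_page(struct pool *p, size_t idx, struct memcg *cg)
{
    if (p->charged_to[idx] == NULL) {
        p->charged_to[idx] = cg;
        cg->charged_pages++;
    }
}

/* A sender copies a message into the receiver's pool in its own context
 * (cf. vfs_iter_write): untouched pages are charged to the sender. */
void send_message(struct pool *p, size_t first, size_t npages,
                  struct memcg *sender)
{
    for (size_t i = first; i < first + npages && i < POOL_PAGES; i++)
        touch_page(p, i, sender);
}

/* Suggested mitigation: charge every page to the creator up front, so
 * later writes by other tasks find the pages already charged. */
void precharge_pool(struct pool *p)
{
    for (size_t i = 0; i < POOL_PAGES; i++)
        touch_page(p, i, p->owner);
}
```

With a lazily populated pool, a sender running in the root memcg pays for every page it writes into the receiver's pool; after precharge_pool(), the same sends find the pages already charged to the pool's creator.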

2015-04-27 21:32:08

by Linus Torvalds

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Fri, Apr 24, 2015 at 6:50 AM, Lukasz Skalski <[email protected]> wrote:
>
> To perform tests I've created two simple apps:
>
> - server: http://fpaste.org/215157/
> - client: http://fpaste.org/215156/

So since Andy reported that dbus seems to be a few orders of magnitude
too slow, I tried to build these apps to see what it even does.

They don't build on F21. You seem to be using features that are too
new to exist even in fairly modern distros:

server.c:47:24: error: ‘G_BUS_TYPE_USER’ undeclared

so I can't even see what dbus does *now*.

That said, either you're running your test on a potato, or dbus is
seriously screwed up. No way should it take 4+ seconds to send a 1000b
message back and forth 20k times. But as mentioned, I can't even
see what it's doing right now.

Linus

2015-04-27 21:40:30

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 27, 2015 at 2:32 PM, Linus Torvalds
<[email protected]> wrote:
> On Fri, Apr 24, 2015 at 6:50 AM, Lukasz Skalski <[email protected]> wrote:
>>
>> To perform tests I've created two simple apps:
>>
>> - server: http://fpaste.org/215157/
>> - client: http://fpaste.org/215156/
>
> So since Andy reported that dbus seems to be a few orders of magnitude
> too slow, I tried to build these apps to see what it even does.
>
> They don't build on F21. You seem to be using features that are too
> new to exist even in fairly modern distros:
>
> server.c:47:24: error: ‘G_BUS_TYPE_USER’ undeclared
>
> so I can't even see what dbus does *now*.

Change "USER" to "SESSION". Build with:

gcc -Wall -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include \
    -o client client.c -lglib-2.0 -ldbus-glib-1 -ldbus-1 -lgobject-2.0 \
    -lglib-2.0 -ldbus-1 -lgio-2.0

Then do the same again with s/client/server/

For all I know, the USER vs SESSION distinction matters, but I can't
imagine why.

>
> That said, either you're running your test on a potato, or dbus is
> seriously screwed up. No way should it take 4+ seconds to send a 1000b
> message back and forth 20k times. But as mentioned, I can't even
> see what it's doing right now.

Whee! I'm typing this email on a potato!

--Andy

2015-04-27 22:00:16

by Linus Torvalds

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 27, 2015 at 2:40 PM, Andy Lutomirski <[email protected]> wrote:
>
> Change "USER" to "SESSION".

That works.

> Build with:

Hell no. I used

gcc client.c -o client $(pkg-config --cflags --libs gtk+-2.0)

instead. That worked.

>> That said, either you're running your test on a potato, or dbus is
>> seriously screwed up. No way should it take 4+ seconds to send a 1000b
>> message back and forth 20k times. But as mentioned, I can't even
>> see what it's doing right now.
>
> Whee! I'm typing this email on a potato!

No, I think you're right, there's the other non-potato choice: "dbus
is seriously screwed up".

That thing has almost no kernel footprint. It's spending all its time
in user space overhead.

Quite frankly, the whole "kdbus is important for performance" seems to
be *totally* invalidated by even a minimal look at profiles for that
thing. Here's the top-15 offender list:

2.62% gdbus libc-2.20.so [.] _int_malloc
2.43% gdbus libc-2.20.so [.] free
2.31% server libc-2.20.so [.] free
2.12% gdbus libc-2.20.so [.] malloc
1.77% gdbus libglib-2.0.so.0.4200.2 [.] g_utf8_validate
1.43% gdbus libglib-2.0.so.0.4200.2 [.] g_slice_alloc
1.41% gdbus libglib-2.0.so.0.4200.2 [.] g_hash_table_lookup
1.28% server libc-2.20.so [.] _int_malloc
1.27% gdbus libglib-2.0.so.0.4200.2 [.] g_mutex_lock
1.22% gdbus libglib-2.0.so.0.4200.2 [.] g_variant_unref
1.16% server libc-2.20.so [.] malloc
1.14% gdbus libglib-2.0.so.0.4200.2 [.] g_bit_lock
1.07% gdbus libglib-2.0.so.0.4200.2 [.] g_slice_free1
1.05% gdbus libglib-2.0.so.0.4200.2 [.] g_bit_unlock
1.01% gdbus libglib-2.0.so.0.4200.2 [.] g_mutex_unlock

there's not a kernel function in sight in the top-15, and it's all
just overhead. The above is from the server side, but the client looks
similar.
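g_utf8_validate shows up in the profile because the D-Bus specification requires every STRING on the wire to be valid UTF-8 with no NUL bytes, so a conforming binding has to scan each string it marshals or demarshals. The C sketch below is a minimal validator of that kind, meant only to show that this is a per-byte loop over every string in every message; it is not GLib's implementation, and dbus_string_valid is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if buf[0..len) is well-formed UTF-8 containing no NUL bytes.
 * Rejects overlong forms, UTF-16 surrogates (U+D800..U+DFFF), code points
 * above U+10FFFF, stray continuation bytes and truncated sequences. */
bool dbus_string_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        unsigned int cp;
        size_t need;

        if (c == 0x00)
            return false;               /* NUL is not allowed */
        if (c < 0x80) {                 /* ASCII fast path */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            need = 1; cp = c & 0x1F;    /* 0xC0/0xC1 would be overlong */
        } else if (c >= 0xE0 && c <= 0xEF) {
            need = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            need = 3; cp = c & 0x07;
        } else {
            return false;               /* invalid lead byte */
        }

        if (len - i - 1 < need)
            return false;               /* truncated sequence */
        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;           /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if ((need == 2 && cp < 0x800) ||        /* overlong 3-byte form */
            (need == 3 && cp < 0x10000) ||      /* overlong 4-byte form */
            (cp >= 0xD800 && cp <= 0xDFFF) ||   /* surrogate */
            cp > 0x10FFFF)
            return false;
        i += need + 1;
    }
    return true;
}
```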

If somebody wants to speed up dbus, they should likely look at the
user-space code, not the kernel side.

My guess is that pretty much the entirety of the quoted kdbus
"speedup" isn't because it speeds up any kernel side thing, it's
because it avoids the user-space crap in the dbus server.

IOW, all the people who say that it's about avoiding context switches
are probably just full of shit. It's not about context switches, it's
about bad user-level code.

Linus

2015-04-27 22:14:54

by Linus Torvalds

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 27, 2015 at 3:00 PM, Linus Torvalds
<[email protected]> wrote:
>
> IOW, all the people who say that it's about avoiding context switches
> are probably just full of shit. It's not about context switches, it's
> about bad user-level code.

Just to make sure, I did a system-wide profile (so that you can
actually see the overhead of context switching better), and that
didn't change the picture.

The scheduler overhead *might* be 1% or so.

So really. The people who talk about how kdbus improves performance
are just full of sh*t. Yes, it improves things, but the improvement
seems to be 100% "incidental", in that it avoids a few trips down the
user-space problems.

The real problems seem to be in dbus memory management (suggestion:
keep a small per-thread cache of those message allocations) and to a
smaller degree in the crazy utf8 validation (why the f*ck does it do
that anyway?), with some locking problems thrown in for good measure.
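A minimal C11 sketch of the per-thread allocation cache suggested here, assuming fixed-size message buffers: freed buffers are parked on a thread-local stack and handed back by the next allocation on the same thread, so the common path touches neither malloc/free nor any lock. The 4096-byte size, the cache depth of 8 and the msgbuf_* names are assumptions for illustration, not taken from any D-Bus binding.

```c
#include <stdlib.h>

#define MSGBUF_SIZE  4096   /* assumed fixed message buffer size */
#define MSGBUF_CACHE 8      /* keep at most this many free buffers per thread */

/* Per-thread stack of free buffers; no locking is needed because each
 * thread only ever touches its own copy. */
static _Thread_local void *msgbuf_cache[MSGBUF_CACHE];
static _Thread_local int msgbuf_cached;

void *msgbuf_alloc(void)
{
    if (msgbuf_cached > 0)
        return msgbuf_cache[--msgbuf_cached];   /* fast path: reuse */
    return malloc(MSGBUF_SIZE);                 /* slow path */
}

void msgbuf_free(void *buf)
{
    if (buf == NULL)
        return;
    if (msgbuf_cached < MSGBUF_CACHE)
        msgbuf_cache[msgbuf_cached++] = buf;    /* park for reuse */
    else
        free(buf);                              /* cache is full */
}
```

A production version would also have to drain the cache when a thread exits; this sketch simply leaks whatever is still cached at that point.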

Linus

2015-04-27 22:30:13

by David Lang

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, 27 Apr 2015, Lukasz Skalski wrote:

> Subject: Re: [GIT PULL] kdbus for 4.1-rc1
>
> On 04/24/2015 09:25 PM, Greg Kroah-Hartman wrote:
>> On Fri, Apr 24, 2015 at 04:34:34PM +0200, Lukasz Skalski wrote:
>>> On 04/24/2015 04:19 PM, Havoc Pennington wrote:
>>>> On Fri, Apr 24, 2015 at 9:50 AM, Lukasz Skalski <[email protected]> wrote:
>>>>> - client: http://fpaste.org/215156/
>>>>>
>>>>
>>>> Cool - it might also be interesting to try this without blocking round
>>>> trips, i.e. send requests as quickly as you can, and collect replies
>>>> asynchronously. That's how people ideally use dbus. It should
>>>> certainly reduce the total benchmark time, but just wondering if this
>>>> usage increases or decreases the delta between userspace daemon and
>>>> kdbus.
>>>
>>> No problem - I'll also prepare an asynchronous version.
>>
>> That would be great to see as well. Many thanks for doing this work.
>
> As proposed by Havoc and Greg, I've created a simple benchmark for
> asynchronous calls:
>
> - server: http://fpaste.org/215157/ (the same as in the previous test)
> - client: http://fpaste.org/215724/ (asynchronous version)
>
> For the asynchronous version of the client I had to decrease the number
> of calls to 128 (for the synchronous version it was 20000 calls);
> otherwise we could exceed the maximum number of pending replies per
> connection.

Aren't we being told that part of the reason for needing kdbus is that
thousands, or tens of thousands, of messages are being spewed out? How does
limiting it to 128 messages represent real life if this is the case?

David Lang


2015-04-28 10:39:13

by Lukasz Skalski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 04/27/2015 11:32 PM, Linus Torvalds wrote:
> On Fri, Apr 24, 2015 at 6:50 AM, Lukasz Skalski <[email protected]> wrote:
>>
>> To perform tests I've created two simple apps:
>>
>> - server: http://fpaste.org/215157/
>> - client: http://fpaste.org/215156/
>
> So since Andy reported that dbus seems to be a few orders of magnitude
> too slow, I tried to build these apps to see what it even does.
>
> They don't build on F21. You seem to be using features that are too
> new to exist even in fairly modern distros:
>
> server.c:47:24: error: ‘G_BUS_TYPE_USER’ undeclared
>
> so I can't even see what dbus does *now*.
>

I've just explained it in my mail to Andy. As discussed some time ago
with the GLib developers, we introduced two new bus types, "user"
(G_BUS_TYPE_USER) and "machine" (G_BUS_TYPE_MACHINE). At the moment
these are only available on the GLib devel branch, so I should have
replaced G_BUS_TYPE_USER with G_BUS_TYPE_SESSION before posting my
benchmark apps - sorry about that.

> That said, either you're running your test on a potato, or dbus is
> seriously screwed up. No way should it take 4+ seconds to send a 1000b
> message back and forth 20k times. But as mentioned, I can't even
> see what it's doing right now.
>
> Linus
>

--
Lukasz Skalski
Samsung R&D Institute Poland
Samsung Electronics
[email protected]

2015-04-28 10:53:51

by Lukasz Skalski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 04/28/2015 12:29 AM, David Lang wrote:
> On Mon, 27 Apr 2015, Lukasz Skalski wrote:
>
> Aren't we being told that part of the reason for needing kdbus is that
> thousands, or tens of thousands, of messages are being spewed out? How
> does limiting it to 128 messages represent real life if this is the case?
>

AFAIK, some limits (for example, the maximum number of queued requests
waiting for a reply) are currently the same, or at least quite similar,
for both the DBus daemon and kdbus.

> David Lang
>
>

--
Lukasz Skalski
Samsung R&D Institute Poland
Samsung Electronics
[email protected]

2015-04-28 12:50:12

by Havoc Pennington

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 27, 2015 at 6:00 PM, Linus Torvalds
<[email protected]> wrote:
> If somebody wants to speed up dbus, they should likely look at the
> user-space code, not the kernel side.

To be more precise, your profile seems to show a lot of the gdbus
(glib bindings) user space code. (And the blocking version of this
benchmark *is* doing something ridiculous, by blocking for every round
trip, which is the one performance mistake the docs say over and over
not to make.)
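The difference between blocking for every round trip and keeping several requests in flight can be sketched without any D-Bus code at all. The C example below uses a socketpair and a forked echo process as a stand-in for a peer; run_client() keeps at most `window` requests unanswered, much as the asynchronous benchmark is bounded by the per-connection pending-reply limit, and window == 1 reduces it to the blocking pattern. The 64-byte message size and all function names are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MSG_SIZE 64   /* assumed fixed request/reply size */

/* Read or write exactly len bytes; returns 0 on success, -1 on error/EOF. */
static int xfer_all(int fd, char *buf, size_t len, int writing)
{
    while (len > 0) {
        ssize_t n = writing ? write(fd, buf, len) : read(fd, buf, len);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Echo peer: sends every fixed-size request straight back until EOF. */
static void echo_server(int fd)
{
    char buf[MSG_SIZE];
    while (xfer_all(fd, buf, sizeof buf, 0) == 0)
        if (xfer_all(fd, buf, sizeof buf, 1) < 0)
            break;
}

/* Sends 'total' requests with at most 'window' unanswered at any time and
 * returns the number of replies received (-1 on setup failure). */
long run_client(long total, long window)
{
    int sv[2];
    if (window < 1 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                 /* child: the "service" */
        close(sv[0]);
        echo_server(sv[1]);
        _exit(0);
    }
    close(sv[1]);

    char buf[MSG_SIZE] = {0};
    long sent = 0, replies = 0;
    while (replies < total) {
        /* Fill the window before waiting for anything. */
        while (sent < total && sent - replies < window) {
            if (xfer_all(sv[0], buf, sizeof buf, 1) < 0)
                goto out;
            sent++;
        }
        /* Collect one reply, which frees one slot in the window. */
        if (xfer_all(sv[0], buf, sizeof buf, 0) < 0)
            goto out;
        replies++;
    }
out:
    close(sv[0]);                   /* EOF makes the echo peer exit */
    waitpid(pid, NULL, 0);
    return replies;
}
```

Timing run_client(20000, 1) against run_client(20000, 128), for example with clock_gettime(CLOCK_MONOTONIC, ...), shows how much of a blocking benchmark is pure round-trip latency on a given machine.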

There are at least two other C bindings (a plain-C one in systemd and
the original libdbus).

If someone wanted to get the noise out of the picture I imagine the
plain-C bindings in systemd might have a lot less in the way of
allocations and locks than gdbus, though I haven't looked at them.
Those systemd bindings are also the ones people asking for more
performance are probably using (because they are talking about early
boot, system services, etc.)

> My guess is that pretty much the entirety of the quoted kdbus
> "speedup" isn't because it speeds up any kernel side thing, it's
> because it avoids the user-space crap in the dbus server.

The dbus bus daemon doesn't link to any g_ functions, fwiw, when
interpreting these profiles. Though nobody would claim the bus daemon
is fast, it is on the order of a few times slower than a raw socket
last I checked (which was a long time ago) ... here are some old
threads:

http://lists.freedesktop.org/pipermail/dbus/2004-November/001779.html
http://lists.freedesktop.org/archives/dbus/2012-March/015024.html

In 2004, the libdbus parsing/validation/malloc/etc. overhead was 2.5x
a raw socket without the bus daemon, and about twice that with the bus
daemon (since the daemon adds another read and another write per
message). I'm not aware of any reason this would have changed
dramatically, though it doesn't mean there isn't one.

Havoc

2015-04-28 13:45:10

by Havoc Pennington

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 27, 2015 at 6:14 PM, Linus Torvalds
<[email protected]> wrote:
> The real problems seem to be in dbus memory management (suggestion:
> keep a small per-thread cache of those message allocations) and to a
> smaller degree in the crazy utf8 validation (why the f*ck does it do
> that anyway?), with some locking problems thrown in for good measure.
>

I would say there are two distinct performance topics here.

A. is the fixed overhead of various bindings (which may well vary a
lot by binding). This is parsing, validation, allocation, locking,
whatever. It tends to be "per message operation" (read/parse or
marshal/write of a message).

B. is how many of these "message operations" (read/parse,
marshal/write) are happening.

To make A*B smaller, one can reduce either A or B.

The kdbus idea seems to be mostly about B, eliminating the bus
daemon's read/parse and marshal/write, and reducing it to only one
marshal/write by the sender and one read/parse by the recipient
without the daemon in between.
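The A*B framing can be written down as a toy cost model in C: A is the cost of one message operation (read/parse or marshal/write) for a given binding, and B is the number of such operations per delivered message, four through the bus daemon and two without it. The costs are arbitrary units, not measurements, and the function names are made up for illustration.

```c
/* Through the bus daemon: sender marshal/write, daemon read/parse,
 * daemon marshal/write, receiver read/parse -> 4 operations per message,
 * two paid at the clients' A and two at the daemon's A. */
double cost_via_daemon(double a_client, double a_daemon, long messages)
{
    return (double)messages * (2.0 * a_client + 2.0 * a_daemon);
}

/* Without a daemon in the path (the kdbus idea): only the sender's
 * marshal/write and the receiver's read/parse remain -> 2 operations. */
double cost_direct(double a_client, long messages)
{
    return (double)messages * (2.0 * a_client);
}
```

With equal per-operation costs, taking the daemon out of the path halves the total; halving only the daemon's A (cost_via_daemon(1.0, 0.5, n)) gets from 4 to 3 units per message, which corresponds to optimizing the bus daemon instead of removing it.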

People have worked on A for clients, by doing the systemd binding for
example, but perhaps they have been reluctant to work on the bus
daemon itself to improve A for the bus because they felt solving B
would involve eliminating the bus daemon anyway. If you are planning
to solve B via kdbus, then optimizing the bus daemon itself would be a
waste of time (A only matters for clients, not the bus, in kdbus
world).

That email I linked earlier
(http://lists.freedesktop.org/archives/dbus/2012-March/015024.html )
has many suggestions on A for the bus daemon itself, but of course
taking the bus daemon out of the equation would be more effective than
any amount of optimizing it.

A. is kind of a realm of many choices - there are tons of bindings,
and people can decide if they want the convenient-but-malloc-happy
glib ones, or the more traditional C style of systemd, or Python or
Java or JavaScript or whatever ... this is an area where people can
make the tradeoff they want. But everyone is "stuck" with the bus
daemon (or kdbus) since it has to be shared among clients, of course.

Havoc

2015-04-28 14:48:48

by Havoc Pennington

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

btw if I can make a suggestion, it's quite confusing to talk about
"dbus" unqualified when we are talking about implementation issues,
since it muddles bus daemon vs. clients, and also since there are lots
of implementations of the client bindings:

http://www.freedesktop.org/wiki/Software/DBusBindings/

For the bus daemon, the only two implementations I know of are the
original one (which uses libdbus as its binding) and kdbus, though.

I would expect there's no question the bus daemon can be faster, maybe
say 1.5x raw sockets instead of 2.5x, or whatever - something on that
order. Should probably simply stipulate this for discussion purposes:
"someone could optimize the crap out of the bus daemon". The kdbus
question is about whether to eliminate this daemon entirely.

Havoc

2015-04-28 17:19:01

by Theodore Ts'o

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 28, 2015 at 10:48:10AM -0400, Havoc Pennington wrote:
> btw if I can make a suggestion, it's quite confusing to talk about
> "dbus" unqualified when we are talking about implementation issues,
> since it muddles bus daemon vs. clients, and also since there are lots
> of implementations of the client bindings:
>
> http://www.freedesktop.org/wiki/Software/DBusBindings/
>
> For the bus daemon, the only two implementations I know of are the
> original one (which uses libdbus as its binding) and kdbus, though.
>
> I would expect there's no question the bus daemon can be faster, maybe
> say 1.5x raw sockets instead of 2.5x, or whatever - something on that
> order. Should probably simply stipulate this for discussion purposes:
> "someone could optimize the crap out of the bus daemon". The kdbus
> question is about whether to eliminate this daemon entirely.

So if one of the justifications for moving the daemon into kernel
space is that its performance is crap, then I think it is useful to
determine whether a fully optimized userspace daemon would be good
enough.

After all, we can go down the Novell Netware path and push arbitrary
web servers, ldap servers, etc. all into the kernel on the excuse of
"the performance would be faster". But that begs the question of how
much performance improvements can be made purely in userspace, and
ignores all of the security and stability costs of moving more and
more code into the kernel.

So the question I have is why in the world do we want to be able to
support 1.5x raw sockets for a bus speed? What's the use case where
that kind of performance is required for a bus based system, and is
that a world we really want to live in? I find dbus to be extremely
hard to debug when my desktop starts doing things I don't want it to
do. The fact that it might be flinging around hundreds of thousands
of messages, and that this is something we want to encourage, doesn't
make me feel any more kindly inclined towards dbus or kdbus....

- Ted

2015-04-28 17:19:58

by David Lang

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, 28 Apr 2015, Havoc Pennington wrote:

> btw if I can make a suggestion, it's quite confusing to talk about
> "dbus" unqualified when we are talking about implementation issues,
> since it muddles bus daemon vs. clients, and also since there are lots
> of implementations of the client bindings:
>
> http://www.freedesktop.org/wiki/Software/DBusBindings/
>
> For the bus daemon, the only two implementations I know of are the
> original one (which uses libdbus as its binding) and kdbus, though.
>
> I would expect there's no question the bus daemon can be faster, maybe
> say 1.5x raw sockets instead of 2.5x, or whatever - something on that
> order. Should probably simply stipulate this for discussion purposes:
> "someone could optimize the crap out of the bus daemon". The kdbus
> question is about whether to eliminate this daemon entirely.

As I see it, we aren't talking about 1.5x vs 2.5x; we're talking about
1000x.

If the examples that are being used to show the performance advantage of kdbus
vs normal dbus are doing the wrong thing, then we need to get some other
examples available to people who don't live and breathe dbus that 'do things
right' so that the kernel developers can see what you think is the real problem
and how kdbus addresses it.

So far, this 'wrong' example is the only thing that's been posted to show the
performance advantage of kdbus.

David Lang

2015-04-28 19:20:04

by Havoc Pennington

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 28, 2015 at 1:19 PM, David Lang <[email protected]> wrote:
> If the examples that are being used to show the performance advantage of
> kdbus vs normal dbus are doing the wrong thing, then we need to get some
> other examples available to people who don't live and breathe dbus that 'do
> things right' so that the kernel developers can see what you think is the
> real problem and how kdbus addresses it.
>
> So far, this 'wrong' example is the only thing that's been posted to show
> the performance advantage of kdbus.

I'm hopeful someone will do that.

fwiw, I would be suspicious of a broken benchmark if it didn't show:

* the bus daemon means an extra read/parse and marshal/write per
message, so 4 vs. 2
* the existence of the bus daemon therefore makes a message
send/receive take roughly twice as long

https://lwn.net/Articles/580194/ has a bit more elaboration about
number of copies, validations, and context switches in each case.

From what I can tell, the core performance claim for kdbus is that for
a userspace daemon to be a routing intermediary, it has to receive and
re-send messages. If the baseline performance of IPC is the cost to
send once and receive once, adding the daemon means there's twice as
much to do (1 more receive, 1 more send). However fast you make
send/receive, the daemon always means there are twice as many
send/receives as there would be with no daemon.
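To make the operation count above concrete, here is a minimal sketch, assuming AF_UNIX datagram socketpairs stand in for the bus transport and all three parties run in one process. The function names are hypothetical; the sketch only counts send/receive calls (it does not time anything), and the real dbus-daemon additionally parses and re-marshals each message.

```python
import socket


def deliver_direct(payload: bytes) -> tuple[bytes, int]:
    """Peer A sends straight to peer B: one send, one receive."""
    ops = 0
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with a, b:
        a.send(payload)
        ops += 1  # client send
        data = b.recv(65536)
        ops += 1  # client receive
    return data, ops


def deliver_via_daemon(payload: bytes) -> tuple[bytes, int]:
    """A -> daemon -> B: the daemon must receive and re-send every message."""
    ops = 0
    a, daemon_in = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    daemon_out, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with a, daemon_in, daemon_out, b:
        a.send(payload)
        ops += 1  # client send
        msg = daemon_in.recv(65536)
        ops += 1  # daemon receive (the extra read/parse)
        daemon_out.send(msg)
        ops += 1  # daemon re-send (the extra marshal/write)
        data = b.recv(65536)
        ops += 1  # client receive
    return data, ops


print(deliver_direct(b"hello"))      # (b'hello', 2)
print(deliver_via_daemon(b"hello"))  # (b'hello', 4)
```

The 4-vs-2 count holds no matter how fast each individual call is made.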

If that isn't what a benchmark shows, then there's a mystery to
explain... (one disruption to the ratio of course could be if the
clients use a much faster or slower dbus lib than the daemon)

As noted many times, of course this 2x penalty for the daemon was a
conscious tradeoff - kdbus is trying to escape the tradeoff in order
to extend usage of dbus to more use cases. Given the tradeoff,
_existing_ uses of dbus seem to prefer the performance hit to the loss
of useful semantics, but potential new users would like to or need to
have both.

That LWN article lists some other non-performance rationales for kdbus
too, of course.

Aside: earlier I referred to the systemd dbus client binding without a
link, the link appears to be:
http://cgit.freedesktop.org/systemd/systemd/tree/src/libsystemd/sd-bus

Havoc

2015-04-28 20:26:07

by Havoc Pennington

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 28, 2015 at 1:18 PM, Theodore Ts'o <[email protected]> wrote:
> So the question is if one of the justifications for moving the daemon
> into kernel space is that its performance is crap, then I think it is
> useful to determine whether a fully optimized userspace daemon would
> be good enough.
>

Yeah. I don't know how you answer that, because the answer is probably
"it would be good enough for some things and not for other things." It
depends on whether an app is sending enough data to be too slow, and
it depends on the hardware, right.

What I think we might know: the userspace:kernel time-to-send ratio
should always be around 2:1, if both of them are
similarly-implemented, because the userspace version has about 2x the
work to do.

The actual wall-clock time of course depends on the hardware and
what's being sent.

If there was a deviation from 2:1 in a benchmark, it might be because
of implementation issues - so for example libdbus+dbus-daemon might be
3:1 or 5:1 to sd-dbus+kdbus, because sd-dbus isn't as bloated as
libdbus, say. That isn't telling you anything about kernel vs.
userspace architecture, the extra ratio above 2:1 is only telling you
about userspace implementation quality.
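The ratio argument above is simple arithmetic. A minimal sketch, assuming the architectural factor (about 2x for the extra hop) and the implementation factor simply multiply, which is a simplification; the function name is hypothetical:

```python
def implementation_factor(measured_ratio: float,
                          architectural_ratio: float = 2.0) -> float:
    """Divide a measured userspace:kernel time ratio by the part explained
    by the extra daemon hop. What remains reflects userspace implementation
    quality rather than kernel-vs-userspace architecture.
    Assumes the two factors multiply."""
    if architectural_ratio <= 0:
        raise ValueError("architectural_ratio must be positive")
    return measured_ratio / architectural_ratio


# The hypothetical 3:1 and 5:1 measurements mentioned above.
print(implementation_factor(3.0))  # 1.5
print(implementation_factor(5.0))  # 2.5
```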

For purposes of deciding what to put in kernel - the differences
between dbus client implementations (sd-dbus, libdbus, gdbus, etc.)
seem like irrelevant noise to me.

Re: the slippery slope to LDAP in the kernel - my questions would be
things like 1) what are non-performance reasons to have dbus in the
kernel, such as early boot or security considerations; 2) does LDAP in
kernel give these kind of 2:1 gains; 3) is there a simpler way to get
the 2:1 gain for dbus...

Others can answer those better than I can.

I _would_ say that dbus is more "generic" than something like LDAP;
dbus is specific to the use-case of coordinating processes on a single
machine, but it isn't specific to any particular application, and it's
been used for lots of different applications. On my laptop, which is a
pretty normal Fedora 21 as far as I know:

$ rpm -q --whatrequires 'libdbus-1.so.3()(64bit)' | wc -l
113

this omits anyone using a different binding, it's only libdbus users.

> I find dbus to be extremely hard to debug when my desktop starts doing
> things I don't want it to do. The fact that it might be flinging around hundreds
> of thousands of messages, and that this is something we want to encourage,

This particular argument doesn't resonate with me ... if dbus is hard
to debug, it's not as if "ad hoc application-specific sidechannel
somebody cooked up" is going to be easier.

People aren't usually making up data to send around just because they
can. If they need to send an audio stream, and dbus is too slow,
they'll send it another ad hoc way, but it ultimately has to get sent.
Same for most data, it is the size it is and it needs to go where it
needs to go, for some what-the-user-wants-to-do kind of reason.

If apps have to, they say "I'm sorry Dave I can't do that - you can't
software-decode 4K video on your 300MHz ARM" - of course.

Havoc

2015-04-28 20:35:16

by David Lang

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, 28 Apr 2015, Havoc Pennington wrote:

> On Tue, Apr 28, 2015 at 1:19 PM, David Lang <[email protected]> wrote:
>> If the examples that are being used to show the performance advantage of
>> kdbus vs normal dbus are doing the wrong thing, then we need to get some
>> other examples available to people who don't live and breathe dbus that 'do
>> things right' so that the kernel developers can see what you think is the
>> real problem and how kdbus addresses it.
>>
>> So far, this 'wrong' example is the only thing that's been posted to show
>> the performance advantage of kdbus.
>
> I'm hopeful someone will do that.
>
> fwiw, I would be suspicious of a broken benchmark if it didn't show:
>
> * the bus daemon means an extra read/parse and marshal/write per
> message, so 4 vs. 2
> * the existence of the bus daemon therefore makes a message
> send/receive take roughly twice as long
>
> https://lwn.net/Articles/580194/ has a bit more elaboration about
> number of copies, validations, and context switches in each case.
>
> From what I can tell, the core performance claim for kdbus is that for
> a userspace daemon to be a routing intermediary, it has to receive and
> re-send messages. If the baseline performance of IPC is the cost to
> send once and receive once, adding the daemon means there's twice as
> much to do (1 more receive, 1 more send). However fast you make
> send/receive, the daemon always means there are twice as many
> send/receives as there would be with no daemon.

There are twice as many context switches; nobody disputes that. The question
is whether it matters.

It doesn't matter if the message router is in kernel space or user space, it
still needs to read/parse, marshal/write the data, so you aren't saving that
time due to it being in the kernel.

> If that isn't what a benchmark shows, then there's a mystery to
> explain... (one disruption to the ratio of course could be if the
> clients use a much faster or slower dbus lib than the daemon)
>
> As noted many times, of course this 2x penalty for the daemon was a
> conscious tradeoff - kdbus is trying to escape the tradeoff in order
> to extend usage of dbus to more use cases. Given the tradeoff,
> _existing_ uses of dbus seem to prefer the performance hit to the loss
> of useful semantics, but potential new users would like to or need to
> have both.

If there is a 2x performance improvement for being in the kernel, but a 100x
performance improvement from fixing the userspace code, the effort should be
spent on the userspace code, not on moving things to kernel space.

Remember the Tux in-kernel webserver? It showed performance improvements from
putting the HTTP daemon in the kernel, and a lot of the arguments about it sound
very similar (reduced context switches, etc.).

David Lang

2015-04-28 20:42:34

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 28, 2015 at 1:34 PM, David Lang <[email protected]> wrote:
> On Tue, 28 Apr 2015, Havoc Pennington wrote:
>
>> On Tue, Apr 28, 2015 at 1:19 PM, David Lang <[email protected]> wrote:
>>>
>>> If the examples that are being used to show the performance advantage of
>>> kdbus vs normal dbus are doing the wrong thing, then we need to get some
>>> other examples available to people who don't live and breathe dbus that
>>> 'do things right' so that the kernel developers can see what you think is the
>>> real problem and how kdbus addresses it.
>>>
>>> So far, this 'wrong' example is the only thing that's been posted to show
>>> the performance advantage of kdbus.
>>
>>
>> I'm hopeful someone will do that.
>>
>> fwiw, I would be suspicious of a broken benchmark if it didn't show:
>>
>> * the bus daemon means an extra read/parse and marshal/write per
>> message, so 4 vs. 2
>> * the existence of the bus daemon therefore makes a message
>> send/receive take roughly twice as long
>>
>> https://lwn.net/Articles/580194/ has a bit more elaboration about
>> number of copies, validations, and context switches in each case.
>>
>> From what I can tell, the core performance claim for kdbus is that for
>> a userspace daemon to be a routing intermediary, it has to receive and
>> re-send messages. If the baseline performance of IPC is the cost to
>> send once and receive once, adding the daemon means there's twice as
>> much to do (1 more receive, 1 more send). However fast you make
>> send/receive, the daemon always means there are twice as many
>> send/receives as there would be with no daemon.
>
>
> there are twice as many context switches, nobody disputes that, the question
> is if it matters.
>
> It doesn't matter if the message router is in kernel space or user space, it
> still needs to read/parse, marshal/write the data, so you aren't saving that
> time due to it being in the kernel.
>
>> If that isn't what a benchmark shows, then there's a mystery to
>> explain... (one disruption to the ratio of course could be if the
>> clients use a much faster or slower dbus lib than the daemon)
>>
>> As noted many times, of course this 2x penalty for the daemon was a
>> conscious tradeoff - kdbus is trying to escape the tradeoff in order
>> to extend usage of dbus to more use cases. Given the tradeoff,
>> _existing_ uses of dbus seem to prefer the performance hit to the loss
>> of useful semantics, but potential new users would like to or need to
>> have both.
>
>
> If there is a 2x performance improvement for being in the kernel, but a 100x
> performance improvement from fixing the userspace code, the effort should be
> spent on the userspace code, not on moving things to kernel space.

I would guess that, if we compared a highly optimized userspace
implementation to a kernel implementation, we'd see less than 2x
difference. After all, a userspace daemon doesn't really need to
unmarshal and re-marshal anything except headers. For large messages,
we could use splice and avoid a couple of copies, too.
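The point that a daemon only needs to look at headers can be sketched as a router that decodes a fixed-size header and forwards the original bytes untouched. The 8-byte (destination, body length) header below is a made-up format for illustration, not the D-Bus wire format, and `encode`/`route` are hypothetical names.

```python
import struct

# Hypothetical fixed header: little-endian (destination id, body length).
# NOT the real D-Bus wire format; it only illustrates header-only routing.
HEADER = struct.Struct("<II")


def encode(dest: int, body: bytes) -> bytes:
    """Client side: marshal once, prefixing the body with the header."""
    return HEADER.pack(dest, len(body)) + body


def route(frame: bytes, queues: dict[int, list[bytes]]) -> int:
    """Daemon side: decode only the header, then forward the frame as-is.
    The body is never parsed or re-marshalled."""
    if len(frame) < HEADER.size:
        raise ValueError("frame shorter than header")
    dest, length = HEADER.unpack_from(frame, 0)
    if len(frame) != HEADER.size + length:
        raise ValueError("body length does not match header")
    queues.setdefault(dest, []).append(frame)  # original bytes, unchanged
    return dest


queues: dict[int, list[bytes]] = {}
route(encode(7, b"large opaque payload"), queues)
print(sorted(queues))  # [7]
```

For large bodies, splice would go further by avoiding the userspace copy of the body altogether; that part is not shown here.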

If the scheduler became a bottleneck, it could be interesting to add
something like a send-and-poll primitive. I suspect that some
workloads currently do unnecessary context switches with only standard
POSIX primitives. If A sends a message to B, then there's a brief
window in which both A and B are runnable. Ideally we wouldn't
context switch until A calls poll or epoll_wait, but I don't know how
well that works in practice.

There's more room for generic improvements than just that. At LSF/MM
we were talking about more scalable epoll variants that would allow a
multithreaded daemon to be woken up on the core that received incoming
data. That would allow an efficient multi-queue dbus with fewer
migrations and IPIs.

At some point, I'd like to implement PCID on x86 (if no one beats me
to it, and this is a low priority for me), which will allow us to skip
expensive TLB flushes while context switching. I have no idea whether
ARM can do something similar.

--Andy

2015-04-28 20:43:30

by Linus Torvalds

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 28, 2015 at 12:19 PM, Havoc Pennington <[email protected]> wrote:
>
> From what I can tell, the core performance claim for kdbus is that for
> a userspace daemon to be a routing intermediary, it has to receive and
> re-send messages. If the baseline performance of IPC is the cost to
> send once and receive once, adding the daemon means there's twice as
> much to do (1 more receive, 1 more send). However fast you make
> send/receive, the daemon always means there are twice as many
> send/receives as there would be with no daemon.

HOWEVER.

That's only a good optimization strategy if the code is optimized to begin with.

If the code spends 10x as much time in user space in "overhead" as it
actually spends in the kernel, the proper place to optimize is to get
rid of the 10x. That will make things much faster.

Once user space is lean and mean, at that point do I believe that "ok,
let's add kernel code for the last bit of performance". But as it is
right now, anybody who works on kdbus and claims that _performance_ is
the reason for their work is just looking at the wrong piece of the
puzzle.

Now, there may be *other* reasons why kdbus is a good idea. But quite
frankly, every time somebody asks "why", performance seems to be one
of the main answers.

And quite frankly, that *stinks*.

Do proper optimizations of the actual real costs before starting to
work on kernel stuff. It's *stupid* to add a kernel driver to get 2x
improvement, when there's a 10x bloat in user space.
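The 10x-versus-2x argument is essentially Amdahl's law. A minimal sketch, assuming for illustration only that per-message time splits into 10 units of userspace overhead and 1 unit of kernel IPC work; the function name and numbers are hypothetical:

```python
def overall_speedup(user_time: float, kernel_time: float,
                    user_gain: float = 1.0, kernel_gain: float = 1.0) -> float:
    """Amdahl-style estimate: each component is sped up by its own factor."""
    before = user_time + kernel_time
    after = user_time / user_gain + kernel_time / kernel_gain
    return before / after


# 10 units of userspace overhead, 1 unit of kernel work (illustrative numbers).
print(round(overall_speedup(10, 1, kernel_gain=2), 3))  # 1.048: 2x on the kernel part
print(overall_speedup(10, 1, user_gain=10))             # 5.5: removing the 10x bloat
```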

Is that really so hard to see? I don't think it is at *all*
appropriate to say "we're a f*cking bloated pig, but we're too lazy to
fix the bloat and the primary performance problems, so we'll add a
kernel interface to partially hide the issue".

That is particularly true because if you fix the user-level
performance problems, you may notice that there was something stupid
in the interfaces, and some of the kernel interface design was wrong.

Linus

2015-04-28 23:13:09

by John Stoffel

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

>>>>> "Havoc" == Havoc Pennington <[email protected]> writes:

Havoc> On Tue, Apr 28, 2015 at 1:18 PM, Theodore Ts'o <[email protected]> wrote:
>> So the question is if one of the justifications for moving the daemon
>> into kernel space is that its performance is crap, then I think it is
>> useful to determine whether a fully optimized userspace daemon would
>> be good enough.
>>

Havoc> Yeah. I don't know how you answer that, because the answer is
Havoc> probably "it would be good enough for some things and not for
Havoc> other things." It depends on whether an app is sending enough
Havoc> data to be too slow, and it depends on the hardware, right.

So what happens if we put kdbus into the kernel and it's still too
slow? What then?

Havoc> What I think we might know: the userspace:kernel time-to-send
Havoc> ratio should always be around 2:1, if both of them are
Havoc> similarly-implemented, because the userspace version has about
Havoc> 2x the work to do.

I'm not sure I agree with this statement, just putting something into
the kernel doesn't magically make the work go away, and the overhead
people are talking about won't change if applications and libraries
keep opening/closing the connection to the bus all the time.

Havoc> The actual wall-clock time of course depends on the hardware
Havoc> and what's being sent.

Havoc> If there was a deviation from 2:1 in a benchmark, it might be
Havoc> because of implementation issues - so for example
Havoc> libdbus+dbus-daemon might be 3:1 or 5:1 to sd-dbus+kdbus,
Havoc> because sd-dbus isn't as bloated as libdbus, say. That isn't
Havoc> telling you anything about kernel vs. userspace architecture,
Havoc> the extra ratio above 2:1 is only telling you about userspace
Havoc> implementation quality.

Which is also telling you that maybe userspace could be improved more,
before it needs to even think about going into the kernel?

Havoc> For purposes of deciding what to put in kernel - the
Havoc> differences between dbus client implementations (sd-dbus,
Havoc> libdbus, gdbus, etc.) seem like irrelevant noise to me.

Havoc> Re: the slippery slope to LDAP in the kernel - my questions
Havoc> would be things like 1) what are non-performance reasons to
Havoc> have dbus in the kernel, such as early boot or security
Havoc> considerations; 2) does LDAP in kernel give these kind of 2:1
Havoc> gains; 3) is there a simpler way to get the 2:1 gain for
Havoc> dbus...

Havoc> Others can answer those better than I can.

Havoc> I _would_ say that dbus is more "generic" than something like
Havoc> LDAP; dbus is specific to the use-case of coordinating
Havoc> processes on a single machine, but it isn't specific to any
Havoc> particular application, and it's been used for lots of
Havoc> different applications. On my laptop, which is a pretty normal
Havoc> Fedora 21 as far as I know:

LDAP is pretty damn generic, in that you can put pretty large objects
into it, and pretty large OUs, etc. So why would it be a candidate
for going into the kernel? And why is kdbus so important in the
kernel as well? People have talked about it needing to be there for
bootup, but isn't that why we ripped out RAID detection and such from
the kernel and built initramfs, so that there's LESS in the kernel,
and more in an early userspace? Same idea with dbus in my opinion.

Havoc> $ rpm -q --whatrequires 'libdbus-1.so.3()(64bit)' | wc -l
Havoc> 113

Havoc> this omits anyone using a different binding, it's only libdbus users.

>> I find dbus to be extremely hard to debug when my desktop starts doing
>> things I don't want it to do. The fact that it might be flinging around hundreds
>> of thousands of messages, and that this is something we want to encourage,

Havoc> This particular argument doesn't resonate with me ... if dbus
Havoc> is hard to debug, it's not as if "ad hoc application-specific
Havoc> sidechannel somebody cooked up" is going to be easier.

When Ted is saying it's hard to debug... then maybe it's a bit crappy
in design or implementation?

Havoc> People aren't usually making up data to send around just because they
Havoc> can. If they need to send an audio stream, and dbus is too slow,
Havoc> they'll send it another ad hoc way, but it ultimately has to get sent.
Havoc> Same for most data, it is the size it is and it needs to go where it
Havoc> needs to go, for some what-the-user-wants-to-do kind of reason.

Havoc> If apps have to, they say "I'm sorry Dave I can't do that - you
Havoc> can't software-decode 4K video on your 300MHz ARM" - of course.

So why DOES audio need to go via DBUS? What about video? Why
shouldn't that go via dbus as well?

If one userspace implementation is so crappy, why can't that
implementation be tossed and a better one done? Or why can't they
just optimize/tune it in userspace instead?

John

2015-04-29 00:45:49

by Havoc Pennington

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Tue, Apr 28, 2015 at 7:12 PM, John Stoffel <[email protected]> wrote:
> Havoc> Yeah. I don't know how you answer that, because the answer is
> Havoc> probably "it would be good enough for some things and not for
> Havoc> other things." It depends on whether an app is sending enough
> Havoc> data to be too slow, and it depends on the hardware, right.
>
> So what happens if we put kdbus into the kernel and it's still too
> slow? What then?

What my above paragraph was intended to mean is: I don't understand
what it means to ask about a "too slow" fixed line here. Every time
you make it substantively faster, it works for more apps or on slower
hardware, presumably. You dial the speed, and you include or exclude
certain app ideas accordingly.

I think dbus works for lots of purposes now, despite being slow. Lots
of people are using it. In many uses, super-slow-dbus might be 1% of
the profile of whatever the user-visible functionality is, and nobody
cares how fast dbus is. In other uses, they might.

Some people are saying they would use it in more ways if it were
faster and/or available in early boot and/or whatever else. I'm not
those people, because right now I'm not working on dbus or anything
using dbus. They would have to say what's 'fast enough' for them.

"What happens if unix sockets are too slow? what then?" - it's not a
coherent question. It's always relative to what you're trying to do,
surely.

> I'm not sure I agree with this statement, just putting something into
> the kernel doesn't magically make the work go away

The kdbus guys should really explain this. I have my understanding of
it but theirs will be more accurate.

> Which is also telling you that maybe userspace could be improved more,
> before it needs to even think about going into the kernel?

I imagine people have already improved the part of userspace they are
thinking of keeping (sd-dbus, replacing libdbus) and they don't want
to rewrite dbus-daemon only to immediately discard it. (The part of
the "dbus" overall system which hasn't been rewritten and optimized is
the daemon, which could be dropped completely in kdbus-world.)

It's not especially mysterious what's slow about the existing daemon
implementation, in my opinion; it's been the same for 10 years. The
rough outline of speeding it up would be to replace libdbus with
something sd-dbus-like, and then do a round of profiling and tuning.
The 2012 email I linked to earlier had some other ideas. But this is
a lot of work, it isn't "just" port to sd-dbus, the daemon is strongly
entangled with libdbus right now. I don't blame people for being
unmotivated on this if they believe it's a dead end.

In that same 2012 email you'll notice I advised doing exactly what
Linus suggests; do the userspace tuning rather than quote "arguing
with kernel developers":
http://lists.freedesktop.org/archives/dbus/2012-March/015024.html

But I do admire that people felt kdbus was the right answer so have
gone for it anyway, and I do think Linux as a complete OS
(kernel+userspace) deserves a great answer in this problem space.

> LDAP is pretty damn generic, in that you can put pretty large objects
> into it, and pretty large OUs, etc. So why would it be a candidate
> for going into the kernel? And why is kdbus so important in the
> kernel as well? People have talked about it needing to be there for
> bootup, but isn't that why we ripped out RAID detection and such from
> the kernel and built initramfs, so that there's LESS in the kernel,
> and more in an early userspace? Same idea with dbus in my opinion.
>

I don't have a well-developed philosophy on what should be in the
kernel or not. That is something the kernel maintainers have to
answer. My main concern here is that people understand what dbus is
about historically, so they don't do silly stuff - whether cargo cult
keeping a 'feature' that was always a bad idea, or speeding it up by
breaking intentional and important semantics, or whatever.

When I see people saying they don't understand what dbus is because
they have no idea how a Linux workstation userspace is put together,
that's something I can help with.

When I see people saying maybe it isn't worth the complexity to put
this in the kernel if it's only an N% speedup, I can see that, I'm not
going to say that's wrong or right. It depends to me on what apps are
enabled by the N%, or whether early boot and other factors are
important.

> When Ted is saying it's hard to debug... then maybe it's a bit crappy
> in design or implementation?

Or maybe he just doesn't know how to debug it, honestly. I find the
kernel hard to debug because I know very little about it. I find the
desktop simple to debug, at least as simple as debugging millions of
lines of code can be. The difference is that I have never done kernel
debugging and I'm already familiar with how the desktop works.

dbus has tools that log every message and let you explore and
introspect everything on it, etc. - it works for me.

> So why DOES audio need to go via DBUS? What about video? Why
> shouldn't that go via dbus as well?
>
> If one userspace implementation is so crappy, why can't that
> implementation be tossed and a better one done? Or why can't they
> just optimize/tune it in userspace instead?

In this email I listed what I could remember app developers bringing
up when told to use a sidechannel instead of dbus:
http://article.gmane.org/gmane.linux.kernel/1935002

I can't speak to what makes sense for audio or video, but I'm sure
people who work on those things could.

Re: why can't it be done in userspace, the only thing I'd repeat again
here is that when people mention ways to speed up the bus daemon in
userspace, they often sound like they would abandon one or more of the
semantic guarantees of dbus (usually ordering, sometimes things like
the guaranteed-correct sender information or whatever). And _maybe_
some of those guarantees are worth abandoning, but I'd be very careful
with it.

Havoc

2015-04-29 11:34:02

by Harald Hoyer

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 29.04.2015 01:12, John Stoffel wrote:
>>>>>> "Havoc" == Havoc Pennington <[email protected]> writes:
>
> Havoc> On Tue, Apr 28, 2015 at 1:18 PM, Theodore Ts'o <[email protected]> wrote:
>>> I find dbus to be extremely hard to debug when my desktop starts doing
>>> things I don't want it to do. The fact that it might be flinging around hundreds
>>> of thousands of messages, and that this is something we want to encourage,
>
> Havoc> This particular argument doesn't resonate with me ... if dbus
> Havoc> is hard to debug, it's not as if "ad hoc application-specific
> Havoc> sidechannel somebody cooked up" is going to be easier.
>
> When Ted is saying it's hard to debug... then maybe it's a bit crappy
> in design or implementation?

There is a very nice tool to debug the traffic for kdbus.

http://lists.freedesktop.org/archives/dbus/2014-March/016178.html

Also, the patched Wireshark makes it as easy as analyzing network traffic.

2015-04-29 12:48:41

by Harald Hoyer

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 29.04.2015 01:12, John Stoffel wrote:
> LDAP is pretty damn generic, in that you can put pretty large objects into
> it, and pretty large OUs, etc. So why would it be a candidate for going
> into the kernel? And why is kdbus so important in the kernel as well?
> People have talked about it needing to be there for bootup, but isn't that
> why we ripped out RAID detection and such from the kernel and built
> initramfs, so that there's LESS in the kernel, and more in an early
> userspace? Same idea with dbus in my opinion.

Let me elaborate on the initramfs/shutdown situation a little bit more,
because I have to deal with that every day.

Because of the "let's move everything to userspace" sentiment, we nowadays
have a situation where we need a lot of tools to set up the root device.

Be it LVM on IMSM or iSCSI multipath, the initramfs has to set up the network
(with bridging, bonding, etc.), establish the iSCSI connection, assemble the
RAID and the LVM, open crypto devices, etc.
And if something goes wrong, you want to have a shell, see all the logs and
debug things.

Over time, we moved away from simple shell scripts (without any logging)
and statically compiled special versions for the initramfs toward a mini
distribution in the initramfs, which simplifies maintenance and improves
reliability.

Basically you want to use the same tools in the initramfs (and shutdown)
which you already have and use in your real root, with the same configuration
files and the same interfaces and the same code paths.

Therefore, systemd is started in the dracut-created initramfs, and it starts
journald for logging. The same basic systemd targets exist in the initramfs
as on the real root, so normally you don't have to cope with specialized
versions for the initramfs.

The target here is to have the same IPC mechanism from the very beginning to
the very end. No crappy fallback mechanisms in case a daemon is not running
or has crashed, no creepy transition from initramfs root to real root to
shutdown root.

We already have such transitions for systemd, journald, mdmon [1], etc.:
systemd has to serialize itself, journald's file descriptors are transitioned
over, and mdmon jumps through hoops. Remember that you want to get rid of open
files and executables, and you have to re-exec everything when you transition
from the initramfs root to the real root, and also from the real root to the
shutdown root.

We really don't want the IPC mechanism to be in flux; all tools would have
to fall back to a non-standard mechanism in that case.

If I have to pull in a dbus daemon in the initramfs, we still have the
chicken-and-egg problem of PID 1 talking to the logging daemon and starting
dbus.
systemd cannot talk to journald via dbus unless dbus-daemon is started, and dbus
cannot log anything on startup if journald is not running, etc.

dbus-daemon would have to transition to the real root, and from the real root
to the shutdown root, without losing state.

Of course this can all be done, but it would involve fallback mechanisms,
which we want to get rid of. Hopefully, you are not suggesting merging dbus
with PID 1. Also, with a daemon, you would lose the points mentioned in the
cover mail:

* Security: The peers which communicate do not have to trust each
other, as the only trustworthy component in the game is the kernel
which adds metadata and ensures that all data passed as payload is
either copied or sealed, so that the receiver can parse the data
without having to protect against changing memory while parsing
buffers. Also, all the data transfer is controlled by the kernel,
so that LSMs can track and control what is going on, without
involving userspace. Because of the LSM issue, security people are
much happier with this model than the current scheme of having to
hook into dbus to mediate things.

* Being in the kernel closes a lot of races which can't be fixed with
the current userspace solutions. For example, with kdbus, there is a
way a client can disconnect from a bus, but do so only if no further
messages are present in its queue, which is crucial for implementing
race-free "exit-on-idle" services.

* Eavesdropping at the kernel level, so privileged users can hook into
the message stream without hacking support for that into their
userspace processes.

* A number of smaller benefits: for example kdbus learned a way to peek
full messages without dequeuing them, which is really useful for
logging metadata when handling bus-activation requests.

I don't care if the kdbus speedup is only marginal.

In my ideal world, there is a standard IPC mechanism from the beginning to
the end, which does not rely on any process running (except the kernel) and
which is used by _all_ tools, be it a system daemon providing information and
interfaces about device assembly or network setup tools or end user desktop
processes.

dbus _is_ such an easy, flexible standard IPC mechanism. Of course, you can
reinvent the wheel (NIH, "we know better") and wait and see if that
works out. Until then, the whole common IPC problem remains unresolved, and Linux
distributions are just a collection of random software with no common
interoperability and home-grown interfaces.

[1] Transitioning mdmon is one of the critical parts of supporting an IMSM RAID array.
Also, running an executable from a disk that the executable itself is monitoring,
and that stops functioning if the executable is not responding, is insane.

Thanks for reading...


2015-04-29 13:33:31

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 29, 2015 at 2:47 PM, Harald Hoyer wrote:
> On 29.04.2015 01:12, John Stoffel wrote:
>> LDAP is pretty damn generic, in that you can put pretty large objects into
>> it, and pretty large OUs, etc. So why would it be a candidate for going
>> into the kernel? And why is kdbus so important in the kernel as well?
>> People have talked about it needing to be there for bootup, but isn't that
>> why we ripped out RAID detection and such from the kernel and built
>> initramfs, so that there's LESS in the kernel, and more in an early
>> userspace? Same idea with dbus in my opinion.
>
> Let me elaborate on the initramfs/shutdown situation a little bit more,
> because I have to deal with that every day.
>
> Because of the "let's move everything to userspace" sentiment, we nowadays
> have the situation that we need a lot of tools to set up the root device.
>
> Be it LVM on IMSM or iSCSI multipath, the initramfs has to set up the network
> (with bridging, bonding, etc.), the iSCSI connection, assemble the raid, the
> LVM, open crypto devices, etc...
> And if something goes wrong, you want to have a shell, see all the logs and
> debug things.

None of these tools depend on dbus (_desktop_ bus).

> Over time we moved away from simple shell scripts (without any
> logging) and statically compiled special versions for the initramfs to a mini
> distribution in the initramfs, which simplifies maintenance and improves
> reliability.
>
> Basically you want to use the same tools in the initramfs (and shutdown)
> which you already have and use in your real root, with the same configuration
> files and the same interfaces and the same code paths.
>
> Therefore systemd is started in the dracut-created initramfs, which starts
> journald for logging. The same basic systemd targets exist in the initramfs
> as on the real root, so normally you don't have to cope with specialized
> versions for the initramfs.
>
> The target here is to have the same IPC mechanism from the very beginning to
> the very end. No crappy fallback mechanisms in case a daemon is not running
> or has crashed, no creepy transition from initramfs root to real root to
> shutdown root.
>
> We already have such transitions like: systemd, journald, mdmon [1], etc.
> systemd has to serialize itself, journald's file descriptors are transitioned
> over, and mdmon jumps through hoops. Remember that you want to get rid of open
> files and executables and have to re-exec everything when you transition from
> the initramfs root to the real root, and also from the real root to the shutdown
> root.
>
> We really don't want the IPC mechanism to be in flux. All tools would have
> to fall back to a non-standard mechanism in that case.
>
> If I have to pull a dbus daemon into the initramfs, we still have the
> chicken-and-egg problem of PID 1 talking to the logging daemon and starting
> dbus.
> systemd cannot talk to journald via dbus unless dbus-daemon is started; dbus
> cannot log anything on startup if journald is not running; etc.

The only reason why you need dbus in your initramfs is because of systemd
and its tools, isn't it?

> In my ideal world, there is a standard IPC mechanism from the beginning to
> the end, which does not rely on any process running (except the kernel) and
> which is used by _all_ tools, be it a system daemon providing information and
> interfaces about device assembly or network setup tools or end user desktop
> processes.

It depends how you define "beginning". To me an initramfs is a *very* minimal
tool to prepare the rootfs and nothing more (no udev, no systemd, no
"mini distro").
If the initramfs fails to do its job, it can print to the console like
the kernel does if it fails at a very early stage.

--
Thanks,
//richard

2015-04-29 13:36:57

by Stephen Smalley

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 04/29/2015 08:47 AM, Harald Hoyer wrote:
> On 29.04.2015 01:12, John Stoffel wrote:
>> LDAP is pretty damn generic, in that you can put pretty large objects into
>> it, and pretty large OUs, etc. So why would it be a candidate for going
>> into the kernel? And why is kdbus so important in the kernel as well?
>> People have talked about it needing to be there for bootup, but isn't that
>> why we ripped out RAID detection and such from the kernel and built
>> initramfs, so that there's LESS in the kernel, and more in an early
>> userspace? Same idea with dbus in my opinion.
>
> Let me elaborate on the initramfs/shutdown situation a little bit more,
> because I have to deal with that every day.
>
> Because of the "let's move everything to userspace" sentiment, we nowadays
> have the situation that we need a lot of tools to set up the root device.
>
> Be it LVM on IMSM or iSCSI multipath, the initramfs has to set up the network
> (with bridging, bonding, etc.), the iSCSI connection, assemble the raid, the
> LVM, open crypto devices, etc...
> And if something goes wrong, you want to have a shell, see all the logs and
> debug things.
>
> Over time we moved away from simple shell scripts (without any
> logging) and statically compiled special versions for the initramfs to a mini
> distribution in the initramfs, which simplifies maintenance and improves
> reliability.
>
> Basically you want to use the same tools in the initramfs (and shutdown)
> which you already have and use in your real root, with the same configuration
> files and the same interfaces and the same code paths.
>
> Therefore systemd is started in the dracut-created initramfs, which starts
> journald for logging. The same basic systemd targets exist in the initramfs
> as on the real root, so normally you don't have to cope with specialized
> versions for the initramfs.
>
> The target here is to have the same IPC mechanism from the very beginning to
> the very end. No crappy fallback mechanisms in case a daemon is not running
> or has crashed, no creepy transition from initramfs root to real root to
> shutdown root.
>
> We already have such transitions like: systemd, journald, mdmon [1], etc.
> systemd has to serialize itself, journald's file descriptors are transitioned
> over, and mdmon jumps through hoops. Remember that you want to get rid of open
> files and executables and have to re-exec everything when you transition from
> the initramfs root to the real root, and also from the real root to the shutdown
> root.
>
> We really don't want the IPC mechanism to be in flux. All tools would have
> to fall back to a non-standard mechanism in that case.
>
> If I have to pull a dbus daemon into the initramfs, we still have the
> chicken-and-egg problem of PID 1 talking to the logging daemon and starting
> dbus.
> systemd cannot talk to journald via dbus unless dbus-daemon is started; dbus
> cannot log anything on startup if journald is not running; etc.
>
> dbus-daemon would have to transition to the real root, and from the real root
> to the shutdown root, without losing state.
>
> Of course this can all be done, but it would involve fallback mechanisms,
> which we want to get rid of. Hopefully you are not suggesting merging dbus into
> PID 1. Also, with a daemon you would lose the points mentioned in the cover mail:
>
> * Security: The peers which communicate do not have to trust each
> other, as the only trustworthy component in the game is the kernel
> which adds metadata and ensures that all data passed as payload is
> either copied or sealed, so that the receiver can parse the data
> without having to protect against changing memory while parsing
> buffers. Also, all the data transfer is controlled by the kernel,
> so that LSMs can track and control what is going on, without
> involving userspace. Because of the LSM issue, security people are
> much happier with this model than the current scheme of having to
> hook into dbus to mediate things.

I just want to caution that this justification for kdbus is not fully
realized in the current implementation. As it currently stands, there
are no LSM hook calls in the kdbus tree beyond metadata collection of
security labels. I know there have been experimental proof-of-concept
patches floating around for LSM hooks for kdbus but they aren't merged
as of yet, nor are there any real implementations of the hooks for the
security modules. If kdbus is merged, we need to make sure that the LSM
support is integrated before it gets enabled in any distros or we'll
have a completely unmediated IPC channel, i.e. a bypass of any security
policy restrictions on IPC.

It is also interesting that kdbus allows impersonation of any
credential, including security label, by "privileged" clients, where
privileged simply means it either has CAP_IPC_OWNER or owns (euid
matches uid) the bus. That seems wrong at least for security labels;
either we should not support impersonation of security labels at all or
at least it should be controlled by the security module based on its own
logic and attributes, not based on CAP_IPC_OWNER or a uid match.

Has anyone even reviewed the privilege and UID-based model of kdbus for
its implications with respect to discretionary access control?

> * Being in the kernel closes a lot of races which can't be fixed with
> the current userspace solutions. For example, with kdbus, there is a
> way a client can disconnect from a bus, but do so only if no further
> messages are present in its queue, which is crucial for implementing
> race-free "exit-on-idle" services.
>
> * Eavesdropping at the kernel level, so privileged users can hook into
> the message stream without hacking support for that into their
> userspace processes.

This one worried me a bit, particularly the statement that such
eavesdropping is unobservable by any other participant on the bus.
It seems a bit prone to abuse, particularly since it can be done by any
privileged client, not merely the process that originally created the bus?

2015-04-29 13:39:02

by Harald Hoyer

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 29.04.2015 15:33, Richard Weinberger wrote:
> It depends how you define "beginning". To me an initramfs is a *very* minimal
> tool to prepare the rootfs and nothing more (no udev, no systemd, no
> "mini distro").
> If the initramfs fails to do its job it can print to the console like
> the kernel does if it fails
> at a very early stage.
>

Your solution might work for your small personal needs, but not for our customers.

2015-04-29 13:46:51

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 29.04.2015 at 15:38, Harald Hoyer wrote:
> On 29.04.2015 15:33, Richard Weinberger wrote:
>> It depends how you define "beginning". To me an initramfs is a *very* minimal
>> tool to prepare the rootfs and nothing more (no udev, no systemd, no
>> "mini distro").
>> If the initramfs fails to do its job it can print to the console like
>> the kernel does if it fails
>> at a very early stage.
>>
>
> Your solution might work for your small personal needs, but not for our customers.

Correct, I don't know your customers, all I know are my customers. :-)

What feature do your customers need?
I mean, I fully agree with you that an initramfs must not fail silently
but how does dbus help there? If it fails to mount the rootfs there is not
much it can do.

Thanks,
//richard

2015-04-29 14:01:35

by Harald Hoyer

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 29.04.2015 15:46, Richard Weinberger wrote:
> On 29.04.2015 at 15:38, Harald Hoyer wrote:
>> On 29.04.2015 15:33, Richard Weinberger wrote:
>>> It depends how you define "beginning". To me an initramfs is a *very* minimal
>>> tool to prepare the rootfs and nothing more (no udev, no systemd, no
>>> "mini distro").
>>> If the initramfs fails to do its job it can print to the console like
>>> the kernel does if it fails
>>> at a very early stage.
>>>
>>
>> Your solution might work for your small personal needs, but not for our customers.
>
> Correct, I don't know your customers, all I know are my customers. :-)
>
> What feature do your customers need?
> I mean, I fully agree with you that an initramfs must not fail silently
> but how does dbus help there? If it fails to mount the rootfs there is not
> much it can do.
>
> Thanks,
> //richard
>

We don't handcraft the initramfs script for every one of our customers, so we
have to generically support hotplug, persistent device names, persistent
interface names, network connectivity in the initramfs, user input handling for
passwords, fonts, keyboard layouts, fips, fsck, repair tools for file systems,
raid assembly, LVM assembly, multipath, crypto devices, live images, iSCSI,
FCoE, all kinds of filesystems with their quirks, IBM z-series support, resume
from hibernation, […]

And no, this is not a simple minimal tool.

2015-04-29 14:04:14

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 29.04.2015 at 16:01, Harald Hoyer wrote:
> On 29.04.2015 15:46, Richard Weinberger wrote:
>> On 29.04.2015 at 15:38, Harald Hoyer wrote:
>>> On 29.04.2015 15:33, Richard Weinberger wrote:
>>>> It depends how you define "beginning". To me an initramfs is a *very* minimal
>>>> tool to prepare the rootfs and nothing more (no udev, no systemd, no
>>>> "mini distro").
>>>> If the initramfs fails to do its job it can print to the console like
>>>> the kernel does if it fails
>>>> at a very early stage.
>>>>
>>>
>>> Your solution might work for your small personal needs, but not for our customers.
>>
>> Correct, I don't know your customers, all I know are my customers. :-)
>>
>> What feature do your customers need?
>> I mean, I fully agree with you that an initramfs must not fail silently
>> but how does dbus help there? If it fails to mount the rootfs there is not
>> much it can do.
>>
>> Thanks,
>> //richard
>>
>
> We don't handcraft the initramfs script for every one of our customers, so we
> have to generically support hotplug, persistent device names, persistent
> interface names, network connectivity in the initramfs, user input handling for
> passwords, fonts, keyboard layouts, fips, fsck, repair tools for file systems,
> raid assembly, LVM assembly, multipath, crypto devices, live images, iSCSI,
> FCoE, all kinds of filesystems with their quirks, IBM z-series support, resume
> from hibernation, […]

This is correct. But which of these tools/features depend on dbus?

Thanks,
//richard

P.S. Please don't drop the CC list.

2015-04-29 14:11:23

by Harald Hoyer

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 29.04.2015 16:04, Richard Weinberger wrote:
> On 29.04.2015 at 16:01, Harald Hoyer wrote:
>> On 29.04.2015 15:46, Richard Weinberger wrote:
>>> On 29.04.2015 at 15:38, Harald Hoyer wrote:
>>>> On 29.04.2015 15:33, Richard Weinberger wrote:
>>>>> It depends how you define "beginning". To me an initramfs is a *very* minimal
>>>>> tool to prepare the rootfs and nothing more (no udev, no systemd, no
>>>>> "mini distro").
>>>>> If the initramfs fails to do its job it can print to the console like
>>>>> the kernel does if it fails
>>>>> at a very early stage.
>>>>>
>>>>
>>>> Your solution might work for your small personal needs, but not for our customers.
>>>
>>> Correct, I don't know your customers, all I know are my customers. :-)
>>>
>>> What feature do your customers need?
>>> I mean, I fully agree with you that an initramfs must not fail silently
>>> but how does dbus help there? If it fails to mount the rootfs there is not
>>> much it can do.
>>>
>>> Thanks,
>>> //richard
>>>
>>
>> We don't handcraft the initramfs script for every one of our customers, so we
>> have to generically support hotplug, persistent device names, persistent
>> interface names, network connectivity in the initramfs, user input handling for
>> passwords, fonts, keyboard layouts, fips, fsck, repair tools for file systems,
>> raid assembly, LVM assembly, multipath, crypto devices, live images, iSCSI,
>> FCoE, all kinds of filesystems with their quirks, IBM z-series support, resume
>> from hibernation, […]
>
> This is correct. But which of these tools/features depend on dbus?

I would love to add dbus support to all of them and use it, so I can connect
them all more easily. No need for them to invent their own version of IPC,
which can only be used by their own tool set.

2015-04-29 14:18:16

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 29.04.2015 at 16:11, Harald Hoyer wrote:
>>> We don't handcraft the initramfs script for every one of our customers, so we
>>> have to generically support hotplug, persistent device names, persistent
>>> interface names, network connectivity in the initramfs, user input handling for
>>> passwords, fonts, keyboard layouts, fips, fsck, repair tools for file systems,
>>> raid assembly, LVM assembly, multipath, crypto devices, live images, iSCSI,
>>> FCoE, all kinds of filesystems with their quirks, IBM z-series support, resume
>>> from hibernation, […]
>>
>> This is correct. But which of these tools/features depend on dbus?
>
> I would love to add dbus support to all of them and use it, so I can connect
> them all more easily. No need for them to invent their own version of IPC,
> which can only be used by their own tool set.

Why/how do you need to connect them?
Sorry for being persistent but as I use most of these tools too (also in initramfs)
I'm very curious.

Many of us grumpy kernel devs simply don't know all the use cases you have to cover.
So, please explain. :-)

Thanks,
//richard

2015-04-29 14:46:59

by Austin S Hemmelgarn

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 2015-04-29 10:11, Harald Hoyer wrote:
> On 29.04.2015 16:04, Richard Weinberger wrote:
>> On 29.04.2015 at 16:01, Harald Hoyer wrote:
>>> On 29.04.2015 15:46, Richard Weinberger wrote:
>>>> On 29.04.2015 at 15:38, Harald Hoyer wrote:
>>>>> On 29.04.2015 15:33, Richard Weinberger wrote:
>>>>>> It depends how you define "beginning". To me an initramfs is a *very* minimal
>>>>>> tool to prepare the rootfs and nothing more (no udev, no systemd, no
>>>>>> "mini distro").
>>>>>> If the initramfs fails to do its job it can print to the console like
>>>>>> the kernel does if it fails
>>>>>> at a very early stage.
>>>>>>
>>>>>
>>>>> Your solution might work for your small personal needs, but not for our customers.
>>>>
>>>> Correct, I don't know your customers, all I know are my customers. :-)
>>>>
>>>> What feature do your customers need?
>>>> I mean, I fully agree with you that an initramfs must not fail silently
>>>> but how does dbus help there? If it fails to mount the rootfs there is not
>>>> much it can do.
>>>>
>>>> Thanks,
>>>> //richard
>>>>
>>>
>>> We don't handcraft the initramfs script for every one of our customers, so we
>>> have to generically support hotplug, persistent device names, persistent
>>> interface names, network connectivity in the initramfs, user input handling for
>>> passwords, fonts, keyboard layouts, fips, fsck, repair tools for file systems,
>>> raid assembly, LVM assembly, multipath, crypto devices, live images, iSCSI,
>>> FCoE, all kinds of filesystems with their quirks, IBM z-series support, resume
>>> from hibernation, […]
>>
>> This is correct. But which of these tools/features depend on dbus?
>
> I would love to add dbus support to all of them and use it, so I can connect
> them all more easily. No need for them to invent their own version of IPC,
> which can only be used by their own tool set.
>
Resume is built into the kernel, so no need for IPC there. Keymaps,
fonts, and fsck need no IPC either. FIPS related stuff should need no
IPC. Anything to do with the Device Mapper and hotplug should just need
uevents. While I can kind of see you wanting to have lvmetad in the
initramfs for use with LVM, I've seen all kinds of reports of issues
caused by that. I can also kind of understand wanting some kind of
unified IPC for the netboot-related stuff, but DBus is still serious
overkill for any of that IMHO. As things stand currently, the few
things in that list that I know actually use IPC for anything get by
just fine (and much faster) using just Unix domain sockets (UDS).




2015-04-29 14:51:43

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 29.04.2015 at 16:46, Austin S Hemmelgarn wrote:
> On 2015-04-29 10:11, Harald Hoyer wrote:
>> On 29.04.2015 16:04, Richard Weinberger wrote:
>>> On 29.04.2015 at 16:01, Harald Hoyer wrote:
>>>> On 29.04.2015 15:46, Richard Weinberger wrote:
>>>>> On 29.04.2015 at 15:38, Harald Hoyer wrote:
>>>>>> On 29.04.2015 15:33, Richard Weinberger wrote:
>>>>>>> It depends how you define "beginning". To me an initramfs is a *very* minimal
>>>>>>> tool to prepare the rootfs and nothing more (no udev, no systemd, no
>>>>>>> "mini distro").
>>>>>>> If the initramfs fails to do its job it can print to the console like
>>>>>>> the kernel does if it fails
>>>>>>> at a very early stage.
>>>>>>>
>>>>>>
>>>>>> Your solution might work for your small personal needs, but not for our customers.
>>>>>
>>>>> Correct, I don't know your customers, all I know are my customers. :-)
>>>>>
>>>>> What feature do your customers need?
>>>>> I mean, I fully agree with you that an initramfs must not fail silently
>>>>> but how does dbus help there? If it fails to mount the rootfs there is not
>>>>> much it can do.
>>>>>
>>>>> Thanks,
>>>>> //richard
>>>>>
>>>>
>>>> We don't handcraft the initramfs script for every one of our customers, so we
>>>> have to generically support hotplug, persistent device names, persistent
>>>> interface names, network connectivity in the initramfs, user input handling for
>>>> passwords, fonts, keyboard layouts, fips, fsck, repair tools for file systems,
>>>> raid assembly, LVM assembly, multipath, crypto devices, live images, iSCSI,
>>>> FCoE, all kinds of filesystems with their quirks, IBM z-series support, resume
>>>> from hibernation, […]
>>>
>>> This is correct. But which of these tools/features depend on dbus?
>>
>> I would love to add dbus support to all of them and use it, so I can connect
>> them all more easily. No need for them to invent their own version of IPC,
>> which can only be used by their own tool set.
>>
> Resume is built into the kernel, so no need for IPC there. Keymaps, fonts, and fsck need no IPC either. FIPS related stuff should need no IPC. Anything to do with the Device
> Mapper and hotplug should just need uevents. While I can kind of see you wanting to have lvmetad in the initramfs for use with LVM, I've seen all kinds of reports of issues caused
> by that. I can also kind of understand wanting some kind of unified IPC for the netboot-related stuff, but DBus is still serious overkill for any of that IMHO. As things stand
> currently, the few things in that list that I know actually use IPC for anything get by just fine (and much faster) using just Unix domain sockets (UDS).

Even if dbus is really needed, you can embed it into the initramfs.
What you need is a good bootstrap procedure between dbus/systemd within the initramfs and the real rootfs.

Thanks,
//richard

2015-04-29 14:54:01

by Harald Hoyer

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 29.04.2015 16:18, Richard Weinberger wrote:
> On 29.04.2015 at 16:11, Harald Hoyer wrote:
>>>> We don't handcraft the initramfs script for every one of our customers, so we
>>>> have to generically support hotplug, persistent device names, persistent
>>>> interface names, network connectivity in the initramfs, user input handling for
>>>> passwords, fonts, keyboard layouts, fips, fsck, repair tools for file systems,
>>>> raid assembly, LVM assembly, multipath, crypto devices, live images, iSCSI,
>>>> FCoE, all kinds of filesystems with their quirks, IBM z-series support, resume
>>>> from hibernation, […]
>>>
>>> This is correct. But which of these tools/features depend on dbus?
>>
>> I would love to add dbus support to all of them and use it, so I can connect
>> them all more easily. No need for them to invent their own version of IPC,
>> which can only be used by their own tool set.
>
> Why/how do you need to connect them?
> Sorry for being persistent but as I use most of these tools too (also in initramfs)
> I'm very curious.
>
> Many of us grumpy kernel devs simply don't know all the use cases you have to cover.
> So, please explain. :-)
>

Well, using shell scripts I connected all of these tools in the earlier
versions of dracut [1]. Been there, done that.

When using bash to wait for an interface to come up [2] or to do DHCP [3], the
pain threshold (at least mine) is reached, and you want something more sophisticated.

So, one starts eyeing NetworkManager or systemd-networkd. Both of them have CLI
tools and helpers, and these tools and helpers talk to each other with (guess
what?) an IPC mechanism, which happens to be DBUS (because it's the IPC of
choice, if you don't want to reinvent the wheel).

But let's not limit that to networking alone. Parsing the output of tools with
shell scripts is horrible, slow, fragile, and error-prone.

Sure, I can write one binary to rule them all, pull out all the code from all
tools I need, but for me an IPC mechanism sounds a lot better. And it should be
_one_ common IPC mechanism and not a plethora of them. It should feel like an
operating system and not like a bunch of thrown together software, which is
glued together with some magic shell scripts.


[1] https://dracut.wiki.kernel.org/index.php/Main_Page
[2] http://git.kernel.org/cgit/boot/dracut/dracut.git/tree/modules.d/40network/net-lib.sh#n483
[3] http://pkgs.fedoraproject.org/cgit/dhcp.git/tree/dhclient-script

2015-04-29 14:59:13

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 29.04.2015 at 16:53, Harald Hoyer wrote:
> On 29.04.2015 16:18, Richard Weinberger wrote:
>> On 29.04.2015 at 16:11, Harald Hoyer wrote:
>>>>> We don't handcraft the initramfs script for every one of our customers, so we
>>>>> have to generically support hotplug, persistent device names, persistent
>>>>> interface names, network connectivity in the initramfs, user input handling for
>>>>> passwords, fonts, keyboard layouts, fips, fsck, repair tools for file systems,
>>>>> raid assembly, LVM assembly, multipath, crypto devices, live images, iSCSI,
>>>>> FCoE, all kinds of filesystems with their quirks, IBM z-series support, resume
>>>>> from hibernation, […]
>>>>
>>>> This is correct. But which of these tools/features depend on dbus?
>>>
>>> I would love to add dbus support to all of them and use it, so I can connect
>>> them all more easily. No need for them to invent their own version of IPC,
>>> which can only be used by their own tool set.
>>
>> Why/how do you need to connect them?
>> Sorry for being persistent but as I use most of these tools too (also in initramfs)
>> I'm very curious.
>>
>> Many of us grumpy kernel devs simply don't know all the use cases you have to cover.
>> So, please explain. :-)
>>
>
> Well, using shell scripts I connected all of these tools in the earlier
> versions of dracut [1]. Been there, done that.
>
> When using bash to wait for an interface to come up [2] or to do DHCP [3], the
> pain threshold (at least mine) is reached, and you want something more sophisticated.
>
> So, one starts eyeing NetworkManager or systemd-networkd. Both of them have CLI
> tools and helpers, and these tools and helpers talk to each other with (guess
> what?) an IPC mechanism, which happens to be DBUS (because it's the IPC of
> choice, if you don't want to reinvent the wheel).
>
> But let's not limit that to networking alone. Parsing the output of tools with
> shell scripts is horrible, slow, fragile, and error-prone.

So, you want to replace bash with dbus?
I'll stop arguing now.
Let's agree to disagree.

> Sure, I can write one binary to rule them all, pull out all the code from all
> tools I need, but for me an IPC mechanism sounds a lot better. And it should be
> _one_ common IPC mechanism and not a plethora of them. It should feel like an
> operating system and not like a bunch of thrown together software, which is
> glued together with some magic shell scripts.

This is how UNIX works. ;)

Thanks,
//richard

2015-04-29 15:03:48

by Theodore Ts'o

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 29, 2015 at 04:53:53PM +0200, Harald Hoyer wrote:
> Sure, I can write one binary to rule them all, pull out all the code from all
> tools I need, but for me an IPC mechanism sounds a lot better. And it should be
> _one_ common IPC mechanism and not a plethora of them. It should feel like an
> operating system and not like a bunch of thrown together software, which is
> glued together with some magic shell scripts.

And so requiring Wireshark (and X?) in the initramfs to debug problems
once dbus is introduced is better?

I would think shell scripts are *easier* to debug when things go
wrong, especially in a minimal environment such as an initial ram
disk. Having had to debug problems in a distro initramfs when trying
to help a customer bring up an FC boot disk long ago in another life,
I'm certain I would rather debug problems using shell scripts while on
site in a classified machine room[1]; trying to debug dbus there would
be infinitely worse.

- Ted

[1] So no laptop, no google, no access to sources to figure out random
dbus messages, etc.

2015-04-29 15:07:39

by Harald Hoyer

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 29.04.2015 16:46, Austin S Hemmelgarn wrote:
> On 2015-04-29 10:11, Harald Hoyer wrote:
>> On 29.04.2015 16:04, Richard Weinberger wrote:
>>> On 29.04.2015 at 16:01, Harald Hoyer wrote:
>>>> On 29.04.2015 15:46, Richard Weinberger wrote:
>>>>> Am 29.04.2015 um 15:38 schrieb Harald Hoyer:
>>>>>> On 29.04.2015 15:33, Richard Weinberger wrote:
>>>>>>> It depends how you define "beginning". To me an initramfs is a *very*
>>>>>>> minimal tool to prepare the rootfs and nothing more (no udev, no
>>>>>>> systemd, no "mini distro").
>>>>>>> If the initramfs fails to do its job it can print to the console like
>>>>>>> the kernel does if it fails at a very early stage.
>>>>>>>
>>>>>>
>>>>>> Your solution might work for your small personal needs, but not for our
>>>>>> customers.
>>>>>
>>>>> Correct, I don't know your customers, all I know are my customers. :-)
>>>>>
>>>>> What feature do your customers need?
>>>>> I mean, I fully agree with you that an initramfs must not fail silently
>>>>> but how does dbus help there? If it fails to mount the rootfs there is not
>>>>> much it can do.
>>>>>
>>>>> Thanks,
>>>>> //richard
>>>>>
>>>>
>>>> We don't handcraft the initramfs script for each of our customers, therefore we
>>>> have to generically support hotplug, persistent device names, persistent
>>>> interface names, network connectivity in the initramfs, user input handling
>>>> for passwords, fonts, keyboard layouts, fips, fsck, repair tools for file
>>>> systems, raid assembly, LVM assembly, multipath, crypto devices, live images,
>>>> iSCSI, FCoE, all kinds of filesystems with their quirks, IBM z-series support,
>>>> resume from hibernation, […]
>>>
>>> This is correct. But which of these tools/features depend on dbus?
>>
>> I would love to add dbus support to all of them and use it, so I can connect
>> them all more easily. No need for them to invent their own version of IPC,
>> which can only be used by their own tool set.
>>
> Resume is built into the kernel, so no need for IPC there. Keymaps, fonts, and
> fsck need no IPC either. FIPS related stuff should need no IPC. Anything to
> do with the Device Mapper and hotplug should just need uevents. While I can
> kind of see you wanting to have lvmetad in the initramfs for use with LVM, I've
> seen all kinds of reports of issues caused by that. I can also kind of
> understand wanting some kind of unified IPC for the netboot related stuff, but
> DBus is still serious overkill for any of that IMHO. As things stand currently, the
> few things in that list that I know actually use IPC for anything get by just
> fine (and much faster) using just UDS.
>
>

He asked what customers need, because he does not need udev, systemd, "mini
distro".

Most of the stuff does not work without udev and something like systemd.

And all of the stuff I mentioned together forms a "mini distro" for me.

2015-04-29 15:18:07

by Austin S Hemmelgarn

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 2015-04-29 11:07, Harald Hoyer wrote:
> Most of the stuff does not work without udev and something like systemd.
>
That's funny, apparently the initramfs images I've been using for
multiple months now on server systems at work which don't have systemd,
udev, or dbus, and do LVM/RAID assembly, network configuration, crypto
devices, multipath, many different filesystems, and a number of other
oddball configurations due to the insanity that is the software I have
to deal with from our company, don't work. I wonder how my systems are
booting successfully 100% of the time then?



2015-04-29 15:16:40

by Simon McVittie

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 29/04/15 14:35, Stephen Smalley wrote:
> As it currently stands, there
> are no LSM hook calls in the kdbus tree beyond metadata collection of
> security labels.

SELinux and AppArmor are the two particularly interesting LSMs here:
those are the ones that have support for user-space mediation in
dbus-daemon, and hence the ones for which replacing dbus-daemon with
kdbus, without LSM hooks, would be a regression.

> It is also interesting that kdbus allows impersonation of any
> credential, including security label, by "privileged" clients, where
> privileged simply means it either has CAP_IPC_OWNER or owns (euid
> matches uid) the bus.

FWIW, this particular feature is *not* one of those that are necessary
for feature parity with dbus-daemon. There's no API for making
dbus-daemon fake its clients' credentials; if you can ptrace it, then
you can of course subvert it arbitrarily, but nothing less hackish than
that is currently offered.
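The "privileged" criterion quoted above can be written down as a small predicate. This is a minimal C sketch of that rule as stated in this thread (CAP_IPC_OWNER, or an effective uid matching the bus owner's uid); the structs and the function name are hypothetical and are not kdbus code.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical view of a connecting client (not a kdbus structure). */
struct client_creds {
    uid_t euid;              /* effective uid of the client */
    bool  has_cap_ipc_owner; /* CAP_IPC_OWNER in its effective set? */
};

/* Hypothetical view of a bus: only the owner uid matters here. */
struct bus_info {
    uid_t owner_uid;
};

/*
 * "Privileged" as described in the thread: the client holds CAP_IPC_OWNER,
 * or its euid equals the uid that owns the bus. Per the thread, passing this
 * check is what allows eavesdropping and supplying arbitrary credentials.
 */
static bool client_is_privileged(const struct client_creds *c,
                                 const struct bus_info *bus)
{
    return c->has_cap_ipc_owner || c->euid == bus->owner_uid;
}
```

Under this rule every process running as the bus owner passes, regardless of any LSM confinement; whether an LSM should get a say on top of it is the hook question raised for eavesdropping in this message.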

> On 04/29/2015 08:47 AM, Harald Hoyer wrote:
>> * Eavesdropping on the kernel level, so privileged users can hook into
>> the message stream without hacking support for that into their
>> userspace processes
>
> This one worried me a bit, particularly the statement that such
> eavesdropping is unobservable by any other participant on the bus.
> Seems a bit prone to abuse, particularly since it can be done by any
> privileged client, not merely the process that originally created the bus?

For feature parity with dbus-daemon, the fact that
eavesdropping/monitoring *exists* is necessary (it's a widely used
developer/sysadmin feature) but the precise mechanics of how you get it
are not necessarily set in stone. In particular, if you think kdbus'
definition of "are you privileged?" may be too broad, that seems a valid
question to be asking.

In traditional D-Bus, individual users can normally eavesdrop/monitor on
their own session buses (which are not a security boundary, unless
specially reconfigured), and this is a useful property; on non-LSM
systems without special configuration, each user should ideally be able
to monitor their own kdbus user bus, too.

The system bus *is* a security boundary, and administrative privileges
should be required to eavesdrop on it. At a high level, someone with
"full root privileges" should be able to eavesdrop, and ordinary users
should not; there are various possible criteria for distinguishing
between those two extremes, and I have no opinion on whether
CAP_IPC_OWNER is the most appropriate cutoff point.

In dbus-daemon, LSMs with integration code in dbus-daemon have the
opportunity to mediate eavesdropping specially. SELinux does not
currently do this (as far as I can see), but AppArmor does, so
AppArmor-confined processes are not normally allowed to eavesdrop on the
session bus (even though the same user's unconfined processes may). That
seems like one of the obvious places for an LSM hook in kdbus.

Having eavesdropping be unobservable means that applications cannot
change their behaviour while they are being watched, either maliciously
(to hide from investigation) or accidentally (bugs that only happen when
not being debugged are the hardest to fix). dbus-daemon's traditional
implementation of eavesdropping has had side-effects in the past, which
is undesirable, and is addressed by the new monitoring interface in
version 1.9. kdbus' version of eavesdropping is quite similar to the new
monitoring interface.

--
Simon McVittie
Collabora Ltd. <http://www.collabora.com/>
For context, I am a D-Bus maintainer, but neither the original designer
of D-Bus nor a kdbus developer.

2015-04-29 15:21:18

by Austin S Hemmelgarn

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 2015-04-29 11:03, Theodore Ts'o wrote:
> On Wed, Apr 29, 2015 at 04:53:53PM +0200, Harald Hoyer wrote:
>> Sure, I can write one binary to rule them all, pull out all the code from all
>> tools I need, but for me an IPC mechanism sounds a lot better. And it should be
>> _one_ common IPC mechanism and not a plethora of them. It should feel like an
>> operating system and not like a bunch of thrown together software, which is
>> glued together with some magic shell scripts.
>
> And so requiring wireshark (and X?) in initramfs to debug problems
> once dbus is introduced is better?
>
> I would think shell scripts are *easier* to debug when things go
> wrong, especially in a minimal environment such as an initial ram
> disk. Having had to debug problems in a distro initramfs when trying
> to help a customer bring up a FC boot disk long ago in another life,
> I'm certain I would rather debug problems while on site at a
> classified machine room[1] using shell scripts, and trying to debug
> dbus is something that would be infinitely worse.
>
> - Ted
>
> [1] So no laptop, no google, no access to sources to figure out random
> dbus messages, etc.
Likewise.

I keep hearing from people that shell scripting is hard; it really isn't,
compared to a number of other scripting languages. You just need to
actually learn to do it right (which is getting more and more difficult
these days because fewer and fewer CS schools are teaching Unix).


2015-04-29 15:22:13

by Harald Hoyer

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 29.04.2015 17:17, Austin S Hemmelgarn wrote:
> On 2015-04-29 11:07, Harald Hoyer wrote:
>> Most of the stuff does not work without udev and something like systemd.
>>
> That's funny, apparently the initramfs images I've been using for multiple
> months now on server systems at work which don't have systemd, udev, or dbus,
> and do LVM/RAID assembly, network configuration, crypto devices, multipath,
> many different filesystems, and a number of other oddball configurations due to
> the insanity that is the software I have to deal with from our company, don't
> work. I wonder how my systems are booting successfully 100% of the time then?
>
>

Then you should probably open source your initramfs, so we can all benefit from
it and use it for all distributions.

2015-04-29 15:28:05

by Martin Steigerwald

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wednesday, 29 April 2015, 14:47:53, Harald Hoyer wrote:
> We really don't want the IPC mechanism to be in a flux state. All tools
> have to fallback to a non-standard mechanism in that case.
>
> If I have to pull in a dbus daemon in the initramfs, we still have the
> chicken and egg problem for PID 1 talking to the logging daemon and
> starting dbus.
> systemd cannot talk to journald via dbus unless dbus-daemon is started,
> dbus cannot log anything on startup, if journald is not running, etc...

Do I get this right that it is basically a userspace *design* decision
that you use as a reason to have kdbus inside the kernel?

Is it really necessary to use DBUS for talking to journald? And does it
really matter that much if messages from before dbus starts up do not
appear in the log? /proc/kmsg is a ring buffer; it can still be copied over
later.

I remember this kind of reasoning for not having cgroup management in a
separate process, but these are both in userspace.

"We have done it this way in userspace, thus this needs to be in kernel"
doesn't sound quite convincing to me as an argument for having dbus inside
the kernel. Userspace uses the API the kernel and glibc provide, yes, it
makes sense to look at what userspace needs, but designing some things in
userspace and then requiring support for these design decisions in the
kernel just doesn't sound quite right to me.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

2015-04-29 15:41:56

by Austin S Hemmelgarn

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 2015-04-29 11:22, Harald Hoyer wrote:
> On 29.04.2015 17:17, Austin S Hemmelgarn wrote:
>> On 2015-04-29 11:07, Harald Hoyer wrote:
>>> Most of the stuff does not work without udev and something like systemd.
>>>
>> That's funny, apparently the initramfs images I've been using for multiple
>> months now on server systems at work which don't have systemd, udev, or dbus,
>> and do LVM/RAID assembly, network configuration, crypto devices, multipath,
>> many different filesystems, and a number of other oddball configurations due to
>> the insanity that is the software I have to deal with from our company, don't
>> work. I wonder how my systems are booting successfully 100% of the time then?
>>
>>
>
> Then you should probably open source your initramfs, so we can all benefit from
> it and use it for all distributions.
>
It's (mostly, aside from a couple of overlays to deal with the hoops I
have to jump through to get some of our software working) just the
standard one generated by Gentoo's 'genkernel' program (specifically the
version from the genkernel-ng package), though now that I actually
look at it, it might have udev in it. I'm certain, however, that the ones
I have don't have systemd or dbus.


2015-04-29 16:25:17

by Martin Steigerwald

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wednesday, 29 April 2015, 11:03:41, Theodore Ts'o wrote:
> On Wed, Apr 29, 2015 at 04:53:53PM +0200, Harald Hoyer wrote:
> > Sure, I can write one binary to rule them all, pull out all the code
> > from all tools I need, but for me an IPC mechanism sounds a lot
> > better. And it should be _one_ common IPC mechanism and not a
> > plethora of them. It should feel like an operating system and not
> > like a bunch of thrown together software, which is glued together
> > with some magic shell scripts.
>
> And so requiring wireshark (and X?) in initramfs to debug problems
> once dbus is introduced is better?
>
> I would think shell scripts are *easier* to debug when things go
> wrong, especially in a minimal environment such as an initial ram
> disk. Having had to debug problems in a distro initramfs when trying
> to help a customer bring up a FC boot disk long ago in another life,
> I'm certain I would rather debug problems while on site at a
> classified machine room[1] using shell scripts, and trying to debug
> dbus is something that would be infinitely worse.

Later in the boot process, I have run into some hard-to-debug issues with a
systemd-governed userspace:

Bug 1213778 - drops into emergency mode without any error message if it
cannot find a filesystem in /etc/fstab
https://bugzilla.redhat.com/show_bug.cgi?id=1213778

Bug 1213781 - does not start ssh service if a filesystem in /etc/fstab
cannot be mounted
https://bugzilla.redhat.com/show_bug.cgi?id=1213781

With shell scripts I got error messages for these; here I had to browse
the output of journalctl -xb to find out.

And then what do I do if dbus communication is not working for some
reason? Will I then be able to use journalctl at all?

Not that proper error reporting can't be added to systemd, or that the
dependency handling can't be fixed towards more sanity, for example
"no /boot mount, no worries, I'll still start that ssh service for
you", but currently this is a clear and heavy regression for me when I
compare it with sysvinit boot behavior.

I do fix things when dropped to the initramfs shell: I added a script
snippet for supporting BTRFS RAID 1 while it wasn't yet supported in
Debian (I don't know whether it is supported for booting in Jessie, but as
my bug reports have not been closed yet, maybe not), and I still want to be
able to do these kinds of things.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

2015-04-29 16:27:08

by John Stoffel

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

>>>>> "Harald" == Harald Hoyer <[email protected]> writes:

Harald> On 29.04.2015 15:33, Richard Weinberger wrote:
>> It depends how you define "beginning". To me an initramfs is a *very* minimal
>> tool to prepare the rootfs and nothing more (no udev, no systemd, no
>> "mini distro").
>> If the initramfs fails to do its job it can print to the console like
>> the kernel does if it fails at a very early stage.
>>

Harald> Your solution might work for your small personal needs, but
Harald> not for our customers.

Arguing that your needs outweigh mine because you have customers
ain't gonna fly... I don't care about your customers and why should I?
I'm not getting any money from them. Nor do I make any money from the
Linux kernel, though as an IT person I support Linux all day long. Do
my requirements get listened to as well?

If your customers want this feature, you're more than welcome to fork
the kernel and support it yourself. Oh wait... Redhat does that
already. So what's the problem? Just put it into RHEL (which I use,
I admit, along with Debian/Mint) and be done with it.

2015-04-29 16:37:39

by David Lang

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 29 Apr 2015, Martin Steigerwald wrote:

> On Wednesday, 29 April 2015, 14:47:53, Harald Hoyer wrote:
>> We really don't want the IPC mechanism to be in a flux state. All tools
>> have to fallback to a non-standard mechanism in that case.
>>
>> If I have to pull in a dbus daemon in the initramfs, we still have the
>> chicken and egg problem for PID 1 talking to the logging daemon and
>> starting dbus.
>> systemd cannot talk to journald via dbus unless dbus-daemon is started,
>> dbus cannot log anything on startup, if journald is not running, etc...
>
> Do I get this right that it is basically a userspace *design* decision
> that you use as a reason to have kdbus inside the kernel?
>
> Is it really necessary to use DBUS for talking to journald? And does it
> really matter that much if any message before starting up dbus do not
> appear in the log? /proc/kmsg is a ring buffer, it can still be copied over
> later.

I've been getting the early boot messages in my logs for decades (assuming the
system doesn't fail before the syslog daemon is started). It sometimes has
required setting a larger than default ringbuffer in the kernel, but that's easy
enough to do.
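The ring-buffer point can be illustrated with a toy model: a fixed-size log buffer keeps early messages only until newer ones overwrite them, which is why a larger buffer (for example via the `log_buf_len=` boot parameter) helps them survive until a userspace logger copies them out. This is a simplified C sketch, not the kernel's printk implementation; the record-based layout and all names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MSG 64

/* Toy log ring: a fixed number of fixed-size records (hypothetical layout). */
struct log_ring {
    char   (*slots)[MAX_MSG]; /* storage for 'capacity' records */
    size_t capacity;          /* number of record slots */
    size_t next;              /* total records ever written */
};

/* Append a message, overwriting the oldest record once the ring is full. */
static void ring_log(struct log_ring *r, const char *msg)
{
    char *slot = r->slots[r->next % r->capacity];
    strncpy(slot, msg, MAX_MSG - 1);
    slot[MAX_MSG - 1] = '\0';
    r->next++;
}

/* Oldest record still available, or NULL if nothing was logged yet. */
static const char *ring_oldest(const struct log_ring *r)
{
    if (r->next == 0)
        return NULL;
    size_t first = r->next > r->capacity ? r->next - r->capacity : 0;
    return r->slots[first % r->capacity];
}
```

With room for two records, the first message is already gone after the third write; with eight slots it is still there when the logger starts reading.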

David Lang

2015-04-29 17:24:44

by Michal Hocko

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon 27-04-15 13:11:03, Andy Lutomirski wrote:
> [resent without HTML]
>
> On Apr 27, 2015 5:46 AM, "Michal Hocko" <[email protected]> wrote:
> >
> > On Wed 22-04-15 12:36:12, Andy Lutomirski wrote:
[...]
> > > The receiver gets to mmap the buffer. I'm not sure what protection they get.
> >
> > OK, so I've checked the code. kdbus_pool_new sets up a shmem file
> > (unlinked) so not visible externally. The consumer will get it via mmap
> > on the endpoint file by kdbus_pool_mmap and it refuses VM_WRITE and
> > clears VM_MAYWRITE. The receiver doesn't even have access to the shmem
> > file directly.
> >
> > It is ugly that kdbus_pool_mmap replaces the original vm_file and makes
> > it point to the shmem file. I am not sure whether this is safe all the
> > time and it would deserve a big fat comment. On the other hand, it seems
> > some drivers are doing this already (e.g. dma_buf_mmap).
>
> What happens to map_files in proc? It seems unlikely that CRIU would
> ever work on dma_buf, but this could be a problem for CRIU with kdbus.

I am not familiar with CRIU, nor with the map_files directory. This is
actually the first time I've heard about it.
[looking...]

So proc_map_files_readdir will iterate all VMAs including the one backed
by the buffer and it will see its vm_file which will point to the shmem
file AFAICS. So it doesn't seem like CRIU would care because all parties
on the buffer would see the same inode. Whether that is really enough, I
dunno.
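The effect of a mapping without VM_MAYWRITE can be observed from userspace with an ordinary file: a shared mapping of a file opened read-only never gets VM_MAYWRITE, so a later mprotect() asking for PROT_WRITE fails with EACCES. kdbus_pool_mmap reaches the same state by refusing VM_WRITE and clearing VM_MAYWRITE itself. This is a C sketch of the general mm behaviour using a temporary file, not the kdbus pool; the /tmp path template and the helper name are arbitrary.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map a file read-only and shared, then try to upgrade the mapping to
 * writable. Returns the errno from mprotect() (expected: EACCES), 0 if
 * the upgrade unexpectedly succeeded, or -1 on setup failure.
 */
static int try_upgrade_readonly_mapping(void)
{
    long pg = sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/ro-map-XXXXXX";
    int wfd = mkstemp(path);
    if (wfd < 0)
        return -1;
    if (write(wfd, "payload", 7) != 7) {
        close(wfd);
        unlink(path);
        return -1;
    }
    close(wfd);

    /* Reopen read-only: a MAP_SHARED mapping of this fd lacks VM_MAYWRITE. */
    int rfd = open(path, O_RDONLY);
    unlink(path);
    if (rfd < 0)
        return -1;

    void *p = mmap(NULL, (size_t)pg, PROT_READ, MAP_SHARED, rfd, 0);
    close(rfd); /* the mapping holds its own reference to the file */
    if (p == MAP_FAILED)
        return -1;

    int err = 0;
    if (mprotect(p, (size_t)pg, PROT_READ | PROT_WRITE) != 0)
        err = errno;
    munmap(p, (size_t)pg);
    return err;
}
```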

> > > The thing I'm worried about is that the receiver might deliberately
> > > avoid faulting in a bunch of pages and instead wait for the producer
> > > to touch them, causing pages that logically belong to the receiver to
> > > be charged to the producer instead.
> >
> > Hmm, now that I am looking into the code it seems you are right. E.g.
> > kdbus_cmd_send runs in the context of the sender AFAIU. This gets down
> > to kdbus_pool_slice_copy_iovec which does vfs_iter_write and this
> > is where we get to charge the memory. AFAIU the terminology all the
> > receivers will share the same shmem file when mmaping the endpoint.
> >
> > This, however, doesn't seem to be exploitable to hide memory charges
> > because the receiver cannot make the buffer writable. A nasty process
> > with a small memcg limit could still pre-fault the memory before any
> > writer sends a message and slow down the whole endpoint traffic. But
> > that wouldn't be a completely new thing because processes might hammer
> > on memory even without memcg... It is just that this would be kind of
> > easier with memcg.
> > If that is the concern then the buffer should be pre-charged at the time
> > when it is created.
>
> The attack I had in mind was that the nasty process with a small memcg
> creates one or many of these and doesn't pre-fault it. Then a sender
> (systemd?) sends messages and they get charged, possibly once for each
> copy sent, to the root memcg.

Dunno, but I suspect that systemd will not talk to random endpoints. Or
can those endpoints be registered on a system bus by an untrusted task?
Bus vs. endpoint relation is still not entirely clear to me.

But even if that was possible I fail to see how the small memcg plays
any role when the task doesn't control the shmem buffer directly but it
only has a read only mapping of it.

> So kdbus should probably pre-charge the creator of the pool.

Yes, as I've said above.
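The difference between charging on write and pre-charging can be modelled with a toy accounting scheme. In this hypothetical C sketch (the names are illustrative, not memcg or kdbus APIs) a pool's memory is either charged to whoever first writes it, as described above for the send path, or charged in full to the pool's creator at creation time, as suggested in this exchange. Limits are only enforced at creation in this toy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cgroup counter: bytes charged against its limit. */
struct toy_memcg {
    size_t charged;
    size_t limit;
};

/* Hypothetical receive pool owned by 'creator'. */
struct toy_pool {
    struct toy_memcg *creator;
    size_t size;
    size_t faulted;   /* bytes already backed by memory */
    bool   precharged;
};

/* Create a pool; with precharge, its full size is billed to the creator. */
static bool pool_create(struct toy_pool *p, struct toy_memcg *creator,
                        size_t size, bool precharge)
{
    *p = (struct toy_pool){ .creator = creator, .size = size,
                            .precharged = precharge };
    if (precharge) {
        if (creator->charged + size > creator->limit)
            return false;     /* creator cannot afford the pool */
        creator->charged += size;
    }
    return true;
}

/* A sender copies 'len' bytes into not-yet-faulted pool memory. */
static void pool_write(struct toy_pool *p, struct toy_memcg *sender, size_t len)
{
    if (p->faulted + len > p->size)
        len = p->size - p->faulted;
    if (!p->precharged)
        sender->charged += len; /* lazy: billed to whoever faults it in */
    p->faulted += len;
}
```

In the lazy variant a 1 MiB pool created by a cgroup limited to 4 KiB costs its creator nothing and the sender pays; with pre-charging, creating the pool simply fails.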
--
Michal Hocko
SUSE Labs

2015-04-29 17:39:55

by Steven Rostedt

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 29, 2015 at 12:26:59PM -0400, John Stoffel wrote:
>
> If your customers want this feature, you're more than welcome to fork
> the kernel and support it yourself. Oh wait... Redhat does that
> already. So what's the problem? Just put it into RHEL (which I use
> I admit, along with Debian/Mint) and be done with it.

Red Hat tries very hard to push things upstream. Its policy is not to keep
things to itself, but always to work with the community. That way, everyone
benefits. Ideally, we should come up with a solution that works for all.

-- Steve

2015-04-29 17:50:06

by Stephen Smalley

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 04/29/2015 11:18 AM, Simon McVittie wrote:
> On 29/04/15 14:35, Stephen Smalley wrote:
>> It is also interesting that kdbus allows impersonation of any
>> credential, including security label, by "privileged" clients, where
>> privileged simply means it either has CAP_IPC_OWNER or owns (euid
>> matches uid) the bus.
>
> FWIW, this particular feature is *not* one of those that are necessary
> for feature parity with dbus-daemon. There's no API for making
> dbus-daemon fake its clients' credentials; if you can ptrace it, then
> you can of course subvert it arbitrarily, but nothing less hackish than
> that is currently offered.

Then I'd be inclined to drop it from kdbus unless some compelling use
case exists, and even then, I don't believe that CAP_IPC_OWNER or
bus-owner uid match is sufficient even for forging credentials other
than the security label. For socket credentials passing, for example,
the kernel checks CAP_SYS_ADMIN for pid forging, CAP_SETUID for uid
forging, and CAP_SETGID for gid forging. And I don't believe we support
any form of forging of the security label on socket credentials.
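For comparison, this is roughly how credential passing over an AF_UNIX socket looks from userspace: the sender attaches a `struct ucred` as SCM_CREDENTIALS ancillary data, and the kernel only accepts values that match the sender's own identity unless it holds the capability named above for that field. A minimal C sketch that sends the caller's own credentials over a socketpair; the helper names are ours.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one byte from tx with SCM_CREDENTIALS attached; read it back on rx. */
static int send_and_receive_creds(int tx, int rx, struct ucred *out)
{
    /* Claim our own identity; other ids need CAP_SYS_ADMIN/CAP_SETUID/CAP_SETGID. */
    struct ucred creds = { .pid = getpid(), .uid = getuid(), .gid = getgid() };
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align; /* forces correct alignment of buf */
        char buf[CMSG_SPACE(sizeof(struct ucred))];
    } ctl;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctl.buf,
                          .msg_controllen = sizeof(ctl.buf) };

    memset(&ctl, 0, sizeof(ctl));
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_CREDENTIALS;
    c->cmsg_len = CMSG_LEN(sizeof(struct ucred));
    memcpy(CMSG_DATA(c), &creds, sizeof(creds));
    if (sendmsg(tx, &msg, 0) != 1)
        return -1;

    /* Reuse the same buffers for the receive side. */
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_controllen = sizeof(ctl.buf);
    if (recvmsg(rx, &msg, 0) != 1)
        return -1;
    for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_CREDENTIALS) {
            memcpy(out, CMSG_DATA(c), sizeof(*out));
            return 0;
        }
    }
    return -1; /* no credentials were delivered */
}

/* Datagram socketpair whose receiving end opts in with SO_PASSCRED. */
static int pass_own_credentials(struct ucred *out)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;
    int on = 1;
    int rc = -1;
    if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) == 0)
        rc = send_and_receive_creds(sv[0], sv[1], out);
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Note that there is no field for a security label in `struct ucred`, which matches the point that socket credentials offer no way to forge one.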

> For feature parity with dbus-daemon, the fact that
> eavesdropping/monitoring *exists* is necessary (it's a widely used
> developer/sysadmin feature) but the precise mechanics of how you get it
> are not necessarily set in stone. In particular, if you think kdbus'
> definition of "are you privileged?" may be too broad, that seems a valid
> question to be asking.
>
> In traditional D-Bus, individual users can normally eavesdrop/monitor on
> their own session buses (which are not a security boundary, unless
> specially reconfigured), and this is a useful property; on non-LSM
> systems without special configuration, each user should ideally be able
> to monitor their own kdbus user bus, too.
>
> The system bus *is* a security boundary, and administrative privileges
> should be required to eavesdrop on it. At a high level, someone with
> "full root privileges" should be able to eavesdrop, and ordinary users
> should not; there are various possible criteria for distinguishing
> between those two extremes, and I have no opinion on whether
> CAP_IPC_OWNER is the most appropriate cutoff point.
>
> In dbus-daemon, LSMs with integration code in dbus-daemon have the
> opportunity to mediate eavesdropping specially. SELinux does not
> currently do this (as far as I can see), but AppArmor does, so
> AppArmor-confined processes are not normally allowed to eavesdrop on the
> session bus (even though the same user's unconfined processes may). That
> seems like one of the obvious places for an LSM hook in kdbus.

Yes, we would want to control this in SELinux; I suspect that either the
eavesdropping functionality did not exist in dbus-daemon at the time of
the original dbus-daemon SELinux integration or it was an oversight.

> Having eavesdropping be unobservable means that applications cannot
> change their behaviour while they are being watched, either maliciously
> (to hide from investigation) or accidentally (bugs that only happen when
> not being debugged are the hardest to fix). dbus-daemon's traditional
> implementation of eavesdropping has had side-effects in the past, which
> is undesirable, and is addressed by the new monitoring interface in
> version 1.9. kdbus' version of eavesdropping is quite similar to the new
> monitoring interface.

2015-04-29 18:28:55

by Martin Steigerwald

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wednesday, 29 April 2015, 17:22:08, Harald Hoyer wrote:
> On 29.04.2015 17:17, Austin S Hemmelgarn wrote:
> > On 2015-04-29 11:07, Harald Hoyer wrote:
> >> Most of the stuff does not work without udev and something like
> >> systemd.
> >>
> > That's funny, apparently the initramfs images I've been using for
> > multiple months now on server systems at work which don't have
> > systemd, udev, or dbus, and do LVM/RAID assembly, network
> > configuration, crypto devices, multipath, many different filesystems,
> > and a number of other oddball configurations due to the insanity that
> > is the software I have to deal with from our company, don't work. I
> > wonder how my systems are booting successfully 100% of the time then?
>
> Then you should probably open source your initramfs, so we can all
> benefit from it and use it for all distributions.

Do you really think that the tooling will make that much of a difference?

I think there will always be cases where an initramfs will not work until
adapted to them. And then it's nice to be able to do things like this:

merkaba:~> cat /etc/initramfs-tools/scripts/local-top/btrfs
#!/bin/sh

PREREQ="lvm"
prereqs()
{
	echo $PREREQ
}

case $1 in
prereqs)
	prereqs
	exit 0;
esac

. /scripts/functions

log_begin_msg "Initializing BTRFS RAID-1."

modprobe btrfs
vgchange -ay
btrfs device scan

log_end_msg


How would I add support for some configuration that a systemd or purely
dracut + udev based initramfs does not support *yet*, on my own?

Yes, one can ask why Debian doesn't support it already, but heck, I can
do it myself and report a bug about it, without having to fire up a C
compiler in order to fix things. I may be able to do this myself, but at a
much higher cost in time.

The above has worked for so long already that I often forget about it.

That said, if I still get a chance to execute a script at some point, a
dracut-based initramfs may just be totally fine with it, but I want this
possibility and a shell to fix things up myself if they go wrong. And while
I do not see the need for having systemd in the initramfs at all, I might
be fine with it, if I can fix things up myself in case of problems.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

2015-04-29 18:54:58

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Apr 29, 2015 5:48 AM, "Harald Hoyer" <[email protected]> wrote:
> Of course this can all be done, but it would involve fallback mechanisms,
> which we want to get rid of. Hopefully, you don't suggest merging dbus with
> PID 1. Also, with a daemon, you would lose the points mentioned in the cover
> mail:
>
> * Security: The peers which communicate do not have to trust each
> other, as the only trustworthy component in the game is the kernel
> which adds metadata and ensures that all data passed as payload is
> either copied or sealed, so that the receiver can parse the data
> without having to protect against changing memory while parsing
> buffers. Also, all the data transfer is controlled by the kernel,
> so that LSMs can track and control what is going on, without
> involving userspace. Because of the LSM issue, security people are
> much happier with this model than the current scheme of having to
> hook into dbus to mediate things.

Other security people prefer code to stay out of the kernel when possible.

Also, this metadata argument is still invalid. Sockets can be
improved for this purpose.
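As one example of metadata that AF_UNIX sockets already provide, the peer's pid and effective uid/gid, as recorded by the kernel at connect() or socketpair() time, can be read with the SO_PEERCRED socket option. A minimal C sketch using a socketpair within one process; which additional metadata sockets would need is exactly what is being debated here, so this only shows the existing mechanism.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a connected socketpair and ask the kernel who is on the other
 * end. Returns 0 and fills *peer on success, -1 on error.
 */
static int query_peer_credentials(struct ucred *peer)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    socklen_t len = sizeof(*peer);
    /* Recorded by the kernel at socketpair() time, not supplied by the peer. */
    int rc = getsockopt(sv[0], SOL_SOCKET, SO_PEERCRED, peer, &len);

    close(sv[0]);
    close(sv[1]);
    return (rc == 0 && len == sizeof(*peer)) ? 0 : -1;
}
```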

>
> * Being in the kernel closes a lot of races which can't be fixed with
> the current userspace solutions. For example, with kdbus, there is a
> way a client can disconnect from a bus, but do so only if no further
> messages present in its queue, which is crucial for implementing
> race-free "exit-on-idle" services

This can be implemented in userspace.

Client to dbus daemon: may I exit now?
Dbus daemon to client: yes (and no more messages) or no
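The two-message protocol above can be sketched as a broker-side handler. The essential property is that the broker answers "yes" only while holding the same lock that guards enqueueing, and that on "yes" it marks the instance as exiting, so later messages are held for a fresh instance instead of being delivered to the dying one. The structure and function names are hypothetical; this is a single-process C model using a pthread mutex, not dbus-daemon code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical broker-side state for one activatable service. */
struct service_slot {
    pthread_mutex_t lock;   /* guards every field below */
    size_t queued;          /* messages waiting for the current instance */
    bool   exiting;         /* current instance has been allowed to exit */
    size_t pending_restart; /* messages held for the next instance */
};

/* Broker: route a message to the service. */
static void broker_enqueue(struct service_slot *s)
{
    pthread_mutex_lock(&s->lock);
    if (s->exiting)
        s->pending_restart++; /* held; a real broker would activate a new instance */
    else
        s->queued++;
    pthread_mutex_unlock(&s->lock);
}

/* Client: one message has been handled. */
static void client_consume(struct service_slot *s)
{
    pthread_mutex_lock(&s->lock);
    if (s->queued > 0)
        s->queued--;
    pthread_mutex_unlock(&s->lock);
}

/* Client asks "may I exit now?"; answered atomically w.r.t. enqueueing. */
static bool broker_may_exit(struct service_slot *s)
{
    pthread_mutex_lock(&s->lock);
    bool yes = (s->queued == 0);
    if (yes)
        s->exiting = true;    /* no further messages go to this instance */
    pthread_mutex_unlock(&s->lock);
    return yes;
}
```

Because the check and the state change happen under one lock, no message can slip into the old instance's queue between the broker's answer and the client's exit.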

>
> * Eavesdropping on the kernel level, so privileged users can hook into
> the message stream without hacking support for that into their
> userspace processes
>

Why would it be a hack in userspace but not a hack in the kernel?

> * A number of smaller benefits: for example kdbus learned a way to peek
> full messages without dequeuing them, which is really useful for
> logging metadata when handling bus-activation requests.

MSG_PEEK?
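For reference, MSG_PEEK on a socket returns the next message without removing it from the receive queue, so a following ordinary recv() sees the same data. A minimal C sketch on a SOCK_SEQPACKET socketpair (which preserves message boundaries); whether this covers kdbus' use of peeking metadata together with the payload is the open question, and the message text is arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Queue one record, peek at it, then consume it. Returns 0 if the peek and
 * the real read returned identical bytes and the queue is empty afterwards.
 */
static int peek_then_consume(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) != 0)
        return -1;

    const char msg[] = "activation request";
    char peeked[64] = {0};
    char taken[64] = {0};
    char scratch[64];
    int rc = -1;

    if (send(sv[0], msg, sizeof(msg), 0) != (ssize_t)sizeof(msg))
        goto out;
    /* MSG_PEEK copies the record but leaves it at the head of the queue. */
    if (recv(sv[1], peeked, sizeof(peeked), MSG_PEEK) != (ssize_t)sizeof(msg))
        goto out;
    /* An ordinary recv() now dequeues that same record. */
    if (recv(sv[1], taken, sizeof(taken), 0) != (ssize_t)sizeof(msg))
        goto out;
    /* The queue is empty, so a non-blocking peek must fail with EAGAIN. */
    if (recv(sv[1], scratch, sizeof(scratch), MSG_PEEK | MSG_DONTWAIT) != -1 ||
        (errno != EAGAIN && errno != EWOULDBLOCK))
        goto out;
    rc = (memcmp(peeked, taken, sizeof(msg)) == 0) ? 0 : -1;
out:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```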

>
> I don't care, if the kdbus speedup is only marginal.
>
> In my ideal world, there is a standard IPC mechanism from the beginning to
> the end, which does not rely on any process running (except the kernel) and
> which is used by _all_ tools, be it a system daemon providing information and
> interfaces about device assembly or network setup tools or end user desktop
> processes.
>
> dbus _is_ such an easy, flexible standard IPC mechanism. Of course, you can
> invent the wheel again (NIH, "we know better") and wait and see, if that
> works out. Until then the whole common IPC problem is unresolved and Linux
> distributions are just a collection of random software with no common
> interoperability and home grown interfaces.

I don't think anyone is suggesting throwing out dbus.

>
> [1] transitioning mdmon is one of the critical parts for an IMSM raid array.
> Also running an executable from a disk, which the executable is monitoring,
> and which stops functioning, if the executable is not responding is insane.
>

That sounds like an excellent reason to just keep running the
initramfs copy, and I don't see why IPC has anything to do with where
the executable comes from.

--Andy

2015-04-29 19:10:11

by Martin Steigerwald

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wednesday, 29 April 2015, 13:39:42, Steven Rostedt wrote:
> On Wed, Apr 29, 2015 at 12:26:59PM -0400, John Stoffel wrote:
> > If your customers want this feature, you're more than welcome to fork
> > the kernel and support it yourself. Oh wait... Redhat does that
> > already. So what's the problem? Just put it into RHEL (which I use
> > I admit, along with Debian/Mint) and be done with it.
>
> Red Hat tries very hard to push things upstream. Its policy is not to
> keep things to itself, but always to work with the community. That
> way, everyone benefits. Ideally, we should come up with a solution that
> works for all.

I think work with the community is a two-way process.

Two-way as in actually *listening* to feedback instead of trying to
push things through as much as possible in the belief of being *right*. I
honestly dislike the "I know it better than you, go away" kind of attitude
I have seen again and again: here, in systemd-devel (where I unsubscribed
again as I saw no use in continuing the discussion there) and even in
Debian mailing lists and bug reports.

Wherever I look, what I call the "systemd" approach triggers intense
polarization and resistance. I am not saying there is something wrong
with it, yet I ask myself: why is that? The best answer I have come up
with so far comes back to how proponents of the new, different (not
necessarily better or worse) way treat feedback.

There I found, to some extent, feedback being taken into account and
actually addressed, especially when the feedback fitted into the new way
of doing things.

Yet I also found:

- "I know it better than you, go away."

- "Please only stick to pure technical reasons", as in "What's wrong with
the code?", disregarding any concerns about the *concept* and about
differing opinions on whether the kdbus code actually belongs in
the kernel

- Ignoring it


So I still say the issues are not purely technical, and I think a purely
technical "what's wrong with the code?" approach will not address
the core of this discussion or the strong resistance against merging
kdbus into the kernel.

It sometimes appears to me like children arguing about whether to paint
their favorite toy red or green. I think a healthy approach might be to
agree to disagree and work from there. That would at least break the "I am
right", "No, I am right" cycle.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

2015-04-29 19:28:40

by John Stoffel

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

>>>>> "Steven" == Steven Rostedt <[email protected]> writes:

Steven> On Wed, Apr 29, 2015 at 12:26:59PM -0400, John Stoffel wrote:
>>
>> If your customers want this feature, you're more than welcome to fork
>> the kernel and support it yourself. Oh wait... Redhat does that
>> already. So what's the problem? Just put it into RHEL (which I use
>> I admit, along with Debian/Mint) and be done with it.

Steven> Red Hat tries very hard to push things upstream. Its policy
Steven> is to not keep things for themselves, but always work with the
Steven> community. That way, everyone benefits. Ideally, we should
Steven> come up with a solution that works for all.

Yeah, I agree they have been good. I'm just reacting to the off-the-cuff
comment of "my customers need it", which isn't a justification for
this feature, especially when it hasn't been shown to be needed in the
kernel.

We went through a lot of this with TUX, the in-kernel httpd server, and
with pushing other stuff out to userspace over the years. It isn't clear
why this needs to come in, or why only a small part couldn't come in
with the rest staying in userspace.

John

2015-04-29 19:30:39

by Austin S Hemmelgarn

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 2015-04-29 14:54, Andy Lutomirski wrote:
> On Apr 29, 2015 5:48 AM, "Harald Hoyer" <[email protected]> wrote:
>>
>> * Being in the kernel closes a lot of races which can't be fixed with
>> the current userspace solutions. For example, with kdbus, there is a
>> way a client can disconnect from a bus, but do so only if no further
>> messages are present in its queue, which is crucial for implementing
>> race-free "exit-on-idle" services
>
> This can be implemented in userspace.
>
> Client to dbus daemon: may I exit now?
> Dbus daemon to client: yes (and no more messages) or no
>
Depending on how this is implemented, there would be a potential issue
if a message arrived for the client after the daemon told it it could
exit, but before it finished shutdown, in which case the message might
get lost.


Attachments:
smime.p7s (2.90 kB)
S/MIME Cryptographic Signature

2015-04-29 19:42:27

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 29, 2015 at 12:30 PM, Austin S Hemmelgarn
<[email protected]> wrote:
> On 2015-04-29 14:54, Andy Lutomirski wrote:
>>
>> On Apr 29, 2015 5:48 AM, "Harald Hoyer" <[email protected]> wrote:
>>>
>>>
>>> * Being in the kernel closes a lot of races which can't be fixed with
>>> the current userspace solutions. For example, with kdbus, there is a
>>> way a client can disconnect from a bus, but do so only if no further
>>> messages are present in its queue, which is crucial for implementing
>>> race-free "exit-on-idle" services
>>
>>
>> This can be implemented in userspace.
>>
>> Client to dbus daemon: may I exit now?
>> Dbus daemon to client: yes (and no more messages) or no
>>
> Depending on how this is implemented, there would be a potential issue if a
> message arrived for the client after the daemon told it it could exit, but
> before it finished shutdown, in which case the message might get lost.
>

Then implement it the right way? The client sends some kind of
sequence number with its request.
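The sequence-number handshake suggested here can be sketched as broker-side state: the broker numbers every message it queues for a client, the client reports the last number it has handled when asking to exit, and the broker says "yes" only if nothing newer has been queued, marking the client as gone under the same lock. A minimal Python sketch, assuming a single client and an in-process broker; `IdleExitBroker` and its methods are hypothetical names, not dbus-daemon or kdbus APIs.

```python
import threading
from collections import deque


class IdleExitBroker:
    """Hypothetical userspace broker state for ONE activatable client."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_seq = 0        # highest sequence number assigned so far
        self._pending = deque()   # (seq, payload) not yet fetched by the client
        self.client_running = True

    def send(self, payload):
        """Queue a message for the client and return its sequence number."""
        with self._lock:
            self._last_seq += 1
            self._pending.append((self._last_seq, payload))
            # If the client has already exited, a real broker would start a new
            # instance here; the message waits in the queue until it connects.
            return self._last_seq

    def fetch(self):
        """Client side: take the next (seq, payload), or None if nothing is queued."""
        with self._lock:
            return self._pending.popleft() if self._pending else None

    def request_exit(self, last_handled_seq):
        """Client asks "may I exit?", reporting the last sequence number it handled.

        The check and the state change happen under one lock, so no message can
        be queued between "nothing new for you" and "client is gone".
        """
        with self._lock:
            if last_handled_seq != self._last_seq:
                return False  # something was queued after what the client has seen
            self.client_running = False
            return True
```

Messages sent after the exit is granted stay queued for the next instance instead of being lost. The sketch does not address a peer that keeps sending and so keeps the client from ever exiting; a real design would need a bound on that.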

--Andy

2015-04-29 20:16:29

by David Lang

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 29 Apr 2015, Andy Lutomirski wrote:

> On Wed, Apr 29, 2015 at 12:30 PM, Austin S Hemmelgarn
> <[email protected]> wrote:
>> On 2015-04-29 14:54, Andy Lutomirski wrote:
>>>
>>> On Apr 29, 2015 5:48 AM, "Harald Hoyer" <[email protected]> wrote:
>>>>
>>>>
>>>> * Being in the kernel closes a lot of races which can't be fixed with
>>>> the current userspace solutions. For example, with kdbus, there is a
>>>> way a client can disconnect from a bus, but do so only if no further
>>>> messages are present in its queue, which is crucial for implementing
>>>> race-free "exit-on-idle" services
>>>
>>>
>>> This can be implemented in userspace.
>>>
>>> Client to dbus daemon: may I exit now?
>>> Dbus daemon to client: yes (and no more messages) or no
>>>
>> Depending on how this is implemented, there would be a potential issue if a
>> message arrived for the client after the daemon told it it could exit, but
>> before it finished shutdown, in which case the message might get lost.
>>
>
> Then implement it the right way? The client sends some kind of
> sequence number with its request.

so any app in the system can prevent any other app from exiting/restarting by
just sending it the equivalent of a ping over dbus?

preventing an app from exiting because there are unhandled messages doesn't mean
that those messages are going to be handled, just that they will get read and
dropped on the floor by an app trying to exit. Sometimes you will just end up
with a hung app that can't process messages and needs to be restarted, but can't
be restarted because there are pending messages.

The problem with "guaranteed delivery" messages is that things _will_ go wrong
that will cause the messages to not be received and processed. At that point you
have the choice of losing some messages or freezing your entire system (you can
buffer them for some time, but eventually you will run out of buffer space).

We see this all the time in the logging world, people configure their systems
for reliable delivery of log messages to a remote machine, then when that remote
machine goes down and can't receive messages (or a network issue blocks the
traffic), the sending machine blocks and causes an outage.

Being too strict about guaranteeing delivery just doesn't work. You must have a
mechanism to abort and throw away unprocessed messages. If this means
disconnecting the receiver so that there are no missing messages to the
receiver, that's a valid choice. But preventing a receiver from exiting because
it hasn't processed a message is not a valid choice.
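The alternative described here, disconnecting a receiver that falls too far behind rather than blocking the sender, can be sketched as a per-receiver queue whose delivery never blocks. A minimal Python sketch; `BoundedMailbox` and the capacity value are hypothetical, and the drop-and-disconnect policy is just one of the valid choices the paragraph mentions.

```python
from collections import deque


class BoundedMailbox:
    """Hypothetical per-receiver queue whose deliver() never blocks the sender."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.connected = True
        self.dropped = 0          # messages thrown away, for diagnostics

    def deliver(self, msg):
        """Try to queue msg; return True if it was queued, False if dropped."""
        if not self.connected:
            self.dropped += 1
            return False
        if len(self.queue) >= self.capacity:
            # The receiver has fallen too far behind: instead of blocking the
            # sender, discard the backlog and disconnect the receiver, so it
            # cannot silently miss messages while still appearing connected.
            self.dropped += len(self.queue) + 1
            self.queue.clear()
            self.connected = False
            return False
        self.queue.append(msg)
        return True

    def receive(self):
        """Receiver side: next message, or None if nothing is queued."""
        return self.queue.popleft() if self.queue else None
```

A disconnected receiver has to reconnect and resynchronize its state, which is visible to it, unlike a silent gap in the message stream.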

David Lang

2015-04-29 20:25:07

by Andy Lutomirski

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 29, 2015 at 1:15 PM, David Lang <[email protected]> wrote:
> On Wed, 29 Apr 2015, Andy Lutomirski wrote:
>
>> On Wed, Apr 29, 2015 at 12:30 PM, Austin S Hemmelgarn
>> <[email protected]> wrote:
>>>
>>> On 2015-04-29 14:54, Andy Lutomirski wrote:
>>>>
>>>>
>>>> On Apr 29, 2015 5:48 AM, "Harald Hoyer" <[email protected]> wrote:
>>>>>
>>>>>
>>>>>
>>>>> * Being in the kernel closes a lot of races which can't be fixed with
>>>>> the current userspace solutions. For example, with kdbus, there is a
>>>>> way a client can disconnect from a bus, but do so only if no further
>>>>> messages are present in its queue, which is crucial for implementing
>>>>> race-free "exit-on-idle" services
>>>>
>>>>
>>>>
>>>> This can be implemented in userspace.
>>>>
>>>> Client to dbus daemon: may I exit now?
>>>> Dbus daemon to client: yes (and no more messages) or no
>>>>
>>> Depending on how this is implemented, there would be a potential issue if a
>>> message arrived for the client after the daemon told it it could exit, but
>>> before it finished shutdown, in which case the message might get lost.
>>>
>>
>> Then implement it the right way? The client sends some kind of
>> sequence number with its request.
>
>
> so any app in the system can prevent any other app from exiting/restarting
> by just sending it the equivalent of a ping over dbus?
>
> preventing an app from exiting because there are unhandled messages doesn't
> mean that those messages are going to be handled, just that they will get
> read and dropped on the floor by an app trying to exit. Sometimes you will
> just end up with a hung app that can't process messages and needs to be
> restarted, but can't be restarted because there are pending messages.

I think this consideration is more or less the same whether it's
handled in the kernel or in userspace, though.

--Andy

2015-04-29 20:44:00

by David Lang

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 29 Apr 2015, Andy Lutomirski wrote:

> On Wed, Apr 29, 2015 at 1:15 PM, David Lang <[email protected]> wrote:
>> On Wed, 29 Apr 2015, Andy Lutomirski wrote:
>>
>>> On Wed, Apr 29, 2015 at 12:30 PM, Austin S Hemmelgarn
>>> <[email protected]> wrote:
>>>>
>>>> On 2015-04-29 14:54, Andy Lutomirski wrote:
>>>>>
>>>>>
>>>>> On Apr 29, 2015 5:48 AM, "Harald Hoyer" <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> * Being in the kernel closes a lot of races which can't be fixed with
>>>>>> the current userspace solutions. For example, with kdbus, there is a
>>>>>> way a client can disconnect from a bus, but do so only if no further
>>>>>> messages are present in its queue, which is crucial for implementing
>>>>>> race-free "exit-on-idle" services
>>>>>
>>>>>
>>>>>
>>>>> This can be implemented in userspace.
>>>>>
>>>>> Client to dbus daemon: may I exit now?
>>>>> Dbus daemon to client: yes (and no more messages) or no
>>>>>
>>>> Depending on how this is implemented, there would be a potential issue if a
>>>> message arrived for the client after the daemon told it it could exit, but
>>>> before it finished shutdown, in which case the message might get lost.
>>>>
>>>
>>> Then implement it the right way? The client sends some kind of
>>> sequence number with its request.
>>
>>
>> so any app in the system can prevent any other app from exiting/restarting
>> by just sending it the equivalent of a ping over dbus?
>>
>> preventing an app from exiting because there are unhandled messages doesn't
>> mean that those messages are going to be handled, just that they will get
>> read and dropped on the floor by an app trying to exit. Sometimes you will
>> just end up with a hung app that can't process messages and needs to be
>> restarted, but can't be restarted because there are pending messages.
>
> I think this consideration is more or less the same whether it's
> handled in the kernel or in userspace, though.

If the justification for why this needs to be in the kernel is that you can't
reliably prevent apps from exiting if there are pending messages, then the
answer of "preventing apps from exiting if there are pending messages isn't a
sane thing to try and do" is a direct counter to that justification for
including it in the kernel.

David Lang

2015-04-29 20:51:45

by David Herrmann

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Hi

On Wed, Apr 29, 2015 at 10:43 PM, David Lang <[email protected]> wrote:
> If the justification for why this needs to be in the kernel is that you
> can't reliably prevent apps from exiting if there are pending messages, [...]

It's not.

> the answer of "preventing apps from exiting if there are pending messages
> isn't a sane thing to try and do" is a direct counter to that justification
> for including it in the kernel.

It's optionally used for reliable exit-on-idle.

Thanks
David

2015-04-29 21:16:56

by Paul Moore

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Fri, Apr 24, 2015 at 4:08 AM, Karol Lewandowski
<[email protected]> wrote:
> On Thu, Apr 23, 2015 at 09:30:13PM +0200, Greg Kroah-Hartman wrote:
>> On Thu, Apr 23, 2015 at 01:42:25PM -0400, Stephen Smalley wrote:
>> > On 04/23/2015 01:16 PM, Greg Kroah-Hartman wrote:
>> > > The binder developers at Samsung have stated that the implementation we
>> > > have here works for their model as well, so I guess that is some kind of
>> > > verification it's not entirely tied to D-Bus. They have plans on
>> > > dropping the existing binder kernel code and using the kdbus code
>> > > instead when it is merged.
>> >
>> > Where do things stand wrt LSM hooks for kdbus? I don't see any security
>> > hook calls in the kdbus tree except for the purpose of metadata
>> > collection of process security labels. But nothing for enforcing MAC
>> > over kdbus IPC. binder has a set of security hooks for that purpose, so
>> > it would be a regression wrt MAC enforcement to switch from binder to
>> > kdbus without equivalent checking there.
>>
>> There was a set of LSM hooks proposed for kdbus posted by Karol
>> Lewandowski last October, and it also included SELinux and Smack patches.
>> They were going to be refreshed based on the latest code changes, but I
>> haven't seen them posted, or I can't seem to find them in my limited
>> email archive.
>
> We have been waiting for right moment with these. :-)
>
>> Karol, what's the status of them?
>
> I have handed the patchset over to Paul Osmialowski, who started reworking it for v4
> relatively recently. I think it shouldn't be that hard to post an updated version...
>
> Paul?

Different Paul here, but very interested in the LSM and SELinux hooks
for obvious reasons; at a bare minimum please CC the LSM list on the
kdbus hooks, and preferably the SELinux list as well. The initial
SELinux hooks I threw together were just a rough first pass, we (the
LSM and SELinux folks) need to have a better discussion about how to
provide the necessary access controls for kdbus ... preferably before
it finds its way into a released kernel.

--
paul moore
http://www.paul-moore.com

2015-04-29 22:35:01

by John Stoffel

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

>>>>> "Austin" == Austin S Hemmelgarn <[email protected]> writes:

Austin> On 2015-04-29 14:54, Andy Lutomirski wrote:
>> On Apr 29, 2015 5:48 AM, "Harald Hoyer" <[email protected]> wrote:
>>>
>>> * Being in the kernel closes a lot of races which can't be fixed with
>>> the current userspace solutions. For example, with kdbus, there is a
>>> way a client can disconnect from a bus, but do so only if no further
>>> messages are present in its queue, which is crucial for implementing
>>> race-free "exit-on-idle" services
>>
>> This can be implemented in userspace.
>>
>> Client to dbus daemon: may I exit now?
>> Dbus daemon to client: yes (and no more messages) or no
>>

Austin> Depending on how this is implemented, there would be a
Austin> potential issue if a message arrived for the client after the
Austin> daemon told it it could exit, but before it finished shutdown,
Austin> in which case the message might get lost.

What makes anyone think they can guarantee that a message is even
received? I could see the daemon sending the message and the client
getting a segfault and dumping core. What then? How would kdbus
solve this type of "race" anyway?

Can anyone give a concrete example of one of the races that are closed
here? That's been one of the missing examples. And remember, there's
no perfection. Even in the kernel we just had a discussion about
missed/missing IPIs and lost processor interrupts, etc. Expecting
perfection is just asking for trouble.

That's why there are timeouts, retries and just giving up and throwing
an exception.
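The "timeouts, retries and giving up" pattern can be sketched over any connected socket. A minimal Python sketch; `request_with_retries` and `DeliveryFailed` are hypothetical names. Resending after a timeout gives at-least-once delivery, so the receiver may see the same request more than once.

```python
import socket


class DeliveryFailed(Exception):
    """Hypothetical error raised when a request gets no reply."""


def request_with_retries(sock, request, attempts=3, timeout=1.0):
    """Send request and wait for a reply, resending after each timeout.

    Gives up with DeliveryFailed after `attempts` timeouts or if the peer
    closes the connection. Resending means the peer may see duplicates.
    """
    sock.settimeout(timeout)
    for _ in range(attempts):
        sock.sendall(request)
        try:
            reply = sock.recv(4096)
        except socket.timeout:
            continue  # no answer within `timeout` seconds; try again
        if not reply:
            raise DeliveryFailed("peer closed the connection")
        return reply
    raise DeliveryFailed(f"no reply after {attempts} attempt(s)")
```

The caller decides what "giving up" means (log, report an error, restart the peer); the sketch only guarantees that the caller is never blocked forever.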

John

2015-04-29 22:50:00

by Theodore Ts'o

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, Apr 29, 2015 at 12:26:59PM -0400, John Stoffel wrote:
> If your customers want this feature, you're more than welcome to fork
> the kernel and support it yourself. Oh wait... Redhat does that
> already. So what's the problem? Just put it into RHEL (which I use
> I admit, along with Debian/Mint) and be done with it.

Harald,

If you make the RHEL initramfs harder to debug in the field, I will
await the time when some Red Hat field engineers will need to do the
same sort of thing I have had to do in the field, and be amused when
they want to shake you very warmly by the throat. :-)

Seriously, keep things as simple as possible in the initramfs; don't
use complicated bus protocols; that way lies madness. Enterprise
systems aren't constantly booting (or they shouldn't be, if your
kernels are sufficiently reliable :-), so trying to optimize for an
extra 2 or 3 seconds worth of boot time really, REALLY isn't worth it.

- Ted

2015-04-30 00:06:08

by David Lang

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wed, 29 Apr 2015, Theodore Ts'o wrote:

> On Wed, Apr 29, 2015 at 12:26:59PM -0400, John Stoffel wrote:
>> If your customers want this feature, you're more than welcome to fork
>> the kernel and support it yourself. Oh wait... Redhat does that
>> already. So what's the problem? Just put it into RHEL (which I use
>> I admit, along with Debian/Mint) and be done with it.
>
> Harald,
>
> If you make the RHEL initramfs harder to debug in the field, I will
> await the time when some Red Hat field engineers will need to do the
> same sort of thing I have had to do in the field, and be amused when
> they want to shake you very warmly by the throat. :-)
>
> Seriously, keep things as simple as possible in the initramfs; don't
> use complicated bus protocols; that way lies madness. Enterprise
> systems aren't constantly booting (or they shouldn't be, if your
> kernels are sufficiently reliable :-), so trying to optimize for an
> extra 2 or 3 seconds worth of boot time really, REALLY isn't worth it.

I've had Enterprise systems where I could hit power on two boxes and finish the
OS install on one before the other had even finished POST and looked for the
boot media. I did this 5 years ago, before the "let's speed up boot" push
started.

Admittedly, this wasn't a stock distro boot/install; it was my own optimized
one, but it also wasn't as optimized and automated as it could have been
(there were several points where the installer needed to pick items from a
menu and enter values).

David Lang

2015-04-30 00:15:22

by Dave Airlie

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 30 April 2015 at 10:05, David Lang <[email protected]> wrote:
> On Wed, 29 Apr 2015, Theodore Ts'o wrote:
>
>> On Wed, Apr 29, 2015 at 12:26:59PM -0400, John Stoffel wrote:
>>>
>>> If your customers want this feature, you're more than welcome to fork
>>> the kernel and support it yourself. Oh wait... Redhat does that
>>> already. So what's the problem? Just put it into RHEL (which I use
>>> I admit, along with Debian/Mint) and be done with it.
>>
>>
>> Harald,
>>
>> If you make the RHEL initramfs harder to debug in the field, I will
>> await the time when some Red Hat field engineers will need to do the
>> same sort of thing I have had to do in the field, and be amused when
>> they want to shake you very warmly by the throat. :-)
>>
>> Seriously, keep things as simple as possible in the initramfs; don't
>> use complicated bus protocols; that way lies madness. Enterprise
>> systems aren't constantly booting (or they shouldn't be, if your
>> kernels are sufficiently reliable :-), so trying to optimize for an
>> extra 2 or 3 seconds worth of boot time really, REALLY isn't worth it.
>
>
> I've had Enterprise systems where I could hit power on two boxes, and finish
> the OS install on one before the other has even finished POST and look for
> the boot media. I did this 5 years ago, before the "let's speed up boot"
> push started.
>
> Admittedly, this wasn't a stock distro boot/install, it was my own optimized
> one, but it also wasn't as optimized and automated as it could have been
> (several points where the installer needed to pick items from a menu and
> enter values)
>

You guys might have missed this new industry trend, I think they call
it virtualisation,

I hear it's going to be big, you might want to look into it.

Dave.

2015-04-30 00:18:39

by David Lang

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Thu, 30 Apr 2015, Dave Airlie wrote:

> On 30 April 2015 at 10:05, David Lang <[email protected]> wrote:
>> On Wed, 29 Apr 2015, Theodore Ts'o wrote:
>>
>>> On Wed, Apr 29, 2015 at 12:26:59PM -0400, John Stoffel wrote:
>>>>
>>>> If your customers want this feature, you're more than welcome to fork
>>>> the kernel and support it yourself. Oh wait... Redhat does that
>>>> already. So what's the problem? Just put it into RHEL (which I use
>>>> I admit, along with Debian/Mint) and be done with it.
>>>
>>>
>>> Harald,
>>>
>>> If you make the RHEL initramfs harder to debug in the field, I will
>>> await the time when some Red Hat field engineers will need to do the
>>> same sort of thing I have had to do in the field, and be amused when
>>> they want to shake you very warmly by the throat. :-)
>>>
>>> Seriously, keep things as simple as possible in the initramfs; don't
>>> use complicated bus protocols; that way lies madness. Enterprise
>>> systems aren't constantly booting (or they shouldn't be, if your
>>> kernels are sufficiently reliable :-), so trying to optimize for an
>>> extra 2 or 3 seconds worth of boot time really, REALLY isn't worth it.
>>
>>
>> I've had Enterprise systems where I could hit power on two boxes, and finish
>> the OS install on one before the other has even finished POST and look for
>> the boot media. I did this 5 years ago, before the "let's speed up boot"
>> push started.
>>
>> Admittedly, this wasn't a stock distro boot/install, it was my own optimized
>> one, but it also wasn't as optimized and automated as it could have been
>> (several points where the installer needed to pick items from a menu and
>> enter values)
>>
>
> You guys might have missed this new industry trend, I think they call
> it virtualisation,
>
> I hear it's going to be big, you might want to look into it.

So what do you run your virtual machines on? You still have to put an OS on the
hardware to support your VMs. Virtualization doesn't eliminate servers (as much
as some cloud advocates like to claim it does).

And virtualization has overhead, sometimes very significant overhead, so it's
not always the right answer.

David Lang

2015-04-30 01:20:36

by Dave Airlie

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

>>>
>>> I've had Enterprise systems where I could hit power on two boxes, and
>>> finish the OS install on one before the other has even finished POST
>>> and look for the boot media. I did this 5 years ago, before the "let's
>>> speed up boot" push started.
>>>
>>> Admittedly, this wasn't a stock distro boot/install, it was my own
>>> optimized one, but it also wasn't as optimized and automated as it could
>>> have been (several points where the installer needed to pick items from
>>> a menu and enter values)
>>>
>>
>> You guys might have missed this new industry trend, I think they call
>> it virtualisation,
>>
>> I hear it's going to be big, you might want to look into it.
>
>
> So what do you run your virtual machines on? you still have to put an OS on
> the hardware to support your VMs. Virtualization doesn't eliminate servers
> (as much as some cloud advocates like to claim it does)
>
> And virtualization has overhead, sometimes very significant overhead, so
> it's not always the right answer.
>

Thanks for proving my point: RHEL as a distro runs in both scenarios, and
optimising one is important at the moment; the fact that it might speed up
boot on servers that take 15 minutes to POST is a side effect.

For some reason people seem to think their one use case is all that matters.

Dave.

2015-04-30 01:43:30

by John Stoffel

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

>>>>> "David" == David Herrmann <[email protected]> writes:

David> Hi
David> On Wed, Apr 29, 2015 at 10:43 PM, David Lang <[email protected]> wrote:
>> If the justification for why this needs to be in the kernel is that you
>> can't reliably prevent apps from exiting if there are pending messages, [...]

David> It's not.

>> the answer of "preventing apps from exiting if there are pending messages
>> isn't a sane thing to try and do" is a direct counter to that justification
>> for including it in the kernel.

David> It's optionally used for reliable exit-on-idle.

Then why is there a critical race that must be solved in the kernel if
it's optional? And can you please describe in more detail what this
'exit-on-idle' thing is and how it works and why you would use it?

2015-04-30 09:05:36

by Łukasz Stelmach

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

It was <2015-04-29 śro 17:21>, when Austin S Hemmelgarn wrote:
> On 2015-04-29 11:03, Theodore Ts'o wrote:
>> On Wed, Apr 29, 2015 at 04:53:53PM +0200, Harald Hoyer wrote:
>>> Sure, I can write one binary to rule them all, pull out all the code from all
>>> tools I need, but for me an IPC mechanism sounds a lot better. And it should be
>>> _one_ common IPC mechanism and not a plethora of them. It should feel like an
>>> operating system and not like a bunch of thrown together software, which is
>>> glued together with some magic shell scripts.
>>
>> And so requiring wireshark (and X?) in initramfs to debug problems
>> once dbus is introduced is better?
>>
>> I would think shell scripts are *easier* to debug when things go
>> wrong,
[...]
> I keep hearing from people that shell scripting is hard, it really
> isn't compared to a number of other scripting languages, you just need
> to actually learn to do it right (which is getting more and more
> difficult these days cause fewer and fewer CS schools are teaching
> Unix).

My 2/100 of a currency of your choice.

As much as I like(ed) shell scripts as a boot-up tool and disliked the
obscure boot-up procedures of some operating systems, I can't help but
notice that GNU/Linux distributions have become very
sophisticated/complicated (cross out if not applicable). Personally,
I feel that this degree of complexity can't be supported by shell scripts
piping data around. It does not scale. I am not 100% sure a new IPC is
the answer, simply because I do not have the experience to be. It
definitely can be, and the problem, as I see it, is real.

(The alternative answer is PowerShell's capability to pipe objects. I
don't like it and I think it's not a full answer.)

Regardless of initrd issues, I feel there is a need for a local IPC that
is more capable than UDS. Linus Torvalds is probably right that
dbus-daemon is anything but efficient. I disagree, however, that it
can be optimised and therefore solve *all* issues kdbus is trying to
address. dbus-daemon, by design, can't do some things. It can't transmit
large payloads without copying them. It can't be made race-free.

Kind regards,
--
Łukasz Stelmach
Samsung R&D Institute Poland
Samsung Electronics


Attachments:
signature.asc (472.00 B)

2015-04-30 09:12:55

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Am 30.04.2015 um 11:05 schrieb Łukasz Stelmach:
> Regardless of initrd issues, I feel there is a need for a local IPC that
> is more capable than UDS. Linus Torvalds is probably right that
> dbus-daemon is anything but efficient. I disagree, however, that it
> can be optimised and therefore solve *all* issues kdbus is trying to
> address. dbus-daemon, by design, can't do some things. It can't transmit
> large payloads without copying them. It can't be made race-free.

This is true.
But as long as dbus-daemon is not optimized as much as possible, there is no
reason to force-push kdbus.
As soon as dbus-daemon exploits all kernel interfaces as much as it can and
still needs work (be it performance or other stuff), we can think of new kernel
features that can help dbus-daemon.

Thanks,
//richard


Attachments:
signature.asc (819.00 B)
OpenPGP digital signature

2015-04-30 10:19:18

by Łukasz Stelmach

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

It was <2015-04-30 czw 11:12>, when Richard Weinberger wrote:
> Am 30.04.2015 um 11:05 schrieb Łukasz Stelmach:
>> Regardless of initrd issues, I feel there is a need for a local IPC
>> that is more capable than UDS. Linus Torvalds is probably right that
>> dbus-daemon is anything but efficient. I disagree, however, that
>> it can be optimised and therefore solve *all* issues kdbus is trying
>> to address. dbus-daemon, by design, can't do some things. It can't
>> transmit large payloads without copying them. It can't be made
>> race-free.
>
> This is true.
> But as long as dbus-daemon is not optimized as much as possible, there
> is no reason to force-push kdbus.
> As soon as dbus-daemon exploits all kernel interfaces as much as it can
> and still needs work (be it performance or other stuff), we can think
> of new kernel features that can help dbus-daemon.

I may not be well informed about kernel interfaces, but there are some
use cases no dbus-daemon optimisation can make work properly because of
race conditions introduced by the userspace message router.

For example, a service can't acquire the credentials of the client process that
actually sent a request (it can, but it can't trust them). The service
can't be protected by LSM on a bus that is driven by dbus-daemon. Yes,
dbus-daemon can check the client's and service's labels and enforce a
policy, but it is going to be the daemon and not the LSM code in the
kernel.
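On Linux, the kernel-attested identity of a directly connected peer is available on a UNIX socket through the `SO_PEERCRED` socket option. A minimal Python sketch (Linux-only; `peer_credentials` is a hypothetical helper name). When dbus-daemon sits in the middle, the service's direct peer is the daemon, so this call reports the daemon's credentials; the client's identity reaches the service only as information reported by the daemon.

```python
import os
import socket
import struct

# struct ucred { pid_t pid; uid_t uid; gid_t gid; } on Linux: three 32-bit ints.
_UCRED = struct.Struct("3i")


def peer_credentials(sock):
    """Return (pid, uid, gid) of the peer of a connected AF_UNIX socket.

    The kernel records these when the connection is made (connect() or
    socketpair()), so they describe whoever is directly on the other end,
    not whoever originally wrote a relayed message.
    """
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, _UCRED.size)
    return _UCRED.unpack(raw)


if __name__ == "__main__":
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid, uid, gid = peer_credentials(b)
    print(pid == os.getpid(), uid, gid)
```

The credentials are captured at connect time, not per message, which is one reason identity collected this way can be stale by the time a given request is processed.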

--
Łukasz Stelmach
Samsung R&D Institute Poland
Samsung Electronics


Attachments:
signature.asc (472.00 B)

2015-04-30 10:40:12

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Am 30.04.2015 um 12:19 schrieb Łukasz Stelmach:
> It was <2015-04-30 czw 11:12>, when Richard Weinberger wrote:
>> Am 30.04.2015 um 11:05 schrieb Łukasz Stelmach:
>>> Regardless of initrd issues, I feel there is a need for a local IPC
>>> that is more capable than UDS. Linus Torvalds is probably right that
>>> dbus-daemon is anything but efficient. I disagree, however, that
>>> it can be optimised and therefore solve *all* issues kdbus is trying
>>> to address. dbus-daemon, by design, can't do some things. It can't
>>> transmit large payloads without copying them. It can't be made
>>> race-free.
>>
>> This is true.
>> But as long as dbus-daemon is not optimized as much as possible, there
>> is no reason to force-push kdbus.
>> As soon as dbus-daemon exploits all kernel interfaces as much as it can
>> and still needs work (be it performance or other stuff), we can think
>> of new kernel features that can help dbus-daemon.
>
> I may not be well informed about kernel interfaces, but there are some
> use cases no dbus-daemon optimisation can make work properly because of
> race conditions introduced by the userspace message router.
>
> For example, a service can't acquire the credentials of the client process that
> actually sent a request (it can, but it can't trust them). The service
> can't be protected by LSM on a bus that is driven by dbus-daemon. Yes,
> dbus-daemon can check the client's and service's labels and enforce a
> policy, but it is going to be the daemon and not the LSM code in the
> kernel.

That's why I said we can think of new kernel features if they are needed.
But they current sink or swim approach of kdbus folks is also not the solution.
As I said, if dbus-daemon utilizes the kernel interface as much as possible we
can think of new features.

Thanks,
//richard



2015-04-30 12:16:32

by Łukasz Stelmach

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

It was <2015-04-30 czw 12:40>, when Richard Weinberger wrote:
> Am 30.04.2015 um 12:19 schrieb Łukasz Stelmach:
>> It was <2015-04-30 czw 11:12>, when Richard Weinberger wrote:
>>> Am 30.04.2015 um 11:05 schrieb Łukasz Stelmach:
>>>> Regardless of initrd issues, I feel there is a need for a local IPC
>>>> that is more capable than UDS. Linus Torvalds is probably right that
>>>> dbus-daemon is anything but efficient. I disagree, however, that
>>>> it can be optimised and therefore solve *all* issues kdbus is trying
>>>> to address. dbus-daemon, by design, can't do some things. It can't
>>>> transmit large payloads without copying them. It can't be made
>>>> race-free.
>>>
>>> This is true.
>>> But as long as dbus-daemon is not optimized as much as possible, there is
>>> no reason to force-push kdbus.
>>> As soon as dbus-daemon exploits all kernel interfaces as much as it can and
>>> still needs work (be it performance or other stuff), we can think
>>> of new kernel features which can help dbus-daemon.
>>
>> I may not be well informed about kernel interfaces, but there are
>> some use cases no dbus-daemon optimisation can make work properly
>> because of race conditions introduced by the user-space based message
>> router.
>>
>> For example, a service can't acquire the credentials of the client process that
>> actually sent a request (it can, but it can't trust them). The service
>> can't be protected by LSM on a bus that is driven by dbus-daemon. Yes,
>> dbus-daemon can check the client's and service's labels and enforce a
>> policy, but it is going to be the daemon and not the LSM code in the
>> kernel.
>
> That's why I said we can think of new kernel features if they are
> needed. But the current sink-or-swim approach of the kdbus folks is also
> not the solution. As I said, if dbus-daemon utilizes the kernel
> interface as much as possible, we can think of new features.

What kernel interfaces would you suggest using to solve the issues
I mentioned in the second paragraph: race conditions, LSM support (for
example)?

BTW, does anyone know how microkernel-based OSes implement sockets?
--
Łukasz Stelmach
Samsung R&D Institute Poland
Samsung Electronics



2015-04-30 12:23:31

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Am 30.04.2015 um 14:16 schrieb Łukasz Stelmach:
> It was <2015-04-30 czw 12:40>, when Richard Weinberger wrote:
>> Am 30.04.2015 um 12:19 schrieb Łukasz Stelmach:
>>> It was <2015-04-30 czw 11:12>, when Richard Weinberger wrote:
>>>> Am 30.04.2015 um 11:05 schrieb Łukasz Stelmach:
>>>>> Regardless of initrd issues, I feel there is a need for a local IPC
>>>>> that is more capable than UDS. Linus Torvalds is probably right that
>>>>> dbus-daemon is anything but efficient. I disagree, however, that
>>>>> it can be optimised and therefore solve *all* issues kdbus is trying
>>>>> to address. dbus-daemon, by design, can't do some things. It can't
>>>>> transmit large payloads without copying them. It can't be made
>>>>> race-free.
>>>>
>>>> This is true.
>>>> But as long as dbus-daemon is not optimized as much as possible, there is
>>>> no reason to force-push kdbus.
>>>> As soon as dbus-daemon exploits all kernel interfaces as much as it can and
>>>> still needs work (be it performance or other stuff), we can think
>>>> of new kernel features which can help dbus-daemon.
>>>
>>> I may not be well informed about kernel interfaces, but there are
>>> some use cases no dbus-daemon optimisation can make work properly
>>> because of race conditions introduced by the user-space based message
>>> router.
>>>
>>> For example, a service can't acquire the credentials of the client process that
>>> actually sent a request (it can, but it can't trust them). The service
>>> can't be protected by LSM on a bus that is driven by dbus-daemon. Yes,
>>> dbus-daemon can check the client's and service's labels and enforce a
>>> policy, but it is going to be the daemon and not the LSM code in the
>>> kernel.
>>
>> That's why I said we can think of new kernel features if they are
>> needed. But the current sink-or-swim approach of the kdbus folks is also
>> not the solution. As I said, if dbus-daemon utilizes the kernel
>> interface as much as possible, we can think of new features.
>
> What kernel interfaces would you suggest using to solve the issues
> I mentioned in the second paragraph: race conditions, LSM support (for
> example)?

The question is whether it makes sense to collect this kind of metadata.
I really like Andy and Alan's idea to improve AF_UNIX or revive AF_BUS.

Thanks,
//richard



2015-04-30 12:40:48

by Łukasz Stelmach

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

It was <2015-04-30 czw 14:23>, when Richard Weinberger wrote:
> Am 30.04.2015 um 14:16 schrieb Łukasz Stelmach:
>> It was <2015-04-30 czw 12:40>, when Richard Weinberger wrote:
>>> Am 30.04.2015 um 12:19 schrieb Łukasz Stelmach:
>>>> It was <2015-04-30 czw 11:12>, when Richard Weinberger wrote:
>>>>> Am 30.04.2015 um 11:05 schrieb Łukasz Stelmach:
>>>>>> Regardless of initrd issues, I feel there is a need for a local IPC
>>>>>> that is more capable than UDS.
[...]
>>>> For example, a service can't acquire the credentials of the client process that
>>>> actually sent a request (it can, but it can't trust them). The service
>>>> can't be protected by LSM on a bus that is driven by dbus-daemon. Yes,
>>>> dbus-daemon can check the client's and service's labels and enforce a
>>>> policy, but it is going to be the daemon and not the LSM code in the
>>>> kernel.
>>>
>>> That's why I said we can think of new kernel features if they are
>>> needed. But the current sink-or-swim approach of the kdbus folks is also
>>> not the solution. As I said, if dbus-daemon utilizes the kernel
>>> interface as much as possible, we can think of new features.
>>
>> What kernel interfaces would you suggest using to solve the issues
>> I mentioned in the second paragraph: race conditions, LSM support (for
>> example)?
>
> The question is whether it makes sense to collect this kind of metadata.
> I really like Andy and Alan's idea to improve AF_UNIX or revive AF_BUS.

Race conditions have nothing to do with metadata. Neither does LSM
support.

AF_UNIX with multicast support wouldn't be AF_UNIX anymore.

AF_BUS? I haven't followed the discussion back then. Why do you think it
is better than kdbus?

--
Łukasz Stelmach
Samsung R&D Institute Poland
Samsung Electronics



2015-04-30 12:45:08

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Am 30.04.2015 um 14:40 schrieb Łukasz Stelmach:
> It was <2015-04-30 czw 14:23>, when Richard Weinberger wrote:
>> Am 30.04.2015 um 14:16 schrieb Łukasz Stelmach:
>>> It was <2015-04-30 czw 12:40>, when Richard Weinberger wrote:
>>>> Am 30.04.2015 um 12:19 schrieb Łukasz Stelmach:
>>>>> It was <2015-04-30 czw 11:12>, when Richard Weinberger wrote:
>>>>>> Am 30.04.2015 um 11:05 schrieb Łukasz Stelmach:
>>>>>>> Regardless of initrd issues, I feel there is a need for a local IPC
>>>>>>> that is more capable than UDS.
> [...]
>>>>> For example, a service can't acquire the credentials of the client process that
>>>>> actually sent a request (it can, but it can't trust them). The service
>>>>> can't be protected by LSM on a bus that is driven by dbus-daemon. Yes,
>>>>> dbus-daemon can check the client's and service's labels and enforce a
>>>>> policy, but it is going to be the daemon and not the LSM code in the
>>>>> kernel.
>>>>
>>>> That's why I said we can think of new kernel features if they are
>>>> needed. But the current sink-or-swim approach of the kdbus folks is also
>>>> not the solution. As I said, if dbus-daemon utilizes the kernel
>>>> interface as much as possible, we can think of new features.
>>>
>>> What kernel interfaces would you suggest using to solve the issues
>>> I mentioned in the second paragraph: race conditions, LSM support (for
>>> example)?
>>
>> The question is whether it makes sense to collect this kind of metadata.
>> I really like Andy and Alan's idea to improve AF_UNIX or revive AF_BUS.
>
> Race conditions have nothing to do with metadata. Neither does LSM
> support.

Sorry, I thought you meant the races while collecting metadata in userspace...

> AF_UNIX with multicast support wouldn't be AF_UNIX anymore.
>
> AF_BUS? I haven't followed the discussion back then. Why do you think it
> is better than kdbus?

Please see https://lwn.net/Articles/641278/

Thanks,
//richard



2015-04-30 14:52:44

by Łukasz Stelmach

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

It was <2015-04-30 czw 14:45>, when Richard Weinberger wrote:
> Am 30.04.2015 um 14:40 schrieb Łukasz Stelmach:
>> It was <2015-04-30 czw 14:23>, when Richard Weinberger wrote:
>>> Am 30.04.2015 um 14:16 schrieb Łukasz Stelmach:
>>>> It was <2015-04-30 czw 12:40>, when Richard Weinberger wrote:
>>>>> Am 30.04.2015 um 12:19 schrieb Łukasz Stelmach:
>>>>>> It was <2015-04-30 czw 11:12>, when Richard Weinberger wrote:
>>>>>>> Am 30.04.2015 um 11:05 schrieb Łukasz Stelmach:
>>>>>>>> Regardless of initrd issues, I feel there is a need for a local IPC
>>>>>>>> that is more capable than UDS.
>> [...]
>>>>>> For example, a service can't acquire the credentials of the client process that
>>>>>> actually sent a request (it can, but it can't trust them). The service
>>>>>> can't be protected by LSM on a bus that is driven by dbus-daemon. Yes,
>>>>>> dbus-daemon can check the client's and service's labels and enforce a
>>>>>> policy, but it is going to be the daemon and not the LSM code in the
>>>>>> kernel.
>>>>>
>>>>> That's why I said we can think of new kernel features if they are
>>>>> needed. But the current sink-or-swim approach of the kdbus folks is also
>>>>> not the solution. As I said, if dbus-daemon utilizes the kernel
>>>>> interface as much as possible, we can think of new features.
>>>>
>>>> What kernel interfaces would you suggest using to solve the issues
>>>> I mentioned in the second paragraph: race conditions, LSM support (for
>>>> example)?
>>>
>>> The question is whether it makes sense to collect this kind of metadata.
>>> I really like Andy and Alan's idea to improve AF_UNIX or revive AF_BUS.
>>
>> Race conditions have nothing to do with metadata. Neither does LSM
>> support.
>
> Sorry, I thought you meant the races while collecting metadata in userspace...

My bad, some race conditions *are* associated with collecting metadata,
but not all. It is impossible (correct me if I am wrong) to implement
reliable die-on-idle with dbus-daemon.

>> AF_UNIX with multicast support wouldn't be AF_UNIX anymore.
>>
>> AF_BUS? I haven't followed the discussion back then. Why do you think it
>> is better than kdbus?
>
> Please see https://lwn.net/Articles/641278/

Thanks. If I understand correctly, the author suggests using EBPF on the
receiving socket side for receiving multicast messages. This is nice if
you care about not introducing (too) much new code. However,
AFAICT it may be more computationally expensive than Bloom filters, because
you need to run EBPF on every receiving socket instead of getting a list
of a few of them to copy data to. Of course, for a small number of
receivers the "constant" cost of running the Bloom filter may be higher.
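For readers unfamiliar with the Bloom-filter side of this comparison, the scheme works roughly like this. The sender hashes the properties of a signal (interface, member, ...) into a fixed-size bit field that travels with the message. Each receiver's match rule is a bit mask that must be a subset of that field. The sketch below shows the subset test using a 64-bit filter, three hash functions, and FNV-1a double hashing. All of these parameters are illustrative assumptions, not kdbus's actual filter size or hash. As with any Bloom filter, it can report false positives but never false negatives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative parameters only (not kdbus's): 64-bit filter, 3 hashes. */
#define BLOOM_BITS 64u
#define BLOOM_K    3u

/* FNV-1a over a NUL-terminated string, with the basis perturbed by seed. */
static uint64_t fnv1a(const char *s, uint64_t seed)
{
    uint64_t h = 14695981039346656037ULL ^ seed;

    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* Set BLOOM_K bits for item, using double hashing h1 + i * h2. */
static void bloom_add(uint64_t *bloom, const char *item)
{
    uint64_t h1 = fnv1a(item, 0);
    uint64_t h2 = fnv1a(item, 0x9e3779b97f4a7c15ULL) | 1; /* odd step */

    for (uint64_t i = 0; i < BLOOM_K; i++)
        *bloom |= 1ULL << ((h1 + i * h2) % BLOOM_BITS);
}

/*
 * Receiver-side test: deliver if every bit required by the match mask is
 * set in the message's filter. False positives are possible; false
 * negatives are not.
 */
static bool bloom_match(uint64_t msg_filter, uint64_t match_mask)
{
    return (msg_filter & match_mask) == match_mask;
}
```

A receiver builds its match mask with the same `bloom_add()`. For example, a receiver that only cares about one interface adds just that name. Any message whose filter was built from that interface (plus other properties) will always pass `bloom_match()`.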

Kind regards,
--
Łukasz Stelmach
Samsung R&D Institute Poland
Samsung Electronics



2015-04-30 15:05:54

by Richard Weinberger

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

Am 30.04.2015 um 16:52 schrieb Łukasz Stelmach:
>> Sorry, I thought you meant the races while collecting metadata in userspace...
>
> My bad, some race conditions *are* associated with collecting metadata,
> but not all. It is impossible (correct me if I am wrong) to implement
> reliable die-on-idle with dbus-daemon.

IIRC Andy gave some ideas on how to deal with that,
e.g. https://lkml.org/lkml/2015/4/29/622

>>> AF_UNIX with multicast support wouldn't be AF_UNIX anymore.
>>>
>>> AF_BUS? I haven't followed the discussion back then. Why do you think it
>>> is better than kdbus?
>>
>> Please see https://lwn.net/Articles/641278/
>
> Thanks. If I understand correctly, the author suggests using EBPF on the
> receiving socket side for receiving multicast messages. This is nice if
> you care about not introducing (too) much new code. However,
> AFAICT it may be more computationally expensive than Bloom filters, because
> you need to run EBPF on every receiving socket instead of getting a list
> of a few of them to copy data to. Of course, for a small number of
> receivers the "constant" cost of running the Bloom filter may be higher.

To make a long story short, the kdbus *concept* needs much more
thought. There are many ideas out there on how to deal with dbus issues without introducing
an ad-hoc solution. AF_BUS is just one of them.
IMHO AF_BUS would be nice, but the decision is not up to me.

Thanks,
//richard



2015-04-30 20:14:22

by Eric W. Biederman

Subject: Re: [GIT PULL] kdbus for 4.1-rc1



On April 29, 2015 7:47:53 AM CDT, Harald Hoyer <[email protected]> wrote:
>On 29.04.2015 01:12, John Stoffel wrote:
>> LDAP is pretty damn generic, in that you can put pretty large objects
>> into it, and pretty large OUs, etc. So why would it be a candidate for
>> going into the kernel? And why is kdbus so important in the kernel as
>> well? People have talked about it needing to be there for bootup, but
>> isn't that why we ripped out RAID detection and such from the kernel
>> and built initramfs, so that there's LESS in the kernel, and more in an
>> early userspace? Same idea with dbus in my opinion.
>
>Let me elaborate on the initramfs/shutdown situation a little bit more,
>because I have to deal with that every day.
>
>Because of the "let's move everything to userspace" sentiment, we
>nowadays have the situation that we need a lot of tools to set up the
>root device.
>
>Be it LVM on IMSM or iSCSI multipath, the initramfs has to set up the
>network (with bridging, bonding, etc.), the iSCSI connection, assemble
>the RAID, the LVM, open crypto devices, etc...
>And if something goes wrong, you want to have a shell, see all the logs
>and debug things.
>
>Now over time we moved away from simple shell scripts (without any
>logging) and statically compiled special versions for the initramfs to a
>mini distribution in the initramfs, which simplifies maintenance and
>improves reliability.
>
>Basically you want to use the same tools in the initramfs (and shutdown)
>which you already have and use in your real root, with the same
>configuration files and the same interfaces and the same code paths.
>
>Therefore systemd is started in the dracut-created initramfs, which starts
>journald for logging. The same basic systemd targets exist in the
>initramfs as on the real root, so normally you don't have to cope with
>specialized versions for the initramfs.
>
>The target here is to have the same IPC mechanism from the very
>beginning to the very end. No crappy fallback mechanisms in case a daemon
>is not running or has crashed, no creepy transition from initramfs root
>to real root to shutdown root.
>
>We already have such transitions for systemd, journald, mdmon [1], etc.
>systemd has to serialize itself, journald's file descriptors are
>transitioned over, mdmon jumps through hoops. Remember, you want to get
>rid of open files and executables and have to reexec everything when you
>transition from the initramfs root to the real root, and also from the
>real root to the shutdown root.
>
>We really don't want the IPC mechanism to be in a state of flux. All
>tools would have to fall back to a non-standard mechanism in that case.
>
>If I have to pull in a dbus daemon in the initramfs, we still have the
>chicken-and-egg problem of PID 1 talking to the logging daemon and
>starting dbus.
>systemd cannot talk to journald via dbus unless dbus-daemon is started,
>dbus cannot log anything on startup if journald is not running, etc...
>
>dbus-daemon would have to transition to the real root, and from the
>real root to the shutdown root, without losing state.

Which does not sound fundamentally hard.
Unify the roots, and make /run or wherever the dbus socket lives always available. As long as your initramfs has the latest versions of the software, there is no need for any tricky transitions except to upgrade software on a running system.


>Of course this can all be done, but it would involve fallback
>mechanisms, which we want to get rid of.

Only if you design things poorly.

>Hopefully, you don't suggest merging dbus with PID 1. Also, with a
>daemon, you will lose the points mentioned in the cover mail.

I don't see how something that is inappropriate to be in PID 1 is better in PID 0.


>I don't care if the kdbus speedup is only marginal.
>
>In my ideal world, there is a standard IPC mechanism from the beginning
>to the end, which does not rely on any process running (except the
>kernel) and which is used by _all_ tools, be it a system daemon
>providing information and interfaces about device assembly or network
>setup tools or end user desktop processes.

And that is a beautiful dream and an absolutely rubbish way to get there. If the performance is not top-notch, not everything can use your beautiful IPC mechanism. Which means your dream fails.

Good performance is a hard requirement to get where you want to be.

>dbus _is_ such an easy, flexible standard IPC mechanism. Of course, you
>can reinvent the wheel (NIH, "we know better") and wait and see if that
>works out. Until then the whole common IPC problem is unresolved and
>Linux distributions are just a collection of random software with no
>common interoperability and home grown interfaces.


kdbus seems to be the NIH "we know better" approach. Many of its design decisions are ones we have chosen to do differently elsewhere in the kernel because they have caused problems. When these issues have been pointed out in review, people have blown them off, leading to the current mess.

Furthermore, I don't know that I have seen anyone arguing for transporting something other than dbus messages; rather, I have seen people pointing out that there have been many excellent IPC mechanisms that are simpler and faster for the same kind of task, and suggesting that mapping dbus to better kernel primitives might be productive.

But seriously, if you want to have one IPC mechanism to rule them all, you won't succeed in convincing everyone with the currently sloppily designed kdbus code.

Performance matters, simplicity matters, being able to explain design decisions matters.

Eric

2015-05-01 15:49:18

by Austin S Hemmelgarn

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 2015-04-29 08:47, Harald Hoyer wrote:
> Until then the whole common IPC problem is unresolved and Linux
> distributions are just a collection of random software with no common
> interoperability and home grown interfaces.
I don't know how I managed not to notice this comment before, but I find
it particularly hilarious.
The part about 'no common interoperability' is just plain BS. With the
exception of some of the insanity being touted by systemd advocates and
the insanity that is accessibility software on Linux, you can easily
string together pretty much arbitrary sequences of commands using FIFOs
to achieve almost anything; the actual interoperability issues (WRT
the command line at least, which is where all the stuff you are
complaining about works) come up only with stuff (like journald, for
example) that just refuses to use text interfaces on the command line.
Also, the 'home grown interfaces' you are complaining about are used on
every operating system (not just Linux or other Unix progeny) every day,
and no amount of better IPC is going to stop that; furthermore, almost
every current 'standard' protocol or interface used on the internet
started out as a 'home grown interface' (TCP/IP immediately comes to
mind, followed shortly by NFS, SMTP, WebDAV, SSH, XML, JSON, and a whole
slew of others).



2015-06-22 17:33:34

by Jindrich Makovicka

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, 27 Apr 2015 15:14:49 -0700, Linus Torvalds wrote:

> On Mon, Apr 27, 2015 at 3:00 PM, Linus Torvalds
> <[email protected]> wrote:
>>
>> IOW, all the people who say that it's about avoiding context switches
>> are probably just full of shit. It's not about context switches, it's
>> about bad user-level code.
>
> Just to make sure, I did a system-wide profile (so that you can actually
> see the overhead of context switching better), and that didn't change
> the picture.
>
> The scheduler overhead *might* be 1% or so.
>
> So really. The people who talk about how kdbus improves performance are
> just full of sh*t. Yes, it improves things, but the improvement seems to
> be 100% "incidental", in that it avoids a few trips down the user-space
> problems.
>
> The real problems seem to be in dbus memory management (suggestion: keep
> a small per-thread cache of those message allocations) and to a smaller
> degree in the crazy utf8 validation (why the f*ck does it do that
> anyway?), with some locking problems thrown in for good measure.

In case someone actually still reads this, I guess the global rw_lock in
gobject/gtype.c is one of the culprits. Every GType instance
allocation/deallocation is serialized using this lock, which pretty much
disqualifies GObject from being used for anything scalable to multiple
threads.

GStreamer used to have serious performance issues due to that, which
AFAIK have been solved by removing GType from GStreamer core in the 1.0
release.
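Linus's suggestion of a small per-thread cache of message allocations could look roughly like the sketch below. Such a cache would also keep the hot path away from any process-wide lock like the one described here. This is a hypothetical illustration, not code from libdbus or GLib. `struct msg_buf`, `msg_alloc()`, `msg_free()`, the 1024-byte buffer size, and the cache depth of 16 are all made up for the example. It also assumes a C11 compiler for `_Thread_local`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sizes, chosen only for illustration. */
#define MSG_SIZE  1024
#define CACHE_MAX 16

struct msg_buf {
    struct msg_buf *next;      /* free-list link while the buffer is cached */
    char data[MSG_SIZE];
};

/* Each thread has its own free list, so the fast path takes no lock. */
static _Thread_local struct msg_buf *cache_head;
static _Thread_local unsigned cache_len;

static struct msg_buf *msg_alloc(void)
{
    struct msg_buf *m = cache_head;

    if (m) {                   /* fast path: reuse a buffer cached by this thread */
        cache_head = m->next;
        cache_len--;
        return m;
    }
    return malloc(sizeof(*m)); /* slow path: fall back to the general allocator */
}

static void msg_free(struct msg_buf *m)
{
    if (!m)
        return;
    if (cache_len < CACHE_MAX) {   /* keep it for this thread's next msg_alloc() */
        m->next = cache_head;
        cache_head = m;
        cache_len++;
        return;
    }
    free(m);                       /* cache full: return memory to malloc */
}
```

A buffer may be freed on any thread; it simply lands in that thread's cache. Buffers still cached when a thread exits are leaked in this sketch. A real implementation would drain the cache from a thread-exit hook, such as a `pthread_key_create()` destructor.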

Regards,
--
Jindrich Makovicka

2015-06-22 20:24:33

by Jiri Kosina

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, 22 Jun 2015, Jindrich Makovicka wrote:

> >> IOW, all the people who say that it's about avoiding context switches
> >> are probably just full of shit. It's not about context switches, it's
> >> about bad user-level code.
> >
> > Just to make sure, I did a system-wide profile (so that you can actually
> > see the overhead of context switching better), and that didn't change
> > the picture.
> >
> > The scheduler overhead *might* be 1% or so.
> >
> > So really. The people who talk about how kdbus improves performance are
> > just full of sh*t. Yes, it improves things, but the improvement seems to
> > be 100% "incidental", in that it avoids a few trips down the user-space
> > problems.
> >
> > The real problems seem to be in dbus memory management (suggestion: keep
> > a small per-thread cache of those message allocations) and to a smaller
> > degree in the crazy utf8 validation (why the f*ck does it do that
> > anyway?), with some locking problems thrown in for good measure.
>
> In case someone actually still reads this, I guess the global rw_lock in
> gobject/gtype.c is one of the culprits. Every GType instance
> allocation/deallocation is serialized using this lock, which pretty much
> disqualifies GObject from being used for anything scalable to multiple
> threads.
>
> GStreamer used to have serious performance issues due to that, which
> AFAIK have been solved by removing GType from GStreamer core in the 1.0
> release.

This is an interesting piece of information, but unfortunately you dropped
everybody from CC, so it's very likely that it's going to be lost in the
noise.

Could you please resend with CC list restored?

Thanks,

--
Jiri Kosina
SUSE Labs

2015-06-22 21:24:12

by Jindrich Makovicka

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, 27 Apr 2015 15:14:49 -0700
Linus Torvalds <[email protected]> wrote:

> On Mon, Apr 27, 2015 at 3:00 PM, Linus Torvalds
> <[email protected]> wrote:
> >
> > IOW, all the people who say that it's about avoiding context
> > switches are probably just full of shit. It's not about context
> > switches, it's about bad user-level code.
>
> Just to make sure, I did a system-wide profile (so that you can
> actually see the overhead of context switching better), and that
> didn't change the picture.
>
> The scheduler overhead *might* be 1% or so.
>
> So really. The people who talk about how kdbus improves performance
> are just full of sh*t. Yes, it improves things, but the improvement
> seems to be 100% "incidental", in that it avoids a few trips down the
> user-space problems.
>
> The real problems seem to be in dbus memory management (suggestion:
> keep a small per-thread cache of those message allocations) and to a
> smaller degree in the crazy utf8 validation (why the f*ck does it do
> that anyway?), with some locking problems thrown in for good measure.

In case someone actually still reads this, I guess the global rw_lock in
gobject/gtype.c, used by the GDbus binding, is one of the culprits.
Every GType instance allocation/deallocation is serialized using this
lock, which pretty much disqualifies GObject from being used for
anything scalable to multiple threads.

GStreamer used to have serious performance issues due to that, which
AFAIK have been solved by removing GType from GStreamer core in the 1.0
release.

Regards,

--
Jindrich Makovicka

2015-07-03 09:14:24

by cee1

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

2015-04-30 22:52 GMT+08:00 Łukasz Stelmach <[email protected]>:
> It was <2015-04-30 czw 14:45>, when Richard Weinberger wrote:
>> Am 30.04.2015 um 14:40 schrieb Łukasz Stelmach:
>>> It was <2015-04-30 czw 14:23>, when Richard Weinberger wrote:
>>>> Am 30.04.2015 um 14:16 schrieb Łukasz Stelmach:
>>>>> It was <2015-04-30 czw 12:40>, when Richard Weinberger wrote:
>>>>>> Am 30.04.2015 um 12:19 schrieb Łukasz Stelmach:
>>>>>>> It was <2015-04-30 czw 11:12>, when Richard Weinberger wrote:
>>>>>>>> Am 30.04.2015 um 11:05 schrieb Łukasz Stelmach:
>>>>>>>>> Regardless of initrd issues, I feel there is a need for a local IPC
>>>>>>>>> that is more capable than UDS.
>>> [...]
>>>>>>> For example, a service can't acquire the credentials of the client process that
>>>>>>> actually sent a request (it can, but it can't trust them). The service
>>>>>>> can't be protected by LSM on a bus that is driven by dbus-daemon. Yes,
>>>>>>> dbus-daemon can check the client's and service's labels and enforce a
>>>>>>> policy, but it is going to be the daemon and not the LSM code in the
>>>>>>> kernel.
>>>>>>
>>>>>> That's why I said we can think of new kernel features if they are
>>>>>> needed. But the current sink-or-swim approach of the kdbus folks is also
>>>>>> not the solution. As I said, if dbus-daemon utilizes the kernel
>>>>>> interface as much as possible, we can think of new features.
>>>>>
>>>>> What kernel interfaces would you suggest using to solve the issues
>>>>> I mentioned in the second paragraph: race conditions, LSM support (for
>>>>> example)?
>>>>
>>>> The question is whether it makes sense to collect this kind of metadata.
>>>> I really like Andy and Alan's idea to improve AF_UNIX or revive AF_BUS.
>>>
>>> Race conditions have nothing to do with metadata. Neither does LSM
>>> support.
>>
>> Sorry, I thought you meant the races while collecting metadata in userspace...
>
> My bad, some race conditions *are* associated with collecting metadata,
> but not all. It is impossible (correct me if I am wrong) to implement
> reliable die-on-idle with dbus-daemon.
>
>>> AF_UNIX with multicast support wouldn't be AF_UNIX anymore.
>>>
>>> AF_BUS? I haven't followed the discussion back then. Why do you think it
>>> is better than kdbus?
>>
>> Please see https://lwn.net/Articles/641278/
>
> Thanks. If I understand correctly, the author suggests using EBPF on the
> receiving socket side for receiving multicast messages. This is nice if
> you care about not introducing (too) much new code. However,
> AFAICT it may be more computationally expensive than Bloom filters, because
> you need to run EBPF on every receiving socket instead of getting a list
> of a few of them to copy data to. Of course, for a small number of
> receivers the "constant" cost of running the Bloom filter may be higher.

I'm still thinking about the idea of implementing KDBUS in the form of a socket.

What about using a __multicast group__ instead of EBPF to send/receive
multicast messages?

(Could that implement the Bloom filter as follows?) E.g.

Sender:
    send to multi_address

Receivers:
    if ((multi_address & joined_address) == joined_address) {
        /* a message for us */
    }

Then we can further apply EBPF to remove the false-positive cases,
which would otherwise wake up user-space code and leave it to clear
the false positives itself.
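The two-stage scheme proposed here can be modelled in plain C. The first stage is a cheap multicast-address mask test; the second is an exact check that drops false positives before user space is woken. This is a hypothetical model, not an existing socket API. Each topic name is hashed to a single bit of a 64-bit address (a Bloom filter with one hash function). A `strcmp()` stands in for the EBPF program that would do the exact filtering in the kernel. The names `topic_bit()`, `struct receiver`, and `deliver()` are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: map a topic name to one bit of a 64-bit multicast address. */
static uint64_t topic_bit(const char *name)
{
    uint32_t h = 5381;                          /* djb2 string hash */

    while (*name)
        h = h * 33 + (unsigned char)*name++;
    return 1ULL << (h % 64);
}

struct receiver {
    const char *topic;        /* exact topic this receiver subscribed to */
    uint64_t joined_address;  /* topic_bit(topic) */
};

/*
 * Stage 1: bitmask subset test, as in the pseudocode above (may pass
 * false positives when two topics hash to the same bit).
 * Stage 2: exact comparison, standing in for the per-socket EBPF check.
 * Returns how many receivers passed stage 1; *woken gets how many would
 * actually be woken after stage 2.
 */
static size_t deliver(const char *msg_topic, const struct receiver *rx,
                      size_t n, size_t *woken)
{
    uint64_t multi_address = topic_bit(msg_topic);
    size_t passed_mask = 0;

    *woken = 0;
    for (size_t i = 0; i < n; i++) {
        if ((multi_address & rx[i].joined_address) != rx[i].joined_address)
            continue;                           /* definitely not for us */
        passed_mask++;
        if (strcmp(rx[i].topic, msg_topic) == 0)
            (*woken)++;                         /* confirmed match */
    }
    return passed_mask;
}
```

The difference between the two counts is the number of false positives that the second stage filters out. With only one bit per topic, that difference grows quickly as more topics share the 64 bits.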



--
Regards,

- cee1

2015-07-07 22:22:45

by Johannes Stezenbach

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Mon, Apr 27, 2015 at 03:14:49PM -0700, Linus Torvalds wrote:
> On Mon, Apr 27, 2015 at 3:00 PM, Linus Torvalds
> <[email protected]> wrote:
> >
> > IOW, all the people who say that it's about avoiding context switches
> > are probably just full of shit. It's not about context switches, it's
> > about bad user-level code.
>
> Just to make sure, I did a system-wide profile (so that you can
> actually see the overhead of context switching better), and that
> didn't change the picture.
>
> The scheduler overhead *might* be 1% or so.
>
> So really. The people who talk about how kdbus improves performance
> are just full of sh*t. Yes, it improves things, but the improvement
> seems to be 100% "incidental", in that it avoids a few trips down the
> user-space problems.

I was interested in how plain UDS performs compared to the
dbus-client/dbus-server benchmark when doing a similar
transaction (RPC call from client1 to client2 via a server,
i.e. 4 send() and 4 recv() syscalls per RPC msg).
Since I had worked on socket code for some project anyway, I
decided to write a stupid little benchmark.

On my machine, dbus-client/dbus-server needs ~200us per call (1024-byte msg),
UDS "dbus call" needs ~23us. Of course, someone who cares about performance
wouldn't use sync RPC via a message broker, so I added
single-client and async mode to the benchmark for comparison.
Async mode not only decreases scheduling overhead, it can also
use two CPU cores, so it's more than twice as fast.

  ./server dbus
    (you need to run two clients; the timing loop starts
    when the second client connects)
  ./client sync 4096 1000000
    22.757250 s, 43942 msg/s, 22.8 us/msg, 171.638 MB/s
  ./client async 4096 1000000
    8.197482 s, 121989 msg/s, 8.2 us/msg, 476.488 MB/s

  ./server single
    (only a single client talks to the server)
  ./client sync 4096 1000000
    10.980143 s, 91073 msg/s, 11.0 us/msg, 355.733 MB/s
  ./client async 4096 1000000
    3.041953 s, 328736 msg/s, 3.0 us/msg, 1284.044 MB/s

In all cases 1 msg means "send request + receive response".
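The attached server.c/client.c are not reproduced in this archive. As a rough, hypothetical sketch of how such a measurement can be taken, the function below times synchronous request/response round trips over an AF_UNIX socketpair, with a forked child echoing each message. It is a direct two-process ping-pong without a broker, so it does less work per message than the client1 -> server -> client2 path above. Its numbers are not comparable to the table. The names `xfer()` and `uds_sync_rtt_us()` are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Write (is_write != 0) or read exactly len bytes; -1 on error or EOF. */
static int xfer(int fd, char *buf, size_t len, int is_write)
{
    size_t done = 0;

    while (done < len) {
        ssize_t r = is_write ? write(fd, buf + done, len - done)
                             : read(fd, buf + done, len - done);
        if (r <= 0)
            return -1;
        done += (size_t)r;
    }
    return 0;
}

/*
 * Time `iterations` synchronous round trips of msg_size bytes over an
 * AF_UNIX stream socketpair; a forked child echoes every message back.
 * One round trip = send request + receive response.
 * Returns microseconds per round trip, or -1.0 on error.
 */
static double uds_sync_rtt_us(size_t msg_size, long iterations)
{
    int sv[2];
    struct timespec t0, t1;
    int ok = 1;
    char *buf = malloc(msg_size);

    if (!buf || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        free(buf);
        return -1.0;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        free(buf);
        return -1.0;
    }
    if (pid == 0) {                      /* child: echo until EOF */
        close(sv[0]);
        while (xfer(sv[1], buf, msg_size, 0) == 0 &&
               xfer(sv[1], buf, msg_size, 1) == 0)
            ;
        _exit(0);
    }

    close(sv[1]);
    memset(buf, 'x', msg_size);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iterations && ok; i++)
        ok = xfer(sv[0], buf, msg_size, 1) == 0 &&
             xfer(sv[0], buf, msg_size, 0) == 0;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(sv[0]);                        /* child sees EOF and exits */
    waitpid(pid, NULL, 0);
    free(buf);
    if (!ok || iterations <= 0)
        return -1.0;

    double us = (double)(t1.tv_sec - t0.tv_sec) * 1e6 +
                (double)(t1.tv_nsec - t0.tv_nsec) / 1e3;
    return us / (double)iterations;
}
```

Calling `uds_sync_rtt_us(4096, 1000000)` would use the same message size and count as the runs above. The absolute figures depend heavily on the machine and on CPU placement of the two processes.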


Johannes


Attachments:
(No filename) (2.08 kB)
server.c (2.28 kB)
client.c (2.60 kB)
Makefile (79.00 B)