From: Havoc Pennington
Date: Wed, 15 Apr 2015 12:27:03 -0400
Subject: Re: [GIT PULL] kdbus for 4.1-rc1
To: One Thousand Gnomes
Cc: Greg Kroah-Hartman, Jiri Kosina, Al Viro, Borislav Petkov,
    Andy Lutomirski, Linus Torvalds, Andrew Morton, Arnd Bergmann,
    "Eric W. Biederman", Tom Gundersen, "linux-kernel@vger.kernel.org",
    Daniel Mack, David Herrmann, Djalal Harouni

Hi,

I'm temporarily joining the list if anyone has questions about why
dbus was originally the way it is. If you would like answers about its
latest usage, systemd, or the kernel implementation, those are best
answered by others.

I "led" the original design but I was hardly the only person involved.
I was sort of synthesizing previous efforts, lots of ideas from other
people, and mediating the politics of the time.

What I'd like to see in this conversation is: understanding what
exists, and why it exists.
If people understand that then I think they can make good decisions,
using whatever process or timeline you like; I don't pretend to know
much about kdbus, but I see a lot of confusion here about the use-case
and design of dbus itself.

No one should take the design on faith. To improve and maintain
something it must be understood.

Why should you bother to understand dbus as it exists? It's pretty
successful, and I think for a reason. Hundreds of programs are using
dbus, it's become (over a decade) foundational to the most-used Linux
userspaces, there are many different implementations of it, and it's
been quite a stable design over that time without any major changes. I
don't think that's because it's perfect; I do think it's because some
things are right, in ways that previous designs were not. The Linux
userspace community went through a lot of alternatives before dbus,
and dbus was the one that lasted.

The worst-case scenario in my mind would be for the kernel to merge
something dbus-like, but with ill-informed changes that render it
worse. Then you would have a new ABI that nobody wants to use.

We have a design in the wild that's been very successful. People using
it for its intended use-case seem to like it. Step 1 is to try to
understand why that is. I will try to give my take on some of the
reasons.

I can't emphasize enough that the success of dbus was *because of*
many "obvious" criticisms people may have. Why? Tradeoffs. Given
infinite time and resources, many of those tradeoffs can be mitigated
or avoided - and I see kdbus as part of an effort to do so.

The first and most important tradeoff: the central daemon (the hub in
the wheel).

A central daemon has several disadvantages. The success of dbus
happened because those disadvantages, in this context, are not as
important as the advantages.

The advantages include:

 * ability to send a broadcast message to all interested processes
 * tracking/discovering well-known and unique names
 * crossing security domains (system-daemon-to-per-user-UIs, in
   particular) in an orderly fashion
 * reducing the number of file descriptors needed for N apps to all
   talk to each other
 * relatively simple model for application developers to get right

The disadvantages include:

 * performance (extra context switches, copies, and validations)
 * it's difficult to handle killing/restarting the central daemon;
   dbus actually gives clients all the tools to do this, but in
   practice if you restart the daemon you are gambling that a hundred
   clients connected to it have implemented bug-free restart handling.
 * not a distributed cluster (it's a single bottleneck and point of
   failure running on a single machine - the daemon is a source of
   truth, which is also its virtue of course)

For dbus to be as useful as it has been, these disadvantages, while
not desirable, were acceptable tradeoffs. So it would be a mistake to
solve any of these disadvantages by breaking the advantages.

Message passing or IPC isn't really the most important part of dbus.
Process lifecycle tracking and discovery are more important. However,
by integrating the IPC system with the lifecycle tracking you can
simplify the overall system and avoid race conditions. For example,
you can have processes that auto-launch race-free when you send them a
message, or more generally you can have an ordering between lifecycle
events and other messages. For example if I send out a broadcast
message and then disconnect, other clients will see first the
broadcast and then the disconnect and won't have to handle the
out-of-order case.

dbus has a lot of semantic guarantees, such as message ordering, that
reduce application complexity and therefore reduce code and reduce
bugs.
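
A toy model may help make the broadcast-then-disconnect example
concrete. This is only a sketch under stated assumptions, not how
dbus-daemon or kdbus is implemented: the Hub class and its method
names are hypothetical, and everything runs in a single process. The
point it illustrates is that when lifecycle events travel through the
same ordered queue as ordinary messages, every client observes them in
the same order.

```python
from collections import deque


class Hub:
    """Toy central daemon (hypothetical, not the dbus API)."""

    def __init__(self):
        self.clients = {}     # client name -> list of events it has seen
        self.queue = deque()  # one global order for messages and lifecycle

    def connect(self, name):
        self.clients[name] = []

    def broadcast(self, sender, body):
        self.queue.append(("signal", sender, body))

    def disconnect(self, name):
        # The lifecycle event is queued behind any earlier messages
        # instead of being reported through a separate channel.
        self.queue.append(("disconnected", name, None))

    def dispatch(self):
        # Deliver events strictly in queue order to every other client.
        while self.queue:
            event = self.queue.popleft()
            kind, source, _ = event
            if kind == "disconnected":
                self.clients.pop(source, None)
            for name, inbox in self.clients.items():
                if name != source:
                    inbox.append(event)


hub = Hub()
for name in ("sender", "a", "b"):
    hub.connect(name)

hub.broadcast("sender", "state-changed")
hub.disconnect("sender")
hub.dispatch()

for name, inbox in hub.clients.items():
    print(name, inbox)
```

Clients "a" and "b" both see the signal before the disconnect, so
neither needs a code path for a message arriving from a peer it
already believes is gone.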

When implementing a Linux workstation userspace, ideally you have lots
of little processes that do one thing each; but the tradeoff is that
multi-process adds complexity. If your model for a multi-process
program is that it has to solve a lot of hard distributed system
problems, then it adds a LOT of complexity. But when everyone's on a
single machine, it is not necessary to solve (all of) those problems,
and in fact trying to solve non-problems creates bugs by adding
tricky, rarely-touched codepaths. It is overengineering to treat "tray
icon talking to NetworkManager" the same way you would treat IPC and
shared state within a distributed cluster. Multi-process is valuable
though; an alternative userspace design could be like Eclipse or
Emacs, i.e. one enormous process with plugins, which would be a mess.

There was some debate over my X11 analogy.

One of the "thought experiments" while figuring out dbus was "why does
CORBA seem to be at the root of endless bug reports, while X11 isn't?"

Here are some things I think dbus has in common with X11:

 * it's a hub-and-spoke design (a central server that all apps
   connect to) rather than a design where every process talks directly
   to every other process
 * dbus names are directly modeled on X selections (see ICCCM)
 * designed to allow race-free asynchronous usage and minimize the
   need for round trips (though apps can certainly design bad APIs,
   see http://dbus.freedesktop.org/doc/dbus-api-design.html for advice
   on avoiding that)
 * binary protocol rather than text
 * generally assumes a reliable network - assumes all messages will
   arrive, as long as the connection is live
 * similar model for discovering and authenticating to the server
 * allows clients to track each other's lifecycle
 * it is stateful; clients connect, fetch the current state, then
   track changes to the state via events.
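
The last point in the list above (connect, fetch the current state,
then track changes) can be sketched roughly as follows. This is an
illustration only: Server, Client, attach, on_snapshot and on_change
are hypothetical names, not X11 or dbus API, and the whole thing runs
in one process. The assumption being shown is that the snapshot and
the subscription happen as one step on the same ordered connection, so
no change can fall into a gap between them.

```python
class Client:
    """Keeps a local view of the server state (hypothetical names)."""

    def __init__(self):
        self.view = {}

    def on_snapshot(self, snapshot):
        # Initial fetch: replace the local view with the current state.
        self.view = snapshot

    def on_change(self, key, value):
        # Afterwards, apply each change event in the order it arrives.
        self.view[key] = value


class Server:
    """Toy stateful server that owns the source of truth."""

    def __init__(self):
        self.state = {}
        self.subscribers = []

    def attach(self, client):
        # Snapshot and subscribe together, so every later change is
        # delivered as an event and nothing is missed in between.
        client.on_snapshot(dict(self.state))
        self.subscribers.append(client)

    def set(self, key, value):
        self.state[key] = value
        for client in self.subscribers:
            client.on_change(key, value)


server = Server()
server.set("network", "offline")   # happens before the client exists

client = Client()
server.attach(client)              # client starts from the snapshot
server.set("network", "online")    # client learns this via an event
server.set("volume", 7)

print(client.view)
```

The client never polls; after the initial snapshot its view stays in
step with the server purely by applying events in order.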

Some differences from X11 of course:

 * X11 is a domain-specific server (about sharing the graphics and
   input hardware among multiple clients), while with dbus the
   domain-specific API will be in some client and the bus is only an
   intermediary.
 * X11 therefore has a bunch more server state than dbus; dbus only
   has to track clients, not track the state of the window system.
 * IPC on X11 is sort of bolted on in an ugly way (client messages)
   while dbus cleanly maps to the OO model people are used to in the
   rest of their code.

Havoc