On 2015-04-29 11:03, Theodore Ts'o wrote:
> On Wed, Apr 29, 2015 at 04:53:53PM +0200, Harald Hoyer wrote:
>> Sure, I can write one binary to rule them all, pull out all the code from all
>> tools I need, but for me an IPC mechanism sounds a lot better. And it should be
>> _one_ common IPC mechanism and not a plethora of them. It should feel like an
>> operating system and not like a bunch of thrown together software, which is
>> glued together with some magic shell scripts.
>
> And so requiring wireshark (and X?) in initramfs to debug problems
> once dbus is introduced is better?
>
> I would think shell scripts are *easier* to debug when things go
> wrong, especially in a minimal environment such as an initial ram
> disk. Having had to debug problems in a distro initramfs when trying
> to help a customer bring up a FC boot disk long ago in another life,
> I'm certain I would rather debug problems while on site at a
> classified machine room[1] using shell scripts, and trying to debug
> dbus is something that would be infinitely worse.
>
> - Ted
>
> [1] So no laptop, no google, no access to sources to figure out random
> dbus messages, etc.
Likewise.

I keep hearing from people that shell scripting is hard. It really isn't,
compared to a number of other scripting languages; you just need to
actually learn to do it right (which is getting more and more difficult
these days because fewer and fewer CS schools are teaching Unix).


2015-04-29 15:22:13

by Harald Hoyer

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On 29.04.2015 17:17, Austin S Hemmelgarn wrote:
> On 2015-04-29 11:07, Harald Hoyer wrote:
>> Most of the stuff does not work without udev and something like systemd.
>>
> That's funny, apparently the initramfs images I've been using for multiple
> months now on server systems at work which don't have systemd, udev, or dbus,
> and do LVM/RAID assembly, network configuration, crypto devices, multipath,
> many different filesystems, and a number of other oddball configurations due to
> the insanity that is the software I have to deal with from our company, don't
> work. I wonder how my systems are booting successfully 100% of the time then?
>
>

Then you should probably open source your initramfs, so we can all benefit from
it and use it for all distributions.

2015-04-29 15:28:05

by Martin Steigerwald

Subject: Re: [GIT PULL] kdbus for 4.1-rc1

On Wednesday, 29 April 2015, 14:47:53, Harald Hoyer wrote:
> We really don't want the IPC mechanism to be in a flux state. All tools
> have to fallback to a non-standard mechanism in that case.
>
> If I have to pull in a dbus daemon in the initramfs, we still have the
> chicken and egg problem for PID 1 talking to the logging daemon and
> starting dbus.
> systemd cannot talk to journald via dbus unless dbus-daemon is started,
> dbus cannot log anything on startup, if journald is not running, etc...

Do I get this right that it is basically a userspace *design* decision
that you use as a reason to have kdbus inside the kernel?

Is it really necessary to use dbus for talking to journald? And does it
really matter that much if messages logged before dbus starts up do not
appear in the log right away? /proc/kmsg is a ring buffer, so its contents
can still be copied over later.
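
To illustrate the ring-buffer point, here is a minimal sketch of "buffer early, copy over later". It is not how journald or the kernel log actually work; the class `EarlyLog`, its `capacity` parameter and the `demo()` helper are made-up names for illustration only.

```python
from collections import deque


class EarlyLog:
    """Hold messages in a bounded ring until a real log sink is attached.

    Hypothetical illustration only, not an existing API.
    """

    def __init__(self, capacity=1024):
        # deque(maxlen=...) silently drops the oldest entry once it is full,
        # which is the ring-buffer behaviour we want for early messages.
        self._ring = deque(maxlen=capacity)
        self._sink = None

    def log(self, message):
        if self._sink is None:
            self._ring.append(message)   # logger not up yet: buffer it
        else:
            self._sink(message)          # logger is up: deliver directly

    def attach(self, sink):
        """Replay buffered messages in order, then switch to direct delivery."""
        while self._ring:
            sink(self._ring.popleft())
        self._sink = sink


def demo():
    early = EarlyLog(capacity=3)
    for i in range(5):
        early.log(f"boot message {i}")   # messages 0 and 1 fall off the ring
    received = []
    early.attach(received.append)        # the "logging daemon" comes up
    early.log("logger is up")
    return received


if __name__ == "__main__":
    for line in demo():
        print(line)
```

Messages that overflow the ring before the sink attaches are lost, so the capacity has to be sized for the expected amount of early output.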

I remember this kind of reason being given for not having cgroup management
in a separate process, but these are both in userspace.

"We have done it this way in userspace, thus this needs to be in kernel"
doesn't sound quite convincing to me as an argument for having dbus inside
the kernel. Userspace uses the API the kernel and glibc provide, yes, it
makes sense to look at what userspace needs, but designing some things in
userspace and then requiring support for these design decisions in the
kernel just doesn't sound quite right to me.

--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

2015-04-29 15:41:56

On April 29, 2015 7:47:53 AM CDT, Harald Hoyer wrote:
>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA256
>
>On 29.04.2015 01:12, John Stoffel wrote:
>> LDAP is pretty damn generic, in that you can put pretty large objects into
>> it, and pretty large OUs, etc. So why would it be a candidate for going
>> into the kernel? And why is kdbus so important in the kernel as well?
>> People have talked about it needing to be there for bootup, but isn't that
>> why we ripped out RAID detection and such from the kernel and built
>> initramfs, so that there's LESS in the kernel, and more in an early
>> userspace? Same idea with dbus in my opinion.
>
>Let me elaborate on the initramfs/shutdown situation a little bit more,
>because I have to deal with that every day.
>
>Because of the "let's move everything to userspace" sentiment we nowadays
>have the situation, that we need a lot of tools to setup the root device.
>
>Be it LVM on IMSM or iSCSI multipath, the initramfs has to setup the network
>(with bridging, bonding, etc.), the iSCSI connection, assemble the raid, the
>LVM, open crypto devices, etc...
>And if something goes wrong, you want to have a shell, see all the logs and
>debug things.
>
>Now over the time we moved away from simple shell scripts (without any
>logging) and static compiled special versions for the initramfs to a mini
>distribution in the initramfs, which simplifies maintenance and improves
>reliability.
>
>Basically you want to use the same tools in the initramfs (and shutdown)
>which you already have and use in your real root, with the same configuration
>files and the same interfaces and the same code paths.
>
>Therefore systemd is started in dracut created initramfs, which starts
>journald for logging. The same basic systemd targets exist in the initramfs
>as on the real root, so normally you don't have to cope with specialized
>versions for the initramfs.
>
>The target here is to have the same IPC mechanism from the very beginning to
>the very end. No crappy fallback mechanisms in case a daemon is not running
>or has crashed, no creepy transition from initramfs root to real root to
>shutdown root.
>
>We already have such transitions like: systemd, journald, mdmon [1], etc.
>systemd has to serialize itself, journald's file descriptors are transitioned
>over, mdmon jumps through hoops. Remember you want to get rid of open files
>and executables and have to reexec everything, if you transition from the
>initramfs root to the real root, and also from the real root to the shutdown
>root.
>
>We really don't want the IPC mechanism to be in a flux state. All tools have
>to fallback to a non-standard mechanism in that case.
>
>If I have to pull in a dbus daemon in the initramfs, we still have the
>chicken and egg problem for PID 1 talking to the logging daemon and starting
>dbus.
>systemd cannot talk to journald via dbus unless dbus-daemon is started, dbus
>cannot log anything on startup, if journald is not running, etc...
>
>dbus-daemon would have to transition to the real root, and from the real root
>to the shutdown root, without losing state.

Which does not sound fundamentally hard.
Unify the roots, and make /run or wherever the dbus socket lives always available. As long as your initramfs has the latest versions of software there is no need for any tricky transitions except to upgrade software on a running system.

>Of course this can all be done, but it would involve fallback mechanisms,
>which we want to get rid off.

Only if you design things poorly.

>Hopefully, you don't suggest to merge dbus with PID 1. Also with a daemon,
>you will lose the points mentioned in the cover mail

I don't see how something that is inappropriate to be in PID 1 is better in PID 0.

>I don't care, if the kdbus speedup is only marginal.
>
>In my ideal world, there is a standard IPC mechanism from the beginning to
>the end, which does not rely on any process running (except the kernel) and
>which is used by _all_ tools, be it a system daemon providing information and
>interfaces about device assembly or network setup tools or end user desktop
>processes.

And that is a beautiful dream and an absolutely rubbish way to get there. If the performance is not top notch, not everything can use your beautiful IPC mechanism. Which means your dream fails.

Good performance is a hard requirement to get where you want to be.

>dbus _is_ such an easy, flexible standard IPC mechanism. Of course, you can
>invent the wheel again (NIH, "we know better") and wait and see, if that
>works out. Until then the whole common IPC problem is unresolved and Linux
>distributions are just a collection of random software with no common
>interoperability and home grown interfaces.

kdbus seems to be the NIH "we know better" approach. Many of its design decisions are ones we have chosen to make differently elsewhere in the kernel because they have caused problems. When these issues have been pointed out in review, people have blown them off, leading to the current mess.

Furthermore, I don't know that I have seen people arguing for transporting something other than dbus messages. Rather, I have seen people pointing out that there have been many excellent IPC mechanisms that are simpler and faster for the same kind of task, and suggesting that mapping dbus to better kernel primitives might be productive.

But seriously, if you want to have one IPC mechanism to rule them all, you won't succeed in convincing everyone with the currently sloppily designed kdbus code.

Performance matters, simplicity matters, being able to explain design decisions matters.

Eric

2015-05-01 15:49:18

On Mon, Apr 27, 2015 at 03:14:49PM -0700, Linus Torvalds wrote:
> On Mon, Apr 27, 2015 at 3:00 PM, Linus Torvalds wrote:
> >
> > IOW, all the people who say that it's about avoiding context switches
> > are probably just full of shit. It's not about context switches, it's
> > about bad user-level code.
>
> Just to make sure, I did a system-wide profile (so that you can
> actually see the overhead of context switching better), and that
> didn't change the picture.
>
> The scheduler overhead *might* be 1% or so.
>
> So really. The people who talk about how kdbus improves performance
> are just full of sh*t. Yes, it improves things, but the improvement
> seems to be 100% "incidental", in that it avoids a few trips down the
> user-space problems.

I was interested in how plain UDS (Unix domain sockets) performs compared
to the dbus-client/dbus-server benchmark when doing a similar
transaction (an RPC call from client1 to client2 via a server,
i.e. 4 send() and 4 recv() syscalls per RPC msg).
Since I had worked on socket code for some project anyway, I
decided to write a stupid little benchmark.

On my machine, dbus-client/dbus-server needs ~200us per call (1024 byte msg),
UDS "dbus call" needs ~23us. Of course, someone who cares about performance
wouldn't use sync RPC via a message broker, so I added
single-client and async mode to the benchmark for comparison.
Async mode not only decreases scheduling overhead, it also
can use two CPU cores, so it's more than twice as fast.

./server dbus
(you need to run two clients, the timing loop starts
when the second client connects)
./client sync 4096 1000000
22.757250 s, 43942 msg/s, 22.8 us/msg, 171.638 MB/s
./client async 4096 1000000
8.197482 s, 121989 msg/s, 8.2 us/msg, 476.488 MB/s
./server single
(only a single client talks to the server)
./client sync 4096 1000000
10.980143 s, 91073 msg/s, 11.0 us/msg, 355.733 MB/s
./client async 4096 1000000
3.041953 s, 328736 msg/s, 3.0 us/msg, 1284.044 MB/s

In all cases 1 msg means "send request + receive response".
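
For readers without the attachments, the synchronous "dbus call" pattern can be sketched roughly as follows. This is not the attached server.c/client.c: it is a Python approximation that uses threads and socketpair() instead of separate processes, and the names `recv_exact`, `broker`, `callee` and `bench_sync` are made up for illustration. Absolute numbers from it will not be comparable to the C results above.

```python
import socket
import threading
import time


def recv_exact(sock, size):
    """Read exactly `size` bytes from a stream socket, or raise on EOF."""
    buf = bytearray()
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the socket")
        buf.extend(chunk)
    return bytes(buf)


def broker(to_caller, to_callee, size, count):
    """Relay each request to the callee and each reply back to the caller."""
    for _ in range(count):
        to_callee.sendall(recv_exact(to_caller, size))  # forward the request
        to_caller.sendall(recv_exact(to_callee, size))  # forward the reply


def callee(sock, size, count):
    """Answer every request by echoing it back as the reply."""
    for _ in range(count):
        sock.sendall(recv_exact(sock, size))


def bench_sync(size=4096, count=10000):
    """Time `count` synchronous round trips; return (seconds, us per msg)."""
    caller_end, broker_a = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    callee_end, broker_b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    threads = [
        threading.Thread(target=broker, args=(broker_a, broker_b, size, count)),
        threading.Thread(target=callee, args=(callee_end, size, count)),
    ]
    for t in threads:
        t.start()
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(count):
        # One "msg" = send request + receive response, relayed by the broker.
        caller_end.sendall(payload)
        if recv_exact(caller_end, size) != payload:
            raise RuntimeError("reply does not match request")
    elapsed = time.perf_counter() - start
    for t in threads:
        t.join()
    for s in (caller_end, broker_a, callee_end, broker_b):
        s.close()
    return elapsed, elapsed / count * 1e6


if __name__ == "__main__":
    seconds, us_per_msg = bench_sync()
    print(f"{seconds:.6f} s, {us_per_msg:.1f} us/msg")
```

Each round trip crosses four socket hops (caller to broker, broker to callee, and back), which is where the 4 send() and 4 recv() calls per message come from when every message arrives in a single read.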

Johannes

Attachments:

(No filename) (2.08 kB)
server.c (2.28 kB)
client.c (2.60 kB)
Makefile (79.00 B)