MIME-Version: 1.0
In-Reply-To: <CALCETrWkGmCe0Job9RKxkq+LTT3gh8iCtYeVS75YODYUysq-5w@mail.gmail.com>
References: <20150413190350.GA9485@kroah.com> <CALCETrWx2sreF7+7XvLJi32jeeN+2peM_UNYxH-EiNqwh9zZVA@mail.gmail.com>
 <20150413204547.GB1760@kroah.com> <CALCETrVoLO2Py5OY7wyn4MBcqB7VEs5DU4E2ThVSUc8kkwR3cg@mail.gmail.com>
 <20150414175019.GA2874@kroah.com> <alpine.LNX.2.00.1504150024570.26287@pobox.suse.cz>
 <20150415085641.GH16381@kroah.com> <20150415120618.4d8d90ff@lxorguk.ukuu.org.uk>
 <552E8B11.4010803@redhat.com> <CAEntwhGzWpndmrorjkePZ1nkV6Ei_p-iSRavAbWhMGdfzfW-kA@mail.gmail.com>
 <CALCETrWkGmCe0Job9RKxkq+LTT3gh8iCtYeVS75YODYUysq-5w@mail.gmail.com>
From: Havoc Pennington <hp@pobox.com>
Date: Wed, 15 Apr 2015 17:58:05 -0400
Message-ID: <CAEntwhHLjrGQYXUn-H2qS_00_b=XTW9FnTzys2k+k4C7_zc_fA@mail.gmail.com>
Subject: Re: [GIT PULL] kdbus for 4.1-rc1
To: Andy Lutomirski <luto@amacapital.net>
Cc: Rik van Riel <riel@redhat.com>,
        One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>,
        Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
        Jiri Kosina <jkosina@suse.cz>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Arnd Bergmann <arnd@arndb.de>,
        "Eric W. Biederman" <ebiederm@xmission.com>,
        Tom Gundersen <teg@jklm.no>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Daniel Mack <daniel@zonque.org>,
        David Herrmann <dh.herrmann@gmail.com>,
        Djalal Harouni <tixxdz@opendz.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3207
Lines: 73

On Wed, Apr 15, 2015 at 4:22 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> This leads me to a potentially interesting question: where's the
> buffering?  If there's a bus with lots of untrusted clients and one of
> them broadcasts data faster than all receivers can process it, where
> does it go?
>
> At least with a userspace solution, it's clear what the OOM killer
> should kill when this happens.  Unless it's PID 1.  Sigh.
>

There's the history and there's the probably-should-happen. I'm sure
this can be improved.

What I think should probably happen is:

 - if a client is trying to send a message and the bus's incoming
buffer from that client is full, the bus should stop reading (forcing
the client to do its own buffering).
 - if a client is not consuming messages fast enough and the bus's
outgoing buffer to that client fills up, the client should be
disconnected.

This would essentially copy how the X server works (again).
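
A rough sketch of those two rules, in Python purely for illustration.
The Connection class, its method names and the per-client message
limits are all made up here; none of this is taken from dbus-daemon,
kdbus or the X server.

```python
from collections import deque


class Connection:
    """Bus-side state for one client (hypothetical model, not real dbus code)."""

    def __init__(self, name, in_limit, out_limit):
        self.name = name
        self.in_limit = in_limit    # max messages queued from this client
        self.out_limit = out_limit  # max messages queued to this client
        self.incoming = deque()
        self.outgoing = deque()
        self.connected = True

    def want_read(self):
        # Rule 1: stop reading while the incoming buffer is full, so the
        # sending client has to do its own buffering.
        return self.connected and len(self.incoming) < self.in_limit

    def accept(self, msg):
        # Called when the bus has a message available from this client.
        if not self.want_read():
            return False
        self.incoming.append(msg)
        return True

    def deliver(self, msg):
        # Rule 2: a full outgoing buffer means the client reads too slowly,
        # so it gets disconnected instead of silently losing the message.
        if not self.connected:
            return False
        if len(self.outgoing) >= self.out_limit:
            self.disconnect()
            return False
        self.outgoing.append(msg)
        return True

    def disconnect(self):
        self.connected = False
        self.incoming.clear()
        self.outgoing.clear()  # the full buffer is freed right away
```

In a real event loop, want_read() would decide whether the client's
socket is included in the poll set for reading.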

The original userspace implementation has configurable buffer size
limits and also limits on resources (such as number of connections and
match rules) used by a single user, but I don't think it does the
right things when limits are reached.

When the incoming queue is full for a client, I don't remember whether
it stops reading from that client or sends the client errors.

When the outgoing-from-the-daemon queue is full (a client isn't
reading messages fast enough), if I remember right, messages to that
client are dropped with an error reply to the sender. This error
probably gets ignored much of the time in practice, but in theory the
sender could retry.

A full outgoing queue for one client doesn't affect other clients, who
are still able to receive messages. For broadcast messages, a full
queue means a client will miss those broadcasts.

Disconnecting might be better than this drop-the-message behavior,
because clients could then assume that *either* they got all messages
that were broadcast, *or* they got disconnected - they won't ever
silently miss broadcasts and end up in a weird confused state.
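
To make the difference concrete, here is a toy comparison of the two
policies for a broadcast. The broadcast() function, the dict-of-lists
representation and the policy names are assumptions for illustration
only, not how any real bus stores its queues.

```python
def broadcast(queues, limit, msg, policy):
    """Deliver msg to every connected client's queue.

    queues: dict mapping client name to a list of pending messages, or to
            None once that client has been disconnected.
    policy: "drop" skips clients whose queue is full;
            "disconnect" kicks them off the bus.
    Returns the names of clients that did not receive msg.
    """
    missed = []
    for name, pending in queues.items():
        if pending is None:
            continue  # already disconnected
        if len(pending) < limit:
            pending.append(msg)
            continue
        missed.append(name)
        if policy == "disconnect":
            # The client sees the connection close and knows its view of
            # the bus may be stale.
            queues[name] = None
        # With "drop" the client stays connected and never learns that it
        # missed this broadcast.
    return missed
```

With "drop", the slow client's queue looks exactly the same before and
after the call, which is the silent gap described above.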

The X server does this - if I'm reading the code correctly just now
(xserver/os/io.c, FlushClient()), it buffers outgoing messages until
realloc fails, and then it disconnects the client.
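
A toy model of that grow-until-allocation-fails behaviour follows. It
is not the xserver code: the GrowingOutput class is invented for
illustration, and a fixed byte cap (alloc_limit) stands in for realloc
failing.

```python
class GrowingOutput:
    """Per-client output buffer that grows until an allocation 'fails'."""

    def __init__(self, alloc_limit):
        self.alloc_limit = alloc_limit  # stand-in for realloc() failing
        self.pending = bytearray()
        self.closed = False

    def queue(self, payload):
        """Buffer payload for the client; return False if it was disconnected."""
        if self.closed:
            return False
        if len(self.pending) + len(payload) > self.alloc_limit:
            # Cannot grow the buffer any further: give up on this client.
            self.pending = bytearray()
            self.closed = True
            return False
        self.pending += payload
        return True

    def flushed(self, nbytes):
        """Record that the client's socket accepted nbytes of buffered data."""
        del self.pending[:nbytes]
```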

If X didn't do this, then clients could miss events and become
confused about the state of the server.  The same will often apply in
dbus scenarios.

In practice right now APIs are designed and limits are configured to
try to avoid ever hitting the limits (unless something is malicious or
badly broken), because if you hit them things go to hell - much like
running out of memory, or hitting file descriptor limits.

Disconnecting slow-reading clients would probably improve this; the
full buffer would be instantly freed, and the client could reconnect
and re-establish all state it cares about, if it wants to. So it might
gracefully recover sometimes, if the problem was transient.
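
On the client side, recovery could look roughly like the sketch below.
FakeBus and Client are hypothetical stand-ins, not a real D-Bus
binding; the only point is that the client keeps a record of the state
it asked for so that it can replay it on a fresh connection.

```python
class FakeBus:
    """Stand-in for a bus connection; it only records match rules."""

    def __init__(self):
        self.rules = []
        self.alive = True

    def add_match(self, rule):
        if not self.alive:
            raise ConnectionError("disconnected from bus")
        self.rules.append(rule)


class Client:
    def __init__(self, connect):
        self.connect = connect  # callable returning a fresh connection
        self.wanted_rules = []  # state this client cares about
        self.bus = connect()

    def add_match(self, rule):
        self.wanted_rules.append(rule)
        self.bus.add_match(rule)

    def on_disconnect(self):
        # Anything learned from earlier broadcasts may be stale, so open a
        # new connection and replay every match rule on it.
        self.bus = self.connect()
        for rule in self.wanted_rules:
            self.bus.add_match(rule)
```

After on_disconnect() the client would also need to re-query any state
it had been tracking through broadcasts, since it cannot know which
ones it missed.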

Havoc