Date: Wed, 1 Jul 2015 18:51:41 +0200
Subject: Re: kdbus: to merge or not to merge?
From: David Herrmann
To: "Kalle A. Sandstrom"
Cc: linux-kernel

Hi

On Wed, Jul 1, 2015 at 2:03 AM, Kalle A. Sandstrom wrote:
> For the first, compare unix domain sockets (i.e. point-to-point mode,
> access control through the filesystem [or fork() parentage],
> read/write/select) to the kdbus message-sending ioctl. In the main
> data-exchanging portion, the former requires only a connection
> identifier, a pointer to a buffer, and the length of data in that
> buffer. In contrast, kdbus takes a complex message-sending command
> structure with 0..n items of m kinds that the ioctl must parse in an
> m-way switching loop, and then another complex message-describing
> structure which has its own 1..n items of another m kinds describing
> its contents, destination-lookup options, negotiation of supported
> options, and so forth.

sendmsg(2) uses a very similar payload to kdbus. send(2) is a shortcut
that simplifies the most common use-case. I'd be more than glad to
accept patches adding such shortcuts to kdbus, if accompanied by
benchmark numbers and reasoning why this is a common path for
dbus/etc. clients.

The kdbus API is kept generic and extensible, while trying to keep
runtime overhead minimal. If this overhead turns out to be a
significant runtime slowdown (which none of my benchmarks showed), we
should consider adding shortcuts. Until then, I prefer an API that is
consistent, easy to extend and flexible.

> Consequently, a carefully optimized implementation of unix domain
> sockets (and by extension all the data-carrying SysV etc. IPC
> primitives, optimized similarly) will always be superior to kdbus for
> both message throughput and latency, [...]

Yes, that's due to the point-to-point nature of UDS.

> [...] For long messages (> L1 cache size per Stetson-Harrison[0]) the
> only performance benefit from kdbus is its claimed single-copy mode
> of operation -- an equivalent to which could be had with ye olde
> sockets by copying data from the writer directly into the reader
> while one of them blocks[1] in the appropriate syscall. That the
> current Linux pipes, SysV queues, unix domain sockets, etc. don't do
> this doesn't really factor in.

Parts of the network subsystem have supported single-copy (mmap'ed IO)
for quite some time. kdbus mandates it, but is otherwise not special
in that regard.

> A consequence of this buffering is that whenever a client sends a
> message with kdbus, it must be prepared to handle an out-of-space
> non-delivery status. [...] There's no option to e.g. overwrite a
> previous message, or to discard queued messages in oldest-first
> order, instead of rebuffing the sender.

Correct.
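To make the comparison above concrete, this is roughly what the two
send paths look like from userspace. It's a sketch from memory of the
current patch series -- structure and field names, the helper macros
(KDBUS_ALIGN8, KDBUS_ITEM_SIZE) and the item setup may not match
kdbus.h exactly, so treat it as illustrative, not copy-paste-able:

#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include "kdbus.h"  /* UAPI header shipped with the kdbus patches */

/* UDS: connection fd + buffer + length, nothing else. */
static int uds_send(int fd, const void *buf, size_t len)
{
	return send(fd, buf, len, 0) < 0 ? -1 : 0;
}

/* kdbus: build a kdbus_msg carrying one PAYLOAD_VEC item, wrap its
 * address in a kdbus_cmd_send, pass that to the SEND ioctl. */
static int kdbus_send_vec(int conn_fd, uint64_t dst_id,
			  const void *buf, uint64_t len)
{
	char data[KDBUS_ALIGN8(sizeof(struct kdbus_msg)) +
		  KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec))]
		__attribute__((aligned(8))) = { 0 };
	struct kdbus_msg *msg = (struct kdbus_msg *)data;
	struct kdbus_item *item = msg->items;
	struct kdbus_cmd_send cmd = { .size = sizeof(cmd) };

	msg->size = sizeof(data);
	msg->dst_id = dst_id;
	msg->payload_type = KDBUS_PAYLOAD_DBUS;

	item->type = KDBUS_ITEM_PAYLOAD_VEC;
	item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
	item->vec.address = (uintptr_t)buf;
	item->vec.size = len;

	cmd.msg_address = (uintptr_t)msg;

	/* A full receiver pool shows up right here as an error return
	 * -- the out-of-space non-delivery status discussed above. */
	return ioctl(conn_fd, KDBUS_CMD_SEND, &cmd);
}

The extra setup is real, but it amounts to a handful of stores into
one buffer; whether that matters is exactly what benchmarks should
decide before we add shortcuts.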
> For broadcast messaging, a recipient may observe that messages were
> dropped by looking at a `dropped_msgs' field delivered (and then
> reset) as part of the message-reception ioctl. Its value is the
> number of messages dropped since the last read, so arguably a client
> could achieve the equivalent of the condition's absence by
> resynchronizing explicitly, whenever the value is >0, with all
> signal-senders on its current bus for which it knows the protocol.
> This method could in principle apply to 1-to-1 unidirectional
> messaging as well[2].

Correct.

> Looking at the kdbus "send message, wait for tagged reply" feature in
> conjunction with these details appears to reveal two holes in its
> state graph. The first is that if replies are delivered through the
> requestor's buffer, concurrent sends into that same buffer may cause
> it to become full (or the queue to grow too long, w/e) before the
> service gets a chance to reply. If this condition causes a reply to
> fall out of the IPC flow, the requestor will hang until either its
> specified timeout happens or it gets interrupted by a signal.

If sending a reply fails, the kdbus_reply state is destroyed and the
caller must be woken up. We do that for sync-calls just fine, but the
async case does indeed lack a wake-up in the error path. I noted this
down and will fix it.

> If replies are delivered outside the shm pool, the requestor must be
> prepared to pick them up using a different means from the "in your
> pool w/ offset X, length Y" format the main-line kdbus interface
> provides. [...]

Replies are never delivered outside the shm pool.

> The second problem is that given how there can be a timeout or
> interrupt on the receive side of a "method call" transaction, it's
> possible for the requestor to bow out of the IPC flow _while the
> service is processing its request_. This results either in the reply
> message being lost, or in its ending up in the requestor's buffer,
> where it may appear in a loop that does not expect it. Either

(for completeness: we properly support resuming interrupted sync-calls)

> way, the client must at that point resynchronize wrt all objects
> related to the request's side effects, or abandon the IPC flow
> entirely and start over. (Services need only confirm their replies
> before effecting e.g. a chardev-like "destructively read N bytes from
> buffer" operation's outcome, which is slightly less ugly.)

Correct. If you time out on, or refuse to resume, a sync-call, you
have to treat the transaction as failed.

> Tying this back into the first point: to prevent this type of
> denial-of-service against sanguinely-written software, it's necessary
> for kdbus to invoke the policy engine to determine that an unrelated
> participant isn't allowed to consume a peer's buffer space.

It's not the policy engine but the quota handling; otherwise correct.

Thanks
David
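P.S.: For reference, the receive side of the dropped_msgs handling
discussed above looks roughly like this. Again a sketch from memory of
the patch series, with hypothetical application helpers
(resync_with_senders(), handle_msg()) standing in for real logic:

#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include "kdbus.h"  /* UAPI header shipped with the kdbus patches */

/* hypothetical, application-specific helpers */
extern void resync_with_senders(void);
extern void handle_msg(struct kdbus_msg *msg);

static int kdbus_recv_one(int conn_fd, uint8_t *pool /* mmap()ed */)
{
	struct kdbus_cmd_recv cmd_recv = { .size = sizeof(cmd_recv) };
	struct kdbus_cmd_free cmd_free = { .size = sizeof(cmd_free) };

	if (ioctl(conn_fd, KDBUS_CMD_RECV, &cmd_recv) < 0)
		return -errno;

	/* Non-zero means messages were dropped since the last RECV;
	 * the kernel resets the counter, so this is the only
	 * notification we get and the client has to resynchronize
	 * with the relevant senders on its own. */
	if (cmd_recv.dropped_msgs > 0)
		resync_with_senders();

	/* "in your pool w/ offset X, length Y": the message sits in
	 * the connection's mmap()ed pool at the returned offset. */
	handle_msg((struct kdbus_msg *)(pool + cmd_recv.msg.offset));

	/* Hand the pool slice back to the kernel once we're done. */
	cmd_free.offset = cmd_recv.msg.offset;
	return ioctl(conn_fd, KDBUS_CMD_FREE, &cmd_free);
}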