Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759360AbcJ1Ldt (ORCPT ); Fri, 28 Oct 2016 07:33:49 -0400 Received: from mail-vk0-f66.google.com ([209.85.213.66]:36073 "EHLO mail-vk0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752497AbcJ1Ldr (ORCPT ); Fri, 28 Oct 2016 07:33:47 -0400 MIME-Version: 1.0 X-Originating-IP: [195.159.249.59] In-Reply-To: <20161027164312.GI3175@twins.programming.kicks-ass.net> References: <20161026191810.12275-1-dh.herrmann@gmail.com> <20161026191810.12275-7-dh.herrmann@gmail.com> <20161027164312.GI3175@twins.programming.kicks-ass.net> From: Tom Gundersen Date: Fri, 28 Oct 2016 13:33:25 +0200 Message-ID: Subject: Re: [RFC v1 06/14] bus1: util - queue utility library To: Peter Zijlstra Cc: David Herrmann , LKML , Andy Lutomirski , Jiri Kosina , Greg KH , Hannes Reinecke , Steven Rostedt , Arnd Bergmann , Josh Triplett , Linus Torvalds , Andrew Morton Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4946 Lines: 102 On Thu, Oct 27, 2016 at 6:43 PM, Peter Zijlstra wrote: > On Wed, Oct 26, 2016 at 09:18:02PM +0200, David Herrmann wrote: > >> A bus1 message queue is a FIFO, i.e., messages are linearly ordered by >> the time they were sent. Moreover, atomic delivery of messages to >> multiple queues are supported, without any global synchronization, i.e., >> the order of message delivery is consistent across queues. >> >> Messages can be destined for multiple queues, hence, we need to be >> careful that all queues get a consistent order of incoming messages. > > So I read that to mean that if A and B both send a multi-cast message to > C and D, the messages will appear in the same order for both C and D. That is one of the ordering guarantees, yes. > Why is this important? It seem that this multi-cast ordering generates > much of the complexity of this patch while this Changelog fails to > explain why this is a desired property. I don't think this is the case. The most important guarantee we give is causal ordering. To make this work with multicast, we must stage messages first, then commit on a second round. That is, we must find some way to iterate over all clocks before committing, but at the same time preventing any races. The multicast-stability as you just described we get for free by introducing the second-level ordering via sender-address. Stability in multicasts without causal order is not necessarily a crucial feature. However, note that if this ordering is given, it allows reducing the number of round-trips in dependent systems. Imagine a daemon reacting to a set of events from different sources. If the actions of that daemon are solely defined by incoming events, someone else can deduce the actions the daemon took without requiring the daemon to send out events by itself. That is, you can just watch the events on the system, and validly deduce the state of such daemon. Example: There is a configuration daemon that sends events when configuration is changed. And there is a hotplug daemon that sends events when devices are hotplugged. You get an event that the "default mute-state" for audio devices was changed, after it you get a hotplugged audio device. You can now rely on the audio daemon to get the events in the same order, and hence apply the new "default mute-state" to the new device. No need to query the audio daemon whether the new device is muted. But as I said, the causal ordering is what we really want. Multicast-stability is just a nice side-effect. It might also be note mentioning: Both Android Binder and Chromium Mojo make sure they provide causal ordering, since they run into real issues. Binder allows placing multiple messages under the same binder-lock, and Mojo provides Associated Interfaces [1]. DBus makes sure to provide those ordering guarantees as well. >> We >> define the concept of `global order' to provide a basic set of >> guarantees. This global order is a partial order on the set of all >> messages. The order is defined as: >> >> 1) If a message B was queued *after* a message A, then: A < B >> >> 2) If a message B was queued *after* a message A was dequeued, >> then: A < B >> >> 3) If a message B was dequeued *after* a message A on the same queue, >> then: A < B >> >> (Note: Causality is honored. `after' and `before' do not refer to >> the same task, nor the same queue, but rather any kind of >> synchronization between the two operations.) >> >> The queue object implements this global order in a lockless fashion. It >> solely relies on a distributed clock on each queue. Each message to be >> sent causes a clock tick on the local clock and on all destination >> clocks. Furthermore, all clocks are synchronized, meaning they're >> fast-forwarded in case they're behind the highest of all participating >> peers. No global state tracking is involved. > > Yet the code does compares on more than just timestamps. Why are these > secondary (and even tertiary) ordering required? Lamport Timestamps are guaranteed to be unique per-sender, but a receiving queue can still contain messages with the same timestamp (from different senders). That is, if two multicasts overlap, they might end up with the same timestamp, if, and only if, they can have no causal relationship (i.e., the ioctls are called concurrently). We want to extend this partial order, though. We want to provide a stable order in those cases (as described above), so we need a secondary order (we simply pick the memory address of the sender). This guarantees that all receivers get the same order of all messages (even if they have equal timestamps). Note that equal timestamps only happen if entries block each other. Hence, we can use the memory address as secondary order, since we know it is unique in those cases (and cannot be re-used). Cheers, Tom [1] https://docs.google.com/document/d/1ENDDzACX4hplfQ8cCHGo_rXd3IHTu5H4hEZ44Cu8KVs