From: David Herrmann
Date: Wed, 26 Oct 2016 22:34:30 +0200
Subject: Re: [RFC v1 00/14] Bus1 Kernel Message Bus
To: Linus Torvalds
Cc: Linux Kernel Mailing List, Andy Lutomirski, Jiri Kosina, Greg KH,
    Hannes Reinecke, Steven Rostedt, Arnd Bergmann, Tom Gundersen,
    Josh Triplett, Andrew Morton

Hi

On Wed, Oct 26, 2016 at 9:39 PM, Linus Torvalds wrote:
> So the thing that tends to worry me about these is resource management.
>
> If I understood the documentation correctly, this has per-user resource
> management, which guarantees that at least the system won't run out of
> memory. Good. The act of sending a message transfers the resource to
> the receiver end. Fine.
>
> However, the usual problem ends up being that a bad user can basically
> DoS a system agent, especially since for obvious performance reasons
> the send/receive has to be asynchronous.
>
> So the usual DoS model is that some user just sends a lot of messages
> to a system agent, filling up the system agent resource quota, and
> basically killing the system. No, it didn't run out of memory, but the
> system agent may not be able to do anything more, since it is now out
> of resources.
>
> Keeping the resource management with the sender doesn't solve the
> problem, it just reverses it: now the attack will be to send a lot of
> queries to the system agent, but then just refuse to listen to the
> replies - again causing the system agent to run out of resources.
>
> Usually the way this is resolved is by forcing a "request-and-reply"
> resource management model, where the person who sends out a request is
> not only the one who is accounted for the request, but also accounted
> for the reply buffer. That way the system agent never runs out of
> resources, because it's always the requesting party that has its
> resources accounted, never the system agent.
>
> You may well have solved this, but can you describe what the solution
> is without forcing people to read the code and try to analyze it?

All accounting on bus1 is done on a UID basis. This is the initial model
and tries to match POSIX semantics; more advanced accounting (cgroup-based,
etc.) is left as a future extension. So whenever we talk about "user
accounting", we mean the user abstraction in bus1, which right now is based
on UIDs but could be extended to other schemes.

All bus1 resources are owned by a peer, and each peer has a user assigned
to it (which right now corresponds to file->f_cred->uid). Whenever a peer
allocates resources, they are accounted on its user. There are global
per-user limits which cannot be exceeded. Additionally, each peer can set
its own per-peer limits to further partition the per-user limits. The
per-user limits always take precedence over the per-peer limits.

Now this is all trivial and obvious. It works like any other resource
accounting in the kernel. It becomes tricky when we try to transfer
resources.
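Before getting to transfers, here is a minimal userspace C sketch of the
static per-peer/per-user charging just described. All names (user_quota,
peer_quota, max_handles, ...) are illustrative assumptions, not the actual
bus1 structures, and the real kernel code has to do the same thing
atomically:

  #include <stdbool.h>

  /* Illustrative sketch only -- not the bus1 implementation. */
  struct user_quota {
          unsigned int n_handles;   /* handles currently charged to this UID */
          unsigned int max_handles; /* global per-user limit */
  };

  struct peer_quota {
          struct user_quota *user;  /* accounting user (file->f_cred->uid) */
          unsigned int n_handles;   /* handles charged to this peer */
          unsigned int max_handles; /* optional per-peer limit */
  };

  /*
   * Charge @n handles to @peer. The charge fails if it would exceed either
   * the per-peer limit or the global per-user limit; the per-user limit can
   * never be exceeded, regardless of how the per-peer limits partition it.
   */
  static bool peer_charge_handles(struct peer_quota *peer, unsigned int n)
  {
          if (peer->n_handles + n > peer->max_handles)
                  return false;
          if (peer->user->n_handles + n > peer->user->max_handles)
                  return false;
          peer->n_handles += n;
          peer->user->n_handles += n;
          return true;
  }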
Before SEND, a resource is always accounted on the sender. After RECV, a
resource is accounted on the receiver. That is, resource ownership is
transferred. In most cases this is obvious: memory is copied from one
address-space into another, file-descriptors are added to the file-table of
another process, etc.

Lastly, when a resource is queued, we decided to go with receiver
accounting. This means resource ownership is transferred at the time of
SEND (unlike sender accounting, which would transfer it at the time of
RECV). The reasons are manifold, but mainly we want RECV to never fail due
to accounting or resource exhaustion. We wanted SEND to do the heavy
lifting, and RECV to just dequeue. By avoiding sender-based accounting, we
avoid attacks where a receiver never dequeues messages and thus exhausts
the sender's limits.

The remaining issue is senders DoS'ing a target user. To mitigate this, we
implemented a quota system. Whenever a sender wants to transfer resources
to a receiver, it only gets access to a subset of the receiver's resource
limits. The in-flight resources are accounted on a uid<->uid basis, and the
current algorithm allows a sending user access to at most half of the
destination's limit that is not currently used by anyone else (a small
standalone sketch of this halving rule is appended at the end of this
mail).

Example: Imagine a receiver with a limit of 1024 handles. A sender
transmits a message to that receiver. It gets access to half of the limit
not used by anyone else, hence 512 handles. It does not matter how many
senders there are, nor how many messages are sent: as long as they all
belong to the same user, they share the quota and can queue at most 512
handles. If a second sending user comes into play, it gets half of the
remainder not used by anyone else, which ends up being 256. And so on...

If the peer dequeues messages in between, the numbers go up again. But if
you do the math, the most you can get is 50% of the target's resources, and
only if you are the only sender. In all other cases (intertwined transfers,
etc.) you get less.

We did look into sender-based in-flight accounting, but the same set of
issues arises. Sure, a Request+Reply model would make this easier to
handle, but we explicitly want to support a Subscribe+Event{n} model, where
there is more than one Reply to a message.

Long story short: we have uid<->uid quotas so far, which prevent DoS
attacks, unless you get access to a ridiculous number of local UIDs.
Details on which resources are accounted can be found in the wiki [1].

Thanks
David

[1] https://github.com/bus1/documentation/wiki/Quota
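P.S.: For reference, here is the tiny self-contained sketch of the halving
rule from the example above. quota_for_sender() and its parameters are
made-up names purely for illustration; the real accounting is done in the
kernel, per resource type, and per uid<->uid pair:

  #include <stdio.h>

  /*
   * Illustrative only: a sending user may consume at most half of the
   * receiver's limit that is not currently in use by other users.
   */
  static unsigned int quota_for_sender(unsigned int receiver_limit,
                                       unsigned int used_by_others)
  {
          if (used_by_others >= receiver_limit)
                  return 0;
          return (receiver_limit - used_by_others) / 2;
  }

  int main(void)
  {
          /* Receiver with a limit of 1024 handles, as in the example. */
          unsigned int first  = quota_for_sender(1024, 0);              /* 512 */
          unsigned int second = quota_for_sender(1024, first);          /* 256 */
          unsigned int third  = quota_for_sender(1024, first + second); /* 128 */

          printf("first=%u second=%u third=%u\n", first, second, third);
          return 0;
  }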