From: David Herrmann To: linux-kernel@vger.kernel.org Cc: Andy Lutomirski, Jiri Kosina, Greg KH, Hannes Reinecke, Steven Rostedt, Arnd Bergmann, Tom Gundersen, David Herrmann, Josh Triplett, Linus Torvalds, Andrew Morton Subject: [RFC v1 01/14] bus1: add bus1(7) man-page Date: Wed, 26 Oct 2016 21:17:57 +0200 Message-Id: <20161026191810.12275-2-dh.herrmann@gmail.com> In-Reply-To: <20161026191810.12275-1-dh.herrmann@gmail.com> References: <20161026191810.12275-1-dh.herrmann@gmail.com> From: Tom Gundersen Add new directory ./Documentation/bus1/ and include DocBook scripts to build the bus1(7) man-page. This documents the bus1 character-device, including all the file-operations you can perform on it. Furthermore, the man-page also introduces the core bus1 concepts and explains how they work. 
Build the bus1-documentation via: $ make -C Documentation/bus1/ mandocs $ make -C Documentation/bus1/ htmldocs Signed-off-by: Tom Gundersen Signed-off-by: David Herrmann --- Documentation/bus1/.gitignore | 2 + Documentation/bus1/Makefile | 41 ++ Documentation/bus1/bus1.xml | 833 ++++++++++++++++++++++++++++++++++++++ Documentation/bus1/stylesheet.xsl | 16 + 4 files changed, 892 insertions(+) create mode 100644 Documentation/bus1/.gitignore create mode 100644 Documentation/bus1/Makefile create mode 100644 Documentation/bus1/bus1.xml create mode 100644 Documentation/bus1/stylesheet.xsl diff --git a/Documentation/bus1/.gitignore b/Documentation/bus1/.gitignore new file mode 100644 index 0000000..b4a77cc --- /dev/null +++ b/Documentation/bus1/.gitignore @@ -0,0 +1,2 @@ +*.7 +*.html diff --git a/Documentation/bus1/Makefile b/Documentation/bus1/Makefile new file mode 100644 index 0000000..d2b9e61 --- /dev/null +++ b/Documentation/bus1/Makefile @@ -0,0 +1,41 @@ +cmd = $(cmd_$(1)) +srctree = $(shell pwd) +src = . 
+obj = $(srctree)/$(src) + +DOCS := \ + bus1.xml + +XMLFILES := $(addprefix $(obj)/,$(DOCS)) +MANFILES := $(patsubst %.xml, %.7, $(XMLFILES)) +HTMLFILES := $(patsubst %.xml, %.html, $(XMLFILES)) + +XMLTO_ARGS := \ + -m $(srctree)/$(src)/stylesheet.xsl \ + --skip-validation \ + --stringparam funcsynopsis.style=ansi \ + --stringparam man.output.quietly=1 \ + --stringparam man.authors.section.enabled=0 \ + --stringparam man.copyright.section.enabled=0 + +quiet_cmd_db2man = MAN $@ + cmd_db2man = xmlto man $(XMLTO_ARGS) -o $(obj) $< +%.7: %.xml + @(which xmlto > /dev/null 2>&1) || \ + (echo "*** You need to install xmlto ***"; \ + exit 1) + $(call cmd,db2man) + +quiet_cmd_db2html = HTML $@ + cmd_db2html = xmlto html-nochunks $(XMLTO_ARGS) -o $(obj) $< +%.html: %.xml + @(which xmlto > /dev/null 2>&1) || \ + (echo "*** You need to install xmlto ***"; \ + exit 1) + $(call cmd,db2html) + +mandocs: $(MANFILES) + +htmldocs: $(HTMLFILES) + +clean-files := $(MANFILES) $(HTMLFILES) diff --git a/Documentation/bus1/bus1.xml b/Documentation/bus1/bus1.xml new file mode 100644 index 0000000..40b55c0 --- /dev/null +++ b/Documentation/bus1/bus1.xml @@ -0,0 +1,833 @@ + + + + + + + bus1 + bus1 + + + + Documentation + David + Herrmann + + + Documentation + Tom + Gundersen + + + + + + bus1 + 7 + + + + bus1 + Kernel Message Bus + + + + + #include <linux/bus1.h> + + + + + Description + +The bus1 Kernel Message Bus defines and implements a distributed object model. +It allows local processes to send messages to objects owned by remote processes, +as well as share their own objects with others. Object ownership is static and +cannot be transferred. Access to remote objects is prohibited, unless it was +explicitly granted. Processes can transmit messages to a remote object via the +message bus, transferring a data payload, object access rights, file +descriptors, or other auxiliary data. + + +To participate on the message bus, a peer context must be created. 
Peer contexts +are kernel objects, identified by a file descriptor. They are not bound to any +process, but can be shared freely. The peer context provides a message queue to +store all incoming messages and a registry for all locally owned objects, and +tracks access rights to remote objects. A peer context never serves as a +routing entity, but merely as an anchor for peer-owned resources. Any message on +the bus is always destined for an object, and the bus takes care to transfer a +message into the message queue of the peer context that owns this object. + + +The message bus manages object access using capabilities. That is, by default +only the owner of an object is granted access rights. No other peer can access +the object, nor are they aware of the existence of the object. However, access +rights can be transmitted as auxiliary data with any message, effectively +granting them to the receiver of the message. This even works transitively, that +is, any peer that was granted access to an object can pass on those rights, even +if they do not own the object. Note, however, that access rights can never be +revoked, other than by the owner destroying the object. + + + + Nodes and Handles + +Each peer context comes with a registry of owned objects, which in bus1 +parlance are called nodes. A peer is always the exclusive +owner of all nodes it has created. Ownership cannot be transferred. The message +bus manages access rights to nodes as a set of handles held +by each peer. For each node a peer has access to, whether it is local or remote, +the message bus keeps a handle on the peer. Initially, when a node is created, the +node owner is the only peer with a handle to the newly created node. Handles are +local to each peer, but can be transmitted as auxiliary data with any message, +effectively allocating a new handle to the same node in the destination peer. +This works transitively, and each peer that holds a handle can pass it on +further, or deliberately drop it. 
As long as a peer has a handle to a node it +can send messages to it. However, a node owner can, at any time, decide to +destroy a node. This causes all further message transactions to this node to +fail, although messages that have already been queued for the node are still +delivered. When a node is destroyed, all peers that hold handles to the node are +notified of the destruction. Moreover, if the owner of a node that has been +destroyed releases all its handles to the node, no further messages or +notifications destined for the node are delivered. + + +Handles are the only way to refer to both local and remote nodes. For each +handle allocated on a peer, a 64-bit ID is assigned to identify that particular +handle on that particular peer. The ID is only valid locally on that peer; it +cannot be used by remote peers to address the handle (in other words, the ID +namespace is tied to each peer and does not define global entities). When +creating a new node, userspace freely selects the ID except that the +BUS1_HANDLE_FLAG_MANAGED bit must be cleared, and when +receiving a handle from a remote peer the kernel assigns the ID, which always +has the BUS1_HANDLE_FLAG_MANAGED bit set. Additionally, the +BUS1_HANDLE_FLAG_REMOTE flag tells whether a specific ID +refers to a remote handle (if set), or to an owner handle (if unset). An ID +assigned by the +kernel is never reused, even after a handle has been dropped. The kernel keeps a +user-reference count for each handle. Every time a handle is exposed to a peer, +the user-reference count of that handle is incremented by one. This is never +done asynchronously, but only synchronously when an ioctl is called by the +holding peer. Therefore, a peer can reliably deduce the current user-reference +count of all its handles, regardless of any ongoing message transaction. +References can be explicitly dropped by a peer. 
Once the counter of a handle +hits zero, it is destroyed, its ID becomes invalid, and if it was assigned by +the kernel, it will not be reused again. Note that a peer can never have +multiple different handles to the same node, rather the kernel always coalesces +them into a single handle, using the user-reference counter to track it. +However, if a handle is fully released, but the peer later acquires a handle to +the same remote node again, its ID will be different, as IDs are never reused. + + +New nodes are allocated on-demand by passing the desired ID to the kernel in any +ioctl that accepts a handle ID. When allocating a new node, the node owner +implicitly also gets a handle to that node. As long as the node is valid, the +kernel will pin a single user-reference to the owner's handle. This guarantees +that a node owner always retains access to their node, until they explicitly +destroy it (which will make it possible for userspace to release the handle like +any other). Once all the handles to a local node have been released, no more +messages destined for the node will be received. Otherwise, a handle to a local +node behaves just like any other handle, that is, user-references are acquired +and released according to its use. However, whenever the overall sum of all +user-references on all handles to a node drops to one (which implies that only +the pinned reference of the owner is left), a release-notification is queued on +the node owner. If the counter is incremented again, any such notification is +dropped, if not already dequeued. + + + + + Message Transactions + +A message transaction atomically transfers a message to any number of +destinations. Unless requested otherwise, the message transaction fully succeeds +or fully fails. + + +To receive message payloads, each peer has an associated shmem-backed +pool which may be mapped read-only by the receiving peer. 
+The kernel copies the message payload directly from the sending peer to each of +the receivers' pools without an intermediary kernel buffer. The pool is divided +into slices to hold each message. When a message is +received, its offset into the pool in bytes is returned to +userspace, and userspace has to explicitly release the slice once it has +finished with it. + + +The kernel amends all data messages with the uid, +gid, pid, tid, and +optionally the security context of the sending peer. The information is +collected from the sending peer when the message is sent and translated into the +namespaces of the receiving peer's file-descriptor. + + + + + Seed Message + +Every peer may pin a special seed message. Only the peer +itself may set and retrieve the seed, and at most one seed message may be pinned +at any given time. The seed typically describes the peer itself and pins any +nodes and handles necessary to bootstrap the peer. + + + + + Resource quotas + +Each user has a fixed amount of available resources. The limits are static, but +may be overridden by module parameters. Limits are placed on the amount of +memory a user's pools may consume, the number of handles a user may hold, +the number of inflight messages that may be destined for a user, and the number of +file descriptors that may be inflight to a user. All inflight resources are accounted +on the receiving peer. + + +As resources are accounted on the receiver, a quota mechanism is in place in +order to avoid intentional or unintentional resource exhaustion by a malicious +or broken sending user. At the time of a message transaction, the sending user +may consume in total (including what is consumed by previous transactions) half +of the total resources of the receiving user that have not been consumed by +another user. When a message is dequeued its resource consumption is deaccounted +from the sending user's quota. 
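The per-user quota rule above can be illustrated with a small arithmetic sketch. This is only one reading of the paragraph, not the kernel implementation: the helper name `quota_allows` and its parameters are hypothetical and do not appear in the bus1 UAPI. It assumes a single resource counter with a fixed limit on the receiving user, and treats "half of the resources not consumed by another user" as an upper bound on the sender's total share.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of bus1: decide whether a sender may
 * charge `request` more units against a receiving user's resource.
 *
 *   limit           total units available to the receiving user
 *   used_by_sender  units already charged by this sending user
 *   used_by_others  units already charged by all other users
 */
static bool quota_allows(uint64_t limit, uint64_t used_by_sender,
                         uint64_t used_by_others, uint64_t request)
{
    /* Accounting invariant assumed by this sketch: usage never exceeds the limit. */
    assert(used_by_others <= limit);
    assert(used_by_sender <= limit - used_by_others);

    /* Resources not consumed by any other user. */
    uint64_t available = limit - used_by_others;

    /* The sender may hold at most half of that, previous charges included. */
    uint64_t share = available / 2;

    if (used_by_sender > share)
        return false;
    return request <= share - used_by_sender;
}
```

With a limit of 1024 units and no other users, a fresh sender could charge up to 512 units under this reading; once other users hold 512 units, the same sender would be capped at 256.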
+ + +If a receiving peer does not dequeue any of its incoming messages it would be +possible for a user's quota to be fully consumed by one peer, making it +impossible to communicate with other functioning peers owned by the same user. A +second quota is therefore enforced per-peer, enforcing that at the time of a +message transaction the receiving peer may consume in total (including what +is consumed by previous transactions) half of the total resources available to +the sending user that have not been consumed by another peer. + + + + + Global Ordering + +Despite there being no global synchronization, all events on the bus, such as +sending or receiving of messages, release of handles or destruction of nodes, +behave as if they were globally ordered. That is, for any two events it is +always possible to consider one to have happened before the other in such a way +that it is consistent with all the effects observed on the bus. + + +For instance, if two events occur on one peer (say the sending of a message, +and the destruction of a node), and they are observed on another peer (by +receiving the message and receiving a destruction notification for the node), we +are guaranteed that the order the events occurred in and the order they were +observed in is the same. + + +One could consider a further example involving three peers: if a message is sent +from one peer to two others, and after receiving the message the first recipient +sends a further message to the second recipient, it is guaranteed that the +original message is received before the subsequent one. + + +This principle of causality is also respected in the presence of side-channel +communication. That is, if one event may have triggered another, even if on +different, disconnected, peers, we are guaranteed that the events are ordered +accordingly. 
To be precise, if one event (such as receiving a message) completed +before another (such as sending a message) was started, then they are ordered +accordingly. + + +Also in the case where there can be no causal relationship, we are guaranteed a +global order. In case two events happened concurrently, there can never be any +inconsistency in which occurred before the other. By way of example, consider +two peers sending one message each to two different peers: we are guaranteed +that both the recipient peers receive the two messages in the same order, even +though the order may be arbitrary. + + + + + Operating on a bus1 file descriptor + +The bus1 peer file descriptor supports the following operations: + + + + + + open + 2 + + + + +A call to + + open2 + +on the bus1 character device (usually /dev/bus1) creates a +new peer context identified by the returned file descriptor. + + + + + + + + poll + 2 + + + + + select + 2 + + + (and similar) + + +The file descriptor supports + + poll2 + +(and analogously + + epoll7 +) and + + select2 +, as follows: + + + + + +The file descriptor is readable (the readfds argument of + + select2 +; +the POLLIN flag of + + poll2 +) +if one or more messages are ready to be dequeued. + + + + + +The file descriptor is writable (the writefds argument of + + select2 +; +the POLLOUT flag of + + poll2 +) +if the peer has not yet been shut down (i.e., the peer can be used to send +messages). + + + + + +The file descriptor signals a hang-up (overloaded on the +readfds argument of + + select2 +; +the POLLHUP flag of + + poll2 +) +if the peer has been shut down. + + + + + +The bus1 peer file descriptor also supports the other file descriptor +multiplexing APIs: + + pselect2 +, and + + ppoll2 +. + + + + + + + + mmap + 2 + + + + +A call to + + mmap2 + +installs a memory mapping to the message pool of the peer into the caller's +address-space. No writable mappings are allowed. 
Furthermore, the pool has no +fixed size, but grows dynamically with the demands of the peer. + + + + + + + + ioctl + 2 + + + + +The following bus1-specific commands are supported: + + + + BUS1_CMD_PEER_DISCONNECT + + +This command disconnects a peer and does not take an argument. All slices, +handles, nodes and queued messages are released and destroyed and all future +operations on the peer will fail with -ESHUTDOWN. + + + + + + BUS1_CMD_PEER_QUERY + + +This command queries the state of a peer context. It takes the following +structure as argument: + +struct bus1_cmd_peer_reset { + __u64 flags; + __u64 peer_flags; + __u64 max_slices; + __u64 max_handles; + __u64 max_inflight_bytes; + __u64 max_inflight_fds; +}; + +flags must always be set to 0. The state as set via +BUS1_CMD_PEER_RESET, or the default state if it was never +reset, is returned. + + + + + + BUS1_CMD_PEER_RESET + + +This command resets a peer context. It takes the following structure as +argument: + +struct bus1_cmd_peer_reset { + __u64 flags; + __u64 peer_flags; + __u64 max_slices; + __u64 max_handles; + __u64 max_inflight_bytes; + __u64 max_inflight_fds; +}; + +If peer_flags has +BUS1_PEER_FLAG_WANT_SECCTX set, the security context of the +sending task is attached to each message received by this peer. +max_slices, max_handles, +max_inflight_bytes, and max_inflight_fds +are the resource limits for this peer. Note that these are simply maximum values; +the resource usage is also limited per user. + + +If flags has +BUS1_CMD_PEER_RESET_FLAG_FLUSH_SEED set, the seed message +is dropped, and if BUS1_CMD_PEER_RESET_FLAG_FLUSH is set, +all slices and handles are released, all messages are dropped from the queue and +all nodes that are not pinned by the seed message are destroyed. + + + + + + BUS1_CMD_HANDLE_TRANSFER + + +This command transfers a handle from one peer context to another. 
It takes the +following structure as argument: + +struct bus1_cmd_handle_transfer { + __u64 flags; + __u64 src_handle; + __u64 dst_fd; + __u64 dst_handle; +}; + +flags must always be set to 0, src_handle +is the handle ID of the handle being transferred in the source context, +dst_fd is the file descriptor representing the destination +peer context and dst_handle must be +BUS1_HANDLE_INVALID and is set to the new handle ID in the +destination context on return. + + +If dst_fd is set to -1 the source +context is also used as the destination. + + + + + + BUS1_CMD_HANDLE_RELEASE + + +This command releases one user reference to a handle. It takes a handle ID as +argument. + + + + + + BUS1_CMD_NODE_DESTROY + + +This command destroys a set of nodes. It takes the following structure as +argument: + +struct bus1_cmd_node_destroy { + __u64 flags; + __u64 ptr_nodes; + __u64 n_nodes; +}; + +flags must always be set to 0, ptr_nodes +must be a pointer to an array of handle IDs of owner handles of local nodes, and +n_nodes must be the size of the array. + + + + + + BUS1_CMD_SLICE_RELEASE + + +This command releases one slice from the local pool. It takes a pool offset to +the start of the slice to be released. + + + + + + BUS1_CMD_SEND + + +This command sends a message. It takes the following structure as argument: + +struct bus1_cmd_send { + __u64 flags; + __u64 ptr_destinations; + __u64 ptr_errors; + __u64 n_destinations; + __u64 ptr_vecs; + __u64 n_vecs; + __u64 ptr_handles; + __u64 n_handles; + __u64 ptr_fds; + __u64 n_fds; +}; + + + +flags may be set to at most one of +BUS1_SEND_FLAG_CONTINUE and +BUS1_SEND_FLAG_SEED. If +BUS1_SEND_FLAG_CONTINUE is set any messages that cannot +be delivered due to errors on the remote peer do not make the whole transaction +fail, but merely set the corresponding error code in the error code array +respectively. If BUS1_SEND_FLAG_SEED is set the message +replaces the seed message on the local peer. In this case, +n_destinations must be 0. 
+ + +ptr_destinations is a pointer to an array of handle IDs, +ptr_errors is a pointer to an array of corresponding +errno codes, and n_destinations is the length of the arrays. +The message being sent is delivered to the peer context owning the nodes pointed +to by each of the handles in the array. + + +ptr_vecs is a pointer to an array of iovecs and +n_vecs is the length of the array. The iovecs represent the +payload of the message which is delivered to each destination. + + +ptr_handles is a pointer to an array of handle IDs and +n_handles is the length of the array. Each of the handles in +this array is installed in each destination peer context at receive time. If the +underlying node has been destroyed at the time the message is delivered (the +message would be ordered after the node's destruction notification) then +BUS1_HANDLE_INVALID will be delivered instead. + + +ptr_fds is a pointer to an integer array of file descriptors +and n_fds is the length of the array. Each of the file +descriptors in this array may be installed in the destination peer context at +receive time (see below). + + + + + + BUS1_CMD_RECV + + +This command receives a message. It takes the following structure as argument: + +struct bus1_cmd_recv { + __u64 flags; + __u64 max_offset; + struct { + __u64 type; + __u64 flags; + __u64 destination; + __u32 uid; + __u32 gid; + __u32 pid; + __u32 tid; + __u64 offset; + __u64 n_bytes; + __u64 n_handles; + __u64 n_fds; + __u64 n_secctx; + } msg; +}; + +If BUS1_RECV_FLAG_PEEK is set in flags, +the received message is not dropped from the queue. If +BUS1_RECV_FLAG_SEED is set, the peer's seed is received +rather than a message from the queue. If +BUS1_RECV_FLAG_INSTALL_FDS is set, the file descriptors attached to +the received message are installed in the receiving process. Care must be taken +when using this flag from more than one process on the same message as file +descriptor numbers are per-process and not per-peer. 
+ + +max_offset indicates the maximum offset into the pool the +receiving peer is able to read. If a message slice would exceed this offset +the call would fail with -ERANGE. + + +msg.type indicates the type of message. +BUS1_MSG_NONE is never returned. +BUS1_MSG_DATA indicates a regular message sent from another +peer, possibly containing a payload, as well as attached handles and +file descriptors. BUS1_MSG_NODE_DESTROY indicates that the +node referenced by the handle in msg.destination was +destroyed by its owner. BUS1_MSG_NODE_RELEASE indicates +that all the references to handles referencing the node in +msg.destination have been released. + + +msg.flags indicates additional flags of the message. +BUS1_MSG_FLAG_HAS_SECCTX indicates that a security context +was attached to the message (to distinguish an empty n_secctx +from an invalid one). +BUS1_MSG_FLAG_CONTINUE indicates that there are more +messages queued which belong to the same message transaction. + + +msg.destination is the ID of the destination node or handle +of the message. + + +msg.uid, msg.gid, +msg.pid, and msg.tid are the user, group, +process and thread ID of the process that created the sending peer context. + + +msg.offset is the offset, in bytes, into the pool of the +payload and msg.n_bytes is its length. + + +msg.n_handles is the number of handles attached to the +message. The handle IDs are stored in the pool following the payload (and +possibly padding to make the array 8-byte aligned). + + +msg.n_fds is the number of file descriptors attached to the +message, or 0 if BUS1_RECV_FLAG_INSTALL_FDS was not set. +The file descriptor numbers are stored in the pool following the handle array +(and possibly padding to make the array 8-byte aligned). + + +msg.n_secctx is the number of bytes attached to the message, +which contain the security context of the sender. The security context is +stored in the pool following the payload (and possibly padding to make it +8-byte aligned). 
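The slice layout described for msg.offset, msg.n_bytes, msg.n_handles and msg.n_fds can be sketched as plain offset arithmetic. This is an illustration under stated assumptions, not code from the patch: `struct slice_layout`, `align8()` and `compute_slice_layout()` are hypothetical names, handle IDs are taken to be 8 bytes each (they are 64-bit IDs), and the position of the security context is left out because the text above does not pin it down relative to the handle and file descriptor arrays.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: byte offsets into the mapped pool. */
struct slice_layout {
    uint64_t payload;  /* start of the payload (msg.offset) */
    uint64_t handles;  /* start of the handle ID array */
    uint64_t fds;      /* start of the file descriptor array */
};

/* Round v up to the next multiple of 8. */
static uint64_t align8(uint64_t v)
{
    return (v + 7) & ~UINT64_C(7);
}

/*
 * Derive the array positions from the fields returned by BUS1_CMD_RECV:
 * the handle array follows the payload, the fd array follows the handle
 * array, each padded to an 8-byte boundary.
 */
static struct slice_layout compute_slice_layout(uint64_t offset,
                                                uint64_t n_bytes,
                                                uint64_t n_handles)
{
    struct slice_layout l;

    /* Guard against wrap-around in this sketch's arithmetic. */
    assert(n_bytes <= UINT64_MAX - 7 - offset);
    assert(n_handles <= (UINT64_MAX - 7) / sizeof(uint64_t));

    l.payload = offset;
    l.handles = align8(offset + n_bytes);
    l.fds = align8(l.handles + n_handles * sizeof(uint64_t));
    return l;
}
```

A receiver would add these offsets to the base address returned by mmap(2) on the peer file descriptor, and would only read the fd array when BUS1_RECV_FLAG_INSTALL_FDS was set, since msg.n_fds is 0 otherwise.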
+ + + + + + + + + + + close + 2 + + + + +A call to + + close2 + +releases the passed file descriptor. When all file descriptors associated with +the same peer context have been closed, the peer is shut down. This destroys all +nodes of that peer, releases all handles, flushes its queue and pool, and +deallocates all related resources. Messages that have been sent by the peer and +are still queued on destination queues are unaffected by this. + + + + + + + + + Return value + +All bus1 operations return zero on success. On failure, a negative error code is +returned. + + + + + Errors + +These are all standard errors generated by the bus layer. See the description +of each ioctl for details on their occurrence. + + + + EAGAIN + +No messages ready to be read. + + + + + EBADF + +Invalid file descriptor. + + + + + EDQUOT + +Resource quota exceeded. + + + + + EFAULT + +Cannot read, or write, ioctl parameters. + + + + + EHOSTUNREACH + +The destination object is no longer available. + + + + + EINVAL + +Invalid ioctl parameters. + + + + + EMSGSIZE + +The message to be sent exceeds its allowed resource limits. + + + + + ENOMEM + +Out of kernel memory. + + + + + ENOTTY + +Unknown ioctl. + + + + + ENXIO + +Unknown object. + + + + + EOPNOTSUPP + +Operation not supported. + + + + + EPERM + +Permission denied. + + + + + ERANGE + +The message to be received would exceed the maximal offset. + + + + + ESHUTDOWN + +Local peer was already disconnected. + + + + + + + See Also + + + + bus1.pool + 7 + + + + + + diff --git a/Documentation/bus1/stylesheet.xsl b/Documentation/bus1/stylesheet.xsl new file mode 100644 index 0000000..52565ea --- /dev/null +++ b/Documentation/bus1/stylesheet.xsl @@ -0,0 +1,16 @@ + + + 1 + ansi + 80 + 0 + A4 + 2 + 1 + 1 + + + -- 2.10.1