Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932676Ab2F2Qus (ORCPT ); Fri, 29 Jun 2012 12:50:48 -0400
Received: from bhuna.collabora.co.uk ([93.93.135.160]:42785 "EHLO
	bhuna.collabora.co.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755999Ab2F2Qq3 (ORCPT );
	Fri, 29 Jun 2012 12:46:29 -0400
From: Vincent Sanders
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	"David S. Miller"
Cc: Javier Martinez Canillas, Vincent Sanders
Subject: [PATCH net-next 02/15] net: bus: Add documentation for AF_BUS
Date: Fri, 29 Jun 2012 17:45:41 +0100
Message-Id: <1340988354-26981-3-git-send-email-vincent.sanders@collabora.co.uk>
X-Mailer: git-send-email 1.7.10
In-Reply-To: <1340988354-26981-1-git-send-email-vincent.sanders@collabora.co.uk>
References: <1340988354-26981-1-git-send-email-vincent.sanders@collabora.co.uk>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 25736
Lines: 583

From: Javier Martinez Canillas

Document the AF_BUS design, API and usage semantics.

Signed-off-by: Javier Martinez Canillas
Signed-off-by: Vincent Sanders
---
 Documentation/networking/af_bus.txt |  558 +++++++++++++++++++++++++++++++++++
 1 file changed, 558 insertions(+)
 create mode 100644 Documentation/networking/af_bus.txt

diff --git a/Documentation/networking/af_bus.txt b/Documentation/networking/af_bus.txt
new file mode 100644
index 0000000..a0b078f
--- /dev/null
+++ b/Documentation/networking/af_bus.txt
@@ -0,0 +1,558 @@
+			The AF_BUS socket address family
+			================================
+
+Introduction
+------------
+
+AF_BUS is a message-oriented inter-process communication system.
+
+The principal features are:
+
+ - Reliable datagram based communication (all sockets are of type
+   SOCK_SEQPACKET)
+
+ - Multicast message delivery (one to many, unicast as a subset)
+
+ - Strict ordering (messages are delivered to every client in the same order)
+
+ - Ability to pass file descriptors
+
+ - Ability to pass credentials
+
+The basic concept is to provide a virtual bus on which multiple
+processes can communicate and policy is imposed by a "bus master".
+
+A process can create buses to which other processes can connect and
+communicate with each other by sending messages. Processes' addresses
+are automatically assigned by the bus on connect and are
+unique. Messages can be sent either to a process' unique address or to
+a bus multicast address.
+
+Netfilter rules or Berkeley Packet Filter can be used to restrict the
+messages that each peer is allowed to receive. This is especially
+important when sending to multicast addresses.
+
+Besides messages, processes can send and receive ancillary data (e.g.,
+SCM_RIGHTS for passing file descriptors or SCM_CREDENTIALS for passing
+Unix credentials). In the case of a multicast message all recipients
+of a message may obtain a copy of a file descriptor or credentials.
+
+A bus is created by processes connecting on an AF_BUS socket. The
+"bus master" does not connect; instead it binds itself to the NULL
+address.
+
+The socket address is made up of a path component and a numeric
+component. The path component is either a pathname or an abstract
+socket similar to a unix socket. The numeric component is used to
+uniquely identify each connection to the bus. Thus the path identifies
+a specific bus and the numeric component the attachment to that bus.
+
+The process that calls bind(2) on the socket is the owner of the bus
+and is called the bus master. The master is a special client of the
+bus and has some responsibility for the bus' operation.
+The master is assigned a fixed address with all the bits zero
+(0x0000000000000000).
+
+Each process connected to an AF_BUS socket has one or more addresses
+within that bus. These addresses are 64-bit unsigned integers,
+interpreted by splitting the address into two parts: the most
+significant 16 bits are a prefix identifying the type of address, and
+the remaining 48 bits are the actual client address within that
+prefix, as shown in this figure:
+
+Bit:  0             15 16                                            63
+     +----------------+------------------------------------------------+
+     |  Type prefix   |                 Client address                 |
+     +----------------+------------------------------------------------+
+
+The prefix with all bits zero is reserved for use by the kernel, which
+automatically assigns one address from this prefix to each client on
+connection. The address in this prefix with all bits zero is always
+assigned to the bus master. Addresses on the prefix 0x0000 are unique
+and will never repeat for the lifetime of the bus master.
+
+A client may have multiple addresses. When data is sent to other
+clients, those clients will always see the sender address that is in
+the prefix 0x0000 address space when calling recvmsg(2) or
+recvfrom(2). Similarly, the prefix 0x0000 address is returned by calls
+to getsockname(2) and getpeername(2).
+
+For each prefix, the address where the least significant 48 bits are
+all 1 (i.e., 0xffffffffffff) is also reserved, and can be used to send
+multicast messages to all the peers on a prefix.
+
+The non-reserved addresses in each of the remaining prefixes are
+managed by the bus master, which may assign additional addresses to
+any other connected socket.
+
+Having different name-spaces has two advantages:
+
+ - Clients can have addresses on different mutually-exclusive
+   scopes. This permits sending multicast packets to only clients
+   that have addresses on a given prefix.
+
+ - The addressing scheme can be more flexible.
+   The kernel will only assign unique addresses on the all-bits-zero
+   prefix (0x0000) and allows the bus master process to assign
+   additional addresses to clients on other prefixes. By having
+   different prefixes, the kernel and bus master assignments will not
+   collide.
+
+AF_BUS transport can support two network topologies. When a process
+first connects to the bus master, it can only communicate with the bus
+master. The process cannot send packets to or receive packets from
+other peers on the bus. So, from the client process point of view the
+network topology is point-to-point.
+
+The bus master can allow the connected peer to be part of the bus and
+start to communicate with other peers by setting a socket option with
+the setsockopt(2) system call using the accepted socket descriptor. At
+this point, the topology becomes a bus to the client process.
+
+Packets whose destination address is not assigned to any client are
+routed by default to the bus master (the client accepted socket
+descriptor).
+
+
+Semantics
+---------
+
+Bus features:
+
+ - Unicast and multicast addressing scheme.
+ - Ability to assign addresses from user-space with different prefixes.
+ - Automatic address assignment.
+ - Ordered packet delivery (FIFO, total ordering).
+ - File descriptor and credentials passing.
+ - Support for both point-to-point and bus network topologies.
+ - Bus access control managed from user-space.
+ - Netfilter hooks for packet sending, routing and receiving.
+
+A process (the "bus master") can create an AF_BUS bus with socket(2)
+and use bind(2) to assign an address to the bus. Then it can listen(2)
+on the created socket to start accepting incoming connections with
+accept(2).
+
+Processes can connect to the bus by creating a socket with socket(2)
+and using connect(2). The kernel will assign a unique address to each
+connection and messages can be sent and received by using BSD socket
+primitives.
+
+This uses the connect(2) semantics in a non-traditional way. With
+AF_BUS sockets, it is not possible to connect "my" socket to a specific
+peer socket, whereas in traditional BSD sockets API usage connect(2)
+either connects stream sockets or assigns a peer address to a
+datagram socket (so that send(2) can be used instead of sendto(2)).
+
+An AF_BUS socket address is represented as a combination of a bus
+address and a bus path name. Addresses are unique within a path. The
+unique bus address is further subdivided into a prefix and a client
+address. Thus the path identifies a specific bus and the numeric
+component the attachment to that bus.
+
+#define BUS_PATH_MAX 108
+
+/* Bus address */
+struct bus_addr {
+	uint64_t s_addr; /* 16-bit prefix + 48-bit client address */
+};
+
+/* Structure describing an AF_BUS socket address. */
+struct sockaddr_bus {
+	sa_family_t sbus_family; /* AF_BUS */
+	struct bus_addr sbus_addr; /* bus address */
+	char sbus_path[BUS_PATH_MAX]; /* pathname */
+};
+
+A process becomes a bus master for a given struct sockaddr_bus by
+calling bind(2) on an AF_BUS address. The argument must be { AF_BUS,
+0, path }.
+
+AF_BUS supports both abstract and non-abstract path names. Abstract
+names are distinguished by the fact that sbus_path[0] == '\0' and they
+don't represent file system paths while non-abstract paths are bound
+to a file system path name. (See the unix(7) man page for a discussion
+of abstract socket addresses in the AF_UNIX address family.)
+
+Then the process calls listen(2) to accept incoming connections. If
+that process calls getsockname(2), the returned address will be {
+AF_BUS, 0, path }.
+
+The conventional string form of the full address is path + ":" +
+prefix + "/" + client address. Prefix and client address are
+represented in hex.
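
The string form described above can be sketched in a few lines of C. This
is only an illustration: the helper name bus_addr_to_string() and the
BUS_CLIENT_MASK macro are hypothetical and not part of the AF_BUS API, and
the zero-padded widths (4 hex digits for the prefix, 12 for the client
address) are an assumption based on the 16-bit/48-bit split.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper macro: low 48 bits hold the client address. */
#define BUS_CLIENT_MASK UINT64_C(0x0000ffffffffffff)

/*
 * Write "path:prefix/client" into buf, with prefix and client address
 * in hex. Returns the value returned by snprintf(3).
 * Hypothetical helper, not part of the AF_BUS API.
 */
static int bus_addr_to_string(char *buf, size_t len,
			      const char *path, uint64_t s_addr)
{
	unsigned int prefix = (unsigned int)(s_addr >> 48) & 0xffffu;
	uint64_t client = s_addr & BUS_CLIENT_MASK;

	return snprintf(buf, len, "%s:%04x/%012" PRIx64,
			path, prefix, client);
}
```

Calling bus_addr_to_string(buf, sizeof(buf), "/tmp/test",
0x0002f00ddeadbeef) fills buf with "/tmp/test:0002/f00ddeadbeef".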
+
+For example, the address:
+
+struct sockaddr_bus addr;
+addr.sbus_family = AF_BUS;
+strcpy(addr.sbus_path, "/tmp/test");
+addr.sbus_addr.s_addr = 0x0002f00ddeadbeef;
+
+would be represented using the string /tmp/test:0002/f00ddeadbeef.
+
+If the bus_addr is 0, then both the prefix and client address may be
+omitted from the string form. To connect to a bus as a client it is
+sufficient to specify the path, since the listening address always has
+bus_addr == 0. It is not meaningful to specify a bus_addr other than
+0 on connect(2).
+
+The AF_BUS implementation will automatically assign a unique address
+to each client but the bus master can assign additional addresses on a
+different prefix by means of the setsockopt(2) system call. For
+example:
+
+struct bus_addr addr;
+addr.s_addr = 0x0001deadfee1dead;
+ret = setsockopt(afd, SOL_BUS, BUS_ADD_ADDR, &addr, sizeof(addr));
+
+where afd is the accepted socket descriptor in the daemon. To show graphically:
+
+        L             The AF_BUS listening socket   }
+     /  |  \                                        }-- listener process
+   A1   A2  A3        The AF_BUS accepted sockets   }
+    |   |   |
+   C1   C2  C3        The AF_BUS connected sockets  }-- client processes
+
+So if setsockopt(A1, SOL_BUS, BUS_ADD_ADDR, &addr, sizeof(addr)) is
+called, C1 will get the new address.
+
+The inverse operation is BUS_DEL_ADDR, which the bus master can use to
+remove a client socket AF_BUS address:
+
+ret = setsockopt(afd, SOL_BUS, BUS_DEL_ADDR, &addr, sizeof(addr));
+
+Besides assigning additional addresses, the bus master has to allow a
+client process to communicate with other peers on the bus using a
+setsockopt(2):
+
+ret = setsockopt(afd, SOL_BUS, BUS_JOIN_BUS, NULL, 0);
+
+Clients are not meant to send messages to each other until the master
+tells them (in a protocol-specific way) that the BUS_JOIN_BUS
+setsockopt(2) call was made.
+
+If a client sends a message to a destination other than the bus
+master's all-zero address before joining the bus, an EHOSTUNREACH (No
+route to host) error is returned, since the only hosts that exist in
+the point-to-point network before the client joins the bus are the
+client and the bus master.
+
+An EHOSTUNREACH is also returned if a client that joined a bus tries
+to send a packet to a client on another bus. Cross-bus communication
+is not permitted.
+
+When a process wants to send a unicast message to a peer, it fills a
+sockaddr structure and performs a socket operation (e.g., sendto(2)):
+
+struct sockaddr_bus addr;
+char *msg = "Hello world";
+
+addr.sbus_family = AF_BUS;
+strcpy(addr.sbus_path, "/tmp/test");
+addr.sbus_addr.s_addr = 0x0001f00ddeadbeef;
+
+ret = sendto(sockfd, msg, strlen(msg), 0,
+	     (struct sockaddr*)&addr, sizeof(addr));
+
+The current implementation requires that the addr.sbus_path component
+match the one used to connect() to the bus but in future this
+requirement will be removed.
+
+The kernel will first check that the socket is connected and that the
+bus path of the socket corresponds with the destination, then it will
+extract the prefix and client address from the bus address using
+fixed bitmasks (16 bits for the prefix, 48 bits for the client
+address).
+
+prefix         = bus address >> 48 & 0xffff
+client address = bus address & 0x0000ffffffffffff
+
+If the client address is not all bits one, then the message is unicast
+and is delivered to the socket with that assigned address
+(0x0001f00ddeadbeef). Otherwise the message is multicast and is
+delivered to all the peers with this address prefix (0x0001 in this
+case).
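
The address split and the unicast/multicast decision described above can
be sketched as a handful of small C helpers. All names below (bus_prefix(),
bus_client(), bus_is_multicast(), bus_make_addr(), bus_multicast_addr() and
the BUS_* macros) are hypothetical and chosen for illustration; they assume
the layout given earlier, with a 16-bit prefix in the most significant bits
and a 48-bit client address in the rest.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants derived from the 16-bit/48-bit layout. */
#define BUS_CLIENT_BITS 48
#define BUS_CLIENT_MASK ((UINT64_C(1) << BUS_CLIENT_BITS) - 1)

/* Most significant 16 bits: the address type prefix. */
static inline uint16_t bus_prefix(uint64_t addr)
{
	return (uint16_t)(addr >> BUS_CLIENT_BITS);
}

/* Least significant 48 bits: the client address within the prefix. */
static inline uint64_t bus_client(uint64_t addr)
{
	return addr & BUS_CLIENT_MASK;
}

/* A client address with all 48 bits set addresses every peer on the prefix. */
static inline bool bus_is_multicast(uint64_t addr)
{
	return bus_client(addr) == BUS_CLIENT_MASK;
}

/* Combine a prefix and a client address into a 64-bit bus address. */
static inline uint64_t bus_make_addr(uint16_t prefix, uint64_t client)
{
	return ((uint64_t)prefix << BUS_CLIENT_BITS) |
	       (client & BUS_CLIENT_MASK);
}

/* The reserved multicast address of a given prefix. */
static inline uint64_t bus_multicast_addr(uint16_t prefix)
{
	return bus_make_addr(prefix, BUS_CLIENT_MASK);
}
```

With these helpers, 0x0001f00ddeadbeef splits into prefix 0x0001 and client
address 0xf00ddeadbeef and is treated as unicast, while
bus_multicast_addr(0x0001) yields 0x0001ffffffffffff.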
+
+So, when a process wants to send a multicast message, it just has to
+fill the address structure with the address prefix + 0xffffffffffff:
+
+struct sockaddr_bus addr;
+char *msg = "Hello world";
+
+addr.sbus_family = AF_BUS;
+strcpy(addr.sbus_path, "/tmp/test");
+addr.sbus_addr.s_addr = 0x0001ffffffffffff;
+
+ret = sendto(sockfd, msg, strlen(msg), 0,
+	     (struct sockaddr*)&addr, sizeof(addr));
+
+The kernel will apply the bitwise AND operation, learn that the
+client address is 0xffffffffffff and send the message to all the
+peers on this prefix (0x0001).
+
+Socket transmit queued bytes are limited by a maximum send buffer size
+(sysctl_wmem_max) defined in the kernel and can be modified at runtime
+using the sysctl interface on /proc/sys/net/core/wmem_max. This
+parameter is global for all the socket families in a Linux system.
+
+AF_BUS permits the definition of a per-bus maximum send buffer size
+using the BUS_SET_SENDBUF socket option. The bus master can call the
+setsockopt(2) system call, passing the listening socket as a parameter.
+The command sets a maximum write buffer that will be imposed on each
+new socket that connects to the bus:
+
+ret = setsockopt(serverfd, SOL_BUS, BUS_SET_SENDBUF, &sndbuf,
+		 sizeof(int));
+
+In the transmission path both Berkeley Packet Filters and Netfilter
+hooks are available, so they can be used to filter sending packets.
+
+
+Using this addressing scheme with D-Bus
+---------------------------------------
+
+As an example of a use case for AF_BUS, let's analyze how the D-Bus
+IPC system can be implemented on top of it.
+
+We define a new D-Bus address type "afbus".
+
+A D-Bus client may connect to an address of the form "afbus:path=X"
+where X is a string. This means that it connect()s to { AF_BUS, 0, X }.
+
+For example: afbus:path=/tmp/test connects to { AF_BUS, 0, /tmp/test }.
+
+A D-Bus daemon may listen on the address "afbus:", which means that it
+binds to { AF_BUS, 0, /tmp/test }.
+It will advertise an address of the form "afbus:path=/tmp/test" to
+clients, for instance via the --print-address option, or via
+dbus-launch setting the DBUS_SESSION_BUS_ADDRESS environment
+variable. For instance, "afbus:" is an appropriate default listening
+address for the session bus, resulting in dbus-launch setting the
+DBUS_SESSION_BUS_ADDRESS environment variable to something like
+"afbus:path=/tmp/test,guid=...".
+
+A D-Bus daemon may listen on the address "afbus:file=/some/file",
+which means that it will do as above, then write its path into the
+given well-known file. For instance,
+"afbus:file=/run/dbus/system_bus.afbus" is an appropriate listening
+address for the system bus. Only processes with suitable privileges to
+write to that file can impersonate the system bus.
+
+D-Bus clients wishing to connect to the well-known system bus should
+attempt to connect to afbus:file=/run/dbus/system_bus.afbus, falling
+back to unix:path=/var/run/dbus/system_bus_socket if that fails. On
+Linux systems, the well-known system bus daemon should attempt to
+listen on both of those addresses.
+
+The D-Bus daemon will serve as bus master as well since it will be the
+process that creates and listens on the AF_BUS socket.
+
+D-Bus clients will use the fixed bus master address (all zero bits) to
+send messages to the D-Bus daemon and the client's unique address to
+send messages to other D-Bus clients using the bus.
+
+When initially connected, D-Bus clients will only be able to
+communicate with the D-Bus daemon and will send authentication
+information (AUTH message and SCM_CREDENTIALS ancillary
+messages). Since the D-Bus daemon is also the bus master, it can allow
+D-Bus clients to join the bus and be able to send and receive D-Bus
+messages from other peers.
+
+On connection, the kernel will assign to each client an address in the
+prefix 0x0000.
+If a client attempts to send messages to clients other than the bus
+master, this is considered to be an error, and is prevented by the
+kernel.
+
+When the D-Bus daemon has authenticated a client and determined that
+it is authorized to be on this bus, it uses a setsockopt(2) call to
+tell the kernel that this client has permission to send messages. The
+D-Bus daemon then tells the client by sending the Hello() reply that
+it has made the setsockopt(2) call and that the client is now able to
+send messages to other peers on the bus.
+
+Well-known names are represented by addresses in the 0x0001, ... prefixes.
+
+Addresses in prefix 0x0000 must be mapped to D-Bus unique names in a
+way that can't collide with unique names allocated by the dbus-daemon
+for legacy clients.
+
+In order to be consistent with current D-Bus unique naming, the AF_BUS
+addresses can be mapped directly to D-Bus unique names (for example,
+0000/0000deadbeef to ":0.deadbeef"). Leading zeroes can be suppressed
+since the common case should be relatively small numbers (the kernel
+allocates client addresses sequentially, and machines could be
+rebooted occasionally).
+
+By having both AF_BUS and legacy D-Bus clients use the same address
+space, the D-Bus daemon can act as a proxy between clients and can be
+sure that D-Bus unique names will be unique for both AF_BUS and legacy
+clients.
+
+To act as a proxy between AF_BUS and legacy clients, each time the
+D-Bus daemon accepts a legacy connection (i.e., AF_UNIX), it will
+create an AF_BUS socket and establish a connection with itself. It
+will then associate this newly created connection with the legacy one.
+
+To explain it graphically:
+
+        L             The AF_BUS listening socket   }
+     /  |  \                                        }-- listener process
+   A1   A2  A3        The AF_BUS accepted sockets   }
+    |   |   |
+   C1   C2  C3        The AF_BUS connected sockets, where:
+    |                   * C1 belongs to the listener process
+    |                   * C2 and C3 belong to the client processes
+    |
+   L2--A4             The AF_UNIX listening and accepted sockets \
+       |              in the listener process
+       C4             The AF_UNIX connected socket in the legacy client process
+
+
+where C2 and C3 are normal AF_BUS clients and C4 is a legacy
+client. The D-Bus daemon, after accepting the connection using the
+legacy transport (A4), will create an AF_BUS socket pair (C1, A1)
+associated with the legacy client.
+
+Legacy clients will send messages to the D-Bus daemon using their
+legacy socket and the D-Bus daemon will extract the destination
+address, resolve to the corresponding AF_BUS address and use this to
+send the message to the right peer.
+
+Conversely, when an AF_BUS client sends a D-Bus message to a legacy
+client, it will use the AF_BUS address of the connection associated
+with that client. The D-Bus daemon will receive the message, modify
+the message's content to set SENDER headers based on the AF_BUS source
+address and use the legacy transport to send the D-Bus message to the
+legacy client.
+
+As a special case, the bus daemon's all-zeroes address maps to
+"org.freedesktop.DBus" and vice versa.
+
+When a D-Bus client receives an AF_BUS message from the bus master
+(0/0), it must use the SENDER header field in the D-Bus message, as
+for any other D-Bus transport, to determine whether the message is
+actually from the D-Bus daemon (the SENDER is "org.freedesktop.DBus"
+or missing), or from another client (the SENDER starts with ":"). It
+is valid for messages from another AF_BUS client to be received via
+the D-Bus daemon; if they are, the SENDER header field will always be
+set.
+
+Besides their unique names, D-Bus services can have well-known names
+such as org.gnome.Keyring or org.freedesktop.Telepathy.
+These well-known names can also be used as a D-Bus message destination
+address. Well-known names are not numeric and AF_BUS is not able to
+parse D-Bus messages.
+
+To solve this, the D-Bus daemon will assign an additional AF_BUS
+address to each D-Bus client that owns a well-known name. The mapping
+between well-known names and AF_BUS addresses is maintained by the
+D-Bus daemon in a persistent data structure.
+
+D-Bus client libraries will maintain a cache of these mappings so they
+can send messages to services with well-known names using their mapped
+AF_BUS address.
+
+If a client intending to send a D-Bus message to a given well-known
+name does not have that well-known name in its cache, it must send the
+AF_BUS message to the listener (0000/000000000000) instead.
+
+The listener must forward the D-Bus message to the owner of that
+well-known name, setting the SENDER header field if necessary. It may
+also send this AF_BUS-specific D-Bus signal to the sender, so that the
+sender can update its cache:
+
+  org.freedesktop.DBus.AF_BUS.Forwarded (STRING well_known_name,
+                                         UINT64 af_bus_client)
+
+    Emitted by the D-Bus daemon with sender "org.freedesktop.DBus"
+    and object path "/org/freedesktop/DBus" to indicate that
+    the well-known name well_known_name is represented by the
+    AF_BUS address { AF_BUS, af_bus_client, path } where
+    path is the path name used by this bus.
+
+    For instance, if the well-known name "org.gnome.Keyring"
+    is represented by AF_BUS address 0001/0000deadbeef,
+    the signal would have arguments ("org.gnome.Keyring",
+    0x00010000deadbeef), corresponding to the AF_BUS
+    address { AF_BUS, 0x00010000deadbeef, /tmp/test }.
+
+If the D-Bus service for that well-known name is not active, then the
+D-Bus daemon will first do the service activation, assign an
+additional address to the recently activated service, store the
+well-known service to numeric address mapping in its persistent cache,
+and then send the AF_BUS.Forwarded signal back to the client.
+
+Once the mapping has been made, the AF_BUS address associated with a
+well-known name cannot be reused for the lifetime of the D-Bus daemon
+(which is the same as the lifetime of the socket).
+
+Nevertheless the AF_BUS address associated with a well-known name can
+change, for example if a service goes away and a new instance gets
+activated. This new instance can have a different AF_BUS address. The
+D-Bus daemon will maintain a list of the mappings that are currently
+valid so it can send the AF_BUS.Forwarded signal with the mapping
+information to the clients. Client libraries will maintain a
+fixed-size Least Recently Used (LRU) cache with previous mappings sent
+by the D-Bus daemon.
+
+If the clients overwrite a mapping due to the LRU replace policy and
+later want to send a D-Bus message to the overwritten well-known name,
+they will send the D-Bus message back to the D-Bus daemon and the
+daemon will send the signal with the mapping information.
+
+If a service goes away or if the service AF_BUS address changed and
+the client still has the old AF_BUS address in its cache, it will send
+the D-Bus message to the old destination.
+
+Since packets whose destination AF_BUS addresses are not assigned to
+any process are routed by default to the bus master, the D-Bus daemon
+will receive these D-Bus messages and send an AF_BUS.Forwarded signal
+back to the client with the new AF_BUS address so it can update its
+cache with the new mapping.
+
+For well-known names, the D-Bus daemon will use a different address
+prefix (0x0001) so it doesn't conflict with the D-Bus unique names
+address prefix (0x0000).
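
The client-side cache of well-known name mappings described above could
look roughly like the following sketch. It is not taken from any D-Bus
client library: the structure and function names, the slot count and the
name buffer size are all hypothetical. A lookup miss returns the bus
master address (0), matching the rule that unknown names are sent to the
listener, and name_cache_update() stands in for handling an
AF_BUS.Forwarded signal.

```c
#include <stdint.h>
#include <string.h>

#define NAME_CACHE_SLOTS 4          /* hypothetical fixed cache size */
#define NAME_BUF_LEN     256        /* hypothetical name buffer size */
#define BUS_MASTER_ADDR  UINT64_C(0)

struct name_cache_entry {
	char name[NAME_BUF_LEN];    /* well-known name */
	uint64_t addr;              /* mapped AF_BUS address */
	uint64_t last_used;         /* 0 means the slot is empty */
};

/* Must be zero-initialized before use, e.g. struct name_cache c = {0}; */
struct name_cache {
	struct name_cache_entry slot[NAME_CACHE_SLOTS];
	uint64_t clock;             /* increases on every hit or update */
};

/* Return the cached address, or BUS_MASTER_ADDR on a miss. */
static uint64_t name_cache_lookup(struct name_cache *c, const char *name)
{
	for (size_t i = 0; i < NAME_CACHE_SLOTS; i++) {
		struct name_cache_entry *e = &c->slot[i];

		if (e->last_used != 0 && strcmp(e->name, name) == 0) {
			e->last_used = ++c->clock;
			return e->addr;
		}
	}
	return BUS_MASTER_ADDR;
}

/*
 * Record a mapping announced by the daemon. An existing entry for the
 * same name is overwritten; otherwise an empty slot or the least
 * recently used slot is replaced.
 */
static void name_cache_update(struct name_cache *c, const char *name,
			      uint64_t addr)
{
	struct name_cache_entry *victim = &c->slot[0];

	for (size_t i = 0; i < NAME_CACHE_SLOTS; i++) {
		struct name_cache_entry *e = &c->slot[i];

		if (e->last_used != 0 && strcmp(e->name, name) == 0) {
			victim = e;
			break;
		}
		if (e->last_used < victim->last_used)
			victim = e;
	}
	strncpy(victim->name, name, NAME_BUF_LEN - 1);
	victim->name[NAME_BUF_LEN - 1] = '\0';
	victim->addr = addr;
	victim->last_used = ++c->clock;
}
```

A sender would call name_cache_lookup() before each message and use the
returned value as the destination bus address. A stale entry needs no
special handling in this sketch, because a message to an unassigned
address reaches the daemon, whose AF_BUS.Forwarded signal then overwrites
the entry through name_cache_update().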
+
+Besides D-Bus method call messages which are unicast, D-Bus allows
+clients to send multicast messages (D-Bus signals). Clients can send
+signal messages using the bus unique name prefix multicast address
+(0x0001ffffffffffff).
+
+A netfilter hook is used to filter these multicast messages and only
+deliver to the correct peers based on match rules.
+
+
+D-Bus aware netfilter module
+----------------------------
+
+AF_BUS is designed to be a generic bus transport supporting both
+unicast and multicast communications.
+
+In order for D-Bus to operate efficiently, the transport method has to
+know the D-Bus message wire-protocol and D-Bus message structure. But
+adding this D-Bus specific knowledge to AF_BUS would break one of the
+fundamental design principles of any network protocol stack, namely
+layer-independence: layer n must not make any assumptions about the
+payload in layer n + 1.
+
+So, in order to have a clean protocol design but be able to allow the
+transport to analyze the D-Bus messages, netfilter hooks are used to
+do the filtering based on match rules.
+
+The kernel module has to maintain the match rules and the D-Bus daemon
+is responsible for managing this information. Every time an add match
+rule message is processed by the D-Bus daemon, the daemon will update
+the netfilter module match rules set so the netfilter hook function
+can use that information to do the match rules based filtering.
+
+The D-Bus daemon and the netfilter module will use the generic netlink
+subsystem to do the kernel-to-user-space communication. Netlink is
+already used by most of the networking subsystem in Linux
+(iptables/netfilter, ip/routing, etc).
+
+We enforce a security scheme so only the bus master's user ID can
+update the netfilter module match rules set.
+
+The advantage of using the netfilter subsystem is that we decouple the
+mechanism from the policy. AF_BUS will only add a set of hook points
+and external modules will be used to enforce a given policy.
--
1.7.10