From: cee1
Date: Fri, 31 Jul 2015 18:09:43 +0800
Subject: Re: Revisit AF_BUS: is it a better way to implement KDBUS?
To: LKML
Cc: Greg KH, Lennart Poettering, David Herrmann, gnomes@lxorguk.ukuu.org.uk, luto@amacapital.net
X-Mailing-List: linux-kernel@vger.kernel.org

In a nutshell, this AF_BUS:
1. For privileged operations, bus endpoints send requests to the bus master,
   and the bus master replies with a cmsg (control message, e.g. one that
   tells the kernel to assign the specified sockaddr_bus).
2. The bus master allocates sockaddr_bus addresses.
3. There are three kinds of sockaddr_bus:
   * The normal ones
   * Multicast addresses (last char of sbus_path is '*')
   * Kernel notification addr (sbus_addr == NULL)
4. It is Bloom-filter friendly (i.e. the multicast logic).

2015-07-30 21:09 GMT+08:00 cee1:
> Hi all,
>
> I'm interested in the idea of AF_BUS.
>
> There have already been various discussions about it:
> * Missing the AF_BUS - https://lwn.net/Articles/504970/
> * Kroah-Hartman: AF_BUS, D-Bus, and the Linux kernel -
> http://lwn.net/Articles/537021/
> * presentation-kdbus -
> https://github.com/gregkh/presentation-kdbus/blob/master/kdbus.txt
> * Re: [GIT PULL] kdbus for 4.1-rc1 - https://lwn.net/Articles/641278/
> * The kdbuswreck - https://lwn.net/Articles/641275/
>
> I'm wondering whether it is a better way, that is, a general mechanism
> to implement various __Bus__ orientated IPCs, such as Binder[2],
> DBus[1], etc.
>
> The original design of AF_BUS is at
> https://github.com/Airtau/genivi/blob/master/af_bus-linux/0002-net-bus-Add-AF_BUS-documentation.patch.
> And the following is my version of AF_BUS.
>
> Some characteristics of a Bus orientated IPC:
> 1. A process creates a Bus, the process is then called 'bus master'.
> 2. A process connects to a Bus and is assigned Bus address(es).
> 3. Sending/receiving multicast messages, in addition to P2P communication.
> 4. The implementation may be based on a shared memory model to avoid
> unnecessary copies.
>
> ## How to map point 1: """A process creates a Bus, the process is then
> called 'bus master'"""
> ### The [bus master] acts:
>
>     struct sockaddr_bus {
>         sa_family_t    sbus_family;                  /* AF_BUS */
>         unsigned short sbus_addr_ncomp;              /* number of components of sbus_addr */
>         char           sbus_path[BUS_PATH_MAX];      /* pathname of this bus */
>         uint64_t       sbus_addr[BUS_ADDR_COMP_MAX]; /* address within the bus */
>     };
>     #define BUS_ADDR_MAX (BUS_ADDR_COMP_MAX * sizeof(uint64_t))
>
>     char bus_path[] = "/tmp/test"; /* non-abstract path */
>     char bus_addr[] = "org.example.bus";
>     struct sockaddr_bus addr = { .sbus_family = AF_BUS };
>
>     strncpy(addr.sbus_path, bus_path, BUS_PATH_MAX - 2);
>     memcpy(addr.sbus_addr, bus_addr, MIN(sizeof(bus_addr), BUS_ADDR_MAX));
>     addr.sbus_addr_ncomp = MIN(ALIGN(sizeof(bus_addr), 8) / 8, BUS_ADDR_COMP_MAX);
>
>     bus_fd = socket(AF_BUS, SOCK_DGRAM, 0);
>     /* creates a Bus, becomes the master of the bus */
>     bind(bus_fd, (struct sockaddr *)&addr, sizeof(struct sockaddr_bus));
>
>
> ## How to map point 2: """Connects to a Bus, be assigned Bus address(es)"""
> ### The [bus endpoint] acts:
>     fd = socket(AF_BUS, SOCK_DGRAM, 0);
>
>     /* AUTH message setup */
>     struct msghdr msghdr = {
>         .msg_name = &addr, /* bus master's addr */
>         .msg_namelen = sizeof(struct sockaddr_bus),
>         .msg_iov = &auth_iovec,
>         .msg_iovlen = 1,
>     };
>
>     msghdr.msg_controllen = CMSG_SPACE(sizeof(struct ucred));
>     msghdr.msg_control = alloca(msghdr.msg_controllen);
>     cmsg = CMSG_FIRSTHDR(&msghdr);
>     cmsg->cmsg_level = SOL_SOCKET;
>     cmsg->cmsg_type = SCM_CREDENTIALS;
>     cmsg->cmsg_len = CMSG_LEN(sizeof(struct ucred));
>     ucred = (struct ucred *) CMSG_DATA(cmsg);
>     ucred->pid = getpid();
>     ucred->uid = getuid();
>     ucred->gid = getgid();
>
>     sendmsg(fd, &msghdr, MSG_NOSIGNAL);
>
> ### The [bus master] acts:
>     int optval = 1;
>     setsockopt(bus_fd, SOL_SOCKET, SO_PASSCRED, &optval, sizeof(optval));
>     recvmsg(bus_fd, &msghdr, MSG_NOSIGNAL);
>
>     /* do AUTH ... */
>
>     msghdr.msg_iov = &reply_iovec;
>     msghdr.msg_iovlen = 1;
>     msghdr.msg_controllen = 0;
>     msghdr.msg_control = NULL;
>
>     if (auth_ok) {
>         /* bus master allocates a bus addr */
>         char bus_path[] = "/tmp/test";
>         char ret_bus_addr[] = "1.1";
>         struct sockaddr_bus ret_addr = { .sbus_family = AF_BUS };
>
>         strncpy(ret_addr.sbus_path, bus_path, BUS_PATH_MAX - 2);
>         memcpy(ret_addr.sbus_addr, ret_bus_addr,
>                MIN(sizeof(ret_bus_addr), BUS_ADDR_MAX));
>         ret_addr.sbus_addr_ncomp = MIN(ALIGN(sizeof(ret_bus_addr), 8) / 8,
>                                        BUS_ADDR_COMP_MAX);
>
>         /*
>          * 1. bus master returns the bus addr
>          * 2. kernel will apply it against the bus endpoint
>          * 3. the bus endpoint is then able to talk with endpoints on the bus.
>          */
>         msghdr.msg_controllen = CMSG_SPACE(sizeof(struct sockaddr_bus));
>         msghdr.msg_control = alloca(msghdr.msg_controllen);
>         cmsg = CMSG_FIRSTHDR(&msghdr);
>         cmsg->cmsg_level = BUS_SOCKET;
>         cmsg->cmsg_type = SCM_OWNED_ADDR;
>         cmsg->cmsg_len = CMSG_LEN(sizeof(struct sockaddr_bus));
>         memcpy(CMSG_DATA(cmsg), &ret_addr, sizeof(struct sockaddr_bus));
>     }
>     sendmsg(bus_fd, &msghdr, MSG_NOSIGNAL);
>
>
> ## How to map point 3: """Sending/Receiving multicast message, in
> addition to P2P communication"""
> ### P2P communication
> Sometimes a bus endpoint may be assigned multiple addresses. It may
> want to send a message from a specific one of them.
>
>     struct msghdr msghdr = {
>         .msg_name = &dst_addr,
>         .msg_namelen = sizeof(struct sockaddr_bus),
>         .msg_iov = &msg_iovec,
>         .msg_iovlen = 1,
>     };
>
>     char bus_path[] = "/tmp/test";
>     char bus_addr[] = "com.example.service1";
>     struct sockaddr_bus src_addr = { .sbus_family = AF_BUS };
>
>     strncpy(src_addr.sbus_path, bus_path, BUS_PATH_MAX - 2);
>     memcpy(src_addr.sbus_addr, bus_addr, MIN(sizeof(bus_addr), BUS_ADDR_MAX));
>     src_addr.sbus_addr_ncomp = MIN(ALIGN(sizeof(bus_addr), 8) / 8,
>                                    BUS_ADDR_COMP_MAX);
>
>     /* name the source address in a control message */
>     msghdr.msg_controllen = CMSG_SPACE(sizeof(struct sockaddr_bus));
>     msghdr.msg_control = alloca(msghdr.msg_controllen);
>     cmsg = CMSG_FIRSTHDR(&msghdr);
>     cmsg->cmsg_level = BUS_SOCKET;
>     cmsg->cmsg_type = SCM_SRC_ADDR;
>     cmsg->cmsg_len = CMSG_LEN(sizeof(struct sockaddr_bus));
>     memcpy(CMSG_DATA(cmsg), &src_addr, sizeof(struct sockaddr_bus));
>
>     sendmsg(my_sock_fd, &msghdr, MSG_NOSIGNAL);
>
> ### Multicast
> The multicast address may look like:
>     {
>         .sbus_family = AF_BUS,
>
>         /* In a multicast addr, its bus_path is '*'-terminated */
>         .sbus_path = "/tmp/test\0\0\0\0\0...*",
>
>         .sbus_addr_ncomp = 8,
>         .sbus_addr = /* 8 * 64bits bitarray for example */
>     }
>
> The receiver asks the [bus master] for permission to receive
> messages from a set of multicast addresses, and the bus master grants
> it by replying with a control message:
>     {
>         .cmsg_level = BUS_SOCKET,
>         .cmsg_type = SCM_MULTICAST_MATCH,
>         .cmsg_data = /* the requested struct sockaddr_bus */
>     }
>
> How does matching happen?
> Let's assume someone sends a message to multicast address maddr1, and
> the receiver has been granted a match of maddr2:
>
> The [kernel]:
>     is_matched = (maddr1 & maddr2) == maddr2;
>
> In this way, userspace can deploy bloom filters, and then it may
> further apply eBPF to filter out the "false positive" cases.
>
> ## How to avoid unnecessary copy?
> A sockopt similar to PACKET_RX_RING[3] may be introduced, which brings
> a mmap/shared memory style API.
>
>
> ## Other thoughts
> 1. The bus master may want to receive notifications from the kernel,
> such as "a bus endpoint died". A special sockaddr_bus "{
> .sbus_addr_ncomp = 0, .sbus_addr = NULL }" indicates a message from
> the kernel.
> 2. A bus endpoint may pass a memfd to another bus endpoint, and then
> they communicate under a mmap/shared memory model, if ultimate
> performance is needed.
>
>
>
> ---
> 1. http://www.freedesktop.org/wiki/Software/dbus/
> 2. http://elinux.org/Android_Binder
> 3. http://man7.org/linux/man-pages/man7/packet.7.html
>
>
>
> Regards,
>
> - cee1

--
Regards,

- cee1
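
The multicast matching rule quoted above (a message is delivered when every bit set in the receiver's granted address is also set in the destination address) can be tried out in plain userspace C. What follows is only a sketch under assumptions: the value 8 for BUS_ADDR_COMP_MAX, the helper names bus_maddr_add_key() and bus_maddr_matches(), the choice of FNV-1a, and the use of 3 bits per key are all illustrative and are not part of the proposal or of any existing kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BUS_ADDR_COMP_MAX 8                      /* assumed: 8 x 64-bit components */
#define BUS_MADDR_BITS (BUS_ADDR_COMP_MAX * 64)  /* 512-bit filter */

/* 64-bit FNV-1a string hash; any reasonable hash would do here. */
static uint64_t fnv1a64(const char *s)
{
    uint64_t h = 0xcbf29ce484222325ULL;

    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Hypothetical helper: set 3 filter bits derived from `key`. */
static void bus_maddr_add_key(uint64_t *maddr, const char *key)
{
    uint64_t h1 = fnv1a64(key);
    uint64_t h2 = (h1 >> 32) | 1; /* odd step for double hashing */

    for (unsigned i = 0; i < 3; i++) {
        uint64_t bit = (h1 + i * h2) % BUS_MADDR_BITS;

        maddr[bit / 64] |= UINT64_C(1) << (bit % 64);
    }
}

/*
 * Hypothetical helper: true when every bit set in `granted` is also set
 * in `dest`, i.e. (dest & granted) == granted for each component.
 */
static bool bus_maddr_matches(const uint64_t *dest, const uint64_t *granted,
                              size_t ncomp)
{
    assert(ncomp <= BUS_ADDR_COMP_MAX);

    for (size_t i = 0; i < ncomp; i++) {
        if ((dest[i] & granted[i]) != granted[i])
            return false;
    }
    return true;
}
```

In this sketch a sender would build the destination address by adding every key that describes the message, while a receiver's granted address holds only the keys it subscribed to. A granted address built from a subset of the sender's keys always matches; an unrelated one may still match by accident, which is the "false positive" case that the eBPF filtering step is meant to handle. Note that an all-zero granted address matches every destination.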