From: cee1
Date: Thu, 30 Jul 2015 21:09:27 +0800
Subject: Revisit AF_BUS: is it a better way to implement KDBUS?
To: LKML
Cc: Greg KH, Lennart Poettering, dh.herrmann@gmail.com, gnomes@lxorguk.ukuu.org.uk, luto@amacapital.net

Hi all,

I'm interested in the idea of AF_BUS.

There have already been various discussions about it:
* Missing the AF_BUS - https://lwn.net/Articles/504970/
* Kroah-Hartman: AF_BUS, D-Bus, and the Linux kernel - http://lwn.net/Articles/537021/
* presentation-kdbus - https://github.com/gregkh/presentation-kdbus/blob/master/kdbus.txt
* Re: [GIT PULL] kdbus for 4.1-rc1 - https://lwn.net/Articles/641278/
* The kdbuswreck - https://lwn.net/Articles/641275/

I'm wondering whether AF_BUS is a better way, that is, a general mechanism for implementing various __Bus__ orientated IPCs, such as D-Bus[1], Binder[2], etc.

The original design of AF_BUS is at https://github.com/Airtau/genivi/blob/master/af_bus-linux/0002-net-bus-Add-AF_BUS-documentation.patch.

What follows is my version of AF_BUS.

Some characteristics of a Bus orientated IPC:
1. A process creates a Bus, the process is then called 'bus master'.
2. Connects to a Bus, be assigned Bus address(es).
3. Sending/Receiving multicast messages, in addition to P2P communication.
4. The implementation may be based on a shared memory model to avoid unnecessary copies.
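The snippets in this mail pack a string address into the 64-bit components of sbus_addr using MIN, ALIGN, BUS_PATH_MAX and BUS_ADDR_COMP_MAX, none of which are defined here. Below is a minimal sketch of that packing step, with assumed values for the limits and a local stand-in struct (`bus_addr_sketch` and `bus_addr_fill` are hypothetical names, not part of any AF_BUS patch). AF_BUS is not in mainline headers, so the family field is left at zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed limits: the mail does not give concrete values. */
#define BUS_PATH_MAX      108
#define BUS_ADDR_COMP_MAX 8
#define BUS_ADDR_MAX      (BUS_ADDR_COMP_MAX * sizeof(uint64_t))

/* Kernel-style helpers, defined locally so the sketch stands alone. */
#define MIN(a, b)   ((a) < (b) ? (a) : (b))
#define ALIGN(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Hypothetical stand-in for the proposed struct sockaddr_bus. */
struct bus_addr_sketch {
    unsigned short sbus_family;                  /* would be AF_BUS */
    unsigned short sbus_addr_ncomp;              /* number of 64-bit components used */
    char           sbus_path[BUS_PATH_MAX];      /* pathname of the bus */
    uint64_t       sbus_addr[BUS_ADDR_COMP_MAX]; /* address within the bus */
};

/* Fill path and address the same way the snippets in this mail do. */
static void bus_addr_fill(struct bus_addr_sketch *a,
                          const char *path, const char *name)
{
    assert(a != NULL && path != NULL && name != NULL);

    /* Count the trailing NUL, like sizeof() on a char array does. */
    size_t len = strlen(name) + 1;

    memset(a, 0, sizeof(*a));
    strncpy(a->sbus_path, path, BUS_PATH_MAX - 2);
    memcpy(a->sbus_addr, name, MIN(len, BUS_ADDR_MAX));
    /* Round the byte length up to whole 8-byte components. */
    a->sbus_addr_ncomp =
        (unsigned short)MIN(ALIGN(len, 8) / 8, BUS_ADDR_COMP_MAX);
}
```

With these assumptions, "org.example.bus" takes 16 bytes including the NUL and therefore 2 components, while "1.1" takes 4 bytes and rounds up to 1 component.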
## How to map point 1: """A process creates a Bus, the process is then called 'bus master'"""

The [bus master] acts:

    struct sockaddr_bus {
        sa_family_t    sbus_family;                  /* AF_BUS */
        unsigned short sbus_addr_ncomp;              /* number of components of sbus_addr */
        char           sbus_path[BUS_PATH_MAX];      /* pathname of this bus */
        uint64_t       sbus_addr[BUS_ADDR_COMP_MAX]; /* address within the bus */
    };

    #define BUS_ADDR_MAX (BUS_ADDR_COMP_MAX * sizeof(uint64_t))

    char bus_path[] = "/tmp/test"; /* non-abstract path */
    char bus_addr[] = "org.example.bus";

    struct sockaddr_bus addr = { .sbus_family = AF_BUS };
    strncpy(addr.sbus_path, bus_path, BUS_PATH_MAX - 2);
    memcpy(addr.sbus_addr, bus_addr, MIN(sizeof(bus_addr), BUS_ADDR_MAX));
    addr.sbus_addr_ncomp = MIN(ALIGN(sizeof(bus_addr), 8) / 8, BUS_ADDR_COMP_MAX);

    int bus_fd = socket(AF_BUS, SOCK_DGRAM, 0);
    /* creates a Bus, becomes the master of the bus */
    bind(bus_fd, (struct sockaddr *)&addr, sizeof(struct sockaddr_bus));

## How to map point 2: """Connects to a Bus, be assigned Bus address(es)"""

### The [bus endpoint] acts:

    int fd = socket(AF_BUS, SOCK_DGRAM, 0);

    /* AUTH message setup */
    struct msghdr msghdr = {
        .msg_name    = &addr, /* bus master's addr */
        .msg_namelen = sizeof(struct sockaddr_bus),
        .msg_iov     = &auth_iovec,
        .msg_iovlen  = 1,
    };
    struct cmsghdr *cmsg;
    struct ucred *ucred;

    /* attach the sender's credentials as ancillary data */
    msghdr.msg_controllen = CMSG_SPACE(sizeof(struct ucred));
    msghdr.msg_control = alloca(msghdr.msg_controllen);

    cmsg = CMSG_FIRSTHDR(&msghdr);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_CREDENTIALS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(struct ucred));

    ucred = (struct ucred *) CMSG_DATA(cmsg);
    ucred->pid = getpid();
    ucred->uid = getuid();
    ucred->gid = getgid();

    sendmsg(fd, &msghdr, MSG_NOSIGNAL);

### The [bus master] acts:

    int optval = 1;
    setsockopt(bus_fd, SOL_SOCKET, SO_PASSCRED, &optval, sizeof(optval));

    recvmsg(bus_fd, &msghdr, 0);

    /* do AUTH ... */

    msghdr.msg_iov = &reply_iovec;
    msghdr.msg_iovlen = 1;
    msghdr.msg_controllen = 0;
    msghdr.msg_control = NULL;

    if (auth_ok) {
        /* bus master allocates a bus addr */
        char bus_path[] = "/tmp/test";
        char ret_bus_addr[] = "1.1";

        struct sockaddr_bus ret_addr = { .sbus_family = AF_BUS };
        strncpy(ret_addr.sbus_path, bus_path, BUS_PATH_MAX - 2);
        memcpy(ret_addr.sbus_addr, ret_bus_addr, MIN(sizeof(ret_bus_addr), BUS_ADDR_MAX));
        ret_addr.sbus_addr_ncomp = MIN(ALIGN(sizeof(ret_bus_addr), 8) / 8, BUS_ADDR_COMP_MAX);

        /*
         * 1. bus master returns the bus addr
         * 2. kernel will apply it against the bus endpoint
         * 3. the bus endpoint is then able to talk with endpoints on the bus.
         */
        msghdr.msg_controllen = CMSG_SPACE(sizeof(struct sockaddr_bus));
        msghdr.msg_control = alloca(msghdr.msg_controllen);

        cmsg = CMSG_FIRSTHDR(&msghdr);
        cmsg->cmsg_level = BUS_SOCKET;
        cmsg->cmsg_type = SCM_OWNED_ADDR;
        cmsg->cmsg_len = CMSG_LEN(sizeof(struct sockaddr_bus));
        memcpy(CMSG_DATA(cmsg), &ret_addr, sizeof(struct sockaddr_bus));
    }

    sendmsg(bus_fd, &msghdr, MSG_NOSIGNAL);

## How to map point 3: """Sending/Receiving multicast messages, in addition to P2P communication"""

### P2P communication

Sometimes a bus endpoint may be assigned multiple addresses, and it may want to send a message from a specific one of them.
    struct msghdr msghdr = {
        .msg_name    = &dst_addr,
        .msg_namelen = sizeof(struct sockaddr_bus),
        .msg_iov     = &msg_iovec,
        .msg_iovlen  = 1,
    };

    char bus_path[] = "/tmp/test";
    char bus_addr[] = "com.example.service1";

    struct sockaddr_bus src_addr = { .sbus_family = AF_BUS };
    strncpy(src_addr.sbus_path, bus_path, BUS_PATH_MAX - 2);
    memcpy(src_addr.sbus_addr, bus_addr, MIN(sizeof(bus_addr), BUS_ADDR_MAX));
    src_addr.sbus_addr_ncomp = MIN(ALIGN(sizeof(bus_addr), 8) / 8, BUS_ADDR_COMP_MAX);

    /* select the source address through a control message */
    msghdr.msg_controllen = CMSG_SPACE(sizeof(struct sockaddr_bus));
    msghdr.msg_control = alloca(msghdr.msg_controllen);

    cmsg = CMSG_FIRSTHDR(&msghdr);
    cmsg->cmsg_level = BUS_SOCKET;
    cmsg->cmsg_type = SCM_SRC_ADDR;
    cmsg->cmsg_len = CMSG_LEN(sizeof(struct sockaddr_bus));
    memcpy(CMSG_DATA(cmsg), &src_addr, sizeof(struct sockaddr_bus));

    sendmsg(my_sock_fd, &msghdr, MSG_NOSIGNAL);

### Multicast

The multicast address may look like:

    {
        .sbus_family = AF_BUS,
        /* In a multicast addr, its bus_path is '*'-terminated */
        .sbus_path = "/tmp/test\0\0\0\0\0...*",
        .sbus_addr_ncomp = 8,
        .sbus_addr = /* 8 * 64bits bitarray for example */
    }

The receiver asks the [bus master] for permission to receive messages from a set of multicast addresses, and the bus master grants it by replying with a control message:

    {
        .cmsg_level = BUS_SOCKET,
        .cmsg_type = SCM_MULTICAST_MATCH,
        .cmsg_data = /* the requested struct sockaddr_bus */
    }

How does matching happen? Let's assume someone sends a message to multicast address maddr1, and the receiver has been granted a match of maddr2.

The [kernel] treats both addresses as bit arrays and computes:

    is_matched = (maddr1 & maddr2) == maddr2

In this way, userspace can deploy bloom filters, and it may further apply eBPF to filter out "false positive" cases.

## How to avoid unnecessary copy?

A sockopt similar to PACKET_RX_RING[3] may be introduced, which brings a mmap/shared memory style API.

## Other thoughts

1. The bus master may want to receive notifications from the kernel, such as "a bus endpoint died".
   A special sockaddr_bus "{ .sbus_addr_ncomp = 0, .sbus_addr = NULL }" indicates a message from the kernel.
2. A bus endpoint may pass a memfd to another bus endpoint, and then they communicate under a mmap/shared memory model, if they need ultimate performance.

---
1. http://www.freedesktop.org/wiki/Software/dbus/
2. http://elinux.org/Android_Binder
3. http://man7.org/linux/man-pages/man7/packet.7.html

Regards,

- cee1
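As a postscript to the multicast section: the match rule (every bit set in the granted maddr2 must also be set in the destination maddr1) can be tried out in userspace. This is only a sketch under assumptions: the 8 x 64-bit width comes from the example address in this mail, and `bus_mcast_set_bit` and `bus_mcast_match` are hypothetical names, not an existing kernel or library API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BUS_ADDR_COMP_MAX 8 /* assumed: 8 x 64 bits = 512-bit filter */

/*
 * Set one bit of a multicast address. In a bloom-filter deployment,
 * userspace would derive 'bit' from a hash of the match key.
 */
static void bus_mcast_set_bit(uint64_t *maddr, unsigned int bit)
{
    assert(bit < BUS_ADDR_COMP_MAX * 64);
    maddr[bit / 64] |= UINT64_C(1) << (bit % 64);
}

/*
 * maddr1: destination address of the message being sent.
 * maddr2: match granted to the receiver by the bus master.
 * Returns true when (maddr1 & maddr2) == maddr2 holds for every component.
 */
static bool bus_mcast_match(const uint64_t *maddr1, const uint64_t *maddr2)
{
    for (size_t i = 0; i < BUS_ADDR_COMP_MAX; i++) {
        if ((maddr1[i] & maddr2[i]) != maddr2[i])
            return false;
    }
    return true;
}
```

Note the parentheses around the AND: in C, `==` binds tighter than `&`, so the unparenthesized form would compare maddr2 with itself first. The check is also asymmetric: a message carrying more bits than the granted match still matches, which is what leaves room for bloom-filter false positives.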