References: <20181207232509.31771-1-christian@brauner.io> <20181212125126.4lhnjvrvaz5wwpib@brauner.io>
In-Reply-To: <20181212125126.4lhnjvrvaz5wwpib@brauner.io>
From: Todd Kjos
Date: Wed, 12 Dec 2018 09:54:03 -0800
Subject: Re: [PATCH v1] binder: implement binderfs
To: Christian Brauner
Cc: Greg Kroah-Hartman, Todd Kjos, Martijn Coenen, LKML, Arve Hjønnevåg, joel@joelfernandes.org, darrick.wong@oracle.com, david@fromorbit.com, kilobyte@angband.pl, "open list:ANDROID DRIVERS", chouryzhou@tencent.com
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Dec 12, 2018 at 4:51 AM Christian Brauner wrote: > > > On Fri, Dec 7, 2018 at 3:26 PM Christian Brauner wrote: > > > > > > As discussed at Linux Plumbers Conference 2018 in Vancouver [1] this is the > > > implementation of binderfs. > > > > > > binderfs is a backwards-compatible filesystem for Android's binder ipc > > > mechanism. Each ipc namespace will mount a new binderfs instance. Mounting > > > binderfs multiple times at different locations in the same ipc namespace > > > will not cause a new super block to be allocated and hence it will be the > > > same filesystem instance. > > > Each new binderfs mount will have its own set of binder devices only > > > visible in the ipc namespace it has been mounted in.
All devices in a new > > > binderfs mount will follow the scheme binder%d and numbering will always > > > start at 0. > > > > > > /* Backwards compatibility */ > > > Devices requested in the Kconfig via CONFIG_ANDROID_BINDER_DEVICES for the > > > initial ipc namespace will work as before. They will be registered via > > > misc_register() and appear in the devtmpfs mount. Specifically, the > > > standard devices binder, hwbinder, and vndbinder will all appear in their > > > standard locations in /dev. Mounting or unmounting the binderfs mount in > > > the initial ipc namespace will have no effect on these devices, i.e. they > > > will neither show up in the binderfs mount nor will they disappear when the > > > binderfs mount is gone. > > > > > > /* binder-control */ > > > Each new binderfs instance comes with a binder-control device. No other > > > devices will be present at first. The binder-control device can be used to > > > dynamically allocate binder devices. All requests operate on the binderfs > > > mount the binder-control device resides in: > > > - BINDER_CTL_ADD > > > Allocate a new binder device. > > > Assuming a new instance of binderfs has been mounted at /dev/binderfs via > > > mount -t binder binder /dev/binderfs, a request to create a new > > > binder device can be made via: > > > > > > struct binderfs_device device = {0}; > > > snprintf(device.name, BINDERFS_MAX_NAME, "%s", "my-device"); > > > int fd = open("/dev/binderfs/binder-control", O_RDWR); > > > ioctl(fd, BINDER_CTL_ADD, &device); > > > > > > binderfs will then allocate a new minor number and create the device > > > "my-device". > > > The struct binderfs_device will then be used to return the major and minor > > > number for the device. > > > Binderfs devices can simply be removed via unlink(). > > > > > > /* Implementation details */ > > > - When binderfs is registered as a new filesystem it will dynamically > > > allocate a new major number.
The allocated major number will be returned > > > in struct binderfs_device when a new binder device is allocated. > > > Minor numbers that have been given out are tracked in a global idr struct > > > that is capped at BINDERFS_MAX_MINOR. The minor number tracker is > > > protected by a global mutex. This is the only point of contention between > > > binderfs mounts. > > > - Each binderfs super block has its own struct binderfs_info that tracks > > > specific details about a binderfs instance: the ipc namespace, the dentry > > > of the binder-control device, and the root uid and gid of the user namespace > > > the binderfs instance was mounted in. > > > - binderfs can be mounted by user namespace root in a non-initial user > > > namespace. The devices will be owned by user namespace root. > > > - New binder devices associated with a binderfs mount do not use the > > > full misc_register() infrastructure. The misc_register() infrastructure > > > can only create new devices in the host's devtmpfs mount. binderfs, > > > however, only makes devices appear under its own mountpoint and thus > > > allocates new character device nodes from the inode of the root dentry > > > of the super block. This will have the side-effect that binderfs-specific > > > device nodes do not appear in sysfs. This behavior is similar to devpts-allocated > > > pts devices and has no effect on the functionality of the ipc > > > mechanism itself.
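For readers following the implementation notes above, here is a minimal userspace sketch of a capped, mutex-protected minor-number tracker. It is an illustration only and not code from the patch: SKETCH_MAX_MINOR, sketch_minor_alloc() and sketch_minor_free() are hypothetical names, a plain array stands in for the kernel's ID allocator, and a pthread mutex stands in for the global kernel mutex. The sketch assumes that the lowest free number is handed out first.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical cap for this sketch; the patch caps at BINDERFS_MAX_MINOR. */
#define SKETCH_MAX_MINOR 8

/* One global lock and one global table, shared by every "mount". */
static pthread_mutex_t sketch_minors_lock = PTHREAD_MUTEX_INITIALIZER;
static bool sketch_minors_used[SKETCH_MAX_MINOR];

/* Return the lowest free minor number, or -ENOSPC when all are taken. */
int sketch_minor_alloc(void)
{
	int minor = -ENOSPC;

	pthread_mutex_lock(&sketch_minors_lock);
	for (int i = 0; i < SKETCH_MAX_MINOR; i++) {
		if (!sketch_minors_used[i]) {
			sketch_minors_used[i] = true;
			minor = i;
			break;
		}
	}
	pthread_mutex_unlock(&sketch_minors_lock);

	return minor;
}

/* Release a minor number so that a later allocation can reuse it. */
void sketch_minor_free(int minor)
{
	if (minor < 0 || minor >= SKETCH_MAX_MINOR)
		return;

	pthread_mutex_lock(&sketch_minors_lock);
	sketch_minors_used[minor] = false;
	pthread_mutex_unlock(&sketch_minors_lock);
}
```

Because the table and the lock are global, every allocation and every release serializes on the same mutex regardless of which mount requested it, which is the single point of contention the description refers to.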
> > > > > > /* Create a new binder device in a binderfs mount */ > > > sudo mkdir /dev/binderfs > > > sudo mount -t binder binder /dev/binderfs > > > > > > #define _GNU_SOURCE > > > #include > > > #include > > > #include > > > #include > > > #include > > > #include > > > #include > > > #include > > > #include > > > #include > > > > > > int main(int argc, char *argv[]) > > > { > > > int fd, ret, saved_errno; > > > struct binderfs_device device = { 0 }; > > > > > > if (argc < 2) > > > exit(EXIT_FAILURE); > > > > > > strncpy(device.name, argv[1], sizeof(device.name)); > > > > > > fd = open("/dev/binderfs/binder-control", O_RDONLY | O_CLOEXEC); > > > if (fd < 0) { > > > printf("%s - Failed to open binder-control device\n", > > > strerror(errno)); > > > exit(EXIT_FAILURE); > > > } > > > > > > ret = ioctl(fd, BINDER_CTL_ADD, &device); > > > saved_errno = errno; > > > close(fd); > > > errno = saved_errno; > > > if (ret < 0) { > > > printf("%s - Failed to allocate new binder device\n", > > > strerror(errno)); > > > exit(EXIT_FAILURE); > > > } > > > > > > printf("Allocated new binder device with major %d, minor %d, and " > > > "name %s\n", device.major, device.minor, > > > device.name); > > > > > > exit(EXIT_SUCCESS); > > > } > > > > > > /* Demo */ > > > A demo of how binderfs works can be found under [2]. > > > > > > [1]: https://goo.gl/JL2tfX > > > > > > Cc: Martijn Coenen > > > Cc: Todd Kjos > > > Cc: Greg Kroah-Hartman > > > Signed-off-by: Christian Brauner > > Do we plan to bring this into mergeable shape before Christmas? I'm > happy to do it. :) It looks fine to me and I tested it on an android device. Acked-by: Todd Kjos > > Christian > > > > --- > > > v1: > > > - simplify init_binderfs() > > > Move the creation of binder-control into binderfs_fill_super() so that we can > > > cleanly and without any complex error handling deallocate the super block on > > > failure. 
> > > - switch from __u32 to __u8 in struct binderfs_device > > > __u8 is the correct type to cross the kernel <-> userspace boundary. > > > - introduce BINDERFS_MAX_NAME > > > This determines the maximum length of a binderfs binder device name. > > > - add name member to struct binderfs_device > > > This lets userspace specify a name for the binder device. The maximum length > > > is determined by BINDERFS_MAX_NAME. > > > - handle naming collisions > > > Since userspace now gives us a name to use for the device we need to handle > > > the case where userspace passes the same device name twice. This is done by > > > using d_lookup() (takes rename lock too). If a matching dentry under the root > > > dentry of the superblock is found we test whether it is still active and if > > > so return -EEXIST. > > > - remove per-super block idr tracking and locking > > > Since userspace now determines the name, remove the idr tracking; it is > > > not needed anymore. > > > - remove ctl_mutex from struct binderfs_info > > > It was only needed to protect the per-super block idr which has been removed. > > > If I'm not mistaken we currently do not need to lock the ioctl() itself since > > > removing is handled by a simple unlink(). So remove the mutex. > > > - ensure that binderfs_evict_inode() doesn't cause double-frees on iput() > > > When setting up the inode fails at a step where a new inode already has been > > > allocated we need to set inode->i_private to NULL to ensure that > > > binderfs_evict_inode() doesn't try to free up stuff that we already freed > > > while handling the error.
> > > --- > > > drivers/android/Kconfig | 12 + > > > drivers/android/Makefile | 1 + > > > drivers/android/binder.c | 25 +- > > > drivers/android/binder_internal.h | 49 ++ > > > drivers/android/binderfs.c | 565 ++++++++++++++++++++++++ > > > include/uapi/linux/android/binder_ctl.h | 35 ++ > > > include/uapi/linux/magic.h | 1 + > > > 7 files changed, 671 insertions(+), 17 deletions(-) > > > create mode 100644 drivers/android/binder_internal.h > > > create mode 100644 drivers/android/binderfs.c > > > create mode 100644 include/uapi/linux/android/binder_ctl.h > > > > > > diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig > > > index 51e8250d113f..4c190f8d1f4c 100644 > > > --- a/drivers/android/Kconfig > > > +++ b/drivers/android/Kconfig > > > @@ -20,6 +20,18 @@ config ANDROID_BINDER_IPC > > > Android process, using Binder to identify, invoke and pass arguments > > > between said processes. > > > > > > +config ANDROID_BINDERFS > > > + bool "Android Binderfs filesystem" > > > + depends on ANDROID_BINDER_IPC > > > + default n > > > + ---help--- > > > + Binderfs is a pseudo-filesystem for the Android Binder IPC driver > > > + which can be mounted per-ipc namespace allowing to run multiple > > > + instances of Android. > > > + Each binderfs mount initially only contains a binder-control device. > > > + It can be used to dynamically allocate new binder IPC devices via > > > + ioctls. 
> > > + > > > config ANDROID_BINDER_DEVICES > > > string "Android Binder devices" > > > depends on ANDROID_BINDER_IPC > > > diff --git a/drivers/android/Makefile b/drivers/android/Makefile > > > index a01254c43ee3..c7856e3200da 100644 > > > --- a/drivers/android/Makefile > > > +++ b/drivers/android/Makefile > > > @@ -1,4 +1,5 @@ > > > ccflags-y += -I$(src) # needed for trace events > > > > > > +obj-$(CONFIG_ANDROID_BINDERFS) += binderfs.o > > > obj-$(CONFIG_ANDROID_BINDER_IPC) += binder.o binder_alloc.o > > > obj-$(CONFIG_ANDROID_BINDER_IPC_SELFTEST) += binder_alloc_selftest.o > > > diff --git a/drivers/android/binder.c b/drivers/android/binder.c > > > index cb30a524d16d..3ed8bc4b7451 100644 > > > --- a/drivers/android/binder.c > > > +++ b/drivers/android/binder.c > > > @@ -78,6 +78,7 @@ > > > #include > > > > > > #include "binder_alloc.h" > > > +#include "binder_internal.h" > > > #include "binder_trace.h" > > > > > > static HLIST_HEAD(binder_deferred_list); > > > @@ -262,20 +263,6 @@ static struct binder_transaction_log_entry *binder_transaction_log_add( > > > return e; > > > } > > > > > > -struct binder_context { > > > - struct binder_node *binder_context_mgr_node; > > > - struct mutex context_mgr_node_lock; > > > - > > > - kuid_t binder_context_mgr_uid; > > > - const char *name; > > > -}; > > > - > > > -struct binder_device { > > > - struct hlist_node hlist; > > > - struct miscdevice miscdev; > > > - struct binder_context context; > > > -}; > > > - > > > /** > > > * struct binder_work - work enqueued on a worklist > > > * @entry: node enqueued on list > > > @@ -4935,8 +4922,12 @@ static int binder_open(struct inode *nodp, struct file *filp) > > > proc->tsk = current->group_leader; > > > INIT_LIST_HEAD(&proc->todo); > > > proc->default_priority = task_nice(current); > > > - binder_dev = container_of(filp->private_data, struct binder_device, > > > - miscdev); > > > + /* binderfs stashes devices in i_private */ > > > + if (is_binderfs_device(nodp)) > > > + 
binder_dev = nodp->i_private; > > > + else > > > + binder_dev = container_of(filp->private_data, > > > + struct binder_device, miscdev); > > > proc->context = &binder_dev->context; > > > binder_alloc_init(&proc->alloc); > > > > > > @@ -5724,7 +5715,7 @@ static int binder_transaction_log_show(struct seq_file *m, void *unused) > > > return 0; > > > } > > > > > > -static const struct file_operations binder_fops = { > > > +const struct file_operations binder_fops = { > > > .owner = THIS_MODULE, > > > .poll = binder_poll, > > > .unlocked_ioctl = binder_ioctl, > > > diff --git a/drivers/android/binder_internal.h b/drivers/android/binder_internal.h > > > new file mode 100644 > > > index 000000000000..7fb97f503ef2 > > > --- /dev/null > > > +++ b/drivers/android/binder_internal.h > > > @@ -0,0 +1,49 @@ > > > +/* SPDX-License-Identifier: GPL-2.0 */ > > > + > > > +#ifndef _LINUX_BINDER_INTERNAL_H > > > +#define _LINUX_BINDER_INTERNAL_H > > > + > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > + > > > +struct binder_context { > > > + struct binder_node *binder_context_mgr_node; > > > + struct mutex context_mgr_node_lock; > > > + kuid_t binder_context_mgr_uid; > > > + const char *name; > > > +}; > > > + > > > +/** > > > + * struct binder_device - information about a binder device node > > > + * @hlist: list of binder devices (only used for devices requested via > > > + * CONFIG_ANDROID_BINDER_DEVICES) > > > + * @miscdev: information about a binder character device node > > > + * @context: binder context information > > > + * @binderfs_inode: This is the inode of the root dentry of the super block > > > + * belonging to a binderfs mount. 
> > > + */ > > > +struct binder_device { > > > + struct hlist_node hlist; > > > + struct miscdevice miscdev; > > > + struct binder_context context; > > > + struct inode *binderfs_inode; > > > +}; > > > + > > > +extern const struct file_operations binder_fops; > > > + > > > +#ifdef CONFIG_ANDROID_BINDERFS > > > +extern bool is_binderfs_device(const struct inode *inode); > > > +#else > > > +static inline bool is_binderfs_device(const struct inode *inode) > > > +{ > > > + return false; > > > +} > > > +#endif > > > + > > > +#endif /* _LINUX_BINDER_INTERNAL_H */ > > > diff --git a/drivers/android/binderfs.c b/drivers/android/binderfs.c > > > new file mode 100644 > > > index 000000000000..ac435210eb53 > > > --- /dev/null > > > +++ b/drivers/android/binderfs.c > > > @@ -0,0 +1,565 @@ > > > +/* SPDX-License-Identifier: GPL-2.0 */ > > > + > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > + > > > +#include "binder_internal.h" > > > + > > > +#define FIRST_INODE 1 > > > +#define SECOND_INODE 2 > > > +#define INODE_OFFSET 3 > > > +#define INTSTRLEN 21 > > > +#define BINDERFS_MAX_MINOR (1U << MINORBITS) > > > + > > > +static struct vfsmount *binderfs_mnt; > > > + > > > +static dev_t binderfs_dev; > > > +static DEFINE_MUTEX(binderfs_minors_mutex); > > > +static DEFINE_IDA(binderfs_minors); > > > + > > > +/** > > > + * binderfs_info - information about a binderfs mount > > > + * @ipc_ns: The ipc namespace the binderfs mount belongs to. 
> > > + * @control_dentry: This records the dentry of this binderfs mount's > > > + * binder-control device. > > > + * @root_uid: uid that needs to be used when a new binder device is > > > + * created. > > > + * @root_gid: gid that needs to be used when a new binder device is > > > + * created. > > > + */ > > > +struct binderfs_info { > > > + struct ipc_namespace *ipc_ns; > > > + struct dentry *control_dentry; > > > + kuid_t root_uid; > > > + kgid_t root_gid; > > > + > > > +}; > > > + > > > +static inline struct binderfs_info *BINDERFS_I(struct inode *inode) > > > +{ > > > + return inode->i_sb->s_fs_info; > > > +} > > > + > > > +bool is_binderfs_device(const struct inode *inode) > > > +{ > > > + if (inode->i_sb->s_magic == BINDERFS_SUPER_MAGIC) > > > + return true; > > > + > > > + return false; > > > +} > > > + > > > +/** > > > + * binderfs_new_inode - allocate inode from super block of a binderfs mount > > > + * @ref_inode: inode from which the super block will be taken > > > + * @userp: buffer to copy information about new device for userspace to > > > + * @device: binder device for which the new inode will be allocated > > > + * @req: struct binderfs_device as copied from userspace > > > + * > > > + * This function will allocate a new inode from the super block of the > > > + * filesystem mount and attach a dentry to that inode. > > > + * Minor numbers are limited and tracked globally in binderfs_minors. > > > + * The function will stash a struct binder_device for the specific binder > > > + * device in i_private of the inode.
> > > + * > > > + * Return: 0 on success, negative errno on failure > > > + */ > > > +static int binderfs_new_inode(struct inode *ref_inode, > > > + struct binder_device *device, > > > + struct binderfs_device __user *userp, > > > + struct binderfs_device *req) > > > +{ > > > + int minor, ret; > > > + struct dentry *dentry, *dup, *root; > > > + size_t name_len = BINDERFS_MAX_NAME + 1; > > > + char *name = NULL; > > > + struct inode *inode = NULL; > > > + struct super_block *sb = ref_inode->i_sb; > > > + struct binderfs_info *info = sb->s_fs_info; > > > + > > > + /* Reserve new minor number for the new device. */ > > > + mutex_lock(&binderfs_minors_mutex); > > > + minor = ida_alloc_max(&binderfs_minors, BINDERFS_MAX_MINOR, GFP_KERNEL); > > > + mutex_unlock(&binderfs_minors_mutex); > > > + if (minor < 0) > > > + return minor; > > > + > > > + ret = -ENOMEM; > > > + inode = new_inode(sb); > > > + if (!inode) > > > + goto err; > > > + > > > + inode->i_ino = minor + INODE_OFFSET; > > > + inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode); > > > + init_special_inode(inode, S_IFCHR | 0600, > > > + MKDEV(MAJOR(binderfs_dev), minor)); > > > + inode->i_fop = &binder_fops; > > > + inode->i_uid = info->root_uid; > > > + inode->i_gid = info->root_gid; > > > + inode->i_private = device; > > > + > > > + name = kmalloc(name_len, GFP_KERNEL); > > > + if (!name) > > > + goto err; > > > + > > > + ret = snprintf(name, name_len, "%s", req->name); > > > + if (ret < 0 || (size_t)ret >= name_len) { > > > + ret = -EINVAL; > > > + goto err; > > > + } > > > + > > > + device->binderfs_inode = inode; > > > + device->context.binder_context_mgr_uid = INVALID_UID; > > > + device->context.name = name; > > > + device->miscdev.name = name; > > > + device->miscdev.minor = minor; > > > + mutex_init(&device->context.context_mgr_node_lock); > > > + > > > + req->major = MAJOR(binderfs_dev); > > > + req->minor = minor; > > > + > > > + ret = copy_to_user(userp, req, sizeof(*req)); > > > 
+ if (ret) > > > + goto err; > > > + > > > + root = sb->s_root; > > > + inode_lock(d_inode(root)); > > > + dentry = d_alloc_name(root, name); > > > + if (!dentry) { > > > + inode_unlock(d_inode(root)); > > > + ret = -ENOMEM; > > > + goto err; > > > + } > > > + > > > + /* Verify that the name userspace gave us is not already in use. */ > > > + dup = d_lookup(root, &dentry->d_name); > > > + if (dup) { > > > + if (d_really_is_positive(dup)) { > > > + dput(dup); > > > + dput(dentry); > > > + inode_unlock(d_inode(root)); > > > + /* > > > + * Prevent double free since iput() calls > > > + * binderfs_evict_inode(). > > > + */ > > > + inode->i_private = NULL; > > > + ret = -EEXIST; > > > + goto err; > > > + } > > > + dput(dup); > > > + } > > > + > > > + d_add(dentry, inode); > > > + fsnotify_create(root->d_inode, dentry); > > > + inode_unlock(d_inode(root)); > > > + > > > + return 0; > > > + > > > +err: > > > + kfree(name); > > > + mutex_lock(&binderfs_minors_mutex); > > > + ida_free(&binderfs_minors, minor); > > > + mutex_unlock(&binderfs_minors_mutex); > > > + iput(inode); > > > + > > > + return ret; > > > +} > > > + > > > +static int binderfs_binder_device_create(struct inode *inode, > > > + struct binderfs_device __user *userp, > > > + struct binderfs_device *req) > > > +{ > > > + struct binder_device *device; > > > + int ret; > > > + > > > + device = kzalloc(sizeof(*device), GFP_KERNEL); > > > + if (!device) > > > + return -ENOMEM; > > > + > > > + ret = binderfs_new_inode(inode, device, userp, req); > > > + if (ret < 0) { > > > + kfree(device); > > > + return ret; > > > + } > > > + > > > + return 0; > > > +} > > > + > > > +/** > > > + * binderfs_ctl_ioctl - handle binder device node allocation requests > > > + * > > > + * The request handler for the binder-control device. All requests operate on > > > + * the binderfs mount the binder-control device resides in: > > > + * - BINDER_CTL_ADD > > > + * Allocate a new binder device. 
> > > + * > > > + * Return: 0 on success, negative errno on failure > > > + */ > > > +static long binder_ctl_ioctl(struct file *file, unsigned int cmd, > > > + unsigned long arg) > > > +{ > > > + struct binderfs_info *info; > > > + int ret = -EINVAL; > > > + struct inode *inode = file_inode(file); > > > + struct binderfs_device *device = (struct binderfs_device __user *)arg; > > > + struct binderfs_device device_req; > > > + > > > + info = BINDERFS_I(inode); > > > + switch (cmd) { > > > + case BINDER_CTL_ADD: > > > + ret = copy_from_user(&device_req, device, sizeof(device_req)); > > > + if (ret) > > > + break; > > > + > > > + ret = binderfs_binder_device_create(inode, device, &device_req); > > > + break; > > > + default: > > > + break; > > > + } > > > + > > > + return ret; > > > +} > > > + > > > +static void binderfs_evict_inode(struct inode *inode) > > > +{ > > > + struct binder_device *device = inode->i_private; > > > + > > > + clear_inode(inode); > > > + > > > + if (!device) > > > + return; > > > + > > > + mutex_lock(&binderfs_minors_mutex); > > > + ida_free(&binderfs_minors, device->miscdev.minor); > > > + mutex_unlock(&binderfs_minors_mutex); > > > + > > > + kfree(device->context.name); > > > + kfree(device); > > > +} > > > + > > > +static const struct super_operations binderfs_super_ops = { > > > + .statfs = simple_statfs, > > > + .evict_inode = binderfs_evict_inode, > > > +}; > > > + > > > +static int binderfs_rename(struct inode *old_dir, struct dentry *old_dentry, > > > + struct inode *new_dir, struct dentry *new_dentry, > > > + unsigned int flags) > > > +{ > > > + struct inode *inode = d_inode(old_dentry); > > > + > > > + /* binderfs doesn't support directories. 
*/ > > > + if (d_is_dir(old_dentry)) > > > + return -EPERM; > > > + > > > + if (flags & ~RENAME_NOREPLACE) > > > + return -EINVAL; > > > + > > > + if (!simple_empty(new_dentry)) > > > + return -ENOTEMPTY; > > > + > > > + if (d_really_is_positive(new_dentry)) > > > + simple_unlink(new_dir, new_dentry); > > > + > > > + old_dir->i_ctime = old_dir->i_mtime = new_dir->i_ctime = > > > + new_dir->i_mtime = inode->i_ctime = current_time(old_dir); > > > + > > > + return 0; > > > +} > > > + > > > +static int binderfs_unlink(struct inode *dir, struct dentry *dentry) > > > +{ > > > + /* > > > + * The control dentry is only ever touched during mount so checking it > > > + * here should not require us to take lock. > > > + */ > > > + if (BINDERFS_I(dir)->control_dentry == dentry) > > > + return -EPERM; > > > + > > > + return simple_unlink(dir, dentry); > > > +} > > > + > > > +static const struct file_operations binder_ctl_fops = { > > > + .owner = THIS_MODULE, > > > + .open = nonseekable_open, > > > + .unlocked_ioctl = binder_ctl_ioctl, > > > + .compat_ioctl = binder_ctl_ioctl, > > > + .llseek = noop_llseek, > > > +}; > > > + > > > +/** > > > + * binderfs_binder_ctl_create - create a new binder-control device > > > + * @sb: super block of the binderfs mount > > > + * > > > + * This function creates a new binder-control device node in the binderfs mount > > > + * referred to by @sb. 
> > > + * > > > + * Return: 0 on success, negative errno on failure > > > + */ > > > +static int binderfs_binder_ctl_create(struct super_block *sb) > > > +{ > > > + int minor; > > > + struct dentry *dentry; > > > + struct binder_device *device; > > > + int ret = 0; > > > + struct inode *inode = NULL; > > > + struct dentry *root = sb->s_root; > > > + struct binderfs_info *info = sb->s_fs_info; > > > + > > > + device = kzalloc(sizeof(*device), GFP_KERNEL); > > > + if (!device) > > > + return -ENOMEM; > > > + > > > + inode_lock(d_inode(root)); > > > + > > > + if (info->control_dentry) > > > + goto out; > > > + > > > + ret = -ENOMEM; > > > + inode = new_inode(sb); > > > + if (!inode) > > > + goto out; > > > + > > > + /* Reserve a new minor number for the new device. */ > > > + mutex_lock(&binderfs_minors_mutex); > > > + minor = ida_alloc_max(&binderfs_minors, BINDERFS_MAX_MINOR, GFP_KERNEL); > > > + mutex_unlock(&binderfs_minors_mutex); > > > + if (minor < 0) { > > > + ret = minor; > > > + goto out; > > > + } > > > + > > > + inode->i_ino = SECOND_INODE; > > > + inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode); > > > + init_special_inode(inode, S_IFCHR | 0600, > > > + MKDEV(MAJOR(binderfs_dev), minor)); > > > + inode->i_fop = &binder_ctl_fops; > > > + inode->i_uid = info->root_uid; > > > + inode->i_gid = info->root_gid; > > > + inode->i_private = device; > > > + > > > + device->binderfs_inode = inode; > > > + device->miscdev.minor = minor; > > > + > > > + dentry = d_alloc_name(root, "binder-control"); > > > + if (!dentry) > > > + goto out; > > > + > > > + info->control_dentry = dentry; > > > + d_add(dentry, inode); > > > + inode_unlock(d_inode(root)); > > > + > > > + return 0; > > > + > > > +out: > > > + inode_unlock(d_inode(root)); > > > + kfree(device); > > > + if (inode) { > > > + inode->i_private = NULL; > > > + iput(inode); > > > + } > > > + > > > + return ret; > > > +} > > > + > > > +static const struct inode_operations 
binderfs_dir_inode_operations = { > > > + .lookup = simple_lookup, > > > + .rename = binderfs_rename, > > > + .unlink = binderfs_unlink, > > > +}; > > > + > > > +static int binderfs_fill_super(struct super_block *sb, void *data, int silent) > > > +{ > > > + struct binderfs_info *info; > > > + int ret = -ENOMEM; > > > + struct inode *inode = NULL; > > > + struct ipc_namespace *ipc_ns = sb->s_fs_info; > > > + > > > + get_ipc_ns(ipc_ns); > > > + > > > + sb->s_blocksize = PAGE_SIZE; > > > + sb->s_blocksize_bits = PAGE_SHIFT; > > > + > > > + /* > > > + * The binderfs filesystem can be mounted by userns root in a > > > + * non-initial userns. By default such mounts have the SB_I_NODEV flag > > > + * set in s_iflags to prevent security issues where userns root can > > > + * just create random device nodes via mknod() since it owns the > > > + * filesystem mount. But binderfs does not allow to create any files > > > + * including devices nodes. The only way to create binder devices nodes > > > + * is through the binder-control device which userns root is explicitly > > > + * allowed to do. So removing the SB_I_NODEV flag from s_iflags is both > > > + * necessary and safe. 
> > > +	 */
> > > +	sb->s_iflags &= ~SB_I_NODEV;
> > > +	sb->s_iflags |= SB_I_NOEXEC;
> > > +	sb->s_magic = BINDERFS_SUPER_MAGIC;
> > > +	sb->s_op = &binderfs_super_ops;
> > > +	sb->s_time_gran = 1;
> > > +
> > > +	info = kzalloc(sizeof(struct binderfs_info), GFP_KERNEL);
> > > +	if (!info)
> > > +		return ret;
> > > +
> > > +	info->ipc_ns = ipc_ns;
> > > +	info->root_gid = make_kgid(sb->s_user_ns, 0);
> > > +	if (!gid_valid(info->root_gid))
> > > +		info->root_gid = GLOBAL_ROOT_GID;
> > > +	info->root_uid = make_kuid(sb->s_user_ns, 0);
> > > +	if (!uid_valid(info->root_uid))
> > > +		info->root_uid = GLOBAL_ROOT_UID;
> > > +
> > > +	sb->s_fs_info = info;
> > > +
> > > +	inode = new_inode(sb);
> > > +	if (!inode)
> > > +		goto err_without_dentry;
> > > +
> > > +	inode->i_ino = FIRST_INODE;
> > > +	inode->i_fop = &simple_dir_operations;
> > > +	inode->i_mode = S_IFDIR | 0755;
> > > +	inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
> > > +	inode->i_op = &binderfs_dir_inode_operations;
> > > +	set_nlink(inode, 2);
> > > +
> > > +	sb->s_root = d_make_root(inode);
> > > +	if (!sb->s_root)
> > > +		goto err_without_dentry;
> > > +
> > > +	ret = binderfs_binder_ctl_create(sb);
> > > +	if (ret)
> > > +		goto err_with_dentry;
> > > +
> > > +	return 0;
> > > +
> > > +err_with_dentry:
> > > +	dput(sb->s_root);
> > > +	sb->s_root = NULL;
> > > +
> > > +err_without_dentry:
> > > +	if (inode)
> > > +		iput(inode);
> > > +	kfree(info);
> > > +	put_ipc_ns(ipc_ns);
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +static int binderfs_test_super(struct super_block *sb, void *data)
> > > +{
> > > +	struct binderfs_info *info = sb->s_fs_info;
> > > +
> > > +	if (info)
> > > +		return info->ipc_ns == data;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int binderfs_set_super(struct super_block *sb, void *data)
> > > +{
> > > +	sb->s_fs_info = data;
> > > +	return set_anon_super(sb, NULL);
> > > +}
> > > +
> > > +static struct dentry *binderfs_mount(struct file_system_type *fs_type,
> > > +				     int flags, const char *dev_name,
> > > +				     void *data)
> > > +{
> > > +	struct super_block *sb;
> > > +	struct ipc_namespace *ipc_ns = current->nsproxy->ipc_ns;
> > > +
> > > +	if (!ns_capable(ipc_ns->user_ns, CAP_SYS_ADMIN))
> > > +		return ERR_PTR(-EPERM);
> > > +
> > > +	sb = sget_userns(fs_type, binderfs_test_super, binderfs_set_super,
> > > +			 flags, ipc_ns->user_ns, ipc_ns);
> > > +	if (IS_ERR(sb))
> > > +		return ERR_CAST(sb);
> > > +
> > > +	if (!sb->s_root) {
> > > +		int ret = binderfs_fill_super(sb, data, flags & SB_SILENT ? 1 : 0);
> > > +		if (ret) {
> > > +			deactivate_locked_super(sb);
> > > +			return ERR_PTR(ret);
> > > +		}
> > > +
> > > +		sb->s_flags |= SB_ACTIVE;
> > > +	}
> > > +
> > > +	return dget(sb->s_root);
> > > +}
> > > +
> > > +static void binderfs_kill_super(struct super_block *sb)
> > > +{
> > > +	struct binderfs_info *info = sb->s_fs_info;
> > > +
> > > +	if (info && info->ipc_ns)
> > > +		put_ipc_ns(info->ipc_ns);
> > > +
> > > +	kfree(info);
> > > +	kill_litter_super(sb);
> > > +}
> > > +
> > > +static struct file_system_type binder_fs_type = {
> > > +	.name = "binder",
> > > +	.mount = binderfs_mount,
> > > +	.kill_sb = binderfs_kill_super,
> > > +	.fs_flags = FS_USERNS_MOUNT,
> > > +};
> > > +
> > > +static int __init init_binderfs(void)
> > > +{
> > > +	int ret;
> > > +
> > > +	/* Allocate new major number for binderfs. */
> > > +	ret = alloc_chrdev_region(&binderfs_dev, 0, BINDERFS_MAX_MINOR,
> > > +				  "binder");
> > > +	if (ret < 0)
> > > +		return ret;
> > > +
> > > +	ret = register_filesystem(&binder_fs_type);
> > > +	if (ret) {
> > > +		unregister_chrdev_region(binderfs_dev, BINDERFS_MAX_MINOR);
> > > +		return ret;
> > > +	}
> > > +
> > > +	binderfs_mnt = kern_mount(&binder_fs_type);
> > > +	if (IS_ERR(binderfs_mnt)) {
> > > +		ret = PTR_ERR(binderfs_mnt);
> > > +		binderfs_mnt = NULL;
> > > +		unregister_filesystem(&binder_fs_type);
> > > +		unregister_chrdev_region(binderfs_dev, BINDERFS_MAX_MINOR);
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +device_initcall(init_binderfs);
> > > diff --git a/include/uapi/linux/android/binder_ctl.h b/include/uapi/linux/android/binder_ctl.h
> > > new file mode 100644
> > > index 000000000000..65b2efd1a0a5
> > > --- /dev/null
> > > +++ b/include/uapi/linux/android/binder_ctl.h
> > > @@ -0,0 +1,35 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> > > +/*
> > > + * Copyright (C) 2018 Canonical Ltd.
> > > + *
> > > + */
> > > +
> > > +#ifndef _UAPI_LINUX_BINDER_CTL_H
> > > +#define _UAPI_LINUX_BINDER_CTL_H
> > > +
> > > +#include
> > > +#include
> > > +#include
> > > +
> > > +#define BINDERFS_MAX_NAME 255
> > > +
> > > +/**
> > > + * struct binderfs_device - retrieve information about a new binder device
> > > + * @name: the name to use for the new binderfs binder device
> > > + * @major: major number allocated for binderfs binder devices
> > > + * @minor: minor number allocated for the new binderfs binder device
> > > + *
> > > + */
> > > +struct binderfs_device {
> > > +	char name[BINDERFS_MAX_NAME + 1];
> > > +	__u8 major;
> > > +	__u8 minor;
> > > +};
> > > +
> > > +/**
> > > + * Allocate a new binder device.
> > > + */
> > > +#define BINDER_CTL_ADD _IOWR('b', 1, struct binderfs_device)
> > > +
> > > +#endif /* _UAPI_LINUX_BINDER_CTL_H */
> > > +
> > > diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
> > > index 96c24478d8ce..f8c00045d537 100644
> > > --- a/include/uapi/linux/magic.h
> > > +++ b/include/uapi/linux/magic.h
> > > @@ -73,6 +73,7 @@
> > >  #define DAXFS_MAGIC 0x64646178
> > >  #define BINFMTFS_MAGIC 0x42494e4d
> > >  #define DEVPTS_SUPER_MAGIC 0x1cd1
> > > +#define BINDERFS_SUPER_MAGIC 0x6c6f6f70
> > >  #define FUTEXFS_SUPER_MAGIC 0xBAD1DEA
> > >  #define PIPEFS_MAGIC 0x50495045
> > >  #define PROC_SUPER_MAGIC 0x9fa0
> > > --
> > > 2.19.1
> > >
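
For reference, here is a rough userspace sketch of how the BINDER_CTL_ADD
request from the quoted header might be prepared and issued. It is only an
illustration under several assumptions: the struct and the request number are
mirrored locally from the quoted binder_ctl.h rather than included from it,
uint8_t stands in for __u8, and the helpers binderfs_device_init() and
binderfs_add_device() are hypothetical names that do not appear in the patch.
The caller is assumed to have already opened the binder-control device of a
binderfs mount; where that mount lives is not specified here.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local mirror of the uapi definitions quoted above (assumption: real code
 * would include the installed header instead of redefining these). */
#define BINDERFS_MAX_NAME 255

struct binderfs_device {
	char name[BINDERFS_MAX_NAME + 1];
	uint8_t major;	/* __u8 in the quoted header */
	uint8_t minor;	/* __u8 in the quoted header */
};

#define BINDER_CTL_ADD _IOWR('b', 1, struct binderfs_device)

/*
 * Hypothetical helper: zero the request and copy in the device name.
 * Returns 0 on success, -1 if the name is empty or longer than
 * BINDERFS_MAX_NAME bytes.
 */
static int binderfs_device_init(struct binderfs_device *dev, const char *name)
{
	size_t len = strlen(name);

	if (len == 0 || len > BINDERFS_MAX_NAME)
		return -1;

	memset(dev, 0, sizeof(*dev));
	memcpy(dev->name, name, len);	/* stays NUL-terminated after memset */
	return 0;
}

/*
 * Hypothetical helper: issue the request on an already-open binder-control
 * file descriptor. Returns whatever ioctl() returns; per the quoted struct
 * documentation, major and minor are expected to be filled in on success.
 */
static int binderfs_add_device(int ctl_fd, struct binderfs_device *dev)
{
	return ioctl(ctl_fd, BINDER_CTL_ADD, dev);
}
```

A caller would run binderfs_device_init(&dev, "my-binder"), pass the result to
binderfs_add_device() together with the control fd, and read dev.major and
dev.minor afterwards. Note that with both fields declared as 8-bit values, the
interface as quoted cannot report a major or minor number above 255.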