Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp1508605ybl; Tue, 3 Dec 2019 08:14:30 -0800 (PST) X-Google-Smtp-Source: APXvYqzR4z0hus5ogN5CDctLQFV2WOJhSgaQKK1QX87BSiYiPABSMGhpEv/5+O32kKrjiKIpu3KP X-Received: by 2002:a9d:5cca:: with SMTP id r10mr3503512oti.57.1575389670261; Tue, 03 Dec 2019 08:14:30 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1575389670; cv=none; d=google.com; s=arc-20160816; b=HU0x0ftvWkW7ykmqHONBD8Eusyu7lLadNOVUKQB4pld9TPyvl+iGV6e8spzyS7E1f6 NBTW5UCc15R2b1GIT4h+DWQjzk1q0W9PhA40aFFQ4Ytm3Hhu96+T3IM3ep7EYP7E2B0N 0ymhZQroI5I3nLLjQdeHoB3Y+XQvFrIRPEFwgMMpXw5/ugVMs7EozSiml8+A9/OfNh81 bxYN5JS5LgNiTaohVxdyRcCzfuXSyfNzW0tFhPYkzTMSc7Tmn0uLNdaKThNwLX79JxZG 5+XSEI2Pb6N6XWj55sMyN3Yy9M+NeYASjihJUGHO92qPVZaM0znKND3QUtqPeynNzGty rKZw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=a54yD/g51GfoLgGAQHu36HUdBMKbUrAD/zgzCuk0e/8=; b=KA+F/8U2wcYridQXD02y4JxCPtE6VvX0/yFLjQZ0VLXiMxUUvUPDjpyG9HB8Jmjkmw Tk6VWarYti2QAtM2WIc7YUSt3+DX4ub1yx9uVaD5ayctC0Fo9Cfcbe7m8Qj3LHFqsd/F VJ/wXFTAUTQpDJe+kjK1nE0jYUTVJewhZHYSgLcsPC0U2HOSaRZ/5Blk7CQWM1+VaHvm RO3+L5B1dMU6FlUNZjOv0R5OAPtdUm6z6rTlnrB+nU+0mGYorPydNhebmq1nf0bn89XF 7LHmbUQ/sJTHkbbk5PZ/YSC/qvIGDSbCNptX2XuzIdO354O5W5gEA685jO/9HKBPdkz8 EIrg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l4si1466068otq.55.2019.12.03.08.14.13; Tue, 03 Dec 2019 08:14:30 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726418AbfLCPmf (ORCPT + 99 others); Tue, 3 Dec 2019 10:42:35 -0500 Received: from youngberry.canonical.com ([91.189.89.112]:50160 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725848AbfLCPmf (ORCPT ); Tue, 3 Dec 2019 10:42:35 -0500 Received: from [81.92.17.147] (helo=wittgenstein) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1icAHg-0004aI-Cd; Tue, 03 Dec 2019 15:40:16 +0000 Date: Tue, 3 Dec 2019 16:40:13 +0100 From: Christian Brauner To: Laurent Vivier Cc: linux-kernel@vger.kernel.org, Henning Schild , Dmitry Safonov , linux-api@vger.kernel.org, containers@lists.linux-foundation.org, Jann Horn , Greg Kurz , Alexander Viro , James Bottomley , Eric Biederman , Jan Kiszka , linux-fsdevel@vger.kernel.org, =?utf-8?Q?C=C3=A9dric?= Le Goater , dhowells@redhat.com Subject: Re: [PATCH v7 1/1] ns: add binfmt_misc to the user namespace Message-ID: <20191203154012.opirpicsuc6ou3cx@wittgenstein> References: <20191107140304.8426-1-laurent@vivier.eu> <20191107140304.8426-2-laurent@vivier.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20191107140304.8426-2-laurent@vivier.eu> User-Agent: NeoMutt/20180716 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Nov 07, 2019 at 03:03:04PM +0100, Laurent Vivier wrote: > This patch allows to have a different binfmt_misc configuration > for each new user namespace. By default, the binfmt_misc configuration > is the one of the previous level, but if the binfmt_misc filesystem is > mounted in the new namespace a new empty binfmt instance is created and > used in this namespace. > > For instance, using "unshare" we can start a chroot of another > architecture and configure the binfmt_misc interpreter without being root > to run the binaries in this chroot. > > Signed-off-by: Laurent Vivier > Acked-by: Andrei Vagin > --- > fs/binfmt_misc.c | 115 +++++++++++++++++++++++++-------- > include/linux/user_namespace.h | 15 +++++ > kernel/user.c | 14 ++++ > kernel/user_namespace.c | 3 + > 4 files changed, 119 insertions(+), 28 deletions(-) > > diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c > index cdb45829354d..ba5f0d2ade96 100644 > --- a/fs/binfmt_misc.c > +++ b/fs/binfmt_misc.c > @@ -40,9 +40,6 @@ enum { > VERBOSE_STATUS = 1 /* make it zero to save 400 bytes kernel memory */ > }; > > -static LIST_HEAD(entries); > -static int enabled = 1; > - > enum {Enabled, Magic}; > #define MISC_FMT_PRESERVE_ARGV0 (1 << 31) > #define MISC_FMT_OPEN_BINARY (1 << 30) > @@ -62,10 +59,7 @@ typedef struct { > struct file *interp_file; > } Node; > > -static DEFINE_RWLOCK(entries_lock); > static struct file_system_type bm_fs_type; > -static struct vfsmount *bm_mnt; > -static int entry_count; > > /* > * Max length of the register string. Determined by: > @@ -82,18 +76,37 @@ static int entry_count; > */ > #define MAX_REGISTER_LENGTH 1920 > > +static struct binfmt_namespace *binfmt_ns(struct user_namespace *ns) > +{ > + struct binfmt_namespace *b_ns; > + > + while (ns) { > + b_ns = READ_ONCE(ns->binfmt_ns); > + if (b_ns) > + return b_ns; > + ns = ns->parent; > + } > + /* as the first user namespace is initialized with > + * &init_binfmt_ns we should never come here > + * but we try to stay safe by logging a warning > + * and returning a sane value > + */ > + WARN_ON_ONCE(1); > + return &init_binfmt_ns; > +} > + > /* > * Check if we support the binfmt > * if we do, return the node, else NULL > * locking is done in load_misc_binary > */ > -static Node *check_file(struct linux_binprm *bprm) > +static Node *check_file(struct binfmt_namespace *ns, struct linux_binprm *bprm) > { > char *p = strrchr(bprm->interp, '.'); > struct list_head *l; > > /* Walk all the registered handlers. */ > - list_for_each(l, &entries) { > + list_for_each(l, &ns->entries) { > Node *e = list_entry(l, Node, list); > char *s; > int j; > @@ -135,17 +148,18 @@ static int load_misc_binary(struct linux_binprm *bprm) > struct file *interp_file = NULL; > int retval; > int fd_binary = -1; > + struct binfmt_namespace *ns = binfmt_ns(current_user_ns()); > > retval = -ENOEXEC; > - if (!enabled) > + if (!ns->enabled) > return retval; > > /* to keep locking time low, we copy the interpreter string */ > - read_lock(&entries_lock); > - fmt = check_file(bprm); > + read_lock(&ns->entries_lock); > + fmt = check_file(ns, bprm); > if (fmt) > dget(fmt->dentry); > - read_unlock(&entries_lock); > + read_unlock(&ns->entries_lock); > if (!fmt) > return retval; > > @@ -611,19 +625,19 @@ static void bm_evict_inode(struct inode *inode) > kfree(e); > } > > -static void kill_node(Node *e) > +static void kill_node(struct binfmt_namespace *ns, Node *e) > { > struct dentry *dentry; > > - write_lock(&entries_lock); > + write_lock(&ns->entries_lock); > list_del_init(&e->list); > - write_unlock(&entries_lock); > + write_unlock(&ns->entries_lock); > > dentry = e->dentry; > drop_nlink(d_inode(dentry)); > d_drop(dentry); > dput(dentry); > - simple_release_fs(&bm_mnt, &entry_count); > + simple_release_fs(&ns->bm_mnt, &ns->entry_count); > } > > /* / */ > @@ -653,6 +667,9 @@ static ssize_t bm_entry_write(struct file *file, const char __user *buffer, > struct dentry *root; > Node *e = file_inode(file)->i_private; > int res = parse_command(buffer, count); > + struct binfmt_namespace *ns; > + > + ns = binfmt_ns(file->f_path.dentry->d_sb->s_user_ns); Sorry for being so late to the party. The naked dereferences here are not very pretty and also likely mean you access the wrong dentry when you do (weird but possible) things like: mount -t overlay overlay -o lowerdir=/proc/sys/fs/binfmt_misc,workdir=/work,upperdir=/upper /merged (which might already cause trouble in other parts of this code) so I think it's better if you replace file->f_path.dentry->d_sb->s_user_ns with file_dentry(file)->d_sb->s_user_ns in places where you do naked dereferences now.