From: Jann Horn
Date: Mon, 8 Oct 2018 12:58:33 +0200
Subject: Re: [RFC v3 1/1] ns: add binfmt_misc to the user namespace
To: avagin@gmail.com
Cc: Laurent Vivier, kernel list, Andrei Vagin, Linux API, containers@lists.linux-foundation.org, dima@arista.com, James Bottomley, Al Viro, linux-fsdevel@vger.kernel.org, "Eric W. Biederman"
References: <20181003225022.32033-1-laurent@vivier.eu> <20181003225022.32033-2-laurent@vivier.eu> <20181006060427.GA15164@gmail.com>
In-Reply-To: <20181006060427.GA15164@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Oct 6, 2018 at 8:04 AM Andrei Vagin wrote:
> On Thu, Oct 04, 2018 at 12:50:22AM +0200, Laurent Vivier wrote:
> > This patch allows to have a different binfmt_misc configuration
> > for each new user namespace. By default, the binfmt_misc configuration
> > is the one of the host, but if the binfmt_misc filesystem is mounted
> > in the new namespace a new empty binfmt instance is created and used
> > in this namespace.
> >
> > For instance, using "unshare" we can start a chroot of an another
> > architecture and configure the binfmt_misc interpreter without being root
> > to run the binaries in this chroot.
> >
> > Signed-off-by: Laurent Vivier
> > ---
> >  fs/binfmt_misc.c               | 85 +++++++++++++++++++++++-----------
> >  include/linux/user_namespace.h | 15 ++++++
> >  kernel/user.c                  | 14 ++++++
> >  kernel/user_namespace.c        |  9 ++++
> >  4 files changed, 95 insertions(+), 28 deletions(-)
> >
> > diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
> > index aa4a7a23ff99..78780bc87506 100644
> > --- a/fs/binfmt_misc.c
> > +++ b/fs/binfmt_misc.c
> > @@ -38,9 +38,6 @@ enum {
> >          VERBOSE_STATUS = 1 /* make it zero to save 400 bytes kernel memory */
> >  };
> >
> > -static LIST_HEAD(entries);
> > -static int enabled = 1;
> > -
> >  enum {Enabled, Magic};
> >  #define MISC_FMT_PRESERVE_ARGV0 (1 << 31)
> >  #define MISC_FMT_OPEN_BINARY (1 << 30)
> > @@ -60,10 +57,7 @@ typedef struct {
> >          struct file *interp_file;
> >  } Node;
> >
> > -static DEFINE_RWLOCK(entries_lock);
> >  static struct file_system_type bm_fs_type;
> > -static struct vfsmount *bm_mnt;
> > -static int entry_count;
> >
> >  /*
> >   * Max length of the register string. Determined by:
> > @@ -85,13 +79,13 @@ static int entry_count;
> >   * if we do, return the node, else NULL
> >   * locking is done in load_misc_binary
> >   */
> > -static Node *check_file(struct linux_binprm *bprm)
> > +static Node *check_file(struct user_namespace *ns, struct linux_binprm *bprm)
> >  {
> >          char *p = strrchr(bprm->interp, '.');
> >          struct list_head *l;
> >
> >          /* Walk all the registered handlers. */
> > -        list_for_each(l, &entries) {
> > +        list_for_each(l, &ns->binfmt_ns->entries) {
> >                  Node *e = list_entry(l, Node, list);
> >                  char *s;
> >                  int j;
> > @@ -133,17 +127,18 @@ static int load_misc_binary(struct linux_binprm *bprm)
> >          struct file *interp_file = NULL;
> >          int retval;
> >          int fd_binary = -1;
> > +        struct user_namespace *ns = current_user_ns();
> >
> >          retval = -ENOEXEC;
> > -        if (!enabled)
> > +        if (!ns->binfmt_ns->enabled)
> >                  return retval;
> >
> >          /* to keep locking time low, we copy the interpreter string */
> > -        read_lock(&entries_lock);
> > -        fmt = check_file(bprm);
> > +        read_lock(&ns->binfmt_ns->entries_lock);
>
> It looks like ns->binfmt_ns isn't protected by any lock and
> ns->binfmt_ns can be changed between read_lock() and read_unlock().
>
> This can be fixed if ns->binfmt_ns will be dereferenced only once in
> this function:
>
>         struct binfmt_namespace *binfmt_ns = ns->binfmt_ns;

Technically, wouldn't you want READ_ONCE(ns->binfmt_ns)?
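
To make the concern concrete, here is a rough userspace sketch, not kernel code and not taken from the patch. The struct layouts, the fake_read_lock()/fake_read_unlock() counters, and both lookup_*() helpers are hypothetical stand-ins; READ_ONCE() is redefined locally as a volatile load (it relies on the GCC/Clang __typeof__ extension), and the in-line pointer swap only simulates a concurrent writer. Whether ns->binfmt_ns can really change under a running exec in the patch is exactly the open question above, so treat this as an illustration of the pattern only.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel macro: one volatile load of x, so the
 * compiler may neither re-read x later nor split the access. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical, trimmed-down versions of the structures in the patch. */
struct binfmt_namespace {
	int enabled;
	int lock_depth;		/* counter standing in for entries_lock */
};

struct user_namespace {
	struct binfmt_namespace *binfmt_ns;
};

static void fake_read_lock(struct binfmt_namespace *b)   { b->lock_depth++; }
static void fake_read_unlock(struct binfmt_namespace *b) { b->lock_depth--; }

/* Pattern from the quoted hunk: ns->binfmt_ns is re-read at every use, so a
 * pointer update between lock and unlock makes them hit different objects. */
static void lookup_rereading(struct user_namespace *ns,
			     struct binfmt_namespace *replacement)
{
	fake_read_lock(ns->binfmt_ns);
	ns->binfmt_ns = replacement;	/* simulated concurrent update */
	fake_read_unlock(ns->binfmt_ns);
}

/* Suggested pattern: snapshot the pointer once, then use only the local. */
static void lookup_snapshot(struct user_namespace *ns,
			    struct binfmt_namespace *replacement)
{
	struct binfmt_namespace *binfmt_ns = READ_ONCE(ns->binfmt_ns);

	assert(binfmt_ns != NULL);
	fake_read_lock(binfmt_ns);
	ns->binfmt_ns = replacement;	/* simulated concurrent update */
	fake_read_unlock(binfmt_ns);
}
```

With the re-reading variant, the old object stays "locked" and the new one is "unlocked" without ever having been locked; with the snapshot, lock and unlock stay paired on the same object. The snapshot says nothing about the lifetime of the old binfmt_namespace, which would still need its own guarantee (a reference count or RCU, for instance).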