Received: by 2002:ac0:a594:0:0:0:0:0 with SMTP id m20-v6csp234938imm; Tue, 15 May 2018 00:31:45 -0700 (PDT) X-Google-Smtp-Source: AB8JxZoXJegrbvMzREN1g7uY/wGQlYDGL1kAcoanU1euBUB7HCyTYFK+t081tdyotpBVRVrGoGZD X-Received: by 2002:a17:902:108a:: with SMTP id c10-v6mr13611174pla.111.1526369505928; Tue, 15 May 2018 00:31:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1526369505; cv=none; d=google.com; s=arc-20160816; b=m4IvyWkGUiuxbK7g1QdCV/fQ1QjUPNaFw63LSbTZsazqdrrephjgmQWSvELcMKoHZb RW2IQFzvjof+WUfrjbaSBAqcXedWMJltHvAKPOsT0CvA/EkEyuThrJYkPTa6PVUoTXTt swnNBKOEaPa70AD0vSoSYkZMYQ4CCUVcWowr2BuIsNW26UM7FouZfO9rbXRKqLA7elUY QnXa69roCnjngdrAhx5LifV0JawUImCeiY0VoC3eIG5s+/0XUI4dsmqoFYHV4d9vTPAZ LgUUlm/NdPjRFx/u4El8pXEsM06zP++8uNusBtB4bICUrJGaMGS46pV6vCfVuRpbv6an IDUg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :arc-authentication-results; bh=qSROvJe8kjVYt3x2G2upuUhZ7vVR5uZa+n85QQhd9JI=; b=pbuKZy9sI8Luui61kgBmq8Rw75bPpa/qDwnO9/FPqGPOpFUEtgEjN5nkvfSVojONzg RhMN4Whj+zmkC4+j7rEzlocNXXYxm+t+jmpr9SLPTAh12B7kz73jwHzDRjSqP/sFjPDe xGQ6xUFMsonC0Jb3maWZRUd8Q1VvKJ4woWefraYdOXQ9KUJdya8B2EXY1djj/09hwtqC kaJoeOR64La8i1EKRdOKNKHxqbD833woncdIQTCPaMfhUtmm/r/I+FxNqhFxKHDGDpzT zvh0IvXsLWYeLnCkR0Ty731+ZMJJNMRS48BcGDEvDB/y0dK5/Poze15kOK3BAu6SZe0n Pf3Q== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id m26-v6si11037659pfa.45.2018.05.15.00.31.31; Tue, 15 May 2018 00:31:45 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752483AbeEOH3u (ORCPT + 99 others); Tue, 15 May 2018 03:29:50 -0400 Received: from monster.unsafe.ru ([5.9.28.80]:44388 "EHLO mail.unsafe.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752241AbeEOH3s (ORCPT ); Tue, 15 May 2018 03:29:48 -0400 Received: from comp-core-i7-2640m-0182e6 (nat-pool-brq-t.redhat.com [213.175.37.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.unsafe.ru (Postfix) with ESMTPSA id 5EDBFC6181B; Tue, 15 May 2018 07:29:32 +0000 (UTC) Date: Tue, 15 May 2018 09:21:54 +0200 From: Alexey Gladkov To: Jann Horn Cc: Kees Cook , Andy Lutomirski , Andrew Morton , linux-fsdevel@vger.kernel.org, kernel list , Kernel Hardening , linux-security-module , Linux API , Greg Kroah-Hartman , Alexander Viro , Akinobu Mita , Oleg Nesterov , Jeff Layton , Ingo Molnar , Alexey Dobriyan , "Eric W. Biederman" , Linus Torvalds , aniel Micay , Jonathan Corbet , bfields@fieldses.org, Stephen Rothwell , Solar Designer , "Dmitry V. Levin" , Djalal Harouni Subject: Re: [PATCH v5 1/7] proc: add proc_fs_info struct to store proc information Message-ID: <20180515072153.GD28179@comp-core-i7-2640m-0182e6> References: <20180511093445.GA1008@comp-core-i7-2640m-0182e6> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, May 11, 2018 at 03:49:13PM +0200, Jann Horn wrote: > On Fri, May 11, 2018 at 11:34 AM, Alexey Gladkov > wrote: > > From: Djalal Harouni > > > > This is a preparation patch that adds proc_fs_info to be able to store > > different procfs options and informations. Right now some mount options > > are stored inside the pid namespace which makes it hard to change or > > modernize procfs without affecting pid namespaces. Plus we do want to > > treat proc as more of a real mount point and filesystem. procfs is part > > of Linux API where it offers some features using filesystem syscalls and > > in order to support some features where we are able to have multiple > > instances of procfs, each one with its mount options inside the same pid > > namespace, we have to separate these procfs instances. > > > > This is the same feature that was also added to other Linux interfaces > > like devpts in order to support containers, sandboxes, and to have > > multiple instances of devpts filesystem [1]. > > > > [1] http://lxr.free-electrons.com/source/Documentation/filesystems/devpts.txt?v=3.14 > > > > Cc: Kees Cook > > Suggested-by: Andy Lutomirski > > Signed-off-by: Djalal Harouni > > Signed-off-by: Alexey Gladkov > > --- > [...] > > static struct dentry *proc_mount(struct file_system_type *fs_type, > > int flags, const char *dev_name, void *data) > > { > > + int error; > > + struct super_block *sb; > > struct pid_namespace *ns; > > + struct proc_fs_info *fs_info; > > + > > + /* > > + * Don't allow mounting unless the caller has CAP_SYS_ADMIN over > > + * the namespace. > > + */ > > + if (!(flags & MS_KERNMOUNT) && !ns_capable(current_user_ns(), CAP_SYS_ADMIN)) > > + return ERR_PTR(-EPERM); > > Is this correct? > > The old code invoked a check with the same comment through mount_ns(); > however, this patch changes the semantics of the check. > The old code checked that the caller has privileges over the user > namespace that contains the PID namespace; in other words, it checked > that the caller has privileges over the PID namespace. The current > code just checks that the caller is privileged over its own user > namespace. > > As far as I can tell, this means that by doing something like this: > > unshare(CLONE_NEWNS|CLONE_NEWUSER); > mount("none", "/", NULL, MS_REC|MS_PRIVATE, NULL); > mount("proc", "/proc", "proc", 0, "newinstance,pids=all"); > > any process could create a new unrestricted procfs mount for its PID > namespace, even if it is only supposed to have access to a more > restricted procfs mount. Hm... let me investigate this. It looks like mount with "newinstance" option should fail if pid namespace is the same and the current and parent user namespace do not match. -- Rgrds, legion