From: Jann Horn
Date: Tue, 15 May 2018 18:19:18 +0200
Subject: Re: [PATCH v5 1/7] proc: add proc_fs_info struct to store proc information
To: Alexey Gladkov
Cc: Kees Cook, Andy Lutomirski, Andrew Morton, linux-fsdevel@vger.kernel.org,
    kernel list, Kernel Hardening, linux-security-module, Linux API,
    Greg Kroah-Hartman, Alexander Viro, Akinobu Mita, Oleg Nesterov,
    Jeff Layton, Ingo Molnar, Alexey Dobriyan, "Eric W. Biederman",
    Linus Torvalds, Daniel Micay, Jonathan Corbet, Bruce Fields,
    Stephen Rothwell, Solar Designer, "Dmitry V. Levin", Djalal Harouni
In-Reply-To: <20180515072153.GD28179@comp-core-i7-2640m-0182e6>
References: <20180511093445.GA1008@comp-core-i7-2640m-0182e6> <20180515072153.GD28179@comp-core-i7-2640m-0182e6>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 15, 2018 at 9:21 AM, Alexey Gladkov wrote:
> On Fri, May 11, 2018 at 03:49:13PM +0200, Jann Horn wrote:
>> On Fri, May 11, 2018 at 11:34 AM, Alexey Gladkov
>> wrote:
>> > From: Djalal Harouni
>> >
>> > This is a preparation patch that adds proc_fs_info to be able to store
>> > different procfs options and informations.
>> > Right now some mount options
>> > are stored inside the pid namespace which makes it hard to change or
>> > modernize procfs without affecting pid namespaces. Plus we do want to
>> > treat proc as more of a real mount point and filesystem. procfs is part
>> > of Linux API where it offers some features using filesystem syscalls and
>> > in order to support some features where we are able to have multiple
>> > instances of procfs, each one with its mount options inside the same pid
>> > namespace, we have to separate these procfs instances.
>> >
>> > This is the same feature that was also added to other Linux interfaces
>> > like devpts in order to support containers, sandboxes, and to have
>> > multiple instances of devpts filesystem [1].
>> >
>> > [1] http://lxr.free-electrons.com/source/Documentation/filesystems/devpts.txt?v=3.14
>> >
>> > Cc: Kees Cook
>> > Suggested-by: Andy Lutomirski
>> > Signed-off-by: Djalal Harouni
>> > Signed-off-by: Alexey Gladkov
>> > ---
>> [...]
>> >  static struct dentry *proc_mount(struct file_system_type *fs_type,
>> >          int flags, const char *dev_name, void *data)
>> >  {
>> > +        int error;
>> > +        struct super_block *sb;
>> >          struct pid_namespace *ns;
>> > +        struct proc_fs_info *fs_info;
>> > +
>> > +        /*
>> > +         * Don't allow mounting unless the caller has CAP_SYS_ADMIN over
>> > +         * the namespace.
>> > +         */
>> > +        if (!(flags & MS_KERNMOUNT) && !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
>> > +                return ERR_PTR(-EPERM);
>>
>> Is this correct?
>>
>> The old code invoked a check with the same comment through mount_ns();
>> however, this patch changes the semantics of the check.
>> The old code checked that the caller has privileges over the user
>> namespace that contains the PID namespace; in other words, it checked
>> that the caller has privileges over the PID namespace. The current
>> code just checks that the caller is privileged over its own user
>> namespace.
>>
>> As far as I can tell, this means that by doing something like this:
>>
>>     unshare(CLONE_NEWNS|CLONE_NEWUSER);
>>     mount("none", "/", NULL, MS_REC|MS_PRIVATE, NULL);
>>     mount("proc", "/proc", "proc", 0, "newinstance,pids=all");
>>
>> any process could create a new unrestricted procfs mount for its PID
>> namespace, even if it is only supposed to have access to a more
>> restricted procfs mount.
>
> Hm... let me investigate this. It looks like mount with "newinstance"
> option should fail if pid namespace is the same and the current and parent
> user namespace do not match.

I don't understand that last sentence. What does "if pid namespace is
the same" mean, and what does "current and parent user namespace do
not match" mean?

Just changing "ns_capable(current_user_ns(), CAP_SYS_ADMIN)" to
"ns_capable(task_active_pid_ns(current)->user_ns, CAP_SYS_ADMIN)"
should be enough to get the old semantics again: It checks whether the
current task is capable over its PID namespace.
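
To make the difference between the two checks concrete, here is a rough
userspace model. It is only a sketch: the model_* types and functions are
made up for illustration and are not kernel code, and model_ns_capable()
is a deliberately simplified stand-in for ns_capable() that ignores cases
such as ownership of a namespace by the caller's euid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; these are NOT the kernel's structures. */
struct model_user_ns {
	struct model_user_ns *parent;   /* NULL for the initial user namespace */
};

struct model_pid_ns {
	struct model_user_ns *user_ns;  /* user namespace owning this PID namespace */
};

struct model_task {
	struct model_user_ns *user_ns;  /* user namespace of the task's credentials */
	struct model_pid_ns *pid_ns;    /* the task's active PID namespace */
	bool cap_sys_admin;             /* CAP_SYS_ADMIN held in task->user_ns */
};

/*
 * Simplified rule (assumption): the task is capable over `target` if it
 * holds the capability in its own user namespace and `target` is that
 * namespace or one of its descendants.
 */
bool model_ns_capable(const struct model_task *t,
		      const struct model_user_ns *target)
{
	assert(t != NULL && target != NULL);
	for (const struct model_user_ns *ns = target; ns; ns = ns->parent)
		if (ns == t->user_ns)
			return t->cap_sys_admin;
	return false;
}

/* Check as written in the quoted patch: the caller's own user namespace. */
bool model_check_patch(const struct model_task *t)
{
	return model_ns_capable(t, t->user_ns);
}

/* Check suggested above: the user namespace that owns the PID namespace. */
bool model_check_pidns(const struct model_task *t)
{
	return model_ns_capable(t, t->pid_ns->user_ns);
}

/* Models unshare(CLONE_NEWUSER): new child user ns, full caps inside it. */
void model_unshare_user(struct model_task *t, struct model_user_ns *fresh)
{
	fresh->parent = t->user_ns;
	t->user_ns = fresh;
	t->cap_sys_admin = true;
}
```

In this model, a task that starts unprivileged in the initial namespaces
and then calls model_unshare_user() passes model_check_patch() but still
fails model_check_pidns(), because its PID namespace is still owned by
the initial user namespace. That is the gap described above.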