Date: Mon, 23 Nov 2020 08:45:05 +0100
From: Christian Brauner
To: Paul Moore
Cc: Alexander Viro, Christoph Hellwig, linux-fsdevel@vger.kernel.org,
    John Johansen, James Morris, Mimi Zohar, Dmitry Kasatkin,
    Stephen Smalley, Casey Schaufler, Arnd Bergmann, Andreas Dilger,
    OGAWA Hirofumi, Geoffrey Thomas, Mrunal Patel, Josh Triplett,
    Andy Lutomirski, Theodore Tso, Alban Crequy, Tycho Andersen,
    David Howells, James Bottomley, Jann Horn, Seth Forshee,
    Stéphane Graber, Aleksa Sarai, Lennart Poettering,
    "Eric W. Biederman", smbarber@chromium.org, Phil Estes, Serge Hallyn,
    Kees Cook, Todd Kjos, Jonathan Corbet,
    containers@lists.linux-foundation.org,
    linux-security-module@vger.kernel.org, linux-api@vger.kernel.org,
    linux-ext4@vger.kernel.org, linux-audit@redhat.com,
    linux-integrity@vger.kernel.org, selinux@vger.kernel.org,
    Christoph Hellwig
Subject: Re: [PATCH v2 14/39] commoncap: handle idmapped mounts
Message-ID: <20201123074505.ds5hpqo5kgyvjksb@wittgenstein>
References: <20201115103718.298186-1-christian.brauner@ubuntu.com>
    <20201115103718.298186-15-christian.brauner@ubuntu.com>
X-Mailing-List: linux-ext4@vger.kernel.org

On Sun, Nov 22, 2020 at 04:18:55PM -0500, Paul Moore wrote:
> On Sun, Nov 15, 2020 at 5:39 AM Christian Brauner wrote:
> > When interacting with user namespace and non-user namespace aware
> > filesystem capabilities the vfs will perform various security checks to
> > determine whether or not the filesystem capabilities can be used by the
> > caller (e.g. during exec), or even whether they need to be removed. The
> > main infrastructure for this resides in the capability codepaths but they
> > are called through the LSM security infrastructure even though they are not
> > technically an LSM or optional. This extends the existing security hooks
> > security_inode_removexattr(), security_inode_killpriv(),
> > security_inode_getsecurity() to pass down the mount's user namespace and
> > makes them aware of idmapped mounts.
> >
> > In order to actually get filesystem capabilities from disk the capability
> > infrastructure exposes the get_vfs_caps_from_disk() helper. For user
> > namespace aware filesystem capabilities a root uid is stored alongside the
> > capabilities.
> >
> > In order to determine whether the caller can make use of the filesystem
> > capability or whether it needs to be ignored it is translated according to
> > the superblock's user namespace. If it can be translated to uid 0 according
> > to that id mapping the caller can use the filesystem capabilities stored on
> > disk. If we are accessing the inode that holds the filesystem capabilities
> > through an idmapped mount we need to map the root uid according to the
> > mount's user namespace.
> >
> > Afterwards the checks are identical to non-idmapped mounts. Reading
> > filesystem caps from disk enforces that the root uid associated with the
> > filesystem capability must have a mapping in the superblock's user
> > namespace and that the caller is either in the same user namespace or is a
> > descendant of the superblock's user namespace. For filesystems that are
> > mountable inside user namespace the container can just mount the filesystem
> > and won't usually need to idmap it. If it does create an idmapped mount it
> > can mark it with a user namespace it has created and which is therefore a
> > descendant of the s_user_ns. For filesystems that are not mountable inside
> > user namespaces the descendant rule is trivially true because the s_user_ns
> > will be the initial user namespace.
> >
> > If the initial user namespace is passed all operations are a nop so
> > non-idmapped mounts will not see a change in behavior and will also not see
> > any performance impact.
> >
> > Cc: Christoph Hellwig
> > Cc: David Howells
> > Cc: Al Viro
> > Cc: linux-fsdevel@vger.kernel.org
> > Signed-off-by: Christian Brauner
>
> ...
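The root-uid rule in the quoted description can be modeled in a few lines of user-space C. This is a minimal sketch, not kernel code: struct userns, its single uid_map extent, and every function name below are hypothetical simplifications (real uid maps can have several extents). The comments name the kernel helpers each function imitates: make_kuid(), from_kuid(), current_in_userns(), rootid_owns_currentns(), and kuid_into_mnt(), whose behavior here (reinterpret the host uid as a uid in the mount's namespace and map it) is an assumption about the helper as used in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_UID ((uint32_t)-1)

/* Hypothetical user namespace with one uid_map extent: namespace uids
 * [first, first + count) map to host uids [lower_first, lower_first + count). */
struct userns {
	const struct userns *parent; /* NULL for the initial namespace */
	uint32_t first, lower_first, count;
};

/* namespace uid -> host uid; imitates make_kuid() */
static uint32_t ns_make_kuid(const struct userns *ns, uint32_t uid)
{
	if (uid != INVALID_UID && uid >= ns->first && uid - ns->first < ns->count)
		return ns->lower_first + (uid - ns->first);
	return INVALID_UID;
}

/* host uid -> namespace uid; imitates from_kuid() */
static uint32_t ns_from_kuid(const struct userns *ns, uint32_t kuid)
{
	if (kuid != INVALID_UID && kuid >= ns->lower_first &&
	    kuid - ns->lower_first < ns->count)
		return ns->first + (kuid - ns->lower_first);
	return INVALID_UID;
}

/* Assumed kuid_into_mnt() behavior: treat the host uid as a uid inside the
 * mount's namespace and map it. Identity for the initial namespace. */
static uint32_t kuid_into_mnt(const struct userns *mnt_ns, uint32_t kuid)
{
	return ns_make_kuid(mnt_ns, kuid);
}

/* caller is ns or a descendant of ns; imitates current_in_userns() */
static bool in_userns(const struct userns *caller, const struct userns *ns)
{
	for (; caller; caller = caller->parent)
		if (caller == ns)
			return true;
	return false;
}

/* root uid is uid 0 in the caller's namespace or one of its ancestors;
 * imitates rootid_owns_currentns() */
static bool rootid_owns_ns(const struct userns *caller, uint32_t kroot)
{
	for (; caller; caller = caller->parent)
		if (ns_from_kuid(caller, kroot) == 0)
			return true;
	return false;
}

/* May caller use fscaps stored with disk_rootid on a superblock owned by
 * sb_ns, when the inode is reached through a mount marked with mnt_ns? */
static bool fscaps_usable(const struct userns *sb_ns,
			  const struct userns *mnt_ns,
			  const struct userns *caller, uint32_t disk_rootid)
{
	assert(sb_ns && mnt_ns && caller);
	if (!in_userns(caller, sb_ns))                     /* descendant rule */
		return false;
	uint32_t kroot = ns_make_kuid(sb_ns, disk_rootid); /* superblock view */
	kroot = kuid_into_mnt(mnt_ns, kroot);              /* idmapped mount */
	return rootid_owns_ns(caller, kroot);
}

/* Example namespaces: the host, two containers, and a hypothetical mount
 * idmap that shifts container A's host uid range onto container B's. */
static const struct userns init_ns = { NULL, 0, 0, 4294967295u };
static const struct userns ct_a = { &init_ns, 0, 100000, 65536 };
static const struct userns ct_b = { &init_ns, 0, 200000, 65536 };
static const struct userns idmap_a_to_b = { &init_ns, 100000, 200000, 65536 };
```

In this model, caps written by root in ct_a on a host filesystem (stored rootid 100000) apply in ct_a but not in ct_b, unless ct_b reaches the file through a mount marked with idmap_a_to_b. Passing init_ns as the mount namespace leaves every uid unchanged, which is the no-op case for non-idmapped mounts.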
>
> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > index 8dba8f0983b5..ddb9213a3e81 100644
> > --- a/kernel/auditsc.c
> > +++ b/kernel/auditsc.c
> > @@ -1944,7 +1944,7 @@ static inline int audit_copy_fcaps(struct audit_names *name,
> >         if (!dentry)
> >                 return 0;
> >
> > -       rc = get_vfs_caps_from_disk(dentry, &caps);
> > +       rc = get_vfs_caps_from_disk(&init_user_ns, dentry, &caps);
> >         if (rc)
> >                 return rc;
> >
> > @@ -2495,7 +2495,8 @@ int __audit_log_bprm_fcaps(struct linux_binprm *bprm,
> >         ax->d.next = context->aux;
> >         context->aux = (void *)ax;
> >
> > -       get_vfs_caps_from_disk(bprm->file->f_path.dentry, &vcaps);
> > +       get_vfs_caps_from_disk(mnt_user_ns(bprm->file->f_path.mnt),
> > +                              bprm->file->f_path.dentry, &vcaps);
>
> As audit currently records information in the context of the
> initial/host namespace I'm guessing we don't want the mnt_user_ns()
> call above; it seems like &init_user_ns would be the right choice
> (similar to audit_copy_fcaps()), yes?

OK, sounds good. It also makes the patchset simpler.
Note that I'm currently not on the audit mailing list, so this reply is
likely not going to show up there.
(FWIW, I responded to you in your other mail too.)

Christian
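To make the audit question concrete, here is a rough sketch of the two choices for __audit_log_bprm_fcaps(). It decodes a raw security.capability value (revisions 2 and 3 only; the constants mirror <linux/capability.h> but are redefined locally) and reports the root uid either unmapped, which is what passing &init_user_ns amounts to, or pushed through a mount idmap, as the mnt_user_ns() variant would do. It assumes the filesystem's s_user_ns is the initial namespace, so the on-disk rootid already is the host uid; struct idmap_extent and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of VFS_CAP_* and XATTR_CAPS_SZ_* from <linux/capability.h>. */
#define VFS_CAP_REVISION_MASK   0xFF000000u
#define VFS_CAP_REVISION_2      0x02000000u
#define VFS_CAP_REVISION_3      0x03000000u
#define XATTR_CAPS_SZ_2         20 /* magic_etc + 2 x {permitted, inheritable} */
#define XATTR_CAPS_SZ_3         24 /* revision 2 layout + __le32 rootid */
#define INVALID_UID             ((uint32_t)-1)

/* Hypothetical single-extent mount idmap:
 * [first, first + count) -> [lower_first, lower_first + count). */
struct idmap_extent {
	uint32_t first, lower_first, count;
};

/* Read a little-endian 32-bit value. */
static uint32_t le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Extract the root uid from a raw security.capability value. Revision 2
 * carries no rootid, so it is treated as uid 0; revision 1 is omitted. */
static bool fcaps_rootid(const uint8_t *buf, size_t len, uint32_t *rootid)
{
	if (len < 4)
		return false;
	switch (le32(buf) & VFS_CAP_REVISION_MASK) {
	case VFS_CAP_REVISION_2:
		if (len != XATTR_CAPS_SZ_2)
			return false;
		*rootid = 0;
		return true;
	case VFS_CAP_REVISION_3:
		if (len != XATTR_CAPS_SZ_3)
			return false;
		*rootid = le32(buf + 20); /* rootid follows the capability sets */
		return true;
	default:
		return false;
	}
}

/* Map a uid through the idmap; INVALID_UID if it has no mapping. */
static uint32_t idmap_uid(const struct idmap_extent *m, uint32_t uid)
{
	if (uid != INVALID_UID && uid >= m->first && uid - m->first < m->count)
		return m->lower_first + (uid - m->first);
	return INVALID_UID;
}

/* Root uid that would be recorded: mnt == NULL models &init_user_ns
 * (no remapping), non-NULL models passing the mount's user namespace. */
static bool recorded_rootid(const uint8_t *buf, size_t len,
			    const struct idmap_extent *mnt, uint32_t *out)
{
	uint32_t rootid;

	assert(buf && out);
	if (!fcaps_rootid(buf, len, &rootid))
		return false;
	*out = mnt ? idmap_uid(mnt, rootid) : rootid;
	return true;
}
```

With mnt == NULL the recorded value is the same no matter which mount the binary was executed through, which matches audit's host-namespace view.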