Received: by 2002:a05:6a10:17d3:0:0:0:0 with SMTP id hz19csp2419361pxb; Mon, 19 Apr 2021 05:27:14 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzH20P7lHZllX4p5Lrl4Lv44m6rDAzvM45LIn7iatQEDeqE9isa6UG+1YPgSps+wZy/t2ba X-Received: by 2002:a05:6402:30a3:: with SMTP id df3mr11492081edb.230.1618835234317; Mon, 19 Apr 2021 05:27:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1618835234; cv=none; d=google.com; s=arc-20160816; b=DyUCP+gIuZ2pK69pNDeqT7yDissa7F5pocuuPDm65t0hz4O706qIwk4V6DU5oSMoa/ f6ovMWcTyopv7+6TP6M3JCz19f1wMvJGS4ts5Nb9C+Jp7lQ8Q9Jhkel7MxwrsasxIVbU 0llctnS6UaHzopHrgd+XB5xRom1nVALUmleVVjXwTOJ+tyEIUxHcB48EBnLWu+kWlMNa Gz/uGSaqOmTgEPsu18i2b6m+lRbqgULnsc2x7KYUQ3ftdPrRF31Ne6snvZ2yO97/Enwm N16RhDXIrrFaQopBb7tUk/Dpflxov0B/D5oJLcaGo7YqK2nLCVu7kXcc2/SacUTfe2+z YQAA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=ZaXkoD58ljacxZMRTKizBk9TnbZFkM7Pf4YXUvwna5Y=; b=t+r2ZEKgXIUQXklQ2fDrtKATAXmidQTy604J9oB9ou/Yv7FioLA3lPYWWb/ZzqNjM2 ERsLgea3IFCUQ3aSadujFuC54UN9uPZsUDQORpMzrWAhTmKNBjp5RroujpBBDxBiOp/8 joZ591u6HiMtwjvciZV1FPvV1lA454CIm+ck0zRCurakr/4QcdAc34gniBPAzkdgda6C c1dxBXwWglWvdmPP62imEeGQKyOl0WwnKHeJcqptT6i9Jo0UF8jbyyNnuQE5QYMQwofl yPL1rpdKKm7ywWPRaiEtepL8A7jCPWcrjCTk1x7DuiLxmtnZjncgWv05ELxDgkkYOgEq twcQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id f27si12508784edj.425.2021.04.19.05.26.51; Mon, 19 Apr 2021 05:27:14 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239133AbhDSMZq (ORCPT + 99 others); Mon, 19 Apr 2021 08:25:46 -0400 Received: from mail.hallyn.com ([178.63.66.53]:48580 "EHLO mail.hallyn.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239081AbhDSMZp (ORCPT ); Mon, 19 Apr 2021 08:25:45 -0400 Received: by mail.hallyn.com (Postfix, from userid 1001) id B7751821; Mon, 19 Apr 2021 07:25:14 -0500 (CDT) Date: Mon, 19 Apr 2021 07:25:14 -0500 From: "Serge E. Hallyn" To: "Serge E. Hallyn" Cc: Christian Brauner , lkml , Linus Torvalds , Kees Cook , "Andrew G. Morgan" , "Eric W. Biederman" , Greg Kroah-Hartman , security@kernel.org, Tycho Andersen , Andy Lutomirski Subject: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.3) Message-ID: <20210419122514.GA20598@mail.hallyn.com> References: <20210416045851.GA13811@mail.hallyn.com> <20210416150501.zam55gschpn2w56i@wittgenstein> <20210416213453.GA29094@mail.hallyn.com> <20210417021945.GA687@mail.hallyn.com> <20210417200434.GA17430@mail.hallyn.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210417200434.GA17430@mail.hallyn.com> User-Agent: Mutt/1.9.4 (2018-02-28) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org cap_setfcap is required to create file capabilities. Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a process running as uid 0 but without cap_setfcap is able to work around this as follows: unshare a new user namespace which maps parent uid 0 into the child namespace. While this task will not have new capabilities against the parent namespace, there is a loophole due to the way namespaced file capabilities are represented as xattrs. File capabilities valid in userns 1 are distinguished from file capabilities valid in userns 2 by the kuid which underlies uid 0. Therefore the restricted root process can unshare a new self-mapping namespace, add a namespaced file capability onto a file, then use that file capability in the parent namespace. To prevent that, do not allow mapping parent uid 0 if the process which opened the uid_map file does not have CAP_SETFCAP, which is the capability for setting file capabilities. As a further wrinkle: a task can unshare its user namespace, then open its uid_map file itself, and map (only) its own uid. In this case we do not have the credential from before unshare, which was potentially more restricted. So, when creating a user namespace, we record whether the creator had CAP_SETFCAP. Then we can use that during map_write(). With this patch: 1. Unprivileged user can still unshare -Ur ubuntu@caps:~$ unshare -Ur root@caps:~# logout 2. Root user can still unshare -Ur ubuntu@caps:~$ sudo bash root@caps:/home/ubuntu# unshare -Ur root@caps:/home/ubuntu# logout 3. Root user without CAP_SETFCAP cannot unshare -Ur: root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap -- root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap unable to set CAP_SETFCAP effective capability: Operation not permitted root@caps:/home/ubuntu# unshare -Ur unshare: write failed /proc/self/uid_map: Operation not permitted Note: an alternative solution would be to allow uid 0 mappings by processes without CAP_SETFCAP, but to prevent such a namespace from writing any file capabilities. This approach can be seen here: https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4 Signed-off-by: Serge Hallyn Reviewed-by: Andrew G. Morgan Tested-by: Christian Brauner Reviewed-by: Christian Brauner Cc: "Eric W. Biederman" Changelog: * fix logic in the case of writing to another task's uid_map * rename 'ns' to 'map_ns', and make a file_ns local variable * use /* comments */ * update the CAP_SETFCAP comment in capability.h * rename parent_unpriv to parent_can_setfcap (and reverse the logic) * remove printks * clarify (i hope) the code comments * update capability.h comment * renamed parent_can_setfcap to parent_could_setfcap * made the check its own disallowed_0_mapping() fn * moved the check into new_idmap_permitted * rename disallowed_0_mapping to verify_root_mapping * change verify_root_mapping to Christian's suggested flow * correct+clarify comments: parent uid 0 mapping to any child uid is a problem. --- include/linux/user_namespace.h | 3 ++ include/uapi/linux/capability.h | 3 +- kernel/user_namespace.c | 67 +++++++++++++++++++++++++++++++-- 3 files changed, 69 insertions(+), 4 deletions(-) diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 64cf8ebdc4ec..f6c5f784be5a 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -63,6 +63,9 @@ struct user_namespace { kgid_t group; struct ns_common ns; unsigned long flags; + /* parent_could_setfcap: true if the creator if this ns had CAP_SETFCAP + * in its effective capability set at the child ns creation time. */ + bool parent_could_setfcap; #ifdef CONFIG_KEYS /* List of joinable keyrings in this namespace. Modification access of diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h index c6ca33034147..2ddb4226cd23 100644 --- a/include/uapi/linux/capability.h +++ b/include/uapi/linux/capability.h @@ -335,7 +335,8 @@ struct vfs_ns_cap_data { #define CAP_AUDIT_CONTROL 30 -/* Set or remove capabilities on files */ +/* Set or remove capabilities on files. + Map uid=0 into a child user namespace. */ #define CAP_SETFCAP 31 diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index af612945a4d0..609a729a9879 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -106,6 +106,7 @@ int create_user_ns(struct cred *new) if (!ns) goto fail_dec; + ns->parent_could_setfcap = cap_raised(new->cap_effective, CAP_SETFCAP); ret = ns_alloc_inum(&ns->ns); if (ret) goto fail_free; @@ -841,6 +842,62 @@ static int sort_idmaps(struct uid_gid_map *map) return 0; } +/** + * verify_root_map() - check the uid 0 mapping + * @file: idmapping file + * @map_ns: user namespace of the target process + * @new_map: requested idmap + * + * If a process requests mapping parent uid 0 into the new ns, verify that the + * process writing the map had the CAP_SETFCAP capability as the target process + * will be able to write fscaps that are valid in ancestor user namespaces. + * + * Return: true if the mapping is allowed, false if not. + */ +static bool verify_root_map(const struct file *file, + struct user_namespace *map_ns, + struct uid_gid_map *new_map) +{ + int idx; + const struct user_namespace *file_ns = file->f_cred->user_ns; + struct uid_gid_extent *extent0 = NULL; + + for (idx = 0; idx < new_map->nr_extents; idx++) { + u32 lower_first; + + if (new_map->nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS) + extent0 = &new_map->extent[idx]; + else + extent0 = &new_map->forward[idx]; + if (extent0->lower_first == 0) + break; + + extent0 = NULL; + } + + if (!extent0) + return true; + + if (map_ns == file_ns) { + /* The process unshared its ns and is writing to its own + * /proc/self/uid_map. User already has full capabilites in + * the new namespace. Verify that the parent had CAP_SETFCAP + * when it unshared. + * */ + if (!file_ns->parent_could_setfcap) + return false; + } else { + /* Process p1 is writing to uid_map of p2, who is in a child + * user namespace to p1's. Verify that the opener of the map + * file has CAP_SETFCAP against the parent of the new map + * namespace */ + if (!file_ns_capable(file, map_ns->parent, CAP_SETFCAP)) + return false; + } + + return true; +} + static ssize_t map_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos, int cap_setid, @@ -848,7 +905,7 @@ static ssize_t map_write(struct file *file, const char __user *buf, struct uid_gid_map *parent_map) { struct seq_file *seq = file->private_data; - struct user_namespace *ns = seq->private; + struct user_namespace *map_ns = seq->private; struct uid_gid_map new_map; unsigned idx; struct uid_gid_extent extent; @@ -895,7 +952,7 @@ static ssize_t map_write(struct file *file, const char __user *buf, /* * Adjusting namespace settings requires capabilities on the target. */ - if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN)) + if (cap_valid(cap_setid) && !file_ns_capable(file, map_ns, CAP_SYS_ADMIN)) goto out; /* Parse the user data */ @@ -965,7 +1022,7 @@ static ssize_t map_write(struct file *file, const char __user *buf, ret = -EPERM; /* Validate the user is allowed to use user id's mapped to. */ - if (!new_idmap_permitted(file, ns, cap_setid, &new_map)) + if (!new_idmap_permitted(file, map_ns, cap_setid, &new_map)) goto out; ret = -EPERM; @@ -1086,6 +1143,10 @@ static bool new_idmap_permitted(const struct file *file, struct uid_gid_map *new_map) { const struct cred *cred = file->f_cred; + + if (cap_setid == CAP_SETUID && !verify_root_map(file, ns, new_map)) + return false; + /* Don't allow mappings that would allow anything that wouldn't * be allowed without the establishment of unprivileged mappings. */ -- 2.25.1