Date: Sun, 19 May 2024 22:38:29 -0500
From: "Serge E. Hallyn"
To: Jonathan Calmels
Cc: brauner@kernel.org, ebiederm@xmission.com, Luis Chamberlain, Kees Cook,
    Joel Granados, Serge Hallyn, Paul Moore, James Morris, David Howells,
    Jarkko Sakkinen, containers@lists.linux.dev, linux-kernel@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-security-module@vger.kernel.org,
    keyrings@vger.kernel.org
Subject: Re: [PATCH 3/3] capabilities: add cap userns sysctl mask
Message-ID: <20240520033829.GB1816262@mail.hallyn.com>
References: <20240516092213.6799-1-jcalmels@3xx0.net>
    <20240516092213.6799-4-jcalmels@3xx0.net>
In-Reply-To: <20240516092213.6799-4-jcalmels@3xx0.net>

On Thu, May 16, 2024 at 02:22:05AM -0700, Jonathan Calmels wrote:
> This patch adds a new system-wide userns capability mask designed to mask
> off capabilities in user namespaces.
>
> This mask is controlled through a sysctl and can be set early in the boot
> process or on the kernel command line to exclude known capabilities from
> ever being gained in namespaces. Once set, it can be further restricted to
> exert dynamic policies on the system (e.g. ward off a potential exploit).
>
> Changing this mask requires privileges over CAP_SYS_ADMIN and CAP_SETPCAP
> in the initial user namespace.
>
> Example:
>
>     # sysctl -qw kernel.cap_userns_mask=0x1fffffdffff && \
>       unshare -r grep Cap /proc/self/status
>     CapInh: 0000000000000000
>     CapPrm: 000001fffffdffff
>     CapEff: 000001fffffdffff
>     CapBnd: 000001fffffdffff
>     CapAmb: 0000000000000000
>     CapUNs: 000001fffffdffff
>
> Signed-off-by: Jonathan Calmels

Reviewed-by: Serge Hallyn

> ---
>  include/linux/user_namespace.h |  7 ++++
>  kernel/sysctl.c                | 10 ++++++
>  kernel/user_namespace.c        | 66 ++++++++++++++++++++++++++++++++++
>  3 files changed, 83 insertions(+)
>
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index 6030a8235617..e3478bd54ee5 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -2,6 +2,7 @@
>  #ifndef _LINUX_USER_NAMESPACE_H
>  #define _LINUX_USER_NAMESPACE_H
>  
> +#include
>  #include
>  #include
>  #include
> @@ -14,6 +15,12 @@
>  #define UID_GID_MAP_MAX_BASE_EXTENTS 5
>  #define UID_GID_MAP_MAX_EXTENTS 340
>  
> +#ifdef CONFIG_SYSCTL
> +extern kernel_cap_t cap_userns_mask;
> +int proc_cap_userns_handler(struct ctl_table *table, int write,
> +			    void *buffer, size_t *lenp, loff_t *ppos);
> +#endif
> +
>  struct uid_gid_extent {
>  	u32 first;
>  	u32 lower_first;
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 81cc974913bb..1546eebd6aea 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -62,6 +62,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  
>  #include "../lib/kstrtox.h"
> @@ -1846,6 +1847,15 @@ static struct ctl_table kern_table[] = {
>  		.mode		= 0444,
>  		.proc_handler	= proc_dointvec,
>  	},
> +#ifdef CONFIG_USER_NS
> +	{
> +		.procname	= "cap_userns_mask",
> +		.data		= &cap_userns_mask,
> +		.maxlen		= sizeof(kernel_cap_t),
> +		.mode		= 0644,
> +		.proc_handler	= proc_cap_userns_handler,
> +	},
> +#endif
>  #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
>  	{
>  		.procname	= "unknown_nmi_panic",
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 53848e2b68cd..e0cf606e9140 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -26,6 +26,66 @@
>  static struct kmem_cache *user_ns_cachep __ro_after_init;
>  static DEFINE_MUTEX(userns_state_mutex);
>  
> +#ifdef CONFIG_SYSCTL
> +static DEFINE_SPINLOCK(cap_userns_lock);
> +kernel_cap_t cap_userns_mask = CAP_FULL_SET;
> +
> +int proc_cap_userns_handler(struct ctl_table *table, int write,
> +			    void *buffer, size_t *lenp, loff_t *ppos)
> +{
> +	struct ctl_table t;
> +	unsigned long mask_array[2];
> +	kernel_cap_t new_mask, *mask;
> +	int err;
> +
> +	if (write && (!capable(CAP_SETPCAP) ||
> +		      !capable(CAP_SYS_ADMIN)))
> +		return -EPERM;
> +
> +	/*
> +	 * convert from the global kernel_cap_t to the ulong array to print to
> +	 * userspace if this is a read.
> +	 *
> +	 * capabilities are exposed as one 64-bit value or two 32-bit values
> +	 * depending on the architecture
> +	 */
> +	mask = table->data;
> +	spin_lock(&cap_userns_lock);
> +	mask_array[0] = (unsigned long) mask->val;
> +#if BITS_PER_LONG != 64
> +	mask_array[1] = mask->val >> BITS_PER_LONG;
> +#endif
> +	spin_unlock(&cap_userns_lock);
> +
> +	t = *table;
> +	t.data = &mask_array;
> +
> +	/*
> +	 * actually read or write and array of ulongs from userspace.  Remember
> +	 * these are least significant bits first
> +	 */
> +	err = proc_doulongvec_minmax(&t, write, buffer, lenp, ppos);
> +	if (err < 0)
> +		return err;
> +
> +	new_mask.val = mask_array[0];
> +#if BITS_PER_LONG != 64
> +	new_mask.val += (u64)mask_array[1] << BITS_PER_LONG;
> +#endif
> +
> +	/*
> +	 * Drop everything not in the new_mask (but don't add things)
> +	 */
> +	if (write) {
> +		spin_lock(&cap_userns_lock);
> +		*mask = cap_intersect(*mask, new_mask);
> +		spin_unlock(&cap_userns_lock);
> +	}
> +
> +	return 0;
> +}
> +#endif
> +
>  static bool new_idmap_permitted(const struct file *file,
>  				struct user_namespace *ns, int cap_setid,
>  				struct uid_gid_map *map);
> @@ -46,6 +106,12 @@ static void set_cred_user_ns(struct cred *cred, struct user_namespace *user_ns)
>  	/* Limit userns capabilities to our parent's bounding set. */
>  	if (iscredsecure(cred, SECURE_USERNS_STRICT_CAPS))
>  		cred->cap_userns = cap_intersect(cred->cap_userns, cred->cap_bset);
> +#ifdef CONFIG_SYSCTL
> +	/* Mask off userns capabilities that are not permitted by the system-wide mask. */
> +	spin_lock(&cap_userns_lock);
> +	cred->cap_userns = cap_intersect(cred->cap_userns, cap_userns_mask);
> +	spin_unlock(&cap_userns_lock);
> +#endif
>  
>  	/* Start with the capabilities defined in the userns set. */
>  	cred->cap_bset = cred->cap_userns;
> -- 
> 2.45.0
>