From: Stas Sergeev
To: linux-kernel@vger.kernel.org
Cc: Stas Sergeev, Stefan Metzmacher, Eric Biederman, Alexander Viro,
	Andy Lutomirski, Christian Brauner, Jan Kara, Jeff Layton,
	Chuck Lever, Alexander Aring, David Laight,
	linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
	Paolo Bonzini, Christian Göttsche
Subject: [PATCH v5 3/3] openat2: add OA2_CRED_INHERIT flag
Date: Fri, 26 Apr 2024 16:33:10 +0300
Message-ID: <20240426133310.1159976-4-stsp2@yandex.ru>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240426133310.1159976-1-stsp2@yandex.ru>
References: <20240426133310.1159976-1-stsp2@yandex.ru>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This flag performs the open operation with the fs credentials
(fsuid, fsgid, group_info) that were in effect when dir_fd was opened.
dir_fd must be opened with the O_CRED_ALLOW flag for this to work.
This allows the process to pre-open some directories and then
change eUID (and all other UIDs/GIDs) to a less-privileged user,
retaining the ability to open/create files within these directories.

Design goal:
The idea is to provide very lightweight sandboxing, where the
process, without the use of any heavy-weight techniques like chroot
within namespaces, can restrict its access to the set of pre-opened
directories.
This patch is just a first step to such sandboxing. If things go
well, in the future the same extension can be added to more syscalls.
These should include at least unlinkat(), renameat2() and the
not-yet-upstreamed setxattrat().

Security considerations:
- Only the bare minimum set of credentials is overridden:
  fsuid, fsgid and group_info. The rest, for example capabilities,
  are not overridden to avoid unneeded security risks.
- To avoid sandbox escape, this patch makes sure the restricted
  lookup modes are used, namely RESOLVE_BENEATH or RESOLVE_IN_ROOT.
- Magic /proc symlinks are discarded, as suggested by
  Andy Lutomirski.
- O_CRED_ALLOW fds cannot be passed via unix socket and are always
  closed on exec() to prevent "unsuspecting userspace" from not being
  able to fully drop privs.

Use cases:
Virtual machines that deal with untrusted code can use this instead
of heavier-weight approaches.
Currently the approach is being tested on a dosemu2 VM.

Signed-off-by: Stas Sergeev

CC: Stefan Metzmacher
CC: Eric Biederman
CC: Alexander Viro
CC: Andy Lutomirski
CC: Christian Brauner
CC: Jan Kara
CC: Jeff Layton
CC: Chuck Lever
CC: Alexander Aring
CC: linux-fsdevel@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: Paolo Bonzini
CC: Christian Göttsche
---
 fs/fcntl.c                   |  2 ++
 fs/namei.c                   | 56 ++++++++++++++++++++++++++++++++++--
 fs/open.c                    | 10 ++++++-
 include/linux/fcntl.h        |  2 ++
 include/uapi/linux/openat2.h |  2 ++
 5 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 78c96b1293c2..283c2e65fc2c 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -1043,6 +1043,8 @@ static int __init fcntl_init(void)
 		HWEIGHT32(
 			(VALID_OPEN_FLAGS & ~(O_NONBLOCK | O_NDELAY)) |
 			__FMODE_EXEC | __FMODE_NONOTIFY));
+	BUILD_BUG_ON(HWEIGHT32(VALID_OPENAT2_FLAGS) !=
+			HWEIGHT32(VALID_OPEN_FLAGS) + 1);
 
 	fasync_cache = kmem_cache_create("fasync_cache",
 					 sizeof(struct fasync_struct), 0,
diff --git a/fs/namei.c b/fs/namei.c
index dd50345f7260..aa5dcf57851b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3776,6 +3776,43 @@ static int do_o_path(struct nameidata *nd, unsigned flags, struct file *file)
 	return error;
 }
 
+static const struct cred *openat2_init_creds(int dfd)
+{
+	struct cred *cred;
+	struct fd f;
+
+	if (dfd == AT_FDCWD)
+		return ERR_PTR(-EINVAL);
+
+	f = fdget_raw(dfd);
+	if (!f.file)
+		return ERR_PTR(-EBADF);
+
+	cred = ERR_PTR(-EPERM);
+	if (!(f.file->f_flags & O_CRED_ALLOW))
+		goto done;
+
+	cred = prepare_creds();
+	if (!cred) {
+		cred = ERR_PTR(-ENOMEM);
+		goto done;
+	}
+
+	cred->fsuid = f.file->f_cred->fsuid;
+	cred->fsgid = f.file->f_cred->fsgid;
+	cred->group_info = get_group_info(f.file->f_cred->group_info);
+
+done:
+	fdput(f);
+	return cred;
+}
+
+static void openat2_done_creds(const struct cred *cred)
+{
+	put_group_info(cred->group_info);
+	put_cred(cred);
+}
+
 static struct file *path_openat(struct nameidata *nd,
 			const struct open_flags *op, unsigned flags)
 {
@@ -3793,18 +3830,33 @@ static struct file *path_openat(struct nameidata *nd,
 		error = do_o_path(nd, flags, file);
 	} else {
 		const char *s;
+		const struct cred *old_cred = NULL, *cred = NULL;
 
-		file = alloc_empty_file(open_flags, current_cred());
-		if (IS_ERR(file))
+		if (open_flags & OA2_CRED_INHERIT) {
+			cred = openat2_init_creds(nd->dfd);
+			if (IS_ERR(cred))
+				return ERR_CAST(cred);
+		}
+		file = alloc_empty_file(open_flags, cred ?: current_cred());
+		if (IS_ERR(file)) {
+			if (cred)
+				openat2_done_creds(cred);
 			return file;
+		}
 
 		s = path_init(nd, flags);
+		if (cred)
+			old_cred = override_creds(cred);
 		while (!(error = link_path_walk(s, nd)) &&
 		       (s = open_last_lookups(nd, file, op)) != NULL)
 			;
 		if (!error)
 			error = do_open(nd, file, op);
+		if (old_cred)
+			revert_creds(old_cred);
 		terminate_walk(nd);
+		if (cred)
+			openat2_done_creds(cred);
 	}
 	if (likely(!error)) {
 		if (likely(file->f_mode & FMODE_OPENED))
diff --git a/fs/open.c b/fs/open.c
index ee8460c83c77..dd4fab536135 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1225,7 +1225,7 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op)
 	 * values before calling build_open_flags(), but openat2(2) checks all
 	 * of its arguments.
 	 */
-	if (flags & ~VALID_OPEN_FLAGS)
+	if (flags & ~VALID_OPENAT2_FLAGS)
 		return -EINVAL;
 	if (how->resolve & ~VALID_RESOLVE_FLAGS)
 		return -EINVAL;
@@ -1326,6 +1326,14 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op)
 		lookup_flags |= LOOKUP_CACHED;
 	}
 
+	if (flags & OA2_CRED_INHERIT) {
+		/* Inherit creds only with scoped look-up modes. */
+		if (!(lookup_flags & LOOKUP_IS_SCOPED))
+			return -EPERM;
+		/* Reject /proc "magic" links if inheriting creds. */
+		lookup_flags |= LOOKUP_NO_MAGICLINKS;
+	}
+
 	op->lookup_flags = lookup_flags;
 	return 0;
 }
diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
index e074ee9c1e36..33b9c7ad056b 100644
--- a/include/linux/fcntl.h
+++ b/include/linux/fcntl.h
@@ -12,6 +12,8 @@
 	 FASYNC | O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \
 	 O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE | O_CRED_ALLOW)
 
+#define VALID_OPENAT2_FLAGS (VALID_OPEN_FLAGS | OA2_CRED_INHERIT)
+
 /* List of all valid flags for the how->resolve argument: */
 #define VALID_RESOLVE_FLAGS \
 	(RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS | RESOLVE_NO_SYMLINKS | \
diff --git a/include/uapi/linux/openat2.h b/include/uapi/linux/openat2.h
index a5feb7604948..f803558ad62f 100644
--- a/include/uapi/linux/openat2.h
+++ b/include/uapi/linux/openat2.h
@@ -40,4 +40,6 @@ struct open_how {
 					return -EAGAIN if that's not
 					possible. */
 
+#define OA2_CRED_INHERIT		(1UL << 28)
+
 #endif /* _UAPI_LINUX_OPENAT2_H */
-- 
2.44.0