Received: by 2002:ac0:a5a6:0:0:0:0:0 with SMTP id m35-v6csp1896112imm; Sat, 29 Sep 2018 06:16:11 -0700 (PDT) X-Google-Smtp-Source: ACcGV60/48UALcOBR1NkhGT/aFZL00CPHJ0Rairf1DKj67L/q2nklUgwy94SLEu+hlI5AsCO42Mq X-Received: by 2002:a63:91c9:: with SMTP id l192-v6mr2948742pge.433.1538226971093; Sat, 29 Sep 2018 06:16:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1538226971; cv=none; d=google.com; s=arc-20160816; b=NCz+9TPVziDChf9ny6x5EGNbYXL15H6LgMV11FKCRgLPnHkRKr0TvPz/J4/kyd7jKs olcxOp60eWTK523JromUrjyYLolHDjZViXUY8h1qQ3XBvdSNJu3DN1gEaaMbFaSf/65A OjQET89JbClnBUPI1aO2L2tgr4LkxcPSF+XJiGkcqcaiEJjr6ePYFwqM4ta0ER7cNaHe NRJPW6txWd9or/vtPbnRX90LcJdFhMtovaVg+4mgUpQIZ13VvdTxbQ20ctpo9JB3Detx W1oqtF4E2N54o/TrcTAWu3170jjwkU42wl76m72obyJz63ol2i3c+aUcaFz37o7QbWvP Yonw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=kCfBYlLBGG/ZmNnfk2cB3cnEY2JzUCPOh6/KsUth3Tk=; b=Hod8sSCb3EFO9FNM+3idML2egh+54JdfXPuU+QZF59lEImTnnqVrVz6y9c1hmLYGtj 4J2Uq/4nF1RqEoukPkyJjKNIy7fqf5Qa4Hf0s1SYbtpbv38OBiqvtwOOjEkYbvez7lMz gF7OR8rSc74IDlQRPyAGt8d/CSwqlmM6IeoYM1Gr2CwEyHdDBAHL8CCpA4+0Xft+GPt7 uJ2G4JM40w1AQdTLQQbDiSnnreiXxVez+ip+0WTB7Ea+3UMwGEvkDfDEspbPsT6LMncU 4T7UAZ3tLZKCxNoDNmaQsOFPcBOjpiJkeFfIGV2tD/TDkR27SU9e00ju67/sZI+u1ezk A/Qg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cyphar-com.20150623.gappssmtp.com header.s=20150623 header.b=xAA8O1FI; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id p1-v6si7490825pfe.150.2018.09.29.06.15.54; Sat, 29 Sep 2018 06:16:11 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@cyphar-com.20150623.gappssmtp.com header.s=20150623 header.b=xAA8O1FI; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728251AbeI2ToO (ORCPT + 99 others); Sat, 29 Sep 2018 15:44:14 -0400 Received: from mail-pg1-f195.google.com ([209.85.215.195]:45782 "EHLO mail-pg1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728077AbeI2ToO (ORCPT ); Sat, 29 Sep 2018 15:44:14 -0400 Received: by mail-pg1-f195.google.com with SMTP id t70-v6so6360443pgd.12 for ; Sat, 29 Sep 2018 06:15:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cyphar-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=kCfBYlLBGG/ZmNnfk2cB3cnEY2JzUCPOh6/KsUth3Tk=; b=xAA8O1FIJKQIk5Yd6s/VAROrt8QJA000J6rCRj8eOs6DGryyAOW3Mu817XDJxtODQ4 ja8dGk6QCAFcs9aQw8OT3kxs89pfEhQL9DD+B+CpCK6vGgQn+byVkYSlEyGBd2L66iY1 +rZ0PO8Q8MW1Rc0jeRn/zmdY9gGskgaiyQTB9Q3K8TFbS/367//IbT1qeXgr/jASByAk 7mDxl2tHVSgAWlTqtdpvSzSSUFJ3TTReWb8pxfqXCfEUY1QN+HYFnyByF/zLAPIv0uUe prcasDNKiSkcduzQAIlQec/XxPeWFFTBZsr59WRFTnatxL3MaPDa4DlHehaoa4B+i8aF nRvA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=kCfBYlLBGG/ZmNnfk2cB3cnEY2JzUCPOh6/KsUth3Tk=; b=q5pXxpSp790HycCmgzOgQBik97Ttnkm+Y6KzNwPHd87a21dO3NnzEJ0joHjPMKGV/h ZlPVYoAmWorMph4NcFi1AMnG0uU3khoxEsWv5im4w6wH/MV9P6bWkD8tSxQnTRHNnwM2 pSqScFhiKVBXcSjOQJ5anZ7+vIK/Whmwk5v5nFIs1y4lrByRCcdV3pmLyd6ZCBCJFmap SG+n3QwPpISfsyOl2oKB5RueG7p0m+8spNdmdlqOw0JtHVoKXljNKpWeV6RtVXiF5Yib m0j7+l2gLehqYtMzWkhMjEwPXiMtL2OEkwlnVdSc/vN52+pjbzcfJ7sfbbXyt6GF07qA JhzA== X-Gm-Message-State: ABuFfojJkW/Q/lVgtmJYhFnoRFeCcufl2rLjKdsyvwMI119SPQLFjFxf HfDAF92C4rtSOmCJmTng+h8t3g== X-Received: by 2002:a17:902:246a:: with SMTP id m39-v6mr3160964plg.57.1538226947444; Sat, 29 Sep 2018 06:15:47 -0700 (PDT) Received: from ?redacted? (pa49-199-213-175.pa.vic.optusnet.com.au. [49.199.213.175]) by smtp.gmail.com with ESMTPSA id h124-v6sm11335360pfg.112.2018.09.29.06.15.40 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 29 Sep 2018 06:15:46 -0700 (PDT) From: Aleksa Sarai To: Jeff Layton , "J. Bruce Fields" , Al Viro , Arnd Bergmann , Shuah Khan Cc: David Howells , Andy Lutomirski , Christian Brauner , Eric Biederman , Tycho Andersen , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org, dev@opencontainers.org, containers@lists.linux-foundation.org, Aleksa Sarai Subject: [PATCH 2/3] namei: implement AT_THIS_ROOT chroot-like path resolution Date: Sat, 29 Sep 2018 23:15:33 +1000 Message-Id: <20180929131534.24472-1-cyphar@cyphar.com> X-Mailer: git-send-email 2.19.0 In-Reply-To: <20180929103453.12025-1-cyphar@cyphar.com> References: <20180929103453.12025-1-cyphar@cyphar.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The primary motivation for the need for this flag is container runtimes which have to interact with malicious root filesystems in the host namespaces. One of the first requirements for a container runtime to be secure against a malicious rootfs is that they correctly scope symlinks (that is, they should be scoped as though they are chroot(2)ed into the container's rootfs) and ".."-style paths. The already-existing AT_XDEV and AT_NO_PROCLINKS help defend against other potential attacks in a malicious rootfs scenario. Currently most container runtimes try to do this resolution in userspace[1], causing many potential race conditions. In addition, the "obvious" alternative (actually performing a {ch,pivot_}root(2)) requires a fork+exec which is *very* costly if necessary for every filesystem operation involving a container. The most significant change in semantics with AT_THIS_ROOT is that *at(2) syscalls now no longer have the property that an absolute pathname causes the dirfd to be ignored completely (if LOOKUP_CHROOT is specified). The reasoning behind this is that AT_THIS_ROOT necessarily has to chroot-scope symlinks with absolute paths to dirfd, and so doing it for the base path seems to be the most consistent behaviour (and also avoids foot-gunning users who want to chroot-scope paths that might be absolute). Currently this is only enabled for the stat(2) and openat(2) family (the latter has its own flag O_THISROOT with the same semantics). Ideally this flag would be supported by all *at(2) syscalls, but this will require adding flags arguments to many of them (and will be done in a separate patchset). [1]: https://github.com/cyphar/filepath-securejoin Cc: Eric Biederman Cc: Christian Brauner Signed-off-by: Aleksa Sarai --- fs/fcntl.c | 2 +- fs/namei.c | 121 +++++++++++++++++-------------- fs/open.c | 2 + fs/stat.c | 4 +- include/linux/fcntl.h | 2 +- include/linux/namei.h | 1 + include/uapi/asm-generic/fcntl.h | 3 + include/uapi/linux/fcntl.h | 2 + 8 files changed, 81 insertions(+), 56 deletions(-) diff --git a/fs/fcntl.c b/fs/fcntl.c index e343618736f7..4c36c5b9fdb9 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -1031,7 +1031,7 @@ static int __init fcntl_init(void) * Exceptions: O_NONBLOCK is a two bit define on parisc; O_NDELAY * is defined as O_NONBLOCK on some platforms and not on others. */ - BUILD_BUG_ON(25 - 1 /* for O_RDONLY being 0 */ != + BUILD_BUG_ON(26 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32( (VALID_OPEN_FLAGS & ~(O_NONBLOCK | O_NDELAY)) | __FMODE_EXEC | __FMODE_NONOTIFY)); diff --git a/fs/namei.c b/fs/namei.c index 757dd783771c..1b984f0dbbb4 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2193,9 +2193,64 @@ static int link_path_walk(const char *name, struct nameidata *nd) } } +/* + * Configure nd->path based on the nd->dfd. This is only used as part of + * path_init(). + */ +static inline int dirfd_path_init(struct nameidata *nd) +{ + if (nd->dfd == AT_FDCWD) { + if (nd->flags & LOOKUP_RCU) { + struct fs_struct *fs = current->fs; + unsigned seq; + + do { + seq = read_seqcount_begin(&fs->seq); + nd->path = fs->pwd; + nd->inode = nd->path.dentry->d_inode; + nd->seq = __read_seqcount_begin(&nd->path.dentry->d_seq); + } while (read_seqcount_retry(&fs->seq, seq)); + } else { + get_fs_pwd(current->fs, &nd->path); + nd->inode = nd->path.dentry->d_inode; + } + } else { + /* Caller must check execute permissions on the starting path component */ + struct fd f = fdget_raw(nd->dfd); + struct dentry *dentry; + + if (!f.file) + return -EBADF; + + dentry = f.file->f_path.dentry; + + if (*nd->name->name && unlikely(!d_can_lookup(dentry))) { + fdput(f); + return -ENOTDIR; + } + + nd->path = f.file->f_path; + if (nd->flags & LOOKUP_RCU) { + nd->inode = nd->path.dentry->d_inode; + nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq); + } else { + path_get(&nd->path); + nd->inode = nd->path.dentry->d_inode; + } + fdput(f); + } + if (unlikely(nd->flags & (LOOKUP_CHROOT | LOOKUP_BENEATH))) { + nd->root = nd->path; + if (!(nd->flags & LOOKUP_RCU)) + path_get(&nd->root); + } + return 0; +} + /* must be paired with terminate_walk() */ static const char *path_init(struct nameidata *nd, unsigned flags) { + int error; const char *s = nd->name->name; if (!*s) @@ -2230,65 +2285,25 @@ static const char *path_init(struct nameidata *nd, unsigned flags) nd->path.dentry = NULL; nd->m_seq = read_seqbegin(&mount_lock); + if (unlikely(flags & LOOKUP_CHROOT)) { + error = dirfd_path_init(nd); + if (unlikely(error)) + return ERR_PTR(error); + } if (*s == '/') { - int error; - set_root(nd); + if (likely(!nd->root.mnt)) + set_root(nd); error = nd_jump_root(nd); if (unlikely(error)) s = ERR_PTR(error); return s; - } else if (nd->dfd == AT_FDCWD) { - if (flags & LOOKUP_RCU) { - struct fs_struct *fs = current->fs; - unsigned seq; - - do { - seq = read_seqcount_begin(&fs->seq); - nd->path = fs->pwd; - nd->inode = nd->path.dentry->d_inode; - nd->seq = __read_seqcount_begin(&nd->path.dentry->d_seq); - } while (read_seqcount_retry(&fs->seq, seq)); - } else { - get_fs_pwd(current->fs, &nd->path); - nd->inode = nd->path.dentry->d_inode; - } - if (unlikely(flags & LOOKUP_BENEATH)) { - nd->root = nd->path; - if (!(flags & LOOKUP_RCU)) - path_get(&nd->root); - } - return s; - } else { - /* Caller must check execute permissions on the starting path component */ - struct fd f = fdget_raw(nd->dfd); - struct dentry *dentry; - - if (!f.file) - return ERR_PTR(-EBADF); - - dentry = f.file->f_path.dentry; - - if (*s && unlikely(!d_can_lookup(dentry))) { - fdput(f); - return ERR_PTR(-ENOTDIR); - } - - nd->path = f.file->f_path; - if (flags & LOOKUP_RCU) { - nd->inode = nd->path.dentry->d_inode; - nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq); - } else { - path_get(&nd->path); - nd->inode = nd->path.dentry->d_inode; - } - if (unlikely(flags & LOOKUP_BENEATH)) { - nd->root = nd->path; - if (!(flags & LOOKUP_RCU)) - path_get(&nd->root); - } - fdput(f); - return s; } + if (likely(!nd->path.mnt)) { + error = dirfd_path_init(nd); + if (unlikely(error)) + return ERR_PTR(error); + } + return s; } static const char *trailing_symlink(struct nameidata *nd) diff --git a/fs/open.c b/fs/open.c index 80f5f566a5ff..81d148f626cd 100644 --- a/fs/open.c +++ b/fs/open.c @@ -996,6 +996,8 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o lookup_flags |= LOOKUP_NO_PROCLINKS; if (flags & O_NOSYMLINKS) lookup_flags |= LOOKUP_NO_SYMLINKS; + if (flags & O_THISROOT) + lookup_flags |= LOOKUP_CHROOT; op->lookup_flags = lookup_flags; return 0; } diff --git a/fs/stat.c b/fs/stat.c index 791e61b916ae..e8366e4812c3 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -172,7 +172,7 @@ int vfs_statx(int dfd, const char __user *filename, int flags, if (flags & ~(AT_SYMLINK_NOFOLLOW | AT_NO_AUTOMOUNT | AT_EMPTY_PATH | KSTAT_QUERY_FLAGS | AT_BENEATH | AT_XDEV | - AT_NO_PROCLINKS | AT_NO_SYMLINKS)) + AT_NO_PROCLINKS | AT_NO_SYMLINKS | AT_THIS_ROOT)) return -EINVAL; if (flags & AT_SYMLINK_NOFOLLOW) @@ -189,6 +189,8 @@ int vfs_statx(int dfd, const char __user *filename, int flags, lookup_flags |= LOOKUP_NO_PROCLINKS; if (flags & AT_NO_SYMLINKS) lookup_flags |= LOOKUP_NO_SYMLINKS; + if (flags & AT_THIS_ROOT) + lookup_flags |= LOOKUP_CHROOT; retry: error = user_path_at(dfd, filename, lookup_flags, &path); diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h index ad5bba4b5b12..95480cd4c09d 100644 --- a/include/linux/fcntl.h +++ b/include/linux/fcntl.h @@ -10,7 +10,7 @@ O_APPEND | O_NDELAY | O_NONBLOCK | O_NDELAY | __O_SYNC | O_DSYNC | \ FASYNC | O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \ O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE | O_BENEATH | O_XDEV | \ - O_NOPROCLINKS | O_NOSYMLINKS) + O_NOPROCLINKS | O_NOSYMLINKS | O_THISROOT) #ifndef force_o_largefile #define force_o_largefile() (BITS_PER_LONG != 32) diff --git a/include/linux/namei.h b/include/linux/namei.h index 5ff7f3362d1b..7ec9e2d84649 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -53,6 +53,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND}; #define LOOKUP_NO_PROCLINKS 0x040000 /* No /proc/$pid/fd/ "symlink" crossing. */ #define LOOKUP_NO_SYMLINKS 0x080000 /* No symlink crossing *at all*. Implies LOOKUP_NO_PROCLINKS. */ +#define LOOKUP_CHROOT 0x100000 /* Treat dirfd as %current->fs->root. */ extern int path_pts(struct path *path); diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h index c2bf5983e46a..11206b0e927c 100644 --- a/include/uapi/asm-generic/fcntl.h +++ b/include/uapi/asm-generic/fcntl.h @@ -113,6 +113,9 @@ #ifndef O_NOSYMLINKS #define O_NOSYMLINKS 01000000000 #endif +#ifndef O_THISROOT +#define O_THISROOT 02000000000 +#endif #define F_DUPFD 0 /* dup */ #define F_GETFD 1 /* get close_on_exec */ diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index 551a9e2166a8..ea978457b68f 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -99,6 +99,8 @@ #define AT_NO_PROCLINKS 0x40000 /* No /proc/$pid/fd/... "symlinks". */ #define AT_NO_SYMLINKS 0x80000 /* No symlinks *at all*. Implies AT_NO_PROCLINKS. */ +#define AT_THIS_ROOT 0x100000 /* Path resolution acts as though + it is chroot-ed into dirfd. */ #endif /* _UAPI_LINUX_FCNTL_H */ -- 2.19.0