From: Mattias Nissler <[email protected]>

For mounts that have the new "nosymfollow" option, don't follow symlinks
when resolving paths. The new option is similar in spirit to the
existing "nodev", "noexec", and "nosuid" options, as well as to the
RESOLVE_NO_SYMLINKS flag of the openat2(2) syscall (LOOKUP_NO_SYMLINKS
inside the kernel). Various BSD variants have supported the
"nosymfollow" mount option for a long time with equivalent
implementations.

Note that symlinks may still be created on file systems mounted with
the "nosymfollow" option present. readlink() remains functional, so
user space code that is aware of symlinks can still choose to follow
them explicitly.

Setting the "nosymfollow" mount option helps prevent privileged
writers from modifying files unintentionally in case there is an
unexpected link along the accessed path. The "nosymfollow" option is
thus useful as a defensive measure for systems that need to deal with
untrusted file systems in privileged contexts.

More information on the history and motivation for this patch can be
found here:

https://sites.google.com/a/chromium.org/dev/chromium-os/chromiumos-design-docs/hardening-against-malicious-stateful-data#TOC-Restricting-symlink-traversal

Signed-off-by: Mattias Nissler <[email protected]>
Signed-off-by: Ross Zwisler <[email protected]>
---
Resending v6 which was previously posted here [0].

Aleksa, if I've addressed all of your feedback, would you mind adding
your Reviewed-by?

Andrew, would you please consider merging this?

This patch adds a security measure which we feel is necessary to have in
Chrome OS, and which would be beneficial to have in other Linux setups
as well. Lots of details on exactly what we're protecting against exist
in the write-up that I linked to in the commit message.

Changes since v5 [1]:
 * Redefined MS_NOSYMFOLLOW to use a lower unused bit value (256) so it
   doesn't collide with MS_SUBMOUNT.
 * Updated the mount code in util-linux [2] to use the newly defined
   flag value.
 * Updated man pages for mount(8) [2], as well as mount(2) and statfs(2) [3].

[0]: https://patchwork.kernel.org/patch/11405065/
[1]: https://patchwork.kernel.org/patch/11365291/
[2]: https://github.com/rzwisler/util-linux/commit/7f8771acd85edb70d97921c026c55e1e724d4e15
[3]: https://github.com/rzwisler/man-pages/commit/b8fe8079f64b5068940c0144586e580399a71668
---
 fs/namei.c                 | 3 ++-
 fs/namespace.c             | 2 ++
 fs/proc_namespace.c        | 1 +
 fs/statfs.c                | 2 ++
 include/linux/mount.h      | 3 ++-
 include/linux/statfs.h     | 1 +
 include/uapi/linux/mount.h | 1 +
 7 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index db6565c998259..026a774d28c3d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1122,7 +1122,8 @@ const char *get_link(struct nameidata *nd)
 	int error;
 	const char *res;
 
-	if (unlikely(nd->flags & LOOKUP_NO_SYMLINKS))
+	if (unlikely(nd->flags & LOOKUP_NO_SYMLINKS) ||
+	    unlikely(nd->path.mnt->mnt_flags & MNT_NOSYMFOLLOW))
 		return ERR_PTR(-ELOOP);
 
 	if (!(nd->flags & LOOKUP_RCU)) {
diff --git a/fs/namespace.c b/fs/namespace.c
index 85b5f7bea82e7..9b843b66d39e4 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3074,6 +3074,8 @@ long do_mount(const char *dev_name, const char __user *dir_name,
 		mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME);
 	if (flags & MS_RDONLY)
 		mnt_flags |= MNT_READONLY;
+	if (flags & MS_NOSYMFOLLOW)
+		mnt_flags |= MNT_NOSYMFOLLOW;
 
 	/* The default atime for remount is preservation */
 	if ((flags & MS_REMOUNT) &&
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 273ee82d8aa97..91a552f617406 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -70,6 +70,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt)
 		{ MNT_NOATIME, ",noatime" },
 		{ MNT_NODIRATIME, ",nodiratime" },
 		{ MNT_RELATIME, ",relatime" },
+		{ MNT_NOSYMFOLLOW, ",nosymfollow" },
 		{ 0, NULL }
 	};
 	const struct proc_fs_info *fs_infop;
diff --git a/fs/statfs.c b/fs/statfs.c
index 2616424012ea7..59f33752c1311 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -29,6 +29,8 @@ static int flags_by_mnt(int mnt_flags)
 		flags |= ST_NODIRATIME;
 	if (mnt_flags & MNT_RELATIME)
 		flags |= ST_RELATIME;
+	if (mnt_flags & MNT_NOSYMFOLLOW)
+		flags |= ST_NOSYMFOLLOW;
 	return flags;
 }
 
diff --git a/include/linux/mount.h b/include/linux/mount.h
index bf8cc4108b8f9..ff2d132c21f5d 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -30,6 +30,7 @@ struct fs_context;
 #define MNT_NODIRATIME 0x10
 #define MNT_RELATIME 0x20
 #define MNT_READONLY 0x40 /* does the user want this to be r/o? */
+#define MNT_NOSYMFOLLOW 0x80
 
 #define MNT_SHRINKABLE 0x100
 #define MNT_WRITE_HOLD 0x200
@@ -46,7 +47,7 @@ struct fs_context;
 #define MNT_SHARED_MASK (MNT_UNBINDABLE)
 #define MNT_USER_SETTABLE_MASK (MNT_NOSUID | MNT_NODEV | MNT_NOEXEC \
 				 | MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME \
-				 | MNT_READONLY)
+				 | MNT_READONLY | MNT_NOSYMFOLLOW)
 #define MNT_ATIME_MASK (MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME )
 
 #define MNT_INTERNAL_FLAGS (MNT_SHARED | MNT_WRITE_HOLD | MNT_INTERNAL | \
diff --git a/include/linux/statfs.h b/include/linux/statfs.h
index 9bc69edb8f188..fac4356ea1bfc 100644
--- a/include/linux/statfs.h
+++ b/include/linux/statfs.h
@@ -40,6 +40,7 @@ struct kstatfs {
 #define ST_NOATIME 0x0400 /* do not update access times */
 #define ST_NODIRATIME 0x0800 /* do not update directory access times */
 #define ST_RELATIME 0x1000 /* update atime relative to mtime/ctime */
+#define ST_NOSYMFOLLOW 0x2000 /* do not follow symlinks */
 
 struct dentry;
 extern int vfs_get_fsid(struct dentry *dentry, __kernel_fsid_t *fsid);
diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h
index 96a0240f23fed..dd8306ea336c1 100644
--- a/include/uapi/linux/mount.h
+++ b/include/uapi/linux/mount.h
@@ -16,6 +16,7 @@
 #define MS_REMOUNT 32 /* Alter flags of a mounted FS */
 #define MS_MANDLOCK 64 /* Allow mandatory locks on an FS */
 #define MS_DIRSYNC 128 /* Directory modifications are synchronous */
+#define MS_NOSYMFOLLOW 256 /* Do not follow symlinks */
 #define MS_NOATIME 1024 /* Do not update access times. */
 #define MS_NODIRATIME 2048 /* Do not update directory access times */
 #define MS_BIND 4096
--
2.25.1.481.gfbce0eb801-goog
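
The hunks above thread one bit through three flag namespaces: the
mount(2) argument (MS_*), the per-mount flags (MNT_*), and the statfs(2)
result (ST_*), with the actual check living in get_link(). The following
is a user-space model of that plumbing, offered only as a reading aid.
The SKETCH_ names and the three helper functions are hypothetical; the
NOSYMFOLLOW values are copied from the patch, while the value used for
LOOKUP_NO_SYMLINKS is an assumption made purely for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Values copied from the patch; the SKETCH_ prefix avoids clashing
 * with any definitions in system headers. */
#define SKETCH_MS_NOSYMFOLLOW     256     /* include/uapi/linux/mount.h */
#define SKETCH_MNT_NOSYMFOLLOW    0x80    /* include/linux/mount.h */
#define SKETCH_ST_NOSYMFOLLOW     0x2000  /* include/linux/statfs.h */

/* MS_SUBMOUNT is bit 26 in the kernel; v6 moved MS_NOSYMFOLLOW away from it. */
#define SKETCH_MS_SUBMOUNT        (1UL << 26)

/* Assumed value, for illustration only. */
#define SKETCH_LOOKUP_NO_SYMLINKS 0x010000

static_assert((SKETCH_MS_NOSYMFOLLOW & SKETCH_MS_SUBMOUNT) == 0,
	      "MS_NOSYMFOLLOW must not collide with MS_SUBMOUNT");

/* Models the do_mount() hunk: mount(2) flag -> per-mount flag. */
static int sketch_mnt_flags_from_ms(unsigned long ms_flags)
{
	int mnt_flags = 0;

	if (ms_flags & SKETCH_MS_NOSYMFOLLOW)
		mnt_flags |= SKETCH_MNT_NOSYMFOLLOW;
	return mnt_flags;
}

/* Models the flags_by_mnt() hunk: per-mount flag -> statfs(2) f_flags bit. */
static int sketch_st_flags_from_mnt(int mnt_flags)
{
	int flags = 0;

	if (mnt_flags & SKETCH_MNT_NOSYMFOLLOW)
		flags |= SKETCH_ST_NOSYMFOLLOW;
	return flags;
}

/*
 * Models the get_link() hunk: traversal is refused with -ELOOP if either
 * the per-lookup flag or the per-mount flag forbids following symlinks.
 */
static int sketch_may_follow_link(unsigned int lookup_flags, int mnt_flags)
{
	if ((lookup_flags & SKETCH_LOOKUP_NO_SYMLINKS) ||
	    (mnt_flags & SKETCH_MNT_NOSYMFOLLOW))
		return -ELOOP;
	return 0;
}
```

The model makes one property of the patch easy to see: the mount option
and the openat2(2) restriction share a single check, so both fail in the
same way (ELOOP) from the caller's point of view.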
On Wed, Mar 04, 2020 at 10:34:46AM -0700, Ross Zwisler wrote:
> Resending v6 which was previously posted here [0].
>
> Aleksa, if I've addressed all of your feedback, would you mind adding
> your Reviewed-by?
>
> Andrew, would you please consider merging this?

NAK. It's not that I hated the patch, but I call hard moratorium on
fs/namei.c features this cycle.

Reason: very massive rewrite of the entire area about to hit -next.
Moreover, that rewrite is still in the "might be reordered/rebased/whatnot"
stage. The patches had been posted on fsdevel, along with the warning
that it's going into -next shortly.

Folks, we are close enough to losing control of complexity in that
code. It needs to be sanitized, or we'll get into a state where the
average amount of new bugs introduced by fixing an old one exceeds 1.

There had been several complexity injections into that thing over
years (r/o bind-mounts, original RCU pathwalk merge, atomic_open,
mount traps, openat2 to name some) and while some of that got eventually
cleaned up, there's a lot of subtle stuff accumulated in the area.
It can be sanitized and I am doing just that (62 commits in the local
branch at the moment). If that gets in the way of someone's patches -
too fucking bad. The stuff already in needs to be integrated properly;
that gets priority over additional security hardening any day, especially
since this cycle has already seen
	* user-triggerable oops in several years old hardening stuff
(use-after-free, unlikely to be escalatable beyond null pointer
dereference). And I'm not blaming the patch authors - liveness analysis
in do_last() as it is in mainline is a nightmare.
	* my own brown paperbag braino in attempt to fix that.
Fortunately that one was easily caught by fuzzers and it was trivial to fix
once found. Again, liveness analysis (and data invariants) from hell...
	* gaps in LOOKUP_NO_XDEV (openat2 series, just merged). Missed
on review. Reason: several places implementing mount crossing, with
varying amount of divergence between them. One got missed...
	* rather interesting corner cases of aushit vs. open vs. NFS.
Fairly old ones, at that. Still sorting that one out...

Anyway, the bottom line is: leave fs/namei.c (especially around the
pathwalk-related code) alone for now. Or work on top of the posted
series, but expect it to change quite a bit under you. Trying to
dump that fun job on akpm is unlikely to work. And if all of that
comes as a surprise since you are not following fsdevel, consider
doing so in the future, please.

PS:
al@dizzy:~/linux/trees/vfs$ git diff --stat v5.6-rc1..HEAD fs/namei.c
 fs/namei.c | 1408 +++++++++++++++++++++++++++++++++++++++++++----------------------------------------------------------
 1 file changed, 597 insertions(+), 811 deletions(-)
al@dizzy:~/linux/trees/vfs$ wc -l fs/namei.c
4723 fs/namei.c

The affected area is almost exclusively in core pathname resolution
code.

On Wed, Mar 04, 2020 at 06:38:29PM +0000, Al Viro wrote:
> NAK. It's not that I hated the patch, but I call hard moratorium on
> fs/namei.c features this cycle.
>
> Reason: very massive rewrite of the entire area about to hit -next.
> Moreover, that rewrite is still in the "might be reordered/rebased/whatnot"
> stage. The patches had been posted on fsdevel, along with the warning
> that it's going into -next shortly.
>
> Folks, we are close enough to losing control of complexity in that
> code. It needs to be sanitized, or we'll get into a state where the
> average amount of new bugs introduced by fixing an old one exceeds 1.
>
> There had been several complexity injections into that thing over
> years (r/o bind-mounts, original RCU pathwalk merge, atomic_open,
> mount traps, openat2 to name some) and while some of that got eventually
> cleaned up, there's a lot of subtle stuff accumulated in the area.
> It can be sanitized and I am doing just that (62 commits in the local
> branch at the moment). If that gets in the way of someone's patches -
> too fucking bad. The stuff already in needs to be integrated properly;
> that gets priority over additional security hardening any day, especially
> since this cycle has already seen
> * user-triggerable oops in several years old hardening stuff
> (use-after-free, unlikely to be escalatable beyond null pointer
> dereference). And I'm not blaming the patch authors - liveness analysis
> in do_last() as it is in mainline is a nightmare.
> * my own brown paperbag braino in attempt to fix that.
> Fortunately that one was easily caught by fuzzers and it was trivial to fix
> once found. Again, liveness analysis (and data invariants) from hell...
> * gaps in LOOKUP_NO_XDEV (openat2 series, just merged). Missed
> on review. Reason: several places implementing mount crossing, with
> varying amount of divergence between them. One got missed...
> * rather interesting corner cases of aushit vs. open vs. NFS.
> Fairly old ones, at that. Still sorting that one out...
>
> Anyway, the bottom line is: leave fs/namei.c (especially around the
> pathwalk-related code) alone for now. Or work on top of the posted
> series, but expect it to change quite a bit under you. Trying to
> dump that fun job on akpm is unlikely to work. And if all of that
> comes as a surprise since you are not following fsdevel, consider
> doing so in the future, please.
>
> PS:
> al@dizzy:~/linux/trees/vfs$ git diff --stat v5.6-rc1..HEAD fs/namei.c
> fs/namei.c | 1408 +++++++++++++++++++++++++++++++++++++++++++----------------------------------------------------------
> 1 file changed, 597 insertions(+), 811 deletions(-)
> al@dizzy:~/linux/trees/vfs$ wc -l fs/namei.c
> 4723 fs/namei.c
>
> The affected area is almost exclusively in core pathname resolution
> code.
Makes sense, thank you for the response.