Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1755782AbcKVONr (ORCPT ); Tue, 22 Nov 2016 09:13:47 -0500
Received: from mail-vk0-f41.google.com ([209.85.213.41]:34439 "EHLO
        mail-vk0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1755710AbcKVONo (ORCPT );
        Tue, 22 Nov 2016 09:13:44 -0500
MIME-Version: 1.0
In-Reply-To: <1479748213.2309.37.camel@HansenPartnership.com>
References: <1476880312-64786-1-git-send-email-mnissler@chromium.org>
        <1479748213.2309.37.camel@HansenPartnership.com>
From: Mattias Nissler
Date: Tue, 22 Nov 2016 15:13:22 +0100
X-Google-Sender-Auth: Hh65UBq9pAIZHbc2BGvEAayYom0
Message-ID:
Subject: Re: [PATCH v2] Add a "nosymlinks" mount option.
To: James Bottomley
Cc: linux-fsdevel@vger.kernel.org, Alexander Viro,
        linux-kernel@vger.kernel.org, Colin Walters,
        "Austin S . Hemmelgarn"
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 12855
Lines: 291

On Mon, Nov 21, 2016 at 6:10 PM, James Bottomley wrote:
> On Wed, 2016-11-16 at 13:18 -0800, Mattias Nissler wrote:
>> I understand that silence suggests there's little interest, but
>> here's some new information I discovered today that may justify to
>> reconsider the patch:
>>
>> The BSDs already have exactly what I propose, the mount option is
>> called "nosymfollow" there:
>> https://github.com/freebsd/freebsd/blob/a41f4cc9a57cd74604ae7b051eec2f48865f18d6/sys/kern/vfs_lookup.c#L939
>>
>> There's also some evidence on the net that people have been using
>> said nosymfollow mount option to mitigate symlink attacks.
>
> I've got to say that just because BSD does this doesn't make it a good
> idea. The problem with disabling symlinks is that they're a core part
> of unix filesystem semantics and simply disabling them breaks various
> setups in various ways.

Agreed. The proposed mount option is not going to mix well with cases
such as shared object symlinks commonly found in /usr/lib/.

> The breakage depends on the user, so if you
> come from a Windows background, where symlinks basically don't exist,
> you're likely fine, but if you come from a UNIX background, chances are
> you use them pretty extensively. Obviously, I'm a UNIX background
> person and I can think of all the nasty ways you'll break my setup ...

Note that I'm not saying you should be applying this mount option
broadly, just that it is useful for certain mounts where there are no
reasons for symlinks to be present. Case in point: we already have
noexec, but obviously people don't mount their root fs noexec as that
would prevent the system from booting ;-)

>
> If a root privileged programme is trying to write to a file, it's not
> unreasonable to expect the programme itself to take simple precautions
> (like setuid to the user it's trying to write), so I'd really rather
> declare binaries that don't do this as broken and fix them. Thinking
> you've plugged a hole like this simply by disabling symlinks is a false
> sense of security because there are a variety of other ways I could
> trick the programme into writing where it shouldn't. Disabling
> symlinks is fixing a symptom, it's really the cause that should be
> fixed.

Your points are generally valid. Yes, I'm addressing the symptom of
incautious access to untrusted file systems. I actually agree that a
better solution would be for privileged code to be aware of the risks
of file system access and to apply caution where necessary. I also
agree that symlinks are not the only issue with untrusted file
systems, so applying this mount option is not a panacea. I should note
that I'm actively working on hardening some of the more exposed
contexts in Chrome OS beyond the symlink traversal issue. However, I
still believe that blocking symlink traversal in the kernel is a
useful thing to do in our use case.
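
To make "applying caution" in user space concrete, here is a minimal
sketch of what a careful program would have to do without kernel help.
It is illustrative only and not part of the patch: the helper name
open_no_symlinks() is hypothetical, and the sketch assumes an absolute
path and read permission on every directory along it. It walks the
path one component at a time with openat() and O_NOFOLLOW, so a
symlink at any level is rejected, not just at the last component.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): open an absolute path
 * component by component, refusing to traverse a symlink at any level.
 * "flags" applies to the final component only and must not contain
 * O_CREAT in this sketch. Returns a file descriptor, or -1 with errno
 * set (typically ELOOP or ENOTDIR when a component is a symlink).
 */
int open_no_symlinks(const char *path, int flags)
{
	assert(path != NULL);
	if (path[0] != '/') {
		errno = EINVAL;
		return -1;
	}

	char *copy = strdup(path);
	if (copy == NULL)
		return -1;

	/* fd always refers to the deepest component opened so far. */
	int fd = open("/", O_RDONLY | O_DIRECTORY | O_CLOEXEC);

	char *save = NULL;
	char *comp = strtok_r(copy, "/", &save);
	while (fd >= 0 && comp != NULL) {
		char *next = strtok_r(NULL, "/", &save);
		/* Intermediate components must be real directories. */
		int want = next ? (O_RDONLY | O_DIRECTORY) : flags;
		int child = openat(fd, comp, want | O_NOFOLLOW | O_CLOEXEC);
		int saved_errno = errno;

		close(fd);
		errno = saved_errno;
		fd = child;
		comp = next;
	}

	int saved_errno = errno;
	free(copy);
	errno = saved_errno;
	return fd;
}
```

Each lookup is relative to an already-open directory descriptor, so
the check and the use are the same operation. The sketch does not
filter ".." components and does not handle file creation; a real
implementation would need to cover both, which is part of why doing
this consistently across all privileged code is hard.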

Rationale:

1. On a typical system, there's a ton of code executing as root or
with elevated privileges. It's not practical to audit all that code
for symlink traversal issues. Traditional systems just assume that the
root file system is trusted and that code running as root only deals
with root-owned files, which alleviates the problem to a point where
it becomes manageable. However, on Chrome OS we have the ambition to
not blindly trust the writable file system, so we have to worry about
*any* file access in privileged contexts, not just the cases where
privileged processes access user-owned file system locations (allow me
to add that history has shown that this alone has been a constant
source of bugs to the point that it is now generally considered an
anti-pattern).

2. It is sometimes hard to make code sufficiently cautious, for
example in init scripts. As long as you're using shell, you want to
use features such as shell redirection. To cover these, you'd ideally
patch the shell to take care upon opening files for redirection, but
given the prevalence of file access and the myriad of tools invoked by
shell this is bound to be a losing battle. Another option is to add
shell code along the lines of this:

  test "$(readlink -f "$path")" = "$path"

(note that O_NOFOLLOW is not enough as that'll only affect the last
path component). That's tedious and error-prone to maintain, and also
comes with a TOCTOU issue.

3. Assuming for a moment that we can fix all existing issues, it's
still not realistic to keep the code clean moving forward. IMHO, the
root of the problem is that developers just don't have symlinks on
their radar when writing file access code. "I'll just write this data
to this file, and I'll hardcode the path, so what can go wrong?" is
probably a realistic level of caution you can expect. The fact that
the presence of symlinks can redirect the operation to an entirely
different file isn't obvious at all when writing the code.

4. On a philosophical note: There are many precedents where fixing
symptoms is standard practice because attempts to fix the root cause
have proven insufficient. One example is memory bugs in browsers. The
industry has widely adopted mitigations that fix symptoms, e.g.
running browser code in sandboxed environments. Furthermore, the
defense-in-depth principle says that you should be prepared for your
fix to the root cause to fail. In that context, it makes sense to
disable symlink traversal for cases where we don't need it to have
another line of defense.

This became longer than intended, but hopefully it helps clarify my
perspective a bit.

- Mattias

>
> James
>
>> On Mon, Oct 24, 2016 at 7:09 AM, Mattias Nissler <mnissler@chromium.org> wrote:
>> > Friendly ping - does this version of the patch have any chance on
>> > getting included in mainline?
>> >
>> > On Wed, Oct 19, 2016 at 2:31 PM, Mattias Nissler <mnissler@chromium.org> wrote:
>> > > For mounts that have the new "nosymlinks" option, don't follow
>> > > symlinks when resolving paths. The new option is similar in spirit to
>> > > the existing "nodev", "noexec", and "nosuid" options.
>> > >
>> > > Note that symlinks may still be created on mounts where the
>> > > "nosymlinks" option is present. readlink() remains functional, so user
>> > > space code that is aware of symlinks can still choose to follow them
>> > > explicitly.
>> > >
>> > > Setting the "nosymlinks" mount option helps prevent privileged writers
>> > > from modifying files unintentionally in case there is an unexpected
>> > > link along the accessed path. The "nosymlinks" option is thus useful
>> > > as a defensive measure for systems that need to deal with untrusted
>> > > file systems in privileged contexts.
>> > >
>> > > Signed-off-by: Mattias Nissler
>> > > ---
>> > >  fs/namei.c              | 3 +++
>> > >  fs/namespace.c          | 9 ++++++---
>> > >  fs/proc_namespace.c     | 1 +
>> > >  fs/statfs.c             | 2 ++
>> > >  include/linux/mount.h   | 3 ++-
>> > >  include/linux/statfs.h  | 1 +
>> > >  include/uapi/linux/fs.h | 1 +
>> > >  7 files changed, 16 insertions(+), 4 deletions(-)
>> > >
>> > > diff --git a/fs/namei.c b/fs/namei.c
>> > > index 5b4eed2..4cddcf3 100644
>> > > --- a/fs/namei.c
>> > > +++ b/fs/namei.c
>> > > @@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd)
>> > >                  touch_atime(&last->link);
>> > >          }
>> > >
>> > > +        if (nd->path.mnt->mnt_flags & MNT_NOSYMLINKS)
>> > > +                return ERR_PTR(-EACCES);
>> > > +
>> > >          error = security_inode_follow_link(dentry, inode,
>> > >                                             nd->flags & LOOKUP_RCU);
>> > >          if (unlikely(error))
>> > > diff --git a/fs/namespace.c b/fs/namespace.c
>> > > index e6c234b..deec84e 100644
>> > > --- a/fs/namespace.c
>> > > +++ b/fs/namespace.c
>> > > @@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const char __user *dir_name,
>> > >                  mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME);
>> > >          if (flags & MS_RDONLY)
>> > >                  mnt_flags |= MNT_READONLY;
>> > > +        if (flags & MS_NOSYMLINKS)
>> > > +                mnt_flags |= MNT_NOSYMLINKS;
>> > >
>> > >          /* The default atime for remount is preservation */
>> > >          if ((flags & MS_REMOUNT) &&
>> > > @@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const char __user *dir_name,
>> > >                  mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK;
>> > >          }
>> > >
>> > > -        flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN |
>> > > -                   MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
>> > > -                   MS_STRICTATIME | MS_NOREMOTELOCK);
>> > > +        flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_NOSYMLINKS |
>> > > +                   MS_ACTIVE | MS_BORN | MS_NOATIME | MS_NODIRATIME |
>> > > +                   MS_RELATIME | MS_KERNMOUNT | MS_STRICTATIME |
>> > > +                   MS_NOREMOTELOCK);
>> > >
>> > >          if (flags & MS_REMOUNT)
>> > >                  retval = do_remount(&path, flags & ~MS_REMOUNT, mnt_flags,
>> > > diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
>> > > index 3f1190d..a1949d9 100644
>> > > --- a/fs/proc_namespace.c
>> > > +++ b/fs/proc_namespace.c
>> > > @@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt)
>> > >                  { MNT_NOATIME, ",noatime" },
>> > >                  { MNT_NODIRATIME, ",nodiratime" },
>> > >                  { MNT_RELATIME, ",relatime" },
>> > > +                { MNT_NOSYMLINKS, ",nosymlinks" },
>> > >                  { 0, NULL }
>> > >          };
>> > >          const struct proc_fs_info *fs_infop;
>> > > diff --git a/fs/statfs.c b/fs/statfs.c
>> > > index 083dc0a..7ff7c32 100644
>> > > --- a/fs/statfs.c
>> > > +++ b/fs/statfs.c
>> > > @@ -27,6 +27,8 @@ static int flags_by_mnt(int mnt_flags)
>> > >                  flags |= ST_NODIRATIME;
>> > >          if (mnt_flags & MNT_RELATIME)
>> > >                  flags |= ST_RELATIME;
>> > > +        if (mnt_flags & MNT_NOSYMLINKS)
>> > > +                flags |= ST_NOSYMLINKS;
>> > >          return flags;
>> > >  }
>> > >
>> > > diff --git a/include/linux/mount.h b/include/linux/mount.h
>> > > index 1172cce..5e302f4 100644
>> > > --- a/include/linux/mount.h
>> > > +++ b/include/linux/mount.h
>> > > @@ -28,6 +28,7 @@ struct mnt_namespace;
>> > >  #define MNT_NODIRATIME  0x10
>> > >  #define MNT_RELATIME    0x20
>> > >  #define MNT_READONLY    0x40    /* does the user want this to be r/o? */
>> > > +#define MNT_NOSYMLINKS  0x80
>> > >
>> > >  #define MNT_SHRINKABLE  0x100
>> > >  #define MNT_WRITE_HOLD  0x200
>> > > @@ -44,7 +45,7 @@ struct mnt_namespace;
>> > >  #define MNT_SHARED_MASK (MNT_UNBINDABLE)
>> > >  #define MNT_USER_SETTABLE_MASK  (MNT_NOSUID | MNT_NODEV | MNT_NOEXEC \
>> > >                                   | MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME \
>> > > -                                 | MNT_READONLY)
>> > > +                                 | MNT_READONLY | MNT_NOSYMLINKS)
>> > >  #define MNT_ATIME_MASK (MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME )
>> > >
>> > >  #define MNT_INTERNAL_FLAGS (MNT_SHARED | MNT_WRITE_HOLD | MNT_INTERNAL | \
>> > > diff --git a/include/linux/statfs.h b/include/linux/statfs.h
>> > > index 0166d32..994b059 100644
>> > > --- a/include/linux/statfs.h
>> > > +++ b/include/linux/statfs.h
>> > > @@ -39,5 +39,6 @@ struct kstatfs {
>> > >  #define ST_NOATIME      0x0400  /* do not update access times */
>> > >  #define ST_NODIRATIME   0x0800  /* do not update directory access times */
>> > >  #define ST_RELATIME     0x1000  /* update atime relative to mtime/ctime */
>> > > +#define ST_NOSYMLINKS   0x2000  /* do not follow symbolic links */
>> > >
>> > >  #endif
>> > > diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
>> > > index acb2b61..06193d8 100644
>> > > --- a/include/uapi/linux/fs.h
>> > > +++ b/include/uapi/linux/fs.h
>> > > @@ -130,6 +130,7 @@ struct inodes_stat_t {
>> > >  #define MS_I_VERSION    (1<<23) /* Update inode I_version field */
>> > >  #define MS_STRICTATIME  (1<<24) /* Always perform atime updates */
>> > >  #define MS_LAZYTIME     (1<<25) /* Update the on-disk [acm]times lazily */
>> > > +#define MS_NOSYMLINKS   (1<<26) /* Do not follow symbolic links */
>> > >
>> > >  /* These sb flags are internal to the kernel */
>> > >  #define MS_NOREMOTELOCK (1<<27)
>> > > --
>> > > 2.8.0.rc3.226.g39d4020
>> > >
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>