MIME-Version: 1.0
In-Reply-To: <1479748213.2309.37.camel@HansenPartnership.com>
References: <1476880312-64786-1-git-send-email-mnissler@chromium.org>
 <CAKUbbx+EvLFEOkAWynshmUquK_n=LS9aRDEAD=OUK9AbR3c1LA@mail.gmail.com>
 <CAKUbbxKeHNcQXhzBpoRiDUun-fqjTFicTzsLHLkoryKD11d74A@mail.gmail.com> <1479748213.2309.37.camel@HansenPartnership.com>
From: Mattias Nissler <mnissler@chromium.org>
Date: Tue, 22 Nov 2016 15:13:22 +0100
Message-ID: <CAKUbbxKcraUKiEUQ-E2UC7R-gORjU9G+NThW2Ha_H+i1GS_Y+g@mail.gmail.com>
Subject: Re: [PATCH v2] Add a "nosymlinks" mount option.
To: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: linux-fsdevel@vger.kernel.org,
        Alexander Viro <viro@zeniv.linux.org.uk>, linux-kernel@vger.kernel.org,
        Colin Walters <walters@verbum.org>,
        "Austin S . Hemmelgarn" <ahferroin7@gmail.com>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 12855
Lines: 291

On Mon, Nov 21, 2016 at 6:10 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Wed, 2016-11-16 at 13:18 -0800, Mattias Nissler wrote:
>> I understand that silence suggests there's little interest, but
>> here's some new information I discovered today that may justify to
>> reconsider the patch:
>>
>> The BSDs already have exactly what I propose, the mount option is
>> called "nosymfollow" there:
>> https://github.com/freebsd/freebsd/blob/a41f4cc9a57cd74604ae7b051eec2
>> f48865f18d6/sys/kern/vfs_lookup.c#L939
>>
>> There's also some evidence on the net that people have been using
>> said nosymfollow mount option to mitigate symlink attacks.
>
> I've got to say that just because BSD does this doesn't make it a good
> idea.  The problem with disabling symlinks is that they're a core part
> of unix filesystem semantics and simply disabling them breaks various
> setups in various ways.

Agreed. The proposed mount option is not going to mix well with cases
such as the shared object symlinks commonly found in /usr/lib/.

> The breakage depends on the user, so if you
> come from a Windows background, where symlinks basically don't exist,
> you're likely fine, but if you come from a UNIX background, chances are
> you use them pretty extensively.  Obviously, I'm a UNIX background
> person and I can think of all the nasty ways you'll break my setup ...

Note that I'm not saying you should be applying this mount option
broadly, just that it is useful for certain mounts where there are no
reasons for symlinks to be present. Case in point: we already have
noexec, but obviously people don't mount their root fs noexec as that
would prevent the system from booting ;-)

>
> If a root privileged programme is trying to write to a file, it's not
> unreasonable to expect the programme itself to take simple precautions
> (like setuid to the user it's trying to write), so I'd really rather
> declare binaries that don't do this as broken and fix them.  Thinking
> you've plugged a hole like this simply by disabling symlinks is a false
> sense of security because there are a variety of other ways I could
> trick the programme into writing where it shouldn't.  Disabling
> symlinks is fixing a symptom, it's really the cause that should be
> fixed.

Your points are generally valid. Yes, I'm addressing the symptom of
incautious access to untrusted file systems. I actually agree that a
better solution would be for privileged code to be aware of the risks
of file system access and to apply caution where necessary. I also
agree that symlinks are not the only issue with untrusted file
systems, so applying this mount option is not a panacea. I should note
that I'm actively working on hardening some of the more exposed
contexts in Chrome OS beyond the symlink traversal issue.

However, I still believe that blocking symlink traversal in the kernel
is a useful thing to do in our use case. Rationale:

1. On a typical system, there's a ton of code executing as root or
with elevated privileges. It's not practical to audit all that code
for symlink traversal issues. Traditional systems just assume that the
root file system is trusted and that code running as root only deals
with root-owned files, which alleviates the problem to a point where
it becomes manageable. However, on Chrome OS we have the ambition to
not blindly trust the writable file system, so we have to worry about
*any* file access in privileged contexts, not just the cases where
privileged processes access user-owned file system locations (allow me
to add that history has shown that this alone has been a constant
source of bugs to the point that it is now generally considered an
anti-pattern).

2. It is sometimes hard to make code sufficiently cautious, for
example in init scripts. As long as you're using shell, you want to
use features such as shell redirection. To cover these, you'd ideally
patch the shell to take care upon opening files for redirection, but
given the prevalence of file access and the myriad of tools invoked by
shell this is bound to be a losing battle. Another option is to add
shell code along the lines of this: test "$(readlink -f "$path")" =
"$path" (note that O_NOFOLLOW is not enough as that'll only affect the
last path component). That's tedious and error-prone to maintain, and
also comes with a TOCTOU issue, since the path can change between the
check and the actual access.

3. Assuming for a moment that we can fix all existing issues, it's
still not realistic to keep the code clean moving forward. IMHO, the
root of the problem is that developers just don't have symlinks on
their radar when writing file access code. "I'll just write this data
to this file, and I'll hardcode the path, so what can go wrong?" is
probably a realistic level of caution you can expect. The fact that
the presence of symlinks can redirect the operation to an entirely
different file isn't obvious at all when writing the code.
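
As a concrete illustration of that last point, consider the following
sketch of typical "hardcoded path" code. The function name
write_state() is made up for this example; nothing here is taken from
real Chrome OS code.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical example of innocent-looking file access code: write a
 * string to a path the author considers fixed.
 * Returns 0 on success, -1 on error.
 */
int write_state(const char *path, const char *data)
{
	size_t len = strlen(data);
	ssize_t n;
	int fd;

	/*
	 * Nothing here mentions symlinks, yet open() follows them: if
	 * 'path' (or any directory leading to it) is a symlink, the
	 * data is written to whatever file the link points at.
	 */
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	n = write(fd, data, len);
	close(fd);
	return (n == (ssize_t)len) ? 0 : -1;
}
```

If an attacker who controls the file system replaces the expected file
with a symlink to some other file, a privileged caller of this function
will truncate and overwrite the link target. Adding O_NOFOLLOW would
catch the case where the final component is the link, but not a symlink
in one of the parent directories.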

4. On a philosophical note: There are many precedents where fixing
symptoms is standard practice because attempts to fix the root cause
have proven insufficient. One example is memory bugs in browsers. The
industry has widely adopted mitigations that fix symptoms, e.g.
running browser code in sandboxed environments. Furthermore, the
defense-in-depth principle says that you should be prepared for your
fix to the root cause to fail. In that context, it makes sense to
disable symlink traversal for cases where we don't need it to have
another line of defense.

This became longer than intended, but hopefully it helps clarify my
perspective a bit.

- Mattias

>
> James
>
>> On Mon, Oct 24, 2016 at 7:09 AM, Mattias Nissler <
>> mnissler@chromium.org> wrote:
>> > Friendly ping - does this version of the patch have any chance on
>> > getting included in mainline?
>> >
>> > On Wed, Oct 19, 2016 at 2:31 PM, Mattias Nissler <
>> > mnissler@chromium.org> wrote:
>> > > For mounts that have the new "nosymlinks" option, don't follow
>> > > symlinks when resolving paths. The new option is similar in
>> > > spirit to
>> > > the existing "nodev", "noexec", and "nosuid" options.
>> > >
>> > > Note that symlinks may still be created on mounts where the
>> > > "nosymlinks" option is present. readlink() remains functional, so
>> > > user
>> > > space code that is aware of symlinks can still choose to follow
>> > > them
>> > > explicitly.
>> > >
>> > > Setting the "nosymlinks" mount option helps prevent privileged
>> > > writers
>> > > from modifying files unintentionally in case there is an
>> > > unexpected
>> > > link along the accessed path. The "nosymlinks" option is thus
>> > > useful
>> > > as a defensive measure for systems that need to deal with
>> > > untrusted
>> > > file systems in privileged contexts.
>> > >
>> > > Signed-off-by: Mattias Nissler <mnissler@chromium.org>
>> > > ---
>> > >  fs/namei.c              | 3 +++
>> > >  fs/namespace.c          | 9 ++++++---
>> > >  fs/proc_namespace.c     | 1 +
>> > >  fs/statfs.c             | 2 ++
>> > >  include/linux/mount.h   | 3 ++-
>> > >  include/linux/statfs.h  | 1 +
>> > >  include/uapi/linux/fs.h | 1 +
>> > >  7 files changed, 16 insertions(+), 4 deletions(-)
>> > >
>> > > diff --git a/fs/namei.c b/fs/namei.c
>> > > index 5b4eed2..4cddcf3 100644
>> > > --- a/fs/namei.c
>> > > +++ b/fs/namei.c
>> > > @@ -1021,6 +1021,9 @@ const char *get_link(struct nameidata *nd)
>> > >                 touch_atime(&last->link);
>> > >         }
>> > >
>> > > +       if (nd->path.mnt->mnt_flags & MNT_NOSYMLINKS)
>> > > +               return ERR_PTR(-EACCES);
>> > > +
>> > >         error = security_inode_follow_link(dentry, inode,
>> > >                                            nd->flags &
>> > > LOOKUP_RCU);
>> > >         if (unlikely(error))
>> > > diff --git a/fs/namespace.c b/fs/namespace.c
>> > > index e6c234b..deec84e 100644
>> > > --- a/fs/namespace.c
>> > > +++ b/fs/namespace.c
>> > > @@ -2732,6 +2732,8 @@ long do_mount(const char *dev_name, const
>> > > char __user *dir_name,
>> > >                 mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME);
>> > >         if (flags & MS_RDONLY)
>> > >                 mnt_flags |= MNT_READONLY;
>> > > +       if (flags & MS_NOSYMLINKS)
>> > > +               mnt_flags |= MNT_NOSYMLINKS;
>> > >
>> > >         /* The default atime for remount is preservation */
>> > >         if ((flags & MS_REMOUNT) &&
>> > > @@ -2741,9 +2743,10 @@ long do_mount(const char *dev_name, const
>> > > char __user *dir_name,
>> > >                 mnt_flags |= path.mnt->mnt_flags &
>> > > MNT_ATIME_MASK;
>> > >         }
>> > >
>> > > -       flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
>> > > MS_BORN |
>> > > -                  MS_NOATIME | MS_NODIRATIME | MS_RELATIME|
>> > > MS_KERNMOUNT |
>> > > -                  MS_STRICTATIME | MS_NOREMOTELOCK);
>> > > +       flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV |
>> > > MS_NOSYMLINKS |
>> > > +                  MS_ACTIVE | MS_BORN | MS_NOATIME |
>> > > MS_NODIRATIME |
>> > > +                  MS_RELATIME | MS_KERNMOUNT | MS_STRICTATIME |
>> > > +                  MS_NOREMOTELOCK);
>> > >
>> > >         if (flags & MS_REMOUNT)
>> > >                 retval = do_remount(&path, flags & ~MS_REMOUNT,
>> > > mnt_flags,
>> > > diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
>> > > index 3f1190d..a1949d9 100644
>> > > --- a/fs/proc_namespace.c
>> > > +++ b/fs/proc_namespace.c
>> > > @@ -67,6 +67,7 @@ static void show_mnt_opts(struct seq_file *m,
>> > > struct vfsmount *mnt)
>> > >                 { MNT_NOATIME, ",noatime" },
>> > >                 { MNT_NODIRATIME, ",nodiratime" },
>> > >                 { MNT_RELATIME, ",relatime" },
>> > > +               { MNT_NOSYMLINKS, ",nosymlinks" },
>> > >                 { 0, NULL }
>> > >         };
>> > >         const struct proc_fs_info *fs_infop;
>> > > diff --git a/fs/statfs.c b/fs/statfs.c
>> > > index 083dc0a..7ff7c32 100644
>> > > --- a/fs/statfs.c
>> > > +++ b/fs/statfs.c
>> > > @@ -27,6 +27,8 @@ static int flags_by_mnt(int mnt_flags)
>> > >                 flags |= ST_NODIRATIME;
>> > >         if (mnt_flags & MNT_RELATIME)
>> > >                 flags |= ST_RELATIME;
>> > > +       if (mnt_flags & MNT_NOSYMLINKS)
>> > > +               flags |= ST_NOSYMLINKS;
>> > >         return flags;
>> > >  }
>> > >
>> > > diff --git a/include/linux/mount.h b/include/linux/mount.h
>> > > index 1172cce..5e302f4 100644
>> > > --- a/include/linux/mount.h
>> > > +++ b/include/linux/mount.h
>> > > @@ -28,6 +28,7 @@ struct mnt_namespace;
>> > >  #define MNT_NODIRATIME 0x10
>> > >  #define MNT_RELATIME   0x20
>> > >  #define MNT_READONLY   0x40    /* does the user want this to be
>> > > r/o? */
>> > > +#define MNT_NOSYMLINKS 0x80
>> > >
>> > >  #define MNT_SHRINKABLE 0x100
>> > >  #define MNT_WRITE_HOLD 0x200
>> > > @@ -44,7 +45,7 @@ struct mnt_namespace;
>> > >  #define MNT_SHARED_MASK        (MNT_UNBINDABLE)
>> > >  #define MNT_USER_SETTABLE_MASK  (MNT_NOSUID | MNT_NODEV |
>> > > MNT_NOEXEC \
>> > >                                  | MNT_NOATIME | MNT_NODIRATIME |
>> > > MNT_RELATIME \
>> > > -                                | MNT_READONLY)
>> > > +                                | MNT_READONLY | MNT_NOSYMLINKS)
>> > >  #define MNT_ATIME_MASK (MNT_NOATIME | MNT_NODIRATIME |
>> > > MNT_RELATIME )
>> > >
>> > >  #define MNT_INTERNAL_FLAGS (MNT_SHARED | MNT_WRITE_HOLD |
>> > > MNT_INTERNAL | \
>> > > diff --git a/include/linux/statfs.h b/include/linux/statfs.h
>> > > index 0166d32..994b059 100644
>> > > --- a/include/linux/statfs.h
>> > > +++ b/include/linux/statfs.h
>> > > @@ -39,5 +39,6 @@ struct kstatfs {
>> > >  #define ST_NOATIME     0x0400  /* do not update access times */
>> > >  #define ST_NODIRATIME  0x0800  /* do not update directory access
>> > > times */
>> > >  #define ST_RELATIME    0x1000  /* update atime relative to
>> > > mtime/ctime */
>> > > +#define ST_NOSYMLINKS  0x2000  /* do not follow symbolic links
>> > > */
>> > >
>> > >  #endif
>> > > diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
>> > > index acb2b61..06193d8 100644
>> > > --- a/include/uapi/linux/fs.h
>> > > +++ b/include/uapi/linux/fs.h
>> > > @@ -130,6 +130,7 @@ struct inodes_stat_t {
>> > >  #define MS_I_VERSION   (1<<23) /* Update inode I_version field
>> > > */
>> > >  #define MS_STRICTATIME (1<<24) /* Always perform atime updates
>> > > */
>> > >  #define MS_LAZYTIME    (1<<25) /* Update the on-disk [acm]times
>> > > lazily */
>> > > +#define MS_NOSYMLINKS  (1<<26) /* Do not follow symbolic links
>> > > */
>> > >
>> > >  /* These sb flags are internal to the kernel */
>> > >  #define MS_NOREMOTELOCK        (1<<27)
>> > > --
>> > > 2.8.0.rc3.226.g39d4020
>> > >