MIME-Version: 1.0
In-Reply-To: <20161014150017.GB19539@ZenIV.linux.org.uk>
References: <1476455305-35554-1-git-send-email-mnissler@chromium.org>
 <20161014145515.GA19539@ZenIV.linux.org.uk> <20161014150017.GB19539@ZenIV.linux.org.uk>
From: Mattias Nissler <mnissler@chromium.org>
Date: Fri, 14 Oct 2016 17:50:56 +0200
Message-ID: <CAKUbbx+60R66QR-5CUbasQ5ucMn_nyDmwaYj8yWzOH36Z3m2aw@mail.gmail.com>
Subject: Re: [RFC] [PATCH] Add a "nolinks" mount option.
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4156
Lines: 80

On Fri, Oct 14, 2016 at 5:00 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Fri, Oct 14, 2016 at 03:55:15PM +0100, Al Viro wrote:
> > > Setting the "nolinks" mount option helps prevent privileged writers
> > > from modifying files unintentionally in case there is an unexpected
> > > link along the accessed path. The "nolinks" option is thus useful as a
> > > defensive measure against persistent exploits (i.e. a system getting
> > > re-exploited after a reboot) for systems that employ a read-only or
> > > dm-verity-protected rootfs. These systems prevent non-legit binaries
> > > from running after reboot. However, legit code typically still reads
> > > from and writes to a writable file system previously under full
> > > control of the attacker, who can place symlinks to trick file writes
> > > after reboot to target a file of their choice. "nolinks" fundamentally
> > > prevents this.
> >
> > Which parts of the tree would be on that "protected" rootfs and which would
> > you mount with that option?  Description above is rather vague and I'm
> > not convinced that it actually buys you anything.  Details, please...

Apologies for the vague description, I'm happy to explain in detail.

In case of Chrome OS, we have all binaries on a dm-verity rootfs, so
an attacker can't modify any binaries. After reboot, everything except
the rootfs is mounted noexec, so there's no way to regain code
execution after reboot by modifying existing binaries or dropping new
ones.

We've seen multiple exploits now where the attacker worked around
these limitations in two steps:

1. Before reboot, the attacker sets up symlinks on the writeable file
system (called "stateful" file system), which are later accessed by
legit boot code (such as init scripts) after reboot. For example, an
init script that copies file A to B can be abused by symlinking or
hardlinking B to a location C of the attacker's choice and placing
the data to be written to C in A. That gives the attacker
a primitive to write data of their choice to a path of their choice
after reboot. Note that this primitive may target locations _outside_
the stateful file system the attacker previously had control of.
Particularly of interest are targets on /sys, but also tmpfs on /run
etc.

2. The second step for a successful attack is finding some legit code
invoked in the boot flow that has a vulnerability exploitable by
feeding it unexpected data. As an example, there are Linux userspace
utilities that read config from /run which may contain shell commands
that the utility executes, through which the attacker can gain code
execution again.
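
To make the write primitive from step 1 concrete, here is a rough
sketch in Python. It is purely illustrative: the directory names, the
file names A/B/C and the helper functions are made up for this example
and do not correspond to any actual Chrome OS init script. It assumes a
Linux system where symlinks can be created in a temporary directory.

```python
import os
import tempfile


def copy_a_to_b(a, b):
    """Naive copy, as a boot script might do it; follows a symlink at b."""
    with open(a, "rb") as src, open(b, "wb") as dst:
        dst.write(src.read())


def demo_symlink_redirect():
    """Return what ends up in C after the 'legit' copy of A to B."""
    with tempfile.TemporaryDirectory() as root:
        # "stateful" stands in for the writable file system; "outside"
        # stands in for a location such as /run (both hypothetical).
        stateful = os.path.join(root, "stateful")
        outside = os.path.join(root, "outside")
        os.mkdir(stateful)
        os.mkdir(outside)

        a = os.path.join(stateful, "A")
        b = os.path.join(stateful, "B")
        c = os.path.join(outside, "C")

        # Before reboot: attacker stores the payload in A and points B at C.
        with open(a, "wb") as f:
            f.write(b"attacker data")
        os.symlink(c, b)

        # After reboot: legit code copies A to B, unaware of the symlink.
        copy_a_to_b(a, b)

        with open(c, "rb") as f:
            return f.read()
```

Calling demo_symlink_redirect() shows the payload landing in C, outside
the directory the copy was meant to stay in.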

The purpose of the proposed patch is to raise the bar for the first
step of the attack: writing arbitrary files after reboot. I'm
intending to mount the stateful file system with the nolinks option
(or otherwise prevent symlink traversal). This will help make sure
that any legit writes taking place during boot in init scripts etc. go
to the files intended by the developer, and can't be redirected by an
attacker.
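
For comparison, here is roughly what a single writer can already do in
userspace today, sketched in Python. This is not what the patch
implements; it is only an approximation for illustration, and the
function names are made up. Note that O_NOFOLLOW only covers the final
path component, so symlinks in intermediate directories are still
followed, and every writer has to remember to opt in. A mount option
would not depend on each caller getting this right.

```python
import errno
import os
import tempfile


def open_nofollow(path):
    """Open path for writing, refusing a symlink as the last component."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_NOFOLLOW
    return os.open(path, flags, 0o600)


def demo_nofollow():
    """Return (blocked, target_created) for a write through a symlink."""
    with tempfile.TemporaryDirectory() as root:
        b = os.path.join(root, "B")
        c = os.path.join(root, "C")
        os.symlink(c, b)  # attacker-planted link, target does not exist yet

        try:
            fd = open_nofollow(b)
        except OSError as e:
            # On Linux, open() with O_NOFOLLOW fails with ELOOP here.
            blocked = e.errno == errno.ELOOP
        else:
            os.close(fd)
            blocked = False

        return blocked, os.path.exists(c)
```

On Linux I'd expect demo_nofollow() to report that the open was blocked
and that C was never created.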

Does this make more sense to you?

>
>
> PS: what the hell do restrictions on _following_ symlinks have to _creating_
> hardlinks?  I'm trying to imagine a threat model where both would apply or
> anything else beyond the word "link" they would have in common...

The restriction is not on _creating_ hard links, but on _opening_
them. What the two have in common is the confusion between the file
you mean to write and the file you actually end up writing to, which
stems from the fact that, as things stand, a file can be reachable
via paths other than its canonical one. For Chrome OS, I'd like to get to
a point where most privileged code can only access a file via its
canonical name (bind mounts are an OK exception as they're not
persistent, so out of reach for manipulation).
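
As an illustration of the hard link side, a userspace approximation of
"only open files that have a single name" could look like the Python
sketch below. The helper names are hypothetical and this is not the
check the patch performs in the kernel; it merely shows the idea of
refusing a file whose link count is greater than one. It assumes a file
system that supports hard links.

```python
import os
import stat
import tempfile


def open_single_name(path):
    """Open an existing regular file for writing only if it has one link."""
    fd = os.open(path, os.O_WRONLY | os.O_NOFOLLOW)
    st = os.fstat(fd)  # inspect the file we actually opened
    if not stat.S_ISREG(st.st_mode) or st.st_nlink != 1:
        os.close(fd)
        raise PermissionError("refusing file with more than one name: " + path)
    return fd


def demo_hardlink_check():
    """Return (opened_before_link, refused_after_link)."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "target")
        alias = os.path.join(root, "alias")
        with open(target, "wb"):
            pass

        fd = open_single_name(target)  # link count is 1, so this succeeds
        os.close(fd)
        opened_before = True

        os.link(target, alias)  # attacker-created second name
        try:
            fd = open_single_name(alias)
        except PermissionError:
            refused_after = True
        else:
            os.close(fd)
            refused_after = False

        return opened_before, refused_after
```

The file is deliberately opened without O_TRUNC so that nothing is
modified before the link count has been checked.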

>
> The one you've described above might have something to do with the first
> one (modulo missing description of the setup you have in mind), but it
> clearly has nothing to do with the second - attackers could've created
> whatever they wanted while the fs had been under their control, after all.
> Doesn't make sense...