From: ebiederm@xmission.com (Eric W. Biederman)
To: Willy Tarreau <w@1wt.eu>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, tglx@linutronix.de,
        gnomes@lxorguk.ukuu.org.uk, torvalds@linux-foundation.org
References: <1515427939-10999-1-git-send-email-w@1wt.eu>
        <87a7xnkq0g.fsf@xmission.com> <20180109160215.GA13065@1wt.eu>
Date: Tue, 09 Jan 2018 15:07:07 -0600
In-Reply-To: <20180109160215.GA13065@1wt.eu> (Willy Tarreau's message of "Tue,
        9 Jan 2018 17:02:15 +0100")
Message-ID: <87d12iivwk.fsf@xmission.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: [PATCH RFC 0/4] Per-task PTI activation
Sender: linux-kernel-owner@vger.kernel.org

Willy Tarreau <w@1wt.eu> writes:

> Hi Eric,
>
> On Tue, Jan 09, 2018 at 09:31:27AM -0600, Eric W. Biederman wrote:
>> The dangerous scenario is someone exploiting a buffer overflow, or
>> otherwise getting a network facing application to misbehave, and then
>> using these new attacks to assist in gaining privilege escalation.
>
> For most use cases sure. But for *some* use cases, if they can take
> control of the application, you've already lost everything you had.
> Private keys, clear text traffic, etc. We're precisely talking about
> such applications where the userspace is as important as the kernel,
> and where there's hardly anything left to lose once the application is
> cracked. However, a significant performance drop on the application
> definitely is a problem, first making it weaker when facing attacks, or
> even failing to deal with traffic peaks.

From reading the earlier emails it was not clear that all was lost if
they were compromised.  In that case this makes plenty of sense.


>> Googling seems to indicate that there is about one issue a year found in
>> haproxy.  So this is not an unrealistic concern for the case you
>> mention.
>
> I agree. But in practice, we had two exploitable bugs, one in 2002 
> (overflow in the logs), and one in 2014 requiring a purposely written
> config which makes no practical sense at all. Most other vulnerabilities
> involve freezes, occasionally crashes, though that's even more rare.
> And even with the two above, you just have one chance to try to exploit
> it, if you get your pointer wrong, it dies and you have to wait for the
> admin to restart it. In practice, seeing the process die is the worst
> nightmare of admins as the service simply stops. I'm not saying we don't
> want to defend them, we even chroot to an empty directory and drop
> privileges to mitigate such a risk. But when the intruder is in the
> process it's really too late.
>
>> So unless I am seeing things wrong this is a patchset designed to drop
>> your defenses on the most vulnerable applications.
>
> In fact it can be seen very differently. By making it possible for exposed
> but critical applications to share some risks with the rest of the system,
> we also ensure they remain strong for their initial purpose and against
> the most common types of attacks. And quite frankly we're not weakening
> much given the risks already involved by the process itself.
>
> What I'm describing represents a small category of processes in only
> certain environments. Some database servers will have the same issue.
> Imagine a Redis server for example, which normally is very fast and
> easily saturates whatever network around it. Some DNS providers may
> have the same problem when dealing with hundreds of thousands to
> millions of UDP packets each second (not counting attacks).
>
> All such services are critical in themselves, but the fact that we accept
> to let them share the risks with the system doesn't mean they should be
> running without the protections from the occasional operations guy just
> allowed to connect there to verify if logs are full and to retrieve
> stats.

Reasonable.

>> Disabling protection on the most vulnerable applications is not behavior
>> I would encourage.
>
> I'm not encouraging this behaviour either but right now the only option
> for performance critical applications (even if they are vulnerable) is
> to make the whole system vulnerable.
>
>> It seems better than disabling protection system
>> wide but only slightly.   I definitely don't think this is something we
>> want applications disabling themselves.
>
> In fact that's what I liked with the wrapper approach, except that it
> had the downside of being harder to manage in terms of administration
> and we'd risk seeing it used everywhere by default. The arch_prctl()
> approach ensures that only applications where this is relevant can do
> it. In the case of haproxy, I can trivially add a config option like
> "disable-page-isolation" to let the admin enable it on purpose.

How is that different from the option?

> But I suspect there might be some performance critical applications that
> cannot be patched, and that's where the wrapper could still provide some
> value.

I just don't want to encourage changing this option by default, as a
lot of applications get installed on home servers or other places where
they are not performance critical.  At that point disabling the kpti
protection by default would reduce the level of protection of
everything.

But ultimately I only brought this up so that people are thinking about
the other side of this: not how it will affect the single-function
high-performance servers, but how it will affect the millions of little
servers that do many things all from a single machine.

Certainly I would not want this enabled in a container or a virtual
private server.  The capable(CAP_SYS_RAWIO) check seems to handle that
beautifully.

>> Certainly this is something that should look at no-new-privs and if
>> no-new-privs is set not allow disabling this protection.
>
> I don't know what is "no-new-privs" and couldn't find info on it
> unfortunately. Do you have a link please ?

Probably because I used dashes.  The no new privs flag is documented
in:
Documentation/userspace-api/no_new_privs.rst

It is a sandboxing flag that guarantees a process cannot gain
privileges after it has been set.  You can search for PFA_NO_NEW_PRIVS
in sched.h if you want to see where it is defined.
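
For reference, a minimal userspace sketch of how a process sets the flag
and reads it back.  The two helper names are made up for illustration;
only the prctl() options PR_SET_NO_NEW_PRIVS and PR_GET_NO_NEW_PRIVS are
the real interface.  This shows nothing about how the PTI patches would
consult the flag, it only demonstrates the flag itself.

```c
#include <sys/prctl.h>

/*
 * Hypothetical helper: returns 1 if no_new_privs is set for the calling
 * thread, 0 if it is not, and -1 on error (errno set by prctl()).
 */
int get_no_new_privs(void)
{
	return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}

/*
 * Hypothetical helper: sets no_new_privs for the calling thread.
 * Returns 0 on success, -1 on error.  Once set, the flag cannot be
 * cleared again and is inherited across fork() and execve().
 */
int set_no_new_privs(void)
{
	return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}
```

A sandboxed program would typically call set_no_new_privs() once early
on, before dropping into its main loop; after that
get_no_new_privs() returns 1 for that thread and everything it
forks or execs.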

Eric