From: ebiederm@xmission.com (Eric W. Biederman)
To: Willy Tarreau <w@1wt.eu>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, tglx@linutronix.de,
        gnomes@lxorguk.ukuu.org.uk, torvalds@linux-foundation.org
References: <1515427939-10999-1-git-send-email-w@1wt.eu>
Date: Tue, 09 Jan 2018 09:31:27 -0600
In-Reply-To: <1515427939-10999-1-git-send-email-w@1wt.eu> (Willy Tarreau's
        message of "Mon, 8 Jan 2018 17:12:15 +0100")
Message-ID: <87a7xnkq0g.fsf@xmission.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: [PATCH RFC 0/4] Per-task PTI activation
Sender: linux-kernel-owner@vger.kernel.org

Willy Tarreau <w@1wt.eu> writes:

> Hi!
>
> I could experiment a bit with the possibility to enable/disable PTI per
> task. Please keep in mind that it's not my area of expertise at all, but
> doing so I could recover the initial performance without disabling PTI on
> the whole system.
>
> So what I did in this series consists in the following :
>   - addition of a new per-task TIF_NOPTI flag. Please note that I'm not
>     proud of the way I did it, as 32 flags were already taken. The flags
>     are declared as "long" so there are 32 more flags available on x86_64
>     but C and asm disagree on the type of 1<<32 so I had to declare the
>     hex value by hand... By the way I even suspect that _TIF_FSCHECK is
>     wrong once cast to a long, I think it causes sign extension into the
>     32 upper bits since it's supposed to be signed.
>
>   - addition of a set of arch_prctl() calls (ARCH_GET_NOPTI and
>     ARCH_SET_NOPTI), to check and change the activation of the
>     protection. The change requires CAP_SYS_RAWIO and can be done in
>     a wrapper (that's how I tested)
>
>   - the user PGD was marked with _PAGE_NX to prevent an accidental leak
>     of CR3 from not being detected. I obviously had to disable this since
>     in this case we do want such a user task to run without switching the
>     PGD. I think this could be performed per-task maybe. Another approach
>     might consist in dealing with 3 PGDs and using a different one for
>     unprotected tasks but that really starts to sound overkill.
>
>   - upon return to userspace, I check if the task's flags contain the
>     new TIF_NOPTI or not. If it does contain it, then we don't switch
>     the CR3.
>
>   - upon entry into the kernel from userspace, we can't access the task's
>     flags but we can already check if CR3 points to the kernel or user PGD,
>     and we refrain from switching if it's already the system one.
>
> By doing so I could recover the initial performance of haproxy in a VM,
> going from 12400 connections per second to 21000 once started with this
> trivial wrapper :
>
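>   #include <asm/prctl.h>
>   #include <sys/prctl.h>
>   #include <sys/syscall.h>
>   #include <unistd.h>
>   
>   #ifndef ARCH_SET_NOPTI
>   #define ARCH_SET_NOPTI 0x1022
>   #endif
>   
>   int main(int argc, char **argv)
>   {
>           /* need at least the name of the program to exec */
>           if (argc < 2)
>                   return 1;
>           /* glibc declares no prototype for arch_prctl(); use syscall() */
>           syscall(SYS_arch_prctl, ARCH_SET_NOPTI, 1);
>           argv++;
>           return execvp(argv[0], argv);
>   }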
>
> I have not yet run it on real hardware. Before trying to go a bit further
> I'd like to know if such an approach is acceptable or if I'm doing anything
> stupid and looking in the wrong direction.

Before this goes much further I want to point something out.

When I have kpti protecting me, it is the applications that connect
to the network I worry about.  Until I get to a system with local users
that don't trust each other, I don't have a reason to worry about these
attacks from local applications.

The dangerous scenario is someone exploiting a buffer overflow, or
otherwise getting a network facing application to misbehave, and then
using these new attacks to assist in gaining privilege escalation.


Googling seems to indicate that there is about one issue a year found in
haproxy.  So this is not an unrealistic concern for the case you
mention.


So unless I am seeing things wrong, this is a patchset designed to drop
your defenses on the most vulnerable applications.


Disabling protection on the most vulnerable applications is not behavior
I would encourage.  It seems better than disabling protection system
wide, but only slightly.  I definitely don't think this is something we
want applications disabling themselves.

Certainly this is something that should look at no-new-privs and, if
no-new-privs is set, not allow disabling this protection.
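
As a rough illustration of the no-new-privs rule, here is a userspace
sketch.  The helper names nopti_request_allowed() and
current_no_new_privs() are made up for this example and do not appear in
the patchset; only prctl(PR_GET_NO_NEW_PRIVS) is a real interface.  In
the kernel the equivalent test would presumably be
task_no_new_privs(current), placed next to the CAP_SYS_RAWIO check.

```c
#include <errno.h>
#include <sys/prctl.h>

#ifndef PR_GET_NO_NEW_PRIVS
#define PR_GET_NO_NEW_PRIVS 39
#endif

/*
 * Hypothetical policy helper (not from the patchset): decide whether a
 * request to turn PTI off for a task should be honoured.
 * Returns 0 if allowed, -EPERM otherwise.
 */
static int nopti_request_allowed(int no_new_privs, int has_rawio)
{
        /* A task that has set no-new-privs may not drop the protection. */
        if (no_new_privs)
                return -EPERM;
        /* The series already requires CAP_SYS_RAWIO for the change. */
        if (!has_rawio)
                return -EPERM;
        return 0;
}

/*
 * Userspace view of the flag: returns 1 if no-new-privs is set for the
 * calling thread, 0 if it is not, and -1 on error.
 */
static int current_no_new_privs(void)
{
        return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

With something like this, a network-facing daemon that sets
no-new-privs before it starts parsing untrusted input could not later be
talked into dropping the protection.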

Eric