Subject: Re: FSGSBASE ABI considerations
To: Andy Lutomirski <luto@kernel.org>,
        "Bae, Chang Seok" <chang.seok.bae@intel.com>, X86 ML <x86@kernel.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Borislav Petkov <bpetkov@suse.de>, Brian Gerst <brgerst@gmail.com>,
        Bart Oldeman <bartoldeman@users.sourceforge.net>
References: <CALCETrVSm9tLiHKf-BKaYknEfko1bd_2i+4U7cqELG5iiFDygg@mail.gmail.com>
From: Stas Sergeev <stsp@list.ru>
Message-ID: <73bef0c2-f181-0626-2ac1-e4e0537ca851@list.ru>
Date: Mon, 7 Aug 2017 11:06:40 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.2.1
MIME-Version: 1.0
In-Reply-To: <CALCETrVSm9tLiHKf-BKaYknEfko1bd_2i+4U7cqELG5iiFDygg@mail.gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-MW
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3915
Lines: 86

Hello.

31.07.2017 06:05, Andy Lutomirski wrote:
>   - User code can use the new RD/WR FS/GS BASE instructions.
> Apparently some users really want this for, umm, userspace threading.
> Think Java.
I wonder how Java works around the lack of user-space
continuation support while still getting userspace threading.
(swapcontext() calls into the kernel for sigprocmask().)

> The major disadvantage is that user code can use the new instructions.
> Now userspace is going to do totally stupid shite like writing some
> nonzero value to GS and then doing WRGSBASE or like linking some
> idiotic library that uses WRGSBASE into a perfectly innocent program
> like dosemu2 and resulting in utterly nonsensical descriptor state.
I don't think this can be a problem, at least not
for dosemu1/2. dosemu2 does the full context switch via
a sighandler; dosemu1 uses iret after manually changing
all registers before jumping to compatibility mode. I don't
think any state changes done in long mode can affect the
state after the jump to compatibility mode.

> ----- interaction with modify_ldt() -----
>
> The first sticking point we'll hit is modify_ldt() and, in particular,
> what happens if you call modify_ldt() to change the base of a segment
> that is loaded into gs by another thread in the same mm.
>
> Our current behavior here is nonsensical: on 32-bit kernels, FS would
> be fully refreshed on other threads and GS might be depending on
> compiler options.  On 64-bit kernels, neither FS nor GS is immediately
> refreshed.  Historically, we didn't refresh anything reliably.  On the
> bright side, this means that existing modify_ldt() users are (AFAIK)
> tolerant of somewhat crazy behavior.
>
> On an FSGSBASE-enabled system, I think we need to provide
> deterministic, documented, tested behavior.  I can think of three
> plausible choices:
>
> 1a. modify_ldt() immediately updates FSBASE and GSBASE on all threads
> that reference the modified selector.
>
> 1b. modify_ldt() immediately updates FSBASE and GSBASE on all threads
> that reference the LDT.
Does 1b mean that any call to modify_ldt(), even a
read call, will reset all bases to the ones in the LDT? I think
this is a half-step. It clearly shows that you don't want
such a state to ever exist, but then why not go a step further
and have the bases reset not only by any
unrelated modify_ldt() call, but always on schedule?
You can state that using wrgsbase with a non-zero selector
is invalid, reset the base to the LDT state and maybe send a signal
to the program so that it knows it did something wrong.
This may sound too harsh, but I really don't see how it
differs from resetting all LDT bases on some unrelated
modify_ldt() that was done for read, not write.
Or you may want to reset the selector to 0 rather than
the base to the LDT state.
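
To make the idea concrete, here is a rough sketch of the check I have
in mind. It is only a user-space model, not kernel code: struct
seg_state, fixup_on_schedule() and the desc_base argument are all
hypothetical names invented for illustration, and the real thing would
have to look the base up in the LDT/GDT itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one thread's GS state; not a kernel structure. */
struct seg_state {
    uint16_t selector;  /* 0 means no descriptor, base was set by hand */
    uint64_t base;      /* current GSBASE as seen at schedule time */
};

/*
 * Hypothetical fixup run at schedule time.
 * desc_base is the base recorded in the descriptor that s->selector
 * refers to (assumed to be looked up by the caller).
 * Returns true if the state was inconsistent and had to be reset,
 * in which case the caller would also send a signal to the program.
 */
static bool fixup_on_schedule(struct seg_state *s, uint64_t desc_base)
{
    if (s->selector == 0)
        return false;       /* wrgsbase with a zero selector is fine */
    if (s->base == desc_base)
        return false;       /* base still matches the descriptor */
    s->base = desc_base;    /* reset the base to the LDT state */
    return true;
}
```

The alternative mentioned above (reset the selector to 0 and keep the
base) would replace the assignment with s->selector = 0.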

> 2. modify_ldt() leaves FSBASE and GSBASE alone on all threads.
>
> (2) is trivial to implement, whereas (1a) and (1b) are a bit nasty to
> implement when FSGSBASE is on.
>
> The tricky bit is that 32-bit kernels can't do (2), so, if we want
But do we have fsgsbase on 32-bit kernels at all?
I think it works only in long mode, no?
I really tried to google for an extensive description
of this feature, but failed.
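
As far as I can tell, the conditions are: the CPU reports the feature
in CPUID leaf 7, subleaf 0, EBX bit 0; the OS has set CR4.FSGSBASE
(bit 16); and the code runs in 64-bit mode. A sketch of that check
follows, with fsgsbase_usable() and its arguments being hypothetical
(the caller is assumed to obtain the CPUID and CR4 values somehow):

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EBX bit 0 reports FSGSBASE support. */
#define CPUID7_EBX_FSGSBASE (1u << 0)
/* CR4 bit 16 has to be set by the OS to enable the instructions. */
#define CR4_FSGSBASE        (1u << 16)

/*
 * Hypothetical helper: true only if RD/WR FS/GS BASE can be executed.
 * All three conditions must hold; outside 64-bit mode the
 * instructions are not available regardless of the other two.
 */
static bool fsgsbase_usable(uint32_t cpuid7_ebx, uint64_t cr4,
                            bool in_64bit_mode)
{
    return (cpuid7_ebx & CPUID7_EBX_FSGSBASE) != 0 &&
           (cr4 & CR4_FSGSBASE) != 0 &&
           in_64bit_mode;
}
```

If that is right, a 32-bit kernel can never get a true result here.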

> modify_ldt() to behave the same on 32-bit and 64-bit kernels, we're
> stuck with (1).
If you mean 1a, then to me it looks like a lot of effort
for something no one will ever need.

> Thoughts?
I am far from kernel development, so my thoughts
may be naive, but IMHO you should just disallow this
by some means (like doing a fixup on schedule() and
sending a signal). No one will suffer; people will just
write 0 to the segreg first. Note that such a problem can
be provoked by the fact that the sighandler does not
reset the segregs to their default values, and someone
may simply forget to reset one to 0. You need to remind
them to do so rather than invent tricky code to
do something theoretically correct.