Subject: Re: FSGSBASE ABI considerations
From: Stas Sergeev
To: Andy Lutomirski, "Bae, Chang Seok", X86 ML, "linux-kernel@vger.kernel.org", Linus Torvalds, Borislav Petkov, Brian Gerst, Bart Oldeman
Date: Mon, 7 Aug 2017 11:06:40 +0300
Message-ID: <73bef0c2-f181-0626-2ac1-e4e0537ca851@list.ru>

Hello.

On 31.07.2017 06:05, Andy Lutomirski wrote:
> - User code can use the new RD/WR FS/GS BASE instructions.
> Apparently some users really want this for, umm, userspace threading.
> Think Java.

I wonder how Java gets userspace threading despite the lack of
user-space continuation support. (swapcontext() calls into the
kernel for sigprocmask().)

> The major disadvantage is that user code can use the new instructions.
> Now userspace is going to do totally stupid shite like writing some
> nonzero value to GS and then doing WRGSBASE or like linking some
> idiotic library that uses WRGSBASE into a perfectly innocent program
> like dosemu2 and resulting in utterly nonsensical descriptor state.

I don't think this is a problem, at least not for dosemu1/2.
dosemu2 does the full context switch via a sighandler; dosemu1 uses
iret, manually changing all registers before jumping to compatibility
mode. I don't think any state change made in long mode can affect
the state after the jump to compatibility mode.

> ----- interaction with modify_ldt() -----
>
> The first sticking point we'll hit is modify_ldt() and, in particular,
> what happens if you call modify_ldt() to change the base of a segment
> that is loaded into gs by another thread in the same mm.
>
> Our current behavior here is nonsensical: on 32-bit kernels, FS would
> be fully refreshed on other threads and GS might be depending on
> compiler options. On 64-bit kernels, neither FS nor GS is immediately
> refreshed. Historically, we didn't refresh anything reliably. On the
> bright side, this means that existing modify_ldt() users are (AFAIK)
> tolerant of somewhat crazy behavior.
>
> On an FSGSBASE-enabled system, I think we need to provide
> deterministic, documented, tested behavior. I can think of three
> plausible choices:
>
> 1a. modify_ldt() immediately updates FSBASE and GSBASE on all threads
> that reference the modified selector.
>
> 1b. modify_ldt() immediately updates FSBASE and GSBASE on all threads
> that reference the LDT.

Does 1b mean that any call to modify_ldt(), even a read call, will
reset all bases to the ones in the LDT? I think this is a half-step.
It clearly shows that you don't want such state to ever exist, but
why not go a step further and have the bases reset not only by
any unrelated modify_ldt() call, but always on schedule?
You can state that using wrgsbase with a non-zero selector is invalid,
reset the base to the LDT state, and maybe send a signal to the program
so that it knows it did something wrong. This may sound too harsh, but
I really don't see how it differs from resetting all LDT bases on some
unrelated modify_ldt() call that was done for read, not write.
Or you may want to reset the selector to 0 rather than the base to
the LDT value.

> 2. modify_ldt() leaves FSBASE and GSBASE alone on all threads.
>
> (2) is trivial to implement, whereas (1a) and (1b) are a bit nasty to
> implement when FSGSBASE is on.
>
> The tricky bit is that 32-bit kernels can't do (2), so, if we want

But do we have fsgsbase on 32-bit kernels at all?
I think it works only in long mode, no?
I really tried to google an extensive description of this
feature, but failed.

> modify_ldt() to behave the same on 32-bit and 64-bit kernels, we're
> stuck with (1).

If you mean 1a, then to me it looks like a lot of effort
for something no one will ever need.

> Thoughts?

I am far from kernel development, so my thoughts may be naive,
but IMHO you should just disallow this by some means (such as
doing a fixup on schedule() and sending a signal). No one will
suffer; people will just write 0 to the segreg first.
Note that such a problem can be provoked by the fact that
the sighandler does not reset the segregs to their default
values, and someone may simply forget to reset one to 0.
You need to remind them to do so rather than invent
tricky code to do something theoretically correct.
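
To make the difference between the options concrete, here is a toy
Python model of them. It is only a sketch, not kernel code: Thread,
Process, modify_ldt_write(), fixup_on_schedule() and demo() are
hypothetical names, the selector and base values are made up, and the
fixup function is just one possible reading of the "fixup on schedule
and send a signal" idea.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    gs_selector: int = 0     # 0 models "no LDT segment loaded in GS"
    gs_base: int = 0         # the base the CPU would actually use
    signalled: bool = False  # set when the fixup policy complains


@dataclass
class Process:
    ldt: dict = field(default_factory=dict)      # selector -> base
    threads: list = field(default_factory=list)


def modify_ldt_write(proc, selector, new_base, policy):
    """Toy model of a modify_ldt() write under policy "1a", "1b" or "2"."""
    proc.ldt[selector] = new_base
    for t in proc.threads:
        if policy == "1a" and t.gs_selector == selector:
            # 1a: refresh only threads using the modified selector.
            t.gs_base = new_base
        elif policy == "1b" and t.gs_selector in proc.ldt:
            # 1b: refresh every thread whose GS references the LDT.
            t.gs_base = proc.ldt[t.gs_selector]
        # "2": leave every thread's base alone.


def fixup_on_schedule(proc, thread):
    """Assumed reading of the proposal: on schedule, a nonzero selector
    whose base disagrees with the LDT is reset and the thread is flagged."""
    sel = thread.gs_selector
    if sel != 0 and sel in proc.ldt and thread.gs_base != proc.ldt[sel]:
        thread.gs_base = proc.ldt[sel]
        thread.signalled = True


def demo(policy):
    proc = Process(ldt={0x07: 0x1000, 0x0F: 0x2000})
    a = Thread(gs_selector=0x07, gs_base=0x1000)
    # b models a thread that did WRGSBASE with a nonzero selector loaded.
    b = Thread(gs_selector=0x0F, gs_base=0xDEAD000)
    proc.threads = [a, b]
    modify_ldt_write(proc, 0x07, 0x3000, policy)
    return a.gs_base, b.gs_base
```

In this model, demo("1a") changes only thread a, demo("1b") also
resets thread b even though its selector was not the one modified,
and demo("2") changes neither. fixup_on_schedule() does to thread b
what 1b does, just at a predictable point, and it leaves a thread
with selector 0 untouched.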