From: Andy Lutomirski
Date: Tue, 6 Dec 2016 09:49:42 -0800
Subject: Re: [PATCH v3 4/4] x86/asm: Rewrite sync_core() to use IRET-to-self
To: Borislav Petkov
Cc: Andy Lutomirski, X86 ML, One Thousand Gnomes, linux-kernel@vger.kernel.org,
    Brian Gerst, Matthew Whitehead, Henrique de Moraes Holschuh, Peter Zijlstra,
    xen-devel, Juergen Gross, Boris Ostrovsky, Andrew Cooper, "H. Peter Anvin"

On Mon, Dec 5, 2016 at 11:52 PM, Borislav Petkov wrote:
> On Mon, Dec 05, 2016 at 01:32:43PM -0800, Andy Lutomirski wrote:
>> Aside from being excessively slow, CPUID is problematic: Linux runs
>> on a handful of CPUs that don't have CPUID. Use IRET-to-self
>> instead. IRET-to-self works everywhere, so it makes testing easy.
>>
>> For reference, on my laptop, IRET-to-self is ~110ns,
>> CPUID(eax=1, ecx=0) is ~83ns on native and very, very slow under KVM,
>> and MOV-to-CR2 is ~42ns.
>>
>> While we're at it: sync_core() serves a very specific purpose.
>> Document it.
>>
>> Cc: "H. Peter Anvin"
>> Signed-off-by: Andy Lutomirski
>> ---
>>  arch/x86/include/asm/processor.h | 77 ++++++++++++++++++++++++++++------------
>>  1 file changed, 55 insertions(+), 22 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
>> index 64fbc937d586..201a956e345f 100644
>> --- a/arch/x86/include/asm/processor.h
>> +++ b/arch/x86/include/asm/processor.h
>> @@ -590,33 +590,66 @@ static __always_inline void cpu_relax(void)
>>
>>  #define cpu_relax_lowlatency() cpu_relax()
>>
>> -/* Stop speculative execution and prefetching of modified code. */
>> +/*
>> + * This function forces the icache and prefetched instruction stream to
>> + * catch up with reality in two very specific cases:
>> + *
>> + * a) Text was modified using one virtual address and is about to be
>> + *    executed from the same physical page at a different virtual address.
>> + *
>> + * b) Text was modified on a different CPU, may subsequently be
>> + *    executed on this CPU, and you want to make sure the new version
>> + *    gets executed.  This generally means you're calling this in an IPI.
>> + *
>> + * If you're calling this for a different reason, you're probably doing
>> + * it wrong.
>
> "... and think hard before you call this - it is slow."
>
> I'd add that, now that it is even slower than CPUID.

But only barely.  And it's hugely faster than CPUID under KVM or
similar.  And it works on all CPUs.

>
>> + */
>>  static inline void sync_core(void)
>>  {
>> -	int tmp;
>> -
>> -#ifdef CONFIG_X86_32
>>  	/*
>> -	 * Do a CPUID if available, otherwise do a jump.  The jump
>> -	 * can conveniently enough be the jump around CPUID.
>> +	 * There are quite a few ways to do this.  IRET-to-self is nice
>> +	 * because it works on every CPU, at any CPL (so it's compatible
>> +	 * with paravirtualization), and it never exits to a hypervisor.
>> +	 * The only downsides are that it's a bit slow (it seems to be
>> +	 * a bit more than 2x slower than the fastest options) and that
>> +	 * it unmasks NMIs.
>
> Ewww, I hadn't thought of that angle. Aren't we going to get into all
> kinds of hard-to-debug issues due to that couple-of-cycles window of
> unmasked NMIs?
>
> We sync_core() in some NMI handler and then right in the middle of it
> we get another NMI. Yeah, we still have the nested-NMI stuff, but I'd
> like to avoid complications if possible.

I'll counter with:

1. Why on earth would an NMI handler call sync_core()?

2. We have very careful code to handle this issue, and NMIs really can
cause page faults, which have exactly the same problem.

So I'm not too worried.
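(For reference, the sequence in question is roughly the following.
This is a simplified 32-bit sketch, minus the exact asm constraints
and stack-pointer bookkeeping in the actual patch:

	asm volatile (
		"pushfl\n\t"		/* EFLAGS to return with */
		"pushl %%cs\n\t"	/* current CS, whatever it is */
		"pushl $1f\n\t"		/* EIP to resume at */
		"iret\n\t"		/* pops EIP/CS/EFLAGS; serializing */
		"1:"
		: : : "memory");

The IRET at the end is the serializing part, and it's also the point
at which NMIs get unmasked.)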
>
>> The "push %cs" is needed because, in
>> + * paravirtual environments, __KERNEL_CS may not be a valid CS
>> + * value when we do IRET directly.
>> + *
>> + * In case NMI unmasking or performance ever becomes a problem,
>> + * the next best option appears to be MOV-to-CR2 and an
>> + * unconditional jump.  That sequence also works on all CPUs,
>> + * but it will fault at CPL3.
>
> Does it really have to be non-privileged?

Unless we want to declare lguest unsupported, delete it from the tree,
or, sigh, actually maintain it, then yes :(  native_write_cr2() would
work on Xen, but it's slow.

>
> If not, there are a couple more serializing insns:
>
> "• Privileged serializing instructions — INVD, INVEPT, INVLPG,
> INVVPID, LGDT, LIDT, LLDT, LTR, MOV (to control register, with the
> exception of MOV CR8), MOV (to debug register), WBINVD, and WRMSR"
>
> What about INVD? It is expensive too :-)

Only if you write the patch and label it:

Snickered-at-by: Andy Lutomirski

>
> Can't we use MOV %dr or so? With previously saving/restoring the reg?
>
> WBINVD could be another candidate, albeit a big hammer.
>
> WRMSR maybe too. If it faults, it's fine too, because then you get the
> IRET automagically. Hell, we could even make it fault on purpose by
> writing to an invalid MSR, but then we're back to unmasking NMIs...
> :-\

I still like MOV-to-CR2 better than all of these.

--Andy
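P.S. For concreteness, the MOV-to-CR2 fallback would look something
like this.  This is a hypothetical sketch, not something the patch
implements, and the sync_core_cr2() name is made up:

	/* Serialize at CPL0 without unmasking NMIs (hypothetical). */
	static inline void sync_core_cr2(void)
	{
		unsigned long tmp;

		asm volatile (
			"mov %%cr2, %0\n\t"	/* save CR2 so we don't clobber it */
			"mov %0, %%cr2\n\t"	/* MOV to a control register serializes */
			"jmp 1f\n\t"		/* unconditional jump flushes prefetch on old CPUs */
			"1:"
			: "=r" (tmp) : : "memory");
	}

Unlike IRET-to-self, this leaves NMIs masked, but MOV to CR2 is
privileged, which is why it faults at CPL3.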