Message-Id: <584688FD02000078001257BB@prv-mh.provo.novell.com>
Date: Tue, 06 Dec 2016 01:46:37 -0700
From: "Jan Beulich"
To: "Andy Lutomirski"
Cc: "Borislav Petkov", "Andrew Cooper", "Brian Gerst", "Matthew Whitehead", "Henrique de Moraes Holschuh", "Peter Zijlstra", "xen-devel", "One Thousand Gnomes", "Boris Ostrovsky", "Juergen Gross", "linux-kernel@vger.kernel.org", "H. Peter Anvin"
Subject: Re: [Xen-devel] [PATCH v3 4/4] x86/asm: Rewrite sync_core() to use IRET-to-self

>>> On 05.12.16 at 22:32, wrote:
> static inline void sync_core(void)
> {
> -	int tmp;
> -
> -#ifdef CONFIG_X86_32
>  	/*
> -	 * Do a CPUID if available, otherwise do a jump.  The jump
> -	 * can conveniently enough be the jump around CPUID.
> +	 * There are quite a few ways to do this.  IRET-to-self is nice
> +	 * because it works on every CPU, at any CPL (so it's compatible
> +	 * with paravirtualization), and it never exits to a hypervisor.
> +	 * The only down sides are that it's a bit slow (it seems to be
> +	 * a bit more than 2x slower than the fastest options) and that
> +	 * it unmasks NMIs.  The "push %cs" is needed because, in
> +	 * paravirtual environments, __KERNEL_CS may not be a valid CS
> +	 * value when we do IRET directly.
> +	 *
> +	 * In case NMI unmasking or performance ever becomes a problem,
> +	 * the next best option appears to be MOV-to-CR2 and an
> +	 * unconditional jump.  That sequence also works on all CPUs,
> +	 * but it will fault at CPL3.

CPL > 0 I think.

> +	 * CPUID is the conventional way, but it's nasty: it doesn't
> +	 * exist on some 486-like CPUs, and it usually exits to a
> +	 * hypervisor.
>  	 */
> -	asm volatile("cmpl %2,%1\n\t"
> -		     "jl 1f\n\t"
> -		     "cpuid\n"
> -		     "1:"
> -		     : "=a" (tmp)
> -		     : "rm" (boot_cpu_data.cpuid_level), "ri" (0), "0" (1)
> -		     : "ebx", "ecx", "edx", "memory");
> +	register void *__sp asm(_ASM_SP);
> +
> +#ifdef CONFIG_X86_32
> +	asm volatile (
> +		"pushfl\n\t"
> +		"pushl %%cs\n\t"
> +		"pushl $1f\n\t"
> +		"iret\n\t"
> +		"1:"
> +		: "+r" (__sp) : : "cc", "memory");

I don't think EFLAGS (i.e. "cc") gets modified anywhere here. And the
memory clobber would perhaps better be pulled out into an explicit
barrier() invocation (making it more obvious what it's needed for)?

>  #else
> -	/*
> -	 * CPUID is a barrier to speculative execution.
> -	 * Prefetched instructions are automatically
> -	 * invalidated when modified.
> -	 */
> -	asm volatile("cpuid"
> -		     : "=a" (tmp)
> -		     : "0" (1)
> -		     : "ebx", "ecx", "edx", "memory");
> +	unsigned long tmp;
> +
> +	asm volatile (
> +		"movq %%ss, %0\n\t"
> +		"pushq %0\n\t"
> +		"pushq %%rsp\n\t"
> +		"addq $8, (%%rsp)\n\t"
> +		"pushfq\n\t"
> +		"movq %%cs, %0\n\t"
> +		"pushq %0\n\t"
> +		"pushq $1f\n\t"
> +		"iretq\n\t"
> +		"1:"
> +		: "=r" (tmp), "+r" (__sp) : : "cc", "memory");

The first output needs to be "=&r". And is movq really a good idea for
selector reads? Why don't you make tmp unsigned int, use plain mov, and
use %q0 as pushq's operands?

Jan