From: Andy Lutomirski
Date: Mon, 16 Nov 2015 11:03:22 -0800
Subject: Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit
To: Boris Ostrovsky
Cc: "linux-kernel@vger.kernel.org", xen-devel, David Vrabel,
    Konrad Rzeszutek Wilk, Borislav Petkov
In-Reply-To: <564A0371.2040104@oracle.com>
References: <1447456706-24347-1-git-send-email-boris.ostrovsky@oracle.com>
    <56468D24.8030801@oracle.com> <564A0371.2040104@oracle.com>

On Mon, Nov 16, 2015 at 8:25 AM, Boris Ostrovsky wrote:
> On 11/15/2015 01:02 PM, Andy Lutomirski wrote:
>>
>> On Nov 13, 2015 5:23 PM, "Boris Ostrovsky"
>> wrote:
>>>
>>> On 11/13/2015 06:26 PM, Andy Lutomirski wrote:
>>>>
>>>> On Fri, Nov 13, 2015 at 3:18 PM, Boris Ostrovsky
>>>> wrote:
>>>>>
>>>>> After 32-bit syscall rewrite, and specifically after commit
>>>>> 5f310f739b4c
>>>>> ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack
>>>>> frame that is passed to xen_sysexit is no longer a "standard" one (i.e.
>>>>> it's not pt_regs).
>>>>>
>>>>> We need to adjust it so that subsequent xen_iret can use it.
>>>>
>>>> I'm wondering if this should be more straightforward:
>>>>
>>>>         movq    %rsp, %rdi
>>>>         call    do_fast_syscall_32
>>>>         testl   %eax, %eax
>>>>         jz      .Lsyscall_32_done
>>>>
>>>>         /* Opportunistic SYSRET */
>>>> sysret32_from_system_call:
>>>>         XEN_DO_SYSRET32
>>>>
>>>> where XEN_DO_SYSRET32 is a simple pv op that, on Xen, jumps to a
>>>> variant of Xen's iret path that knows that the fast path is okay.
>>>
>>> This patch is for 32-bit kernel. I actually haven't looked at compat code
>>> (probably because our tests don't try that), I need to do that too.
>>
>> In 4.4, it's almost identical (which was part of the point of this
>> whole series). We use sysret32 instead of sysexit, but the underlying
>> structure is the same: munge the stack frame and register state
>> appropriately to use the fast return instruction in question and then
>> execute it. In both cases, the only real difference from the IRET
>> path is that we're willing to lose the values of some subset of cx,
>> dx, and (on 64-bit kernels) r11.
>
> So it turned out that for compat mode we don't need to do anything since
> xen_sysret32 doesn't assume any stack format (or, rather, it assumes that it
> can't be used) and builds the IRET frame itself.
>

It's still a waste of effort, though. Also, I'd eventually like the
number of places in Xen code in which rsp/esp is invalid to be exactly
zero, and this approach makes that harder or even impossible.

>
>>
>>> As for XEN_DO_SYSRET32 --- we'd presumably need to have a nop for
>>> baremetal otherwise current paravirt op will use native_usergs_sysret32 (for
>>> compat code). Which means a new pv_op, I think.
>>
>> Agreed, unless...
>>
>> Does Xen have a cpufeature? Using ALTERNATIVE instead of a pvop could
>> be easier to follow and be less code at the same time.
>> Frankly, following the control flow from asm through the
>> pre-paravirt-patching and post-paravirt-patching variants and into the
>> final targets is getting a little bit old, and ALTERNATIVE is crystal
>> clear in comparison (and has all the interesting info inline with the
>> rest of the asm). Of course, it doesn't work early in boot, but that's
>> fine for anything involving user/kernel switches.
>
> We don't currently have a Xen-specific CPU feature. We could, in principle,
> add it but we can't replace all of current paravirt patching with a single
> feature since PVH guests use a subset of existing pv ops (and in the future
> it may become even more fine-grained).
>
> And I don't think we should go ALTERNATIVE route for one set of features and
> keep pv ops for the rest --- it should be either one or the other.

Does PVH hook into the entry asm code at all? I thought it was just
boot code and drivers.

In any case, someone needs to do some serious review and cleanup of
the whole paravirt op mess. We have a bunch of paravirt ops that serve
little purpose.

The paravirt infrastructure is a bit weird, too: it seems to
effectively have four states for each patch site. There's:

1. The initial state, which is unoptimized and works on native.
   Presumably any of these that happen early also need to work, if
   slowly, on Xen.

2. The Xen state without text patching. I'm not actually sure why
   this exists at all. Are there pvops that need to switch too early
   for us to patch the text?

3. The native patched state. This is supposedly optimal, but it
   results in a few more NOPs than are really needed.

4. The Xen patched state.

Alternatives have only two states, and the code is much easier to
understand.

Also, alternatives avoid things like:

    ...
    SWAPGS
    ...

The reader surely doesn't remember that this isn't guaranteed to be a
swapgs instruction on native.
Using:

    ALTERNATIVE "swapgs" "" X86_FEATURE_XENPV

would be safer (it would get rid of the SWAPGS_UNSAFE_STACK mess) and
much clearer. We could hide *that* behind a macro and no one would be
confused. (Well, they'd be confused by the fact that Xen PV handles
gsbase very differently from native, but that has nothing to do with
the macro.)

I think we could convert piecemeal, and I wonder if this new patch for
32-bit native on 4.4 (this is needed for 4.4, right?) would be a good
starting point.

Borislav, what do you think? Would you be okay with adding a Xen PV
pseudofeature?

--Andy