Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752976AbbKPQZd (ORCPT ); Mon, 16 Nov 2015 11:25:33 -0500
Received: from userp1040.oracle.com ([156.151.31.81]:44885 "EHLO
	userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752920AbbKPQZ1 (ORCPT );
	Mon, 16 Nov 2015 11:25:27 -0500
Subject: Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit
To: Andy Lutomirski
References: <1447456706-24347-1-git-send-email-boris.ostrovsky@oracle.com>
	<56468D24.8030801@oracle.com>
Cc: "linux-kernel@vger.kernel.org", xen-devel, David Vrabel,
	Konrad Rzeszutek Wilk
From: Boris Ostrovsky
Message-ID: <564A0371.2040104@oracle.com>
Date: Mon, 16 Nov 2015 11:25:21 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Source-IP: userv0021.oracle.com [156.151.31.71]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3295
Lines: 73

On 11/15/2015 01:02 PM, Andy Lutomirski wrote:
> On Nov 13, 2015 5:23 PM, "Boris Ostrovsky" wrote:
>>
>>
>> On 11/13/2015 06:26 PM, Andy Lutomirski wrote:
>>> On Fri, Nov 13, 2015 at 3:18 PM, Boris Ostrovsky
>>> wrote:
>>>> After 32-bit syscall rewrite, and specifically after commit 5f310f739b4c
>>>> ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack
>>>> frame that is passed to xen_sysexit is no longer a "standard" one (i.e.
>>>> it's not pt_regs).
>>>>
>>>> We need to adjust it so that subsequent xen_iret can use it.
>>> I'm wondering if this should be more straightforward:
>>>
>>>     movq %rsp, %rdi
>>>     call do_fast_syscall_32
>>>     testl %eax, %eax
>>>     jz .Lsyscall_32_done
>>>
>>>     /* Opportunistic SYSRET */
>>> sysret32_from_system_call:
>>>     XEN_DO_SYSRET32
>>>
>>> where XEN_DO_SYSRET32 is a simple pv op that, on Xen, jumps to a
>>> variant of Xen's iret path that knows that the fast path is okay.
>>
>>
>> This patch is for 32-bit kernel. I actually haven't looked at compat
>> code (probably because our tests don't try that), I need to do that too.
> In 4.4, it's almost identical (which was part of the point of this
> whole series). We use sysret32 instead of sysexit, but the underlying
> structure is the same: munge the stack frame and register state
> appropriately to use the fast return instruction in question and then
> execute it. In both cases, the only real difference from the IRET
> path is that we're willing to lose the values of some subset of cx,
> dx, and (on 64-bit kernels) r11.

So it turned out that for compat mode we don't need to do anything since
xen_sysret32 doesn't assume any stack format (or, rather, it assumes
that it can't be used) and builds the IRET frame itself.

>
>> As for XEN_DO_SYSRET32 --- we'd presumably need to have a nop for
>> baremetal otherwise current paravirt op will use
>> native_usergs_sysret32 (for compat code). Which means a new pv_op, I
>> think.
> Agreed, unless...
>
> Does Xen have a cpufeature? Using ALTERNATIVE instead of a pvop could
> be easier to follow and be less code at the same time. Frankly,
> following the control flow from asm through the pre-paravirt-patching
> and post-paravirt-patching variants and into the final targets is
> getting a little bit old, and ALTERNATIVE is crystal clear in
> comparison (and has all the interesting info inline with the rest of
> the asm). Of course, it doesn't work early in boot, but that's fine
> for anything involving user/kernel switches.

We don't currently have a Xen-specific CPU feature.
We could, in principle, add it but we can't replace all of current
paravirt patching with a single feature since PVH guests use a subset of
existing pv ops (and in the future it may become even more fine-grained).
And I don't think we should go ALTERNATIVE route for one set of features
and keep pv ops for the rest --- it should be either one or the other.

-boris
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
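
For readers following the pv op vs. ALTERNATIVE point in the reply above, here is a rough user-space sketch in C of the two selection styles. It is not kernel code and does not use the kernel's paravirt or alternatives machinery. Every name in it (toy_pv_ops, toy_pv_init, TOY_FEATURE_XEN, the "partial guest" type, and so on) is hypothetical and chosen only for illustration, and which ops a real PVH guest overrides is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for real code paths; every name here is hypothetical. */
enum toy_impl { TOY_NATIVE, TOY_XEN };

static enum toy_impl native_fast_return(void) { return TOY_NATIVE; }
static enum toy_impl xen_fast_return(void)    { return TOY_XEN; }
static enum toy_impl native_other_op(void)    { return TOY_NATIVE; }
static enum toy_impl xen_other_op(void)       { return TOY_XEN; }

/* pv-op style: one function pointer per operation, filled in at boot. */
struct toy_pv_ops {
    enum toy_impl (*fast_return)(void);
    enum toy_impl (*other_op)(void);
};

/* Assumed guest types; the "partial" one overrides only some ops. */
enum toy_guest { TOY_BAREMETAL, TOY_PV_GUEST, TOY_PARTIAL_GUEST };

static struct toy_pv_ops toy_ops;

void toy_pv_init(enum toy_guest guest)
{
    /* Start from the native implementations. */
    toy_ops.fast_return = native_fast_return;
    toy_ops.other_op = native_other_op;

    /* Each op is overridden independently of the others. */
    if (guest == TOY_PV_GUEST || guest == TOY_PARTIAL_GUEST)
        toy_ops.other_op = xen_other_op;
    if (guest == TOY_PV_GUEST)
        toy_ops.fast_return = xen_fast_return;
}

enum toy_impl toy_fast_return_pvop(void)
{
    assert(toy_ops.fast_return != NULL); /* toy_pv_init() must run first */
    return toy_ops.fast_return();
}

enum toy_impl toy_other_op_pvop(void)
{
    assert(toy_ops.other_op != NULL); /* toy_pv_init() must run first */
    return toy_ops.other_op();
}

/* ALTERNATIVE style: a single feature bit selects the variant. */
#define TOY_FEATURE_XEN (1u << 0) /* hypothetical feature bit */

enum toy_impl toy_fast_return_alt(unsigned int features)
{
    return (features & TOY_FEATURE_XEN) ? xen_fast_return()
                                        : native_fast_return();
}
```

In this toy model, toy_pv_init(TOY_PARTIAL_GUEST) leaves fast_return native while switching other_op, which a single TOY_FEATURE_XEN bit cannot express. That is roughly the objection raised above about PVH guests using only a subset of the existing pv ops.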