2015-11-13 23:15:50

by Boris Ostrovsky

[permalink] [raw]
Subject: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

After 32-bit syscall rewrite, and specifically after commit 5f310f739b4c
("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack
frame that is passed to xen_sysexit is no longer a "standard" one (i.e.
it's not pt_regs).

We need to adjust it so that subsequent xen_iret can use it.

Signed-off-by: Boris Ostrovsky <[email protected]>
---

Alternatively, we could return 0 from do_fast_syscall_32() if paravirt_enabled()
is true since Xen PV guests will end up using xen_iret one way or the other. And
then we won't need xen_sysexit at all.

arch/x86/xen/xen-asm_32.S | 23 ++++++++++++++++-------
1 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/xen-asm_32.S b/arch/x86/xen/xen-asm_32.S
index fd92a64..c70ec37 100644
--- a/arch/x86/xen/xen-asm_32.S
+++ b/arch/x86/xen/xen-asm_32.S
@@ -36,15 +36,24 @@ check_events:

/*
* We can't use sysexit directly, because we're not running in ring0.
- * But we can easily fake it up using iret. Assuming xen_sysexit is
- * jumped to with a standard stack frame, we can just strip it back to
- * a standard iret frame and use iret.
+ * But we can easily fake it up using iret.
+ * We came here from the opportunistic SYSEXIT path in entry_SYSENTER_32
+ * which left the stack looking like this:
+ * $__USER_DS
+ * %ecx
+ * eflags
+ * $__USER_CS
+ * %eip
+ * %eax
+ * %gs
+ * %fs
+ * %es
+ * %ds <-- %esp
+ *
+ * so we need to adjust it to look like a standard iret frame
*/
ENTRY(xen_sysexit)
- movl PT_EAX(%esp), %eax /* Shouldn't be necessary? */
- orl $X86_EFLAGS_IF, PT_EFLAGS(%esp)
- lea PT_EIP(%esp), %esp
-
+ add $5*4, %esp
jmp xen_iret
ENDPROC(xen_sysexit)

--
1.7.1


2015-11-13 23:26:26

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On Fri, Nov 13, 2015 at 3:18 PM, Boris Ostrovsky
<[email protected]> wrote:
> After 32-bit syscall rewrite, and specifically after commit 5f310f739b4c
> ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack
> frame that is passed to xen_sysexit is no longer a "standard" one (i.e.
> it's not pt_regs).
>
> We need to adjust it so that subsequent xen_iret can use it.

I'm wondering if this should be more straightforward:

movq %rsp, %rdi
call do_fast_syscall_32
testl %eax, %eax
jz .Lsyscall_32_done

/* Opportunistic SYSRET */
sysret32_from_system_call:
XEN_DO_SYSRET32

where XEN_DO_SYSRET32 is a simple pv op that, on Xen, jumps to a
variant of Xen's iret path that knows that the fast path is okay.

--Andy

2015-11-14 01:24:00

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit



On 11/13/2015 06:26 PM, Andy Lutomirski wrote:
> On Fri, Nov 13, 2015 at 3:18 PM, Boris Ostrovsky
> <[email protected]> wrote:
>> After 32-bit syscall rewrite, and specifically after commit 5f310f739b4c
>> ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack
>> frame that is passed to xen_sysexit is no longer a "standard" one (i.e.
>> it's not pt_regs).
>>
>> We need to adjust it so that subsequent xen_iret can use it.
> I'm wondering if this should be more straightforward:
>
> movq %rsp, %rdi
> call do_fast_syscall_32
> testl %eax, %eax
> jz .Lsyscall_32_done
>
> /* Opportunistic SYSRET */
> sysret32_from_system_call:
> XEN_DO_SYSRET32
>
> where XEN_DO_SYSRET32 is a simple pv op that, on Xen, jumps to a
> variant of Xen's iret path that knows that the fast path is okay.


This patch is for 32-bit kernel. I actually haven't looked at compat
code (probably because our tests don't try that), I need to do that too.

As for XEN_DO_SYSRET32 --- we'd presumably need to have a nop for
baremetal otherwise current paravirt op will use native_usergs_sysret32
(for compat code). Which means a new pv_op, I think.

-boris

2015-11-15 18:02:43

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On Nov 13, 2015 5:23 PM, "Boris Ostrovsky" <[email protected]> wrote:
>
>
>
> On 11/13/2015 06:26 PM, Andy Lutomirski wrote:
>>
>> On Fri, Nov 13, 2015 at 3:18 PM, Boris Ostrovsky
>> <[email protected]> wrote:
>>>
>>> After 32-bit syscall rewrite, and specifically after commit 5f310f739b4c
>>> ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack
>>> frame that is passed to xen_sysexit is no longer a "standard" one (i.e.
>>> it's not pt_regs).
>>>
>>> We need to adjust it so that subsequent xen_iret can use it.
>>
>> I'm wondering if this should be more straightforward:
>>
>> movq %rsp, %rdi
>> call do_fast_syscall_32
>> testl %eax, %eax
>> jz .Lsyscall_32_done
>>
>> /* Opportunistic SYSRET */
>> sysret32_from_system_call:
>> XEN_DO_SYSRET32
>>
>> where XEN_DO_SYSRET32 is a simple pv op that, on Xen, jumps to a
>> variant of Xen's iret path that knows that the fast path is okay.
>
>
>
> This patch is for 32-bit kernel. I actually haven't looked at compat code (probably because our tests don't try that), I need to do that too.

In 4.4, it's almost identical (which was part of the point of this
whole series). We use sysret32 instead of sysexit, but the underlying
structure is the same: munge the stack frame and register state
appropriately to use the fast return instruction in question and then
execute it. In both cases, the only real difference from the IRET
path is that we're willing to lose the values of some subset of cx,
dx, and (on 64-bit kernels) r11.

>
> As for XEN_DO_SYSRET32 --- we'd presumably need to have a nop for baremetal otherwise current paravirt op will use native_usergs_sysret32 (for compat code). Which means a new pv_op, I think.

Agreed, unless...

Does Xen have a cpufeature? Using ALTERNATIVE instead of a pvop could
be easier to follow and be less code at the same time. Frankly,
following the control flow from asm through the pre-paravirt-patching
and post-paravirt-patching variants and into the final targets is
getting a little bit old, and ALTERNATIVE is crystal clear in
comparison (and has all the interesting info inline with the rest of
the asm). Of course, it doesn't work early in boot, but that's fine
for anything involving user/kernel switches.


--Andy

>
> -boris

2015-11-16 16:25:33

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On 11/15/2015 01:02 PM, Andy Lutomirski wrote:
> On Nov 13, 2015 5:23 PM, "Boris Ostrovsky" <[email protected]> wrote:
>>
>>
>> On 11/13/2015 06:26 PM, Andy Lutomirski wrote:
>>> On Fri, Nov 13, 2015 at 3:18 PM, Boris Ostrovsky
>>> <[email protected]> wrote:
>>>> After 32-bit syscall rewrite, and specifically after commit 5f310f739b4c
>>>> ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack
>>>> frame that is passed to xen_sysexit is no longer a "standard" one (i.e.
>>>> it's not pt_regs).
>>>>
>>>> We need to adjust it so that subsequent xen_iret can use it.
>>> I'm wondering if this should be more straightforward:
>>>
>>> movq %rsp, %rdi
>>> call do_fast_syscall_32
>>> testl %eax, %eax
>>> jz .Lsyscall_32_done
>>>
>>> /* Opportunistic SYSRET */
>>> sysret32_from_system_call:
>>> XEN_DO_SYSRET32
>>>
>>> where XEN_DO_SYSRET32 is a simple pv op that, on Xen, jumps to a
>>> variant of Xen's iret path that knows that the fast path is okay.
>>
>>
>> This patch is for 32-bit kernel. I actually haven't looked at compat code (probably because our tests don't try that), I need to do that too.
> In 4.4, it's almost identical (which was part of the point of this
> whole series). We use sysret32 instead of sysexit, but the underlying
> structure is the same: munge the stack frame and register state
> appropriately to use the fast return instruction in question and then
> execute it. In both cases, the only real difference from the IRET
> path is that we're willing to lose the values of some subset of cx,
> dx, and (on 64-bit kernels) r11.


So it turned out that for compat mode we don't need to do anything since
xen_sysret32 doesn't assume any stack format (or, rather, it assumes
that it can't be used) and builds the IRET frame itself.


>
>> As for XEN_DO_SYSRET32 --- we'd presumably need to have a nop for baremetal otherwise current paravirt op will use native_usergs_sysret32 (for compat code). Which means a new pv_op, I think.
> Agreed, unless...
>
> Does Xen have a cpufeature? Using ALTERNATIVE instead of a pvop could
> be easier to follow and be less code at the same time. Frankly,
> following the control flow from asm through the pre-paravirt-patching
> and post-paravirt-patching variants and into the final targets is
> getting a little bit old, and ALTERNATIVE is crystal clear in
> comparison (and has all the interesting info inline with the rest of
> the asm). Of course, it doesn't work early in boot, but that's fine
> for anything involving user/kernel switches.


We don't currently have a Xen-specific CPU feature. We could, in
principle, add it but we can't replace all of current paravirt patching
with a single feature since PVH guests use a subset of existing pv ops
(and in the future it may become even more fine-grained).

And I don't think we should go ALTERNATIVE route for one set of features
and keep pv ops for the rest --- it should be either one or the other.


-boris

2015-11-16 19:03:48

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On Mon, Nov 16, 2015 at 8:25 AM, Boris Ostrovsky
<[email protected]> wrote:
> On 11/15/2015 01:02 PM, Andy Lutomirski wrote:
>>
>> On Nov 13, 2015 5:23 PM, "Boris Ostrovsky" <[email protected]>
>> wrote:
>>>
>>>
>>>
>>> On 11/13/2015 06:26 PM, Andy Lutomirski wrote:
>>>>
>>>> On Fri, Nov 13, 2015 at 3:18 PM, Boris Ostrovsky
>>>> <[email protected]> wrote:
>>>>>
>>>>> After 32-bit syscall rewrite, and specifically after commit
>>>>> 5f310f739b4c
>>>>> ("x86/entry/32: Re-implement SYSENTER using the new C path"), the stack
>>>>> frame that is passed to xen_sysexit is no longer a "standard" one (i.e.
>>>>> it's not pt_regs).
>>>>>
>>>>> We need to adjust it so that subsequent xen_iret can use it.
>>>>
>>>> I'm wondering if this should be more straightforward:
>>>>
>>>> movq %rsp, %rdi
>>>> call do_fast_syscall_32
>>>> testl %eax, %eax
>>>> jz .Lsyscall_32_done
>>>>
>>>> /* Opportunistic SYSRET */
>>>> sysret32_from_system_call:
>>>> XEN_DO_SYSRET32
>>>>
>>>> where XEN_DO_SYSRET32 is a simple pv op that, on Xen, jumps to a
>>>> variant of Xen's iret path that knows that the fast path is okay.
>>>
>>>
>>>
>>> This patch is for 32-bit kernel. I actually haven't looked at compat code
>>> (probably because our tests don't try that), I need to do that too.
>>
>> In 4.4, it's almost identical (which was part of the point of this
>> whole series). We use sysret32 instead of sysexit, but the underlying
>> structure is the same: munge the stack frame and register state
>> appropriately to use the fast return instruction in question and then
>> execute it. In both cases, the only real difference from the IRET
>> path is that we're willing to lose the values of some subset of cx,
>> dx, and (on 64-bit kernels) r11.
>
>
>
> So it turned out that for compat mode we don't need to do anything since
> xen_sysret32 doesn't assume any stack format (or, rather, it assumes that it
> can't be used) and builds the IRET frame itself.
>

It's still a waste of effort, though. Also, I'd eventually like the
number of places in Xen code in which rsp/esp is invalid to be exactly
zero, and this approach makes this harder or even impossible.

>
>>
>>> As for XEN_DO_SYSRET32 --- we'd presumably need to have a nop for
>>> baremetal otherwise current paravirt op will use native_usergs_sysret32 (for
>>> compat code). Which means a new pv_op, I think.
>>
>> Agreed, unless...
>>
>> Does Xen have a cpufeature? Using ALTERNATIVE instead of a pvop could
>> be easier to follow and be less code at the same time. Frankly,
>> following the control flow from asm through the pre-paravirt-patching
>> and post-paravirt-patching variants and into the final targets is
>> getting a little bit old, and ALTERNATIVE is crystal clear in
>> comparison (and has all the interesting info inline with the rest of
>> the asm). Of course, it doesn't work early in boot, but that's fine
>> for anything involving user/kernel switches.
>
>
>
> We don't currently have a Xen-specific CPU feature. We could, in principle,
> add it but we can't replace all of current paravirt patching with a single
> feature since PVH guests use a subset of existing pv ops (and in the future
> it may become even more fine-grained).
>
> And I don't think we should go ALTERNATIVE route for one set of features and
> keep pv ops for the rest --- it should be either one or the other.

Does PVH hook into the entry asm code at all? I thought it was just
boot code and drivers.

In any case, someone needs to do some serious review and cleanup on
the whole paravirt op mess. We have a bunch of paravirt ops that
serve little purpose.

The paravirt infrastructure is a bit weird, too: it seems to
effectively have four states for each patch site. There's:

1. The initial state, which is unoptimized and works on native.
Presumably any of these that happen early also need to work, if
slowly, on Xen.

2. The Xen state without text patching. I'm not actually sure why
this exists at all. Are there pvops that need to switch too early for
us to patch the text?

3. The native patched state. This is supposedly optimal, but it
results in a few more NOPs than are really needed.

4. The Xen patched state.

Alternatives have only two states, and the code is much easier to
understand. Also, alternatives avoid things like:

...
SWAPGS
...

The reader surely doesn't remember that this isn't guaranteed to be a
swapgs instruction on native. Using:

ALTERNATIVE "swapgs" "" X86_FEATURE_XENPV

would be safer (it would get rid of the SWAPGS_UNSAFE_STACK mess) and
much clearer. We could hide *that* behind a macro and no one would be
confused. (Well, they'd be confused by the fact that Xen PV handles
gsbase very differently from native, but that has nothing to do with
the macro.)

I think we could convert piecemeal, and I wonder if this new patch for
32-bit native on 4.4 (this is needed for 4.4, right?) would be a good
starting point. Borislav, what do you think? Would you be okay with
adding a Xen PV pseudofeature?

--Andy

2015-11-16 19:59:19

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On Mon, Nov 16, 2015 at 11:03:22AM -0800, Andy Lutomirski wrote:

> ...
> The reader surely doesn't remember that this isn't guaranteed to be a
> swapgs instruction on native. Using:
>
> ALTERNATIVE "swapgs" "" X86_FEATURE_XENPV
>
> would be safer (it would get rid of the SWAPGS_UNSAFE_STACK mess) and
> much clearer. We could hide *that* behind a macro and no one would be
> confused. (Well, they'd be confused by the fact that Xen PV handles
> gsbase very differently from native, but that has nothing to do with
> the macro.)
>
> I think we could convert piecemeal, and I wonder if this new patch for
> 32-bit native on 4.4 (this is needed for 4.4, right?) would be a good
> starting point. Borislav, what do you think? Would you be okay with
> adding a Xen PV pseudofeature?

AFAICT, I'd prefer this becomes rather a jump label which gets enabled
on xen. Especially if a single pseudofeature might not be enough,
apprently...

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.

2015-11-16 20:11:33

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On Mon, Nov 16, 2015 at 11:59 AM, Borislav Petkov <[email protected]> wrote:
> On Mon, Nov 16, 2015 at 11:03:22AM -0800, Andy Lutomirski wrote:
>
>> ...
>> The reader surely doesn't remember that this isn't guaranteed to be a
>> swapgs instruction on native. Using:
>>
>> ALTERNATIVE "swapgs" "" X86_FEATURE_XENPV
>>
>> would be safer (it would get rid of the SWAPGS_UNSAFE_STACK mess) and
>> much clearer. We could hide *that* behind a macro and no one would be
>> confused. (Well, they'd be confused by the fact that Xen PV handles
>> gsbase very differently from native, but that has nothing to do with
>> the macro.)
>>
>> I think we could convert piecemeal, and I wonder if this new patch for
>> 32-bit native on 4.4 (this is needed for 4.4, right?) would be a good
>> starting point. Borislav, what do you think? Would you be okay with
>> adding a Xen PV pseudofeature?
>
> AFAICT, I'd prefer this becomes rather a jump label which gets enabled
> on xen. Especially if a single pseudofeature might not be enough,
> apprently...

Except it's not a jump. (Also, the alternatives infrastructure is IMO
much nicer than the jump label infrastructure.)

Taking SWAPGS as an example, the semantics we need are:

- On native, do swapgs. This *can't* be a call due to RSP issues.
- On Xen PV, swapgs will work, but it's emulated. We'd rather just nop it out.

In principle, we could static jump over it on Xen, but that also
involves forcing the jump label to be built on old GCC versions, which
PeterZ objected to the last time I asked.

If it would make you feel better, it could be X86_BUG_XENPV :-p

Are there really multiple feature bits for this stuff? I'd like to
imagine that the entry code is all either Xen PV or native/PVH/PVHVM
-- i.e. I assumed that PVH works like native for all entries.

--Andy tip #101: Trim your mails when you reply.



--
Andy Lutomirski
AMA Capital Management, LLC

2015-11-16 20:22:45

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On Mon, Nov 16, 2015 at 12:11:11PM -0800, Andy Lutomirski wrote:
> On Mon, Nov 16, 2015 at 11:59 AM, Borislav Petkov <[email protected]> wrote:
> > On Mon, Nov 16, 2015 at 11:03:22AM -0800, Andy Lutomirski wrote:
> >
> >> ...
> >> The reader surely doesn't remember that this isn't guaranteed to be a
> >> swapgs instruction on native. Using:
> >>
> >> ALTERNATIVE "swapgs" "" X86_FEATURE_XENPV
> >>
> >> would be safer (it would get rid of the SWAPGS_UNSAFE_STACK mess) and
> >> much clearer. We could hide *that* behind a macro and no one would be
> >> confused. (Well, they'd be confused by the fact that Xen PV handles
> >> gsbase very differently from native, but that has nothing to do with
> >> the macro.)
> >>
> >> I think we could convert piecemeal, and I wonder if this new patch for
> >> 32-bit native on 4.4 (this is needed for 4.4, right?) would be a good
> >> starting point. Borislav, what do you think? Would you be okay with
> >> adding a Xen PV pseudofeature?
> >
> > AFAICT, I'd prefer this becomes rather a jump label which gets enabled
> > on xen. Especially if a single pseudofeature might not be enough,
> > apprently...
>
> Except it's not a jump. (Also, the alternatives infrastructure is IMO
> much nicer than the jump label infrastructure.)
>
> Taking SWAPGS as an example, the semantics we need are:
>
> - On native, do swapgs. This *can't* be a call due to RSP issues.
> - On Xen PV, swapgs will work, but it's emulated. We'd rather just nop it out.

Huh, so what's wrong with a jump:

jmp 1f
swapgs
1:

> In principle, we could static jump over it on Xen, but that also
> involves forcing the jump label to be built on old GCC versions, which
> PeterZ objected to the last time I asked.

:-\

> If it would make you feel better, it could be X86_BUG_XENPV :-p

That doesn't matter - I just don't want to open the flood gates on
pseudo feature bits.

hpa, what do you think?

> Are there really multiple feature bits for this stuff? I'd like to
> imagine that the entry code is all either Xen PV or native/PVH/PVHVM
> -- i.e. I assumed that PVH works like native for all entries.

I just reacted to Boris' statement:

"We don't currently have a Xen-specific CPU feature. We could, in
principle, add it but we can't replace all of current paravirt patching
with a single feature since PVH guests use a subset of existing pv ops
(and in the future it may become even more fine-grained)."

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.

2015-11-16 20:32:14

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On 11/16/2015 02:03 PM, Andy Lutomirski wrote:
> It's still a waste of effort, though. Also, I'd eventually like the
> number of places in Xen code in which rsp/esp is invalid to be exactly
> zero, and this approach makes this harder or even impossible.

That's what PVH is going to do.



> Does PVH hook into the entry asm code at all? I thought it was just
> boot code and drivers.

Not the current version --- it starts with xen_start_kernel(). But we
are currently changing it and my plan is to have a small stub executed
initially (to set bootparams and such) and then jump to startup_{32|64}().

>
> In any case, someone needs to do some serious review and cleanup on
> the whole paravirt op mess. We have a bunch of paravirt ops that
> serve little purpose.
>
> The paravirt infrastructure is a bit weird, too: it seems to
> effectively have four states for each patch site. There's:
>
> 1. The initial state, which is unoptimized and works on native.
> Presumably any of these that happen early also need to work, if
> slowly, on Xen.

Not on PV (and as of today, on PVH) --- we start directly from
xen_start_kernel(). I.e. from step 2.

>
> 2. The Xen state without text patching. I'm not actually sure why
> this exists at all. Are there pvops that need to switch too early for
> us to patch the text?

I don't think so.

-boris

2015-11-16 20:49:10

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On 11/16/2015 03:22 PM, Borislav Petkov wrote:
> On Mon, Nov 16, 2015 at 12:11:11PM -0800, Andy Lutomirski wrote:
>
>> Are there really multiple feature bits for this stuff? I'd like to
>> imagine that the entry code is all either Xen PV or native/PVH/PVHVM
>> -- i.e. I assumed that PVH works like native for all entries.


Almost. For PVH we will have a small stub to set up bootparams and such
but then we jump to startup_{32|64} code.


> I just reacted to Boris' statement:
>
> "We don't currently have a Xen-specific CPU feature. We could, in
> principle, add it but we can't replace all of current paravirt patching
> with a single feature since PVH guests use a subset of existing pv ops
> (and in the future it may become even more fine-grained)."

Actually, nevermind this --- I was thinking about APIC ops and they are
not pv ops.

Note though that there are other users of pv ops --- lguest and looks
like KVM (for one op) use them too.

-boris

2015-11-16 20:50:41

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On Mon, Nov 16, 2015 at 12:48 PM, Boris Ostrovsky
<[email protected]> wrote:
> On 11/16/2015 03:22 PM, Borislav Petkov wrote:
>>
>> On Mon, Nov 16, 2015 at 12:11:11PM -0800, Andy Lutomirski wrote:
>>
>>> Are there really multiple feature bits for this stuff? I'd like to
>>> imagine that the entry code is all either Xen PV or native/PVH/PVHVM
>>> -- i.e. I assumed that PVH works like native for all entries.
>
>
>
> Almost. For PVH we will have a small stub to set up bootparams and such but
> then we jump to startup_{32|64} code.
>
>
>> I just reacted to Boris' statement:
>>
>> "We don't currently have a Xen-specific CPU feature. We could, in
>> principle, add it but we can't replace all of current paravirt patching
>> with a single feature since PVH guests use a subset of existing pv ops
>> (and in the future it may become even more fine-grained)."
>
>
> Actually, nevermind this --- I was thinking about APIC ops and they are not
> pv ops.
>
> Note though that there are other users of pv ops --- lguest and looks like
> KVM (for one op) use them too.
>

Honestly, I think we should just delete lguest some time soon. And
KVM uses this stuff so lightly that we shouldn't need all of the pvop
stuff. (In fact, I'm slowly working on removing KVM_GUEST's
dependency on PARAVIRT.)

--Andy


--
Andy Lutomirski
AMA Capital Management, LLC

2015-11-16 21:00:29

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On Mon, Nov 16, 2015 at 12:50:19PM -0800, Andy Lutomirski wrote:
> (In fact, I'm slowly working on removing KVM_GUEST's dependency on
> PARAVIRT.)

Good! The hope that we'll be able to remove paravirt one day is != 0
again.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.

2015-11-16 21:04:49

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On Mon, Nov 16, 2015 at 1:03 PM, Konrad Rzeszutek Wilk
<[email protected]> wrote:
> On Mon, Nov 16, 2015 at 12:50:19PM -0800, Andy Lutomirski wrote:
>> On Mon, Nov 16, 2015 at 12:48 PM, Boris Ostrovsky
>> <[email protected]> wrote:
>> > On 11/16/2015 03:22 PM, Borislav Petkov wrote:
>> >>
>> >> On Mon, Nov 16, 2015 at 12:11:11PM -0800, Andy Lutomirski wrote:
>> >>
>> >>> Are there really multiple feature bits for this stuff? I'd like to
>> >>> imagine that the entry code is all either Xen PV or native/PVH/PVHVM
>> >>> -- i.e. I assumed that PVH works like native for all entries.
>> >
>> >
>> >
>> > Almost. For PVH we will have a small stub to set up bootparams and such but
>> > then we jump to startup_{32|64} code.
>> >
>> >
>> >> I just reacted to Boris' statement:
>> >>
>> >> "We don't currently have a Xen-specific CPU feature. We could, in
>> >> principle, add it but we can't replace all of current paravirt patching
>> >> with a single feature since PVH guests use a subset of existing pv ops
>> >> (and in the future it may become even more fine-grained)."
>> >
>> >
>> > Actually, nevermind this --- I was thinking about APIC ops and they are not
>> > pv ops.
>> >
>> > Note though that there are other users of pv ops --- lguest and looks like
>> > KVM (for one op) use them too.
>> >
>>
>> Honestly, I think we should just delete lguest some time soon. And
>> KVM uses this stuff so lightly that we shouldn't need all of the pvop
>> stuff. (In fact, I'm slowly working on removing KVM_GUEST's
>> dependency on PARAVIRT.)
>
> Even for the pvclock?
>
> (sorry for stealing this thread on this subject).

I don't think that pvclock uses any of the paravirt infrastructure.
It's just another clock source AFAIK.

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC

2015-11-16 21:55:51

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On 11/16/15 12:22, Borislav Petkov wrote:
>
> Huh, so what's wrong with a jump:
>
> jmp 1f
> swapgs
> 1:
>

What is the point of that jump?

>> If it would make you feel better, it could be X86_BUG_XENPV :-p
>
> That doesn't matter - I just don't want to open the flood gates on
> pseudo feature bits.
>
> hpa, what do you think?

Pseudo feature bits are fine, we already have plenty of them. They make
sense as they let us reuse a lot of infrastructure.

-hpa

2015-11-17 10:54:29

by Joao Martins

[permalink] [raw]
Subject: Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit



On 11/16/2015 09:04 PM, Andy Lutomirski wrote:
> On Mon, Nov 16, 2015 at 1:03 PM, Konrad Rzeszutek Wilk
> <[email protected]> wrote:
>> On Mon, Nov 16, 2015 at 12:50:19PM -0800, Andy Lutomirski wrote:
>>> On Mon, Nov 16, 2015 at 12:48 PM, Boris Ostrovsky
>>> <[email protected]> wrote:
>>>> On 11/16/2015 03:22 PM, Borislav Petkov wrote:
>>>>>
>>>>> On Mon, Nov 16, 2015 at 12:11:11PM -0800, Andy Lutomirski wrote:
>>>>>
>>>>>> Are there really multiple feature bits for this stuff? I'd like to
>>>>>> imagine that the entry code is all either Xen PV or native/PVH/PVHVM
>>>>>> -- i.e. I assumed that PVH works like native for all entries.
>>>>
>>>>
>>>>
>>>> Almost. For PVH we will have a small stub to set up bootparams and such but
>>>> then we jump to startup_{32|64} code.
>>>>
>>>>
>>>>> I just reacted to Boris' statement:
>>>>>
>>>>> "We don't currently have a Xen-specific CPU feature. We could, in
>>>>> principle, add it but we can't replace all of current paravirt patching
>>>>> with a single feature since PVH guests use a subset of existing pv ops
>>>>> (and in the future it may become even more fine-grained)."
>>>>
>>>>
>>>> Actually, nevermind this --- I was thinking about APIC ops and they are not
>>>> pv ops.
>>>>
>>>> Note though that there are other users of pv ops --- lguest and looks like
>>>> KVM (for one op) use them too.
>>>>
>>>
>>> Honestly, I think we should just delete lguest some time soon. And
>>> KVM uses this stuff so lightly that we shouldn't need all of the pvop
>>> stuff. (In fact, I'm slowly working on removing KVM_GUEST's
>>> dependency on PARAVIRT.)
>>
>> Even for the pvclock?
>>
>> (sorry for stealing this thread on this subject).
>
> I don't think that pvclock uses any of the paravirt infrastructure.
> It's just another clock source AFAIK.
>
Yeah, but pv_time_ops is still used on both Xen and KVM. Even though it looks
that on KVM some of it's used only when pvclock isn't marked as stable (i.e. no
PVCLOCK_TSC_STABLE_BIT).

Joao

> --Andy
>

2015-11-17 14:40:47

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On 11/16/2015 04:55 PM, H. Peter Anvin wrote:
> On 11/16/15 12:22, Borislav Petkov wrote:
>> Huh, so what's wrong with a jump:
>>
>> jmp 1f
>> swapgs
>> 1:
>>
> What is the point of that jump?
>
>>> If it would make you feel better, it could be X86_BUG_XENPV :-p
>> That doesn't matter - I just don't want to open the flood gates on
>> pseudo feature bits.
>>
>> hpa, what do you think?
> Pseudo feature bits are fine, we already have plenty of them. They make
> sense as they let us reuse a lot of infrastructure.


So how about something like this? And then I think we can remove
usergs_sysret32 and irq_enable_sysexit pv ops completely as noone will
use them (lguest doesn't set them)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 3eb572e..c43df7b 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -308,7 +308,8 @@ sysenter_past_esp:

movl %esp, %eax
call do_fast_syscall_32
- testl %eax, %eax
+ /* PV guests always use IRET path */
+ ALTERNATIVE "testl %eax, %eax", "jmp .Lsyscall_32_done",
X86_FEATURE_PV
jz .Lsyscall_32_done

/* Opportunistic SYSEXIT */
diff --git a/arch/x86/entry/entry_64_compat.S
b/arch/x86/entry/entry_64_compat.S
index c320183..2d1bc82 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -121,7 +121,7 @@ sysenter_flags_fixed:

movq %rsp, %rdi
call do_fast_syscall_32
- testl %eax, %eax
+ ALTERNATIVE "testl %eax, %eax", "jmp .Lsyscall_32_done",
X86_FEATURE_PV
jz .Lsyscall_32_done
jmp sysret32_from_system_call

@@ -200,7 +200,8 @@ ENTRY(entry_SYSCALL_compat)

movq %rsp, %rdi
call do_fast_syscall_32
- testl %eax, %eax
+ /* PV guests always use IRET path */
+ ALTERNATIVE "testl %eax, %eax", "jmp .Lsyscall_32_done",
X86_FEATURE_PV
jz .Lsyscall_32_done

/* Opportunistic SYSRET */
diff --git a/arch/x86/include/asm/cpufeature.h
b/arch/x86/include/asm/cpufeature.h
index e4f8010..723327b 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -216,6 +216,7 @@
#define X86_FEATURE_PAUSEFILTER ( 8*32+13) /* AMD filtered pause
intercept */
#define X86_FEATURE_PFTHRESHOLD ( 8*32+14) /* AMD pause filter
threshold */
#define X86_FEATURE_VMMCALL ( 8*32+15) /* Prefer vmmcall to vmcall */
+#define X86_FEATURE_PV ( 8*32+16) /* Paravirtual guest */


/* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */

2015-11-17 18:50:04

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On Nov 17, 2015 6:40 AM, "Boris Ostrovsky" <[email protected]> wrote:
>
> On 11/16/2015 04:55 PM, H. Peter Anvin wrote:
>>
>> On 11/16/15 12:22, Borislav Petkov wrote:
>>>
>>> Huh, so what's wrong with a jump:
>>>
>>> jmp 1f
>>> swapgs
>>> 1:
>>>
>> What is the point of that jump?
>>
>>>> If it would make you feel better, it could be X86_BUG_XENPV :-p
>>>
>>> That doesn't matter - I just don't want to open the flood gates on
>>> pseudo feature bits.
>>>
>>> hpa, what do you think?
>>
>> Pseudo feature bits are fine, we already have plenty of them. They make
>> sense as they let us reuse a lot of infrastructure.
>
>
>
> So how about something like this? And then I think we can remove usergs_sysret32 and irq_enable_sysexit pv ops completely as noone will use them (lguest doesn't set them)
>

Looks good to me. Does Xen have any sysexit/sysret32 equivalent to
return to 32-bit user mode? If so, it could be worth trying to wire
it up by patching the jz instead of the test instruction.

Also, I'd prefer X86_FEATURE_XENPV. IMO "PV" means too many things to
too many people.

--Andy

2015-11-17 19:12:49

by Andrew Cooper

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On 17/11/15 18:49, Andy Lutomirski wrote:
> On Nov 17, 2015 6:40 AM, "Boris Ostrovsky" <[email protected]> wrote:
>> On 11/16/2015 04:55 PM, H. Peter Anvin wrote:
>>> On 11/16/15 12:22, Borislav Petkov wrote:
>>>> Huh, so what's wrong with a jump:
>>>>
>>>> jmp 1f
>>>> swapgs
>>>> 1:
>>>>
>>> What is the point of that jump?
>>>
>>>>> If it would make you feel better, it could be X86_BUG_XENPV :-p
>>>> That doesn't matter - I just don't want to open the flood gates on
>>>> pseudo feature bits.
>>>>
>>>> hpa, what do you think?
>>> Pseudo feature bits are fine, we already have plenty of them. They make
>>> sense as they let us reuse a lot of infrastructure.
>>
>>
>> So how about something like this? And then I think we can remove usergs_sysret32 and irq_enable_sysexit pv ops completely as noone will use them (lguest doesn't set them)
>>
> Looks good to me. Does Xen have any sysexit/sysret32 equivalent to
> return to 32-bit user mode? If so, it could be worth trying to wire
> it up by patching the jz instead of the test instruction.

>From the guests point of view, there is only hypercall_iret.

>
> Also, I'd prefer X86_FEATURE_XENPV. IMO "PV" means too many things to
> too many people.

I agree - PV on its own is too generic.

An alternative might be X86_FEATURE_XEN_PV_GUEST which is very clear an
unambiguous, although rather longer.

~Andrew

2015-11-17 19:16:33

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On Tue, Nov 17, 2015 at 11:12 AM, Andrew Cooper
<[email protected]> wrote:
> On 17/11/15 18:49, Andy Lutomirski wrote:
>> On Nov 17, 2015 6:40 AM, "Boris Ostrovsky" <[email protected]> wrote:
>>> On 11/16/2015 04:55 PM, H. Peter Anvin wrote:
>>>> On 11/16/15 12:22, Borislav Petkov wrote:
>>>>> Huh, so what's wrong with a jump:
>>>>>
>>>>> jmp 1f
>>>>> swapgs
>>>>> 1:
>>>>>
>>>> What is the point of that jump?
>>>>
>>>>>> If it would make you feel better, it could be X86_BUG_XENPV :-p
>>>>> That doesn't matter - I just don't want to open the flood gates on
>>>>> pseudo feature bits.
>>>>>
>>>>> hpa, what do you think?
>>>> Pseudo feature bits are fine, we already have plenty of them. They make
>>>> sense as they let us reuse a lot of infrastructure.
>>>
>>>
>>> So how about something like this? And then I think we can remove usergs_sysret32 and irq_enable_sysexit pv ops completely as noone will use them (lguest doesn't set them)
>>>
>> Looks good to me. Does Xen have any sysexit/sysret32 equivalent to
>> return to 32-bit user mode? If so, it could be worth trying to wire
>> it up by patching the jz instead of the test instruction.
>
> From the guests point of view, there is only hypercall_iret.

Doesn't hypercall_iret have flags that ask for different behavior,
though? (VG_syscall or whatever for the 64-bit case?)

>
>>
>> Also, I'd prefer X86_FEATURE_XENPV. IMO "PV" means too many things to
>> too many people.
>
> I agree - PV on its own is too generic.
>
> An alternative might be X86_FEATURE_XEN_PV_GUEST which is very clear an
> unambiguous, although rather longer.

Works for me, too, although seeing "xen_pv_host" in the Linux cpu
features would be very strange indeed :)

--Andy

2015-11-17 19:21:27

by Borislav Petkov

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On Tue, Nov 17, 2015 at 11:16:11AM -0800, Andy Lutomirski wrote:
> Works for me, too, although seeing "xen_pv_host" in the Linux cpu
> features would be very strange indeed :)

That feature would be, of course, *not* in /proc/cpuinfo.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.

2015-11-17 19:29:25

by Andrew Cooper

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On 17/11/15 19:16, Andy Lutomirski wrote:
> On Tue, Nov 17, 2015 at 11:12 AM, Andrew Cooper
> <[email protected]> wrote:
>> On 17/11/15 18:49, Andy Lutomirski wrote:
>>> On Nov 17, 2015 6:40 AM, "Boris Ostrovsky" <[email protected]> wrote:
>>>> On 11/16/2015 04:55 PM, H. Peter Anvin wrote:
>>>>> On 11/16/15 12:22, Borislav Petkov wrote:
>>>>>> Huh, so what's wrong with a jump:
>>>>>>
>>>>>> jmp 1f
>>>>>> swapgs
>>>>>> 1:
>>>>>>
>>>>> What is the point of that jump?
>>>>>
>>>>>>> If it would make you feel better, it could be X86_BUG_XENPV :-p
>>>>>> That doesn't matter - I just don't want to open the flood gates on
>>>>>> pseudo feature bits.
>>>>>>
>>>>>> hpa, what do you think?
>>>>> Pseudo feature bits are fine, we already have plenty of them. They make
>>>>> sense as they let us reuse a lot of infrastructure.
>>>>
>>>> So how about something like this? And then I think we can remove usergs_sysret32 and irq_enable_sysexit pv ops completely as noone will use them (lguest doesn't set them)
>>>>
>>> Looks good to me. Does Xen have any sysexit/sysret32 equivalent to
>>> return to 32-bit user mode? If so, it could be worth trying to wire
>>> it up by patching the jz instead of the test instruction.
>> From the guests point of view, there is only hypercall_iret.
> Doesn't hypercall_iret have flags that ask for different behavior,
> though? (VG_syscall or whatever for the 64-bit case?)

The one and only flag is VGCF_in_syscall

Xen has its own logic for choosing between sysretq/sysretl if
VGCF_in_syscall, but will end up on an iret path in all other
circumstances.

There is definitely some room for optimisation here, but in in some
copious free time before that, I want to see about brining most of our
asm code up into C. The vast majority of it doesn't need to be written
in asm.

>
>>> Also, I'd prefer X86_FEATURE_XENPV. IMO "PV" means too many things to
>>> too many people.
>> I agree - PV on its own is too generic.
>>
>> An alternative might be X86_FEATURE_XEN_PV_GUEST which is very clear an
>> unambiguous, although rather longer.
> Works for me, too, although seeing "xen_pv_host" in the Linux cpu
> features would be very strange indeed :)

This makes me wonders whether the `insmod xen` project has managed to
gain any traction ;)

~Andrew

2015-11-17 19:37:15

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On Tue, Nov 17, 2015 at 11:29 AM, Andrew Cooper
<[email protected]> wrote:
> On 17/11/15 19:16, Andy Lutomirski wrote:
>> On Tue, Nov 17, 2015 at 11:12 AM, Andrew Cooper
>> <[email protected]> wrote:
>>> On 17/11/15 18:49, Andy Lutomirski wrote:
>>>> On Nov 17, 2015 6:40 AM, "Boris Ostrovsky" <[email protected]> wrote:
>>>>> On 11/16/2015 04:55 PM, H. Peter Anvin wrote:
>>>>>> On 11/16/15 12:22, Borislav Petkov wrote:
>>>>>>> Huh, so what's wrong with a jump:
>>>>>>>
>>>>>>> jmp 1f
>>>>>>> swapgs
>>>>>>> 1:
>>>>>>>
>>>>>> What is the point of that jump?
>>>>>>
>>>>>>>> If it would make you feel better, it could be X86_BUG_XENPV :-p
>>>>>>> That doesn't matter - I just don't want to open the flood gates on
>>>>>>> pseudo feature bits.
>>>>>>>
>>>>>>> hpa, what do you think?
>>>>>> Pseudo feature bits are fine, we already have plenty of them. They make
>>>>>> sense as they let us reuse a lot of infrastructure.
>>>>>
>>>>> So how about something like this? And then I think we can remove usergs_sysret32 and irq_enable_sysexit pv ops completely as noone will use them (lguest doesn't set them)
>>>>>
>>>> Looks good to me. Does Xen have any sysexit/sysret32 equivalent to
>>>> return to 32-bit user mode? If so, it could be worth trying to wire
>>>> it up by patching the jz instead of the test instruction.
>>> From the guests point of view, there is only hypercall_iret.
>> Doesn't hypercall_iret have flags that ask for different behavior,
>> though? (VG_syscall or whatever for the 64-bit case?)
>
> The one and only flag is VGCF_in_syscall
>
> Xen has its own logic for choosing between sysretq/sysretl if
> VGCF_in_syscall, but will end up on an iret path in all other
> circumstances.

In that case, a nicer version of this patch could preserve the new
sysret-or-iret decision (on 64-bit kernels in the compat path) and use
it to set VGCF_in_syscall. This might work considerably better now
than it ever would have, since Linux now tries to use sysret32 on
*all* 64-bit CPUs, and the way it's structured for compat is just a
flag (the testl %eax,%eax thing) that indicates that sysret32 is okay.

Anyway, that can be a followup patch.

--Andy

2015-11-17 19:37:28

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On 11/17/2015 02:16 PM, Andy Lutomirski wrote:
>>> Looks good to me. Does Xen have any sysexit/sysret32 equivalent to
>>> return to 32-bit user mode? If so, it could be worth trying to wire
>>> it up by patching the jz instead of the test instruction.

We can actually make patching a little bit more efficient by replacing
the test instruction with 'xor %eax,%eax'. That way we won't need any
'nop's.


-boris

2015-11-17 19:38:51

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen/x86: Adjust stack pointer in xen_sysexit

On 11/17/2015 02:37 PM, Boris Ostrovsky wrote:
> On 11/17/2015 02:16 PM, Andy Lutomirski wrote:
>>>> Looks good to me. Does Xen have any sysexit/sysret32 equivalent to
>>>> return to 32-bit user mode? If so, it could be worth trying to wire
>>>> it up by patching the jz instead of the test instruction.
>
> We can actually make patching a little bit more efficient by replacing
> the test instruction with 'xor %eax,%eax'. That way we won't need any
> 'nop's.

Nevermind that, we are looking at flags, not register.

-boris