2006-05-09 08:54:12

by Chris Wright

[permalink] [raw]
Subject: [RFC PATCH 17/35] Segment register changes for Xen

1. We clear FS/GS before changing TLS entries and switching LDT, as
otherwise the hypervisor will fail to restore thread-local values on
return to the guest kernel and we take a slow exception path.

2. We allow for the fact that the guest kernel may not run in ring 0.
This requires some abstraction in a few places when setting %cs or
checking privilege level (user vs kernel).

Signed-off-by: Ian Pratt <[email protected]>
Signed-off-by: Christian Limpach <[email protected]>
Signed-off-by: Chris Wright <[email protected]>
---
arch/i386/kernel/process.c | 4 +++-
arch/i386/mm/fault.c | 8 +++++---
include/asm-i386/mach-default/mach_segment.h | 8 ++++++++
include/asm-i386/mach-default/mach_system.h | 2 ++
include/asm-i386/mach-xen/mach_segment.h | 9 +++++++++
include/asm-i386/mach-xen/mach_system.h | 2 ++
include/asm-i386/mach-xen/setup_arch_post.h | 2 ++
include/asm-i386/ptrace.h | 6 ++++--
include/asm-i386/segment.h | 2 ++
9 files changed, 37 insertions(+), 6 deletions(-)

--- linus-2.6.orig/arch/i386/kernel/process.c
+++ linus-2.6/arch/i386/kernel/process.c
@@ -347,7 +347,7 @@ int kernel_thread(int (*fn)(void *), voi
regs.xes = __USER_DS;
regs.orig_eax = -1;
regs.eip = (unsigned long) kernel_thread_helper;
- regs.xcs = __KERNEL_CS;
+ regs.xcs = get_kernel_cs();
regs.eflags = X86_EFLAGS_IF | X86_EFLAGS_SF | X86_EFLAGS_PF | 0x2;

/* Ok, create the new process.. */
@@ -647,6 +647,8 @@ struct task_struct fastcall * __switch_t
*/
savesegment(fs, prev->fs);
savesegment(gs, prev->gs);
+ clearsegment(fs);
+ clearsegment(gs);

/*
* Load the per-thread Thread-Local Storage descriptor.
--- linus-2.6.orig/arch/i386/mm/fault.c
+++ linus-2.6/arch/i386/mm/fault.c
@@ -28,6 +28,8 @@
#include <asm/desc.h>
#include <asm/kdebug.h>

+#include <mach_segment.h>
+
extern void die(const char *,struct pt_regs *,long);

/*
@@ -78,14 +80,14 @@ static inline unsigned long get_segment_
u32 seg_ar, seg_limit, base, *desc;

/* The standard kernel/user address space limit. */
- *eip_limit = (seg & 3) ? USER_DS.seg : KERNEL_DS.seg;
+ *eip_limit = (seg & USER_MODE_MASK) ? USER_DS.seg : KERNEL_DS.seg;

/* Unlikely, but must come before segment checks. */
if (unlikely((regs->eflags & VM_MASK) != 0))
return eip + (seg << 4);

/* By far the most common cases. */
- if (likely(seg == __USER_CS || seg == __KERNEL_CS))
+ if (likely(seg == __USER_CS || seg == get_kernel_cs()))
return eip;

/* Check the segment exists, is within the current LDT/GDT size,
@@ -400,7 +402,7 @@ good_area:
switch (error_code & 3) {
default: /* 3: write, present */
#ifdef TEST_VERIFY_AREA
- if (regs->cs == KERNEL_CS)
+ if (regs->cs == get_kernel_cs())
printk("WP fault at %08lx\n", regs->eip);
#endif
/* fall through */
--- linus-2.6.orig/include/asm-i386/mach-default/mach_system.h
+++ linus-2.6/include/asm-i386/mach-default/mach_system.h
@@ -1,6 +1,8 @@
#ifndef __ASM_MACH_SYSTEM_H
#define __ASM_MACH_SYSTEM_H

+#define clearsegment(seg)
+
/* interrupt control.. */
#define local_save_flags(x) do { typecheck(unsigned long,x); __asm__ __volatile__("pushfl ; popl %0":"=g" (x): /* no input */); } while (0)
#define local_irq_restore(x) do { typecheck(unsigned long,x); __asm__ __volatile__("pushl %0 ; popfl": /* no output */ :"g" (x):"memory", "cc"); } while (0)
--- linus-2.6.orig/include/asm-i386/mach-xen/mach_system.h
+++ linus-2.6/include/asm-i386/mach-xen/mach_system.h
@@ -5,6 +5,8 @@

#include <asm/hypervisor.h>

+#define clearsegment(seg) loadsegment(seg, 0)
+
#ifdef CONFIG_SMP
#define __vcpu_id smp_processor_id()
#else
--- linus-2.6.orig/include/asm-i386/mach-xen/setup_arch_post.h
+++ linus-2.6/include/asm-i386/mach-xen/setup_arch_post.h
@@ -26,6 +26,8 @@ static void __init machine_specific_arch
(struct shared_info *)__va(xen_start_info->shared_info);
memset(empty_zero_page, 0, sizeof(empty_zero_page));

+ HYPERVISOR_vm_assist(VMASST_CMD_enable, VMASST_TYPE_4gb_segments);
+
HYPERVISOR_set_callbacks(
__KERNEL_CS, (unsigned long)hypervisor_callback,
__KERNEL_CS, (unsigned long)failsafe_callback);
--- linus-2.6.orig/include/asm-i386/ptrace.h
+++ linus-2.6/include/asm-i386/ptrace.h
@@ -1,6 +1,8 @@
#ifndef _I386_PTRACE_H
#define _I386_PTRACE_H

+#include <mach_segment.h>
+
#define EBX 0
#define ECX 1
#define EDX 2
@@ -73,11 +75,11 @@ extern void send_sigtrap(struct task_str
*/
static inline int user_mode(struct pt_regs *regs)
{
- return (regs->xcs & 3) != 0;
+ return (regs->xcs & USER_MODE_MASK) != 0;
}
static inline int user_mode_vm(struct pt_regs *regs)
{
- return ((regs->xcs & 3) | (regs->eflags & VM_MASK)) != 0;
+ return ((regs->xcs & USER_MODE_MASK) | (regs->eflags & VM_MASK)) != 0;
}
#define instruction_pointer(regs) ((regs)->eip)
#if defined(CONFIG_SMP) && defined(CONFIG_FRAME_POINTER)
--- linus-2.6.orig/include/asm-i386/segment.h
+++ linus-2.6/include/asm-i386/segment.h
@@ -1,6 +1,8 @@
#ifndef _ASM_SEGMENT_H
#define _ASM_SEGMENT_H

+#include <mach_segment.h>
+
/*
* The layout of the per-CPU GDT under Linux:
*
--- /dev/null
+++ linus-2.6/include/asm-i386/mach-default/mach_segment.h
@@ -0,0 +1,8 @@
+#ifndef __ASM_MACH_SEGMENT_H
+#define __ASM_MACH_SEGMENT_H
+
+#define USER_MODE_MASK 3
+
+#define get_kernel_cs() __KERNEL_CS
+
+#endif /* __ASM_MACH_SEGMENT_H */
--- /dev/null
+++ linus-2.6/include/asm-i386/mach-xen/mach_segment.h
@@ -0,0 +1,9 @@
+#ifndef __ASM_MACH_SEGMENT_H
+#define __ASM_MACH_SEGMENT_H
+
+#define USER_MODE_MASK 2
+
+#define get_kernel_cs() \
+ (__KERNEL_CS + (xen_feature(XENFEAT_supervisor_mode_kernel) ? 0 : 1))
+
+#endif /* __ASM_MACH_SEGMENT_H */

--


2006-05-09 16:45:23

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC PATCH 17/35] Segment register changes for Xen

/* fall through */
> --- linus-2.6.orig/include/asm-i386/mach-default/mach_system.h
> +++ linus-2.6/include/asm-i386/mach-default/mach_system.h
> @@ -1,6 +1,8 @@
> #ifndef __ASM_MACH_SYSTEM_H
> #define __ASM_MACH_SYSTEM_H
>
> +#define clearsegment(seg)

I don't think you can give such a specific hook such a generic name.

I would just add an ifdef around the real user with a comment.

-Andi

2006-05-10 20:00:39

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC PATCH 17/35] Segment register changes for Xen

Hi!

> --- linus-2.6.orig/include/asm-i386/mach-default/mach_system.h
> +++ linus-2.6/include/asm-i386/mach-default/mach_system.h
> @@ -1,6 +1,8 @@
> #ifndef __ASM_MACH_SYSTEM_H
> #define __ASM_MACH_SYSTEM_H
>
> +#define clearsegment(seg)

do {} while (0), please.

Pavel
--
Thanks for all the (sleeping) penguins.

2006-05-10 20:09:29

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC PATCH 17/35] Segment register changes for Xen

On Tuesday 09 May 2006 09:16, Pavel Machek wrote:
> Hi!
>
> > --- linus-2.6.orig/include/asm-i386/mach-default/mach_system.h
> > +++ linus-2.6/include/asm-i386/mach-default/mach_system.h
> > @@ -1,6 +1,8 @@
> > #ifndef __ASM_MACH_SYSTEM_H
> > #define __ASM_MACH_SYSTEM_H
> >
> > +#define clearsegment(seg)
>
> do {} while (0), please.

It's not needed. Think about it.

-Andi

2006-05-10 20:30:56

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC PATCH 17/35] Segment register changes for Xen

On St 10-05-06 22:09:04, Andi Kleen wrote:
> On Tuesday 09 May 2006 09:16, Pavel Machek wrote:
> > Hi!
> >
> > > --- linus-2.6.orig/include/asm-i386/mach-default/mach_system.h
> > > +++ linus-2.6/include/asm-i386/mach-default/mach_system.h
> > > @@ -1,6 +1,8 @@
> > > #ifndef __ASM_MACH_SYSTEM_H
> > > #define __ASM_MACH_SYSTEM_H
> > >
> > > +#define clearsegment(seg)
> >
> > do {} while (0), please.
>
> It's not needed. Think about it.

Really? If someone does

if (something)
clearsegment(seg)
somethingelse();

... he'll get very confusing behaviour instead of compile error.

Okay, that's weaker argument than expected...

Also clearsegment(x) clearsegment(y); will compile when it should not.

Also clearsegment(i++) will behave strangely. So perhaps

#define clearsegment(seg) do { seg; } while (0)

is best variant?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-05-11 10:34:40

by Avi Kivity

[permalink] [raw]
Subject: Re: [RFC PATCH 17/35] Segment register changes for Xen

Pavel Machek wrote:
> Really? If someone does
>
> if (something)
> clearsegment(seg)
> somethingelse();
>
> ... he'll get very confusing behaviour instead of compile error.
>
> Okay, that's weaker argument than expected...
>
> Also clearsegment(x) clearsegment(y); will compile when it should not.
>
> Also clearsegment(i++) will behave strangely. So perhaps
>
> #define clearsegment(seg) do { seg; } while (0)
>

static inline void clearsegment(int seg) {}

?

--
error compiling committee.c: too many arguments to function

2006-05-11 10:41:33

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC PATCH 17/35] Segment register changes for Xen

On Thursday 11 May 2006 12:34, Avi Kivity wrote:
> Pavel Machek wrote:
> > Really? If someone does
> >
> > if (something)
> > clearsegment(seg)
> > somethingelse();
> >
> > ... he'll get very confusing behaviour instead of compile error.
> >
> > Okay, that's weaker argument than expected...
> >
> > Also clearsegment(x) clearsegment(y); will compile when it should not.
> >
> > Also clearsegment(i++) will behave strangely. So perhaps
> >
> > #define clearsegment(seg) do { seg; } while (0)
> >
>
> static inline void clearsegment(int seg) {}


It's all mood because the complete function is wrongly named
and probably should just go.

-Andi

2006-05-12 00:29:18

by Rusty Russell

[permalink] [raw]
Subject: Re: [Xen-devel] Re: [RFC PATCH 17/35] Segment register changes for Xen

On Tue, 2006-05-09 at 07:16 +0000, Pavel Machek wrote:
> Hi!
>
> > --- linus-2.6.orig/include/asm-i386/mach-default/mach_system.h
> > +++ linus-2.6/include/asm-i386/mach-default/mach_system.h
> > @@ -1,6 +1,8 @@
> > #ifndef __ASM_MACH_SYSTEM_H
> > #define __ASM_MACH_SYSTEM_H
> >
> > +#define clearsegment(seg)
>
> do {} while (0), please.

It's off-topic, but: why?

Rusty.
--
ccontrol: http://ccontrol.ozlabs.org

2006-05-18 20:23:17

by Zachary Amsden

[permalink] [raw]
Subject: Re: [RFC PATCH 17/35] Segment register changes for Xen

Chris Wright wrote:
> 1. We clear FS/GS before changing TLS entries and switching LDT, as
> otherwise the hypervisor will fail to restore thread-local values on
> return to the guest kernel and we take a slow exception path.

> @@ -647,6 +647,8 @@ struct task_struct fastcall * __switch_t
> */
> savesegment(fs, prev->fs);
> savesegment(gs, prev->gs);
> + clearsegment(fs);
> + clearsegment(gs);
>

Really not needed. Think about it. You can even speed up Xen. I'm
glad the native operation here is a nop, but it should be
hypervisor_clearsegment or xen_clearsegment if you really want to keep it.

Zach

2006-05-18 20:46:40

by Keir Fraser

[permalink] [raw]
Subject: Re: [RFC PATCH 17/35] Segment register changes for Xen


On 18 May 2006, at 21:20, Zachary Amsden wrote:

>> 1. We clear FS/GS before changing TLS entries and switching LDT, as
>> otherwise the hypervisor will fail to restore thread-local values on
>> return to the guest kernel and we take a slow exception path.
>
>> @@ -647,6 +647,8 @@ struct task_struct fastcall * __switch_t
>> */
>> savesegment(fs, prev->fs);
>> savesegment(gs, prev->gs);
>> + clearsegment(fs);
>> + clearsegment(gs);
>>
>
> Really not needed. Think about it. You can even speed up Xen. I'm
> glad the native operation here is a nop, but it should be
> hypervisor_clearsegment or xen_clearsegment if you really want to keep
> it.

Maybe you could explain in more detail? It's not needed for
correctness, but it is faster for us to clear at that point.

-- Keir

2006-05-18 21:23:10

by Chris Wright

[permalink] [raw]
Subject: Re: [RFC PATCH 17/35] Segment register changes for Xen

* Zachary Amsden ([email protected]) wrote:
> Chris Wright wrote:
> >1. We clear FS/GS before changing TLS entries and switching LDT, as
> >otherwise the hypervisor will fail to restore thread-local values on
> >return to the guest kernel and we take a slow exception path.
>
> >@@ -647,6 +647,8 @@ struct task_struct fastcall * __switch_t
> > */
> > savesegment(fs, prev->fs);
> > savesegment(gs, prev->gs);
> >+ clearsegment(fs);
> >+ clearsegment(gs);
> >
>
> Really not needed. Think about it. You can even speed up Xen.

Please describe how. I'm afraid I'm missing your point, as I don't see
the improvement.

> I'm glad the native operation here is a nop, but it should be
> hypervisor_clearsegment or xen_clearsegment if you really want to keep it.

Yeah, Andi had similar naming concern.