2006-08-21 09:53:07

by Magnus Damm

[permalink] [raw]
Subject: [PATCH][RFC] x86_64: Reload CS when startup_64 is used.

x86_64: Reload CS when startup_64 is used.

The current x86_64 startup code never reloads CS during the early boot process
if the 64-bit function startup_64 is used as entry point. The 32-bit entry
point startup_32 does the right thing and reloads CS, and this is what most
people are using if they use bzImage.

This patch fixes the case when the Linux kernel is booted into using kexec
under Xen. The Xen hypervisor is using large CS values which makes the x86_64
kernel fail - but only if vmlinux is booted, bzImage works well because it
is using the 32-bit entry point.

The main question is if we require that the boot loader should setup CS
to some certain offset to be able to boot the kernel. The sane solution IMO
should be that the kernel requires that the loaded descriptors are correct,
but that the exact offset within the GDT the boot loader is using should not
matter. This is the way the i386 boot works if I understand things correctly.

Signed-off-by: Magnus Damm <[email protected]>
---

Applies on top of 2.6.18-rc4

head.S | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

--- 0001/arch/x86_64/kernel/head.S
+++ work/arch/x86_64/kernel/head.S 2006-08-21 18:22:57.000000000 +0900
@@ -165,6 +165,25 @@ startup_64:
*/
lgdt cpu_gdt_descr

+ /* Reload CS with a value that is within our GDT. We need to do this
+ * if we were loaded by a 64 bit bootloader that happened to use a
+ * CS that is larger than the GDT limit. This is true if we came here
+ * from kexec running under Xen.
+ */
+ movq %rsp, %rdx
+ movq $__KERNEL_DS, %rax
+ pushq %rax /* SS */
+ pushq %rdx /* RSP */
+ movq $__KERNEL_CS, %rax
+ movq $cs_reloaded, %rdx
+ pushq %rax /* CS */
+ pushq %rdx /* RIP */
+ lretq
+
+cs_reloaded:
+ /* Setup the boot time stack again */
+ movq init_rsp(%rip),%rsp
+
/*
* Setup up a dummy PDA. this is just for some early bootup code
* that does in_interrupt()


2006-08-21 10:19:27

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH][RFC] x86_64: Reload CS when startup_64 is used.


>
> + /* Reload CS with a value that is within our GDT. We need to do this
> + * if we were loaded by a 64 bit bootloader that happened to use a
> + * CS that is larger than the GDT limit. This is true if we came here
> + * from kexec running under Xen.
> + */
> + movq %rsp, %rdx
> + movq $__KERNEL_DS, %rax
> + pushq %rax /* SS */
> + pushq %rdx /* RSP */
> + movq $__KERNEL_CS, %rax
> + movq $cs_reloaded, %rdx
> + pushq %rax /* CS */
> + pushq %rdx /* RIP */
> + lretq

Can't you just use a normal far jump? That might be simpler.

-Andi

2006-08-21 13:29:39

by Magnus Damm

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH][RFC] x86_64: Reload CS when startup_64 is used.

On 8/21/06, Andi Kleen <[email protected]> wrote:
>
> >
> > + /* Reload CS with a value that is within our GDT. We need to do this
> > + * if we were loaded by a 64 bit bootloader that happened to use a
> > + * CS that is larger than the GDT limit. This is true if we came here
> > + * from kexec running under Xen.
> > + */
> > + movq %rsp, %rdx
> > + movq $__KERNEL_DS, %rax
> > + pushq %rax /* SS */
> > + pushq %rdx /* RSP */
> > + movq $__KERNEL_CS, %rax
> > + movq $cs_reloaded, %rdx
> > + pushq %rax /* CS */
> > + pushq %rdx /* RIP */
> > + lretq
>
> Can't you just use a normal far jump? That might be simpler.

I couldn't find a far jump that took a 64-bit address to jump to. But
I guess that the kernel will be loaded in the lowest 4G regardless so
I guess 32-bit pointers are ok, right? That will make it simpler for
sure.

What do you think about reloading CS? Is it the right thing to do, or
is it correct as it is today where we depend on that CS == _KERNEL_CS?
I need to fix kexec-tools regardless, but maybe it is a good idea to
make the 64-bit kernel boot a bit robust too.

Thanks,

/ magnus

2006-08-21 14:17:00

by Andi Kleen

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH][RFC] x86_64: Reload CS when startup_64 is used.

On Monday 21 August 2006 15:29, Magnus Damm wrote:
> On 8/21/06, Andi Kleen <[email protected]> wrote:
> >
> > >
> > > + /* Reload CS with a value that is within our GDT. We need to do this
> > > + * if we were loaded by a 64 bit bootloader that happened to use a
> > > + * CS that is larger than the GDT limit. This is true if we came here
> > > + * from kexec running under Xen.
> > > + */
> > > + movq %rsp, %rdx
> > > + movq $__KERNEL_DS, %rax
> > > + pushq %rax /* SS */
> > > + pushq %rdx /* RSP */
> > > + movq $__KERNEL_CS, %rax
> > > + movq $cs_reloaded, %rdx
> > > + pushq %rax /* CS */
> > > + pushq %rdx /* RIP */
> > > + lretq
> >
> > Can't you just use a normal far jump? That might be simpler.
>
> I couldn't find a far jump that took a 64-bit address to jump to. But
> I guess that the kernel will be loaded in the lowest 4G regardless so
> I guess 32-bit pointers are ok, right? That will make it simpler for
> sure.

Yes, that code always runs in the identity mapping and at 2MB.

>
> What do you think about reloading CS? Is it the right thing to do, or
> is it correct as it is today where we depend on that CS == _KERNEL_CS?
> I need to fix kexec-tools regardless, but maybe it is a good idea to
> make the 64-bit kernel boot a bit robust too.

Reloading CS is ok, although longer term I plan to switch the kernel
to uncompress already in 64bit. Then you would need the same GDT anyways.

-Andi


2006-08-21 14:17:51

by Vivek Goyal

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH][RFC] x86_64: Reload CS when startup_64 is used.

On Mon, Aug 21, 2006 at 10:29:36PM +0900, Magnus Damm wrote:
> On 8/21/06, Andi Kleen <[email protected]> wrote:
> >
> >>
> >> + /* Reload CS with a value that is within our GDT. We need to do
> >this
> >> + * if we were loaded by a 64 bit bootloader that happened to use a
> >> + * CS that is larger than the GDT limit. This is true if we came
> >here
> >> + * from kexec running under Xen.
> >> + */
> >> + movq %rsp, %rdx
> >> + movq $__KERNEL_DS, %rax
> >> + pushq %rax /* SS */
> >> + pushq %rdx /* RSP */
> >> + movq $__KERNEL_CS, %rax
> >> + movq $cs_reloaded, %rdx
> >> + pushq %rax /* CS */
> >> + pushq %rdx /* RIP */
> >> + lretq
> >
> >Can't you just use a normal far jump? That might be simpler.
>
> I couldn't find a far jump that took a 64-bit address to jump to. But
> I guess that the kernel will be loaded in the lowest 4G regardless so
> I guess 32-bit pointers are ok, right? That will make it simpler for
> sure.
>

Given the idea of relocatable kernel is floating around I would prefer if
we are not bounded by the restriction of loading a kernel in lowest 4G.

Thanks
Vivek

2006-08-21 14:24:19

by Andi Kleen

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH][RFC] x86_64: Reload CS when startup_64 is used.


> Given the idea of relocatable kernel is floating around I would prefer if
> we are not bounded by the restriction of loading a kernel in lowest 4G.

There is already other code that requires this. In fact i don't think it can
be above 40MB currently.

-Andi

2006-08-21 14:47:30

by Vivek Goyal

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH][RFC] x86_64: Reload CS when startup_64 is used.

On Mon, Aug 21, 2006 at 04:24:10PM +0200, Andi Kleen wrote:
>
> > Given the idea of relocatable kernel is floating around I would prefer if
> > we are not bounded by the restriction of loading a kernel in lowest 4G.
>
> There is already other code that requires this. In fact i don't think it can
> be above 40MB currently.
>

But I think Eric's prototype patches for relocatable kernel do get over
this limitation (Hope I understood the code right). Assuming that relocatable
kernel patches will be merged down the line, it would be nice not to be
bound by 4G limitation.

Thanks
Vivek

2006-08-21 15:04:11

by Andi Kleen

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH][RFC] x86_64: Reload CS when startup_64 is used.

On Monday 21 August 2006 16:46, Vivek Goyal wrote:
> On Mon, Aug 21, 2006 at 04:24:10PM +0200, Andi Kleen wrote:
> >
> > > Given the idea of relocatable kernel is floating around I would prefer if
> > > we are not bounded by the restriction of loading a kernel in lowest 4G.
> >
> > There is already other code that requires this. In fact i don't think it can
> > be above 40MB currently.
> >
>
> But I think Eric's prototype patches for relocatable kernel do get over
> this limitation (Hope I understood the code right). Assuming that relocatable
> kernel patches will be merged down the line, it would be nice not to be
> bound by 4G limitation.

He may have fixed the 40MB issue, but I very much doubt he changed the 2GB
limitation because that would be a major change.

-Andi

2006-08-21 20:02:59

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH][RFC] x86_64: Reload CS when startup_64 is used.

Andi Kleen <[email protected]> writes:

> On Monday 21 August 2006 16:46, Vivek Goyal wrote:
>> On Mon, Aug 21, 2006 at 04:24:10PM +0200, Andi Kleen wrote:
>> >
>> > > Given the idea of relocatable kernel is floating around I would prefer if
>> > > we are not bounded by the restriction of loading a kernel in lowest 4G.
>> >
>> > There is already other code that requires this. In fact i don't think it can
>> > be above 40MB currently.
>> >
>>
>> But I think Eric's prototype patches for relocatable kernel do get over
>> this limitation (Hope I understood the code right). Assuming that relocatable
>> kernel patches will be merged down the line, it would be nice not to be
>> bound by 4G limitation.
>
> He may have fixed the 40MB issue, but I very much doubt he changed the 2GB
> limitation because that would be a major change.

I'm not certain I caught everything but as far as I know I did.
Part of that was by having the code run at a fixed virtual address so
we still live in the last 2GB of the virtual address space.


Eric

2006-08-21 20:10:16

by Andi Kleen

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH][RFC] x86_64: Reload CS when startup_64 is used.


> I'm not certain I caught everything but as far as I know I did.
> Part of that was by having the code run at a fixed virtual address so
> we still live in the last 2GB of the virtual address space.

You changed the -2GB (or rather -40MB unpatched) mapping to not necessarily
be linear? There are a couple of assumptions that it is, including at boot up
(it doubles as the 1:1 mapping then) and in change_page_attr() and in suspend/resume.

-Andi

2006-08-21 21:00:53

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH][RFC] x86_64: Reload CS when startup_64 is used.

Andi Kleen <[email protected]> writes:

>> I'm not certain I caught everything but as far as I know I did.
>> Part of that was by having the code run at a fixed virtual address so
>> we still live in the last 2GB of the virtual address space.
>
> You changed the -2GB (or rather -40MB unpatched) mapping to not necessarily
> be linear?

I left it linear but I removed assumptions about which physical address it
goes to.

> There are a couple of assumptions that it is, including at boot up
> (it doubles as the 1:1 mapping then) and in change_page_attr() and in
> suspend/resume.

Yes. Those all sound familiar.

I removed the doubling as the 1:1 physical mapping.

Eric

2006-08-21 21:03:32

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH][RFC] x86_64: Reload CS when startup_64 is used.

Magnus Damm <[email protected]> writes:

> x86_64: Reload CS when startup_64 is used.
>
> The current x86_64 startup code never reloads CS during the early boot process
> if the 64-bit function startup_64 is used as entry point. The 32-bit entry
> point startup_32 does the right thing and reloads CS, and this is what most
> people are using if they use bzImage.
>
> This patch fixes the case when the Linux kernel is booted into using kexec
> under Xen. The Xen hypervisor is using large CS values which makes the x86_64
> kernel fail - but only if vmlinux is booted, bzImage works well because it
> is using the 32-bit entry point.
>
> The main question is if we require that the boot loader should setup CS
> to some certain offset to be able to boot the kernel. The sane solution IMO
> should be that the kernel requires that the loaded descriptors are correct,
> but that the exact offset within the GDT the boot loader is using should not
> matter. This is the way the i386 boot works if I understand things correctly.

What extra reload of cs does Xen introduce?

I'm not really comfortable with a half virtualized case.

Eric

2006-08-22 00:46:52

by Magnus Damm

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH][RFC] x86_64: Reload CS when startup_64 is used.

On Mon, 2006-08-21 at 16:16 +0200, Andi Kleen wrote:
> On Monday 21 August 2006 15:29, Magnus Damm wrote:
> > On 8/21/06, Andi Kleen <[email protected]> wrote:
> > >
> > > >
> > > > + /* Reload CS with a value that is within our GDT. We need to do this
> > > > + * if we were loaded by a 64 bit bootloader that happened to use a
> > > > + * CS that is larger than the GDT limit. This is true if we came here
> > > > + * from kexec running under Xen.
> > > > + */
> > > > + movq %rsp, %rdx
> > > > + movq $__KERNEL_DS, %rax
> > > > + pushq %rax /* SS */
> > > > + pushq %rdx /* RSP */
> > > > + movq $__KERNEL_CS, %rax
> > > > + movq $cs_reloaded, %rdx
> > > > + pushq %rax /* CS */
> > > > + pushq %rdx /* RIP */
> > > > + lretq
> > >
> > > Can't you just use a normal far jump? That might be simpler.
> >
> > I couldn't find a far jump that took a 64-bit address to jump to. But
> > I guess that the kernel will be loaded in the lowest 4G regardless so
> > I guess 32-bit pointers are ok, right? That will make it simpler for
> > sure.
>
> Yes, that code always runs in the identity mapping and at 2MB.
>
> >
> > What do you think about reloading CS? Is it the right thing to do, or
> > is it correct as it is today where we depend on that CS == _KERNEL_CS?
> > I need to fix kexec-tools regardless, but maybe it is a good idea to
> > make the 64-bit kernel boot a bit robust too.
>
> Reloading CS is ok, although longer term I plan to switch the kernel
> to uncompress already in 64bit. Then you would need the same GDT anyways.

I think reloading CS is the right thing to do. IMO it is not sane to
depend on that the 64-bit boot loader sets up CS to 0x18 for us.

Having a dependency like that (unless there is a good reason and it is
documented somehow) is good to avoid, regardless of 64-bit uncompress or
not. I mean, if you plan on making the bzImage code 64-bit then it needs
to reload CS too, right?

Thanks,

/ magnus

2006-08-22 00:58:11

by Magnus Damm

[permalink] [raw]
Subject: Re: [PATCH][RFC] x86_64: Reload CS when startup_64 is used.

On Mon, 2006-08-21 at 15:02 -0600, Eric W. Biederman wrote:
> Magnus Damm <[email protected]> writes:
>
> > x86_64: Reload CS when startup_64 is used.
> >
> > The current x86_64 startup code never reloads CS during the early boot process
> > if the 64-bit function startup_64 is used as entry point. The 32-bit entry
> > point startup_32 does the right thing and reloads CS, and this is what most
> > people are using if they use bzImage.
> >
> > This patch fixes the case when the Linux kernel is booted into using kexec
> > under Xen. The Xen hypervisor is using large CS values which makes the x86_64
> > kernel fail - but only if vmlinux is booted, bzImage works well because it
> > is using the 32-bit entry point.
> >
> > The main question is if we require that the boot loader should setup CS
> > to some certain offset to be able to boot the kernel. The sane solution IMO
> > should be that the kernel requires that the loaded descriptors are correct,
> > but that the exact offset within the GDT the boot loader is using should not
> > matter. This is the way the i386 boot works if I understand things correctly.
>
> What extra reload of cs does Xen introduce?

None, but Xen is using CS values that are very different from Linux:

xen/include/public/arch-x86_64.h:

#define FLAT_RING3_CS32 0xe023 /* GDT index 260 */
#define FLAT_RING3_CS64 0xe033 /* GDT index 261 */
#define FLAT_RING3_DS32 0xe02b /* GDT index 262 */
#define FLAT_RING3_DS64 0x0000 /* NULL selector */
#define FLAT_RING3_SS32 0xe02b /* GDT index 262 */
#define FLAT_RING3_SS64 0xe02b /* GDT index 262 */

The main problem is that startup_64 depends on that CS is set to
__KERNEL_CS when it shouldn't. On top of that the purgatory code in
kexec-tools doesn't setup CS when using a 64-bit entry point. The
following (mangled) patch solves that for me:

--- 0001/purgatory/arch/x86_64/entry64.S
+++ work/purgatory/arch/x86_64/entry64.S 2006-08-18
15:34:23.000000000 +0900
@@ -37,8 +37,9 @@ entry64:
movl %eax, %fs
movl %eax, %gs

- /* In 64bit mode the code segment is meaningless */
-
+ ljmp *new_cs_addr(%rip)
+new_cs_exit:
+
/* Load the registers */
movq rax(%rip), %rax
movq rbx(%rip), %rbx
@@ -93,8 +94,11 @@ gdt: /* 0x00 unusable segment
.word 0, 0, 0

/* 0x10 4GB flat code segment */
- .word 0xFFFF, 0x0000, 0x9A00, 0x00CF
+ .word 0xFFFF, 0x0000, 0x9A00, 0x00AF

/* 0x18 4GB flat data segment */
.word 0xFFFF, 0x0000, 0x9200, 0x00CF
gdt_end:
+new_cs_addr:
+ .long new_cs_exit
+ .word 0x10 /* CODE_CS */


> I'm not really comfortable with a half virtualized case.

That I don't understand, care to explain more?

Thanks,

/ magnus

2006-08-22 03:42:23

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH][RFC] x86_64: Reload CS when startup_64 is used.

Magnus Damm <[email protected]> writes:

> On Mon, 2006-08-21 at 15:02 -0600, Eric W. Biederman wrote:
>> Magnus Damm <[email protected]> writes:
>>
>> > x86_64: Reload CS when startup_64 is used.
>> >
>> > The current x86_64 startup code never reloads CS during the early boot
> process
>> > if the 64-bit function startup_64 is used as entry point. The 32-bit entry
>> > point startup_32 does the right thing and reloads CS, and this is what most
>> > people are using if they use bzImage.
>> >
>> > This patch fixes the case when the Linux kernel is booted into using kexec
>> > under Xen. The Xen hypervisor is using large CS values which makes the
> x86_64
>> > kernel fail - but only if vmlinux is booted, bzImage works well because it
>> > is using the 32-bit entry point.
>> >
>> > The main question is if we require that the boot loader should setup CS
>> > to some certain offset to be able to boot the kernel. The sane solution IMO
>> > should be that the kernel requires that the loaded descriptors are correct,
>> > but that the exact offset within the GDT the boot loader is using should not
>> > matter. This is the way the i386 boot works if I understand things
> correctly.
>>
>> What extra reload of cs does Xen introduce?
>
> None, but Xen is using CS values that are very different from Linux:
>
> xen/include/public/arch-x86_64.h:
>
> #define FLAT_RING3_CS32 0xe023 /* GDT index 260 */
> #define FLAT_RING3_CS64 0xe033 /* GDT index 261 */
> #define FLAT_RING3_DS32 0xe02b /* GDT index 262 */
> #define FLAT_RING3_DS64 0x0000 /* NULL selector */
> #define FLAT_RING3_SS32 0xe02b /* GDT index 262 */
> #define FLAT_RING3_SS64 0xe02b /* GDT index 262 */
>
> The main problem is that startup_64 depends on that CS is set to
> __KERNEL_CS when it shouldn't. On top of that the purgatory code in
> kexec-tools doesn't setup CS when using a 64-bit entry point. The
> following (mangled) patch solves that for me:

I believe the cs shadow registers are fine.

> --- 0001/purgatory/arch/x86_64/entry64.S
> +++ work/purgatory/arch/x86_64/entry64.S 2006-08-18
> 15:34:23.000000000 +0900
> @@ -37,8 +37,9 @@ entry64:
> movl %eax, %fs
> movl %eax, %gs
>
> - /* In 64bit mode the code segment is meaningless */

Would you care to explain to me how the above comment is not true.
As I recall the only meaning %cs has is if you are a kernel
or user space process. The base address and everything else
mean nothing.

In addition the value in %cs never means anything. It is the
values in the cs shadow registers that count. The value in %cs
just reflects where those values in %cs came from.

So if we never reload %cs and only use the shadow values why does
the value in %cs matter?

>> I'm not really comfortable with a half virtualized case.
>
> That I don't understand, care to explain more?

The only case where I can think of that the value in %cs would matter
is if you change %cs. The only way I can see that is if you are
half para-virtualized. Because we are running privileged
instructions in startup_64 it shouldn't work for the paravirtualized
case, and it should work just like it does today in the
fully virtualized case.



Eric

2006-08-22 04:09:53

by Magnus Damm

[permalink] [raw]
Subject: Re: [PATCH][RFC] x86_64: Reload CS when startup_64 is used.

On Mon, 2006-08-21 at 21:41 -0600, Eric W. Biederman wrote:
> Magnus Damm <[email protected]> writes:
>
> > On Mon, 2006-08-21 at 15:02 -0600, Eric W. Biederman wrote:
> >> Magnus Damm <[email protected]> writes:
> >>
> >> > x86_64: Reload CS when startup_64 is used.
> >> >
> >> > The current x86_64 startup code never reloads CS during the early boot
> > process
> >> > if the 64-bit function startup_64 is used as entry point. The 32-bit entry
> >> > point startup_32 does the right thing and reloads CS, and this is what most
> >> > people are using if they use bzImage.
> >> >
> >> > This patch fixes the case when the Linux kernel is booted into using kexec
> >> > under Xen. The Xen hypervisor is using large CS values which makes the
> > x86_64
> >> > kernel fail - but only if vmlinux is booted, bzImage works well because it
> >> > is using the 32-bit entry point.
> >> >
> >> > The main question is if we require that the boot loader should setup CS
> >> > to some certain offset to be able to boot the kernel. The sane solution IMO
> >> > should be that the kernel requires that the loaded descriptors are correct,
> >> > but that the exact offset within the GDT the boot loader is using should not
> >> > matter. This is the way the i386 boot works if I understand things
> > correctly.
> >>
> >> What extra reload of cs does Xen introduce?
> >
> > None, but Xen is using CS values that are very different from Linux:
> >
> > xen/include/public/arch-x86_64.h:
> >
> > #define FLAT_RING3_CS32 0xe023 /* GDT index 260 */
> > #define FLAT_RING3_CS64 0xe033 /* GDT index 261 */
> > #define FLAT_RING3_DS32 0xe02b /* GDT index 262 */
> > #define FLAT_RING3_DS64 0x0000 /* NULL selector */
> > #define FLAT_RING3_SS32 0xe02b /* GDT index 262 */
> > #define FLAT_RING3_SS64 0xe02b /* GDT index 262 */
> >
> > The main problem is that startup_64 depends on that CS is set to
> > __KERNEL_CS when it shouldn't. On top of that the purgatory code in
> > kexec-tools doesn't setup CS when using a 64-bit entry point. The
> > following (mangled) patch solves that for me:
>
> I believe the cs shadow registers are fine.

Yeah, I know you do. =)

> > --- 0001/purgatory/arch/x86_64/entry64.S
> > +++ work/purgatory/arch/x86_64/entry64.S 2006-08-18
> > 15:34:23.000000000 +0900
> > @@ -37,8 +37,9 @@ entry64:
> > movl %eax, %fs
> > movl %eax, %gs
> >
> > - /* In 64bit mode the code segment is meaningless */
>
> Would you care to explain to me how the above comment is not true.
> As I recall the only meaning %cs has is if you are a kernel
> or user space process. The base address and everything else
> mean nothing.
>
> In addition the value in %cs never means anything. It is the
> values in the cs shadow registers that count. The value in %cs
> just reflects where those values in %cs came from.
>
> So if we never reload %cs and only use the shadow values why does
> the value in %cs matter?

Well, I cannot explain it well myself. But if you try setting CS to
something else than 0x18 and start the x86_64 kernel using startup_64
you will see what I am talking about. It does not work.

A good example of when this doesn't work is when my page_table_a patches
are used on x86_64 to boot a vmlinux. It does not work because CS is set
to something else then __KERNEL_CS.

Things just happen to work in most cases today (excluding being rebooted
from Xen) because either
a) vmlinux is booted and CS is set to 0x18 by Linux before kexec:ing.
or
b) bzImage is booted (32 bit entry point)

> >> I'm not really comfortable with a half virtualized case.
> >
> > That I don't understand, care to explain more?
>
> The only case where I can think of that the value in %cs would matter
> is if you change %cs. The only way I can see that is if you are
> half para-virtualized. Because we are running privileged
> instructions in startup_64 it shouldn't work for the paravirtualized
> case, and it should work just like it does today in the
> fully virtualized case.

I'm not talking about running Linux under Xen, this is a standard
non-virtualized kernel that is started _from_ a Xen environment using
kexec-tools. Think kdump kernel being started by the Xen hypervisor on
panic. Does that make sense?

Thanks,

/ magnus

2006-08-22 08:03:27

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH][RFC] x86_64: Reload CS when startup_64 is used.


> So if we never reload %cs and only use the shadow values why does
> the value in %cs matter?

We usually do at the first IRET.

-Andi

2006-08-22 08:38:17

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH] x86_64: Reload CS when startup_64 is used.


In long mode the %cs is largely a relic. However there are a few cases
like lret where it matters that we have a valid value. Without this
patch it is possible to enter the kernel in startup_64 without setting
%cs to a valid value. With this patch we don't care what %cs value
we enter the kernel with, so long as the cs shadow register indicates
it is a privileged code segment.

Thanks to Magnus Damm for finding this problem and posting the
first workable patch. I have moved the jump to set %cs down a
few instructions so we don't need to take an extra jump. Which
keeps the code simpler.

Signed-of-by: Eric W. Biederman <[email protected]>
---
arch/x86_64/kernel/head.S | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86_64/kernel/head.S b/arch/x86_64/kernel/head.S
index 923f080..6c89f6a 100644
--- a/arch/x86_64/kernel/head.S
+++ b/arch/x86_64/kernel/head.S
@@ -186,10 +186,13 @@ #define CR0_PAGING (1<<31)
/* Finally jump to run C code and to be on real kernel address
* Since we are running on identity-mapped space we have to jump
* to the full 64bit address , this is only possible as indirect
- * jump
+ * jump. In addition we need to ensure %cs is set so we make this
+ * a far return.
*/
movq initial_code(%rip),%rax
- jmp *%rax
+ pushq $__KERNEL_CS
+ pushq %rax
+ lretq

/* SMP bootup changes these two */
.align 8

2006-08-22 08:53:27

by Magnus Damm

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH] x86_64: Reload CS when startup_64 is used.

Hi Eric,

On 8/22/06, Eric W. Biederman <[email protected]> wrote:
>
> In long mode the %cs is largely a relic. However there are a few cases
> like lret where it matters that we have a valid value. Without this
> patch it is possible to enter the kernel in startup_64 without setting
> %cs to a valid value. With this patch we don't care what %cs value
> we enter the kernel with, so long as the cs shadow register indicates
> it is a privileged code segment.
>
> Thanks to Magnus Damm for finding this problem and posting the
> first workable patch. I have moved the jump to set %cs down a
> few instructions so we don't need to take an extra jump. Which
> keeps the code simpler.
>
> Signed-of-by: Eric W. Biederman <[email protected]>

While at it, could you please fix up the purgatory code in kexec-tools
to include this fix so we can boot older versions of the kernel too?

Thanks,

/ magnus

2006-08-22 09:01:11

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] x86_64: Reload CS when startup_64 is used.

On Tue, 22 Aug 2006 02:37:44 -0600
[email protected] (Eric W. Biederman) wrote:

>
> In long mode the %cs is largely a relic. However there are a few cases
> like lret

You mean iret?


> + * jump. In addition we need to ensure %cs is set so we make this
> + * a far return.
> */
> movq initial_code(%rip),%rax
> - jmp *%rax
> + pushq $__KERNEL_CS
> + pushq %rax
> + lretq

Ok merged thanks

-Andi

2006-08-22 09:21:24

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] x86_64: Reload CS when startup_64 is used.

Andi Kleen <[email protected]> writes:

> On Tue, 22 Aug 2006 02:37:44 -0600
> [email protected] (Eric W. Biederman) wrote:
>
>>
>> In long mode the %cs is largely a relic. However there are a few cases
>> like lret
>
> You mean iret?

Yes, sorry.

>
>> + * jump. In addition we need to ensure %cs is set so we make this
>> + * a far return.
>> */
>> movq initial_code(%rip),%rax
>> - jmp *%rax
>> + pushq $__KERNEL_CS
>> + pushq %rax
>> + lretq
>
> Ok merged thanks
>
> -Andi

2006-08-22 09:26:26

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH] x86_64: Reload CS when startup_64 is used.

"Magnus Damm" <[email protected]> writes:

> Hi Eric,
>
> On 8/22/06, Eric W. Biederman <[email protected]> wrote:
>>
>> In long mode the %cs is largely a relic. However there are a few cases
>> like lret where it matters that we have a valid value. Without this
>> patch it is possible to enter the kernel in startup_64 without setting
>> %cs to a valid value. With this patch we don't care what %cs value
>> we enter the kernel with, so long as the cs shadow register indicates
>> it is a privileged code segment.
>>
>> Thanks to Magnus Damm for finding this problem and posting the
>> first workable patch. I have moved the jump to set %cs down a
>> few instructions so we don't need to take an extra jump. Which
>> keeps the code simpler.
>>
>> Signed-of-by: Eric W. Biederman <[email protected]>
>
> While at it, could you please fix up the purgatory code in kexec-tools
> to include this fix so we can boot older versions of the kernel too?

I guess I'm not opposed to a patch but I have some serious reservations
about a solution that limits my address to 32bits in 64bit mode, and
appears to break the gdt used for entering the 32bit kernel.

I'm up way to late tonight. I just wanted to resolve the situation
when it was clear what was going on.

Eric

2006-08-23 03:10:25

by Magnus Damm

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH] x86_64: Reload CS when startup_64 is used.

On 8/22/06, Eric W. Biederman <[email protected]> wrote:
> "Magnus Damm" <[email protected]> writes:
> > On 8/22/06, Eric W. Biederman <[email protected]> wrote:
> >>
> >> In long mode the %cs is largely a relic. However there are a few cases
> >> like lret where it matters that we have a valid value. Without this
> >> patch it is possible to enter the kernel in startup_64 without setting
> >> %cs to a valid value. With this patch we don't care what %cs value
> >> we enter the kernel with, so long as the cs shadow register indicates
> >> it is a privileged code segment.
> >>
> >> Thanks to Magnus Damm for finding this problem and posting the
> >> first workable patch. I have moved the jump to set %cs down a
> >> few instructions so we don't need to take an extra jump. Which
> >> keeps the code simpler.
> >>
> >> Signed-of-by: Eric W. Biederman <[email protected]>
> >
> > While at it, could you please fix up the purgatory code in kexec-tools
> > to include this fix so we can boot older versions of the kernel too?
>
> I guess I'm not opposed to a patch but I have some serious reservations
> about a solution that limits my address to 32bits in 64bit mode, and
> appears to break the gdt used for entering the 32bit kernel.

The 32-bit pointer issue can easily be resolved. I don't understand
what you mean with breaking the GDT though - to me it looks like the
entry for CS in entry64.S is broken without my patch. You reload the
GDT to a new one anyway in entry64-32.S so I'm not sure what this
32-bit breakage is that you are talking about.

> I'm up way to late tonight. I just wanted to resolve the situation
> when it was clear what was going on.

Getting the fix in the kernel is hopefully solved now, thanks for the
help. Next step in my mind is to fix up kexec-tools - I'll send an
updated patch to fastboot later on today.

Thanks,

/ magnus