2006-12-05 13:38:26

by Magnus Damm

Subject: [PATCH 00/02] kexec: Move segment code to assembly files

kexec: Move segment code to assembly files

The following patches rearrange the lowlevel kexec code to perform idt,
gdt and segment setup code in assembly on the code page instead of doing
it in inline assembly in the C files.

Our dom0 Xen port of kexec and kdump executes the code page from the
hypervisor when kexec:ing into a new kernel. Putting as much code as
possible on the code page allows us to keep the amount of duplicated
code low.

These patches are part of the Xen port of kexec and kdump, which has
recently been accepted into the xen-unstable.hg tree. Sending them upstream
now is an attempt to simplify future porting work.

Signed-off-by: Magnus Damm <[email protected]>
---

Applies to 2.6.19.

arch/i386/kernel/machine_kexec.c | 59 ----------------------------------
arch/i386/kernel/relocate_kernel.S | 58 ++++++++++++++++++++++++++++++---
arch/x86_64/kernel/machine_kexec.c | 58 ---------------------------------
arch/x86_64/kernel/relocate_kernel.S | 50 +++++++++++++++++++++++++---
4 files changed, 98 insertions(+), 127 deletions(-)


2006-12-05 13:38:46

by Magnus Damm

Subject: [PATCH 02/02] kexec: Move segment code to assembly file (x86_64)

kexec: Move segment code to assembly file (x86_64)

This patch moves the idt, gdt, and segment handling code from machine_kexec.c
to relocate_kernel.S. The main reason behind this move is to avoid code
duplication in the Xen hypervisor. With this patch all code required to kexec
is put on the control page.

Signed-off-by: Magnus Damm <[email protected]>
---

Applies to 2.6.19.

arch/x86_64/kernel/machine_kexec.c | 58 ----------------------------------
arch/x86_64/kernel/relocate_kernel.S | 50 ++++++++++++++++++++++++++---
2 files changed, 45 insertions(+), 63 deletions(-)

--- 0001/arch/x86_64/kernel/machine_kexec.c
+++ work/arch/x86_64/kernel/machine_kexec.c 2006-12-05 17:24:14.000000000 +0900
@@ -112,47 +112,6 @@ static int init_pgtable(struct kimage *i
return init_level4_page(image, level4p, 0, end_pfn << PAGE_SHIFT);
}

-static void set_idt(void *newidt, u16 limit)
-{
- struct desc_ptr curidt;
-
- /* x86-64 supports unaliged loads & stores */
- curidt.size = limit;
- curidt.address = (unsigned long)newidt;
-
- __asm__ __volatile__ (
- "lidtq %0\n"
- : : "m" (curidt)
- );
-};
-
-
-static void set_gdt(void *newgdt, u16 limit)
-{
- struct desc_ptr curgdt;
-
- /* x86-64 supports unaligned loads & stores */
- curgdt.size = limit;
- curgdt.address = (unsigned long)newgdt;
-
- __asm__ __volatile__ (
- "lgdtq %0\n"
- : : "m" (curgdt)
- );
-};
-
-static void load_segments(void)
-{
- __asm__ __volatile__ (
- "\tmovl %0,%%ds\n"
- "\tmovl %0,%%es\n"
- "\tmovl %0,%%ss\n"
- "\tmovl %0,%%fs\n"
- "\tmovl %0,%%gs\n"
- : : "a" (__KERNEL_DS) : "memory"
- );
-}
-
int machine_kexec_prepare(struct kimage *image)
{
unsigned long start_pgtable;
@@ -209,23 +168,6 @@ NORET_TYPE void machine_kexec(struct kim
page_list[PA_TABLE_PAGE] =
(unsigned long)__pa(page_address(image->control_code_page));

- /* The segment registers are funny things, they have both a
- * visible and an invisible part. Whenever the visible part is
- * set to a specific selector, the invisible part is loaded
- * with from a table in memory. At no other time is the
- * descriptor table in memory accessed.
- *
- * I take advantage of this here by force loading the
- * segments, before I zap the gdt with an invalid value.
- */
- load_segments();
- /* The gdt & idt are now invalid.
- * If you want to load them you must set up your own idt & gdt.
- */
- set_gdt(phys_to_virt(0),0);
- set_idt(phys_to_virt(0),0);
-
- /* now call it */
relocate_kernel((unsigned long)image->head, (unsigned long)page_list,
image->start);
}
--- 0001/arch/x86_64/kernel/relocate_kernel.S
+++ work/arch/x86_64/kernel/relocate_kernel.S 2006-12-05 17:24:14.000000000 +0900
@@ -159,13 +159,39 @@ relocate_new_kernel:
movq PTR(PA_PGD)(%rsi), %r9
movq %r9, %cr3

+ /* setup idt */
+ movq %r8, %rax
+ addq $(idt_80 - relocate_kernel), %rax
+ lidtq (%rax)
+
+ /* setup gdt */
+ movq %r8, %rax
+ addq $(gdt - relocate_kernel), %rax
+ movq %r8, %r9
+ addq $((gdt_80 - relocate_kernel) + 2), %r9
+ movq %rax, (%r9)
+
+ movq %r8, %rax
+ addq $(gdt_80 - relocate_kernel), %rax
+ lgdtq (%rax)
+
+ /* setup data segment registers */
+ xorl %eax, %eax
+ movl %eax, %ds
+ movl %eax, %es
+ movl %eax, %fs
+ movl %eax, %gs
+ movl %eax, %ss
+
/* setup a new stack at the end of the physical control page */
lea 4096(%r8), %rsp

- /* jump to identity mapped page */
- addq $(identity_mapped - relocate_kernel), %r8
- pushq %r8
- ret
+ /* load new code segment and jump to identity mapped page */
+ movq %r8, %rax
+ addq $(identity_mapped - relocate_kernel), %rax
+ pushq $(gdt_cs - gdt)
+ pushq %rax
+ lretq

identity_mapped:
/* store the start address on the stack */
@@ -272,5 +298,19 @@ identity_mapped:
xorq %r13, %r13
xorq %r14, %r14
xorq %r15, %r15
-
ret
+
+ .align 16
+gdt:
+ .quad 0x0000000000000000 /* NULL descriptor */
+gdt_cs:
+ .quad 0x00af9a000000ffff
+gdt_end:
+
+gdt_80:
+ .word gdt_end - gdt - 1 /* limit */
+ .quad 0 /* base - filled in by code above */
+
+idt_80:
+ .word 0 /* limit */
+ .quad 0 /* base */
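
Note on the descriptor constants: the 64-bit values placed in the new gdt follow the standard x86 segment descriptor layout. The following is a minimal sketch in plain user-space C, not part of the patch, that unpacks such a value into its fields. The names seg_desc, decode_desc, desc_is_code, desc_is_long and check_patch_descriptors are hypothetical helpers introduced only for illustration; the constants are the ones used in the two patches of this series.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an 8-byte segment descriptor (hypothetical helper type). */
struct seg_desc {
	uint32_t base;   /* 32-bit segment base */
	uint32_t limit;  /* 20-bit limit, in bytes or 4 KiB units (see G flag) */
	uint8_t access;  /* P, DPL, S and type bits (byte 5) */
	uint8_t flags;   /* G, D/B, L, AVL (upper nibble of byte 6) */
};

/* Split a raw descriptor into its fields. */
static struct seg_desc decode_desc(uint64_t raw)
{
	struct seg_desc d;

	d.limit = (uint32_t)(raw & 0xffff) |
		  ((uint32_t)((raw >> 48) & 0xf) << 16);
	d.base = (uint32_t)((raw >> 16) & 0xffffff) |
		 ((uint32_t)((raw >> 56) & 0xff) << 24);
	d.access = (uint8_t)((raw >> 40) & 0xff);
	d.flags = (uint8_t)((raw >> 52) & 0xf);
	return d;
}

/* S = 1 (code/data) and executable bit set means a code segment. */
static int desc_is_code(struct seg_desc d)
{
	return (d.access & 0x18) == 0x18;
}

/* L flag set means a 64-bit code segment. */
static int desc_is_long(struct seg_desc d)
{
	return (d.flags & 0x2) != 0;
}

/* Sanity checks against the constants used in the patches. */
static void check_patch_descriptors(void)
{
	struct seg_desc cs64 = decode_desc(0x00af9a000000ffffULL);
	struct seg_desc cs32 = decode_desc(0x00cf9a000000ffffULL);
	struct seg_desc ds32 = decode_desc(0x00cf92000000ffffULL);

	assert(cs64.base == 0 && cs64.limit == 0xfffff);
	assert(desc_is_code(cs64) && desc_is_long(cs64));
	assert(desc_is_code(cs32) && !desc_is_long(cs32));
	assert(!desc_is_code(ds32));
}
```

Read this way, the x86_64 gdt_cs entry differs from the i386 one only in the flags nibble: 0xa (G and L set) instead of 0xc (G and D/B set).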

2006-12-05 13:38:30

by Magnus Damm

Subject: [PATCH 01/02] kexec: Move segment code to assembly file (i386)

kexec: Move segment code to assembly file (i386)

This patch moves the idt, gdt, and segment handling code from machine_kexec.c
to relocate_kernel.S. The main reason behind this move is to avoid code
duplication in the Xen hypervisor. With this patch all code required to kexec
is put on the control page.

Signed-off-by: Magnus Damm <[email protected]>
---

Applies to 2.6.19.

arch/i386/kernel/machine_kexec.c | 59 ------------------------------------
arch/i386/kernel/relocate_kernel.S | 58 ++++++++++++++++++++++++++++++++---
2 files changed, 53 insertions(+), 64 deletions(-)

--- 0001/arch/i386/kernel/machine_kexec.c
+++ work/arch/i386/kernel/machine_kexec.c 2006-12-05 17:24:10.000000000 +0900
@@ -29,48 +29,6 @@ static u32 kexec_pmd1[1024] PAGE_ALIGNED
static u32 kexec_pte0[1024] PAGE_ALIGNED;
static u32 kexec_pte1[1024] PAGE_ALIGNED;

-static void set_idt(void *newidt, __u16 limit)
-{
- struct Xgt_desc_struct curidt;
-
- /* ia32 supports unaliged loads & stores */
- curidt.size = limit;
- curidt.address = (unsigned long)newidt;
-
- load_idt(&curidt);
-};
-
-
-static void set_gdt(void *newgdt, __u16 limit)
-{
- struct Xgt_desc_struct curgdt;
-
- /* ia32 supports unaligned loads & stores */
- curgdt.size = limit;
- curgdt.address = (unsigned long)newgdt;
-
- load_gdt(&curgdt);
-};
-
-static void load_segments(void)
-{
-#define __STR(X) #X
-#define STR(X) __STR(X)
-
- __asm__ __volatile__ (
- "\tljmp $"STR(__KERNEL_CS)",$1f\n"
- "\t1:\n"
- "\tmovl $"STR(__KERNEL_DS)",%%eax\n"
- "\tmovl %%eax,%%ds\n"
- "\tmovl %%eax,%%es\n"
- "\tmovl %%eax,%%fs\n"
- "\tmovl %%eax,%%gs\n"
- "\tmovl %%eax,%%ss\n"
- ::: "eax", "memory");
-#undef STR
-#undef __STR
-}
-
/*
* A architecture hook called to validate the
* proposed image and prepare the control pages
@@ -127,23 +85,6 @@ NORET_TYPE void machine_kexec(struct kim
page_list[PA_PTE_1] = __pa(kexec_pte1);
page_list[VA_PTE_1] = (unsigned long)kexec_pte1;

- /* The segment registers are funny things, they have both a
- * visible and an invisible part. Whenever the visible part is
- * set to a specific selector, the invisible part is loaded
- * with from a table in memory. At no other time is the
- * descriptor table in memory accessed.
- *
- * I take advantage of this here by force loading the
- * segments, before I zap the gdt with an invalid value.
- */
- load_segments();
- /* The gdt & idt are now invalid.
- * If you want to load them you must set up your own idt & gdt.
- */
- set_gdt(phys_to_virt(0),0);
- set_idt(phys_to_virt(0),0);
-
- /* now call it */
relocate_kernel((unsigned long)image->head, (unsigned long)page_list,
image->start, cpu_has_pae);
}
--- 0001/arch/i386/kernel/relocate_kernel.S
+++ work/arch/i386/kernel/relocate_kernel.S 2006-12-05 17:24:10.000000000 +0900
@@ -154,14 +154,45 @@ relocate_new_kernel:
movl PTR(PA_PGD)(%ebp), %eax
movl %eax, %cr3

+ /* setup idt */
+ movl %edi, %eax
+ addl $(idt_48 - relocate_kernel), %eax
+ lidtl (%eax)
+
+ /* setup gdt */
+ movl %edi, %eax
+ addl $(gdt - relocate_kernel), %eax
+ movl %edi, %esi
+ addl $((gdt_48 - relocate_kernel) + 2), %esi
+ movl %eax, (%esi)
+
+ movl %edi, %eax
+ addl $(gdt_48 - relocate_kernel), %eax
+ lgdtl (%eax)
+
+ /* setup data segment registers */
+ mov $(gdt_ds - gdt), %eax
+ mov %eax, %ds
+ mov %eax, %es
+ mov %eax, %fs
+ mov %eax, %gs
+ mov %eax, %ss
+
/* setup a new stack at the end of the physical control page */
lea 4096(%edi), %esp

- /* jump to identity mapped page */
- movl %edi, %eax
- addl $(identity_mapped - relocate_kernel), %eax
- pushl %eax
- ret
+ /* load new code segment and jump to identity mapped page */
+ movl %edi, %esi
+ xorl %eax, %eax
+ pushl %eax
+ pushl %esi
+ pushl %eax
+ movl $(gdt_cs - gdt), %eax
+ pushl %eax
+ movl %edi, %eax
+ addl $(identity_mapped - relocate_kernel),%eax
+ pushl %eax
+ iretl

identity_mapped:
/* store the start address on the stack */
@@ -250,3 +281,20 @@ identity_mapped:
xorl %edi, %edi
xorl %ebp, %ebp
ret
+
+ .align 16
+gdt:
+ .quad 0x0000000000000000 /* NULL descriptor */
+gdt_cs:
+ .quad 0x00cf9a000000ffff /* kernel 4GB code at 0x00000000 */
+gdt_ds:
+ .quad 0x00cf92000000ffff /* kernel 4GB data at 0x00000000 */
+gdt_end:
+
+gdt_48:
+ .word gdt_end - gdt - 1 /* limit */
+ .long 0 /* base - filled in by code above */
+
+idt_48:
+ .word 0 /* limit */
+ .long 0 /* base */
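
Note on gdt_48 and the selectors: lgdtl takes a 6-byte operand holding a 16-bit limit (table size in bytes minus one) followed by a 32-bit base, which is why the assembly above patches the base at offset 2. The selectors (gdt_cs - gdt) and (gdt_ds - gdt) are simply byte offsets into the table. The sketch below mirrors that arithmetic in user-space C; it is an illustration only, the helper names gdt_selector and fill_gdt_ptr32 are hypothetical, and the base address in any call is an arbitrary placeholder rather than a real control page address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DESC_SIZE 8u /* every GDT entry in the patch is one .quad */

/* Selector for GDT entry `index`, with TI = 0 (GDT) and RPL = 0. */
static uint16_t gdt_selector(unsigned int index)
{
	return (uint16_t)(index * DESC_SIZE);
}

/*
 * Fill a 6-byte pseudo-descriptor as consumed by lgdtl:
 * bytes 0-1 hold the limit, bytes 2-5 hold the base.
 * memcpy keeps the layout explicit without compiler-specific packing;
 * the bytes are written in host order, which matches x86 on a
 * little-endian host.
 */
static void fill_gdt_ptr32(uint8_t out[6], unsigned int entries, uint32_t base)
{
	uint16_t limit = (uint16_t)(entries * DESC_SIZE - 1);

	assert(entries > 0);
	memcpy(out, &limit, sizeof(limit));
	memcpy(out + 2, &base, sizeof(base));
}
```

With the three-entry i386 table (NULL, gdt_cs, gdt_ds) this gives a limit of 23 and selectors 8 and 16. The x86_64 gdt_80 uses the same layout with a 64-bit base, so its operand is 10 bytes long and its two-entry table has a limit of 15.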

2006-12-05 14:03:38

by Vivek Goyal

Subject: Re: [PATCH 00/02] kexec: Move segment code to assembly files

On Tue, Dec 05, 2006 at 10:37:57PM +0900, Magnus Damm wrote:
> kexec: Move segment code to assembly files
>
> The following patches rearrange the lowlevel kexec code to perform idt,
> gdt and segment setup code in assembly on the code page instead of doing
> it in inline assembly in the C files.
>

I don't think we should be doing this. I would rather prefer code to
keep in C for easier debugging, readability and maintenance.

> Our dom0 Xen port of kexec and kdump executes the code page from the
> hypervisor when kexec:ing into a new kernel. Putting as much code as
> possible on the code page allows us to keep the amount of duplicated
> code low.
>

Is Xen going upstream now? I heard now lhype+KVM seems to be the way.
Even if it is required, we should do it once Xen goes in.

You have already moved page table setup code to assembly and we should
be getting rid of that code too.

I would rather live with duplicated code than moving more code in assembly
which can be written in C. Understanding and debugging assembly code
is such a big pain.

Thanks
Vivek

2006-12-05 15:08:32

by Magnus Damm

Subject: Re: [PATCH 00/02] kexec: Move segment code to assembly files

On 12/5/06, Vivek Goyal <[email protected]> wrote:
> On Tue, Dec 05, 2006 at 10:37:57PM +0900, Magnus Damm wrote:
> > kexec: Move segment code to assembly files
> >
> > The following patches rearrange the lowlevel kexec code to perform idt,
> > gdt and segment setup code in assembly on the code page instead of doing
> > it in inline assembly in the C files.
> >
>
> I don't think we should be doing this. I would rather prefer code to
> keep in C for easier debugging, readability and maintenance.

I prefer to write code in C too, but I don't see how wrapping assembly
instructions in inline C makes the code any easier to follow than raw
assembly. Either you understand the assembly or you don't.

> > Our dom0 Xen port of kexec and kdump executes the code page from the
> > hypervisor when kexec:ing into a new kernel. Putting as much code as
> > possible on the code page allows us to keep the amount of duplicated
> > code low.
> >
>
> Is Xen going upstream now? I heard now lhype+KVM seems to be the way.
> Even if it is required, we should do it once Xen goes in.

I am not sure about the status of the Xen merging effort. domU seemed to
be the top priority last time I heard something, but this change only
affects dom0 so it is probably even further away.

> You have already moved page table setup code to assembly and we should
> be getting rid of that code too.

This was recommended to me by Eric if I'm not mistaken, but if we can
move out parts of the assembly code to C then that would be great.

> I would rather live with duplicated code than moving more code in assembly
> which can be written in C. Understanding and debugging assembly code
> is such a big pain.

Again, I think that is true for C code - not for inline assembly in C
files. But I guess you are talking about the already merged page table
patches. My first version implemented the code in C; have a look at
the function create_mapping(), which I think is very clear:

http://lists.osdl.org/pipermail/fastboot/2006-May/002838.html

The important question IMO is if this should be merged ahead of the
rest of the Xen stuff, and maybe it shouldn't.

Thanks,

/ magnus