2005-09-22 07:52:59

by Zachary Amsden

Subject: [PATCH 3/3] Gdt page isolation

Make GDT page aligned and page padded to support running inside of a
hypervisor. This prevents false sharing of the GDT page with other
hot data, which is not allowed in Xen, and causes performance problems
in VMware.

Rather than go back to the old method of statically allocating the
GDT (which wastes unneeded space for non-present CPUs), the GDT for
APs is allocated dynamically.

Signed-off-by: Zachary Amsden <[email protected]>
Index: linux-2.6.14-rc1/include/asm-i386/desc.h
===================================================================
--- linux-2.6.14-rc1.orig/include/asm-i386/desc.h 2005-09-20 20:19:57.000000000 -0700
+++ linux-2.6.14-rc1/include/asm-i386/desc.h 2005-09-20 20:20:50.000000000 -0700
@@ -15,9 +15,6 @@
#include <asm/mmu.h>

extern struct desc_struct cpu_gdt_table[GDT_ENTRIES];
-DECLARE_PER_CPU(struct desc_struct, cpu_gdt_table[GDT_ENTRIES]);
-
-#define get_cpu_gdt_table(_cpu) (per_cpu(cpu_gdt_table,_cpu))

DECLARE_PER_CPU(unsigned char, cpu_16bit_stack[CPU_16BIT_STACK_SIZE]);

@@ -29,6 +26,8 @@ struct Xgt_desc_struct {

extern struct Xgt_desc_struct idt_descr, cpu_gdt_descr[NR_CPUS];

+#define get_cpu_gdt_table(_cpu) ((struct desc_struct *)cpu_gdt_descr[(_cpu)].address)
+
#define load_TR_desc() __asm__ __volatile__("ltr %w0"::"q" (GDT_ENTRY_TSS*8))
#define load_LDT_desc() __asm__ __volatile__("lldt %w0"::"q" (GDT_ENTRY_LDT*8))

Index: linux-2.6.14-rc1/arch/i386/kernel/i386_ksyms.c
===================================================================
--- linux-2.6.14-rc1.orig/arch/i386/kernel/i386_ksyms.c 2005-09-20 20:19:57.000000000 -0700
+++ linux-2.6.14-rc1/arch/i386/kernel/i386_ksyms.c 2005-09-20 20:28:51.000000000 -0700
@@ -3,8 +3,7 @@
#include <asm/checksum.h>
#include <asm/desc.h>

-/* This is definitely a GPL-only symbol */
-EXPORT_SYMBOL_GPL(cpu_gdt_table);
+EXPORT_SYMBOL_GPL(cpu_gdt_descr);

EXPORT_SYMBOL(__down_failed);
EXPORT_SYMBOL(__down_failed_interruptible);
Index: linux-2.6.14-rc1/arch/i386/kernel/smpboot.c
===================================================================
--- linux-2.6.14-rc1.orig/arch/i386/kernel/smpboot.c 2005-09-20 20:19:57.000000000 -0700
+++ linux-2.6.14-rc1/arch/i386/kernel/smpboot.c 2005-09-20 20:38:22.000000000 -0700
@@ -898,6 +898,7 @@ static int __devinit do_boot_cpu(int api
* This grunge runs the startup process for
* the targeted processor.
*/
+ cpu_gdt_descr[cpu].address = __get_free_page(GFP_KERNEL|__GFP_ZERO);

atomic_set(&init_deasserted, 0);

Index: linux-2.6.14-rc1/arch/i386/kernel/head.S
===================================================================
--- linux-2.6.14-rc1.orig/arch/i386/kernel/head.S 2005-09-20 14:32:16.000000000 -0700
+++ linux-2.6.14-rc1/arch/i386/kernel/head.S 2005-09-20 20:44:38.000000000 -0700
@@ -525,3 +525,5 @@ ENTRY(cpu_gdt_table)
.quad 0x0000000000000000 /* 0xf0 - unused */
.quad 0x0000000000000000 /* 0xf8 - GDT entry 31: double-fault TSS */

+ /* Be sure this is zeroed to avoid false validations in Xen */
+ .fill PAGE_SIZE_asm / 8 - GDT_ENTRIES,8,0
Index: linux-2.6.14-rc1/arch/i386/kernel/cpu/common.c
===================================================================
--- linux-2.6.14-rc1.orig/arch/i386/kernel/cpu/common.c 2005-09-20 20:19:57.000000000 -0700
+++ linux-2.6.14-rc1/arch/i386/kernel/cpu/common.c 2005-09-20 20:37:31.000000000 -0700
@@ -18,9 +18,6 @@

#include "cpu.h"

-DEFINE_PER_CPU(struct desc_struct, cpu_gdt_table[GDT_ENTRIES]);
-EXPORT_PER_CPU_SYMBOL(cpu_gdt_table);
-
DEFINE_PER_CPU(unsigned char, cpu_16bit_stack[CPU_16BIT_STACK_SIZE]);
EXPORT_PER_CPU_SYMBOL(cpu_16bit_stack);


2005-09-22 08:58:44

by Ingo Molnar

Subject: Re: [PATCH 3/3] Gdt page isolation


* Zachary Amsden <[email protected]> wrote:

> Make GDT page aligned and page padded to support running inside of a
> hypervisor. This prevents false sharing of the GDT page with other
> hot data, which is not allowed in Xen, and causes performance problems
> in VMware.
>
> Rather than go back to the old method of statically allocating the GDT
> (which wastes unneeded space for non-present CPUs), the GDT for APs is
> allocated dynamically.
>
> Signed-off-by: Zachary Amsden <[email protected]>

Acked-by: Ingo Molnar <[email protected]>

Ingo

2005-09-22 13:17:20

by Andi Kleen

Subject: Re: [PATCH 3/3] Gdt page isolation

> * This grunge runs the startup process for
> * the targeted processor.
> */
> + cpu_gdt_descr[cpu].address = __get_free_page(GFP_KERNEL|__GFP_ZERO);

I can see why you don't check it for NULL, but it's an ugly reason
and would be better fixed. It at least needs a comment.

-Andi (who would still prefer just going back to the array
in head.S - would work as well and waste less memory)
>
> atomic_set(&init_deasserted, 0);

2005-09-22 13:33:21

by Ingo Molnar

Subject: Re: [PATCH 3/3] Gdt page isolation


* Andi Kleen <[email protected]> wrote:

> > * This grunge runs the startup process for
> > * the targeted processor.
> > */
> > + cpu_gdt_descr[cpu].address = __get_free_page(GFP_KERNEL|__GFP_ZERO);
>
> I can see why you don't check it for NULL, but it's an ugly reason and
> would be better fixed. It at least needs a comment.

it's so early in the bootup that any failure here would probably be
fatal anyway. But yeah, a comment would be nice.

> -Andi (who would still prefer just going back to the array in head.S -
> would work as well and waste less memory)

that doesn't really solve the problem for e.g. Xen, which needs a
separate page for each GDT. (xenolinux is a separate arch right now, but
it will/should be merged back into its base architectures) So why not
solve all the problems at once?

Ingo

2005-09-22 13:38:16

by Andi Kleen

Subject: Re: [PATCH 3/3] Gdt page isolation

On Thu, Sep 22, 2005 at 03:34:02PM +0200, Ingo Molnar wrote:
>
> * Andi Kleen <[email protected]> wrote:
>
> > > * This grunge runs the startup process for
> > > * the targeted processor.
> > > */
> > > + cpu_gdt_descr[cpu].address = __get_free_page(GFP_KERNEL|__GFP_ZERO);
> >
> > I can see why you don't check it for NULL, but it's an ugly reason and
> > would be better fixed. It at least needs a comment.
>
> it's so early in the bootup that any failure here would probably be
> fatal anyway. But yeah, a comment would be nice.

Not true for AP boot, especially with CPU hotplug.
And CPU hotplug will be common, because suspend/resume uses it
and next year's laptops will have dual-core CPUs.
So it would be nicer to handle the failure, but getting out of that
path is difficult.


> > -Andi (who would still prefer just going back to the array in head.S -
> > would work as well and waste less memory)
>
> that doesn't really solve the problem for e.g. Xen, which needs a
> separate page for each GDT. (xenolinux is a separate arch right now, but
> it will/should be merged back into its base architectures) So why not
> solve all the problems at once?

Xen could use the same method VMware uses, which handles having other
read-only data in there too? Doesn't sound too bad.

-Andi

2005-09-23 19:00:24

by Zachary Amsden

Subject: Re: [PATCH 3/3] Gdt page isolation

Andi Kleen wrote:

>> * This grunge runs the startup process for
>> * the targeted processor.
>> */
>>+ cpu_gdt_descr[cpu].address = __get_free_page(GFP_KERNEL|__GFP_ZERO);
>
>I can see why you don't check it for NULL, but it's an ugly reason
>and would be better fixed. It at least needs a comment.
>
>-Andi (who would still prefer just going back to the array
>in head.S - would work as well and waste less memory)

The array in head.S does waste more memory if you compile for NR_CPUS >>
actual cpus. But the primary reason for allocating on individual pages
is to preserve the hypervisor GDT entries for Xen. Xen relies on a GDT
virtualization technique which uses descriptors in the high part of the
GDT. Keep in mind, the GDT is a paged data structure. So here is what
they do:

Linear address space:

+---------------------------+ 4GB
| |
| Xen code, heap |
| |
| |
+---------------------------+
| | GDT virtual mapping
| Xen per-domain mappings |==(page 15)====> hypervisor physical page
| |==(page 1-14)==> zeroed pages
| GDT (16 pages) |==(page 0)==+
+---------------------------+ -168 ? MB |
| | |
| MPT tables | |
| | |
| | |
+---------------------------+ |
| | |
| Guest kernel | |
| | |
| | |
| | |
| | |
| GDT 256 bytes, read-only |============+==> guest physical page
| |
+---------------------------+ 3GB


So, the GDT mapping which is live in the hypervisor consists of the guest
GDT pages followed by blank pages, followed by a page which is reserved
for Xen's private GDT mappings. The guest pages are mapped into the
guest read-only.

This imposes a strict requirement on the guest regarding sharing of data
on the GDT pages: it is impossible to share arbitrary data, even if it
is read-only. All data on these pages must conform to the rules for
valid guest GDT descriptors, a constraint that only GDT and LDT entries
are guaranteed to satisfy.

So while it is technically possible to share the per-cpu GDTs on the
same set of pages, the entire thing must be padded, page-aligned, and
zeroed of any unused data. Unless you go to a complicated scheme where
per-cpu GDTs are colored and shared, you have an arbitrary limit (240)
on the number of virtual CPUs.

So for now, one page per CPU for the GDT, the approach Xen is currently
using, appears to be the simplest and most flexible to implement.

Zach