2009-01-27 01:12:37

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: unified percpu stuff

I'm really pleased to see the unified percpu stuff in the kernel, but
unfortunately its breaking Xen at the moment.

It looks like this is just a matter of initializing %gs properly in
xen_start_kernel. Is there any problem with me doing a load_gs_base(0)
somewhere early in xen_start_kernel (arch/x86/xen/enlighten.c)? Is the
initial percpu are and offset for cpu0 all set up? Do I need to make it
#ifdef CONFIG_SMP?

Do I need to do anything for 32-bit? (I haven't tested that yet.)

Thanks,
J


2009-01-27 04:17:35

by Tejun Heo

[permalink] [raw]
Subject: Re: unified percpu stuff

Hello,

Jeremy Fitzhardinge wrote:
> I'm really pleased to see the unified percpu stuff in the kernel, but
> unfortunately its breaking Xen at the moment.
> It looks like this is just a matter of initializing %gs properly in
> xen_start_kernel. Is there any problem with me doing a load_gs_base(0)
> somewhere early in xen_start_kernel (arch/x86/xen/enlighten.c)?

No, not at all. Patches welcome. :-)

> Is the initial percpu are and offset for cpu0 all set up? Do I need
> to make it #ifdef CONFIG_SMP?
>
> Do I need to do anything for 32-bit? (I haven't tested that yet.)

The current tj-percpu[1] contains Brian's commit which consolidates
load_gs_base() into switch_to_new_gdt(), so you'll either need to call
it instead or call wrmsrl() or loadsegment() directly if that doesn't
fit the bill (but I think it will).

The initial offset is different between 32 and 64. Please take a look
at BOOT_PERCPU_OFFSET definition in setup_percpu.c.

Thanks.

--
tejun

[1] git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git tj-percpu

2009-01-27 05:58:15

by Brian Gerst

[permalink] [raw]
Subject: Re: unified percpu stuff

On Mon, Jan 26, 2009 at 8:12 PM, Jeremy Fitzhardinge <[email protected]> wrote:
> I'm really pleased to see the unified percpu stuff in the kernel, but
> unfortunately its breaking Xen at the moment.
> It looks like this is just a matter of initializing %gs properly in
> xen_start_kernel. Is there any problem with me doing a load_gs_base(0)
> somewhere early in xen_start_kernel (arch/x86/xen/enlighten.c)?

Some of the changes I did made the assumption that the percpu state is
set up early in head_xx.S, which apparently is skipped for xen. Is
there documentation anywhere for the xen bootstrap process?

Try this patch (untested):

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index bef941f..b90d061 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1647,6 +1647,8 @@ asmlinkage void __init xen_start_kernel(void)
have_vcpu_info_placement = 0;
#endif

+ switch_to_new_gdt();
+
xen_smp_init();

/* Get mfn list */
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 7735e3d..00d9265 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -235,6 +235,8 @@ cpu_initialize_context(unsigned int cpu, struct
task_struct *idle)
ctxt->user_regs.ss = __KERNEL_DS;
#ifdef CONFIG_X86_32
ctxt->user_regs.fs = __KERNEL_PERCPU;
+#else
+ ctxt->gs_base_kernel = per_cpu_offset(cpu);
#endif
ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */

--
Brian Gerst

2009-01-27 07:24:22

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: unified percpu stuff

Brian Gerst wrote:
> On Mon, Jan 26, 2009 at 8:12 PM, Jeremy Fitzhardinge <[email protected]> wrote:
>
>> I'm really pleased to see the unified percpu stuff in the kernel, but
>> unfortunately its breaking Xen at the moment.
>> It looks like this is just a matter of initializing %gs properly in
>> xen_start_kernel. Is there any problem with me doing a load_gs_base(0)
>> somewhere early in xen_start_kernel (arch/x86/xen/enlighten.c)?
>>
>
> Some of the changes I did made the assumption that the percpu state is
> set up early in head_xx.S, which apparently is skipped for xen. Is
> there documentation anywhere for the xen bootstrap process?
>

Not really. Its quite different from native because the guest kernel
starts up in protected/long mode with paging enabled, using a
Xen-provided pagetable. It means that most of the normal head.S stuff
is redundant; the kernel starts executing more or less exactly at
xen_start_kernel (there's a 2 instruction asm part which stashes away
the Xen info pointer from %[re]si, then jumps to xen_start_kernel).

> Try this patch (untested):
>
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index bef941f..b90d061 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1647,6 +1647,8 @@ asmlinkage void __init xen_start_kernel(void)
> have_vcpu_info_placement = 0;
> #endif
>
> + switch_to_new_gdt();
> +
> xen_smp_init();
>
> /* Get mfn list */
> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index 7735e3d..00d9265 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -235,6 +235,8 @@ cpu_initialize_context(unsigned int cpu, struct
> task_struct *idle)
> ctxt->user_regs.ss = __KERNEL_DS;
> #ifdef CONFIG_X86_32
> ctxt->user_regs.fs = __KERNEL_PERCPU;
> +#else
> + ctxt->gs_base_kernel = per_cpu_offset(cpu);
> #endif
> ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
> ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
>

Thanks, I'll try this out.

BTW, does the initial cpu0 percpu area get reallocated and moved during
boot, or does cpu0 keep using the same memory forever?

Thanks,
J

2009-01-27 12:35:19

by Brian Gerst

[permalink] [raw]
Subject: Re: unified percpu stuff

On Tue, Jan 27, 2009 at 2:24 AM, Jeremy Fitzhardinge <[email protected]> wrote:
> Brian Gerst wrote:
>>
>> On Mon, Jan 26, 2009 at 8:12 PM, Jeremy Fitzhardinge <[email protected]>
>> wrote:
>>
>>>
>>> I'm really pleased to see the unified percpu stuff in the kernel, but
>>> unfortunately its breaking Xen at the moment.
>>> It looks like this is just a matter of initializing %gs properly in
>>> xen_start_kernel. Is there any problem with me doing a load_gs_base(0)
>>> somewhere early in xen_start_kernel (arch/x86/xen/enlighten.c)?
>>>
>>
>> Some of the changes I did made the assumption that the percpu state is
>> set up early in head_xx.S, which apparently is skipped for xen. Is
>> there documentation anywhere for the xen bootstrap process?
>>
>
> Not really. Its quite different from native because the guest kernel starts
> up in protected/long mode with paging enabled, using a Xen-provided
> pagetable. It means that most of the normal head.S stuff is redundant; the
> kernel starts executing more or less exactly at xen_start_kernel (there's a
> 2 instruction asm part which stashes away the Xen info pointer from %[re]si,
> then jumps to xen_start_kernel).
>
>> Try this patch (untested):
>>
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index bef941f..b90d061 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -1647,6 +1647,8 @@ asmlinkage void __init xen_start_kernel(void)
>> have_vcpu_info_placement = 0;
>> #endif
>>
>> + switch_to_new_gdt();
>> +
>> xen_smp_init();
>>
>> /* Get mfn list */
>> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
>> index 7735e3d..00d9265 100644
>> --- a/arch/x86/xen/smp.c
>> +++ b/arch/x86/xen/smp.c
>> @@ -235,6 +235,8 @@ cpu_initialize_context(unsigned int cpu, struct
>> task_struct *idle)
>> ctxt->user_regs.ss = __KERNEL_DS;
>> #ifdef CONFIG_X86_32
>> ctxt->user_regs.fs = __KERNEL_PERCPU;
>> +#else
>> + ctxt->gs_base_kernel = per_cpu_offset(cpu);
>> #endif
>> ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
>> ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */
>>
>
> Thanks, I'll try this out.
>
> BTW, does the initial cpu0 percpu area get reallocated and moved during
> boot, or does cpu0 keep using the same memory forever?

It is reallocated in setup_per_cpu_areas().

--
Brian Gerst

2009-01-27 17:51:17

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: unified percpu stuff

Brian Gerst wrote:
>> BTW, does the initial cpu0 percpu area get reallocated and moved during
>> boot, or does cpu0 keep using the same memory forever?
>>
>
> It is reallocated in setup_per_cpu_areas().
>

Hm, OK. What's the reason we need to move cpu 0's percpu area? Its
always seemed a bit awkward.

Once it has been moved, how can I find the address of a variable in the
original boot-time cpu 0 percpu area? There's a couple of pages which
Xen will have marked RO which need to be made RW if they're being freed
back into the kernel pool. Currently I use per_cpu_var(gdt_page), but
guess that's a small offset rather than a directly usable address.

Thanks,
J

2009-01-27 19:08:01

by Brian Gerst

[permalink] [raw]
Subject: Re: unified percpu stuff

On Tue, Jan 27, 2009 at 12:50 PM, Jeremy Fitzhardinge <[email protected]> wrote:
> Brian Gerst wrote:
>>>
>>> BTW, does the initial cpu0 percpu area get reallocated and moved during
>>> boot, or does cpu0 keep using the same memory forever?
>>>
>>
>> It is reallocated in setup_per_cpu_areas().
>>
>
> Hm, OK. What's the reason we need to move cpu 0's percpu area? Its always
> seemed a bit awkward.
>
> Once it has been moved, how can I find the address of a variable in the
> original boot-time cpu 0 percpu area? There's a couple of pages which Xen
> will have marked RO which need to be made RW if they're being freed back
> into the kernel pool. Currently I use per_cpu_var(gdt_page), but guess
> that's a small offset rather than a directly usable address.

The fact that the boot cpu area is moved isn't new. It's been that
way for a while. The reason it is moved is so that extra space can be
allocated at the end for dynamic allocations. It looks like Xen just
avoided it by not loading the per-cpu GDT until after the copy. I
guess you could get by without reloading the GDT, but you would still
need to set MSR_GS_BASE for 64-bit. 32-bit happens to work because
%fs is probably initialized to the same segment as %ds. If 32-bit
were ever to switch to zero-based per-cpu like 64-bit, that assumption
would fail.

You need to add __per_cpu_load to the offset to get the address of the
variable in the original boot per-cpu area.

--
Brian Gerst

2009-01-29 10:43:26

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: unified percpu stuff

Tejun Heo wrote:
> The current tj-percpu[1] contains Brian's commit which consolidates
> load_gs_base() into switch_to_new_gdt(),

There's a bit of a bootstrap problem using switch_to_new_gdt() on the
boot CPU: it uses smp_processor_id, which ends up doing a
percpu_read(cpu_number), which naturally crashes because %gs isn't set
up yet. Perhaps switch_to_new_gdt() should take the cpu number as a
parameter (or at least a __switch_to_new_gdt variant which does).

J

2009-01-29 11:01:42

by Tejun Heo

[permalink] [raw]
Subject: Re: unified percpu stuff

Jeremy Fitzhardinge wrote:
> Tejun Heo wrote:
>> The current tj-percpu[1] contains Brian's commit which consolidates
>> load_gs_base() into switch_to_new_gdt(),
>
> There's a bit of a bootstrap problem using switch_to_new_gdt() on the
> boot CPU: it uses smp_processor_id, which ends up doing a
> percpu_read(cpu_number), which naturally crashes because %gs isn't set
> up yet. Perhaps switch_to_new_gdt() should take the cpu number as a
> parameter (or at least a __switch_to_new_gdt variant which does).

Please feel free to submit a patch. It being a pretty low level
function, I think it would be better to convert it to take cpu_number.

Thanks.

--
tejun

2009-01-29 13:43:58

by Brian Gerst

[permalink] [raw]
Subject: [PATCH] x86: pass in cpu number to switch_to_new_gdt()

Impact: cleanup, prepare for xen boot fix.

Xen needs to call this function very early to setup the GDT and
per-cpu segments. Remove the call to smp_processor_id() and just
pass in the cpu number.

Signed-off-by: Brian Gerst <[email protected]>
---
arch/x86/include/asm/processor.h | 2 +-
arch/x86/kernel/cpu/common.c | 7 +++----
arch/x86/kernel/setup_percpu.c | 2 +-
arch/x86/kernel/smpboot.c | 2 +-
arch/x86/mach-voyager/voyager_smp.c | 11 ++++++-----
5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index befa20b..1c25eb6 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -768,7 +768,7 @@ extern int sysenter_setup(void);
extern struct desc_ptr early_gdt_descr;

extern void cpu_set_gdt(int);
-extern void switch_to_new_gdt(void);
+extern void switch_to_new_gdt(int);
extern void cpu_init(void);

static inline unsigned long get_debugctlmsr(void)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 652fdc9..6eacd64 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -255,10 +255,9 @@ __u32 cleared_cpu_caps[NCAPINTS] __cpuinitdata;

/* Current gdt points %fs at the "master" per-cpu area: after this,
* it's on the real one. */
-void switch_to_new_gdt(void)
+void switch_to_new_gdt(int cpu)
{
struct desc_ptr gdt_descr;
- int cpu = smp_processor_id();

gdt_descr.address = (long)get_cpu_gdt_table(cpu);
gdt_descr.size = GDT_SIZE - 1;
@@ -993,7 +992,7 @@ void __cpuinit cpu_init(void)
* and set up the GDT descriptor:
*/

- switch_to_new_gdt();
+ switch_to_new_gdt(cpu);
loadsegment(fs, 0);

load_idt((const struct desc_ptr *)&idt_descr);
@@ -1098,7 +1097,7 @@ void __cpuinit cpu_init(void)
clear_in_cr4(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);

load_idt(&idt_descr);
- switch_to_new_gdt();
+ switch_to_new_gdt(cpu);

/*
* Set up and load the per-CPU TSS and LDT
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 0d1e7ac..ef91747 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -122,7 +122,7 @@ void __init setup_per_cpu_areas(void)
* area. Reload any changed state for the boot CPU.
*/
if (cpu == boot_cpu_id)
- switch_to_new_gdt();
+ switch_to_new_gdt(cpu);

DBG("PERCPU: cpu %4d %p\n", cpu, ptr);
}
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index f9dbcff..612d3c7 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1185,7 +1185,7 @@ out:
void __init native_smp_prepare_boot_cpu(void)
{
int me = smp_processor_id();
- switch_to_new_gdt();
+ switch_to_new_gdt(me);
/* already set me in cpu_online_mask in boot_cpu_init() */
cpumask_set_cpu(me, cpu_callout_mask);
per_cpu(cpu_state, me) = CPU_ONLINE;
diff --git a/arch/x86/mach-voyager/voyager_smp.c b/arch/x86/mach-voyager/voyager_smp.c
index 331cd6d..58c7cac 100644
--- a/arch/x86/mach-voyager/voyager_smp.c
+++ b/arch/x86/mach-voyager/voyager_smp.c
@@ -1746,12 +1746,13 @@ static void __init voyager_smp_prepare_cpus(unsigned int max_cpus)

static void __cpuinit voyager_smp_prepare_boot_cpu(void)
{
- switch_to_new_gdt();
+ int cpu = smp_processor_id();
+ switch_to_new_gdt(cpu);

- cpu_set(smp_processor_id(), cpu_online_map);
- cpu_set(smp_processor_id(), cpu_callout_map);
- cpu_set(smp_processor_id(), cpu_possible_map);
- cpu_set(smp_processor_id(), cpu_present_map);
+ cpu_set(cpu, cpu_online_map);
+ cpu_set(cpu, cpu_callout_map);
+ cpu_set(cpu, cpu_possible_map);
+ cpu_set(cpu, cpu_present_map);
}

static int __cpuinit voyager_cpu_up(unsigned int cpu)
--
1.6.1

2009-01-29 21:36:46

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH] x86: pass in cpu number to switch_to_new_gdt()

Brian Gerst wrote:
> Impact: cleanup, prepare for xen boot fix.
>
> Xen needs to call this function very early to setup the GDT and
> per-cpu segments. Remove the call to smp_processor_id() and just
> pass in the cpu number.
>

Unfortunate that still not sufficient for a Xen boot. I had to separate
the gdt reload from the segment register reload. Patches follow.

J

2009-01-29 21:37:01

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: [PATCH 1/2] x86: split loading percpu segments from loading gdt

Xen needs to be able to access percpu data from very early on. For
various reasons, it cannot also load the gdt at that time. It does,
however, have a pefectly functional gdt at that point, so there's no
pressing need to reload the gdt.

Split the function to load the segment registers off, so Xen can call
it directly.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/include/asm/processor.h | 1 +
arch/x86/kernel/cpu/common.c | 18 ++++++++++++------
2 files changed, 13 insertions(+), 6 deletions(-)

===================================================================
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -769,6 +769,7 @@

extern void cpu_set_gdt(int);
extern void switch_to_new_gdt(int);
+extern void load_percpu_segment(int);
extern void cpu_init(void);

static inline unsigned long get_debugctlmsr(void)
===================================================================
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -297,6 +297,16 @@

__u32 cleared_cpu_caps[NCAPINTS] __cpuinitdata;

+void load_percpu_segment(int cpu)
+{
+#ifdef CONFIG_X86_32
+ loadsegment(fs, __KERNEL_PERCPU);
+#else
+ loadsegment(gs, 0);
+ wrmsrl(MSR_GS_BASE, (unsigned long)per_cpu(irq_stack_union.gs_base, cpu));
+#endif
+}
+
/* Current gdt points %fs at the "master" per-cpu area: after this,
* it's on the real one. */
void switch_to_new_gdt(int cpu)
@@ -307,12 +317,8 @@
gdt_descr.size = GDT_SIZE - 1;
load_gdt(&gdt_descr);
/* Reload the per-cpu base */
-#ifdef CONFIG_X86_32
- loadsegment(fs, __KERNEL_PERCPU);
-#else
- loadsegment(gs, 0);
- wrmsrl(MSR_GS_BASE, (unsigned long)per_cpu(irq_stack_union.gs_base, cpu));
-#endif
+
+ load_percpu_segment(cpu);
}

static struct cpu_dev *cpu_devs[X86_VENDOR_NUM] = {};

2009-01-29 21:37:35

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: [PATCH 2/2] xen: setup percpu data pointers

We need to access percpu data fairly early, so set up the percpu
registers as soon as possible. We only need to load the appropriate
segment register. We already have a GDT, but its hard to change it
early because we need to manipulate the pagetable to do so, and that
hasn't been set up yet.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
arch/x86/xen/enlighten.c | 3 +++
arch/x86/xen/smp.c | 5 ++++-
2 files changed, 7 insertions(+), 1 deletion(-)

===================================================================
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1646,6 +1640,9 @@
have_vcpu_info_placement = 0;
#endif

+ /* setup percpu state */
+ load_percpu_segment(0);
+
xen_smp_init();

/* Get mfn list */
===================================================================
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -170,7 +170,8 @@

/* We've switched to the "real" per-cpu gdt, so make sure the
old memory can be recycled */
- make_lowmem_page_readwrite(&per_cpu_var(gdt_page));
+ make_lowmem_page_readwrite(__per_cpu_load +
+ (unsigned long)&per_cpu_var(gdt_page));

xen_setup_vcpu_info_placement();
}
@@ -235,6 +236,8 @@
ctxt->user_regs.ss = __KERNEL_DS;
#ifdef CONFIG_X86_32
ctxt->user_regs.fs = __KERNEL_PERCPU;
+#else
+ ctxt->gs_base_kernel = per_cpu_offset(cpu);
#endif
ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle;
ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */

2009-01-30 08:51:23

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] x86: pass in cpu number to switch_to_new_gdt()

Brian Gerst wrote:
> Impact: cleanup, prepare for xen boot fix.
>
> Xen needs to call this function very early to setup the GDT and
> per-cpu segments. Remove the call to smp_processor_id() and just
> pass in the cpu number.
>
> Signed-off-by: Brian Gerst <[email protected]>

Committed to #tj-percpu. Thanks.

--
tejun

2009-01-30 08:51:38

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 2/2] xen: setup percpu data pointers

Jeremy Fitzhardinge wrote:
> We need to access percpu data fairly early, so set up the percpu
> registers as soon as possible. We only need to load the appropriate
> segment register. We already have a GDT, but its hard to change it
> early because we need to manipulate the pagetable to do so, and that
> hasn't been set up yet.
>
> Signed-off-by: Jeremy Fitzhardinge <[email protected]>

Committed 1-2 to #tj-percpu. Thanks.

--
tejun