2006-08-30 09:06:54

by Chuck Ebbert

[permalink] [raw]
Subject: [PATCH RFC 0/6] Implement per-processor data areas for i386.

In-Reply-To: <[email protected]>

On Sun, 27 Aug 2006 01:44:17 -0700, Jeremy Fitzhardinge wrote:

> This patch implements per-processor data areas by using %gs as the
> base segment of the per-processor memory.

This changes the ABI for signals and ptrace() and that seems like
a bad idea to me.

And the way things are done now is so ingrained into the i386
kernel that I'm not sure it can be done. E.g. I found two
open-coded implementations of current, one in kernel_fpu_begin()
and one in math_state_restore().

> - It also allows per-CPU data to be allocated as each CPU is brought
> up, rather than statically allocating it based on the maximum number
> of CPUs which could be brought up.

Can you describe what it is about the way things work now that
prevents dynamic allocation?

--
Chuck


2006-08-30 09:17:43

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH RFC 0/6] Implement per-processor data areas for i386.

Chuck Ebbert wrote:
>> This patch implements per-processor data areas by using %gs as the
>> base segment of the per-processor memory.
>>
> This changes the ABI for signals and ptrace() and that seems like
> a bad idea to me.
>

I don't believe it does; it certainly shouldn't change the usermode
ABI. How do you see it changing?

> And the way things are done now is so ingrained into the i386
> kernel that I'm not sure it can be done. E.g. I found two
> open-coded implementations of current, one in kernel_fpu_begin()
> and one in math_state_restore().
>

That's OK. The current task will still be available in thread_info;
this is needed for very early CPU bringup anyway, before the PDA has
been set up. The vast majority of "get current task" operations can be
swept up by changing "current" however.

> Can you describe what it is about the way things work now that
> prevents dynamic allocation?
>

To be honest, I haven't looked at percpu.h in great detail. I was
making assumptions about how it works, but it looks like they were wrong.

J

2006-08-30 12:39:53

by Chuck Ebbert

[permalink] [raw]
Subject: Re: [PATCH RFC 0/6] Implement per-processor data areas for i386.

In-Reply-To: <[email protected]>

On Wed, 30 Aug 2006 02:17:28 -0700, Jeremy Fitzhardinge wrote:

> > This changes the ABI for signals and ptrace() and that seems like
> > a bad idea to me.
> >
>
> I don't believe it does; it certainly shouldn't change the usermode
> ABI. How do you see it changing?

Nevermind. I thought because you changed struct pt_regs in ptrace_abi.h
it meant a user ABI change.

> > And the way things are done now is so ingrained into the i386
> > kernel that I'm not sure it can be done. E.g. I found two
> > open-coded implementations of current, one in kernel_fpu_begin()
> > and one in math_state_restore().
> >
>
> That's OK. The current task will still be available in thread_info;

But they can get out of sync, e.g. when switch_to() restores the new
task's esp, the PDA still contains the old pcurrent and they don't get
synchronized until the write_pda() in __switch_to().

> To be honest, I haven't looked at percpu.h in great detail. I was
> making assumptions about how it works, but it looks like they were wrong.

Would it make any sense to replace the 'cpu' field in thread_info with
a pointer to a PDA-like structure? We could even embed the static per_cpu
data directly into that struct instead of chasing pointers...

--
Chuck

2006-08-30 12:54:45

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH RFC 0/6] Implement per-processor data areas for i386.

On Wednesday 30 August 2006 14:33, Chuck Ebbert wrote:
> In-Reply-To: <[email protected]>
>
> On Wed, 30 Aug 2006 02:17:28 -0700, Jeremy Fitzhardinge wrote:
>
> > > This changes the ABI for signals and ptrace() and that seems like
> > > a bad idea to me.
> > >
> >
> > I don't believe it does; it certainly shouldn't change the usermode
> > ABI. How do you see it changing?
>
> Nevermind. I thought because you changed struct pt_regs in ptrace_abi.h
> it meant a user ABI change.

I think he broke the ptrace ABI actually in the first patch, but only by mistake
and he promised to fix it :)

>
> > > And the way things are done now is so ingrained into the i386
> > > kernel that I'm not sure it can be done. E.g. I found two
> > > open-coded implementations of current, one in kernel_fpu_begin()
> > > and one in math_state_restore().

Perhaps those should be fixed? Is there a reason they are open coded?

> > >
> >
> > That's OK. The current task will still be available in thread_info;
>
> But they can get out of sync, e.g. when switch_to() restores the new
> task's esp, the PDA still contains the old pcurrent and they don't get
> synchronized until the write_pda() in __switch_to().

But there is neither kernel_fpu_begin nor math_state_restore inbetween.
And I think interrupts are off too.

>
> > To be honest, I haven't looked at percpu.h in great detail. I was
> > making assumptions about how it works, but it looks like they were wrong.
>
> Would it make any sense to replace the 'cpu' field in thread_info with
> a pointer to a PDA-like structure? We could even embed the static per_cpu
> data directly into that struct instead of chasing pointers...

I don't see what advantage it would have. %gs is clearly faster and shorter.

-Andi

2006-08-30 16:39:15

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH RFC 0/6] Implement per-processor data areas for i386.

Chuck Ebbert wrote:
> Nevermind. I thought because you changed struct pt_regs in ptrace_abi.h
> it meant a user ABI change.
>

Yes, that header seems pretty badly misnamed. The user-visible
structure is user_pt_regs.

> But they can get out of sync, e.g. when switch_to() restores the new
> task's esp, the PDA still contains the old pcurrent and they don't get
> synchronized until the write_pda() in __switch_to().
>

Yes, that's true, but the window is fairly small. More importantly, the
question of "what task is currently running" is fundamentally
ill-defined while you're in the middle of a context switch, so I don't
think this is a big issue. __switch_to runs on the new task's stack, so
that's effectively where current_thread_info()->task changes value
(aside from the few instructions in switch_to() between the %esp update
and the jmp to __switch_to). I could put the write_pda() earlier in
__switch_to so the window is smaller. But again, for it to make a
difference, someone would want to be using current *within* __switch_to,
which is just silly.

>> To be honest, I haven't looked at percpu.h in great detail. I was
>> making assumptions about how it works, but it looks like they were wrong.
>>
>
> Would it make any sense to replace the 'cpu' field in thread_info with
> a pointer to a PDA-like structure? We could even embed the static per_cpu
> data directly into that struct instead of chasing pointers...
>

I don't think so. The whole point is to make the pda easily accessible
with simple addressing modes based on %gs:. I have been wondering if
we can modify the percpu mechanism to get the linker to construct the
layout of the pda, so that all the existing percpu stuff can be
transparently moved into the pda and accessed efficiently. I think it
would be pretty tricky to get it all working though...

But the PDA isn't really intended for *all* per-cpu data; its mostly for
stuff which is accessed often, or needs to be quickly and easily
accessibly. My specific motivation is to use the PDA for easy access to
Xen per-cpu data, which needs to be accessible with short instructions
which can be inlined.

J

2006-08-30 16:48:42

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH RFC 0/6] Implement per-processor data areas for i386.


>
> I don't think so. The whole point is to make the pda easily accessible
> with simple addressing modes based on %gs:. I have been wondering if
> we can modify the percpu mechanism to get the linker to construct the
> layout of the pda, so that all the existing percpu stuff can be
> transparently moved into the pda and accessed efficiently. I think it
> would be pretty tricky to get it all working though...

I tried that once on x86-64, but it wasn't possible because the linker
is missing the right relocations. It has something on the first look similar
for __thread data in user space, but it wasn't usable for the kernel.

Even with a single indirection it is still far more efficient than a
array lookup.

-Andi

2006-08-30 17:13:44

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH RFC 0/6] Implement per-processor data areas for i386.

Andi Kleen wrote:
> I tried that once on x86-64, but it wasn't possible because the linker
> is missing the right relocations. It has something on the first look similar
> for __thread data in user space, but it wasn't usable for the kernel.
>
The other difficulty is that you can't take the addresses of things in
the pda and pass them around, which happens a lot. It would be another
matter if you could add an attribute to a pointer which told gcc "always
use a %gs override for indirecting through this pointer", though that
would still require some type qualifier on any pointer which holds the
value.

(Hm, it would be interesting to see if we could possibly use the code
generated by gcc for TLS variables...)

> Even with a single indirection it is still far more efficient than a
> array lookup.

Yes. Even in the Xen case it will be useful to have a pointer to the
vcpu structure, at least until Xen can be convinced to put the vcpu
structure in the PDA itself.

J

2006-08-30 17:32:46

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH RFC 0/6] Implement per-processor data areas for i386.

On Wednesday 30 August 2006 19:13, Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> > I tried that once on x86-64, but it wasn't possible because the linker
> > is missing the right relocations. It has something on the first look similar
> > for __thread data in user space, but it wasn't usable for the kernel.
> >
> The other difficulty is that you can't take the addresses of things in
> the pda and pass them around, which happens a lot.

The user space __thread works around this by always storing the address at
offset 0. Kernel does it similar, except it's not at offset 0.


> (Hm, it would be interesting to see if we could possibly use the code
> generated by gcc for TLS variables...)

I tried once on 64bit and it wasn't possible for various reasons.

-Andi