2008-06-04 00:30:31

by Mike Travis

Subject: [PATCH 0/4] percpu: Optimize percpu accesses


This patchset provides the following:

* Generic: Percpu infrastructure to rebase the per cpu area to zero

This makes it possible to access the percpu variables through a
local register instead of having to go through a table on node 0
to find the cpu-specific offsets. It would also allow atomic
operations on percpu variables, reducing the locking required.
A new config var, HAVE_ZERO_BASED_PER_CPU, indicates to the
generic code that the arch has this new basing.

* x86_64: Fold pda into per cpu area

Declare the pda as a per cpu variable. This will move the pda
area to an address accessible by the x86_64 per cpu macros.
Subtracting __per_cpu_start makes the offset relative to the
beginning of the per cpu area. Since %gs points to the pda, it
then also points to the per cpu variables, which can be
accessed thusly:

%gs:[&per_cpu_xxxx - __per_cpu_start]

* x86_64: Rebase per cpu variables to zero

Take advantage of the zero-based per cpu area provided above.
Then we can directly use the x86_32 percpu operations. x86_32
offsets %fs by __per_cpu_start. x86_64 has %gs pointing directly
to the pda and the per cpu area thereby allowing access to the
pda with the x86_64 pda operations and access to the per cpu
variables using x86_32 percpu operations.
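
The addressing scheme described above can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel implementation: `struct percpu_area`, `percpu_base`, `set_cpu()`, `sketch_cpu_ptr()` and `bump_counter()` are hypothetical names, and a plain pointer stands in for the %gs segment base. It shows that once offsets are counted from the start of the area, a single base plus a zero-based offset reaches both the pda (at offset 0) and ordinary per cpu variables.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

/*
 * Hypothetical contents of one CPU's area. Offsets are zero-based:
 * the first member sits at offset 0, as the pda would after the fold.
 */
struct percpu_area {
    long pda_field;   /* stand-in for a pda member */
    int  counter;     /* stand-in for an ordinary per cpu variable */
};

static struct percpu_area areas[NR_CPUS];

/* Stand-in for the segment base held in %gs (an assumption of this sketch). */
static char *percpu_base = (char *)&areas[0];

/* Pretend to run on another CPU by repointing the base. */
static void set_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    percpu_base = (char *)&areas[cpu];
}

/* base + zero-based offset, analogous to %gs:[&per_cpu_xxxx - __per_cpu_start] */
#define sketch_cpu_ptr(type, member) \
    ((type *)(percpu_base + offsetof(struct percpu_area, member)))

/* Increment the current CPU's counter without any lookup table. */
static int bump_counter(void)
{
    return ++*sketch_cpu_ptr(int, counter);
}
```

Calling `set_cpu(1)` and then `bump_counter()` touches only `areas[1]`; no table of per-cpu offsets is consulted, which is the point of rebasing the area to zero.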


Based on linux-2.6.tip

Signed-off-by: Christoph Lameter <[email protected]>
Signed-off-by: Mike Travis <[email protected]>
---

--


2008-06-04 10:19:28

by Jeremy Fitzhardinge

Subject: [PATCH] x86: collapse the various size-dependent percpu accessors together

We can use gcc's %z modifier to emit the appropriate size suffix for
an instruction, so we don't need to duplicate the asm statement for
each size.

Signed-off-by: Jeremy Fitzhardinge <[email protected]>
---
include/asm-x86/percpu.h | 56 +++-------------------------------------------
1 file changed, 4 insertions(+), 52 deletions(-)

===================================================================
--- a/include/asm-x86/percpu.h
+++ b/include/asm-x86/percpu.h
@@ -75,22 +75,10 @@
} \
switch (sizeof(var)) { \
case 1: \
- asm(op "b %1,"__percpu_seg"%0" \
- : "+m" (var) \
- : "ri" ((T__)val)); \
- break; \
case 2: \
- asm(op "w %1,"__percpu_seg"%0" \
- : "+m" (var) \
- : "ri" ((T__)val)); \
- break; \
case 4: \
- asm(op "l %1,"__percpu_seg"%0" \
- : "+m" (var) \
- : "ri" ((T__)val)); \
- break; \
case 8: \
- asm(op "q %1,"__percpu_seg"%0" \
+ asm(op "%z0 %1,"__percpu_seg"%0" \
: "+m" (var) \
: "ri" ((T__)val)); \
break; \
@@ -103,22 +91,10 @@
typeof(var) ret__; \
switch (sizeof(var)) { \
case 1: \
- asm(op "b "__percpu_seg"%1,%0" \
- : "=r" (ret__) \
- : "m" (var)); \
- break; \
case 2: \
- asm(op "w "__percpu_seg"%1,%0" \
- : "=r" (ret__) \
- : "m" (var)); \
- break; \
case 4: \
- asm(op "l "__percpu_seg"%1,%0" \
- : "=r" (ret__) \
- : "m" (var)); \
- break; \
case 8: \
- asm(op "q "__percpu_seg"%1,%0" \
+ asm(op "%z1 "__percpu_seg"%1,%0" \
: "=r" (ret__) \
: "m" (var)); \
break; \
@@ -131,19 +107,10 @@
({ \
switch (sizeof(var)) { \
case 1: \
- asm(op "b "__percpu_seg"%0" \
- : : "m"(var)); \
- break; \
case 2: \
- asm(op "w "__percpu_seg"%0" \
- : : "m"(var)); \
- break; \
case 4: \
- asm(op "l "__percpu_seg"%0" \
- : : "m"(var)); \
- break; \
case 8: \
- asm(op "q "__percpu_seg"%0" \
+ asm(op "%z0 "__percpu_seg"%0" \
: : "m"(var)); \
break; \
default: __bad_percpu_size(); \
@@ -155,25 +122,10 @@
typeof(var) prev; \
switch (sizeof(var)) { \
case 1: \
- asm("cmpxchgb %b1, "__percpu_seg"%2" \
- : "=a"(prev) \
- : "q"(new), "m"(var), "0"(old) \
- : "memory"); \
- break; \
case 2: \
- asm("cmpxchgw %w1, "__percpu_seg"%2" \
- : "=a"(prev) \
- : "r"(new), "m"(var), "0"(old) \
- : "memory"); \
- break; \
case 4: \
- asm("cmpxchgl %k1, "__percpu_seg"%2" \
- : "=a"(prev) \
- : "r"(new), "m"(var), "0"(old) \
- : "memory"); \
- break; \
case 8: \
- asm("cmpxchgq %1, "__percpu_seg"%2" \
+ asm("cmpxchg%z1 %1, "__percpu_seg"%2" \
: "=a"(prev) \
: "r"(new), "m"(var), "0"(old) \
: "memory"); \

2008-06-04 10:46:40

by Jeremy Fitzhardinge

Subject: Re: [PATCH] x86: collapse the various size-dependent percpu accessors together

Jeremy Fitzhardinge wrote:
> We can use gcc's %z modifier to emit the appropriate size suffix for
> an instruction, so we don't need to duplicate the asm statement for
> each size.

Nah, it's a disaster. Drop this one.

J

2008-06-04 11:30:18

by Ingo Molnar

Subject: Re: [PATCH] x86: collapse the various size-dependent percpu accessors together


* Jeremy Fitzhardinge <[email protected]> wrote:

> Jeremy Fitzhardinge wrote:
>> We can use gcc's %z modifier to emit the appropriate size suffix for
>> an instruction, so we don't need to duplicate the asm statement for
>> each size.
>
> Nah, it's a disaster. Drop this one.

hm, what's the problem with it? What you are trying to do here looks
like a nice cleanup - assuming it results in the same instructions
emitted ;-)

Ingo

2008-06-04 12:12:21

by Jeremy Fitzhardinge

Subject: Re: [PATCH] x86: collapse the various size-dependent percpu accessors together

Ingo Molnar wrote:
> * Jeremy Fitzhardinge <[email protected]> wrote:
>
>
>> Jeremy Fitzhardinge wrote:
>>
>>> We can use gcc's %z modifier to emit the appropriate size suffix for
>>> an instruction, so we don't need to duplicate the asm statement for
>>> each size.
>>>
>> Nah, it's a disaster. Drop this one.
>>
>
> hm, what's the problem with it? What you are trying to do here looks
> like a nice cleanup - assuming it results in the same instructions
> emitted ;-)

Yes, would have been lovely. But gcc emits junk:

CC arch/x86/xen/enlighten.o
{standard input}: Assembler messages:
{standard input}:637: Error: no such instruction: `movll %gs:per_cpu__xen_vcpu(%rip),%rax'
{standard input}:655: Error: no such instruction: `movll %gs:per_cpu__xen_vcpu(%rip),%rax'
{standard input}:671: Error: no such instruction: `movll %gs:per_cpu__xen_vcpu(%rip),%rax'
{standard input}:682: Error: no such instruction: `movll %gs:per_cpu__xen_vcpu(%rip),%rax'
{standard input}:783: Error: no such instruction: `movll %gs:per_cpu__pda+8(%rip),%rbx'
{standard input}:834: Error: no such instruction: `movll %gs:per_cpu__xen_mc_irq_flags(%rip),%rdi'
{standard input}:901: Error: no such instruction: `movll %gs:per_cpu__pda+8(%rip),%rbx'
{standard input}:978: Error: no such instruction: `movll %gs:per_cpu__xen_mc_irq_flags(%rip),%rdi'
{standard input}:1064: Error: no such instruction: `movll %gs:per_cpu__pda+8(%rip),%rbx'
{standard input}:1110: Error: no such instruction: `movll %gs:per_cpu__xen_mc_irq_flags(%rip),%rdi'
...
CC arch/x86/vdso/vclock_gettime.o
{standard input}: Assembler messages:
{standard input}:75: Error: suffix or operands invalid for `movs'
(all over the place)


I tried a version to do 64-bit accesses with an explicit "movq" to solve
the "movll" problem, but it generates "movs" on occasion and that was
the point I gave up.

J
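
For reference, the size-to-suffix mapping that the explicit per-size cases encode can be sketched in plain C. This is a hedged illustration only: `size_suffix()` and `format_mnemonic()` are hypothetical helpers, not kernel or gcc code. The assembler errors above show an 8-byte operand coming out as "movll" rather than "movq", which is where the %z expansion departs from this table.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * AT&T-syntax suffix for an integer operand of the given width,
 * matching the four cases spelled out by hand in the percpu macros.
 * Returns NULL for unsupported sizes (the __bad_percpu_size() case).
 */
static const char *size_suffix(size_t size)
{
    switch (size) {
    case 1: return "b";
    case 2: return "w";
    case 4: return "l";
    case 8: return "q";
    default: return NULL;
    }
}

/*
 * Build "op" + suffix in buf. Returns the mnemonic length as reported
 * by snprintf, or -1 if the size is unsupported.
 */
static int format_mnemonic(const char *op, size_t size, char *buf, size_t len)
{
    const char *suffix = size_suffix(size);

    if (suffix == NULL)
        return -1;
    return snprintf(buf, len, "%s%s", op, suffix);
}
```

With this table, `format_mnemonic("mov", 8, ...)` yields the mnemonic the assembler accepts for a 64-bit move, whereas the failed build shows the compiler producing a two-letter "ll" suffix for the same operand size.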

2008-06-10 17:22:14

by Christoph Lameter

Subject: Re: [PATCH] x86: collapse the various size-dependent percpu accessors together

> I tried a version to do 64-bit accesses with an explicit "movq" to solve the
> "movll" problem, but it generates "movs" on occasion and that was the point I
> gave up.

Shucks. Would have been a great approach.