2009-12-14 22:06:35

by Christoph Lameter

[permalink] [raw]
Subject: [this_cpu_xx V7 0/8] Per cpu atomics in core allocators and cleanup

Leftovers from the earlier patchset. Mostly applications of per cpu counters
to core components.

After this patchset there will be only one user of local_t left: Mathieu's
trace ringbuffer. Does it really need these ops?

V6->V7
- Drop patches merged in 2.6.33 merge cycle
- Drop risky slub patches

V5->V6:
- Drop patches merged by Tejun.
- Drop irqless slub fastpath for now.
- Patches against Tejun percpu for-next branch.

V4->V5:
- Avoid setup_per_cpu_area() modifications and fold the remainder of the
patch into the page allocator patch.
- Irq disable / per cpu ptr fixes for page allocator patch.

V3->V4:
- Fix various macro definitions.
- Provide experimental percpu based fastpath that does not disable
interrupts for SLUB.

V2->V3:
- Available via git tree against latest upstream from
git://git.kernel.org/pub/scm/linux/kernel/git/christoph/percpu.git linus
- Rework SLUB per cpu operations. Get rid of dynamic DMA slab creation
for CONFIG_ZONE_DMA
- Create fallback framework so that 64 bit ops on 32 bit platforms
can fallback to the use of preempt or interrupt disable. 64 bit
platforms can use 64 bit atomic per cpu ops.

V1->V2:
- Various minor fixes
- Add SLUB conversion
- Add Page allocator conversion
- Patch against the git tree of today


2009-12-15 06:37:50

by Pekka Enberg

[permalink] [raw]
Subject: Re: [this_cpu_xx V7 0/8] Per cpu atomics in core allocators and cleanup

On Mon, 2009-12-14 at 16:03 -0600, Christoph Lameter wrote:
> Leftovers from the earlier patchset. Mostly applications of per cpu counters
> to core components.

I can pick up the SLUB patches. Is that OK with you, Tejun?

2009-12-15 06:46:10

by Tejun Heo

[permalink] [raw]
Subject: Re: [this_cpu_xx V7 0/8] Per cpu atomics in core allocators and cleanup

On 12/15/2009 03:37 PM, Pekka Enberg wrote:
> On Mon, 2009-12-14 at 16:03 -0600, Christoph Lameter wrote:
>> Leftovers from the earlier patchset. Mostly applications of per cpu counters
>> to core components.
>
> I can pick up the SLUB patches. Is that OK with you, Tejun?

Yeap, now that the this_cpu stuff is upstream, there is no reason to route
them through the percpu tree. I'll be happy to pick up whatever is left.

Thanks.

--
tejun

2009-12-15 14:51:51

by Christoph Lameter

[permalink] [raw]
Subject: Re: [this_cpu_xx V7 0/8] Per cpu atomics in core allocators and cleanup

Small fixup patch to the slub patches:

Subject: Update remaining reference to get_cpu_slab

If CONFIG_SLUB_STATS is set, some additional code is compiled that was not
updated by the earlier conversion, since it is a recent addition to slub.

Signed-off-by: Christoph Lameter <[email protected]>

---
mm/slub.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2009-12-15 08:47:44.000000000 -0600
+++ linux-2.6/mm/slub.c 2009-12-15 08:48:03.000000000 -0600
@@ -4221,7 +4221,7 @@ static void clear_stat(struct kmem_cache
int cpu;

for_each_online_cpu(cpu)
- get_cpu_slab(s, cpu)->stat[si] = 0;
+ per_cpu_ptr(s->cpu_slab, cpu)->stat[si] = 0;
}

#define STAT_ATTR(si, text) \

2009-12-15 17:06:33

by Mel Gorman

[permalink] [raw]
Subject: Re: [this_cpu_xx V7 0/8] Per cpu atomics in core allocators and cleanup

On Mon, Dec 14, 2009 at 04:03:20PM -0600, Christoph Lameter wrote:
> Leftovers from the earlier patchset. Mostly applications of per cpu counters
> to core components.
>
> After this patchset there will be only one user of local_t left: Mathieu's
> trace ringbuffer. Does it really need these ops?
>

What kernel are these patches based on? They do not apply cleanly, and
when fixed up, they do not build against 2.6.32.

--
Mel Gorman
Part-time PhD Student, University of Limerick
Linux Technology Center, IBM Dublin Software Lab

2009-12-15 17:43:12

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [this_cpu_xx V7 0/8] Per cpu atomics in core allocators and cleanup

* Christoph Lameter ([email protected]) wrote:
> Leftovers from the earlier patchset. Mostly applications of per cpu counters
> to core components.
>
> After this patchset there will be only one user of local_t left: Mathieu's
> trace ringbuffer. Does it really need these ops?
>

Besides my own ring buffer implementation in LTTng, at least Steven's
kernel/trace/ring_buffer.c (in mainline) uses this too. We would need a
way to map directly to the same resulting behavior with per-cpu
variables.

In LTTng, I use local_cmpxchg, local_read, local_add and, in some
setups, local_add_return to manage the write counter and commit
counters. These per-cpu counters are kept in per-cpu buffer management
data allocated for each data collection "channel".

The current way I allocate this structure for all cpus is:

chan->buf = alloc_percpu(struct ltt_chanbuf);

But note that each struct ltt_chanbuf contains a pointer to an array
holding the per-sub-buffer commit counters for the given buffer:

struct commit_counters {
	local_t cc;
	local_t cc_sb;		/* Incremented _once_ at sb switch */
	local_t events;		/* Event count */
};

struct ltt_chanbuf {
	struct ltt_chanbuf_alloc a;	/* Parent. First field. */
	/* First 32 bytes cache-hot cacheline */
	local_t offset;			/* Current offset in the buffer */
	struct commit_counters *commit_count;
					/* Commit count per sub-buffer */
	atomic_long_t consumed;		/*
					 * Current offset in the buffer
					 * standard atomic access (shared)
					 */
	....

So I think accessing the "local_t offset" through percpu pointers should
be fine if I allocate struct ltt_chanbuf through the per cpu API.
However, I wonder how to deal with the commit_count counters, because
there is an indirection level.

Thanks,

Mathieu

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-12-16 00:57:41

by Tejun Heo

[permalink] [raw]
Subject: Re: [this_cpu_xx V7 0/8] Per cpu atomics in core allocators and cleanup

Hello, Mathieu.

On 12/16/2009 02:43 AM, Mathieu Desnoyers wrote:
> So I think accessing the "local_t offset" through percpu pointers should
> be fine if I allocate struct ltt_chanbuf through the per cpu API.
> However, I wonder how to deal with the commit_count counters, because
> there is an indirection level.

Are they different in number for different cpus?

Thanks.

--
tejun

2009-12-16 01:40:41

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [this_cpu_xx V7 0/8] Per cpu atomics in core allocators and cleanup

* Tejun Heo ([email protected]) wrote:
> Hello, Mathieu.
>
> On 12/16/2009 02:43 AM, Mathieu Desnoyers wrote:
> > So I think accessing the "local_t offset" through percpu pointers should
> > be fine if I allocate struct ltt_chanbuf through the per cpu API.
> > However, I wonder how to deal with the commit_count counters, because
> > there is an indirection level.
>
> Are they different in number for different cpus?

Nope, there is the same number of sub-buffers for each per-cpu buffer.
I just want to see if supplementary indirections are allowed after
dereferencing the per-cpu pointer?

Thanks,

Mathieu

>
> Thanks.
>
> --
> tejun

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-12-16 01:45:06

by Tejun Heo

[permalink] [raw]
Subject: Re: [this_cpu_xx V7 0/8] Per cpu atomics in core allocators and cleanup

Hello,

On 12/16/2009 10:40 AM, Mathieu Desnoyers wrote:
> Nope, there is the same number of sub-buffers for each per-cpu buffer.
> I just want to see if supplementary indirections are allowed after
> dereferencing the per-cpu pointer?

Hmmm... you can store a percpu pointer in a variable. If there is the
same number of commit_count entries for each cpu, they can be allocated
using the percpu allocator and their pointers can be stored, offset and
dereferenced. Would that be enough?
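
A minimal sketch of the "stored, offset and dereferenced" part, assuming
the counters are allocated as one percpu array of n_sb entries and the
percpu pointer is kept in the channel (chan, n_sb, cpu and sb_idx are
illustrative names, not existing LTTng code):

	struct commit_counters *cc;

	/* a given cpu's counters for sub-buffer sb_idx */
	cc = per_cpu_ptr(chan->commit_count, cpu) + sb_idx;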

Thanks.

--
tejun

2009-12-16 14:46:17

by Christoph Lameter

[permalink] [raw]
Subject: Re: [this_cpu_xx V7 0/8] Per cpu atomics in core allocators and cleanup

On Tue, 15 Dec 2009, Mel Gorman wrote:

> What kernel are these patches based on? They do not cleanly apply and
> when fixed up, they do not build against 2.6.32.

Upstream. Linus' tree.

2009-12-16 21:37:50

by Christoph Lameter

[permalink] [raw]
Subject: Re: [this_cpu_xx V7 0/8] Per cpu atomics in core allocators and cleanup

With today's git tree I get a reject. Will rediff when -rc1 is out.

2009-12-17 13:40:07

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [this_cpu_xx V7 0/8] Per cpu atomics in core allocators and cleanup

* Tejun Heo ([email protected]) wrote:
> Hello,
>
> On 12/16/2009 10:40 AM, Mathieu Desnoyers wrote:
> > Nope, there is the same number of sub-buffers for each per-cpu buffer.
> > I just want to see if supplementary indirections are allowed after
> > dereferencing the per-cpu pointer?
>
> Hmmm... you can store a percpu pointer in a variable. If there is the
> same number of commit_count entries for each cpu, they can be allocated
> using the percpu allocator and their pointers can be stored, offset and
> dereferenced. Would that be enough?

Yes, I think I could allocate, from the channel structure perspective:

- A percpu pointer to the per-cpu buffer structures
- A percpu pointer to the per-cpu commit counters.

This should fix my problem. The main change here is that the pointer to
the commit counters would not be located in the per-cpu buffer
structures anymore.
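
For illustration, the reorganized channel could then look roughly like
this (simplified sketch; the real structures carry more fields, and n_sb
stands for the number of sub-buffers per buffer):

	struct ltt_chan {
		struct ltt_chanbuf *buf;		/* percpu */
		struct commit_counters *commit_count;	/* percpu, n_sb per cpu */
		...
	};

	chan->buf = alloc_percpu(struct ltt_chanbuf);
	chan->commit_count = __alloc_percpu(n_sb * sizeof(struct commit_counters),
					    __alignof__(struct commit_counters));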

However, I would need:

this_cpu_cmpxchg(scalar, oldv, newv)
(maps to x86 cmpxchg)

this_cpu_add_return(scalar, value)
(maps to x86 xadd)

too. Is that a planned addition?

(while we are at it, we might as well add the xchg instruction,
although it has an implied LOCK prefix on x86).
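
To make the need concrete, the write path could then look along these
lines (hypothetical sketch, assuming the offset and commit counters
become plain longs reachable through the percpu pointers):

	long old, new, count;

	/* reserve len bytes in this cpu's buffer */
	do {
		old = this_cpu_read(chan->buf->offset);
		new = old + len;
	} while (this_cpu_cmpxchg(chan->buf->offset, old, new) != old);

	/* commit: bump this cpu's commit counter for sub-buffer sb_idx */
	count = this_cpu_add_return(chan->commit_count[sb_idx].cc, len);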

Thanks,

Mathieu

>
> Thanks.
>
> --
> tejun

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-12-17 19:30:01

by Christoph Lameter

[permalink] [raw]
Subject: Re: [this_cpu_xx V7 0/8] Per cpu atomics in core allocators and cleanup

> However, I would need:
>
> this_cpu_cmpxchg(scalar, oldv, newv)
> (maps to x86 cmpxchg)
>
> this_cpu_add_return(scalar, value)
> (maps to x86 xadd)
>
> too. Is that a planned addition?

It was not necessary. It's easy to add, though.

> (while we are at it, we might as well add the xchg instruction,
> although it has an implied LOCK prefix on x86).

Well, yeah, that's a thorny one. One could use the cmpxchg instead?

2009-12-17 20:26:31

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [this_cpu_xx V7 0/8] Per cpu atomics in core allocators and cleanup

* Christoph Lameter ([email protected]) wrote:
> > However, I would need:
> >
> > this_cpu_cmpxchg(scalar, oldv, newv)
> > (maps to x86 cmpxchg)
> >
> > this_cpu_add_return(scalar, value)
> > (maps to x86 xadd)
> >
> > too. Is that a planned addition?
>
> It was not necessary. It's easy to add, though.
>
> > (while we are at it, we might as well add the xchg instruction,
> > although it has an implied LOCK prefix on x86).
>
> Well, yeah, that's a thorny one. One could use the cmpxchg instead?

Yes, although maybe it would make sense to encapsulate it in an xchg
primitive anyway, in case some architecture has a better xchg than x86.
For instance, powerpc, with its linked load/store conditional, can skip
a comparison for xchg that's otherwise required for cmpxchg.

Some quick tests on my Intel Xeon E5405:

local cmpxchg: 14 cycles
xchg: 18 cycles

So yes, indeed, the non-LOCK prefixed local cmpxchg seems a bit faster
than the xchg, given the latter has an implied LOCK prefix.

Code used for local cmpxchg:
	old = var;
	do {
		ret = cmpxchg_local(&var, old, 4);
		if (likely(ret == old))
			break;
		old = ret;
	} while (1);
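
For comparison, the same retry loop wrapped up as a general xchg-style
helper would look like this (illustrative sketch only; my_xchg_local is
not an existing primitive):

	static inline unsigned long my_xchg_local(unsigned long *ptr,
						  unsigned long new)
	{
		unsigned long old = *ptr;
		unsigned long ret;

		/* retry until no other update slips in between read and cmpxchg */
		while ((ret = cmpxchg_local(ptr, old, new)) != old)
			old = ret;
		return old;
	}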

Thanks,

Mathieu

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-12-17 20:44:49

by Christoph Lameter

[permalink] [raw]
Subject: Re: [this_cpu_xx V7 0/8] Per cpu atomics in core allocators and cleanup

On Thu, 17 Dec 2009, Mathieu Desnoyers wrote:

> Some quick tests on my Intel Xeon E5405:
>
> local cmpxchg: 14 cycles
> xchg: 18 cycles
>
> So yes, indeed, the non-LOCK prefixed local cmpxchg seems a bit faster
> than the xchg, given the latter has an implied LOCK prefix.
>
> Code used for local cmpxchg:
> 	old = var;
> 	do {
> 		ret = cmpxchg_local(&var, old, 4);
> 		if (likely(ret == old))
> 			break;
> 		old = ret;
> 	} while (1);
>

Great. Could you also put that into "patch-format"?

2009-12-18 00:14:13

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [this_cpu_xx V7 0/8] Per cpu atomics in core allocators and cleanup

* Christoph Lameter ([email protected]) wrote:
> On Thu, 17 Dec 2009, Mathieu Desnoyers wrote:
>
> > Some quick tests on my Intel Xeon E5405:
> >
> > local cmpxchg: 14 cycles
> > xchg: 18 cycles
> >
> > So yes, indeed, the non-LOCK prefixed local cmpxchg seems a bit faster
> > than the xchg, given the latter has an implied LOCK prefix.
> >
> > Code used for local cmpxchg:
> > 	old = var;
> > 	do {
> > 		ret = cmpxchg_local(&var, old, 4);
> > 		if (likely(ret == old))
> > 			break;
> > 		old = ret;
> > 	} while (1);
> >
>
> Great. Could you also put that into "patch-format"?
>

Sure, can you point me to a git tree I should work on top of, which
includes the per cpu infrastructure to extend?

Mathieu


--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-12-18 00:28:49

by Christoph Lameter

[permalink] [raw]
Subject: Re: [this_cpu_xx V7 0/8] Per cpu atomics in core allocators and cleanup

On Thu, 17 Dec 2009, Mathieu Desnoyers wrote:

> Sure, can you point me to a git tree I should work on top of, which
> includes the per cpu infrastructure to extend?

Linus' git tree contains what you need. I have an early draft here of a
patch to implement the generic portions. It is unfinished. I hope I have
time to complete it. Feel free to complete it, but keep me posted so that
I won't repeat anything you do.

The modifications to asm-generic/cmpxchg-local won't work since we need to
do this_cpu_ptr() pointer calculations within the protected sections. I
was in the middle of getting rid of it when I found it was time to go
home...


---
include/asm-generic/cmpxchg-local.h | 24 ++++-
include/linux/percpu.h | 151 ++++++++++++++++++++++++++++++++++++
2 files changed, 169 insertions(+), 6 deletions(-)

Index: linux-2.6/include/asm-generic/cmpxchg-local.h
===================================================================
--- linux-2.6.orig/include/asm-generic/cmpxchg-local.h 2009-12-17 17:44:01.000000000 -0600
+++ linux-2.6/include/asm-generic/cmpxchg-local.h 2009-12-17 17:46:31.000000000 -0600
@@ -6,13 +6,12 @@
extern unsigned long wrong_size_cmpxchg(volatile void *ptr);

/*
- * Generic version of __cmpxchg_local (disables interrupts). Takes an unsigned
- * long parameter, supporting various types of architectures.
+ * Generic version of __cmpxchg_local.
*/
-static inline unsigned long __cmpxchg_local_generic(volatile void *ptr,
+static inline unsigned long ____cmpxchg_local_generic(volatile void *ptr,
unsigned long old, unsigned long new, int size)
{
- unsigned long flags, prev;
+ unsigned long prev;

/*
* Sanity checking, compile-time.
@@ -20,7 +19,6 @@ static inline unsigned long __cmpxchg_lo
if (size == 8 && sizeof(unsigned long) != 8)
wrong_size_cmpxchg(ptr);

- local_irq_save(flags);
switch (size) {
case 1: prev = *(u8 *)ptr;
if (prev == old)
@@ -41,11 +39,25 @@ static inline unsigned long __cmpxchg_lo
default:
wrong_size_cmpxchg(ptr);
}
- local_irq_restore(flags);
return prev;
}

/*
+ * Generic version of __cmpxchg_local (disables interrupts). Takes an unsigned
+ * long parameter, supporting various types of architectures.
+ */
+static inline unsigned long __cmpxchg_local_generic(volatile void *ptr,
+ unsigned long old, unsigned long new, int size)
+{
+ unsigned long flags, r;
+
+ local_irq_save(flags);
+ r = ____cmpxchg_local_generic(ptr, old, new, size);
+ local_irq_restore(flags);
+ return r;
+}
+
+/*
* Generic version of __cmpxchg64_local. Takes an u64 parameter.
*/
static inline u64 __cmpxchg64_local_generic(volatile void *ptr,
Index: linux-2.6/include/linux/percpu.h
===================================================================
--- linux-2.6.orig/include/linux/percpu.h 2009-12-17 17:31:10.000000000 -0600
+++ linux-2.6/include/linux/percpu.h 2009-12-17 18:23:02.000000000 -0600
@@ -443,6 +443,48 @@ do { \
# define this_cpu_xor(pcp, val) __pcpu_size_call(this_cpu_or_, (pcp), (val))
#endif

+#ifndef this_cpu_cmpxchg
+# ifndef this_cpu_cmpxchg_1
+# define this_cpu_cmpxchg_1(pcp, old, new) this_cpu_cmpxchg_generic((pcp), (old), (new), 1)
+# endif
+# ifndef this_cpu_cmpxchg_2
+# define this_cpu_cmpxchg_2(pcp, old, new) this_cpu_cmpxchg_generic((pcp), (old), (new), 2)
+# endif
+# ifndef this_cpu_cmpxchg_4
+# define this_cpu_cmpxchg_4(pcp, old, new) this_cpu_cmpxchg_generic((pcp), (old), (new), 4)
+# endif
+# ifndef this_cpu_cmpxchg_8
+# define this_cpu_cmpxchg_8(pcp, old, new) this_cpu_cmpxchg_generic((pcp), (old), (new), 8)
+# endif
+# define this_cpu_cmpxchg(pcp, old, new) __pcpu_size_call_return(this_cpu_cmpxchg_, (pcp), (old), (new))
+#endif
+
+#define _this_cpu_generic_xchg_op(pcp, val) \
+ ({ \
+ typeof(pcp) __tmp_var__; \
+ preempt_disable(); \
+ __tmp_var__ = __this_cpu_read(pcp); \
+ __this_cpu_write(pcp, (val)); \
+ preempt_enable(); \
+ __tmp_var__; \
+ })
+
+#ifndef this_cpu_xchg
+# ifndef this_cpu_xchg_1
+# define this_cpu_xchg_1(pcp, val) _this_cpu_generic_xchg_op((pcp), (val))
+# endif
+# ifndef this_cpu_xchg_2
+# define this_cpu_xchg_2(pcp, val) _this_cpu_generic_xchg_op((pcp), (val))
+# endif
+# ifndef this_cpu_xchg_4
+# define this_cpu_xchg_4(pcp, val) _this_cpu_generic_xchg_op((pcp), (val))
+# endif
+# ifndef this_cpu_xchg_8
+# define this_cpu_xchg_8(pcp, val) _this_cpu_generic_xchg_op((pcp), (val))
+# endif
+# define this_cpu_xchg(pcp, val) __pcpu_size_call_return(this_cpu_xchg_, (pcp), (val))
+#endif
+
/*
* Generic percpu operations that do not require preemption handling.
* Either we do not care about races or the caller has the
@@ -594,6 +636,46 @@ do { \
# define __this_cpu_xor(pcp, val) __pcpu_size_call(__this_cpu_xor_, (pcp), (val))
#endif

+#ifndef __this_cpu_cmpxchg
+# ifndef __this_cpu_cmpxchg_1
+# define __this_cpu_cmpxchg_1(pcp, old, new) ____cmpxchg_local_generic(__this_cpu_ptr(&(pcp)), (old), (new), 1)
+# endif
+# ifndef __this_cpu_cmpxchg_2
+# define __this_cpu_cmpxchg_2(pcp, old, new) ____cmpxchg_local_generic(__this_cpu_ptr(&(pcp)), (old), (new), 2)
+# endif
+# ifndef __this_cpu_cmpxchg_4
+# define __this_cpu_cmpxchg_4(pcp, old, new) ____cmpxchg_local_generic(__this_cpu_ptr(&(pcp)), (old), (new), 4)
+# endif
+# ifndef __this_cpu_cmpxchg_8
+# define __this_cpu_cmpxchg_8(pcp, old, new) ____cmpxchg_local_generic(__this_cpu_ptr(&(pcp)), (old), (new), 8)
+# endif
+# define __this_cpu_cmpxchg(pcp, old, new) __pcpu_size_call_return(__this_cpu_cmpxchg_, (pcp), (old), (new))
+#endif
+
+#define __this_cpu_generic_xchg_op(pcp, val) \
+ ({ \
+ typeof(pcp) __tmp_var__; \
+ __tmp_var__ = __this_cpu_read(pcp); \
+ __this_cpu_write((pcp), (val)); \
+ __tmp_var__; \
+ })
+
+#ifndef __this_cpu_xchg
+# ifndef __this_cpu_xchg_1
+# define __this_cpu_xchg_1(pcp, val) __this_cpu_generic_xchg_op((pcp), (val))
+# endif
+# ifndef __this_cpu_xchg_2
+# define __this_cpu_xchg_2(pcp, val) __this_cpu_generic_xchg_op((pcp), (val))
+# endif
+# ifndef __this_cpu_xchg_4
+# define __this_cpu_xchg_4(pcp, val) __this_cpu_generic_xchg_op((pcp), (val))
+# endif
+# ifndef __this_cpu_xchg_8
+# define __this_cpu_xchg_8(pcp, val) __this_cpu_generic_xchg_op((pcp), (val))
+# endif
+# define __this_cpu_xchg(pcp, val) __pcpu_size_call_return(__this_cpu_xchg_, (pcp), (val))
+#endif
+
/*
* IRQ safe versions of the per cpu RMW operations. Note that these operations
* are *not* safe against modification of the same variable from another
@@ -709,4 +791,73 @@ do { \
# define irqsafe_cpu_xor(pcp, val) __pcpu_size_call(irqsafe_cpu_xor_, (val))
#endif

+#ifndef irqsafe_cpu_cmpxchg
+# ifndef irqsafe_cpu_cmpxchg_1
+# define irqsafe_cpu_cmpxchg_1(pcp, old, new) __cmpxchg_local_generic(__this_cpu_ptr(&(pcp)), (old), (new), 1)
+# endif
+# ifndef irqsafe_cpu_cmpxchg_2
+# define irqsafe_cpu_cmpxchg_2(pcp, old, new) __cmpxchg_local_generic(__this_cpu_ptr(&(pcp)), (old), (new), 2)
+# endif
+# ifndef irqsafe_cpu_cmpxchg_4
+# define irqsafe_cpu_cmpxchg_4(pcp, old, new) __cmpxchg_local_generic(__this_cpu_ptr(&(pcp)), (old), (new), 4)
+# endif
+# ifndef irqsafe_cpu_cmpxchg_8
+# define irqsafe_cpu_cmpxchg_8(pcp, old, new) __cmpxchg_local_generic(__this_cpu_ptr(&(pcp)), (old), (new), 8)
+# endif
+# define irqsafe_cpu_cmpxchg(pcp, old, new) __pcpu_size_call_return(irqsafe_cpu_cmpxchg_, (pcp), (old), (new))
+#endif
+
+#define irqsafe_generic_xchg_op(pcp, val) \
+ ({ \
+ typeof(pcp) __tmp_var__; \
+ unsigned long flags; \
+ local_irq_save(flags); \
+ __tmp_var__ = __this_cpu_read(pcp); \
+ __this_cpu_write(pcp, (val)); \
+ local_irq_restore(flags); \
+ __tmp_var__; \
+ })
+
+#ifndef irqsafe_cpu_xchg
+# ifndef irqsafe_cpu_xchg_1
+# define irqsafe_cpu_xchg_1(pcp, val) irqsafe_generic_xchg_op((pcp), (val))
+# endif
+# ifndef irqsafe_cpu_xchg_2
+# define irqsafe_cpu_xchg_2(pcp, val) irqsafe_generic_xchg_op((pcp), (val))
+# endif
+# ifndef irqsafe_cpu_xchg_4
+# define irqsafe_cpu_xchg_4(pcp, val) irqsafe_generic_xchg_op((pcp), (val))
+# endif
+# ifndef irqsafe_cpu_xchg_8
+# define irqsafe_cpu_xchg_8(pcp, val) irqsafe_generic_xchg_op((pcp), (val))
+# endif
+# define irqsafe_cpu_xchg(pcp, val) __pcpu_size_call_return(irqsafe_cpu_xchg_, (pcp), (val))
+#endif
+
+#define _this_cpu_generic_add_return_op(pcp, val) \
+({ \
+ typeof(pcp) __ret__; \
+ preempt_disable(); \
+ __ret__ = (*__this_cpu_ptr(&(pcp)) += (val)); \
+ preempt_enable(); \
+ __ret__; \
+})
+
+
+#ifndef this_cpu_add_return
+# ifndef this_cpu_add_return_1
+# define this_cpu_add_return_1(pcp, val) _this_cpu_generic_add_return_op((pcp), (val))
+# endif
+# ifndef this_cpu_add_return_2
+# define this_cpu_add_return_2(pcp, val) _this_cpu_generic_add_return_op((pcp), (val))
+# endif
+# ifndef this_cpu_add_return_4
+# define this_cpu_add_return_4(pcp, val) _this_cpu_generic_add_return_op((pcp), (val))
+# endif
+# ifndef this_cpu_add_return_8
+# define this_cpu_add_return_8(pcp, val) _this_cpu_generic_add_return_op((pcp), (val))
+# endif
+# define this_cpu_add_return(pcp, val) __pcpu_size_call_return(this_cpu_add_return_, (pcp), (val))
+#endif
+
+#ifndef irqsafe_cpu_add_return
+
#endif /* __LINUX_PERCPU_H */
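
For reference, the intended use of the proposed operations on a per cpu
variable would be along these lines (usage illustration only; my_counter
is a made-up example and the draft above still needs the fixups noted):

	DEFINE_PER_CPU(long, my_counter);

	long old, new;

	/* increment this cpu's counter and get the new value back */
	new = this_cpu_add_return(my_counter, 1);

	/* compare-and-exchange on this cpu's instance, e.g. to double it */
	do {
		old = this_cpu_read(my_counter);
	} while (this_cpu_cmpxchg(my_counter, old, old * 2) != old);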