2012-11-13 01:51:48

by Shan Wei

[permalink] [raw]
Subject: [PATCH 0/9 v4] use efficient this_cpu_* helper

this_cpu_ptr/this_cpu_read is faster than per_cpu_ptr(p, smp_processor_id())
and can reduce memory accesses.
The latter helper has to look up the per-cpu offset for the current cpu,
which costs extra assembler instructions, as the objdump output below shows.

this_cpu_ptr relocates an address. this_cpu_read() relocates the address
and performs the fetch. If you want to operate on rda (defined as per_cpu)
then you can only use this_cpu_ptr. this_cpu_read() saves you more instructions
since it can do the relocation and the fetch in one instruction. A source-level
sketch of this conversion follows the listings.

per_cpu_ptr(p, smp_processor_id()):
1e: 65 8b 04 25 00 00 00 00 mov %gs:0x0,%eax
26: 48 98 cltq
28: 31 f6 xor %esi,%esi
2a: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
31: 48 8b 04 c5 00 00 00 00 mov 0x0(,%rax,8),%rax
39: c7 44 10 04 14 00 00 00 movl $0x14,0x4(%rax,%rdx,1)

this_cpu_ptr(p):
1e: 65 48 03 14 25 00 00 00 00 add %gs:0x0,%rdx
27: 31 f6 xor %esi,%esi
29: c7 42 04 14 00 00 00 movl $0x14,0x4(%rdx)
30: 48 c7 c7 00 00 00 00 mov $0x0,%rdi



Changelog V4:
1. Read/write fields of struct rds_ib_cache_head with __this_cpu_* operations
in the rds subsystem. See patch 2.
2. Fix a bug in xfrm when reading a per-cpu pointer. See patch 3.
3. Avoid a type cast in patch 7.

Changelog V3:
1. Use this_cpu_read to read members of per-cpu variables directly,
dropping the separate this_cpu_ptr operation.
2. Where preemption and bottom halves are already off,
use __this_cpu_read instead of this_cpu_read (see the sketch below).
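
As a hedged illustration of point 2 (not code from the patches; my_hits and
read_hits are made up): when the caller already runs with preemption or bottom
halves disabled, the unchecked __this_cpu_read() is sufficient, because the
caller's context already guarantees the task cannot migrate to another cpu.

#include <linux/percpu.h>
#include <linux/preempt.h>

static DEFINE_PER_CPU(unsigned long, my_hits);	/* hypothetical counter */

static unsigned long read_hits(void)
{
	unsigned long v;

	preempt_disable();		/* no migration possible here...      */
	v = __this_cpu_read(my_hits);	/* ...so the unchecked variant is fine */
	preempt_enable();

	return v;
}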

Changelog V2:
1. Use this_cpu_read directly instead of taking a reference to a field of the per-cpu variable.
2. Patch 5, about ftrace, is dropped from this series.
3. Add new patch 9 replacing the get_cpu; per_cpu_ptr; put_cpu sequence with a this_cpu_add operation (see the sketch below).
4. Where preemption is disabled, use __this_cpu_read instead.
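
A minimal sketch of the point-3 conversion (again not lifted from patch 9;
my_bytes and the account_* helpers are invented for illustration): the
open-coded get_cpu()/per_cpu_ptr()/put_cpu() sequence collapses into a single
preemption-safe this_cpu_add().

#include <linux/percpu.h>
#include <linux/smp.h>
#include <linux/types.h>

static DEFINE_PER_CPU(u64, my_bytes);	/* hypothetical per-cpu byte count */

/* Before: pin the cpu, take a pointer, update, unpin. */
static void account_old(u64 len)
{
	u64 *p = per_cpu_ptr(&my_bytes, get_cpu());

	*p += len;
	put_cpu();
}

/* After: one preemption-safe per-cpu add. */
static void account_new(u64 len)
{
	this_cpu_add(my_bytes, len);
}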


$ git diff --stat d4185bbf62a5d8d777ee445db1581beb17882a07
drivers/clocksource/arm_generic.c |  2 +-
kernel/padata.c                   |  5 ++---
kernel/rcutree.c                  |  2 +-
kernel/trace/blktrace.c           |  2 +-
kernel/trace/trace.c              |  5 +----
net/batman-adv/main.h             |  4 +---
net/core/flow.c                   |  4 +---
net/openvswitch/datapath.c        |  4 ++--
net/openvswitch/vport.c           |  5 ++---
net/rds/ib.h                      |  2 +-
net/rds/ib_recv.c                 | 24 +++++++++++++-----------
net/xfrm/xfrm_ipcomp.c            |  8 +++-----
12 files changed, 29 insertions(+), 38 deletions(-)


2012-11-15 14:19:38

by Christoph Lameter

[permalink] [raw]
Subject: Re: [PATCH 0/9 v4] use efficient this_cpu_* helper

Tejun: Could you pick up this patchset?

On Tue, 13 Nov 2012, Shan Wei wrote:

> this_cpu_ptr/this_cpu_read is faster than per_cpu_ptr(p, smp_processor_id())
> and can reduce memory accesses.
> [...]

2012-11-15 14:53:31

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 0/9 v4] use efficient this_cpu_* helper

On Thu, Nov 15, 2012 at 02:19:38PM +0000, Christoph Lameter wrote:
> Tejun: Could you pick up this patchset?

Sure, but, Shan, when posting a patchset, please make the patches
replies to the head message; otherwise, it's pretty difficult to track
what's going on with the patchset as a whole. I see that some patches
are being picked up by the respective subsystems. If you have patches
left, please let me know.

Thanks.

--
tejun

2012-11-16 08:30:32

by Shan Wei

[permalink] [raw]
Subject: Re: [PATCH 0/9 v4] use efficient this_cpu_* helper

Hi Tejun Heo:

Tejun Heo said, at 2012/11/15 22:53:
> On Thu, Nov 15, 2012 at 02:19:38PM +0000, Christoph Lameter wrote:
>> Tejun: Could you pick up this patchset?
>
> Sure, but, Shan, when posting patchset, please make the patches
> replies to the head message; otherwise, it's pretty difficult to track
> what's going on with the patchset as a whole. I see that some patches
> are being picked up by respective subsystems. If you have patches
> left, please let me know.

OK, next time I will do as you suggest.

This patchset touches multiple subsystems, i.e. networking, rcu and tracing.
The best way to avoid code conflicts is for the subsystem maintainers to pick
the patches up into their own trees. I will remind them in each patch that has
not yet been applied and add you to the recipient list.

Best Regards
Shan Wei

>
> Thanks.
>