2012-11-02 16:01:58

by Shan Wei

Subject: [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())

From: Shan Wei <[email protected]>

Signed-off-by: Shan Wei <[email protected]>
---
kernel/rcutree.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 74df86b..441b945 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1960,7 +1960,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
struct rcu_node *rnp_old = NULL;

/* Funnel through hierarchy to reduce memory contention. */
- rnp = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
+ rnp = __this_cpu_read(rsp->rda->mynode);
for (; rnp != NULL; rnp = rnp->parent) {
ret = (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) ||
!raw_spin_trylock(&rnp->fqslock);
--
1.7.1


Subject: Re: [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())

On Sat, 3 Nov 2012, Shan Wei wrote:

>
> /* Funnel through hierarchy to reduce memory contention. */
> - rnp = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
> + rnp = __this_cpu_read(rsp->rda->mynode);
> for (; rnp != NULL; rnp = rnp->parent) {

Reviewed-by: Christoph Lameter <[email protected]>

2012-11-02 18:11:04

by Paul E. McKenney

Subject: Re: [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())

On Sat, Nov 03, 2012 at 12:01:47AM +0800, Shan Wei wrote:
> From: Shan Wei <[email protected]>
>
> Signed-off-by: Shan Wei <[email protected]>
> ---
> kernel/rcutree.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 74df86b..441b945 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1960,7 +1960,7 @@ static void force_quiescent_state(struct rcu_state *rsp)
> struct rcu_node *rnp_old = NULL;
>
> /* Funnel through hierarchy to reduce memory contention. */
> - rnp = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
> + rnp = __this_cpu_read(rsp->rda->mynode);

OK, I'll bite... Why this instead of:

rnp = __this_cpu_read(rsp->rda)->mynode;

Thanx, Paul

> for (; rnp != NULL; rnp = rnp->parent) {
> ret = (ACCESS_ONCE(rsp->gp_flags) & RCU_GP_FLAG_FQS) ||
> !raw_spin_trylock(&rnp->fqslock);
> --
> 1.7.1
>

Subject: Re: [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())

On Fri, 2 Nov 2012, Paul E. McKenney wrote:

> On Sat, Nov 03, 2012 at 12:01:47AM +0800, Shan Wei wrote:
> > /* Funnel through hierarchy to reduce memory contention. */
> > - rnp = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
> > + rnp = __this_cpu_read(rsp->rda->mynode);
>
> OK, I'll bite... Why this instead of:
>
> rnp = __this_cpu_read(rsp->rda)->mynode;

Because this_cpu_read fetches a data word from an address. The address is
relocated using a segment prefix (which contains the offset of the
current per cpu area).

And the address needed here is the address of the mynode field
within a structure that has a per cpu address.
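A minimal userspace sketch of that relocation, under stated assumptions: every `sim_*` name is hypothetical, a plain pointer `sim_gs_base` stands in for the %gs segment base, and a per cpu address is modeled as a byte offset into each CPU's copy of the per cpu area. The offset of `mynode` is added to the per cpu pointer first, and only then is the sum relocated by the current CPU's base and loaded, which mirrors what a single %gs-prefixed load does on x86-64.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NR_CPUS 4

struct sim_node { int level; };

/* Hypothetical stand-in for struct rcu_data; only ->mynode matters here. */
struct sim_data { long qs_pending; struct sim_node *mynode; };

/* Hypothetical layout of one CPU's per cpu area. */
struct sim_area { long other_percpu_var; struct sim_data rda; };

static struct sim_area sim_areas[SIM_NR_CPUS];
static struct sim_node sim_nodes[SIM_NR_CPUS];

/* Models the %gs base: start of the current CPU's per cpu area. */
static char *sim_gs_base;

/* A per cpu "pointer" is an offset that each CPU's base is added to. */
static const size_t sim_rda = offsetof(struct sim_area, rda);

static void sim_init(void)
{
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		sim_nodes[cpu].level = cpu;
		sim_areas[cpu].rda.mynode = &sim_nodes[cpu];
	}
}

static void sim_switch_to_cpu(int cpu)
{
	sim_gs_base = (char *)&sim_areas[cpu];
}

/*
 * Model of __this_cpu_read(rda->mynode): add the field offset to the
 * per cpu pointer, then relocate the whole address by the current
 * CPU's base and load from it in one step.
 */
static struct sim_node *sim_read_mynode(size_t rda)
{
	size_t field_addr = rda + offsetof(struct sim_data, mynode);

	return *(struct sim_node **)(sim_gs_base + field_addr);
}
```

Switching the modeled CPU changes only the base, so the same expression yields that CPU's ->mynode.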

2012-11-03 09:19:15

by Paul E. McKenney

Subject: Re: [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())

On Fri, Nov 02, 2012 at 08:19:04PM +0000, Christoph Lameter wrote:
> On Fri, 2 Nov 2012, Paul E. McKenney wrote:
>
> > On Sat, Nov 03, 2012 at 12:01:47AM +0800, Shan Wei wrote:
> > > /* Funnel through hierarchy to reduce memory contention. */
> > > - rnp = per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode;
> > > + rnp = __this_cpu_read(rsp->rda->mynode);
> >
> > OK, I'll bite... Why this instead of:
> >
> > rnp = __this_cpu_read(rsp->rda)->mynode;
>
> Because this_cpu_read fetches a data word from an address. The address is
> relocated using a segment prefix (which contains the offset of the
> current per cpu area).
>
> And the address needed here is the address of the field of mynode
> within a structure that has a per cpu address.

OK, I do understand why it happens to work. My question is instead why
it is considered a good idea. After all, it is the ->rda field that is
marked __percpu, not the ->mynode field. So in the interest of
mechanical checking and general readability, it seems to me that it
would be way better to apply __this_cpu_read() to rsp->rda rather than
to rsp->rda->mynode.

Thanx, Paul

2012-11-04 10:38:44

by Shan Wei

Subject: Re: [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())

Paul E. McKenney said, at 2012/11/3 17:19:
> OK, I do understand why it happens to work. My question is instead why
> it is considered a good idea.

Maybe the objdump output gives the answer: using __this_cpu_read to read
a pointer member of a per-cpu variable saves two instructions on the
x86-64 arch.


*test code:*
struct eater_state {
	u32 state;
	struct eater __percpu *eater_info;
};

struct eater {
	char name[4];
	u32 age;
};

static u32 test_func(struct eater_state *tstas)
{
	struct eater *aeater;

	//aeater = __this_cpu_ptr(tstas->eater_info); <-----------------1
	//return aeater->age;
	return __this_cpu_read(tstas->eater_info->age); <-----------------2
}

static int __init demo_init(void)
{
	int ret = 0;
	int age;
	struct eater_state as;
	struct eater david;

	as.state = 1;
	as.eater_info = &david;

	age = test_func(&as);

	return ret;
}


__this_cpu_ptr <-----------------1
0000000000000000 <init_module>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 ec 10 sub $0x10,%rsp
8: 48 8d 45 f0 lea -0x10(%rbp),%rax
c: 65 48 03 04 25 00 00 00 00 add %gs:0x0,%rax
15: 31 c0 xor %eax,%eax
17: c9 leaveq
18: c3 retq


__this_cpu_read<-----------------2
0000000000000000 <init_module>:
0: 55 push %rbp
1: 31 c0 xor %eax,%eax
3: 48 89 e5 mov %rsp,%rbp
6: 48 83 ec 10 sub $0x10,%rsp
a: c9 leaveq
b: c3 retq

Subject: Re: [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())

On Sun, 4 Nov 2012, Shan Wei wrote:

> __this_cpu_read<-----------------2
> 0000000000000000 <init_module>:
> 0: 55 push %rbp
> 1: 31 c0 xor %eax,%eax
> 3: 48 89 e5 mov %rsp,%rbp
> 6: 48 83 ec 10 sub $0x10,%rsp
> a: c9 leaveq
> b: c3 retq

?? There should be an operation using gs: here. This does not look
like code that includes a __this_cpu_read().

Subject: Re: [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())

On Sat, 3 Nov 2012, Paul E. McKenney wrote:

> OK, I do understand why it happens to work. My question is instead why
> it is considered a good idea. After all, it is the ->rda field that is
> marked __percpu, not the ->mynode field. So in the interest of
> mechanical checking and general readability, it seems to me that it
> would be way better to apply __this_cpu_read() to rsp->rda rather than
> to rsp->rda->mynode.

mynode is part of the structure reached via rda.

Applying it to rsp->rda does not work, since the offset of mynode must be
added to rda before the fetch relative to the current CPU's per cpu area
can be done.

this_cpu_ptr relocates an address. this_cpu_read() relocates the address
and performs the fetch. If you want to operate on rda, then you can only
use this_cpu_ptr. this_cpu_read() saves you more instructions, since it can
do the relocation and the fetch in one instruction.
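A hedged userspace sketch of that distinction (all `sim_*` names are hypothetical, and this models only the address arithmetic, not the kernel's implementation): per_cpu_ptr() and this_cpu_ptr() only relocate, returning a pointer the caller then dereferences, while this_cpu_read() relocates and loads in one step. The three helpers at the end correspond to the pre-patch code, the this_cpu_ptr(rsp->rda)->mynode form, and the __this_cpu_read(rsp->rda->mynode) form from the patch; all reach the same ->mynode.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NR_CPUS 2

struct sim_node { int level; };
struct sim_data { long qs_pending; struct sim_node *mynode; };
struct sim_area { long other_percpu_var; struct sim_data rda; };

static struct sim_area sim_areas[SIM_NR_CPUS];
static struct sim_node sim_nodes[SIM_NR_CPUS];
static int sim_cpu;                     /* models raw_smp_processor_id() */

/* Per cpu "pointer" to struct sim_data: an offset into each CPU's area. */
static const size_t sim_rda = offsetof(struct sim_area, rda);

static void sim_init(void)
{
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		sim_nodes[cpu].level = cpu;
		sim_areas[cpu].rda.mynode = &sim_nodes[cpu];
	}
}

static void sim_set_cpu(int cpu) { sim_cpu = cpu; }

/* per_cpu_ptr(p, cpu): relocate p for an explicitly named CPU. */
static void *sim_per_cpu_ptr(size_t p, int cpu)
{
	return (char *)&sim_areas[cpu] + p;
}

/* this_cpu_ptr(p): relocate p for the current CPU; no load. */
static void *sim_this_cpu_ptr(size_t p)
{
	return sim_per_cpu_ptr(p, sim_cpu);
}

/* this_cpu_read() of a pointer-sized per cpu field: relocate and load. */
static struct sim_node *sim_this_cpu_read_node(size_t p)
{
	return *(struct sim_node **)sim_this_cpu_ptr(p);
}

/* per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode */
static struct sim_node *sim_mynode_old(void)
{
	return ((struct sim_data *)sim_per_cpu_ptr(sim_rda, sim_cpu))->mynode;
}

/* this_cpu_ptr(rsp->rda)->mynode */
static struct sim_node *sim_mynode_ptr(void)
{
	return ((struct sim_data *)sim_this_cpu_ptr(sim_rda))->mynode;
}

/* __this_cpu_read(rsp->rda->mynode): field offset first, then read */
static struct sim_node *sim_mynode_read(void)
{
	return sim_this_cpu_read_node(sim_rda + offsetof(struct sim_data, mynode));
}
```

On x86-64 the last form can compile to a single %gs-prefixed load, which is the instruction saving described above; the model reproduces only the address arithmetic, not that code generation.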

Subject: Re: [PATCH v2 6/9] rcu: use __this_cpu_read helper instead of per_cpu_ptr(p, raw_smp_processor_id())

On Mon, 5 Nov 2012, Shan Wei wrote:

> I guarantee that x86-64 doesn't use the gs register here; I ran the test
> again. Maybe there is some optimization for __this_cpu_read calls on
> x86-64, not sure.

There is no optimization that I know of unless the compiler eliminated the
__this_cpu_read completely. gs: is necessary to perform the implied
relocation in this_cpu_read().