2014-06-04 20:48:30

by Pranith Kumar

Subject: [RFC PATCH 1/1] remove redundant compare, cmpxchg already does it

remove a redundant comparison

Signed-off-by: Pranith Kumar <[email protected]>
---
kernel/locking/rwsem-xadd.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 1f99664b..6f8bd3c 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -249,8 +249,7 @@ static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
 {
 	if (!(count & RWSEM_ACTIVE_MASK)) {
 		/* try acquiring the write lock */
-		if (sem->count == RWSEM_WAITING_BIAS &&
-		    cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
+		if (cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
 			    RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_WAITING_BIAS) {
 			if (!list_is_singular(&sem->wait_list))
 				rwsem_atomic_update(RWSEM_WAITING_BIAS, sem);
--
1.9.1


2014-06-04 20:56:52

by Andev

Subject: Re: [RFC PATCH 1/1] remove redundant compare, cmpxchg already does it

On Wed, Jun 4, 2014 at 4:38 PM, Pranith Kumar <[email protected]> wrote:
> remove a redundant comparison
>
> Signed-off-by: Pranith Kumar <[email protected]>
> ---
> kernel/locking/rwsem-xadd.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index 1f99664b..6f8bd3c 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -249,8 +249,7 @@ static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
> {
> if (!(count & RWSEM_ACTIVE_MASK)) {
> /* try acquiring the write lock */
> - if (sem->count == RWSEM_WAITING_BIAS &&
> - cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
> + if (cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
> RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_WAITING_BIAS) {

This was mainly done to avoid the cost of a cmpxchg in the case where
the values are not equal. Not sure if it really makes a difference, though.

--
Andev

2014-06-05 07:22:53

by Peter Zijlstra

Subject: Re: [RFC PATCH 1/1] remove redundant compare, cmpxchg already does it

On Wed, Jun 04, 2014 at 04:56:50PM -0400, Andev wrote:
> On Wed, Jun 4, 2014 at 4:38 PM, Pranith Kumar <[email protected]> wrote:
> > remove a redundant comparison
> >
> > Signed-off-by: Pranith Kumar <[email protected]>
> > ---
> > kernel/locking/rwsem-xadd.c | 3 +--
> > 1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> > index 1f99664b..6f8bd3c 100644
> > --- a/kernel/locking/rwsem-xadd.c
> > +++ b/kernel/locking/rwsem-xadd.c
> > @@ -249,8 +249,7 @@ static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
> > {
> > if (!(count & RWSEM_ACTIVE_MASK)) {
> > /* try acquiring the write lock */
> > - if (sem->count == RWSEM_WAITING_BIAS &&
> > - cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
> > + if (cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
> > RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_WAITING_BIAS) {
>
> This was mainly done to avoid the cost of a cmpxchg in the case where
> the values are not equal. Not sure if it really makes a difference, though.

It does; a cache-hot cmpxchg instruction is 24 cycles (as is pretty much
any other LOCKed instruction, as measured on my WSM-EP), not to mention
that cmpxchg is a read-modify-write, so it needs to grab the cacheline
in exclusive mode.

A plain read, which allows the cacheline to remain in the shared state,
and non-LOCKed ops are way faster.
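
For illustration, a minimal userspace sketch of that pattern, using C11
atomics to stand in for the kernel's cmpxchg(); the function name, type,
and bias constants here are made up for the example, not the rwsem code:

#include <stdatomic.h>
#include <stdbool.h>

#define WAITING_BIAS		(-1L)	/* made-up stand-in values */
#define ACTIVE_WRITE_BIAS	(1L)

/* Guarded trylock: pay for the locked RMW only when it can succeed. */
static bool try_write_lock(atomic_long *count)
{
	long expected = WAITING_BIAS;

	/*
	 * Cheap check first: a plain load leaves the cacheline in the
	 * shared state instead of pulling it exclusive.
	 */
	if (atomic_load_explicit(count, memory_order_relaxed) != WAITING_BIAS)
		return false;

	/* Only now attempt the expensive locked read-modify-write. */
	return atomic_compare_exchange_strong(count, &expected,
					      ACTIVE_WRITE_BIAS);
}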



2014-06-05 17:54:37

by Davidlohr Bueso

Subject: Re: [RFC PATCH 1/1] remove redundant compare, cmpxchg already does it

On Thu, 2014-06-05 at 09:22 +0200, Peter Zijlstra wrote:
> On Wed, Jun 04, 2014 at 04:56:50PM -0400, Andev wrote:
> > On Wed, Jun 4, 2014 at 4:38 PM, Pranith Kumar <[email protected]> wrote:
> > > remove a redundant comparison
> > >
> > > Signed-off-by: Pranith Kumar <[email protected]>
> > > ---
> > > kernel/locking/rwsem-xadd.c | 3 +--
> > > 1 file changed, 1 insertion(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> > > index 1f99664b..6f8bd3c 100644
> > > --- a/kernel/locking/rwsem-xadd.c
> > > +++ b/kernel/locking/rwsem-xadd.c
> > > @@ -249,8 +249,7 @@ static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
> > > {
> > > if (!(count & RWSEM_ACTIVE_MASK)) {
> > > /* try acquiring the write lock */
> > > - if (sem->count == RWSEM_WAITING_BIAS &&
> > > - cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
> > > + if (cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
> > > RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_WAITING_BIAS) {
> >
> > This was mainly done to avoid the cost of a cmpxchg in the case where
> > the values are not equal. Not sure if it really makes a difference, though.
>
> It does; a cache-hot cmpxchg instruction is 24 cycles (as is pretty much
> any other LOCKed instruction, as measured on my WSM-EP), not to mention
> that cmpxchg is a read-modify-write, so it needs to grab the cacheline
> in exclusive mode.
>
> A plain read, which allows the cacheline to remain in the shared state,
> and non-LOCKed ops are way faster.

Yep, and we also do it in mutexes. The numbers and benefits on larger
systems speak for themselves. It would, perhaps, be worth adding a
comment, as the check does seem redundant if you're not thinking about
the cacheline when reading the code.
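
For comparison, the check on the mutex side had roughly this shape at
the time; this is a paraphrase from memory, not a verbatim quote of
kernel/locking/mutex.c:

/* A cheap locked-or-not read guards the expensive atomic_cmpxchg(). */
static inline bool mutex_try_to_acquire(struct mutex *lock)
{
	return !mutex_is_locked(lock) &&
	       (atomic_cmpxchg(&lock->count, 1, 0) == 1);
}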

2014-06-05 18:08:31

by Davidlohr Bueso

Subject: Re: [RFC PATCH 1/1] remove redundant compare, cmpxchg already does it

On Thu, 2014-06-05 at 10:54 -0700, Davidlohr Bueso wrote:
> On Thu, 2014-06-05 at 09:22 +0200, Peter Zijlstra wrote:
> > On Wed, Jun 04, 2014 at 04:56:50PM -0400, Andev wrote:
> > > On Wed, Jun 4, 2014 at 4:38 PM, Pranith Kumar <[email protected]> wrote:
> > > > remove a redundant comparison
> > > >
> > > > Signed-off-by: Pranith Kumar <[email protected]>
> > > > ---
> > > > kernel/locking/rwsem-xadd.c | 3 +--
> > > > 1 file changed, 1 insertion(+), 2 deletions(-)
> > > >
> > > > diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> > > > index 1f99664b..6f8bd3c 100644
> > > > --- a/kernel/locking/rwsem-xadd.c
> > > > +++ b/kernel/locking/rwsem-xadd.c
> > > > @@ -249,8 +249,7 @@ static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
> > > > {
> > > > if (!(count & RWSEM_ACTIVE_MASK)) {
> > > > /* try acquiring the write lock */
> > > > - if (sem->count == RWSEM_WAITING_BIAS &&
> > > > - cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
> > > > + if (cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
> > > > RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_WAITING_BIAS) {
> > >
> > > This was mainly done to avoid the cost of a cmpxchg in the case where
> > > the values are not equal. Not sure if it really makes a difference, though.
> >
> > It does; a cache-hot cmpxchg instruction is 24 cycles (as is pretty much
> > any other LOCKed instruction, as measured on my WSM-EP), not to mention
> > that cmpxchg is a read-modify-write, so it needs to grab the cacheline
> > in exclusive mode.
> >
> > A plain read, which allows the cacheline to remain in the shared state,
> > and non-LOCKed ops are way faster.
>
> Yep, and we also do it in mutexes. The numbers and benefits on larger
> systems speak for themselves. It would, perhaps, be worth adding a
> comment, as the check does seem redundant if you're not thinking about
> the cacheline when reading the code.

I knew I had formally read this technique somewhere:
http://pdos.csail.mit.edu/6.828/2010/readings/mcs.pdf (part 2.1).

Peter, what do you think of adding a new cmp_cmpxchg() or dcmpxchg()
call for such scenarios?
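
Something like the following, perhaps; cmp_cmpxchg() does not exist,
this is just the obvious kernel-style wrapping of the pattern:

/*
 * Hypothetical helper: attempt the locked cmpxchg() only if a plain
 * read already sees the expected value, so a failing caller never
 * pulls the cacheline in exclusive mode.
 */
#define cmp_cmpxchg(ptr, old, new)					\
({									\
	typeof(*(ptr)) __old = (old);					\
	typeof(*(ptr)) __val = ACCESS_ONCE(*(ptr));			\
	(__val == __old) ? cmpxchg((ptr), __old, (new)) : __val;	\
})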

2014-06-06 03:10:04

by Pranith Kumar

Subject: Re: [RFC PATCH 1/1] remove redundant compare, cmpxchg already does it

On Thu, Jun 5, 2014 at 3:22 AM, Peter Zijlstra <[email protected]> wrote:
> On Wed, Jun 04, 2014 at 04:56:50PM -0400, Andev wrote:
>> On Wed, Jun 4, 2014 at 4:38 PM, Pranith Kumar <[email protected]> wrote:
>> > remove a redundant comparison
>> >
>> > Signed-off-by: Pranith Kumar <[email protected]>
>> > ---
>> > kernel/locking/rwsem-xadd.c | 3 +--
>> > 1 file changed, 1 insertion(+), 2 deletions(-)
>> >
>> > diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
>> > index 1f99664b..6f8bd3c 100644
>> > --- a/kernel/locking/rwsem-xadd.c
>> > +++ b/kernel/locking/rwsem-xadd.c
>> > @@ -249,8 +249,7 @@ static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
>> > {
>> > if (!(count & RWSEM_ACTIVE_MASK)) {
>> > /* try acquiring the write lock */
>> > - if (sem->count == RWSEM_WAITING_BIAS &&
>> > - cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
>> > + if (cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
>> > RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_WAITING_BIAS) {
>>
>> This was mainly done to avoid the cost of a cmpxchg in the case where
>> the values are not equal. Not sure if it really makes a difference, though.
>
> It does; a cache-hot cmpxchg instruction is 24 cycles (as is pretty much
> any other LOCKed instruction, as measured on my WSM-EP), not to mention
> that cmpxchg is a read-modify-write, so it needs to grab the cacheline
> in exclusive mode.
>
> A plain read, which allows the cacheline to remain in the shared state,
> and non-LOCKed ops are way faster.


OK, this means that we should use more of these guarded swaps on highly
contended paths. As Davidlohr suggested later in the thread, I think it
would be a good idea to document this and add an API.

--
Pranith

2014-06-06 07:01:16

by Peter Zijlstra

Subject: Re: [RFC PATCH 1/1] remove redundant compare, cmpxchg already does it

On Thu, Jun 05, 2014 at 11:08:23AM -0700, Davidlohr Bueso wrote:
> I knew I had formally read this technique somewhere:
> http://pdos.csail.mit.edu/6.828/2010/readings/mcs.pdf (part 2.1).
>
> Peter, what do you think of adding a new cmp_cmpxchg() or dcmpxchg()
> call for such scenarios?

I don't like dcmpxchg(); it's too easy to confuse with a double-cmpxchg
or some such.

That said, I'm not entirely sure we want this primitive; the thing is,
people might use it ;-)

And it's somewhat dangerous in that it explicitly does not provide any
kind of memory barrier on the failure path, whereas cmpxchg() is an
unconditional full memory barrier.

Also, you really don't want to use it in loops.

So I think it makes more sense to leave things as are and simply apply
the pattern where safe and meaningful.
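
To make the failure-path caveat above concrete, here is an illustrative
(made-up) fragment of the guarded form:

long guarded_cmpxchg(long *p, long old, long new)
{
	/* Early out: a plain read, with NO memory barrier on this path. */
	if (ACCESS_ONCE(*p) != old)
		return *p;

	/* Full memory barrier here, whether it succeeds or fails. */
	return cmpxchg(p, old, new);
}

A caller that (knowingly or not) relied on the barrier implied by a
failing cmpxchg() would be silently broken by the early out.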

