2013-03-09 15:38:46

by Ming Lei

Subject: [PATCH] atomic: improve atomic_inc_unless_negative/atomic_dec_unless_positive

Generally, both atomic_inc_unless_negative() and
atomic_dec_unless_positive() need at least two atomic_cmpxchg()
calls to complete the atomic operation. In fact, the first
atomic_cmpxchg() is most of the time just used to read the current
value of the atomic variable.

Considering that memory barriers, bus locking, cache-line bouncing,
etc. may be involved in atomic_cmpxchg(), it is much more expensive
than atomic_read(), which is simply:

(*(volatile int *)&(v)->counter)

So this patch saves one atomic_cmpxchg() for the two helpers in the
common case, and should improve them a bit.

Cc: Andrew Morton <[email protected]>
Cc: Shaohua Li <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
---
include/linux/atomic.h | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/include/linux/atomic.h b/include/linux/atomic.h
index 5b08a85..aa951d8 100644
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -63,26 +63,34 @@ static inline int atomic_inc_not_zero_hint(atomic_t *v, int hint)
 #ifndef atomic_inc_unless_negative
 static inline int atomic_inc_unless_negative(atomic_t *p)
 {
-	int v, v1;
-	for (v = 0; v >= 0; v = v1) {
-		v1 = atomic_cmpxchg(p, v, v + 1);
-		if (likely(v1 == v))
+	int v, t;
+
+	v = atomic_read(p);
+	while (1) {
+		if (unlikely(v < 0))
+			return 0;
+		t = atomic_cmpxchg(p, v, v + 1);
+		if (likely(t == v))
 			return 1;
+		v = t;
 	}
-	return 0;
 }
 #endif
 
 #ifndef atomic_dec_unless_positive
 static inline int atomic_dec_unless_positive(atomic_t *p)
 {
-	int v, v1;
-	for (v = 0; v <= 0; v = v1) {
-		v1 = atomic_cmpxchg(p, v, v - 1);
-		if (likely(v1 == v))
+	int v, t;
+
+	v = atomic_read(p);
+	while (1) {
+		if (unlikely(v > 0))
+			return 0;
+		t = atomic_cmpxchg(p, v, v - 1);
+		if (likely(t == v))
 			return 1;
+		v = t;
 	}
-	return 0;
 }
 #endif

--
1.7.9.5
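
For illustration, the patched fast path can be modeled in userspace
with C11 atomics (a standalone sketch, not the kernel code; C11
compare_exchange stands in for atomic_cmpxchg(), and the function
names here are made up):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model of the patched helper: one plain load up front,
 * then cmpxchg only when the increment can actually be attempted. */
static bool inc_unless_negative(atomic_int *p)
{
	int v = atomic_load_explicit(p, memory_order_relaxed); /* the atomic_read() */

	while (1) {
		if (v < 0)
			return false;
		/* On failure, v is refreshed with the current value:
		 * the "v = t" step of the patch. */
		if (atomic_compare_exchange_weak(p, &v, v + 1))
			return true;
	}
}

int main(void)
{
	atomic_int negative = -1;
	atomic_int positive = 5;

	/* prints "0 1": -1 is left alone, 5 is incremented */
	printf("%d %d\n", inc_unless_negative(&negative),
	       inc_unless_negative(&positive));
	return 0;
}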


2013-03-11 23:45:29

by Andrew Morton

Subject: Re: [PATCH] atomic: improve atomic_inc_unless_negative/atomic_dec_unless_positive

On Sat, 9 Mar 2013 23:38:21 +0800 Ming Lei <[email protected]> wrote:

> Generally, both atomic_inc_unless_negative() and
> atomic_dec_unless_positive() need at least two atomic_cmpxchg()
> calls to complete the atomic operation. In fact, the first
> atomic_cmpxchg() is most of the time just used to read the current
> value of the atomic variable.
>
> Considering that memory barriers, bus locking, cache-line bouncing,
> etc. may be involved in atomic_cmpxchg(), it is much more expensive
> than atomic_read(), which is simply:
>
> (*(volatile int *)&(v)->counter)
>
> So this patch saves one atomic_cmpxchg() for the two helpers in the
> common case, and should improve them a bit.
>
> ...
>
> --- a/include/linux/atomic.h
> +++ b/include/linux/atomic.h
> @@ -63,26 +63,34 @@ static inline int atomic_inc_not_zero_hint(atomic_t *v, int hint)
> #ifndef atomic_inc_unless_negative
> static inline int atomic_inc_unless_negative(atomic_t *p)
> {
> -	int v, v1;
> -	for (v = 0; v >= 0; v = v1) {
> -		v1 = atomic_cmpxchg(p, v, v + 1);
> -		if (likely(v1 == v))
> +	int v, t;
> +
> +	v = atomic_read(p);
> +	while (1) {
> +		if (unlikely(v < 0))
> +			return 0;
> +		t = atomic_cmpxchg(p, v, v + 1);
> +		if (likely(t == v))
> 			return 1;
> +		v = t;
> 	}
> -	return 0;

It looks right to me...

2013-03-12 00:00:03

by Frederic Weisbecker

Subject: Re: [PATCH] atomic: improve atomic_inc_unless_negative/atomic_dec_unless_positive

2013/3/9 Ming Lei <[email protected]>:
> Generally, both atomic_inc_unless_negative() and
> atomic_dec_unless_positive() need at least two atomic_cmpxchg()
> calls to complete the atomic operation. In fact, the first
> atomic_cmpxchg() is most of the time just used to read the current
> value of the atomic variable.
>
> Considering that memory barriers, bus locking, cache-line bouncing,
> etc. may be involved in atomic_cmpxchg(), it is much more expensive
> than atomic_read(), which is simply:
>
> (*(volatile int *)&(v)->counter)
>
> So this patch saves one atomic_cmpxchg() for the two helpers in the
> common case, and should improve them a bit.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Shaohua Li <[email protected]>
> Cc: Al Viro <[email protected]>
> Signed-off-by: Ming Lei <[email protected]>
> ---
> include/linux/atomic.h | 28 ++++++++++++++++++----------
> 1 file changed, 18 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/atomic.h b/include/linux/atomic.h
> index 5b08a85..aa951d8 100644
> --- a/include/linux/atomic.h
> +++ b/include/linux/atomic.h
> @@ -63,26 +63,34 @@ static inline int atomic_inc_not_zero_hint(atomic_t *v, int hint)
> #ifndef atomic_inc_unless_negative
> static inline int atomic_inc_unless_negative(atomic_t *p)
> {
> -	int v, v1;
> -	for (v = 0; v >= 0; v = v1) {
> -		v1 = atomic_cmpxchg(p, v, v + 1);
> -		if (likely(v1 == v))
> +	int v, t;
> +
> +	v = atomic_read(p);
> +	while (1) {
> +		if (unlikely(v < 0))
> +			return 0;

But atomic_read() lacks the full memory barrier that is needed for
proper atomicity here.

For example if the initial value of p is -1 and another CPU just did
an atomic_inc() that resulted in the new value to be 0, the above
atomic_read() might return -1 because there is no guarantee it's
seeing the recent update on the remote CPU.

atomic_cmpxchg() OTOH provides that guarantee.
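
To make the window concrete, with p initially -1 (a sketch of the
interleaving, using the patch's variable names):

	CPU 0					CPU 1

						atomic_inc(p);		/* p: -1 -> 0 */
	v = atomic_read(p);			/* may still observe -1 */
	if (unlikely(v < 0))
		return 0;			/* misses the increment */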

2013-03-12 00:24:47

by Paul E. McKenney

Subject: Re: [PATCH] atomic: improve atomic_inc_unless_negative/atomic_dec_unless_positive

On Sat, Mar 09, 2013 at 11:38:21PM +0800, Ming Lei wrote:
> Generally, both atomic_inc_unless_negative() and
> atomic_dec_unless_positive() need at least two atomic_cmpxchg()
> calls to complete the atomic operation. In fact, the first
> atomic_cmpxchg() is most of the time just used to read the current
> value of the atomic variable.
>
> Considering that memory barriers, bus locking, cache-line bouncing,
> etc. may be involved in atomic_cmpxchg(), it is much more expensive
> than atomic_read(), which is simply:
>
> (*(volatile int *)&(v)->counter)
>
> So this patch saves one atomic_cmpxchg() for the two helpers in the
> common case, and should improve them a bit.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Shaohua Li <[email protected]>
> Cc: Al Viro <[email protected]>
> Signed-off-by: Ming Lei <[email protected]>
> ---
> include/linux/atomic.h | 28 ++++++++++++++++++----------
> 1 file changed, 18 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/atomic.h b/include/linux/atomic.h
> index 5b08a85..aa951d8 100644
> --- a/include/linux/atomic.h
> +++ b/include/linux/atomic.h
> @@ -63,26 +63,34 @@ static inline int atomic_inc_not_zero_hint(atomic_t *v, int hint)
> #ifndef atomic_inc_unless_negative
> static inline int atomic_inc_unless_negative(atomic_t *p)
> {
> -	int v, v1;
> -	for (v = 0; v >= 0; v = v1) {
> -		v1 = atomic_cmpxchg(p, v, v + 1);
> -		if (likely(v1 == v))
> +	int v, t;
> +
> +	v = atomic_read(p);
> +	while (1) {
> +		if (unlikely(v < 0))

As Frederic noted, you need an smp_mb() right here. Because
atomic_inc_unless_negative() returns a value, it is required to
act as a full memory barrier. On the other code paths, atomic_cmpxchg()
supplies the needed memory ordering, but not on this path.
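
Something along these lines, perhaps (a sketch of the suggested fix,
not a tested patch):

	v = atomic_read(p);
	while (1) {
		if (unlikely(v < 0)) {
			smp_mb();	/* keep full-barrier semantics on this path */
			return 0;
		}
		t = atomic_cmpxchg(p, v, v + 1);	/* barrier implied here */
		if (likely(t == v))
			return 1;
		v = t;
	}
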

> +			return 0;
> +		t = atomic_cmpxchg(p, v, v + 1);
> +		if (likely(t == v))
> 			return 1;
> +		v = t;
> 	}
> -	return 0;
> }
> #endif
>
> #ifndef atomic_dec_unless_positive
> static inline int atomic_dec_unless_positive(atomic_t *p)
> {
> -	int v, v1;
> -	for (v = 0; v <= 0; v = v1) {
> -		v1 = atomic_cmpxchg(p, v, v - 1);
> -		if (likely(v1 == v))
> +	int v, t;
> +
> +	v = atomic_read(p);
> +	while (1) {
> +		if (unlikely(v > 0))

Ditto here.

Thanx, Paul

> +			return 0;
> +		t = atomic_cmpxchg(p, v, v - 1);
> +		if (likely(t == v))
> 			return 1;
> +		v = t;
> 	}
> -	return 0;
> }
> #endif
>
> --
> 1.7.9.5

2013-03-12 02:15:44

by Ming Lei

Subject: Re: [PATCH] atomic: improve atomic_inc_unless_negative/atomic_dec_unless_positive

On Tue, Mar 12, 2013 at 7:59 AM, Frederic Weisbecker <[email protected]> wrote:
> 2013/3/9 Ming Lei <[email protected]>:
>> Generally, both atomic_inc_unless_negative() and
>> atomic_dec_unless_positive() need at least two atomic_cmpxchg()
>> calls to complete the atomic operation. In fact, the first
>> atomic_cmpxchg() is most of the time just used to read the current
>> value of the atomic variable.
>>
>> Considering that memory barriers, bus locking, cache-line bouncing,
>> etc. may be involved in atomic_cmpxchg(), it is much more expensive
>> than atomic_read(), which is simply:
>>
>> (*(volatile int *)&(v)->counter)
>>
>> So this patch saves one atomic_cmpxchg() for the two helpers in the
>> common case, and should improve them a bit.
>>
>> Cc: Andrew Morton <[email protected]>
>> Cc: Shaohua Li <[email protected]>
>> Cc: Al Viro <[email protected]>
>> Signed-off-by: Ming Lei <[email protected]>
>> ---
>> include/linux/atomic.h | 28 ++++++++++++++++++----------
>> 1 file changed, 18 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/linux/atomic.h b/include/linux/atomic.h
>> index 5b08a85..aa951d8 100644
>> --- a/include/linux/atomic.h
>> +++ b/include/linux/atomic.h
>> @@ -63,26 +63,34 @@ static inline int atomic_inc_not_zero_hint(atomic_t *v, int hint)
>> #ifndef atomic_inc_unless_negative
>> static inline int atomic_inc_unless_negative(atomic_t *p)
>> {
>> -	int v, v1;
>> -	for (v = 0; v >= 0; v = v1) {
>> -		v1 = atomic_cmpxchg(p, v, v + 1);
>> -		if (likely(v1 == v))
>> +	int v, t;
>> +
>> +	v = atomic_read(p);
>> +	while (1) {
>> +		if (unlikely(v < 0))
>> +			return 0;
>
> But atomic_read() lacks the full memory barrier that is needed for
> proper atomicity here.
>
> For example if the initial value of p is -1 and another CPU just did
> an atomic_inc() that resulted in the new value to be 0, the above
> atomic_read() might return -1 because there is no guarantee it's
> seeing the recent update on the remote CPU.

Yes, you are right. It also looks like a memory barrier is needed
around atomic_inc() too.

But I have a question: why can a memory barrier guarantee that the
remote CPU sees the recent update? I understand that a memory
barrier only orders consecutive memory accesses, but I don't see
that kind of pattern here. Sorry for a possibly stupid question.


Thanks,
--
Ming Lei

2013-03-12 03:39:14

by Paul E. McKenney

Subject: Re: [PATCH] atomic: improve atomic_inc_unless_negative/atomic_dec_unless_positive

On Tue, Mar 12, 2013 at 10:15:42AM +0800, Ming Lei wrote:
> On Tue, Mar 12, 2013 at 7:59 AM, Frederic Weisbecker <[email protected]> wrote:
> > 2013/3/9 Ming Lei <[email protected]>:
> >> Generally, both atomic_inc_unless_negative() and
> >> atomic_dec_unless_positive() need at least two atomic_cmpxchg()
> >> calls to complete the atomic operation. In fact, the first
> >> atomic_cmpxchg() is most of the time just used to read the current
> >> value of the atomic variable.
> >>
> >> Considering that memory barriers, bus locking, cache-line bouncing,
> >> etc. may be involved in atomic_cmpxchg(), it is much more expensive
> >> than atomic_read(), which is simply:
> >>
> >> (*(volatile int *)&(v)->counter)
> >>
> >> So this patch saves one atomic_cmpxchg() for the two helpers in the
> >> common case, and should improve them a bit.
> >>
> >> Cc: Andrew Morton <[email protected]>
> >> Cc: Shaohua Li <[email protected]>
> >> Cc: Al Viro <[email protected]>
> >> Signed-off-by: Ming Lei <[email protected]>
> >> ---
> >> include/linux/atomic.h | 28 ++++++++++++++++++----------
> >> 1 file changed, 18 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/include/linux/atomic.h b/include/linux/atomic.h
> >> index 5b08a85..aa951d8 100644
> >> --- a/include/linux/atomic.h
> >> +++ b/include/linux/atomic.h
> >> @@ -63,26 +63,34 @@ static inline int atomic_inc_not_zero_hint(atomic_t *v, int hint)
> >> #ifndef atomic_inc_unless_negative
> >> static inline int atomic_inc_unless_negative(atomic_t *p)
> >> {
> >> -	int v, v1;
> >> -	for (v = 0; v >= 0; v = v1) {
> >> -		v1 = atomic_cmpxchg(p, v, v + 1);
> >> -		if (likely(v1 == v))
> >> +	int v, t;
> >> +
> >> +	v = atomic_read(p);
> >> +	while (1) {
> >> +		if (unlikely(v < 0))
> >> +			return 0;
> >
> > But atomic_read() lacks the full memory barrier that is needed for
> > proper atomicity here.
> >
> > For example if the initial value of p is -1 and another CPU just did
> > an atomic_inc() that resulted in the new value to be 0, the above
> > atomic_read() might return -1 because there is no guarantee it's
> > seeing the recent update on the remote CPU.
>
> Yes, you are right. It also looks like a memory barrier is needed
> around atomic_inc() too.

The atomic_inc() primitive makes no guarantees about ordering (see
Documentation/atomic_ops.txt), so, yes, if the caller needs ordering
around an atomic_inc(), the caller must supply the needed memory
barriers, for example, by using smp_mb__before_atomic_inc() or
smp_mb__after_atomic_inc().
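
For example (a hypothetical caller; the structure and field names are
made up):

	obj->ready = 1;			/* plain store that must be visible first */
	smp_mb__before_atomic_inc();	/* caller-supplied ordering */
	atomic_inc(&obj->refcount);	/* implies no ordering by itself */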

> But I have a question: why can a memory barrier guarantee that the
> remote CPU sees the recent update? I understand that a memory
> barrier only orders consecutive memory accesses, but I don't see
> that kind of pattern here. Sorry for a possibly stupid question.

Atomic operations that return a value are required to act as full memory
barriers. This means that code relying on ordering provided by these
atomic operations must also do ordering, either by using an explicit
memory barrier or by relying on guarantees from atomic operations.

For example:

CPU 0                                   CPU 1

X = 1;                                  r1 = Z;
if (atomic_inc_unless_negative(&Y))     smp_mb();
        do_something();
Z = 1;                                  r2 = X;

Assuming X and Z are initially zero, if r1==1, we are guaranteed
that r2==1. However, CPU 1 needs its smp_mb() in order to pair with
the barrier implicit in atomic_inc_unless_negative().

Make sense?

Thanx, Paul

2013-03-12 04:03:25

by Ming Lei

Subject: Re: [PATCH] atomic: improve atomic_inc_unless_negative/atomic_dec_unless_positive

On Tue, Mar 12, 2013 at 11:39 AM, Paul E. McKenney
<[email protected]> wrote:
>
> Atomic operations that return a value are required to act as full memory
> barriers. This means that code relying on ordering provided by these
> atomic operations must also do ordering, either by using an explicit
> memory barrier or by relying on guarantees from atomic operations.
>
> For example:
>
> CPU 0                                   CPU 1
>
> X = 1;                                  r1 = Z;
> if (atomic_inc_unless_negative(&Y))     smp_mb();
>         do_something();
> Z = 1;                                  r2 = X;
>
> Assuming X and Z are initially zero, if r1==1, we are guaranteed
> that r2==1. However, CPU 1 needs its smp_mb() in order to pair with
> the barrier implicit in atomic_inc_unless_negative().
>
> Make sense?

Yes, it does, and thanks for the explanation.

But it looks like the above example is not what Frederic described:

"the above atomic_read() might return -1 because there is no
guarantee it's seeing the recent update on the remote CPU."

I am not even sure if adding one smp_mb() around atomic_read()
can guarantee that.

Andrew, please ignore the patch, thanks.

Thanks,
--
Ming Lei

2013-03-12 14:48:07

by Paul E. McKenney

Subject: Re: [PATCH] atomic: improve atomic_inc_unless_negative/atomic_dec_unless_positive

On Tue, Mar 12, 2013 at 12:03:23PM +0800, Ming Lei wrote:
> On Tue, Mar 12, 2013 at 11:39 AM, Paul E. McKenney
> <[email protected]> wrote:
> >
> > Atomic operations that return a value are required to act as full memory
> > barriers. This means that code relying on ordering provided by these
> > atomic operations must also do ordering, either by using an explicit
> > memory barrier or by relying on guarantees from atomic operations.
> >
> > For example:
> >
> > CPU 0                                   CPU 1
> >
> > X = 1;                                  r1 = Z;
> > if (atomic_inc_unless_negative(&Y))     smp_mb();
> >         do_something();
> > Z = 1;                                  r2 = X;
> >
> > Assuming X and Z are initially zero, if r1==1, we are guaranteed
> > that r2==1. However, CPU 1 needs its smp_mb() in order to pair with
> > the barrier implicit in atomic_inc_unless_negative().
> >
> > Make sense?
>
> Yes, it does, and thanks for the explanation.
>
> But it looks like the above example is not what Frederic described:
>
> "the above atomic_read() might return -1 because there is no
> guarantee it's seeing the recent update on the remote CPU."
>
> I am not even sure if adding one smp_mb() around atomic_read()
> can guarantee that.

Frederic was likely thinking of some other scenario that would be
broken by atomic_inc_unless_negative() failing to act as a full
memory barrier. Here is another example:


CPU 0                                   CPU 1

X = 1;
if (atomic_inc_unless_negative(&Y))     r1 = atomic_xchg(&Y, -1);
        r2 = X;

If atomic_inc_unless_negative() acts as a full memory barrier, then
if CPU 0 reaches the assignment from X, r2 is guaranteed to be 1.
Otherwise, there is no guarantee.

Thanx, Paul

2013-03-12 15:02:52

by Frederic Weisbecker

Subject: Re: [PATCH] atomic: improve atomic_inc_unless_negative/atomic_dec_unless_positive

2013/3/12 Paul E. McKenney <[email protected]>:
> On Tue, Mar 12, 2013 at 12:03:23PM +0800, Ming Lei wrote:
>> On Tue, Mar 12, 2013 at 11:39 AM, Paul E. McKenney
>> <[email protected]> wrote:
>> >
>> > Atomic operations that return a value are required to act as full memory
>> > barriers. This means that code relying on ordering provided by these
>> > atomic operations must also do ordering, either by using an explicit
>> > memory barrier or by relying on guarantees from atomic operations.
>> >
>> > For example:
>> >
>> > CPU 0                                   CPU 1
>> >
>> > X = 1;                                  r1 = Z;
>> > if (atomic_inc_unless_negative(&Y))     smp_mb();
>> >         do_something();
>> > Z = 1;                                  r2 = X;
>> >
>> > Assuming X and Z are initially zero, if r1==1, we are guaranteed
>> > that r2==1. However, CPU 1 needs its smp_mb() in order to pair with
>> > the barrier implicit in atomic_inc_unless_negative().
>> >
>> > Make sense?
>>
>> Yes, it does, and thanks for the explanation.
>>
>> But it looks like the above example is not what Frederic described:
>>
>> "the above atomic_read() might return -1 because there is no
>> guarantee it's seeing the recent update on the remote CPU."
>>
>> I am not even sure if adding one smp_mb() around atomic_read()
>> can guarantee that.
>
> Frederic was likely thinking of some other scenario that would be
> broken by atomic_inc_unless_negative() failing to act as a full
> memory barrier. Here is another example:
>
>
> CPU 0                                   CPU 1
>
> X = 1;
> if (atomic_inc_unless_negative(&Y))     r1 = atomic_xchg(&Y, -1);
>         r2 = X;
>
> If atomic_inc_unless_negative() acts as a full memory barrier, then
> if CPU 0 reaches the assignment from X, r2 is guaranteed to be 1.
> Otherwise, there is no guarantee.

Your scenarios show an interesting guarantee I did not think about.
But my concern was about a situation like this:

CPU 0                                   CPU 1

atomic_set(&X, -1)
                                        atomic_inc(&X)
atomic_add_unless_negative(&X, 5)

In the above situation, CPU 0 may still see X == -1 and thus not add
the 5. Of course, all that only makes sense with data coming along.

2013-03-12 17:57:25

by Paul E. McKenney

Subject: Re: [PATCH] atomic: improve atomic_inc_unless_negative/atomic_dec_unless_positive

On Tue, Mar 12, 2013 at 04:02:47PM +0100, Frederic Weisbecker wrote:
> 2013/3/12 Paul E. McKenney <[email protected]>:
> > On Tue, Mar 12, 2013 at 12:03:23PM +0800, Ming Lei wrote:
> >> On Tue, Mar 12, 2013 at 11:39 AM, Paul E. McKenney
> >> <[email protected]> wrote:
> >> >
> >> > Atomic operations that return a value are required to act as full memory
> >> > barriers. This means that code relying on ordering provided by these
> >> > atomic operations must also do ordering, either by using an explicit
> >> > memory barrier or by relying on guarantees from atomic operations.
> >> >
> >> > For example:
> >> >
> >> > CPU 0                                   CPU 1
> >> >
> >> > X = 1;                                  r1 = Z;
> >> > if (atomic_inc_unless_negative(&Y))     smp_mb();
> >> >         do_something();
> >> > Z = 1;                                  r2 = X;
> >> >
> >> > Assuming X and Z are initially zero, if r1==1, we are guaranteed
> >> > that r2==1. However, CPU 1 needs its smp_mb() in order to pair with
> >> > the barrier implicit in atomic_inc_unless_negative().
> >> >
> >> > Make sense?
> >>
> >> Yes, it does, and thanks for the explanation.
> >>
> >> But it looks like the above example is not what Frederic described:
> >>
> >> "the above atomic_read() might return -1 because there is no
> >> guarantee it's seeing the recent update on the remote CPU."
> >>
> >> I am not even sure if adding one smp_mb() around atomic_read()
> >> can guarantee that.
> >
> > Frederic was likely thinking of some other scenario that would be
> > broken by atomic_inc_unless_negative() failing to act as a full
> > memory barrier. Here is another example:
> >
> >
> > CPU 0                                   CPU 1
> >
> > X = 1;
> > if (atomic_inc_unless_negative(&Y))     r1 = atomic_xchg(&Y, -1);
> >         r2 = X;
> >
> > If atomic_inc_unless_negative() acts as a full memory barrier, then
> > if CPU 0 reaches the assignment from X, r2 is guaranteed to be 1.
> > Otherwise, there is no guarantee.
>
> Your scenarios show an interesting guarantee I did not think about.
> But my concern was about a situation like this:
>
> CPU 0                                   CPU 1
>
> atomic_set(&X, -1)
>                                         atomic_inc(&X)
> atomic_add_unless_negative(&X, 5)
>
> In the above situation, CPU 0 may still see X == -1 and thus not add
> the 5. Of course, all that only makes sense with data coming along.

That could happen, but you would need CPU 1 to carry out some other
reference for it to be a bug. Otherwise, CPU 1's atomic_inc() just
happened after all of CPU 0's code. But yes, it would be possible
to misorder with some larger scenario starting with this example.
Especially given that atomic_inc() does not make any ordering guarantees.

Thanx, Paul

2013-03-13 09:33:12

by anish singh

Subject: Re: [PATCH] atomic: improve atomic_inc_unless_negative/atomic_dec_unless_positive

On Tue, Mar 12, 2013 at 11:25 PM, Paul E. McKenney
<[email protected]> wrote:
> On Tue, Mar 12, 2013 at 04:02:47PM +0100, Frederic Weisbecker wrote:
>> 2013/3/12 Paul E. McKenney <[email protected]>:
>> > On Tue, Mar 12, 2013 at 12:03:23PM +0800, Ming Lei wrote:
>> >> On Tue, Mar 12, 2013 at 11:39 AM, Paul E. McKenney
>> >> <[email protected]> wrote:
>> >> >
>> >> > Atomic operations that return a value are required to act as full
>> >> > memory barriers. This means that code relying on ordering provided
>> >> > by these atomic operations must also do ordering, either by using
>> >> > an explicit memory barrier or by relying on guarantees from atomic
>> >> > operations.
>> >> >
>> >> > For example:
>> >> >
>> >> > CPU 0                                   CPU 1
>> >> >
>> >> > X = 1;                                  r1 = Z;
>> >> > if (atomic_inc_unless_negative(&Y))     smp_mb();
>> >> >         do_something();
>> >> > Z = 1;                                  r2 = X;
>> >> >
>> >> > Assuming X and Z are initially zero, if r1==1, we are guaranteed
>> >> > that r2==1. However, CPU 1 needs its smp_mb() in order to pair with
>> >> > the barrier implicit in atomic_inc_unless_negative().
>> >> >
>> >> > Make sense?
>> >>
>> >> Yes, it does, and thanks for the explanation.
>> >>
>> >> But it looks like the above example is not what Frederic described:
>> >>
>> >> "the above atomic_read() might return -1 because there is no
>> >> guarantee it's seeing the recent update on the remote CPU."
>> >>
>> >> I am not even sure if adding one smp_mb() around atomic_read()
>> >> can guarantee that.
>> >
>> > Frederic was likely thinking of some other scenario that would be
>> > broken by atomic_inc_unless_negative() failing to act as a full
>> > memory barrier. Here is another example:
>> >
>> >
>> > CPU 0                                   CPU 1
>> >
>> > X = 1;
>> > if (atomic_inc_unless_negative(&Y))     r1 = atomic_xchg(&Y, -1);
>> >         r2 = X;
>> >
>> > If atomic_inc_unless_negative() acts as a full memory barrier, then
>> > if CPU 0 reaches the assignment from X, r2 is guaranteed to be 1.
>> > Otherwise, there is no guarantee.
>>
>> Your scenarios show an interesting guarantee I did not think about.
>> But my concern was about a situation like this:
>>
>> CPU 0                                   CPU 1
>>
>> atomic_set(&X, -1)
>>                                         atomic_inc(&X)
>> atomic_add_unless_negative(&X, 5)
>>
>> In the above situation, CPU 0 may still see X == -1 and thus not add
>> the 5. Of course, all that only makes sense with data coming along.
>
> That could happen, but you would need CPU 1 to carry out some other
> reference for it to be a bug. Otherwise, CPU 1's atomic_inc() just

CPU 0                                   CPU 1

atomic_set(&X, -1)
                                        A = 5
                                        &X = A
atomic_add_unless_negative(&X, 5)

Do you mean this when you referred to "carry out some other reference
for it to be a bug"?

> happened after all of CPU 0's code. But yes, it would be possible
> to misorder with some larger scenario starting with this example.
> Especially given that atomic_inc() does not make any ordering guarantees.
>
> Thanx, Paul

2013-03-13 12:40:54

by Paul E. McKenney

Subject: Re: [PATCH] atomic: improve atomic_inc_unless_negative/atomic_dec_unless_positive

On Wed, Mar 13, 2013 at 03:03:10PM +0530, anish singh wrote:
> On Tue, Mar 12, 2013 at 11:25 PM, Paul E. McKenney
> <[email protected]> wrote:
> > On Tue, Mar 12, 2013 at 04:02:47PM +0100, Frederic Weisbecker wrote:
> >> 2013/3/12 Paul E. McKenney <[email protected]>:
> >> > On Tue, Mar 12, 2013 at 12:03:23PM +0800, Ming Lei wrote:
> >> >> On Tue, Mar 12, 2013 at 11:39 AM, Paul E. McKenney
> >> >> <[email protected]> wrote:
> >> >> >
> >> >> > Atomic operations that return a value are required to act as full
> >> >> > memory barriers. This means that code relying on ordering provided
> >> >> > by these atomic operations must also do ordering, either by using
> >> >> > an explicit memory barrier or by relying on guarantees from atomic
> >> >> > operations.
> >> >> >
> >> >> > For example:
> >> >> >
> >> >> > CPU 0                                   CPU 1
> >> >> >
> >> >> > X = 1;                                  r1 = Z;
> >> >> > if (atomic_inc_unless_negative(&Y))     smp_mb();
> >> >> >         do_something();
> >> >> > Z = 1;                                  r2 = X;
> >> >> >
> >> >> > Assuming X and Z are initially zero, if r1==1, we are guaranteed
> >> >> > that r2==1. However, CPU 1 needs its smp_mb() in order to pair with
> >> >> > the barrier implicit in atomic_inc_unless_negative().
> >> >> >
> >> >> > Make sense?
> >> >>
> >> >> Yes, it does, and thanks for the explanation.
> >> >>
> >> >> But it looks like the above example is not what Frederic described:
> >> >>
> >> >> "the above atomic_read() might return -1 because there is no
> >> >> guarantee it's seeing the recent update on the remote CPU."
> >> >>
> >> >> I am not even sure if adding one smp_mb() around atomic_read()
> >> >> can guarantee that.
> >> >
> >> > Frederic was likely thinking of some other scenario that would be
> >> > broken by atomic_inc_unless_negative() failing to act as a full
> >> > memory barrier. Here is another example:
> >> >
> >> >
> >> > CPU 0                                   CPU 1
> >> >
> >> > X = 1;
> >> > if (atomic_inc_unless_negative(&Y))     r1 = atomic_xchg(&Y, -1);
> >> >         r2 = X;
> >> >
> >> > If atomic_inc_unless_negative() acts as a full memory barrier, then
> >> > if CPU 0 reaches the assignment from X, r2 is guaranteed to be 1.
> >> > Otherwise, there is no guarantee.
> >>
> >> Your scenarios show an interesting guarantee I did not think about.
> >> But my concern was about a situation like this:
> >>
> >> CPU 0                                   CPU 1
> >>
> >> atomic_set(&X, -1)
> >>                                         atomic_inc(&X)
> >> atomic_add_unless_negative(&X, 5)
> >>
> >> In the above situation, CPU 0 may still see X == -1 and thus not add
> >> the 5. Of course, all that only makes sense with data coming along.
> >
> > That could happen, but you would need CPU 1 to carry out some other
> > reference for it to be a bug. Otherwise, CPU 1's atomic_inc() just
>
> CPU 0                                   CPU 1
>
> atomic_set(&X, -1)
>                                         A = 5
>                                         &X = A
> atomic_add_unless_negative(&X, 5)
>
> Do you mean this when you referred to "carry out some other reference
> for it to be a bug"?

Not exactly. I was thinking more of something like this, with X and
Y both initially zero:

CPU 0                                   CPU 1

                                        Y = 1;
                                        smp_mb__before_atomic_dec();
if (atomic_inc_unless_negative(&X))     atomic_dec(&X);
        r1 = 1;
else
        r1 = Y;

If atomic_inc_unless_negative() does not imply a full memory barrier,
r1 could be equal to zero. (Not sure where atomic_add_unless_negative()
came from, by the way, not seeing it in mainline.)
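
To spell out the failure mode: CPU 1's smp_mb__before_atomic_dec()
orders its store to Y before the atomic_dec() that makes X negative,
so any CPU that observes X < 0 with full-barrier semantics must also
observe Y == 1. If the failure path of atomic_inc_unless_negative()
omits that barrier, CPU 0's load of Y may be satisfied before its
load of X, and r1 can end up 0.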

Thanx, Paul

> > happened after all of CPU 0's code. But yes, it would be possible
> > to misorder with some larger scenario starting with this example.
> > Especially given that atomic_inc() does not make any ordering guarantees.
> >
> > Thanx, Paul