> From: Ming Lei <[email protected]>
> Subject: atomic: improve atomic_inc_unless_negative/atomic_dec_unless_positive
>
> Generally, both atomic_inc_unless_negative() and
> atomic_dec_unless_positive() need at least two atomic_cmpxchg() to
> complete the atomic operation. In fact, the 1st atomic_cmpxchg() is
> usually just used to read the current value of the atomic variable.
Agreed, this looks ugly...
But can't we make a simpler patch and keep the code simple/readable ?
Oleg.
--- x/include/linux/atomic.h
+++ x/include/linux/atomic.h
@@ -64,7 +64,7 @@ static inline int atomic_inc_not_zero_hi
static inline int atomic_inc_unless_negative(atomic_t *p)
{
int v, v1;
- for (v = 0; v >= 0; v = v1) {
+ for (v = atomic_read(p); v >= 0; v = v1) {
v1 = atomic_cmpxchg(p, v, v + 1);
if (likely(v1 == v))
return 1;
@@ -77,7 +77,7 @@ static inline int atomic_inc_unless_nega
static inline int atomic_dec_unless_positive(atomic_t *p)
{
int v, v1;
- for (v = 0; v <= 0; v = v1) {
+ for (v = atomic_read(p); v <= 0; v = v1) {
v1 = atomic_cmpxchg(p, v, v - 1);
if (likely(v1 == v))
return 1;
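To make the proposal concrete, the loop can be modeled in plain C. The sketch below is a hypothetical single-threaded stand-in for the kernel primitives (atomic_t, atomic_read() and atomic_cmpxchg() are re-implemented here for illustration only; this is not kernel code and ignores SMP ordering):

```c
#include <assert.h>

/* Single-threaded model of the kernel primitives, for illustration only. */
typedef struct { int counter; } atomic_t;

static int atomic_read(const atomic_t *v)
{
	return v->counter;
}

/* Returns the old value; the store happens only if it matched 'old'. */
static int atomic_cmpxchg(atomic_t *v, int old, int new_val)
{
	int cur = v->counter;
	if (cur == old)
		v->counter = new_val;
	return cur;
}

/* The proposed loop: seed with atomic_read() so that, in the common
 * uncontended case, a single cmpxchg completes the operation. */
static int atomic_inc_unless_negative(atomic_t *p)
{
	int v, v1;
	for (v = atomic_read(p); v >= 0; v = v1) {
		v1 = atomic_cmpxchg(p, v, v + 1);
		if (v1 == v)
			return 1;
	}
	return 0;
}
```

Note that the cmpxchg is only attempted after the `v >= 0` check, which is the point Oleg makes later in the thread.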
On Fri, Mar 15, 2013 at 12:24 AM, Oleg Nesterov <[email protected]> wrote:
>> From: Ming Lei <[email protected]>
>> Subject: atomic: improve atomic_inc_unless_negative/atomic_dec_unless_positive
>>
>> Generally, both atomic_inc_unless_negative() and
>> atomic_dec_unless_positive() need at least two atomic_cmpxchg() to
>> complete the atomic operation. In fact, the 1st atomic_cmpxchg() is
>> usually just used to read the current value of the atomic variable.
>
> Agreed, this looks ugly...
>
> But can't we make a simpler patch and keep the code simple/readable ?
>
> Oleg.
>
> --- x/include/linux/atomic.h
> +++ x/include/linux/atomic.h
> @@ -64,7 +64,7 @@ static inline int atomic_inc_not_zero_hi
> static inline int atomic_inc_unless_negative(atomic_t *p)
> {
> int v, v1;
> - for (v = 0; v >= 0; v = v1) {
> + for (v = atomic_read(p); v >= 0; v = v1) {
> v1 = atomic_cmpxchg(p, v, v + 1);
Unfortunately, the above will exchange the current value even though
it is negative, so it isn't correct.
> if (likely(v1 == v))
> return 1;
> @@ -77,7 +77,7 @@ static inline int atomic_inc_unless_nega
> static inline int atomic_dec_unless_positive(atomic_t *p)
> {
> int v, v1;
> - for (v = 0; v <= 0; v = v1) {
> + for (v = atomic_read(p); v <= 0; v = v1) {
> v1 = atomic_cmpxchg(p, v, v - 1);
Similar with above.
> if (likely(v1 == v))
> return 1;
>
Thanks,
--
Ming Lei
On 03/15, Ming Lei wrote:
>
> On Fri, Mar 15, 2013 at 12:24 AM, Oleg Nesterov <[email protected]> wrote:
> > static inline int atomic_inc_unless_negative(atomic_t *p)
> > {
> > int v, v1;
> > - for (v = 0; v >= 0; v = v1) {
> > + for (v = atomic_read(p); v >= 0; v = v1) {
> > v1 = atomic_cmpxchg(p, v, v + 1);
>
> Unfortunately, the above will exchange the current value even though
> it is negative, so it isn't correct.
Hmm, why? We always check "v >= 0" before we try to do
atomic_cmpxchg(old => v) ?
Oleg.
On Fri, Mar 15, 2013 at 9:46 PM, Oleg Nesterov <[email protected]> wrote:
> On 03/15, Ming Lei wrote:
>>
>> On Fri, Mar 15, 2013 at 12:24 AM, Oleg Nesterov <[email protected]> wrote:
>> > static inline int atomic_inc_unless_negative(atomic_t *p)
>> > {
>> > int v, v1;
>> > - for (v = 0; v >= 0; v = v1) {
>> > + for (v = atomic_read(p); v >= 0; v = v1) {
>> > v1 = atomic_cmpxchg(p, v, v + 1);
>>
>> Unfortunately, the above will exchange the current value even though
>> it is negative, so it isn't correct.
>
> Hmm, why? We always check "v >= 0" before we try to do
> atomic_cmpxchg(old => v) ?
Sorry, yes, you are right. But then your patch is basically the same as
the previous one, isn't it? And it has the same problem; see the discussion below:
http://marc.info/?t=136284366900001&r=1&w=2
Thanks,
--
Ming Lei
On 03/15, Ming Lei wrote:
>
> On Fri, Mar 15, 2013 at 9:46 PM, Oleg Nesterov <[email protected]> wrote:
> > On 03/15, Ming Lei wrote:
> >>
> >> On Fri, Mar 15, 2013 at 12:24 AM, Oleg Nesterov <[email protected]> wrote:
> >> > static inline int atomic_inc_unless_negative(atomic_t *p)
> >> > {
> >> > int v, v1;
> >> > - for (v = 0; v >= 0; v = v1) {
> >> > + for (v = atomic_read(p); v >= 0; v = v1) {
> >> > v1 = atomic_cmpxchg(p, v, v + 1);
> >>
> >> Unfortunately, the above will exchange the current value even though
> >> it is negative, so it isn't correct.
> >
> > Hmm, why? We always check "v >= 0" before we try to do
> > atomic_cmpxchg(old => v) ?
>
> Sorry, yes, you are right. But then your patch is basically same with the
> previous one, isn't it?
Sure, the logic is the same, just the patch (and the code) looks simpler
and more understandable.
> And has same problem, see below discussion:
>
> http://marc.info/?t=136284366900001&r=1&w=2
The lack of the barrier?
I thought about this, this should be fine? atomic_add_unless() has the same
"problem", but this is documented in atomic_ops.txt:
atomic_add_unless requires explicit memory barriers around the operation
unless it fails (returns 0).
I thought that atomic_add_unless_negative() should have the same
guarantees?
Paul? Frederic?
Oleg.
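For reference, the atomic_add_unless() contract being cited can be sketched the same way. This is a hypothetical user-space model (single-threaded stand-ins, not kernel code): __atomic_add_unless() returns the old value, and atomic_add_unless() reports whether the add was actually performed, which is the case atomic_ops.txt requires to imply memory barriers.

```c
#include <assert.h>

/* Single-threaded model, for illustration only. */
typedef struct { int counter; } atomic_t;

static int atomic_cmpxchg(atomic_t *v, int old, int new_val)
{
	int cur = v->counter;
	if (cur == old)
		v->counter = new_val;
	return cur;
}

/* Adds 'a' to *v unless it currently equals 'u'; returns the old value. */
static int __atomic_add_unless(atomic_t *v, int a, int u)
{
	int c, old;
	for (c = v->counter; c != u; c = old) {
		old = atomic_cmpxchg(v, c, c + a);
		if (old == c)
			break;
	}
	return c;
}

/* Non-zero iff the add was performed; per atomic_ops.txt, only this
 * success case must imply memory barriers around the operation. */
static int atomic_add_unless(atomic_t *v, int a, int u)
{
	return __atomic_add_unless(v, a, u) != u;
}
```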
2013/3/15 Oleg Nesterov <[email protected]>:
> On 03/15, Ming Lei wrote:
>>
>> On Fri, Mar 15, 2013 at 9:46 PM, Oleg Nesterov <[email protected]> wrote:
>> > On 03/15, Ming Lei wrote:
>> >>
>> >> On Fri, Mar 15, 2013 at 12:24 AM, Oleg Nesterov <[email protected]> wrote:
>> >> > static inline int atomic_inc_unless_negative(atomic_t *p)
>> >> > {
>> >> > int v, v1;
>> >> > - for (v = 0; v >= 0; v = v1) {
>> >> > + for (v = atomic_read(p); v >= 0; v = v1) {
>> >> > v1 = atomic_cmpxchg(p, v, v + 1);
>> >>
>> >> Unfortunately, the above will exchange the current value even though
>> >> it is negative, so it isn't correct.
>> >
>> > Hmm, why? We always check "v >= 0" before we try to do
>> > atomic_cmpxchg(old => v) ?
>>
>> Sorry, yes, you are right. But then your patch is basically same with the
>> previous one, isn't it?
>
> Sure, the logic is the same, just the patch (and the code) looks simpler
> and more understandable.
>
>> And has same problem, see below discussion:
>>
>> http://marc.info/?t=136284366900001&r=1&w=2
>
> The lack of the barrier?
>
> I thought about this, this should be fine? atomic_add_unless() has the same
> "problem", but this is documented in atomic_ops.txt:
>
> atomic_add_unless requires explicit memory barriers around the operation
> unless it fails (returns 0).
>
> I thought that atomic_add_unless_negative() should have the same
> guarantees?
I feel very uncomfortable with that. The memory barrier is needed
anyway to make sure we don't deal with a stale value of the atomic val
(wrt. ordering against another object).
The following should really be expected to work without added barrier:
void put_object(foo *obj)
{
if (atomic_dec_return(&obj->ref) == -1)
free_rcu(obj);
}
bool try_get_object(foo *obj)
{
if (atomic_add_unless_negative(&obj->ref, 1))
return true;
return false;
}
= CPU 0 =                       = CPU 1 =
                                rcu_read_lock()
put_object(obj0);
                                obj = rcu_derefr(obj0);
rcu_assign_ptr(obj0, NULL);
                                if (try_get_object(obj))
                                        do_something...
                                else
                                        object is dying
                                rcu_read_unlock()
But anyway I must defer on Paul, he's the specialist here.
On 03/15, Frederic Weisbecker wrote:
>
> > The lack of the barrier?
> >
> > I thought about this, this should be fine? atomic_add_unless() has the same
> > "problem", but this is documented in atomic_ops.txt:
> >
> > atomic_add_unless requires explicit memory barriers around the operation
> > unless it fails (returns 0).
> >
> > I thought that atomic_add_unless_negative() should have the same
> > guarantees?
>
> I feel very uncomfortable with that. The memory barrier is needed
> anyway to make sure we don't deal with a stale value of the atomic val
> (wrt. ordering against another object).
> The following should really be expected to work without added barrier:
>
> void put_object(foo *obj)
> {
> if (atomic_dec_return(obj->ref) == -1)
> free_rcu(obj);
> }
>
> bool try_get_object(foo *obj)
> {
> if (atomic_add_unless_negative(obj, 1))
> return true;
> return false;
> }
>
> = CPU 0 =                       = CPU 1 =
>                                 rcu_read_lock()
> put_object(obj0);
>                                 obj = rcu_derefr(obj0);
> rcu_assign_ptr(obj0, NULL);
(I guess you meant rcu_assign_ptr() then put_object())
>                                 if (try_get_object(obj))
>                                         do_something...
>                                 else
>                                         object is dying
>                                 rcu_read_unlock()
I must have missed something.
do_something() looks fine, if atomic_add_unless_negative() succeeds
we do have a barrier?
Anyway, I understand that it is possible to write the code which
won't work without the unconditional mb().
My point was: should we fix atomic_add_unless() then? If not, why
should atomic_add_unless_negative() differ?
Oleg.
2013/3/15 Oleg Nesterov <[email protected]>:
> On 03/15, Frederic Weisbecker wrote:
>>
>> > The lack of the barrier?
>> >
>> > I thought about this, this should be fine? atomic_add_unless() has the same
>> > "problem", but this is documented in atomic_ops.txt:
>> >
>> > atomic_add_unless requires explicit memory barriers around the operation
>> > unless it fails (returns 0).
>> >
>> > I thought that atomic_add_unless_negative() should have the same
>> > guarantees?
>>
>> I feel very uncomfortable with that. The memory barrier is needed
>> anyway to make sure we don't deal with a stale value of the atomic val
>> (wrt. ordering against another object).
>> The following should really be expected to work without added barrier:
>>
>> void put_object(foo *obj)
>> {
>> if (atomic_dec_return(obj->ref) == -1)
>> free_rcu(obj);
>> }
>>
>> bool try_get_object(foo *obj)
>> {
>> if (atomic_add_unless_negative(obj, 1))
>> return true;
>> return false;
>> }
>>
>> = CPU 0 =                       = CPU 1 =
>>                                 rcu_read_lock()
>> put_object(obj0);
>>                                 obj = rcu_derefr(obj0);
>> rcu_assign_ptr(obj0, NULL);
>
> (I guess you meant rcu_assign_ptr() then put_object())
Right.
>
>>                                 if (try_get_object(obj))
>>                                         do_something...
>>                                 else
>>                                         object is dying
>>                                 rcu_read_unlock()
>
> I must have missed something.
>
> do_something() looks fine, if atomic_add_unless_negative() succeeds
> we do have a barrier?
Ok, I guess the guarantee of a barrier in case of failure is probably
not needed. But since the only way to safely read the atomic value is
a cmpxchg like operation, I guess a barrier must be involved in any
case.
Using atomic_read() may return some stale value.
>
> Anyway, I understand that it is possible to write the code which
> > won't work without the unconditional mb().
Yeah that's my fear.
>
> My point was: should we fix atomic_add_unless() then? If not, why
> should atomic_add_unless_negative() differ?
They shouldn't differ I guess.
On Fri, Mar 15, 2013 at 07:34:32PM +0100, Frederic Weisbecker wrote:
> 2013/3/15 Oleg Nesterov <[email protected]>:
> > On 03/15, Frederic Weisbecker wrote:
> >>
> >> > The lack of the barrier?
> >> >
> >> > I thought about this, this should be fine? atomic_add_unless() has the same
> >> > "problem", but this is documented in atomic_ops.txt:
> >> >
> >> > atomic_add_unless requires explicit memory barriers around the operation
> >> > unless it fails (returns 0).
> >> >
> >> > I thought that atomic_add_unless_negative() should have the same
> >> > guarantees?
> >>
> >> I feel very uncomfortable with that. The memory barrier is needed
> >> anyway to make sure we don't deal with a stale value of the atomic val
> >> (wrt. ordering against another object).
> >> The following should really be expected to work without added barrier:
> >>
> >> void put_object(foo *obj)
> >> {
> >> if (atomic_dec_return(obj->ref) == -1)
> >> free_rcu(obj);
> >> }
> >>
> >> bool try_get_object(foo *obj)
> >> {
> >> if (atomic_add_unless_negative(obj, 1))
> >> return true;
> >> return false;
> >> }
> >>
> >> = CPU 0 =                       = CPU 1 =
> >>                                 rcu_read_lock()
> >> put_object(obj0);
> >>                                 obj = rcu_derefr(obj0);
> >> rcu_assign_ptr(obj0, NULL);
> >
> > (I guess you meant rcu_assign_ptr() then put_object())
>
> Right.
>
> >
> >>                                 if (try_get_object(obj))
> >>                                         do_something...
> >>                                 else
> >>                                         object is dying
> >>                                 rcu_read_unlock()
> >
> > I must have missed something.
> >
> > do_something() looks fine, if atomic_add_unless_negative() succeeds
> > we do have a barrier?
>
> Ok, I guess the guarantee of a barrier in case of failure is probably
> not needed. But since the only way to safely read the atomic value is
> a cmpxchg like operation, I guess a barrier must be involved in any
> case.
>
> Using atomic_read() may return some stale value.
>
> >
> > Anyway, I understand that it is possible to write the code which
> > won't work without the unconditional mb().
>
> Yeah that's my fear.
>
> >
> > My point was: should we fix atomic_add_unless() then? If not, why
> > should atomic_add_unless_negative() differ?
>
> They shouldn't differ I guess.
Completely agreed. It is not like memory ordering is simple, so we should
keep the rules simple. Atomic primitives that sometimes imply a memory
barrier seems a bit over the top.
The rule is that if an atomic primitive returns non-void, then there is
a full memory barrier before and after. This applies to primitives
returning boolean as well, with atomic_dec_and_test() setting this
precedent from what I can see.
Thanx, Paul
On 03/15, Frederic Weisbecker wrote:
>
> 2013/3/15 Oleg Nesterov <[email protected]>:
> >
> > do_something() looks fine, if atomic_add_unless_negative() succeeds
> > we do have a barrier?
>
> Ok, I guess the guarantee of a barrier in case of failure is probably
> not needed. But since the only way to safely read the atomic value is
> a cmpxchg like operation, I guess a barrier must be involved in any
> case.
>
> Using atomic_read() may return some stale value.
Oh, if the lack of the barrier is fine, then "stale" should be fine
too, I think. I bet you can't describe accurately what "stale" can
actually mean in this case ;)
If, say, atomic_inc_unless_negative(p) sees the stale value < 0, it
was actually negative somewhere in the past. If it was changed later,
we can pretend that atomic_inc_unless_negative() was called before
the change which makes it positive.
> > Anyway, I understand that it is possible to write the code which
> > won't work without the unconditional mb().
>
> Yeah that's my fear.
I see... well personally I can't imagine the "natural" (non-artificial)
code example which needs mb() in case of failure.
However, I have to agree with Paul's "It is not like memory ordering is
simple", so I won't argue.
> > My point was: should we fix atomic_add_unless() then? If not, why
> > should atomic_add_unless_negative() differ?
>
> They shouldn't differ I guess.
Agreed, they shouldn't.
Oleg.
On 03/15, Paul E. McKenney wrote:
>
> On Fri, Mar 15, 2013 at 07:34:32PM +0100, Frederic Weisbecker wrote:
> > 2013/3/15 Oleg Nesterov <[email protected]>:
> > >
> > > My point was: should we fix atomic_add_unless() then? If not, why
> > > should atomic_add_unless_negative() differ?
> >
> > They shouldn't differ I guess.
>
> Completely agreed. It is not like memory ordering is simple, so we should
> keep the rules simple.
It is hardly possible to argue with this ;)
> The rule is that if an atomic primitive returns non-void, then there is
> a full memory barrier before and after.
This case is documented...
> This applies to primitives
> returning boolean as well, with atomic_dec_and_test() setting this
> precedent from what I can see.
I don't think this is the "fair" comparison. Unlike atomic_add_unless(),
atomic_dec_and_test() always changes the memory even if it "fails".
If atomic_add_unless() returns 0, nothing was changed and if we add
the barrier it is not clear what it should be paired with.
But OK. I have to agree that "keep the rules simple" makes sense, so
we should change atomic_add_unless() as well.
Oleg.
On Sat, Mar 16, 2013 at 07:30:22PM +0100, Oleg Nesterov wrote:
> On 03/15, Paul E. McKenney wrote:
> >
> > On Fri, Mar 15, 2013 at 07:34:32PM +0100, Frederic Weisbecker wrote:
> > > 2013/3/15 Oleg Nesterov <[email protected]>:
> > > >
> > > > My point was: should we fix atomic_add_unless() then? If not, why
> > > > should atomic_add_unless_negative() differ?
> > >
> > > They shouldn't differ I guess.
> >
> > Completely agreed. It is not like memory ordering is simple, so we should
> > keep the rules simple.
>
> It is hardly possible to argue with this ;)
>
> > The rule is that if an atomic primitive returns non-void, then there is
> > a full memory barrier before and after.
>
> This case is documented...
>
> > This applies to primitives
> > returning boolean as well, with atomic_dec_and_test() setting this
> > precedent from what I can see.
>
> I don't think this is the "fair" comparison. Unlike atomic_add_unless(),
> atomic_dec_and_test() always changes the memory even if it "fails".
>
> If atomic_add_unless() returns 0, nothing was changed and if we add
> the barrier it is not clear what it should be paired with.
>
> But OK. I have to agree that "keep the rules simple" makes sense, so
> we should change atomic_add_unless() as well.
Agreed!
Thanx, Paul
On 03/17, Paul E. McKenney wrote:
>
> On Sat, Mar 16, 2013 at 07:30:22PM +0100, Oleg Nesterov wrote:
> >
> > > The rule is that if an atomic primitive returns non-void, then there is
> > > a full memory barrier before and after.
> >
> > This case is documented...
> >
> > > This applies to primitives
> > > returning boolean as well, with atomic_dec_and_test() setting this
> > > precedent from what I can see.
> >
> > I don't think this is the "fair" comparison. Unlike atomic_add_unless(),
> > atomic_dec_and_test() always changes the memory even if it "fails".
> >
> > If atomic_add_unless() returns 0, nothing was changed and if we add
> > the barrier it is not clear what it should be paired with.
> >
> > But OK. I have to agree that "keep the rules simple" makes sense, so
> > we should change atomic_add_unless() as well.
>
> Agreed!
OK... since nobody volunteered to make a patch, what do you think about
the change below?
It should "fix" atomic_add_unless() (only on x86) and optimize
atomic_inc/dec_unless.
With this change atomic_*_unless() can do the unnecessary mb() after
cmpxchg() fails, but I think this case is very unlikely.
And, in the likely case atomic_inc/dec_unless avoids the 1st cmpxchg()
which in most cases just reads the memory for the next cmpxchg().
Oleg.
--- x/arch/x86/include/asm/atomic.h
+++ x/arch/x86/include/asm/atomic.h
@@ -212,15 +212,12 @@ static inline int atomic_xchg(atomic_t *
static inline int __atomic_add_unless(atomic_t *v, int a, int u)
{
int c, old;
- c = atomic_read(v);
- for (;;) {
- if (unlikely(c == (u)))
- break;
- old = atomic_cmpxchg((v), c, c + (a));
+ for (c = atomic_read(v); c != u; c = old) {
+ old = atomic_cmpxchg(v, c, c + a);
if (likely(old == c))
- break;
- c = old;
+ return c;
}
+ smp_mb();
return c;
}
--- x/include/linux/atomic.h
+++ x/include/linux/atomic.h
@@ -64,11 +64,12 @@ static inline int atomic_inc_not_zero_hi
static inline int atomic_inc_unless_negative(atomic_t *p)
{
int v, v1;
- for (v = 0; v >= 0; v = v1) {
+ for (v = atomic_read(p); v >= 0; v = v1) {
v1 = atomic_cmpxchg(p, v, v + 1);
if (likely(v1 == v))
return 1;
}
+ smp_mb();
return 0;
}
#endif
@@ -77,11 +78,12 @@ static inline int atomic_inc_unless_nega
static inline int atomic_dec_unless_positive(atomic_t *p)
{
int v, v1;
- for (v = 0; v <= 0; v = v1) {
+ for (v = atomic_read(p); v <= 0; v = v1) {
v1 = atomic_cmpxchg(p, v, v - 1);
if (likely(v1 == v))
return 1;
}
+ smp_mb();
return 0;
}
#endif
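The shape of the patched loops can be modeled in user space as follows (a hypothetical sketch, not kernel code: the primitives are single-threaded stand-ins, and smp_mb() is mapped to the GCC/Clang __sync_synchronize() full barrier). Success returns immediately because cmpxchg already implies the barrier; only the failure path needs the explicit smp_mb().

```c
#include <assert.h>

/* Single-threaded model, for illustration only. */
typedef struct { int counter; } atomic_t;

/* Full memory barrier; a placeholder for the kernel's smp_mb(). */
#define smp_mb() __sync_synchronize()

static int atomic_read(const atomic_t *v)
{
	return v->counter;
}

static int atomic_cmpxchg(atomic_t *v, int old, int new_val)
{
	int cur = v->counter;
	if (cur == old)
		v->counter = new_val;
	return cur;
}

/* Patched shape: a successful cmpxchg returns right away; the
 * explicit barrier runs only when the operation is not performed. */
static int atomic_dec_unless_positive(atomic_t *p)
{
	int v, v1;
	for (v = atomic_read(p); v <= 0; v = v1) {
		v1 = atomic_cmpxchg(p, v, v - 1);
		if (v1 == v)
			return 1;
	}
	smp_mb();
	return 0;
}
```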
On Thu, Mar 21, 2013 at 06:08:27PM +0100, Oleg Nesterov wrote:
> On 03/17, Paul E. McKenney wrote:
> >
> > On Sat, Mar 16, 2013 at 07:30:22PM +0100, Oleg Nesterov wrote:
> > >
> > > > The rule is that if an atomic primitive returns non-void, then there is
> > > > a full memory barrier before and after.
> > >
> > > This case is documented...
> > >
> > > > This applies to primitives
> > > > returning boolean as well, with atomic_dec_and_test() setting this
> > > > precedent from what I can see.
> > >
> > > I don't think this is the "fair" comparison. Unlike atomic_add_unless(),
> > > atomic_dec_and_test() always changes the memory even if it "fails".
> > >
> > > If atomic_add_unless() returns 0, nothing was changed and if we add
> > > the barrier it is not clear what it should be paired with.
> > >
> > > But OK. I have to agree that "keep the rules simple" makes sense, so
> > > we should change atomic_add_unless() as well.
> >
> > Agreed!
>
> OK... since nobody volunteered to make a patch, what do you think about
> the change below?
>
> It should "fix" atomic_add_unless() (only on x86) and optimize
> atomic_inc/dec_unless.
>
> With this change atomic_*_unless() can do the unnecessary mb() after
> cmpxchg() fails, but I think this case is very unlikely.
>
> And, in the likely case atomic_inc/dec_unless avoids the 1st cmpxchg()
> which in most cases just reads the memory for the next cmpxchg().
Thank you, Oleg!
Reviewed-by: Paul E. McKenney <[email protected]>
> Oleg.
>
> --- x/arch/x86/include/asm/atomic.h
> +++ x/arch/x86/include/asm/atomic.h
> @@ -212,15 +212,12 @@ static inline int atomic_xchg(atomic_t *
> static inline int __atomic_add_unless(atomic_t *v, int a, int u)
> {
> int c, old;
> - c = atomic_read(v);
> - for (;;) {
> - if (unlikely(c == (u)))
> - break;
> - old = atomic_cmpxchg((v), c, c + (a));
> + for (c = atomic_read(v); c != u; c = old) {
> + old = atomic_cmpxchg(v, c, c + a);
> if (likely(old == c))
> - break;
> - c = old;
> + return c;
> }
> + smp_mb();
> return c;
> }
>
> --- x/include/linux/atomic.h
> +++ x/include/linux/atomic.h
> @@ -64,11 +64,12 @@ static inline int atomic_inc_not_zero_hi
> static inline int atomic_inc_unless_negative(atomic_t *p)
> {
> int v, v1;
> - for (v = 0; v >= 0; v = v1) {
> + for (v = atomic_read(p); v >= 0; v = v1) {
> v1 = atomic_cmpxchg(p, v, v + 1);
> if (likely(v1 == v))
> return 1;
> }
> + smp_mb();
> return 0;
> }
> #endif
> @@ -77,11 +78,12 @@ static inline int atomic_inc_unless_nega
> static inline int atomic_dec_unless_positive(atomic_t *p)
> {
> int v, v1;
> - for (v = 0; v <= 0; v = v1) {
> + for (v = atomic_read(p); v <= 0; v = v1) {
> v1 = atomic_cmpxchg(p, v, v - 1);
> if (likely(v1 == v))
> return 1;
> }
> + smp_mb();
> return 0;
> }
> #endif
>
On Thu, 2013-03-21 at 18:08 +0100, Oleg Nesterov wrote:
> OK... since nobody volunteered to make a patch, what do you think about
> the change below?
>
> It should "fix" atomic_add_unless() (only on x86) and optimize
> atomic_inc/dec_unless.
>
> With this change atomic_*_unless() can do the unnecessary mb() after
> cmpxchg() fails, but I think this case is very unlikely.
>
> And, in the likely case atomic_inc/dec_unless avoids the 1st cmpxchg()
> which in most cases just reads the memory for the next cmpxchg().
>
> Oleg.
Hmm, cmpxchg() has a different effect on the MESI transaction than a
plain read.
Given there is a single user of atomic_inc_unless_negative(),
(get_write_access(struct inode *inode) )
maybe the 'hint' idea used in atomic_inc_not_zero_hint() could be used.
(The caller might know the expected current value; passing it instead
of using atomic_read() could trigger the appropriate cache transaction.)
On 03/21, Eric Dumazet wrote:
>
> On Thu, 2013-03-21 at 18:08 +0100, Oleg Nesterov wrote:
>
> > OK... since nobody volunteered to make a patch, what do you think about
> > the change below?
> >
> > It should "fix" atomic_add_unless() (only on x86) and optimize
> > atomic_inc/dec_unless.
> >
> > With this change atomic_*_unless() can do the unnecessary mb() after
> > cmpxchg() fails, but I think this case is very unlikely.
> >
> > And, in the likely case atomic_inc/dec_unless avoids the 1st cmpxchg()
> > which in most cases just reads the memory for the next cmpxchg().
> >
> > Oleg.
>
> Hmm, cmpxchg() has different effect on MESI transaction, than a plain
> read.
But this doesn't matter?
We will do cmpxchg() anyway. Unless we can see that it will fail.
Or could you explain what I missed?
> maybe the 'hint' idea used in atomic_inc_not_zero_hint() could be used.
To me, it would be better to kill atomic_inc_not_zero_hint() or unify
it with atomic_inc_not_zero(). But this is another story.
Oleg.
On Thu, 2013-03-21 at 19:30 +0100, Oleg Nesterov wrote:
> To me, it would be better to kill atomic_inc_not_zero_hint() or unify
> it with atomic_inc_not_zero(). But this is another story.
git is your friend.
I suggest you read 3f9d35b9514 changelog before killing it, thanks.
On 03/21, Eric Dumazet wrote:
>
> On Thu, 2013-03-21 at 19:30 +0100, Oleg Nesterov wrote:
>
> > To me, it would be better to kill atomic_inc_not_zero_hint() or unify
> > it with atomic_inc_not_zero(). But this is another story.
>
> git is your friend.
>
> I suggest you read 3f9d35b9514 changelog before killing it, thanks.
Thanks Eric for your friendly suggestion.
But I didn't mean we should kill this optimization. Yes, I am wondering
if we can avoid inc_not_zero_hint _or_ unify with add_unless. But let me
repeat, this is another story.
Perhaps I misread your previous email... I understood it as if you think
the patch I sent is wrong. No?
If you meant that get_write_access() can predict the current value of
i_writecount... how? And even if we could, why can't/shouldn't we try to
optimize the generic atomic_inc_unless_negative()?
So what did you actually mean?
Oleg.
On Thu, Mar 21, 2013 at 03:56:58PM -0700, Eric Dumazet wrote:
> On Thu, 2013-03-21 at 19:30 +0100, Oleg Nesterov wrote:
>
> > To me, it would be better to kill atomic_inc_not_zero_hint() or unify
> > it with atomic_inc_not_zero(). But this is another story.
>
> git is your friend.
>
> I suggest you read 3f9d35b9514 changelog before killing it, thanks.
Hello, Eric,
Does the addition of a memory barrier in the not-zero case impose
too large a performance penalty?
Thanx, Paul