2011-04-12 08:04:55

by Shaohua Li

Subject: [PATCH 3/4]percpu_counter: fix code for 32bit systems

percpu_counter.count is an 's64'. Accessing it on a 32-bit system is racy.
We need some locking to protect it; otherwise a badly wrong value could be
read.

Signed-off-by: Shaohua Li <[email protected]>
---
include/linux/percpu_counter.h | 43 +++++++++++++++++++++++++++++++----------
1 file changed, 33 insertions(+), 10 deletions(-)

Index: linux/include/linux/percpu_counter.h
===================================================================
--- linux.orig/include/linux/percpu_counter.h 2011-04-12 15:48:44.000000000 +0800
+++ linux/include/linux/percpu_counter.h 2011-04-12 15:48:54.000000000 +0800
@@ -54,7 +54,15 @@ static inline s64 percpu_counter_sum(str

static inline s64 percpu_counter_read(struct percpu_counter *fbc)
{
+#if BITS_PER_LONG == 32
+ s64 count;
+ spin_lock(&fbc->lock);
+ count = fbc->count;
+ spin_unlock(&fbc->lock);
+ return count;
+#else
return fbc->count;
+#endif
}

static inline int percpu_counter_initialized(struct percpu_counter *fbc)
@@ -68,9 +76,20 @@ struct percpu_counter {
s64 count;
};

-static inline int percpu_counter_init(struct percpu_counter *fbc, s64 amount)
+static inline void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
{
+#if BITS_PER_LONG == 32
+ preempt_disable();
fbc->count = amount;
+ preempt_enable();
+#else
+ fbc->count = amount;
+#endif
+}
+
+static inline int percpu_counter_init(struct percpu_counter *fbc, s64 amount)
+{
+ percpu_counter_set(fbc, amount);
return 0;
}

@@ -78,16 +97,25 @@ static inline void percpu_counter_destro
{
}

-static inline void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
+static inline s64 percpu_counter_read(struct percpu_counter *fbc)
{
- fbc->count = amount;
+#if BITS_PER_LONG == 32
+ s64 count;
+ preempt_disable();
+ count = fbc->count;
+ preempt_enable();
+ return count;
+#else
+ return fbc->count;
+#endif
}

static inline int percpu_counter_compare(struct percpu_counter *fbc, s64 rhs)
{
- if (fbc->count > rhs)
+ s64 count = percpu_counter_read(fbc);
+ if (count > rhs)
return 1;
- else if (fbc->count < rhs)
+ else if (count < rhs)
return -1;
else
return 0;
@@ -107,11 +135,6 @@ __percpu_counter_add(struct percpu_count
percpu_counter_add(fbc, amount);
}

-static inline s64 percpu_counter_read(struct percpu_counter *fbc)
-{
- return fbc->count;
-}
-
static inline s64 percpu_counter_sum(struct percpu_counter *fbc)
{
return percpu_counter_read(fbc);


2011-04-12 09:04:00

by Eric Dumazet

Subject: Re: [PATCH 3/4]percpu_counter: fix code for 32bit systems

On Tuesday 12 April 2011 at 16:04 +0800, Shaohua Li wrote:
> percpu_counter.counter is a 's64'. Accessing it in 32-bit system is racing.
> we need some locking to protect it otherwise some very wrong value could be
> accessed.
>
> Signed-off-by: Shaohua Li <[email protected]>
> ---
> include/linux/percpu_counter.h | 43 +++++++++++++++++++++++++++++++----------
> 1 file changed, 33 insertions(+), 10 deletions(-)
>
> Index: linux/include/linux/percpu_counter.h
> ===================================================================
> --- linux.orig/include/linux/percpu_counter.h 2011-04-12 15:48:44.000000000 +0800
> +++ linux/include/linux/percpu_counter.h 2011-04-12 15:48:54.000000000 +0800
> @@ -54,7 +54,15 @@ static inline s64 percpu_counter_sum(str
>
> static inline s64 percpu_counter_read(struct percpu_counter *fbc)
> {
> +#if BITS_PER_LONG == 32
> + s64 count;
> + spin_lock(&fbc->lock);
> + count = fbc->count;
> + spin_unlock(&fbc->lock);
> + return count;
> +#else
> return fbc->count;
> +#endif
> }
>

Hmm... did you test this with LOCKDEP on?

You add a possible deadlock here.

Hint: some percpu_counters are used from IRQ context.

This interface assumes the caller takes the appropriate locking.


2011-04-12 19:03:12

by Tejun Heo

Subject: Re: [PATCH 3/4]percpu_counter: fix code for 32bit systems

On Tue, Apr 12, 2011 at 04:04:04PM +0800, Shaohua Li wrote:
> static inline s64 percpu_counter_read(struct percpu_counter *fbc)
> {
> +#if BITS_PER_LONG == 32
> + s64 count;
> + spin_lock(&fbc->lock);
> + count = fbc->count;
> + spin_unlock(&fbc->lock);
> + return count;
> +#else
> return fbc->count;
> +#endif

I don't know. Is there any problem caused by this? The interface is
known to be unreliable and is already being used in a speculative manner.
I think it's more beneficial to avoid using locks on the fast read path.

Thanks.

--
tejun

2011-04-13 01:01:12

by Shaohua Li

Subject: Re: [PATCH 3/4]percpu_counter: fix code for 32bit systems

On Tue, 2011-04-12 at 17:03 +0800, Eric Dumazet wrote:
> On Tuesday 12 April 2011 at 16:04 +0800, Shaohua Li wrote:
> > percpu_counter.counter is a 's64'. Accessing it in 32-bit system is racing.
> > we need some locking to protect it otherwise some very wrong value could be
> > accessed.
> >
> > Signed-off-by: Shaohua Li <[email protected]>
> > ---
> > include/linux/percpu_counter.h | 43 +++++++++++++++++++++++++++++++----------
> > 1 file changed, 33 insertions(+), 10 deletions(-)
> >
> > Index: linux/include/linux/percpu_counter.h
> > ===================================================================
> > --- linux.orig/include/linux/percpu_counter.h 2011-04-12 15:48:44.000000000 +0800
> > +++ linux/include/linux/percpu_counter.h 2011-04-12 15:48:54.000000000 +0800
> > @@ -54,7 +54,15 @@ static inline s64 percpu_counter_sum(str
> >
> > static inline s64 percpu_counter_read(struct percpu_counter *fbc)
> > {
> > +#if BITS_PER_LONG == 32
> > + s64 count;
> > + spin_lock(&fbc->lock);
> > + count = fbc->count;
> > + spin_unlock(&fbc->lock);
> > + return count;
> > +#else
> > return fbc->count;
> > +#endif
> > }
> >
>
> Hmm... did you test this with LOCKDEP on ?
>
> You add a possible deadlock here.
>
> Hint : Some percpu_counter are used from irq context.
There are some places where we don't disable interrupts, for example
percpu_counter_add(). So the API doesn't look IRQ-safe to me.

> This interface assumes caller take the appropriate locking.
No comment says this, and in some places we don't hold any lock,
for example meminfo_proc_show().

2011-04-13 01:05:39

by Shaohua Li

Subject: Re: [PATCH 3/4]percpu_counter: fix code for 32bit systems

On Wed, 2011-04-13 at 03:02 +0800, Tejun Heo wrote:
> On Tue, Apr 12, 2011 at 04:04:04PM +0800, Shaohua Li wrote:
> > static inline s64 percpu_counter_read(struct percpu_counter *fbc)
> > {
> > +#if BITS_PER_LONG == 32
> > + s64 count;
> > + spin_lock(&fbc->lock);
> > + count = fbc->count;
> > + spin_unlock(&fbc->lock);
> > + return count;
> > +#else
> > return fbc->count;
> > +#endif
>
> I don't know. Is there any problem caused by this? The interface is
> known to be unreliable and already being used in speculative manner.
> I think it's more beneficial to avoid using locks on fast read path.
Yes, it is unreliable, but only to the extent of batch*nr_cpus. Accessing
64 bits on a 32-bit machine can give us a _very_ big inaccuracy, which is
unacceptable to me.
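
To make the size of that inaccuracy concrete, here is a minimal userspace
sketch. It is a model, not kernel code: batch_error_bound() and torn_read()
are hypothetical helper names, and the torn read is simulated
deterministically rather than provoked with real concurrency. It assumes
the reader loads the low 32-bit word before a writer's update and the high
word after it.

```c
#include <stdint.h>

/* Worst-case drift of the approximate read, as stated in the thread. */
static int64_t batch_error_bound(int32_t batch, int nr_cpus)
{
	return (int64_t)batch * nr_cpus;
}

/*
 * Model of a torn read on a 32-bit machine: the reader takes the low
 * word from the old value and the high word from the new value.
 */
static int64_t torn_read(int64_t old_val, int64_t new_val)
{
	uint64_t lo = (uint64_t)old_val & 0xffffffffu;
	uint64_t hi = (uint64_t)new_val & ~(uint64_t)0xffffffffu;

	return (int64_t)(hi | lo);
}
```

With a counter that goes from 0xffffffff to 0x100000000 (an increment of
one), torn_read() yields 0x1ffffffff, which is off by 0xffffffff. For
comparison, batch_error_bound(32, 4) is 128; the batch size and CPU count
here are illustrative numbers only.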

2011-04-13 02:32:15

by Eric Dumazet

Subject: Re: [PATCH 3/4]percpu_counter: fix code for 32bit systems

On Wednesday 13 April 2011 at 09:01 +0800, Shaohua Li wrote:
> On Tue, 2011-04-12 at 17:03 +0800, Eric Dumazet wrote:
> >
> > Hmm... did you test this with LOCKDEP on ?
> >
> > You add a possible deadlock here.
> >
> > Hint : Some percpu_counter are used from irq context.
> there are some places we didn't disable interrupt, for example
> percpu_counter_add. So the API isn't irq safe to me.
>

So what? Callers must disable IRQs before calling percpu_counter_add(),
and they actually do in the network stack. Please check again,
tcp_sockets_allocated for example.

> > This interface assumes caller take the appropriate locking.
> no comments say this, and some places we don't hold locking.
> for example, meminfo_proc_show.
>

This doesn't answer my question about LOCKDEP ;)

Just fix the few callers that might need a fix, since this is the only
way to deal with potential problems without adding a performance penalty
(for stable trees).


2011-04-13 02:42:01

by Shaohua Li

Subject: Re: [PATCH 3/4]percpu_counter: fix code for 32bit systems

On Wed, 2011-04-13 at 10:32 +0800, Eric Dumazet wrote:
> On Wednesday 13 April 2011 at 09:01 +0800, Shaohua Li wrote:
> > On Tue, 2011-04-12 at 17:03 +0800, Eric Dumazet wrote:
> > >
> > > Hmm... did you test this with LOCKDEP on ?
> > >
> > > You add a possible deadlock here.
> > >
> > > Hint : Some percpu_counter are used from irq context.
> > there are some places we didn't disable interrupt, for example
> > percpu_counter_add. So the API isn't irq safe to me.
> >
>
> So what ? Callers must disable IRQ before calling percpu_counter_add(),
> and they actually do in network stack. Please check again,
> tcp_sockets_allocated for example.
Did you check other code? For example, __vm_enough_memory() doesn't
disable IRQs before calling percpu_counter_add().

> > > This interface assumes caller take the appropriate locking.
> > no comments say this, and some places we don't hold locking.
> > for example, meminfo_proc_show.
> >
>
> This doesnt answer my question about LOCKDEP ;)
>
> Just fix the few callers that might need a fix, since this is the only
> way to deal with potential problems without adding performance penalty
> (for stable trees)
I mean the interface doesn't assume that the caller takes a lock. Since
no lock is taken, we should make the interface itself correct
instead of fixing the callers.

2011-04-13 02:47:40

by Eric Dumazet

Subject: Re: [PATCH 3/4]percpu_counter: fix code for 32bit systems

On Wednesday 13 April 2011 at 10:41 +0800, Shaohua Li wrote:
> On Wed, 2011-04-13 at 10:32 +0800, Eric Dumazet wrote:
> > On Wednesday 13 April 2011 at 09:01 +0800, Shaohua Li wrote:
> > > On Tue, 2011-04-12 at 17:03 +0800, Eric Dumazet wrote:
> > > >
> > > > Hmm... did you test this with LOCKDEP on ?
> > > >
> > > > You add a possible deadlock here.
> > > >
> > > > Hint : Some percpu_counter are used from irq context.
> > > there are some places we didn't disable interrupt, for example
> > > percpu_counter_add. So the API isn't irq safe to me.
> > >
> >
> > So what ? Callers must disable IRQ before calling percpu_counter_add(),
> > and they actually do in network stack. Please check again,
> > tcp_sockets_allocated for example.
> Did you check other code? for example, __vm_enough_memory() doesn't
> disable IRQ before calling percpu_counter_add().
>

Did you read my mails?

I said: fix the buggy parts, don't add new bugs or slow down parts that
are OK.


> > > > This interface assumes caller take the appropriate locking.
> > > no comments say this, and some places we don't hold locking.
> > > for example, meminfo_proc_show.
> > >
> >
> > This doesnt answer my question about LOCKDEP ;)
> >
> > Just fix the few callers that might need a fix, since this is the only
> > way to deal with potential problems without adding performance penalty
> > (for stable trees)
> I mean the interface doesn't assume caller should take locking. Since
> there isn't locking taking, we should make the interface itself correct,
> instead of fixing caller.
>

No _please_

Q: Is spin_lock() IRQ-safe?
A: No

Q: Should we make it IRQ-safe?
A: Just use the spin_lock_... variants
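
The deadlock being argued about can be shown with a small userspace model.
This is only a sketch under stated assumptions: struct cpu_model,
irq_arrives(), read_plain() and read_irqsave() are hypothetical names, a
single CPU is modelled with two flags, and the interrupt is assumed to
arrive at the worst moment, inside the critical section. read_plain()
stands in for a spin_lock()-style reader and read_irqsave() for a
spin_lock_irqsave()-style reader.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-CPU model: one lock flag, one "IRQs enabled" flag. */
struct cpu_model {
	bool lock_held;
	bool irqs_enabled;
	int64_t count;
};

enum irq_outcome { IRQ_HANDLED, IRQ_DEFERRED, IRQ_DEADLOCK };

/* An interrupt arrives whose handler needs the same lock as the reader. */
static enum irq_outcome irq_arrives(const struct cpu_model *cpu)
{
	if (!cpu->irqs_enabled)
		return IRQ_DEFERRED;	/* stays pending until IRQs come back */
	if (cpu->lock_held)
		return IRQ_DEADLOCK;	/* handler spins on its own CPU's lock */
	return IRQ_HANDLED;
}

/* Models a spin_lock()-style read: IRQs stay enabled while the lock is held. */
static enum irq_outcome read_plain(struct cpu_model *cpu, int64_t *out)
{
	enum irq_outcome r;

	cpu->lock_held = true;
	r = irq_arrives(cpu);		/* worst case: IRQ inside the section */
	*out = cpu->count;
	cpu->lock_held = false;
	return r;
}

/* Models a spin_lock_irqsave()-style read: save and disable IRQs first. */
static enum irq_outcome read_irqsave(struct cpu_model *cpu, int64_t *out)
{
	bool saved = cpu->irqs_enabled;
	enum irq_outcome r;

	cpu->irqs_enabled = false;
	cpu->lock_held = true;
	r = irq_arrives(cpu);
	*out = cpu->count;
	cpu->lock_held = false;
	cpu->irqs_enabled = saved;	/* restore the caller's IRQ state */
	return r;
}
```

In this model read_plain() reports IRQ_DEADLOCK when interrupts were
enabled on entry, while read_irqsave() reports IRQ_DEFERRED and leaves the
saved interrupt state as it found it.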


2011-04-13 03:03:06

by Shaohua Li

Subject: Re: [PATCH 3/4]percpu_counter: fix code for 32bit systems

On Wed, 2011-04-13 at 10:47 +0800, Eric Dumazet wrote:
> On Wednesday 13 April 2011 at 10:41 +0800, Shaohua Li wrote:
> > On Wed, 2011-04-13 at 10:32 +0800, Eric Dumazet wrote:
> > > On Wednesday 13 April 2011 at 09:01 +0800, Shaohua Li wrote:
> > > > On Tue, 2011-04-12 at 17:03 +0800, Eric Dumazet wrote:
> > > > >
> > > > > Hmm... did you test this with LOCKDEP on ?
> > > > >
> > > > > You add a possible deadlock here.
> > > > >
> > > > > Hint : Some percpu_counter are used from irq context.
> > > > there are some places we didn't disable interrupt, for example
> > > > percpu_counter_add. So the API isn't irq safe to me.
> > > >
> > >
> > > So what ? Callers must disable IRQ before calling percpu_counter_add(),
> > > and they actually do in network stack. Please check again,
> > > tcp_sockets_allocated for example.
> > Did you check other code? for example, __vm_enough_memory() doesn't
> > disable IRQ before calling percpu_counter_add().
> >
>
> Did you read my mails ?
>
> I said : fix the buggy parts
That's the difference. Why are those parts buggy? What I said is that the
interface was never IRQ-safe.

> dont add new bugs or slow down parts that
> are OK.
>
>
> > > > > This interface assumes caller take the appropriate locking.
> > > > no comments say this, and some places we don't hold locking.
> > > > for example, meminfo_proc_show.
> > > >
> > >
> > > This doesnt answer my question about LOCKDEP ;)
> > >
> > > Just fix the few callers that might need a fix, since this is the only
> > > way to deal with potential problems without adding performance penalty
> > > (for stable trees)
> > I mean the interface doesn't assume caller should take locking. Since
> > there isn't locking taking, we should make the interface itself correct,
> > instead of fixing caller.
> >
>
> No _please_
>
> Q: Is spin_lock() irq safe ?
> A: No
>
> Q: Should we make it irq safe ?
> A: just use spin_lock_... variants
I can do this, but please give a reason. If the network code is the only
place that requires disabling IRQs, why doesn't the network code do it?

2011-04-13 03:50:46

by Tejun Heo

Subject: Re: [PATCH 3/4]percpu_counter: fix code for 32bit systems

Hello, guys.

On Wed, Apr 13, 2011 at 11:03:01AM +0800, Shaohua Li wrote:
> I can do this, but please give a reason. If network code is the only
> place requiring disable irq, why not network code do it?

This thread is pointless. The next patch converts it to atomic64_t
and the lock is removed anyway. I think Eric's argument makes sense
given that atomic64_t translates into an irqsave spinlock (it has to) in
the generic 32-bit implementation. That said, this is all a moot point.
We might as well simply drop this patch and directly convert to
atomic64_t.

Thanks.

--
tejun
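
As a rough illustration of the atomic64_t direction, here is a userspace
sketch that uses C11 atomics as a stand-in. It is not the patch from this
series: struct counter_model and the counter_model_*() helpers are
hypothetical names, and C11 `_Atomic int64_t` only approximates the
kernel's atomic64_t. On some 32-bit targets the toolchain may implement
these operations with an internal lock or need libatomic.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a counter with an atomic 64-bit count. */
struct counter_model {
	_Atomic int64_t count;
};

static void counter_model_init(struct counter_model *c, int64_t amount)
{
	atomic_init(&c->count, amount);
}

static void counter_model_add(struct counter_model *c, int64_t amount)
{
	atomic_fetch_add_explicit(&c->count, amount, memory_order_relaxed);
}

/* The load is one indivisible operation, so the caller takes no lock. */
static int64_t counter_model_read(struct counter_model *c)
{
	return atomic_load_explicit(&c->count, memory_order_relaxed);
}

/* Same shape as percpu_counter_compare(): read once, then compare. */
static int counter_model_compare(struct counter_model *c, int64_t rhs)
{
	int64_t count = counter_model_read(c);

	if (count > rhs)
		return 1;
	if (count < rhs)
		return -1;
	return 0;
}
```

Reading the value once into a local before comparing mirrors what the
patch does in percpu_counter_compare(), so both comparisons see the same
snapshot.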

2011-04-13 03:53:49

by Eric Dumazet

Subject: Re: [PATCH 3/4]percpu_counter: fix code for 32bit systems

On Wednesday 13 April 2011 at 11:03 +0800, Shaohua Li wrote:
> I can do this, but please give a reason. If network code is the only
> place requiring disable irq, why not network code do it?
>

A lot of percpu_counter users don't use the full s64 range, only
"unsigned long" or "unsigned int". Taking a lock on 32-bit arches to read
the s64 and then truncating it is not needed.

This discussion reminds me of an old one ;)

http://kerneltrap.org/mailarchive/linux-ext4/2008/12/12/4401894
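
The truncation argument can be sketched in userspace as follows. This is a
model under assumptions, not kernel code: struct s64_words and its helpers
are hypothetical names, and it assumes a caller that wants only a 32-bit
result ends up performing a single word load of the low half.

```c
#include <stdint.h>

/* Hypothetical view of an s64 as the two words a 32-bit CPU loads separately. */
struct s64_words {
	uint32_t lo;
	uint32_t hi;
};

static void s64_words_store(struct s64_words *w, int64_t v)
{
	w->lo = (uint32_t)((uint64_t)v & 0xffffffffu);
	w->hi = (uint32_t)((uint64_t)v >> 32);
}

/* A caller that needs only 32 bits loads one word and never reads w->hi. */
static uint32_t s64_words_read_low(const struct s64_words *w)
{
	return w->lo;
}
```

Because s64_words_read_low() never touches the high word, whatever state
that word is in during a concurrent update cannot affect the result in
this model.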


2011-04-13 04:00:21

by Tejun Heo

Subject: Re: [PATCH 3/4]percpu_counter: fix code for 32bit systems

Hello,

On Wed, Apr 13, 2011 at 05:53:40AM +0200, Eric Dumazet wrote:
> On Wednesday 13 April 2011 at 11:03 +0800, Shaohua Li wrote:
> > I can do this, but please give a reason. If network code is the only
> > place requiring disable irq, why not network code do it?
>
> Lot of percpu_counter users dont use full s64 range, but "unsigned long"
> or "unsigned int". Adding a lock on 32bit arches to get the s64, then
> truncate it is not needed.

Yeah, it might hurt 32-bit archs a bit, but if 64-bit becomes better I'll
take that any day. Also, the atomic64_t implementation on x86-32 seems
pretty good and doesn't depend on irq spinlocks (which are quite
expensive), so it shouldn't be too bad.

Thanks.

--
tejun

2011-04-13 04:37:12

by Shaohua Li

Subject: Re: [PATCH 3/4]percpu_counter: fix code for 32bit systems

On Wed, 2011-04-13 at 11:50 +0800, [email protected] wrote:
> Hello, guys.
>
> On Wed, Apr 13, 2011 at 11:03:01AM +0800, Shaohua Li wrote:
> > I can do this, but please give a reason. If network code is the only
> > place requiring disable irq, why not network code do it?
>
> This thread is pointless. The next patch converts it to atomic64_t
> and the lock is removed anyway. I think Eric's argument makes sense
> given that atomic64_t translates into irqsave spinlock (it has to) in
> generic 32bit implementation. That said, this is all a moot point.
> We might as well simply drop this patch and directly convert to
> atomic64_t.
We need it for the UP case anyway. OK, I'll change it to irqsave in the
next post.

Thanks,
Shaohua