2013-06-12 20:45:40

by Tejun Heo

Subject: [PATCH percpu/for-3.11 1/2] percpu-refcount: cosmetic updates

From 49d1e1a6561f1e4b190695f36d2594c7c04b8e76 Mon Sep 17 00:00:00 2001
From: Tejun Heo <[email protected]>
Date: Wed, 12 Jun 2013 13:37:42 -0700

* s/percpu_ref_release/percpu_ref_func_t/ as it's customary to have _t
postfix for types and the type is gonna be used for a different type
of callback too.

* Add @ARG to function comments.

* Drop unnecessary and unaligned indentation from percpu_ref_init()
function comment.

Signed-off-by: Tejun Heo <[email protected]>
---
Hello,

These two patches implement percpu_ref_tryget() along with
percpu_ref_kill_and_confirm(). The first one is a prep. The second
implements both. This is on top of

percpu/for-3.11 c1ae6e9b4d ("percpu-refcount: Don't use silly cmpxchg()")
+[1] "[PATCH percpu/for-3.11] percpu-refcount: consistently use plain (non-sched) RCU"

and available in the following git branch.

git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu.git review-percpu-ref-tryget

Thanks.

[1] http://article.gmane.org/gmane.linux.kernel/1507256

include/linux/percpu-refcount.h | 8 +++++---
lib/percpu-refcount.c | 7 ++++---
2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index abe1411..b61bd6f 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -51,7 +51,7 @@
#include <linux/rcupdate.h>

struct percpu_ref;
-typedef void (percpu_ref_release)(struct percpu_ref *);
+typedef void (percpu_ref_func_t)(struct percpu_ref *);

struct percpu_ref {
atomic_t count;
@@ -62,11 +62,11 @@ struct percpu_ref {
* percpu_ref_kill_rcu())
*/
unsigned __percpu *pcpu_count;
- percpu_ref_release *release;
+ percpu_ref_func_t *release;
struct rcu_head rcu;
};

-int percpu_ref_init(struct percpu_ref *, percpu_ref_release *);
+int percpu_ref_init(struct percpu_ref *ref, percpu_ref_func_t *release);
void percpu_ref_kill(struct percpu_ref *ref);

#define PCPU_STATUS_BITS 2
@@ -78,6 +78,7 @@ void percpu_ref_kill(struct percpu_ref *ref);

/**
* percpu_ref_get - increment a percpu refcount
+ * @ref: percpu_ref to get
*
* Analagous to atomic_inc().
*/
@@ -99,6 +100,7 @@ static inline void percpu_ref_get(struct percpu_ref *ref)

/**
* percpu_ref_put - decrement a percpu refcount
+ * @ref: percpu_ref to put
*
* Decrement the refcount, and if 0, call the release function (which was passed
* to percpu_ref_init())
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index 1a17399..9a78e55 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -33,8 +33,8 @@

/**
* percpu_ref_init - initialize a percpu refcount
- * @ref: ref to initialize
- * @release: function which will be called when refcount hits 0
+ * @ref: percpu_ref to initialize
+ * @release: function which will be called when refcount hits 0
*
* Initializes the refcount in single atomic counter mode with a refcount of 1;
* analagous to atomic_set(ref, 1).
@@ -42,7 +42,7 @@
* Note that @release must not sleep - it may potentially be called from RCU
* callback context by percpu_ref_kill().
*/
-int percpu_ref_init(struct percpu_ref *ref, percpu_ref_release *release)
+int percpu_ref_init(struct percpu_ref *ref, percpu_ref_func_t *release)
{
atomic_set(&ref->count, 1 + PCPU_COUNT_BIAS);

@@ -98,6 +98,7 @@ static void percpu_ref_kill_rcu(struct rcu_head *rcu)

/**
* percpu_ref_kill - safely drop initial ref
+ * @ref: percpu_ref to kill
*
* Must be used to drop the initial ref on a percpu refcount; must be called
* precisely once before shutdown.
--
1.8.2.1


2013-06-12 20:46:35

by Tejun Heo

Subject: [PATCH 2/2] percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm()

From de3c0749e2c1960afcc433fc5da136b85c8bd896 Mon Sep 17 00:00:00 2001
From: Tejun Heo <[email protected]>
Date: Wed, 12 Jun 2013 13:37:42 -0700

Implement percpu_tryget() which succeeds iff the refcount hasn't been
killed yet. Because the refcnt is per-cpu, different CPUs may have
different perceptions on when the counter has been killed and tryget()
may continue to succeed for a while after percpu_ref_kill() returns.

For use cases where it's necessary to know when all CPUs start to see
the refcnt as dead, percpu_ref_kill_and_confirm() is added. The new
function takes an extra argument @confirm_kill which is invoked when
the refcnt is guaranteed to be viewed as killed on all CPUs.

While this isn't the prettiest interface, it doesn't force synchronous
wait and is much safer than requiring the caller to do its own
call_rcu().

Signed-off-by: Tejun Heo <[email protected]>
---
I'll soon post a cgroup patchset which makes use of both functions.

Thanks.

include/linux/percpu-refcount.h | 50 ++++++++++++++++++++++++++++++++++++++++-
lib/percpu-refcount.c | 23 ++++++++++++++-----
2 files changed, 66 insertions(+), 7 deletions(-)

diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index b61bd6f..b2dfa0f 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -63,11 +63,28 @@ struct percpu_ref {
*/
unsigned __percpu *pcpu_count;
percpu_ref_func_t *release;
+ percpu_ref_func_t *confirm_kill;
struct rcu_head rcu;
};

int percpu_ref_init(struct percpu_ref *ref, percpu_ref_func_t *release);
-void percpu_ref_kill(struct percpu_ref *ref);
+void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
+ percpu_ref_func_t *confirm_kill);
+
+/**
+ * percpu_ref_kill - drop the initial ref
+ * @ref: percpu_ref to kill
+ *
+ * Must be used to drop the initial ref on a percpu refcount; must be called
+ * precisely once before shutdown.
+ *
+ * Puts @ref in non percpu mode, then does a call_rcu() before gathering up the
+ * percpu counters and dropping the initial ref.
+ */
+static inline void percpu_ref_kill(struct percpu_ref *ref)
+{
+ return percpu_ref_kill_and_confirm(ref, NULL);
+}

#define PCPU_STATUS_BITS 2
#define PCPU_STATUS_MASK ((1 << PCPU_STATUS_BITS) - 1)
@@ -99,6 +116,37 @@ static inline void percpu_ref_get(struct percpu_ref *ref)
}

/**
+ * percpu_ref_tryget - try to increment a percpu refcount
+ * @ref: percpu_ref to try-get
+ *
+ * Increment a percpu refcount unless it has already been killed. Returns
+ * %true on success; %false on failure.
+ *
+ * Completion of percpu_ref_kill() in itself doesn't guarantee that tryget
+ * will fail. For such guarantee, percpu_ref_kill_and_confirm() should be
+ * used. After the confirm_kill callback is invoked, it's guaranteed that
+ * no new reference will be given out by percpu_ref_tryget().
+ */
+static inline bool percpu_ref_tryget(struct percpu_ref *ref)
+{
+ unsigned __percpu *pcpu_count;
+ int ret = false;
+
+ rcu_read_lock();
+
+ pcpu_count = ACCESS_ONCE(ref->pcpu_count);
+
+ if (likely(REF_STATUS(pcpu_count) == PCPU_REF_PTR)) {
+ __this_cpu_inc(*pcpu_count);
+ ret = true;
+ }
+
+ rcu_read_unlock();
+
+ return ret;
+}
+
+/**
* percpu_ref_put - decrement a percpu refcount
* @ref: percpu_ref to put
*
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index 9a78e55..4d9519a 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -89,6 +89,10 @@ static void percpu_ref_kill_rcu(struct rcu_head *rcu)

atomic_add((int) count - PCPU_COUNT_BIAS, &ref->count);

+ /* @ref is viewed as dead on all CPUs, send out kill confirmation */
+ if (ref->confirm_kill)
+ ref->confirm_kill(ref);
+
/*
* Now we're in single atomic_t mode with a consistent refcount, so it's
* safe to drop our initial ref:
@@ -97,22 +101,29 @@ static void percpu_ref_kill_rcu(struct rcu_head *rcu)
}

/**
- * percpu_ref_kill - safely drop initial ref
+ * percpu_ref_kill_and_confirm - drop the initial ref and schedule confirmation
* @ref: percpu_ref to kill
+ * @confirm_kill: optional confirmation callback
*
- * Must be used to drop the initial ref on a percpu refcount; must be called
- * precisely once before shutdown.
+ * Equivalent to percpu_ref_kill() but also schedules kill confirmation if
+ * @confirm_kill is not NULL. @confirm_kill, which may not block, will be
+ * called after @ref is seen as dead from all CPUs - all further
+ * invocations of percpu_ref_tryget() will fail. See percpu_ref_tryget()
+ * for more details.
*
- * Puts @ref in non percpu mode, then does a call_rcu() before gathering up the
- * percpu counters and dropping the initial ref.
+ * It's guaranteed that there will be at least one full RCU grace period
+ * between the invocation of this function and @confirm_kill and the caller
+ * can piggy-back their RCU release on the callback.
*/
-void percpu_ref_kill(struct percpu_ref *ref)
+void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
+ percpu_ref_func_t *confirm_kill)
{
WARN_ONCE(REF_STATUS(ref->pcpu_count) == PCPU_REF_DEAD,
"percpu_ref_kill() called more than once!\n");

ref->pcpu_count = (unsigned __percpu *)
(((unsigned long) ref->pcpu_count)|PCPU_REF_DEAD);
+ ref->confirm_kill = confirm_kill;

call_rcu(&ref->rcu, percpu_ref_kill_rcu);
}
--
1.8.2.1

2013-06-12 20:57:14

by Kent Overstreet

Subject: Re: [PATCH percpu/for-3.11 1/2] percpu-refcount: cosmetic updates

On Wed, Jun 12, 2013 at 01:45:32PM -0700, Tejun Heo wrote:
> From 49d1e1a6561f1e4b190695f36d2594c7c04b8e76 Mon Sep 17 00:00:00 2001
> From: Tejun Heo <[email protected]>
> Date: Wed, 12 Jun 2013 13:37:42 -0700
>
> * s/percpu_ref_release/percpu_ref_func_t/ as it's customary to have _t
> postfix for types and the type is gonna be used for a different type
> of callback too.
>
> * Add @ARG to function comments.
>
> * Drop unnecessary and unaligned indentation from percpu_ref_init()
> function comment.
>
> Signed-off-by: Tejun Heo <[email protected]>

Had to futz with my indentation :p

Acked-by: Kent Overstreet <[email protected]>

2013-06-12 20:59:04

by Tejun Heo

Subject: Re: [PATCH percpu/for-3.11 1/2] percpu-refcount: cosmetic updates

On Wed, Jun 12, 2013 at 1:57 PM, Kent Overstreet <[email protected]> wrote:
> Had to futz with my indentation :p

Because it didn't align!!!! :P

Thanks.

--
tejun

2013-06-12 21:08:29

by Kent Overstreet

Subject: Re: [PATCH 2/2] percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm()

On Wed, Jun 12, 2013 at 01:46:27PM -0700, Tejun Heo wrote:
> From de3c0749e2c1960afcc433fc5da136b85c8bd896 Mon Sep 17 00:00:00 2001
> From: Tejun Heo <[email protected]>
> Date: Wed, 12 Jun 2013 13:37:42 -0700
>
> Implement percpu_tryget() which succeeds iff the refcount hasn't been
> killed yet. Because the refcnt is per-cpu, different CPUs may have
> different perceptions on when the counter has been killed and tryget()
> may continue to succeed for a while after percpu_ref_kill() returns.

I don't feel very comfortable with saying percpu_ref_tryget() succeeds
"iff the refcount hasn't been killed yet". That's something I would say
about e.g. atomic_inc_not_zero(), but percpu_ref_tryget() doesn't do
that sort of synchronization which is what iff implies to me.

If the user does need some kind of strict ordering between
percpu_ref_kill() and percpu_ref_tryget(), they'd have to insert some
memory barriers - tryget() certainly doesn't have any.

That said, I haven't seen near enough actual uses to know whether this
would be an issue in practice, or what a better description would be. I
mean, tryget() does always get you a valid ref...

Maybe emphasize that tryget() succeeds iff this cpu hasn't seen
percpu_ref_kill() done yet? I dunno.

> For use cases where it's necessary to know when all CPUs start to see
> the refcnt as dead, percpu_ref_kill_and_confirm() is added. The new
> function takes an extra argument @confirm_kill which is invoked when
> the refcnt is guaranteed to be viewed as killed on all CPUs.
>
> While this isn't the prettiest interface, it doesn't force synchronous
> wait and is much safer than requiring the caller to do its own
> call_rcu().

Yeah, this seems... icky to me. I'm going to withhold judgement until I
see how it's used, maybe there isn't any other way but I'd like to try
and find something prettier.

> /**
> - * percpu_ref_kill - safely drop initial ref
> + * percpu_ref_kill_and_confirm - drop the initial ref and schedule confirmation
> * @ref: percpu_ref to kill
> + * @confirm_kill: optional confirmation callback
> *
> - * Must be used to drop the initial ref on a percpu refcount; must be called
> - * precisely once before shutdown.
> + * Equivalent to percpu_ref_kill() but also schedules kill confirmation if
> + * @confirm_kill is not NULL. @confirm_kill, which may not block, will be
> + * called after @ref is seen as dead from all CPUs - all further
> + * invocations of percpu_ref_tryget() will fail. See percpu_ref_tryget()
> + * for more details.
> *
> - * Puts @ref in non percpu mode, then does a call_rcu() before gathering up the
> - * percpu counters and dropping the initial ref.
> + * It's guaranteed that there will be at least one full RCU grace period
> + * between the invocation of this function and @confirm_kill and the caller
> + * can piggy-back their RCU release on the callback.
> */
> -void percpu_ref_kill(struct percpu_ref *ref)
> +void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
> + percpu_ref_func_t *confirm_kill)

Passing release to percpu_ref_init() and confirm_kill to
percpu_ref_kill() is inconsistent. Can we pass them both to
percpu_ref_init()?

Also, given that confirm_kill is an optional thing I don't see why
you're renaming percpu_ref_kill() -> percpu_ref_kill_and_confirm(). Most
users (certainly aio, I think the module code too) don't have any use
for confirm kill, I don't want to rename it for an ugly optional thing.

2013-06-12 21:17:54

by Tejun Heo

Subject: Re: [PATCH 2/2] percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm()

Hey, Kent.

On Wed, Jun 12, 2013 at 02:08:24PM -0700, Kent Overstreet wrote:
> On Wed, Jun 12, 2013 at 01:46:27PM -0700, Tejun Heo wrote:
> > From de3c0749e2c1960afcc433fc5da136b85c8bd896 Mon Sep 17 00:00:00 2001
> > From: Tejun Heo <[email protected]>
> > Date: Wed, 12 Jun 2013 13:37:42 -0700
> >
> > Implement percpu_tryget() which succeeds iff the refcount hasn't been
> > killed yet. Because the refcnt is per-cpu, different CPUs may have
> > different perceptions on when the counter has been killed and tryget()
> > may continue to succeed for a while after percpu_ref_kill() returns.
>
> I don't feel very comfortable with saying percpu_ref_tryget() succeeds
> "iff the refcount hasn't been killed yet". That's something I would say

Yeah, the phrasing of the first sentence could be a bit misleading.
It probably should emphasize that there's no synchronization by
default from the beginning.

> about e.g. atomic_inc_not_zero(), but percpu_ref_tryget() doesn't do
> that sort of synchronization which is what iff implies to me.
>
> If the user does need some kind of strict ordering between
> percpu_ref_kill() and percpu_ref_tryget(), they'd have to insert some
> memory barriers - tryget() certainly doesn't have any.

which is why percpu_ref_kill_and_confirm() has been added.

> > While this isn't the prettiest interface, it doesn't force synchronous
> > wait and is much safer than requiring the caller to do its own
> > call_rcu().
>
> Yeah, this seems... icky to me. I'm going to withhold judgement until I
> see how it's used, maybe there isn't any other way but I'd like to try
> and find something prettier.

Yeap, this is icky. If you have any better ideas, I'm all ears.

> > -void percpu_ref_kill(struct percpu_ref *ref)
> > +void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
> > + percpu_ref_func_t *confirm_kill)
>
> Passing release to percpu_ref_init() and confirm_kill to
> percpu_ref_kill() is inconsistent. Can we pass them both to
> percpu_ref_init()?

I don't know. Maybe. While they're stored in the same place,
@confirm_kill is really an optional part of killing itself, so
specifying it to kill *seems* like the better place and it also marks
it clearly that something funky is going on while killing the
reference count.

> Also, given that confirm_kill is an optional thing I don't see why
> you're renaming percpu_ref_kill() -> percpu_ref_kill_and_confirm(). Most
> users (certainly aio, I think the module code too) don't have any use
> for confirm kill, I don't want to rename it for an ugly optional thing.

Hmm? percpu_ref_kill() is still there. It now just calls the ugly
thing with %NULL @confirm_kill.

Thanks.

--
tejun

2013-06-12 21:46:35

by Kent Overstreet

Subject: Re: [PATCH 2/2] percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm()

On Wed, Jun 12, 2013 at 02:17:47PM -0700, Tejun Heo wrote:
> Yeap, this is icky. If you have any better ideas, I'm all ears.

I'm reading through the cgroup patch/code now - this refcounting is
_hairy_ so I could certainly believe the way you've done it is the
sanest way that'll work.

But it does feel like some of the ordering/ownership is backwards here,
and that's where the need for confirm_kill is coming from - also, the
fact that css_tryget() is used more than css_get() is... suspicious.

Basically, you're wanting the lifetime of the subsystems to be
controlled by the lifetime of the cgroup. If that's the case, then
nothing should be taking refs on them (I'm not sure if that's actually
the case) and they shouldn't be taking refs on the cgroup - the cgroup
should kill them directly when its ref goes to 0.

Of course, if they can't just be killed and they really do need
independent lifetimes, then that's a problem - though embedding two
refcounts in the cgroup might solve that (one for the subsystems, one
for everything else).

Uhm, cgroup_offline_fn() seems weird. First it's telling the subsystems
to go away - but all we know is that tryget() is failing, there could
still be outstanding refs. What am I missing? offline_css() doesn't
appear to wait for any refs to hit 0.

Then it says it's putting base refs... but base refs should be put with
percpu_ref_kill(), so what refs are those really?

Then there's a list_del_rcu() - but that should probably happen _before_
the call_rcu that percpu_ref_kill() does (in the absence of the bizarre
cgroup stuff you need the list_del_rcu() or whatever to happen first, so
that you know no new gets() can happen, then after the call_rcu()
it's safe to drop the base ref... if you drop it before the rcu barrier
you can have a get happen after the ref already hit 0.)

So maybe those lists just aren't used by anything that would take a ref?
Or is there a barrier or something somewhere else that makes it safe?

>
> > > -void percpu_ref_kill(struct percpu_ref *ref)
> > > +void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
> > > + percpu_ref_func_t *confirm_kill)
> >
> > Passing release to percpu_ref_init() and confirm_kill to
> > percpu_ref_kill() is inconsistent. Can we pass them both to
> > percpu_ref_init()?
>
> I don't know. Maybe. While they're stored in the same place,
> @confirm_kill is really an optional part of killing itself, so
> specifying it to kill *seems* like the better place and it also marks
> it clearly that something funky is going on during while killing the
> reference count.

I missed the part where you kept percpu_ref_kill() as a wrapper - the
way you did it makes more sense now.

2013-06-12 23:31:56

by Tejun Heo

Subject: Re: [PATCH 2/2] percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm()

Hey,

On Wed, Jun 12, 2013 at 02:46:30PM -0700, Kent Overstreet wrote:
> I'm reading through the cgroup patch/code now - this refcounting is
> _hairy_ so I could certainly believe the way you've done it is the
> sanest way that'll work.

There are several different entities involved in the refcnting. It's
hairy.

> But it does feel like some of the ordering/ownership is backwards here,
> and that's where the need for confirm_kill is coming from - also, the
> fact that css_tryget() is used more than css_get() is... suspicious.
>
> Basically, you're wanting the lifetime of the subsystems to be
> controlled by the lifetime of the cgroup. If that's the case, then
> nothing should be taking refs on them (i'm not sure if that's actually
> the case) and they shouldn't be taking refs on the cgroup - the cgroup
> should kill them directly when its ref goes to 0.

The cgroup initiates destruction but controllers are free to hold on
to their part of the cgroup (each css), which in turn should pin the cgroup
itself. Each css proxies cgroup for each controller. The reference
counting becomes nested but it doesn't change the overall mechanism.

> Of course, if they can't just be killed and they really do need
> independent lifetimes, then that's a problem - though embedding two
> refcounts in the cgroup might solve that (one for the subsystems, one
> for everything else).

The original intention of the nested refcnting probably is allowing
draining css's separately so that subsystems can be added and removed
from an existing hierarchy. That never worked out, so it could be
that we can collapse all the css refcnts into cgroup refcnt, which BTW
is currently using the dentry reference count. I'm fairly certain
that we can collapse it but that's a separate issue we can deal with
later.

Being collapsed or not doesn't really change the necessity for kill
confirming tho. More on the subject below.

> Uhm, cgroup_offline_fn() seems weird. First it's telling the subsystems
> to go away - but all we know is that tryget() is failing, there could
> still be outstanding refs. What am I missing? offline_css() doesn't
> appear to wait for any refs to hit 0.

offline_css() tells the controllers that the cgroup is being removed
and that they should release whatever resources could be pinning the
css and stop creating new such references, so that once whatever is in
flight finishes, the reference can hit zero.

Some controllers depend on tryget reliably failing for not creating
lasting references, so that's where the requirement for guaranteeing
tryget failure comes in. Note that it has nothing to do with whether
the refcnts are collapsed into one or not. It's still the same deal.
You need to confirm that the one refcnt is visible as killed on all
CPUs before starting offlining.

> Then it says it's putting base refs... but base refs should be put with
> percpu_ref_kill(), so what refs are those really?

Ah, that's me forgetting that the base refs are already put and slab
poisoning missing it probably because the css object tends to still be
in RCU purgatory when the last extra put happens. Will fix.

> Then there's a list_del_rcu() - but that should probably happen _before_
> the call_rcu that percpu_ref_kill() does (in the absense of the bizzare
> cgroup stuff you need the list_del_rcu() or whatever to happen first, so
> that you know no new gets() can happen, then then after the call_rcu()
> it's safe to drop the base ref... if you drop it before the rcu barrier
> you can have a get happen after the ref already hit 0.)

No, there are two separate stages of RCU'ing going on here. One to
disable tryget for ->css_offline. The other to actually free the css
as css is expected to be RCU-read accessible during and after
->css_offline(). The percpu ref is adding an extra RCU stage.

> So maybe those lists just aren't used by anything that would take a ref?
> Or is there a barrier or something somewhere else that makes it safe?

The two RCU grace periods are just protecting different things and
they can't be combined.

Thanks.

--
tejun

2013-06-12 23:34:33

by Tejun Heo

Subject: Re: [PATCH 2/2] percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm()

On Wed, Jun 12, 2013 at 04:31:51PM -0700, Tejun Heo wrote:
> The original intention of the nested refcnting probably is allowing
> draining css's separately so that subsystems can be added and removed
> from an existing hierarchy. That never worked out, so it could be
> that we can collapse all the css refcnts into cgroup refcnt, which BTW
> is currently using the dentry reference count. I'm fairly certain
> that we can collapse it but that's a separate issue we can deal with
> later.

Hmmm.... I *think* we might need per-css refcnts to implement unified
hierarchy to allow turning on and off controllers along the single
hierarchy. I'll think more about it but this is sorta off topic for
this series.

Thanks.

--
tejun

2013-06-13 03:49:04

by Tejun Heo

Subject: Re: [PATCH percpu/for-3.11 1/2] percpu-refcount: cosmetic updates

On Wed, Jun 12, 2013 at 01:45:32PM -0700, Tejun Heo wrote:
> From 49d1e1a6561f1e4b190695f36d2594c7c04b8e76 Mon Sep 17 00:00:00 2001
> From: Tejun Heo <[email protected]>
> Date: Wed, 12 Jun 2013 13:37:42 -0700
>
> * s/percpu_ref_release/percpu_ref_func_t/ as it's customary to have _t
> postfix for types and the type is gonna be used for a different type
> of callback too.
>
> * Add @ARG to function comments.
>
> * Drop unnecessary and unaligned indentation from percpu_ref_init()
> function comment.
>
> Signed-off-by: Tejun Heo <[email protected]>

Applied to percpu/for-3.11. Thanks.

--
tejun

2013-06-13 03:50:48

by Tejun Heo

Subject: [PATCH v2 2/2] percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm()

Implement percpu_tryget() which stops giving out references once the
percpu_ref is visible as killed. Because the refcnt is per-cpu,
different CPUs will start to see a refcnt as killed at different
points in time and tryget() may continue to succeed on a subset of CPUs
for a while after percpu_ref_kill() returns.

For use cases where it's necessary to know when all CPUs start to see
the refcnt as dead, percpu_ref_kill_and_confirm() is added. The new
function takes an extra argument @confirm_kill which is invoked when
the refcnt is guaranteed to be viewed as killed on all CPUs.

While this isn't the prettiest interface, it doesn't force synchronous
wait and is much safer than requiring the caller to do its own
call_rcu().

v2: Patch description rephrased to emphasize that tryget() may
continue to succeed on some CPUs after kill() returns as suggested
by Kent.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Kent Overstreet <[email protected]>
---
include/linux/percpu-refcount.h | 50 +++++++++++++++++++++++++++++++++++++++-
lib/percpu-refcount.c | 23 +++++++++++++-----
2 files changed, 66 insertions(+), 7 deletions(-)

--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -63,11 +63,28 @@ struct percpu_ref {
*/
unsigned __percpu *pcpu_count;
percpu_ref_func_t *release;
+ percpu_ref_func_t *confirm_kill;
struct rcu_head rcu;
};

int percpu_ref_init(struct percpu_ref *ref, percpu_ref_func_t *release);
-void percpu_ref_kill(struct percpu_ref *ref);
+void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
+ percpu_ref_func_t *confirm_kill);
+
+/**
+ * percpu_ref_kill - drop the initial ref
+ * @ref: percpu_ref to kill
+ *
+ * Must be used to drop the initial ref on a percpu refcount; must be called
+ * precisely once before shutdown.
+ *
+ * Puts @ref in non percpu mode, then does a call_rcu() before gathering up the
+ * percpu counters and dropping the initial ref.
+ */
+static inline void percpu_ref_kill(struct percpu_ref *ref)
+{
+ return percpu_ref_kill_and_confirm(ref, NULL);
+}

#define PCPU_STATUS_BITS 2
#define PCPU_STATUS_MASK ((1 << PCPU_STATUS_BITS) - 1)
@@ -99,6 +116,37 @@ static inline void percpu_ref_get(struct
}

/**
+ * percpu_ref_tryget - try to increment a percpu refcount
+ * @ref: percpu_ref to try-get
+ *
+ * Increment a percpu refcount unless it has already been killed. Returns
+ * %true on success; %false on failure.
+ *
+ * Completion of percpu_ref_kill() in itself doesn't guarantee that tryget
+ * will fail. For such guarantee, percpu_ref_kill_and_confirm() should be
+ * used. After the confirm_kill callback is invoked, it's guaranteed that
+ * no new reference will be given out by percpu_ref_tryget().
+ */
+static inline bool percpu_ref_tryget(struct percpu_ref *ref)
+{
+ unsigned __percpu *pcpu_count;
+ int ret = false;
+
+ rcu_read_lock();
+
+ pcpu_count = ACCESS_ONCE(ref->pcpu_count);
+
+ if (likely(REF_STATUS(pcpu_count) == PCPU_REF_PTR)) {
+ __this_cpu_inc(*pcpu_count);
+ ret = true;
+ }
+
+ rcu_read_unlock();
+
+ return ret;
+}
+
+/**
* percpu_ref_put - decrement a percpu refcount
* @ref: percpu_ref to put
*
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -89,6 +89,10 @@ static void percpu_ref_kill_rcu(struct r

atomic_add((int) count - PCPU_COUNT_BIAS, &ref->count);

+ /* @ref is viewed as dead on all CPUs, send out kill confirmation */
+ if (ref->confirm_kill)
+ ref->confirm_kill(ref);
+
/*
* Now we're in single atomic_t mode with a consistent refcount, so it's
* safe to drop our initial ref:
@@ -97,22 +101,29 @@ static void percpu_ref_kill_rcu(struct r
}

/**
- * percpu_ref_kill - safely drop initial ref
+ * percpu_ref_kill_and_confirm - drop the initial ref and schedule confirmation
* @ref: percpu_ref to kill
+ * @confirm_kill: optional confirmation callback
*
- * Must be used to drop the initial ref on a percpu refcount; must be called
- * precisely once before shutdown.
+ * Equivalent to percpu_ref_kill() but also schedules kill confirmation if
+ * @confirm_kill is not NULL. @confirm_kill, which may not block, will be
+ * called after @ref is seen as dead from all CPUs - all further
+ * invocations of percpu_ref_tryget() will fail. See percpu_ref_tryget()
+ * for more details.
*
- * Puts @ref in non percpu mode, then does a call_rcu() before gathering up the
- * percpu counters and dropping the initial ref.
+ * It's guaranteed that there will be at least one full RCU grace period
+ * between the invocation of this function and @confirm_kill and the caller
+ * can piggy-back their RCU release on the callback.
*/
-void percpu_ref_kill(struct percpu_ref *ref)
+void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
+ percpu_ref_func_t *confirm_kill)
{
WARN_ONCE(REF_STATUS(ref->pcpu_count) == PCPU_REF_DEAD,
"percpu_ref_kill() called more than once!\n");

ref->pcpu_count = (unsigned __percpu *)
(((unsigned long) ref->pcpu_count)|PCPU_REF_DEAD);
+ ref->confirm_kill = confirm_kill;

call_rcu(&ref->rcu, percpu_ref_kill_rcu);
}

2013-06-13 23:13:23

by Kent Overstreet

Subject: Re: [PATCH 2/2] percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm()

On Wed, Jun 12, 2013 at 01:46:27PM -0700, Tejun Heo wrote:
> From de3c0749e2c1960afcc433fc5da136b85c8bd896 Mon Sep 17 00:00:00 2001
> From: Tejun Heo <[email protected]>
> Date: Wed, 12 Jun 2013 13:37:42 -0700
>
> Implement percpu_tryget() which succeeds iff the refcount hasn't been
> killed yet. Because the refcnt is per-cpu, different CPUs may have
> different perceptions on when the counter has been killed and tryget()
> may continue to succeed for a while after percpu_ref_kill() returns.
>
> For use cases where it's necessary to know when all CPUs start to see
> the refcnt as dead, percpu_ref_kill_and_confirm() is added. The new
> function takes an extra argument @confirm_kill which is invoked when
> the refcnt is guaranteed to be viewed as killed on all CPUs.
>
> While this isn't the prettiest interface, it doesn't force synchronous
> wait and is much safer than requiring the caller to do its own
> call_rcu().

Ok, slightly grumpily, but now I at least understand how it's being used.

Acked-by: Kent Overstreet <[email protected]>

>
> Signed-off-by: Tejun Heo <[email protected]>
> ---
> I'll soon post a cgroup patchset which makes use of both functions.
>
> Thanks.
>
> include/linux/percpu-refcount.h | 50 ++++++++++++++++++++++++++++++++++++++++-
> lib/percpu-refcount.c | 23 ++++++++++++++-----
> 2 files changed, 66 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
> index b61bd6f..b2dfa0f 100644
> --- a/include/linux/percpu-refcount.h
> +++ b/include/linux/percpu-refcount.h
> @@ -63,11 +63,28 @@ struct percpu_ref {
> */
> unsigned __percpu *pcpu_count;
> percpu_ref_func_t *release;
> + percpu_ref_func_t *confirm_kill;
> struct rcu_head rcu;
> };
>
> int percpu_ref_init(struct percpu_ref *ref, percpu_ref_func_t *release);
> -void percpu_ref_kill(struct percpu_ref *ref);
> +void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
> + percpu_ref_func_t *confirm_kill);
> +
> +/**
> + * percpu_ref_kill - drop the initial ref
> + * @ref: percpu_ref to kill
> + *
> + * Must be used to drop the initial ref on a percpu refcount; must be called
> + * precisely once before shutdown.
> + *
> + * Puts @ref in non percpu mode, then does a call_rcu() before gathering up the
> + * percpu counters and dropping the initial ref.
> + */
> +static inline void percpu_ref_kill(struct percpu_ref *ref)
> +{
> + return percpu_ref_kill_and_confirm(ref, NULL);
> +}
>
> #define PCPU_STATUS_BITS 2
> #define PCPU_STATUS_MASK ((1 << PCPU_STATUS_BITS) - 1)
> @@ -99,6 +116,37 @@ static inline void percpu_ref_get(struct percpu_ref *ref)
> }
>
> /**
> + * percpu_ref_tryget - try to increment a percpu refcount
> + * @ref: percpu_ref to try-get
> + *
> + * Increment a percpu refcount unless it has already been killed. Returns
> + * %true on success; %false on failure.
> + *
> + * Completion of percpu_ref_kill() in itself doesn't guarantee that tryget
> + * will fail. For such guarantee, percpu_ref_kill_and_confirm() should be
> + * used. After the confirm_kill callback is invoked, it's guaranteed that
> + * no new reference will be given out by percpu_ref_tryget().
> + */
> +static inline bool percpu_ref_tryget(struct percpu_ref *ref)
> +{
> + unsigned __percpu *pcpu_count;
> + int ret = false;
> +
> + rcu_read_lock();
> +
> + pcpu_count = ACCESS_ONCE(ref->pcpu_count);
> +
> + if (likely(REF_STATUS(pcpu_count) == PCPU_REF_PTR)) {
> + __this_cpu_inc(*pcpu_count);
> + ret = true;
> + }
> +
> + rcu_read_unlock();
> +
> + return ret;
> +}
> +
> +/**
> * percpu_ref_put - decrement a percpu refcount
> * @ref: percpu_ref to put
> *
> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
> index 9a78e55..4d9519a 100644
> --- a/lib/percpu-refcount.c
> +++ b/lib/percpu-refcount.c
> @@ -89,6 +89,10 @@ static void percpu_ref_kill_rcu(struct rcu_head *rcu)
>
> atomic_add((int) count - PCPU_COUNT_BIAS, &ref->count);
>
> + /* @ref is viewed as dead on all CPUs, send out kill confirmation */
> + if (ref->confirm_kill)
> + ref->confirm_kill(ref);
> +
> /*
> * Now we're in single atomic_t mode with a consistent refcount, so it's
> * safe to drop our initial ref:
> @@ -97,22 +101,29 @@ static void percpu_ref_kill_rcu(struct rcu_head *rcu)
> }
>
> /**
> - * percpu_ref_kill - safely drop initial ref
> + * percpu_ref_kill_and_confirm - drop the initial ref and schedule confirmation
> * @ref: percpu_ref to kill
> + * @confirm_kill: optional confirmation callback
> *
> - * Must be used to drop the initial ref on a percpu refcount; must be called
> - * precisely once before shutdown.
> + * Equivalent to percpu_ref_kill() but also schedules kill confirmation if
> + * @confirm_kill is not NULL. @confirm_kill, which may not block, will be
> + * called after @ref is seen as dead from all CPUs - all further
> + * invocations of percpu_ref_tryget() will fail. See percpu_ref_tryget()
> + * for more details.
> *
> - * Puts @ref in non percpu mode, then does a call_rcu() before gathering up the
> - * percpu counters and dropping the initial ref.
> + * It's guaranteed that there will be at least one full RCU grace period
> + * between the invocation of this function and @confirm_kill and the caller
> + * can piggy-back their RCU release on the callback.
> */
> -void percpu_ref_kill(struct percpu_ref *ref)
> +void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
> + percpu_ref_func_t *confirm_kill)
> {
> WARN_ONCE(REF_STATUS(ref->pcpu_count) == PCPU_REF_DEAD,
> "percpu_ref_kill() called more than once!\n");
>
> ref->pcpu_count = (unsigned __percpu *)
> (((unsigned long) ref->pcpu_count)|PCPU_REF_DEAD);
> + ref->confirm_kill = confirm_kill;
>
> call_rcu(&ref->rcu, percpu_ref_kill_rcu);
> }
> --
> 1.8.2.1
>

2013-06-13 23:44:47

by Kent Overstreet

Subject: Re: [PATCH 2/2] percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm()

On Wed, Jun 12, 2013 at 01:46:27PM -0700, Tejun Heo wrote:
> From de3c0749e2c1960afcc433fc5da136b85c8bd896 Mon Sep 17 00:00:00 2001
> From: Tejun Heo <[email protected]>
> Date: Wed, 12 Jun 2013 13:37:42 -0700
>
> Implement percpu_ref_tryget() which succeeds iff the refcount hasn't been
> killed yet. Because the refcnt is per-cpu, different CPUs may have
> different perceptions of when the counter has been killed, and tryget()
> may continue to succeed for a while after percpu_ref_kill() returns.
>
> For use cases where it's necessary to know when all CPUs start to see
> the refcnt as dead, percpu_ref_kill_and_confirm() is added. The new
> function takes an extra argument @confirm_kill which is invoked when
> the refcnt is guaranteed to be viewed as killed on all CPUs.
>
> While this isn't the prettiest interface, it doesn't force synchronous
> wait and is much safer than requiring the caller to do its own
> call_rcu().
>
> Signed-off-by: Tejun Heo <[email protected]>

Acked-by: Kent Overstreet <[email protected]>

2013-06-14 02:41:31

by Tejun Heo

Subject: [PATCH v3 2/2] percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm()

From dbece3a0f1ef0b19aff1cc6ed0942fec9ab98de1 Mon Sep 17 00:00:00 2001
From: Tejun Heo <[email protected]>
Date: Thu, 13 Jun 2013 19:23:53 -0700

Implement percpu_ref_tryget() which stops giving out references once the
percpu_ref is visible as killed. Because the refcnt is per-cpu,
different CPUs will start to see a refcnt as killed at different
points in time, and tryget() may continue to succeed on a subset of CPUs
for a while after percpu_ref_kill() returns.

For use cases where it's necessary to know when all CPUs start to see
the refcnt as dead, percpu_ref_kill_and_confirm() is added. The new
function takes an extra argument @confirm_kill which is invoked when
the refcnt is guaranteed to be viewed as killed on all CPUs.

While this isn't the prettiest interface, it doesn't force synchronous
wait and is much safer than requiring the caller to do its own
call_rcu().

v2: Patch description rephrased to emphasize that tryget() may
continue to succeed on some CPUs after kill() returns, as suggested
by Kent.

v3: Function comment in percpu_ref_kill_and_confirm() updated to warn
people not to depend on the implied RCU grace period from the
confirm callback, as it's an implementation detail.

Signed-off-by: Tejun Heo <[email protected]>
Slightly-Grumpily-Acked-by: Kent Overstreet <[email protected]>
---
Applied to percpu/for-3.11.

Thanks!

include/linux/percpu-refcount.h | 50 ++++++++++++++++++++++++++++++++++++++++-
lib/percpu-refcount.c | 23 ++++++++++++++-----
2 files changed, 66 insertions(+), 7 deletions(-)

diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index 6d843d6..dd2a086 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -63,13 +63,30 @@ struct percpu_ref {
*/
unsigned __percpu *pcpu_count;
percpu_ref_func_t *release;
+ percpu_ref_func_t *confirm_kill;
struct rcu_head rcu;
};

int __must_check percpu_ref_init(struct percpu_ref *ref,
percpu_ref_func_t *release);
void percpu_ref_cancel_init(struct percpu_ref *ref);
-void percpu_ref_kill(struct percpu_ref *ref);
+void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
+ percpu_ref_func_t *confirm_kill);
+
+/**
+ * percpu_ref_kill - drop the initial ref
+ * @ref: percpu_ref to kill
+ *
+ * Must be used to drop the initial ref on a percpu refcount; must be called
+ * precisely once before shutdown.
+ *
+ * Puts @ref in non percpu mode, then does a call_rcu() before gathering up the
+ * percpu counters and dropping the initial ref.
+ */
+static inline void percpu_ref_kill(struct percpu_ref *ref)
+{
+ return percpu_ref_kill_and_confirm(ref, NULL);
+}

#define PCPU_STATUS_BITS 2
#define PCPU_STATUS_MASK ((1 << PCPU_STATUS_BITS) - 1)
@@ -101,6 +118,37 @@ static inline void percpu_ref_get(struct percpu_ref *ref)
}

/**
+ * percpu_ref_tryget - try to increment a percpu refcount
+ * @ref: percpu_ref to try-get
+ *
+ * Increment a percpu refcount unless it has already been killed. Returns
+ * %true on success; %false on failure.
+ *
+ * Completion of percpu_ref_kill() in itself doesn't guarantee that tryget
+ * will fail. For such guarantee, percpu_ref_kill_and_confirm() should be
+ * used. After the confirm_kill callback is invoked, it's guaranteed that
+ * no new reference will be given out by percpu_ref_tryget().
+ */
+static inline bool percpu_ref_tryget(struct percpu_ref *ref)
+{
+ unsigned __percpu *pcpu_count;
+ int ret = false;
+
+ rcu_read_lock();
+
+ pcpu_count = ACCESS_ONCE(ref->pcpu_count);
+
+ if (likely(REF_STATUS(pcpu_count) == PCPU_REF_PTR)) {
+ __this_cpu_inc(*pcpu_count);
+ ret = true;
+ }
+
+ rcu_read_unlock();
+
+ return ret;
+}
+
+/**
* percpu_ref_put - decrement a percpu refcount
* @ref: percpu_ref to put
*
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index ebeaac2..8bf9e71 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -118,6 +118,10 @@ static void percpu_ref_kill_rcu(struct rcu_head *rcu)

atomic_add((int) count - PCPU_COUNT_BIAS, &ref->count);

+ /* @ref is viewed as dead on all CPUs, send out kill confirmation */
+ if (ref->confirm_kill)
+ ref->confirm_kill(ref);
+
/*
* Now we're in single atomic_t mode with a consistent refcount, so it's
* safe to drop our initial ref:
@@ -126,22 +130,29 @@ static void percpu_ref_kill_rcu(struct rcu_head *rcu)
}

/**
- * percpu_ref_kill - safely drop initial ref
+ * percpu_ref_kill_and_confirm - drop the initial ref and schedule confirmation
* @ref: percpu_ref to kill
+ * @confirm_kill: optional confirmation callback
*
- * Must be used to drop the initial ref on a percpu refcount; must be called
- * precisely once before shutdown.
+ * Equivalent to percpu_ref_kill() but also schedules kill confirmation if
+ * @confirm_kill is not NULL. @confirm_kill, which may not block, will be
+ * called after @ref is seen as dead from all CPUs - all further
+ * invocations of percpu_ref_tryget() will fail. See percpu_ref_tryget()
+ * for more details.
*
- * Puts @ref in non percpu mode, then does a call_rcu() before gathering up the
- * percpu counters and dropping the initial ref.
+ * Due to the way percpu_ref is implemented, @confirm_kill will be called
+ * after at least one full RCU grace period has passed but this is an
+ * implementation detail and callers must not depend on it.
*/
-void percpu_ref_kill(struct percpu_ref *ref)
+void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
+ percpu_ref_func_t *confirm_kill)
{
WARN_ONCE(REF_STATUS(ref->pcpu_count) == PCPU_REF_DEAD,
"percpu_ref_kill() called more than once!\n");

ref->pcpu_count = (unsigned __percpu *)
(((unsigned long) ref->pcpu_count)|PCPU_REF_DEAD);
+ ref->confirm_kill = confirm_kill;

call_rcu(&ref->rcu, percpu_ref_kill_rcu);
}
--
1.8.2.1