Hello,
Currently, percpu_ref destruction - freeing the percpu counter -
happens when the percpu_ref is released. While this is a bit more
convenient, it restricts how percpu_ref can be used. It can't be
initialized with a static percpu area and it can't be reinitialized
without going through percpu allocation again.
There are use cases which can take advantage of cycling a percpu_ref
through init/release multiple times. This patchset separates out
percpu_ref destruction into percpu_ref_exit(), which must now be
invoked explicitly, and introduces percpu_ref_reinit() which can be
used to recycle a released percpu_ref.
This patchset doesn't add any users. A patchset making use of the new
interface will be posted soon.
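To give a rough sketch of the intended usage (struct my_gate and its
helpers below are hypothetical, made up purely for illustration; only
the percpu_ref_*() calls reflect the interface after this patchset):

#include <linux/kernel.h>
#include <linux/completion.h>
#include <linux/percpu-refcount.h>

struct my_gate {
	struct percpu_ref	ref;
	struct completion	drained;
};

/* called once the last reference is dropped after percpu_ref_kill() */
static void my_gate_release(struct percpu_ref *ref)
{
	struct my_gate *gate = container_of(ref, struct my_gate, ref);

	complete(&gate->drained);
}

static int my_gate_init(struct my_gate *gate)
{
	init_completion(&gate->drained);
	return percpu_ref_init(&gate->ref, my_gate_release);
}

/* drain and then re-open without reallocating the percpu counter */
static void my_gate_cycle(struct my_gate *gate)
{
	percpu_ref_kill(&gate->ref);
	wait_for_completion(&gate->drained);
	reinit_completion(&gate->drained);
	percpu_ref_reinit(&gate->ref);
}

/* final teardown - the percpu counter is freed only here */
static void my_gate_destroy(struct my_gate *gate)
{
	percpu_ref_kill(&gate->ref);
	wait_for_completion(&gate->drained);
	percpu_ref_exit(&gate->ref);
}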
This patchset contains the following six patches.
0001-percpu-refcount-aio-use-percpu_ref_cancel_init-in-io.patch
0002-percpu-refcount-one-bit-is-enough-for-REF_STATUS.patch
0003-percpu-refcount-add-helpers-for-percpu_count-accesse.patch
0004-percpu-refcount-use-unsigned-long-for-pcpu_count-poi.patch
0005-percpu-refcount-require-percpu_ref-to-be-exited-expl.patch
0006-percpu-refcount-implement-percpu_ref_reinit-and-perc.patch
0001-0004 are prep/cleanups.
0005 separates out percpu destruction into percpu_ref_exit() and 0006
implements percpu_ref_reinit().
This patchset is on top of percpu/for-3.17 6fbc07bbe2b5 ("percpu:
invoke __verify_pcpu_ptr() from the generic part of accessors and
operations").
diffstat follows.
drivers/target/target_core_tpg.c | 4 +
fs/aio.c | 6 +-
include/linux/percpu-refcount.h | 64 +++++++++++++++++++----------
kernel/cgroup.c | 8 ++-
lib/percpu-refcount.c | 86 ++++++++++++++++++++++++---------------
5 files changed, 109 insertions(+), 59 deletions(-)
Thanks.
--
tejun
Now that explicit invocation of percpu_ref_exit() is necessary to free
the percpu counter, we can implement percpu_ref_reinit() which
reinitializes a released percpu_ref. This can be used to implement a
scalable gating switch which can be drained and then re-opened without
worrying about memory allocation failures.
percpu_ref_is_zero() is added and used in a sanity check in
percpu_ref_reinit(). As this function will be useful for other
purposes too, make it a public interface.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Christoph Lameter <[email protected]>
---
include/linux/percpu-refcount.h | 21 ++++++++++++++++++++-
lib/percpu-refcount.c | 35 +++++++++++++++++++++++++++++++++++
2 files changed, 55 insertions(+), 1 deletion(-)
diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index 0ddd283..80f21ea 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -67,6 +67,7 @@ struct percpu_ref {
int __must_check percpu_ref_init(struct percpu_ref *ref,
percpu_ref_func_t *release);
+void percpu_ref_reinit(struct percpu_ref *ref);
void percpu_ref_exit(struct percpu_ref *ref);
void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
percpu_ref_func_t *confirm_kill);
@@ -97,7 +98,10 @@ static inline void percpu_ref_kill(struct percpu_ref *ref)
static inline bool __pcpu_ref_alive(struct percpu_ref *ref,
unsigned __percpu **pcpu_countp)
{
- unsigned long pcpu_ptr = ACCESS_ONCE(ref->pcpu_count_ptr);
+ unsigned long pcpu_ptr;
+
+ /* paired with smp_store_release() in percpu_ref_reinit() */
+ pcpu_ptr = smp_load_acquire(&ref->pcpu_count_ptr);
if (unlikely(pcpu_ptr & PCPU_REF_DEAD))
return false;
@@ -206,4 +210,19 @@ static inline void percpu_ref_put(struct percpu_ref *ref)
rcu_read_unlock_sched();
}
+/**
+ * percpu_ref_is_zero - test whether a percpu refcount reached zero
+ * @ref: percpu_ref to test
+ *
+ * Returns %true if @ref reached zero.
+ */
+static inline bool percpu_ref_is_zero(struct percpu_ref *ref)
+{
+ unsigned __percpu *pcpu_count;
+
+ if (__pcpu_ref_alive(ref, &pcpu_count))
+ return false;
+ return !atomic_read(&ref->count);
+}
+
#endif
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index ac42991..b348a2f 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -61,6 +61,41 @@ int percpu_ref_init(struct percpu_ref *ref, percpu_ref_func_t *release)
EXPORT_SYMBOL_GPL(percpu_ref_init);
/**
+ * percpu_ref_reinit - re-initialize a percpu refcount
+ * @ref: percpu_ref to re-initialize
+ *
+ * Re-initialize @ref so that it's in the same state as when it finished
+ * percpu_ref_init(). @ref must have been initialized successfully, killed
+ * and reached 0 but not exited.
+ *
+ * Note that percpu_ref_tryget[_live]() are safe to perform on @ref while
+ * this function is in progress.
+ */
+void percpu_ref_reinit(struct percpu_ref *ref)
+{
+ unsigned __percpu *pcpu_count = pcpu_count_ptr(ref);
+ int cpu;
+
+ BUG_ON(!pcpu_count);
+ WARN_ON(!percpu_ref_is_zero(ref));
+
+ atomic_set(&ref->count, 1 + PCPU_COUNT_BIAS);
+
+ /*
+ * Restore per-cpu operation. smp_store_release() is paired with
+ * smp_load_acquire() in __pcpu_ref_alive() and guarantees that the
+ * zeroing is visible to all percpu accesses which can see the
+ * following PCPU_REF_DEAD clearing.
+ */
+ for_each_possible_cpu(cpu)
+ *per_cpu_ptr(pcpu_count, cpu) = 0;
+
+ smp_store_release(&ref->pcpu_count_ptr,
+ ref->pcpu_count_ptr & ~PCPU_REF_DEAD);
+}
+EXPORT_SYMBOL_GPL(percpu_ref_reinit);
+
+/**
* percpu_ref_exit - undo percpu_ref_init()
* @ref: percpu_ref to exit
*
--
1.9.3
percpu_ref->pcpu_count is a percpu pointer with a status flag in its
lowest bit. As such, it constantly goes through arithmetic
operations, which are cumbersome to do on a pointer: it has to be
cast to unsigned long first and then back.
Let's just make the field unsigned long so that we can skip the
casts. While at it, rename it to pcpu_count_ptr to clarify that it's
a pointer value.
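For reference, the difference boils down to the following
(illustrative fragment only, not part of the patch):

	unsigned __percpu *pcpu_count;

	/* before: pointer field - flag manipulation needs casts both ways */
	ref->pcpu_count = (unsigned __percpu *)
		(((unsigned long)ref->pcpu_count) | PCPU_REF_DEAD);

	/* after: unsigned long field - plain bit ops, cast only on actual use */
	ref->pcpu_count_ptr |= PCPU_REF_DEAD;
	pcpu_count = (unsigned __percpu *)(ref->pcpu_count_ptr & ~PCPU_REF_DEAD);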
Signed-off-by: Tejun Heo <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Christoph Lameter <[email protected]>
---
include/linux/percpu-refcount.h | 4 ++--
lib/percpu-refcount.c | 11 +++++------
2 files changed, 7 insertions(+), 8 deletions(-)
diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index b62a4ee..6f8cd4c 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -61,7 +61,7 @@ struct percpu_ref {
* hack because we need to keep the pointer around for
* percpu_ref_kill_rcu())
*/
- unsigned __percpu *pcpu_count;
+ unsigned long pcpu_count_ptr;
percpu_ref_func_t *release;
percpu_ref_func_t *confirm_kill;
struct rcu_head rcu;
@@ -99,7 +99,7 @@ static inline void percpu_ref_kill(struct percpu_ref *ref)
static inline bool __pcpu_ref_alive(struct percpu_ref *ref,
unsigned __percpu **pcpu_countp)
{
- unsigned long pcpu_ptr = (unsigned long)ACCESS_ONCE(ref->pcpu_count);
+ unsigned long pcpu_ptr = ACCESS_ONCE(ref->pcpu_count_ptr);
if (unlikely(pcpu_ptr & PCPU_REF_DEAD))
return false;
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index 087f1a0..94e5b62 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -33,7 +33,7 @@
static unsigned __percpu *pcpu_count_ptr(struct percpu_ref *ref)
{
- return (unsigned __percpu *)((unsigned long)ref->pcpu_count & ~PCPU_REF_DEAD);
+ return (unsigned __percpu *)(ref->pcpu_count_ptr & ~PCPU_REF_DEAD);
}
/**
@@ -51,8 +51,8 @@ int percpu_ref_init(struct percpu_ref *ref, percpu_ref_func_t *release)
{
atomic_set(&ref->count, 1 + PCPU_COUNT_BIAS);
- ref->pcpu_count = alloc_percpu(unsigned);
- if (!ref->pcpu_count)
+ ref->pcpu_count_ptr = (unsigned long)alloc_percpu(unsigned);
+ if (!ref->pcpu_count_ptr)
return -ENOMEM;
ref->release = release;
@@ -153,11 +153,10 @@ static void percpu_ref_kill_rcu(struct rcu_head *rcu)
void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
percpu_ref_func_t *confirm_kill)
{
- WARN_ONCE((unsigned long)ref->pcpu_count & PCPU_REF_DEAD,
+ WARN_ONCE(ref->pcpu_count_ptr & PCPU_REF_DEAD,
"percpu_ref_kill() called more than once!\n");
- ref->pcpu_count = (unsigned __percpu *)
- (((unsigned long) ref->pcpu_count)|PCPU_REF_DEAD);
+ ref->pcpu_count_ptr |= PCPU_REF_DEAD;
ref->confirm_kill = confirm_kill;
call_rcu_sched(&ref->rcu, percpu_ref_kill_rcu);
--
1.9.3
Currently, a percpu_ref undoes percpu_ref_init() automatically by
freeing the allocated percpu area when the percpu_ref is killed.
While seemingly convenient, this has the following niggles.
* It's impossible to re-init a released reference counter without
going through re-allocation.
* In a similar vein, it's impossible to initialize a percpu_ref
count with static percpu variables.
* We need and have an explicit destructor anyway for failure paths -
percpu_ref_cancel_init().
This patch removes the automatic percpu counter freeing in
percpu_ref_kill_rcu() and repurposes percpu_ref_cancel_init() into a
generic destructor now named percpu_ref_exit(). percpu_ref_destroy()
was considered but it would be confusing alongside percpu_ref_kill(),
while "exit" clearly indicates that it's the counterpart of
percpu_ref_init().
All percpu_ref_cancel_init() users are updated to invoke
percpu_ref_exit() instead and explicit percpu_ref_exit() calls are
added to the destruction path of all percpu_ref users.
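As a sketch of the resulting convention (struct my_obj and its helpers
are hypothetical and only for illustration, not one of the converted
users):

#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/percpu-refcount.h>

struct my_obj {
	struct percpu_ref	ref;
	/* ... */
};

static int my_obj_register(struct my_obj *obj);	/* hypothetical later init step */

static void my_obj_release(struct percpu_ref *ref)
{
	struct my_obj *obj = container_of(ref, struct my_obj, ref);

	percpu_ref_exit(&obj->ref);	/* now explicit - nothing is freed for us */
	kfree(obj);
}

static struct my_obj *my_obj_create(void)
{
	struct my_obj *obj = kzalloc(sizeof(*obj), GFP_KERNEL);

	if (!obj)
		return NULL;
	if (percpu_ref_init(&obj->ref, my_obj_release))
		goto err_free;
	if (my_obj_register(obj))
		goto err_ref;
	return obj;

err_ref:
	percpu_ref_exit(&obj->ref);	/* replaces percpu_ref_cancel_init() */
err_free:
	kfree(obj);
	return NULL;
}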
Signed-off-by: Tejun Heo <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Benjamin LaHaise <[email protected]>
Cc: Nicholas A. Bellinger <[email protected]>
Cc: Li Zefan <[email protected]>
---
drivers/target/target_core_tpg.c | 4 +++-
fs/aio.c | 6 ++++--
include/linux/percpu-refcount.h | 6 ++----
kernel/cgroup.c | 8 +++++---
lib/percpu-refcount.c | 33 ++++++++++-----------------------
5 files changed, 24 insertions(+), 33 deletions(-)
diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c
index c036595..fddfae6 100644
--- a/drivers/target/target_core_tpg.c
+++ b/drivers/target/target_core_tpg.c
@@ -825,7 +825,7 @@ int core_tpg_add_lun(
ret = core_dev_export(dev, tpg, lun);
if (ret < 0) {
- percpu_ref_cancel_init(&lun->lun_ref);
+ percpu_ref_exit(&lun->lun_ref);
return ret;
}
@@ -880,5 +880,7 @@ int core_tpg_post_dellun(
lun->lun_status = TRANSPORT_LUN_STATUS_FREE;
spin_unlock(&tpg->tpg_lun_lock);
+ percpu_ref_exit(&lun->lun_ref);
+
return 0;
}
diff --git a/fs/aio.c b/fs/aio.c
index 5e0d7f9..ea1bc2e 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -506,6 +506,8 @@ static void free_ioctx(struct work_struct *work)
aio_free_ring(ctx);
free_percpu(ctx->cpu);
+ percpu_ref_exit(&ctx->reqs);
+ percpu_ref_exit(&ctx->users);
kmem_cache_free(kioctx_cachep, ctx);
}
@@ -715,8 +717,8 @@ err_ctx:
err:
mutex_unlock(&ctx->ring_lock);
free_percpu(ctx->cpu);
- percpu_ref_cancel_init(&ctx->reqs);
- percpu_ref_cancel_init(&ctx->users);
+ percpu_ref_exit(&ctx->reqs);
+ percpu_ref_exit(&ctx->users);
kmem_cache_free(kioctx_cachep, ctx);
pr_debug("error allocating ioctx %d\n", err);
return ERR_PTR(err);
diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index 6f8cd4c..0ddd283 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -57,9 +57,7 @@ struct percpu_ref {
atomic_t count;
/*
* The low bit of the pointer indicates whether the ref is in percpu
- * mode; if set, then get/put will manipulate the atomic_t (this is a
- * hack because we need to keep the pointer around for
- * percpu_ref_kill_rcu())
+ * mode; if set, then get/put will manipulate the atomic_t.
*/
unsigned long pcpu_count_ptr;
percpu_ref_func_t *release;
@@ -69,7 +67,7 @@ struct percpu_ref {
int __must_check percpu_ref_init(struct percpu_ref *ref,
percpu_ref_func_t *release);
-void percpu_ref_cancel_init(struct percpu_ref *ref);
+void percpu_ref_exit(struct percpu_ref *ref);
void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
percpu_ref_func_t *confirm_kill);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 7868fc3..c06aa5e 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1638,7 +1638,7 @@ destroy_root:
exit_root_id:
cgroup_exit_root_id(root);
cancel_ref:
- percpu_ref_cancel_init(&root_cgrp->self.refcnt);
+ percpu_ref_exit(&root_cgrp->self.refcnt);
out:
free_cgrp_cset_links(&tmp_links);
return ret;
@@ -4133,6 +4133,8 @@ static void css_free_work_fn(struct work_struct *work)
container_of(work, struct cgroup_subsys_state, destroy_work);
struct cgroup *cgrp = css->cgroup;
+ percpu_ref_exit(&css->refcnt);
+
if (css->ss) {
/* css free path */
if (css->parent)
@@ -4330,7 +4332,7 @@ err_list_del:
err_free_id:
cgroup_idr_remove(&ss->css_idr, css->id);
err_free_percpu_ref:
- percpu_ref_cancel_init(&css->refcnt);
+ percpu_ref_exit(&css->refcnt);
err_free_css:
call_rcu(&css->rcu_head, css_free_rcu_fn);
return err;
@@ -4441,7 +4443,7 @@ static int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
out_free_id:
cgroup_idr_remove(&root->cgroup_idr, cgrp->id);
out_cancel_ref:
- percpu_ref_cancel_init(&cgrp->self.refcnt);
+ percpu_ref_exit(&cgrp->self.refcnt);
out_free_cgrp:
kfree(cgrp);
out_unlock:
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index 94e5b62..ac42991 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -61,36 +61,25 @@ int percpu_ref_init(struct percpu_ref *ref, percpu_ref_func_t *release)
EXPORT_SYMBOL_GPL(percpu_ref_init);
/**
- * percpu_ref_cancel_init - cancel percpu_ref_init()
- * @ref: percpu_ref to cancel init for
+ * percpu_ref_exit - undo percpu_ref_init()
+ * @ref: percpu_ref to exit
*
- * Once a percpu_ref is initialized, its destruction is initiated by
- * percpu_ref_kill() and completes asynchronously, which can be painful to
- * do when destroying a half-constructed object in init failure path.
- *
- * This function destroys @ref without invoking @ref->release and the
- * memory area containing it can be freed immediately on return. To
- * prevent accidental misuse, it's required that @ref has finished
- * percpu_ref_init(), whether successful or not, but never used.
- *
- * The weird name and usage restriction are to prevent people from using
- * this function by mistake for normal shutdown instead of
- * percpu_ref_kill().
+ * This function exits @ref. The caller is responsible for ensuring that
+ * @ref is no longer in active use. The usual places to invoke this
+ * function from are the @ref->release() callback or in init failure path
+ * where percpu_ref_init() succeeded but other parts of the initialization
+ * of the embedding object failed.
*/
-void percpu_ref_cancel_init(struct percpu_ref *ref)
+void percpu_ref_exit(struct percpu_ref *ref)
{
unsigned __percpu *pcpu_count = pcpu_count_ptr(ref);
- int cpu;
-
- WARN_ON_ONCE(atomic_read(&ref->count) != 1 + PCPU_COUNT_BIAS);
if (pcpu_count) {
- for_each_possible_cpu(cpu)
- WARN_ON_ONCE(*per_cpu_ptr(pcpu_count, cpu));
free_percpu(pcpu_count);
+ ref->pcpu_count_ptr = PCPU_REF_DEAD;
}
}
-EXPORT_SYMBOL_GPL(percpu_ref_cancel_init);
+EXPORT_SYMBOL_GPL(percpu_ref_exit);
static void percpu_ref_kill_rcu(struct rcu_head *rcu)
{
@@ -102,8 +91,6 @@ static void percpu_ref_kill_rcu(struct rcu_head *rcu)
for_each_possible_cpu(cpu)
count += *per_cpu_ptr(pcpu_count, cpu);
- free_percpu(pcpu_count);
-
pr_debug("global %i pcpu %i", atomic_read(&ref->count), (int) count);
/*
--
1.9.3
percpu-refcount currently reserves the two lowest bits of its percpu
pointer to indicate its state; however, only one bit is used for
PCPU_REF_DEAD.
Simplify it by removing PCPU_STATUS_BITS/MASK and testing
PCPU_REF_DEAD directly. This also allows the compiler to choose a
more efficient instruction depending on the architecture.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Christoph Lameter <[email protected]>
---
include/linux/percpu-refcount.h | 4 +---
lib/percpu-refcount.c | 2 +-
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index 5d8920e..bfdeb0d4 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -88,12 +88,10 @@ static inline void percpu_ref_kill(struct percpu_ref *ref)
return percpu_ref_kill_and_confirm(ref, NULL);
}
-#define PCPU_STATUS_BITS 2
-#define PCPU_STATUS_MASK ((1 << PCPU_STATUS_BITS) - 1)
#define PCPU_REF_PTR 0
#define PCPU_REF_DEAD 1
-#define REF_STATUS(count) (((unsigned long) count) & PCPU_STATUS_MASK)
+#define REF_STATUS(count) (((unsigned long) count) & PCPU_REF_DEAD)
/**
* percpu_ref_get - increment a percpu refcount
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index 963b703..17bce2b 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -96,7 +96,7 @@ static void percpu_ref_kill_rcu(struct rcu_head *rcu)
/* Mask out PCPU_REF_DEAD */
pcpu_count = (unsigned __percpu *)
- (((unsigned long) pcpu_count) & ~PCPU_STATUS_MASK);
+ (((unsigned long) pcpu_count) & ~PCPU_REF_DEAD);
for_each_possible_cpu(cpu)
count += *per_cpu_ptr(pcpu_count, cpu);
--
1.9.3
* All four percpu_ref_*() operations implemented in the header file
perform the same check to determine whether the percpu_ref is
alive and to extract the percpu pointer. Factor out the common logic
into __pcpu_ref_alive(). This doesn't change the generated code.
* There are a couple of places in percpu-refcount.c which mask out
PCPU_REF_DEAD to obtain the percpu pointer. Factor that out into
pcpu_count_ptr().
* The above changes make the WARN_ON_ONCE() conditional at the top of
percpu_ref_kill_and_confirm() the only user of REF_STATUS(). Test
PCPU_REF_DEAD directly and remove REF_STATUS().
This patch doesn't introduce any functional change.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Christoph Lameter <[email protected]>
---
include/linux/percpu-refcount.h | 35 +++++++++++++++++++++--------------
lib/percpu-refcount.c | 17 +++++++++--------
2 files changed, 30 insertions(+), 22 deletions(-)
diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index bfdeb0d4..b62a4ee 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -88,10 +88,25 @@ static inline void percpu_ref_kill(struct percpu_ref *ref)
return percpu_ref_kill_and_confirm(ref, NULL);
}
-#define PCPU_REF_PTR 0
#define PCPU_REF_DEAD 1
-#define REF_STATUS(count) (((unsigned long) count) & PCPU_REF_DEAD)
+/*
+ * Internal helper. Don't use outside percpu-refcount proper. The
+ * function doesn't return the pointer and let the caller test it for NULL
+ * because doing so forces the compiler to generate two conditional
+ * branches as it can't assume that @ref->pcpu_count is not NULL.
+ */
+static inline bool __pcpu_ref_alive(struct percpu_ref *ref,
+ unsigned __percpu **pcpu_countp)
+{
+ unsigned long pcpu_ptr = (unsigned long)ACCESS_ONCE(ref->pcpu_count);
+
+ if (unlikely(pcpu_ptr & PCPU_REF_DEAD))
+ return false;
+
+ *pcpu_countp = (unsigned __percpu *)pcpu_ptr;
+ return true;
+}
/**
* percpu_ref_get - increment a percpu refcount
@@ -105,9 +120,7 @@ static inline void percpu_ref_get(struct percpu_ref *ref)
rcu_read_lock_sched();
- pcpu_count = ACCESS_ONCE(ref->pcpu_count);
-
- if (likely(REF_STATUS(pcpu_count) == PCPU_REF_PTR))
+ if (__pcpu_ref_alive(ref, &pcpu_count))
this_cpu_inc(*pcpu_count);
else
atomic_inc(&ref->count);
@@ -131,9 +144,7 @@ static inline bool percpu_ref_tryget(struct percpu_ref *ref)
rcu_read_lock_sched();
- pcpu_count = ACCESS_ONCE(ref->pcpu_count);
-
- if (likely(REF_STATUS(pcpu_count) == PCPU_REF_PTR)) {
+ if (__pcpu_ref_alive(ref, &pcpu_count)) {
this_cpu_inc(*pcpu_count);
ret = true;
} else {
@@ -166,9 +177,7 @@ static inline bool percpu_ref_tryget_live(struct percpu_ref *ref)
rcu_read_lock_sched();
- pcpu_count = ACCESS_ONCE(ref->pcpu_count);
-
- if (likely(REF_STATUS(pcpu_count) == PCPU_REF_PTR)) {
+ if (__pcpu_ref_alive(ref, &pcpu_count)) {
this_cpu_inc(*pcpu_count);
ret = true;
}
@@ -191,9 +200,7 @@ static inline void percpu_ref_put(struct percpu_ref *ref)
rcu_read_lock_sched();
- pcpu_count = ACCESS_ONCE(ref->pcpu_count);
-
- if (likely(REF_STATUS(pcpu_count) == PCPU_REF_PTR))
+ if (__pcpu_ref_alive(ref, &pcpu_count))
this_cpu_dec(*pcpu_count);
else if (unlikely(atomic_dec_and_test(&ref->count)))
ref->release(ref);
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index 17bce2b..087f1a0 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -31,6 +31,11 @@
#define PCPU_COUNT_BIAS (1U << 31)
+static unsigned __percpu *pcpu_count_ptr(struct percpu_ref *ref)
+{
+ return (unsigned __percpu *)((unsigned long)ref->pcpu_count & ~PCPU_REF_DEAD);
+}
+
/**
* percpu_ref_init - initialize a percpu refcount
* @ref: percpu_ref to initialize
@@ -74,7 +79,7 @@ EXPORT_SYMBOL_GPL(percpu_ref_init);
*/
void percpu_ref_cancel_init(struct percpu_ref *ref)
{
- unsigned __percpu *pcpu_count = ref->pcpu_count;
+ unsigned __percpu *pcpu_count = pcpu_count_ptr(ref);
int cpu;
WARN_ON_ONCE(atomic_read(&ref->count) != 1 + PCPU_COUNT_BIAS);
@@ -82,7 +87,7 @@ void percpu_ref_cancel_init(struct percpu_ref *ref)
if (pcpu_count) {
for_each_possible_cpu(cpu)
WARN_ON_ONCE(*per_cpu_ptr(pcpu_count, cpu));
- free_percpu(ref->pcpu_count);
+ free_percpu(pcpu_count);
}
}
EXPORT_SYMBOL_GPL(percpu_ref_cancel_init);
@@ -90,14 +95,10 @@ EXPORT_SYMBOL_GPL(percpu_ref_cancel_init);
static void percpu_ref_kill_rcu(struct rcu_head *rcu)
{
struct percpu_ref *ref = container_of(rcu, struct percpu_ref, rcu);
- unsigned __percpu *pcpu_count = ref->pcpu_count;
+ unsigned __percpu *pcpu_count = pcpu_count_ptr(ref);
unsigned count = 0;
int cpu;
- /* Mask out PCPU_REF_DEAD */
- pcpu_count = (unsigned __percpu *)
- (((unsigned long) pcpu_count) & ~PCPU_REF_DEAD);
-
for_each_possible_cpu(cpu)
count += *per_cpu_ptr(pcpu_count, cpu);
@@ -152,7 +153,7 @@ static void percpu_ref_kill_rcu(struct rcu_head *rcu)
void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
percpu_ref_func_t *confirm_kill)
{
- WARN_ONCE(REF_STATUS(ref->pcpu_count) == PCPU_REF_DEAD,
+ WARN_ONCE((unsigned long)ref->pcpu_count & PCPU_REF_DEAD,
"percpu_ref_kill() called more than once!\n");
ref->pcpu_count = (unsigned __percpu *)
--
1.9.3
ioctx_alloc() reaches inside percpu_ref and directly frees
->pcpu_count in its failure path, which is quite gross. percpu_ref
has been providing a proper interface to do this,
percpu_ref_cancel_init(), for quite some time now. Let's use that
instead.
This patch doesn't introduce any behavior changes.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Benjamin LaHaise <[email protected]>
Cc: Kent Overstreet <[email protected]>
---
fs/aio.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 4f078c0..5e0d7f9 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -715,8 +715,8 @@ err_ctx:
err:
mutex_unlock(&ctx->ring_lock);
free_percpu(ctx->cpu);
- free_percpu(ctx->reqs.pcpu_count);
- free_percpu(ctx->users.pcpu_count);
+ percpu_ref_cancel_init(&ctx->reqs);
+ percpu_ref_cancel_init(&ctx->users);
kmem_cache_free(kioctx_cachep, ctx);
pr_debug("error allocating ioctx %d\n", err);
return ERR_PTR(err);
--
1.9.3
On 06/18/2014 09:08 AM, Tejun Heo wrote:
> percpu-refcount currently reserves two lowest bits of its percpu
> pointer to indicate its state; however, only one bit is used for
> PCPU_REF_DEAD.
>
> Simplify it by removing PCPU_STATUS_BITS/MASK and testing
> PCPU_REF_DEAD directly. This also allows the compiler to choose a
> more efficient instruction depending on the architecture.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Kent Overstreet <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> ---
> include/linux/percpu-refcount.h | 4 +---
> lib/percpu-refcount.c | 2 +-
> 2 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
> index 5d8920e..bfdeb0d4 100644
> --- a/include/linux/percpu-refcount.h
> +++ b/include/linux/percpu-refcount.h
> @@ -88,12 +88,10 @@ static inline void percpu_ref_kill(struct percpu_ref *ref)
> return percpu_ref_kill_and_confirm(ref, NULL);
> }
>
> -#define PCPU_STATUS_BITS 2
Just using "#define PCPU_STATUS_BITS 1" can achieve exactly the same
result as you need. And it is better for future extensions for the status.
Anyway,
Reviewed-by: Lai Jiangshan <[email protected]>
> -#define PCPU_STATUS_MASK ((1 << PCPU_STATUS_BITS) - 1)
> #define PCPU_REF_PTR 0
> #define PCPU_REF_DEAD 1
>
> -#define REF_STATUS(count) (((unsigned long) count) & PCPU_STATUS_MASK)
> +#define REF_STATUS(count) (((unsigned long) count) & PCPU_REF_DEAD)
>
> /**
> * percpu_ref_get - increment a percpu refcount
> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
> index 963b703..17bce2b 100644
> --- a/lib/percpu-refcount.c
> +++ b/lib/percpu-refcount.c
> @@ -96,7 +96,7 @@ static void percpu_ref_kill_rcu(struct rcu_head *rcu)
>
> /* Mask out PCPU_REF_DEAD */
> pcpu_count = (unsigned __percpu *)
> - (((unsigned long) pcpu_count) & ~PCPU_STATUS_MASK);
> + (((unsigned long) pcpu_count) & ~PCPU_REF_DEAD);
>
> for_each_possible_cpu(cpu)
> count += *per_cpu_ptr(pcpu_count, cpu);
On 06/18/2014 09:08 AM, Tejun Heo wrote:
> Now that explicit invocation of percpu_ref_exit() is necessary to free
> the percpu counter, we can implement percpu_ref_reinit() which
> reinitializes a released percpu_ref. This can be used implement
> scalable gating switch which can be drained and then re-opened without
> worrying about memory allocation failures.
>
> percpu_ref_is_zero() is added to be used in a sanity check in
> percpu_ref_exit(). As this function will be useful for other purposes
> too, make it a public interface.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Kent Overstreet <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> ---
> include/linux/percpu-refcount.h | 21 ++++++++++++++++++++-
> lib/percpu-refcount.c | 35 +++++++++++++++++++++++++++++++++++
> 2 files changed, 55 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
> index 0ddd283..80f21ea 100644
> --- a/include/linux/percpu-refcount.h
> +++ b/include/linux/percpu-refcount.h
> @@ -67,6 +67,7 @@ struct percpu_ref {
>
> int __must_check percpu_ref_init(struct percpu_ref *ref,
> percpu_ref_func_t *release);
> +void percpu_ref_reinit(struct percpu_ref *ref);
> void percpu_ref_exit(struct percpu_ref *ref);
> void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
> percpu_ref_func_t *confirm_kill);
> @@ -97,7 +98,10 @@ static inline void percpu_ref_kill(struct percpu_ref *ref)
> static inline bool __pcpu_ref_alive(struct percpu_ref *ref,
> unsigned __percpu **pcpu_countp)
> {
> - unsigned long pcpu_ptr = ACCESS_ONCE(ref->pcpu_count_ptr);
> + unsigned long pcpu_ptr;
> +
> + /* paired with smp_store_release() in percpu_ref_reinit() */
> + pcpu_ptr = smp_load_acquire(&ref->pcpu_count_ptr);
Does "smp_load_acquire()" hurts the performance of percpu_ref_get/put()
in non-x86 system?
>
> if (unlikely(pcpu_ptr & PCPU_REF_DEAD))
> return false;
> @@ -206,4 +210,19 @@ static inline void percpu_ref_put(struct percpu_ref *ref)
> rcu_read_unlock_sched();
> }
>
> +/**
> + * percpu_ref_is_zero - test whether a percpu refcount reached zero
> + * @ref: percpu_ref to test
> + *
> + * Returns %true if @ref reached zero.
> + */
> +static inline bool percpu_ref_is_zero(struct percpu_ref *ref)
> +{
> + unsigned __percpu *pcpu_count;
> +
> + if (__pcpu_ref_alive(ref, &pcpu_count))
> + return false;
> + return !atomic_read(&ref->count);
> +}
> +
> #endif
> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
> index ac42991..b348a2f 100644
> --- a/lib/percpu-refcount.c
> +++ b/lib/percpu-refcount.c
> @@ -61,6 +61,41 @@ int percpu_ref_init(struct percpu_ref *ref, percpu_ref_func_t *release)
> EXPORT_SYMBOL_GPL(percpu_ref_init);
>
> /**
> + * percpu_ref_reinit - re-initialize a percpu refcount
> + * @ref: percpu_ref to re-initialize
> + *
> + * Re-initialize @ref so that it's in the same state as when it finished
> + * percpu_ref_init(). @ref must have been initialized successfully, killed
> + * and reached 0 but not exited.
> + *
> + * Note that percpu_ref_tryget[_live]() are safe to perform on @ref while
> + * this function is in progress.
> + */
> +void percpu_ref_reinit(struct percpu_ref *ref)
> +{
> + unsigned __percpu *pcpu_count = pcpu_count_ptr(ref);
> + int cpu;
> +
> + BUG_ON(!pcpu_count);
> + WARN_ON(!percpu_ref_is_zero(ref));
> +
> + atomic_set(&ref->count, 1 + PCPU_COUNT_BIAS);
> +
> + /*
> + * Restore per-cpu operation. smp_store_release() is paired with
> + * smp_load_acquire() in __pcpu_ref_alive() and guarantees that the
> + * zeroing is visible to all percpu accesses which can see the
> + * following PCPU_REF_DEAD clearing.
> + */
> + for_each_possible_cpu(cpu)
> + *per_cpu_ptr(pcpu_count, cpu) = 0;
> +
> + smp_store_release(&ref->pcpu_count_ptr,
> + ref->pcpu_count_ptr & ~PCPU_REF_DEAD);
> +}
> +EXPORT_SYMBOL_GPL(percpu_ref_reinit);
> +
> +/**
> * percpu_ref_exit - undo percpu_ref_init()
> * @ref: percpu_ref to exit
> *
On Wed, Jun 18, 2014 at 11:37:35AM +0800, Lai Jiangshan wrote:
> > @@ -97,7 +98,10 @@ static inline void percpu_ref_kill(struct percpu_ref *ref)
> > static inline bool __pcpu_ref_alive(struct percpu_ref *ref,
> > unsigned __percpu **pcpu_countp)
> > {
> > - unsigned long pcpu_ptr = ACCESS_ONCE(ref->pcpu_count_ptr);
> > + unsigned long pcpu_ptr;
> > +
> > + /* paired with smp_store_release() in percpu_ref_reinit() */
> > + pcpu_ptr = smp_load_acquire(&ref->pcpu_count_ptr);
>
>
> Does "smp_load_acquire()" hurts the performance of percpu_ref_get/put()
> in non-x86 system?
It's equivalent to a data dependency barrier. The only arch which needs
something more than barrier() is alpha. It isn't an issue.
Thanks.
--
tejun
On 06/18/2014 11:32 PM, Tejun Heo wrote:
> On Wed, Jun 18, 2014 at 11:37:35AM +0800, Lai Jiangshan wrote:
>>> @@ -97,7 +98,10 @@ static inline void percpu_ref_kill(struct percpu_ref *ref)
>>> static inline bool __pcpu_ref_alive(struct percpu_ref *ref,
>>> unsigned __percpu **pcpu_countp)
>>> {
>>> - unsigned long pcpu_ptr = ACCESS_ONCE(ref->pcpu_count_ptr);
>>> + unsigned long pcpu_ptr;
>>> +
>>> + /* paired with smp_store_release() in percpu_ref_reinit() */
>>> + pcpu_ptr = smp_load_acquire(&ref->pcpu_count_ptr);
>>
>>
>> Does "smp_load_acquire()" hurts the performance of percpu_ref_get/put()
>> in non-x86 system?
>
> It's equivalent to data dependency barrier. The only arch which needs
> something more than barrier() is alpha. It isn't an issue.
>
But I searched the source: smp_load_acquire() is just barrier() on
x86, arm64, ia64, s390 and sparc, but it includes a memory barrier
instruction on other archs.
CC Paul. If smp_load_acquire() is sufficiently lightweight, I would
update SRCU.
Thanks,
Lai
On Thu, Jun 19, 2014 at 09:58:16AM +0800, Lai Jiangshan wrote:
> On 06/18/2014 11:32 PM, Tejun Heo wrote:
> > On Wed, Jun 18, 2014 at 11:37:35AM +0800, Lai Jiangshan wrote:
> >>> @@ -97,7 +98,10 @@ static inline void percpu_ref_kill(struct percpu_ref *ref)
> >>> static inline bool __pcpu_ref_alive(struct percpu_ref *ref,
> >>> unsigned __percpu **pcpu_countp)
> >>> {
> >>> - unsigned long pcpu_ptr = ACCESS_ONCE(ref->pcpu_count_ptr);
> >>> + unsigned long pcpu_ptr;
> >>> +
> >>> + /* paired with smp_store_release() in percpu_ref_reinit() */
> >>> + pcpu_ptr = smp_load_acquire(&ref->pcpu_count_ptr);
> >>
> >>
> >> Does "smp_load_acquire()" hurts the performance of percpu_ref_get/put()
> >> in non-x86 system?
> >
> > It's equivalent to data dependency barrier. The only arch which needs
> > something more than barrier() is alpha. It isn't an issue.
> >
>
> But I searched from the source, smp_load_acquire() is just barrier() in
> x86, arm64, ia64, s390, sparc, but it includes memory barrier
> instruction in other archs.
Hmmm, right, it's a stronger guarantee than the data dependency
barrier. This should probably use smp_wmb() and
smp_read_barrier_depends(). That's all it needs anyway.
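IOW, something along these lines (sketch only, not the final patch;
field and variable names as in the current version):

	/* percpu_ref_reinit() side: zero the counters, then clear DEAD */
	for_each_possible_cpu(cpu)
		*per_cpu_ptr(pcpu_count, cpu) = 0;
	smp_wmb();				/* zeroing before DEAD clearing */
	ref->pcpu_count_ptr &= ~PCPU_REF_DEAD;

	/* __pcpu_ref_alive() side: load the value, then deref through it */
	pcpu_ptr = ACCESS_ONCE(ref->pcpu_count_ptr);
	smp_read_barrier_depends();		/* pairs with the smp_wmb() above */
	if (!(pcpu_ptr & PCPU_REF_DEAD))
		this_cpu_inc(*(unsigned __percpu *)pcpu_ptr);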
Thanks.
--
tejun
Now that explicit invocation of percpu_ref_exit() is necessary to free
the percpu counter, we can implement percpu_ref_reinit() which
reinitializes a released percpu_ref. This can be used to implement a
scalable gating switch which can be drained and then re-opened without
worrying about memory allocation failures.
percpu_ref_is_zero() is added and used in a sanity check in
percpu_ref_reinit(). As this function will be useful for other
purposes too, make it a public interface.
v2: Use smp_read_barrier_depends() instead of smp_load_acquire(). We
only need a data dependency barrier, and smp_load_acquire() is
stronger and heavier on some archs. Spotted by Lai Jiangshan.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Lai Jiangshan <[email protected]>
---
include/linux/percpu-refcount.h | 19 +++++++++++++++++++
lib/percpu-refcount.c | 35 +++++++++++++++++++++++++++++++++++
2 files changed, 54 insertions(+)
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -67,6 +67,7 @@ struct percpu_ref {
int __must_check percpu_ref_init(struct percpu_ref *ref,
percpu_ref_func_t *release);
+void percpu_ref_reinit(struct percpu_ref *ref);
void percpu_ref_exit(struct percpu_ref *ref);
void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
percpu_ref_func_t *confirm_kill);
@@ -99,6 +100,9 @@ static inline bool __pcpu_ref_alive(stru
{
unsigned long pcpu_ptr = ACCESS_ONCE(ref->pcpu_count_ptr);
+ /* paired with smp_store_release() in percpu_ref_reinit() */
+ smp_read_barrier_depends();
+
if (unlikely(pcpu_ptr & PCPU_REF_DEAD))
return false;
@@ -206,4 +210,19 @@ static inline void percpu_ref_put(struct
rcu_read_unlock_sched();
}
+/**
+ * percpu_ref_is_zero - test whether a percpu refcount reached zero
+ * @ref: percpu_ref to test
+ *
+ * Returns %true if @ref reached zero.
+ */
+static inline bool percpu_ref_is_zero(struct percpu_ref *ref)
+{
+ unsigned __percpu *pcpu_count;
+
+ if (__pcpu_ref_alive(ref, &pcpu_count))
+ return false;
+ return !atomic_read(&ref->count);
+}
+
#endif
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -61,6 +61,41 @@ int percpu_ref_init(struct percpu_ref *r
EXPORT_SYMBOL_GPL(percpu_ref_init);
/**
+ * percpu_ref_reinit - re-initialize a percpu refcount
+ * @ref: percpu_ref to re-initialize
+ *
+ * Re-initialize @ref so that it's in the same state as when it finished
+ * percpu_ref_init(). @ref must have been initialized successfully, killed
+ * and reached 0 but not exited.
+ *
+ * Note that percpu_ref_tryget[_live]() are safe to perform on @ref while
+ * this function is in progress.
+ */
+void percpu_ref_reinit(struct percpu_ref *ref)
+{
+ unsigned __percpu *pcpu_count = pcpu_count_ptr(ref);
+ int cpu;
+
+ BUG_ON(!pcpu_count);
+ WARN_ON(!percpu_ref_is_zero(ref));
+
+ atomic_set(&ref->count, 1 + PCPU_COUNT_BIAS);
+
+ /*
+ * Restore per-cpu operation. smp_store_release() is paired with
+ * smp_load_acquire() in __pcpu_ref_alive() and guarantees that the
+ * zeroing is visible to all percpu accesses which can see the
+ * following PCPU_REF_DEAD clearing.
+ */
+ for_each_possible_cpu(cpu)
+ *per_cpu_ptr(pcpu_count, cpu) = 0;
+
+ smp_store_release(&ref->pcpu_count_ptr,
+ ref->pcpu_count_ptr & ~PCPU_REF_DEAD);
+}
+EXPORT_SYMBOL_GPL(percpu_ref_reinit);
+
+/**
* percpu_ref_exit - undo percpu_ref_init()
* @ref: percpu_ref to exit
*
On Wed, Jun 18, 2014 at 10:07:27PM -0400, Tejun Heo wrote:
> On Thu, Jun 19, 2014 at 09:58:16AM +0800, Lai Jiangshan wrote:
> > On 06/18/2014 11:32 PM, Tejun Heo wrote:
> > > On Wed, Jun 18, 2014 at 11:37:35AM +0800, Lai Jiangshan wrote:
> > >>> @@ -97,7 +98,10 @@ static inline void percpu_ref_kill(struct percpu_ref *ref)
> > >>> static inline bool __pcpu_ref_alive(struct percpu_ref *ref,
> > >>> unsigned __percpu **pcpu_countp)
> > >>> {
> > >>> - unsigned long pcpu_ptr = ACCESS_ONCE(ref->pcpu_count_ptr);
> > >>> + unsigned long pcpu_ptr;
> > >>> +
> > >>> + /* paired with smp_store_release() in percpu_ref_reinit() */
> > >>> + pcpu_ptr = smp_load_acquire(&ref->pcpu_count_ptr);
> > >>
> > >>
> > >> Does "smp_load_acquire()" hurts the performance of percpu_ref_get/put()
> > >> in non-x86 system?
> > >
> > > It's equivalent to data dependency barrier. The only arch which needs
> > > something more than barrier() is alpha. It isn't an issue.
> >
> > But I searched from the source, smp_load_acquire() is just barrier() in
> > x86, arm64, ia64, s390, sparc, but it includes memory barrier
> > instruction in other archs.
>
> Hmmm, right, it's a stronger guarantee than the data dependency
> barrier. This should probably use smp_wmb() and
> smp_read_barrier_depends(). That's all it needs anyway.
Yep, smp_load_acquire() orders its load against later loads and stores,
so it really does need a memory barrier on weakly ordered systems.
This is the "publish" operation for dynamically allocated per-CPU
references? If so, agreed, you should be able to rely on dependency
ordering. Make sure to comment the smp_read_barrier_depends(). ;-)
Thanx, Paul
> Thanks.
>
> --
> tejun
>
On 06/19/2014 10:20 AM, Tejun Heo wrote:
> Now that explicit invocation of percpu_ref_exit() is necessary to free
> the percpu counter, we can implement percpu_ref_reinit() which
> reinitializes a released percpu_ref. This can be used implement
> scalable gating switch which can be drained and then re-opened without
> worrying about memory allocation failures.
>
> percpu_ref_is_zero() is added to be used in a sanity check in
> percpu_ref_exit(). As this function will be useful for other purposes
> too, make it a public interface.
>
> v2: Use smp_read_barrier_depends() instead of smp_load_acquire(). We
> only need data dep barrier and smp_load_acquire() is stronger and
> heavier on some archs. Spotted by Lai Jiangshan.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Kent Overstreet <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Lai Jiangshan <[email protected]>
> ---
> include/linux/percpu-refcount.h | 19 +++++++++++++++++++
> lib/percpu-refcount.c | 35 +++++++++++++++++++++++++++++++++++
> 2 files changed, 54 insertions(+)
>
> --- a/include/linux/percpu-refcount.h
> +++ b/include/linux/percpu-refcount.h
> @@ -67,6 +67,7 @@ struct percpu_ref {
>
> int __must_check percpu_ref_init(struct percpu_ref *ref,
> percpu_ref_func_t *release);
> +void percpu_ref_reinit(struct percpu_ref *ref);
> void percpu_ref_exit(struct percpu_ref *ref);
> void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
> percpu_ref_func_t *confirm_kill);
> @@ -99,6 +100,9 @@ static inline bool __pcpu_ref_alive(stru
> {
> unsigned long pcpu_ptr = ACCESS_ONCE(ref->pcpu_count_ptr);
>
> + /* paired with smp_store_release() in percpu_ref_reinit() */
> + smp_read_barrier_depends();
> +
> if (unlikely(pcpu_ptr & PCPU_REF_DEAD))
> return false;
>
> @@ -206,4 +210,19 @@ static inline void percpu_ref_put(struct
> rcu_read_unlock_sched();
> }
>
> +/**
> + * percpu_ref_is_zero - test whether a percpu refcount reached zero
> + * @ref: percpu_ref to test
> + *
> + * Returns %true if @ref reached zero.
> + */
> +static inline bool percpu_ref_is_zero(struct percpu_ref *ref)
> +{
> + unsigned __percpu *pcpu_count;
> +
> + if (__pcpu_ref_alive(ref, &pcpu_count))
> + return false;
> + return !atomic_read(&ref->count);
> +}
> +
> #endif
> --- a/lib/percpu-refcount.c
> +++ b/lib/percpu-refcount.c
> @@ -61,6 +61,41 @@ int percpu_ref_init(struct percpu_ref *r
> EXPORT_SYMBOL_GPL(percpu_ref_init);
>
> /**
> + * percpu_ref_reinit - re-initialize a percpu refcount
> + * @ref: percpu_ref to re-initialize
> + *
> + * Re-initialize @ref so that it's in the same state as when it finished
> + * percpu_ref_init(). @ref must have been initialized successfully, killed
> + * and reached 0 but not exited.
> + *
> + * Note that percpu_ref_tryget[_live]() are safe to perform on @ref while
> + * this function is in progress.
> + */
> +void percpu_ref_reinit(struct percpu_ref *ref)
> +{
> + unsigned __percpu *pcpu_count = pcpu_count_ptr(ref);
> + int cpu;
> +
> + BUG_ON(!pcpu_count);
> + WARN_ON(!percpu_ref_is_zero(ref));
> +
> + atomic_set(&ref->count, 1 + PCPU_COUNT_BIAS);
> +
> + /*
> + * Restore per-cpu operation. smp_store_release() is paired with
> + * smp_load_acquire() in __pcpu_ref_alive() and guarantees that the
s/smp_load_acquire()/smp_read_barrier_depends()/
s/smp_store_release()/smp_mb()/ if you accept my next comment.
> + * zeroing is visible to all percpu accesses which can see the
> + * following PCPU_REF_DEAD clearing.
> + */
> + for_each_possible_cpu(cpu)
> + *per_cpu_ptr(pcpu_count, cpu) = 0;
> +
> + smp_store_release(&ref->pcpu_count_ptr,
> + ref->pcpu_count_ptr & ~PCPU_REF_DEAD);
I think it would be better if smp_mb() were used.
It is documented that smp_read_barrier_depends() and smp_mb() are
paired, not smp_read_barrier_depends() and smp_store_release().
> +}
> +EXPORT_SYMBOL_GPL(percpu_ref_reinit);
> +
> +/**
> * percpu_ref_exit - undo percpu_ref_init()
> * @ref: percpu_ref to exit
> *
On Thu, Jun 19, 2014 at 11:01:26AM +0800, Lai Jiangshan wrote:
> > + /*
> > + * Restore per-cpu operation. smp_store_release() is paired with
> > + * smp_load_acquire() in __pcpu_ref_alive() and guarantees that the
>
> s/smp_load_acquire()/smp_read_barrier_depends()/
Will update.
> s/smp_store_release()/smp_mb()/ if you accept my next comment.
>
> > + * zeroing is visible to all percpu accesses which can see the
> > + * following PCPU_REF_DEAD clearing.
> > + */
> > + for_each_possible_cpu(cpu)
> > + *per_cpu_ptr(pcpu_count, cpu) = 0;
> > +
> > + smp_store_release(&ref->pcpu_count_ptr,
> > + ref->pcpu_count_ptr & ~PCPU_REF_DEAD);
>
> I think it would be better if smp_mb() is used.
smp_wmb() would be better here. We don't need the reader side.
> it is documented that smp_read_barrier_depends() and smp_mb() are paired.
> Not smp_read_barrier_depends() and smp_store_release().
I don't know. I thought about doing that but the RCU accessors are
pairing store_release with read_barrier_depends, so I don't think the
particular pairing is problematic and store_release is better at
documenting what's being barriered.
Thanks.
--
tejun
Hey, Paul.
On Wed, Jun 18, 2014 at 07:27:08PM -0700, Paul E. McKenney wrote:
> Yep, smp_load_acquire() orders its load against later loads and stores,
> so it really does need a memory barrier on weakly ordered systems.
Yeap.
> This is the "publish" operation for dynamically allocated per-CPU
> references? If so, agreed, you should be able to rely on dependency
> ordering. Make sure to comment the smp_read_barrier_depends(). ;-)
Definitely, there aren't many things which are more frustrating than
barriers w/o comments explaining their pairing. I'm pairing
store_release with read_barrier_depends as that's what RCU is doing.
Is this the preferred way now? I like the new store_release and
load_acquire as they document what's being barriered better but as Lai
suggested in another reply it does seem a bit unbalanced. I wonder
whether load_acquire_depends would make sense.
Thanks.
--
tejun
On Thu, Jun 19, 2014 at 09:31:04AM -0400, Tejun Heo wrote:
> On Thu, Jun 19, 2014 at 11:01:26AM +0800, Lai Jiangshan wrote:
> > > + /*
> > > + * Restore per-cpu operation. smp_store_release() is paired with
> > > + * smp_load_acquire() in __pcpu_ref_alive() and guarantees that the
> >
> > s/smp_load_acquire()/smp_read_barrier_depends()/
>
> Will update.
>
> > s/smp_store_release()/smp_mb()/ if you accept my next comment.
> >
> > > + * zeroing is visible to all percpu accesses which can see the
> > > + * following PCPU_REF_DEAD clearing.
> > > + */
> > > + for_each_possible_cpu(cpu)
> > > + *per_cpu_ptr(pcpu_count, cpu) = 0;
> > > +
> > > + smp_store_release(&ref->pcpu_count_ptr,
> > > + ref->pcpu_count_ptr & ~PCPU_REF_DEAD);
> >
> > I think it would be better if smp_mb() is used.
>
> smp_wmb() would be better here. We don't need the reader side.
>
> > it is documented that smp_read_barrier_depends() and smp_mb() are paired.
> > Not smp_read_barrier_depends() and smp_store_release().
Well, sounds like the documentation needs an update, then. ;-)
For example, current rcu_assign_pointer() is a wrapper around
smp_store_release().
> I don't know. I thought about doing that but the RCU accessors are
> pairing store_release with read_barrier_depends, so I don't think the
> particular paring is problematic and store_release is better at
> documenting what's being barriered.
Which Tejun noted as well.
Thanx, Paul
On Thu, Jun 19, 2014 at 09:36:24AM -0400, Tejun Heo wrote:
> Hey, Paul.
>
> On Wed, Jun 18, 2014 at 07:27:08PM -0700, Paul E. McKenney wrote:
> > Yep, smp_load_acquire() orders its load against later loads and stores,
> > so it really does need a memory barrier on weakly ordered systems.
>
> Yeap.
>
> > This is the "publish" operation for dynamically allocated per-CPU
> > references? If so, agreed, you should be able to rely on dependency
> > ordering. Make sure to comment the smp_read_barrier_depends(). ;-)
>
> Definitely, there aren't many things which are more frustrating than
> barriers w/o comments explaining their pairing. I'm pairing
> store_release with read_barrier_depends as that's what RCU is doing.
> Is this the preferred way now? I like the new store_release and
> load_acquire as they document what's being barriered better but as Lai
> suggested in another reply it does seem a bit unbalanced. I wonder
> whether load_acquire_depends would make sense.
If you mean what I think you mean by load_acquire_depends(), it is spelled
"rcu_dereference()" or, in this case, where you are never removing anything
that has been added, "rcu_dereference_raw()". Because you are never
removing anything, you don't need rcu_read_lock() or rcu_read_unlock(),
thus you don't want lockdep yelling at you about not having RCU read-side
critical sections, thus rcu_dereference_raw().
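For example, a minimal sketch (struct foo, gp and do_something_with()
are of course made up):

#include <linux/rcupdate.h>

struct foo;
extern void do_something_with(struct foo *p);	/* hypothetical consumer */

/* a pointer that is published once and never retracted */
struct foo __rcu *gp;

static void reader(void)
{
	/* dependency ordering only; no rcu_read_lock(), no lockdep complaint */
	struct foo *p = rcu_dereference_raw(gp);

	if (p)
		do_something_with(p);
}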
Thanx, Paul
On Thu, Jun 19, 2014 at 09:55:02AM -0700, Paul E. McKenney wrote:
> On Thu, Jun 19, 2014 at 09:31:04AM -0400, Tejun Heo wrote:
> > On Thu, Jun 19, 2014 at 11:01:26AM +0800, Lai Jiangshan wrote:
> > > > + /*
> > > > + * Restore per-cpu operation. smp_store_release() is paired with
> > > > + * smp_load_acquire() in __pcpu_ref_alive() and guarantees that the
> > >
> > > s/smp_load_acquire()/smp_read_barrier_depends()/
> >
> > Will update.
> >
> > > s/smp_store_release()/smp_mb()/ if you accept my next comment.
> > >
> > > > + * zeroing is visible to all percpu accesses which can see the
> > > > + * following PCPU_REF_DEAD clearing.
> > > > + */
> > > > + for_each_possible_cpu(cpu)
> > > > + *per_cpu_ptr(pcpu_count, cpu) = 0;
> > > > +
> > > > + smp_store_release(&ref->pcpu_count_ptr,
> > > > + ref->pcpu_count_ptr & ~PCPU_REF_DEAD);
> > >
> > > I think it would be better if smp_mb() is used.
> >
> > smp_wmb() would be better here. We don't need the reader side.
> >
> > > it is documented that smp_read_barrier_depends() and smp_mb() are paired.
> > > Not smp_read_barrier_depends() and smp_store_release().
>
> Well, sounds like the documentation needs an update, then. ;-)
>
> For example, current rcu_assign_pointer() is a wrapper around
> smp_store_release().
>
> > I don't know. I thought about doing that but the RCU accessors are
> > pairing store_release with read_barrier_depends, so I don't think the
> > particular paring is problematic and store_release is better at
> > documenting what's being barriered.
>
> Which Tejun noted as well.
And here is a patch to update the documentation. Thoughts?
Thanx, Paul
------------------------------------------------------------------------
documentation: Add acquire/release barriers to pairing rules
It is possible to pair acquire and release barriers with other barriers,
so this commit adds them to the list in the SMP barrier pairing section.
Reported-by: Lai Jiangshan <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index a6ca533a73fc..2a7c3c4fb53f 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -757,10 +757,12 @@ SMP BARRIER PAIRING
When dealing with CPU-CPU interactions, certain types of memory barrier should
always be paired. A lack of appropriate pairing is almost certainly an error.
-A write barrier should always be paired with a data dependency barrier or read
-barrier, though a general barrier would also be viable. Similarly a read
-barrier or a data dependency barrier should always be paired with at least an
-write barrier, though, again, a general barrier is viable:
+A write barrier should always be paired with a data dependency barrier,
+acquire barrier, release barrier, or read barrier, though a general
+barrier would also be viable. Similarly a read barrier or a data
+dependency barrier should always be paired with at least a write barrier,
+an acquire barrier, or a release barrier, though, again, a general
+barrier is viable:
CPU 1 CPU 2
=============== ===============
Hello,
On Thu, Jun 19, 2014 at 10:05:49AM -0700, Paul E. McKenney wrote:
> If you mean what I think you mean by load_acquire_depends(), it is spelled
> "rcu_dereference()" or, in this case, where you are never removing anything
> that has been added, "rcu_dereference_raw()". Because you are never
> removing anything, you don't need rcu_read_lock() or rcu_read_unlock(),
> thus you don't want lockdep yelling at you about not having RCU read-side
> critical sections, thus rcu_dereference_raw().
Yeah, along that line but it's kinda weird to use rcu_dereference()
when RCU isn't involved. It'd be clearer to have something like
load_acquire_depends() and then define RCU deref in terms of it.
This is purely notational and clarifying it in the documentation is
probably enough.
Thanks.
--
tejun
On Thu, Jun 19, 2014 at 10:03:05AM -0700, Paul E. McKenney wrote:
> documentation: Add acquire/release barriers to pairing rules
>
> It is possible to pair acquire and release barriers with other barriers,
> so this commit adds them to the list in the SMP barrier pairing section.
>
> Reported-by: Lai Jiangshan <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index a6ca533a73fc..2a7c3c4fb53f 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -757,10 +757,12 @@ SMP BARRIER PAIRING
> When dealing with CPU-CPU interactions, certain types of memory barrier should
> always be paired. A lack of appropriate pairing is almost certainly an error.
>
> -A write barrier should always be paired with a data dependency barrier or read
> -barrier, though a general barrier would also be viable. Similarly a read
> -barrier or a data dependency barrier should always be paired with at least an
> -write barrier, though, again, a general barrier is viable:
> +A write barrier should always be paired with a data dependency barrier,
> +acquire barrier, release barrier, or read barrier, though a general
> +barrier would also be viable. Similarly a read barrier or a data
> +dependency barrier should always be paired with at least a write barrier,
> +an acquire barrier, or a release barrier, though, again, a general
> +barrier is viable:
FWIW,
Reviewed-by: Tejun Heo <[email protected]>
Thanks.
--
tejun
On Thu, Jun 19, 2014 at 03:06:00PM -0400, Tejun Heo wrote:
> Hello,
>
> On Thu, Jun 19, 2014 at 10:05:49AM -0700, Paul E. McKenney wrote:
> > If you mean what I think you mean by load_acquire_depends(), it is spelled
> > "rcu_dereference()" or, in this case, where you are never removing anything
> > that has been added, "rcu_dereference_raw()". Because you are never
> > removing anything, you don't need rcu_read_lock() or rcu_read_unlock(),
> > thus you don't want lockdep yelling at you about not having RCU read-side
> > critical sections, thus rcu_dereference_raw().
>
> Yeah, along that line but it's kinda weird to use rcu_dereference()
> when RCU isn't involved. It'd be clearer to have something like
> load_acquire_depends() and then define RCU deref in terms of it.
>
> This is purely notational and clarifiying in the documentation is
> probably enough.
OK. If there end up being too many non-RCU uses of rcu_dereference_raw(),
then it might make sense to create a new primitive. But it is not like
we have any shortage of them just now. ;-)
Thanx, Paul
On Tue, Jun 17, 2014 at 09:08:00PM -0400, Tejun Heo wrote:
> ioctx_alloc() reaches inside percpu_ref and directly frees
> ->pcpu_count in its failure path, which is quite gross. percpu_ref
> has been providing a proper interface to do this,
> percpu_ref_cancel_init(), for quite some time now. Let's use that
> instead.
I applied this to my aio-next tree at git://git.kvack.org/~bcrl/aio-next.git .
-ben
> This patch doesn't introduce any behavior changes.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Benjamin LaHaise <[email protected]>
> Cc: Kent Overstreet <[email protected]>
> ---
> fs/aio.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index 4f078c0..5e0d7f9 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -715,8 +715,8 @@ err_ctx:
> err:
> mutex_unlock(&ctx->ring_lock);
> free_percpu(ctx->cpu);
> - free_percpu(ctx->reqs.pcpu_count);
> - free_percpu(ctx->users.pcpu_count);
> + percpu_ref_cancel_init(&ctx->reqs);
> + percpu_ref_cancel_init(&ctx->users);
> kmem_cache_free(kioctx_cachep, ctx);
> pr_debug("error allocating ioctx %d\n", err);
> return ERR_PTR(err);
> --
> 1.9.3
--
"Thought is the essence of where you are now."
Hello, Benjamin.
On Wed, Jun 25, 2014 at 10:31:12AM -0400, Benjamin LaHaise wrote:
> On Tue, Jun 17, 2014 at 09:08:00PM -0400, Tejun Heo wrote:
> > ioctx_alloc() reaches inside percpu_ref and directly frees
> > ->pcpu_count in its failure path, which is quite gross. percpu_ref
> > has been providing a proper interface to do this,
> > percpu_ref_cancel_init(), for quite some time now. Let's use that
> > instead.
>
> I applied this to my aio-next tree at git://git.kvack.org/~bcrl/aio-next.git .
It'd probably be easier if I route this through the percpu tree with
the rest of the changes as there are further changes on the aio side
(trivial but still); otherwise we'd end up having to do a pulling
dance back and forth. The changes on the aio side are mostly trivial.
Can I route them through the percpu tree with your acks?
Thanks.
--
tejun
Hi Tejun,
On Wed, Jun 25, 2014 at 10:56:34AM -0400, Tejun Heo wrote:
> Hello, Benjamin.
...
> It'd probably easier if I route this through percpu tree with the rest
> of changes as there are further changes on the aio side (trivial but
> still), so we'd end up having to do pulling dance back and forth. The
> changes on aio side are mostly trivial. Can I route them through
> percpu tree with your acks?
Sure, no problem -- add my acks and proceed. I don't think there should
be any conflicts. Cheers,
-ben
> Thanks.
>
> --
> tejun
--
"Thought is the essence of where you are now."
On Wed, Jun 25, 2014 at 11:35:20AM -0400, Benjamin LaHaise wrote:
> Hi Tejun,
>
> On Wed, Jun 25, 2014 at 10:56:34AM -0400, Tejun Heo wrote:
> > Hello, Benjamin.
> ...
> > It'd probably easier if I route this through percpu tree with the rest
> > of changes as there are further changes on the aio side (trivial but
> > still), so we'd end up having to do pulling dance back and forth. The
> > changes on aio side are mostly trivial. Can I route them through
> > percpu tree with your acks?
>
> Sure, no problem -- add my acks and proceed. I don't think there should
> be any conflicts. Cheers,
Thanks a lot!
--
tejun
On Tue, Jun 17, 2014 at 09:07:59PM -0400, Tejun Heo wrote:
> Hello,
>
> Currently, percpu_ref destruction - freeing the percpu counter -
> happens when the percpu_ref is released. This while a bit more
> convenient restricts how percpu_ref can be used. It can't be
> initialized with static percpu area and it can't be reinitialized
> without going through percpu allocation.
>
> There are use cases which can take advantage of cycling percpu_ref
> through init/release multiple times. This patchset separates out
> percpu_ref destruction into percpu_exit() which should be invoked
> explicitly and introduces percpu_ref_reinit() which can be used to
> recycle a released percpu_ref.
>
> This patchset doesn't add any users. Patchset to use this will soon
> be posted.
>
> This patchset contains the following six patches.
>
> 0001-percpu-refcount-aio-use-percpu_ref_cancel_init-in-io.patch
> 0002-percpu-refcount-one-bit-is-enough-for-REF_STATUS.patch
> 0003-percpu-refcount-add-helpers-for-percpu_count-accesse.patch
> 0004-percpu-refcount-use-unsigned-long-for-pcpu_count-poi.patch
> 0005-percpu-refcount-require-percpu_ref-to-be-exited-expl.patch
> 0006-percpu-refcount-implement-percpu_ref_reinit-and-perc.patch
Applied to percpu/for-3.17.
Thanks.
--
tejun