2015-05-12 12:29:14

by Lai Jiangshan

Subject: [PATCH 0/7 V2] workqueue: cleanup for attr management

Hi,

This is the V2 version of the patchset, but it only updates
patches 1 & 2 of the V1 patchset.

[1/5 V1] is split into [1/7 V2] and [2/7 V2].
[2/5 V1] is split into [3,4,5,6,7/7 V2].

[1/7] extends the wq_pool_mutex locking region in apply_workqueue_attrs().
It is a base patch for all the later patches except [6/7 V2].
[2/7] simplifies wq_update_unbound_numa() since the lock requirement is eased.

[3/7] introduces get_pwq_unlocked() for reusing existing pwqs.
It is also a base patch for all the later patches except [6/7 V2].
[4,5,6/7] apply the good code of wq_update_unbound_numa() to
apply_workqueue_attrs(): reuse the unchanged per-node/default pwq
and reuse the preallocated wq_update_unbound_numa_attrs_buf.

[2,3,4,5,6/7] try to make the per-node allocation the same in
wq_update_unbound_numa() and apply_workqueue_attrs(), but they
are not exactly the same: wq_update_unbound_numa() returns directly
when the pwq is unchanged.

[7/7] adds get_node_unbound_pwq(), which uses the per-node allocation
behavior of apply_workqueue_attrs(). This introduces some overhead in
wq_update_unbound_numa(), but the cpu-hotplug path is a cold path,
so it is OK.

It also removes some comments without adding corresponding ones back.

Other changes from V1:
The unneeded comment about the stability of wq->unbound_attrs
is removed in [2/7].

alloc_node_unbound_pwq() is renamed to get_node_unbound_pwq().
@use_dfl_when_fail is removed from get_node_unbound_pwq().
get_node_unbound_pwq() is shorter with the help of get_pwq_unlocked().

apply_wqattrs_cleanup() is no longer protected by wq_pool_mutex.

Some comments were revised as TJ suggested.

Thanks,
Lai

Cc: Tejun Heo <[email protected]>

Lai Jiangshan (7):
workqueue: wq_pool_mutex protects the attrs-installation
workqueue: simplify wq_update_unbound_numa()
workqueue: introduce get_pwq_unlocked()
workqueue: reuse the current per-node pwq when its attrs unchanged
workqueue: reuse the current default pwq when its attrs unchanged
workqueue: reuse wq_update_unbound_numa_attrs_buf as temporary attrs
workqueue: add get_node_unbound_pwq()

kernel/workqueue.c | 197 ++++++++++++++++++++++++++---------------------------
1 file changed, 95 insertions(+), 102 deletions(-)

--
2.1.0


2015-05-12 12:30:26

by Lai Jiangshan

Subject: [PATCH 1/7 V2] workqueue: wq_pool_mutex protects the attrs-installation

Currently, wq_pool_mutex doesn't protect the attrs-installation. As a
result, ->unbound_attrs, ->numa_pwq_tbl[] and ->dfl_pwq can only be
accessed under wq->mutex, which causes some inconvenience. For example,
wq_update_unbound_numa() has to acquire wq->mutex before fetching
wq->unbound_attrs->no_numa and the old pwq.

attrs-installation is a short operation, so this change will not cause
any noticeable latency for other operations which also acquire
wq_pool_mutex.

The only unprotected attrs-installation code is in
apply_workqueue_attrs(), so this patch changes less code than comments.

It is also a preparation patch for the next several patches, which read
wq->unbound_attrs, wq->numa_pwq_tbl[] and wq->dfl_pwq with only
wq_pool_mutex held.
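
For example (a sketch, not part of this patch; peek_node_pwq() is a
hypothetical reader used only for illustration), a caller holding only
wq_pool_mutex can now do:

	/* hypothetical reader; caller holds wq_pool_mutex only */
	static struct pool_workqueue *peek_node_pwq(struct workqueue_struct *wq,
						    int node)
	{
		lockdep_assert_held(&wq_pool_mutex);
		/* PWR: wq_pool_mutex, wq->mutex or sched-RCU, any one suffices */
		return unbound_pwq_by_node(wq, node);
	}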

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a3915ab..f02b8ad 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -127,6 +127,11 @@ enum {
*
* PR: wq_pool_mutex protected for writes. Sched-RCU protected for reads.
*
+ * PW: wq_pool_mutex and wq->mutex protected for writes. Either for reads.
+ *
+ * PWR: wq_pool_mutex and wq->mutex protected for writes. Either or
+ * sched-RCU for reads.
+ *
* WQ: wq->mutex protected.
*
* WR: wq->mutex protected for writes. Sched-RCU protected for reads.
@@ -247,8 +252,8 @@ struct workqueue_struct {
int nr_drainers; /* WQ: drain in progress */
int saved_max_active; /* WQ: saved pwq max_active */

- struct workqueue_attrs *unbound_attrs; /* WQ: only for unbound wqs */
- struct pool_workqueue *dfl_pwq; /* WQ: only for unbound wqs */
+ struct workqueue_attrs *unbound_attrs; /* PW: only for unbound wqs */
+ struct pool_workqueue *dfl_pwq; /* PW: only for unbound wqs */

#ifdef CONFIG_SYSFS
struct wq_device *wq_dev; /* I: for sysfs interface */
@@ -268,7 +273,7 @@ struct workqueue_struct {
/* hot fields used during command issue, aligned to cacheline */
unsigned int flags ____cacheline_aligned; /* WQ: WQ_* flags */
struct pool_workqueue __percpu *cpu_pwqs; /* I: per-cpu pwqs */
- struct pool_workqueue __rcu *numa_pwq_tbl[]; /* FR: unbound pwqs indexed by node */
+ struct pool_workqueue __rcu *numa_pwq_tbl[]; /* PWR: unbound pwqs indexed by node */
};

static struct kmem_cache *pwq_cache;
@@ -349,6 +354,12 @@ static void workqueue_sysfs_unregister(struct workqueue_struct *wq);
lockdep_is_held(&wq->mutex), \
"sched RCU or wq->mutex should be held")

+#define assert_rcu_or_wq_mutex_or_pool_mutex(wq) \
+ rcu_lockdep_assert(rcu_read_lock_sched_held() || \
+ lockdep_is_held(&wq->mutex) || \
+ lockdep_is_held(&wq_pool_mutex), \
+ "sched RCU, wq->mutex or wq_pool_mutex should be held")
+
#define for_each_cpu_worker_pool(pool, cpu) \
for ((pool) = &per_cpu(cpu_worker_pools, cpu)[0]; \
(pool) < &per_cpu(cpu_worker_pools, cpu)[NR_STD_WORKER_POOLS]; \
@@ -553,7 +564,8 @@ static int worker_pool_assign_id(struct worker_pool *pool)
* @wq: the target workqueue
* @node: the node ID
*
- * This must be called either with pwq_lock held or sched RCU read locked.
+ * This must be called with any of wq_pool_mutex, wq->mutex or sched RCU
+ * read locked.
* If the pwq needs to be used beyond the locking in effect, the caller is
* responsible for guaranteeing that the pwq stays online.
*
@@ -562,7 +574,7 @@ static int worker_pool_assign_id(struct worker_pool *pool)
static struct pool_workqueue *unbound_pwq_by_node(struct workqueue_struct *wq,
int node)
{
- assert_rcu_or_wq_mutex(wq);
+ assert_rcu_or_wq_mutex_or_pool_mutex(wq);
return rcu_dereference_raw(wq->numa_pwq_tbl[node]);
}

@@ -3479,6 +3491,7 @@ static struct pool_workqueue *numa_pwq_tbl_install(struct workqueue_struct *wq,
{
struct pool_workqueue *old_pwq;

+ lockdep_assert_held(&wq_pool_mutex);
lockdep_assert_held(&wq->mutex);

/* link_pwq() can handle duplicate calls */
@@ -3644,10 +3657,9 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
* pwqs accordingly.
*/
get_online_cpus();
-
mutex_lock(&wq_pool_mutex);
+
ctx = apply_wqattrs_prepare(wq, attrs);
- mutex_unlock(&wq_pool_mutex);

/* the ctx has been prepared successfully, let's commit it */
if (ctx) {
@@ -3655,6 +3667,7 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
ret = 0;
}

+ mutex_unlock(&wq_pool_mutex);
put_online_cpus();

apply_wqattrs_cleanup(ctx);
--
2.1.0

2015-05-12 12:30:24

by Lai Jiangshan

Subject: [PATCH 2/7 V2] workqueue: simplify wq_update_unbound_numa()

wq_update_unbound_numa() is known to be called with wq_pool_mutex held.

It currently acquires wq->mutex before reading wq->unbound_attrs,
wq->numa_pwq_tbl[] and wq->dfl_pwq. But these fields can now be read
with only wq_pool_mutex held, so we can simply remove the
mutex_lock(&wq->mutex).

Without the dependency on mutex_lock(&wq->mutex), the test of
wq->unbound_attrs->no_numa can also be moved upward.

The old code needed a long comment to describe the stability of
@wq->unbound_attrs, which is now also guaranteed by wq_pool_mutex,
so we no longer need that comment.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 20 +++++---------------
1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f02b8ad..c8b9de0 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3708,7 +3708,8 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,

lockdep_assert_held(&wq_pool_mutex);

- if (!wq_numa_enabled || !(wq->flags & WQ_UNBOUND))
+ if (!wq_numa_enabled || !(wq->flags & WQ_UNBOUND) ||
+ wq->unbound_attrs->no_numa)
return;

/*
@@ -3719,10 +3720,6 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
target_attrs = wq_update_unbound_numa_attrs_buf;
cpumask = target_attrs->cpumask;

- mutex_lock(&wq->mutex);
- if (wq->unbound_attrs->no_numa)
- goto out_unlock;
-
copy_workqueue_attrs(target_attrs, wq->unbound_attrs);
pwq = unbound_pwq_by_node(wq, node);

@@ -3734,33 +3731,26 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
*/
if (wq_calc_node_cpumask(wq->dfl_pwq->pool->attrs, node, cpu_off, cpumask)) {
if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask))
- goto out_unlock;
+ return;
} else {
goto use_dfl_pwq;
}

- mutex_unlock(&wq->mutex);
-
/* create a new pwq */
pwq = alloc_unbound_pwq(wq, target_attrs);
if (!pwq) {
pr_warn("workqueue: allocation failed while updating NUMA affinity of \"%s\"\n",
wq->name);
- mutex_lock(&wq->mutex);
goto use_dfl_pwq;
}

- /*
- * Install the new pwq. As this function is called only from CPU
- * hotplug callbacks and applying a new attrs is wrapped with
- * get/put_online_cpus(), @wq->unbound_attrs couldn't have changed
- * inbetween.
- */
+ /* Install the new pwq. */
mutex_lock(&wq->mutex);
old_pwq = numa_pwq_tbl_install(wq, node, pwq);
goto out_unlock;

use_dfl_pwq:
+ mutex_lock(&wq->mutex);
spin_lock_irq(&wq->dfl_pwq->pool->lock);
get_pwq(wq->dfl_pwq);
spin_unlock_irq(&wq->dfl_pwq->pool->lock);
--
2.1.0

2015-05-12 12:30:20

by Lai Jiangshan

Subject: [PATCH 3/7 V2] workqueue: introduce get_pwq_unlocked()

attrs management code may reuse existed pwq and it has open code
to do "lock();get_pwq();unlock()", we move this open code into
get_pwq_unlocked().

get_pwq_unlocked() will also be used in later patches to allow
apply_wqattrs_prepare() to resue the original default or per-node pwq.
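
In other words, the open-coded sequence

	spin_lock_irq(&pwq->pool->lock);
	get_pwq(pwq);
	spin_unlock_irq(&pwq->pool->lock);

collapses into a single call (a sketch of the intended usage):

	pwq = get_pwq_unlocked(pwq);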

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 33 ++++++++++++++++++++++-----------
1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c8b9de0..0fa352d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1067,6 +1067,24 @@ static void put_pwq(struct pool_workqueue *pwq)
}

/**
+ * get_pwq_unlocked - get_pwq() with surrounding pool lock/unlock
+ * @pwq: pool_workqueue to get (should not be %NULL)
+ *
+ * get_pwq() with locking. The caller should have at least an owned
+ * reference on @pwq to match the guarantees required by get_pwq().
+ *
+ * Returns @pwq itself to allow chained expressions.
+ */
+static struct pool_workqueue *get_pwq_unlocked(struct pool_workqueue *pwq)
+{
+ spin_lock_irq(&pwq->pool->lock);
+ get_pwq(pwq);
+ spin_unlock_irq(&pwq->pool->lock);
+
+ return pwq;
+}
+
+/**
* put_pwq_unlocked - put_pwq() with surrounding pool lock/unlock
* @pwq: pool_workqueue to put (can be %NULL)
*
@@ -3733,7 +3751,8 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask))
return;
} else {
- goto use_dfl_pwq;
+ pwq = get_pwq_unlocked(wq->dfl_pwq);
+ goto install;
}

/* create a new pwq */
@@ -3741,21 +3760,13 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
if (!pwq) {
pr_warn("workqueue: allocation failed while updating NUMA affinity of \"%s\"\n",
wq->name);
- goto use_dfl_pwq;
+ pwq = get_pwq_unlocked(wq->dfl_pwq);
}

+install:
/* Install the new pwq. */
mutex_lock(&wq->mutex);
old_pwq = numa_pwq_tbl_install(wq, node, pwq);
- goto out_unlock;
-
-use_dfl_pwq:
- mutex_lock(&wq->mutex);
- spin_lock_irq(&wq->dfl_pwq->pool->lock);
- get_pwq(wq->dfl_pwq);
- spin_unlock_irq(&wq->dfl_pwq->pool->lock);
- old_pwq = numa_pwq_tbl_install(wq, node, wq->dfl_pwq);
-out_unlock:
mutex_unlock(&wq->mutex);
put_pwq_unlocked(old_pwq);
}
--
2.1.0

2015-05-12 12:29:26

by Lai Jiangshan

Subject: [PATCH 4/7 V2] workqueue: reuse the current per-node pwq when its attrs unchanged

If the cpuamsk is changed, it is possible that only a part of the per-node
pwq is affected. This can happen when the user changes the cpumask of
a workqueue or the low level cpumask.

So we try to reuse the current per-node pwq when its attrs unchanged.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0fa352d..24e5fe4 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3552,6 +3552,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
{
struct apply_wqattrs_ctx *ctx;
struct workqueue_attrs *new_attrs, *tmp_attrs;
+ struct pool_workqueue *pwq;
int node;

lockdep_assert_held(&wq_pool_mutex);
@@ -3592,9 +3593,14 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,

for_each_node(node) {
if (wq_calc_node_cpumask(new_attrs, node, -1, tmp_attrs->cpumask)) {
- ctx->pwq_tbl[node] = alloc_unbound_pwq(wq, tmp_attrs);
- if (!ctx->pwq_tbl[node])
+ pwq = unbound_pwq_by_node(wq, node);
+ if (pwq && wqattrs_equal(tmp_attrs, pwq->pool->attrs))
+ pwq = get_pwq_unlocked(pwq);
+ else
+ pwq = alloc_unbound_pwq(wq, tmp_attrs);
+ if (!pwq)
goto out_free;
+ ctx->pwq_tbl[node] = pwq;
} else {
ctx->dfl_pwq->refcnt++;
ctx->pwq_tbl[node] = ctx->dfl_pwq;
@@ -3739,7 +3745,6 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
cpumask = target_attrs->cpumask;

copy_workqueue_attrs(target_attrs, wq->unbound_attrs);
- pwq = unbound_pwq_by_node(wq, node);

/*
* Let's determine what needs to be done. If the target cpumask is
@@ -3748,6 +3753,7 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
* equals the default pwq's, the default pwq should be used.
*/
if (wq_calc_node_cpumask(wq->dfl_pwq->pool->attrs, node, cpu_off, cpumask)) {
+ pwq = unbound_pwq_by_node(wq, node);
if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask))
return;
} else {
--
2.1.0

2015-05-12 12:29:22

by Lai Jiangshan

Subject: [PATCH 5/7 V2] workqueue: reuse the current default pwq when its attrs unchanged

When apply_wqattrs_prepare() is called, it is possible that the default
pwq is unaffected. This is always the case when only the NUMA affinity is
being changed, and sometimes the case when the low level cpumask is being
changed.

So we try to reuse the current default pwq when its attrs unchanged.

After this change, "ctx->dfl_pwq->refcnt++" could be dangerous
when ctx->dfl_pwq is being reused, so we use get_pwq_unlocked() instead.
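
To spell out the race (a sketch; before this change, ctx->dfl_pwq was
always freshly allocated and not yet visible to anyone else, so the bare
increment was safe):

	ctx->dfl_pwq->refcnt++;		/* unsafe once ctx->dfl_pwq can be the
					 * live wq->dfl_pwq, whose refcnt is
					 * also modified under pool->lock */
	ctx->pwq_tbl[node] = ctx->dfl_pwq;

becomes

	ctx->pwq_tbl[node] = get_pwq_unlocked(ctx->dfl_pwq);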

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 24e5fe4..61f8ace 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3587,7 +3587,10 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
* the default pwq covering whole @attrs->cpumask. Always create
* it even if we don't use it immediately.
*/
- ctx->dfl_pwq = alloc_unbound_pwq(wq, new_attrs);
+ if (wq->dfl_pwq && wqattrs_equal(new_attrs, wq->dfl_pwq->pool->attrs))
+ ctx->dfl_pwq = get_pwq_unlocked(wq->dfl_pwq);
+ else
+ ctx->dfl_pwq = alloc_unbound_pwq(wq, new_attrs);
if (!ctx->dfl_pwq)
goto out_free;

@@ -3602,8 +3605,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
goto out_free;
ctx->pwq_tbl[node] = pwq;
} else {
- ctx->dfl_pwq->refcnt++;
- ctx->pwq_tbl[node] = ctx->dfl_pwq;
+ ctx->pwq_tbl[node] = get_pwq_unlocked(ctx->dfl_pwq);
}
}

--
2.1.0

2015-05-12 12:29:20

by Lai Jiangshan

Subject: [PATCH 6/7 V2] workqueue: reuse wq_update_unbound_numa_attrs_buf as temporary attrs

tmp_attrs is just a temporary attrs; we can use
wq_update_unbound_numa_attrs_buf for it, like wq_update_unbound_numa() does.

This change also avoids frequent alloc/free of the tmp_attrs when
the low level cpumask is being updated.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 61f8ace..6426d6e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -295,7 +295,7 @@ module_param_named(power_efficient, wq_power_efficient, bool, 0444);

static bool wq_numa_enabled; /* unbound NUMA affinity enabled */

-/* buf for wq_update_unbound_numa_attrs(), protected by CPU hotplug exclusion */
+/* PL: buf for apply_wqattrs_prepare() and wq_update_unbound_numa() */
static struct workqueue_attrs *wq_update_unbound_numa_attrs_buf;

static DEFINE_MUTEX(wq_pool_mutex); /* protects pools and workqueues list */
@@ -3561,11 +3561,16 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
GFP_KERNEL);

new_attrs = alloc_workqueue_attrs(GFP_KERNEL);
- tmp_attrs = alloc_workqueue_attrs(GFP_KERNEL);
- if (!ctx || !new_attrs || !tmp_attrs)
+ if (!ctx || !new_attrs)
goto out_free;

/*
+ * We don't need to alloc/free temporary attrs. Let's use a
+ * preallocated one. The following buf is protected by wq_pool_mutex.
+ */
+ tmp_attrs = wq_update_unbound_numa_attrs_buf;
+
+ /*
* Calculate the attrs of the default pwq.
* If the user configured cpumask doesn't overlap with the
* wq_unbound_cpumask, we fallback to the wq_unbound_cpumask.
@@ -3615,11 +3620,9 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
ctx->attrs = new_attrs;

ctx->wq = wq;
- free_workqueue_attrs(tmp_attrs);
return ctx;

out_free:
- free_workqueue_attrs(tmp_attrs);
free_workqueue_attrs(new_attrs);
apply_wqattrs_cleanup(ctx);
return NULL;
@@ -3741,7 +3744,7 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
/*
* We don't wanna alloc/free wq_attrs for each wq for each CPU.
* Let's use a preallocated one. The following buf is protected by
- * CPU hotplug exclusion.
+ * wq_pool_mutex.
*/
target_attrs = wq_update_unbound_numa_attrs_buf;
cpumask = target_attrs->cpumask;
--
2.1.0

2015-05-12 12:29:29

by Lai Jiangshan

Subject: [PATCH 7/7 V2] workqueue: add get_node_unbound_pwq()

The per-node pwq calculation and allocation code in
apply_workqueue_attrs() and wq_update_unbound_numa() is highly similar,
so we merge it into get_node_unbound_pwq().

wq_calc_node_cpumask() is only used for this calculation and allocation,
so it is also moved into get_node_unbound_pwq().

The only changed behavior (overhead introduced!):
any reuse of an old pwq introduces get_pwq()/put_pwq() and the
corresponding locking overhead. The behavior of apply_wqattrs_prepare()
is unchanged, but wq_update_unbound_numa() changes when the current
node pwq is reused. Compared to the old behavior,
wq_update_unbound_numa() introduces 3 pairs of lock()/unlock()
operations and their overhead when the pwq is unchanged. Although
cpu-hotplug is a cold path, this case is likely the common one in
the cpu-hotplug path.
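
For reference, the reused-pwq path roughly looks like this (a sketch,
with the three lock/unlock pairs marked):

	/* wq_update_unbound_numa(), node pwq unchanged */
	pwq = get_node_unbound_pwq(wq, true, node, cpu_off, wq->dfl_pwq);
					/* 1: pool->lock in get_pwq_unlocked() */
	mutex_lock(&wq->mutex);		/* 2: wq->mutex around the install */
	old_pwq = numa_pwq_tbl_install(wq, node, pwq);
	mutex_unlock(&wq->mutex);
	put_pwq_unlocked(old_pwq);	/* 3: pool->lock to drop the old ref */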

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 126 ++++++++++++++++++++---------------------------------
1 file changed, 47 insertions(+), 79 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6426d6e..ed228ef64 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -295,7 +295,7 @@ module_param_named(power_efficient, wq_power_efficient, bool, 0444);

static bool wq_numa_enabled; /* unbound NUMA affinity enabled */

-/* PL: buf for apply_wqattrs_prepare() and wq_update_unbound_numa() */
+/* PL: buf for get_node_unbound_pwq() */
static struct workqueue_attrs *wq_update_unbound_numa_attrs_buf;

static DEFINE_MUTEX(wq_pool_mutex); /* protects pools and workqueues list */
@@ -3458,32 +3458,45 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
}

/**
- * wq_calc_node_mask - calculate a wq_attrs' cpumask for the specified node
- * @attrs: the wq_attrs of the default pwq of the target workqueue
+ * get_node_unbound_pwq - get a pwq for the specified node
+ * @wq: the target workqueue
+ * @numa: NUMA affinity
* @node: the target NUMA node
* @cpu_going_down: if >= 0, the CPU to consider as offline
- * @cpumask: outarg, the resulting cpumask
+ * @dfl_pwq: the allocated default pwq
*
- * Calculate the cpumask a workqueue with @attrs should use on @node. If
- * @cpu_going_down is >= 0, that cpu is considered offline during
- * calculation. The result is stored in @cpumask.
+ * Allocate or reuse a pwq with the cpumask that @wq should use on @node.
*
- * If NUMA affinity is not enabled, @attrs->cpumask is always used. If
- * enabled and @node has online CPUs requested by @attrs, the returned
- * cpumask is the intersection of the possible CPUs of @node and
- * @attrs->cpumask.
+ * If NUMA affinity is not enabled, @dfl_pwq is always used. If
+ * enabled and @node has online CPUs requested by @dfl_pwq->pool->attrs,
+ * the cpumask is the intersection of the possible CPUs of @node and
+ * the cpumask of @dfl_pwq->pool->attrs. If @cpu_going_down is >= 0,
+ * that cpu is considered offline during calculation.
*
* The caller is responsible for ensuring that the cpumask of @node stays
* stable.
*
- * Return: %true if the resulting @cpumask is different from @attrs->cpumask,
- * %false if equal.
+ * Return: a valid pwq; it might be @dfl_pwq under some conditions,
+ * or the current pwq of @node.
+ * %NULL when the allocation failed.
*/
-static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
- int cpu_going_down, cpumask_t *cpumask)
+static struct pool_workqueue *
+get_node_unbound_pwq(struct workqueue_struct *wq, bool numa, int node,
+ int cpu_going_down, struct pool_workqueue *dfl_pwq)
+
{
- if (!wq_numa_enabled || attrs->no_numa)
- goto use_dfl;
+ struct pool_workqueue *pwq = unbound_pwq_by_node(wq, node);
+ const struct workqueue_attrs *attrs = dfl_pwq->pool->attrs;
+ struct workqueue_attrs *tmp_attrs = wq_update_unbound_numa_attrs_buf;
+ cpumask_t *cpumask;
+
+ lockdep_assert_held(&wq_pool_mutex);
+
+ if (!wq_numa_enabled || !numa)
+ return get_pwq_unlocked(dfl_pwq);
+
+ copy_workqueue_attrs(tmp_attrs, attrs);
+ cpumask = tmp_attrs->cpumask;

/* does @node have any online CPUs @attrs wants? */
cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask);
@@ -3491,15 +3504,18 @@ static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
cpumask_clear_cpu(cpu_going_down, cpumask);

if (cpumask_empty(cpumask))
- goto use_dfl;
+ return get_pwq_unlocked(dfl_pwq);

- /* yeap, return possible CPUs in @node that @attrs wants */
+ /* yeap, use possible CPUs in @node that @attrs wants */
cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
- return !cpumask_equal(cpumask, attrs->cpumask);
+ if (cpumask_equal(cpumask, attrs->cpumask))
+ return get_pwq_unlocked(dfl_pwq);

-use_dfl:
- cpumask_copy(cpumask, attrs->cpumask);
- return false;
+ /* try to reuse the current pwq */
+ if (pwq && wqattrs_equal(tmp_attrs, pwq->pool->attrs))
+ return get_pwq_unlocked(pwq);
+
+ return alloc_unbound_pwq(wq, tmp_attrs);
}

/* install @pwq into @wq's numa_pwq_tbl[] for @node and return the old pwq */
@@ -3551,7 +3567,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
const struct workqueue_attrs *attrs)
{
struct apply_wqattrs_ctx *ctx;
- struct workqueue_attrs *new_attrs, *tmp_attrs;
+ struct workqueue_attrs *new_attrs;
struct pool_workqueue *pwq;
int node;

@@ -3565,12 +3581,6 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
goto out_free;

/*
- * We don't need to alloc/free temporary attrs. Let's use a
- * preallocated one. The following buf is protected by wq_pool_mutex.
- */
- tmp_attrs = wq_update_unbound_numa_attrs_buf;
-
- /*
* Calculate the attrs of the default pwq.
* If the user configured cpumask doesn't overlap with the
* wq_unbound_cpumask, we fallback to the wq_unbound_cpumask.
@@ -3581,13 +3591,6 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
cpumask_copy(new_attrs->cpumask, wq_unbound_cpumask);

/*
- * We may create multiple pwqs with differing cpumasks. Make a
- * copy of @new_attrs which will be modified and used to obtain
- * pools.
- */
- copy_workqueue_attrs(tmp_attrs, new_attrs);
-
- /*
* If something goes wrong during CPU up/down, we'll fall back to
* the default pwq covering whole @attrs->cpumask. Always create
* it even if we don't use it immediately.
@@ -3600,18 +3603,11 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
goto out_free;

for_each_node(node) {
- if (wq_calc_node_cpumask(new_attrs, node, -1, tmp_attrs->cpumask)) {
- pwq = unbound_pwq_by_node(wq, node);
- if (pwq && wqattrs_equal(tmp_attrs, pwq->pool->attrs))
- pwq = get_pwq_unlocked(pwq);
- else
- pwq = alloc_unbound_pwq(wq, tmp_attrs);
- if (!pwq)
- goto out_free;
- ctx->pwq_tbl[node] = pwq;
- } else {
- ctx->pwq_tbl[node] = get_pwq_unlocked(ctx->dfl_pwq);
- }
+ pwq = get_node_unbound_pwq(wq, !attrs->no_numa, node, -1,
+ ctx->dfl_pwq);
+ if (!pwq)
+ goto out_free;
+ ctx->pwq_tbl[node] = pwq;
}

/* save the user configured attrs and sanitize it. */
@@ -3731,9 +3727,7 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
{
int node = cpu_to_node(cpu);
int cpu_off = online ? -1 : cpu;
- struct pool_workqueue *old_pwq = NULL, *pwq;
- struct workqueue_attrs *target_attrs;
- cpumask_t *cpumask;
+ struct pool_workqueue *old_pwq, *pwq;

lockdep_assert_held(&wq_pool_mutex);

@@ -3741,40 +3735,14 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
wq->unbound_attrs->no_numa)
return;

- /*
- * We don't wanna alloc/free wq_attrs for each wq for each CPU.
- * Let's use a preallocated one. The following buf is protected by
- * wq_pool_mutex.
- */
- target_attrs = wq_update_unbound_numa_attrs_buf;
- cpumask = target_attrs->cpumask;
-
- copy_workqueue_attrs(target_attrs, wq->unbound_attrs);
-
- /*
- * Let's determine what needs to be done. If the target cpumask is
- * different from the default pwq's, we need to compare it to @pwq's
- * and create a new one if they don't match. If the target cpumask
- * equals the default pwq's, the default pwq should be used.
- */
- if (wq_calc_node_cpumask(wq->dfl_pwq->pool->attrs, node, cpu_off, cpumask)) {
- pwq = unbound_pwq_by_node(wq, node);
- if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask))
- return;
- } else {
- pwq = get_pwq_unlocked(wq->dfl_pwq);
- goto install;
- }
-
/* create a new pwq */
- pwq = alloc_unbound_pwq(wq, target_attrs);
+ pwq = get_node_unbound_pwq(wq, true, node, cpu_off, wq->dfl_pwq);
if (!pwq) {
pr_warn("workqueue: allocation failed while updating NUMA affinity of \"%s\"\n",
wq->name);
pwq = get_pwq_unlocked(wq->dfl_pwq);
}

-install:
/* Install the new pwq. */
mutex_lock(&wq->mutex);
old_pwq = numa_pwq_tbl_install(wq, node, pwq);
--
2.1.0

2015-05-18 00:35:56

by Lai Jiangshan

Subject: Re: [PATCH 0/7 V2] workqueue: cleanup for attr management

ping

On 05/12/2015 08:32 PM, Lai Jiangshan wrote:
> Hi,
>
> This is the V2 version of the patchset, but it only updates
> patches 1 & 2 of the V1 patchset.
>
> [1/5 V1] is split into [1/7 V2] and [2/7 V2].
> [2/5 V1] is split into [3,4,5,6,7/7 V2].
>
> [1/7] extends the wq_pool_mutex locking region in apply_workqueue_attrs().
> It is a base patch for all the later patches except [6/7 V2].
> [2/7] simplifies wq_update_unbound_numa() since the lock requirement is eased.
>
> [3/7] introduces get_pwq_unlocked() for reusing existing pwqs.
> It is also a base patch for all the later patches except [6/7 V2].
> [4,5,6/7] apply the good code of wq_update_unbound_numa() to
> apply_workqueue_attrs(): reuse the unchanged per-node/default pwq
> and reuse the preallocated wq_update_unbound_numa_attrs_buf.
>
> [2,3,4,5,6/7] try to make the per-node allocation the same in
> wq_update_unbound_numa() and apply_workqueue_attrs(), but they
> are not exactly the same: wq_update_unbound_numa() returns directly
> when the pwq is unchanged.
>
> [7/7] adds get_node_unbound_pwq(), which uses the per-node allocation
> behavior of apply_workqueue_attrs(). This introduces some overhead in
> wq_update_unbound_numa(), but the cpu-hotplug path is a cold path,
> so it is OK.
>
> It also removes some comments without adding corresponding ones back.
>
> Other changes from V1:
> The unneeded comment about the stability of wq->unbound_attrs
> is removed in [2/7].
>
> alloc_node_unbound_pwq() is renamed to get_node_unbound_pwq().
> @use_dfl_when_fail is removed from get_node_unbound_pwq().
> get_node_unbound_pwq() is shorter with the help of get_pwq_unlocked().
>
> apply_wqattrs_cleanup() is no longer protected by wq_pool_mutex.
>
> Some comments were revised as TJ suggested.
>
> Thanks,
> Lai
>
> Cc: Tejun Heo <[email protected]>
>
> Lai Jiangshan (7):
> workqueue: wq_pool_mutex protects the attrs-installation
> workqueue: simplify wq_update_unbound_numa()
> workqueue: introduce get_pwq_unlocked()
> workqueue: reuse the current per-node pwq when its attrs unchanged
> workqueue: reuse the current default pwq when its attrs unchanged
> workqueue: reuse wq_update_unbound_numa_attrs_buf as temporary attrs
> workqueue: add get_node_unbound_pwq()
>
> kernel/workqueue.c | 197 ++++++++++++++++++++++++++---------------------------
> 1 file changed, 95 insertions(+), 102 deletions(-)
>

2015-05-18 01:26:08

by Tejun Heo

Subject: Re: [PATCH 0/7 V2] workqueue: cleanup for attr management

On Mon, May 18, 2015 at 08:39:21AM +0800, Lai Jiangshan wrote:
> ping

Does this reflect the comments from the previous review cycle?

--
tejun

2015-05-18 02:03:26

by Lai Jiangshan

Subject: Re: [PATCH 0/7 V2] workqueue: cleanup for attr management

On 05/18/2015 09:26 AM, Tejun Heo wrote:
> On Mon, May 18, 2015 at 08:39:21AM +0800, Lai Jiangshan wrote:
>> ping
>
> Does this reflect the comments from the previous review cycle?
>


This is the V2 version of the patchset, but it only updates
patches 1 & 2 of the V1 patchset.

It doesn't contain the fix-up patch for wq_[nice|cpumask|numa]_store(),
so I can say it reflects all the comments except the name of the function
"get_node_unbound_pwq()" (the patchset was sent earlier than your reply).
(I wish I could get more comments before the next version.)

The fix-up patch for wq_[nice|cpumask|numa]_store() is important;
should I directly send a patchset for it (including patches 1 & 2 of this
V2 patchset), and delay or even drop the "get_alloc_node_unbound_pwq()"?

Thanks,
Lai.

2015-05-18 20:21:20

by Tejun Heo

Subject: Re: [PATCH 2/7 V2] workqueue: simplify wq_update_unbound_numa()

On Tue, May 12, 2015 at 08:32:30PM +0800, Lai Jiangshan wrote:
> wq_update_unbound_numa() is known to be called with wq_pool_mutex held.
>
> It currently acquires wq->mutex before reading wq->unbound_attrs,
> wq->numa_pwq_tbl[] and wq->dfl_pwq. But these fields can now be read
> with only wq_pool_mutex held, so we can simply remove the
> mutex_lock(&wq->mutex).
>
> Without the dependency on mutex_lock(&wq->mutex), the test of
> wq->unbound_attrs->no_numa can also be moved upward.
>
> The old code needed a long comment to describe the stability of
> @wq->unbound_attrs, which is now also guaranteed by wq_pool_mutex,
> so we no longer need that comment.
>
> Signed-off-by: Lai Jiangshan <[email protected]>

Applied 1-2 to wq/for-4.2.

Thanks.

--
tejun

2015-05-18 20:28:09

by Tejun Heo

Subject: Re: [PATCH 3/7 V2] workqueue: introduce get_pwq_unlocked()

Hello, Lai.

On Tue, May 12, 2015 at 08:32:31PM +0800, Lai Jiangshan wrote:
> attrs management code may reuse existed pwq and it has open code

existing

> to do "lock();get_pwq();unlock()", we move this open code into
> get_pwq_unlocked().
>
> get_pwq_unlocked() will also be used in later patches to allow
> apply_wqattrs_prepare() to resue the original default or per-node pwq.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
...
> +install:
> /* Install the new pwq. */
> mutex_lock(&wq->mutex);
> old_pwq = numa_pwq_tbl_install(wq, node, pwq);
> - goto out_unlock;
> -
> -use_dfl_pwq:
> - mutex_lock(&wq->mutex);
> - spin_lock_irq(&wq->dfl_pwq->pool->lock);
> - get_pwq(wq->dfl_pwq);
> - spin_unlock_irq(&wq->dfl_pwq->pool->lock);
> - old_pwq = numa_pwq_tbl_install(wq, node, wq->dfl_pwq);
> -out_unlock:
> mutex_unlock(&wq->mutex);
> put_pwq_unlocked(old_pwq);
> }

This doesn't strike me as better. If we have other places where the
unlocked get is used, maybe, but that's not the case.

Thanks.

--
tejun

2015-05-18 20:34:36

by Tejun Heo

Subject: Re: [PATCH 4/7 V2] workqueue: reuse the current per-node pwq when its attrs unchanged

On Tue, May 12, 2015 at 08:32:32PM +0800, Lai Jiangshan wrote:
> If the cpuamsk is changed, it is possible that only a part of the per-node

cpumask

> pwq is affected. This can happen when the user changes the cpumask of
> a workqueue or the low level cpumask.
>
> So we try to reuse the current per-node pwq when its attrs unchanged.

are unchanged.

> @@ -3592,9 +3593,14 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
>
> for_each_node(node) {
> if (wq_calc_node_cpumask(new_attrs, node, -1, tmp_attrs->cpumask)) {
> - ctx->pwq_tbl[node] = alloc_unbound_pwq(wq, tmp_attrs);
> - if (!ctx->pwq_tbl[node])
> + pwq = unbound_pwq_by_node(wq, node);
> + if (pwq && wqattrs_equal(tmp_attrs, pwq->pool->attrs))
> + pwq = get_pwq_unlocked(pwq);

Ah, okay, the function gets used here again. BTW, why does this
function return anything? Can this function ever return something
which isn't the pwq it was called with?

> + else
> + pwq = alloc_unbound_pwq(wq, tmp_attrs);
> + if (!pwq)
> goto out_free;

If get_pwq_unlocked() can't fail, why are we testing for NULL pwq
here? This code is kinda misleading.

> + ctx->pwq_tbl[node] = pwq;
> } else {
> ctx->dfl_pwq->refcnt++;
> ctx->pwq_tbl[node] = ctx->dfl_pwq;
> @@ -3739,7 +3745,6 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
> cpumask = target_attrs->cpumask;
>
> copy_workqueue_attrs(target_attrs, wq->unbound_attrs);
> - pwq = unbound_pwq_by_node(wq, node);
>
> /*
> * Let's determine what needs to be done. If the target cpumask is
> @@ -3748,6 +3753,7 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
> * equals the default pwq's, the default pwq should be used.
> */
> if (wq_calc_node_cpumask(wq->dfl_pwq->pool->attrs, node, cpu_off, cpumask)) {
> + pwq = unbound_pwq_by_node(wq, node);

It'd be nice to note this change in the patch description.

Thanks.

--
tejun

2015-05-18 20:38:18

by Tejun Heo

Subject: Re: [PATCH 5/7 V2] workqueue: reuse the current default pwq when its attrs unchanged

Hello,

On Tue, May 12, 2015 at 08:32:33PM +0800, Lai Jiangshan wrote:
> When apply_wqattrs_prepare() is called, it is possible that the default
> pwq is unaffected. This is always the case when only the NUMA affinity is
> being changed, and sometimes the case when the low level cpumask is being
> changed.
>
> So we try to reuse the current default pwq when its attrs unchanged.
>
> After this change, "ctx->dfl_pwq->refcnt++" could be dangerous
> when ctx->dfl_pwq is being reused, so we use get_pwq_unlocked() instead.

Can you elaborate the above paragraph? Why wouldn't that be dangerous
before this change?

Thanks.

--
tejun

2015-05-18 20:41:18

by Tejun Heo

Subject: Re: [PATCH 3/7 V2] workqueue: introduce get_pwq_unlocked()

Hello,

So, it looks like we'll need get_pwq_unlocked(). Reviews below.

On Tue, May 12, 2015 at 08:32:31PM +0800, Lai Jiangshan wrote:
> attrs management code may reuse existed pwq and it has open code
> to do "lock();get_pwq();unlock()", we move this open code into
> get_pwq_unlocked().
>
> get_pwq_unlocked() will also be used in later patches to allow
> apply_wqattrs_prepare() to resue the original default or per-node pwq.

reuse

> /**
> + * get_pwq_unlocked - get_pwq() with surrounding pool lock/unlock
> + * @pwq: pool_workqueue to get (should not be %NULL)
> + *
> + * get_pwq() with locking. The caller should have at least an owned
> + * reference on @pwq to match the guarantees required by get_pwq().
> + *
> + * Returns @pwq itself to allow chained expressions.
> + */
> +static struct pool_workqueue *get_pwq_unlocked(struct pool_workqueue *pwq)
> +{
> + spin_lock_irq(&pwq->pool->lock);
> + get_pwq(pwq);
> + spin_unlock_irq(&pwq->pool->lock);
> +
> + return pwq;
> +}

As I mentioned before, please drop the return value. It may be
tempting to do this to match the pattern with alloc and whatnot but
these things end badly in the long term. Please stick to what's
necessary. The function can't fail. Make its return type void.
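
IOW, something like this (untested sketch):

	static void get_pwq_unlocked(struct pool_workqueue *pwq)
	{
		spin_lock_irq(&pwq->pool->lock);
		get_pwq(pwq);
		spin_unlock_irq(&pwq->pool->lock);
	}

with the callers keeping the pwq pointer they already have instead of
consuming the return value.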

Thanks.

--
tejun

2015-05-25 09:44:25

by Lai Jiangshan

Subject: Re: [PATCH 4/7 V2] workqueue: reuse the current per-node pwq when its attrs unchanged

Hi, TJ

Patches 4/5/6 do reduce CPU and temporary-memory usage sometimes,
but they are in a slow path, where small optimizations are commonly
unwelcome.

Do I need to rework the patches? I doubt it is necessary.

Thanks,
Lai

2015-05-26 18:52:07

by Tejun Heo

Subject: Re: [PATCH 4/7 V2] workqueue: reuse the current per-node pwq when its attrs unchanged

Hello, Lai.

On Mon, May 25, 2015 at 05:47:17PM +0800, Lai Jiangshan wrote:
> Patches 4/5/6 do reduce CPU and temporary-memory usage sometimes,
> but they are in a slow path, where small optimizations are commonly
> unwelcome.
>
> Do I need to rework the patches? I doubt it is necessary.

I like avoiding creating identical pwqs at least. For the rest, I'm
okay either way as long as code is not made more complex in the
process.

Thanks.

--
tejun