2015-06-03 15:01:52

by Lai Jiangshan

Subject: [PATCH 0/4 V3] workqueue: avoid creating identical pwqs

Hi,

In some conditions, the default pwq or a per-node pwq may be rebuilt
identically, which causes needless overhead. This patchset checks the attrs
and avoids rebuilding in that case.

Patch 1 adds get_pwq_unlocked(), needed to reuse an old pwq in more cases
Patches 2-3 avoid creating identical pwqs
Patch 4 reuses wq_update_unbound_numa_attrs_buf, a simple cleanup

Changes from V2:
o get_pwq_unlocked() returns void
o move the check of the return value of alloc_unbound_pwq() into its branch
o wq_update_unbound_numa_attrs_buf is renamed
o kill get_alloc_node_unbound_pwq(); it may or may not be reimplemented in the future

Cc: Tejun Heo <[email protected]>

Lai Jiangshan (4):
workqueue: introduce get_pwq_unlocked()
workqueue: reuse the current per-node pwq when its attrs are unchanged
workqueue: reuse the current default pwq when its attrs unchanged
workqueue: reuse wq_update_unbound_numa_attrs_buf as temporary attrs

kernel/workqueue.c | 94 ++++++++++++++++++++++++++++++++----------------------
1 file changed, 56 insertions(+), 38 deletions(-)

--
2.1.0


2015-06-03 15:02:34

by Lai Jiangshan

Subject: [PATCH 1/4] workqueue: introduce get_pwq_unlocked()

attrs management code may reuse existed pwq and it has open code
to do "lock();get_pwq();unlock()", we move this open code into
get_pwq_unlocked().

get_pwq_unlocked() will also be used in later patches to allow
apply_wqattrs_prepare() to resue the original default or per-node pwq.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 31 ++++++++++++++++++++-----------
1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index afe7c53..6aa9bd5 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1065,6 +1065,20 @@ static void put_pwq(struct pool_workqueue *pwq)
}

/**
+ * get_pwq_unlocked - get_pwq() with surrounding pool lock/unlock
+ * @pwq: pool_workqueue to get (should not %NULL)
+ *
+ * get_pwq() with locking. The caller should have at least an owned
+ * reference on @pwq to match the guarantees required by get_pwq().
+ */
+static void get_pwq_unlocked(struct pool_workqueue *pwq)
+{
+ spin_lock_irq(&pwq->pool->lock);
+ get_pwq(pwq);
+ spin_unlock_irq(&pwq->pool->lock);
+}
+
+/**
* put_pwq_unlocked - put_pwq() with surrounding pool lock/unlock
* @pwq: pool_workqueue to put (can be %NULL)
*
@@ -3715,7 +3729,9 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask))
return;
} else {
- goto use_dfl_pwq;
+ pwq = wq->dfl_pwq;
+ get_pwq_unlocked(pwq);
+ goto install;
}

/* create a new pwq */
@@ -3723,21 +3739,14 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
if (!pwq) {
pr_warn("workqueue: allocation failed while updating NUMA affinity of \"%s\"\n",
wq->name);
- goto use_dfl_pwq;
+ pwq = wq->dfl_pwq;
+ get_pwq_unlocked(pwq);
}

+install:
/* Install the new pwq. */
mutex_lock(&wq->mutex);
old_pwq = numa_pwq_tbl_install(wq, node, pwq);
- goto out_unlock;
-
-use_dfl_pwq:
- mutex_lock(&wq->mutex);
- spin_lock_irq(&wq->dfl_pwq->pool->lock);
- get_pwq(wq->dfl_pwq);
- spin_unlock_irq(&wq->dfl_pwq->pool->lock);
- old_pwq = numa_pwq_tbl_install(wq, node, wq->dfl_pwq);
-out_unlock:
mutex_unlock(&wq->mutex);
put_pwq_unlocked(old_pwq);
}
--
2.1.0

2015-06-03 15:02:26

by Lai Jiangshan

Subject: [PATCH 2/4] workqueue: reuse the current per-node pwq when its attrs are unchanged

If the cpumask is changed, it is possible that only a part of the per-node
pwq is affected. This can happen when the user changes the cpumask of
a workqueue or the low level cpumask.

So we try to reuse the current per-node pwq when its attrs are unchanged.

In wq_update_unbound_numa(), we had already made the current pwq be
reused when its attrs are unaffected, but we move the code of fetching
current pwq closer to the code of testing.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6aa9bd5..197520b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3516,6 +3516,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
{
struct apply_wqattrs_ctx *ctx;
struct workqueue_attrs *new_attrs, *tmp_attrs;
+ struct pool_workqueue *pwq;
int node;

lockdep_assert_held(&wq_pool_mutex);
@@ -3556,9 +3557,16 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,

for_each_node(node) {
if (wq_calc_node_cpumask(new_attrs, node, -1, tmp_attrs->cpumask)) {
- ctx->pwq_tbl[node] = alloc_unbound_pwq(wq, tmp_attrs);
- if (!ctx->pwq_tbl[node])
- goto out_free;
+ /* Try to reuse the current one */
+ pwq = unbound_pwq_by_node(wq, node);
+ if (pwq && wqattrs_equal(tmp_attrs, pwq->pool->attrs)) {
+ get_pwq_unlocked(pwq);
+ } else {
+ pwq = alloc_unbound_pwq(wq, tmp_attrs);
+ if (!pwq)
+ goto out_free;
+ }
+ ctx->pwq_tbl[node] = pwq;
} else {
ctx->dfl_pwq->refcnt++;
ctx->pwq_tbl[node] = ctx->dfl_pwq;
@@ -3717,7 +3725,6 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
cpumask = target_attrs->cpumask;

copy_workqueue_attrs(target_attrs, wq->unbound_attrs);
- pwq = unbound_pwq_by_node(wq, node);

/*
* Let's determine what needs to be done. If the target cpumask is
@@ -3726,6 +3733,7 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
* equals the default pwq's, the default pwq should be used.
*/
if (wq_calc_node_cpumask(wq->dfl_pwq->pool->attrs, node, cpu_off, cpumask)) {
+ pwq = unbound_pwq_by_node(wq, node);
if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask))
return;
} else {
--
2.1.0

2015-06-03 15:02:06

by Lai Jiangshan

Subject: [PATCH 3/4] workqueue: reuse the current default pwq when its attrs unchanged

When apply_wqattrs_prepare() is called, it is possible that the default
pwq is unaffected. This is always the case when only the NUMA affinity
is being changed, and sometimes the case when the low level cpumask is
being changed.

So we try to reuse the current default pwq when its attrs are unchanged.

After this change, a bare "ctx->dfl_pwq->refcnt++" would be dangerous:
ctx->dfl_pwq may now be a reused pwq which is still receiving or
processing work items, so its refcount races with concurrent
[get|put]_pwq(). We use get_pwq_unlocked() instead.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 197520b..0c2f819 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3549,11 +3549,17 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
/*
* If something goes wrong during CPU up/down, we'll fall back to
* the default pwq covering whole @attrs->cpumask. Always create
- * it even if we don't use it immediately.
+ * it even if we don't use it immediately. Check and reuse the
+ * current default pwq if the @new_attrs equals the current one.
*/
- ctx->dfl_pwq = alloc_unbound_pwq(wq, new_attrs);
- if (!ctx->dfl_pwq)
- goto out_free;
+ if (wq->dfl_pwq && wqattrs_equal(new_attrs, wq->dfl_pwq->pool->attrs)) {
+ get_pwq_unlocked(wq->dfl_pwq);
+ ctx->dfl_pwq = wq->dfl_pwq;
+ } else {
+ ctx->dfl_pwq = alloc_unbound_pwq(wq, new_attrs);
+ if (!ctx->dfl_pwq)
+ goto out_free;
+ }

for_each_node(node) {
if (wq_calc_node_cpumask(new_attrs, node, -1, tmp_attrs->cpumask)) {
@@ -3568,7 +3574,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
}
ctx->pwq_tbl[node] = pwq;
} else {
- ctx->dfl_pwq->refcnt++;
+ get_pwq_unlocked(ctx->dfl_pwq);
ctx->pwq_tbl[node] = ctx->dfl_pwq;
}
}
--
2.1.0

2015-06-03 15:02:41

by Lai Jiangshan

Subject: [PATCH 4/4] workqueue: reuse wq_update_unbound_numa_attrs_buf as temporary attrs

tmp_attrs in apply_wqattrs_prepare() is just temporary attrs, we can use
wq_update_unbound_numa_attrs_buf for it like wq_update_unbound_numa();

The wq_update_unbound_numa_attrs_buf is renamed to wq_calc_node_attrs_buf
since it is used both for apply_wqattrs_prepare() and wq_update_unbound_numa().

The comment for using wq_calc_node_attrs_buf in wq_update_unbound_numa()
is also moved to the defination of the wq_calc_node_attrs_buf.

This change also avoids frequently alloc/free the tmp_attrs for every
workqueue when the low level cpumask is being updated.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 31 +++++++++++++------------------
1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0c2f819..4a40165 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -295,8 +295,13 @@ module_param_named(power_efficient, wq_power_efficient, bool, 0444);

static bool wq_numa_enabled; /* unbound NUMA affinity enabled */

-/* buf for wq_update_unbound_numa_attrs(), protected by CPU hotplug exclusion */
-static struct workqueue_attrs *wq_update_unbound_numa_attrs_buf;
+/*
+ * PL: resulted attrs of wq_calc_node_cpumask() for apply_wqattrs_prepare()
+ * and wq_update_unbound_numa().
+ * We don't wanna alloc/free temporary attrs for each call. Let's preallocate
+ * one with the access protection of wq_pool_mutex.
+ */
+static struct workqueue_attrs *wq_calc_node_attrs_buf;

static DEFINE_MUTEX(wq_pool_mutex); /* protects pools and workqueues list */
static DEFINE_SPINLOCK(wq_mayday_lock); /* protects wq->maydays list */
@@ -3515,7 +3520,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
const struct workqueue_attrs *attrs)
{
struct apply_wqattrs_ctx *ctx;
- struct workqueue_attrs *new_attrs, *tmp_attrs;
+ struct workqueue_attrs *new_attrs, *tmp_attrs = wq_calc_node_attrs_buf;
struct pool_workqueue *pwq;
int node;

@@ -3525,8 +3530,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
GFP_KERNEL);

new_attrs = alloc_workqueue_attrs(GFP_KERNEL);
- tmp_attrs = alloc_workqueue_attrs(GFP_KERNEL);
- if (!ctx || !new_attrs || !tmp_attrs)
+ if (!ctx || !new_attrs)
goto out_free;

/*
@@ -3585,11 +3589,9 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
ctx->attrs = new_attrs;

ctx->wq = wq;
- free_workqueue_attrs(tmp_attrs);
return ctx;

out_free:
- free_workqueue_attrs(tmp_attrs);
free_workqueue_attrs(new_attrs);
apply_wqattrs_cleanup(ctx);
return NULL;
@@ -3713,7 +3715,7 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
int node = cpu_to_node(cpu);
int cpu_off = online ? -1 : cpu;
struct pool_workqueue *old_pwq = NULL, *pwq;
- struct workqueue_attrs *target_attrs;
+ struct workqueue_attrs *target_attrs = wq_calc_node_attrs_buf;
cpumask_t *cpumask;

lockdep_assert_held(&wq_pool_mutex);
@@ -3722,14 +3724,6 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
wq->unbound_attrs->no_numa)
return;

- /*
- * We don't wanna alloc/free wq_attrs for each wq for each CPU.
- * Let's use a preallocated one. The following buf is protected by
- * CPU hotplug exclusion.
- */
- target_attrs = wq_update_unbound_numa_attrs_buf;
- cpumask = target_attrs->cpumask;
-
copy_workqueue_attrs(target_attrs, wq->unbound_attrs);

/*
@@ -3738,6 +3732,7 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
* and create a new one if they don't match. If the target cpumask
* equals the default pwq's, the default pwq should be used.
*/
+ cpumask = target_attrs->cpumask;
if (wq_calc_node_cpumask(wq->dfl_pwq->pool->attrs, node, cpu_off, cpumask)) {
pwq = unbound_pwq_by_node(wq, node);
if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask))
@@ -5208,8 +5203,8 @@ static void __init wq_numa_init(void)
return;
}

- wq_update_unbound_numa_attrs_buf = alloc_workqueue_attrs(GFP_KERNEL);
- BUG_ON(!wq_update_unbound_numa_attrs_buf);
+ wq_calc_node_attrs_buf = alloc_workqueue_attrs(GFP_KERNEL);
+ BUG_ON(!wq_calc_node_attrs_buf);

/*
* We want masks of possible CPUs of each node which isn't readily
--
2.1.0

2015-06-04 01:05:08

by Tejun Heo

Subject: Re: [PATCH 1/4] workqueue: introduce get_pwq_unlocked()

Hello,

On Wed, Jun 03, 2015 at 10:29:49PM +0800, Lai Jiangshan wrote:
> attrs management code may reuse existed pwq and it has open code

existing

> to do "lock();get_pwq();unlock()", we move this open code into
> get_pwq_unlocked().
>
> get_pwq_unlocked() will also be used in later patches to allow
> apply_wqattrs_prepare() to resue the original default or per-node pwq.

reuse

> /**
> + * get_pwq_unlocked - get_pwq() with surrounding pool lock/unlock
> + * @pwq: pool_workqueue to get (should not %NULL)

should not be %NULL

> + *
> + * get_pwq() with locking. The caller should have at least an owned
> + * reference on @pwq to match the guarantees required by get_pwq().

The caller should have a reference on @pwq.

Thanks.

--
tejun

2015-06-04 01:06:00

by Tejun Heo

Subject: Re: [PATCH 2/4] workqueue: reuse the current per-node pwq when its attrs are unchanged

On Wed, Jun 03, 2015 at 10:29:50PM +0800, Lai Jiangshan wrote:
> If the cpumask is changed, it is possible that only a part of the per-node
> pwq is affected. This can happen when the user changes the cpumask of
> a workqueue or the low level cpumask.
>
> So we try to reuse the current per-node pwq when its attrs are unchanged.
>
> In wq_update_unbound_numa(), we had already made the current pwq be
> reused when its attrs are unaffected, but we move the code of fetching
> current pwq closer to the code of testing.
>
> Signed-off-by: Lai Jiangshan <[email protected]>

This one looks good to me.

Thanks.

--
tejun

2015-06-04 01:09:14

by Tejun Heo

Subject: Re: [PATCH 4/4] workqueue: reuse wq_update_unbound_numa_attrs_buf as temporary attrs

Hello,

On Wed, Jun 03, 2015 at 10:29:52PM +0800, Lai Jiangshan wrote:
> tmp_attrs in apply_wqattrs_prepare() is just temporary attrs, we can use
> wq_update_unbound_numa_attrs_buf for it like wq_update_unbound_numa();
>
> The wq_update_unbound_numa_attrs_buf is renamed to wq_calc_node_attrs_buf
> since it is used both for apply_wqattrs_prepare() and wq_update_unbound_numa().
>
> The comment for using wq_calc_node_attrs_buf in wq_update_unbound_numa()
> is also moved to the defination of the wq_calc_node_attrs_buf.
>
> This change also avoids frequently alloc/free the tmp_attrs for every
> workqueue when the low level cpumask is being updated.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
> kernel/workqueue.c | 31 +++++++++++++------------------
> 1 file changed, 13 insertions(+), 18 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 0c2f819..4a40165 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -295,8 +295,13 @@ module_param_named(power_efficient, wq_power_efficient, bool, 0444);
>
> static bool wq_numa_enabled; /* unbound NUMA affinity enabled */
>
> -/* buf for wq_update_unbound_numa_attrs(), protected by CPU hotplug exclusion */
> -static struct workqueue_attrs *wq_update_unbound_numa_attrs_buf;
> +/*
> + * PL: resulted attrs of wq_calc_node_cpumask() for apply_wqattrs_prepare()
> + * and wq_update_unbound_numa().
> + * We don't wanna alloc/free temporary attrs for each call. Let's preallocate
> + * one with the access protection of wq_pool_mutex.
> + */
> +static struct workqueue_attrs *wq_calc_node_attrs_buf;
>
> static DEFINE_MUTEX(wq_pool_mutex); /* protects pools and workqueues list */
> static DEFINE_SPINLOCK(wq_mayday_lock); /* protects wq->maydays list */
> @@ -3515,7 +3520,7 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
> const struct workqueue_attrs *attrs)
> {
> struct apply_wqattrs_ctx *ctx;
> - struct workqueue_attrs *new_attrs, *tmp_attrs;
> + struct workqueue_attrs *new_attrs, *tmp_attrs = wq_calc_node_attrs_buf;

Hmm... this actually bothers me a bit. Can we please do something
like the following?

static struct workqueue_attrs *wq_attrs_shared_buf(void)
{
lockdep_assert_held(...);
return __wq_attrs_shared_buf;
}

--
tejun