2014-10-08 03:51:03

by Lai Jiangshan

Subject: [PATCH 0/3] workqueue: unbound workqueue management Vs hotplug

Hi, TJ

These patches are for unbound workqueue management (hotplug).

This patchset simplifies unbound workqueue management across CPU hotplug.
It is also a preparation for later unbound workqueue management patches.

Thanks,
Lai.

Lai Jiangshan (3):
workqueue: add wq_unbound_online_cpumask
workqueue: extend wq_pool_mutex to also protect pwq-installation
workqueue: remove get_online_cpus() from apply_workqueue_attrs()

kernel/workqueue.c | 61 ++++++++++++++++++++-------------------------------
1 files changed, 24 insertions(+), 37 deletions(-)

--
1.7.4.4


2014-10-08 03:51:07

by Lai Jiangshan

Subject: [PATCH 3/3] workqueue: remove get_online_cpus() from apply_workqueue_attrs()

get_online_cpus() serves two aims here:
1) It protects cpumask_of_node(node). (CPUs should stay stable.)
2) It protects the pwq allocation and installation.

Both aims are now covered by other means after the previous patches:
cpumask_of_node(node) has been replaced by wq_unbound_online_cpumask, and
the pwq allocation and installation are now protected by wq_pool_mutex.
get_online_cpus() thus has no reason to exist anymore; remove it.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 15 ++-------------
1 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9bc3a87..63a8000 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3776,13 +3776,6 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
*/
copy_workqueue_attrs(tmp_attrs, new_attrs);

- /*
- * CPUs should stay stable across pwq creations and installations.
- * Pin CPUs, determine the target cpumask for each node and create
- * pwqs accordingly.
- */
- get_online_cpus();
-
mutex_lock(&wq_pool_mutex);

/*
@@ -3827,7 +3820,6 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,

mutex_unlock(&wq_pool_mutex);

- put_online_cpus();
ret = 0;
/* fall through */
out_free:
@@ -3842,7 +3834,6 @@ enomem_pwq:
if (pwq_tbl && pwq_tbl[node] != dfl_pwq)
free_unbound_pwq(pwq_tbl[node]);
mutex_unlock(&wq_pool_mutex);
- put_online_cpus();
enomem:
ret = -ENOMEM;
goto out_free;
@@ -3921,10 +3912,8 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu)
}

/*
- * Install the new pwq. As this function is called only from CPU
- * hotplug callbacks and applying a new attrs is wrapped with
- * get/put_online_cpus(), @wq->unbound_attrs couldn't have changed
- * inbetween.
+ * Install the new pwq. As this function is called with wq_pool_mutex
+ * held, @wq->unbound_attrs couldn't have changed inbetween.
*/
mutex_lock(&wq->mutex);
old_pwq = numa_pwq_tbl_install(wq, node, pwq);
--
1.7.4.4

2014-10-08 03:51:38

by Lai Jiangshan

Subject: [PATCH 1/3] workqueue: add wq_unbound_online_cpumask

The current wq_calc_node_cpumask() is complicated by cpumask_of_node(node),
whose value needs to be revised before use; the revising requires
@cpu_going_down, which complicates things further.

This patch introduces wq_unbound_online_cpumask, which is updated in the
CPU-hotplug callbacks before wq_update_unbound_numa() is called, so that
wq_calc_node_cpumask() can use it instead of cpumask_of_node(node). Thus
wq_calc_node_cpumask() becomes much simpler and @cpu_going_down is gone.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 42 ++++++++++++++++++++----------------------
1 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 5dbe22a..7a217f0 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -282,6 +282,9 @@ module_param_named(power_efficient, wq_power_efficient, bool, 0444);

static bool wq_numa_enabled; /* unbound NUMA affinity enabled */

+/* PL: online cpumask for all unbound wqs */
+static struct cpumask wq_unbound_online_cpumask;
+
/* buf for wq_update_unbound_numa_attrs(), protected by CPU hotplug exclusion */
static struct workqueue_attrs *wq_update_unbound_numa_attrs_buf;

@@ -3675,12 +3678,9 @@ static void free_unbound_pwq(struct pool_workqueue *pwq)
* wq_calc_node_mask - calculate a wq_attrs' cpumask for the specified node
* @attrs: the wq_attrs of interest
* @node: the target NUMA node
- * @cpu_going_down: if >= 0, the CPU to consider as offline
* @cpumask: outarg, the resulting cpumask
*
- * Calculate the cpumask a workqueue with @attrs should use on @node. If
- * @cpu_going_down is >= 0, that cpu is considered offline during
- * calculation. The result is stored in @cpumask.
+ * Calculate the cpumask a workqueue with @attrs should use on @node.
*
* If NUMA affinity is not enabled, @attrs->cpumask is always used. If
* enabled and @node has online CPUs requested by @attrs, the returned
@@ -3694,22 +3694,17 @@ static void free_unbound_pwq(struct pool_workqueue *pwq)
* %false if equal.
*/
static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
- int cpu_going_down, cpumask_t *cpumask)
+ struct cpumask *cpumask)
{
if (!wq_numa_enabled || attrs->no_numa)
goto use_dfl;

/* does @node have any online CPUs @attrs wants? */
- cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask);
- if (cpu_going_down >= 0)
- cpumask_clear_cpu(cpu_going_down, cpumask);
-
- if (cpumask_empty(cpumask))
- goto use_dfl;
-
- /* yeap, return possible CPUs in @node that @attrs wants */
cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
- return !cpumask_equal(cpumask, attrs->cpumask);
+ if (cpumask_intersects(cpumask, &wq_unbound_online_cpumask)) {
+ /* yeap, return possible CPUs in @node that @attrs wants */
+ return !cpumask_equal(cpumask, attrs->cpumask);
+ }

use_dfl:
cpumask_copy(cpumask, attrs->cpumask);
@@ -3800,7 +3795,7 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
goto enomem_pwq;

for_each_node(node) {
- if (wq_calc_node_cpumask(attrs, node, -1, tmp_attrs->cpumask)) {
+ if (wq_calc_node_cpumask(attrs, node, tmp_attrs->cpumask)) {
pwq_tbl[node] = alloc_unbound_pwq(wq, tmp_attrs);
if (!pwq_tbl[node])
goto enomem_pwq;
@@ -3857,7 +3852,6 @@ enomem:
* wq_update_unbound_numa - update NUMA affinity of a wq for CPU hot[un]plug
* @wq: the target workqueue
* @cpu: the CPU coming up or going down
- * @online: whether @cpu is coming up or going down
*
* This function is to be called from %CPU_DOWN_PREPARE, %CPU_ONLINE and
* %CPU_DOWN_FAILED. @cpu is being hot[un]plugged, update NUMA affinity of
@@ -3875,11 +3869,9 @@ enomem:
* affinity, it's the user's responsibility to flush the work item from
* CPU_DOWN_PREPARE.
*/
-static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
- bool online)
+static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu)
{
int node = cpu_to_node(cpu);
- int cpu_off = online ? -1 : cpu;
struct pool_workqueue *old_pwq = NULL, *pwq;
struct workqueue_attrs *target_attrs;
cpumask_t *cpumask;
@@ -3910,7 +3902,7 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
* a new one if they don't match. If the target cpumask equals
* wq's, the default pwq should be used.
*/
- if (wq_calc_node_cpumask(wq->unbound_attrs, node, cpu_off, cpumask)) {
+ if (wq_calc_node_cpumask(wq->unbound_attrs, node, cpumask)) {
if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask))
goto out_unlock;
} else {
@@ -4583,9 +4575,11 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb,
mutex_unlock(&pool->attach_mutex);
}

+ cpumask_set_cpu(cpu, &wq_unbound_online_cpumask);
+
/* update NUMA affinity of unbound workqueues */
list_for_each_entry(wq, &workqueues, list)
- wq_update_unbound_numa(wq, cpu, true);
+ wq_update_unbound_numa(wq, cpu);

mutex_unlock(&wq_pool_mutex);
break;
@@ -4611,10 +4605,12 @@ static int workqueue_cpu_down_callback(struct notifier_block *nfb,
INIT_WORK_ONSTACK(&unbind_work, wq_unbind_fn);
queue_work_on(cpu, system_highpri_wq, &unbind_work);

+ cpumask_clear_cpu(cpu, &wq_unbound_online_cpumask);
+
/* update NUMA affinity of unbound workqueues */
mutex_lock(&wq_pool_mutex);
list_for_each_entry(wq, &workqueues, list)
- wq_update_unbound_numa(wq, cpu, false);
+ wq_update_unbound_numa(wq, cpu);
mutex_unlock(&wq_pool_mutex);

/* wait for per-cpu unbinding to finish */
@@ -4828,6 +4824,8 @@ static int __init init_workqueues(void)

pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC);

+ cpumask_copy(&wq_unbound_online_cpumask, cpu_online_mask);
+
cpu_notifier(workqueue_cpu_up_callback, CPU_PRI_WORKQUEUE_UP);
hotcpu_notifier(workqueue_cpu_down_callback, CPU_PRI_WORKQUEUE_DOWN);

--
1.7.4.4

2014-10-08 03:51:36

by Lai Jiangshan

Subject: [PATCH 2/3] workqueue: extend wq_pool_mutex to also protect pwq-installation

Although installing pwqs without wq_pool_mutex held is not a bug, it is
not a good design; it is better to put the pwq allocation and installation
under the same wq_pool_mutex critical section.

With the allocation and installation in the same wq_pool_mutex section,
get_online_cpus() is no longer needed for this purpose, and it will be
removed in a later patch.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7a217f0..9bc3a87 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3805,8 +3805,6 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
}
}

- mutex_unlock(&wq_pool_mutex);
-
/* all pwqs have been created successfully, let's install'em */
mutex_lock(&wq->mutex);

@@ -3827,6 +3825,8 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
put_pwq_unlocked(pwq_tbl[node]);
put_pwq_unlocked(dfl_pwq);

+ mutex_unlock(&wq_pool_mutex);
+
put_online_cpus();
ret = 0;
/* fall through */
--
1.7.4.4

2014-10-27 09:24:19

by Lai Jiangshan

Subject: Re: [PATCH 0/3] workqueue: unbound workqueue management Vs hotplug

ping

On 10/08/2014 11:53 AM, Lai Jiangshan wrote:
> Hi, TJ
>
> These patches are for unbound workqueue management (hotplug).
>
> This patchset simplify the unbound workqueue management when hotplug.
> This is also a preparation patchset for later unbound workqueue management
> patches.
>
> Thanks,
> Lai.
>
> Lai Jiangshan (3):
> workqueue: add wq_unbound_online_cpumask
> workqueue: extend wq_pool_mutex to also protect pwq-installation
> workqueue: remove get_online_cpus() from apply_workqueue_attrs()
>
> kernel/workqueue.c | 61 ++++++++++++++++++++-------------------------------
> 1 files changed, 24 insertions(+), 37 deletions(-)
>

2014-10-27 13:15:53

by Tejun Heo

Subject: Re: [PATCH 1/3] workqueue: add wq_unbound_online_cpumask

On Wed, Oct 08, 2014 at 11:53:31AM +0800, Lai Jiangshan wrote:
> Current wq_calc_node_cpumask() is complicated by cpumask_of_node(node) whose
> value need to be revised before using and the "revising" needs @cpu_going_down
> which makes more complicated.
>
> This patch introduces wq_unbound_online_cpumask which is updated before
> wq_update_unbound_numa() in the cpu-hotplug callbacks and wq_calc_node_cpumask()
> can use it instead of cpumask_of_node(node). Thus wq_calc_node_cpumask()
> becomes much simpler and @cpu_going_down is gone.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
> kernel/workqueue.c | 42 ++++++++++++++++++++----------------------
> 1 files changed, 20 insertions(+), 22 deletions(-)

"much simpler" seems a bit overblown for 2 LOC reduction.

> +/* PL: online cpumask for all unbound wqs */
> +static struct cpumask wq_unbound_online_cpumask;

And now someone who's reading the code has to wonder "why is wq
maintaining a separate copy of cpumask?" and from the code itself it
isn't clear at all. I don't necessarily dislike the patch and it does
make the code a bit simpler but at the cost of higher obscurity.
You'll at least need to add more comments explaining why the separate
cpumask is necessary and how it's used. If there are more
simplifications which can build atop, it'd be fine; otherwise, this is
firmly in the 'meh...' territory.

Thanks.

--
tejun

2014-10-27 13:20:54

by Tejun Heo

Subject: Re: [PATCH 2/3] workqueue: extend wq_pool_mutex to also protect pwq-installation

Hello, Lai.

On Wed, Oct 08, 2014 at 11:53:32AM +0800, Lai Jiangshan wrote:
> Athough pwq-installation without wq_pool_mutex held is not bug,
> but it is not good design, it is better to make the pwq-allocation and installation
> are in the (same) wq_pool_mutex.

WHY? Why is that not a good design and why is extending the locking a
better thing to do? Can you elaborate the reasoning here?

> And since the pwq-allocation and installation are in the same wq_pool_mutex,
> get_online_cpus() will not be needed for this reason, and it will be remove
> in later patch.

So, if this enables further cleanup, it's fine, but please just stick
to those reasons. You do this a lot. You wanna do A because of B but
the changelog often doesn't mention that at all and just goes "This is
bad so change this to be better" without any proper reasoning. If you
want to do A to achieve B later, just say so.

Thanks.

--
tejun

2014-10-27 13:41:25

by Tejun Heo

Subject: Re: [PATCH 3/3] workqueue: remove get_online_cpus() from apply_workqueue_attrs()

On Wed, Oct 08, 2014 at 11:53:33AM +0800, Lai Jiangshan wrote:
> There are two aims for get_online_cpus():
> 1) Protects cpumask_of_node(node). (CPUs should stay stable)
> 2) Protects the pwq-allocation and installation
>
> But both aims are settled by other methods in previous patches:
> cpumask_of_node(node) is replaced by wq_unbound_online_cpumask, and
> the pwq-allocation and installation are changed to be protected by
> wq_pool_mutex. Now the get_online_cpus() is no reason to exist,
> remove it!
>
> Signed-off-by: Lai Jiangshan <[email protected]>

Well, this is very marginal and we want to add comments explaining why
this is safe, but yeah I guess it's an improvement. Can you please
add comments explaining what states are depended upon and that they
are all protected by pool mutex?

Thanks.

--
tejun