2013-04-04 02:06:08

by Lai Jiangshan

Subject: [PATCH 1/7] workqueue: add __WQ_FREEZING and remove POOL_FREEZING

Freezing has nothing to do with pools, but POOL_FREEZING ties the two
together and makes freeze_workqueues_begin() and thaw_workqueues() more
complicated than necessary.

Since freezing is an attribute of the workqueue instance, introduce
__WQ_FREEZING in wq->flags instead and remove POOL_FREEZING.
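
For context, a minimal user-space model of the post-patch freeze/thaw flow;
the types, flag value, and adjust_max_active() helper below are simplified
stand-ins for illustration, not the kernel code:

#include <stdio.h>

#define __WQ_FREEZING (1 << 15)

struct wq { unsigned int flags; int max_active, saved_max_active; };

/* models pwq_adjust_max_active(): a freezing wq gets max_active = 0 */
static void adjust_max_active(struct wq *wq)
{
        wq->max_active = (wq->flags & __WQ_FREEZING) ? 0 : wq->saved_max_active;
}

int main(void)
{
        struct wq wq = { .saved_max_active = 16 };

        wq.flags |= __WQ_FREEZING;              /* freeze_workqueues_begin() */
        adjust_max_active(&wq);
        printf("frozen: max_active=%d\n", wq.max_active);   /* 0 */

        wq.flags &= ~__WQ_FREEZING;             /* thaw_workqueues() */
        adjust_max_active(&wq);
        printf("thawed: max_active=%d\n", wq.max_active);   /* 16 */
        return 0;
}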

Signed-off-by: Lai Jiangshan <[email protected]>
---
include/linux/workqueue.h | 1 +
kernel/workqueue.c | 33 +++++++--------------------------
2 files changed, 8 insertions(+), 26 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 7179756..672b51e 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -300,6 +300,7 @@ enum {
WQ_CPU_INTENSIVE = 1 << 5, /* cpu instensive workqueue */
WQ_SYSFS = 1 << 6, /* visible in sysfs, see wq_sysfs_register() */

+ __WQ_FREEZING = 1 << 15, /* internal: workqueue is freezing */
__WQ_DRAINING = 1 << 16, /* internal: workqueue is draining */
__WQ_ORDERED = 1 << 17, /* internal: workqueue is ordered */

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index dd2a4c4..e06a5b0 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -68,7 +68,6 @@ enum {
*/
POOL_MANAGE_WORKERS = 1 << 0, /* need to manage workers */
POOL_DISASSOCIATED = 1 << 2, /* cpu can't serve workers */
- POOL_FREEZING = 1 << 3, /* freeze in progress */

/* worker flags */
WORKER_STARTED = 1 << 0, /* started */
@@ -3556,9 +3555,6 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
if (!pool || init_worker_pool(pool) < 0)
goto fail;

- if (workqueue_freezing)
- pool->flags |= POOL_FREEZING;
-
lockdep_set_subclass(&pool->lock, 1); /* see put_pwq() */
copy_workqueue_attrs(pool->attrs, attrs);

@@ -3650,7 +3646,7 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
struct workqueue_struct *wq = pwq->wq;
bool freezable = wq->flags & WQ_FREEZABLE;

- /* for @wq->saved_max_active */
+ /* for @wq->saved_max_active and @wq->flags */
lockdep_assert_held(&wq->mutex);

/* fast exit for non-freezable wqs */
@@ -3659,7 +3655,7 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)

spin_lock_irq(&pwq->pool->lock);

- if (!freezable || !(pwq->pool->flags & POOL_FREEZING)) {
+ if (!freezable || !(wq->flags & __WQ_FREEZING)) {
pwq->max_active = wq->saved_max_active;

while (!list_empty(&pwq->delayed_works) &&
@@ -4155,6 +4151,8 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
mutex_lock(&wq_pool_mutex);

mutex_lock(&wq->mutex);
+ if (workqueue_freezing)
+ wq->flags |= __WQ_FREEZING;
for_each_pwq(pwq, wq)
pwq_adjust_max_active(pwq);
mutex_unlock(&wq->mutex);
@@ -4670,26 +4668,18 @@ EXPORT_SYMBOL_GPL(work_on_cpu);
*/
void freeze_workqueues_begin(void)
{
- struct worker_pool *pool;
struct workqueue_struct *wq;
struct pool_workqueue *pwq;
- int pi;

mutex_lock(&wq_pool_mutex);

WARN_ON_ONCE(workqueue_freezing);
workqueue_freezing = true;

- /* set FREEZING */
- for_each_pool(pool, pi) {
- spin_lock_irq(&pool->lock);
- WARN_ON_ONCE(pool->flags & POOL_FREEZING);
- pool->flags |= POOL_FREEZING;
- spin_unlock_irq(&pool->lock);
- }
-
list_for_each_entry(wq, &workqueues, list) {
mutex_lock(&wq->mutex);
+ WARN_ON_ONCE(wq->flags & __WQ_FREEZING);
+ wq->flags |= __WQ_FREEZING;
for_each_pwq(pwq, wq)
pwq_adjust_max_active(pwq);
mutex_unlock(&wq->mutex);
@@ -4757,25 +4747,16 @@ void thaw_workqueues(void)
{
struct workqueue_struct *wq;
struct pool_workqueue *pwq;
- struct worker_pool *pool;
- int pi;

mutex_lock(&wq_pool_mutex);

if (!workqueue_freezing)
goto out_unlock;

- /* clear FREEZING */
- for_each_pool(pool, pi) {
- spin_lock_irq(&pool->lock);
- WARN_ON_ONCE(!(pool->flags & POOL_FREEZING));
- pool->flags &= ~POOL_FREEZING;
- spin_unlock_irq(&pool->lock);
- }
-
/* restore max_active and repopulate worklist */
list_for_each_entry(wq, &workqueues, list) {
mutex_lock(&wq->mutex);
+ wq->flags &= ~__WQ_FREEZING;
for_each_pwq(pwq, wq)
pwq_adjust_max_active(pwq);
mutex_unlock(&wq->mutex);
--
1.7.7.6


2013-04-04 02:06:06

by Lai Jiangshan

Subject: [PATCH 4/7] workqueue: simplify workqueue_cpu_up_callback(CPU_ONLINE)

If we have 4096 CPUs, workqueue_cpu_up_callback() will traverse far too many
pools. To avoid that, use for_each_cpu_worker_pool() for the per-CPU pools
of the incoming CPU and use for_each_unbound_pool() for the unbound pools.

After this, for_each_pool() becomes unused, but we keep it for possible
future use.
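
To illustrate the shape of the change, a compact user-space model of the
narrowed CPU_ONLINE path; the loop bounds and printf bodies are illustrative
stand-ins for the per-CPU pools and the unbound-pool hash, not the kernel code:

#include <stdio.h>

#define POOLS_PER_CPU 2
#define NR_UNBOUND    3

static void cpu_online(int cpu)
{
        int i;

        /* for_each_cpu_worker_pool(pool, cpu): only this CPU's pools */
        for (i = 0; i < POOLS_PER_CPU; i++)
                printf("associate_cpu_pool: cpu %d pool %d\n", cpu, i);

        /* for_each_unbound_pool(pool, bkt): unbound pools only */
        for (i = 0; i < NR_UNBOUND; i++)
                printf("restore_unbound_workers_cpumask: pool %d cpu %d\n",
                       i, cpu);
}

int main(void)
{
        cpu_online(1);  /* never touches the pools of the other 4095 CPUs */
        return 0;
}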

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 53 ++++++++++++++++++++++++++++++++++-----------------
1 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b4369de..a383eaf 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -354,6 +354,23 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
else

/**
+ * for_each_unbound_pool - iterate through all unbound worker_pools in the system
+ * @pool: iteration cursor
+ * @bkt: bucket (of integer) used for iteration
+ *
+ * This must be called either with wq_pool_mutex held or sched RCU read
+ * locked. If the pool needs to be used beyond the locking in effect, the
+ * caller is responsible for guaranteeing that the pool stays online.
+ *
+ * The if/else clause exists only for the lockdep assertion and can be
+ * ignored.
+ */
+#define for_each_unbound_pool(pool, bkt) \
+ hash_for_each(unbound_pool_hash, bkt, pool, hash_node) \
+ if (({ assert_rcu_or_pool_mutex(); false; })) { } \
+ else
+
+/**
* for_each_pool_worker - iterate through all workers of a worker_pool
* @worker: iteration cursor
* @wi: integer used for iteration
@@ -4442,7 +4459,7 @@ static void associate_cpu_pool(struct worker_pool *pool)
struct worker *worker;
int wi;

- lockdep_assert_held(&pool->manager_mutex);
+ mutex_lock(&pool->manager_mutex);

/*
* Restore CPU affinity of all workers. As all idle workers should
@@ -4454,7 +4471,7 @@ static void associate_cpu_pool(struct worker_pool *pool)
for_each_pool_worker(worker, wi, pool)
if (WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
pool->attrs->cpumask) < 0))
- return;
+ goto out_unlock;

spin_lock_irq(&pool->lock);

@@ -4495,6 +4512,9 @@ static void associate_cpu_pool(struct worker_pool *pool)

pool->flags &= ~POOL_DISASSOCIATED;
spin_unlock_irq(&pool->lock);
+
+out_unlock:
+ mutex_unlock(&pool->manager_mutex);
}

/**
@@ -4509,25 +4529,28 @@ static void associate_cpu_pool(struct worker_pool *pool)
*/
static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
{
- static cpumask_t cpumask;
+ static cpumask_t cpumask; /* protected by wq_pool_mutex */
struct worker *worker;
int wi;

- lockdep_assert_held(&pool->manager_mutex);
+ mutex_lock(&pool->manager_mutex);

/* is @cpu allowed for @pool? */
if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
- return;
+ goto out_unlock;

/* is @cpu the only online CPU? */
cpumask_and(&cpumask, pool->attrs->cpumask, cpu_online_mask);
if (cpumask_weight(&cpumask) != 1)
- return;
+ goto out_unlock;

/* as we're called from CPU_ONLINE, the following shouldn't fail */
for_each_pool_worker(worker, wi, pool)
WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
pool->attrs->cpumask) < 0);
+
+out_unlock:
+ mutex_unlock(&pool->manager_mutex);
}

/*
@@ -4541,7 +4564,7 @@ static int __cpuinit workqueue_cpu_up_callback(struct notifier_block *nfb,
int cpu = (unsigned long)hcpu;
struct worker_pool *pool;
struct workqueue_struct *wq;
- int pi;
+ int bkt;

switch (action & ~CPU_TASKS_FROZEN) {
case CPU_UP_PREPARE:
@@ -4555,19 +4578,13 @@ static int __cpuinit workqueue_cpu_up_callback(struct notifier_block *nfb,

case CPU_DOWN_FAILED:
case CPU_ONLINE:
- mutex_lock(&wq_pool_mutex);
+ for_each_cpu_worker_pool(pool, cpu)
+ associate_cpu_pool(pool);

- for_each_pool(pool, pi) {
- mutex_lock(&pool->manager_mutex);
-
- if (pool->cpu == cpu) {
- associate_cpu_pool(pool);
- } else if (pool->cpu < 0) {
- restore_unbound_workers_cpumask(pool, cpu);
- }
+ mutex_lock(&wq_pool_mutex);

- mutex_unlock(&pool->manager_mutex);
- }
+ for_each_unbound_pool(pool, bkt)
+ restore_unbound_workers_cpumask(pool, cpu);

/* update NUMA affinity of unbound workqueues */
list_for_each_entry(wq, &workqueues, list)
--
1.7.7.6

2013-04-04 02:06:05

by Lai Jiangshan

Subject: [PATCH 2/7] workqueue: set __WQ_FREEZING only when freezable

Simplify pwq_adjust_max_active(), and make freeze_workqueues_begin() and
thaw_workqueues() skip non-freezable workqueues quickly.
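
A minimal model of the invariant this patch establishes - __WQ_FREEZING is
only ever set on WQ_FREEZABLE workqueues, so a single flag test suffices in
pwq_adjust_max_active(). Types and flag values are simplified stand-ins:

#include <assert.h>

#define WQ_FREEZABLE  (1 << 2)
#define __WQ_FREEZING (1 << 15)

struct wq { unsigned int flags; };

static void freeze_begin(struct wq *wq)
{
        if (!(wq->flags & WQ_FREEZABLE))        /* fast skip, new in this patch */
                return;
        wq->flags |= __WQ_FREEZING;
}

int main(void)
{
        struct wq freezable = { .flags = WQ_FREEZABLE };
        struct wq plain = { .flags = 0 };

        freeze_begin(&freezable);
        freeze_begin(&plain);

        assert(freezable.flags & __WQ_FREEZING);
        assert(!(plain.flags & __WQ_FREEZING)); /* never set when !FREEZABLE */
        return 0;
}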

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 13 ++++++-------
1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e06a5b0..66a9d71 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3644,18 +3644,13 @@ static void pwq_unbound_release_workfn(struct work_struct *work)
static void pwq_adjust_max_active(struct pool_workqueue *pwq)
{
struct workqueue_struct *wq = pwq->wq;
- bool freezable = wq->flags & WQ_FREEZABLE;

/* for @wq->saved_max_active and @wq->flags */
lockdep_assert_held(&wq->mutex);

- /* fast exit for non-freezable wqs */
- if (!freezable && pwq->max_active == wq->saved_max_active)
- return;
-
spin_lock_irq(&pwq->pool->lock);

- if (!freezable || !(wq->flags & __WQ_FREEZING)) {
+ if (!(wq->flags & __WQ_FREEZING)) {
pwq->max_active = wq->saved_max_active;

while (!list_empty(&pwq->delayed_works) &&
@@ -4151,7 +4146,7 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
mutex_lock(&wq_pool_mutex);

mutex_lock(&wq->mutex);
- if (workqueue_freezing)
+ if ((wq->flags & WQ_FREEZABLE) && workqueue_freezing)
wq->flags |= __WQ_FREEZING;
for_each_pwq(pwq, wq)
pwq_adjust_max_active(pwq);
@@ -4677,6 +4672,8 @@ void freeze_workqueues_begin(void)
workqueue_freezing = true;

list_for_each_entry(wq, &workqueues, list) {
+ if (!(wq->flags & WQ_FREEZABLE))
+ continue;
mutex_lock(&wq->mutex);
WARN_ON_ONCE(wq->flags & __WQ_FREEZING);
wq->flags |= __WQ_FREEZING;
@@ -4755,6 +4752,8 @@ void thaw_workqueues(void)

/* restore max_active and repopulate worklist */
list_for_each_entry(wq, &workqueues, list) {
+ if (!(wq->flags & WQ_FREEZABLE))
+ continue;
mutex_lock(&wq->mutex);
wq->flags &= ~__WQ_FREEZING;
for_each_pwq(pwq, wq)
--
1.7.7.6

2013-04-04 02:06:48

by Lai Jiangshan

Subject: [PATCH 7/7] workqueue: avoid false negative WARN_ON()

It is very common for wq->dfl_pwq->refcnt to be greater than 1: with NUMA
affinity, every node that falls back to the default pwq takes an extra
reference on it.
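
For illustration, a tiny user-space model of why the default pwq's refcount
legitimately exceeds 1; the struct and node count are made-up stand-ins:

#include <stdio.h>

#define NR_NODES 4

struct pwq { int refcnt; };

int main(void)
{
        struct pwq dfl = { .refcnt = 1 };       /* the workqueue's own ref */
        struct pwq *pwq_tbl[NR_NODES];
        int node;

        for (node = 0; node < NR_NODES; node++) {
                pwq_tbl[node] = &dfl;           /* node falls back to dfl_pwq */
                dfl.refcnt++;
        }

        printf("all nodes share dfl: %d, refcnt = %d\n",
               pwq_tbl[0] == pwq_tbl[NR_NODES - 1], dfl.refcnt); /* 1, 5 */
        return 0;
}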

[ 7.939873] WARNING: at kernel/workqueue.c:4201 destroy_workqueue+0x6a/0x13e()
[ 7.943601] Hardware name: 4286C12
[ 7.947250] Modules linked in: sdhci_pci sdhci mmc_core usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video
[ 7.951313] Pid: 361, comm: umount Not tainted 3.9.0-rc5+ #29
[ 7.955309] Call Trace:
[ 7.959346] [<c04314a7>] warn_slowpath_common+0x7c/0x93
[ 7.963506] [<c044796a>] ? destroy_workqueue+0x6a/0x13e
[ 7.967748] [<c044796a>] ? destroy_workqueue+0x6a/0x13e
[ 7.971981] [<c04314e0>] warn_slowpath_null+0x22/0x24
[ 7.976383] [<c044796a>] destroy_workqueue+0x6a/0x13e
[ 7.980875] [<c056dc01>] ext4_put_super+0x43/0x2c4
[ 7.985407] [<c050bd48>] ? dispose_list+0x28/0x32
[ 7.989987] [<c050c652>] ? evict_inodes+0xcf/0xd7
[ 7.994509] [<c04fb7b8>] generic_shutdown_super+0x4b/0xb9
[ 7.999130] [<c04fb848>] kill_block_super+0x22/0x60
[ 8.003594] [<c04fb960>] deactivate_locked_super+0x2f/0x56
[ 8.008077] [<c04fc41b>] deactivate_super+0x2e/0x31
[ 8.012523] [<c050f1e6>] mntput_no_expire+0x103/0x108
[ 8.017050] [<c050fdce>] sys_umount+0x2a2/0x2c4
[ 8.021429] [<c050fe0e>] sys_oldumount+0x1e/0x20
[ 8.025678] [<c085ba4d>] sysenter_do_call+0x12/0x38

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 3f33077..f015c38 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4198,7 +4198,7 @@ void destroy_workqueue(struct workqueue_struct *wq)
}
}

- if (WARN_ON(pwq->refcnt > 1) ||
+ if (WARN_ON((pwq != wq->dfl_pwq) && (pwq->refcnt > 1)) ||
WARN_ON(pwq->nr_active) ||
WARN_ON(!list_empty(&pwq->delayed_works))) {
mutex_unlock(&wq->mutex);
--
1.7.7.6

2013-04-04 02:06:02

by Lai Jiangshan

Subject: [PATCH 3/7] workqueue: rename rebind_workers() to associate_cpu_pool()

Merge the code that clears POOL_DISASSOCIATED into rebind_workers(), and
rename rebind_workers() to associate_cpu_pool().

This brings closely related code together and simplifies
workqueue_cpu_up_callback().
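
A minimal model of the resulting ordering - the pool is marked associated
only after all of its workers have been rebound. Simplified stand-in types,
not the kernel code:

#include <stdio.h>

#define POOL_DISASSOCIATED (1 << 2)

struct pool { unsigned int flags; int nr_workers; };

static void associate_cpu_pool(struct pool *pool)
{
        int i;

        for (i = 0; i < pool->nr_workers; i++)  /* set_cpus_allowed_ptr() */
                printf("rebind worker %d\n", i);

        /* cleared last, after rebinding - the ordering this patch introduces */
        pool->flags &= ~POOL_DISASSOCIATED;
}

int main(void)
{
        struct pool pool = { .flags = POOL_DISASSOCIATED, .nr_workers = 2 };

        associate_cpu_pool(&pool);
        printf("associated: %d\n", !(pool.flags & POOL_DISASSOCIATED));
        return 0;
}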

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 21 ++++++++++-----------
1 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 66a9d71..b4369de 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2270,7 +2270,7 @@ recheck:
* worker or that someone else has already assumed the manager
* role. This is where @worker starts participating in concurrency
* management if applicable and concurrency management is restored
- * after being rebound. See rebind_workers() for details.
+ * after being rebound. See associate_cpu_pool() for details.
*/
worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);

@@ -4431,12 +4431,13 @@ static void wq_unbind_fn(struct work_struct *work)
}

/**
- * rebind_workers - rebind all workers of a pool to the associated CPU
+ * associate_cpu_pool - rebind all workers of a pool to the associated CPU
* @pool: pool of interest
*
- * @pool->cpu is coming online. Rebind all workers to the CPU.
+ * @pool->cpu is coming online. Rebind all workers to the CPU and
+ * mark the pool as associated.
*/
-static void rebind_workers(struct worker_pool *pool)
+static void associate_cpu_pool(struct worker_pool *pool)
{
struct worker *worker;
int wi;
@@ -4451,8 +4452,9 @@ static void rebind_workers(struct worker_pool *pool)
* from CPU_ONLINE, the following shouldn't fail.
*/
for_each_pool_worker(worker, wi, pool)
- WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
- pool->attrs->cpumask) < 0);
+ if (WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
+ pool->attrs->cpumask) < 0))
+ return;

spin_lock_irq(&pool->lock);

@@ -4491,6 +4493,7 @@ static void rebind_workers(struct worker_pool *pool)
ACCESS_ONCE(worker->flags) = worker_flags;
}

+ pool->flags &= ~POOL_DISASSOCIATED;
spin_unlock_irq(&pool->lock);
}

@@ -4558,11 +4561,7 @@ static int __cpuinit workqueue_cpu_up_callback(struct notifier_block *nfb,
mutex_lock(&pool->manager_mutex);

if (pool->cpu == cpu) {
- spin_lock_irq(&pool->lock);
- pool->flags &= ~POOL_DISASSOCIATED;
- spin_unlock_irq(&pool->lock);
-
- rebind_workers(pool);
+ associate_cpu_pool(pool);
} else if (pool->cpu < 0) {
restore_unbound_workers_cpumask(pool, cpu);
}
--
1.7.7.6

2013-04-04 02:07:06

by Lai Jiangshan

Subject: [PATCH 6/7] workqueue: node-aware allocation for unbound pool

Calculate the pool's node earlier, and allocate the pool itself
from that node.
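
A small user-space model of picking the node before allocating the pool;
cpumasks are reduced to plain bitmasks here, and the masks and node layout
are made up for illustration:

#include <stdio.h>

#define NR_NODES     2
#define NUMA_NO_NODE (-1)

/* each node's possible CPUs as a bitmask */
static const unsigned long node_cpumask[NR_NODES] = { 0x0f, 0xf0 };

/* models the cpumask_subset() scan now done before the allocation */
static int pick_node(unsigned long attrs_cpumask)
{
        int node;

        for (node = 0; node < NR_NODES; node++)
                if ((attrs_cpumask & ~node_cpumask[node]) == 0)
                        return node;
        return NUMA_NO_NODE;
}

int main(void)
{
        int node = pick_node(0x30);     /* CPUs 4-5 fall inside node 1 */

        /* stand-in for kzalloc_node(sizeof(*pool), GFP_KERNEL, node) */
        printf("allocate pool on node %d\n", node);
        return 0;
}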

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 29 +++++++++++++++--------------
1 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 737646d..3f33077 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -539,7 +539,7 @@ static int worker_pool_assign_id(struct worker_pool *pool)
* @wq: the target workqueue
* @node: the node ID
*
- * This must be called either with pwq_lock held or sched RCU read locked.
+ * This must be called either with wq->mutex held or sched RCU read locked.
* If the pwq needs to be used beyond the locking in effect, the caller is
* responsible for guaranteeing that the pwq stays online.
*/
@@ -3555,7 +3555,7 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
{
u32 hash = wqattrs_hash(attrs);
struct worker_pool *pool;
- int node;
+ int pool_node = NUMA_NO_NODE, node;

lockdep_assert_held(&wq_pool_mutex);

@@ -3563,29 +3563,30 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
hash_for_each_possible(unbound_pool_hash, pool, hash_node, hash) {
if (wqattrs_equal(pool->attrs, attrs)) {
pool->refcnt++;
- goto out_unlock;
+ goto out_pool;
}
}

- /* nope, create a new one */
- pool = kzalloc(sizeof(*pool), GFP_KERNEL);
- if (!pool || init_worker_pool(pool) < 0)
- goto fail;
-
- lockdep_set_subclass(&pool->lock, 1); /* see put_pwq() */
- copy_workqueue_attrs(pool->attrs, attrs);
-
/* if cpumask is contained inside a NUMA node, we belong to that node */
if (wq_numa_enabled) {
for_each_node(node) {
- if (cpumask_subset(pool->attrs->cpumask,
+ if (cpumask_subset(attrs->cpumask,
wq_numa_possible_cpumask[node])) {
- pool->node = node;
+ pool_node = node;
break;
}
}
}

+ /* create a new one */
+ pool = kzalloc_node(sizeof(*pool), GFP_KERNEL, pool_node);
+ if (!pool || init_worker_pool(pool) < 0)
+ goto fail;
+
+ lockdep_set_subclass(&pool->lock, 1); /* see put_pwq() */
+ copy_workqueue_attrs(pool->attrs, attrs);
+ pool->node = pool_node;
+
if (worker_pool_assign_id(pool) < 0)
goto fail;

@@ -3595,7 +3596,7 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)

/* install */
hash_add(unbound_pool_hash, &pool->hash_node, hash);
-out_unlock:
+out_pool:
return pool;
fail:
if (pool)
--
1.7.7.6

2013-04-04 02:07:35

by Lai Jiangshan

Subject: [PATCH 5/7] workqueue: use default pwq when node pwq allocation fails

When we fail to allocate a node's pwq, we can fall back to the default pwq
for that node.

This way nothing can fail after the default pwq has been allocated, and
some failure-path code can be removed.
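
A minimal model of the fallback: when a node's pwq allocation fails, the
node slot points at the default pwq and takes a reference instead of
aborting. Simplified stand-in types, with one failure simulated:

#include <stdio.h>
#include <stdlib.h>

#define NR_NODES 2

struct pwq { int refcnt; };

static struct pwq *alloc_unbound_pwq(int node)
{
        if (node == 1)                  /* simulate allocation failure */
                return NULL;
        return calloc(1, sizeof(struct pwq));
}

int main(void)
{
        struct pwq dfl_pwq = { .refcnt = 1 };
        struct pwq *pwq_tbl[NR_NODES];
        int node;

        for (node = 0; node < NR_NODES; node++) {
                pwq_tbl[node] = alloc_unbound_pwq(node);
                if (pwq_tbl[node])
                        continue;
                /* fallback to dfl_pwq if the allocation failed */
                dfl_pwq.refcnt++;
                pwq_tbl[node] = &dfl_pwq;
        }
        printf("node 1 uses dfl_pwq (refcnt=%d)\n", dfl_pwq.refcnt);
        return 0;
}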

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 28 +++++++---------------------
1 files changed, 7 insertions(+), 21 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a383eaf..737646d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3751,17 +3751,6 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
return pwq;
}

-/* undo alloc_unbound_pwq(), used only in the error path */
-static void free_unbound_pwq(struct pool_workqueue *pwq)
-{
- lockdep_assert_held(&wq_pool_mutex);
-
- if (pwq) {
- put_unbound_pool(pwq->pool);
- kfree(pwq);
- }
-}
-
/**
* wq_calc_node_mask - calculate a wq_attrs' cpumask for the specified node
* @attrs: the wq_attrs of interest
@@ -3891,12 +3880,12 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
for_each_node(node) {
if (wq_calc_node_cpumask(attrs, node, -1, tmp_attrs->cpumask)) {
pwq_tbl[node] = alloc_unbound_pwq(wq, tmp_attrs);
- if (!pwq_tbl[node])
- goto enomem_pwq;
- } else {
- dfl_pwq->refcnt++;
- pwq_tbl[node] = dfl_pwq;
+ if (pwq_tbl[node])
+ continue;
+ /* fallback to dfl_pwq if the allocation failed */
}
+ dfl_pwq->refcnt++;
+ pwq_tbl[node] = dfl_pwq;
}

mutex_unlock(&wq_pool_mutex);
@@ -3931,10 +3920,6 @@ out_free:
return ret;

enomem_pwq:
- free_unbound_pwq(dfl_pwq);
- for_each_node(node)
- if (pwq_tbl && pwq_tbl[node] != dfl_pwq)
- free_unbound_pwq(pwq_tbl[node]);
mutex_unlock(&wq_pool_mutex);
put_online_cpus();
enomem:
@@ -4017,7 +4002,8 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
if (!pwq) {
pr_warning("workqueue: allocation failed while updating NUMA affinity of \"%s\"\n",
wq->name);
- goto out_unlock;
+ mutex_lock(&wq->mutex);
+ goto use_dfl_pwq;
}

/*
--
1.7.7.6

2013-04-04 14:12:36

by Tejun Heo

Subject: Re: [PATCH 1/7] workqueue: add __WQ_FREEZING and remove POOL_FREEZING

Hello, Lai.

On Thu, Apr 04, 2013 at 10:05:32AM +0800, Lai Jiangshan wrote:
> @@ -4757,25 +4747,16 @@ void thaw_workqueues(void)
> {
> struct workqueue_struct *wq;
> struct pool_workqueue *pwq;
> - struct worker_pool *pool;
> - int pi;
>
> mutex_lock(&wq_pool_mutex);
>
> if (!workqueue_freezing)
> goto out_unlock;
>
> - /* clear FREEZING */
> - for_each_pool(pool, pi) {
> - spin_lock_irq(&pool->lock);
> - WARN_ON_ONCE(!(pool->flags & POOL_FREEZING));
> - pool->flags &= ~POOL_FREEZING;
> - spin_unlock_irq(&pool->lock);
> - }
> -
> /* restore max_active and repopulate worklist */
> list_for_each_entry(wq, &workqueues, list) {
> mutex_lock(&wq->mutex);
> + wq->flags &= ~__WQ_FREEZING;

I want an assertion here. Maybe we can fold the next patch into this
one and add WARN_ON_ONCE() here?

> for_each_pwq(pwq, wq)
> pwq_adjust_max_active(pwq);
> mutex_unlock(&wq->mutex);

Thanks.

--
tejun

2013-04-04 14:15:31

by Tejun Heo

Subject: Re: [PATCH 3/7] workqueue: rename rebind_workers() to associate_cpu_pool()

On Thu, Apr 04, 2013 at 10:05:34AM +0800, Lai Jiangshan wrote:
> Merge the code that clears POOL_DISASSOCIATED into rebind_workers(), and
> rename rebind_workers() to associate_cpu_pool().
>
> This brings closely related code together and simplifies
> workqueue_cpu_up_callback().
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
> kernel/workqueue.c | 21 ++++++++++-----------
> 1 files changed, 10 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 66a9d71..b4369de 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2270,7 +2270,7 @@ recheck:
> * worker or that someone else has already assumed the manager
> * role. This is where @worker starts participating in concurrency
> * management if applicable and concurrency management is restored
> - * after being rebound. See rebind_workers() for details.
> + * after being rebound. See associate_cpu_pool() for details.
> */
> worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);
>
> @@ -4431,12 +4431,13 @@ static void wq_unbind_fn(struct work_struct *work)
> }
>
> /**
> - * rebind_workers - rebind all workers of a pool to the associated CPU
> + * associate_cpu_pool - rebind all workers of a pool to the associated CPU
> * @pool: pool of interest
> *
> - * @pool->cpu is coming online. Rebind all workers to the CPU.
> + * @pool->cpu is coming online. Rebind all workers to the CPU and
> + * mark the pool as associated.
> */
> -static void rebind_workers(struct worker_pool *pool)
> +static void associate_cpu_pool(struct worker_pool *pool)
> {
> struct worker *worker;
> int wi;
> @@ -4451,8 +4452,9 @@ static void rebind_workers(struct worker_pool *pool)
> * from CPU_ONLINE, the following shouldn't fail.
> */
> for_each_pool_worker(worker, wi, pool)
> - WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
> - pool->attrs->cpumask) < 0);
> + if (WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
> + pool->attrs->cpumask) < 0))
> + return;
>
> spin_lock_irq(&pool->lock);
>
> @@ -4491,6 +4493,7 @@ static void rebind_workers(struct worker_pool *pool)
> ACCESS_ONCE(worker->flags) = worker_flags;
> }
>
> + pool->flags &= ~POOL_DISASSOCIATED;

So, now we're clearing DISASSOCIATED after rebinding workers instead
of before. Is that safe? Even if so, why no mention of that in the
patch description? Changes like this are functional changes which can
cause subtle issues and *should* be explained and justified in the
patch description. Please put more effort in explaining what's going
on.

Thanks.

--
tejun

2013-04-04 14:34:21

by Tejun Heo

Subject: Re: [PATCH 5/7] workqueue: use default pwq when node pwq allocation fails

On Thu, Apr 04, 2013 at 10:05:36AM +0800, Lai Jiangshan wrote:
> When we fail to allocate a node's pwq, we can fall back to the default
> pwq for that node.
>
> This way nothing can fail after the default pwq has been allocated, and
> some failure-path code can be removed.

I don't know about this one. The reason why we fall back to the
default one during CPU UP/DOWN is because we don't want to interfere
with CPU hotplug, which doesn't really have much to do with specific
workqueues and shouldn't fail even when things go pretty hairy -
e.g. if the user turned off the screen of his/her phone or a laptop is
thrown into the backpack with lid closed, CPU_DOWNs during suspend
better not fail from memory allocation.

apply_workqueue_attrs() is different. We *want* to notify the issuer
that something went wrong and the action requested couldn't be
fulfilled in full. We don't want to hide a failure. It'll show up as
a silent performance degradation that nobody knows why it's happening.

So, nope, doesn't look like a good idea to me.

Thanks.

--
tejun

2013-04-04 14:38:57

by Tejun Heo

Subject: Re: [PATCH 6/7] workqueue: node-aware allocation for unbound pool

Hello,

On Thu, Apr 04, 2013 at 10:05:37AM +0800, Lai Jiangshan wrote:
> Calculate the pool's node earlier, and allocate the pool itself
> from that node.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
> kernel/workqueue.c | 29 +++++++++++++++--------------
> 1 files changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 737646d..3f33077 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -539,7 +539,7 @@ static int worker_pool_assign_id(struct worker_pool *pool)
> * @wq: the target workqueue
> * @node: the node ID
> *
> - * This must be called either with pwq_lock held or sched RCU read locked.
> + * This must be called either with wq->mutex held or sched RCU read locked.

It'd be nice to mention it in the patch description. Just add
something like "while at it, fix the wrong locking comment in XXX".

> @@ -3563,29 +3563,30 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
> hash_for_each_possible(unbound_pool_hash, pool, hash_node, hash) {
> if (wqattrs_equal(pool->attrs, attrs)) {
> pool->refcnt++;
> - goto out_unlock;
> + goto out_pool;

return pool; ?

Thanks.

--
tejun

2013-04-04 14:55:27

by Tejun Heo

Subject: [PATCH] workqueue: avoid false negative WARN_ON() in destroy_workqueue()

From 5c529597e922c26910fe49b8d5f93aeaca9a2415 Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <[email protected]>
Date: Thu, 4 Apr 2013 10:05:38 +0800

destroy_workqueue() performs several sanity checks before proceeding
with destruction of a workqueue. One of the checks verifies that
refcnt of each pwq (pool_workqueue) is over 1 as at that point there
should be no in-flight work items and the only holder of pwq refs is
the workqueue itself.

This worked fine as a workqueue used to hold only one reference to its
pwqs; however, since 4c16bd327c ("workqueue: implement NUMA affinity
for unbound workqueues"), a workqueue may hold multiple references to
its default pwq triggering this sanity check spuriously.

Fix it by not triggering the pwq->refcnt assertion on default pwqs.

An example spurious WARN trigger follows.

WARNING: at kernel/workqueue.c:4201 destroy_workqueue+0x6a/0x13e()
Hardware name: 4286C12
Modules linked in: sdhci_pci sdhci mmc_core usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video
Pid: 361, comm: umount Not tainted 3.9.0-rc5+ #29
Call Trace:
[<c04314a7>] warn_slowpath_common+0x7c/0x93
[<c04314e0>] warn_slowpath_null+0x22/0x24
[<c044796a>] destroy_workqueue+0x6a/0x13e
[<c056dc01>] ext4_put_super+0x43/0x2c4
[<c04fb7b8>] generic_shutdown_super+0x4b/0xb9
[<c04fb848>] kill_block_super+0x22/0x60
[<c04fb960>] deactivate_locked_super+0x2f/0x56
[<c04fc41b>] deactivate_super+0x2e/0x31
[<c050f1e6>] mntput_no_expire+0x103/0x108
[<c050fdce>] sys_umount+0x2a2/0x2c4
[<c050fe0e>] sys_oldumount+0x1e/0x20
[<c085ba4d>] sysenter_do_call+0x12/0x38

tj: Rewrote description.

Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Fengguang Wu <[email protected]>
---
Applied to wq/for-3.10.

Thanks.

kernel/workqueue.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index dd2a4c4..c273376 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4201,7 +4201,7 @@ void destroy_workqueue(struct workqueue_struct *wq)
}
}

- if (WARN_ON(pwq->refcnt > 1) ||
+ if (WARN_ON((pwq != wq->dfl_pwq) && (pwq->refcnt > 1)) ||
WARN_ON(pwq->nr_active) ||
WARN_ON(!list_empty(&pwq->delayed_works))) {
mutex_unlock(&wq->mutex);
--
1.8.1.4

2013-04-05 07:36:22

by Lai Jiangshan

Subject: Re: [PATCH] workqueue: avoid false negative WARN_ON() in destroy_workqueue()

On 04/04/2013 10:55 PM, Tejun Heo wrote:
> From 5c529597e922c26910fe49b8d5f93aeaca9a2415 Mon Sep 17 00:00:00 2001
> From: Lai Jiangshan <[email protected]>
> Date: Thu, 4 Apr 2013 10:05:38 +0800
>
> destroy_workqueue() performs several sanity checks before proceeding
> with destruction of a workqueue. One of the checks verifies that
> refcnt of each pwq (pool_workqueue) is over 1 as at that point there
> should be no in-flight work items and the only holder of pwq refs is
> the workqueue itself.
>
> This worked fine as a workqueue used to hold only one reference to its
> pwqs; however, since 4c16bd327c ("workqueue: implement NUMA affinity
> for unbound workqueues"), a workqueue may hold multiple references to
> its default pwq triggering this sanity check spuriously.
>
> Fix it by not triggering the pwq->refcnt assertion on default pwqs.
>
> An example spurious WARN trigger follows.
>
> WARNING: at kernel/workqueue.c:4201 destroy_workqueue+0x6a/0x13e()
> Hardware name: 4286C12
> Modules linked in: sdhci_pci sdhci mmc_core usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video
> Pid: 361, comm: umount Not tainted 3.9.0-rc5+ #29
> Call Trace:
> [<c04314a7>] warn_slowpath_common+0x7c/0x93
> [<c04314e0>] warn_slowpath_null+0x22/0x24
> [<c044796a>] destroy_workqueue+0x6a/0x13e
> [<c056dc01>] ext4_put_super+0x43/0x2c4
> [<c04fb7b8>] generic_shutdown_super+0x4b/0xb9
> [<c04fb848>] kill_block_super+0x22/0x60
> [<c04fb960>] deactivate_locked_super+0x2f/0x56
> [<c04fc41b>] deactivate_super+0x2e/0x31
> [<c050f1e6>] mntput_no_expire+0x103/0x108
> [<c050fdce>] sys_umount+0x2a2/0x2c4
> [<c050fe0e>] sys_oldumount+0x1e/0x20
> [<c085ba4d>] sysenter_do_call+0x12/0x38
>
> tj: Rewrote description.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> Signed-off-by: Tejun Heo <[email protected]>
> Reported-by: Fengguang Wu <[email protected]>

Hi, Wu

Could you also send workqueue regression reports to me?

Thanks,
Lai

> ---
> Applied to wq/for-3.10.
>
> Thanks.
>
> kernel/workqueue.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index dd2a4c4..c273376 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4201,7 +4201,7 @@ void destroy_workqueue(struct workqueue_struct *wq)
> }
> }
>
> - if (WARN_ON(pwq->refcnt > 1) ||
> + if (WARN_ON((pwq != wq->dfl_pwq) && (pwq->refcnt > 1)) ||
> WARN_ON(pwq->nr_active) ||
> WARN_ON(!list_empty(&pwq->delayed_works))) {
> mutex_unlock(&wq->mutex);

2013-04-20 16:12:17

by Lai Jiangshan

Subject: Re: [PATCH 1/7] workqueue: add __WQ_FREEZING and remove POOL_FREEZING

Please forget all my other patches.

But the 1/7 and 2/7 __WQ_FREEZING patches can still go into 3.10.

On Thu, Apr 4, 2013 at 10:12 PM, Tejun Heo <[email protected]> wrote:
> Hello, Lai.
>
> On Thu, Apr 04, 2013 at 10:05:32AM +0800, Lai Jiangshan wrote:
>> @@ -4757,25 +4747,16 @@ void thaw_workqueues(void)
>> {
>> struct workqueue_struct *wq;
>> struct pool_workqueue *pwq;
>> - struct worker_pool *pool;
>> - int pi;
>>
>> mutex_lock(&wq_pool_mutex);
>>
>> if (!workqueue_freezing)
>> goto out_unlock;
>>
>> - /* clear FREEZING */
>> - for_each_pool(pool, pi) {
>> - spin_lock_irq(&pool->lock);
>> - WARN_ON_ONCE(!(pool->flags & POOL_FREEZING));
>> - pool->flags &= ~POOL_FREEZING;
>> - spin_unlock_irq(&pool->lock);
>> - }
>> -
>> /* restore max_active and repopulate worklist */
>> list_for_each_entry(wq, &workqueues, list) {
>> mutex_lock(&wq->mutex);
>> + wq->flags &= ~__WQ_FREEZING;
>
> I want an assertion here.

The freezing code is very simple to verify.

> Maybe we can fold the next patch into this
> one and add WARN_ON_ONCE() here?

I consider the two patches to have different intents.

Thanks,
Lai

>
>> for_each_pwq(pwq, wq)
>> pwq_adjust_max_active(pwq);
>> mutex_unlock(&wq->mutex);
>
> Thanks.
>
> --
> tejun

2013-04-21 01:29:31

by Tejun Heo

Subject: Re: [PATCH 1/7] workqueue: add __WQ_FREEZING and remove POOL_FREEZING

On Sun, Apr 21, 2013 at 12:12:14AM +0800, Lai Jiangshan wrote:
> > I want an assertion here.
>
> The freezing code is very simple to verify.

I still want it (and your patch is removing it). The usual cases are
fine but things can get messy on edge cases like freeze failure, bugs
in freezer code itself or whatnot. Please keep the assertion.

Thanks.

--
tejun