2013-03-19 19:28:42

by Lai Jiangshan

Subject: [PATCH 00/21] workqueue: cleanups and better locking for recent changes

Hi, TJ

After reviewing (a long review!) the three recent patchsets:
tj/review-misc-cleanups
tj/review-finer-locking
tj/review-restore-affinity

I did not find anything wrong in them. You can add my Reviewed-by to every
patch.

Although I always try hard when reviewing, I found a possible problem in a
patch I had already reviewed: 29c91e9912bed7060df6116af90286500f5a700d. I think
I should print the patch and eat the paper. It is fixed by patch 1 here.
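
For reference, a minimal sketch of what the fix in patch 1 presumably looks
like, inferred from the hunk that patch 14 later removes again; the exact
placement inside get_unbound_pool() is an assumption:

static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
{
	...
	/* a newly created unbound pool must inherit the global freezing
	 * state, otherwise it misses a freeze that is already in progress
	 */
	if (workqueue_freezing)
		pool->flags |= POOL_FREEZING;
	...
}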

During the review, I liked the patch "replace PF_THREAD_BOUND with PF_NO_SETAFFINITY",
which moves the PF_NO_SETAFFINITY test out of set_cpus_allowed_ptr().
That patch makes workqueue.c much simpler.

During the review, I disliked the overall design of some aspects, or found that
a patch needed some additional cleanup, but nothing is actually wrong. So I
wrote this patchset instead of replying to every original patch.

During the review, I also thought more about the finer-grained locking.

wq_mutex:
* worker_pool_idr and unbound_pool_hash
* pool->refcnt
* workqueues list
* workqueue->flags, ->nr_drainers
* workqueue_freezing

pwq_lock:
* workqueue->pwqs
* workqueue->saved_max_active

wq->flush_mutex:
* workqueue->pwqs
* flushers and flushing fields

From this list we can see that:
1) wq_mutex protects too many different kinds of things.
2) workqueue->pwqs is written under both wq->flush_mutex and pwq_lock; many
read sites hold both, and the remaining read sites hold only pwq_lock but
could be converted to wq->flush_mutex. So pwq_lock and wq->flush_mutex are
redundant here.
3) pwq_lock is a global lock, yet it protects only per-workqueue-instance fields.
4) several workqueue-instance fields are protected by different locks.

To improve the locking, this patchset changes the design slightly (a rough
sketch of the resulting layering follows the list):

pools_mutex protects the set of pools:
* worker_pool_idr and unbound_pool_hash
* pool->refcnt

wqs_mutex (renamed from wq_mutex) protects the set of workqueues and workqueue_freezing:
* workqueues list
* workqueue_freezing

The workqueue instance mutex (wq->mutex, renamed from wq->flush_mutex) protects
the workqueue instance:
* workqueue->pwqs
* flushers and the workqueue's flushing fields
* workqueue->saved_max_active
* workqueue->freezing
(whenever we access the above fields, we also access ->pwqs)
* workqueue->flags, ->nr_drainers (flushing is done at the same time)
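
To make the target state concrete, here is a rough sketch (not taken verbatim
from any patch in the series) of how the locks and the annotations on the
per-workqueue fields look after the conversion; the comments just restate the
lists above:

static DEFINE_MUTEX(wqs_mutex);		/* the set of workqueues: workqueues
					 * list, workqueue_freezing */
static DEFINE_MUTEX(pools_mutex);	/* the set of pools: worker_pool_idr,
					 * unbound_pool_hash, pool->refcnt */

struct workqueue_struct {
	struct mutex	mutex;		/* per-instance: ->pwqs, the flushing
					 * fields, ->saved_max_active,
					 * ->freezing, ->flags, ->nr_drainers */
	...
};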

Any thought?

Patch 1: fix a problem with a new pool's POOL_FREEZING bit.
Patches 5-14: better locking.
Patches 1, 4, 5, 10, 12, 14, 20: fixes/cleanups/development for freezing and max_active adjusting.

The others are standalone cleanup patches.

Thanks,
Lai

Lai Jiangshan (21):
workqueue: add missing POOL_FREEZING
workqueue: don't free pool->worker_idr by RCU
workqueue: simplify current_is_workqueue_rescuer()
workqueue: swap the two branches in pwq_adjust_max_active() to get
better readability
workqueue: kick workers in pwq_adjust_max_active()
workqueue: separate out pools locking into pools_mutex
workqueue: rename wq_mutex to wqs_mutex
workqueue: rename wq->flush_mutex to wq->mutex
workqueue: use wq->mutex to protects ->nr_drainers and __WQ_DRAINING
workqueue: use rcu_read_lock_sched() instead for accessing pwq in RCU
workqueue: also allowed wq->mutex protect for_each_pwq()
workqueue: use wq->mutex to protect saved_max_active
workqueue: remove unused pwq_lock
workqueue: add wq->freezing and remove POOL_FREEZING
workqueue: remove worker_maybe_bind_and_lock()
workqueue: rename rebind_workers() to associate_cpu_pool()
workqueue: simplify workqueue_cpu_up_callback(CPU_ONLINE)
workqueue: read POOL_DISASSOCIATED bit under pool->lock
workqueue: remove @p_last_pwq from init_and_link_pwq()
workqueue: modify wq->freezing only when freezable
workqueue: avoid false negative in assert_manager_or_pool_lock()

kernel/workqueue.c | 465 ++++++++++++++++++++--------------------------------
1 files changed, 179 insertions(+), 286 deletions(-)

--
1.7.7.6


2013-03-19 19:28:43

by Lai Jiangshan

Subject: [PATCH 02/21] workqueue: don't free pool->worker_idr by RCU

Neither pool->worker_idr nor worker->id is protected by RCU, so there is no
need to free pool->worker_idr from the RCU callback.

Just destroy it directly.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 40f4017..298cefe 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3411,7 +3411,6 @@ static void rcu_free_pool(struct rcu_head *rcu)
{
struct worker_pool *pool = container_of(rcu, struct worker_pool, rcu);

- idr_destroy(&pool->worker_idr);
free_workqueue_attrs(pool->attrs);
kfree(pool);
}
@@ -3462,6 +3461,7 @@ static void put_unbound_pool(struct worker_pool *pool)
destroy_worker(worker);
WARN_ON(pool->nr_workers || pool->nr_idle);

+ idr_destroy(&pool->worker_idr);
spin_unlock_irq(&pool->lock);
mutex_unlock(&pool->manager_mutex);
mutex_unlock(&pool->manager_arb);
--
1.7.7.6

2013-03-19 19:28:50

by Lai Jiangshan

Subject: [PATCH 04/21] workqueue: swap the two branches in pwq_adjust_max_active() to get better readability

"if (!freezable || !(pwq->pool->flags & POOL_FREEZING))" is hard to read.

Swap the two branches so that the condition becomes
"if (freezable && (pwq->pool->flags & POOL_FREEZING))", which is easier to read.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e1b31fc..8c882ae 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3592,14 +3592,14 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)

spin_lock(&pwq->pool->lock);

- if (!freezable || !(pwq->pool->flags & POOL_FREEZING)) {
+ if (freezable && (pwq->pool->flags & POOL_FREEZING)) {
+ pwq->max_active = 0;
+ } else {
pwq->max_active = wq->saved_max_active;

while (!list_empty(&pwq->delayed_works) &&
pwq->nr_active < pwq->max_active)
pwq_activate_first_delayed(pwq);
- } else {
- pwq->max_active = 0;
}

spin_unlock(&pwq->pool->lock);
--
1.7.7.6

2013-03-19 19:28:55

by Lai Jiangshan

Subject: [PATCH 08/21] workqueue: rename wq->flush_mutex to wq->mutex

Currently wq->flush_mutex protects most of a workqueue instance's fields, in
particular all pwqs of the wq, which already makes it look like a mutex for
the whole workqueue instance.

So rename wq->flush_mutex to wq->mutex so that it acts as the instance's mutex.

We plan to convert wq->nr_drainers and wq->saved_max_active to be protected by it.
We plan to convert all pwqs of the wq to be protected _only_ by it or by RCU.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 50 +++++++++++++++++++++++++-------------------------
1 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 5252107..4ae6ba7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -118,8 +118,6 @@ enum {
* cpu or grabbing pool->lock is enough for read access. If
* POOL_DISASSOCIATED is set, it's identical to L.
*
- * F: wq->flush_mutex protected.
- *
* MG: pool->manager_mutex and pool->lock protected. Writes require both
* locks. Reads can happen under either lock.
*
@@ -131,7 +129,9 @@ enum {
*
* PW: pwq_lock protected.
*
- * FR: wq->flush_mutex and pwq_lock protected for writes. Sched-RCU
+ * Q: wq->mutex protected.
+ *
+ * QR: wq->mutex and pwq_lock protected for writes. Sched-RCU
* protected for reads.
*
* MD: wq_mayday_lock protected.
@@ -199,7 +199,7 @@ struct pool_workqueue {
int nr_active; /* L: nr of active works */
int max_active; /* L: max active works */
struct list_head delayed_works; /* L: delayed works */
- struct list_head pwqs_node; /* FR: node on wq->pwqs */
+ struct list_head pwqs_node; /* QR: node on wq->pwqs */
struct list_head mayday_node; /* MD: node on wq->maydays */

/*
@@ -216,8 +216,8 @@ struct pool_workqueue {
* Structure used to wait for workqueue flush.
*/
struct wq_flusher {
- struct list_head list; /* F: list of flushers */
- int flush_color; /* F: flush color waiting for */
+ struct list_head list; /* Q: list of flushers */
+ int flush_color; /* Q: flush color waiting for */
struct completion done; /* flush completion */
};

@@ -230,16 +230,16 @@ struct wq_device;
struct workqueue_struct {
unsigned int flags; /* QS: WQ_* flags */
struct pool_workqueue __percpu *cpu_pwqs; /* I: per-cpu pwq's */
- struct list_head pwqs; /* FR: all pwqs of this wq */
+ struct list_head pwqs; /* QR: all pwqs of this wq */
struct list_head list; /* QS: list of all workqueues */

- struct mutex flush_mutex; /* protects wq flushing */
- int work_color; /* F: current work color */
- int flush_color; /* F: current flush color */
+ struct mutex mutex; /* protects wq */
+ int work_color; /* Q: current work color */
+ int flush_color; /* Q: current flush color */
atomic_t nr_pwqs_to_flush; /* flush in progress */
- struct wq_flusher *first_flusher; /* F: first flusher */
- struct list_head flusher_queue; /* F: flush waiters */
- struct list_head flusher_overflow; /* F: flush overflow list */
+ struct wq_flusher *first_flusher; /* Q: first flusher */
+ struct list_head flusher_queue; /* Q: flush waiters */
+ struct list_head flusher_overflow; /* Q: flush overflow list */

struct list_head maydays; /* MD: pwqs requesting rescue */
struct worker *rescuer; /* I: rescue worker */
@@ -2462,7 +2462,7 @@ static void insert_wq_barrier(struct pool_workqueue *pwq,
* advanced to @work_color.
*
* CONTEXT:
- * mutex_lock(wq->flush_mutex).
+ * mutex_lock(wq->mutex).
*
* RETURNS:
* %true if @flush_color >= 0 and there's something to flush. %false
@@ -2531,7 +2531,7 @@ void flush_workqueue(struct workqueue_struct *wq)
lock_map_acquire(&wq->lockdep_map);
lock_map_release(&wq->lockdep_map);

- mutex_lock(&wq->flush_mutex);
+ mutex_lock(&wq->mutex);

/*
* Start-to-wait phase
@@ -2576,7 +2576,7 @@ void flush_workqueue(struct workqueue_struct *wq)
list_add_tail(&this_flusher.list, &wq->flusher_overflow);
}

- mutex_unlock(&wq->flush_mutex);
+ mutex_unlock(&wq->mutex);

wait_for_completion(&this_flusher.done);

@@ -2589,7 +2589,7 @@ void flush_workqueue(struct workqueue_struct *wq)
if (wq->first_flusher != &this_flusher)
return;

- mutex_lock(&wq->flush_mutex);
+ mutex_lock(&wq->mutex);

/* we might have raced, check again with mutex held */
if (wq->first_flusher != &this_flusher)
@@ -2661,7 +2661,7 @@ void flush_workqueue(struct workqueue_struct *wq)
}

out_unlock:
- mutex_unlock(&wq->flush_mutex);
+ mutex_unlock(&wq->mutex);
}
EXPORT_SYMBOL_GPL(flush_workqueue);

@@ -3552,15 +3552,15 @@ static void pwq_unbound_release_workfn(struct work_struct *work)
return;

/*
- * Unlink @pwq. Synchronization against flush_mutex isn't strictly
+ * Unlink @pwq. Synchronization against wq->mutex isn't strictly
* necessary on release but do it anyway. It's easier to verify
* and consistent with the linking path.
*/
- mutex_lock(&wq->flush_mutex);
+ mutex_lock(&wq->mutex);
spin_lock_irq(&pwq_lock);
list_del_rcu(&pwq->pwqs_node);
spin_unlock_irq(&pwq_lock);
- mutex_unlock(&wq->flush_mutex);
+ mutex_unlock(&wq->mutex);

put_unbound_pool(pool);
call_rcu_sched(&pwq->rcu, rcu_free_pwq);
@@ -3631,12 +3631,12 @@ static void init_and_link_pwq(struct pool_workqueue *pwq,
INIT_LIST_HEAD(&pwq->mayday_node);
INIT_WORK(&pwq->unbound_release_work, pwq_unbound_release_workfn);

- mutex_lock(&wq->flush_mutex);
+ mutex_lock(&wq->mutex);
spin_lock_irq(&pwq_lock);

/*
* Set the matching work_color. This is synchronized with
- * flush_mutex to avoid confusing flush_workqueue().
+ * wq->mutex to avoid confusing flush_workqueue().
*/
if (p_last_pwq)
*p_last_pwq = first_pwq(wq);
@@ -3649,7 +3649,7 @@ static void init_and_link_pwq(struct pool_workqueue *pwq,
list_add_rcu(&pwq->pwqs_node, &wq->pwqs);

spin_unlock_irq(&pwq_lock);
- mutex_unlock(&wq->flush_mutex);
+ mutex_unlock(&wq->mutex);
}

/**
@@ -3766,7 +3766,7 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
/* init wq */
wq->flags = flags;
wq->saved_max_active = max_active;
- mutex_init(&wq->flush_mutex);
+ mutex_init(&wq->mutex);
atomic_set(&wq->nr_pwqs_to_flush, 0);
INIT_LIST_HEAD(&wq->pwqs);
INIT_LIST_HEAD(&wq->flusher_queue);
--
1.7.7.6

2013-03-19 19:29:13

by Lai Jiangshan

Subject: [PATCH 20/21] workqueue: modify wq->freezing only when freezable

Simplify pwq_adjust_max_active() and make freeze_workqueues_begin() and
thaw_workqueues() quickly skip non-freezable wqs.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 18 ++++++++++--------
1 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 882fe87..ce8fd7b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3487,18 +3487,15 @@ static void pwq_unbound_release_workfn(struct work_struct *work)
static void pwq_adjust_max_active(struct pool_workqueue *pwq)
{
struct workqueue_struct *wq = pwq->wq;
- bool freezable = wq->flags & WQ_FREEZABLE;

/* for @wq->saved_max_active and @wq->freezing */
lockdep_assert_held(&wq->mutex);
+ if (WARN_ON_ONCE(!(wq->flags & WQ_FREEZABLE) && wq->freezing))
+ wq->freezing = false;

- /* fast exit for non-freezable wqs */
- if (!freezable && pwq->max_active == wq->saved_max_active)
- return;
-
- if (freezable && wq->freezing) {
+ if (wq->freezing) {
pwq->max_active = 0;
- } else {
+ } else if (pwq->max_active != wq->saved_max_active) {
spin_lock_irq(&pwq->pool->lock);
pwq->max_active = wq->saved_max_active;

@@ -3711,7 +3708,8 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
mutex_lock(&wqs_mutex);

mutex_lock(&wq->mutex);
- wq->freezing = workqueue_freezing;
+ if (wq->flags & WQ_FREEZABLE)
+ wq->freezing = workqueue_freezing;
for_each_pwq(pwq, wq)
pwq_adjust_max_active(pwq);
mutex_unlock(&wq->mutex);
@@ -4206,6 +4204,8 @@ void freeze_workqueues_begin(void)
workqueue_freezing = true;

list_for_each_entry(wq, &workqueues, list) {
+ if (!(wq->flags & WQ_FREEZABLE))
+ continue;
mutex_lock(&wq->mutex);
/* set FREEZING */
WARN_ON_ONCE(wq->freezing);
@@ -4286,6 +4286,8 @@ void thaw_workqueues(void)
workqueue_freezing = false;

list_for_each_entry(wq, &workqueues, list) {
+ if (!(wq->flags & WQ_FREEZABLE))
+ continue;
mutex_lock(&wq->mutex);
/* clear FREEZING */
WARN_ON_ONCE(!wq->freezing);
--
1.7.7.6

2013-03-19 19:29:11

by Lai Jiangshan

Subject: [PATCH 18/21] workqueue: read POOL_DISASSOCIATED bit under pool->lock

Simply move the test into the pool->lock critical section.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 21ef721..0c692d4 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1611,11 +1611,10 @@ static struct worker *create_worker(struct worker_pool *pool)
* remains stable across this function. See the comments above the
* flag definition for details.
*/
+ spin_lock_irq(&pool->lock);
if (pool->flags & POOL_DISASSOCIATED)
worker->flags |= WORKER_UNBOUND;
-
/* successful, commit the pointer to idr */
- spin_lock_irq(&pool->lock);
idr_replace(&pool->worker_idr, worker, worker->id);
spin_unlock_irq(&pool->lock);

--
1.7.7.6

2013-03-19 19:29:43

by Lai Jiangshan

Subject: [PATCH 21/21] workqueue: avoid false negative in assert_manager_or_pool_lock()

If lockdep complains about something in another subsystem, it disables itself
(debug_locks is cleared) and lockdep_is_held() can return a false negative, so
we need to also test debug_locks before asserting.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ce8fd7b..13d4d46 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -304,7 +304,8 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,

#ifdef CONFIG_LOCKDEP
#define assert_manager_or_pool_lock(pool) \
- WARN_ONCE(!lockdep_is_held(&(pool)->manager_mutex) && \
+ WARN_ONCE(debug_locks && \
+ !lockdep_is_held(&(pool)->manager_mutex) && \
!lockdep_is_held(&(pool)->lock), \
"pool->manager_mutex or ->lock should be held")
#else
--
1.7.7.6

2013-03-19 19:29:08

by Lai Jiangshan

Subject: [PATCH 17/21] workqueue: simplify workqueue_cpu_up_callback(CPU_ONLINE)

If we have 4096 CPUs, workqueue_cpu_up_callback() iterates over far too many
pools. To avoid that, use for_each_cpu_worker_pool() for the per-cpu pools and
unbound_pool_hash for the unbound pools.

After this, for_each_pool() becomes unused, so remove it.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 55 +++++++++++++++++-----------------------------------
1 files changed, 18 insertions(+), 37 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 29c5baa..21ef721 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -317,23 +317,6 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
(pool)++)

/**
- * for_each_pool - iterate through all worker_pools in the system
- * @pool: iteration cursor
- * @pi: integer used for iteration
- *
- * This must be called either with pools_mutex held or sched RCU read locked.
- * If the pool needs to be used beyond the locking in effect, the caller is
- * responsible for guaranteeing that the pool stays online.
- *
- * The if/else clause exists only for the lockdep assertion and can be
- * ignored.
- */
-#define for_each_pool(pool, pi) \
- idr_for_each_entry(&worker_pool_idr, pool, pi) \
- if (({ assert_rcu_or_pools_mutex(); false; })) { } \
- else
-
-/**
* for_each_pool_worker - iterate through all workers of a worker_pool
* @worker: iteration cursor
* @wi: integer used for iteration
@@ -4011,7 +3994,7 @@ static void associate_cpu_pool(struct worker_pool *pool)
struct worker *worker;
int wi;

- lockdep_assert_held(&pool->manager_mutex);
+ mutex_lock(&pool->manager_mutex);

/*
* Restore CPU affinity of all workers. As all idle workers should
@@ -4023,7 +4006,7 @@ static void associate_cpu_pool(struct worker_pool *pool)
for_each_pool_worker(worker, wi, pool)
if (WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
pool->attrs->cpumask) < 0))
- return;
+ goto out_unlock;

spin_lock_irq(&pool->lock);

@@ -4064,6 +4047,9 @@ static void associate_cpu_pool(struct worker_pool *pool)

pool->flags &= ~POOL_DISASSOCIATED;
spin_unlock_irq(&pool->lock);
+
+out_unlock:
+ mutex_unlock(&pool->manager_mutex);
}

/**
@@ -4078,25 +4064,28 @@ static void associate_cpu_pool(struct worker_pool *pool)
*/
static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
{
- static cpumask_t cpumask;
+ static cpumask_t cpumask; /* protected by pools_mutex */
struct worker *worker;
int wi;

- lockdep_assert_held(&pool->manager_mutex);
+ mutex_lock(&pool->manager_mutex);

/* is @cpu allowed for @pool? */
if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
- return;
+ goto out_unlock;

/* is @cpu the only online CPU? */
cpumask_and(&cpumask, pool->attrs->cpumask, cpu_online_mask);
if (cpumask_weight(&cpumask) != 1)
- return;
+ goto out_unlock;

/* as we're called from CPU_ONLINE, the following shouldn't fail */
for_each_pool_worker(worker, wi, pool)
WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
pool->attrs->cpumask) < 0);
+
+out_unlock:
+ mutex_unlock(&pool->manager_mutex);
}

/*
@@ -4109,7 +4098,7 @@ static int __cpuinit workqueue_cpu_up_callback(struct notifier_block *nfb,
{
int cpu = (unsigned long)hcpu;
struct worker_pool *pool;
- int pi;
+ int bkt;

switch (action & ~CPU_TASKS_FROZEN) {
case CPU_UP_PREPARE:
@@ -4123,20 +4112,12 @@ static int __cpuinit workqueue_cpu_up_callback(struct notifier_block *nfb,

case CPU_DOWN_FAILED:
case CPU_ONLINE:
- mutex_lock(&pools_mutex);
-
- for_each_pool(pool, pi) {
- mutex_lock(&pool->manager_mutex);
-
- if (pool->cpu == cpu) {
- associate_cpu_pool(pool);
- } else if (pool->cpu < 0) {
- restore_unbound_workers_cpumask(pool, cpu);
- }
-
- mutex_unlock(&pool->manager_mutex);
- }
+ for_each_cpu_worker_pool(pool, cpu)
+ associate_cpu_pool(pool);

+ mutex_lock(&pools_mutex);
+ hash_for_each(unbound_pool_hash, bkt, pool, hash_node)
+ restore_unbound_workers_cpumask(pool, cpu);
mutex_unlock(&pools_mutex);
break;
}
--
1.7.7.6

2013-03-19 19:30:04

by Lai Jiangshan

Subject: [PATCH 19/21] workqueue: remove @p_last_pwq from init_and_link_pwq()

Make init_and_link_pwq() simpler.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 19 +++++++++----------
1 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0c692d4..882fe87 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3519,10 +3519,10 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)

static void init_and_link_pwq(struct pool_workqueue *pwq,
struct workqueue_struct *wq,
- struct worker_pool *pool,
- struct pool_workqueue **p_last_pwq)
+ struct worker_pool *pool)
{
BUG_ON((unsigned long)pwq & WORK_STRUCT_FLAG_MASK);
+ lockdep_assert_held(&wq->mutex);

pwq->pool = pool;
pwq->wq = wq;
@@ -3532,14 +3532,10 @@ static void init_and_link_pwq(struct pool_workqueue *pwq,
INIT_LIST_HEAD(&pwq->mayday_node);
INIT_WORK(&pwq->unbound_release_work, pwq_unbound_release_workfn);

- mutex_lock(&wq->mutex);
-
/*
* Set the matching work_color. This is synchronized with
* wq->mutex to avoid confusing flush_workqueue().
*/
- if (p_last_pwq)
- *p_last_pwq = first_pwq(wq);
pwq->work_color = wq->work_color;

/* sync max_active to the current setting */
@@ -3547,8 +3543,6 @@ static void init_and_link_pwq(struct pool_workqueue *pwq,

/* link in @pwq */
list_add_rcu(&pwq->pwqs_node, &wq->pwqs);
-
- mutex_unlock(&wq->mutex);
}

/**
@@ -3589,12 +3583,15 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
return -ENOMEM;
}

- init_and_link_pwq(pwq, wq, pool, &last_pwq);
+ mutex_lock(&wq->mutex);
+ last_pwq = first_pwq(wq);
+ init_and_link_pwq(pwq, wq, pool);
if (last_pwq) {
spin_lock_irq(&last_pwq->pool->lock);
put_pwq(last_pwq);
spin_unlock_irq(&last_pwq->pool->lock);
}
+ mutex_unlock(&wq->mutex);

return 0;
}
@@ -3609,14 +3606,16 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq)
if (!wq->cpu_pwqs)
return -ENOMEM;

+ mutex_lock(&wq->mutex);
for_each_possible_cpu(cpu) {
struct pool_workqueue *pwq =
per_cpu_ptr(wq->cpu_pwqs, cpu);
struct worker_pool *cpu_pools =
per_cpu(cpu_worker_pools, cpu);

- init_and_link_pwq(pwq, wq, &cpu_pools[highpri], NULL);
+ init_and_link_pwq(pwq, wq, &cpu_pools[highpri]);
}
+ mutex_unlock(&wq->mutex);
return 0;
} else {
return apply_workqueue_attrs(wq, unbound_std_wq_attrs[highpri]);
--
1.7.7.6

2013-03-19 19:29:03

by Lai Jiangshan

Subject: [PATCH 14/21] workqueue: add wq->freezing and remove POOL_FREEZING

Freezing has nothing to do with pools, but POOL_FREEZING forces a connection.
This causes:
workqueue_freezing to be protected by both wqs_mutex and pools_mutex;
freeze_workqueues_begin() and thaw_workqueues() to be very complicated.

Since freezing is a workqueue-instance attribute, introduce wq->freezing
instead and remove POOL_FREEZING. This solves all of the above problems.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 49 +++++++++++++++----------------------------------
1 files changed, 15 insertions(+), 34 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e45f038..db08b63 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -66,7 +66,6 @@ enum {
*/
POOL_MANAGE_WORKERS = 1 << 0, /* need to manage workers */
POOL_DISASSOCIATED = 1 << 2, /* cpu can't serve workers */
- POOL_FREEZING = 1 << 3, /* freeze in progress */

/* worker flags */
WORKER_STARTED = 1 << 0, /* started */
@@ -243,6 +242,7 @@ struct workqueue_struct {

int nr_drainers; /* Q: drain in progress */
int saved_max_active; /* Q: saved pwq max_active */
+ bool freezing; /* Q&QS: the wq is freezing */

#ifdef CONFIG_SYSFS
struct wq_device *wq_dev; /* I: for sysfs interface */
@@ -260,7 +260,7 @@ static DEFINE_MUTEX(pools_mutex); /* protects pools */
static DEFINE_SPINLOCK(wq_mayday_lock); /* protects wq->maydays list */

static LIST_HEAD(workqueues); /* QS: list of all workqueues */
-static bool workqueue_freezing; /* QS&PS: have wqs started freezing? */
+static bool workqueue_freezing; /* QS: have wqs started freezing? */

/* the per-cpu worker pools */
static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
@@ -3495,9 +3495,6 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
if (!pool || init_worker_pool(pool) < 0)
goto fail;

- if (workqueue_freezing)
- pool->flags |= POOL_FREEZING;
-
lockdep_set_subclass(&pool->lock, 1); /* see put_pwq() */
copy_workqueue_attrs(pool->attrs, attrs);

@@ -3573,18 +3570,17 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
struct workqueue_struct *wq = pwq->wq;
bool freezable = wq->flags & WQ_FREEZABLE;

- /* for @wq->saved_max_active */
+ /* for @wq->saved_max_active and @wq->freezing */
lockdep_assert_held(&wq->mutex);

/* fast exit for non-freezable wqs */
if (!freezable && pwq->max_active == wq->saved_max_active)
return;

- spin_lock_irq(&pwq->pool->lock);
-
- if (freezable && (pwq->pool->flags & POOL_FREEZING)) {
+ if (freezable && wq->freezing) {
pwq->max_active = 0;
} else {
+ spin_lock_irq(&pwq->pool->lock);
pwq->max_active = wq->saved_max_active;

while (!list_empty(&pwq->delayed_works) &&
@@ -3598,9 +3594,8 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
* But this function is slowpath, wake up worker unconditionally
*/
wake_up_worker(pwq->pool);
+ spin_unlock_irq(&pwq->pool->lock);
}
-
- spin_unlock_irq(&pwq->pool->lock);
}

static void init_and_link_pwq(struct pool_workqueue *pwq,
@@ -3798,6 +3793,7 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
mutex_lock(&wqs_mutex);

mutex_lock(&wq->mutex);
+ wq->freezing = workqueue_freezing;
for_each_pwq(pwq, wq)
pwq_adjust_max_active(pwq);
mutex_unlock(&wq->mutex);
@@ -4285,28 +4281,20 @@ EXPORT_SYMBOL_GPL(work_on_cpu);
*/
void freeze_workqueues_begin(void)
{
- struct worker_pool *pool;
struct workqueue_struct *wq;
struct pool_workqueue *pwq;
- int pi;

mutex_lock(&wqs_mutex);

/* set FREEZING */
- mutex_lock(&pools_mutex);
WARN_ON_ONCE(workqueue_freezing);
workqueue_freezing = true;

- for_each_pool(pool, pi) {
- spin_lock_irq(&pool->lock);
- WARN_ON_ONCE(pool->flags & POOL_FREEZING);
- pool->flags |= POOL_FREEZING;
- spin_unlock_irq(&pool->lock);
- }
- mutex_unlock(&pools_mutex);
-
list_for_each_entry(wq, &workqueues, list) {
mutex_lock(&wq->mutex);
+ /* set FREEZING */
+ WARN_ON_ONCE(wq->freezing);
+ wq->freezing = true;
for_each_pwq(pwq, wq)
pwq_adjust_max_active(pwq);
mutex_unlock(&wq->mutex);
@@ -4374,28 +4362,21 @@ void thaw_workqueues(void)
{
struct workqueue_struct *wq;
struct pool_workqueue *pwq;
- struct worker_pool *pool;
- int pi;

mutex_lock(&wqs_mutex);

if (!workqueue_freezing)
goto out_unlock;

- /* clear FREEZING */
- mutex_lock(&pools_mutex);
workqueue_freezing = false;
- for_each_pool(pool, pi) {
- spin_lock_irq(&pool->lock);
- WARN_ON_ONCE(!(pool->flags & POOL_FREEZING));
- pool->flags &= ~POOL_FREEZING;
- spin_unlock_irq(&pool->lock);
- }
- mutex_unlock(&pools_mutex);

- /* restore max_active and repopulate worklist */
list_for_each_entry(wq, &workqueues, list) {
mutex_lock(&wq->mutex);
+ /* clear FREEZING */
+ WARN_ON_ONCE(!wq->freezing);
+ wq->freezing = false;
+
+ /* restore max_active and repopulate worklist */
for_each_pwq(pwq, wq)
pwq_adjust_max_active(pwq);
mutex_unlock(&wq->mutex);
--
1.7.7.6

2013-03-19 19:30:48

by Lai Jiangshan

Subject: [PATCH 16/21] workqueue: rename rebind_workers() to associate_cpu_pool()

Merge the code that clears POOL_DISASSOCIATED into rebind_workers(), and
rename rebind_workers() to associate_cpu_pool().

This groups highly related code together and simplifies
workqueue_cpu_up_callback().

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 21 ++++++++++-----------
1 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 12e3e3a..29c5baa 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2167,7 +2167,7 @@ recheck:
* worker or that someone else has already assumed the manager
* role. This is where @worker starts participating in concurrency
* management if applicable and concurrency management is restored
- * after being rebound. See rebind_workers() for details.
+ * after being rebound. See associate_cpu_pool() for details.
*/
worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);

@@ -4000,12 +4000,13 @@ static void wq_unbind_fn(struct work_struct *work)
}

/**
- * rebind_workers - rebind all workers of a pool to the associated CPU
+ * associate_cpu_pool - rebind all workers of a pool to the associated CPU
* @pool: pool of interest
*
- * @pool->cpu is coming online. Rebind all workers to the CPU.
+ * @pool->cpu is coming online. Rebind all workers to the CPU and
+ * set the pool associated
*/
-static void rebind_workers(struct worker_pool *pool)
+static void associate_cpu_pool(struct worker_pool *pool)
{
struct worker *worker;
int wi;
@@ -4020,8 +4021,9 @@ static void rebind_workers(struct worker_pool *pool)
* from CPU_ONLINE, the following shouldn't fail.
*/
for_each_pool_worker(worker, wi, pool)
- WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
- pool->attrs->cpumask) < 0);
+ if (WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
+ pool->attrs->cpumask) < 0))
+ return;

spin_lock_irq(&pool->lock);

@@ -4060,6 +4062,7 @@ static void rebind_workers(struct worker_pool *pool)
ACCESS_ONCE(worker->flags) = worker_flags;
}

+ pool->flags &= ~POOL_DISASSOCIATED;
spin_unlock_irq(&pool->lock);
}

@@ -4126,11 +4129,7 @@ static int __cpuinit workqueue_cpu_up_callback(struct notifier_block *nfb,
mutex_lock(&pool->manager_mutex);

if (pool->cpu == cpu) {
- spin_lock_irq(&pool->lock);
- pool->flags &= ~POOL_DISASSOCIATED;
- spin_unlock_irq(&pool->lock);
-
- rebind_workers(pool);
+ associate_cpu_pool(pool);
} else if (pool->cpu < 0) {
restore_unbound_workers_cpumask(pool, cpu);
}
--
1.7.7.6

2013-03-19 19:31:12

by Lai Jiangshan

Subject: [PATCH 15/21] workqueue: remove worker_maybe_bind_and_lock()

worker_maybe_bind_and_lock() is only used by rescuers, and the binding can
fail for a rescuer anyway, so calling set_cpus_allowed_ptr() directly is
enough.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 67 +--------------------------------------------------
1 files changed, 2 insertions(+), 65 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index db08b63..12e3e3a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1545,70 +1545,6 @@ static void worker_leave_idle(struct worker *worker)
list_del_init(&worker->entry);
}

-/**
- * worker_maybe_bind_and_lock - try to bind %current to worker_pool and lock it
- * @pool: target worker_pool
- *
- * Bind %current to the cpu of @pool if it is associated and lock @pool.
- *
- * Works which are scheduled while the cpu is online must at least be
- * scheduled to a worker which is bound to the cpu so that if they are
- * flushed from cpu callbacks while cpu is going down, they are
- * guaranteed to execute on the cpu.
- *
- * This function is to be used by unbound workers and rescuers to bind
- * themselves to the target cpu and may race with cpu going down or
- * coming online. kthread_bind() can't be used because it may put the
- * worker to already dead cpu and set_cpus_allowed_ptr() can't be used
- * verbatim as it's best effort and blocking and pool may be
- * [dis]associated in the meantime.
- *
- * This function tries set_cpus_allowed() and locks pool and verifies the
- * binding against %POOL_DISASSOCIATED which is set during
- * %CPU_DOWN_PREPARE and cleared during %CPU_ONLINE, so if the worker
- * enters idle state or fetches works without dropping lock, it can
- * guarantee the scheduling requirement described in the first paragraph.
- *
- * CONTEXT:
- * Might sleep. Called without any lock but returns with pool->lock
- * held.
- *
- * RETURNS:
- * %true if the associated pool is online (@worker is successfully
- * bound), %false if offline.
- */
-static bool worker_maybe_bind_and_lock(struct worker_pool *pool)
-__acquires(&pool->lock)
-{
- while (true) {
- /*
- * The following call may fail, succeed or succeed
- * without actually migrating the task to the cpu if
- * it races with cpu hotunplug operation. Verify
- * against POOL_DISASSOCIATED.
- */
- if (!(pool->flags & POOL_DISASSOCIATED))
- set_cpus_allowed_ptr(current, pool->attrs->cpumask);
-
- spin_lock_irq(&pool->lock);
- if (pool->flags & POOL_DISASSOCIATED)
- return false;
- if (task_cpu(current) == pool->cpu &&
- cpumask_equal(&current->cpus_allowed, pool->attrs->cpumask))
- return true;
- spin_unlock_irq(&pool->lock);
-
- /*
- * We've raced with CPU hot[un]plug. Give it a breather
- * and retry migration. cond_resched() is required here;
- * otherwise, we might deadlock against cpu_stop trying to
- * bring down the CPU on non-preemptive kernel.
- */
- cpu_relax();
- cond_resched();
- }
-}
-
static struct worker *alloc_worker(void)
{
struct worker *worker;
@@ -2326,7 +2262,8 @@ repeat:
spin_unlock_irq(&wq_mayday_lock);

/* migrate to the target cpu if possible */
- worker_maybe_bind_and_lock(pool);
+ set_cpus_allowed_ptr(current, pool->attrs->cpumask);
+ spin_lock_irq(&pool->lock);
rescuer->pool = pool;

/*
--
1.7.7.6

2013-03-19 19:29:00

by Lai Jiangshan

Subject: [PATCH 12/21] workqueue: use wq->mutex to protect saved_max_active

saved_max_active is a per-instance field, so use wq->mutex to protect
saved_max_active and pwq_adjust_max_active().

This patch also converts the for_each_pwq() loops around
pwq_adjust_max_active() to be protected by wq->mutex.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 29 ++++++++++++++---------------
1 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 41e7737..a3460e7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -245,7 +245,7 @@ struct workqueue_struct {
struct worker *rescuer; /* I: rescue worker */

int nr_drainers; /* Q: drain in progress */
- int saved_max_active; /* PW: saved pwq max_active */
+ int saved_max_active; /* Q: saved pwq max_active */

#ifdef CONFIG_SYSFS
struct wq_device *wq_dev; /* I: for sysfs interface */
@@ -3581,13 +3581,13 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
bool freezable = wq->flags & WQ_FREEZABLE;

/* for @wq->saved_max_active */
- lockdep_assert_held(&pwq_lock);
+ lockdep_assert_held(&wq->mutex);

/* fast exit for non-freezable wqs */
if (!freezable && pwq->max_active == wq->saved_max_active)
return;

- spin_lock(&pwq->pool->lock);
+ spin_lock_irq(&pwq->pool->lock);

if (freezable && (pwq->pool->flags & POOL_FREEZING)) {
pwq->max_active = 0;
@@ -3607,7 +3607,7 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
wake_up_worker(pwq->pool);
}

- spin_unlock(&pwq->pool->lock);
+ spin_unlock_irq(&pwq->pool->lock);
}

static void init_and_link_pwq(struct pool_workqueue *pwq,
@@ -3626,7 +3626,6 @@ static void init_and_link_pwq(struct pool_workqueue *pwq,
INIT_WORK(&pwq->unbound_release_work, pwq_unbound_release_workfn);

mutex_lock(&wq->mutex);
- spin_lock_irq(&pwq_lock);

/*
* Set the matching work_color. This is synchronized with
@@ -3640,9 +3639,10 @@ static void init_and_link_pwq(struct pool_workqueue *pwq,
pwq_adjust_max_active(pwq);

/* link in @pwq */
+ spin_lock_irq(&pwq_lock);
list_add_rcu(&pwq->pwqs_node, &wq->pwqs);
-
spin_unlock_irq(&pwq_lock);
+
mutex_unlock(&wq->mutex);
}

@@ -3806,10 +3806,10 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
*/
mutex_lock(&wqs_mutex);

- spin_lock_irq(&pwq_lock);
+ mutex_lock(&wq->mutex);
for_each_pwq(pwq, wq)
pwq_adjust_max_active(pwq);
- spin_unlock_irq(&pwq_lock);
+ mutex_unlock(&wq->mutex);

list_add(&wq->list, &workqueues);

@@ -3920,14 +3920,14 @@ void workqueue_set_max_active(struct workqueue_struct *wq, int max_active)

max_active = wq_clamp_max_active(max_active, wq->flags, wq->name);

- spin_lock_irq(&pwq_lock);
+ mutex_lock(&wq->mutex);

wq->saved_max_active = max_active;

for_each_pwq(pwq, wq)
pwq_adjust_max_active(pwq);

- spin_unlock_irq(&pwq_lock);
+ mutex_unlock(&wq->mutex);
}
EXPORT_SYMBOL_GPL(workqueue_set_max_active);

@@ -4314,13 +4314,12 @@ void freeze_workqueues_begin(void)
}
mutex_unlock(&pools_mutex);

- /* suppress further executions by setting max_active to zero */
- spin_lock_irq(&pwq_lock);
list_for_each_entry(wq, &workqueues, list) {
+ mutex_lock(&wq->mutex);
for_each_pwq(pwq, wq)
pwq_adjust_max_active(pwq);
+ mutex_unlock(&wq->mutex);
}
- spin_unlock_irq(&pwq_lock);

mutex_unlock(&wqs_mutex);
}
@@ -4404,12 +4403,12 @@ void thaw_workqueues(void)
mutex_unlock(&pools_mutex);

/* restore max_active and repopulate worklist */
- spin_lock_irq(&pwq_lock);
list_for_each_entry(wq, &workqueues, list) {
+ mutex_lock(&wq->mutex);
for_each_pwq(pwq, wq)
pwq_adjust_max_active(pwq);
+ mutex_unlock(&wq->mutex);
}
- spin_unlock_irq(&pwq_lock);
out_unlock:
mutex_unlock(&wqs_mutex);
}
--
1.7.7.6

2013-03-19 19:31:39

by Lai Jiangshan

Subject: [PATCH 13/21] workqueue: remove unused pwq_lock

All first_pwq() and for_each_pwq() users are now protected by wq->mutex or
sched-RCU, so the freaky pwq_lock is unused.

Simply remove it.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 31 +++++++++++--------------------
1 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a3460e7..e45f038 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -127,12 +127,9 @@ enum {
*
* PR: pools_mutex protected for writes. Sched-RCU protected for reads.
*
- * PW: pwq_lock protected.
- *
* Q: wq->mutex protected.
*
- * QR: wq->mutex and pwq_lock protected for writes. Sched-RCU
- * protected for reads.
+ * QR: wq->mutex protected for writes. Sched-RCU protected for reads.
*
* MD: wq_mayday_lock protected.
*/
@@ -206,7 +203,7 @@ struct pool_workqueue {
* Release of unbound pwq is punted to system_wq. See put_pwq()
* and pwq_unbound_release_workfn() for details. pool_workqueue
* itself is also sched-RCU protected so that the first pwq can be
- * determined without grabbing pwq_lock.
+ * determined without grabbing wq->mutex.
*/
struct work_struct unbound_release_work;
struct rcu_head rcu;
@@ -260,7 +257,6 @@ static struct kmem_cache *pwq_cache;

static DEFINE_MUTEX(wqs_mutex); /* protects workqueues */
static DEFINE_MUTEX(pools_mutex); /* protects pools */
-static DEFINE_SPINLOCK(pwq_lock); /* protects pool_workqueues */
static DEFINE_SPINLOCK(wq_mayday_lock); /* protects wq->maydays list */

static LIST_HEAD(workqueues); /* QS: list of all workqueues */
@@ -301,11 +297,10 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
lockdep_is_held(&pools_mutex), \
"sched RCU or pools_mutex should be held")

-#define assert_rcu_or_pwq_lock(wq) \
+#define assert_rcu_or_wq_mutex(wq) \
rcu_lockdep_assert(rcu_read_lock_sched_held() || \
- lockdep_is_held(&wq->mutex) || \
- lockdep_is_held(&pwq_lock), \
- "sched RCU or pwq_lock should be held")
+ lockdep_is_held(&wq->mutex), \
+ "sched RCU or wq->mutex should be held")

#ifdef CONFIG_LOCKDEP
#define assert_manager_or_pool_lock(pool) \
@@ -359,7 +354,7 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
* @pwq: iteration cursor
* @wq: the target workqueue
*
- * This must be called either with pwq_lock held or sched RCU read locked.
+ * This must be called either with wq->mutex held or sched RCU read locked.
* If the pwq needs to be used beyond the locking in effect, the caller is
* responsible for guaranteeing that the pwq stays online.
*
@@ -368,7 +363,7 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
*/
#define for_each_pwq(pwq, wq) \
list_for_each_entry_rcu((pwq), &(wq)->pwqs, pwqs_node) \
- if (({ assert_rcu_or_pwq_lock(wq); false; })) { } \
+ if (({ assert_rcu_or_wq_mutex(wq); false; })) { } \
else

#ifdef CONFIG_DEBUG_OBJECTS_WORK
@@ -507,13 +502,13 @@ static int worker_pool_assign_id(struct worker_pool *pool)
* first_pwq - return the first pool_workqueue of the specified workqueue
* @wq: the target workqueue
*
- * This must be called either with pwq_lock held or sched RCU read locked.
+ * This must be called either with wq->mutex held or sched RCU read locked.
* If the pwq needs to be used beyond the locking in effect, the caller is
* responsible for guaranteeing that the pwq stays online.
*/
static struct pool_workqueue *first_pwq(struct workqueue_struct *wq)
{
- assert_rcu_or_pwq_lock(wq);
+ assert_rcu_or_wq_mutex(wq);
return list_first_or_null_rcu(&wq->pwqs, struct pool_workqueue,
pwqs_node);
}
@@ -3551,9 +3546,7 @@ static void pwq_unbound_release_workfn(struct work_struct *work)
* and consistent with the linking path.
*/
mutex_lock(&wq->mutex);
- spin_lock_irq(&pwq_lock);
list_del_rcu(&pwq->pwqs_node);
- spin_unlock_irq(&pwq_lock);
mutex_unlock(&wq->mutex);

put_unbound_pool(pool);
@@ -3639,9 +3632,7 @@ static void init_and_link_pwq(struct pool_workqueue *pwq,
pwq_adjust_max_active(pwq);

/* link in @pwq */
- spin_lock_irq(&pwq_lock);
list_add_rcu(&pwq->pwqs_node, &wq->pwqs);
- spin_unlock_irq(&pwq_lock);

mutex_unlock(&wq->mutex);
}
@@ -4290,7 +4281,7 @@ EXPORT_SYMBOL_GPL(work_on_cpu);
* pool->worklist.
*
* CONTEXT:
- * Grabs and releases wqs_mutex, pwq_lock and pool->lock's.
+ * Grabs and releases wqs_mutex, wq->mutex and pool->lock's.
*/
void freeze_workqueues_begin(void)
{
@@ -4377,7 +4368,7 @@ out_unlock:
* frozen works are transferred to their respective pool worklists.
*
* CONTEXT:
- * Grabs and releases wqs_mutex, pwq_lock and pool->lock's.
+ * Grabs and releases wqs_mutex, wq->mutex and pool->lock's.
*/
void thaw_workqueues(void)
{
--
1.7.7.6

2013-03-19 19:32:14

by Lai Jiangshan

Subject: [PATCH 11/21] workqueue: also allowed wq->mutex protect for_each_pwq()

1) The pwqs of a wq are modified with wq->mutex held, so wq->mutex can also be
used for read access such as for_each_pwq().
2) The pwqs of a wq are a per-instance attribute; reviewers expect wq->mutex
to protect them.
3) This patch is the most important step toward removing pwq_lock: wq->mutex
alone is enough, there is no need for two locks.
4) After it, flush_workqueue_prep_pwqs() and drain_workqueue() no longer
disable irqs for such long stretches; disabling irqs for a long time adds
hardirq latency and hurts RT users.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 34 ++++++++++++++--------------------
1 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 469269e..41e7737 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -301,8 +301,9 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
lockdep_is_held(&pools_mutex), \
"sched RCU or pools_mutex should be held")

-#define assert_rcu_or_pwq_lock() \
+#define assert_rcu_or_pwq_lock(wq) \
rcu_lockdep_assert(rcu_read_lock_sched_held() || \
+ lockdep_is_held(&wq->mutex) || \
lockdep_is_held(&pwq_lock), \
"sched RCU or pwq_lock should be held")

@@ -367,7 +368,7 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
*/
#define for_each_pwq(pwq, wq) \
list_for_each_entry_rcu((pwq), &(wq)->pwqs, pwqs_node) \
- if (({ assert_rcu_or_pwq_lock(); false; })) { } \
+ if (({ assert_rcu_or_pwq_lock(wq); false; })) { } \
else

#ifdef CONFIG_DEBUG_OBJECTS_WORK
@@ -512,7 +513,7 @@ static int worker_pool_assign_id(struct worker_pool *pool)
*/
static struct pool_workqueue *first_pwq(struct workqueue_struct *wq)
{
- assert_rcu_or_pwq_lock();
+ assert_rcu_or_pwq_lock(wq);
return list_first_or_null_rcu(&wq->pwqs, struct pool_workqueue,
pwqs_node);
}
@@ -2479,12 +2480,10 @@ static bool flush_workqueue_prep_pwqs(struct workqueue_struct *wq,
atomic_set(&wq->nr_pwqs_to_flush, 1);
}

- local_irq_disable();
-
for_each_pwq(pwq, wq) {
struct worker_pool *pool = pwq->pool;

- spin_lock(&pool->lock);
+ spin_lock_irq(&pool->lock);

if (flush_color >= 0) {
WARN_ON_ONCE(pwq->flush_color != -1);
@@ -2501,11 +2500,9 @@ static bool flush_workqueue_prep_pwqs(struct workqueue_struct *wq,
pwq->work_color = work_color;
}

- spin_unlock(&pool->lock);
+ spin_unlock_irq(&pool->lock);
}

- local_irq_enable();
-
if (flush_color >= 0 && atomic_dec_and_test(&wq->nr_pwqs_to_flush))
complete(&wq->first_flusher->done);

@@ -2693,14 +2690,14 @@ void drain_workqueue(struct workqueue_struct *wq)
reflush:
flush_workqueue(wq);

- local_irq_disable();
+ mutex_lock(&wq->mutex);

for_each_pwq(pwq, wq) {
bool drained;

- spin_lock(&pwq->pool->lock);
+ spin_lock_irq(&pwq->pool->lock);
drained = !pwq->nr_active && list_empty(&pwq->delayed_works);
- spin_unlock(&pwq->pool->lock);
+ spin_unlock_irq(&pwq->pool->lock);

if (drained)
continue;
@@ -2710,13 +2707,10 @@ reflush:
pr_warn("workqueue %s: drain_workqueue() isn't complete after %u tries\n",
wq->name, flush_cnt);

- local_irq_enable();
+ mutex_unlock(&wq->mutex);
goto reflush;
}

- local_irq_enable();
-
- mutex_lock(&wq->mutex);
if (!--wq->nr_drainers)
wq->flags &= ~__WQ_DRAINING;
mutex_unlock(&wq->mutex);
@@ -3846,13 +3840,13 @@ void destroy_workqueue(struct workqueue_struct *wq)
drain_workqueue(wq);

/* sanity checks */
- spin_lock_irq(&pwq_lock);
+ mutex_lock(&wq->mutex);
for_each_pwq(pwq, wq) {
int i;

for (i = 0; i < WORK_NR_COLORS; i++) {
if (WARN_ON(pwq->nr_in_flight[i])) {
- spin_unlock_irq(&pwq_lock);
+ mutex_unlock(&wq->mutex);
return;
}
}
@@ -3860,11 +3854,11 @@ void destroy_workqueue(struct workqueue_struct *wq)
if (WARN_ON(pwq->refcnt > 1) ||
WARN_ON(pwq->nr_active) ||
WARN_ON(!list_empty(&pwq->delayed_works))) {
- spin_unlock_irq(&pwq_lock);
+ mutex_unlock(&wq->mutex);
return;
}
}
- spin_unlock_irq(&pwq_lock);
+ mutex_unlock(&wq->mutex);

/*
* wq list is used to freeze wq, remove from list after
--
1.7.7.6

2013-03-19 19:32:34

by Lai Jiangshan

Subject: [PATCH 10/21] workqueue: use rcu_read_lock_sched() instead for accessing pwq in RCU

rcu_read_lock_sched() is better than preempt_disable() when the code is
protected by sched-RCU.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ad190cd..469269e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3967,7 +3967,7 @@ bool workqueue_congested(int cpu, struct workqueue_struct *wq)
struct pool_workqueue *pwq;
bool ret;

- preempt_disable();
+ rcu_read_lock_sched();

if (!(wq->flags & WQ_UNBOUND))
pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
@@ -3975,7 +3975,7 @@ bool workqueue_congested(int cpu, struct workqueue_struct *wq)
pwq = first_pwq(wq);

ret = !list_empty(&pwq->delayed_works);
- preempt_enable();
+ rcu_read_unlock_sched();

return ret;
}
@@ -4361,16 +4361,16 @@ bool freeze_workqueues_busy(void)
* nr_active is monotonically decreasing. It's safe
* to peek without lock.
*/
- preempt_disable();
+ rcu_read_lock_sched();
for_each_pwq(pwq, wq) {
WARN_ON_ONCE(pwq->nr_active < 0);
if (pwq->nr_active) {
busy = true;
- preempt_enable();
+ rcu_read_unlock_sched();
goto out_unlock;
}
}
- preempt_enable();
+ rcu_read_unlock_sched();
}
out_unlock:
mutex_unlock(&wqs_mutex);
--
1.7.7.6

2013-03-19 19:28:54

by Lai Jiangshan

Subject: [PATCH 09/21] workqueue: use wq->mutex to protects ->nr_drainers and __WQ_DRAINING

wq->nr_drainers is not a field about all workqueueS; it is a field of its own
workqueue only.

Convert it to be protected by wq->mutex.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4ae6ba7..ad190cd 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -228,7 +228,7 @@ struct wq_device;
* the appropriate worker_pool through its pool_workqueues.
*/
struct workqueue_struct {
- unsigned int flags; /* QS: WQ_* flags */
+ unsigned int flags; /* Q: WQ_* flags */
struct pool_workqueue __percpu *cpu_pwqs; /* I: per-cpu pwq's */
struct list_head pwqs; /* QR: all pwqs of this wq */
struct list_head list; /* QS: list of all workqueues */
@@ -244,7 +244,7 @@ struct workqueue_struct {
struct list_head maydays; /* MD: pwqs requesting rescue */
struct worker *rescuer; /* I: rescue worker */

- int nr_drainers; /* QS: drain in progress */
+ int nr_drainers; /* Q: drain in progress */
int saved_max_active; /* PW: saved pwq max_active */

#ifdef CONFIG_SYSFS
@@ -2686,10 +2686,10 @@ void drain_workqueue(struct workqueue_struct *wq)
* hotter than drain_workqueue() and already looks at @wq->flags.
* Use __WQ_DRAINING so that queue doesn't have to check nr_drainers.
*/
- mutex_lock(&wqs_mutex);
+ mutex_lock(&wq->mutex);
if (!wq->nr_drainers++)
wq->flags |= __WQ_DRAINING;
- mutex_unlock(&wqs_mutex);
+ mutex_unlock(&wq->mutex);
reflush:
flush_workqueue(wq);

@@ -2716,10 +2716,10 @@ reflush:

local_irq_enable();

- mutex_lock(&wqs_mutex);
+ mutex_lock(&wq->mutex);
if (!--wq->nr_drainers)
wq->flags &= ~__WQ_DRAINING;
- mutex_unlock(&wqs_mutex);
+ mutex_unlock(&wq->mutex);
}
EXPORT_SYMBOL_GPL(drain_workqueue);

--
1.7.7.6

2013-03-19 19:33:05

by Lai Jiangshan

Subject: [PATCH 07/21] workqueue: rename wq_mutex to wqs_mutex

Just a simple rename: because it protects workqueueS, add an "s" to the name,
matching pools_mutex.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 50 +++++++++++++++++++++++++-------------------------
1 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index cc5eb61..5252107 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -123,7 +123,7 @@ enum {
* MG: pool->manager_mutex and pool->lock protected. Writes require both
* locks. Reads can happen under either lock.
*
- * WQ: wq_mutex protected.
+ * QS: wqs_mutex protected.
*
* PS: pools_mutex protected.
*
@@ -228,10 +228,10 @@ struct wq_device;
* the appropriate worker_pool through its pool_workqueues.
*/
struct workqueue_struct {
- unsigned int flags; /* WQ: WQ_* flags */
+ unsigned int flags; /* QS: WQ_* flags */
struct pool_workqueue __percpu *cpu_pwqs; /* I: per-cpu pwq's */
struct list_head pwqs; /* FR: all pwqs of this wq */
- struct list_head list; /* WQ: list of all workqueues */
+ struct list_head list; /* QS: list of all workqueues */

struct mutex flush_mutex; /* protects wq flushing */
int work_color; /* F: current work color */
@@ -244,7 +244,7 @@ struct workqueue_struct {
struct list_head maydays; /* MD: pwqs requesting rescue */
struct worker *rescuer; /* I: rescue worker */

- int nr_drainers; /* WQ: drain in progress */
+ int nr_drainers; /* QS: drain in progress */
int saved_max_active; /* PW: saved pwq max_active */

#ifdef CONFIG_SYSFS
@@ -258,13 +258,13 @@ struct workqueue_struct {

static struct kmem_cache *pwq_cache;

-static DEFINE_MUTEX(wq_mutex); /* protects workqueues */
+static DEFINE_MUTEX(wqs_mutex); /* protects workqueues */
static DEFINE_MUTEX(pools_mutex); /* protects pools */
static DEFINE_SPINLOCK(pwq_lock); /* protects pool_workqueues */
static DEFINE_SPINLOCK(wq_mayday_lock); /* protects wq->maydays list */

-static LIST_HEAD(workqueues); /* WQ: list of all workqueues */
-static bool workqueue_freezing; /* WQ&PS: have wqs started freezing? */
+static LIST_HEAD(workqueues); /* QS: list of all workqueues */
+static bool workqueue_freezing; /* QS&PS: have wqs started freezing? */

/* the per-cpu worker pools */
static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
@@ -2686,10 +2686,10 @@ void drain_workqueue(struct workqueue_struct *wq)
* hotter than drain_workqueue() and already looks at @wq->flags.
* Use __WQ_DRAINING so that queue doesn't have to check nr_drainers.
*/
- mutex_lock(&wq_mutex);
+ mutex_lock(&wqs_mutex);
if (!wq->nr_drainers++)
wq->flags |= __WQ_DRAINING;
- mutex_unlock(&wq_mutex);
+ mutex_unlock(&wqs_mutex);
reflush:
flush_workqueue(wq);

@@ -2716,10 +2716,10 @@ reflush:

local_irq_enable();

- mutex_lock(&wq_mutex);
+ mutex_lock(&wqs_mutex);
if (!--wq->nr_drainers)
wq->flags &= ~__WQ_DRAINING;
- mutex_unlock(&wq_mutex);
+ mutex_unlock(&wqs_mutex);
}
EXPORT_SYMBOL_GPL(drain_workqueue);

@@ -3807,10 +3807,10 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
goto err_destroy;

/*
- * wq_mutex protects global freeze state and workqueues list. Grab
+ * wqs_mutex protects global freeze state and workqueues list. Grab
* it, adjust max_active and add the new @wq to workqueues list.
*/
- mutex_lock(&wq_mutex);
+ mutex_lock(&wqs_mutex);

spin_lock_irq(&pwq_lock);
for_each_pwq(pwq, wq)
@@ -3819,7 +3819,7 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,

list_add(&wq->list, &workqueues);

- mutex_unlock(&wq_mutex);
+ mutex_unlock(&wqs_mutex);

return wq;

@@ -3870,9 +3870,9 @@ void destroy_workqueue(struct workqueue_struct *wq)
* wq list is used to freeze wq, remove from list after
* flushing is complete in case freeze races us.
*/
- mutex_lock(&wq_mutex);
+ mutex_lock(&wqs_mutex);
list_del_init(&wq->list);
- mutex_unlock(&wq_mutex);
+ mutex_unlock(&wqs_mutex);

workqueue_sysfs_unregister(wq);

@@ -4296,7 +4296,7 @@ EXPORT_SYMBOL_GPL(work_on_cpu);
* pool->worklist.
*
* CONTEXT:
- * Grabs and releases wq_mutex, pwq_lock and pool->lock's.
+ * Grabs and releases wqs_mutex, pwq_lock and pool->lock's.
*/
void freeze_workqueues_begin(void)
{
@@ -4305,7 +4305,7 @@ void freeze_workqueues_begin(void)
struct pool_workqueue *pwq;
int pi;

- mutex_lock(&wq_mutex);
+ mutex_lock(&wqs_mutex);

/* set FREEZING */
mutex_lock(&pools_mutex);
@@ -4328,7 +4328,7 @@ void freeze_workqueues_begin(void)
}
spin_unlock_irq(&pwq_lock);

- mutex_unlock(&wq_mutex);
+ mutex_unlock(&wqs_mutex);
}

/**
@@ -4338,7 +4338,7 @@ void freeze_workqueues_begin(void)
* between freeze_workqueues_begin() and thaw_workqueues().
*
* CONTEXT:
- * Grabs and releases wq_mutex.
+ * Grabs and releases wqs_mutex.
*
* RETURNS:
* %true if some freezable workqueues are still busy. %false if freezing
@@ -4350,7 +4350,7 @@ bool freeze_workqueues_busy(void)
struct workqueue_struct *wq;
struct pool_workqueue *pwq;

- mutex_lock(&wq_mutex);
+ mutex_lock(&wqs_mutex);

WARN_ON_ONCE(!workqueue_freezing);

@@ -4373,7 +4373,7 @@ bool freeze_workqueues_busy(void)
preempt_enable();
}
out_unlock:
- mutex_unlock(&wq_mutex);
+ mutex_unlock(&wqs_mutex);
return busy;
}

@@ -4384,7 +4384,7 @@ out_unlock:
* frozen works are transferred to their respective pool worklists.
*
* CONTEXT:
- * Grabs and releases wq_mutex, pwq_lock and pool->lock's.
+ * Grabs and releases wqs_mutex, pwq_lock and pool->lock's.
*/
void thaw_workqueues(void)
{
@@ -4393,7 +4393,7 @@ void thaw_workqueues(void)
struct worker_pool *pool;
int pi;

- mutex_lock(&wq_mutex);
+ mutex_lock(&wqs_mutex);

if (!workqueue_freezing)
goto out_unlock;
@@ -4417,7 +4417,7 @@ void thaw_workqueues(void)
}
spin_unlock_irq(&pwq_lock);
out_unlock:
- mutex_unlock(&wq_mutex);
+ mutex_unlock(&wqs_mutex);
}
#endif /* CONFIG_FREEZER */

--
1.7.7.6

2013-03-19 19:33:23

by Lai Jiangshan

Subject: [PATCH 06/21] workqueue: separate out pools locking into pools_mutex

Currently wq_mutex protects:

* worker_pool_idr and unbound_pool_hash
* pool->refcnt
* workqueues list
* workqueue->flags, ->nr_drainers
* workqueue_freezing

We can see that it protects very different kinds of things, so split it and
introduce a pools_mutex to protect:

* worker_pool_idr and unbound_pool_hash
* pool->refcnt

(all of these are pool-related fields.)

workqueue_freezing is special: it is protected by both wq_mutex and
pools_mutex. This is because get_unbound_pool() needs to read it, which in
turn is because POOL_FREEZING is a bad design; it will be fixed later.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 66 ++++++++++++++++++++++++++++-----------------------
1 files changed, 36 insertions(+), 30 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index fb81159..cc5eb61 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -125,7 +125,9 @@ enum {
*
* WQ: wq_mutex protected.
*
- * WR: wq_mutex protected for writes. Sched-RCU protected for reads.
+ * PS: pools_mutex protected.
+ *
+ * PR: pools_mutex protected for writes. Sched-RCU protected for reads.
*
* PW: pwq_lock protected.
*
@@ -163,8 +165,8 @@ struct worker_pool {
struct idr worker_idr; /* MG: worker IDs and iteration */

struct workqueue_attrs *attrs; /* I: worker attributes */
- struct hlist_node hash_node; /* WQ: unbound_pool_hash node */
- int refcnt; /* WQ: refcnt for unbound pools */
+ struct hlist_node hash_node; /* PS: unbound_pool_hash node */
+ int refcnt; /* PS: refcnt for unbound pools */

/*
* The current concurrency level. As it's likely to be accessed
@@ -256,20 +258,21 @@ struct workqueue_struct {

static struct kmem_cache *pwq_cache;

-static DEFINE_MUTEX(wq_mutex); /* protects workqueues and pools */
+static DEFINE_MUTEX(wq_mutex); /* protects workqueues */
+static DEFINE_MUTEX(pools_mutex); /* protects pools */
static DEFINE_SPINLOCK(pwq_lock); /* protects pool_workqueues */
static DEFINE_SPINLOCK(wq_mayday_lock); /* protects wq->maydays list */

static LIST_HEAD(workqueues); /* WQ: list of all workqueues */
-static bool workqueue_freezing; /* WQ: have wqs started freezing? */
+static bool workqueue_freezing; /* WQ&PS: have wqs started freezing? */

/* the per-cpu worker pools */
static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
cpu_worker_pools);

-static DEFINE_IDR(worker_pool_idr); /* WR: idr of all pools */
+static DEFINE_IDR(worker_pool_idr); /* PR: idr of all pools */

-/* WQ: hash of all unbound pools keyed by pool->attrs */
+/* PS: hash of all unbound pools keyed by pool->attrs */
static DEFINE_HASHTABLE(unbound_pool_hash, UNBOUND_POOL_HASH_ORDER);

/* I: attributes used when instantiating standard unbound pools on demand */
@@ -293,10 +296,10 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
#define CREATE_TRACE_POINTS
#include <trace/events/workqueue.h>

-#define assert_rcu_or_wq_mutex() \
+#define assert_rcu_or_pools_mutex() \
rcu_lockdep_assert(rcu_read_lock_sched_held() || \
- lockdep_is_held(&wq_mutex), \
- "sched RCU or wq_mutex should be held")
+ lockdep_is_held(&pools_mutex), \
+ "sched RCU or pools_mutex should be held")

#define assert_rcu_or_pwq_lock() \
rcu_lockdep_assert(rcu_read_lock_sched_held() || \
@@ -322,7 +325,7 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
* @pool: iteration cursor
* @pi: integer used for iteration
*
- * This must be called either with wq_mutex held or sched RCU read locked.
+ * This must be called either with pools_mutex held or sched RCU read locked.
* If the pool needs to be used beyond the locking in effect, the caller is
* responsible for guaranteeing that the pool stays online.
*
@@ -331,7 +334,7 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
*/
#define for_each_pool(pool, pi) \
idr_for_each_entry(&worker_pool_idr, pool, pi) \
- if (({ assert_rcu_or_wq_mutex(); false; })) { } \
+ if (({ assert_rcu_or_pools_mutex(); false; })) { } \
else

/**
@@ -488,7 +491,7 @@ static int worker_pool_assign_id(struct worker_pool *pool)
{
int ret;

- lockdep_assert_held(&wq_mutex);
+ lockdep_assert_held(&pools_mutex);

do {
if (!idr_pre_get(&worker_pool_idr, GFP_KERNEL))
@@ -606,9 +609,9 @@ static struct pool_workqueue *get_work_pwq(struct work_struct *work)
*
* Return the worker_pool @work was last associated with. %NULL if none.
*
- * Pools are created and destroyed under wq_mutex, and allows read access
+ * Pools are created and destroyed under pools_mutex, and allows read access
* under sched-RCU read lock. As such, this function should be called
- * under wq_mutex or with preemption disabled.
+ * under pools_mutex or with preemption disabled.
*
* All fields of the returned pool are accessible as long as the above
* mentioned locking is in effect. If the returned pool needs to be used
@@ -620,7 +623,7 @@ static struct worker_pool *get_work_pool(struct work_struct *work)
unsigned long data = atomic_long_read(&work->data);
int pool_id;

- assert_rcu_or_wq_mutex();
+ assert_rcu_or_pools_mutex();

if (data & WORK_STRUCT_PWQ)
return ((struct pool_workqueue *)
@@ -3428,16 +3431,16 @@ static void put_unbound_pool(struct worker_pool *pool)
{
struct worker *worker;

- mutex_lock(&wq_mutex);
+ mutex_lock(&pools_mutex);
if (--pool->refcnt) {
- mutex_unlock(&wq_mutex);
+ mutex_unlock(&pools_mutex);
return;
}

/* sanity checks */
if (WARN_ON(!(pool->flags & POOL_DISASSOCIATED)) ||
WARN_ON(!list_empty(&pool->worklist))) {
- mutex_unlock(&wq_mutex);
+ mutex_unlock(&pools_mutex);
return;
}

@@ -3446,7 +3449,7 @@ static void put_unbound_pool(struct worker_pool *pool)
idr_remove(&worker_pool_idr, pool->id);
hash_del(&pool->hash_node);

- mutex_unlock(&wq_mutex);
+ mutex_unlock(&pools_mutex);

/*
* Become the manager and destroy all workers. Grabbing
@@ -3488,7 +3491,7 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
u32 hash = wqattrs_hash(attrs);
struct worker_pool *pool;

- mutex_lock(&wq_mutex);
+ mutex_lock(&pools_mutex);

/* do we already have a matching pool? */
hash_for_each_possible(unbound_pool_hash, pool, hash_node, hash) {
@@ -3519,10 +3522,10 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
/* install */
hash_add(unbound_pool_hash, &pool->hash_node, hash);
out_unlock:
- mutex_unlock(&wq_mutex);
+ mutex_unlock(&pools_mutex);
return pool;
fail:
- mutex_unlock(&wq_mutex);
+ mutex_unlock(&pools_mutex);
if (pool)
put_unbound_pool(pool);
return NULL;
@@ -4199,7 +4202,7 @@ static int __cpuinit workqueue_cpu_up_callback(struct notifier_block *nfb,

case CPU_DOWN_FAILED:
case CPU_ONLINE:
- mutex_lock(&wq_mutex);
+ mutex_lock(&pools_mutex);

for_each_pool(pool, pi) {
mutex_lock(&pool->manager_mutex);
@@ -4217,7 +4220,7 @@ static int __cpuinit workqueue_cpu_up_callback(struct notifier_block *nfb,
mutex_unlock(&pool->manager_mutex);
}

- mutex_unlock(&wq_mutex);
+ mutex_unlock(&pools_mutex);
break;
}
return NOTIFY_OK;
@@ -4304,16 +4307,18 @@ void freeze_workqueues_begin(void)

mutex_lock(&wq_mutex);

+ /* set FREEZING */
+ mutex_lock(&pools_mutex);
WARN_ON_ONCE(workqueue_freezing);
workqueue_freezing = true;

- /* set FREEZING */
for_each_pool(pool, pi) {
spin_lock_irq(&pool->lock);
WARN_ON_ONCE(pool->flags & POOL_FREEZING);
pool->flags |= POOL_FREEZING;
spin_unlock_irq(&pool->lock);
}
+ mutex_unlock(&pools_mutex);

/* suppress further executions by setting max_active to zero */
spin_lock_irq(&pwq_lock);
@@ -4394,12 +4399,15 @@ void thaw_workqueues(void)
goto out_unlock;

/* clear FREEZING */
+ mutex_lock(&pools_mutex);
+ workqueue_freezing = false;
for_each_pool(pool, pi) {
spin_lock_irq(&pool->lock);
WARN_ON_ONCE(!(pool->flags & POOL_FREEZING));
pool->flags &= ~POOL_FREEZING;
spin_unlock_irq(&pool->lock);
}
+ mutex_unlock(&pools_mutex);

/* restore max_active and repopulate worklist */
spin_lock_irq(&pwq_lock);
@@ -4408,8 +4416,6 @@ void thaw_workqueues(void)
pwq_adjust_max_active(pwq);
}
spin_unlock_irq(&pwq_lock);
-
- workqueue_freezing = false;
out_unlock:
mutex_unlock(&wq_mutex);
}
@@ -4443,9 +4449,9 @@ static int __init init_workqueues(void)
pool->attrs->nice = std_nice[i++];

/* alloc pool ID */
- mutex_lock(&wq_mutex);
+ mutex_lock(&pools_mutex);
BUG_ON(worker_pool_assign_id(pool));
- mutex_unlock(&wq_mutex);
+ mutex_unlock(&pools_mutex);
}
}

--
1.7.7.6
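
To make the "protected by both" rule above concrete, here is a condensed
sketch of the writer and reader sides after this patch (reconstructed from
the hunks above and from patch 1 of the series; not a literal excerpt,
error paths and unrelated code elided):

	/* writer: freeze_workqueues_begin() sets it with both mutexes held */
	mutex_lock(&wq_mutex);
	mutex_lock(&pools_mutex);
	WARN_ON_ONCE(workqueue_freezing);
	workqueue_freezing = true;
	for_each_pool(pool, pi) {
		spin_lock_irq(&pool->lock);
		pool->flags |= POOL_FREEZING;
		spin_unlock_irq(&pool->lock);
	}
	mutex_unlock(&pools_mutex);
	/* ... zero max_active under pwq_lock ... */
	mutex_unlock(&wq_mutex);

	/* reader: get_unbound_pool() only needs pools_mutex */
	mutex_lock(&pools_mutex);
	/* ... allocate and init the new pool ... */
	if (workqueue_freezing)
		pool->flags |= POOL_FREEZING;
	/* ... install into unbound_pool_hash ... */
	mutex_unlock(&pools_mutex);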

2013-03-19 19:28:48

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH 05/21] workqueue: kick workers in pwq_adjust_max_active()

If pwq_adjust_max_active() changes max_active from 0 to saved_max_active,
it needs to wake up a worker. This is currently done by thaw_workqueues().

If pwq_adjust_max_active() increases max_active for an unbound wq,
it also needs to wake up a worker, but this wakeup is missing.

To handle both cases, move the worker-kicking code from
thaw_workqueues() to pwq_adjust_max_active().

This also makes thaw_workqueues() simpler.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 15 ++++++++-------
1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 8c882ae..fb81159 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3600,6 +3600,14 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
while (!list_empty(&pwq->delayed_works) &&
pwq->nr_active < pwq->max_active)
pwq_activate_first_delayed(pwq);
+
+ /*
+ * Need to wake up worker in any of these cases:
+ * wq is just thawed
+ * unbound wq's max_active is just increased
+ * But this function is slowpath, wake up worker unconditionally
+ */
+ wake_up_worker(pwq->pool);
}

spin_unlock(&pwq->pool->lock);
@@ -4401,13 +4409,6 @@ void thaw_workqueues(void)
}
spin_unlock_irq(&pwq_lock);

- /* kick workers */
- for_each_pool(pool, pi) {
- spin_lock_irq(&pool->lock);
- wake_up_worker(pool);
- spin_unlock_irq(&pool->lock);
- }
-
workqueue_freezing = false;
out_unlock:
mutex_unlock(&wq_mutex);
--
1.7.7.6
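
As a side note on the second case in the changelog: for an unbound wq the
max_active bump presumably arrives through workqueue_set_max_active() with
no thaw involved, so nothing else would wake the pool (call chain sketched
from memory of that series, not a literal excerpt):

	/*
	 * workqueue_set_max_active(wq, new_max)
	 *   for_each_pwq(pwq, wq)
	 *     pwq_adjust_max_active(pwq)
	 *       - may activate delayed works; without the wake_up_worker()
	 *         added above, an idle pool would not notice them until
	 *         some other event wakes a worker.
	 */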

2013-03-19 19:33:54

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH 03/21] workqueue: simplify current_is_workqueue_rescuer()

current_is_workqueue_rescuer() <-> current is a worker and worker->rescue_wq is set

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 298cefe..e1b31fc 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3936,7 +3936,7 @@ bool current_is_workqueue_rescuer(void)
{
struct worker *worker = current_wq_worker();

- return worker && worker == worker->current_pwq->wq->rescuer;
+ return worker && worker->rescue_wq;
}

/**
--
1.7.7.6

2013-03-19 19:34:21

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH 01/21] workqueue: add missing POOL_FREEZING

When a new pool is created via get_unbound_pool() while freezing is
in progress, the pool's POOL_FREEZING flag is not set, which is incorrect.

Fix it by setting POOL_FREEZING if workqueue_freezing is set.
(wq_mutex is already held, so accessing workqueue_freezing is safe.)

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e38d035..40f4017 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3503,6 +3503,9 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
if (!pool || init_worker_pool(pool) < 0)
goto fail;

+ if (workqueue_freezing)
+ pool->flags |= POOL_FREEZING;
+
lockdep_set_subclass(&pool->lock, 1); /* see put_pwq() */
copy_workqueue_attrs(pool->attrs, attrs);

--
1.7.7.6
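
For readers following along, the window this closes looks roughly like the
interleaving below (illustrative only; "unbound-wq creation path" stands for
whatever caller ends up in get_unbound_pool(), e.g. apply_workqueue_attrs()):

	/*
	 * freeze_workqueues_begin()        unbound-wq creation path
	 * -------------------------        ------------------------
	 * workqueue_freezing = true;
	 * for_each_pool(pool, pi)
	 *         pool->flags |= POOL_FREEZING;
	 *                                  get_unbound_pool()
	 *                                    - the new pool was not in
	 *                                      worker_pool_idr when the loop
	 *                                      ran, so POOL_FREEZING stays clear
	 *                                  pwq_adjust_max_active()
	 *                                    - sees !POOL_FREEZING and leaves
	 *                                      max_active > 0 while everything
	 *                                      else is frozen
	 */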

2013-03-20 16:38:19

by Lai Jiangshan

[permalink] [raw]
Subject: Re: [PATCH 00/21] workqueue: cleanups and better locking for recent changes

Hi, Tejun

I am sorry for replying so late, and with such a huge patchset.

A problem has come up: my patches and your patches conflict.
Which patchset should be rebased?

I think my patches should be merged first. That puts the workqueue code
on a better base, and your patchset can then be rebased on top of it.

Since you are the maintainer, your choice is the more reasonable one.
Whichever way you decide, please let me know early.

I should have written my patches after you were done, even the cleanups.

Thanks,
Lai



On Wed, Mar 20, 2013 at 3:28 AM, Lai Jiangshan <[email protected]> wrote:
> Hi, TJ
>
> After reviewing(long time review!) for recent 3 patchset:
> tj/review-misc-cleanups
> tj/review-finer-locking
> tj/review-restore-affinity
>
> I did not find anything wrong for the patchset. You can add my Reviewed-by
> for every patch.
>
> Although I always tried hard to review, but I found a possible problem of my
> reviewed patch: 29c91e9912bed7060df6116af90286500f5a700d. I think I should
> print the patch and eat the paper. it is fixed by patch1 here.
>
> In review, I like the patch "replace PF_THREAD_BOUND with PF_NO_SETAFFINITY"
> which moves the PF_NO_SETAFFINITY test out from set_cpus_allowed_ptr().
> the patch make workqueue.c much simpler.
>
> In review, I did not like the full design of some aspects, or I found the patch
> needs add some additional cleanup. but there is nothing wrongs. so I wrote
> this patchset instead to reply to every original patch.
>
> In review, I considered more about finer-locking.
>
> wq_mutex:
> * worker_pool_idr and unbound_pool_hash
> * pool->refcnt
> * workqueues list
> * workqueue->flags, ->nr_drainers
> * workqueue_freezing
>
> pwq_lock:
> * workqueue->pwqs
> * workqueue->saved_max_active
>
> wq->flush_mutex:
> * workqueue->pwqs
> * flushers and flushing fields
>
> In this list, we can find that:
> 1) wq_mutex protects too much different kind of things.
> 2) workqueue->pwqs are protected by both wq->flush_mutex and pwq_lock,
> but in many read sites, they are protected by both wq->flush_mutex and pwq_lock,
> in some other read sites, they are protected by pwq_lock, but can be converted
> to wq->flush_mutex. it means pwq_lock and wq->flush_mutex are redundancy here.
> 3) pwq_lock is global lock, but it protects only workqueue instance fields.
> 4) several workqueue instance fields are protected by different locks.
>
> To make locking better, this patchset changes a little the design as:
> pools_mutex protects the set of pools:
> * worker_pool_idr and unbound_pool_hash
> * pool->refcnt
>
> wqs_mutex(renamed from wq_mutex) protects the set of workqueues and workqueue_freezing:
> * workqueues list
> * workqueue_freezing
>
> workqueue instance mutex(wq->mutex, renamed from wq->flush_mutex) protects
> workqueue instance:
> * workqueue->pwqs
> * flushers and workqueue's flushing fields
> * workqueue->saved_max_active
> * workqueue->freezing
> (if we access the above fields, we access ->pwqs at the same time)
> * workqueue->flags, ->nr_drainers (doing flush at the same time)
>
> Any thought?
>
> Patch1: fix problem of new pool's POOL_FREEZING bit.
> Patch5-14: better locking.
> Patch1,4,5,10,12,14,20: fix/cleanup/dev for freezing and max_active adjusting
>
> others are single cleanup patches
>
> Thanks,
> Lai
>
> Lai Jiangshan (21):
> workqueue: add missing POOL_FREEZING
> workqueue: don't free pool->worker_idr by RCU
> workqueue: simplify current_is_workqueue_rescuer()
> workqueue: swap the two branches in pwq_adjust_max_active() to get
> better readability
> workqueue: kick workers in pwq_adjust_max_active()
> workqueue: separate out pools locking into pools_mutex
> workqueue: rename wq_mutex to wqs_mutex
> workqueue: rename wq->flush_mutex to wq->mutex
> workqueue: use wq->mutex to protects ->nr_drainers and __WQ_DRAINING
> workqueue: use rcu_read_lock_sched() instead for accessing pwq in RCU
> workqueue: also allowed wq->mutex protect for_each_pwq()
> workqueue: use wq->mutex to protect saved_max_active
> workqueue: remove unused pwq_lock
> workqueue: add wq->freezing and remove POOL_FREEZING
> workqueue: remove worker_maybe_bind_and_lock()
> workqueue: rename rebind_workers() to associate_cpu_pool()
> workqueue: simplify workqueue_cpu_up_callback(CPU_ONLINE)
> workqueue: read POOL_DISASSOCIATED bit under pool->lock
> workqueue: remove @p_last_pwq from init_and_link_pwq()
> workqueue: modify wq->freezing only when freezable
> workqueue: avoid false negative in assert_manager_or_pool_lock()
>
> kernel/workqueue.c | 465 ++++++++++++++++++++--------------------------------
> 1 files changed, 179 insertions(+), 286 deletions(-)
>
> --
> 1.7.7.6
>

2013-03-20 16:40:22

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 00/21] workqueue: cleanups and better locking for recent changes

Hey,

On Thu, Mar 21, 2013 at 12:38:17AM +0800, Lai Jiangshan wrote:
> I am sorry for replying so late, and with such a huge patchset.
>
> A problem has come up: my patches and your patches conflict.
> Which patchset should be rebased?
>
> I think my patches should be merged first. That puts the workqueue code
> on a better base, and your patchset can then be rebased on top of it.
>
> Since you are the maintainer, your choice is the more reasonable one.
> Whichever way you decide, please let me know early.
>
> I should have written my patches after you were done, even the cleanups.

Let me first look at your patches. I don't really care either way and
don't expect too many conflicts from the two patchsets anyway. The
NUMA thing is pretty isolated in several functions after all. Even if
there are conflicts, they shouldn't be too difficult to resolve.
Anyways, will review soon.

Thanks.

--
tejun

2013-03-20 17:18:53

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 01/21] workqueue: add missing POOL_FREEZING

On Wed, Mar 20, 2013 at 03:28:01AM +0800, Lai Jiangshan wrote:
> When a new pool is created via get_unbound_pool() while freezing is
> in progress, the pool's POOL_FREEZING flag is not set, which is incorrect.
>
> Fix it by setting POOL_FREEZING if workqueue_freezing is set.
> (wq_mutex is already held, so accessing workqueue_freezing is safe.)
>
> Signed-off-by: Lai Jiangshan <[email protected]>

Ah, you're absolutely right. What was I thinking adding
CONFIG_FREEZER around workqueue_freezing? I'll pop that one and
apply this one. Thanks for catching this.

--
tejun

2013-03-20 17:24:53

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 01/21] workqueue: add missing POOL_FREEZING

>From 12ee4fc67c00895b3d740297f7ca447239c1983b Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <[email protected]>
Date: Wed, 20 Mar 2013 03:28:01 +0800

get_unbound_pool() forgot to set POOL_FREEZING if workqueue_freezing
is set and a new pool could go out of sync with the global freezing
state.

Fix it by adding POOL_FREEZING if workqueue_freezing. wq_mutex is
already held so no further locking is necessary. This also removes
the unused static variable warning when !CONFIG_FREEZER.

tj: Updated commit message.

Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e38d035..40f4017 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3503,6 +3503,9 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
if (!pool || init_worker_pool(pool) < 0)
goto fail;

+ if (workqueue_freezing)
+ pool->flags |= POOL_FREEZING;
+
lockdep_set_subclass(&pool->lock, 1); /* see put_pwq() */
copy_workqueue_attrs(pool->attrs, attrs);

--
1.8.1.4

2013-03-20 17:29:50

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 02/21] workqueue: don't free pool->worker_idr by RCU

On Wed, Mar 20, 2013 at 03:28:02AM +0800, Lai Jiangshan wrote:
> Neither pool->worker_idr nor worker->id is protected by RCU,
> so there is no need to free pool->worker_idr via RCU.
>
> Just free it directly.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
...
> @@ -3462,6 +3461,7 @@ static void put_unbound_pool(struct worker_pool *pool)
> destroy_worker(worker);
> WARN_ON(pool->nr_workers || pool->nr_idle);
>
> + idr_destroy(&pool->worker_idr);
> spin_unlock_irq(&pool->lock);
> mutex_unlock(&pool->manager_mutex);
> mutex_unlock(&pool->manager_arb);

I don't know about this one. It is correct but I'd prefer to have all
frees grouped together in one function rather than scattered across
two functions. It's not like idr_destroy() is expensive or anything.

Thanks.

--
tejun
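
For reference, the "grouped" layout Tejun prefers would keep the
idr_destroy() next to the other frees in the pool's RCU callback, roughly
as below (a sketch; the rcu field name and the free_workqueue_attrs()
pairing are assumed from the usual pattern, not quoted from the tree):

	static void rcu_free_pool(struct rcu_head *rcu)
	{
		struct worker_pool *pool = container_of(rcu, struct worker_pool, rcu);

		idr_destroy(&pool->worker_idr);
		free_workqueue_attrs(pool->attrs);
		kfree(pool);
	}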

2013-03-20 17:33:16

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 03/21] workqueue: simplify current_is_workqueue_rescuer()

On Wed, Mar 20, 2013 at 03:28:03AM +0800, Lai Jiangshan wrote:
> current_is_workqueue_rescuer() <-> current is a worker and worker->rescue_wq is set
>
> Signed-off-by: Lai Jiangshan <[email protected]>

Applied to wq/for-3.10.

--
tejun

2013-03-20 17:39:35

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 04/21] workqueue: swap the two branches in pwq_adjust_max_active() to get better readability

On Wed, Mar 20, 2013 at 03:28:04AM +0800, Lai Jiangshan wrote:
> "if (!freezable || !(pwq->pool->flags & POOL_FREEZING))" is hard to read.
>
> Swap the two branches so that it becomes
> "if (freezable && (pwq->pool->flags & POOL_FREEZING))", which reads better.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
> kernel/workqueue.c | 6 +++---
> 1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index e1b31fc..8c882ae 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3592,14 +3592,14 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
>
> spin_lock(&pwq->pool->lock);
>
> - if (!freezable || !(pwq->pool->flags & POOL_FREEZING)) {
> + if (freezable && (pwq->pool->flags & POOL_FREEZING)) {
> + pwq->max_active = 0;
> + } else {
> pwq->max_active = wq->saved_max_active;
>
> while (!list_empty(&pwq->delayed_works) &&
> pwq->nr_active < pwq->max_active)
> pwq_activate_first_delayed(pwq);
> - } else {
> - pwq->max_active = 0;

Well, I have tendency to put what I think is the "main" branch above
(which also is what compiler assumes to be the hotter path too without
other input, not that it matters here), but yeah, maybe swapping them
is prettier. Does this even matter?

Trying the patch.... hmm.... I don't know. It looks weirder that way.
Maybe we can do the following if !A || !B bothers you?

if (!(freezable && (pwq->pool->flags & POOL_FREEZING)))

Thanks.

--
tejun
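
For clarity, this is how the quoted branch would read with that suggestion
applied (just the alternative condition dropped into the hunk above, not a
separate patch):

	spin_lock(&pwq->pool->lock);

	if (!(freezable && (pwq->pool->flags & POOL_FREEZING))) {
		pwq->max_active = wq->saved_max_active;

		while (!list_empty(&pwq->delayed_works) &&
		       pwq->nr_active < pwq->max_active)
			pwq_activate_first_delayed(pwq);
	} else {
		pwq->max_active = 0;
	}

	spin_unlock(&pwq->pool->lock);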

2013-03-20 17:56:42

by Tejun Heo

[permalink] [raw]
Subject: [PATCH] workqueue: kick a worker in pwq_adjust_max_active()

>From 951a078a5285ad31bc22e190616ad54b78fac992 Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <[email protected]>
Date: Wed, 20 Mar 2013 10:52:30 -0700

If pwq_adjust_max_active() changes max_active from 0 to
saved_max_active, it needs to wakeup worker. This is already done by
thaw_workqueues().

If pwq_adjust_max_active() increases max_active for an unbound wq,
while not strictly necessary for correctness, it's still desirable to
wake up a worker so that the requested concurrency level is reached
sooner.

Move wake_up_worker() call from thaw_workqueues() to
pwq_adjust_max_active() so that it can handle both of the above two
cases. This also makes thaw_workqueues() simpler.

tj: Updated comments and description.

Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index d2ac6cb..79d1d34 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3598,6 +3598,12 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
while (!list_empty(&pwq->delayed_works) &&
pwq->nr_active < pwq->max_active)
pwq_activate_first_delayed(pwq);
+
+ /*
+ * Need to kick a worker after thawed or an unbound wq's
+ * max_active is bumped. It's a slow path. Do it always.
+ */
+ wake_up_worker(pwq->pool);
} else {
pwq->max_active = 0;
}
@@ -4401,13 +4407,6 @@ void thaw_workqueues(void)
}
spin_unlock_irq(&pwq_lock);

- /* kick workers */
- for_each_pool(pool, pi) {
- spin_lock_irq(&pool->lock);
- wake_up_worker(pool);
- spin_unlock_irq(&pool->lock);
- }
-
workqueue_freezing = false;
out_unlock:
mutex_unlock(&wq_mutex);
--
1.8.1.4

2013-03-20 17:58:26

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 06/21] workqueue: separate out pools locking into pools_mutex

On Wed, Mar 20, 2013 at 03:28:06AM +0800, Lai Jiangshan wrote:
> Currently wq_mutex protects:
>
> * worker_pool_idr and unbound_pool_hash
> * pool->refcnt
> * workqueues list
> * workqueue->flags, ->nr_drainers
> * workqueue_freezing
>
> We can see that it protects very different things.
> So we need to split it and introduce a pools_mutex to protect:
>
> * worker_pool_idr and unbound_pool_hash
> * pool->refcnt
>
> (all of these are pool-related fields.)
>
> workqueue_freezing is special: it is protected by both wq_mutex and
> pools_mutex. This is because get_unbound_pool() needs to read it,
> which in turn is because POOL_FREEZING is a bad design that will be
> fixed later.
>
> Signed-off-by: Lai Jiangshan <[email protected]>

Umm... I'm not sure about this one. What's the benefit of further
splitting wq_mutex? There's no identified bottleneck. It just makes
things more complex.

Thanks.

--
tejun

2013-03-20 18:01:33

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 10/21] workqueue: use rcu_read_lock_sched() instead for accessing pwq in RCU

On Wed, Mar 20, 2013 at 03:28:10AM +0800, Lai Jiangshan wrote:
> rcu_read_lock_sched() is better than preempt_disable() if the code is
> protected by RCU_SCHED.
>
> Signed-off-by: Lai Jiangshan <[email protected]>

Applied to wq/for-3.10.

Thanks.

--
tejun

2013-03-20 18:10:25

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 15/21] workqueue: remove worker_maybe_bind_and_lock()

On Wed, Mar 20, 2013 at 03:28:15AM +0800, Lai Jiangshan wrote:
> static struct worker *alloc_worker(void)
> {
> struct worker *worker;
> @@ -2326,7 +2262,8 @@ repeat:
> spin_unlock_irq(&wq_mayday_lock);
>
> /* migrate to the target cpu if possible */
> - worker_maybe_bind_and_lock(pool);
> + set_cpus_allowed_ptr(current, pool->attrs->cpumask);
> + spin_lock_irq(&pool->lock);
> rescuer->pool = pool;

You can't do this. The CPU might go up between set_cpus_allowed_ptr()
and spin_lock_irq() and the rescuer can end up executing a work item
which was issued while the target CPU is up on the wrong CPU.

Thanks.

--
tejun
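
A rough sketch of the kind of recheck-under-lock loop that avoids the window
described above (purely illustrative; the helper name is made up and the
real worker_maybe_bind_and_lock() differs in detail):

	static void rescuer_bind_to_pool(struct worker_pool *pool)
	{
		while (true) {
			set_cpus_allowed_ptr(current, pool->attrs->cpumask);
			spin_lock_irq(&pool->lock);

			/*
			 * Only proceed if the pool is still detached from its
			 * CPU or we really did land on that CPU; otherwise the
			 * CPU came up in between, so drop the lock and retry.
			 */
			if ((pool->flags & POOL_DISASSOCIATED) ||
			    task_cpu(current) == pool->cpu)
				return;		/* pool->lock stays held */

			spin_unlock_irq(&pool->lock);
			cpu_relax();
		}
	}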

2013-03-20 18:16:23

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 17/21] workqueue: simplify workqueue_cpu_up_callback(CPU_ONLINE)

On Wed, Mar 20, 2013 at 03:28:17AM +0800, Lai Jiangshan wrote:
> If we have 4096 CPUs, workqueue_cpu_up_callback() will traverse too many CPUs.
> To avoid that, use for_each_cpu_worker_pool() for the per-cpu pools and
> unbound_pool_hash for the unbound pools.
>
> After that, for_each_pool() becomes unused, so remove it.
> case CPU_DOWN_FAILED:
> case CPU_ONLINE:
> - mutex_lock(&pools_mutex);
> -
> - for_each_pool(pool, pi) {
> - mutex_lock(&pool->manager_mutex);
> -
> - if (pool->cpu == cpu) {
> - associate_cpu_pool(pool);
> - } else if (pool->cpu < 0) {
> - restore_unbound_workers_cpumask(pool, cpu);
> - }
> -
> - mutex_unlock(&pool->manager_mutex);
> - }
> + for_each_cpu_worker_pool(pool, cpu)
> + associate_cpu_pool(pool);
>
> + mutex_lock(&pools_mutex);
> + hash_for_each(unbound_pool_hash, bkt, pool, hash_node)
> + restore_unbound_workers_cpumask(pool, cpu);
> mutex_unlock(&pools_mutex);
> break;

Hmmm... can you add for_each_unbound_pool() with proper lockdep
assertion? Also, don't shuffle locking and flag setting around. It
doesn't make any functional difference and I kinda like global stuff
in the hotplug callback and actual worker handling in the helper
functions.

Thanks.

--
tejun
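
One possible shape for the iterator asked for above, modeled on the existing
for_each_pool() from earlier in the series (a sketch of the idea only, not
what was eventually merged):

	#define for_each_unbound_pool(pool, bkt)				\
		hash_for_each(unbound_pool_hash, bkt, pool, hash_node)		\
			if (({ lockdep_assert_held(&pools_mutex); false; })) { } \
			else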

2013-03-20 18:19:41

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 18/21] workqueue: read POOL_DISASSOCIATED bit under pool->lock

On Wed, Mar 20, 2013 at 03:28:18AM +0800, Lai Jiangshan wrote:
> Simply move it to pool->lock C.S.

Patch description should explain *why* in addition to what. Also,
please don't use abbreviations like C.S. Yeah, I know it's critical
section but it might as well be counter strike or context switch.

Thanks.

--
tejun

2013-03-20 18:22:35

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 21/21] workqueue: avoid false negative in assert_manager_or_pool_lock()

On Wed, Mar 20, 2013 at 03:28:21AM +0800, Lai Jiangshan wrote:
> If lockdep is already complaining about something in another subsystem,
> lockdep_is_held() can return a false negative, so we also need to test
> debug_locks before asserting.
>
> Signed-off-by: Lai Jiangshan <[email protected]>

Applied to wq/for-3.10.

Thanks.

--
tejun

2013-03-20 18:30:51

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 00/21] workqueue: cleanups and better locking for recent changes

So, overall,

On Wed, Mar 20, 2013 at 03:28:00AM +0800, Lai Jiangshan wrote:
...
> In this list, we can find that:
> 1) wq_mutex protects too much different kind of things.

I don't agree with this and unless you can come up with a much better
reason, won't be splitting wq_mutex further. Also, I'm not gonna
rename it to wqs_mutex.

> 2) workqueue->pwqs are protected by both wq->flush_mutex and pwq_lock,
> but in many read sites, they are protected by both wq->flush_mutex and pwq_lock,
> in some other read sites, they are protected by pwq_lock, but can be converted
> to wq->flush_mutex. it means pwq_lock and wq->flush_mutex are redundancy here.
> 3) pwq_lock is global lock, but it protects only workqueue instance fields.

A global lock protecting different instances is perfectly fine unless
it actually causes contention in some paths. It is often actually
better to have a single global lock for cold paths, as it pollutes
less cache, so please don't split locks for that reason.

That said, I like the fact that wq->flush_mutex can be transformed
into wq->mutex and replace pwq_lock, making it go away. For that
reason, I like the pwq_lock removal, so let's replace pwq_lock
with wq->mutex.

I applied the ones I can apply at this point. I'll rebase the NUMA
patchset on top of the applied ones. Please update the rest on top.

Thanks.

--
tejun

2013-03-21 11:03:33

by Lai Jiangshan

[permalink] [raw]
Subject: Re: [PATCH 15/21] workqueue: remove worker_maybe_bind_and_lock()

On Thu, Mar 21, 2013 at 2:10 AM, Tejun Heo <[email protected]> wrote:
> On Wed, Mar 20, 2013 at 03:28:15AM +0800, Lai Jiangshan wrote:
>> static struct worker *alloc_worker(void)
>> {
>> struct worker *worker;
>> @@ -2326,7 +2262,8 @@ repeat:
>> spin_unlock_irq(&wq_mayday_lock);
>>
>> /* migrate to the target cpu if possible */
>> - worker_maybe_bind_and_lock(pool);
>> + set_cpus_allowed_ptr(current, pool->attrs->cpumask);
>> + spin_lock_irq(&pool->lock);
>> rescuer->pool = pool;
>
> You can't do this. The CPU might go up between set_cpus_allowed_ptr()
> and spin_lock_irq() and the rescuer can end up executing a work item
> which was issued while the target CPU is up on the wrong CPU.


Why is it wrong?
It can be allowed if the wq has a rescuer. (The wq used by work_on_cpu() has no
rescuer, so it does not hurt work_on_cpu() or anything else.)

Actually it was already allowed by recent patches.
If a cpu comes up while a rescuer is in process_scheduled_works(),
all the later work items will run on the wrong CPU even though
the target CPU is up (because for_cpu_pool_workers() does not include
an active rescuer).

I don't want to go back to making cpu hotplug or the rescuer thread
more complicated,
so I prefer to allow work items to run on the wrong cpu if the wq has a rescuer.
It just needs a comment.

I guess you will be against me... let me rewrite it.

Thanks,
Lai

>
> Thanks.
>
> --
> tejun

2013-03-21 17:41:40

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 15/21] workqueue: remove worker_maybe_bind_and_lock()

Hello, Lai.

On Thu, Mar 21, 2013 at 07:03:30PM +0800, Lai Jiangshan wrote:
> On Thu, Mar 21, 2013 at 2:10 AM, Tejun Heo <[email protected]> wrote:
> > On Wed, Mar 20, 2013 at 03:28:15AM +0800, Lai Jiangshan wrote:
> >> static struct worker *alloc_worker(void)
> >> {
> >> struct worker *worker;
> >> @@ -2326,7 +2262,8 @@ repeat:
> >> spin_unlock_irq(&wq_mayday_lock);
> >>
> >> /* migrate to the target cpu if possible */
> >> - worker_maybe_bind_and_lock(pool);
> >> + set_cpus_allowed_ptr(current, pool->attrs->cpumask);
> >> + spin_lock_irq(&pool->lock);
> >> rescuer->pool = pool;
> >
> > You can't do this. The CPU might go up between set_cpus_allowed_ptr()
> > and spin_lock_irq() and the rescuer can end up executing a work item
> > which was issued while the target CPU is up on the wrong CPU.
>
>
> Why it is wrong?
> It can be allowed if the wq has rescuer. (wq of work_on_cpu() don't has rescuer,
> it does not hurt to work_on_cpu() of something else.
>
> Actually it was allowed by recent patches.
> If a cpu is up while a rescuer do process_scheduled_workers(),
> all the later work item will be running the wrong CPU while
> the target CPU is up.(due to for_cpu_pool_workers() don't include
> active rescuer).

So,

* If the user wraps queueing and flushing inside get_online_cpus(), the
work item had better get executed and finish on the target CPU.

* Similarly, it must be guaranteed that any work items queued and flushed
by CPU_ONLINE or CPU_DOWN_PREP notifiers start and finish
execution on the target CPU.

Your patch breaks the above.

> I don't want to go back to make cpuhotplug nor rescuer_theread become
> complicated.
> so I prefer to allow work item can run on wrong cpu if the wq has rescuer.
> All just needs a comments.

It is explained in the comment.

> I guess you will agaist me...... let me rewrite it .....

It's not because I'm "against" you. It's because it's buggy and we
wouldn't have any affinity guarantee with the proposed changes. If
you can maintain the guarantees and remove
worker_maybe_bind_and_lock(), please be my guest.

Thanks.

--
tejun
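
For reference, the first guarantee above in caller terms (a minimal usage
sketch; @cpu, @wq and @my_work are assumed to be set up by the caller and
error handling is omitted):

	get_online_cpus();
	if (cpu_online(cpu)) {
		queue_work_on(cpu, wq, &my_work);
		flush_work(&my_work);	/* must have run and finished on @cpu */
	}
	put_online_cpus();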