2013-03-14 02:57:35

by Tejun Heo

Subject: [PATCHSET wq/for-3.10] workqueue: break up workqueue_lock into multiple locks

Hello,

Currently, workqueue_lock protects most shared workqueue resources -
the pools, workqueues, pool_workqueues, draining, ID assignments,
mayday handling and so on. The coverage has grown organically and
there is no identified bottleneck coming from workqueue_lock, but it
has grown a bit too much and scheduled rebinding changes need the
pools and workqueues to be protected by a mutex instead of a spinlock.

The following seven patches break out workqueue_lock into three locks
- wq_mutex, pwq_lock and wq_mayday_lock - so that we get to have a
mutex for pools and workqueues, and the overall locking scheme is
easier to follow.

0001-workqueue-rename-worker_pool-assoc_mutex-to-manager_.patch
0002-workqueue-factor-out-initial-worker-creation-into-cr.patch
0003-workqueue-better-define-locking-rules-around-worker-.patch
0004-workqueue-relocate-global-variable-defs-and-function.patch
0005-workqueue-separate-out-pool-and-workqueue-locking-in.patch
0006-workqueue-separate-out-pool_workqueue-locking-into-p.patch
0007-workqueue-rename-workqueue_lock-to-wq_mayday_lock.patch

0001-0004 are prep / cleanup patches.

0005 separates out pool and workqueue locking into wq_mutex.

0006 separates out pool_workqueue locking into pwq_lock.

0007 renames workqueue_lock to wq_mayday_lock as mayday handling is the
only remaining user of the lock.
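
Put together, the end state is roughly the following (a sketch assembled
from 0005-0007; the locking-label comments in workqueue.c remain the
authoritative documentation):

  static DEFINE_MUTEX(wq_mutex);          /* protects workqueues and pools */
  static DEFINE_SPINLOCK(pwq_lock);       /* protects pool_workqueues */
  static DEFINE_SPINLOCK(wq_mayday_lock); /* protects wq->maydays list */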

This patchset is on top of

wq/for-3.10 e626761691 ("workqueue: implement current_is_workqueue_rescuer()")
+ "workqueue: misc cleanups" patchset [1]

and available in the following git branch.

git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git review-finer-locking

diffstat follows. Thanks.

kernel/workqueue.c | 355 +++++++++++++++++++++++++++++------------------------
1 file changed, 195 insertions(+), 160 deletions(-)

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel/1456511


2013-03-14 02:57:39

by Tejun Heo

Subject: [PATCH 1/7] workqueue: rename worker_pool->assoc_mutex to ->manager_mutex

Manager operations are currently governed by two mutexes -
pool->manager_arb and ->assoc_mutex. The former is used to decide who
gets to be the manager and the latter to exclude the actual manager
operations including creation and destruction of workers. Anyone who
grabs ->manager_arb must perform the manager role; otherwise, the pool
might stall.

Grabbing ->assoc_mutex blocks everyone else from performing manager
operations but doesn't require the holder to perform manager duties; it
merely excludes manager operations without making the holder the manager.

Because the blocking was necessary when [dis]associating per-cpu
workqueues during CPU hotplug events, the latter was named
assoc_mutex. The mutex is scheduled to be used for other purposes, so
this patch gives it a more fitting generic name - manager_mutex - and
updates / adds comments to explain synchronization around the manager
role and operations.
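
In code, the pattern which manage_workers() implements boils down to the
following (condensed sketch; the full, commented version is in the hunk
below):

  if (!mutex_trylock(&pool->manager_arb))
          return false;                   /* someone else is managing */

  mutex_lock(&pool->manager_mutex);       /* exclude other manager ops */
  /* ... create / destroy workers as needed ... */
  mutex_unlock(&pool->manager_mutex);
  mutex_unlock(&pool->manager_arb);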

This patch is pure rename / doc update.

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 62 +++++++++++++++++++++++++++++++++---------------------
1 file changed, 38 insertions(+), 24 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f37421f..bc25bdf 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -60,8 +60,8 @@ enum {
* %WORKER_UNBOUND set and concurrency management disabled, and may
* be executing on any CPU. The pool behaves as an unbound one.
*
- * Note that DISASSOCIATED can be flipped only while holding
- * assoc_mutex to avoid changing binding state while
+ * Note that DISASSOCIATED should be flipped only while holding
+ * manager_mutex to avoid changing binding state while
* create_worker() is in progress.
*/
POOL_MANAGE_WORKERS = 1 << 0, /* need to manage workers */
@@ -149,8 +149,9 @@ struct worker_pool {
DECLARE_HASHTABLE(busy_hash, BUSY_WORKER_HASH_ORDER);
/* L: hash of busy workers */

+ /* see manage_workers() for details on the two manager mutexes */
struct mutex manager_arb; /* manager arbitration */
- struct mutex assoc_mutex; /* protect POOL_DISASSOCIATED */
+ struct mutex manager_mutex; /* manager exclusion */
struct ida worker_ida; /* L: for worker IDs */

struct workqueue_attrs *attrs; /* I: worker attributes */
@@ -1635,7 +1636,7 @@ static void rebind_workers(struct worker_pool *pool)
struct worker *worker, *n;
int i;

- lockdep_assert_held(&pool->assoc_mutex);
+ lockdep_assert_held(&pool->manager_mutex);
lockdep_assert_held(&pool->lock);

/* dequeue and kick idle ones */
@@ -2022,31 +2023,44 @@ static bool manage_workers(struct worker *worker)
struct worker_pool *pool = worker->pool;
bool ret = false;

+ /*
+ * Managership is governed by two mutexes - manager_arb and
+ * manager_mutex. manager_arb handles arbitration of manager role.
+ * Anyone who successfully grabs manager_arb wins the arbitration
+ * and becomes the manager. mutex_trylock() on pool->manager_arb
+ * failure while holding pool->lock reliably indicates that someone
+ * else is managing the pool and the worker which failed trylock
+ * can proceed to executing work items. This means that anyone
+ * grabbing manager_arb is responsible for actually performing
+ * manager duties. If manager_arb is grabbed and released without
+ * actual management, the pool may stall indefinitely.
+ *
+ * manager_mutex is used for exclusion of actual management
+ * operations. The holder of manager_mutex can be sure that none
+ * of management operations, including creation and destruction of
+ * workers, won't take place until the mutex is released. Because
+ * manager_mutex doesn't interfere with manager role arbitration,
+ * it is guaranteed that the pool's management, while may be
+ * delayed, won't be disturbed by someone else grabbing
+ * manager_mutex.
+ */
if (!mutex_trylock(&pool->manager_arb))
return ret;

/*
- * To simplify both worker management and CPU hotplug, hold off
- * management while hotplug is in progress. CPU hotplug path can't
- * grab @pool->manager_arb to achieve this because that can lead to
- * idle worker depletion (all become busy thinking someone else is
- * managing) which in turn can result in deadlock under extreme
- * circumstances. Use @pool->assoc_mutex to synchronize manager
- * against CPU hotplug.
- *
- * assoc_mutex would always be free unless CPU hotplug is in
- * progress. trylock first without dropping @pool->lock.
+ * With manager arbitration won, manager_mutex would be free in
+ * most cases. trylock first without dropping @pool->lock.
*/
- if (unlikely(!mutex_trylock(&pool->assoc_mutex))) {
+ if (unlikely(!mutex_trylock(&pool->manager_mutex))) {
spin_unlock_irq(&pool->lock);
- mutex_lock(&pool->assoc_mutex);
+ mutex_lock(&pool->manager_mutex);
/*
* CPU hotplug could have happened while we were waiting
* for assoc_mutex. Hotplug itself can't handle us
* because manager isn't either on idle or busy list, and
* @pool's state and ours could have deviated.
*
- * As hotplug is now excluded via assoc_mutex, we can
+ * As hotplug is now excluded via manager_mutex, we can
* simply try to bind. It will succeed or fail depending
* on @pool's current state. Try it and adjust
* %WORKER_UNBOUND accordingly.
@@ -2068,7 +2082,7 @@ static bool manage_workers(struct worker *worker)
ret |= maybe_destroy_workers(pool);
ret |= maybe_create_worker(pool);

- mutex_unlock(&pool->assoc_mutex);
+ mutex_unlock(&pool->manager_mutex);
mutex_unlock(&pool->manager_arb);
return ret;
}
@@ -3436,7 +3450,7 @@ static int init_worker_pool(struct worker_pool *pool)
(unsigned long)pool);

mutex_init(&pool->manager_arb);
- mutex_init(&pool->assoc_mutex);
+ mutex_init(&pool->manager_mutex);
ida_init(&pool->worker_ida);

INIT_HLIST_NODE(&pool->hash_node);
@@ -4076,11 +4090,11 @@ static void wq_unbind_fn(struct work_struct *work)
for_each_cpu_worker_pool(pool, cpu) {
WARN_ON_ONCE(cpu != smp_processor_id());

- mutex_lock(&pool->assoc_mutex);
+ mutex_lock(&pool->manager_mutex);
spin_lock_irq(&pool->lock);

/*
- * We've claimed all manager positions. Make all workers
+ * We've blocked all manager operations. Make all workers
* unbound and set DISASSOCIATED. Before this, all workers
* except for the ones which are still executing works from
* before the last CPU down must be on the cpu. After
@@ -4095,7 +4109,7 @@ static void wq_unbind_fn(struct work_struct *work)
pool->flags |= POOL_DISASSOCIATED;

spin_unlock_irq(&pool->lock);
- mutex_unlock(&pool->assoc_mutex);
+ mutex_unlock(&pool->manager_mutex);
}

/*
@@ -4152,14 +4166,14 @@ static int __cpuinit workqueue_cpu_up_callback(struct notifier_block *nfb,
case CPU_DOWN_FAILED:
case CPU_ONLINE:
for_each_cpu_worker_pool(pool, cpu) {
- mutex_lock(&pool->assoc_mutex);
+ mutex_lock(&pool->manager_mutex);
spin_lock_irq(&pool->lock);

pool->flags &= ~POOL_DISASSOCIATED;
rebind_workers(pool);

spin_unlock_irq(&pool->lock);
- mutex_unlock(&pool->assoc_mutex);
+ mutex_unlock(&pool->manager_mutex);
}
break;
}
--
1.8.1.4

2013-03-14 02:57:44

by Tejun Heo

Subject: [PATCH 4/7] workqueue: relocate global variable defs and function decls in workqueue.c

They're split across debugobj code for some reason. Collect them.

This patch is pure relocation.

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ce1ab06..9a0cbb2 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -248,6 +248,21 @@ struct workqueue_struct {

static struct kmem_cache *pwq_cache;

+/* Serializes the accesses to the list of workqueues. */
+static DEFINE_SPINLOCK(workqueue_lock);
+static LIST_HEAD(workqueues);
+static bool workqueue_freezing; /* W: have wqs started freezing? */
+
+/* the per-cpu worker pools */
+static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
+ cpu_worker_pools);
+
+/*
+ * R: idr of all pools. Modifications are protected by workqueue_lock.
+ * Read accesses are protected by sched-RCU protected.
+ */
+static DEFINE_IDR(worker_pool_idr);
+
/* W: hash of all unbound pools keyed by pool->attrs */
static DEFINE_HASHTABLE(unbound_pool_hash, UNBOUND_POOL_HASH_ORDER);

@@ -265,6 +280,10 @@ EXPORT_SYMBOL_GPL(system_unbound_wq);
struct workqueue_struct *system_freezable_wq __read_mostly;
EXPORT_SYMBOL_GPL(system_freezable_wq);

+static int worker_thread(void *__worker);
+static void copy_workqueue_attrs(struct workqueue_attrs *to,
+ const struct workqueue_attrs *from);
+
#define CREATE_TRACE_POINTS
#include <trace/events/workqueue.h>

@@ -431,25 +450,6 @@ static inline void debug_work_activate(struct work_struct *work) { }
static inline void debug_work_deactivate(struct work_struct *work) { }
#endif

-/* Serializes the accesses to the list of workqueues. */
-static DEFINE_SPINLOCK(workqueue_lock);
-static LIST_HEAD(workqueues);
-static bool workqueue_freezing; /* W: have wqs started freezing? */
-
-/* the per-cpu worker pools */
-static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
- cpu_worker_pools);
-
-/*
- * R: idr of all pools. Modifications are protected by workqueue_lock.
- * Read accesses are protected by sched-RCU protected.
- */
-static DEFINE_IDR(worker_pool_idr);
-
-static int worker_thread(void *__worker);
-static void copy_workqueue_attrs(struct workqueue_attrs *to,
- const struct workqueue_attrs *from);
-
/* allocate ID and assign it to @pool */
static int worker_pool_assign_id(struct worker_pool *pool)
{
--
1.8.1.4

2013-03-14 02:57:49

by Tejun Heo

Subject: [PATCH 7/7] workqueue: rename workqueue_lock to wq_mayday_lock

With the recent locking updates, the only thing protected by
workqueue_lock is workqueue->maydays list. Rename workqueue_lock to
wq_mayday_lock.

This patch is pure rename.

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 63856df..969be0b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -125,10 +125,10 @@ enum {
*
* PW: pwq_lock protected.
*
- * W: workqueue_lock protected.
- *
* FR: wq->flush_mutex and pwq_lock protected for writes. Sched-RCU
* protected for reads.
+ *
+ * MD: wq_mayday_lock protected.
*/

/* struct worker is defined in workqueue_internal.h */
@@ -194,7 +194,7 @@ struct pool_workqueue {
int max_active; /* L: max active works */
struct list_head delayed_works; /* L: delayed works */
struct list_head pwqs_node; /* FR: node on wq->pwqs */
- struct list_head mayday_node; /* W: node on wq->maydays */
+ struct list_head mayday_node; /* MD: node on wq->maydays */

/*
* Release of unbound pwq is punted to system_wq. See put_pwq()
@@ -235,7 +235,7 @@ struct workqueue_struct {
struct list_head flusher_queue; /* F: flush waiters */
struct list_head flusher_overflow; /* F: flush overflow list */

- struct list_head maydays; /* W: pwqs requesting rescue */
+ struct list_head maydays; /* MD: pwqs requesting rescue */
struct worker *rescuer; /* I: rescue worker */

int nr_drainers; /* WQ: drain in progress */
@@ -254,7 +254,7 @@ static struct kmem_cache *pwq_cache;

static DEFINE_MUTEX(wq_mutex); /* protects workqueues and pools */
static DEFINE_SPINLOCK(pwq_lock); /* protects pool_workqueues */
-static DEFINE_SPINLOCK(workqueue_lock);
+static DEFINE_SPINLOCK(wq_mayday_lock); /* protects wq->maydays list */

static LIST_HEAD(workqueues); /* WQ: list of all workqueues */
static bool workqueue_freezing; /* WQ: have wqs started freezing? */
@@ -1894,7 +1894,7 @@ static void send_mayday(struct work_struct *work)
struct pool_workqueue *pwq = get_work_pwq(work);
struct workqueue_struct *wq = pwq->wq;

- lockdep_assert_held(&workqueue_lock);
+ lockdep_assert_held(&wq_mayday_lock);

if (!wq->rescuer)
return;
@@ -1911,7 +1911,7 @@ static void pool_mayday_timeout(unsigned long __pool)
struct worker_pool *pool = (void *)__pool;
struct work_struct *work;

- spin_lock_irq(&workqueue_lock); /* for wq->maydays */
+ spin_lock_irq(&wq_mayday_lock); /* for wq->maydays */
spin_lock(&pool->lock);

if (need_to_create_worker(pool)) {
@@ -1926,7 +1926,7 @@ static void pool_mayday_timeout(unsigned long __pool)
}

spin_unlock(&pool->lock);
- spin_unlock_irq(&workqueue_lock);
+ spin_unlock_irq(&wq_mayday_lock);

mod_timer(&pool->mayday_timer, jiffies + MAYDAY_INTERVAL);
}
@@ -2404,7 +2404,7 @@ repeat:
}

/* see whether any pwq is asking for help */
- spin_lock_irq(&workqueue_lock);
+ spin_lock_irq(&wq_mayday_lock);

while (!list_empty(&wq->maydays)) {
struct pool_workqueue *pwq = list_first_entry(&wq->maydays,
@@ -2415,7 +2415,7 @@ repeat:
__set_current_state(TASK_RUNNING);
list_del_init(&pwq->mayday_node);

- spin_unlock_irq(&workqueue_lock);
+ spin_unlock_irq(&wq_mayday_lock);

/* migrate to the target cpu if possible */
worker_maybe_bind_and_lock(pool);
@@ -2442,10 +2442,10 @@ repeat:

rescuer->pool = NULL;
spin_unlock(&pool->lock);
- spin_lock(&workqueue_lock);
+ spin_lock(&wq_mayday_lock);
}

- spin_unlock_irq(&workqueue_lock);
+ spin_unlock_irq(&wq_mayday_lock);

/* rescuers should never participate in concurrency management */
WARN_ON_ONCE(!(rescuer->flags & WORKER_NOT_RUNNING));
--
1.8.1.4

2013-03-14 02:58:02

by Tejun Heo

Subject: [PATCH 6/7] workqueue: separate out pool_workqueue locking into pwq_lock

This patch continues the locking cleanup from the previous patch. It
breaks out pool_workqueue synchronization from workqueue_lock into a
new spinlock - pwq_lock. The following are protected by pwq_lock.

* workqueue->pwqs
* workqueue->saved_max_active

The conversion is straightforward. workqueue_lock usages which cover
the above two are converted to pwq_lock. A new locking label, PW, is
added for things protected by pwq_lock, and FR is updated to mean
flush_mutex + pwq_lock + sched-RCU.
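
For example, walking wq->pwqs now needs only sched-RCU (or pwq_lock),
while the linking path takes both locks. A condensed sketch, where
inspect_pwq() just stands in for whatever the caller does with each pwq:

  /* read side: sched-RCU (or pwq_lock) is sufficient */
  rcu_read_lock_sched();
  for_each_pwq(pwq, wq)
          inspect_pwq(pwq);               /* placeholder */
  rcu_read_unlock_sched();

  /* write side: wq->flush_mutex + pwq_lock, as in init_and_link_pwq() */
  mutex_lock(&wq->flush_mutex);
  spin_lock_irq(&pwq_lock);
  list_add_rcu(&pwq->pwqs_node, &wq->pwqs);
  spin_unlock_irq(&pwq_lock);
  mutex_unlock(&wq->flush_mutex);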

This patch shouldn't introduce any visible behavior changes.

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 69 ++++++++++++++++++++++++++++--------------------------
1 file changed, 36 insertions(+), 33 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c3b59ff..63856df 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -123,9 +123,11 @@ enum {
*
* WR: wq_mutex protected for writes. Sched-RCU protected for reads.
*
+ * PW: pwq_lock protected.
+ *
* W: workqueue_lock protected.
*
- * FR: wq->flush_mutex and workqueue_lock protected for writes. Sched-RCU
+ * FR: wq->flush_mutex and pwq_lock protected for writes. Sched-RCU
* protected for reads.
*/

@@ -198,7 +200,7 @@ struct pool_workqueue {
* Release of unbound pwq is punted to system_wq. See put_pwq()
* and pwq_unbound_release_workfn() for details. pool_workqueue
* itself is also sched-RCU protected so that the first pwq can be
- * determined without grabbing workqueue_lock.
+ * determined without grabbing pwq_lock.
*/
struct work_struct unbound_release_work;
struct rcu_head rcu;
@@ -237,7 +239,7 @@ struct workqueue_struct {
struct worker *rescuer; /* I: rescue worker */

int nr_drainers; /* WQ: drain in progress */
- int saved_max_active; /* W: saved pwq max_active */
+ int saved_max_active; /* PW: saved pwq max_active */

#ifdef CONFIG_SYSFS
struct wq_device *wq_dev; /* I: for sysfs interface */
@@ -251,6 +253,7 @@ struct workqueue_struct {
static struct kmem_cache *pwq_cache;

static DEFINE_MUTEX(wq_mutex); /* protects workqueues and pools */
+static DEFINE_SPINLOCK(pwq_lock); /* protects pool_workqueues */
static DEFINE_SPINLOCK(workqueue_lock);

static LIST_HEAD(workqueues); /* WQ: list of all workqueues */
@@ -291,10 +294,10 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
lockdep_is_held(&wq_mutex), \
"sched RCU or wq_mutex should be held")

-#define assert_rcu_or_wq_lock() \
+#define assert_rcu_or_pwq_lock() \
rcu_lockdep_assert(rcu_read_lock_sched_held() || \
- lockdep_is_held(&workqueue_lock), \
- "sched RCU or workqueue lock should be held")
+ lockdep_is_held(&pwq_lock), \
+ "sched RCU or pwq_lock should be held")

#define for_each_cpu_worker_pool(pool, cpu) \
for ((pool) = &per_cpu(cpu_worker_pools, cpu)[0]; \
@@ -326,16 +329,16 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
* @pwq: iteration cursor
* @wq: the target workqueue
*
- * This must be called either with workqueue_lock held or sched RCU read
- * locked. If the pwq needs to be used beyond the locking in effect, the
- * caller is responsible for guaranteeing that the pwq stays online.
+ * This must be called either with pwq_lock held or sched RCU read locked.
+ * If the pwq needs to be used beyond the locking in effect, the caller is
+ * responsible for guaranteeing that the pwq stays online.
*
* The if/else clause exists only for the lockdep assertion and can be
* ignored.
*/
#define for_each_pwq(pwq, wq) \
list_for_each_entry_rcu((pwq), &(wq)->pwqs, pwqs_node) \
- if (({ assert_rcu_or_wq_lock(); false; })) { } \
+ if (({ assert_rcu_or_pwq_lock(); false; })) { } \
else

#ifdef CONFIG_DEBUG_OBJECTS_WORK
@@ -474,13 +477,13 @@ static int worker_pool_assign_id(struct worker_pool *pool)
* first_pwq - return the first pool_workqueue of the specified workqueue
* @wq: the target workqueue
*
- * This must be called either with workqueue_lock held or sched RCU read
- * locked. If the pwq needs to be used beyond the locking in effect, the
- * caller is responsible for guaranteeing that the pwq stays online.
+ * This must be called either with pwq_lock held or sched RCU read locked.
+ * If the pwq needs to be used beyond the locking in effect, the caller is
+ * responsible for guaranteeing that the pwq stays online.
*/
static struct pool_workqueue *first_pwq(struct workqueue_struct *wq)
{
- assert_rcu_or_wq_lock();
+ assert_rcu_or_pwq_lock();
return list_first_or_null_rcu(&wq->pwqs, struct pool_workqueue,
pwqs_node);
}
@@ -3639,9 +3642,9 @@ static void pwq_unbound_release_workfn(struct work_struct *work)
* and consistent with the linking path.
*/
mutex_lock(&wq->flush_mutex);
- spin_lock_irq(&workqueue_lock);
+ spin_lock_irq(&pwq_lock);
list_del_rcu(&pwq->pwqs_node);
- spin_unlock_irq(&workqueue_lock);
+ spin_unlock_irq(&pwq_lock);
mutex_unlock(&wq->flush_mutex);

put_unbound_pool(pool);
@@ -3669,7 +3672,7 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
bool freezable = wq->flags & WQ_FREEZABLE;

/* for @wq->saved_max_active */
- lockdep_assert_held(&workqueue_lock);
+ lockdep_assert_held(&pwq_lock);

/* fast exit for non-freezable wqs */
if (!freezable && pwq->max_active == wq->saved_max_active)
@@ -3706,7 +3709,7 @@ static void init_and_link_pwq(struct pool_workqueue *pwq,
INIT_WORK(&pwq->unbound_release_work, pwq_unbound_release_workfn);

mutex_lock(&wq->flush_mutex);
- spin_lock_irq(&workqueue_lock);
+ spin_lock_irq(&pwq_lock);

/*
* Set the matching work_color. This is synchronized with
@@ -3722,7 +3725,7 @@ static void init_and_link_pwq(struct pool_workqueue *pwq,
/* link in @pwq */
list_add_rcu(&pwq->pwqs_node, &wq->pwqs);

- spin_unlock_irq(&workqueue_lock);
+ spin_unlock_irq(&pwq_lock);
mutex_unlock(&wq->flush_mutex);
}

@@ -3886,10 +3889,10 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
*/
mutex_lock(&wq_mutex);

- spin_lock_irq(&workqueue_lock);
+ spin_lock_irq(&pwq_lock);
for_each_pwq(pwq, wq)
pwq_adjust_max_active(pwq);
- spin_unlock_irq(&workqueue_lock);
+ spin_unlock_irq(&pwq_lock);

list_add(&wq->list, &workqueues);

@@ -3920,13 +3923,13 @@ void destroy_workqueue(struct workqueue_struct *wq)
drain_workqueue(wq);

/* sanity checks */
- spin_lock_irq(&workqueue_lock);
+ spin_lock_irq(&pwq_lock);
for_each_pwq(pwq, wq) {
int i;

for (i = 0; i < WORK_NR_COLORS; i++) {
if (WARN_ON(pwq->nr_in_flight[i])) {
- spin_unlock_irq(&workqueue_lock);
+ spin_unlock_irq(&pwq_lock);
return;
}
}
@@ -3934,11 +3937,11 @@ void destroy_workqueue(struct workqueue_struct *wq)
if (WARN_ON(pwq->refcnt > 1) ||
WARN_ON(pwq->nr_active) ||
WARN_ON(!list_empty(&pwq->delayed_works))) {
- spin_unlock_irq(&workqueue_lock);
+ spin_unlock_irq(&pwq_lock);
return;
}
}
- spin_unlock_irq(&workqueue_lock);
+ spin_unlock_irq(&pwq_lock);

/*
* wq list is used to freeze wq, remove from list after
@@ -4000,14 +4003,14 @@ void workqueue_set_max_active(struct workqueue_struct *wq, int max_active)

max_active = wq_clamp_max_active(max_active, wq->flags, wq->name);

- spin_lock_irq(&workqueue_lock);
+ spin_lock_irq(&pwq_lock);

wq->saved_max_active = max_active;

for_each_pwq(pwq, wq)
pwq_adjust_max_active(pwq);

- spin_unlock_irq(&workqueue_lock);
+ spin_unlock_irq(&pwq_lock);
}
EXPORT_SYMBOL_GPL(workqueue_set_max_active);

@@ -4266,7 +4269,7 @@ EXPORT_SYMBOL_GPL(work_on_cpu);
* pool->worklist.
*
* CONTEXT:
- * Grabs and releases wq_mutex, workqueue_lock and pool->lock's.
+ * Grabs and releases wq_mutex, pwq_lock and pool->lock's.
*/
void freeze_workqueues_begin(void)
{
@@ -4289,12 +4292,12 @@ void freeze_workqueues_begin(void)
}

/* suppress further executions by setting max_active to zero */
- spin_lock_irq(&workqueue_lock);
+ spin_lock_irq(&pwq_lock);
list_for_each_entry(wq, &workqueues, list) {
for_each_pwq(pwq, wq)
pwq_adjust_max_active(pwq);
}
- spin_unlock_irq(&workqueue_lock);
+ spin_unlock_irq(&pwq_lock);

mutex_unlock(&wq_mutex);
}
@@ -4352,7 +4355,7 @@ out_unlock:
* frozen works are transferred to their respective pool worklists.
*
* CONTEXT:
- * Grabs and releases wq_mutex, workqueue_lock and pool->lock's.
+ * Grabs and releases wq_mutex, pwq_lock and pool->lock's.
*/
void thaw_workqueues(void)
{
@@ -4375,12 +4378,12 @@ void thaw_workqueues(void)
}

/* restore max_active and repopulate worklist */
- spin_lock_irq(&workqueue_lock);
+ spin_lock_irq(&pwq_lock);
list_for_each_entry(wq, &workqueues, list) {
for_each_pwq(pwq, wq)
pwq_adjust_max_active(pwq);
}
- spin_unlock_irq(&workqueue_lock);
+ spin_unlock_irq(&pwq_lock);

/* kick workers */
for_each_pool(pool, pi) {
--
1.8.1.4

2013-03-14 02:58:44

by Tejun Heo

Subject: [PATCH 5/7] workqueue: separate out pool and workqueue locking into wq_mutex

Currently, workqueue_lock protects most shared workqueue resources -
the pools, workqueues, pool_workqueues, draining, ID assignments,
mayday handling and so on. The coverage has grown organically and
there is no identified bottleneck coming from workqueue_lock, but it
has grown a bit too much and scheduled rebinding changes need the
pools and workqueues to be protected by a mutex instead of a spinlock.

This patch breaks out pool and workqueue synchronization from
workqueue_lock into a new mutex - wq_mutex. The following are
protected by wq_mutex.

* worker_pool_idr and unbound_pool_hash
* pool->refcnt
* workqueues list
* workqueue->flags, ->nr_drainers

The changes are mostly straightforward. workqueue_lock is replaced
with wq_mutex where applicable, and workqueue_lock lock/unlocks are
added where the wq_mutex conversion would otherwise leave data
structures unprotected. irq / preemption toggling was added where the
conversion affects them. Things worth noting are

* New WQ and WR locking labels added along with
  assert_rcu_or_wq_mutex().

* worker_pool_assign_id() now expects to be called under wq_mutex.

* create_mutex is removed from get_unbound_pool(). It now just holds
wq_mutex.

This patch shouldn't introduce any visible behavior changes.
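
As an example of the new WQ / WR rules, pool ID assignment and pool
iteration now look roughly like this (a sketch; both pieces appear in
the hunks below, and check_pool() is just a placeholder):

  /* modification of worker_pool_idr: under wq_mutex */
  mutex_lock(&wq_mutex);
  BUG_ON(worker_pool_assign_id(pool));
  mutex_unlock(&wq_mutex);

  /* iteration: sched-RCU (or wq_mutex) is enough */
  rcu_read_lock_sched();
  for_each_pool(pool, pi)
          check_pool(pool);               /* placeholder */
  rcu_read_unlock_sched();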

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 146 ++++++++++++++++++++++++++++-------------------------
1 file changed, 77 insertions(+), 69 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9a0cbb2..c3b59ff 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -119,9 +119,11 @@ enum {
*
* F: wq->flush_mutex protected.
*
- * W: workqueue_lock protected.
+ * WQ: wq_mutex protected.
+ *
+ * WR: wq_mutex protected for writes. Sched-RCU protected for reads.
*
- * R: workqueue_lock protected for writes. Sched-RCU protected for reads.
+ * W: workqueue_lock protected.
*
* FR: wq->flush_mutex and workqueue_lock protected for writes. Sched-RCU
* protected for reads.
@@ -155,8 +157,8 @@ struct worker_pool {
struct ida worker_ida; /* L: for worker IDs */

struct workqueue_attrs *attrs; /* I: worker attributes */
- struct hlist_node hash_node; /* W: unbound_pool_hash node */
- int refcnt; /* W: refcnt for unbound pools */
+ struct hlist_node hash_node; /* WQ: unbound_pool_hash node */
+ int refcnt; /* WQ: refcnt for unbound pools */

/*
* The current concurrency level. As it's likely to be accessed
@@ -218,10 +220,10 @@ struct wq_device;
* the appropriate worker_pool through its pool_workqueues.
*/
struct workqueue_struct {
- unsigned int flags; /* W: WQ_* flags */
+ unsigned int flags; /* WQ: WQ_* flags */
struct pool_workqueue __percpu *cpu_pwqs; /* I: per-cpu pwq's */
struct list_head pwqs; /* FR: all pwqs of this wq */
- struct list_head list; /* W: list of all workqueues */
+ struct list_head list; /* WQ: list of all workqueues */

struct mutex flush_mutex; /* protects wq flushing */
int work_color; /* F: current work color */
@@ -234,7 +236,7 @@ struct workqueue_struct {
struct list_head maydays; /* W: pwqs requesting rescue */
struct worker *rescuer; /* I: rescue worker */

- int nr_drainers; /* W: drain in progress */
+ int nr_drainers; /* WQ: drain in progress */
int saved_max_active; /* W: saved pwq max_active */

#ifdef CONFIG_SYSFS
@@ -248,22 +250,19 @@ struct workqueue_struct {

static struct kmem_cache *pwq_cache;

-/* Serializes the accesses to the list of workqueues. */
+static DEFINE_MUTEX(wq_mutex); /* protects workqueues and pools */
static DEFINE_SPINLOCK(workqueue_lock);
-static LIST_HEAD(workqueues);
-static bool workqueue_freezing; /* W: have wqs started freezing? */
+
+static LIST_HEAD(workqueues); /* WQ: list of all workqueues */
+static bool workqueue_freezing; /* WQ: have wqs started freezing? */

/* the per-cpu worker pools */
static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
cpu_worker_pools);

-/*
- * R: idr of all pools. Modifications are protected by workqueue_lock.
- * Read accesses are protected by sched-RCU protected.
- */
-static DEFINE_IDR(worker_pool_idr);
+static DEFINE_IDR(worker_pool_idr); /* WR: idr of all pools */

-/* W: hash of all unbound pools keyed by pool->attrs */
+/* WQ: hash of all unbound pools keyed by pool->attrs */
static DEFINE_HASHTABLE(unbound_pool_hash, UNBOUND_POOL_HASH_ORDER);

/* I: attributes used when instantiating standard unbound pools on demand */
@@ -287,6 +286,11 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
#define CREATE_TRACE_POINTS
#include <trace/events/workqueue.h>

+#define assert_rcu_or_wq_mutex() \
+ rcu_lockdep_assert(rcu_read_lock_sched_held() || \
+ lockdep_is_held(&wq_mutex), \
+ "sched RCU or wq_mutex should be held")
+
#define assert_rcu_or_wq_lock() \
rcu_lockdep_assert(rcu_read_lock_sched_held() || \
lockdep_is_held(&workqueue_lock), \
@@ -305,16 +309,16 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
* @pool: iteration cursor
* @pi: integer used for iteration
*
- * This must be called either with workqueue_lock held or sched RCU read
- * locked. If the pool needs to be used beyond the locking in effect, the
- * caller is responsible for guaranteeing that the pool stays online.
+ * This must be called either with wq_mutex held or sched RCU read locked.
+ * If the pool needs to be used beyond the locking in effect, the caller is
+ * responsible for guaranteeing that the pool stays online.
*
* The if/else clause exists only for the lockdep assertion and can be
* ignored.
*/
#define for_each_pool(pool, pi) \
idr_for_each_entry(&worker_pool_idr, pool, pi) \
- if (({ assert_rcu_or_wq_lock(); false; })) { } \
+ if (({ assert_rcu_or_wq_mutex(); false; })) { } \
else

/**
@@ -455,13 +459,12 @@ static int worker_pool_assign_id(struct worker_pool *pool)
{
int ret;

+ lockdep_assert_held(&wq_mutex);
+
do {
if (!idr_pre_get(&worker_pool_idr, GFP_KERNEL))
return -ENOMEM;
-
- spin_lock_irq(&workqueue_lock);
ret = idr_get_new(&worker_pool_idr, pool, &pool->id);
- spin_unlock_irq(&workqueue_lock);
} while (ret == -EAGAIN);

return ret;
@@ -574,9 +577,9 @@ static struct pool_workqueue *get_work_pwq(struct work_struct *work)
*
* Return the worker_pool @work was last associated with. %NULL if none.
*
- * Pools are created and destroyed under workqueue_lock, and allows read
- * access under sched-RCU read lock. As such, this function should be
- * called under workqueue_lock or with preemption disabled.
+ * Pools are created and destroyed under wq_mutex, and allows read access
+ * under sched-RCU read lock. As such, this function should be called
+ * under wq_mutex or with preemption disabled.
*
* All fields of the returned pool are accessible as long as the above
* mentioned locking is in effect. If the returned pool needs to be used
@@ -588,7 +591,7 @@ static struct worker_pool *get_work_pool(struct work_struct *work)
unsigned long data = atomic_long_read(&work->data);
int pool_id;

- assert_rcu_or_wq_lock();
+ assert_rcu_or_wq_mutex();

if (data & WORK_STRUCT_PWQ)
return ((struct pool_workqueue *)
@@ -2768,10 +2771,10 @@ void drain_workqueue(struct workqueue_struct *wq)
* hotter than drain_workqueue() and already looks at @wq->flags.
* Use __WQ_DRAINING so that queue doesn't have to check nr_drainers.
*/
- spin_lock_irq(&workqueue_lock);
+ mutex_lock(&wq_mutex);
if (!wq->nr_drainers++)
wq->flags |= __WQ_DRAINING;
- spin_unlock_irq(&workqueue_lock);
+ mutex_unlock(&wq_mutex);
reflush:
flush_workqueue(wq);

@@ -2796,12 +2799,12 @@ reflush:
goto reflush;
}

- spin_lock(&workqueue_lock);
+ local_irq_enable();
+
+ mutex_lock(&wq_mutex);
if (!--wq->nr_drainers)
wq->flags &= ~__WQ_DRAINING;
- spin_unlock(&workqueue_lock);
-
- local_irq_enable();
+ mutex_unlock(&wq_mutex);
}
EXPORT_SYMBOL_GPL(drain_workqueue);

@@ -3514,16 +3517,16 @@ static void put_unbound_pool(struct worker_pool *pool)
{
struct worker *worker;

- spin_lock_irq(&workqueue_lock);
+ mutex_lock(&wq_mutex);
if (--pool->refcnt) {
- spin_unlock_irq(&workqueue_lock);
+ mutex_unlock(&wq_mutex);
return;
}

/* sanity checks */
if (WARN_ON(!(pool->flags & POOL_DISASSOCIATED)) ||
WARN_ON(!list_empty(&pool->worklist))) {
- spin_unlock_irq(&workqueue_lock);
+ mutex_unlock(&wq_mutex);
return;
}

@@ -3532,7 +3535,7 @@ static void put_unbound_pool(struct worker_pool *pool)
idr_remove(&worker_pool_idr, pool->id);
hash_del(&pool->hash_node);

- spin_unlock_irq(&workqueue_lock);
+ mutex_unlock(&wq_mutex);

/*
* Become the manager and destroy all workers. Grabbing
@@ -3570,21 +3573,18 @@ static void put_unbound_pool(struct worker_pool *pool)
*/
static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
{
- static DEFINE_MUTEX(create_mutex);
u32 hash = wqattrs_hash(attrs);
struct worker_pool *pool;

- mutex_lock(&create_mutex);
+ mutex_lock(&wq_mutex);

/* do we already have a matching pool? */
- spin_lock_irq(&workqueue_lock);
hash_for_each_possible(unbound_pool_hash, pool, hash_node, hash) {
if (wqattrs_equal(pool->attrs, attrs)) {
pool->refcnt++;
goto out_unlock;
}
}
- spin_unlock_irq(&workqueue_lock);

/* nope, create a new one */
pool = kzalloc(sizeof(*pool), GFP_KERNEL);
@@ -3602,14 +3602,12 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
goto fail;

/* install */
- spin_lock_irq(&workqueue_lock);
hash_add(unbound_pool_hash, &pool->hash_node, hash);
out_unlock:
- spin_unlock_irq(&workqueue_lock);
- mutex_unlock(&create_mutex);
+ mutex_unlock(&wq_mutex);
return pool;
fail:
- mutex_unlock(&create_mutex);
+ mutex_unlock(&wq_mutex);
if (pool)
put_unbound_pool(pool);
return NULL;
@@ -3883,18 +3881,19 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
goto err_destroy;

/*
- * workqueue_lock protects global freeze state and workqueues list.
- * Grab it, adjust max_active and add the new workqueue to
- * workqueues list.
+ * wq_mutex protects global freeze state and workqueues list. Grab
+ * it, adjust max_active and add the new @wq to workqueues list.
*/
- spin_lock_irq(&workqueue_lock);
+ mutex_lock(&wq_mutex);

+ spin_lock_irq(&workqueue_lock);
for_each_pwq(pwq, wq)
pwq_adjust_max_active(pwq);
+ spin_unlock_irq(&workqueue_lock);

list_add(&wq->list, &workqueues);

- spin_unlock_irq(&workqueue_lock);
+ mutex_unlock(&wq_mutex);

return wq;

@@ -3920,9 +3919,8 @@ void destroy_workqueue(struct workqueue_struct *wq)
/* drain it before proceeding with destruction */
drain_workqueue(wq);

- spin_lock_irq(&workqueue_lock);
-
/* sanity checks */
+ spin_lock_irq(&workqueue_lock);
for_each_pwq(pwq, wq) {
int i;

@@ -3940,14 +3938,15 @@ void destroy_workqueue(struct workqueue_struct *wq)
return;
}
}
+ spin_unlock_irq(&workqueue_lock);

/*
* wq list is used to freeze wq, remove from list after
* flushing is complete in case freeze races us.
*/
+ mutex_lock(&wq_mutex);
list_del_init(&wq->list);
-
- spin_unlock_irq(&workqueue_lock);
+ mutex_unlock(&wq_mutex);

workqueue_sysfs_unregister(wq);

@@ -4267,7 +4266,7 @@ EXPORT_SYMBOL_GPL(work_on_cpu);
* pool->worklist.
*
* CONTEXT:
- * Grabs and releases workqueue_lock and pool->lock's.
+ * Grabs and releases wq_mutex, workqueue_lock and pool->lock's.
*/
void freeze_workqueues_begin(void)
{
@@ -4276,26 +4275,28 @@ void freeze_workqueues_begin(void)
struct pool_workqueue *pwq;
int pi;

- spin_lock_irq(&workqueue_lock);
+ mutex_lock(&wq_mutex);

WARN_ON_ONCE(workqueue_freezing);
workqueue_freezing = true;

/* set FREEZING */
for_each_pool(pool, pi) {
- spin_lock(&pool->lock);
+ spin_lock_irq(&pool->lock);
WARN_ON_ONCE(pool->flags & POOL_FREEZING);
pool->flags |= POOL_FREEZING;
- spin_unlock(&pool->lock);
+ spin_unlock_irq(&pool->lock);
}

/* suppress further executions by setting max_active to zero */
+ spin_lock_irq(&workqueue_lock);
list_for_each_entry(wq, &workqueues, list) {
for_each_pwq(pwq, wq)
pwq_adjust_max_active(pwq);
}
-
spin_unlock_irq(&workqueue_lock);
+
+ mutex_unlock(&wq_mutex);
}

/**
@@ -4305,7 +4306,7 @@ void freeze_workqueues_begin(void)
* between freeze_workqueues_begin() and thaw_workqueues().
*
* CONTEXT:
- * Grabs and releases workqueue_lock.
+ * Grabs and releases wq_mutex.
*
* RETURNS:
* %true if some freezable workqueues are still busy. %false if freezing
@@ -4317,7 +4318,7 @@ bool freeze_workqueues_busy(void)
struct workqueue_struct *wq;
struct pool_workqueue *pwq;

- spin_lock_irq(&workqueue_lock);
+ mutex_lock(&wq_mutex);

WARN_ON_ONCE(!workqueue_freezing);

@@ -4328,16 +4329,19 @@ bool freeze_workqueues_busy(void)
* nr_active is monotonically decreasing. It's safe
* to peek without lock.
*/
+ preempt_disable();
for_each_pwq(pwq, wq) {
WARN_ON_ONCE(pwq->nr_active < 0);
if (pwq->nr_active) {
busy = true;
+ preempt_enable();
goto out_unlock;
}
}
+ preempt_enable();
}
out_unlock:
- spin_unlock_irq(&workqueue_lock);
+ mutex_unlock(&wq_mutex);
return busy;
}

@@ -4348,7 +4352,7 @@ out_unlock:
* frozen works are transferred to their respective pool worklists.
*
* CONTEXT:
- * Grabs and releases workqueue_lock and pool->lock's.
+ * Grabs and releases wq_mutex, workqueue_lock and pool->lock's.
*/
void thaw_workqueues(void)
{
@@ -4357,35 +4361,37 @@ void thaw_workqueues(void)
struct worker_pool *pool;
int pi;

- spin_lock_irq(&workqueue_lock);
+ mutex_lock(&wq_mutex);

if (!workqueue_freezing)
goto out_unlock;

/* clear FREEZING */
for_each_pool(pool, pi) {
- spin_lock(&pool->lock);
+ spin_lock_irq(&pool->lock);
WARN_ON_ONCE(!(pool->flags & POOL_FREEZING));
pool->flags &= ~POOL_FREEZING;
- spin_unlock(&pool->lock);
+ spin_unlock_irq(&pool->lock);
}

/* restore max_active and repopulate worklist */
+ spin_lock_irq(&workqueue_lock);
list_for_each_entry(wq, &workqueues, list) {
for_each_pwq(pwq, wq)
pwq_adjust_max_active(pwq);
}
+ spin_unlock_irq(&workqueue_lock);

/* kick workers */
for_each_pool(pool, pi) {
- spin_lock(&pool->lock);
+ spin_lock_irq(&pool->lock);
wake_up_worker(pool);
- spin_unlock(&pool->lock);
+ spin_unlock_irq(&pool->lock);
}

workqueue_freezing = false;
out_unlock:
- spin_unlock_irq(&workqueue_lock);
+ mutex_unlock(&wq_mutex);
}
#endif /* CONFIG_FREEZER */

@@ -4417,7 +4423,9 @@ static int __init init_workqueues(void)
pool->attrs->nice = std_nice[i++];

/* alloc pool ID */
+ mutex_lock(&wq_mutex);
BUG_ON(worker_pool_assign_id(pool));
+ mutex_unlock(&wq_mutex);
}
}

--
1.8.1.4

2013-03-14 02:59:06

by Tejun Heo

Subject: [PATCH 3/7] workqueue: better define locking rules around worker creation / destruction

When a manager creates or destroys workers, the operations are always
done with the manager_mutex held; however, initial worker creation and
worker destruction during pool release don't grab the mutex. They are
still correct, as initial worker creation doesn't require
synchronization and grabbing manager_arb provides enough exclusion for
the pool release path.

Still, let's make everyone follow the same rules, both for consistency
and so that lockdep annotations can be added.

Update create_and_start_worker() and put_unbound_pool() to grab
manager_mutex around thread creation and destruction respectively and
add lockdep assertions to create_worker() and destroy_worker().

This patch doesn't introduce any visible behavior changes.

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index cac7106..ce1ab06 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1715,6 +1715,8 @@ static struct worker *create_worker(struct worker_pool *pool)
struct worker *worker = NULL;
int id = -1;

+ lockdep_assert_held(&pool->manager_mutex);
+
spin_lock_irq(&pool->lock);
while (ida_get_new(&pool->worker_ida, &id)) {
spin_unlock_irq(&pool->lock);
@@ -1796,12 +1798,14 @@ static void start_worker(struct worker *worker)
* create_and_start_worker - create and start a worker for a pool
* @pool: the target pool
*
- * Create and start a new worker for @pool.
+ * Grab the managership of @pool and create and start a new worker for it.
*/
static int create_and_start_worker(struct worker_pool *pool)
{
struct worker *worker;

+ mutex_lock(&pool->manager_mutex);
+
worker = create_worker(pool);
if (worker) {
spin_lock_irq(&pool->lock);
@@ -1809,6 +1813,8 @@ static int create_and_start_worker(struct worker_pool *pool)
spin_unlock_irq(&pool->lock);
}

+ mutex_unlock(&pool->manager_mutex);
+
return worker ? 0 : -ENOMEM;
}

@@ -1826,6 +1832,9 @@ static void destroy_worker(struct worker *worker)
struct worker_pool *pool = worker->pool;
int id = worker->id;

+ lockdep_assert_held(&pool->manager_mutex);
+ lockdep_assert_held(&pool->lock);
+
/* sanity check frenzy */
if (WARN_ON(worker->current_work) ||
WARN_ON(!list_empty(&worker->scheduled)))
@@ -3531,6 +3540,7 @@ static void put_unbound_pool(struct worker_pool *pool)
* manager_mutex.
*/
mutex_lock(&pool->manager_arb);
+ mutex_lock(&pool->manager_mutex);
spin_lock_irq(&pool->lock);

while ((worker = first_worker(pool)))
@@ -3538,6 +3548,7 @@ static void put_unbound_pool(struct worker_pool *pool)
WARN_ON(pool->nr_workers || pool->nr_idle);

spin_unlock_irq(&pool->lock);
+ mutex_unlock(&pool->manager_mutex);
mutex_unlock(&pool->manager_arb);

/* shut down the timers */
--
1.8.1.4

2013-03-14 02:57:38

by Tejun Heo

Subject: [PATCH 2/7] workqueue: factor out initial worker creation into create_and_start_worker()

get_unbound_pool(), workqueue_cpu_up_callback() and init_workqueues()
have similar code pieces to create and start the initial worker.
Factor those out into create_and_start_worker().

This patch doesn't introduce any functional changes.

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 47 +++++++++++++++++++++++------------------------
1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index bc25bdf..cac7106 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1793,6 +1793,26 @@ static void start_worker(struct worker *worker)
}

/**
+ * create_and_start_worker - create and start a worker for a pool
+ * @pool: the target pool
+ *
+ * Create and start a new worker for @pool.
+ */
+static int create_and_start_worker(struct worker_pool *pool)
+{
+ struct worker *worker;
+
+ worker = create_worker(pool);
+ if (worker) {
+ spin_lock_irq(&pool->lock);
+ start_worker(worker);
+ spin_unlock_irq(&pool->lock);
+ }
+
+ return worker ? 0 : -ENOMEM;
+}
+
+/**
* destroy_worker - destroy a workqueue worker
* @worker: worker to be destroyed
*
@@ -3542,7 +3562,6 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
static DEFINE_MUTEX(create_mutex);
u32 hash = wqattrs_hash(attrs);
struct worker_pool *pool;
- struct worker *worker;

mutex_lock(&create_mutex);

@@ -3568,14 +3587,9 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
goto fail;

/* create and start the initial worker */
- worker = create_worker(pool);
- if (!worker)
+ if (create_and_start_worker(pool) < 0)
goto fail;

- spin_lock_irq(&pool->lock);
- start_worker(worker);
- spin_unlock_irq(&pool->lock);
-
/* install */
spin_lock_irq(&workqueue_lock);
hash_add(unbound_pool_hash, &pool->hash_node, hash);
@@ -4148,18 +4162,10 @@ static int __cpuinit workqueue_cpu_up_callback(struct notifier_block *nfb,
switch (action & ~CPU_TASKS_FROZEN) {
case CPU_UP_PREPARE:
for_each_cpu_worker_pool(pool, cpu) {
- struct worker *worker;
-
if (pool->nr_workers)
continue;
-
- worker = create_worker(pool);
- if (!worker)
+ if (create_and_start_worker(pool) < 0)
return NOTIFY_BAD;
-
- spin_lock_irq(&pool->lock);
- start_worker(worker);
- spin_unlock_irq(&pool->lock);
}
break;

@@ -4409,15 +4415,8 @@ static int __init init_workqueues(void)
struct worker_pool *pool;

for_each_cpu_worker_pool(pool, cpu) {
- struct worker *worker;
-
pool->flags &= ~POOL_DISASSOCIATED;
-
- worker = create_worker(pool);
- BUG_ON(!worker);
- spin_lock_irq(&pool->lock);
- start_worker(worker);
- spin_unlock_irq(&pool->lock);
+ BUG_ON(create_and_start_worker(pool) < 0);
}
}

--
1.8.1.4

2013-03-14 16:00:46

by Lai Jiangshan

Subject: Re: [PATCH 1/7] workqueue: rename worker_pool->assoc_mutex to ->manager_mutex

On Thu, Mar 14, 2013 at 10:57 AM, Tejun Heo <[email protected]> wrote:
> @@ -4095,7 +4109,7 @@ static void wq_unbind_fn(struct work_struct *work)
> pool->flags |= POOL_DISASSOCIATED;
>
> spin_unlock_irq(&pool->lock);
> - mutex_unlock(&pool->assoc_mutex);
> + mutex_unlock(&pool->manager_mutex);

Hi, Tejun

It conflicts with wq/for-3.9-fixes.

Thanks,
Lai


2013-03-14 16:41:21

by Tejun Heo

Subject: Re: [PATCH 1/7] workqueue: rename worker_pool->assoc_mutex to ->manager_mutex

Hey, Lai.

On Thu, Mar 14, 2013 at 9:00 AM, Lai Jiangshan <[email protected]> wrote:
>> @@ -4095,7 +4109,7 @@ static void wq_unbind_fn(struct work_struct *work)
>> pool->flags |= POOL_DISASSOCIATED;
>>
>> spin_unlock_irq(&pool->lock);
>> - mutex_unlock(&pool->assoc_mutex);
>> + mutex_unlock(&pool->manager_mutex);
>
> Hi, Tejun
>
> It conflicts with wq/for-3.9-fixes.

Yeah, the resolution is trivial. I'll be carrying it in for-next once
this makes for-3.10.

Thanks.

--
tejun

2013-03-18 19:51:40

by Tejun Heo

Subject: Re: [PATCHSET wq/for-3.10] workqueue: break up workqueue_lock into multiple locks

On Wed, Mar 13, 2013 at 07:57:18PM -0700, Tejun Heo wrote:
> and available in the following git branch.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git review-finer-locking

Applied to wq/for-3.10.

Thanks.

--
tejun

2013-03-20 14:01:52

by Joonsoo Kim

Subject: Re: [PATCHSET wq/for-3.10] workqueue: break up workqueue_lock into multiple locks

2013/3/19 Tejun Heo <[email protected]>:
> On Wed, Mar 13, 2013 at 07:57:18PM -0700, Tejun Heo wrote:
>> and available in the following git branch.
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git review-finer-locking
>
> Applied to wq/for-3.10.

Hello, Tejun.

I know I am late, but please give me a chance to ask a question.

Is finer locking for the workqueue code really needed?
Is there a performance issue?
I think there are too many locks and locking rules,
although the description of them is very nice.

Thanks.


2013-03-20 14:38:38

by Tejun Heo

Subject: Re: [PATCHSET wq/for-3.10] workqueue: break up workqueue_lock into multiple locks

Hey,

On Wed, Mar 20, 2013 at 11:01:50PM +0900, JoonSoo Kim wrote:
> 2013/3/19 Tejun Heo <[email protected]>:
> > On Wed, Mar 13, 2013 at 07:57:18PM -0700, Tejun Heo wrote:
> >> and available in the following git branch.
> >>
> >> git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git review-finer-locking
> >
> > Applied to wq/for-3.10.
>
> Hello, Tejun.
>
> I know I am late, but please give me a chance to ask a question.
>
> Is finer locking for the workqueue code really needed?
> Is there a performance issue?
> I think there are too many locks and locking rules,
> although the description of them is very nice.

It isn't about performance. workqueue_lock is broken into three locks
by this series - wq_mutex, pwq_lock and wq_mayday_lock. The primary
reason for this patchset is wq_mutex. We want to be able to perform
blocking operations while excluding workqueue and pool modifications,
which isn't possible with workqueue_lock, and at the same time there
are things which can't be protected by a mutex (they need an irq-safe
lock), so we need to break up the lock. After breaking off wq_mutex
from workqueue_lock, most of what workqueue_lock covered was pwq
related and mayday handling was the only one-off thing left, so one
more lock there, which I think actually makes things easier to digest.
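
For example, with the conversion in patch 5, get_unbound_pool() can
simply block while holding wq_mutex, which the old spinlock could never
allow. Rough sketch of the relevant part:

  mutex_lock(&wq_mutex);
  /* ... look up unbound_pool_hash ... */
  pool = kzalloc(sizeof(*pool), GFP_KERNEL);  /* may sleep, fine under a mutex */
  /* ... init pool and create the initial worker, both may block ... */
  mutex_unlock(&wq_mutex);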

Thanks.

--
tejun