2013-04-14 16:42:32

by Lai Jiangshan

Subject: [PATCH 0/8] workqueue: advance concurrency management

I found that the early increase of nr_running in wq_worker_waking_up() is useless
in many cases. It tries to avoid waking up idle workers when work items are pending,
but delaying the increase of nr_running does not cause more idle workers to be woken up.

So we delay the increase, remove wq_worker_waking_up(), and ...

enjoy simpler concurrency management.

Lai Jiangshan (8):
workqueue: remove @cpu from wq_worker_sleeping()
workqueue: use create_and_start_worker() in manage_workers()
workqueue: remove cpu_intensive from process_one_work()
workqueue: quit cm mode when sleeping
workqueue: remove disabled wq_worker_waking_up()
workqueue: make nr_running non-atomic
workqueue: move worker->flags up
workqueue: rename ->nr_running to ->nr_cm_workers

kernel/sched/core.c | 6 +-
kernel/workqueue.c | 234 +++++++++++++++---------------------------
kernel/workqueue_internal.h | 9 +-
3 files changed, 89 insertions(+), 160 deletions(-)

--
1.7.7.6


2013-04-14 16:42:38

by Lai Jiangshan

Subject: [PATCH 1/8] workqueue: remove @cpu from wq_worker_sleeping()

The WARN_ON_ONCE(cpu != raw_smp_processor_id()) in
wq_worker_sleeping() is useless: the caller already ensures
cpu == raw_smp_processor_id().

We should use WARN_ON_ONCE(pool->cpu != raw_smp_processor_id())
to perform the intended check.

As a result, @cpu is removed from wq_worker_sleeping().

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/sched/core.c | 2 +-
kernel/workqueue.c | 7 +++----
kernel/workqueue_internal.h | 2 +-
3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 23606ee..ffc06ad 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2907,7 +2907,7 @@ need_resched:
if (prev->flags & PF_WQ_WORKER) {
struct task_struct *to_wakeup;

- to_wakeup = wq_worker_sleeping(prev, cpu);
+ to_wakeup = wq_worker_sleeping(prev);
if (to_wakeup)
try_to_wake_up_local(to_wakeup);
}
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c273376..b3095ad 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -807,7 +807,6 @@ void wq_worker_waking_up(struct task_struct *task, int cpu)
/**
* wq_worker_sleeping - a worker is going to sleep
* @task: task going to sleep
- * @cpu: CPU in question, must be the current CPU number
*
* This function is called during schedule() when a busy worker is
* going to sleep. Worker on the same cpu can be woken up by
@@ -817,9 +816,9 @@ void wq_worker_waking_up(struct task_struct *task, int cpu)
* spin_lock_irq(rq->lock)
*
* RETURNS:
- * Worker task on @cpu to wake up, %NULL if none.
+ * Worker task in the same pool to wake up, %NULL if none.
*/
-struct task_struct *wq_worker_sleeping(struct task_struct *task, int cpu)
+struct task_struct *wq_worker_sleeping(struct task_struct *task)
{
struct worker *worker = kthread_data(task), *to_wakeup = NULL;
struct worker_pool *pool;
@@ -835,7 +834,7 @@ struct task_struct *wq_worker_sleeping(struct task_struct *task, int cpu)
pool = worker->pool;

/* this can only happen on the local cpu */
- if (WARN_ON_ONCE(cpu != raw_smp_processor_id()))
+ if (WARN_ON_ONCE(pool->cpu != raw_smp_processor_id()))
return NULL;

/*
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 84ab6e1..aec8df4 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -57,6 +57,6 @@ static inline struct worker *current_wq_worker(void)
* sched.c and workqueue.c.
*/
void wq_worker_waking_up(struct task_struct *task, int cpu);
-struct task_struct *wq_worker_sleeping(struct task_struct *task, int cpu);
+struct task_struct *wq_worker_sleeping(struct task_struct *task);

#endif /* _KERNEL_WORKQUEUE_INTERNAL_H */
--
1.7.7.6

2013-04-14 16:42:43

by Lai Jiangshan

Subject: [PATCH 2/8] workqueue: use create_and_start_worker() in manage_workers()

After a worker is allocated, we are free to access it without any
protection until it is visible/published.

In the old code, the worker was published by start_worker() and was visible only
after start_worker(). In the current code, it becomes visible to
for_each_pool_worker() right after
"idr_replace(&pool->worker_idr, worker, worker->id);"

This means the step of publishing the worker is not atomic, which is fragile
(although I did not find an actual bug in the current code). It should be fixed.

It can be fixed by moving "idr_replace(&pool->worker_idr, worker, worker->id);"
into start_worker() or by folding start_worker() into create_worker().

I chose the second option. It makes the code much simpler.
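
For illustration only (this is not kernel code and not part of the patch): a minimal
userspace-style model of the invariant the fold restores, with a pthread mutex standing
in for pool->lock. Committing the worker to the lookup structure and starting it happen
in one locked section, so an iterator that takes the lock either sees no worker or a
fully started one.

#include <pthread.h>
#include <stdbool.h>

struct worker {
	bool started;			/* models WORKER_STARTED + nr_workers++ */
};

struct pool {
	pthread_mutex_t lock;		/* stands in for pool->lock */
	struct worker *committed;	/* models publication via idr_replace() */
};

static void commit_and_start(struct pool *p, struct worker *w)
{
	pthread_mutex_lock(&p->lock);
	p->committed = w;		/* worker becomes visible to iterators... */
	w->started = true;		/* ...only together with being started */
	pthread_mutex_unlock(&p->lock);
}

/* models a for_each_pool_worker() user */
static bool sees_unstarted_worker(struct pool *p)
{
	bool bad;

	pthread_mutex_lock(&p->lock);
	bad = p->committed && !p->committed->started;
	pthread_mutex_unlock(&p->lock);
	return bad;			/* always false with the ordering above */
}

In the old ordering, the commit happened in one locked section and start_worker()
in a later one, so the window in between was observable by iterators.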

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 62 +++++++++++++++------------------------------------
1 files changed, 18 insertions(+), 44 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b3095ad..d1e10c5 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -64,7 +64,7 @@ enum {
*
* Note that DISASSOCIATED should be flipped only while holding
* manager_mutex to avoid changing binding state while
- * create_worker() is in progress.
+ * create_and_start_worker_locked() is in progress.
*/
POOL_MANAGE_WORKERS = 1 << 0, /* need to manage workers */
POOL_DISASSOCIATED = 1 << 2, /* cpu can't serve workers */
@@ -1542,7 +1542,10 @@ static void worker_enter_idle(struct worker *worker)
(worker->hentry.next || worker->hentry.pprev)))
return;

- /* can't use worker_set_flags(), also called from start_worker() */
+ /*
+ * can't use worker_set_flags(), also called from
+ * create_and_start_worker_locked().
+ */
worker->flags |= WORKER_IDLE;
pool->nr_idle++;
worker->last_active = jiffies;
@@ -1663,12 +1666,10 @@ static struct worker *alloc_worker(void)
}

/**
- * create_worker - create a new workqueue worker
+ * create_and_start_worker_locked - create and start a worker for a pool
* @pool: pool the new worker will belong to
*
- * Create a new worker which is bound to @pool. The returned worker
- * can be started by calling start_worker() or destroyed using
- * destroy_worker().
+ * Create a new worker which is bound to @pool and start it.
*
* CONTEXT:
* Might sleep. Does GFP_KERNEL allocations.
@@ -1676,7 +1677,7 @@ static struct worker *alloc_worker(void)
* RETURNS:
* Pointer to the newly created worker.
*/
-static struct worker *create_worker(struct worker_pool *pool)
+static struct worker *create_and_start_worker_locked(struct worker_pool *pool)
{
struct worker *worker = NULL;
int id = -1;
@@ -1734,9 +1735,15 @@ static struct worker *create_worker(struct worker_pool *pool)
if (pool->flags & POOL_DISASSOCIATED)
worker->flags |= WORKER_UNBOUND;

- /* successful, commit the pointer to idr */
spin_lock_irq(&pool->lock);
+ /* successful, commit the pointer to idr */
idr_replace(&pool->worker_idr, worker, worker->id);
+
+ /* start worker */
+ worker->flags |= WORKER_STARTED;
+ worker->pool->nr_workers++;
+ worker_enter_idle(worker);
+ wake_up_process(worker->task);
spin_unlock_irq(&pool->lock);

return worker;
@@ -1752,23 +1759,6 @@ fail:
}

/**
- * start_worker - start a newly created worker
- * @worker: worker to start
- *
- * Make the pool aware of @worker and start it.
- *
- * CONTEXT:
- * spin_lock_irq(pool->lock).
- */
-static void start_worker(struct worker *worker)
-{
- worker->flags |= WORKER_STARTED;
- worker->pool->nr_workers++;
- worker_enter_idle(worker);
- wake_up_process(worker->task);
-}
-
-/**
* create_and_start_worker - create and start a worker for a pool
* @pool: the target pool
*
@@ -1779,14 +1769,7 @@ static int create_and_start_worker(struct worker_pool *pool)
struct worker *worker;

mutex_lock(&pool->manager_mutex);
-
- worker = create_worker(pool);
- if (worker) {
- spin_lock_irq(&pool->lock);
- start_worker(worker);
- spin_unlock_irq(&pool->lock);
- }
-
+ worker = create_and_start_worker_locked(pool);
mutex_unlock(&pool->manager_mutex);

return worker ? 0 : -ENOMEM;
@@ -1934,17 +1917,8 @@ restart:
mod_timer(&pool->mayday_timer, jiffies + MAYDAY_INITIAL_TIMEOUT);

while (true) {
- struct worker *worker;
-
- worker = create_worker(pool);
- if (worker) {
- del_timer_sync(&pool->mayday_timer);
- spin_lock_irq(&pool->lock);
- start_worker(worker);
- if (WARN_ON_ONCE(need_to_create_worker(pool)))
- goto restart;
- return true;
- }
+ if (create_and_start_worker_locked(pool))
+ break;

if (!need_to_create_worker(pool))
break;
--
1.7.7.6

2013-04-14 16:42:47

by Lai Jiangshan

Subject: [PATCH 3/8] workqueue: remove cpu_intensive from process_one_work()

In process_one_work(), we can use "worker->flags & WORKER_CPU_INTENSIVE"
instead of the local "cpu_intensive". Since worker->flags is a hot field
(accessed while processing each work item), this change will not cause
any performance regression.

It prepares for also clearing WORKER_QUIT_CM in the same place.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 7 +++----
1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index d1e10c5..a4bc589 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2068,7 +2068,6 @@ __acquires(&pool->lock)
{
struct pool_workqueue *pwq = get_work_pwq(work);
struct worker_pool *pool = worker->pool;
- bool cpu_intensive = pwq->wq->flags & WQ_CPU_INTENSIVE;
int work_color;
struct worker *collision;
#ifdef CONFIG_LOCKDEP
@@ -2118,7 +2117,7 @@ __acquires(&pool->lock)
* CPU intensive works don't participate in concurrency
* management. They're the scheduler's responsibility.
*/
- if (unlikely(cpu_intensive))
+ if (unlikely(pwq->wq->flags & WQ_CPU_INTENSIVE))
worker_set_flags(worker, WORKER_CPU_INTENSIVE, true);

/*
@@ -2161,8 +2160,8 @@ __acquires(&pool->lock)

spin_lock_irq(&pool->lock);

- /* clear cpu intensive status */
- if (unlikely(cpu_intensive))
+ /* clear cpu intensive status if it is set */
+ if (unlikely(worker->flags & WORKER_CPU_INTENSIVE))
worker_clr_flags(worker, WORKER_CPU_INTENSIVE);

/* we're done with it, release */
--
1.7.7.6

2013-04-14 16:42:54

by Lai Jiangshan

Subject: [PATCH 4/8] workqueue: quit cm mode when sleeping

When a worker is woken up from sleeping, it makes very little sense to still
consider it RUNNING (from the concurrency-management point of view):
o if the worker goes to sleep again, it is not RUNNING again anyway.
o if the work runs for a long time without sleeping, the worker should be
considered CPU_INTENSIVE.
o if the work runs only for a short time without sleeping, we can still consider
the worker not RUNNING for this harmless short period,
and fix it up before the next work item.

o In almost all cases, the early increase does not raise nr_running
from 0: there are other RUNNING workers, and in many cases they will
very probably not go to sleep before this worker finishes the work
item, so the early increase makes little sense.

So there is no need to consider this worker RUNNING that early, and
we can delay increasing nr_running a little: we increase it after
the work item is finished.

This is done by adding a new worker flag, WORKER_QUIT_CM.
It is used to disable the nr_running increase in wq_worker_waking_up(),
and to trigger the increase after the work item is finished.

This change may cause us to wake up (or create) more workers in rare cases,
but that is not incorrect.

It makes the concurrency management much simpler.
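
For illustration, a minimal userspace-style sketch (not kernel code; other
NOT_RUNNING flags and locking are ignored for brevity) of the intended flag
lifecycle: the sleep path marks the worker as out of concurrency management and
drops nr_running, and the count is restored only after the work item finishes.

#include <stdbool.h>

enum {
	WORKER_QUIT_CM	= 1 << 4,	/* worker left concurrency management */
};

struct pool {
	int nr_running;			/* concurrency-managed workers currently running */
	bool worklist_empty;
};

struct worker {
	unsigned int flags;
	struct pool *pool;
};

/* models wq_worker_sleeping(): returns true if an idle worker should be woken */
static bool worker_going_to_sleep(struct worker *w)
{
	w->flags |= WORKER_QUIT_CM;
	return --w->pool->nr_running == 0 && !w->pool->worklist_empty;
}

/* models the end of process_one_work(): rejoin concurrency management */
static void worker_finished_work(struct worker *w)
{
	if (w->flags & WORKER_QUIT_CM) {
		w->flags &= ~WORKER_QUIT_CM;
		w->pool->nr_running++;
	}
}

A wakeup between the two steps no longer touches nr_running, which is what lets
a later patch drop wq_worker_waking_up() entirely.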

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 20 ++++++++++++++------
kernel/workqueue_internal.h | 2 +-
2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a4bc589..668e9b7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -75,11 +75,13 @@ enum {
WORKER_DIE = 1 << 1, /* die die die */
WORKER_IDLE = 1 << 2, /* is idle */
WORKER_PREP = 1 << 3, /* preparing to run works */
+ WORKER_QUIT_CM = 1 << 4, /* quit concurrency managed */
WORKER_CPU_INTENSIVE = 1 << 6, /* cpu intensive */
WORKER_UNBOUND = 1 << 7, /* worker is unbound */
WORKER_REBOUND = 1 << 8, /* worker was rebound */

- WORKER_NOT_RUNNING = WORKER_PREP | WORKER_CPU_INTENSIVE |
+ WORKER_NOT_RUNNING = WORKER_PREP | WORKER_QUIT_CM |
+ WORKER_CPU_INTENSIVE |
WORKER_UNBOUND | WORKER_REBOUND,

NR_STD_WORKER_POOLS = 2, /* # standard pools per cpu */
@@ -122,6 +124,10 @@ enum {
* cpu or grabbing pool->lock is enough for read access. If
* POOL_DISASSOCIATED is set, it's identical to L.
*
+ * LI: If POOL_DISASSOCIATED is NOT set, read/modification access should be
+ * done with local IRQ-disabled and only from local cpu.
+ * If POOL_DISASSOCIATED is set, it's identical to L.
+ *
* MG: pool->manager_mutex and pool->lock protected. Writes require both
* locks. Reads can happen under either lock.
*
@@ -843,11 +849,13 @@ struct task_struct *wq_worker_sleeping(struct task_struct *task)
* Please read comment there.
*
* NOT_RUNNING is clear. This means that we're bound to and
- * running on the local cpu w/ rq lock held and preemption
+ * running on the local cpu w/ rq lock held and preemption/irq
* disabled, which in turn means that none else could be
* manipulating idle_list, so dereferencing idle_list without pool
- * lock is safe.
+ * lock is safe. And which in turn also means that we can
+ * manipulate worker->flags.
*/
+ worker->flags |= WORKER_QUIT_CM;
if (atomic_dec_and_test(&pool->nr_running) &&
!list_empty(&pool->worklist))
to_wakeup = first_worker(pool);
@@ -2160,9 +2168,9 @@ __acquires(&pool->lock)

spin_lock_irq(&pool->lock);

- /* clear cpu intensive status if it is set */
- if (unlikely(worker->flags & WORKER_CPU_INTENSIVE))
- worker_clr_flags(worker, WORKER_CPU_INTENSIVE);
+ /* clear cpu intensive status or WORKER_QUIT_CM if they are set */
+ if (unlikely(worker->flags & (WORKER_CPU_INTENSIVE | WORKER_QUIT_CM)))
+ worker_clr_flags(worker, WORKER_CPU_INTENSIVE | WORKER_QUIT_CM);

/* we're done with it, release */
hash_del(&worker->hentry);
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index aec8df4..1713ae7 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -35,7 +35,7 @@ struct worker {
/* L: for rescuers */
/* 64 bytes boundary on 64bit, 32 on 32bit */
unsigned long last_active; /* L: last active timestamp */
- unsigned int flags; /* X: flags */
+ unsigned int flags; /* LI: flags */
int id; /* I: worker id */

/* used only by rescuers to point to the target workqueue */
--
1.7.7.6

2013-04-14 16:43:03

by Lai Jiangshan

Subject: [PATCH 5/8] workqueue: remove disabled wq_worker_waking_up()

When a worker is sleeping, its flags include WORKER_QUIT_CM, which means
worker->flags & WORKER_NOT_RUNNING is always non-zero, and therefore
wq_worker_waking_up() is effectively disabled.

So we remove wq_worker_waking_up(). (The access to worker->flags in
wq_worker_waking_up() was not protected by "LI"; after this change,
worker->flags is always protected by "LI".)

The patch also makes these changes after the removal:
1) Because wq_worker_waking_up() is removed, we no longer need the schedule()
before zapping nr_running in wq_unbind_fn(), and no longer need to
release/re-acquire pool->lock.
2) The sanity check in worker_enter_idle() is changed to also cover
unbound/disassociated pools (because of the above change, nr_running is
now expected to always be reliable in worker_enter_idle()).

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/sched/core.c | 4 ---
kernel/workqueue.c | 58 +++++++-----------------------------------
kernel/workqueue_internal.h | 3 +-
3 files changed, 11 insertions(+), 54 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ffc06ad..18f95884 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1276,10 +1276,6 @@ static void ttwu_activate(struct rq *rq, struct task_struct *p, int en_flags)
{
activate_task(rq, p, en_flags);
p->on_rq = 1;
-
- /* if a worker is waking up, notify workqueue */
- if (p->flags & PF_WQ_WORKER)
- wq_worker_waking_up(p, cpu_of(rq));
}

/*
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 668e9b7..9f1ebdf 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -790,27 +790,6 @@ static void wake_up_worker(struct worker_pool *pool)
}

/**
- * wq_worker_waking_up - a worker is waking up
- * @task: task waking up
- * @cpu: CPU @task is waking up to
- *
- * This function is called during try_to_wake_up() when a worker is
- * being awoken.
- *
- * CONTEXT:
- * spin_lock_irq(rq->lock)
- */
-void wq_worker_waking_up(struct task_struct *task, int cpu)
-{
- struct worker *worker = kthread_data(task);
-
- if (!(worker->flags & WORKER_NOT_RUNNING)) {
- WARN_ON_ONCE(worker->pool->cpu != cpu);
- atomic_inc(&worker->pool->nr_running);
- }
-}
-
-/**
* wq_worker_sleeping - a worker is going to sleep
* @task: task going to sleep
*
@@ -1564,14 +1543,8 @@ static void worker_enter_idle(struct worker *worker)
if (too_many_workers(pool) && !timer_pending(&pool->idle_timer))
mod_timer(&pool->idle_timer, jiffies + IDLE_WORKER_TIMEOUT);

- /*
- * Sanity check nr_running. Because wq_unbind_fn() releases
- * pool->lock between setting %WORKER_UNBOUND and zapping
- * nr_running, the warning may trigger spuriously. Check iff
- * unbind is not in progress.
- */
- WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
- pool->nr_workers == pool->nr_idle &&
+ /* Sanity check nr_running. */
+ WARN_ON_ONCE(pool->nr_workers == pool->nr_idle &&
atomic_read(&pool->nr_running));
}

@@ -4385,24 +4358,12 @@ static void wq_unbind_fn(struct work_struct *work)

pool->flags |= POOL_DISASSOCIATED;

- spin_unlock_irq(&pool->lock);
- mutex_unlock(&pool->manager_mutex);
-
/*
- * Call schedule() so that we cross rq->lock and thus can
- * guarantee sched callbacks see the %WORKER_UNBOUND flag.
- * This is necessary as scheduler callbacks may be invoked
- * from other cpus.
- */
- schedule();
-
- /*
- * Sched callbacks are disabled now. Zap nr_running.
- * After this, nr_running stays zero and need_more_worker()
- * and keep_working() are always true as long as the
- * worklist is not empty. This pool now behaves as an
- * unbound (in terms of concurrency management) pool which
- * are served by workers tied to the pool.
+ * Zap nr_running. After this, nr_running stays zero
+ * and need_more_worker() and keep_working() are always true
+ * as long as the worklist is not empty. This pool now
+ * behaves as an unbound (in terms of concurrency management)
+ * pool which are served by workers tied to the pool.
*/
atomic_set(&pool->nr_running, 0);

@@ -4411,9 +4372,9 @@ static void wq_unbind_fn(struct work_struct *work)
* worker blocking could lead to lengthy stalls. Kick off
* unbound chain execution of currently pending work items.
*/
- spin_lock_irq(&pool->lock);
wake_up_worker(pool);
spin_unlock_irq(&pool->lock);
+ mutex_unlock(&pool->manager_mutex);
}
}

@@ -4466,9 +4427,10 @@ static void rebind_workers(struct worker_pool *pool)
* concurrency management. Note that when or whether
* @worker clears REBOUND doesn't affect correctness.
*
+ * Current CPU may not be the cpu of this rebinding pool, so
* ACCESS_ONCE() is necessary because @worker->flags may be
* tested without holding any lock in
- * wq_worker_waking_up(). Without it, NOT_RUNNING test may
+ * wq_worker_sleeping(). Without it, NOT_RUNNING test may
* fail incorrectly leading to premature concurrency
* management operations.
*/
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 1713ae7..e9fd05f 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -54,9 +54,8 @@ static inline struct worker *current_wq_worker(void)

/*
* Scheduler hooks for concurrency managed workqueue. Only to be used from
- * sched.c and workqueue.c.
+ * kernel/sched/core.c and kernel/workqueue.c.
*/
-void wq_worker_waking_up(struct task_struct *task, int cpu);
struct task_struct *wq_worker_sleeping(struct task_struct *task);

#endif /* _KERNEL_WORKQUEUE_INTERNAL_H */
--
1.7.7.6

2013-04-14 16:43:19

by Lai Jiangshan

Subject: [PATCH 6/8] workqueue: make nr_running non-atomic

Now nr_running is accessed only with local IRQs disabled and only from the
local cpu if the pool is associated (except the read access in insert_work()).

So we convert it to a non-atomic int to remove the overhead of atomic
operations. It is protected by "LI".
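
As an illustration of the barrier pairing kept by this patch, a simplified
userspace model (not kernel code; C11 fences stand in for smp_mb(), and the
scheduler/locking details are omitted): the queueing side publishes the work
item and then checks nr_running, while the sleeping side decrements nr_running
and then checks the worklist, so at least one side observes the other's update
and a pending work item is never left with no running worker and nobody woken.

#include <stdatomic.h>
#include <stdbool.h>

struct pool {
	int nr_running;		/* plain int: written only from the local CPU with IRQs off */
	bool worklist_empty;
};

/* queueing side, models insert_work(): returns true if a worker must be woken */
static bool publish_work(struct pool *p)
{
	p->worklist_empty = false;			/* add the work item */
	atomic_thread_fence(memory_order_seq_cst);	/* pairs with the fence below */
	return p->nr_running == 0;
}

/* sleeping side, models wq_worker_sleeping(): returns true if a worker must be woken */
static bool worker_going_to_sleep(struct pool *p)
{
	if (--p->nr_running)
		return false;				/* other CM workers still running */
	atomic_thread_fence(memory_order_seq_cst);	/* pairs with the fence above */
	return !p->worklist_empty;			/* pending work with nobody running */
}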

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 49 +++++++++++++++++++++----------------------------
1 files changed, 21 insertions(+), 28 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9f1ebdf..25e2e5a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -150,6 +150,7 @@ struct worker_pool {
int node; /* I: the associated node ID */
int id; /* I: pool ID */
unsigned int flags; /* X: flags */
+ int nr_running; /* LI: count for running */

struct list_head worklist; /* L: list of pending works */
int nr_workers; /* L: total number of workers */
@@ -175,13 +176,6 @@ struct worker_pool {
int refcnt; /* PL: refcnt for unbound pools */

/*
- * The current concurrency level. As it's likely to be accessed
- * from other CPUs during try_to_wake_up(), put it in a separate
- * cacheline.
- */
- atomic_t nr_running ____cacheline_aligned_in_smp;
-
- /*
* Destruction of pool is sched-RCU protected to allow dereferences
* from get_work_pool().
*/
@@ -700,7 +694,7 @@ static bool work_is_canceling(struct work_struct *work)

static bool __need_more_worker(struct worker_pool *pool)
{
- return !atomic_read(&pool->nr_running);
+ return !pool->nr_running;
}

/*
@@ -725,8 +719,7 @@ static bool may_start_working(struct worker_pool *pool)
/* Do I need to keep working? Called from currently running workers. */
static bool keep_working(struct worker_pool *pool)
{
- return !list_empty(&pool->worklist) &&
- atomic_read(&pool->nr_running) <= 1;
+ return !list_empty(&pool->worklist) && pool->nr_running <= 1;
}

/* Do we need a new worker? Called from manager. */
@@ -823,21 +816,24 @@ struct task_struct *wq_worker_sleeping(struct task_struct *task)
return NULL;

/*
- * The counterpart of the following dec_and_test, implied mb,
- * worklist not empty test sequence is in insert_work().
- * Please read comment there.
- *
* NOT_RUNNING is clear. This means that we're bound to and
* running on the local cpu w/ rq lock held and preemption/irq
* disabled, which in turn means that none else could be
* manipulating idle_list, so dereferencing idle_list without pool
* lock is safe. And which in turn also means that we can
- * manipulate worker->flags.
+ * manipulate worker->flags and pool->nr_running.
*/
worker->flags |= WORKER_QUIT_CM;
- if (atomic_dec_and_test(&pool->nr_running) &&
- !list_empty(&pool->worklist))
- to_wakeup = first_worker(pool);
+ if (--pool->nr_running == 0) {
+ /*
+ * This smp_mb() forces a mb between decreasing nr_running
+ * and reading worklist. It pairs with the smp_mb() in
+ * insert_work(). Please read comment there.
+ */
+ smp_mb();
+ if (!list_empty(&pool->worklist))
+ to_wakeup = first_worker(pool);
+ }
return to_wakeup ? to_wakeup->task : NULL;
}

@@ -868,12 +864,10 @@ static inline void worker_set_flags(struct worker *worker, unsigned int flags,
*/
if ((flags & WORKER_NOT_RUNNING) &&
!(worker->flags & WORKER_NOT_RUNNING)) {
- if (wakeup) {
- if (atomic_dec_and_test(&pool->nr_running) &&
- !list_empty(&pool->worklist))
- wake_up_worker(pool);
- } else
- atomic_dec(&pool->nr_running);
+ pool->nr_running--;
+ if (wakeup && !pool->nr_running &&
+ !list_empty(&pool->worklist))
+ wake_up_worker(pool);
}

worker->flags |= flags;
@@ -905,7 +899,7 @@ static inline void worker_clr_flags(struct worker *worker, unsigned int flags)
*/
if ((flags & WORKER_NOT_RUNNING) && (oflags & WORKER_NOT_RUNNING))
if (!(worker->flags & WORKER_NOT_RUNNING))
- atomic_inc(&pool->nr_running);
+ pool->nr_running++;
}

/**
@@ -1544,8 +1538,7 @@ static void worker_enter_idle(struct worker *worker)
mod_timer(&pool->idle_timer, jiffies + IDLE_WORKER_TIMEOUT);

/* Sanity check nr_running. */
- WARN_ON_ONCE(pool->nr_workers == pool->nr_idle &&
- atomic_read(&pool->nr_running));
+ WARN_ON_ONCE(pool->nr_workers == pool->nr_idle && pool->nr_running);
}

/**
@@ -4365,7 +4358,7 @@ static void wq_unbind_fn(struct work_struct *work)
* behaves as an unbound (in terms of concurrency management)
* pool which are served by workers tied to the pool.
*/
- atomic_set(&pool->nr_running, 0);
+ pool->nr_running = 0;

/*
* With concurrency management just turned off, a busy
--
1.7.7.6

2013-04-14 16:44:39

by Lai Jiangshan

Subject: [PATCH 7/8] workqueue: move worker->flags up

worker->flags is a hot field (accessed while processing each work item).
Move it up into the first 64 bytes (32 bytes on 32-bit), which hold the
hot fields.

And move the colder field worker->task down to ensure worker->pool
still stays in the first 64 bytes.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue_internal.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index e9fd05f..63cfac7 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -20,6 +20,7 @@ struct worker_pool;
* Only to be used in workqueue and async.
*/
struct worker {
+ unsigned int flags; /* LI: flags */
/* on idle list while idle, on busy hash table while busy */
union {
struct list_head entry; /* L: while idle */
@@ -30,12 +31,11 @@ struct worker {
work_func_t current_func; /* L: current_work's fn */
struct pool_workqueue *current_pwq; /* L: current_work's pwq */
struct list_head scheduled; /* L: scheduled works */
- struct task_struct *task; /* I: worker task */
struct worker_pool *pool; /* I: the associated pool */
/* L: for rescuers */
/* 64 bytes boundary on 64bit, 32 on 32bit */
+ struct task_struct *task; /* I: worker task */
unsigned long last_active; /* L: last active timestamp */
- unsigned int flags; /* LI: flags */
int id; /* I: worker id */

/* used only by rescuers to point to the target workqueue */
--
1.7.7.6

2013-04-14 16:48:39

by Lai Jiangshan

Subject: [PATCH 8/8] workqueue: rename ->nr_running to ->nr_cm_workers

nr_running is not a good name: reviewers may think it counts non-sleeping
busy workers. nr_running is actually a counter of concurrency-managed
workers, so renaming it to nr_cm_workers is better.

s/nr_running/nr_cm_workers/
s/NOT_RUNNING/NOT_CM/
manually tune a little (indentation and the comment for nr_cm_workers)

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 69 +++++++++++++++++++++++++--------------------------
1 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 25e2e5a..25e028c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -80,7 +80,7 @@ enum {
WORKER_UNBOUND = 1 << 7, /* worker is unbound */
WORKER_REBOUND = 1 << 8, /* worker was rebound */

- WORKER_NOT_RUNNING = WORKER_PREP | WORKER_QUIT_CM |
+ WORKER_NOT_CM = WORKER_PREP | WORKER_QUIT_CM |
WORKER_CPU_INTENSIVE |
WORKER_UNBOUND | WORKER_REBOUND,

@@ -150,7 +150,7 @@ struct worker_pool {
int node; /* I: the associated node ID */
int id; /* I: pool ID */
unsigned int flags; /* X: flags */
- int nr_running; /* LI: count for running */
+ int nr_cm_workers; /* LI: count for cm workers */

struct list_head worklist; /* L: list of pending works */
int nr_workers; /* L: total number of workers */
@@ -694,14 +694,14 @@ static bool work_is_canceling(struct work_struct *work)

static bool __need_more_worker(struct worker_pool *pool)
{
- return !pool->nr_running;
+ return !pool->nr_cm_workers;
}

/*
* Need to wake up a worker? Called from anything but currently
* running workers.
*
- * Note that, because unbound workers never contribute to nr_running, this
+ * Note that, because unbound workers never contribute to nr_cm_workers, this
* function will always return %true for unbound pools as long as the
* worklist isn't empty.
*/
@@ -719,7 +719,7 @@ static bool may_start_working(struct worker_pool *pool)
/* Do I need to keep working? Called from currently running workers. */
static bool keep_working(struct worker_pool *pool)
{
- return !list_empty(&pool->worklist) && pool->nr_running <= 1;
+ return !list_empty(&pool->worklist) && pool->nr_cm_workers <= 1;
}

/* Do we need a new worker? Called from manager. */
@@ -804,9 +804,9 @@ struct task_struct *wq_worker_sleeping(struct task_struct *task)
/*
* Rescuers, which may not have all the fields set up like normal
* workers, also reach here, let's not access anything before
- * checking NOT_RUNNING.
+ * checking NOT_CM.
*/
- if (worker->flags & WORKER_NOT_RUNNING)
+ if (worker->flags & WORKER_NOT_CM)
return NULL;

pool = worker->pool;
@@ -816,17 +816,17 @@ struct task_struct *wq_worker_sleeping(struct task_struct *task)
return NULL;

/*
- * NOT_RUNNING is clear. This means that we're bound to and
+ * NOT_CM is clear. This means that we're bound to and
* running on the local cpu w/ rq lock held and preemption/irq
* disabled, which in turn means that none else could be
* manipulating idle_list, so dereferencing idle_list without pool
* lock is safe. And which in turn also means that we can
- * manipulate worker->flags and pool->nr_running.
+ * manipulate worker->flags and pool->nr_cm_workers.
*/
worker->flags |= WORKER_QUIT_CM;
- if (--pool->nr_running == 0) {
+ if (--pool->nr_cm_workers == 0) {
/*
- * This smp_mb() forces a mb between decreasing nr_running
+ * This smp_mb() forces a mb between decreasing nr_cm_workers
* and reading worklist. It pairs with the smp_mb() in
* insert_work(). Please read comment there.
*/
@@ -838,13 +838,13 @@ struct task_struct *wq_worker_sleeping(struct task_struct *task)
}

/**
- * worker_set_flags - set worker flags and adjust nr_running accordingly
+ * worker_set_flags - set worker flags and adjust nr_cm_workers accordingly
* @worker: self
* @flags: flags to set
* @wakeup: wakeup an idle worker if necessary
*
- * Set @flags in @worker->flags and adjust nr_running accordingly. If
- * nr_running becomes zero and @wakeup is %true, an idle worker is
+ * Set @flags in @worker->flags and adjust nr_cm_workers accordingly. If
+ * nr_cm_workers becomes zero and @wakeup is %true, an idle worker is
* woken up.
*
* CONTEXT:
@@ -858,14 +858,13 @@ static inline void worker_set_flags(struct worker *worker, unsigned int flags,
WARN_ON_ONCE(worker->task != current);

/*
- * If transitioning into NOT_RUNNING, adjust nr_running and
+ * If transitioning into NOT_CM, adjust nr_cm_workers and
* wake up an idle worker as necessary if requested by
* @wakeup.
*/
- if ((flags & WORKER_NOT_RUNNING) &&
- !(worker->flags & WORKER_NOT_RUNNING)) {
- pool->nr_running--;
- if (wakeup && !pool->nr_running &&
+ if ((flags & WORKER_NOT_CM) && !(worker->flags & WORKER_NOT_CM)) {
+ pool->nr_cm_workers--;
+ if (wakeup && !pool->nr_cm_workers &&
!list_empty(&pool->worklist))
wake_up_worker(pool);
}
@@ -874,11 +873,11 @@ static inline void worker_set_flags(struct worker *worker, unsigned int flags,
}

/**
- * worker_clr_flags - clear worker flags and adjust nr_running accordingly
+ * worker_clr_flags - clear worker flags and adjust nr_cm_workers accordingly
* @worker: self
* @flags: flags to clear
*
- * Clear @flags in @worker->flags and adjust nr_running accordingly.
+ * Clear @flags in @worker->flags and adjust nr_cm_workers accordingly.
*
* CONTEXT:
* spin_lock_irq(pool->lock)
@@ -893,13 +892,13 @@ static inline void worker_clr_flags(struct worker *worker, unsigned int flags)
worker->flags &= ~flags;

/*
- * If transitioning out of NOT_RUNNING, increment nr_running. Note
- * that the nested NOT_RUNNING is not a noop. NOT_RUNNING is mask
+ * If transitioning out of NOT_CM, increment nr_cm_workers. Note
+ * that the nested NOT_CM is not a noop. NOT_CM is mask
* of multiple flags, not a single flag.
*/
- if ((flags & WORKER_NOT_RUNNING) && (oflags & WORKER_NOT_RUNNING))
- if (!(worker->flags & WORKER_NOT_RUNNING))
- pool->nr_running++;
+ if ((flags & WORKER_NOT_CM) && (oflags & WORKER_NOT_CM))
+ if (!(worker->flags & WORKER_NOT_CM))
+ pool->nr_cm_workers++;
}

/**
@@ -1237,7 +1236,7 @@ static void insert_work(struct pool_workqueue *pwq, struct work_struct *work,

/*
* Ensure either wq_worker_sleeping() sees the above
- * list_add_tail() or we see zero nr_running to avoid workers lying
+ * list_add_tail() or we see zero nr_cm_workers to avoid workers lying
* around lazily while there are works to be processed.
*/
smp_mb();
@@ -1537,8 +1536,8 @@ static void worker_enter_idle(struct worker *worker)
if (too_many_workers(pool) && !timer_pending(&pool->idle_timer))
mod_timer(&pool->idle_timer, jiffies + IDLE_WORKER_TIMEOUT);

- /* Sanity check nr_running. */
- WARN_ON_ONCE(pool->nr_workers == pool->nr_idle && pool->nr_running);
+ /* Sanity check nr_cm_workers. */
+ WARN_ON_ONCE(pool->nr_workers == pool->nr_idle && pool->nr_cm_workers);
}

/**
@@ -2342,7 +2341,7 @@ repeat:
spin_unlock_irq(&wq_mayday_lock);

/* rescuers should never participate in concurrency management */
- WARN_ON_ONCE(!(rescuer->flags & WORKER_NOT_RUNNING));
+ WARN_ON_ONCE(!(rescuer->flags & WORKER_NOT_CM));
schedule();
goto repeat;
}
@@ -4352,13 +4351,13 @@ static void wq_unbind_fn(struct work_struct *work)
pool->flags |= POOL_DISASSOCIATED;

/*
- * Zap nr_running. After this, nr_running stays zero
+ * Zap nr_cm_workers. After this, nr_cm_workers stays zero
* and need_more_worker() and keep_working() are always true
* as long as the worklist is not empty. This pool now
* behaves as an unbound (in terms of concurrency management)
* pool which are served by workers tied to the pool.
*/
- pool->nr_running = 0;
+ pool->nr_cm_workers = 0;

/*
* With concurrency management just turned off, a busy
@@ -4413,8 +4412,8 @@ static void rebind_workers(struct worker_pool *pool)

/*
* We want to clear UNBOUND but can't directly call
- * worker_clr_flags() or adjust nr_running. Atomically
- * replace UNBOUND with another NOT_RUNNING flag REBOUND.
+ * worker_clr_flags() or adjust nr_cm_workers. Atomically
+ * replace UNBOUND with another NOT_CM flag REBOUND.
* @worker will clear REBOUND using worker_clr_flags() when
* it initiates the next execution cycle thus restoring
* concurrency management. Note that when or whether
@@ -4423,7 +4422,7 @@ static void rebind_workers(struct worker_pool *pool)
* Current CPU may not be the cpu of this rebinding pool, so
* ACCESS_ONCE() is necessary because @worker->flags may be
* tested without holding any lock in
- * wq_worker_sleeping(). Without it, NOT_RUNNING test may
+ * wq_worker_sleeping(). Without it, NOT_CM test may
* fail incorrectly leading to premature concurrency
* management operations.
*/
--
1.7.7.6

2013-04-18 22:10:59

by Lai Jiangshan

Subject: Re: [PATCH 0/8] workqueue: advance concurrency management

Ping.

On Mon, Apr 15, 2013 at 12:41 AM, Lai Jiangshan <[email protected]> wrote:
> I found that the early increase of nr_running in wq_worker_waking_up() is useless
> in many cases. It tries to avoid waking up idle workers when work items are pending,
> but delaying the increase of nr_running does not cause more idle workers to be woken up.
>
> So we delay the increase, remove wq_worker_waking_up(), and ...
>
> enjoy simpler concurrency management.
>
> Lai Jiangshan (8):
> workqueue: remove @cpu from wq_worker_sleeping()
> workqueue: use create_and_start_worker() in manage_workers()
> workqueue: remove cpu_intensive from process_one_work()
> workqueue: quit cm mode when sleeping
> workqueue: remove disabled wq_worker_waking_up()
> workqueue: make nr_running non-atomic
> workqueue: move worker->flags up
> workqueue: rename ->nr_running to ->nr_cm_workers
>
> kernel/sched/core.c | 6 +-
> kernel/workqueue.c | 234 +++++++++++++++---------------------------
> kernel/workqueue_internal.h | 9 +-
> 3 files changed, 89 insertions(+), 160 deletions(-)
>
> --
> 1.7.7.6
>

2013-04-19 18:12:01

by Tejun Heo

Subject: Re: [PATCH 0/8] workqueue: advance concurrency management

Hey,

On Fri, Apr 19, 2013 at 06:10:57AM +0800, Lai Jiangshan wrote:
> Ping.

Sorry, I've been at collab summit / lsf. Plus, it's a bit too late
for for-3.10 anyway. Anyways, after glancing over it, here are my
preliminary thoughts. The first one looks good but I'm not sure about
dropping nr_running adjustment. The only real benefit coming from
that is dropping a sched callback and if there's any performance /
overhead impact, I'm afraid it's gonna be negative. There are actual
benefits in using as few tasks as possible - the cache footprint gets
smaller, so unless there's a clear indication that the suggested
behavior is better in some way, I'm not sure what we're buying with
the proposed changes.

Thanks.

--
tejun

2013-04-20 16:02:28

by Lai Jiangshan

Subject: Re: [PATCH 0/8] workqueue: advance concurrency management

On Sat, Apr 20, 2013 at 2:11 AM, Tejun Heo <[email protected]> wrote:
> Hey,
>
> On Fri, Apr 19, 2013 at 06:10:57AM +0800, Lai Jiangshan wrote:
>> Ping.
>
> Sorry, I've been at collab summit / lsf. Plus, it's a bit too late
> for for-3.10 anyway. Anyways, after glancing over it, here are my
> preliminary thoughts. The first one looks good but I'm not sure about
> dropping nr_running adjustment. The only real benefit coming from
> that is dropping a sched callback and if there's any performance /
> overhead impact, I'm afraid it's gonna be negative. There are actual
> benefits in using as few tasks as possible -

The waking_up() callback doesn't gain us much here.


> the cache footprint gets smaller,

The cache footprint is also reduced, in a different way, by this patchset,
and atomic memory operations are reduced.

> so unless there's a clear indication that the suggested

Only simplicity,
and removing an optimization that matters only in rare cases.

> behavior is better in some way, I'm not sure what we're buying with
> the proposed changes.
>
> Thanks.
>
> --
> tejun