These small patches change rebind_workers() a little.
Patches 1 and 5 fix possible bugs.
Patches 1 and 2 make idle_worker_rebind() use manager_mutex to wait for
rebind_workers() to finish and to simplify clearing WORKER_REBIND.
Patches 3 and 4 make rebind_workers() a single pass and clean up the code.
Patch 5 uses a single write instruction to avoid other CPUs seeing wrong flags.
Patches 6 and 7 are small fixes.
Lai Jiangshan (7):
wait on manager_mutex instead of rebind_hold
simply clear WORKER_REBIND
explicit way to wait for idle workers to finish
single pass rebind
ensure wq_worker_sleeping() sees the right flags
init 0
static idle_rebind
kernel/workqueue.c | 81 ++++++++++++++++++++-------------------------------
1 files changed, 32 insertions(+), 49 deletions(-)
--
1.7.4.4
Current idle_worker_rebind() has a bug.

Worker thread:                                 The second CPU_ONLINE thread:

idle_worker_rebind()
  wait_event(gcwq->rebind_hold)
    test for WORKER_REBIND and fails
    sleeping...

    # the first CPU_ONLINE is woken up,
    # finishes its later work and is gone.

    this thread is also woken up, but it is
    not scheduled yet, it is still
    sleeping...

    # the cpu goes offline again
    # the cpu comes online again; the online
    # code does notify(CPU_ONLINE) and calls
    # rebind_workers().  I call this the
    # second CPU_ONLINE thread, see the
    # right column.
                                               sets WORKER_REBIND on the
                                               idle workers
                                               .
                                               .
    this thread is finally scheduled,          .
    sees WORKER_REBIND is not cleared,         .
    goes to sleep again waiting for (another)  .
    rebind_workers() to wake it up.   <--bug-> waits for the idle workers' ACK.

The two threads wait for each other. It is a deadlock.
The fix:
idle_worker_rebind() no longer waits on rebind_hold; it waits on
manager_mutex instead. When mutex_lock(manager_mutex) returns, the idle
worker knows that the corresponding rebind_workers() has finished up, so
idle_worker_rebind() can return.

This fix has an advantage: WORKER_REBIND is no longer used for wait_event(),
so we can clear it in idle_worker_rebind() (next patch).
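
For illustration only (not part of the patch): a minimal userspace sketch of
the same lock/unlock-as-a-barrier idiom, with a pthread mutex standing in for
pool->manager_mutex. All names here are made up for the example; build with
gcc -pthread.

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t manager = PTHREAD_MUTEX_INITIALIZER;

    static void *idle_worker(void *arg)
    {
        /* ... do our part of the rebind ... */

        /*
         * Wait for the manager to finish up: the mutex can only be
         * acquired once the manager has released it, i.e. is done.
         */
        pthread_mutex_lock(&manager);
        pthread_mutex_unlock(&manager);

        printf("idle worker: manager finished, returning\n");
        return NULL;
    }

    int main(void)
    {
        pthread_t t;

        pthread_mutex_lock(&manager);    /* "manager" starts its work */
        pthread_create(&t, NULL, idle_worker, NULL);

        /* ... rebind workers, finish up ... */

        pthread_mutex_unlock(&manager);  /* done: workers may proceed */
        pthread_join(t, NULL);
        return 0;
    }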
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 13 +++----------
1 files changed, 3 insertions(+), 10 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 692d976..5872c31 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -184,8 +184,6 @@ struct global_cwq {
/* L: hash of busy workers */
struct worker_pool pools[2]; /* normal and highpri pools */
-
- wait_queue_head_t rebind_hold; /* rebind hold wait */
} ____cacheline_aligned_in_smp;
/*
@@ -1316,8 +1314,6 @@ struct idle_rebind {
*/
static void idle_worker_rebind(struct worker *worker)
{
- struct global_cwq *gcwq = worker->pool->gcwq;
-
/* CPU must be online at this point */
WARN_ON(!worker_maybe_bind_and_lock(worker));
if (!--worker->idle_rebind->cnt)
@@ -1325,7 +1321,8 @@ static void idle_worker_rebind(struct worker *worker)
spin_unlock_irq(&worker->pool->gcwq->lock);
/* we did our part, wait for rebind_workers() to finish up */
- wait_event(gcwq->rebind_hold, !(worker->flags & WORKER_REBIND));
+ mutex_lock(&worker->pool->manager_mutex);
+ mutex_unlock(&worker->pool->manager_mutex);
}
/*
@@ -1386,7 +1383,7 @@ static void rebind_workers(struct global_cwq *gcwq)
/*
* Rebind idle workers. Interlocked both ways. We wait for
* workers to rebind via @idle_rebind.done. Workers will wait for
- * us to finish up by watching %WORKER_REBIND.
+ * us to finish up by competing on pool->manager_mutex.
*/
init_completion(&idle_rebind.done);
retry:
@@ -1429,8 +1426,6 @@ retry:
list_for_each_entry(worker, &pool->idle_list, entry)
worker->flags &= ~WORKER_REBIND;
- wake_up_all(&gcwq->rebind_hold);
-
/* rebind busy workers */
for_each_busy_worker(worker, i, pos, gcwq) {
struct work_struct *rebind_work = &worker->rebind_work;
@@ -3722,8 +3717,6 @@ static int __init init_workqueues(void)
mutex_init(&pool->manager_mutex);
ida_init(&pool->worker_ida);
}
-
- init_waitqueue_head(&gcwq->rebind_hold);
}
/* create the initial worker */
--
1.7.4.4
The compiler may compile this code into TWO write/modify instructions:

	worker->flags &= ~WORKER_UNBOUND;
	worker->flags |= WORKER_REBIND;

so the other CPU may see an intermediate value of worker->flags which has
neither WORKER_UNBOUND nor WORKER_REBIND set, and it will wrongly do a local
wake up.

So we use one explicit write/modify instruction instead.

This bug will not occur for idle workers, because they have another
WORKER_NOT_RUNNING flag set.
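
For illustration only (not the kernel code): a userspace rendering of the
same single-store idea using C11 atomics in place of ACCESS_ONCE(). The flag
values and the reader function are simplified stand-ins invented for this
sketch.

    #include <stdatomic.h>
    #include <stdbool.h>

    #define WORKER_UNBOUND      (1u << 0)
    #define WORKER_REBIND       (1u << 1)
    /* simplified: the real WORKER_NOT_RUNNING covers more flags */
    #define WORKER_NOT_RUNNING  (WORKER_UNBOUND | WORKER_REBIND)

    static _Atomic unsigned int worker_flags = WORKER_UNBOUND;

    /* writer: morph UNBOUND to REBIND with a single visible store */
    static void morph_unbound_to_rebind(void)
    {
        unsigned int flags = atomic_load(&worker_flags);

        flags &= ~WORKER_UNBOUND;
        flags |= WORKER_REBIND;
        atomic_store(&worker_flags, flags);  /* one write, never "neither" */
    }

    /* reader: stands in for wq_worker_sleeping()'s NOT_RUNNING check */
    static bool worker_is_running(void)
    {
        return !(atomic_load(&worker_flags) & WORKER_NOT_RUNNING);
    }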
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 96485c0..ed23c9a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1415,10 +1415,13 @@ static void rebind_workers(struct global_cwq *gcwq)
/* rebind busy workers */
for_each_busy_worker(worker, i, pos, gcwq) {
struct work_struct *rebind_work = &worker->rebind_work;
+ unsigned long worker_flags = worker->flags;
/* morph UNBOUND to REBIND */
- worker->flags &= ~WORKER_UNBOUND;
- worker->flags |= WORKER_REBIND;
+ worker_flags &= ~WORKER_UNBOUND;
+ worker_flags |= WORKER_REBIND;
+ /* ensure wq_worker_sleeping() sees the right flags */
+ ACCESS_ONCE(worker->flags) = worker_flags;
if (test_and_set_bit(WORK_STRUCT_PENDING_BIT,
work_data_bits(rebind_work)))
--
1.7.4.4
busy_worker_rebind_fn() can't return until all idle workers are rebound;
the code of busy_worker_rebind_fn() ensures this.

So we can change the order of the code in rebind_workers() and make
rebind_workers() a single pass.

This makes the code much cleaner and more readable.
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 19 +++++++------------
1 files changed, 7 insertions(+), 12 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f6e4394..96485c0 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1394,16 +1394,12 @@ static void rebind_workers(struct global_cwq *gcwq)
* us to finish up by competing on pool->manager_mutex.
*/
init_completion(&idle_rebind.done);
-retry:
idle_rebind.cnt = 1;
INIT_COMPLETION(idle_rebind.done);
/* set REBIND and kick idle ones, we'll wait for these later */
for_each_worker_pool(pool, gcwq) {
list_for_each_entry(worker, &pool->idle_list, entry) {
- if (!(worker->flags & WORKER_UNBOUND))
- continue;
-
/* morph UNBOUND to REBIND */
worker->flags &= ~WORKER_UNBOUND;
worker->flags |= WORKER_REBIND;
@@ -1416,14 +1412,6 @@ retry:
}
}
- if (--idle_rebind.cnt) {
- spin_unlock_irq(&gcwq->lock);
- wait_for_completion(&idle_rebind.done);
- spin_lock_irq(&gcwq->lock);
- /* busy ones might have become idle while waiting, retry */
- goto retry;
- }
-
/* rebind busy workers */
for_each_busy_worker(worker, i, pos, gcwq) {
struct work_struct *rebind_work = &worker->rebind_work;
@@ -1442,6 +1430,13 @@ retry:
worker->scheduled.next,
work_color_to_flags(WORK_NO_COLOR));
}
+
+ /* waiting for all idle workers to be rebound */
+ if (--idle_rebind.cnt) {
+ spin_unlock_irq(&gcwq->lock);
+ wait_for_completion(&idle_rebind.done);
+ spin_lock_irq(&gcwq->lock);
+ }
}
static struct worker *alloc_worker(void)
--
1.7.4.4
busy_worker_rebind_fn() can't return until all idle workers are rebound.
Currently this ordering is ensured by rebind_workers().

Instead, use mutex_lock(&worker->pool->manager_mutex) to wait for all idle
workers to be rebound. This is an explicit way to wait and it will ease the
pain of rebind_workers().

The sleeping mutex_lock(&worker->pool->manager_mutex) must be placed at the
top of busy_worker_rebind_fn(), because the busy worker thread can sleep
before WORKER_REBIND is cleared, but can't sleep after WORKER_REBIND is
cleared.

It adds a small overhead to this unlikely path.
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 90ada8f..f6e4394 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1337,6 +1337,13 @@ static void busy_worker_rebind_fn(struct work_struct *work)
struct worker *worker = container_of(work, struct worker, rebind_work);
struct global_cwq *gcwq = worker->pool->gcwq;
+ /*
+ * Waiting all idle workers are rebound by competing on
+ * pool->manager_mutex and waiting for rebind_workers() to finish up.
+ */
+ mutex_lock(&worker->pool->manager_mutex);
+ mutex_unlock(&worker->pool->manager_mutex);
+
if (worker_maybe_bind_and_lock(worker))
worker_clr_flags(worker, WORKER_REBIND);
--
1.7.4.4
rebind_workers() is protected by the cpu_hotplug lock,
so struct idle_rebind is also protected by it.

So we can use a compile-time allocated idle_rebind instead of allocating it
on the stack. It makes the code cleaner.
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 28 +++++++++++-----------------
1 files changed, 11 insertions(+), 17 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9f38a65..a9bdf9c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -125,7 +125,6 @@ enum {
struct global_cwq;
struct worker_pool;
-struct idle_rebind;
/*
* The poor guys doing the actual heavy lifting. All on-duty workers
@@ -149,7 +148,6 @@ struct worker {
int id; /* I: worker id */
/* for rebinding worker to CPU */
- struct idle_rebind *idle_rebind; /* L: for idle worker */
struct work_struct rebind_work; /* L: for busy worker */
};
@@ -1302,10 +1300,8 @@ __acquires(&gcwq->lock)
}
}
-struct idle_rebind {
- int cnt; /* # workers to be rebound */
- struct completion done; /* all workers rebound */
-};
+static int idle_rebind_cnt; /* # workers to be rebound */
+static struct completion idle_rebind_done; /* all workers rebound */
/*
* Rebind an idle @worker to its CPU. During CPU onlining, this has to
@@ -1317,8 +1313,8 @@ static void idle_worker_rebind(struct worker *worker)
/* CPU must be online at this point */
WARN_ON(!worker_maybe_bind_and_lock(worker));
worker_clr_flags(worker, WORKER_REBIND);
- if (!--worker->idle_rebind->cnt)
- complete(&worker->idle_rebind->done);
+ if (!--idle_rebind_cnt)
+ complete(&idle_rebind_done);
spin_unlock_irq(&worker->pool->gcwq->lock);
/* we did our part, wait for rebind_workers() to finish up */
@@ -1377,7 +1373,6 @@ static void busy_worker_rebind_fn(struct work_struct *work)
static void rebind_workers(struct global_cwq *gcwq)
__releases(&gcwq->lock) __acquires(&gcwq->lock)
{
- struct idle_rebind idle_rebind;
struct worker_pool *pool;
struct worker *worker;
struct hlist_node *pos;
@@ -1390,12 +1385,12 @@ static void rebind_workers(struct global_cwq *gcwq)
/*
* Rebind idle workers. Interlocked both ways. We wait for
- * workers to rebind via @idle_rebind.done. Workers will wait for
+ * workers to rebind via @idle_rebind_done. Workers will wait for
* us to finish up by competing on pool->manager_mutex.
*/
- init_completion(&idle_rebind.done);
- idle_rebind.cnt = 0;
- INIT_COMPLETION(idle_rebind.done);
+ init_completion(&idle_rebind_done);
+ idle_rebind_cnt = 0;
+ INIT_COMPLETION(idle_rebind_done);
/* set REBIND and kick idle ones, we'll wait for these later */
for_each_worker_pool(pool, gcwq) {
@@ -1404,8 +1399,7 @@ static void rebind_workers(struct global_cwq *gcwq)
worker->flags &= ~WORKER_UNBOUND;
worker->flags |= WORKER_REBIND;
- idle_rebind.cnt++;
- worker->idle_rebind = &idle_rebind;
+ idle_rebind_cnt++;
/* worker_thread() will call idle_worker_rebind() */
wake_up_process(worker->task);
@@ -1435,9 +1429,9 @@ static void rebind_workers(struct global_cwq *gcwq)
}
/* waiting for all idle workers to be rebound */
- if (idle_rebind.cnt) {
+ if (idle_rebind_cnt) {
spin_unlock_irq(&gcwq->lock);
- wait_for_completion(&idle_rebind.done);
+ wait_for_completion(&idle_rebind_done);
spin_lock_irq(&gcwq->lock);
}
}
--
1.7.4.4
Access to idle_rebind.cnt is always protected by gcwq->lock,
so there is no need to init it to 1.
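
Purely to illustrate this argument (not kernel code): a userspace sketch
where a pthread mutex and condition variable stand in for gcwq->lock and the
completion. Because the count is built up entirely while the lock is held, no
worker can decrement it before it is final, so it can start at 0. All names
and numbers are made up for the example; build with gcc -pthread.

    #include <pthread.h>

    #define NWORKERS 4

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  done = PTHREAD_COND_INITIALIZER;
    static int cnt;                     /* protected by lock */

    static void *worker(void *arg)
    {
        /* ... rebind ... */
        pthread_mutex_lock(&lock);
        if (!--cnt)
            pthread_cond_signal(&done); /* last one signals completion */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NWORKERS];
        int i;

        pthread_mutex_lock(&lock);
        for (i = 0; i < NWORKERS; i++) {
            cnt++;                      /* counted while holding the lock */
            pthread_create(&t[i], NULL, worker, NULL);
        }
        /*
         * Workers can't touch cnt until we drop the lock (inside
         * pthread_cond_wait below), so the count is final before
         * anyone can reach zero.
         */
        while (cnt)
            pthread_cond_wait(&done, &lock);
        pthread_mutex_unlock(&lock);

        for (i = 0; i < NWORKERS; i++)
            pthread_join(t[i], NULL);
        return 0;
    }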
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ed23c9a..9f38a65 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1394,7 +1394,7 @@ static void rebind_workers(struct global_cwq *gcwq)
* us to finish up by competing on pool->manager_mutex.
*/
init_completion(&idle_rebind.done);
- idle_rebind.cnt = 1;
+ idle_rebind.cnt = 0;
INIT_COMPLETION(idle_rebind.done);
/* set REBIND and kick idle ones, we'll wait for these later */
@@ -1435,7 +1435,7 @@ static void rebind_workers(struct global_cwq *gcwq)
}
/* waiting for all idle workers to be rebound */
- if (--idle_rebind.cnt) {
+ if (idle_rebind.cnt) {
spin_unlock_irq(&gcwq->lock);
wait_for_completion(&idle_rebind.done);
spin_lock_irq(&gcwq->lock);
--
1.7.4.4
WORKER_REBIND is not used for any other purpose,
so idle_worker_rebind() can clear it directly.
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 13 ++-----------
1 files changed, 2 insertions(+), 11 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 5872c31..90ada8f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1316,6 +1316,7 @@ static void idle_worker_rebind(struct worker *worker)
{
/* CPU must be online at this point */
WARN_ON(!worker_maybe_bind_and_lock(worker));
+ worker_clr_flags(worker, WORKER_REBIND);
if (!--worker->idle_rebind->cnt)
complete(&worker->idle_rebind->done);
spin_unlock_irq(&worker->pool->gcwq->lock);
@@ -1393,7 +1394,7 @@ retry:
/* set REBIND and kick idle ones, we'll wait for these later */
for_each_worker_pool(pool, gcwq) {
list_for_each_entry(worker, &pool->idle_list, entry) {
- if (worker->flags & WORKER_REBIND)
+ if (!(worker->flags & WORKER_UNBOUND))
continue;
/* morph UNBOUND to REBIND */
@@ -1416,16 +1417,6 @@ retry:
goto retry;
}
- /*
- * All idle workers are rebound and waiting for %WORKER_REBIND to
- * be cleared inside idle_worker_rebind(). Clear and release.
- * Clearing %WORKER_REBIND from this foreign context is safe
- * because these workers are still guaranteed to be idle.
- */
- for_each_worker_pool(pool, gcwq)
- list_for_each_entry(worker, &pool->idle_list, entry)
- worker->flags &= ~WORKER_REBIND;
-
/* rebind busy workers */
for_each_busy_worker(worker, i, pos, gcwq) {
struct work_struct *rebind_work = &worker->rebind_work;
--
1.7.4.4
Hello, Lai.
On Tue, Aug 28, 2012 at 01:58:21AM +0800, Lai Jiangshan wrote:
>     this thread is finally scheduled,          .
>     sees WORKER_REBIND is not cleared,         .
>     goes to sleep again waiting for (another)  .
>     rebind_workers() to wake it up.   <--bug-> waits for the idle workers' ACK.
>
> The two threads wait for each other. It is a deadlock.
Ooh, nice catch. Has this actually happened in the wild? Can you
reproduce it?
> The fix:
> idle_worker_rebind() no longer waits on rebind_hold; it waits on
> manager_mutex instead. When mutex_lock(manager_mutex) returns, the idle
> worker knows that the corresponding rebind_workers() has finished up, so
> idle_worker_rebind() can return.
Hmm... I can't really see how this change makes any difference tho.
While the exact wait condition changes, the mechanics doesn't.
  IDLE WORKER                             HOTPLUG

                                          - online
  - rebind and kicks completion
  - wait on manager_mutex
                                          - finishes onlining
                                          - offline
                                          - online again, re-enters
                                            rebind_workers()
  - wakes up but blocked on
    manager_mutex
So, we're just deadlocked on a different thing. It seems what we need
is another interlocked step to make hotplug onlining wait for the idle
workers to finish the last step, no?
Thanks.
--
tejun
Hello, Lai.
On Tue, Aug 28, 2012 at 01:58:24AM +0800, Lai Jiangshan wrote:
> busy_worker_rebind_fn() can't return until all idle workers are rebound,
> the code of busy_worker_rebind_fn() ensure this.
>
> So we can change the order of the code of rebind_workers(),
> and make it a single pass do the rebind_workers().
>
> It makes the code much clean and better readability.
I can't see how this could be correct. What prevents busy worker from
grabbing manager_mutex before idle one?
Thanks.
--
tejun
Hello,
On Tue, Aug 28, 2012 at 01:58:25AM +0800, Lai Jiangshan wrote:
> The compiler may compile this code into TWO write/modify instructions.
> worker->flags &= ~WORKER_UNBOUND;
> worker->flags |= WORKER_REBIND;
>
> so the other CPU may see the temporary of worker->flags which has
> not WORKER_UNBOUND nor WORKER_REBIND, it will wrongly do local wake up.
>
> so we use one write/modify instruction explicitly instead.
>
> This bug will not occur on idle workers, because they have another
> WORKER_NOT_RUNNING flags.
Yeap, I think this got broken while moving around when DISASSOCIATED
is cleared. Can you please put this at the head of the series?
Thanks.
--
tejun
On Tue, Aug 28, 2012 at 01:58:26AM +0800, Lai Jiangshan wrote:
> Access idle_rebind.cnt is always protected by gcwq->lock,
> don't need to init it as 1.
But then the completion could be triggered prematurely, no?
Thanks.
--
tejun
Hello, Lai.
On Tue, Aug 28, 2012 at 01:58:27AM +0800, Lai Jiangshan wrote:
> rebind_workers() is protected by cpu_hotplug lock,
> so struct idle_rebind is also proteced by it.
>
> And we can use a compile time allocated idle_rebind instead
> of allocating it from the stack. it makes code clean.
I don't know. While it seems correct, depending on outer locking
makes things more fragile and the dependency is implicit. I'd rather
keep it separate.
Thanks.
--
tejun
On Tue, Aug 28, 2012 at 4:05 AM, Tejun Heo <[email protected]> wrote:
> On Tue, Aug 28, 2012 at 01:58:26AM +0800, Lai Jiangshan wrote:
>> Access idle_rebind.cnt is always protected by gcwq->lock,
>> don't need to init it as 1.
>
> But then the completion could be triggered prematurely, no?
No, idle_worker_rebind() can't be called until rebind_workers()
finishes counting and releases the gcwq->lock.
>
> Thanks.
>
> --
> tejun
Hello,
On Mon, Aug 27, 2012 at 9:36 PM, Lai Jiangshan <[email protected]> wrote:
>> But then the completion could be triggered prematurely, no?
>
> No, idle_worker_rebind() can't be called until rebind_workers()
> finishes counting and releases the gcwq->lock.
I see but I still wanna keep it counting from 1. That's an
established pattern and is more robust. I really don't see what we're
gaining by doing it differently.
Thanks.
--
tejun
On 08/28/2012 03:04 AM, Tejun Heo wrote:
> Hello, Lai.
>
> On Tue, Aug 28, 2012 at 01:58:24AM +0800, Lai Jiangshan wrote:
>> busy_worker_rebind_fn() can't return until all idle workers are rebound,
>> the code of busy_worker_rebind_fn() ensure this.
>>
>> So we can change the order of the code of rebind_workers(),
>> and make it a single pass do the rebind_workers().
>>
>> It makes the code much clean and better readability.
>
> I can't see how this could be correct.
Could you elaborate more on why it is not correct?
> What prevents busy worker from
> grabbing manager_mutex before idle one?
>
The busy worker still has WORKER_REBIND set when it enters
busy_worker_rebind_fn(), so it can sleep (including grabbing the
manager_mutex). When this mutex_lock() returns, it means rebind_workers() is
done, which means all idle workers are rebound and can be locally woken up.
So the busy worker can return from busy_worker_rebind_fn() and handle other
possibly-sleeping items.

If the CPU goes offline again, the busy worker has WORKER_UNBOUND set, which
is also correct when it returns from busy_worker_rebind_fn().