2014-07-26 03:04:07

by Lai Jiangshan

Subject: [PATCH 0/3] workqueue: offload the worker-management out from kworker

Currently, a kworker prefers creating a worker (when one is required) to
processing work items; we want processing to be the first priority.

The managers' jobs are serialized anyway, so having multiple managers is
just wasteful; a single worker-creater is enough.

Implementing the manager inside the worker causes much complication and
trickiness; using a dedicated creater makes things more flexible.

So we offload the worker management from the kworkers into a single
dedicated creater kthread. This is done in patch 2; patch 1 is a
preparation patch and patch 3 is a cleanup patch.

Lai Jiangshan (3):
workqueue: migrate the new worker before adding it to idle_list
workqueue: use dedicated creater kthread for all pools
workqueue: cleanup may_start_working()

kernel/workqueue.c | 228 ++++++++++++++++++++++------------------------------
1 files changed, 96 insertions(+), 132 deletions(-)

--
1.7.4.4


2014-07-26 03:04:30

by Lai Jiangshan

Subject: [PATCH 2/3] workqueue: use dedicated creater kthread for all pools

There are some problems with the managers:
1) The last idle worker prefers managing to processing.
It is better for the processing of work items to be the first
priority so that the whole system makes progress earlier.
2) Managers of different pools can run in parallel, but their major
work is actually serialized on the kernel thread "kthreadd".
These managers sleep and are wasted when the system is short
of memory.

This patch introduces a dedicated creater kthread which offloads the
managing from the workers, so every worker puts its effort into processing
work rather than creating workers, and no manager is wasted sleeping
when the system is short of memory. The dedicated creater kthread
serializes a little more work than before, but that is acceptable.

Implementation details:
1) the creater_work: does the creation and handles creation exclusion
2) the cooldown timer: cools down after a failed creation
3) the mayday timer: dismisses itself when pool->nr_idle > 0
4) the semantics of "pool->nr_idle == 0": the creater_work and the
cooldown/mayday timers are active
5) the routine of the creater_work: creates a worker under two conditions
6) worker_thread(): starts creation without unlock/wait/recheck/retry
7) put_unbound_pool(): groups destruction code by functionality
8) init_workqueues(): initializes the creater kthread before pools/workers

1) the creater_work
Every pool has a struct kthread_work creater_work for creating workers,
and the dedicated creater kthread processes the creater_works of
all pools. A kthread_work has its own execution exclusion, so we no
longer need the manager_arb to handle the creation exclusion.
put_unbound_pool() uses flush_kthread_work() rather than the manager_arb
to synchronize with any in-flight creation.
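
For reference, a minimal sketch of the kthread_worker API this relies on
(the pre-4.9 names used by this patch; pool_create_worker() is only a
stub here, the real routine appears in the diff below):

    #include <linux/kthread.h>

    static DEFINE_KTHREAD_WORKER(kworker_creater);

    /* kthread_worker_fn() runs at most one kthread_work at a time,
     * so two invocations of the same work never overlap: this is
     * the execution exclusion that replaces manager_arb */
    static void pool_create_worker(struct kthread_work *work)
    {
    }

    init_kthread_work(&pool->creater_work, pool_create_worker);

    /* a no-op (returns false) if the work is already pending */
    queue_kthread_work(&kworker_creater, &pool->creater_work);

    /* wait until a queued or running invocation has finished */
    flush_kthread_work(&pool->creater_work);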

2) the cooldown timer
The cooldown timer is introduced to implement the cool-down mechanism
without making the creater sleep. When create_worker() fails, the
cooldown timer is armed, and on expiry it restarts the creater_work.

3) the mayday timer
The mayday timer is changed so that it doesn't restart itself when
pool->nr_idle > 0. If it always restarted itself as before, we would
add a lot of complication to the creater_work: it would need to call
del_timer_sync() after a successful creation and grab pool->lock to
check and restart the timer whenever pool->nr_idle drops back to zero.
We don't want that complication; let the timer dismiss itself.

4) the semantics of "pool->nr_idle == 0"
At any moment when pool->nr_idle == 0, the pool must be in the creating
state. The mayday timer must be active (pending or running) to ensure
a mayday can be sent, and at least one of the creater_work and the
cooldown timer must be active to ensure creation is in progress or
standing by. So the last worker that causes pool->nr_idle to drop to 0
is responsible for kicking the mayday timer and the creater_work.
may_start_working() becomes misleading because the meaning of
"!pool->nr_idle" has changed and a worker should start working
regardless of the value of pool->nr_idle. may_start_working() is
cleaned up in the next patch so that this patch stays shorter for
review.
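
As a partial illustration (not part of the patch), the timer half of this
invariant could be checked under pool->lock; note that timer_pending()
only covers the "pending" half of "active", so a concurrently running
timer would trip the warning spuriously:

    /* whenever the pool runs out of idle workers, the mayday timer
     * and the creation machinery must already have been kicked */
    if (!pool->nr_idle)
        WARN_ON_ONCE(!timer_pending(&pool->mayday_timer));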

5) the routine of the creater_work
The creater_work creates a worker under these two conditions:
A) pool->nr_idle == 0
A new worker obviously needs to be created even if no work item
is pending. The busy workers may sleep, and the pool cannot
serve future work items if no new worker is standing by or
being created.
B) pool->nr_idle == 1 && pool->nr_running == 0
Creating a worker here is not strictly needed, since we still have
a free idle worker which can restart the creation when it goes
busy. But if we don't create one, this condition can recur
frequently: a work item is queued and the last idle worker kicks
the creater_work, but the work item finishes before the
creater_work runs, so the condition arises again, and it can
repeat this way again and again. So we had better create a
worker in this condition.
The creater_work quits directly in all other conditions.
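
To make condition B concrete, here is the check from the creater_work
routine below, together with an illustrative timeline of the ping-pong
it avoids:

    /*
     * Without condition B:
     *  1. a work item is queued; the last idle worker takes it,
     *     pool->nr_idle drops to 0 and the creater_work is kicked;
     *  2. the work item finishes before the creater_work runs, so
     *     nr_idle is back to 1 and nothing would be created;
     *  3. the next work item repeats steps 1-2, kicking the
     *     creater_work again and again to no effect.
     */
    if (nr_idle == 0 || (nr_idle == 1 && !atomic_read(&pool->nr_running)))
        create_worker(pool);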

6) worker_thread()
There is no recheck or retry when creation is required in
worker_thread(); it just kicks the mayday timer and the creater_work
and goes on to process the work items directly.

7) put_unbound_pool()
put_unbound_pool() groups code by functionality rather than by name: it
first stops the creation activity (creater_work, cooldown_timer,
mayday_timer) and then destroys the idle workers and stops the
idle_timer.

8) init_workqueues()
The struct kthread_worker kworker_creater is initialized before the
worker_pools in init_workqueues() so that kworker_creater_thread is
created before all the early kworkers. The early kworkers do not
depend on kworker_creater_thread, but this initialization order gives
kworker_creater_thread a smaller pid than the kworkers, which looks
more natural.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 190 ++++++++++++++++++++++------------------------------
1 files changed, 80 insertions(+), 110 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1d44d8d..ce8e3fc 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -152,17 +152,17 @@ struct worker_pool {
struct list_head idle_list; /* X: list of idle workers */
struct timer_list idle_timer; /* L: worker idle timeout */
struct timer_list mayday_timer; /* L: SOS timer for workers */
+ struct timer_list cooldown_timer; /* L: cool down before retry */

- /* a workers is either on busy_hash or idle_list, or the manager */
+ /* a worker is either on busy_hash or idle_list */
DECLARE_HASHTABLE(busy_hash, BUSY_WORKER_HASH_ORDER);
/* L: hash of busy workers */

- /* see manage_workers() for details on the two manager mutexes */
- struct mutex manager_arb; /* manager arbitration */
struct mutex attach_mutex; /* attach/detach exclusion */
struct list_head workers; /* A: attached workers */
struct completion *detach_completion; /* all workers detached */

+ struct kthread_work creater_work; /* L: work for creating a new worker */
struct ida worker_ida; /* worker IDs for task name */

struct workqueue_attrs *attrs; /* I: worker attributes */
@@ -291,6 +291,10 @@ static DEFINE_SPINLOCK(wq_mayday_lock); /* protects wq->maydays list */
static LIST_HEAD(workqueues); /* PL: list of all workqueues */
static bool workqueue_freezing; /* PL: have wqs started freezing? */

+/* kthread worker for creating workers for pools */
+static DEFINE_KTHREAD_WORKER(kworker_creater);
+static struct task_struct *kworker_creater_thread __read_mostly;
+
/* the per-cpu worker pools */
static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
cpu_worker_pools);
@@ -753,8 +757,7 @@ static bool need_to_create_worker(struct worker_pool *pool)
/* Do we have too many workers and should some go away? */
static bool too_many_workers(struct worker_pool *pool)
{
- bool managing = mutex_is_locked(&pool->manager_arb);
- int nr_idle = pool->nr_idle + managing; /* manager is considered idle */
+ int nr_idle = pool->nr_idle;
int nr_busy = pool->nr_workers - nr_idle;

return nr_idle > 2 && (nr_idle - 2) * MAX_IDLE_WORKERS_RATIO >= nr_busy;
@@ -1823,113 +1826,78 @@ static void pool_mayday_timeout(unsigned long __pool)
send_mayday(work);
}

+ if (!pool->nr_idle)
+ mod_timer(&pool->mayday_timer, jiffies + MAYDAY_INTERVAL);
+
spin_unlock(&pool->lock);
spin_unlock_irq(&wq_mayday_lock);
+}
+
+static void pool_cooldown_timeout(unsigned long __pool)
+{
+ struct worker_pool *pool = (void *)__pool;

- mod_timer(&pool->mayday_timer, jiffies + MAYDAY_INTERVAL);
+ spin_lock_irq(&pool->lock);
+ if (!pool->nr_idle)
+ queue_kthread_work(&kworker_creater, &pool->creater_work);
+ spin_unlock_irq(&pool->lock);
}

-/**
- * maybe_create_worker - create a new worker if necessary
- * @pool: pool to create a new worker for
- *
- * Create a new worker for @pool if necessary. @pool is guaranteed to
- * have at least one idle worker on return from this function. If
- * creating a new worker takes longer than MAYDAY_INTERVAL, mayday is
- * sent to all rescuers with works scheduled on @pool to resolve
- * possible allocation deadlock.
- *
- * On return, need_to_create_worker() is guaranteed to be %false and
- * may_start_working() %true.
- *
- * LOCKING:
- * spin_lock_irq(pool->lock) which may be released and regrabbed
- * multiple times. Does GFP_KERNEL allocations. Called only from
- * manager.
+/*
+ * Start the mayday timer and the creater_work when pool->nr_idle drops to 0.
*
- * Return:
- * %false if no action was taken and pool->lock stayed locked, %true
- * otherwise.
+ * At any moment when pool->nr_idle == 0, the mayday timer must be active
+ * (pending or running) to ensure a mayday can be sent, and at least one
+ * of the creater_work and the cooldown timer must be active to ensure
+ * that creation is in progress or standing by.
*/
-static bool maybe_create_worker(struct worker_pool *pool)
-__releases(&pool->lock)
-__acquires(&pool->lock)
+static void start_creater_work(struct worker_pool *pool)
{
- if (!need_to_create_worker(pool))
- return false;
-restart:
- spin_unlock_irq(&pool->lock);
-
/* if we don't make progress in MAYDAY_INITIAL_TIMEOUT, call for help */
mod_timer(&pool->mayday_timer, jiffies + MAYDAY_INITIAL_TIMEOUT);

- while (true) {
- if (create_worker(pool) || !need_to_create_worker(pool))
- break;
-
- schedule_timeout_interruptible(CREATE_COOLDOWN);
-
- if (!need_to_create_worker(pool))
- break;
- }
-
- del_timer_sync(&pool->mayday_timer);
- spin_lock_irq(&pool->lock);
- /*
- * This is necessary even after a new worker was just successfully
- * created as @pool->lock was dropped and the new worker might have
- * already become busy.
- */
- if (need_to_create_worker(pool))
- goto restart;
- return true;
+ queue_kthread_work(&kworker_creater, &pool->creater_work);
}

/**
- * manage_workers - manage worker pool
- * @worker: self
- *
- * Assume the manager role and manage the worker pool @worker belongs
- * to. At any given time, there can be only zero or one manager per
- * pool. The exclusion is handled automatically by this function.
- *
- * The caller can safely start processing works on false return. On
- * true return, it's guaranteed that need_to_create_worker() is false
- * and may_start_working() is true.
+ * pool_create_worker - create a new workqueue worker when needed
+ * @work: the struct kthread_work creater_work of the target pool
+ *
+ * The routine of the creater_work: create a new worker for the pool.
+ * The creater_work must be requested whenever the last idle worker
+ * goes to process a work item. It should create a worker only under
+ * these conditions when it is invoked:
+ *
+ * 1) pool->nr_idle == 0:
+ * It must create a worker, or else the pool may suffer a lengthy stall.
+ * 2) pool->nr_idle == 1 and there is no running worker:
+ * It should create a worker, though not strictly needed. This happens
+ * when a work item has just finished and the worker is available, but
+ * the creater_work would be requested again if a new work item were
+ * queued. So it is better to create a worker in this case to avoid
+ * the creater_work being requested often without doing anything.
+ * 3) any other case:
+ * No additional worker needs to be created.
*
* CONTEXT:
- * spin_lock_irq(pool->lock) which may be released and regrabbed
- * multiple times. Does GFP_KERNEL allocations.
- *
- * Return:
- * %false if the pool don't need management and the caller can safely start
- * processing works, %true indicates that the function released pool->lock
- * and reacquired it to perform some management function and that the
- * conditions that the caller verified while holding the lock before
- * calling the function might no longer be true.
+ * Might sleep. Does GFP_KERNEL allocations. Called from kthread_work.
*/
-static bool manage_workers(struct worker *worker)
+static void pool_create_worker(struct kthread_work *work)
{
- struct worker_pool *pool = worker->pool;
- bool ret = false;
+ struct worker_pool *pool = container_of(work, struct worker_pool,
+ creater_work);
+ int nr_idle;

- /*
- * Anyone who successfully grabs manager_arb wins the arbitration
- * and becomes the manager. mutex_trylock() on pool->manager_arb
- * failure while holding pool->lock reliably indicates that someone
- * else is managing the pool and the worker which failed trylock
- * can proceed to executing work items. This means that anyone
- * grabbing manager_arb is responsible for actually performing
- * manager duties. If manager_arb is grabbed and released without
- * actual management, the pool may stall indefinitely.
- */
- if (!mutex_trylock(&pool->manager_arb))
- return ret;
+ spin_lock_irq(&pool->lock);
+ nr_idle = pool->nr_idle;
+ spin_unlock_irq(&pool->lock);

- ret |= maybe_create_worker(pool);
+ if (nr_idle == 0 || (nr_idle == 1 && !atomic_read(&pool->nr_running))) {
+ if (create_worker(pool))
+ return;

- mutex_unlock(&pool->manager_arb);
- return ret;
+ mod_timer(&pool->cooldown_timer, jiffies + CREATE_COOLDOWN);
+ }
}

/**
@@ -2123,14 +2091,14 @@ woke_up:
}

worker_leave_idle(worker);
-recheck:
+
/* no more worker necessary? */
if (!need_more_worker(pool))
goto sleep;

- /* do we need to manage? */
- if (unlikely(!may_start_working(pool)) && manage_workers(worker))
- goto recheck;
+ /* do we need to create a worker? */
+ if (unlikely(!may_start_working(pool)))
+ start_creater_work(pool);

/*
* ->scheduled list can only be filled while a worker is
@@ -2141,8 +2109,8 @@ recheck:

/*
* Finish PREP stage. We're guaranteed to have at least one idle
- * worker or that someone else has already assumed the manager
- * role. This is where @worker starts participating in concurrency
+ * worker or the creater_work is issued.
+ * This is where @worker starts participating in concurrency
* management if applicable and concurrency management is restored
* after being rebound. See rebind_workers() for details.
*/
@@ -3354,11 +3322,13 @@ static int init_worker_pool(struct worker_pool *pool)

setup_timer(&pool->mayday_timer, pool_mayday_timeout,
(unsigned long)pool);
+ setup_timer(&pool->cooldown_timer, pool_cooldown_timeout,
+ (unsigned long)pool);

- mutex_init(&pool->manager_arb);
mutex_init(&pool->attach_mutex);
INIT_LIST_HEAD(&pool->workers);

+ init_kthread_work(&pool->creater_work, pool_create_worker);
ida_init(&pool->worker_ida);
INIT_HLIST_NODE(&pool->hash_node);
pool->refcnt = 1;
@@ -3410,19 +3380,20 @@ static void put_unbound_pool(struct worker_pool *pool)
idr_remove(&worker_pool_idr, pool->id);
hash_del(&pool->hash_node);

- /*
- * Become the manager and destroy all workers. Grabbing
- * manager_arb prevents @pool's workers from blocking on
- * attach_mutex.
- */
- mutex_lock(&pool->manager_arb);
+ /* wait for any unfinished creation and shut down the associated timers */
+ flush_kthread_work(&pool->creater_work);
+ del_timer_sync(&pool->cooldown_timer);
+ del_timer_sync(&pool->mayday_timer);

+ /* all workers are anchored in idle_list, destroy them all at once */
spin_lock_irq(&pool->lock);
while ((worker = first_idle_worker(pool)))
destroy_worker(worker);
WARN_ON(pool->nr_workers || pool->nr_idle);
spin_unlock_irq(&pool->lock);

+ del_timer_sync(&pool->idle_timer);
+
mutex_lock(&pool->attach_mutex);
if (!list_empty(&pool->workers))
pool->detach_completion = &detach_completion;
@@ -3431,12 +3402,6 @@ static void put_unbound_pool(struct worker_pool *pool)
if (pool->detach_completion)
wait_for_completion(pool->detach_completion);

- mutex_unlock(&pool->manager_arb);
-
- /* shut down the timers */
- del_timer_sync(&pool->idle_timer);
- del_timer_sync(&pool->mayday_timer);
-
/* sched-RCU protected to allow dereferences from get_work_pool() */
call_rcu_sched(&pool->rcu, rcu_free_pool);
}
@@ -4837,6 +4802,11 @@ static int __init init_workqueues(void)

wq_numa_init();

+ kworker_creater_thread = kthread_run(kthread_worker_fn,
+ &kworker_creater,
+ "kworker/creater");
+ BUG_ON(IS_ERR(kworker_creater_thread));
+
/* initialize CPU pools */
for_each_possible_cpu(cpu) {
struct worker_pool *pool;
--
1.7.4.4

2014-07-26 03:04:28

by Lai Jiangshan

Subject: [PATCH 3/3] workqueue: cleanup may_start_working()

The name of may_start_working() became misleading because the semantics
of "!pool->nr_idle" changed and any worker can start working regardless
of the value of pool->nr_idle.

So we remove may_start_working() and use "!pool->nr_idle" directly;
need_to_create_worker() is removed along with it.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 37 +++++++++++++------------------------
1 files changed, 13 insertions(+), 24 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ce8e3fc..e1ab4f9 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -735,12 +735,6 @@ static bool need_more_worker(struct worker_pool *pool)
return !list_empty(&pool->worklist) && __need_more_worker(pool);
}

-/* Can I start working? Called from busy but !running workers. */
-static bool may_start_working(struct worker_pool *pool)
-{
- return pool->nr_idle;
-}
-
/* Do I need to keep working? Called from currently running workers. */
static bool keep_working(struct worker_pool *pool)
{
@@ -748,12 +742,6 @@ static bool keep_working(struct worker_pool *pool)
atomic_read(&pool->nr_running) <= 1;
}

-/* Do we need a new worker? Called from manager. */
-static bool need_to_create_worker(struct worker_pool *pool)
-{
- return need_more_worker(pool) && !may_start_working(pool);
-}
-
/* Do we have too many workers and should some go away? */
static bool too_many_workers(struct worker_pool *pool)
{
@@ -1815,19 +1803,20 @@ static void pool_mayday_timeout(unsigned long __pool)
spin_lock_irq(&wq_mayday_lock); /* for wq->maydays */
spin_lock(&pool->lock);

- if (need_to_create_worker(pool)) {
- /*
- * We've been trying to create a new worker but
- * haven't been successful. We might be hitting an
- * allocation deadlock. Send distress signals to
- * rescuers.
- */
- list_for_each_entry(work, &pool->worklist, entry)
- send_mayday(work);
- }
+ if (!pool->nr_idle) {
+ if (need_more_worker(pool)) {
+ /*
+ * We've been trying to create a new worker but
+ * haven't been successful. We might be hitting an
+ * allocation deadlock. Send distress signals to
+ * rescuers.
+ */
+ list_for_each_entry(work, &pool->worklist, entry)
+ send_mayday(work);
+ }

- if (!pool->nr_idle)
mod_timer(&pool->mayday_timer, jiffies + MAYDAY_INTERVAL);
+ }

spin_unlock(&pool->lock);
spin_unlock_irq(&wq_mayday_lock);
@@ -2097,7 +2086,7 @@ woke_up:
goto sleep;

/* do we need to create a worker? */
- if (unlikely(!may_start_working(pool)))
+ if (unlikely(!pool->nr_idle))
start_creater_work(pool);

/*
--
1.7.4.4

2014-07-26 03:04:27

by Lai Jiangshan

Subject: [PATCH 1/3] workqueue: migrate the new worker before adding it to idle_list

There is an undocumented requirement for create_worker(): except for the
first call, it can only be called from an existing worker (i.e., the
manager).

The reason is that the current create_worker() first queues the new
worker to the idle_list and then wakes it up. But the new worker is not
guaranteed to have been migrated until it is woken up. Thus
wq_worker_sleeping() may see the new non-local worker in the idle_list
if this block of code is not executed on the local CPU, which would
disable wq_worker_sleeping(). An existing worker is guaranteed to run
on the local CPU when the pool is !DISASSOCIATED, so create_worker() is
currently required to be called only from an existing worker/manager.

But we are planning to allow create_worker() to be called from outside
the pool's workers, so this requirement has to be lifted. Therefore we
swap the order of the code: the new worker is woken up before it is
queued to the idle_list.

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 370f947..1d44d8d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1708,8 +1708,13 @@ static struct worker *create_worker(struct worker_pool *pool)
/* start the newly created worker */
spin_lock_irq(&pool->lock);
worker->pool->nr_workers++;
- worker_enter_idle(worker);
+ /*
+ * Wake up the worker first and only then queue it to the idle_list,
+ * so that wq_worker_sleeping() is ensured to see a properly migrated
+ * worker when it finds this worker in the idle_list.
+ */
wake_up_process(worker->task);
+ worker_enter_idle(worker);
spin_unlock_irq(&pool->lock);

return worker;
--
1.7.4.4

2014-07-28 18:55:43

by Tejun Heo

Subject: Re: [PATCH 2/3] workqueue: use dedicated creater kthread for all pools

Hello, Lai.

On Sat, Jul 26, 2014 at 11:04:50AM +0800, Lai Jiangshan wrote:
> There are some problems with the managers:
> 1) The last idle worker prefers managing to processing.
> It is better for the processing of work items to be the first
> priority so that the whole system makes progress earlier.
> 2) Managers of different pools can run in parallel, but their major
> work is actually serialized on the kernel thread "kthreadd".
> These managers sleep and are wasted when the system is short
> of memory.

It's a bit difficult to get excited about this patchset given that
this is mostly churn without many actual benefits. Sure, it
consolidates one-per-pool managers into one kthread_worker but it was
one-per-pool already. That said, I don't hate it and it may be
considered an improvement. I don't know.

> This patch introduces a dedicated creater kthread which offloads the
> managing from the workers, so every worker puts its effort into processing
> work rather than creating workers, and no manager is wasted sleeping
> when the system is short of memory. The dedicated creater kthread
> serializes a little more work than before, but that is acceptable.

I do dislike the idea of creating this sort of hard serialization
among separate worker pools. It's probably acceptable but why are we
doing this? To save one thread per pool and shed 30 lines of code
while adding 40 lines to kthread_worker?

> 1) the creater_work
> Every pool has a struct kthread_work creater_work for creating workers,
> and the dedicated creater kthread processes the creater_works of
> all pools. A kthread_work has its own execution exclusion, so we no
> longer need the manager_arb to handle the creation exclusion.

This explanation can be a bit misleading, I think. We just don't have
multiple workers trying to become managers anymore.

> put_unbound_pool() uses flush_kthread_work() rather than the manager_arb
> to synchronize with any in-flight creation.
>
> 2) the cooldown timer
> The cooldown timer is introduced to implement the cool-down mechanism
> without making the creater sleep. When create_worker() fails, the
> cooldown timer is armed, and on expiry it restarts the creater_work.

Why? Why doesn't the creator sleep?

...
> 5) the routine of the creater_work
> The creater_work creates a worker under these two conditions:
> A) pool->nr_idle == 0
> A new worker obviously needs to be created even if no work item
> is pending. The busy workers may sleep, and the pool cannot
> serve future work items if no new worker is standing by or
> being created.

This is kinda silly when the duty of worker creation is served by an
external entity. Why would a pool need any idle worker?
insert_work() may as well just queue worker creation on demand.

> 8) init_workqueues()
> The struct kthread_worker kworker_creater is initialized before the
> worker_pools in init_workqueues() so that kworker_creater_thread is
> created before all the early kworkers. The early kworkers do not
> depend on kworker_creater_thread, but this initialization order gives
> kworker_creater_thread a smaller pid than the kworkers, which looks
> more natural.

Just don't create idle workers?

Overall, I don't know. I like some aspects of it but there's nothing
really convincing about this change. It sure is different. Maybe
this is more straight-forward but I'm not sure this is worth the
changes.

Thanks.

--
tejun

2014-07-29 01:25:23

by Lai Jiangshan

Subject: Re: [PATCH 2/3] workqueue: use dedicated creater kthread for all pools

On 07/29/2014 02:55 AM, Tejun Heo wrote:
> Hello, Lai.
>
> On Sat, Jul 26, 2014 at 11:04:50AM +0800, Lai Jiangshan wrote:
>> There are some problems with the managers:
>> 1) The last idle worker prefers managing to processing.
>> It is better for the processing of work items to be the first
>> priority so that the whole system makes progress earlier.
>> 2) Managers of different pools can run in parallel, but their major
>> work is actually serialized on the kernel thread "kthreadd".
>> These managers sleep and are wasted when the system is short
>> of memory.
>
> It's a bit difficult to get excited about this patchset given that
> this is mostly churn without many actual benefits. Sure, it
> consolidates one-per-pool managers into one kthread_worker but it was
> one-per-pool already. That said, I don't hate it and it may be
> considered an improvement. I don't know.

It prefers processing work items to creating workers, without any
loss of the guarantee.

Processing work items makes direct progress for the system;
creating a worker adds delay and makes only indirect progress.

>
>> This patch introduces a dedicated creater kthread which offloads the
>> managing from the workers, so every worker puts its effort into processing
>> work rather than creating workers, and no manager is wasted sleeping
>> when the system is short of memory. The dedicated creater kthread
>> serializes a little more work than before, but that is acceptable.
>
> I do dislike the idea of creating this sort of hard serialization
> among separate worker pools. It's probably acceptable but why are we
> doing this?


I was about to use a per-cpu kthread_worker, but I switched to a single
global kthread_worker after noticing that kthread_create() is already
serialized by kthreadd.

> To save one thread per pool and shed 30 lines of code
> while adding 40 lines to kthread_worker?

We count differently! Simplicity was also my aim in this patchset,
and the total LOC was reduced.

>
>> 1) the creater_work
>> Every pool has a struct kthread_work creater_work for creating workers,
>> and the dedicated creater kthread processes the creater_works of
>> all pools. A kthread_work has its own execution exclusion, so we no
>> longer need the manager_arb to handle the creation exclusion.
>
> This explanation can be a bit misleading, I think. We just don't have
> multiple workers trying to become managers anymore.
>
>> put_unbound_pool() uses flush_kthread_work() rather than the manager_arb
>> to synchronize with any in-flight creation.
>>
>> 2) the cooldown timer
>> The cooldown timer is introduced to implement the cool-down mechanism
>> without making the creater sleep. When create_worker() fails, the
>> cooldown timer is armed, and on expiry it restarts the creater_work.
>
> Why? Why doesn't the creator sleep?
>
> ...
>> 5) the routine of the creater_work
>> The creater_work creates a worker under these two conditions:
>> A) pool->nr_idle == 0
>> A new worker obviously needs to be created even if no work item
>> is pending. The busy workers may sleep, and the pool cannot
>> serve future work items if no new worker is standing by or
>> being created.
>
> This is kinda silly when the duty of worker creation is served by an
> external entity. Why would a pool need any idle worker?

An idle worker must be ready, or being prepared, for wq_worker_sleeping()
or the chained wake-up.

A percpu kthreadd could serve wq_worker_sleeping() in this case, but it
is not a good idea to introduce a percpu kthreadd now since there is no
other user.
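
As a reminder of the constraint, a condensed sketch of the existing
wq_worker_sleeping() path (abridged from kernel/workqueue.c of that
time; the early-return checks are dropped):

    /* scheduler hook, called when a busy worker is about to sleep;
     * atomic context, so it can only wake an already-created idle
     * worker, never create one */
    struct task_struct *wq_worker_sleeping(struct task_struct *task, int cpu)
    {
        struct worker *worker = kthread_data(task), *to_wakeup = NULL;
        struct worker_pool *pool = worker->pool;

        if (atomic_dec_and_test(&pool->nr_running) &&
            !list_empty(&pool->worklist))
            to_wakeup = first_idle_worker(pool);
        return to_wakeup ? to_wakeup->task : NULL;
    }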

> insert_work() may as well just queue worker creation on demand.

Yes, that could save some workers for idle unbound pools.

>
>> 8) init_workqueues()
>> The struct kthread_worker kworker_creater is initialized before the
>> worker_pools in init_workqueues() so that kworker_creater_thread is
>> created before all the early kworkers. The early kworkers do not
>> depend on kworker_creater_thread, but this initialization order gives
>> kworker_creater_thread a smaller pid than the kworkers, which looks
>> more natural.
>
> Just don't create idle workers?

That is a good idea.

Thanks for review and comments.
Lai

2014-07-29 02:16:49

by Tejun Heo

Subject: Re: [PATCH 2/3] workqueue: use dedicated creater kthread for all pools

Hello,

On Tue, Jul 29, 2014 at 09:26:35AM +0800, Lai Jiangshan wrote:
> > It's a bit difficult to get excited about this patchset given that
> > this is mostly churn without many actual benefits. Sure, it
> > consolidates one-per-pool managers into one kthread_worker but it was
> > one-per-pool already. That said, I don't hate it and it may be
> > considered an improvement. I don't know.
>
> It prefers processing work items to creating workers, without any
> loss of the guarantee.
>
> Processing work items makes direct progress for the system;
> creating a worker adds delay and makes only indirect progress.

That's misleading, isn't it? Both process work items the same. The
only difference is that the per-pool manager ends up using more tasks,
thus a bit more memory, doing it. There really is no significant
behavior difference between the two schemes except for how many tasks
end up serving as the manager.

> > This is kinda silly when the duty of worker creation is served by an
> > external entity. Why would a pool need any idle worker?
>
> An idle worker must be ready, or being prepared, for wq_worker_sleeping()
> or the chained wake-up.
>
> A percpu kthreadd could serve wq_worker_sleeping() in this case, but it
> is not a good idea to introduce a percpu kthreadd now since there is no
> other user.

Hmmm... I'm not really sure what we're getting with this. It doesn't
look much simpler to me. :(

Lai, I don't know. If this ends up simplifying things significantly,
sure, but as it currently stands, I can't see why we'd need to do
this. If you wanna pursue this, please try to make it more
beneficial.

Thanks.

--
tejun

2014-07-29 09:15:13

by Lai Jiangshan

Subject: [PATCH RFC 2/2 V2] workqueue: use dedicated creater kthread for all pools

There are some problems with the managers:
1) The last idle worker prefers managing to processing.
It is better for the processing of work items to be the first
priority so that the whole system makes progress earlier.
2) Each pool always needs an additional idle worker; that is just
wasteful.
3) Managers of different pools can run in parallel, but their major
work is actually serialized on the kernel thread "kthreadd".
These managers sleep and are wasted when the system is short
of memory.
4) Implementing the manager inside the worker causes much complication
and trickiness.

This patch introduces a dedicated creater kthread which offloads the
managing from the workers, so every worker puts its effort into processing
work rather than creating workers. No idle worker or manager is wasted
sleeping, and all the manager code is removed. The dedicated creater
kthread serializes a little more work than before, but that is
acceptable since the major work was already serialized anyway.

We introduce the kthread_work creater_work, which calls create_worker()
to create a worker; start_creater_work(), which starts the mayday_timer
and the creater_work when needed; and pool->creating, which handles the
exclusion for the mayday_timer and the creater_work.

We create worker only when the worklist has work items pending but no
idle worker ready. So start_creater_work() is only called on
insert_work() or the last idle does go to busy with other work items
remained. And we don't create worker when init_workqueues() nor
get_unbound_pool() nor CPU_UP_PREPARE nor the last idle worker leaves
idle any more, it saves code and unneeded workers.
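
A condensed view of the pool->creating exclusion described above,
assembled from the hunks below with explanatory comments added
(mayday-timer arming omitted):

    /* called under pool->lock; arm at most one creation per pool */
    static inline void start_creater_work(struct worker_pool *pool)
    {
        if (pool->nr_idle || pool->creating || list_empty(&pool->worklist))
            return;

        /* cleared by create_worker(): on success once the new worker
         * is on the idle_list, or on failure after the CREATE_COOLDOWN
         * sleep (which may requeue the work) */
        pool->creating = true;
        queue_kthread_work(&kworker_creater, &pool->creater_work);
    }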

put_unbound_pool() groups code by functionality rather than by name: it
first stops the creation activity (creater_work, mayday_timer) and then
destroys the idle workers and stops the idle_timer.
(Patch 1/2 is the same as patch 1/3 of v1, so it is not resent.)

Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 238 ++++++++++++++--------------------------------------
1 files changed, 63 insertions(+), 175 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1d44d8d..03e253f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -153,16 +153,16 @@ struct worker_pool {
struct timer_list idle_timer; /* L: worker idle timeout */
struct timer_list mayday_timer; /* L: SOS timer for workers */

- /* a workers is either on busy_hash or idle_list, or the manager */
+ /* a worker is either on busy_hash or idle_list */
DECLARE_HASHTABLE(busy_hash, BUSY_WORKER_HASH_ORDER);
/* L: hash of busy workers */

- /* see manage_workers() for details on the two manager mutexes */
- struct mutex manager_arb; /* manager arbitration */
struct mutex attach_mutex; /* attach/detach exclusion */
struct list_head workers; /* A: attached workers */
struct completion *detach_completion; /* all workers detached */

+ struct kthread_work creater_work; /* L: work for creating a new worker */
+ bool creating; /* L: creater_work is busy */
struct ida worker_ida; /* worker IDs for task name */

struct workqueue_attrs *attrs; /* I: worker attributes */
@@ -291,6 +291,10 @@ static DEFINE_SPINLOCK(wq_mayday_lock); /* protects wq->maydays list */
static LIST_HEAD(workqueues); /* PL: list of all workqueues */
static bool workqueue_freezing; /* PL: have wqs started freezing? */

+/* kthread worker for creating workers for pools */
+static DEFINE_KTHREAD_WORKER(kworker_creater);
+static struct task_struct *kworker_creater_thread __read_mostly;
+
/* the per-cpu worker pools */
static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
cpu_worker_pools);
@@ -731,12 +735,6 @@ static bool need_more_worker(struct worker_pool *pool)
return !list_empty(&pool->worklist) && __need_more_worker(pool);
}

-/* Can I start working? Called from busy but !running workers. */
-static bool may_start_working(struct worker_pool *pool)
-{
- return pool->nr_idle;
-}
-
/* Do I need to keep working? Called from currently running workers. */
static bool keep_working(struct worker_pool *pool)
{
@@ -744,17 +742,10 @@ static bool keep_working(struct worker_pool *pool)
atomic_read(&pool->nr_running) <= 1;
}

-/* Do we need a new worker? Called from manager. */
-static bool need_to_create_worker(struct worker_pool *pool)
-{
- return need_more_worker(pool) && !may_start_working(pool);
-}
-
/* Do we have too many workers and should some go away? */
static bool too_many_workers(struct worker_pool *pool)
{
- bool managing = mutex_is_locked(&pool->manager_arb);
- int nr_idle = pool->nr_idle + managing; /* manager is considered idle */
+ int nr_idle = pool->nr_idle;
int nr_busy = pool->nr_workers - nr_idle;

return nr_idle > 2 && (nr_idle - 2) * MAX_IDLE_WORKERS_RATIO >= nr_busy;
@@ -1228,6 +1219,21 @@ fail:
return -EAGAIN;
}

+/* Start the mayday timer and the creater when needed */
+static inline void start_creater_work(struct worker_pool *pool)
+{
+ if (pool->nr_idle || pool->creating || list_empty(&pool->worklist))
+ return;
+
+ pool->creating = true;
+ pool->mayday_timer.expires = jiffies + MAYDAY_INITIAL_TIMEOUT;
+ if (pool->flags & POOL_DISASSOCIATED)
+ add_timer(&pool->mayday_timer);
+ else
+ add_timer_on(&pool->mayday_timer, pool->cpu);
+ queue_kthread_work(&kworker_creater, &pool->creater_work);
+}
+
/**
* insert_work - insert a work into a pool
* @pwq: pwq @work belongs to
@@ -1249,6 +1255,7 @@ static void insert_work(struct pool_workqueue *pwq, struct work_struct *work,
/* we own @work, set data and link */
set_work_pwq(work, pwq, extra_flags);
list_add_tail(&work->entry, head);
+ start_creater_work(pool);
get_pwq(pwq);

/*
@@ -1658,18 +1665,17 @@ static void worker_detach_from_pool(struct worker *worker,

/**
* create_worker - create a new workqueue worker
- * @pool: pool the new worker will belong to
+ * @work: the struct kthread_work creater_work of the target pool
*
- * Create and start a new worker which is attached to @pool.
+ * Create and start a new worker which is attached to the target pool.
*
* CONTEXT:
* Might sleep. Does GFP_KERNEL allocations.
- *
- * Return:
- * Pointer to the newly created worker.
*/
-static struct worker *create_worker(struct worker_pool *pool)
+static void create_worker(struct kthread_work *work)
{
+ struct worker_pool *pool = container_of(work, struct worker_pool,
+ creater_work);
struct worker *worker = NULL;
int id = -1;
char id_buf[16];
@@ -1697,6 +1703,8 @@ static struct worker *create_worker(struct worker_pool *pool)
if (IS_ERR(worker->task))
goto fail;

+ del_timer_sync(&pool->mayday_timer);
+
set_user_nice(worker->task, pool->attrs->nice);

/* prevent userland from meddling with cpumask of workqueue workers */
@@ -1715,15 +1723,24 @@ static struct worker *create_worker(struct worker_pool *pool)
*/
wake_up_process(worker->task);
worker_enter_idle(worker);
+ pool->creating = false;
spin_unlock_irq(&pool->lock);

- return worker;
+ return;

fail:
if (id >= 0)
ida_simple_remove(&pool->worker_ida, id);
kfree(worker);
- return NULL;
+
+ /* cool down before next create_worker() */
+ schedule_timeout_interruptible(CREATE_COOLDOWN);
+ del_timer_sync(&pool->mayday_timer);
+
+ spin_lock_irq(&pool->lock);
+ pool->creating = false;
+ start_creater_work(pool);
+ spin_unlock_irq(&pool->lock);
}

/**
@@ -1812,7 +1829,7 @@ static void pool_mayday_timeout(unsigned long __pool)
spin_lock_irq(&wq_mayday_lock); /* for wq->maydays */
spin_lock(&pool->lock);

- if (need_to_create_worker(pool)) {
+ if (!pool->nr_idle && need_more_worker(pool)) {
/*
* We've been trying to create a new worker but
* haven't been successful. We might be hitting an
@@ -1830,109 +1847,6 @@ static void pool_mayday_timeout(unsigned long __pool)
}

/**
- * maybe_create_worker - create a new worker if necessary
- * @pool: pool to create a new worker for
- *
- * Create a new worker for @pool if necessary. @pool is guaranteed to
- * have at least one idle worker on return from this function. If
- * creating a new worker takes longer than MAYDAY_INTERVAL, mayday is
- * sent to all rescuers with works scheduled on @pool to resolve
- * possible allocation deadlock.
- *
- * On return, need_to_create_worker() is guaranteed to be %false and
- * may_start_working() %true.
- *
- * LOCKING:
- * spin_lock_irq(pool->lock) which may be released and regrabbed
- * multiple times. Does GFP_KERNEL allocations. Called only from
- * manager.
- *
- * Return:
- * %false if no action was taken and pool->lock stayed locked, %true
- * otherwise.
- */
-static bool maybe_create_worker(struct worker_pool *pool)
-__releases(&pool->lock)
-__acquires(&pool->lock)
-{
- if (!need_to_create_worker(pool))
- return false;
-restart:
- spin_unlock_irq(&pool->lock);
-
- /* if we don't make progress in MAYDAY_INITIAL_TIMEOUT, call for help */
- mod_timer(&pool->mayday_timer, jiffies + MAYDAY_INITIAL_TIMEOUT);
-
- while (true) {
- if (create_worker(pool) || !need_to_create_worker(pool))
- break;
-
- schedule_timeout_interruptible(CREATE_COOLDOWN);
-
- if (!need_to_create_worker(pool))
- break;
- }
-
- del_timer_sync(&pool->mayday_timer);
- spin_lock_irq(&pool->lock);
- /*
- * This is necessary even after a new worker was just successfully
- * created as @pool->lock was dropped and the new worker might have
- * already become busy.
- */
- if (need_to_create_worker(pool))
- goto restart;
- return true;
-}
-
-/**
- * manage_workers - manage worker pool
- * @worker: self
- *
- * Assume the manager role and manage the worker pool @worker belongs
- * to. At any given time, there can be only zero or one manager per
- * pool. The exclusion is handled automatically by this function.
- *
- * The caller can safely start processing works on false return. On
- * true return, it's guaranteed that need_to_create_worker() is false
- * and may_start_working() is true.
- *
- * CONTEXT:
- * spin_lock_irq(pool->lock) which may be released and regrabbed
- * multiple times. Does GFP_KERNEL allocations.
- *
- * Return:
- * %false if the pool don't need management and the caller can safely start
- * processing works, %true indicates that the function released pool->lock
- * and reacquired it to perform some management function and that the
- * conditions that the caller verified while holding the lock before
- * calling the function might no longer be true.
- */
-static bool manage_workers(struct worker *worker)
-{
- struct worker_pool *pool = worker->pool;
- bool ret = false;
-
- /*
- * Anyone who successfully grabs manager_arb wins the arbitration
- * and becomes the manager. mutex_trylock() on pool->manager_arb
- * failure while holding pool->lock reliably indicates that someone
- * else is managing the pool and the worker which failed trylock
- * can proceed to executing work items. This means that anyone
- * grabbing manager_arb is responsible for actually performing
- * manager duties. If manager_arb is grabbed and released without
- * actual management, the pool may stall indefinitely.
- */
- if (!mutex_trylock(&pool->manager_arb))
- return ret;
-
- ret |= maybe_create_worker(pool);
-
- mutex_unlock(&pool->manager_arb);
- return ret;
-}
-
-/**
* process_one_work - process single work
* @worker: self
* @work: work to process
@@ -1991,6 +1905,7 @@ __acquires(&pool->lock)
work_color = get_work_color(work);

list_del_init(&work->entry);
+ start_creater_work(pool);

/*
* CPU intensive works don't participate in concurrency management.
@@ -2123,15 +2038,11 @@ woke_up:
}

worker_leave_idle(worker);
-recheck:
+
/* no more worker necessary? */
if (!need_more_worker(pool))
goto sleep;

- /* do we need to manage? */
- if (unlikely(!may_start_working(pool)) && manage_workers(worker))
- goto recheck;
-
/*
* ->scheduled list can only be filled while a worker is
* preparing to process a work or actually processing it.
@@ -2140,11 +2051,9 @@ recheck:
WARN_ON_ONCE(!list_empty(&worker->scheduled));

/*
- * Finish PREP stage. We're guaranteed to have at least one idle
- * worker or that someone else has already assumed the manager
- * role. This is where @worker starts participating in concurrency
- * management if applicable and concurrency management is restored
- * after being rebound. See rebind_workers() for details.
+ * Finish PREP stage. This is where @worker starts participating in
+ * concurrency management if applicable and concurrency management is
+ * restored after being rebound. See rebind_workers() for details.
*/
worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);

@@ -3355,10 +3264,10 @@ static int init_worker_pool(struct worker_pool *pool)
setup_timer(&pool->mayday_timer, pool_mayday_timeout,
(unsigned long)pool);

- mutex_init(&pool->manager_arb);
mutex_init(&pool->attach_mutex);
INIT_LIST_HEAD(&pool->workers);

+ init_kthread_work(&pool->creater_work, create_worker);
ida_init(&pool->worker_ida);
INIT_HLIST_NODE(&pool->hash_node);
pool->refcnt = 1;
@@ -3410,19 +3319,19 @@ static void put_unbound_pool(struct worker_pool *pool)
idr_remove(&worker_pool_idr, pool->id);
hash_del(&pool->hash_node);

- /*
- * Become the manager and destroy all workers. Grabbing
- * manager_arb prevents @pool's workers from blocking on
- * attach_mutex.
- */
- mutex_lock(&pool->manager_arb);
+ /* wait for any unfinished creation and shut down the mayday timer */
+ flush_kthread_work(&pool->creater_work);
+ del_timer_sync(&pool->mayday_timer);

+ /* all workers are anchored in idle_list, destroy them all at once */
spin_lock_irq(&pool->lock);
while ((worker = first_idle_worker(pool)))
destroy_worker(worker);
WARN_ON(pool->nr_workers || pool->nr_idle);
spin_unlock_irq(&pool->lock);

+ del_timer_sync(&pool->idle_timer);
+
mutex_lock(&pool->attach_mutex);
if (!list_empty(&pool->workers))
pool->detach_completion = &detach_completion;
@@ -3431,12 +3340,6 @@ static void put_unbound_pool(struct worker_pool *pool)
if (pool->detach_completion)
wait_for_completion(pool->detach_completion);

- mutex_unlock(&pool->manager_arb);
-
- /* shut down the timers */
- del_timer_sync(&pool->idle_timer);
- del_timer_sync(&pool->mayday_timer);
-
/* sched-RCU protected to allow dereferences from get_work_pool() */
call_rcu_sched(&pool->rcu, rcu_free_pool);
}
@@ -3499,10 +3402,6 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
if (worker_pool_assign_id(pool) < 0)
goto fail;

- /* create and start the initial worker */
- if (!create_worker(pool))
- goto fail;
-
/* install */
hash_add(unbound_pool_hash, &pool->hash_node, hash);

@@ -4562,15 +4461,6 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb,
int pi;

switch (action & ~CPU_TASKS_FROZEN) {
- case CPU_UP_PREPARE:
- for_each_cpu_worker_pool(pool, cpu) {
- if (pool->nr_workers)
- continue;
- if (!create_worker(pool))
- return NOTIFY_BAD;
- }
- break;
-
case CPU_DOWN_FAILED:
case CPU_ONLINE:
mutex_lock(&wq_pool_mutex);
@@ -4837,6 +4727,11 @@ static int __init init_workqueues(void)

wq_numa_init();

+ kworker_creater_thread = kthread_run(kthread_worker_fn,
+ &kworker_creater,
+ "kworker/creater");
+ BUG_ON(IS_ERR(kworker_creater_thread));
+
/* initialize CPU pools */
for_each_possible_cpu(cpu) {
struct worker_pool *pool;
@@ -4849,6 +4744,9 @@ static int __init init_workqueues(void)
pool->attrs->nice = std_nice[i++];
pool->node = cpu_to_node(cpu);

+ if (cpu_online(cpu))
+ pool->flags &= ~POOL_DISASSOCIATED;
+
/* alloc pool ID */
mutex_lock(&wq_pool_mutex);
BUG_ON(worker_pool_assign_id(pool));
@@ -4856,16 +4754,6 @@ static int __init init_workqueues(void)
}
}

- /* create the initial worker */
- for_each_online_cpu(cpu) {
- struct worker_pool *pool;
-
- for_each_cpu_worker_pool(pool, cpu) {
- pool->flags &= ~POOL_DISASSOCIATED;
- BUG_ON(!create_worker(pool));
- }
- }
-
/* create default unbound and ordered wq attrs */
for (i = 0; i < NR_STD_WORKER_POOLS; i++) {
struct workqueue_attrs *attrs;
--
1.7.4.4

2014-07-29 15:05:06

by Tejun Heo

Subject: Re: [PATCH RFC 2/2 V2] workqueue: use dedicated creater kthread for all pools

Hello,

On Tue, Jul 29, 2014 at 05:16:07PM +0800, Lai Jiangshan wrote:

First of all, the patch is too big. This is a rather pervasive
change. Please split it up if at all possible.

> +/* Start the mayday timer and the creater when needed */
> +static inline void start_creater_work(struct worker_pool *pool)
> +{
> + if (pool->nr_idle || pool->creating || list_empty(&pool->worklist))
> + return;

pool->creating is an optimization around queue_kthread_work(), right?
So that you don't have to grab the lock every time a work item is
queued. Please explain things like that explicitly. Also, the
condition itself needs explanation. This is what guarantees that the
queue is not stalled after all.

Hmmm... list_empty() is unnecessary when called from the queueing
path. Do we want to move that out of this function?

> /* we own @work, set data and link */
> set_work_pwq(work, pwq, extra_flags);
> list_add_tail(&work->entry, head);
> + start_creater_work(pool);

creator is spelled with an 'o' not 'e'. Also, it'd be better if the
name reflects that this is a kthread_work not a workqueue one.

> +static void create_worker(struct kthread_work *work)
> {
...
> fail:
> if (id >= 0)
> ida_simple_remove(&pool->worker_ida, id);
> kfree(worker);
> - return NULL;
> +
> + /* cool down before next create_worker() */
> + schedule_timeout_interruptible(CREATE_COOLDOWN);
> + del_timer_sync(&pool->mayday_timer);
> +
> + spin_lock_irq(&pool->lock);
> + pool->creating = false;
> + start_creater_work(pool);
> + spin_unlock_irq(&pool->lock);

Why? Just sleep and retry? What's the point of requeueing?

> -/**
> * process_one_work - process single work
> * @worker: self
> * @work: work to process
> @@ -1991,6 +1905,7 @@ __acquires(&pool->lock)
> work_color = get_work_color(work);
>
> list_del_init(&work->entry);
> + start_creater_work(pool);

Should this be combined with wake_up_worker()?

Thanks.

--
tejun

2014-07-30 00:31:31

by Lai Jiangshan

Subject: Re: [PATCH RFC 2/2 V2] workqueue: use dedicated creater kthread for all pools

On 07/29/2014 11:04 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Jul 29, 2014 at 05:16:07PM +0800, Lai Jiangshan wrote:
>
> First of all, the patch is too big. This is a rather pervasive
> change. Please split it up if at all possible.
>
>> +/* Start the mayday timer and the creater when needed */
>> +static inline void start_creater_work(struct worker_pool *pool)
>> +{
>> + if (pool->nr_idle || pool->creating || list_empty(&pool->worklist))
>> + return;
>
> pool->creating is an optimization around queue_kthread_work(), right?
> So that you don't have to grab the lock every time a work item is
> queued. Please explain things like that explicitly. Also, the
> condition itself needs explanation. This is what guarantees that the
> queue is not stalled after all.
>
> Hmmm... list_empty() is unnecessary when called from the queueing
> path. Do we want to move that out of this function?
>
>> /* we own @work, set data and link */
>> set_work_pwq(work, pwq, extra_flags);
>> list_add_tail(&work->entry, head);
>> + start_creater_work(pool);
>
> creator is spelled with an 'o' not 'e'. Also, it'd be better if the
> name reflects that this is a kthread_work not a workqueue one.
>
>> +static void create_worker(struct kthread_work *work)
>> {
> ...
>> fail:
>> if (id >= 0)
>> ida_simple_remove(&pool->worker_ida, id);
>> kfree(worker);
>> - return NULL;
>> +
>> + /* cool down before next create_worker() */
>> + schedule_timeout_interruptible(CREATE_COOLDOWN);
>> + del_timer_sync(&pool->mayday_timer);
>> +
>> + spin_lock_irq(&pool->lock);
>> + pool->creating = false;
>> + start_creater_work(pool);
>> + spin_unlock_irq(&pool->lock);
>
> Why? Just sleep and retry? What's the point of requeueing?

I accepted your comments except this one, which may need another round
of discussion. Requeueing passes the retry to the kthread_worker and
gives a chance to the other pools which are also creating workers.

This patch will be deferred until 3.19 because some unbound patches
will be ready soon.

Thanks!!
Lai

>
>> -/**
>> * process_one_work - process single work
>> * @worker: self
>> * @work: work to process
>> @@ -1991,6 +1905,7 @@ __acquires(&pool->lock)
>> work_color = get_work_color(work);
>>
>> list_del_init(&work->entry);
>> + start_creater_work(pool);
>
> Should this be combined with wake_up_worker()?
>
> Thanks.
>

2014-07-30 03:23:39

by Tejun Heo

Subject: Re: [PATCH RFC 2/2 V2] workqueue: use dedicated creater kthread for all pools

Hello, Lai.

On Wed, Jul 30, 2014 at 08:32:51AM +0800, Lai Jiangshan wrote:
> > Why? Just sleep and retry? What's the point of requeueing?
>
> I accepted your comments except this one, which may need another round
> of discussion. Requeueing passes the retry to the kthread_worker and
> gives a chance to the other pools which are also creating workers.

But why is that a good idea? The fact that creation of a worker for a
specific pool failed is completely coincidental. The failed pool itself
isn't inherently blameable. It was just unlucky, and moving on to a
different pool doesn't improve the chance of success in any way. The
only thing requeueing achieves is punishing the unlucky one by putting
it at the back of the queue while adding complexity.

> This patch will be deferred until 3.19 because some unbound patches
> will be ready soon.

We're too late for this merge window anyway. This is a pretty
pervasive change after all.

Thanks.

--
tejun

2014-07-30 03:44:43

by Lai Jiangshan

Subject: Re: [PATCH RFC 2/2 V2] workqueue: use dedicated creater kthread for all pools

On 07/30/2014 11:23 AM, Tejun Heo wrote:
> Hello, Lai.
>
> On Wed, Jul 30, 2014 at 08:32:51AM +0800, Lai Jiangshan wrote:
>>> Why? Just sleep and retry? What's the point of requeueing?
>>
>> I accepted your comments except this one, which may need another round
>> of discussion. Requeueing passes the retry to the kthread_worker and
>> gives a chance to the other pools which are also creating workers.
>
> But why is that a good idea? The fact that creation of a worker for a
> specific pool failed is completely coincidental. The failed pool itself
> isn't inherently blameable. It was just unlucky, and moving on to a
> different pool doesn't improve the chance of success in any way. The
> only thing requeueing achieves is punishing the unlucky one by putting
> it at the back of the queue while adding complexity.

It adds complexity to other things, but not to the code. The code is
simplified.

And failed pools are much fewer than creations, and the failure is
completely coincidental; after the cooldown time, another pool may also
be creating a worker.

Thanks,
Lai

>
>> This patch will be deferred until 3.19 because some unbound patches
>> will be ready soon.
>
> We're too late for this merge window anyway. This is a pretty
> pervasive change after all.
>
> Thanks.
>

2014-07-30 03:46:11

by Tejun Heo

Subject: Re: [PATCH RFC 2/2 V2] workqueue: use dedicated creater kthread for all pools

On Wed, Jul 30, 2014 at 11:46:02AM +0800, Lai Jiangshan wrote:
> It adds complexity to other things, but not to the code. The code is
> simplified.

How? It can simply repeat kthread_create() until it succeeds, with
msleep() in between. How can that be more complex than what's
implemented now?

> And failed pools are much fewer than creations, and the failure is
> completely coincidental; after the cooldown time, another pool may also
> be creating a worker.

I don't follow the above at all. Can you please elaborate?

Thanks.

--
tejun