2013-03-18 15:24:02

by Viresh Kumar

Subject: [PATCH V3 0/7] Create sched_select_cpu() and use it for workqueues

In order to save power, it would be useful to schedule lightweight work on cpus
that aren't idle instead of waking up an idle one.

By idle cpu (from scheduler's perspective) we mean:
- Current task is idle task
- nr_running == 0
- wake_list is empty

This is already implemented for timers as get_nohz_timer_target(). We can figure
out a few more users of this feature, like workqueues.

This patchset converts get_nohz_timer_target() into a generic API
sched_select_cpu() so that other frameworks (like workqueue) can also use it.

This routine returns a non-idle cpu. It accepts a bitwise OR of the SD_*
flags present in linux/sched.h. If the local cpu isn't idle, or all cpus are
idle, the local cpu is returned. If the local cpu is idle, we look for
another cpu which has all the passed flags set and isn't idle.

The first two patches of this patchset create the generic API
sched_select_cpu(). The third patch creates a new set of workqueue APIs to
queue work on any cpu. The remaining patches migrate some of the workqueue
users that showed up significantly on my setup; others can be migrated later.

Earlier discussions over v1 and v2 can be found here:
http://www.mail-archive.com/[email protected]/msg13342.html
http://lists.linaro.org/pipermail/linaro-dev/2012-November/014344.html

Earlier discussions over this concept were done at last LPC:
http://summit.linuxplumbersconf.org/lpc-2012/meeting/90/lpc2012-sched-timer-workqueue/

Setup:
-----
- ARM Vexpress TC2 - big.LITTLE CPU
- Core 0-1: A15, 2-4: A7
- rootfs: linaro-ubuntu-devel

This patchset has been tested on a big.LITTLE (heterogeneous) system but is
useful for homogeneous systems as well. During these tests, audio was played
in the background using aplay.

Results:
-------

Cluster A15 Energy Cluster A7 Energy Total
------------------ ----------------- -----

Without this patchset (Energy in Joules):
---------------------

0.151162 2.183545 2.334707
0.223730 2.687067 2.910797
0.289687 2.732702 3.022389
0.454198 2.745908 3.200106
0.495552 2.746465 3.242017

Average:
0.322866 2.619137 2.942003


With this patchset (Energy in Joules):
---------------------

0.133361 2.267822 2.401183
0.260626 2.833389 3.094015
0.142365 2.277929 2.420294
0.246793 2.582550 2.829343
0.130462 2.257033 2.387495

Average:
0.182721 2.443745 2.626466


The above tests were repeated multiple times, with events tracked using
trace-cmd and analysed using kernelshark. It was easily noticeable that the
idle time of many cpus increased considerably, which eventually saved some
power.

These patches are applied here for others to test:
http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/sched-wq-migration-v3

Viresh Kumar (7):
sched: Create sched_select_cpu() to give preferred CPU for power
saving
timer: hrtimer: Don't check idle_cpu() before calling
get_nohz_timer_target()
workqueue: Add helpers to schedule work on any cpu
PHYLIB: queue work on any cpu
mmc: queue work on any cpu
block: queue work on any cpu
fbcon: queue work on any cpu

block/blk-core.c | 6 +-
block/blk-ioc.c | 2 +-
block/genhd.c | 9 ++-
drivers/mmc/core/core.c | 2 +-
drivers/net/phy/phy.c | 9 +--
drivers/video/console/fbcon.c | 2 +-
include/linux/sched.h | 21 +++++-
include/linux/workqueue.h | 5 ++
kernel/hrtimer.c | 2 +-
kernel/sched/core.c | 63 +++++++++-------
kernel/timer.c | 2 +-
kernel/workqueue.c | 163 +++++++++++++++++++++++++++++-------------
12 files changed, 192 insertions(+), 94 deletions(-)

--
1.7.12.rc2.18.g61b472e


2013-03-18 15:24:21

by Viresh Kumar

Subject: [PATCH V3 2/7] timer: hrtimer: Don't check idle_cpu() before calling get_nohz_timer_target()

The check for the current cpu's idleness is already done in the
implementation of sched_select_cpu(), which is called by
get_nohz_timer_target(). So there is no need to call idle_cpu() twice: once
from sched_select_cpu() and once from timer/hrtimer before calling
get_nohz_timer_target().

This patch removes calls to idle_cpu() from timer and hrtimer.

Signed-off-by: Viresh Kumar <[email protected]>
---
kernel/hrtimer.c | 2 +-
kernel/timer.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index cc47812..c56668d 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -161,7 +161,7 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
static int hrtimer_get_target(int this_cpu, int pinned)
{
#ifdef CONFIG_NO_HZ
- if (!pinned && get_sysctl_timer_migration() && idle_cpu(this_cpu))
+ if (!pinned && get_sysctl_timer_migration())
return get_nohz_timer_target();
#endif
return this_cpu;
diff --git a/kernel/timer.c b/kernel/timer.c
index dbf7a78..83a0d3c 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -739,7 +739,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
cpu = smp_processor_id();

#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
- if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu))
+ if (!pinned && get_sysctl_timer_migration())
cpu = get_nohz_timer_target();
#endif
new_base = per_cpu(tvec_bases, cpu);
--
1.7.12.rc2.18.g61b472e

2013-03-18 15:24:33

by Viresh Kumar

Subject: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu

queue_work() queues work on the current cpu. This may wake up an idle CPU,
which is not actually required.

Some of these works can be processed by any CPU, so we should select a
non-idle CPU here. The initial idea was to modify the implementation of
queue_work(), but that may end up breaking lots of kernel code and would be a
nightmare to debug.

So, we settled on adding new workqueue interfaces, for works that don't depend
on a particular cpu to execute them.

This patch adds the following new routines:
- queue_work_on_any_cpu
- queue_delayed_work_on_any_cpu

These routines look for the closest (via scheduling domains) non-idle cpu
(non-idle from the scheduler's perspective). If the current cpu is not idle,
or all cpus are idle, the work is scheduled on the local cpu.

Signed-off-by: Viresh Kumar <[email protected]>
---
include/linux/workqueue.h | 5 ++
kernel/workqueue.c | 163 ++++++++++++++++++++++++++++++++--------------
2 files changed, 118 insertions(+), 50 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index df30763..f0f7068 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -114,6 +114,7 @@ struct delayed_work {
/* target workqueue and CPU ->timer uses to queue ->work */
struct workqueue_struct *wq;
int cpu;
+ bool on_any_cpu;
};

/*
@@ -418,10 +419,14 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
extern bool queue_work_on(int cpu, struct workqueue_struct *wq,
struct work_struct *work);
extern bool queue_work(struct workqueue_struct *wq, struct work_struct *work);
+extern bool queue_work_on_any_cpu(struct workqueue_struct *wq,
+ struct work_struct *work);
extern bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
struct delayed_work *work, unsigned long delay);
extern bool queue_delayed_work(struct workqueue_struct *wq,
struct delayed_work *work, unsigned long delay);
+extern bool queue_delayed_work_on_any_cpu(struct workqueue_struct *wq,
+ struct delayed_work *dwork, unsigned long delay);
extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
struct delayed_work *dwork, unsigned long delay);
extern bool mod_delayed_work(struct workqueue_struct *wq,
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0e4fa1d..cf9c570 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1215,7 +1215,7 @@ static bool is_chained_work(struct workqueue_struct *wq)
}

static void __queue_work(int cpu, struct workqueue_struct *wq,
- struct work_struct *work)
+ struct work_struct *work, bool on_any_cpu)
{
struct pool_workqueue *pwq;
struct worker_pool *last_pool;
@@ -1240,8 +1240,13 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
retry:
/* pwq which will be used unless @work is executing elsewhere */
if (!(wq->flags & WQ_UNBOUND)) {
- if (cpu == WORK_CPU_UNBOUND)
- cpu = raw_smp_processor_id();
+ if (cpu == WORK_CPU_UNBOUND) {
+ if (on_any_cpu)
+ cpu = sched_select_cpu(0);
+ else
+ cpu = raw_smp_processor_id();
+ }
+
pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
} else {
pwq = first_pwq(wq);
@@ -1315,6 +1320,22 @@ retry:
spin_unlock(&pwq->pool->lock);
}

+static bool __queue_work_on(int cpu, struct workqueue_struct *wq,
+ struct work_struct *work, bool on_any_cpu)
+{
+ bool ret = false;
+ unsigned long flags;
+
+ local_irq_save(flags);
+
+ if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
+ __queue_work(cpu, wq, work, on_any_cpu);
+ ret = true;
+ }
+
+ local_irq_restore(flags);
+ return ret;
+}
/**
* queue_work_on - queue work on specific cpu
* @cpu: CPU number to execute work on
@@ -1329,18 +1350,7 @@ retry:
bool queue_work_on(int cpu, struct workqueue_struct *wq,
struct work_struct *work)
{
- bool ret = false;
- unsigned long flags;
-
- local_irq_save(flags);
-
- if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
- __queue_work(cpu, wq, work);
- ret = true;
- }
-
- local_irq_restore(flags);
- return ret;
+ return __queue_work_on(cpu, wq, work, false);
}
EXPORT_SYMBOL_GPL(queue_work_on);

@@ -1356,21 +1366,38 @@ EXPORT_SYMBOL_GPL(queue_work_on);
*/
bool queue_work(struct workqueue_struct *wq, struct work_struct *work)
{
- return queue_work_on(WORK_CPU_UNBOUND, wq, work);
+ return __queue_work_on(WORK_CPU_UNBOUND, wq, work, false);
}
EXPORT_SYMBOL_GPL(queue_work);

+/**
+ * queue_work_on_any_cpu - queue work on any cpu on a workqueue
+ * @wq: workqueue to use
+ * @work: work to queue
+ *
+ * Returns %false if @work was already on a queue, %true otherwise.
+ *
+ * We queue the work to any non-idle (from schedulers perspective) cpu.
+ */
+bool queue_work_on_any_cpu(struct workqueue_struct *wq,
+ struct work_struct *work)
+{
+ return __queue_work_on(WORK_CPU_UNBOUND, wq, work, true);
+}
+EXPORT_SYMBOL_GPL(queue_work_on_any_cpu);
+
void delayed_work_timer_fn(unsigned long __data)
{
struct delayed_work *dwork = (struct delayed_work *)__data;

/* should have been called from irqsafe timer with irq already off */
- __queue_work(dwork->cpu, dwork->wq, &dwork->work);
+ __queue_work(dwork->cpu, dwork->wq, &dwork->work, dwork->on_any_cpu);
}
EXPORT_SYMBOL(delayed_work_timer_fn);

static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
- struct delayed_work *dwork, unsigned long delay)
+ struct delayed_work *dwork, unsigned long delay,
+ bool on_any_cpu)
{
struct timer_list *timer = &dwork->timer;
struct work_struct *work = &dwork->work;
@@ -1387,7 +1414,7 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
* on that there's no such delay when @delay is 0.
*/
if (!delay) {
- __queue_work(cpu, wq, &dwork->work);
+ __queue_work(cpu, wq, &dwork->work, on_any_cpu);
return;
}

@@ -1395,6 +1422,7 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,

dwork->wq = wq;
dwork->cpu = cpu;
+ dwork->on_any_cpu = on_any_cpu;
timer->expires = jiffies + delay;

if (unlikely(cpu != WORK_CPU_UNBOUND))
@@ -1403,19 +1431,9 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
add_timer(timer);
}

-/**
- * queue_delayed_work_on - queue work on specific CPU after delay
- * @cpu: CPU number to execute work on
- * @wq: workqueue to use
- * @dwork: work to queue
- * @delay: number of jiffies to wait before queueing
- *
- * Returns %false if @work was already on a queue, %true otherwise. If
- * @delay is zero and @dwork is idle, it will be scheduled for immediate
- * execution.
- */
-bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
- struct delayed_work *dwork, unsigned long delay)
+static bool __queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
+ struct delayed_work *dwork, unsigned long delay,
+ bool on_any_cpu)
{
struct work_struct *work = &dwork->work;
bool ret = false;
@@ -1425,13 +1443,30 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
local_irq_save(flags);

if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
- __queue_delayed_work(cpu, wq, dwork, delay);
+ __queue_delayed_work(cpu, wq, dwork, delay, on_any_cpu);
ret = true;
}

local_irq_restore(flags);
return ret;
}
+
+/**
+ * queue_delayed_work_on - queue work on specific CPU after delay
+ * @cpu: CPU number to execute work on
+ * @wq: workqueue to use
+ * @dwork: work to queue
+ * @delay: number of jiffies to wait before queueing
+ *
+ * Returns %false if @work was already on a queue, %true otherwise. If
+ * @delay is zero and @dwork is idle, it will be scheduled for immediate
+ * execution.
+ */
+bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
+ struct delayed_work *dwork, unsigned long delay)
+{
+ return __queue_delayed_work_on(cpu, wq, dwork, delay, false);
+}
EXPORT_SYMBOL_GPL(queue_delayed_work_on);

/**
@@ -1445,11 +1480,50 @@ EXPORT_SYMBOL_GPL(queue_delayed_work_on);
bool queue_delayed_work(struct workqueue_struct *wq,
struct delayed_work *dwork, unsigned long delay)
{
- return queue_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay);
+ return __queue_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay,
+ false);
}
EXPORT_SYMBOL_GPL(queue_delayed_work);

/**
+ * queue_delayed_work_on_any_cpu - queue work on any non-idle cpu on a workqueue
+ * after delay
+ * @wq: workqueue to use
+ * @dwork: delayable work to queue
+ * @delay: number of jiffies to wait before queueing
+ *
+ * Equivalent to queue_delayed_work() but tries to use any non-idle (from
+ * schedulers perspective) CPU.
+ */
+bool queue_delayed_work_on_any_cpu(struct workqueue_struct *wq,
+ struct delayed_work *dwork, unsigned long delay)
+{
+ return __queue_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay,
+ true);
+}
+EXPORT_SYMBOL_GPL(queue_delayed_work_on_any_cpu);
+
+static bool __mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
+ struct delayed_work *dwork, unsigned long delay,
+ bool on_any_cpu)
+{
+ unsigned long flags;
+ int ret;
+
+ do {
+ ret = try_to_grab_pending(&dwork->work, true, &flags);
+ } while (unlikely(ret == -EAGAIN));
+
+ if (likely(ret >= 0)) {
+ __queue_delayed_work(cpu, wq, dwork, delay, on_any_cpu);
+ local_irq_restore(flags);
+ }
+
+ /* -ENOENT from try_to_grab_pending() becomes %true */
+ return ret;
+}
+
+/**
* mod_delayed_work_on - modify delay of or queue a delayed work on specific CPU
* @cpu: CPU number to execute work on
* @wq: workqueue to use
@@ -1470,20 +1544,7 @@ EXPORT_SYMBOL_GPL(queue_delayed_work);
bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
struct delayed_work *dwork, unsigned long delay)
{
- unsigned long flags;
- int ret;
-
- do {
- ret = try_to_grab_pending(&dwork->work, true, &flags);
- } while (unlikely(ret == -EAGAIN));
-
- if (likely(ret >= 0)) {
- __queue_delayed_work(cpu, wq, dwork, delay);
- local_irq_restore(flags);
- }
-
- /* -ENOENT from try_to_grab_pending() becomes %true */
- return ret;
+ return __mod_delayed_work_on(cpu, wq, dwork, delay, false);
}
EXPORT_SYMBOL_GPL(mod_delayed_work_on);

@@ -1498,7 +1559,8 @@ EXPORT_SYMBOL_GPL(mod_delayed_work_on);
bool mod_delayed_work(struct workqueue_struct *wq, struct delayed_work *dwork,
unsigned long delay)
{
- return mod_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay);
+ return __mod_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay,
+ dwork->on_any_cpu);
}
EXPORT_SYMBOL_GPL(mod_delayed_work);

@@ -2952,7 +3014,8 @@ bool flush_delayed_work(struct delayed_work *dwork)
{
local_irq_disable();
if (del_timer_sync(&dwork->timer))
- __queue_work(dwork->cpu, dwork->wq, &dwork->work);
+ __queue_work(dwork->cpu, dwork->wq, &dwork->work,
+ dwork->on_any_cpu);
local_irq_enable();
return flush_work(&dwork->work);
}
--
1.7.12.rc2.18.g61b472e

2013-03-18 15:24:41

by Viresh Kumar

Subject: [PATCH V3 4/7] PHYLIB: queue work on any cpu

Phylib uses workqueues for multiple purposes. There is no real dependency on
scheduling these on the cpu that queued them.

On an idle system, it is observed that an idle cpu wakes up many times just to
service this work. It would be better if we could schedule it on a cpu which
isn't idle, to save power.

By idle cpu (from scheduler's perspective) we mean:
- Current task is idle task
- nr_running == 0
- wake_list is empty

This patch replaces the schedule_work() and schedule_delayed_work() routines
with their queue_[delayed_]work_on_any_cpu() siblings, with system_wq as the
workqueue.

These routines look for the closest (via scheduling domains) non-idle cpu
(non-idle from the scheduler's perspective). If the current cpu is not idle,
or all cpus are idle, the work is scheduled on the local cpu.

Cc: "David S. Miller" <[email protected]>
Cc: [email protected]
Signed-off-by: Viresh Kumar <[email protected]>
---
drivers/net/phy/phy.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 298b4c2..a517706 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -439,7 +439,7 @@ void phy_start_machine(struct phy_device *phydev,
{
phydev->adjust_state = handler;

- schedule_delayed_work(&phydev->state_queue, HZ);
+ queue_delayed_work_on_any_cpu(system_wq, &phydev->state_queue, HZ);
}

/**
@@ -527,7 +527,7 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
disable_irq_nosync(irq);
atomic_inc(&phydev->irq_disable);

- schedule_work(&phydev->phy_queue);
+ queue_work_on_any_cpu(system_wq, &phydev->phy_queue);

return IRQ_HANDLED;
}
@@ -682,7 +682,7 @@ static void phy_change(struct work_struct *work)

/* reschedule state queue work to run as soon as possible */
cancel_delayed_work_sync(&phydev->state_queue);
- schedule_delayed_work(&phydev->state_queue, 0);
+ queue_delayed_work_on_any_cpu(system_wq, &phydev->state_queue, 0);

return;

@@ -966,7 +966,8 @@ void phy_state_machine(struct work_struct *work)
if (err < 0)
phy_error(phydev);

- schedule_delayed_work(&phydev->state_queue, PHY_STATE_TIME * HZ);
+ queue_delayed_work_on_any_cpu(system_wq, &phydev->state_queue,
+ PHY_STATE_TIME * HZ);
}

static inline void mmd_phy_indirect(struct mii_bus *bus, int prtad, int devad,
--
1.7.12.rc2.18.g61b472e

2013-03-18 15:24:51

by Viresh Kumar

Subject: [PATCH V3 5/7] mmc: queue work on any cpu

mmc uses a workqueue for running mmc_rescan(). There is no real dependency on
scheduling this on the cpu that queued it.

On an idle system, it is observed that an idle cpu wakes up many times just to
service this work. It would be better if we could schedule it on a cpu which
isn't idle, to save power.

By idle cpu (from scheduler's perspective) we mean:
- Current task is idle task
- nr_running == 0
- wake_list is empty

This patch replaces queue_delayed_work() with its
queue_delayed_work_on_any_cpu() sibling.

This routine looks for the closest (via scheduling domains) non-idle cpu
(non-idle from the scheduler's perspective). If the current cpu is not idle,
or all cpus are idle, the work is scheduled on the local cpu.

Cc: Chris Ball <[email protected]>
Cc: [email protected]
Signed-off-by: Viresh Kumar <[email protected]>
---
drivers/mmc/core/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 9290bb5..adf331a 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -85,7 +85,7 @@ MODULE_PARM_DESC(
static int mmc_schedule_delayed_work(struct delayed_work *work,
unsigned long delay)
{
- return queue_delayed_work(workqueue, work, delay);
+ return queue_delayed_work_on_any_cpu(workqueue, work, delay);
}

/*
--
1.7.12.rc2.18.g61b472e

2013-03-18 15:25:03

by Viresh Kumar

Subject: [PATCH V3 6/7] block: queue work on any cpu

The block layer uses workqueues for multiple purposes. There is no real
dependency on scheduling these on the cpu that queued them.

On an idle system, it is observed that an idle cpu wakes up many times just to
service this work. It would be better if we could schedule it on a cpu which
isn't idle, to save power.

By idle cpu (from scheduler's perspective) we mean:
- Current task is idle task
- nr_running == 0
- wake_list is empty

This patch replaces schedule_work() and queue_[delayed_]work() with their
queue_[delayed_]work_on_any_cpu() siblings.

These routines look for the closest (via scheduling domains) non-idle cpu
(non-idle from the scheduler's perspective). If the current cpu is not idle,
or all cpus are idle, the work is scheduled on the local cpu.

Cc: Jens Axboe <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
---
block/blk-core.c | 6 +++---
block/blk-ioc.c | 2 +-
block/genhd.c | 9 ++++++---
3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 186603b..14ae74f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -225,7 +225,7 @@ static void blk_delay_work(struct work_struct *work)
void blk_delay_queue(struct request_queue *q, unsigned long msecs)
{
if (likely(!blk_queue_dead(q)))
- queue_delayed_work(kblockd_workqueue, &q->delay_work,
+ queue_delayed_work_on_any_cpu(kblockd_workqueue, &q->delay_work,
msecs_to_jiffies(msecs));
}
EXPORT_SYMBOL(blk_delay_queue);
@@ -2852,14 +2852,14 @@ EXPORT_SYMBOL_GPL(blk_rq_prep_clone);

int kblockd_schedule_work(struct request_queue *q, struct work_struct *work)
{
- return queue_work(kblockd_workqueue, work);
+ return queue_work_on_any_cpu(kblockd_workqueue, work);
}
EXPORT_SYMBOL(kblockd_schedule_work);

int kblockd_schedule_delayed_work(struct request_queue *q,
struct delayed_work *dwork, unsigned long delay)
{
- return queue_delayed_work(kblockd_workqueue, dwork, delay);
+ return queue_delayed_work_on_any_cpu(kblockd_workqueue, dwork, delay);
}
EXPORT_SYMBOL(kblockd_schedule_delayed_work);

diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index 9c4bb82..2eefeb1 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -144,7 +144,7 @@ void put_io_context(struct io_context *ioc)
if (atomic_long_dec_and_test(&ioc->refcount)) {
spin_lock_irqsave(&ioc->lock, flags);
if (!hlist_empty(&ioc->icq_list))
- schedule_work(&ioc->release_work);
+ queue_work_on_any_cpu(system_wq, &ioc->release_work);
else
free_ioc = true;
spin_unlock_irqrestore(&ioc->lock, flags);
diff --git a/block/genhd.c b/block/genhd.c
index a1ed52a..4bdb735 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1488,9 +1488,11 @@ static void __disk_unblock_events(struct gendisk *disk, bool check_now)
intv = disk_events_poll_jiffies(disk);
set_timer_slack(&ev->dwork.timer, intv / 4);
if (check_now)
- queue_delayed_work(system_freezable_wq, &ev->dwork, 0);
+ queue_delayed_work_on_any_cpu(system_freezable_wq, &ev->dwork,
+ 0);
else if (intv)
- queue_delayed_work(system_freezable_wq, &ev->dwork, intv);
+ queue_delayed_work_on_any_cpu(system_freezable_wq, &ev->dwork,
+ intv);
out_unlock:
spin_unlock_irqrestore(&ev->lock, flags);
}
@@ -1626,7 +1628,8 @@ static void disk_check_events(struct disk_events *ev,

intv = disk_events_poll_jiffies(disk);
if (!ev->block && intv)
- queue_delayed_work(system_freezable_wq, &ev->dwork, intv);
+ queue_delayed_work_on_any_cpu(system_freezable_wq, &ev->dwork,
+ intv);

spin_unlock_irq(&ev->lock);

--
1.7.12.rc2.18.g61b472e

2013-03-18 15:25:15

by Viresh Kumar

Subject: [PATCH V3 7/7] fbcon: queue work on any cpu

fbcon uses workqueues, and it has no real dependency on scheduling these on
the cpu that queued them.

On an idle system, it is observed that an idle cpu wakes up a few times just
to service this work. It would be better if we could schedule it on a cpu
which isn't idle, to save power.

By idle cpu (from scheduler's perspective) we mean:
- Current task is idle task
- nr_running == 0
- wake_list is empty

This patch replaces the schedule_work() routine with its
queue_work_on_any_cpu() sibling, with system_wq as the workqueue.

This routine looks for the closest (via scheduling domains) non-idle cpu
(non-idle from the scheduler's perspective). If the current cpu is not idle,
or all cpus are idle, the work is scheduled on the local cpu.

Cc: Dave Airlie <[email protected]>
Cc: [email protected]
Signed-off-by: Viresh Kumar <[email protected]>
---
drivers/video/console/fbcon.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c
index 3cd6759..a900997 100644
--- a/drivers/video/console/fbcon.c
+++ b/drivers/video/console/fbcon.c
@@ -404,7 +404,7 @@ static void cursor_timer_handler(unsigned long dev_addr)
struct fb_info *info = (struct fb_info *) dev_addr;
struct fbcon_ops *ops = info->fbcon_par;

- schedule_work(&info->queue);
+ queue_work_on_any_cpu(system_wq, &info->queue);
mod_timer(&ops->cursor_timer, jiffies + HZ/5);
}

--
1.7.12.rc2.18.g61b472e

2013-03-18 15:30:33

by Viresh Kumar

Subject: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving

In order to save power, it would be useful to schedule lightweight work on cpus
that aren't idle instead of waking up an idle one.

By idle cpu (from scheduler's perspective) we mean:
- Current task is idle task
- nr_running == 0
- wake_list is empty

This is already implemented for timers as get_nohz_timer_target(). We can figure
out a few more users of this feature, like workqueues.

This patch converts get_nohz_timer_target() into a generic API
sched_select_cpu() so that other frameworks (like workqueue) can also use it.

This routine returns a non-idle cpu. It accepts a bitwise OR of the SD_*
flags present in linux/sched.h. If the local cpu isn't idle, or all cpus are
idle, the local cpu is returned. If the local cpu is idle, we look for
another cpu which has all the passed flags set and isn't idle.

This patch reuses the code from the get_nohz_timer_target() routine, which had
a similar implementation.

Signed-off-by: Viresh Kumar <[email protected]>
---
include/linux/sched.h | 21 +++++++++++++++--
kernel/sched/core.c | 63 +++++++++++++++++++++++++++++----------------------
2 files changed, 55 insertions(+), 29 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e20580d..216fa0d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -230,14 +230,31 @@ extern void init_idle_bootup_task(struct task_struct *idle);

extern int runqueue_is_locked(int cpu);

-#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
+#ifdef CONFIG_SMP
+extern int sched_select_cpu(unsigned int sd_flags);
+
+#ifdef CONFIG_NO_HZ
extern void nohz_balance_enter_idle(int cpu);
extern void set_cpu_sd_state_idle(void);
-extern int get_nohz_timer_target(void);
+/*
+ * In the semi idle case, use the nearest busy cpu for migrating timers
+ * from an idle cpu. This is good for power-savings.
+ *
+ * We don't do similar optimization for completely idle system, as
+ * selecting an idle cpu will add more delays to the timers than intended
+ * (as that cpu's timer base may not be uptodate wrt jiffies etc).
+ */
+#define get_nohz_timer_target() sched_select_cpu(0)
#else
static inline void nohz_balance_enter_idle(int cpu) { }
static inline void set_cpu_sd_state_idle(void) { }
+
+static inline int sched_select_cpu(unsigned int sd_flags)
+{
+ return raw_smp_processor_id();
+}
#endif
+#endif /* CONFIG_SMP */

/*
* Only dump TASK_* tasks. (0 for all tasks)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b36635e..ccd76d7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -551,33 +551,6 @@ void resched_cpu(int cpu)

#ifdef CONFIG_NO_HZ
/*
- * In the semi idle case, use the nearest busy cpu for migrating timers
- * from an idle cpu. This is good for power-savings.
- *
- * We don't do similar optimization for completely idle system, as
- * selecting an idle cpu will add more delays to the timers than intended
- * (as that cpu's timer base may not be uptodate wrt jiffies etc).
- */
-int get_nohz_timer_target(void)
-{
- int cpu = smp_processor_id();
- int i;
- struct sched_domain *sd;
-
- rcu_read_lock();
- for_each_domain(cpu, sd) {
- for_each_cpu(i, sched_domain_span(sd)) {
- if (!idle_cpu(i)) {
- cpu = i;
- goto unlock;
- }
- }
- }
-unlock:
- rcu_read_unlock();
- return cpu;
-}
-/*
* When add_timer_on() enqueues a timer into the timer wheel of an
* idle CPU then this timer might expire before the next timer event
* which is scheduled to wake up that CPU. In case of a completely
@@ -648,6 +621,42 @@ void sched_avg_update(struct rq *rq)
}
}

+/*
+ * This routine returns the nearest non-idle cpu. It accepts a bitwise OR of
+ * SD_* flags present in linux/sched.h. If the local CPU isn't idle, it is
+ * returned back. If it is idle, then we must look for another CPU which has
+ * all the passed flags set.
+ */
+int sched_select_cpu(unsigned int sd_flags)
+{
+ struct sched_domain *sd;
+ int cpu = smp_processor_id();
+ int i;
+
+ /* If the current cpu isn't idle, don't migrate anything */
+ if (!idle_cpu(cpu))
+ return cpu;
+
+ rcu_read_lock();
+ for_each_domain(cpu, sd) {
+ /* If sd doesn't have sd_flags set, skip it. */
+ if ((sd->flags & sd_flags) != sd_flags)
+ continue;
+
+ for_each_cpu(i, sched_domain_span(sd)) {
+ if (i == cpu)
+ continue;
+ if (!idle_cpu(i)) {
+ cpu = i;
+ goto unlock;
+ }
+ }
+ }
+unlock:
+ rcu_read_unlock();
+ return cpu;
+}
+
#else /* !CONFIG_SMP */
void resched_task(struct task_struct *p)
{
--
1.7.12.rc2.18.g61b472e

2013-03-18 15:39:53

by Frederic Weisbecker

Subject: Re: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving

2013/3/18 Viresh Kumar <[email protected]>:
> In order to save power, it would be useful to schedule light weight work on cpus
> that aren't IDLE instead of waking up an IDLE one.
>
> By idle cpu (from scheduler's perspective) we mean:
> - Current task is idle task
> - nr_running == 0
> - wake_list is empty
>
> This is already implemented for timers as get_nohz_timer_target(). We can figure
> out few more users of this feature, like workqueues.
>
> This patch converts get_nohz_timer_target() into a generic API
> sched_select_cpu() so that other frameworks (like workqueue) can also use it.
>
> This routine returns the cpu which is non-idle. It accepts a bitwise OR of SD_*
> flags present in linux/sched.h. If the local CPU isn't idle OR all cpus are
> idle, local cpu is returned back. If local cpu is idle, then we must look for
> another CPU which have all the flags passed as argument as set and isn't idle.
>
> This patch reuses the code from get_nohz_timer_target() routine, which had
> similar implementation.
>
> Signed-off-by: Viresh Kumar <[email protected]>
> ---
> include/linux/sched.h | 21 +++++++++++++++--
> kernel/sched/core.c | 63 +++++++++++++++++++++++++++++----------------------
> 2 files changed, 55 insertions(+), 29 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index e20580d..216fa0d 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -230,14 +230,31 @@ extern void init_idle_bootup_task(struct task_struct *idle);
>
> extern int runqueue_is_locked(int cpu);
>
> -#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
> +#ifdef CONFIG_SMP
> +extern int sched_select_cpu(unsigned int sd_flags);
> +
> +#ifdef CONFIG_NO_HZ
> extern void nohz_balance_enter_idle(int cpu);
> extern void set_cpu_sd_state_idle(void);
> -extern int get_nohz_timer_target(void);
> +/*
> + * In the semi idle case, use the nearest busy cpu for migrating timers
> + * from an idle cpu. This is good for power-savings.
> + *
> + * We don't do similar optimization for completely idle system, as
> + * selecting an idle cpu will add more delays to the timers than intended
> + * (as that cpu's timer base may not be uptodate wrt jiffies etc).
> + */
> +#define get_nohz_timer_target() sched_select_cpu(0)
> #else
> static inline void nohz_balance_enter_idle(int cpu) { }
> static inline void set_cpu_sd_state_idle(void) { }
> +
> +static inline int sched_select_cpu(unsigned int sd_flags)
> +{
> + return raw_smp_processor_id();

I feel this should be symmetric with the requirement of having
preemption disabled as in the CONFIG_NO_HZ version. This should be
smp_processor_id().

[...]
> @@ -648,6 +621,42 @@ void sched_avg_update(struct rq *rq)
> }
> }
>
> +/*
> + * This routine returns the nearest non-idle cpu. It accepts a bitwise OR of
> + * SD_* flags present in linux/sched.h. If the local CPU isn't idle, it is
> + * returned back. If it is idle, then we must look for another CPU which have
> + * all the flags passed as argument as set.
> + */
> +int sched_select_cpu(unsigned int sd_flags)

It would be nice to have some more precise naming. sched_select_busy_cpu() ?

2013-03-18 15:44:13

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving

On Mon, Mar 18, 2013 at 9:09 PM, Frederic Weisbecker <[email protected]> wrote:
> 2013/3/18 Viresh Kumar <[email protected]>:

>> +static inline int sched_select_cpu(unsigned int sd_flags)
>> +{
>> + return raw_smp_processor_id();
>
> I feel this should be symmetric with the requirement of having
> preemption disabled as in the CONFIG_NO_HZ version. This should be
> smp_processor_id().

Yes, my fault.

>> +int sched_select_cpu(unsigned int sd_flags)
>
> It would be nice to have some more precise naming. sched_select_busy_cpu() ?

You are correct that name can be improved but busy may give it a different
meaning. Maybe sched_select_non_idle_cpu()?

Thanks for your instantaneous review :)

2013-03-18 15:57:40

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving

2013/3/18 Viresh Kumar <[email protected]>:
> On Mon, Mar 18, 2013 at 9:09 PM, Frederic Weisbecker <[email protected]> wrote:
>> 2013/3/18 Viresh Kumar <[email protected]>:
>
>>> +static inline int sched_select_cpu(unsigned int sd_flags)
>>> +{
>>> + return raw_smp_processor_id();
>>
>> I feel this should be symmetric with the requirement of having
>> preemption disabled as in the CONFIG_NO_HZ version. This should be
>> smp_processor_id().
>
> Yes, my fault.
>
>>> +int sched_select_cpu(unsigned int sd_flags)
>>
>> It would be nice to have some more precise naming. sched_select_busy_cpu() ?
>
> You are correct that name can be improved but busy may give it a different
> meaning. Maybe sched_select_non_idle_cpu()?

That works too yeah.

Thanks.

2013-03-18 17:33:46

by David Miller

[permalink] [raw]
Subject: Re: [PATCH V3 4/7] PHYLIB: queue work on any cpu

From: Viresh Kumar <[email protected]>
Date: Mon, 18 Mar 2013 20:53:26 +0530

> Phylib uses workqueues for multiple purposes. There is no real dependency of
> scheduling these on the cpu which scheduled them.
>
> On an idle system, it is observed that an idle cpu wakes up many times just to
> service this work. It would be better if we can schedule it on a cpu which isn't
> idle to save on power.
>
> By idle cpu (from scheduler's perspective) we mean:
> - Current task is idle task
> - nr_running == 0
> - wake_list is empty
>
> This patch replaces the schedule_work() and schedule_delayed_work() routines
> with their queue_[delayed_]work_on_any_cpu() siblings with system_wq as
> parameter.
>
> These routines would look for the closest (via scheduling domains) non-idle cpu
> (non-idle from the scheduler's perspective). If the current cpu is not idle or all
> cpus are idle, work will be scheduled on local cpu.
>
> Cc: "David S. Miller" <[email protected]>
> Cc: [email protected]
> Signed-off-by: Viresh Kumar <[email protected]>

This will need to be applied to whatever tree adds these new interfaces,
and for that:

Acked-by: David S. Miller <[email protected]>

2013-03-19 05:00:20

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH V3 0/7] Create sched_select_cpu() and use it for workqueues

On 18 March 2013 20:53, Viresh Kumar <[email protected]> wrote:
> In order to save power, it would be useful to schedule light weight work on cpus
> that aren't IDLE instead of waking up an IDLE one.
>
> By idle cpu (from scheduler's perspective) we mean:
> - Current task is idle task
> - nr_running == 0
> - wake_list is empty

Oops!! I forgot the change log:

V2->V3:
- Dropped changes into core queue_work() API, rather create *_on_any_cpu()
APIs
- Dropped running timers migration patch as that was broken
- Migrated few users of workqueues to use *_on_any_cpu() APIs.

2013-03-19 05:15:12

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving

On 18 March 2013 21:27, Frederic Weisbecker <[email protected]> wrote:
> 2013/3/18 Viresh Kumar <[email protected]>:
>>> It would be nice to have some more precise naming. sched_select_busy_cpu() ?
>>
>> You are correct that name can be improved but busy may give it a different
>> meaning. Maybe sched_select_non_idle_cpu()?
>
> That works too yeah.

Here is the fixup that i applied (it fixes another stupid mistake in sched.h).
I have pushed this series here again:

http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/sched-wq-migration-v3

commit e769ff0d78fd2fb504ae056a44b70bba7c259126
Author: Viresh Kumar <[email protected]>
Date: Fri Sep 14 15:13:05 2012 +0530

fixup! sched: Create sched_select_cpu() to give preferred CPU for power saving

Signed-off-by: Viresh Kumar <[email protected]>
---
include/linux/sched.h | 19 ++++++++++---------
kernel/sched/core.c | 2 +-
2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 216fa0d..db6655a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -231,11 +231,18 @@ extern void init_idle_bootup_task(struct task_struct *idle);
extern int runqueue_is_locked(int cpu);

#ifdef CONFIG_SMP
-extern int sched_select_cpu(unsigned int sd_flags);
+extern int sched_select_non_idle_cpu(unsigned int sd_flags);
+#else
+static inline int sched_select_non_idle_cpu(unsigned int sd_flags)
+{
+ return smp_processor_id();
+}
+#endif

-#ifdef CONFIG_NO_HZ
+#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
extern void nohz_balance_enter_idle(int cpu);
extern void set_cpu_sd_state_idle(void);
+
/*
* In the semi idle case, use the nearest busy cpu for migrating timers
* from an idle cpu. This is good for power-savings.
@@ -244,17 +251,11 @@ extern void set_cpu_sd_state_idle(void);
* selecting an idle cpu will add more delays to the timers than intended
* (as that cpu's timer base may not be uptodate wrt jiffies etc).
*/
-#define get_nohz_timer_target() sched_select_cpu(0)
+#define get_nohz_timer_target() sched_select_non_idle_cpu(0)
#else
static inline void nohz_balance_enter_idle(int cpu) { }
static inline void set_cpu_sd_state_idle(void) { }
-
-static inline int sched_select_cpu(unsigned int sd_flags)
-{
- return raw_smp_processor_id();
-}
#endif
-#endif /* CONFIG_SMP */

/*
* Only dump TASK_* tasks. (0 for all tasks)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ccd76d7..0265a5e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -627,7 +627,7 @@ void sched_avg_update(struct rq *rq)
* returned back. If it is idle, then we must look for another CPU which have
* all the flags passed as argument as set.
*/
-int sched_select_cpu(unsigned int sd_flags)
+int sched_select_non_idle_cpu(unsigned int sd_flags)
{
struct sched_domain *sd;
int cpu = smp_processor_id();

2013-03-19 05:15:36

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu

On 18 March 2013 20:53, Viresh Kumar <[email protected]> wrote:
> queue_work() queues work on current cpu. This may wake up an idle CPU, which is
> actually not required.
>
> Some of these works can be processed by any CPU and so we must select a non-idle
> CPU here. The initial idea was to modify implementation of queue_work(), but
> that may end up breaking lots of kernel code that would be a nightmare to debug.
>
> So, we settled on adding new workqueue interfaces, for works that don't depend
> on a cpu to execute them.

Fixup:

commit 6e5b6fac3353f5e4e5490931b3eb6d3fd7864ca0
Author: Viresh Kumar <[email protected]>
Date: Tue Mar 19 10:34:20 2013 +0530

fixup! workqueue: Add helpers to schedule work on any cpu
---
kernel/workqueue.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index cf9c570..68daf50 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1242,7 +1242,7 @@ retry:
if (!(wq->flags & WQ_UNBOUND)) {
if (cpu == WORK_CPU_UNBOUND) {
if (on_any_cpu)
- cpu = sched_select_cpu(0);
+ cpu = sched_select_non_idle_cpu(0);
else
cpu = raw_smp_processor_id();
}

2013-03-19 07:58:56

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH V3 5/7] mmc: queue work on any cpu

On 18 March 2013 16:23, Viresh Kumar <[email protected]> wrote:
> mmc uses workqueues for running mmc_rescan(). There is no real dependency of
> scheduling these on the cpu which scheduled them.
>
> On an idle system, it is observed that an idle cpu wakes up many times just to
> service this work. It would be better if we can schedule it on a cpu which isn't
> idle to save on power.
>
> By idle cpu (from scheduler's perspective) we mean:
> - Current task is idle task
> - nr_running == 0
> - wake_list is empty
>
> This patch replaces the queue_delayed_work() with
> queue_delayed_work_on_any_cpu() siblings.
>
> This routine would look for the closest (via scheduling domains) non-idle cpu
> (non-idle from the scheduler's perspective). If the current cpu is not idle or all
> cpus are idle, work will be scheduled on local cpu.
>
> Cc: Chris Ball <[email protected]>
> Cc: [email protected]
> Signed-off-by: Viresh Kumar <[email protected]>
> ---
> drivers/mmc/core/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 9290bb5..adf331a 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -85,7 +85,7 @@ MODULE_PARM_DESC(
> static int mmc_schedule_delayed_work(struct delayed_work *work,
> unsigned long delay)
> {
> - return queue_delayed_work(workqueue, work, delay);
> + return queue_delayed_work_on_any_cpu(workqueue, work, delay);
> }
>
> /*
> --
> 1.7.12.rc2.18.g61b472e
>

Acked-by: Ulf Hansson <[email protected]>

2013-03-19 12:30:47

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving

On Mon, 2013-03-18 at 20:53 +0530, Viresh Kumar wrote:
> +/*
> + * This routine returns the nearest non-idle cpu. It accepts a bitwise OR of
> + * SD_* flags present in linux/sched.h. If the local CPU isn't idle, it is
> + * returned back. If it is idle, then we must look for another CPU which have
> + * all the flags passed as argument as set.
> + */
> +int sched_select_cpu(unsigned int sd_flags)

So the only problem I have is that you expose the SD_flags to !sched
code. The only proposed user is in the workqueue code and passes 0.

Why not leave out the sd_flags argument and introduce it once you start
using it; at which point we can argue on the interface.

2013-03-19 12:52:06

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving

On 19 March 2013 18:00, Peter Zijlstra <[email protected]> wrote:
> On Mon, 2013-03-18 at 20:53 +0530, Viresh Kumar wrote:
>> +/*
>> + * This routine returns the nearest non-idle cpu. It accepts a bitwise OR of
>> + * SD_* flags present in linux/sched.h. If the local CPU isn't idle, it is
>> + * returned back. If it is idle, then we must look for another CPU which have
>> + * all the flags passed as argument as set.
>> + */
>> +int sched_select_cpu(unsigned int sd_flags)
>
> So the only problem I have is that you expose the SD_flags to !sched
> code. The only proposed user is in the workqueue code and passes 0.
>
> Why not leave out the sd_flags argument and introduce it once you start
> using it; at which point we can argue on the interface.

Yes, that can be done. Will fix it up.

2013-03-19 13:22:28

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH V3 1/7] sched: Create sched_select_cpu() to give preferred CPU for power saving

On 19 March 2013 18:22, Viresh Kumar <[email protected]> wrote:
> On 19 March 2013 18:00, Peter Zijlstra <[email protected]> wrote:
>> Why not leave out the sd_flags argument and introduce it once you start
>> using it; at which point we can argue on the interface.
>
> Yes, that can be done. Will fix it up.

Fixup:

commit 77927939224520cbb0ac47270d3458bedffe42c4
Author: Viresh Kumar <[email protected]>
Date: Tue Mar 19 18:50:37 2013 +0530

fixup! sched: Create sched_select_non_idle_cpu() to give preferred CPU for power saving
---
include/linux/sched.h | 6 +++---
kernel/sched/core.c | 6 +-----
2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index db6655a..37eb1dd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -231,9 +231,9 @@ extern void init_idle_bootup_task(struct task_struct *idle);
extern int runqueue_is_locked(int cpu);

#ifdef CONFIG_SMP
-extern int sched_select_non_idle_cpu(unsigned int sd_flags);
+extern int sched_select_non_idle_cpu(void);
#else
-static inline int sched_select_non_idle_cpu(unsigned int sd_flags)
+static inline int sched_select_non_idle_cpu(void)
{
return smp_processor_id();
}
@@ -251,7 +251,7 @@ extern void set_cpu_sd_state_idle(void);
* selecting an idle cpu will add more delays to the timers than intended
* (as that cpu's timer base may not be uptodate wrt jiffies etc).
*/
-#define get_nohz_timer_target() sched_select_non_idle_cpu(0)
+#define get_nohz_timer_target() sched_select_non_idle_cpu()
#else
static inline void nohz_balance_enter_idle(int cpu) { }
static inline void set_cpu_sd_state_idle(void) { }
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0265a5e..f597d2b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -627,7 +627,7 @@ void sched_avg_update(struct rq *rq)
* returned back. If it is idle, then we must look for another CPU which have
* all the flags passed as argument as set.
*/
-int sched_select_non_idle_cpu(unsigned int sd_flags)
+int sched_select_non_idle_cpu(void)
{
struct sched_domain *sd;
int cpu = smp_processor_id();
@@ -639,10 +639,6 @@ int sched_select_non_idle_cpu(unsigned int sd_flags)

rcu_read_lock();
for_each_domain(cpu, sd) {
- /* If sd doesnt' have sd_flags set skip sd. */
- if ((sd->flags & sd_flags) != sd_flags)
- continue;
-
for_each_cpu(i, sched_domain_span(sd)) {
if (i == cpu)
continue;

2013-03-19 13:23:35

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu

On 19 March 2013 10:45, Viresh Kumar <[email protected]> wrote:
> On 18 March 2013 20:53, Viresh Kumar <[email protected]> wrote:
>> queue_work() queues work on current cpu. This may wake up an idle CPU, which is
>> actually not required.
>>
>> Some of these works can be processed by any CPU and so we must select a non-idle
>> CPU here. The initial idea was to modify implementation of queue_work(), but
>> that may end up breaking lots of kernel code that would be a nightmare to debug.
>>
>> So, we settled on adding new workqueue interfaces, for works that don't depend
>> on a cpu to execute them.

Another fixup:

commit 8753c6d936faa6e3233cbf44a55913d05de05683
Author: Viresh Kumar <[email protected]>
Date: Tue Mar 19 18:50:59 2013 +0530

fixup! workqueue: Add helpers to schedule work on any cpu
---
kernel/workqueue.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 68daf50..4e023ab 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1242,7 +1242,7 @@ retry:
if (!(wq->flags & WQ_UNBOUND)) {
if (cpu == WORK_CPU_UNBOUND) {
if (on_any_cpu)
- cpu = sched_select_non_idle_cpu(0);
+ cpu = sched_select_non_idle_cpu();
else
cpu = raw_smp_processor_id();
}

2013-03-21 00:12:21

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu

Hello, Viresh.

On Mon, Mar 18, 2013 at 08:53:25PM +0530, Viresh Kumar wrote:
> queue_work() queues work on current cpu. This may wake up an idle CPU, which is
> actually not required.
>
> Some of these works can be processed by any CPU and so we must select a non-idle
> CPU here. The initial idea was to modify implementation of queue_work(), but
> that may end up breaking lots of kernel code that would be a nightmare to debug.
>
> So, we settled on adding new workqueue interfaces, for works that don't depend
> on a cpu to execute them.
>
> This patch adds following new routines:
> - queue_work_on_any_cpu
> - queue_delayed_work_on_any_cpu
>
> These routines would look for the closest (via scheduling domains) non-idle cpu
> (non-idle from the scheduler's perspective). If the current cpu is not idle or all
> cpus are idle, work will be scheduled on local cpu.

For some reason, I've always been unsure about this patchset. I've
been thinking about it for the past few days and I think I now know why.

I can see that strategy like this being useful for timer, and
impressive power saving numbers BTW - we definitely want that. While
workqueue shares a lot of attributes with timers, there's one
important difference - scheduler is involved in work item executions.

Work item scheduling for per-cpu work items artificially removes the
choice of CPU from the scheduler with the assumption that such
restrictions would lead to lower overall overhead. There is another
variant of workqueue - WQ_UNBOUND - when such assumption wouldn't hold
too well and it would be best to give the scheduler full latitude
regarding scheduling work item execution including choice of CPU.

Now, this patchset, kinda sorta adds back CPU selection by scheduler
in an ad-hoc way, right? If we're gonna do that, why not just make it
unbound workqueues? Then, the scheduler would have full discretion
over them and we don't need to implement this separate ad-hoc thing on
the side. It's the roundaboutness that bothers me.

I'm not sure about other workqueues but the block one can be very
performance sensitive on machines with high IOPS and it would
definitely be advisable to keep it CPU-affine unless power consumption
is the overriding concern, so how about introducing another workqueue
flag, say WQ_UNBOUND_FOR_POWER (it's terrible, please come up with
something better), which is WQ_UNBOUND on configurations where this
matters and becomes noop if not?

Thanks.

--
tejun

2013-03-21 10:57:26

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu

On 21 March 2013 05:42, Tejun Heo <[email protected]> wrote:
> On Mon, Mar 18, 2013 at 08:53:25PM +0530, Viresh Kumar wrote:

> For some reason, I've always been unsure about this patchset. I've
> been thinking about it for the past few days and I think I now know why.

I have known this since the beginning, and thought the power numbers I showed
might change your view. My hard luck :)

> I can see that strategy like this being useful for timer, and
> impressive power saving numbers BTW - we definitely want that. While
> workqueue shares a lot of attributes with timers, there's one
> important difference - scheduler is involved in work item executions.

Yes, scheduler is involved for queuing works queued on a UNBOUND wq.

> Work item scheduling for per-cpu work items artificially removes the
> choice of CPU from the scheduler with the assumption that such
> restrictions would lead to lower overall overhead.

Yes.. We aren't touching them here and we can't :)

> There is another
> variant of workqueue - WQ_UNBOUND - when such assumption wouldn't hold
> too well and it would be best to give the scheduler full latitude
> regarding scheduling work item execution including choice of CPU.

I liked this point of yours and what you said is correct too. But there is a
difference here with our approach.

I here see two parallel solutions:

1. What we proposed:
- Add another wq api and used sched_select_non_idle_cpu() to choose
the right cpu
- Fix user drivers we care about

2. What you proposed:
- Fix user drivers we care about by using UNBOUND workqueues rather than
system_wq or other private wq's. And let the scheduler do everything. Right?


Now there are two things i am worried about.

A.
About the difference in behavior of scheduler and sched_select_non_idle_cpu().
Scheduler will treat this work as any normal task and can schedule it anywhere
based on what it feels is the right place and may still wake up an idle one for
load balancing.

In our approach, we aren't switching CPUs for a work if that cpu (or system) is
busy enough, as in the case of high IOPS for the block layer. In such situations
we will find that cpu busy most of the time, and so we will stay where we were.

In case the system is fairly idle, we also stay on the same cpu.
The only time we migrate is when the current cpu is idle but the whole system
isn't.

B.
It is sort of difficult to teach users about BOUND and UNBOUND workqueues,
and people have used them without thinking too much about them for some
time. We need a straightforward API to make this more transparent. And
queue_work_on_any_cpu() was a good candidate here. I am not talking about
its implementation but about the interface.


Though in both suggestions we need to fix user drivers one by one to use
queue_work_on_any_cpu() or use WQ_UNBOUND..

> Now, this patchset, kinda sorta adds back CPU selection by scheduler
> in an ad-hoc way, right?

Yes.

> If we're gonna do that, why not just make it
> unbound workqueues? Then, the scheduler would have full discretion
> over them and we don't need to implement this separate ad-hoc thing on
> the side. It's the roundaboutness that bothers me.

I had my own reasons as explained earlier.

> I'm not sure about other workqueues but the block one can be very
> performance sensitive on machines with high IOPS and it would
> definitely be advisable to keep it CPU-affine unless power consumption
> is the overriding concern, so how about introducing another workqueue
> flag, say WQ_UNBOUND_FOR_POWER (it's terrible, please come up with
> something better), which is WQ_UNBOUND on configurations where this
> matters and becomes noop if not?

Maybe people wouldn't suffer much because of the reasons I told you earlier.
On a busy system we will never switch cpus.

This static configuration might not work well with multiplatform images.

--
viresh

2013-03-21 18:29:45

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu

Hey, Viresh.

On Thu, Mar 21, 2013 at 04:27:23PM +0530, Viresh Kumar wrote:
> > For some reason, I've always been unsure about this patchset. I've
> > been thinking about it for the past few days and I think I now know why.
>
> I have known this since the beginning, and thought the power numbers I showed
> might change your view. My hard luck :)

Heh, sorry about the trouble but, yeah, your numbers are impressive
and no matter how this discussion turns out, we're gonna have it in
some form, so please don't think that your numbers didn't change
anything. Actually, to me, it seems that identifying which workqueues
are good candidates for conversion and determining how much power saving
can be gained are 90% of the work.

> Now there are two things i am worried about.
>
> A.
> About the difference in behavior of scheduler and sched_select_non_idle_cpu().
> Scheduler will treat this work as any normal task and can schedule it anywhere
> based on what it feels is the right place and may still wake up an idle one for
> load balancing.
>
> In our approach, we aren't switching CPUs for a work if that cpu (or system) is
> busy enough, as in the case of high IOPS for the block layer. In such situations
> we will find that cpu busy most of the time, and so we will stay where we were.
>
> In case the system is fairly idle, we also stay on the same cpu.
> The only time we migrate is when the current cpu is idle but the whole system
> isn't.

Yes, I actually like that part a lot although I do wish the idle check
was inlined.

What I'm wondering is whether the kinda out-of-band decision via
sched_select_cpu() is justified given that it can and is likely to go
through full scheduling decision anyway. For timer, we don't have
that, so it makes sense. For work items, it's a bit different.

To rephrase, *if* the scheduler can't already make proper decisions
regarding power consumption on an idlish system, maybe it can be
improved to do so? It could as well be that this CPU selection is
special enough that it's just better to keep it separate as this
patchset proposes. This is something up to the scheduler people.
Peter, Ingo, what do you guys think?

> B.
> It is sort of difficult to teach users about BOUND and UNBOUND workqueues,
> and people have used them without thinking too much about them for some
> time. We need a straightforward API to make this more transparent. And
> queue_work_on_any_cpu() was a good candidate here. I am not talking about
> its implementation but about the interface.
>
> Though in both suggestions we need to fix user drivers one by one to use
> queue_work_on_any_cpu() or use WQ_UNBOUND..

I don't know. Workqueues are attribute domains and I kinda dislike
having a completely separate set of interface for this. Regardless of
how the backend gets implemented, I'd really much prefer it being
implemented as a workqueue attribute rather than another set of
queueing interface functions.

> Maybe people wouldn't suffer much because of the reasons I told you earlier.
> On a busy system we will never switch cpus.
>
> This static configuration might not work well with multiplatform images.

Well, it doesn't have to be a compile time thing. It can be
dynamically switched with, say, static_branch(), but I think that's a
minor point at this point.

Thanks.

--
tejun

2013-03-22 15:05:40

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH V3 6/7] block: queue work on any cpu

On Mon, Mar 18 2013, Viresh Kumar wrote:
> block layer uses workqueues for multiple purposes. There is no real dependency
> of scheduling these on the cpu which scheduled them.
>
> On an idle system, it is observed that an idle cpu wakes up many times just to
> service this work. It would be better if we can schedule it on a cpu which isn't
> idle to save on power.
>
> By idle cpu (from scheduler's perspective) we mean:
> - Current task is idle task
> - nr_running == 0
> - wake_list is empty
>
> This patch replaces schedule_work() and queue_[delayed_]work() with
> queue_[delayed_]work_on_any_cpu() siblings.
>
> These routines would look for the closest (via scheduling domains) non-idle cpu
> (non-idle from the scheduler's perspective). If the current cpu is not idle or all
> cpus are idle, work will be scheduled on local cpu.

What are the performance implications of this? From that perspective,
scheduling on a local core is the better choice. Hopefully it's already
running on the local socket of the device, keeping it on that node would
be preferable.

For the delayed functions, the timer is typically added on the current
CPU. It's hard to know what the state of idleness of CPUs would be when
the delay has expired. How are you handling this?

I can see the win from a power consumption point of view, but it quickly
gets a little more complicated than that.

--
Jens Axboe

2013-03-22 17:09:25

by Chris Ball

[permalink] [raw]
Subject: Re: [PATCH V3 5/7] mmc: queue work on any cpu

Hi,

On Mon, Mar 18 2013, Viresh Kumar wrote:
> mmc uses workqueues for running mmc_rescan(). There is no real dependency of
> scheduling these on the cpu which scheduled them.
>
> On an idle system, it is observed that an idle cpu wakes up many times just to
> service this work. It would be better if we can schedule it on a cpu which isn't
> idle to save on power.
>
> By idle cpu (from scheduler's perspective) we mean:
> - Current task is idle task
> - nr_running == 0
> - wake_list is empty
>
> This patch replaces the queue_delayed_work() with
> queue_delayed_work_on_any_cpu() siblings.
>
> This routine would look for the closest (via scheduling domains) non-idle cpu
> (non-idle from the scheduler's perspective). If the current cpu is not idle or all
> cpus are idle, work will be scheduled on local cpu.
>
> Cc: Chris Ball <[email protected]>
> Cc: [email protected]
> Signed-off-by: Viresh Kumar <[email protected]>
> ---
> drivers/mmc/core/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 9290bb5..adf331a 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -85,7 +85,7 @@ MODULE_PARM_DESC(
> static int mmc_schedule_delayed_work(struct delayed_work *work,
> unsigned long delay)
> {
> - return queue_delayed_work(workqueue, work, delay);
> + return queue_delayed_work_on_any_cpu(workqueue, work, delay);
> }
>
> /*

Thanks, pushed to mmc-next for 3.10.

- Chris.
--
Chris Ball <[email protected]> <http://printf.net/>
One Laptop Per Child

2013-03-22 17:26:18

by Chris Ball

[permalink] [raw]
Subject: Re: [PATCH V3 5/7] mmc: queue work on any cpu

Hi,

On Mon, Mar 18 2013, Viresh Kumar wrote:
> mmc uses workqueues for running mmc_rescan(). There is no real dependency of
> scheduling these on the cpu which scheduled them.
>
> On an idle system, it is observed that an idle cpu wakes up many times just to
> service this work. It would be better if we can schedule it on a cpu which isn't
> idle to save on power.
>
> By idle cpu (from scheduler's perspective) we mean:
> - Current task is idle task
> - nr_running == 0
> - wake_list is empty
>
> This patch replaces the queue_delayed_work() with
> queue_delayed_work_on_any_cpu() siblings.
>
> This routine would look for the closest (via scheduling domains) non-idle cpu
> (non-idle from the scheduler's perspective). If the current cpu is not idle or all
> cpus are idle, work will be scheduled on local cpu.
>
> Cc: Chris Ball <[email protected]>
> Cc: [email protected]
> Signed-off-by: Viresh Kumar <[email protected]>
> ---
> drivers/mmc/core/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 9290bb5..adf331a 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -85,7 +85,7 @@ MODULE_PARM_DESC(
> static int mmc_schedule_delayed_work(struct delayed_work *work,
> unsigned long delay)
> {
> - return queue_delayed_work(workqueue, work, delay);
> + return queue_delayed_work_on_any_cpu(workqueue, work, delay);
> }
>
> /*

/home/cjb/git/mmc/drivers/mmc/core/core.c: In function ‘mmc_schedule_delayed_work’:
/home/cjb/git/mmc/drivers/mmc/core/core.c:88:2: error: implicit declaration of function ‘queue_delayed_work_on_any_cpu’ [-Werror=implicit-function-declaration]

I've dropped this patch for now. This function doesn't seem to be
defined in linux-next either.

- Chris.
--
Chris Ball <[email protected]> <http://printf.net/>
One Laptop Per Child

2013-03-22 17:27:52

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH V3 5/7] mmc: queue work on any cpu

On 22 March 2013 22:56, Chris Ball <[email protected]> wrote:
> On Mon, Mar 18 2013, Viresh Kumar wrote:
>
> /home/cjb/git/mmc/drivers/mmc/core/core.c: In function ‘mmc_schedule_delayed_work’:
> /home/cjb/git/mmc/drivers/mmc/core/core.c:88:2: error: implicit declaration of function ‘queue_delayed_work_on_any_cpu’ [-Werror=implicit-function-declaration]
>
> I've dropped this patch for now. This function doesn't seem to be
> defined in linux-next either.

Hi Chris,

This patch was part of a bigger patchset which also adds this API. I
don't want you to apply this one, just to Ack it here. Probably Tejun
or some scheduler maintainer will apply it later (if they like all the
patches).

2013-03-22 17:30:39

by Chris Ball

[permalink] [raw]
Subject: Re: [PATCH V3 5/7] mmc: queue work on any cpu

Hi,

On Fri, Mar 22 2013, Viresh Kumar wrote:
> On 22 March 2013 22:56, Chris Ball <[email protected]> wrote:
>> On Mon, Mar 18 2013, Viresh Kumar wrote:
>>
>> /home/cjb/git/mmc/drivers/mmc/core/core.c: In function
>> ‘mmc_schedule_delayed_work’:
>> /home/cjb/git/mmc/drivers/mmc/core/core.c:88:2: error: implicit
>> declaration of function ‘queue_delayed_work_on_any_cpu’
>> [-Werror=implicit-function-declaration]
>>
>> I've dropped this patch for now. This function doesn't seem to be
>> defined in linux-next either.
>
> Hi Chris,
>
> This patch was part of a bigger patchset which also adds this API. I
> don't want you to apply this one, just to Ack it here. Probably Tejun
> or some scheduler maintainer will apply it later (if they like all the
> patches).

Thanks, makes sense. For [5/7]:

Acked-by: Chris Ball <[email protected]>

- Chris.
--
Chris Ball <[email protected]> <http://printf.net/>
One Laptop Per Child

2013-03-23 06:44:47

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH V3 6/7] block: queue work on any cpu

On 22 March 2013 20:35, Jens Axboe <[email protected]> wrote:

Hi Jens,

First of all, please have a look at:

1. https://lkml.org/lkml/2013/3/18/364

This is the cover letter of my series, where I have shown the power savings
achieved with this patchset.

2. https://lkml.org/lkml/2013/3/21/172

This is the link to a discussion I had with Tejun, which answers some of these
questions.

> What are the performance implications of this? From that perspective,
> scheduling on a local core is the better choice. Hopefully it's already
> running on the local socket of the device, keeping it on that node would
> be preferable.

If the local cpu is busy, or all cpus are idle, then we don't migrate, and so
performance shouldn't be affected by this.

> For the delayed functions, the timer is typically added on the current
> CPU.

This is the code that executes here:

if (unlikely(cpu != WORK_CPU_UNBOUND))
add_timer_on(timer, cpu);
else
add_timer(timer);

Clearly, for a queue_delayed_work() call we don't just use the local cpu; we
use the same sched_select_non_idle_cpu() routine to give us the right cpu.

> It's hard to know what the state of idleness of CPUs would be when
> the delay has expired. How are you handling this?

Correct; we don't make that decision at queueing time, but when the timer expires.

Thanks for your feedback.

2013-03-28 18:13:41

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu

On Thu, Mar 21, 2013 at 11:29:37AM -0700, Tejun Heo wrote:
> Yes, I actually like that part a lot although I do wish the idle check
> was inlined.
>
> What I'm wondering is whether the kinda out-of-band decision via
> sched_select_cpu() is justified given that it can and is likely to go
> through full scheduling decision anyway. For timer, we don't have
> that, so it makes sense. For work items, it's a bit different.
>
> To rephrase, *if* the scheduler can't already make proper decisions
> regarding power consumption on an idlish system, maybe it can be
> improved to do so? It could as well be that this CPU selection is
> special enough that it's just better to keep it separate as this
> patchset proposes. This is something up to the scheduler people.
> Peter, Ingo, what do you guys think?

Ping. Peter, Ingo?

Viresh, would it be difficult to make another measurement of the same
workload with the said workqueues converted to unbound? I think that
would at least provide a nice reference point.

Thanks.

--
tejun

2013-03-29 02:39:52

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu

On 28 March 2013 23:43, Tejun Heo <[email protected]> wrote:
> Ping. Peter, Ingo?

Thanks for your ping; even I was thinking of it for some time :)

> Viresh, would it be difficult to make another measurement of the same
> workload with the said workqueues converted to unbound? I think that
> would at least provide a nice reference point.

I will try.

2013-03-29 07:27:26

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu

On 28 March 2013 23:43, Tejun Heo <[email protected]> wrote:
> Viresh, would it be difficult to make another measurement of the same
> workload with the said workqueues converted to unbound? I think that
> would at least provide a nice reference point.

My heart is shattered after getting the results :)


Cluster A15 Energy Cluster A7 Energy Total
------------------------- ----------------------- ------

Without this patchset (Energy in Joules):
---------------------------------------------------

0.151162 2.183545 2.334707
0.223730 2.687067 2.910797
0.289687 2.732702 3.022389
0.454198 2.745908 3.200106
0.495552 2.746465 3.242017

Average:
0.322866 2.619137 2.942003


With this patchset (Energy in Joules):
-----------------------------------------------

0.133361 2.267822 2.401183
0.260626 2.833389 3.094015
0.142365 2.277929 2.420294
0.246793 2.582550 2.829343
0.130462 2.257033 2.387495

Average:
0.182721 2.443745 2.626466


With WQ Unbound (Energy in Joules):
-----------------------------------------------

0.226421 2.283658 2.510079
0.151361 2.236656 2.388017
0.197726 2.249849 2.447575
0.221915 2.229446 2.451361
0.347098 2.257707 2.604805

Average:
0.2289042 2.2514632 2.4803674

PS: I had to add another generic workqueue for these tests,
system_unbound_freezable_wq, as the block layer was using
system_freezable_wq.

--
viresh

2013-03-29 17:40:12

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu

Hello, Viresh.

On Fri, Mar 29, 2013 at 12:57:23PM +0530, Viresh Kumar wrote:
> On 28 March 2013 23:43, Tejun Heo <[email protected]> wrote:
> > Viresh, would it be difficult to make another measurement of the same
> > workload with the said workqueues converted to unbound? I think that
> > would at least provide a nice reference point.
>
> My heart is shattered after getting the results :)
>
>
> Cluster A15 Energy Cluster A7 Energy Total
> ------------------------- ----------------------- ------
>
> Without this patchset (Energy in Joules):
> ---------------------------------------------------
> 0.322866 2.619137 2.942003
>
>
> With this patchset (Energy in Joules):
> -----------------------------------------------
> 0.182721 2.443745 2.626466
>
>
> With WQ Unbound (Energy in Joules):
> -----------------------------------------------
> 0.2289042 2.2514632 2.4803674


So the scheduler does know what it's doing after all, which is nice
news. Given the result, the best thing to do would be to somehow mark
these workqueues as unbound for power saving?

Thanks!

--
tejun

2013-03-29 17:56:54

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu

On Fri, Mar 29, 2013 at 10:40:03AM -0700, Tejun Heo wrote:
> So the scheduler does know what it's doing after all, which is nice
> news. Given the result, the best thing to do would be to somehow mark
> these workqueues as unbound for power saving?

BTW, recent changes to workqueue mean that it shouldn't be too
difficult to create a combined workqueue which distributes work items
to both per-cpu and unbound worker pools depending on cpu_idle(), but,
at least for now, I think we can make do with a much coarser approach.
We can refine it afterwards.

Thanks.

--
tejun

2013-03-30 03:30:28

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH V3 3/7] workqueue: Add helpers to schedule work on any cpu

On 29 March 2013 23:10, Tejun Heo <[email protected]> wrote:
> So the scheduler does know what it's doing after all, which is nice
> news. Given the result, the best thing to do would be to somehow mark
> these workqueues as unbound for power saving?

Yes. I am going to do it soon.